Interesting..
Doug Cowles, October 16, 2001 - 8:39 pm UTC
Very helpful. One more question though.. can you explain the meaning of idle and non-idle conditions?
October 16, 2001 - 8:53 pm UTC
It basically means that it can happen as the result of an idle wait (which is "innocent") or as the result of a non-idle wait (we were REALLY truly waiting for this to finish before doing something else)
For example:
ops$tkyte@ORA717DEV.US.ORACLE.COM> select event, time_waited
2 from v$session_event
3 where sid = ( select sid from v$mystat where rownum = 1 )
4 and event = 'SQL*Net message from client'
5 /
EVENT TIME_WAITED
---------------------------------------------------------------- -----------
SQL*Net message from client 263
ops$tkyte@ORA717DEV.US.ORACLE.COM> rem wait a bit....
ops$tkyte@ORA717DEV.US.ORACLE.COM> @a
ops$tkyte@ORA717DEV.US.ORACLE.COM> select event, time_waited
2 from v$session_event
3 where sid = ( select sid from v$mystat where rownum = 1 )
4 and event = 'SQL*Net message from client'
5 /
EVENT TIME_WAITED
---------------------------------------------------------------- -----------
SQL*Net message from client 925
Here the sql*net message from client is a true IDLE wait event. As we sit in sqlplus, it just keeps on ticking up..... It is showing that we are idle
However, its not ALWAYS an idle wait event. Say you are running a pro*c app you wrote, its a batch program. Its taking too long to run. You look and see that the sql*net message from client is HUGE -- guess what that means. In this case, it means the database is not the bottleneck, its spent alot of time WAITING for you to tell it to do something. In this case, this is a non-idle condition, we were not idle (as an entire process -- client+database)....
Reader
Reader, October 16, 2001 - 9:13 pm UTC
Tom,
There is only one LGWR process right? How does it
write in parallel to redo log members
What constitutes the # of i/o waits (P3 value)
Thanks
October 16, 2001 - 9:18 pm UTC
There can be IO slaves in the system and most all systems support async IO (fire the write off but don't wait for it so you can fire that next one off -- then wait for them all to finish)..
P3 is the number of requests - not waits.
Reader
Reader, October 17, 2001 - 9:36 am UTC
I understand async i/o. In some some systems ( ex:Silicon Graphics), even disk_asynch_io = true, does not enable
asynch i/o because of some hardware requirements.
In such cases, does Oracle write to log members alternatively in a round-robbin fashion. If so how big a
Oracle buffer is used for each batch of logfile parallel
write. ( I've read from an article that ARC process reads
from the redo logs alternatively buffer = archive_log_buffer_size).
Again could you let us know what constitutes *no. of io requests*
Thanks
October 17, 2001 - 10:02 am UTC
the number of IO requests is how many requests we made to perform IO. How many writes we did to write the data. If you have 1meg of data to write, you typically don't do it in one big call.
As I said, we can use IO slaves to simulate async IO or if there is no async io, we would do it sequentially. We dump the log buffer (happens upon commit, every 3 seconds, when 1/3 full, when 1meg full -- whichever comes first). It is a function of the size of the log buffer which you control.
Follow up on IDLE and Non-idle conditions
Doug, October 17, 2001 - 9:06 pm UTC
Just to follow up on idle and non-idle conditions. I understand the difference, however you said that the log posting waits can include both idle and non-idle conditions and can therefore be misleading. Does this mean that I'm going to get log file sync and log parallel write waits even if there is nothing wrong and it is just waiting for a checkpoint interval for example? Log file sync and log file parallel wait are showing up as the #2 and #3 wait events in one system I watch over which I think is peculiar. I usually seem them in the top 10 or so but never right at the top.
October 17, 2001 - 9:37 pm UTC
IO done is the idle/non-idle event. Log file Parallel Write -- not an idle/non-idle event.
Log stuff -- real time waits.
IO done, maybe real time, maybe not.
What about PX DEQ waits
DGR, October 03, 2002 - 1:44 pm UTC
I can't find ANY documentation on these, but they keep showing up on my reports and people are asking me what can be done about them.
I know some of these must be meaningless (when looking for problems), but can all of them be ignored?
The two I see most often are Execute Reply and Credit:send blkd.
I realize that 2 parallel operations can take different amounts of time to finish their tasks and that some waits cannot be avoided (or even considered bad), I just don't know which ones.
I've seen lot's of questions on the internet about these, but no answers. Can you tell if any of the PX Deq waits are interesting or significant/correctable?
Thanks.
October 03, 2002 - 6:30 pm UTC
search for
px deq
on this site. Ignore this article, read the other (currently) two.
SQL*Net message from client waited time
vinodhps, May 07, 2003 - 8:32 am UTC
Hi... Tom,
In our production system , while executing one procedure it take very long time to execute it when i run the query individualy it run fast.. so i traced the session and found that my SQL*Net message from client waited that much time... Here what will the exact reason for this Tom.
SQL> select * from v$session_event where sid=88;
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 18480 36.69 88.65 0 0 0 0
Fetch 18480 8.77 20.28 0 73920 0 18480
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 36961 45.46 108.96 0 73920 0 18480
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 324 (recursive depth: 1)
SID EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
--------- ------------------------------ ----------- -------------- ----------- ------------
MAX_WAIT
---------
3
88 SQL*Net more data to client 1 0 0 0
0
88 SQL*Net message from client 304 0 680756 2239.3289
382821
SQL> select * from v$session_wait where sid=88;
SID SEQ# EVENT
--------- --------- ------------------------------
P1TEXT P1 P1RAW
---------------------------------------------------------------- --------- --------
P2TEXT P2 P2RAW
---------------------------------------------------------------- --------- --------
P3TEXT P3 P3RAW WAIT_TIME
---------------------------------------------------------------- --------- -------- ---------
SECONDS_IN_WAIT STATE
--------------- -------------------
88 2132 SQL*Net message from client
driver id 675562835 28444553
#bytes 1 00000001
0 00 0
136 WAITING
Thanks in advance and Your doing great job tom........
May 07, 2003 - 8:49 am UTC
sqlnet message from client means .....
we were sitting idle waiting for the client to bug us to do something. meaning, the slowdown -- well, it is in the client, it is not asking us to do any work.
now, looking at that tkprof -- that "whatever" statement you have the report for runs instantly -- less then 5/1000ths of a second by the wall clock.
However, you do it lots and anytime you do something lots, it ADDS UP.
I'm always suspicious when I see statements being executed 10's of thousands of times. It leads me to believe the coder only knew how to do row at a time processing, and doesn't do any bulk SET oriented operations.
That is like the difference between going to the library and asking for War and Peace a page at a time -- making the librarian run back and forth and back and forth
versus
Asking for War and Peace -- one bulk operation....
I think the problem lies in the logic of the client.
SQL*Net message from client waited time
vinodhps, May 07, 2003 - 1:13 pm UTC
Thanks for your immediate responce tom, well my query goes like this below.
Can you please tell me with an example about can librarian run page for "WAR" & "PEACE"
SELECT
SUBSTR (a.prd_cd,1,1) PRD_GRP,
SUBSTR (a.prd_cd, 2, 3) variety_cd,
SUBSTR (a.prd_cd, 5, 1) std_ret_ind,
TO_NUMBER (SUBSTR (a.prd_cd,6,3)) gsm,
TO_NUMBER (SUBSTR (a.prd_cd, 9, 4)) / 10 width,
TO_NUMBER (SUBSTR (a.prd_cd, 13, 4)) / 10 length,a.mchn_no,
(SUM(NVL(a.st,0)) - NVL(b.pend_qty,0)) qty,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.dscnt_cd,
DECODE(c.variety_cd, 'ALL', c.dscnt_cd,NULL)) dscnt_cd,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.oth_dscnt,
DECODE(c.variety_cd, 'ALL', c.oth_dscnt,NULL)) oth_dscnt,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.remarks,
DECODE(c.variety_cd, 'ALL', c.remarks,NULL)) remarks
FROM (
SELECT prd_cd, mchn_no,
Func_mks_gdw_ret_stock (prd_cd,DECODE ( MCHN_NO , 1, 1, 2,2),
DECODE ( MCHN_NO , 1, st_date, 2, st_date), 1) st
FROM (
SELECT prd_cd, mchn_no, MAX (stock_date) st_date
FROM t_gdw_daily_stock
WHERE stock_date >= '1-APR-2001'
AND (DECODE (SUBSTR (prd_cd, 13, 4), '0000', 'R', 'S') = 'B'
OR 'B' = 'B')
AND (SUBSTR(prd_cd,2,3) = 'CRW' OR 'CRW' = 'ALL')
AND ( (TO_NUMBER(SUBSTR(prd_cd,6,3)) BETWEEN 999 AND 999)
OR (999 = 999 AND 999 = 999 ))
AND ( (SUBSTR(prd_cd,5,1)= 'B' ) OR ('B' = 'B'))
AND (mchn_no = 0 OR 0 = 0)
GROUP BY prd_cd,mchn_no
)
) a,
(
SELECT prd_cd, SUM (NVL(accpt_qty,0) - NVL (dsp_qty, 0)) pend_qty
FROM t_ord_prd_dtls
WHERE ROWID IN
(
SELECT /*+ index_ffs(T_ORD_PRD_DTLS CS_ORD_PRD_DTL_PK) */ rowid
FROM t_ord_prd_dtls
WHERE prd_status = 'N'
AND (accpt_qty - NVL(dsp_qty,0)) >= .2
AND (DECODE (SUBSTR (prd_cd, 13, 4), '0000', 'R', 'S') = 'B' OR 'B'= 'B')
AND (SUBSTR(prd_cd,2,3) = 'CRW' OR 'CRW' = 'ALL')
AND ( (TO_NUMBER(SUBSTR(prd_cd,6,3)) BETWEEN 999 AND 999)
OR (999 = 999 AND 999 = 999 ))
AND ( (SUBSTR(prd_cd,5,1)= 'B' ) OR ('B' ='B'))
AND (mchn_no = 0 OR 0 = 0)
)
GROUP BY prd_cd
) b,
(
SELECT a.mchn_no, a.variety_cd, a.from_gsm, a.to_gsm, a.from_width,
a.to_width, a.from_length, a.to_length, a.dscnt_cd,
a.oth_dscnt, a.std_ret_ind, a.remarks, b.amt
FROM t_mks_stock_lot_mst a,
t_mks_gen_dscnt_mst b
WHERE a.DSCNT_CD = b.DSCNT_CD
AND TRUNC(SYSDATE) BETWEEN a.FROM_DATE AND a.TO_DATE
) c
WHERE a.prd_cd = b.prd_cd
AND (( TO_NUMBER(SUBSTR(a.prd_cd,7,2)) BETWEEN c.from_gsm AND c.to_gsm)
OR (TO_NUMBER(SUBSTR(a.prd_cd,7,2)) = TO_NUMBER(SUBSTR(a.prd_cd,7,2))))
AND (( TO_NUMBER(SUBSTR(a.prd_cd,9,4))/10 BETWEEN c.from_width AND c.to_width )
OR ( TO_NUMBER(SUBSTR(a.prd_cd,9,4))/10 = TO_NUMBER(SUBSTR(a.prd_cd,9,4))/10) )
AND ((TO_NUMBER(SUBSTR(a.prd_cd,13,4))/10 BETWEEN c.from_length AND c.to_length)
OR ( TO_NUMBER(SUBSTR(a.prd_cd,13,4))/10 = TO_NUMBER(SUBSTR(a.prd_cd,13,4))/10))
AND (SUBSTR(a.prd_cd,2,3) = c.variety_cd
OR 'ALL' = c.variety_cd
OR SUBSTR(a.prd_cd,2,3) =SUBSTR(a.prd_cd,2,3))
AND (SUBSTR(a.prd_cd,5,1) = c.std_ret_ind
OR 'B' = c.std_ret_ind
OR SUBSTR(a.prd_cd,5,1) = SUBSTR(a.prd_cd,5,1))
AND (a.mchn_no = c.mchn_no
OR c.mchn_no = 0
OR a.mchn_no = a.mchn_no )
GROUP BY
SUBSTR (a.prd_cd,1,1) ,
SUBSTR (a.prd_cd,2,3) ,
SUBSTR (a.prd_cd,5,1) ,
TO_NUMBER (SUBSTR (a.prd_cd,6,3)) ,
TO_NUMBER (SUBSTR (a.prd_cd, 9, 4)) / 10 ,
TO_NUMBER (SUBSTR (a.prd_cd, 13, 4)) / 10 , a.mchn_no, B.PEND_QTY,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.dscnt_cd,
DECODE(c.variety_cd, 'ALL', c.dscnt_cd,NULL)) ,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.oth_dscnt,
DECODE(c.variety_cd, 'ALL', c.oth_dscnt,NULL)) ,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.remarks,
DECODE(c.variety_cd, 'ALL', c.remarks,NULL))
HAVING (SUM(NVL(a.st,0)) - NVL(b.pend_qty,0)) > 0
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.04 0 0 0 0
Execute 1 0.00 0.02 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.06 0 0 0 0
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 324
********************************************************************************
SELECT (NVL(OPNG_STOCK,0) + NVL(RCPT_STOCK,0) - NVL(ISS_STOCK,0) -
NVL(RET_STOCK,0) )
FROM
T_GDW_DAILY_STOCK WHERE STOCK_DATE = :b1 AND PRD_CD = :b2 AND MCHN_NO =
:b3
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 18480 36.69 88.65 0 0 0 0
Fetch 18480 8.77 20.28 0 73920 0 18480
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 36961 45.46 108.96 0 73920 0 18480
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 324 (recursive depth: 1)
Thanks in Advance Tom,
May 08, 2003 - 9:03 am UTC
look at your logic
try to figure out "well, hey, why do I run this query 18,480 times. PERHAPS I am missing an opportunity to join"
that is, instead of codeing like this:
for x in ( query1 )
loop
run query2 with inputs from query1
process
just do this
for x in ( query1 joined to query2 )
loop
process.
vinodhps, May 08, 2003 - 11:38 am UTC
Hi Tom,
That was great tom, and iam very happy with your responce
This time i got 2 different senario
1)In this table i got 35,000 rows and 316 block appox~ 75 avg row len 8KB block size but here why it takes these much of time..(wait time)
Here using 10046 i traced the session
SELECT SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'EI', 1, 'DN', 1, 'IC', 1, -1) * out_amt),
SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'WI', 1, 0))
FROM t_fas_sale_awa
WHERE br_code = br_code
AND doc_type IN ('FI', 'BI', 'EI', 'DN', 'IC', 'CN', 'ADV', 'UMR')
AND doc_no = doc_no
AND prd_group_ind = DECODE (:b5, 'B', prd_group_ind, :b5)
AND indr_code = DECODE (:b4, 'ALL', indr_code, :b4)
AND cons_cd = DECODE (:b3, 'ALL', cons_cd, :b3)
AND :b2 - doc_date > :b1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 2573 1.03 1.03 0 0 0 0
Fetch 2573 303.79 817.92 183000 975167 0 2573
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5146 304.82 818.95 183000 975167 0 2573
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 21 (recursive depth: 2)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 1680 0.11 10.21
db file scattered read 19032 0.20 503.57
********************************************************************************
2) At same session the overall value make me worry about the procedure
SQL> @t1
Query Plan
--------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=19 Rows=2332 Bytes=16324
TABLE ACCESS FULL T_MKS_CONS_MST [ANALYZED]
/*
select cons_cd, func_fas_chk_custind_avail_bal ('P', 'ALL', cons_cd, sysdate,
60, 'Y', 'B') ans
from t_mks_cons_mst
where cons_type = 'C';
*/
CREATE OR REPLACE
FUNCTION func_fas_chk_custind_avail_bal (f_prd_group_ind IN VARCHAR2,
f_indr_code IN VARCHAR2,
f_cust_code IN VARCHAR2,
f_as_on_date IN DATE,
f_days IN NUMBER,
f_is_2give_net_bal IN VARCHAR2,
f_chk_what IN VARCHAR2)
RETURN NUMBER AS
v_awa_bal NUMBER := 0;
v_pending_qty_bal NUMBER := 0;
v_os_inv_cnt NUMBER := 0;
v_ret_val NUMBER := 0;
v_limit NUMBER := 0;
v_days NUMBER := f_days;
BEGIN
BEGIN
IF f_is_2give_net_bal = 'Y' THEN
IF f_cust_code <> 'ALL' THEN
SELECT cred_lmt, credit_days
INTO v_limit, v_days
FROM t_mks_cons_cred_lmt_mst
WHERE cons_cd = f_cust_code
AND f_as_on_date BETWEEN from_date AND NVL (to_date, f_as_on_date);
ELSE
IF f_indr_code <> 'ALL' THEN
SELECT cred_lmt, credit_days
INTO v_limit, v_days
FROM t_mks_ind_cred_lmt_mst
WHERE ind_cd = f_indr_code
AND f_as_on_date BETWEEN from_date AND NVL (to_date, f_as_on_date);
END IF;
END IF;
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN
v_limit := 0; -- temp val so that output will fail
v_days := 0;
END;
IF f_chk_what IN ('A', 'B') THEN
SELECT SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'EI', 1, 'DN', 1, 'IC', 1, -1) * out_amt),
SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'WI', 1, 0))
INTO v_awa_bal, v_os_inv_cnt
FROM t_fas_sale_awa
WHERE br_code = br_code
AND doc_type IN ('FI', 'BI', 'EI', 'DN', 'IC', 'CN', 'ADV', 'UMR')
AND doc_no = doc_no
AND prd_group_ind = DECODE (f_prd_group_ind, 'B', prd_group_ind, f_prd_group_ind)
AND indr_code = DECODE (f_indr_code, 'ALL', indr_code, f_indr_code)
AND cons_cd = DECODE (f_cust_code, 'ALL', cons_cd, f_cust_code)
AND f_as_on_date - doc_date > v_days;
END IF;
IF f_chk_what IN ('O', 'B') THEN
IF f_cust_code <> 'ALL' THEN
SELECT NVL (pending_order_value, 0)
INTO v_pending_qty_bal
FROM t_mks_cons_mst
WHERE cons_cd = f_cust_code;
ELSE
IF f_indr_code <> 'ALL' THEN
SELECT NVL (pending_order_value, 0)
INTO v_pending_qty_bal
FROM t_mks_ind_mst
WHERE ind_cd = f_indr_code;
END IF;
END IF;
END IF;
v_ret_val := v_awa_bal + v_pending_qty_bal;
v_ret_val := v_ret_val - v_limit;
IF v_ret_val > 0 AND v_os_inv_cnt > 0 AND f_cust_code <> 'ALL' THEN
v_ret_val := v_ret_val * -1;
END IF;
RETURN (v_ret_val);
END func_fas_chk_custind_avail_bal;
/
May 09, 2003 - 11:48 am UTC
well that is about as ugly as it gets. sorry -- you'll want to relook at the logic, see if you can tighten it up, see if you can move things around. It is a bummer when the data you need is strewn throughout your database sort of all over the place. Makes it really hard to piece back together.
"WAR" & "PEACE"
vinodhps, May 08, 2003 - 12:17 pm UTC
Hi tom ,
With Your Referance for
just do this
for x in ( query1 joined to query2 )
loop
process.
Actually i would like to tell what we are doing is..
we got 6696 consinee for each person we are doing that process how do i join query one with query two.
so do i have to process 6696 times ...
can you clear my doubt Tom..
Thanks In Advance.
SQL*Net message from Client - more on
A reader, August 16, 2003 - 9:11 am UTC
Hi Tom,
I run a query from SQL*Plus (connected directly - no connect string passed). I have a very unpredictable and high wait against SQL*Net message from client, when I see the SQL Trace. Why is that when a query is submitted a non-idle event SQL*Net message from client kicks into? Please tell me more on this event. This is the only event that is making the difference to the query from a minute to a sec.
Thanks
August 16, 2003 - 9:12 am UTC
need
a) example
b) how you are computing waits
c) versions
d) platforms
e) lots of stuff so I could make an educated guess...
How to get v$mystats contect from other session
A reader, January 21, 2005 - 4:44 am UTC
Hi Tom, is there a way to get the content I can see in the view v$mystats in my session from other session, please
Thank your Tom
January 21, 2005 - 8:26 am UTC
you use v$sesstat instead, it has them all.
A reader, January 21, 2005 - 1:27 pm UTC
Thanks Tom
Is there a view showing all stats per class
A reader, January 24, 2005 - 2:08 pm UTC
Hi Tom in 10g
Sorry, is there a view showing only all stats per class
I can't find.
Thanks.
January 24, 2005 - 3:38 pm UTC
the "stats by class" are in v$sysstat
A reader, January 24, 2005 - 5:48 pm UTC
Thanks Tom, I found them in v$event_name
Please clarify
A reader, January 25, 2005 - 4:44 pm UTC
Tom,
In the above discussion, you mentioned
<quote>
now, looking at that tkprof -- that "whatever" statement you have the report for
runs instantly -- less then 5/1000ths of a second by the wall clock.
However, you do it lots and anytime you do something lots, it ADDS UP.
I'm always suspicious when I see statements being executed 10's of thousands of
times. It leads me to believe the coder only knew how to do row at a time
processing, and doesn't do any bulk SET oriented operations.
That is like the difference between going to the library and asking for War and
Peace a page at a time -- making the librarian run back and forth and back and
forth
</quote>
1. Where can you find the information you mentioned: "from the tkprof report, it runs runs instantly -- less then 5/1000ths of a second by the wall clock."
2. Is there a way to differentiate the "SQL*Net message from client" in a situation where a client say SQL*Plus, just logs into SQL*Plus without doing nothing and the clock ticking up vs. the scenario where a batch process runs and the server waits for the client asking it to do something
3. What is the meaning of the following wait events :
1. SQL*Net message from client
2. SQL*Net message to client
3. SQL*Net more data from client
4. SQL*Net more data to client
I read the descriptions of Oracle manuals and found very difficult to understand and interpret.
5. What can we infer if SQL*Net message from client is huge and we're running a batch-job as you had mentioned.
Your help is greatly appreciated
January 25, 2005 - 7:09 pm UTC
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 18480 36.69 88.65 0 0 0 0
Fetch 18480 8.77 20.28 0 73920 0 18480
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 36961 45.46 108.96 0 73920 0 18480
36.69/18480 = 0.001985 cpu seconds/execution
88.65/18480 = 0.004797 wall clock seconds/execution.
the query ran really really fast. They just ran it over and over and over.
2) you would "know" -- in a batch program -- you know "this is batch, therefore sqlnet message from client is not an idle wait event.
or "these 5 queries are executed in rapid succession in our application -- any sqlnet message from client waits during their execution (trace file contains this raw data) indicates client isn't processing data nearly as fast as the database is throwing it at it.
3) 1 - database waiting to be told to do something
2 - database is sending message to client but having to wait (slow link to
client perhaps)
3 - waiting on read from client -- they've started talking to us, but
it is taking a long time to get the entire message
4 - same as #2 but the n'th send in a series of sends.
4) I like 4, it was easy to answer :)
5) make the client process data faster "typically" (that is usually the cause, the client is taking a long time to process the data, hence the database is sitting there "waiting")
if the client is going at light speed, split the client up -- implement "do it yourself parallel processing" -- run N copies of your client at the same time.
Typo!!!
A reader, January 25, 2005 - 10:25 pm UTC
Hi Tom,
Thanks much for your answer. However, for question 5 above, in your last paragraph you mention "if the client is going at light speed...", however after reading the discussion above it I understand it should be "if the client is NOT going at light speed...". Please advice if it is a typo or my understanding is not correct.
January 26, 2005 - 8:24 am UTC
it was not a typo. I was saying "if the bottleneck is in the client, but the client is already going light speed (as fast as it can), the only way to remove the bottleneck is by adding more of the same clients -- all working at light speed, but getting the job done faster since there are more of them"
RE : Bulk collect
A reader, January 26, 2005 - 9:20 am UTC
Tom,
I have understood the significance of SQL*Net message from client. However, I have the following question regarding "SQL*Net message from client".
1. "SQL*Net message from client" event will be applicable only
A. If the client and server are on different machines and the client uses Net8/SQL*Net for communicating with server
B. If the client and server are on same machine but the client uses Net8/SQL*Net using the protocol such as TCP/IP to communicate with the server instead of using the IPC communication.
Is my understanding correct?
2. When the client say a PL/SQL or Pro*C application does a bulk fetch of 100 rows at a time, the server process would bring back 100 rows at a time to the client. The client process would then these 100 records 1 at a time.
In that case what would happen to SQL*Net message from client event? Is it not true that the "SQL*Net message from client" would keep ticking (since client is doing it's own processing and the server is waiting for the client to give some work). Please advise if I am not correct. If my understanding is correct, how would we identify a difference between a true bottleneck and a bulk operation?
3. Also, if the client and server are running on the same box and if the client uses Net8/SQL*Net connection to connect to the server rather than IPC, how would the performance compare? What would happen to "SQL*Net message from client" event in these cases.
Sorry that I get these kind of doubts as and when I understand more about how these wait events work.
As always, your help is highly appreciate to clear the confusions I have
January 26, 2005 - 9:35 am UTC
1) is is applicable in all cases. it is the wait registered by the server under all protocols when the server is just sitting there "waiting"
2) anytime the server isn't being asked to do something, it is ticking away at this wait event.
how do you recognize the difference between a true bottleneck and a bulk operation? first you have to tell me what the difference is! You look at the accrued wait time of say "15 seconds" encountered during the fetch of those rows. Now, you tell me (with your domain knowledge of your process -- something you and only you has...) is 15 seconds
a) good
b) bad
c) just ok
if 15 seconds of server idle time while the client processed the records is a) good, then so be it. You and only you can make that determination. Perhaps that was 15 seconds but we processed 1 billion records. that is probably good.
But if we processed 101 records, it is probably (PROBABLY) bad. but again, only you can make that determination.
3) ipc gives the same metrics.
Unit of measure of statitics
A reader, January 28, 2005 - 4:13 am UTC
Hi Tom I would like to know which is the unit of mewasure for al statistics in mystats view, is this possible, or the only unit of measure is blocks.
Thank you
January 28, 2005 - 7:28 am UTC
A reader, January 28, 2005 - 10:09 am UTC
Thanks Tom I was searching for something like another view to join, and get the unit of measure :)
Lost time.
Sai, January 28, 2005 - 6:35 pm UTC
Tom,
We are using Oracle 8.1.7.4 64 bit and SAN storage. Here, the total time waited by each background process from v$session_event not matching with the database up time.
15:03:22 SQL> select sum(time_waited)/(100*3600) from v$session_event where sid=4;
SUM(TIME_WAITED)/(100*3600)
---------------------------
36.6146056
1* select (sysdate-startup_time)*24 from v$instance where 4=4
15:04:53 SQL> /
(SYSDATE-STARTUP_TIME)*24
-------------------------
58.935
15:04:53 SQL> select sum(time_waited)/(100*3600) from v$session_event where sid=2;
SUM(TIME_WAITED)/(100*3600)
---------------------------
56.1558333
15:07:25 SQL> c/2/3
1* select sum(time_waited)/(100*3600) from v$session_event where sid=3
15:10:27 SQL> /
SUM(TIME_WAITED)/(100*3600)
---------------------------
55.2067444
There Sid's 2 and 3 are the DB writers and Sid 4 is the Log writer.
can you please explain why the timings didnt match?
Thanks
January 28, 2005 - 7:31 pm UTC
why do you think they should match even a little bit?
Re: Lost time.
Sai, January 28, 2005 - 9:03 pm UTC
Tom,
My understanding is that all Oracle background processes record their time spent on each waitevent in v$session_event, including the idle waitevents like 'rdbms%' or '%timer', hence I thought the times should match.
Othewise, where is that time spent, which is not recorded in v$session_event?
Thanks
January 29, 2005 - 8:26 am UTC
it is spent "not waiting on something"
If they were always waiting "on something", your database would be "frozen" :)
Pls. clarify
A reader, February 03, 2005 - 9:21 am UTC
Hi Tom,
In the discussion above, you mentioned the following :
<Quote>
2) you would "know" -- in a batch program -- you know "this is batch, therefore
sqlnet message from client is not an idle wait event.
or "these 5 queries are executed in rapid succession in our application -- any
sqlnet message from client waits during their execution (trace file contains
this raw data) indicates client isn't processing data nearly as fast as the
database is throwing it at it.
</Quote>
1. In the above case, when the database throws data to the client which the client could not process as fast as it is received from database, you will see some "SQL*Net message to client" wait event also when the database waits for the message to be sent to client. Is it correct?
2. How does the server process know that the client has received all of the data it sent to the client and stop waiting on "SQL*Net message to client". Will there be a hand-shake between client and server?
3. For all "SQL*Net message from client", will there be "SQL*Net message to client" wait event? If not, what are the situations under which the database will send data to the client and what are the situations under which the "SQL*Net message to client" wait event seen?
4. How do we know from SQL*Net message from client wait event that the bottle-neck is in the client? By this I mean, the database would have sent the data to the client back, but there may be network problem/latency that makes client not to receive all of the data before it can process them, so the server may be waiting on "SQL*Net message from client". In the above case will it reflect in "SQL*Net message to client" wait event also in which case we can conclude this is a network latency?
I would appreciate if you could give me an example of how the client/server communication works and how the "SQL*Net message ..." wait event works. If you could give a step-by-step scenario of how the communication takes place, it would be really helpful in understanding how these wait events work together?
February 03, 2005 - 2:24 pm UTC
1) if the data base sends data to the client and the client doesn't tell database to do something - sqlnet message from client starts to add up, increment. yes.
2) we sent the last bytes, we are done. now we are waiting. no handshake needed, we finished our send. so sqlnet message from client could be due to
a) client being slow
b) network buffering a ton of data and sending it really slow
3) maybe, maybe not. we report waits when and if we wait, not if we don't have to wait for something
4) you check out things, look at the network, look at the code in the client. if the client is instruemented (fat chance, people just never ever seem to do that :( ) that would be very useful, you could see timings from it.
database waiting for client to say something - sqlnet message from client+++++
client says something, sqlnet message from client written out to trace
it is as simple as it sounds really. while waiting, wait increments, when done waiting, wait is written out to trace file
Some doubts
A reader, February 03, 2005 - 3:30 pm UTC
Tom,
You mentioned in the above thread that when the server sends the last byte to the client, then the server process has done it's job and no hand-shanking needed. So under what circumstances will we see server waiting on "SQL*Net message to client". So my understanding on this is as follows :
Assume process C is client and process S is the server. A typical client/server communication looks like this :
0. S is waiting on "SQL*Net message from client" and the count keeps ticking
1. C sends a message to S
2. A "SQL*Net message from client" event is recorded in trace file
3. S processes and sends the results back to client.
4. After step 3 is complete, S starts waiting again for
"SQL*Net message from client" and increments the counter
5. C (client) processes the data that S sent and sends data/another request to S. Another "SQL*Net message from client" event is written to the trace file.
Questions :
Qn #1. Is the above sequence of steps I mentioned correct? Please advise if this is not correct.
Qn #2. After step 3, a "SQL*Net message to client" event is recorded in trace file if S waits. Correct? If so, under what circumstances will that
happen (because you have mentioned in the above thread that no hand-shaking is needed and S is done once it sends the last byte).
Qn #3. If the time difference between when S sends data to client and starts waiting (step 3 & 4) and when S gets another data/request
from client C is large (step 5), then it can mean :
a) client C is slow
b) network buffering a ton of data and sending it to client C really slow
Is that correct?
Qn #4. Will "SQL*Net message from client" count be reset to 0 each time the server S sends data back to the client and waiting on client C
to send a new request
Qn #5. In one of the discussion threads, you mentioned that "CPU time" is one of the response time components that is written out to the
trace file once the call is completed and not as it happens. Also, I remember the same thing about "SQL*Net message from client". ie
it is recorded in the trace file only when client sends another request. Is it correct?
Qn #6. What are the other wait events that are in this category (mentioned in Qn #5).
As always, thanks much for your help to Oracle community
February 04, 2005 - 1:26 am UTC
that is the time it takes for the send to complete.
start timing
send(...)
stop timing, output wait for "to client"
start timeing
read( .....)
stop timing, output wait for "from client"
Pls. clarify
A reader, February 06, 2005 - 9:04 pm UTC
Hi Tom,
In your book "Expert one-on-one Oracle", you provide an example in chapter 10 that captures the session event once before a PL/SQL block (that does EXECUTE IMMEDIATE in a for loop to insert into a table T. After the example, you mention about the description of SQL*Net message from client. You mentioned that "In most cases, this wait event can be ignored. If application has lots of think-time in it, this number will necessarily be high. In our case, since the SQL*Plus session should have been continously been feeding statements to the database during the time we measured therefore we expect a lower number there". The following are my questions. Please let me know if my understanding is correct. Otherwise, please explain me.
1. My understanding of "SQL*Net message from client" is that the server process keeps incrementing the time waited column until the client session is connected and until the client (SQL*Plus in this case) requests the server to do something. That is the reason you mention that the wait-time on this event will be high ? Correct?
2. When the client sends a request to the server, the wait is written out to the trace file and the counts are reset and the server starts waiting again. Is that the reason, you mention that "you expect a lower number there" ?
3. If the client is extremely slow, then there will be FEWER NUMBER of WAITS (TOTAL_WAITS), but the TIME_WAITED between consecutive waits will be high. This is because the wait count gets incremented each time the server begins a new wait, but the time waited between the successive waits are high indicating a slow network or an inefficient client process or client process doing lot of work and no problem exists in the database. Right?
4. When a session connects to the database, but remains idle, the WAIT count (TOTAL_WAITS) will be 1, but the TIME_WAITED keeps incrementing. Right?
5. In that example, although you loop through 10000 times, is there any reason why the total_waits is only 4 and the time_waited is only 14. My understanding is that total_waits keeps incrementing each time the server process begins a new wait (so it should be 10000 in this case) and the time_waited will be reset each time the client sends a request to the server (in this case it will send 10000 requests). Is that correct ?
6. Also, in that chapter on page 467, you mention that "... event was 8.18 seconds. This just means I waited 8.18 seconds after sending the ALTER SESSION command to the database to send next command....".
In this context, I refers to the database server process.
Correct?
Why is the "SQL*Net message from client" for the ALTER SESSION command not in the trace file?
Why is the ela = 0 for the subsequent "SQL*Net message from client" message (which appears after the EXEC#3 line)
When a client issues a SQL command, is the following the correct sequence that you could see in the trace file?
-- client sent a request. So print "SQL*Net from client"
WAIT #n "SQL*Net message from client" ela = xxx
PARSING IN CURSOR #n......
<SQL statement here>
WAIT #n "<If any non-idle waits associated with parse>"
PARSE #n c=xxx e=zzz ....."
WAIT #n "<If any non-idle waits associated with exec>"
EXEC #n ....
WAIT #n "<If any non-idle waits associated with fetch>"
FETCH #n ....
WAIT #n "SQL*Net message to client....."
-- wait for another request
WAIT #n "SQL*Net message from client...."
I would appreciate if you could confirm my understanding on the wait events. If not please help me.
February 07, 2005 - 4:16 am UTC
these are looking very familar - i feel like I've answered some of them before.....
1) the message from client increments as long as the server process is sitting there waiting and waiting for the client to say something. If you log into sqlplus in the morning -- do nothing all day, and then before you leave home -- issue "select * from dual", that would have generated a days worth of "message from client" wait time prior to the select....
2) as long as we keep the server busy -- it cannot be waiting for us, hence that is why I would expect a low number -- because we are keeping it busy. yes, when we send a request to the server, it writes out the wait, resets the counters, does whatever we told it to and then waits for the next request. if the next request is there already (we are keeping it busy) then there would not be a wait on the message from client.
3) in this case yes, since the message from client is "saved up" until we send it a message. one big wait with lots of time in it.
4) v$session wait will show the increasing time waited on this event, yes.
5) I did multiple statements there -- we "saved" the before waits via an insert, did a couple of statements. that is why you see multiple waits there.
6) No, "I" refered to "me, tom". I waited 8 seconds after sending the alter to send the next command.
if there was no wait, that means it didn't wait -- there was no message from client to be recorded (think about it, the alter turned on tracing, prior to turning on tracing, we are not recording wait times, there was nothing to report on! we only trace when we trace)
How to simulate
A reader, February 07, 2005 - 6:06 pm UTC
Hi Tom,
I want to simulate a situation where there is inefficiency in the client application. For that I wrote the following PL/SQL block.
-- Capture the initial statistics
drop table prev_sess_wait
/
create table prev_sess_wait
as select sid,event,total_waits,time_waited
from v$session_event
where sid = '87'
and event = 'SQL*Net message from client'
/
-- Issue SELECT stmt to generate 'from client' message
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
-- Simulate a scenario to show inefficiency in client
-- application (tight loop with a huge loop index)
declare
j number;
begin
for j in 1..1000000000
loop
null;
end loop;
end;
/
-- Issue SELECT stmt to generate 'from client' message
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
-- Get the delta.
select a.event,
b.total_waits - a.total_waits,
b.time_waited - a.time_waited
from v$session_event b,
prev_sess_wait a
where a.sid = b.sid
and a.event = b.event
/
I was trying to simulate a scenario where there is inefficient client/application code and to test my understanding of SQL*Net message from client based on your feedback. I was also hoping that there would be a huge difference in TIME_WAITED between the first and second select statements (because of the tight loop in between that loops for a large number of iterations). But the output I get always prints a lower value for time_waited column.
I am logging into the server where Oracle is running and executing a SQL*Plus session.
I am not sure what I am doing wrong and is there a way to simulate this.
Please help.
Thanks
February 08, 2005 - 1:20 am UTC
then you would need to use a client - not plsql running in the server!
you'll have no sqlnet waits with the above script.
you would do something like:
select * from dual;
pause
select * from dual;
the pause, that is the inefficient client.
Help please
A reader, February 08, 2005 - 11:09 am UTC
Tom,
Thanks much for your previous response. I realized what I was doing wrong there. However, I did the following :
From my TOAD client session (SID = 30), I ran the following :
insert into begin_sess_wait
select sid,event,total_waits,time_waited
from v$session_event
where sid = '30'
and event = 'SQL*Net message from client'
/
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
-- To simulate a PAUSE
declare
j integer;
i integer;
begin
for j in 1..1000000
loop
for i in 1..1000
loop
null;
end loop;
end loop;
end;
/
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
insert into end_sess_wait
select a.sid,
a.event,
b.total_waits - a.total_waits,
b.time_waited - a.time_waited
from v$session_event b,
begin_sess_wait a
where a.sid = b.sid
and a.event = b.event
/
-- So an other session can see the wait times
commit
/
After running the above block of code, I logged into another session and did :
select * from end_sess_wait
hoping to get a big number (because of tight loop), but still got only low numbers.
Also, please let me know if the tight loop will be run on the server or on my client?
Please advise
February 09, 2005 - 1:25 am UTC
your pause similuation is NOT A PAUSE, it is busy work on the server, the server is not paused.
the plsql is run in the server.
Waits from TOAD DBA/Server Statistics/Waits
Ma$e, February 09, 2005 - 10:49 am UTC
Hi Tom,
I see the folllowing waits being reported on my d/b. Can you please point me in the right direction what I should be addressing first or what would give me the best roi ?
What numbers are of important? total_wait or time_waited or timeouts.
(sorry about the formatting)
Thanks
System Event total_wait timeouts Time_waited
rdbms ipc message 56898972 36120150 33668096.42
log file parallel write 14737457 14683699 38703.95
pmon timer 1723112 1722480 4991993.55
latch free 1640076 1283451 11745.19
jobq slave wait 338267 322892 1002667.58
enqueue 129245 87978 285329.53
smon timer 16726 16611 4818615.78
log file sync 12798192 7949 128665.68
PX Idle Wait 1953 1735 3545.33
library cache pin 11580 753 3005.81
buffer deadlock 126 126 0
PX Deq: Execute Reply 1612 120 592.35
kksfbc child completion 112 110 1.68
LGWR wait for redo copy 66536 101 13.85
buffer busy waits 5780368 89 38665.97
log buffer space 36496 84 1774.5
PX Deq: Execution Msg 2407 72 193.41
PX Deq: Signal ACK 161 70 5.14
library cache load lock 112 25 123.64
PX Deq: Table Q Normal 850458 12 725.86
row cache lock 54 12 43.71
log file switch completion 29408 7 1246.3
log file switch (checkpoint incomplete) 15 3 4.87
process startup 15354 2 1308.42
rdbms ipc reply 6160 2 89.36
control file heartbeat 1 1 4.01
library cache lock 8 1 3.17
sort segment request 1 1 1.01
inactive session 1 1 1
slave TJ process wait 1 1 0.01
SQL*Net message from client 4639248215 0 285400827.1
db file scattered read 897071974 0 2363449.05
db file sequential read 975636541 0 1040653.1
sbtwrite 22616707 0 119158.44
sbtbackup 492 0 72716.73
async disk IO 9096833 0 43924.3
db file parallel write 849749 0 35688.79
SQL*Net more data to client 157299429 0 20019.08
SQL*Net message to client 4639248348 0 10631.92
sbtclose2 476 0 10581.06
SQL*Net break/reset to client 114953 0 10059.36
db file parallel read 428253 0 9557.09
sbtinfo2 469 0 9330.47
direct path read 3938064 0 8320.02
control file parallel write 1968466 0 7518.8
direct path write 711456 0 4233.8
log file sequential read 185980 0 4085.61
SQL*Net message from dblink 1506 0 1441.8
SQL*Net more data from client 8183157 0 969.79
control file sequential read 2781538 0 869.75
sbtrestore 2 0 526.76
PX Deq Credit: send blkd 136578 0 236.9
log file single write 47992 0 231.86
SQL*Net more data from dblink 63050 0 106.85
control file single write 1714 0 95.59
switch logfile command 1455 0 81.82
PX Deq Credit: need buffer 81146 0 77.03
sbtread2 1893 0 42.62
sbtremove2 5 0 27.07
sbtinit 449 0 23.4
db file single write 4626 0 6.05
PX Deq: Parse Reply 145 0 1.38
local write wait 32 0 0.8
sbtcommand 449 0 0.56
PX Deq: Join ACK 176 0 0.35
refresh controlfile command 22 0 0.05
reliable message 1 0 0.03
sbtend 449 0 0.02
SQL*Net break/reset to dblink 65 0 0.02
wait list latch free 1 0 0.02
PX Deq: Table Q Get Keys 2 0 0.01
SQL*Net message to dblink 1506 0 0
sbtinit2 449 0 0
sbterror 38 0 0
PX qref latch 8 0 0
PX Deq: Msg Fragment 3 0 0
instance state change 2 0 0
PX Deq: Table Q qref 1 0 0
February 09, 2005 - 2:48 pm UTC
just erase it.
sort of meaningless.
I mean -- over what period of time.
are you experiencing a performance issue.
who is waiting - end users (they care) batches (they care less) background processes - they care even less (pmon timer -- big deal)
Let me try again...formating the data
Ma$e, February 09, 2005 - 11:29 am UTC
rdbms ipc message 56898972 36120150 33668096.42
log file parallel write 14737457 14683699 38703.95
pmon timer 1723112 1722480 4991993.55
latch free 1640076 1283451 11745.19
jobq slave wait 338267 322892 1002667.58
enqueue 129245 87978 285329.53
smon timer 16726 16611 4818615.78
log file sync 12798192 7949 128665.68
PX Idle Wait 1953 1735 3545.33
library cache pin 11580 753 3005.81
buffer deadlock 126 126 0
PX Deq: Execute Reply 1612 120 592.35
kksfbc child completion 112 110 1.68
LGWR wait for redo copy 66536 101 13.85
buffer busy waits 5780368 89 38665.97
log buffer space 36496 84 1774.5
PX Deq: Execution Msg 2407 72 193.41
PX Deq: Signal ACK 161 70 5.14
library cache load lock 112 25 123.64
PX Deq: Table Q Normal 850458 12 725.86
row cache lock 54 12 43.71
log file switch completion 29408 7 1246.3
log file switch (checkpoint incomplete) 15 3 4.87
process startup 15354 2 1308.42
rdbms ipc reply 6160 2 89.36
control file heartbeat 1 1 4.01
library cache lock 8 1 3.17
sort segment request 1 1 1.01
inactive session 1 1 1
slave TJ process wait 1 1 0.01
SQL*Net message from client 4639248215 0 285400827.1
db file scattered read 897071974 0 2363449.05
db file sequential read 975636541 0 1040653.1
sbtwrite 22616707 0 119158.44
sbtbackup 492 0 72716.73
async disk IO 9096833 0 43924.3
db file parallel write 849749 0 35688.79
SQL*Net more data to client 157299429 0 20019.08
SQL*Net message to client 4639248348 0 10631.92
sbtclose2 476 0 10581.06
SQL*Net break/reset to client 114953 0 10059.36
db file parallel read 428253 0 9557.09
sbtinfo2 469 0 9330.47
direct path read 3938064 0 8320.02
control file parallel write 1968466 0 7518.8
direct path write 711456 0 4233.8
log file sequential read 185980 0 4085.61
SQL*Net message from dblink 1506 0 1441.8
SQL*Net more data from client 8183157 0 969.79
control file sequential read 2781538 0 869.75
sbtrestore 2 0 526.76
PX Deq Credit: send blkd 136578 0 236.9
log file single write 47992 0 231.86
SQL*Net more data from dblink 63050 0 106.85
control file single write 1714 0 95.59
switch logfile command 1455 0 81.82
PX Deq Credit: need buffer 81146 0 77.03
sbtread2 1893 0 42.62
sbtremove2 5 0 27.07
sbtinit 449 0 23.4
db file single write 4626 0 6.05
PX Deq: Parse Reply 145 0 1.38
local write wait 32 0 0.8
sbtcommand 449 0 0.56
PX Deq: Join ACK 176 0 0.35
refresh controlfile command 22 0 0.05
reliable message 1 0 0.03
sbtend 449 0 0.02
SQL*Net break/reset to dblink 65 0 0.02
wait list latch free 1 0 0.02
PX Deq: Table Q Get Keys 2 0 0.01
SQL*Net message to dblink 1506 0 0
sbtinit2 449 0 0
sbterror 38 0 0
PX qref latch 8 0 0
PX Deq: Msg Fragment 3 0 0
instance state change 2 0 0
PX Deq: Table Q qref 1 0 0
February 09, 2005 - 2:53 pm UTC
rm *
it is not very meaningful.
tkprof - useful
statspack over 15 minutes during a peak load *maybe* useful
a dump of the stats since the database is up, not very useful.
(lots of latch frees, you are using binds right.......)
True. I forgot to give you the elapsed time
Ma$e, February 14, 2005 - 11:20 am UTC
Hi Tom,
I guess a small oversight can lead to inconclusive results.
These stats were collected over several days, at most over a week.
We are having a generic performance problem and I was hoping that your trained eye might pick out something right away.
For example, do the ratios between certain metrics make sense (I'm not sure any specific ones)?
Also. What do I need to look for if I am trying to determine if I need to add freelists? The reason I am going down this path is that we have about 50 d/b connections all trying to insert into at least 4 tables? How do I determnine/demonstrate that I need to consider increasing free lists and if required what pre and post metrics do I need to collect to determine the benefits derived.
Thanks once again.
Ma$e
February 14, 2005 - 1:57 pm UTC
definition of ratio (my definition bear in mind):
ratio: 1) a mathematical device used to hide and obscure information, most useful to make non-relevant information appear to be really important looking. 2) a source of great confusion and panic
the only ratio I look at anymore is soft parse pct -- it should always be HIGH (like 99 high) after the system is up for some period of time.
freelist content could be observed by seeing high values for data block in v$waitstat over some short measured period of time. so, you would look for a reduction in this metric after an increase of freelists on the segments involved.
SQL*Net message from client
Dennis, August 23, 2005 - 2:24 am UTC
Hi Tom,
We are having performance issue with an user screen.
I have traced the module using sql trace & tkprof.
The stats are as below
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 271 0.31 0.44 0 0 0 0
Execute 272 0.08 0.22 0 0 0 0
Fetch 272 325.42 329.89 0 3064670 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 815 325.81 330.55 0 3064670 0 12
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 753
Rows Row Source Operation
------- ---------------------------------------------------
0 FILTER
103 TABLE ACCESS BY INDEX ROWID BMF
72614 INDEX RANGE SCAN (object id 134629)
102 INDEX RANGE SCAN (object id 534095)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message from client 1630 5.71 1459.53
SQL*Net message to client 1629 0.01 0.01
latch free 76 0.02 0.64
********************************************************************************
The wait event is showing event as SQL*Net message from client
1> Can this be attributed to the network or is it the processing time in the client front end
2> The sql when run from sql plus comes out in 1 secs even when run from client PC.
The execution plan is optimal, why does it access 3064670 buffers to fetch 12 rows.
3> Does the high buffer read and CPU time have anything to do with the wait event.
thanks,
Dennis
August 24, 2005 - 3:59 am UTC
103 TABLE ACCESS BY INDEX ROWID BMF
72614 INDEX RANGE SCAN (object id 134629)
I'd attribute it to that, you hit that index 72,614 times only to goto the table and find that a small 103 rows actually match the criteria.
Your index likely needs to have a couple of more columns at the "end" of it, so that hte index range scan doesn't return 72,614
Look at the number of LIO's you have there for a query that returns 12 small records!
the sqlnet message from client in this case likely means "you sat idle for a while before running the query"
this is an example of what I meant by adding more columns to the index, see the difference in work performed:
create table t1 ( x int primary key, data char(30) );
create table t2 ( x int, y int, data char(30) );
insert into t1 select rownum, username from all_users;
insert into t2 select mod(rownum,28)+1, rownum, owner from all_objects;
create index t2_idx1 on t2(x);
exec dbms_stats.gather_table_stats( user, 'T1', cascade=>true );
exec dbms_stats.gather_table_stats( user, 'T2', cascade=>true );
set autotrace traceonly
@trace
select /*+ first_rows */ * from t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100;
create index t2_idx2 on t2(x,y) compute statistics;
SELECT /*+ first_rows */ * from t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100;
set autotrace off
select /*+ first_rows */ *
from
t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5 0.09 0.13 0 2084 0 47
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.09 0.13 0 2084 0 47
Rows Row Source Operation
------- ---------------------------------------------------
47 TABLE ACCESS BY INDEX ROWID T2 (cr=2084 r=0 w=0 time=131534 us)
12036 NESTED LOOPS (cr=52 r=0 w=0 time=52634 us)
12 TABLE ACCESS BY INDEX ROWID T1 (cr=9 r=0 w=0 time=121 us)
12 INDEX RANGE SCAN SYS_C008981 (cr=5 r=0 w=0 time=62 us)(object id 41218)
12023 INDEX RANGE SCAN T2_IDX1 (cr=43 r=0 w=0 time=19555 us)(object id 41220)
********************************************************************************
SELECT /*+ first_rows */ *
from
t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5 0.01 0.00 11 34 0 47
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.01 0.00 11 34 0 47
Rows Row Source Operation
------- ---------------------------------------------------
47 TABLE ACCESS BY INDEX ROWID T2 (cr=34 r=11 w=0 time=1724 us)
60 NESTED LOOPS (cr=29 r=11 w=0 time=1478 us)
12 TABLE ACCESS BY INDEX ROWID T1 (cr=9 r=0 w=0 time=104 us)
12 INDEX RANGE SCAN SYS_C008981 (cr=5 r=0 w=0 time=56 us)(object id 41218)
47 INDEX RANGE SCAN T2_IDX2 (cr=20 r=11 w=0 time=1191 us)(object id 41221)
SQL*Net message from client
Dennis, August 25, 2005 - 2:13 am UTC
Hi Tom,
the sqlnet message from client in this case likely means "you sat idle for a
while before running the query"
- Can this be attributed to network, please advice
thanks,
Dennis
August 25, 2005 - 3:36 am UTC
go into sqlplus.
turn on sql_trace
go to lunch
run select * from dual;
exit sqlplus
run tkprof.
sqlnet message from client is the time the server process spent sitting there "waiting" for the client to tell it to do something, in your trace file from above, you'll see a total wait of "your lunchtime"
Jack, September 03, 2005 - 2:09 am UTC
Excessive "SQL message from/to user" waits might be an indication of network problem when running batch program between hosts.
I have seen significant SQL message waits in a Peoplesoft batch job. We narrowed down the problem to some full duplex setting on interface card and switch.
September 03, 2005 - 8:53 am UTC
yes, but given this:
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message from client 1630 5.71 1459.53
SQL*Net message to client 1629 0.01 0.01
latch free 76 0.02 0.64
I tend to doubt that, this is likely due to client fetching bunch of data and taking it's good time about processing it before asking for more.
async io and LGWR
reader, September 03, 2005 - 10:03 pm UTC
From your reply earlier on this thread,
<quote>As I said, we can use IO slaves to simulate async IO or if there is no async io,
we would do it sequentially.<quote>
Is there a way to find that if LGWR is writing sequentially to all members or using async io slaves? Thanks.
September 04, 2005 - 10:09 am UTC
show parameter, see if io slaves have been asked for or not.
dbwr_io_slaves
is currently used to configure both dbwr and lgwr IO slaves.
brief snippet from my forthcoming book
<quote>
I/O Slaves
I/O slaves are used to emulate asynchronous I/O for systems or devices that do not support it. For example, tape devices (which are notoriously slow) do not support asynchronous I/O. By using I/O slaves, we can mimic for tape drives what the operating system normally provides for disk drives. Just as with true asynchronous I/O, the process writing to the device batches a large amount of data and hands it off to be written. When the data is successfully written, the writer (our I/O slave this time, not the operating system) signals the original invoker, who removes this batch of data from its list of data that needs to be written. In this fashion, we can achieve a much higher throughput, since the I/O slaves are the ones waiting for the slow device, while their caller is off doing other important work getting the data together for the next write.
I/O slaves are used in a couple of places in Oracle. DBWn and LGWR can make use of them to simulate asynchronous I/O, and RMAN will make use of them when writing to tape.
Two parameters control the use of I/O slaves:
* BACKUP_TAPE_IO_SLAVES: This parameter specifies whether I/O slaves are used by RMAN to back up, copy, or restore data to tape. Since this parameter is designed around tape devices, and tape devices may be accessed by only one process at any time, this parameter is a Boolean, and not the number of slaves to use, as you might expect. RMAN will start up as many slaves as necessary for the number of physical devices being used. When BACKUP_TAPE_IO_SLAVES = TRUE, an I/O slave process is used to write to or read from a tape device. If this parameter is FALSE (the default), then I/O slaves are not used for backups. Instead, the dedicated server process engaged in the backup will access the tape device.
* DBWR_IO_SLAVES: This parameter specifies the number of I/O slaves used by the DBW0 process. The DBW0 process and its slaves always perform the writing to disk of dirty blocks in the buffer cache. By default, the value is 0 and I/O slaves are not used. Note that if you set this parameter to a nonzero value, LGWR and ARCH will use their own I/O slaves as well-up to four I/O slaves for LGWR and ARCH will be permitted.
The DBWR I/O slaves appear with the name I1nn, and the LGWR I/O slaves appear with the name I2nn, where nn is a number.
Jack, September 04, 2005 - 2:05 am UTC
Tom,
Is it possible to have asktom send an email notification when you posted a follow-up?
September 04, 2005 - 10:10 am UTC
only if you asked the question in the first place currently.....
jobq slave wait
Sami, September 11, 2005 - 3:00 pm UTC
Dear Tom,
We just upgrade our production database this weekend from 8.1.7.3 (2 Node OPS) to 9.2.0.6 (3 Node RAC - Solaris 5.9).
We see lots of "jobq slave wait" event. The number of job queue process defined as 18 and at any point we don't see any
more than 2 or 3 jobs running.
Do we have to change any specific init parameters?
September 11, 2005 - 6:34 pm UTC
well, if you want to remove those waits -- make them (the job queues) do more.
It is an idle wait event, the job queue processes are just hanging out waiting to be told to do something.
it is a new idle wait for the cjq0 (job queue coordinator) process in 9i.
jobq_slave_wait --Okay to ignore?
A reader, September 11, 2005 - 7:19 pm UTC
Thanks Tom.
What you are saying is "jobq_slave_wait" event is harmless similar to "SQL*Net message from client".
Hence we don't need to worry about this wait. Am I right?
September 11, 2005 - 7:23 pm UTC
yes, it could be considered similar to that wait.
*similar*, for sql*net message from client could be a seriously bad wait event to have.
v$system_event.time_waited_micro
Roger, November 11, 2005 - 10:10 am UTC
Hi Tom,
I am trying to understand the output of v$system_event. The times for wait events in v$system_event are from the start of the instance, right? Based on this, here is a quick test that confused me (I hope the formatting doesn't get messed up):
[code]
09:43:08 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
---------------------------------------------------------------- ----------- -------------- ----------- ------------
TIME_WAITED_MICRO RNUM
----------------- ----------
rdbms ipc message 117821 110135 39176981 333
3.9177E+11 4
Elapsed: 00:00:00.00
09:44:08 SQL> set line 150
09:44:19 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message 117850 110158 39181789 332 3.9182E+11 4
Elapsed: 00:00:00.00
09:44:20 SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
09:44:49 SQL> startup;
ORACLE instance started.
Total System Global Area 135338868 bytes
Fixed Size 453492 bytes
Variable Size 109051904 bytes
Database Buffers 25165824 bytes
Redo Buffers 667648 bytes
Database mounted.
Database opened.
09:45:13 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message 96 53 7368 77 73680537 3
Elapsed: 00:00:00.00
09:45:18 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message 114 71 13513 119 135131484 3
Elapsed: 00:00:00.00
09:45:32 SQL>
09:45:39 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message 138 95 21707 157 217071883 3
Elapsed: 00:00:00.00
09:45:53 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message 211 147 34088 162 340882760 3
Elapsed: 00:00:00.00
[/code]
As you can see, the time_waited_micro column had a large value before I restarted the instance but was reset on instance startup. After the restart, time_waited_micro was increasing faster than time was moving! 35 seconds later total_waited_micro went from 73680537 (approx 74 seconds) to 340882760 (approx 340 seconds). We can clearly see from the time stamps, the instance had not been up for 267 (340-74) seconds.
This was just a short example. In our production systems when I look at the instance startup time (v$instance) and compare it to the most waited event, the most waited event has waited more than 10 times the time since the instance started.
Can you please help me understand this better? Thanks.
November 12, 2005 - 8:36 am UTC
rdbms ipc message is not a good one to use - we are constantly and forever doing that one. it is not a meaningful one for us to look at even.
look at and use an event that means something to us outside of the background processes.
wait event
Roger, November 14, 2005 - 4:49 pm UTC
Hi Tom,
Thanks for the response. Can you please give me an example of a event that I can use? A simple example would be very useful. Thank you for your valuable time.
November 15, 2005 - 7:29 am UTC
what events do you have in your list to choose from.
PX Deq Credit: send blkd
Ashish Kapur, December 14, 2005 - 2:49 am UTC
Hi Tom,
Can you please let me know the above mentioned wait event. What are the causes of this wait event.
How best to overcome this wait event.
Thanks,
Ashish
December 14, 2005 - 8:28 am UTC
see support Note 271767.1 on metalink for details.
Wait Events
Karmit, March 05, 2006 - 6:16 am UTC
Hi Tom,
We are trying to continuously performance tune various processes in our
development database, ranging from data load processes to end user sql's.
Sometimes we need to change the method of doing things i.e implement some
new/better feature or rewrite end user sql's and we've acheived good results.
We have been using basic "instrumentation" so far - i.e make the process log
the start/end time during each step. Using this "log" we could make out which
step was an issue and using our "collective knowledge" plus "asktom/metalink
reference" tune that step.
However, now we have come to point where we are running multiple things at
the same time (as would happen in production) and someone raised an "expert
point" that we need to start logging "internal statistics" and our tuning
henceforth should be guided not only by start/end log times of the processes
but mostly by the wait events/ internal stats for each session. This should
also help us to determine exactly why a certain process took longer/shorter
than usual and what else was going on the system during its run.
I've been thinking about what all needs to be done to incorporate such a
thing. I understand that AWR is available in 10g - but main issue is that
its stats are purged after regular intervals and also its stats are system
wide - which is not very useful if trying to diagnose a particular process.
As a supplement to AWR, would taking a snap of the following views
help? There would be a procedure which will take in parameters like SID,
serial# (determined using mystat) and would be called by each package/procedure
in the database as soon as it enters the BEGIN part of the code and just
before its about to exit.
a) V$SESSION_EVENT
b) V$SESSTAT
c) V$SESSION_WAIT
1. Would like to hear your views / recommendations on the above.
2. Also, considering a RAC environment should I be using using GV$ views?
3. One more thing I'm having issues understanding is the parallel query
slaves get allocated their own SID's - which are different to the
PQ Coordinator. In such a case will it be enough to take a snap of just
the PQ Coordinator process? or should it be taking snap of all the
slave SIDs?! Sometimes (by design) the PQ coordinator sits on
node 1 while the slaves are on node 2. I guess this makes it all the
more critical to access GV$ instead of V$? But given the fact that
most GV$ views "duplicate" information - which ones should I be using
and which not? Or rather "how" should I be accessing ONLY the "correct"
information?
Thanks!
Karmit
March 05, 2006 - 1:46 pm UTC
awr data lives for a long time, days and days - so what if it is purged? You can control that as well if you like (the purging).
I would not snap those views - they would be rather not useful - how would you use snaps of them to determine anything? They are a snapshot of various current sessions at a point in time. Sessions come, they go - I don't see that being useful
You use v$ views when interested in an INSTANCES data.
You use gv$ views when interested in data across instances, or in another instance.
snapping these sids would not make sense to me either, what would you do with it? they are "at a point in time" and come and go (get associated with different coordinators all over the place)
Wait Events
Karmit, March 05, 2006 - 2:44 pm UTC
Hi Tom,
The purpose of taking a snap of the v$sesstat OR v$session_event
per SID is to store the events/ stats that the session was waiting
on/ generated during *that* run. We have various stored procedures
/packages which are run at various times to load/refresh data in
the database.
All these PL/SQL procedures store the start/end date with an
application generated "jobid" within the database. This is what
we use to calulate the "elapsed time" of a specific run of a specfic
job. Now, if we do some "performance tuning" - and check the
elapsed time - which hopefully has decreased, we assume that
our "change" helped and theorize why it did so. No FACTS.
So, if we had a procedure which takes a snap of v$sesstat/event
views - taking input as the SID, we could then call this procedure
twice from within all the application procedures (which would
determine the SID using v$mystat and pass it on as parameter) -
first time as soon as the "to be instrumented" code BEGIN's
and second when its just about to exit.
Since we can also link these "snaps" of the views to the application
generated "jobid" of that process - we would have the ability to
analyze the behaviour of a specific run of a specific job and also
to compare it with any of its runs in the past.
Most importantly we could then use this to "performance tune" the job
in question and support any change in behaviour with
"facts from the snaps".
At least thats what the idea is!
But after I read your reply I'm now totally confused!!! since I've
read in your book (Expert Oracle - 9i and 10g Programming - pg 300)
that we could use v$ views to collect information like "redo
generated" . In fact doesn't your runstat_pkg do something similar?
Also, having a look at the book "Oracle Wait interface - Shee/Deshpande"
seems to advocate a session logoff trigger to store this information
(however in that case I lose the ability to track it to my application
job run - hence a stored procedure).
If the v$session_event view shows the events waited on by the
session and the v$sesstat shows the stats - why would storing them
(with the ability to link them to the applications job) not help?
AWR for me is of little use since I can't link it to my applications
jobs - or use it to compare different runs of the same job - OR
to analyze what exactly it did/ waited on etc.
Please enlighten!
Thanks & Regards,
Karmit
March 06, 2006 - 8:33 am UTC
all you need is the ending view for a session, unless you are talking about "we would like to monitor process X" then yes it would/could make sense but that wasn't spelled out in the first case.
Wait events
Karmit, March 06, 2006 - 10:52 am UTC
Tom,
Sorry! I thought I'd made it quite clear especially with -
"This should also help us to determine exactly why a certain process took longer/shorter than usual and what else was going on the system during its run...." ...maybe
not clear enough.
Again, why only one snap at the end? shouldn't it be
start/end of the session? (as you might do in runstats?)
Also, in case of a parallel execution where there are
multiple SIDs, will taking the snap only for the SID associated with the PQ coordinator make sense?
Thanks in advance,
Karmit
Is i am looking in wrong view
Orakle_lover, March 06, 2006 - 6:38 pm UTC
I was trying to know the number of user commits in my session.
So my query was :
1 select a.name,a.class,b.value
2 from v$statname a,v$mystat b
3 where a.statistic#=b.statistic#
4* and a.name = 'user commits';
Before Commit:
NAME CLASS VALUE
------------------------- ---------- ----------
user commits 1 0
SQL> commit;
Commit complete.
After Commit:
SQL> select a.name,a.class,b.value
2 from v$statname a,v$mystat b
3 where a.statistic#=b.statistic#
4 and a.name = 'user commits';
NAME CLASS VALUE
------------------------- ---------- ----------
user commits 1 0
Is the value does not change for each commit????
I think i was looking in some wrong way!!!!
What is the way to find the number of commits by a session?
If the "user commits" is not the one we have to look then what is the value is used for?
March 08, 2006 - 4:20 pm UTC
because the commit of "nothing" is optimized away - we know we didn't need to do it, so we ignored it.
ops$tkyte@ORA10GR2> create table t ( x int );
Table created.
ops$tkyte@ORA10GR2> insert into t values ( 1 );
1 row created.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> @mystat "user commit"
ops$tkyte@ORA10GR2> set echo off
NAME VALUE
-------------------------------------------------- ----------
user commits 3
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> @mystat "user commit"
ops$tkyte@ORA10GR2> set echo off
NAME VALUE
-------------------------------------------------- ----------
user commits 3
ops$tkyte@ORA10GR2> update t set x = 1;
1 row updated.
ops$tkyte@ORA10GR2> commit;
Commit complete.
ops$tkyte@ORA10GR2> @mystat "user commit"
ops$tkyte@ORA10GR2> set echo off
NAME VALUE
-------------------------------------------------- ----------
user commits 4
user commits go up when there is some actual work in need of committing.
Oracle Wait Events
Karmit, March 07, 2006 - 4:42 pm UTC
Tom,
Sorry! I thought I'd made it quite clear especially with -
"This should also help us to determine exactly why a certain process took longer/shorter than usual and what else was going on the system during its run...." ...maybe not clear enough.
a) Again, why only one snap at the end? shouldn't it be
start/end of the session? (as you might do in runstats?)
b) In case of parallel execution where there are
multiple SIDs, will taking the snap only for the SID
associated with the PQ coordinator make sense?
c) Is it possible for a session to post the "amount of
temp space it used" immediately before it logsoff?
I'm trying to get this through v$tempseg_usage but it
seems that the information vanishes as soon as the
session ends?
Thanks,
Karmit
March 09, 2006 - 11:23 am UTC
you said "snap", a "snap" is an isolated point in time "lets copy the data".
You mean "when the session is over, we will - at the session level - save off the relevant wait events - the cumulative wait events - for that session. This is very different from "snapping" the v$ tables.
Yes, that can (obviously) be useful - maybe. You might have to go more granular with the ability to say "begin_task(:x)" and "end_task(:x)" and implement sort of a stack that can let you see what the various waits and statistics were for that single task in your larger program (aggregate waits for the entire session would be not extremely useful)
a) if you were doing this at the session level, no need - everything for the session starts at "zero". if you do it task oriented, then you need the begin/end sort of thing.
b) not necessarily. you know you waited - but not really why.
c) correct, it does disappear. I don't think that statistic is captured, but you do have physical reads/writes to temp which would be an indicator.
PL/SQL lock timer
Reader, March 07, 2006 - 9:29 pm UTC
I would appreciate if you could help me understand more about this wait .
Thanks
March 09, 2006 - 12:09 pm UTC
dbms_lock.sleep( 5 )
will generate 5 seconds of wait on plsql lock timer.
you are sleeping.
Alexander the ok, March 17, 2006 - 10:40 am UTC
Hi Tom, Happy Friday
I've been practicing a little backup and recovery stuff recently on my test database. I had to enable archivelogmode. Today I tried using our application against my database and I noticed on of the queries that normally doesn't take long hung on a wait. The wait was a log file switch(archive needed). The docs say this:
"Waiting for a log switch because the log that the LGWR will be switching into has not been archived yet".
I don't really know what this means. "Archived" as in, backed up? How do you archive something? I thought once you enable archivelogmode on you system, oracle just continues to write changes to your redo log files instead of writing over them in a circular fashion.
I also noticed I had a ton of archive files (I think that's what the are), they're the same size as my redo log files, with this type of naming convention "ARC01028.001". I had almost 50 of them in the directory I specified when I enabled archivelogmode in OEM. Is this normal to have this many files generated? I've only had this set for a week or so, and it's my test single user system. I couldn't believe that much was created from the little work I do. Does this mean I have too few redo log files, or they're too small, or both?
I've read through the backup and recovery basics guide once already. This is still kid of confusing I hope you can clarify these questions. Thank you.
March 17, 2006 - 5:51 pm UTC
It means lgwr is waiting for ARCH to make a copy of the file elsewhere before it can reuse it.
This is exactly what archive log does - as lgwr fills up a log file, ARCH copies it to the archive destination and names them sensibly. You then back this up to some other storage (get them off of the system so they are safe).
In the event you need to restore - you restore the datafile(s) and apply the archives that were created since the backup was taken
Alexander the ok, March 20, 2006 - 9:52 am UTC
Ok so when this occurs, I should backup my archive files? How does one stay ahead of this problem so it doesn't happen all of a sudden on a production system and all kinds of stuff comes to a halt?
I have a book that basically says this problem is due to having too few archiver processes configured, or your redo logs are too small. Can you provide guidance as to determine which of these it might be, and how to know how to configured how many ARCH(n) processes and redo log sizes?
My redo logs are currently 20 megs in size, and I have 5 of them. This is an OLTP system, roughly 35 gb of data. I know these settings depend on a lot of things, I'm just looking to get some idea, a ball park. Thanks as always.
March 22, 2006 - 12:33 pm UTC
You should be CONSTANTLY AND CONTINUALLY backing them up.
scenario: machine fails, takes disk with it.
if you have not constantly been backing up the archives, you have just lost them. Your job is to make sure they are safe, away, elsewhere.
whether that book is right or not depends on whether the problem was
a) you didn't have automatic archiving enabled
b) you ran out of room in the archive destination
c) you really couldn't archive fast enough
long waits while disks are not busy
Andr, March 24, 2006 - 9:36 am UTC
Hi Tom.
Could you please give some tips to understand the situation:
Sometimes in our database DBWR and CKPT waits very long.
Right after their status has become 'waiting', others sessions become
waiting also. (very long, sometimes more than 90 sec)
Session's waits are various: db file sequential read,
buffer busy waits, direct path read etc)
It is imortant that it is not a checkpoint, because :
1) In the alert log, event 'checkpoint completed' appear much later after
hang finishing (10-15 minutes after it).
2) status of all logfiles ('current','active','inactive') is not changing
during the hang
(status before the hang and after is identical for all logfiles)
Also, at these moments the disks activity is not large:
(OS iostat) : 10-30 kbps. Disks with rollback have some more activity:
500-700 kbps (only writes).
Behind these moments disk activity can reach 1000-3000 kbps and more.
CPU activity is about 30-40%
sessions waits like this (from v$session_wait):
SID EVENT_NAME P1TEXT P2TEXT P3TEXT P1 P2 P3 SEC_IN_W STAT
--- -------------------- --------- ---------- --------- ------ ------ ---- ------ ------
...
11 rdbms ipc message timeout 030000 0 0 1360 WATING
56 buffer busy waits file# block# id 04 7781 130 41 WATING
132 buffer busy waits file# block# id 08 114724 130 66 WATING
4 control file parallel files blocks requests 02 2 2 66 WATING
write
198 buffer busy waits file# block# id 08 114724 130 66 WATING
162 buffer busy waits file# block# id 05 23693 130 21 WATING
141 buffer busy waits file# block# id 08 2526 130 42 WATING
94 buffer busy waits file# block# id 04 7781 130 25 WATING
126 buffer busy waits file# block# id 08 2526 130 30 WATING
125 buffer busy waits file# block# id 05 23693 130 27 WATING
68 db file scattered rea file# block# blocks 08 114724 3 66 WATING
115 db file scattered rea file# block# blocks 04 7781 2 66 WATING
135 db file scattered rea file# block# blocks 05 23693 2 41 WATING
196 db file scattered rea file# block# blocks 08 2526 4 65 WATING
2 db file parallel writ files blocks requests 0208 0 0 103 WATING
201 direct path read file num first dba block cnt 0203 421513 1 66 WATING
160 direct path write (lo file num first dba block cnt 06 156319 2 25 WATING
.... etc
( 4 - CKPT, 2 - DBWR, 11 - ARC;
among file# waits are no files with rollback segments )
Oracle 8i
log_checkpoint_interval 0
log_checkpoint_timeout 0
redo logs about 400Mb, they are on the separate disks
What is it ? Why CKPT waits so long ? Why session is reading one block so long (including direct path)?
We very hope for your help.
strange waits (long waits while disks are not busy)
Andr, March 29, 2006 - 2:39 am UTC
Hi Tom.
We are hopening for your help still. (see above)
Could you give us any supposition ?
March 29, 2006 - 7:16 am UTC
you have really long IO waits maybe disk problem?
strange waits
A reader, March 30, 2006 - 3:19 am UTC
Hi Tom.
Now I have some more information.
_Presumably_ (but I'm not sured yet finally) it can happen when the current log is filled about 90% (not always) . It is important that dbwr _and_ ckpt waits. Also during the waiting the table x$kcccp is not changing (in other time it changes constantly). Sometimes at that the disk activity is really about 0. The 'Completed checkpoint' appeared in the alert.log some minutes later. For example (at 20 sec interval)
disks: tps kb_read kb_wrtn
disk1 0.0 0 0
disk2 0.0 0 0
Out of that disks work normally.
We have one DBWR and async io is true. The Buffer cache is more 1Gb. But we have no free buffer waits.
March 31, 2006 - 11:01 am UTC
then, if it is not causing waits to happen elsewhere in the system - is there an identified problem you are trying to solve?
wait event,
A reader, March 31, 2006 - 10:13 am UTC
What is "wait for master scn" event?
In my database, one of my INSERT statement is showing on this wait for a long time.
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
------------------------------ ----------- -------------- ----------- ------------
wait for master scn 108775355 138 3354407 0
What would cause this and how to fix this?
Thanks,
March 31, 2006 - 12:48 pm UTC
you have an insert that waited 108,775,355 for this event?
followup,
A reader, March 31, 2006 - 1:58 pm UTC
Yes. It is for INSERT statement. Later we killed the SID and restarted the job.
Thanks
March 31, 2006 - 3:00 pm UTC
using rac? distributed queries?
follow up,
A reader, March 31, 2006 - 3:16 pm UTC
Sorry..I didn't put that information earlier.
The database is a 10.1.0.4 on a 3 node RAC.
Another question:
I have seen in our other databases as well, parallel query sometimes just hang on RAC machines (not always but whenever there are good number of sessions starting at the same time).
Why is that?
The wait events in those cases will be
PX Deq: Execute Reply,gc cr request,gc buffer busy,
PX Deq: Execution Msg
Any settings at the database level to avoid these kind of wait events?
March 31, 2006 - 3:18 pm UTC
I'll ask you to work with support on this particular issue - I don't think we'd be able to get to the bottom of it efficient here.
strange waits (long waits while disks are not busy)
Andr, April 03, 2006 - 4:31 am UTC
Hi Tom
You said
"if it is not causing waits to happen elsewhere in the system - is there an identified problem you are trying to solve?"
- Long waits DBWR and CKPT is causing other long waits
(scattered read, direct read/write read).
And I don't understand what the reason of this. Definitely,it's not a disk problem . No LGWR waits also.
But I have no another suppositions about source of it. I'm puzzled.
(see the messages here of March 24 and March 30)
April 03, 2006 - 8:14 am UTC
dbwr and ckpt wait - so what?
unless my end users are - I don't care, they are just software.
so, is this causing a wait anywhere we CARE about.
(same about lgwr, unless I'm waiting on lgwr, I don't really *care*)
strange waits (long waits while disks are not busy)
Andr, April 03, 2006 - 9:08 am UTC
Hi Tom.
That's just the point that at the same time
some end users _also_ are waiting long. But there is no a disk activity at that. And the queue of the active users is increasing until the hung finish. And the incremental checkpoint position is not moving (x$kcccp is not changing) at that (it seems that it is frozen at these moments). And it isn't normal checkpoint. (I look at the log file status before and after).
For example (from from v$session_wait):
Event_name Seconds_in_wait state
...
direct path read 56 WATING
db file scattered read 41 WATING
...
control file parallel wr 56 WATING
db file parallel write 83 WATING
April 04, 2006 - 9:32 am UTC
if you are waiting 50 seconds for an IO to complete - you have something seriously wrong with the disks there. Don't you think?
strange waits (long waits while disks are not busy)
A reader, April 04, 2006 - 10:04 am UTC
Hi Tom.
"if you are waiting 50 seconds for an IO to complete - you have something seriously wrong with the disks there. Don't you think?"
Of course, I am not sured at all. But I suppose that it is not the disk problem. Because the correlation exists: the long waits appear only when CKPT is waiting (control file parallel write). And the situation occurs only a few times per day. May be it any oracle reason ?
April 04, 2006 - 7:23 pm UTC
you have 50 seconds of wait on an IO
that is a huge, gigantic, impossibly large number.
strange waits (long waits while disks are not busy)
Andr, April 05, 2006 - 2:04 am UTC
Hi Tom
Thanks for the answer.
But in what way I can identify exactly this disk problem in OS (unix). Our system administrator says the disks are all right. Mainly we have no long waits (only sometimes).
April 05, 2006 - 5:43 pm UTC
if you have 50 second waits for an IO to return, there is seriously something very very wrong. We can tell you how long we waited for IO.
long disk waits
A reader, April 12, 2006 - 4:05 am UTC
Hi Tom
Maybe these disk waits are caused by AIX ?
Can you give advice about oracle on AIX ?
April 12, 2006 - 8:07 am UTC
the IO waits would be for everyone, not just oracle.
long disk waits
A reader, April 13, 2006 - 6:12 am UTC
while oracle has io waits the file coping on disk works normally (no waits).
April 13, 2006 - 8:01 am UTC
We queue for IO using the same stuff as anything else.
SQL*Net break/reset to client
Avid Follower, May 18, 2006 - 10:17 am UTC
Hello Tom,
We often get errors in our Alert logfile referring to a certain trace file in udump for errors, and 90% of the times its
Dump of memory from 0x0274E8C0 to 0x0274EA00
274E8C0 3A20736C 62647520 73666564 00000000 [ls : udbdefs....]
[opipls:incorrect]
[ no. of binds...]
[opipls:sqlstr...]
[FATAL ERROR IN T]
[WO-TASK SERVER: ]
[error = %d..Prob]
[able error (ORA-]
[28546) in networ]
[k administration]
[....Network prot]
[ocol error on fi]
[rst data after n]
[ew connect..opit]
[sk_1....opitsk_2]
[....------------]
[---------.Remote]
[ SQL-DSR stateme]
[nt:.....opixrb-1]
[8...opixrb-14...]
Argument/Register addr=0x78B49B78.
Then followed by similar boxes with information of the Forms client. And then the following lines
O/S info: user: radoff, term: SKM-0312, ospid: 3168:3716, machine: SKM\SKM-0312
program:
last wait for 'SQL*Net break/reset to client' blocking sess=0x0 seq=689 wait_time=1
driver id=28444553, break?=1, =0
temporary object counter: 0
It's always this, we haven't been able to figure out what this means, seems some sort of locking somewhere, but what exactly do we need to look at. (I can paste you the rest of the lines that follow if they can help you in guiding us what do we need to look at). Especially the "last wait for 'SQL*Net break/reset to client'" I don't get.
If more of these come, severe locking is caused, or I should say when locking occurs then we find a number of occurences of such traces in the alert logfile.
I have contacted Oracle support, and we haven't been able to figure out the reason for TWO-TASK SERVER Error, so I was hoping that you could tell me what could be the reasons, and/or where do we need to look first. Application we try to check but since it's always a different client and a different query given, I can't put it to one application. Will be grateful if you could tell me what could possibly be happening: network, application, database or something else!
Regards,
May 19, 2006 - 9:47 am UTC
you should really utilize support for something like this.
SQL*Net break/reset to client
Avid Follower, May 23, 2006 - 10:35 am UTC
Ok, will try raising this issue again with them, but if you can only tell me, generally, what does this system wait actually mean, "SQL*Net break/reset to client"? as it appears in alot of examples when someone tries to find different types of waits on their databases. And what area one needs to look at in case this event is a problem area.
Thanks
May 23, 2006 - 3:29 pm UTC
problem in returning the status from dbms_pipe.send_message
supriya, May 30, 2006 - 6:20 am UTC
Hi Tom,
I have a procedure which calls one more procedure.
the 2nd proc has dbms_pipe.send_message command.
here its getting hanged and not returning status.
is there any way to return the status even it doesn't run... so, that the procedure to continue to execute the other statements...
plesae needs quick reply.
Thanks in advance,
Supriya
rdbms ipc reply
A reader, May 31, 2006 - 5:49 pm UTC
Hi Tom there are some sessions in my database that in a wait state of "rdbms ipc reply"
When I turn tracing on, I get a trace file with the following line repeated over and over:
WAIT #0: nam='rdbms ipc reply' ela= 1953595 from_process=3 timeout=2 p3=0 obj#=-1 tim=1122179692553413
How do I tell what is actuall happening with the session and how do I resolve this wait?
Thanks.
June 01, 2006 - 9:44 am UTC
Who is waiting on it? (what processes)
What are they "doing" - what is that session (do you have any insight into that at all - looking at the session, what is it - a user session, a background, what?)
and - is this in fact causing an issue (eg: a human being is waiting for something)
follow-up
A Reader, June 01, 2006 - 11:05 am UTC
These are all SYS sqlplus sessions there were doing system state dumps. They were started by a monitoring script.
Looking at the v$session table, these sessions are active and waiting on a from_process = 3.
SELECT * FROM v$process WHERE pid =3;
yields:
ADDR PID SPID USERNAME SERIAL#
---------------- ---------- ------------ --------------- ----------
TERMINAL PROGRAM
------------------------------ ------------------------------------------------
TRACEID
--------------------------------------------------------------------------------
B LATCHWAIT LATCHSPIN PGA_USED_MEM PGA_ALLOC_MEM PGA_FREEABLE_MEM
- ---------------- ---------------- ------------ ------------- ----------------
PGA_MAX_MEM
-----------
00000000DF474868 3 8907 oracle 34
UNKNOWN oracle@opcenterdbc (TNS V1-V3)
470721 539361 131072
1580105
June 01, 2006 - 11:23 am UTC
well, there you go.
They sent a message to a background "please dump"
They are waiting for the background to come back and say "done dumping"
to remove this wait - stop dumping?
Follow-up
A Reader, June 01, 2006 - 11:38 am UTC
dumping of a system state dump lasts just a few minutes, if even that long.
Why would these sessions run for days to dump the system state.
I am trying to figure out why they are held up for so long.
I believe the following queries are useful, but I am not sure how to interpret the results of the second one:
SQL> select s.sid
2 from v$process p,v$session s
3 where s.paddr = p.addr
4 and p.pid = 3;
SID
----------
283
SQL> select * from V$SESSION_EVENT where sid =283;
SID EVENT
---------- ----------------------------------------------------------------
TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT MAX_WAIT TIME_WAITED_MICRO
----------- -------------- ----------- ------------ ---------- -----------------
EVENT_ID WAIT_CLASS_ID WAIT_CLASS#
---------- ------------- -----------
WAIT_CLASS
----------------------------------------------------------------
283 SQL*Net message to client
6 0 0 0 0 17
2067390145 2000153315 7
Network
283 SQL*Net message from client
6 0 0 .04 0 2504
1421975091 2723168908 6
Idle
283 events in waitclass Other
346893 345621 67660408 195.05 208 6.7660E+11
1736664284 1893977003 0
Other
June 01, 2006 - 11:53 am UTC
if you are doing system state dumps, you MUST be working with support - please continue to work with them on this issue. The only reason you'd be doing this is because they asked you to.
oracle wait event,
A reader, June 07, 2006 - 11:00 am UTC
What is db file parallel read wait event?
I see this pretty much on RAC databases.
A sample output from tkprof output says this wait event was the biggest. Do you know why it is happening? How to fix this?
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.16 0.37 168 15801 0 0
Execute 1 5.05 142.68 14191 14284 0 0
Fetch 1 0.05 0.09 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 5.26 143.16 14359 30085 0 0
Sample output of tkprof:
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
rdbms ipc reply 2 0.00 0.00
gc cr grant 2-way 164 0.00 0.10
db file sequential read 195 0.09 2.01
gc cr multi block request 9644 0.01 1.33
db file parallel read 150 1.61 135.92
gc cr grant congested 1 0.00 0.00
latch: KCL gc element parent latch 3 0.01 0.01
latch: gcs resource hash 1 0.00 0.00
enq: PS - contention 50 0.00 0.02
latch: ges resource hash list 1 0.00 0.00
KJC: Wait for msg sends to complete 1 0.00 0.00
DFS lock handle 12 0.15 0.26
enq: PI - contention 4 0.00 0.00
PX Deq: reap credit 159 0.00 0.01
PX Deq: Join ACK 5 0.00 0.01
PX Deq: Parse Reply 7 0.29 0.31
June 07, 2006 - 3:05 pm UTC
wait events
mal, June 21, 2006 - 2:46 pm UTC
Can you please provide some suggestion after looking at following ?
select EVENT,TIME_WAITED from v$system_event order by TIME_WAITED desc
EVENT TIME_WAITED
---------------------------------------------------------------- -----------
SQL*Net message from client 489711645
rdbms ipc message 483178445
gcs remote message 110453129
ges remote message 55418765
pmon timer 55404668
smon timer 53678792
wakeup time manager 53421063
PX Idle Wait 8094027
db file scattered read 3396348
db file sequential read 3157320
ARCH wait on SENDREQ 2208004
global cache cr request 1270045
control file sequential read 1154137
enqueue 951439
PX Deq: Execution Msg 460557
db file parallel read 396697
log file sync 164629
log file parallel write 128685
log file sequential read 99154
SQL*Net more data to client 94446
global cache null to x 52629
wait for scn from all nodes 42464
buffer busy global CR 41099
control file parallel write 39633
log file switch (archiving needed) 34082
buffer busy waits 31820
global cache s to x 30801
KJC: Wait for msg sends to complete 30636
log buffer space 27504
global cache open x 24963
latch free 21894
PX Deq: Table Q Normal 17949
row cache lock 11890
select parameter, count, dlm_requests, dlm_conflicts, dlm_releases
2 from v$rowcache;
PARAMETER COUNT DLM_REQUESTS DLM_CONFLICTS DLM_RELEASES
-------------------------------- ---------- ------------ ------------- ------------
dc_free_extents 0 0 0 0
dc_used_extents 0 0 0 0
dc_segments 1346 14400 100 9286
dc_tablespaces 19 166 0 147
dc_tablespace_quotas 0 3388 51 10
dc_files 29 232 0 203
dc_users 23 212 0 189
dc_rollback_segments 108 270 34 0
dc_objects 1790 15774 171 12978
dc_global_oids 17 126 0 109
dc_constraints 0 808 23 1
dc_object_ids 1383 10918 25 9089
dc_sequences 20 580765 17744 67
dc_usernames 11 101 0 90
dc_database_links 1 8 0 7
dc_histogram_defs 4780 71509 27565 30214
dc_table_scns 62 47533 12766 1562
dc_outlines 0 0 0 0
dc_profiles 2 16 0 14
dc_encrypted_objects 0 0 0 0
dc_encryption_profiles 0 0 0 0
dc_qmc_cache_entries 0 0 0 0
dc_qmc_ldap_cache_entries 0 0 0 0
dc_users 0 0 0 0
dc_histogram_data 0 0 0 0
dc_histogram_data_values 0 0 0 0
dc_partition_scns 0 0 0 0
dc_user_grants 20 0 0 0
dc_app_role 0 0
select namespace, dlm_lock_requests, dlm_pin_requests, dlm_pin_releases, dlm_invalidation_requests, dlm_invalidations
2 from v$librarycache;
NAMESPACE DLM_LOCK_REQUESTS DLM_PIN_REQUESTS DLM_PIN_RELEASES DLM_INVALIDATION_REQUESTS DLM_INVALIDATIONS
--------------- ----------------- ---------------- ---------------- ------------------------- -----------------
SQL AREA 0 0 0 0 0
TABLE/PROCEDURE 1661706 16424 13373 11168 4
BODY 79 416 337 366 3
TRIGGER 0 274 239 274 0
INDEX 37495 193 82 42 0
CLUSTER 2144 37 22 26 0
OBJECT 0 0 0 0 0
PIPE 0 0 0 0 0
JAVA SOURCE 0 0 0 0 0
JAVA RESOURCE 0 0 0 0 0
JAVA DATA 0 0 0 0 0
June 22, 2006 - 11:43 am UTC
I think you should use statspack or AWR to look at things, since these are just a big bunch of numbers that have been accumulating since the database instance was started.
you need to look at a smaller, defined timeframe. and the statspack/awr reports will contain "meaningful useful information you can read and interpret and act on"
wait events
mal, June 23, 2006 - 7:11 pm UTC
Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 33942 22-Jun-06 00:25:04 10,818 1.0
End Snap: 33981 22-Jun-06 21:25:00 12,017 1.0
Elapsed: 1,259.93 (mins)
Top 5 Timed Events
~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 10,628 26.96
global cache cr request 5,867,899 7,014 17.79
db file sequential read 1,825,166 6,794 17.23
db file scattered read 1,318,552 6,125 15.53
ARCH wait on SENDREQ 9,568 2,664 6.76
June 24, 2006 - 11:40 am UTC
1,259.93 minutes...
that does not fit into my definition of "you need to look at a smaller, defined timeframe. "
I was thinking more like 15 minutes, not 21 hours.
I see nothing wrong here, no one could. You have 75,595 seconds of elapsed time. Way too much time to average anything over.
SQL*Net Messsage from client
vivek, July 26, 2006 - 11:52 am UTC
Tom,
On my production server i have gathered some statspack report and found SQL*Net message from client wait number to be huge. I am aware of some network issues on the production site how can i confirm from my statspack report that this SQL*Net message from client or any other event is primarily because of network not a database bottleneck.
Abstract from statspack
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file parallel write 483 0 60 124 0.0
log file sync 3,144 8 38 12 0.3
log file parallel write 8,834 4,859 28 3 0.8
db file scattered read 17,257 0 22 1 1.5
control file parallel write 579 0 21 36 0.1
buffer busy waits 34 9 14 425 0.0
db file sequential read 5,311 0 13 2 0.5
latch free 135 20 6 41 0.0
direct path read 363 0 1 2 0.0
SQL*Net more data to client 5,479 0 0 0 0.5
direct path write 16 0 0 11 0.0
SQL*Net break/reset to clien 280 0 0 0 0.0
LGWR wait for redo copy 69 0 0 0 0.0
control file sequential read 588 0 0 0 0.1
log buffer space 1 0 0 4 0.0
SQL*Net message from client 35,008 0 66,622 1903 3.1
jobq slave wait 1,189 1,113 3,500 2943 0.1
SQL*Net more data from clien 2,720 0 0 0 0.2
SQL*Net message to client 35,127 0 0 0 3.1
July 26, 2006 - 12:03 pm UTC
it is likely neither a database or network.
It is just the client. It is the normal wait event accrued when you are connected but not asking the database to do anything.
Just log into sqlplus for example, let it sit there. Then select * from dual.
You'll have a lot of sqlnet message from client. The entire time you had the blinking prompt in sqlplus - connected but not active - was sqlnet message from client!
SQL*Net
vivek, July 27, 2006 - 12:18 pm UTC
Tom,
During the slow response time, Capacity planner statistics shows average wait (for SQL*Net break/reset to client event) jumped from 0.33 to 101.92 s and at the client side i recieved following error in sqlnet.ora
Tns error struct:
nr err code: 12224
TNS-12224: TNS:no listener
ns main err code: 12541
TNS-12541: TNS:no listener
ns secondary err code: 12560
nt main err code: 511
TNS-00511: No listener
nt secondary err code: 61
nt OS err code: 0
There were no errors in the listener log. So i believe that its not just idle time for server but it was not able to receive the request from clients. So please advise what conclusion can i make from this statatistics?
As i mentioned earlier that there are some reported issues with network and for that sniffer is going to be configured on router side, What can i monitor with in database so that next time if slow response issue is raised i should be able to identify the cause?
Thanks
Vivek
July 27, 2006 - 12:23 pm UTC
you asked about sqlnet message from client before???? I don't see any correlation between this and that.
What I said above applies.
Statspack Anomoly
Danny Gawarecki, October 05, 2006 - 1:06 pm UTC
Tom - We have an OLTP system that is capable of processing more than 1,000 logical transactions per second (although it currently runs around 400-500 per second now). We are making a change to the system to add a new table to query for every transaction that is processed. In addition there is a process to populate this new table that will completely delete and insert the data in the table every 20 seconds. We have done multiple 1 hour test runs of the current system and the system with the new table, etc. and captured CPU and database statistics. The CPU stays very even on both runs, but the database stats are wildly different. Below are some stats from each run that are wildly different. Those with a description ending in 2 are from the current production run and the others are from the run with the new table. We have repeated the tests three times and received the same results each time. Our quandry is that we can't understand why the db stats would improve by adding a new table and constantly deleting/inserting that table. In the very best case it should stay the same or possibly even become worse. Can you help explain what might be going on? Any help you can give will be appreciated and I can provide you with any other information that you might need. Thanks.
latch free 17,233 3,411 1,500 87 0.0
latch free2 205,379 66,944 865 4 0.1
log file sync 726,473 0 942 1 1.2
log file sync2 1,552,982 49 5,349 3 0.7
buffer busy waits 4,613 0 30 7 0.0
buffer busy waits2 148,357 0 249 2 0.1
LGWR wait for redo copy 8,500 0 1 0 0.0
LGWR wait for redo copy2 121,397 22 9 0 0.1
October 05, 2006 - 1:24 pm UTC
you say the cpu stays the same, but is the amount of work performed equal??
insufficient data to say anything so far.
Follow Up
Danny Gawarecki, October 05, 2006 - 1:49 pm UTC
The system was put through identical tests in both cases. The hardware, setup, time span and work done is the same between both tests. The only difference is the spicific differences that we are testing (i.e. the new table and processing for that table). We can do the exact same test two times in a row and get nearly (although not completely) identical outcomes from an application, database, and Unix system standpoint.
October 05, 2006 - 4:19 pm UTC
tell you what, zip up both statspack that cover the SAME AMOUNT OF WORK (eg: transactions committed should be equal) - save as a '.zyp' file and send it to me, I'll peek at it and see what I see. insufficient data in the above to say anything.
Update, I got the statspack and this is what I observed:
both statspacks are for 60 minutes.
one of them does 272.21 transactions per second, or a total of 982,079.238 commits issued by your transaction. That is "production"
The new_table_changes.lst does 168.06 transactions per second, or a total of 606,226.032
The "new table" statspack therefore reflects a 40% DECREASE in work performed during the same period of time.
so, it was sort of what I thought, they are doing significantly less "transactions" in the same amount of time, the code is not running faster due to these changes - it is doing less work in the same amount of time, which of course accounts for the reduced statistics reported above - less commits per hour = less log file sync waits per hour!
local write wait
abz, October 28, 2006 - 10:54 am UTC
can you please describe the wait event "local write wait".
Also
Please read the first section at the link
</code>
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm <code>
I was not able to find the "local write wait" event in
the documentation, so this means according to the above link, "local write wait" event is not of "GREATEST INTEREST",
then as the manual suggests, I issued
SELECT name FROM V$EVENT_NAME ORDER BY name;
the V$event_name does not have any column describing the
cause of this event.
We are receiving a lot of this event, so it is of
GREATEST INTEREST to us.
How can Oracle leave such things undocumented, what
should a person receiving such events do in this case.
October 28, 2006 - 11:28 am UTC
can you show it to me IN CONTEXT "abz" ?
have you been to metalink.oracle.com? (one of the first places I would think to type in a wait event name and see what comes back?)
indeed, when I typed it in there, I found lots.
basically, you are waiting for dbwr to finish a local write (a write on your machine) in order to reuse a buffer.
do you do lots of truncates of tables that possibly have lots of blocks in the cache? You have to wait for them to be "checkpointed", that'll do it too.
IN CONTEXT?
abz, October 28, 2006 - 11:43 am UTC
"can you show it to me IN CONTEXT "abz" ?"
cant get it.
Ok, metalink provides the information, but I was surprised
why Oracle chooses to leave such things undocumented.
Also, the session in which I am receiving this error
is executing an INSERT statment not a TRUNCATE, is this
event also can occur in INSERT statements?
October 28, 2006 - 1:15 pm UTC
sure, dbwr is cleaning out some blocks.
IN CONTEXT?
abz, October 28, 2006 - 1:31 pm UTC
Can you please clarify the sentence
"can you show it to me IN CONTEXT "abz" ?"
I am note sure what you mean by above, but my point was
that metalink support is an optional thing, which, a client
who purchased Oracle software, might or might not purchase
metalink support. So things like this must be in the
documentation.
October 28, 2006 - 3:24 pm UTC
show me where you are "seeing them", tkprof, statspack, what.
in context
abz, October 28, 2006 - 6:06 pm UTC
Ok you mean where I saw the "local write wait" wait event.
I am using browser based dbconsole (database control)
of 10g, there I clicked on my session ID, and then a screen appeared where this event was shown under the heading WAIT.
You didnt comment on not documenting this.
October 29, 2006 - 7:03 am UTC
I'm not going to comment on everything - It isn't in the documention, I personally cannot say anything beyond "so sorry" to you.
And how much of a wait is this for you - you give absolutely no context here "abz"
More on SQL*Net break/reset to client
John Feibusch, November 06, 2006 - 3:13 am UTC
There've been some questions about "SQL*Net break/reset to client", and the description in the doc seems pretty unhelpful. It seems to me that this event occurs when a SQL statement execution generates an error, or when an interactive user aborts execution of a SQL statement with CTRL-C (which is really treated like an error in Oracle).
I've done 10046 traces and seen this event occur whenever a SQL error occurs.
That would mean that if you see a sudden increase in the number of waits on this event and it seems to be causing performance issues, you should see if one of your apps is getting a lot of SQL errors.
Even class "others"
A reader, December 20, 2006 - 1:54 pm UTC
Tom,
We are on 10gr2, RAC.
If I have to write a query to find out: Database time spent waiting (%) for event class "Others", which v$ view(s) would I use?
The reason I am asking is that grid control is constantly reporting that database time spent waiting (%) for event class "Other" is more than 40 throughout the day and often reaching 100%. This is happening on several databases, some of which have practically no activity during the day. I wrote the following queries to find out what % of wait is attributed to "Other":
select (sum(decode(wait_class#,0,time_waited,0))/sum(time_waited))*100 from gv$session_wait_class;
select (sum(decode(wait_class#,0,time_waited,0))/sum(time_waited))*100 from gv$system_wait_class;
The results were less than 0.1% in both cases throughout the day. The AWR/ADDM reports do not report any waits on "Other". I opened a SR with Oracle support 2 months ago but they are still unable to give me a resolution.
My thought is that if a database has 40%-100% waits on "Other" throughout the day, it will be a database where no work could be done. This is not the case. The databases are running fine.
Can you please help me on this issue? I understand that working with support and asking a question on this site are a duplication of effort, but you can understand my frustration when I do not get any resolution for 2 months and everyday I get 200-300 e-mail messages from grid control about these bogus waits.
Thanks
December 21, 2006 - 3:55 pm UTC
I asked around and got back:
If the query is reporting only a small value but EM is alerting because it believes that the value is high, then there must be a problem somewhere.
I haven't seen it at any other sites though.
As a workaround, they could try setting the alert threshold to 100 in EM to avoid the emails.
Event class "other"
A reader, December 21, 2006 - 6:37 pm UTC
Tom,
Thanks very much for replying. Oracle support did get back to me today and they are using the v$waitclassmetric view to get the information. The query is:
SELECT TO_CHAR(CAST(wcd.end_time AS TIMESTAMP),
'YYYY-MM-DD HH24:MI:SS TZD') time,
'wait_sess_cls', label, wcn.wait_class, 0, wcd.value
FROM (SELECT DISTINCT wait_class, wait_class_id
FROM v$event_name) wcn,
(SELECT end_time, wait_class_id,
'avg_users_waiting_on_class' as label
,average_waiter_count as value
FROM v$waitclassmetric
UNION ALL
SELECT end_time, wait_class_id,'dbtime_waitclass_pct' as label
, dbtime_in_wait as value
FROM v$waitclassmetric) wcd
WHERE wcn.wait_class_id = wcd.wait_class_id;
I am bit confused about the v$waitclassmetric view and the way it is being used in the query.
From the label, I can infer that the query is treating dbtime_in_wait as a percentage. From the Oracle reference manual, dbtime_in_wait is defined as "Database time spent in the wait". Now the documentation does not mention whether this is a percentage or actual time in cs. If the column is a percentage, then yes this view is showing values anywhere from 12% to 100% and I will have to open SR with RDBMS group since the numbers do not match what I get from other wait event views. If this column is just actual time in cs, then its interpretation in the query is wrong and it is a grid control issue.
I did try setting the threshold to 100%. It slows down the e-mails but does not get rid of them altogether because most times this number is at 100 for one instance or the other.
I will follow up with support. I thank you very much and really appreciate your taking time to respond back.
Still no info on all waits.
Evan, February 19, 2007 - 7:33 pm UTC
This was a great thread, but not exactly what I need.
I have a RAC with "reliable message" waits and a bunch of other, unknown to me, events. I just want to know what it is and how to tune it if needed.
Searching on Metalink and Google keeps showing it in the "list of events", but I never find a definition. How is a guy supposed to find these obscure waits?
Thanks
February 19, 2007 - 9:26 pm UTC
well, first I would ask myself "are my applications waiting on this" - tkprof would be the tool to use.
Clarity on Time out
A reader, June 13, 2007 - 9:24 am UTC
Hi tom ,
The v$system_event has a column named timed_out.
I read the reference manual where this is defined as
TOTAL_TIMEOUTS: Number of times the wait timed out
What exaclty is the meaning of this?
what happens when a wait is timed out.
Are all the waits timed out?
Does this mean that when a wait it timed out it goes on a sleep mode ?
Please can you elaborate the time out with an example.
Thanks in advance
June 13, 2007 - 1:57 pm UTC
some waits are waited for only for some period of time - then we time out - look around - make sure everything is OK - and go back to waiting.
Eg:
if you wait on an enqueue (to lock a row for example), you will wait 3 seconds, time out, wait 3 seconds, time out and so on - until the row becomes available.
Time out
RAM, June 13, 2007 - 2:28 pm UTC
Tom,
Now Lets say we have 2 sessions waiting on enqueue then the total_waits would be 2 now after waiting one of the session got hold of what it was waiting and the other session continues to wait and if this exceeds 3 seconds this times out and again comes back on wait which means that the TOTAL_WAITS is now 3 and the timed_out is 1.
And the total_wait_time would be 3*2=6 for the said duration
Is this correct?
Many Thanks
June 13, 2007 - 3:31 pm UTC
total wait time would be > 6 since one session waited and then got the resource and the other session waited 3 seconds twice.
buffer busy waits
Deepak, June 21, 2007 - 7:33 am UTC
Hi Tom,
We are encountering a lot of waits on "undo header". I believe it is same as "buffer busy waits". Am I correct?
And as we can not do much about automatic UNDO management, would like to know what best can be done to avoid this problem.
June 21, 2007 - 10:49 am UTC
if it were the same, they would be named the same.
what is "a lot", how are you measuring, and is it affecting the application you are actually concerned about tuning.
UNDO HEADER
Deepak, June 21, 2007 - 11:10 am UTC
Thanks a lot for your answer.
Actually am doing a schema import. And when queried the v$waitstat dyanamic performance view, it is showing that oracle is waiting for "undo header".
Is it normal?
Just wanted to know if we can do anything to reduce the figure. Just out of curiousity.
June 21, 2007 - 11:22 am UTC
if you are the only thing going - there will not really be any waits.
however, v$waitstat is cumulative from the beginning of the instance. if you want to reduce it, bounce the instance :)
and sigh, still no indication of how long a piece of string you believe you have.
is truncate a transaction ?
A reader, June 29, 2007 - 3:11 pm UTC
is truncate a transaction ?
July 03, 2007 - 8:44 am UTC
Transaction defined (copied from Expert Oracle Database Architecture)
<quote>
Transactions
Transactions are one of the features that set a database apart from a file system. In a file system, if you are in the middle of writing a file and the operating system crashes, this file is likely to be corrupted. It is true there are ¿journaled¿ file systems and the like, which may be able to recover your file to some point in time. However, if you need to keep two files synchronized, such as system won¿t help you there¿if you update one file, and the system fails before you finish updating the second, then you will have out-of-sync files.
This is the main purpose of transactions in the database; they take the database from one consistent state to the next. That is their job. When you commit work in the database, you are assured that either all of your changes have been saved or none of them is saved. Furthermore, you are assured that the various rules and checks that protect data integrity are implemented.
In the previous chapter we discussed transactions in terms of concurrency control and how, as a result of Oracle¿s multi-versioning read-consistent model, Oracle transactions can provide consistent data every time, under highly concurrent data access conditions. Transactions in Oracle exhibit all of the required ACID characteristics. ACID is an acronym for
* Atomicity: Either all of a transaction happens or none of it happens.
* Consistency: A transaction takes the database from one consistent state to the next.
* Isolation: The effects of a transaction may not be visible to other transactions until the transaction has committed.
* Durability: Once the transaction is committed, it is permanent.
</quote>
TRUNCATE is DDL
DDL is like a stored procedure - it is typically dozens or hundreds of statements executed as a single transaction.
Think of truncate as being logically equivalent to this plsql code:
begin
COMMIT; -- all DDL starts by closing any open tranaction
do_truncate(); -- do the DDL
COMMIT; -- all DDL ends by committing when successful
EXCEPTION
when others then
ROLLBACK; -- all unsuccessful DDL rolls back anything it started doing
RAISE; -- and re-raises the error
END;
sql*net connection tracing
Reader, August 10, 2007 - 6:14 pm UTC
Hi Tom,
We have been receiving complaints from the application team regarding slowness in database. we have enabled session tracing and evaluated the trace files. The elapsed time for all the sql statements seem to be acceptable. But its taking so much of time for the output to be displayed on front end.
There are wait events sql*net message from client and sql*net message to client for each statements.
Do you what could be the issue?
How do we fix this?
Can you please let me know how to trace the sql*net connections and operations associated with it?
I would really appreciate you advice.
Thanks..
August 14, 2007 - 3:27 pm UTC
sqlnet message from client.
we are WAITING FOR THE FRONT END TO TELL US TO DO SOMETHING.
you have just shown to the application team that - well - problem is not database, problem apparently is their code.
The data is coming from the database fast - we are spending most of our time in the database waiting to be told to do something. Problem is not in database in all likelihood
control file heartbeat??
Hector Gabriel Ulloa Ligarius, August 22, 2007 - 10:53 am UTC
Hi Team Tom
What's that event?? "control file hearbeat"
The code :
col event for a60
col total_waits for 999,999,999,999
col total_timeouts for 999,999,999,999
col time_waited for 999G999G999G999
col average_wait for 999,999,999,999
select a.* ,
b.startup_time
from v$system_event a,
v$instance b
order by a.time_waited;
Result :
EVENT : control file heartbeat
TOTAL_WAITS : 1
TOTAL_TIMEOUTS : 1
TIME_WAITED : 430
AVERAGE_WAIT : 430
TIME_WAITED_MICRO : 4296704
STARTUP_TIME : 19-07-2007 02:07:57
TOTAL_WAITED is
430 seconds??
Is very high???
Thank you very much
Regards
Hector Gabriel Ulloa Ligarius
http://ligarius.wordpress.com
August 22, 2007 - 2:15 pm UTC
it is an event waited on by a background process in a clustered environment - it is normal to have it.
But what information can we find - not much
Evan, November 19, 2007 - 5:08 pm UTC
Tom,
You said "it is an event waited on by a background process in a clustered environment - it is normal to have it."
I wish you had more insight than I get elsewhere.
You say it is a background process, but not anything more. I would love for Oracle to provide a web page with all of the wait states AND WHAT THEY MEAN!!!
This is an ongoing problem. I need to know about wait state "xyz". When we search metalink, there is little to no information. When we google, the text we search is listed, but no information about it.
It is a shame there is no list of ALL wait events and what they mean for RAC.
I could give a list of the most obscure, but I would rather you get an address or dig up a listing of all of them for us users.
You must hear this all the time, but I really appreciate what you do for us all.
Thanks,
Evan
jobq slave waits...
VLS, March 13, 2008 - 10:52 am UTC
Hi Tom,
In one of the post above, you mentioned that jobq slave waits are idle and should be ignored, which is fine. But I have seen a strange behaviour on one of a 9205 database.
The parameter job_queue_processes are defined as 100.
Sometimes (almost 2-3 times a week), we observe almost 80-90 sessions waiting on jobq slave waits and all these are active sessions. Surprisingly, when we query for the sql that is being executed by these waiting sessions, the sql's are purely OLTP Application Queries and not a query that should be a part of job. Secondly, querying dba_jobs_running shows only 2 or 3 jobs running.
How is it possible ? Need your help in this.
Regards
March 15, 2008 - 8:40 am UTC
sorry, but I believe you mis-diagnosed something.
our dedicated and shared server processes do not wait on this event. The Background snpN or jNNN processes do.
so, maybe you are joining wrong?
rdbms ipc reply event debug
David, April 11, 2008 - 2:20 pm UTC
Tom,
I'm trying to debug the weird hangs for my truncate statements - they might hand for 10-20 minutes
(without any user locks blocking from another session)
and then all release at once.
I join V$SESSION with V$SESSION_WAIT and it shows that truncates seem to line up waiting for "rdbms ipc reply"
from process 5, Process with SID=5 is "smon timer".
Not sure what this could be waiting for... even log sync or buffer writes never take 10-20 minutes on our system :)
SELECT substr(module,1,28) as module,
EVENT,
case
when LAST_CALL_ET < 120 then LAST_CALL_ET||' seconds'
when LAST_CALL_ET < 3600 then round(LAST_CALL_ET/60,1)||' minutes'
when LAST_CALL_ET < 3600*24 then round(LAST_CALL_ET/60/60,1)||' hours'
else round(LAST_CALL_ET/60/60/24,1)||' days'
end LAST_CALL,
P1TEXT,
P1,
P2TEXT,
P2
FROM V$SESSION_WAIT JOIN V$SESSION s using (sid)
WHERE s.status = 'ACTIVE' and schemaname <> 'SYS'
order by 3 ;
MODULE EVENT LAST_CALL P1TEXT P1 P2TEXT P2
----------------------- ------------------ ----------- ------------- ---------- -------- ----------
SQL*Plus log file sync 0 seconds buffer# 3775 0
MYSELF(DEBUGGER) SQL*Net message to 0 seconds driver id 1650815232 #bytes 1
DBSync: _EDU.ADM_PUBLIS buffer busy waits 2.6 minutes file# 2 block# 2
DATA_COLLECTOR Truncate rdbms ipc reply 4.3 minutes from_process 5 timeout 21474812
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 2.9 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 7.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 2.9 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 76 seconds name|mode 1128857606 id1 1
REFRESH_SUM_PHP enqueue 4.2 minutes name|mode 1128857606 id1 1
DBSync: Disable constr enqueue 71 seconds name|mode 1128857606 id1 1
April 11, 2008 - 3:34 pm UTC
I would guess that you have lots of dirty buffers - and we are waiting for the necessary checkpoint that is issued for the truncate to complete - and the rest of the truncates are waiting on the first truncate.
So, do you see lots of write activity when this happens - if so, it is the large amount of dirty buffers that must be written to disk before the truncate can safely happen.
You can use something like fast_start_mttr_target to have the cache be kept "more clean" (limit the number of blocks that need checkpointing at any point in time) or just realize that the truncate is not an instantaneous thing - it might take a while to complete - depending on the amount of work to be done before it can be safely executed.
Dirty Buffers/IO bottleneck
David, April 14, 2008 - 4:01 pm UTC
Tom,
thank you very much - this is the problem.
I cleared some secondary issues on the instance, set fast_start_mttr_target to 60, added another DBRW process and clearly see now by iostat utilization that everything comes to halt when it lears the dirty buffers:
Normal state:
TARGET_MTTR EST_MTTR ACTUAL_REDO_BLKS RECOVERY_ESTD_IOS
----------- -------- ---------------- -----------------
60 59 232671 7702
when everything hangs:
TARGET_MTTR EST_MTTR ACTUAL_REDO_BLKS RECOVERY_ESTD_IOS
----------- -------- ---------------- -----------------
60 601 1021561 87394
Thanks a lot!
SQL*Net more data from dblink
Chaman, May 09, 2008 - 7:29 am UTC
Tom,
One of the query seems to be hanging while selecting data over the dblink and we get the wait event "SQL*Net more data from dblink".
Does increasing PGA in remote db will help me? Can you please tell me any other reason for this ?
May 12, 2008 - 10:29 am UTC
... Does increasing PGA in remote db will help me? ...
yes
no
maybe
Really - it totally depends on what is going on on the remote site doesn't it. Increasing the PGA will not help a query like "select * from t" - faster IO might, but PGA - not likely.
There are *so many* other reasons for this that one cannot enumerate them. You'll really need to dig a little bit to see what might be the bottleneck on the remote site. What is IT doing.
response time analysis
karthik, August 12, 2008 - 5:17 am UTC
Tom,
We are doing production support and have to analyze the statspack report periodically during the batch runs and high usage of database(peak hour).We have both top 5 wait events and top sql statements.
1,Do all top wait events(like sequential read) related to application has a significant impact over response time?To quote an example,my wait time for sequential reads may be high but how can i conclude that is the bottleneck for my application---or--how can i associate an particular session with the listed top 5 wait events which is having a performance problem.
2,If there are background processes with high wait time reported,how can i relate to an application that has an unusual response time?
3,You may suggest me to do an application level tuning to tune the response time. But what evidence related to tuning an application i can find from statspack report.(my above 2 questions are based on this only).
August 13, 2008 - 4:20 am UTC
1) you cannot, not really from a statspack, unless you identify all of your sql (not just the top) - and can see that it does lots of the IO...
You'd have to use application level tracing (active session history - ASH - in 10g would be fairly useful there)
2) you have to see what the foregrounds are waiting on. for example - if you see log writer is waiting a lot on log file parallel write, but, you do not see the foreground processes waiting on log file sync - you don't really care that log writer is waiting on that - log writer will do lots of IO - but unless end users are waiting for commits to finish - you do not really care if log writer is waiting on that IO. You care about foreground wait events mostly - and if you have them, you might look at the background wait events to see if they are a cause (eg: if you see log file sync waits, your end users wait for that, you might look to see if lgwr is bottlenecked on IO or if there is some other cause)
3) statspack is an aggregate at the system level, it is fairly hard truly to use statspack for application level tuning. If your goal is to improve the response time of application X out of a set of applications A-Z, you'll need really to look at X, not averages over A-Z
Consistent Read Calculation
Arvind Mishra, December 01, 2008 - 12:34 am UTC
Hello Tom,
I read your book on performance tuning and understood the importance of consistent read. But how can calculate or estimate the minimum required consistent read value for a query? I need this because even though I get this value but I am not sure whether it is the minimum required value for this query or query is doing more than required consistent read.
Thanks,
Arvind
December 01, 2008 - 6:44 am UTC
... But how can calculate or estimate the minimum required
consistent read value for a query? ...
using explain plan, it is basically what it is doing. The cost is roughly equivalent to IO's (not entirely, in 10g and above - cpu costing is involved as well)
or, by closing your eyes and envisioning how it works (I am being *dead serious* here, I am not kidding)
Think about this - ok, take a query:
select * from emp where empno = ?;
Ok, what do you know about this table.
a) empno is primary key
b) index exists on empno
c) table was created with reasonable pctfree
d) table has a few thousand records.
So, what would be the MINIMUM (and maximum) IO's we would expect....
Well, we'd have to guess the index on empno probably is height 2 (a few thousand records only - just a root and then leaf blocks in index). We'd guess that rows were not migrated in the table so, the minimum IO's would be:
2(index reads) + 1(table access by index rowid read) = 3 IO's
What would the max be:
4(index reads - if index was really tall for some reason) + 2(table access by index rowid reads - the row was migrated and the first IO did not hit it) = 6 IO's
so, that query would be between 3 and 6 IO's
in a single user system with no concurrent updates. If people were modifying this data - then the IO's might go up as we read undo....
What about a query like:
select * from emp where ename like ?;
Now, that one is sort of "impossible". What would you need to KNOW?
a) how many rows are returned by this query. You cannot, you do not know what ? is
b) whether a full scan or index would be used
c) if index is used, what is clustering factor of the index (is the data stored on disk in the same order as the keys in the index - less IO will result - if not most IO will result)
d) what array fetch size does the application use?
You would use your knowledge of what is REASONABLE for that query - truly. You would have to assign some common sense numbers to those questions.
a) I estimate that about 100 records will be returned on average
b) I have an index in place and expect it to be used
c) The clustering factor of ename (search this site for "clustering factor" if those words mean nothing to you) is 'poor' since ename's arrive randomly over time, not sequentially.
d) I use an array fetch of 100 rows - that is a very nice number.
Therefore, I'd guess 5 IO's through the index (root+branch+3 leaves to get 100 entries) + 100 table blocks - since the data is spread out - so, about 105 IO's
The array fetch size will not have a considerable effect - since the data is randomly spread out.
Now what if the query were:
select * from emp where hiredate between ? and ?
you would need to know
a) how many rows...
b) whether full scan or index
c) clustering factor if index
Let us assume:
a) about 100 rows again
b) uses index
c) clustering factor is VERY good since hiredate is sequential and therefore, the data in the table is actually very well sorted by the index key in this case (that is, everyone hired in Aug-2008 would be "close" since they are inserted sequentially and we tend to not delete from EMP - just insert over time - and the inserts TEND to go to the 'end' of the table)
d) I'd use an array fetch size of 100, and this will have great effect since the data is very clustered
So, now we guess 5 IO's through the index + 3 IO's against the table (since all 100 employee records would be close to each other) or about 8 IO's
both queries return 100 records
both queries use an index to do this
one would be expected to consume 105+ IO's
the other would be expected to consume about 8 IO's
but you need to have that knowledge - the knowledge of how the data is stored, the data distributions, how the data is processed by Oracle and so on...
ops$tkyte%ORA11GR1> /*
ops$tkyte%ORA11GR1> drop table t;
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create table t
ops$tkyte%ORA11GR1> as
ops$tkyte%ORA11GR1> select object_name ename, to_date('01-jan-2000','dd-mon-yyyy')+rownum hiredate,
ops$tkyte%ORA11GR1> x.*
ops$tkyte%ORA11GR1> from (
ops$tkyte%ORA11GR1> select *
ops$tkyte%ORA11GR1> from stage
ops$tkyte%ORA11GR1> order by dbms_random.random
ops$tkyte%ORA11GR1> ) x
ops$tkyte%ORA11GR1> /
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create index ename_idx on t(ename);
ops$tkyte%ORA11GR1> create index hiredate_idx on t(hiredate);
ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'T' );
ops$tkyte%ORA11GR1> */
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select ui.index_name, ui.clustering_factor,
2 ut.blocks, ut.num_rows
3 from user_indexes ui,
4 user_tables ut
5 where ui.table_name = 'T'
6 and ut.table_name = 'T'
7 /
INDEX_NAME CLUSTERING_FACTOR BLOCKS NUM_ROWS
------------------------------ ----------------- ---------- ----------
ENAME_IDX 67481 1332 67522
HIREDATE_IDX 1302 1332 67522
<b>data is well clustered by hiredate (cluster factor near number of blocks tells us that) but not ename (cluster factor near number of rows tells us that)</b>
ops$tkyte%ORA11GR1> set autotrace traceonly statistics
ops$tkyte%ORA11GR1> set arraysize 100
ops$tkyte%ORA11GR1> select * from t where ename like 'UT%';
105 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
110 consistent gets
0 physical reads
0 redo size
13320 bytes sent via SQL*Net to client
431 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
105 rows processed
ops$tkyte%ORA11GR1> select * from t where hiredate between
2 to_date( '01-jan-2005', 'dd-mon-yyyy') and
3 to_date( '01-jan-2005', 'dd-mon-yyyy')+100
4 /
101 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
9 consistent gets
0 physical reads
0 redo size
15626 bytes sent via SQL*Net to client
431 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed
<b>My guesses were very accurate (and yes, trust me, they were GUESSES - I got lucky, I did the example after guessing, you can too eventually...)</b>
ops$tkyte%ORA11GR1> set arraysize 2
ops$tkyte%ORA11GR1> select * from t where ename like 'UT%';
105 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
161 consistent gets
0 physical reads
0 redo size
19916 bytes sent via SQL*Net to client
992 bytes received via SQL*Net from client
54 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
105 rows processed
ops$tkyte%ORA11GR1> select * from t where hiredate between
2 to_date( '01-jan-2005', 'dd-mon-yyyy') and
3 to_date( '01-jan-2005', 'dd-mon-yyyy')+100
4 /
101 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
105 consistent gets
0 physical reads
0 redo size
21996 bytes sent via SQL*Net to client
970 bytes received via SQL*Net from client
52 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
101 rows processed
ops$tkyte%ORA11GR1> set autotrace off
ops$tkyte%ORA11GR1>
<b>just to show the profound effect arraysize MIGHT have - not WILL have, MIGHT have - on your IO's</b>
OBI array size
Arvind Mishra, December 01, 2008 - 7:06 pm UTC
Hello Tom,
Thanks Tom. Now I am able to understand it completely. You are a very good teacher indeed :-). Can you tell me how to set array size in OBI(Earlier known as Sibel)
Regards,
Arvind
December 02, 2008 - 7:29 am UTC
talk to the Siebel people to see how/if they do it. I do not work with much of the middle tier stuff myself.
Performance book?
A reader, December 02, 2008 - 11:27 am UTC
"I read your book on performance tuning and understood the importance of consistent read." sorry Tom, but which book he is talking about ?
December 02, 2008 - 11:35 am UTC
I think he most likely meant "effective oracle by DESIGN"
which is less about "turning" (that is reactionary, after the fact) and more about designing, measuring and testing up front...
Thanks.
A reader, December 02, 2008 - 3:17 pm UTC
Book Name
Arvind Mishra, December 03, 2008 - 12:23 am UTC
Yes..It is 'Effective Oracle by Design'.
Thanks,
Arvind
Performance Tuning
A Reader, December 09, 2008 - 8:55 am UTC
Tom, which is the best book for performance Tuning Guide.
December 09, 2008 - 2:17 pm UTC
the book of life.
seriously - you need a smattering of lots of bits of knowledge (no one book has it all) and an understanding of how things work.
there is no single 'best' book.
read as many as you can.
question those that just tell you what to do - NOT WHY you are doing something. In fact, do more than question - benchmark and evaluate what they say so you can see just how wrong it usually is.
enjoy those that show you what you should be seeing when you do something - so that you can gain an understanding of WHY you are doing what you are doing (tuning wise).
A reader, December 09, 2008 - 4:30 pm UTC
I guess what Tom is saying that to do performance tuning, one must first understand what is the root cause of problem. Keep digging till root cause is identified and then do the tuning. The process of finding the root cause and fixing it will stimulate learning, either through books or Asktom website or any other means. The learning process will be the true "book of tuning".
My thought is that this not only applicable to tuning, but to everything we do on a daily basis whether it is DBA work or development work or anything else.
December 10, 2008 - 9:14 am UTC
... one must first
understand what is the root cause of problem. ...
exactly. You wouldn't like it if you Doctor said "Ok, we do not really need an appointment to figure out what is wrong - many times when I prescribe antibiotics - the problem goes away, so take these pills and see how you make out with them"
Tuning.
A Reader, September 29, 2009 - 12:29 pm UTC
Many thanks Tom,
for your time.
Database version 9.2.0.8
Optimizer Mode : First_rows
From 20 mins statspack:
********************************************************************************
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
650,274 1 650,274.0 2.1 30.83 192.91 417649475
Module: java_q4p@hostName (TNS V1-V3)
SELECT * FROM ( SELECT T.objid,T.x_request_date FROM view_T T WHERE ( T.x_parent_objid = :P16 ) ORDER BY 2 ASC ) WHERE ROWNUM <= 200
********************************************************************************
The view_T definition:
CREATE OR REPLACE VIEW VIEW_T
(OBJID, DEV, X_DT_OBJID, X_PARENT_OBJID, X_PARENT_ID,
X_CASE_OBJID, X_CASE_ID, X_OUTCOME, X_DTR, X_DTR_TEXT,
X_TEST_INST_ID, X_TEST_ID, X_EQUIPMENT_ID, X_REQUEST_DATE, X_MFL,
X_COMPLETION_CODE, X_TEST_STOP_TIME)
AS
(select rownum as objid,
-1 as dev,
dt.objid as x_dt_objid,
pc.objid as x_parent_objid,
pc.id_number as x_parent_id,
c.objid as x_case_objid,
c.id_number as x_case_id,
dt.x_outcome as x_outcome,
dt.x_dtr as x_dtr,
dt.x_dtr_text as x_dtr_text,
dt.x_test_inst_id as x_test_inst_id,
dt.x_test_id as x_test_id,
dt.x_equipment_id as x_equipment_id,
mr.x_date_sent as x_request_date,
dt.x_mfl as x_mfl,
dt.x_completion_code as x_completion_code,
dt.x_test_stop_time as x_test_stop_time
FROM T1 pc,
T1 c,
T2 dt,
T3 mr
where pc.objid(+) = c.case_victim2case
and c.objid = dt.x_focus_id
and dt.x_focus_type = 0
and dt.x_diag_test2x_msg_req = mr.objid(+)
and dt.x_outcome not like 'Fail -%'
union
select rownum as objid,
-1 as dev,
dt.objid as x_dt_objid,
c.objid as x_parent_objid,
c.id_number as x_parent_id,
c.objid as x_case_objid,
c.id_number as x_case_id,
dt.x_outcome as x_outcome,
dt.x_dtr as x_dtr,
dt.x_dtr_text as x_dtr_text,
dt.x_test_inst_id as x_test_inst_id,
dt.x_test_id as x_test_id,
dt.x_equipment_id as x_equipment_id,
mr.x_date_sent as x_request_date,
dt.x_mfl as x_mfl,
dt.x_completion_code as x_completion_code,
dt.x_test_stop_time as x_test_stop_time
FROM T1 c,
T2 dt,
T3 mr
where c.objid = dt.x_focus_id
and dt.x_focus_type = 0
and dt.x_diag_test2x_msg_req = mr.objid(+)
and dt.x_outcome not like 'Fail -%'
);
********************************************************************************
Explain Plan of query:
SELECT * FROM ( SELECT T.objid,T.x_request_date FROM view_T T WHERE ( T.x_parent_objid = :P16 )
ORDER BY 2 ASC ) WHERE ROWNUM <= 200
SQL> set linesize 200
SQL> /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 4400 | | 48982 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 169K| 3636K| | 48982 |
|* 3 | SORT ORDER BY STOPKEY | | 169K| 5785K| 15M| 48982 |
|* 4 | VIEW | VIEW_T | 169K| 5785K| | 47385 |
| 5 | SORT UNIQUE | | 169K| 24M| 52M| 47385 |
| 6 | UNION-ALL | | | | | |
| 7 | COUNT | | | | | |
| 8 | MERGE JOIN OUTER | | 84638 | 13M| | 21985 |
| 9 | SORT JOIN | | 84638 | 12M| 29M| 6883 |
|* 10 | HASH JOIN | | 84638 | 12M| 10M| 4032 |
|* 11 | HASH JOIN OUTER | | 61471 | 2941K| 3691K| 2781 |
| 12 | TABLE ACCESS FULL| T1 | 61471 | 1560K| | 1252 |
| 13 | TABLE ACCESS FULL| T1 | 61471 | 1380K| | 1252 |
|* 14 | TABLE ACCESS FULL | T2 | 84638 | 8348K| | 207 |
|* 15 | SORT JOIN | | 785K| 9976K| 36M| 15102 |
| 16 | TABLE ACCESS FULL | T3 | 785K| 9976K| | 11321 |
| 17 | COUNT | | | | | |
|* 18 | HASH JOIN | | 84638 | 11M| 9048K| 19682 |
| 19 | TABLE ACCESS FULL | T1 | 61471 | 1380K| | 1252 |
| 20 | MERGE JOIN OUTER | | 84638 | 9422K| | 17298 |
| 21 | SORT JOIN | | 84638 | 8348K| 21M| 2196 |
|* 22 | TABLE ACCESS FULL | T2 | 84638 | 8348K| | 207 |
|* 23 | SORT JOIN | | 785K| 9976K| 36M| 15102 |
| 24 | TABLE ACCESS FULL | T3 | 785K| 9976K| | 11321 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 - filter(ROWNUM<=200)
3 - filter(ROWNUM<=200)
4 - filter("T"."X_PARENT_OBJID"=16)
10 - access("C"."OBJID"="DT"."X_FOCUS_ID")
11 - access("PC"."OBJID"(+)="C"."CASE_VICTIM2CASE")
14 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
15 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
18 - access("C"."OBJID"="DT"."X_FOCUS_ID")
22 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
23 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
Note: cpu costing is off
48 rows selected.
SQL>
********************************************************************************
tkprof output.( I ran the sql identified from statspack in a separate session, with value of :P16 as 16 )
SELECT * FROM ( SELECT T.objid,T.x_request_date
FROM view_T T
WHERE ( T.x_parent_objid = 16 ) ORDER BY 2 ASC ) WHERE ROWNUM <= 200
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 25.10 313.88 512597 561247 209 4
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 25.11 313.89 512597 561249 209 4
Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 42 (SA)
Rows Row Source Operation
------- ---------------------------------------------------
4 COUNT STOPKEY
4 VIEW
4 SORT ORDER BY STOPKEY
4 VIEW
222568 SORT UNIQUE
222568 UNION-ALL
111284 COUNT
111284 MERGE JOIN OUTER
111284 SORT JOIN
111284 HASH JOIN
73011 HASH JOIN OUTER
73011 TABLE ACCESS FULL T1
73011 TABLE ACCESS FULL T1
111284 TABLE ACCESS FULL T2
99111 SORT JOIN
954900 TABLE ACCESS FULL T3
111284 COUNT
111284 HASH JOIN
73011 TABLE ACCESS FULL T1
111284 MERGE JOIN OUTER
111284 SORT JOIN
111284 TABLE ACCESS FULL T2
99111 SORT JOIN
954900 TABLE ACCESS FULL T3
********************************************************************************
So, in order to read 4 rows output we are hitting above 60k rows (ID- 12,13 of the explain plan). However filter("T"."X_PARENT_OBJID"=16) is done at Id-4 of the explain plan.
column x_parent_objid is indexed.
Questions:
a) Why optimizer is not choosing this filter ( "T"."X_PARENT_OBJID"=16 ) in the begining so that initial rowcount hit is reduced . hence less IO.. less work?
b) There are lot PIO. On table T1 looks it is due to the following predicate:
where pc.objid(+) = c.case_victim2case
Here. why hash outer join on table T1 not going for Index Scan as pc.objid ( Unique index) and c.case_victim2case (non unique index) are indexed?
c)Any other way we can re-write the query?
Regards
Tuning.
A Reader, September 29, 2009 - 12:35 pm UTC
Tom,
reposting the explain plan & tkprof o/p for the above question for better redability.
SQL> set linesize 200
SQL> /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 4400 | | 48982 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 169K| 3636K| | 48982 |
|* 3 | SORT ORDER BY STOPKEY | | 169K| 5785K| 15M| 48982 |
|* 4 | VIEW | VIEW_T | 169K| 5785K| | 47385 |
| 5 | SORT UNIQUE | | 169K| 24M| 52M| 47385 |
| 6 | UNION-ALL | | | | | |
| 7 | COUNT | | | | | |
| 8 | MERGE JOIN OUTER | | 84638 | 13M| | 21985 |
| 9 | SORT JOIN | | 84638 | 12M| 29M| 6883 |
|* 10 | HASH JOIN | | 84638 | 12M| 10M| 4032 |
|* 11 | HASH JOIN OUTER | | 61471 | 2941K| 3691K| 2781 |
| 12 | TABLE ACCESS FULL| T1 | 61471 | 1560K| | 1252 |
| 13 | TABLE ACCESS FULL| T1 | 61471 | 1380K| | 1252 |
|* 14 | TABLE ACCESS FULL | T2 | 84638 | 8348K| | 207 |
|* 15 | SORT JOIN | | 785K| 9976K| 36M| 15102 |
| 16 | TABLE ACCESS FULL | T3 | 785K| 9976K| | 11321 |
| 17 | COUNT | | | | | |
|* 18 | HASH JOIN | | 84638 | 11M| 9048K| 19682 |
| 19 | TABLE ACCESS FULL | T1 | 61471 | 1380K| | 1252 |
| 20 | MERGE JOIN OUTER | | 84638 | 9422K| | 17298 |
| 21 | SORT JOIN | | 84638 | 8348K| 21M| 2196 |
|* 22 | TABLE ACCESS FULL | T2 | 84638 | 8348K| | 207 |
|* 23 | SORT JOIN | | 785K| 9976K| 36M| 15102 |
| 24 | TABLE ACCESS FULL | T3 | 785K| 9976K| | 11321 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 - filter(ROWNUM<=200)
3 - filter(ROWNUM<=200)
4 - filter("T"."X_PARENT_OBJID"=16)
10 - access("C"."OBJID"="DT"."X_FOCUS_ID")
11 - access("PC"."OBJID"(+)="C"."CASE_VICTIM2CASE")
14 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
15 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
18 - access("C"."OBJID"="DT"."X_FOCUS_ID")
22 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
23 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
Note: cpu costing is off
48 rows selected.
SQL>
********************************************************************************
tkprof output.
SELECT * FROM ( SELECT T.objid,T.x_request_date
FROM view_T T
WHERE ( T.x_parent_objid = 16 ) ORDER BY 2 ASC ) WHERE ROWNUM <= 200
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 25.10 313.88 512597 561247 209 4
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 25.11 313.89 512597 561249 209 4
Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 42 (SA)
Rows Row Source Operation
------- ---------------------------------------------------
4 COUNT STOPKEY
4 VIEW
4 SORT ORDER BY STOPKEY
4 VIEW
222568 SORT UNIQUE
222568 UNION-ALL
111284 COUNT
111284 MERGE JOIN OUTER
111284 SORT JOIN
111284 HASH JOIN
73011 HASH JOIN OUTER
73011 TABLE ACCESS FULL T1
73011 TABLE ACCESS FULL T1
111284 TABLE ACCESS FULL T2
99111 SORT JOIN
954900 TABLE ACCESS FULL T3
111284 COUNT
111284 HASH JOIN
73011 TABLE ACCESS FULL T1
111284 MERGE JOIN OUTER
111284 SORT JOIN
111284 TABLE ACCESS FULL T2
99111 SORT JOIN
954900 TABLE ACCESS FULL T3
********************************************************************************
Tuning. ....
A Reader, October 07, 2009 - 10:38 am UTC
Many thanks Tom,
for your time.
Sorry I am reposting above question again... actually I gave all the details but missed the asking question previously ( error in copy/paste from my side). Also this question is best categorired under "SQL Tuning" section..I will take care of the same next time and make sure that I post the question to right category.
Database version 9.2.0.8
Optimizer Mode : First_rows
From 20 mins statspack:
********************************************************************************
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
650,274 1 650,274.0 2.1 30.83 192.91 417649475
Module: java_q4p@hostName (TNS V1-V3)
SELECT * FROM ( SELECT T.objid,T.x_request_date FROM view_T T WHERE ( T.x_parent_objid = :P16 ) ORDER BY 2 ASC ) WHERE ROWNUM <= 200
********************************************************************************
The view_T definition:
CREATE OR REPLACE VIEW VIEW_T
(OBJID, DEV, X_DT_OBJID, X_PARENT_OBJID, X_PARENT_ID,
X_CASE_OBJID, X_CASE_ID, X_OUTCOME, X_DTR, X_DTR_TEXT,
X_TEST_INST_ID, X_TEST_ID, X_EQUIPMENT_ID, X_REQUEST_DATE, X_MFL,
X_COMPLETION_CODE, X_TEST_STOP_TIME)
AS
(select rownum as objid,
-1 as dev,
dt.objid as x_dt_objid,
pc.objid as x_parent_objid,
pc.id_number as x_parent_id,
c.objid as x_case_objid,
c.id_number as x_case_id,
dt.x_outcome as x_outcome,
dt.x_dtr as x_dtr,
dt.x_dtr_text as x_dtr_text,
dt.x_test_inst_id as x_test_inst_id,
dt.x_test_id as x_test_id,
dt.x_equipment_id as x_equipment_id,
mr.x_date_sent as x_request_date,
dt.x_mfl as x_mfl,
dt.x_completion_code as x_completion_code,
dt.x_test_stop_time as x_test_stop_time
FROM T1 pc,
T1 c,
T2 dt,
T3 mr
where pc.objid(+) = c.case_victim2case
and c.objid = dt.x_focus_id
and dt.x_focus_type = 0
and dt.x_diag_test2x_msg_req = mr.objid(+)
and dt.x_outcome not like 'Fail -%'
union
select rownum as objid,
-1 as dev,
dt.objid as x_dt_objid,
c.objid as x_parent_objid,
c.id_number as x_parent_id,
c.objid as x_case_objid,
c.id_number as x_case_id,
dt.x_outcome as x_outcome,
dt.x_dtr as x_dtr,
dt.x_dtr_text as x_dtr_text,
dt.x_test_inst_id as x_test_inst_id,
dt.x_test_id as x_test_id,
dt.x_equipment_id as x_equipment_id,
mr.x_date_sent as x_request_date,
dt.x_mfl as x_mfl,
dt.x_completion_code as x_completion_code,
dt.x_test_stop_time as x_test_stop_time
FROM T1 c,
T2 dt,
T3 mr
where c.objid = dt.x_focus_id
and dt.x_focus_type = 0
and dt.x_diag_test2x_msg_req = mr.objid(+)
and dt.x_outcome not like 'Fail -%'
);
SQL> set linesize 200
SQL> /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 200 | 4400 | | 48982 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 169K| 3636K| | 48982 |
|* 3 | SORT ORDER BY STOPKEY | | 169K| 5785K| 15M| 48982 |
|* 4 | VIEW | VIEW_T | 169K| 5785K| | 47385 |
| 5 | SORT UNIQUE | | 169K| 24M| 52M| 47385 |
| 6 | UNION-ALL | | | | | |
| 7 | COUNT | | | | | |
| 8 | MERGE JOIN OUTER | | 84638 | 13M| | 21985 |
| 9 | SORT JOIN | | 84638 | 12M| 29M| 6883 |
|* 10 | HASH JOIN | | 84638 | 12M| 10M| 4032 |
|* 11 | HASH JOIN OUTER | | 61471 | 2941K| 3691K| 2781 |
| 12 | TABLE ACCESS FULL| T1 | 61471 | 1560K| | 1252 |
| 13 | TABLE ACCESS FULL| T1 | 61471 | 1380K| | 1252 |
|* 14 | TABLE ACCESS FULL | T2 | 84638 | 8348K| | 207 |
|* 15 | SORT JOIN | | 785K| 9976K| 36M| 15102 |
| 16 | TABLE ACCESS FULL | T3 | 785K| 9976K| | 11321 |
| 17 | COUNT | | | | | |
|* 18 | HASH JOIN | | 84638 | 11M| 9048K| 19682 |
| 19 | TABLE ACCESS FULL | T1 | 61471 | 1380K| | 1252 |
| 20 | MERGE JOIN OUTER | | 84638 | 9422K| | 17298 |
| 21 | SORT JOIN | | 84638 | 8348K| 21M| 2196 |
|* 22 | TABLE ACCESS FULL | T2 | 84638 | 8348K| | 207 |
|* 23 | SORT JOIN | | 785K| 9976K| 36M| 15102 |
| 24 | TABLE ACCESS FULL | T3 | 785K| 9976K| | 11321 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
1 - filter(ROWNUM<=200)
3 - filter(ROWNUM<=200)
4 - filter("T"."X_PARENT_OBJID"=16)
10 - access("C"."OBJID"="DT"."X_FOCUS_ID")
11 - access("PC"."OBJID"(+)="C"."CASE_VICTIM2CASE")
14 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
15 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
18 - access("C"."OBJID"="DT"."X_FOCUS_ID")
22 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
23 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
Note: cpu costing is off
48 rows selected.
SQL>
********************************************************************************
tkprof output.
SELECT * FROM ( SELECT T.objid,T.x_request_date
FROM view_T T
WHERE ( T.x_parent_objid = 16 ) ORDER BY 2 ASC ) WHERE ROWNUM <= 200
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 25.10 313.88 512597 561247 209 4
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 25.11 313.89 512597 561249 209 4
Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 42 (SA)
Rows Row Source Operation
------- ---------------------------------------------------
4 COUNT STOPKEY
4 VIEW
4 SORT ORDER BY STOPKEY
4 VIEW
222568 SORT UNIQUE
222568 UNION-ALL
111284 COUNT
111284 MERGE JOIN OUTER
111284 SORT JOIN
111284 HASH JOIN
73011 HASH JOIN OUTER
73011 TABLE ACCESS FULL T1
73011 TABLE ACCESS FULL T1
111284 TABLE ACCESS FULL T2
99111 SORT JOIN
954900 TABLE ACCESS FULL T3
111284 COUNT
111284 HASH JOIN
73011 TABLE ACCESS FULL T1
111284 MERGE JOIN OUTER
111284 SORT JOIN
111284 TABLE ACCESS FULL T2
99111 SORT JOIN
954900 TABLE ACCESS FULL T3
********************************************************************************
Questions:
a) Why filter in explain plan output ( 4 - filter("T"."X_PARENT_OBJID"=16) ) is not considered at step#12 & 13 in execution plan? as this would have reduces row source drastically in the begining stage.
I have access to your book "Effective Oracle by Desgin"- and it looks first entire result set is calculated then filter is applied.
b) Why indexes are not being considered in Id#12 & 13?
here only it goes for lots of PIOs
Indexing scheme:
pc.objid has unique index
c.case_victim2case has non unique index
c.objid has unqiue index
dt.x_diag_test2x_msg_req has non unique index
mr.objid has unique index
c) Any other way to optimise(rewrite) the query?
Regards
Tuning. ....
A Reader, October 07, 2009 - 10:42 am UTC
Sorry
Tom. It looks text is getting truncated after certain limit.
Posting the questions below for above.
Questions:
a) Why filter in explain plan output ( 4 - filter("T"."X_PARENT_OBJID"=16) ) is not considered at step#12 & 13 in execution plan? as this would have reduces row source drastically in the begining stage.
I have access to your book "Effective Oracle by Desgin"- and it looks first entire result set is calculated then filter is applied.
b) Why indexes are not being considered in Id#12 & 13?
here only it goes for lots of PIOs
Indexing scheme:
pc.objid has unique index
c.case_victim2case has non unique index
c.objid has unqiue index
dt.x_diag_test2x_msg_req has non unique index
mr.objid has unique index
c) Any other way to optimise(rewrite) the query?
Regards
asynch descriptor resize ?
A reader, June 28, 2010 - 11:10 am UTC
What is the meaning / cause of "asynch descriptor resize"? . I see this wait in my top list of waits and consuming 199361.53% DB time .
I looked into metalink . Not enough information.
I am on 11g R2 on windows 2008 64 bit.
Thanks
July 06, 2010 - 12:44 pm UTC
it cannot consume more than 100% of db time, numbers look wrong if it does. Please file a service request via support - if it consumes way more than 100% of time, something is wrong.
(and there is nothing you can do to offset this wait, short of "do not do direct IO" which in itself would become the larger bottleneck (lack of directio)
Oracle wait events
Shafi M., July 29, 2010 - 8:57 pm UTC
Hello Tom,suddenly I see a RMAN backup job hangs(for more than 2 hours) with "Enqueue: CF Contention" on 10gR2 7-node RAC on Linux RH system.
looks like the controlfile backup is taking enormous time.I killed the session later.what may be the issue?
best regards,
August 02, 2010 - 7:55 am UTC
it could be caused by a few things, I start with bug 7023147 - but utilize support for this please. You'll need a bit more information to diagnose.
row cache lock in top 5 wait events
baskar.l, October 05, 2010 - 9:41 am UTC
Hi Tom,
DB version is 10.2.0.4. i could see row cache lock in top 5 wait events in the db..
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file sequential read 3,544,685 49,464 14 59.3 User I/O
CPU time 19,307 23.2
ARCH wait on SENDREQ 2,272 5,002 2202 6.0 Network
db file scattered read 128,920 3,514 27 4.2 User I/O
row cache lock 1,787,698 2,192 1 2.6 Concurrenc
-------------------------------------------------------------
SQL> select p1text,p1,p2text,p2,p3text,p3 from v$session where event = 'row cache lock';
P1TEXT P1 P2TEXT P2 P3TEXT P3
-------------------- ---------- -------------------- ---------- ---------------------------------------------------------------- ----------
cache id 16 mode 0 request 3
1 row selected.
SQL> select PARAMETER ,COUNT ,GETS ,GETMISSES ,MODIFICATIONS from v$rowcache where cache#=16;
PARAMETER COUNT GETS GETMISSES MODIFICATIONS
-------------------------------- ---------- ---------- ---------- -------------
dc_histogram_defs 1704 47481213 11479508 5408
dc_histogram_data 218 8211149 1277649 0
dc_histogram_data 83 2668657 301607 0
3 rows selected.
what does dc_histogram_defs and dc_histogram_data?? Why is it showing in row cache lock??
baskar.l
October 05, 2010 - 12:28 pm UTC
hah, given your other waits, I wouldn't even be concerned - it would not have dawned on me to look at them (the row cache waits).
you are seeing waits on getting histogram information - probably related to IO waits to read that stuff in - given your other massive IO problem.
You can try increasing your shared pool to increase the chance of caching more of the dictionary
Thanks Tom
baskar.l, October 06, 2010 - 10:34 pm UTC
Thanks Tom..Its just amazing to read your blogs and answers.Guess you have a great time management in place(off topic)
baskar.l
Why need db file sequential read wait event?
tom.lai, October 26, 2010 - 1:16 am UTC
Hi Tom,
db file sequential read wait event is relative to single block read event, db file scattered read wait event is multi-block read. my question is why oracle need to "db file sequential read"? because contiguous multi blocks read into cache buffer, this will occur better performance.
thanks
October 26, 2010 - 7:53 pm UTC
because sometimes we need A SINGLE BLOCK.
If you give us a rowid (after seaching an index for example), we need A SINGLE BLOCK.
reading a bunch of blocks we don't need would be a waste, would not it? (rhetorical question, yes is the answer)
A reader, October 26, 2010 - 8:26 pm UTC
Thanks tom.
wait class
ashish kumar, July 28, 2011 - 2:53 am UTC
Hi Tom,
I am facing issues with "Concurrency" wait class, which is consuming database time, due to it performance of database is low now. In my addm report it shows like this:
Contention for latches related to the shared pool was consuming significant
database time.
NO RECOMMENDATIONS AVAILABLE
ADDITIONAL INFORMATION:
Waits for "cursor: pin S wait on X" amounted to 38% of database time.
Hard parsing of SQL statements was not consuming significant database
time.
SYMPTOMS THAT LED TO THE FINDING:
SYMPTOM: Wait class "Concurrency" was consuming significant database
time. (39% impact [71248 seconds])
and in AWR top event category it shows:
Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
cursor: pin S wait on X 3,589,632 69,583 19 38.5 Concurrency
give some suggestion to overcome this issue
Thanks in advance,
Ashish
July 28, 2011 - 7:19 pm UTC
you are parsing a lot.
what is your setting for session cached cursors?
how many parses/second do you do?
wait class
ashish kumar, July 29, 2011 - 3:35 am UTC
Hi Tom,
We have set session_cached_cursors=50
and
parse time elapsed 1,841.98 per sec and consuming 1.02% of DB time
I got some more statistics in AWR Regarding concurrency, its:
Concurrency(Wait Class)
3,596,358(Waits)
99.83(%Time -outs)
71,248(Total Wait Time/s)
20(Avg wait/ms)
1.44(Waits/txn
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 99.81 Redo NoWait %: 100.00
Buffer Hit %: 79.96 In-memory Sort %: 100.00
Library Hit %: 98.80 Soft Parse %: 89.92
Execute to Parse %: 95.60 Latch Hit %: 99.94
Parse CPU to Parse Elapsd %: 6.59 % Non-Parse CPU: 98.98
Regards,
Ashish
August 01, 2011 - 11:29 am UTC
soft parse % is horrible, you are hard parsing like crazy.
nothing but fixing the code that is not using binds correctly will fix that (I will not recommend cursor_sharing=force/similar, I do not recommend using that)
buffer busy wait vs buffer gets
Lasse Jenssen, November 09, 2011 - 7:37 am UTC
Hi Tom
Our database got yammed last night, and I'm strugling to see what's the different from the other nights.
There is (rman) backup going on at this time and there is a delete script deleting thousands of rows from different tables (this includes a log table that gets written to by the web application).
The system is running on a IBM lpar with 10 lcpu's. I thinkt this means that there are 5 physical CPU available for the partition.
The day before the load statistics looks the same.
But in the statspack report (30 min snap) from yesterday and the day before i see the following top events:
Here we definitly got a problem:Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
buffer busy waits 1,404,117 54,841 38.71
latch free 1,814,866 51,804 36.57
enqueue 30,772 17,764 12.54
CPU time 7,608 5.37
sbtwrite2 274,823 4,706 3.32
The day before everything is ok:Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read 973,353 4,783 30.26
sbtwrite2 545,896 4,656 29.45
CPU time 4,539 28.71
db file parallel write 22,299 605 3.83
async disk IO 301,410 392 2.48
In the top SQL order by Gets from yesterday i see:
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
2,380,729,134 24,511 97,129.0 98.2 5584.91 ######### 2174909325
Module: JDBC Thin Client
INSERT INTO log ...
The day before the number of execution was even higher, but buffer gets was "only" one fifth (about 481,300,919):
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
481,300,919 30,197 15,938.7 91.6 1012.91 3251.11 2174909325
Module: JDBC Thin Client
INSERT INTO log ...
Question:Does Oracle count an "buffer get" also when there is a "buffer busy waits"? So when you see abnormal "buffer busy waits" there would naturally be an increase in "buffer gets"?</code>
November 09, 2011 - 7:57 am UTC
Here we definitly got a problem:
how so - maybe yes, maybe no.
The day before the load statistics looks the same.
what does that mean in the context of this discussion? I'm not sure what "load statistics" are to you?
It looks like your delete of the LOG table was running at a time of heavy activity (other DML activity - the insert) today during this window of time - but not yesterday.
A buffer get could be preceded by a buffer busy wait - yes. But it would be an attempt to get a buffer that would cause the wait, not the wait causing increased buffer gets directly.
The increased buffer gets sound like they could be due to read consistency (we've hit blocks that have been modified since our statement began and we need to roll them back - the extra buffer gets are reads against undo to roll back the changes).
Can you consider using partitioning to purge this information - a simple fast DDL operation to truncate or drop an old partition. No redo, no undo, no read consistency issues?
buffer busy wait vs buffer gets
Lasse Jenssen, November 10, 2011 - 2:56 am UTC
You wrote: "maybe yes, maybe no."No. Definitly yes! The application ran slower and slower, and after a while the customer couldn't even log into the application. That is a problem, and its definit :-)
You wrote: "not sure what "load statistics" are to you?"Sorry. I was talking about the load profiles in the statspack report (... I know :-( ... we're working to upgrade to 11g). As you see below the load seems to be almost similar for the two periods, except of the logical reads. And the question regarding buffer gets and buffer busy waits was regarding this (see further explaination below).
Load Profiles from statspack:
DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
xx xxxxxxxxxx xx 1 9.2.0.8.0 NO xxxxx
Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 46269 08-Nov-11 19:15:02 418 33.1
End Snap: 46270 08-Nov-11 19:45:03 413 33.5
Elapsed: 30.02 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 2,560M Std Block Size: 8K
Shared Pool Size: 512M Log Buffer: 1,024K
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,211,114.60 79,239.20
Logical reads: 735,605.13 48,128.20
Block changes: 4,025.15 263.35
Physical reads: 526.88 34.47
Physical writes: 521.38 34.11
User calls: 380.24 24.88
Parses: 293.71 19.22
Hard parses: 0.70 0.05
Sorts: 67.20 4.40
Logons: 0.98 0.06
Executes: 462.64 30.27
Transactions: 15.28
% Blocks changed per Read: 0.55 Recursive Call %: 67.49
Rollback per transaction %: 6.12 Rows per Sort: 176.17
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.95 Redo NoWait %: 100.00
Buffer Hit %: 99.93 In-memory Sort %: 99.99
Library Hit %: 99.91 Soft Parse %: 99.76
Execute to Parse %: 36.51 Latch Hit %: 96.70
Parse CPU to Parse Elapsd %: 6.69 % Non-Parse CPU: 99.95
Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 46210 07-Nov-11 19:15:01 394 31.4
End Snap: 46211 07-Nov-11 19:45:01 390 31.6
Elapsed: 30.00 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 2,560M Std Block Size: 8K
Shared Pool Size: 512M Log Buffer: 1,024K
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,229,491.52 78,192.58
Logical reads: 291,867.69 18,562.05
Block changes: 4,011.40 255.11
Physical reads: 652.70 41.51
Physical writes: 572.51 36.41
User calls: 359.03 22.83
Parses: 291.68 18.55
Hard parses: 1.06 0.07
Sorts: 68.46 4.35
Logons: 1.00 0.06
Executes: 469.89 29.88
Transactions: 15.72
% Blocks changed per Read: 1.37 Recursive Call %: 70.04
Rollback per transaction %: 6.12 Rows per Sort: 163.33
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.96 Redo NoWait %: 100.00
Buffer Hit %: 99.80 In-memory Sort %: 99.99
Library Hit %: 99.87 Soft Parse %: 99.64
Execute to Parse %: 37.93 Latch Hit %: 98.77
Parse CPU to Parse Elapsd %: 11.54 % Non-Parse CPU: 99.93
You wrote: A buffer get could be preceded by a buffer busy wait - yes. But it would be an attempt to get a buffer that would cause the wait ... Yes. A buffer get precedes a buffer busy waits. I think you maybe misunderstood the question (or more precise: my formulation was not good). I'll try to explain. When you for instance have a sleep waiting to get a latch (after spinning) oracle will retry to get the latch. The same need for a latch could therefore trigger more than one latch wait. When oracle has to wait for a buffer (buffer busy waits) it does not retry to get the buffer? It just waits to get it? So the attempt to get one buffer could not lead to more than one buffer busy wait?
You wrote: Can you consider using partitioning to purge this information? Yes we are definitly considering this for the log table and other tables, both for performance and for maintainance reasons.
You wrote: could be due to read consistency Yes. It does seem that some of the buffer busy waits come from read conistency. I'm also wondering if some of these waits could come from reaching the maxtrans on index block on the log table. The inserts on this table are using a sequence as the PK, and always hitting one side of the index. If autonumous transaction is being used, this shouldn't be a problem, but I'll have to do some more analysis here.
Thanks for your comments Tom!
By the way! Just to let you know ... I must always put an -/code- in the beginning of the review to get the formating ok. I also get an -/code- attached at the end of the review. If I put an -code- at the end, it will disappear in the preview, but report an unclosed HTML-tag.
</code>
November 10, 2011 - 1:27 pm UTC
Given the information you gave me above, the only logical response is "maybe yes, maybe no". You cannot make the conclusion you made on this page based on the information supplied. You may well have other information that makes it obvious *TO YOU*, but we don't. Please do keep that in mind. We cannot see your console, we do not know your history, we do not know your current situation. please just keep that in mind.
It just waits to get it? So the attempt to get one buffer could not lead to more than one buffer busy wait?
In most all cases, we wait for up to a second before waking up and looking around and waiting again.
I'm also wondering if some of these waits could come from reaching the maxtrans on index block on the log table.
look at v$segement_statsistics, it would manifest itself as ITL waits on that index segment.
A global hash partitioned index on that key could work (but would make range scans on the ID not quite as possible) to reduce that sort of contention. As would not using a sequential number.
buffer busy wait vs buffer gets
Lasse Jenssen, November 11, 2011 - 2:27 pm UTC
I do not take this service as granted. I really think it's excellent, and really try to give you the necessary information needed.
With 5 physical CPUs and a 30 minute snapshot (this would be 9000 CPU seconds available) i would think a total of over 135000 sec in wait would indicate a problem?
You wrote: "In most all cases, we wait for up to a second before waking up and looking around and waiting again"
So when waking up and the "looking around" will trigger another "buffer get"?
I'll definitely have a look at the v$segment_statistics, and I'll look into your suggestion on a global hash partitioned index.
Thanks again Tom.
November 11, 2011 - 3:07 pm UTC
I do not take this service as granted. I really think it's excellent, and really try to give you the necessary information needed.
no worries, I'm just trying to point out that what is obvious to you - is not necessarily obvious to us :) we only have what you give us to look at.
... 135000 sec in wait would indicate a problem? ...
depends on the user load. If you had 10,000 users, they only waited 13.5 seconds apiece. You could have much more wait time in a given window than 'real time' and have everything be just dandy. It depends.
.. So when waking up and the "looking around" will trigger another "buffer get"?
...
I do not believe so, the buffer get gets stopped by the wait - not the other way around. So 1 get.
wait events: PX Deq: Execute Reply
George Joseph, March 05, 2012 - 12:37 am UTC
Dear Tom
We are using oracle version 10g Enterprise Edition Release 10.2.0.3.0.
There is a stored procedure in a package that inserts data from a multi-table join sources and insert into 4 tables.
I find that there is PARALLEL enabled in one of the tables of the source. This is a partitioned table and index on local partitioned.
We are trying to tune the procedure to finish in less time. THe current procedure takes close to an hour to complete.
It basically selects data from source, performs smaller look ups to non_parallel enabled tables and finally inserts to 4 main tables.
What i did is to check out the wait events while the procdure is running from v$sesssion_event.(484 is the main session id)
SID EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED WAIT_CLASS
484 latch: cache buffers chains 3 0 2 Concurrency
484 db file sequential read 5604 0 6290 User I/O
484 db file scattered read 54 0 74 User I/O
484 latch: row cache objects 1 0 0 Concurrency
484 PX Deq: Join ACK 76 0 8 Idle
484 PX Deq Credit: need buffer 3027 0 17 Idle
484 PX Deq: Parse Reply 52 0 47 Idle
484 PX Deq: Execute Reply 1525 567 118549 Idle
484 PX Deq: Table Q Normal 27 0 1 Idle
484 SQL*Net message to client 23 0 0 Network
484 SQL*Net message from client 23 0 12986 Idle
484 SQL*Net more data from client 58 0 75 Network
484 events in waitclass Other 328114 326797 1516 Other
Since most of the TOTAL_TIMEOUTS lie in the "events in waitclass Other" i investigated a bit more on that front and used
waitprof.sql by Tanel Poder.
After running the waitprof.sql with sample size of 500000. I figured out that there
SW_SID STATE SW_EVENT SWP_P1 SWP_P2 SWP_P3 SWP_SEQ# PCT_TOTAL_SAMPLES WAITPROF_TOTAL_MS DIST_EVENTS WAITPROF_AVG_MS
484 WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 5 0 25.9536 3155.95776 2 1577.97888
484 WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 1 0 18.2622 2220.68352 2 1110.34176
484 WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 2 0 16.9174 2057.15584 1 2057.15584
484 WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 4 0 16.4168 1996.28288 2 998.14144
484 WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 3 0 16.2952 1981.49632 1 1981.49632
484 WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 6 0 6.1502 747.86432 1 747.86432
484 WORKING On CPU / runqueue 0.0046 0.55936 7 0.079908571
Googling further aboutthis event :PX Deq: Execute Reply from
http://oracledoug.com/px8.html.says its an idle wait event and not to
worry too much about it, although the article says it could indicate that PX Slaves are taking time to process and send results back to
the query coordinator.
What i did is to set the NO_PARALLEL hint at the source tables from wehre i am selecting from and the whole process finishes in less than 10 minutes,
and everyone is happy.
I was hoping to get some direction from you as to how to pin-point what the exact problem is with regards to slaves taking more time(if my assumption is correct that
slaves were indeed the cause of the procedure taking long time).
I then attempted to think it was because of parallel dml not enabled(my procedure essentially does this INSERT INTO DEST SELECT col1,col2, from <parallel_enabled_table>).
ie the slaves were queuing up the work because they knewthe query consumer couldnt process the results that quick.
I set the session to alter sesion enable parallel dml and reran the procedure, however the effect was the same. Same wait events showed up.
With the NO_PARALLEL hint i find that the wait event is "On CPU / runqueue".
Is such an event the ideal wait event to know a process is being run correctly?.Does this show that the entire process is CPU bound.
SW_SID STATE SW_EVENT SWP_P1 SWP_P2 SWP_P3 SWP_SEQ# PCT_TOTAL_SAMPLES WAITPROF_TOTAL_MS DIST_EVENTS WAITPROF_AVG_MS
483 WORKING On CPU / runqueue 100 10110 1 10110
March 05, 2012 - 6:56 am UTC
It basically selects data from source, performs smaller look ups to non_parallel enabled tables and finally inserts to 4 main tables.
don't do that, just write one larger query that does the lookups, pack as much work into the sql statement as possible. DO NOT code a "do it yourself nested loops join" which is what you've just described here.
What i did is to set the NO_PARALLEL hint at the source tables from wehre i am selecting from and the whole process finishes in less than 10 minutes,
and everyone is happy.
parallel is not fast=true, it works when you have an abundance of free, fast, idle resources that you can use. You apparently - didn't have them. You do not have sufficient CPU to run with the degree of parallelism you tried, or you didn't have enough memory, or you didn't have enough IO bandwidth, or any combination of the three.
event in wait class other
George Joseph, March 05, 2012 - 7:55 am UTC
The degree of parallelism on the table which has the problem is 2 and instances=1. If i were to run a parallel query on this table such as select count(col) from table it returns the data in a fast manner compared with /*+NO_PARALLEL*/ hint.
Could you please share your thoughts on the wait event event in wait class other.
March 05, 2012 - 8:12 am UTC
the wait event is exactly what you described, we are waiting for a parallel execution server to finish the work it is doing. You'd have to look at the parallel execution servers session to see what it was doing.
log file parallel write - really in parallel?
Michael, April 11, 2012 - 8:33 am UTC
hi tom,
afaik a log write is always done synchronous ("_lgwr_async_io=false").
question: how can a "log file parallel write" (writing to several logfiles at the same time) be done in parallel when writing synchronous having only one lgwr process (on unix)?
cheers, michael
April 11, 2012 - 11:53 am UTC
As that is an undocumented parameter, its true meaning is not clear to either of us.
It no longer appears in 11g
It was ignored in many cases prior to that.
see note 247959.1, it references HP/UX but it is true on other platforms as well.
log file parallel write
Michael, April 12, 2012 - 12:02 pm UTC
i can only find patch 247959 - "data fix for encumbered amount......"
i can't find a note 247959.1 ?
cheers, michael
April 12, 2012 - 1:33 pm UTC
I see, it is in limited publication - sorry about that.
basically - it says that this undocumented parameter is ignored in many situations, even if set explicitly to false, it won't be respected.
meaning, false for this undocumented parameter isn't necessarily false
log file parallel write
Michael, April 13, 2012 - 11:23 am UTC
i'm asking because of following situation.
we have several logfile groups. every group has two members. every member is written to its own independent volume (1:1) - on production.
we plan to change this for pre-production to save costs. every member should write to its own subdirectory on the same volume only (2:1). (loss of data is accepted).
one argument is that lgwr writes log members sequentially in any case. is this a valid assumption?
environment: rhel5, 1 lgwr, no slaves, direct IO
cheers, michael
April 13, 2012 - 1:17 pm UTC
what are disk async io and filesystemio options set to
you can always truss lgwr for a bit to see what OS calls it is making on a test system as well.
log file parallel write
Michael, April 17, 2012 - 3:38 am UTC
disk_asynch_io = true
filesystemio_options = asynch (having n databases per host)
filesystemio_options = setall (having 1 database per host)
April 17, 2012 - 5:29 am UTC
the underscore parameter is ignored in your case.
logfile parallel write
Michael, April 18, 2012 - 8:36 am UTC
sorry, i don't understand.
my question was:
<quote>
one argument is that lgwr writes log members sequentially in any case. is this a valid assumption?
<\quote>
cheers, michael
April 19, 2012 - 4:19 am UTC
question: how can a "log file parallel write" (writing to several logfiles at the same time) be
done in parallel when writing synchronous having only one lgwr process (on unix)?
you are not using synchronous io, you have async on.
logfile parallel write
Michael, April 20, 2012 - 6:23 am UTC
so i got a logfile parallel write is like this:
start logfile parallel write
=> send write call to logfile 1
=> send write call to logfile 2
<= wait until both calls finished
end logfile parallel write
would this be true in any case as long as "disk_asynch_io = true"?
thanks,
michael
April 20, 2012 - 4:13 pm UTC
yes, you have async io enabled - even though it looks like you don't by looking at an undocumented parameter (that no longer exists in current releases)
you can always truss or strace the process to verify if you like (suggest doing that on TEST)
gc buffer busy release
A reader, June 14, 2012 - 1:14 pm UTC
Hello,
We are having a insert statement running for more than 2 days. This was supposed to take less than 4 hours (historically that was the elapsed time). I just started tracing the session and noticed wait events like:
gc buffer busy release
ges message buffer allocation
gc current grant busy
This table has quite a few CLOBs.
I haven't seen this wait event before and am curious to know about that. Our system is a 2-node RAC 11.2.0.1
Thanks,
gc Wait Events in 3 instance RAC
Rajeshwaran, Jeyabal, January 11, 2013 - 4:51 am UTC
Tom,
We are monitoring a batch process in production it is running since last night (for past 22 hrs), The same process with same data volumn completed in Pre-production environment with in 8hrs. I queried v$session_event in production and found the below waits are the major ones. (our prod is 3 instance RAC on Oracle 10.2.0.5 running in linux)
gc cr grant 2-way
gc current block 3-way
gc current grant 2-way
What could be the reason for this and how can we resolve this?