Question and Answer

Connor McDonald

Thanks for the question, Alan.

Asked: December 29, 2001 - 1:31 pm UTC

Last updated: September 17, 2008 - 7:24 pm UTC

Version: 817

Viewed 10K+ times

You Asked

Hi Tom,

I would like to set up an automated performance monitoring system for my 20 or so Oracle 817 databases. I suspect that if I could monitor about 10 or so statistics and keep them in line (e.g. a DB buffer cache hit ratio of 90%+) then I will be in pretty good shape.

In your opinion, what would be the most important performance statistics that I should be watching (obviously, your answer doesn't have to be exactly 10)?

Thanks in advance,

Alan


and Tom said...

There is only one performance indicator I care about.

Is the thing running sufficiently fast? Is it meeting or exceeding its designed response times?

If so, I could really care less if the buffer hit ratio was 0%.

All of these numbers everyone looks for -- are mostly meaningless unless taken in context. In many high end OLTP systems -- a cache hit of 90% would be unachievable; consider how you process. You insert orders into a table. You read the lookup tables but that's about it. You read a customer record once and then not again for a very very long time. Re-reading the same information isn't likely to happen.

In many high end DSS (warehouse) systems -- a cache hit of 90% would be unachievable, again considering how you process....

So in these cases -- cache hit would be a lousy indicator of performance (but the rule of thumb is "cache hit is the single most predictive indicator of performance"....)

I suppose the one ratio that I would pay attention to is the hard/soft parse ratio. In an OLTP system the percentage of hard parses should rapidly go to zero so that you are not hard parsing at all. On a warehouse, this may or may not be true depending on how skewed the data is and whether you need different plans given different criteria. In 9i however I would expect this ratio to go away in warehouses as well, as you should use bind variables even there since the optimizer will "peek" at the binds when picking a plan.
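A minimal sketch of computing that ratio instance-wide from the standard v$sysstat counters (these are cumulative since instance startup, so in practice you would trend the deltas between snapshots rather than the absolute values):

select hard.value  hard_parses,
       total.value total_parses,
       round( 100 * hard.value / greatest( total.value, 1 ), 2 ) pct_hard
  from v$sysstat hard, v$sysstat total
 where hard.name  = 'parse count (hard)'
   and total.name = 'parse count (total)';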


Probably not the answer you were looking for but in my opinion -- in order to gauge if your databases are performing OK, you need to have custom metrics -- specific to the instance and applications running therein -- that you measure against. What is good for one instance will be terrible for another. You should measure response time, not all of the knobs and dials (they are what you might look at when you discover you have a problem...)


Comments

peek the bind variables

A reader, December 30, 2001 - 6:33 am UTC

hi tom

in 8i there are restrictions using histograms and query rewrites with bind variables. Is this because the execution plan is generated before the variables are bound? How does 9i overcome this?

And ratios don't mean anything these days! Waits/latch statistics are probably much better indicators of performance problems

Tom Kyte
December 30, 2001 - 10:23 am UTC

in 8i and before, binds are not taken into consideration when producing a query plan.  In 9i, the CBO will peek at binds when developing the query plan and the plan can change.

Start with this schema:

ops$tkyte@ORA9I.WORLD> create table t
  2  as
  3  select rownum x, rpad( '*', 150, '*' ) y from all_objects;

Table created.

ops$tkyte@ORA9I.WORLD> 
ops$tkyte@ORA9I.WORLD> insert /*+ append */ into t select -1, y from t;

18895 rows created.

ops$tkyte@ORA9I.WORLD> 
ops$tkyte@ORA9I.WORLD> create index t_idx on t(x);

Index created.

ops$tkyte@ORA9I.WORLD> 
ops$tkyte@ORA9I.WORLD> analyze table t compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /

Table analyzed.


then, run a script like this:
variable x_one number
variable x_neg_one number

exec :x_one := 1
exec :x_neg_one := -1

set termout off
alter session set sql_trace=true;
select * from t where x = :x_one;
select * from t where x = :x_neg_one;


the resulting tkprof shows:

select *
from
 t where x = :x_one


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.01          0          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 103

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID T
      1   INDEX RANGE SCAN (object id 20780)

********************************************************************************

select *
from
 t where x = :x_neg_one


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     1261      0.27       0.20        118       2110         12       18895
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1263      0.27       0.20        118       2110         12       18895

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 103

Rows     Row Source Operation
-------  ---------------------------------------------------
  18895  TABLE ACCESS FULL T


different plans for the nearly identical query......


As for waits and latches, you cannot look at them in ISOLATION either.  When someone says to me "hey, I got XXXX latches, how do I fix that" -- i always say, "well, I had to stop at XXXX red lights on my last trip, was that bad?" 

If XXXX is large, they say "bad", but then I say "well I drove from the east coast of the US to the west coast, it was really pretty good".  

If they say "good", I say "but I only drove a mile and it took me hours"....


In order to see if 1,000,000 latches is "bad" -- you need to consider what you were doing, what the period of observation was (if it is weeks -- who cares; if it was minutes, you might have a problem), and whether it is unavoidable due to the nature of your problem -- it might be ok.....

Yes, but ...

Alan, December 30, 2001 - 12:58 pm UTC

Hi Tom,

I understand your answer even though it wasn't what I was hoping for. Certainly, if the customers are happy with the performance then you probably shouldn't get heavily into an academic tuning exercise (fun, though it may be). However, let me take another stab at what I was trying to get at with an example:

Let's say my pickup truck is advertised to get 20 miles to the gallon. Obviously, that will vary with driving patterns, load and age of the truck. Now if the truck continues to perform its job but I discover that it is only getting 3 miles to the gallon, shouldn't that raise a red flag? Maybe I'm carrying heavy loads up steep hills or maybe I need a tune up.

My Oracle goal is that I would like to set up a monitoring methodology that would allow me to watch the basic health of the database. I would like to do things like:

- Check availability of machines and databases.
- Check available space.
- Check for O/S swapping or paging.
- Check performance of some key, custom, long running reports and processes.
- Check some key Oracle statistics like buffer hits, dictionary hits, sorts to memory/disk, etc.

What do you think?

Thanks,

Alan

Tom Kyte
December 30, 2001 - 3:30 pm UTC

Your analogy is good (excellent actually) -- but just change "miles per gallon" into transactions/unit of time. Now you have MY analogy. You should have set up some measurable performance metric (like miles/gallon) beforehand -- then you can report performance.

Your analogy is perfect actually since different vehicles get different MPG -- and we expect it. A tractor trailer will be very different from a pickup, from an SUV, from a BMW, from a Toyota -- and we expect them to be different. For this reason, reporting back simple ratios/metrics is pretty much meaningless.

You should be measuring things like "my response to a query like this should be 0.5 seconds or less -- if it is not, something is wrong". You now have something you can in fact measure and report on. Just like "i should be getting 25 MPG".

Once you find the query doesn't return in 0.5 seconds -- then you need to look at the system and figure out whether you a) need a tune up or b) have been carrying heavy loads up steep hills.

To check for the things you listed there (most of which are not ratios) tools like OEM (Oracle Enterprise Manager), statspack, or even just the SNMP MIBs for use with any monitoring tool would be appropriate. That'll cover availability of databases, available space and metrics like buffer hits/dictionary hits/sorts to memory/disk.

To monitor your OWN reports/processes, we have DBMS_APPLICATION_INFO and the ability to measure long running processes in v$session_longops (i have a fairly extensive chapter on this in my book, search for dbms_application_info on this site for other examples).
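A minimal sketch of that v$session_longops instrumentation, assuming an illustrative batch of 100 units of work (the op_name and units labels are made up):

declare
    l_rindex binary_integer := dbms_application_info.set_session_longops_nohint;
    l_slno   binary_integer;
    l_total  number := 100;
begin
    for i in 1 .. l_total
    loop
        -- ... one unit of the real work goes here ...

        -- publish progress; it becomes visible in v$session_longops
        dbms_application_info.set_session_longops
        ( rindex    => l_rindex,
          slno      => l_slno,
          op_name   => 'nightly report',
          sofar     => i,
          totalwork => l_total,
          units     => 'batches' );
    end loop;
end;
/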

To monitor the OS, you would be looking at some system management tool or writing your own with sar/vmstat/iostat on unix or perfmon on NT.

This is very different from getting the top N performance indicators though. You should instrument your applications so that you monitor your apps' performance over time. This is the KEY factor in monitoring your system. The rest of the information is just NUMBERS; measuring your response time (comparing actual against expected) is the ONLY thing that counts. For example on asktom.oracle.com, I measure every single request and have a screen that easily tells me stuff like (for this sunday) in the last 24 hours:

I generated 2,761 pages
This is an average of 0.03 pages/second
From 355 different ip addresses
each ip address viewed about 8 pages.
There were 449 USERS (different browser sessions) that viewed about 6 pages each

The average elapsed time per page was 0.35 seconds with the slowest page taking 2.18 seconds and the fastest taking 0.13

Now, Sunday is a little slow -- but during the week I check this page from time to time to see how we are doing. If the avg/max times start to go up -- i have a problem. I have this information stored historically and can tell you exactly how I was doing performance wise yesterday -- two months ago -- one year ago. All of the applications I work on have this "feature". This is something I think every application should add -- sure, the extra insert adds a little overhead but hey -- the wealth of information I have makes pinpointing performance issues TRIVIAL and allows me to proactively monitor my system. None of the metrics I report back to myself however have anything to do with the database itself -- those are things I would start to look at AFTER finding there was an issue.



How do I draw the Application Performance Metric?

shivaswamy, May 14, 2002 - 11:06 am UTC

Tom,

It was a very nice topic and article.

I manage a big (800+ GB) database for a small shop. Local clients. DSS nature. Long running jobs. New, unstabilized (changing) custom application. When do I say the application is doing well? I keep tabs on the top 10 written and read datafiles with their response time on a day-to-day basis and store it. Anything else would you care to suggest? I plan to update 500,000 rows on a test table daily to measure the performance, to give me a yardstick to measure against.

Look forward to your invaluable input.
Regards,
Shivaswamy

memory issue

Mike, July 10, 2003 - 8:37 am UTC

I used top on Solaris:

Memory: 10G real, 175M free, 15G swap in use, 858M swap free

does it imply the machine still has 858M 'memory' to be allocated or just 175 M?

TIA

Tom Kyte
July 10, 2003 - 10:07 am UTC

impossible to say since solaris steals as much ram as it wants to do OS file system buffering and gives it back when applications need it.

So, you might very very well have 9.9gig free

In my database, "plans for the nearly identical query" are the same

Paul Druker, July 10, 2003 - 12:53 pm UTC

Tom,

I ran your example in our 9.2.0.3.0 database, and, in fact, both plans were identical.

select *
from
t where x = :x_one


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.00          0          4          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 50 (PDRUKER)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID T (cr=4 r=0 w=0 time=146 us)
      1   INDEX RANGE SCAN T_IDX (cr=3 r=0 w=0 time=102 us)(object id 60991)

********************************************************************************
select *
from
t where x = :x_neg_one


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     1407      0.34       0.34          0       3324          0       21088
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1409      0.34       0.34          0       3324          0       21088

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 50 (PDRUKER)

Rows     Row Source Operation
-------  ---------------------------------------------------
  21088  TABLE ACCESS BY INDEX ROWID T (cr=3324 r=0 w=0 time=209221 us)
  21088   INDEX RANGE SCAN T_IDX (cr=1449 r=0 w=0 time=97363 us)(object id 60991)

********************************************************************************

Now, according to the manual, "The CBO peeks at the values of user-defined bind variables on the first invocation of a cursor. This feature lets the optimizer determine the selectivity of any WHERE clause condition, as well as if literals have been used instead of bind variables. On subsequent invocations of the cursor, no peeking takes place, and the cursor is shared, based on the standard cursor-sharing criteria, even if subsequent invocations use different bind values."

So, actually, I'd expect the plans to be the same. Or, possibly, both queries would have a different plan if the
select * from t where x = :x_neg_one
statement were executed first and it were not in the library cache yet.

Can you explain why the plans are the same in my database? Is it possible because of a difference in some init.ora parameters? cursor_sharing=TRUE in my database.

Regards,
Paul


Stats Gathering

Steven Cooper, September 19, 2003 - 4:14 am UTC

Tom,

Any chance of seeing how you implement your stats gathering?

I'm thinking about implementing this in a new application that I'm writing and was wondering how you have done it.

Kind Regards
Steven

Tom Kyte
September 20, 2003 - 5:08 pm UTC

depends. for many things

- setting all tables to monitoring
- using AUTO with dbms_stats

is great. so, basically, two calls to dbms_stats (sketched below): one to set all tables to monitoring and the other to gather with auto.

for really big or complex things (large partitioned tables, loads into DWs, etc) -- you think about it much more. you would probably do it segment by segment
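A minimal sketch of those two calls, assuming a schema named SCOTT (alter_schema_tab_monitoring and gather_schema_stats with the GATHER AUTO option are the standard dbms_stats entry points for this in 9i):

begin
    -- one time: flag every table in the schema for monitoring
    dbms_stats.alter_schema_tab_monitoring( ownname => 'SCOTT', monitoring => TRUE );
end;
/

begin
    -- run periodically: gathers stats only where monitoring marks them stale
    dbms_stats.gather_schema_stats( ownname => 'SCOTT', options => 'GATHER AUTO' );
end;
/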

Interesting

Steven Cooper, September 22, 2003 - 2:55 am UTC

Tom,

Not quite what I had in mind; my question was a little ambiguous though. What I meant was: you mention above how you gather info (stats) about your application's performance over time, i.e. this screen took 1 minute to return some data when usually it's 10 seconds, etc. I wanted to know how you implement this. The way I'm thinking, i.e. Keep It Simple, is to have three tables:
APPS (which holds which application the stats are for and a description of the app),
STATS (which holds which stat and a description of that stat) and
APP_STAT (which holds app_id, stat_id, user, date, start_time, end_time)

In the application I would have two calls to a stat package: one to get the starting time and populate a pl/sql table with all the other info except end_time. The second call would get the end_time and then insert and commit (autonomous transaction) into the app_stat table.

How do you implement yours?

Kind regards
Steven

Tom Kyte
September 22, 2003 - 7:50 am UTC

page starts with an insert/commit
page ends with an update/commit

record logged is simply:

flows_010400@ASKUS> desc WWV_FLOW_ACTIVITY_LOG1$
 Name                 Null?    Type
 -------------------- -------- --------------
 TIME_STAMP           NOT NULL DATE
 COMPONENT_TYPE                VARCHAR2(255)
 COMPONENT_NAME                VARCHAR2(255)
 COMPONENT_ATTRIBUTE           VARCHAR2(4000)
 INFORMATION                   VARCHAR2(4000)
 ELAP                          NUMBER
 NUM_ROWS                      NUMBER
 USERID                        VARCHAR2(255)
 IP_ADDRESS                    VARCHAR2(4000)
 USER_AGENT                    VARCHAR2(4000)
 FLOW_ID                       NUMBER
 STEP_ID                       NUMBER
 SESSION_ID                    NUMBER
 SECURITY_GROUP_ID    NOT NULL NUMBER


some pages do their own auditing as well - for example, the search page records:



ask_tom@ASKUS> desc WWC_ASK_QUESTION_SEARCH_LOG$
 Name                 Null?    Type
 -------------------- -------- ---------------
 ID                   NOT NULL NUMBER
 SEARCH_CRITERIA               VARCHAR2(1000)
 EMAIL                         VARCHAR2(255)
 ACCESS_DATE                   DATE
 REMOTE_ADDR                   VARCHAR2(255)
 HTTP_USER_AGENT               VARCHAR2(255)
 HTTP_REFERER                  VARCHAR2(255)
 HOST_NAME                     VARCHAR2(255)


so I can see what people are querying for and what pages are submitting searches to me. The RSS page is another one with an audit trail

one question

A reader, September 22, 2003 - 12:05 pm UTC

"flows_010400@ASKUS> desc WWV_FLOW_ACTIVITY_LOG1$
Name Null? Type
----------------------------------------------------- -------- --------------
...
ELAP NUMBER"

Is the "elap" the elapsed time since the last timestamp from the previous record?



Tom Kyte
September 22, 2003 - 12:13 pm UTC

it is the amount of time the page took to generate

when we start, we call dbms_utility.get_time
do the audit insert
do the page
get the time again
update that audit trail record with (STOP-START) -- in 1/100ths of a second (a sketch of the whole pattern follows below)
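A minimal sketch of that insert/update pattern, assuming an illustrative page_log table (not the real asktom schema):

create table page_log
( id         number primary key,
  time_stamp date default sysdate,
  elap       number );

create or replace procedure render_page( p_id in number )
as
    l_start number := dbms_utility.get_time;   -- 1/100ths of a second
begin
    -- page starts with an insert/commit so the audit row exists up front
    insert into page_log ( id ) values ( p_id );
    commit;

    -- ... generate the page here ...

    -- page ends with an update/commit recording the elapsed time
    update page_log
       set elap = dbms_utility.get_time - l_start
     where id = p_id;
    commit;
end;
/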

thanx!

A reader, September 22, 2003 - 12:31 pm UTC


How would you measure this?

msc, September 23, 2003 - 6:48 pm UTC

Tom,

your solution is nice and simple but it seems limited in use.

Let's look at a rather typical [reporting] case in our system.

Setup:

-- performance log table...
create table LOG (A varchar2(200));

-- package containing report
create or replace package TEST_CUR is
type TRefCur is ref cursor;

procedure Open_cur(cur OUT TRefCur);
end;
/


create or replace package body TEST_CUR is

procedure Open_cur(cur OUT TRefCur) is
begin
insert into LOG values ('Start time: ' || to_char( sysdate, 'YYYY-MM-DD HH24:MI' ));
commit;

open cur for
select OBJECT_NAME
from ALL_OBJECTS
where ROWNUM < 50;

insert into LOG values ('End time: ' || to_char( sysdate, 'YYYY-MM-DD HH24:MI' ));
commit;
end;
end;
/


Now let's test:

var curResult REFCURSOR

set arraysize 1

alter session set SQL_TRACE = TRUE;

begin
TEST_CUR.Open_cur(:curResult );
end;
/

print curResult

alter session set SQL_TRACE = FALSE;

Now let's examine the trace file (partial):
=====================
PARSING IN CURSOR #7 len=45 dep=0 uid=59 oct=47 lid=59 tim=183619990 hv=420908503 ad='411cfc8'
begin
TEST_CUR.Open_cur(:curResult );
end;
END OF STMT

...

PARSING IN CURSOR #8 len=76 dep=1 uid=59 oct=2 lid=59 tim=183619995 hv=4020378713 ad='31e88d0'
INSERT INTO LOG VALUES ( 'Start time: ' || TO_CHAR(SYSDATE,'YYYY-MM-DD') )
END OF STMT
PARSE #8:c=1,e=5,p=0,cr=3,cu=0,mis=1,r=0,dep=1,og=4,tim=183619995
EXEC #8:c=0,e=0,p=0,cr=1,cu=8,mis=0,r=1,dep=1,og=4,tim=183619995

...

--> HERE is ref cursor opened.

PARSING IN CURSOR #9 len=56 dep=1 uid=59 oct=3 lid=59 tim=183620014 hv=2527982697 ad='32a0838'
SELECT OBJECT_NAME FROM ALL_OBJECTS WHERE ROWNUM < 50
END OF STMT
PARSE #9:c=12,e=18,p=2,cr=18,cu=0,mis=1,r=0,dep=1,og=4,tim=183620014
EXEC #9:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=183620014

...

--> HERE we update final time

PARSING IN CURSOR #11 len=74 dep=1 uid=59 oct=2 lid=59 tim=183620014 hv=749481542 ad='31e4244'
INSERT INTO LOG VALUES ( 'End time: ' || TO_CHAR(SYSDATE,'YYYY-MM-DD') )
END OF STMT
PARSE #11:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=183620014
EXEC #11:c=0,e=0,p=0,cr=1,cu=3,mis=0,r=1,dep=1,og=4,tim=183620014

...

--> HERE the report data is fetched (actual work)

FETCH #9:c=0,e=3,p=4,cr=7,cu=4,mis=0,r=1,dep=0,og=4,tim=183620092
...
FETCH #9:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=183621284

--> THIS would be right point to mesure end time
=====================

This example would measure only parse and execution time (approximately the time to get the first row), but it should measure the time to get the *last* row in the report.

When the first page of the report is displayed on the user's computer screen, [s]he is probably pouring coffee and will print the report later. And we cannot measure time in the client application as there is user time included.

Is there any means to automatically measure time spent in the database (except trace)? Or any other way to get reliable data on reporting performance when using ref cursors?


Tom Kyte
September 23, 2003 - 8:17 pm UTC

so, why did you do it that way??? i wouldn't have.

i use mod_plsql.

everything, everything is done in plsql.

I would have timed the open, fetch and close.

I would never time just the open -- that isn't relevant.

you have to INSTRUMENT THE CLIENT.

in my case, the client is (smartly) PLSQL, so it is all very easy

if your client is not plsql, you'll have to move the instrumentation OUT there.

It is client-server application

msc, September 24, 2003 - 12:42 pm UTC

Tom,

it is client-server application not web app (no mod_plsql, no html page to construct). We have all SQL in database packages (open cur for ...) no single SQL query in client executable. We just call the procedure and fetch ref cursor.

Measuring at client side would be ok but... this is non-modal (MDI) application and user can do other things (even another report) before printing complete report out.

Tom Kyte
September 25, 2003 - 4:56 am UTC

then that is STILL where the timing/instrumentation needs to take place -- as well as in the database.

as you noted "open" = "fast"

"fetch" = "where the work is done"

if you want response time metrics, your client will be doing that, it is the only one that can.
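A minimal sketch of that client-side timing in PL/SQL, reusing the TEST_CUR package from the setup above -- the fetch loop is where the report's real work happens, so the clock only stops after the last row:

declare
    l_cur   TEST_CUR.TRefCur;
    l_name  all_objects.object_name%type;
    l_start number := dbms_utility.get_time;   -- 1/100ths of a second
begin
    TEST_CUR.Open_cur( l_cur );

    loop                                       -- drain the cursor to the last row
        fetch l_cur into l_name;
        exit when l_cur%notfound;
    end loop;
    close l_cur;

    dbms_output.put_line
    ( 'elapsed: ' || ( dbms_utility.get_time - l_start ) / 100 || ' seconds' );
end;
/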

FETCH takes too long

Jens, November 11, 2004 - 7:41 am UTC

Hi Tom,

what does it mean when a fetch takes so much time?

I queried a mat. view containing 20 rows.

select *
from
testtab where id = 12


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00      46.52          6          7          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00      46.52          6          7          0           0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 82

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  TABLE ACCESS FULL TESTTAB

A select * from the table (for the first time) needs the same. What could be the reason that the FETCH takes so long?

Bye,

Jens



Tom Kyte
November 11, 2004 - 8:01 am UTC

use a 10046 level 12 trace (an example of enabling it follows below) and

a) if you are on 9i -- it'll show you the wait events in the tkprof report
b) if you are not on 9i, you can read the raw trace file to see the wait events; my book "Expert One on One Oracle" covers that in some detail if you have access to it.
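For reference, a common way to switch that trace on and off for the current session:

alter session set events '10046 trace name context forever, level 12';

-- ... run the slow query here ...

alter session set events '10046 trace name context off';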

Dominic, January 27, 2005 - 6:56 am UTC

Earlier on in this thread you wrote
'You should be measuring things like "my response to a query like this should be 0.5 seconds or less -- if it is not, something is wrong". You now have something you can in fact measure and report on. "

Would a 'select count(*) from dba_objects' suffice?
I am trying to write a generic, quick-and-nasty performance/response indicator for several types of database, so I'm trying to use standard tables/views

Thanks


Tom Kyte
January 27, 2005 - 9:14 am UTC

impossible to write a "generic quick and nasty performance/response indicator for several types of database"

You need to time things end users care about, that is how THEIR applications, doing THEIR stuff are performing.

Most end users care not a whit about the response time of select count(*) from dba_objects.

They care very much when transaction "X" takes 5 times as long to complete today.

Dominic, January 27, 2005 - 12:51 pm UTC

I agree with your comment.
What I was trying to get at was: if a query takes X normally, whilst no-one is complaining, and later it takes 2X, to me that would indicate something out of the ordinary happening which should be investigated.
I was not going to use the query as a performance indicator alone, rather as a comparison with previous known values. I understand that it is not a true reflection of response time as experienced by the user, and it doesn't reflect the efficiency of the database, but it is just a rough indication of load.

thanks and regards



Tom Kyte
January 27, 2005 - 12:57 pm UTC

but -- your query is not their query.

say your query runs 2x as long, maybe that

a) means the dictionary is getting larger and the application itself is totally unaffected

b) means the entire system is 2x slow

c) neither a nor b.


say your query runs 2x as fast, maybe that means

a b and c again.


you need to run their transactions -- optimally the application itself would have a comprehensive log, but most programmers never even think about logging stuff like this.


I'd look at things like

o transactions per second over time (should be fairly constant, no big swings)
o redo generated per second over time
o LIO/sec over time
o PIO/sec over time
o Parses/sec over time
o etc

(eg: lots of the stuff you see on page 1 of a statspack report; a sketch of snapshotting a few of these yourself follows at the end of this followup.)


o average response times from the top sql over time...

(eg: from statspack)


actually -- 10g does this out of the box, but you can use statspack to do this yourself.


I'll firmly keep replying "no, i don't think select count(*) from dba_objects means a thing"
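A hedged sketch of snapshotting a few of those rates yourself from v$sysstat (the snapshot table name is made up, and statspack does the same job far more completely):

create table sysstat_snap as
select sysdate snap_time, name, value
  from v$sysstat
 where name in ( 'user commits', 'redo size',
                 'session logical reads', 'physical reads',
                 'parse count (total)' );

-- some time later, compare a fresh reading against the stored snapshot:
select cur.name,
       round( ( cur.value - prev.value ) /
              greatest( ( cur.snap_time - prev.snap_time ) * 86400, 1 ), 2 ) per_second
  from sysstat_snap prev,
       ( select sysdate snap_time, name, value from v$sysstat ) cur
 where prev.name = cur.name;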



performance

abc, February 01, 2005 - 5:01 pm UTC

Control file parallel write is the major wait event in the database (37%). This may be because, in the current system, three control files are placed on the same file system.

The next wait event is the switch logfile command (22%): there are only two log groups available and the log file size is just 5 MB. How do I avoid it?

DB file scattered read also contributes in this database (18%), and the wait event SQL*Net message from DB link (15%) may denote the time is actually all spent doing...

Please suggest how to avoid the above wait events

Tom Kyte
February 02, 2005 - 4:49 am UTC

but -- are your end user waiting on that?


22% of *WHAT* precisely - 1 second, 10000000 seconds, 5 seconds? over what duration of time?


Hey, I just grew my investments by 1000%, pretty good? too bad I invested a penny.


ratios -- meaningless, pretty useless in this case.


(most times, having control file parallel write at the top means - good job, because end users don't wait on that, it could be a "so what" wait)

Bind peeking

James Su, May 29, 2006 - 5:06 pm UTC

hi Tom,
In your example, these different bind variables result in different plans:
select * from t where x = :x_one;
select * from t where x = :x_neg_one;

What happens if this is inside a stored procedure:

open p_returning for select * from t where x = :p_input;

Here p_returning is a ref cursor, p_input is an input parameter of the procedure.

Will Oracle peek the values of p_input every time I call the procedure?

Thank you.



Tom Kyte
May 30, 2006 - 9:25 am UTC

it resulted in different plans simply because I forced a hard parse by using different literal SQL


In plsql, there will be ONE literal sql statement - exactly ONE. the bind variable names will be held constant.

Hence, that'll be hard parsed ONCE and the plan set upon that initial hard parse.


(no, it'll not peek every time, that defeats the purpose of shared sql)

bind peeking

James Su, June 05, 2006 - 10:36 am UTC

hi Tom,

In this example:
jsu@JSU> set autotrace traceonly
jsu@JSU> variable x_one number
jsu@JSU> variable x_neg_one number
jsu@JSU> exec :x_one := 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
jsu@JSU> exec :x_neg_one := -1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
jsu@JSU> select * from t where x = :x_one;

Elapsed: 00:00:00.09

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     2 |   162 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     2 |   162 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     2 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("X"=TO_NUMBER(:X_ONE))


Statistics
----------------------------------------------------------
325 recursive calls
0 db block gets
53 consistent gets
4 physical reads
0 redo size
610 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
7 sorts (memory)
0 sorts (disk)
1 rows processed

jsu@JSU> select * from t where :x_one is null or x = :x_one;

Elapsed: 00:00:00.31

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  4071 |   322K|   411   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| T    |  4071 |   322K|   411   (1)| 00:00:05 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(:X_ONE IS NULL OR "X"=TO_NUMBER(:X_ONE))


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1832 consistent gets
1825 physical reads
72 redo size
606 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

The second SQL does a full table scan even when :x_one is not null. Is it possible to make it work this way:
if the input (x_one) is null then full scan;
otherwise index scan.
My solution is to use dynamic sql; is there a better way?
Thanks.

Tom Kyte
June 05, 2006 - 11:28 am UTC

explain plan doesn't bind peek, ignore those - you cannot use explain plan to see bind peeking.

use TKPROF and the TKPROF plans.


using the cbo, you can just:

select * from t where x = nvl(:x,x);



this sets up a neat plan that flip-flops between full scan and index access depending on the input:

big_table@ORA10GR2> variable x number;
big_table@ORA10GR2> set autotrace traceonly explain
big_table@ORA10GR2> select * from big_table where id = nvl(:x,id);

Execution Plan
----------------------------------------------------------
Plan hash value: 1003526628

-------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%C
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |    10M|   945M|    46541
|   1 |  CONCATENATION                |              |       |       |
|*  2 |   FILTER                      |              |       |       |
|*  3 |    TABLE ACCESS FULL          | BIG_TABLE    |    10M|   945M|    46538
|*  4 |   FILTER                      |              |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID| BIG_TABLE    |     1 |    99 |        3
|*  6 |     INDEX UNIQUE SCAN         | BIG_TABLE_PK |     1 |       |        2
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(:X IS NULL)
3 - filter("ID" IS NOT NULL)
4 - filter(:X IS NOT NULL)
6 - access("ID"=:X)


Only a recommendation for someone that needs a tool for help...

Sven Bleckwedel, June 05, 2006 - 6:18 pm UTC

Hi Alan and others,

<quote>

My Oracle goal is that I would like to set up a monitoring methodology that would allow me to watch the basic health of the database. I would like to do things like:

- Check availability of machines and databases.
- Check available space.
- Check for O/S swapping or paging.
- Check performance of some key, custom, long running reports and processes.
- Check some key Oracle statistics like buffer hits, dictionary hits, sorts to memory/disk, etc.

What do you think?

Thanks,

Alan

Followup:
Your analogy is good (excellent actually) -- but just change "miles per gallon" into transactions/unit of time. Now you have MY analogy. You should have set up some measurable performance metric (like miles/gallon) beforehand -- then you can report performance.

Your analogy is perfect actually since different vehicles get different MPG -- and we expect it. A tractor trailer will be very different from a pickup, from an SUV, from a BMW, from a Toyota -- and we expect them to be different. For this reason, reporting back simple ratios/metrics is pretty much meaningless.

</quote>

After reading the previous words about this issue, I would say something to anyone that needs a tool to help monitor one to various (really) Oracle instances, no matter what equipment you have. Its name is Lab128 and it can be found at:

http://www.lab128.com

I'm using it to maintain control of many oracle instances with great success. With it I found the reasons for numerous bottlenecks (I/O interfaces) and contentions (latches, enqueues, etc.) and PROVED TO THE DEVELOPERS that they are building poorly tuned apps, explaining the plans ON-THE-FLY. With this tool, the words inside "Oracle Wait Interface" sound really true. Don't forget to reread the Oracle Concepts guide, also...

Regards,


Tom Kyte
June 05, 2006 - 6:51 pm UTC

well, I'm not sure I like this too much as it is posted by an employee of lab128...

Just thought I would point that out... I'm sure you are using it, however ...

plsql or sql

Roger, June 05, 2006 - 9:36 pm UTC

Hi Tom

Our application's backend is oracle 10; our client app only uses one dml sql statement per transaction to the db, like an update, insert or select. when concurrency goes up, soft parsing is really the concern. do you think it's still helpful to package all sql into pl/sql in this context? or is plain sql with session_cached_cursors efficient enough?

Thanks

Roger

Tom Kyte
June 06, 2006 - 8:22 am UTC

if your client application uses a single DML for *every* transaction - I have a real feeling your client application is totally "broken". I cannot imagine a system of any complexity where every single transaction is a single statement.


If you package the sql, PLSQL can cache the sql - but then you have to still parse the PLSQL block that would invoke it! catch 22

the APPLICATION can (and should) parse once, execute many. The developers do not have to close statements.
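A minimal sketch of "parse once, execute many" inside the database using dbms_sql (table name t2 is illustrative; a client-side language would accomplish the same by holding one prepared statement open and rebinding it):

declare
    l_cursor integer := dbms_sql.open_cursor;
    l_rows   integer;
begin
    -- parse ONCE...
    dbms_sql.parse( l_cursor, 'insert into t2 values ( :x )', dbms_sql.native );

    -- ...execute MANY, rebinding each time, never re-parsing
    for i in 1 .. 1000
    loop
        dbms_sql.bind_variable( l_cursor, ':x', i );
        l_rows := dbms_sql.execute( l_cursor );
    end loop;

    dbms_sql.close_cursor( l_cursor );
    commit;
end;
/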

plsql or sql

A reader, June 06, 2006 - 10:22 am UTC

Thanks Tom

Actually the application's logic is simple; the middleware inspects a bit in the xml body, let's say
if bit-a = 'i' then generate sql statement 'insert into x values (y)'; else if bit-a = 'u' then generate sql statement 'update....'; ... end if. do you have any advice for this case? should we package the middleware's decision logic and sql statements into pl/sql?
i have another question about performance: if we have a box with 16 cpus and there are 2 instances running on this box with almost identical workloads, should we set cpu_count to 8 for each instance or leave it at the default of 16?


Tom Kyte
June 06, 2006 - 11:22 am UTC

I still cannot believe that an application consists of single DML statements in its entirety. Sounds like something was done really wrong.



If the APPLICATION is incapable of caching (not parsing over and over) a simple insert or update, that application is likely incapable of caching (not parsing over and over) a call to a plsql routine.

The application *parses*, Oracle does what it is told. If the application is over parsing now and every transaction is a single statement - all you would accomplish by using plsql is doubling the number of statements the application uses (the procedure call AND the insert would be there) - but not doing anything about the number of times it parses (except perhaps INCREASING it)....


In answer to the last question, you should definitely shutdown one of the instances and combine them!

Don't touch cpu count in any case.

About Lab128 tool to check what's happening with Oracle instances...

Sven Bleckwedel, June 06, 2006 - 2:47 pm UTC

Hi Mr. Kyte,

<quote>

After reading the previous words about this issue, I would say something to anyone that needs a tool to help monitor one to various (really) Oracle instances, no matter what equipment you have. Its name is Lab128 and it can be found at:

http://www.lab128.com

I'm using it to maintain control of many oracle instances with great success. With it I found the reasons for numerous bottlenecks (I/O interfaces) and contentions (latches, enqueues, etc.) and PROVED TO THE DEVELOPERS that they are building poorly tuned apps, explaining the plans ON-THE-FLY. With this tool, the words inside "Oracle Wait Interface" sound really true. Don't forget to reread the Oracle Concepts guide, also...

Regards,

Followup:
well, I'm not sure I like this too much as it is posted by an employee of lab128...

Just thought I would point that out... I'm sure you are using it, however ...

</quote>

Please don't misunderstand my intention. I was only making mention of a tool that helped me a lot. I'm not a Lab128 employee, of course. I'm an Oracle DBA and a consultant for many businesses by myself, in my country. But because of the possibilities of this tool in helping many other DBAs, I opened a new business in my country to help them too, ONLY because it could REALLY help them. And I made mention of it in this issue only to help...

I talked a lot with the creator of this tool, making numerous suggestions, also. But this would not make me an employee...

And please take a look at this tool by yourself, if you wish. Probably you could find many interesting features (like the Activity Explorer that enhances Oracle ASH for any Oracle release, Buffer Explorer, Sessions and Session longops windows, Locks and locked objects, Tablespaces details, and many, many others that you could customize for your own needs). As the authors said in the "Oracle Wait Interface" book, sometimes it isn't possible to keep watch over many Oracle instances WITHOUT A TOOL that collects the performance data in some appropriate form. Enjoy!

Regards,
Sven


Tom Kyte
June 06, 2006 - 3:24 pm UTC

your email address has lab128 in the domain? That is something that led me to believe perhaps - well....

The good results matter...

Sven Bleckwedel, June 06, 2006 - 6:11 pm UTC

Hi Mr. Kyte,

In my opinion, it doesn't matter which domain I used in my e-mail address, because if you noted well, it is from another country. Would it be better if I used my Hotmail address? Or do you prefer Yahoo? First of all, I was (and remain) an end user of this tool, since many months ago. Later, after a lot of success using this tool, I noted also that there was an opportunity in being a representative in my own country. Second, I cannot see the link (on the page) that shows this mail address, also. Unfortunately the other viewers couldn't see it, too...

Using one or another e-mail address doesn't matter to me. I was only opening an opportunity for someone to ask something about this tool, when possible. But if they can't see the mail address, there is nothing to be worried about...

Regards


Tom Kyte
June 06, 2006 - 9:42 pm UTC

In my opinion, if your email address includes the name of the company - well, umm, you tell me. Does lab128 provide free email accounts as well as Oracle tools?

It was the fact you promoted the tool as a "casual end user", when it would appear you are somehow associated with the company.

If you aren't, apologies - but why would your email address be name@lab128.x.y ?

plsql or sql

Roger, June 06, 2006 - 10:50 pm UTC

Hi Tom

The single sql statement per transaction is shareable by other sessions; the transaction consists of only one sql statement, but this sql will be in the shared pool and other sessions can use this cached sql in the future. do you think pl/sql is still helpful?

Tom Kyte
June 07, 2006 - 6:51 am UTC

IF the application parses the sql each time (hard parses are very very very bad, soft parses are simply VERY VERY bad) then moving this into plsql will only have the effect I described above:

a) you will increase the amount of sql your shared pool has to hold, instead of just holding the insert, it holds the insert AND "begin call_procedure_to_do_insert(...); end;"

b) you will increase the amount of parsing that your system does, you have to parse the insert at least once and the application developers will likely still parse the plsql block to invoke it every silly time

c) since you have more sql to cache, and since plsql does not always cache 100% of the sql - you may well increase the number of hard parses (as things age out of this shared pool that is doing more work) and soft parses.


I don't care that the sql is shareable in the shared pool - your parsing problem is in the application, soft parses are really bad, hard parses are really really bad. The only good parse is NO PARSE.

The good results matter, again...

Sven Bleckwedel, June 07, 2006 - 12:04 pm UTC

Hi Mr. Kyte,

<quote>

First of all, I was (and remain) an end user of this tool, since many months ago. Later, after a lot of success using this tool, I noted also that there was an opportunity in being a representative in my own country. Second, I cannot see the link (on the page) that shows this mail address, also. Unfortunately the other viewers couldn't see it, too...

Using one or another e-mail address doesn't matter to me. I was only opening an opportunity for someone to ask something about this tool, when possible. But if they can't see the mail address, there is nothing to be worried about...

Regards

Followup:

In my opinion, if your email address includes the name of the company - well, umm, you tell me. Does lab128 provide free email accounts as well as Oracle tools?

It was the fact you promoted the tool as "casual end user", when it would appear you are somehow associated with the company.

If you aren't, apologies - but why would your email address be name@lab128.x.y ?

</quote>

Your apologies are accepted, but it wasn't easy to understand your last question. As I described before, I opened in my country a representation of this product, because it was an opportunity to offer a good tool and support for other DBAs, LOCALLY. Sorry if I wasn't clear enough in my previous posts, but I mentioned this before. The point of using this mail address in place of another was only to have a valid mail address, only. AND, you may have noted that the mail address was from a different domain than the link I mentioned, also...

And it wasn't my intention to promote this tool as a "casual end user". I'm an intensive user of this tool, every day. Why not? If this tool gives a complete view of what's happening BEHIND Oracle instances, why couldn't I give advice about it? Again, my intention was only to show anybody that it isn't necessary to "reinvent the wheel", as you have said about other things (in the past), and not to spend more time building something that already exists in the market, also...

Regards


Follow up from Dominic's Question in 2005

Robert, September 16, 2008 - 10:35 am UTC

Hi Tom,

Version 10.2.0.4

You replied to Dominic in 2005

<quote>

I'd look at things like

o transactions per second over time (should be fairly constant, no big swings)
o redo generated per second over time
o LIO/sec over time
o PIO/sec over time
o Parses/sec over time
o etc

</quote>

I am trying to create some meaningful reports on database load/performance from Statspack data. Your suggestions were just the kind of thing I was thinking about.

But now I am seeing in a database I inherited many opportunities for reducing *work* (i.e. logical i/o) for many common, highly used, queries by creating indexes on several tables.

These queries are already pretty fast, so response time won't be that noticeable. But they are doing many more logical I/O than necessary because of lack of some indexes.

What I would like to do is brag about how much I reduced the load on the database.

Nowadays we all know not to depend on summary values and ratios because aggregates hide the details, etc.... but would you comment on my plan to try to showcase my tuning endeavor? After I create a few indexes, the number of I/Os in the database will be greatly reduced.
Do you think there might be some overall averages I could use to show this... something with number of executions, number of I/Os, etc.?

(I understand I'm taking a chance of this being off topic but I think it may fit).

Thank you,

Robert.
Tom Kyte
September 16, 2008 - 11:20 pm UTC

.. Nowadays we all know not to depend on summary values and ratios because
aggregates hide the details, etc.... ...

no, we do RELY on them - we just interpret them for what they are.


eg: a cache hit ratio of 99% is good.

that is false - it might be good, it might be really bad, it might be indifferent. It is a number. when it CHANGES we want to ask "why"


If you get some aggregate number to change due to your work - your work explains why. You want to look at a bunch though - if LIO goes way down but PIO skyrockets - that is not good (probably) and so on. So, look at a bunch of indicators (response time would be BEST) and verify they all point to "good"

Tuning to response time

Robert, September 17, 2008 - 1:23 pm UTC

Tom,

To reply to your previous response about tuning to response time...

I understand the concept:
"the only performance metric that really counts is whether the user is happy with the response time"

But I have found some queries in v$sql which have 1000's of executions and have very quick response times, but which are using lots more LIO than necessary because of 'missing' indexes.

So this is a pro-active/scaling type of tuning issue.

Do you think this could be a valuable tuning exercise?

Is there any way I could show the overall impact to the database of tuning these queries?

Thank you,

Robert.
Tom Kyte
September 17, 2008 - 1:53 pm UTC

... Do you think this could be a valuable tuning exercise? ....

sure, especially if you have a future need to scale up.

... Is there any way I could show the overall impact to the database of tuning
these queries?
...

by showing the same or reduced response times coupled with lower cpu utilization, less physical IO and so on (eg: drop some numbers, coupled with faster queries)

Capturing Response Times

Robert, September 17, 2008 - 2:02 pm UTC

Tom,

Can I sum up the new 10g columns in v$sql for a given query...

APPLICATION_WAIT_TIME
CONCURRENCY_WAIT_TIME
CLUSTER_WAIT_TIME
USER_IO_WAIT_TIME
PLSQL_EXEC_TIME
JAVA_EXEC_TIME
etc.

to show pretty much the actual *response time*?
(or close enough to it for the most part)?

Thanks,

Robert.

Tom Kyte
September 17, 2008 - 2:23 pm UTC

well, add cpu time in - those are just waits.
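A hedged sketch of that per-statement arithmetic against the 10g v$sql columns named above (all of them, including cpu_time and elapsed_time, are reported in microseconds):

select sql_id,
       executions,
       round( elapsed_time / 1e6, 2 ) elapsed_secs,
       round( ( cpu_time
              + application_wait_time
              + concurrency_wait_time
              + cluster_wait_time
              + user_io_wait_time ) / 1e6, 2 ) cpu_plus_waits_secs
  from v$sql
 where executions > 0
 order by elapsed_time desc;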

Total Response Time

Robert, September 17, 2008 - 3:41 pm UTC

Tom,

Yeah.. that's what I meant. :)
It looks to me like we now have enough info in v$sql to calculate the total response time for a user..

(all wait times in v$sql) + CPU_TIME ... correct?

But then if that is correct then isn't this correct....
(all wait times in v$sql) + CPU_TIME = v$sql.elapsed_time ?

So therefore, can I just use V$SQL.ELAPSED_TIME as the accurate user *response time*?

Thanks!

Robert.
Tom Kyte
September 17, 2008 - 7:24 pm UTC

total response time to the application - the application might add significant processing time itself :)

That is why only applications can really report this stuff - we are missing the think time in the application....


the elapsed time would be good enough - yes. But you'd be looking for what caused the reduction (or increase) and then you'd be back to the individual numbers again.

Robert, September 18, 2008 - 2:49 pm UTC


what could be a good candidate as KPI

Ajeet, April 04, 2011 - 8:35 am UTC

Hello Tom,

I am working on a capacity planning exercise, to set up a capacity planning framework, or rather to enhance the existing one. It is a large environment, having quite a few databases, RAC, oracle ERPs, servers, plenty of concurrent users, mission critical apps and all. The question I have is: what are the few things which you would recommend as key performance indicators for such an environment - apart from response time?

CPU utilization, memory utilization, IO performance, run queue, network collisions, swap utilization, paging (OS)...

should DB time be a KPI from Oracle 10g onwards? does it make sense? does it serve any purpose which cpu data does not?

your thoughts - even if you think otherwise, it is ok to share.

Thanks

Alan -- Thanks for the question regarding "Top 10 Performance Indicators", version 817

newbie, October 28, 2014 - 3:15 am UTC

Hi Tom,
This question about "Top 10 Performance Indicators" was asked in 2001, with oracle 8i, 9i. At present, are your answers consistent with oracle 10g, 11g, 12c?

SQL Monitoring

A reader, September 28, 2016 - 8:59 am UTC

I keep an eye on sql execution via SQL Monitoring in EM, but I am confused by terms like Duration (30.0s), Database Time (30.1s) and IO Bytes (1GB). My question is: what are IO Bytes and IO Requests? Suppose a table fetch returns 100 records -- is the size of those records then 1GB?
