
Question and Answer

Connor McDonald

Thanks for the question, Ardiel.

Asked: November 06, 2019 - 3:49 pm UTC

Last updated: November 25, 2019 - 12:26 pm UTC

Version: 12.2.0.1

Viewed 1000+ times

You Asked

Hello Tom

I am a member of a site reliability engineering (SRE) team and we are trying to develop SRE "golden signals" for an Oracle 11g/12c database.

These signals are:

1) Throughput
2) Latency
3) Response Time
4) Error rate (not sure about this one)

How do I derive these measurements using SQL queries in an Oracle database?

My research has highlighted the following SQL queries:

1) Response time
-------------

-- the metric value is in centiseconds per call, so * 10 gives milliseconds
select to_char(begin_time, 'hh24:mi') time,
       round(value * 10, 2) "Response Time (ms)"
  from v$sysmetric
 where metric_name = 'SQL Service Response Time';
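To see a trend rather than a single point in time, a similar query can be run against V$SYSMETRIC_HISTORY, which retains roughly the last hour of metric snapshots. A sketch, assuming the same metric name (the GROUP_ID filter for 60-second interval metrics is an assumption worth verifying on your version):

```sql
select to_char(begin_time, 'hh24:mi') time,
       round(value * 10, 2) "Response Time (ms)"
  from v$sysmetric_history
 where metric_name = 'SQL Service Response Time'
   and group_id = 2          -- 60-second interval metrics
 order by begin_time;
```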



2) Throughput
----------

-- note: the FROM clause and self-join were missing in my draft;
-- 131072 assumes the default 8 KB block size (131072 blocks = 1 GB)
select a.begin_time, a.end_time,
       round((a.value + b.value) / 131072, 2) "GB per sec"
  from v$sysmetric a,
       v$sysmetric b
 where a.metric_name = 'Logical Reads Per Sec'
   and b.metric_name = 'Physical Reads Direct Per Sec'
   and a.begin_time  = b.begin_time;


I'm not sure how to measure latency.



and Connor said...

I'll be honest - I'm not a fan of these kinds of "signals", at least at the macro level.

For example, if I run 10 SQL statements with the following response times:

1ms
1ms
1ms
10secs
1ms
1ms
1ms
1ms
1ms
1ms

then my average response time is roughly 1 second.

1 second might be deemed "OK", but we've lost the fact that somewhere in there someone was stuck for 10 seconds waiting on their browser.
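The arithmetic behind that example can be sketched in a few lines (the timings are the hypothetical ones above - nine 1 ms calls and one 10-second outlier):

```python
# Hypothetical response times: nine fast calls plus one 10-second outlier
times_ms = [1] * 9 + [10_000]

mean_ms = sum(times_ms) / len(times_ms)   # the "golden signal" average
worst_ms = max(times_ms)                  # the outlier the average hides

print(f"mean = {mean_ms} ms, worst = {worst_ms} ms")
# → mean = 1000.9 ms, worst = 10000 ms
```

The mean looks like "about a second", while the worst-case user waited ten - which is exactly the information a macro-level average throws away.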

Rather than signals, look out for outliers that exceed your thresholds, and even those thresholds should be defined by business requirements. For example, the 10 seconds above is a disaster in a web browser, but 1,000 seconds might be totally fine for a nightly batch job.

And a good way of doing that is with AWR reports or, if you are not licensed for that, Statspack, because they capture information from views like V$EVENT_HISTOGRAM to give you a better picture of the distribution of the data.
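For example, a sketch of pulling the wait-time distribution for a single event straight from V$EVENT_HISTOGRAM ('db file sequential read' is just an illustrative event name):

```sql
select event,
       wait_time_milli,   -- upper bound of the histogram bucket, in ms
       wait_count         -- number of waits that fell into this bucket
  from v$event_histogram
 where event = 'db file sequential read'
 order by wait_time_milli;
```

A long tail of counts in the high-millisecond buckets is the outlier story that an average would have hidden.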


Comments

A reader, November 21, 2019 - 6:01 am UTC

I agree with Connor on this - these are just that, i.e. "signals".

And the idea of setting these up wears thin in the long run, especially when you start getting false positives.

If you have access to OEM, then you can configure alerts for TOP SQLs, long running SQLs, SQL throughput, IO, etc.

If you don't then you can utilize tools available at the OS level for disk statistics, TOP process etc.

If you are on cloud, then there are tools provided which will help you configure alerts, etc.

Cheers!

On getting to know those outliers

Rajeshwaran Jeyabal, November 21, 2019 - 10:36 am UTC

....
look out for outliers that are exceeding your thresholds,
....


On a typical transactional system, how do we know which particular execution took more than X secs?

(dba_hist_sqlstat doesn't expose the details of each execution; we only see cumulative values there)
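That said, the *_DELTA columns in DBA_HIST_SQLSTAT do give per-snapshot (though still not per-execution) figures, so a per-interval average can be derived. A sketch, assuming a bind variable :sql_id for the statement of interest:

```sql
select sn.begin_interval_time,
       st.executions_delta,
       round(st.elapsed_time_delta                 -- microseconds in this snapshot
             / nullif(st.executions_delta, 0)
             / 1e6, 3) avg_secs                    -- average seconds per execution
  from dba_hist_sqlstat  st
  join dba_hist_snapshot sn
    on sn.snap_id         = st.snap_id
   and sn.dbid            = st.dbid
   and sn.instance_number = st.instance_number
 where st.sql_id = :sql_id
 order by sn.begin_interval_time;
```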

Questions
..........
a) Perhaps by doing statement-level logging like this inside a stored proc?

create or replace procedure do_work(p_input number)
as
  type t_tab is table of t%rowtype;
  l_data  t_tab;
  l_start number;
  l_end   number;
begin
  dbms_application_info.set_module('do_work', 'input='||p_input);

  l_start := dbms_utility.get_time;  -- hundredths of a second
  select * bulk collect into l_data
    from t
   where x = p_input;
  l_end := dbms_utility.get_time;

  if (l_end - l_start) / 100 > SOME_THRESHOLD then
    dbms_application_info.set_module('do_work', 'input='||p_input
      ||' took '||((l_end - l_start) / 100)
      ||' secs - analyze it');
  end if;
end;
/


b) How about SQL coming from the application side, e.g. SQL generated by application frameworks (Spring Boot, Hibernate, etc.)?
Connor McDonald
November 25, 2019 - 12:26 pm UTC

ASH (Active Session History) is your friend here. SQL_ID and SQL_EXEC_ID can be used to look for specific executions of a long-running SQL.
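A sketch of that idea: since ASH samples active sessions roughly once per second, the sample count per SQL_EXEC_ID approximates elapsed active seconds (the 10-second threshold and :sql_id bind are illustrative):

```sql
select sql_id,
       sql_exec_id,
       min(sample_time) first_seen,
       max(sample_time) last_seen,
       count(*)         approx_secs   -- ~1 ASH sample per active second
  from v$active_session_history
 where sql_id = :sql_id
 group by sql_id, sql_exec_id
having count(*) > 10                  -- executions active for > ~10 seconds
 order by approx_secs desc;
```

Note that ASH is part of the Diagnostics Pack, so the same licensing caveat as AWR applies.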
