Skip to Main Content

Breadcrumb

more

Connor and Chris don't just spend all day on AskTOM. You can also catch regular content via Connor's blog and Chris's blog. Or if video is more your thing, check out Connor's latest video and Chris's latest video from their Youtube channels. And of course, keep up to date with AskTOM via the official twitter account.

Question and Answer

Connor McDonald

Thanks for the question, Ardiel.

Asked: November 06, 2019 - 3:49 pm UTC

Answered by: Connor McDonald - Last updated: November 25, 2019 - 12:26 pm UTC

Category: Database Administration - Version: 12.2.01

Viewed 100+ times

Whilst you are here, check out some content from the AskTom team: Oh those tricky exception sections! Why won't it handle my error?

You Asked

Hello Tom

I am a member of an site reliability team (SRE) and we are trying to develop SRE "golden signals" for an Oracle 11g/12c database.

These signal are:

1) Throughput
2) Latency
3) Response Time
4) Error rate (not sure about this one)

How do I derive these measurements using SQL queries in an Oracle database?

My research has highlighted the following SQL queries:

1) Response time
-------------

select to_char(begin_time,'hh24:mi') time, round( value * 10, 2) "Response Time (ms)"
     from v$sysmetric
     where metric_name='SQL Service Response Time'



2) Throughput
----------

select a.begin_time, a.end_time,round(((a.value + b.value)/131072),2) "GB per sec" 
where a.metric_name = 'Logical Reads Per Sec' 
and b.metric_name = 'Physical Reads Direct Per Sec' 
and a.begin_time = b.begin_time; 


Not sure about latency?



and we said...

I'll be honest - I'm not a fan of these kind of "signals", at least at the macro level.

For example, if I run 10 SQL statements with the following response time:

1ms
1ms
1ms
10secs
1ms
1ms
1ms
1ms
1ms
1ms

then my average response time is: 1second

1 second might be deemed as "OK", but we've lost the fact that somewhere in there was someone stuck for 10 seconds waiting on their browser.

Rather than signals, look out for outliers that are exceeding your thresholds, and even they should be as defined by business requirements. For example, the 10seconds above is a disaster on a web browser, but 1000seconds might be totally fine for a nightly batch job.

And a good way of doing that is with the AWR reports, or, if you are not licensed for that, things like Statspack because they'll capture information from things like V$EVENT_HISTOGRAM to give you a better distribution of data.

and you rated our response

  (2 ratings)

Reviews

November 21, 2019 - 6:01 am UTC

Reviewer: A reader

I agree with Connor on this.. these are just that i.e. "signals".

And the idea for setting these gets up in the long run.. especially when you start getting false positives.

If you have access to OEM, then you can configure alerts for TOP SQLs, long running SQLs, SQL throughput, IO, etc.

If you don't then you can utilize tools available at the OS level for disk statistics, TOP process etc.

If you are on cloud, then there are tools provided which will help you configure alerts, etc.

Cheers!

on getting to know those outliers.

November 21, 2019 - 10:36 am UTC

Reviewer: Rajeshwaran Jeyabal

....
look out for outliers that are exceeding your thresholds,
....


On a Typical transactional system - how do we know that which particular execution took more than X secs?

(dba_hist_sqlstat - doesn't expose the details at each execution, we see the cumulative value there)

Questions
..........
a) perhaps by doing statement level logging like this inside a stored proc?

create or replace procedure do_work(p_input number)
as
 l_start number;
begin
 dbms_application_info.set_module( 'do_work','input ='||p_input);
 
 l_start := dbms_utility.get_time();
  select * bulk collect into l_data
  from t 
  where x = p_input;
 l_end := dbms_utility.get_time();
 
 if (l_end - l_start) > SOME_THRESHOLD then 
  dbms_application_info.set_module( 'do_work','input ='||p_input
   ||' took more than '||(l_end - l_start)
   ||' secs - Analyze it');
 end if;
end;
/


b)How about sql's comming from application side? like sql generated from application frameworks (like Spring boot/hibernate frameworks etc).
Connor McDonald

Followup  

November 25, 2019 - 12:26 pm UTC

ASH is your friend here. SQL_ID/SQL_EXEC_ID can be used to look for specific instances of a long running SQL.

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database