Database, SQL and PL/SQL

Beginning Performance Tuning: Diagnose the Past

Take the next step in diagnosing Oracle Database performance issues.

By Arup Nanda

November/December 2012

In “Beginning Performance Tuning,” in the July/August 2012 issue of Oracle Magazine, I demonstrated the basics of diagnosing performance issues in Oracle Database. In this article, I will show how to take performance tuning to the next level—identifying patterns and measuring resource utilization. As was the case with the previous article, you can create this article’s examples by executing the setup script—setup.sql.

Historical Waits

Recall from “Beginning Performance Tuning” that you can identify why a session is slow by looking at what event it is waiting for right now in the V$SESSION view. This technique is effective when you are able to peek into the view while the issue is occurring, but what if a user reports that there was an increase in the response time of a session but it is back to normal now? Because the issue is no longer present, the session may appear to be working productively with the STATE column value of WAITED KNOWN TIME. What’s worse, the current wait event shown in the V$SESSION view may be something unrelated to what the user encountered earlier. To diagnose this issue, you need to know all the different events the session had to wait for—not just what it is waiting for now. How difficult is it to get that information?

Fortunately, it’s quite trivial. There is another view—V$SESSION_EVENT—that shows the details of all the wait events the session has ever waited for. Here are the important columns in that view:

SID: the session ID
EVENT: the name of the wait event experienced by the session
TOTAL_WAITS: the total number of times the session had to wait for that wait event
TIME_WAITED: the total length of time the session had to wait for that event, recorded in cs (centiseconds [hundredths of a second])
AVERAGE_WAIT: the average length of time (also in cs) the session waited for that event
MAX_WAIT: the maximum length of time (in cs) the session waited for that event

Let’s examine the usage of the view with an example. Make sure you have set up the example as described in “Test Case Setup.” Connect to the Oracle Database instance as ARUP, and execute the test1.sql script, also available in “Test Case Setup.” (Note that the script will take some time to complete). While that script is running, connect as SYS from a different session and find the SID of the ARUP session as follows:

select sid from v$session
where username = 'ARUP';
 SID
—————
  37

Now display the session events of SID 37 by executing the script in Listing 1. Because the V$SESSION_EVENT columns report time in centiseconds, I have multiplied them by 10 to show them in milliseconds (ms), a unit of measure that is a bit more familiar. Study the output carefully—it shows the various events the session waited for earlier, not what it is experiencing now. You can see that the session has waited for one event in particular: “kfk: async disk IO.” The session has waited 904,818 times for this event, but it has waited for a total of only 3,050 ms. The average wait shows zero, but it is simply because the average wait was so short that it couldn’t be shown within the two digits after the decimal point. So although the total number of waits for this event was high, it added only 3,050 ms to the overall time of the session—an insignificant number. You can thus rule this wait event out as a cause of delay for the session.

Code Listing 1: History of wait events in a specific session

set lines 120 trimspool on
col event head "Waited for" format a30
col total_waits head "Total|Waits" format 999,999
col tw_ms head "Waited|for (ms)" format 999,999.99
col aw_ms head "Average|Wait (ms)" format 999,999.99
col mw_ms head "Max|Wait (ms)" format 999,999.99
select event, total_waits, time_waited*10 tw_ms,
       average_wait*10 aw_ms, max_wait*10 mw_ms
from v$session_event
where sid = 37
/
                                  Total      Waited     Average         Max
Waited for                        Waits    for (ms)   Wait (ms)   Wait (ms)
—————————————————————————— ————————————  ———————————  ——————————  —————————
Disk file operations I/O              8         .00         .10         .00
KSV master wait                       2      350.00      173.20      340.00
os thread startup                     1       20.00       19.30       20.00
db file sequential read               5      160.00       32.10       70.00
direct path read                  1,521   51,010.00       33.50      120.00
direct path read temp           463,035  513,810.00        1.10      120.00
direct path write temp               20      370.00       18.70       50.00
resmgr:cpu quantum                   21      520.00       24.60      110.00
utl_file I/O                          8         .00         .00         .00
SQL*Net message to client            20         .00         .00         .00
SQL*Net message from client          20    9,620.00      481.20    9,619.00
kfk: async disk IO              904,818    3,050.00         .00         .00
events in waitclass Other            35       20.00         .70       20.00

In this case, you should focus your attention on the event that caused the session to wait for the maximum time. From the output in Listing 1, you can see that session 37 (SID = 37) waited 513,810 ms, or more than 8.5 minutes, for the “direct path read temp” event. Every time the session waited for this event, it waited 1.1 ms on average, so if you can reduce the time for this event, you can reduce the overall time for the session. Looking at the session event history enables you to identify the biggest contributors to the delay in the session, whether they are currently affecting the session or have already affected it.

Did you notice the column for the maximum time waited: “Max Wait (ms)”? Why is that information useful? You see, the average wait time does not tell the whole story. Consider the “SQL*Net message from client” event in the output in Listing 1. The session waited 20 times with an average wait time of 481 ms for that event. Does that mean that the session waited for approximately 481 ms at each of the 20 occurrences or that the session waited a very short time for most of the event instances and a very long time for one event? The latter will skew the average to a high value but will indicate an isolated issue rather than a persistent one—the two possibilities lead to very different conclusions.

The “Max Wait (ms)” column shows the maximum time the session had to wait for one occurrence of this event. The value here is 9,619 ms, and because the total wait time was 9,620 ms, it appears that the session waited 9,619 ms on one occasion, leaving a 1 ms combined total for the other 19 occasions—a very small wait each time. Considering the single incidence of a large wait, this event should not be a general cause of concern. On the other hand, had you seen a maximum time close to the average time, you could surmise that all occurrences had to wait about the same amount of time. In such a case, reducing the time for this event would likely apply uniformly to all occurrences and consequently reduce the overall elapsed time.

Although the V$SESSION_EVENT view shows what the session waited for earlier, it doesn’t show when. That information is visible in another view—V$ACTIVE_SESSION_HISTORY (part of the extra-cost Oracle Diagnostics Pack)—which is beyond the scope of this article.

Statistics

Although wait events are great for helping with understanding the speed bumps the sessions experience, they do not show another important attribute of sessions: the use of resources such as CPU, I/O, and memory. A resource-hogging session deprives other sessions of the same resources, thus causing performance issues. When the root of the problem is that the session is consuming too much CPU, you should look at resource consumption—not the events waited for—by a session. Fortunately, finding that information is quite simple: it’s visible in a view named V$SESSTAT, which has three columns:

SID: the session ID
STATISTIC#: the ID of the statistic being captured (You can get the name of the statistic from the NAME column in another view—V$STATNAME—by using this ID.)
VALUE: the value of the statistic

Let’s see how to use this information with a performance problem example.

CPU Spike

Suppose you’ve heard from several users that the performance is terrible across the board. Further, the UNIX system administrator has reported that both CPU and memory consumption are very high on the server and most of the consumption is by Oracle Database–owned processes. Sound familiar? From my experience, it’s the second-most-common performance issue in Oracle Database systems. You can reproduce this problem by running the test1.sql script, available in this article’s example scripts, as user ARUP.

To diagnose this resource issue, review the top resource-consuming processes by issuing the top command at the UNIX command prompt. Listing 2 shows the output of the top command.

Code Listing 2: Output of the top command

top - 10:56:49 up 18 days, 18:48,  4 users,  load average: 1.02, 0.92, 0.48
Tasks: 180 total,   2 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.8%us,  0.5%sy,  0.0%ni, 49.2%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1815256k total,  1771772k used,    43484k free,    66120k buffers
Swap:  2031608k total,   734380k used,  1297228k free,   747740k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5946 oracle    25   0  706m 177m 159m R  100 10.0   9:20.26 oracle
 6104 oracle    15   0  2324 1060  800 R    1  0.1   0:00.12 top
31446 oracle    15   0  688m 135m 129m S    0  7.7   0:08.24 oracle
… output truncated …

In the output in Listing 2, you can see that the process with ID 5946 consumes the most CPU (100 percent) and memory (10 percent) and therefore should be the focus of your attention. To find out more about the process, enter the following command at the UNIX prompt:

$ ps -aef|grep 5946
oracle    5946  5945 63 10:59 ?
00:01:52 oracleD112D2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

The output shows the entire description of the process, which is clearly an Oracle “server process”—a process that is created by Oracle Database when a session is established—and that the process has been running for 1 minute and 52 seconds. The next question, then, is which Oracle Database session this process was created for. For that, you should look into another view—V$PROCESS—where the SPID column shows the server process ID. However, this view does not show the session information, so you need to join this view with the familiar V$SESSION view, as follows:

select sid
from v$session s, v$process p
where p.spid = 5946
and s.paddr = p.addr;
SID
———
37

Once you know the SID, you can get everything you need to know about the session—the user who established the session, the machine it came from, the operating system user, the SQL it is executing, and so on—from the V$SESSION view. This script for getting the information was described in Listing 5 in “Beginning Performance Tuning,” in the July/August 2012 issue of Oracle Magazine. To find the SQL being run by session 37, use this query:

select sql_fulltext
from v$sql l, v$session s
where s.sid = 37
and l.sql_id = s.sql_id;

Here is the output:

select max(test1.owner)
from test1, test2, test2, test2,
     test2, test2, test2, test2,
     test2, test2, test2, test2,
     test2, test2, test2, test2,
     test2, test2, test2, test2,
     test2, test2, test2, test2;

This SQL is performing multiple Cartesian joins, so it’s no wonder it’s consuming so much CPU and memory.

Now that you have found the culprit, should you just go ahead and kill the session to release the CPU consumption? Not so fast. First, you need to know if the CPU consumption was recent or if the session has been chewing it up since the beginning. This is where the V$SESSTAT view comes in very handy—it shows the resource consumption (CPU in this case) by a specific session. To find out the CPU used by session 37, you would use the following query:

select s.value
from v$sesstat s, v$statname n
where s.sid = 37
and n.statistic# = s.statistic#
and n.name = 'CPU used by this session';
VALUE
—————
47379

The output shows the number of CPU “ticks” that have been consumed by this session since it started. Considering that this session has been running for about two minutes, the CPU consumption is pretty high, so it is likely that this session has been consuming CPU all the time. Again, checking the session’s other details, such as the SQL it is executing, makes it fairly easy to understand why this is the case: the session is performing a multiple-table Cartesian join, which is bound to consume a lot of CPU. At this point, you may decide to kill the session to stop the high CPU consumption, or you may decide to let it run for now and fix the SQL later to avoid the Cartesian product.

All Statistics

Let’s revisit the current problem by checking the CPU consumption once again with this:

select s.valuez
from v$sesstat s, v$statname n
where s.sid = 37
and n.statistic# = s.statistic#
and n.name = 'CPU used by this session';
VALUE
—————
69724

Now the result—the CPU used—is 69,724. Note that this number is larger than the number the last time I checked CPU usage (47,379). This is because the statistic value increases over time. The first time I checked CPU usage, I surmised that the multitable Cartesian product was to blame for that CPU consumption—but can I prove it? The answer is a resounding “Yes, I can, using other statistics.”

Listing 3 shows the query for gathering all the statistics for session 37. In the output, note the “table scan rows gotten” statistic value: 1.0236E+10—about 10 billion rows! This is indeed a very high number of rows to be accessed by one session in two minutes. The value for the “consistent gets” statistic is 25,898,543—about 25.9 million blocks read from the buffer cache. The high number of buffer gets takes up a considerable amount of CPU.

Code Listing 3: All session statistics

select name, value
from v$sesstat s, v$statname n
where sid = 37
and n.statistic# = s.statistic#
order by value desc
/
NAME                                        VALUE
———————————————————————————————   ———————————————
table scan rows gotten                 1.0236E+10
session logical reads                    25898547
consistent gets                          25898543
table scan blocks gotten                 25325165
session pga memory max                   21250020
session pga memory                       21250020
session uga memory max                   20156552
session uga memory                       20156552
bytes sent via SQL*Net to client           878760
recursive calls                            576848
opened cursors cumulative                  143367
parse count (total)                        143292
parse count (hard)                         143118
table scans (short tables)                 143086
sql area evicted                           141996
DB time                                     70007
CPU used by this session                    69724
… output truncated …

Another cause of CPU consumption is the parsing of SQL statements. In the output in Listing 3, note the “parse count (total)” statistic, a very high number at 143,292. In plain English, it means that the session had to parse—not just execute—SQL statements that many times in about two minutes, which is quite unusual. Examine the SQL statement executed by the session (shown in the script test2.sql script in “Test Case Setup”). You can see that it creates distinct literal SQL statements. Each literal SQL statement needs to be parsed, which is something you have confirmed by looking at the parse count. Therefore, you surmise from the Listing 3 output that there are two causes of high CPU usage for this session: a high number of buffer gets and a high number of parses.

In Listing 3, also note two other sets of statistics: “session pga memory max” and “session uga memory max,” which indicate the total memory consumed by the session. The very high numbers explain the high memory consumption by the Oracle server process that you noticed in the output of the operating system top command earlier. If you want to reduce the CPU and memory consumption of the server, you need to make sure the session consumes fewer of these resources, by appropriately modifying the SQL statement it issues.

Redo Spike

Occasionally you may have a performance issue that will not appear as clearly at the OS level as CPU and memory consumption. One such case is redo generation by the database instance, which, in turn, increases both the rapid switching of redo logs and the creation rate and number of archived logs. This may cause an increase in overall I/O on file systems (or Oracle Automatic Storage Management disk groups), causing a systemwide performance issue. To alleviate this type of issue, you need to locate the session or sessions that caused the generation of high amounts of redo, but looking at OS metrics will not provide any insights into the offending session. In this case, you need to look at the sessions responsible for most of the load: the sessions generating maximum redo. Again, this information is available quite easily in the same V$SESSTAT view. The following query shows the sessions generating the most redo:

select sid, value
from v$sesstat s, v$statname n
where n.statistic# = s.statistic#
and n.name = 'redo size'
order by value desc;
 SID     VALUE
————  ————————
 13   11982752
 10    3372240
 17     964912
 26     571324
… output truncated …

It’s clear from the output that SID 13 produced most of the redo, followed by SID 10, and so on.

The Setup sidebar describes how to find the SQL statement that is generating the redo, provides a replacement SQL statement, and tests that new statement for redo generation.

Other Statistics I’ve shown you how to use the redo size, session pga memory max, and CPU statistics used by a session. Here are some other useful statistics visible in the V$SESSTAT view:

physical reads: the number of database blocks retrieved from disk
db block changes: the number of database blocks changed in the session
bytes sent via SQL*Net to client: the bytes received from the client over the network, which is used to determine the data traffic from the client

These are just a few of the 604 such statistics available in the V$SESSTAT view. It is not possible to describe each one of them, but I hope you get the general idea of how to use statistics to see the various resources used by a session and focus on performance issues. (Note that there is another view—V$SYSSTAT—that shows the statistics of the entire instance.)

Conclusion

In this article, you learned about two very important sources of performance tuning information in Oracle Database: (1) the history of wait events experienced by sessions—visible in V$SESSION_EVENT, and (2) the resource consumption by session—visible in V$SESSTAT. From the history, you can find out why a session has waited in the past and for how long—very useful information for diagnosing performance issues when they are no longer present. The resource statistics show the consumption of various resources such as CPU, memory, and redo by a specific session—very useful for focusing on resource-hogging sessions. Using these two views and scripts mentioned in this article, you can resolve many performance issues encountered in Oracle Database instances. Happy tuning!

Test Case Setup

To set up the test cases for this article, execute the following script. It assumes that you have access to the SYS user, you can create a user called ARUP (which means that you don’t have a user by the same name), and you have a USERS tablespace with at least 64 KB of free space.

Connect as the SYS user, and execute the following:

-- Script: setup.sql
connect sys/<password> as sysdba
create user arup identified by arup
default tablespace users
/
alter user arup quota unlimited on users
/
-- now connect as arup
connect arup/arup
-- drop the tables if present already
drop table test1
/
drop table test2
/
drop table test3
/
create table test1
as
select * from all_objects
/
create table test2
as
select * from test1
/
create table test3
as
select rownum, created
from test3
/
-- Script test1.sql
declare
    l_dummy varchar2(200);
begin
    select max(test1.owner)
    into l_dummy
    from test1, test2, test2, test2, test2,
        test2, test2, test2, test2,
        test2, test2, test2, test2,
        test2, test2, test2, test2,
        test2, test2, test2, test2,
        test2, test2, test2, test2;
    dbms_lock.sleep(120);
end;
/
-- Script test2.sql
declare
    l_dummy_1   varchar2(2000);
    l_dummy_2   varchar2(2000);
    l_stmt      varchar2(2000);
begin
    for i in 1..71540 loop
        l_stmt :=
            ‘select to_char(col2,’’mm/dd/yyyy hh24:mi:ss’’)’||
            ‘ from test3’||
            ‘ where col1 = to_char(‘||i||’)’;
        dbms_output.put_line(‘l_stmt=’||l_stmt);
        execute immediate l_stmt into l_dummy_1;
        l_stmt :=
            ‘select col1 ‘||
            ‘from test3 ‘||
            ‘where col2 = to_date(‘’’||
            l_dummy_1||
            ‘’’,’’mm/dd/yyyy hh24:mi:ss’’)’||
            ‘ and col1 = ‘’’||to_char(i)||’’’’;
        dbms_output.put_line(‘l_stmt=’||l_stmt);
        execute immediate l_stmt into l_dummy_2;
    end loop;
end;
/

Next Steps

READ “Beginning Performance Tuning”

LEARN more about performance tuning
Oracle Database 2 Day + Performance Tuning Guide 11g Release 2 (11.2)

Arup Nanda

Arup Nanda has been an Oracle DBA since 1993, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine’s DBA of the Year in 2003 and received an Oracle Excellence Award for Technologist of the Year in 2012.

DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.

Share

Beginning Performance Tuning: Diagnose the Past

Take the next step in diagnosing Oracle Database performance issues.