How to Analyze the Performance History of a SQL Statement as Recorded in AWR Using Simple SQL (Doc ID 1580764.1)
In this Document
Goal
Solution
Identify the SQL_ID for the SQL you want to analyze.
Script to collect SQL execution history based on AWR data retention.
Execute the script xplawrall.sql which you downloaded above.
For FULL sample AWR data download the attached file
(a) Explanation for the data points in the sample output
(b) Data Analysis
How to fix the SQL with a good plan in a situation where multiple plans are used/available?
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Oracle Net Services - Version 10.2.0.5 to 10.2.0.5 [Release 10.2]
Information in this document applies to any platform.
GOAL
Identify and analyze a problematic SQL statement using AWR data, and fix it based on the most efficient plan.
SOLUTION
Identify the SQL_ID for the SQL you want to analyze.
If the SQL is currently executing or was executed very recently, it may still be available in the shared pool.
SQL> set long 10000000
SQL> select sql_id,sql_fulltext from v$sql where sql_text like '%PART_OF_THE_SQL%';
If you find the SQL, make note of the SQL_ID for the SQL and go to step #2.
If the SQL is not in the shared pool, query the AWR history instead:
SQL> set long 10000000
SQL> select sql_id,sql_text from dba_hist_sqltext where sql_text like '%PART_OF_THE_SQL%';
If you find the SQL, make note of the SQL_ID for the SQL.
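Once you have the SQL_ID, you can also quickly check which plans AWR has captured for it before running the full script. The following query is a simple sketch against the standard AWR views (replace the sample SQL_ID with your own):

```sql
-- Which plans AWR has captured for this SQL_ID, and when
select ss.plan_hash_value,
       min(s.begin_interval_time) first_seen,
       max(s.begin_interval_time) last_seen,
       sum(ss.executions_delta)   total_execs
from   dba_hist_sqlstat  ss,
       dba_hist_snapshot s
where  ss.sql_id          = '78q57fuchx12w'
and    ss.snap_id         = s.snap_id
and    ss.instance_number = s.instance_number
and    ss.dbid            = s.dbid
group  by ss.plan_hash_value
order  by last_seen;
```

If this returns more than one plan hash value, the analysis in the sections below applies.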
Script to collect SQL execution history based on AWR data retention.
Download the script xplawrall.sql
Execute the script xplawrall.sql which you downloaded above.
It will prompt for the SQL_ID and number of days to go back for AWR data. Enter the SQL_ID identified in step #1 above.
SQL> @xplawrall.sql
Enter value for sqlid: 78q57fuchx12w
old 18: b.sql_id = '&sqlid'
new 18: b.sql_id = '78q57fuchx12w'
Enter value for days_back: 180
old 20: and s.begin_interval_time >= sysdate-&days_back
new 20: and s.begin_interval_time >= sysdate-180
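The attached script is needed to reproduce the exact output shown in this note, but the core of what it reports can be sketched with a query like the following against DBA_HIST_SQLSTAT. This is an illustrative approximation of the script's logic, not the script itself:

```sql
-- Per-snapshot execution history for one SQL_ID (illustrative sketch)
select ss.plan_hash_value,
       ss.instance_number                             inst,
       ss.snap_id,
       to_char(s.begin_interval_time,'yyyymmdd:hh24') snap_time,
       ss.executions_delta                            delta_exe,
       round(ss.cpu_time_delta/1e6,4)                 delta_cpu_sec,
       round(ss.cpu_time_delta/1e6
             /nullif(ss.executions_delta,0),4)        cpu_exe,
       round(ss.elapsed_time_delta/1e6
             /nullif(ss.executions_delta,0),4)        ela_exe,
       round(ss.buffer_gets_delta
             /nullif(ss.executions_delta,0))          bg_exe,
       round(ss.disk_reads_delta
             /nullif(ss.executions_delta,0))          prd_exe,
       round(ss.rows_processed_delta
             /nullif(ss.executions_delta,0))          rows_exe
from   dba_hist_sqlstat  ss,
       dba_hist_snapshot s
where  ss.sql_id          = '&sqlid'
and    ss.snap_id         = s.snap_id
and    ss.instance_number = s.instance_number
and    ss.dbid            = s.dbid
and    s.begin_interval_time >= sysdate - &days_back
order  by ss.snap_id, ss.instance_number;
```

The time columns in DBA_HIST_SQLSTAT are in microseconds, hence the division by 1e6 to get seconds.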
For FULL sample AWR data download the attached file
Plan Analysis Sample Data
(a) Explanation for the data points in the sample output
PLAN_HASH_VALUE If there is more than one plan for the SQL, you will see multiple plan hash values under this column, as in the sample AWR data.
INST If the SQL runs on one or more instances of a RAC environment, this column tells you on which instance the SQL was executed.
SNAP_ID The AWR snapshot ID.
SNAP_TIME The AWR snapshot time.
DELTA_EXE The number of executions during that snapshot period.
DELTA_CPU_SEC The CPU seconds consumed by all executions during that snapshot period.
CPU_EXE The CPU seconds per execution (DELTA_CPU_SEC/DELTA_EXE). If there is more than one execution (DELTA_EXE > 1), this is an average; individual executions may differ.
ELA_EXE The elapsed time per execution (average). Again, if DELTA_EXE > 1, individual executions may differ.
BG_EXE The number of buffer gets (logical I/O) per execution (average); individual executions may differ.
PRD_EXE The number of blocks read from disk per execution (average); individual executions may differ.
ROWS_EXE The number of rows processed per execution (average); individual executions may differ.
PLAN_TABLE_OUTPUT Under this section you may see:
(i) the actual SQL text for the SQL_ID
(ii) the plan table with the plan hash value
(iii) the query block name for each operation ID in the plan table (Query Block Name / Object Alias (identified by operation id):)
(iv) the outline data used for the SQL at the time of execution (Outline Data)
(v) the bind data, if binds are used in the SQL (Peeked Binds (identified by position):). NOTE: Not all the bind data used in the last X days is reported here. If the collected data covers more than one day, the bind data shown in the report is from the oldest date.
Note -
IMPORTANT: In the sample output you see the following:
"- SQL profile "coe_78q57fuchx12w_2349384962" used for this statement"
This indicates that a SQL profile was applied to the SQL at the time of execution.
Here are a few other examples of what you may see under the Note section:
- statistics feedback used for this statement
- dynamic sampling used for this statement (level=2)
- Degree of Parallelism is 8 because of hint
- automatic DOP: Computed Degree of Parallelism of 8
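If you need bind values beyond the single set shown under Peeked Binds, captured binds for the SQL_ID can be queried directly from AWR. A simple sketch (bind values are only captured periodically, so coverage is partial; replace the sample SQL_ID with your own):

```sql
-- Bind values captured in AWR for this SQL_ID
select sb.snap_id,
       sb.name,
       sb.position,
       sb.datatype_string,
       sb.value_string,
       sb.last_captured
from   dba_hist_sqlbind sb
where  sb.sql_id = '78q57fuchx12w'
order  by sb.snap_id, sb.position;
```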
(b) Data Analysis
How to identify good and bad plans?
In the attached Word document, you see metrics for 3 different plan hash values. The lines highlighted in red and bold are bad plans, and the lines highlighted in green are good ones. The plans highlighted in red and bold are considered bad because, at a glance, their execution times are high (mostly 3 digits) compared to the ones in green (mostly 2 digits).
Compare and analyze the data
Scenario : 1 (compare good and bad plan data)
Let us take one line of metric data each from the good and the bad plan.
Data from Bad plan:
PLAN_HASH_VALUE INST SNAP_ID SNAP_TIME DELTA_EXE DELTA_CPU_SEC CPU_EXE ELA_EXE BG_EXE PRD_EXE ROWS_EXE
--------------- ---- ---------- ----------- ---------- ------------- ------------- ------------- ------------ ---------- ----------
1713891141 3 118090 20130827:04 1 71.738094 71.7381 269.9821 4207596 3389259 230
There was only one execution; it fetched 230 rows and took 269.98 seconds. The total number of blocks read to fetch those 230 rows was 7596855 (BG: 4207596 + PR: 3389259).
So, the average number of blocks read to fetch 1 row is 33029.80 ((4207596 + 3389259) / 230). That is because of the TABLE ACCESS FULL on XLA_EVENTS.
Data from Good plan:
PLAN_HASH_VALUE INST SNAP_ID SNAP_TIME DELTA_EXE DELTA_CPU_SEC CPU_EXE ELA_EXE BG_EXE PRD_EXE ROWS_EXE
--------------- ---- ---------- ----------- ---------- ------------- ------------- ------------- ------------ ---------- ----------
2349384962 1 112053 20130625:06 2 26.875913 13.4380 54.3881 657671 34007.5 37760
There were 2 executions (DELTA_EXE) during this snapshot period. It took only 54.38 seconds on average (ELA_EXE) to fetch 37760 rows on average (ROWS_EXE). The total number of blocks read to fetch those 37760 rows was 691678.5 (BG: 657671 + PR: 34007.5). So, the average number of blocks read to fetch 1 row is 18.31 ((657671 + 34007.5) / 37760). That is because of an index scan (INDEX RANGE SCAN on XLA_EVENTS_N3) instead of the TABLE ACCESS FULL on XLA_EVENTS seen in the bad plan.
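The blocks-per-row arithmetic for the two plans can be reproduced directly in SQL*Plus:

```sql
-- Blocks read per row returned: bad plan vs good plan
select round((4207596 + 3389259) / 230,   2) bad_blocks_per_row,  -- ~33029.8
       round((657671  + 34007.5) / 37760, 2) good_blocks_per_row  -- ~18.32
from   dual;
```

A three-orders-of-magnitude difference in blocks read per row is a strong signal that the access path, not the data volume, is the problem.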
Scenario : 2 (Why high elapsed time in GOOD plan?)
On a few occasions, you may see high elapsed times in the good plan that are comparable to the elapsed times of the bad plan. Here is the reason for the high elapsed time even in the good plan. Compare the data given above in Scenario 1 under "Data from Good plan" with the data below in this scenario; both belong to the same plan hash value. Below we see a high elapsed time (ELA_EXE) and high physical reads (PRD_EXE), while the number of rows fetched (ROWS_EXE) is lower than in Scenario 1 under "Data from Good plan". The number of blocks read per row in Scenario 1 under "Data from Good plan" is 18.31, and for the data below it is 16.96, yet the elapsed time is still high. As you know, a physical read is always more expensive and slower than a logical read. So, in this case, the reason for the high elapsed time is the high physical read count.
So, if the tables and indexes used by the SQL can be kept in the buffer cache (for example, in the KEEP pool), you may be able to avoid physical reads and improve performance:
PLAN_HASH_VALUE INST SNAP_ID SNAP_TIME DELTA_EXE DELTA_CPU_SEC CPU_EXE ELA_EXE BG_EXE PRD_EXE ROWS_EXE
--------------- ---- ---------- ----------- ---------- ------------- ------------- ------------- ------------ ---------- ----------
1 112163 20130626:10 1 11.177301 11.1773 230.4202 348393 73615 24874
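If you decide to pin the hot segments, the assignment might look like the following. This is a sketch only: it assumes db_keep_cache_size has been sized appropriately, and it uses the object names from this example.

```sql
-- Assumes db_keep_cache_size is configured large enough to hold
-- these segments; object names are taken from this example.
alter table xla_events    storage (buffer_pool keep);
alter index xla_events_n3 storage (buffer_pool keep);
```

Blocks are loaded into the KEEP pool only as they are read, so the benefit appears gradually after the change.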
Scenario : 3 (Can fragmentation cause high elapsed time?)
If you compare the metrics of this scenario with the scenarios above, the data in all the columns is almost comparable with the other 2 good-plan scenarios, except ROWS_EXE. The number of blocks read to fetch one row is 11213 ((324605 + 67851) / 35); that is, many more blocks were read to satisfy the condition. If you look at the plan for this plan hash value, you can see "INDEX RANGE SCAN" and "INDEX SKIP SCAN". The concentration of data in the index blocks may be very low, which causes more blocks to be read for fewer rows. This may be due to fragmentation in the index objects involved in this SQL.
PLAN_HASH_VALUE INST SNAP_ID SNAP_TIME DELTA_EXE DELTA_CPU_SEC CPU_EXE ELA_EXE BG_EXE PRD_EXE ROWS_EXE
--------------- ---- ---------- ----------- ---------- ------------- ------------- ------------- ------------ ---------- ----------
1 112240 20130627:05 1 11.11931 11.1193 166.5117 324605 67851 35
So, rebuilding the indexes might help.
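For example, using the index name from the good plan above (a sketch; note that ONLINE rebuild requires Enterprise Edition, and a rebuild consumes I/O and temp space, so schedule it appropriately):

```sql
-- Rebuild a suspect index without blocking DML (Enterprise Edition)
alter index xla_events_n3 rebuild online;

-- Optionally validate the structure afterwards and inspect INDEX_STATS
analyze index xla_events_n3 validate structure;
select height, lf_rows, del_lf_rows, pct_used from index_stats;
```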
How to fix the SQL with a good plan in a situation where multiple plans are used/available?
(a) If the application code can be changed, modify the SQL in the application to include the hints from the outline data of a good plan (copy everything between /*+ and */).
(b) Use SQL Plan Management (SPM) to swap in the good plan. Follow Doc ID 1580762.1.
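For reference, here is a sketch of the SPM approach for the case where the good plan is still in the cursor cache; see Doc ID 1580762.1 for the full procedure, including loading a historical plan from AWR via a SQL tuning set. The SQL_ID and plan hash value are the ones from this note's sample data.

```sql
-- Load only the good plan as an accepted SQL plan baseline
declare
  n pls_integer;
begin
  n := dbms_spm.load_plans_from_cursor_cache(
         sql_id          => '78q57fuchx12w',
         plan_hash_value => 2349384962);
  dbms_output.put_line(n || ' plan(s) loaded');
end;
/

-- Verify the baseline was created and is accepted
select sql_handle, plan_name, enabled, accepted
from   dba_sql_plan_baselines
order  by created desc;
```

Once the baseline is accepted, the optimizer will prefer that plan for matching statements, regardless of which plans remain in the shared pool.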