I was recently asked to find out why an SQL in our web solution took so much time to execute.
- The SQL involved querying vast amounts of data
- Without the expression involved in the query which took much time, the SQL took less than a second to execute.
- Our DBA used an "optimizer advisor" tool to make an optimal execution plan. It then ran in less than a second.
The part which took time was (at the top of the query):
-
SELECT DECODE(COUNT(*), 0, 0, 1)
combined with (inside the query)
-
AND NVL(gu.dstopdate + in_exclude_prior_days, SYSDATE) >= SYSDATE
Question
Is there a way to get the optimizer to automatically find the best explain plan for an SQL with an expression included
Note
I did find a way to always get an optimal result by simply making the expression run on only the resulting query rows instead of the whole query rowset. But that is besides the point. An answer to the question above would make life a lot simpler for our DBA and of course our customers.
Getting accurate estimates for expressions is a common problem. There are various ways you can do this in Oracle Database.
A basic way is to specify the expression in the method_opt when gathering stats. e.g.:
dbms_stats.gather_table_stats ( user, '...', method_opt => 'for columns ( c1 + c2 )' );
This is easy to forget though.
You can get the database to detect these expressions automatically. To do this use DBMS_STATS.SEED_COL_USAGE to monitor a workload. For more details see Maria's post:
https://blogs.oracle.com/optimizer/post/how-do-i-know-what-extended-statistics-are-needed-for-a-given-workload You could also use the expression in a virtual column. The database will then gather stats on it like a regular column. This can also simplify your SQL, as you can use the virtual column instead of the expression - though this is unnecessary. The database will detect there's a virtual column matching the expression, even if you don't access it.
Note you can't use sysdate or other non-deterministic function in a virtual column as in your example. A virtual column that adds the column together should be enough to improve estimates though.
Here's a demo of a similar scenario. The query finds 1816 rows, but initially the optimizer estimates it'll return just 500:
create table t (
c1 int, c2 date
);
insert into t
select level, date'2020-01-01' + ( level / 24 )
connect by level <= 10000;
exec dbms_stats.gather_table_stats ( user, 't', method_opt => 'for columns ( c1 + c2 )' );
commit;
select count(*) from t
where nvl ( c1 + c2, sysdate ) <= sysdate;
COUNT(*)
----------
1816
select * from dbms_xplan.display_cursor ( format => 'BASIC +ROWS' );
-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | SORT AGGREGATE | | 1 |
| 2 | TABLE ACCESS FULL| T | 500 |
-------------------------------------------
After adding a virtual column adding c1 and c2, and regathering stats, the optimizer gets an almost perfect estimate of 1817 rows. This happens whether you use the virtual column or stick with the original expression:
alter table t
add c3 as ( c1 + c2 );
exec dbms_stats.gather_table_stats ( user, 't' );
select count(*) from t
where nvl ( c1 + c2, sysdate ) <= sysdate;
select * from dbms_xplan.display_cursor ( format => 'BASIC +ROWS' );
-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | SORT AGGREGATE | | 1 |
| 2 | TABLE ACCESS FULL| T | 1817 |
-------------------------------------------
select count(*) from t
where nvl ( c3, sysdate ) <= sysdate;
select * from dbms_xplan.display_cursor ( format => 'BASIC +ROWS' );
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 1817 | 14536 | 9 (0)| 00:00:01 |
---------------------------------------------------------------------------