Skip to Main Content
  • Questions
  • Sub-query in WHERE statement "breaks" optimization?

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Alexander.

Asked: June 17, 2016 - 2:34 pm UTC

Last updated: June 20, 2016 - 3:54 am UTC

Version: 11.2.0.2

Viewed 1000+ times

You Asked

We have a large data warehouse with data going back over 10 years. Asked to write a query to generate a month to date summary I ran into a unexpected issue.

There is a table with all invoiced sales, like

sales_tab (
  source_system Varchar2(10),
  invoice_date  date,
  part_no       varchar2(10),
  qty           number,
  net_amount    number,
  cost          number,
  ...
  ...)


I built a query like:

SELECT part.vendor,
       ...
       SUM( sales.net_amount ),
       SUM( sales.cost )
  FROM sales_tab sales
 INNER JOIN part_information_tab part ON part.source_system = sales.source_system AND part.part_no = sales.part_no
  LEFT OUTER JOIN <other part info>
  LEFT OUTER JOIN <more part info>
 WHERE sales.source_system = 'SystemA'
   AND invoice_date >= TRUNC(SYSDATE, 'MM')
 GROUP BY part.vendor, ...


Everything went well, and I got good performance. Looking at the plan for the query I could see that Oracle was using the existing index on the sales_tab.invoice_date column to do a index range scan.

But the finance department has in its infinite wisdom decided that our accounting periods shouldn't be the calendar months. Well, no problem I thought, and replaced
TRUNC(SYSDATE, 'MM')
with
(SELECT MAX(START_DATE) FROM ACCOUNTING_PERIODS_TAB WHERE source_system = 'SystemA' AND START_DATE <= TRUNC(SYSDATE))


The performance tanked completely. :( The optimizer here decides to do full table access joins of sales and part before applying any filtering, which means that it is going through several years worth of data instead of only the last month. I can only assume that the optimizer evaluates the first expression earlier and then realizes the benefit of using the index.

I tried influencing the optimizer with hints like INDEX specifying the invoice_date index, but however I tried it was ignored. I tried PUSH_SUBQ to try and make it evaluate the sub-query earlier so it could use the answer better, but that didn't help either.


Is there some other hint I should be looking at, or is there some other way I should rewrite the query to try to regain the performance?

Best regards,


and Connor said...

OK, lets look at cause first. Here's a demo that is comparable


SQL> drop table t cascade constraints purge;

Table dropped.

SQL> create table t ( pk int, invoice_date date , other_data char(100) )
  2  tablespace DEMO;

Table created.

SQL> col x new_value y
SQL> select sysdate - date '2005-12-31' x from dual;

         X
----------
3824.45914

1 row selected.

SQL> insert /*+ APPEND */ into t
  2  select rownum, date '2005-12-31' + rownum * &&y / (10000*10000), rownum
  3  from
  4  ( select 1 from dual connect by level <= 10000 ),
  5  ( select 1 from dual connect by level <= 10000 );

100000000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL> create index IX on T ( invoice_date ) tablespace DEMO;

Index created.

SQL> drop table acct_periods purge;

Table dropped.

SQL> create table acct_periods ( p int primary key, d date );

Table created.

SQL>
SQL> insert into acct_periods values (1, date '2016-02-01');

1 row created.

SQL> insert into acct_periods values (2, date '2016-03-01');

1 row created.

SQL> insert into acct_periods values (3, date '2016-04-01');

1 row created.

SQL> insert into acct_periods values (4, date '2016-05-01');

1 row created.

SQL> insert into acct_periods values (5, date '2016-06-01');

1 row created.

SQL> insert into acct_periods values (6, date '2016-07-01');

1 row created.

SQL> insert into acct_periods values (7, date '2016-08-01');

1 row created.

SQL>
SQL> commit;

Commit complete.


SQL> set autotrace traceonly explain
SQL> select max(other_data)
  2  from t
  3  where invoice_date > trunc(sysdate,'MM');

Execution Plan
----------------------------------------------------------
Plan hash value: 2143077847

---------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |     1 |   109 |  9837   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |      |     1 |   109 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |   508K|    52M|  9837   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX   |   508K|       |  1352   (1)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("INVOICE_DATE">TRUNC(SYSDATE@!,'fmmm'))

SQL>
SQL> select max(other_data)
  2  from t
  3  where invoice_date >
  4   ( select max(d)
  5     from acct_periods
  6     where d < trunc(sysdate)
  7   );

Execution Plan
----------------------------------------------------------
Plan hash value: 459069127

-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |     1 |   109 | 17395   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |              |     1 |   109 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T            |  5000K|   519M| 17395   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX           |   900K|       |  2388   (1)| 00:00:01 |
|   4 |     SORT AGGREGATE                   |              |     1 |     8 |            |          |
|*  5 |      TABLE ACCESS FULL               | ACCT_PERIODS |     5 |    40 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("INVOICE_DATE"> (SELECT MAX("D") FROM "ACCT_PERIODS" "ACCT_PERIODS" WHERE
              "D"<TRUNC(SYSDATE@!)))
   5 - filter("D"<TRUNC(SYSDATE@!))



Put aside for a moment that both are using index access. The key thing to look at is the estimated row counts. In the first example, our estimate is 508,000 rows. Check out the true value below - our estimate was spot on, because we *know* in advance the invoice date value we are using. Now look at the second plan...once the subquery got involved, we bailed out and went with 5000K, which is a "default", ie, 5% of the total number of rows. That is why your plans are more at risk, because we dont have as much good information to work with.

SQL> set autotrace off
SQL> select count(*) from t where invoice_date > trunc(sysdate,'MM');

  COUNT(*)
----------
    508808

1 row selected.

SQL>


But we should be able to rectify with a hint, so I'm surprised you said that wasnt working. To bring this more into alignment with yours, I'll make indexes very expensive so it opts for a full scan:

SQL> exec dbms_stats.set_system_stats('MBRC',128)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_index_stats('','IX',CLSTFCT=>10000*10000 );

PL/SQL procedure successfully completed.

SQL> alter session set optimizer_index_cost_adj = 800;

Session altered.

SQL> set autotrace traceonly explain
SQL> select max(other_data)
  2  from t
  3  where invoice_date >
  4   ( select max(d)
  5     from acct_periods
  6     where d < trunc(sysdate)
  7   );

Execution Plan
----------------------------------------------------------
Plan hash value: 2217154347

-------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |     1 |   109 |   289K  (1)| 00:00:12 |
|   1 |  SORT AGGREGATE      |              |     1 |   109 |            |          |
|*  2 |   TABLE ACCESS FULL  | T            |  5000K|   519M|   289K  (1)| 00:00:12 |
|   3 |    SORT AGGREGATE    |              |     1 |     8 |            |          |
|*  4 |     TABLE ACCESS FULL| ACCT_PERIODS |     5 |    40 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_DATE"> (SELECT MAX("D") FROM "ACCT_PERIODS"
              "ACCT_PERIODS" WHERE "D"<TRUNC(SYSDATE@!)))
   4 - filter("D"<TRUNC(SYSDATE@!))
   
   


But when we add an index hint, the optimizer respects our wishes

SQL> select /*+ index(t) */ max(other_data)
  2  from t
  3  where invoice_date >
  4   ( select max(d)
  5     from acct_periods
  6     where d < trunc(sysdate)
  7   );

Execution Plan
----------------------------------------------------------
Plan hash value: 459069127

-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |     1 |   109 |   290K  (1)| 00:00:12 |
|   1 |  SORT AGGREGATE                      |              |     1 |   109 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T            |  5000K|   519M|   290K  (1)| 00:00:12 |
|*  3 |    INDEX RANGE SCAN                  | IX           |   900K|       | 19105   (1)| 00:00:01 |
|   4 |     SORT AGGREGATE                   |              |     1 |     8 |            |          |
|*  5 |      TABLE ACCESS FULL               | ACCT_PERIODS |     5 |    40 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("INVOICE_DATE"> (SELECT MAX("D") FROM "ACCT_PERIODS" "ACCT_PERIODS" WHERE
              "D"<TRUNC(SYSDATE@!)))
   5 - filter("D"<TRUNC(SYSDATE@!))


Hence in your case, there is a chance that your query is being transformed before optimization, that your index hint is no longer being respected. Take a look at an 10053 trace for clues there.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library