We have an old query that was running fine (though it could use improvements in general) but after having a VPD policy put in place, it started to run long and not complete. The difference between an 1 - 1.5 hours to not completing after running for well over 48 hours.
After digging into the VPD policy in place, in this scenario of an application ID, they did not want to apply the policy so the predicate they were adding was just "1 = 1." From what I have experienced (and after doing some research) I haven't really found anything where 1 = 1 could really affect a query.
In this case, when we run the explain plan for the query in an environment where the new policies are not present, we get a different explain plan when the "1 = 1" is added into the query.
In a nutshell, this query will select against a large detail table, but it must first check whether detail has been completed (subquery with the vw_header_detail view) and then it will need to check whether the order was modified in the window of time we are looking at (2nd subquery that does an exist clause into 4 different tables). Note: these are all views that reference tables.
If it is completed and it shows that it has been modified in the time window, it goes back to the detail table and pulls all the applicable rows for that seq_id. Which is a very large table.
SELECT *
FROM vw_detail dtl
WHERE EXISTS (SELECT 1
FROM vw_header_detail sts1
WHERE ((sts1.posted_date IS NOT NULL) OR (sts1.secondary_posted_date IS NOT NULL))
AND sts1.seq_id = dtl.seq_id
AND sts1.c_status in ('P','D','C','G','K','A')
AND sts1.p_status in ('S','P','F','O') )
AND (EXISTS (SELECT 1
FROM vw_header hdr
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND hdr.seq_id = dtl.seq_id)
OR EXISTS (SELECT 1
FROM vw_detail dtl2
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND dtl2.seq_id = dtl.seq_id)
OR EXISTS (SELECT 1
FROM vw_secondary_header sech
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND sech.seq_id = dtl.seq_id)
OR EXISTS (SELECT 1
FROM vw_secondary_detail secd
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND secd.seq_id = dtl.seq_id))
ORDER BY dtl.seq_id, dtl.line_number
The existing query above will return this explain plan:
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 21 | 9282 | 6282K (1)| 00:04:06 |
| 1 | SORT ORDER BY | | 21 | 9282 | 6282K (1)| 00:04:06 |
| 2 | NESTED LOOPS SEMI | | 21 | 9282 | 6282K (1)| 00:04:06 |
| 3 | NESTED LOOPS | | 21 | 9240 | 6281K (1)| 00:04:06 |
| 4 | VIEW | VW_SQ_1 | 945K| 6463K| 6278K (1)| 00:04:06 |
| 5 | HASH UNIQUE | | 1 | 27M| 6278K (1)| 00:04:06 |
| 6 | UNION-ALL | | | | | |
|* 7 | FILTER | | | | | |
|* 8 | TABLE ACCESS FULL | SECONDARY_DETAIL | 207K| 3033K| 693K (1)| 00:00:28 |
|* 9 | FILTER | | | | | |
|* 10 | TABLE ACCESS FULL | HEADER_TABLE | 73653 | 1078K| 1643K (1)| 00:01:05 |
|* 11 | FILTER | | | | | |
|* 12 | TABLE ACCESS FULL | SECONDARY_HEADER | 65894 | 965K| 319K (1)| 00:00:13 |
|* 13 | FILTER | | | | | |
|* 14 | TABLE ACCESS FULL | DETAIL_TABLE | 598K| 8771K| 3622K (1)| 00:02:22 |
| 15 | TABLE ACCESS BY INDEX ROWID | DETAIL_TABLE | 21 | 9093 | 2 (0)| 00:00:01 |
|* 16 | INDEX RANGE SCAN | P_DETAIL#SCLM#PDT | 21 | | 1 (0)| 00:00:01 |
| 17 | VIEW PUSHED PREDICATE | VW_SQ_2 | 1 | 2 | 2 (0)| 00:00:01 |
| 18 | NESTED LOOPS | | 1 | 28 | 2 (0)| 00:00:01 |
|* 19 | INDEX UNIQUE SCAN | P_HEADER#SEQ_ID | 1 | 7 | 1 (0)| 00:00:01 |
|* 20 | TABLE ACCESS BY INDEX ROWID| DETAIL_TABLE | 1 | 21 | 1 (0)| 00:00:01 |
|* 21 | INDEX RANGE SCAN | P_DETAIL#SQ_C_I | 2 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------
If add on the 1 = 1 at the end of the query and run the explain plan against the same database, the plan is now this:
SELECT *
FROM vw_detail dtl
WHERE EXISTS (SELECT 1
FROM vw_header_detail sts1
WHERE ((sts1.posted_date IS NOT NULL) OR (sts1.secondary_posted_date IS NOT NULL))
AND sts1.seq_id = dtl.seq_id
AND sts1.c_status in ('P','D','C','G','K','A')
AND sts1.p_status in ('S','P','F','O') )
AND (EXISTS (SELECT 1
FROM vw_header hdr
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND hdr.seq_id = dtl.seq_id)
OR EXISTS (SELECT 1
FROM vw_detail dtl2
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND dtl2.seq_id = dtl.seq_id)
OR EXISTS (SELECT 1
FROM vw_secondary_header sech
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND sech.seq_id = dtl.seq_id)
OR EXISTS (SELECT 1
FROM vw_secondary_detail secd
WHERE nvl(update_datetime, :out_of_range) BETWEEN :from_date and :to_date
AND secd.seq_id = dtl.seq_id))
AND 1 = 1
ORDER BY dtl.seq_id, dtl.line_number
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 239M| 96G| 479M (1)| 05:11:53 |
| 1 | TABLE ACCESS BY INDEX ROWID | DETAIL_TABLE | 239M| 96G| 222K (1)| 00:00:09 |
|* 2 | INDEX FULL SCAN | P_DETAIL#SQ_C_I | 598K| | 165K (1)| 00:00:07 |
| 3 | UNION-ALL | | | | | |
|* 4 | FILTER | | | | | |
|* 5 | TABLE ACCESS BY INDEX ROWID BATCHED| SECONDARY_DETAIL | 1 | 15 | 1 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IDX_FEP_DETAIL#S_C_I | 3 | | 1 (0)| 00:00:01 |
|* 7 | FILTER | | | | | |
|* 8 | TABLE ACCESS BY INDEX ROWID | HEADER_TABLE | 1 | 15 | 1 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | P_HEADER_#SEQ_ID | 1 | | 1 (0)| 00:00:01 |
|* 10 | FILTER | | | | | |
|* 11 | TABLE ACCESS BY INDEX ROWID BATCHED| SECONDARY_HEADER | 1 | 15 | 1 (0)| 00:00:01 |
|* 12 | INDEX RANGE SCAN | PK_#S_C_I | 1 | | 1 (0)| 00:00:01 |
|* 13 | FILTER | | | | | |
|* 14 | TABLE ACCESS BY INDEX ROWID BATCHED| DETAIL_TABLE | 1 | 15 | 2 (0)| 00:00:01 |
|* 15 | INDEX RANGE SCAN | P_DETAIL_#SCLM#PDT | 21 | | 1 (0)| 00:00:01 |
| 16 | NESTED LOOPS | | 1 | 28 | 2 (0)| 00:00:01 |
|* 17 | INDEX UNIQUE SCAN | P_HEADER#SEQ_ID | 1 | 7 | 1 (0)| 00:00:01 |
|* 18 | TABLE ACCESS BY INDEX ROWID BATCHED | DETAIL_TABLE | 1 | 21 | 1 (0)| 00:00:01 |
|* 19 | INDEX RANGE SCAN | P_DETAIL#SCLM#PDT | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------------
It is no longer using the full table scans and instead using the indexes but when it gets back to the main table to select all the detail lines, it's using a Full Index Scan. But I'm not sure why the introduction of the "1 = 1" would have caused the plan to change that drastically.
We've tested it where we just turned off the policies all together and then it runs fine again. But the confusing thing in troubleshooting is in an environment where the policies aren't present, manually doing a plan on what the VPD policy would add is causing such a big difference. So I am struggling to understand what implications the "1 = 1" have and what we could do to workaround it -- or if its a red herring for something else actually occuring.
In general, adding WHERE 1 = 1 (or other always true conditions) to queries is safe. The optimizer will factor this out when parsing the query. So I'm unsure what's going on here too!
Looking at the plans, there's a massive difference in the row estimates and costs ( 21 & 6M for the fast plan; 239M & 479M for the slow plan). Along with the other plan differences suggest that in this case WHERE 1=1 is affecting row estimates or transformations available.
A quick test would be to turn this into a top-N query with FETCH FIRST, e.g.:
SELECT *
FROM vw_detail dtl
WHERE ...
AND 1 = 1
ORDER BY dtl.seq_id, dtl.line_number
FETCH FIRST 100 ROWS ONLY
This will tell the optimizer the query returns at most 100 rows. This should make it more likely to choose nested loops as in the first plan (these are better for returning a few rows from a big data set).
If you still end up with the slow plan, it suggests WHERE 1=1 is preventing the optimizer from picking the fast plan for some reason.
If adding FETCH FIRST gives the fast plan, it suggests then WHERE 1=1 is affecting row estimates, leading to the slow plan. If this is the case, assuming you can pick a value for N big enough to ensure the query returns all the rows it needs, but small enough to produce the fast plan, this may be a good enough workaround for now!
Either way, it'll help diagnose why the condition causes a difference.
In the meantime, some other observations to help improve the query:
DETAIL_TABLE This table appears
three times in each of the plans!
Given this table appears to be the largest in the query (it has the highest row estimates) rewriting the query to access it once will help whatever else you do. I'm
guessing that two of the accesses come from vw_header_detail and vw_detail, so I'd look to see if it's possible to move the where clause on SELECT 1 FROM vw_header_detail sts1 to the main query itself.
Then look at whichever of the OR EXISTS subqueries this appears in and ask if this could be merged with the parent query.
nvl(update_datetime, :out_of_range)Why does the query do this? Will the searched date range (:from_date - :to_date) ever include :out_of_range?
If no, simplify to
update_datetime BETWEEN :from_date and :to_date
This will enable the optimizer to use indexes on UPDATE_DATETIME => possibly improve row estimates => better plans.