So what you're wanting is:
The database to stop processing v_schedule when launch_date > sysdate?
The problem here is the database
doesn't know there are no rows > sysdate in t_launches. It has to check t_launches to verify this. So it can't just stop processing the join for plan_dates after this.
As far as I'm aware, the optimizer won't currently use filters in the join clause to avoid the join as you want. It
may be possible to add this, you'll need to add an enhancement request for this though.
So get what you want, you need to split the queries and union them as in the T2 query.
In practice it makes little difference to the example queries. While 100,000 index lookups vs. 1,000 sounds bad the runtime is similar. Most of the time is spent fetching the rows from the schedule:
select plan_date, launch_date
from (
select plan_date,
lead(plan_date) over (order by plan_date) as next_plan_date
from v_schedule
)
left join t_launches
on plan_date <= to_date ( '2021-04-20 13:00', 'yyyy-mm-dd hh24:mi' )
and launch_date >= plan_date
and launch_date < next_plan_date;
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.64 | 1030 |
| 1 | NESTED LOOPS OUTER | | 1 | 1 | 100K|00:00:00.64 | 1030 |
| 2 | VIEW | | 1 | 1 | 100K|00:00:00.48 | 0 |
| 3 | WINDOW SORT | | 1 | 1 | 100K|00:00:00.43 | 0 |
| 4 | VIEW | V_SCHEDULE | 1 | 1 | 100K|00:00:00.19 | 0 |
| 5 | CONNECT BY WITHOUT FILTERING| | 1 | | 100K|00:00:00.06 | 0 |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
|* 7 | INDEX RANGE SCAN | LAUNCH_DATE_I | 100K| 1 | 780 |00:00:00.11 | 1030 |
------------------------------------------------------------------------------------------------------------
select plan_date, launch_date
from (
select plan_date,
lead(plan_date) over (order by plan_date) as next_plan_date
from v_schedule
where plan_date <= to_date ( '2021-04-20 13:00', 'yyyy-mm-dd hh24:mi' )
)
left join t_launches
on launch_date >= plan_date
and launch_date < next_plan_date
union all
select plan_date, null
from v_schedule
where plan_date > to_date ( '2021-04-20 13:00', 'yyyy-mm-dd hh24:mi' );
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.58 | 38 |
| 1 | UNION-ALL | | 1 | | 100K|00:00:00.58 | 38 |
| 2 | NESTED LOOPS OUTER | | 1 | 3 | 780 |00:00:00.19 | 38 |
| 3 | VIEW | | 1 | 1 | 780 |00:00:00.19 | 0 |
| 4 | WINDOW SORT | | 1 | 1 | 780 |00:00:00.19 | 0 |
|* 5 | VIEW | V_SCHEDULE | 1 | 1 | 780 |00:00:00.19 | 0 |
| 6 | CONNECT BY WITHOUT FILTERING| | 1 | | 100K|00:00:00.06 | 0 |
| 7 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
|* 8 | INDEX RANGE SCAN | LAUNCH_DATE_I | 780 | 3 | 779 |00:00:00.01 | 38 |
|* 9 | VIEW | V_SCHEDULE | 1 | 1 | 99220 |00:00:00.31 | 0 |
| 10 | CONNECT BY WITHOUT FILTERING | | 1 | | 100K|00:00:00.10 | 0 |
| 11 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
-------------------------------------------------------------------------------------------------------------
(I swapped sysdate for 20 Apr to make the queries consistent)
Just joining takes 640ms vs 580ms for splitting the join on date. Though the union all approach does notably less IO (Buffers - 38 vs 1,030).
This can be greatly reduced by using a different join method, but there are several challenges with this query stopping this:
- The optimizer thinks v_schedule only returns one row
- You're outer joining v_schedule, so the database needs to return all the rows it produces
- There are no equality conditions (=) in the join
The optimizer prefers to start joins with the table returning fewer rows. So thinking that the larger table only returns one row stops this! This also leads it to think nested loops are best, as it believes there's only one indexed lookup to do on t_launches. This is clearly wrong :)
Outer joining also limits the join options. It can't use t_launches as the outer table in nested loops joins for example.
As all the join criteria are range conditions, the optimizer is unable to use a hash join. This would allow it to make t_launches the driving table and only scan both data sets once.
Adding an equality to the join (trunc ( launch_date ) = trunc ( plan_date ) - is this safe?) means it can hash join. I've also cheated and used the cardinlaity hint to tell the optimizer that v_schedule has 100,000 rows.
Avoid the cardinality hint in production code!This flips the join order and allows the database to use a hash join:
select plan_date, launch_date
from (
select /*+ cardinality ( 100000 ) */plan_date,
lead(plan_date) over (order by plan_date) as next_plan_date
from v_schedule
) v
left join t_launches t
on launch_date >= plan_date
and launch_date < next_plan_date
and trunc ( launch_date ) = trunc ( plan_date );
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.74 | 7 |
|* 1 | HASH JOIN RIGHT OUTER | | 1 | 100K| 100K|00:00:00.74 | 7 |
| 2 | INDEX FAST FULL SCAN | LAUNCH_DATE_I | 1 | 1000 | 1000 |00:00:00.01 | 7 |
| 3 | VIEW | | 1 | 100K| 100K|00:00:00.45 | 0 |
| 4 | WINDOW SORT | | 1 | 100K| 100K|00:00:00.41 | 0 |
| 5 | VIEW | V_SCHEDULE | 1 | 1 | 100K|00:00:00.19 | 0 |
| 6 | CONNECT BY WITHOUT FILTERING| | 1 | | 100K|00:00:00.06 | 0 |
| 7 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
------------------------------------------------------------------------------------------------------------
Much less work, but slower than either of the original joins!
As for the plan with merge join, funnily enough making v_schedule into a table and proper stats have ruined the planCan you share exactly what you did?