Problem Statement
I am working with a large dataset (~11-12 million records), where I need to identify key out-of-stock (OOS) event dates for different products (MASTER_ID) across multiple marketplaces (MARKETPLACE_ID).
Each record in the DETAILED_OOS_EVENTS table represents an OOS event on a particular date, and the combination of (MASTER_ID, MARKETPLACE_ID, OOS_DATE) is always unique.
Goal of the Query
I need to:
Find the earliest OOS event (MIN(OOS_DATE)) for each (MASTER_ID, MARKETPLACE_ID).
Recursively find the next OOS event, which occurs at least 7 days after the previous event.
Repeat this process until no more OOS events satisfy the condition.
Issue
I am using a recursive CTE (WITH RECURSIVE), but since the dataset is large (~11-12M rows), the query takes too long to execute. I’m looking for ways to optimize it.
I have provided table structure, sample data and current query as a livesql script.
Script URL:
https://livesql.oracle.com/next/library/scripts/recursive-query-for-oos-events-47UhGS
The query reads detailed_oos_events three times. If you've got no indexes on it, that's at least three full scans. Even with a (unique) index on ( master_id, marketplace_id, oos_date ), changing these full table scans to index full scans is likely the best this query can manage.
The recursive joins will run many times, so it'll be re-reading this data a lot!
It's better to walk through the rows sorted by oos_date. While you can do this using recursive with by assigning row_numbers first, I think this problem is better suited to pattern matching.
The key here is to combine rows into groups within 7 days. You can do this with a pattern like:
after match skip to more_than_7
pattern ( init within_7* more_than_7 )
define
within_7 as oos_date < init.oos_date + 7,
more_than_7 as oos_date >= init.oos_date + 7
The pattern matches any row (init is undefined so always true), then zero or more rows within 7 days, followed by exactly one more than 7 days after the first init row.
After finding a group, the after match clause starts looking again from the last row. Without this it'll skip to the next row.
Here's the query:
select master_id, marketplace_id,
event_date, iteration,
days_from_previous_event
from detailed_oos_events
match_recognize (
partition by master_id, marketplace_id
order by oos_date
measures
oos_date as event_date,
match_number() as iteration,
more_than_7.oos_date - init.oos_date as days_from_previous_event
after match skip to more_than_7
pattern ( init within_7* more_than_7 )
define
within_7 as oos_date < init.oos_date + 7,
more_than_7 as oos_date >= init.oos_date + 7
);
MASTER_ID MARKETPLACE_ID EVENT_DATE ITERATION DAYS_FROM_PREVIOUS_EVENT
P04G 13 30-JUN-2024 00:00:00 1 363
P04G 13 07-JUL-2024 00:00:00 2 7
P04G 13 14-JUL-2024 00:00:00 3 7
P04G 13 21-JUL-2024 00:00:00 4 7
This excludes the zeroth iteration - you can union this to include it. Then use lag over the final results to get the previous dates, e.g.:
select master_id, marketplace_id,
lag ( event_date ) over (
partition by master_id, marketplace_id
order by event_date
) previous_date,
event_date,
iteration,
days_from_previous_event
from (
select master_id, marketplace_id, min ( oos_date ) event_date,
0 iteration, 0 days_from_previous_event
from detailed_oos_events
group by master_id, marketplace_id
union all
select master_id, marketplace_id,
event_date,
iteration,
days_from_previous_event
from detailed_oos_events
match_recognize (
partition by master_id, marketplace_id
order by oos_date
measures
oos_date as event_date,
match_number() as iteration,
more_than_7.oos_date - init.oos_date as days_from_previous_event
after match skip to more_than_7
pattern ( init within_7* more_than_7 )
define
within_7 as oos_date < init.oos_date + 7,
more_than_7 as oos_date >= init.oos_date + 7
)
);
MASTER_ID MARKETPLACE_ID PREVIOUS_DATE EVENT_DATE ITERATION DAYS_FROM_PREVIOUS_EVENT
P04G 13 <null> 03-JUL-2023 00:00:00 0 0
P04G 13 03-JUL-2023 00:00:00 30-JUN-2024 00:00:00 1 363
P04G 13 30-JUN-2024 00:00:00 07-JUL-2024 00:00:00 2 7
P04G 13 07-JUL-2024 00:00:00 14-JUL-2024 00:00:00 3 7
P04G 13 14-JUL-2024 00:00:00 21-JUL-2024 00:00:00 4 7
This still means two full table/index scans, but should be much more efficient than the original query as it's a single pass through the data each time. If there are many rows for each ( master_id, marketplace_id ) pair, creating a materialized view (MV) over:
select master_id, marketplace_id, min ( oos_date ) event_date
from detailed_oos_events
group by master_id, marketplace_id
And using the MV to get the zeroth rows could help improve this further too.