Optimizing Recursive SQL Query Performance

Question and Answer

Chris Saxon

Thanks for the question, Amol.

Asked: March 07, 2025 - 9:02 am UTC

Last updated: March 18, 2025 - 5:26 pm UTC

Version: 19c

Viewed 1000+ times

You Asked

Problem Statement
I am working with a large dataset (~11-12 million records), where I need to identify key out-of-stock (OOS) event dates for different products (MASTER_ID) across multiple marketplaces (MARKETPLACE_ID).

Each record in the DETAILED_OOS_EVENTS table represents an OOS event on a particular date, and the combination of (MASTER_ID, MARKETPLACE_ID, OOS_DATE) is always unique.

Goal of the Query
I need to:

Find the earliest OOS event (MIN(OOS_DATE)) for each (MASTER_ID, MARKETPLACE_ID).
Recursively find the next OOS event, which occurs at least 7 days after the previous event.
Repeat this process until no more OOS events satisfy the condition.

Issue
I am using a recursive CTE (WITH RECURSIVE), but since the dataset is large (~11-12M rows), the query takes too long to execute. I’m looking for ways to optimize it.

I have provided the table structure, sample data, and current query as a LiveSQL script.

Script URL: https://livesql.oracle.com/next/library/scripts/recursive-query-for-oos-events-47UhGS
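For reference, the recursive approach described above typically has a shape like the following. This is an illustrative sketch only, not the actual script (which is in the LiveSQL link); the table and column names follow the description above.

```sql
-- Illustrative sketch of the recursive approach (not the exact script):
-- anchor member: the first OOS date per ( master_id, marketplace_id );
-- recursive member: the next OOS date at least 7 days later.
with oos_chain ( master_id, marketplace_id, oos_date ) as (
  select master_id, marketplace_id, min ( oos_date )
  from   detailed_oos_events
  group  by master_id, marketplace_id
  union all
  select c.master_id, c.marketplace_id,
         ( select min ( e.oos_date )
           from   detailed_oos_events e
           where  e.master_id      = c.master_id
           and    e.marketplace_id = c.marketplace_id
           and    e.oos_date      >= c.oos_date + 7 )
  from   oos_chain c
  where  exists (
           select null
           from   detailed_oos_events e
           where  e.master_id      = c.master_id
           and    e.marketplace_id = c.marketplace_id
           and    e.oos_date      >= c.oos_date + 7 )
)
select * from oos_chain
order  by master_id, marketplace_id, oos_date;
```

Note how each iteration re-probes detailed_oos_events with a correlated subquery (twice here), which is why queries of this shape slow down badly at the 11-12 million row scale.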


and Chris said...

The query reads detailed_oos_events three times. If you've got no indexes on it, that's at least three full scans. Even with a (unique) index on ( master_id, marketplace_id, oos_date ), changing these full table scans to index full scans is likely the best this query can manage.

The recursive joins will run many times, so it'll be re-reading this data a lot!

It's better to walk through the rows sorted by oos_date. While you can do this with recursive WITH by assigning row numbers first, I think this problem is better suited to pattern matching.

The key here is to combine rows into groups within 7 days. You can do this with a pattern like:

    after match skip to more_than_7
    pattern ( init within_7* more_than_7 ) 
    define 
      within_7 as oos_date < init.oos_date + 7,
      more_than_7 as oos_date >= init.oos_date + 7  


The pattern matches any row (init has no definition, so it always matches), then zero or more rows within 7 days of that first row, followed by exactly one row at least 7 days after it.

After finding a group, the after match clause restarts the search from the last row of the match (the more_than_7 row), so that row can begin the next group. Without this clause, the search would resume at the row after the match.

Here's the query:

select master_id, marketplace_id, 
       event_date, iteration, 
       days_from_previous_event
from   detailed_oos_events
  match_recognize (
    partition by master_id, marketplace_id
    order by oos_date 
    measures 
      oos_date as event_date,
      match_number() as iteration,
      more_than_7.oos_date - init.oos_date as days_from_previous_event
    after match skip to more_than_7
    pattern ( init within_7* more_than_7 ) 
    define 
      within_7 as oos_date < init.oos_date + 7,
      more_than_7 as oos_date >= init.oos_date + 7  
  );

MASTER_ID    MARKETPLACE_ID    EVENT_DATE              ITERATION    DAYS_FROM_PREVIOUS_EVENT    
P04G                        13 30-JUN-2024 00:00:00               1                         363 
P04G                        13 07-JUL-2024 00:00:00               2                           7 
P04G                        13 14-JUL-2024 00:00:00               3                           7 
P04G                        13 21-JUL-2024 00:00:00               4                           7 


This excludes the zeroth iteration; you can union it with a query that finds the first date per group to include it. Then use lag over the final results to get the previous dates, e.g.:

select master_id, marketplace_id, 
       lag ( event_date ) over ( 
         partition by master_id, marketplace_id
         order by event_date
       ) previous_date, 
       event_date, 
       iteration, 
       days_from_previous_event 
from (
  select master_id, marketplace_id, min ( oos_date ) event_date,
         0 iteration, 0 days_from_previous_event
  from   detailed_oos_events
  group  by master_id, marketplace_id
  union  all 
  select master_id, marketplace_id, 
         event_date, 
         iteration, 
         days_from_previous_event
  from   detailed_oos_events
    match_recognize (
      partition by master_id, marketplace_id
      order by oos_date 
      measures 
        oos_date as event_date,
        match_number() as iteration,
        more_than_7.oos_date - init.oos_date as days_from_previous_event
      after match skip to more_than_7
      pattern ( init within_7* more_than_7 ) 
      define 
        within_7 as oos_date < init.oos_date + 7,
        more_than_7 as oos_date >= init.oos_date + 7  
    )
);

MASTER_ID    MARKETPLACE_ID    PREVIOUS_DATE           EVENT_DATE              ITERATION    DAYS_FROM_PREVIOUS_EVENT    
P04G                        13 <null>                  03-JUL-2023 00:00:00               0                           0 
P04G                        13 03-JUL-2023 00:00:00    30-JUN-2024 00:00:00               1                         363 
P04G                        13 30-JUN-2024 00:00:00    07-JUL-2024 00:00:00               2                           7 
P04G                        13 07-JUL-2024 00:00:00    14-JUL-2024 00:00:00               3                           7 
P04G                        13 14-JUL-2024 00:00:00    21-JUL-2024 00:00:00               4                           7 


This still means two full table/index scans, but it should be much more efficient than the original query because each is a single pass through the data. If there are many rows for each ( master_id, marketplace_id ) pair, creating a materialized view (MV) over:

  select master_id, marketplace_id, min ( oos_date ) event_date
  from   detailed_oos_events
  group  by master_id, marketplace_id


and using the MV to get the zeroth-iteration rows could improve this further.
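For example, a sketch of such an MV (the name first_oos_mv is illustrative; fast refresh of MIN aggregates has restrictions, so an on-demand or scheduled complete refresh may be the practical choice):

```sql
-- Hypothetical materialized view holding the first OOS date per pair.
create materialized view first_oos_mv
  build immediate
  refresh complete on demand
as
select master_id, marketplace_id,
       min ( oos_date ) event_date
from   detailed_oos_events
group  by master_id, marketplace_id;
```

The union all branch that produces the zeroth rows can then read this small MV instead of scanning detailed_oos_events again.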


Comments

Amol Patel, March 10, 2025 - 11:24 am UTC

Thank you so much Chris! This is really helpful.

Recursive with

Amol Patel, March 10, 2025 - 12:45 pm UTC

Hello Chris, Your solution using MATCH_RECOGNIZE works perfectly! However, I was wondering if there's a way to achieve the same result using a recursive CTE with row number assignment, for backward compatibility.
Chris Saxon
March 10, 2025 - 3:52 pm UTC

Backward compatibility with what?

Amol Patel, March 10, 2025 - 4:09 pm UTC

For older version of Oracle.
Chris Saxon
March 11, 2025 - 12:17 pm UTC

Pattern matching was added in 12c.
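For older releases (recursive WITH arrived in 11.2), the row-number approach mentioned in the answer could look something like this untested sketch: number the rows per group, then step through them one at a time, carrying the current group's start date forward.

```sql
-- Sketch only: number the rows per ( master_id, marketplace_id ),
-- then recurse row by row, starting a new group whenever a row
-- falls at least 7 days after the current group's start date.
with numbered as (
  select master_id, marketplace_id, oos_date,
         row_number() over (
           partition by master_id, marketplace_id
           order by oos_date
         ) rn
  from   detailed_oos_events
), walk ( master_id, marketplace_id, oos_date, rn, grp_start, is_event ) as (
  select master_id, marketplace_id, oos_date, rn, oos_date, 1
  from   numbered
  where  rn = 1
  union all
  select n.master_id, n.marketplace_id, n.oos_date, n.rn,
         case when n.oos_date >= w.grp_start + 7
              then n.oos_date else w.grp_start end,
         case when n.oos_date >= w.grp_start + 7 then 1 else 0 end
  from   walk w
  join   numbered n
  on     n.master_id      = w.master_id
  and    n.marketplace_id = w.marketplace_id
  and    n.rn             = w.rn + 1
)
select master_id, marketplace_id, oos_date as event_date
from   walk
where  is_event = 1;
```

Note this recurses once per row in each group, so on 11-12 million rows it is likely far slower than the MATCH_RECOGNIZE version.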

Better use of MATCH_RECOGNIZE

mathguy, March 11, 2025 - 2:13 am UTC

The extra step of capturing the zero-th iteration can be avoided, with a more careful coding of MATCH_RECOGNIZE. For example:

select master_id, marketplace_id, previous_date, event_date, iteration,
       event_date - previous_date as days_from_previous_event
from   detailed_oos_events
match_recognize(
  partition by master_id, marketplace_id
  order     by oos_date
  measures  e.oos_date          as event_date,
            last(e.oos_date, 1) as previous_date,
            count(e.*) - 1      as iteration
  all rows per match
  pattern   ((e {-x*-})*)
  define    x as oos_date < e.oos_date + 7
);


With this code, the zeroth iteration shows NULL for DAYS_FROM_PREVIOUS_EVENT, as it should (there is no previous event!). Of course, if the business user insists they have a very good reason to require 0 to be shown, that is a trivial application of NVL.

There are perhaps two non-basic features here: the {- ... -} notation for rows that should be excluded from the output (with ALL ROWS PER MATCH, rows enclosed in {- ... -} are still matched but not shown), and the second argument to LAST, which offsets to an earlier occurrence of the variable. Both are covered in the documentation, but it seems not all programmers are familiar with them.
Chris Saxon
March 11, 2025 - 12:16 pm UTC

Nice work mathguy; I was hoping someone could find a way to do this all within match_recognize!

Of Course, with the MODEL Clause

Rajeshwaran Jeyabal, March 17, 2025 - 1:46 pm UTC

demo@FREEPDB1> select master_id
    , marketplace_id
    , new_dt as event_dt
    , count(*)
    -- partition by added so iteration and diff reset per product/marketplace
    , count(*) over( partition by master_id, marketplace_id
                     order by new_dt ) - 1 as iteration
    , new_dt - lag(new_dt) over( partition by master_id, marketplace_id
                                 order by new_dt ) as diff
from (
  select *
  from   detailed_oos_events
  model
      partition by ( master_id, marketplace_id )
      dimension by ( row_number() over( order by oos_date ) r )
      measures ( 1 x1, oos_date, cast(null as date) as new_dt )
      rules( x1[any] order by r = 1
          , new_dt[r=1] = oos_date[cv(r)]
          , new_dt[r>1] order by r =
              case when oos_date[cv(r)] - new_dt[cv(r)-1] < 7
                   then new_dt[cv(r)-1]
                   else oos_date[cv(r)] end
      ) )
group by master_id
    , marketplace_id
    , new_dt
/

MASTER_ID  MARKETPLACE_ID EVENT_DT      COUNT(*)  ITERATION       DIFF
---------- -------------- ----------- ---------- ---------- ----------
P04G                   13 03-JUL-2023          2          0
P04G                   13 30-JUN-2024          7          1        363
P04G                   13 07-JUL-2024          7          2          7
P04G                   13 14-JUL-2024          7          3          7
P04G                   13 21-JUL-2024          1          4          7


Chris Saxon
March 18, 2025 - 5:26 pm UTC

Yep, that's another solution.
