Optimizing Recursive SQL Query Performance


Question and Answer

Chris Saxon

Thanks for the question, Amol.

Asked: March 07, 2025 - 9:02 am UTC

Last updated: March 18, 2025 - 5:26 pm UTC

Version: 19C

Viewed 1000+ times

You Asked

Problem Statement
I am working with a large dataset (~11-12 million records), where I need to identify key out-of-stock (OOS) event dates for different products (MASTER_ID) across multiple marketplaces (MARKETPLACE_ID).

Each record in the DETAILED_OOS_EVENTS table represents an OOS event on a particular date, and the combination of (MASTER_ID, MARKETPLACE_ID, OOS_DATE) is always unique.

Goal of the Query
I need to:

1. Find the earliest OOS event (MIN(OOS_DATE)) for each (MASTER_ID, MARKETPLACE_ID).
2. Recursively find the next OOS event, which occurs at least 7 days after the previous event.
3. Repeat this process until no more OOS events satisfy the condition.

Issue
I am using a recursive CTE (WITH RECURSIVE), but since the dataset is large (~11-12M rows), the query takes too long to execute. I’m looking for ways to optimize it.

I have provided the table structure, sample data, and current query as a LiveSQL script.

Script URL: https://livesql.oracle.com/next/library/scripts/recursive-query-for-oos-events-47UhGS


and Chris said...

The query reads detailed_oos_events three times. If you've got no indexes on it, that's at least three full scans. Even with a (unique) index on ( master_id, marketplace_id, oos_date ), changing these full table scans to index full scans is likely the best this query can manage.
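For reference, such an index might be created like this (the index name is just illustrative):

    -- illustrative index name; the columns match the partition/order
    -- used by the queries below
    create unique index detailed_oos_events_uk
      on detailed_oos_events ( master_id, marketplace_id, oos_date );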

The recursive joins will run many times, so it'll be re-reading this data a lot!

It's better to walk through the rows sorted by oos_date. While you can do this with recursive WITH by assigning row numbers first, I think this problem is better suited to pattern matching.

The key here is to combine rows into groups within 7 days. You can do this with a pattern like:

    after match skip to more_than_7
    pattern ( init within_7* more_than_7 ) 
    define 
      within_7 as oos_date < init.oos_date + 7,
      more_than_7 as oos_date >= init.oos_date + 7  


The pattern matches any row (init has no definition, so it is always true), then zero or more rows within 7 days of it, followed by exactly one row at least 7 days after the init row.

After finding a match, the after match clause restarts the search from that final row (more_than_7), so it becomes the init of the next match. Without this clause the search would resume at the row after the match (the default, skip past last row). In the sample data, for example, 03-JUL-2023 is the first init and 30-JUN-2024 is its more_than_7 row; 30-JUN-2024 then becomes the init of the next match, whose more_than_7 row is 07-JUL-2024.

Here's the query:

select master_id, marketplace_id, 
       event_date, iteration, 
       days_from_previous_event
from   detailed_oos_events
  match_recognize (
    partition by master_id, marketplace_id
    order by oos_date 
    measures 
      oos_date as event_date,
      match_number() as iteration,
      more_than_7.oos_date - init.oos_date as days_from_previous_event
    after match skip to more_than_7
    pattern ( init within_7* more_than_7 ) 
    define 
      within_7 as oos_date < init.oos_date + 7,
      more_than_7 as oos_date >= init.oos_date + 7  
  );

MASTER_ID  MARKETPLACE_ID  EVENT_DATE            ITERATION  DAYS_FROM_PREVIOUS_EVENT
P04G                   13  30-JUN-2024 00:00:00          1                       363
P04G                   13  07-JUL-2024 00:00:00          2                         7
P04G                   13  14-JUL-2024 00:00:00          3                         7
P04G                   13  21-JUL-2024 00:00:00          4                         7


This excludes the zeroth iteration (the earliest OOS date for each pair) - you can union that in to include it. Then use lag over the final results to get the previous dates, e.g.:

select master_id, marketplace_id, 
       lag ( event_date ) over ( 
         partition by master_id, marketplace_id
         order by event_date
       ) previous_date, 
       event_date, 
       iteration, 
       days_from_previous_event 
from (
  select master_id, marketplace_id, min ( oos_date ) event_date,
         0 iteration, 0 days_from_previous_event
  from   detailed_oos_events
  group  by master_id, marketplace_id
  union  all 
  select master_id, marketplace_id, 
         event_date, 
         iteration, 
         days_from_previous_event
  from   detailed_oos_events
    match_recognize (
      partition by master_id, marketplace_id
      order by oos_date 
      measures 
        oos_date as event_date,
        match_number() as iteration,
        more_than_7.oos_date - init.oos_date as days_from_previous_event
      after match skip to more_than_7
      pattern ( init within_7* more_than_7 ) 
      define 
        within_7 as oos_date < init.oos_date + 7,
        more_than_7 as oos_date >= init.oos_date + 7  
    )
);

MASTER_ID  MARKETPLACE_ID  PREVIOUS_DATE         EVENT_DATE            ITERATION  DAYS_FROM_PREVIOUS_EVENT
P04G                   13  <null>                03-JUL-2023 00:00:00          0                         0
P04G                   13  03-JUL-2023 00:00:00  30-JUN-2024 00:00:00          1                       363
P04G                   13  30-JUN-2024 00:00:00  07-JUL-2024 00:00:00          2                         7
P04G                   13  07-JUL-2024 00:00:00  14-JUL-2024 00:00:00          3                         7
P04G                   13  14-JUL-2024 00:00:00  21-JUL-2024 00:00:00          4                         7


This still means two full table/index scans, but should be much more efficient than the original query as it's a single pass through the data each time. If there are many rows for each ( master_id, marketplace_id ) pair, creating a materialized view (MV) over:

  select master_id, marketplace_id, min ( oos_date ) event_date
  from   detailed_oos_events
  group  by master_id, marketplace_id


And using the MV to get the zeroth rows could help improve this further too.
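For illustration, a minimal sketch of such an MV (the name mv_first_oos_event and the refresh settings are assumptions; pick a refresh strategy that matches how the table is loaded):

  -- illustrative name and refresh options
  create materialized view mv_first_oos_event
    build immediate
    refresh complete on demand
  as
  select master_id, marketplace_id, min ( oos_date ) event_date
  from   detailed_oos_events
  group  by master_id, marketplace_id;

The first branch of the union all above would then read from mv_first_oos_event instead of aggregating detailed_oos_events again.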


Comments

Amol Patel, March 10, 2025 - 11:24 am UTC

Thank you so much Chris! This is really helpful.

Recursive with

Amol Patel, March 10, 2025 - 12:45 pm UTC

Hello Chris, your solution using MATCH_RECOGNIZE works perfectly! However, I was wondering if there's a way to achieve the same result using a recursive CTE with row number assignment, for backward compatibility.
Chris Saxon
March 10, 2025 - 3:52 pm UTC

Backward compatibility with what?

Amol Patel, March 10, 2025 - 4:09 pm UTC

For older versions of Oracle.
Chris Saxon
March 11, 2025 - 12:17 pm UTC

Pattern matching was added in 12c.
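For reference, a recursive WITH version along the lines mentioned in the answer (assign row numbers first, then walk them one at a time) might look roughly like this. It's only a sketch: the recursive branch joins back to the numbered rows once per step, so on 11-12 million rows it is unlikely to compete with MATCH_RECOGNIZE.

with ordered ( master_id, marketplace_id, oos_date, rn ) as (
  select master_id, marketplace_id, oos_date,
         row_number() over (
           partition by master_id, marketplace_id
           order by oos_date
         )
  from   detailed_oos_events
), walk ( master_id, marketplace_id, rn, oos_date, event_date, iteration ) as (
  -- anchor: the earliest OOS date per pair is iteration 0
  select master_id, marketplace_id, rn, oos_date, oos_date, 0
  from   ordered
  where  rn = 1
  union all
  -- step to the next row; keep the previous event date unless this row
  -- is at least 7 days later, in which case it starts a new event
  select o.master_id, o.marketplace_id, o.rn, o.oos_date,
         case when o.oos_date >= w.event_date + 7
              then o.oos_date else w.event_date end,
         case when o.oos_date >= w.event_date + 7
              then w.iteration + 1 else w.iteration end
  from   walk w
  join   ordered o
  on     o.master_id      = w.master_id
  and    o.marketplace_id = w.marketplace_id
  and    o.rn             = w.rn + 1
)
select master_id, marketplace_id, event_date, iteration
from   walk
where  oos_date = event_date;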

Better use of MATCH_RECOGNIZE

mathguy, March 11, 2025 - 2:13 am UTC

The extra step of capturing the zero-th iteration can be avoided, with a more careful coding of MATCH_RECOGNIZE. For example:

select master_id, marketplace_id, previous_date, event_date, iteration,
       event_date - previous_date as days_from_previous_event
from   detailed_oos_events
match_recognize(
  partition by master_id, marketplace_id
  order     by oos_date
  measures  e.oos_date          as event_date,
            last(e.oos_date, 1) as previous_date,
            count(e.*) - 1      as iteration
  all rows per match
  pattern   ((e {-x*-})*)
  define    x as oos_date < e.oos_date + 7
);


With this code, the zeroth iteration will show NULL for DAYS_FROM_PREVIOUS_EVENT, as it should (there is no previous event!). Of course, if the business user insists they have a very good reason to require 0 to be shown, that is a trivial application of NVL.

There are perhaps two non-basic features I used: the {- ... -} notation for rows that should not be included in the output (with ALL ROWS PER MATCH, not all rows are actually shown - those enclosed in {- ... -} are not!) and the second argument to LAST. These are discussed in the documentation, but it seems that not all programmers are familiar with them.
Chris Saxon
March 11, 2025 - 12:16 pm UTC

Nice work mathguy; I was hoping someone could find a way to do this all within match_recognize!

Of Course, with MODEL Clause

Rajeshwaran Jeyabal, March 17, 2025 - 1:46 pm UTC

select master_id
     , marketplace_id
     , new_dt as event_dt
     , count(*)
     , count(*) over( order by new_dt ) - 1 as iteration
     , new_dt - lag(new_dt) over( order by new_dt ) as diff
from (
  select *
  from detailed_oos_events
  model
      partition by ( master_id, marketplace_id )
      dimension by ( row_number() over( order by oos_date ) r )
      measures ( 1 x1, oos_date, cast(null as date) as new_dt )
      rules( x1[any] order by r = 1
           , new_dt[r=1] = oos_date[cv(r)]
           , new_dt[r>1] order by r = case when oos_date[cv(r)] - new_dt[cv(r)-1] < 7
                                           then new_dt[cv(r)-1]
                                           else oos_date[cv(r)] end
      )
)
group by master_id
       , marketplace_id
       , new_dt
/

MASTER_ID  MARKETPLACE_ID EVENT_DT      COUNT(*)  ITERATION       DIFF
---------- -------------- ----------- ---------- ---------- ----------
P04G                   13 03-JUL-2023          2          0
P04G                   13 30-JUN-2024          7          1        363
P04G                   13 07-JUL-2024          7          2          7
P04G                   13 14-JUL-2024          7          3          7
P04G                   13 21-JUL-2024          1          4          7


Chris Saxon
March 18, 2025 - 5:26 pm UTC

Yep, that's another solution.
