Skip to Main Content
  • Questions
  • Match Recognize Examples (SQL Pattern Matching)

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Amit.

Asked: February 16, 2017 - 8:21 am UTC

Answered by: Chris Saxon - Last updated: February 21, 2017 - 4:44 pm UTC

Category: Developer - Version: 12c

Viewed 10K+ times! This question is

You Asked

Hi Ask Tom Team,
Recently I came across MATCH_RECOGNIZE clause. I checked the Oracle documentation but it's not easy to digest.
Looking at OLL webinars, youtube videos and some of Ketih's article on oracle blog helped me understanding a little of it.
In oracle training as well there seems nothing on it so i thought to as here.

Is it possible for you share some light on it by giving less complex examples or any series of videos like you did on Analytical functions.

Oracle documentation syntax:

table_reference ::=
{only (query_table_expression) | query_table_expression }[flashback_query_clause]
[pivot_clause|unpivot_clause|row_pattern_recognition_clause] [t_alias]

row_pattern_recognition_clause ::=
MATCH_RECOGNIZE (
[row_pattern_partition_by ]
[row_pattern_order_by ]
[row_pattern_measures ]
[row_pattern_rows_per_match ]
[row_pattern_skip_to ]
PATTERN (row_pattern)
[ row_pattern_subset_clause]
DEFINE row_pattern_definition_list
)

row_pattern_partition_by ::=
PARTITION BY column[, column]...

row_pattern_order_by ::=
ORDER BY column[, column]...

row_pattern_measures ::=
MEASURES row_pattern_measure_column[, row_pattern_measure_column]...

row_pattern_measure_column ::=
expression AS c_alias

row_pattern_rows_per_match ::=
ONE ROW PER MATCH
| ALL ROWS PER MATCH

row_pattern_skip_to ::=
AFTER MATCH {
SKIP TO NEXT ROW
| SKIP PAST LAST ROW
| SKIP TO FIRST variable_name
| SKIP TO LAST variable_name
| SKIP TO variable_name}

row_pattern ::=
row_pattern_term
| row_pattern "|" row_pattern_term

row_pattern_term ::=
row_pattern_factor
| row_pattern_term row_pattern_factor

row_pattern_factor ::=
row_pattern_primary [row_pattern_quantifier]

row_pattern_quantifier ::=
*[?]
|+[?]
|?[?]
|"{"[unsigned_integer ],[unsigned_integer]"}"[?]
|"{"unsigned_integer "}"

row_pattern_primary ::=
variable_name
|$
|^
|([row_pattern])
|"{-" row_pattern"-}"
| row_pattern_permute

row_pattern_permute ::=
PERMUTE (row_pattern [, row_pattern] ...)

row_pattern_subset_clause ::=
SUBSET row_pattern_subset_item [, row_pattern_subset_item] ...

row_pattern_subset_item ::=
variable_name = (variable_name[ , variable_name]...)

row_pattern_definition_list ::=
row_pattern_definition[, row_pattern_definition]...

row_pattern_definition ::=
variable_name AS condition

and we said...

Match_recognize in SQL is awesome. But does take a bit of explaining to understand as you say.

So here goes:

Imagine you've started running. You're keeping track of your progress in the following table:

create table t (
  run_date date, 
  time_in_s int,
  distance_in_miles int
);

insert into t values (date'2017-01-01', 420, 1);
insert into t values (date'2017-01-02', 2400, 5);
insert into t values (date'2017-01-03', 2430, 5);
insert into t values (date'2017-01-04', 2350, 5);
insert into t values (date'2017-01-05', 410, 1);
insert into t values (date'2017-01-06', 400, 1);
insert into t values (date'2017-01-08', 2300, 5);
insert into t values (date'2017-01-09', 425, 1);
insert into t values (date'2017-01-10', 422, 1);
commit;

select * from t;

RUN_DATE              TIME_IN_S  DISTANCE_IN_MILES  
01-JAN-2017 00:00:00  420        1                  
02-JAN-2017 00:00:00  2,400      5                  
03-JAN-2017 00:00:00  2,430      5                  
04-JAN-2017 00:00:00  2,350      5                  
05-JAN-2017 00:00:00  410        1                  
06-JAN-2017 00:00:00  400        1                  
08-JAN-2017 00:00:00  2,300      5                  
09-JAN-2017 00:00:00  425        1                  
10-JAN-2017 00:00:00  422        1


You want to know how many days in a row you ran faster than the previous day. Pattern matching is ideal for this kind of problem. Let's see how.

Before we do that, let's start with the simplest thing possible: matching anything.

To do this look for a pattern of "anything". Next you need to define this variable. To always have a match, you need something that is always true. One way to do this is looking for rows where the run_date equals itself:

select * from t
match_recognize (
  order by run_date
  pattern ( anything ) 
  define
    anything as run_date = run_date
);

SQL Error: ORA-30732: table contains no user-visible columns


Hmmm. What's going on there?

The query has no columns to display!

To resolve this you either need to add:

- A measures clause specifying output columns
- A partition by clause. These columns are included in your output
- The "all rows per match" clause, which includes the source table's columns

Let's add measures with a couple of predefined functions:

- match_number() - Assigns a number to each row in the match. All rows in the same match will have the same value. This starts at 1 and increases for each new match.
- classifier() - Which pattern variable the current row matches

These are handy for debugging. In they go and you get:

select * from t
match_recognize (
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls
  pattern ( anything ) 
  define
    anything as run_date = run_date
);

MNO  CLS       
1    ANYTHING  
2    ANYTHING  
3    ANYTHING  
4    ANYTHING  
5    ANYTHING  
6    ANYTHING  
7    ANYTHING  
8    ANYTHING  
9    ANYTHING


So that gives 9 different matches with match numbers 1-9. All match the variable anything, so this is the classifier.

Why's that? Surely if you're matching "anything", it should all be one match?

Well the pattern is a regular expression. Anything on its own means match one row. To match zero or more rows, use the asterisk operator:

select * from t
match_recognize (
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls
  pattern ( anything* ) 
  define
    anything as run_date = run_date
);

MNO  CLS       
1    ANYTHING


Now you have just a single match. But where have all the other rows gone?!

By default, match_recognize only returns one row per match. There's only one match, so you only get one row. To see them all, specify "all rows per match":

select * from t
match_recognize (
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls
  all rows per match
  pattern ( anything* ) 
  define
    anything as run_date = run_date
);

RUN_DATE              MNO  CLS       TIME_IN_S  DISTANCE_IN_MILES  
01-JAN-2017 00:00:00  1    ANYTHING  420        1                  
02-JAN-2017 00:00:00  1    ANYTHING  2,400      5                  
03-JAN-2017 00:00:00  1    ANYTHING  2,430      5                  
04-JAN-2017 00:00:00  1    ANYTHING  2,350      5                  
05-JAN-2017 00:00:00  1    ANYTHING  410        1                  
06-JAN-2017 00:00:00  1    ANYTHING  400        1                  
08-JAN-2017 00:00:00  1    ANYTHING  2,300      5                  
09-JAN-2017 00:00:00  1    ANYTHING  425        1                  
10-JAN-2017 00:00:00  1    ANYTHING  422        1 


Note doing this also shows you all the columns in the source table. So I'd recommend using "all rows per match" while you're developing your queries.

Finding Faster Runs

OK, that's the basics. Let's move onto the original problem: Finding days you ran faster than the previous one.

The pattern you're looking for is one or more days where time_in_s is less than this value for the previous row. The patterns use regular expressions. The + is for one or more. So your pattern variable will be:

  pattern ( faster+ ) 


To get the value for a column in the previous row, pass it to the prev() function. So the definition of this variable is:

  define 
    faster as time_in_s < prev(time_in_s)


Plug this into your query and you get:

select * from t
match_recognize (
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls
  all rows per match
  pattern ( faster+ ) 
  define 
    faster as time_in_s < prev(time_in_s)
);

RUN_DATE              MNO  CLS     TIME_IN_S  DISTANCE_IN_MILES  
04-JAN-2017 00:00:00  1    FASTER  2,350      5                  
05-JAN-2017 00:00:00  1    FASTER  410        1                  
06-JAN-2017 00:00:00  1    FASTER  400        1                  
09-JAN-2017 00:00:00  2    FASTER  425        1                  
10-JAN-2017 00:00:00  2    FASTER  422        1 


So you've two periods where you had a series of days faster than the previous. Unsurprisingly when you ran one mile the day after a five mile run, you were quicker!

These probably aren't the results you want. It's more useful to know whether you were quicker when you ran the same distance. To split the results up like this, add a "partition by distance_in_miles" clause:

select * from t
match_recognize (
  partition by distance_in_miles
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls
  all rows per match
  pattern ( faster+ ) 
  define 
    faster as time_in_s < prev(time_in_s)
);

DISTANCE_IN_MILES  RUN_DATE              MNO  CLS     TIME_IN_S  
1                  05-JAN-2017 00:00:00  1    FASTER  410        
1                  06-JAN-2017 00:00:00  1    FASTER  400        
1                  10-JAN-2017 00:00:00  2    FASTER  422        
5                  04-JAN-2017 00:00:00  1    FASTER  2,350      
5                  08-JAN-2017 00:00:00  1    FASTER  2,300  


So now you have three periods when you were faster over the same distance. Two for one mile and one for 5 miles. Note the match_number() resets back to 1 for the group of 5 miles.

This is good. But you probably want to include the day before you were faster in your results. i.e. your very first run or the last one you were slower. To get this, add an "anything" match again:

select * from t
match_recognize (
  partition by distance_in_miles
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls
  all rows per match
  pattern ( anything faster+ ) 
  define 
    faster as time_in_s < prev(time_in_s)
);

DISTANCE_IN_MILES  RUN_DATE              MNO  CLS       TIME_IN_S  
1                  01-JAN-2017 00:00:00  1    ANYTHING  420        
1                  05-JAN-2017 00:00:00  1    FASTER    410        
1                  06-JAN-2017 00:00:00  1    FASTER    400        
1                  09-JAN-2017 00:00:00  2    ANYTHING  425        
1                  10-JAN-2017 00:00:00  2    FASTER    422        
5                  03-JAN-2017 00:00:00  1    ANYTHING  2,430      
5                  04-JAN-2017 00:00:00  1    FASTER    2,350      
5                  08-JAN-2017 00:00:00  1    FASTER    2,300


Note you don't actually need to define the "anything" variable! Also the rows now have different classifiers, depending on which variable they matched in the pattern clause.

OK, this is looking better. But in your final results you may want to see:

- The first and last days in each series of improvements
- The number of consecutive days you were faster

To get these, add them to the measures clause. The first() and last() functions will, as you might expect, return the first and last values in the match for the expression. You can get the number of rows in the match with your trusty old count(*):

alter session set nls_date_format = 'DD-MON-YYYY';
select * from t
match_recognize (
  partition by distance_in_miles
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls,
    first(run_date) as frun_date,
    last(run_date) as lrun_date,
    count(*) as c
  all rows per match
  pattern ( anything faster+ ) 
  define 
    faster as time_in_s < nvl(prev(time_in_s), time_in_s)
);

DISTANCE_IN_MILES  RUN_DATE     MNO  CLS       FRUN_DATE    LRUN_DATE    C  TIME_IN_S  
1                  01-JAN-2017  1    ANYTHING  01-JAN-2017  01-JAN-2017  1  420        
1                  05-JAN-2017  1    FASTER    01-JAN-2017  05-JAN-2017  2  410        
1                  06-JAN-2017  1    FASTER    01-JAN-2017  06-JAN-2017  3  400        
1                  09-JAN-2017  2    ANYTHING  09-JAN-2017  09-JAN-2017  1  425        
1                  10-JAN-2017  2    FASTER    09-JAN-2017  10-JAN-2017  2  422        
5                  03-JAN-2017  1    ANYTHING  03-JAN-2017  03-JAN-2017  1  2,430      
5                  04-JAN-2017  1    FASTER    03-JAN-2017  04-JAN-2017  2  2,350      
5                  08-JAN-2017  1    FASTER    03-JAN-2017  08-JAN-2017  3  2,300


Note: count(*) and last() return the relevant values up to that point. i.e. the running count and current date. To overcome this and show the actual last values in the group, add the final keyword before them:

select * from t
match_recognize (
  partition by distance_in_miles
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls,
    first(run_date) as frun_date,
    final last(run_date) as lrun_date,
    final count(*) as c
  all rows per match
  pattern ( anything faster+ ) 
  define 
    faster as time_in_s < nvl(prev(time_in_s), time_in_s)
);

DISTANCE_IN_MILES  RUN_DATE     MNO  CLS       FRUN_DATE    LRUN_DATE    C  TIME_IN_S  
1                  01-JAN-2017  1    ANYTHING  01-JAN-2017  06-JAN-2017  3  420        
1                  05-JAN-2017  1    FASTER    01-JAN-2017  06-JAN-2017  3  410        
1                  06-JAN-2017  1    FASTER    01-JAN-2017  06-JAN-2017  3  400        
1                  09-JAN-2017  2    ANYTHING  09-JAN-2017  10-JAN-2017  2  425        
1                  10-JAN-2017  2    FASTER    09-JAN-2017  10-JAN-2017  2  422        
5                  03-JAN-2017  1    ANYTHING  03-JAN-2017  08-JAN-2017  3  2,430      
5                  04-JAN-2017  1    FASTER    03-JAN-2017  08-JAN-2017  3  2,350      
5                  08-JAN-2017  1    FASTER    03-JAN-2017  08-JAN-2017  3  2,300


At this point you're nearly done!

If you only want one row/series of faster days, remove the "all rows per match" clause. You can also take out the "final" keywords before last and count. This is because "one row per match" gives the final value:

select * from t
match_recognize (
  partition by distance_in_miles
  order by run_date
  measures
    match_number() as mno,
    classifier() as cls,
    first(run_date) as frun_date,
    last(run_date) as lrun_date,
    count(*) as c
  pattern ( anything faster+ ) 
  define 
    faster as time_in_s < nvl(prev(time_in_s), time_in_s)
);

DISTANCE_IN_MILES  MNO  CLS     FRUN_DATE    LRUN_DATE    C  
1                  1    FASTER  01-JAN-2017  06-JAN-2017  3  
1                  2    FASTER  09-JAN-2017  10-JAN-2017  2  
5                  1    FASTER  03-JAN-2017  08-JAN-2017  3 


Hopefully that gives you enough of the basics to get started :)

Pattern matching is a fantastic feature that makes it easy to solve problems that were previously thorny using SQL. For example, bin packing style problems:

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9533067800346375228

If you need more examples, check out these by Tim Hall:

https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1

And some from Tom Kyte at:

http://www.oracle.com/technetwork/issue-archive/2013/13-nov/o63asktom-2034271.html

You've already mentioned Keith Laker's match_recognize series. Here's the link to the first in the series for those interested:

http://oracle-big-data.blogspot.co.uk/2016/03/sql-pattern-matching-deep-dive-part-1.html

NOTE: Edited explanation of SQL Error: ORA-30732 to include all ways to resolve this.

and you rated our response

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Practical Bin-Fitting -- LISTAGG Avoids ORA-01489: result of string concatenation is too long

February 17, 2017 - 9:58 pm UTC

Reviewer: Duke Ganote

LISTAGG is wonderful, but ORA-prone. There's always the risk of overrunning the maximum length of VARCHAR2(4000). However, MATCH_RECOGNIZE can solve that problem.

Suppose you need to aggregate column names within a 75-character limit. Easily done with MATCH_RECOGNIZE:

WITH 
fitter AS (
select MR.*
     , listagg(column_name,',')
          within group(order by column_name)
         over(partition by owner, table_name
                         , i#) -- SECOND PARTITIONIING BY owner, table AND i#
          AS column_name_list
  from ( select owner, table_name, column_name, column_id
           from all_tab_columns
          where owner = 'SYS'
       )
 match_recognize (
 partition by owner, table_name -- FIRST PARTITIONIING BY owner, table
 order by column_id
 measures sum(length(column_name)+1) as "length"
        , sum(1)                as r#
        , match_number()        as i#
        , classifier()          as "what"
 all rows per match with unmatched rows
 pattern (varchar2fit+)
 define varchar2fit as sum(length(column_name)+1) <= 75
) MR
)
select table_name, i#, column_name_list, length(column_name_list) list_length
  from fitter
 where r# = 1
   and table_name IN ( 'ALL_TABLES',  'ALL_TAB_COLUMNS')
 order by table_name, i#, column_id;

TABLE_NAME            I# COLUMN_NAME_LIST                                                            LIST_LENGTH
-------------------- --- --------------------------------------------------------------------------- -----------
ALL_TABLES             1 CLUSTER_NAME,IOT_NAME,OWNER,PCT_FREE,STATUS,TABLESPACE_NAME,TABLE_NAME            70
ALL_TABLES             2 INITIAL_EXTENT,INI_TRANS,MAX_TRANS,MIN_EXTENTS,NEXT_EXTENT,PCT_USED               67
ALL_TABLES             3 BACKED_UP,FREELISTS,FREELIST_GROUPS,LOGGING,MAX_EXTENTS,PCT_INCREASE              68
ALL_TABLES             4 AVG_ROW_LEN,AVG_SPACE,BLOCKS,CHAIN_CNT,EMPTY_BLOCKS,NUM_ROWS                      60
ALL_TABLES             5 AVG_SPACE_FREELIST_BLOCKS,CACHE,DEGREE,INSTANCES,NUM_FREELIST_BLOCKS              68
ALL_TABLES             6 IOT_TYPE,LAST_ANALYZED,PARTITIONED,SAMPLE_SIZE,TABLE_LOCK,TEMPORARY               67
ALL_TABLES             7 BUFFER_POOL,CELL_FLASH_CACHE,FLASH_CACHE,NESTED,ROW_MOVEMENT,SECONDARY            70
ALL_TABLES             8 CLUSTER_OWNER,DURATION,GLOBAL_STATS,MONITORING,SKIP_CORRUPT,USER_STATS            70
ALL_TABLES             9 COMPRESSION,COMPRESS_FOR,DEPENDENCIES,DROPPED,READ_ONLY,SEGMENT_CREATED           71
ALL_TABLES            10 ACTIVITY_TRACKING,CLUSTERING,DML_TIMESTAMP,HAS_IDENTITY,RESULT_CACHE              68
ALL_TABLES            11 CONTAINER_DATA,INMEMORY,INMEMORY_DISTRIBUTE,INMEMORY_PRIORITY                     61
ALL_TABLES            12 INMEMORY_COMPRESSION,INMEMORY_DUPLICATE                                           39
ALL_TAB_COLUMNS        1 COLUMN_NAME,DATA_TYPE,DATA_TYPE_MOD,DATA_TYPE_OWNER,OWNER,TABLE_NAME              68
ALL_TAB_COLUMNS        2 COLUMN_ID,DATA_LENGTH,DATA_PRECISION,DATA_SCALE,DEFAULT_LENGTH,NULLABLE           71
ALL_TAB_COLUMNS        3 DATA_DEFAULT,DENSITY,HIGH_VALUE,LOW_VALUE,NUM_DISTINCT,NUM_NULLS                  64
ALL_TAB_COLUMNS        4 CHARACTER_SET_NAME,LAST_ANALYZED,NUM_BUCKETS,SAMPLE_SIZE                          56
ALL_TAB_COLUMNS        5 AVG_COL_LEN,CHAR_COL_DECL_LENGTH,CHAR_LENGTH,GLOBAL_STATS,USER_STATS              68
ALL_TAB_COLUMNS        6 CHAR_USED,DATA_UPGRADED,DEFAULT_ON_NULL,HISTOGRAM,V80_FMT_IMAGE                   63
ALL_TAB_COLUMNS        7 EVALUATION_EDITION,IDENTITY_COLUMN,UNUSABLE_BEFORE,UNUSABLE_BEGINNING             69

19 rows selected.


Connor McDonald

Followup  

February 18, 2017 - 4:34 am UTC

nice work

Other resources

February 17, 2017 - 10:16 pm UTC

Reviewer: Stew Ashton from Paris, France

Connor McDonald

Followup  

February 18, 2017 - 4:34 am UTC

thanks stew

Another excellent source

February 21, 2017 - 12:46 am UTC

Reviewer: Iudith Mentzel from Haifa, Israel

For completeness, here is also an excellent webinar presented by Kim Berg Hansen
on this topic:

http://www.prohuddle.com/webinars/KimBergHansen/Use_Cases_of_Row_Pattern_Matching_in_Oracle_12c.php

Enjoy !

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.