Skip to Main Content
  • Questions
  • REGEXP_REPLACE Match Parameters m and n.

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Duke.

Asked: April 25, 2017 - 2:17 pm UTC

Last updated: April 25, 2017 - 3:42 pm UTC

Version: 12c

Viewed 1000+ times

You Asked

I've been trying to write a query that will remove the comments from another query. Here's the small sample query-with-comments, from which I want to remove the comments:

WITH sample_SQL ( line#, txt ) AS (
SELECT line#, txt
  FROM dual
 MODEL DIMENSION BY (1 as line#)
       MEASURES (cast(null as varchar2(50)) as txt)
RULES
( txt[1] = 'SELECT count(*) cnt -- comment type#1'
, txt[2] = ' FROM dual /* comment type #2 */ table_alias'
, txt[3] = ' WHERE 1 = 1 /* start multiline comment'
, txt[4] = ' ... other comments here ...'
, txt[5] = ' ... and more here ...'
, txt[6] = ' ... end of comment */ AND 2 = 2'
, txt[7] = ';'
) ORDER BY line#
) select * from sample_SQL;

LINE# TXT
----- --------------------------------------------
    1 SELECT count(*) cnt -- comment type#1
    2  FROM dual /* comment type #2 */ table_alias
    3  WHERE 1 = 1 /* start multiline comment
    4  ... other comments here ...
    5  ... and more here ...
    6  ... end of comment */ AND 2 = 2
    7 ;

7 rows selected.


I figured out how to remove comments using analytic functions; it's pretty straight-forward. HOWEVER, it's not obvious how to do so using Regular Expressions. I thought I could use the match parameters m and n; the documentation states:

m -- expression is assumed to have multiple lines, where ^ is the start of a line and $ is the end of a line, regardless of the position of those characters in expression. By default, expression is assumed to be a single line.
n -- Allows the period character (.) to match the newline character. By default, the period is a wildcard.

However it's not obvious how to get the match parameters to work for me. I couldn't find any examples using the parameters.

The closest Regular Expression query I could concoct uses LISTAGG, so it suffers from the character-size limitation of VARCHAR2. (It doesn't require the match parameters; see also the liveSQL).

WITH sample_SQL ( line#, txt ) AS (
SELECT line#, txt
  FROM dual
 MODEL DIMENSION BY (1 as line#)
       MEASURES (cast(null as varchar2(50)) as txt)
RULES
( txt[1] = 'SELECT count(*) cnt -- comment type#1'
, txt[2] = ' FROM dual /* comment type #2 */ table_alias'
, txt[3] = ' WHERE 1 = 1 /* start multiline comment'
, txt[4] = ' ... other comments here ...'
, txt[5] = ' ... and more here ...'
, txt[6] = ' ... end of comment */ AND 2 = 2'
, txt[7] = ';'
) ORDER BY line#
),
concat_SQL as (
SELECT listagg(txt,chr(10))within group(order by line#) txt
  FROM sample_SQL
),
parms AS (
SELECT r#, pt, rr, dx
  FROM dual
 MODEL DIMENSION BY (1 as r#)
       MEASURES(cast(null as varchar2(50)) as pt
               ,cast(null as varchar2(50)) as rr
               ,cast(null as varchar2(50)) as dx)
RULES
( pt[1] = '--.+'||chr(10)            , rr[1]=chr(10) , dx[1]='double-dash'
, pt[2] = '/\*(.|'||chr(10)||')*?\*/', rr[2]=''      , dx[2]='slash/asterisk'
)),
rCTE (r#, txt, dx ) AS (
SELECT 0, txt, 'original'
  FROM concat_SQL
UNION ALL
SELECT parms.r#
     , REGEXP_REPLACE(rCTE.txt,parms.pt,parms.rr)
     , parms.dx
  FROM rCTE
  JOIN parms
    ON rCTE.r# + 1 = parms.r#
)
SELECT *
  FROM rCTE
 ORDER BY r#;

 R# TXT                                                DX
--- -------------------------------------------------- -------------------
  0 SELECT count(*) cnt -- comment type#1              original
     FROM dual /* comment type #2 */ table_alias
     WHERE 1 = 1 /* start multiline comment
     ... other comments here ...
     ... and more here ...
     ... end of comment */ AND 2 = 2
    ;

  1 SELECT count(*) cnt                                double-dash
     FROM dual /* comment type #2 */ table_alias
     WHERE 1 = 1 /* start multiline comment
     ... other comments here ...
     ... and more here ...
     ... end of comment */ AND 2 = 2
    ;

  2 SELECT count(*) cnt                                slash/asterisk
     FROM dual  table_alias
     WHERE 1 = 1  AND 2 = 2
    ;


Can I solve this using SQL Regular Expressions without concatenating all the lines? When are m and n useful as match parameters? Thank you! Please feel free to add your comments... :)

with LiveSQL Test Case:

and Chris said...

Sure, here's a simple example using 'n' so period matches newlines too:

with rws as (
  select 'this string
has a newline' str from dual
)
select str two_lines, regexp_replace(str, '.has', ' doesn''t have', 1, 1, 'n') one_line
from   rws;

TWO_LINES                  ONE_LINE                            
this string
has a newline  this string doesn't have a newline 


The problem with your example is the lines are separate rows. It doesn't matter what parameters you pass to regexp_replace. Each row is considered separately. You have to glue them together somehow to make one row so regexp_replace can match one string over many lines.

Is this answer out of date? If it is, please let us know via a Comment