I've been trying to write a query that will remove the comments from another query. Here's the small sample query-with-comments, from which I want to remove the comments:
WITH sample_SQL ( line#, txt ) AS (
SELECT line#, txt
FROM dual
MODEL DIMENSION BY (1 as line#)
MEASURES (cast(null as varchar2(50)) as txt)
RULES
( txt[1] = 'SELECT count(*) cnt -- comment type#1'
, txt[2] = ' FROM dual /* comment type #2 */ table_alias'
, txt[3] = ' WHERE 1 = 1 /* start multiline comment'
, txt[4] = ' ... other comments here ...'
, txt[5] = ' ... and more here ...'
, txt[6] = ' ... end of comment */ AND 2 = 2'
, txt[7] = ';'
) ORDER BY line#
) select * from sample_SQL;
LINE# TXT
----- --------------------------------------------
1 SELECT count(*) cnt -- comment type#1
2 FROM dual /* comment type #2 */ table_alias
3 WHERE 1 = 1 /* start multiline comment
4 ... other comments here ...
5 ... and more here ...
6 ... end of comment */ AND 2 = 2
7 ;
7 rows selected.
I figured out how to remove comments using analytic functions; it's pretty straight-forward. HOWEVER, it's not obvious how to do so using Regular Expressions. I thought I could use the match parameters m and n; the documentation states:
m -- expression is assumed to have multiple lines, where ^ is the start of a line and $ is the end of a line, regardless of the position of those characters in expression. By default, expression is assumed to be a single line.
n -- Allows the period character (.) to match the newline character. By default, the period is a wildcard.
However it's not obvious how to get the match parameters to work for me. I couldn't find any examples using the parameters.
The closest Regular Expression query I could concoct uses LISTAGG, so it suffers from the character-size limitation of VARCHAR2. (It doesn't require the match parameters; see also the liveSQL).
WITH sample_SQL ( line#, txt ) AS (
SELECT line#, txt
FROM dual
MODEL DIMENSION BY (1 as line#)
MEASURES (cast(null as varchar2(50)) as txt)
RULES
( txt[1] = 'SELECT count(*) cnt -- comment type#1'
, txt[2] = ' FROM dual /* comment type #2 */ table_alias'
, txt[3] = ' WHERE 1 = 1 /* start multiline comment'
, txt[4] = ' ... other comments here ...'
, txt[5] = ' ... and more here ...'
, txt[6] = ' ... end of comment */ AND 2 = 2'
, txt[7] = ';'
) ORDER BY line#
),
concat_SQL as (
SELECT listagg(txt,chr(10))within group(order by line#) txt
FROM sample_SQL
),
parms AS (
SELECT r#, pt, rr, dx
FROM dual
MODEL DIMENSION BY (1 as r#)
MEASURES(cast(null as varchar2(50)) as pt
,cast(null as varchar2(50)) as rr
,cast(null as varchar2(50)) as dx)
RULES
( pt[1] = '--.+'||chr(10) , rr[1]=chr(10) , dx[1]='double-dash'
, pt[2] = '/\*(.|'||chr(10)||')*?\*/', rr[2]='' , dx[2]='slash/asterisk'
)),
rCTE (r#, txt, dx ) AS (
SELECT 0, txt, 'original'
FROM concat_SQL
UNION ALL
SELECT parms.r#
, REGEXP_REPLACE(rCTE.txt,parms.pt,parms.rr)
, parms.dx
FROM rCTE
JOIN parms
ON rCTE.r# + 1 = parms.r#
)
SELECT *
FROM rCTE
ORDER BY r#;
R# TXT DX
--- -------------------------------------------------- -------------------
0 SELECT count(*) cnt -- comment type#1 original
FROM dual /* comment type #2 */ table_alias
WHERE 1 = 1 /* start multiline comment
... other comments here ...
... and more here ...
... end of comment */ AND 2 = 2
;
1 SELECT count(*) cnt double-dash
FROM dual /* comment type #2 */ table_alias
WHERE 1 = 1 /* start multiline comment
... other comments here ...
... and more here ...
... end of comment */ AND 2 = 2
;
2 SELECT count(*) cnt slash/asterisk
FROM dual table_alias
WHERE 1 = 1 AND 2 = 2
;
Can I solve this using SQL Regular Expressions without concatenating all the lines? When are m and n useful as match parameters? Thank you! Please feel free to add your comments... :)
Sure, here's a simple example using 'n' so period matches newlines too:
with rws as (
select 'this string
has a newline' str from dual
)
select str two_lines, regexp_replace(str, '.has', ' doesn''t have', 1, 1, 'n') one_line
from rws;
TWO_LINES ONE_LINE
this string
has a newline this string doesn't have a newline
The problem with your example is the lines are separate
rows. It doesn't matter what parameters you pass to regexp_replace. Each row is considered separately. You have to glue them together somehow to make one row so regexp_replace can match one string over many lines.