• Questions
• # Help to understand this recursive query

Thanks for the question, ken.

Asked: October 13, 2021 - 3:54 pm UTC

Last updated: October 22, 2021 - 3:15 am UTC

Version: 12c

Viewed 1000+ times

Hi Tom,

I am not able to fully understand below recursive query. Especially, the condition "WHERE POS > 0" and " WHERE POS = 0". my understanding is pos is result of REGEXP_INSTR (STR, '[A-Z][0-9]{5}',1, RN).
1. if the first POS > 0, how could the final condition has pos = 0.
2. what's difference between pos > o and regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0. regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0 doesn't give expected result.

```WITH T
AS (SELECT '///2E/A12345/E53421\$\$@#/A123456*#/A123456**4E53421\$\$\$\$\$@@!!!'
STR
FROM DUAL
UNION ALL
SELECT 'ABC' FROM DUAL
UNION ALL
SELECT 'ABC67890' FROM DUAL
UNION ALL
SELECT 'ABC67890XYZ345679' FROM DUAL
UNION ALL
SELECT 'ABC67890XYZ345679UVW384453' FROM DUAL
UNION ALL
SELECT 'ABC67890XYZ345679UVW384453' || 'ABC67890XYZ345679'
FROM DUAL
UNION ALL
SELECT    'ABC67890'
|| 'ABC67890'
|| 'ABC67890'
|| 'ABC67890'
|| 'ABC67890'
|| 'ABC67890'
|| 'ABC67890'
FROM DUAL),
TMP (RN,
POS,
STR,
STR0)
AS (SELECT 1, 1, STR, STR FROM T
UNION ALL
SELECT RN + 1,
REGEXP_INSTR (STR,
'[A-Z][0-9]{5}',
1,
RN),
REGEXP_REPLACE (STR,
SUBSTR (STR,
REGEXP_INSTR (STR,
'[A-Z][0-9]{5}',
1,
RN),
6),
'',
REGEXP_INSTR (STR,
'[A-Z][0-9]{5}',
1,
RN)
+ 6),
STR0
FROM TMP
WHERE POS > 0)
SELECT *
FROM TMP
WHERE POS = 0```

## and Connor said...

1. regexp_instr returns starting position of the matched regex. If there's no match, it returns zero.

So this keeps searching through the text until the there's no remaining matches for the regex.

2. The POS expression is looking for the next occurrence of the pattern. If there are no more, this returns zero.

By changing the recursive condition to

where regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0

The query stops when it's on the last match. With pos > 0 it stops when it's past the last match.

So using the regex to filter => POS is never zero => the query returns no rows.

You can avoid this by changing:
```SELECT *
FROM TMP
WHERE POS = 0|```

To returns the rows for the highest value of POS for each string.

REGEXP_REPLACE can also do many replacements in one pass; so you could probably also remove the recursive component entirely and just do regex_replace.

## Rating

(4 ratings)

A reader, October 15, 2021 - 12:00 am UTC

thanks Chris.!
the requests is to replace repeated string. the string pattern is [A-Z][0-9]5, for example:, string: '///2E/A12345/E53421\$\$@#/A123456*#/A123456**4E53421\$\$\$\$\$@@!!!'
A12345 repeated 3 time, last two should be replaced. E53421 only has one, keep it.
expected result is: '///2E/A12345/E53421\$\$@#*#/6**4E53421\$\$\$\$\$@@!!!'
for below regexp_repalce, I am not able to fully understand. below is my understanding based on the result not sure if this is correct.

for SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 1), the occurrence loops multiple times like REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 1) + 6, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 2) + 6, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 3) + 6 ...
then SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 2, repeating the look.

`REGEXP_REPLACE (STR, SUBSTR (STR,  REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), 6), '', REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6)`

October 15, 2021 - 5:42 pm UTC

REGEXP_INSTR is searching for the Nth occurrence of the pattern. So

REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 2)

finds the 2nd time there's a letter followed by five numbers.

There may be a way to simplify this, but I'm not seeing it right now.

A reader, October 15, 2021 - 12:03 am UTC

string E53421 appears twice, 2nd one should be removed. '///2E/A12345/E53421\$\$@#/A123456*#/A123456**4E53421\$\$\$\$\$@@!!!'

expected result is: '///2E/A12345/E53421\$\$@#/6*#/6**4\$\$\$\$\$@@!!!'

A reader, October 16, 2021 - 12:11 am UTC

thanks Chris!

I am not able to fully understand below REGEXP_REPLACE. could you please explain briefly how this REGEXP_REPLACE to replace string: '///2E/A12345/E53421\$\$@#/A123456*#/A123456**4E53421\$\$\$\$\$@@!!!' to '///2E/A12345/E53421\$\$@#/6*#/A123456**\$\$\$\$\$@@!!!'

`REGEXP_REPLACE (STR, SUBSTR (STR,  REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), 6), '', <b>REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6</b>)`

really appreciate.

October 18, 2021 - 10:19 am UTC

Essentially what it's doing is:

Searching the string for the pattern '[A-Z][0-9]{5}'
Return the string that matches this (so A12345, E53421, etc.)

This happens in the substr which is the second argument to the overall regexp_substr.

Search the source again for the substring matched above (A12345, E53421, etc.)
Starting after the end of the string matched above (that's what REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6 does - keep searching from the point you found a letter followed by five numbers)
Replace this with null

A reader, October 21, 2021 - 1:08 pm UTC

Thank you very much, Christ.

October 22, 2021 - 3:15 am UTC