## Question and Answer

## You Asked

Hi Tom,

I am not able to fully understand below recursive query. Especially, the condition "WHERE POS > 0" and " WHERE POS = 0". my understanding is pos is result of REGEXP_INSTR (STR, '[A-Z][0-9]{5}',1, RN).

1. if the first POS > 0, how could the final condition has pos = 0.

2. what's difference between pos > o and regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0. regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0 doesn't give expected result.

thanks in advance.

I am not able to fully understand below recursive query. Especially, the condition "WHERE POS > 0" and " WHERE POS = 0". my understanding is pos is result of REGEXP_INSTR (STR, '[A-Z][0-9]{5}',1, RN).

1. if the first POS > 0, how could the final condition has pos = 0.

2. what's difference between pos > o and regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0. regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0 doesn't give expected result.

thanks in advance.

WITH T AS (SELECT '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!' STR FROM DUAL UNION ALL SELECT 'ABC' FROM DUAL UNION ALL SELECT 'ABC67890' FROM DUAL UNION ALL SELECT 'ABC67890XYZ345679' FROM DUAL UNION ALL SELECT 'ABC67890XYZ345679UVW384453' FROM DUAL UNION ALL SELECT 'ABC67890XYZ345679UVW384453' || 'ABC67890XYZ345679' FROM DUAL UNION ALL SELECT 'ABC67890' || 'ABC67890' || 'ABC67890' || 'ABC67890' || 'ABC67890' || 'ABC67890' || 'ABC67890' FROM DUAL), TMP (RN, POS, STR, STR0) AS (SELECT 1, 1, STR, STR FROM T UNION ALL SELECT RN + 1, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), REGEXP_REPLACE (STR, SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), 6), '', REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6), STR0 FROM TMP WHERE POS > 0) SELECT * FROM TMP WHERE POS = 0

## and Connor said...

1. regexp_instr returns starting position of the matched regex. If there's no match, it returns zero.

So this keeps searching through the text until the there's no remaining matches for the regex.

2. The POS expression is looking for the

By changing the recursive condition to

where regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0

The query stops when it's

So using the regex to filter => POS is never zero => the query returns no rows.

You can avoid this by changing:

To returns the rows for the highest value of POS for each string.

REGEXP_REPLACE can also do many replacements in one pass; so you could probably also remove the recursive component entirely and just do regex_replace.

So this keeps searching through the text until the there's no remaining matches for the regex.

2. The POS expression is looking for the

*next*occurrence of the pattern. If there are no more, this returns zero.By changing the recursive condition to

where regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0

The query stops when it's

**on**the last match. With pos > 0 it stops when it's**past**the last match.So using the regex to filter => POS is never zero => the query returns no rows.

You can avoid this by changing:

SELECT * FROM TMP WHERE POS = 0|

To returns the rows for the highest value of POS for each string.

REGEXP_REPLACE can also do many replacements in one pass; so you could probably also remove the recursive component entirely and just do regex_replace.

## Rating

# Comments

thanks Chris.!

the requests is to replace repeated string. the string pattern is [A-Z][0-9]5, for example:, string: '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!'

A12345 repeated 3 time, last two should be replaced. E53421 only has one, keep it.

expected result is: '///2E/A12345/E53421$$@#*#/6**4E53421$$$$$@@!!!'

for below regexp_repalce, I am not able to fully understand. below is my understanding based on the result not sure if this is correct.

for SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1,

then SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}',

please advice.

the requests is to replace repeated string. the string pattern is [A-Z][0-9]5, for example:, string: '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!'

A12345 repeated 3 time, last two should be replaced. E53421 only has one, keep it.

expected result is: '///2E/A12345/E53421$$@#*#/6**4E53421$$$$$@@!!!'

for below regexp_repalce, I am not able to fully understand. below is my understanding based on the result not sure if this is correct.

for SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1,

**1**), the occurrence loops multiple times like REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1,**1**) + 6, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1,**2**) + 6, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1,**3**) + 6 ...then SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}',

**2**, repeating the look.please advice.

REGEXP_REPLACE (STR, SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), 6), '', REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6)

REGEXP_INSTR is searching for the Nth occurrence of the pattern. So

REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 2)

finds the 2nd time there's a letter followed by five numbers.

There may be a way to simplify this, but I'm not seeing it right now.

REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 2)

finds the 2nd time there's a letter followed by five numbers.

There may be a way to simplify this, but I'm not seeing it right now.

sorry, my bad.

string E53421 appears twice, 2nd one should be removed. '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!'

expected result is: '///2E/A12345/E53421$$@#/6*#/6**4$$$$$@@!!!'

string E53421 appears twice, 2nd one should be removed. '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!'

expected result is: '///2E/A12345/E53421$$@#/6*#/6**4$$$$$@@!!!'

thanks Chris!

I am not able to fully understand below REGEXP_REPLACE. could you please explain briefly how this REGEXP_REPLACE to replace string: '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!' to '///2E/A12345/E53421$$@#/6*#/A123456**$$$$$@@!!!'

really appreciate.

thanks in advance

I am not able to fully understand below REGEXP_REPLACE. could you please explain briefly how this REGEXP_REPLACE to replace string: '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!' to '///2E/A12345/E53421$$@#/6*#/A123456**$$$$$@@!!!'

REGEXP_REPLACE (STR, SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), 6), '', <b>REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6</b>)

really appreciate.

thanks in advance

Essentially what it's doing is:

Searching the string for the pattern '[A-Z][0-9]{5}'

Return the string that matches this (so A12345, E53421, etc.)

This happens in the substr which is the second argument to the overall regexp_substr.

Search the source again for the substring matched above (A12345, E53421, etc.)

Starting

Replace this with null

Searching the string for the pattern '[A-Z][0-9]{5}'

Return the string that matches this (so A12345, E53421, etc.)

This happens in the substr which is the second argument to the overall regexp_substr.

Search the source again for the substring matched above (A12345, E53421, etc.)

Starting

**after**the end of the string matched above (that's what REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6 does - keep searching from the point you found a letter followed by five numbers)Replace this with null

Thank you very much, Christ.

glad we could help