Skip to Main Content
  • Questions
  • Help to understand this recursive query

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, ken.

Asked: October 13, 2021 - 3:54 pm UTC

Last updated: October 22, 2021 - 3:15 am UTC

Version: 12c

Viewed 10K+ times! This question is

You Asked

Hi Tom,

I am not able to fully understand below recursive query. Especially, the condition "WHERE POS > 0" and " WHERE POS = 0". my understanding is pos is result of REGEXP_INSTR (STR, '[A-Z][0-9]{5}',1, RN).
1. if the first POS > 0, how could the final condition has pos = 0.
2. what's difference between pos > o and regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0. regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0 doesn't give expected result.


thanks in advance.


WITH T
     AS (SELECT '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!'
                    STR
           FROM DUAL
         UNION ALL
         SELECT 'ABC' FROM DUAL
         UNION ALL
         SELECT 'ABC67890' FROM DUAL
         UNION ALL
         SELECT 'ABC67890XYZ345679' FROM DUAL
         UNION ALL
         SELECT 'ABC67890XYZ345679UVW384453' FROM DUAL
         UNION ALL
         SELECT 'ABC67890XYZ345679UVW384453' || 'ABC67890XYZ345679'
           FROM DUAL
         UNION ALL
         SELECT    'ABC67890'
                || 'ABC67890'
                || 'ABC67890'
                || 'ABC67890'
                || 'ABC67890'
                || 'ABC67890'
                || 'ABC67890'
           FROM DUAL),
     TMP (RN,
          POS,
          STR,
          STR0)
     AS (SELECT 1, 1, STR, STR FROM T
         UNION ALL
         SELECT RN + 1,
                REGEXP_INSTR (STR,
                              '[A-Z][0-9]{5}',
                              1,
                              RN),
                REGEXP_REPLACE (STR,
                                SUBSTR (STR,
                                        REGEXP_INSTR (STR,
                                                      '[A-Z][0-9]{5}',
                                                      1,
                                                      RN),
                                        6),
                                '',
                                  REGEXP_INSTR (STR,
                                                '[A-Z][0-9]{5}',
                                                1,
                                                RN)
                                + 6),
                STR0
           FROM TMP
          WHERE POS > 0)
SELECT *
  FROM TMP
 WHERE POS = 0



and Chris said...

1. regexp_instr returns starting position of the matched regex. If there's no match, it returns zero.

So this keeps searching through the text until the there's no remaining matches for the regex.

2. The POS expression is looking for the next occurrence of the pattern. If there are no more, this returns zero.

By changing the recursive condition to

where regexp_instr(str, '[A-Z][0-9]{5}', 1, rn)>0

The query stops when it's on the last match. With pos > 0 it stops when it's past the last match.

So using the regex to filter => POS is never zero => the query returns no rows.

You can avoid this by changing:
SELECT *
  FROM TMP
 WHERE POS = 0|


To returns the rows for the highest value of POS for each string.

REGEXP_REPLACE can also do many replacements in one pass; so you could probably also remove the recursive component entirely and just do regex_replace.

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, October 15, 2021 - 12:00 am UTC

thanks Chris.!
the requests is to replace repeated string. the string pattern is [A-Z][0-9]5, for example:, string: '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!'
A12345 repeated 3 time, last two should be replaced. E53421 only has one, keep it.
expected result is: '///2E/A12345/E53421$$@#*#/6**4E53421$$$$$@@!!!'
for below regexp_repalce, I am not able to fully understand. below is my understanding based on the result not sure if this is correct.

for SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 1), the occurrence loops multiple times like REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 1) + 6, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 2) + 6, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 3) + 6 ...
then SUBSTR (STR, REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 2, repeating the look.

please advice.

REGEXP_REPLACE (STR, SUBSTR (STR,  REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), 6), '', REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6)


Chris Saxon
October 15, 2021 - 5:42 pm UTC

REGEXP_INSTR is searching for the Nth occurrence of the pattern. So

REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, 2)

finds the 2nd time there's a letter followed by five numbers.

There may be a way to simplify this, but I'm not seeing it right now.

A reader, October 15, 2021 - 12:03 am UTC

sorry, my bad.
string E53421 appears twice, 2nd one should be removed. '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!'

expected result is: '///2E/A12345/E53421$$@#/6*#/6**4$$$$$@@!!!'

A reader, October 16, 2021 - 12:11 am UTC

thanks Chris!

I am not able to fully understand below REGEXP_REPLACE. could you please explain briefly how this REGEXP_REPLACE to replace string: '///2E/A12345/E53421$$@#/A123456*#/A123456**4E53421$$$$$@@!!!' to '///2E/A12345/E53421$$@#/6*#/A123456**$$$$$@@!!!'

REGEXP_REPLACE (STR, SUBSTR (STR,  REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN), 6), '', <b>REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6</b>)

really appreciate.

thanks in advance
Chris Saxon
October 18, 2021 - 10:19 am UTC

Essentially what it's doing is:

Searching the string for the pattern '[A-Z][0-9]{5}'
Return the string that matches this (so A12345, E53421, etc.)

This happens in the substr which is the second argument to the overall regexp_substr.

Search the source again for the substring matched above (A12345, E53421, etc.)
Starting after the end of the string matched above (that's what REGEXP_INSTR (STR, '[A-Z][0-9]{5}', 1, RN) + 6 does - keep searching from the point you found a letter followed by five numbers)
Replace this with null

A reader, October 21, 2021 - 1:08 pm UTC

Thank you very much, Christ.



Connor McDonald
October 22, 2021 - 3:15 am UTC

glad we could help

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.