Skip to Main Content
  • Questions
  • How to get all subexpressions from the regexp_substr?

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Richard.

Asked: March 03, 2016 - 10:14 pm UTC

Last updated: March 05, 2016 - 1:19 am UTC

Version: 11.2.0.3

Viewed 1000+ times

You Asked

Hi Tom and your team,

Thanks for taking my question.

I can get each subexpression using regexp_substr if I know how many times it occur. For example,

select regexp_count('text=1234 text=asdf text=asdf4567 text=6yffv4', 'text=([[:alnum:]]*)') regexp_cnt,
regexp_substr('text=1234 text=asdf text=asdf4567 text=6yffv4', 'text=([[:alnum:]]*)', 1, 1, 'c', 1) regexp1,
regexp_substr('text=1234 text=asdf text=asdf4567 text=6yffv4', 'text=([[:alnum:]]*)', 1, 2, 'c', 1) regexp2,
regexp_substr('text=1234 text=asdf text=asdf4567 text=6yffv4', 'text=([[:alnum:]]*)', 1, 3, 'c', 1) regexp3,
regexp_substr('text=1234 text=asdf text=asdf4567 text=6yffv4', 'text=([[:alnum:]]*)', 1, 4, 'c', 1) regexp4
from dual;

I got:
REGEXP_CNT REGE REGE REGEXP3 REGEXP
---------- ---- ---- -------- ------
4 1234 asdf asdf4567 6yffv4

If in a table column, the number of occurrence is not a fix number, how to retrieve all the subexpressions from the table records?

Thanks.
Richard Xu

and Connor said...

I'm not entirely sure what you gain from this, because if you want a result in columns, then you need to know the upper bound for the number of possible matches anyway, because a single query cannot return a variable number of columns *per row*. For example, if I set an upper bound of (say) 8, I could do this:

SQL> create table t1 ( id int, str varchar2(1000));

Table created.

SQL>
SQL> insert into t1 values (1,'text=1234 text=asdf text=asdf4567 text=6yffv4');

1 row created.

SQL> insert into t1 values (2,'text=asdasd text=qweqwe text=sdfsdf');

1 row created.

SQL>
SQL> select  id,
  2          max(decode(r,1,regex)) r1,
  3          max(decode(r,2,regex)) r2,
  4          max(decode(r,3,regex)) r3,
  5          max(decode(r,4,regex)) r4,
  6          max(decode(r,5,regex)) r5,
  7          max(decode(r,6,regex)) r6,
  8          max(decode(r,7,regex)) r7,
  9          max(decode(r,8,regex)) r8
 10  from
 11  (
 12  select id,
 13          r,
 14          regexp_substr(str, 'text=([[:alnum:]]*)', 1, r, 'c', 1) regex
 15    from t1,
 16         ( select rownum r from dual connect by level <= 10  )
 17  )
 18  group by id
 19  /

  ID R1         R2         R3         R4         R5         R6         R7         R8
---- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
   1 1234       asdf       asdf4567   6yffv4
   2 asdasd     qweqwe     sdfsdf

2 rows selected.


But did I really gain anything over simply coding the 8 regex's ? So I'm not sure what you're really hoping to achieve.

Hope this helps.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Could you use the regexp_count for the number of subexpression?

Richard Xu, March 04, 2016 - 5:39 pm UTC

what I really want is:
ID CNT STRING
1 4 1234, asdf, asdf4567, 6yffv4
2 3 asdasd, qweqwe, sdfsdf

It shows the count of subexpressions in the row, and a comma separated string for all the subexpressions.

Thanks.
Richard Xu
Connor McDonald
March 05, 2016 - 1:19 am UTC

Ah...that old chestnut: "What I *really* want"

Sorry, my crystal ball was getting repaired, so I didnt know that :-)

Try this

SQL> select id,
  2         max(column_value) cnt,
  3         listagg(regexp_substr(str,'text=([[:alnum:]]*)', 1, column_value, 'c', 1),',')
  4            within group ( order by column_value) regex
  5  from   t1 ,
  6         table(
  7           cast(
  8             multiset(
  9               select level
 10               from dual
 11               connect by level <= regexp_count(str, 'text=([[:alnum:]]*)')
 12             )
 13           as sys.odcinumberlist
 14           )
 15         )
 16  group by id
 17  order by 1;

        ID        CNT REGEX
---------- ---------- ----------------------------------------
         1          4 1234,asdf,asdf4567,6yffv4
         2          3 asdasd,qweqwe,sdfsdf