Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, George.

Asked: November 08, 2016 - 4:46 pm UTC

Last updated: October 05, 2018 - 9:15 am UTC

Version: 11g

Viewed 1000+ times

You Asked

Hi

i am trying to match a pattern in a string. It is web log data so basically i need to look as follows
find a comma followed by two spaces followed by anynumber of characters other than a quote follwed by =

Eg: is string is
abc=1, name='1,2,3', severity=, age=24

I need to get the 6th, 21, 33 positions in instr value

Thanks in advance

with LiveSQL Test Case:

and Chris said...

Thanks for providing a LiveSQL test case!

Your regex is close. Remember that regexes are greedy. ".*=" matches every character up to the last equals. So instead of matching any character, you want to match any character that isn't a quote or comma. e.g.

.* goes to [^'',]* (escaping the quote)

Which gives:

select regexp_instr(q'|abc=1,  name='1,2,3',  severity=,  age=24|','[,]  [^'',]*=',1) pos
from   dual;

POS  
6    

select regexp_instr(q'|abc=1,  name='1,2,3',  severity=,  age=24|','[,]  [^'',]*=',1,2) pos
from   dual;

POS  
21   

select regexp_instr(q'|abc=1,  name='1,2,3',  severity=,  age=24|','[,]  [^'',]*=',1,3) pos
from   dual;

POS  
33  

Rating

  (24 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Re

George Joseph, November 11, 2016 - 3:51 am UTC

Thanks so much Chris.

I got into a bit of more trouble, when there are patterns inside my quotes.

Eg:

param1='1, 2, 3', param2='age=24, sex=M, sal=500', param3=1

ie my pattern [,] *[^',]*= matches contents inside my value. i want to ignore those.

https://livesql.oracle.com/apex/livesql/file/content_D4S8YO62BPLJZGOY22FBZL5ON.html


Chris Saxon
November 11, 2016 - 10:21 am UTC

Bah, pesky regular expressions!

If may be easiest to first replace the text between the quotes with something else. You can find these sections with a regex like:

'[^']+'


Completing this left as an exercise for the reader ;)

GJ, November 12, 2016 - 9:25 am UTC

I tried the below, the regexp_replace only allows to replace the entire contents of the string, rather than the text inside the string.

Eg: param2 gets can be replaced by appending'<' and '>'
param2=<'age=24, sex=M, sal=500'>

However i am not able to figure out how to replace the contents inside the matched parameter that matches an = with say a %

select q'|param1=1,  param2='age=24, sex=M, sal=500',  param3=,  param4='24'|' as orig_str
       ,regexp_replace(q'|param1=1,  param2='age=24, sex=M, sal=500',  param3=,  param4='24'|',q'|('[^',]+[=][^']+')|','<\1>') as replaced_str
from dual


does this helps ?

Rajeshwaran Jeyabal, November 14, 2016 - 4:51 am UTC

....
However i am not able to figure out how to replace the contents inside the matched parameter that matches an = with say a %
....


demo@ORA12C> variable txt varchar2(70)
demo@ORA12C> exec :txt := q'|param1=1,  param2='age=24, sex=M, sal=500',  param3=,  param4=24|';

PL/SQL procedure successfully completed.

demo@ORA12C> print txt

TXT
-------------------------------------------------------------------------------------------------------
param1=1,  param2='age=24, sex=M, sal=500',  param3=,  param4=24

demo@ORA12C> select regexp_substr(:txt,q'|,\s\s(.)*'.*',\s\s|') x1 from dual;

X1
-------------------------------------------------------------------------------------------------------
,  param2='age=24, sex=M, sal=500',

1 row selected.

demo@ORA12C> select replace( regexp_substr(:txt,q'|,\s\s(.)*'.*',\s\s|') ,'=','%') x1 from dual;

X1
-------------------------------------------------------------------------------------------------------
,  param2%'age%24, sex%M, sal%500',

1 row selected.

demo@ORA12C>

Chris Saxon
November 14, 2016 - 10:20 am UTC

That's incorrect:

set null <NULL>
with rws as (
  select q'|param1='1, 2, 3', param2='age=24, sex=M, sal=500', param3=1|' txt from dual
)
select replace( regexp_substr(txt,q'|,\s\s(.)*'.*',\s\s|') ,'=','%') x1 from rws;

X1      
<NULL> 


You need to find the start position and length of the string that has doesn't have a quote between two quotes:

with rws as (
  select q'|param1='1, 2, 3', param2='age=24, sex=M, sal=500', param3=1|' txt from dual
)
select regexp_instr(txt,q'|'[^']*'|')+1, length(regexp_substr(txt,q'|'[^']*'|')), regexp_substr(txt,q'|'[^']*'|')
from rws;


Then use these to manipulate the original string to do the replacement.

Re

George Joseph, November 14, 2016 - 7:17 am UTC

@Rajesh
Thanks for the input. But i have a problem

Say my string is as follows


eg:

app=24, dvc_host='age=1, sex=M', sender='age=1, sex=F'


so if i were to replace using your method

dvc_host%='age%1, sex%M'
sender%='age%1, sex%F'

I dont wish to get the dvc_host% replaced. It should be
dvc_host='age%1, sex%M.
sender='age%1, sex%F'



Chris Saxon
November 14, 2016 - 10:22 am UTC

See my response above.

Back reference positions

Rajeshwaran, Jeyabal, November 14, 2016 - 12:09 pm UTC

http://docs.oracle.com/database/121/ADFNS/adfns_regexp.htm#ADFNS242

Using the back-reference to repostion the characters technique we could get this.

demo@ORA12C> variable x varchar2(70)
demo@ORA12C> exec :x := q'|app=24, dvc_host='age=1, sex=M', sender='age=1, sex=F'|';

PL/SQL procedure successfully completed.

demo@ORA12C> print x

X
-------------------------------------------------------------------------------------------------------
app=24, dvc_host='age=1, sex=M', sender='age=1, sex=F'

demo@ORA12C> select regexp_replace( :x , q'|'(\S+)=(\S+)\s(\S+)=(\S+)'|'  ,'''\1%\2 \3%\4''') from dual;

REGEXP_REPLACE(:X,Q'|'(\S+)=(\S+)\S(\S+)=(\S+)'|','''\1%\2\3%\4''')
-----------------------------------------------------------------------------------------------------------
app=24, dvc_host='age%1, sex%M', sender='age%1, sex%F'

1 row selected.

demo@ORA12C>

To Chris

Rajeshwaran, Jeyabal, November 14, 2016 - 12:30 pm UTC

That's incorrect:

set null <NULL>
with rws as (
select q'|param1='1, 2, 3', param2='age=24, sex=M, sal=500', param3=1|' txt from dual
)
select replace( regexp_substr(txt,q'|,\s\s(.)*'.*',\s\s|') ,'=','%') x1 from rws;

X1
<NULL>


Chirs - you changed the inputs.

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9532244000346199677#9532301800346109783

select q'|param1=1,  param2='age=24, sex=M, sal=500',  param3=,  param4='24'|' as orig_str
------------------>##


We should have two space between "," and "Param2" (the same for others occurrences) - but that is missing with your sql.

Rerunning it with right inputs - will get the outputs like this.

demo@ORA12C> set null <NULL>
demo@ORA12C> with rws as (
  2  select q'|param1='1, 2, 3',  param2='age=24, sex=M, sal=500', param3=1|' txt from dual
  3  )
  4  select replace( regexp_substr(txt,q'|,\s\s(.)*'.*',\s\s|') ,'=','%') x1 from rws;

X1
------------------------------
<NULL>

1 row selected.

demo@ORA12C> with rws as (
  2  select q'|param1='1, 2, 3',  param2='age=24, sex=M, sal=500',  param3=1|' txt from dual
  3  )
  4  select txt , regexp_substr( txt ,q'|,\s\s(.)*'(.)*',\s\s|') from rws ;

TXT                                                           REGEXP_SUBSTR(TXT,Q'|,\S\S(.)*'(.)*',\S\S|')
------------------------------------------------------------- -------------------------------------------------------------
param1='1, 2, 3',  param2='age=24, sex=M, sal=500',  param3=1 ,  param2='age=24, sex=M, sal=500',

1 row selected.

demo@ORA12C> with rws as (
  2  select q'|param1='1, 2, 3',  param2='age=24, sex=M, sal=500',  param3=1|' txt from dual
  3  )
  4  select txt , replace( regexp_substr( txt ,q'|,\s\s(.)*'(.)*',\s\s|'),'=','%') from rws
  5  /

TXT                                                           REPLACE(REGEXP_SUBSTR(TXT,Q'|,\S\S(.)*'(.)*',\S\S|'),'=','%')
------------------------------------------------------------- -------------------------------------------------------------
param1='1, 2, 3',  param2='age=24, sex=M, sal=500',  param3=1 ,  param2%'age%24, sex%M, sal%500',

1 row selected.

demo@ORA12C>

Chris Saxon
November 14, 2016 - 12:58 pm UTC

That was the example the OP gave in the first review...

Re

GJ, November 14, 2016 - 1:04 pm UTC

@Rajesh

Thanks for the response

For the backreference option,there could be more than 2 parameters inside the single quote so the backreference parameters(Eg: %1%2 cant be hardcoded)

q'|app=24, dvc_host='age=1, sex=M, ht=151', sender='age=1, sex=F'|';

select q'|app=24, dvc_host='age=1, sex=M, ht=151', sender='age=1, sex=F'|' as str
      ,regexp_replace(q'|app=24, dvc_host='age=1, sex=M', sender='age=1, sex=F, sex=F'|'
                      ,q'|'(\S+)=(\S+)\s(\S+)=(\S+)'|'
                      ,'''\1%\2 \3%\4''')
  from dual



Chris Saxon
November 14, 2016 - 1:38 pm UTC

Like I said, use a regex like this:

exec :x := q'|param1=1,  param2='age=24, sex=M, sal=500',  param3=,  param4=24|';

select regexp_substr( :x , q'|'[^']+'|' ) from dual;

REGEXP_SUBSTR(:X,Q'|'[^']+'|')  
'age=24, sex=M, sal=500' 


This gives the position and length of the string to replace. Then reconstruct the original string, masking this out by:

- Chopping it at the point the regexp_substr returns
- Padding the length with # or some other "invalid" character
- Sticking the rest of the string back on the end

I can't think of a better way at the moment...

To Chris

Rajeshwaran, Jeyabal, November 14, 2016 - 1:04 pm UTC

Thanks Chris, that helps.

Did you see any other smarter way of doing this apart from "Back reference positions (the two post above)" ?
Chris Saxon
November 14, 2016 - 1:39 pm UTC

See my response above.

Tweek the inputs

Rajeshwaran, Jeyabal, November 14, 2016 - 1:34 pm UTC

Any option to slightly tweek the inputs ?

Say instead of this

q'|app=24, dvc_host='age=1, sex=M, ht=151', sender='age=1, sex=F'|';

if we have the inputs like this.

q'|app:24, dvc_host:'age=1, sex=M, ht=151', sender:'age=1, sex=F'|';

(Basically we need a kind of differentiation between the "=" sign present inside the quoted string and outside the quoted string)

then it becomes like this.

demo@ORA11G> variable x varchar2(70)
demo@ORA11G> exec :x := q'|app:24, dvc_host:'age=1, sex=M, ht=151', sender:'age=1, sex=F'|';

PL/SQL procedure successfully completed.

demo@ORA11G> print x

X
----------------------------------------------------------------------------------------------
app:24, dvc_host:'age=1, sex=M, ht=151', sender:'age=1, sex=F'

demo@ORA11G>
demo@ORA11G> select regexp_replace( :x , '(\w+)=(\w+)' ,'\1%\2') from dual;

REGEXP_REPLACE(:X,'(\W+)=(\W+)','\1%\2')
----------------------------------------------------------------------------------------------
app:24, dvc_host:'age%1, sex%M, ht%151', sender:'age%1, sex%F'

1 row selected.

demo@ORA11G>

Chris Saxon
November 14, 2016 - 1:41 pm UTC

Figuring out which characters are inside the quotes and which aren't is the root of the whole problem, no?

Need help

A reader, August 09, 2018 - 5:50 pm UTC

Giving a formatted string S and input x value I need to get the relative position of x in S .

Eg . S = *654*6666*2*99
× = 2
So pos should be 3

X= 2 so pos should be 6666

X= 33 pos is null

The star is always the separator

Sorry a mistake in

A reader, August 09, 2018 - 5:56 pm UTC

X= 2 so pos should be 6666 is incorrect

Correction:
X= 6666 so pos should be 2

Ans so on.

Values separated are unique
Chris Saxon
August 10, 2018 - 8:16 am UTC

Here's one way:

- Find the position of your search value in the string.
- Substr the input to this point.
- Count the *s in this shortened string

e.g.

with val as (
  select '*654*6666*2*99' s from dual
)
  select s, 
         substr ( s, 1, instr ( s, '2' ) ) str,
         regexp_count ( substr ( s, 1, instr ( s, '2' ) ), '\*' ) pos
  from   val;

S                STR           POS   
*654*6666*2*99   *654*6666*2       3 


Of course, the above searches for the first instance of 2 in the string. If you need an exact match (e.g. *2*) you'll need to modify the substr to do this.

On the original question: handling quoted stuff

Stew Ashton, August 11, 2018 - 8:30 am UTC

The result we want is the position of the comma. If we put the comma in parentheses we can locate it as a "subexpression" without having to put it at the beginning of our regex pattern. I start by looking for (optional) quoted stuff, then the comma, then the rest.
with rws as (
  --select q'#abc=1,  name='1,2,3',  severity=,  age=24#' txt from dual
  --select q'#param1='1, 2, 3', param2='age=24, sex=M, sal=500', param3=1#' txt from dual
  --select q'#param2='age=24, sex=M, sal=500', param1='1, 2, 3', param3=1#' txt from dual
  select q'#param1='1, 2, 3', param3=1, param2='age=24, sex=M, sal=500'#' txt from dual
)
select '00000000011111111112222222222333333333344444444445555555555' txt, null ans from dual
UNION ALL
select '12345678901234567890123456789012345678901234567890123456789' txt, null ans from dual
UNION ALL
select txt,
  regexp_instr(txt, q'#('.*?')?($|((,) +.*?=))#', 1, level, 0, null, 4) ans
from rws
connect by
  regexp_instr(txt, q'#('.*?')?($|((,) +.*?=))#', 1, level, 0, null, 4) > 0;

TXT                                                                ANS
----------------------------------------------------------- ----------
00000000011111111112222222222333333333344444444445555555555           
12345678901234567890123456789012345678901234567890123456789           
param1='1, 2, 3', param3=1, param2='age=24, sex=M, sal=500'         17
param1='1, 2, 3', param3=1, param2='age=24, sex=M, sal=500'         27
Best regards,
Stew

great solution

A reader, September 20, 2018 - 5:00 pm UTC

you solution is very nice :

with val as (
select '*654*6666*2*99' s from dual
)
select s,
substr ( s, 1, instr ( s, '2' ) ) str,
regexp_count ( substr ( s, 1, instr ( s, '2' ) ), '\*' ) pos
from val;


so my final question is :
giving the same string s= '*654*6666*2*99'
how can get in one select statement the nth occurence for instance :
finding the 3rd item starting by the end gives "6666"
finding the third item starting from begining gives "2"

what is given is the -3 or +2 i.e get_item(-3) = '6666'
get_item(2) = '2'

so how write sql (not plsql) query to meet the need?




Chris Saxon
September 24, 2018 - 12:44 pm UTC

What's the issue with the query shown? It's one statement!

Another requirement

A reader, September 24, 2018 - 1:45 pm UTC

What's the issue with the query shown? It's one statement! 

No issue it's perfect.
What I need in order to complete my requirements is to do same one statement reg exp but this time given the position (say 3 for from the beginning or -3 for from the end) to get the 3rd item .
For instance I expect get 123 if my string is *43*22*123*321* and my input is 3. I expect get 123 for same string if my input position desired is 2.

So how write one statement for such requirement?
Many thanks
Chris Saxon
September 24, 2018 - 3:50 pm UTC

So you want to find the string at the Nth position? And if N is negative, work backwards from the end?

If you pass a negative value for the third argument of instr, it'll work back from the end:

with val as (
select '*43*22*123*321*' s 
from   dual
)
select s, 
       instr ( s, '*', 1, 2 ) fwd, 
       instr ( s, '*', -1, 2 ) bck
from val;

S                 FWD   BCK   
*43*22*123*321*       4    11 

Not exactly

A reader, September 24, 2018 - 5:49 pm UTC

Hi Chris

You are so close to the requirement.

The needed string separated by the separator *
I want to get for instance 22 which is the third substr ing starting from the end.

I need the whole substr delimited by *
Chris Saxon
September 25, 2018 - 1:08 pm UTC

So why does instr not help? The string has a trailing *, so you can use this to find the position of the 3rd & 4th *s from the end:

And pass this into substr.

yes but

A reader, September 25, 2018 - 4:34 pm UTC

hi Chris

ys the instr is good approach.
but onec in get the position (bck) i have to do a substr till the instr of the net '*'

is there a way to do it in reg_exp ?
so if may string is *43*22*123*321
and need to get the 3rd from the end i expect a result of '22'
how do it in one sql ?
Chris Saxon
September 27, 2018 - 10:04 am UTC

I'm not sure what the problem is with instr. What exactly have you tried?

Regexp_substr doesn't allow you to work backwards.

Provide final solution

A reader, September 28, 2018 - 5:38 am UTC

Ah. Didn't know that about regular expression.
Then how could finally get my expected feedback?

Need one single statement to return the 3rd occ from the end. Using whatever instr or anything.

Select ....... ('*00*434*5555*22*67*456' , -3) from dual

Result = 22

Select ....... ('*00*474*5555*654*67*456' , -3) from dual

Result = 654
Chris Saxon
October 02, 2018 - 8:47 am UTC

If you substr the string, starting from the third * working backwards (found using instr), you get:

with str as (
  select '*00*434*5555*22*67*456' s from dual
)
Select substr( s, instr(s, '*', -1, 3) ) from str;

SUBSTR(S,INSTR(S,'*',-1,3))   
*22*67*456         


From here it's just a matter of finding the length to include (hint: it's the difference between the 2nd & 3rd last *s).

How about a regex only solution?

cd, October 02, 2018 - 9:15 am UTC

WITH t AS (SELECT '*00*434*5555*22*67*456' str
             FROM dual
            UNION ALL
           SELECT '*00*474*5555*654*67*456' str
             FROM dual
          )
SELECT t.str
     , regexp_substr(t.str, '\*[^*]+', 1, regexp_count(t.str, '\*[^*]+') + 1 - 3) ret_val
  FROM t
;

STR                     RET_VAL                
----------------------- -----------------------
*00*434*5555*22*67*456  *22                    
*00*474*5555*654*67*456 *654       

to cd

A reader, October 02, 2018 - 4:10 pm UTC

great
but workaround if my str if small

ie. if str = '*555*111*121' should return 555
if str = '*555*111'should return null
curently it fail

Follow up

A reader, October 04, 2018 - 5:15 am UTC

Any workarround pls on the last post?
The reg exp should not fail if the third pos do not exist but should return null
Chris Saxon
October 04, 2018 - 11:23 am UTC

What exactly have you tried?

Answer to "reader"s last question

Stew Ashton, October 04, 2018 - 11:36 am UTC

WITH t AS (
  SELECT '*00*434*5555*22*67*456' str
  FROM dual
  UNION ALL
  SELECT '*00*434*5555*22*67' str
  FROM dual
  UNION ALLs
  SELECT '*00*434*5555*22' str
  FROM dual
  UNION ALL
  SELECT '*00*434*5555' str
  FROM dual
  UNION ALL
  SELECT '*00*474' str
  FROM dual
)
SELECT t.str,
  regexp_substr(
    t.str, 
    '\*[^*]+', 
    1,
    case when regexp_count(t.str, '\*') > 3 then regexp_count(t.str, '\*') -3 end
) ret_val
FROM t
;

STR                      RET_VAL   
*00*434*5555*22*67*456   *5555     
*00*434*5555*22*67       *434      
*00*434*5555*22          *00       
*00*434*5555                       
*00*474 

Best regards,
Stew Ashton
Chris Saxon
October 05, 2018 - 9:15 am UTC

Thanks Stew

Oops!

Stew Ashton, October 04, 2018 - 11:38 am UTC

There is a
UNION ALLs
in the above review. Sorry!

Check for no. of substrings

cd, October 04, 2018 - 2:44 pm UTC

Another reader already wrote a check condition, here's my updated version:

WITH t AS (SELECT '*00*434*5555*22*67*456' str
             FROM dual
            UNION ALL
           SELECT '*00*474*5555*654*67*456' str
             FROM dual
            UNION ALL
           SELECT '*555*111*121' str
             FROM dual
            UNION ALL
           SELECT '*555*111' str
             FROM dual
          )
SELECT t.str
     , CASE WHEN regexp_count(t.str, '\*') >= 3 THEN regexp_substr(t.str, '\*[^*]+', 1, regexp_count(t.str, '\*[^*]*') + 1 - 3) END ret_val
  FROM t
;

STR                     RET_VAL                
----------------------- -----------------------
*00*434*5555*22*67*456  *22                    
*00*474*5555*654*67*456 *654                   
*555*111*121            *555                   
*555*111                                       

Why use regexp?

AndyP, October 05, 2018 - 9:33 am UTC

Why use regexp, which needs special treatment for the short input strings case and, at least in the suggested solutions so far, leaves a leading delimiter in the output, when the "standard" instr/substr approach gets us the required result?

with data as
(
  select '*00*434*5555*22*67*456' str from dual union all
  select '*00*434*5555*22*67' from dual union all
  select '*00*434*5555*22' from dual union all
  select '*00*434*5555' from dual union all
  select '*00*474' from dual
)
select str
      ,instr(str,'*',-1,3) bck3
      ,instr(str,'*',-1,2) bck2
      ,instr(str,'*',-1,2)-instr(str,'*',-1,3)-1 bck3strlen
      ,substr(str,instr(str,'*',-1,3)+1,instr(str,'*',-1,2)-instr(str,'*',-1,3)-1) bck3str
from data;

STR                          BCK3       BCK2 BCK3STRLEN BCK3STR
---------------------- ---------- ---------- ---------- ----------------------
*00*434*5555*22*67*456         13         16          2 22
*00*434*5555*22*67              8         13          4 5555
*00*434*5555*22                 4          8          3 434
*00*434*5555                    1          4          2 00
*00*474                         0          1          0