substr/length will do?
Kirill Richine, February 01, 2007 - 11:21 pm UTC
Do you really need regexp for this?
update t set
output = case
when substr (input, 1,4) = '0612'
or (substr (input, 1,4)='0614' and substr (input,-1,3) = '312')
then substr (input, 5)
when substr (input, 1, 3) = '613' and length (input)=8
then '0612' || substr (input, 4)
else input
end;
with regexp_replace:
'^0612',''
'^0614(.*?312)$','\1'
'^613(.{5})$','0612\1'
Update is so powerful
Vivek Nema, February 02, 2007 - 3:15 am UTC
Oh thats perfect...You are great
I got chance to understand regular expression in 10G and I really like that feature.
My problem is solved with update statement.
Excellent answer
The merits of regular expressions ...
cd, February 02, 2007 - 8:57 am UTC
Regexp_replace Performace
Vikas, March 28, 2007 - 4:54 pm UTC
Hi tom,
Can this filter be made more performant, it takes 20 mins to scan 34 million rows of data. I think it is because of double regexp_replace.
Ist regexp_replace replaces all junk characters by space and the other regexp_replace does act on the strings formatted by the Ist regexp_replace to weed out more than one space between words.
regexp_replace(Regexp_replace(trim(lower(request_terms)),'[^a-z,_, ,A-Z,0-9,@,&,$,%,'']',' '),'[[:space:]]{1,}',' ')
Can they somehow be converted to one regexp_replace?
Thanks
March 30, 2007 - 11:44 am UTC
that is almost 30,000 executions per second. Not bad when you look at it that way.
how about you state as a requirement what you are trying to do in your function.
Business Requirements
vikas, March 30, 2007 - 2:41 pm UTC
Hi Tom,
Here are the business requirements:
1.Eliminate all characters from the String except a-z,A-Z,_,0-9,@,&,$,%
2. If there are more than one space between words replace them with a space.
3. Filter these strings having length < 256 char's
& having count(*) > 10
Select /*+ parallel(rd 16) */
regexp_replace(Regexp_replace(trim(lower(request_terms)),'[^a-z,_, ,A-Z,0-9,@,&,$,%,'']',' '),'[[:space:]]{1,}',' '),
Count(*),
Sum(Request_Count)
FROM Request_data rd
WHERE
Insert_date >= to_date('23/03/2007 00:00:00'.'dd/mm/yyyy hh24:mi:ss') AND Insert_date < to_date('23/03/2007 00:00:00'.'dd/mm/yyyy hh24:mi:ss') + 1
AND (Request_terms is NOT NULL AND Request_terms <> ' ')
AND Length(regexp_replace(Regexp_replace(trim(lower(request_terms)),'[^a-z,_, ,A-Z,0-9,@,&,$,%,'']',' '),'[[:space:]]{1,}',' ')) < 256
Group by regexp_replace(Regexp_replace(trim(lower(request_terms)),'[^a-z,_, ,A-Z,0-9,@,&,$,%,'']',' '),'[[:space:]]{1,}',' ')
Having Count(regexp_replace(Regexp_replace(trim(lower(request_terms)),'[^a-z,_, ,A-Z,0-9,@,&,$,%,'']',' '),'[[:space:]]{1,}',' ')) > 10;
When this query runs the CPU climbs ~ to 98%
18:38:13 up 6 days, 2:37, 4 users, load average: 22.28, 21.51, 18.44
287 processes: 262 sleeping, 25 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 97.9% 0.8% 1.1% 0.0% 0.1% 0.0% 0.0%
cpu00 93.7% 3.3% 2.5% 0.0% 0.4% 0.0% 0.0%
cpu01 98.7% 0.0% 1.2% 0.0% 0.0% 0.0% 0.0%
cpu02 99.5% 0.0% 0.4% 0.0% 0.0% 0.0% 0.0%
cpu03 99.5% 0.0% 0.4% 0.0% 0.0% 0.0% 0.0%
Mem: 16304524k av, 14338584k used, 1965940k free, 0k shrd, 186008k buff
March 30, 2007 - 4:42 pm UTC
well, parallel 8 is the most you ever want to do on a 4 cpu machine.
and only if you are the only user of course.
so, given that you have 4 processes running on each of the 4 cpu's - i would hope they would be 100% - anything less and we'd have a problem.
but we might find this less cpu intensive if we materialized the regexp result
Select
str,
Count(*),
Sum(Request_Count)
FROM (select object_id request_count, object_name request_terms, to_date('23/03/2007 00:00:00','dd/mm/yyyy hh24:mi:ss') insert_date ,
regexp_replace(Regexp_replace(trim(lower(object_name)),'[^a-z,_, ,A-Z,0-9,@,&,$,%,'']',' '),'[[:space:]]{1,}',' ') str
from all_objects ) rd
WHERE Insert_date >= to_date('23/03/2007 00:00:00','dd/mm/yyyy hh24:mi:ss') AND Insert_date < to_date('23/03/2007 00:00:00','dd/mm/yyyy hh24:mi:ss') + 1
AND Length(str) < 256 Group by str Having Count(str) > 10
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5 11.98 11.71 0 65372 0 46
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 11.98 11.71 0 65372 0 46
--------------------------------------------------------------------------------
Select
str,
Count(*),
Sum(Request_Count)
FROM (select ROWNUM r, object_id request_count, object_name request_terms, to_date('23/03/2007 00:00:00','dd/mm/yyyy hh24:mi:ss') insert_date ,
regexp_replace(Regexp_replace(trim(lower(object_name)),'[^a-z,_, ,A-Z,0-9,@,&,$,%,'']',' '),'[[:space:]]{1,}',' ') str
from all_objects ) rd
WHERE
Insert_date >= to_date('23/03/2007 00:00:00','dd/mm/yyyy hh24:mi:ss') AND Insert_date < to_date('23/03/2007 00:00:00','dd/mm/yyyy hh24:mi:ss') + 1
AND Length(str) < 256
Group by str
Having Count(str) > 10
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.05 0.06 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5 4.98 4.87 0 65372 0 46
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 5.04 4.93 0 65372 0 46
Why use 2 regexp functions ...
cd, April 02, 2007 - 8:45 am UTC
... if the rules fit into 1? According to the given business rules (and the additional trim/lower function), maybe this might improve performance (and readability)?
WITH t AS (SELECT 'axzvrA1_ 0123@&$% ' col1
FROM dual
)
SELECT t.col1
, LOWER(TRIM(REGEXP_REPLACE(t.col1, '([a-zA-Z0-9_@&$%])|( )+|.', '\1\2'))) new_col1
FROM t
;
C.
Using regexp_replace to replace N patterns
Naresh, November 17, 2007 - 7:07 pm UTC
hi Tom,
I have to replace repeating patterns in a string as below:
'remain1=1;remain2=2;' should give an output as
<remain1><value=1></remain1> <remain2><value=2></remain2>
Is this possible to do with a single regexp_replace command?
I tried below, but not getting the result I am looking for:
SQL> select regexp_replace ('remain1=1;remain2=2;', '(^([a-z0-9]*)=([^;]*);)?{1,}',
2 '<\2> <value = \3> </\2>')
3 from dual
4 /
REGEXP_REPLACE('REMAIN1=1;REMAIN2=2;','(^([A-Z0-9]*)=([^;]*);)?{1,}','<\2><VALUE
--------------------------------------------------------------------------------
<remain1> <value = 1> </remain1><> <value = > </>r<> <value = > </>e<> <value =
> </>m<> <value = > </>a<> <value = > </>i<> <value = > </>n<> <value = > </>2<>
<value = > </>=<> <value = > </>2<> <value = > </>;<> <value = > </>
Thanks,
Naresh
Tony Reed, February 19, 2008 - 5:23 pm UTC
Hi Naresh,
try this one:
select regexp_replace('remain1=1;remain2=2;'
,'([a-z0-9]*)=([0-9]*);'
,'<\1><value=\2></\1>'
)
from dual;
ps. Regexbuddy is another good application
http://www.regexbuddy.com/ btw. Tom, let me know when you're accepting new questions. I'm having performance issues with large strings ;-)
Horrendous Performance For Nested REGEXP_REPLACE
A reader, September 24, 2009 - 5:23 pm UTC
update abc set comments = regexp_replace(regexp_replace(regexp_replace(regexp_replace(
regexp_replace(regexp_replace(regexp_replace(regexp_replace(
regexp_replace(regexp_replace(regexp_replace(regexp_replace(comments, '^@', ''),
'[^:alphanum:]{0,}[^:space:]{0,}ALL( ){0,}(AFFECTED)( ){0,}(PEOPLE)( ){0,}:( ){0,}', 'people:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}PEOPLE( ){0,}(CHANGED)*( ){0,}:( ){0,}', 'people:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}LAB( ){0,}(CODES)*( ){0,}DESCRIPTION( ){0,}:{0,}( ){0,}', 'code:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}(LAB)*( ){0,}CODES( ){0,}DESCRIPTION( ){0,}:{0,}( ){0,}', 'code:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}(LAB)*( ){0,}RESOLUTION( ){0,}DESCRIPTION( ){0,}:{0,}( ){0,}', 'resolution:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}BACK( ){0,}(BROKEN)*( ){0,}', 'back:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}HIP( ){0,}PAIN( ){0,}', 'back:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}NEW( ){0,}THERAPY( ){0,}(REQUIRED)*( ){0,}:{0,}( ){0,}', 'therapy:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}CATALOGUE( ){0,}(INFORMATION)*( ){0,}:{0,}( ){0,}', 'catalogue:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}COMPENSATION( ){0,}:{0,}( ){0,}', 'comp:', 1, 1, 'i'),
'[^:alphanum:]{0,}[^:space:]{0,}PERSONAL( ){0,}NOTES( ){0,}:{0,}( ){0,}', 'notes:', 1, 1, 'i');
The 'update' does exactly what I intended it to do. However the performance is horrendous.
Takes hours to update 30K rows.
How can I improve the performance ?
September 29, 2009 - 8:16 am UTC
how about you put into words, like as in "a specification", what you want to do.
Don't expect us to reverse engineer a big nested regular expression and be able to rewrite it - we are not compilers - give us a SPECIFICATION.
Posting with required details
A reader, October 05, 2009 - 3:24 pm UTC
What I have is free text with some keywords in it. These alphabetical keywords always
appear first next to an optional non-alphanumeric character like "@" or "{" or "}.
The following rows can be fetched from a table which has a patient_prob_id as an fk.
eg.
select line_text from pat_records where parent_prob_id = 100 order by lineno
will give me the following ordered records -
@ free text that i shud ignore.
@ free text that i shud ignore.
@ parser is about to start scanning from next line since it sees a keyword
@ ALL AFFECTED PEOPLE: all affected patients displaying this symptom should be
@ immediately taken to ER. parser is now in action
@ blah ...
@ blah blah
@ LAB CODES DESCRIPTION: LC900, LC1000
@ LC1001
@ BACK BROKEN : Yes
@ COMPENSATION: $100,000
@ *****
@ I saw ***** above which mean's i row 11 was the last row for the parser.
I need to replace text with a consistent token which I parse later using parser written in pl/sql. Think of the regexp_replace
as a tokenizer.
Tokens are -
Token 1 : "ALL AFFECTED PEOPLE:" / "AFFECTED PEOPLE :" / "PEOPLE CHANGED :" -> replace with "people:"
Token 2 : "LAB CODES DESCRIPTION" / "LAB DESCRIPTION" / "CODES DESCRIPTION" -> replace with "code:"
Token 3 : "LAB RESOLUTION DESCRIPTION" / "RESOLUTION DESCRIPTION" / "LAB DESCRIPTION" -> replace with "resolution:"
Token 4 : "BACK" / "BACK BROKEN" -> replace with "back:"
Token 5 : "NEW THERAPY REQUIRED" / "NEW THERAPY" -> replace with "therapy:"
Token 6 : "CATALOGUE INFORMATION" / "CATALOGUE" -> replace with "catalogue:"
Token 7 : "COMPENSATION:" -> replace with "comp:"
Token 8 : "PERSONAL NOTES" -> replace with "notes:"
The quesry should return me rows as follows -
@ free text that i shud ignore.
@ free text that i shud ignore.
@ parser is about to start scanning from next line since it sees a keyword
poeple: all affected patients displaying this symptom should be
@ immediately taken to ER. parser is now in action
@ blah ...
@ blah blah
code: LC900, LC1000
@ LC1001
back: Yes
comp: $100,000
@ *****
@ I saw ***** above which mean's i row 11 was the last row for the parser.
Obviously I've modified the token names for confidential reasons. But rest assured that the replaced tokens ("people:", "code:" etc) will not appear anywhere in the free text.
October 08, 2009 - 6:52 am UTC
just use replace, nothing here called for the overhead and expense of regular expressions.
I'd demonstrate, but there is no example to work with :( No create, no inserts (now - how hard would that have been for you?)
I have no idea what you mean by "parser starts" and "*****" parser ends as you seem to have a very very very simple problem here and nothing remotely related to "parsing" even seems to come into play?
A reader, October 15, 2009 - 5:47 am UTC
Hello Tom,
I want to remove the characters before ":".
Ex: si:busm1-194e12.8:BC085418.1^si:busm1-194e12.8:BC165072.1^
The delimiter for the above word is '^' and i want the result should be BC085418.1^BC165072.1^.
I want to user regexp_replace to do this.
Thanks in Advance.
regexp_replace's parameter "occurance"
brucehuang, November 29, 2009 - 8:27 pm UTC
Hi Tom:
select regexp_replace('1234567890', '(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)', '\1')
from dual;
-------
return 1
-------
how to return 0? because the 0 is 10th character.
November 30, 2009 - 1:45 am UTC
explain what you are trying to accomplish, not what you have tried. so - zero is the tenth character, if you wanted the 10th character you would use substr of course
substr(str,10,1)
but, you probably don't want the 10th character - tell us what you want (and please don't say zero - give us the specification)
regexp_replace
A reader, January 11, 2010 - 6:13 pm UTC
select *
from
(
select ' abc: def ' from dual
union
select ' nnn abc ? xyz ' from dual
union
select ' ? xyz abc def ' from dual
)
expected output -
'L1:def '
' nnn abc ? xyz '
'L3:abc def '
i.e.
1. remove leading and trailing spaces if the first word on the line is 'abc:' and replace with 'L1:'
2. remove leading and trailing spaces if the first two words on the line are '? xyz:' and replace with 'L2:'
3. all the other rows: leave them unchanged
How do I achieve that using regexp_replace/replace? How will it perform for a table with a million rows to be scanned for the above patterns?
January 18, 2010 - 12:30 pm UTC
regular expressions are not my forte' but...
ops$tkyte%ORA11GR2> select regexp_replace( regexp_replace( d, '^[ ]*abc:', 'L1:' ), '^[ ]*[?] xyz', 'L2:' )
2 from
3 (
4 select ' abc: def ' d from dual
5 union
6 select ' nnn abc ? xyz ' from dual
7 union
8 select ' ? xyz abc def ' from dual
9 )
10 /
REGEXP_REPLACE(REGEXP_REPLACE(D,'^[]*ABC:','L1:'),'^[]*[?]XYZ','L2:')
-------------------------------------------------------------------------------
L1: def
L2: abc def
nnn abc ? xyz
as for performance, regexp has a high CPU cost associated with it. you could add
"where str like '%abc:%' or str like '%? xyz%'"
to limit your search to just possible candidates.
Can we replace this
Snehasish Das, January 13, 2011 - 11:31 am UTC
Hi Tom,
I have a requirement like as below
The column card_reader is corrupted and instead of having numbers like 12343 it has 112222333433
can u suppress the continuous digits to a single one using regular expression
Regards,
Snehasish Das
January 13, 2011 - 1:55 pm UTC
I asked Tyler Muth - my reference for all things 'regexp' and he said:
select regexp_replace('11122233333334333222','(.)\1+','\1') from dual;
might be what you are looking for - not 100% tested for all cases - but it seems logical
But in real life ...
AndyP, January 14, 2011 - 3:28 am UTC
Probably can't be done, because surely there exist card numbers where digits are repeated perfectly legally
REGEXP_REPLACE
A reader, January 18, 2011 - 8:53 am UTC
Hi Tom,
I have a string containing one, more ore no placeholders and I want to replace those placeholders with the items from a list in another string. I cannot use dynamic sql. I already created a function that does what I want and also checks if the number of placeholders in the first string is equal to the number of items in list in the second string but I think this could be done easier using REGEXP_REPLACE. Can you please help? Oracle version 11.2.0.1.0.
Business cases:
Example 1 (no placeholders):
String 1: 'Script was applied on database'
String 2: ''
Desired result: 'Script was applied on database'
Example 1 (one placeholder):
String 1: 'Script %1 was applied on database'
String 2: 'qwe'
Desired result: 'Script qwe was applied on database'
Example 1 (two placeholders):
String 1: 'Script %1 was applied on database %2'
String 2: 'qwe,asd'
Desired result: 'Script qwe was applied on database asd'
The placeholders can be named anyhow you want (ex: name all of them as % instead of %1 %2 and so on).
Actual code I wrote (I want to use this for displaying error messages):
create or replace package EXCEPTIONS is
/* builds the exception message using a list of arguments */
function error_text (pi_error_msg in varchar2, pi_string in varchar2) return varchar2;
/* raised if the script was already applied */
e_ScriptAppliedException exception;
/* error code for e_ScriptAppliedException */
e_ScriptAppliedException_code CONSTANT INTEGER := -20002;
/* error message for e_ScriptAppliedException */
e_ScriptAppliedException_msg CONSTANT VARCHAR2(1000) :=
'Script %1 was already applied on database %2.';
end EXCEPTIONS;
/
create or replace package body EXCEPTIONS is
function error_text (pi_error_msg in varchar2, pi_string in varchar2) return varchar2 is
l_percents integer;
l_commas integer;
l_error_msg varchar2(1000);
l_value varchar2(1000);
l_percent varchar2(1) := '%';
l_separator varchar2(1) := chr(10);
l_idx pls_integer := 1;
begin
l_error_msg := pi_error_msg;
l_percents := regexp_count(pi_error_msg,l_percent);
l_commas := regexp_count(pi_string,l_separator);
if l_percents = l_commas + 1 then
if l_commas >= 0 then
for i in 1..l_commas + 1 loop
l_value := regexp_substr(pi_string, '[^' || l_separator || ']+', 1, l_idx);
l_error_msg := replace(l_error_msg,'%'||l_idx,l_value);
l_idx := l_idx + 1;
end loop;
end if;
end if;
return l_error_msg;
end error_text;
end EXCEPTIONS;
/
begin
raise exceptions.e_ScriptAppliedException;
EXCEPTION
WHEN exceptions.e_ScriptAppliedException THEN
RAISE_APPLICATION_ERROR(EXCEPTIONS.e_ScriptAppliedException_code,
EXCEPTIONS.error_text(EXCEPTIONS.e_ScriptAppliedException_msg,'current_script'||chr(10)||'current_db'));
end;
/
regexp_replace
Sumanth, August 03, 2011 - 2:14 am UTC
I have a requirement like to replace values in one string with values in another string
like this
source string extract from {1} of {2} it is well and good
replace string replace with 4~5 characters (separated with '~')
here there can be {1} {2}.....n in first string and replacing characters will be 4~5....n in second string.
so final returning string should be extract from 4 of 5 it is well and good
could you please help with query using regular expression
Regexp_Replace
Rajeshwaran, Jeyabal, April 30, 2012 - 9:56 am UTC
Tom:
I was reading about Regexp_replace from docs.
http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions149.htm#i1305521 <quote>
'm' treats the source string as multiple lines. Oracle interprets the caret (^) and dollar sign ($) as the start and end, respectively, of any line anywhere in the source string, rather than only at the start or end of the entire source string. If you omit this parameter, then Oracle treats the source string as a single line.
</quote>
I gave the match_param as 'm' in the below query to treat input string as multiple lines. Why not the first occurance of 'T' in EACH LINE is replaced with '$'? is that my understand with docs explanation is wrong?
rajesh@ORA11GR2> with datas as
2 (
3 select 'This is line1'||chr(10)||'This is line2' as txt
4 from dual
5 )
6 select txt,
7 regexp_replace(txt,'T','$',1,1,'m') val1,
8 regexp_replace(txt,'^T','$',1,1,'m') va2
9 from datas
10 /
TXT VAL1 VA2
--------------- --------------- ---------------------------
This is line1 $his is line1 $his is line1
This is line2 This is line2 This is line2
Elapsed: 00:00:00.00
rajesh@ORA11GR2>
rajesh@ORA11GR2>
Regexp_Replace
Rajeshwaran, Jeyabal, April 30, 2012 - 12:32 pm UTC
Sorry Tom, I think i got it.
regexp_replace(txt,'T','$',1,1,'m') val1, I framed regexp saying starting from the First character replace the first occurance.
Just replaced 1 with 2 and got the correct result. sorry its just a mis-understanding from my end.
rajesh@ORA11GR2> with datas as
2 (
3 select 'This is line1'||chr(10)||'This is line2' as txt
4 from dual
5 )
6 select txt,
7 regexp_replace(txt,'T','$',1,2,'m') val1,
8 regexp_replace(txt,'^T','$',1,2,'m') va2
9 from datas
10 /
TXT VAL1 VA2
--------------------------- --------------------------- -----------------------
This is line1 This is line1 This is line1
This is line2 $his is line2 $his is line2
Elapsed: 00:00:00.01
rajesh@ORA11GR2>
Regexp_replace
Rajeshwaran, Jeyabal, May 01, 2012 - 1:51 am UTC
Tom:
I gave the match_param as 'm' in the below query to treat input string as multiple lines. Why not the first occurance of 'T' in EACH LINE is replaced with '$'? is that my understand with docs explanation is wrong?It becomes very very crystal clear now with this quote from docs.
http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions149.htm#i1305521 <quote>
occurrence is a nonnegative integer indicating the occurrence of the replace operation:
If you specify 0, then Oracle replaces all occurrences of the match.If you specify a positive integer n, then Oracle replaces the nth occurrence.
</quote>
Finally i got it Thanks!
rajesh@ORA10GR2>
rajesh@ORA10GR2> with datas as
2 (select 'Text1'||chr(10)||'Text2' txt from dual)
3 select regexp_replace(txt,'^T','$',1,0,'m')
4 from datas;
REGEXP_REPL
-----------
$ext1
$ext2
Elapsed: 00:00:00.11
rajesh@ORA10GR2>
rajesh@ORA10GR2>
Regexp_replace for multibyte character
A reader, April 30, 2013 - 6:53 am UTC
Hi Tom,
I was using regexp_replace for multibyte character string .
The organization_name(from japan) is multibyte character set
which is present over 2 tables(a and b ).
My 1st query giving me the result but 2nd does not .
1)upper(a.organization_name) - upper(b.organization_name)
2)upper(REGEXP_REPLACE(a.organization_name, '[^A-Za-z0-9]', '')) = upper(REGEXP_REPLACE(b.organization_name, '[^A-Za-z0-9]', ''))
Could you please help me out in this regard.
--Asit
Obsolete link
Gus Spier, July 07, 2017 - 9:45 pm UTC
July 08, 2017 - 11:24 am UTC