Skip to Main Content

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question.

Asked: February 12, 2013 - 4:15 pm UTC

Last updated: April 09, 2020 - 1:21 am UTC

Version: 11.1.7

Viewed 10K+ times! This question is

You Asked

Hi Tom,
I need to replace a word from a String with null from a text field with the following rules.


I am trying to frame the rule:
So the rule is replace the word TEST with null but if the word TEST is towards the end of the string along with a hyphen - then even replace the - with null and also REMOVE any trailing space at the end.

There should not be more than one space between the words in the string


Examples :

1) -- QUOTE Is not part of string
TEXT_INPUT :
'THIS IS - TEST FILE'

TEXT_OUTPUT:
'THIS IS - FILE'

Note two spaces have been replaced by single space after we replace TEST with null -- intermediate result
'THIS IS -  FILE'
, final result
'THIS IS - FILE'


2)
TEXT_INPUT : 'THIS IS - TEST'
TEXT_OUTPUT: 'THIS IS'

3)
TEXT_INPUT : 'THIS IS TEST'
TEXT_OUTPUT: 'THIS IS'

4)
TEXT_INPUT :
'THIS IS - TEST'

TEXT_OUTPUT:
'THIS IS'


5)
TEXT_INPUT :
'THIS IS FILE1 TEST FILE2 - TEST'

TEXT_OUTPUT:
'THIS IS FILE1 FILE2'


and not :
'THIS IS FILE1  FILE2 - TEST' 
-- has two spaces , need one between FILE1 AND FILE2


How to do this using REGEXP ?

How to do this using simple replace , substr , trim ?

Thanks

Regarding

what about a string:

'This is -'

what should the output be in that case?

The Output should be :
'This is'



and Tom said...

ops$tkyte%ORA11GR2> select  '"' || x || '"',
  2          '"' ||
  3           replace( replace( replace(
  4            rtrim(
  5             rtrim(
  6               rtrim(
  7                 replace( x, 'TEST', '' ),
  8                 ' ' ),
  9             '-' ),
 10          ' ' ),
 11          ' ', ' @'), '@ ', ''), ' @', ' ')
 12          || '"'
 13    from  t
 14  /

'"'||X||'"'
------------------------------------------
'"'||REPLACE(REPLACE(REPLACE(RTRIM(RTRIM(RTRIM(REPLACE(X,'TEST',''),''),'-'),''
-------------------------------------------------------------------------------
"THIS IS - TEST FILE"
"THIS IS - FILE"

"THIS IS - TEST"
"THIS IS"

"THIS IS TEST"
"THIS IS"

"THIS IS - TEST"
"THIS IS"

"THIS IS FILE1 TEST FILE2 - TEST"
"THIS IS FILE1 FILE2"

"THIS IS FILE1  FILE2 - TEST"
"THIS IS FILE1 FILE2"


6 rows selected.

Rating

  (11 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

How to do the same using REGEXP

A reader, February 14, 2013 - 5:07 pm UTC

Hi Tom,
Thanks for your solution, is it possible to use
REGEXP for this requirement ?
Tom Kyte
February 14, 2013 - 5:32 pm UTC

probably, but the amount of cpu consumed by regexp makes me avoid them whenever possible

and it is possible to avoid regular expressions

so therefore I won't even begin to expend the brain cells to think about a regular expression...

regular expressions - very cool in vi...


Replace multiple spaces with a single space

A reader, February 14, 2013 - 6:09 pm UTC

Hi Tom,
1) What about if I want to replace multiple spaces ( the count I which can vary )with just one space.

How to do that ? Should we use REGEXP then ?

2) I see you are using @ a couple of times to replace space, But What about when we do not know the number of spaces

Hence the Question
Tom Kyte
February 15, 2013 - 9:21 am UTC

1) from the above code, we have:

replace( replace( replace(
  4            rtrim(
  5             rtrim(
  6               rtrim(
  7                 replace( x, 'TEST', '' ),
  8                 ' ' ),
  9             '-' ),
 10          ' ' ),
 11          ' ', ' @'), '@ ', ''), ' @', ' ')



the rtrim(rtrim(rtrim(replace()))) is getting rid of TEST, trailing spaces and trailing - so....

replace( replace( replace( X ), ' ', ' @'), '@ ', ''), ' @', ' ')

gets rid of multiple spaces (assuming @ is not a valid character in your string...


regexp is not necessary


2) try it out, you don't need to know....

ops$tkyte%ORA11GR2> select replace( replace( replace( X , ' ', ' @'), '@ ', ''), ' @', ' ')
  2    from (select 'hello                                                 world' x from dual )
  3  /

REPLACE(REP
-----------
hello world


@: A reader

Rajeshwaran, Jeyabal, February 15, 2013 - 7:10 am UTC

1) What about if I want to replace multiple spaces ( the count I which can vary )with just
one space.

How to do that ? Should we use REGEXP then ?

Try, running the Tom's query bits and pieces (from inwards to outward) and see how it works..once you an idea of that, working with your question is easy.

You still dont need an REGEXP to replace multiple spaces..you can use triple-replace Trick as provided below.
drop table t purge;
create table t(x varchar2(10));
insert into t values('a b');
insert into t values('a  b');
insert into t values('a   b');
insert into t values('a    b');
insert into t values('a     b');
commit;
rajesh@ORA10G> select x,
  2    replace(replace(replace(x,' ','@ '),' @',''),'@ ',' ') new_x,
  3    regexp_replace(x,'(\s)+','\1') regexp_x
  4  from t;

X          NEW_X                REGEXP_X
---------- -------------------- --------------------
a b        a b                  a b
a  b       a b                  a b
a   b      a b                  a b
a    b     a b                  a b
a     b    a b                  a b

Elapsed: 00:00:00.01
rajesh@ORA10G>

To know which one is best,(using regexp or triple-replace trick), stuff some high volume of data in 'T' and start running this query with sql_trace=true and use Tkprof to see comparison, and for sure Regexp will never scale up.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:73830657104020#2341120800346243823

So why more CPU?

Ojas N, August 12, 2013 - 3:56 pm UTC

Tow, we are facing a similar issue where a developer put multiple REGEXP_REPLACE functions in the SELECT clause which seemed to be killing our query (running close to 4 hours) and when replaced the query with TRANSLATE/REPLACE, the performance was much more acceptable (<10 mins). The question is, how has Oracle implemented REGEXP to make it so CPU intensive and causing slowdowns & why does it not show in the explain plan as a red flag?
Tom Kyte
August 12, 2013 - 4:27 pm UTC

Oracle hasn't implemented REGEXP to be so CPU intensive - it is that regular expressions are CPU intensive - regardless of who implements them.


You can write in a small bit of regexp something that is so computationally expensive it'll blow your mind. It is trivial to write expensive regexp.

there is nothing to red flag, the history of regular expressions is

a) fantastically powerful
b) computationally expensive


My rule is

1) if you can do it with substr, translate, replace, trim, etc - do it
2) use a regular expression only when you absolutely have to
3) take your regular expression and in your head imagine the code you would have to write to emulate it - that'll give you an idea of how expensive it is going to be.

POSIX Operator in regexp

Rajeshwaran Jeyabal, March 02, 2016 - 12:55 pm UTC

Team,

I read about POSIX regexp here
http://docs.oracle.com/database/121/ADFNS/adfns_regexp.htm#ADFNS1012

Dot(.) - is a meta character that matches any single character except the new line character
square braces [] -is a meta character that list of matching character set.

given this pattern "A.C" - that tell us between A and C need to have any single character.

the data ABBC doesn't match with the pattern so no data returned.

rajesh@ORA12C> select * from dual where regexp_like('ABBC','A.C');

no rows selected


but what does this pattern [A.C] denotes? why does the condition is True now? could you please us to understand?

rajesh@ORA12C> select * from dual where regexp_like('ABBC','[A.C]');

D
-
X

1 row selected.

rajesh@ORA12C>

Chris Saxon
March 03, 2016 - 2:38 am UTC

Square brackets denote a class of items to match.

So [A.C] says "match A or match C or match period"

Hence

SQL> select * from dual where regexp_like('ABBC','A.C');

no rows selected

SQL> select * from dual where regexp_like('ABBC','[A.C]');

D
-
X

SQL> select * from dual where regexp_like('ABBC','[Z.Y]');

no rows selected

SQL> select * from dual where regexp_like('ABBC','[A.C]');

D
-
X

SQL> select * from dual where regexp_like('ABBC','[AC]');

D
-
X

SQL> select * from dual where regexp_like('ABBC','[A]');

D
-
X

SQL> select * from dual where regexp_like('ABBC','[X]');

no rows selected

SQL> select * from dual where regexp_like('ABBC','[.]');

no rows selected

SQL> select * from dual where regexp_like('ABBC','[AD]');

D
-
X

SQL> select * from dual where regexp_like('ABBC','[AD]');

D
-
X


POSIX Operator in regexp

Rajeshwaran Jeyabal, March 03, 2016 - 4:03 am UTC

So [A.C] says "match A or match C or match period"

Thanks Connor. Often i missed out these kind of simple things.

When you say "match period" dot(.) is treated as literal not as meta-character right?
Connor McDonald
March 03, 2016 - 7:09 am UTC

Thats my understanding yes.

Glad we could help.

Pattern Help.

Rajeshwaran Jeyabal, August 31, 2017 - 8:07 am UTC

Team:

what does this pattern "[^[!-~]]*" represents ?

zero or more occurance that should not have either of this characters "!-~" ?

drop table t purge;
create table t(x varchar2(20));
insert into t values('!abc');
insert into t values('-abc');
insert into t values('~abc');
insert into t values('!-~abc');
insert into t values('abc!');
insert into t values('abc-');
insert into t values('abc~');
insert into t values('abc!-~');
insert into t values('abcdef');
insert into t values('1234');
commit;

demo@ORA11G> column x1 format a10
demo@ORA11G> select x, regexp_replace( x,'[^[!-~]]*','.' ) x1
  2  from t
  3  /

X                    X1
-------------------- ----------
!abc                 !abc
-abc                 -abc
~abc                 ~abc
!-~abc               !-~abc
abc!                 abc!
abc-                 abc-
abc~                 abc~
abc!-~               abc!-~
abcdef               abcdef
1234                 1234

10 rows selected.

Sergiusz Wolicki
September 01, 2017 - 4:21 am UTC

[^[!-~]]*

The pattern means "confuse me" :-D

More seriously, the pattern means: "any character that is not a '[' and does not sort between '!' and '~' (in the order decided by the NLS_SORT session parameter), followed by 0 or more occurrences of ']'. Square brackets do not simply nest in regular expressions. Note also that '!' is the first printable ASCII character (after space) and '~' is the last printable ASCII character. Therefore, if NLS_SORT=BINARY, the range [!-~] means all printable ASCII characters except space.

If you want a pattern that says "zero or more occurrences that should not have either of these three characters "!-~", then use
[^-!~]*


clarification with pattern

Rajeshwaran, Jeyabal, February 15, 2018 - 6:29 am UTC

Team,

I am able to understand this pattern "([^,]+),[^,;]*;" - look for any occurrence of AS MANY non-comma characters AS POSSIBLE,
followed by comma followed by as many non-comma, non-semi-colon characters as possible followed by semicolon.
Every such occurrence that is found is simply removed.

Question is this portion of the pattern [^,]+ why doesn't it match with this data AAA alone in the input string instead it matches with AAA,55 - please clarify

demo@ORA12C> select x1,regexp_replace( x1, '([^,]+),[^,;]*;') x2 ,
  2      regexp_replace( x1, '([^,]+),[^,;]*;','\1') x3
  3  from (
  4  select 'AAA,55,2;' x1 from dual
  5      )
  6  /

X1        X2                   X3
--------- -------------------- --------------------
AAA,55,2; AAA,                 AAA,55

demo@ORA12C>

Chris Saxon
February 15, 2018 - 11:42 am UTC

It doesn't match with AAA,55. It matches with 55 only.

Regexp_replace is showing what you didn't match. Regexp_substr shows what you did:

select x1,
       regexp_substr( x1, '([^,]+),[^,;]*;') x3
from (
  select 'AAA,55,2;' x1 from dual
);

X1          X3      
AAA,55,2;   55,2;  


AAA never matches your regular expression.

a bug with REGEXP_REPLACE with LOBs

fred, April 06, 2020 - 3:09 pm UTC

Simple test case, 1st with VARCHAR2, then one with LOB;

create table test_vc (col1 varchar2(30));
insert into test_vc values (chr(9));
select count(*) from test_vc  -- return 0 rows -- as expected
where regexp_replace(col1, '\s', '') is not null ;

create table test_clob (col1 clob);
insert into test_clob values (chr(9));
select count(*) from test_clob  -- return 1 rows -- BUT EXPECTING 0 ?
where regexp_replace(col1, '\s', '') is not null ;


Is it just me getting this weird result ?
Connor McDonald
April 09, 2020 - 1:21 am UTC

It is a subtle difference here when it comes to lobs

SQL> set serverout on
SQL> declare
  2    x clob;
  3  begin
  4    select regexp_replace(col1, '\s', '')
  5    into   x
  6    from   test_clob;
  7
  8    if x is null then
  9      dbms_output.put_line('NULL');
 10    end if;
 11
 12    if x = empty_clob() then
 13      dbms_output.put_line('EMPTY CLOB');
 14    end if;
 15  end;
 16  /
EMPTY CLOB

PL/SQL procedure successfully completed.


with a regular expression

GrantO403, February 24, 2021 - 7:49 am UTC

I've been away from Oracle for a little while but this is how I would expect it to work with regexp_replace

trim(   regexp_replace(<<your string>>, '(^|\s)+((TEST|test)(\s|-|$)+)*',' ') )


I think this would work for a case insensitive match of test
trim(   regexp_replace(<<your string>>, '(^|\s)+(TEST(\s|-|$)+)*',' ', 'i') ) 


This breaks down as
'(^|\s)+ -- starts with the beginning-of-string and/or spaces
(
(TEST|test)
--followed by upper or lower case TEST. I believe you can is the 'I' flag as a fourth arguement in oracle to make it case insensitive
(\s|-|$)+ --followed by any combination and number of spaces, dashes end-of-string
)* -- and zero or many instances of the pattern between this enclosing set of brackets which is the word TEST with spaces/dashes. this is so if you have TEST TEST TEST, all of these will be caught

The trim is there to catch leading and trailing spaces introduced by the replace where there are matches at the beginning or end of your string

with a regular expression - update...

GrantO403, February 24, 2021 - 8:17 am UTC

I had missed that you wanted it to catch leading dashes as well

trim( regexp_replace(<<your string>>, '(^|\s|-)+(TEST(\s|-|$)+)*',' ', 'i') )


so you could use this as
select 
    trim(regexp_replace(x, '(^|\s|-)+((test)(\s|-|$)+)*', ' ', 1, 0, 'i') ) output_val
from
    (
    select
        '       TEST     THIS IS A TEST - SENTENCE   WITHOUT test THE - TEST - WORD TEST T E S T TEST--' x
    FROM dual) a;

OUTPUT_VAL                                 |
-------------------------------------------|
THIS IS A SENTENCE WITHOUT THE WORD T E S T|


This should work for you.