• Questions
  • Unexpected result with Regular Expression

Breadcrumb

Announcement

Forty years

Connor and Chris don't just spend all day on AskTOM. You can also catch regular content via Connor's blog and Chris's blog. Or if video is more your thing, check out Connor's latest video and Chris's latest video from their Youtube channels. And of course, keep up to date with AskTOM via the official twitter account.

Question and Answer

Chris Saxon

Thanks for the question, Nilesh.

Asked: June 07, 2019 - 5:41 pm UTC

Answered by: Chris Saxon - Last updated: June 13, 2019 - 10:05 am UTC

Category: SQL - Version: 12.2

Viewed 100+ times

Whilst you are here, check out some content from the AskTom team: Existence checks

You Asked

Hey Tom,
we have a case where
1. If first character is - (hypen) or space, then strip it out and give o/p
2. If their is an space before or after hypen, strip it out and give o/p
3. If their is an space before or after apostrophe, strip it out and give o/p

So, I decided to go with regullar expression, and my query is like this

create table regex_matching (
sno number(10) not null constraint regex_pk primary key,
c_pattern varchar2(50) not null);


insert into regex_matching values (1, '- My name');
insert into regex_matching values (2, ' My name');
insert into regex_matching values (3, 'My - name ');
insert into regex_matching values (4, 'Why ‘my’ name');
insert into regex_matching values (5, 'Why ‘ my ’ name');
insert into regex_matching values (6, 'Why ‘my ’ name');
insert into regex_matching values (7, 'Why my name');


select case
 when REGEXP_COUNT(c_pattern, '^[-| ]{1,}',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, '^[-| ]{1,}', '')
 when REGEXP_COUNT(c_pattern, ' ?- ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?- ?', '-')
 when REGEXP_COUNT(c_pattern, ' ?‘ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?‘ ?', '‘')
 when REGEXP_COUNT(c_pattern, ' ?’ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?’ ?', '’')
 else c_pattern
 end as "regexp_value"
from regex_matching where sno=1;


regexp_value
--------------------------------------------------------------------------------
My name



With this query, I could achive desired results set for sno=1, 2 & 3. But with this query, for sno 4, 5 & 6 the o/p is

select case
when REGEXP_COUNT(c_pattern, '^[-| ]{1,}',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, '^[-| ]{1,}', '')
when REGEXP_COUNT(c_pattern, ' ?- ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?- ?', '-')
when REGEXP_COUNT(c_pattern, ' ?‘ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?‘ ?', '‘')
when REGEXP_COUNT(c_pattern, ' ?’ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?’ ?', '’')
else c_pattern
end as "regexp_value"
from regex_matching where sno=6

SQL> /

regexp_value
--------------------------------------------------------------------------------
Why???my???name


even more suprising is, when line no. 4 is modified (the regexp_replace expression), the result changes totally. I understand that pattern matched is deleted, but what about the closing apostrophy
select case
 when REGEXP_COUNT(c_pattern, '^[-| ]{1,}',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, '^[-| ]{1,}', '')
 when REGEXP_COUNT(c_pattern, ' ?- ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?- ?', '-')
 when REGEXP_COUNT(c_pattern, ' ?‘ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?‘ ?', '')
 when REGEXP_COUNT(c_pattern, ' ?’ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?’ ?', '’')
 else c_pattern
 end as "regexp_value"
from regex_matching where sno=4;


regexp_value
--------------------------------------------------------------------------------
Whymyname


where as their is no effect on o/p if line no 5 (the regexp_replace expression) is modified
select case
 when REGEXP_COUNT(c_pattern, '^[-| ]{1,}',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, '^[-| ]{1,}', '')
 when REGEXP_COUNT(c_pattern, ' ?- ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?- ?', '-')
 when REGEXP_COUNT(c_pattern, ' ?‘ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?‘ ?', '‘')
 when REGEXP_COUNT(c_pattern, ' ?’ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?’ ?', '')
 else c_pattern
 end as "regexp_value"
from regex_matching where sno=4;


regexp_value
--------------------------------------------------------------------------------
Why???my???name


So, my questions are
1. Is their any other better options to achieve this result (I mean other then regullar expression)
2. why opening apostrophe is replaced by question marks
3. why closing apostrophe has no effect on the o/p

I am using 12.2 version

with LiveSQL Test Case:

and we said...

There's a general issue in that the case expression will only process one of the when clauses. So if you have opening and closing quotes in the input, only one of these will get replaced.

You could do this by nesting replaces inside each other. e.g.

replace ( replace ( c_pattern, ' -', '-' ), '- ', '-' )


But that gets messy fast!

An alternative is a regex that searches for any of your desired characters with optional spaces. e.g.:

\s{0,1}([-‘’])\s{0,1}


Then use a backreference (\1) to return the matched character:

select c_pattern,
       regexp_replace (
         c_pattern,
         '\s{0,1}([-‘’])\s{0,1}', '\1'
       ) strip_spaces,
       case
         when REGEXP_COUNT(c_pattern, '^[-| ]{1,}',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, '^[-| ]{1,}', '')
         when REGEXP_COUNT(c_pattern, ' ?- ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?- ?', '-')
         when REGEXP_COUNT(c_pattern, ' ?‘ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?‘ ?', '')
         when REGEXP_COUNT(c_pattern, ' ?’ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?’ ?', '’')
         else c_pattern
       end as "regexp_value"
from   regex_matching;

C_PATTERN          STRIP_SPACES    regexp_value   
- My name          -My name        My name         
 My name            My name        My name         
My - name          My-name         My-name         
Why ‘my’ name      Why‘my’name     Whymy’ name     
Why ‘ my ’ name    Why‘my’name     Whymy ’ name    
Why ‘my ’ name     Why‘my’name     Whymy ’ name    
Why my name        Why my name     Why my name   


To strip the leading space or hyphen, you could call the result of this in another regex. Or if you're feeling brave try to do it all in one!

I'm not sure why you're getting the quotes appearing quotation marks or replacing them all. But it's almost certainly a character set conversion issue. Check how these are stored using dump or ascii to get the character code. And ensure your client can display this character!

and you rated our response

  (1 rating)

Is this answer out of date? If it is, please let us know via a Review

Reviews

character set conversion issue

June 12, 2019 - 5:10 pm UTC

Reviewer: Nilesh Kumar

Thanks for the reply Chris Saxon
Your explanation are indeed right on point, and here is the O/P from the query you provided

SQL> select c_pattern, regexp_replace (c_pattern,'\s{0,1}([-‘’])\s{0,1}', '\1') strip_spaces,
       case
         when REGEXP_COUNT(c_pattern, '^[-| ]{1,}',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, '^[-| ]{1,}', '')
         when REGEXP_COUNT(c_pattern, ' ?- ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?- ?', '-')
         when REGEXP_COUNT(c_pattern, ' ?‘ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?‘ ?', '‘')
         when REGEXP_COUNT(c_pattern, ' ?’ ?',1, 'i')>=1 then REGEXP_REPLACE(c_pattern, ' ?’ ?', '’')
         else c_pattern
       end as "regexp_value"
from   regex_matching;

C_PATTERN           STRIP_SPACES        regexp_value
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------
- My name           -My name         My name
 My name            My name         My name
My - name           My-name         My-name
Why ???my??? name          Why???my???name        Why???my???name
Why ??? my ??? name          Why???my???name        Why???my???name
Why ???my ??? name          Why???my???name        Why???my???name
Why my name           Why my name        Why my name

7 rows selected.


Column strip_spaces still has space, which are replaced in regexp_values column.
Also, as can be seen the O/P of c_pattern column itself is showing as quotation marks. The character set in Database is "AL32UTF8"

Also, I couldn't get your last line "Check how these are stored using dump or ascii to get the character code." can you please explain this.
Chris Saxon

Followup  

June 13, 2019 - 10:05 am UTC

Either call the dump or ascii functions to see their raw values, e.g.

select dump ( c_pattern ) from regex_matching


Ascii process one character at a time, so you need to select the appropriate character.