Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Subhobrata.

Asked: November 26, 2015 - 6:57 am UTC

Last updated: August 01, 2017 - 3:49 am UTC

Version: 11g

Viewed 1000+ times

You Asked

Hi Tom,


I was trying to understang the regular expression from the below link

https://community.oracle.com/message/13402428#13402428


WITH sample_data AS
(
SELECT 'C,D,E' AS str, '[A-Z]?' AS pattern FROM dual
)
SELECT LEVEL
, REGEXP_SUBSTR (str, pattern, 1, LEVEL) AS greedy
, REGEXP_INSTR (str, pattern, 1, LEVEL) AS greedy_pos
, REGEXP_SUBSTR (str, pattern || '?', 1, LEVEL) AS non_greedy
, REGEXP_INSTR (str, pattern || '?', 1, LEVEL) AS non_greedy_pos
FROM sample_data
CONNECT BY LEVEL <= 7;


NON_
GREEDY_ NON_ GREEDY_
LEVEL GREEDY POS GREEDY POS
----- ------ ------- ------ -------
1 C 1 1
2 2 2
3 D 3 3
4 4 4
5 E 5 5
6 6 6
7 0 0



In this regard, one user replied like

In a technical sense i am not sure that it is totally accurate to say the regexp machine first matches the C (and then takes one step back and returns null in the case of non-greediness (laziness)).

I also have the same doubt how it matches C in level 1 , then return null in level 2

Can you please explain this issue in more detail

Thanks,






and Connor said...

I think this is a better description (also taken from the forum):


"If it's greedy, REGEXP_SUBSTR will return as much as possible (that is 1 character), starting at position 1.
If it's non-greedy, REGEXP_SUBSTR will return as little as possible (that is, 0 characters), starting at position 1."

So its not like the non-greedy case went "backwards", it just went "forwards as little as possbile" (in this case, 0 characters)

For your second query, this also from the forum:

The pattern [A-Z]? is saying "a character in the set A-Z with optional occurrence" i.e. this is really a pointless pattern as it's wanting any character that is A-Z but that is optional so it doesn't have to be an A-Z character, thus the 2nd occurrence of a character is not A-Z, which matches the "optional" part of the pattern, but as it's a comma and not A-Z the result is a null.

Hope this helps


Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

negative lookahead

Praveen Ray, July 19, 2017 - 1:10 pm UTC

with premium_no as
(
select '1233367890' no from dual -- category1
union all select '1234444890' from dual -- category2
)
select * from premium_no where regexp_like (no, '(\d)\1{2}');

this is providing me the list of numbers repeating same digit thrice or more, but for this category I need just 3-repititions, and not 4 or more?

lookahead

Praveen Ray, July 19, 2017 - 1:45 pm UTC

OK I got it actually by adding AND NOT with 4 or more. But if you have any other solution please provide
Connor McDonald
July 19, 2017 - 11:08 pm UTC

I dont think you can get away without having the NOT reference, because repeating 4, 5, 6 times etc is just a superset of repeating 3 times.

\ as an operator

Praveen Ray, July 20, 2017 - 9:37 am UTC

Could you please provide an example where \ can be used as an operator
Connor McDonald
July 24, 2017 - 1:40 am UTC

You'll need to elucidate more.

regexp / as an operator

Praveen Ray, July 27, 2017 - 9:51 am UTC

In SQL Language reference documentation (Appendix-D), there are:


\

The backslash character can have four different meanings depending on the context. It can:

•Stand for itself
•Quote the next character
•Introduce an operator
•Do nothing

Could you please elucidate with an example "•Introduce an operator" part?
Connor McDonald
August 01, 2017 - 3:49 am UTC

It's just a reference to the "special characters" used in regex that we refer to as operators.

eg

"\d" is the "pattern formatting operator" that matches a digit, so the slash "introduces" the "d" operator.