Skip to Main Content
  • Questions
  • How can I DO PATTERN MATCHING from within PL-SQL using REGULAR Expression

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, Pascal.

Asked: December 12, 2001 - 9:39 am UTC

Last updated: June 18, 2013 - 3:18 pm UTC

Version: 8.1.6

Viewed 10K+ times! This question is

You Asked

Hi Tom,

How r u doing..
I must be quick on this because today is the first time i am able to put my Qnand hope that u will see it..

My problem is :

1.

I need to search for matches in a table which has a bunch of Mobile phone numbers, from different countries...

For example , i need to return some rows from this Table, with Phone numbers matching the regular expression : +346[017][07]*

(I hope u r familiar with java.util.regex Package : jdk 1.4)

that means return rows from the Table with anything matching the above regular expression:

This means match:

pattern = +346[017]5[07]*

matching every occurrence of numbers enclosed within [ ] and expanding them, i would like to get a match to any of the following:

+346 0 5 0*
+346 0 5 7*
+346 1 5 0*
+346 1 5 7*
+346 7 5 0*
+346 7 5 7*

Ofcourse thats just a simple pattern, it could get too complicated to expand it manually.


Also,i would like to ask if u have any other alternative for a Query to search for an Operator from a Table of Operators, assuming that i have a child Table match_list which has all Mobile number matches for that Operator in REGULAR EXPRESSION form such as +4917[24]*


Is it also a good idea to expand the pattern and write triggers to expand it into another Table Automatically?

Ok, Tom i hope i will hear from u soon.

Regards,

pascal







and Tom said...

Well, I got together with Mark Piermarini and this is what we came up with.

We went to:

</code> http://www.savarese.org/oro/software/OROMatcher1.1.html <code>

and got the perl regexp matcher they have. we then wrote this small java layer:

create or replace and resolve java source named "RegExp" as
import com.oroinc.text.regex.*;
import oracle.sql.*;

public final class RegExp
{
static PatternMatcher matcher;
static PatternCompiler compiler;

static {
compiler = new Perl5Compiler();
matcher = new Perl5Matcher();
}
static Pattern pattern;
static String last_test_pattern = null;
static PatternMatcherInput input = new PatternMatcherInput("");

public static final void compile( String test_pattern ) throws Exception
{
pattern = compiler.compile( test_pattern );
}

public static final int match(String test_input) throws Exception
{
input.setInput( test_input );
return matcher.contains(input, pattern) ? 1 : 0;
}
}
/
show errors
create or replace function reg_exp_match(
p_test_input in varchar2) return number
as language java name 'RegExp.match(java.lang.String) return int';
/
show errors
create or replace procedure reg_exp_compile(
p_test_input in varchar2)
as language java name 'RegExp.compile(java.lang.String) ';
/
show errors


We went out of our way to make the "match" routine as efficient as possible (hence the compile return).... Now we can:


exec reg_exp_compile( 'ALL.*OBJ' );


select object_name
from all_objects
where reg_exp_match(object_name) = 1
/

and you have regular expressions. You can use any regular expressions from anyone and can squeeze even more performance out of it by NCOMPing the java byte code on your platform.


I don't see how using a trigger can help you here, you don't know the patterns until you go to search, not when you insert.

Rating

  (121 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent

A reader, December 12, 2001 - 2:42 pm UTC

I'm sure I'll need it one day.
Funny that Oracle hasn't made it a built in yet.

Pattern Matching within PL-SQL

Pascal, December 13, 2001 - 4:27 am UTC

Hi Tom ...

Thank you very much for taking all the Trouble to answer my Query....

I went to </code> http://www.savarese.org/oro/software/OROMatcher1.1.html <code>

and downloaded OROMatcher1.1


I am now trying to load ORO Classes into my Oracle Schema but look what i get instead....

First i jar-ed all the .class Files from Folder \com\oroinc\text\regex into myload.jar File

and then i used loadjava : as Follows :


C:\java\ORO\OROa\com\oroinc\text\regex>loadjava -user pascal/pascal@my_host:1521:sid -v -f -r -t -s -g public myload.jar

initialization complete
loading : Perl5StreamInput
creating : Perl5StreamInput
loading : OpCode
creating : OpCode
loading : Substitution
creating : Substitution
loading : PatternMatcherInput
creating : PatternMatcherInput
loading : Perl5Debug
creating : Perl5Debug
loading : Perl5Compiler
creating : Perl5Compiler
loading : Perl5MatchResult
creating : Perl5MatchResult
loading : Util
creating : Util
loading : Perl5Matcher
creating : Perl5Matcher
loading : Perl5Pattern
creating : Perl5Pattern
loading : Perl5Substitution
creating : Perl5Substitution
loading : PatternMatcher
creating : PatternMatcher
loading : MatchResult
creating : MatchResult
loading : Pattern
creating : Pattern
loading : Perl5Repetition
creating : Perl5Repetition
loading : PatternCompiler
creating : PatternCompiler
loading : StringSubstitution
creating : StringSubstitution
loading : MalformedPatternException
creating : MalformedPatternException
loading : CharStringPointer
creating : CharStringPointer
loading : META-INF/MANIFEST.MF
creating : META-INF/MANIFEST.MF
Error while granting to Perl5StreamInput
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to OpCode
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Substitution
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to PatternMatcherInput
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Perl5Debug
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Perl5Compiler
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Perl5MatchResult
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Util
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Perl5Matcher
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Perl5Pattern
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Perl5Substitution
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to PatternMatcher
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to MatchResult
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Pattern
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to Perl5Repetition
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to PatternCompiler
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to StringSubstitution
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to MalformedPatternException
ORA-04042: procedure, function, package, or package body does not exist

Error while granting to CharStringPointer
ORA-04042: procedure, function, package, or package body does not exist

resolver :
resolving: Perl5StreamInput
Error while resolving class Perl5StreamInput
ORA-04043: object Perl5StreamInput does not exist

resolving: OpCode
Error while resolving class OpCode
ORA-04043: object OpCode does not exist

resolving: Substitution
Error while resolving class Substitution
ORA-04043: object Substitution does not exist

resolving: PatternMatcherInput
Error while resolving class PatternMatcherInput
ORA-04043: object PatternMatcherInput does not exist

resolving: Perl5Debug
Error while resolving class Perl5Debug
ORA-04043: object Perl5Debug does not exist

resolving: Perl5Compiler
Error while resolving class Perl5Compiler
ORA-04043: object Perl5Compiler does not exist

resolving: Perl5MatchResult
Error while resolving class Perl5MatchResult
ORA-04043: object Perl5MatchResult does not exist

resolving: Util
Error while resolving class Util
ORA-04043: object Util does not exist

resolving: Perl5Matcher
Error while resolving class Perl5Matcher
ORA-04043: object Perl5Matcher does not exist

resolving: Perl5Pattern
Error while resolving class Perl5Pattern
ORA-04043: object Perl5Pattern does not exist

resolving: Perl5Substitution
Error while resolving class Perl5Substitution
ORA-04043: object Perl5Substitution does not exist

resolving: PatternMatcher
Error while resolving class PatternMatcher
ORA-04043: object PatternMatcher does not exist

resolving: MatchResult
Error while resolving class MatchResult
ORA-04043: object MatchResult does not exist

resolving: Pattern
Error while resolving class Pattern
ORA-04043: object Pattern does not exist

resolving: Perl5Repetition
Error while resolving class Perl5Repetition
ORA-04043: object Perl5Repetition does not exist

resolving: PatternCompiler
Error while resolving class PatternCompiler
ORA-04043: object PatternCompiler does not exist

resolving: StringSubstitution
Error while resolving class StringSubstitution
ORA-04043: object StringSubstitution does not exist

resolving: MalformedPatternException
Error while resolving class MalformedPatternException
ORA-04043: object MalformedPatternException does not exist

resolving: CharStringPointer
Error while resolving class CharStringPointer
ORA-04043: object CharStringPointer does not exist

synonym : Perl5StreamInput
synonym : OpCode
synonym : Substitution
synonym : PatternMatcherInput
synonym : Perl5Debug
synonym : Perl5Compiler
synonym : Perl5MatchResult
synonym : Util
synonym : Perl5Matcher
synonym : Perl5Pattern
synonym : Perl5Substitution
synonym : PatternMatcher
synonym : MatchResult
synonym : Pattern
synonym : Perl5Repetition
synonym : PatternCompiler
synonym : StringSubstitution
synonym : MalformedPatternException
synonym : CharStringPointer
loadjava: 38 errors


There i got stuck,

i have jdk 1.4 Beta on my Windwos 2000 Client ....
Oracle Back-end is on Solaris Machine.

My CLASSPAPH ENV variable is correctly set and i also added the /com path in the list.


Please, i would like more help on how t load these classes into Oracle so that i can test your pattern Matcher program.

Best Regards,

Pascal





Tom Kyte
December 13, 2001 - 9:39 am UTC

1. Download </code> http://www.savarese.org/oro/downloads/download/OROMatcher-1.1.zip <code>

2. Unzip this file. Make sure that you have your un-zipping program create the directories.

You should have a tree like this:

C:\temp\OROMatcher-1.1.0a>tree
Folder PATH listing
Volume serial number is 71FAE346 1C7F:F8DB
C:.
+---com
¦ +---oroinc
¦ +---text
¦ +---regex
+---doc
¦ +---api
¦ ¦ +---images
¦ +---images
+---examples
+---images

3. Now we want to ignore the 'doc', 'examples', and 'images' directories. Put another way, we only want to include everything in the 'com' directory. I used the free utility called unzip.exe but use whatever you desire. Make sure that you have your zipping program includes the sub-directories. This command creates a file named ORO.ZIP.

C:\temp\OROMatcher-1.1.0a>zip -r oro.zip com
updating: com/ (256 bytes security) (stored 0%)
updating: com/oroinc/ (256 bytes security) (stored 0%)
updating: com/oroinc/text/ (256 bytes security) (stored 0%)
updating: com/oroinc/text/regex/ (256 bytes security) (stored 0%)
updating: com/oroinc/text/regex/CharStringPointer.class (188 bytes security) (deflated 50%)
updating: com/oroinc/text/regex/MalformedPatternException.class (188 bytes security) (deflated 30%)
updating: com/oroinc/text/regex/MatchResult.class (188 bytes security) (deflated 32%)
updating: com/oroinc/text/regex/OpCode.class (188 bytes security) (deflated 49%)
updating: com/oroinc/text/regex/Pattern.class (188 bytes security) (deflated 22%)
updating: com/oroinc/text/regex/PatternCompiler.class (188 bytes security) (deflated 51%)
updating: com/oroinc/text/regex/PatternMatcher.class (188 bytes security) (deflated 51%)
updating: com/oroinc/text/regex/PatternMatcherInput.class (188 bytes security) (deflated 58%)
updating: com/oroinc/text/regex/Perl5Compiler.class (188 bytes security) (deflated 51%)
updating: com/oroinc/text/regex/Perl5Debug.class (188 bytes security) (deflated 40%)
updating: com/oroinc/text/regex/Perl5Matcher.class (188 bytes security) (deflated 54%)
updating: com/oroinc/text/regex/Perl5MatchResult.class (188 bytes security) (deflated 49%)
updating: com/oroinc/text/regex/Perl5Pattern.class (188 bytes security) (deflated 40%)
updating: com/oroinc/text/regex/Perl5Repetition.class (188 bytes security) (deflated 36%)
updating: com/oroinc/text/regex/Perl5StreamInput.class (188 bytes security) (deflated 43%)
updating: com/oroinc/text/regex/Perl5Substitution.class (188 bytes security) (deflated 49%)
updating: com/oroinc/text/regex/StringSubstitution.class (188 bytes security) (deflated 50%)
updating: com/oroinc/text/regex/Substitution.class (188 bytes security) (deflated 38%)
updating: com/oroinc/text/regex/Util.class (188 bytes security) (deflated 64%)

4. Let's double-check the contents of that file.

C:\temp\OROMatcher-1.1.0a>unzip -v oro.zip
Archive: oro.zip
Length Method Size Ratio Date Time CRC-32 Name
-------- ------ ------- ----- ---- ---- ------ ----
0 Stored 0 0% 12-13-01 09:11 00000000 com/
0 Stored 0 0% 12-13-01 09:11 00000000 com/oroinc/
0 Stored 0 0% 12-13-01 09:11 00000000 com/oroinc/text/
0 Stored 0 0% 12-13-01 09:11 00000000 com/oroinc/text/regex/
1458 Defl:N 735 50% 10-21-99 19:25 bbe7f548 com/oroinc/text/regex/CharStringPointer.class
331 Defl:N 232 30% 10-21-99 19:26 bf322bfe com/oroinc/text/regex/MalformedPatternException.class
379 Defl:N 258 32% 10-21-99 19:25 33232905 com/oroinc/text/regex/MatchResult.class
2400 Defl:N 1220 49% 10-21-99 19:25 5ebe5019 com/oroinc/text/regex/OpCode.class
241 Defl:N 188 22% 10-21-99 19:25 173e9b29 com/oroinc/text/regex/Pattern.class
507 Defl:N 248 51% 10-21-99 19:25 4782e5ad com/oroinc/text/regex/PatternCompiler.class
574 Defl:N 280 51% 10-21-99 19:25 fa3a5349 com/oroinc/text/regex/PatternMatcher.class
2354 Defl:N 989 58% 10-21-99 19:25 48e548c6 com/oroinc/text/regex/PatternMatcherInput.class
13501 Defl:N 6569 51% 10-21-99 19:26 ee171492 com/oroinc/text/regex/Perl5Compiler.class
2599 Defl:N 1563 40% 10-21-99 19:26 7d5322dc com/oroinc/text/regex/Perl5Debug.class
11287 Defl:N 5148 54% 10-21-99 19:26 f850bc2a com/oroinc/text/regex/Perl5Matcher.class
1198 Defl:N 612 49% 10-21-99 19:26 2e63a3dd com/oroinc/text/regex/Perl5MatchResult.class
891 Defl:N 537 40% 10-21-99 19:26 a87088c4 com/oroinc/text/regex/Perl5Pattern.class
468 Defl:N 298 36% 10-21-99 19:26 9becfff7 com/oroinc/text/regex/Perl5Repetition.class
2218 Defl:N 1259 43% 10-21-99 19:25 51a33c0c com/oroinc/text/regex/Perl5StreamInput.class
2624 Defl:N 1329 49% 10-21-99 19:27 d52ea45f com/oroinc/text/regex/Perl5Substitution.class
1009 Defl:N 503 50% 10-21-99 19:27 47fc9bae com/oroinc/text/regex/StringSubstitution.class
362 Defl:N 224 38% 10-21-99 19:26 743dd420 com/oroinc/text/regex/Substitution.class
2844 Defl:N 1030 64% 10-21-99 19:27 900ecc52 com/oroinc/text/regex/Util.class
-------- ------- --- -------
47245 23222 51% 23 files


5. This ORO.ZIP file is what you need to load into the database.

C:\temp\OROMatcher-1.1.0a>loadjava -u scott/tiger -f -r -v oro.zip
arguments: '-u' 'scott/tiger' '-f' '-r' '-v' 'oro.zip'
creating : class com/oroinc/text/regex/CharStringPointer
loading : class com/oroinc/text/regex/CharStringPointer
creating : class com/oroinc/text/regex/MalformedPatternException
loading : class com/oroinc/text/regex/MalformedPatternException
creating : class com/oroinc/text/regex/MatchResult
loading : class com/oroinc/text/regex/MatchResult
creating : class com/oroinc/text/regex/OpCode
loading : class com/oroinc/text/regex/OpCode
creating : class com/oroinc/text/regex/Pattern
loading : class com/oroinc/text/regex/Pattern
creating : class com/oroinc/text/regex/PatternCompiler
loading : class com/oroinc/text/regex/PatternCompiler
creating : class com/oroinc/text/regex/PatternMatcher
loading : class com/oroinc/text/regex/PatternMatcher
creating : class com/oroinc/text/regex/PatternMatcherInput
loading : class com/oroinc/text/regex/PatternMatcherInput
creating : class com/oroinc/text/regex/Perl5Compiler
loading : class com/oroinc/text/regex/Perl5Compiler
creating : class com/oroinc/text/regex/Perl5Debug
loading : class com/oroinc/text/regex/Perl5Debug
creating : class com/oroinc/text/regex/Perl5Matcher
loading : class com/oroinc/text/regex/Perl5Matcher
creating : class com/oroinc/text/regex/Perl5MatchResult
loading : class com/oroinc/text/regex/Perl5MatchResult
creating : class com/oroinc/text/regex/Perl5Pattern
loading : class com/oroinc/text/regex/Perl5Pattern
creating : class com/oroinc/text/regex/Perl5Repetition
loading : class com/oroinc/text/regex/Perl5Repetition
creating : class com/oroinc/text/regex/Perl5StreamInput
loading : class com/oroinc/text/regex/Perl5StreamInput
creating : class com/oroinc/text/regex/Perl5Substitution
loading : class com/oroinc/text/regex/Perl5Substitution
creating : class com/oroinc/text/regex/StringSubstitution
loading : class com/oroinc/text/regex/StringSubstitution
creating : class com/oroinc/text/regex/Substitution
loading : class com/oroinc/text/regex/Substitution
creating : class com/oroinc/text/regex/Util
loading : class com/oroinc/text/regex/Util
resolving: class com/oroinc/text/regex/CharStringPointer
resolving: class com/oroinc/text/regex/MalformedPatternException
resolving: class com/oroinc/text/regex/MatchResult
resolving: class com/oroinc/text/regex/OpCode
resolving: class com/oroinc/text/regex/Pattern
resolving: class com/oroinc/text/regex/PatternCompiler
resolving: class com/oroinc/text/regex/PatternMatcher
skipping : class com/oroinc/text/regex/PatternMatcherInput
resolving: class com/oroinc/text/regex/Perl5Compiler
resolving: class com/oroinc/text/regex/Perl5Debug
resolving: class com/oroinc/text/regex/Perl5Matcher
skipping : class com/oroinc/text/regex/Perl5MatchResult
skipping : class com/oroinc/text/regex/Perl5Pattern
skipping : class com/oroinc/text/regex/Perl5Repetition
skipping : class com/oroinc/text/regex/Perl5StreamInput
resolving: class com/oroinc/text/regex/Perl5Substitution
skipping : class com/oroinc/text/regex/StringSubstitution
skipping : class com/oroinc/text/regex/Substitution
resolving: class com/oroinc/text/regex/Util



How can I DO PATTERN MATCHING from within PL-SQL using REGULAR Expression , version 8.1.6

Pascal, December 13, 2001 - 8:22 am UTC

Hi Tom,
I managed to use Loadava to load the Class Files  below into Oracle DB , but some are INVALID  as indicated below:

I still managed to load your sample Java Stored procedure though , because i  note  that it does not use any of those INVALID classes .

I also modified your Code a little bit by writing :

import java.sql.*  
instead of  import oracle.sql.*

The Classes are:

Match                      INVALID
Perl5Debug                 INVALID 
Perl5Compiler
Perl5MatchResult
Util                       INVALID
Perl5Matcher
Perl5Pattern
Perl5Substitution          INVALID
PatternMatcher
MatchResult
Pattern
Perl5Repetition
PatternCompiler
StringSubstitution        INVALID
MalformedPatternException
CharStringPointer
META-INF/MANIFEST.MF
RegExp
Substitution              INVALID
OpCode
PatternMatcherInput
Perl5StreamInput



Some Tests for your program:

SQL>  select * from reg_test;

ID
------------------------------
+49179232332
+4917223232
+49160232323
+4917423232
+25574123232
+25574432432
+25574234343
+4915634343
+4916334343
+3467832323
+346023232
346122342
+346243232
+346545454
+346823232
+34686
+346222423
+34623345353
+346322332
+3464323232

exec reg_exp_compile( '\346[34]+' );

SQL> select *
  2    from reg_test
  3   where reg_exp_match(id) = 1;

ID
------------------------------
+346322332
+3464323232


I am stll Testing it if it works for more Complex RegExps.


Once again, thank you very very much for  this.

Best regards,

Pascal

 

I got it Right This time around!!..Everything now works fine.....

Pascal, December 14, 2001 - 8:09 am UTC

Hi Tom,

Thanks very much for your followup to my Query...i managed to install zipped /com Folder and i loaded all the class Files successfully....

But i have one more question,
Where does the Pattern that i compile gets stored , before i match it with some string, because in this case the
reg_exp_compile and
reg_exp_match and not in a package and so how does Oracle Remember the Compiled string before it gets executed .


Best Regards,

Pascal





Tom Kyte
December 14, 2001 - 8:17 am UTC

It's in the static attribute "pattern" in the java class. I broke compile out from match for performance reasons -- rather then compiling the string for each call or comparing the last compiled string to the current match string on each call -- we just call compile once and match as many times as we like. Just makes it faster...

public final class RegExp
{
static PatternMatcher matcher;
static PatternCompiler compiler;

static {
compiler = new Perl5Compiler();
matcher = new Perl5Matcher();
}
static Pattern pattern;

static String last_test_pattern = null;
static PatternMatcherInput input = new PatternMatcherInput("");

public static final void compile( String test_pattern ) throws Exception
{
pattern = compiler.compile( test_pattern );
}

public static final int match(String test_input) throws Exception
{
input.setInput( test_input );
return matcher.contains(input, pattern) ? 1 : 0;
}
}




Combining reg_exp_compile and reg_exp_match in One CALL

pascal, December 27, 2001 - 4:10 am UTC

Hi Tom and Mark,

Happy New Year in advance to u Guyz!

I was thinking of a way to combine
reg_exp_compile and
reg_exp_match so that they could be called in a Single SQL statement , instead of one procedural call and one SQL.

How could this be achieved.

Thanks,


pascal

************************************************
exec reg_exp_compile( 'ALL.*OBJ' );
select object_name
from all_objects
where reg_exp_match(object_name) = 1
***********************************************


Tom Kyte
December 27, 2001 - 9:03 am UTC

You can -- and in fact the first version we wrote did that -- however, the overhead of doing this was considered too large.

You are going to call reg_exp_match many times, perhaps 10's of thousands or even more. Every bit of extra work that it must do is overhead that should be avoided.

You can add a method:

public final class RegExp
{
static PatternMatcher matcher;
static PatternCompiler compiler;

static {
compiler = new Perl5Compiler();
matcher = new Perl5Matcher();
}
static Pattern pattern;
static String last_test_pattern = null;
static PatternMatcherInput input = new PatternMatcherInput("");

public static final void compile( String test_pattern ) throws Exception
{
pattern = compiler.compile( test_pattern );
}

public static final int match(String test_input) throws Exception
{
input.setInput( test_input );
return matcher.contains(input, pattern) ? 1 : 0;
}

public static final int match2( String test_input, String test_pattern )
throws Exception
{
-- compare test_pattern to last_test_pattern --
-- if different
compile( test_pattern );
last_test_pattern = test_pattern
end if
return match( test_input )
}

}
/

The overhead of doing that string compare on every call adds up over time. We didn't like that so didn't do it that way (but apparently we left the last_test_pattern in there!)


Follow-up

Pascal, January 09, 2002 - 3:25 am UTC

Hi Tom amd Mark,

How are you guys doing..

Thanks for your previous follow-up on my question...

I am planning to use this regExp package as follows:


I have a DB Table which has a list of Operators and their corresponding Patterns , which matches all mobile numbers for a particular operator.

I have a c++ program which feeds the database with some short messages , and i need to compare each message so that i could extract an Operator by checking the originating phone numbers from the short messages ...and so, that means i have to compile and match fo each and every short message that i receive and there could be millios of them....

Do you think i could use this package for the above mentioned purpose without incurring performance problems?

Also, i don't have much experience with java, and so could you help me write that match_2 Compare logic if you have time please ?

Thanks in advance,


Best regards,

Pasko







Tom Kyte
January 09, 2002 - 8:50 am UTC

Millions -- I would not bank on this being "fast". You can try Native Compilation which will speed up the java (possibly by alot).

I would definitely suggest NOT using the match_2 method if you plan on doing this lots -- it really does add lots of overhead.

public static final int match2( String test_input, String test_pattern )
throws Exception
{
if (last_test_pattern == null || !test_pattern.isEqual(last_test_pattern))
{
compile(test_pattern);
last_test_pattern = test_pattern;
}
return match( test_input )
}

should do it.

I ordered ur Book !

Pascal, January 10, 2002 - 4:05 am UTC

Hi Tom/Mark,

Thanks again for your follow-up on my Qn.

I added match2 to regExp class you sent me earlier but compiled with errors on jdk 1.4 beta .

regExp.java:33: cannot resolve symbol
symbol : method isEqual (java.lang.String)
location: class java.lang.String
if (last_test_pattern == null || !test_pattern.isEqual(last_test_pattern
))
^
1 error



But i changed : your 'isEqual' method TO 'equals' and it compiled successfully :

if (last_test_pattern == null || !test_pattern.isEqual(last_test_pattern))

TO:

if (last_test_pattern == null || !test_pattern.equals(last_test_pattern))



I am still testing this to see if i get any performance improvements.


Thanks very much for your support .

Today i ordered your BOOK and i am looking forward to getting it soon , because it is not available in most of the English Bookshops here in Germany.

Best regards,


Pasko

How about using this .....

Pascal, February 08, 2002 - 11:36 am UTC

Hi Tom/Mark,
I went to this Link and found that they also have another Tool called PerlTools-1.2.zip

at :

</code> http://www.savarese.org/oro/downloads/index.html#PerlTools <code>

So i downloaded it and wrote my simple Wrapper JSP.

It looks like :


create or replace and resolve java source named "PerlExp" as

import com.oroinc.text.regex.*;
import com.oroinc.text.perl.*;

import oracle.sql.*;

public final class PerlExp
{

static Perl5Util perl;

static {
perl = new Perl5Util();
}


public static final int match( String p_pattern , String p_input )
{

p_pattern = "/" + p_pattern + "/" ;

return perl.match(p_pattern , p_input) ? 1 : 0 ;

}


}
/



create or replace function perl_match(
p_pattern IN varchar2 , p_input IN varchar2 )
return number
as language java name 'PerlExp.match(java.lang.String , java.lang.String ) return int';
/


Soi could just code in a Siingle sQL as : from your Example:


select * from all_objects where
perl_match( 'ALL.*OBJ' , object_name ) = 1


What do you think is the Performance of this one compared to the Earlier procedure.


Best Regards,

Pascal




Tom Kyte
February 08, 2002 - 4:40 pm UTC

I don't know -- benchmark it for us and let us know ;)


I will guess:

the original implementation will be faster.

o it does not create a temporary string "/" + pattern + "/"
o it does not compile the pattern for each row.

Function-based Index for Java functions

Tony, December 24, 2002 - 1:43 am UTC

Do we have to create function-based index to speed up the performance in pattern serach using this Java function?.

How to make a deterministic Java function for the above case?




Tom Kyte
December 24, 2002 - 8:30 am UTC

pattern matching isn't something an index is going to help. You have to inspect each and every record -- no chance to short circut it.

what exactly would you INDEX? there isn't anything...

Error during compilation

Sasa Petkovic, February 17, 2003 - 4:19 am UTC

Hi Tom,

I successfully loaded all the classes from or.zip but got an error during compilation java source "RegExp".
There is an output:
CREATE OR REPLACE AND resolve java source named "RegExp" AS
*
ERROR at line 1:
ORA-29536: badly formed source: Encountered "PUBLIC" at line 4, column 1.
Was expecting one of:
<EOF>
"#sql" ...
"abstract" ...
"final" ...
"public" ...
"class" ...
"import" ...
"interface" ...
";" ...
"\\u001a" ...

Could you help please.

Sasa




Tom Kyte
February 17, 2003 - 11:22 am UTC

what version of sqlplus are you using?

and do it with SET ECHO ON

and show us the ENTIRE cut and paste -- you must either

o have a typo
o be using a client that doesn't know what to do with create or replace java code.

Sasa Petkovic, February 17, 2003 - 11:32 am UTC

Hi Tom,
I did as you recomended and got as follows:

SQL*Plus: Release 8.1.7.0.0 - Production on Mon Feb 17 17:25:27 2003
(c) Copyright 2000 Oracle Corporation.  All rights reserved.

Connected to:
Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning option
JServer Release 8.1.7.3.0 - Production

SQL> 
SQL> CREATE OR REPLACE AND resolve java source named "RegExp" AS
  2  import com.oroinc.text.regex.*;
  3  import oracle.SQL.*;
  4  
  5  PUBLIC final class RegExp
  6  {
  7    static PatternMatcher matcher;
  8    static PatternCompiler compiler;
  9  
 10    static {
 11      compiler = NEW Perl5Compiler();
 12      matcher  = NEW Perl5Matcher();
 13    }
 14    static Pattern pattern;
 15    static String  last_test_pattern = NULL;
 16    static PatternMatcherInput input = NEW PatternMatcherInput("");
 17  
 18    PUBLIC static final void COMPILE( String test_pattern ) throws EXCEPTION
 19    {
 20       pattern = compiler.COMPILE( test_pattern );
 21    }
 22  
 23    PUBLIC static final INT match(String test_input)  throws EXCEPTION
 24    {
 25        input.setInput( test_input );
 26        RETURN matcher.contains(input, pattern) ? 1 : 0;
 27    }
 28  }
 29  /
CREATE OR REPLACE AND resolve java source named "RegExp" AS
                                                *
ERROR at line 1:
ORA-29536: badly formed source: Encountered "PUBLIC" at line 4, column 1.

Was expecting one of:

<EOF> 
"#sql" ...
"abstract" ...
"final" ...
"public" ...
"class" ...
"import" ...
"interface" ...
";" ...
"\\u001a" ...

SQL> 


Cut and paste whole input and output.

Regards,

Sasa 

PS
Run from TOAD (calling sql plus from TOAD) 

Tom Kyte
February 17, 2003 - 12:09 pm UTC

so, why did you change my example and capitalize the word public? Java is a case sensitive language.

Use the code I provided as I provided it and it'll work.

Change the case -- and you are on your own.

I'll betcha that neat "toad" thingy pretty printed that for you did it? well, java isn't plsql, "toad" isn't going to do good things to it. RETURN EXCEPTION -- all of those "shouting statements" will fail (why do people like keywords upper cased? I've always found that to be 100% ugly)



Built-in owa_pattern

Alexander, February 17, 2003 - 11:58 pm UTC

Maybe built-in package owa_pattern can be used? Comment from package:
The package OWA_PATTERN is a "regular expression" pattern matching
package. There are 3 fundamental subprograms in OWA_PATTERN.

The are: AMATCH, MATCH, and CHANGE.



TOAD capitalization effect

Sasa, February 18, 2003 - 1:18 am UTC

Hi Tom,

You are completely right. I didn't notice that I had capitalization effect option on in TOAD.
Now it works.Thanks again.

Regards,

Sasa

j., March 08, 2003 - 7:33 am UTC

hi tom,

I 've tried your sample - everything worked very well. but I've now two additional questions:

1st: what (in general) is the right way to remove all the database objects created for this? actually the tool I use (SQL navigator) doesn't show dependencies between stored programs "REG_EXP*" and their associated java classes.

2nd: I get the following output from USER_OBJECTS:

/10074bae_Substitution /184abd21_Perl5Repetition /22eba5d6_Perl5Matcher /2ca03ed7_MalformedPatternExce /74c0e441_Perl5Substitution /781a1e56_Perl5StreamInput /83c2e19d_Perl5Compiler /8a706168_StringSubstitution /a572ed52_PatternMatcher /af39100e_MatchResult /b3a68f05_Perl5Pattern /dbdcd9f3_PatternMatcherInput /dc521360_CharStringPointer /e8ec9c82_Perl5MatchResult /ea37ca8e_PatternCompiler /ede86355_Perl5Debug REG_EXP_COMPILE REG_EXP_MATCH RegExp RegExp com/oroinc/text/regex/OpCode com/oroinc/text/regex/Pattern com/oroinc/text/regex/Util

could you please explain, what has happened here, if there 's anything wrong? where did all classes except of Util, OpCode and Pattern get their (unexpected) names from, and is there any way to "rearrange" that?

thanks for your help.

and can be rearranged, that ALL classes include their pathes)?
anything wrong with the way I installed your example? could those entries with "/" be avoided/hidden

Tom Kyte
March 08, 2003 - 9:04 am UTC

1) use user dependencies -- but don't you know what you installed??? You drop that which you install and it goes away.

2) they are the short names (we need 30 characters) of the external interfaces available to you. You can use dbms_java.longname to get their "longer" name back.



How about this -- create a user regex. Put all this stuff in there -- and nothing more, nothing less. Then, if you want to remove it -- drop user regex cascade; Grant on that which you want exposed and the "ugly" object names don't exist as far as anyone else is concerned.

j., March 08, 2003 - 11:07 am UTC

30 characters only - I should have known.

thanks, DBMS_JAVA.LONGNAME helps very much to identify java classes through their fully qualified names.

but one problem is left (to me): I've already searched user_dependencies and didn't find links between PL/SQL code (REG_EXP_MATCH, REG_EXP_COMPILE) and java source (RegExp) there. so do one have to do a "code search" in order to check those dependencies?

Tom Kyte
March 08, 2003 - 11:17 am UTC

create or replace procedure reg_exp_compile(
  p_test_input in varchar2)
as language java name 'RegExp.compile(java.lang.String) ';
/

See how it is done as a string there?

  1  create or replace procedure reg_exp_compile(
  2    p_test_input in varchar2)<b>
  3* as language java name 'I_dontReally.ExistAnywhere(java.lang.String) ';</b>
ops$tkyte@ORA920> /

Procedure created.


ops$tkyte@ORA920> exec reg_exp_compile('xxx' );
BEGIN reg_exp_compile('xxx' ); END;

*
ERROR at line 1:
ORA-29540: class I_dontReally does not exist
ORA-06512: at "OPS$TKYTE.REG_EXP_COMPILE", line 0
ORA-06512: at line 1



it is sort of like dynamic SQL in that sense.
 

NullPointerException

A reader, April 25, 2003 - 12:25 pm UTC

I tried the following:

begin
reg_exp_compile('^[AB].*i$');
end;

select * from emp
where reg_exp_match(last_nm)=1

I got ORA-29532 NullPointerException

I changed this to

select * from emp
where last_nm is not null
reg_exp_match(last_nm)=1

and everything worked fine. It gave me all last names starting with A or B and ending with i.

So, apparenntly, the regexp package cannot handle a null input.

How can I fix this so that a null column doesnt match any regexp? So I dont have to keep adding this 'and column is not null' condition to any query using this regexp?

Thanks


Tom Kyte
April 25, 2003 - 3:45 pm UTC

just either add a check in the java or wrap yet another layer of plsql.

most efficient would be:

where decode( last_nm, null, 0, reg_exp_match(last_nm) ) = 1


as that would not even have the overhead of invoking the function (and your approach is "unsafe", we can and will flip flop the order of evaluation of the predicate AT WILL -- use the decode -- not the predicate - to ensure order of operations)

Flip-flop the order?

A reader, April 25, 2003 - 4:03 pm UTC

Oracle flip-flops the order of evaluation of predicates at will?

Doesnt it follow the basic short-circuit boolean evaluation?
So, if I have

where 1=0
and <bunch of other stuff>

This query should come out right away, since it is never going to match anything, right?

Tom Kyte
April 25, 2003 - 4:45 pm UTC

it is purposely "not said".

We can evaluate:

where A and B


as where A and then B or where B and then A.


You have NO real true 100% control over it.

Substituting?

A reader, April 25, 2003 - 4:51 pm UTC

I wrote a fancy regexp using this method and it is working like a charm. Saved me hours of work.

Logical extension: Can I use the Substitution methods in the ORO classes to perform my substitutions on the matches that I found? Using the \(...\) groupings in my compiled pattern and referencing them as \1, \2, etc in my UPDATE

i.e. something like

update table
set column=Substitute(my_pattern,'\1..\2..')
where reg_exp_match(...)=1

This would be awesome. Anyway to do this?

Thanks

Tom Kyte
April 25, 2003 - 7:39 pm UTC

You would have to look at the ORO classes and see if they support the "replace" sort of functionality.

Jakarta ORO

A reader, April 26, 2003 - 11:41 pm UTC

Seems the ORO software has moved to </code> http://jakarta.apache.org <code>I downloaded ORO 2.0.7 and added the following method to your Java source

public static final String substitute(String p_input,String p_expression) throws Exception
{
Perl5Util util = new Perl5Util();
return util.substitute(p_expression,p_input);
}

Now I can do

UPDATE table
set column=substitute(column,'s/pattern/replacement/')
where ...

Also, changed your compile procedure to a function so that it can be selected from dual instead of having to run PL/SQL and then the SQL.

public static final String compile( String p_pattern ) throws Exception
{
try {
pattern = compiler.compile(p_pattern);
return "Pattern compiled successfully: " + p_pattern;
} catch (MalformedPatternException e) {
return "Error compiling pattern: " + e.getMessage();
}
}

Thanks.

Tom Kyte
April 27, 2003 - 8:22 am UTC

why

select regexp.compile( ? ) from dual;

is considered "easier" then

begin regexp.compile( ? ); end;

(especially when one takes more keystrokes to type......)

select from dual vs. pl/sql

A reader, April 27, 2003 - 9:15 pm UTC

Well, most front-end query apps here have no PL/SQL blocks in their code. They are 100% query apps, so this just makes it easier to provide a 'select from dual' interface to this feature rather than a 'begin ... end;' interface.

I guess its just a matter of personal preference...

fyi, how to make pattern_match work with backrefs

Dave Lowenstein, May 14, 2003 - 5:39 pm UTC

I searched high and low for how to make owa_pattern.match work with saving what it found in an array of back references (backrefs). I figure that asktom.oracle.com is a good place to put what I found. I pulled this out of thin air, as it wasn't in any docs i found. 

Just like in Perl regular expressions, your backrefs need to be between parentheses to stand up and be counted. I don't know why i didn't think of that before. Here's a working example of what i spent a large portion of yesterday trying to make work. It finds any characters between two pipes and saves what it finds into an array:

declare
a owa_text.vc_arr;
b boolean;
c varchar(3000):='|abc|';
begin
b:=owa_pattern.match(line=>c, pat=>'|(\w*)|', backrefs=>a);
if a.exists(1)
then
for i in a.first..a.last
loop
dbms_output.put_line(a(i));
end loop;
end if;
end;
/

SQL>/
abc

PL/SQL procedure successfully completed. 

Largest matching string

Arun Kalavath, July 11, 2003 - 4:23 am UTC

Hi Tom
I am trying to do something similar. I have a list of phone numbers and I want to return the largest matching number.

drop table t
  2  /

Table dropped.

SQL> 
SQL> create table t(pnum varchar2(15) )
  2  /

Table created.

SQL> insert into t values ('0123456789')
  2  /

1 row created.

SQL> insert into t values ('012345678')
  2  /

1 row created.

SQL> insert into t values ('01234567')
  2  /

1 row created.

SQL> insert into t values ('0123456')
  2  /

1 row created.

SQL> insert into t values ('0123')
  2  /

1 row created.

SQL> 
SQL> create or replace procedure get_best_match (Arg_pnum in t.pnum%type )
  2  as
  3  result t.pnum%type;
  4  BEGIN
  5  SELECT MAX( pnum ) into result
  6  FROM t
  7  WHERE (pnum = substr( Arg_pnum, 1, 1 )
  8        or pnum = substr( Arg_pnum, 1, 2 )
  9        or pnum = substr( Arg_pnum, 1, 3 )
 10        or pnum = substr( Arg_pnum, 1, 4 )
 11        or pnum = substr( Arg_pnum, 1, 5 )
 12        or pnum = substr( Arg_pnum, 1, 6 )
 13        or pnum = substr( Arg_pnum, 1, 7 )
 14        or pnum = substr( Arg_pnum, 1, 8 )
 15        or pnum = substr( Arg_pnum, 1, 9 )
 16        or pnum = substr( Arg_pnum, 1, 10 )
 17        or pnum = substr( Arg_pnum, 1, 11 )
 18        or pnum = substr( Arg_pnum, 1, 12)
 19        or pnum = substr( Arg_pnum, 1, 13 )
 20        or pnum = substr( Arg_pnum, 1, 14 )
 21        or pnum = substr( Arg_pnum, 1, 15 ));
 22     dbms_output.put_line (result);
 23  end get_best_match;
 24  
 25  /

Procedure created.

SQL> select * from t;

PNUM                                                                            
---------------                                                                 
0123456789                                                                      
012345678                                                                       
01234567                                                                        
0123456                                                                         
0123                                                                            

SQL> set serveroutput on

SQL>  exec get_best_match('0123');
0123                                                                            

PL/SQL procedure successfully completed.

SQL>  exec get_best_match('01231');
0123                                                                            

PL/SQL procedure successfully completed.

SQL>  exec get_best_match('0123456');
0123456                                                                         

PL/SQL procedure successfully completed.

SQL> exec get_best_match('01234561');
0123456                                                                         

PL/SQL procedure successfully completed.

SQL> exec get_best_match('01234562');
0123456                                                                         

PL/SQL procedure successfully completed.

SQL>  exec get_best_match('01234567');
01234567                                                                        

PL/SQL procedure successfully completed.


I.e. I need to return the highest matching number. As you can see currently we use substring. Is there a better way to achive this?

Thank you
Arun
 

Tom Kyte
July 11, 2003 - 9:46 am UTC

ops$tkyte@ORA817DEV> variable n varchar2(20)
ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> exec :n := '0123'

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> select substr( max(to_char(length(pnum),'fm00009') || pnum ), 6 )
  2    from t
  3   where :n like pnum||'%';

SUBSTR(MAX(TO_CH
----------------
0123

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> exec :n := '01231'

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV> /

SUBSTR(MAX(TO_CH
----------------
0123

ops$tkyte@ORA817DEV> exec :n := '0123456'

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV> /

SUBSTR(MAX(TO_CH
----------------
0123456

ops$tkyte@ORA817DEV> exec :n := '01234561';

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV> /

SUBSTR(MAX(TO_CH
----------------
0123456

ops$tkyte@ORA817DEV> exec :n := '01234562';

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV> /

SUBSTR(MAX(TO_CH
----------------
0123456

ops$tkyte@ORA817DEV> exec :n := '01234567';

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV> /

SUBSTR(MAX(TO_CH
----------------
01234567

 

Sorry, forgot version number

Arun Kalavath, July 11, 2003 - 4:26 am UTC

We use Oracle 8.1.7. Enterprise edition.

Thanks
Arun

Always does a full tablescan

Arun Kalavath, July 15, 2003 - 3:43 am UTC

Hi Tom

Thank you for this reply. Could you tell me why you used "substr( max(to_char(length(pnum),'fm00009') || pnum 
), 6 )". As I understand it, this statments adds the length of the maximum value returned as a 'fm00009' to the max value and then we substring this from the 6th position. I got the same result in the same format when I did 
select max(pnum) from t where :n like pnum||'%';
So what is the difference?

Question 2 :- I tried adding an index on pnum. But the query alway seems to do a full table scan. How can I get the query to use an index on pnum column? Or is this because I only got few records in the table?

SQL> create index xx on t (pnum);

Index created.

SQL> set autotrace traceonly
SQL>  select max(pnum) from t where :n like pnum||'%';


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T'




Statistics
----------------------------------------------------------
          0  recursive calls
          4  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        198  bytes sent via SQL*Net to client
        311  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> drop index xx;

Index dropped.

SQL> create unique index xx on t (pnum);

Index created.

SQL> select max(pnum) from t where :n like pnum||'%';


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T'




Statistics
----------------------------------------------------------
          0  recursive calls
          4  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
        210  bytes sent via SQL*Net to client
        311  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


Thanks
Arun 

Tom Kyte
July 15, 2003 - 9:41 am UTC

to to_char() trick was used to get the PNUM that had the maximum length, the max worked on the first characters -- getting the one with the maximum length.

I'm fairly sure "max" will work as well -- havent thought it all of the way out.


and of course it does a full scan -- it has to look at EACH and EVERY row.  that -- and you are using the RBO and the RBO isn't very smart.

Consider if you used the CBO AND pnum was NOT NULL (or you added "and pnum is not null" to the query itself -- since nulls are not indexed in b*trees)

ops$tkyte@ORA920LAP> create table t
  2  as
  3  select rownum pnum, a.* from all_objects a;

Table created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create index t_idx on t(pnum);

Index created.

ops$tkyte@ORA920LAP> alter table t modify pnum NOT NULL;

Table altered.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> variable x varchar2(5)
ops$tkyte@ORA920LAP> exec :x := '1234'

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> set autotrace on explain
ops$tkyte@ORA920LAP> select max(pnum) from t where :x like pnum ||'%';

 MAX(PNUM)
----------
      1234


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T'



ops$tkyte@ORA920LAP> analyze table t compute statistics for table for all indexes for all indexed columns;

Table analyzed.

ops$tkyte@ORA920LAP> select max(pnum) from t where :x like pnum ||'%';

 MAX(PNUM)
----------
      1234


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=4)
   1    0   SORT (AGGREGATE)
   2    1     FIRST ROW (Cost=2 Card=1532 Bytes=6128)
   3    2       INDEX (FULL SCAN (MIN/MAX)) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=1)


<b>the cbo will use the index as a skinny table and full scan it very efficiently</b>

 

Thank you Tom

Arun Kalavath, July 16, 2003 - 4:32 am UTC


A reader, September 12, 2003 - 10:13 am UTC

Can we achive(pattern matching) this in oracle 10g feature "Regular Expressions" ?

Thanks

Tom Kyte
September 12, 2003 - 10:51 am UTC

regular expressions are pattern matching, yes.

OWA_PATTERN

Tatiane, September 12, 2003 - 1:22 pm UTC

Sirs, Oracle already provides a package for handling regular expressions: OWA_PATTERN. I am using 9iR2. I imagine 10G will make it built-in in the database. 

Tom, why the heck is it not listed anywhere in the online docs (like dbms_system is not either) ?
  
SQL> desc owa_pattern
FUNCTION AMATCH RETURNS NUMBER(38)
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 FROM_LOC                       NUMBER(38)              IN
 PAT                            VARCHAR2                IN
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION AMATCH RETURNS NUMBER(38)
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 FROM_LOC                       NUMBER(38)              IN
 PAT                            TABLE OF VARCHAR2(4)    IN/OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION AMATCH RETURNS NUMBER(38)
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 FROM_LOC                       NUMBER(38)              IN
 PAT                            VARCHAR2                IN
 BACKREFS                       TABLE OF VARCHAR2(32767) OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION AMATCH RETURNS NUMBER(38)
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 FROM_LOC                       NUMBER(38)              IN
 PAT                            TABLE OF VARCHAR2(4)    IN/OUT
 BACKREFS                       TABLE OF VARCHAR2(32767) OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION CHANGE RETURNS NUMBER(38)
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN/OUT
 FROM_STR                       VARCHAR2                IN
 TO_STR                         VARCHAR2                IN
 FLAGS                          VARCHAR2                IN     DEFAULT
PROCEDURE CHANGE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN/OUT
 FROM_STR                       VARCHAR2                IN
 TO_STR                         VARCHAR2                IN
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION CHANGE RETURNS NUMBER(38)
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MLINE                          RECORD                  IN/OUT
   ROWS                         TABLE OF VARCHAR2(32767) IN/OUT
   NUM_ROWS                     NUMBER(38)              IN/OUT
   PARTIAL_ROW                  BOOLEAN                 IN/OUT
 FROM_STR                       VARCHAR2                IN
 TO_STR                         VARCHAR2                IN
 FLAGS                          VARCHAR2                IN     DEFAULT
PROCEDURE CHANGE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MLINE                          RECORD                  IN/OUT
   ROWS                         TABLE OF VARCHAR2(32767) IN/OUT
   NUM_ROWS                     NUMBER(38)              IN/OUT
   PARTIAL_ROW                  BOOLEAN                 IN/OUT
 FROM_STR                       VARCHAR2                IN
 TO_STR                         VARCHAR2                IN
 FLAGS                          VARCHAR2                IN     DEFAULT
PROCEDURE GETPAT
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 ARG                            VARCHAR2                IN
 PAT                            TABLE OF VARCHAR2(4)    IN/OUT
FUNCTION MATCH RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 PAT                            VARCHAR2                IN
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION MATCH RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 PAT                            TABLE OF VARCHAR2(4)    IN/OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION MATCH RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 PAT                            VARCHAR2                IN
 BACKREFS                       TABLE OF VARCHAR2(32767) OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION MATCH RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LINE                           VARCHAR2                IN
 PAT                            TABLE OF VARCHAR2(4)    IN/OUT
 BACKREFS                       TABLE OF VARCHAR2(32767) OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION MATCH RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MLINE                          RECORD                  IN
   ROWS                         TABLE OF VARCHAR2(32767) IN
   NUM_ROWS                     NUMBER(38)              IN
   PARTIAL_ROW                  BOOLEAN                 IN
 PAT                            VARCHAR2                IN
 RLIST                          RECORD                  OUT
   ROWS                         TABLE OF NUMBER(38)     OUT
   NUM_ROWS                     NUMBER(38)              OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
FUNCTION MATCH RETURNS BOOLEAN
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MLINE                          RECORD                  IN
   ROWS                         TABLE OF VARCHAR2(32767) IN
   NUM_ROWS                     NUMBER(38)              IN
   PARTIAL_ROW                  BOOLEAN                 IN
 PAT                            TABLE OF VARCHAR2(4)    IN/OUT
 RLIST                          RECORD                  OUT
   ROWS                         TABLE OF NUMBER(38)     OUT
   NUM_ROWS                     NUMBER(38)              OUT
 FLAGS                          VARCHAR2                IN     DEFAULT
 

Tom Kyte
September 12, 2003 - 2:12 pm UTC

owa_pattern, a very limited -- not too functional implementation -- is part of the web toolkit and is documented in the same place htp, htf, owa, owa_util, et. al. are.

in the ias documentation. it is part of the webserver.

owa_pattern , owa_text Installation

pasko, October 15, 2003 - 9:00 am UTC

Hi Tom,

i'm back again to my original query:

It seems there's a new Version again for OROMatcher stuff:

your initial version for reg_exp_match and reg_exp_compile still works though, with the addition of the new import statements as :


import org.apache.oro.text.regex.*;
import org.apache.oro.text.perl.*;


Download @:
</code> http://jakarta.apache.org/oro/index.html <code>

They have even provided an executable which can directly be loaded in the Database with the loadjava Utility:

jakarta-oro-2.0.7.jar

------------------------------------------------------

A Question though about the owa_pattern and owa_match Packages :

-----------------------------------------------------

I note that, if i run the following scripts under the
$ORACLE_HOME/rdbms/admin Folder ,

@pubpat.sql
@pubtext.sql
@privpat.sql
@privtext.sql

i could also get most of the Regular Expressions functionality that i wanted.


The Limitations for this owa_pattern as listed in the Package are :

--------------------------

Possible future enhancements:
-----------------------------
* \B - match on non-"word" boundaries (between \w and \w, or \W and \W)

* "or" character matches:
change(text,'(Unix|unix)','UNIX') would change both occurances

* Using control character references:

A backslashed c followed by a single character, such as \cD, matches
the corresponding control character.

-- No support for:
-- \b == Backspace (in a character class)
-- \r == Carriage return
-- \f == Form feed
-- Modified support for:
-- \s == A whitespace charcter -> [ \t\n\r\f]
-- \S == A non-whitespace character

----------------------------------------------------




Are we allowed to install the packages as stand-alone Packages in other Applications which might require their usage ?

Thanks in advance.











Tom Kyte
October 15, 2003 - 6:20 pm UTC

in 8i and up -- they should just be "already there" -- they are part of the database, installed by default.

in 10g - the regex stuff is going to be fantastic.

How to pass Array of Pattern back and forth from Oracle to Java...

pasko, October 31, 2003 - 8:56 am UTC

Hi Tom,

I know regular Expressions are coming in 10G,
but in the meantime, i was just trying to figure out how to pass an Array of type: Pattern from your Example above:

i mean an array of Compiled Regular Expressions.

The Reason i am doing this is that i need to save the Compiled Patterns so that i could re-visit some of them again when doing the Matching without recompiling them again.

I know the solution in this case is optimzed for compile- once , match-many times using the same compiled Pattern.

Now i want to Optimize it further by caching some frequently Used Patterns and skip the Compilation process.

Something like this pseudo code :

// I get this from Oracle table
String[] tokens = { array_of_regular_Expressions } ;

// I need to keep track of some known number of
// Compiled patterns
Pattern[] patterns ; // Array of patterns

patterns = new Pattern[ tokens.length ];

for( int token=0; token < tokens.length; token++ ){

patterns[token] = compiler.compile( tokens[token] );
}


I know using some Examples on this site, i could pass
the String Array: tokens from Oracle to Java.

But how to pass an Array of Class Type : Pattern .

Thanks in advance.




Tom Kyte
November 01, 2003 - 11:42 am UTC

you would not pass an array of patterns? You would just make that a static data structure in the class itself. there is no SQL type that corresponds to "pattern", it is 100% java.

Just set up a hash table of "reg expression" to "compiled regex" and in the compile function -- just do a lookup in the hash table to see if you've compiled it and if so -- just set the pattern to that, else compile it, put it in the hash table and set the pattern to that.

Give the link..

Arangaperumal, July 12, 2004 - 7:29 am UTC

Hi Tom,
Kindly give the link to know about
OWA_PATTERN ( oracle 8.1.7).
I can not able find oracle document.
Thank you

a working example

CHAG, September 21, 2004 - 5:51 pm UTC

declare
v_string_returned boolean;
BEGIN
v_string_returned :=owa_pattern.match('ABCn124x11xs', '\d');
END;





Non-greedy matching?

Mark, May 26, 2005 - 4:49 pm UTC

Hi Tom,

I have been toying with REGEXP_REPLACE to parse strings. The problem I am having is that this function seems to match greedily; I can't figure out how to make it match in a non-greedy way. 

In case you are not familiar with the terms, this page gives a brief example explaining the difference between greedy and non-greedy: 
http://www.channelping.com/practicalperl/?id=lectures/regex/greedy_vs_nongreedy

Example of what I'm trying to accomplish:

To extract this zip code from an address string, the following query appears to give just what I want:

SQL> select a, length(a) from (SELECT REGEXP_REPLACE('DEER PARK, NY 11729-2002  ', '^[^[:digit:]]*([[:digit:]-]+).*$', '\1') a from dual);

A           LENGTH(A)
---------- ----------
11729-2002         10

However, try throwing a digit into the city name...

SQL> select a, length(a) from (SELECT REGEXP_REPLACE('DEER4 PARK, NY 11729-2002  ', '^[^[:digit:]]*([[:digit:]-]+).*$', '\1') a from dual);

A  LENGTH(A)
- ----------
4          1

If this were PERL, we could replace the [^[:digit:]] with .*?, thus telling the parser to match all characters non-greedily up until the first digit. Unfortunately, attempting to do so with REGEXP_REPLACE yields this result:

SQL> select a, length(a) from (SELECT REGEXP_REPLACE('DEER PARK, NY 11729-2002  ', '^.*?([[:digit:]-]+).*?$', '\1') a from dual);

A  LENGTH(A)
- ----------
2          1

So, is there any way to do a non-greedy match? (No city really has a digit in its name, but I hope I have illustrated the point.) I'm ready to load up jakarta.regexp, but I thought I'd ask the Oracle man first. :)

Thanks a lot,
Mark 

Tom Kyte
May 26, 2005 - 7:40 pm UTC

I believe that right now in 10gr1, it is only the posix regex which is greedy, rumors of 10gr2 having perl mode too where it can be "not so greedy"

I know if no way to make it 'non-greedy'

Khalid Azmi, July 12, 2005 - 11:16 am UTC


How to find Mixed case values

A reader, August 19, 2005 - 6:49 pm UTC

Hello Sir.

How do I find values having mixed case in SQL.
I dont want all upper case or all lower case values.

Only values that have mixed casei.e values that have both capital as well as small letters. Text may also have some numeric values.


'ORACLE IS GOOD1' -- dont pick
'oracle is good2' -- dont pick
'Oracle is Good1' -- Pick this
'Oracle is gOod2' -- Pick this

create table mix (mix_col varchar2(100) ;

insert into mix values ('ORACLE IS GOOD1' ) ;
insert into mix values ('oracle is good2' ) ;
insert into mix values ('Oracle is Good1') ;
insert into mix values ('Oracle is gOod2') ;








Tom Kyte
August 20, 2005 - 4:57 pm UTC

ops$tkyte@ORA10G> select * from mix where upper(mix_col)<>mix_col and lower(mix_col)<>mix_col;
 
MIX_COL
-------------------------------------------------------------------------------
Oracle is Good1
Oracle is gOod2
 
 

Thanx

A reader, August 20, 2005 - 5:58 pm UTC


use of RegExp_Like

Kamran, September 23, 2005 - 7:03 am UTC

reference: All results used in these question are based on Oracle's default Scott.Emp table
All questions are about second parameter in Regexp_like function

/********** Question # 1**************************************/
select ename from emp
where regexp_like (ename, '[[A*B]$');

OR

select ename from emp
where regexp_like (ename, '[^[A*B]$');

above queries run but not the following one.
Now removing one bracket '[]' from each side:

select ename from emp
where regexp_like (ename, '[A*B$');

ERROR at line 2:
ORA-12726: unmatched bracket in regular expression


why such error is not raised in previous cases.

/***********************************************************/

/************** Question # 2 *******************************/

select ename from emp
where regexp_like (ename, '[A*B]');

using [A*B] or [A.B] or [A-B] results in same. is there any difference.
when to use [] and when not to use?
e.g. '^A' or '^[AL]' means starting with 'A'
where '^AL' or '^(AL)' mean starting with 'AL'

/***********************************************************/

/************** Question # 3 *******************************/
'A|N' means either starting with 'A' or ending with 'N'.
Result is:
ALLEN
MARTIN
ADAMS
How can we get names starting with 'A' and ending with 'N'?
Result should be:
ALLEN

Hopeful for a an elucidating response as ever.

Tom Kyte
September 23, 2005 - 9:37 am UTC

I'm not a regexp expert, but it would appear that [ starts the [] pair and the second [ is considered in the "set"


the second one is not "the same"

ops$tkyte@ORA10GR1> create table t ( x varchar2(5) );

Table created.

ops$tkyte@ORA10GR1> insert into t values ('*');

1 row created.

ops$tkyte@ORA10GR1> select * from t where regexp_like(x,'[A*B]');

X
-----
*

ops$tkyte@ORA10GR1> select * from t where regexp_like(x,'[A.B]');

no rows selected
 

Re: use of RegExp_Like

Anders, September 23, 2005 - 12:17 pm UTC

I'd like to take a stab at this if I may:

The character class markers [...] can not be nested, so '[' in a class just mean '['. Therefore, '[[A*B]$' just matches a string ended by either '[', 'A', '*' or 'B', and '[^[A*B]$' matches a string not ended by '[', 'A', '*' nor 'B'. But normally (outside a character class) '[' have the special meaning of starting a character class and since that have to be ended by a ']', the pattern '[A*B$' contains a syntax error. Had you wanted to match a string ending with the '[', 'A', '*' and 'B' in sequence, you could escape the '[' and '*' and write '\[A\*B$'. (This is just what Tom said).

The pattern '[A*B]' matches any string that contain either 'A', '*' or 'B', the pattern '[A.B]' matches any string that contain either 'A', '.' or 'B' and the pattern '[A-B]' matches any string that contain a character in the range between 'A' and 'B' which is just 'A' or 'B'. So the patterns are all different, but if your input string only contain letters (as emp.ename) you won't see the difference. But you could test it with something like "select v from (select chr(level) v from dual connect by level<=255) where regexp_like(v,'[a*b]')"

"when to use [] and when not to use?" You should use [abc] when you want to match any of a set of expressions. It is kind of a shorthand for (a|b|c). But note that the empty group (eg. the pattern '[]') is not allowed. ']' have no special meaning when placed in the very begining, just as '-' have not special meaning when placed at the begining or end of a group, so the pattern '[][-]' would match any string containing either ']', '[' or '-' (be cause the first ']' is a the start, the second '[' is inside and the '-' is at the end of a group).

'^A' matches any string stared by 'A', but '^[AL]' matches any string started by 'A' or 'L'.
Bot '^AL' and '^(AL)' matches any string started by the characters 'A' and 'L' in sequence by the second one also creates a grouping that you could refer to via a back reference.

'A|N' matches any string containing either 'A' or 'N'. As noted above [abc] is simelair to (a|b|c), so this pattern might also be written with the shorthand '[AN]'.

You could get names starting with 'A' and ending with 'N', by using the pattern '^A.*N$', but "ename like 'A%N'" would probably be faster (to write and execute).

You might want to check out:
</code> http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/ap_posix001.htm http://search.cpan.org/dist/perl/pod/perlre.pod#Version_8_Regular_Expressions <code>
(Oracles regexp documentation is lacking quite a bit, so you might get some inspiration/understanding by looking at the perl docs, but remember that oracles regexp is not perl-compatible so some perl-features is missing)


Tom Kyte
September 23, 2005 - 8:37 pm UTC

10gr2 adds perl like compatibility however....

</code> http://docs.oracle.com/docs/cd/B19306_01/server.102/b14214/chapter1.htm#sthref162 <code>

Thanks

Kamran, September 26, 2005 - 1:24 am UTC

All that reference stuff is very useful. Thanks Tom, and special thanks for Andres.

A reader, January 10, 2006 - 10:40 am UTC

Hi Tom,

My requirement is a little abstract and very difficult to explain but I will try my best.

We are working on a project right now where we are trying to analyze all records from a customer.

Consider a table like

create table names
(
fn1 varchar2(20),
fn2 varchar2(20),
ln1 varchar2(20),
ln2 varchar2(20)
);

insert into names ('ANTHONY', 'ANTHONY', 'JOHN', 'JOHN');
insert into names ('JOHNNY', 'KOHNNY', 'SOMELN', 'SOMELN2');
insert into names ('RACHEL', 'RACQUEL', 'KOLI', 'KOLI');

Now, the customer wants a way to retrieve records that follow some rules:

1. Records which are an exact match or where the first few characters match like the first record(comparison between fn1 and fn2 and ln1 and ln2).
2. Records which are similar matches like the second record where atleast the last five characters match in the firstname and the first six characters match in the lastname.

The inputs will be what to compare (firstname/lastname), how many consecutive characters in the name to compare (a numeric value).

Can we use some kind of pattern matching for this...

Please help.

Thanks.

regular expressions

mo, January 26, 2006 - 8:31 pm UTC

Tom:

I see a big reference that oracle 10g supports regular expressions like perl and awk.

1. Does not 8i and 9i support regular expression using owa_pattern.

2. Would you upgrade a database from 8i to 10g for the regular expression feature.

Thanks,

Tom Kyte
January 27, 2006 - 8:31 am UTC

1) rudimentary support in a plsql package is provided, as opposed to full posix (10gr1) and perl (10gr2) support compiled into the kernel.

2) that would not be my reason for doing that upgrade. It would be one of the thousands of other really big additions (and in particular the supportability - unless you freeze that 8i system, the OS it is on, the hardware, etc - you have a support issue)

regular expressions

mo, January 27, 2006 - 5:00 pm UTC

Tom:

What do you mean by full posix and perl support.

Do you mean owa_pattern provide baic regular expressions while regexp in 10g provides advanced capabilities like perl

Tom Kyte
January 28, 2006 - 12:53 pm UTC

I mean "posix is a standard, it describes how to do regular expressions, we do what they say. perl is a psuedo standard, it also describes how it believes regular expressions should be done. people like that, so we did that too"

owa_match, the plsql package is very rudimentary in its implementation

Best match

Serge Shmygelsky, February 23, 2006 - 9:28 am UTC

Hello Tom,

I was looking through this thread and found a post about matching dialled numbers with prefixes which is closely related to my needs. I've managed to do it in the following way:

SHMYG@REX> desc calls
Name Null? Type
------------------------------- -------- ----------------------------
1 IMSI VARCHAR2(15)
2 B_NUMBER VARCHAR2(10)

SHMYG@REX> desc zones
Name Null? Type
------------------------------- -------- ----------------------------
1 ZONE_PREFIX VARCHAR2(5)
2 ZONE_DES VARCHAR2(20)

SHMYG@REX> select * from calls;

IMSI B_NUMBER
--------------- ----------
1 124
1 1234
1 15
2 126
2 1
SHMYG@REX> select * from zones
2 /

ZONE_ ZONE_DES
----- --------------------
12 zone 12
123 zone 123
1 zone 1
SHMYG@REX> get a
1 select * from (
2 select calls.imsi, calls.b_number, zones.zone_des, length(regexp_substr(calls.b_number, '^' || zones.zone_prefix)) as zone_length,
3 row_number() over (partition by imsi, b_number order by length(regexp_substr(calls.b_number, '^' || zones.zone_prefix)) desc ) rn
4 from calls, zones
5 where length(regexp_substr(calls.b_number, '^' || zones.zone_prefix)) is not null
6 )
7* where rn = 1
SHMYG@REX> /

IMSI B_NUMBER ZONE_DES ZONE_LENGTH RN
--------------- ---------- -------------------- ----------- ----------
1 1234 zone 123 3 1
1 124 zone 12 2 1
1 15 zone 1 1 1
2 1 zone 1 1 1
2 126 zone 12 2 1
SHMYG@REX>

So it looks quite fine.
From your opinion, is this the right approach? Especially performance-wise assuming that CALLS table will have 30 mln records per day and I need to create a report containing number of calls per zone.
I'm using Oracle 10G.

Thanks in advance

Regex and Indexes

Barry Chase, March 28, 2007 - 3:15 pm UTC

Need flexibility of Regular Expressions but need indexing too..

I have a situation where I need to search for last name based upon user input, but also need to search for hyphenated names as well. I want to be able to use in index for performance, however I haven't found a means to do so without sacrificing either search capabilities or performance.

I need to be able to find :

Yost%
Yost-%
%-Yost%
%-Yost
but not %Yost%

The reason for the latter is to prevent simple searches that return undesirable sized result sets.

If I use just LIKE statements... I can create a func index and maintain performance, but lose the %-Yost% and %-Yost searches. If use regular expressions, I get all of the above but lose the index. Since one of the components is a user provided value, I can create a function index for the regular expression. Thus I get a full table scan.

So what are my options ?

REGEXP_LIKE ( REGEXP_REPLACE ( UPPER ( hds.last_name ), ' ', NULL )
, '^(' ||
replace(REGEXP_REPLACE ( UPPER ( replace(LTRIM ( :p_last_name, '-' ),'-','000000')), '\W', ''''),'000000','-') ||
')|(-' ||
replace(REGEXP_REPLACE ( UPPER ( replace(LTRIM ( :p_last_name, '-' ),'-','000000')), '\W', ''''),'000000','-') ||
')')

===========================
Barry Chase
Application Architect
Humana Inc.
Louisville, Kentucky USA
==========================
Tom Kyte
March 30, 2007 - 11:35 am UTC

... I can create a func index and maintain performance, b ...


how would a "function based index" be useful here?


Oracle Text indexing can be used to facilitate index based retrieval for wildcards that start with %

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:37336026927381#37360199308431


Per Abich, April 02, 2007 - 9:59 am UTC

Is there a way, to use one pattern expression to split a complex string in multiple parts? I have i.e. the following string:
IBS_01M_20071013_C

the _ is a sepeartor string.

In other languages I'd use the following pattern:
([[:alpha:]]{3-5})_([[:alpha]]{3})_([[:digit:]]{8})_([[:alpha:]])

and after parsing, I can simply address the data using an index.

some more help on this.

Vijay, August 13, 2007 - 10:18 am UTC

Dear Tom,
good day to you as always, I am trying to get some information using regexp_like but facing one small issue need your help on this.

The table I am working on has a column of varchar2(15) having numbers, the reason to have it varchar2 is the account numbers start with 0 (zero), I want to search all junk account number that have non numeric values so I used the below query

select acc_number from accounts where not regexp_like(acc_number,'[0-9]');

this query list all account number having non numeric values but if there are account number with . (decimal) these are not listed, so I have to use another query

select acc_number from accounts where regexp_like(acc_number,'[.]');

I am not sure why is '.' not being considered in first query, your help in this regard is much appreciated.

Thanks in advance for your time on this.

Regards,
Vijay Sehgal
Tom Kyte
August 15, 2007 - 11:06 am UTC

give example, not really following your text description all of the way.

if you are trying to find things that are ONLY numeric (0..9), I myself prefer the old fashioned:
ops$tkyte%ORA10GR2> select x, replace( translate(x,'0123456789','0'), '0', '' ) newx
  2    from t
  3  /

X          NEWX
---------- ----------
123
12.3       .
12x3       x

ops$tkyte%ORA10GR2> select x
  2    from t
  3   where replace( translate(x,'0123456789','0'), '0', '' ) is null
  4  /

X
----------
123


escape \.

Pasko, August 15, 2007 - 3:09 pm UTC

Hi Vijay,

you have to escape the period character '.'

like this :
---------------------------------------
select acc_number from accounts where regexp_like(acc_number,'[\.]');
---------------------------------------

You can also try to use the alphanumeric [:alnum:]
for example here is a small test case:

create table test(num varchar2(10))
insert into test values( '1');
insert into test values( '1123');
insert into test values( 'aa1123');
insert into test values( 'notnumber');
insert into test values( '123.45');
insert into test values( '0.45');
insert into test values( '.46');
commit;

SQL> select * from test where regexp_like(num,'([\.]|[:alnum:])');

NUM
----------
aa1123
notnumber
123.45
0.45
.46

Pattern Matching

Rich, August 15, 2007 - 4:09 pm UTC

Tom,

Again, thanks for posting this site.

I have a requirement where I need to convert a dollar figure into a ten digit zero padded field with no decimal point and the last two digits are always the decimal portion of the number. I created a fairly complex case statement that seems to work okay but was wondering if I used regular expressions I might be able to create a more concise way to accomplish this. Here is the statment

lpad(replace(case length(substr(to_char(sum(decode(trim(f.frick_value), 'B', a.wage_amount, Null))), instr(to_char(sum(decode(trim(f.frick_value), 'B', a.wage_amount, Null))), '.') + 1, 2))
when 0 then to_char(sum(decode(trim(f.frick_value), 'B', a.wage_amount, Null))) || '00'
when 1 then to_char(sum(decode(trim(f.frick_value), 'B', a.wage_amount, Null))) || '0'
when 2 then to_char(sum(decode(trim(f.frick_value), 'B', a.wage_amount, Null)))
else '0000000000'
end, '.', null), 10, '0') "BONUS"

Any suggestions would be appreciated.

Thanks.
Tom Kyte
August 20, 2007 - 7:22 pm UTC

do you mean this?

ops$tkyte%ORA10GR2> select x, to_char( x, 'fm00000000v00' ) from t;

         X TO_CHAR(X,'
---------- -----------
    123.45 0000012345

thanks for the help on this.

Vijay Sehgal, August 16, 2007 - 2:52 am UTC

Dear Tom,
good day to you as always and thanks for your help on this, I was trying to achieve the same functionality which you have highlighted using replace and translate.

give example, not really following your text description all of the way.

select x, replace( translate x,'0123456789','0'), '0', '' ) newx from t

X NEWX
---------- ----------
123
12.3 .
12x3 x

Is it possible to achieve the same using regexp_like in one query.

Thanks again for your help and time.

Kind Regards,
Vijay Sehgal.

Hi Rich, try the 'fm' format modifier

Pasko, August 16, 2007 - 5:07 am UTC

Hi Rich from NJ,

before you get a better answer from Tom, i had some time and
this is what i came up with, using the fm format modifier.
you can also see examples from this site using this trick.

First i think this is what your Query does:
step 1:
----------------
extract two digits after decimal point from wage_amount
and check
if length of the extracted value is 0
then append '00' at the end of wage_amount
if length of the extracted value is 1
then append '0' at the end of wage_amount
if length of the extracted value is 2
then append nothing
else append ten '0's
----------------
step 2:
remove '.' from the step 1 result
----------------
step 3:
left pad result from step 3 with '0's until length 10

and then i simplified your query a bit:(Tom likes this :) )

select lpad (
replace (
case length (substr (wage_amount, instr (wage_amount, '.') + 1, 2) )
when 0 then wage_amount || '00'
when 1 then wage_amount || '0'
when 2 then wage_amount
else '0000000000'
end,
'.',
null
),
10,
'0'
) "BONUS"
from (select '0000000000' wage_amount
from dual) a
---
and here is what i think would work:
---
select wage_amount ,
replace( case when instr(wage_amount,'.') > 0
then to_char(wage_amount, 'fm00000000D00')
else
to_char(wage_amount, 'fm0000000000')
end,'.') as Bonus
from
( select '1234567890' wage_amount from dual
union all
select '34567890' wage_amount from dual
union all
select '12345678.9' wage_amount from dual
union all
select '12345678.09' wage_amount from dual
union all
select '0234567890' wage_amount from dual
union all
select '00345678.9' wage_amount from dual
union all
select '00045678.09' wage_amount from dual
)

>>Results:
WAGE_AMOUNT BONUS
----------- ------------
1234567890 1234567890
34567890 0034567890
12345678.9 1234567890
12345678.09 1234567809
0234567890 0234567890
00345678.9 0034567890
00045678.09 0004567809

Hope this helps.

By the way, i am the original poster of the first Question
in this thread.
We learn a lot from Tom.

Rich from NJ, USA

Etbin Bras, August 16, 2007 - 5:10 am UTC

how about (simplified for clarity)

lpad(to_char(round(100*nvl(:num,0),0)),10,'0')

using

sum(decode(trim(f.frick_value),'B',a.wage_amount,null))

as :num

regards

Etbin

difference between Pasko's and Etbin's solution

Pasko, August 16, 2007 - 5:20 am UTC

Hi Etbin,
i like your solution,
the only difference between mines and yours is:


SQL> select * from (
  2  select  wage_amount ,
  3     replace( case when instr(wage_amount,'.') > 0
  4                then to_char(wage_amount, 'fm00000000D00')
  5             else
  6                to_char(wage_amount, 'fm0000000000')
  7             end,'.')  as Bonus_Pasko,
  8      lpad(to_char(round(100*nvl(wage_amount,0),0)),10,'0') Bonus_Etbin
  9    from
 10  ( select '1234567890' wage_amount from dual
 11    union all
 12    select '34567890' wage_amount from dual
 13    union all
 14    select '12345678.9' wage_amount from dual
 15    union all
 16    select '12345678.09' wage_amount from dual
 17    union all
 18    select '0234567890' wage_amount from dual
 19    union all
 20    select '00345678.9' wage_amount from dual
 21    union all
 22    select '00045678.09' wage_amount from dual
 23  )
 24  )
 25  where Bonus_Pasko <> Bonus_Etbin;

WAGE_AMOUNT BONUS_PASKO  BONUS_ETBIN
----------- ------------ ----------
34567890    0034567890   3456789000
0234567890  0234567890   2345678900

Regards,

Pasko

Replacing Special Characters

Maverick439, September 13, 2007 - 2:13 pm UTC

Tom,
I need to search and replace all Special characters [characters, that cannot be entered using Keyboard] in Varchar2 field. Is this possible with Regexp_replace function? if so how? I am not sure how to ask regular expression to look for specific ASCII characters ..any help is greatly appreciated.

Thanks,
Tom Kyte
September 15, 2007 - 7:44 pm UTC

... [characters, that cannot be
entered using Keyboard] ...

that doesn't make sense, there are keyboards that can enter anything and using the ALT-key, I can enter anything I want really

# - I put this # there by holding alt and typing the number 35.

you would need to put together a better "requirement", then we can build a string that we can use with replace - to replace away anything you don't want.

try these using Oracle Regular Expressions

Pasko, September 16, 2007 - 4:58 am UTC

Hi Maverick,

may be you can try one of these on your Data and see
if they return what you expected:

[:print:] Printable characters
[:punct:] Punctuation characters
[:space:] Space characters (nonprinting),
such as
carriage return,
newline, vertical tab, and
form feed


Hope this helps.

Replacing Special Characters

Maverick439, September 17, 2007 - 10:01 am UTC

Tom, when i said characters those cannot be entered by keyboard, I meant, special characters like ¿ ¿ ` etc. these characters you cannot use Keyboard to get them[Atleast that's what I thought]. I got these special symbols in MS Word. Most frequently used is ` [Apostrophe].

I need to search for them and replace with nulls. Sorry for not being clear.

Any ideas?? [I'll try Pasko's suggestion. Thanks Pasko].

Tom Kyte
September 18, 2007 - 2:55 pm UTC

but you can enter those through various keyboards all over the planet (you should see some european keyboards sometimes...)

these are known as

`smart quotes¿

believe it or not (smart quotes, ugh)
http://www.fonts.com/AboutFonts/Articles/fyti/Smartquotes.htm

Special Characters

Maverick439, September 18, 2007 - 4:34 pm UTC

Ok, Atleast we are going somewhere. So, How do we eliminate those smart Quotes? I tried Pasko's suggestion, but it did not eliminate those[Doesn't mean I am doing it right though]..

Thanks for your help..
Tom Kyte
September 19, 2007 - 8:51 am UTC

created a file in word with 'smart quotes'
saved it as text
dumped it

221 and 222 are the 'codes'

od -c smart\ quotes.txt
0000000 221   s   m   a   r   t       q   u   o   t   e   s 222  \r  \n
0000020  \r  \n
0000022



so, if you

translate( str, chr(221)||chr(222), chr(39)||chr(39) )

you would turn those smart quotes into dumb ones

did you try this?

Pasko, September 19, 2007 - 9:12 am UTC

Hi Maverick,

did you try like this:

for example to get all non-printable characters:


select * from table
where regexp_like(column , '[^[:print:]]')

Special Characters

Maverick439, September 19, 2007 - 10:16 am UTC

Thanks Tom and Pasko for your help.
Tom, what do you mean when you said Dumped the text file?
Also, 221 and 222 are for all special characters?

Pasko, I tried your suggestion like this:

select column,regexp_replace(column,[^[:print]]',null) from table

it did not replace those characters..did I miss some thing?
Tom Kyte
September 19, 2007 - 1:11 pm UTC

I demonstrated what I did, I used a common unix utility "od" to dump out the file and cut and pasted the results...


In my test, using my charactersets installed with windows and my version of word - 221 and 222 were the decimal codes used to represent these so called smart quotes.

Not much of a help

Maverick439, September 19, 2007 - 1:28 pm UTC

Tom, I checked chr 221 and 222 and they are smart characters alright. But If I take your approach, I have to find chr() value for each and every special character out there and try to implement in a loop to replace them. I wish it would be simple like regexp(replace(column,'smartquote',null). I cannot use [[:print:]] posix notation as these are all printable characters..I tried
[^[:graph:]]..but no use with that either.

Thanks
Tom Kyte
September 20, 2007 - 9:20 pm UTC

this is why I'm trying to tell you that YOU have to identify what is special to YOU.

And what you want to replace them with. Eg: 221 and 222 should be replaced with 39

when you find the rest of them, you have to pretty much tell us what you want them to be replaced with.

Think about this please - these are only special to you, to the rest of the world, they are normal printable characters.

Special Characters

Maverick439, September 21, 2007 - 12:44 pm UTC

Thanks Tom. I understand what you are saying [now!!]. I don't know what are not allowed. I only know what are allowed and those are somewhere between 90 and 102 characters[a big list] that we can enter from keyboard. So, rest are all not allowed. I'll try to filter them someway.

Thanks for all the help.


try NOT ( all characters you know ) ..

Pasko, September 21, 2007 - 4:11 pm UTC

Hi Maverick,
may be it's better if you try to replace all the characters
you do not know :)


regexp_replace(column , '[^a-zA-Z0-9]' )

Special Characters

Maverick439, September 23, 2007 - 2:40 pm UTC

Actually I tried this way after struggling for a while:


 select regexp_replace(Description,'[^][(A-Za-z0-9!@#$%&*() _+={}?^\|<>,.\''~`\\/":;\-][:spaces:]','')
     into l_desc 
     from dual;


I know I wanted all those characters and only those. I was wondering if there was any POSIX notation for special characters. Actually there is ..[:PUNCT:] .but it includes every thing..
But this works for me.

Limited Use of Oracle's Regular Expression?

A reader, September 09, 2008 - 1:22 pm UTC

Based on the tutorial

http://www.oracle.com/technology/obe/10gr2_db_single/develop/regexp/regexp_otn.htm

it appears that there are 4 functions that can be used for regular expressions:

REGEXP_LIKE
REGEXP_INSTR
REGEXP_REPLACE
REGEXP_SUBSTR

Now I want to write a SQL using regular expression to return strings that contain ONLY numbers. I try the following.

CREATE TABLE x (
   col VARCHAR2(20)
);

INSERT INTO x VALUES ('123');
INSERT INTO x VALUES ('abc');
INSERT INTO x VALUES ('ABC');
INSERT INTO x VALUES ('abcdef');
INSERT INTO x VALUES ('abc11');
INSERT INTO x VALUES ('ab39c');
INSERT INTO x VALUES ('45abc');
INSERT INTO x VALUES ('abc!');
INSERT INTO x VALUES ('abc#');
INSERT INTO x VALUES ('abc%');
INSERT INTO x VALUES ('###');

COMMIT;

SELECT col
FROM   x
WHERE  regexp_like(col, '[0-9]');

COL
--------------------
123
abc11
ab39c
45abc



The SQL does not return the desired result. My guess is that because of the "like" function that I used, it returns all strings that contain one or more numbers. How do I use regular expression to return only what I ask it to match? In this case, I want strings that contain only numbers.
Tom Kyte
September 10, 2008 - 9:22 am UTC

example here:
http://laurentschneider.com/wordpress/2007/09/isnumber-in-sql.html

but I would suggest something easier and faster maybe

where replace( translate( col, '0123456789','0000000000'), '0', '') is null

turn all digits into zero, turn all zeroes into nothing, if you end up with nothing - it was either nothing to begin with or was all digits.

expected pattern

Nicosa, September 10, 2008 - 10:27 am UTC

Hello,

this in answer to "A reader" (previous followup) :
I think the pattern you should have used is REGEXP_LIKE(col,'^[0-9]+$')

(Anyway, Tom's answer might still be better performance wise)

Re:Limited Use of Oracle's Regular Expression?

Brian Camire, September 10, 2008 - 10:47 am UTC

Here's another way:

SQL> SELECT
  2      *
  3  FROM
  4      (
  5          SELECT '123' AS COL FROM DUAL
  6          UNION ALL SELECT 'abc' FROM DUAL
  7          UNION ALL SELECT 'ABC' FROM DUAL
  8          UNION ALL SELECT 'abcdef' FROM DUAL
  9          UNION ALL SELECT 'abc11' FROM DUAL
 10          UNION ALL SELECT 'ab39c' FROM DUAL
 11          UNION ALL SELECT '45abc' FROM DUAL
 12          UNION ALL SELECT 'abc!' FROM DUAL
 13          UNION ALL SELECT 'abc#' FROM DUAL
 14          UNION ALL SELECT 'abc%' FROM DUAL
 15          UNION ALL SELECT '###' FROM DUAL
 16          UNION ALL SELECT NULL FROM DUAL
 17      )
 18  WHERE
 19      NOT REGEXP_LIKE(COL, '[^0-9]');

COL
------
123


In other words, "where COL does not contain a character that is not a digit".

Is this a bug?

A reader, September 10, 2008 - 12:38 pm UTC


CREATE TABLE x (
col VARCHAR2(20)
);

INSERT INTO x VALUES ('123');
INSERT INTO x VALUES ('abc');
INSERT INTO x VALUES ('ABC');
INSERT INTO x VALUES ('abcdef');
INSERT INTO x VALUES ('abc11');
INSERT INTO x VALUES ('ab39c');
INSERT INTO x VALUES ('45abc');
INSERT INTO x VALUES ('abc!');
INSERT INTO x VALUES ('abc#');
INSERT INTO x VALUES ('abc%');
INSERT INTO x VALUES ('###');

COMMIT;

-- ---------------------------------------------------------------------------------------------
-- After applying the translate function, why do we get NULLs for alphanumeric strings?
-- ---------------------------------------------------------------------------------------------
SELECT col, TRANSLATE(LOWER(col), LOWER('0123456789abcdefghijklmnopqrstuvwxyz'), '0') translated
FROM x;

COL TRANSLATED
------- ----------
123
abc
ABC
abcdef
abc11
ab39c
45abc
abc! !
abc# #
abc% %
### ###

-- ---------------------------------------------------------------------------------------------
-- Puzzling result - Why doesn't '123' get translated to '000' instead?
-- ---------------------------------------------------------------------------------------------
SELECT col, TRANSLATE(LOWER(col), LOWER('0123456789abcdefghijklmnopqrstuvwxyz'), '000') translated
FROM x;

COL TRANSLATED
------- ----------
123 00
abc
ABC
abcdef
abc11 00
ab39c
45abc
abc! !
abc# #
abc% %
### ###

Tom Kyte
September 11, 2008 - 11:11 am UTC

ops$tkyte%ORA10GR2> select translate( 'abc', 'abc', 'X' ) from dual;

T
-
X

ops$tkyte%ORA10GR2> select translate( 'abc', 'abc', 'XY' ) from dual;

TR
--
XY

ops$tkyte%ORA10GR2> select translate( 'abc', 'abc', 'XYZ' ) from dual;

TRA
---
XYZ


Your TRANSLATE(LOWER(col),
LOWER('0123456789abcdefghijklmnopqrstuvwxyz'), '0')

turned zero into zero and 1..9a..z into NULL - that is why.


-- Puzzling result - Why doesn't '123' get translated to '000' instead?
--
--------------------------------------------------------------------------------
-------------
SELECT col, TRANSLATE(LOWER(col),
LOWER('0123456789abcdefghijklmnopqrstuvwxyz'), '000') translated


because you turned 0 into 0, 1 into 0, 2 into 0 and 3..9a..z into NULL

RE: Is this a bug?

Brian Camire, September 10, 2008 - 3:29 pm UTC

This is the documented behavior of TRANSLATE. The documentation at http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions196.htm#i1501659 says:

TRANSLATE returns expr [the first argument] with all occurrences of each character in from_string [the second argument] replaced by its corresponding character in to_string [the third argument]. Characters in expr that are not in from_string are not replaced. If expr is a character string, then you must enclose it in single quotation marks. The argument from_string can contain more characters than to_string. In this case, the extra characters at the end of from_string have no corresponding characters in to_string. If these extra characters appear in char [this should be "expr"], then they are removed from the return value.

So:

1. In your first example, all letters a-z and all digits except zero are being removed. For the alphanumeric strings, there are no zeroes, so there are no characters left, leaving an empty (null) string.

2. In your second example, '123' gets translated to '00' because a '0' (if there was one) would get translated to '0', the '1' gets translated to '0' (the second '0' in the to_string), and the '2' gets translated to '0' (the third '0' in the to_string. There's no character in the to_string corresponding to the '3' in the from string, so it gets removed.

Hope this helps.

appropriate indexing for regexp searching

Chris Gould, October 17, 2008 - 4:49 am UTC

I have a requirement similar to the original one in this thread. I have a database of customer phone numbers which are held as varchar2, many containing non-numeric characters (eg spaces, brackets, hyphens etc. example : +44 (208) 123-4566 )
We have a help-desk call routing system which tries to identify customers from the phone-number they call in on and routes the call accordingly. The call-routing system always gives the incoming call number as a string of all digits. eg 02081234566 in the case of the example above.
What I've done so far is to build a table specifically for this search using REGEXP_REPLACE to eliminate all non-digits from the stored numbers.
eg
create table cleansed_data
as select ACCOUNT_ID
         ,regexp_replace(phonenumber,'([^[:digit:]])',NULL) as phone_number
  from CUST_CONTACTS
   where phonenumber is not null;

Also, when I try to match the incoming call number against the cleansed data, I ignore the leading zeros on the incoming number and try to match against my stored numbers from the right-hand end of the stored-phone number
eg
select * from cleansed_data
where regexp_like(phone_number,'(&inphone)$')  

(not showing the elimination of leading digits here).
My question is regarding what indexes (if any) would be appropriate on my cleansed_data table to make the REGEXP_LIKE search more efficient. (I have about 3million rows in the table being searched).



Tom Kyte
October 17, 2008 - 9:36 pm UTC

why create the table - why not just create a function based index on the column itself?


ops$tkyte%ORA10GR2> create table t ( phonenumber varchar2(40), data char(1000) );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create index t_idx on t(regexp_replace(phonenumber,'([^[:digit:]])',NULL));

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'T', numrows=> 1000000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> variable x varchar2(40)
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where regexp_replace(phonenumber,'([^[:digit:]])',NULL) = :x;

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 10000 |   976K|     5   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| T     | 10000 |   976K|     5   (0
|*  2 |   INDEX RANGE SCAN          | T_IDX |  4000 |       |     1   (0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access( REGEXP_REPLACE ("PHONENUMBER",'([^[:digit:]])',NULL)=:X)


No auxiliary table to maintain, no muss, no fuss.


Chris Gould, October 23, 2008 - 4:59 am UTC

But I want to use regexp_like to match on the string starting from the right-hand end (the $ anchor in the regexp_like example I gave). Won't your solution still match from the left-hand end of the string?
The phone number I have stored may or may not include the country-code for example. I still want a match regardless of whether that's stored or not.
So the incoming number I'm trying to match might be 2223333444 (I eliminate the leading zeros from the incoming number).
The numbers I have stored might be 00442223333444 or equally +44-222-333-4444 or may omit the country-code 0222 333-4444
I need the incoming number to match with all 3 examples here. (I have other code later which tries to narrow the list down by considering other attributes of the account).

Tom Kyte
October 23, 2008 - 1:39 pm UTC

if you want to use a fbi- you will come up with a function that can be indexed.

can you come up with such a function? it can use regular expressions and all..


let me ask you - can you create a function that can be indexed that would help you.

I indexed the data you had pre-created, is that sufficient for your needs. You know how an index works - can you use this technique.

Chris Gould, October 23, 2008 - 5:05 am UTC

Sorry : bad examples in my last post.
What I meant to say was >>>

So the incoming number I'm trying to match might be 223334444 (I eliminate the leading zeros from the incoming number).
The numbers I have stored might be 0044223334444 or equally +44-22-333-4444 or may omit the country-code 022 333-4444
I need the incoming number to match with all 3 examples here. (I have other code later which tries to narrow the list down by considering other attributes of the account).

A reader, October 23, 2008 - 8:57 pm UTC

Thanks, as always. I benefit so much from this site . . .

I don't mean to nit-pick and possibly shave off only a hundredth of a percent,
but I believe your Sept 10 suggestion to "A reader" could be optimized/simplified further by:

where translate(col,'~0123456789','~') is null

(Really any non-numeric leading character could be used instead of the ~.)

B

Chris Gould, October 27, 2008 - 8:14 am UTC

Hang on - it was you that suggested using a function-based index, not me! I only wanted to know if it was possible to create an index on my table of cleansed data that would work when I used REGEXP_LIKE(), using an expression that matched from the right-hand end of a string.

I can't see a way of creating a function-based index that would be of any help with this search. Hence my original question : is there ANY index I could create that would be of use?

Tom Kyte
October 27, 2008 - 9:30 am UTC

Ok, not being a heavy user of regular expression, I misinterpreted the "(&x)$", totally.


This is a combination of "index a function and do a fast leading wildcard search"

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9455353124561

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:37336026927381#37360199308431

So, here goes:

ops$tkyte%ORA10GR2> create table t ( phonenumber varchar2(30) );

Table created.

<b>Your table...</b>


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
  2      ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
  3      ctx_ddl.set_attribute('SUBSTRING_PREF', 'SUBSTRING_INDEX','TRUE');
  4  end;
  5  /

PL/SQL procedure successfully completed.

<b>for fast leading edge wildcard searching...</b>


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create or replace procedure clean_phone_number( p_rowid in rowid, p_return in out nocopy varchar2 )
  2  as
  3          l_string t.phonenumber%type;
  4  begin
  5          select phonenumber into l_string from t where rowid = p_rowid;
  6          p_return := regexp_replace(l_string,'([^[:digit:]])',NULL);
  7  exception
  8          when no_data_found then raise program_error;
  9  end;
 10  /

Procedure created.

<b>this'll be the procedure we index...</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
  2          ctx_ddl.create_preference('my_user_datastore', 'user_datastore');
  3          ctx_ddl.set_attribute('my_user_datastore', 'procedure', '"OPS$TKYTE"."CLEAN_PHONE_NUMBER"');
  4          ctx_ddl.set_attribute('my_user_datastore', 'output_type', 'VARCHAR2');
  5  end;
  6  /

PL/SQL procedure successfully completed.

<b>tell text to use our procedure....</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create index search_idx on T( phonenumber )
  2  indextype is ctxsys.context parameters
  3  ('wordlist SUBSTRING_PREF datastore my_user_datastore sync(on commit)');

Index created.

<b>and create the index. Now:</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t values ( '0044223334444' );

1 row created.

ops$tkyte%ORA10GR2> insert into t values ( '+44-22-333-4444' );

1 row created.

ops$tkyte%ORA10GR2> insert into t values ( '022 333-4444' );

1 row created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2> variable x varchar2(30)
ops$tkyte%ORA10GR2> exec :x := '223334444';

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from t where contains( phonenumber, '%' || :x ) > 0;

PHONENUMBER
------------------------------
0044223334444
+44-22-333-4444
022 333-4444


Chris Gould, October 27, 2008 - 10:29 am UTC

Brilliant! Thanks for the extremely prompt and comprehensive reply. I'm also not that familiar with regular expressions, but they've been very useful so far in this data-cleansing / matching problem I've been working on.


followup

Chris Gould, October 28, 2008 - 9:11 am UTC

Tom -
just as a follow-up, I tried out the Oracle-text based solution and the query performance using the index was very good. However I hit several performance problems maintaining the data in the table. Creating the index on table with 800,000 rows took over 30mins. And inserting a new row into the table with the index in place sometimes took several minutes (the insert itself was milli-seconds, but the commit took minutes). I checked that it wasn't a case of waiting for locks - it wasn't.
There seem to be several documented cases on Metalink related to poor performance inserts/updates on tables with Oracle-Text indexes.

What I've done instead is similar to your original suggestion. I've created a function and a function-based index and use the LIKE operator to do the match.
The function uses REGEXP_REPLACE to eliminate non-digits but also reverses the order of the digits so that func('123-456-789') = '987654321'
I then use the LIKE operator in my where clause instead of the REGEXP_LIKE, since I now can match from the left-hand end of the indexed-value (I don't need the REGEXP "$" to match from the right-hand end).
eg.
create or replace function f_revstr(i_string in varchar2)
return varchar2 DETERMINISTIC
AS
  v_rev     varchar2(4000);
  v_len     pls_integer;
  v_clean   varchar2(4000);
begin
  
  v_clean := regexp_replace(i_string,'([^[:digit:]])',NULL);
  v_len   := length(v_clean);
  
  for i in reverse 1 .. v_len
  loop
     v_rev := v_rev||substr(v_clean,i,1);
  end loop;
  
  return v_rev;

end f_revstr;
/

Index :
create index phone_numbers_fbi on phone_numbers(f_revstr(orig_phone_number));

Query:
select * from phone_numbers
where f_revstr(orig_phone_number) like f_revstr('12-34-56 789')||'%'
/

This seems to do the trick. The index creates in 20 seconds, inserts/updates are more or less instantaneous and queries using the index take only milliseconds. Having read up a bit more on Oracle Text (aka interMedia in your book "Expert Oracle") what I want to do is only a fraction of the capability of Oracle Text, so probably a simpler solution is more appropriate here.


String matching

Prakash, February 13, 2009 - 7:45 am UTC

Tom,
I need to match 2 character columns in 2 tables and the match is fuzzy. The field is 12 characters long and if ANY 6 consecutive characters are the same then I need to consider it a match.

For example:

A8UI0$WER76$ and HGDU0$WER78*
KL6WER78*YIZ and UL6WER78*MBV are matches because atleast 6 consecutive characters match in the 2 columns.

I can use INSTR and SUBSTR to match a 6 characters but that only works if I the match positions are fixed. My requirement if that I need to match any 6 consecutive characters in the string starting anywhere - it could be starting from the first character to the 7th character.

Is there a way to do this in SQL using any in-built functions including Regular expressions?

I have learned so much from your site over the years and I would like to provide my sincerest Thanks for this amazing service you provide to the Oracle community.

Prakash
Tom Kyte
February 16, 2009 - 11:06 am UTC

well, most applications would involve a cartesian join, this does too - but "a pair of mini cartesian joins"

ops$tkyte%ORA10GR2> create table t1 ( x varchar2(12) );

Table created.

ops$tkyte%ORA10GR2> create table t2 ( y varchar2(12) );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values ( 'A8UI0$WER76$' );

1 row created.

ops$tkyte%ORA10GR2> insert into t2 values ( 'HGDU0$WER78*' );

1 row created.

ops$tkyte%ORA10GR2> insert into t1 values ( 'KL6WER78*YIZ' );

1 row created.

ops$tkyte%ORA10GR2> insert into t2 values ( 'UL6WER78*MBV' );

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> with data as (select level l from dual connect by level <= 7)
  2  select 1, x, l, substr( x, l, 6 ) ss
  3    from t1, data
  4   union all
  5  select 2, y, l, substr( y, l, 6 ) ss
  6    from t2, data
  7   order by 1, 2, 3
  8  /

         1 X                     L SS
---------- ------------ ---------- ------
         1 A8UI0$WER76$          1 A8UI0$
         1 A8UI0$WER76$          2 8UI0$W
         1 A8UI0$WER76$          3 UI0$WE
         1 A8UI0$WER76$          4 I0$WER
         1 A8UI0$WER76$          5 0$WER7
         1 A8UI0$WER76$          6 $WER76
         1 A8UI0$WER76$          7 WER76$
         1 KL6WER78*YIZ          1 KL6WER
         1 KL6WER78*YIZ          2 L6WER7
         1 KL6WER78*YIZ          3 6WER78
         1 KL6WER78*YIZ          4 WER78*
         1 KL6WER78*YIZ          5 ER78*Y
         1 KL6WER78*YIZ          6 R78*YI
         1 KL6WER78*YIZ          7 78*YIZ
         2 HGDU0$WER78*          1 HGDU0$
         2 HGDU0$WER78*          2 GDU0$W
         2 HGDU0$WER78*          3 DU0$WE
         2 HGDU0$WER78*          4 U0$WER
         2 HGDU0$WER78*          5 0$WER7
         2 HGDU0$WER78*          6 $WER78
         2 HGDU0$WER78*          7 WER78*
         2 UL6WER78*MBV          1 UL6WER
         2 UL6WER78*MBV          2 L6WER7
         2 UL6WER78*MBV          3 6WER78
         2 UL6WER78*MBV          4 WER78*
         2 UL6WER78*MBV          5 ER78*M
         2 UL6WER78*MBV          6 R78*MB
         2 UL6WER78*MBV          7 78*MBV

28 rows selected.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> with data as (select level l from dual connect by level <= 7)
  2  select t1_data.x, t2_data.y, count(*)
  3    from (
  4  select 1, x, l, substr( x, l, 6 ) ss
  5    from t1, data
  6         ) t1_data,
  7             (
  8  select 2, y, l, substr( y, l, 6 ) ss
  9    from t2, data
 10         ) t2_data
 11   where t1_data.ss = t2_data.ss
 12   group by t1_data.x, t2_data.y
 13  /

X            Y              COUNT(*)
------------ ------------ ----------
KL6WER78*YIZ HGDU0$WER78*          1
KL6WER78*YIZ UL6WER78*MBV          3
A8UI0$WER76$ HGDU0$WER78*          1


Excellent

Prakash, February 17, 2009 - 11:46 am UTC

Tom,
Thank you for the solution. Exactly what I was looking for. You come through again and again for thousands of users across the globe. I don't know of another person providing the kind of service you are providing for any other technology/solution. You are truly Peerless...

Prakash

regexp_substr Unexpected Result (10.1.0.4)

Richard, March 13, 2009 - 1:45 pm UTC

Given:
create table t ( ref_codes varchar2(20));
insert into t values ('A00000');
insert into t values ('100000');
insert into t values ('10000A');
insert into t values ('B0000A-  1');
insert into t values ('B0000A-311');
insert into t values ('B0000A/CTK');
insert into t values ('B0000A, 11');
insert into t values ('10000009');
insert into t values ('B0000A A00000 100000');
insert into t values ('B0000A A00000 ');
insert into t values ('B0000A A00001');
insert into t values ('B0000A/A00000');
commit;


I need to extract up to 3 reference codes that are 6 characters in length when separated by a non-alphanumeric character.

select ref_codes,
       regexp_substr(ref_codes,'[[:alnum:]][[:digit:]]{4}[[:alnum:]]',1) ref1,
       regexp_substr(ref_codes,'[[:alnum:]][[:digit:]]{4}[[:alnum:]]',2) ref2,
       regexp_substr(ref_codes,'[[:alnum:]][[:digit:]]{4}[[:alnum:]]',3) ref3
  from t;

Expected Output:

REF_CODES            REF1    REF2    REF3 
A00000               A00000
100000               100000
10000A               10000A
B0000A-  1           B0000A
B0000A-311           B0000A
B0000A/CTK           B0000A
B0000A, 11           B0000A
10000009
B0000A A00000 100000 B0000A  A00000  100000
B0000A A00000        B0000A  A00000
B0000A A00001        B0000A  A00001
B0000A/A00000        B0000A  A00000


However, records 7-11 come out as:
REF_CODES            REF1    REF2    REF3 
10000009             100000  000000  000009
B0000A A00000 100000 B0000A  A00000  A00000
B0000A A00000        B0000A  A00000  A00000
B0000A A00001        B0000A  A00001  A00001
B0000A/A00000        B0000A  A00000  A00000

What do I need to add to the pattern string?
Tom Kyte
March 13, 2009 - 3:16 pm UTC

sorry, not a regexp expert (hate em actually)...

I see exactly why you get the output you are (it is exactly what I would have expected - you expect the regexp to move on to BEYOND where you last stopped, it doesn't, it picks up at the next character after you started...)

how about you specify in text, in the form of a requirement, what you want - forget regexp (you have some code that doesn't work, not useful, we need a specification that dictates exactly, precisely what you want in the form of a requirement, a specification - not code, really not code that does not work).


it is likely you'll need a series of "regexp substr", then "regexp substr AFTER getting rid of the first regexp substr stuff" - eg: repexp replace - then substr


Huh?

A reader, March 13, 2009 - 3:41 pm UTC

Hi Tom,

I'm a bit surprised when you say you 'hate' regexp. That's quite a statement.
I know you hate 'when others not followed by a raise', you hate triggers and I agree fully on that and fully understand your 'crusade' against them (making their caveats very clear), but the regexp_replace saves me from, lets say: about 42 substr/instr/translate etc. function calls, making my code hardly readable (or understandable), while using regexp turns it all into one call in a SQL statement.
What makes you hate the regexp builtin? What are it's shortcomings in your opinion? Any caveats?
I agree, it's a puzzle sometimes, but isn't puzzling fun?
Or is the regexp 'just not your kinda thing'?
Ah...so many questions do rise now ;-)

Oh, and most of times I use regexp_replace, that's one difference. But still a regexp.


Best regards,
Martijn




Tom Kyte
March 13, 2009 - 4:05 pm UTC

Ok, I strongly dislike complex regular expressions. Because I've never gotten my head around them entirely. It is a puzzle without commentary most of the times - If there was a long commentary saying "this is what this does and how it does it" (like I many times do with my 'tricky queries'), maybe it would be different.

so, maybe it is the lack of commentary that bothers me :)

reg exps and comments

rc, March 13, 2009 - 4:17 pm UTC

Some regular expressions flavors like perl's make it possible to include comments. Maybe that would help?
Tom Kyte
March 16, 2009 - 8:46 am UTC

well, everything can be commented and yes - that would definitely help almost everything :)

A reader, March 13, 2009 - 4:50 pm UTC

"so, maybe it is the lack of commentary that bothers me :)"

Without clear documentation, specifications and honest comments and an open mind/ healthy discussion, we're all lost, so let's keep on preventing that from happening, therefore: 'why?' should always be meta-question #1 ;-)

But I'm glad anyway you only 'dislike' regexps, and not 'hate' them anymore, almost thought I took a wrong path :-)

Thanks &
Best regards,

Martijn

re: regexp_substr Unexpected Result (10.1.0.4)

Richard, March 13, 2009 - 11:08 pm UTC

Requirement spec:

The reference codes field is 20 characters and may contain up to 3 valid reference codes; valid reference codes are 6 characters each.

Each code may be separated by space, comma, or /.

The field may contain a combination of 1 valid reference code and extra info following a /, -, or space.

The field may contain a non-standard reference code of 8 or more characters; these can be treated as nulls.


Tom Kyte
March 16, 2009 - 9:01 am UTC

... Each code may be separated by space, comma, or /. ....


each code may be, or will be - and by "a" space, comma or '/' or by any sequence of space+comma+'/' (eg: / ,,,,,, ,////// ////////,,,,,,,,,,,/ would be a valid, single separator.)


and if "...and extra info following a /, -, or space. ...." is true, how could '/' and ' ' be separators?

Richard, March 16, 2009 - 10:52 am UTC

As originally posted, these are all the formats contained in this field:

insert into t values ('10000A');
insert into t values ('B0000A-  1');
insert into t values ('B0000A-311');
insert into t values ('B0000A/CTK');
insert into t values ('B0000A, 11');
insert into t values ('10000009');
insert into t values ('B0000A A00000 100000');
insert into t values ('B0000A A00000 ');
insert into t values ('B0000A A00001');

Tom Kyte
March 17, 2009 - 10:23 am UTC

I just need you to answer a question - in text.

As originally posted was a small snippet, rarely (if ever) do we get test cases that test all of the boundary conditions. Your specifications are in conflict with each other.


...
Each code may be separated by space, comma, or /.

The field may contain a combination of 1 valid reference code and extra info following a /, -, or space.
.......


Ok, then

insert into t values ('B0000A/CTK');
is the same as

insert into t values ('B0000A CTK');

and ctk should be considered a field all in it's own right? So, what if you have

insert into t values ('B0000A/CTK B0000B/CTK B0000C/X');

Now we have, by your rules, six fields, the first three of which are the only valid ones to look at - you would get as output just the B0000A & B0000B items, B0000C would not be considered

So, that is probably not what you wanted - and Just because your example doesn't contain that exact insert doesn't mean it won't be a possible value

So, before I spend any time (this is going to be doable I believe), I want you to take a couple of seconds and just spell it out - I know to you it is *obvious* because it is your data, you live with it, you look at it every day. To me it is just data, I need to understand what the real rules are.


I believe the separator is really just "a space", not " ,/"

Re: String matching

Parthiban Nagarajan, March 17, 2009 - 11:57 pm UTC

Hi Tom

The RegEx pack in Oracle is awesome.  Just now I started learning RegEx. And here is the alternate solution for "String matching - PRAKASH from Dallas" using RegEx.

create table t1 ( x varchar2(12) );
create table t2 ( y varchar2(12) );
insert into t1 values ( 'A8UI0$WER76$' );
insert into t2 values ( 'HGDU0$WER78*' );
insert into t1 values ( 'KL6WER78*YIZ' );
insert into t2 values ( 'UL6WER78*MBV' );

SQL> col src format a24
SQL> select t1.x||t2.y src,
            case when regexp_like(t1.x||t2.y, '(.{6}).*\1') then 1 else 0 end as match,
            substr(regexp_substr(t1.x||t2.y, '(.{6}).*\1'),1,6) matched_str
       from t1, t2;

SRC                           MATCH MATCHED_STR
------------------------ ---------- ------------------
A8UI0$WER76$HGDU0$WER78*          1 0$WER7
A8UI0$WER76$UL6WER78*MBV          0
KL6WER78*YIZHGDU0$WER78*          1 WER78*
KL6WER78*YIZUL6WER78*MBV          1 L6WER7

The SQL became very much simpler only because of the power of RegEx.

Comments:
---------
 .      -> Find a character
 .{6}   -> Find six consecutive characters
 (.{6}) -> Remember those six consecutive characters
 .*     -> Find any character of arbitrary length
 \1     -> Find the 1st remembered pattern again



Tom,
You are a genius in Oracle and its SQL. Common people cannot understand SQL. And developers of a little knowledge (like me ;) cannot understand some complex and tricky SQL written by geniuses like you, even with comments to describe it. Guys with RegEx background dont need the description. They see the world as patterns. They can speak and understand the RegEx language. The thing is you dont have time to MASTER RegEx. You have been kept away from RegEx because of some other busy things (For eg: This ASKTOM site). We are learning Oracle from you and we dont want to miss RegEx tricks from you. When you miss RegEx then WE miss a Guru for RegEx.
Tom Kyte
March 18, 2009 - 8:15 am UTC

... cannot understand some complex and tricky SQL written by geniuses like you, even with comments to describe it. ...


sorry, that just cannot be true.


how about you take your solution and mine and scale it up and see which "performs"... I have a suspicion I know the answer...

comments

rc, March 18, 2009 - 5:34 am UTC

Sure everything can be commented but is it possible to include comments inside the Oracle reg exp?
Tom Kyte
March 18, 2009 - 8:19 am UTC

why would it have to be - OR EVEN SHOULD be "in the code, right in the middle of it all"?

I never use comments IN my sql (rarely I guess I should say, I comment out bits sometimes)

I preface my sql with a comment, it would come as a comment block


for /* this is my loop */ ( i = 0 /* set i to zero */; i++ /*increment i */; i<10 /* we need to do this for i in 0..9 */ )
{

I wouldn't recommend that, and I wouldn't recommend a comment "in" a regular expression.

comments

rc, March 18, 2009 - 10:29 am UTC

Well others have a different 'vision' on comments inside the reg exp.

See for example here: http://www.perl.com/pub/a/2004/01/16/regexps.html
Tom Kyte
March 19, 2009 - 9:53 am UTC

that is fine, all I was saying is "it wouldn't really matter to me if it doesn't support comments in the regular expression, I'm of the old school that uses comment blocks above code"

regular expression

A reader, April 27, 2009 - 10:20 am UTC

Hi Tom

I am still trying to understand the 10g regexp* functions.
I have a question on this .

I have a table where I have data like 'JUN10' , 'AUG09','SEPT11' , 'OCTOBER' ;

so I want to write a query using regexp_like to get the rows where I have 2 last characters are always numeric (number) .

APPS@DEVL1-2504>create table t ( ss varchar2(10)) ;

Table created.

Elapsed: 00:00:01.53
insert into t values ('Jun10' ) ;

1 row created.

Elapsed: 00:00:01.57
commit ;

Commit complete.

Elapsed: 00:00:01.67
insert into t values ('ABC') ;

so the query should give me only one row

that is JUN10.

can you please give some simple examples of using regexp
thanks
Tom Kyte
April 27, 2009 - 2:27 pm UTC

ops$tkyte%ORA10GR2> create table t ( ss varchar2(10)) ;

Table created.

ops$tkyte%ORA10GR2> insert into t values ('Jun10' ) ;

1 row created.

ops$tkyte%ORA10GR2> insert into t values ('ABC') ;

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from t where regexp_like( ss, '.*[0-9][0-9]$' );

SS
----------
Jun10



You'll want to read through various regular expression tutorials (they are not an "Oracle-ism", they are widely used in many bits of software)

http://www.google.com/search?q=regular+expression+tutorial

great

A reader, April 28, 2009 - 3:43 am UTC

Thanks Tom, really good stuff, i will read thorugh the suggested URLs and try to understand it better.

Potential solution

Scott, July 09, 2009 - 5:21 am UTC

I'm facing a similar battle to Maverick back in 07.

Something I found since when I was inserting this information, it seemed to take up more length than I was expecting.

10gr2> select length('that¿more'), lengthb('that¿more') from dual;

LENGTH('THAT¿MORE') LENGTHB('THAT¿MORE')
------------------- --------------------
                  9                   10

Well, this identifies the information... next step is removing. Depends on what you need to do...

CONVERT

Scott, July 09, 2009 - 10:47 pm UTC

10gr2> select convert('aéñàueoüûç','US7ASCII','UTF8') yes from dual
/

YES
----------
ae?aueouuc

Even better.

Teasing out congruent sets of numbers

J from Vancouver, November 05, 2009 - 3:36 pm UTC

Hello Tom (all),

Here's my setup:

drop table jt;
create table jt (idn number(2), str varchar2(30));
insert into jt (idn,str) values (1,'10');
insert into jt (idn,str) values (2,'Test 200');
insert into jt (idn,str) values (3,'Test 3000 ok');
insert into jt (idn,str) values (4,'4000 Test');
insert into jt (idn,str) values (5,'Test 5.00 - ok');
insert into jt (idn,str) values (6,'Test -60.0 ');
insert into jt (idn,str) values (7,'Test 7 but not 1');
insert into jt (idn,str) values (8,'Test 8 but not 2 anywhere');
insert into jt (idn,str) values (9,'Test number nine');
insert into jt (idn,str) values (10,' Test.10');
insert into jt (idn,str) values (11,'11.1.2.3');
commit;

So what I'd like to get is the IDN and numeric value
of STR (including leading minus signs and decimal points),
provided that there is no more than ONE distinct
set of congruent numbers. For example:

IDN NUM_VAL
--- -------
1 10
2 200
3 3000
4 4000
5 5.0
6 -60.0
10 .10

Rows 7 and 8 are excluded because they have two sets of congruent numbers. Row 9 is excluded because the string has no numerics. Row 11 is excluded because 11.1.2.3 is not a number.

So I wrote a little "isnumeric" function which traps an implicit TO_NUMBER:

create or replace function isnumeric
(p_string in varchar2)
return varchar2
as
l_number number;
begin
l_number := nvl(p_string,'x');
return 'Y';
exception
when others then
return 'N';
end;
/

And wrote up a regular expression query which ALMOST gets me there:

select idn,
rtrim(regexp_replace(trim(str),'[^-.0123456789]'),'-') num_val
from jt
where isnumeric(rtrim(regexp_replace(trim(str),'[^-.0123456789]'),'-')) = 'Y';

IDN NUM_VAL
--- -------
1 10
2 200
3 3000
4 4000
5 5.00
6 -60.0
7 71
8 82
10 .10

But I can't figure out how to exclude IDN's 7 and 8.
My regular expression concatenates the numbers together.
Any thoughts as to how I can identify sets of congruent numbers?

I tried combinations of TRANSLATE and REPLACE, but I'm still stuck.

Many Thanks!
- J



Tom Kyte
November 11, 2009 - 1:13 pm UTC

ops$tkyte%ORA11GR2> select str,
  2         trim(regexp_replace(str,'[^-.0123456789]', ' ')) str2,
  3         isnumeric(regexp_replace(str,'[^-.0123456789]', ' ')) flag
  4    from jt;

STR                            STR2                 F
------------------------------ -------------------- -
10                             10                   Y
Test 200                       200                  Y
Test 3000 ok                   3000                 Y
4000 Test                      4000                 Y
Test 5.00 - ok                 5.00 -               N
Test -60.0                     -60.0                Y
Test 7 but not 1               7         1          N
Test 8 but not 2 anywhere      8         2          N
Test number nine                                    N
 Test.10                       .10                  Y
11.1.2.3                       11.1.2.3             N

11 rows selected.

Teasing out congruent sets of numbers

J from Vancouver, November 06, 2009 - 12:31 pm UTC

Oh duh.

I just needed to add a "space" to the regexp.
From: [^-.0123456789]
To: [^-.0123456789 ]

Whoo-Hoo! =oP

Cheers!
- J

Reg Ex to split the name

A reader, January 12, 2010 - 12:50 pm UTC

I am trying to use Regular expression to split the full name into First Name , Middle Name , Last Name . Some of the records may not have a middle name . Tried few variations .. no luck so far. 

SQL> With V_Data As
  2    ( SELECT 'Walt F Disney' Fn FROM Dual
  3    Union All
  4    SELECT 'Walt Disney' Fn FROM Dual
  5    )
  6  Select fn ,
  7    Regexp_Substr(Fn,'[^ ]+', 1, 1) Firstname  ,
  8    Regexp_Substr(Fn, ' [^ ]+ ') Middle ,
  9    Regexp_Substr(Fn,'[^ ]+', 1, 3)Lastname
 10  From V_Data;

FN            FIRSTNAME     MIDDLE        LASTNAME
------------- ------------- ------------- -------------
Walt F Disney Walt           F            Disney
Walt Disney   Walt

SQL>
SQL> With V_Data As
  2    ( SELECT 'Walt F Disney' Fn FROM Dual
  3    Union All
  4    SELECT 'Walt Disney' Fn FROM Dual
  5    )
  6  Select  fn ,
  7    Regexp_Substr(Fn,'[^ ]+', 1, 1) Firstname  ,
  8    Regexp_Substr(Fn, ' [^ ]+ ') Middle ,
  9    Regexp_Substr(Fn,'[^ ]+', 1, 2)Lastname
 10  From V_Data;

FN            FIRSTNAME     MIDDLE        LASTNAME
------------- ------------- ------------- -------------
Walt F Disney Walt           F            F
Walt Disney   Walt                        Disney

Tom Kyte
January 18, 2010 - 3:43 pm UTC

ops$tkyte%ORA10GR2> with data
  2  as
  3  (select 'Tom Kyte' fn from dual
  4   union all
  5   select 'Thomas Jeffrey Kyte' fn from dual
  6   union all
  7   select '  Thomas J Kyte  ' fn from dual
  8  )
  9  select piece1, case when piece3 is null then null else piece2 end, nvl(piece3,piece2)
 10    from (
 11  select substr( trim(fn), 1, instr(trim(fn)||' ',' ')-1) piece1,
 12         substr( trim(fn), instr(trim(fn)||' ',' ',1,1)+1, instr(trim(fn)||' ',' ',1,2)-instr(trim(fn)||' ',' ',1,1)-1 ) piece2,
 13         substr( trim(fn), instr(trim(fn)||' ',' ',1,2)+1, instr(trim(fn)||' ',' ',1,3)-instr(trim(fn)||' ',' ',1,2)-1 ) piece3
 14    from data
 15         )
 16  /

PIECE1              CASEWHENPIECE3ISNUL NVL(PIECE3,PIECE2)
------------------- ------------------- -------------------
Tom                                     Kyte
Thomas              Jeffrey             Kyte
Thomas              J                   Kyte




if your data is predicable enough, that'll work.

Enhancements to fuzzy comparision between two columns

Logan Palanisamy, February 23, 2010 - 5:07 pm UTC

This is in response to posting by "Parthiban Nagarajan from Coimbatore, India" dated March 17, 2009.

His solution is pretty elegant, except for the following two cases:

1. Repeating string within the same column
2. End of the first column and the beginning of the second column match the end of the second column.

This is easily solved by introducing a space, (or some other character that is not normally part of the column) between the columns when they concatenated.


SQL>select * from t1;

X
------------
A8UI0$WER76$
KL6WER78*YIZ
012345012345
abcdefghi123

SQL>select * from t2;

Y
------------
HGDU0$WER78*
UL6WER78*MBV
123456123456
456abc123456

SQL>select t1.x, t2.y,
  2  case when regexp_like(t1.x||t2.y, '(.{6}).*\1') then 1 else 0 end as match,
  3  substr(regexp_substr(t1.x||t2.y, '(.{6}).*\1'),1,6) matched_str
  4  from t1, t2;

X            Y                       MATCH MATCHED_ST
------------ ------------ ---------------- ----------
A8UI0$WER76$ HGDU0$WER78*                1 0$WER7
A8UI0$WER76$ UL6WER78*MBV                0
A8UI0$WER76$ 123456123456                1 123456 <-- False match due to repeating strings within Y
A8UI0$WER76$ 456abc123456                0  
KL6WER78*YIZ HGDU0$WER78*                1 WER78*
KL6WER78*YIZ UL6WER78*MBV                1 L6WER7
KL6WER78*YIZ 123456123456                1 123456 <-- False match due to repeating strings within Y
KL6WER78*YIZ 456abc123456                0 
012345012345 HGDU0$WER78*                1 012345 <-- False match due to repeating strings within X
012345012345 UL6WER78*MBV                1 012345 <-- False match due to repeating strings within X
012345012345 123456123456                1 012345 <-- False match due to repeating strings within X
012345012345 456abc123456                1 012345 <-- False match due to repeating strings within X
abcdefghi123 HGDU0$WER78*                0
abcdefghi123 UL6WER78*MBV                0
abcdefghi123 123456123456                1 123456 <-- False match due to repeating strings within Y
abcdefghi123 456abc123456                1 123456 <-- False match. The last three chars of the X and first three chars of Y match the last six chars of Y 

16 rows selected.

-- The false matches go away once the two columns are treated separately with a space, assuming space is not embedded in the columns. \s is the perl notation for space in regex. 
SQL>
SQL>select t1.x, t2.y,
  2  case when regexp_like(t1.x|| ' ' || t2.y, '(.{6}).*\s.*\1') then 1 else 0 end as match,
  3  substr(regexp_substr(t1.x|| ' ' || t2.y, '(.{6}).*\s.*\1'),1,6) matched_str
  4  from t1, t2;

X            Y                       MATCH MATCHED_ST
------------ ------------ ---------------- ----------
A8UI0$WER76$ HGDU0$WER78*                1 0$WER7
A8UI0$WER76$ UL6WER78*MBV                0
A8UI0$WER76$ 123456123456                0
A8UI0$WER76$ 456abc123456                0
KL6WER78*YIZ HGDU0$WER78*                1 WER78*
KL6WER78*YIZ UL6WER78*MBV                1 L6WER7
KL6WER78*YIZ 123456123456                0
KL6WER78*YIZ 456abc123456                0
012345012345 HGDU0$WER78*                0
012345012345 UL6WER78*MBV                0
012345012345 123456123456                0
012345012345 456abc123456                0
abcdefghi123 HGDU0$WER78*                0
abcdefghi123 UL6WER78*MBV                0
abcdefghi123 123456123456                0
abcdefghi123 456abc123456                0

16 rows selected.

Regex to split names

Logan Palanisamy, February 24, 2010 - 6:42 pm UTC

SQL>with data
  2    as
  3    (select 'Tom Kyte' fn from dual
  4     union all
  5     select 'TomKyte' fn from dual
  6     union all
  7     select 'Thomas Jeffrey Kyte' fn from dual
  8     union all
  9     select '  Thomas J Kyte  ' fn from dual
 10    )
 11  select
 12  regexp_replace(fn, '^\s*([A-Z][a-z]*)\s*([A-Z][a-z]*)?\s*([A-Z][a-z]*).*$', '\1') first_name,
 13  regexp_replace(fn, '^\s*([A-Z][a-z]*)\s*([A-Z][a-z]*)?\s*([A-Z][a-z]*).*$', '\2') middle_name,
 14  regexp_replace(fn, '^\s*([A-Z][a-z]*)\s*([A-Z][a-z]*)?\s*([A-Z][a-z]*).*$', '\3') last_name
 15  from data;

FIRST_NAME           MIDDLE_NAME          LAST_NAME
-------------------- -------------------- --------------------
Tom                                       Kyte
Tom                                       Kyte
Thomas               Jeffrey              Kyte
Thomas               J                    Kyte


Regular Expression

Abhisek, September 29, 2010 - 10:12 am UTC

Hi Tom,

I have to use regular expression to remove the > and < braces from beginning and end only. e.g. if my text is '<DEMO > TEXT>' , it will be displayed as 'DEMO > TEXT'.

It will be great if we could translate them to &lt; and &gt; to look like '&lt;DEMO > TEXT &gt;
but any one will do.

Thanks a lot.
Tom Kyte
September 29, 2010 - 10:23 am UTC

good luck with that.

how do you plan on dealing with:

<demo > text> and this is something > something else, but this is less then - you know < - this <hello world>

what is the solution to that? how the heck to you know where a <> pair starts and ends??? what are your RULES - then and only then can any solution be provided or even considered.

Regular Expression

Abhisek, September 29, 2010 - 10:14 am UTC

I know it is possible using htf.escape_sc like

select htf.escape_sc('escape <this> one) from dual;

but I want it through regular expression. Please help.
Tom Kyte
September 29, 2010 - 10:25 am UTC

I probably won't be giving a regular expression - and no you cannot do it via escape_sc - that would change ALL of them - not just the ones you pointed at.

but first, you must - you MUST - provide your specific logic for how this works.


And it almost certainly won't be a regular express from me - it'll be an efficient, correct solution.



If escape_sc does solve your problem, then escape_sc is your answer.

regular expression

Abhisek, September 29, 2010 - 2:45 pm UTC

Hi Tom,

You are absolutely correct when you say that the escape_sc will change every < or > sign.

My requirement is I have to remove the < at the very first character and > is at the last character.. In the text I can have these signs but I am considered for first and last only.. e.g.

Input Text Output Text
<Hello World> Hello World
Hello World Hello World
<Hello World Hello World
<Hello > World> Hello > World
Hello <Test> World Hello <Test> World

Please do let me know if I am clear.
Tom Kyte
September 29, 2010 - 4:14 pm UTC

ops$tkyte%ORA11GR2> select str , ltrim(rtrim(str,'>'),'<')
  2    from (
  3  select '<Hello World>' str from dual union all
  4  select 'Hello World' str from dual union all
  5  select '<Hello World' str from dual union all
  6  select '<Hello > World>' str from dual union all
  7  select 'Hello <Test> World' str from dual )
  8  /

STR                LTRIM(RTRIM(STR,'>
------------------ ------------------
<Hello World>      Hello World
Hello World        Hello World
<Hello World       Hello World
<Hello > World>    Hello > World
Hello <Test> World Hello <Test> World




see, no regular expression, that would be the wrong way to do it.

Simply Awesome

ABhisek, September 29, 2010 - 4:54 pm UTC

This is really nice pretty way... Just was curious if we could replace the > and < signs with &gt; and &lt;

e.g.
Input Text Output_text
<Hello World> &lt;Hello World&gt;
<Hello > World> &lt;Hello > World&gt;

Thanks a lot.
Tom Kyte
September 29, 2010 - 5:52 pm UTC


ops$tkyte%ORA11GR2> select str , ltrim(rtrim(str,'>'),'<'),
  2         decode( instr(str,'<'), 1, '&lt;' ) ||
  3                 ltrim(rtrim(str,'>'),'<') ||
  4             decode( instr(str,'>',-1), length(str), '&gt;' )
  5    from (
  6  select '<Hello World>' str from dual union all
  7  select 'Hello World' str from dual union all
  8  select '<Hello World' str from dual union all
  9  select '<Hello > World>' str from dual union all
 10  select 'Hello <Test> World' str from dual )
 11  /

STR                LTRIM(RTRIM(STR,'> DECODE(INSTR(STR,'<'),1,'&
------------------ ------------------ --------------------------
<Hello World>      Hello World        &lt;Hello World&gt;
Hello World        Hello World        Hello World
<Hello World       Hello World        &lt;Hello World
<Hello > World>    Hello > World      &lt;Hello > World&gt;
Hello <Test> World Hello <Test> World Hello <Test> World


or

ops$tkyte%ORA11GR2> select str , ltrim(rtrim(str,'>'),'<'),
  2         case when str like '<%>'
  3                  then '&lt;' || ltrim(rtrim(str,'>'),'<') || '&gt;'
  4                  else           ltrim(rtrim(str,'>'),'<')
  5                  end
  6    from (
  7  select '<Hello World>' str from dual union all
  8  select 'Hello World' str from dual union all
  9  select '<Hello World' str from dual union all
 10  select '<Hello > World>' str from dual union all
 11  select 'Hello <Test> World' str from dual )
 12  /

STR                LTRIM(RTRIM(STR,'> CASEWHENSTRLIKE'<%>'THEN'&
------------------ ------------------ --------------------------
<Hello World>      Hello World        &lt;Hello World&gt;
Hello World        Hello World        Hello World
<Hello World       Hello World        Hello World
<Hello > World>    Hello > World      &lt;Hello > World&gt;
Hello <Test> World Hello <Test> World Hello <Test> World




depending on whether you want it only when the text is enclosed in <>

Thanks a lot

Abhisek, September 30, 2010 - 1:33 am UTC

Hi Tom,

Thanks a lot for making life easier.

Meanwhile, I had to change a little bit to avoid SCAN SQL as it was looking for assignment variable.

select str , ltrim(rtrim(str,'>'),'<'),
case when str like '<%>'
then '&' ||'lt;' || ltrim(rtrim(str,'>'),'<') || '&' || 'gt;'
else ltrim(rtrim(str,'>'),'<')
end
from (
select '<Hello World>' str from dual union all
select 'Hello World' str from dual union all
select '<Hello World' str from dual union all
select '<Hello > World>' str from dual union all
select 'Hello <Test> World' str from dual )

Thanks a lot again.
Tom Kyte
September 30, 2010 - 7:31 am UTC

yeah, sorry about that - I set define off and that didn't make the cut and paste. I meant to have that in there...

How to use REGEXP_LIKE to find string with 'a','c', and 'z'

A reader, November 08, 2010 - 1:42 pm UTC

Hey Tom,
How to use REGEXP_LIKE to find string with 'a','c', and 'z'?

Thanks.
Tom Kyte
November 08, 2010 - 1:44 pm UTC

my answer would be to NOT use regexp_like to do that.

Just use instr - where instr(str,'a')>0 AND instr(str,'c')>0 AND instr(str,'z')>0

it will use slightly (kidding, a lot) less cpu than a regular expression would

Is there a way with regular expression

A reader, November 08, 2010 - 2:52 pm UTC

Tom,

Thank you for your prompt reply.
Is there a way to use ALL? I only see OR with regualr expression.

Thanks.
Tom Kyte
November 08, 2010 - 3:25 pm UTC

Not really worth my time - I've given you the *efficient* way to accomplish your goal - the *right* way to achieve your goal. I'd rather not chase "wrong ways" myself.

sorry.

REGEXP_SUBSTR

A reader, February 02, 2011 - 3:13 pm UTC


How do I achieve this
using REGEXP_SUBSTR 
Requirement "portion of ATTRIBUTE up to and NOT including the first comma"

I tried the following but in for those cases where the values starts with 'comma' it is not working.
<code>
with lst as (select 'test,tes1' as val1,',t' val2 from dual)
select 
val1 ,val2,
REGEXP_SUBSTR( val1 , '[^$,]+' ) as "regexp_val1 working",
REGEXP_SUBSTR( val2 , '[^$,]+' ) "regexp_val2 not working"
 from lst;



VAL1 VAL2 regexp_val1 working regexp_val2 not working
------------------------------ ------------------------------ ------------------------------ -------
test,tes1 ,t test t

</code>

A reader, April 02, 2011 - 11:13 pm UTC

How would one achieve a Check constraint on a table column that stores string value which should only contain a string that beings with a number and should not contain '?' character?

Tom Kyte
April 12, 2011 - 11:35 am UTC

ops$tkyte%ORA11GR2> create table t
  2  ( x varchar2(30) not null check( to_number(substr(x,1,1)) between 0 and 9 and instr( x, '?' ) = 0 )
  3  )
  4  /

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into t values ( '123abc' );

1 row created.

ops$tkyte%ORA11GR2> insert into t values ( '123?bc' );
insert into t values ( '123?bc' )
*
ERROR at line 1:
ORA-02290: check constraint (OPS$TKYTE.SYS_C0017241) violated


ops$tkyte%ORA11GR2> insert into t values ( 'y23abc' );
insert into t values ( 'y23abc' )
           *
ERROR at line 1:
ORA-01722: invalid number


is one approach.

doubt regarding REGEXP_LIKE

Siby, July 05, 2011 - 2:11 pm UTC

Hi,

I have a regular expression which is used to check if an email address matches a certain pattern. This regular expression check goes in an infinite loop when I run it against the following value:

select 1 from dual
where
REGEXP_LIKE (
TRIM ('valenzuela@-call.abc.com-'),
'^[a-zA-Z]+(([-_''`.+!#$%&/=?^{}|~*])*\w*)*\@[A-Za-z0-9]+(-[A-Za-z0-9]+)*(\.[A-Za-z0-9]+(-[A-Za-z0-9]+)*){1,4}$')

I was under the impression that it would either pass or fail in case of a pattern match. Would you happen to know why this is happening so?

Thanks.

Tom Kyte
July 05, 2011 - 4:27 pm UTC

nope, you'll need a regular expression parser to help you with that and I'm definitely not one of those.

but not sure what you mean by infinite loop - that shouldn't happen (a hang), if it does, please contact support - it seems to fly for me:
ops$tkyte%ORA11GR2> select 1 from dual
  2  where
  3  REGEXP_LIKE (
  4              TRIM ('valenzuela@call.abc.com'),
  5  '^[a-zA-Z]+(([-_''`.+!#$%&/=?^{}|~*])*\w*)*\@[A-Za-z0-9]+(-[A-Za-z0-9]+)*(\.[A-Za-z0-9]+(-[A-Za-z0-9]+)*){1,4}$')
  6  /

         1
----------
         1

ops$tkyte%ORA11GR2> edit test

ops$tkyte%ORA11GR2> @test
ops$tkyte%ORA11GR2> select 1 from dual
  2  where
  3  REGEXP_LIKE (
  4              TRIM ('valenzuela@-call.abc.com-'),
  5  '^[a-zA-Z]+(([-_''`.+!#$%&/=?^{}|~*])*\w*)*\@[A-Za-z0-9]+(-[A-Za-z0-9]+)*(\.[A-Za-z0-9]+(-[A-Za-z0-9]+)*){1,4}$')
  6  /

no rows selected


same as above

Siby, July 05, 2011 - 2:43 pm UTC

Hi,

I put in the wrong example above. Please try executing the following query?

select 1 from dual
where
REGEXP_LIKE (
TRIM ('valenzuelabuentellojesusgerardo@-car.abc.com-'),
'^[a-zA-Z]+(([-_''`.+!#$%&/=?^{}|~*])*\w*)*\@[A-Za-z0-9]+(-[A-Za-z0-9]+)*(\.[A-Za-z0-9]+(-[A-Za-z0-9
]+)*){1,4}$')

Thanks.
Tom Kyte
July 05, 2011 - 4:28 pm UTC

i did, no problem for me, running 11.2.0.2

Gets tuck for me.

Siby, July 05, 2011 - 5:52 pm UTC

The last regular expression hangs on my database, running 11.1.0.7.0

select 1 from dual
where
REGEXP_LIKE (
TRIM ('valenzuelabuentellojesusgerardo@-car.abc.com-'),
'^[a-zA-Z]+(([-_''`.+!#$%&/=?^{}|~*])*\w*)*\@[A-Za-z0-9]+(-[A-Za-z0-9]+)*(\.[A-Za-z0-9]+(-[A-Za-z0-9

]+)*){1,4}$')

I'll see if i can get help from Oracle Support.

regexp is evil :-)

Sokrates, July 06, 2011 - 2:54 am UTC

@Siby

interesting, I can reproduce your issue on 11.2.0.2 (linux x86-64):
11.2.0.2 > select 1 from dual
  2  where
  3  REGEXP_LIKE (
  4             TRIM ('valenzuelabuentellojesusgerardo@-car.abc.com-'),
  5  '^[a-zA-Z]+(([-_''`.+!#$%&/=?^{}|~*])*\w*)*\@[A-Za-z0-9]+(-[A-Za-z0-9]+)*(\.[A-Za-z0-9]+(-[A
  6  -Za-z0-9]+)*){1,4}$')
  7  /

(waiting for 5 minutes, then
kill -9 <PID>
on Server)
select 1 from dual
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 7270
Session ID: 6 Serial number: 1671


ERROR:
ORA-03114: not connected to ORACLE


Elapsed: 00:04:58.29


trace (events 10046, level 12) showed:
....
An invalid number has been seen.Memory contents are :
...

Regex is too complex

Steven Kladitis, July 07, 2011 - 10:49 am UTC

Using a popular regex tool, I got the following message on your regex and the test case.

The match attempt was aborted early because the regular expression is too complex.
The regex engine you plan to use it with may not be able to handle it at all and crash.
Look up "catastrophic backtracking" to learn how to avoid this situation.

Not sure what Oracle should do with regular expressions like that.

My 11gr2 test database became very busy for a long time, until I restarted it. I recommend using a tool like regex buddy to check out your regular expressions before running in any environment.

Regexp hanging

Siby, July 07, 2011 - 10:56 pm UTC

The regular expression works fine if I remove the "-" in the domain name. Putting just one hyphen causes all the problem.

Oracle repsonse for hanging REGEXP

Siby, July 08, 2011 - 8:09 am UTC

This was the response from Oracle Support:

This issue was already reported in
Bug 8579113: UPDATE HANGS USING REGEXP_REPLACE ON LOB COLUMN closed as not a bug
Bug 10036245: QUERY SPINS WHEN REGULAR EXPRESSION IS USED ( REGEXP_LIKE ) status 11 - potential duplicate of Bug 8579113

Bug 8579113 was closed as not a bug because the development considers that this is an usage issue as explained in REGEXP_LIKE Hangs and CPU Spikes After DB Upgrade To 11gr2 From 10.2.0.4 (Doc ID 1276379.1)
Bug 10036245: QUERY SPINS WHEN REGULAR EXPRESSION IS USED ( REGEXP_LIKE ) is still in work by development, but it seems that could be closed as duplicate of 8579113, so not an issue from Oracle point of view.

The SQL containing REGEXP_LIKE as below consumes high CPU and runs for ever:
SELECT * FROM dual WHERE REGEXP_LIKE('12aBC', '^(-{0,1}+{0,1})[0-9]+(.{0,1}[0-9]+)$')

The statement spins in the lxregmatch stack; and the only possible way to stop the statement is to kill the session.
As per bug 10036245 and bug 8579113, this is a usage issue.

REG_EXP LIKE

Deanna, August 08, 2011 - 1:03 pm UTC

There is a lot of good information within these posts about REGEXP_LIKE, thank you.

Why would reg_exp return invalid relational operator, is it because it is returning a boolean? I am trying to use REGEXP_LIKE in a SQL statement

The data column is a VARCHAR (2) and has values such as

abcd123
1345-9876
1234-455678-88997
abcdefg

I only want to return those fields that match the pattern:
##-####

I tried the following
select column from table where
(REGEXP_LIKE (column, '^\d{2}-\d{4}$')

I know its got to be something easy I am just not seeing it.

Any help you can provide is greatly appreciated



Tom Kyte
August 13, 2011 - 3:32 pm UTC

once you fix your parentheses, it works fine:

ops$tkyte%ORA11GR2> select dummy from dual where
  2  (REGEXP_LIKE (dummy, '^\d{2}-\d{4}$')
  3  /
(REGEXP_LIKE (dummy, '^\d{2}-\d{4}$')
                                    *
ERROR at line 2:
ORA-00907: missing right parenthesis


ops$tkyte%ORA11GR2> select dummy from dual where
  2  REGEXP_LIKE (dummy, '^\d{2}-\d{4}$')
  3  /

no rows selected

Range parsing of text

J, February 16, 2012 - 9:52 pm UTC

Hello Tom.

Great thread - thanks so much.

Here's our challenge: Tidy up ranges by extracting
out the LOW_VAL, HIGH_VAL, UOM values from a text field.

And it's a royal mess!! Two of us have been tussling
with this for days - plsql developers here with
newbie (but enthusiastic) regex skills. The following
patterns are pretty representative of what we find.

Can you (or an ace regex friend) shed some light as
to how we can best approach this beast? We're still
learning how to "think regex" so any advice would be
greatly appreciated. We are on 11.2.0.3.

Thanks a bunch!

with x as
(select 'R1' nm,
null target_low_val,
null target_high_val,
null target_uom,
'baseline - no range info' notes
from dual union
select 'R1 S-7 FT',
null,
null,
null,
'letter "S" instead of a five'
from dual union
select 'R1-ABC 5.5FT',
5.5,
5.5,
'FT',
'min = max, no space before FT'
from dual union
select 'R1-ABC 5.5 FT',
5.5,
5.5,
'FT',
'min = max, extra spaces'
from dual union
select 'R1-ABC (0.5 FT - 2FT)',
0.5,
2,
'FT',
'sometimes they enclose w/parens'
from dual union
select 'R1 ABC 20-30 FT XYZ',
20,
30,
'FT',
'appended text'
from dual union
select 'R1/ABC 20-30 FTXYZ',
null,
null,
null,
'uom != FT'
from dual)
select nm, target_low_val, target_high_val,
target_uom, notes
from x;


Tom Kyte
February 17, 2012 - 5:28 am UTC

can we have an explanation of what the answer for the above example should be

and most importantly:

why it is the right answer - the rules for parsing and filling stuff in

Range parsing of text (2)

J, February 17, 2012 - 10:44 am UTC

Yes Sir! (thanks)

We see a couple of patterns. And this is how we have tried to translate into Regex'ese:

Please note: The value numbers are always positive (no numbers <= zero).

The RANGE pattern is as follows - luckily, the data entry for the range ALWAYS follows a [low value+dash+high value] pattern, with optional spaces between those three elements:
[descriptive text]
(might have spaces,dashes,slashes,parens) followed by
[one or more spaces] followed by
[zero or one open parens] followed by
[number - might have a decimal point] followed by
[zero or more spaces] followed by
[dash] followed by
[zero or more spaces] followed by
[number - might have a decimal point] followed by
[zero or one close parens] followed by
['FT'] followed by
[space or end-of-string]

The SINGLE-VALUE pattern is:
[descriptive text]
(might have spaces,dashes,slashes,parens) followed by
[one or more spaces] followed by
[number - might have a decimal point] followed by
[zero or more spaces] followed by
['FT'] followed by
[space or end-of-string]

The output would be:
[the entire string]
[the low value number]
[the high value number] - may be equal to the low value
['FT'] the UOM

Please let me know if you'd like to see a description of what we've been trying. Since the values we are after ALWAYS come on the back-half of the string, we figured to start by finding 'FT' and (hopefully) avoid the potential mess with the [descriptive text]. So we're trying backwards character-by-character evaluation anchored off of 'FT', and it's gruesome.

Hope this helps.
Thanks so very-very much!
Tom Kyte
February 17, 2012 - 9:00 pm UTC

I've sent an email to someone that likes to do regular expressions to see if they have an idea for you - anyone else is free to comment as well :)

try this

chris227, February 18, 2012 - 5:46 pm UTC

{code}
translate(regexp_substr(nm, '( |\()(\d+\.?\d*)', 1),'#(' ,'#') lv,
translate(regexp_substr(nm, '- *(\d+\.?\d*)', 1),'#- ','#') hv,
regexp_substr(nm, '(FT[^ ]*)', 1) uom,
nm,
target_low_val, target_high_val,
target_uom, notes
from x
{code}

try this follow up

chris227, February 18, 2012 - 5:52 pm UTC

sorry, did not know that editing is not possible here.

translate(regexp_substr(nm, '( |\()(\d+\.?\d*)', 1),'#(' ,'#') lv,
translate(regexp_substr(nm, '- *(\d+\.?\d*)', 1),'#- ','#') hv,
regexp_substr(nm, '(FT[^ ]*)', 1) uom,
nm
from x

LV HV UOM NM
- - - R1
20 30 FT R1 ABC 20-30 FT XYZ
- 7 FT R1 S-7 FT
0.5 2 FT R1-ABC (0.5 FT - 2FT)
5.5 - FT R1-ABC 5.5 FT
5.5 - FT R1-ABC 5.5FT
20 30 FTXYZ R1/ABC 20-30 FTXYZ

I did not know exactly what to do if uom is not FT, so i wrote it out. Just filter it from the result if not needed.
If # may be part of the data, take another character not being in or use regexp_replace instead e.g.

the part with # is nonsens

chris227, February 18, 2012 - 5:54 pm UTC

last statement on the use of # is nonsens, we are in the substring already, so there won't be a #

bug fix

chris227, February 18, 2012 - 6:26 pm UTC

i have done a "bugfix" in regard of the rule for after FT there must be space or end of line.
For R1-ABC (0.5 FT - 2FT) uom should be FT. But with the rule given above this could only be the first occurence of FT.
So FT- 2FT would not be matched.
But i think with the blueprint given it is possible to adjust it quit easy if necessary. (e.g. (FT |FT$|FT\))

select
translate(regexp_substr(nm, '( |\()(\d+\.?\d*)', 1),'#(' ,'#') lv,
translate(regexp_substr(nm, '- *(\d+\.?\d*)', 1),'#- ','#') hv,
regexp_substr(nm, '(FT |FT$)', 1) uom,
nm
from x

Range Parsing of Text(3)

J, February 21, 2012 - 9:51 am UTC

Tom and Chris.

THANK YOU!!! Awesome - this is waay more than we expected. Chris, this is a great bit of core code - we're going to parse this out (in English) to help us decipher what is actually going on in there, and then work it into our solution. BTW, just picked up "Mastering Regular Expressions", and the author does a pretty good job of explaining Regex'ese. Thanks again.

Splitting string with regular expression

Sangeetha, June 13, 2013 - 1:26 pm UTC

Hi Tom,

I have a scenario in which I need to migrate already existing data to a different structure. In our legacy system, there is a log column.Data is like this:
"

Log Time: 14/05/2009 08:31 Log By: Abubaker (26306) A/C Status: ETD 09:00 DOH (051) IAD

BAO R/H IDG FAULT , ENG RUN IS C/O , OCC IS INFORMED BEFORE 10 MINs , ONE HOUR DELAY REQUESTED,

Log Time: 14/05/2009 08:48 Log By: Abubaker (26306) A/C Status: ETD 09:00 DOH (051) IAD

GCU2 REPLACEMENT IN PROGRESS.

Log Time: 14/05/2009 09:20 Log By: Abubaker (26306) A/C Status: ETA 23:20 DOH (051) IAD

A/C DECLARED SERVICEABLE, OCC INFORMED AND REQUESTED TO EXPIDIT THE BOARDING."

I am trying to split this into another structure where, logtime, logged by, log desc are different columns and all these concatenated data will be separate rows in that table.So for this example , I will create 3 rows in the new table.
I tried many things with regular expression. But it is not coming correctly. Just wanted to know whether there is a way with regular expression?



Thanks and Regards
Sangeetha

Tom Kyte
June 18, 2013 - 3:18 pm UTC

there might be a way with regexp, but it isn't necessary at all. I never ever even consider using regexp unless and until there is no other way (very cpu intensive).

simple instr and substr will work fine here.

ops$tkyte%ORA11GR2> create table t ( x varchar2(4000) );

Table created.

ops$tkyte%ORA11GR2> insert into t values ( 'Log Time: 14/05/2009 08:31   Log By: Abubaker (26306)     A/C Status: ETD 09:00
  2     DOH (051) IAD
  3     BAO R/H IDG FAULT , ENG RUN IS C/O , OCC IS INFORMED BEFORE 10 MINs , ONE HOUR
  4     DELAY REQUESTED,' );

1 row created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select substr( x, instr( x, 'Log Time: ' )+10, 16 ),
  2         rtrim( substr( x, instr( x, 'Log By: ' )+ 8, instr( x, 'A/C Status: ')-instr( x, 'Log By: ' )-8 ) ) ,
  3             substr( x, instr( x, 'A/C Status: ')+12, 9 )
  4    from t;

SUBSTR(X,INSTR(X
----------------
RTRIM(SUBSTR(X,INSTR(X,'LOGBY:')+8,INSTR(X,'A/CSTATUS:')-INSTR(X,'LOGBY:')-8))
-------------------------------------------------------------------------------
SUBSTR(X,
---------
14/05/2009 08:31
Abubaker (26306)
ETD 09:00



Thank You

Sangeetha, June 20, 2013 - 2:00 pm UTC

Hi Tom,

Thanks. Yes I did it with instr and substr.

Thanks again
Have a great day.
Sangeetha

pattern matching for strings,

A reader, October 01, 2013 - 7:26 pm UTC

Hello,

Table A has column job_name and table B has column table_name.

The values of Table A:
J_ICtoRZ_VehicleD_Staging
J_ICtoRZ_VehicleD_Merge_Optimized
J_ICtoRZ_VehicleBldCompF_Staging
J_ICtoRZ_VehicleBldCompF_Merge_Optimized

Values of Table B:
VEHICLE
VEHICLE_S
VEHICLE_BLD_COMP
VEHICLE_BLD_COMP_S

I want to join the two tables and get the appropriate tables for their corresponding job names. If the job name has "staging" keyword, then it need to pick the tablename that has "_S"

I want the output like:
J_ICtoRZ_VehicleD_Staging VEHICLE_S
J_ICtoRZ_VehicleD_Merge_Optimized VEHICLE
J_ICtoRZ_VehicleBldCompF_Staging VEHICLE_BLD_COMP
J_ICtoRZ_VehicleBldCompF_Merge_Optimized VEHICLE_BLD_COMP_S

How to achieve this?

Thanks,

Pattern matching,

A reader, November 07, 2013 - 5:54 pm UTC

Hello,

Table A has column job_name and table B has column table_name.

The values of Table A:
J_ICtoRZ_VehicleD_Staging
J_ICtoRZ_VehicleD_Merge_Optimized
J_ICtoRZ_VehicleBldCompF_Staging
J_ICtoRZ_VehicleBldCompF_Merge_Optimized

Values of Table B:
VEHICLE
VEHICLE_S
VEHICLE_BLD_COMP
VEHICLE_BLD_COMP_S

I want to join the two tables and get the appropriate tables for their corresponding job names. If
the job name has "staging" keyword, then it need to pick the tablename that has "_S"

I want the output like:
J_ICtoRZ_VehicleD_Staging VEHICLE_S
J_ICtoRZ_VehicleD_Merge_Optimized VEHICLE
J_ICtoRZ_VehicleBldCompF_Staging VEHICLE_BLD_COMP
J_ICtoRZ_VehicleBldCompF_Merge_Optimized VEHICLE_BLD_COMP_S

How to achieve this?

Thanks,

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library