Skip to Main Content
  • Questions
  • Extract a repeated string from a field

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Guy.

Asked: December 09, 2008 - 12:06 pm UTC

Last updated: March 29, 2012 - 8:39 am UTC

Version: 10.2.0

Viewed 10K+ times! This question is

You Asked

Tom,

First off thanks for all the times in the past when you have helped me by already answering my question that somebody else had asked first!

I am trying to extract a list of email addresses that have been entered in a comment field in a 3rd party database. The field "comment_text" is defined as varchar2(4000). As the database is 3rd party owned I'm not allowed to amend it's structure to create easily queriable fields for these email addresses and so am left with a problem of extracting email addresses from up to 4000 characters entered in this field! I'm only working with one record at a time based on job_id which is another field in this table.

The sort of text that will be in a single comment might be as follows with varying email addresses:-

The support contacts are as follows:-

Email: theboss@mycompany.com          Tel: 01234567890
Email: 2ndleutenant@mycompany.com     Tel: 01234567891
Email: dogsbody@mycompany.com         Tel: 01234567899


I've been trying to use regexp_substr and instr to locate the email addresses anchored around the "Email:" tag, but only seem to be able to get one email at a time. I thought I'd got close to striping out the email addresses with the code below and was looking at stripping off "space and any character to end of line", but took way too long to realise that I still only had a field with 3 lines in it rather than 3 rows I thought I'd got :-(

SELECT REGEXP_REPLACE(comment_text,'Email: ',chr(13)||chr(10)) emails   
FROM details_tab 
WHERE job_id = 12345; 


Ideally I'd like to just extract the email addresses and have them in a single variable like this 'theboss@mycompany.com; 2ndleutenant@mycompany.com; dogsbody@mycompany.com'. Once I have them in this format it will be easy for me to send emails using utl_mail.send.

Any pointers would be much appreciated.

regards

Guy

and Tom said...

ops$tkyte%ORA10GR2> create table t ( x varchar2(4000), job_id number );

Table created.

ops$tkyte%ORA10GR2> insert into t (x, job_id ) values (
  2  'The support contacts are as follows:-
  3  Email: theboss@mycompany.com          Tel: 01234567890
  4  Email: 2ndleutenant@mycompany.com     Tel: 01234567891
  5  Email: dogsbody@mycompany.com         Tel: 01234567899' , 12345 );

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> variable job_id number
ops$tkyte%ORA10GR2> exec :job_id := 12345;

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> with data
  2  as
  3  (
  4  select translate( x, chr(13)||chr(10)||chr(9)||':', '    ')||' ' x,
  5         length(x)-length(replace(x,'@','')) cnt,
  6             length(x)+1 len
  7    from t
  8   where job_id = :job_id
  9  )
 10  select substr(max(sys_connect_by_path(data,'; ')),3) emails
 11    from (
 12  select substr( x, prior_space+1, next_space-prior_space-1 ) data, l
 13    from (
 14  select x, cnt,
 15         instr( x, '@', 1, level ) atsign,
 16             instr( x, ' ', instr( x, '@', 1, level ) ) next_space,
 17             instr( x, ' ', instr( x, '@', 1, level )-len ) prior_space,
 18             level l
 19    from data
 20  connect by level <= cnt
 21         )
 22             )
 23  connect by prior l=l-1
 24  /

EMAILS
-------------------------------------------------------------------------------
theboss@mycompany.com; 2ndleutenant@mycompany.com; dogsbody@mycompany.com



Notes:

line 4 - you might have to play with the translate - my goal: put a space before and after the string with an @ in it (that is why I added ':' in case you have email:x@y for example...

line 5 - gets the number of '@' in the email - we will assume that represents an email address...

line 8: assumption - job_id is primary key - this query works on a single row.


lines 14-20 - turn the single row into N rows where N is number of @ signs. For each of those, use instr to find the space in front and behind the N'th @

line 12: substr out the N'th email

line 10: connect the strings together into a single string and using max, get the longest (most complete) one...



Rating

  (14 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent response - nailed the problem

Guy Tew, December 10, 2008 - 4:03 pm UTC

Thank you very much for the answer, Tom. The notes you've given have made it so even I can understand the code and feel confident about implementing and maintaining it. Cheers.

Extract first Numbers from String

Rupal, April 29, 2009 - 11:05 pm UTC

Hi Tom,

On the smililar line as above, I have following requirement.
I have string field in the table which can have following values

288
28 Years
6 Days
9 Years 5 Months
21/Mar/1989

From above records I want to pull only first numbers,
output for above records should be

288
28
6
9
21

Can you please suggest me some good way of doing it in SQL?

Thanks & regards
Rupal

Tom Kyte
April 30, 2009 - 8:47 am UTC


ops$tkyte%ORA10GR2> select x, regexp_replace( x, '[^0-9].*', '' )  y
  2    from (select '288' x from dual union all
  3          select '28 Years' x from dual union all
  4         select  '6 Days' x from dual union all
  5         select  '9 Years 5 Months' x from dual union all
  6         select  '21/Mar/1989' x from dual  )
  7  /

X                Y
---------------- ----------
288              288
28 Years         28
6 Days           6
9 Years 5 Months 9
21/Mar/1989      21


Thanks so much Tom

Rupali, May 04, 2009 - 10:19 am UTC

You are always a great help Tom. God bless you asn AskTom Site.

Extract a repeated string from a field - Oracle 10g Regular Expression.

Rajeshwaran, Jeyabal, May 05, 2009 - 8:35 am UTC

Tom,

Regarding the above Email id Question, we know the format of the Email id, that will be in this format '\w+@\w+\.\w+'.
So we can use our Favourite Oracle 10g regular expression feature ( REGEXP_SUBSTR ) to answer it right?
like the below. instead of using SUBSTR & INSTR old Oracle 9i handy features.
Scott@IRADSDB> WITH datas AS
  2    (SELECT regexp_substr(x,    '\w+@\w+\.\w+',    1,    LEVEL) AS
  3    emails,
  4       LEVEL lvl
  5     FROM t CONNECT BY LEVEL <= LENGTH(x) -LENGTH(REPLACE(x,    '@')))
  6  SELECT LTRIM(MAX(sys_connect_by_path(emails,   ' ; ')),   ' ;') AS
  7  emails_lists
  8  FROM datas START WITH lvl = 1 CONNECT BY PRIOR lvl = lvl -1
  9  /

EMAILS_LISTS
---------------------------------------------------------------------------------
theboss@mycompany.com ; 2ndleutenant@mycompany.com ; dogsbody@mycompany.com

Please correct me if i am wrong.

Thanks,
Rajesh.
Tom Kyte
May 11, 2009 - 8:43 am UTC

I like to use the old fashioned builtin's (substr, trim, etc) before even considering regular expressions.

regular expressions can be quite CPU intensive.

Regular expressions can be quite CPU intensive

Rajeshwaran, Jeyabal, May 11, 2009 - 9:42 am UTC

Tom,

Regular expressions can be quite CPU intensive.
So you mean to say Stick on old fashioned builtin's (substr, trim, etc), rather than using new handy regular expressions in 10g?

Thanks,
Rajesh.
Tom Kyte
May 11, 2009 - 6:46 pm UTC

isn't that what I said directly?

... I like to use the old fashioned builtin's (substr, trim, etc) before even considering regular expressions. ...

String of dba_advisor_findings

Fabio, March 29, 2010 - 9:56 am UTC

Hi Tom,
I have the following situations:

SQL> select MORE_INFO from dba_advisor_findings where task_name = 'SEGMENT_ADVISOR_TASK';

MORE_INFO
--------------------------------------------------
Allocated Space:3145728: Used Space:2976626: Recla
imable Space :169102:

And I wondering hos can I extract only the numbers of this query and insert into a new table.
Please let me know

Tkanks,

Fabio

creation of email id using substring

kiranmayi, August 24, 2011 - 3:22 pm UTC

hai tom
here the employee emailid is like this
kiranmayi.mokkapati@gmail.com
kiranmayi_604@gmail.com
now how to create a check constraint for the above mailid
Tom Kyte
August 30, 2011 - 3:52 pm UTC

umm, that is a naive interpretation of a valid email - there are lots of rules of what can be where.

Suggest you look around to see if you can find a regular expression that validates an email and look to use the REGEXP functions to create a check constraint.

creation of mailid which supports versions like 8i,9i

kiranmayi, August 24, 2011 - 3:24 pm UTC

for the above question
Tom Kyte
August 30, 2011 - 3:53 pm UTC

not sure what you mean at all - but if you are using really old software, you are going to be painfully limited in what you can do - no regular expressions for you...

Florin, December 21, 2011 - 2:49 am UTC

Hi Tom,

I've tried to check all your relevant pages related to a substr/instr issue but couldn't find.

I have a table T1 (a varchar(10)) with the 3 following rows:

Example:

12AA
BC33
FG00A

I have another table T2 (aa varchar(10))

Here the data is like (one record):

'DD, EE, 12, 00, AB'

Now, I want to select from T1 only the records that are not in T2 string (checking only the string between commas):

Ex , I want to get at the end only BC33 (as 12AA contains '12' from T2 and also 'FG00A' contains '00' from T2.

Tom Kyte
December 21, 2011 - 8:39 am UTC

ops$tkyte%ORA11GR2> select a, aa, '(' || replace( replace(t2.aa,',','|'), ' ', '' ) || ')'
  2    from t1, t2
  3   where NOT regexp_like( t1.a, '(' || replace( replace(t2.aa,',','|'), ' ', '' ) || ')' );

A          AA                   '('||REPLACE(REPLACE(T
---------- -------------------- ----------------------
BC33       DD, EE, 12, 00, AB   (DD|EE|12|00|AB)


Breaking varchar with datestamps into multiple rows

Rob, December 22, 2011 - 12:29 pm UTC

Hi Tom,

My case is similar to the original question: I have a table with multiple historical entries in a comment field that have been date stamped each time the field was edited. There may not be whitespace before each date stamp but the formats match in each case and there can be n comments per record.

create table t1 (task_id number,
history varchar2(4000));

insert into t1 (task_id, history) values (
1, '12/30/2010 11:15:36 AM: first text sample
1/6/2011 1:15:37 PM: second text sample blah 11/30/2011 8:20:19 AM: task 1 complete');

insert into t1 (task_id, history) values (
2, '3/15/2011 12:12:12 PM: task 2 started.3/16/2011 1:01:01 PM: task 2 complete');

insert into t1 (task_id) values (
3);

commit;

Output I am looking for is:

task_id note_time
1 12/30/2010 11:15:36 AM
1 1/6/2011 1:15:37 PM
1 11/30/2011 8:20:19 AM
2 3/15/2011 12:12:12 PM
2 3/16/2011 1:01:01 PM

Task 3 is not desired since it has no history.

Like the original question, I have an easy time getting one date per record using to_date(regexp_substr(history,'\d+/\d+/\d{4} \d+:\d+:\d+ [[:upper:]]{2}')), but having trouble getting all dates on separate lines. Have also tried regexp_count(history,'\d+/\d+/\d{4} \d+:\d+:\d+ [[:upper:]]{2}') to collect max occurrences to feed into regexp_substr but cannot figure out how to get it to populate from 3 for just the last occurence down to 3, 2, 1 (for task_id 1) 2, 1 (for task 2, etc.) for the outer query.

After the first output for tracking historical activity, will need to collect this output to put into a new table when phasing out the old comment field:

task_id note_time comment
1 12/30/2010 11:15:36 AM first text sample
1 1/6/2011 1:15:37 PM second text sample blah
1 11/30/2011 8:20:19 AM task 1 complete
2 3/15/2011 12:12:12 PM task 2 started.
2 3/16/2011 1:01:01 PM task 2 complete

Or, can just move straight to the second output since it will be adequate for activity tracking purposes. Can you help?

- Rob
Tom Kyte
December 22, 2011 - 2:07 pm UTC

ugh, this is a perfect example of why programmers should not be allowed to 'design' schemas. Unless and until the programmer has read a book or two on databases (they read all about their frameworks, their programming language, their OS - but database - no....)

Why would anyone have thought to store this row data in a single row like this :(


here is the ugliness that can help you work around this, warning, it will be CPU intensive - regular expressions usually are:

ops$tkyte%ORA11GR2> select task_id,
  2         history,
  3         to_date( sub1, 'mm/dd/yyyy hh:mi:ss am' ) dt,
  4         '"' || substr( history, strt, stop-strt-1 ) || '"' txt
  5    from (
  6  select x.*,
  7         instr( history, sub1 )+length(sub1)+2 strt,
  8         nvl( instr( history, sub2 ), length(history)+2 ) stop
  9    from (
 10  select t.* ,
 11     regexp_substr(history,'\d+/\d+/\d{4} \d+:\d+:\d+ [[:upper:]]{2}',1,column_value) sub1,
 12     regexp_substr(history,'\d+/\d+/\d{4} \d+:\d+:\d+ [[:upper:]]{2}',1,column_value+1) sub2
 13    from t, table( cast( multiset( select level r
 14                                     from dual
 15                                  connect by level <=
 16        regexp_count(t.history,'\d+/\d+/\d{4} \d+:\d+:\d+ [[:upper:]]{2}') ) as sys.odciVarchar2List ) )
 17        ) x
 18        )
 19  /

   TASK_ID HISTORY                        DT                   TXT
---------- ------------------------------ -------------------- ------------------------------
         1 12/30/2010 11:15:36 AM: first  30-dec-2010 11:15:36 "first text sample"
           text sample 1/6/2011 1:15:37 P
           M: second text sample blah 11/
           30/2011 8:20:19 AM: task 1 com
           plete

         1 12/30/2010 11:15:36 AM: first  06-jan-2011 13:15:37 "second text sample blah"
           text sample 1/6/2011 1:15:37 P
           M: second text sample blah 11/
           30/2011 8:20:19 AM: task 1 com
           plete

         1 12/30/2010 11:15:36 AM: first  30-nov-2011 08:20:19 "task 1 complete"
           text sample 1/6/2011 1:15:37 P
           M: second text sample blah 11/
           30/2011 8:20:19 AM: task 1 com
           plete

         2 3/15/2011 12:12:12 PM: task 2  15-mar-2011 12:12:12 "task 2 started"
           started.3/16/2011 1:01:01 PM:
           task 2 complete

         2 3/15/2011 12:12:12 PM: task 2  16-mar-2011 13:01:01 "task 2 complete"
           started.3/16/2011 1:01:01 PM:
           task 2 complete

         3                                                     ""

6 rows selected.


Fantastic

Rob, December 23, 2011 - 1:12 pm UTC

Thanks, worked like a charm!

Extract first Numbers from String

Soumadip, March 29, 2012 - 8:11 am UTC

Hi Tom,

Regarding the following post above :-

"
On the smililar line as above, I have following requirement.
I have string field in the table which can have following values

288
28 Years
6 Days
9 Years 5 Months
21/Mar/1989

From above records I want to pull only first numbers,
output for above records should be

288
28
6
9
21

Can you please suggest me some good way of doing it in SQL?
"

------------------------------------------------------------
Can this query be written instead of using Regular Expression, as you said it is always better to use the old techniques before even thinking of using Regular Expression :-

WITH abc AS
(
SELECT a.str
,translate(a.str,'ABCDEFGHIJKLMNOPQRSTUVWXYZ/- &^%$£','a') AS trans
,length(translate(a.str,'ABCDEFGHIJKLMNOPQRSTUVWXYZ/- &^%$£','a')) AS tot_len
,instr(translate(a.str,'ABCDEFGHIJKLMNOPQRSTUVWXYZ/- &^%$£','a'),'a') AS instrr
FROM (SELECT '288' AS str FROM dual
UNION ALL
SELECT '28 YEARS' AS str FROM dual
UNION ALL
SELECT '6 DAYS' AS str FROM dual
UNION ALL
SELECT '9 YEARS 5 MONTHS' AS str FROM dual
UNION ALL
SELECT '21/MAR/1989' AS str FROM dual
) a
)
SELECT str
,substr(trans,1,CASE WHEN instrr = 0
THEN tot_len
ELSE instrr -1
END) AS result
FROM abc a;
Tom Kyte
March 29, 2012 - 8:39 am UTC

ops$tkyte%ORA11GR2> select str, substr( new_str, 1, length(new_str)-nvl(length(ltrim(new_zero_str,'0')),0) )
  2    from (
  3  select str, substr( str, instr( zero_str, '0' ) ) new_str, substr( zero_str, instr( zero_str, '0' ) ) new_zero_str
  4    from (
  5  select str, translate( str, '0123456789', '0000000000' ) zero_str
  6    from t
  7         )
  8         )
  9  /

STR                     SUBSTR(NEW_STR,1,LENGTH
----------------------- -----------------------
288                     288
28 YEARS                28
6 DAYS                  6
9 YEARS 5 MONTHS        9
21/MAR/1989             21
the date is 21/MAR/1989 21

6 rows selected.




is another approach. If the data is such that it ALWAYS starts with a number, it can be simplified to:

ops$tkyte%ORA11GR2> select str, substr( str, 1, length(str)-nvl(length(ltrim(translate( str, '0123456789', '0000000000' ),'0')),0) )
  2    from t
  3  /

STR              SUBSTR(STR,1,LEN
---------------- ----------------
288              288
28 YEARS         28
6 DAYS           6
9 YEARS 5 MONTHS 9
21/MAR/1989      21



How to insert a delimiter

A reader, November 29, 2013 - 1:13 pm UTC

Hi Tom,

How can I get this result:

1 3
1 YYY
1 225 - 1 3
1 3 - 1 225
2 7 - 2 8 - 1 3

From this query,

with data as ( SELECT '1 3' str
FROM dual
UNION ALL
SELECT '1 YYY' str
FROM dual
UNION ALL
SELECT '1 225 1 3' str
FROM dual
UNION ALL
SELECT '1 3 1 225'
FROM dual
UNION ALL
SELECT '2 7 2 8 1 3' str
FROM dual
)
select str
from data

Thanks..

Pattern Matching

Ravi Kishore, October 16, 2014 - 6:08 pm UTC

Hi Tom,

I am trying to extract data from column with data delimited by a SPACE-HYPHEN-SPACE ( - ). In the data I need to pick the data between first delimiter and second delimiter.

For example the requirement is to get the value DEF XYZ-MNO from the below query.

ORA11G@DEV SELECT REGEXP_REPLACE(REGEXP_SUBSTR('ABC - DEF XYZ-MNO - GHI - JKL', '
( - )[^( - )]+'),'( - )','') PATTERN FROM DUAL;

PAT
---
DEF

ORA11G@DEV


I am trying to make SPACE-HYPHEN-SPACE as a group using the () and then look for the first occurrence of that pattern and then grabbing everything till I encounter another instance of the delimiter.

Can you please let me know where I am going wrong?

Thanks and Regards
Ravi

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library