Skip to Main Content
  • Questions
  • How extract text between 2 strings and get multiple occurances

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Nagaraj.

Asked: March 08, 2016 - 12:12 pm UTC

Last updated: August 23, 2021 - 2:19 am UTC

Version: 12.1

Viewed 10K+ times! This question is

You Asked

I have CLOB column which contains comments entered by multiple people multiple times. All concatenated and stored in CLOB.
I want to get comments entered by one person from this Column and get all comments.

Following is comments from CLOB:-
"


*** EMP1 - 02/15/16 09:32 PST ***Sallary Approved - Emp1EMP2***** EMP2 - 02/19/16 07:59 ***What is the value of raise. *** EMP2 - 02/19/16 7:59 PST ***confidential***** EMP1 - 02/23/16 21:31 PST ***Details can be shared - emp1"\
---------------------


In the above text , I want to extract strings between EMP1 and emp for all occurances.


I can use substr. But is there a way to do this without using loops in single regular expression.


I tried the following for one occurance.

ex. Table temp1 contains comments.

select substr(to_char(comments),INSTR(to_char(a.comments),'EMP1',1,1),(INSTR(to_char(a.comments),'emp1',1,1)-INSTR(to_char(a.comments),'EMP1',1,1))+4) from temp1 where workflow_user='EMP1';

and Chris said...

I'm not clear what output you want. Do you want a separate line for each section starting EMPn and finishing at the next empn?

If so, you need a row generator to create the lines:

with rws as (
  select level lev from dual
  connect by level <= (select max(regexp_count(comments, 'EMP1')) from temp)
)
  select * from rws;


Then cross join this with the table. You can then get pass the level (lev) to substr/instr to return the Nth occurrence:

create table temp (
  comments clob
);
insert into temp values (
  '*** EMP1 - 02/15/16 09:32 PST ***Sallary Approved - emp1EMP2***** EMP2 - 02/19/16 07:59 ***What is the value of raise. *** EMP2 - 02/19/16 7:59 PST ***confidential***** EMP1 - 02/23/16 21:31 PST ***Details can be shared - emp1'
);

with rws as (
  select level lev from dual
  connect by level <= (select max(regexp_count(comments, 'EMP1')) from temp)
)
  select substr ( 
           to_char ( comments ) ,
           instr ( to_char ( a.comments ) ,'EMP1',1, lev ) , 
           ( instr ( to_char ( a.comments ) ,'emp1',1, lev ) - 
             instr ( to_char ( a.comments ) ,'EMP1',1, lev ) )+4 
         ) 
  from   rws, temp a;

SUBSTR(TO_CHAR(COMMENTS),INSTR(TO_CHAR(A.COMMENTS),'EMP1',1,LEV),(INSTR(TO_CHAR(A.COMMENTS),'EMP1',1,LEV)-INSTR(TO_CHAR(A.COMMENTS),'EMP1',1,LEV))+4)          
----------------------------------------------------------------------------------------------------------------------------------------------------------------
EMP1 - 02/15/16 09:32 PST ***Sallary Approved - emp1                                                                                                            
EMP1 - 02/23/16 21:31 PST ***Details can be shared - emp1  


Note: in your example the first entry finished with Emp1 instead of emp1. I'm not sure whether this is intentional or not. I've gone with all lowercase for simplicity.

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

thanks a lot..

Nagaraj Hegde, March 08, 2016 - 4:17 pm UTC

Thanks much. This works...I wanted in separate line and this results looks fine with me...

This helped thank you

Praseetha, December 02, 2020 - 6:31 pm UTC

This thread helped me make good progress on the issue I was facing. Thank you Chris
Chris Saxon
December 03, 2020 - 8:31 am UTC

You're welcome

I need help on same thing

Anuja, August 20, 2021 - 11:55 am UTC

My code is as follows:
with rws as (
select level lev from dual
connect by level <= (select regexp_count(comments, 'INFO:') from temp where id='1111111111111'
)
)
select substr (
DBMS_LOB.SUBSTR ( comments,4000,1 ) ,
instr ( DBMS_LOB.SUBSTR ( a.comments,4000,1) ,'INFO:',1, lev ) ,
( instr ( DBMS_LOB.SUBSTR ( a.comments,4000,1 ) ,'TYPE:',1, lev ) -
instr ( DBMS_LOB.SUBSTR ( a.comments,4000,1 ) ,'INFO:',1,lev ) )+10
)
from rws, temp a where system_id='id='1111111111111';

and data is like as below in comment column which is clob type

Sample
match
pattrn
INFO:
u59043u;bvbaugdkn
TYPE:
.
.
.
Sample
match
pattrn
INFO:
ncsdkfw34y38u;
TYPE:

and many more like this pattern...



Here I want to extract the line which is between INFO: and TYPE: for eg. ncsdkfw34y38u;

I used above code and manage to get 5 such line from 16 as my table have 16 unique line for above mention id.

Can you please help me to get result?

I am attaching output here:

1. INFO: u59043u;bvbaugdkn TYPE:
2. INFO: ncsdkfw34y38ughjj TYPE:
3. INFO: ncsdkfw34y38uxsrt TYPE:
4. INFO: ncsdkfw34y38unjhi TYPE:
5. INFO: ncsdkfw34y38ucfdd TYPE:
6. Sample
7. Sample
8. Sample
9. Sample
10. Sample
11. Sample
12. Sample
13. Sample
14. Sample
15. Sample
16. Sample

Not sure why but its starting search from top and not from INFO: from row 6.
Can you please help me with this?

Connor McDonald
August 23, 2021 - 2:19 am UTC

Can we get a full text case please (with table population scripts )

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here