Skip to Main Content
  • Questions
  • SUBSTR - Dynamically spliting the string.

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, P.

Asked: July 11, 2019 - 12:31 pm UTC

Last updated: May 18, 2021 - 3:01 pm UTC

Version: 12c

Viewed 10K+ times! This question is

You Asked

Hi Tom


I need to split a string dynamically into 4 parts each of maximum 22 characters but if 22nd character is not the right break (space) it should consider the last space.

example :'1100 Crescent PkWay Apartment 101 suite 200'
22 characters would be "1100 Crescent PkWay Ap" then I need it as
first part : 1100 Crescent PkWay
second : Apartment 101 suite
third : 200
fourth : null

I tried taking the count of space using REGEXP_INSTR and looping but looking for a better approach.

Thanks
P

and Chris said...

Here's one way to approach this:

- Split the string into words, with a word per row
- Recombine them into strings up to 22 characters long, using the techniques to avoid listagg overflows. You can find these discussed at:

https://blogs.oracle.com/datawarehousing/managing-overflows-in-listagg
https://stewashton.wordpress.com/2019/07/10/making-lists/

Which looks something like:

with val as ( 
  select '1100 Crescent PkWay Apartment 101 suite 200' str
  from   dual
), words as (
  select regexp_substr(str, '[^ ]+', 1, level) w
  from   val
  connect by regexp_substr(str, '[^ ]+', 1, level) is not null
), grps as (
  select * from words 
  match_recognize (
    measures 
      match_number () grp,
      count ( text.* ) word#
    all rows per match
    pattern ( init text* )
    define 
      text as sum ( length ( w ) ) + count ( text.* ) <= 22
  )
)
  select listagg ( w, ' ' ) 
           within group ( order by word# ) split_strs
  from   grps
  group  by grp
  order  by grp;

SPLIT_STRS            
1100 Crescent PkWay    
Apartment 101 suite    
200  

Rating

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

P K, July 16, 2019 - 4:06 pm UTC

Thank You Tom !
This approach is simpler than complex looping on space count which I wrote.
Chris Saxon
July 16, 2019 - 4:43 pm UTC

:)

Order by !

Rajeshwaran Jeyabal, July 19, 2019 - 11:43 am UTC

we have this match_recognized without Order by clause - will it be deterministic ?
 select * from words 
  match_recognize (
    measures 
      match_number () grp,

Chris Saxon
July 19, 2019 - 2:08 pm UTC

Ah, good spot. No it isn't deterministic.

Trying same as a table output

Shiva, June 11, 2020 - 4:58 am UTC

Tom, i am trying the same example as an output of the query but the first value keeps repesting for me. Can you please help me ?

--create table
CREATE TABLE TEST_ADDRESS(ADDRESS VARCHAR2(100));
--add data
INSERT INTO TEST_ADDRESS VALUES ('1411 FOREST LANE FAIRBORN OH 45324');
INSERT INTO TEST_ADDRESS VALUES ('1100 CRESCENT PKWAY APARTMENT 101 SUITE 200');
--select query
SELECT (SELECT LISTAGG(W, ' ') WITHIN GROUP(ORDER BY WORD#) SPLIT_STRS
FROM (SELECT *
   FROM (SELECT REGEXP_SUBSTR(A.ADDRESS, '[^ ]+', 1, LEVEL) W
      FROM DUAL
      CONNECT BY REGEXP_SUBSTR(A.ADDRESS, '[^ ]+', 1, LEVEL) IS NOT NULL)
             WORDS MATCH_RECOGNIZE(MEASURES MATCH_NUMBER() GRP, COUNT(TEXT.*) WORD# ALL ROWS PER MATCH PATTERN(INIT TEXT *) DEFINE TEXT AS SUM(LENGTH(W)) + COUNT(TEXT.*) <= 50)) GRPS
GROUP BY GRP
ORDER BY GRP DESC
FETCH FIRST 1 ROWS ONLY) SPLIT_ADDR FROM TEST_ADDRESS A;

Chris Saxon
June 11, 2020 - 3:16 pm UTC

I hate it when things repest on me ;)

But seriously, I'm not sure what the issue is here. Could you clarify please?

shiva, June 12, 2020 - 4:50 am UTC

If i run the code below i get the first record repeated for every row.

--create table  
CREATE TABLE TEST_ADDRESS(ADDRESS VARCHAR2(100));  
--add data  
INSERT INTO TEST_ADDRESS VALUES ('1411 FOREST LANE FAIRBORN OH 45324');  
INSERT INTO TEST_ADDRESS VALUES ('1100 CRESCENT PKWAY APARTMENT 101 SUITE 200');  
--select query  
SELECT (SELECT LISTAGG(W, ' ') WITHIN GROUP(ORDER BY WORD#) SPLIT_STRS  
FROM (SELECT *  
FROM (SELECT REGEXP_SUBSTR(A.ADDRESS, '[^ ]+', 1, LEVEL) W  
FROM DUAL  
CONNECT BY REGEXP_SUBSTR(A.ADDRESS, '[^ ]+', 1, LEVEL) IS NOT NULL)  
             WORDS MATCH_RECOGNIZE(MEASURES MATCH_NUMBER() GRP, COUNT(TEXT.*) WORD# ALL ROWS PER MATCH PATTERN(INIT TEXT *) DEFINE TEXT AS SUM(LENGTH(W)) + COUNT(TEXT.*) <= 25)  
) GRPS  
GROUP BY GRP  
ORDER BY GRP  
FETCH FIRST 1 ROWS ONLY) FIRST_PART FROM TEST_ADDRESS A;  


Below is my output.

FIRST_PART
----------------------------------------------------------------
1411 FOREST LANE FAIRBORN
1411 FOREST LANE FAIRBORN
Chris Saxon
June 12, 2020 - 7:43 am UTC

That's not what I see...

SQL> SELECT (SELECT LISTAGG(W, ' ') WITHIN GROUP(ORDER BY WORD#) SPLIT_STRS  
  2  FROM (SELECT *  
  3  FROM (SELECT REGEXP_SUBSTR(A.ADDRESS, '[^ ]+', 1, LEVEL) W  
  4  FROM DUAL  
  5  CONNECT BY REGEXP_SUBSTR(A.ADDRESS, '[^ ]+', 1, LEVEL) IS NOT NULL)  
  6               WORDS MATCH_RECOGNIZE(MEASURES MATCH_NUMBER() GRP, COUNT(TEXT.*) WORD# ALL ROWS PER MATCH PATTERN(INIT TEXT *) DEFINE TEXT AS SUM(LENGTH(W)) + COUNT(TEXT.*) <= 25)  
  7  ) GRPS  
  8  GROUP BY GRP  
  9  ORDER BY GRP  
 10  FETCH FIRST 1 ROWS ONLY) FIRST_PART 
 11  FROM TEST_ADDRESS A;

FIRST_PART                  
1411 FOREST LANE FAIRBORN    
1100 CRESCENT PKWAY  

shiva, June 12, 2020 - 8:29 am UTC

I see the difference now Tom. The issue exists in 12.2 version. Its working fine in 19 version.
Chris Saxon
June 12, 2020 - 8:54 am UTC

Thanks for letting us know, glad you figured this out.

SQL-Not working for large text having > 4000 characters

Shyam baviskar, May 17, 2021 - 12:55 pm UTC

Hi Tom,
I have table like below
CREATE TABLE T1(CALL_ID NUMBER(10), CALL_NOTE(VARCHAR2(4000)))

INSERT INTO T1
VALUES(1, '***4000 Characters length string****')

When I used same logic then it's running longer than usual.
Could you please help to optimize this query?
Chris Saxon
May 17, 2021 - 2:14 pm UTC

Please share a complete example of the code you're running

SQL-Not working for large text having > 4000 characters

Shyam baviskar, May 17, 2021 - 2:38 pm UTC

Here is the query I am using,

SELECT DISTINCT CALL_ID,
'T' AS CALL_TYPE,
LEVEL-1 AS CALL_SEQ_NO,
SUBSTR(TRIM(CALL_NOTE),1+((level-1)*255),255) AS CALL_TEXT
FROM T1
CONNECT BY LEVEL <= LENGTH(SUBSTR(TRIM(CALL_NOTE),1+((LEVEL-1)*255),255));
Chris Saxon
May 17, 2021 - 3:01 pm UTC

This is fast for me:

CREATE TABLE T1(CALL_ID NUMBER(10), CALL_NOTE VARCHAR2(4000));

INSERT INTO T1
VALUES(1, lpad ( '***4000 Characters length string****', 4000, '*' ) );

set timing on

SELECT DISTINCT CALL_ID,
       'T' AS CALL_TYPE,
       LEVEL-1 AS CALL_SEQ_NO,
       SUBSTR(TRIM(CALL_NOTE),1+((level-1)*255),255) AS CALL_TEXT
FROM T1
CONNECT BY LEVEL <= LENGTH(SUBSTR(TRIM(CALL_NOTE),1+((LEVEL-1)*255),255)); 

Elapsed: 00:00:00.062


DISTINCT is unnecessary; this adds an extra sort which can slow things down. Remove it.

So what exactly does "slow" mean? Please provide a representative test case. Something we can copy and see similar results to you.

SQL-Not working for large text having > 4000 characters

Shyam baviskar, May 17, 2021 - 2:41 pm UTC

Using the above query, I am trying to split and create new row after each 255 characters.

SQL-Not working for large text having > 4000 characters

Shyam baviskar, May 17, 2021 - 4:55 pm UTC

Hi Tom,
I found the issue in string which is having '&' and single quote (') special characters which oracle internally consider them as Bind variables...
Now finding out the solution to consider both special characters.
Chris Saxon
May 18, 2021 - 3:01 pm UTC

To avoid the client interpreting & as a substitution variable, run:

set define off


And to allow quotes within a string, use the q operation:

select q'[you're able to within this]' from dual


The query is still fast when running for me though:

CALL_ID   CALL_SEQ_NO   CALL_TEXT                                                                                                                                                                                                                                                         
         1              0 Eligibility Message: SUBSCRIBER LAST NAME: XXXXXX. EVC #: 123456789 and CNTY CODE: 19. PRMY AID CODE: 3R. MEDI-CAL ELIGIBLE W/ NO SOC/SPEND DOWN. HEALTH PLAN MEMBER: PHP-L.A. CARE HLTH PLAN: MEDICAL CALL (xxx)xxx-1234. HCP: CARE FIRST CALL: (123) 123-1234    
         1              1 . PCP: BARTZ - ALTADONNA COMMUN CALL: (123)123-1234 TRANSFER TO CS FOR FACETS REINSTATEMENT,Home Phone NO RECORDED ELIGIBILITY FOR REQUESTED DATE OF SERVICE 05/12/2020.,Home Phone 123456789,00T0H00005PHJjIUAX,xxxxxxxx,12-May-20 10.19.35.000000000 AM,xxxxx    
         1              2 xx,04-Feb-21 09.10.25.000000000 PM,TURNER - Retention Call,Retention Call On Hold,Outbound Call,Pending,(05/12/20 @ 09:49AM - PER AEVS: NRE) - CONNECTED MBR TO DPSS FOR REINSTATEMENT ASSISTANCE. SHE RCVD A NOA STATING RDTERM WAS APPROVED (RDTERM DUE 4/202    
         1              3 0). EDUCATED AND LEFT HER ON THE LINE. MBR UNDERSTOO. *TERMED 4/30/20* CALLER: XXXXXXXX XXXXXX / CASE# B123VC146 ................................................................................. (06/05/20 @ 01:43PM - PER AEVS: NRE) - PER MBR SPOKE TO DPSS    
         1              4  WKR ON WED AND MC WAS MANUALLY APPROVED AND WAS ADVISED TO F/U IN 3DAYS. MBR UNDERSTOOD. ................................................................................... (06/09/20 @ 10:44AM - PER AEVS: NRE) - PER MBR ASKED FOR A C/B SINCE SHE WAS ON T    
         1              5 HE LINE WITH DPSS ALREADY F/U ON CASE. ................................................................................... (06/10/20 @ 10:07AM - PER AEVS: NRE) - PER MBR WAS TOLD A MSSG WAS GOING TO BE SENT TO SUPEVISOR AS URGENT. WKR HAS ALL DAY TODAY TO    
         1              6  RETURN CALL.,Home Phone 123456789,00T0H00005PHKMjUAP,xxxxxxxx,12-May-20 10.24.16.000000000 AM,xxxxxxxx,04-Feb-21 09.10.58.000000000 PM,MOORE - Retention Call,Medi-Cal On Hold,Outbound Call,Wrong Phone Number,05/12/20 @ 10:19AM (AEVS-NRE AS OF 04-30-20) W    
         1              7 RONG PH # TRANSFER TO EXT # 1234567,Home Phone 123456789,00T0H00005PHKVTUA5,xxxxxxxxxx,12-May-20 10.49.18.000000000 AM,xxxxxxxx,04-Feb-21 09.12.21.000000000 PM,XXXXXXX - Retention Call,Retention Call On Hold,Outbound Call,Left Message,(05/12/20 @ 10:30AM)    
         1              8  VERIFIED AEVS, MEDICAL NOT ACTIVE AS OF 05/01/2020. CALLED LEFT MESSAGE TO XXXXXXX, ISSUED HIM MY CALL BACK NUMBER.,Home Phone 123456789,00T0H00005PHKlVUAX,xxxxxxxxxx,12-May-20 10.46.52.000000000 AM,xxxxxxx,04-Feb-21 09.12.21.000000000 PM,XXXXXXXXX - Ret    
         1              9 ention Call,Retention Call On Hold,Outbound Call,Left Message,(05/12/20 @ 10:30AM) VERIFIED AEVS, MEDICAL NOT ACTIVE AS OF 05/01/2020. CALLED LEFT MESSAGE TO XXXXXXXX, ISSUED HIM MY CALL BACK NUMBER.,Home Phone 123456789,00T0H00005PHKs0UAH,xxxxxxxx,12-May    
         1             10 -20 10.36.34.000000000 AM,xxxxxxx,04-Feb-21 09.10.25.000000000 PM,XXXXX - Retention Call,Retention Call,Outbound Call,Closed,(05/12/20 @ 10:27AM - PER AEVS: FFS) - PENDING HCO/PROCESSED *MC JUST APPROVED TODAY. REP: XXXXXXX EFF: 15-45DAYS PCP: xxxxxxxx HE    
         1             11 ALTH COMM SUBSCRIBER LAST NAME: XXXXX . EVC #: 11111PT4LQ. CNTY CODE: xx1. PRMY AID CODE: M3. MEDI-CAL ELIGIBLE W/ NO SOC/SPEND DOWN.,Home Phone                                                                                                                   

12 rows selected. 

Elapsed: 00:00:00.068

SQL-Not working for large text having > 4000 characters

Shyam baviskar, May 18, 2021 - 5:26 am UTC

Hi Tom,
I have removed distinct and replace the '&' with 'and' but still it is running longer than expected.
I have 15 records in my table with CALL_NOTE field having 4000 characters string.
Any suggestion on to execute the query in faster way?
Here is my sample CALL_NOTE text:
'Eligibility Message: SUBSCRIBER LAST NAME: XXXXXX. EVC #: 123456789 and CNTY CODE: 19. PRMY AID CODE: 3R. MEDI-CAL ELIGIBLE W/ NO SOC/SPEND DOWN. HEALTH PLAN MEMBER: PHP-L.A. CARE HLTH PLAN: MEDICAL CALL (xxx)xxx-1234. HCP: CARE FIRST CALL: (123) 123-1234. PCP: BARTZ - ALTADONNA COMMUN CALL: (123)123-1234 TRANSFER TO CS FOR FACETS REINSTATEMENT,Home Phone NO RECORDED ELIGIBILITY FOR REQUESTED DATE OF SERVICE 05/12/2020.,Home Phone 123456789,00T0H00005PHJjIUAX,xxxxxxxx,12-May-20 10.19.35.000000000 AM,xxxxxxx,04-Feb-21 09.10.25.000000000 PM,TURNER - Retention Call,Retention Call On Hold,Outbound Call,Pending,(05/12/20 @ 09:49AM - PER AEVS: NRE) - CONNECTED MBR TO DPSS FOR REINSTATEMENT ASSISTANCE. SHE RCVD A NOA STATING RDTERM WAS APPROVED (RDTERM DUE 4/2020). EDUCATED AND LEFT HER ON THE LINE. MBR UNDERSTOO. *TERMED 4/30/20* CALLER: XXXXXXXX XXXXXX / CASE# B123VC146 ................................................................................. (06/05/20 @ 01:43PM - PER AEVS: NRE) - PER MBR SPOKE TO DPSS WKR ON WED AND MC WAS MANUALLY APPROVED AND WAS ADVISED TO F/U IN 3DAYS. MBR UNDERSTOOD. ................................................................................... (06/09/20 @ 10:44AM - PER AEVS: NRE) - PER MBR ASKED FOR A C/B SINCE SHE WAS ON THE LINE WITH DPSS ALREADY F/U ON CASE. ................................................................................... (06/10/20 @ 10:07AM - PER AEVS: NRE) - PER MBR WAS TOLD A MSSG WAS GOING TO BE SENT TO SUPEVISOR AS URGENT. WKR HAS ALL DAY TODAY TO RETURN CALL.,Home Phone 123456789,00T0H00005PHKMjUAP,xxxxxxxx,12-May-20 10.24.16.000000000 AM,xxxxxxxx,04-Feb-21 09.10.58.000000000 PM,MOORE - Retention Call,Medi-Cal On Hold,Outbound Call,Wrong Phone Number,05/12/20 @ 10:19AM (AEVS-NRE AS OF 04-30-20) WRONG PH # TRANSFER TO EXT # 1234567,Home Phone 123456789,00T0H00005PHKVTUA5,xxxxxxxxxx,12-May-20 10.49.18.000000000 AM,xxxxxxxx,04-Feb-21 09.12.21.000000000 PM,XXXXXXX - Retention Call,Retention Call On Hold,Outbound Call,Left Message,(05/12/20 @ 10:30AM) VERIFIED AEVS, MEDICAL NOT ACTIVE AS OF 05/01/2020. CALLED LEFT MESSAGE TO XXXXXXX, ISSUED HIM MY CALL BACK NUMBER.,Home Phone 123456789,00T0H00005PHKlVUAX,xxxxxxxxxx,12-May-20 10.46.52.000000000 AM,xxxxxxx,04-Feb-21 09.12.21.000000000 PM,XXXXXXXXX - Retention Call,Retention Call On Hold,Outbound Call,Left Message,(05/12/20 @ 10:30AM) VERIFIED AEVS, MEDICAL NOT ACTIVE AS OF 05/01/2020. CALLED LEFT MESSAGE TO XXXXXXXX, ISSUED HIM MY CALL BACK NUMBER.,Home Phone 123456789,00T0H00005PHKs0UAH,xxxxxxxx,12-May-20 10.36.34.000000000 AM,xxxxxxx,04-Feb-21 09.10.25.000000000 PM,XXXXX - Retention Call,Retention Call,Outbound Call,Closed,(05/12/20 @ 10:27AM - PER AEVS: FFS) - PENDING HCO/PROCESSED *MC JUST APPROVED TODAY. REP: XXXXXXX EFF: 15-45DAYS PCP: xxxxxxxx HEALTH COMM SUBSCRIBER LAST NAME: XXXXX . EVC #: 11111PT4LQ. CNTY CODE: xx1. PRMY AID CODE: M3. MEDI-CAL ELIGIBLE W/ NO SOC/SPEND DOWN.,Home Phone'

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.