Skip to Main Content
  • Questions
  • String-splitting with not all values present

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Anonymous.

Asked: January 23, 2024 - 1:20 pm UTC

Last updated: January 25, 2024 - 2:54 pm UTC

Version: 12.1.0.2

Viewed 1000+ times

You Asked

Hello. This is a variation on the string-splitting examples herein.

In this case, I have strings in a column with a consistent delimiter (comma in this example),
but not all values are present.

I would like the values to be split into separate columns, which is why I have not tried to employ
the example code to do with turning these values into rows.

This does not work as it omits the null values, which I need:

WITH input_string AS (
  SELECT 'Jagjit,,Jane,Harinder,,Alice' AS my_string FROM dual
)
SELECT 
  REGEXP_SUBSTR(my_string, '[^,]+', 1, 1)  Name1
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 2)  Name2
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 3)  Name3
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 4)  Name4
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 5)  Name5
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 6)  Name6
FROM input_string
;

NAME1  NAME NAME3    NAME4 NAME5 NAME6
------ ---- -------- ----- ----- -----
Jagjit Jane Harinder Alice            



I can get this method to work by artificially inserting a character in the null strings such as in this case:

WITH input_string AS (
  SELECT 'Jagjit, ,Jane,Harinder, ,Alice' AS my_string FROM dual
)
SELECT 
  TRIM(REGEXP_SUBSTR(my_string, '[^,]+', 1, 1))  Name1
, TRIM(REGEXP_SUBSTR(my_string, '[^,]+', 1, 2))  Name2
, TRIM(REGEXP_SUBSTR(my_string, '[^,]+', 1, 3))  Name3
, TRIM(REGEXP_SUBSTR(my_string, '[^,]+', 1, 4))  Name4
, TRIM(REGEXP_SUBSTR(my_string, '[^,]+', 1, 5))  Name5
, TRIM(REGEXP_SUBSTR(my_string, '[^,]+', 1, 6))  Name6
FROM input_string
;

NAME1  NAME2 NAME NAME4    NAME5 NAME6
------ ----- ---- -------- ----- -----
Jagjit       Jane Harinder       Alice



However, I am sure there is a better way!
I just cannot figure it out.

I do realise that the '[^,]' means look for the absence of the delimiter.


Thanks in advance.


and Chris said...

There probably is a clever regex you can write that matches either double commas or a non-comma string. Honestly I think you're approach of replacing double delimiters with delimiter space delimiter is easier to understand and maintain.

I'd do this by REPLACEing the input string in a subquery like so:

WITH input_string AS (
  SELECT replace ( 'Jagjit,,Jane,Harinder,,Alice', ',,', ', ,' ) AS my_string FROM dual
)
SELECT 
  REGEXP_SUBSTR(my_string, '[^,]+', 1, 1)  Name1
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 2)  Name2
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 3)  Name3
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 4)  Name4
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 5)  Name5
, REGEXP_SUBSTR(my_string, '[^,]+', 1, 6)  Name6
FROM input_string;

/*
NAME1  N NAME NAME4    N NAME6
------ - ---- -------- - -----
Jagjit   Jane Harinder   Alice
*/


The values are a space instead of null; if this matters you can always replace the final values at the end.

Rating

  (7 ratings)

Comments

Standard regexp approach to split strings with "missing" values

mathguy, January 24, 2024 - 2:26 pm UTC

The standard regexp solution is to look for substrings of ZERO or more non-delimiters (instead of ONE or more), followed by a comma or the end-of-string. This last part is the only subtle bit (and the only reason for the more complicated regexp, using backreferences); without it, [^,]* would also match empty substrings after each nonempty one.

with
  input_string (my_string) as (
    select 'Jagjit,,Jane,Harinder,,Alice'
    from    dual
  )
select regexp_substr(my_string, '([^,]*)(,|$)', 1, 1, null, 1) as name1,
       regexp_substr(my_string, '([^,]*)(,|$)', 1, 2, null, 1) as name2,
       regexp_substr(my_string, '([^,]*)(,|$)', 1, 3, null, 1) as name3,
       regexp_substr(my_string, '([^,]*)(,|$)', 1, 4, null, 1) as name4,
       regexp_substr(my_string, '([^,]*)(,|$)', 1, 5, null, 1) as name5,
       regexp_substr(my_string, '([^,]*)(,|$)', 1, 6, null, 1) as name6
from   input_string
;

NAME1     NAME2     NAME3     NAME4     NAME5     NAME6    
--------- --------- --------- --------- --------- ---------
Jagjit              Jane      Harinder            Alice   

Connor McDonald
January 25, 2024 - 5:59 am UTC

As they saying goes ... "If you have a problem, you can use regex.... Now you 2 problems" :-)

Nice code.

Weaknesses of "replace NULL with single space" approach

mathguy, January 24, 2024 - 2:40 pm UTC

@Chris - that method has one structural weakness, and the implementation is deficient as well.

Let's start with the implementation: substituting comma-space-comma for two consecutive commas will not account for NULL tokens right at the beginning of the string (that is: a string that begins with a comma, indicating the FIRST token is NULL), and also for NULL tokens right at the end. It will also not account for two consecutive NULL tokens in the string (like 'Alice,,,Dan,Emma,Flor') - a space will be added between the first and second comma, but not between the second and the third.

In all these cases, the output from the query, as written, will be wrong (there will be no error thrown to alert us to anything; the output will just be incorrect). They can be fixed, but the query becomes even more complicated.

The structural issue is more serious though. In the most general case, the tokens don't have to be "names", they can be any valid strings. If a single space is a valid NON-NULL token (a string of length 1), this approach will destroy that distinction. This can also be fixed - use a control character (like CHR(7), the system bell - assuming this is not a "valid" character in the inputs) as a marker for NULL - and then replace it back with "nothing" after the splitting.

regexp with subexpression

Jonathan Taylor, January 24, 2024 - 7:23 pm UTC

Using subexpression in regular expression:-
WITH input_string AS (
SELECT 'Jagjit,,Jane,Harinder,,Alice' AS my_string
, '([^,]*),?' AS regex --Part in brackets is the subexpression "any non comma character", followed by an optional comma
FROM dual
)
SELECT
--the final ,1 gets the subexpression inside commas (in other words remove the comma it matches).
REGEXP_SUBSTR(my_string, regex, 1, 1,'',1) Name1
, REGEXP_SUBSTR(my_string, regex, 1, 2,'',1) Name2
, REGEXP_SUBSTR(my_string, regex, 1, 3,'',1) Name3
, REGEXP_SUBSTR(my_string, regex, 1, 4,'',1) Name4
, REGEXP_SUBSTR(my_string, regex, 1, 5,'',1) Name5
, REGEXP_SUBSTR(my_string, regex, 1, 6,'',1) Name6
FROM input_string;


Alternatively, consider APEX_STRING

SELECT *
FROM APEX_STRING.split ('Jagjit,,Jane,Harinder,,Alice',',')


Connor McDonald
January 25, 2024 - 6:00 am UTC

nice stuff

A reader, January 25, 2024 - 8:35 am UTC

I found this method in AskTom while ago and been using it ever since. No regexp necessary:)

WITH input_strings AS
 (SELECT 'Jagjit,,Jane,Harinder,,Alice' input_string
    FROM dual),
strings AS
 (SELECT TRIM(substr(txt, instr(txt, ',', 1, LEVEL) + 1, instr(txt, ',', 1, LEVEL + 1) - instr(txt, ',', 1, LEVEL) - 1)) serving_result
    FROM (SELECT ',' || input_strings.input_string || ',' txt
            FROM input_strings)
        ,input_strings
  CONNECT BY LEVEL <= length(input_strings.input_string) - length(REPLACE(input_strings.input_string, ',', '')) + 1)
SELECT *
  FROM strings;

Chris Saxon
January 25, 2024 - 2:54 pm UTC

Yep, that works too; though you'll need to pivot the result to get them as columns (which is what the OP asked for)

With current database versions, try SQL/JSON

Stew Ashton, January 25, 2024 - 8:36 am UTC

I realize the question says version 12.1.0.2, but I hope anyone who has the same question using a more recent version will consider using SQL/JSON - as a first step toward replacing all those CSV strings with JSON arrays!
with data(x) as (
  select '' from dual union all
  select ',' from dual union all
  select ',,' from dual union all
  select ',,,' from dual union all
  select ',,c' from dual union all
  select ',,c,' from dual union all
  select ',b' from dual union all
  select ',b,' from dual union all
  select ',b,,' from dual union all
  select ',b,c' from dual union all
  select ',b,c,' from dual union all
  select 'a' from dual union all
  select 'a,' from dual union all
  select 'a,,' from dual union all
  select 'a,,,' from dual union all
  select 'a,,c' from dual union all
  select 'a,,c,' from dual union all
  select 'a,b' from dual union all
  select 'a,b,' from dual union all
  select 'a,b,,' from dual union all
  select 'a,b,c' from dual union all
  select 'a,b,c,' from dual union all
  select 'a,b,c,d' from dual
)
select nvl(x, '(null)') x, 
  nvl(a, '(null)') a,
  nvl(b, '(null)') b,
  nvl(c, '(null)') c,
  nvl(d, '(null)') d
from data, json_table(
  replace(json_array(x), ',', '","'), '$' columns
  a path '$[0]',
  b path '$[1]',
  c path '$[2]',
  d path '$[3]'
);

X          A         B         C         D         
(null)     (null)    (null)    (null)    (null)    
,          (null)    (null)    (null)    (null)    
,,         (null)    (null)    (null)    (null)    
,,,        (null)    (null)    (null)    (null)    
,,c        (null)    (null)    c         (null)    
,,c,       (null)    (null)    c         (null)    
,b         (null)    b         (null)    (null)    
,b,        (null)    b         (null)    (null)    
,b,,       (null)    b         (null)    (null)    
,b,c       (null)    b         c         (null)    
,b,c,      (null)    b         c         (null)    
a          a         (null)    (null)    (null)    
a,         a         (null)    (null)    (null)    
a,,        a         (null)    (null)    (null)    
a,,,       a         (null)    (null)    (null)    
a,,c       a         (null)    c         (null)    
a,,c,      a         (null)    c         (null)    
a,b        a         b         (null)    (null)    
a,b,       a         b         (null)    (null)    
a,b,,      a         b         (null)    (null)    
a,b,c      a         b         c         (null)    
a,b,c,     a         b         c         (null)    
a,b,c,d    a         b         c         d 

Best regards,
Stew

For older versions using INSTR and SUBSTR

Stew Ashton, January 25, 2024 - 1:19 pm UTC

A reader mentioned an INSTR&SUBSTR solution that generates one row per item. This is a way to get multiple columns on one line; maybe it's easier to understand than REGEX?
with data(x) as (
  select '' from dual union all
  select ',' from dual union all
  select ',,' from dual union all
  select ',,,' from dual union all
  select ',,c' from dual union all
  select ',,c,' from dual union all
  select ',b' from dual union all
  select ',b,' from dual union all
  select ',b,,' from dual union all
  select ',b,c' from dual union all
  select ',b,c,' from dual union all
  select 'a' from dual union all
  select 'a,' from dual union all
  select 'a,,' from dual union all
  select 'a,,,' from dual union all
  select 'a,,c' from dual union all
  select 'a,,c,' from dual union all
  select 'a,b' from dual union all
  select 'a,b,' from dual union all
  select 'a,b,,' from dual union all
  select 'a,b,c' from dual union all
  select 'a,b,c,' from dual union all
  select 'a,b,c,d' from dual
)
select x,
  substr(x,           1, after_a           - 1) a,
  substr(x, after_a + 1, after_b - after_a - 1) b,
  substr(x, after_b + 1, after_c - after_b - 1) c,
  substr(x, after_c + 1, after_d - after_c - 1) d
from (
  select x,
    instr(x||',', ',', 1, 1) after_a,
    instr(x||',', ',', 1, 2) after_b,
    instr(x||',', ',', 1, 3) after_c,
    instr(x||',', ',', 1, 4) after_d
  from data
);

Same results as above.

Regards,
Stew
Chris Saxon
January 25, 2024 - 2:52 pm UTC

Good suggestion, thanks Stew