Skip to Main Content
  • Questions
  • oracle SQL — Replace comma separated string with another set of values

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Rose.

Asked: July 10, 2016 - 11:07 am UTC

Last updated: July 14, 2016 - 12:10 am UTC

Version: 10 g

Viewed 10K+ times! This question is

You Asked

I have the following table:

B_ID  I_ID   R_ID
W00001 1234 1235,1237
B00001 1235 1236,1235
T00001 1236 1235,1235,1235
X00001 1237 1234,1236,1238
M00001 1238 1238



I need output like below using sql

B_ID    I_ID    New_ID
W00001  1234    B00001|X00001
B00001  1235    T00001|B00001
T00001  1236    B00001
X00001  1237    W00001|T00001|M00001
M00001  1238    M00001

Note: Comma's present in R_ID column should replaced with '|' delimitor

and Connor said...

OK - lets build this up piece by piece.

Note for you - see how I do create-table/insert statements...easier and quicker if you did them when posting a question...because its your data.

SQL> create table t ( b varchar2(10), i int, r varchar2(30));

Table created.

SQL> insert into t values ('W00001', 1234 ,'1235,1237');

1 row created.

SQL> insert into t values ('B00001', 1235 ,'1236,1235');

1 row created.

SQL> insert into t values ('T00001', 1236 ,'1235,1235,1235');

1 row created.

SQL> insert into t values ('X00001', 1237 ,'1234,1236,1238');

1 row created.

SQL> insert into t values ('M00001', 1238 ,'1238');

1 row created.



So first we use some code that lets us parse out a string


SQL>
SQL>
SQL> variable str varchar2(30)
SQL> exec :str := '1235,1237'

PL/SQL procedure successfully completed.

SQL>
SQL> select substr(:str,
  2                    nvl(lag(loc) over ( order by loc),0)+1,
  3                    loc-nvl(lag(loc) over ( order by loc),0)-1
  4                   ) list_as_rows
  5      from (
  6        select distinct (instr(:str||',',',',1,level)) loc
  7        from dual
  8        connect by level <= length(:str)-length(replace(:str,','))+1
  9        );

LIST_AS_ROWS
--------------------------------
1235
1237

2 rows selected.


We'll plug that into our data and apply that parsing to each row in the table


SQL>
SQL>
SQL> select b, r, column_value
  2  from t,
  3       table(cast(multiset(
  4                    select substr(t.r,
  5                                      nvl(lag(loc) over ( order by loc),0)+1,
  6                                      loc-nvl(lag(loc) over ( order by loc),0)-1
  7                                     ) list_as_rows
  8                        from (
  9                          select distinct (instr(t.r||',',',',1,level)) loc
 10                          from dual
 11                          connect by level <= length(t.r)-length(replace(t.r,','))+1
 12                          )
 13                    ) as sys.ODCIvarchar2LIST));

B          R                              COLUMN_VALUE
---------- ------------------------------ ------------------------------
W00001     1235,1237                      1235
W00001     1235,1237                      1237
B00001     1236,1235                      1236
B00001     1236,1235                      1235
T00001     1235,1235,1235                 1235
T00001     1235,1235,1235                 1235
T00001     1235,1235,1235                 1235
X00001     1234,1236,1238                 1234
X00001     1234,1236,1238                 1236
X00001     1234,1236,1238                 1238
M00001     1238                           1238

11 rows selected.



Now we can join back to the original table to pick up the 'B' values that we ultimately will want to concatenate into a list


SQL>
SQL>
SQL> with parsed_strings as (
  2    select b, r, to_number(column_value) extracted_i
  3    from t,
  4         table(cast(multiset(
  5                      select substr(t.r,
  6                                        nvl(lag(loc) over ( order by loc),0)+1,
  7                                        loc-nvl(lag(loc) over ( order by loc),0)-1
  8                                       ) list_as_rows
  9                          from (
 10                            select distinct (instr(t.r||',',',',1,level)) loc
 11                            from dual
 12                            connect by level <= length(t.r)-length(replace(t.r,','))+1
 13                            )
 14                      ) as sys.ODCIvarchar2LIST))
 15  )
 16  select p.*, t.b mapped_b
 17  from parsed_strings p,
 18       t
 19  where p.extracted_i = t.i;

B          R                              EXTRACTED_I MAPPED_B
---------- ------------------------------ ----------- ----------
W00001     1235,1237                             1235 B00001
W00001     1235,1237                             1237 X00001
B00001     1236,1235                             1236 T00001
B00001     1236,1235                             1235 B00001
T00001     1235,1235,1235                        1235 B00001
T00001     1235,1235,1235                        1235 B00001
T00001     1235,1235,1235                        1235 B00001
X00001     1234,1236,1238                        1234 W00001
X00001     1234,1236,1238                        1236 T00001
X00001     1234,1236,1238                        1238 M00001
M00001     1238                                  1238 M00001

11 rows selected.



Now if this was 11g, we could then trivially use LISTAGG to concatenate the list as below

SQL>
SQL>
SQL> with
  2  parsed_strings as (
  3    select b, r, i, to_number(column_value) extracted_i
  4    from t,
  5         table(cast(multiset(
  6                      select substr(t.r,
  7                                        nvl(lag(loc) over ( order by loc),0)+1,
  8                                        loc-nvl(lag(loc) over ( order by loc),0)-1
  9                                       ) list_as_rows
 10                          from (
 11                            select distinct (instr(t.r||',',',',1,level)) loc
 12                            from dual
 13                            connect by level <= length(t.r)-length(replace(t.r,','))+1
 14                            )
 15                      ) as sys.ODCIvarchar2LIST))
 16  ),
 17  raw_data as (
 18  select p.*, t.b mapped_b
 19  from parsed_strings p,
 20       t
 21  where p.extracted_i = t.i
 22  )
 23  select b, i, listagg(mapped_b,'|') within group ( order by extracted_i) b_list
 24  from raw_data
 25  group by b,i;

B                   I B_LIST
---------- ---------- ------------------------------
B00001           1235 B00001|T00001
M00001           1238 M00001
T00001           1236 B00001|B00001|B00001
W00001           1234 B00001|X00001
X00001           1237 W00001|T00001|M00001

5 rows selected.



But we're on 10g, so we have to use a slightly more convoluted method to get there.



SQL>
SQL> with
  2  parsed_strings as (
  3    select b, r, i, to_number(column_value) extracted_i
  4    from t,
  5         table(cast(multiset(
  6                      select substr(t.r,
  7                                        nvl(lag(loc) over ( order by loc),0)+1,
  8                                        loc-nvl(lag(loc) over ( order by loc),0)-1
  9                                       ) list_as_rows
 10                          from (
 11                            select distinct (instr(t.r||',',',',1,level)) loc
 12                            from dual
 13                            connect by level <= length(t.r)-length(replace(t.r,','))+1
 14                            )
 15                      ) as sys.ODCIvarchar2LIST))
 16  ),
 17  raw_data as (
 18  select p.*, t.b mapped_b
 19  from parsed_strings p,
 20       t
 21  where p.extracted_i = t.i
 22  )
 23  select b, i,
 24         ltrim(max(substr(sys_connect_by_path(mapped_b,','),1,200))
 25         keep (dense_rank last order by curr),',') as b_list
 26  from   (select b, i,
 27                 mapped_b,
 28                 row_number() over (partition by b, i order by mapped_b) as curr,
 29                 row_number() over (partition by b, i order by mapped_b) -1 as prev
 30          from   raw_data)
 31  group by b, i
 32  connect by prev = prior curr
 33  and b = prior b
 34  and i = prior i
 35  start with curr = 1;

B                   I B_LIST
---------- ---------- ------------------------------
B00001           1235 B00001,T00001
M00001           1238 M00001
T00001           1236 B00001,B00001,B00001
W00001           1234 B00001,X00001
X00001           1237 M00001,T00001,W00001

5 rows selected.

SQL>
SQL>




Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Awesome Solution

Pranav, July 11, 2016 - 7:29 pm UTC

You have considered so many possibilities great stuff to look at. Learnt something new.
Chris Saxon
July 12, 2016 - 1:30 am UTC

Thanks for the feedback

Can you explain?

AndyP, July 13, 2016 - 10:42 am UTC

Hi Connor,

On my 11.2.0.4 instance I get an error from your approach (same data setup):

select b, r, column_value
from t,
     table(cast(multiset(
                  select substr(t.r,
                                    nvl(lag(loc) over ( order by loc),0)+1,
                                    loc-nvl(lag(loc) over ( order by loc),0)-1
                                   ) list_as_rows
                      from (
                        select distinct (instr(t.r||',',',',1,level)) loc
                        from dual
                        connect by level <= length(t.r)-length(replace(t.r,','))+1
                        )
                  ) as sys.ODCIvarchar2LIST));

                        connect by level <= length(t.r)-length(replace(t.r,','))+1
                                                                       *
ERROR at line 11:
ORA-00904: "T"."R": invalid identifier


I'm thinking the reference is too deeply nested, though can't explain why it worked for you

Whereas, if I use the classic Tom Kyte substr/instr approach to splitting the csvs:

select t.b
      ,b.column_value token
  from (select b, r, ','||r||',' txt from t) t
      ,table ( cast ( multiset (
          select to_number (
                    substr (txt
                           ,instr (txt, ',', 1, level) + 1
                           ,instr (txt, ',', 1, level + 1) - instr (txt, ',', 1, level) - 1)) token
            from dual
         connect by level <= length(t.r) - length(replace(t.r,',','')) + 1
         ) as sys.odcinumberlist ) ) b
 order by 1,2
/

B               TOKEN
---------- ----------
B00001           1235
B00001           1236
M00001           1238
T00001           1235
T00001           1235
T00001           1235
W00001           1235
W00001           1237
X00001           1234
X00001           1236
X00001           1238


And using that approach we can listagg the values back together
Using order by rownum for the listagg appears to keep the original token order (possibly not guaranteed), which might be what the questioner wanted

with parsed_strings as
(
select b,r,i,to_number(column_value) extracted_i
  from (select b,i,r,','||r||',' txt from t) t
      ,table ( cast ( multiset (
          select to_number (
                    substr (txt
                           ,instr (txt, ',', 1, level) + 1
                           ,instr (txt, ',', 1, level + 1) - instr (txt, ',', 1, level) - 1)) token
            from dual
         connect by level <= length(t.r) - length(replace(t.r,',','')) + 1
         ) as sys.odcinumberlist ) ) b
)
,raw_data as
(
select p.*,t.b mapped_b
  from parsed_strings p,t
 where p.extracted_i=t.i
)
select b,i,substr(listagg(mapped_b,'|') within group (order by rownum),1,25) b_list
  from raw_data
 group by b,i
 order by b;

B                   I B_LIST
---------- ---------- -------------------------
B00001           1235 T00001|B00001
M00001           1238 M00001
T00001           1236 B00001|B00001|B00001
W00001           1234 B00001|X00001
X00001           1237 W00001|T00001|M00001



Connor McDonald
July 14, 2016 - 12:10 am UTC

Ah... I ran my demo on 12c (I dont have a 10g to play with). Looks like an improvement there (because I get the same error as you on 11g)

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.