Skip to Main Content
  • Questions
  • USING REGEXP_REPLACE to remove duplicates

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question.

Asked: February 28, 2018 - 2:33 pm UTC

Last updated: April 20, 2021 - 12:03 pm UTC

Version: 12c

Viewed 10K+ times! This question is

You Asked

Sorted input:
select regexp_replace( 'three0, three, two, two, two2, two3, three, three, three, one1, one1', '([^,]+)(,\1)+(,|$)', '\1\3') from dual;

three0, three, two, two2, two3, three, one1


unSorted input:
select regexp_replace( 'three0, three, two,  two2, two3, three, three, three, one1, one1, two', '([^,]+)(,\1)+(,|$)', '\1\3') from dual;

three0, three, two,  two2, two3, three, one1, two


How can I get a unique result if the input list is not sorted?



with LiveSQL Test Case:

and Chris said...

I don't know of a way to do this using regular expressions without making many passes over your data.

For example, the following has two duplicates:

one,two,two,one


Stripping out the extras leaves:

one,two


But in order to match the final "one" with the first, you also need to match the twos in-between. But now you're at the end of the string!

So you need to go back to the start to match the second "two" with the first.

Instead, split your string up using your favourite CSV-to-rows method. Then glue the distinct results of this back together with listagg:

with str as (
  select 'three0, three, two, two, two2, two3, three, three, three, one1, one1' s 
  from   dual
), rws as (
  select distinct trim(regexp_substr(s, '[^,]+', 1, r.rn)) s
  from   str, lateral (
    select level rn from dual 
    connect by level <= length (regexp_replace(s, '[^,]+')) + 1
  ) r
)
  select listagg(s, ',') within group (order by s) dist
  from   rws;

DIST                              
one1,three,three0,two,two2,two3 

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, February 28, 2018 - 6:28 pm UTC

Thanks Very Much. Never would have figured that out. The strings were a result of a previous listagg() that output duplicates.

ListAGG Unique

A reader, February 28, 2018 - 6:46 pm UTC

I tried the solution on my data instead of the string hoping it work on a column but it does not work.

I wanted to go from a listagg() result like this:

ColA ColB
AA774 AA258,AA774,GR481,GR718
AA774 AA774,CA576,GR481,PP619
AA774 AA774,GR554,PP733

To this:
ColA ColB
AA774 AA258,AA774,GR481,GR718,CA576,PP619,GR554,PP733

Connor McDonald
March 01, 2018 - 1:15 am UTC

There's a few suggested workarounds here

https://community.oracle.com/ideas/12533

and I'm directing you there so you can also Vote on the idea to implement native listagg-distinct

error

Nadeem, January 14, 2021 - 8:20 am UTC

I not working error is showing
Chris Saxon
January 14, 2021 - 10:19 am UTC

Exactly what is your query and what is the error?

Solution for 11g

John Taylor, April 20, 2021 - 10:59 am UTC

One of our databases is 11g and Tom's solution didn't work (got an error). I did this:

set serverout on
declare
vstr varchar2(4000) := 'dd,ss,dd,qq,ww,ss,dd';
type dups_type is table of varchar2(4000) index by binary_integer;
dups_list dups_type;
entries number := 0;
ventry varchar2(2000);
found number := 0;
begin
entries := regexp_count(vstr,',')+1;
for i in 1..entries loop
ventry := substr(vstr,1,instr(vstr,',')-1);
dups_list(i) := ventry;
vstr := substr(vstr,instr(vstr,',')+1);
end loop;
dups_list(entries) := vstr;
for i in 1..entries loop
found := 0;
for x in 1..entries loop
if dups_list(i) = dups_list(x) then
found := found + 1;
end if;
end loop;
if found > 1 then
dups_list(i) := null;
end if;
end loop;
vstr := null;
for i in 1..entries loop
if dups_list(i) is not null then
vstr := vstr||','||dups_list(i);
end if;
end loop;
vstr := substr(vstr,2);
dbms_output.put_line(vstr);
end;
/

That returned:

qq,ww,ss,dd


Chris Saxon
April 20, 2021 - 12:03 pm UTC

Thanks for sharing

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.