Sorted input:
select regexp_replace( 'three0, three, two, two, two2, two3, three, three, three, one1, one1', '([^,]+)(,\1)+(,|$)', '\1\3') from dual;
three0, three, two, two2, two3, three, one1
unSorted input:
select regexp_replace( 'three0, three, two, two2, two3, three, three, three, one1, one1, two', '([^,]+)(,\1)+(,|$)', '\1\3') from dual;
three0, three, two, two2, two3, three, one1, two
How can I get a unique result if the input list is not sorted?
I don't know of a way to do this using regular expressions without making many passes over your data.
For example, the following has two duplicates:
one,two,two,one
Stripping out the extras leaves:
one,two
But in order to match the final "one" with the first, you also need to match the twos in-between. But now you're at the end of the string!
So you need to go back to the start to match the second "two" with the first.
Instead, split your string up using your favourite CSV-to-rows method. Then glue the distinct results of this back together with listagg:
with str as (
select 'three0, three, two, two, two2, two3, three, three, three, one1, one1' s
from dual
), rws as (
select distinct trim(regexp_substr(s, '[^,]+', 1, r.rn)) s
from str, lateral (
select level rn from dual
connect by level <= length (regexp_replace(s, '[^,]+')) + 1
) r
)
select listagg(s, ',') within group (order by s) dist
from rws;
DIST
one1,three,three0,two,two2,two3