Skip to Main Content
  • Questions
  • Need help with regular expression for address replacements

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Murthy.

Asked: March 31, 2021 - 11:40 am UTC

Last updated: April 01, 2021 - 1:36 pm UTC

Version: 12c

Viewed 1000+ times

You Asked

I have a recursive query as below:

with addresses as
(
  select cust_id,address addr from
  (
  select 10 cust_id,'9 Help Street, Level 4' address from dual union all
  select 11 cust_id,'22 Victoria Street' address from dual union all
  select 12 cust_id,'1495 Franklin Str.' address from dual union all
  select 13 cust_id,'30 Hasivim St.,Petah-Tikva' address from dual union all
  select 14 cust_id,'2 Jakaranda St' address from dual union all
  select 15 cust_id,'61, Science Park Rd' address from dual union all
  select 16 cust_id,'61, Social park road.' address from dual union all
  select 17 cust_id,'Av. Hermanos Escobar 5756' address from dual union all
  select 18 cust_id,'Ave. Hermanos Escobar 5756' address from dual union all
  select 19 cust_id,'8000 W FLORISSANT Ave.' address from dual union all
  select 20 cust_id,'8600 MEMORIAL PKWY SW' address from dual union all
  select 21 cust_id,'8200 FLORISSANTMEMORIALWAYABOVE SW' address from dual union all
  select 22 cust_id,'8600 MEMORIALFLORISSANT PKWY SW.' address from dual
  ) t1
),
 replacements as
  (
  select id,to_str,from_string from_str from
  (
  select 1 id,'St' to_str,'Street' from_string from dual union all
 select 2 id,'St' to_str,'St' from_string from dual union all
 select 3 id,'St' to_str,'Strit' from_string from dual union all
 select 4 id,'St' to_str,'Str' from_string from dual union all
 select 5 id,'Rd' to_str,'Rd.' from_string from dual union all
 select 6 id,'Rd' to_str,'road' from_string from dual union all
 select 7 id,'Av' to_str,'Av.' from_string from dual union all
 select 8 id,'Av' to_str,'Ave' from_string from dual union all
 select 9 id,'Av' to_str,'Avenue' from_string from dual union all
 select 10 id,'Av' to_str,'Aven.' from_string from dual union all
 select 11 id,'West' to_str,'W' from_string from dual union all
 select 12 id,'South West' to_str,'SW' from_string from dual
 ) t2
),
 r(cust_id,addr,test_addr,l) as 
 (
   select cust_id,addr,regexp_replace(addr,'(^|\W)' || from_str || '(\W|$)','\1' || to_str || '\2') test_addr,
     id - 1
     from  
     addresses,
     replacements
     where id = (select count(*) from replacements)
   union all
     select cust_id,addr,regexp_replace(test_addr,'(^|\W)' || from_str || '(\W|$)','\1' || to_str || '\2') test_addr,
     l - 1
     from r,
     replacements
     where id = l
  )
 select cust_id,addr,test_addr,l
 from r
 where l=0
 ;


PRESENT OUTPUT:

cust_id addr test_addr
10 9 Help Street, Level 4          9 Help St, Level 4
11 22 Victoria Street                   22 Victoria St
12 1495 Franklin Str.                   1495 Franklin St.
13 30 Hasivim St.,Petah-Tikva    30 Hasivim St.,Petah-Tikva
14 2 Jakaranda St                    2 Jakaranda St
15 61, Science Park Rd 61, Science Park Rd
16 61, Social park road. 61, Social park Rd
17 Av. Hermanos Escobar 5756 Av Hermanos Escobar 5756
18 Ave. Hermanos Escobar 5756 Av Hermanos Escobar 5756
19 8000 W FLORISSANT Ave. 8000 West FLORISSANT Av
20 8600 MEMORIAL PKWY SW 8600 MEMORIAL PKWY South West
21 8200 FLORISSANTMEMORIALWAYABOVE SW 8200 FLORISSANTMEMORIALWAYABOVE South West
22 8600 MEMORIALFLORISSANT PKWY SW. 8600 MEMORIALFLORISSANT PKWY South West.


Query not working as expected for cust_id like 16,18,22. in cust_id dot is there after road but still it changes to Rd.

I need 2 queries...one with exact match including dot and the other sql having match with dot or without dot.

The expect out put for first sql with exact match:

cust_id addr test_addr
10 9 Help Street, Level 4 9 Help St, Level 4
11 22 Victoria Street 22 Victoria St
12 1495 Franklin Str. 1495 Franklin St.
13 30 Hasivim St.,Petah-Tikva 30 Hasivim St.,Petah-Tikva
14 2 Jakaranda St 2 Jakaranda St
15 61, Science Park Rd 61, Science Park Rd
16 61, Social park road. 61, Social park road.
17 Av. Hermanos Escobar 5756 Av Hermanos Escobar 5756
18 Ave. Hermanos Escobar 5756 Ave. Hermanos Escobar 5756
19 8000 W FLORISSANT Ave. 8000 West FLORISSANT Ave.
20 8600 MEMORIAL PKWY SW 8600 MEMORIAL PKWY South West
21 8200 FLORISSANTMEMORIALWAYABOVE SW 8200 FLORISSANTMEMORIALWAYABOVE South West
22 8600 MEMORIALFLORISSANT PKWY SW. 8600 MEMORIALFLORISSANT PKWY SW.


The expected output with second sql including dot if exists:

cust_id addr test_addr
10 9 Help Street, Level 4 9 Help St, Level 4
11 22 Victoria Street 22 Victoria St
12 1495 Franklin Str. 1495 Franklin St.
13 30 Hasivim St.,Petah-Tikva 30 Hasivim St.,Petah-Tikva
14 2 Jakaranda St 2 Jakaranda St
15 61, Science Park Rd 61, Science Park Rd
16 61, Social park road. 61, Social park Rd
17 Av. Hermanos Escobar 5756 Av Hermanos Escobar 5756
18 Ave. Hermanos Escobar 5756 Av Hermanos Escobar 5756
19 8000 W FLORISSANT Ave. 8000 West FLORISSANT Av
20 8600 MEMORIAL PKWY SW 8600 MEMORIAL PKWY South West
21 8200 FLORISSANTMEMORIALWAYABOVE SW 8200 FLORISSANTMEMORIALWAYABOVE South West
22 8600 MEMORIALFLORISSANT PKWY SW. 8600 MEMORIALFLORISSANT PKWY South West.


Thanks

and Chris said...

The problem lies in the recursive nature of the replacements.

The replacement data maps

road => Rd
Rd. => Rd

So as the query works through the replacements for cust 16, it'll make these changes:

61, Social park road. =>
61, Social park Rd. =>
61, Social park Rd

It's likely you'll get chains of replacements like this using recursive SQL giving unexpected results whatever you do. Plus it tries every substitution for every address, which is a waste of effort.

You can avoid these issues by:

- Splitting the source address into words
- Outer joining these to the replacement strings
- Using listagg to recombine these back into the address

Which looks something like:

with addresses as
(
  select cust_id,address addr from
  (
  select 10 cust_id,'9 Help Street, Level 4' address from dual union all
  select 11 cust_id,'22 Victoria Street' address from dual union all
  select 12 cust_id,'1495 Franklin Str.' address from dual union all
  select 13 cust_id,'30 Hasivim St.,Petah-Tikva' address from dual union all
  select 14 cust_id,'2 Jakaranda St' address from dual union all
  select 15 cust_id,'61, Science Park Rd' address from dual union all
  select 16 cust_id,'61, Social park road.' address from dual union all
  select 17 cust_id,'Av. Hermanos Escobar 5756' address from dual union all
  select 18 cust_id,'Ave. Hermanos Escobar 5756' address from dual union all
  select 19 cust_id,'8000 W FLORISSANT Ave.' address from dual union all
  select 20 cust_id,'8600 MEMORIAL PKWY SW' address from dual union all
  select 21 cust_id,'8200 FLORISSANTMEMORIALWAYABOVE SW' address from dual union all
  select 22 cust_id,'8600 MEMORIALFLORISSANT PKWY SW.' address from dual
  ) t1
),
 replacements as
  (
  select id,to_str,from_string from_str from
  (
  select 1 id,'St' to_str,'Street' from_string from dual union all
 select 2 id,'St' to_str,'St' from_string from dual union all
 select 3 id,'St' to_str,'Strit' from_string from dual union all
 select 4 id,'St' to_str,'Str' from_string from dual union all
 select 5 id,'Rd' to_str,'Rd.' from_string from dual union all
 select 6 id,'Rd' to_str,'road' from_string from dual union all
 select 7 id,'Av' to_str,'Av.' from_string from dual union all
 select 8 id,'Av' to_str,'Ave' from_string from dual union all
 select 9 id,'Av' to_str,'Avenue' from_string from dual union all
 select 10 id,'Av' to_str,'Aven.' from_string from dual union all
 select 11 id,'West' to_str,'W' from_string from dual union all
 select 12 id,'South West' to_str,'SW' from_string from dual
 ) t2
), words as (
  select cust_id, addr,
         regexp_substr ( addr, '[^ ]+', 1, rn ) wd, rn
  from   addresses, lateral (
    select level rn from dual
    connect by level <= regexp_count ( addr, ' ' ) + 1
  )
)
  select cust_id, addr,
         listagg ( nvl ( to_str, wd ), ' ' ) 
           within group ( order by rn ) new_addr
  from   words
  left   join replacements
  on     wd = from_str
  group  by cust_id, addr;
  
CUST_ID ADDR                                NEW_ADDR                                     
     10 9 Help Street, Level 4              9 Help Street, Level 4                        
     11 22 Victoria Street                  22 Victoria St                                
     12 1495 Franklin Str.                  1495 Franklin Str.                            
     13 30 Hasivim St.,Petah-Tikva          30 Hasivim St.,Petah-Tikva                    
     14 2 Jakaranda St                      2 Jakaranda St                                
     15 61, Science Park Rd                 61, Science Park Rd                           
     16 61, Social park road.               61, Social park road.                         
     17 Av. Hermanos Escobar 5756           Av Hermanos Escobar 5756                      
     18 Ave. Hermanos Escobar 5756          Ave. Hermanos Escobar 5756                    
     19 8000 W FLORISSANT Ave.              8000 West FLORISSANT Ave.                     
     20 8600 MEMORIAL PKWY SW               8600 MEMORIAL PKWY South West                 
     21 8200 FLORISSANTMEMORIALWAYABOVE SW  8200 FLORISSANTMEMORIALWAYABOVE South West    
     22 8600 MEMORIALFLORISSANT PKWY SW.    8600 MEMORIALFLORISSANT PKWY SW.  


If you want to replace strings with a trailing dot (and remove it), change the join criteria to:

on     ( wd = from_str or wd = from_str || '.' )

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Need help with regular expression for address replacements

Murthy, March 31, 2021 - 3:30 pm UTC

Chris, Thank you very much and appreciate your help.

Need help with regular expression for address replacements

Murthy, March 31, 2021 - 3:43 pm UTC

Chris, it is working for cust_id:10 and 16
expected output:
10 9 Help Street, Level 4 9 Help St, Level 4

16 61, Social park road. 61, Social park Rd.

could you please help?

Need help with regular expression for address replacements

Murthy, March 31, 2021 - 3:44 pm UTC

Chris, Type in earlier comment....

it is NOT working for cust_id:10 and 16
expected output:
10 9 Help Street, Level 4 9 Help St, Level 4

16 61, Social park road. 61, Social park Rd.

could you please help?
Chris Saxon
March 31, 2021 - 4:19 pm UTC

What exactly is the query you're using?

Need help with regular expression for address replacements

Murthy, March 31, 2021 - 4:42 pm UTC

with addresses as
(
select cust_id,address addr from
(
select 10 cust_id,'9 Help Street, Level 4' address from dual union all
select 11 cust_id,'22 Victoria Street' address from dual union all
select 12 cust_id,'1495 Franklin Str.' address from dual union all
select 13 cust_id,'30 Hasivim St.,Petah-Tikva' address from dual union all
select 14 cust_id,'2 Jakaranda St' address from dual union all
select 15 cust_id,'61, Science Park Rd' address from dual union all
select 16 cust_id,'61, Social park road.' address from dual union all
select 17 cust_id,'Av. Hermanos Escobar 5756' address from dual union all
select 18 cust_id,'Ave. Hermanos Escobar 5756' address from dual union all
select 19 cust_id,'8000 W FLORISSANT Ave.' address from dual union all
select 20 cust_id,'8600 MEMORIAL PKWY SW' address from dual union all
select 21 cust_id,'8200 FLORISSANTMEMORIALWAYABOVE SW' address from dual union all
select 22 cust_id,'8600 MEMORIALFLORISSANT PKWY SW.' address from dual
) t1
),
replacements as
(
select id,to_str,from_string from_str from
(
select 1 id,'St' to_str,'Street' from_string from dual union all
select 2 id,'St' to_str,'St' from_string from dual union all
select 3 id,'St' to_str,'Strit' from_string from dual union all
select 4 id,'St' to_str,'Str' from_string from dual union all
select 5 id,'Rd' to_str,'Rd.' from_string from dual union all
select 6 id,'Rd' to_str,'road' from_string from dual union all
select 7 id,'Av' to_str,'Av.' from_string from dual union all
select 8 id,'Av' to_str,'Ave' from_string from dual union all
select 9 id,'Av' to_str,'Avenue' from_string from dual union all
select 10 id,'Av' to_str,'Aven.' from_string from dual union all
select 11 id,'West' to_str,'W' from_string from dual union all
select 12 id,'South West' to_str,'SW' from_string from dual
) t2
), words as (
select cust_id, addr,
regexp_substr ( addr, '[^ ]+', 1, rn ) wd, rn
from addresses, lateral (
select level rn from dual
connect by level <= regexp_count ( addr, ' ' ) + 1
)
)
select cust_id, addr,
listagg ( nvl ( to_str, wd ), ' ' )
within group ( order by rn ) new_addr
from words
left join replacements
on wd = from_str
group by cust_id, addr;

can we use something like...
on on (substr(wd,instr(wd,to_string),length(to_string)) = from_str)
and add extra character like dot, comma etc after changing to to_string? can you throw some light here??


Chris Saxon
April 01, 2021 - 1:36 pm UTC

As I said in the original answer:

If you want to replace strings with a trailing dot (and remove it), change the join criteria to:

on ( wd = from_str or wd = from_str || '.' )

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.