Skip to Main Content
  • Questions
  • Need SQL for standardizing the addresses

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Murthy.

Asked: February 25, 2021 - 12:21 pm UTC

Last updated: February 26, 2021 - 4:54 pm UTC

Version: 12c

Viewed 1000+ times

You Asked

Hi Tom and Team,
I really appreciate your help for all of us.
We are in the process of standardizing the addresses of our customers.
I have a main t1omer table which contains the customers data.
I have a mapping table that contains mapping for final value to multiple values.
I need to get the final_value that is correspond to the multiple_values.
if the length of the address is more than 35 characters after standardizing,
then from previous space(' ') from right side to end of the string, it is created as a address2 field
could you please help with sql or pl/sql?
using Oracle 12c database.


customer Table:

cust_id address
10  9 Help Street, Level 4
11  22 Victoria Street
12  1495 Franklin Str
13  30 Hasivim St.,Petah-Tikva
14  2 Jakaranda St
15  61, Science Park Rd
16  61, Social park road
17  Av. Hermanos Escobar 5756
18  Ave. Hermanos Escobar 5756
19  8000 W FLORISSANT AVE
20  8600 MEMORIAL PKWY SW
21  8200 FLORISSANTMEMORIALWAYABOVE SW
22  8600 MEMORIALFLORISSANT PKWY SW

create table t1
  (
   cust_id number,
   address varchar2(100)
  );

Insert into t1 values(10,'9 Help Street, Level 4');
Insert into t1 values(11,'22 Victoria Street');
Insert into t1 values(12,'1495 Franklin Str');
Insert into t1 values(13,'61, Science Park Rd');
Insert into t1 values(14,'61, Social park road');
Insert into t1 values(15,'Av. Hermanos Escobar 5756');
Insert into t1 values(16,'Ave. Hermanos Escobar 5756');
Insert into t1 values(17,'8000 W FLORISSANT AVE');
Insert into t1 values(18,'8600 MEMORIAL PKWY SW');
Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW');
Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');


--------------

Mapping Table:

id  final_value multiple_values
1   St      Street
2   St      St.
3   St      Str
4   St      St
5   Rd      Rd.
6   Rd      road
7   Av      Av.
8   Av      Ave.
9   Av      Avenue
10   Av      Aven.
11   West    W
12  South West  SW

create table t2
  (
  id number,
  final_vaue varchar2(50),
  multiple_values varchar2(50)
  );

insert into t2 values(1,'St','Street');
insert into t2 values(2,'St','St.');
insert into t2 values(3,'St','Str');
insert into t2 values(4,'St','St');
insert into t2 values(5,'Rd','Rd.');
insert into t2 values(6,'Rd','road');
insert into t2 values(7,'Av','Av.');
insert into t2 values(8,'Av','Ave.');
insert into t2 values(9,'Av','Avenue');
insert into t2 values(10,'Av','Aven.');
insert into t2 values(11,'West','W');
insert into t2 values(12,'South West','SW.');


------------

Expected Output:

cust_id address
10  9 Help St, Level 4
11  22 Victoria St
12  1495 Franklin St
13  30 Hasivim St ,Petah-Tikva
14  2 Jakaranda St
15  61, Science Park Rd
16  61, Social park Rd
17  Av Hermanos Escobar 5756
18  Av Hermanos Escobar 5756
19  8000 West FLORISSANT Ave
20  8600 MEMORIAL PKWY South West


if length of the address is more than 35 characters then the expected output is:

cust_id address                              address2
21  8200 FLORISSANTMEMORIALWAYABOVE      South West
22  8600 MEMORIALFLORISSANT PKWY         South West


Thaks for all your help

and Chris said...

Here's one way to approach it:

- Split the address into a row for each word/component
- Outer join the mapping table to these rows, returning the final value if there is a match and the address string if not
- Use match_recognize to split this into groups of address strings < 35 characters https://blogs.oracle.com/datawarehousing/managing-overflows-in-listagg
- Use pivot with listagg to convert the groups into address 1, 2, 3, etc. columns: https://blogs.oracle.com/sql/how-to-convert-rows-to-columns-and-back-again-with-sql-aka-pivot-and-unpivot

Which gives something like:

create table t1 (
  cust_id number,
  address varchar2(100)
);

create table t2 (
  id number,
  final_value varchar2(50),
  multiple_values varchar2(50)
);

Insert into t1 values(10,'9 Help Street, Level 4');
Insert into t1 values(11,'22 Victoria Street');
Insert into t1 values(12,'1495 Franklin Str');
Insert into t1 values(13,'61, Science Park Rd');
Insert into t1 values(14,'61, Social park road');
Insert into t1 values(15,'Av. Hermanos Escobar 5756');
Insert into t1 values(16,'Ave. Hermanos Escobar 5756');
Insert into t1 values(17,'8000 W FLORISSANT AVE');
Insert into t1 values(18,'8600 MEMORIAL PKWY SW');
Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW');
Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');

insert into t2 values(1,'St','Street');
insert into t2 values(2,'St','St.');
insert into t2 values(3,'St','Str');
insert into t2 values(4,'St','St');
insert into t2 values(5,'Rd','Rd.');
insert into t2 values(6,'Rd','road');
insert into t2 values(7,'Av','Av.');
insert into t2 values(8,'Av','Ave.');
insert into t2 values(9,'Av','Avenue');
insert into t2 values(10,'Av','Aven.');
insert into t2 values(11,'West','W');
insert into t2 values(12,'South West','SW.');
insert into t2 values(13,'South West','SW');

commit;

with words as (
  select cust_id,
         regexp_substr ( address, '[^ ]+', 1, l ) word,
         l
  from   t1, lateral ( 
    select level l from dual
    connect by level <= length ( address ) - 
      length ( replace ( address, ' ' ) ) + 1
  )
), replacements as (
  select cust_id, nvl ( final_value, word ) w, l
  from   words
  left   join t2
  on     word = multiple_values
), grps as (
  select * from replacements
    match_recognize (
      partition by cust_id
      order by l
      measures 
        match_number() grp
      all rows per match
      pattern ( str+ ) 
      define 
        str as sum ( length ( w || ' ' ) ) < 35
    )
)
  select * from grps
  pivot (
    listagg ( w, ' ' ) within group ( order by l )
    for grp in ( 1 as address1, 2 as address2 )
  );

CUST_ID    ADDRESS1                           ADDRESS2     
        10 9 Help Street, Level 4             <null>        
        11 22 Victoria St                     <null>        
        12 1495 Franklin St                   <null>        
        13 61, Science Park Rd                <null>        
        14 61, Social park Rd                 <null>        
        15 Av Hermanos Escobar 5756           <null>        
        16 Av Hermanos Escobar 5756           <null>        
        17 8000 West FLORISSANT AVE           <null>        
        18 8600 MEMORIAL PKWY South West      <null>        
        19 8200 FLORISSANTMEMORIALWAYABOVE    South West    
        20 8600 MEMORIALFLORISSANT PKWY       South West


There's still some work to do here to ensure you handle punctuation correctly, but this should be enough to get you started

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Need SQL for standardizing the addresses

Murthy, February 26, 2021 - 12:17 pm UTC

Thank you very much for your help.
Chris Saxon
February 26, 2021 - 4:54 pm UTC

You're welcome

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.