Hi Tom and Team,
I really appreciate your help for all of us.
We are in the process of standardizing the addresses of our customers.
I have a main t1omer table which contains the customers data.
I have a mapping table that contains mapping for final value to multiple values.
I need to get the final_value that is correspond to the multiple_values.
if the length of the address is more than 35 characters after standardizing,
then from previous space(' ') from right side to end of the string, it is created as a address2 field
could you please help with sql or pl/sql?
using Oracle 12c database.
customer Table:
cust_id address
10 9 Help Street, Level 4
11 22 Victoria Street
12 1495 Franklin Str
13 30 Hasivim St.,Petah-Tikva
14 2 Jakaranda St
15 61, Science Park Rd
16 61, Social park road
17 Av. Hermanos Escobar 5756
18 Ave. Hermanos Escobar 5756
19 8000 W FLORISSANT AVE
20 8600 MEMORIAL PKWY SW
21 8200 FLORISSANTMEMORIALWAYABOVE SW
22 8600 MEMORIALFLORISSANT PKWY SW
create table t1
(
cust_id number,
address varchar2(100)
);
Insert into t1 values(10,'9 Help Street, Level 4');
Insert into t1 values(11,'22 Victoria Street');
Insert into t1 values(12,'1495 Franklin Str');
Insert into t1 values(13,'61, Science Park Rd');
Insert into t1 values(14,'61, Social park road');
Insert into t1 values(15,'Av. Hermanos Escobar 5756');
Insert into t1 values(16,'Ave. Hermanos Escobar 5756');
Insert into t1 values(17,'8000 W FLORISSANT AVE');
Insert into t1 values(18,'8600 MEMORIAL PKWY SW');
Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW');
Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');
--------------
Mapping Table:
id final_value multiple_values
1 St Street
2 St St.
3 St Str
4 St St
5 Rd Rd.
6 Rd road
7 Av Av.
8 Av Ave.
9 Av Avenue
10 Av Aven.
11 West W
12 South West SW
create table t2
(
id number,
final_vaue varchar2(50),
multiple_values varchar2(50)
);
insert into t2 values(1,'St','Street');
insert into t2 values(2,'St','St.');
insert into t2 values(3,'St','Str');
insert into t2 values(4,'St','St');
insert into t2 values(5,'Rd','Rd.');
insert into t2 values(6,'Rd','road');
insert into t2 values(7,'Av','Av.');
insert into t2 values(8,'Av','Ave.');
insert into t2 values(9,'Av','Avenue');
insert into t2 values(10,'Av','Aven.');
insert into t2 values(11,'West','W');
insert into t2 values(12,'South West','SW.');
------------
Expected Output:
cust_id address
10 9 Help St, Level 4
11 22 Victoria St
12 1495 Franklin St
13 30 Hasivim St ,Petah-Tikva
14 2 Jakaranda St
15 61, Science Park Rd
16 61, Social park Rd
17 Av Hermanos Escobar 5756
18 Av Hermanos Escobar 5756
19 8000 West FLORISSANT Ave
20 8600 MEMORIAL PKWY South West
if length of the address is more than 35 characters then the expected output is:
cust_id address address2
21 8200 FLORISSANTMEMORIALWAYABOVE South West
22 8600 MEMORIALFLORISSANT PKWY South West
Thaks for all your help
Here's one way to approach it:
- Split the address into a row for each word/component
- Outer join the mapping table to these rows, returning the final value if there is a match and the address string if not
- Use match_recognize to split this into groups of address strings < 35 characters
https://blogs.oracle.com/datawarehousing/managing-overflows-in-listagg - Use pivot with listagg to convert the groups into address 1, 2, 3, etc. columns:
https://blogs.oracle.com/sql/how-to-convert-rows-to-columns-and-back-again-with-sql-aka-pivot-and-unpivot Which gives something like:
create table t1 (
cust_id number,
address varchar2(100)
);
create table t2 (
id number,
final_value varchar2(50),
multiple_values varchar2(50)
);
Insert into t1 values(10,'9 Help Street, Level 4');
Insert into t1 values(11,'22 Victoria Street');
Insert into t1 values(12,'1495 Franklin Str');
Insert into t1 values(13,'61, Science Park Rd');
Insert into t1 values(14,'61, Social park road');
Insert into t1 values(15,'Av. Hermanos Escobar 5756');
Insert into t1 values(16,'Ave. Hermanos Escobar 5756');
Insert into t1 values(17,'8000 W FLORISSANT AVE');
Insert into t1 values(18,'8600 MEMORIAL PKWY SW');
Insert into t1 values(19,'8200 FLORISSANTMEMORIALWAYABOVE SW');
Insert into t1 values(20,'8600 MEMORIALFLORISSANT PKWY SW');
insert into t2 values(1,'St','Street');
insert into t2 values(2,'St','St.');
insert into t2 values(3,'St','Str');
insert into t2 values(4,'St','St');
insert into t2 values(5,'Rd','Rd.');
insert into t2 values(6,'Rd','road');
insert into t2 values(7,'Av','Av.');
insert into t2 values(8,'Av','Ave.');
insert into t2 values(9,'Av','Avenue');
insert into t2 values(10,'Av','Aven.');
insert into t2 values(11,'West','W');
insert into t2 values(12,'South West','SW.');
insert into t2 values(13,'South West','SW');
commit;
with words as (
select cust_id,
regexp_substr ( address, '[^ ]+', 1, l ) word,
l
from t1, lateral (
select level l from dual
connect by level <= length ( address ) -
length ( replace ( address, ' ' ) ) + 1
)
), replacements as (
select cust_id, nvl ( final_value, word ) w, l
from words
left join t2
on word = multiple_values
), grps as (
select * from replacements
match_recognize (
partition by cust_id
order by l
measures
match_number() grp
all rows per match
pattern ( str+ )
define
str as sum ( length ( w || ' ' ) ) < 35
)
)
select * from grps
pivot (
listagg ( w, ' ' ) within group ( order by l )
for grp in ( 1 as address1, 2 as address2 )
);
CUST_ID ADDRESS1 ADDRESS2
10 9 Help Street, Level 4 <null>
11 22 Victoria St <null>
12 1495 Franklin St <null>
13 61, Science Park Rd <null>
14 61, Social park Rd <null>
15 Av Hermanos Escobar 5756 <null>
16 Av Hermanos Escobar 5756 <null>
17 8000 West FLORISSANT AVE <null>
18 8600 MEMORIAL PKWY South West <null>
19 8200 FLORISSANTMEMORIALWAYABOVE South West
20 8600 MEMORIALFLORISSANT PKWY South West
There's still some work to do here to ensure you handle punctuation correctly, but this should be enough to get you started