Skip to Main Content
  • Questions
  • extract numbers from varchar2 upto first occurrence of a non-numeric value

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Philip.

Asked: September 21, 2017 - 2:40 pm UTC

Last updated: October 06, 2017 - 1:26 am UTC

Version: 12.

Viewed 1000+ times

You Asked

On Oracle12 database, we have this table tbl_house_number that has one column "house_number" which is a varchar2 and has data entries of all different combinations.
And all I need is the numbers from left to the first occurrence of a non-numeric like space, special characters and ascii characters.

Below are some sample house_numbers:

217 3RDFL, 2173RDFL, 500D, 527 # 2ND, 527# 2ND, 5422NDFL, 5422, 30# D2, 1250 2ND, 
12502ND, 217 3RDFL, 2173RDFL, 5241R, 3 2R, 32R, 5092R, 24 # 2R, 24# 2R, 129 B-16, 
129# B17, 129B-16, 16 # 2B, 4229B, 539# APT 3, 563 # A5,2162A, 934-A, 109-A, 511-A, 339-REA, 339REAR


And for the above house_numbers, the expected corresponding outputs should be as follows:
217, 217, 500, 527, 527, 542, 542, 30,1250,1250, 217, 217, 5241, 3, 32, 5092, 24, 
24, 129, 129, 129, 16, 4229, 539, 563, 2162, 934, 109, 511, 339, 339


The best sql i can come up with is below, but it still does not resolve a situation when the data is 3933thfl....and the correct data should be 393...but my below sql code will show as 3933.

select house_number,
REGEXP_SUBSTR(replace(replace(replace(replace(replace(replace(house_number,'1/2',''),'-',''), '+', ''), '.', ''), '#', ''), '?', ''),'^\d+') AS "House_Number"
from tbl_HOUSE_NUMBER
where house_number != '0' and house_number != '00' and house_number != '000' and house_number != '0000' and house_number != '00000' and house_number != '000000'
order by 1;


Any help would be appreciated much.

Thanks.

with LiveSQL Test Case:

and Chris said...

Pro-tip: make sure you save your LiveSQL script before sharing the URL! Public URLs are in this format:

https://livesql.oracle.com/apex/livesql/file/content

We can't access the URL provided.

Anyway, I don't understand why you say 3933thfl should be 393 and not 3933? 3 is a digit, right?

In any case, you can simplify the replace to:

1. Find the digits 0-9
2. Then match any character that isn't a comma
3. Finally match a comma or end-of-line ($)

You can then use backreferences to keep parts one and three:

with house_numbers as (
  select '217 3RDFL, 2173RDFL, 500D, 527 # 2ND, 527# 2ND, 5422NDFL, 5422, 30# D2, 1250 2ND, 
12502ND, 217 3RDFL, 2173RDFL, 5241R, 3 2R, 32R, 5092R, 24 # 2R, 24# 2R, 129 B-16, 
129# B17, 129B-16, 16 # 2B, 4229B, 539# APT 3, 563 # A5,2162A, 934-A, 109-A, 511-A, 339-REA, 339REAR, 3933thfl' house_number from dual
)
select house_number,
       regexp_replace(house_number, '([0-9]+)[^,]*(,|$)', '\1\2') AS "House_Number"
from house_numbers
where house_number != '0' and house_number != '00' and house_number != '000' and house_number != '0000' and house_number != '00000' and house_number != '000000'
order by 1;

HOUSE_NUMBER
---------------------------------------------------------------------------------------------------------------
House_Number                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                   
---------------------------------------------------------------------------------------------------------------
217 3RDFL, 2173RDFL, 500D, 527 # 2ND, 527# 2ND, 5422NDFL, 5422, 30# D2, 1250 2ND,                                                                                                                                                                                                    
12502ND, 217 3RDFL, 2173RDFL, 5241R, 3 2R, 32R, 5092R, 24 # 2R, 24# 2R, 129 B-16,                                                                                                                                                                                                    
129# B17, 129B-16, 16 # 2B, 4229B, 539# APT 3, 563 # A5,2162A, 934-A, 109-A, 511-A, 339-REA, 339REAR, 3933thfl    
                                                                                                                                                                   
217, 2173, 500, 527, 527, 5422, 5422, 30, 1250,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                   
12502, 217, 2173, 5241, 3, 32, 5092, 24, 24, 129,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                   
129, 129, 16, 4229, 539, 563,2162, 934, 109, 511, 339, 339, 3933 

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Philip Ebenezer, September 21, 2017 - 5:31 pm UTC

Thanks for providing a simpler sql. Below are my response to your comments
We can't access the URL provided.
i fixed the LiveSQL and below is the link
https://livesql.oracle.com/apex/livesql/file/content_FLG1A7GLIHOFYSMY9X6GKTIXN.html

Anyway, I don't understand why you say 3933thfl should be 393 and not 3933? 3 is a digit, right?
sorry ...it was a typo....
what i meant was....for example if the data value is 3933rdfl....this actually means 393 3rd floor...and i only want to extract out the 393 for the house_number.
So i wasn't sure how i can come up with a logic for that.

Chris Saxon
September 22, 2017 - 4:38 pm UTC

Thanks for the updated link.

How do you know it means "393 3rd floor"? There's no chance it means 3393 Rd (road)?

Do you have a complete list of what these possibilities are and are there any exceptions?

Philip Ebenezer, September 22, 2017 - 5:14 pm UTC

yes for your question
How do you know it means "393 3rd floor"? There's no chance it means 3393 Rd (road)?

Answer: Yes for now we wont know. But we are ok with making the assumption that the number before a 1st or 2nd or 3rd or 4th etc...9th can be ignored.
So what we need to extract out from "3393rd floor" is "339".
Can you help me with a solution for the above condition?
Thanks.
Sergiusz Wolicki
September 23, 2017 - 10:42 am UTC

Please note that from the internationalization point of view trying to interpret postal addresses is generally a bad idea. You have not shared the actual business requirement behind your question so it is difficult to validate the approach. Could you shed some light on this?

Pui Chan, September 23, 2017 - 1:42 am UTC

How about,
with house_numbers as (
select '217 3RDFL, 2173RDFL, 500D, 527 # 2ND, 527# 2ND, 5422NDFL, 5422, 30# D2, 1250 2ND,
12502ND, 217 3RDFL, 2173RDFL, 5241R, 3 2R, 32R, 5092R, 24 # 2R, 24# 2R, 129 B-16,
129# B17, 129B-16, 16 # 2B, 4229B, 539# APT 3, 563 # A5,2162A, 934-A, 109-A, 511-A, 339-REA, 339REAR, 393 3rd fl' house_number from dual
)
select val, regexp_instr(val, '[^0-9]') idx, nvl(substr(val, 1, regexp_instr(val, '[^0-9]')-1),val) house_number
from (
SELECT trim(REGEXP_SUBSTR(str, '[^,]+',1, LEVEL)) val
FROM (SELECT house_number str FROM house_numbers)
CONNECT BY LEVEL <= LENGTH(str)-LENGTH(REPLACE(str,','))+1
);

Philip Ebenezer, September 26, 2017 - 12:32 pm UTC

Well the business reason to get only the house number from this varchar2 data, but it is going to be impossible to come up with a one solution fits all.
But thanks for your response. I will use your sql as another solution.
So as of now i would like to close this request.
Chris Saxon
September 29, 2017 - 12:29 pm UTC

Yep, as Sergiusz said interpreting addresses is a bad idea. There are edge cases everywhere!

Might be worth a shot

Chuck Jolley, September 29, 2017 - 4:10 pm UTC

There are services that rationalize address data. I have no clue what the costs are, but they are pretty good.
We are not allowed to use them by law, but the company that does our mass billings runs their other customer's address data through one as a matter of course.
This one is for the US, but I'm pretty sure you could find at least one for any developed country.

https://ribbs.usps.gov/index.cfm?page=aec

Chris Saxon
October 06, 2017 - 1:26 am UTC

Yep. There are similar services available in the UK too.

Philip Ebenezer, October 06, 2017 - 2:50 pm UTC

Thanks for the link to the service to rationalize US addresses.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.