Skip to Main Content
  • Questions
  • Check if a string contains date, arbitrary date format

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Yury.

Asked: February 07, 2022 - 10:16 am UTC

Last updated: March 04, 2022 - 3:46 am UTC

Version: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production

Viewed 10K+ times! This question is

You Asked

Dear Colleagues,

My program deals with URLs and needs to separate the URLs that probably contain date. Date format is unknown, but let's assume only number formats for simplicity. Currently I solve this task with a self-written package. It contains a list of reasonable date formats and tries to extract numbers and separators, and to convert these substrings to date data type using this list of formats. Obviously, performance is not too high.

My question is following: Maybe there is some elegant variant to guess presence of a date in a string when date format is unknown? It is better to make a false-positive decision (date exists) than false-negative (miss existing date).

Example:

with t as
(
select 1 as id, 'http://some1.com/file-02.02.2022.xls' as url from dual union all
select 2 as id, 'http://some2.com/file-20220202.xls' as url from dual union all
select 3 as id, 'http://some1.com/file-980992022.xls' as url from dual
)
select * from t
;

ID = 1 - date exists;
ID = 2 - date exists;
ID = 3 - date does not exist.


Thank you!
Regards,
Yury

and Chris said...

I don't know of an in-built way to do this. This is a hard problem to solve in general.

Here's one way to approach it:

- Search the table for values that might be dates
- Extract the candidate date strings from the URL
- Pass these to a validation function to see if they are real dates. From 12.2 you can do this with VALIDATE_CONVERSION; you'll have to roll your own in earlier releases.

Here's an example to get you started:

with t as (
  select 1 as id, 'http://some1.com/file-02.02.2022.xls' as url from dual union all
  select 2 as id, 'http://some2.com/file-20220202.xls' as url from dual union all
  select 3 as id, 'http://some1.com/file-980992022.xls' as url from dual
), possible_dates as (
  select t.*,
         coalesce (
           regexp_substr ( url, '[0-9]{8}' ),
           regexp_substr ( url, '([0-9]{2}[[:punct:]]){2}[0-9]{4}' )
         ) dt
  from   t
  where  regexp_like (
    url, '[0-9]{8}'
  ) 
  or    regexp_like (
    url, '([0-9]{2}[[:punct:]]){2}[0-9]{4}'
  )
)
  select id, url, dt
  from   possible_dates d
  where  1 in ( 
    validate_conversion ( dt as date, 'ddmmyyyy' ),
    validate_conversion ( dt as date, 'yyyymmdd' ),
    validate_conversion ( dt as date, 'dd.mm.yyyy' )
  );
  
        ID URL                                  DT                                  
---------- ------------------------------------ ------------------------------------
         1 http://some1.com/file-02.02.2022.xls 02.02.2022                          
         2 http://some2.com/file-20220202.xls   20220202  


There are lots of risks with this - the first is ensuring you cover all the possible date formats. The good news is Oracle Database is relatively forgiving of differing punctuation in the string.

For example, the conversion below works even though the string has hyphens and the format slashes:

select to_date ( '01-01-2022', 'dd/mm/yyyy' ) from dual;

TO_DATE('01
-----------
01-JAN-2022


The downside to the method above is the regexes return the first match. This can overlook possible dates. For example, this string contains the possible date 1 Jan 2022, but is prefixed by 99:

select regexp_substr ( '9901012022', '[0-9]{8}' )
from   dual;

REGEXP_S
--------
99010120


So the regex returns the start which is an invalid date. You could get clever with the regex (e.g. "([0-3][0-9]){2}[0-9]{4}" ) to reduce the chance of this, but covering off all the possible day/month/year orders is a pain.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

but covering off all the possible day/month/year orders is a pain.

A reader, February 10, 2022 - 9:03 pm UTC

Thank you, Chris!

You are right: "but covering off all the possible day/month/year orders is a pain".

Let me think on this.

Thank you,
Yury

Connor McDonald
February 11, 2022 - 4:43 am UTC

The other concern here is:

file-02-03-2022

Is that Feb 3? Or Mar 2?


Regexp_substr Followup.

Venkat Sagaram, March 03, 2022 - 3:34 pm UTC

Hi Chris-

Could you please elaborate on

regex (e.g. "([0-3][0-9]){2}[0-9]{4}" )

This would be extremely useful for something that I am working on right now.

I have dates in a text column in all kinds of different formats which includes free text for example "APPT. 11-18-99 BD. RPT." and I need to get the date off of this column.

I have tried with the example that you have suggested, but does not work for certain date formats. And I would like to try the above format as suggested.

Thanks
Venkat Sagaram
Connor McDonald
March 04, 2022 - 3:46 am UTC

regex isn't anything particular to Oracle.

Here's one of my links I often visit when building a regex

https://cheatography.com/davechild/cheat-sheets/regular-expressions/

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.