Skip to Main Content
  • Questions
  • Converting CHAR to Numeric via user written function

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Udi.

Asked: September 02, 2008 - 5:50 pm UTC

Last updated: September 04, 2008 - 4:29 pm UTC

Version: 10.2.0.4

Viewed 1000+ times

You Asked

Hi Tom,

We use the following function to convert numbers in CHAR format to numeric. The data is very dirty so if it can't convert or you get "infinity" we just put in a zero.

We run in parallel over billions of rows. It used to be quite slow until we added DETERMINISTRIC and PARALLEL_ENABLE which helped, but it still drives up CPU utilization quite a bit to add this function to the select. You add a few functions on a few columns - it's even worse.

(1) can this be coded even more efficiently?
(2) is there a truly efficient builtin function to do this?

It's used like this -

CREATE TABLE CLEAN NOLOGGING PCTFREE 0 PARALLEL 16
AS
SELECT /*+ PARALLEL (X,16) */
LAB_KEY,
TO_NUMERIC (LAB_RESULT)
FROM
DIRTY;


Here's the function
===================

CREATE OR REPLACE FUNCTION TO_NUMERIC (P_NUMBER IN VARCHAR2)
RETURN NUMBER DETERMINISTIC PARALLEL_ENABLE
IS
BEGIN
RETURN TRANSLATE (TO_NUMBER(P_NUMBER),'~',0);
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END;
/


Thanks!

and Tom said...

well, if your numbers are 'simple positive decimal numbers' (eg: no scientific notation, nothing fancy), you could:

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select object_name x from all_objects  union all
  4  select to_char(object_id ) from all_objects  union all
  5  select object_name x from all_objects  union all
  6  select to_char(object_id ) from all_objects
  7  /

Table created.

ops$tkyte%ORA10GR2> insert into t values ('.');

1 row created.

ops$tkyte%ORA10GR2> insert into t values ('.0');

1 row created.

ops$tkyte%ORA10GR2> insert into t values ('0.0');

1 row created.

ops$tkyte%ORA10GR2> insert into t values ('0.');

1 row created.

ops$tkyte%ORA10GR2> insert into t values ('.0.');

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create or replace function is_numeric( p_str in varchar2 ) return number
  2  as
  3  begin
  4          return to_number(p_str);
  5  exception
  6          when others then return null;
  7  end;
  8  /

Function created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> variable n number;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec :n := dbms_utility.get_cpu_time;

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select count(num), count(*) from (select is_numeric(x) num from t);

COUNT(NUM)   COUNT(*)
---------- ----------
     99883     199765

ops$tkyte%ORA10GR2> exec dbms_output.put_line( (dbms_utility.get_cpu_time-:n) ); :n := dbms_utility.get_cpu_time;
104

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select count(num), count(*) from (
  2  select case when x <> '.' and nvl(replace(translate(x,'0123456789','000000000'),'0',''),'.') ='.' then to_number(x) end num
  3             from t);

COUNT(NUM)   COUNT(*)
---------- ----------
     99883     199765

ops$tkyte%ORA10GR2> exec dbms_output.put_line( (dbms_utility.get_cpu_time-:n) ); :n := dbms_utility.get_cpu_time;
25

PL/SQL procedure successfully completed.






do a quick check to see if it is all numbers, and maybe a decimal in the middle.

If you can do it in sql without calling plsql, it'll chew up a lot less cpu.


(yes, I know about regexp - that function would take more cpu than plsql does in general... tried it)

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Brilliant !

Udi Karni, September 04, 2008 - 2:50 pm UTC

Tom - thanks. You are right - using SQL made things move much quicker than the PL/SQL approach of trying to convert and trapping potential failures.

The key is whether your SQL traps any and all non-numerics so that the subsequent conversion will not fail.

I tried it on a very large and very dirty file and it sailed through perfectly, so I'm very optimistic!

OK to ask a related question? what if instead of converting a number from CHAR to NUM we want to convert a date from CHAR to DATE? standard yyyymmdd format. Any clever SQL that will establish "cleanliness" as in your previous example?

Thanks,

Udi
Tom Kyte
September 04, 2008 - 4:29 pm UTC

well, as long as you fit the domain of simple numbers I described:

"do a quick check to see if it is all numbers, and maybe a decimal in the middle. "

it'll work - hopefully you see what it does! Knowing what it does - you would be able to answer the question "will it work for my data"


case when x <> '.' and
nvl(replace(translate(x,'0123456789','000000000'),'0',''),'.') ='.' then
to_number(x) end


translate(x,'0123456789','000000000') - turn all digits into zeros

replace( <of that>, '0', '' ) - remove all zeroes

nvl( <of that>, '.' ) - if everything was a digit, then zero, we have null - make null '.'


case when x <> '.' -- '.' by itself is not a number...
and nvl( <of that>, '.') = '.'

we must have

digits.digits
or
digits
or
.digits
or
digits.

as long as you have only those inputs to be considered (eg: 1e10 is not handled... +1.2 is not, -44 is not - but you could easily add support for +/- obviously...)



...
standard yyyymmdd format.
Any clever SQL that will establish "cleanliness" as in your previous example?
......


is a bit more difficult.

it starts easy enough, but leap years make it nasty, it could be something like:


ops$tkyte%ORA10GR2> select
  2  case
  3  when substr( dt, 1, 4 ) between '1900' and '2100'
  4       then case when substr( dt, 5, 2 ) between '01' and '12'
  5            then case
  6                 when (
  7                       (substr( dt, 5, 2 ) in ('01','03','05','07','08','10','12') and substr( dt, 7, 2 ) between '01' and '31')
  8                       or
  9                       (substr( dt, 5, 2 ) in ('04','06','09','11') and substr( dt, 7, 2 ) between '01' and '31')
 10                       or
 11                       (substr( dt, 5, 2 ) = '02' and substr( dt, 7, 2 ) between '01' and '28')
 12                                          )
 13                  then to_date( dt, 'yyyymmdd' )
 14                                  when (substr( dt, 5, 2 ) = '02' and substr( dt, 7, 2 ) = '29' and
 15                        (
 16                        (trunc(to_number(substr(dt,1,4))/4) = to_number(substr(dt,1,4))/4 and NOT trunc(to_number(substr(dt,1,4))/100) = to_number(substr(dt,1,4))/100)
 17                        or
 18                        (trunc(to_number(substr(dt,1,4))/400) = to_number(substr(dt,1,4))/400)
 19                        )
 20                       )
 21                  then to_date( dt, 'yyyymmdd' )
 22                   end
 23              end
 24  end
 25  from (select '20400229' dt from dual )
 26  /

CASEWHENSUBSTR(DT,1,
--------------------
29-feb-2040 00:00:00



using the rule:

1. The year is a Leap Year if the year is divisible by 4
UNLESS
2. The year is divisible by 100 then it is not a Leap Year (the century rule)
UNLESS
3. the year is divisible by 400 then it is a Leap Year




A reader, September 08, 2008 - 11:16 am UTC


More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library