Skip to Main Content
  • Questions
  • Why is char comparison slower than intergers/numbers/Dates

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Prats.

Asked: October 20, 2009 - 6:45 am UTC

Last updated: October 28, 2009 - 8:00 am UTC

Version: 10g

Viewed 10K+ times! This question is

You Asked

Tom,

I have inherited a production database that is storing some numbers and data as char datatype. For example, a char column with size 5 is storing values like '0055', '0045'.
Another instance is storing Month and year separately as October and 2009 in different columns. Hence I have to perform char comparison after pulling month and date from a date say 3/10/2009.

The performance of queries performing comparison among char data is very slow. Can you elaborate reasons for this. Is there a way to handle performance degradation here.

Thanks for your help...

and Tom said...

You would need to provide a tad more information.

I seriously (totally, completely, utterly) doubt - do not think - do not believe the datatype has a thing to do with it.

What would count would be the queries themselves.


You give no concrete examples
You give nothing to go on


... Hence I have to perform char comparison after pulling month and date from a date say 3/10/2009.
...

it SOUNDS like you have a date - but probably you mean "i have a string with a date in it" or something. I don't know the semantics of your comparison here - I have *no examples*, nothing to work with.



ops$tkyte%ORA10GR2> create table t
  2  as
  3  select to_char( l, 'fm000000' ) str, l num
  4    from (select level l from dual connect by level < 1000000 )
  5  /

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA10GR2> select * from t where str = '123456';

STR            NUM
------- ----------
123456      123456

ops$tkyte%ORA10GR2> select * from t where str = 123456;

STR            NUM
------- ----------
123456      123456

ops$tkyte%ORA10GR2> select * from t where num = 123456;

STR            NUM
------- ----------
123456      123456

ops$tkyte%ORA10GR2> select * from t where num = '123456';

STR            NUM
------- ----------
123456      123456



When you look at the tkprof, you can see when you compare a str database column to a number input - we have to convert the str into a number - so that

select * from t where str = 123456;

is really

select * from t where TO_NUMBER(str) = 123456

and that will affect performance - but in general, compare a str to a str and it is as fast as comparing a number to a number

Likely - you have implicit conversions taking place and you need to correct that - compare only STRINGS to STRINGS and only NUMBERS to NUMBERS and never compare a string to a number - never.
select * from t where str = '123456'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.04          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.04          0       2359          0           1
********************************************************************************
select * from t where str = 123456

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.30       0.29          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.30       0.29          0       2359          0           1
********************************************************************************
select * from t where num = 123456

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.05       0.04          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.05       0.05          0       2359          0           1
********************************************************************************
select * from t where num = '123456'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.05       0.05          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.05       0.05          0       2359          0           1

Rating

  (12 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

riyaz, October 20, 2009 - 7:17 am UTC

It sounds like only if implicit conversion or explicit conversion happens, then it is advisable to use natural data types.
But in your other threads, you are always recommending to use number data type wherever possible to get better performance. (But here you are saying, if varchar is compared with varchar, the performance will be same as number column compared with number column.) In earlier threads, you said number data type comparison is always faster.
Tom Kyte
October 22, 2009 - 4:19 pm UTC

... It sounds like only if implicit conversion or explicit conversion happens, then
it is advisable to use natural data types.
...

I don't know what that means?


... But in your other threads, you are always recommending to use number data type
wherever possible to get better performance. ...

where - show me where I said that?


... In earlier threads, you said number data type
comparison is always faster. ...

give me an example please, go for it.

and then we'll see if

a) I said it at all (doesn't sound like me)
b) if I did, what the CONTEXT was - the discussion surrounding the statement.

number vs character comparison

Franky, October 21, 2009 - 3:42 am UTC

Hello,

I think we need to leave Oracle for a moment to get better understanding:

From assembly point of view the comparison of two numbers is simply done by one command which triggers one (!) low level call, while for "strings" one asm command triggers several (!) low level calls because characters in the strings are compared against each other step by step...

How does this affect Oracle performance? The same way.

Today's servers have multiple cores and core clocks in the GHz range, thus this "problem" doesn't show up very often, but if you need to focus on the performance, then use numbers whenever possible.

Tom wrote the following : "but in general, compare a str to a str and it is as fast as comparing a number to a number" which means that (!) in general (!) you don't get problems when you compare strings to strigns since the server CPUs do a very quick job for you. Problems might arrise whenever char or varchar2(2000) "strings" have to be compared million or billion (!) times and the first ~1900 characters are identical. If this happened in a very busy database instance, then you would have a real problem...

Regards,

Franky

Tom Kyte
October 23, 2009 - 11:22 am UTC

you need to COME ON BACK

you assume that we use native numbers (2 byte, 4 byte integers).

We do not. We use a packed decimal like format in a varying 'string' of 0 to 22 bytes.

Now what?


Please don't leave Oracle - You actually need to come inside and understand how it works!

... but if you need to focus on the performance, then use numbers whenever possible. ...

not so - not if you know how numbers are really stored in Oracle - 38 digits of precision - you do not get that in a CPU native format.

Not Exactly

djb, October 21, 2009 - 8:53 am UTC

From assembly point of view the comparison of two numbers is simply done by one command which triggers one (!) low level call, while for "strings" one asm command triggers several (!) low level calls because characters in the strings are compared against each other step by step...

Not exactly. The NUMBER datatype in Oracle is stored as a BCD (Binary Coded Decimal) which is, in effect, a string type. So comparison of a number and a varchar is essentially the same. The only difference is that character data types go through the NLS layer, so that takes a teensy bit longer, but I've found the difference to be negligible.

Not exactly?

Franky, October 22, 2009 - 5:43 am UTC

There is a difference between data storage (on the disk) and data usage of processors.

Example:

decimal: 49
binary : 00110001
BCD : 01001001

I can see no difference between 00110001 xor 00110001 (binary) and 01001001 xor 01001001 (BCD), neither does intel, since their x86 processors use binary (!) instructions for BCD numbers.

Processor instructions determine how BCD numbers are used and not their "type" of storage.

Regards,

Franky

Tom Kyte
October 23, 2009 - 1:25 pm UTC

ummm

what about the number 123456789987654321
ops$tkyte%ORA9IR2> select dump(123456789987654321 ) from dual;

DUMP(123456789987654321)
---------------------------------------------
Typ=2 Len=10: 201,13,35,57,79,100,88,66,44,22


we do NOT have a number to pop into a register and have the CPU process - the number type is a complex type with

a null flag indicator
a leading length byte
sign and scale information
packed numeric data

in a varying string between 0 and 22 bytes in length.

Start seeing a difference, you cannot use the CPU straight out to compare two Oracle numbers - it takes a bit of procedural code in a library to do it.

Exactly

djb, October 23, 2009 - 12:50 pm UTC

I don't think you understand BCD. The example you provide is not accurate.

Let's use a more realistic number:
73676598

As a 32-bit integer, it would be stored like:
binary (hex): 0x04643736

which easily is stored in 3 and 1/2 bytes. But Oracle stores numbers in a non-trivial way (for good reasons).

A dump of numbers from Oracle:
0           Len=1: 128
1           Len=2: 193,2
10          Len=2: 193,11
100         Len=2: 194,2
73676598    Len=5: 196,74,68,66,99
1930939826  Len=6: 197,20,31,94,99,27

I don't have the time to research the details of how Oracle goes about formatting the bytes (but you can Google it).

Note that the number 1 uses two bytes.
Note that the number 73676598 uses five bytes.

It's a form of BCD, which winds up using string comparison instructions on the machine.

BTW, you can play around looking at how Oracle stores these values by using the DUMP function:

select dump(12345) from dual;

?

Franky, October 26, 2009 - 7:44 am UTC

you need to COME ON BACK

What do you mean?


Please don't leave Oracle - You actually need to come inside and understand how it works!


Don't leave ????

Please clarify here or in private.

Tom Kyte
October 26, 2009 - 2:17 pm UTC

Franky

you wrote:

... I think we need to leave Oracle for a moment to get better understanding: ...


I said "you need to come on back inside"



You were making assumptions about how you think it works

It doesn't work the way you think, therefore, you need to come back into the database realm and look at it from the way the database works - not how you think it works.

Exactly.

Franky, October 26, 2009 - 8:24 am UTC

djb, Tom

I apologize for my misunderstanding about Oracle's storage mechanism, thank you for the enlightenment. I thought I knew the dump function, but as it seems I misinterpreted the documentaion which is:

"DUMP returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr."

"internal representation" - this one here made me think that things work otherwise...

Regards,

Franky

Ok

Franky, October 26, 2009 - 2:34 pm UTC


Franky

you wrote:

... I think we need to leave Oracle for a moment to get better understanding: ...


I said "you need to come on back inside"


Sorry, it seems that I misinterpreted your response as well... I think I need to have some days off...

Theory is good, evidence better.

Jack, October 27, 2009 - 11:38 am UTC

In my test below, character comparisons are just as fast as numbers - no armchair computer science needed.  Also note, implicit conversion isn't a very big deal.

- Jack


SQL> ------------------------
SQL> -- Create input table 1.
SQL> ------------------------
SQL> 
SQL> create table temp_01_tb
  2    tablespace x_ts
  3    nologging
  4    parallel 16
  5  as
  6  select id as id_num,
  7         cast(to_char(id) as varchar2(30)) as id_char
  8  from (
  9       select trunc(dbms_random.value(100000000,100000000000)) id
 10       from large_tb 
 11       where rownum <= 10000000
 12       )
 13  ;

Table created.

Elapsed: 00:02:14.39
SQL> 
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL> 
SQL> select count(*) from temp_01_tb;

        COUNT(*)
----------------
      10,000,000

Elapsed: 00:00:00.81
SQL> 
SQL> -------------------------------------
SQL> -- Freq length of id_num and id_char.
SQL> -------------------------------------
SQL> 
SQL> break on report
SQL> compute sum of count(*) on report
SQL> 
SQL> select length(id_num),
  2         length(id_char),
  3         count(*)
  4  from temp_01_tb
  5  group by length(id_num),
  6           length(id_char)
  7  order by 1
  8  ;

  LENGTH(ID_NUM)  LENGTH(ID_CHAR)         COUNT(*)
---------------- ---------------- ----------------
               9                9           89,854
              10               10          901,563
              11               11        9,008,583
                                  ----------------
sum                                     10,000,000

Elapsed: 00:00:01.41
SQL> 
SQL> clear breaks
SQL> clear computes
SQL> 
SQL> ----------
SQL> -- Sample.
SQL> ----------
SQL> 
SQL> select *
  2  from temp_01_tb
  3  where rownum <= 10
  4  ;

          ID_NUM ID_CHAR
---------------- ------------------------------
  86,969,995,517 86969995517
  57,700,034,749 57700034749
  98,820,424,831 98820424831
  85,003,779,624 85003779624
   9,644,826,478 9644826478
  88,393,740,235 88393740235
  99,155,106,367 99155106367
  26,633,943,599 26633943599
  67,287,256,584 67287256584
  29,916,845,470 29916845470

10 rows selected.

Elapsed: 00:00:00.17
SQL> 
SQL> ------------
SQL> -- Describe.
SQL> ------------
SQL> 
SQL> desc temp_01_tb
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 ID_NUM                                       NUMBER
 ID_CHAR                                      VARCHAR2(30)

SQL> 
SQL> ------------------------
SQL> -- Create input table 2.
SQL> ------------------------
SQL> 
SQL> create table temp_02_tb
  2    tablespace x_ts
  3    nologging
  4    parallel 16
  5  as
  6  select id as id_num,
  7         cast(to_char(id) as varchar2(30)) as id_char
  8  from (
  9       select trunc(dbms_random.value(100000000,100000000000)) id
 10       from large_tb
 11       where rownum <= 10000000
 12       )
 13  ;

Table created.

Elapsed: 00:02:14.14
SQL> 
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL> 
SQL> select count(*) from temp_02_tb;

        COUNT(*)
----------------
      10,000,000

Elapsed: 00:00:00.70
SQL> 
SQL> -------------------------------------
SQL> -- Freq length of id_num and id_char.
SQL> -------------------------------------
SQL> 
SQL> break on report
SQL> compute sum of count(*) on report
SQL> 
SQL> select length(id_num),
  2         length(id_char),
  3         count(*)
  4  from temp_02_tb
  5  group by length(id_num),
  6           length(id_char)
  7  order by 1
  8  ;

  LENGTH(ID_NUM)  LENGTH(ID_CHAR)         COUNT(*)
---------------- ---------------- ----------------
               9                9           90,332
              10               10          901,092
              11               11        9,008,576
                                  ----------------
sum                                     10,000,000

Elapsed: 00:00:01.41
SQL> 
SQL> clear breaks
SQL> clear computes
SQL> 
SQL> ----------
SQL> -- Sample.
SQL> ----------
SQL> 
SQL> select *
  2  from temp_02_tb
  3  where rownum <= 10
  4  ;

          ID_NUM ID_CHAR
---------------- ------------------------------
  54,067,209,027 54067209027
   5,979,558,060 5979558060
  72,269,951,072 72269951072
  62,556,776,987 62556776987
  16,990,890,984 16990890984
  47,303,283,438 47303283438
  70,490,427,076 70490427076
  49,336,811,237 49336811237
  68,728,531,845 68728531845
   3,126,925,174 3126925174

10 rows selected.

Elapsed: 00:00:00.16
SQL> 
SQL> ------------
SQL> -- Describe.
SQL> ------------
SQL> 
SQL> desc temp_02_tb
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 ID_NUM                                       NUMBER
 ID_CHAR                                      VARCHAR2(30)

SQL> 
SQL> -----------------------
SQL> -- Part II.  Run tests.
SQL> -----------------------
SQL> 
SQL> alter table temp_01_tb parallel 1;

Table altered.

Elapsed: 00:00:00.02
SQL> alter table temp_02_tb parallel 1;

Table altered.

Elapsed: 00:00:00.00
SQL> 
SQL> --------------------
SQL> -- Test num compare.
SQL> --------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_num
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:29.43
SQL> 
SQL> ---------------------
SQL> -- Test char compare.
SQL> ---------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_char = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:29.12
SQL> 
SQL> ----------------------------
SQL> -- Test num to char compare.
SQL> ----------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:30.98
SQL> 
SQL> --------------------------
SQL> -- Test num compare again.
SQL> --------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_num
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:24.84
SQL> 
SQL> ---------------------------
SQL> -- Test char compare again.
SQL> ---------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_char = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:25.70
SQL> 
SQL> ----------------------------------
SQL> -- Test num to char compare again.
SQL> ----------------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:27.99
SQL> 
SQL> spool off

Tom Kyte
October 27, 2009 - 12:19 pm UTC

... Also note, implicit conversion isn't a very
big deal.
...

there you are wrong. If you compare a string to string 1,000,000 times or a number to a number 1,000,000 times - that'll take about the same time.

But compare a string that has to be converted into a number 1,000,000 times and forget it. Very different.


Your example has a quirk in it that you did a hash join, the actual thing you were testing was such a small contributor to the overall runtime that you cannot see that.

so, just testing what we want to test:

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select rownum num, to_char(rownum) str
  4    from dual
  5  connect by level <= 1000000
  6  /

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA10GR2> select count(*) from t where num = 0;

  COUNT(*)
----------
         0

ops$tkyte%ORA10GR2> select count(*) from t where str = '0';

  COUNT(*)
----------
         0

ops$tkyte%ORA10GR2> select count(*) from t where num = '0';

  COUNT(*)
----------
         0

ops$tkyte%ORA10GR2> select count(*) from t where str = 0;

  COUNT(*)
----------
         0







select count(*) from t where num = 0

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.06       0.05          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.06       0.05          0       2342          0           1
********************************************************************************
select count(*) from t where str = '0'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.03       0.03          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0       2342          0           1
********************************************************************************
select count(*) from t where num = '0'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.03       0.03          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0       2342          0           1
********************************************************************************
select count(*) from t where str = 0

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.28       0.27          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.28       0.27          0       2342          0           1



In the last query, the implicit conversion was of the column to a number - 1,000,000 times and that is expensive.

Ok. I believe you.

Jack, October 27, 2009 - 1:10 pm UTC

I guess I could argue that two tenths of a second per million evaluations "isn't a very big deal", but that would just be argumentative. Thanks Tom.

- Jack
Tom Kyte
October 27, 2009 - 4:59 pm UTC

how about this:

an order of magnitude difference.


because that is what it is, an order of magnitude.

and an error waiting to happen - compare string to number implies to_number(string)=number - which can easily fail.

:)

Different view

ygp, October 28, 2009 - 3:26 am UTC

but in general, compare a str to a str and it is as fast as comparing a number to a number

can also be said as

but in general, compare a number to a number and it is as slow as comparing a str to a str

Oracle uses BCD format for numbers, and hence the numbers are compared like string

Tom Kyte
October 28, 2009 - 8:00 am UTC

a bcd like format, it isn't bcd.

therefore is number to string conversion less expensive than string to number conversion

George Joseph, May 13, 2010 - 5:40 am UTC

Tom,
Based on the tests shown above is it fair to conclude that string to number conversion is more expensive than number to string?.

I imagine 1a,2a to perform a to_number(str)conversion
and 1b,2b to perform to_char(num) conversion

1a select * from t where str = 123456
elapsed time:0.29
cpu time:0.30

1b select * from t where num= '123456'
elapsed time:0.05
cpu time:0.05

2a select count(*) from t where str = 0
elapsed time:0.27
cpu time:0.28

2b select count(*) from t where num = '0'
elapsed time:0.03
cpu time:0.03

I followed your advice which said that its always wiser to compare dates and dates and strings and strings. So when i was confronted with a piece of code which did
to_char(date_col,'yyyymm')=to_char(date_col_field,'yyyymm')

I suggested to change the sql code to compare after truncating
trunc(date_col,'MM')=trunc(date_col_field,'MM')


Thanks
George

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library