Home>Question Details



Prats -- Thanks for the question regarding "Why is char comparison slower than intergers/numbers/Dates", version 10g

Submitted on 20-Oct-2009 6:45 Central time zone
Last updated 28-Oct-2009 8:00

You Asked

Tom,

I have inherited a production database that is storing some numbers and data as char datatype. For example, a char column with size 5 is storing values like '0055', '0045'.
Another instance is storing Month and year separately as October and 2009 in different columns. Hence I have to perform char comparison after pulling month and date from a date say 3/10/2009.

The performance of queries performing comparison among char data is very slow. Can you elaborate reasons for this. Is there a way to handle performance degradation here.

Thanks for your help...

and we said...

You would need to provide a tad more information.

I seriously (totally, completely, utterly) doubt - do not think - do not believe the datatype has a thing to do with it.

What would count would be the queries themselves.


You give no concrete examples
You give nothing to go on


... Hence I have to perform char comparison after pulling month and date from a date say 3/10/2009.
...

it SOUNDS like you have a date - but probably you mean "i have a string with a date in it" or something. I don't know the semantics of your comparison here - I have *no examples*, nothing to work with.



ops$tkyte%ORA10GR2> create table t
  2  as
  3  select to_char( l, 'fm000000' ) str, l num
  4    from (select level l from dual connect by level < 1000000 )
  5  /

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 
12';

Session altered.

ops$tkyte%ORA10GR2> select * from t where str = '123456';

STR            NUM
------- ----------
123456      123456

ops$tkyte%ORA10GR2> select * from t where str = 123456;

STR            NUM
------- ----------
123456      123456

ops$tkyte%ORA10GR2> select * from t where num = 123456;

STR            NUM
------- ----------
123456      123456

ops$tkyte%ORA10GR2> select * from t where num = '123456';

STR            NUM
------- ----------
123456      123456



When you look at the tkprof, you can see when you compare a str database column to a number input - we have to convert the str into a number - so that

select * from t where str = 123456;

is really

select * from t where TO_NUMBER(str) = 123456

and that will affect performance - but in general, compare a str to a str and it is as fast as comparing a number to a number

Likely - you have implicit conversions taking place and you need to correct that - compare only STRINGS to STRINGS and only NUMBERS to NUMBERS and never compare a string to a number - never.
select * from t where str = '123456'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.04          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.04          0       2359          0           1
********************************************************************************
select * from t where str = 123456

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.30       0.29          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.30       0.29          0       2359          0           1
********************************************************************************
select * from t where num = 123456

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.05       0.04          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.05       0.05          0       2359          0           1
********************************************************************************
select * from t where num = '123456'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.05       0.05          0       2359          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.05       0.05          0       2359          0           1

Reviews    
4 stars   October 20, 2009 - 7am Central time zone
Reviewer: riyaz 
It sounds like only if implicit conversion or explicit conversion happens, then it is advisable to 
use natural data types.
But in  your other threads, you are always recommending to use number data type wherever possible 
to get better performance. (But here  you are saying, if varchar is compared with varchar, the 
performance will be same as number column compared with number column.) In earlier threads, you 
said number data type comparison is always faster.


Followup   October 22, 2009 - 4pm Central time zone:

... It sounds like only if implicit conversion or explicit conversion happens, then
it is advisable to use natural data types.
...

I don't know what that means?


... But in your other threads, you are always recommending to use number data type
wherever possible to get better performance. ...

where - show me where I said that?


... In earlier threads, you said number data type
comparison is always faster. ...

give me an example please, go for it.

and then we'll see if

a) I said it at all (doesn't sound like me)
b) if I did, what the CONTEXT was - the discussion surrounding the statement.
5 stars number vs character comparison   October 21, 2009 - 3am Central time zone
Reviewer: Franky from Europe
Hello,
 
  I think we need to leave Oracle for a moment to get better understanding:

From assembly point of view the comparison of two numbers is simply done by one command which triggers one (!) low level call, while for "strings" one asm command triggers several (!) low level calls because characters in the strings are compared against each other step by step...

How does this affect Oracle performance? The same way.

Today's servers have multiple cores and core clocks in the GHz range, thus this "problem" doesn't show up very often, but if you need to focus on the performance, then use numbers whenever possible.

Tom wrote the following : "but in general, compare a str to a str and it is as fast as comparing a number to a number" which means that (!) in general (!) you don't get problems when you compare strings to strigns since the server CPUs do a very quick job for you. Problems might arrise whenever char or varchar2(2000) "strings" have to be compared million or billion (!) times and the first ~1900 characters are identical. If this happened in a very busy database instance, then you would have a real problem...

Regards,

Franky


Followup   October 23, 2009 - 11am Central time zone:

you need to COME ON BACK

you assume that we use native numbers (2 byte, 4 byte integers).

We do not. We use a packed decimal like format in a varying 'string' of 0 to 22 bytes.

Now what?


Please don't leave Oracle - You actually need to come inside and understand how it works!

... but if you need to focus on the performance, then use numbers whenever possible. ...

not so - not if you know how numbers are really stored in Oracle - 38 digits of precision - you do not get that in a CPU native format.
4 stars Not Exactly   October 21, 2009 - 8am Central time zone
Reviewer: djb from Georgia
From assembly point of view the comparison of two numbers is simply done by one command which triggers one (!) low level call, while for "strings" one asm command triggers several (!) low level calls because characters in the strings are compared against each other step by step...


Not exactly. The NUMBER datatype in Oracle is stored as a BCD (Binary Coded Decimal) which is, in effect, a string type. So comparison of a number and a varchar is essentially the same. The only difference is that character data types go through the NLS layer, so that takes a teensy bit longer, but I've found the difference to be negligible.


5 stars Not exactly?   October 22, 2009 - 5am Central time zone
Reviewer: Franky from Europe
There is a difference between data storage (on the disk) and data usage of processors.

Example:

decimal: 49
binary : 00110001
BCD  : 01001001

I can see no difference between 00110001 xor 00110001 (binary) and 01001001 xor 01001001 (BCD), neither does intel, since their x86 processors use binary (!) instructions for BCD numbers.

Processor instructions determine how BCD numbers are used and not their "type" of storage.

Regards,

Franky


Followup   October 23, 2009 - 1pm Central time zone:

ummm

what about the number 123456789987654321
ops$tkyte%ORA9IR2> select dump(123456789987654321 ) from dual;

DUMP(123456789987654321)
---------------------------------------------
Typ=2 Len=10: 201,13,35,57,79,100,88,66,44,22


we do NOT have a number to pop into a register and have the CPU process - the number type is a complex type with

a null flag indicator
a leading length byte
sign and scale information
packed numeric data

in a varying string between 0 and 22 bytes in length.

Start seeing a difference, you cannot use the CPU straight out to compare two Oracle numbers - it takes a bit of procedural code in a library to do it. 
4 stars Exactly   October 23, 2009 - 12pm Central time zone
Reviewer: djb from Rome GA
I don't think you understand BCD. The example you provide is not accurate.

Let's use a more realistic number:
73676598

As a 32-bit integer, it would be stored like:
binary (hex): 0x04643736

which easily is stored in 3 and 1/2 bytes. But Oracle stores numbers in a non-trivial way (for good reasons).

A dump of numbers from Oracle:
0           Len=1: 128
1           Len=2: 193,2
10          Len=2: 193,11
100         Len=2: 194,2
73676598    Len=5: 196,74,68,66,99
1930939826  Len=6: 197,20,31,94,99,27

I don't have the time to research the details of how Oracle goes about formatting the bytes (but you can Google it).

Note that the number 1 uses two bytes.
Note that the number 73676598 uses five bytes.

It's a form of BCD, which winds up using string comparison instructions on the machine.

BTW, you can play around looking at how Oracle stores these values by using the DUMP function:

select dump(12345) from dual;


1 stars ?   October 26, 2009 - 7am Central time zone
Reviewer: Franky from Europe
you need to COME ON BACK

What do you mean?


Please don't leave Oracle - You actually need to come inside and understand how it works!


Don't leave ????

Please clarify here or in private.


Followup   October 26, 2009 - 2pm Central time zone:

Franky

you wrote:

...  I think we need to leave Oracle for a moment to get better understanding: ...


I said "you need to come on back inside"



You were making assumptions about how you think it works

It doesn't work the way you think, therefore, you need to come back into the database realm and look at it from the way the database works - not how you think it works.
5 stars Exactly.   October 26, 2009 - 8am Central time zone
Reviewer: Franky from Europe
djb, Tom

I apologize for my misunderstanding about Oracle's storage mechanism, thank you for the enlightenment. I thought I knew the dump function, but as it seems I misinterpreted the documentaion which is:

"DUMP returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr."

"internal representation" - this one here made me think that things work otherwise...

Regards,

Franky


5 stars Ok   October 26, 2009 - 2pm Central time zone
Reviewer: Franky from Europe

Franky

you wrote:

... I think we need to leave Oracle for a moment to get better understanding: ...


I said "you need to come on back inside"


Sorry, it seems that I misinterpreted your response as well... I think I need to have some days off...

2 stars Theory is good, evidence better.   October 27, 2009 - 11am Central time zone
Reviewer: Jack from Little Rock, AR USA
In my test below, character comparisons are just as fast as numbers - no armchair computer science 
needed.  Also note, implicit conversion isn't a very big deal.

- Jack


SQL> ------------------------
SQL> -- Create input table 1.
SQL> ------------------------
SQL> 
SQL> create table temp_01_tb
  2    tablespace x_ts
  3    nologging
  4    parallel 16
  5  as
  6  select id as id_num,
  7         cast(to_char(id) as varchar2(30)) as id_char
  8  from (
  9       select trunc(dbms_random.value(100000000,100000000000)) id
 10       from large_tb 
 11       where rownum <= 10000000
 12       )
 13  ;

Table created.

Elapsed: 00:02:14.39
SQL> 
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL> 
SQL> select count(*) from temp_01_tb;

        COUNT(*)
----------------
      10,000,000

Elapsed: 00:00:00.81
SQL> 
SQL> -------------------------------------
SQL> -- Freq length of id_num and id_char.
SQL> -------------------------------------
SQL> 
SQL> break on report
SQL> compute sum of count(*) on report
SQL> 
SQL> select length(id_num),
  2         length(id_char),
  3         count(*)
  4  from temp_01_tb
  5  group by length(id_num),
  6           length(id_char)
  7  order by 1
  8  ;

  LENGTH(ID_NUM)  LENGTH(ID_CHAR)         COUNT(*)
---------------- ---------------- ----------------
               9                9           89,854
              10               10          901,563
              11               11        9,008,583
                                  ----------------
sum                                     10,000,000

Elapsed: 00:00:01.41
SQL> 
SQL> clear breaks
SQL> clear computes
SQL> 
SQL> ----------
SQL> -- Sample.
SQL> ----------
SQL> 
SQL> select *
  2  from temp_01_tb
  3  where rownum <= 10
  4  ;

          ID_NUM ID_CHAR
---------------- ------------------------------
  86,969,995,517 86969995517
  57,700,034,749 57700034749
  98,820,424,831 98820424831
  85,003,779,624 85003779624
   9,644,826,478 9644826478
  88,393,740,235 88393740235
  99,155,106,367 99155106367
  26,633,943,599 26633943599
  67,287,256,584 67287256584
  29,916,845,470 29916845470

10 rows selected.

Elapsed: 00:00:00.17
SQL> 
SQL> ------------
SQL> -- Describe.
SQL> ------------
SQL> 
SQL> desc temp_01_tb
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 ID_NUM                                       NUMBER
 ID_CHAR                                      VARCHAR2(30)

SQL> 
SQL> ------------------------
SQL> -- Create input table 2.
SQL> ------------------------
SQL> 
SQL> create table temp_02_tb
  2    tablespace x_ts
  3    nologging
  4    parallel 16
  5  as
  6  select id as id_num,
  7         cast(to_char(id) as varchar2(30)) as id_char
  8  from (
  9       select trunc(dbms_random.value(100000000,100000000000)) id
 10       from large_tb
 11       where rownum <= 10000000
 12       )
 13  ;

Table created.

Elapsed: 00:02:14.14
SQL> 
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL> 
SQL> select count(*) from temp_02_tb;

        COUNT(*)
----------------
      10,000,000

Elapsed: 00:00:00.70
SQL> 
SQL> -------------------------------------
SQL> -- Freq length of id_num and id_char.
SQL> -------------------------------------
SQL> 
SQL> break on report
SQL> compute sum of count(*) on report
SQL> 
SQL> select length(id_num),
  2         length(id_char),
  3         count(*)
  4  from temp_02_tb
  5  group by length(id_num),
  6           length(id_char)
  7  order by 1
  8  ;

  LENGTH(ID_NUM)  LENGTH(ID_CHAR)         COUNT(*)
---------------- ---------------- ----------------
               9                9           90,332
              10               10          901,092
              11               11        9,008,576
                                  ----------------
sum                                     10,000,000

Elapsed: 00:00:01.41
SQL> 
SQL> clear breaks
SQL> clear computes
SQL> 
SQL> ----------
SQL> -- Sample.
SQL> ----------
SQL> 
SQL> select *
  2  from temp_02_tb
  3  where rownum <= 10
  4  ;

          ID_NUM ID_CHAR
---------------- ------------------------------
  54,067,209,027 54067209027
   5,979,558,060 5979558060
  72,269,951,072 72269951072
  62,556,776,987 62556776987
  16,990,890,984 16990890984
  47,303,283,438 47303283438
  70,490,427,076 70490427076
  49,336,811,237 49336811237
  68,728,531,845 68728531845
   3,126,925,174 3126925174

10 rows selected.

Elapsed: 00:00:00.16
SQL> 
SQL> ------------
SQL> -- Describe.
SQL> ------------
SQL> 
SQL> desc temp_02_tb
 Name                                Null?    Type
 ----------------------------------- -------- ------------------------
 ID_NUM                                       NUMBER
 ID_CHAR                                      VARCHAR2(30)

SQL> 
SQL> -----------------------
SQL> -- Part II.  Run tests.
SQL> -----------------------
SQL> 
SQL> alter table temp_01_tb parallel 1;

Table altered.

Elapsed: 00:00:00.02
SQL> alter table temp_02_tb parallel 1;

Table altered.

Elapsed: 00:00:00.00
SQL> 
SQL> --------------------
SQL> -- Test num compare.
SQL> --------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_num
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:29.43
SQL> 
SQL> ---------------------
SQL> -- Test char compare.
SQL> ---------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_char = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:29.12
SQL> 
SQL> ----------------------------
SQL> -- Test num to char compare.
SQL> ----------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:30.98
SQL> 
SQL> --------------------------
SQL> -- Test num compare again.
SQL> --------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_num
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:24.84
SQL> 
SQL> ---------------------------
SQL> -- Test char compare again.
SQL> ---------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_char = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:25.70
SQL> 
SQL> ----------------------------------
SQL> -- Test num to char compare again.
SQL> ----------------------------------
SQL> 
SQL> select count(*)
  2  from temp_01_tb a,
  3       temp_02_tb b
  4  where a.id_num = b.id_char
  5  ;

        COUNT(*)
----------------
             993

Elapsed: 00:00:27.99
SQL> 
SQL> spool off


Followup   October 27, 2009 - 12pm Central time zone:

... Also note, implicit conversion isn't a very
big deal.
...

there you are wrong. If you compare a string to string 1,000,000 times or a number to a number 1,000,000 times - that'll take about the same time.

But compare a string that has to be converted into a number 1,000,000 times and forget it. Very different.


Your example has a quirk in it that you did a hash join, the actual thing you were testing was such a small contributor to the overall runtime that you cannot see that.

so, just testing what we want to test:

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select rownum num, to_char(rownum) str
  4    from dual
  5  connect by level <= 1000000
  6  /

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA10GR2> select count(*) from t where num = 0;

  COUNT(*)
----------
         0

ops$tkyte%ORA10GR2> select count(*) from t where str = '0';

  COUNT(*)
----------
         0

ops$tkyte%ORA10GR2> select count(*) from t where num = '0';

  COUNT(*)
----------
         0

ops$tkyte%ORA10GR2> select count(*) from t where str = 0;

  COUNT(*)
----------
         0







select count(*) from t where num = 0

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.06       0.05          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.06       0.05          0       2342          0           1
********************************************************************************
select count(*) from t where str = '0'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.03       0.03          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0       2342          0           1
********************************************************************************
select count(*) from t where num = '0'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.03       0.03          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0       2342          0           1
********************************************************************************
select count(*) from t where str = 0

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.28       0.27          0       2342          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.28       0.27          0       2342          0           1



In the last query, the implicit conversion was of the column to a number - 1,000,000 times and that is expensive.
5 stars Ok. I believe you.   October 27, 2009 - 1pm Central time zone
Reviewer: Jack from Little Rock, AR USA
I guess I could argue that two tenths of a second per million evaluations "isn't a very big deal", 
but that would just be argumentative.  Thanks Tom.

- Jack


Followup   October 27, 2009 - 4pm Central time zone:

how about this:

an order of magnitude difference.


because that is what it is, an order of magnitude.

and an error waiting to happen - compare string to number implies to_number(string)=number - which can easily fail.

:)
5 stars Different view   October 28, 2009 - 3am Central time zone
Reviewer: ygp from India
but in general, compare a str to a str and it is as fast as comparing a number to a number 

can also be said as

but in general, compare a number to a number and it is as slow as comparing a str to a str

Oracle uses BCD format for numbers, and hence the numbers are compared like string


Followup   October 28, 2009 - 8am Central time zone:

a bcd like format, it isn't bcd.

Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement