
October 20, 2009 - 7am Central time zone
Reviewer: riyaz
It sounds like only if implicit conversion or explicit conversion happens, then it is advisable to
use natural data types.
But in your other threads, you are always recommending to use number data type wherever possible
to get better performance. (But here you are saying, if varchar is compared with varchar, the
performance will be same as number column compared with number column.) In earlier threads, you
said number data type comparison is always faster.
Followup October 22, 2009 - 4pm Central time zone:
... It sounds like only if implicit conversion or explicit conversion happens, then
it is advisable to use natural data types.
...
I don't know what that means?
... But in your other threads, you are always recommending to use number data type
wherever possible to get better performance. ...
where - show me where I said that?
... In earlier threads, you said number data type
comparison is always faster. ...
give me an example please, go for it.
and then we'll see if
a) I said it at all (doesn't sound like me)
b) if I did, what the CONTEXT was - the discussion surrounding the statement.
number vs character comparison
October 21, 2009 - 3am Central time zone
Reviewer: Franky from Europe
Hello,
I think we need to leave Oracle for a moment to get better understanding:
From assembly point of view the comparison of two numbers is simply done by one command which triggers one (!) low level call, while for "strings" one asm command triggers several (!) low level calls because characters in the strings are compared against each other step by step...
How does this affect Oracle performance? The same way.
Today's servers have multiple cores and core clocks in the GHz range, thus this "problem" doesn't show up very often, but if you need to focus on the performance, then use numbers whenever possible.
Tom wrote the following : "but in general, compare a str to a str and it is as fast as comparing a number to a number" which means that (!) in general (!) you don't get problems when you compare strings to strigns since the server CPUs do a very quick job for you. Problems might arrise whenever char or varchar2(2000) "strings" have to be compared million or billion (!) times and the first ~1900 characters are identical. If this happened in a very busy database instance, then you would have a real problem...
Regards,
Franky
Followup October 23, 2009 - 11am Central time zone:
you need to COME ON BACK
you assume that we use native numbers (2 byte, 4 byte integers).
We do not. We use a packed decimal like format in a varying 'string' of 0 to 22 bytes.
Now what?
Please don't leave Oracle - You actually need to come inside and understand how it works!
... but if you need to focus on the performance, then use numbers whenever possible. ...
not so - not if you know how numbers are really stored in Oracle - 38 digits of precision - you do not get that in a CPU native format.
Not Exactly
October 21, 2009 - 8am Central time zone
Reviewer: djb from Georgia
From assembly point of view the comparison of two numbers is simply done by one command which triggers one (!) low level call, while for "strings" one asm command triggers several (!) low level calls because characters in the strings are compared against each other step by step...
Not exactly. The NUMBER datatype in Oracle is stored as a BCD (Binary Coded Decimal) which is, in effect, a string type. So comparison of a number and a varchar is essentially the same. The only difference is that character data types go through the NLS layer, so that takes a teensy bit longer, but I've found the difference to be negligible.
Not exactly?
October 22, 2009 - 5am Central time zone
Reviewer: Franky from Europe
There is a difference between data storage (on the disk) and data usage of processors.
Example:
decimal: 49
binary : 00110001
BCD : 01001001
I can see no difference between 00110001 xor 00110001 (binary) and 01001001 xor 01001001 (BCD), neither does intel, since their x86 processors use binary (!) instructions for BCD numbers.
Processor instructions determine how BCD numbers are used and not their "type" of storage.
Regards,
Franky
Followup October 23, 2009 - 1pm Central time zone:
ummm
what about the number 123456789987654321
ops$tkyte%ORA9IR2> select dump(123456789987654321 ) from dual;
DUMP(123456789987654321)
---------------------------------------------
Typ=2 Len=10: 201,13,35,57,79,100,88,66,44,22
we do NOT have a number to pop into a register and have the CPU process - the number type is a complex type with
a null flag indicator
a leading length byte
sign and scale information
packed numeric data
in a varying string between 0 and 22 bytes in length.
Start seeing a difference, you cannot use the CPU straight out to compare two Oracle numbers - it takes a bit of procedural code in a library to do it.
Exactly
October 23, 2009 - 12pm Central time zone
Reviewer: djb from Rome GA
I don't think you understand BCD. The example you provide is not accurate.
Let's use a more realistic number:
73676598
As a 32-bit integer, it would be stored like:
binary (hex): 0x04643736
which easily is stored in 3 and 1/2 bytes. But Oracle stores numbers in a non-trivial way (for good reasons).
A dump of numbers from Oracle:
0 Len=1: 128
1 Len=2: 193,2
10 Len=2: 193,11
100 Len=2: 194,2
73676598 Len=5: 196,74,68,66,99
1930939826 Len=6: 197,20,31,94,99,27
I don't have the time to research the details of how Oracle goes about formatting the bytes (but you can Google it).
Note that the number 1 uses two bytes.
Note that the number 73676598 uses five bytes.
It's a form of BCD, which winds up using string comparison instructions on the machine.
BTW, you can play around looking at how Oracle stores these values by using the DUMP function:
select dump(12345) from dual;
?
October 26, 2009 - 7am Central time zone
Reviewer: Franky from Europe
you need to COME ON BACK
What do you mean?
Please don't leave Oracle - You actually need to come inside and understand how it works!
Don't leave ????
Please clarify here or in private.
Followup October 26, 2009 - 2pm Central time zone:
Franky
you wrote:
... I think we need to leave Oracle for a moment to get better understanding: ...
I said "you need to come on back inside"
You were making assumptions about how you think it works
It doesn't work the way you think, therefore, you need to come back into the database realm and look at it from the way the database works - not how you think it works.
Exactly.
October 26, 2009 - 8am Central time zone
Reviewer: Franky from Europe
djb, Tom
I apologize for my misunderstanding about Oracle's storage mechanism, thank you for the enlightenment. I thought I knew the dump function, but as it seems I misinterpreted the documentaion which is:
"DUMP returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr."
"internal representation" - this one here made me think that things work otherwise...
Regards,
Franky
Ok
October 26, 2009 - 2pm Central time zone
Reviewer: Franky from Europe
Franky
you wrote:
... I think we need to leave Oracle for a moment to get better understanding: ...
I said "you need to come on back inside"
Sorry, it seems that I misinterpreted your response as well... I think I need to have some days off...
Theory is good, evidence better.
October 27, 2009 - 11am Central time zone
Reviewer: Jack from Little Rock, AR USA
In my test below, character comparisons are just as fast as numbers - no armchair computer science
needed. Also note, implicit conversion isn't a very big deal.
- Jack
SQL> ------------------------
SQL> -- Create input table 1.
SQL> ------------------------
SQL>
SQL> create table temp_01_tb
2 tablespace x_ts
3 nologging
4 parallel 16
5 as
6 select id as id_num,
7 cast(to_char(id) as varchar2(30)) as id_char
8 from (
9 select trunc(dbms_random.value(100000000,100000000000)) id
10 from large_tb
11 where rownum <= 10000000
12 )
13 ;
Table created.
Elapsed: 00:02:14.39
SQL>
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL>
SQL> select count(*) from temp_01_tb;
COUNT(*)
----------------
10,000,000
Elapsed: 00:00:00.81
SQL>
SQL> -------------------------------------
SQL> -- Freq length of id_num and id_char.
SQL> -------------------------------------
SQL>
SQL> break on report
SQL> compute sum of count(*) on report
SQL>
SQL> select length(id_num),
2 length(id_char),
3 count(*)
4 from temp_01_tb
5 group by length(id_num),
6 length(id_char)
7 order by 1
8 ;
LENGTH(ID_NUM) LENGTH(ID_CHAR) COUNT(*)
---------------- ---------------- ----------------
9 9 89,854
10 10 901,563
11 11 9,008,583
----------------
sum 10,000,000
Elapsed: 00:00:01.41
SQL>
SQL> clear breaks
SQL> clear computes
SQL>
SQL> ----------
SQL> -- Sample.
SQL> ----------
SQL>
SQL> select *
2 from temp_01_tb
3 where rownum <= 10
4 ;
ID_NUM ID_CHAR
---------------- ------------------------------
86,969,995,517 86969995517
57,700,034,749 57700034749
98,820,424,831 98820424831
85,003,779,624 85003779624
9,644,826,478 9644826478
88,393,740,235 88393740235
99,155,106,367 99155106367
26,633,943,599 26633943599
67,287,256,584 67287256584
29,916,845,470 29916845470
10 rows selected.
Elapsed: 00:00:00.17
SQL>
SQL> ------------
SQL> -- Describe.
SQL> ------------
SQL>
SQL> desc temp_01_tb
Name Null? Type
----------------------------------- -------- ------------------------
ID_NUM NUMBER
ID_CHAR VARCHAR2(30)
SQL>
SQL> ------------------------
SQL> -- Create input table 2.
SQL> ------------------------
SQL>
SQL> create table temp_02_tb
2 tablespace x_ts
3 nologging
4 parallel 16
5 as
6 select id as id_num,
7 cast(to_char(id) as varchar2(30)) as id_char
8 from (
9 select trunc(dbms_random.value(100000000,100000000000)) id
10 from large_tb
11 where rownum <= 10000000
12 )
13 ;
Table created.
Elapsed: 00:02:14.14
SQL>
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL>
SQL> select count(*) from temp_02_tb;
COUNT(*)
----------------
10,000,000
Elapsed: 00:00:00.70
SQL>
SQL> -------------------------------------
SQL> -- Freq length of id_num and id_char.
SQL> -------------------------------------
SQL>
SQL> break on report
SQL> compute sum of count(*) on report
SQL>
SQL> select length(id_num),
2 length(id_char),
3 count(*)
4 from temp_02_tb
5 group by length(id_num),
6 length(id_char)
7 order by 1
8 ;
LENGTH(ID_NUM) LENGTH(ID_CHAR) COUNT(*)
---------------- ---------------- ----------------
9 9 90,332
10 10 901,092
11 11 9,008,576
----------------
sum 10,000,000
Elapsed: 00:00:01.41
SQL>
SQL> clear breaks
SQL> clear computes
SQL>
SQL> ----------
SQL> -- Sample.
SQL> ----------
SQL>
SQL> select *
2 from temp_02_tb
3 where rownum <= 10
4 ;
ID_NUM ID_CHAR
---------------- ------------------------------
54,067,209,027 54067209027
5,979,558,060 5979558060
72,269,951,072 72269951072
62,556,776,987 62556776987
16,990,890,984 16990890984
47,303,283,438 47303283438
70,490,427,076 70490427076
49,336,811,237 49336811237
68,728,531,845 68728531845
3,126,925,174 3126925174
10 rows selected.
Elapsed: 00:00:00.16
SQL>
SQL> ------------
SQL> -- Describe.
SQL> ------------
SQL>
SQL> desc temp_02_tb
Name Null? Type
----------------------------------- -------- ------------------------
ID_NUM NUMBER
ID_CHAR VARCHAR2(30)
SQL>
SQL> -----------------------
SQL> -- Part II. Run tests.
SQL> -----------------------
SQL>
SQL> alter table temp_01_tb parallel 1;
Table altered.
Elapsed: 00:00:00.02
SQL> alter table temp_02_tb parallel 1;
Table altered.
Elapsed: 00:00:00.00
SQL>
SQL> --------------------
SQL> -- Test num compare.
SQL> --------------------
SQL>
SQL> select count(*)
2 from temp_01_tb a,
3 temp_02_tb b
4 where a.id_num = b.id_num
5 ;
COUNT(*)
----------------
993
Elapsed: 00:00:29.43
SQL>
SQL> ---------------------
SQL> -- Test char compare.
SQL> ---------------------
SQL>
SQL> select count(*)
2 from temp_01_tb a,
3 temp_02_tb b
4 where a.id_char = b.id_char
5 ;
COUNT(*)
----------------
993
Elapsed: 00:00:29.12
SQL>
SQL> ----------------------------
SQL> -- Test num to char compare.
SQL> ----------------------------
SQL>
SQL> select count(*)
2 from temp_01_tb a,
3 temp_02_tb b
4 where a.id_num = b.id_char
5 ;
COUNT(*)
----------------
993
Elapsed: 00:00:30.98
SQL>
SQL> --------------------------
SQL> -- Test num compare again.
SQL> --------------------------
SQL>
SQL> select count(*)
2 from temp_01_tb a,
3 temp_02_tb b
4 where a.id_num = b.id_num
5 ;
COUNT(*)
----------------
993
Elapsed: 00:00:24.84
SQL>
SQL> ---------------------------
SQL> -- Test char compare again.
SQL> ---------------------------
SQL>
SQL> select count(*)
2 from temp_01_tb a,
3 temp_02_tb b
4 where a.id_char = b.id_char
5 ;
COUNT(*)
----------------
993
Elapsed: 00:00:25.70
SQL>
SQL> ----------------------------------
SQL> -- Test num to char compare again.
SQL> ----------------------------------
SQL>
SQL> select count(*)
2 from temp_01_tb a,
3 temp_02_tb b
4 where a.id_num = b.id_char
5 ;
COUNT(*)
----------------
993
Elapsed: 00:00:27.99
SQL>
SQL> spool off
Followup October 27, 2009 - 12pm Central time zone:
... Also note, implicit conversion isn't a very
big deal.
...
there you are wrong. If you compare a string to string 1,000,000 times or a number to a number 1,000,000 times - that'll take about the same time.
But compare a string that has to be converted into a number 1,000,000 times and forget it. Very different.
Your example has a quirk in it that you did a hash join, the actual thing you were testing was such a small contributor to the overall runtime that you cannot see that.
so, just testing what we want to test:
ops$tkyte%ORA10GR2> create table t
2 as
3 select rownum num, to_char(rownum) str
4 from dual
5 connect by level <= 1000000
6 /
Table created.
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';
Session altered.
ops$tkyte%ORA10GR2> select count(*) from t where num = 0;
COUNT(*)
----------
0
ops$tkyte%ORA10GR2> select count(*) from t where str = '0';
COUNT(*)
----------
0
ops$tkyte%ORA10GR2> select count(*) from t where num = '0';
COUNT(*)
----------
0
ops$tkyte%ORA10GR2> select count(*) from t where str = 0;
COUNT(*)
----------
0
select count(*) from t where num = 0
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.06 0.05 0 2342 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.06 0.05 0 2342 0 1
********************************************************************************
select count(*) from t where str = '0'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.03 0.03 0 2342 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.03 0.03 0 2342 0 1
********************************************************************************
select count(*) from t where num = '0'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.03 0.03 0 2342 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.03 0.03 0 2342 0 1
********************************************************************************
select count(*) from t where str = 0
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.28 0.27 0 2342 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.28 0.27 0 2342 0 1
In the last query, the implicit conversion was of the column to a number - 1,000,000 times and that is expensive.
Ok. I believe you.
October 27, 2009 - 1pm Central time zone
Reviewer: Jack from Little Rock, AR USA
I guess I could argue that two tenths of a second per million evaluations "isn't a very big deal",
but that would just be argumentative. Thanks Tom.
- Jack
Followup October 27, 2009 - 4pm Central time zone:
how about this:
an order of magnitude difference.
because that is what it is, an order of magnitude.
and an error waiting to happen - compare string to number implies to_number(string)=number - which can easily fail.
:)
Different view
October 28, 2009 - 3am Central time zone
Reviewer: ygp from India
but in general, compare a str to a str and it is as fast as comparing a number to a number
can also be said as
but in general, compare a number to a number and it is as slow as comparing a str to a str
Oracle uses BCD format for numbers, and hence the numbers are compared like string
Followup October 28, 2009 - 8am Central time zone:
a bcd like format, it isn't bcd.
|