Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Tarun.

Asked: August 03, 2010 - 2:35 pm UTC

Answered by: Tom Kyte - Last updated: October 31, 2015 - 12:03 am UTC

Category: Developer - Version: 10.1.0.4

Viewed 50K+ times! This question is

You Asked

Hi Tom,

It would be really useful, if you can give your input on...
http://www.orafaq.com/faq/what_is_the_difference_between_varchar_varchar2_and_char_data_types

that


VARCHAR is used to store variable length character strings up to 4000 characters. But, remember CHAR is faster than VARCHAR - some times up to 50% faster.

Thanks

and we said...

hah, up to 50% faster... Hmm, fascinating. I wonder where they "made up" that number from.

Since a char is nothing more than a VARCHAR2 that is blank padded out to the maximum length - that is, the difference between the columns X and Y below:

create table t ( x varchar2(30), y char(30) );
insert into t (x,y) values ( rpad('a',' ',30), 'a' );

IS ABSOLUTELY NOTHING, and given that the difference between columns X and Y below:

insert into t (x,y) values ('a','a')

is that X consumes 3 bytes (null indicator, leading byte length, 1 byte for 'a') and Y consumes 32 bytes (null indicator, leading byte length, 30 bytes for 'a ' )


Umm, varchar2 is going to be somewhat "at an advantage performance wise". It helps us NOT AT ALL that char(30) is always 30 bytes - to us, it is simply a varchar2 that is blank padded out to the maximum length. It helps us in processing - ZERO, zilch, zippo.



Anytime you see anyone say "it is up to 50% faster", and that is it - no example, no science, no facts, no story to back it up - just laugh out loud at them and keep on moving along.


There are other "made up things" on that page as well, for example:

Searching is faster in CHAR as all the strings are stored at a specified position from the each other, the system doesnot have to search for the end of string.
Whereas in VARCHAR the system has to first find the end of string and then go for searching.


FALSE: a char is just a varchar2 blank padded - we do not store strings "at a specified position from each other". We do search for the end of the string - we use a leading byte length to figure things out.




Here is an excerpt from my book Expert Oracle Database Architecture:

<quote>
The fact that a CHAR/NCHAR is really nothing more than a VARCHAR2/NVARCHAR2 in disguise makes me of the opinion that there are really only two character string types to ever consider, namely VARCHAR2 and NVARCHAR2. I have never found a use for the CHAR type in any application. Since a CHAR type always blank pads the resulting string out to a fixed width, we discover rapidly that it consumes maximum storage both in the table segment and any index segments. That would be bad enough, but there is another important reason to avoid CHAR/NCHAR types: they create confusion in applications that need to retrieve this information (many cannot “find” their data after storing it). The reason for this relates to the rules of character string comparison and the strictness with which they are performed. Let’s use the 'Hello World' string in a simple table to demonstrate:

ops$tkyte@ORA11GR2> create table t
  2  ( char_column      char(20),
  3    varchar2_column  varchar2(20)
  4  )
  5  /
Table created.

ops$tkyte@ORA11GR2> insert into t values ( 'Hello World', 'Hello World' );
1 row created.

ops$tkyte@ORA11GR2> select * from t;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World
ops$tkyte@ORA11GR2> select * from t where char_column = 'Hello World';
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World

ops$tkyte@ORA11GR2> select * from t where varchar2_column = 'Hello World';
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World


So far, the columns look identical but, in fact, some implicit conversion has taken place and the CHAR(11) literal ‘Hello World’ has been promoted to a CHAR(20) and blank padded when compared to the CHAR column. This must have happened since Hello World……… is not the same as Hello World without the trailing spaces. We can confirm that these two strings are materially different:

ops$tkyte@ORA11GR2> select * from t where char_column = varchar2_column;
no rows selected


They are not equal to each other. We would have to either blank pad out the VARCHAR2_COLUMN to be 20 bytes in length or trim the trailing blanks from the CHAR_COLUMN, as follows:

ops$tkyte@ORA11GR2> select * from t where trim(char_column) = varchar2_column;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World
ops$tkyte@ORA11GR2> select * from t where char_column = rpad( varchar2_column, 20 );
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World


Note There are many ways to blank pad the VARCHAR2_COLUMN, such as using the CAST() function.


The problem arises with applications that use variable length strings when they bind inputs, with the resulting “no data found” that is sure to follow:

ops$tkyte@ORA11GR2> variable varchar2_bv varchar2(20)
ops$tkyte@ORA11GR2> exec :varchar2_bv := 'Hello World';
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA11GR2> select * from t where char_column = :varchar2_bv;
no rows selected

ops$tkyte@ORA11GR2> select * from t where varchar2_column = :varchar2_bv;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World


So here, the search for the VARCHAR2 string worked, but the CHAR column did not. The VARCHAR2 bind variable will not be promoted to a CHAR(20) in the same way as a character string literal. At this point, many programmers form the opinion that “bind variables don’t work; we have to use literals.” That would be a very bad decision indeed. The solution is to bind using a CHAR type:

ops$tkyte@ORA11GR2> variable char_bv char(20)
ops$tkyte@ORA11GR2> exec :char_bv := 'Hello World';
 
PL/SQL procedure successfully completed.
ops$tkyte@ORA11GR2>
ops$tkyte@ORA11GR2> select * from t where char_column = :char_bv;
CHAR_COLUMN          VARCHAR2_COLUMN
-------------------- --------------------
Hello World          Hello World
ops$tkyte@ORA11GR2> select * from t where varchar2_column = :char_bv;
no rows selected 


However, if you mix and match VARCHAR2 and CHAR, you’ll be running into this issue constantly. Not only that, but the developer is now having to consider the field width in her applications. If the developer opts for the RPAD() trick to convert the bind variable into something that will be comparable to the CHAR field (it is preferable, of course, to pad out the bind variable, rather than TRIM the database column, as applying the function TRIM to the column could easily make it impossible to use existing indexes on that column), she would have to be concerned with column length changes over time. If the size of the field changes, then the application is impacted, as it must change its field width.

It is for these reasons—the fixed-width storage, which tends to make the tables and related indexes much larger than normal, coupled with the bind variable issue—that I avoid the CHAR type in all circumstances. I cannot even make an argument for it in the case of the one-character field, because in that case it is really of no material difference. The VARCHAR2(1) and CHAR(1) are identical in all aspects. There is no compelling reason to use the CHAR type in that case, and to avoid any confusion, I “just say no,” even for the CHAR(1) field.

</quote>


and you rated our response

  (22 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

VARCHAR2

August 04, 2010 - 4:32 pm UTC

Reviewer: A reader

Tom:

<<<Anytime you see anyone say "it is up to 50% faster", and that is it - no example, no science, no facts, no story to back it up - just laugh out loud at them and keep on moving along. >>>

I like that. Based on that I should be laughing every day probably untill I retire....Some may start to hate me but who cares..

OraFAQ

August 05, 2010 - 3:40 am UTC

Reviewer: Michel Cadot from France


Tom,

I added a comment to OraFAQ page to point here as a comment from you.
If you don't agree, tell me I will remove this comment.

http://www.orafaq.com/faq/what_is_the_difference_between_varchar_varchar2_and_char_data_types#comment-6284

Regards
Michel

Tom Kyte

Followup  

August 05, 2010 - 7:20 am UTC

no problem, linking to the site here is the way I prefer - that way - no misquotes :)

VARCHAR2(1) better than CHAR(1) really?

August 10, 2010 - 10:12 pm UTC

Reviewer: Kevin N from Melbourne A

Will Gender code ever be longer than 1 character for 'M', 'F'?
Will a flag or indicator column ever be longer than 1 character for Y N T or F?
Will ISO country codes ever be anything other than 2 characters fixed length?
Will ISO currency codes ever be anything other than 3 characters fixed length?

For very large tables (>100 M rows) where these types of fixed length columns make up a large proportion of the row length, using CHAR datatypes can provide more efficient use of disk space and database buffer cache. Not every site has unlimited budgets for disk and memory in todays economy.

I agree that for the more general case of longer character columns and smaller tables, then VARCHAR2 is definitely the way to go.
Tom Kyte

Followup  

August 18, 2010 - 11:39 pm UTC

my point is:

if you use char for something, that legitimizes char. If you use it for something, it must be good. It promotes the use of char.

I am against the use of char - period. Ever. Because if you use it for char(1), someone will use it for char(5). Just use varchar2.


...using CHAR datatypes can provide
more efficient use of disk space and database buffer cache. ...

that is so false, I don't even know where to begin. Therefore, I will ask you to provide the science behind that - you won't be able to.


As I've said many times - on this page even - a char(1) and a varchar2(1) are IDENTICAL.

Either:

Both have a leading null indicator which says "I'm not null"
BOTH have a leading length field of one byte
BOTH have one byte

or

Both have a leading null indicator which says "I'm null"


Or

Neither consume any space at all (because they are a null column that is followed either by no other columns or a bunch of columns that are also null)


In other words - char(1) and varchar2(1) are are IDENTICAL.


And if you start speaking of char(n) versus varchar2(n) where n>1 then the only things you can say is that:

char(n) will take as least as much space if not more (and generally almost always - like 99.999999999% of the time always) as varchar2(2) - both on disk and in the buffer cache.


So, if you believe: Not every site has
unlimited budgets for disk and memory in todays economy.
to be true, please put forth the "use only varchar2" argument - else you need that unlimited budget.

OK

August 19, 2010 - 3:13 am UTC

Reviewer: Kevin N from Melbourne, Australia

Thanks Tom for the more precise explanation of why you are right.

I never realised that the CHAR datatype also has a leading length field of one byte. I had assumed that there was no length field with CHAR as the length can be considered as implicit from the column datatype definition.

I also design for DB2 and SQL Server, so must have been getting confused in my mind between the internal representations of DB2 data and Oracle data.

Guess I'll stick to table compression and compress option on indexes when I need to save disk space when using Oracle.

test

September 08, 2010 - 9:23 am UTC

Reviewer: test test

test
Tom Kyte

Followup  

September 09, 2010 - 8:46 pm UTC

it works

September 10, 2010 - 1:24 pm UTC

Reviewer: Vaij from Pune,India

The conclusion:

The only difference is varchar2 is *trimmed* char and char is *blank padded* varchar2 and nothing else.

Given above it makes little sense to use char even if it char(1), in my honest opinion.

But how about this?

October 20, 2010 - 6:28 am UTC

Reviewer: C from Romania

While i generally agree with most of your points, there are cases when i won't "just say no"... ;-)

Consider this scenario:

Create Table Char_Number(id char(10), count number);
Create Table Varchar_Number(id varchar2(10), count number);
-- loop 1,000,000 times
insert into Varchar_Number(id, count) Values(GetRandomString(4,9), GetRandomNumber(10,100))
select * into Char_Number from Varchar_Number

Now would you say there is a difference between the two tables considering this query: "SELECT * FROM <table> WHERE count BETWEEN 42 AND 69"?

PS Your column comparison is flawed, since a VarChar column doesn't NEED to be equal to the trimmed version of itself. The correct query would IMO be like this:
select * from t where trim(char_column) = trim(varchar2_column)

Tom Kyte

Followup  

October 25, 2010 - 3:00 pm UTC



I don't follow your example at all.

Give me a case whereby using CHAR is superior to VARCHAR2, then we'll talk about the merits of it.

Now would you say there is a difference between the two tables considering this query: "SELECT * FROM <table> WHERE count BETWEEN 42 AND 69"?


sure I would - they return DIFFERENT DATA. One returns char(10) - always 10 characters, the other - does not, it returns *different data*.

PS Your column comparison is flawed, since a VarChar column doesn't NEED to be equal to the trimmed version of itself. The correct query would IMO be like this:

I beg to differ, the varchar2 column contains exactly, precisely what it should contain - what was placed in there. If there are trailing blanks - then THEY WERE THERE ON PURPOSE.

follow-up

October 26, 2010 - 1:29 pm UTC

Reviewer: C from Romania

>>> Now would you say there is a difference between the two tables considering this query: "SELECT * FROM <table> WHERE count BETWEEN 42 AND 69"?

> sure I would - they return DIFFERENT DATA. One returns char(10) - always 10 characters, the other - does not, it returns *different data*.

Umm... Okay, let's assume for the moment that the column *is* supposed to contain "always 10 characters" (my previous inserts were not very well thought out). I was asking if there's a difference in /performance/ for that select statement. The issue being that the data for the filter is only available at a specific offset on each row for the "char" case; i do understand that the usual solution would be to place these "filtered on" columns before the var-length columns, but still...

>>> PS Your column comparison is flawed, since a VarChar column doesn't NEED to be equal to the trimmed version of itself. The correct query would IMO be like this:

> I beg to differ, the varchar2 column contains exactly, precisely what it should contain - what was placed in there. If there are trailing blanks - then THEY WERE THERE ON PURPOSE.

Yes, that was my point: wouldn't your comparison fail if there were trailing and/or preceding blanks in the data, on purpose and by design?
Tom Kyte

Followup  

October 26, 2010 - 8:20 pm UTC

a varchar2(10) that is blank padded to 10 would perform identically to a char(10)- since a char(10) is just a varchar2(10) that is always blank padded.

And since the data is always supposed to be 10 characters, there would be NO difference between then

Hence, using either of varchar2(10) and char(10) would be the same

and since using char(n) would/could lead to serious problems in general - we are back to


JUST USE VARCHAR - period, end.

... Yes, that was my point: wouldn't your comparison fail if there were trailing
and/or preceding blanks in the data, on purpose and by design? ...

IF they are supposed to be there
then 
    THEY ARE SIGNIFICANT AND SUPPOSED TO BE THERE, that is, 'a' is not 'a ' is not 'a  ' and so on - they are different, the compare would SUCCEED - not fail
end if


Why people believe...

January 09, 2011 - 3:08 am UTC

Reviewer: MaHuJa

People are used to thinking in terms of fixed-size records, where a varchar is then expected to be stored externally. In such a case, "CHAR is faster than VARCHAR - some times up to 50% faster" may well be true; it's an extra lookup. (With no cache hits, it would take about twice as long. Thus 50%. Assuming no other fields involved.)

The error is to assume such inferences are valid everywhere - to assume there's only one way to implement something like an sql database.

(The answer, how and why this is wrong in the context of an oracle database, imply oracle doesn't use fixed-size records when strings are involved. That should give those people pause.)

Being a disclaimer-less post on an oracle site, we can assume the author of that quote believed it to be true for oracle as well. But in the same vein, the answer given here doesn't allow for other databases than oracle. Justifiable given that this is an oracle site and every google user knows oracle is one implementation of an sql database...

Tom Kyte

Followup  

January 10, 2011 - 7:49 am UTC

... People are used to thinking in terms of fixed-size records ...

really - I haven't dealt with fixed size records since my mainframe days and having to specify an LRECL for my file allocations.



... In such a case, "CHAR is faster than
VARCHAR - some times up to 50% faster" may well be true; ...

Prove it. That has *never* been my experience. Never.


... some times up to 50% faster" may well be true; it's an extra lookup.
(With no cache hits, it would take about twice as long. Thus 50%. Assuming no
other fields involved.) ...

That literally made no sense to me. I read it many times, it made no sense. Not sure what you were trying to say.



Varchar2 vs Char

January 14, 2011 - 2:25 pm UTC

Reviewer: Kevin O'Keeffe from Charlotte, NC

Tom,
Thanks for your site, your responses and personal experiences have really helped me understand how Oracle works, making me better at being able to use it as a tool. I was, however, suprised about your responses to this particular question.

I think the reason (it's at least my reason) believe that char would be more efficient is because it is a "pritive type", which Oracle can allocate space for when writing the row - just as a number is in a fixed spot, so would a char array - where as a varchar2 is thought to be more of a pointer consuming only the amount of physical memory that is used. Given that, I would expect that the Varchar would either be:
a) a chained field, appended to the top of the datablock - which would cause column migration (and fragmentation) if the datablock is filled.
b) a field causing multiple migrations (everytime the size of the field increases).
In either case, increasing string modifications on the column would cause multiple lookups( option "a" being significantly less and what I tend to believe). Are my assumptions incorrect?

As a quick little test I did the following:
set timing on;

DROP TABLE ko_test2;

CREATE TABLE ko_test2
(
a CHAR (10),
b VARCHAR2 (20),
c CHAR (10),
d CHAR (10)
)
/

DECLARE
id_num NUMBER;
BEGIN
FOR id_num IN 0 .. 10000
LOOP
INSERT INTO ko_test2
VALUES ('xxxxxxxxxx',
'x',
'xxxxxxxxxx',
'xxxxxxxxxx');
END LOOP;

COMMIT;
END;
/

SELECT * FROM ko_test2;

/

DECLARE
id_num NUMBER;
s_ins VARCHAR2 (20);
BEGIN
s_ins := '';

FOR id_num IN 1 .. 20
LOOP
s_ins := s_ins || 'y';

UPDATE ko_test2
SET b = s_ins;
END LOOP;

COMMIT;
END;
/

SELECT * FROM ko_test2;

/

DROP TABLE ko_test2;

CREATE TABLE ko_test2
(
a CHAR (10),
b CHAR (20),
c CHAR (10),
d CHAR (10)
)
/

DECLARE
id_num NUMBER;
BEGIN
FOR id_num IN 0 .. 10000
LOOP
INSERT INTO ko_test2
VALUES ('xxxxxxxxxx',
'x',
'xxxxxxxxxx',
'xxxxxxxxxx');
END LOOP;

COMMIT;
END;
/

SELECT * FROM ko_test2;

/

DECLARE
id_num NUMBER;
s_ins VARCHAR2 (20);
BEGIN
s_ins := '';

FOR id_num IN 1 .. 20
LOOP
s_ins := s_ins || 'y';

UPDATE ko_test2
SET b = s_ins;
END LOOP;

COMMIT;
END;
/

SELECT * FROM ko_test2;

With the following results: (output from selects omitted)
varchar2
--------------------------
Table dropped.
Elapsed: 00:00:01.04
Table created.
Elapsed: 00:00:00.04
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.60
10001 rows selected.
Elapsed: 00:00:07.76
PL/SQL procedure successfully completed.
Elapsed: 00:00:12.11 <---- update
10001 rows selected.
Elapsed: 00:00:07.12 <---- select
Table dropped.


char
-----------------------------------
Elapsed: 00:00:00.07
Table created.
Elapsed: 00:00:00.06
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.59
10001 rows selected.
Elapsed: 00:00:07.81
PL/SQL procedure successfully completed.
Elapsed: 00:00:01.68 <----- update
10001 rows selected.
Elapsed: 00:00:07.98 <----- select

It was just a quick test in order to further understanding, I'm sure you'd have a better real-world example.

Thanks
Tom Kyte

Followup  

January 14, 2011 - 7:16 pm UTC

define primitive type. In java speak I understand the concept - a type natively implemented by the hardware - like int or double - versus something emulated in software like an oracle.sql.number type or a java string or your own objects. One has an obvious performance advantage over the other IF APPROPRIATE (you cannot just use a double instead of a number type - it doesn't work)


A char is nothing more than a blank padded varchar2 - they are identical in storage. They each have a null indicator. They each have a length byte. They each consume as many bytes as that length byte indicates. A char always consumes MAXIMUM lengths - they either consume no storage or ALL storage. A varchar2 consumes a varying amount of storage.



Tell me how many times you plan on - in REAL LIFE - migrating every row in your table?

If you plan on doing that, why wouldn't you set pctfree more sensibly?

set it to 50% and use tkprof and see what you see.

And again - after reviewing the (different) results - ask yourself "will I ever really do anything even remotely similar to this in real life?????"


Will you ever update every varchar 20 times increasing its length?

Space economy

January 20, 2011 - 11:20 am UTC

Reviewer: Grant Johnson from WY USA

I was under the impression that only varchar2 has the 2 bytes of length on the front of each field, and that char being always fixed in length, would be more efficient with space. This would be especially relevant to small fields that are fixed in length (gender, state, status codes, etc.)

Is that incorrect?
Tom Kyte

Followup  

January 24, 2011 - 6:13 am UTC

a char is simple a varchar2 is that always blank padded to the maximum length when not null.

therefore a char(5) takes AT LEAST as much space as a varchar2(5) would - and typically MORE since a char(5) will ALWAYS have 5 characters and the leading length indicator and null indicator whereas a varchar2(5) will have just the leading length indicator, null indicator and between 1 and 5 characters.


a char(1) and varchar2(1) are identical in storage.

Testing

February 03, 2011 - 2:55 am UTC

Reviewer: Jonathan Lewis from UK

I think it's worth mentioning that Kevin's example highlights an important aspect of testing that everyone needs to remember.

In a comment to the original orafaq someone said: "I created a table which was essentially two columns, one char one varchar2() and one was twice as fast as the other" - but gave no details that could show why their test was either a special case or wrong. (First thought - what's the difference between a table that is "essentially" two columns and one the actually is two columns ?)

Looking at Kevin's script my first thought (like yours) was: "You've grown every row from about 35 bytes to about 55 bytes, your PCTFREE should have been about 40. This is a test of row migration, not of char/varchar2".

My second thought was: "How do you think the tests would compare if you didn't change the PCTFREE, but started the test columns at NULL rather than 'x' ? (A CHAR() that is NULL is not space padded and the first update would grow the char(20) from zero bytes to 20 bytes.)".

I had three or four other thoughts about how the test, and its possible variants, could help us understand and learn things about how a production system would behave, but I won't bore you with details.


In passing, the column overhead is only one byte - the "null byte" and "length byte" are the same thing, a length byte of 0xFF (255) means the column is null; but when the used space is over 250 bytes the length "byte" becomes three bytes - a lead-in byte of 0xFE and the next two bytes giving the actual usage.

Tom Kyte

Followup  

February 03, 2011 - 3:25 pm UTC

thanks for the correction on the null indicator, I should know that, it is written down in a book somewhere... by me...

I thought the physical storage was different.

August 23, 2011 - 4:07 pm UTC

Reviewer: Andrew from Atlanta, GA USA

I thought VARCHARs were stored a little like LOBs where there is a pointer record in the "row data" and the actual String was stored somewhere else. This is how you get around having to store nulls for unused parts of the VARCHAR. If the VARCHAR is stored with the row data then each row may be a different length and each record's start byte would not be in a calculateable place.

Then CHARs are are stored "inline" and always take up the same amount of space on disk. They can exist with the "row data" because they are fixed length. The beginning of each record is always "row length" bytes after the beginning of the prior record. In this case "row length" is constant for the table.

At least this is how many raw datafiles and I think maybe even some old Foxpro like DBs work. I used to have to write programs that could read files like this. It's been years (thankfully) since I've had to reverse engineer a file with a hex editor and no file spec. Maybe things have changed alot.

Tom Kyte

Followup  

August 30, 2011 - 3:29 pm UTC

primitive databases like dbase/foxpro in the olden days used fixed width records - yes.


Pretty much every RDBMS stores varying length character strings inline, in a varying length field - rows are not all the same length. We work with BLOCKS (some call them pages) of data - not rows.

dbase - worked with rows.

Oracle (and others) work with blocks.


been that way since the beginning - before dbase existed in fact :)

char and varchar in Oracle

June 03, 2012 - 2:11 pm UTC

Reviewer: Phil from Gaitherburg, Md

I am astonished if Tom's description of Oracle is correct.

If each record in a table has a fixed length why would anyone store
length indicators on char columns and thus need to parse each column in
each record every time it is accessed? A waste of storage, a waste of
memory, a waste of cpu time, a waste of coal. If there is an integer
following ten chars then they seriously have to read each char column's
length indicator so they can find the offset to the next char column
length indicator and do it ten times, so they can finally figure out
where the 11th column begins? For every record in every query? Of course
the data is blocked in memory - but they also have to do this for every
record in every the block in order to find the beginning of the next
record! Do they do that for integers and floats and doubles and bytes or
they are only they only that absurd for char(1)?

I will continue to use CHAR for fixed length characters such as char(1)
or char(3) - just in case Oracle ever figures out the obvious.

Tom Kyte

Followup  

June 04, 2012 - 9:07 am UTC

A byte - you are squabbling over a byte? two bytes at most? (but to have two bytes you'd have to be over a char(255) meaning your overhead is pretty darn tiny.

seriously?



You may do whatever you want, I just call it a "very bad practice", it can only lead to confusion and will never be "better" than a varchar2 in implementation.

You do realize also that our number type is a pack decimal type of number and has leading length bytes too right?

dbase was the last database I used that used a fixed width record "blocked" out in memory.


The data is not blocked in memory for any real database that I'm aware of in the 21st century.


but they also have to do this for every
record in every the block in order to find the beginning of the next
record!


where did you read that???? please don't make up stuff, that isn't true. If you read up on the block (Server Concepts) you would know we have a row directory in the block header (oh horrors, another bit of overhead!)

Can you *imagine* if everything where stored in fixed length fields? Holy cow, our databases would be many orders of magnitude LARGER than they are today.


Also, please think about the inefficiencies of trying to mingle fixed width and variable width data in the same record. Think about it.

Given that the mere existence of a SINGLE varying width field means you have to parse over every field after that - the mere existence of a SINGLE varying width field - the "optimization" you propose would be an anti-optimization at worst and meaningless (not providing any) optimization at best.


Given that probably 99% of tables start with something like create table t ( id_of_some_sort number primary key, .... ) and the number type is a varying width field from 0 to 44 bytes in length (as is INT) - any sort of fixed width field would be meaningless. You wouldn't know where any of the fields started in a record as soon as you hit a varying width one.


and keep thinking about this, in order to get the N'th field in a "pure fixed width implementation", you'd have to

a) access some data structure to figure out the length of column 1
b) add that to a counter
c) access some data structure to figure out the length of column 2
d) add that to a counter
...

and so on, compared to what we do:

a) access some data structure (the block itself) to figure out the length of column 1
b) move forward by that amount by incrementing a pointer
c) access some data structure (we are there) to figure out the length of column 2
d) move forward .....

yes, that would burn much more coal now.... or not.


Tom is Awesome!

July 23, 2013 - 7:26 pm UTC

Reviewer: Pupil from Charlotte, NC

Your knowledge is so much in-depth and you are so thorough that even a single (possibly more books) can explain the same concept with examples. One'd wish you never retire but we know it's not going to happen!

Yes it's primitive

December 18, 2013 - 8:24 am UTC

Reviewer: Jaimie from Dataland

But Tom, isn't it true that some tables are composed of all fixed-width fields? For example, a series of integers will be fixed-width. There are lots of tables like that -- especially normalized ones. Now let's say I happen to want to use a CHAR(1) for 1 field like "P" for product and "D" for download. I know it's limited in utility, but in that case the old-school DBASE format would probably do very well and modern hardware might notice or be able to anticipate which area of memory (if it maps the file to memory) will be read next in a full table scan. I'm not thinking of transactions or version control here. Let's just say it's a big fixed-width grid. Am I wrong? Is it not worth it?

Hardware Prefetching

December 18, 2013 - 9:22 am UTC

Reviewer: Jaimie from Dataland

OK here's the optimization mechanism:

http://software.intel.com/en-us/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers

I don't know exactly what Intel covers with "heuristics," nor is it clear they'd tell you all of such secret sauce, but it is safe to say that Intel will recognize a series of cache misses when it is scanning through data and it's always an offset that is divisible by N.

So let's say you have a wide table of integers representing various foreign keys because we did lots of our normalizing homework. They're all multiples of CPU-words in length. Say 2000 columns as a really obtuse example to make it clear. All integers. In this case the CPU's prefetching mechanism could make a rowstore behave a little more like a column store. I'm aware that one can encode integers using some sort of variable length byte (VLB) format, and that this can even be desirable. I'm also aware that one can organize data into blocks/pages/whatever to make it easy to, among other things, skip over stuff. But this simple use-case comes to mind.

What else does Intel do? I honestly don't know, and this is conjecture to be fair.

agree

March 29, 2014 - 6:34 pm UTC

Reviewer: jonathan from Leesburg, VA USA

sometimes programmers feel like they have to be efficient, or use every feature. Our application has "CHAR" in it....and someone even used the java "SHORT". Retrofitting the database and the java application to clear out "SHORT" or "CHAR" is expensive and error prone. The only time i would actually care about efficiency of code is in imbedded applications. Most folks do not write them. Most folks should care about maintainability and extensibility. I think using strange data types should be a hangin' offense.

"Strange" data types do have their uses

August 05, 2014 - 11:58 am UTC

Reviewer: Thomas from Scotland

re: jonathan from Leesburg, VA USA,

I find myself using "strange" data types more and more. Not because I am worried about resource use, but because they add another level of data validation.

If I were to write PL/SQL code today for a standard customer/order system, I would declare the number of items in an order as POSITIVE - not NUMBER, not INTEGER.

Having an order with less than one item does not make sense in real life. Having an order with less than one item highlights a problem with the code.

Regards,
Thomas

CHAR vs. VARCHAR2

April 21, 2015 - 9:54 pm UTC

Reviewer: Josh from Los Angeles, CA

I think some of this discussion may come down to detailed Oracle implementation vs. other database platforms. For example, I was recently implementing relatively large fact tables in a Data Warehouse running on Teradata.

The tables were relatively "narrow" (maybe 10 to 15 columns) but one of the two had 100 million rows and the other had about 1.5 billion. I converted a couple of columns from VARCHAR to CHAR and was able to save several GB of storage on the 1.5 billion row table. I believe this was Teradata version 13.10. I was then able to save even more space by using dictionary-based compression which Teradata supports on CHAR but not on VARCHAR (or at least they only supported dictionary-based compressed on CHAR and not VARCHAR as of 13.10, I believe more recent versions have expanded compression capabilities.)

This is because Teradata doesn't store a 2-byte variable-length indicator with a CHAR column but it does with VARCHAR. So you can save two bytes per row, and when you have a column that is only a couple of bytes in the first place this can save a lot of space in a table with a lot of rows. Here is more detail on Teradata and CHAR vs. VARCHAR: http://teradataforum.com/teradata/20030204_161225.htm (yes it is an old link but the most recent I could quickly find.)

I can't speak to other DB platforms, but my point is that many folks may come to Oracle carrying this assumptions that CHAR on Oracle does not carry the overhead of the variable-length indicator in the same way that it behaves on other platforms. Clearly from reading previous posts on this forum Oracle's behavior is different from other platforms.

October 29, 2015 - 10:52 pm UTC

Reviewer: lim from Los Angeles, CA

Row Chaining and Migrating

In two circumstances, the data for a row in a table may be too large to fit into a single data block. In the first case, the row is too large to fit into one data block when it is first inserted. In this case, Oracle stores the data for the row in a chain of data blocks (one or more) reserved for that segment. Row chaining most often occurs with large rows, such as rows that contain a column of datatype LONG or LONG RAW. Row chaining in these cases is unavoidable.

However, in the second case, a row that originally fit into one data block is updated so that the overall row length increases, and the block's free space is already completely filled. In this case, Oracle migrates the data for the entire row to a new data block, assuming the entire row can fit in a new block. Oracle preserves the original row piece of a migrated row to point to the new block containing the migrated row. The rowid of a migrated row does not change.

When a row is chained or migrated, I/O performance associated with this row decreases because Oracle must scan more than one data block to retrieve the information for the row.

http://docs.oracle.com/cd/B19306_01/server.102/b14220/logical.htm
https://docs.oracle.com/database/121/CNCPT/logical.htm#CNCPT1053

So in the case of an update, CHAR seems to be a better choice....
Chris Saxon

Followup  

October 30, 2015 - 10:17 am UTC

This is only an issue if you're increasing the length of the strings.

If you expect this (people will insert placeholders that they'll later update to longer text), you can set pctfree higher. This reserves more space so everything can stay in the original block.

If updates leave the total with a similar length, row migration is unlikely to be an issue (pctfree has a default of 10, leaving some wiggle room).

October 30, 2015 - 3:05 pm UTC

Reviewer: Lim from Los Angeles, CA

This is only an issue if you're increasing the length of the strings.

- Yes and this can happen at any given time.

If you expect this (people will insert placeholders that they'll later update to longer text), you can set pctfree higher. This reserves more space so everything can stay in the original block.

- Yes, I can play with pctfree but I rather have the tables designed right from the beginning.

If updates leave the total with a similar length, row migration is unlikely to be an issue (pctfree has a default of 10, leaving some wiggle room).

- Unless you are involved in table design, you just don’t know. It’s better to set a standard to accommodate any potential mistakes or problems.

BTW, I am not saying that playing with pctfree is wrong. I just like to have my tables designed right from the beginning and remove any potential headaches or problems down the road…

Chris Saxon

Followup  

October 31, 2015 - 12:03 am UTC

Its also worth thinking about the Hakan factor for tables in this instance. It was originally planned for bitmap indexes,but can also be used to good effect for rows that grow.

Check out the docs for 'alter table ...minimize records per block'