How are varchar stored internally in a block or page.


Chris Saxon

Thanks for the question, Kelly.

Asked: August 09, 2001 - 4:10 pm UTC

Last updated: March 15, 2021 - 3:39 pm UTC

Version: 8.1.7

Viewed 10K+ times

You Asked

Tom,
I work with a modelers group who would like to define every varchar2 field with the maximum length. That means a table with 20 varchar2 fields would have all of them defined with a maximum of 2000 or 4000 bytes.
I tried to talk to him about the reasons we identify data with correct lengths and names in order to understand what we have in our database. He told me that it doesn't matter as Oracle just stores the length etc. and there is no overhead. I don't believe this is true, but I have been jumping between so many databases that I cannot find a document on the internals of Oracle storage.
Can you help me out here with this question?
Thanks


and Tom said...

This is your "data modeler" -- my goodness. They are the ones that are supposed to be telling YOU that it is vital to use the appropriate length when defining fields! That is their JOB.

Let's just forget about things like storage for a minute...

Ask them

o what's going to happen when users pull this up in a query tool that formats each field based on the width of the column in the database? They'll see one column and have to arrow way over to see the second and so on.

o Ok, so the code prepares a query that selects 10 columns that are varchar2. The developer, for performance, would like to array fetch (very important). They would like to array fetch say 100 rows (very very typical). So, we have 4000 times 10 times 100 = almost 4 meg of RAM the developer must allocate!!

Now consider if those were 10 varchar2(80) fields (it's probably much smaller than that) -- that's about, oh, 78k. Ask the "data modeler" how much RAM he is willing to kick in for this system....

o ok so the developers go to build a data entry screen to put data into the database. Wow -- that code field can be 4000 characters long, that first name can be 4000 characters long. How the heck is anyone going to know what sort of data can really go in there?


Tell them they need to consider the length as a constraint. Just like they are going to use primary and foreign keys, they will use the proper and correct length on fields. You can ALWAYS grow a field via "alter table t modify c varchar2(bigger_number)" -- there is ZERO, none, no reason to use the max length everywhere. It'll kill the applications you develop - they'll mistakenly allocate many megabytes of RAM (just think of the array fetch example with an app server -- now it's not just 4 meg, it's 4 meg * number of connections -- you are talking some real memory here for a SINGLE QUERY (and you'll be doing lots of them at the same time))

As for the storage: a char(2000) will consume 2000 bytes of storage whether you put the letter "a" in there, the string "hello world", or 2000 characters. A char is always blank padded.
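(A small illustrative sketch -- not part of the original answer, table name made up -- using vsize() to show the bytes actually stored:)

create table pad_demo ( c char(2000), v varchar2(2000) );

insert into pad_demo values ( 'hello world', 'hello world' );

select vsize(c) char_bytes, vsize(v) varchar2_bytes from pad_demo;
-- expect 2000 for the char (blank padded out to its declared width)
-- and 11 for the varchar2 (only the data you put in)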

As for the storage, consider this -- are you thinking about using an occasional index in your system?

tkyte@ORA817.US.ORACLE.COM> create table t ( a varchar2(4000), b varchar2(4000) );

Table created.

tkyte@ORA817.US.ORACLE.COM> create index t_idx on t(a);
create index t_idx on t(a)
*
ERROR at line 1:
ORA-01450: maximum key length (3218) exceeded

I have an 8k block size -- you would need to use at least a 16k block size to index a SINGLE column but even then if you tried to create a concatenated index on t(a,b) -- it would fail there!

This is one of the worst ideas I've personally heard of. Get a new data modeler!


(these are just the things that popped off the top of my head -- there are undoubtedly many many more bad things that would happen due to this)



followup to comment one

That's an interesting point on the UTF8 issue. In Oracle9i (9.0.1) and up, what we've done is add a new syntax to the varchar2/char datatypes:

create table t
( some_data varchar2( 20 char ),
some_other_data char( 10 char )
)
/

This defines a table with 20 CHARACTERS -- not bytes -- of storage. In a UTF8 instance, a character can take many bytes -- possibly many more than 3. For example, from the documentation:

....
A typical situation is when character elements are combined to form a single
character. For example, o and an umlaut can be combined to form ö. In the Thai
language, up to three separate character elements can be combined to form one
character, and one Thai character would require up to 3 bytes when TH8TISASCII
or another single-byte Thai character set is used. One Thai character would require up to 9 bytes when the UTF8 character set is used.
.....

That char( 10 char ) may consume many BYTES of storage -- many more than 10.


Now, this does NOT increase the max varchar2/char size. If you define a:

varchar2( 4000 char )
char( 2000 char )

we are still limited physically to 4000 bytes and 2000 bytes. Where this new syntax comes in most useful is in "normal" fields like:

code char( 1 char )
name varchar2( 80 char )
description varchar2( 255 char )

For really long fields you'll still want to use CLOBs as the storage mechanism.

I stand by my comments above. Having a blanket rule that everything should be a varchar2(4000) is a really bad idea.


Comments

How are varchar stored internally in a block or page.

Svend Jensen, August 10, 2001 - 5:15 am UTC

Very interesting point on memory usage in clients/OAS.
The so-called "modeler" is close to the point. Varchar2 columns are stored with minimum overhead on disk. A column GREET varchar2(4000) = 'HELLO WORLD' would use ~11 bytes of disk storage.

Using variable multibyte charsets such as UTF8, each character can expand to anywhere from 1 to 3 bytes. Take that into account when you design your column sizes for global deployment and charsets other than single-byte charsets (US7ASCII, WE8ISO8859...).

I miss info on overhead and/or compression with repeating characters like 200 spaces or dots.

Storage of Varchar2 fields

Ilango Jv, August 10, 2001 - 8:19 am UTC

It's a good explanation and very useful. Thank you very much, Mr. Tom.

How are varchar stored internally in a block or page

Rob, August 10, 2001 - 3:20 pm UTC

Unfortunately I have seen this quite often.
The database modelers are using a row-based architecture.
They create tables with a type identifier and a bunch of generic columns of varchar2(4000). I have tried to get them to move away from this approach. They believe the flexibility outweighs the issues you bring up.
I’m sure you can imagine what a pain it is developing for these systems.


AB, August 13, 2001 - 4:27 am UTC

I don't think these so-called "modellers" are creating flexible environments at all - they are just being extremely lazy and cannot be bothered to understand the data that will eventually be put into the database. Tom tells it as it is.

Storage of VARCHAR2

Randy, August 13, 2001 - 10:49 am UTC

Personally I have seen data modellers of this kind. They seem to have no idea about resource conservation and, very surprisingly, most of them have some kind of Visual Basic programming background. Can you ask your data modeller how he would estimate the size of the database/table? I guarantee that would be very challenging.


varchar2 constraint

Mikito Harakiri, September 19, 2001 - 5:21 pm UTC

I'm with the data modeller. How come Oracle enters the 21st century with that silly limit on string length while any decent language has dynamic strings today? If it's a constraint, then it must be _optional_ like any other constraint. In short, one string datatype -- not a dozen as today -- plus an optional length constraint would be an ideal implementation.

Tom Kyte
September 19, 2001 - 5:53 pm UTC

If you don't want a limit, use a clob. So, when you don't want a limit -- we have a datatype for you. Heck, in 9i you can use like, =, <, >, instr, substr, whatever on a clob. Clobs are for you.

There are practical concerns with regard to database development and design. Imagine if people started putting gobs and gobs of data inline in the table (oh wait -- that's called a LONG and the issues with that were so bad -- we've totally replaced that datatype with a CLOB).

That is sort of what this thread was about -- there are practical considerations to be thought about with a physical data model. Can you answer some of the questions I posed above??

SQL stands for something -- structured is part of the name.

Everything has constraints. Even those languages that seemingly don't have constraints. (C seems to be a decent language, PLSQL too -- all strings are constrained in those languages as well).

Dates -- gotta be a date, there is no Feb 30th, there is nothing before 4712 BC, there is no year 10,000.

Numbers -- gotta be a number, there is no number 'z', gotta be in a certain range of precision.

Strings -- gotta be of a certain size. Go with 4000 if you want -- but you might want to consider the practicality of doing so, answer the questions in your own mind first (so when the developers ask them, you are ready with the solutions to the real problems they will encounter).



VARCHAR2 columns per table

sachin, March 26, 2003 - 3:38 pm UTC

Tom,
We have 8.1.7.4.
I am building a table for a survey...
Some of the columns involved need a lot of text... about 2000 characters...
So, a couple of tables are coming up with about 11 columns, each with length varchar2(2000)...
The analyst is insisting that we will need this much length for all these columns... maybe more...
What is the best way to declare the types of these columns?
Will CLOBs be better than VARCHAR2?
Also, we are using three tablespaces... small, medium and large (LMT).
I was going to put this in large:
DATAFILE SIZE 1000M
16384K
MAXSIZE 1000M
EXTENT MANAGEMENT LOCAL UNIFORM
SIZE 1024K

What are your suggestions about tablespaces for this kind of table... maybe a separate tablespace??

Thanks...



Tom Kyte
March 26, 2003 - 6:36 pm UTC

If they are 2000 characters -- varchar2(2000) is appropriate.


A table with 11 varchar2(2000) columns is not unusual, it need not be treated differently from anything else.

Is UTF8 not required?

Tony, June 10, 2003 - 8:32 am UTC

Tom,
One of my friends says that if we use the varchar2( 20 char ) syntax, there is no need to have UTF8 for multi-byte character sets. Is it true?

Tom Kyte
June 10, 2003 - 9:21 am UTC

no, either you misunderstood or your friend is confused.


You want to use varchar2(20 CHAR) when you think you might want to use UTF8 or any multi-byte character set and don't want to have to change your schema.


a column of varchar2(20) reserves 20 bytes.
in US7ASCII for example, that just happens to be 20 characters.
in UTF8, that might be 20 characters, it might be 10, it might be 6 -- since each character might take 1, 2, or 3 bytes (maybe even more, not sure, not relevant -- only relevant that a CHARACTER may take more than ONE BYTE)


Now, if you had defined the column in 9i (only in 9i and up does this feature exist) as varchar2(20 CHAR) then

in US7ASCII, that will permit up to 20 characters in that character set -- that just happens to be 20 bytes.

in UTF8, that will permit up to 20 characters in that character set -- meaning that might be 20, 40, 60, .... bytes


NOW, bear in mind, this does NOT increase the maximum varchar2/char byte width which is still 4000/2000! If you

varchar2(4000 CHAR)
char(2000 CHAR)

that will still be limited to 4000/2000 BYTES and may or may NOT be able to hold 4000/2000 utf8 characters!



This is something to consider using in your applications, as it will save pain in the future if you are installed on a multi-byte character set. More than one application has had to go back and fix their schema since their "first_name" field at varchar2(40) couldn't hold 40 characters -- they had to double or triple that.
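A small sketch of the difference (not from the original answer; it assumes an AL32UTF8 database with the default BYTE length semantics, and the table and data are made up):

create table sem_demo
( b varchar2(5),        -- 5 BYTES
  c varchar2(5 char)    -- 5 CHARACTERS
);

-- 'carré' is 5 characters but 6 bytes in AL32UTF8 (the é takes 2 bytes)
insert into sem_demo (c) values ('carré');   -- succeeds
insert into sem_demo (b) values ('carré');   -- fails with ORA-12899:
                                             -- value too large (actual: 6, maximum: 5)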

What about size of number data type columns?

A reader, June 11, 2003 - 5:10 pm UTC

Hi Tom,
Could you please also give your views (apart from using it as a constraint) regarding columns created with a number data type of specific precision rather than just number?
Thanks

Tom Kyte
June 12, 2003 - 8:28 am UTC

if the number is to be a number between 1 and 999 with no decimals, etc etc etc -- using number(3) would be appropriate.

Just like using varchar2(10) for a 10 byte field would be.


If there is a constraint to be enforced, DO IT.
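(A tiny sketch of the same point for numbers -- not from the original answer, table name made up:)

create table n_demo ( qty number(3) );

insert into n_demo values ( 999 );    -- ok, fits in 3 digits
insert into n_demo values ( 1000 );   -- fails with ORA-01438: value larger than
                                      -- specified precision allowed for this column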



Max of varchar2 is 2000 or 4000 bytes ?

Tony, June 12, 2003 - 2:50 am UTC

You have mentioned in the above follow-up to my question that the varchar2 max is 2000 bytes. Is it not 4000 bytes?

Tom Kyte
June 12, 2003 - 8:48 am UTC

touched that up -- was thinking "char", typed "varchar2", so I added both

Max of varchar2 is 2000 or 4000 bytes ? Please Ignore

Tony, June 12, 2003 - 2:55 am UTC

Sorry Tom, I've misunderstood. You're right.

Tom Kyte
June 12, 2003 - 8:50 am UTC

no, you were, it is fixed.

Excellent as always.

Orlando Reyes, December 07, 2004 - 5:58 pm UTC

Tom,

I am facing the same situation right now but related to NUMBER vs NUMBER(p,s).

Our Data Architect group wants to define all numeric values as NUMBER because of flexibility, they claim, and their manager agrees. We, the DBA team, have already pointed out the data constraint issue along with formatting and rounding/truncation problems.

Can you please give me some more ammunition to use when we have the final meeting to discuss the issue?

Is there anything strong enough, related to performance, space or the like, that we could also use? Like the problem of creating an index on the VARCHAR2(4000). My understanding is that Oracle does not really get impacted either way, since it would store the number the same way and it would take the same space -- am I wrong? Performance-wise, I am not sure if number manipulation would be impacted.

Any ideas will be very much appreciated.

Thanks,


Tom Kyte
December 07, 2004 - 8:54 pm UTC

how can they even begin to anoint themselves with the term "architect"

anyway -- I suppose they think all things should be varchar2(4000) too -- it is after all "flexible" no?

you are losing the edit - if the number is to be a number(7,2) -- you lose the edit.

space -- nothing there -- until bad data gets in, but then so what, you have bad data (note: not IF bad .... but when BAD....)

it is all about data integrity here -- remember a database does not have "99% data integrity" - it is a binary thing, either you have it or not. if they just use number for everything, they probably don't have it.

FFO, December 27, 2004 - 7:04 am UTC

So from a storage perspective, a VARCHAR is good. But on the memory side it may consume more memory (disk buffers, sort areas...). Does VARCHAR have an influence on index creation/query performance? I mean, does reducing the size -- say from VARCHAR(4000) to VARCHAR(1500) -- improve performance?

Tom Kyte
December 27, 2004 - 10:15 am UTC

how so? why do you say that about the memory?

confusion about follow up

amit poddar, December 27, 2004 - 11:18 am UTC

Hi,
From your first follow up
"Ok, so the code prepares a query that select 10 columns that are varchar2.
The developer, for performance would like to array fetch (very important). They
would like to array fetch say 100 rows (very very typical). So, we have 4000
times 10 times 100 = alsmost 4meg of RAM the developer must allocate!!

Now consider if that where 10 varchar2(80) fields (its probably much smaller
then that) -- thats about oh 78k. Ask the "data modeler" how much RAM he is
willing to kick in for this system...."

I thought varchars were not blank padded. So let's say we have one character in a varchar2(4000) field. If the client fetches 100 rows, why would it fetch 4000*100 bytes instead of just 4000*1, since varchar2 is not blank padded?

Fetching 4000*100 bytes makes sense if we had a char(4000) field.

Could you please clarify

Tom Kyte
December 27, 2004 - 12:19 pm UTC

that is not buffer cache memory, that is client-side memory. It will not affect disk buffers or sort areas at all.



The client, buffering this data, would have to allocate a "square array" to fetch into.


They are not blank padded; it is just that when you ask for 100 rows into an array and each row COULD have a varchar2(4000), you'd better have an array that is 100*4000 in size.




Amazing data modeler!

Mark, December 31, 2004 - 3:45 pm UTC

Why not make every column a blob? They're more flexible, after all! (sarcasm)

maximum manipulation of varchar

Raj, March 28, 2005 - 8:52 am UTC

The explanations are excellent.
I need a clarification wrt varchars.
I have a file and I want to store the contents of the file in a variable so that this variable can be used to update the values present in a column of the table. The column is a LONG field.

varchar abc[20000000]; /* pro*c declaration -2GB*/

Is the above declaration correct, or is there a better way of manipulating the file via variables in a Pro*C environment?

So far it is working fine with the above declaration.
I'm in a development environment.
But now, all of a sudden, only 26702 bytes are getting stored. The remaining part is just getting truncated.
Your view on this will be highly useful.

Is there any restriction on the maximum size of a varchar declaration in the Pro*C environment? Are there any parameters that need to be increased?
Thanks in advance


Tom Kyte
March 28, 2005 - 9:13 am UTC

in 2005 you should really be using a CLOB.

why are you using a long?
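As an illustration only (not part of the original answer; the directory, table, and file names are made up, and it assumes 10g-or-later DBMS_LOB.LOADCLOBFROMFILE), loading a file straight into a CLOB column avoids the Pro*C varchar size issue entirely:

create directory my_dir as '/tmp';

create table t ( id number primary key, doc clob );

declare
  l_bfile  bfile   := bfilename( 'MY_DIR', 'bigfile.txt' );
  l_clob   clob;
  l_dest   integer := 1;
  l_src    integer := 1;
  l_ctx    integer := dbms_lob.default_lang_ctx;
  l_warn   integer;
begin
  insert into t ( id, doc ) values ( 1, empty_clob() )
  returning doc into l_clob;

  dbms_lob.fileopen( l_bfile, dbms_lob.file_readonly );
  dbms_lob.loadclobfromfile( l_clob, l_bfile, dbms_lob.lobmaxsize,
                             l_dest, l_src,
                             dbms_lob.default_csid, l_ctx, l_warn );
  dbms_lob.fileclose( l_bfile );
  commit;
end;
/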

Maximum manipulation of Varchars

raj, March 28, 2005 - 9:50 am UTC

Thanks for your response.
It is to be changed. As part of a migration process we need to take all the values in the long field out into a flat file, which I'm able to do. When I did the other way round (from the file to the table), I ran into that problem. Earlier I tried with 1.2 million bytes, which worked fine.
I'm able to update the long field, but now I'm stuck.
Please shed some light on this area.



Tom Kyte
March 28, 2005 - 10:51 am UTC

if you have it in a file, why aren't you using sqlldr?
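For example (a hypothetical control file sketch, not from the original answer -- the file, table, and column names are made up), SQL*Loader can pull each document from its own file into a CLOB column via LOBFILE:

LOAD DATA
INFILE 'rows.dat'
INTO TABLE t
FIELDS TERMINATED BY ','
( id,
  fname FILLER CHAR(200),
  doc   LOBFILE(fname) TERMINATED BY EOF
)

where each line of rows.dat would be of the form "id,filename".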

Matt Prince, June 01, 2005 - 11:26 am UTC

Tom, this all makes perfect sense, at least for OLTP, but what happens when we move into the warehouse arena?

In the warehouse we are building (to be more precise, in the Conformed Data Store that we are building to feed the datamarts), we will be conforming data from multiple systems, and I would have thought there is a trade-off between tight definitions and having to alter the schema whenever a new system comes in with a slightly larger textual code or description.

For example, say the systems we are currently using as sources have a varchar2(4) code; defining the source system id on the loaded record as a varchar2(10) does not seem unreasonable.

However, it has been suggested that we use a small number of sizes (say varchar2 10, 100, 250, 500, 4000) and build in plenty of redundancy -- would you still see disadvantages with this approach in a warehouse environment?


Tom Kyte
June 01, 2005 - 11:50 am UTC

you can make columns larger easily.

you cannot make columns smaller without unloading and reloading (online rebuild or whatever).


should you not be made aware of the fact that what used to be 4 is now 10? Might you have to rethink your future settings, might your cleansing logic have to change, might not dozens of things be affected?


I would use the right size, period.

I'd like to know -- because maybe the jump to 10 was a "big mistake, glad you caught it" sort of thing, rather than a "you took garbage in and gave us garbage out".
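(A quick sketch of the "grow is easy, shrink is hard" point -- not from the original answer, names made up:)

create table t2 ( code varchar2(4) );

insert into t2 values ( 'ABCD' );

alter table t2 modify code varchar2(10);  -- growing is a quick dictionary change

alter table t2 modify code varchar2(3);   -- shrinking below existing data fails with
                                          -- ORA-01441: cannot decrease column length
                                          -- because some value is too big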

no blanket rules..

Matt Prince, June 01, 2005 - 1:02 pm UTC

Fair comment -- however I think my example is a case where it pays to oversize the column. It's quite possible that a new system [perhaps this is unusual; we are likely to be incorporating new systems down the line] would have a primary key longer than 4 chars, but unlikely that it would be longer than 10. It's an internal field, not used in reports, so display formatting is not an issue. Finally, the ETL has to be extremely messed up to insert something invalid into the source id column (and cunning to not mess up everywhere else).

So when the new system comes along with a varchar2(10) -- we check that it fits, yes, no changes required.

It is of course small potatoes when compared to the original suggestion from Kelly's modeller.


Tom Kyte
June 01, 2005 - 5:16 pm UTC

we'll have to agree to disagree.

I don't think that just because it is a Data warehouse, integrity can be thrown out the window (in some respects, I almost consider it even more important here).


ETL can get messed up, anything can get messed up; multiple layers of defense stand a better chance of catching a mess-up.

A reader, May 16, 2008 - 5:27 pm UTC

When I define a column using CHAR length semantics (the "10 CHAR" syntax), what's stored in user_tab_columns.data_length is the number of bytes.

For example,

CREATE TABLE x (
col1 VARCHAR2(10),
col2 VARCHAR2(10 CHAR)
);

col data_type for a10
col column_name for 5

SELECT column_name, data_type, data_length
FROM user_tab_columns
WHERE table_name = 'X';

COLUMN_NAME                    DATA_TYPE  DATA_LENGTH
------------------------------ ---------- -----------
COL1                           VARCHAR2            10
COL2                           VARCHAR2            40   --> number of bytes


Now, where in the data dictionary does it store the fact that col2 is defined as 10 characters? Doesn't Oracle still need this info in order to constrain col2 to 10 characters?


INSERT INTO x VALUES ('0123456789', '01234567890');


ERROR at line 1:
ORA-12899: value too large for column "PROTECT"."X"."COL2" (actual: 11, maximum: 10)


Tom Kyte
May 19, 2008 - 3:16 pm UTC

CHAR_COL_DECL_LENGTH

Followup on CHAR_COL_DECL_LENGTH

A reader, May 19, 2008 - 5:47 pm UTC

CREATE TABLE x (
col1 VARCHAR2(10),
col2 VARCHAR2(10 CHAR)
);

col data_type for a10
col column_name for 5

SELECT column_name, data_type, data_length, CHAR_COL_DECL_LENGTH
FROM user_tab_columns
WHERE table_name = 'X';

COLUM DATA_TYPE  DATA_LENGTH CHAR_COL_DECL_LENGTH
----- ---------- ----------- --------------------
COL1  VARCHAR2            10                   10
COL2  VARCHAR2            40                   40

I was expecting that col2 would show 40 (data_length) / 10 (CHAR_COL_DECL_LENGTH) instead of 40/40.

Tom Kyte
May 20, 2008 - 11:21 am UTC

sorry - CHAR_LENGTH....

ops$tkyte%ORA10GUT> desc x
 Name                                 Null?    Type
 ------------------------------------ -------- -------------------------
 COL1                                          VARCHAR2(10)
 COL2                                          VARCHAR2(10 CHAR)

ops$tkyte%ORA10GUT> set linesize 1000
ops$tkyte%ORA10GUT> /

COLUMN_NAME                    DATA_TYPE  DATA_LENGTH CHAR_COL_DECL_LENGTH CHAR_LENGTH
------------------------------ ---------- ----------- -------------------- -----------
COL1                           VARCHAR2            10                   10          10
COL2                           VARCHAR2            40                   40          10

ops$tkyte%ORA10GUT>

VARCHAR(4000)

Ramakrishnan, May 29, 2008 - 12:03 am UTC

My question is the same as the first question. We know many varchar columns need only VARCHAR2(100), but we are creating them as VARCHAR2(4000). Can you explain what performance problems we will face, and whether we have any storage issues, if we blindly create VARCHAR2(4000)? We need this as an argument for why we should NOT create VARCHAR2(max number).

Best regards
Ramakrishnan.A
Tom Kyte
May 29, 2008 - 7:50 am UTC

no, you don't need a performance argument here to win a common sense argument.


A datatype is nothing more than a data integrity constraint, plain and simple, period


If you have a requirement to store 100 characters, you would use a varchar2(100) or (100 char). Just like if you had a requirement to store a date you would use a date (or a timestamp) and so on. You use the right datatype because you are professional data handling people, because you actually take some modicum of pride in that which you do.


If you use a varchar2(4000) do you know what will happen to your data? Simple. Someone will put 4000 bytes of data in there - 4000 being more than 100. You have garbage. You have a field which by all accounts should have 100 and now doesn't. And why? Because someone did not do their job.

Please - think about this, why in the world do we need to take it to a performance level here - why, for once, can't COMMON SENSE rule????


Since you can always alter a varchar2 column to be LARGER in the future (and the converse is NOT TRUE) - one questions the motivation (or lack thereof perhaps) of your 'data modelers'. "We are too lazy to do our work, we don't care about data integrity, just use a varchar2(4000)"


Please just re-read the original response - everything you need is there.

Instead of just raw performance, use shame - shame them into actually doing their job.



Well, there is a doc on part of the performance thing

joel garry, July 28, 2008 - 5:26 pm UTC

Tom Kyte
July 29, 2008 - 3:45 pm UTC

that is specific to variables declared in plsql, not datatypes used to store data.

varchar2(4000)

Johan de Ridder, August 04, 2008 - 12:40 pm UTC

My problem is that I have to reason with a DBA who made all string fields varchar2(4000) because he doesn't see the performance implications.

Can you give me some arguments, besides being unable to create indexes, to persuade this DBA to reconsider this very bad plan?
I am not sure the SHAME argument will hold up in this discussion.....

So I am looking for optimizer choices etc.

thanks in advance

Tom Kyte
August 04, 2008 - 2:10 pm UTC

... because he doesn't
see the performance implications
....


why is this about performance?

this is about doing it right - a datatype is a constraint, a data integrity constraint.

tell him "it makes you look like a high school kid creating your first database schema"

forget performance, this is about data - the data isn't varchar2(4000) and the only thing that'll happen if you make it all varchar2(4000) is someone will end up with a last name that is 4000 bytes long - and that probably isn't right.

Ask him 'to what sensible advantage did you do this thing, why is this good, educate me oh DBA, tell me why I should bring this idea to the world'

and then we can crush the response. If there isn't one - not much we can do for you other than to suggest an "arm wrestling contest" to see who wins.

I fail to see the cost benefit within a datawarehouse environment

robert anderson, September 28, 2009 - 9:17 am UTC

Whilst I would agree with much that is said, and am certainly not advocating the use of varchar2(4000) in the manner described, the costs of administration need to be addressed.

As was stated above, within a DW environment new sources may be added, and similarly sources may have data types changed.

These will require changes to the tables within the DW.

Whilst such a change may be simple to enact on the table, with the increased usage of external ETL loaders there may also be required changes to a substantial number of load processes; these cannot always be enacted with a simple alter table statement, nor can they necessarily be enacted by the same person.

Further, within a DW environment the system owner of a source system may make a change without informing the owner of the data warehouse. Such a change will cause a failure in the load process, requiring delivery delays, call-outs, rollbacks, updates etc.

All of these have (potentially high) associated costs in time and FTE; when explaining them to the business these costs must be weighed against the benefits.

The key benefit above seems to be that of constraints. However, a constraint is justified by its purpose. There are a number of good examples in the text above, but they don't seem to apply to warehousing. In most cases when loading to a warehouse, data will either be used to represent the source as-is or will be cleansed.

The former implies that no constraint is required, and the latter is applied via the ETL, which consists of rules that may or may not include data length.

It is notable that Oracle provides a number of ways to avoid the application of relational constraints specifically for data warehousing loads; perhaps constraints are sometimes too expensive to maintain?

As I said in the opener, I do not advocate the use of varchar2(4000), but I do feel that the above article failed to offer the business case that the comments requested.

As such I would appreciate the technical justification, other than 'shame', for the expense of enforcing constraints that are not seen as meaningful to the business.
Tom Kyte
September 30, 2009 - 7:15 am UTC

... These will require changes to the tables within the DW....

so what? That is - well - the normal course of development, maintenance, deployment.

so what - just say "so what"

it is rather normal, expected.


.. with the increased usage of external ETL loaders, there may also be required changes to a substantial number of load processes ..


to which i say "GOOD" - it forces you to actually INSPECT WHAT YOU ARE DOING, TO VERIFY, TO VALIDATE, TO DO THE RIGHT THING.

Your attitude towards this "do it easy, do it lazy, don't worry about the data" speaks to everything that is wrong in our industry today. Data quality is extremely low. You got lots of it, it is junk.


... further within a DW environment the system owner of a source system may make a change without informing the owner of the data warehouse ...

AGAIN - PERFECTION - this is precisely exactly why you need constraints in place. TO FIND OUT THIS STUFF BEFORE IT BURNS YOU.


... when
explaining them to the business these costs must be weighed against the
benefits.
...

that is your job, just repeat that over and over: that is part of your JOB, that is what you do. You work in IT which stands for INFORMATION TECHNOLOGY.

Information - needs good data
Technology - you might be a professional.


... perhaps constraints are sometimes too expensive to maintain? ...

all in the eyes of the beholder. I've shown how to put constraints in place many times - in order to get the optimizer to have a CHANCE (you need the constraints for performance) - but woe be to the people that put the constraints on when the data DOESN'T OBEY THEM.

The results of such a thing: wrong answers.

But you were getting wrong answers anyway all of the time since you didn't care a whit about data integrity and just want things to flow in. Your judgment of the quality of the warehouse seems to be "if it is big, it must be good"


read about constraints - there are tons of articles out there - this one for example:

http://asktom.oracle.com/Misc/stuck-in-rut.html

or better

https://www.oracle.com/technetwork/issue-archive/2009/09-may/o39asktom-096149.html


don't use constraints -> make the optimizer brain dead and unable to perform its job

don't use constraints -> have garbage for data (you make sure that is so in your list of situations - feeder systems CHANGING stuff, you changing stuff and letting the loader processes do whatever they USED TO DO even if that is no longer proper and correct - no validation - you list out the worst things to do)
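A small sketch of the optimizer point (illustrative only, not from the article; table name made up): with a NOT NULL constraint declared, the optimizer can prove some predicates are impossible and typically short-circuits the query with a NULL IS NOT NULL filter instead of scanning the table:

create table c_demo ( x number not null );

set autotrace traceonly explain
select * from c_demo where x is null;
-- with the constraint, the plan typically carries a filter(NULL IS NOT NULL)
-- step that stops the query before touching the table; without the
-- constraint, a full scan is the only option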


mfz, November 03, 2009 - 5:08 pm UTC

In my assignment, they have a table (where everything in the world goes in) and they have a column of varchar2(4000) as a defensive mechanism, in case the value ever reaches that limit. Most of the data in the column is less than 200 chars.

I was telling them to size it right, but their argument is: if Oracle does not pad the characters (as in char), they see no value in sizing it right.

I showed your slide (from metadata matters.ppt); still they need proof... how 25 concurrent connections would allocate 3GB.

I told them this is not a database issue, but the client will have an issue. Am I right?
Tom Kyte
November 09, 2009 - 3:04 pm UTC

I cannot compete with ignorance some days.

They obviously do not value data at all, they are of the camp "the application is the end all be all". But their application will go away someday and we'll be left with their mess of data.


Data integrity - they have none.

They have strings in their numbers and numbers in their strings and nothing is the proper length.

Don't know what else to tell you - I gave 'straw man could happen numbers'; it is pretty straightforward, "just multiply". Your mileage will vary.

But these people do not use databases, it sounds more like they are forced to say they use a database and they just really abuse it.

How are varchar stored internally in a block or page

Hari, November 17, 2009 - 6:55 am UTC

Hi Tom,

Good Day

I have an issue now, related to the same discussion, but not able to convince them. At least for my knowledge improvement, please provide me your expert comments on the following:

We have some tables with a sequence-generated number as the primary key. Almost 1,000 records get inserted per day. So, I decided to make the size of this primary key Number(8) (1000 * 30 days * 12 months * 50 years). However, the architects and managers are asking me to keep Number(32) since the limit is 32. I am very skeptical about this. I even cited the URL of this page; still, I guess they are not convinced.

Now, please tell me if Number(8) is correct or Number(32) is correct?

Thanks

Hari
Tom Kyte
November 23, 2009 - 2:04 pm UTC

the limit is 38, not 32.

And a number(8) will consume as much storage as a number(32) or a number(38) to store the same information (that is, if you limit yourself to 8 digit numbers - they all consume the same storage).


Now, the choice of precision - the number of digits - is driven by one thing and one thing alone...

What does your data model have to say about this field -- what should the datatype and length be based on your data model? Is it valid to have 32 digits in there? Will your application break or work incorrectly if you get a nine digit number? If so, using number(32) would not be smart. You would use a number(8) for the same reason you use a varchar2(30) for some field: because the length is a maximum of 30.

So, what is the domain of your primary key- is 123456789 a valid primary key? If not, number(8) is correct.


And remember, you can always alter a number(8) UP, but you have a hard time making a number(32) go down.

ops$tkyte%ORA11GR2> create table t ( x number(8), y number(32) );

Table created.

ops$tkyte%ORA11GR2> insert into t values ( 1234, 5678 );

1 row created.

ops$tkyte%ORA11GR2> alter table t modify x number(32);

Table altered.

ops$tkyte%ORA11GR2> alter table t modify y number(8);
alter table t modify y number(8)
                     *
ERROR at line 1:
ORA-01440: column to be modified must be empty to decrease precision or scale


meaning, if you oversize it is very hard to fix, if you undersize - easier to correct.

varchar2 storage architecture

Prajjwal Mallik, December 13, 2009 - 2:32 pm UTC

Hi Tom,
Every time I go through this site, I get really confused and on clearing my doubts, eventually I end up learning a lot!! All Thanks to you!!

I got very confused with the way how data is stored in varchar2. I could manage no good material on this from anywhere.
My understanding about varchar2 is this. Please correct me if I am wrong.
When on disk, I mean when in actual tables, data of 3 characters (say) occupies 3 bytes (for any single-byte character set) even if the field in the table is varchar2(10), say. That's why it is varchar2 and not char.
Now, when we consider a varray of varchar2(10) and try to assign data of 3 characters to one of its elements, in memory 10 bytes still has to remain allocated for that array element. There varchar2 plays the same role as char!


On disk, is a varchar2(x) element stored in contiguous disk space or via pointers/linked lists?

-> In case of contiguous storage, an update to a larger value might need to move the row to some other block. This could possibly
1) slow down update performance
2) hamper the clustering factor if there were indexes on this varchar field.

-> In case of a linked list, fetching the data would need a lot of I/O.
1) selects will be very slow

So which one does Oracle choose?


Also, please throw some light on how clob and blob data are stored on disk.

Regards,
Prajjwal
Tom Kyte
December 14, 2009 - 8:30 am UTC

... I could manage
no good material on this from anywhere.
....

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1542606219593
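Where the question asks what an update to a larger value does, a rough sketch (not Tom's demo; it assumes the CHAINED_ROWS table from utlchain.sql exists and the names are made up) shows that growing varchar2 values in place can migrate rows once the block runs out of free space:

create table mig_demo ( id number, txt varchar2(2000) ) pctfree 0;

insert into mig_demo
select rownum, rpad('x',10,'x') from dual connect by level <= 1000;
commit;

-- grow every row well past the space left in its block
update mig_demo set txt = rpad('x',2000,'x');
commit;

analyze table mig_demo list chained rows into chained_rows;

select count(*) migrated_or_chained
from chained_rows
where table_name = 'MIG_DEMO';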



Can see the performance impact in simple select but why?

A reader, September 22, 2011 - 6:31 am UTC

Thanks Tom for creating such a great repository of knowledge.
We too have modellers, but we are lucky that they do understand the concepts of data integrity and are against creating varchar2(4000) columns. But after going through this page I decided to test the performance impact as well, and I did find an impact.
Here is what I did
SQL> create table test (col1 varchar2(1));

Table created.

SQL> insert into test values ('a');

1 row created.

SQL> create table test2 (col1 varchar2(4000));

Table created.

SQL> insert into test2 values ('a');

1 row created.

SQL> commit;


I bounced the instance and then performed selects on the above tables, and found Bytes in the explain plan to be significantly higher when I ran the select on the varchar2(4000) table (test2).

shutdown immediate
startup
SQL> set autot on;
SQL> select * from test;

C
-
a

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     2 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST |     1 |     2 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

shutdown immediate
startup
SQL> set autot on
SQL> select * from test2;

COL1
--------------------------------------------------------------------------------
a

Execution Plan
----------------------------------------------------------
Plan hash value: 300966803

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |  2002 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TEST2 |     1 |  2002 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------


Now for sure we can see the performance impact even in this simple select, but what I am not able to understand is why we are seeing this value.

DB is 11.1.0.7.0.
Tom Kyte
September 22, 2011 - 5:46 pm UTC

what performance impact do you see? I see different bytes - but that is all - and that is based on the different row lengths?? That could affect plans - however, you don't demonstrate that - so this isn't showing any sort of performance difference...

you don't need to shutdown and startup for stuff like this - or anything really - that is more artificial than anything I can imagine - you HARDLY EVER do stuff right after a startup, most stuff takes place way after a startup (since databases run for really long times)...

Bytes means amount of information accessed

A reader, September 22, 2011 - 11:55 pm UTC

Back to my previous question on why Bytes is higher when I access the table with varchar2(4000).

Bytes, per my understanding, tells the amount of data accessed.
From the Oracle docs' plan table description:
"BYTES NUMERIC Estimate by the cost-based approach of the number of bytes accessed by the operation"

So when I see higher Bytes for the 4000-byte column, I guess it means it accessed more.
Also, I don't see any difference in row length.

dev_85> select table_name,num_rows,avg_row_len from user_tables where table_name in ('TEST','TEST2');

TABLE_NAME                       NUM_ROWS AVG_ROW_LEN
------------------------------ ---------- -----------
TEST                                    1           2
TEST2                                   1           2

dev_85>


Please correct me - what's the flaw?
Tom Kyte
September 23, 2011 - 9:35 am UTC

the bytes in that report is an ESTIMATE.

It is based on a guess after looking at your schema.

It is not based on reality, it is a guess - made before accessing any of the data.


You'd get the same guess using "autotrace traceonly explain" - just explaining the query.


metadata matters

Rajeshwaran, Jeyabal, May 26, 2012 - 11:21 am UTC

Tom,

I was reading your MetadataMatters.ppt and have a question regarding "Length Matters"

<quote>
Varchar2(4000) for everything (just in ‘case’)
Assume 10 columns, average, minimum, maximum width is 4000
40,000 bytes per row on a fetch
</quote>

1) How can I see that it has allocated a 4000 width for average, minimum, maximum?
2) How can I see that it will be 40,000 bytes per row fetched? Is that something TKPROF can show?

Can we prove these points, or are they just math?
Tom Kyte
May 27, 2012 - 11:23 am UTC

1) think about it.

If you described a sql statement and it told you "I have 10 columns and every column is 4000 bytes"...

You can measure it using whatever OS/language tools you have at your disposal. And, in the presentation when I deliver it, I mention that not all drivers will do this, some will allocate a bit at a time. It is something that can happen.

2) tkprof won't show it - tkprof only shows what happened. Unless you hit a row with all 10 columns at 4000 characters, it won't have actually happened.

just ask yourself how much data you would allocate to do an array fetch of 100 rows with 10 columns that could each be 4000 bytes?

Regarding ORA-01450

Mette Juel, September 13, 2012 - 1:02 pm UTC

I would like to know if this metalink note is still valid regarding the max lengths and the block length?

https://support.oracle.com/epmos/faces/ui/km/DocumentDisplay.jspx?_afrLoop=1685733510470401&id=136158.1&_afrWindowMode=0&_adf.ctrl-state=ptxaw30cc_326

There is some debate on the forum (and I have not been able to test it) on whether a block needs to hold two index keys or just one.

If the calculation has changed, would you tell us from which version it changed?

We just need the calculations in order to prepare for upgrading from e.g. 9 to 10 (I know it's outdated) or from 9 to 11, and in the same process change from ISO8859 to UTF-8 or another multibyte charset (thus multiplying by e.g. 4 for AL32UTF8 when calculating index lengths).

PS: And we cannot change the bad design with e.g. 500-byte varchar2's in maybe 10 cols (these are customers' databases; we have no control over the defs and contents, we "just" do the upgrade).

Web based UI and varchar2(4000)

Phil E Cannata, March 12, 2021 - 2:22 am UTC

What if the query tool is a React application and it is getting its data from a REST query? In that case, gigabytes of memory won't be allocated on the db server to satisfy the query, would it?
Chris Saxon
March 15, 2021 - 3:39 pm UTC

Gigabytes of memory translates to a large data set, so probably not but it really depends on:

1. How many rows you're fetching
2. The number of columns and their data type and contents
3. How exactly the REST API is implemented
