How are varchar stored internally in a block or page.
Svend Jensen, August 10, 2001 - 5:15 am UTC
Very interesting point on memory usage in clients/OAS.
The so called "modeler" is close to the point. Varchar2 values are stored with minimum overhead on disk. A column GREET varchar2(4000) = 'HELLO WORLD' would use ~11 bytes of disk storage.
Using variable-width multibyte character sets such as UTF8, each character will expand anywhere from 1 to 3 bytes. Take that into account when you design your column sizes for global deployment and character sets other than single-byte ones (US7ASCII, WE8ISO8859...)
I miss information on overhead and/or compression with repeating characters like 200 spaces or dots.
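Svend's multibyte point is easy to verify outside the database. A quick sketch in Python (used here only as a neutral way to count encoded bytes, not as anything Oracle-specific):

```python
# Byte cost of the same logical strings under UTF-8.
# Shows why a column sized for N characters can need up to ~3x N bytes
# in a multibyte character set such as UTF8.
samples = ["HELLO WORLD", "café", "日本語"]

for s in samples:
    print(f"{s!r}: {len(s)} characters, {len(s.encode('utf-8'))} UTF-8 bytes")
```

'HELLO WORLD' stays at 11 bytes (matching the ~11 bytes of disk storage mentioned above), while the accented and CJK samples expand to 2 and 3 bytes per character respectively.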
Storage of Varchar2 fields
Ilango Jv, August 10, 2001 - 8:19 am UTC
Its a good explanation and very useful. Thank you very much Mr. Tom.
How are varchar stored internally in a block or page
Rob, August 10, 2001 - 3:20 pm UTC
Unfortunately I have seen this quite often.
The database modelers are using a row-based architecture.
They create tables with a type identifier and a bunch of generic columns of varchar2(4000). I have tried to get them to move away from this approach. They believe the flexibility outweighs the issues you bring up.
I'm sure you can imagine what a pain it is developing for these systems.
AB, August 13, 2001 - 4:27 am UTC
I don't think these so called "modellers" are creating flexible environments at all - they are just being extremely lazy and cannot be bothered to understand the data that will be eventually be put into the database. Tom tells it as it is.
Storage of VARCHAR2
Randy, August 13, 2001 - 10:49 am UTC
Personally I have seen data modellers of this kind. They seem to have no idea about resource conservation and, very surprisingly, most of them have some kind of Visual Basic programming background. Can you ask your data modeller how he would estimate the size of the database/table? I guarantee that would be very challenging.
varchar2 constraint
Mikito Harakiri, September 19, 2001 - 5:21 pm UTC
I'm with the data modeller. How come Oracle enters the 21st century with that silly limit on string length while any decent language has dynamic strings today? If it's a constraint, then it must be _optional_ like any other constraint. In short, one string datatype -- not a dozen as today -- plus an optional length constraint would be an ideal implementation.
September 19, 2001 - 5:53 pm UTC
If you don't want a limit, use a clob. So, when you don't want a limit -- we have a datatype for you. Heck, in 9i you can use like, =, <, >, instr, substr, whatever on a clob. Clobs are for you.
There are practical concerns with regards to database development and design. Imagine if people started putting gobs and gobs of data inline in the table (oh wait -- that's called a LONG and the issues with that were so bad -- we've totally replaced that datatype with a CLOB).
That is sort of what this thread was about -- there are practical considerations to be thought about with a physical data model. Can you answer some of the questions I posed above??
SQL stands for something -- structured is part of the name.
Everything has constraints. Even those languages that seemingly don't have constraints. (C seems to be a decent language, PLSQL too -- all strings are constrained in those languages as well).
Dates -- gotta be a date, there is no Feb 30th, there is nothing before 4712 BC, there is no year 10,000.
Numbers -- gotta be a number, there is no number 'z', gotta be in a certain range of precision.
Strings -- gotta be of a certain size. Go with 4000 if you want -- but you might want to consider the practicality of doing so, answer the questions in your own mind first (so when the developers ask them, you are ready with the solutions to the real problems they will encounter).
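Tom's "everything has constraints" list can be mirrored in any language. A small Python sketch (the checked() helper is hypothetical, standing in for what varchar2(n) does declaratively):

```python
from datetime import date

# Dates: there is no Feb 30th -- the datatype itself refuses it.
try:
    date(2024, 2, 30)
except ValueError as e:
    print("date rejected:", e)

# Numbers: there is no number 'z'.
try:
    int("z")
except ValueError as e:
    print("number rejected:", e)

# Strings: a length-checked string, the moral equivalent of varchar2(n).
def checked(value: str, max_len: int) -> str:
    if len(value) > max_len:
        raise ValueError(f"value too large (actual: {len(value)}, maximum: {max_len})")
    return value

print(checked("HELLO", 10))   # fits
try:
    checked("X" * 11, 10)     # too long -- rejected, much like ORA-12899
except ValueError as e:
    print("string rejected:", e)
```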
VARCHAR2 columns per table
sachin, March 26, 2003 - 3:38 pm UTC
Tom,
We have 8.1.7.4.
I am building a table for a survey...
Some of the columns involved need a lot of text...about 2000
characters...
So, a couple of tables are coming up with about 11 columns
each of length varchar2(2000)...
The analyst is insisting that we will need this much length
for all these columns...maybe more...
What is the best way to declare the types of these columns?
Will CLOBs be better than VARCHAR2?
Also, We are using three tablespaces...small, medium and large (LMT)
I was going to put this in large:
DATAFILE SIZE 1000M
16384K
MAXSIZE 1000M
EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1024K
What are your suggestions about tablespaces for this kind of table.....maybe a separate tablespace??
Thanks...
March 26, 2003 - 6:36 pm UTC
If they are 2000 characters -- varchar2(2000) is appropriate.
A table with 11 varchar2(2000) columns is not unusual; it need not be treated differently from anything else.
Is UTF8 not required?
Tony, June 10, 2003 - 8:32 am UTC
Tom,
One of my friends says that if we use varchar2( 20 char )
syntax, there is no need to have UTF8 for multi-byte character sets. Is it true?
June 10, 2003 - 9:21 am UTC
no, either you misunderstood or your friend is confused.
You want to use varchar2(20 CHAR) when you think you might want to use UTF8 or any multi-byte character set and don't want to have to change your schema.
a column of varchar2(20) reserves 20 bytes.
in US7ASCII for example, that just happens to be 20 characters.
in UTF8, that might be 20 characters, it might be 10, it might be 6 -- since each character might take 1, 2, or 3 bytes (maybe even more, not sure, not relevant -- only relevant that a CHARACTER may take more than ONE BYTE)
Now, if you had defined the column in 9i (only in 9i and up does this feature exist) as varchar2(20 CHAR) then
in US7ASCII, that will permit up to 20 characters in that character set -- that just happens to be 20 bytes.
in UTF8, that will permit up to 20 characters in that character set -- meaning that might be 20, 40, 60, .... bytes
NOW, bear in mind, this does NOT increase the maximum varchar2/char byte width, which is still 4000/2000! If you use
varchar2(4000 CHAR)
char(2000 CHAR)
that will still be limited to 4000/2000 BYTES and may or may NOT be able to hold 4000/2000 UTF8 characters!
This is something to consider using in your applications, as it will save pain in the future if you are installed on a multi-byte character set. More than one application has had to go back and fix their schema since their "first_name" field at varchar2(40) couldn't hold 40 characters -- they had to double or triple that.
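The BYTE vs CHAR semantics above can be sketched outside the database. Assuming UTF-8 encoding, these hypothetical check functions (they only mimic Oracle's length checks, they are not its implementation) show why the same value passes one declaration and fails the other:

```python
def fits_byte_semantics(value: str, n: int) -> bool:
    # varchar2(n) / varchar2(n BYTE): the limit applies to encoded bytes
    return len(value.encode("utf-8")) <= n

def fits_char_semantics(value: str, n: int, byte_ceiling: int = 4000) -> bool:
    # varchar2(n CHAR): the limit applies to characters, but the hard
    # byte ceiling (4000 for varchar2) still applies regardless
    return len(value) <= n and len(value.encode("utf-8")) <= byte_ceiling

name = "é" * 15  # 15 characters, 30 bytes in UTF-8

print(fits_byte_semantics(name, 20))  # False: 30 bytes > 20
print(fits_char_semantics(name, 20))  # True: 15 characters <= 20
```

Note how a varchar2(4000 CHAR)-style declaration still fails once the encoded value exceeds the 4000-byte ceiling, exactly as the follow-up warns.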
What about size of number data type columns?
A reader, June 11, 2003 - 5:10 pm UTC
Hi Tom,
Could you please also give your views (apart from using it as a constraint) regarding columns created with a number datatype of specific precision rather than just number?
Thanks
June 12, 2003 - 8:28 am UTC
if the number is to be a number between 1 and 999 with no decimals, etc etc etc -- using number(3) would be appropriate.
Just like using varchar2(10) for a 10 byte field would be.
If there is a constraint to be enforced, DO IT.
Max of varchar2 is 2000 or 4000 bytes ?
Tony, June 12, 2003 - 2:50 am UTC
You have mentioned in the above follow-up for my question that varchar2 max byte is 2000. Is it not 4000 bytes?
June 12, 2003 - 8:48 am UTC
touched that up -- was thinking "char", typed "varchar2", so I added both
Max of varchar2 is 2000 or 4000 bytes ? Please Ignore
Tony, June 12, 2003 - 2:55 am UTC
Sorry Tom, I've misunderstood. You're right.
June 12, 2003 - 8:50 am UTC
no, you were, it is fixed.
Excellent as always.
Orlando Reyes, December 07, 2004 - 5:58 pm UTC
Tom,
I am facing the same situation right now but related to NUMBER vs NUMBER(p,s).
Our Data Architect group wants to define all numeric values as NUMBER because of flexibility, they claim, and their manager agrees with it. We, the DBA team, have already pointed out the data constraint issue, along with formatting and possible rounding/truncation problems.
Can you please give me some more ammunition to use when we have the final meeting to discuss the issue?
Is there anything strong enough, related to performance, space or the like, that we could also use? Like the problem of creating an index on a VARCHAR2(4000). My understanding is that Oracle does not really get impacted either way, since it would store the number the same way and it would take the same space - am I wrong? Performance - I am not sure if the number manipulation would be impacted.
Any ideas will be very much appreciated.
Thanks,
December 07, 2004 - 8:54 pm UTC
how can they even begin to anoint themselves with the term "architect"
anyway -- I suppose they think all things should be varchar2(4000) too -- it is after all "flexible" no?
you are losing the edit - if the number is to be a number(7,2) -- you lose the edit.
space -- nothing there -- until bad data gets in, but then so what, you have bad data (note: not IF bad .... but when BAD....)
it is all about data integrity here -- remember a database does not have "99% data integrity" - it is a binary thing, either you have it or not. if they just use number for everything, they probably don't have it.
FFO, December 27, 2004 - 7:04 am UTC
So from a storage perspective, a VARCHAR is good. But on the memory side it may consume more memory (disc buffers, sort areas...). Does VARCHAR have an influence on index creation/query performance? I mean, does reducing size -- say from VARCHAR(4000) to VARCHAR(1500) -- improve performance?
December 27, 2004 - 10:15 am UTC
how so? why do you say that about the memory?
confusion about follow up
amit poddar, December 27, 2004 - 11:18 am UTC
Hi,
From your first follow up
"Ok, so the code prepares a query that select 10 columns that are varchar2.
The developer, for performance would like to array fetch (very important). They
would like to array fetch say 100 rows (very very typical). So, we have 4000
times 10 times 100 = almost 4meg of RAM the developer must allocate!!
Now consider if that were 10 varchar2(80) fields (it's probably much smaller
than that) -- that's about oh 78k. Ask the "data modeler" how much RAM he is
willing to kick in for this system...."
I thought varchars were not blank padded. So let's say we have one character in a varchar2(4000) field, and the client fetches 100 rows. Then why would it fetch 4000*100 bytes instead of just 4000*1, since varchar2 is not blank padded?
Fetching 4000*100 bytes makes sense if we had a char(4000) field.
Could you please clarify
December 27, 2004 - 12:19 pm UTC
that is not buffer cache memory, that is client side memory. it will not affect disk buffers, sort areas at all.
The client, buffering this data, would have to allocate a "square array" to fetch into.
They are not blank padded; it is just that when you ask for 100 rows into an array and each row COULD have a varchar2(4000), you'd better have an array that is 100*4000 in size.
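The "square array" arithmetic from the quoted follow-up, as a sketch:

```python
# A client array-fetching N rows of M varchar2 columns must size its
# buffer for the DECLARED width -- any fetched row COULD be that wide.
def fetch_buffer_bytes(columns: int, declared_width: int, array_size: int) -> int:
    return columns * declared_width * array_size

oversized = fetch_buffer_bytes(10, 4000, 100)  # the varchar2(4000) case
sensible = fetch_buffer_bytes(10, 80, 100)     # the varchar2(80) case

print(f"varchar2(4000): {oversized:,} bytes (~{oversized / 2**20:.1f} MB)")
print(f"varchar2(80):   {sensible:,} bytes (~{sensible / 2**10:.0f} KB)")
```

That is 4,000,000 bytes (the "almost 4meg") versus 80,000 bytes (the "about 78k") per concurrent cursor -- the whole point of the thread in two lines of arithmetic.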
Amazing data modeler!
Mark, December 31, 2004 - 3:45 pm UTC
Why not make every column a blob? They're more flexible, after all! (sarcasm)
maximum manipulation of varchar
Raj, March 28, 2005 - 8:52 am UTC
The explanations are excellent.
I need a clarification wrt varchars.
I have a file and I want to store its contents in a variable, so that this variable can be used to update the values present in a column
of the table. The column is a LONG field.
varchar abc[20000000]; /* Pro*C declaration, ~20MB */
Is the above declaration correct, or is there a better way of manipulating the file via variables in a Pro*C environment?
So far it is working fine with the above declaration.
I'm in a developing environment.
But now, all of a sudden, only
26702 bytes are getting stored. The remaining part is just getting truncated.
Your view on this will be highly useful.
Is there any restriction on the maximum size of a varchar declaration in a Pro*C environment? Are there any parameter variables that need to be increased?
Thanx in advance
March 28, 2005 - 9:13 am UTC
in 2005 you should really be using a CLOB.
why are you using a long?
Maximum manipulation of Varchars
raj, March 28, 2005 - 9:50 am UTC
Thanx for your response.
It is to be changed. For now, as part of a migration process, we need to take all the values in the long field out into a flat file, which I'm able to do. When I did the other way round (from the file to the table),
I had that problem. But earlier I tried with 1.2 million bytes, which worked fine.
I'm able to update the long field. But now I'm stuck.
Please shed some light on this area.
March 28, 2005 - 10:51 am UTC
if you have it in a file, why aren't you using sqlldr?
Matt Prince, June 01, 2005 - 11:26 am UTC
Tom, this all makes perfect sense, at least for OLTP, but what happens when we move into the warehouse arena?
In the warehouse we are building (to be more precise in the Conformed Data Store that we are building to feed the datamarts), we will be conforming data from multiple systems and I would have thought there is a trade off between tight definitions and having to alter the schema whenever a new system comes in with a slightly larger textual code or description.
For example, say the systems we are currently using as sources have a varchar2(4) code; defining the source system id on the loaded record as varchar2(10) does not seem unreasonable.
However, it has been suggested that we use a small number of sizes (say vc10,100,250,500,4000) and build in plenty of redundancy - would you still see disadvantages with this approach in a warehouse environment?
June 01, 2005 - 11:50 am UTC
you can make columns larger easily.
you cannot make columns smaller without unloading and reloading (online rebuild or whatever).
should you not be made aware of the fact that what used to be 4 is now 10? Might you have to rethink your future settings, might your cleansing logic have to change, might not dozens of things be affected?
I would use the right size, period.
I'd like to know -- because maybe the jump to 10 was "a big mistake, glad you caught it" sort of thing. Rather than a "you took garbage in and gave us garbage out"
no blanket rules..
Matt Prince, June 01, 2005 - 1:02 pm UTC
Fair comment - however I think my example is a case where it pays to oversize the column. It's quite possible that a new system [perhaps this is unusual, but we are likely to be incorporating new systems down the line] would have a primary key longer than 4 chars, but unlikely that it would be longer than 10. It's an internal field, not used in reports, so display formatting is not an issue. Finally, the ETL would have to be extremely messed up to insert something invalid into the source id column (and cunning enough to not mess up everywhere else).
So when the new system comes along with a vc10 - we check that it fits, yes, no changes required.
It is of course small potatoes when compared to the original suggestion from Kelly's modeller.
June 01, 2005 - 5:16 pm UTC
we'll have to agree to disagree.
I don't think that just because it is a Data warehouse, integrity can be thrown out the window (in some respects, I almost consider it even more important here).
ETL can get messed up, anything can get messed up; multiple layers of defense stand a better chance of catching a mess up.
A reader, May 16, 2008 - 5:27 pm UTC
When I define a column using the "CHAR" syntax, what's stored in user_tab_columns.data_length is the number of bytes.
For example,
CREATE TABLE x (
col1 VARCHAR2(10),
col2 VARCHAR2(10 CHAR)
);
col data_type for a10
col column_name for a5
SELECT column_name, data_type, data_length
FROM user_tab_columns
WHERE table_name = 'X';
COLUMN_NAME DATA_TYPE DATA_LENGTH
------------------------------ ---------- -----------
COL1 VARCHAR2 10
COL2 VARCHAR2 40 --> number of bytes
Now where in the data dictionary does it store the fact that col2 is defined as 10 characters? Doesn't Oracle still need this info in order to constrain col2 to 10 characters?
INSERT INTO x VALUES ('0123456789', '01234567890');
ERROR at line 1:
ORA-12899: value too large for column "PROTECT"."X"."COL2" (actual: 11, maximum: 10)
May 19, 2008 - 3:16 pm UTC
CHAR_COL_DECL_LENGTH
Followup on CHAR_COL_DECL_LENGTH
A reader, May 19, 2008 - 5:47 pm UTC
CREATE TABLE x (
col1 VARCHAR2(10),
col2 VARCHAR2(10 CHAR)
);
col data_type for a10
col column_name for a5
SELECT column_name, data_type, data_length, CHAR_COL_DECL_LENGTH
FROM user_tab_columns
WHERE table_name = 'X';
COLUM DATA_TYPE DATA_LENGTH CHAR_COL_DECL_LENGTH
----- ---------- ----------- --------------------
COL1 VARCHAR2 10 10
COL2 VARCHAR2 40 40
I was expecting that col2 would show 40 (data_length) / 10 (CHAR_COL_DECL_LENGTH) instead of 40/40.
May 20, 2008 - 11:21 am UTC
sorry - CHAR_LENGTH....
ops$tkyte%ORA10GUT> desc x
Name Null? Type
------------------------------------ -------- -------------------------
COL1 VARCHAR2(10)
COL2 VARCHAR2(10 CHAR)
ops$tkyte%ORA10GUT> set linesize 1000
ops$tkyte%ORA10GUT> /
COLUMN_NAME DATA_TYPE DATA_LENGTH CHAR_COL_DECL_LENGTH CHAR_LENGTH
------------------------------ ---------- ----------- -------------------- -----------
COL1 VARCHAR2 10 10 10
COL2 VARCHAR2 40 40 10
ops$tkyte%ORA10GUT>
VARCHAR(4000)
Ramakrishnan, May 29, 2008 - 12:03 am UTC
My question is the same as the 1st question. We know many varchar columns need only VARCHAR(100), but we are creating them with VARCHAR(4000). Can you explain what performance problems we will face, and whether we have any storage issue, if we blindly create VARCHAR(4000)? We need this as an argument for why we should NOT create VARCHAR(max number).
Best regards
Ramakrishnan.A
May 29, 2008 - 7:50 am UTC
no, you don't need a performance argument here to win a common sense argument.
A datatype is nothing more than a data integrity constraint, plain and simple, period
If you have a requirement to store 100 characters, you would use a varchar2(100) or (100 char). Just like if you had a requirement to store a date you would use a date (or a timestamp) and so on. You use the right datatype because you are professional data handling people, because you actually take some modicum of pride in that which you do.
If you use a varchar2(4000) do you know what will happen to your data? Simple. Someone will put 4000 bytes of data in there - 4000 being more than 100. You have garbage. You have a field which by all accounts should have 100 and now doesn't. And why? Because someone did not do their job.
Please - think about this, why in the world do we need to take it to a performance level here - why, for once, can't COMMON SENSE rule????
Since you can always alter a varchar2 column to be LARGER in the future (and the converse is NOT TRUE) - one questions the motivation (or lack thereof perhaps) of your 'data modelers'. "We are too lazy to do our work, we don't care about data integrity, just use a varchar2(4000)"
Please just re-read the original response - everything you need is there.
Instead of just raw performance, use shame - shame them into actually doing their job.
Well, there is a doc on part of the performance thing
joel garry, July 28, 2008 - 5:26 pm UTC
July 29, 2008 - 3:45 pm UTC
that is specific to variables declared in plsql, not datatypes used to store data.
varchar2(4000)
Johan de Ridder, August 04, 2008 - 12:40 pm UTC
My problem is that I have to reason with a DBA who
made all string fields varchar(4000) because he doesn't
see the performance implications.
Can you give me some arguments, besides
being unable to create indexes, to persuade this DBA to
reconsider this very bad plan?
I am not sure the SHAME argument will hold up in
this discussion.....
So i am looking for optimizer choices etc
thanks in advance
August 04, 2008 - 2:10 pm UTC
... because he doesn't
see the performance implications
....
why is this about performance?
this is about doing it right - a datatype is a constraint, a data integrity constraint.
tell him "it makes you look like a high school kid creating your first database schema"
forget performance, this is about data - the data isn't varchar2(4000) and the only thing that'll happen if you make it all varchar2(4000) is someone will end up with a last name that is 4000 bytes long - and that probably isn't right.
Ask him 'to what sensible advantage did you do this thing, why is this good, educate me oh DBA, tell me why I should bring this idea to the world'
and then we can crush the response - if there isn't one, there is not much we can do for you other than to suggest an "arm wrestling contest" to see who wins.
I fail to see the cost benefit within a datawarehouse environment
robert anderson, September 28, 2009 - 9:17 am UTC
Whilst I would agree with much that is said, and am certainly not advocating the use of varchar2(4000) in the manner described, the costs of administration need to be addressed.
As was stated above, within a DW environment new sources may be added, and similarly sources may have data types changed.
These will require changes to the tables within the DW.
Whilst such a change may be simple to enact on the table, with the increased usage of external ETL loaders there may also be required changes to a substantial number of load processes; these cannot always be enacted with a simple alter table statement, nor can they necessarily be enacted by the same person.
Further, within a DW environment, the system owner of a source system may make a change without informing the owner of the data warehouse. Such a change will cause a failure in the load process, requiring delivery delays, call outs, rollbacks, updates etc.
All of these have (potentially high) associated costs in time and FTE; when explaining them to the business, these costs must be weighed against the benefits.
The key benefit above seems to be that of constraints. However, a constraint is justified by its purpose. There are a number of good examples in the text above, but they don't seem to apply to warehousing. In most cases, when loading to a warehouse, data will either be used to represent the source as-is or will be cleansed.
The former implies that no constraint is required, and the latter is applied via the ETL, which consists of rules that may or may not include data length.
It is notable that Oracle provides a number of ways to avoid the application of relational constraints specifically for data warehousing loads; perhaps constraints are sometimes too expensive to maintain?
As I said in the opener, I do not advocate the use of varchar2(4000) but I do feel that the above article failed to offer the business case that the comments requested.
As such I would appreciate the technical justification, other than 'shame', for the expense of enforcing constraints that are not seen as meaningful to the business.
September 30, 2009 - 7:15 am UTC
... These will require changes to the tables within the DW....
so what? That is - well - the normal course of development, maintenance, deployment.
so what - just say "so what"
it is rather normal, expected.
... with the increased
usage of external ETL loaders, there may also be required changes to a
substantial number of load processes, ..
to which i say "GOOD" - it forces you to actually INSPECT WHAT YOU ARE DOING, TO VERIFY, TO VALIDATE, TO DO THE RIGHT THING.
Your attitude towards this "do it easy, do it lazy, don't worry about the data" speaks to everything that is wrong in our industry today. Data quality is extremely low. You got lots of it, it is junk.
... Further within a DW environment the system owner of a source system may make a
change without informing the owner of the data-warehouse ...
AGAIN - PERFECTION - this is precisely exactly why you need constraints in place. TO FIND OUT THIS STUFF BEFORE IT BURNS YOU.
... when
explaining them to the business these costs must be weighed against the
benefits.
...
that is your job, just repeat that over and over: that is part of your JOB, that is what you do. You work in IT which stands for INFORMATION TECHNOLOGY.
Information - needs good data
Technology - you might be a professional.
... perhaps
constraints are sometimes too expensive to maintain?...
all in the eyes of the beholder. I've shown how to put constraints in place many times - in order to get the optimizer to have a CHANCE (you need the constraints for performance) - but woe be to the people that put the constraints on when the data DOESN'T OBEY THEM.
The results of such a thing: wrong answers.
But you were getting wrong answers anyway all of the time since you didn't care a whit about data integrity and just want things to flow in. Your judgment of the quality of the warehouse seems to be "if it is big, it must be good"
read about constraints - there are tons of articles out there - this one for example:
http://asktom.oracle.com/Misc/stuck-in-rut.html or better
https://www.oracle.com/technetwork/issue-archive/2009/09-may/o39asktom-096149.html
don't use constraints -> make the optimizer brain dead and unable to perform its job
don't use constraints -> have garbage for data (you make sure that is so in your list of situations - feeder systems CHANGING stuff, you changing stuff and letting the loader processes do whatever they USED TO DO even if that is no longer proper and correct - no validation - you list out the worst things to do)
mfz, November 03, 2009 - 5:08 pm UTC
In my assignment, they have a table (where everything in the world goes in), and they have a column of varchar2(4000) as a defensive mechanism, in case a value ever reaches that limit. Most of the data in the column is less than 200 chars.
I was telling them to size it right; but their argument is, if Oracle does not pad the characters (as in char), they see no value in sizing it right.
I showed them your slide (from metadata matters.ppt); still they need proof ... of how 25 concurrent connections would allocate 3GB.
I told them this is not a db issue; rather, the client will have the issue. Am I right?
November 09, 2009 - 3:04 pm UTC
I cannot compete with ignorance some days.
They obviously do not value data at all, they are of the camp "the application is the end all be all". But their application will go away someday and we'll be left with their mess of data.
Data integrity - they have none.
They have strings in their numbers and numbers in their strings and nothing is the proper length.
Don't know what else to tell you - I gave 'straw man, could happen' numbers; it is pretty straightforward, "just multiply". Your mileage will vary.
But these people do not use databases, it sounds more like they are forced to say they use a database and they just really abuse it.
How are varchar stored internally in a block or page
Hari, November 17, 2009 - 6:55 am UTC
Hi Tom,
Good Day
I have an issue now, related to the same discussion, but not able to convince them. At least for my knowledge improvement, please provide me your expert comments on the following:
We have some tables with a sequence generated number as the primary key. Around 1,000 records per day get inserted. So, I decided to keep the size of this primary key as Number(8) (1000 * 30 days * 12 months * 50 years). However, the architects and managers are asking me to keep Number(32), since the limit is 32. I am very skeptical about this. I even cited the URL of this page; still, I guess they are not convinced.
Now, please tell me if Number(8) is correct or Number(32) is correct?
Thanks
Hari
November 23, 2009 - 2:04 pm UTC
the limit is 38, not 32.
And a number(8) will consume as much storage as a number(32) or a number(38) to store the same information (that is, if you limit yourself to 8 digit numbers - they all consume the same storage)
Now, the choice of a precision - the number of digits - is driven by one thing and one thing alone...
What does your data model have to say about this field, what should the datatype and length be based on your data model. Is it valid to have 32 digits in there? Will your application break or work incorrectly if you get a nine digit number? If so, using number(32) would be not smart. You would use a number(8) for the same reason you use a varchar2(30) for some field. Because the length is a maximum of 30.
So, what is the domain of your primary key- is 123456789 a valid primary key? If not, number(8) is correct.
And remember, you can always alter a number(8) UP, but you have a hard time making a number(32) go down.
ops$tkyte%ORA11GR2> create table t ( x number(8), y number(32) );
Table created.
ops$tkyte%ORA11GR2> insert into t values ( 1234, 5678 );
1 row created.
ops$tkyte%ORA11GR2> alter table t modify x number(32);
Table altered.
ops$tkyte%ORA11GR2> alter table t modify y number(8);
alter table t modify y number(8)
*
ERROR at line 1:
ORA-01440: column to be modified must be empty to decrease precision or scale
meaning, if you oversize it is very hard to fix; if you undersize, it is easier to correct.
varchar2 storage architecture
Prajjwal Mallik, December 13, 2009 - 2:32 pm UTC
Hi Tom,
Every time I go through this site, I get really confused and on clearing my doubts, eventually I end up learning a lot!! All Thanks to you!!
I got very confused about the way data is stored in varchar2. I could find no good material on this anywhere.
My understanding about varchar2 is this. Please correct me if I am wrong.
When on disk, I mean in actual tables, a value of 3 characters (say) occupies 3 bytes (for any single-byte character set), even if the column in the table is varchar2(10), say. That's why it is varchar2 and not char.
Now, when we consider a varray of varchar2(10) and try to assign a value of 3 characters to one of its elements, in memory 10 bytes still has to remain allocated for that array element. There varchar2 plays the same role as char!
On disk, is a varchar2(x) element stored in contiguous disk space or via pointers/linked lists?
-> In the case of contiguous storage, an update to a larger value might need to move the row to some other block. This could possibly
1) slow down update performance
2) hamper the clustering factor if there were indexes on this varchar field.
-> In the case of a linked list, fetching the data would require a lot of I/O.
1) selects will be very slow
So which one does oracle choose?
Also, please throw some light on how clob and blob data are stored on disk.
Regards,
Prajjwal
December 14, 2009 - 8:30 am UTC
Can see the performance impact in simple select but why?
A reader, September 22, 2011 - 6:31 am UTC
Thanks Tom for creating such a great repository of knowledge.
We too have modellers, but are lucky that they do understand the concepts of data integrity and are against creating varchar2(4000) columns. But after going through this page I decided to test the performance impact as well, and I did find an impact.
Here is what I did
SQL> create table test (col1 varchar2(1));
Table created.
SQL> insert into test values ('a');
1 row created.
SQL> create table test2 (col1 varchar2(4000));
Table created.
SQL> insert into test2 values ('a');
1 row created.
SQL> commit;
I bounced the instance and then perform selects on the above tables and found
bytes in explain plan to be significantly higher when I run select on varchar2(4k) table(test2).
shutdown immediate
startup
SQL> set autot on;
SQL> select * from test;
C
-
a
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| TEST | 1 | 2 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
shutdown immediate
startup
SQL> set autot on
SQL> select * from test2;
COL1
--------------------------------------------------------------------------------
a
Execution Plan
----------------------------------------------------------
Plan hash value: 300966803
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | <b>2002</b> | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| TEST2 | 1 | <b>2002</b> | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Now we can clearly see the impact even in this simple select, but what I am not able to understand is
why we are seeing this value.
DB is 11.1.0.7.0.
September 22, 2011 - 5:46 pm UTC
what performance impact do you see? I see different bytes - but that is all - and that is based on the different row lengths?? That could affect plans - however, you don't demonstrate that - so this isn't showing any sort of performance difference...
you don't need to shutdown and startup for stuff like this - or anything really - that is more artificial than anything I can imagine - you HARDLY EVER do stuff right after a startup, most stuff takes place way after a startup (since databases run for really long times)...
Bytes means amount of information accessed
A reader, September 22, 2011 - 11:55 pm UTC
Back to my previous question on why Bytes is higher when I access table with varchar2(4k).
Bytes per my understanding tells the amount of data accessed.
From Oracle doc's plan table description
"BYTES NUMERIC Estimate by the cost-based approach of the number of bytes accessed by the operation"
So when I see a higher Bytes figure for the 4k column, I take it to mean that more data was accessed.
Also I don't see any difference in row length.
dev_85> select table_name,num_rows,avg_row_len from user_tables where table_name in ('TEST','TEST2');
TABLE_NAME NUM_ROWS AVG_ROW_LEN
------------------------------ ---------- -----------
TEST 1 2
TEST2 1 2
dev_85>
Please correct me: what is the flaw in my understanding?
September 23, 2011 - 9:35 am UTC
the bytes in that report is an ESTIMATE.
It is based on a guess after looking at your schema.
It is not based on reality, it is a guess - made before accessing any of the data.
You'd get the same guess using "autotrace traceonly explain" - just explaining the query.
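One plausible way to reconcile the 2 and 2002 figures in the transcript above is that, with no gathered statistics, the optimizer's default guess for a varchar2 column's average length appears to be roughly half the declared width plus a couple of bytes. This is an inference from the numbers shown here, not documented Oracle behavior, so treat the formula below as a sketch:

```python
# Hypothetical heuristic inferred from the transcript above:
# guessed length ~= declared_width / 2 plus a small fixed overhead.
# NOT documented Oracle behavior - just a formula that fits both rows shown.

def guessed_row_bytes(declared_width, overhead=2):
    """Rough sketch of a no-statistics column-length guess."""
    return declared_width // 2 + overhead

print(guessed_row_bytes(1))     # matches the 2 bytes shown for varchar2(1)
print(guessed_row_bytes(4000))  # matches the 2002 bytes shown for varchar2(4000)
```

Whatever the exact formula in your version, the point stands: the figure is an estimate derived from the column definition, not a measurement of data actually read.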
metadata matters
Rajeshwaran, Jeyabal, May 26, 2012 - 11:21 am UTC
Tom,
I was reading your MetadataMatters.ppt and have a question regarding "Length Matters"
<quote>
Varchar2(4000) for everything (just in ‘case’)
Assume 10 columns, average, minimum, maximum width is 4000
40,000 bytes per row on a fetch
</quote>
1) How can I see that it has allocated a width of 4000 for the average, minimum, and maximum?
2) How can I see that it will be 40,000 bytes per row on a fetch? Is that something tkprof can show?
Can we prove these points, or are they just math?
May 27, 2012 - 11:23 am UTC
1) think about it.
If you described a sql statement and it told you "I have 10 columns and every column is 4000 bytes" - what would your client have to allocate to fetch it?
You can measure it using whatever OS/language tools you have at your disposal. And, in the presentation when I deliver it, I mention that not all drivers will do this, some will allocate a bit at a time. It is something that can happen.
2) tkprof won't show it - tkprof only shows what happened. Unless you hit a row with all 10 columns at 4000 characters, it won't have actually happened.
just ask yourself how much data you would allocate to do an array fetch of 100 rows with 10 columns that could each be 4000 bytes?
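The arithmetic behind that question can be made concrete. Assuming a driver that pre-allocates full-width buffers (as the presentation notes, not all drivers do), a 100-row array fetch of 10 varchar2(4000) columns looks like this:

```python
# Back-of-envelope worst-case client buffer for an array fetch,
# assuming the driver pre-allocates full declared width per column.
# The figures (100 rows, 10 columns, 4000 bytes) come from the discussion above.
rows_per_fetch = 100
columns = 10
declared_width = 4000  # varchar2(4000)

buffer_bytes = rows_per_fetch * columns * declared_width
print(buffer_bytes)                           # 4,000,000 bytes per fetch
print(round(buffer_bytes / 1024 / 1024, 1))   # roughly 3.8 MB
```

Multiply that by the number of concurrent sessions and the cost of "varchar2(4000) just in case" becomes visible on the client side, regardless of how small the actual data is.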
Regarding the ora-01450
Mette Juel, September 13, 2012 - 1:02 pm UTC
I would like to know if this metalink note is still valid regarding the maximum key lengths and the block size?
https://support.oracle.com/epmos/faces/ui/km/DocumentDisplay.jspx?_afrLoop=1685733510470401&id=136158.1&_afrWindowMode=0&_adf.ctrl-state=ptxaw30cc_326 There is some debate on the forum (and I have not been able to test it) about whether a block needs to hold 2 index keys or just one.
If the calculation has changed, could you tell from which version it changed?
We just need the calculations in order to prepare for upgrading from e.g. 9 to 10 (I know it's outdated) or from 9 to 11, and in the same process change from ISO8859 to UTF-8 or other multibyte charsets (thus multiplying by e.g. 4 for AL32UTF8 when calculating index lengths)?
PS: And we cannot change the bad design with e.g. 500-byte varchar2 columns in maybe 10 columns (these are customers' databases; we have no control over the definitions and contents, we "just" do the upgrade).
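To illustrate why the charset conversion matters for ORA-01450: the commonly cited maximum index key length for an 8K block is about 6398 bytes (verify this figure against your own version and the note above; it is an assumption here). If those ten ~500-byte varchar2 columns were in a single composite index and the lengths were multiplied by 4 for a worst-case AL32UTF8 expansion, the key would blow past that limit:

```python
# Rough ORA-01450 sanity check. The ~6398-byte limit for an 8K block
# is the commonly cited figure - an assumption here, not verified output.
block_size = 8192
max_key_bytes_8k = 6398       # approximate maximum index key length, 8K block

cols = 10
declared_width = 500          # varchar2(500), single-byte today
utf8_worst_case = 4           # AL32UTF8 can expand one character to up to 4 bytes

key_bytes = cols * declared_width * utf8_worst_case
print(key_bytes, key_bytes > max_key_bytes_8k)   # 20000, exceeds the limit
```

So an index definition that fits comfortably in a single-byte charset can fail to create after the conversion, which is exactly the calculation worth doing before the upgrade.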
Web based UI and varchar2(4000)
Phil E Cannata, March 12, 2021 - 2:22 am UTC
What if the query tool is a React application and it is getting its data from a REST query? In that case, gigabytes of memory won't be allocated on the db server to satisfy the query, will it?
March 15, 2021 - 3:39 pm UTC
Gigabytes of memory translates to a large data set, so probably not, but it really depends on:
1. How many rows you're fetching
2. The number of columns and their data type and contents
3. How exactly the REST API is implemented