Hi Tom, long time reader, first time poster.
I am interested in your opinion on the main reasons not to use varchar2(255) or greater when storing data, if you are not 100% sure of what may get inserted into that column over time. Basically, why not just use varchar2(4000) and avoid the old "ORA-12899: value too large for column" error?
Here is my list; I am curious about yours and the readers'.
1. Data integrity. Lots of people need to read the data dictionary to build applications, to store data, and truth in advertising here is critical.
The only thing I will say about this #1 is that I have had to integrate plenty of public (govt) data sets into my applications, and let's just say govt agencies are not always that hot at giving you the data types and lengths (or worse, make you accept XML... yuck).
2. Limitation on index creation. If you try to create an index on 2 columns that actually store 2 characters each but are defined as varchar2(4000), you are out of luck.
Storage of either the data column or an index on these columns does not seem to come into play in this discussion (although I have heard these myths for many years).
Example of index creation limitation:
drop table idx_size_table;
create table idx_size_table (
thincol varchar2(4000),
fatcol varchar2(4000));
create index idx_size_test on idx_size_table (thincol,fatcol);
gets you the error:
Error starting at line 8 in command:
create index idx_size_test on idx_size_table (thincol,fatcol)
Error at Command Line:8 Column:30
Error report:
SQL Error: ORA-01450: maximum key length (6398) exceeded
01450. 00000 - "maximum key length (%s) exceeded"
*Cause:
*Action:
Too bad if thincol was supposed to store a 2-character country code and fatcol a 2-character state code. No index for you.
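For contrast, a quick sketch (table and index names here are made up for illustration) of the same composite index on right-sized columns, which creates without complaint because the maximum possible key is only a few bytes:

```sql
-- hypothetical right-sized version: 2-character codes
create table idx_size_table2 (
  thincol varchar2(2),   -- country code
  fatcol  varchar2(2));  -- state code

-- maximum key length is tiny, far below the ORA-01450 limit
create index idx_size_test2 on idx_size_table2 (thincol, fatcol);
```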
Example of why storage of indexed columns does not matter:
drop table idx_size_table;
create table idx_size_table (
thincol varchar2(3),
fatcol varchar2(255));
begin
for i in 1 .. 5000
loop
insert into idx_size_table (thincol, fatcol) values ('abc','abc');
end loop;
end;
/
create index idx_size_thin on idx_size_table (thincol) compute statistics;
create index idx_size_fat on idx_size_table (fatcol) compute statistics;
select index_name, table_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key from user_indexes;
shows that both indexes have exactly the same characteristics and leaf blocks. So the storage is the same, even though one column was defined as varchar2(3) and the other as varchar2(255), but each only held 3 characters.
If you change the insert loop to insert ... values ('abc','abcdefghijklmnop') ...
and then drop and recreate the indexes, you will see a difference in the number of leaf blocks, due not to the defined length but to the actual length of the data.
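Another quick way to confirm that varchar2 storage tracks the actual data rather than the declared length is Oracle's built-in vsize function, which reports the stored byte length of a value:

```sql
-- both columns report 3 bytes for the string 'abc', even though
-- one is declared varchar2(3) and the other varchar2(255)
select vsize(thincol) as thin_bytes,
       vsize(fatcol)  as fat_bytes
  from idx_size_table
 where rownum = 1;
```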
Apologies if I could not find an original post, and if I got the code tagging wrong.
... , if you are not 100% sure of what may get inserted into that column over time. ...
If that is true, please inform me how you can write your application? And please tell me how you know the data won't exceed 4000 bytes???
Data integrity is the only one I really need - that does it for me.
The index could be an issue.
Client side resources could definitely be another. Imagine you are writing an application that has to handle an arbitrary cursor. Imagine you have a table with 10 columns, the average width of which would be 40 bytes if you "right sized" everything: some are 100 bytes, some are 2 bytes, but the average is 40 bytes (so each row is 400 bytes). Further, suppose you are concerned about performance, so you'll array fetch 100 rows at a time.
Let's do the math
40 bytes/column * 10 columns = 400 bytes/row
400 bytes/row * 100 rows = 40,000 bytes per fetch buffer
Assume you have 25 cursors open
25 * 40,000 = 1,000,000 bytes, about 1MB per connection
assume you have a connection pool with 100 connections
100 * 1MB = 100MB
Now, do the math with 4000 bytes maximum per column...
4000 bytes/column * 10 columns = 40,000 bytes/row
40,000 bytes/row * 100 rows = 4,000,000 bytes per fetch buffer
Assume you have 25 cursors open
25 * 4MB = 100MB per connection
assume you have a connection pool with 100 connections
100 * 100MB = 10,000MB (roughly 10GB)
ouch.
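The same arithmetic can be run as one query, assuming the figures above (10 columns, 100-row array fetches, 25 open cursors, 100 pooled connections, and 1MB = 1,000,000 bytes to match the math above):

```sql
-- fetch buffer memory across the whole pool, in MB:
--   bytes/column * columns * rows/fetch * cursors * connections
select 40   * 10 * 100 * 25 * 100 / 1000000 as right_sized_mb,  -- 100
       4000 * 10 * 100 * 25 * 100 / 1000000 as max_sized_mb     -- 10,000
  from dual;
```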
see also
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1145132537055