Will size of columns affect performance even when data is not present?

Question and Answer

Tom Kyte

Thanks for the question.

Asked: January 03, 2006 - 1:30 pm UTC

Last updated: February 09, 2006 - 4:16 am UTC

Version: 10.1.0

Viewed 1000+ times

You Asked

We're going to allow sys admins to create columns in the DB using a UI for our application.

I am wondering if there will be any degradation in performance if, say, users are allowed to create a 4000-character nvarchar column regardless of whether they use only 4 characters or 4000. Every time they need a new String column, they would get a 4K nvarchar. Or would it be more useful to give them a choice of, say, 4-, 40-, 400- and 4000-sized String columns?

Is there a performance degradation when reading this data?

Our app will be Java based.

Thanks

and Tom said...

It will affect

o data integrity
o data quality
o data usability
o runtime performance
o memory utilization

in a negative fashion. The funny thing is - we can easily INCREASE the width of a column, but it is really hard to go back and shrink its width later!
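To illustrate that asymmetry, a minimal sketch (assuming a hypothetical table t whose object_name column already holds values longer than 10 characters):

```sql
-- Widening is an instant, metadata-only change:
alter table t modify (object_name varchar2(200));

-- Shrinking is refused the moment any existing value exceeds the new width:
alter table t modify (object_name varchar2(10));
-- ORA-01441: cannot decrease column length because some value is too big
```

So an over-wide column tends to stay over-wide once real data lands in it.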


Consider this tiny schema:

drop table t1;
drop table t2;
create table t1 as select * from all_objects;
create table t2
(OWNER varchar2(4000),
OBJECT_NAME varchar2(4000),
SUBOBJECT_NAME varchar2(4000),
OBJECT_ID varchar2(4000),
DATA_OBJECT_ID varchar2(4000),
OBJECT_TYPE varchar2(4000),
CREATED varchar2(4000),
LAST_DDL_TIME varchar2(4000),
TIMESTAMP varchar2(4000),
STATUS varchar2(4000),
TEMPORARY varchar2(4000),
GENERATED varchar2(4000),
SECONDARY varchar2(4000)
)
/
insert into t2 select * from t1;


Ok, we run this Java program against it:



// requires: import java.sql.*; and oracle.jdbc.OraclePreparedStatement
public static void process_data( Connection conn, String inputs )
throws Exception
{
    PreparedStatement pstmt;
    String col;
    int i;

    pstmt = conn.prepareStatement( "select * from " + inputs );
    // Oracle JDBC extension: fetch 100 rows per round trip
    ((OraclePreparedStatement)pstmt).setRowPrefetch(100);
    System.out.println( "select * from " + inputs );

    ResultSet rset = pstmt.executeQuery();

    while( rset.next() )
    {
        // touch every column, just as a real program would
        for( i = 1; i <= 13; i++ )
        {
            col = rset.getString(i);
        }
        //System.in.read();   // uncomment to pause so memory can be measured
        //break;
    }
    rset.close();
    pstmt.close();
}


That is, array fetch 100 rows at a time (a good overall fetch size) and look at each column, just like a real program might. Now, t1 and t2 have the *same* data; t2 is just all varchar2(4000):

[tkyte@xtkyte-pc j]$ time java select t1
select * from t1

real 0m2.124s
user 0m1.590s
sys 0m0.100s
[tkyte@xtkyte-pc j]$ time java select t1
select * from t1

real 0m2.306s
user 0m1.790s
sys 0m0.070s
[tkyte@xtkyte-pc j]$ time java select t2
select * from t2

real 0m5.565s
user 0m5.090s
sys 0m0.120s
[tkyte@xtkyte-pc j]$ time java select t2
select * from t2

real 0m5.560s
user 0m5.040s
sys 0m0.110s


so, to call it slower is an understatement (and bear in mind, we've NEGATIVELY impacted the t1 example by doing the string conversion!!!! It is worse than it looks)

Next, a peek at how this might affect memory (ye of the middle-tier persuasion, take note - consider how a couple HUNDRED open queries might affect you in the aggregate on the client side....)

I'm "cheating": I'm going to uncomment the read call above, and while the Java client "hangs", use "$ top" to measure it:




$ java select t1

RSS=19m SIZE=19,596k SHARE=10,520k


$ java select t2

RSS=34m SIZE=39,880k SHARE=10,460k


These numbers were repeated over and over, run after run - think "square array": sum(column width) * prefetch count....
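That back-of-envelope formula can be sketched in plain Java (a hypothetical calculation, not the driver's internal allocator; the column count and widths come from the t1/t2 example above, and 2 bytes per character assumes Java's UTF-16 strings):

```java
public class FetchBufferMath {
    // Worst-case client fetch buffer, in characters:
    // sum of the declared column widths, times the prefetch row count.
    static long worstCaseChars(int[] declaredWidths, int prefetchRows) {
        long perRow = 0;
        for (int w : declaredWidths) {
            perRow += w;
        }
        return perRow * prefetchRows;
    }

    public static void main(String[] args) {
        // t2: 13 columns, all declared varchar2(4000), prefetch of 100
        int[] t2 = new int[13];
        java.util.Arrays.fill(t2, 4000);
        long chars = worstCaseChars(t2, 100);
        System.out.println(chars);      // 5200000 characters
        System.out.println(chars * 2);  // 10400000 bytes (~10 MB) at 2 bytes/char
    }
}
```

A buffer on that order for a single open cursor is in the same ballpark as the RSS gap between the t1 and t2 runs; real drivers differ in the details, but it is the declared width, not the actual data, that drives the sizing.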


Do NOT use varchar2(4000) for "every column, just in case".

(regardless of language)


Comments

Alexander the ok, January 04, 2006 - 2:57 pm UTC

Oh man did this one hit home. Just to add to Tom's point about "data quality" I thought I'd briefly share an experience of mine.

We have a column defined as a large text field in a table. The users entered various descriptions in there, and also decided to randomly store another value - one that should have been its own column - inside these text strings. Then one day they asked us to write a script that read these values out and inserted them into another column. These were completely free-flowing sentences, and the values were random numbers and characters, so it was impossible to get them all (Tom actually helped me with this.)

So basically, if you give users that amount of space, chances are they'll stick all kinds of junk in there and cause you headaches down the road when you have to format it or whatever.

cannot duplicate

Richard, February 07, 2006 - 8:17 pm UTC

I am unable to duplicate the results here.
The Java code I am writing runs at the same speed regardless of the size of the varchar2.

Tom Kyte
February 08, 2006 - 1:47 am UTC

and how about the memory used?

and information such as db version, jdk, and platform would be extremely useful as well.

RE: Unable to duplicate

Richard, February 08, 2006 - 12:14 pm UTC

Ok, I could be wrong, but I don't think varchar2 works that way...

local machine: WinXP SP2, Java 1.4.02
dbserver: HP-UX, Oracle 10g
local memory: the JVM stays at 19 K for both runs (if I comment out one run or the other)

I have 2 tables with identical data, 10000 rows,
one with a column varchar2(5) the other with the same column varchar2(4000) (each cell is actually 3 chars)

If you had (100 rows * 4000 chars ) = 400k of memory


[junit] ======================== table_2
[junit] time: 47ms
[junit] col1 VARCHAR2(4000)
[junit] time: 312ms
[junit] num Rows 10320
[junit] ======================== table_1
[junit] time: 47ms
[junit] col1 VARCHAR2(5)
[junit] time: 297ms
[junit] num Rows 10320


public void test01()
throws Exception
{
    try
    {
        DBConnectionManager.init();

        System.out.println();
        System.out.println("======================== table_2");
        runQuery("table_2");

        System.out.println("======================== table_1");
        runQuery("table_1");
    }
    catch (Exception ex)
    {
        ex.printStackTrace();
    }
}

public void runQuery(String tableName)
throws Exception
{
    Connection db_connection = DBConnectionManager.getConnection();
    PreparedStatement theStmt = null;

    try
    {
        String sqlFromFile = "select * from " + tableName;

        ResultSet rs;
        ResultSetMetaData rsmd;
        int numRows = 0;
        int numberOfColumns;
        String val;
        Date start = new Date();

        theStmt = db_connection.prepareStatement(sqlFromFile);
        theStmt.setFetchSize(100);

        rs = theStmt.executeQuery();
        printTime(start);

        rsmd = rs.getMetaData();
        numberOfColumns = rsmd.getColumnCount();
        numRows = 0;

        for (int j = 1; j <= numberOfColumns; j++)
        {
            System.out.println(rsmd.getColumnName(j) + " "
                + rsmd.getColumnTypeName(j) + "(" + rsmd.getColumnDisplaySize(j) + ")");
        }

        while (rs.next())
        {
            for (int j = 1; j <= numberOfColumns; j++)
            {
                val = rs.getString(j);
                val = val + "y";
            }
            numRows++;
            //System.in.read();
        }
        printTime(start);
        System.out.println(" num Rows " + numRows);
    }
    finally
    {
        theStmt.close();
        db_connection.close();
    }
}

public void printTime(Date start)
{
    System.out.println(" time: " + ((new Date()).getTime() - start.getTime()));
}

Tom Kyte
February 09, 2006 - 4:16 am UTC

Did you run my test, using OS tools to measure the memory used, as I did?

A jvm at 19k? I've never heard of such a thing.

It sure looks like you've changed my example considerably.

I was using JDK 1.4 on RHAS Linux.


varchar2 doesn't "work that way", but array fetch buffers do. You ask the database "what is the max width of this column?" - the database says 4000. You ask "how many columns?" - it says 10. You now know you need 4000x10 to hold a row (you haven't fetched a row yet - the row COULD be 4000x10). You ask "how many rows at a time?" - it says 100. You now know you need 4000x10x100 to hold the largest possible fetch that could happen.
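That question-and-answer arithmetic can be written down directly (a sketch of the reasoning, using the hypothetical 10-column figures from the text above, not the driver's actual code):

```java
public class WorstCaseFetch {
    // Buffer the client must be prepared for, sized purely from metadata,
    // before a single row has been fetched:
    static long bufferChars(int maxWidth, int columns, int rowsPerFetch) {
        return (long) maxWidth * columns * rowsPerFetch;
    }

    public static void main(String[] args) {
        // "max width of this column?" -> 4000
        // "how many columns?"         -> 10
        // "how many rows at a time?"  -> 100
        System.out.println(bufferChars(4000, 10, 100)); // 4000000 characters
        // Honest declared widths shrink the worst case before any data moves:
        System.out.println(bufferChars(40, 10, 100));   // 40000 characters
    }
}
```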