One 900 column table versus ten 90 column tables.

Tom Kyte

Thanks for the question, Farhan.

Asked: July 05, 2005 - 9:15 pm UTC

Last updated: August 31, 2005 - 3:30 pm UTC

Version: 9.0.2

Viewed 1000+ times

You Asked

Hi Tom,

We have a big de-normalized table that stores summary statistics calculated from a detail table. The detail table contains under 20 columns and usually has 1000+ rows for each summary row. The summary row stores about 900 different attributes derived from that detail (due to complexity we cannot afford to calculate these numbers at run time).

Nearly 90% of those 900 columns are numeric and more than 50% of the columns are not populated, so the average row size is below 8 KB. We still notice above 15% row chaining for this summary table (I guess because the number of columns exceeds 255).

1. Is it a good idea to break this table into ten 90-column tables sharing the PK, just to avoid row chaining, and create a view combining them all so that the existing code won't be impacted? (Another motive for doing it is that these 900 attributes can be logically separated into 90 similar attributes of 10 types.)

We think this will decrease performance because we will be doing 10 small inserts instead of one big insert, and we will have 10 PK indexes instead of 1 PK index. Retrieving data through the view will also be slower.

2. Instead of the above, can we just eliminate (or reduce) chaining by changing PCTFREE and PCTUSED? If yes, what would be the ideal values for this scenario?

3. If chaining cannot be eliminated and 10 tables would perform too badly, can we go for a mid-way approach, i.e., 5 tables of 180 columns each?


thanks
Farhan


and Tom said...

1) if you retrieve all of the columns, no -- you'd rather have the rows chained, with rowid pointers linking the pieces, than have to hop from row to index to row and do all of the joins.

No, I would not vertically partition like that. If you have 900 columns, you have 900 columns -- so be it.

2) anything over 255 columns will be chained.

3) joining 1:1 mandatory relationships would be worse than the chaining that takes place. At least with chaining we have the rowid of the next piece of the chain -- we don't have to go to an index and do a read. And we'll put more than one piece of the chain on the same block. Instead of (say, with 5 tables):

5 table blocks + 3*5 index blocks (assuming 5 indexes of height three) = 20 blocks for a row in the cache

we might have

3 index blocks + 2 table blocks with multiple row pieces on each = 5 blocks.
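(For reference, a minimal sketch of how the chaining could be measured -- SUMMARY_TAB is a stand-in name for the 900 column table:)

analyze table summary_tab compute statistics;

select chain_cnt,
       num_rows,
       round( 100 * chain_cnt / num_rows, 1 ) pct_chained
  from user_tables
 where table_name = 'SUMMARY_TAB';

-- to see exactly which rows are chained, first create the CHAINED_ROWS
-- table via $ORACLE_HOME/rdbms/admin/utlchain.sql, then:
analyze table summary_tab list chained rows into chained_rows;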




Comments

Farhan, July 06, 2005 - 4:39 pm UTC

Thanks for the quick and helpful response.

I need a little more clarification.

1. What if we do not retrieve all columns all the time, i.e., instead of using the view all the time, rewrite the code to query only the related table (or tables) and use the view only when all columns are needed (let's say we use the view 40% of the time and the individual tables 60%)?

2. If we go with the one-big-table solution, what would be the ideal PCTFREE and PCTUSED, given that this table's DML pattern is:

- 90% inserts
- 10% deletes
- no updates.

thanks
Farhan

Tom Kyte
July 07, 2005 - 8:30 am UTC

1) not all, but go for "most" -- what do you MOSTLY do? Would you MOSTLY be able to query only a single table if you went with 10 tables? Or would you MOSTLY be joining at least 2 of them (and incorporating logic to deal with "what tables do I really need" all over the place, ugh)?

Put the MOST FREQUENTLY used columns first; if you only access them, it'll never walk the chain.

2) exactly the same as if you used 10 tables, or 100 tables, or 2 tables. No updates -- pctfree=0, pack them up.
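(A sketch of what that DDL might look like -- the column names are invented; the points are the column ordering and PCTFREE 0:)

create table summary_tab
( summary_id   number primary key,
  hot_col_1    number,   -- most frequently used columns first:
  hot_col_2    number,   -- queries touching only these never walk the chain
  /* ... */
  cold_col_899 number    -- rarely used columns last
)
pctfree 0;               -- inserts/deletes only, no updates: pack the blocks full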

Why Sponge Bob? Why??

A reader, July 07, 2005 - 9:28 am UTC

Why do you need 900 columns? I read and re-read your question to Tom trying to figure out why, but I don't see it. How in the world are you going to use a 900-column-wide table? Are you actually going to fit that many columns in a report?

Is there not another way to store the data? Maybe like this:

create table summary_header
( summary_id number not null primary key );

create sequence summary_id_seq;

create or replace trigger t_summary_header_bi
before insert on summary_header
referencing old as old new as new
for each row
begin
    if :new.summary_id is null then
        select summary_id_seq.nextval
          into :new.summary_id
          from dual;
    end if;
end t_summary_header_bi;
/

create table attribute_header
( attribute_id number not null primary key );

create sequence attribute_id_seq;

create or replace trigger t_attribute_header_bi
before insert on attribute_header
referencing old as old new as new
for each row
begin
    if :new.attribute_id is null then
        select attribute_id_seq.nextval
          into :new.attribute_id
          from dual;
    end if;
end t_attribute_header_bi;
/

create table summary_data
( summary_id      number        not null,
  attribute_id    number        not null,
  attribute_value varchar2(255) not null );

alter table summary_data add primary key (summary_id, attribute_id);

alter table summary_data add foreign key (summary_id)
  references summary_header (summary_id);

alter table summary_data add foreign key (attribute_id)
  references attribute_header (attribute_id);



Tom Kyte
July 07, 2005 - 9:55 am UTC

oh -- i don't like those 'models'. If they have 900 attributes, so be it.

Having what I call a "funky data model" like you propose is something I write actively AGAINST doing.

It is very generic, it can hold "anything" -- so it goes...

but it is not queryable
it won't perform
data integrity is totally wiped out, gone

I've nothing much good to say about them at all.


I'd rather stuff the attributes into an XML document and index that and do section searching on it.

anything...
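(A rough sketch of that XML approach -- it assumes Oracle Text is installed, and all of the names are invented:)

create table summary_doc
( summary_id number primary key,
  attrs      xmltype );

-- a context index with an auto section group allows WITHIN
-- searches against the XML element names
create index summary_doc_ix on summary_doc ( attrs )
  indextype is ctxsys.context
  parameters ( 'section group ctxsys.auto_section_group' );

-- find the summaries whose <attr17> element contains the value 42
select summary_id
  from summary_doc
 where contains( attrs, '42 WITHIN attr17' ) > 0;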

A reader, July 07, 2005 - 10:19 am UTC

Well, I don't like the model too much either, so I added the attribute_header so you couldn't add any attributes that aren't defined up front. Hopefully that limits it to the 900 attributes. I do like my solution better than having 900 columns; I just don't see any value in having a table that wide.

But using XML is certainly another way to go. I guess my point is, there are other ways to solve the problem and the other ways are more than likely better.

The simple question to Farhan is: why? Why do you need 900 columns? Can't you break this up into either a normalized model or even a star schema? And as Tom pointed out, XML is another choice too.

Tom Kyte
July 07, 2005 - 12:52 pm UTC

I see NO value in having the attributes go down the page, I see NEGATIVE value.

I would much rather have 900 columns over a bunch of rows -- IF I truly had 900 functionally dependent things.



List Partitioning

Reader, July 07, 2005 - 10:54 am UTC

Or you could list-partition one 90-column table by the 10 logical column sets (instead of having 10 separate tables). If you (generally) don't need to look at more than one partition (logical column set) at a time, it would probably work for you.
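(A sketch of that layout -- type codes and column names invented; each summary row becomes 10 rows, one per logical column set:)

create table summary_attrs
( summary_id number       not null,
  attr_type  varchar2(10) not null,  -- which of the 10 logical sets this row holds
  col_01     number,
  /* ... the ~90 attribute columns shared by all types ... */
  col_90     number
)
partition by list ( attr_type )
( partition p_type_1 values ( 'TYPE_1' ),
  partition p_type_2 values ( 'TYPE_2' )
  /* ... one partition per attribute type ... */
);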

Cluster?

Brett, July 07, 2005 - 11:30 am UTC

Would a cluster work in this situation? Break up the table as Tom suggested, keeping the most commonly accessed columns together...or even grouping the columns by those most likely to be accessed together.

Tom Kyte
July 07, 2005 - 1:48 pm UTC

row chaining is not evil, row chaining is not evil.

having to JOIN ALL OF THIS DATA back together would be evil.


Think of the 4 row pieces as if they were a hash cluster, all stored together on the same block or a few blocks and accessible to each other without using an index at all!
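(For reference, the hash cluster Brett asks about would look something like this sketch -- names and sizes invented; Tom's point is that the chained row pieces already give you this co-location for free:)

create cluster summary_cluster ( summary_id number )
  size 8192 hashkeys 100000;   -- budget roughly one block per summary_id

create table summary_part_1
( summary_id number,
  /* ... first group of columns ... */
  a1 number )
cluster summary_cluster ( summary_id );

create table summary_part_2
( summary_id number,
  /* ... second group of columns ... */
  b1 number )
cluster summary_cluster ( summary_id );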

value for going down

A reader, July 07, 2005 - 1:44 pm UTC

There is some value in going down the page with the data rather than across more columns. A simple example is the need to add 10 additional fields: with the 900-column table, you would have to add 10 columns and then change whatever code you have to account for them.

If you knew up front that your elements might change, you would most certainly add/delete records from a table rather than add columns.

A good example is v$parameter. Every time Oracle modifies the parameters, they simply insert/delete records rather than adding/deleting columns.

Can you give an example where you would want a table with 900 columns? I have never seen one. I've seen people try by having a bunch of columns like code1, code2, code3, code4 rather than a simple table that has a foreign key to another table and a code column. The latter has always been better in my experience, because people always want to add or remove elements, and adding/removing columns is rather painful compared to simple DML.



Tom Kyte
July 07, 2005 - 2:30 pm UTC

you cannot effectively, efficiently or ANYTHING make use of this funky data model.

If you have 900 attributes, you have 900 attributes.




last try

A reader, July 07, 2005 - 3:44 pm UTC

Last try and then I will leave you alone. :)

Let's say the data consists of sales data. Each row will be a summary of sales for a single sales person. The company sells products in 9 states and there are 10 elements that rate the performance of the sales person for each state.

The de-normalized table would look like this:

create table monster
(employee_id number not null,
ny_total number not null,
ny_avg_daily number not null,
ny_avg_monthly number not null,
ny_blah_daily number not null,
ny_blah_monthly number not null,
ny_foo_daily number not null,
ny_foo_monthly number not null,
ny_bar_daily number not null,
ny_bar_monthly number not null,
ny_grand_total number not null,

nj_total number not null,
nj_avg_daily number not null,
nj_avg_monthly number not null,
nj_blah_daily number not null,
nj_blah_monthly number not null,
nj_foo_daily number not null,
nj_foo_monthly number not null,
nj_bar_daily number not null,
nj_bar_monthly number not null,
nj_grand_total number not null,

ak_total number not null,
ak_avg_daily number not null,
ak_avg_monthly number not null,
ak_blah_daily number not null,
ak_blah_monthly number not null,
ak_foo_daily number not null,
ak_foo_monthly number not null,
ak_bar_daily number not null,
ak_bar_monthly number not null,
ak_grand_total number not null,

al_total number not null,
al_avg_daily number not null,
al_avg_monthly number not null,
al_blah_daily number not null,
al_blah_monthly number not null,
al_foo_daily number not null,
al_foo_monthly number not null,
al_bar_daily number not null,
al_bar_monthly number not null,
al_grand_total number not null
-- etc. for all 9 states: 10 columns per state plus employee_id
-- is 91 columns in this scaled-down example; the same pattern over
-- 90 column groups is how you end up with a 900+ column monster
);

So now the company grows and adds two more states. That would require DDL to add 20 columns to the table, the ETL code would have to change to handle the new columns, and the reports would have to change as well.

The funky way to model this would be this:

create table funky
(employee_id number not null,
state_code varchar2(2) not null,
total number not null,
avg_daily number not null,
avg_monthly number not null,
blah_daily number not null,
blah_monthly number not null,
foo_daily number not null,
foo_monthly number not null,
bar_daily number not null,
bar_monthly number not null,
grand_total number not null);

The addition of two more states would just mean more rows in the table. I would also have a state_code lookup table containing all acceptable state codes, with a foreign key from funky to it.

Which table is "better"?

Tom Kyte
July 07, 2005 - 5:07 pm UTC

did they say they were doing anything remotely similar to that?

The "funky" data model is above -- it would turn each "attribute" into a row.

This model here (the last table you have) is correct and proper for this -- where you have taken what is naturally a CROSS RECORD dimension and pivoted it "in record". If their model looks like yours, then yes, that was a mistake and they should not have turned a domain (state) into columns -- definitely.

Funky table is mixed bag

Brett, July 07, 2005 - 4:06 pm UTC

We have a table like the "funky" table in our OLTP system. It is great for those guys: they merely add elements and they are off. It is terrible for the reporting side (where I am). It is slow, and you always have to pivot the data so that it comes out as one row.
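(The pivot Brett describes, written against the FUNKY table above, is the classic MAX(DECODE(...)) pattern -- a sketch:)

select employee_id,
       max( decode( state_code, 'NY', total ) )     ny_total,
       max( decode( state_code, 'NY', avg_daily ) ) ny_avg_daily,
       max( decode( state_code, 'NJ', total ) )     nj_total,
       max( decode( state_code, 'NJ', avg_daily ) ) nj_avg_daily
       /* ... one MAX(DECODE(...)) per state/measure pair ... */
  from funky
 group by employee_id;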

Tom Kyte
July 07, 2005 - 5:23 pm UTC

One might ask "why do they merely add elements to a running transactional system"

On the OLTP side, they've thrown out data integrity.
On the reporting side, they've made life absolutely miserable.


David Aldridge, July 07, 2005 - 8:30 pm UTC

If you had particular reports that used a subset of the 900 columns, and you found a measurable and worthwhile performance benefit to isolating those columns into their own table, then I'd suggest that a Materialized View with Query Rewrite could be used to make a copy of them in a new table -- i.e., keep everything in the 900-column table and create appropriate MVs for the subsets that you need (possibly aggregating as well, for extra goodness).
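(A sketch of David's suggestion -- it assumes query rewrite is enabled for the session and borrows the invented SUMMARY_TAB/HOT_COL names from earlier:)

create materialized view summary_hot_mv
  build immediate
  refresh complete on demand
  enable query rewrite
as
select summary_id, hot_col_1, hot_col_2   -- just the columns the report needs
  from summary_tab;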

Static domain

Farhan, July 08, 2005 - 3:19 pm UTC

Yes, our model is somewhat similar to that 'state' example, but with two big differences.

1. The domain is static - no inserts/updates allowed to that domain.
2. The attributes required for each domain value are different (only about 40% of the attributes are common).

One more clarification: this is not an OLTP table; it is denormalized summary statistics captured for one report, i.e., one row in this table represents an entire report (the report shows matrices of summary values -- no details).

We did think about breaking this summary table into a header table and a domain-specific detail table (which would contain 10 rows for each header, i.e., a row for each domain value, and would have the union of all attributes of all domains). But what do you think about the performance of querying 10,000 'monster' rows versus pivoting 10 rows for each report, thus accessing 10,000 header + 10 x 10,000 detail = 110,000 'small' rows?




Tom Kyte
July 08, 2005 - 3:23 pm UTC

If you absolutely need it pivoted for reporting purposes, stick with the single table.

A reader, July 08, 2005 - 5:52 pm UTC

Hi Tom,
We are using the Siebel CRM application and we have two tables in the application that have more than 255 columns (S_ORG_EXT - 374 and S_ASSET - 296). We are using an 8K block size in the database.

In your response above you mentioned that anything over 255 columns will be chained.

In Siebel CRM everything is driven from the account (S_ORG_EXT) table (i.e., most of the time users query account-related information, e.g., an account's activities, service requests, notes, opportunities, etc.).

Could you please let me know what are the performance implications?

As always, appreciate your feedback.
Thanks

Tom Kyte
July 08, 2005 - 6:43 pm UTC

not too much -- it is just more of a "fact"; the row pieces will likely be on the same block.

but basically "there isn't a thing you can do about it". Keep that in mind.

Why?

A reader, August 22, 2005 - 5:22 pm UTC

>>2) anything over 255 columns will be chained.

Does the chaining depend on the data size and the available space in the block? Why does it depend on the number of columns? What if I have a 300-column table but I only use the first 2 number columns? Is it still going to chain?



Tom Kyte
August 24, 2005 - 3:29 am UTC

anything over 255 columns will be chained, regardless of size.

why? because that is the way it is implemented (255 used to be the limit)

if you have 300 columns defined and only use the first 2, there will be 298 "missing" columns and the row won't be chained, since trailing null columns are not stored in the row.

A reader, August 31, 2005 - 2:32 pm UTC

How about if I add data to the last 2 columns in a table with 300 columns?

Tom Kyte
August 31, 2005 - 3:28 pm UTC

then there will be 300 columns all filled in (the intervening columns will be NULL - the last two filled in)

it'll be chained.
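(This is easy to verify -- a sketch of the experiment, using an invented 300 column table T300:)

declare
    l_sql varchar2(32767) := 'create table t300 ( c1 number';
begin
    for i in 2 .. 300 loop
        l_sql := l_sql || ', c' || i || ' number';
    end loop;
    execute immediate l_sql || ' )';
end;
/

insert into t300 ( c1, c2 )   values ( 1, 1 );  -- trailing nulls not stored: one row piece
insert into t300 ( c1, c300 ) values ( 1, 1 );  -- all 300 column slots used: chained
commit;

analyze table t300 compute statistics;

select chain_cnt, num_rows
  from user_tables
 where table_name = 'T300';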

Could not agree more

Pet, August 31, 2005 - 6:35 pm UTC

I've seen this kind of attribute-driven table at a lot of client sites, and I told them not to make it complicated if they don't have to. These kinds of attribute-driven tables are for software products like Oracle Apps, where they need a flexible data model for different places and different needs. Thanks

Some examples of tables with LOTS of columns

Michael Friedman, November 27, 2005 - 11:28 am UTC

Let me give you some examples of tables that require lots of columns:

1. We are implementing multi-lingual Oracle Forms. We've played around with Oracle's various translation tools and concluded that they impose too much overhead - we need one version of the Form and the ability to make it change language at will.

We are doing this using a table with 1,000 columns. The first few columns indicate the Form, the Language, and a few other things. The remaining columns are 900+ Form text items - labels, tool tips, etc.

By keeping them all in one row we bring them back with a single select and can automatically load them right into the labels with no extra coding. This has a major performance benefit.

A second use is something similar to Oracle Applications' Flexfields. Our analysis of our customers is that almost all of them are satisfied with three languages for data. We have one customer who requires four languages, so we're going to implement a system that supports 5.

We need configurable columns like the SEGMENT_xx columns in Oracle Applications, but we need to support multi-lingual data and also configurable image fields. Therefore our configurable field sets are going to have a bit more than 180 columns in them:

FIELD_[00-30]_L[1-5], FIELD_[00-30]_BLOB, and a few housekeeping items like a hashed concatenation of all FIELD_xx_Lx for each language, to index when the configurable Field Set must be unique.

Unlike Oracle, we don't distinguish between KEY configurable fields and non-KEY configurable fields. We just implement two field sets -- one for KEY, one for non-KEY.

If we decide to use a FIELD SET to implement application level functionality then we will also provide another FIELD SET for user level functionality.

What this all means is that some of our tables will have almost 600 columns that are just used to define FIELD SETS. That is in addition to any other columns they may have.

(And yes, we did think about splitting this out into separate rows for each language, but past experience is that this ends up being much harder to manage. For example, forcing the user to use at least one language in every row becomes very difficult.)