Question and Answer

Tom Kyte

Thanks for the question, John.

Asked: October 19, 2006 - 1:25 am UTC

Last updated: October 25, 2010 - 2:54 pm UTC

Version: 10.2.0

Viewed 1000+ times

You Asked

Hi Tom,

I love your site.

We're developing a new application, and the data architects have created an ER diagram for it. After looking at it, it appears to be a very generic data model (similar to what you have in your book where columns are made into rows, etc.).

I have warned them that it is going to have problems scaling, because a simple query is going to do unnecessary joins/etc ... but I'm not sure how to prove it because we don't have that much data yet.

At this point, they simply say that we've loaded all of our production data, they run their multiple table join query, and the results come back immediately. The execution plan appears fine.

What can I do to convince/prove to them otherwise? Do you have any actual examples that will show, from a performance perspective, how this model is not a good one? I have your book and I understand the SQL is harder to write as well, but that isn't something that will convince them.

Thanks.

John

and Tom said...

http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/


How about that one! Bad-carma - could be Good Karma for you!
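To make that point concrete against your own data, a minimal sketch along these lines (table and column names invented for illustration, not taken from the article) shows how a generic name/value model turns one simple question into a pile of self-joins:

-- a conventional design: one entity per row, real columns
create table emp_conventional (
  empno    number primary key,
  ename    varchar2(30),
  sal      number,
  hiredate date
);

-- the "generic" equivalent: every attribute becomes a row
create table emp_generic (
  empno number,
  attr  varchar2(30),
  val   varchar2(4000),
  primary key (empno, attr)
);

-- conventional: one row source, easy for the optimizer
select ename, sal
  from emp_conventional
 where hiredate > date '2006-01-01';

-- generic: three passes over the same table just to reassemble
-- one logical row, plus to_date/to_number conversions that rob
-- the optimizer of useful statistics
select n.val as ename, to_number(s.val) as sal
  from emp_generic n, emp_generic s, emp_generic h
 where n.empno = s.empno
   and s.empno = h.empno
   and n.attr = 'ENAME'
   and s.attr = 'SAL'
   and h.attr = 'HIREDATE'
   and to_date(h.val, 'yyyy-mm-dd') > date '2006-01-01';

Load both designs with the same production volumes, run both queries with autotrace on, and compare the plans and the logical I/O side by side - that is the kind of specific, reproducible example the question was asking for.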


Comments

Generic

H, October 19, 2006 - 1:54 pm UTC

As soon as someone says it is or should be "generic", you should stop, take a deep breath and, if possible, reject it authoritatively - or escape from that mental hospital.

(speaking based on experience)

Tom --

John, October 19, 2006 - 5:26 pm UTC

Tom,

Thank you for that excellent article.

However, it won't really help me deal with this problem unless I can provide specific examples, which I'm not sure how to generate for this particular issue.

Do you have any ideas/examples that would help?

Tom Kyte
October 20, 2006 - 4:03 am UTC

umm, did you read that article? It *was the specific example*.

Generic always evil?

Vega, October 20, 2006 - 4:29 am UTC

Can you think of any approach where generic would be the right one?
Let's say your business is handling documents from different clients (physical paper ones or digitized ones). There are lots of different documents, and different specs about which information should be stored depending on the type of document for every client. Wouldn't a generic DOCUMENT table be the right approach? I know the design would have more tables, of course, but with one like this as the main table to store the information.

What's your opinion?

Tom Kyte
October 20, 2006 - 7:09 am UTC

my opinion would be that.......


you'd have to do "flex fields" - a couple of extra columns at the end - and pray for the best.
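A minimal sketch of what "flex fields" usually look like (column names invented for illustration): the attributes common to every document get real columns, and a handful of spare, typed columns absorb the per-client, per-type variations:

create table document (
  doc_id      number primary key,
  client_id   number not null,
  doc_type    varchar2(30) not null,
  received_on date,
  title       varchar2(200),
  -- the "flex fields": spare columns whose meaning is defined
  -- per doc_type, documented in a small metadata table
  attr1       varchar2(100),
  attr2       varchar2(100),
  attr3       number,
  attr4       date
);

The "pray for the best" part is that the meaning of attr1..attr4 lives outside the table definition, so every query against them has to know which doc_type it is dealing with.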




XML is generic

Andy Mackie, October 20, 2006 - 7:42 am UTC

An XMLType column is very flexible - it lets users store whatever new data they can think of, without having to restructure the database all the time, and without having to get DBAs involved whenever some new piece of data needs to be stored. Of course, the fun begins when you want to query it - but then Oracle has XPath and/or XQuery for that.

This is a never ending debate. There's no getting away from the fact that there are *business* requirements for users to be able to specify what data they want to store.

Whether that is then implemented using XML, or plain SQL "generic data models" (which run like a dog), or by an application issuing DDL statements all the time to add new columns (try getting that past a DBA!), does not change the fact that it is business users who are asking for this functionality.

Tom Kyte
October 20, 2006 - 7:57 am UTC

... Of
course, the fun begins when you want to query it ...

nail
head
hit


... does not change the fact that
it is business users who are asking for this functionality. ...

and we must fight them unless and until it is absolutely found to be a true requirement.




Generic

A reader, October 20, 2006 - 9:04 am UTC

Purportedly the Kalido ETL tool uses a generic "thing" table to store both data and metadata.
http://utopia.csis.pace.edu/dps/2007/gghare/references/Kalido/1kalido_dwlm_white_paper.pdf
"From a technical point of view, it may appear that generic data storage will impair the performance of the data warehouse. Empirical evidence suggests otherwise: commercially-available relational databases have been shown to perform as well, or very nearly as well, with generic designs as with traditional designs. Where performance is an absolute priority, a hybrid design may be considered, in which the performance-sensitive portions of the data warehouse are handled conventionally, while the rest of the data is handled generically."

Tom Kyte
October 20, 2006 - 9:26 am UTC

so does oracle, we call it a database :)

so, you really believe:

<quote>
...
If it were possible for business users to manage change in a data warehouse themselves, this would eliminate the need for lengthy consultation between the business and technical functions, and enable data warehouses to be far more adaptive. Even where only a limited IT knowledge is required, the business function is always looking for more productive and less error-prone ways to keep a data warehouse current. The ideal solution is to raise the level at which a data warehouse is configured, going from the physical database level to the level of the business model. When this is done, business users can modify data warehouses themselves without resorting to highly skilled IT development resources. It should be possible, for example, for a financial controller to change the definition of net profit and to add some new budget data to a data warehouse, confident that the results produced will carry out currency conversion and time variance and all the other complexities correctly.
...
</quote>

I'd like to see that "in action"

generic?

Duke Ganote, October 20, 2006 - 9:48 am UTC

It would be interesting to see a TPC benchmark on generic ("configured") vs custom ("tailored")!
http://www.tpc.org/tpch/default.asp
I like the quote from your cited URL:
"early Computer-Aided Software Engineering (CASE) ... included a table named SDD_ELEMENTS that contained both data and metadata indistinguishably... [BUT] what works very well for a CASE application supporting 1–5 developers who think a lot and type only a little may not work well at all for a mission-critical order-entry application supporting 300–400 reservation agents who think very little and type very rapidly. The database workload generated by a small number of users generating transactions intermittently has little in common with the database workload generated by an order-entry application."

A reader, October 20, 2006 - 11:10 am UTC

Vega asks "Can you think of any approach where generic would be the right one?"

This object/object-attribute structure seems like it would be useful when used by an Oracle Text user-datastore to construct a logical XML document for text queries.

Provided it is only used for that purpose, and only for Text queries, I think its use is valid.

Tom, what do you think?


Tom Kyte
October 20, 2006 - 11:18 am UTC

why not just store xml then.

A reader, October 20, 2006 - 11:56 am UTC

I was thinking that the db would want some control over what shape the document would have and which attributes were included without incurring the cost of XSLT.



Tom Kyte
October 20, 2006 - 12:24 pm UTC

why would you use name/value pairs and incur the overhead of name/value pairs - what "overhead" have you avoided exactly?

A reader, October 20, 2006 - 12:46 pm UTC

I was referring to what I expected to be the greater processing cost of XSLT.

But if the database can perform an XSLT transformation of an XML clob more quickly than one could construct the same from an attribute table, great.


Tom Kyte
October 20, 2006 - 1:21 pm UTC

you have the ability to function base index the xml
you have the ability for the xml document to be shredded upon input and stored structured
you can leave it as a clob
you can text index ALL attributes for "fast" retrieval
you can text index SOME IMPORTANT attributes for "fast" retrieval by them

not sure where xslt would come into play though, we don't really need to "transform" the document.
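As a minimal sketch of two of those options (table name and document structure invented; extractvalue() shown as it was used in this era of Oracle XML DB):

create table docs (
  doc_id number primary key,
  doc    xmltype
);

-- function-based index on one important attribute
create index doc_title_idx
    on docs (extractvalue(doc, '/document/title'));

-- the index can now service this predicate directly
select doc_id
  from docs
 where extractvalue(doc, '/document/title') = 'Annual Report';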

A reader, October 21, 2006 - 3:14 pm UTC

yes, xml and xmltype offer many options.

but in this case (the generic table thingy):

- "you have the ability to function base index the xml"
yes, function based indexes are possible (but when using extractvalue() and friends, does Oracle parse xml once/insert-or-update or once for each index?)

- "you can leave it as a clob"
shredded xml is not possible because an xml schema cannot be registered in this context,

- "you have to use a clob"
you have to,

- "you can text index ALL attributes for \"fast\" retrieval"
yes, if that means using a CTXXPATH index; but, doesn't that index the tag names and not the values,

- "you can text index SOME IMPORTANT attributes for "fast" retrieval by them"
not really possible without an xml schema.

as far as my comment to xslt goes, I was assuming that different structures of objects were being serialized and it's desired to have some sort of consistent query.

I was only pointing out that Oracle Text (on its own) can be used as a powerful query tool, and the o/oa structure can be used to construct something to query. I think storing XML and querying with XPath or XQuery is often overrated or overused. I like the Oracle Text-over-XML facility.

Btw, when is Oracle going to allow limited numeric queries in Text? That'd be very powerful and pretty doable, I think.
ie something like:
contains(doc, 'yadda yadda and (expr(gt, 20061001) within thedate)') > 0







Tom Kyte
October 21, 2006 - 3:20 pm UTC

indexes are maintained only during the modification, so the extractxml is performed once on the insert and not during the retrieval.

you can do section searching on the XML as well. You can get the tags and content.



If you have different structures - why bother with a consistent structure, you just said "different", not sure what that would buy you.
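For reference, a sketch of that section searching with Oracle Text on XML stored in a clob (table name and document structure are made up):

create table articles (
  id  number primary key,
  doc clob  -- XML stored as plain text
);

-- AUTO_SECTION_GROUP turns every XML tag into a searchable
-- section, so both tags and their contents can be queried
create index articles_ctx on articles (doc)
  indextype is ctxsys.context
  parameters ('section group ctxsys.auto_section_group');

-- find documents whose <title> section mentions oracle
select id
  from articles
 where contains(doc, 'oracle within title') > 0;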

A reader, October 21, 2006 - 6:19 pm UTC

- "indexes are maintained only during the modification, so the extractxml is performed once on the insert and not during the retrieval."
yes, but if I have two function-based indexes (using extractvalue or friends), will the clob be xml-parsed once or twice? I hope once. XML parsing is pretty heavy.

The consistent query model business was just wanting to normalize what made sense. So that, say, searching in a title would hit both newspaper articles and books (which may have been serialized differently).

[Tom, fwiw, I *really* appreciate this dialog. (Who else does that?) It's pretty interesting and has caused me to explore corners of Oracle I had not considered.]



Tom Kyte
October 22, 2006 - 7:39 am UTC

it would be done twice, you are running extractxml twice.

But your model isn't really "normalizing" anything - data normalization is part of the design phase of your data model - and there "isn't a data model here", just layers upon layers of abstraction :)

It isn't fair to introduce the term "normalization" into "generic storage"


I would say in your example - documents have titles.... structured data.

no limit to how to wreck a good car ...

Gabe, October 22, 2006 - 1:43 pm UTC

<quote>(Who else does that?) It's pretty interesting and has caused be to explore corners of Oracle I had not considered.]</quote>

It had started with [relational] data models, went through generic models and the utopia of Kalido's marketing machine, only to get swamped in a sea of acronyms all starting with 'X'. How much better can it get?

Normalization <> Abstraction/Generalization

Duke Ganote, October 24, 2006 - 5:53 pm UTC

One of my pet peeves is misusing the term "normalization" to refer to abstraction or generalization. Thank you for pointing out the difference! They are orthogonal / independent concepts!

Normalization <--> denormalization
Abstraction <--> concretization (it's a real word!)

Fit the solution to the problem

Michael Friedman, November 26, 2006 - 5:28 am UTC

The famous SDD_Element table in Oracle CASE is a generic model... and it's well suited to its target application.

As the Bad CaRMa article points out, it would not be good for a transactional system.

Oracle Applications Flexfields are another less generic but still quite generic solution that work well in transactional business systems. Huh, fancy that.

The Oracle DBMS itself has an extremely generic data model tucked away in its internals. Surprise surprise.

Most serious applications have at least some generic functionality mixed in - you need some kind of generic fields or flexibility to allow minor changes in the data model and business practices without calling in the programmers.

The key is to balance the cost of generic designs with their benefits, to use them where they are most valuable, and to make them as cheaply generic as possible given the business requirements.

Tom Kyte
November 26, 2006 - 9:38 am UTC

the oracle dbms has a generic data model where pray tell?




General internal model

Michael Friedman, November 26, 2006 - 12:27 pm UTC

I am referring to the internal tables that views like ALL_TABLES, etc. are based on.

Look, for example, at sys.obj$.

To give people an understanding of how many different distinct database objects are stored in sys.obj$, here's an excerpt from the definition of the view ALL_OBJECTS:

decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER',
4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE',
11, 'PACKAGE BODY', 12, 'TRIGGER',
13, 'TYPE', 14, 'TYPE BODY',
19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB',
22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE',
28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE',
32, 'INDEXTYPE', 33, 'OPERATOR',
34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION',
40, 'LOB PARTITION', 41, 'LOB SUBPARTITION',
42, NVL((SELECT distinct 'REWRITE EQUIVALENCE'
FROM sum$ s
WHERE s.obj#=o.obj#
and bitand(s.xpflags, 8388608) = 8388608),
'MATERIALIZED VIEW'),
43, 'DIMENSION',
44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN',
48, 'CONSUMER GROUP',
55, 'XML SCHEMA', 56, 'JAVA DATA',
57, 'SECURITY PROFILE', 59, 'RULE',
60, 'CAPTURE', 61, 'APPLY',
62, 'EVALUATION CONTEXT',
66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW',
72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN',
81, 'FILE GROUP',
'UNDEFINED'),

Tom Kyte
November 26, 2006 - 1:12 pm UTC

They are not a "generic" model however - an object is an object is an object.

This is not a look up table either.

There are specific subtype tables for most of those "objects".

There is a very specific data model designed for physical access, and a set of views on top to make it easy to query.

What is Oracle's stance?

John, April 06, 2007 - 6:41 pm UTC

It would be nice if Oracle had a published white paper on the pros and cons of generic models, and what kinds of performance problems can come out of generic modeling.

John

Speaking of Generic Data Models...

Mark, April 05, 2008 - 11:22 am UTC

Watch out Tom, a new competitor is in town: http://www.amazon.com/b?ie=UTF8&node=342335011

Some quotes:

"Amazon SimpleDB differs from tables of traditional databases in several important ways. First, you have the flexibility to easily go back later on and add new attributes that only apply to certain items - for example, sleeve length for dress shirts. Additionally there is no need to pre-define data types. If you have an attribute called "size," it can have values of 10.5 for shoes and XL for sweaters, making the service extremely flexible and easy to use."

"Amazon SimpleDB automatically indexes all of your data, enabling you to easily query for an item based on attributes and their values. In the above example, you could submit a query for items where (color = blue INTERSECTION description = dress shirt), and Amazon SimpleDB would quickly return item 456 as the result."

"Amazon SimpleDB requires no schema, automatically indexes your data and provides a simple API for storage and access. This eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon's proven computing environment, are able to scale instantly, and pay only for what they use."

I suppose it could work for some, well, very simple databases (ahh, I get the name now!)...The funny part is there's kind of a buzz about this new database being revolutionary, and I've even read comments that Oracle and MySQL should be worried...

Non-Generic 104 Column Fact Table

Arvind Mishra, April 30, 2009 - 7:35 pm UTC

Hello Tom,

I am designing a dimensional database model for a scientific research org. I have a fact table, item_fact, in which we are storing item_id and different physical characteristics of the item. Since the physical characteristics change all the time, we cannot have fixed columns in this fact table to store them. As I am fully aware of the drawbacks of the generic model (thanks to YOU), I have decided to go for all possible columns + some flexi columns in the fact table. Now I can design the fact table in two ways:

a) Give all columns meaningful names, e.g. item_id, color_after_boil, color_before_boil....
But then I am ending up with 104 columns in the fact table.

b) If I do not use proper column names, for example color_after_boiling, and use col1 for it, then I can store values in 19 columns of different data types. In this case my fact table will be item_id, col1, col2, col3......col20, but I will lose the meaningful column names.

My questions are:

1) Between a and b, which one is the better approach? Is there any way I can avoid losing the meaningful column names?

2) Is a 104-column fact table a good idea? What will be the impact of it on storage and full table scans?

We are using OBI for reporting purposes.

Thanks,

ARVIND
Tom Kyte
April 30, 2009 - 9:37 pm UTC

104 columns is not really too large (1000, that would be).

There will be the overhead of the null indicator for data that is not there - so sure, each row will be a little larger than it "could be" if you packed the information in there - however, this is just 100 columns.

I'd be very tempted to go with 100 columns


I think if you went with c1, c2, c3, .... like that, it would be more than just c1, c2, c3.... - you'd need to embed metadata into the row to describe what you are looking at, wouldn't you?

Non-Generic 104 Column Fact Table

Arvind Mishra, May 01, 2009 - 7:12 pm UTC

Sorry for asking a basic-level question, but how can I add metadata?

Thanks for your quick response.

Regards,

ARVIND
Tom Kyte
May 02, 2009 - 12:05 pm UTC

you would have to model it in there somehow, you would need to store metadata somewhere that would tell you "when looking at c1, for a row with x=<something>, then c1 really means..."

YOU would have to figure out how YOU wanted to model/add the metadata - that is a big problem with the c1, c2, c3, ... method
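A minimal sketch of what that embedded metadata could look like (entirely hypothetical names - this is the extra machinery the c1, c2, c3 approach forces you to invent and maintain):

-- what each generic column means for a given item type
create table fact_column_meta (
  item_type   varchar2(30),
  column_name varchar2(30),   -- 'C1', 'C2', ...
  meaning     varchar2(100),  -- e.g. 'color_after_boil'
  data_type   varchar2(10),   -- 'NUMBER', 'DATE', ...
  primary key (item_type, column_name)
);

Every load, query, and report must first consult this table to learn that, say, C1 means color_after_boil for one item_type and something entirely different for another - which is exactly the meaningful-column-name problem reappearing one level up.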

Non-Generic 104 Column Fact Table

Arvind Mishra, May 02, 2009 - 7:49 pm UTC

THANKS - A - LOT----> to YOU.

(YOU == GREATEST PERSONS. You is for others) :-)

Regards,

Arvind

1 table bad, 4+ tables good

Mike Kutz, May 06, 2009 - 2:57 pm UTC

From experience, a "single table that does everything" is about as useful as a java/c/c++ "one-liner".

For Arvin, my experience with writing scientific applications:
A view/materialized view like
item_id, test_name_a, test_name_b, color_before_boil, ...
is better for the non-SQL savvy end-user who is accessing the data through a read-only ODBC connection (eg Excel, SAS, other statistical software packages). This is usually used for the warehoused/published data.

2nd) due to our business requirements, this design sucks for the application that collects, analyzes, runs through 'approval' process, and 'publishes' the data.
-- reason 1) each c1, c2, c3, ... requires ~14 columns
-- reason 2) on rare occasions, we would have to "DEL COLUMN" in this design
-- reason 3) i'm not lucky enough that c1 has one and only one result value per test run. (which increases column count)
-- reason 4) it appears that i'd be trying to make a single table that does everything (a la Vision)
-- ** i do use this design for consolidating and reporting data.

3rd) (for John also) (NOTE: I am not an expert)
From experience, if and only if the table has a very well-defined/specific reason in life, then a table like:

create table results_for_physical_properties (
  rfpp_data_index   integer primary key,
  some_foreign_key  integer,
  item_id           integer,
  physical_test_id  integer, -- some tables use varchar2
  result_value      float    -- some tables have result_x, result_y
                             -- other tables need (result_value varchar2)
);

works great, especially when (item_id, physical_test_id) is known to be non-unique (i.e. item_id=x was tested multiple times for test_id=y). For me, what makes this table specific is that I have another set of tables that contains a table called "results_for_chemical_properties".

If I'm understanding correctly, what would be BAD would be a design like:
create table any_result (
  result_type int, -- 0 == physical, 1 == chemical, ...
  item_id     int,
  test_id     int,
  {some set of columns whose meaning changes based on "result_type"} -- this is the part that makes it bad
);

I would love to hear everyone's comments. Especially Tom's.

Thanks,

Mike Kutz

Tom Kyte
May 11, 2009 - 10:10 am UTC

I think tables should be specific, not generic... Yes, I agree - that result_type in the last example - bad idea in general

Generic table, specific views

A reader, August 10, 2009 - 5:34 pm UTC

I have a developer who would like to create a generic table with, say, 10 columns each of string, number and date datatypes, plus a "type" column and a sequence-generated PK. There would be an index on type+pk, and that would be the only access path.

He would then create specific views on the table depending on specific uses of it.
create view cust as 
select
 id cust_id,
 cast(c1 as varchar2(30)) cust_name,
 cast(c2 as date)         dob,
 cast(c3 as number(10,2)) some_numeric_attribute,
 ...
from generic_table
where type='cust'


Your thoughts? Thanks
Tom Kyte
August 13, 2009 - 9:01 am UTC

Unless they only access this table via the primary key - that is, unless the ONLY predicate is "where cust_id = ?" - tell this 'developer' to go back and do some design.



In other words

JUST SAY NO - it would be a stupid idea. To be blunt. Stupid.

and it buys nothing - they already know what the columns should be (they create the views, for goodness sake). Just use *tables* that can actually be indexed, sized, monitored, tuned, reorganized, etc.

Where in the saddle do you want to put your app

Tony Fernandez, October 19, 2010 - 4:31 pm UTC

This is another example of the old dilemma between the normalization and de-normalization forces at design time. Each application has its own sweet spot in the saddle, but it is never at the extremes.

One table = full denormalization, this is extreme.

Too many tables = full normalization, this is the other extreme.

How can a developer be given a chance to implement any of the extremes?

A fundamental principle in the design of any tool used by humankind is that the extremes are not where you want to be if optimal performance is among the goals.
Tom Kyte
October 25, 2010 - 2:54 pm UTC

this is not another example of whether to normalize or not - this is as far from that discussion as possible.

They want *A SINGLE TABLE HOLDING EVERYTHING*. We are not talking about de-normalizing the department name into the emp table...

Here's my only thought on when to let these in.

Galen Boyer, October 25, 2010 - 3:16 pm UTC

Business users will yell and scream that they need the ability to define attributes about data, especially when that is how the current application works (i.e., you take it over and they already have these name/value pairs). Okay, so what really happened? How come these things exist? It's because we, the developers, can't react fast enough to support all the business's needs RIGHT NOW!!!!, nor should we react that fast. A design change should mean something. Like in Excel, they think they want to add a new column to a worksheet and start maintaining it on the rows in that sheet so they can then filter on that column for their daily work. We need to be able to give them that new column basically right when they ask for it. But we can't. We need requirements gathering, design, testing, and then deployment. So, can we bridge the gap? I say, okay, go ahead, let them add that new name/value pairing. Then start putting things in place which monitor these pairings when they show up. Triage them as they appear and as they grow. Then, when they actually, truly are needed, modify the data model and the dependent applications to support the new attribute. Update that attribute from the settings found in the name/value pairing. Then add yet another constraint on the name/value pairing that does not allow that type in anymore.

Basically, allow that to be the precursor to requirements gathering. When reports are needed with that information as part of the report, those name/value pairings shouldn't be generic anymore. They should have made their way into the model.

On the other hand, though, the users should have to answer for name/value pairings that aren't useful (i.e., the monitoring should also notice whether an NVP actually has many values and such). If not, you should have a purge agreement as well.
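A minimal sketch of what that monitoring could look like, assuming a hypothetical name/value table (table and column names invented, not from the post above):

-- the name/value pair table the users are allowed to populate
create table extra_attributes (
  entity_id  number,
  attr_name  varchar2(30),
  attr_value varchar2(4000)
);

-- triage report: attributes used on many rows are candidates for
-- promotion into the real model; attributes with few rows or no
-- distinct values are candidates for the purge agreement
select attr_name,
       count(*)                   as rows_using_it,
       count(distinct attr_value) as distinct_values
  from extra_attributes
 group by attr_name
 order by count(*) desc;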