The Tom Kyte Blog: This should be fun to watch...

Wednesday, January 07, 2009

This should be fun to watch...

On reddit - you can post a 'question' to the community. Someone just posted "What is the worse design decision that you have made ?" My favorite comment on that thread so far is this one:

It started with an thought not unlike the following:
"Nah, flat-file should be fine."
It ended in tears. And a double digit load average.

Been there, seen that happen. My worst design decision I personally made was when I "invented" (see below for why I say "invented") the "funky data model" - better known as an EAV (entity-attribute-value). You know, the extensible model where all you need is four tables:

objects
attributes
object_attributes (objects is 1:M with object_attributes)
links (links objects to objects, an association table)

Man, I could store *anything* in there. And it was very secure - because trying to retrieve anything was really hard. Not only really hard, but really slow.

But the prototype/demo was awesome. It never worked in real life though.

I used the term "invent" above. Of course I didn't "invent" the EAV - it has existed as a concept for a long long time. But, I see it get re-invented in relational database applications over and over and over again. And every developer thinks for a short period of time "hey, this is so cool - I wonder why no one else has thought of this - they must not have thought of it or everyone would be doing it - I'll be famous".

And then they learn why not everyone is doing it :)

33 Comments:

Rahul said....: Tom,

Definitely, not the way to go. I understand.

But, here is my scenario:

We are buying a product which requires you to give the data (from Oracle) in attribute,value pair for it to work (it's pretty much a printing application).

But, our tables are from oracle apps. So, I am assuming that I may have to write a pivot query (10gR2), for my program to work, right?; Wed Jan 07, 12:30:00 PM EST
tnourse said....: After your example, my next favorite is one not by me, but one that I inherited... it was a 3 TB "data mar" designed by some old SAS programmes involving highly relational medical data, but due to their SAS background the tables looked exactly like the SAS files used to load it with nary a key or index to be found...duplicate data to the max, full table scans no matter what and terrible response times. Millions had to be spent on hardware to solve the "performance problem" and no, they would not even consider a redesign of their "perfect" data model...; Wed Jan 07, 12:32:00 PM EST
Thomas Kyte said....: @Rahul

not sure what you mean - but - pivot/unpivot is new in 11g.; Wed Jan 07, 02:14:00 PM EST
John Spencer said....: I kind of liked this one

"well, it wasn't my design decision, but we decided to have Oracle triggers creating tables on the fly for user-defined data...friggin' nightmare"

I've seen this one too, and nightmare is not nearly strong enough; Wed Jan 07, 03:59:00 PM EST
David Aldridge said....: Yes, EAV crops up all the time, especially on sites and forums not dedicated in particular to databases. I always Google up Tim Gorman's article "Bad CaRMa" (sic), a wonderful story of EAV gone bad and post a link to it.

http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/; Wed Jan 07, 05:10:00 PM EST
Noons said....: Assigning a meaning to a NULL value.

Doesn't everyone?

(g,d&r); Wed Jan 07, 07:16:00 PM EST
galactic_hitchhiker said....: Tom,
Even Oracle Applications (Oracle E-Business Suite) has something like this. It is called the "Extensible Attribute" feature. It started off in the PLM (Product Lifecycle Module) and is now also one of the selling points of the Trading Community Architecture.
It has its uses but it is very close to the kind of monstrosity you said. If I remember correctly, It has 50 generic varchar2 columns, 50 number columns and I think 20 or 30 Date columns for "any kind if data extensions" you could care to add to your definition of a customer or Item etc.; Wed Jan 07, 11:05:00 PM EST
Thomas Kyte said....: @galactic hitchhiker

Yes, I know, but at least it is not a 1:M relationship - but rather attributes of a row that could actually be indexed and whatnot if/when needed.

Not so for the four table "funky data model"; Thu Jan 08, 06:36:00 AM EST
knash said....: Oops, you caught me, I was one of them once... Years ago I started a new ER modelling tool model...was very proud of this meta-meta-model...and saved the model as "this_is_it"... wondering why no one else thought of it before...must be here somewhere.; Fri Jan 09, 01:50:00 AM EST
Fraber said....: Hi,

I was able to avoid the EAV thanks to some great masters...

But after ~15 years in IT and databases, I finally found a way to make EAV work - It's called a "SQL metamodel". Basically, you take your objects and attributes and create SQL tables from it.

However, working with SQL metamodel is quite complicated and unsuitable for the usual PHP/VB coder, so you got your downside again.

Checkout http://www.project-open.org/doc/intranet-dynfield/

Cheers!
Frank; Fri Jan 09, 10:30:00 AM EST
srr said....: I use this EAV model to create and store billing documents data in highly customizable environment, and it works quite well in this case. Because it's needed to get all attributes and all links from object in almost all cases.; Sat Jan 10, 04:30:00 PM EST
Thomas Kyte said....: @srr

please find me all bills for people living in Texas that are 20 days past due and include a purchase of gum.

After that, I'll have 15 more ad-hoc queries to ask - all spanning attributes that are in the EAV.

The only time an EAV works is when you always do a keyed read by some primary key - eg: find me the information for bill = 12345. As soon as you start asking anything else - bam.

EAV's work well for one thing, storing an 'entity and its attributes by a key'. They are not useful for retrieving entities by their attributes (what we normally do with database applications); Sat Jan 10, 04:39:00 PM EST
galactic_hitchhiker said....: Tom, regarding the Extensible Attributes in EBS, it allows 1:M too.
In fact we are using it that way in a current implementation.
Oracle does have a feature to automatically create view(s) to abstract out all the complex definitions...which makes writing reports and queries against it a bit easy but the resulting query is not exactly easy to digest.
Our Customer Master is not very large and is not going to grow that big too so we do not see a lot of issues and can definitely add more custom indexes (indices?). And yes, the extensible attribute are indeed all keyed to at least the organization profile (in our case).
I believe the objective may have been to make it more "customizable" from the front end rather than have to do backend additions like a custom table etc but yes the approach is not elegant. Maybe for a very small domain with very well defined growth characteristics. I definitely wouldn't use it as the transaction master for a billing system.; Sat Jan 10, 05:58:00 PM EST
nancyc said....: Whew, I thought I was the only one who "invented" something and then realized later that my invention was a complete albotross around that particular database's neck! I invented the "lookup" table. A place for one-stop shopping of all picklists for the web front-end. Any table that had a short and long description found its data in this table. Then, this table was the parent to any table that had a particular dimension of that flavor. The Java team I was designing for thought I was great! One-stop shopping, yeah!!! But, guess what, one could easily put "account type codes" in the "customer type code" attribute and the database wouldn't think anything was wrong with that.; Thu Jan 15, 10:57:00 AM EST
Galen Boyer said....: "nancyc" is actually Galen Boyer from Boston. Sorry about the need for this follow-up.; Thu Jan 15, 11:01:00 AM EST
srr said....: @Thomas Kyte

I completely agree with you - searching the data in attributes is a very hard task. But I store common attributes in normal tables - bill date, number, customer, status and etc. And yes, fetching all attributes by key used more often then any other way to access this EAV data.
Another task is to make a report wich contains some set of attributes from selected bills. This task shows the weakness of EAV, but I expected it and it's ok. And the rarest task - to find a bill with some attribute (and always for the known customer).; Sun Jan 18, 07:31:00 AM EST
Wollaton DBA said....: Hi Tom,

I came across this web site this weekend which details architectures for scaling web 2.0 social network web sites. A common practice in scaling the backend appears to be use "Data sharding": http://highscalability.com/unorthodox-approach-database-design-coming-shard. Is this one worse design practices being rolled out right now ?.

As soon as I saw this, I thought, I'd love to get Tom Kyte's view on this as he is always quite passionate about avoiding the take advantage of nothing approach when using databases.

Off the top of my head it appears to me that this falls apart if you need to aggregate data above the 'shard' layer. It also seems to me that there is the potential for reinventing all the good stuff that is available in most database offerings in the application tier.
When most of the major players in the RDBMS market have spent the best part of the last 20 years or so refining and honing their products, why would anyone think they could do a better job on one project ?.

I look forward to getting your take on this.

Chris; Mon Jan 19, 07:15:00 AM EST
Anonymous said....: There is an enhanced eav model and it's called the semantic web. Oracle supports it too. How does that perform?; Sat Jan 24, 02:46:00 PM EST
Thomas Kyte said....: @Anonymous

calling the semantic web an 'advanced EAV' is like calling - we'll, I'm actually hard pressed for an absurd enough analogy....

Can you show me a payroll application, HR application, time and attendance, shopping, banking, etc etc etc (you know, the applications we design and build) written in the 'semantic web'

Oracle supports EAV's
Oracle supports XML
Oracle supports OWL and other related technologies
Oracle supports thousands of things

that doesn't mean you should use them as the core of your application. It doesn't mean you might not have an application that could use them. It means they are all tools - use wisely, with knowledge of what you are actually doing and a good model of how it will work in real life when you are done.; Sat Jan 24, 06:58:00 PM EST
Anonymous said....: " Can you show me a payroll application, HR application, time and attendance, shopping, banking, etc etc etc (you know, the applications we design and build) written in the 'semantic web'"

I've build such apps for 11 years with Oracle and no I didn't want to use the semantic web for such applications. Those 11 years where a great time but now it is time for something else. Such apps are no longer the apps I build.

I don't understand why the semantic web can't be seen as an enhanced eav model? Everything is stored in triplets in the semantic web.; Tue Jan 27, 11:41:00 AM EST
Thomas Kyte said....: ok, show me any typical application written in the semantic web. So, what applications are you building (throw us a bone, you have told us that conventional applications are not what you are building - so, what are you doing with it.

And don't you think that and EAV is to semantic web what ascii is to XML - slightly different scales, different scope, different goals, just plain different. EAV = name+value.; Tue Jan 27, 12:19:00 PM EST
Brian Tkatch said....: I just started on a project where we're supposed to accept values for some hundreds or so attributes from ~5000 entities, and the attributes can change on a yearly basis. The changes are probably less than 10% of the attributes.

I'm imagining the only way to do this is with an EAV TABLE. Indeed, the flat file that provides that data is one line per value.; Wed Feb 04, 05:18:00 PM EST
Thomas Kyte said....: @Brian

or just save it in a glob of XML

Not sure why these people are using a database though...; Thu Feb 05, 06:55:00 AM EST
Brian Tkatch said....: @Tom

XML, that's an excellent idea. I'm going to see if that will work, thanx!

Why a database? It's all for the reporting. It's financial data, but which financial data they want, and what aggregates to be reported on change a bit yearly.

This is the beginning of a project, which is to rewrite "the old system", and consolidate, etc. I'm pushing for no aggreagtes, just the plain data, so the database can do what its made for. Right now it does not look like that will happen, but i'm not giving up.

I'm scared of what the reports will be like if we used an EAV.; Thu Feb 05, 12:38:00 PM EST
Thomas Kyte said....: @Brian

XML will do nothing for performance, it'll just make your life easier.

It'll still be "end user hell" for anyone actually trying to create a report.; Thu Feb 05, 12:42:00 PM EST
Brian Tkatch said....: @Tom

Performace will be that bad? Maybe it isn't such a good idea then.

The writes are currently monthly, though not all on the same day of the month. Reporting is numerous times daily, either on a lot of data from one entity, or a smaller set for multiple entities.

The data will need to be Timestamped, adding an extra COLUMN to the EAV. I'm wondering if the older data should be moved into another TABLE. There is so much to learn.; Thu Feb 05, 01:20:00 PM EST
Thomas Kyte said....: @Brian

do not EAV, use generic columns if you must, views on top of that.

At least then you would have a chance of indexing based on actual end user questions.

At least then you would have some structure on top of a mess.

I'd rather you built a system that would "create the views we need based on real tables" rather than create a system that maintains data in an EAV; Thu Feb 05, 01:37:00 PM EST
Brian Tkatch said....: @Tom

Currently, there are ~5000 attibutes, and my understanding is, it will continue to be so. That would mean, create 5 TABLEs, with 1000 COLUMNs each. The data seems to be almost completely numeric (we haven't verified that yet, but that's what the sample data is), so they could all be NUMBER(x) (or if text is required, VARCHAR2(x)).

Hmm... if there are both, would it make sense to seaparate numeric from alphabetic in different TABLEs? Or is the conversion cost that negligable that it would not matter?; Thu Feb 05, 01:54:00 PM EST
Thomas Kyte said....: @Brian,

at 5000ish possible attributes, if you do not give some thought to design - not implementing a generic model, but actually getting the requirements - this is not going to work.

There is a small chance you can do something useful with section searching and text indexes on XML - especially if the data is loaded monthly (read the TEXT documentation).; Thu Feb 05, 02:06:00 PM EST
Brian Tkatch said....: @Tom

As usual, you are quite helpful. Thank you.; Thu Feb 05, 02:10:00 PM EST
Markku Uttula said....: In my workplace, we call this "EAV" with a different name - even though the idea is pretty much the same.

Every now and then I'm forced to writing memos to other designers and developers saying pretty much "please do not try to re-invent the 'one lookup table'-data model, it doesn't work". Usually the memo goes out for the latest beginner of the group.

It is interesting that all of us seem to be able to fall into this trap at least (and hopefully at most) once.; Mon Feb 16, 06:49:00 AM EST
Bill Dwight said....: Still happening...

http://bret.appspot.com/entry/how-friendfeed-uses-mysql; Sat Feb 28, 01:54:00 AM EST
Luis said....: Well... we use an EAV model in a system in which the business Entities have thousands of attributes, which are null most of the times...it works alright, it allows us to change the entities all the time (the system is chaning all the time) in a really easy way and to perform really complex data validation.

It is sluggish...but that fact has been mitigated with cached attribute lookup tables...

The team still thinks it was a great design decision...maintenance and enhancements are a breeze...

Luis; Mon Mar 23, 08:59:00 PM EDT