Ah yes... EAV (that was what I couldn't think of)
Robert, February 01, 2010 - 4:05 pm UTC
Tom,
What about index on (ATTRIBUTE, VALUE)
Then gather histograms.
That would be better than nothing, eh?
(we are stuck with this app)
Thanks,
Robert.
February 01, 2010 - 4:16 pm UTC
why would that help?
The problem with EAV is not getting *a row*, but rather putting together your row from the thousands of EAV rows that make it up and comparing that to other rows.
If you have a table object
and a table object_attributes
and into object_attributes you stick things like:
(obj=100,attr=first_name,val=thomas)
(obj=100,attr=last_name,val=kyte)
finding all of the thomas kytes and printing out their full name, address and so on requires a many-table outer join, done over and over, to pivot the data - and only then can a where clause be applied.
No indexing strategy will change that.
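To make that concrete, here is a minimal sketch of the pivot, using the object and object_attributes tables just described (the exact column names are assumptions):

  select *
    from (select o.id, fn.val first_name, ln.val last_name
            from object o, object_attributes fn, object_attributes ln
           where o.id = fn.obj(+) and fn.attr(+) = 'first_name'
             and o.id = ln.obj(+) and ln.attr(+) = 'last_name')
   where first_name = 'thomas'
     and last_name = 'kyte';

Two attributes already cost two self-joins; every additional attribute you want to display or filter on adds another.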
What about this....
Robert, February 01, 2010 - 4:35 pm UTC
Tom,
An index on (ATTRIBUTE, VALUE) with histograms will tell the CBO when to do an index lookup (when there are only a small number of rows for a given ATTRIBUTE and/or (ATTRIBUTE, VALUE) pair) and when to do a full table scan.
So it is not *totally* useless.
Thanks,
Robert.
February 02, 2010 - 10:51 am UTC
it is totally useless.
first of all, you will be using bind variables - so, think about your histograms now....
one day you'll full scan - for everything
next day you'll index range scan - for everything
and so on
can you imagine if you do this and the developers are tasked with "fixing things" - you know what they'll do, right? Remove the binds. Then you'll be out of the frying pan and into the fire big time.
And secondly, in an EAV, you pretty much always need to use indexes or full scan depending on the query, you are PIVOTING THE DATA.
select *
from (
select o.*, e1.val foo, e2.val bar .......
from object_table o, eav e1, eav e2, ... eav eN
where o.id = e1.id(+) and e1.name(+) = 'foo'
and o.id = e2.id(+) and e2.name(+) = 'bar'
and ...........
)
where some_eav_column = ?
and some_other_eav_column = ?
that'll full scan, it better full scan.
or,
select o.*, e1.val foo, e2.val bar .......
from object_table o, eav e1, eav e2, ... eav eN
where o.id = e1.id(+) and e1.name(+) = 'foo'
and o.id = e2.id(+) and e2.name(+) = 'bar'
and ...........
AND ID = ?
that'll index range scan, it better - every time.
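For that second pattern, the supporting index is the obvious one - a sketch, assuming the eav(id, name, val) table from the example above:

  create index eav_idx on eav (id, name, val);

  -- with val included, the pivot for a known ID can be
  -- answered from the index alone - no histograms needed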
Architectural Design
Steve, February 02, 2010 - 6:44 am UTC
Tom,
I have been working on Oracle since version 6, at many client sites, across many business areas.
I cannot remember a single database which did not have NAME-VALUE pair architecture in some schema.
Surely the numerous Technical Architects that have designed these disparate systems cannot all be wrong? I've been told repeatedly that as a Developer I need to "live with it", and that it is the most flexible way, involving the least coding and the least maintenance. These architects must also have very detailed Oracle SQL knowledge as they *know* you can access data very fast by adding indexes.
Should I come across an application where the Oracle database doesn't use this architecture I will let you know.
February 02, 2010 - 12:29 pm UTC
... Surely the numerous Technical Architects that have designed these disparate
systems cannot all be wrong? ...
why not? why the HECK not??? Seriously - just because something is commonly done, does that make it good?
Sorry, but that is a wrong way of thinking about it.
I do know that *every* developer is so very very very proud of themselves when they "invent" this model (I know for a fact that I was when I invented it in the late 1980's). Every developer invents this - thinking "I've done it, I've solved the insolvable problem"
... Should I come across an application where the Oracle database doesn't use this
architecture I will let you know.
...
and remember, I said it works in a fashion for one thing - storing attributes of a single object, in the context ONLY of that object.
You are using an EAV right now, right here, gasp - on this site. The session state is stored that way.
But fortunately, I never have to query over thousands/millions/billions of sessions that are on page 82 with a field1 value of X or a field2 value of Y.
I only have to ask "what is the current (name=PAGE) for (session=324124321)" or what is field1 for session 43434334 and so on.
the EAV works for that, to store the attributes of an object that are queried in the context of that object only.
The asktom application uses an EAV to store session state.
However, the real data, the important stuff - it is all about rows and columns, folks, all about rows and columns.
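In SQL terms, the only query that session-state EAV ever has to answer looks like this - a sketch, assuming a hypothetical session_state(session_id, name, val) table:

  select val
    from session_state
   where session_id = 324124321
     and name = 'PAGE';

One object, one attribute at a time - never a search across all sessions for some name/value combination.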
Funny Tom!
Robert, February 02, 2010 - 1:09 pm UTC
"...gasp - on this site." (funny stuff, Tom!!)
This website and Mr. Tom Kyte are like being enrolled at a University and having the opportunity to sit under one of the most revered and top professors there... honest, rigorous, lively, up front, world class.
Thanks again Mr. Tom!
Sigh...
Tim Hall, February 03, 2010 - 4:54 am UTC
The terms "Technical Architects" and "Database Designers/Modelers" are probably the most abused titles ever.
When they "design" nonsense like this the first thing you should do is ask them to write the SQL to query it. They will soon revert to using regular rows and columns.
Another thing not mentioned here, although I know is close to Tom's heart, is how you validate the contents of the data. You can't define check constraints and foreign keys on this stuff.
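To illustrate, here is a sketch of what the database can enforce with real columns versus a generic value column (all names are made up for illustration):

  -- real columns: datatypes, check constraints, foreign keys all enforced
  create table departments (
    deptno number primary key,
    dname  varchar2(30)
  );

  create table employees (
    empno  number primary key,
    salary number not null check (salary > 0),
    deptno number references departments
  );

  -- the EAV equivalent: every value is just a string, and none of the
  -- above rules can be declared per attribute
  create table object_attributes (
    obj  number,
    attr varchar2(30),
    val  varchar2(4000)
  );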
As Tom says, for the correct purpose it is fine, but most of the times I've seen this implemented it is because some "designer" thinks that one table can replace all their small lookup tables. Typically that person is not present to see the chaos that ensues when you try (and fail) to make it perform.
In the UK we have the expression, "polishing a turd". I think that applies here. :)
http://www.urbandictionary.com/define.php?term=turd+polishing
Cheers
Tim...
February 03, 2010 - 10:09 am UTC
eeeew - that expression is gross :)
Another case....
Robert, February 03, 2010 - 10:38 am UTC
In our situation the design is from a 3rd party vendor whose apparent purpose was to provide a flexible way for end users to modify the application easily.
Which purpose it achieves quite well... but with the negative side effect of poor scalability.
It seems that sometimes more money is available for bigger/faster hardware than for Developers and Development time! :)
P.S. I would like to add "challenging" to my above list of attributes to Professor/Dr. Tom's University 'classroom' on this website.
Robert.
February 03, 2010 - 11:52 am UTC
... It seems that sometimes more money is available for bigger/faster hardware than
for Developers and Development time! :)
...
but even that does not fix the EAV when used for evil.
Ah flexibility....
Connor, February 03, 2010 - 7:51 pm UTC
Ah the age-old premise of: "if I EAV it, then I can trivially add/remove stuff later", or in its modern guise: "if I XML it" :-)
Most places I've worked that trebled the development time and quadrupled the hardware to get these "flexible" systems still end up having months of "impact analysis" when someone wants to actually *implement* one of these flexible changes... So what did you gain by spending all that extra time and money? Diddly squat
root of the problem
Sokrates, February 04, 2010 - 3:14 am UTC
so, I am unsure about the root of the problem:
is it that the education of decision makers is often poor?
should more money be spent there?
February 04, 2010 - 1:54 pm UTC
... should more money be spent there? ...
maybe not money, but maybe
... experience should count: unless your resume includes at least five glaring failures - complete and utter "this was a bad idea" projects - we don't want you. I don't care how many toys you've built successfully - I'd like to hear about what you know to avoid....
FAO Tim Hall
A reader, February 04, 2010 - 5:26 am UTC
Re : Turd Polishing.
I heard a UK phrase today which extends your statement.
"You cannot polish a turd"...... "But you can roll it in glitter!"
LOL
Tim Hall, February 04, 2010 - 7:32 am UTC
That made me laugh out loud. :)
Connor: I agree. Whether you use this method, flex fields or new columns, you have to do the testing. It's easy for a developer to write some code that doesn't account for this "flexibility".
Sokrates: The point is, no solution is universally good or bad. The trick is to know when they apply. Many of these "design" issues come about because people making the decisions don't get their hands dirty and understand the effect of their decisions.
This is why we always say developers should use databases populated with realistic production data volumes. That way they can identify and give feedback on design issues very early on in the process.
Cheers
Tim...
Chief technical architect of engineering MD
Bravid, February 04, 2010 - 9:58 am UTC
...we've got some of these Architects on site at the moment from a large consultancy firm. Their ideas are "beyond cutting edge". For example: we currently have lots of legacy overnight processing written very badly in pl/sql. Although we're making inroads into refreshing all of the code and bringing it up to date, the proposed solution is to take around 200GB of data out of the database using an ETL tool and push it into flat files. Then we'll use the ETL tool to aggregate and chop up the data before reloading it into a "cube" database, which of course will all be in memory so it will be lightning fast!
And the next stage is to get rid of all our messy tables and have a single blob of XML to store everything. That way we can pass xml packets to stateless whatsits and they will process everything at the speed of light...
"Beyond cutting edge" looks more like "in freefall without knowing it" to me, but hey I've not got the job title to support my hypothesis.
Commenting on the Comments.....
Robert, February 04, 2010 - 3:16 pm UTC
First of all, thanks Tom for that tip on the failed projects on the resume (I might just wait and bring them up in the interview phase).
But I'd like to comment on all the commenters....
I'm sure all you guys are real smart (if "Connor from Perth" is Connor McDonald, I have your PL/SQL book), probably with lots of degrees after your names... Maybe you all work in Ivory Towers, or for the government or something, but you sound like you live and move in a perfect, antiseptic IT laboratory.
But in my world, technical excellence and experience do not constitute 100% (or even 50%) of a decision... at least at higher levels. Blame it on the Directors, the CIO, the CEO, the Board, whoever - when you put all the politics, deadlines, changing business goals, bureaucracy, etc. into the mix, us guys in the trenches just have to make the best of what we have to deal with, even with all of the expert input, salesmanship, persuasion, and experience we can muster... Many times we just have to make the best of what we have and try to have an influence for good on the next business software package to be bought... So maybe a DBA with the charisma and power of a General George Patton (or General George Washington) can affect/effect the kind of decisions/technology you are talking about... (that is, if he can avoid getting fired for 'not being a team player'! ;)
February 04, 2010 - 3:49 pm UTC
but, and this is a huge but, we must never stop speaking the truth, based on experience.
Never.
We must never just sit idly by and watch bad decisions be made without speaking up.
We must never fail to get it on record that we think this is a really bad idea (tm).
We must never be complacent.
We must never stop trying to educate.
I've had more than one CIO, CTO, chief architect, whatever yell at me, threaten to not work with us because I've given them a blunt set of feedback on their architecture. It has happened in the past, it'll happen in the future, it happens almost every day it seems. I'll keep doing it - every now and then, I get something changed and that makes me feel good.
Maybe if there are enough war stories out there, documented, people will catch on. Probably not, but who knows.
One of my all-time favorite 'tuning calls' ever went like this.
The developers, in front of management and all, drew on the whiteboard the data model for their data warehouse (yes, *warehouse*, remember that, big stuff). It was easy for them to draw because it only consisted of four tables. They were having massive performance issues - huge.
So, they drew it on the board and I watched. I asked to project my laptop - and I put up on the board a chapter I wrote for Effective Oracle by Design.
Their data model was right there - even the names of my four tables were similar to theirs.
They were ecstatic - they said "wow, you know our model, this is great, you'll be able to help us!"
Then I scrolled up to the section heading:
Don’t Use Generic Data Models
Frequently, I see applications built on a generic data model for “maximum flexibility” or applications built in ways that prohibit performance. Many times, these are one and the same thing! For example, it is well known you can represent any object in a database using *just four tables*:
That was the beginning of a long, long day. But - they changed the model, and their stuff works. Fancy that.
To Robert, Commenting on the Comments
Martijn Hoekstra, February 04, 2010 - 3:40 pm UTC
Ah, you're getting very close to understanding the "it depends"-mantra.
Very good, keep it up!
Off topic, but maybe worth mentioning, since it's a strange coïncidence:
ORA-12899: value too large for column "ASKTOM"."ATE_QUESTION_REVIEWS"."REVIEWER_LOCATION" (actual: 118, maximum: 100)
I got it when entering:
"NLD, 42'nd degree east of the westside if you look from the south..I think... *launches Google Maps just to make sure*"
It reminded me of this question today on OTN:
http://forums.oracle.com/forums/message.jspa?messageID=4075083#4075083
Why allow a client to enter more chars than the column can handle in the database?
Great Feedback! Thanks!
Robert, February 04, 2010 - 4:01 pm UTC
WTG -- Worse Than Generic
Duke Ganote, February 04, 2010 - 4:23 pm UTC
A reader, February 04, 2010 - 7:29 pm UTC
<quote>
But - they changed the model, and their stuff works
</quote>
I was a new hire on that assignment.
In one of our discussions, I recommended against the "four table" approach, referring to Tom's book (Effective Oracle by Design). (Actually they had six tables :-)
I was accused of being a "typical db guy... not competent enough to understand their flexible, powerful application arch..."
If the suggestion had come from Tom, Jonathan, Gopalakrishnan... (probably) they would have listened.
My suggestion did not fly... They threw in a lot of hardware to tune the performance...
It was a "Band-Aid" from my perspective.
Fortunately, I got out of that place.
....and yet here we are
Richard Henderson, June 22, 2011 - 3:28 am UTC
I just got to look at one of these beasts with the intent to tune it. Now, notwithstanding the fact that it may not be the most useful approach, sometimes it really is the only way. I seek not to judge.
Dynamically changing schemas just won't fly in the majority of organisations, so here's my approach to tuning this. We'll see how it goes:
1) Partitioning: Database tuning of bulk data is all about divide and conquer. Firstly, can we fit the candidate set in memory and get the 100-fold boost from there? If we have EE then a dynamic partition could do it. I don't, so a more brute-force tactic is to move the candidate data on insertion (for the typical case of a date-limited partition over a historical data set), or indeed simply to split the data set across multiple EAV tables and UNION them as needed (might be tricky, we shall see). What we are looking for is distinct groups of data.
2) Denormalising: An EAV table-set is effectively a super-normalised structure, so we can pack some number of columns onto each entity ('entity' is a bad name - it is really an instance/row of an entity in Chen-speak). Instead of EAV we get E+100*(AV), giving us a 100:1 improvement, though forcing some clever row-composition logic, and with some small cost for all those NULLs if it is very sparse. (A sketch of this layout follows the list below.)
3) Semi-structuring: Most EAV schemes, I would hazard, do have some stable structure, so if using (2) above, try to keep the stable columns of any particular entity type (the equivalent of a table) in the same generic columns. This is fragile, so possibly too far into the bag of tricks.
4) Reassert relational structures: Where there are common columns, get them into proper columns with names/indexes etc. Dates are often common and "meta".
5) Compression: Where an attribute can be enumerated, in the sense that there are a limited number of fixed values, use encoded values; the easiest are array indexes into the array of values.
6) Association: Where a group of attributes occurs together, pack them into a fixed-width field and use string/logical formulae to mask/filter them. Similar to (2) above but with a fixed structure, reducing the effort and potentially allowing filtering in the inner scan loop.
And so on. Hopefully the sheer effort involved here will put people off EAV in the first place, but if that is where you are, then maybe this will help. I'll check back in if I find more useful patterns, or indeed find that theory and practice wildly diverge....
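A sketch of the packed layout from point (2) - all names and widths are made up for illustration and would be tuned to the real data:

  create table entity_packed (
    entity_id number primary key,
    -- repeating attribute/value slots, e.g. 100 pairs per row
    -- for the E+100*(AV) arrangement described above
    attr_01   varchar2(100),
    val_01    varchar2(4000),
    attr_02   varchar2(100),
    val_02    varchar2(4000)
    -- ... and so on up to attr_100/val_100
  );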
June 22, 2011 - 10:38 am UTC
1) I don't see what partitioning would do. The problem with EAVs is that everything needs rows and columns - it is the act of putting it back together that is a hassle. Asking OR-type questions is next to impossible.
2) it is *not* normalized. It is just improperly modeled - not modeled really, there is NO model.
3) this makes sense.
5) I don't see what that would buy you - it is the lack of indexing, the lack of query power (think OR's again), the lack of rows and columns that present the problem. Making it "smaller" and even more complex isn't going to help.
6) you lost me here. a band-aid on top of an amputation.
I don't see how making the problem smaller is going to help - it is the sheer complexity of querying this model, the work it takes to put it back together, the inability to effectively index and so on that makes this a bad idea.
An EAV is good for certain things - like a shopping cart perhaps. Something you retrieve as a single instance and never query across.
They are pretty much not good for an Order Entry system, an HR system and so on. You tend to query across the "rows" and having to put it all back together - ugh.
Name Value Pair Data Model
Manas Nayak, August 12, 2012 - 5:36 pm UTC
Hi Tom,
I have just joined an in-flight project for a Fortune 100 company and find that the lead data modeler has created a name-value pair data model. The reason given was flexibility of the data model, considering some records can have as many as 900 attributes. The .Net application team seems to love it, as they can code everything with very few tables in mind.
I find it quite cumbersome, as it is difficult to query, difficult to load and difficult to syndicate, and in all likelihood it will suffer from poor performance. Also, the data model can't be considered a rich metadata repository and is difficult to understand.
The project team has made an executive decision to go ahead with the data model, as the model has existed for the last 6 months and the application development team has completed the design of the .Net screens and .Net business logic with this flexible data model in mind.
Any comments?
August 17, 2012 - 1:50 pm UTC
Just the ones I've already written many times, the original answer above for example.
This EAV (entity-attribute-value) model is very very very secure.
It is easy to get data in
It is impossible to get it back out
If all the developers ever want to do is give you a key and get all of the attributes - then it is great, it is perfect.
If they want to search for all of the people whose name is Tom or live in VA - it is going to get painful.
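A sketch of why, reusing the object_attributes(obj, attr, val) layout from the original answer (the attribute names are made up):

  select distinct obj
    from object_attributes
   where (attr = 'name' and val = 'Tom')
      or (attr = 'state' and val = 'VA');

And that only yields the ids - you still need the full pivot, one more self-join per attribute, to display each matching "row".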
Robert, September 05, 2012 - 7:11 am UTC
Don't get me wrong. I understand the complexity of querying against an EAV, because we have them in our shop and I am using one. However, we cannot deny the efficiency of the "no model" approach, especially when working in a big organization where everyone wants to stick their fingers into the pie.
This is where a data warehouse comes into play with a proper data model. People should always be using the data warehouse and only approach the EAV in times of desperation.
The main benefit of EAV (in the context of bureaucracy) is, at least, the data is somewhere.
September 10, 2012 - 7:43 pm UTC
how many times has this 'efficiency' led to 'complexity' itself? who knows what data is in there, who knows if the data has any integrity (i do, it *doesn't*, 100% sure of that, not a single chance it does)
transforming EAV into dimensions and facts
Joshua Magana, October 17, 2012 - 12:20 pm UTC
I am currently at a company building a data warehouse that uses an EAV model in the source to allow the "flexibility" to add an infinite number of what I will call "virtual columns". This is a "butt pucker" issue from a DW perspective. I spoke to Fabian Pascal last night and he too did not have kind words for the design. Tom, can you be my Dr. Laura and give me the cold painful truth as to how painful it will be to get this into a star schema by pivoting the rows into columns - which I also assume will have to be done on a per-client basis, since each client controls what goes into the EAV tables?
October 18, 2012 - 7:15 am UTC
oracle xml db
A reader, June 11, 2014 - 6:52 am UTC
Hi,
I have a name-value pair XML document which I need to query using Oracle XML DB. Can you please throw some light on the requirement below?
XML File
<?xml version="1.0" encoding="UTF-8"?>
<application xmlns="X/ApplicationManagement" name="6A">
<repoInstanceName>-1</repoInstanceName>
<maxdeploymentrevision>5</maxdeploymentrevision>
<NVPairs name="Global Variables">
<NameValuePair>
<name>ApplicationName</name>
<value>x</value>
</NameValuePair>
<NameValuePair>
<name>App_ID</name>
<value>x</value>
</NameValuePair>
<NameValuePairInteger>
<name>App_ID1</name>
<value>x</value>
</NameValuePairInteger>
<NameValuePairPassword>
<name>App_ID_password</name>
<value>x!-fdfdfd-</value>
</NameValuePairPassword>
</NVPairs>
</application>
SQL - Tried but not working
select x.* from XML_RUNFILES_CLOB t,
XMLTable(
  XMLNamespaces('http://www.tibco.com/xmlns/ApplicationManagement' as "p"),
  '//p:application/p:NVPairs[@name=''Global Variables'']' passing t.FILECONTENT
  columns
    pname  varchar2(100) path '//p:name',
    pvalue varchar2(100) path '//p:value') x
Sriram, June 12, 2014 - 7:08 am UTC
Hope this query answers your question.
Answer
select x.* from XML_RUNFILES_CLOB t,
XMLTable(
  XMLNamespaces('http://www.tibco.com/xmlns/ApplicationManagement' as "p"),
  'for $i in /p:application/p:NVPairs[@name=''Global Variables''] return ($i/p:NameValuePairInteger, $i/p:NameValuePair)' passing t.filecontent
  columns
    name  varchar2(150) path 'p:name',
    value varchar2(150) path 'p:value') x
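For what it is worth, the likely reason the first attempt failed: it returned one XMLTable row per NVPairs element, so a path like '//p:name' matched several name nodes at once and could not be squeezed into a single varchar2 column. The FLWOR version iterates the individual NameValuePair (and NameValuePairInteger) elements, so each row carries exactly one name and one value. Note also that the namespace in XMLNamespaces must match the xmlns in the document exactly (the sample XML above shows an anonymized "X/ApplicationManagement"), and that if FILECONTENT is a CLOB rather than an XMLType it would need to be wrapped, e.g. passing XMLTYPE(t.filecontent).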