Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Mikito.

Asked: October 26, 2004 - 9:26 pm UTC

Last updated: May 13, 2009 - 7:52 pm UTC

Version: 10.2

Viewed 1000+ times

You Asked

EAV (Entity-Attribute-Value) stores all the data in a "generic" format like this:

create table EAV (
row_id integer,
col_name varchar2(100),
value varchar2(1000)
);

Suppose we have the following data

insert into EAV values (1,'ENAME','SMITH');
insert into EAV values (1,'DEPTNO','20');
insert into EAV values (1,'SAL','800');
...
insert into EAV values (1,'ENAME','ALLEN');
insert into EAV values (1,'DEPTNO','30');
insert into EAV values (1,'SAL','1600');
...

I want to report all salaries grouped by department. Any way to do it without pivoting the data back into a "normal" table? Any clever use of analytics (which as you are aware I'm not a big fan of:-)?


and Tom said...

and this sort of model is one that I a huge anti fan of. (hate it when we give profession sounding acronyms to things like they are "a good thing (tm)"

funny to be an anti-fan of analytics when they prove to be the coolest thing since the keyword SELECT in SQL -- for answering the unanswerable, for answering it fast, for doing things in SQL that required programming. oh well.

you need a very simple pivot.

I really assume your second set of inserts was really to be with the number 2 instead of one...


select deptno, sum(sal)
from (
select max(decode( col_name, 'DEPTNO', value )) deptno,
max(decode( col_name, 'SAL', value )) sal
from t
group by row_id
)
group by deptno;


this is exactly the model I spent a chapter on in my last book making fun of it. Showing how "easy and flexible" such a model truly is, how well it performs (none of that is true, it is neither easy, nor flexible, nor performant -- the easiest questions in the world become the hardest things to answer)

could we do this with analytics? sure, scalar subqueries even. here are three implementations:

select deptno, sum(sal)
from (
select max(decode( col_name, 'DEPTNO', value )) deptno,
max(decode( col_name, 'SAL', value )) sal
from t
where col_name in ('DEPTNO','SAL')
group by row_id
)
group by deptno

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.31 0.34 0 231 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.31 0.34 0 231 0 10

Rows Row Source Operation
------- ---------------------------------------------------
10 SORT GROUP BY (cr=231 r=0 w=0 time=346835 us)
27817 VIEW (cr=231 r=0 w=0 time=305958 us)
27817 SORT GROUP BY (cr=231 r=0 w=0 time=265262 us)
55634 TABLE ACCESS FULL T (cr=231 r=0 w=0 time=60161 us)
********************************************************************************
select deptno, sum(sal)
from (
select max(decode(col_name,'DEPTNO',value)) over (partition by row_id) deptno,
max(decode(col_name,'SAL' ,value)) over (partition by row_id) sal,
row_number() over (partition by row_id order by col_name) rn
from t
where col_name in ('DEPTNO','SAL')
)
where rn = 1
group by deptno

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.43 0.46 0 231 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.43 0.46 0 231 0 10

Rows Row Source Operation
------- ---------------------------------------------------
10 SORT GROUP BY (cr=231 r=0 w=0 time=464361 us)
27817 VIEW (cr=231 r=0 w=0 time=417787 us)
55634 WINDOW SORT (cr=231 r=0 w=0 time=350280 us)
55634 TABLE ACCESS FULL T (cr=231 r=0 w=0 time=118476 us)
********************************************************************************
select deptno, sum(sal)
from (
select value sal,
(select value from t t2 where row_id = t.row_id and col_name = 'DEPTNO') deptno
from t
where col_name = 'SAL'
)
group by deptno

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.48 0.51 0 55867 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.48 0.51 0 55867 0 10

Rows Row Source Operation
------- ---------------------------------------------------
27817 TABLE ACCESS BY INDEX ROWID OBJ#(32233) (cr=55636 r=0 w=0 time=297771 us)
27817 INDEX UNIQUE SCAN OBJ#(32234) (cr=27819 r=0 w=0 time=163237 us)(object id 32234)
10 SORT GROUP BY (cr=55867 r=0 w=0 time=516822 us)
27817 TABLE ACCESS FULL OBJ#(32233) (cr=231 r=0 w=0 time=45530 us)




Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

The coolest thing since SELECT is subselect (aka scalar subquery)!

Mikito Harakiri, October 27, 2004 - 5:43 pm UTC

EAV

</code> http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html <code>

makes certain things easier (e.g. searching for a value without knowing the schema), while the others become more complicated (e.g. what is analog of multi-column index in EAV?)

I just wanted to make certain I didn't miss some clever way to express a simple group-by query.



as the author said...

A reader, October 28, 2004 - 2:22 am UTC

From your own link ... tom is right - as usual :-)

<quote>
OTLT and EAV are “generic” approaches that are seductive to programmers, but are actually a bad idea for most databases. Resist the temptation!
</quote>

hannibal

Is relational database doomed?

A reader, February 17, 2009 - 7:11 pm UTC

Tom Kyte
February 17, 2009 - 8:35 pm UTC

when you show me a full up - for 10,000 users - payroll system, HR system, a time and attendance system, a <long list of applications, useful ones - beyond say facebook...> written with an EAV

I'll start looking for another job.


and they talk about having to scale in a single (hardware/host) server - we are not encumbered by that limitation.



Funny that they write:

... This new kind of database management system is commonly called a key/value store. In fact, no official name yet exists ...

no official name yet exists, another case of someone 'inventing' the EAV for the 'first' time (I remember the first time I invented it!)

http://asktom.oracle.com/Misc/this-should-be-fun-to-watch.html


The EAV model has very limited focus/use - it is very useful in a very specific type of application.

But to ask general questions of it - no, it does not work, "to scale" is not part of it's vernacular.


I love their picture of a car on page 2
http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php?p=2


umm, how is that different from

create table car(
silly_id number primary key,
make,
model
color,
year )




Ok, this is annoying:
<quote>


Suitability for Clouds

The first benefit is that they are simple and thus scale much better than today's relational databases. If you are putting together a system in-house and intend to throw dozens or hundreds of servers behind your data store to cope with what you expect will be a massive demand in scale, then consider a key/value store.

Because key/value databases easily and dynamically scale, they are also the database of choice for vendors who provide a multi-user, web services platform data store. The database provides a relatively cheap data store platform with massive potential to scale. Users typically only pay for what they use, but their usage can increase as their needs increase. Meanwhile, the vendor can scale the platform dynamically based on the total user load, with little limitation on the entire platform's size.

</quote>

they are simple
THUS
they scale

HUH? BOING? Wait WHAT?

they are simple, therefore they scale.

read that to yourself over and over and over again.

they are simple, hence we say they shall scale.

Fascinating.

Too bad it is, well, not factually accurate.



But, the article is not entirely incorrect, this is well said:

<quote>

Key/Value Stores: The Bad

The inherent constraints of a relational database ensure that data at the lowest level have integrity. Data that violate integrity constraints cannot physically be entered into the database. These constraints don't exist in a key/value database, so the responsibility for ensuring data integrity falls entirely to the application. But application code often carries bugs. Bugs in a properly designed relational database usually don't lead to data integrity issues; bugs in a key/value database, however, quite easily lead to data integrity issues.

</quote>




When my paycheck is produced or producable by a one of these systems on a grand scale.... I'll be right there.

Table Design and EAV

A reader, May 11, 2009 - 12:44 pm UTC

http://structureddata.org/2009/03/19/the-impact-of-good-table-and-query-design/

This article seems to suggest that there's no problem with EAV. It's how you write your SQL.
Tom Kyte
May 11, 2009 - 7:13 pm UTC

are you sure you read it closely?

seems to me that

a) it was all full scans
b) every time you use the EAV, it is multiple full scans OR a full scan plush a rather expensive pivot operation



Table Design and EAV Followup

A reader, May 13, 2009 - 7:06 pm UTC

Yes, I also noticed what you observed. However, the article started out with this:

"I think that frequently people underestimate the power of SQL (for both ¿good¿ and ¿evil¿). All too often in data warehouses, I see tables designed for one specific report, or a very select few reports. These tables frequently resemble Microsoft Excel Spreadsheets (generally Pivot Tables), not good Dimensional (Star Schema) or Third Normal Form (3NF) schema design. The problem with such designs is that it severely limits the usefulness of that data, as queries that were not known at the time of design often time become problematic. The following is a simple one table example, derived from a field experience in which I discuss two table designs and provide the SQL queries to answer a question the business is seeking."

It looks to me that he's a proponent of the EAV design as opposed to the "spreadsheet" design, and he uses the example to demonstrate that EAV is actually better if the SQL is written properly.

Am I not reading it right?


Tom Kyte
May 13, 2009 - 7:52 pm UTC

No, no - I don't see that at all.

I read:

All too often in data warehouses, I see tables designed for one specific report, or a very select few reports. These tables frequently resemble Microsoft Excel Spreadsheets (generally Pivot Tables), not good Dimensional (Star Schema) or Third Normal Form (3NF) schema design. The problem with such designs is that it severely limits the usefulness of that data, as queries that were not known at the time of design often time become problematic.

and I hear "do the right thing, don't denormalize as a matter of fact, THINK about your design"

Nothing in there says "EAV is good" to me.

I see "star schema" (bunch of master details, a fact with dimensions)
3rd normal form

The quote "the problem with such designs" to me refers to the "the tables frequently resemble MS excel spreadsheet". A spreadsheet design is a massively denormalized thing - it is not the opposite of an EAV - and the proposed structures are star schema, normalized designs.

An EAV is neither of those

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.