Question and Answer

Tom Kyte

Thanks for the question.

Asked: April 10, 2002 - 8:22 pm UTC

Last updated: October 28, 2008 - 7:50 am UTC

Version: 901


You Asked

Hi Tom

We are trying to design our database to accommodate dynamic alteration of our schema by providing
user-defined columns. The client should be able to add columns to tables without altering the base
schema. Example:
If we have a customer table with the example data as follows:

Cust_ID 100
Cust_Name Corrie & Walter Inc
Cust_Contact Scott

Since we don't know at this time what other attributes might be required during the implementation,
we provide a Cust_Extended table where we define the specific attributes, as follows:

Attribute_ID 1001
Cust_ID 100
Attribute_Value 94041

Also we have a User_defined_Attributes table as follows

Attribute_ID 1001
Attr_Name ZipCode
Attr_Datatype Number

So in this table we can add all the user defined columns that need to go in the Cust_Extended table.

The data that would go into these tables would be transactional data, running into millions of records.

My question is as follows:
Is this a good design approach, given that it involves a lot of transactional data? If I have to search these tables using joins on them, wouldn't it be extremely slow?

For Example

Find the customer whose zipcode=94041 or Company='Compaq'

select cust_id
  from Cust_Extended
 where attribute_value in ('94041','Compaq')
   and attribute_id in
       (select attribute_id
          from User_defined_Attributes
         where attr_name in ('ZipCode','Company'));
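
And if we need to AND two attributes together (say zipcode=94041 and Company='Microsoft'), a sketch of the query shows that every additional attribute costs another self-join on Cust_Extended:

select ce1.cust_id
  from Cust_Extended ce1, Cust_Extended ce2   -- one alias per ANDed attribute
 where ce1.cust_id = ce2.cust_id
   and ce1.attribute_id = (select attribute_id
                             from User_defined_Attributes
                            where attr_name = 'ZipCode')
   and ce1.attribute_value = '94041'
   and ce2.attribute_id = (select attribute_id
                             from User_defined_Attributes
                            where attr_name = 'Company')
   and ce2.attribute_value = 'Microsoft';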



Schema source:


Create table customer
(Cust_ID number,
Cust_Name varchar2(34),
Cust_Phone varchar2(50));

Create table Cust_Extended
(Attribute_ID number,
Cust_ID number,
Attribute_Value varchar2(64));

Create table User_defined_Attributes
(Attribute_ID number,
Attr_Name varchar2(54),
Attr_Datatype varchar2(32));

insert into Customer
Values(100,'Corrie', '6501111111');
insert into Customer
Values(200, 'Scott', '4087879999');
insert into Customer
Values(300,'Smith', '4153338899');

insert into User_defined_Attributes
Values(1001,'ZipCode','Number');
insert into User_defined_Attributes
Values(1002,'Company','varchar');
insert into User_defined_Attributes
Values(1003,'Region','varchar');


insert into Cust_Extended
Values(1001,100,'94041');
insert into Cust_Extended
Values(1002,100,'Microsoft');
insert into Cust_Extended
Values(1002,200,'Compaq');
insert into Cust_Extended
Values(1002,300,'HP');



Thanks
Tanu



and Tom said...

I would prefer an approach that used a fixed number of trailing columns. You would:

create table customer(
Cust_ID number,
Cust_Name varchar2(xxx),
Cust_Contact varchar2(xxxx),
field1 varchar2(4000),
field2 varchar2(4000),
...
fieldN varchar2(4000)
)
/


Now, to "add" an attribute, you would simply use one of the fields. You would still use your attribute table to keep track of whats what (you still have user_defined_attributes). You could even go further and create a view:

create view customer_v
as
select cust_id, cust_name, cust_contact,
to_number(field1) cust_zip_code,
to_date( field2, 'dd/mm/yyyy' ) cust_first_contact,
.....
from customer;
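
To keep track of what's what, user_defined_attributes would just record which physical FIELDn column backs each logical attribute. A minimal sketch - the FIELD_NAME column is an assumption for illustration, not part of the original schema:

create table user_defined_attributes
( attribute_id   number,
  attr_name      varchar2(54),
  attr_datatype  varchar2(32),
  field_name     varchar2(30)   -- which FIELDn column stores this attribute
);

insert into user_defined_attributes values ( 1001, 'ZipCode',      'Number', 'FIELD1' );
insert into user_defined_attributes values ( 1002, 'FirstContact', 'Date',   'FIELD2' );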

and if you wanted -- you could allow them to create function based indexes on the table (see
http://asktom.oracle.com/~tkyte/article1/index.html ) so they could search:

select * from customer_v where cust_zip_code = 20175 and cust_first_contact between sysdate-50 and sysdate;

and use an INDEX on (to_number(field1), to_date(field2,'dd/mm/yyyy')) to retrieve the results rapidly (rather than a full scan). That is something that would be virtually IMPOSSIBLE to achieve with that other schema (I have a name for that other schema - I call it the "funky data model": very generic, very poor performing).
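
A minimal sketch of that index (the index name is illustrative):

-- matches the expressions used in customer_v, so predicates on the
-- view's typed columns can be satisfied via the index
create index customer_udf_idx
    on customer( to_number(field1), to_date(field2,'dd/mm/yyyy') );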

Going a step further, you can even put INSTEAD OF triggers on customer_v (you would dynamically generate them as attributes are added to or removed from the attributes table) that allow them to insert/update/delete via the view easily.
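
A minimal sketch of the insert case, covering just the two attributes mapped above (the generated trigger would of course handle every mapped attribute, plus updates and deletes):

create or replace trigger customer_v_ioi
instead of insert on customer_v
for each row
begin
   -- write each typed attribute back into its designated string column
   insert into customer ( cust_id, cust_name, cust_contact, field1, field2 )
   values ( :new.cust_id, :new.cust_name, :new.cust_contact,
            to_char(:new.cust_zip_code),
            to_char(:new.cust_first_contact,'dd/mm/yyyy') );
end;
/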


Using this technique, you could deliver something with flexibility and the potential for performance.


Comments

Lao Zhang

A Reader from NC, October 27, 2008 - 5:50 pm UTC

Tom,
I am still confused by your solution. Can you provide some code examples of how to populate the customer table and how to query it, as you usually do?
Thanks a lot.
Tom Kyte
October 27, 2008 - 8:38 pm UTC

what is there to be confused about?

I did actually give code snippets above.


All I'm saying is "keep the attributes in the row" - "do not use an EAV (entity attribute value) model"

Keep it in the row - you stand a chance of having some decent performance, whereas the EAV model will kill it.
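
But if it helps, a minimal end-to-end sketch along the lines of the snippets above (the values are made up for illustration):

-- "adding" the ZipCode attribute just means using one of the spare columns
insert into customer ( cust_id, cust_name, cust_contact, field1 )
values ( 100, 'Corrie', 'Scott', '94041' );

-- and you query through the view, with the proper datatype
select cust_id, cust_name
  from customer_v
 where cust_zip_code = 94041;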

B*Tree / Sorted hash clusters

A reader, October 28, 2008 - 2:45 am UTC

Hi Tom,

I am slightly perplexed about a design choice.
I have an entity "Customer" whose attributes have grown to over 500 fields. A colleague suggested dividing it into 2 tables and pre-joining them using a B*Tree cluster, to avoid performance bottlenecks due to increased row length and row chaining.
The less frequently used columns would be stored in the 2nd table.

I am not too convinced, and feel that all the columns that are logically related in a one-to-one relationship should always be stored in a single table.

I therefore wanted to seek your opinion on this.

Thanks
Tom Kyte
October 28, 2008 - 7:50 am UTC

let us see.

would we rather

a) use a 500-column table, which will be chained, but probably onto the same block, so that the row can be put back together by simply traversing a row piece pointer found at the end of the first row piece...

b) use two tables, such that we would still end up with the logical equivalent of a chained row, but one that would require an index unique scan to find the second bit?


In the case of A, if you don't need the second set of columns, we'll never traverse to them. They would typically be on the same block as the original row. But if you did need them - we'd just dereference a row piece pointer.

In the case of B, if you don't need the second set of columns, we'll never traverse to them. They would typically be on the same block as the original row (that is what the cluster buys you). But if you did need them - we'd walk an index to find a rowid to find the row.



chained rows are not in themselves a performance problem - IF THEY ARE UNAVOIDABLE.

Chained rows can be a performance problem when they could have been avoided - those are the ones worth fixing. If you have a table with more than 255 columns - you will have chained rows, period. If you have a table whose row doesn't fit on a single block - you will have chained rows, period. Chained rows are OK if your design demands them.


Migrated rows, a special case of chained rows - rows that have moved because they no longer fit on the block they were originally inserted into - can be a performance issue, and they can be 'fixed' permanently by thinking about your storage related parameters.


However, what you have is simply a chained row that cannot be avoided.



ops$tkyte%ORA11GR1> create cluster btree_cluster ( id number ) size 4096
  2  /

Cluster created.

ops$tkyte%ORA11GR1> create index btree_cluster_idx on cluster btree_cluster
  2  /

Index created.

ops$tkyte%ORA11GR1> create table t1 ( id number primary key,  data char(2000) )
  2  cluster btree_cluster(id)
  3  /

Table created.

ops$tkyte%ORA11GR1> create table t2 ( id number primary key,  data char(2000) )
  2  cluster btree_cluster(id)
  3  /

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> @at
ops$tkyte%ORA11GR1> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA11GR1> set autotrace traceonly explain
ops$tkyte%ORA11GR1> select * from t1, t2 where t1.id = t2.id and t1.id = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3173799388

------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Co
------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |  4030 |
|   1 |  NESTED LOOPS                |              |     1 |  4030 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2           |     1 |  2015 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0014438 |     1 |       |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1           |     1 |  2015 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C0014437 |     1 |       |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T2"."ID"=1)
   5 - access("T1"."ID"=1)



That would be less efficient for retrieving the data - two index scans instead of just one.
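
For contrast, a sketch of option (a) - everything in one wide table (a trivial two-column stand-in here, names illustrative). A lookup by primary key needs just the one index:

create table t_all ( id number primary key, data1 char(2000), data2 char(2000) );

select * from t_all where id = 1;
-- expected plan: one INDEX UNIQUE SCAN on the primary key plus one
-- TABLE ACCESS BY INDEX ROWID - half the index work of the two-table case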

An "halfway solution"

draccagni, April 24, 2014 - 10:13 am UTC

Hi Tom,
we are going to design a new schema using your suggestion. Anyway, just to be sure we are not missing an alternative, I have a doubt about performance: how does your solution compare with an approach where I still use a single table, but instead of generic columns I add typed columns for the additional metadata? Ex:

create table metadata (
ID NUMBER,
Name VA2(4000),
Surname varchar2(4000),
Age NUMBER,
...
DateReceived DATETIME
)

This table contains every managed attribute; it's obviously a sparse table, but it is a "halfway solution" that satisfies both the ORM and the DB approach without creating views each time I have to expose a new group of attributes.

Any feedback will be appreciated.
Thanks in advance,
Davide


Erratum: A "halfway solution"

Davide, April 24, 2014 - 10:20 am UTC

Sorry Tom,
here is the corrected SQL:

CREATE TABLE METADATA (
ID NUMBER,
NAME VARCHAR2(4000),
SURNAME VARCHAR2(4000),
AGE NUMBER,
...
DATE_RECEIVED DATE
);

Thanks,
Davide
