Skip to Main Content
  • Questions
  • Table design Normalization or denormalization

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Kirthi.

Asked: April 22, 2010 - 2:01 pm UTC

Last updated: May 10, 2010 - 6:33 pm UTC

Version: Oracle 10.2

Viewed 1000+ times

You Asked

The database I work on is used to store late delivery and quality deficiency information on vendors for each product they supply. The information on each type of quality deficiency and late deliveries on products supplied are stored in separate tables. Each product has an FSC (Federal supply code) code to identify what the product is. Now the system would like to expand its services to store delivery and quality records for products classified using the NAICS code too. Below is the current database structure of one of the tables before the change. After loading the data into these tables the data will be mostly (90%) queried and there will be some updates/deletes. One of these tables has 5 million records.

I had two options on mind but I am bit conflicted as to which one to choose. I will present both the options below. One of our programmers had suggested using an approach that is different than the two options that I had in mind. I am not convinced about the approach he had suggested but I was unable to give a sound reasoning as to why it might not be a good idea. I will present his suggestion as well below. Can you please give me your suggestions for an appropriate design and also and its flexibility for future changes that might come along. Please if you could comment on the pros and cons of each of the below suggested options/approaches it would be very useful in backing up why one approach is better than the other. Thank you.


Table Structure:  DELIVERY_TABLE
VENDOR_ID CHAR(5)
FSC CHAR(4)
CONTRACT_NUMBER  VARCHAR2(32) (pk)
REQUIRED_DATE DATE (pk)
RECEIVED_DATE DATE


<u>Proposed solution 1:</u>

<b>Table Structure:  DELIVERY_TABLE</b>
VENDOR_ID CHAR(5)
FSC CHAR(4)
CONTRACT_NUMBER  VARCHAR2(32) (pk)
REQUIRED_DATE DATE (pk)
RECEIVED_DATE DATE
NAICS VARCHAR2(6)

<u>Proposed solution 2:</u>

<b><b>Table Structure:</b>  DELIVERY_TABLE</b>
VENDOR_ID CHAR(5)
CONTRACT_NUMBER  VARCHAR2(32) (pk)
REQUIRED_DATE DATE (pk)
RECEIVED_DATE DATE

<b>Look up table: DELIVERY_PRODUCT_TYPE</b>
Contract_number VARCHAR2(32) 
REQUIRED_DATE DATE
Product_type CHAR(1)
Product_code VARCHAR(6)

Composite foreign key on contract_number, required_date from the “DELIVERY_PRODUCT_TYPE” table to composite pk contract_number, required_date in the “DELIVERY_TABLE”

<u>Proposed solution 3:</u> (Programmer suggestes solution)

<b>Table Structure:  DELIVERY_TABLE</b>
VENDOR_ID CHAR(5)
CONTRACT_NUMBER  VARCHAR2(32) (pk)
REQUIRED_DATE DATE (pk)
RECEIVED_DATE DATE
Product_type CHAR(1)
Product_code VARCHAR(6)


I also have some tables where the FSC code is currently in a different table and a foreign key is used to identify the FSC code for a quality deficiency record i.e., a parent child relationship. In this situation do you modify the child table to accommodate the product type code and value so both NAICS and FSC codes for the record can be handled in one child table.

General background information:
The "DELIVERY_TABLE" will have vendors delivery information. Most of this information would be negative/late delivery information. This information
will be used in a procedure to calculate the delivery score of each vendor for each product they have supplied. Every month this score is calculated for all

vendors for each product they supply. This information is then used by government officials to determine who would be a better vendor for a product. During the preview period the vendor can challenge a record if they feel it is not correct. So some updates are done on the table but more than 90% of the time there are going to be queries executed to calculate their delivery scores or to show the negative records to the vendor.


Similarly there are other tables which store vendor's data on products with deficiency. This information will similary be used as above to determine the

quality of each product a vendor supplies. I have not given the table structure for these tables but I assume the same solution that would be used for

storing delivery records would apply here.


The primary key for the "DELIVERY_TABLE" is CONTRACT_NUMBER and REQUIRED_DATE DATE and will be the primary keY after the NAICS information is added.

Solution 1:
1. Currently FSC is a NOT NULL field. After NAICS column is added either FSC or NAICS will have a value for each record.

2. VENDOR_ID is a NOT NULL field and this constraint will remain after the NAICS column is added.

CREATE TABLE DELIVERY_TABLE
(
CONTRACT_NUMBER VARCHAR2(32 BYTE) NOT NULL,
SERVICE_DESIGNATOR_CODE CHAR(1 BYTE) NOT NULL,
VENDOR_ID CHAR(5 BYTE) NOT NULL,
FSC CHAR(4 BYTE) NOT NULL,
REQUIRED_DATE DATE NOT NULL,
CANCEL_DATE DATE,
SHIP_DATE DATE,
RECEIPT_DATE DATE,
ADDED_DATE DATE NOT NULL,
UPDATE_DATE DATE,
REASON_FOR_DELAY_CODE CHAR(2 BYTE),
CHALLENGE_CODE CHAR(1 BYTE),
CHALLENGE_DATE DATE,
CONSTRAINT DEL_TBL_PK PRIMARY KEY(CONTRACT_NUMBER,SERVICE_DESIGNATOR_CODE)
);

Solution 2:

The "DELIVERY_PRODUCT_TYPE" table with the "PRODUCT_TYPE" and "PRODUCT_CODE" would provide some flexibility when in the future if another product

classification type records (other than FSC and NAICS) needed to be added. The table is normalized and also it would makes it easier to modify the procedures

to incorporate the new product type records.

The relationship is between the "DELIVERY_TABLE" and "DELIVERY_PRODUCT_TYPE" table is 1-1.


CREATE TABLE DELIVERY_TABLE
(
CONTRACT_NUMBER VARCHAR2(32 BYTE) NOT NULL,
SERVICE_DESIGNATOR_CODE CHAR(1 BYTE) NOT NULL,
VENDOR_ID CHAR(5 BYTE) NOT NULL,
REQUIRED_DATE DATE NOT NULL,
CANCEL_DATE DATE,
SHIP_DATE DATE,
RECEIPT_DATE DATE,
ADDED_DATE DATE NOT NULL,
UPDATE_DATE DATE,
REASON_FOR_DELAY_CODE CHAR(2 BYTE),
CHALLENGE_CODE CHAR(1 BYTE),
CHALLENGE_DATE DATE,
CONSTRAINT DEL_TBL_PK PRIMARY KEY(CONTRACT_NUMBER,SERVICE_DESIGNATOR_CODE)
);

CREATE TABLE DELIVERY_PRODUCT_TYPE
(
CONTRACT_NUMBER VARCHAR2(32 BYTE) NOT NULL,
REQUIRED_DATE DATE NOT NULL,
Product_type CHAR(1) NOT NULL,
Product_code VARCHAR(6) NOT NULL,
CONSTRAINT delvProdTyp_ctrNo_reqDt_FK FOREIGN KEY(CONTRACT_NUMBER,REQUIRED_DATE) REFERENCES DELIVERY_TABLE(CONTRACT_NUMBER,REQUIRED_DATE),
CONSTRAINT delvProdTyp_pk PRIMARY KEY(CONTRACT_NUMBER,REQUIRED_DATE)
);


Solution 3:

CREATE TABLE DELIVERY_TABLE
(
CONTRACT_NUMBER VARCHAR2(32 BYTE) NOT NULL,
SERVICE_DESIGNATOR_CODE CHAR(1 BYTE) NOT NULL,
VENDOR_ID CHAR(5 BYTE) NOT NULL,
REQUIRED_DATE DATE NOT NULL,
CANCEL_DATE DATE,
SHIP_DATE DATE,
RECEIPT_DATE DATE,
ADDED_DATE DATE NOT NULL,
UPDATE_DATE DATE,
REASON_FOR_DELAY_CODE CHAR(2 BYTE),
CHALLENGE_CODE CHAR(1 BYTE),
CHALLENGE_DATE DATE,
Product_type CHAR(1) NOT NULL,
Product_code VARCHAR(6) NOT NULL,
CONSTRAINT DEL_TBL_PK PRIMARY KEY(CONTRACT_NUMBER,SERVICE_DESIGNATOR_CODE)
);

The programmers reasoning was that it (to have the "product_type" and "product_code") would be flexible for future needs if we have to include records under

a different product classification type. In addition he mentioned that it would also make it easier to modify the queries.

My thinking is it is not normalized and I am not sure if it is a good idea to implement it using solution 3.

P.S.
How do you create mutually exclusive constraints?

and Tom said...

I don't like "EAV" - entity attribute value - models, that is model two (it is not normalized, it is generic - an EAV). Also, how would you enforce the "must have AT LEAST one value, AT MOST one value", you need AT LEAST one value in the child table when you commit, you need AT MOST one value in the child table when you commit. It can be done - but it is pretty complex.


I'm not a fan of solution 3 technically - in most cases. It hides the meaning of the attribute, it makes it hard to validate (an FSC should be 4 characters the other should be 6 - perhaps there are other rules - like in the future - FSC's must exist in some other table (lookup, domain validation). Not really possible with #3.


But this is maybe a toss up between 1 and 3.

#3 would be acceptable if this product code is meaningless to your application - if you do not consider an FSC different from a NAICS. I know the product type could differentiate them - but that is besides the point. If they are different attributes - they should be different attributes.


I would favor #1, the constraint on #1 would be

check ((fsc is null and naics is not null) or (fsc is not null and naics is null))


or it could be:

check( decode( fsc, null, 0, 1 ) + decode( naics, null, 0, 1 ) = 1 );

that 2nd one is not as clear, but would be easier to add to over time if needed.



that way, FSC can be a varchar2(4), NAICS can be a varchar2(6) and if the developer *really* wanted to, they could:

create view v as
select ... coalesce( fsc, naics ) product_code, 
           case when fsc is not null then 'F' 
                when naics is not null then 'N'
           end product type,
       .....



if they really wanted that view.

Rating

  (26 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Table design

Kirthi, April 26, 2010 - 10:32 am UTC

Tom, I am unable to see your response to my question which says it has been answered. Can you please let me know where I can find your response. Thank you.
Tom Kyte
April 26, 2010 - 10:38 am UTC

I had a duplicate from you - I asked you for additional information using the other question - check your email.

Table design

Kirthi, April 26, 2010 - 10:57 am UTC

Tom,
I have not received your e-mail in my other email account yet. Is it possible for you to e-mail it to my e-mail address associated with this post. Thank you. I appreciate it.

Table design

A reader, April 26, 2010 - 1:56 pm UTC

Tom,
Thank you for your detailed response. It is enlighting to read your response.

1. If implementing solution 1 where NAICS and FSC have their own data column
then would it be a an issue if one of the columns have NULL in them. Since if we have a table with 5 million records with most of the NAICS data set to NULL do you see any concern? If it is NULL it means that no space has been used and it is not an issue, am I correct?

2. Can you explain a little bit more on what you mean by "EAV" - entity attribute value and when to use the solution 2.

Thank you,
Kirthi



Tom Kyte
April 26, 2010 - 2:01 pm UTC

1) correct


2) your model of having the child table with a foreign key to the parent and a generic row is just like an EAV- entity attribute value - model. Typically, they are used like this:


create table object ( object_id primary key );

create table object_attributes( object_id, attribute_name, attribute_value, primary key(object_id,attribute_name), foreign key (object_id) references object);


insert into object values ( 1234 );
insert into object_attributes values ( 1234, 'first name', 'thomas' );
insert into object_attributes values ( 1234, 'last name', 'kyte' );
insert into object_attributes values ( 1234, 'middle name', 'jeffrey' );
insert into object_attributes values ( 1234, 'state', 'VA' );


etc etc - there I just stored a person. This is very flexible, I can add attributes whenever I want!

it is also wrong in this case, not an appropriate use.


Stick EAV or entity attribute value into google if you want more on it.

Table design

A reader, April 26, 2010 - 2:05 pm UTC

Thank you, Tom. I understand when it is not supposed to be used. Thank you again for your valuable insights.

Table design

A reader, April 27, 2010 - 6:38 am UTC

Tom,
The programmer has done most of his coding using solution 3 and is reluctant to have an FSC and NAICS column as its own columns in the table as in solution 1. I am thinking of making the changes as proposed in solution 1 to the base tables and create Views for him on those tables to mirror the structure of solution 3. Do you see any issues with that?

Thank you,
Kirthi
Tom Kyte
April 27, 2010 - 7:02 am UTC

what ever happened to design reviews? I know the environment you are working in (govt) - nothing ever happens without many meetings - how did this come about?

Just hack it together, it'll probably work, it'll probably be easy to maintain (opposite day)

Anyway, the view will work for data retrieval, but not so much for insert/update/delete - and please do not suggest instead of triggers - true they could be used, but could be is not equivalent to should be.

I'd go back to the premise that "this is a large change that will affect us for a long long time, we need to design this - code be darned"

Table design

A reader, April 27, 2010 - 7:27 am UTC

Tom,
I had not been assigned to the project when they started developing it. In between priorities changed and the focus was to get another project done first. So this one waited until the other got done and now it is being resumed. They recently assigined me to this project and I guess in the mean time the developer had already completed the coding and there was no DBA assigned during the beinging stages to ensure it was designed right.

I plan to say that the implemented solution 3 would make it difficult to validate the data i.e., FSC CHAR(4) or NAICS VARCHAR2(6) if using product_type and product_code columns.
What would your arguement be if I get countered saying that the queries of flexibility will be lost for new product type classification of data for now and future. If they say the benefit of having the product_type and product_code exceeds the benefit of solution 1. If they add that we can ensure data is validated when they are loaded so it would not be an issue.

Thanks,
Kirthi
Tom Kyte
April 27, 2010 - 8:23 am UTC

... I plan to say that the implemented solution 3 would make it difficult to
validate the data i.e., FSC CHAR(4) or NAICS VARCHAR2(6) if using product_type
and product_code columns. ...

and they'll come back with:


check (product_type='F' and length(product_code)=4) or (product_type='N' and length(product_code)=6)

or something :)




You already know how to address "the queries of
flexibility " - it is called "a view"

Table design

A reader, April 27, 2010 - 8:31 am UTC

Sorry to go back and forth on this. Just wanted to confirm.

1. Go with solution 1. FSC and NAICS would be separate columns.

2. If there is a complex query that he is finding difficult to right then create a View for it. Deal with it on a case by case basis.


Tom Kyte
April 27, 2010 - 8:59 am UTC

I would have used two columns, yes...

Use views like you would use stored procedures or subroutines, to hide complexity, to reuse common logic.

Table design

A reader, April 27, 2010 - 10:27 am UTC

Thanks.

Table design

A reader, April 27, 2010 - 12:41 pm UTC

Tom,
If you consider this to be a new question please let me know I will create a new one. It is related to what I had discussed earlier hence wanted to follow up on the same thought process.

CREATE TABLE USER_A.DELIVERY_TABLE
(
CONTRACT_NUMBER VARCHAR2(32 BYTE) NOT NULL,
SERVICE_DESIGNATOR_CODE CHAR(1 BYTE) NOT NULL,
CAGE_CODE CHAR(5 BYTE) NOT NULL,
FSC CHAR(4 BYTE),
NIIN CHAR(9 BYTE),
TERMINATION_CODE CHAR(1 BYTE),
CHALLENGE_CODE CHAR(1 BYTE),
CHALLENGE_DATE DATE,
REQUIRED_DATE DATE NOT NULL,
CANCEL_DATE DATE,
SHIP_DATE DATE,
RECEIPT_DATE DATE,
ADDED_DATE DATE NOT NULL,
UPDATE_DATE DATE,
REASON_FOR_DELAY_CODE CHAR(2 BYTE),
CHALLENGE_COMMENTS VARCHAR2(2000 BYTE),
QUANTITY NUMBER,
PRICE NUMBER(16,4),
AWARD_DATE DATE,
PR_INITIATION DATE,
NAICS VARCHAR2(6 BYTE),
CONSTRAINT TEMP_CDD_FSC_NAICS_CK CHECK ((fsc is null and naics is not null) or (naics is null and fsc is not null)),
CONSTRAINT TEMP_CDD_PK PRIMARY KEY (CONTRACT_NUMBER, REQUIRED_DATE)
);


NAICS_FSC_RELATIONSHIP is the relationship table which has the corresponding NAICS code for an FSC and vice versa. One FSC can have multiple NAICS and vice versa. It is a many to many relationship.


CREATE TABLE USER_A.NAICS_FSC_RELATIONSHIP
(
FSC CHAR(4 BYTE),
NAICS VARCHAR2(6 BYTE),
CONSTRAINT NAICS_FSC_RELA_PK PRIMARY KEY (FSC, NAICS)
);


Let us say for example vendor “vendor_A” produces product screws. The FSC code for this product is “1111” and the NAICS code for the same product is “222222”. I need to be able to extract not only records that “vendor_A” delivered for FSC code “1111” but also records he delivered for NAICS code “222222”. Each of these records will be assigned a delivery weights and they would be summed up to come up with the delivery score for that contractor to deliver a screw.

For vendor_A to deliver product “1111”.

Earlier when the NAICS code was not included I could do a the below since there was only FSC code to group by. See below.



Select cage_code,
FSC, SUM(delivery_count), SUM(delivery_percent)

From (

SELECT
cage_code,
FSC,
1 delivery_count,
DECODE(DECODE(TRIM(reason_for_delay_code), 'X', NULL, reason_for_delay_code), NULL, DECODE(termination_code, NULL, DECODE(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 5), 0,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 30), 1,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 60), 1.5,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 90), 2, 2.5), 2.5), 0) delivery_percent
FROM USER_A.DELIVERY_TABLE
)
GROUP BY cage_code, FSC;


Now with the NAICS column added how would you retrieve these records to include it in the calculation?


1. I don’t think I can use a table JOIN in this case. I am wondering if I might endCREATE TABLE USER_A.DELIVERY_TABLE
(
CONTRACT_NUMBER VARCHAR2(32 BYTE) NOT NULL,
SERVICE_DESIGNATOR_CODE CHAR(1 BYTE) NOT NULL,
CAGE_CODE CHAR(5 BYTE) NOT NULL,
FSC CHAR(4 BYTE),
NIIN CHAR(9 BYTE),
TERMINATION_CODE CHAR(1 BYTE),
CHALLENGE_CODE CHAR(1 BYTE),
CHALLENGE_DATE DATE,
REQUIRED_DATE DATE NOT NULL,
CANCEL_DATE DATE,
SHIP_DATE DATE,
RECEIPT_DATE DATE,
ADDED_DATE DATE NOT NULL,
UPDATE_DATE DATE,
REASON_FOR_DELAY_CODE CHAR(2 BYTE),
CHALLENGE_COMMENTS VARCHAR2(2000 BYTE),
QUANTITY NUMBER,
PRICE NUMBER(16,4),
AWARD_DATE DATE,
PR_INITIATION DATE,
NAICS VARCHAR2(6 BYTE),
CONSTRAINT TEMP_CDD_FSC_NAICS_CK CHECK ((fsc is null and naics is not null) or (naics is null and fsc is not null)),
CONSTRAINT TEMP_CDD_PK PRIMARY KEY (CONTRACT_NUMBER, REQUIRED_DATE)
);


NAICS_FSC_RELATIONSHIP is the relationship table which has the corresponding NAICS code for an FSC and vice versa. One FSC can have multiple NAICS and vice versa. It is a many to many relationship.


CREATE TABLE USER_A.NAICS_FSC_RELATIONSHIP
(
FSC CHAR(4 BYTE),
NAICS VARCHAR2(6 BYTE),
CONSTRAINT NAICS_FSC_RELA_PK PRIMARY KEY (FSC, NAICS)
);


Let us say for example vendor “vendor_A” produces product screws. The FSC code for this product is “1111” and the NAICS code for the same product is “222222”. I need to be able to extract not only records that “vendor_A” delivered for FSC code “1111” but also records he delivered for NAICS code “222222”. Each of these records will be assigned a delivery weights and they would be summed up to come up with the delivery score for that contractor to deliver a screw.

For vendor_A to deliver product “1111”.

Earlier when the NAICS code was not included I could do a the below since there was only FSC code to group by. See below.



Select cage_code,
FSC, SUM(delivery_count), SUM(delivery_percent)

From (

SELECT
cage_code,
FSC,
1 delivery_count,
DECODE(DECODE(TRIM(reason_for_delay_code), 'X', NULL, reason_for_delay_code), NULL, DECODE(termination_code, NULL, DECODE(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 5), 0,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 30), 1,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 60), 1.5,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 90), 2, 2.5), 2.5), 0) delivery_percent
FROM USER_A.DELIVERY_TABLE
)
GROUP BY cage_code, FSC;


Now with the NAICS column added how would you retrieve these records to include it in the calculation?


1. I don’t think I can use a table JOIN in this case.
2. I am thinking of writing a procedure and have a cursor to get the distinct CAGE and FSC and then
Loop through each one and get the records associated with that FSC and related NAICS and then use the algorithm to sum the results. The delivery table has 6 million records and I am wondering if it would cause any performance issue if I go this route?


Can you think of a better way of accomplishing this? Thank you.



Tom Kyte
April 27, 2010 - 1:28 pm UTC

but basically, you would need to map the naics code to a fsc code - for any of this work as as is - regardless of structure.

and you started by saying:

NAICS_FSC_RELATIONSHIP is the relationship table which has the corresponding
NAICS code for an FSC and vice versa. One FSC can have multiple NAICS and vice
versa. It is a many to many relationship.


I utterly fail to see how that'll work.

A reader, April 27, 2010 - 1:37 pm UTC

I am sorry I did not understand your response. Please see below what I am trying to accomplish. Thanks.

The NAICS_FSC_RELATIONSHIP table has the relationship between the FSC and NAICS. One FSC can have many NAICS associated with it and similarly one NAICS code can have multiple FSC codes associated with it. In the table the NAICS, FSC combination is the primary key.

For each vendor I need to get not only the records associated with an FSC but also the related NAICS record. I use the relationship table to determine what are the NAICS codes associated with this FSC and then look at the DELIVERY_TABLE to get the records for those NAICS as well. I then need to use these records as well along with the FSC records I had selected to perform the calculaions.




Tom Kyte
April 27, 2010 - 2:06 pm UTC

My response basically said "I don't see how you get there from here, your data does not support such a report anymore"

I fail to see how if a FSC has many NAICs and a NAICs has many FSCs you can report by cage code and fsc. You would end up obviously "multiplying records out"


You are doing a report not for a single FSC, but for ALL FSC codes. That is the query I see (did you see a different one?). If you do that - which FSC should that NAICS map to - if it maps to more than one, you will obviously double, tripe, whatever count that record more than once.

I can give you that - easy, that is just a join (they will all be joins or unions - no procedural code). But that report you talk of - that is the past, that report cannot exist anymore since you have this many to many relationship, the data doesn't make sense anymore.

A reader, April 28, 2010 - 6:29 am UTC

Tom,
The query i had given below


Select cage_code,
FSC, SUM(delivery_count), SUM(delivery_percent)

From (

SELECT
cage_code,
FSC,
1 delivery_count,
DECODE(DECODE(TRIM(reason_for_delay_code), 'X', NULL, reason_for_delay_code), NULL, DECODE(termination_code, NULL, DECODE(NVL(NVL(NVL(receipt_date,

ship_date), cancel_date), ld_classification) - required_date,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 5), 0,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 30), 1,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 60), 1.5,
LEAST(NVL(NVL(NVL(receipt_date, ship_date), cancel_date), ld_classification) - required_date, 90), 2, 2.5), 2.5), 0) delivery_percent
FROM USER_A.DELIVERY_TABLE
)
GROUP BY cage_code, FSC;

is how things were done in the past but now with the NAICS column added I need to modify the query to meet the new requirements.

In the past:
The system collected negative delivery information on products vendors delivered. These products fell under the FSC classification. So when you generate
reports to display how bad a vendor was in their delivery of "screws" you would only consider these records and this is what the SQL above represented.

The new requirement:
Now we have additional data about how a vendor had done in delivering "screws" by collecting NAICS data for a vendor. The NAICS data could have the same

product "screws" classified under a different code than the FSC. To know how bad a vendor is in delivering the product "screws" I need to gather all the

delivery data information on that vendor to manufacture "screws" by getting both FSC and NAICS records data I have on him for that product. If FSC code for

screw is "1111" and the same product is made under NAICS code "111111" I need to see if a contractor has delivery information on either of these codes and

use all these records to determine how good/bad they were in delivering that product. One FSC code can have many related NAICS codes.



The the vendor had manufactured product "screws" under FSC

The new requirement is to for example

let us say

insert into NAICS_FSC_RELATIONSHIP (NAICS,FSC)
VALUES('1111','111111');

insert into NAICS_FSC_RELATIONSHIP (NAICS,FSC)
VALUES('1111','333333');


insert into NAICS_FSC_RELATIONSHIP (NAICS,FSC)
VALUES('2222','222222');

INSERT INTO DELIVERY_TABLE(CONTRACT_NUMBER ,
SERVICE_DESIGNATOR_CODE ,
CAGE_CODE ,
FSC ,
NIIN ,
TERMINATION_CODE ,
CHALLENGE_CODE ,
CHALLENGE_DATE ,
REQUIRED_DATE ,
CANCEL_DATE ,
SHIP_DATE ,
RECEIPT_DATE ,
ADDED_DATE ,
UPDATE_DATE ,
REASON_FOR_DELAY_CODE ,
CHALLENGE_COMMENTS ,
QUANTITY ,
PRICE ,
AWARD_DATE ,
PR_INITIATION ,
NAICS ,
)
VALUES('CTRNO1',
'X',
'VENDOR_A',
'1111',
NULL,
NULL,
NULL,
NULL,
TO_DATE('01/02/2009','MM/DD/YYYY')),
NULL,
TO_DATE('01/30/2009','MM/DD/YYYY')),
TO_DATE('02/04/2009','MM/DD/YYYY')),
TO_DATE('03/01/2009','MM/DD/YYYY')),
NULL,
NULL,
NULL,
1,
10,
TO_DATE('01/02/2008','MM/DD/YYYY')),
NULL
NULL
);


INSERT INTO DELIVERY_TABLE(CONTRACT_NUMBER ,
SERVICE_DESIGNATOR_CODE ,
CAGE_CODE ,
FSC ,
NIIN ,
TERMINATION_CODE ,
CHALLENGE_CODE ,
CHALLENGE_DATE ,
REQUIRED_DATE ,
CANCEL_DATE ,
SHIP_DATE ,
RECEIPT_DATE ,
ADDED_DATE ,
UPDATE_DATE ,
REASON_FOR_DELAY_CODE ,
CHALLENGE_COMMENTS ,
QUANTITY ,
PRICE ,
AWARD_DATE ,
PR_INITIATION ,
NAICS ,
)
VALUES('CTRNO2',
'X',
'VENDOR_A',
NULL,
NULL,
NULL,
NULL,
NULL,
TO_DATE('01/02/2009','MM/DD/YYYY')),
NULL,
TO_DATE('01/30/2009','MM/DD/YYYY')),
TO_DATE('02/04/2009','MM/DD/YYYY')),
TO_DATE('03/01/2009','MM/DD/YYYY')),
NULL,
NULL,
NULL,
1,
10,
TO_DATE('01/02/2008','MM/DD/YYYY')),
NULL
'333333'
);

INSERT INTO DELIVERY_TABLE(CONTRACT_NUMBER ,
SERVICE_DESIGNATOR_CODE ,
CAGE_CODE ,
FSC ,
NIIN ,
TERMINATION_CODE ,
CHALLENGE_CODE ,
CHALLENGE_DATE ,
REQUIRED_DATE ,
CANCEL_DATE ,
SHIP_DATE ,
RECEIPT_DATE ,
ADDED_DATE ,
UPDATE_DATE ,
REASON_FOR_DELAY_CODE ,
CHALLENGE_COMMENTS ,
QUANTITY ,
PRICE ,
AWARD_DATE ,
PR_INITIATION ,
NAICS ,
)
VALUES('CTRNO2',
'X',
'VENDOR_A',
NULL,
NULL,
NULL,
NULL,
NULL,
TO_DATE('01/02/2009','MM/DD/YYYY')),
NULL,
TO_DATE('01/30/2009','MM/DD/YYYY')),
TO_DATE('02/04/2009','MM/DD/YYYY')),
TO_DATE('03/01/2009','MM/DD/YYYY')),
NULL,
NULL,
NULL,
1,
10,
TO_DATE('01/02/2008','MM/DD/YYYY')),
NULL
'111111'
);


Now if I would like to know how the contractors delivery performance was for product screws and if the FSC code for it is "1111". I would also have to look
the corresponding NAICS code to determine if the vendor has also delivered the same products under the NAICS code as well so I have all the data to determine
how good/bad he was in his delivery of that product. To do meet the new requirements

1. I don’t think I can use a table JOIN in this case.
2. I am thinking of writing a procedure and have a cursor to get the distinct CAGE and FSC and then Loop through each one and get the records associated

with that FSC and related NAICS and then use
the algorithm to sum the results. The delivery table has 6 million records and I am wondering if it
would cause any performance issue if I go this route?


Can you think of a better way of accomplishing this? Thank you.



Tom Kyte
April 28, 2010 - 8:36 am UTC

the problem is, you have stated MANY TO MANY.

So, a single FSC may have *many* NAICS

A single NAICS maps to many FSC's


How do YOU propose to handle the obvious double counting that will happen when FSC 111 maps to NAICS 123, 456 and NAICS 123 maps to FSC 111, 222, 333.

What is the output of that.

Do you get what I'm saying - if there isn't a 1:1 relationship here - what are you going to do about the obvious double counting that not only can happen, but will happen?

And yes, a join is still the answer, I cannot fathom a reason why join or union would not be.


think about your data for a minute, your requirement MUST change - it has to now accept that there is this entirely new relationship and you need to tell YOURSELF what is to be done with this duplicate information - how is that to be handled.

Draw some pictures and use the many to many aspect of this data - do it by hand on a white board if you need to.

Kirthi Sreenivasan, April 28, 2010 - 10:06 am UTC

Hi tom,
I had a meeting today and mentioned that the NAICS must be separate from the FSC code to make sure data validation
is not made too complicated. Keeping it separate we can used the FSC and NAICS lookup tables to ensure the data is valid and that if the FSC/NAICS validation rules change it would be easier to incorporate it. Their response was that when the data gets loaded these codes will be validated and hence it is not necessary to have these as separate columns. It will gives us more flexibility and managebility in the future if new product classification types are introduced. Therefore my suggestion did not go through successfully. Their assumption is the data will be valid once it gets into these tables. The NAICS and FSC codes will not be changed from the application they are only added via the data load routines. I could not come up with any other reasons as to why not to use solution 3(product_type and product_code on the delivery table). Can you think of any other issues with the solution 3.



Tom Kyte
April 28, 2010 - 12:46 pm UTC

... It will gives us more flexibility
and managebility in the future if new product classification types are
introduced....

no it won't, it will make it be up to the stupid application to do that. That is what DATABASES do, validate data.

Remind them that software has a shelflife and this data will exist long and far after their tiny little programs have bitten the dust.

... Their
assumption is the data will be valid once it gets into these tables. ..

then they have never written code for money before, they are newbies, they do not understand reality.

Tell them what happens when you assume - you know the old joke, never would it be more appropriate.

A reader, April 28, 2010 - 11:46 am UTC

Tom,
To add on to my previous post there are going to be some quality records on contractors for the products they produce. Most of them would be product deficiency reports. A deficinecy report can be written for not just always one FSC but it can be written for multiple FSC's. Similarly for NAICS.

Now:
"Product_Deficiency" table will have all deficiency report
for products produced by vendors. Since a deficiency report can written for more than one FSC. The child table is used for storing the FSC codes along with a "serial_number" column as the primary key. The "serial_number" has a foreign key relationship to the "Product_Deficiency" tables primary key "serial_number"

CREATE TABLE Product_Deficiency
(SERIAL_NUMBER NUMBER constraint prod_def_pk PRIMARY KEY,
VENDOR_ID CHAR(5) CONSTRAINT prod_def_vid_nn NOT NULL,
DEFICIENCY_TYPE VARCHAR2(3) CONSTRAINT prod_def_deftyp NOT NULL,
);

CREATE TABLE product_deficiency_FSC(
SERIAL_NUMBER NUMBER CONSTRAINT prof_def_fsc_fk REFERENCES Product_Deficiency(SERIAL_NUMBER),
FSC CHAR(4) CONSTRINAT prof_def_fsc_fsc_nn NOT NULL,
CONSTRINAT prod_def_fsc_pk PRIMARY KEY(SERIAL_NUMBER, FSC)
);

After the NAICS addition the above structure is going to change to the below.

CREATE TABLE Product_Deficiency
(SERIAL_NUMBER NUMBER constraint prod_def_pk PRIMARY KEY,
VENDOR_ID CHAR(5) CONSTRAINT prod_def_vid_nn NOT NULL,
DEFICIENCY_TYPE VARCHAR2(3) CONSTRAINT prod_def_deftyp NOT NULL,
);

CREATE TABLE product_deficiency_FSC(
SERIAL_NUMBER NUMBER CONSTRAINT prof_def_fsc_fk REFERENCES Product_Deficiency(SERIAL_NUMBER),
PRODUCT_TYPE VARCHAR2(2),
PRODUCT_CODE VARCHAR(6),
CONSTRINAT prod_def_fsc_pk PRIMARY KEY(PRODUCT_TYPE, PRODUCT_CODE)
);

Do you see any reasons why not to go the above route. I just want to make sure we dont make a change that we regret later and I would like to give valid reasons why it might not be a good idea if there could be other problem we run into later.

My suggestion would have been do as below.

CREATE TABLE Product_Deficiency
(SERIAL_NUMBER NUMBER constraint prod_def_pk PRIMARY KEY,
VENDOR_ID CHAR(5) CONSTRAINT prod_def_vid_nn NOT NULL,
DEFICIENCY_TYPE VARCHAR2(3) CONSTRAINT prod_def_deftyp NOT NULL,
);

CREATE TABLE product_deficiency_FSC(
SERIAL_NUMBER NUMBER CONSTRAINT prof_def_fsc_fk REFERENCES Product_Deficiency(SERIAL_NUMBER),
FSC CHAR(4) CONSTRINAT prof_def_fsc_fsc_nn NOT NULL,
CONSTRINAT prod_def_fsc_pk PRIMARY KEY(SERIAL_NUMBER, FSC)
);


CREATE TABLE product_deficiency_NAICS(
SERIAL_NUMBER NUMBER CONSTRAINT prof_def_naics_fk REFERENCES Product_Deficiency(SERIAL_NUMBER),
NAICS CHAR(6) CONSTRINAT prof_def_fsc_naics_nn NOT NULL,
CONSTRINAT prod_def_fsc_pk PRIMARY KEY(SERIAL_NUMBER, NAICS)
);
Tom Kyte
April 28, 2010 - 12:53 pm UTC

do you have any honest to goodness data modelers to work with? this is bigger than a bread box, there is a lot of information to capture and work with. I've done the whole fsc, niin, cage code stuff in a former life - this is a well known area - there should be someone with lots of experience in this data to work with you?

A reader, April 28, 2010 - 12:53 pm UTC

Tom,
The data load routines are written as procedures in the database and they are used to validate the data before it goes into these tables. So the shelf life of the software they would say would not matter as the procedure does the validation and loads the data. The FSC and NAICS code are not updated after. I am not sure how else to convince them. Please advice.
Tom Kyte
April 28, 2010 - 1:02 pm UTC

It sounds like they do not want to be convinced. No matter what you say, they will say no.

So, tell me, do they control the data models or what - who is in charge of that. I know the military/government customers are typically very very particular about their data - they have more than one standard on how to store and deal with it.

A reader, April 28, 2010 - 1:04 pm UTC

I am working with J2EE application developers who have already written an application based on their design decisions and I guess no database person such as a DBA was involved to make these database design decisions. That is what it appears to be. At this stage until I find a real convincing reason to make them realize that it is not the right decision I am afraid they might not agree to make the change. There is a DBA who have worked with this app long enough to understand the data but it appears like she might not have been involved in the database design decision.


Oleksandr Alesinskyy, April 29, 2010 - 3:45 am UTC

I guess one important question was not asked yet (or I missed it) - are both classification mutually exclusive?
That means may some products have codes from both classifications? If the answer is yes then it completely disqualifies the solution 3.

A reader, April 29, 2010 - 7:11 am UTC

Tom, is this your post ("Oleksandr Alesinskyy")?

Can you please explain a little bit more, I did not understand. Thanks.
Tom Kyte
April 29, 2010 - 7:54 am UTC

I am not Oleksandr

he is basically saying what I've been saying:

this is not well defined, you guys haven't nailed down the data model, the relations are not well defined. The questions/queries you used to ask of the old data might well have no answer in the new world (due to that many to many relationship).

In other words, you need to take a huge step back, understand all of the data and put together a proper model to represent it and have the documentation available so that you can later say to someone 'the question you ask doesn't make sense since the data by definition works like this...'

A reader, April 30, 2010 - 2:56 pm UTC

CREATE TABLE DELIVERY_TABLE
(
CONTRACT_NUMBER VARCHAR2(32 BYTE) NOT NULL,
SERVICE_DESIGNATOR_CODE CHAR(1 BYTE) NOT NULL,
VENDOR_ID CHAR(5 BYTE) NOT NULL,
REQUIRED_DATE DATE NOT NULL,
CANCEL_DATE DATE,
SHIP_DATE DATE,
RECEIPT_DATE DATE,
ADDED_DATE DATE NOT NULL,
UPDATE_DATE DATE,
REASON_FOR_DELAY_CODE CHAR(2 BYTE),
CHALLENGE_CODE CHAR(1 BYTE),
CHALLENGE_DATE DATE,
Product_type CHAR(1) NOT NULL,
Product_code VARCHAR(6) NOT NULL,
CONSTRAINT DEL_TBL_PK PRIMARY KEY(CONTRACT_NUMBER,SERVICE_DESIGNATOR_CODE)
);

Each record in the above table will have either a "F" or "N" value for the "product_type" column. If it is an "F" the product_code would have an FSC value otherwise the product_type code must have "N" and the product_code will have a NAICS value. Each record in the above table represents a delivery information for a contract. This could be either a NAICS or FSC record type. A delivery record cannot be associated with both an NAICS and an FSC at the same time. Therefore each record in the above table would represent a NAICS delivery record or a FSC delivery record for a contractor.

Below is an example of the NAICS FSC relationship.


create table temp_naics_fsc(
naics char(6),
fsc char(4),
CONSTRAINT temp_naics_fsc_pk PRIMARY KEY(NAICS,FSC)
);

insert into temp_naics_fsc(naics, fsc)
values('451110','1005');

insert into temp_naics_fsc(naics, fsc)
values('451110','1010');

insert into temp_naics_fsc(naics, fsc)
values('451110','1015');

insert into temp_naics_fsc(naics, fsc)
values('332995','1010');

insert into temp_naics_fsc(naics, fsc)
values('332995','1015');

I need to get all FSC delivery records for a contractor and also the NAICS delivery records for the same contractor. Assign some weights based on how late they were and use that to calculate the their delivery score.

I understand the requirement but I am not sure why the above model would fail. I am not sure if this clears up the mutually exclusive question. If not can you explan a little bit more. Thank you.

A reader, April 30, 2010 - 3:01 pm UTC

Clarification:

I need to get all FSC delivery records for a contractor and also the NAICS delivery records (i.e., getting the NAICS related to an FSC and using that query the table to see if the same contractor has any NAICS delivery records) for the
same contractor. Assign some weights based on how late they were and use that to calculate the their delivery score

Table design

A reader, May 03, 2010 - 1:17 pm UTC

How would you design the following requirement.

A quality record can be an FSC or NAICS. If it is a quality record it can have more than one FSC associated with it. If it is a NAICS it can have more that one NAICS associated with it. A quality record needs to have atleast one FSC or NAICS associated with it.

Below is what I had in mind but, how would you incorporate the above requirements. If the below is not the right deisgn how would you do it instead.
Ex:

create table quality(
serial_number NUMBER CONSTRAINT quality_pk PRIMARY KEY,
contract_Number VARCHAR2(30) CONSTRAINT ctr_nn NOT NULL,
VENDOR_ID CHAR(5) CONSTRAINT vendor_nn NOT NULL,
RECEIVED_DATE DATE,
SHIP_DATE DATE
);

create table quality_FSC(
serial_number NUMBER,
FSC CHAR(4),
CONSTRAINT quality_seri_fsc_pk PRIMARY KEY(serial_number,fsc),
CONSTRAINT quality_serial_num_fk FOREIGN KEY(serial_number)
REFERENCES quality(serial_number)
);


create table quality_NAICS(
serial_number NUMBER,
NAICS CHAR(6),
constraint quality_naics_pk PRIMARY KEY(serial_number,naics),
CONSTRAINT quality_serial_num_NAICS_fk FOREIGN KEY(serial_number) REFERENCES quality(serial_number)
);


Thanks.
Tom Kyte
May 06, 2010 - 12:18 pm UTC

your requirement:

it can have
more than one FSC associated with it. If it is a NAICS it can have more that
one NAICS associated with it.


would necessitate an association object for each relation - which you partially have (each of your association objects is missing a foreign key to the table they are relating the quality table to)

as for:

A quality record needs to have at least one FSC or
NAICS associated with it.


that gets tricky - do not do this in the application since application developers tend to not understand that multiple things can happen in a database simultaneously and tend to underestimate the complexity of such a requirement.

probably the easiest way to implement that would be via a very very very small, simple trigger on the association objects:


something like this:

ops$tkyte%ORA11GR2> create table q
  2  ( id number primary key, /*other columns*/
  3    child_count number check ( nvl(child_count,0) > 0 ) deferrable initially deferred
  4  )
  5  /

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create table f ( q_id references q, f number, primary key(q_id,f) );

Table created.

ops$tkyte%ORA11GR2> create table n ( q_id references q, n number, primary key(q_id,n) );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create trigger f after insert or update or delete on f for each row
  2  begin
  3          if (updating or inserting)
  4          then
  5                  update q set child_count = nvl(child_count,0)+1 where id = :new.q_id;
  6          end if;
  7          if (updating or deleting)
  8          then
  9                  update q set child_count = nvl(child_count,0)-1 where id = :old.q_id;
 10          end if;
 11  end;
 12  /

Trigger created.

ops$tkyte%ORA11GR2> create trigger n after insert or update or delete on n for each row
  2  begin
  3          if (updating or inserting)
  4          then
  5                  update q set child_count = nvl(child_count,0)+1 where id = :new.q_id;
  6          end if;
  7          if (updating or deleting)
  8          then
  9                  update q set child_count = nvl(child_count,0)-1 where id = :old.q_id;
 10          end if;
 11  end;
 12  /

Trigger created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into q (id) values ( 1 );

1 row created.

ops$tkyte%ORA11GR2> commit;
commit
*
ERROR at line 1:
ORA-02091: transaction rolled back
ORA-02290: check constraint (OPS$TKYTE.SYS_C0012776) violated


ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into q (id) values ( 1 );

1 row created.

ops$tkyte%ORA11GR2> insert into f (q_id,f) values (1,1000 );

1 row created.

ops$tkyte%ORA11GR2> commit;

Commit complete.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> delete from f;

1 row deleted.

ops$tkyte%ORA11GR2> insert into n (q_id,n) values (1,12345);

1 row created.

ops$tkyte%ORA11GR2> commit;

Commit complete.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> delete from n;

1 row deleted.

ops$tkyte%ORA11GR2> commit;
commit
*
ERROR at line 1:
ORA-02091: transaction rolled back
ORA-02290: check constraint (OPS$TKYTE.SYS_C0012776) violated



Many to Many relationship

A reader, May 03, 2010 - 11:08 pm UTC

Tom,
This response answers your question posted on April 27th.

"I fail to see how if a FSC has many NAICs and a NAICs has many FSCs you can report by cage code and fsc. You would end up obviously "multiplying records out"

You are doing a report not for a single FSC, but for ALL FSC codes. That is the query I see (did you see a different one?). If you do that - which FSC should that NAICS map to - if it maps to more than one, you will obviously double, tripe, whatever count that record more than once. "

Let us say vendor "A" has manufactured product "X" and deliverd it to the NAVY who use FSC code "1234" to identify the product. The same vendor "A" has manufactured the same product and delivered it to the Federal government who use NAICS code "123456". Let us say the same FSC product can also be covered by NAICS code "78901" which covers stationaries. A NAICS code covers a broader range of products for example "Office Supplies" whereas the FSC code is used a more specific product for example like to make "0.5mm pencil leds". The more information we have on a contractor to produce a certain class of products we have more information to decide if you would like to purchase products that fall under that category from him. Therefore to rate a vendor to produce a certain class of products I need to include not only the records I have on the vendor to produce an FSC but also the related NAICS records that he has delivered. Therefore
in the above example to evaluate vendor A's ability to produce FSC 1234 I need to use the information received from the NAVY on him to identify how well he performed and also the information I receive from the Federal government on him to produce NAICS "123456" and "78901". There can be more than one NAICS code that relate to this FSC. Therefore I need to consider records that the vendor might have on these NAICS as well in addition to the FSC records.

"How do YOU propose to handle the obvious double counting that will happen when FSC 111 maps to NAICS 123, 456 and NAICS 123 maps to FSC 111, 222, 333.
What is the output of that."

When I score the Contractor based on FSC I would want to include records that the contractor has on that FSC i.e., 111 and also records he has on NAICS 123 and 456. The records that the contractor has on FSC "222" and "333" will not be included in this. (The project manager does not want these to be included)

When I score the contractor based on NAICS I would want to include records that the contractor has on NAICS 123 and also records he has for FSC 111, 222 and 333.


Tom Kyte
May 06, 2010 - 12:48 pm UTC

then, just join. why do you think "i cannot join"

if you have the data modeled in the classic 1:m relationship, what is the issue with joining? You keep stating "I cannot", I want to understand why you think this.

You have a mapping table, just join to it, it'll give you a set of records that have the parent record repeated for each and every child record you join to. What you do with that - is up to you.

A reader, May 04, 2010 - 12:55 pm UTC

Tom, I was reading your forum on EAV and came across the following at the following link http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2314483800346542969

"Tom,

What about index on (ATTRIBUTE, VALUE)
Then gather histograms.
That would be better than nothing, eh?

(we are stuck with this app)

Thanks,

Robert.



Followup February 1, 2010 - 4pm Central time zone:

why would that help?

The problem with EAV is not getting *a row*, but rather putting together your row from the N'thousand of EAV rows that make it up and comparing that to other rows.


If you have a table object

and a table object_attributes

and into object_attributes you stick things like:

(obj=100,attr=first_name,val=thomas)
(obj=100,attr=last_name,val=kyte)

finding all of the thomas kytes and printing out their full name, address and so on requires a multi (many multi) table outer join done over and over to pivot the data and then apply a where clause.

No indexing strategy will change that.
"

1. So creating indexes on the (ATTRIBUTE, VALUE) will not improve performance?? Can you please explain why. Can you also please compare it with processing if these attributes were stored as a separate column in the table (the recommended option).

Thanks.

.



A reader, May 04, 2010 - 1:59 pm UTC

I have your "Expert Oracle Database Architecture 9i and 10g programming techniques and solutions". Do you discuss the EAV design problems and how the SQL is processed to retrieve results for a query in this book. Do you also compare it to the way the SQL is processed using a recommended option in the book.

I have a meeting tomorrow and I would like to read up on these to explain why even creating an index on the attribute, value pair in an EAV design might not help
imporving the performance of the query.

Any information on this would be greatly useful.

A reader, May 05, 2010 - 9:26 am UTC

If I have an EAV table and if I have no more than a attributes for each record and this is stored in a child table. A record can have couple of values for this attribute. Would this cause a performance degradation. I am thinking it probably would not cause a performance degradation because as per the below article you mention the problem is not finding the row using an index but piecing it together. If I do not have many attributes to piece together (in this case just one attribute with multiple values for each record) it should not affect performance right? If the performance would not degrade would you recommend using the EAV model in these types of situations?

CREATE TABLE object(id number constraint object_pk primary key,
comments varchar2(30));

CREATE TABLE object_attribute(id NUMBER, attribute_name VARCHAR2(15), attribute_value VARCHAR2(30),
CONSTRAINT obj_attri_fk FOREIGN KEY(id) REFERENCES object(id) on delete cascade,
constraint obj_attr_pk PRIMARY KEY(id,attribute_name,attribute_value));

insert into object(id, comments)
values (1,"some comment..");

insert into object_attribute(id,
attribute_name, attribute_value)
values(1,'FSC','1234');


insert into object_attribute(id,
attribute_name, attribute_value)
values(1,'FSC','5678');


insert into object_attribute(id,
attribute_name, attribute_value)
values(1,'FSC','9012');


insert into object(id, comments)
values (2,"some comment for record 2.");


insert into object_attribute(id,
attribute_name, attribute_value)
values(2,'NAICS','999999');


insert into object_attribute(id,
attribute_name, attribute_value)
values(2,'NAICS','666666');

Below is an excerpt from one of the discussions that states the problem with EAV is not finding the row but piecing the different attributes for each record.


http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:2314483800346542969
"
Tom,

What about index on (ATTRIBUTE, VALUE)
Then gather histograms.
That would be better than nothing, eh?

(we are stuck with this app)

Thanks,

Robert.



Followup February 1, 2010 - 4pm Central time zone:

why would that help?

The problem with EAV is not getting *a row*, but rather putting together your row from the N'thousand of EAV rows that make it up and comparing that to other rows.


If you have a table object

and a table object_attributes

and into object_attributes you stick things like:

(obj=100,attr=first_name,val=thomas)
(obj=100,attr=last_name,val=kyte)

finding all of the thomas kytes and printing out their full name, address and so on requires a multi (many multi) table outer join done over and over to pivot the data and then apply a where clause.

No indexing strategy will change that.
"

NUMBER (< p >, < s >)

rohit richhariya, May 09, 2010 - 5:44 am UTC

Value Datatype Stored Value
123.2564 NUMBER 123.2564
1234.9876 NUMBER(6,2) 1234.99
12345.12345 NUMBER(6,2) Error
123456 NUMBER(6,2) Error
1234.9876 NUMBER(6) 1235
123456.1 NUMBER(6) 123456
12345.345 NUMBER(5,-2) 12300
1234567 NUMBER(5,-2) 1234600
12345678 NUMBER(5,-2) Error Outside
123456789 NUMBER(5,-4) 123460000
1234567890 NUMBER(5,-4) Error
12345.58 NUMBER(*, 1) 12345.6
0.1 NUMBER(4,5) Error
0.01234567 NUMBER(4,5) 0.01235
0.09999 NUMBER(4,5) 0.09999
0.099996 NUMBER(4,5) Error


i am unable to understand of above example please help me and describe how Number data type rotate value, how it is work?
Tom Kyte
May 10, 2010 - 6:33 pm UTC

what do you mean by "rotate value"

Do you have access to the sql language guide? (rhetorical question, yes you do, have you read it?)

http://docs.oracle.com/docs/cd/E11882_01/server.112/e10592/sql_elements001.htm#sthref119

NUMBER(p,s)

where:

    *

      p is the precision, or the maximum number of significant decimal digits, where the most significant digit is the left-most nonzero digit, and the least significant digit is the right-most known digit. Oracle guarantees the portability of numbers with precision of up to 20 base-100 digits, which is equivalent to 39 or 40 decimal digits depending on the position of the decimal point.
    *

      s is the scale, or the number of digits from the decimal point to the least significant digit. The scale can range from -84 to 127.
          o

            Positive scale is the number of significant digits to the right of the decimal point to and including the least significant digit.
          o

            Negative scale is the number of significant digits to the left of the decimal point, to but not including the least significant digit. For negative scale the least significant digit is on the left side of the decimal point, because the actual data is rounded to the specified number of places to the left of the decimal point. For example, a specification of (10,-2) means to round to hundreds.

A reader, May 11, 2010 - 7:30 am UTC

Hello Tom,
Can you please say what your responses are for the posts I made on May 4th and 5th. I appreciate it. Thank you.

To ansewer your question as to why I am thinking I cannot join? I can join but see this is what I am thinking.

1. Let's say I write a sql to retrieve the weighted score for each vendor and FSC pair. Then group by vendor and FSC to get the total weighted score for each pair.

2. I write another sql to retrieve the weighted score for each vendor NAICS pair. The group by vendor and NAICS to get the total weighted score for each of these pairs.

3. I have the mapping table which has the list of NAICS that an FSC is related to.

4. In the below SQL I join the results as below

(1 join 3) join (2)

but the results would give me the total fsc score multiple times if there are more than one NAICS associated with that FSC. I want a result like below, I do not want to double count the "total_fsc_score" if it has more than one NAICS associated with it.

vendor FSC total_score (fsc+naics weighted score)
ABCD 1111 9 (4+2+3)



SELECT fn.vendor,
fn.fsc,
fn.total_fsc_score,
fn.naics,
na.totalScore as total_naics_score
FROM (SELECT f.vendor,
f.fsc,
f.totalScore AS total_fsc_score,
c.naics
FROM ( SELECT vendor, FSC, SUM (score) as totalScore
FROM (SELECT 1 AS serial_number,
'ABCD' AS vendor,
'1111' AS FSC,
1 AS score
FROM DUAL
UNION ALL
SELECT 1 AS serial_number,
'ABCD' AS vendor,
'1111' AS FSC,
1 AS score
FROM DUAL
UNION ALL
SELECT 2 AS serial_number,
'ABCD' AS vendor,
'1111' AS FSC,
2 AS score
FROM DUAL
UNION ALL
SELECT 3 AS serial_number,
'EFGH' AS vendor,
'2222' AS FSC,
1 AS score
FROM DUAL) a
GROUP BY vendor, fsc) f,
(SELECT '1111' AS FSC, '99999' AS NAICS FROM DUAL
UNION ALL
SELECT '1111' AS FSC, '88888' AS NAICS FROM DUAL) c
WHERE f.fsc = c.fsc(+)) fn,
( SELECT vendor, naics, SUM (score) AS totalScore
FROM (SELECT 1 AS serial_number,
'ABCD' AS vendor,
'99999' AS NAICS,
1 AS score
FROM DUAL
UNION ALL
SELECT 1 AS serial_number,
'ABCD' AS vendor,
'99999' AS NAICS,
2 AS score
FROM DUAL
UNION ALL
SELECT 2 AS serial_number,
'ABCD' AS vendor,
'88888' AS NAICS,
2 AS score
FROM DUAL
UNION ALL
SELECT 3 AS serial_number,
'EFGH' AS vendor,
'77777' AS NAICS,
1 AS score
FROM DUAL) b
GROUP BY vendor, naics) na
WHERE fn.vendor = na.vendor(+) AND fn.naics = na.naics(+)

May be I need to rethink how I would rewrite the SQL to derive at the result I want. It is the above reason why I was mentioning I cannot use a join.

Kirthi

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library