I have a slightly different twist to the original question
Mohsin, October 25, 2002 - 4:57 pm UTC
Given a database/schema how can we determine the level of normalization the schema underwent.
October 26, 2002 - 12:31 pm UTC
you would reverse engineer it into some tool and study the results I suppose. I've (very frankly) never even considered doing anything like that, ever.
Mike, October 26, 2002 - 11:25 pm UTC
I did not quite understand your statement:
"
The cost associated with an
UPDATE (of which you do as many as you do reads typically) more than offsets any
perceived gain by denormalization.
"
Could you elaborate on that with a test example?
Thank you very much
October 27, 2002 - 11:18 am UTC
A normalized structure might have:
table customer ( cust_id, cust_name, cust_address );
table orders ( order_id, cust_id );
someone might denormalize that to:
table customer ( cust_id, cust_name, cust_address );
table orders ( order_id, cust_id, cust_name, cust_address );
in order to avoid the "join" from customer to orders during a select. However, they have massively impacted:
o inserts into orders (how many times do you call up an order, really? does what
you gained on SELECTs pay off the penalty you imposed on inserts?)
o updates to customer, which must cascade all over the place
databases were born to join. My comment was that people try to avoid the join -- not realizing they have just moved the "cost" elsewhere. Did they do the right thing? Many times *not*.
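As a sketch of where the cost moves (bind variable names here are hypothetical), a single customer address change in the normalized model touches one row; in the denormalized model the same logical change must also cascade into every order row for that customer:

```sql
-- normalized: one row touched
update customer set cust_address = :new_addr where cust_id = :id;

-- denormalized: the same logical change must also be applied to orders,
-- touching as many rows as the customer has orders
update customer set cust_address = :new_addr where cust_id = :id;
update orders   set cust_address = :new_addr where cust_id = :id;
```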
How about small lookup tables?
Doug C, October 27, 2002 - 3:33 pm UTC
Tom - if databases are born to join, how do you feel about small lookup tables that show up in a bazillion queries and add an additional join level? What I mean is the kind of table where there is nothing more than a description for a type of order or something like that. I'm guessing you are going to say that if it is read quite frequently then the blocks will be in the buffer cache and the overhead is minimal. Even so, it seems like the kind of thing that can be easily enforced in the app as well (i.e., forcing values 1-5 or something rather than joining with a lookup table containing 1-5). What do you prefer to do?
October 27, 2002 - 3:41 pm UTC
are 1-5 constants forever and ever and ever? (they never are -- you'll add 6 later, remove 1)
If 1-5 are well known, take the approach we did in the data dictionary for many things -- views with DECODEs. Using FBIs (function based indexes) you could even "where on them" if you like.
Works well for very small lookup tables.
What difference does it make if ...
Pasko, October 28, 2002 - 3:29 am UTC
Hi Tom,
In your example above ...
table customer ( cust_id, cust_name, cust_address );
table orders ( order_id, cust_id );
What difference does it make when I specify or do not specify the foreign key from ORDERS to CUSTOMER?
Some people would still call that denormalized if I omit the foreign key.
What's your take on that, Tom?
Best Regards.
October 28, 2002 - 7:29 am UTC
A rose by any other name is still a rose.
If it walks like a duck, talks like a duck -- hey, it must be a duck.
I would not call that denormalized, no. There is no redundant data -- I left out primary and foreign keys for the example.
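For completeness, the keys left out of the earlier sketch would look something like this (datatypes are assumptions, not from the original):

```sql
create table customer
( cust_id      number primary key,
  cust_name    varchar2(80),
  cust_address varchar2(240)
);

create table orders
( order_id number primary key,
  cust_id  number not null references customer
);
```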
What is your opinion?
Thaha Hussain, May 06, 2003 - 3:18 am UTC
Dear TOM,
Please see a few tables from my computerised library:
LIB_MASTER
=========
MASTER_CODE
TITLE
AUTHOR_ID
......
......
......
LIB_CHILD
==========
ITEM_CODE
MASTER_CODE
AVAILABLE DEFAULT 'Y' <<---- Please Note this field
BOOK_ISSUE
==========
ITEM_CODE
MEMBER_ID
ISSUE_DATE
RETURN_DATE
RETURNED_DATE
.........
.........
When a book is issued, the transaction is recorded in the BOOK_ISSUE table and the field AVAILABLE is set to 'N' for the corresponding ITEM_CODE in the LIB_CHILD table.
RETURNED_DATE is filled in when the book is returned. (Suppose no renewal!)
My doubt is:
We will know whether a particular book is available or not by querying the
BOOK_ISSUE table. So there is no need for the field AVAILABLE in the LIB_CHILD
table. If it is there, is it against normalization? What is your opinion?
May 06, 2003 - 7:56 am UTC
Normalization is overrated.
Denormalization can be taken to frightening extremes.
The answer usually lies somewhere in between the two extremes.
Depends on the volume of queries against this system and their nature. If you query up "a" row from lib_child to see if it is available and do not ask questions like "how many books are available" "how many are not available", then a view like:
create view lib_child
as
select ......, nvl( (select 'N'
from book_issue
where item_code = lib_child_table.item_code
and returned_date is null ), 'Y' ) available
from lib_child_table;
might be easier to maintain.
If I asked those questions frequently however -- I might want Y/N to physically appear in that table and would pay the price on insert/update/delete on book_issue in order to have it be so.
Since there could be mutating table issues if you try to maintain this field via a trigger (assuming referential integrity is in place here), you might lean toward the view -- an index on book_issue(item_code,returned_date) would make it fairly efficient.
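The index suggested above would be, in DDL terms (the index name is just illustrative):

```sql
create index book_issue_idx on book_issue( item_code, returned_date );
```

With that in place, the scalar subquery in the view becomes a small index range scan per row looked up.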
Thank You!!!
Thaha Hussain, May 06, 2003 - 8:53 am UTC
Dear Tom,
Thank You!!
-Thaha Hussain
Which is the best practice?
TH, July 14, 2003 - 2:37 am UTC
Dear Tom,
We are extending one of our software to a new edition.
Right now we have a table like this
=======================================
TABLE CLASSES:-
class# | column2 | column3
-----------------------------------
Level1 | -- | --
Level2 | -- | --
Level3 | -- | --
Level4 | -- | --
Level5 | -- | --
Level6 | -- | --
Level7 | -- | --
Column2 & Column3 are a few columns which are outside the scope of this discussion.
We introduced a new table to group these levels for taking reports based on that.
PROGRAM_MASTER:-
Program_id # | Description
-----------------------------
ID1 | First program
ID2 | Second program
ID3 | Third program
ID4 | Fourth program
===================================================================
Question:-
Which is the best practice among the following?
CASE I
Adding a new column (Program_id) to the table CLASSES
TABLE CLASSES:-
class# | column2 | column3 | Program_id
---------------------------------------------------------
Level1 | -- | -- | ID1
Level2 | -- | -- | ID1
Level3 | -- | -- | ID1
Level4 | -- | -- | ID2
Level5 | -- | -- | ID2
Level6 | -- | -- | ID3
Level7 | -- | -- | ID3
CASE - II
Creating a new table as follows?
PROGRAMS_AND_LEVELS:-
Program_id | Level
------------------------------
ID1 | Level1
ID1 | Level2
ID1 | Level3
ID2 | Level4
ID2 | Level5
ID3 | Level6
ID3 | Level7
Thanks a lot!
July 14, 2003 - 9:19 am UTC
since it appears to be a classic "1 to M" relationship -- not a "many to many" -- the first approach, using a foreign key, would be the one I would lean towards.
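In DDL terms, Case I would be implemented roughly as follows (the datatype and constraint name are assumptions):

```sql
-- add the foreign key column to the existing table
alter table classes add program_id varchar2(10);

-- enforce the 1-to-M relationship declaratively
alter table classes add constraint classes_program_fk
  foreign key (program_id) references program_master;
```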
In your new book...
Kashif, July 14, 2003 - 3:56 pm UTC
Hi Tom,
Do you cover normalization/denormalization in your new book? I did read somewhere that you discuss how to design efficient schemas, and I assumed one of the sections under that would be normalization. Thanks.
Kashif
July 15, 2003 - 1:04 am UTC
efficient schemas talks about the physics of data.
Me, I've never been too overly concerned about getting my data into the 30th normal form, close to nirvana.
I'm more worried about getting at it *fast*.
In a way i guess I do talk about normal/denormal -- but I never call it that. I call it "implementing to answer your most frequently asked questions". I'll organize the data in the way that
a) is easy enough to access
b) gives the best performance for the most frequent types of access
eg: I'll store the data in a format perfect for querying but terrible for writing if 90% of my access is reading the data (I'll penalize the writes for the benefit of the reads).
I'll do the opposite if the opposite is true.
Clarification?
Mark Wooldridge, December 03, 2003 - 9:23 pm UTC
Can you clarify your remark...
If 1-5 are well know, take the approach we did in the data dictionary for many
things -- views with DECODES. Using FBI's (function based indexes) you could
even "where on them" if you like.
Works well for very small lookup tables.
I kind of understand the decode concept but the FBI concept is still confusing.
December 04, 2003 - 7:43 am UTC
ops$tkyte@ORA920PC> create table t ( cd int, data varchar2(80) );
Table created.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> create or replace view v
2 as
3 select cd, decode( cd, 1, 'code 1',
4 2, 'code 2',
5 3, 'code 3',
6 'unknown' ) cd_str, data
7 from t
8 /
View created.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> create index t_idx on t( decode( cd, 1, 'code 1',
2 2, 'code 2',
3 3, 'code 3',
4 'unknown' ) )
5 /
Index created.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec dbms_stats.set_table_stats( user, 'T', numrows => 1000000, numblks => 100000 );
PL/SQL procedure successfully completed.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> set autotrace traceonly explain
ops$tkyte@ORA920PC> select * from v
2 where cd_str = 'unknown'
3 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=10000 Bytes=550000)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=10000 Bytes=550000)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=4000)
ops$tkyte@ORA920PC> set autotrace off
what I meant was -- you can index the "decode" and then "where" on the decode and still use the index efficiently.
Do you consider this as denormalization
A reader, March 24, 2005 - 1:40 pm UTC
Hi Tom,
We built school management software, which manages hundreds of schools within a district. Most of the data is relevant to one school only.
For now, we have the following tables:
SCHOOLS (SCHOOLID PK)
STUDENTS (STUDENTID PK, SCHOOLID)
STUDENT_GRADES(STUGRADEID PK, STUDENTID)
A studentid always has a fixed schoolid; if he transfers, we create a new studentid/schoolid for him. Currently a table like STUDENT_GRADES contains 50 million rows. I am thinking of adding SCHOOLID to it and partitioning the table as well. Do you think this is a good idea?
And do you think this is denormalization?
March 24, 2005 - 4:02 pm UTC
don't see the point of student_grades?
RE: Do you consider this as denormalization
Bill Schwartz, March 24, 2005 - 4:22 pm UTC
I have a good question for you, reader:
As I work in education: how could you possibly track your students over time if you are issuing new student ids each time a student changes schools? How can your elementary school educators get feedback on how well they teach based on their former students' middle school grades?
Seems to me you need to re-think your design there.
March 24, 2005 - 4:27 pm UTC
yeah, it seemed a little strange to me too.
oltp + dss
riyaz, March 25, 2005 - 12:47 am UTC
I fully agree with your answer on normalization of databases.
But in our case, we have a fully OLTP system that also updates many tables (for report purposes) through triggers, online. So our system is both OLTP and DSS.
e.g., a sales transaction has 2 tables, header and detail, and the same data is updated in a 3rd table which holds a summary of the (sales) data, to be used for MIS reporting.
Here the 3rd table used for MIS purposes is a denormalized one, but it is updated online. (The above is the case for other modules also.)
In your opinion, is our design ok or bad?
March 25, 2005 - 7:47 am UTC
not sure what you mean here.
explanation
riyaz, March 25, 2005 - 9:07 am UTC
our system is OLTP + DSS
i.e. many reports are being generated.
we have a table design like this:
1. sales_head (one row per invoice - invoice no, date, party etc.)
2. sales_det (product-wise details - invoice no, date, prod, qty etc.)
3. dss_sales (date, prod code, party code, qty, amount)
this table will hold a summary of the data, which will be used for reporting.
And this table is being updated through a trigger on sales_det. (The data is required for reporting online.)
Here, sales_head and sales_det are in line with normalization, but dss_sales is a denormalized one. (repeated columns of sales_head/det)
In your view, OLTP should be normalized and DSS can be denormalized.
But in our case, for OLTP, we are forced to have a denormalized table. Is it ok? Please reply.
March 25, 2005 - 6:19 pm UTC
you are penalizing your OLTP work to the benefit of the reports.
If you are OK with that, so be it.
One might use materialized views refreshed from time to time to denormalize for reporting and be less impacting on the OLTP data.
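A sketch of the materialized view approach, using the poster's table names (the exact grouping columns and refresh policy are assumptions):

```sql
create materialized view dss_sales
build immediate
refresh complete on demand
as
select inv_date, prod_code, party_code,
       sum(qty)    qty,
       sum(amount) amount
  from sales_det
 group by inv_date, prod_code, party_code;
```

The refresh then runs from a scheduled job (e.g. `exec dbms_mview.refresh( 'DSS_SALES' )`), keeping the maintenance cost off the OLTP transaction path instead of paying it in a trigger on every insert.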
Re: Do you consider this as denormalization
A reader, April 01, 2005 - 8:24 pm UTC
Do you consider this as denormalization March 24, 2005
Reviewer: A reader
I am the poster of that post - sorry, I have been so busy with my project that I have had no time to check my favorite internet website, asktom.com, lately.
I'd better rephrase my question:
We have tables like:
SCHOOLS (SCHOOLID PK, SCHOOLNAME etc...)
+->STUDENTS (STUDENTID PK, SCHOOLID FK, STUDENTNAME etc...)
+->STUDENT_GRADES(STUGRADEID PK, STUDENTID FK, GRADEID FK, ...)
+->STUDENT_GRADE_ITEMS(STUGRADEITEMID PK, STUGRADEID FK,...)
+->CLASSES
+->CLASS_ATTENDANCES
+->SUBJECTS
+->SUBJECT_GRADES
+->PERIODS
+->PERIOD_ATTENDANCES
As you can see this is a multiple-level hierarchy of tables and the schoolid is only propagated to the second level tables.
There are two things guaranteed:
1. A row in a second level table will never change its schoolid. (we have another mechanism to handle student transfers etc.)
2. There are tables in the system where we need to visit 4 or 5 steps up the hierarchy to get the schoolid.
Given these conditions, do you think it a good idea to propagate schoolid to these "grandchild tables" for better performance?
Thanks.
April 02, 2005 - 9:10 am UTC
I still don't get student_grades. It seems its primary key should begin with studentid -- and what is gradeid a FK to?
I cannot answer if it would be a good idea or not, not knowing the questions you ask of the data, whether you need to make it so that inserts perform as well as they could be or what.
Re: Do you consider this as denormalization
A reader, April 02, 2005 - 10:42 am UTC
All our tables use surrogate, single-column, meaningless, autogenerated PKs. We are mainly concerned about the performance of queries; some of them involve joins over 20 tables and take dozens of minutes.
As you can tell, without this schoolid in the "grandchild" tables the joins are very expensive. But if I add the schoolid to the "grandchild" tables and join all participating tables on schoolid as well, then the query becomes very efficient, because the rows are filtered by schoolid before being joined together. I improved one query from 10 minutes to 1 second by doing this. I am just wondering if this is a common technique.
Interestingly, this database is actually a M$ one, but I couldn't find a good forum like yours. I guess the tuning methodology of all databases is similar.
April 02, 2005 - 11:33 am UTC
and the addition of these meaningless keys does what for you? HOW can they help?
The addition of these meaningless keys is a bad idea. Ok, one way to look at this, according to your model:
We have tables like:
SCHOOLS (SCHOOLID PK, SCHOOLNAME etc...)
+->STUDENTS (STUDENTID PK, SCHOOLID FK, STUDENTNAME etc...)
+->STUDENT_GRADES(STUGRADEID PK, STUDENTID FK, GRADEID FK, ...)
+->STUDENT_GRADE_ITEMS(STUGRADEITEMID PK, STUGRADEID FK,...)
A school has students. Students are unique within a school (there is no reason they have to be "uniquely unique" here; in your model, a student is a detail record OF A SCHOOL, and you've already said that if we move schools, we get a new student)
so, why isn't it:
schools( schoolid pk, .... )
students( (schoolid fk,studentid) pk, ..... )
student_grades( (key from students+whatever) pk, .. )
In your model (which others believe to be a bit "wrong" -- I agree, a student is a student and you should not be giving them a new id when they move schools), the natural key of a student is their schoolid and their studentid within that school.
the tuning methodology of databases is frankly what makes them different. Therein lie our LARGEST differences!!! An implementation that works in Oracle, with its non-blocking reads and multi-versioning, will not work the *same*, nor as well, in SQL Server.
The reverse is frequently true as well.
Re: Do you consider this as denormalization
A reader, April 02, 2005 - 11:02 am UTC
More info: our system contains data for more than 200 schools, and most reports deal with one single school only. Some fact tables contain tens of millions of rows.
GRADEID is a FK to a GRADES lookup table, which references SCHOOLS too (each school has 12 grades), so for 200 schools there will be 2400 rows in GRADES.
I know it's hard for you to say good or bad. But would you please tell me whether you think this kind of tuning is common or not?
April 02, 2005 - 11:35 am UTC
I'd use compound natural keys, they are OK, you are allowed to.
To create a student, you need to know the school. The student's primary key includes the schoolid and the unique student id within that school.
To give a student a grade, you need the student's primary key (schoolid, studentid)
and so on.
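As DDL, the compound natural key approach described here might look like the following (datatypes and constraint names are assumptions):

```sql
create table schools
( schoolid   number primary key,
  schoolname varchar2(80)
);

create table students
( schoolid    number references schools,
  studentid   number,
  studentname varchar2(80),
  constraint students_pk primary key (schoolid, studentid)
);

create table student_grades
( schoolid  number,
  studentid number,
  gradeid   number,
  constraint student_grades_fk foreign key (schoolid, studentid)
      references students,
  constraint student_grades_pk primary key (schoolid, studentid, gradeid)
);
```

Note how schoolid propagates naturally down the hierarchy as the leading edge of each key, so filtering by school needs no extra denormalized column.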
Re: Do you consider this as denormalization
A reader, April 02, 2005 - 11:43 am UTC
Thanks Sir.
With the compound PK approach, I am concerned about the number of foreign key columns. With 5 or 6 levels down the hierarchy, you might have 6 columns in a single foreign key, which would be a concern (I don't know why, but I think it doesn't smell good).
Anyway thanks for shedding light on this.
April 02, 2005 - 12:00 pm UTC
in Oracle it would not be a problem.
in other databases, who knows.
In fact, since foreign keys almost always have to be indexed in Oracle, it would be a good thing.
t1( a pk )
t2( a fk, b, (a,b) pk )
t3( a, b, c, (a,b) fk, (a,b,c) pk )
we would have a t1_pk, t2_pk, t3_pk and since the foreign keys are on the leading edge of the index, all is well *IN ORACLE*.
http://asktom.oracle.com/~tkyte/unindex/index.html
Now, you do this
t1( a pk )
t2( X pk, a fk )
t3( Y pk, a, b, X fk )
so you have t1_pk, t2_pk, t3_pk as indexes AND t2_a, t3_X
as indexes....
AND you have the extra columns. So, since every index adds work, you've just made it slower to insert -- and we still have to do the fkey check (1 column or 5 columns, it is just bytes in a key to us).
So, you have fatter tables and more indexes.
I have skinnier tables with fewer indexes.
I'm having two issues with this conversation (don't take this the wrong way, I'm not rebuking you or anything, I think this is actually instructive)
a) you are asking for Oracle advice for SQL Server implementations. A logical model, sure, we can help there, but when you get to physical schemas, if you are concerned about performance, you had best get a SQL Server guy on board, and fast.
b) You are solving hypothetical issues. I like to work on real problems myself. Before solving a hypothetical issue, I like to be relatively sure it is a real issue.
can this table be normalized
A reader, October 25, 2005 - 9:14 am UTC
Hi
We have a user request application running on Oracle 9.2.0.6.
The workflow is as follows:
1. User sends a new request to support level 1
2. Support level 1 tries to resolve the request or issue
3. If support level 1 cannot resolve it, it escalates to level 2
4. If level 2 cannot resolve it, it is sent to level 3
The main table, where all requests go, has over 80 columns; there are three columns indicating the level where the request currently is, and some columns indicating when the request was sent from level to level.
I wonder if the level of support can be normalized, i.e. have another level table with three rows indicating levels 1, 2 and 3, so that if in future we have a new level we don't have to add columns.
Is this a technique to tackle this sort of problem?
October 26, 2005 - 7:20 am UTC
I don't know why you would have 3 columns for the 3 levels in the first place? I didn't get that part - it seems you would have a single column "level" (don't call it that, there is a pseudo-column LEVEL already) in the first place.
can this table be normalized
A reader, October 26, 2005 - 10:13 am UTC
Hi
We have three columns because each row in the table is a request; if we inserted a new row every time a request moved from level 1 to 2 or 2 to 3, our natural key, request_id, would be useless. If we use surrogate keys then we cannot perform joins!
October 27, 2005 - 2:50 am UTC
why would you insert a new row every time a request moves from level 1 to 2 or 2 to 3 - why would you not just
update t set level = 2 where key = value;
to move it from 1 to 2.
can this table be normalized
A reader, November 04, 2005 - 9:22 am UTC
Hi
Sorry for the late reply, I have been travelling.
If we don't insert a new row we cannot know from what level it moved to what level. If I simply update the level I don't know what the previous level was.
November 04, 2005 - 5:08 pm UTC
well, would it not be "the previous level"??????
(level minus one)
can this table be normalized
A reader, November 05, 2005 - 6:54 am UTC
Hi
It can move from 1 to 2 then 2 to 3 but also moved from 3 to 1 or 3 to 2 :(
November 05, 2005 - 5:55 pm UTC
well, the first question I would ask is "is it relevant to remember?" (could be a silly question; then again, you might not have asked it of yourself)
and then I might consider "current_status" and "last_status" as attributes if I needed the last and current status to be available.
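A minimal sketch of that attribute pair (the table name, column names and datatypes here are hypothetical):

```sql
alter table requests add ( current_status number, last_status number );

-- moving a request to level 2, remembering where it came from:
update requests
   set last_status    = current_status,
       current_status = 2
 where request_id = :id;
```

This remembers exactly one prior level; if the full movement history matters, a separate history table written on each transition would be needed instead.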
Normalization
Sujit Mondal, May 04, 2006 - 4:33 am UTC
This discussion is really helpful.
I have a business scenario like below:
We have to store and retrieve customer information. It has 3 parts:
1) Customer General Information (general details attributes, general address attributes)
2) Customer Distribution Information (distribution details attributes, distribution address attributes)
3) Customer Contract Information (contract details attributes, contract address attributes).
The 3 parts with all their attributes will be inserted, updated or retrieved separately from the application. In no situation does the business want to see all three sets of information at the same time. The relationship between the 3 entities is one to one.
To model this business requirement we have come up with the following two ways. Can you please guide us to choose the best way - or, if neither of the two is good, then what would be the best way?
Model 1:
---------------------------------------------------
3 separate tables for general, distribution, contract with all their respective related fields (including the address attributes, which are common but will have different values in the 3 tables), with the primary key column customer_id and proper FK constraints.
Model 2:
---------------------------------------------------
3 separate tables for general, distribution, contract with all their respective related fields, excluding address.
PK and FK are as in Model 1.
There will be a separate table for address, with customer id and address type as its primary key.
So for each customer there will be 3 records in the address table.
May 04, 2006 - 7:09 am UTC
what about model 3:
one table.
Normalization
Sujit Mondal, May 04, 2006 - 7:58 am UTC
Thanks for your reply.
One table is obviously a good solution, but because of some constraints we won't go for it. The 3 parts of information may be entered at separate times, with the general part first, so the elements for the others would need to be kept null or at default values. Further, the elements of distribution and contract have some foreign key constraints, so it would be difficult and ambiguous to assign default values to those columns with FKs (though we could keep 'not assigned' as a default value). Also, the application would like to retrieve and store the data in a modular way - which, I think, may not be a good idea from a database perspective.
Based on those, we had to make 3 different tables. Can you please tell me: between Model 1 and Model 2, which one shall I choose?
May 04, 2006 - 12:23 pm UTC
tell me the constraints.
Give me a real example where it would be "difficult or ambiguous" to have a single table.
I am still voting for model 3, I do not like either of Model 1 or 2.
Model 2 is really not good. Model 1 is simply not good.
Can denormalization be worse than joins for queries
biswa, May 10, 2006 - 12:08 pm UTC
Say we have two tables T1 with 5,000 rows and T2 with 100,000 rows in an OLTP database.
The relationship of T2 to T1 is Many to One.
Both tables are relatively static.
Now suppose we denormalize T2 and store the T1 data there for faster access, so we don't need to join.
Since the T2 row size increases, the database will probably be able to keep fewer of the table's blocks in memory.
If so, then although we would no longer require a join, we might require more physical I/O.
My question is:
Is it possible that at some point denormalization can reduce performance, because more physical I/O is required as fewer rows fit in memory - or is denormalization always faster for querying than joining?
May 11, 2006 - 7:52 am UTC
have you shown that joins are "slow" for you - databases WERE BORN TO JOIN.
normalization
Lakshmi, September 15, 2006 - 7:36 am UTC
Hi Tom
Our requirement is to store the following data in database:
ScanId
Start time
End time
Granularity
BSC1
BSC1_counterX
BSC1_counterY
BSc2
BSC2_counterZ
BSC2_counterX
BSC2_counterY
.....
[The number of Objects (BSC1,2 and the counter) is not fixed and can be configured by the user]
We initially thought of normalizing the table as
Scanner table:
SCANID STRTTIME ENDTIME GRANULARITY
ScannerBsc table:
SCANID BSCID COUNTER_NAME
Hence, a denormalized row of data such as:

SCANID=1, STRTTIME=10, ENDTIME=12, GRANULARITY=1,
with object BSC1 (counter xx) and object BSC3 (counters xx, yy)

will be put as:
SCANID STRTTIME ENDTIME GRANULARITY
------- ---------- ---------- -----------
1 10 12 1
2 3 6 1
SCANID ObjectName COUNTERLIST
---------- ---------- ----------
1 BSC1 xx
1 BSC3 xx
1 BSC3 yy
The problem now is that the user should not be allowed to create multiple scanners with the same definition -
the same list of object names and counters should not exist for 2 scanIds.
It would be nice if you could suggest how to design tables for this particular requirement.
Thanks
Lakshmi
September 15, 2006 - 8:51 am UTC
so, (object_name,counter) needs a unique constraint? isn't that all?
Not exactly
Lakshmi, September 18, 2006 - 2:38 am UTC
Actually, the "list" of object_name+counter should be unique.
Example:
If

Scan1: Obj1 counter1, Obj2 counter1 counter2

already exists, then

DupScan: Obj1 counter1, Obj2 counter1 counter2 -- NOT allowed
Scan2: Obj1 counter1, Obj2 counter1 -- allowed (counter2 is not present)
Scan3: Obj5 counter1, Obj2 counter1 counter2 -- allowed
In brief there should not be two different scanners with the same scanner definition
September 18, 2006 - 2:52 am UTC
sorry, did not follow this at all.
or do you just mean that the LAST 5 columns need a unique constraint?
if so, then you perhaps picked the wrong structure (you didn't really "normalize" anything - you meant to have 6 columns in a table)
Thanks for the quick response!!
Lakshmi, September 18, 2006 - 7:01 am UTC
Sorry, maybe I was not so clear!
I have the following requirements:
1. The following info is to be stored in DB
ScanId/Name
Start time
End time
Granularity
Object
Counters for that Object
a. The number of objects can be from 1 to n, and
b. counters for each such object can vary from 1 to n
c. the number of objects / counters per object is not fixed
2. "Granularity + Object[] (object list) + Counters[] (counter list)" should be unique
How to design the tables for this requirement?
I guess I cannot create a single table because the object list and counter list are not fixed!
September 18, 2006 - 1:29 pm UTC
sorry, I don't see any sort of effective design for this particular requirement.
This is interesting ... waiting for response from Tom also
Khalid, September 18, 2006 - 1:33 pm UTC
IMHO, my solution would be a 'triggah', FWIW. So I am curious as to what Tom has to say.
September 19, 2006 - 2:22 am UTC
let us see your "triggah" - and it had best include "lock table", which is why I said "I see no effective (efficient) design for this"
Query Performance
Raghav, September 19, 2006 - 3:04 am UTC
Hi Tom,
For example,
select stu_id, stu_name, class, school_name, sport, ....
from stu_dat, stu_mst, sch_mst
where stu_dat.stu_id = stu_mst.stu_id
and stu_dat.schoolid = sch_mst.schoolid
and sport = 'volley ball'
The tables are fully normalised. On the above tables, indexes are built on stu_id, class, school etc. But there is no index in particular on sport.
The query is taking much more time than expected. We know this one is a bad performer. There are so many indexes on those tables already and we don't want to create one more just for the purpose of this query - because in place of sport, there may be some other variant(s) against which the information is needed.
As per the stats, we know this particular query is using the existing indexes, but still the performance is bad. How can we check the query performance with a different index without physically changing the database (the existing indexes)?
Is there any other mechanism to increase the performance of this query without creating a new index?
Thanks in Advance
Raghav
September 19, 2006 - 2:32 pm UTC
I would hope that it would use no indexes in your case!!
the only thing that could benefit from an index would be one on sport and only if volley ball is not a popular sport.
if this is using indexes - that is probably the problem.
but you give almost nothing to work with here, like "how long is long", 'how big is big'
query performance
Raghav, September 21, 2006 - 9:00 am UTC
Hi Tom,
Thank you for your reply. In the example, I am trying to add some more conditions, to put stress on the query.
select stu_id, stu_name, class, school_name, sport, section, gender, mother_tongue
from stu_dat, stu_mst, sch_mst
where stu_dat.stu_id = stu_mst.stu_id
and stu_dat.schoolid = sch_mst.schoolid
and stu_name like 'A%'
and sport = 'volley ball'
and section in ('A','C')
and gender = 'M'
and mother_tongue in ('Hindi','Telugu','Bengali')
Request you not to go by the exact example above; please take a similar example, as you are the best and right person to give good examples, to identify the following.
Supposing all the tables are fully normalised, and there is no index in particular on the column we are trying to query, which causes bad performance as soon as the condition is incorporated.
How can we check the query performance with a different index without physically changing the database (the existing indexes)?
Is there any other / best mechanism to increase the performance of this query without creating a new index?
Thanks and Regards
Raghav
September 22, 2006 - 2:00 am UTC
"more stress"?!?!?! not sure what that means, queries do not get stressed, they remain calm cool and collected through the worst of times.
you cannot "test the performance of some new index structure" without creating the new index structure which would physically change the database.
query performance
Raghav, October 09, 2006 - 3:36 am UTC
Hi Tom,
Thank you for your reply.
I understand that queries do not get "stressed" by adding more conditions. They only cause performance issues like the one in our example.
My question is: suppose the database is fully normalised, and there is no index on a particular column on which we are trying to filter the data, which causes bad performance as soon as the condition is incorporated.
Is there any other / best mechanism to increase the performance of this query without creating a new index on that column?
Hope this time you can understand my query better.
Thanks and Regards
Raghav
October 09, 2006 - 8:35 am UTC
you can only "guess". We can help you with that guess using Enterprise Manager and the index advisor, but it'll be "I guess this is the work I'll do if you add this index, because this is my guess as to the cardinality".
the only way to move beyond a guess (which frankly, if you know your data, you might be able to just "make" in your head) is to add the index and test.
Which you can safely do in your TEST environment of course...
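One way to look at that guess in a test environment, before building anything, is to inspect the optimizer's cardinality estimates with EXPLAIN PLAN (table and column names here are made up for illustration):

```sql
-- Ask the optimizer what it would do and how many rows it
-- expects the filter to return (the "Rows" column of the plan).
explain plan for
  select * from orders where ship_region = :region;

select * from table(dbms_xplan.display);
```

If the estimated cardinality is badly wrong, adding an index may not help until the statistics are fixed.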
virtual index
Raghav, October 10, 2006 - 8:29 am UTC
Hi Tom,
Thank you for your reply. I agree with your comments. Yes, there is a provision in the enterprise manager SQL analyser to get recommendations on indexes, which again are assumptions about how the performance of the SQL would change, and which we can try in the test environment too.
There is one more concept, the "virtual index", which the SQL analyser can create: it creates an index in another schema called "AURORA$JIS$UTILITY$" without affecting the current schema (not creating any new object (index) in the current schema). Request you to throw some light on how this concept works.
Thanks and Regards
Raghav
October 10, 2006 - 12:02 pm UTC
that is what EM (enterprise manager) uses, the "nosegment" index. It is not documented; if you want to use it, you would use EM.
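For what it's worth, the underlying mechanism is roughly a NOSEGMENT index plus a hidden session parameter. This is undocumented, so caveat emptor exactly as Tom says; the table and column names below are made up:

```sql
-- Undocumented: a "virtual" index with no storage segment.
create index emp_sal_virt on emp(sal) nosegment;

-- Hidden parameter that lets the optimizer consider nosegment
-- indexes when costing plans (session level):
alter session set "_use_nosegment_indexes" = true;

explain plan for select * from emp where sal > 1000;
select * from table(dbms_xplan.display);
```

The index consumes no space and is never used at execution time; it only lets EXPLAIN PLAN show what the optimizer would do if the index really existed.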
virtual index
Raghav, October 14, 2006 - 3:50 am UTC
Hi Tom,
Thank you for your reply.
Can we create these "virtual" / "nosegment" indexes in any schema (for example, test) instead of "AURORA$JIS$UTILITY$", in case we don't want to use EM and want to do it manually? Will the query in the main schema (for example, live) use this index while executing? If not, how can we force the query to use the virtual index created in the other (test) schema? It may be a silly question. Please provide your valuable inputs.
Thanks and Regards
Raghav
October 14, 2006 - 8:14 am UTC
they are not documented, caveat emptor.
A reader, October 19, 2006 - 2:15 pm UTC
We have a persons table that has approximately 40 columns. The first few columns are the person's details, such as name, gender etc., followed by other details such as arrival flight number, departure airport code, hotel person-of-contact name etc. The flight and hotel details will be null for many records because these people are from the same town, so the table ends up with null fields in many records. Should these columns with null values be separated out into a new table (normalized)? Also, this table stores various kinds of data about a person. Can all of this remain in a single table?
October 19, 2006 - 2:21 pm UTC
no, you don't want to separate out a one to one relationship like that.
(well, there are two sides to this coin, some would say "never use null" - I'm not of that camp, more of a "pragmatist" when it comes to performance)
One table.
normalization
sam, November 18, 2006 - 8:39 am UTC
TOm:
For the above case, some DB design books say to create a "subset table" for the null columns if they only apply to a few people. You seem to like the idea of having one large table for a 1-to-1 data relationship. Is there an advantage to that versus creating a subset table?
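The two designs under discussion, sketched with illustrative names:

```sql
-- Option A (the one-table approach Tom recommends):
-- nullable columns for the optional travel attributes.
create table persons (
  person_id  number primary key,
  name       varchar2(100) not null,
  gender     varchar2(1),
  flight_no  varchar2(10),    -- null for locals
  hotel_poc  varchar2(100)    -- null for locals
);

-- Option B (the "subset table" some design books suggest):
-- the optional 1-to-1 attributes moved to a child table.
create table persons_b (
  person_id  number primary key,
  name       varchar2(100) not null,
  gender     varchar2(1)
);
create table person_travel (
  person_id  number primary key references persons_b,
  flight_no  varchar2(10),
  hotel_poc  varchar2(100)
);
```

Option B trades null columns for an extra outer join on every query that needs the travel details.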
normalization
sam, November 18, 2006 - 8:48 am UTC
Tom:
1. For your above example, let us say the customer address changes frequently. Don't you have to store the customer address associated with each order rather than normalize it?
table customer ( cust_id, cust_name, cust_address );
table orders ( order_id, cust_id, cust_name, cust_address );
2. For OLTP you normalize to eliminate redundant data.
For DSS (Data warehouses) you denormalize to speed up queries.
I am confused. Would normalizing an OLTP system increase performance for UPDATE, DELETE and INSERT and decrease performance for SELECT?
Because you only do SELECTs on a DSS, and denormalizing is supposed to speed those up.
3. Can you give us an example of how normalization increases performance? If I have one unnormalized table (1 million records) and the SELECT query is taking 20 minutes to run, can it be said that this is because the table is not normalized, or is it more related to the way the query is built, doing a full scan versus an index scan?
Is it normal for some queries to take that long or there must be something wrong?
thanks,
normalization
sam, November 21, 2006 - 12:42 am UTC
Tom:
With your expertise in DB design, don't you have any comment on the normalization questions above?
November 22, 2006 - 3:16 pm UTC
#1 - too vague to address.
If the customer address changes, why would you store it with the orders table? You have to STATE REQUIREMENTS here.
If the cust_address in orders is the current address of the customer, well, then you would not put it there.
if the cust_address in orders is really "the address to ship this here order to", then it is not named very well.
answer therefore: it depends on the requirements, doesn't it.
#2 - you normalize to avoid redundant data, period. regardless of system type.
you might, MIGHT, maybe, SOMETIMES (usually in a materialized view) denormalize in a warehouse. otherwise no, you wouldn't.
#3 - if you have a 20 minute select, you are not in a transactional system are you. And you might, just maybe, just MIGHT, sometimes, MAYBE use a materialized view in a warehouse for that 20 minute query.
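Tom's "maybe use a materialized view in a warehouse" suggestion for that 20-minute query might look like this (table and column names are invented for illustration; they follow the customer/orders example earlier in the thread):

```sql
-- Precompute the expensive join/aggregation once. Queries that
-- match the MV definition can be rewritten to use it transparently.
create materialized view order_summary_mv
  build immediate
  refresh complete on demand
  enable query rewrite
as
select c.cust_id, c.cust_name, count(*) as order_cnt
from   customer c
join   orders   o on o.cust_id = c.cust_id
group  by c.cust_id, c.cust_name;
```

The redundancy is deliberate and managed by the database, which is the controlled form of "denormalization" Tom keeps pointing to.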
normalization of database
Zpatel, March 15, 2007 - 8:11 am UTC
Hi Tom,
I am confused here.
Which design is preferable?
What do I actually have to do?
I am designing a database for hospitality management.
Cuisine Name
Indian
Chinese
Mexican
Cuisine Type
Veg
Non Veg
Item Type
Soup
Main Course
Desserts
Snacks
Item Name
Nachos
Hakka Noodles
Panir Bhurji
So I have to maintain the hierarchy this way:
CHINESE
    Soup
        Vegetarian: Sweet-corn, Hot & sour
        Non-Vegetarian: Chicken Hot & Sour
    Snacks
        Vegetarian: Hakka Noodles, Man-chow
        Non-Vegetarian: Chicken Chopswey
MEXICAN
    Soup
        Vegetarian: Mexican Tomato Soup
    Main Course
        Vegetarian: Nachos
        Non-Vegetarian: AAAA, BBBB
INDIAN
    South Indian
        Snacks
            Vegetarian: Idli, Dhosa
    Punjabi
        Main Course
            Vegetarian: Paneer Bhurji, Veg Jaipuri
            Non-Vegetarian: Shahi Kabab
The categories in red belong only to the Indian cuisine; this applies only to India.
That's why I made these tables.
Ref_CuisineName
------------------------
1 Chinese
2 Indian
3 Mexican
Ref_CuisineType
---------------------
1 Vegetarian
2 Non Vegetarian
Ref_CuisineTo (belongs only to the Indian cuisine)
---------------------
1 South Indian
2 Punjabi
3 Gujarat
Ref_Item
-------------
1 Noodle
2 Paneer Bhurji
3 Veg Jaipuri
4 Shahi kabab
5 Idli
6 Dhosa
7 Mexican Tomato Soup
8 Sweet- corn
9 Hot & sour
10 Chicken Hot & Sour
11 Hakka Noodles
12 Man-chow
13 Chicken Chopswey
14 AAAA
15 BBBB
Which is the best practice among the following?
CASE I
Sec_ItmTypeNm
---------------
cuisineTypeId  CuisineName  CuisineType
1 Chinese Veg
2 Chinese Non-Veg
3 Indian Veg
4 Indian Non-Veg
5 Mexican Veg
6 Mexican Non-Veg
Ref_CuisineTypeItem
CuisineTypeItemId cuisineTypeId ItemtypeId
1 Chinese - Veg Soup
2 Chinese - Non Veg Soup
3 Indian - Veg Soup
4 Indian - Non Veg Soup
5 Mexican - Veg Soup
6 Mexican - Non Veg Soup
7 Indian - Veg Main Course
8 Indian - Non Veg Main Course
9 Chinese - Veg Snacks
10 Chinese - Non Veg Snacks
11 Indian - Veg Snacks
Ref_ItemTke
ItemTke ItemId CuisineTypeItemId
1 Mexican Tomato Soup 5
2 Sweet- corn 1
3 Hot & sour 1
4 Chicken Hot & Sour 2
5 Hakka Noodles 9
6 Man-chow 9
7 Chicken Chopswey 10
Ref_ItemTke1
ItemTke1 ItemId CuisineTypeItemId CuisineToId
2 Paneer Bhurji 7 2
3 Veg Jaipuri 7 2
4 Shahi kabab 8 2
5 Idli 11 1
6 Dhosa 11 1
CASE II
Ref_ItemDetail
Item Id Item Nm Cuisine Type Cuisine Name Item Type
1 Hakka Noodles Vegetarian Chinese Snacks
Hope you understand what I mean.
Which is the best practice among the above?
Zpatel
March 15, 2007 - 12:24 pm UTC
seems like "cuisines" have "foods" and "foods" have various attributes like
- vegetarian versus carnivore
- course (snack, main, dessert, whatever)
now, foods may or may not belong to multiple cuisines - I think they do.
and there isn't a "hierarchy" in as much as a simple many to many relationship.
create table cuisines ( cname primary key );
create table food ( foodname primary key, fkey_to_veg_or_carnivore, fkey_to_course_name );
create table cuisines_to_foods (cname, foodname);
and veg_or_carnivore/ course_name are simple lookup tables with valid values for those attributes.
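Tom's three-table sketch, filled out as one possible DDL. The data types, constraint details and lookup-table names below are my additions for illustration, not part of his answer:

```sql
-- Simple lookup tables with the valid attribute values.
create table vegtypes ( vegtype varchar2(20) primary key );
create table courses  ( course  varchar2(30) primary key );

create table cuisines ( cname varchar2(30) primary key );

-- Each food has exactly one veg/non-veg type and one course.
create table food (
  foodname varchar2(50) primary key,
  vegtype  varchar2(20) not null references vegtypes,
  course   varchar2(30) not null references courses
);

-- The many-to-many: a food can appear in several cuisines.
create table cuisines_to_foods (
  cname    varchar2(30) references cuisines,
  foodname varchar2(50) references food,
  primary key (cname, foodname)
);
```

Hakka Noodles, for example, would be one row in food (Vegetarian, Snacks) and one row in cuisines_to_foods linking it to Chinese; no attribute is repeated anywhere.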
Normalization
Zpatel, March 16, 2007 - 5:42 am UTC
Thanks Tom,
But now: only the Indian cuisine has subcategories like "South Indian", "Punjabi", "Gujarati", "Rajasthani" etc.,
and a food resides under one of those categories if the food is Indian.
Your suggestion is good for the other cuisines, but what about the above situation?
Now I have to take care of all cuisines,
so please extend your view as per my need.
What do I have to do?
create table cuisines ( cname primary key );
create table food ( foodname primary key, fkey_to_veg_or_carnivore, fkey_to_course_name );
create table cuisines_to_foods (cname, foodname);
I think I have to add one column to "cuisines_to_foods", fkey_to_subcategory_lookup, with
Food_subcategory ( Foodsubname primary key );
If a food belongs to an Indian cuisine subcategory, I pass that "Foodsubname" to cuisines_to_foods.
Otherwise I put that field to null.
March 17, 2007 - 2:37 pm UTC
In general the sub categories are permitted everywhere.
Are you saying you want to repeat things like 'breakfast', 'lunch' and dinner???
Normalization
Zpatel, March 19, 2007 - 4:43 am UTC
Hi Tom,
In general the sub categories are permitted everywhere.
Yes, you are right.
Are you saying you want to repeat things like 'breakfast', 'lunch' and dinner???
NO no ....
What I mean:
Cuisines like Mexican and Chinese have subcategories like Soup, Main Course etc.
But the Indian cuisine has subcategories like South Indian, Punjabi, Gujarati, and those subcategories are further categorised into sub-subcategories like Soup, Main Course.
So how do I have to handle it?
Thanx
March 19, 2007 - 10:13 am UTC
the model I suggested above "handles" it already?
Normalization
Zpatel, March 20, 2007 - 7:36 am UTC
Hi Tom,
Got It!
Thanx
another point
Bunditj, October 28, 2008 - 7:02 am UTC
October 28, 2008 - 8:18 am UTC
well, let us see - they state "immutable", but the examples are all data that changes.
They recognize that you'll have to max out the attributes - if you want someone to be able to have 100 affiliations, you'll need 100 columns. So you know what happens then? Some developer says - eh, we'll just store a comma delimited list. Oh boy, then the problems really begin.
I might look at a cluster to keep all data with user_id as a primary or foreign - together, in a single block.
Every time I see someone doing something in-record (attr1, attr2, attr3, ... ) instead of cross record as a relation - bad stuff happens. They get to a point where they need attr(N+1) and then things go downhill fast.
Now, that said - re-read the original answer. OLTP versus DSS/Reporting. There is something to be said that a profile page might be considered a 'report' and we do use things like materialized views (redundant, denormalized) in normal systems to maintain this sort of stuff.
No reason why the transactional data would not be persisted transactionally and things like a profile page assembled and subsequently stored (to be reassembled if something changes).
I do similar things here - I denormalize into my TEXT index (a page is a Question, an Answer, a set of reviews and a set of followups - when you search this site you are searching over 4 columns in 3 tables - but I don't join to do the search, I did the join way ahead of time and indexed the result of the join)
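A rough sketch of that "join ahead of time" idea using a materialized view plus an Oracle Text index. All names are invented; this is my illustration of the approach Tom describes, not his actual implementation:

```sql
-- Pre-join the pieces of a "page" into one searchable document,
-- so searches hit a single column instead of joining 3 tables.
create materialized view page_text_mv
  build immediate
  refresh complete on demand
as
select q.question_id,
       q.question_text || ' ' || a.answer_text as page_text
from   questions q
join   answers   a on a.question_id = q.question_id;

-- One CONTEXT (Oracle Text) index over the pre-joined text:
-- searching needs no join at query time.
create index page_text_idx on page_text_mv(page_text)
  indextype is ctxsys.context;
```

The join cost is paid once, at refresh/index time, instead of on every search.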
Data Warehouse and Normalization
Dheeraj, July 28, 2009 - 2:57 am UTC
Hi Tom,
I am working in data warehousing and was under the impression that it is de-normalized. Yesterday I came across a question about which normal form the data warehouse model (star schema and snowflake schema) is in,
and the answer given was that it is in third normal form.
Can you please throw some light on this, with examples?
Thanks,
Dheeraj
July 28, 2009 - 8:20 pm UTC
Well, I'm not at all sure what you are asking here.
Data warehouses do not have a single model that all ascribe to. You can normalize, you can denormalize. It all depends.
But, I'm not at all sure what question you have???
normalization
A reader, July 28, 2009 - 11:16 pm UTC
Tom:
Do you always use materialized views for a read-only page that uses 2-3 tables, or do you only do it when the page renders slowly?
Also, how often do you refresh the MV? Is it real time, every hour, nightly, etc.?
I like your idea of using normalized tables for transactions and indexed MVs for reports or read-only pages.
July 29, 2009 - 12:35 am UTC
Never say never,
Never say always,
I always say.....
there are no "always".
materialized views are like indexes, you would use them like indexes.
how often to refresh? the only answer to that is "it depends", else we would not give you an option. In a warehouse it might be weekly - after the load. In a transactional system it might be on commit. It is somewhere in between "right away and never"
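The two ends of that "right away and never" spectrum, sketched (the MV queries are placeholders on the classic emp table):

```sql
-- Transactional end: keep the MV current with each commit.
-- (Fast refresh on commit also needs a materialized view log
-- on emp, with the referenced columns included.)
create materialized view dept_sal_mv
  refresh fast on commit
as select deptno, sum(sal) total_sal, count(*) cnt
   from emp group by deptno;

-- Warehouse end: refresh on a schedule, e.g. weekly after the load.
create materialized view dept_sal_wk_mv
  refresh complete
  start with sysdate next sysdate + 7
as select deptno, sum(sal) total_sal
   from emp group by deptno;
```

Everything in between (hourly, nightly, on demand after a batch job) is just a different NEXT expression or a manual DBMS_MVIEW.REFRESH call.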
Normalization in DWH
DK, July 29, 2009 - 6:05 am UTC
Tom,
Referring to the normalization question in DWH:
What I want to ask is, in the typical star schema studded with DIM and fact tables, is it in any way a normalized form (and which form of normalization does it follow)?
Or
is the snowflake schema, wherein the DIM is split into two or more tables, in any way normalized?
Hope you understand what I am asking.
Thanks,
Dheeraj
July 29, 2009 - 7:24 am UTC
star schemas are normalized schemas (data does not repeat) as are snowflakes (just more relations).
star schemas can be denormalized as well, a star schema could be a collapsed snowflake for all we know. A snowflake could be denormalized as well.
"it depends"
give us a documented schema and then we can tell you - these names 'star, snowflake' they do not enforce anything - they are just names.
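To make Tom's point concrete, here is a minimal dimension in star form versus its snowflaked equivalent (illustrative names only):

```sql
-- Star: one wide dimension table. The category attributes
-- repeat for every product in the category (denormalized).
create table dim_product (
  product_id   number primary key,
  product_name varchar2(50),
  category     varchar2(30),
  category_mgr varchar2(50)
);

-- Snowflake: the repeating category attributes moved out
-- into their own relation (normalized, one more join).
create table dim_category (
  category_id  number primary key,
  category     varchar2(30),
  category_mgr varchar2(50)
);
create table dim_product_sf (
  product_id   number primary key,
  product_name varchar2(50),
  category_id  number references dim_category
);
```

Whether either counts as "third normal form" depends entirely on the actual attributes, which is why Tom asks for a documented schema.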
db
A reader, July 29, 2009 - 4:02 pm UTC
<<In a transactional system it might be on commit. It is somewhere in between "right away and never" >>
With this you are really inserting several records: one small record into each transactional table, and one big record, built from several joins, into the materialized view. Correct?
I guess there would be some performance penalty on each commit, but reporting would get real-time updates and be faster. Right?
Codification
Muru B, October 16, 2009 - 12:50 pm UTC
Dear Tom,
Thanks a lot for improving our Oracle skills! Your expert advice is invaluable.
I request your opinion on whether column values in large tables (more than a million rows) should be codified for better performance. For example, a customer table could have the columns gender and card type. In our team, many prefer to spell out the entire word for every row (Male, Female, Credit Card, Debit Card etc.), but my recommendation is to codify (M, F, C, D, etc.) with a lookup table where needed. Their argument is that hardware is cheap (including storage), it helps readability, and it means fewer joins. My argument is that production hardware is not cheap (we can buy 1TB for home under $100, but production hardware costs thousands), that fewer overall blocks improves performance, and that joining lookup tables is a negligible problem. Please give your opinion.
Gender
---------
M vs Male
F vs Female
Card type
-------------
C vs Credit Card
D vs Debit Card,
P vs Prepaid Card
help on this
venkata, July 19, 2011 - 11:18 am UTC
one example
venkata, July 19, 2011 - 12:40 pm UTC
Tom,
I need one example for each different type of schema, please, for the scenario you mentioned.
July 19, 2011 - 1:23 pm UTC
they are right there in the documentation link you sent me????? did you read it??
click on the link I gave you - it has a schema *right there* in a nice graphical format.