
Question and Answer

Tom Kyte

Thanks for the question.

Asked: November 27, 2000 - 8:43 pm UTC

Last updated: May 10, 2012 - 2:57 am UTC

Version: 8.1.6

Viewed 10K+ times

You Asked

Tom:

Can you give me a simple example of a situation where denormalization will help performance, or can you refer me to some documentation on that subject?



and Tom said...



Denormalization is a technique typically used in a data warehouse/mart to speed query response time. It is typically not used in an OLTP system as it degrades both concurrency and update response time greatly.


An example would be if you need to answer the question "how many employees are in departments ( x, y, z )" frequently. You could have a table of DEPT's and EMP's and join them and count. OR -- you could store in the DEPT table a count of the number of employees in that department -- an EMP_COUNT column -- and every time you added/removed an employee detail record, you would have to maintain this column. Oracle provides a feature called a materialized view for just such an operation.
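As a rough sketch (assuming the classic EMP/DEPT demo tables; the object names and options here are illustrative only), the database can maintain that pre-computed count for you:

create materialized view log on emp
  with sequence, rowid (deptno) including new values;

create materialized view dept_emp_count_mv
  build immediate
  refresh fast on commit
  as
  select deptno, count(*) emp_count
    from emp
   group by deptno;

Queries that ask "how many employees are in department X" can then read one pre-aggregated row instead of counting detail records at runtime.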

Denormalization TYPICALLY involves "pre-joining" frequently joined data to avoid having to do that at runtime. This typically results in redundant data that either the system (via materialized views) or you yourself must remember to maintain in addition to the raw detail data.
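A pre-join can be handled the same way; for example (again using EMP and DEPT purely as an illustration, and assuming materialized view logs with rowids already exist on both tables):

create materialized view emp_dept_mv
  build immediate
  refresh fast on commit
  as
  select emp.rowid erid, dept.rowid drid,
         emp.empno, emp.ename, emp.deptno, dept.dname
    from emp, dept
   where emp.deptno = dept.deptno;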


Comments

Denormalization is not at all suitable in Oracle?

Thaha Hussain, April 06, 2003 - 5:57 am UTC

Are you saying that denormalization is not at all required in Oracle?

Thaha Hussain

Tom Kyte
April 06, 2003 - 10:28 am UTC

No, where did you read that?

I said --
In a data warehouse -- sure.
In a transactional system -- not.

How successful are you?

Thaha Hussain, April 10, 2003 - 5:35 am UTC

Dear Tom,

You seem to be a role model. I like you. At each stage, there is a curiosity to know more about you.

I have a few personal questions. Sorry if they are out of scope.

1. How did you become the real Tom? How many years did it take you to reach this stage?

2. Is Oracle your only domain?

3. Is a successful Oracle expert successful in life too?

4. What do you consider the point of success in personal life? Does money or prestige fulfill the purpose of our being?

5. What is your final goal?

If you have the time and inclination, it would help people like us come to a conclusion.

Thanks

Thaha Hussain


Our Developers want denormalization

Max Frustration, May 17, 2004 - 3:25 pm UTC

...But not for the standard reasons. They do not want to have to write the additional code for joins!!

When I try to explain to them that this is not the way it should be done, I am presented with a "what do you know?" attitude because they have already rolled out a system or two that would have been just fine staying in flat file format. I am also new to this locale, and I am a consultant, so I must tread lightly.

I have duplicated their system in my personal schema and fully normalized it, then created a view to interface with the normalized system so they are required to think less. Better... but they still don't like my meddling.

How would you have handled this situation Tom?


Tom Kyte
May 17, 2004 - 4:16 pm UTC

Sell "up"

developers -- dba's for that matter -- work for a company. the data belongs to (gasp) the company -- not the developers, not the end users -- the company. that data will be there long long after they have left or moved onto some more interesting project.

Generally find the person responsible for the corporate data and ask them what they think about this practice.


so, they don't like coding joins, but they like coding lots of insert/update statements to do the simple stuff?

Denormalize or single table: which way is better? Need your expert advice

A Reader, May 17, 2004 - 10:04 pm UTC

Dear Tom,
I have a table like below.

Original:
----------
Table Name: Save_Link
-----------

BLOCK_ID NOT NULL NUMBER(10)
PRODUCT_ID NOT NULL NUMBER(10)
MASTER_CONTRACT_ID NOT NULL NUMBER(10)
MASTER_VERSION_NO NUMBER(5)
CHILD_CONTRACT_ID NOT NULL NUMBER(10)
CHILD_VERSION_NO NUMBER(5)
CHILD_PARTITION_NO NOT NULL NUMBER(10)
CHILD_COVER_NO NOT NULL NUMBER(10)
LINK_VERSION_NO NOT NULL NUMBER(5)
START_DATE NOT NULL DATE
ARRANGEMENT_ID NOT NULL NUMBER(10)
SECTION_ID NOT NULL NUMBER(10)
RI_TYPE NOT NULL VARCHAR2(5)
END_DATE NOT NULL DATE
CEDED_PERC NOT NULL NUMBER
CEDED_PREM NOT NULL NUMBER
CEDED_PREM_SWF NOT NULL VARCHAR2(3)
RI_COMM NUMBER
RI_COMM_SWF VARCHAR2(3)
RI_TAX NUMBER
RI_TAX_SWF VARCHAR2(3)
DATE_RATIO NOT NULL NUMBER(10)
RI_BROKERAGE NUMBER
RI_BROKERAGE_SWF VARCHAR2(3)
CEDING_COMM NUMBER
CEDING_COMM_SWF VARCHAR2(3)
USERNAME NOT NULL VARCHAR2(30)
RI_PARTNER_ID NOT NULL NUMBER(10)
CREATION_DATE NOT NULL DATE
STATUS NOT NULL VARCHAR2(5)
LINK_ACTION_CODE VARCHAR2(5)
SPECIAL_ACCEPTANCE_REF NUMBER(10)
CLASS_ID NUMBER(10)
DISPLAY_SEQ NUMBER(5)
SOURCE VARCHAR2(20)
TOP_LOCATION VARCHAR2(2)
ARRANGEMENT_PRTY NUMBER(5)
LINK_NO NOT NULL NUMBER(10)
CEDED_PML NUMBER
CEDED_TSI NUMBER
T3_VALUE VARCHAR2(30)
COMMISSION_PERC NUMBER
CEDING_SEQ NUMBER(10)
SA_INDICATOR VARCHAR2(1)
DISP_CEDED_PREM_GROSS NUMBER
DISP_RI_COMM NUMBER

Let me explain the workflow. Each time a policy is processed there might be a minimum of 6 records for each date set. There might be one or more policies to be processed at a time t. Each time a policy is processed, the other policies in the same block are also considered, so the previous programmer deletes all the entries for those policies (old entries) for that start date or new start date, then calculates and inserts again.

So I am planning to change the design; below are my denormalized tables.

I don't want to play directly with the above table. I have created some SQL types to work with and finally store the result. While storing, I just check for the date set and set its Top_Indicator to 'N' (if 'Y', then that is the current set), then I insert a new date set with Top_Indicator 'Y'. Here I am finding the advantage because I don't need to delete 6 records for each policy; instead I work with the date set from the header table (one record per policy, selecting the other data into the SQL type and using pipe row), then after all manipulation I insert everything into the detail table in the final stage. Tom, which way of doing this do you think is better?

Planning:
--------

Table: Header
-------------
HEADER_ID NOT NULL NUMBER
HDR_VERSION_NO NOT NULL NUMBER(10)
START_DATE NOT NULL DATE
END_DATE NOT NULL DATE
PRODUCT_ID NOT NULL NUMBER(10)
MASTER_CONTRACT_ID NOT NULL NUMBER(10)
MASTER_VERSION_NO NUMBER(10)
BLOCK_ID NUMBER(10)
CHILD_CONTRACT_ID NUMBER(10)
CHILD_VERSION_NO NUMBER(10)
CEDING_SEQ NUMBER
STATUS VARCHAR2(1)
TOP_INDICATOR VARCHAR2(1)
USER_NAME VARCHAR2(100)
CREATION_DATE DATE

Table: Detail
--------------
HEADER_ID NOT NULL NUMBER
CLASS_ID NUMBER
ARRANGEMENT_ID NUMBER
SECTION_ID NUMBER
ARRANGEMENT_PRTY NUMBER(5)
RI_TYPE VARCHAR2(5)
T3_VALUE VARCHAR2(12)
CEDED_TSI NUMBER
CEDED_PML NUMBER
CEDED_PERC NUMBER
CEDED_PREM NUMBER
CEDED_PREM_SWF VARCHAR2(3)
CEDING_COMM NUMBER
CEDING_COMM_SWF VARCHAR2(3)
RI_COMM NUMBER
RI_COMM_SWF VARCHAR2(3)
RI_TAX NUMBER
RI_TAX_SWF VARCHAR2(3)
RI_BROKERAGE NUMBER
RI_BROKERAGE_SWF VARCHAR2(3)
DATE_RATIO NUMBER
COMMISSION_PERC NUMBER
RI_PARTNER_ID NUMBER(10)
SA_IND VARCHAR2(1)
DISPLAY_ORDER NUMBER

I have a few questions to ask you.

1) In this case, which way of data modeling do you think is better?
2) Is it better to directly insert, delete and update the table while
processing 1000 policies, or to pull the data, manipulate everything in a
SQL type object, and then finally store it in the permanent table?
3) Otherwise, I have a bad feeling about manipulating the data like this:
select, insert, select again, then update or insert, then select from the
same table again. Somehow I find it uncomfortable, or maybe I just don't
know which way would be better. Please help me resolve this dilemma.

Thanks and Regards,
Ibrahim.

Tom Kyte
May 18, 2004 - 11:23 am UTC

comment -- the head/detail is "more" normalized, not denormalized

1) in general, start normalized, denormalize for performance/ease of use reasons

2) I prefer relational schemas period.

3) test it? that is what I do when confronted with many possible paths, set up a test and see how each works (tkprof, runstats are the things I use to do that)

Number and number(10).

kaushik, June 12, 2004 - 7:51 am UTC

Hi Tom,
It will be helpful for me if you could kindly explain the difference between
NUMBER and NUMBER(10).

Regards
Kaushik


Tom Kyte
June 12, 2004 - 9:51 am UTC

a number(10) is a number that can hold only 10 digits. If you try to put in 11, it won't work.

It is like "what is the difference between varchar2(38) and varchar2(10)"
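A quick illustration (the table name T here is just made up):

create table t ( x number(10) );

insert into t values ( 1234567890 );   -- 10 digits: works
insert into t values ( 12345678901 );  -- 11 digits: fails with
                                       -- ORA-01438: value larger than specified precision allowed for this column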

Example of denormalizing a table

Peter Tran, March 10, 2005 - 11:07 pm UTC

Hi Tom,

Do you have an example script that will denormalize a table?

Let's say I have the following table.

id (pk)
pos (pk) value can be from 1-5
height (attribute)
weight (attribute)

And let's say I want to denormalize this table to look like:

id (pk)
height_1
weight_1
height_2
weight_2
height_3
weight_3
height_4
weight_4
height_5
weight_5

We're denormalizing for performance reasons, but I'm trying to figure out the quickest way to populate the new table from the old table.

Thanks,
-Peter

Tom Kyte
March 11, 2005 - 6:11 am UTC

select id,
max( decode( pos, 1, height ) ) height_1,
....
max( decode( pos, 5, height ) ) height_5,

max( decode( pos, 1, weight ) ) weight_1,
....
max( decode( pos, 5, weight ) ) weight_5
from t
group by id;
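Written out in full (assuming the source table is T(ID, POS, HEIGHT, WEIGHT); the target name T_DENORM is just an example), the denormalized table could be built in one pass:

create table t_denorm
as
select id,
       max( decode( pos, 1, height ) ) height_1,
       max( decode( pos, 2, height ) ) height_2,
       max( decode( pos, 3, height ) ) height_3,
       max( decode( pos, 4, height ) ) height_4,
       max( decode( pos, 5, height ) ) height_5,
       max( decode( pos, 1, weight ) ) weight_1,
       max( decode( pos, 2, weight ) ) weight_2,
       max( decode( pos, 3, weight ) ) weight_3,
       max( decode( pos, 4, weight ) ) weight_4,
       max( decode( pos, 5, weight ) ) weight_5
  from t
 group by id;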


Great as usual...

Peter Tran, March 11, 2005 - 8:56 am UTC

I was thinking about this problem last night and had a nightmare about some wickedly long PL/SQL code to check for different PK values and array tables, etc. Furthermore, I didn't get it to work, so missed deadlines, being yelled at, career change, manic depression, etc. It was pretty scary.

I can sleep well tonight.

Thanks Tom!!!

De-normalization

TH, August 15, 2005 - 3:16 am UTC

We have school fee management software and around 10,000 students in different classes, divisions, etc., with entirely different fee schemes (amounts, fee types, discount patterns, etc.) for each student. It is common to change the schemes, discount rates, discount start dates, fee types, etc. at any time in the academic year. The front end is Delphi and the back end is Oracle.

Around 10-12 tables are used just to calculate fees (fully normalized).

The application shows the monthly applicable fee for each student and the cashier can collect fees. That part is fine. But the problem is with reports. Some reports (call them queries) are taking a lot of time to display results, up to 5 minutes (e.g. expected monthly income class-wise/division-wise, defaulters list, etc.). We solved this issue by creating some intermediate tables that are changed according to the changes in the corresponding master/transaction tables. Now everything is readily available in a table: Registration Number, YearMonth, Fee_Type_ID, Applicable Fee, Discount Amount, Paid Amount, etc. The accuracy of the data is confirmed by some maintenance routines, scheduled or run on user request. This table is used to generate reports and to decide the balance amount during fee collection.

NOW THE QUESTIONS ARE......

1. Are you satisfied with these methodologies? Is it a good or bad practice? If bad, why, and what is the alternative?

2. Is it an OLTP or data warehouse model? Why?
Should the Oracle installation be configured to support OLTP or data warehousing?

3. Should I be this conscious about these types of things (theory), or is "the best system a fast system"?

Thanks in advance


Tom Kyte
August 15, 2005 - 7:02 am UTC

with just 10,000 records, nothing should take more than a couple of seconds. I myself would have started by looking at the algorithm and asking "how could it take 5 minutes on such little data"



Continuation

TH, August 16, 2005 - 2:32 am UTC

10,000 is in Students table.

There are other tables also.

-------------------------

25000 records in Fee_collection_master(Receipt number, StudentID etc)

110000 Records in fee_collection_details
(Receipt Number, MonthYear, Amount etc)

When a student comes in, or when the reports are run, it scans through all the records in the tables and calculates the pending amount.

There are some other complicated tables also, like Fine_Table, Transportation Table, Bus Routes, Discount Patterns, Snacks Tables, Class Categories, Accounting Tables, etc.

1. Is it a good idea to keep things calculated and ready for the next transaction and for the reports?

2. Is it OLTP or DSS (considering the reports), or a mix?

3. In your view, with 100,000 to 500,000 records in 3-4 tables and around 10 other tables to join, what is the maximum tolerable time for a moderately complex query? (Mid-range server)


Tom Kyte
August 16, 2005 - 10:54 am UTC

these are small tables, you should be able to get most things subsecond. Why would you "scan through all records in all tables" for a single student? That seems wrong, doesn't it?

Are there some guidelines?

TH, August 17, 2005 - 6:02 am UTC

Is there any guideline for what should be done and what not? Or is it all intuitive, or gathered from experience?

Please provide me with some links. Does your book cover all these?

Thanks Tom..




Tom Kyte
August 17, 2005 - 1:37 pm UTC

experience counts.

this just doesn't sound like something that should take minutes.

denormalized discussion

K Bruening, January 13, 2006 - 11:11 am UTC

It was helpful to see that I'm not the only one having trouble with the 'to normalize or denormalize' question. The little script example for denormalization really helped me as well. This was my first visit; I'll be back.

Clusters

Robert, August 30, 2006 - 9:11 pm UTC

Hi Tom,

Could clusters be an alternative to denormalization?

Tom Kyte
August 31, 2006 - 8:58 am UTC

denormalization is a physical structure thing - the data is repeated.

clusters are a different sort of physical structure thing; they store data that is queried together frequently -- together. The data is not repeated.

That said, could clusters (hash/b*tree) or IOT's (index organized tables) be used to reduce the work needed to pull together this related data - and effectively eliminate the need to consider denormalization (which is an attempt usually to remove a join when retrieving data) - yes, it can in some cases.
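As a rough sketch of that idea (using the usual EMP/DEPT example; the cluster name and SIZE are illustrative only):

create cluster emp_dept_cluster ( deptno number(2) ) size 1024;

create index emp_dept_cluster_idx on cluster emp_dept_cluster;

create table dept
( deptno number(2) primary key,
  dname  varchar2(14)
) cluster emp_dept_cluster(deptno);

create table emp
( empno  number(4) primary key,
  ename  varchar2(10),
  deptno number(2) references dept
) cluster emp_dept_cluster(deptno);

Rows for a department and its employees end up stored on or near the same database blocks, so the join costs very little extra I/O, yet each fact is still stored only once.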

Perfect. Simple and right.

Vinicius, August 31, 2006 - 10:01 am UTC


Denormalize or Use Triggers?

Praveen, September 12, 2006 - 10:31 am UTC

Hi Tom,

I have a peculiar problem in designing a database.
I find it difficult to ensure data integrity through foreign keys.

I have user information.

User
----
Don




There are sites.

Site
-----------
Pheonix
Deer Valley




And there are Products.

Product
-------
Cockpit Display
Surface Display




A user can be a Site Manager and may manage multiple sites.

User-Site
---------
ID User Site
--------------------------
10 Don Pheonix
20 Don Deer Valley





Similarly there are Product Managers, who may manage multiple products. It can be the same person who is a Site Manager, too.

User-Product
------------
ID User Product
--------------------------------
100 Don Cockpit Display
200 Don Surface Display





Now there are PROJECTS managed by these users. The manager may be either a
Site Manager or a Product Manager.
Since Site Managers and Product Managers exist in two different tables
(User-Site and User-Product), how can I ensure data integrity by creating a
foreign key, "Manager", in the Project table?


Project
-------
Project ID Manager (foreign key from User-Site/User-Product table)
-----------------------------------------------------------
P1 <The data may be either from User-Site or User-Product table>



(1) Is it OK if I denormalize the User-Site and User-Product tables together into one table?

(2) Or should I use a trigger to ensure the data integrity?


Please advise.

Thank you

Tom Kyte
September 12, 2006 - 11:07 am UTC

You have users

You have users that are allowed to manage things (sites, products and projects) apparently.


USERS
|
USERS_THAT_MANAGE
|
+projects
+products
+sites

that does all of the RI you need. you cannot use triggers to ensure data integrity without using the lock table command and serializing modifications (hey, we have to use the lock table command with un-indexed foreign keys and WE can do all kinds of internal stuff - you cannot, you have to use lock table)
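For example, a minimal DDL sketch of that shape (table and column names here are only illustrative):

create table users
( user_id number primary key,
  name    varchar2(50)
);

create table users_that_manage
( user_id number primary key references users
);

-- user_sites and user_products would reference users_that_manage(user_id)
-- in the same way; projects then points at a known manager:
create table projects
( project_id number primary key,
  manager_id number not null references users_that_manage
);

Any row in PROJECTS is then guaranteed to point at a user who has been registered as a manager, with no triggers involved.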



RI and triggers

Praveen, September 12, 2006 - 11:42 am UTC

Hi Tom, I understand from your explanation that it will be costly if we try to enforce RI through triggers.

But I didn't quite get what you meant by:-

USERS
|
USERS_THAT_MANAGE
|
+projects
+products
+sites

Is "USERS_THAT_MANAGE" another table and "project", "products" and "sites" are child tables?

Could you please elaborate a little?

Thank you

Tom Kyte
September 12, 2006 - 5:23 pm UTC

you got it - that is what it is, yes.

Triggers and RI

Madhu Mutukula, September 12, 2006 - 1:46 pm UTC

There are 4 entities in your database scenario

1. Users - users that manage sites, products and projects - user_id is the primary key
2. Sites - sites located as per geographic location - site_id is the primary key
3. Products - products offered - product_id is the primary key
4. Projects - projects of the organization - project_id is the primary key

One or more Users manage one or more sites or products.

So the relation between Users-Sites and Users-Products is many to many, which is resolved in the physical data model by creating associative entities named User_Sites and User_Products.

If the relation between Users and Projects is one to one, then the manager column in the Project table refers to the Users table's user_id (not the User_Sites and User_Products tables).

If the relation between Users and Projects is many to many, then create an associative entity user_projects, which maintains the referential integrity.

This way there is no problem with referential integrity and no need for triggers.

Hope this helps.

Tom, please correct me if I am wrong.


RI

Praveen, September 12, 2006 - 2:46 pm UTC

Thanks, Madhu.

We need those users who are either Site Managers or Product Managers to manage a Project. But in the User table there may be several other users who are not managers at all. So if the "manager" column in the Project table refers to the User table, then we cannot necessarily ensure that the user is a Site Manager or a Product Manager.

Tom, your view is highly appreciated.

regards


Tom Kyte
September 12, 2006 - 5:41 pm UTC

my view is you have a missing relation here - you want "managers of things" and you can point to that.

Is this denormalised or normalised data?

Sreedhar, February 24, 2011 - 4:23 am UTC

I have a table A with storeid, storename, product, product_code, month, and amount columns. I have created another table B (with data from table A) having columns storeid, storename, product, product_code, jan_amount, feb_amount, mar_amount, .. dec_amount. Basically what I have done is convert the rows in table A to columns in table B.

Now would you call table B a normalised or denormalised version of table A? I believe it should be called a normalised version because the data redundancy has been reduced from table A to B. But my business guys call table B a denormalised version of table A.
Tom Kyte
February 24, 2011 - 12:45 pm UTC

I call it a wrong table - you are missing a year attribute at the very least and given you already have the data in table A, the entire table B is redundant.

So, I'll just call it a table that probably shouldn't be.

Followup: Is this normalised or denormalised data?

Sreedhar, February 25, 2011 - 12:12 am UTC

When considered from a database point of view, yes, this is a redundant table. But these tables are in a data warehousing environment, where table A is a stage table which holds the source file contents as is, and table B is the final table with the business rules applied which is exposed to the business.

If your question is why keep a stage table when u have a refined version, then that is the prerogative of the people who pay for the database. Maybe they have reports running from both versions of the table and it is cost effective that way.

The tables which I have provided are sample tables to illustrate the kind of row-to-column conversion.

Basically my question is just this:
when "rows to columns" conversion happens, does it lead to a more denormalised version of the data or a normalised version?

Tom Kyte
February 25, 2011 - 8:39 am UTC

what is "U" and how does "U" have anything?

When "rows to columns" conversion happens does it lead to more denormalised
version of the data or normalised version?


and here is my take on this mind numbing sort of question:

"Who cares at this point"

You are not in an OLTP environment - your goal is to store the information in a manner that best facilitates the queries you are running. You should not have even mentioned table "A" in the prior comment - it is a "stage" table, it doesn't exist in the warehouse, it is a scratch table, a temporary table, a nothing thing.


I would argue in an OLTP system that your model is "wrong", but in a warehouse system - as the product of an ETL process - it is perfectly ok.

So, go to the normalization guy and ask "so what if it is - what is your point?"

The goal of normalization is to end up free of insertion, update, and deletion anomalies - if you never insert/update/delete (you don't, it is a warehouse remember) - why do you care?

Technically - you are at least in the 1st normal form (of 6 or 7 levels) as you haven't any redundancies

read the last paragraph in the introduction here (and yes, everyone out there, I realize Wikipedia is not the end-all, be-all definitive source - but when it has something useful it is useful to point to)...

http://en.wikipedia.org/wiki/Database_normalization

A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed for performance reasons.[6] However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.[7]



Question regarding denormalization

A reader, May 08, 2012 - 9:33 pm UTC

Hi Tom,

Once I read a book and noticed some hints like these:
'sql join with too much tables might indicate a over-normalization data model'
'resolution of many-to-many by creating another entity is a candidate of denormalization'
'seperate active/inactive, heavily/lightly access data into different entities is a way of denormalization'

I am confused and hope you can help.
1. Is there any criterion for how many table joins indicate a need for denormalization?
Any concrete example of this kind of denormalization would be highly appreciated.
2. In OLTP, it is common sense to create a reference table for an n-n relationship, but if asked why, I cannot answer.
So what's your opinion?
3. How does point 3 (separate ...) work?
Tom Kyte
May 10, 2012 - 2:57 am UTC

'sql join with too much tables might indicate a over-normalization data model'


Define "over normalization" for me.

And when something says "might", it also definitely means "might not"

'resolution of many-to-many by creating another entity is a candidate of
denormalization'


out of context I don't even know what that means?


'seperate active/inactive, heavily/lightly access data into different entities
is a way of denormalization'


I suggest you get another book. This is gibberish.



1) no, there is not. You've either modeled your data - or you have not.

2) If you have a many to many relation - I don't really know of another way to do it short of having an association object in the middle.

3) get a different book I think....
