Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Gajanan.

Asked: August 06, 2007 - 6:36 am UTC

Last updated: June 27, 2012 - 11:38 am UTC

Version: 9i

Viewed 1000+ times

You Asked

I am asked to do the re-design of a database which is having very poor performance. Majority of tables have the primary key as composite keys consisting of 2 to 5 columns. In the application sql where clause, I found that not all columns of composite keys are used, sometimes the leading column of composite key is not used as well.
Can you please suggest how can I go with redesigning such database in this case.

Thanks in advance.
Gajanan

and Tom said...

Umm, in a word

NO.

I can say you should identify WHAT is causing the poor performance - so we have an idea of what needs to be fixed.

... I found that not all columns of composite keys are used, ...

to that I say "so"? That must mean the application needs to retrieve all rows that match part of the natural key. That is not cause for alarm, nor is it likely to be the cause of any performance problem (it is what the application must do)

... sometimes the leading column of composite key is not used as well. ...

and to that I say the same - as long as the columns used to retrieve are on the leading edge of SOME index (there can be more than one index in place) - I don't see any issue with this either.



In order to 'redesign' a database, you need to understand what the requirements are. Nothing you listed above is bad or evil - nor would they be the cause of performance issues in general.

You might want to analyze where the current bottlenecks are and address them - it could be very likely that a redesign from the ground up is not warranted - but rather that a tweak here and there could generate great payoff.

Rating

  (19 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

select * and poor performance

A reader, August 07, 2007 - 3:00 am UTC

Hi Tom,

It was strange for me that as developers we have been told by a task force, built to address a performance problem in our application, to replace all select * by select columns names. They said "even if you are selecting all the tables columns then you must replace your select * by select col1, col2,col3..."

In my opinion this change will not improve performance at all. Or at least it will not solve our performance problem

I would like to know your opinion if possible

Best Regards

Tom Kyte
August 07, 2007 - 11:41 am UTC

but it will result better code, I'm definitely for it.

PT, August 07, 2007 - 3:23 am UTC

In my opinion specifying column names in a select clause is good practice: it shows exactly what data you are retrieving for. If you use * as selection clause, you might also retrieve data you don't need. This comes to an performance issue in client server model designs where clients select more data than they need which results into unnecessary network traffic (greater amount of data) between client and server.

Of course it might seem easier to specify * as select clause but in my experiences this may expose some compatibility issues between your PL/SQL code and table structures, i.e when specifying RECORD type variables that try to maintain same structure againts base table you're selecting from.

Better Programming and Redesigning

Tim, August 07, 2007 - 12:57 pm UTC

I really appreciate the comment that you need to understand the requirements. So often no one knows where they are headed.

Ask some questions before assuming anything.
1) What are the goals of increased performance? 3000 update transactions a minute. A particular report runs in under 30 seconds. Network traffic is reduced by 30%. A particular batch job completes within 2 hours. Make sure that the goals are quantitative not just "improve performance".
If you do get quantitative requirements then start questioning other things.
2) Did it ever have good performance? (If so what changed?)
3) Who did the original design, can you contact them to understand why they built it the way they did? (They may have insight you don't into the application.)
4) Are there other applications that are now having performance problems?
5) How much bandwidth do we have/need?
6) To the team whom said that the columns need specified ask them to show you why they think they need specified and where they got their information. I always like to learn new things and really want to see the research.

I definitely agree that putting the columns out there makes things more readable for those who come later.

So on and So forth.
Regards
Tim

Tables Design - PK

Emad Kehail, August 08, 2007 - 2:48 am UTC

Hello Tom,

There is a debate here at work among the developers and me.
If there is a table its Primary Key is more than one field then they add a new column with sequence numbers and make it the Primary Key. They said they have been working before with an international company and the data modeler there told them this what has to be done for PKs.!!!

I have told them if the PK is more than one field then it is ok. No problem in that. I will not affect performance and it is right from the database design point of view. But I still feel they are not convinced yet.

Can you help me in this point please...


Tom Kyte
August 10, 2007 - 3:14 pm UTC

rolls eyes and sighs.....


So, why is the "data modeler for an international company" the root of all truth?

tell them
a) read
b) read lots
c) learn about what it is they do (eg: research, see a and b)
d) test, benchmark, evaluate.

if there exists a reasonable natural key, it is not only perfectly OK - but very natural to use it.

Hence - that is why it is called a natural key, as opposed to the surrogate key your data modeler wants to always use.

Meaningless Key

Mohamed Houri, August 08, 2007 - 3:16 am UTC

Per regards to the discussion about the Primary Key design, I believe reading this link of jonathan lewis

http://jonathanlewis.wordpress.com/2006/12/29/meaningless-keys/

will help you a little bit in your design question

Kind Regards

If there was one eternal "best" design approach...

Duke Ganote, August 08, 2007 - 8:47 am UTC

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6723299263475

"If the primary key has a chance -- however remote -- of changing, well, then it is NOT the primary key."

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:580828234131

"I agree that composite keys work very well, they can be used as foreign keys, everything works. HOWEVER

o a sequence is smaller then 3 columns will be. If the key of this table is a foreign key in lots of other tables -- the space savings can be enormous.

o people frequently try to update a primary key to correct data -- using a surrogate key which is imutable solves forever this update cascade problem as you never update the
primary key (the sequence) you update the 3 columns which are not stored anywhere.

o it is easier to code"

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1235228851558

"Sure, using denormalization can reduce the number of JOINS but consider -- if we moved DNAME, LOC into EMP from DEPT - we now have hundreds or thousands of records to update in
order to change the LOC of a departement. We have redundant data, we have data integrity issues."
Tom Kyte
August 14, 2007 - 10:05 am UTC

correct, there is not -

never say never
never say always
I always say.

Delelopers like surrogate keys

Marc Blum, August 11, 2007 - 6:02 am UTC

In my experience, developers like surrogate keys. They like the idea of each row having an numeric rownum. Just like scrolling through a flat file.

Some tend to apply an ID-column to every table, even association tables (n:m-relationships) only consisting of two FKs.

Surrogate Keys seem to work well with 4GL-IDEs and RAD-tools. GUI development appears to be easier.

It's the introduction of pointers to the relational model. Data isn't hold together by content anymore.

I'm not a big fan of IDs at all. I introduce them only when absolutely necessary.

db

Sam, August 12, 2007 - 1:14 am UTC


no choice but use IDs

Pasko, August 13, 2007 - 7:12 am UTC

Hi,
this is an interesting Discussion.

I don't necessarily like the auto-generated primary keys, but sometimes one has no choice.

I can not think of many Natural Primary( or Candidate) Keys which do not have a chance of changing, even if there are, then may be very few.

That's why i always use surrogate keys as my Primary Keys, because that way i'm on the safer side,
instead of having to cascade update hundreds of Tables when someday that key changes :)

IDs also help to standardize my Data Model.
For example, all my Database Tables have a Primary Key Column called 'ID' and this column has the same Data-Type for all Tables, for example a NUMBER(with an auto generated sequence) or RAW(16) with the Default value sys_guid().

Another advantage of using IDs is it makes it easier for Developers(especially Java Developers :) )
when they are using frameworks such as (JPA-Java Persistence API) or Ruby-On-Rails or even Oracle's APEX.





Tom Kyte
August 15, 2007 - 10:55 am UTC

You say you cannot think of many, however that implies you can think of some and using surrogates for them would not be correct.


Surrogate PKs

A reader, August 13, 2007 - 9:05 am UTC

In the OLTP environment, I wonder how many proponents of surrogate PKs also remember to build an Alternate Key on the natural key when one exists? It really makes troubleshooting data extremely difficult when the same natural key exists several times, each with a different surrogate PK. That's why I always recommend using the natural key as the PK where ever possible in the OLTP environment. Relying on app code to enforce data integrity is a game played only be fools.

TO- A reader

Tyler, August 14, 2007 - 10:53 am UTC

That's what you use a constraint for. If you have a data model with with a primary key defined, but no other unique constraints, check constraints, foreign key constraints, where they are applicable, I'd be less concerned that you have no logical primary key, and more concerned that you're data modeling an access database.

Surrogate keys

Vikas Atrey, August 16, 2007 - 1:43 am UTC

Nothing can be said as universal truth.
Surrogate keys makes many things easier and more manageable.

Some of them are :
1. Common routines can be written for data auditing.

2. Audit information for all the tables can be stored in one table( or one table per logical group), also data retrieval routines can also be made generic.
This also helps in other metrics calculation e.g.How much data we change per day, per user...
If we are getting data from some other system regularly then audit table data may give us an indication of the data quality of the other systems as well. It Can also be extended even to help calculate some business measures like
user's efficency(the more are the data chages particularly in basic tables the more is the chance that user entered it wrongly in the first chance).

3. Data fetch coding in GUI also will be much more easier and only single routine can be written to fetch data for various tables.
4 . It shields us from business changes that may warrant in the changes in the PKs.
Tom Kyte
August 20, 2007 - 7:24 pm UTC

and surrogate keys can really screw things up to - don't forget that side of the coin.

I do not believe I've ever said "they are universally good (or bad)".

Never say Always
Never say Never
I always say


Natural keys

Oleksandr Alesinskyy, August 16, 2007 - 7:38 am UTC

I like natural keys more then surrogate but there are 2 major issues with former:

1. If you are sure that some natural key would not change it will change.

2. If you are sure that some natural key is absolutely unique it is not (at it was proven mani times with SSID, tax numbers, etc.).

So you may use natural keys only for data that are completely at your disposition.

Regards,
Oleksandr

even if i have a natural key...

Pasko, August 16, 2007 - 11:03 am UTC

Hi Tom,

if there is a natural key which is guaranteed to not change,
now and in 100+ years to come, then it should be used.

Natural keys help to eliminate some of the joins used
when using Surrogate keys.

for example if i have two tables customers and tariffs, many-to-many,with a join Table: tariff_of_customer

Table: customers
--------
->name
->id

Table: tariffs
--------
->name
->id

using surrogate keys:

Table: tariff_of_customer
--------
--cus_id => references customer
--tariff_id => references tariff
.
.

or using natural keys

tariff_of_customer
---------
cust_name => references customer
tariff_name => references customer
.
.
.

The advantage of using natural keys:
i would not have to always join to all three Tables
to get tariff-name and customer-name;

but if i use surrogate keys then i will always have to
join whenever i want some information from
the join-table: tariff_of_customer.

However,whenever i use index-organized Tables , then i always prefer the use of Natural keys.


Tom Kyte
August 20, 2007 - 10:05 pm UTC

I'm getting confused here by these followups


I have said:

a) there are times for surrogates
b) there are times for natural keys

and anyone that says "always use surrogates in all cases" is just plain *wrong*

Data Modal

Arvind Mishra, January 14, 2009 - 7:07 pm UTC

Hello Tom,

In your book you wrote:
" if type = 'a' join to tablea if type = 'b' join to tableb
........
A QUERY LIKE THIS SUGGEST THAT DATA MODAL IS NOT SET UP TO ANSWER THE QUESTION ......."

Can u please give me an example of what is wrong in the data modal and how to correct it?

Thanks,

Arvind
Tom Kyte
January 16, 2009 - 5:00 pm UTC

I've written a lot, if you wouldn't mind giving a more 'specific reference' so I can refresh my mind with the example you are referring to?

Data Modal

Arvind Mishra, January 17, 2009 - 5:51 pm UTC

Hello Tom,

Here are the required details:

Book Name: Effective Oracle By Design
Chapter : Chapter 1, The Right Approach to Building Application
Topic Heading : Design Your Data Modal for Efficiency, Second Paragraph
Page Number: 37

Thanks,

Arvind
Tom Kyte
January 19, 2009 - 8:38 pm UTC

So, the entire quote in context is:

...
"We have this query we need to run hundreds of times a minute and it takes forever, bringing our system to its knees, Please help!"
if type = 'A' join to tablea
if type = 'B' join to tableb
¿
A query like this does conditional joining. (A similar problem is a query that needs to do where exists type processing four or five levels deep.) It is obvious that the data model was not set up to answer the question they ask hundreds of times per minute.
...

a quick search finds it:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1460004570029#87793700346365857

Think of this simple example:

person_table( person_id, name, address, phone, person_type )

dr_table( person_id references person_table, degree, specialty )
teacher_table( person_id references person_table, grade_taught )
.... and so on .....


if person_type = 'DR' then join to dr_table
if person_type = 'TEACHER' then join to teacher_table
... and so on ....


basically, someone manifested a 1 to 1 optional relationship as a ton of child tables - each of which needs a join to pick up various attributes. When the meant to use a single table... with all of the attributes...

Scnema Design

A reader, January 24, 2009 - 4:49 pm UTC

Thanks Tom. Your web site is too good. I am happy to addicted to this website.

arvind




design and privleges

asktom fan, June 27, 2012 - 10:32 am UTC

Hi Tom,

My question is about a new database we are designing and the privileges of developer accounts.

Like every database, we have "application" schemas that own the business data and packages/procedures/functions, etc. And we also have developer accounts that can access the objects in the "application" schemas.

For developers to write and test the packages/procedures/functions for the "application" schemas, we are facing two choices:

1. Give developers the password for the application account, and let the developers develop and test the code in the application schema.

2. Create pubic synonyms for objects in the application accounts, grant developers access to these objects, and let developers develop and test the code in their own develpers' schemas, and later move the code to the application schema.

What is your preference? What are the pros and cons of each choice?

Thanks!
Tom Kyte
June 27, 2012 - 10:55 am UTC

1) do not do that.

2) do not do that.


You don't need/want synonyms. They can directly reference them, it won't kill them to do that.

Let the developers have their development database which feeds into the real development database on a regular basis (you could even automate that with continuous build process). That is, they would work in database X, but database Y is the current source of truth. They would check out a piece of code, work with it in X - get it doing what it should and when they check it back in a continuous integration tool ( http://en.wikipedia.org/wiki/Hudson_(software ) that one for example) will retrieve the checked in code, put it into database Y, and run the integration tests your team builds over time. If what they check in breaks something - you are notified right away.


design and privleges

asktom fan, June 27, 2012 - 11:33 am UTC

Hi Tom,

Thanks for your quick response and insight.

So we should have at least 4 databases:

1. Developers' database: used by developers to develop code. And in this database, the developers should develop directly in the application schema?

2. Real development database: serves integration testing for code migrated from Developers' database.

3. QA database: used by business users to test code migrated from real development database.

4. Production database: the real thing.

Is this enough or do we need more databases? :)

Thanks!
Tom Kyte
June 27, 2012 - 11:38 am UTC

1) they can, in fact, there isn't any reason this database isn't on their own machine if you wanted to go that far. It isn't important.

2) yes

3) yes

4) yes


and these are four "environments", not databases. App servers, whatever else is needed.