select * and poor performance
A reader, August    07, 2007 - 3:00 am UTC
 
 
Hi Tom,
It was strange for me that as  developers we have been told by a task force, built to address a performance problem in our application, to replace all select * by select columns names. They said "even if you are selecting all the tables columns then you must replace your select * by select col1, col2,col3..."
In my opinion this change will not improve performance at all. Or at least it will not solve our performance problem
I would like to know your opinion if possible
Best Regards
 
August    07, 2007 - 11:41 am UTC 
 
but it will result better code, I'm definitely for it. 
 
 
PT, August    07, 2007 - 3:23 am UTC
 
 
In my opinion specifying column names in a select clause is good practice: it shows exactly what data you are retrieving for. If you use * as selection clause, you might also retrieve data you don't need. This comes to an performance issue in client server model designs where clients select more data than they need which results into unnecessary network traffic (greater amount of data) between client and server.
Of course it might seem easier to specify * as select clause but in my experiences this may expose some compatibility issues between your PL/SQL code and table structures, i.e when specifying RECORD type variables that try to maintain same structure againts base table you're selecting from. 
 
Better Programming and Redesigning
Tim, August    07, 2007 - 12:57 pm UTC
 
 
I really appreciate the comment that you need to understand the requirements.  So often no one knows where they are headed.  
Ask some questions before assuming anything.
1)  What are the goals of increased performance?  3000 update transactions a minute.  A particular report runs in under 30 seconds.  Network traffic is reduced by 30%.  A particular batch job completes within 2 hours.  Make sure that the goals are quantitative not just "improve performance".
If you do get quantitative requirements then start questioning other things.
2)  Did it ever have good performance? (If so what changed?)
3)  Who did the original design, can you contact them to understand why they built it the way they did? (They may have insight you don't into the application.)
4)  Are there other applications that are now having performance problems?
5)  How much bandwidth do we have/need?
6)  To the team whom said that the columns need specified ask them to show you why they think they need specified and where they got their information.  I always like to learn new things and really want to see the research.
I definitely agree that putting the columns out there makes things more readable for those who come later.  
So on and So forth.
Regards
Tim
 
 
Tables Design - PK 
Emad Kehail, August    08, 2007 - 2:48 am UTC
 
 
Hello Tom,
There is a debate here at work among the developers and me.
If there is a table its Primary Key is more than one field then they add a new column with sequence numbers and make it the Primary Key. They said they have been working before with an international company and the data modeler there told them this what has to be done for PKs.!!!
I have told them if the PK is more than one field then it is ok. No problem in that. I will not affect performance and it is right from the database design point of view. But I still feel they are not convinced yet.
Can you help me in this point please...
 
August    10, 2007 - 3:14 pm UTC 
 
rolls eyes and sighs.....
So, why is the "data modeler for an international company" the root of all truth?
tell them 
a) read
b) read lots
c) learn about what it is they do (eg: research, see a and b)
d) test, benchmark, evaluate.
if there exists a reasonable natural key, it is not only perfectly OK - but very natural to use it.
Hence - that is why it is called a natural key, as opposed to the surrogate key your data modeler wants to always use.
 
 
 
Meaningless Key
Mohamed Houri, August    08, 2007 - 3:16 am UTC
 
 
 
If there was one eternal "best" design approach...
Duke Ganote, August    08, 2007 - 8:47 am UTC
 
 
August    14, 2007 - 10:05 am UTC 
 
correct, there is not -
never say never
never say always
I always say. 
 
 
Delelopers like surrogate keys
Marc Blum, August    11, 2007 - 6:02 am UTC
 
 
In my experience, developers like surrogate keys. They like the idea of each row having an numeric rownum. Just like scrolling through a flat file. 
Some tend to apply an ID-column to every table, even association tables (n:m-relationships) only consisting of two FKs.
Surrogate Keys seem to work well with 4GL-IDEs and RAD-tools. GUI development appears to be easier. 
It's the introduction of pointers to the relational model. Data isn't hold together by content anymore.
I'm not a big fan of IDs at all. I introduce them only when absolutely necessary. 
 
db
Sam, August    12, 2007 - 1:14 am UTC
 
 
 
 
no choice but use IDs
Pasko, August    13, 2007 - 7:12 am UTC
 
 
Hi,
this is an interesting Discussion.
I don't necessarily like the auto-generated primary keys, but sometimes one has no choice.
I can not think of many Natural Primary( or Candidate) Keys which do not have a chance of changing, even if there are, then may be very few.
That's why i always use surrogate keys as my Primary Keys, because that way i'm on the safer side, 
instead of having to cascade update hundreds of Tables when someday that key changes :)
IDs also help to standardize my Data Model.
For example, all my Database Tables have a Primary Key Column called 'ID' and this column has the same Data-Type for all Tables, for example a NUMBER(with an auto generated sequence) or RAW(16) with the Default value sys_guid().
Another advantage of using IDs is it makes it easier for Developers(especially Java Developers :) ) 
when they are using frameworks such as (JPA-Java Persistence API) or Ruby-On-Rails or even Oracle's APEX.
 
August    15, 2007 - 10:55 am UTC 
 
You say you cannot think of many, however that implies you can think of some and using surrogates for them would not be correct.
 
 
 
Surrogate PKs
A reader, August    13, 2007 - 9:05 am UTC
 
 
In the OLTP environment, I wonder how many proponents of surrogate PKs also remember to build an Alternate Key on the natural key when one exists?  It really makes troubleshooting data extremely difficult when the same natural key exists several times, each with a different surrogate PK.  That's why I always recommend using the natural key as the PK where ever possible in the OLTP environment.  Relying on app code to enforce data integrity is a game played only be fools.  
 
TO- A reader 
Tyler, August    14, 2007 - 10:53 am UTC
 
 
That's what you use a constraint for. If you have a data model with with a primary key defined, but no other unique constraints, check constraints, foreign key constraints, where they are applicable, I'd be less concerned that you have no logical primary key, and more concerned that you're data modeling an access database. 
 
Surrogate keys
Vikas Atrey, August    16, 2007 - 1:43 am UTC
 
 
Nothing  can be said as universal truth.
Surrogate keys makes many things easier and more manageable.
Some of them are :
1. Common routines can be written for data auditing.
2. Audit information  for all the tables can be stored in one table( or one table per logical group),  also data retrieval routines can also be made generic.
This also helps in other metrics calculation e.g.How much data we change per day, per user...
 If we are getting data from some other system regularly then audit table data  may give us an indication of the data quality of the other systems as well. It Can also be extended even to help calculate some business measures like 
user's efficency(the more are the data chages particularly in basic tables the more is the chance that user entered it wrongly in the first chance). 
3. Data fetch coding in GUI also will be  much more easier and only single routine can be written to fetch data for various tables.
4 . It shields us from business changes that may warrant in the changes in the PKs. 
August    20, 2007 - 7:24 pm UTC 
 
and surrogate keys can really screw things up to - don't forget that side of the coin.
I do not believe I've ever said "they are universally good (or bad)".  
Never say Always
Never say Never
I always say
 
 
 
Natural keys
Oleksandr Alesinskyy, August    16, 2007 - 7:38 am UTC
 
 
I like natural keys more then surrogate but there are 2 major issues with former:
1. If you are sure that some natural key would not change it will change.
2. If you are sure that some natural key is absolutely unique it is not (at it was proven mani times with SSID, tax numbers, etc.).
So you may use natural keys only for data that are completely at your disposition.
Regards,
Oleksandr 
 
even if i have a natural key...
Pasko, August    16, 2007 - 11:03 am UTC
 
 
Hi Tom,
if there is a natural key which is guaranteed to not change,
now and in 100+ years to come, then it should be used.
Natural keys help to eliminate some of the joins used 
when using Surrogate keys.
for example if i have two tables customers and tariffs, many-to-many,with a join Table: tariff_of_customer
Table:  customers
--------
->name
->id
Table:  tariffs
--------
->name
->id
using surrogate keys:
Table: tariff_of_customer
--------
--cus_id     => references customer
--tariff_id  => references tariff
.
.
or using natural keys
tariff_of_customer
---------
cust_name    => references customer
tariff_name  => references customer
.
.
.
The advantage of using natural keys:
i would not have to always join to all three Tables
to get tariff-name and customer-name;
but if i use surrogate keys then i will always have to
join whenever i want some information from
the join-table: tariff_of_customer.
However,whenever i use index-organized Tables , then i always prefer the use of Natural keys.
 
August    20, 2007 - 10:05 pm UTC 
 
I'm getting confused here by these followups
I have said:
a) there are times for surrogates
b) there are times for natural keys
and anyone that says "always use surrogates in all cases" is just plain *wrong* 
 
 
Data Modal
Arvind Mishra, January   14, 2009 - 7:07 pm UTC
 
 
Hello Tom,
In your book you wrote: 
" if type = 'a' join to tablea if type = 'b' join to tableb
........
A QUERY LIKE THIS SUGGEST THAT DATA MODAL IS NOT SET UP TO ANSWER THE QUESTION ......."
Can u please give me an example of what is wrong in the data modal and how to correct it?
Thanks,
Arvind 
January   16, 2009 - 5:00 pm UTC 
 
I've written a lot, if you wouldn't mind giving a more 'specific reference' so I can refresh my mind with the example you are referring to? 
 
 
Data Modal
Arvind Mishra, January   17, 2009 - 5:51 pm UTC
 
 
Hello Tom,
Here are the required details:
Book Name: Effective Oracle By Design
Chapter  : Chapter 1, The Right Approach to Building Application
Topic Heading : Design Your Data Modal for Efficiency, Second Paragraph
Page Number: 37
Thanks,
Arvind 
January   19, 2009 - 8:38 pm UTC 
 
So, the entire quote in context is:
...
"We have this query we need to run hundreds of times a minute and it takes forever, bringing our system to its knees, Please help!"
if type = 'A' join to tablea
if type = 'B' join to tableb
¿
A query like this does conditional joining. (A similar problem is a query that needs to do where exists type processing four or five levels deep.) It is obvious that the data model was not set up to answer the question they ask hundreds of times per minute.
...
a quick search finds it: 
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1460004570029#87793700346365857 Think of this simple example:
person_table( person_id, name, address, phone, person_type )
dr_table( person_id references person_table, degree, specialty )
teacher_table( person_id references person_table, grade_taught )
.... and so on .....
if person_type = 'DR' then join to dr_table
if person_type = 'TEACHER' then join to teacher_table
... and so on ....
basically, someone manifested a 1 to 1 optional relationship as a ton of child tables - each of which needs a join to pick up various attributes.  When the meant to use a single table... with all of the attributes... 
 
 
Scnema Design
A reader, January   24, 2009 - 4:49 pm UTC
 
 
Thanks Tom. Your web site is too good. I am happy to addicted to this website.
arvind
 
 
design and privleges
asktom fan, June      27, 2012 - 10:32 am UTC
 
 
Hi Tom,
My question is about a new database we are designing and the privileges of developer accounts.
Like every database, we have "application" schemas that own the business data and packages/procedures/functions, etc. And we also have developer accounts that can access the objects in the "application" schemas.
For developers to write and test the packages/procedures/functions for the "application" schemas, we are facing two choices:
1. Give developers the password for the application account, and let the developers develop and test the code in the application schema.
2. Create pubic synonyms for objects in the application accounts, grant developers access to these objects, and let developers develop and test the code in their own develpers' schemas, and later move the code to the application schema.
What is your preference? What are the pros and cons of each choice?
Thanks! 
June      27, 2012 - 10:55 am UTC 
 
1) do not do that.
2) do not do that.
You don't need/want synonyms.  They can directly reference them, it won't kill them to do that.  
Let the developers have their development database which feeds into the real development database on a regular basis (you could even automate that with continuous build process).  That is, they would work in database X, but database Y is the current source of truth.  They would check out a piece of code, work with it in X - get it doing what it should and when they check it back in a continuous integration tool ( 
http://en.wikipedia.org/wiki/Hudson_(software ) that one for example) will retrieve the checked in code, put it into database Y, and run the integration tests your team builds over time.  If what they check in breaks something - you are notified right away. 
 
 
design and privleges   
asktom fan, June      27, 2012 - 11:33 am UTC
 
 
Hi Tom,
Thanks for your quick response and insight.
So we should have at least 4 databases:
1. Developers' database: used by developers to develop code. And in this database, the developers should develop directly in the application schema?
2. Real development database: serves integration testing for code migrated from Developers' database.
3. QA database: used by business users to test code migrated from real development database.
4. Production database: the real thing.
Is this enough or do we need more databases? :)
Thanks! 
June      27, 2012 - 11:38 am UTC 
 
1) they can, in fact, there isn't any reason this database isn't on their own machine if you wanted to go that far.  It isn't important.
2) yes
3) yes
4) yes
and these are four "environments", not databases.  App servers, whatever else is needed.