Home>Question Details



Ashok -- Thanks for the question regarding "Identifying a Primary Key", version 10g R2

Submitted on 11-Mar-2008 2:57 Central time zone
Last updated 22-Mar-2011 13:54

You Asked

Dear Tom,

Need your advice on determining the Primary Key. There is a table which has got a natural key of data type CHAR(10) which will always be Unique. There are other dependent tables which will refer to this Primary Key. Will it be good to have a Surrogate key as a Primary Key of datatype NUMBER instead of having Natural Key as Primary Key of Datatype CHAR(10).


and we said...

there is nothing wrong with using a natural key, especially a single attribute one.

but - suggestion - never ever never use char, only use varchar2.


Reviews    
3 stars Primary Keys   March 12, 2008 - 8am Central time zone
Reviewer: John Flack from Arlington, VA
I agree that there is nothing wrong with using a natural key as primary.  However, primary keys 
should be immutable, i.e. they should never change, ESPECIALLY if they are used as foreign keys in 
other tables.  So if your natural key is subject to change, I advise you to make it an unique key, 
but not primary, and go ahead and add a surrogate primary key generated from a sequence.


5 stars   March 12, 2008 - 11am Central time zone
Reviewer: Brian Tkatch from Oak Park, MI
"never ever never use char, only use varchar2."

Would you say that even when using just CHAR(1)?
What if the field is "guaranteed" to use the full width?


Followup   March 12, 2008 - 5pm Central time zone:

yes, I would say that for varchar2(1)

why?

because you'll not be tempted to use a char. If you use it once, you might use it twice, so just DO NOT USE IT

A char is a varchar2 that is blank padded to the full width. So, if it is always the full width the char and varchar2 are identical - IDENTICAL - and since using CHAR is a really bad idea in general, best just to NEVER USE IT - so you know, you are not tempted to use it again.
4 stars Tom's take on CHAR vs VARCHAR2   March 12, 2008 - 12pm Central time zone
Reviewer: Duke Ganote from Amelia, Ohio USA
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1542606219593

He specifically addresses CHAR(1) vs VARCHAR2(1) here:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1542606219593#8216570788470


4 stars Identifying a Primary Key   March 13, 2008 - 2am Central time zone
Reviewer: Ashok from India
Thanks a lot for the guidance. What would be your advice on following scenario.
The format of the NATURAL Key (Which will be a Primary Key for the table) would be AAMMYY9999 for 
making it meaning ful to the end users
where AA is Alphabets
MM is the month
YY is the Year
9999 is the number generated from running sequence

In this case would the decision of having Natural Key as Primary Key would stand logical in terms 
of Performance aspect or Should a Surrogate Key be Introduced as Primary Key performance wise.

Please suggest.

Thanks


Followup   March 13, 2008 - 8am Central time zone:

AAMMYY9999 - ugh, that is so bad. storing a date in a string, storing two attributes in one :( so bad, so sad.


this has nothing to do with performance, a number in Oracle is a varying width pack decimal like thing - it will not be significantly smaller, in fact a surrogate would likely have a negative impact as it would

a) make the entire data set that much larger
b) take time to generate
c) take time to look up and propogate down to the child tables
d) not be able to be converted immediately without a lookup into the original data that contains a date hidden in a string..
4 stars AAMMYY9999 vs surrogate discussion   March 13, 2008 - 7am Central time zone
Reviewer: Duke Ganote from the rolling hills of Anderson Twp, Hamilton Cty, Ohio USA
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:689240000346704229


4 stars Something to consider   March 13, 2008 - 1pm Central time zone
Reviewer: Andrey from Washington, DC USA
Tom,

In my experience, I've found that if you have a complex enterprise application, it's better to 
avoid using natural keys as primary keys. This is because natural keys have real-world meaning, 
they stand for something, and users will eventually want to be able to change their value. In this 
example, are we 100% absolutely positive the date in the record's PK will never have to be 
adjusted? 

Also, coming from the Java world, we try to avoid using anything but sequence-generated PKs because 
we use a popular object-relational mapping tool called Hibernate. It works infinitely better with 
sequence PKs, because it knows an object you're saving is "new" if and only if the PK is null.

Just my 2c.


Followup   March 15, 2008 - 8am Central time zone:

something else to consider:

create table document( surrogate number primary key, .... );


create table document_version( surrogate, version, ...., foreign key(surrogate,versoin) );


I have found that many times natural keys "just fall out, exist, are ok, are there, should be used" as well.




...
Also, coming from the Java world, we try to avoid using anything but
sequence-generated PKs because we use a popular object-relational mapping tool
called Hibernate. It works infinitely better with sequence PKs, because it
knows an object you're saving is "new" if and only if the PK is null.
...

oh swell, do everything for the ease and convenience of a language.


...
In this example, are we 100%
absolutely positive the date in the record's PK will never have to be adjusted?
.....

if you cannot answer that question, you haven't finished your design, haven't done the analysis perhaps....


Not that I like their 'unnaturally natural' key above - aayyyymmdd - yuck, as bad as doing something for the ease and convenience of a language and a framework that won't exist in 5 years time.
5 stars Thanx!   March 13, 2008 - 1pm Central time zone
Reviewer: Brian Tkatch from Oak Park, MI
Tom, thank you for responding. I was wondering about that, and what was the reason. Even when the 
reason is don't get used to it! :)

@Duke Ganote

I did a search and found the article, but i did not see the comments that specifically referred to 
CHAR and VARCHAR2(1). Thanx for pointing it out.

I love AskTom!


5 stars aaddmmyy-ssss as record primary key   March 17, 2008 - 2am Central time zone
Reviewer: Amir Riaz 

hi tom

in most of databases i have designed i found that the client has some sort of requirement in one or two place to identify a record with keys like AAMMYY-9999

where
AA is Alphabets
MM is the month
YY is the Year
9999 is the number generated from running sequence

and
sadly its been a common practice which sadly i also haven been practicing. but i still not manage to understand what problem it causes because if i use three attributes for it i still have to generate date and sequence

so there should not be any difference between

create table surprised(seq integer, pdate date, symbol varchar2(2) /*some other columns here*/);

alter table surprised
ADD CONSTRAINT looks_strange_pk PRIMARY KEY(seq ,pdate,symbol);

insert into surprised values(seq.nextval, sysdate, 'CS');

and

create table i_do(pkey varchar2(60) primary key, /* other keys*/);

insert into i_do values ('CS'||sysdate||seq.nextval);

Since the primary key does not changes i think it should not make any difference. If yes please make it clear with some example

Followup   March 24, 2008 - 8am Central time zone:

Well, actually, all you need is:

create table surprised(seq integer, pdate date, symbol varchar2(2) /*some other columns here*/);

alter table surprised
ADD CONSTRAINT looks_strange_pk PRIMARY KEY(seq);

period - pdate and symbol are what as known as functionally dependent data - they are IMPLIED by the value of SEQ.

You have what is known as a surrogate key, it only needs be unique. SEQ is already unique. The addition of pdate and symbol to the primary key is not useful.



5 stars If the users can see it, they WILL want to change it   March 19, 2008 - 7am Central time zone
Reviewer: Dan from TAMPA FL US
In my experience, if you allow the users to see the key, they WILL eventually want to change it, 
despite having "complete" requirements, now. (Requirements change...)

Perhaps in this example, the components of the key are generated, but consider the corner cases

... time--> duplicates in the US when daylight savings time flips...

...in your case you have a sequence going to 9999. What happens if you get that many in one day?

...I have also seen cases where things that cannot happen in real life "happen' in the system. A 
company can't move from Canada to the US, but some minimum-wage data entry person can type in MB 
(Manitoba) instead of MN (Minnesota). As far as the database is concerned, the company just moved 
from Canada to the US.


Unless there is a real physical reason (and enforced by edit checks) that makes it unique, I 
usually make intelligent keys become unique keys, and create a surrogate sequence-based primary 
key.



Followup   March 24, 2008 - 9am Central time zone:

In short, end users should not have a say in the physical schema design - period, never, not ever.

they can specify a requirement (which we should then freely debate and not take as a "rule", but rather as a request - after explaining the COST of such a request) - but not a physical schema.
4 stars Fail to identify Primary Key   October 6, 2009 - 5am Central time zone
Reviewer: Arindam Mukherjee from India - Kolkata
Respected Mr. Tom,

At the outset, I am sorry to say something that may be against the current. In Design phase, we get stuck in identifying Primary key of client transaction table. The table structure looks like below. The bold columns have been identified as a composite primary key in Oracle 10g RAC.

Client Transaction Table

Clinent ID
VOUCHER_NO
TRN_NO
SRL_NO


GL_ID
TRN_DATE
FINANCIAL_YEAR
COMPANY_ID
REGION_ID
BRANCH_ID
BILL_TYPE
DR_AMOUNT
CR_AMOUNT
TAX_PERCENTAGE

At the time of transaction, it is okay but at the year end, the outstanding amount, if any, for every client or customer will be carried forwarded to the same table and the existing data will be removed.

During the year end process, we can easily populate data to all columns expect voucher_no, Trn_no and Srl_no because consolidated amount either Debit or Credit will be brought forwarded from the previous year transaction history. At this stage, predefined primary key is violated.

Other team members are suggesting now to introduce surrogate key (Sequence Number) as primary key but I could not find any value of surrogate key in transaction table as this key will not be used in generating report or SQL query on Sub Ledger wise payment or outstanding etc.

I request you to suggest how to overcome this problem. If you think, table design should be changed, please suggest. We are desperate in need of the solution.

Regards,

Followup   October 8, 2009 - 7am Central time zone:

tell me why you cannot populate voucher_no - I have no clue what voucher_no is, does, implies, is used for - nor trn/srl_no either - it is your data, you live with it year after year - to you it is common sense what this all is, to us - it is meaningless.

how would a surrogate help? you still need to retrieve the data by something OTHER THAN the surrogate right? when you go to retrieve this data - what would the predicate be there? (that'll lead you on the path to what your primary key is...)
5 stars OMG   October 8, 2009 - 3pm Central time zone
Reviewer: Rick Davis from St Louis, MO USA
Audrey from DC posts:  "In my experience, I've found that if you have a complex enterprise 
application, it's better to avoid using natural keys as primary keys. This is because natural keys 
have real-world meaning, they stand for something, and users will eventually want to be able to 
change their value. In this example, are we 100% absolutely positive the date in the record's PK 
will never have to be 
adjusted?"  

In an OLTP environment (does not apply for Dss/DW) the following:
One of two things going on here.  First, for several very good reasons, of course a natural PK has 
a real-world meaning.  If the meaning changes you have a new entity (table).  You can't simply 
kidnap a natural PK when its meaning changes; you have to design a new entity (table).  If a 
mistake is made entering a PK then simply logically delete those row(s) (parent and child, if they 
exist) and add the corrected natural PK and children, if required.  

"Also, coming from the Java world, we try to avoid using anything but sequence-generated PKs 
because 
we use a popular object-relational mapping tool called Hibernate. It works infinitely better with 
sequence PKs, because it knows an object you're saving is "new" if and only if the PK is null."
  
If you are using surrogate PKs then you should also be using an Alternate Key (AK) on the column(s) 
of the natural key.  If you do not know the natural key them you have a poor design.  If you know 
the natural key, you don't need a surrogate.  Think that all the way through.  If you use a 
surrogate, without an AK, this can happen:

Create table table1 (SK number Primary Key, color varchar2(10);
Create sequence seq1;
Insert into table1 (seq1.nextval, 'RED');
insert into table1 (seq1.nextval, 'YELLOW');
Insert into table1 (seq1.nextval, 'RED');

Oops, which RED is the real RED, SK 1 or SK 3?
OK, to avoid that issue, we'll put an AK on color;
If we put an AK on color we no longer need the SK.
 



4 stars another spreadsheet database   October 9, 2009 - 9am Central time zone
Reviewer: Jim from Phila, PA
AAMMDDYYYY as a pk field.  This reminds me of the spreadsheet users who merge data into one filed 
to make the spreadsheet thinner and easier  to read.  Very bad design for a database.


3 stars   October 14, 2009 - 9am Central time zone
Reviewer: A reader 
A primary key with format AAMMYY-9999 is not a good design. What happens if we need to store more 
than 9999 values? I have seen this type of keys suggested by inexperienced business analysts. 


5 stars Primary Key in Oracle   February 23, 2010 - 8am Central time zone
Reviewer: Alok Kumar Singh from INDIA
Hi Tom,

Why does Oracle not allow to create more than one primary key on a table? What was the thought 
behind this restriction? 
Would that create a confusion in the execution plan?


Followup   March 1, 2010 - 8am Central time zone:

because the rules are "there is just one primary key on a table". It is just sort of a premise of relational databases.

... Would that create a confusion in the execution plan? ...

not really sure what you are concerned about? If you have surrogate keys, you would obviously define them as UNIQUE and if applicable "NOT NULL" which provided the optimizer all of the relevant information.
3 stars One PK per Table   February 24, 2010 - 8am Central time zone
Reviewer: Rick Davis from St. Louis, MO
Oracle defines a PK as:  "The column or set of columns that uniquely identifies a row in a table. 
Only one primary key can be defined for each table."

A table usually has a column or combination of columns whose values uniquely identify each row in 
the table. This column (or columns) is called the primary key (sometimes called the natural key) of 
the table and enforces the entity integrity of the table. A table can have only one PRIMARY KEY 
constraint, and a column that participates in the PRIMARY KEY constraint cannot accept null values. 


You might be thinking of an Alternate Key (AK), these are usually used when the PK is a sequence 
and the natural key column or columns have an AK built on them. 


1 stars Primary keys are for building relationships   March 22, 2011 - 5am Central time zone
Reviewer: Dominic from UK
While it is true that every table should have a definition to ensure that every row is unique, I do 
not believe that it is the job of the primary key to do that. 

A primary key is used for maintaining relationships between entities and should be as narrow as 
possible (e.g. a numeric field). Unique indexes can be used to define uniqueness on a row.

I am currently working on a database that uses natural and compound natural keys. Regardless of the 
data wanting to change, all relationships are horribly inefficient and needlessly complex to code 
against.

I think creating tables should be simple and I always follow two rules: 

1. Always use a surrogate, numeric PK 
2. Nearly always use a unique index to define uniqueness (exceptions may be log tables and such, 
where duplicate data may not mean that the data is invalid).

And so, to the original question:

Do not spend time thinking about what the PK should be, always create a Numeric surrogate. Do spend 
time on thinking about what makes the rest of the data unique, and define that with unique indexes.


Followup   March 22, 2011 - 8am Central time zone:

You would be - in my humble opinion - wrong.


Do spend time thinking.
Do spend time thinking about your data model.

Do use a natural key when it makes sense. Most common - something like this:

create table documents( doc_id number primary key, ... )

(hey, you must be thinking, wait a minute, he is tricking me, that is a surrogate, sure that is, but....)

create table document_version( doc_id references documents, version number, ..., primary key(doc_id, version ) );

document version has a natural key.



Do spend time thinking, period. Otherwise you'll end up with 'junk'.

http://databases.about.com/cs/administration/g/primarykey.htm


A primary key is NOT used for maintaining relationships exclusively.
4 stars Surrogates usually win but...   March 22, 2011 - 10am Central time zone
Reviewer: Galen Boyer from Boston
I've heard the rule, "Do not expose your primary keys" to users.
Hm..., that would mean you can't have a natural key as the PK, so if
I'm in a shop and the main guy there is spouting that, then I guess
surrogates win.

I very much have a like/dislike relationship with Natural keys.  The
like is because the queries against children tables do not have to
join back to the parent with the natural key, they can just ask the
table directly.  With natural keys as AKs, then all joins involve
driving through the model to get to the table that has the AK tied to
the PK that's on the child.

My dislike is when the number of attributes for the PK get numerous,
maybe above 3.  But, that's just my own sort of dislike.

My like for surrogate keys are truly because the object oriented world
has shown up.  In supporting an object oriented system, a single key
does simplify the world for everybody.  Objects in the app space have
the sense of a single id, getting that from the database makes the tie
exact.  Yeah, I know you will continue to ughhhh, but, hey, surrogates
work as PKs.  Even when the object oriented system where to go away,
the design won't be wrong, so one isn't wrong in designing it that way
to begin with.  When the surrogate world is part of the design then I
only have one rule, there must be a NOT NULL attribute identified as
the alternate key.  My beef with that design is the arrogance I find
in the documentation for HIBERNATE.  Their documentation states it
with the following flavor, "Surrogate keys are the correct way to
model databases".  Huh???  I model surrogates because of the inherent
weak assumption of the object model, not because its the exact correct
way to do it!!!  Those arrogant java architects were probably the same
bozos that came up with the ridiculous idea of entity beans.  Yeah,
lets do all the caching of rows on the app side, heck, lets have
multiple objects cached on the app all representing the same row, all
because somebody said trips to the database were expensive.  Sure,
that's fact!!!  We all know it, we've all seen the numbers to support
it, you know those well publicized numbers showing how trips to the
database cost the app way too much time.  Oh wait a minute, its the
myth from the java world.  Must be akin to the new "database
ignorance" model of app development.  Okay, rant over...

If you are building a datawarehouse where your dimensions aren't even
close to 3rd normal, your are forced into surrogates.  But, what you
will find is the natural key is how you use access the table.  Some
fact will come in from the source tied to the "natural key" with some
date and your code will look up in the dimension with the natural key
where the source date is between the start/end dates and vhalla, you
get the dimensions key for the fact.  So, in the Kimball datawarehouse
world, always use surrogates for PKs.


Followup   March 22, 2011 - 11am Central time zone:

I've heard the rule, "Do not expose your primary keys" to users.
Hm..., that would mean you can't have a natural key as the PK, so if
I'm in a shop and the main guy there is spouting that, then I guess
surrogates win.


that is how is usually goes - the main persons belief.

My dislike is when the number of attributes for the PK get numerous,
maybe above 3. But, that's just my own sort of dislike.


I tend to agree with that.

My beef with that design is the arrogance I find
in the documentation for HIBERNATE. Their documentation states it
with the following flavor, "Surrogate keys are the correct way to
model databases".


I concur with your opinion on that - definitely.


If you are building a datawarehouse where your dimensions aren't even
close to 3rd normal, your are forced into surrogates.


I also see it way over done there too. Which would you prefer:

create table time_dim
( id number primary key,
dt date, /* stored without time, just dd-mon-yyyy data */
month date /*+ trunc-ed down to 01-mon-yyyy */
qtr_year,
fyqtr_year,
fy,
year
);

create table fact_table
( my_key,
sales_date number references time_dim,
that_date number references time_dim,
the_other_date number references time_dim,
...)


or


create table time_dim
( dt date primary key, /* stored without time, just dd-mon-yyyy data */
month date /*+ trunc-ed down to 01-mon-yyyy */
qtr_year,
fyqtr_year,
fy,
year
);

create table fact_table
( my_key,
sales_date date references time_dim,
that_date date references time_dim,
the_other_date date references time_dim,
...)




I know which one I'd much prefer and it isn't the first one.
5 stars Is it immutable?   March 22, 2011 - 11am Central time zone
Reviewer: S from Canada
Remember Julian to Gregorian calendar change :))


Followup   March 22, 2011 - 12pm Central time zone:

;)
4 stars Yeah, I know on the dates dim   March 22, 2011 - 12pm Central time zone
Reviewer: Galen Boyer from Boston
Hi Tom,

I hear your words clashing with Kimballs on the date dimension
everytime I use it.  On that one I went with Kimball.  To me there is
one fundamental reason a date is wrong as a key and it is that it
really does violate the "atomic" principle.  A date is stored with the
expectation that the stored attribute will be parsed looking for
months, or days, or years or quarters, so, its like the end all be-all
of intelligent keys, and we know the rule on intelligence in a key.
So, in my theoretical rationalization, I've joined Kimball.  But, not
having to join to the date dimension when I'm researching things would
be alot easier, which, is when I hear your words loud and clear.  :-)
And, don't get me wrong, I still have dates all over my models, but,
in the dimensional world, I've sided with Kimball.

I guess, the "ease" or correctness of the design is found when end
users are accessing the data.  Doing "date logic" is much less easier
to them than doing math, ie, the subtracting the julian number.
That's sometimes true based on what user, and one could also say,
well, let them do the join then!  But, Kimball makes a very loud point
explicitly stating do not use dates on the fact, implicitly saying,
"Force the join".

But, I hear ya.  I mention this clash many times with my group.  I
say, "Tom Kyte" would take me out to the woodshed on this, but Kimball
point blanks says don't use a date.  Hem, hah, flip a coin, make a
call...


Followup   March 22, 2011 - 1pm Central time zone:

A date is stored with the
expectation that the stored attribute will be parsed looking for
months, or days, or years or quarters, so, its like the end all be-all
of intelligent keys, and we know the rule on intelligence in a key.


but that is why I built my time dimension with month, year, quarter, fy-quarter etc in it...

I don't like forcing the join when the join isn't necessary myself.

I wouldn't take you out to the woodshed - I've seen it done both ways, and regrets have come from both sides of the coin.



All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement