Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Ashok.

Asked: March 11, 2008 - 2:57 am UTC

Last updated: July 30, 2012 - 9:24 am UTC

Version: 10g R2

Viewed 1000+ times

You Asked

Dear Tom,

Need your advice on determining the Primary Key. There is a table which has got a natural key of data type CHAR(10) which will always be Unique. There are other dependent tables which will refer to this Primary Key. Will it be good to have a Surrogate key as a Primary Key of datatype NUMBER instead of having Natural Key as Primary Key of Datatype CHAR(10).


and Tom said...

there is nothing wrong with using a natural key, especially a single attribute one.

but - suggestion - never ever never use char, only use varchar2.


Rating

  (20 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Primary Keys

John Flack, March 12, 2008 - 8:56 am UTC

I agree that there is nothing wrong with using a natural key as primary. However, primary keys should be immutable, i.e. they should never change, ESPECIALLY if they are used as foreign keys in other tables. So if your natural key is subject to change, I advise you to make it an unique key, but not primary, and go ahead and add a surrogate primary key generated from a sequence.

Brian Tkatch, March 12, 2008 - 11:11 am UTC

"never ever never use char, only use varchar2."

Would you say that even when using just CHAR(1)?
What if the field is "guaranteed" to use the full width?

Tom Kyte
March 12, 2008 - 5:40 pm UTC

yes, I would say that for varchar2(1)

why?

because you'll not be tempted to use a char. If you use it once, you might use it twice, so just DO NOT USE IT

A char is a varchar2 that is blank padded to the full width. So, if it is always the full width the char and varchar2 are identical - IDENTICAL - and since using CHAR is a really bad idea in general, best just to NEVER USE IT - so you know, you are not tempted to use it again.

Tom's take on CHAR vs VARCHAR2

Duke Ganote, March 12, 2008 - 12:49 pm UTC

Identifying a Primary Key

Ashok, March 13, 2008 - 2:10 am UTC

Thanks a lot for the guidance. What would be your advice on following scenario.
The format of the NATURAL Key (Which will be a Primary Key for the table) would be AAMMYY9999 for making it meaning ful to the end users
where AA is Alphabets
MM is the month
YY is the Year
9999 is the number generated from running sequence

In this case would the decision of having Natural Key as Primary Key would stand logical in terms of Performance aspect or Should a Surrogate Key be Introduced as Primary Key performance wise.

Please suggest.

Thanks
Tom Kyte
March 13, 2008 - 8:45 am UTC

AAMMYY9999 - ugh, that is so bad. storing a date in a string, storing two attributes in one :( so bad, so sad.


this has nothing to do with performance, a number in Oracle is a varying width pack decimal like thing - it will not be significantly smaller, in fact a surrogate would likely have a negative impact as it would

a) make the entire data set that much larger
b) take time to generate
c) take time to look up and propogate down to the child tables
d) not be able to be converted immediately without a lookup into the original data that contains a date hidden in a string..

AAMMYY9999 vs surrogate discussion

Duke Ganote, March 13, 2008 - 7:06 am UTC

Something to consider

Andrey, March 13, 2008 - 1:01 pm UTC

Tom,

In my experience, I've found that if you have a complex enterprise application, it's better to avoid using natural keys as primary keys. This is because natural keys have real-world meaning, they stand for something, and users will eventually want to be able to change their value. In this example, are we 100% absolutely positive the date in the record's PK will never have to be adjusted?

Also, coming from the Java world, we try to avoid using anything but sequence-generated PKs because we use a popular object-relational mapping tool called Hibernate. It works infinitely better with sequence PKs, because it knows an object you're saving is "new" if and only if the PK is null.

Just my 2c.
Tom Kyte
March 15, 2008 - 8:57 am UTC

something else to consider:

create table document( surrogate number primary key, .... );


create table document_version( surrogate, version, ...., foreign key(surrogate,versoin) );


I have found that many times natural keys "just fall out, exist, are ok, are there, should be used" as well.




...
Also, coming from the Java world, we try to avoid using anything but
sequence-generated PKs because we use a popular object-relational mapping tool
called Hibernate. It works infinitely better with sequence PKs, because it
knows an object you're saving is "new" if and only if the PK is null.
...

oh swell, do everything for the ease and convenience of a language.


...
In this example, are we 100%
absolutely positive the date in the record's PK will never have to be adjusted?
.....

if you cannot answer that question, you haven't finished your design, haven't done the analysis perhaps....


Not that I like their 'unnaturally natural' key above - aayyyymmdd - yuck, as bad as doing something for the ease and convenience of a language and a framework that won't exist in 5 years time.

Thanx!

Brian Tkatch, March 13, 2008 - 1:48 pm UTC

Tom, thank you for responding. I was wondering about that, and what was the reason. Even when the reason is don't get used to it! :)

@Duke Ganote

I did a search and found the article, but i did not see the comments that specifically referred to CHAR and VARCHAR2(1). Thanx for pointing it out.

I love AskTom!

aaddmmyy-ssss as record primary key

Amir Riaz, March 17, 2008 - 2:50 am UTC


hi tom

in most of databases i have designed i found that the client has some sort of requirement in one or two place to identify a record with keys like AAMMYY-9999

where
AA is Alphabets
MM is the month
YY is the Year
9999 is the number generated from running sequence

and
sadly its been a common practice which sadly i also haven been practicing. but i still not manage to understand what problem it causes because if i use three attributes for it i still have to generate date and sequence

so there should not be any difference between

create table surprised(seq integer, pdate date, symbol varchar2(2) /*some other columns here*/);

alter table surprised
ADD CONSTRAINT looks_strange_pk PRIMARY KEY(seq ,pdate,symbol);

insert into surprised values(seq.nextval, sysdate, 'CS');

and

create table i_do(pkey varchar2(60) primary key, /* other keys*/);

insert into i_do values ('CS'||sysdate||seq.nextval);

Since the primary key does not changes i think it should not make any difference. If yes please make it clear with some example
Tom Kyte
March 24, 2008 - 8:42 am UTC

Well, actually, all you need is:

create table surprised(seq integer, pdate date, symbol varchar2(2) /*some other columns here*/);

alter table surprised
ADD CONSTRAINT looks_strange_pk PRIMARY KEY(seq);

period - pdate and symbol are what as known as functionally dependent data - they are IMPLIED by the value of SEQ.

You have what is known as a surrogate key, it only needs be unique. SEQ is already unique. The addition of pdate and symbol to the primary key is not useful.



If the users can see it, they WILL want to change it

Dan, March 19, 2008 - 7:53 am UTC

In my experience, if you allow the users to see the key, they WILL eventually want to change it, despite having "complete" requirements, now. (Requirements change...)

Perhaps in this example, the components of the key are generated, but consider the corner cases

... time--> duplicates in the US when daylight savings time flips...

...in your case you have a sequence going to 9999. What happens if you get that many in one day?

...I have also seen cases where things that cannot happen in real life "happen' in the system. A company can't move from Canada to the US, but some minimum-wage data entry person can type in MB (Manitoba) instead of MN (Minnesota). As far as the database is concerned, the company just moved from Canada to the US.


Unless there is a real physical reason (and enforced by edit checks) that makes it unique, I usually make intelligent keys become unique keys, and create a surrogate sequence-based primary key.


Tom Kyte
March 24, 2008 - 9:41 am UTC

In short, end users should not have a say in the physical schema design - period, never, not ever.

they can specify a requirement (which we should then freely debate and not take as a "rule", but rather as a request - after explaining the COST of such a request) - but not a physical schema.

Fail to identify Primary Key

Arindam Mukherjee, October 06, 2009 - 5:27 am UTC

Respected Mr. Tom,

At the outset, I am sorry to say something that may be against the current. In Design phase, we get stuck in identifying Primary key of client transaction table. The table structure looks like below. The bold columns have been identified as a composite primary key in Oracle 10g RAC.

Client Transaction Table

Clinent ID
VOUCHER_NO
TRN_NO
SRL_NO


GL_ID
TRN_DATE
FINANCIAL_YEAR
COMPANY_ID
REGION_ID
BRANCH_ID
BILL_TYPE
DR_AMOUNT
CR_AMOUNT
TAX_PERCENTAGE

At the time of transaction, it is okay but at the year end, the outstanding amount, if any, for every client or customer will be carried forwarded to the same table and the existing data will be removed.

During the year end process, we can easily populate data to all columns expect voucher_no, Trn_no and Srl_no because consolidated amount either Debit or Credit will be brought forwarded from the previous year transaction history. At this stage, predefined primary key is violated.

Other team members are suggesting now to introduce surrogate key (Sequence Number) as primary key but I could not find any value of surrogate key in transaction table as this key will not be used in generating report or SQL query on Sub Ledger wise payment or outstanding etc.

I request you to suggest how to overcome this problem. If you think, table design should be changed, please suggest. We are desperate in need of the solution.

Regards,
Tom Kyte
October 08, 2009 - 7:01 am UTC

tell me why you cannot populate voucher_no - I have no clue what voucher_no is, does, implies, is used for - nor trn/srl_no either - it is your data, you live with it year after year - to you it is common sense what this all is, to us - it is meaningless.

how would a surrogate help? you still need to retrieve the data by something OTHER THAN the surrogate right? when you go to retrieve this data - what would the predicate be there? (that'll lead you on the path to what your primary key is...)

OMG

Rick Davis, October 08, 2009 - 3:22 pm UTC

Audrey from DC posts: "In my experience, I've found that if you have a complex enterprise application, it's better to avoid using natural keys as primary keys. This is because natural keys have real-world meaning, they stand for something, and users will eventually want to be able to change their value. In this example, are we 100% absolutely positive the date in the record's PK will never have to be
adjusted?"

In an OLTP environment (does not apply for Dss/DW) the following:
One of two things going on here. First, for several very good reasons, of course a natural PK has a real-world meaning. If the meaning changes you have a new entity (table). You can't simply kidnap a natural PK when its meaning changes; you have to design a new entity (table). If a mistake is made entering a PK then simply logically delete those row(s) (parent and child, if they exist) and add the corrected natural PK and children, if required.

"Also, coming from the Java world, we try to avoid using anything but sequence-generated PKs because
we use a popular object-relational mapping tool called Hibernate. It works infinitely better with
sequence PKs, because it knows an object you're saving is "new" if and only if the PK is null."

If you are using surrogate PKs then you should also be using an Alternate Key (AK) on the column(s) of the natural key. If you do not know the natural key them you have a poor design. If you know the natural key, you don't need a surrogate. Think that all the way through. If you use a surrogate, without an AK, this can happen:

Create table table1 (SK number Primary Key, color varchar2(10);
Create sequence seq1;
Insert into table1 (seq1.nextval, 'RED');
insert into table1 (seq1.nextval, 'YELLOW');
Insert into table1 (seq1.nextval, 'RED');

Oops, which RED is the real RED, SK 1 or SK 3?
OK, to avoid that issue, we'll put an AK on color;
If we put an AK on color we no longer need the SK.



another spreadsheet database

Jim, October 09, 2009 - 9:02 am UTC

AAMMDDYYYY as a pk field. This reminds me of the spreadsheet users who merge data into one filed to make the spreadsheet thinner and easier to read. Very bad design for a database.

A reader, October 14, 2009 - 9:30 am UTC

A primary key with format AAMMYY-9999 is not a good design. What happens if we need to store more than 9999 values? I have seen this type of keys suggested by inexperienced business analysts.

Primary Key in Oracle

Alok Kumar Singh, February 23, 2010 - 8:05 am UTC

Hi Tom,

Why does Oracle not allow to create more than one primary key on a table? What was the thought behind this restriction?
Would that create a confusion in the execution plan?
Tom Kyte
March 01, 2010 - 8:17 am UTC

because the rules are "there is just one primary key on a table". It is just sort of a premise of relational databases.

... Would that create a confusion in the execution plan? ...

not really sure what you are concerned about? If you have surrogate keys, you would obviously define them as UNIQUE and if applicable "NOT NULL" which provided the optimizer all of the relevant information.

One PK per Table

Rick Davis, February 24, 2010 - 8:47 am UTC

Oracle defines a PK as: "The column or set of columns that uniquely identifies a row in a table. Only one primary key can be defined for each table."

A table usually has a column or combination of columns whose values uniquely identify each row in the table. This column (or columns) is called the primary key (sometimes called the natural key) of the table and enforces the entity integrity of the table. A table can have only one PRIMARY KEY constraint, and a column that participates in the PRIMARY KEY constraint cannot accept null values.

You might be thinking of an Alternate Key (AK), these are usually used when the PK is a sequence and the natural key column or columns have an AK built on them.

Primary keys are for building relationships

Dominic, March 22, 2011 - 5:33 am UTC

While it is true that every table should have a definition to ensure that every row is unique, I do not believe that it is the job of the primary key to do that.

A primary key is used for maintaining relationships between entities and should be as narrow as possible (e.g. a numeric field). Unique indexes can be used to define uniqueness on a row.

I am currently working on a database that uses natural and compound natural keys. Regardless of the data wanting to change, all relationships are horribly inefficient and needlessly complex to code against.

I think creating tables should be simple and I always follow two rules:

1. Always use a surrogate, numeric PK
2. Nearly always use a unique index to define uniqueness (exceptions may be log tables and such, where duplicate data may not mean that the data is invalid).

And so, to the original question:

Do not spend time thinking about what the PK should be, always create a Numeric surrogate. Do spend time on thinking about what makes the rest of the data unique, and define that with unique indexes.
Tom Kyte
March 22, 2011 - 8:37 am UTC

You would be - in my humble opinion - wrong.


Do spend time thinking.
Do spend time thinking about your data model.

Do use a natural key when it makes sense. Most common - something like this:

create table documents( doc_id number primary key, ... )

(hey, you must be thinking, wait a minute, he is tricking me, that is a surrogate, sure that is, but....)

create table document_version( doc_id references documents, version number, ..., primary key(doc_id, version ) );

document version has a natural key.



Do spend time thinking, period. Otherwise you'll end up with 'junk'.

http://databases.about.com/cs/administration/g/primarykey.htm

A primary key is NOT used for maintaining relationships exclusively.

Surrogates usually win but...

Galen Boyer, March 22, 2011 - 10:33 am UTC

I've heard the rule, "Do not expose your primary keys" to users.
Hm..., that would mean you can't have a natural key as the PK, so if
I'm in a shop and the main guy there is spouting that, then I guess
surrogates win.

I very much have a like/dislike relationship with Natural keys. The
like is because the queries against children tables do not have to
join back to the parent with the natural key, they can just ask the
table directly. With natural keys as AKs, then all joins involve
driving through the model to get to the table that has the AK tied to
the PK that's on the child.

My dislike is when the number of attributes for the PK get numerous,
maybe above 3. But, that's just my own sort of dislike.

My like for surrogate keys are truly because the object oriented world
has shown up. In supporting an object oriented system, a single key
does simplify the world for everybody. Objects in the app space have
the sense of a single id, getting that from the database makes the tie
exact. Yeah, I know you will continue to ughhhh, but, hey, surrogates
work as PKs. Even when the object oriented system where to go away,
the design won't be wrong, so one isn't wrong in designing it that way
to begin with. When the surrogate world is part of the design then I
only have one rule, there must be a NOT NULL attribute identified as
the alternate key. My beef with that design is the arrogance I find
in the documentation for HIBERNATE. Their documentation states it
with the following flavor, "Surrogate keys are the correct way to
model databases". Huh??? I model surrogates because of the inherent
weak assumption of the object model, not because its the exact correct
way to do it!!! Those arrogant java architects were probably the same
bozos that came up with the ridiculous idea of entity beans. Yeah,
lets do all the caching of rows on the app side, heck, lets have
multiple objects cached on the app all representing the same row, all
because somebody said trips to the database were expensive. Sure,
that's fact!!! We all know it, we've all seen the numbers to support
it, you know those well publicized numbers showing how trips to the
database cost the app way too much time. Oh wait a minute, its the
myth from the java world. Must be akin to the new "database
ignorance" model of app development. Okay, rant over...

If you are building a datawarehouse where your dimensions aren't even
close to 3rd normal, your are forced into surrogates. But, what you
will find is the natural key is how you use access the table. Some
fact will come in from the source tied to the "natural key" with some
date and your code will look up in the dimension with the natural key
where the source date is between the start/end dates and vhalla, you
get the dimensions key for the fact. So, in the Kimball datawarehouse
world, always use surrogates for PKs.

Tom Kyte
March 22, 2011 - 11:31 am UTC

I've heard the rule, "Do not expose your primary keys" to users.
Hm..., that would mean you can't have a natural key as the PK, so if
I'm in a shop and the main guy there is spouting that, then I guess
surrogates win.


that is how is usually goes - the main persons belief.

My dislike is when the number of attributes for the PK get numerous,
maybe above 3. But, that's just my own sort of dislike.


I tend to agree with that.

My beef with that design is the arrogance I find
in the documentation for HIBERNATE. Their documentation states it
with the following flavor, "Surrogate keys are the correct way to
model databases".


I concur with your opinion on that - definitely.


If you are building a datawarehouse where your dimensions aren't even
close to 3rd normal, your are forced into surrogates.


I also see it way over done there too. Which would you prefer:

create table time_dim
( id number primary key,
dt date, /* stored without time, just dd-mon-yyyy data */
month date /*+ trunc-ed down to 01-mon-yyyy */
qtr_year,
fyqtr_year,
fy,
year
);

create table fact_table
( my_key,
sales_date number references time_dim,
that_date number references time_dim,
the_other_date number references time_dim,
...)


or


create table time_dim
( dt date primary key, /* stored without time, just dd-mon-yyyy data */
month date /*+ trunc-ed down to 01-mon-yyyy */
qtr_year,
fyqtr_year,
fy,
year
);

create table fact_table
( my_key,
sales_date date references time_dim,
that_date date references time_dim,
the_other_date date references time_dim,
...)




I know which one I'd much prefer and it isn't the first one.

Is it immutable?

S, March 22, 2011 - 11:54 am UTC

Remember Julian to Gregorian calendar change :))
Tom Kyte
March 22, 2011 - 12:00 pm UTC

;)

Yeah, I know on the dates dim

Galen Boyer, March 22, 2011 - 12:37 pm UTC

Hi Tom,

I hear your words clashing with Kimballs on the date dimension
everytime I use it. On that one I went with Kimball. To me there is
one fundamental reason a date is wrong as a key and it is that it
really does violate the "atomic" principle. A date is stored with the
expectation that the stored attribute will be parsed looking for
months, or days, or years or quarters, so, its like the end all be-all
of intelligent keys, and we know the rule on intelligence in a key.
So, in my theoretical rationalization, I've joined Kimball. But, not
having to join to the date dimension when I'm researching things would
be alot easier, which, is when I hear your words loud and clear. :-)
And, don't get me wrong, I still have dates all over my models, but,
in the dimensional world, I've sided with Kimball.

I guess, the "ease" or correctness of the design is found when end
users are accessing the data. Doing "date logic" is much less easier
to them than doing math, ie, the subtracting the julian number.
That's sometimes true based on what user, and one could also say,
well, let them do the join then! But, Kimball makes a very loud point
explicitly stating do not use dates on the fact, implicitly saying,
"Force the join".

But, I hear ya. I mention this clash many times with my group. I
say, "Tom Kyte" would take me out to the woodshed on this, but Kimball
point blanks says don't use a date. Hem, hah, flip a coin, make a
call...

Tom Kyte
March 22, 2011 - 1:54 pm UTC

A date is stored with the
expectation that the stored attribute will be parsed looking for
months, or days, or years or quarters, so, its like the end all be-all
of intelligent keys, and we know the rule on intelligence in a key.


but that is why I built my time dimension with month, year, quarter, fy-quarter etc in it...

I don't like forcing the join when the join isn't necessary myself.

I wouldn't take you out to the woodshed - I've seen it done both ways, and regrets have come from both sides of the coin.

Alternate key conditioned to primary key use

Nilton Maganha, July 24, 2012 - 9:32 pm UTC

How avoid someone to change an alternate key value (natural key) in which the primary key is already being referenced as foreign key in another table?

ex.
create table employee(
employee_id number primary key,
employee_code varchar2(5) unique
...);

create table employee_salaries(
salaries_id number primary key,
empoyee_id number references employee(employee_id)
...);

I want to avoid someone to change the employee_code if the employee_id was already used by another table (referenced)

Consider that the primary key can be used by dozen of other tables (referenced)
Tom Kyte
July 30, 2012 - 9:24 am UTC

whilst you could do this with triggers, I would suggest "you do not"

I would suggest you code your transactions in stored procedures and do not let applications (programmed by programmers - who do not necessarily understand the database or how it works) access the tables directly with updates/inserts/deletes and the like.


one approach would be to add a column to the employee table - referenced_count - and use a trigger/stored procedure to increment that counter every time a row was added that referenced it and decrements. then just make sure that the referenced count was 0 before the update began.


Or, better yet, change the data model so that employees with mutable codes are in one table and those with static are in another (they seem to be two entirely different types of employees).


another would be to

a) when updating the employee code column - you would have to select for update that row first, and after doing so, perform a count against each child table looking for children. Upon finding none, proceed with the udpate

b) when inserting into or updating the foreign key of any of the child tables - you would have to first select for update the employee row - and then you could modify the data.


that would make it so that the parent cannot be modified while any child records are being created/worked on.

and that child records cannot be created for that employee while the parent is being worked on.