Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, gvsrk.

Asked: December 03, 2002 - 11:59 am UTC

Last updated: September 10, 2012 - 6:58 pm UTC

Version: 8.07

Viewed 10K+ times! This question is

You Asked

Tom,

we have application requirement like this.the master table is Busstop having columns stopid,onstreet,crossstreet,direction.Stopid is primary key.Stopid Automatically genarating in application (from GUI)based on select of the onstreet,corssstreet,direction.
This stopid referenced in another 4 tables for stop details.The application is brand new and we are doing data loading and maintenance.If the stopid is wrong, i need to delete cascade or disable constraints and update( Personally i do not like) or create correct id and assosiate dependent records to correct Id and delete wrong ID.
It is not on going process.Once the Busstops are colected and corrected from then it is kind of maintenance.
We have debate in our group to put a sequence as primary key for this table and reference rather than stopid to make the update easy.
I am not inclind to this Idea to change the whole meaning of PK with out nothing.
This is not a composite PK and not referenced in too many tables and the updates are not on regular basis.,Then why should i create a surrogate key ?

and we said...

If the primary key has a chance -- however remote -- of changing, well, then it is NOT the primary key.

This would be like you changing your DNA halfway through your life.

You are not changing the concept of a primary key here -- your current set of attributes just don't match what a primary key is.

The autogenerated "key" is something needs to be unique, but it is not the primary key.

I would use a surrogate key here -- I almost always use surrogates personally.

Rating

  (31 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

What is a surrogate key ?

jwc, December 03, 2002 - 1:28 pm UTC

I have heard this term in a lot of places.

what is a surrogate key

Tom Kyte
December 03, 2002 - 1:40 pm UTC

Main Entry: 1sur·ro·gate

Pronunciation: 's&r-&-"gAt, 's&-r&-

Function: transitive verb

Inflected Form(s): -gat·ed; -gat·ing

Etymology: Latin surrogatus, past participle of surrogare to choose in place of another, substitute, from sub- + rogare to ask -- more at RIGHT
Date: 1533
: to put in the place of another: a : to appoint as successor, deputy, or substitute for oneself b : SUBSTITUTE


it is a substitute primary key -- for when you don't have a real one, as above.

Don't joke

Martin, December 04, 2002 - 3:32 am UTC

"This would be like you changing your DNA halfway through your life."

The way technology is going... we're not far off making this possible! ;-)


Sagi, December 04, 2002 - 10:09 am UTC

Hi Tom,

You said that Surrogate Key is a substitute for Primary key.

Do you mean to say that we have a column that is populated with SEQUENCE automatically, which is called Surrogate Key (Technical Term)?

I would appreciate if you could through some more detail on this....please.

Thanx in advance.

Regards,
Sagi

Tom Kyte
December 04, 2002 - 12:38 pm UTC

an ID column populated via a sequence is known as a surrogate key.

if the table has a true primary key -- it is known as a "natural key"

Performance of surrogate vs natural

Reader, January 24, 2003 - 9:24 am UTC

We have some consultants here now who are saying that "good warehouse design practice is to use a 4-byte surrogate value instead of a business key value for keys." They are saying that query performance is compromised when joins are performed on the longer business key fields, and that overall performance is affected by the resulting longer fact tables since more database pages must be read to process a result set. Your comments, please.

Tom Kyte
January 24, 2003 - 11:03 am UTC

ask them what a 4 byte surrogate value is in Oracle please.

I'd be REALLY interested to here their response. Then, when they say "integer", we can have a laugh as we show them the door....




Re: Performance of surrogate vs natural

Reader, January 24, 2003 - 11:23 am UTC

Yes, I had a laugh at that, too. But are the other points valid, presuming the use of a sequence as the surrogate?

Tom Kyte
January 24, 2003 - 12:49 pm UTC

not really but a little.

An index on N bytes will be smaller then an index on (N+M) bytes of data.

So, if you have a compound key that is on average 25 bytes and can replace it with a number (which varies in length from 0..22 bytes and whose avg would be smaller then 25 bytes by alot), then

o you cache more index blocks
o you compare smaller strings


It is a very common practice to use surrogate keys, yes. Sort of funny they would want to go with a 4 byte integer (in a database that has no such thing) since that would limit them to a set of 4billion rows which in a data warehouse is pushing it. Especially with incremental loads and stuff. Do you really want to be limited to

a) 4 billion
b) having to play "keeper of the keys" cause you cannot just use an increasing number -- you'll have to reuse values with such a small namespace.

Surrogate keys...

Kashif, July 07, 2003 - 4:57 pm UTC

Hi Tom,

Since performance becomes the de facto standard in many cases for the success of a warehouse, why would someone NOT use surrogate keys? Secondly, are surrogate keys the same as "factless" keys?

Thanks.

Kashif

Tom Kyte
July 07, 2003 - 5:11 pm UTC

generation of surrogate keys and mapping them back and forth during a load will not make it faster to load.

generation of surrogate keys won't necessarily make queries any faster either -- if the natural key is not "huge"

I'd have to see the text book definition of "factless" key.

SKs: Is not only about performance

Juanje, July 08, 2003 - 5:01 am UTC

Hi,

well, the use of SKs in Datawarehouse environments is not only simplicity of star design or performance. You often need to populate your dimensions with extra records not found in the source table, such as "UNKNOWN" or "NOT APPLIES" records. If you don't use SKs, What PK value would you use for this extra records? How can you be sure that the values you choose won't collide with future records from the source table? Also, taking your dimensional schemas away from possible source key changes is a good idea. Of course creating and maintaining SKs adds complexity and is extra work for ETL processes.



Tom Kyte
July 08, 2003 - 7:39 am UTC

I think we are in agreement

One Individual Sequence for each Surrogate Key ?

Arash, June 08, 2004 - 10:46 am UTC

Hi Tom,

I strongly believe in using surrogate key as PK in OLTP application. I have a question regarding the best way we can generate this auto number key? In my last project, we didn't use sequence because we didn't want have a gap! So we had Before Insert trigger which select max(id)+1 from Table.

Now, for my new project, gap is not important and I thought it's better that I use sequence.

1- Is it better to have an individual sequence for each table? or
- Just a sequence for all tables or
- One sequence for all small tables and an individual sequence for each big table.

2- Do you agree with having just one user as the owner of all objects?

Please let me know your valuable idea.

As always, thank you for your help.

-Arash


Tom Kyte
June 08, 2004 - 11:58 am UTC

gap free sequences = one person at a time does a transaction in your system.

that is the only way to do it. ONE PERSON AT A TIME.

ONE TRANSACTION.

no concurrency.


It is neither scalable, nor have I seen an actual case where it is necessary.


It is infinitely better to use a sequence. (i hope you had the proper manual locking in place, else your system is either having duplicate keys or raising ora-1 lots of times)

1) I would have a sequence per object.

2) depends. are all of the objects logically belonging together?




Thank you so much

A reader, June 08, 2004 - 12:06 pm UTC


gap free sequences

Matthias Rogel, June 08, 2004 - 12:06 pm UTC

alter sequence my_surr_key_seq nocache;

Tom Kyte
June 08, 2004 - 1:21 pm UTC

can you spell "rollback"
or "system crash"

now what? welcome to "gap city"




;-)

Matthias Rogel, June 09, 2004 - 3:13 am UTC


Designing a table load with a surrogate key

Jon, August 30, 2005 - 6:43 pm UTC

Tom,
Would you please advise me on the use of surrogate keys in the following situation.

I am designing an MIS data warehouse where a fact table is holding data aggregated by month. A dimension table is holding information that must be historically accurate.

For example:
If Company 'XYZ' has a sales rep named 'BOB' in Jan/2005
Then the sales rep for 'XYZ' changed to 'MARK' in Feb/2005 onward

They need to be able to produce a report that shows:

RPT_DATE SALES REP AMOUNT SOLD
JAN/2005 BOB 100,000
FEB/2005 MARK 200,000
MAR/2005 MARK 200,000
...
...


The design approach that I proposed for this is to use an ordered sequence as the surrogate key on the dimension table that would change whenever the attributes changed.

Upon loading the fact table the ETL job would have to select max(dim_id) for their natural key from the dimension table.

Upon loading the dimension table the ETL job would get a new sequence whenever an attribute set does not exist in the dimension table.

I have been browsing through a few DW books and have not come up anything that either validate this approach or provides a better solution. Would you please tell me if this is the best design for this situation.

Regards,
Jon


Tom Kyte
August 31, 2005 - 12:50 pm UTC

</code> http://www.google.com/search?q=%22slowly+changing+dimensions%22&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official <code>

there are various schools of thought on slowly changing dimension, you'll want to read around a bit

Designing a table load with a surrogate key followup

Jon, August 31, 2005 - 12:02 pm UTC

Tom,
Here is the DDL for my question above.
Thanks,
Jon

>CREATE TABLE org_se_dim (
dim_org_ae_id number(7) NOT NULL,
org_id number(7) NOT NULL,
ae_account_id number(7) NOT NULL,
ae_first_name varchar(20) NOT NULL,
ae_last_name varchar(30) NOT NULL)

CREATE UNIQUE INDEX pk1_org_se_dim ON org_se_dim (dim_org_ae_id)
LOGGING
NOPARALLEL;

ALTER TABLE org_se_dim ADD (
CONSTRAINT pk1_org_se_dim PRIMARY KEY (dim_org_ae_id));


CREATE SEQUENCE seq_dim_org_ae_id INCREMENT BY 1 START WITH 1 MINVALUE 1 NOCYCLE NOCACHE ORDER

CREATE TABLE org_sales (
org_id number(7) NOT NULL,
dim_org_ae_id number(7) NOT NULL,
rpt_date date NOT NULL,
sales_amt number(10,2) NOT NULL)

CREATE UNIQUE INDEX pk1_org_sales ON org_sales
(org_id, dim_org_ae_id, rpt_date)
LOGGING
NOPARALLEL;

ALTER TABLE org_sales ADD (
CONSTRAINT pk1_org_sales PRIMARY KEY (org_id, dim_org_ae_id, rpt_date));


ALTER TABLE org_sales ADD (
CONSTRAINT ref1_org_sales FOREIGN KEY (dim_org_ae_id)
REFERENCES org_se_dim (dim_org_ae_id));



Sequences usages

Thirumaran, November 25, 2005 - 12:16 am UTC

HiTom,

I have certain queries for your response regarding sequences. Below #1&#2 are your answers (a),(b) are my question for these points

#1)generation of surrogate keys won't necessarily make queries any faster either -- if the natural key is not "huge"

a) can you please eloborate #1
b) if the natural key is not "huge" ? what is the limitations ?


#2) I'd have to see the text book definition of "factless" key.
a) w.r.t #2 i found a website using the same definition "factless key"

</code> http://www.dfpug.de/konf/konf_1998/07_c_s/e_back/e_back.htm <code>

Below Figure 2. The Sample data
"Our factless primary key here"

what is a factless key ?

Thanks in ADV
Thirumaran

Tom Kyte
November 25, 2005 - 10:05 am UTC

#1) means that

where t1.surrogate_key = t2.surrogate_key


isn't necessarily faster than

where t1.natural_key = t2.natural_key


we are just comparing bytes.

there are no "limitations", just math. If you compare a couple of bytes - it doesn't matter if the bytes represented a date, a number or a string.

#2)
scary paper there isn't it. seems to be saying "do all of this work and you'll be database independent".

like by generating your own sequences from a table and so on....

but, they didn't have the definition there either - I was asking myself for the textbook definition.


My tsql is a little rusty, but isn't anyone else afraid of the following routine for generating sequences:




CREATE PROCEDURE sp_getnewid
@tcEntity char(12)
AS
begin
/* create a simple cursor containing the new id: */
select ID from uniqueid
where uniqueid.ENTITY = @tcEntity

/* increment the field for the next time: */
update uniqueid
set ID = (ID+1)
where ENTITY = @tcEntity

/* return success: */
return ( 1 )
end


ouch, works well - for a SINGLE USER. Hate to see what happens with more than one user..... Hmmm. interesting concurrency control problems there. Wondering what happens when the duplicates start being generated....

Surrogate keys

AD, December 12, 2005 - 8:55 pm UTC

Tom,

Your advise on the following is very much appreciated.

I'll be creating a set of extract files from Oracle database that will be fed to Mainframe. As part of this requirement, every extract file should have a header, body and a trailer. I am thinking of creating separate header /trailer tables for each of these extract files, so that when I create the file I can query these tables along with the tables that are the data source for the body. Having separate tables for header/trailer is a good idea or just a waste? If not, since these tables will have only one record each, do we need to create any surrogate keys, if there are no obvious natural keys available?

Many thanks for your time.

Regards

Tom Kyte
December 13, 2005 - 8:29 am UTC

what are to be the contents of these "header" and "trailer" thingys?

contents of header/trailer

AD, December 13, 2005 - 1:05 pm UTC

Tom,

Thanks for your reply.
The header will have info like record_type, file_id, file sequence_no, file creation time etc. Whereas trailer will have record_type, file_id, record counts. Header and trailer each will be a single record.

Regards



Tom Kyte
December 13, 2005 - 1:27 pm UTC

I'd probably just put that into extraction program itself (not knowing what a file_id, file_sequence_no is of course...)

composite PK everywhere

A reader, June 19, 2006 - 12:34 pm UTC

Hi Tom

We have a new coming reporting application (still in design state) which are full of composite primary keys, from one column to 5 columns.

The data loading strategy is as follows:

- Parent table refreshed from 3 sources using Read Only Materialized Views
- To load child table, unload the data from 3 sources to flat file and load to the destination using SQL*Loader

We are facing a problem here, to avoid locking issues when refreshing parent tables we index the foreign keys columns as well, for most of child tables we ending up having indexes like this

primary key
------------
col1, col2, col3, col4, col5

FK index to parent A
------------
col1, col2, col4

FK index to parent B
------------
col1, col2, col5

We have columns which ends up indexed as many as 5 times! Do you think we should use surrogate keys in this situation?

Sheilla


Tom Kyte
June 19, 2006 - 6:45 pm UTC

how would surrogate keys cut down on the aggregate number of indexes.

If you have natural keys of a reasonable size, I see no problem using them.

composite PK everywhere

A reader, June 20, 2006 - 3:17 am UTC

Hi

The problem is we have columns which are indexed at least 2 times and at most 5 times (average 3 times). Also there are tables with 10 columns composite Primary Key!

Tom Kyte
June 20, 2006 - 9:45 am UTC

why is that a problem, as compared to have N indexes.

You would either have:

a) N indexes
b) N indexes


N would not change right? so as long as the natural keys are reasonable, there is nothing major league wrong with using them.

the problem is.......

A reader, June 20, 2006 - 11:03 am UTC



T1 is parent_1 -> PK make up of c1, c2, c3
T2 is parent_1 -> PK make up of c1, c2, c4
T3 is parent_1 -> PK make up of c1, c2, c5

T2 is child of T1, T2, and T3 so to avoid locking we indexed FKs

T2 -> c1, c2, c3, c4, c5, c6

index for FK to T1 -> c1, c2, c3
index for FK to T2 -> c1, c2, c4
index for FK to T3 -> c1, c2, c5

That is the problem I was saying. Or not really a problem as you say! (When the CBO can choose the proper index)

Tom Kyte
June 21, 2006 - 9:03 am UTC

The CBO would choose the index, yes.

You have either

a) N indexes or....
b) N indexes

You have N indexes in both cases. As I've said, as long as your natural keys are reasonable in size, there is nothing wrong in using them.

the problem is.......

A reader, June 20, 2006 - 11:06 am UTC

Make a pardon, some data were wrong, here is the correct

T1 is parent_1 -> PK make up of c1, c2, c3
T2 is parent_2 -> PK make up of c1, c2, c4
T3 is parent_3 -> PK make up of c1, c2, c5

T4 is child of T1, T2, and T3 so to avoid locking we indexed FKs

T4 -> c1, c2, c3, c4, c5, c6

index for FK to T1 -> c1, c2, c3
index for FK to T2 -> c1, c2, c4
index for FK to T3 -> c1, c2, c5

That is the problem I was saying. Or not really a problem as you say! (When the CBO can choose the proper index)


Sorry for spamming

Good

Vinod, June 20, 2006 - 1:49 pm UTC

"This would be like you changing your DNA halfway through your life."

There couldn't have been a better line to explain the concept.. Great!!!


Intelligent surrogate key

Yogesh, October 26, 2006 - 3:30 am UTC

Hi Tom,

We are somewhat facing similar situation you have described in the beginning. For our date dimension shall we go for intelligent surrogate key or dumb surrogate key , thats what I am trying to figure out.

I did a test run on sample tables one with intelligent surrogate key where date key is 20060101,20060102,20060103 etc. Another table with dumb surrogate key for date key as 1,2,3 etc. 
For all the operations like joins , sorts , indexing the cost of queries with date key 20060101 etc was higer(not much). Also the bytes processed were also on higher side. Considering that the fact tables typically have millions of records, isnt it better from performance point to have dumb surrogate keys ? 

Both the tables in the following test have identical number of rows and other physical structures and also both are analyzed.

/********** Sample output start *********/

SQL> select * from date_s;

      COL1       COL2
---------- ----------
         1          2
         2          3
         3          4
         4          5
         5          6
         6          7
         7          8
         8          9
         9         10
        10         11
        11         12
....
SQL> select * from date_i;


      COL1       COL2
---------- ----------
  20061027          1
  20061028          2
  20061029          3
  20061030          4
  20061031          5
  20061101          6
  20061102          7
  20061103          8
  20061104          9
  20061105         10
  20061106         11
.....

SQL> select count(*) from date_s;


  COUNT(*)
----------
    498232

SQL> select count(*) from date_i;


  COUNT(*)
----------
    498232

SQL> set autotrace traCEONLY EXPLAIN;

SQL> select * from date_s order by col1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2052 Card=498232 B
          ytes=3985856)

   1    0   SORT (ORDER BY) (Cost=2052 Card=498232 Bytes=3985856)
   2    1     TABLE ACCESS (FULL) OF 'DATE_S' (TABLE) (Cost=235 Card=4
          98232 Bytes=3985856)




SQL> select * from date_i order  by col1 ;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2163 Card=498232 B
          ytes=4484088)

   1    0   SORT (ORDER BY) (Cost=2163 Card=498232 Bytes=4484088)
   2    1     TABLE ACCESS (FULL) OF 'DATE_I' (TABLE) (Cost=253 Card=4
          98232 Bytes=4484088)

/********* End of sample output*********/

I deeply appreciate your help and just love your site. 

Thanks 
Yogesh 

Tom Kyte
October 26, 2006 - 11:54 am UTC

think about it - with the "intelligent" (i am not at all fond of that naming convention, it conveys an opinion there in the name itself - why are they "intelligent". They are NATURAL KEYS) will allow you to skip a join in many cases! WINNER.

I do not get the relevance of your test at all. So what, you compared a table with short numbers versus a table with wide numbers. Are you saying that in REAL LIFE, you would not be retrieving the date in both cases anyway (your test case doesn't seem to be showing us anything of relevance regarding the real work you would do in real life)

surrogate Key

Yogesh, October 27, 2006 - 1:57 am UTC

Always great to get your reply.

The two tables I have used DATE_S and DATE_I are two varities of date dimension table in our model we are debating. DATE_S is having surrogate keys (column COL1) generated using dumb sequence. DATE_I is having surrogate keys (column COL1) having built in intelligence. When I am sorting on the surrogate keys in both cases, the cost of query is higher for DATE_I. Similar trend I had observed for similar queries on these tables with joins, filters etc.
Besides that while using DATE_S, I can use the surrogate key and corresponding fact table column as number(4) datatype. Whereas for DATE_I table, the key should be number(8) for both dimension and fact. As this key will be used in all the fact tables across millions of records, the size might be issue. Am not sure. Even the network traffic will be (slightly) higher in case of DATE_I type model. Isn't that correct?

I am not clear how it will reduce the joins. We will be joining the fact and dimension tables in both cases right?


Tom Kyte
October 27, 2006 - 7:44 am UTC

who cares about the cost

how is the actual observed response time.

if date_i is a DATE, has the DATE, why do you need to join to the dimension?????

PK values from sequence - security concern?

Aaron, October 27, 2006 - 1:11 pm UTC

Hi Tom -

I just got out of a meeting with a consultant who was trying to talk us out of using an Oracle SEQUENCE to populate a surrogate key - and his argument was that it is a security concern. I was so stunned to hear this that I couldn't come up with a response to disprove it. Have you ever heard of anything like this before, or can you think of any way it might be a security problem? (or could it just be that the guy was drunk?)

Anyway, he suggested it was better to use something like :

create table t (
content_ID RAW(16) default sys_guid(),
name VARCHAR2(10));

insert into t values ('system');
commit;

instead of something like:

create table t (
content_ID NUMBER
name VARCHAR2(10));

which would get populated by a sequence like:

CREATE SEQUENCE cID_sequence INCREMENT BY 1 START WITH 1 NOMAXVALUE NOCYCLE CACHE 50 ORDER);

using something like:

INSERT INTO table VALUES (cID_sequence.nextval, 'system');
commit;

We don't care about gaps, we just care about having a unique ID for each value of the content_ID.

Can you let me know which is your preference (RAW with sys_guid() or SEQUENCE) and why, and any comment on the security notion mentioned earlier?

Thanks Tom

Tom Kyte
October 27, 2006 - 8:08 pm UTC

depends on where the key is used.

Look at my url's - I pass a displayid around. It is basically a function of a random number, some other stuff and a sequence. It makes them "unguessable".

If you are generating "sessionids" - it might be nice to make them NON-GUESSABLE .

If you are generating encryption keys - the same.

If you are assigning normal keys to most data, you need not worry about.

So - they had a point, but the point only needs to be made in very very special cases.

sys_guid as primary key

Aaron, October 29, 2006 - 2:11 pm UTC

Thanks Tom.

Is there any way to ensure that sys_guid() will be unique? I don't suppose there is a way, because it is a randomly generated set of numbers; even though the chances are very, very slim that a duplicate would be generated . . there is still a chance. So using sys_guid() as a primary key might not be the best idea (better to use a sequence for PK field). Or would it be worth the risk in order to gain the "unguessable"-ness of what sys_guid can provide?

Thanks in advance for your input, I appreciate it.

Tom Kyte
October 29, 2006 - 3:10 pm UTC

unique constraints would make sure it was unique... The probability of a clash is so sufficiently small as to be ignorable as long as you have constraints in place.

sweet

A reader, October 29, 2006 - 3:13 pm UTC

Excellent. Thanks for your input, I appreciate it :-)

Philosophical Question

Kurt, October 07, 2008 - 5:36 pm UTC

In your response to the original question, you say:

[Quote]
If the primary key has a chance -- however remote -- of changing, well, then it is NOT
the primary key.

This would be like you changing your DNA halfway through your life.
[End Quote]

That seems like a statement any sane DBA or application developer would agree with.

So, if the columns of a Primary Key are inherently (logically) unchanging, then why does Oracle allow updates to Primary Key columns? Conformance to some SQL specification? Backward compatibility?

I would think Oracle could shortcut a lot of processing by simply disallowing the modification of Primary Key columns.

(Please forgive the "philosophical" nature of the question.)

Thanks.
Tom Kyte
October 08, 2008 - 9:59 pm UTC

... Conformance to some SQL
specification? ...

there are in fact no rules against it unfortunately.


Database Design Resources

Arvind Mishra, January 14, 2009 - 4:53 pm UTC

Hello Tom,

I have read your thoughts on schema design in your book. It gave me a new insight on schema design. Can you please suggest some MUST READ books and on-line resources for following:

1. Relational database design.
2. Dimensional database design

Thanks,

Arvind

Surrogate Key for Weak Entity

Sameer Kumar, September 15, 2010 - 10:35 pm UTC

Tom,

I agree for an environment where value of a Candidate Key/Business Key can change over a period of time, we should better have a surrogate key as primary key.
But if we have a scenario where we have an identifying relationship b/w strong entity-weak entity, shall the foreign key from strong entity be made primary key as well in weak entity (dependent table)? Or we should have a surrogate primary key?
Here we will be unnecessarily maintaining two indexes, two columns whereas joins will always be performed on (unique one-one identifying) foreign key. I somehow feel that the better approach in these scenario is to do away without surrogate key and use business key (identifying foreign key) as primary key. Kindly explain if you feel otherwise.
Tom Kyte
September 16, 2010 - 6:44 am UTC

Use natural keys when natural keys exist.

Use surrogate keys only when there is no true primary key.

... But if we have a scenario where we have an identifying relationship b/w strong
entity-weak entity, shall the foreign key from strong entity be made primary
key as well in weak entity (dependent table)? ...

that doesn't make sense. What does a foreign key in the 'strong' (parent) entity have to do with the child, the child has the foreign key to the parents primary key.

If you are talking about a one to one relationship, you really meant to use a SINGLE TABLE.

Surrogate Key for Weak Entity

Sameer Kumar, September 03, 2012 - 5:36 am UTC

Hi Tom,

Getting back to this old question from me...

You said


I would use a surrogate key here -- I almost always use surrogates personally.


and you said


Use natural keys when natural keys exist.


I little confusing.


that doesn't make sense. What does a foreign key in the 'strong' (parent) entity have to do with the child, the child has the foreign key to the parents primary key.

May be I used the words in wrong way. I mean a column in weak entity references a key in parent and the reference can uniquely identify the tuple in weak (child) entity as well.

If you are talking about a one to one relationship, you really meant to use a SINGLE TABLE.
What i mean here is suppose, I have a LOANS table and a table for LOANCHARGES. LOAN and LOANCHARGES though are one-to-one, but only LOANCHARGES are optional and are required only when there are charges associated.
Mmoreover the read services always read just the LOAN details or just the LOANCHARGE details and posts the charges accordingly. Now for this case I prefer creating a new table for LOANCHARGE (instead of storing it with LOANS)
I would do this to avoid storing too many nullable columns and avoid cases when I will have to query "select * from loans where chargeRelatedColumn is null". Now I can just say "select * from loancharge".

In this scenario I will prefer to use LOANID itself as PK in both the tables (instead of using a surrogate key in LOANCHARGE).

I had a discussion with people over this and thought otherwise, pointing me to your link they suggested, we should "never" use natural keys, specifically in case of master-detail/parent-child tables.

It will be nice if the users posting review/response get an email when your post a followup. :-)
If it is already there, I need to check my spam filter.
Tom Kyte
September 10, 2012 - 6:58 pm UTC

use natural keys when they exist... they do not exist frequently, hence I use surrogates most of the time.


What i mean here is suppose, I have a LOANS table and a table for LOANCHARGES. LOAN and LOANCHARGES though are one-to-one, but only LOANCHARGES are optional and are required only when there are charges associated.

then I would say that there should almost certainly be one table.


I would do this to avoid storing too many nullable columns

what is the scientific, numerate reason behind that? especially since if they are last in the table, they take zero bytes of storage?

avoid cases when I will have to query "select * from loans where chargeRelatedColumn is null"

again - why? we can index that so it would use an index if it made sense. Otherwise we'll just full scan it...



ask those people for the math or scientific reason behind it.

but what does a 1:1 relationship and whether it should be flat in a table or in two tables have to do with surrogates at all??

Sameer Kumar, September 22, 2012 - 1:59 pm UTC

Well some people had referred me to your post and told me that, we should (always) use surrogate keys.
Even when there is a natural key. I guess I got carried away in giving an example/explaining the scenario but got my point clarified that, "use natural keys whenever possible"