You Asked

Good Morning,

Is there some underlying rule or DBA agreement that every table should have a primary key? To me it doesn't make logical sense that every table we define needs a primary key. Just recently I came across a primary key made up of two columns with each having about 60% unique values in a 1.5 million row table. The optimizer chooses to perform a full table when both those columns are used in the "Where" clause since the primary key isn't very selective. This table doesn't even have a child table.

My assumption is that a primary key is only needed when it has a child table.

On the other side, if a large, frequently accessed table doesn't have any child tables and doesn't have any unique column, should we create a column based off the value of a sequence and add a unique index (instead of primary key) on it? I would assume that isn't needed in all cases. An example where it would be needed is in the case where the application inserts a row and then plans to access it within the same session.

In summary of questions:
- Is there some underlying rule or DBA agreement that every table should have a primary key?
- My assumption is that a primary key is only needed when it has a child table?
- On the other side, if a large, frequently accessed table doesn't have any child tables and doesn't have a unique column, should we create a column based off the value of a sequence and add a unique index (instead of primary key) on it?

Thanks,

John

and Chris said...

The main goal of a primary key is to stop you storing duplicate values in its columns. You can also achieve this using unique constraints.

Both of these can be the target of foreign keys. So you could build a schema (with foreign keys) that only has unique constraints.

But by convention foreign keys reference primary keys, not unique constraints. This is because a table can have many unique constraints, but at most one primary key. By having all foreign keys point to primary keys, you make it easier for other developers/DBAs to understand the schema.

You also limit cascading update problems when you change unique/primary column values. This is where you update foreign key referenced columns in the parent table, then need to apply this update to all rows in all child tables.

To further limit the cascading update problem, often these primary keys are surrogates - synthetic values such as incrementing integers or GUIDs. These have no meaning outside the application, so should never change. Unlike the values in unique constraints (aka candidate or business keys) such as SSNs, vehicle license plate numbers, passport numbers, etc. which can and do change.

It's also worth noting that many are unaware a foreign key can point to a unique constraint. By sticking to primary keys, you reduce the "surprise factor" for the application. Which makes it easier to maintain.

Finally many frameworks expect tables to have single-column primary keys. It can be harder to get these to work with tables with compound keys.

Is there some underlying rule or DBA agreement that every table should have a primary key?

It's rare for a table to need neither a primary key nor any unique constraints. If both are missing, this is usually a sign of a modelling failure.

Not all tables need surrogate keys though. This is most likely when the business keys are standardized short character codes. For example, ISO currency and country codes or IATA airport codes. In these cases I would make these values the primary key:

create table currencies (
  currency_code varchar2(3) primary key,
 ...
)

That said there are few types of table where neither constraint are appropriate. Staging tables are the main ones that come to mind.

My assumption is that a primary key is only needed when it has a child table?

No. As stated above, a table may not need a surrogate key. But I would strive to find the unique constraints and designate one the primary key.

It's also worth bearing in mind: just because there are no child tables now, as the application evolves it's likely many childless tables will become parents at some stage!

On the other side, if a large, frequently accessed table doesn't have any child tables and doesn't have a unique column, should we create a column based off the value of a sequence and add a unique index (instead of primary key) on it?

I would first ask what the keys are for the table - the columns that shouldn't have duplicate values. Then define unique constraints on the keys and either

- pick one to be the primary key
- add a surrogate key to the table and making that the PK.

Which I'd do depends on what's stored, how it's used by the application, etc. I wouldn't create a surrogate key "just because".

The optimizer chooses to perform a full table when both those columns are used in the "Where" clause since the primary key isn't very selective

This statement makes no sense to me. If you've got equality conditions on all the columns of the primary key, you're accessing at most one row. It's selective by definition.

Please share an example of this situation.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.

Breadcrumb

Question and Answer

You Asked

and Chris said...

More to Explore

Design

Core Primary key concepts not mentioned in my Oracle training classes

Breadcrumb

Question and Answer

You Asked

and Chris said...

More to Explore

Design