Skip to Main Content
  • Questions
  • Is a composite primary key a good idea?

Breadcrumb

more

Connor and Chris don't just spend all day on AskTOM. You can also catch regular content via Connor's blog and Chris's blog. Or if video is more your thing, check out Connor's latest video and Chris's latest video from their Youtube channels. And of course, keep up to date with AskTOM via the official twitter account.

Question and Answer

Tom Kyte

Thanks for the question, Greg.

Asked: May 02, 2001 - 1:33 pm UTC

Answered by: Tom Kyte - Last updated: May 12, 2012 - 4:02 pm UTC

Category: Database - Version: 8.1.6 SE

Viewed 10K+ times! This question is

Whilst you are here, check out some content from the AskTom team: The AskTOM team at OpenWorld 2017

You Asked

Tom,

I have a table that has 3 fields that, when considered in combination, could uniquely identify each record. These fields are...

Object_ID CHAR(4)
Ticket_Number NUMBER
Start_DateTime DATE

It will never be the case that any of these fields is null, nor will it ever possible for the same object to have the same ticket number at exactly the same time.

So it occurred to me that, instead of introducing a new field whose sole significance is to uniquely identify each record in the table, the combination of these 3 fields would do the trick. However, when I looked up composite primary keys in the PL/SQL Developer's Guide, it recommends not using such beasts.

What do you think?

Some things to consider:

I'd have to insert these same values as a composite foreign key in a number of other tables. Would there be any performance issues raised when linking via composite keys rather than a simple sequentially numbered key? Perhaps because there's character and date data to compare as well?

On the other hand, if it winds up that I do have to insert a new numeric field to identify each record, what's the easiest way to increment that field on each insert. Does Oracle have something akin to MS Access's "autonumber" datatype?

and we said...

If the business rule is "these three fields uniquely identify the record and uniqueness must be enforced" -- then you would need a UNIQUE CONSTRAINT on them anyway. If it would be an error for object_id,ticket_number,start_datetime to be duplicated -- you NEED the unique constraint.

You could add yet another field to the table to be the "primary key" but that does not remove the need to have a unique constraint on this field.

If the above is used as a foreign key in many other tables -- there is something to be said for using a surrogate key. If it is not used as a foreign key in alot of tables - i would just use it as the primary key.


In order to have an "auto increment" type in Oracle you will:

create sequence my_seq;

create trigger my_trigger before insert on T for each row
begin
select my_seq.nextval into :new.ID from dual;
end;
/

that'll do it.


follow up to comment by Usama Munir Dar below


Sequence numbers skip -- YES -- anything that scales above one user will exhibit that behavior. The only way to get a sequential number with no gaps is to process 1 transaction at a time at most. Sequences are not designed for that and any system that has that as a requirement will not scale anyhow (it is a really silly requirement in a computerized system -- maybe it made sense when there was a human assigning document ids or something but in a computerized system -- it just doesn't make sense).

Sequences are a highly scalable way to assign surrogate keys.

I agree that composite keys work very well, they can be used as foreign keys, everything works. HOWEVER

o a sequence is smaller then 3 columns will be. If the key of this table is a foreign key in lots of other tables -- the space savings can be enourmous.

o people frequently try to update a primary key to correct data -- using a surrogate key which is imutable solves forever this update cascade problem as you never update the primary key (the sequence) you update the 3 columns which are not stored anywhere.

o it is easier to code :

select * from p, c where p.primary_key = c.foreign_key

then

select * from p, c where p.id1 = c.fk1 and p.id2 = c.fk2 and p.id3 = c.fk3

and the first query runs faster.

Those are the things I said to take into consideration. As I said -- if the composite primary key is not a foreign key in lots of tables -- go for the composite primary key. If it is -- give serious consideration to the surrogate key populated via a sequence (and don't even consider that "gaps" might be there -- that is not even a little bit relevant, it is just a unique id).

and you rated our response

  (83 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Composite Primary Key or Sequence Numbers? - Tom's view.

May 03, 2001 - 9:43 am UTC

Reviewer: Greg Kliewer from Toronto, Ontario, Canada

This really helped me to make my decision, Tom. Thanks.

Since there will be quite a few tables linked via foreign key, I'll go with the new, sequentially-numbered field as primay key.

Nah! Sequence Numbers give u Skips!

May 04, 2001 - 3:51 am UTC

Reviewer: Usama Munir Dar from Islamabad, Pakistan

Well on Papers good solution..but sequence numbers does give u skips..incase trigger fails for some reasons...BUT IF this not a point i mean a skip or two is tolerable then yes..it's ok. Why are u hasitant to make a Composite Primary key! it works just fine.

composite key columns

November 22, 2002 - 11:01 pm UTC

Reviewer: A reader from Austin, TX

I am interested in your opinion with regard to the insistence by some clients to specify the use of what I have heard referred to as "smart numbers" which are just a composite key all stuck together in one column.

A simple example would be this Incident ID:

03-40123 Format: FY-XNNNN

where:

FY = 2 digit fiscal year
X = 1 digit incident type
NNNN = 4 digit sequence of inc. type within FY


Now, I personally try to avoid this kind of identifier (system generated or otherwise) like the plague for several of what I consider to be obvious reasons:

* data entered initially was wrong and made the key wrong
(i.e.: incident type is actually 3, not 4)
too bad, the world already "knows" this incident as 03-40123

* it makes assumptions about upper limits:
1) there will never be >10,000 incident type 1’s in one FY
2) forget about adding an eleventh incident type
3) two digit year –- been there!

* they're cumbersome to implement

* what good does knowing FY and incident type at a glance do anyway?


I’ve had this problem come up at times, usually when upgrading from a "legacy" and/or paper-based system where users are resistant to changing their numbering scheme. Often, I have been able to convince (coerce?) the client to use a simple sequence, but not always.

I know you’re not big on rules of thumb but can things like the nature of the data (static reference, transactional, etc.) or a need for the key to be "memorable" give merit to this type of key.

Under what conditions do you stand for a design that makes use of a composite key column, like above, to identify data?

Would you be inclined to use a surrogate key and let the customer have their smart number whatever way they want it or would you (perhaps strongly) suggest a simple sequence approach?


Tom Kyte

Followup  

November 23, 2002 - 9:27 am UTC

A field like this could (should) be DERIVED from the other data.

There is no reason the client has to know what the real and true primary key in a physical schema is -- that is an implementation detail.

So, I would

create table t ( id int number primary key, /* filled by sequence */
fy date,
incident_type varchar2(1),
goofy_number number,
....


create view v as
select
t.*,
to_char(fy,'yy')||'-'||incident_type||to_char(goofy_number,'fm0000')their_field
from t;


You could even create a function based index on their_field if they felt they had to search by it.


proof that intelligence is in the eye of the beholder - smart number indeed. (silly number for all of the reasons you put forth - ESPECIALLY that 9,999 limit)

really foreign key

November 23, 2002 - 1:43 pm UTC

Reviewer: A reader from Austin, TX

As usual, your solution is as clear as can be. I especially like the function-based index part!

However, that doesn’t quite protect me from a bit of an "address change" problem.

As I (rather clumsily) tried to state in first bullet, once fy, incident_type, and goofy_number are known and a row is inserted, then their_number is implicitly defined. At this point, perhaps their_number is printed on reports, given out to involved parties, etc.

When it is found that, say incident_type, was entered wrong initially and is updated then the database’s view of their_number and that shown on reports, held by involved parties, etc., are a no match.

I can see basing their_number on columns that are not updatable (thru whatever means) but that may not meet "their" need.

I could build their_number at insert time, put it in its own column (UK, not PK) and never update it again even if its subcomponents change.


I guess my bottom-line question is this:

Even as the "expert" hired to provide solutions with solid data definitions, am I overstepping my bounds (and/or wasting time) when I use (sometimes) hours trying to dissuade clients from their_number in favor of MY number, a simple sequence?

Tom Kyte

Followup  

November 23, 2002 - 1:51 pm UTC

If you present to them the facts -- as you have here, that this will lead to data errors -- and they reject that, then you have done all that you can do. You might include YOUR primary key on the reports somewhere so that if and when an issue arises, at least you can ask them for the other number as well.


You are not overstepping your bounds. In fact I have more then once said "it is our jobs to point out these things to those who don't write code for money".

The most recent example of this on this site was a request for how to get N random rows from a table. I described how to do it -- problem kept getting more and more complex till it turned out they wanted a random sample from a complex query with lots of joins and such. they wanted 4 rows from this. They wanted to do this hundreds/thousands of times a day. The cost of doing this was quite high (the query took tons of resources).

Why did they want to do this? So their portal could have a picture of 4 randomly selected employees on it when people came to it. Hmmm, my answer back was "tell the business users that 90% of the machine resources will now be dedicated to serving these 4 pictures up, do you want to pay for that". The opinions were about 50/50 as to whether that is "our place" - whether we blindly do whatever we are told like sheep (bah bah) or whether we actually point out that a less then useful feature/function is really expensive and bad.

I would continue to point this out to these people -- your arguments are 100% valid. If they choose not to listen, do your best to try and protect them (via that true surrogate key) but at least you tried.

Hate to be in the place that only does 9,999 things a year. Not much room for growth there is there? And then on Jan 1st, you have to reset that silly sequence back to 0 again.



update composite primary key

January 14, 2003 - 5:12 pm UTC

Reviewer: Sean from NJ, USA

We have two tables like this:

create table t1(c1 number, c2 varchar2(50),
constraint t1p primary key(c1, c2));

create table t2(c1 number, c2 varchar2(50), c3 number,
constraint t2p primary key(c1, c2, c3),
constraint t2f foreign key (c1, c2)
references t1(c1, c2));

insert into t1(c1, c2)
values(1, 'c2');


insert into t2(c1, c2, c3)
values(1, 'c2', 3);

Now we need to update c2 of t1. Are there any ways to update this primary key
besides your suggestion of surrogate key?

I am not aware of update primary key cascade SQL command.

Best,

Sean


Tom Kyte

Followup  

January 14, 2003 - 8:34 pm UTC

c1,c2 is not your primary key then -- a primary key is imutable.

You can use deferrable constraints if you like.

You can use this package:
</code> http://asktom.oracle.com/~tkyte/update_cascade/index.html <code>

but if you plan on this being "normal", that you'll be doing this update time after time - you need to pick a different primary key.

update primary key

January 15, 2003 - 4:07 pm UTC

Reviewer: sean from NJ, USA

Hi Tom,

The package works amazingly well.

The questions:

(1) After I update the primary key, how do I dissociate that table from package (I still need the package for other tables, so I can not drop the package).

(2) Using this package seems be the best solution for us since we don’t need to change any code. Besides the performance, any other disadvantages?

(3) The solution of ‘deferrable constraints’ you mentioned is that I add deferrable to foreign key tables, so I can update primary key first, then foreign key. Is this what you mean? Here is the sample:

create table t1(c1 number, c2 varchar2(50),
constraint t1p primary key(c1, c2));

create table t2(c1 number, c2 varchar2(50), c3 number,
constraint t2p primary key(c1, c2, c3),
constraint t2f foreign key (c1, c2)
references t1(c1, c2) deferrable initially deferred);



Best,

Sean





Tom Kyte

Followup  

January 16, 2003 - 8:04 am UTC

Not sure I know how to take that "works amazingly well" ;) a backhanded compliment i guess ;)


1) the package is not associated with a table. the package creates some triggers and procedures SPECIFIC to a table - it is a code generator. It is not itself bound to any object.

2) you mean besides the fact that updating a primary key is just really a seriously bad idea that points to a long running design flaw?

3) yes, the fkeys are deferrable so you can :

set constraints all deferred;
update parent primary key;
update child1 fkey;
update child2 fkey;
set constraints all immediate <<=== if that fails, you missed a child
commit;




update primary key

January 16, 2003 - 1:46 pm UTC

Reviewer: Sean from NJ, USA

(1)  First time after package installed and t1, t2, t3 table created, I need to run this command first in order to update primary key:

exec update_cascade.on_table('t2')

But if I open a new session after this session, I was able to update primary without exec this command
'exec update_cascade.on_table('t2')'.  But I don't want people to update this PK all the times.

Here is the sample:

SQL> connect  UCDEMO/UCDEMO@e2rs
Connected.
SQL> update t2
  2  set b=900
  3  where b = 6;     -- update primary key without 'exec update_cascade.on_table('t2')'

1 row updated.
SQL> 

(2) Thanks,  We will try to correct the design flaw.

 

Tom Kyte

Followup  

January 16, 2003 - 7:49 pm UTC

exec update_cascade.on_table('t2')

created (as fully documented in the page) triggers and packages to support the update cascade. If you don't want it anymore, you just need to DROP them.



Regenerate primary key

January 17, 2003 - 10:44 am UTC

Reviewer: sean from NJ, USA

Thanks so much for your help.

Sean


how about sys_guid() instead of sequence for primary keys

July 10, 2003 - 12:44 pm UTC

Reviewer: A reader

what are the pros and cons of using sys_guid()(suspect
space may be an issue with this approach?) Also with
sys_guid() you can just put it in the default clause
( no need for the auto-increment trigger".

Obviously we dont care about "gaps" here:)

Tom Kyte

Followup  

July 10, 2003 - 2:10 pm UTC

well, RAW(16) is bigger then most number fields.

it is not something someone can write down (like a number).

some things won't like raw -- they'll turn it into a 32byte varchar2.

Not sure I would do it unless I had a compelling reason to do it.

sys_guid versus sequence as primary key

August 02, 2003 - 7:00 pm UTC

Reviewer: A reader

Hi Tom
In our case, we are using sys_guid instead of sequence
(we have some requirement to make the primary keys
globally unique) - anyways, I thought I would run
benchmarks and share results with all:

The results show that:

1. elapsed time wise sequences are faster (I used cache
option with a cache of 5000) - I did get some wild result
variation - sequences were like 52% to 90% faster than
sys_guid in different runs.
I suppose the reason that are faster is due to the sga caching of sequences , right? However,I was wondering what makes sys_guid slower - what is the internal mechanics for
generating sys_guid?)

2. Of course sequences took less space (around 1/2 of what
sys_guids occupied)

3. sys_guid consumed less latches however - mainly
following latches:

STAT...session logical reads 319,026 317,195 -1,831
STAT...db block gets 318,025 315,680 -2,345
LATCH.cache buffers chains 1,398,729 1,366,857 -31,872
LATCH.library cache 407,043 207,830 -199,213
LATCH.sequence cache 300,040 0 -300,040

Could you please explain each of them - I think the
latter two are probably the sequences latching to
get and set the next val for my session.
In particular,
what results in higher db block gets and cache buffer chains for sequences? From "Oracle reference" cache buffer
chains - refers to a hot block contention meaning
sequences result in accessing of the same block
many times (potentially of the internal table used
to store sequences (seq$ table)?

test@apps> @schema
test@apps> drop sequence my_seq;

Sequence dropped.

Elapsed: 00:00:00.09
test@apps> create sequence my_seq cache 5000;

Sequence created.

Elapsed: 00:00:00.02
test@apps> drop table t_guid ;

Table dropped.

Elapsed: 00:00:04.51
test@apps> create table t_guid
2 (
3 id RAW(16) default sys_guid() not null primary key,
4 value number
5 );

Table created.

Elapsed: 00:00:00.09
test@apps>
test@apps>
test@apps> drop table t_seq ;

Table dropped.

Elapsed: 00:00:01.44
test@apps> create table t_seq
2 (
3 id int not null primary key,
4 value number
5 );

Table created.

Elapsed: 00:00:00.08
test@apps>
test@apps>
test@apps> @data
test@apps> create or replace package run_benchmark_pkg as
2 g_number_of_records constant integer := 100000 ;
3
4 procedure insert_with_seq;
5 procedure insert_with_guid;
6 end;
7 /

Package created.

Elapsed: 00:00:00.09
test@apps> create or replace package body run_benchmark_pkg as
2 procedure insert_with_seq
3 is
4 begin
5 for i in 1 .. g_number_of_records
6 loop
7 insert into t_seq values( my_seq.nextval, i);
8 end loop;
9 end;
10
11 procedure insert_with_guid
12 is
13 begin
14 for i in 1 .. g_number_of_records
15 loop
16 insert into t_guid (value) values( i);
17 end loop;
18 end;
19
20 end;
21 /

Package body created.

Elapsed: 00:00:00.16
test@apps>
test@apps> begin
2 runstats_pkg.rs_start;
3 run_benchmark_pkg.insert_with_seq;
4 runstats_pkg.rs_middle;
5 run_benchmark_pkg.insert_with_guid;
6 runstats_pkg.rs_stop(200);
7 end;
8 /
Run1 ran in 5728 hsecs
Run2 ran in 7337 hsecs
run 1 ran in 78.07% of the time

Name Run1 Run2 Diff
STAT...calls to get snapshot s 100,582 100,847 265
STAT...commit cleanouts 423 708 285
STAT...commit cleanouts succes 423 708 285
LATCH.list of block allocation 567 858 291
STAT...immediate (CURRENT) blo 372 674 302
STAT...calls to kcmgcs 396 718 322
LATCH.transaction allocation 864 1,266 402
STAT...consistent gets - exami 670 1,130 460
STAT...consistent gets 1,001 1,515 514
STAT...free buffer requested 2,916 3,775 859
STAT...change write time 2,455 3,775 1,320
STAT...session logical reads 319,026 317,195 -1,831
STAT...dirty buffers inspected 1,996 3,991 1,995
STAT...free buffer inspected 1,996 3,991 1,995
LATCH.redo allocation 205,074 207,309 2,235
STAT...redo entries 203,898 206,150 2,252
STAT...db block gets 318,025 315,680 -2,345
LATCH.cache buffers lru chain 4,826 7,585 2,759
STAT...db block changes 408,338 411,971 3,633
LATCH.checkpoint queue latch 8,652 12,829 4,177
LATCH.cache buffers chains 1,398,729 1,366,857 -31,872
LATCH.library cache 407,043 207,830 -199,213
LATCH.sequence cache 300,040 0 -300,040
STAT...redo size 49,988,292 56,368,192 6,379,900

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
3,946,936 3,426,263 -520,673 115.20%

PL/SQL procedure successfully completed.

Elapsed: 00:02:11.30
test@apps> exec show_space ( 'T_GUID')
Unformatted Blocks ..................... 30
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 1
FS4 Blocks (75-100)..................... 31
Full Blocks ..................... 370
Total Blocks............................ 512
Total Bytes............................. 4,194,304
Total MBytes............................ 4
Unused Blocks........................... 64
Unused Bytes............................ 524,288
Last Used Ext FileId.................... 9
Last Used Ext BlockId................... 177,800
Last Used Block......................... 64

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.27
test@apps> exec show_space ( 'T_SEQ')
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 1
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 40
Full Blocks ..................... 203
Total Blocks............................ 256
Total Bytes............................. 2,097,152
Total MBytes............................ 2
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 9
Last Used Ext BlockId................... 176,776
Last Used Block......................... 128

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.22
test@apps> spool off

Tom Kyte

Followup  

August 02, 2003 - 7:28 pm UTC

1) undocumented however, sys guid involve OS interaction and is undoubtably more "cpu intensive"

2)
3)
LATCH.cache buffers chains 1,398,729 1,366,857 -31,872

latches to get blocks in the buffer cache... sequences live on blocks... that was the latches to get the buffer to update the sequence and/or just get the current value thereof.

LATCH.library cache 407,043 207,830 -199,213

latches to latch things in the library cache. a sequence definition lives there. accessing the sequence will cause some of those.

LATCH.sequence cache 300,040 0 -300,040

should be obvious ;) but it is just a latch surrounding the sequence used internally.








thanx!

August 03, 2003 - 9:43 pm UTC

Reviewer: A reader

Please keep up the awesome work!!


good

August 04, 2003 - 2:46 am UTC

Reviewer: Dani from India

Hi Tom,
In the oracle 9.0.1.1.1 documentation says that "A composite primary key cannot have more than 32 columns". But I have tried with 33 columns its working fine. What is the reason exactly?
This is table I have tried.
create table Test(x1 number(2) ,x2 number(2) ,x3 number(2) ,x4 number(2) ,x5 number(2) ,x6 number(2) ,x7 number(2) ,x8 number(2) ,x9 number(2) ,x10 number(2),x11 number(2),x12 number(2),x13 number(2),x14 number(2),x15 number(2),x16 number(2),x17 number(2),x18 number(2),x19 number(2),x20 number(2),x21 number(2),x22 number(2),x23 number(2),x24 number(2),x25 number(2),x26 number(2),x27 number(2),x28 number(2),x29 number(2),x30 number(2),x31 number(2),x32 number(2),x33 number(2),x34 number(2),x35 number(2),x36 number(2),x37 number(2),x38 number(2),x39 number(2),x40 number(2),
primary key( x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28,x29,x30,x31,x32,x33))
/

Tom Kyte

Followup  

August 04, 2003 - 9:00 am UTC

looks like the limit is actually 33 then.

Not that I'm overly worried as "1" is the most common size and 5 or 6 a reasonable maximum.... 32 would be "slightly overboard" and 33 is even worse.

SURROGATE KEY

August 04, 2003 - 5:38 am UTC

Reviewer: umesh from blore india

Tom
Effective Oracle By Design -- When is your Book available in India?

Surrogate Key
- What is this key used for and how is it helpful .. I do not get the cocept of this surro... key
Thanks for your patience

Tom Kyte

Followup  

August 04, 2003 - 9:07 am UTC

sorry -- i don't do the details like "when it'll be reprinted in various countries". I'm not part of that particular part of the book publishing thing...

search this site for

surrogate


lots of articles discussing them

sequence versus sys_guid as surrogate key mechanism

August 18, 2003 - 4:54 pm UTC

Reviewer: A reader

In our case we are using sys_guid because we want
to avoid clashes when moving data from one db to another.
I am still not sure about the entire rationale - but
it seems to be that when you import and export data
from one db - db1 to another db db2, if there are sequences
then we will run into conflicts (since the same sequence
is created in both schemas.

Have you faced this situation? Is there an elegant solution
for this problem apart from using sys_guid
instead of sequences, This is different from the case
where we have a known number of sites and each site can
start the sequence at a different number.

Tom Kyte

Followup  

August 18, 2003 - 8:41 pm UTC

say you have N sites you anticipate having to service.

multiply by 100 incase you are off.

Then, at site one:

create sequence S start with 1 increment by n*100;

at site two:

create sequence S start with 2 increase by n*100;


say n = 10, site 1 will generate numbers

1, 1001, 2001, 3001, ......

site 2:

2, 1002, 2002, 3002, ......


non-overlapping sequences generated at upto 1000 sites.

thanx Tom!

August 18, 2003 - 8:47 pm UTC

Reviewer: A reader

"non-overlapping sequences generated at upto 1000 sites. "

I went through that solution in your site somewhere.
It is very elegant - but you have to know in advance the
number of sites. In our case the requirement is that we
can have any local database where the schema can be populated and then subsequently "merged" into a central
db through export/import.

Ideally, the same schema install script should always work
without creating clashes in any database.

*sigh* - I guess short of always having a central table
storing starting points of sequences and having serialized access - there does not seem to be a good alternative...

Thank you so much!!


An alternative...

August 19, 2003 - 12:47 am UTC

Reviewer: Matt from Australia

There are a couple of assumptions here (such as there being a manageable amount of data to transfer in this way), and this involves writing some code, but:

You could write some extraction routines to pull out the data from individual tables in a format that can be used to define an external table in your target DB. This would give you a file per table (per source DB).

The filenames will need to encode the tablename and the source database. This information can be stored in a table in the target DB to allow you to manage the creation of a new source DB with a data change.

These files are then used to populate the target DB:

You will need to know the mapping between your target tables, columns and local sequences that are used to populate these tables. As well as how the filename maps to the table / source DB.

A procedure could then directly query your data from the external tables and dynamically generate insert statements using the appropriate sequence in the correct place in the insert ie;

insert into * select seq.next_val, col1, col2 from external table.

I am assuming that the target DB is live, and that you do not need to transfer index / constrainst etc.

thanx Matt!!

August 19, 2003 - 5:26 pm UTC

Reviewer: A reader

I am not extremely familiar with external tables - may
be that is why the solution looks more complicated:)
I also want to avoid dynamically generated inserts.

What do you think of the following idea?

When installing the product schema sequences are created.
Our attempt is to have a unique increment number a given installation. This number can be based on the current date and time.
Since we can't generate the sequences as above using
NDS, this would have to be done using sql - spooled
into a file and then executed.

following shows the logic of generating create statements
with unique increments.

test@apps> @t
test@apps> select 'create sequence my_sequence start with 1 increment by ' || inc || ';'
2 from
3 (
4 select julian_date + hour_minute_second inc
5 from
6 (
7 select to_number(to_char(sysdate+1, 'J')) julian_date ,
8 to_number( to_char(sysdate, 'hh24MISS')) hour_minute_second
9 from dual
10 )
11 );

create sequence my_sequence start with 1 increment by 2595011;

Elapsed: 00:00:00.01
test@apps> @t
test@apps> select 'create sequence my_sequence start with 1 increment by ' || inc || ';'
2 from
3 (
4 select julian_date + hour_minute_second inc
5 from
6 (
7 select to_number(to_char(sysdate+1, 'J')) julian_date ,
8 to_number( to_char(sysdate, 'hh24MISS')) hour_minute_second
9 from dual
10 )
11 );

create sequence my_sequence start with 1 increment by 2595014;

Elapsed: 00:00:00.00
test@apps>

This should work - what do you think Tom, Matt?

Menon:)

re: sequences

August 19, 2003 - 5:54 pm UTC

Reviewer: A reader

And if you are concerned with the increment being
rather high - you can of course subtract a known
constant (e.g. 2500000 from the value you got to have
lower increments but always different ones for each
install.

But I don't think this should be a concern, really!

sorry the above won't work!

August 20, 2003 - 11:38 am UTC

Reviewer: A reader

Obviously - since my start point is the same! I guess the above idea won't work even if you have the "random" increasing number based on date and time used in start_value or increment of the sequence - The clash can not be avoided for unless you do something like what Tom
suggested (partitioning values with given start points
changing (not randomly) and fixed increment. But this
ain't possible in my case.

Thanx!
Menon:)

sequences and data migration

September 23, 2003 - 11:48 am UTC

Reviewer: A reader

One problem we have is:
If we use sequences, when we migrate data from
one db to the other instance of db (during product
upgrades) say, sequences result in clashes.
Since here the number of sites is not known -
we can not use the "n site" solution you
prescribe above. Have you ever run into this
situation and if so, how did you resolve it?

Thanx!

Tom Kyte

Followup  

September 23, 2003 - 11:51 am UTC


well, the problem isn't sequences.

the problem is surrogate keys -- period. sounds like you need a globally unique one.

read about sys_guid(), a sql function.


</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96540/functions125a.htm#79196 <code>

it might be of interest to you.

thanx Tom!

September 23, 2003 - 12:17 pm UTC

Reviewer: A reader

I was aware of the guid solution - wanted to see if
there is a sequence based alternative.

Thanx!!!

Fat Primary Key

September 23, 2003 - 11:15 pm UTC

Reviewer: A reader

Tom

1.Is there any downside to having a varchar2 column as the primary key of a table.

2.Is there a downside to having a fat primary key, i.e. if a primary key is composed of 3 columns

1.col1 varchar2(20)
2.col2 number(10)
3.col3 number(10)

We are having to deal with dba's who believe that having a fat primary key will lead to performance problems, and in the kind of scenarios , they would suggest to have unique index on the combination of the 3 columns above, and then have a number column say col_key number as the primary key, which will be populated by a sequence.

Please help with illustrations.



Tom Kyte

Followup  

September 24, 2003 - 7:35 am UTC

1) no

2) more typing in predicates, that's about it.  realizing that the wider the key, the longer to compare two values -- but i would in most all cases ignore that.

there are performance "problems"
there are observed performance responses

this would not be a performance "problem".  

consider (benchmarking, far far too easy -- easier then "hypothesizing" for sure)

ops$tkyte@ORA920LAP> drop table t;
Table dropped.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create table t
  2  as
  3  select object_name, object_type, owner, max(object_id) object_id
  4    from all_objects
  5   group by object_name, object_type, owner;

Table created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create unique index t1_idx on t(object_id);

Index created.

ops$tkyte@ORA920LAP> create unique index t2_idx on t(object_name, object_type, owner);

Index created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> create or replace procedure p1
  2  as
  3      l_rec t%rowtype;
  4  begin
  5      for x in ( select * from t )
  6      loop
  7          select * into l_rec from t where object_id = x.object_id;
  8      end loop;
  9  end;
 10  /

Procedure created.

ops$tkyte@ORA920LAP> create or replace procedure p2
  2  as
  3      l_rec t%rowtype;
  4  begin
  5      for x in ( select * from t )
  6      loop
  7          select * into l_rec from t where object_name = x.object_name and owner = x.owner and
  8          object_type = x.object_type;
  9      end loop;
 10  end;
 11  /

Procedure created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> exec runStats_pkg.rs_start

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec p1

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec runStats_pkg.rs_middle

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec p2

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec runStats_pkg.rs_stop(1000)
Run1 ran in 328 hsecs
Run2 ran in 353 hsecs
run 1 ran in 92.92% of the time

Name                                  Run1        Run2        Diff
STAT...session pga memory         -131,072           0     131,072

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
315,673     315,410        -263    100.08%

PL/SQL procedure successfully completed.

<b>so, we are talking less then 3/100's of a second here..</b>

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> exec runStats_pkg.rs_start

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec p2

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec runStats_pkg.rs_middle

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec p1

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP> exec runStats_pkg.rs_stop(1000)
Run1 ran in 363 hsecs
Run2 ran in 333 hsecs
run 1 ran in 109.01% of the time

<b>3/100ths of a second.. for over 30,000 rows.  do the math, it is neglible</b>


Name                                  Run1        Run2        Diff

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
314,060     314,149          89     99.97%

PL/SQL procedure successfully completed.


<b>but, did they think about the inserts taking longer, generating more redo, the extra space of the index, yadda, yadda, yadda</b>

you can benchmark that for them if you want but something like this should be enough. 

compiste PK, locking, FK and redundant indexes

September 24, 2003 - 7:50 am UTC

Reviewer: A reader

Hi

I have seen applications with many composite PK therefore many child tables are over-indexed. For example

table_1 has col_a, col_b and col_c as PK
table_2 has col_b, col_c, and col_e as PK
table_3 has col_b, col_c, and col_d

table_child references table_1, table_2 and table_3 as follows

col_a
col_b
col_c
col_d
col_e

to avoid locking we would index

col_a, col_b, col_c
col_b, col_c, col_e
col_b, col_c, col_d

Leads to many redundant indexes...

Is this bad database design? Or it is something which cannot be avoided?

Tom Kyte

Followup  

September 24, 2003 - 11:59 am UTC

if you delete from the parent -- you need the index (to make the check on child fast, avoid a full scan)

if you update the parent primary key -- well, besides being a bad idea -- you need the index to avoid the full scan.

so, they are not redundant in this case if either of the above is true.

surrogate keys , shadow keys

September 24, 2003 - 4:39 pm UTC

Reviewer: A reader

1.
Are surrogate keys used to handle a situation where we have fat primary keys.
i.e.

in a situation where we have to have a primary key like ( col1, col2,col3), then we create another key column ( mostly populated by a sequence) which will be the unique identifier for the ( col1, col2,col3).

And secondly we would create a unique index on ( col1, col2,col3).

So we would have key surrogate key
( col1, col2,col3) - unique index.

If someone has to join to this table they will do so by joining to just key column.

2.Are surrogate keys and shadow keys one and the same?

Tom Kyte

Followup  

September 25, 2003 - 5:05 am UTC

1) no, they are not just for "fat primary keys"

they are used when there is no true primary key. a primary key is immutable -- never changing. Many real world situations do not have such a thing. Consider a "user" table. Say you keyed it off of the employee email id. Well, people get married, change names - emails CHANGE. using email as a key (i've seen so many people burned big time by that one) is a really utterly bad idea. so we use a surrogate.

2) not familar with "shadow keys" personally.

what do u think ?

September 29, 2003 - 2:18 pm UTC

Reviewer: huss from egypt

Tom
i already have the problem of composite primary key that foreign in many others
so when need to update it i fall into cascading update problems,
i need ur opinion in my soluation
i think in add a not null unique key in the master table and then add a column in each child of that master and make then reference to that made and after that change the primary of master table to that new column and drop the fat foreign keys from all details tables

Tom Kyte

Followup  

September 29, 2003 - 2:53 pm UTC

(your keyboard is failing -- it is dropping really important letters used in day to day writing -- vowels!)


if you have an update cascade problem, matters not that you have 1, 2, 3, 30 attributes in your key -- you have a problem in the first place.

You chose a primary key that is not a primary key.

So, it sounds like you are trying to add a surrogate.

Yes, you can add a unique, not null column that is populated by a sequence.

You can then add that column as a fkey to the child tables and update the join between parent and child to key the surrogates down there. You would update all code to make it so it populates this new column as well.

you could then drop the "fat" key which isn't a key really -- since you update it!

composite index structure

December 09, 2003 - 5:38 pm UTC

Reviewer: A reader

Hi

I am reading about indexes in the concepts guide, the examples there are quite simple since they are using simple single column indexes.

However, if my PK is formed by 2 or more columns such as

CASE_ID
BUSINESS_UNIT

If I have these rows

CASE_ID BUSINESS_UNIT
1 XA
2 YZ
2 YA
3 XC
4 YW
5 ZQ

in the leaf blocks how are these rows stored? 2 YZ and 2 YA?
Are the 2 columns concatenated? As 2YZ 2YA?



Tom Kyte

Followup  

December 10, 2003 - 2:48 am UTC

they are stored as two separate columns in the index blocks. so, you can use the pk index with queries such as:

select * from t where case_id = :x and business_unit = :y;
select * from t where case_id = :x;

and if case_id has few distinct values, maybe even:

select * from t where business_unit = :y;

using index skip scans (search for "index skip scan" to read more about that if interested)



how about range partitioned indexes

December 10, 2003 - 8:17 am UTC

Reviewer: A reader

Hello

multi column range partitioned tables concatenate the columns to form the partition key, does it happen for range partitioned indexes too?



Tom Kyte

Followup  

December 10, 2003 - 3:28 pm UTC

no they do not. they work as vectors.

ORA-01793: maximum number of index columns is 32

February 17, 2005 - 7:49 am UTC

Reviewer: Kamal Kishore from New Jersey, USA

Hi Tom,
When I try to create an index on 33 columns of a table, I get the above error. But when I add constraint primary key on that table with same number of columns, I end up with an index having 33 columns. Is there a reason for not being able to create an index with 33 columns but getting it as a side effect of add constraint?

SQL> create table test_key (
  2  c1 varchar2(1),
  3  c2 varchar2(1),
  4  c3 varchar2(1),
  5  c4 varchar2(1),
  6  c5 varchar2(1),
  7  c6 varchar2(1),
  8  c7 varchar2(1),
  9  c8 varchar2(1),
 10  c9 varchar2(1),
 11  c10 varchar2(1),
 12  c11 varchar2(1),
 13  c12 varchar2(1),
 14  c13 varchar2(1),
 15  c14 varchar2(1),
 16  c15 varchar2(1),
 17  c16 varchar2(1),
 18  c17 varchar2(1),
 19  c18 varchar2(1),
 20  c19 varchar2(1),
 21  c20 varchar2(1),
 22  c21 varchar2(1),
 23  c22 varchar2(1),
 24  c23 varchar2(1),
 25  c24 varchar2(1),
 26  c25 varchar2(1),
 27  c26 varchar2(1),
 28  c27 varchar2(1),
 29  c28 varchar2(1),
 30  c29 varchar2(1),
 31  c30 varchar2(1),
 32  c31 varchar2(1),
 33  c32 varchar2(1),
 34  c33 varchar2(1));

Table created.

SQL> create unique index pk_test_key on test_key(
  2  c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32, c33);
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32, c33)
                                                                                                                           *
ERROR at line 2:
ORA-01793: maximum number of index columns is 32


SQL> alter table test_key add constraint pk_test_key primary key (
  2  c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32, c33);

Table altered.

SQL> select index_name, count(*) from user_ind_columns where index_name = 'PK_TEST_KEY' group by index_name ;

INDEX_NAME                       COUNT(*)
------------------------------ ----------
PK_TEST_KEY                            33

1 row selected.

SQL> select index_name, column_name from user_ind_columns where index_name = 'PK_TEST_KEY' order by column_position ;

INDEX_NAME                     COLUMN_NAME
------------------------------ ------------------------------
PK_TEST_KEY                    C1
PK_TEST_KEY                    C2
PK_TEST_KEY                    C3
PK_TEST_KEY                    C4
PK_TEST_KEY                    C5
PK_TEST_KEY                    C6
PK_TEST_KEY                    C7
PK_TEST_KEY                    C8
PK_TEST_KEY                    C9
PK_TEST_KEY                    C10
PK_TEST_KEY                    C11
PK_TEST_KEY                    C12
PK_TEST_KEY                    C13
PK_TEST_KEY                    C14
PK_TEST_KEY                    C15
PK_TEST_KEY                    C16
PK_TEST_KEY                    C17
PK_TEST_KEY                    C18
PK_TEST_KEY                    C19
PK_TEST_KEY                    C20
PK_TEST_KEY                    C21
PK_TEST_KEY                    C22
PK_TEST_KEY                    C23
PK_TEST_KEY                    C24
PK_TEST_KEY                    C25
PK_TEST_KEY                    C26
PK_TEST_KEY                    C27
PK_TEST_KEY                    C28
PK_TEST_KEY                    C29
PK_TEST_KEY                    C30
PK_TEST_KEY                    C31
PK_TEST_KEY                    C32
PK_TEST_KEY                    C33

33 rows selected.

SQL>
SQL> disconnect
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production
SQL>
 

Tom Kyte

Followup  

February 17, 2005 - 9:30 am UTC

32 is the max supported and documented.

you cannot have any expectation above that.

having 33 columns in a constraint would be an illogical extreme as well. that is works is not real world relevant.

don't know why it works, wouldn't even begin to get near to approaching testing it in real life.

Can't use composite key, only ID field is Foreign Key

April 28, 2005 - 12:23 pm UTC

Reviewer: JK Purtzer from Lincoln, NE USA

The developers want to replicate tables across to another database, but did not include a unique ID field, for me to create a Primary Key for replication, the only ID field is a Foreign Key, should I add a unique field or make a Primary Key from the Foreign Key? Am I making things difficult for myself in the future creating a constraint of a constraint?

Tom Kyte

Followup  

April 28, 2005 - 2:02 pm UTC

you cannot make a primary key from a foreign key in general.

does the table have a candidate key? is there something that uniquely identifies a row in there?

No candidate key

April 28, 2005 - 3:12 pm UTC

Reviewer: JK Purtzer from Lincoln, NE USA



Tom
Unfortunately no candidate key rows, not that conform to the Primary Key requirements, I was hoping to do a composite, rows are not unique and have nulls. Ok, I will create one for these tables using a trigger/sequence. I saw referece on this site earlier. Thanks for excellent and timely advice.

JK

composite key resulted in foreign key problem

June 03, 2005 - 5:59 am UTC

Reviewer: raaghid from india

So you are saying to replace composite primary key with surrogate key. What about my case below:

1- I am designing payroll system
2- e.g.,
pms_emp_mas (emp_no, emp_name, address, desig_code (fkey or pms_desig_mas), dept_code (fky of pms_dept_mas) etc..)
pms_desig_mas (desig_code, desig_desc)
pms_dept_mas (dept_code, dept_desc)

Now the requirement is I need to add a company_code in all tables (both master and transaction). Even designation/dept is exclusive for the company.

- If I add company_code in the dept_mas and desig_mas and made that as part of primary key,
- then when I create foreign key in the pms_emp_mas, I am struct, since I need to accomadate (desig_code+company_code) & (dept_code + company_code) - so company_code is repeating. (so I have 10 masters, so 10 company codes will be there in emp_master)

In this circumstance How do we design?

option 1- Make dept_code as primary key in dept_mas /desig_code in desig_mas ---> and have company code in these tables as ordinary column (not included in primary key) -->
so in this case, dept_code is unique & autogenerated through sequence and filtered based on company in the application. (may be companies will have skipped desig code - no problem I think)

(or)

2- Create surrogate key --> make desig_code+company_code as unique constraint. And use surrogate key of desig as foreign key in the employee master. (similrly for dept mas)

Which method is best. Please help.



Tom Kyte

Followup  

June 03, 2005 - 7:38 am UTC

...
So you are saying to replace composite primary key with surrogate key.
......


where did I say that? It is a choice, 1 or 2 columns, maybe 3 columns -- livable. If you have a primary key that is 5, 6, 7 columns -- I'd be thinking hard about surrogate keys yes.


But in answer to your question -- the key is the key, what is your key (and if the key is changing, you'll have to go and make the change).


if desig_code,company_code is the imutable (unchangeable, values are not updated) key, it is perfectly OK to use it. It is your choice.

yes you are right

June 06, 2005 - 3:43 am UTC

Reviewer: Raaghid from India

yes I am wrong, You said that when it is more thatn 3 or 4 columns, go for surrogate key.

But in my case,

desig mas (desig_code (pkey), company_code (pkey), desig_desc)

dept_mas (dept_code (pkdy), company_code (pkey), dept_desc)

emp_mas
-------
company_code - pkey
emp_no pkey
desig_company_code (fkey - desigmas)
desig_code (fkey - desigmas)
dept_company_code (fkey - deptmas)
dept_code (fkey - deptmas)
emp_name

-- Here I need to repeat the company code, because it is a part of primary key in both desig and dept mas. (desig_company_code, dept_company_code)
-- I think this is unnecessary and not required.
-- to avoid what can I do.
-- So I decided to make desig_code (only) as primary key and make company code as just an non-attribute column, so that I don't need to repeat this in employee master. ( and in all other tables also i dont need to repeat, wherever Fkey is referred)

What is your openion Tom.









Tom Kyte

Followup  

June 06, 2005 - 7:23 am UTC

my opinion is that only you know what the true primary keys are in your data model and only you know the functionally dependent data (attributes). You tell us what they are -- it isn't really an opinion, but facts from your model that count in this case.

Immutable

October 27, 2005 - 3:13 pm UTC

Reviewer: Mark Brady from Baltimore, MD USA

"There is no reason the client has to know what the real and true primary key in a physical schema is -- that is an implementation detail."

Let's take a canonical order entry system for, say, an auto supply store. Someone orders a list of parts. I create a header record with the data element unique to the order, name, address, etc. I create detail records for each part, the quantity, price, etc.

I create an arbitrary sequence based Pk to tie all the records together in this system and the shipping system et al.

So I need to give the customer a way of identifying THIS order so that he can get shipment date etc. From your declaritive above, I should not reveal my PK as the number he can refer to to uniquely identify this order. So i should create another sequence and another column and another index to give an Order#? Seems superfluous.

So I'm torn between your 'never' meaning 'it's generally not the case to reveal the real PK' and go through the work and overhead of a different number because if the column has meaning, it's possible that someone, somewhere will want to change it.




Tom Kyte

Followup  

October 28, 2005 - 1:35 am UTC

Let's take that order entry system for say the auto supply store - pre-computers.

You had a sheet of paper with numbers on them.
They were sequential.

But you just spilled your coffee on the tablet - destroying the next 10 pages, they are gone, done, never to be used.

So, you had gaps.

Live with gaps... they exist, they will exist, they have existed - you voided those invoices in the past, you can do it today.

definition of hsec = 100th of a second?

December 05, 2005 - 7:15 pm UTC

Reviewer: A reader

"ops$tkyte@ORA920LAP> exec runStats_pkg.rs_stop(1000)
Run1 ran in 363 hsecs
Run2 ran in 333 hsecs
run 1 ran in 109.01% of the time

3/100ths of a second.. for over 30,000 rows. do the math, it is neglible"

Not being nitpicky - I really want to clarify. Should this be 30/100ths of a second? (363 - 333 = 30)



Tom Kyte

Followup  

December 06, 2005 - 5:36 am UTC

yes, thanks.

Using an 'artificial' primary key

December 06, 2005 - 6:30 am UTC

Reviewer: Serge Shmygelsky from Ukraine

Hello Tom,

I have a table containing user data like username, date of creation etc. Actually, username is unique identifier and I've decided to use it as a primary key. Username is VARCHAR2(30). But my peers told me that it was not a good idea and I should have used an artificial PK like ID of NUMBER datatype filled by a sequence. They didn't give me any reasons except that 'PK should be a number' and some notes about space usage. What is your opinion on that? Is there any reasons to have PK of NUMBER datatype?

P.S. The table is referenced by the only one FK.

Thanks in advance

Tom Kyte

Followup  

December 06, 2005 - 8:16 am UTC

username, as long as it does not change, should be the natural primary key it sounds like to me.

there is no reason why surrogate keys should be used everywhere, in fact the converse it typically true, natural keys should be used - unless the natural key is something subject to change; or unless no natural key really exists.

define "something subject to change"

December 06, 2005 - 8:36 am UTC

Reviewer: Matthias Rogel from Kaiserslautern, Germany

please define "something subject to change"

a name is "subject to change"
a username is very "subject to change"

the only thing not "subject to change" is
something happened in the past


Tom Kyte

Followup  

December 06, 2005 - 9:43 am UTC

change your username in oracle then?

I don't get the air quotes or the subject to change commentary though?


IF (your proposed key is not something that is "subject to change" (the value thereof))
THEN
it is your natural key, feel free to use it, don't be duped into using
surrogates for everything
END IF;



December 06, 2005 - 9:17 am UTC

Reviewer: David Aldridge http://oraclesponge.blogspot.com from Colorado Springs

I don't know about using username as a primary key value. There are a number of circumstances when people want them to change ...

i) to match a new corporate standard
ii) to match a name change
iii) to match a change in application name

I think that we're all used to immutable usernames simply because that is what we always have to use, but there's a reasonable case for being able to change them.

If you had a bunch of code that referenced a username then that would obviously mitigate against actually changing it, but it would be good to have the facility anyway.

Tom Kyte

Followup  

December 06, 2005 - 9:48 am UTC

like I said:

sername, as long as it does not change,

(and said more than once.... :)

December 06, 2005 - 11:49 am UTC

Reviewer: David Aldridge http://oraclesponge.blogspot.com from Colorado Springs

Yes of course, but I was highlighting some actual reasons why there might be cause for usernames to change, reasons that Serge might usefully consider. As Oracle practitioners I think we might be influenced by the immutability of Oracle usernames and extrapolate that into our own designs erroneously.

Tom Kyte

Followup  

December 06, 2005 - 3:02 pm UTC

arg...

Ok, lets take username out of it.


a) do not be afraid to use natural keys.
b) if you have a natural key that is imutable - use it
c) do not use surrogates for everything
d) see a)

December 06, 2005 - 4:12 pm UTC

Reviewer: David Aldridge http://oraclesponge.blogspot.com from Colorado Springs

>> arg...

Heh heh heh.

composite primary key

December 06, 2005 - 6:22 pm UTC

Reviewer: Mark A. Williams from Indianapolis, IN USA

David,

You itching for that thread on dizwell to open up again or something!?!?

I've always viewed changing a user name as something like an index "update". Create the new one, delete the old one. Not an actual in-place change and, given that, a user name is immutable in my eyes.

Now I have run into something of a less than ideal situation recently. I have a nice little schema that uses natural keys and have a mod_plsql application that runs against that schema. When I tried to "migrate" the application to html db I was a bit disappointed that html db does not seem to like composite keys with more than 2 columns. Some of my tables have 3 column natural keys. bleh.

- Mark

Tom Kyte

Followup  

December 07, 2005 - 1:45 am UTC

They are working on that - adding multi-column support to the wizards (i never ran into it because I didn't use the wizards in asktom, all about api's for me)

... surrogate keys rock ...

December 07, 2005 - 3:36 am UTC

Reviewer: A reader

In my personal experience, natural keys, most of the time, turned one time or another into a subject to change, the simplest reason beeing typing errors.

Therefore: never ever use PKs that can be created/changed through "normal" user operation (forms, etc.) - the implications (problems) in complex data models far outweigh their advantages, in my opinion.

C.

December 07, 2005 - 11:41 am UTC

Reviewer: David Aldridge http://oraclesponge.blogspot.com from Colorado Springs

>> You itching for that thread on dizwell to open up again or something!?!?

You're a very bad person, Mark Williams, verrry verrry bad :D

>> I've always viewed changing a user name as something like an index "update". Create the new one, delete the old one. Not an actual in-place change and, given that, a user name is immutable in my eyes.

Often that is forced more by system limitations anyway, right?

That was partly my point before -- that our prior experience with the limitations of other systems can lead us to inherit those limitations in our own. Obviously to change a username in Oracle is no trivial matter when objects owned by that username/schema are referenced by PL/SQL or SQL code, because the change is very difficult to cascade. It would take a very different coding regime to allow that.

However for usernames _within_ the application (ie. the people sitting in front of the tube entering data and generating reports etc) the situation is more conventional. You're likely to store their names against audit records, maybe in a table of roles and permissions, and in that case username changes are more logically feasible (although more practically feasible only if you have hand-rolled your own user/role/permission storage and logic).

There's nothing special about them at all, not in the sense that the username of an object owner in Oracle is special. They're just regular items of data.

more compositing

December 07, 2005 - 2:36 pm UTC

Reviewer: Mark A. Williams from Indianapolis, IN USA

>> You're a very bad person, Mark Williams, verrry verrry bad :D

Most likely due to my unreassuringly utter beardlessness I would have to say ;)

I did find it interesting that in the new Jonathan Lewis book he makes a reference to a system being "cursed" (his word) by surrogate keys. Of course I don't have the book with me right now, so I can't point to the page/context.

Anyway, I think this is one of those topics destined for eternal questioning.

- Mark

System cursed with single-column synthetic keys

December 07, 2005 - 9:24 pm UTC

Reviewer: Mark A. Williams from Indianapolis, IN USA

As a follow-up to my previous comment, it is page 271, section "Join Cardinality for Real SQL", where Jonathan states:

"Unless your system has been cursed with single-column synthetic keys all over the place, you will probably write some SQL that involves two, or more, columns being used in the join condition between two tables."
(Cost Based Oracle Fundamentals, Jonathan Lewis, Apress, 2005)

Anyway, I thought that "cursed" was a fairly strong word so I wrote Jonathan to ask about it. He kindly responded and a synopsis of that response is:

[paraphrase on]
Using a synthetic key usually introduces an additional index/unique constraint into the system. A table typically has columns that could serve as the natural key and, therefore, have a unique constraint on them as well. This introduces additional overhead in areas like redo, undo, latching, and buffer efficiency. The optimizer may examine an additional index which also results in overhead in the shared pool, latching, and cpu consumption. This may also hide information from the optimizer if the real key is hidden behind a join on a meaningless key -- for example, a data distribution pattern in the real key can be lost to the optimizer.

Users of a system often like to query by meaningful information; and primary keys can often propagate down to child tables. This means that you may have to write SQL to join tables because the thing that the user understands is in a table that is a meaningless key away from the data the user wants to see. [this is mostly a direct quote]
[paraphrase off]

Anyway, there you have it, and I hope that my paraphrase did not alter any of Jonathan's intended meaning.

- Mark

December 08, 2005 - 2:47 am UTC

Reviewer: David Aldridge http://oraclesponge.blogspot.com from Colorado Springs

Yes, that all sounds like very valid observations -- the requirement to maintain a unique constraint+index on the natural key in addition to a primary key constraint+index on the synthetic column, as well as an increased number of joins, are well known disadvantages to the synthetic key approach. That it also can obscure statistics is also a consequence of using synthetic keys which is why they ought to be less popular in value-distribution sensitive data warehouses -- almost certainly that is less of a problem in OLTP systems that are less sensitive to value distributions (except for batch and report queries of course).

This is balanced against the performance problem of cascading PK changes to child tables if a natural key changes (and has child tables yadda yadda).

As far as diagnosing performance problems goes, because the potential effects of synthetic keys are spread widely around an application I would suggest that they are more likely to be identified as contributing to a systemic performance problem. On the other hand, should there be a performance problem caused by the need to cascade a changed PK value then the developers already know several things:

i) that it's there.
ii) that it's very obvious and very localised.
iii) what caused it.
iv) that it was avoidable with a synthetic key.

Such situations are interesting because the performance problem of synthetic keys is "the database is slow", while the performance problem of the natural keys is "we got the design wrong". Which circumstance is more likely to lead to public (ie. outside the technical team) discussion of the nature of the problem? :D

Well, all that aside I'm not intending to pick at the raw scab of recent synthetic/natural key debates, I was just saying, "here are some thoughts on the mutability of usernames in particular, and why people think they're not mutable when in some circumstances they potentially ought to be". Honest, guv!

Trigger for sequence

December 21, 2005 - 11:50 am UTC

Reviewer: Jim from Scotland

Great thread.
One thing though Tom (I had actually shortened "though" to "tho'" but thought I had better change it in case I got in trouble :D)
I noticed that you used a trigger to populate the surrogate PK automatically with the sequence. Is this not a bit excess to requirements. Creating a new object with dependencies that must be maintained by Oracle along with the context switches, instead of just using
insert into ... (sequence_name.nextval,....)

Thanks

Jim

Tom Kyte

Followup  

December 21, 2005 - 7:48 pm UTC

I would definitely prefer that myself. Over time - I've become infintely "more vocal" about what I think is the right way to do it...

Thanks

December 22, 2005 - 4:44 am UTC

Reviewer: Jim from Scotland


Toad use &composite primary key use

December 29, 2005 - 3:31 am UTC

Reviewer: arvind from INDIA

gr8 enough to explain

varchar2

March 24, 2006 - 4:04 am UTC

Reviewer: Umesh Kasturi from Bangalore India

can I enforce a primary key on a varchar2 column . Are there any disadvantages of doing so ; compared to the applying the constraint on the numeric column

Tom Kyte

Followup  

March 24, 2006 - 9:50 am UTC

yes, of course - any scalar(s) will do.


a number column in Oracle is a varying length character string similar to a packed decimal number. Not much different from a string.


Use the natural key of an object whenever possible, if that happens to be a string, so be it.

Primary Keys between development and production

April 03, 2006 - 10:57 am UTC

Reviewer: Scott Horowtiz from Smithtown, NY USA

Tom,
I am wondering the best use of primary keys between development and production servers. In order for application development, is it better to make sure the PKs on type tables match between development and production. Or for a condition, create a condition table, and insert the keys you need into this table and reference this table for the condition.

To give you a better idea. We have a tasktype table, and our reports change for specific task types. As many as 8 of this task types can have the same condition in the report. Our PKs are created by triggers on both databases. It seems like too much work to try and keep the PK IDs the same between development and production. My suggestion was to create a table of TaskTypeIDs and use this table for the condition, instead of referencing each one individually. Or adding a flag to the table for the condition.

CREATE TABLE "TASKTYPE" ("TASKTYPEID" NUMBER NOT NULL, "NAME"
VARCHAR2(70 byte)
PRIMARY KEY("TASKTYPEID"))


CREATE TABLE "COS" ("TASKTYPEID" NUMBER NOT NULL
CONSTRAINT "FK_COS_TO_TASKTYPE" FOREIGN KEY("TASKTYPEID") REFERENCES "TASKTYPE"("TASKTYPEID"))

Tom Kyte

Followup  

April 04, 2006 - 9:39 am UTC

why would the data values need to be the same?

More on Primary keys from before

April 03, 2006 - 2:23 pm UTC

Reviewer: Scott Horowitz from Smithown, NY USA

Here's some more info to help you understand what I am tyring to do.

insert into tasktype(tasktypeid,name)
values(1,'Document Retrieval');

insert into tasktype(tasktypeid,name)
values(2,'COS - Short Form');

insert into tasktype(tasktypeid,name)
values(3,'COS - Foreclosure');

insert into cos(tasktypeid)
values(2);
insert into cos(tasktypeid)
values(3);


normally the tasktypeids are created via trigger. But, I wanted to simplify for my purpose.

Select count(*) from tasktype
where tasktypeid in (2,3);

or

select count(*) from tasktype
where tasktypeid in (select tasktypeid from cos);

Using the latter, would prevent having to ensure PKs between development and production are the same.


Tom Kyte

Followup  

April 04, 2006 - 9:46 am UTC

why does it matter, they are two entirely different databases.

Thanks Tom

April 04, 2006 - 10:29 am UTC

Reviewer: Scott Horowitz from Smitthown, NY USA

The reason it matters is for our reporting. We use Crystal Reports for our reporting purposes. Certain tasktypeids have specific parts of the reports that need to come up, while others do not. We can either hard code the reports to these IDs or create the table, and put the ones that meet the condition into the table, thus negating having to hard code the IDs.

I just wanted your opinion on the issue.

Thanks,
Scott

Tom Kyte

Followup  

April 04, 2006 - 7:25 pm UTC

I think your design is horribly flawed if a surrogate key must be the same in two entirely different databases.


You are missing something in your data model then .

So, why do we put data in a database?

April 09, 2006 - 4:09 am UTC

Reviewer: Brad Worsfold from Victoria, BC Canada

To report on it. Using SKeys all over the place is not always a good idea... Usually, I will do a combination where you SKEy your important business entities, but not their relationships.

In this way, you can do reporting much more simply than when all you have are SKeys and massive joins to perform to get to a parent object a few layers up. Most of my clients don't like the idea of having redundant data unless it is truly required.

Just imagine your reporting on a fact table with one table you need to grab the description with 10 joins required? So, be careful on how you use SKEYs, especially when you have to query that massive fact table for reporting purposes!!!

The only draw back is it makes your programmers job a "little" more challenging and you may larger indexes to maintain, depending on whether you can utilize compression or not. In the case of not using SKeys, compression works GREAT!!!

Food for thought,


BradW



Composite Primary Key & Btree Index

May 02, 2006 - 6:39 am UTC

Reviewer: Sujit Mondal from India

The information is really excellent. I have a question as a followup of this:
Table A(A1 NUMBER,A2 NUMBER,A3 NUMBER)
Table B(B1 NUMBER,B2 NUMBER,B3 NUMBER)
Table A and B has a many to many relationship , so Modelled associative entity Table C in following two ways:
1) Table C(A1 NUMBER,B1 NUMBER) a composite Primary Key Index in column A1 and B1. With respective Foreign Key Constraint.
2) Table C(C1 NUMBER,A1 NUMBER,B1 NUMBER) , Where C1(Surrogate Key) is a unique number generated from a sequence and is the Primary Key for C. There is a Unique key constraint on A1,B1. And respective Foreign Key constraint. And
a) A composite B Tree index on (A1,B1)
b) Two B Tree index on column A1 and B1 separately.

Table C usage are like below:
Table C rows won't be made as foreign key to other tables. As it is the bottommost table in the model.

As per the discussion in this page it seems to me that in this scnario above option 1 should be sufficient. Can you please correct me if I am wrong?
If we consider the performance which approach is best?
1 , 2a or 2b?
Is the decision of this model will be based on the size of the table?

Thanks & Regards
-Sujit


Tom Kyte

Followup  

May 02, 2006 - 7:35 am UTC

I cannot think of a case where I would use a surrogate key on C.

C would have the two primary keys, one each from A and B.


Why would I want

a) primary key on C1
b) unique on A1,B1
c) NOT NULL on A1
d) NOT NULL on B1
e) foreign key on A1
f) foreign key on B1

and then to link A to B I would have to

a) create a row in C
b) update A
c) update B


versus

a) primary key on (a1,b1)
b) foreign key on a1
c) foreign key on b1

and then to link A to B I would have to

a) create a row in C

Surrogates and EJB3 etc

May 06, 2006 - 5:45 pm UTC

Reviewer: Tim Onions from Worcester UK

The information here is excellent and confirms everything I believe about PKs and surrogate keys in relational databases.

Surrogate keys, and EJB3/J2EE

May 07, 2006 - 5:06 am UTC

Reviewer: Tim Onions from Worcester UK

...having read this thread with interest I then read up on J2EE/EJB3 and it seems there is strong pressure (or even an absolute need) to design exclusively with surrogate keys for such a framework. So external project pressures may make it impossible to put into practise Tom's good advise. Personally I can't see how such "requirements" could ever result in a performant/maintainable system and severely restricts the DB designers scope for coming up with the most suitable table design. I am not even sure if the surrogate key requirement for EJB3 is really a requirement or simply one of those "it makes it so much easier to develop" arguments.

Tom Kyte

Followup  

May 07, 2006 - 11:30 am UTC

it is sort of sad that your choice of programming language drives your data design.

Since we all know that applications come and applications go, but the data they use stays forever and ever....

DB Design, surrogate keys, EJB3

May 07, 2006 - 4:31 pm UTC

Reviewer: Tim Onions from Worcester UK

"Sad"? - master of the understatement there! "Mad" more like. I think you meant "their" decision not "your" - I had no part in it and continue to fight to get what I see is the right decision (and if not then to wash my hands of it and walk away). Rhetorical question: Does J2EE/EJB3 really need this, sounds like it just makes it easier for a developer somewhere. But thanks to you as ever for your objective wisdom.

Tom Kyte

Followup  

May 08, 2006 - 7:35 am UTC

Some beans that I've had the pleasure of seeing not only use these surrogates (object ids) everywhere, but make you use NULLABLE attributes for everything as they tend to create an empty object and populate it later. Plays havoc with indexing, data integrity, query optimization and everything.

Depends on how the people implementing the stuff do it entirely.

J2EE/EJB3 and surrogate keys

May 09, 2006 - 12:46 pm UTC

Reviewer: Tim Onions from Worcester UK

now I get this statement from our Java gurus:
"there is a limitation in ejb3 tha components of compound Primary keys cannot be foreign keys"!!! I need to get clarification form the guru on this but if it is true then you can see why they are so keen on surrogates - very many natural PKs would be ineligable as they comprise a parent PK plus some other data to make them uniqe. Just had to share this.

Tom Kyte

Followup  

May 09, 2006 - 1:25 pm UTC

they lie to you.

It is a limitation of the toolkit they are using perhaps.

But basically they would be saying "EJB3 cannot use existing legacy data sources, EJB3 makes you reimplement all of your data models before you can start using it"

Ask them if that is what EJB3 really means and if so - why did they choose something so silly?

max value of surrogate keys

May 09, 2006 - 10:48 pm UTC

Reviewer: Karren from Philippines

Hi Tom,

I am planning to use a surrogate key to drop a 6-field composite key. My concern is: Is there a max value for a surrogate key field?
If a system is "built to last" and my surro key is an int; will there be no way that the surrogate key approach its limit?

I'm afraid that the idea of resetting the surrogate key every year is on my head, is that a good approach (if not i don't want to do that extra resetting anyway)

thanks in advance

Tom Kyte

Followup  

May 10, 2006 - 7:40 am UTC

the max limit is


ops$tkyte@ORA10GR2> select rpad('9',28,'9') from dual;
 
RPAD('9',28,'9')
----------------------------
9999999999999999999999999999

using a sequence to populate it.  Even if you generate 1,000,000 of them per second for the next 1,000 years:

ops$tkyte@ORA10GR2> select 1000000 * 60 * 60 * 24 * 366 * 1000 from dual;
 
1000000*60*60*24*366*1000
-------------------------
               3.1622E+16


you will not run out ;) 

J2EE/EJB3

May 10, 2006 - 12:53 pm UTC

Reviewer: Tim Onions from Worcester,UK

You are right (of course) they lie - what they meant was "we do not know how to..." Now they do know. Your final shot about why they chose EJB3 is priceless and neatly sums up the confusion of emotions in my head. Enough said on this part of the thread. You have been a great help for which I thank you.

fat primary keys

June 20, 2006 - 3:48 am UTC

Reviewer: A reader

Hi

In one of your previous answers regarding having a fat primary key would affect the performance you said that it would but can be ignored.

I am not so sure, I have been running a test just for the sake to show our outsourcers that composite PKs do have impacts on the performance when range scans is used, the test showed that

a table's PK made up of 3 columns need twice consisten gets when doing a table join

when changed to a single column PK consistent gets dropped to half

To get a 100 rows with 3 columns PK 220 consistent gets were requiered, with a single PK only 100 were needed.



Tom Kyte

Followup  

June 20, 2006 - 10:01 am UTC


show us your example - and we'll be able to explain what you saw. Then, we can discuss if it is "real world"



drop table t1;
drop table t2;


create table t1
as
select * from (
select object_name, object_type, owner,
max(object_id) object_id, count(*) cnt, rpad('*',30,'*') data
from all_objects
group by object_name, object_type, owner
)
where rownum <= 500
/

alter table t1 add constraint t1_unique1 unique(object_name,object_type,owner);
alter table t1 add constraint t1_unique2 unique(object_id);

create table t2
as
with data as (select level l from dual connect by level <= 25)
select * from t1, data;

create index t2_idx1 on t2(object_name,object_type,owner);
create index t2_idx2 on t2(object_id);

alter session set sql_trace=true;

begin
for x in ( select * from t1 )
loop
for y in (select t1.data t1data, t2.data t2data
from t1, t2
where t1.object_name = t2.object_name
and t1.owner = t2.owner
and t1.object_type = t2.object_type
and t1.object_name = x.object_name
and t1.object_type = x.object_type
and t1.owner = x.owner )
loop
null;
end loop;
for y in (select t1.data t1data, t2.data t2data
from t1, t2
where t1.object_id = t2.object_id
and t1.object_id = x.object_id )
loop
null;
end loop;
end loop;
end;
/
SELECT T1.DATA T1DATA, T2.DATA T2DATA
FROM
T1, T2 WHERE T1.OBJECT_NAME = T2.OBJECT_NAME AND T1.OWNER = T2.OWNER AND
T1.OBJECT_TYPE = T2.OBJECT_TYPE AND T1.OBJECT_NAME = :B3 AND T1.OBJECT_TYPE
= :B2 AND T1.OWNER = :B1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 500 0.04 0.02 0 2 0 0
Fetch 500 0.15 0.13 56 15053 0 12500
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1001 0.20 0.16 56 15055 0 12500

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 106 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
12500 NESTED LOOPS (cr=15053 pr=56 pw=0 time=311032 us)
500 TABLE ACCESS BY INDEX ROWID T1 (cr=1500 pr=3 pw=0 time=22447 us)
500 INDEX UNIQUE SCAN T1_UNIQUE1 (cr=1000 pr=3 pw=0 time=10630 us)(object id 57928)
12500 TABLE ACCESS BY INDEX ROWID T2 (cr=13553 pr=53 pw=0 time=217658 us)
12500 INDEX RANGE SCAN T2_IDX1 (cr=1053 pr=53 pw=0 time=15500 us)(object id 57931)
********************************************************************************
SELECT T1.DATA T1DATA, T2.DATA T2DATA
FROM
T1, T2 WHERE T1.OBJECT_ID = T2.OBJECT_ID AND T1.OBJECT_ID = :B1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 500 0.02 0.12 0 2 0 0
Fetch 500 0.14 0.58 28 15026 0 12500
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1001 0.17 0.70 28 15028 0 12500

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 106 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
12500 NESTED LOOPS (cr=15026 pr=28 pw=0 time=426947 us)
500 TABLE ACCESS BY INDEX ROWID T1 (cr=1500 pr=2 pw=0 time=275722 us)
500 INDEX UNIQUE SCAN T1_UNIQUE2 (cr=1000 pr=2 pw=0 time=12449 us)(object id 57929)
12500 TABLE ACCESS BY INDEX ROWID T2 (cr=13526 pr=26 pw=0 time=228445 us)
12500 INDEX RANGE SCAN T2_IDX2 (cr=1026 pr=26 pw=0 time=12817 us)(object id 57932)


trigger and sequence

June 20, 2006 - 5:32 am UTC

Reviewer: A reader

Hi

I wonder why you would implement an auto increment using triggers to get the sequence number?

Can't we just do

insert......... seq.nextval?

Tom Kyte

Followup  

June 20, 2006 - 10:05 am UTC

yes. and you should

fat primary keys

June 20, 2006 - 10:59 am UTC

Reviewer: A reader

Hi Tom

Try with range scans, in the mean while I will digg the test case I have, post it later
Cheers

Tom Kyte

Followup  

June 21, 2006 - 9:02 am UTC

what do you mean try with range scans. Both of them "range scanned" the child table from the parent.

Else - if you mean "range scan the primary key", use between :x and :y - then I will say your test was likely artifical - and you got a nice good clustering factor on your fake surrogate key that didn't happen for your true primary key.

And I'll call it artificial because - well - why would you range scan on a surrogate key that by definition has no meaning beyond "I am a unique id that should be transparent to the application since all I am is a fake key"????




Design question

August 15, 2006 - 8:40 am UTC

Reviewer: Sara from Pittsburgh, PA

Can you settle an agruement several developers are having? We are using sequences as the primary key on tables. What is the best way to populate this field?

1) Use a trigger that gets sequence.nextval?
2) Write a stored procedure that puts sequence.nextval into a variable and then use that value in the insert statement.
3) Just access sequence.nextval in the insert statement.

Thanks in advance for your opinion.


Tom Kyte

Followup  

August 15, 2006 - 12:14 pm UTC

1) I'm not a fan of that.
2) that is good
3) this is OK too


Why use a trigger (overhead, slower, MAGIC STUFF THAT HAPPENS IN THE BACKGROUND AND CONFUSES PEOPLE NO END) when you don't absolutely have to?

performance stats for surrogate v/s natural

January 12, 2007 - 4:45 am UTC

Reviewer: A reader

Tom,

<<
o it is easier to code :

select * from p, c where p.primary_key = c.foreign_key

then

select * from p, c where p.id1 = c.fk1 and p.id2 = c.fk2 and p.id3 = c.fk3

and the first query runs faster.
>>


We have a similar situation, but there are two natural key columns varchar(16) & varchar(2).
So by having a surrogate key (created by sequence number)...how much performance gain would I get if I join with surrogate key v/s natural key.
The data in the base table is around 20 million.

Regards,
Prasad

tablename_id vs. id

March 22, 2007 - 10:27 am UTC

Reviewer: Mark from NY

Hey Tom,

Have you found any pros and cons in naming a surrogate primary key "tablename_id" as opposed to just "id"? I personally opt for the former - I like to have foreign key column names match their parent's primary key name. Another guy here finds it redundant, saying that when you join you're always typing the table names, or an alias, anyway (e.g. mytable.id = othertable.id).

Any opinion on that?

Thank you,
Mark
Tom Kyte

Followup  

March 22, 2007 - 10:28 am UTC

I don't mind "id" by itself, using a correlation name is a good idea in most all cases so it could be viewed as redundant.

thanks

March 22, 2007 - 10:47 am UTC

Reviewer: Mark from NY

OK, thanks Tom.

Small correction to my "e.g." above: it should read "mytable.othertable_id = othertable.id" because one of them is a FK...

Ugh, I hate just "ID"

March 22, 2007 - 2:43 pm UTC

Reviewer: Mark Brady from Baltimore, MD 21202

SELECT * FROM user_Tab_Columns WHERE
column_name = 'MyTable_ID'


Oh yeah, can't just do that to find all the tables that share a common column.


SELECT * FROM user_Tab_Columns WHERE
column_name = 'MyTable_ID'
UNION ALL
SELECT * FROM user_Tab_Columns WHERE
table_name = 'MyTable' and
column_name = 'ID'



Who cares that it's redundant in one place. Programmers are just too hung up on Maximizing/Minimizing/Optimizing EVERYTHING.


FROM
MyTable mt,
YourTable yt,
hisTable ht,
theirTable tt
WHERE
mt.yourTable_id = yt.id and
yt.hisTable_ID = ht.id and
ht.Theirtable_id = tt.id

I just don't find that easy to quickly decipher. I spend more time bouncing around the aliases and the column names than this:

FROM
MyTable mt,
YourTable yt,
hisTable ht,
theirTable tt
WHERE
mt.yourTable_id = yt.yourTable_id and
yt.hisTable_ID = ht.hisTable_id and
ht.Theirtable_id = tt.Theirtable_id

When you see the same column name on both sides it makes reading (or more importantly debugging) the Statement so much easier.

Even if you only slightly agree with me or you think that it is easier to do it my way but just barely easier, what benefit do you get out of it the other way? Some religious adherence to divine redundancy elimination. I am a developer, I understand the personality flaws/quirks that make us see almost everything we do as an oppotunity for Maximizing/Minimizing/Optimizing. I put the cream and sugar in my coffee first so that the addition of the coffee does the stirring, I save throwing a stirrer away every cup. I mean, I get it. But Come On, give it up when doing the 'Unnecessary' thing actually makes something else easier.
Tom Kyte

Followup  

March 22, 2007 - 3:02 pm UTC

I don't find your approach any more (or less) readable personally.

This is the thing of opinions. I don't mind "ID" by itself.


I take my coffee black - that is the optimization I chose - I get from coffee pot to ingestion faster that way :)

So you must believe in the screw off beer cap then. :-)

March 23, 2007 - 1:05 pm UTC

Reviewer: Mark Brady

Absolutely, there are opinions as to what is more or less readable.

In the case of the poster above, the recalcitrant colleague made an argument not for clarity but for redundancy elimination. My point is that if it is more readable for 1 developer and the rest are like you, see no difference, and the one roadblock is some guy whose only argument is that it is "redundant" then that's a purely religious stand by his captious peer.

As you say, and quite often, not everything is bad (triggers?). Tools have their place. Even redundancy is useful. Ask the space shuttle astronauts if they want to eliminate redundancy.

Redundant isn't a reason unto itself. It should be,

it's redundant and that makes is slower
it's redundant and that could cause an error
it's redundant and that takes a lot more space to store

Another take on why to use composite key and not surrogate key

June 27, 2007 - 12:38 pm UTC

Reviewer: Jon from Omaha, NE USA

Here¿s something I haven¿t seen covered yet. If it has, then accept my apologies for the wasted pixels.

Example 1:

cat
cat_id (PK)
cat_nm (AK 1)

manu
manu_id (PK)
manu_nm (AK 1)

cat_manu
cat_id (PK) (FK) -> cat.cat_id
manu_id (PK) (FK) -> manu.manu_id

item
item_id (PK)
item_nm (AK 1)
cat_id (FK) -> cat_manu.cat_id
manu_id (FK) -> cat_manu.manu_id


When a category and manufacturer combination is changed, the item table must also be modified which helps reinforce the RI since it raises awareness that if you change the relationship between categories and manufactures you change the meaning of the item.


Example 2:

cat
cat_id (PK)
cat_nm (AK 1)

manu
manu_id (PK)
manu_nm (AK 1)

cat_manu
cat_manu_id (PK)
cat_id (FK) (AK 1) -> cat.cat_id
manu_id (FK) (AK 1) -> manu.manu_id

item
item_id (PK)
item_nm (AK 1)
cat_manu_id (FK) -> cat_manu.cat_manu_id

With this example, someone can change the combination of category and manufacturer (update the cat_id or manu_id for a cat_manu_id row in the cat_manu table) which then affects the records in the item table without the programmer/user aware that they just changed the meaning of the item.

I do agree with Tom that when you get to four or more columns in a composite key it can be hard to maintain the SQL that is written, so with that in mind, I will succumb to the surrogate key. But, I still prefer to force the programmer/user to deal with the relationships explicitly via the use of composite keys whenever possible.

What say you community?


Why natural key

January 14, 2009 - 6:11 pm UTC

Reviewer: Arvind Mishra from Pune, India

Hello Tom,

You suggested that "If the above is used as a foreign key in many other tables" then go for surrogate key. But Jonathan Lewis suggests that having composite key as primary key is useful to avoid unnecessary join. This comment can be seen at ( http://jonathanlewis.wordpress.com/2006/12/29/meaningless-keys/ ). Now I am confused. Please explain.

Also, you are always (OK most of the time!) in favour of using natural keys. What are the advantages of doing that?

Regards,

Arvind

Tom Kyte

Followup  

January 16, 2009 - 4:59 pm UTC

I said "many", "a lot of other" - implying "if it is really large - there could be savings to be had by not replicating this to dozens of other tables).

Everything is answered with "it depends", what if you don't need those attributes (eg: the join argument doesn't always hold water, it does frequently, but not always).

I'm almost always in favor of using natural keys. You should only use a surrogate when no natural keys really work (or some other common sense compelling reason says you shouldn't)

primary key

February 17, 2009 - 12:31 pm UTC

Reviewer: A reader

Tom:

You say Primary key should never be updateable.

I agree but does not this mean that any data that is entered by a user should never be a PRIMARY Key since a user might make a data entry error and wants to fix it.

For example, if i receive an Invoice from company XYZ and i log it into my invoice table, I would not be able to use their invoice number because i might make a typo and want to fix it so i should use a surrogate invoice id (system generated) for PK. correct?
Tom Kyte

Followup  

February 17, 2009 - 2:36 pm UTC

invoice number would not be the primary key would it.

I give you invoice #42
She gives you invoice #42
He gives you invoice #42

Perhaps it is part of a primary key - But sure, if you feel you are going to modify the field - it is not a primary key, it might be unique, it might be not null, but a primary key - it is not, not if you have in place mechanisms to update it.

keys

February 17, 2009 - 5:10 pm UTC

Reviewer: A reader

Tom:

My point is that all natural KEYS are part of the data entered by user (not system generated). I can't imagine a system that wont let a user fix a typo or data entry error.

Does not that mean to not use natural keys as they are most of time updateable.
Tom Kyte

Followup  

February 17, 2009 - 8:20 pm UTC

No they are not (all natural keys are NOT that way). Do not twist around what has been said.

Use natural keys when natural keys make sense.

Ok, a state lookup table. Should it be:

state_cd primary key, state_name
or should it be
idiotic_number number primary key, state_cd, state_name

that is a rather trivial example, some people would go with option 2. I would not.

And actually, there is quite the easy solution that would even let that hand input invoice work nicely.

If you input the wrong invoice, you input the wrong invoice. Delete the incorrect information, insert the correct information.


Use natural keys when natural keys *make sense* and no, most data in a database is not hand entered in the year 2009. Most data is not.

February 17, 2009 - 11:03 pm UTC

Reviewer: A reader

Tom:

I see your point on validation tables. I meant more on transaction tables.

Still if you have screens to maintain a state lookup table and a user entered "VC" instead of "VA", then he has to go back and update the code.

Your solution is to delete the record and enter a new one and not update. right!
Tom Kyte

Followup  

February 18, 2009 - 7:28 am UTC

most transaction tables are not populated by hand - think about it, many transactional system do hundreds/thousands of transactions per second. Many of those tables are never touched by human hands.


If the user entered VC instead of VA, they need to delete VC and input VA, yes.


composite Primary KEY

August 10, 2009 - 3:24 pm UTC

Reviewer: Vikas Sharma from delhi india

Hi Tom,

I am designing data tables for a web based SAAS application. All the tables are going to have a primary key column say (col_pk) and a tenant_id column. All though col_pk is primary key but i want to create a composite primary of col_pk and tenant_id as almost of select on all table will have at least one where condn like "select * from mytable where col_pk = some_value and tenent_id = other_value".

This would prevent some other tenent to see the pk data of some other tenent apart form other security.

Please suggest.

Thanks

Vikas Sharma





Tom Kyte

Followup  

August 13, 2009 - 8:57 am UTC

you would

ops$tkyte%ORA10GR2> create table t
  2  ( x int constraint t_pk primary key using index( create index t_pk on t(x,y) ),
  3    y int,
  4    z int
  5  )
  6  /

Table created.



do not modify your primary key for an indexing consideration. You can use an index on (x,y) to enforce a primary key on (x)

Keep your key what your key should be.

Order of composite primary key columns important?

August 27, 2009 - 1:04 pm UTC

Reviewer: Joris Vleminckx from Santiago, Chile

Tom, do you know whether the order of the columns in the composite primary key declaration is important? I am using Oracle 10g.

For instance, in a table with two columns
does CONSTRAINT "MY_PK" PRIMARY KEY ("COL1", "COL2") mean exactly the same thing as CONSTRAINT "MY_PK" PRIMARY KEY ("COL2","COL1")?

Tom Kyte

Followup  

August 28, 2009 - 4:59 pm UTC

if you have col1, col2, it will be good for queries (in general) of the form:

where col1 = ?
where col1 = ? and col2 = ?
where col1 = ? and col2 > ? (or less than - any range)

you want things that you use "=" on at the front of the index and then things you range on at the end.


if you do not do "col1 = ? and col2 > ?" but you do "col1 > ? and col2 = ?" then I would suggest (col2, col1) as the correct ordering.

Composite Key or Surrogate Key

March 31, 2010 - 11:29 am UTC

Reviewer: Sujith from TX, USA

Hi Tom,

I have a question. Here is the data structure.
--Invoice Header Data
CREATE TABLE sujith_invoice_header_tab(invoice_id number, invoice_date date, invoice_amount number);

invoice_id is the PK

--Invoice Lines
CREATE TABLE sujith_invoice_item_tab(invoice_id number, item_id number, line_code VARCHAR2(20), line_amount NUMBER);

Invoice_id and item_id makes up the PK, invoice id is the FK to header

CREATE TABLE sujith_invoice_line_tax_tab(invoice_id number, item_id number, tax_code VARCHAR2(20), tax_amount NUMBER);

Invoice_id, item_id and tax_code is the PK and there are two FK to header and item.

What if someone argue that you are repeating same data (invoice_id and line_item) and that instead you should use unique system generated(sequence + trigger) id's for line item and tax tables?

What are the pros and cons of both approaches and which one would you recommend??

Thanks,

Sujith



Tom Kyte

Followup  

April 05, 2010 - 12:31 pm UTC

You should always use natural keys when ever possible and practical.

Here it is not only possible, it is practical. You are "repeating" (not really) data that is immutable - never changing. Perfect.

Phrase the response this way "why would you add the great overhead to do something that adds nothing, using a trigger - extra code that has to run and will make every insert take much much longer, using a sequence with its requisite recursive sql and commits - do you just want to add code that will slow down everything and make everything harder just because you get paid by lines of code?"


turn it around, ask them what the advantage of a surrogate key would be?

Natural Key Vs. Surrogate

March 31, 2010 - 4:18 pm UTC

Reviewer: Sujith from TX, USA

Hi Tom,
After posting the above question I kept googling and found the answers I was looking for is under "Natural Key Vs. Surrogate" topic.

Is there a Oracle specific view/points and recommendations under that topic??

Thanks,

Sujith
Tom Kyte

Followup  

April 05, 2010 - 12:45 pm UTC

sure, use natural keys whenever possible and practical, that is what they are there for.

foreign key as primary key

April 17, 2010 - 9:36 am UTC

Reviewer: A reader

What are the pros and cons of using the primary key of the parent table as part of the primary key in the child tables versus just having it as a foreign key in the child tables?

Please shed some light on this.
Tom Kyte

Followup  

April 17, 2010 - 12:56 pm UTC

It is called a natural key when you do that and it is great in general. You would be hard pressed to find a case against doing it.

April 17, 2010 - 3:40 pm UTC

Reviewer: A reader

Hi Tom,

Thanks for your response.Are there any performance issue, if we use the foreign key as part of the primary key?

Tom Kyte

Followup  

April 19, 2010 - 8:30 am UTC

it'll in general perform 'faster' since the table will be smaller and easier to insert into (less work to generate a surrogate).

And you'll have less joins - you won't need to go to parents of parents in order to figure out key information.

April 17, 2010 - 3:46 pm UTC

Reviewer: A reader

Hi Tom,

Can you please explain what you are trying to convey in the below line with an example?
"You would be hard pressed to find a case against doing it"

Are there any performance issue, if we keep the foreign key as part of the primary key?

Thanks for your help.

Regards,
Chella
Tom Kyte

Followup  

April 19, 2010 - 8:39 am UTC

Can you please explain what you are trying to convey in the below line with an example?
"You would be hard pressed to find a case against doing it"


Umm, well, I'm sort of saying the opposite of what you want. I'm asking YOU for an example to the contrary. I'm saying you would have to search long and hard to find a case in which using the surrogate instead of the natural key here would be "good".


Create table documents ( doc_id primary key, .... );

now, a child table could be:

create table document_versions (doc_id references documents, doc_version, ... primary key(doc_id,doc_version) );


that is the way I would do it, I would not create yet another surrogate - I would use the natural key for this table, not a faked out one.


It would take less storage (I would need to have doc_id and doc_version regardless, using a surrogate would add SOME_KEY to the table and a unique index on SOME_KEY - in addition to the index on doc_id that would be needed in all cases - and the unique constraint on doc_id,doc_version, and the not null constraints on both as well).

If I had a child table of this child table - it would simply carry down the doc_id,doc_version - and we could end up doing a lot less joining as the information about the parents parent (and so on) is already there.



Let me turn this around, now you present a case whereby it would make sense to use a surrogate in the child tables?

Something changed from 2001 up today?

December 14, 2010 - 1:04 pm UTC

Reviewer: A reader

Hi Tom,
reading the first post, I understand you was in favor of using a surrogate key, only is some particular circumstances and mostly because of:
A) the space savings (when the key columns are more than 3 and referenced in many tables)
B) the update cascade (when could happen) is avoided, update is limited to one row in one table
C) coding requires less effort (only one column to join)
Going on with reading, I was not able to find any answer supporting/suggesting an advantage in terms of performances using surrogate instead of natural keys. At the contrary, I read many other reasons for not using surrogates keys. Would you change/add anything in your answer on this subject after many years and in relation with Oracle 11gR2?
Tom Kyte

Followup  

December 15, 2010 - 12:31 am UTC

I am (and have always been) in favor of using a natural key (an immutable natural key) when they exist and are of a reasonable size - surrogates otherwise.

if they are immutable - update cascade is not an issue to even discuss.
if they are of reasonable size - the space discussion isn't entirely relevant.


As with most all things - the answer is "it depends". If I have a two, maybe three attribute natural key available - I'd most likely use it.

composite Primary key

May 12, 2012 - 2:10 pm UTC

Reviewer: Ahmad Al-Sallal from Jordan-Amman

Hi Tom,
I'm in a problem again :(
Currently I'm working with a company that bought it's ERP application from a Vendor (actually bad software vendor).
The "Main Master Table" has a composite primary key with 8 columns !!! and there are more than 10 detailed tables for this table.
My duty is to generate/extract a management reports, while doing that if need to join 3 tables i have to write 16 join condition, of course if i need to join 8 tables it's 56 join conditions !!!!
My Question: I want to build a new schema to simplify the structure by adding a new column "ID" instead of the columns primary key/foreign keys, how can implement it???

My thoughts:
- create new user
- create a sequence
- create a mapping_table with 9 columns
1 -> new primary key "ID" (using the sequence)
[2-9] -> the 8 old composite primary key
- create trigger on the Master Table After I/U/D to I/U/D on the mapping table
- create a Materialized view for the Master Table (join between Old Master and the mapping_table "to get the ID")
- create a Materialized views for all detailed Tables
after doing that, can i create indexes on materialized views???


Tom Kyte

Followup  

May 12, 2012 - 4:02 pm UTC

The "Main Master Table" has a composite primary key with 8 columns !!! and
there are more than 10 detailed tables for this table.
My duty is to generate/extract a management reports, while doing that if need
to join 3 tables i have to write 16 join condition, of course if i need to join
8 tables it's 56 join conditions !!!!,


other than being a pain for you ONCE (views exist after all) there is nothing inherently wrong with using a natural key.

Oh and - if you have a parent with 3 detail tables, I'd like to see the query that joins them together (unless you have an aggregate of some sort involved in ALL THREE DETAILS)

but think "view", views are good - build a view that you can use for a few reports, build another view for another few reports and so on. Views are good for this.

My Question: I want to build a new schema to simplify the structure by adding a
new column "ID" instead of the columns primary key/foreign keys, how can
implement it???


You can't
You shouldn't
You won't

It just isn't necessary.