Skip to Main Content
  • Questions
  • Add Primary Key to Large Table - Update or Create As Select/Drop Rename

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Jim.

Asked: June 21, 2016 - 10:07 pm UTC

Last updated: June 22, 2016 - 4:42 am UTC

Version: Oracle 11g EE 11.2.0.4.0 - 64bit

Viewed 10K+ times! This question is

You Asked

Hi,

I'm new at my company, and they have a table with a little over 200 million records and 260 columns, and there is no primary key. I need to add a primary key for a new process.

I've seen articles on here recommending to create a new table with select from the current table when updating large tables (CREATE table AS (SELECT from current table --updates--)), then drop old table and rename new table. My questions is, will using that method have a smaller affect on UNDO and REDO vs updating a new column that is NULL with sequential values? With that many columns, it seems like it would be less of an impact to just add the column an update it.

Any advice would be greatly appreciated

Thanks,
Jim

and Connor said...

Option 1 - is there a set of columns that could be used as the primary key ?

If not, then see Option 2
If so, then you can use that, rather than adding a surrogate key, which means

a) make the columns not null if they are not already

b) check for any duplicates

select col1, col2, count(*)
from t
group by col1, col2
having count(*) > 1

c) once duplicates are cleaned up, then you can add an index online, and base the primary key on that

create index IX on T ( col1, col2 ) online;
alter table T add constraint PK primary key ( col1, col2 ) using index;

Option 2

If you need to add a (non-null) column, then a key decision is what will happen to each row.

If by adding the column, and issuing the update, the rows need to move (ie, they no longer fit where they were), then thats going to be major problem. Or if the rows were compressed, then an update means a decompress...also a major problem. So then you are into create-table-as-select (ctas) territory.

CTAS also has the (potential) benefit of doing the operation in nologging fashion, which would reduce your undo/redo to near zero

I often prefer ctas because it means that sometimes I can "sneak" in some things I would have like to have done but never had the time to do, eg change precisions, compress rows, change pctfree, etc etc....

But the best way - benchmark each with a subset of the data, and see what will best suit your situation.

Hope this helps.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Plus

Ghassan, June 22, 2016 - 5:04 am UTC

I add to option 1:

C) visit the business and functional requirements . We cannot simply technically define alone a pk.
Espacially non surrogate ones.

Looking at current data is not enough maybe in future incoming cases this select group by returns undesirable output. . So visit business. ..