Hi guys. I have a question.
I have a huge table (let's say, over 50M rows)
Right now it doesn't have a natural PK, and in order to improve data integrity, I want to create a natural key to protect against duplicates.
The question is: what is going to be the impact on performance when inserting new data?
What is the best way to test it ?
Thanks in advance,
Niv
Here's a simple example of what you can do
SQL> set timing off
SQL>
SQL> drop table T purge;
Table dropped.
SQL> drop sequence pk_Seq;
Sequence dropped.
SQL>
SQL> create sequence pk_seq cache 5000;
Sequence created.
SQL>
SQL> create table T (
2 pk int default pk_seq.nextval, <=== 12c feature
3 natural_key int );
Table created.
SQL>
SQL> alter table T add primary key ( pk );
Table altered.
SQL>
SQL>
SQL> create or replace
2 procedure hammer_time(p_offset int) is
3 err pls_integer := 0;
4 begin
5 dbms_random.seed(0);
6 for i in 1 .. 50000 loop
7 begin
8 insert into T ( natural_key )
9 values ( p_offset + dbms_random.value(1,100000) );
10
11 commit;
12 exception
13 when dup_val_on_index then err := err + 1;
14 end;
15 end loop;
16 dbms_output.put_line('dup cnt = '||err);
17 end;
18 /
Procedure created.
SQL>
SQL> set timing on
SQL> exec hammer_time(0);
PL/SQL procedure successfully completed.
Elapsed: 00:00:06.25
SQL>
SQL> truncate table T;
Table truncated.
Elapsed: 00:00:00.03
SQL>
SQL> alter table T add constraint UQ unique ( natural_key ) ;
Table altered.
Elapsed: 00:00:00.01
SQL>
SQL> set timing on
SQL> exec hammer_time(0);
PL/SQL procedure successfully completed.
Elapsed: 00:00:11.68
SQL>
SQL>
So I start with a table that has a surrogate primary key and a natural key column, but the latter is not yet indexed. Then I run an insertion test (I'm assuming one row at a time, but you can model this to suit your application).
Then I truncate and try again, this time with the unique constraint (and its index) in place.
Why did I create a proc with an offset parameter? Well, I can now extend this test to use dbms_scheduler to run tens of these concurrently, each with a different offset, to mimic concurrent sessions.
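For instance, the concurrent version could look something like this. It's just a sketch; the job names and the offset spacing (one million apart, wide enough that the random ranges never overlap) are my own illustrative choices, not something from the test above.

```sql
-- Launch 10 concurrent copies of hammer_time via dbms_scheduler.
-- Each job gets a non-overlapping offset so sessions collide on index
-- blocks but not on key values.
begin
  for j in 1 .. 10 loop
    dbms_scheduler.create_job(
      job_name   => 'HAMMER_JOB_' || j,
      job_type   => 'PLSQL_BLOCK',
      job_action => 'begin hammer_time(' || (j * 1000000) || '); end;',
      enabled    => true,    -- start immediately
      auto_drop  => true );  -- remove the job definition once it finishes
  end loop;
end;
/
```

You can then watch dba_scheduler_running_jobs to see when the batch completes, and compare the total throughput with and without the unique constraint.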
Similarly, the distribution of natural keys will have an impact. I'm using dbms_random, but ideally you would use something that's close to what you'll have in real life. For example, the keys could be mainly ascending, which means the index will stay quite dense; if they're totally random, you'll see index block splits more frequently.
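To test the ascending case, you could swap the random generator for a sequence. The sequence name and procedure below are hypothetical, just to show the shape of the variant:

```sql
-- Ascending-key variant: nk_seq stands in for a "mostly ascending"
-- natural key in real life (e.g. a timestamp or order number), so new
-- entries always land on the right-hand edge of the index.
create sequence nk_seq cache 5000;

create or replace
procedure hammer_time_asc is
begin
  for i in 1 .. 50000 loop
    insert into T ( natural_key ) values ( nk_seq.nextval );
    commit;
  end loop;
end;
/
```

Comparing its elapsed time against the random-key run gives you a feel for how much of the overhead comes from the key distribution rather than the constraint itself.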
I've also just done a straight timing test. You can also run this benchmark with tracing enabled, or with AWR or statspack, to gauge other impacts (e.g. redo, locks, latching, and so on).
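A minimal way to do that, assuming you have the Diagnostics Pack licence for AWR (statspack is the free alternative), would be to bracket the run with snapshots and a SQL trace:

```sql
-- Take an AWR snapshot before the run (Diagnostics Pack required).
exec dbms_workload_repository.create_snapshot;

-- Enable SQL trace with wait events for this session.
exec dbms_monitor.session_trace_enable(waits => true, binds => false);

exec hammer_time(0);

exec dbms_monitor.session_trace_disable;

-- Snapshot after the run; an AWR diff report between the two snapshots
-- shows redo generated, latch activity, index block splits, etc.
exec dbms_workload_repository.create_snapshot;
```

The raw trace file can then be processed with tkprof to see where the extra time from the unique index is actually being spent.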
Hope this helps.