You Asked

My application uses NUMBER columns as primary keys, and uses sequences to populate the column (via a trigger with the typical "if :new.customer_id is null then select seq_customer_id.nextval into :new.customer_id from dual" construct).

Now, I would like to move/copy subsets of data between different databases (each of which has identical schemas/tables), but because of the sequence-generated values there might be collisions where the primary key value has already been used in the target database.

This could have been solved by using GUIDs as primary keys, but this would require a significant rewrite of the application.

Is it possible to generate globally unique integers, and if yes, how do I do this, and is this a good idea from a performance and indexing point of view? Could I write a function that returns a globally unique number and modify my triggers to call this function instead of just selecting the next value from a sequence? Are there any gotchas with this approach?

I think that APEX does something similar under the covers, given the way it assigns internal IDs for lists, list items, templates, etc. Also, there is some discussion about the same concept for PostgreSQL here: http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/

Your advice would be greatly appreciated!

and Connor said...

Hi Morten,

Some ideas that spring to mind just using normal sequences

1) Choose a limiting value for the maximum number of customers you could ever expect. Then use multiples of that. eg Lets say you never expect more than 1,000,000,000 customers. Then you just have

database 1: sequence start with 1
database 2: sequence start with 1,000,000,000+1
database 3: sequence start with 2,000,000,000+1

and so forth.

or

2) If you want customers as they are created to be relatively "clustered", you could do something similar, but using the maximum number of databases expected. eg Lets say you have a limit of 100 databases, then you would have:

database 1: sequence start with 1 increment by 100
database 2: sequence start with 2 increment by 100
database 3: sequence start with 3 increment by 100

Thus you still are using sequences with all the simplicity and scalability they give.

Rating

(2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Converting sys_guid to number

Morten, December 12, 2016 - 12:20 pm UTC

Thanks for the feedback! After posting the original question I also found some blog posts detailing the approach that involves using sys_guid() to generate a globally unique value, and then using to_number on that value to continue using numbers rather than GUIDs/RAWs.

See, for example, these blog posts:

http://davidsgale.com/how-do-you-identify/

http://sql-plsql-de.blogspot.no/2009/07/sequence-oder-sysguid-alternativen-zum.html

Care to comment on this approach? It seems to give the benefit of globally unique values, at the cost of some more disk space, but without having to decide in advance how many customers or databases there are going to be.

But what about performance? Would these big numbers be slower to use in queries (lookups by primary key value or filters when they are used as foreign keys) than the smaller numbers? Carsten mentions that the buffer cache gets filled up/exhausted faster when the key values are larger? And indexing would suffer from more fragmented values?

December 13, 2016 - 2:14 am UTC

GUID's will consume more space (simply because they are bigger).

Whilst surrogate keys are meant to be "meaningless" in theory, it often is not the case in the "real world". For example, (in a sqeuence PK model) if someone is regularly querying ID=123123, then the probability that people are querying ID's *similar* to that number is often quite high, because they are querying the "more recent" data. So the purists viewpoint that every ID is equally likely to be queried is false.

Hence for *many* applications, you get a performance benefit, because you get better usage of the buffer cache. Conversely, for a high volume app, it might be the opposite of what you want, because you get increased contention for those blocks.

GUID's (like reverse key indexes) will generally create more sparse indexes because they'll be splitting blocks rather than "appending to the end" like a sequence-based one. Once again - that could be bad, or could be good, depending on your transaction volume.

Performance

Rajeshwaran, Jeyabal, December 14, 2016 - 10:18 am UTC

....
But what about performance? Would these big numbers be slower to use in queries (lookups by primary key value or filters when they are used as foreign keys) than the smaller numbers? Carsten mentions that the buffer cache gets filled up/exhausted faster when the key values are larger? And indexing would suffer from more fragmented values?
....

Ofcourse 40% more CPU than sequence based key columns.

demo@ORA12C> create table t1(x number constraint t1_pk primary key,
  2     data varchar2(100) );

Table created.

demo@ORA12C>
demo@ORA12C> create table t2(x number constraint t2_pk primary key,
  2     data varchar2(100) );

Table created.

demo@ORA12C> create sequence s1 cache 1000;

Sequence created.

demo@ORA12C>
demo@ORA12C> variable n number
demo@ORA12C> exec :n := 100000;

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

demo@ORA12C> begin
  2     for i in 1..:n
  3     loop
  4             insert into t1(x,data)
  5             values( to_number(sys_guid(),'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') ,'x');
  6             commit;
  7     end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

demo@ORA12C> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

demo@ORA12C> begin
  2     for i in 1..:n
  3     loop
  4             insert into t2(x,data)
  5             values( s1.nextval ,'x');
  6             commit;
  7     end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

demo@ORA12C> exec runstats_pkg.rs_stop(1000);
Run1 ran in 2662 hsecs  and CPU time in 1238 hsecs
Run2 ran in 1878 hsecs  and CPU time in 789 hsecs
run 1 ran in 141.75% of the time

Name                                  Run1        Run2        Diff
STAT...commit cleanouts            201,623     200,483      -1,140
STAT...commit cleanouts succes     201,619     200,478      -1,141
STAT...deferred (CURRENT) bloc     148,831     150,005       1,174
STAT...immediate (CURRENT) blo       1,625         373      -1,252
STAT...calls to kcmgcs               3,038       1,650      -1,388
STAT...db block gets from cach     102,947     101,434      -1,513
STAT...physical reads for flas       5,239       3,716      -1,523
STAT...physical read total IO        5,249       3,725      -1,524
STAT...non-idle wait count           5,285       3,761      -1,524
STAT...physical read IO reques       5,242       3,718      -1,524
STAT...physical reads cache          5,242       3,718      -1,524
STAT...physical reads                5,242       3,718      -1,524
STAT...free buffer requested         5,275       3,742      -1,533
LATCH.undo global data             306,870     304,750      -2,120
LATCH.object queue header oper      22,940      20,131      -2,809
STAT...redo entries                309,693     303,835      -5,858
STAT...messages sent                28,374      19,988      -8,386
LATCH.lgwr LWN SCN                  28,393      20,001      -8,392
LATCH.Consistent RBA                28,352      19,783      -8,569
STAT...db block changes            615,111     605,992      -9,119
STAT...db block gets               522,259     512,902      -9,357
STAT...db block gets from cach     522,259     512,902      -9,357
STAT...session logical reads       525,393     515,079     -10,314
LATCH.messages                      62,757      42,792     -19,965
LATCH.redo allocation               85,648      60,145     -25,503
LATCH.redo writing                  85,176      59,504     -25,672
LATCH.cache buffers chains       2,129,928   2,094,408     -35,520
STAT...session uga memory max      123,512      65,488     -58,024
STAT...KTFB alloc time (ms)         68,898       3,345     -65,553
STAT...calls to get snapshot s     100,663     200,494      99,831
STAT...session pga memory max            0     131,072     131,072
STAT...session pga memory          -65,536     196,608     262,144
LATCH.sequence cache                     0     300,200     300,200
STAT...file io wait time        13,106,639   8,929,040  -4,177,599
STAT...undo change vector size  28,674,336  23,982,244  -4,692,092
STAT...KTFB alloc space (block  44,040,192  32,505,856 -11,534,336
STAT...redo size                99,250,308  87,146,032 -12,104,276
STAT...physical read bytes      42,942,464  30,457,856 -12,484,608
STAT...physical read total byt  43,057,152  30,572,544 -12,484,608
STAT...cell physical IO interc  43,057,152  30,572,544 -12,484,608
STAT...logical read bytes from######################## -84,492,288

Run1 latches total versus runs -- difference and pct
        Run1        Run2        Diff       Pct
   3,718,603   3,886,768     168,165     95.67%

PL/SQL procedure successfully completed.

demo@ORA12C>

while sequence based keys lead to appealing clustering factors.

demo@ORA12C> exec dbms_stats.gather_table_stats(user,'T1');

PL/SQL procedure successfully completed.

demo@ORA12C> exec dbms_stats.gather_table_stats(user,'T2');

PL/SQL procedure successfully completed.

demo@ORA12C> select index_name, clustering_factor,t.num_rows,t.blocks
  2  from user_indexes i,
  3      user_tables t
  4  where i.table_name in ('T1','T2')
  5  and i.table_name = t.table_name
  6  order by index_name ;

INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
---------- ----------------- ---------- ----------
T1_PK                  99745     100000        496
T2_PK                    164     100000        244

2 rows selected.

90-10% leaf block split up for Sequences and 50-50% split up for guid based key columns.

demo@ORA12C> analyze index t1_pk validate structure;

Index analyzed.

demo@ORA12C> select name, blocks,lf_blks,btree_space,used_space,pct_used from index_stats;

NAME              BLOCKS    LF_BLKS BTREE_SPACE USED_SPACE   PCT_USED
------------- ---------- ---------- ----------- ---------- ----------
T1_PK                640        541     4336032    3175082      74

1 row selected.

demo@ORA12C>
demo@ORA12C> analyze index t2_pk validate structure;

Index analyzed.

demo@ORA12C> select name, blocks,lf_blks,btree_space,used_space,pct_used from index_stats;

NAME             BLOCKS    LF_BLKS BTREE_SPACE USED_SPACE   PCT_USED
------------ ---------- ---------- ----------- ---------- ----------
T2_PK               256        187     1504032    1490919        100

1 row selected.

Range scans will be affected due to clustering factors.

demo@ORA12C> column min_x new_val min_x
demo@ORA12C> column max_x new_val max_x
demo@ORA12C>
demo@ORA12C> select min(x) min_x,max(x) max_x
  2  from ( select x from t1
  3  order by x
  4  fetch first 100 row only );

     MIN_X      MAX_X
---------- ----------
2.5686E+33 3.4810E+35

1 row selected.

demo@ORA12C> set autotrace traceonly explain statistics
demo@ORA12C> select * from t1 where x between &min_x and &max_x;
old   1: select * from t1 where x between &min_x and &max_x
new   1: select * from t1 where x between 2.5686E+33 and 3.4810E+35

99 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   103 |  2472 |   104   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   103 |  2472 |   104   (3)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"<=3.4810E+35 AND "X">=2.5686E+33)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        444  consistent gets
          0  physical reads
          0  redo size
       3114  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed

demo@ORA12C> set autotrace off
demo@ORA12C>
demo@ORA12C> select min(x) min_x,max(x) max_x
  2  from ( select x from t2
  3  order by x
  4  fetch first 100 row only );

     MIN_X      MAX_X
---------- ----------
         1        100

1 row selected.

demo@ORA12C>
demo@ORA12C> set autotrace traceonly explain statistics
demo@ORA12C> select * from t2 where x between &min_x and &max_x;
old   1: select * from t2 where x between &min_x and &max_x
new   1: select * from t2 where x between          1 and        100

100 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1879001825

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   100 |   700 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |   700 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T2_PK |   100 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X">=1 AND "X"<=100)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       1647  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

demo@ORA12C> set autotrace off
demo@ORA12C>

GUID’s are

Randomly unique values
Will insert randomly across the entire breadth of the index.
will need the entire index in memory in order to say “we do no physical reads on the index during an insert”
These values are NOT monotonically increasing values
They have a terrific impact on the clustering factor of the index (close to the number of rows - so range scans are not likely possible).
Index will be larger in size when compare to sequence based index, will end up with 50-50 index splits ( half of the index will be empty at any time – that’s ok – don’t rebuild them to reclaim the space, that will only be a waste of time)

Sequences are

Sequentially unique values
As you insert – the values are increasing, you only hit the right hand size of the index.
Only the right hand side of the index in the cache, the left hand side – which you don’t query so much anymore since it is old data – it is rarely in the cache,  hence when we insert we into this index – you will do very little if any physical IO.
These values are monotonically increasing values ( as you insert into it, the values are increasing and hit the right hand side of index)
They will end up with an appealing clustering factor of the index (close to the number of blocks – so range scans are likely possible)
Index will be thin, will end up with 90-10 split ups

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

Converting sys_guid to number

Performance

More to Explore

PL/SQL demos

PL/SQL docs

Globally unique integers

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

Converting sys_guid to number

Performance

More to Explore

PL/SQL demos

PL/SQL docs