Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question.

Asked: November 26, 2006 - 4:06 am UTC

Last updated: November 21, 2011 - 10:46 am UTC

Version: 10.2

Viewed 10K+ times! This question is

You Asked

Tom,

Our application inserts a lot of records (it's a web-selling app) to a table. We were planning to use Sequences as a generator of unique values for the TRANS_ID column. Later we were told to consider GUIDs as unique numbers.

The break down of selects, inserts and updates are roughly 60:30:10. The selects and updates are queried on the TRANS_ID column:

select ... from trans where trans_id = ...
update trans set ... where trans_id = ...

From your site and my own analysis, I conclude that:

(1) sequences require more latches
(2) GUIDs may be converted to varchar2 to be effectively used in programs, etc. The long values (32 chars) increases logical I/O and redo. This increase in I/O will increase the latches as well, specifically "cache buffer chains". The bigger size also increases consitent gets.

So, I'm confused about my choice. Assuming the long GUIDs are acceptable as keys; should I consider them more efficient than sequences as unique values? Most seems to think otherwise; but my tests clearly indicate more latches. Since "latch=serialization", shouldn't they be avoided, especially when GUIDs are readily available?

Will the answer change in a RAC environment, where the intra-instance communication will exacerbate the issues with sequences?

and Tom said...

well, GUIDS when stored would be stored in RAW fields - they would be 16 bytes, not 32, when stored.

You would use:

where trans_id = hextoraw( :x );

and you would select:

select rawtohex(trans_id) ...


if you want to use "strings" in your application (reasonable to do)

Sequences, if set to a large enough cache size (cache defaults to 20, inadequate for many high volume applications, a setting of a 1,000 or more makes great sense) are used by most all high volume Oracle applications.

One of the downsides will be that they are a monotonically increasing number - hence the (unique) index on them will be very "hot" on the right hand side. This can be offset by

o hash partitioning (so you have 8, 16, 32, 64, ... "hot" right hand sides)
o reverse key indexes (so you spread the data out in the entire index structure)


GUIDS will be somewhat randomly distributed and will not cause a "hot" right hand side index. They will always be 16bytes (we use them in our nested table implementation - the parent table has a 16byte raw column added for each nested table type it includes and the child table has as part of it's structure this same 16byte raw)


I'm not sure which you say "require more latches". Your second to last paragraph was not clear which you thought took more



Table created.

ops$tkyte%ORA10GR2> create table t2 ( x number primary key, data char(80) );

Table created.

ops$tkyte%ORA10GR2> create sequence s cache 1000;

Sequence created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> define n=100000
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec runStats_pkg.rs_start;

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> begin
2 for i in 1 .. &n
3 loop
4 insert into t1 (x,data) values (sys_guid(),'x');
5 commit;
6 end loop;
7 end;
8 /
old 2: for i in 1 .. &n
new 2: for i in 1 .. 100000

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec runStats_pkg.rs_middle;

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> begin
2 for i in 1 .. &n
3 loop
4 insert into t2 (x,data) values (s.nextval,'x');
5 commit;
6 end loop;
7 end;
8 /
old 2: for i in 1 .. &n
new 2: for i in 1 .. 100000

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec runStats_pkg.rs_stop(10000);
Run1 ran in 2499 hsecs
Run2 ran in 2759 hsecs
run 1 ran in 90.58% of the time

Name Run1 Run2 Diff
STAT...messages sent 30,027 41,046 11,019
LATCH.lgwr LWN SCN 30,058 41,081 11,023
LATCH.Consistent RBA 30,050 41,110 11,060
LATCH.mostly latch-free SCN 30,275 41,411 11,136
LATCH.messages 61,497 84,489 22,992
LATCH.redo allocation 283,917 316,909 32,992
LATCH.redo writing 90,132 123,202 33,070
STAT...calls to get snapshot s 101,365 201,226 99,861
LATCH.library cache pin 58 200,135 200,077
LATCH.sequence cache 0 300,200 300,200
LATCH.library cache 100,213 400,577 300,364
STAT...IMU Redo allocation siz 2,115,812 1,712,344 -403,468
STAT...IMU undo allocation siz 67,228,188 66,388,100 -840,088
STAT...undo change vector size 26,828,460 24,345,060 -2,483,400
STAT...redo size 94,087,692 88,202,304 -5,885,388

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
6,104,556 7,022,494 917,938 86.93%

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create or replace package my_pkg
2 as
3 type rawArray is table of t1.x%type index by binary_integer;
4 type numArray is table of t2.x%type index by binary_integer;
5
6 raw_data rawArray;
7 num_data numArray;
8 end;
9 /

Package created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
2 select x bulk collect into my_pkg.raw_data from t1;
3 select x bulk collect into my_pkg.num_data from t2;
4 end;
5 /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec runStats_pkg.rs_start;

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> declare
2 l_rec t1%rowtype;
3 begin
4 for i in 1 .. my_pkg.raw_data.count
5 loop
6 select * into l_rec from t1 where x = my_pkg.raw_data(i);
7 end loop;
8 end;
9 /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec runStats_pkg.rs_middle;

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> declare
2 l_rec t2%rowtype;
3 begin
4 for i in 1 .. my_pkg.num_data.count
5 loop
6 select * into l_rec from t2 where x = my_pkg.num_data(i);
7 end loop;
8 end;
9 /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec runStats_pkg.rs_stop(10000);
Run1 ran in 420 hsecs
Run2 ran in 394 hsecs
run 1 ran in 106.6% of the time

Name Run1 Run2 Diff
STAT...session pga memory max 131,072 65,536 -65,536
STAT...session uga memory max 123,452 0 -123,452
STAT...session pga memory 65,536 -196,608 -262,144

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
329,048 326,257 -2,791 100.86%

PL/SQL procedure successfully completed.


And in a RAC environment - you could use reverse key indexes or hash partitioning to "smear the data around" (reverse key indexes probably). I wrote about that in "Expert Oracle Database Architecture" in the indexing chapter.

Rating

  (7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Latches -vs- consistent gets

Arup Nanda, November 26, 2006 - 10:38 pm UTC

Tom,

From your output:

LATCH.library cache pin 58 200,135 200,077
LATCH.sequence cache 0 300,200 300,200
LATCH.library cache 100,213 400,577 300,364

the library cache pins and libtary cache latch utlization did go up a lot for the sequence number approach. so, should I be concerned about the scalability of the approach?

On the other hand, the redo, logical I/O, etc. went up singnificantly for the GUID approach. Your output does not show it due to the 10000 cutoff; but the consitent gets go up as well. So, which is the worse evil - latch contention due to sequences or the increase in logical I/O in case of GUIDs? Of course, the correct answer is - it depends on the nature of the app. But I want to know your opinion on this.

Tom Kyte
November 27, 2006 - 7:47 am UTC

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
6,104,556 7,022,494 917,938 86.93%

why did you cherry pick those numbers?

and if I did something 100,000 times - and had a 10,000 cutoff, how could anything have gone up "considerably"?

for redo to go up "significantly" for example, it would have to (in my opinion) increase a tad more than 9k for 100,000 rows.

I like GUID myself...

djb, November 27, 2006 - 3:05 pm UTC

For the reasons Tom stated, but I've found some interesting things about them...

On Windows, the generated GUIDs appear quite random on a single instance. On Linux, the GUIDs appear somewhat "linear" on a single instance.

But with a RAC system (on Linux), the different instances have wildly different values. That means that the different instances hit different areas of the index which reduces contention. I like this a lot.

I suppose Windoze and Linux have different ways of generating a GUID, but it all works out for my purposes... even for high-volume systems.

As well, I *always* store GUIDs as RAW(16) and convert them for humans when needed. This prevents the NLS layer from interpreting the data, reducing (perhaps marginally) the processing needed to read/write RAW data, as well as reducing the storage (16 bytes instead of 32 bytes).

I rarely use sequences any more, except when there is a driving business need when they need incrementing numbers for something - and I make sure they understand they are not gap-less.


Sequences won't necessarily be ascending, either

Mike, November 28, 2006 - 7:57 am UTC

> I rarely use sequences any more, except when there is a
> driving business need when they need incrementing
> numbers for something - and I make sure they understand
> they are not gap-less.

In a RAC environment, unless you specify ORDER and NOCACHE in your sequence definition, you can not assume they will always be assigned in ascending order.

And if you are in a RAC environment, ORDER and NOCACHE are definitely things to avoid.

A requirement to have ascending values requires careful design. Are you sure you can always tell 'which happened first'?

What about saving them as 32 bit strings

Dan, September 14, 2007 - 7:00 pm UTC

I know this question was answered a long time ago, but other than the size differences going from 16 (rax) to 32 (hex) is there any other down side to using the rawtohex function before saving the guid? Is that a particularly bad approach?

When it comes to index on the column, do selects suffer when the pk is not sequential?
Tom Kyte
September 15, 2007 - 10:02 pm UTC

you would double the size - unnecessarily.

that is all.


natural keys are typically never sequential. so in many cases, primary keys are not sequential.

Performance Impact of OLAP

Julie Liao, January 20, 2009 - 8:22 pm UTC

Tom,
It may make sense to use GUID for a multi data entries application or OLTP on multiple platforms environment. What's the performance impact for an OLAP over 10 terabytes data warehouse architecture if all the primary key and foreign key with GUID? OLAP SELECT queries or views are complicated and nested with multiple layers and multiple tables inked?
Is there any benchmark? Would the sequence be better idea in this case?
Tom Kyte
January 21, 2009 - 12:06 pm UTC

... What's the performance impact for an OLAP over
10 terabytes data warehouse architecture if all the primary key and foreign key
with GUID? ...

somewhere between -1000% and 1000%. Probably very close to zero.

You would be comparing a fixed width 16 byte raw with a fixed width 16 byte raw.

versus

comparing a varying length field from 0 to 22 bytes with a varying length field from 0 to 22 bytes.


The guid will probably consume more space.

ops$tkyte%ORA11GR1> create table t
  2  as
  3  select s.nextval num, sys_guid() guid, rpad('*',40,'*') data
  4    from dual
  5  connect by level <= 1000000;

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select round(sum(vsize(num))/1024/1024,1), round(sum(vsize(guid))/1024/1024,1) from t;

ROUND(SUM(VSIZE(NUM))/1024/1024,1) ROUND(SUM(VSIZE(GUID))/1024/1024,1)
---------------------------------- -----------------------------------
                               3.8                                15.3

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> @trace
ops$tkyte%ORA11GR1> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA11GR1> select count(*) from t t1, t t2, t t3, t t4 where t1.num = t2.num and t2.num = t3.num and t3.num = t4.num;

  COUNT(*)
----------
   1000000

ops$tkyte%ORA11GR1> select count(*) from t t1, t t2, t t3, t t4 where t1.guid = t2.guid and t2.guid = t3.guid and t3.guid = t4.guid;

  COUNT(*)
----------
   1000000


select count(*)
from
 t t1, t t2, t t3, t t4 where t1.num = t2.num and t2.num = t3.num and t3.num =
   t4.num


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      6.12       6.15      37732      37748          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      6.12       6.15      37732      37748          0           1

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=37748 pr=37732 pw=0 time=0 us)
1000000   HASH JOIN  (cr=37748 pr=37732 pw=0 time=21997 us cost=16044 size=20000000 card=1000000)
1000000    TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=0 us cost=2609 size=5000000 card=1000000)
1000000    HASH JOIN  (cr=28311 pr=28299 pw=0 time=21997 us cost=11330 size=15000000 card=1000000)
1000000     TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=0 us cost=2609 size=5000000 card=1000000)
1000000     HASH JOIN  (cr=18874 pr=18866 pw=0 time=21997 us cost=6852 size=10000000 card=1000000)
1000000      TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=0 us cost=2609 size=5000000 card=1000000)
1000000      TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=21997 us cost=2609 size=5000000 card=1000000)

select count(*)
from
 t t1, t t2, t t3, t t4 where t1.guid = t2.guid and t2.guid = t3.guid and
  t3.guid = t4.guid


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      6.67       7.59      38866      37748          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      6.67       7.59      38866      37748          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 155

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=37748 pr=38866 pw=1134 time=0 us)
1000000   HASH JOIN  (cr=37748 pr=38866 pw=1134 time=87985 us cost=21157 size=68000000 card=1000000)
1000000    TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=0 us cost=2611 size=17000000 card=1000000)
1000000    HASH JOIN  (cr=28311 pr=28299 pw=0 time=0 us cost=14171 size=51000000 card=1000000)
1000000     TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=0 us cost=2611 size=17000000 card=1000000)
1000000     HASH JOIN  (cr=18874 pr=18866 pw=0 time=0 us cost=7989 size=34000000 card=1000000)
1000000      TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=0 us cost=2611 size=17000000 card=1000000)
1000000      TABLE ACCESS FULL T (cr=9437 pr=9433 pw=0 time=0 us cost=2611 size=17000000 card=1000000)



Performance Impact of OLAP

Julie Liao, January 21, 2009 - 1:46 pm UTC

Hi Tom,
Thank you very much for the follow-up.
In our data warehouse, there is one table contains more than 5 million rows each day for 5 years (it's still growing). If use GUID as primary key, the index would be much bigger than the num index. Don't you think the performance would be an issue?

I saw some data model using VARCHAR2(32) for GUIDs instead of RAW(16). Is that a very bad idea?

Thanks,
Julie


Tom Kyte
January 21, 2009 - 1:49 pm UTC

... Don't you think the performance
would be an issue?
....

like I said, somewhere between -1000 and +1000%

"it depends"

do you use the index - I didn't above, many times you don't. The cost of doing the joins will be about the same.

but you might increase your physical IO's if you make things even bigger.

using a varchar2(32) would be twice the storage, raw(16) is already big - I would not encourage you to make it twice as big, no.


Heartburn over GUID length

Stan R, November 17, 2011 - 8:02 pm UTC

My developers have gotten used to 9 or 10 digit sequence numbers. DBAs see the benefits of GUIDs over sequence waits, index on sequence etc, however developers are having heartburn to embrace GUID keys that need to be shown in UI.
Business users also may have same heartburn with long IDs.

Your take on this?

Is it ok to use GUID based keys to show on UI (say like a submission ID)?
The 32 length GUID is throwing developers & UI designers off.

Obviously managing peformance issues with sequence object by caching/reverse key indexes etc?

DBAs also have major heartburn when business users & naive developers try to order data chronologically by sequence face value. i.e.
seq value 1= first record "in time"
seq value 2= second record "in time" etc




Tom Kyte
November 21, 2011 - 10:46 am UTC

Your take on this?


I have no sympathy for the developers, they are after all programmers, I don't know what kind of 'heartburn' they could be having - other than that you might receive after a bad lunch or dinner. It is just data - they process data. No big deal.

I don't understand why the surrogate key would ever really manifest itself in the UI? If the end user needs to have something to remember a record by - a sequence isn't going to cut it either. Are you telling me the business user can really remember and utilize 424,654,231 ?


DBAs also have major heartburn when business users & naive developers try to
order data chronologically by sequence face value. i.e.
seq value 1= first record "in time"
seq value 2= second record "in time" etc


that has never been really doable - and in clustered environments is just impossible.


I would say:

developers - come on, give me a break, this is what you do for a living. Business users - we promise to not give you a surrogate key to remember things by, they will be pretty much hidden. Sure, you might see them in a URL - but you just click URLs, you don't read them. If you think that is scary, look at the URL's for any news site, amazon.com, whatever. They are there for the computer - not for you - you never type them in.


Said by a guy that has URLS like this:
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:77564387253035#4124960600346883380

I purposely use surrogates that are not guessable, that are big. I cannot sort by them, I cannot tell order of insertion from them, I cannot remember them - I never say "oh, see question 432513452346464366 for an answer". I just say "click on this link"

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.