Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Sankarakumar.

Asked: April 19, 2010 - 1:06 pm UTC

Answered by: Tom Kyte - Last updated: November 04, 2016 - 3:47 am UTC

Category: Database - Version: 11.2.0

Viewed 50K+ times! This question is

You Asked

Hello Tom,
I have a question on using GUIDs as primary keys. Our application is being developed on 11gR2 with a 3-node RAC. The development team has introduced a new column to almost all of the tables which is technically a UUID generated by the application code. They are also indexing it. The reasoning behind this is to provide this key to the external systems instead of passing the primary key of the tables as it is created by a sequence today and easily guessable.

I am putting forth this recommendation of keeping the UUID as a primary key as well. I would like to know the impact on this in terms of performance and if you can suggest me some sort of testing that I can conduct in my environment and analyze some traces to see the performance impact, it would be great.

I asked this question earlier and clicked on the submit, somehow it did not transmit and hence I am retyping the whole thing. My apologies if you have received this twice in your inbox.

Thanks for taking time to answer my queries during Oracle tech product fair in Atlanta last month.

Regards
Kumar Ramalingam

and we said...

If your tables have a natural key already - a true key - do not replace them with surrogates.

If you are already using a surrogate (a sequence for example) and you are going to populate yet another surrogate, I would chose to use a single surrogate if I could - one less column, one less unique index to maintain.

a GUID should be a 16 byte raw (hopefully they are not using a 32 byte varchar2..) and will perform adequately as a primary key.

The main difference (and you will have this regardless of whether you use a sequence or the guid as the primary key - it is a fact due to the very existence of the guid) is that you need to have pretty much 100% of the guid unique index in the buffer cache since you randomly insert into it - whereas the primary key key using a sequence would only hit the right hand side. Also, you'll find the guid index to be larger in general since the inserts are random - about 50% of the index will be empty at any time (and this is OK, do not - repeat DO NOT rebuild the index to reclaim space, that will only be a waste of time). But again, you will have this happening due to the existence of the field - not because you are using it as a primary key. And the removal of the index on the sequence will free up some of the cache to help hold it.


I generate random keys using a sequence and dbms_random - look at my url :)

and you rated our response

  (11 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Converting the SYS_GUID() value to NUMBER

August 10, 2011 - 3:19 pm UTC

Reviewer: Edwin F. López. from Medellín, Colombia.

Hi Tom and thanks for your site.

We're in a similar scenario as the original poster, but we're reluctant to use RAW(16) as the primary key for our tables, mostly as this implies to change all the code in the application layer.

Our scenario includes tablets pc with oraclexe and we're generating the GUID and then converting it to a number using TO_NUMBER(SYS_GUID(),'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'), this is our primary key (used for syncing), we also have a unique key on the natural candidate (for business logic); we then have to sync each oraclexe database in the tablets with the central database.

So far we haven't had any problems, however I'm worried that that conversion to number may cause data truncation, syncing issues, resource contention or performance degradation against other alternatives (raw,varchar2(eeeew!)).

Can you please advise if we made the right decision to use numbers to store the integer representation of the guid value and suggest a testing method I can follow on my environment for knowing the impact of this decision in terms of performance and resources.

I really appreciate your help. Thanks in advance.
Tom Kyte

Followup  

August 13, 2011 - 4:34 pm UTC

that won't work - do NOT - repeat DO NOT do that.



think about this please, we have 38 digits of precision maximum. How many digits does it take to represent 0xffffffffffffffffffffffffffffffff ???

answer: more than 38


if you want to use sys_guid() you will either:

a) deal with the FACT that it is raw(16) and program accordingly

b) deal with the FACT you can safely store that in a varchar2(32) using hex characters

there really isn't anything in between.

Evidence Oracle Provides regarding GUID to NUMBER.

August 16, 2011 - 10:58 am UTC

Reviewer: Edwin F. López A. from Medellín, Colombia

Hi Tom and Thank you very much for your quick response.

I had the same reasoning as the one you suggested and that was the origin of my doubt about data truncation, however, when I created the proof of concept, I got peace of mind after the results; then I asked you and after your answer, that peace gone somewhere cold and gloomy :-).

This is what I did:

CREATE TABLE GUUID_TEST 
( PK NUMBER, UUID VARCHAR2(32), RAWUUID RAW(16) );

INSERT INTO GUUID_TEST (PK, UUID, RAWUUID)
VALUES
(TO_NUMBER('FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
, 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF'
, HEXTORAW('FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF')
);

COMMIT;

SELECT * FROM GUUID_TEST;


And this is the response from Oracle
PK                     UUID                             RAWUUID                          
---------------------- -------------------------------- -------------------------------- 
340282366920938463463374607431768211455 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 


After that, I ran the following query to check for the number that resulted from the operation and here's the result:

SELECT TO_CHAR(340282366920938463463374607431768211455,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') 
from dual;


And Oracle said:
TO_CHAR(340282366920938463463374607431768211455,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') 
----------------------------------------------------------------------------------- 
 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF


So, as you see, the conversion process did not failed due to number precision, and even the table creation was successful. According to the transformation from oracle, the decimal representation of "0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF" is "340282366920938463463374607431768211455", those are 39 digits.

Can you please advise if I'm doing something wrong? I don't see any problems given the evidence that oracle produced.

Thanks again in advance.
Tom Kyte

Followup  

August 16, 2011 - 5:39 pm UTC

ok, documentation says:

http://docs.oracle.com/docs/cd/E11882_01/server.112/e17118/sql_elements001.htm#sthref120

Oracle guarantees the portability of numbers with precision of up to 20 base-100 digits, which is equivalent to 39 or 40 decimal digits depending on the position of the decimal point.

but I find this to be entirely 'sketchy', I would not recommend this. I can see how many client API's would have troubles with integers over a certain value. You'll have to make sure ALL code deals with them properly - or you will scramble identifiers. I see many bugs related to this in your developed code in the future.


I would not do this myself.

Thanks a lot

August 16, 2011 - 6:45 pm UTC

Reviewer: Edwin F. López A. from Medellín, Colombia

Thanks a lot Tom for your help. I really appreciated it that you took the time to answer.

I'll move this discussion and your consideration to the main architecture team and let them take the decision.

Again, thanks a bunch.

the link to the oracle docs doesn't work anymore

June 05, 2013 - 7:59 am UTC

Reviewer: A reader from Nederland

the link to the oracle docs doesn't work anymore
Tom Kyte

Followup  

June 05, 2013 - 1:53 pm UTC

so I plugged the quote from the documentation "Oracle guarantees the portability of numbers with precision of up to 20" that I placed after the link into google an viola:

http://docs.oracle.com/cd/B28359_01/server.111/b28286/sql_elements001.htm#SQLRF00222

it took me right to the current doc set......

June 05, 2013 - 3:37 pm UTC

Reviewer: A reader

Dear Sir,

what is the meaning of 20 base-100 digite and how its equivalent to 39 0r 40 digits?

can you please explain this?

Thanks a bunch.

Tom Kyte

Followup  

June 05, 2013 - 4:20 pm UTC

we store numbers in a packed decimal like format which gives us 2 digits per byte - since we have 20 of them in this base 100 notation - that gives us up to 39 or 40 digits of precision.

Here is a longer writeup on numbers from my book Expert Oracle database Architecture:


Number Types

Oracle 10g and above supports three native datatypes suitable for storing numbers. Oracle9i Release 2 and earlier support exactly one native datatype suitable for storing numeric data. In this list, the NUMBER type is supported by all releases, and the subsequent two types are new datatypes supported only in Oracle 10g and above:

 * NUMBER: The Oracle NUMBER type is capable of storing numbers with an extremely large degree of precision—38 digits of precision, in fact. The underlying data format is similar to a “packed decimal” representation. The Oracle NUMBER type is a variable length format from 0 to 22 bytes in length. It is appropriate for storing any number as small as 10e-130 and numbers up to but not including 10e126. This is by far the most common NUMBER type in use today.

 * BINARY_FLOAT: This is an IEEE native single-precision floating-point number. On disk it will consume 5 bytes of storage: 4 fixed bytes for the floating-point number and 1 length byte. It is capable of storing numbers in the range of ~ ± 1038.53 with 6 digits of precision.

 * BINARY_DOUBLE: This is an IEEE native double-precision floating-point number. On disk it will consume 9 bytes of storage: 8 fixed bytes for the floating-point number and 1 length byte. It is capable of storing numbers in the range of ~ ± 10308.25 with 13 digits of precision.

As you can see from this quick overview, the Oracle NUMBER type has significantly larger precision than the BINARY_FLOAT and the BINARY_DOUBLE types, but a much smaller range than the BINARY_DOUBLE. That is, you can store numbers very precisely with many significant digits in a NUMBER type, but you can store much smaller and larger numbers in the BINARY_FLOAT and BINARY_DOUBLE types. As a quick example, we can create a table with the various datatypes in them and see what is stored given the same inputs:

ops$tkyte@ORA11GR2> create table t
  2  ( num_col   number,
  3    float_col binary_float,
  4    dbl_col   binary_double
  5  )
  6  /
Table created.
 
ops$tkyte@ORA11GR2> insert into t ( num_col, float_col, dbl_col )
  2  values ( 1234567890.0987654321,
  3           1234567890.0987654321,
  4           1234567890.0987654321 );
1 row created.
 
ops$tkyte@ORA11GR2> set numformat 99999999999.99999999999
ops$tkyte@ORA11GR2> select * from t;
 
                 NUM_COL                FLOAT_COL                  DBL_COL
------------------------ ------------------------ ------------------------
  1234567890.09876543210   1234567940.00000000000   1234567890.09876540000

Note that the NUM_COL returns the exact number we provided as input. There are fewer than 38 significant digits in the input number (I supplied a number with 20 significant digits), so the exact number is preserved. The FLOAT_COL, however, using the new BINARY_FLOAT type, was not able to accurately represent this number. In fact, it preserved only 7 digits accurately. The DBL_COL faired much better, accurately representing the number in this case out to 17 digits. Overall, though, this should be a good indication that the BINARY_FLOAT and BINARY_DOUBLE types will not be appropriate for financial applications! If you play around with different values, you’ll see different results:


ops$tkyte@ORA11GR2> delete from t;
1 row deleted.
 
ops$tkyte@ORA11GR2> insert into t ( num_col, float_col, dbl_col )
  2  values ( 9999999999.9999999999,
  3           9999999999.9999999999,
  4           9999999999.9999999999 );
1 row created.
 
ops$tkyte@ORA11GR2> select * from t;
 
                 NUM_COL                FLOAT_COL                  DBL_COL
------------------------ ------------------------ ------------------------
  9999999999.99999999990  10000000000.00000000000  10000000000.00000000000

Once again, the NUM_COL accurately represented the number, but the FLOAT_COL and DBL_COL cannot. This does not mean that the NUMBER type is able to store things with “infinite” accuracy/precision—just that it has a much larger precision associated with it. It is easy to observe similar results from the NUMBER type:

ops$tkyte@ORA11GR2> delete from t;
1 row deleted.
 
ops$tkyte@ORA11GR2> insert into t ( num_col )
  2  values ( 123 * 1e20 + 123*1e-20 ) ;
1 row created.
 
ops$tkyte@ORA11GR2> set numformat 999999999999999999999999.999999999999999999999999
ops$tkyte@ORA11GR2> select num_col, 123*1e20, 123*1e-20 from t;
 
                                           NUM_COL
--------------------------------------------------
                                          123*1E20
--------------------------------------------------
                                         123*1E-20
--------------------------------------------------
  12300000000000000000000.000000000000000000000000
  12300000000000000000000.000000000000000000000000
                         .000000000000000001230000

As you can see, when we put together a very large number (123*1e20) and a very small number (123*1e-20), we lost precision because this arithmetic requires more than 38 digits of precision. The large number by itself can be faithfully represented, as can the small number, but the result of the larger plus the smaller cannot. We can verify this is not just a display/formatting issue as follows:

ops$tkyte@ORA11GR2> select num_col from t where num_col = 123*1e20;
 
                                           NUM_COL
--------------------------------------------------
  12300000000000000000000.000000000000000000000000 

The value in NUM_COL is equal to 123*1e20, and not the value we attempted to insert.
NUMBER Type Syntax and Usage
The syntax for the NUMBER type is straightforward:

NUMBER( p,s )

where P and S are optional and are used to specify the following:

 * Precision, or the total number of digits. By default, the precision is 38 and has valid values in the range of 1 to 38. The character * may be used to represent 38 as well.

 * Scale, or the number of digits to the right of the decimal point. Valid values for the scale are –84 to 127, and its default value depends on whether or not the precision is specified. If no precision is specified, then scale defaults to the maximum range. If a precision is specified, then scale defaults to 0 (no digits to the right of the decimal point). So, for example, a column defined as NUMBER stores floating-point numbers (with decimal places), whereas a NUMBER(38) stores only integer data (no decimals), since the scale defaults to 0 in the second case.

You should consider the precision and scale to be edits for your data—data integrity tools in a way. The precision and scale do not affect at all how the data is stored on disk, only what values are permitted and how numbers are to be rounded. For example, if a value exceeds the precision permitted, Oracle returns an error:

ops$tkyte@ORA11GR2> create table t ( num_col number(5,0) );
Table created.
 
ops$tkyte@ORA11GR2> insert into t (num_col) values ( 12345 );
1 row created.
 
ops$tkyte@ORA11GR2> insert into t (num_col) values ( 123456 );
insert into t (num_col) values ( 123456 )
                                 *
ERROR at line 1:
ORA-01438: value larger than specified precision allows for this column

So, you can use the precision to enforce some data integrity constraints. In this case, NUM_COL is a column that is not allowed to have more than five digits.
The scale, on the other hand, is used to control “rounding” of the number, for example:

ops$tkyte@ORA11GR2> create table t ( msg varchar2(10), num_col number(5,2) );
Table created.
 
ops$tkyte@ORA11GR2> insert into t (msg,num_col) values ( '123.45',  123.45 );
1 row created.
 
ops$tkyte@ORA11GR2> insert into t (msg,num_col) values ( '123.456', 123.456 );
1 row created.
 
ops$tkyte@ORA11GR2> select * from t;
 
MSG           NUM_COL
---------- ----------
123.45         123.45
123.456        123.46

Notice how the number 123.456, with more than five digits, succeeded this time. That is because the scale we used in this example was used to round 123.456 to two digits, resulting in 123.46, and then 123.46 was validated against the precision, found to fit, and inserted. However, if we attempt the following insert, it fails:

ops$tkyte@ORA11GR2> insert into t (msg,num_col) values ( '1234', 1234 );
insert into t (msg,num_col) values ( '1234', 1234 )
                                             *
ERROR at line 1:
ORA-01438: value larger than specified precision allows for this column

because the number 1234.00 has more than five digits in total. When you specify the scale of 2, at most three digits may be to the left of the decimal place and two to the right. Hence that number does not fit. The NUMBER(5,2) column can hold all values between 999.99 and –999.99.
It may seem strange to allow the scale to vary from –84 to 127. What purpose could a negative scale fulfill? It allows you to round values to the left of the decimal place. Just as the NUMBER(5,2) rounded values to the nearest .01, so a NUMBER(5,-2) would round to the nearest 100, for example:

ops$tkyte@ORA11GR2> create table t ( msg varchar2(10), num_col number(5,-2) );
Table created.
 
ops$tkyte@ORA11GR2> insert into t (msg,num_col) values ( '123.45',  123.45 );
1 row created.
 
ops$tkyte@ORA11GR2> insert into t (msg,num_col) values ( '123.456', 123.456 );
1 row created.
 
ops$tkyte@ORA11GR2> select * from t;
 
MSG           NUM_COL
---------- ----------
123.45            100
123.456           100

The numbers were rounded  to the nearest 100. We still have five digits of precision, but there are now seven digits (including the trailing two 0s) permitted to the left of the decimal point:

ops$tkyte@ORA11GR2> insert into t (msg,num_col) values ( '1234567', 1234567 );
1 row created.
 
ops$tkyte@ORA11GR2> select * from t;
 
MSG           NUM_COL
---------- ----------
123.45            100
123.456           100
1234567       1234600
 
ops$tkyte@ORA11GR2> insert into t (msg,num_col) values ( '12345678', 12345678 );
insert into t (msg,num_col) values ( '12345678', 12345678 )
                                                 *
ERROR at line 1:
ORA-01438: value larger than specified precision allows for this column

So, the precision dictates how many digits are permitted in the number after rounding, using the scale to determine how to round. The precision is an integrity constraint, whereas the scale is an “edit.”
It is interesting and useful to note that the NUMBER type is, in fact, a variable length datatype on disk and will consume between 0 and 22 bytes of storage. Many times, programmers consider a numeric datatype to be a fixed-length type—that is what they typically see when programming with 2- or 4-byte integers and 4- or 8-byte floats. The Oracle NUMBER type is similar to a variable length character string. We can see what happens with numbers that contain differing amounts of significant digits. We’ll create a table with two NUMBER columns and populate the first column with many numbers that have 2, 4, 6, … 28 significant digits. Then, we’ll simply add 1 to each of them:

ops$tkyte@ORA11GR2> create table t ( x number, y number );
Table created.
 
ops$tkyte@ORA11GR2> insert into t ( x )
  2  select to_number(rpad('9',rownum*2,'9'))
  3    from all_objects
  4   where rownum <= 14;
14 rows created.
 
ops$tkyte@ORA11GR2> update t set y = x+1;
14 rows updated.

Now, if we use the built-in VSIZE function that shows how much storage the column takes, we can review the size differences between the two numbers in each row:

ops$tkyte@ORA11GR2> set numformat 99999999999999999999999999999
ops$tkyte@ORA11GR2> column v1 format 99
ops$tkyte@ORA11GR2> column v2 format 99
ops$tkyte@ORA11GR2> select x, y, vsize(x) v1, vsize(y) v2
  2    from t order by x;
 
                             X                              Y  V1  V2
------------------------------ ------------------------------ --- ---
                            99                            100   2   2
                          9999                          10000   3   2
                        999999                        1000000   4   2
                      99999999                      100000000   5   2
                    9999999999                    10000000000   6   2
                  999999999999                  1000000000000   7   2
                99999999999999                100000000000000   8   2
              9999999999999999              10000000000000000   9   2
            999999999999999999            1000000000000000000  10   2
          99999999999999999999          100000000000000000000  11   2
        9999999999999999999999        10000000000000000000000  12   2
      999999999999999999999999      1000000000000000000000000  13   2
    99999999999999999999999999    100000000000000000000000000  14   2
  9999999999999999999999999999  10000000000000000000000000000  15   2
 
14 rows selected.

We can see that as we added significant digits to X, the amount of storage required took increasingly more room. Every two significant digits added another byte of storage. But a number just one larger consistently took 2 bytes. When Oracle stores a number, it does so by storing as little as it can to represent that number. It does this by storing the significant digits, an exponent used to place the decimal place, and information regarding the sign of the number (positive or negative). So, the more significant digits a number contains, the more storage it consumes. 

That last fact explains why it is useful to know that numbers are stored in varying width fields. When attempting to size a table (e.g., to figure out how much storage 1,000,000 rows would need in a table), you have to consider the NUMBER fields carefully. Will your numbers take 2 bytes or 20 bytes? What is the average size? This makes accurately sizing a table without representative test data very hard. You can get the worst-case size and the best-case size, but the real size will likely be some value in between.

Use of VARCHAR(32) GUID as PK

October 17, 2013 - 2:07 pm UTC

Reviewer: Ian from Dublin

Hi Tom,

Could you clarify the following statement you made in the first reply:

"The main difference (and you will have this regardless of whether you use a sequence or the guid as the primary key - it is a fact due to the very existence of the guid) is that you need to have pretty much 100% of the guid unique index in the buffer cache since you randomly insert into it - whereas the primary key key using a sequence would only hit the right hand side."

What do you mean when you say that you'll need to have 100% of the GUID unique index in the buffer cache?

We're facing an issue just now where by an insert into a couple of tables (5000 in all) is taking over 3 minutes. When we traced the insert we noticed a high number of recursive waits on "db file sequential read". Upon further analysis of the trace file these waits were primarily on two objects....the PK GUID of the parent table and the indexed foreign key of the child table.

This is a summary of the TKPROF file:

INSERT INTO TRANSACTIONS (ID, ENDTOENDIDENTIFICATION, AMOUNT, ACCOUNTNAME,
BIC, IBAN, ROWUPDATEDATE, REQUESTEDDATE, NSC, ACCOUNTNUMBER, PICN,
TRANSACTIONONUS, PAY_ID, GRPH_ID, PRCSV_ID, NTF_ID, DEBITCREDIT,
FETCHSEQUENCE, TRANS_ID, MESSAGENAMEIDENTIFICATION, PROCESSINGDATE,
XMLCONTENT)
VALUES
(:B22 , :B21 , :B20 , :B19 , :B18 , :B17 , :B16 , :B15 , :B14 , :B13 , :B12 ,
:B11 , :B10 , :B9 , :B8 , :B7 , :B6 , :B5 , :B4 , :B3 , :B2 , :B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 5000 7.47 61.67 10348 1132 136177 5000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5001 7.47 61.67 10348 1132 136177 5000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 110 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL (cr=72 pr=6 pw=0 time=90710 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
Disk file operations I/O 3 0.00 0.00
db file sequential read 10348 0.33 45.93

INSERT INTO TRANSACTIONS_TRG ( ID, STATUS, ACTION, ACTION_TIME )
VALUES
( :B2 , 0, :B1 , SYSTIMESTAMP )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 5000 0.56 1.52 52 32 5268 5000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5001 0.56 1.52 52 32 5268 5000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 85 (recursive depth: 2)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL (cr=1 pr=0 pw=0 time=91 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 52 0.05 0.28

INSERT INTO TRANSPROCSTATUS (ID, ROWUPDATEDATE, PRCSV_ID, TRANS_ID)
VALUES
(:B4 , :B3 , :B2 , :B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 5000 2.14 103.24 5921 606 89297 5000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5001 2.14 103.24 5921 606 89297 5000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 110 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL (cr=28 pr=2 pw=0 time=90088 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 5921 0.51 98.82
********************************************************************************

INSERT INTO TRANSPROCSTATUS_TRG ( ID, STATUS, ACTION, ACTION_TIME )
VALUES
( :B2 , 0, :B1 , SYSTIMESTAMP )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 5000 0.56 1.80 59 63 5457 5000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5001 0.56 1.80 59 63 5457 5000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 85 (recursive depth: 2)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL (cr=1 pr=0 pw=0 time=111 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
Disk file operations I/O 1 0.00 0.00
db file sequential read 59 0.13 0.33


********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 29 0.03 0.08 0 2 0 0
Execute 5026 5.03 11.94 21 1188 391 5016
Fetch 9 0.01 0.04 0 939 1 39
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5064 5.08 12.06 21 2129 392 5055

Misses in library cache during parse: 15
Misses in library cache during execute: 9

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 5037 0.00 0.01
log file sync 2 0.00 0.00
SQL*Net message from client 5037 0.15 12.20
Disk file operations I/O 7 0.00 0.00
db file sequential read 9 0.06 0.15


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 44 0.00 0.00 0 0 6 0
Execute 25162 10.82 168.40 16382 1877 236206 20002
Fetch 5347 0.03 0.07 0 653 0 5237
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 30553 10.86 168.48 16382 2530 236212 25239

Misses in library cache during parse: 15
Misses in library cache during execute: 14

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
Disk file operations I/O 5 0.00 0.00
db file sequential read 16382 0.51 145.40

35 user SQL statements in session.
42 internal SQL statements in session.
77 SQL statements in session.
********************************************************************************
Trace file: /u01/app/oracle/diag/rdbms/iloq1spe/ILOQ1SPE/trace/SEPACTOImportPain001_20131015062222_PAIN001.201310_2.trc
Trace file compatibility: 11.1.0.7
Sort options: prscpu
0 session in tracefile.
35 user SQL statements in trace file.
42 internal SQL statements in trace file.
77 SQL statements in trace file.
72 unique SQL statements in trace file.
1448010 lines in trace file.
194 elapsed seconds in trace file.

The two tables involved in this insert are:

TRANSACTIONS
------------

num_rows circa 15 million

desc transactions
Name Null Type
------------------------- -------- ------------
ID NOT NULL VARCHAR2(32)
ENDTOENDIDENTIFICATION NOT NULL VARCHAR2(35)
AMOUNT NUMBER(17,2)
ACCOUNTNAME VARCHAR2(70)
BIC VARCHAR2(11)
IBAN VARCHAR2(34)
ROWUPDATEDATE NOT NULL DATE
REQUESTEDDATE NOT NULL DATE
NSC VARCHAR2(6)
ACCOUNTNUMBER VARCHAR2(8)
PICN VARCHAR2(14)
TRANSACTIONONUS VARCHAR2(1)
XMLCONTENT XMLTYPE()
PAY_ID VARCHAR2(32)
GRPH_ID VARCHAR2(32)
PRCSV_ID NOT NULL VARCHAR2(32)
STSV_ID VARCHAR2(10)
NTF_ID VARCHAR2(32)
DEBITCREDIT NOT NULL VARCHAR2(1)
FETCHSEQUENCE NUMBER(10)
TRANS_ID VARCHAR2(32)
MESSAGENAMEIDENTIFICATION NOT NULL VARCHAR2(50)
PROCESSINGDATE DATE

TRANSPROCSTATUS
---------------

num rows circa 16.3 million

desc transprocstatus
Name Null Type
------------- -------- ------------
ID NOT NULL VARCHAR2(32)
ROWUPDATEDATE NOT NULL DATE
PRCSV_ID NOT NULL VARCHAR2(32)
TRANS_ID NOT NULL VARCHAR2(32)

There are also a trigger table off each of these but these don't seem to be the source of the issue.

We don't generate the GUIDS in the database, instead they're generated in IBM Message Broker.

Thanks,
Ian.
Tom Kyte

Followup  

November 01, 2013 - 8:11 pm UTC

think of something that is monotonically increasing - like a date populated by systdate or an id populated by a sequence.


as you insert into it - the values are increasing, you only need the right hand side of the index in the cache. the left hand side - which you don't query so much any more since it is old data - it is rarely in the cache. hence when you insert into this index - you'll do very very very little if any physical IO.


now, think of an index on something that is NOT monotonically increasing. Like an index on LAST_NAME. you insert an A, then a Z, then an M, then a B, then a W and so on. you insert randomly across the entire breadth of the index. You need that entire index in memory in order to say "we do no physical reads on the index during an insert" - unlike the index on the monotonically increasing value.


so, making a reasonable assumption that guid's are not monotonically increasing... you will need that entire index in memory to avoid physical reads during inserts.

Use of VARCHAR(32) GUID as PK

November 05, 2013 - 4:47 pm UTC

Reviewer: A reader from Dublin

Hi Tom,

Many thanks for the reply. we've finally managed to test like for like (SEQ versus GUID) on production like volumes and the results are obvious (see TKPROF below). So we're going to modify part of the DB design to use Sequences.


TKPROF: Release 11.2.0.3.0 - Development on Tue Oct 29 16:22:50 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

Trace file: /u03/channels/SPE_PP_XFER/traces/SEPACTOImportPain001_20131029162155_PAIN001.xml.sf_38.trc
Sort options: prsela exeela fchela
********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************

SQL ID: 1z16uzjc03xmz Plan Hash: 0

INSERT INTO TX_SEQ (ID, ENDTOENDIDENTIFICATION, AMOUNT, ACCOUNTNAME, BIC,
IBAN, ROWUPDATEDATE, REQUESTEDDATE, NSC, ACCOUNTNUMBER, PICN,
TRANSACTIONONUS, PAY_ID, GRPH_ID, PRCSV_ID, NTF_ID, DEBITCREDIT,
FETCHSEQUENCE, TRANS_ID, MESSAGENAMEIDENTIFICATION, PROCESSINGDATE,
XMLCONTENT)
VALUES
(:B22 , :B21 , :B20 , :B19 , :B18 , :B17 , :B16 , :B15 , :B14 , :B13 , :B12 ,
:B11 , :B10 , :B9 , :B8 , :B7 , :B6 , :B5 , :B4 , :B3 , :B2 , :B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 10000 6.31 9.70 2513 1601 173974 10000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10001 6.31 9.70 2513 1601 173974 10000

Misses in library cache during parse: 0
Optimizer mode: FIRST_ROWS
Parsing user id: 76 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL (cr=1 pr=1 pw=0 time=1603 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 2513 0.03 1.35
Disk file operations I/O 5 0.00 0.00
KSV master wait 2 0.00 0.00
ASM file metadata operation 1 0.00 0.00
log file switch completion 1 0.00 0.00
********************************************************************************

SQL ID: bt9uk34nf1ga6 Plan Hash: 0

BEGIN SPE_PLSQL_OWNER.SPE$TRANSACTIONDAO.P$PROCESSTRANSACTION ( :V1 , :V2 ,
:V3 , :V4 , :V5 , :V6 , :V7 , :V8 , :V9 , :V10 , :V11 , :V12 , :V13 , :V14 ,
:V15 , :V16 , :V17 , :V18 , :V19 , :V20 , :V21 , :V22 , :V23 , :V24 );
END;


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 10000 3.92 6.30 0 0 0 10000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10001 3.92 6.30 0 0 0 10000

Misses in library cache during parse: 0
Optimizer mode: FIRST_ROWS
Parsing user id: 43

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 10000 0.00 0.01
SQL*Net message from client 10000 0.08 18.35
********************************************************************************

SQL ID: 7fguphbnnyqau Plan Hash: 0

INSERT INTO TP_SEQ (ID, ROWUPDATEDATE, PRCSV_ID, TRANS_ID)
VALUES
(:B4 , :B3 , :B2 , :B1 )


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 10000 0.83 1.58 683 281 143354 10000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10001 0.83 1.58 683 281 143354 10000

Misses in library cache during parse: 0
Optimizer mode: FIRST_ROWS
Parsing user id: 76 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 LOAD TABLE CONVENTIONAL (cr=1 pr=0 pw=0 time=136 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 683 0.04 0.47
********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 31 1.54 2.05 0 0 0 0
Execute 10027 3.95 6.34 12 27 157 10014
Fetch 12 0.11 0.13 0 4200 0 697
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10070 5.60 8.53 12 4227 157 10711

Misses in library cache during parse: 4

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
reliable message 1 0.00 0.00
SQL*Net message to client 10036 0.00 0.01
log file sync 1 0.00 0.00
SQL*Net message from client 10036 0.08 18.40
Disk file operations I/O 7 0.00 0.00
db file sequential read 10 0.00 0.01
SQL*Net more data to client 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 75 0.00 0.00 0 0 0 0
Execute 41077 7.48 11.72 3202 2897 319348 21006
Fetch 20075 0.30 0.43 0 147 1000 20071
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 61227 7.78 12.16 3202 3044 320348 41077

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
Disk file operations I/O 5 0.00 0.00
db file sequential read 3202 0.04 1.83
KSV master wait 2 0.00 0.00
ASM file metadata operation 1 0.00 0.00
log file switch completion 1 0.00 0.00

36 user SQL statements in session.
7 internal SQL statements in session.
43 SQL statements in session.
********************************************************************************
Trace file: /u03/channels/SPE_PP_XFER/traces/SEPACTOImportPain001_20131029162155_PAIN001.xml.sf_38.trc
Trace file compatibility: 11.1.0.7
Sort options: prsela exeela fchela
0 session in tracefile.
36 user SQL statements in trace file.
7 internal SQL statements in trace file.
43 SQL statements in trace file.
43 unique SQL statements in trace file.
136800 lines in trace file.
40 elapsed seconds in trace file.

Many thanks,
Ian.
Tom Kyte

Followup  

November 07, 2013 - 3:20 pm UTC

that said, sequences have their foibles as well.

if you use a sequence, all sessions will insert into the same right hand side block - leading to contention again....

Use of VARCHAR(32) GUID as PK

November 07, 2013 - 5:11 pm UTC

Reviewer: A reader from Dublin

Hi Tom,

I wonder if I can follow up again on the issue I've described above. Basically whilst we recognize that a numeric sequence will give us the best performance, it was deemed that there would be too much of an impact on the existing application to go down this route. Therefore we've come up with an alternative approach that seems to be delivering some of the same performance benefits of the sequence.

We tried generating our UID's using Java via the java.rmi.server.UID class. This will generate sequential UID's....to a point (see examples below). We're seeing very similar import/insert performance (on a par with sequences). However after the import comes the export and obviously we need to ensure that query performance isn't adversely affected either (we haven't been able to prove the export performance as of yet....environment issues!).

Our application is based on IBM Message broker. We have an ACTIVE-ACTIVE cluster and multiple execution groups. Each group has it's own JVM and finally multithreading is enabled. So with that in mind, here are the various formats of UID's we can generate:


Java UUID with Instance Id

[pid in hex]:[time in milliseconds from 1970 in hex]:[sequence in hex]:[server id]

-347670e7:142335db179:-8000:1
-347670e7:142335db179:-7fff:1
-347670e7:142335db179:-7ffe:1
-347670e7:142335db179:-7ffd:1
-347670e7:142335db179:-7ffc:1
Execution group (PID) changed due to code deployment for example
56b93271:142335e5f13:-8000:1
56b93271:142335e5f13:-7fff:1
56b93271:142335e5f13:-7ffe:1
56b93271:142335e5f13:-7ffd:1
56b93271:142335e5f13:-7ffc:1

Enhanced UUID based on Application change

[server id]:[Short App Name]:[time in milliseconds from 2013 in hex]:[positive sequence in hex]

1:Imp:63ff963ca:0
1:Imp:63ff963ca:1
1:Imp:63ff963ca:2
1:Imp:63ff963ca:3
1:Imp:63ff963ca:4
1:Imp:63ff9d983:0
1:Imp:63ff9d983:1
1:Imp:63ff9d983:2
1:Imp:63ff9d983:3
1:Imp:63ff9d983:4

Enhanced UUID based on Application

[time in milliseconds from 2013 in hex]:[positive sequence in hex]: ]:[Short App Name]:[ server id]
63ffc4467:0:Imp:1
63ffc4467:1:Imp:1
63ffc4467:2:Imp:1
63ffc4467:3:Imp:1
63ffc4467:4:Imp:1
63ffc94e6:0:Imp:1
63ffc94e6:1:Imp:1
63ffc94e6:2:Imp:1
63ffc94e6:3:Imp:1
63ffc94e6:4:Imp:1

So our sequence while generally sequential will change in the following ways:

[server id] => Can be 1 or 2 depending on which node
[Short App Name] => There are two broker EXE groups on import.
[positive sequence in hex] => resets back to 0 every 65K
[time in milliseconds from 2013 in hex] => new timestamp generated every 65k.

I've looked at the following post as a reference and this sort of looks similar....

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6077843300346542181

I've not really used GUID/UID's before so I'm a bit unsure as to how the UID string should be formulated for best index performance.

Thanks,
Ian.

Tom Kyte

Followup  

November 11, 2013 - 9:11 am UTC


your choice.

Alternative for sequences?

November 07, 2013 - 7:13 pm UTC

Reviewer: Matt McPeak from Yardley, PA

Tom,

I'm just reading this thread for no reason (call it "training") and was wondering about the way you generate keys using a sequence and a random number.

Is that done to find a happy medium between (A) keeping updates on the right side of the index and (B) not keeping them all on the VERY RIGHT-MOST BLOCKS of the index (to reduce contention)?

If so, can you please share your thoughts / methodology on this? Is it a technique you'd recommend for widespread usage?

In the past, I thought Global Hash Partitions were the way to strike this balance (replacing 1 right-most edge with, say, 64 edges). But I'd love a better way!

Thanks!

Matt

Tom Kyte

Followup  

November 11, 2013 - 9:27 am UTC

it was done in my case to make "guessing an id" not possible.

I don't use authentication here. I publish some answers, I do not publish others. If I used a sequence and people noticed a gap here or there, they would/could try to retrieve an unpublished answer by rewriting the URL.

If you get lucky, you could plug in an ID and get a random article, it isn't likely ,but you can do it. It might be published or it might not be. It is OK - in general I'd rather you didnt see random unpublished articles but if you do - it is OK.

It was just a way to make the id's "not easily guessable" for me.


If you want a scalable sequence, consider using

instance_id || mod(session_id,3 or 4) || sequence


where you use a fixed width for each (2 for instance, 3 or 4 for session id and say 20 or so for the sequence).

that is scalable in RAC since each instance would have its own index subtree.

that is scalable in single instance since there would be a modest number of insertion points, say 1,000 to 10,000.

and still unique.

Thanks!

November 20, 2013 - 4:17 pm UTC

Reviewer: Matt McPeak from Yardley, PA

Thanks, as always, for your good knowledge and advice!

raw(16) vs. varchar2(32)

November 02, 2016 - 1:21 am UTC

Reviewer: Robert Bernier from San Diego, CA USA

Tom, what do you mean by "(hopefully they are not using a 32 byte varchar2..)?"
when later on in the thread you say we have two choices:
a) raw(16)
b) varchar2(32)

Why is raw(16) better?
Connor McDonald

Followup  

November 04, 2016 - 3:47 am UTC

It will consume half the space in the table (and indexes), and do not run the risk of data type conversion errors from incoming or outgoing clients.

More to Explore

DBMS_RANDOM

More on PL/SQL routine DBMS_RANDOM here