Thanks for the question, Andrew.

Asked: February 13, 2013 - 7:07 pm UTC

Last updated: May 06, 2013 - 6:13 pm UTC

Version: 11.2.0.3.0

Viewed 1000+ times

You Asked

Hi Tom,

I recently attended the Real World Performance session that Andrew and yourself ran in Perth, Australia.

There was one particular case study that you mentioned that I cannot remember all the details of. I was wondering if you'd be kind enough to elaborate a bit.

What I remember is:

There was one client site where there was Index contention causing problems in a multi-master replication setup.

Each of you three suggested different solutions - I believe you suggested a reverse key index, but then this was still determined to be causing a problem with not a sufficient number of leaf blocks, and so the unique indexes were modified to be a GUID with a site-identifying prefix.

Does this ring any bells?

I'm asking because I'd like to understand the real world applications for not using a sequence generated primary key.

Kind Regards,
Andrew

and Tom said...

It wasn't replication - we wouldn't talk about replication (being somewhat opposed to the concept - especially multi-master - ugh)

It was about index contention on an index on a sequence number. You have a table T:

create table t ( x int primary key, .... );

and you insert s.nextval into X. All inserts will go into the same right hand side block. Now, if you have several sessions doing this at the same time - they will all naturally try to hit the same right hand side block - but they cannot really do that *at the same exact instant*. They will be serializing their bits and bytes modifications on that block. They will appear to be at the same time, but you know they cannot be. You'll have buffer busy waits.

So, you would like to fix this - to reduce or remove the buffer busy waits.

Three solutions were presented

a) reverse key index. This would solve the buffer busy waits by spreading the inserts into the index all across the entire breadth of the index. All of the numbers that end with 1 would now start with 1, end with 2 - start with 2 (after reversing the bytes). So, the index would be hit on the left hand side, the middle, the right hand side - all over the place. A sequence like 5431531 would not be anywhere near 5431532 would not be anywhere near 5431533 and so on. Contention = gone.

But.... Now the entire index had better be in the buffer cache whereas before - only the hot right hand side needed to be in the cache for efficient inserts. If the index cannot fit into the cache we might well have turned a buffer busy wait into a physical IO wait which could be even worse (the remedy is worse than the symptoms)

b) globally hash partition the index. This would again solve the buffer busy waits by spreading the inserts into N insertion points where N is the number of partitions you used. So, if you used 64 partitions - we'll have 64 small indexes instead of one large one. Each of the 64 hash partitions will have a "slightly warmer than cold" right hand side we'll be inserting into. Each right hand side will see 1/64th of the number of inserts per second - a dramatic reduction in contention.

And you don't need the entire index in cache, just the 64 small right hand sides...

But..... You are now using partitioning for 'tuning', instead of for ease of administration or query performance. We'd rather use it there - we don't want to partition for something like this necessarily (it works, but removes your ability to partition for something else). also, it requires the partitioning option. And lastly - in a RAC environment - this wouldn't work very well because you'd be constantly pinging those slightly warmer than cold right hand sides all over the place.

c) use a new type of key, a scalable key of your own design. Using either three columns (good idea) or a single column (works but less flexible, might impose a size limit on your key and consumes maximum space) - make a key composed of:

1) instance_id - the number 1, 2, 3, 4, .....
2) mod(session/process_id,some_small_number) so you get things like 23, 56, 89 and so on - just 1 or 2 or three digits
3) a sequence number

So, if your key starts with instance id, each node in a rac cluster would have its own subtree of the index to insert into - so you don't have much internode collisions on the inserts.

and if your key continues with something about the session/process - we'll spread the inserts out across a few insertion points within each instance (so no/reduced intra-instance/node collisions)

and the sequence will make it all unique.

The "best" way would probably be (c) in most cases - however, it requires you to have thought of these things at design time. (how often does that happen ;) )

After that - the hash partitioned index is nice, but you need an environment where it is possible.

after that - the reverse key index is looking good - if you have the extra cache, but if you don't - you'll probably be better off *leaving the problem alone* or *reducing the amount of concurrency* (remember this: http://www.youtube.com/watch?v=xNDnVOCdvQ0 )

does that clear it up?

Rating

(8 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Alexander, February 14, 2013 - 2:52 pm UTC

"All inserts will go into the same right hand side block."

I don't understand this problem. I thought the way Oracle worked was that typically a block that needs to be modified will have a latch placed on it in the buffer cache, and all other transactions will wait until it is completed via a rollback or commit. Are you telling me different sides of a block can be modified concurrently? What determines where a row is placed?

I would not have thought it was useful to know this level of detail to what's occurring internally but apparently...

February 14, 2013 - 4:01 pm UTC

right hand side of the index, the index is a tree structure - with a root, branch and leaf blocks.

an index on a monotonically increasing value like a sequence will tend to have all inserts go into the right hand side of the *index*, the block on the far right hand side of the index structure (if you were to draw a picture of the tree on the white board - the block on the far right hand side)

I thought the way Oracle worked was that typically a block that needs to be modified will have a latch placed on it in the buffer cache, and all other transactions will wait until it is completed via a rollback or commit.

that is never true!

a block that needs to be modified is gotten in current mode (latched onto) and modified and then "un-gotten" immediately. You do not have to wait for a commit or a rollback. The serialization is not at the level of a transaction.

only one transaction can have a block in current mode at any time to modify it. It will hold that block for a very very very short period of time.

I was not talking about a right and left hand side of a block, but rather a block on the right hand side of an index.

Excelent thank you!

Andrew, February 14, 2013 - 5:21 pm UTC

Thanks Tom - that's exactly what I was after.

Thank you for a very informative day, even though you still had heaps of Jet lag :-)

February 14, 2013 - 5:39 pm UTC

and we'll be back in March - Wellington NZ and Melbourne AUS :)

Random Number

Ben Collier, February 15, 2013 - 3:41 am UTC

Tom,

Regarding point C, why three columns rather than two? How would this be different if the first was a random integer between 100 and 499 (equivalent to a 2 digit number in you second column) then a sequence value in a second column?

Fortunately we are still at the design stage.

Thanks,

Ben

February 15, 2013 - 9:29 am UTC

We want the first number to represent the node in a RAC cluster. We don't want, if at all possible, to have two nodes inserting into the same right hand side - that way, they won't contend for that particular block.

Now, since we used instance number first - we have a section of the index for each node. There is a "1" section, "2" section and so on (try to visualize the index in your head - see how all of the 1's are together, 2's together and so on....)

Then, within the 1's section - on a single node, we'd like to reduce concurrency istances (buffer busy waits basically) and ensure a single node doesn't have a hot right hand side in its section of the index. So, that is where the mod of process id or session id comes into play... We make each section of the index within an instance have a few insertion points.

The, we have that unique sequence.

If you are a single node, the leading number is not technically necessary, but it wouldn't hurt either - it would make you "RAC'able" in the future if that ever came about.

Sequence +

Jordan, February 15, 2013 - 2:16 pm UTC

Tom,

I was looking at an APEX app called Ask The Expert (which I suspect you have heard of) where there are two packages ATE_ID, ATE_RANDOM that are used in conjunction with a sequence to generate numeric IDs that are used to populate the PK/ID columns in several (most) of the tables.

I would guess that you know that the end result of ATE_ID.next_val is the ATE_SEQ.nextval followed by a number of random/changing digits from ATE_RANDOM.

I know there are differences from the approach described in c), but it seems similar. Does the ATE_ID approach help to address this same index contention issue? Or is my thinking a bit off?
Thanks.

February 15, 2013 - 5:58 pm UTC

it would be a contention free method for a single instance similar to the reverse key index above, yes. With the same drawbacks.

I went with that approach to make the ID's in the URLS much less guessable, if I used a sequence, people could probe for unpublished questions easily just by looking at the current sequence value. This made it harder (not impossible, but harder)

Multi-row insertion

DavidP, February 19, 2013 - 7:33 pm UTC

My encounter with this involved multiple new keys in a single transaction. I decided that only approach c)(different prefixes on the keys) works reasonably in that case - reverse keys and hash partitioning both make a multiple key transaction hit multiple physical index blocks, costing extra I/O compared to a key prefix that would hit the same block for each key from a single process, and they would still give some collisions (although not as many). Of course this was not designed into the application, and is now running on a RAC, making for lots of internode block shipping when there is no prefix.

Does consistent read apply undo on index blocks, or are they handled differently?

February 25, 2013 - 8:28 am UTC

read consistency applies to all data blocks - index, table, and so on. anything "data".

Thank You

Raghu, February 22, 2013 - 1:15 am UTC

Tom,

Thanks for the wonderful explanation. This will be very useful on both OLTP and OLAP databases.

Sequence based primary key performance issue

xinhuan zheng, April 27, 2013 - 5:50 pm UTC

Hi Tom,

We have production database using Sequence based primary key and it's quite a large table (~1 billion rows and still growing). What's the performance implication of that? From time to time I notice "buffer busy waits", "row lock contention", etc. wait from AWR report. Do you have good resource that is specific to this topic?

Thanks.

April 30, 2013 - 2:01 pm UTC

I wrote about in Exert oracle database architecture and expert one on one oracle - you have a hot right hand side index. everyone, every time they insert, must modify the same right hand block. that leads to buffer busy waits, index enq contention and the like.

see
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6077843300346542181

Soco, May 03, 2013 - 2:44 am UTC

Hi Tom,

Could you explain a little bit further for your three options?

a. How do you tell which one cost more after changing the index to reverse between buffer busy wait and physical IO wait. It depends on different situations i guess.

b. Hash partition the index into 64 like you said will do the job, but how does that can affect other partition ability, table partition you mean i guess. The number of table partition need to be the same as the index, is this what you are trying to say? In RAC, is it similar to solution 'a'--reverse key index?

Thanks a lot and hope you have a good day.

May 06, 2013 - 6:13 pm UTC

a) it would depend on different situations - yes.

IF the index fits into memory - then the reserve key index will not result in increased physical IO, hence it would be suitable in all cases (but #3 is still *right*)

IF the index does not fit entirely into memory and/or pushes out other stuff that was in memory before but cannot fit anymore, then the increased physical IO's may make this non-suitable for some environments.

b) it does not affect the table partitioning scheme at all. the table doesn't even have to be partitioned.

It just uses up your partitioning tool for something that is is not needed for. Use partitioning for real query performance tuning or for ease of administration. Not for something that a) doesn't quite work in all environments, b) is easily solved by a very simple design.

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Excelent thank you!

Random Number

Sequence +

Multi-row insertion

Thank You

Sequence based primary key performance issue

More to Explore

VLDB

VLDB

Real World Performance Day Unique Index case study

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Excelent thank you!

Random Number

Sequence +

Multi-row insertion

Thank You

Sequence based primary key performance issue

More to Explore

VLDB

VLDB