Skip to Main Content
  • Questions
  • criteria for selecting a partition key

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question.

Asked: July 10, 2001 - 10:08 am UTC

Last updated: January 11, 2007 - 9:25 am UTC

Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

Tom

1.Is it necessary that a partition key for a partitioned table should mandatorily have column or columns which are part of a primary key. If this is true , why is it so?

2.What is the criteria to select a partition key for a particular table. Are there any set of rules or guidelines to be considered before we choose a column or columns as the partition key.( especially when we dont have an appropriate date or number column).

3.How many local and global indexes can we have on a partitioned table( I assume any number of them can be created , but I want to be sure).
Before creating a local or global index on a partitioned table , what are the things which need to be considered.

4.Is it necessary that a global index needs to be rebuilt , if we split or merge a partition.

5.Suppose we are exchanging a partition with a table. Will the table have an index created if

1. the partitioned table had local index
2.the partitioned table had global index

Thank you


and Tom said...

1) if the primay key is NOT in the partition key -- then the primary key must use a GLOBAL index and n a data warehouse global indexes should be avoided.

Otherwise, it is perfectly OK to not have the primary key in the partition key -- just bear in mind that in order to do uniqueness your index used to enforce the primary key will be a globally partitiond index -- not local.

2) If there is no natural "ah-hah -- this is the partition key" set of columns, you are probably a candidate for HASH partitioning.

Normally, the partition key falls out very naturally, by region, by time, by code, etc. If not -- hash partitioning might be for you. You would have to look at your table and how you use it to determine this.

3) any number

in a data warehouse -- local indexes should be used and global should be avoided. Otherwise, it depends on HOW you use the data.

4) yes. most any partition level operation will invalidate a global index (split, merge, drop)

5) when you exchange a table with a partitionn, if the table has all of the indexes that support the local indexes built, they can be swapped as well. All global indexes on the partitioned table will become invalid after the switch -- they need to have some data from this new partition merged into them and will have to be rebuilt.

Rating

  (17 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

What if partitions resides on same tablespace

Shailandra, October 09, 2001 - 5:36 pm UTC

Tom

1. Will there be any performance improvement, in case table is partitioned having all partition on same tablespace compared to non-partitioned table. How does oracle parallelize fetch operation when all partitions exists on the same tablespace? I mean when one person fetching records from one partitions and another person from another partition.

2. Could updation of partition key be an important criteria? Suppose you choose a partition a table on a partition key, which is a frequently updated in the table. This may cause some overhead in terms of moving the row from one partition to another partition. This may also cause invalidation of global indexes.

Thanks

Tom Kyte
October 09, 2001 - 7:29 pm UTC

1) its not the TABLESPACE so much as the distribution of IO across your disks.

A tablespace may consist of many files -- each file can be on a separate disk. You can spread IO out by using >1 file in a tablespace.

Whether you have a multi-partition table in a single tablespace with many disks versus a non-partitioned table in a single tablespace with many disks won't matter too much performance wise.

Using >1 tablespace with a partitioned table gives you better abilities to place specific bits of data here and there, gives you higher availability, better recovery and so on. If you are using partitioning, there are good reasons to use more then one tablespace but performance is not necessarily the driving factor here.

2) it will NOT invalidate a global index (updating a partition key). Partition level operations like add, drop, split, merge -- those will invalidate the global index. A simple update -- not a chance.

Yes, a partition key that is modifiable and is in fact modified will add overhead. The update will be more like a delete/insert rather then an update as the data must move from partition to partition sometimes. It should be avoided but it is supported when you need it.

criteria for selecting a partition key

jack, June 15, 2003 - 1:11 am UTC

Hi,
we are having an order entry system and key columns in this table are

order_id (PK)
customer_id,
order_date,
order_status .....and so on.

Typically our batch programs use order_date and order_status(90% order_status column and status for an order can move to 3-4 status's based on the processing steps) combination to process the orders.
But our UI programs (99%) use customer_id to display the orders inforamation.
Can you pl. suggest me in which way to partition this table and what columns I need to include for better batch and UI response .
thanks


Tom Kyte
June 15, 2003 - 9:43 am UTC

your UI programs will be doing keyed reads by customer_id.

your batch won't.

partition the table for batch.

create a globally partitioned index for UI based on customer_id.


It is hard to say more then that generalization as I don't know all about your system, but it would look like you would want to benchmark and test partitioning the table by order_date ranges and making sure the UI applications (using indexed reads) have indexes partitioned best for them.

partition

santhanam, June 16, 2003 - 12:25 am UTC

Excellent

A reader, January 05, 2007 - 1:03 am UTC

Hi Tom,
You have written that partitioning should not be done on a primary key? Why So? I guess its better to partition on a primary key if you are accessing the table using that column only. Gets the result faster. Right?
Also again you said in the first post that In DataWareHouse, you should avoid global indexes? again why? If you are accessing the table other than the partitioning key, isnt it better to create a global index on that column?

Another doubt:
Suppose i have a table test1 partitioned on key1 and table test2(count is 1 million). i am joining the tables test1 and test2 on test1.key1=test2.col2 and putting a filter on column test1.col1. Now do i need to create a local index on col1 or a local index on (key1,col1) or a global index on col1? Which one will be better if we want the result fast? The column test1.col1 has around say 10,000 distinct values and the count of test1 is 1.5million.
Another Scenario: What if i am joining tables test1 and test 2 on test1.col_non_part=test2.col3 and putting a filter on column test1.col1 . Do I need o create a global index on col1 or combined global index on (col_non_part,col1)
Tom Kyte
January 05, 2007 - 9:52 am UTC

where did I say that about a primary key???


in the past, in really old software like 8i, global indexes would not be maintained during partition operations (drop, add, split, merge, exchange and the like)

today in 2006, that restriction is removed - global indexes can be used effectively in a warehouse.

A Reader, January 05, 2007 - 1:10 am UTC

The Version for the above post is 9.2.0.8

A Reader, January 05, 2007 - 10:03 am UTC

Hi Tom,

you didnt answer my doubt regarding the choice of global and local indexes. Please answer it.
Tom Kyte
January 05, 2007 - 10:50 am UTC

it is not clear that you even need an index - there is insufficient data to say that here.

Mohamed, January 05, 2007 - 10:17 am UTC

Partition should be done by the most relevant partition key (the column that will figure in the most important queries).

When the primary key is equal to the partition key (or is a part of it) the corresponding index should be locally partitioned.

If the partition key is not among the primary key then the corresponding index should be globally partitioned in order to avoid lock of local partitioned index (if of course you maintain local instead of global partitioned index)

For the join, I think you have to partition master tables and slave ones in the same manner. This will facilitate futur exchange partitions and will help with partition wise join

Kind Regards
Tom Kyte
January 06, 2007 - 8:02 am UTC

when the primary key is the partition key it will either be locally partitioned or not partitioned. It need not be partitioned (the index used in support of the primary key)


I don't know why you said "if the partition key is not among the primary key..." paragraph at all. That didn't make sense.

No one can really say much clearly about what should or should not be indexed or how it would be indexed in the above - there quite simply is insufficient data to answer. You would have to exhaustively list all of the situations and there would be ones where a) no indexes, none b) global partitioned would be fine c) no partitioning of index would be fine d) locally partitioned might make sense.

A reader, January 05, 2007 - 10:21 am UTC

Thanks Mohammad but one doubt.
If i have created partition on some column say primary key. Then is there any need to create an index on it?
i think if you are doing partitioning just one one column then there is no need to create any index on that column. Am i right Tom?
Tom Kyte
January 06, 2007 - 8:03 am UTC

if you partition by the primary key - there will be an index on the primary key or at least an index that includes the primary key on the leading edge - that is a fact.

it is true that in general, you need not index the partition key. However, if the partition key is the PRIMARY KEY - I can assure you that all of the columns that constitute the primary key will be in some index.

A Reader, January 07, 2007 - 11:20 pm UTC

Hi Tom,

Thanks for the clarification. What i meant was when i partition on a column, i dont have to create an index on it. But if i have defined it as a primary key, an index will automatically be created.


But you havent answered one of my question. Please help me as i need to figure out when to create local index and when to create global indexes. Here are the scenarios:

Suppose i have a table test1 partitioned on key1 and table test2(count is 1 million). i am joining the tables test1 and test2 on test1.key1=test2.col2 and putting a filter on column test1.col1. Now do i need to create a local index on col1 or a local index on (key1,col1) or a global index on col1? Which one will be better if we want the result fast? The column test1.col1 has around say 10,000 distinct values and the count of test1 is 1.5million.
Another Scenario: What if i am joining tables test1 and test 2 on test1.col_non_part=test2.col3 and putting a filter on column test1.col1 . Do I need o create a global index on col1 or combined global index on (col_non_part,col1)

Tom Kyte
January 08, 2007 - 12:17 pm UTC

I've said it before, there are only multiple answers for this.

test1 - partitioned by key1
test2 - not partitioned

join test1.key1 = test2.col2

select *
from t1, t2
where t1.key1 = t2.col2
and t1.col1 = :x;


an index on t1(key1,col1) would not make sense.

assuming you need to go to the table T1 to pick up more than just key1, an index on just col1 *might* make sense.

The "fastest" way to pick up 10,000 rowids "where t1.col1 = :x" would be a non-partitioned index or a globally partitioned index on col1 in order to limit the number of range scans to just 1 - but (big big but) - you don't mention how many partitions T1 is in - so if it is "small", doing N range scans that result in 10,000 rows all together isn't going to be that much of an overhead at all - a small bit of the work needed to be performed.

So, global, local, non-partitioned - probably won't make a bit of difference one way or the other.

A reader, January 08, 2007 - 9:53 am UTC

Hi Tom,

Please clear the above doubts.


A Reader, January 08, 2007 - 12:33 pm UTC

Hi Tom,

Needed a few clarifications.

you wrote that -- an index on t1(key1,col1) would not make sense.
Even a global index might not make sense? i am sure local index is not going to make any but why global index wont?
and does that mean if this is the case there is no need to create any index as the performance of the query cant be improved(assuming query is taking long time 20 min)

Can you Please clarify this:
"assuming you need to go to the table T1 to pick up more than just key1, an index on just col1 *might* make sense. "

The "fastest" way to pick up 10,000 rowids "where t1.col1 = :x" would be a non-partitioned index or a globally partitioned index on col1 in order to limit the number of range scans to just 1 - but (big big but) - you don't mention how many partitions T1 is in - so if it is "small", doing N range scans that result in 10,000 rows all together isn't going to be that much of an overhead at all - a small bit of the work needed to be performed.

You said above that if the no. of partitions is small, global partitioned or non-partitionerd might help. the table T1 has 15 partitions. Is this small? And if i say that Local indexes on column T1.col1 in this case wont help at all then would i b right?

Tom Kyte
January 08, 2007 - 1:23 pm UTC

look at the query:

select *
from t1, t2
where t1.key1 = t2.col2
and t1.col1 = :x;

unless we index skip scanned an index on t1(key1,col1) how would an index on those columns be useful.

You want to

a) locate rows where col1 = :x then
b) join that row to mates in t2

you would desire (probably) col1 on the leading edge, an index with key1 on the leading would do what for you??



the comment about "assuming you need to go tot he table t1..." means if T1 looks like:

t1 ( key1, col1, col99 )


if you need all three columns, we would still have to read the index and go to table T1 by rowid to pick up col99 if you included key1 in the index. Therefore an index on

t1( col1, key1 ) would be used to find a rowid and that rowid would be used to get col99

and an index on

t1( col1 ) would be used to find a rowid and that rowid would be used to get key1, col99.

So, having key1 in the index would buy you nothing (probably) given you have to go to t1 anyway.



I did not say that - what you say I said. I said if the number of partitions is SMALL the AMOUNT OF ADDITIONAL (not less - MORE - additional) work will be small.


I frankly don't think it will matter one way or the other here at all. local, global, non-partitioned.


10,000 rows retrieved from N indexes (won't be affected too much if there is ONE index or 15, the cost of getting 10,000 rows itself is the big cost)

10,000 table access by index rowid (now, there is a different between a local and global rowid - that *might* have some measurable impact, but again, pretty small here) - the 10,000 table accesses - big, that is where the bulk of the time is, not in the range scan already done.

10,000 joins to t2 - that is the big time again.

A Reader, January 08, 2007 - 11:06 pm UTC

Hi Tom,
i wont say that i got everything what you said but got most of the things. But again some doubts i have
Consider this query:
select *
from t1, t2
where t1.key1 = t2.col2
and t1.col1 = :x;

Here you are assuming that this query will return 10,000 rows but in our case, this query will return at most 100 rows. So Now do you think creating a global index on (key1,col1) would be helpfull so that it can index skip scan?
Or Just an index on Col1 would suffice as you said if we want more columns from table T1?

Actually i am pretty much confused here. Some of my colleagues are asking me to create a local index on col1 but i think that creating a local index on col1 is pure waste but a global index might help. Then again another colleague came up and said that if you create local index on (key1,col1) it might help. So Need your expertise on this. As far as my views are concerned, a global or non-partitioned index on col1 will definetely help. What do you say?
And you also told about the no. of partitions that if its small, the overhead would not be much. How many are small actually?
Tom Kyte
January 09, 2007 - 7:44 am UTC

Oh, I see I read this one wrong:

...
The column test1.col1 has around say 10,000 distinct values and the count of test1 is 1.5million.

....


so where col1 = :x will return about 1,500,000/10,000 or 150 - so everywhere I said 10,000 above should be 150.

Now, 150 is small - and to inspect 15 indexes to find an average of 10 rows in each - would likely take us from doing 5 LIO against the index (3 to read root, branch and leaf and 1 or 2 more leaves to pick up the other 150 keys+rowids) to performing probably 45 plus LIOs (3 against each of the 15 local indexes).


Now that said - a global, a non-partitioned, or a local index would definitely help find the 100 or so rows - they all would.

A global or non-partitioned index on col1 would do it in about 5 LIO's against the index.

A local index would do it in about 15 * 3 LIOs.


The global/non-partitioned would be measurably more efficient in this particular case in all probability.


creating an index on (key1,col1) in order to tune this query will not make sense - you need col1 on the LEADING EDGE, unless key1 has so so so few values as to make a skip scan possible (but that will only INCREASE THE IOs here - you want col1 on the leading edge)



so, creating a local index is NOT a pure waste and depending on other circumstances might in fact be the right answer. Suppose you frequently do partition operations (drop, truncate, whatever) to this table. If this index were a global index - that would impact those operations, especially if this were the ONLY global index (or non-partitioned). So other considerations outside of this individual query might drive them to local.

A globally partitioned index (partitioned on col1) or a non-partitioned index would result in the least LIO's overall (as there is just one index range scan, not 15) - however, the presence of this global/non-partitioned index might negatively impact other parts of this application so the fact the local index does a little more work might be an acceptable thing.


A Reader, January 09, 2007 - 8:33 am UTC


i guess i need to be more clear on this.

i have a fact table(T1)(in a datawarehouse) in which i have around 6 foreign keys and i was asking all these question so as to confirm whether to create local indexes or global indexes on these columns. And the fact table is getting loaded using "Update or Insert Strategy". So now what do you say? Local or global?
Also you said this "however, the presence of this global/non-partitioned index might negatively impact other parts of this application so the fact the local index does a little more work might be an acceptable thing. "
Can you Please provide more detail on it.
Tom Kyte
January 11, 2007 - 9:25 am UTC

you cannot answer this question given this information.

do you understand how a local index works
do you understand how a global index works
do you understand when you would want to use one over the other

if not, you need to - that is the only way to understand when to use one or the other or neither!


If you have access to my book Expert Oracle Database Architecture, I explain in some detail what/how these things work and try to relate the knowledge/understanding you need to have in order to choose when to use one over the other.

A Reader, January 10, 2007 - 7:17 am UTC

Hi Tom,

Please clarify the above doubts.

Thanks

A reader, January 11, 2007 - 8:35 am UTC

Hi Tom,

Please clarify the above doubts.

A reader, January 11, 2007 - 9:46 am UTC

Hi Tom,

Any Online Docs where i can get all these information? Say In Oracle Docs?
Actually i searched in Oracle Docs for these but didnt get the detailed information about it. May be i missed somewhere. Can you Please provide the same?

Few words about partition

Mohamed, January 11, 2007 - 10:08 am UTC

I highly recommend you to buy Tom book and Jonathan lewis one.

In order to help you a very very little bit, please read this
-- ---------------------
Partitioning can make a system run many orders of magnitude slower when it is inappropriately applied. When a SQL statement accesses one or more partitioned tables, the Oracle optimizer attempts to use the information in the WHERE clause to eliminate some of the partitions from consideration during statement execution. This process, called partition pruning, speeds statement execution by ignoring partitions that cannot satisfy the statements WHERE clause. To do so, the optimizer uses information from the table definition with information from the statements WHERE clause.

In order for the optimizer to make this type of decision in your application, the WHERE clause must reference at least one column from the set of columns that comprise the partition key. IF your partition key has no functional interest and your WHERE clauses do not reference it then the immediate consequence is that partition pruning (or partition elimination) will not occur. Partition pruning happens only if Your queries where clauses includes the partition key.

Locally partitioned indexes have the same structure as their relative table partitions. Local indexes contain partition keys that only map to rows stored in a single partition. This is a schema for locally partitioned indexes
(loc_ind here)

YOUR TABLE

PART_1 PART_2 PART_3 PART_4
loc_ind_1 loc_ind_2 loc_ind_3 loc_ind_4

Local indexes have the followings advantages(8i bases):

1.Only the local index Ind_loc_1 is affected if the only single partition Part_1 is involved in a DML statement.

2.Better performance: If the optimizer has eliminated partitions then only local indexes of the remaining partitions will be used

3.local index can be rebuilt individually

If the query does not precise the value of the partition key the optimizer will explore all existing partitions (PARTITION RANGE ALL).


When the primary key is equal to the partition key (or is a part of it) the corresponding index should be locally partitioned.


Suppose that YOUR TABLE has been range partitioned on a given dat_partition and "key primared" only by a ide, then in that case the index that should be used for the enforcement of the primary key must be a global partitioned index.

Finally you should make a difference between
locally partitioned index
globally partitioned index
global non partitioned index (normal index)

-- -----------------------
Do not forget to buy those two books. I highly recommand them to you.

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.