Skip to Main Content
  • Questions
  • Partitioning tables with unequal partitions.

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Khan.

Asked: April 07, 2008 - 3:26 pm UTC

Last updated: April 10, 2008 - 10:23 am UTC

Version: 10.2

Viewed 1000+ times

You Asked

Hi Tom,

We have data coming into our application from different entities/agencies and being loaded into several different tables. Each transmission contains data that belongs to a unique entity. We can also receive a new transmission for each entity, in which case we need to delete the previous transmission's data. My thought was to partition each table by that entity, i.e. use list partitioning, since we also get an entity id in our transmission. All our subsequent use of this data is always divided at the entity level, so the partitioning works out nicely. In the event we receive a new set of data, I can simply truncate the existing partition and reload the new data. However the size of the incoming data can vary wildly, e.g. we can have data for one entity that is in the low thousands, and for other entities that can go into the millions. My concern is that I will end up with partitions that will not be sized equally, so some partitions will be a few megs, and other partitions can be several gigs. Note that we do not have any indexes on the tables being loaded.

Is this something I need worry about? Any ideas/best practices for such a requirement? I also toyed with the idea of using a bitmap index on the entity id, so that each query against these tables will be able to use this index. At any given time I do not anticipate us having more than 20 entities. However with the variation in the size of each entity's data, I do not think I will gain much from a bitmap index, especially for the entities with larger data.

Thanks as always.

Kashif


and Tom said...

... My concern is that I will end up with partitions that will not be sized equally, so some partitions will be a few megs, and other partitions can be several gigs. ....

that is nothing to be worried about.

You are partitioning to make ease of administration easier - you have accomplished that.

Having segments that are "several gigs" is not a problem


... I also toyed with the idea of using a bitmap index on the entity id, so that each query against these tables will be able to use this index ...

why?? you have partitioned by entity, so to find all of the data for an entity would be a simple partition scan.


Having unbalanced sized partitions is perfectly OK - as long as you met the goal you wanted to meet with regards to partitioning - which it sounds like you did - easier administration.

Just because something has "low cardinality" does NOT mean "think bitmaps"....

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Missing the point?

A reader, April 08, 2008 - 8:03 am UTC

"as long as you met the goal you wanted to meet with regards to partitioning - which it sounds like you did - easier administration."

Why do you assume the user's goal is easier administration? I don't see that at all. I think the user is looking for better performance since he also talks about using bitmaps indexes which has nothing to do with easier admin.

I also see the user's concern regarding unbalanced list partitions. Yes, to find the data for an entity is a simple partition scan but seems to me it will take longer to scan a partition with several gigs than a partition with a few megs. I think this is the whole point, not easier admin.


Tom Kyte
April 09, 2008 - 2:12 pm UTC

because EVERYTHING mentioned in the question was about "ease of adminstration"

They stated they partitioned to truncate in the event of a reload. That is by definition ease of administration.

everything they said regarding partitioning was about ease of administration.

there was this red herring of a bitmap index - something that would not have done a thing for performance (rather the opposite in this case)

Didn't miss the point

Mike, April 08, 2008 - 8:52 am UTC

Given that the questioner said:
In the event we receive a new set of data, I can simply truncate the existing partition and reload the new data.
(and other related points)

and said nothing about using partitioning for a performance goal, I would also have concluded that 'ease of administration' in managing the entity sets of data was the primary motivation here.

Performance + ease of administration.

Kashif, April 08, 2008 - 10:11 am UTC

To the reviewers,

Thanks for your input. My prime concern is performance, administrative ease is secondary, though I don't see why addressing what my objective is necessary here, the answer would still have been the same?

To Mike from Cleveland,

Truncating partitions is also a performance gain, consider the alternative of having to delete from an unpartitioned table using an entity id. With the partition truncate, I can get rid of the entity's data easily, instead of potentially full-scanning the table to retrieve the rows to be deleted.

Kashif

Tom Kyte
April 09, 2008 - 2:23 pm UTC

...Truncating partitions is also a performance gain, consider the alternative of ...

not day to day - second to second. As a person that works with Oracle a lot - the only thing I could derive from your original problem statement was "ease of administration"

Hmm...

Kashif, April 09, 2008 - 4:54 pm UTC

I'm not sure what you mean by "not day to day - second to second". Truncating a partition would typically be quicker than deleting, no?

Kashif

Tom Kyte
April 10, 2008 - 10:23 am UTC

meaning, this is an administrative task you would perform once in a blue moon - no frequently. One would not think of it as a "day to day, second to second performance improvement"

rather, it makes an administrative task easier and more efficient to perform, that is all.

Hopefully, this is something that hardly ever happens (else I'd be looking hard and long at my process and figuring out why it is so broken)

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.