A Step-by-Step Introduction for Developers
Partitioning in the database reflects the same way we handle large tasks in the real world. When a task gets too large to tackle in one hit, whether it be trimming a tree, taking a long drive, or washing the dishes, we can split the task up into smaller pieces to make them more manageable. It can be as simple as trying to relocate one's catalog of Oracle technical books!
Partitioning is that same thought applied to data stored in the database. As the demands to store more and more data in the database increase, the performance of operations against those large tables can suffer. And in applying the Pareto principle to the storage of data, typically it is only a subset of the entire dataset that is actively worked upon to satisfy the day to day needs of our businses users. Using the Partitioning option in the Oracle Database, data can be segmented into smaller, more manageable chunks to make maintenance tasks easier for Database Administrators, but also give scope for better application performance via more efficient execution of the queries being issued by Application Developers.
There are different types of partitioning options available to cater for specific business requirements. There might be a requirement to break up SALES data into each calendar year. Or there might be information being gathered on popular SPORTS that will be separated because minimal cross-sport queries will ever be run. Or a table of cell phone CALLS might just be so large that it needs to be evenly scattered across smaller segments to keep them at a manageable size. All such options are possible with the Oracle Partitioning option.
There are other partitioning strategies as well for more esoteric requirements, including partitioning strategies between tables linked by referential integrity, and multi-dimensional forms of partitioning (partitions of partitions).
You get started by simply heading over to a free service from Oracle called livesql.oracle.com. This service lets you run SQL and create database objects with no software required other than your browser. There are also hundreds of sample scripts and tutorials on a multitude of topics to assist you with learning about the Oracle Database
Oracle Partitioning is also available as a fully supported feature of all Oracle Database Cloud Services and on-premise Oracle Database Enterprise edition. Here is a quick primer on LiveSQL by Oracle Vice President Mike Hichwa:
Oracle LiveSQL features:
LiveSQL is completely free for any usage. Sign up is free, quick and easy.
The SQL you write in LiveSQL can be metadata tagged, saved, shared with others, or shared to the entire Oracle community.
LiveSQL contains hundreds of tutorials written by experts both inside and outside Oracle Corporation to fast track your productivity.
LiveSQL runs on the latest version of the Oracle Database, so you can safely test out new features before you upgrade your own systems.
Perhaps the most common example of partitioning in Oracle databases is to divide up a large data into partitions based on a time attribute. The largest tables in your applications are often a time-based record of critical business activities. It could be sales transactions for a retail business, mobile phone calls for a telecommunications business, or deposits and withdrawals for a banking institution. In all of these cases, there are a couple of common elements, namely each transaction (row) in the table has a time stamp of when the transaction occurred, and typically the volume of such transactions is high, making the table large in size in a short period of time. Partitioning is a natural fit for such tables, because queries often want to only peruse time-based subsets of the data, for example, transactions for the current month, or current week. Also, breaking the large table into smaller more manageable sized pieces is useful for adminstrators from the perspective of maintenance. Time is a continuous (analog) measurement, and thus, a large table would be segmented into time-based ranges, hence the term used for this opertaion is range based partitioning. This video walks you through the creation a simple range partitioned table.
The keyword that defines that a table is partitioned is PARTITION BY which follows the normal table column definitions. The clause BY RANGE nominates the type of partitioning scheme that the table will use.
Range partitions are not given lower and upper bounds, only an upper bound. The lower bound is implicitly defined as the upper bound of the previous partition. A partition can contain values up to but not including the value nominated in the VALUES LESS THAN clause.
After the PARTITION BY clause, at least one partition must always be defined.
The data dictionary tracks all of the defined partitions for a table. USER_TAB_PARTITIONS displays one row per defined partition for each partitioned table in your schema.
Even with just a simple range partitioning example, we have enough tools at our disposal to examine the potential performance benefits possible by partitioning a table. When SQL queries consume too much I/O, often the only resolution considered is to create indexes on the table. The premise of indexing is simple: locate data more quickly and avoid scanning data unnecessarily. Partitioning a table takes the same approach via a concept known as "pruning" or "elimination". If a query on a partitioned table contains appropriately phrased predicates that include the partitioning column(s), the optimizer can generate an execution plan that will bypass those partitions that by definition, could not contain data relevant to the query. The following video shows a demonstration of this, including a comparison of the cost of partition pruning versus a conventional indexing strategy.
If the optimizer can eliminate partitions from consideration, query performance can be dramatically improved. In later videos, you will see how you can interpret the optimizer execution plan output to determine if partition elimination will occur for a given SQL query.
You can use the DUAL technique from the video to generate arbitrary test data for any table, partitioned or otherwise. As per the video, keep the number of rows generated from a single DUAL CONNECT BY query to less than tens of thousands, and use cartesian joins if you need to scale beyond that. See Tanel Poder's blog post on how PGA memory is impacted by these queries for the reason why you should not go to extremes.
In some circumstances, partitioning a table allows for existing indexes to be consolidated or dropped, which can reduce overall database size, and improve insert, update and delete performance.
For many tables, a single column accurately defines the ranges via which you might want to partition your data. But range partitioning in the Oracle Database is not limited to that.
You can nominate more than one column to be the partitioning "key", ie, the columns that define where data will be stored. The order of the columns is critical as the video demonstration shows:
Secondary and tertiary columns in the partitioning definition are only used as "tie breaker" values. When inserting a row into a multi-column range partitioned table, the first column in the partitioning key is used to determine the partition into which to store the row. If multiple partitions have the same value for the first column, then the second partitioning column is used and so forth. Multiple columns in the partition key does not a "matrix" or "n-dimensional" structure of partitions.
One of the examples in the video uses a NUMBER datatype to store date based information. As the video states, this is generally a poor design idea. See Storing Dates by Richard Foote for more examples of why you might want to rethink such an approach in your databases.
In the same way that USER_TABLES contains various columns to reflect the current optimizer statistics, USER_TAB_PARTITIONS also contains this information at a partition level. Thus you can use NUM_ROWS, BLOCKS et al to get information on the volumne of each partition, the accuracy being limited to the timing/granularity of statistics gathering with DBMS_STATS.
Sometimes partitioning is not about a logical segmentation of a table based on its attributes as is the case for range partitioning. When any database table gets massive in size, it is harder to manage for adminstrators because maintenance on such a table typically involves longer downtime for your business applications. Hash partitioning lets you segment a table in equi-sized chunks based on a hashing function applied to one or more columns of the table.
Creating a hash partitioned table is easy to do, but the number of hash partitions nominated is critical, as the video demonstration shows:
For equi-sized partitions, the number of partitions must be a power of 2.
The hashing algorithm is not documented but the ORA_HASH function is observed to return results consistent with the data segmentation that occurs with hash partitioning.
A partition can be split using the ALTER TABLE SPLIT PARTITION command, which is a task normally done by the Database Administrator. Spliting a partition is a resource heavy activity as potentially an entire partition's worth of data will be moved.
Range partitioning, as the name suggests, is about carving up data that is analog in nature, that is, a continuous range of values. This is why dates are a natural candidate for a range-based partitioning scheme.
But sometimes the column you might want to partition on contains a discrete set of values, which is when LIST partitioning is the best solution. Creating a LIST partitioned table requires nominating the discrete values for each partition, or relying on the new AUTOMATIC clause in 12c Release 2. The explicit value nomination technique is shown in the following video demonstration:
A single partition can contain one or more discrete values.
A "catch all" partition can be defined using the DEFAULT clause in the VALUES statement. Nulls also go into this partition.
A DML statement can explictly nominate a partition in a table to act upon using the PARTITION clause following the table name.
Even once partitioned, a table may still be extremely large. Or the partitioning scheme may result in partitions that are not equally sized on disk. For example, archiving off of older data might mean that historical partitions are far smaller than the current partitions. Equi-sized partitions might be beneficial in particular when it comes to performing operations in paralle on a per-partition basis.
Partitions can be segmented further into subpartitions. Each partition can have it's own subpartitioning scheme, or all partitions can share a common schema. The syntax for subpartitioning is explained in the following video demonstration:
The subpartitioning scheme can be different to the parent partitioning scheme.
Each partition can have its own subpartition definition. It is also valid for a partition to have no subpartitions.
The SUBPARTITION_POSITION column in USER_TAB_SUBPARTITIONS refers to the position of a subpartition within the parent position.
Subpartition templates make it easy to enforce a common scheme for all partitions and reduce the size of DDL scripts.
The existence of a subpartition determines whether a partition is a physical segment on disk, or just a logical collection of physical subpartition segments.
For incoming data, a partition must already exist that the incoming partitioning key would map to. If a partition does not exist, transactions will fail with ORA-14400. In earlier versions of Oracle Database, this meant that administrators had to take extreme care to ensure that partitions for future data were defined or risk application outages. Interval partitioning (introduced in Oracle Database 11g) removes this risk by automatically creating new partitions on a partitioned table as required when new data arrives.
Interval partitioned tables are different from range partition in that logical "gaps" in the partition coverage are permitted. How to define interval partitioned tables is explained in the following video demonstration:
As partitions are dynamically created, system generated names are assigned to the new partitions. They can be renamed to meet existing business standards.
Partitions are fixed in partitition key range size, namely the size of the interval. Unlike range partitions, the upper and lower bounds are defined by the interval, not the adjoining partitions.
If the name of an interval partition is not known, it can be referenced using the FOR ( key-value ) syntax.
Existing range partitioned tables can be converted into interval partition tables with a simple conversion command, requiring no downtime or data migration. The existing partitions in the range partitioned table remained defined as "range partitions" whilst new partitions created dynamically are "interval partitions". Thus a table that was not initially created as an interval partitioned table is a hybrid between the two, containing both types of partitions. Because intervals are defined as an offset from an initial fixed range partition boundary, you cannot drop all of the range partitions in an interval partitioned table. At least one must always remain. In Oracle Database 12c Release 2, the database will automatically convert the oldest interval partition into a range partition if no initial range partition boundary can be found.
How to convert a range partitioned table to an interval partitioned one is explained in the following video demonstration:
Simply specifying the desired interval via ALTER TABLE ... SET INTERVAL converts a range partitioned table to an interval partitioned one.
On a converted table, the INTERVAL column on USER_TAB_PARTITIONS indicates whether each partition is a range or interval partition.
At least one range partition must remain in the table. Before Oracle Database 12c Release 2, re-running the SET INTERVAL command will mark existing interval partitions as range partitions.
INTERVAL partitioning is great for range partitioning to avoid the need for regular maintenance by a DBA to ensure that all values can be stored. The same facility is available for LIST partitioned tables from Oracle Database 12c Release 2 onwards. In this way, you can ensure that legal data values that have not been defined as partition keys can automatically create partitions on the fly.
The syntax and usage for automatic lists is explained in the following video demonstration:
At least one static partition must still be defined
If needed, a partition can still be defined to hold null values
Automatic partitions are best suited for a finite set of distinct values. Don't fall into the trap of millions of partitions
Tables in a relational database do not work in isolation, and link them via declarative referential integrity. For this reason, a large table might not contain the column upon which we wish to partition it by. For example, a SALES table may be partitioned by a SALES_DATE, but a child table (say) SALES_ITEMS may only have a foreign back to the the parent SALES table, and thus no SALES_DATE column to partition on. Reference partitioning can be used to handle these more complex designs.
Reference partitions are demonstrated in the following video:
The foreign key on the child table defines the partitioning key
If you need to truncate a parent and its child, the CASCADE option in the TRUNCATE command can be used to bypass the usual foreign key checks
There are some restrictions when tables are referenced partitioned due to the closely coupled nature of their data
An index normally points to a row in a physical segment (a table). But a partitioned table consists of multiple physical segments, so the index structures and the data contained within the index need to be slightly different for partitioned tables. If the index spans all partitions, what happens when we perform maintenance on a single partition, such as dropping it? What happens to the index entries that reference the now dropped partition?
Indexes syntax and their relationshiup to partitioned tables is covered in the following video:
The index spans all partitions, which means the ROWID for each index entry is larger to reference the partition (the database object) as well as where the row is within that partition
Modifying table partitions (truncate, split, merge, drop etc) can mark the index to a status of UNUSABLE. By default, no errors are raised when this happens, your query plans just no longer use the index. The UPDATE INDEXES clause can be beneficial here
In 12c and above, DROP and TRUNCATE partition maintenance is less impactful with global indexes since index cleanup is done in the background.
An index can be equipartitioned with its underlying table. Such an index is known as a local index. Local indexes can have significantly benefits when it comes to Information Lifecycle Management (ILM). Because table partitions operations such as TRUNCATE and DROP and isolated to a single index partition, the remainder of the index remains available and never needs unncessary maintenance. Local indexes are also lend themselves to easier partition exchange which is a useful technique for either archiving old data or introducing new data into a partitioned table with zero downtime.
Then benefits of local indexes are explained in the following video demonstration:
Use the STATUS column in USER_IND_PARTITIONS to keep track of index partition usability
Local index partitions can use smaller ROWIDs because they only need to reference a single segment
You can nominate which table partitions will be indexed and which will not to best suit your application requirements
Take advantage of compression and/or in-memory to best adjust the indexing needs on a partition by partition basis
The previous two videos discuss several apparent limitations of global indexes and plenty of advantages of local indexes. This seems to suggest that local indexes should always be used in preference to global indexes. This is not the case, and each indexing technique should be employed to best match the application requirements you have. A local index can be the absolutely wrong choice for particular kinds of queries. However, sometimes you may need to choose some design compromises if you want to get the best of both worlds - partition independence but still with solid declarative database design.
Deciding on which index type to use is explained in the following video demonstration:
Be careful that index lookups do not degrade in performance by incorrectly choosing a local index partitioning strategy.
Sometimes bringing the partitioning key into the physical design of the primary key can be beneficial even if the logical database design does not require it.
The asynchronous cleanup of global indexes makes the distinction between local and gloval indexes less pronounced.
This tutorial series has covered the common partitioning strategies adopted by developers to build successful applications on large volumes of data. However, there are niche cases that also need to be considered. For example, you could have an index that partitioned on a table that is not partitioned. Similarly, the index partitioning strategy might not align with the partitioning strategy for the underlying table. Such examples are typically rare, however there is one important case where hash partitioning an index is critical to achieve extreme levels of OLTP performance.
The hash partitiong technique for OLTP performance is explained in the following video demonstration:
The partitioning for an index does not need to match the partition of the table it is based on, but this is rare.
Partitioning is an effective means of distributing activity over multiple segments to reduce contention for common blocks of data.
Whereas most partitioning syntax is simply PARTITION BY, to partition an index you need to use the GLOBAL keyword as a prefix.
Perhaps the most attractive benefit of partitioning is that performance improvements that can be achieved when querying partitioned tables. Partition pruning is the term used to describe when the predicates of a query are such that the entire table does not need to be scanned, but only a subset of the partitions. Detecting partition pruning relies on understanding the execution plan for a given query.
Various examples of when partitioning pruning can occur (and when it cannot) are explained in the following video demonstration:
The Pstart/Pstop columns in execution plan show the range of partitions scanned where possible. When KEY is displayed, this means the decision is made at execution time not at parse time.
Partition pruning works with bind variables as well as literal values for the partition key predicates.
Since interval partitioned tables already have all 1 million partitions logically defined, Pstart/Pstop values can appear misleading.
Pruning can occur with equality predicates, range predicatess, in-list expressions and many other permutations.
Partitioned tables are rarely queried in isolation, and thus gaining the benefits of partition pruning even when a partitioned table is joined to another table, or involved with a subquery is a critical feature of the Oracle Database. By utilising a memory structure know as a Bloom filter, the database can quickly identify which partitions will be needed to satisfy a given query.
Detecting Bloom filter usage and understanding the execution plan is explained in the following video demonstration:
A :BF[digit] entry in the plan indicates a Bloom filter is being used
Bloom filters use an efficient memory structure to identify which partitions can be pruned.
With a Bloom filter, a false positive is possible so a join operation may still process slightly more data that it needs to, but false negatives are impossible, thus incorrect results from your query cannot occur.
When two tables have identical partitioning definitions, the database can take advantage of this knowledge. If we had tables of animals, then commonsense tells us we will never match breeds between the dogs partition in one table and the cats partition in the other table. A join should be able to detect when two disparate partitions could never have a matching row, and therefore be eliminated from a join operation. This is known as a partition-wise join.
Partition-wise joins are explained in the following video demonstration:
The partition definitions must align exactly for a full partition-wise join to occur
Reference partitions are perfect candidates for partition-wise joins since the partitions between the two related tables are identical by definition
Understanding the placement of the JOIN and PARTIION lines in the execution plan to detect a partition-wise join.
You are ready to start taking advantage of the Oracle Partitioning option. Check the resources below to keep your partitioning knowledge growing
Partitioning features have been in the database for over 20 years, since Oracle Database 8.0. Partitioning has its very own manual in the Oracle Documentation, and there are many other sources of information to boost your knowledge.