Jon -- Thanks for the question regarding "Local versus Global indexes on daily changing partitioned table", version 10.2.0.2
Submitted on 17-Apr-2007 1:49 Central time zone
Last updated 8-Sep-2011 17:50
You Asked
I have this logging table which has a few million records per partition.
CREATE TABLE LOG_EXAMPLE
(TRANSACTION_ID NUMBER PRIMARY KEY NOT NULL,
APPLICATION_ID NUMBER NOT NULL,
LOG_TIME DATE DEFAULT sysdate NOT NULL,
ERROR_MESSAGE VARCHAR2(2000),
LOG_TYPE VARCHAR2(10) NOT NULL,
WARNING_MESSAGE VARCHAR2(2000),
USER_ID VARCHAR2(30) NOT NULL)
PARTITION BY RANGE (LOG_TIME)
(
PARTITION TRAN_PART_2007_03_18 VALUES LESS THAN (TO_DATE('2007-03-19', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_19 VALUES LESS THAN (TO_DATE('2007-03-20', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_20 VALUES LESS THAN (TO_DATE('2007-03-21', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_21 VALUES LESS THAN (TO_DATE('2007-03-22', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_22 VALUES LESS THAN (TO_DATE('2007-03-23', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_23 VALUES LESS THAN (TO_DATE('2007-03-24', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_24 VALUES LESS THAN (TO_DATE('2007-03-25', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_25 VALUES LESS THAN (TO_DATE('2007-03-26', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_26 VALUES LESS THAN (TO_DATE('2007-03-27', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_27 VALUES LESS THAN (TO_DATE('2007-03-28', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_28 VALUES LESS THAN (TO_DATE('2007-03-29', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_29 VALUES LESS THAN (TO_DATE('2007-03-30', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_30 VALUES LESS THAN (TO_DATE('2007-03-31', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_03_31 VALUES LESS THAN (TO_DATE('2007-04-01', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_01 VALUES LESS THAN (TO_DATE('2007-04-02', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_02 VALUES LESS THAN (TO_DATE('2007-04-03', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_03 VALUES LESS THAN (TO_DATE('2007-04-04', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_04 VALUES LESS THAN (TO_DATE('2007-04-05', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_05 VALUES LESS THAN (TO_DATE('2007-04-06', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_06 VALUES LESS THAN (TO_DATE('2007-04-07', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_07 VALUES LESS THAN (TO_DATE('2007-04-08', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_08 VALUES LESS THAN (TO_DATE('2007-04-09', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_09 VALUES LESS THAN (TO_DATE('2007-04-10', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_10 VALUES LESS THAN (TO_DATE('2007-04-11', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_11 VALUES LESS THAN (TO_DATE('2007-04-12', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_12 VALUES LESS THAN (TO_DATE('2007-04-13', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_13 VALUES LESS THAN (TO_DATE('2007-04-14', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_14 VALUES LESS THAN (TO_DATE('2007-04-15', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_15 VALUES LESS THAN (TO_DATE('2007-04-16', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_16 VALUES LESS THAN (TO_DATE('2007-04-17', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_17 VALUES LESS THAN (TO_DATE('2007-04-18', 'YYYY-MM-DD')),
PARTITION TRAN_PART_2007_04_18 VALUES LESS THAN (TO_DATE('2007-04-19', 'YYYY-MM-DD')));
CREATE SEQUENCE TRANSACTION_ID_SEQ;
CREATE OR REPLACE TRIGGER T_LOG_EXAMPLE_BI
BEFORE INSERT
ON LOG_EXAMPLE
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
begin
if :new.transaction_id is null then
select transaction_id_seq.nextval
into :new.transaction_id
from dual;
end if;
end t_transaction_bi;
CREATE INDEX LOG_TIME_IDX ON LOG_EXAMPLE (LOG_TIME, APPLICATION_ID);
--users want to sort by date within a parition and only get the top 100 results.
CREATE INDEX APPLICATION_IDX ON LOG_EXAMPLE (APPLICATION_ID);
There is row level security based on the application_id. That is why the indexes include the application_id column. Plus there are some other fields and foreign keys I removed to make this simple.
On a daily basis, a job runs to drop the oldest partition and add a new one.
--drop partition snippet
for v_rec in cur_drop_partitions loop
execute immediate ('alter table LOG_EXAMPLE drop partition ' || v_rec.partition_name || ' UPDATE GLOBAL INDEXES');
end loop;
--add partition snippet
for i in 1 .. v_diff loop
execute immediate('ALTER TABLE LOG_EXAMPLE ' ||
'ADD PARTITION TRAN_PART_' || to_char(v_max_date + i, 'yyyy_mm_dd') || ' ' ||
'VALUES LESS THAN (''' || to_char(v_max_date + i + 1, 'dd-mon-yyyy') || ''') ' ||
'TABLESPACE ' || v_tablespace);
end loop;
We are finding out that most users are searching for one particular day or one log type or combination of log type and day.
I understand the benefit of adding "update global indexes" so users don't have an outage. My question is, which is better given the fact that my table is changing daily plus, most queries are limited to a single partition?
Maybe the solution is a mix of local and global indexes?
and we said...
neither is "better"
the question you have to answer is:
a) do I want fast ddl, but an outage
b) can I deal with slow ddl, but provide continuous access
<quote src=Expert Oracle Database Architecture>
¿Live¿ Global Index Maintenance
Starting in Oracle9i, another option was added to partition maintenance: the ability to maintain the global indexes during the partition operation using the UPDATE GLOBAL INDEXES clause. This means that as you drop a partition, split a partition, perform whatever operation necessary on a partition ¿ Oracle will perform the necessary modifications to the global index to keep it up to date. Since most partition operations will cause this global index invalidation to occur, this feature can be a boon to systems that need to provide continual access to the data. You¿ll find that you sacrifice the raw speed of the partition operation, but with the associated window of unavailability immediately afterward as you rebuild indexes, for a slower overall response time from the partition operation but coupled with 100 percent data availability. In short, if you have a data warehouse that cannot have downtime, but must support these common data warehouse techniques of sliding data in and out, then this feature is for you¿but you must understand the implications.
Revisiting our previous example, if our partition operations had used the UPDATE GLOBAL INDEXES clause when relevant (in this example, it would not be needed on the ADD PARTITION statement since the newly added partition would not have any rows in it)
ops$tkyte@ORA10G> alter table partitioned
2 exchange partition fy_2004
3 with table fy_2004
4 including indexes
5 without validation
6 UPDATE GLOBAL INDEXES
7 /
Table altered.
ops$tkyte@ORA10G> alter table partitioned
2 drop partition fy_2004
3 UPDATE GLOBAL INDEXES
4 /
Table altered.
ops$tkyte@ORA10G> alter table partitioned
2 add partition fy_2006
3 values less than ( to_date('01-jan-2007','dd-mon-yyyy') )
4 /
Table altered.
ops$tkyte@ORA10G> alter table partitioned
2 exchange partition fy_2006
3 with table fy_2006
4 including indexes
5 without validation
6 UPDATE GLOBAL INDEXES
7 /
Table altered.
we would have discovered the indexes to be perfectly valid and usable both during and after the operation:
ops$tkyte@ORA10G> select index_name, status from user_indexes;
INDEX_NAME STATUS
------------------------------ --------
FY_2006_IDX VALID
FY_2004_IDX VALID
PARTITIONED_IDX_GLOBAL VALID
PARTITIONED_IDX_LOCAL N/A
6 rows selected.
ops$tkyte@ORA10G> set autotrace on explain
ops$tkyte@ORA10G> select count(*)
2 from partitioned
3 where timestamp between sysdate-50 and sysdate;
COUNT(*)
----------
6750
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=9 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 INDEX (RANGE SCAN) OF 'PARTITIONED_IDX_GLOBAL' (INDEX) (Cost=9¿
But there is a tradeoff: we are performing the logical equivalent of INSERT and DELETE operations on the global index structures. When we drop a partition, we have to delete all of the global index entries that might be pointing to that that partition. When we did the exchange of a table with a partition, we had to delete all of the global index entries pointing to the original data and then insert all of the new ones that we just slid in there. So the amount of work performed by the ALTER commands was significantly increased.
In fact, using runstats and a slightly modified version of the preceding example, we can measure the amount of ¿extra¿ work performed to maintain the global indexes during the partition operation. We¿ll slide out FY_2004 and slide in FY_2006 as before, adding in the requisite index rebuild. This will be the sliding window implementation that causes the data to become unavailable due to the need to rebuild the global indexes. We¿ll also then slide out FY_2005 and slide in FY_2007 using the UPDATE GLOBAL INDEXES clause, to emulate the sliding window implementation that provides for full data availability, even during the partition operations. In that manner, we can measure the same operations using the different techniques and compare them. Our expected outcome will be that the first approach will consume fewer database resources and therefore perform ¿faster¿ but incur a measurable period of ¿downtime.¿ The second approach, while consuming more resources and perhaps taking longer overall, will not incur downtime. As far as the end users are concerned, their ability to work never ceased. They might have been processing a bit slower (since we were competing with them for resources), but they were still processing, and they never stopped.
So, if we take the earlier example, but also create an empty FY_2005 table like FY_2004, and a full FY_2007 table like FY_2006, we can measure the differences between the index rebuild approaches, starting first with the ¿less available approach¿:
exec runStats_pkg.rs_start;
alter table partitioned exchange partition fy_2004
with table fy_2004 including indexes without validation;
alter table partitioned drop partition fy_2004;
alter table partitioned add partition fy_2006
values less than ( to_date('01-jan-2007','dd-mon-yyyy') );
alter table partitioned exchange partition fy_2006
with table fy_2006 including indexes without validation;
alter index partitioned_idx_global rebuild;
exec runStats_pkg.rs_middle;
Here is the highly available UPDATE GLOBAL INDEXES approach:
alter table partitioned exchange partition fy_2005
with table fy_2005 including indexes without validation
update global indexes;
alter table partitioned drop partition fy_2005
update global indexes;
alter table partitioned add partition fy_2007
values less than ( to_date('01-jan-2008','dd-mon-yyyy') );
alter table partitioned exchange partition fy_2007
with table fy_2007 including indexes without validation
update global indexes;
exec runStats_pkg.rs_stop;
We might observe the following:
ops$tkyte@ORA10G> exec runStats_pkg.rs_stop;
Run1 ran in 81 hsecs
Run2 ran in 94 hsecs
run 1 ran in 86.17% of the time
Name Run1 Run2 Diff
...
STAT...CPU used when call star 39 59 20
...
STAT...redo entries 938 3,340 2,402
STAT...db block gets 1,348 5,441 4,093
STAT...session logical reads 2,178 6,455 4,277
...
LATCH.cache buffers chains 5,675 27,695 22,020
...
STAT...table scan rows gotten 97,711 131,427 33,716
STAT...undo change vector size 35,100 3,404,056 3,368,956
STAT...redo size 2,694,172 6,197,988 3,503,816
The index rebuild approach did run faster, both as observed by the elapsed time and the CPU time. This fact has caused many a DBA to pause and say, ¿Hey, I don¿t want to use UPDATE GLOBAL INDEXES¿it¿s slower.¿ That is too simplistic of a view, however. What you need to remember is that while the operations overall took longer, processing on your system was not necessarily interrupted. Sure, you as the DBA might be looking at your screen for a longer period of time, but the really important work that takes place on your system was still taking place. What you need to do is see if this tradeoff makes sense for you. If you have an eight-hour maintenance window overnight in which to load new data, then by all means, use the rebuild approach if that makes sense. However, if you have a mandate to be available continuously, then the ability to maintain the global indexes will be crucial.
Looking at the redo generated by each approach, we can see that the UPDATE GLOBAL INDEXES generated considerably more¿over 230 percent more¿and we would expect that to only go up as we add more and more global indexes to the table. The redo generated by the UPDATE GLOBAL INDEXES is unavoidable and cannot be turned off via NOLOGGING, since the maintenance of the global indexes is not a complete rebuild of their structure but more of an incremental ¿maintenance.¿ Additionally, since we are maintaining the live index structure, we must generate undo for that¿in the event the partition operation fails, we must be prepared to put the index back the way it was. And remember, undo is protected by redo itself, so some of the redo you see generated is from the index updates and some is from the rollback. Add another global index or two and you would reasonably expect these numbers to increase.
So, UPDATE GLOBAL INDEXES is an option that allows you to trade off availability for resource consumption. If you have the need to provide continuous availability, it will be the option for you. But you will have to understand the ramifications and size other components of your system appropriately. Specifically, many data warehouses have been crafted over time to use bulk, direct path operations, bypassing undo generation and, when permitted, redo generation as well. Using UPDATE GLOBAL INDEXES cannot bypass either of those two elements. You¿ll need to examine the rules you used to size these components before using this feature, so you can assure yourself it can work on your system.
</quote>
April 17, 2007 - 1pm Central time zone
Reviewer: Jon
I am a little more interested in when I would pick a local index versus a global one. In one OTN document I read, it stated that for a warehouse, you don't want global indexes because of partitioning. So at what point do you pick a local index over a global one? Just when you are adding/removing partitions frequently? Or maybe when you are adding/removing partitions in a Warehouse?
Global Indexes Pros
* Faster than local indexes when not doing partition elimination
Global Indexes Cons
* Requires "update global indexes" which then creates more redo
* The optimizer may pick a global index over partition elimination
* May require rebuilding as we drop and add a partition daily
Local Indexes Pros
* Local to each partition
* Less redo generated as we drop and add partitions
* Partition elimination not avoided when using the index
Local Indexes Cons
* Slower than a global index when not doing partition elimination
I am leaning towards moving to local indexes because most times, people search for one partition at a time and we are adding and removing partitions on a daily basis.
Thanks!!!
Followup April 18, 2007 - 11am Central time zone:
those pro/con's are not universally true. IT DEPENDS
and using todays software, global indexes do not need go invalid - they are a crucial part of warehouses. really old papers, when partitioning first came out, yeah - they would say avoid them if you can.
Update Global Indexes
April 18, 2007 - 4am Central time zone
Reviewer: Jignesh
.. And for Partition Exchange.. we need primary key on source and target tables should be same.
Consider scenario when we do not have any local indexes on target table. In this case, we need to build the indexes on source table, do exchange partitions and then UPDATE GLOBAL INDEXES again ! ! bit of overhead isn't it?
Followup April 18, 2007 - 12pm Central time zone:
so - it is "online"
Update global index clause
March 9, 2010 - 10am Central time zone
Reviewer: satin satin
While creating the partition table or global index can we use "Update global index clause" .So that
it will prevent in feature if drop any partitions.Because everymonth we have to manage the
partititons which are really bothering us and jobs are abending because of the indexes are disabled
. Any suggestions will be helpfule
some more help in understanding global nonprefixed indexes
June 29, 2010 - 1pm Central time zone
Reviewer: jv
Dear Tom,
good day to you, if you can please help me in understanding the below statement from one of the
Oracle documents.
Quote
Global Nonprefixed: it is not possible to create a nonprefixed global index, because there are no
management or performance benefits when compared to a non-partitioned index.
Unquote
I believe until Oracle 8i Global Nonprefixed indexes were not supported but in later releases they
are supported, correct me if I am wrong here.
Also if they are supported is it true "they offer no management or performance benefits...". I am
confused with this statement from the document.
Thanking you for your help on this.
Regards,
jv
Followup July 6, 2010 - 1pm Central time zone:
No, it is not
ops$tkyte%ORA11GR2> create table t ( x int, y int, z int )
2 partition by range(x)
3 (partition p1 values less than (1),
4 partition p2 values less than (2),
5 partition p3 values less than (3)
6 );
Table created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create index global_idx on t(x) global partition
2 by range(x)
3 (partition pi1 values less than (2), partition pi2 values less than (3), partition junk values
less than (MAXVALUE ));
Index created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create index global_idx2 on t(y,x) global partition
2 by range(x)
3 (partition pi1 values less than (2), partition pi2 values less than (3), partition junk values
less than (MAXVALUE ));
by range(x)
*
ERROR at line 2:
ORA-14038: GLOBAL partitioned index must be prefixed
the partition key for the global index must be on the leading edge.
The comment refers to the fact that if the second index DID WORK, then a query of the form
where y = ?
where y > ?
...
and others would have to scan each and every partition, there would be nothing gained from partitioning this index by X.
How about non-prefixed local indexes?
August 10, 2011 - 7am Central time zone
Reviewer: Kulkarni from India
Hi Tom,
Good day. I have a scenario where I need to partition a table on date column but most of the
queries do not use that column as the leading column. Also there is no unique constraint on the
column that is used in the predicate of most of the queries hence I can not make it a primary key.
Here is the script I used to create a partitioned table and later inserted rows from the original
non-partitioned table into this new partitioned table.
CREATE TABLE DATASTREAM.DS_PRICES_PART
( DS_ID VARCHAR2(9) NOT NULL ENABLE NOVALIDATE,
UPDATE_STAMP DATE NOT NULL ENABLE NOVALIDATE,
MVALUE NUMBER(14,3),
NOSH NUMBER(15,3),
PRICE NUMBER(15,3),
PHIGH NUMBER(15,3),
PLOW NUMBER(15,3),
POPEN NUMBER(15,3),
VOLUME NUMBER(13,0),
PRICE_US NUMBER(15,3),
PHIGH_US NUMBER(15,3),
PLOW_US NUMBER(15,3),
POPEN_US NUMBER(15,3)
)
PARTITION BY RANGE (UPDATE_STAMP)
(PARTITION DS_PRICES_p1 VALUES LESS THAN (TO_DATE('31-MAR-1975 23:59:59', 'DD-MON-YYYY
HH24:MI:SS')) tablespace DATASTREAM_LARGE,
PARTITION DS_PRICES_p2 VALUES LESS THAN (TO_DATE('31-MAR-1980 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
tablespace DATASTREAM_LARGE,
PARTITION DS_PRICES_p3 VALUES LESS THAN (TO_DATE('31-MAR-1985 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
tablespace DATASTREAM_LARGE,
PARTITION DS_PRICES_p4 VALUES LESS THAN (TO_DATE('31-MAR-1990 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
tablespace DATASTREAM_LARGE,
PARTITION DS_PRICES_p5 VALUES LESS THAN (TO_DATE('31-MAR-2000 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
tablespace DATASTREAM_LARGE,
PARTITION DS_PRICES_p6 VALUES LESS THAN (TO_DATE('31-MAR-2005 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
tablespace DATASTREAM_LARGE,
PARTITION DS_PRICES_p7 VALUES LESS THAN (TO_DATE('31-MAR-2010 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
tablespace DATASTREAM_LARGE,
PARTITION DS_PRICES_p8 VALUES LESS THAN (TO_DATE('31-MAR-2015 23:59:59', 'DD-MON-YYYY HH24:MI:SS'))
tablespace DATASTREAM_LARGE);
In most of the queries ds_id is referred as the leading column along with update_stamp. The
combination of these two columns is unique. Now what kind of index do you suggest as I can not
create global non-prefixed indexes?
Please help.
Followup August 13, 2011 - 4pm Central time zone:
In most of the queries ds_id is referred as the leading column along with
update_stamp.
what do you mean by "leading column" - in a predicate?
if you have a where on update_stamp and ds_id - and both use equals, just use a local index on the two columns in any order.
if you have a where clause on those table and use a range operator on either - put the one you use equals on first, then the other. We'll still be able to partition eliminate so a local might be fine.
If you need to use a global index, so be it - just index as you would normally index?
without more information, I cannot really say anymore.
Local Indexes
September 8, 2011 - 4pm Central time zone
Reviewer: jloayza from U.S.A
Hi Tom,
Regarding to the last question, what kind of index should I use when most of the queries does not
include the partition key. Is it correct to use local non prefixed?
Regards
J.L
Followup September 8, 2011 - 5pm Central time zone:
If most of the queries that want to use that index are not going to use the partition key, you almost certainly want a global index or a non-partitioned index.
If you have an emp table that is hash partitioned into 128 partitions by empno and you have an index on LAST_NAME, you almost certainly want the index on last name to be GLOBALLY partitioned - if you used a locally partitioned index - we'd have to do 128 - one hundred and twenty eight - index range scans every time we used it!!!!