Home>Question Details



Phil -- Thanks for the question regarding "Date Partitioning a table", version 10.2.0.1

Submitted on 20-Nov-2007 17:08 Central time zone
Last updated 11-Nov-2009 9:24

You Asked

Hi Tom

I have a system I am working on that will require old data to be removed every week or so. The customer does not want any down time. Is it possible to create a partitioned table that I could define 31 or so partitions and load data in a different one each day? That way we'd be safe for a week or so - and they are unlikely to ever go longer than that. when they do decide to delete old data, we can just truncate the oldest partition with data although I am not sure how that could be identified other than query for existence of data where dd=x.

What I'd like to do is the following:

PARTITION BY RANGE (to_char(created_date,'dd'))
... and a load of partitions for all 31

but I don't think this is possible. Please can you let me know if this is possible or an alternative we have not considered?

Cheers

Phil

and we said...

Well, 11g will solve this rather nicely with either

a) interval partitioning
b) virtual column partitioning

interval partitioning would allow you to do something like this:

ops$tkyte%ORA11GR1> create table audit_trail
  2  ( ts    timestamp,
  3    data  varchar2(30)
  4  )
  5  partition by range(ts)
  6  interval (numtodsinterval(1,'day'))
  7  store in (users, example )
  8  (partition p0 values less than
  9   (to_date('22-sep-2007','dd-mon-yyyy'))
 10  )
 11  /


that would create a partition for any row inserted that did not have a daily partition to go into, for example:

ops$tkyte%ORA11GR1> select partition_name, tablespace_name, high_value
  2    from user_tab_partitions
  3   where table_name = 'AUDIT_TRAIL';

PARTITION_ TABLESPACE HIGH_VALUE
---------- ---------- -------------------------------
P0         USERS      TIMESTAMP' 2007-09-22 00:00:00'


table is empty right now, nothing in it - we insert:

ops$tkyte%ORA11GR1> insert into audit_trail values ( add_months(sysdate,12), 'x' );
1 row created.

ops$tkyte%ORA11GR1> select partition_name, tablespace_name, high_value
  2    from user_tab_partitions
  3   where table_name = 'AUDIT_TRAIL';

PARTITION_ TABLESPACE HIGH_VALUE
---------- ---------- -------------------------------
P0         USERS      TIMESTAMP' 2007-09-22 00:00:00'
SYS_P180   EXAMPLE    TIMESTAMP' 2008-11-10 00:00:00'


and it created a partition to hold our data, further, if we insert into the table again for another day:

ops$tkyte%ORA11GR1> insert into audit_trail values ( add_months(sysdate,6), 'x' );
1 row created.

ops$tkyte%ORA11GR1> select partition_name, tablespace_name, high_value
  2    from user_tab_partitions
  3   where table_name = 'AUDIT_TRAIL';

PARTITION_ TABLESPACE HIGH_VALUE
---------- ---------- -------------------------------
P0         USERS      TIMESTAMP' 2007-09-22 00:00:00'
SYS_P181   EXAMPLE    TIMESTAMP' 2008-05-10 00:00:00'
SYS_P180   EXAMPLE    TIMESTAMP' 2008-11-10 00:00:00'


it'll fill in the partitions as needed based on our interval scheme.

You would then 'age off' the stuff as needed.


A virtual column would allow you to add:

ops$tkyte%ORA11GR1> ALTER TABLE T ADD (the_day AS to_char(created_date,'dd'));
Table altered.


and then you can partition on "THE_DAY"


In 10g and before however, you do not have this capability. So you would either

a) add a real column, default it to to_char(created_date,'dd'), hide it from the application using a view and let the application use the view and partition on it.

b) use standard range partitions - add a few days to the end - and every day - add another day (use dbms_scheduler/dbms_job to automate this) and age off of the old stuff.


I'd rather like (b) myself - simply because:


... That way we'd be safe for a week or so - and they are unlikely to ever go longer than that. when they do decide to delete old data ...

makes me think the (a) approach could be shaky - when you say "unlikely"...


Reviews    
4 stars Scheduler   November 21, 2007 - 11am Central time zone
Reviewer: Phil 
If I can agree that we only ever store 10 days or so then this could be the way to go. 
Is it possible then to create it on installation with 10 range partitions and at the same time 
create a scheduled process to add a partition and truncate/drop the oldest? 
The process inserts rows very fast and so I need to see if the extra logic of where to store the 
data slows things down.
Many thanks for the help as always!
Phil


Followup   November 21, 2007 - 2pm Central time zone:

...
Is it possible then to create it on installation with 10 range partitions and
at the same time create a scheduled process to add a partition and
truncate/drop the oldest?
....

sure, your process you write will have access to the data dictionary and sysdate, you can certainly program something...
5 stars Affecting performance?   November 21, 2007 - 5pm Central time zone
Reviewer: Phil 
Hi Tom
The system this is being developed on requires high insert rates. This is done using OCI & binding 
and we achieve 10,000 a second peak insert rate. Following the change to a hash partition using 10 
partitions we can only achieve 3k inserts a second. We reverted to the original table by dropping 
and recreating with a single index but this too now only achieves 3k inserts a sec. Is there 
something that may have been changed by introducing a partitioned table?
[Oracle 10.2.0.1.0 on Solaris]
Phil


Followup   November 26, 2007 - 9am Central time zone:

Anytime you use a structure that enforces "location of data", you'll find putting data away takes longer (unless of course, you had multiple threads and were suffering from massive concurrency issues - which were resolved by having many segments....)

Partitioning is about speeding data retrieval, making data more available and easing administration. It is not about making inserts faster.

Think about it this way - if you had a big pile of books to return to the library, it would be fastest for the librarians to just throw the books into a room (instant insert) in a big heap (the data structure you were using was a heap table before). However, getting books back out would be somewhat difficult - so we add stuff like indexes, hashing, etc - we organize the data. Now returning a book takes a lot longer, but retrieving it is really fast.

You likely went from using very little CPU to using a lot.

consider:

ops$tkyte%ORA10GR2> CREATE TABLE t_hash
  2  (
  3  OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, DATA_OBJECT_ID,
  4  OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
  5  TEMPORARY, GENERATED, SECONDARY
  6  )
  7  PARTITION BY hash (object_id) partitions 16
  8  as
  9  select * from all_objects where 1=0
 10  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t_heap
  2  as
  3  select * from all_objects where 1=0
  4  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA10GR2> declare
  2          type array is table of t_hash%rowtype index by binary_integer;
  3          l_data array;
  4  begin
  5          select *
  6            bulk collect
  7            into l_data
  8            from all_objects
  9           where rownum <= 1000;
 10
 11          for i in 1 .. 10
 12          loop
 13                  forall i in 1 .. l_data.count
 14                          insert into t_heap values l_data(i);
 15                  forall i in 1 .. l_data.count
 16                          insert into t_hash values l_data(i);
 17          end loop;
 18  end;
 19  /

PL/SQL procedure successfully completed.


Now, that sort of represents what you were doing and what you are doing (hopefully you are using array processing, that is a giving if you want to do things "fast")

tkprof will show:

INSERT INTO T_HEAP
VALUES
 (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 )


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10      0.07       0.07          0        265       1667       10000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11      0.07       0.07          0        265       1667       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 233     (recursive depth: 1)
********************************************************************************

INSERT INTO T_HASH
VALUES
 (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 )


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10      0.32       0.31          0       9539      29707       10000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11      0.32       0.31          0       9539      29707       10000


it simply takes more work to insert into a structure where "location" matters - when we have to figure out WHERE to put the data before putting it somewhere - well, that takes more work.

You should be able to measure this on your own system using tracing pretty easily.


Also, if you had this over time:

a) heap table
b) hash partitioned table
c) heap table plus index

one would expect (a) to be the fastest at inserting (but pretty slow for retrieval...)

(b) & (c) to be slower at insertion but likely faster at retrieval (due to partition elimination when applicable)
5 stars   November 26, 2007 - 9pm Central time zone
Reviewer: A reader 
Excellent explaination never thought of heap and partition that way!


5 stars Superb explanation   November 27, 2007 - 2am Central time zone
Reviewer: Phil from UK
A+++!


4 stars Partition by week   December 3, 2007 - 10am Central time zone
Reviewer: LJ 
To modify the example you gave in the first response:

" ... interval partitioning would allow you to do something like this:

ops$tkyte%ORA11GR1> create table audit_trail
2 ( ts  timestamp,
3  data varchar2(30)
4 )
5 partition by range(ts)
6 interval (numtodsinterval(1,'day'))
7 store in (users, example )
8 (partition p0 values less than
9  (to_date('22-sep-2007','dd-mon-yyyy'))
10 )
11 /

that would create a partition for any row inserted that did not have a daily partition to go into ..."

Is this the correct way to interval partition by WEEK:

create table audit_trail
2 ( ts  timestamp,
3  data varchar2(30)
4 )
5 partition by range(ts)
6 interval (numtodsinterval(7,'day'))
7 store in (users, example )
8 (partition p0 values less than
9  (to_date('22-sep-2007','dd-mon-yyyy '))
10 )
11 /


I couldn't find a NUMTOMWINTERVAL (month to week) function, and the above create table statement seems to do the job correctly, but just wanted to see if you know a better way to do weekly interval partitioning, especially in the to_date function on line 9 (thinking it might need hh24:MM:SS maybe)

Thank you,

LJ

Followup   December 3, 2007 - 7am Central time zone:

define week :)

when does a week start - whatever you say, I'll say "no, it starts on a different day"

and we'll both be right.

day to second, month to year - those are the basic interval types
    'DAY'
    'HOUR'
    'MINUTE'
    'SECOND'
    'MONTH'
    'YEAR'

7 days is great for a week, just pick the right "start" date.
5 stars Date based Partitoning   December 4, 2007 - 3pm Central time zone
Reviewer: John from NY, USA
It is good to know about the new interval partitoning. We are hoping to move there soon. 

In the mean time we are trying to acheive something like interval partioning in 10g. The idea is to 
truncate one days data and reload (truncates are faster than deletes) so we are range partioning 
for a date. 

I don't know how oracle stores the data dictionary in 11 in terms of interval partioning, but 
having 13000 partitions (we also have a subpartion on another column , same idea) on a table with 
83 million rows (avg row len 555), does it sound like an overkill. 


4 stars Date Partitioning a table   December 11, 2007 - 1am Central time zone
Reviewer: Arun 
Hi Tom,
 Is there any option to control on the name of partition? 
 Instead of system generated,SYS_P180.

Thanks
Arun


3 stars High_value format   October 29, 2009 - 3am Central time zone
Reviewer: Rustam from Russia
Hi, Tom.
In column high_value we can see:
TIMESTAMP' 2007-09-22 00:00:00'
I got issue on some database 10.2.0.3 where same value stored as 
TIMESTAMP'2007-09-22 00:00:00' (without leading blank symbol). What are settings (NLS or else) 
could effect on format of high_value on TIMESTAMP column? 


Followup   October 29, 2009 - 8am Central time zone:

have you identified a problem with anything?

you say you have an issue - can you describe the issue?

it just isn't have the sign there - which is fine.
3 stars High_value format   October 30, 2009 - 1am Central time zone
Reviewer: Rustam Kafarov from Moscow, Russia
Issue is in my procedure that parse high_value column to identify which partition should be 
splited. I'm excpecting format in high_value column for timestamp columns like 
TIMESTAMP'SYYYY-MM-DD HH24:MI:SS'. But in Oracle 10.2.0.3  it seems like TIMESTAMP'YYYY-MM-DD 
HH24:MI:SS'. I don't know why and on which versions it could be different. Is there any approachs 
to identify that format?


Followup   October 30, 2009 - 10am Central time zone:

but they evaluate the same? why does this matter?

ops$tkyte%ORA11GR1> select TIMESTAMP' 2007-09-22 00:00:00', TIMESTAMP'2007-09-22 00:00:00' from 
dual; 
TIMESTAMP'2007-09-2200:00:00'
---------------------------------------------------------------------------
TIMESTAMP'2007-09-2200:00:00'
---------------------------------------------------------------------------
22-SEP-07 12.00.00.000000000 AM
22-SEP-07 12.00.00.000000000 AM



if you are substringing the value out - just use the length of the string minus the length of your format to get the end of the string (you never need the blank)
3 stars High_value format   November 2, 2009 - 3am Central time zone
Reviewer: Rustam Kafarov 
Yes, I agree with you. I'm substringing value out. Column high_value_length could be usefull but if 
partition based on several columns it's hard to get real length of timestamp value. Currently I'm 
using like comparision with 
'TIMESTAMP''%YYYY-MM-DD HH24:MI:SS''' value. 


3 stars   November 4, 2009 - 6am Central time zone
Reviewer: David Aldridge 
Rustam, 

If you're just looking for the highest partition then use the one with the highest value of 
partition_position in %_tab_partitions. I just mention this cos I messed around with parsing dates 
etc before thinking of that.


4 stars indicating wich partition helps?   November 4, 2009 - 12pm Central time zone
Reviewer: Martin from Mx
it would be of any help if we indicate in wich partition the data goes ?

insert into table_x partition (partition_x)
select 
....


thanks 
regards


Followup   November 11, 2009 - 9am Central time zone:

help for what? Not sure what you are responding to?

Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement