Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Phil.

Asked: November 20, 2007 - 5:08 pm UTC

Last updated: May 22, 2020 - 12:12 am UTC

Version: 10.2.0.1

Viewed 100K+ times! This question is

You Asked

Hi Tom

I have a system I am working on that will require old data to be removed every week or so. The customer does not want any down time. Is it possible to create a partitioned table that I could define 31 or so partitions and load data in a different one each day? That way we'd be safe for a week or so - and they are unlikely to ever go longer than that. when they do decide to delete old data, we can just truncate the oldest partition with data although I am not sure how that could be identified other than query for existence of data where dd=x.

What I'd like to do is the following:

PARTITION BY RANGE (to_char(created_date,'dd'))
... and a load of partitions for all 31

but I don't think this is possible. Please can you let me know if this is possible or an alternative we have not considered?

Cheers

Phil

and Tom said...

Well, 11g will solve this rather nicely with either

a) interval partitioning
b) virtual column partitioning

interval partitioning would allow you to do something like this:

ops$tkyte%ORA11GR1> create table audit_trail
  2  ( ts    timestamp,
  3    data  varchar2(30)
  4  )
  5  partition by range(ts)
  6  interval (numtodsinterval(1,'day'))
  7  store in (users, example )
  8  (partition p0 values less than
  9   (to_date('22-sep-2007','dd-mon-yyyy'))
 10  )
 11  /


that would create a partition for any row inserted that did not have a daily partition to go into, for example:

ops$tkyte%ORA11GR1> select partition_name, tablespace_name, high_value
  2    from user_tab_partitions
  3   where table_name = 'AUDIT_TRAIL';

PARTITION_ TABLESPACE HIGH_VALUE
---------- ---------- -------------------------------
P0         USERS      TIMESTAMP' 2007-09-22 00:00:00'


table is empty right now, nothing in it - we insert:

ops$tkyte%ORA11GR1> insert into audit_trail values ( add_months(sysdate,12), 'x' );
1 row created.

ops$tkyte%ORA11GR1> select partition_name, tablespace_name, high_value
  2    from user_tab_partitions
  3   where table_name = 'AUDIT_TRAIL';

PARTITION_ TABLESPACE HIGH_VALUE
---------- ---------- -------------------------------
P0         USERS      TIMESTAMP' 2007-09-22 00:00:00'
SYS_P180   EXAMPLE    TIMESTAMP' 2008-11-10 00:00:00'


and it created a partition to hold our data, further, if we insert into the table again for another day:

ops$tkyte%ORA11GR1> insert into audit_trail values ( add_months(sysdate,6), 'x' );
1 row created.

ops$tkyte%ORA11GR1> select partition_name, tablespace_name, high_value
  2    from user_tab_partitions
  3   where table_name = 'AUDIT_TRAIL';

PARTITION_ TABLESPACE HIGH_VALUE
---------- ---------- -------------------------------
P0         USERS      TIMESTAMP' 2007-09-22 00:00:00'
SYS_P181   EXAMPLE    TIMESTAMP' 2008-05-10 00:00:00'
SYS_P180   EXAMPLE    TIMESTAMP' 2008-11-10 00:00:00'


it'll fill in the partitions as needed based on our interval scheme.

You would then 'age off' the stuff as needed.


A virtual column would allow you to add:

ops$tkyte%ORA11GR1> ALTER TABLE T ADD (the_day AS to_char(created_date,'dd'));
Table altered.


and then you can partition on "THE_DAY"


In 10g and before however, you do not have this capability. So you would either

a) add a real column, default it to to_char(created_date,'dd'), hide it from the application using a view and let the application use the view and partition on it.

b) use standard range partitions - add a few days to the end - and every day - add another day (use dbms_scheduler/dbms_job to automate this) and age off of the old stuff.


I'd rather like (b) myself - simply because:


... That way we'd be safe for a week or so - and they are unlikely to ever go longer than that. when they do decide to delete old data ...

makes me think the (a) approach could be shaky - when you say "unlikely"...


Rating

  (36 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Scheduler

Phil, November 21, 2007 - 11:17 am UTC

If I can agree that we only ever store 10 days or so then this could be the way to go.
Is it possible then to create it on installation with 10 range partitions and at the same time create a scheduled process to add a partition and truncate/drop the oldest?
The process inserts rows very fast and so I need to see if the extra logic of where to store the data slows things down.
Many thanks for the help as always!
Phil
Tom Kyte
November 21, 2007 - 2:51 pm UTC

...
Is it possible then to create it on installation with 10 range partitions and
at the same time create a scheduled process to add a partition and
truncate/drop the oldest?
....

sure, your process you write will have access to the data dictionary and sysdate, you can certainly program something...

Affecting performance?

Phil, November 21, 2007 - 5:29 pm UTC

Hi Tom
The system this is being developed on requires high insert rates. This is done using OCI & binding and we achieve 10,000 a second peak insert rate. Following the change to a hash partition using 10 partitions we can only achieve 3k inserts a second. We reverted to the original table by dropping and recreating with a single index but this too now only achieves 3k inserts a sec. Is there something that may have been changed by introducing a partitioned table?
[Oracle 10.2.0.1.0 on Solaris]
Phil
Tom Kyte
November 26, 2007 - 9:01 am UTC

Anytime you use a structure that enforces "location of data", you'll find putting data away takes longer (unless of course, you had multiple threads and were suffering from massive concurrency issues - which were resolved by having many segments....)

Partitioning is about speeding data retrieval, making data more available and easing administration. It is not about making inserts faster.

Think about it this way - if you had a big pile of books to return to the library, it would be fastest for the librarians to just throw the books into a room (instant insert) in a big heap (the data structure you were using was a heap table before). However, getting books back out would be somewhat difficult - so we add stuff like indexes, hashing, etc - we organize the data. Now returning a book takes a lot longer, but retrieving it is really fast.

You likely went from using very little CPU to using a lot.

consider:

ops$tkyte%ORA10GR2> CREATE TABLE t_hash
  2  (
  3  OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, DATA_OBJECT_ID,
  4  OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
  5  TEMPORARY, GENERATED, SECONDARY
  6  )
  7  PARTITION BY hash (object_id) partitions 16
  8  as
  9  select * from all_objects where 1=0
 10  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t_heap
  2  as
  3  select * from all_objects where 1=0
  4  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @trace
ops$tkyte%ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

ops$tkyte%ORA10GR2> declare
  2          type array is table of t_hash%rowtype index by binary_integer;
  3          l_data array;
  4  begin
  5          select *
  6            bulk collect
  7            into l_data
  8            from all_objects
  9           where rownum <= 1000;
 10
 11          for i in 1 .. 10
 12          loop
 13                  forall i in 1 .. l_data.count
 14                          insert into t_heap values l_data(i);
 15                  forall i in 1 .. l_data.count
 16                          insert into t_hash values l_data(i);
 17          end loop;
 18  end;
 19  /

PL/SQL procedure successfully completed.


Now, that sort of represents what you were doing and what you are doing (hopefully you are using array processing, that is a giving if you want to do things "fast")

tkprof will show:

INSERT INTO T_HEAP
VALUES
 (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 )


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10      0.07       0.07          0        265       1667       10000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11      0.07       0.07          0        265       1667       10000

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 233     (recursive depth: 1)
********************************************************************************

INSERT INTO T_HASH
VALUES
 (:B1 ,:B2 ,:B3 ,:B4 ,:B5 ,:B6 ,:B7 ,:B8 ,:B9 ,:B10 ,:B11 ,:B12 ,:B13 )


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute     10      0.32       0.31          0       9539      29707       10000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       11      0.32       0.31          0       9539      29707       10000


it simply takes more work to insert into a structure where "location" matters - when we have to figure out WHERE to put the data before putting it somewhere - well, that takes more work.

You should be able to measure this on your own system using tracing pretty easily.


Also, if you had this over time:

a) heap table
b) hash partitioned table
c) heap table plus index

one would expect (a) to be the fastest at inserting (but pretty slow for retrieval...)

(b) & (c) to be slower at insertion but likely faster at retrieval (due to partition elimination when applicable)

A reader, November 26, 2007 - 9:41 pm UTC

Excellent explaination never thought of heap and partition that way!

Superb explanation

Phil, November 27, 2007 - 2:47 am UTC

A+++!

Partition by week

LJ, December 03, 2007 - 10:04 am UTC

To modify the example you gave in the first response:

" ... interval partitioning would allow you to do something like this:

ops$tkyte%ORA11GR1> create table audit_trail
2 ( ts timestamp,
3 data varchar2(30)
4 )
5 partition by range(ts)
6 interval (numtodsinterval(1,'day'))
7 store in (users, example )
8 (partition p0 values less than
9 (to_date('22-sep-2007','dd-mon-yyyy'))
10 )
11 /

that would create a partition for any row inserted that did not have a daily partition to go into ..."

Is this the correct way to interval partition by WEEK:

create table audit_trail
2 ( ts timestamp,
3 data varchar2(30)
4 )
5 partition by range(ts)
6 interval (numtodsinterval(7,'day'))
7 store in (users, example )
8 (partition p0 values less than
9 (to_date('22-sep-2007','dd-mon-yyyy '))
10 )
11 /


I couldn't find a NUMTOMWINTERVAL (month to week) function, and the above create table statement seems to do the job correctly, but just wanted to see if you know a better way to do weekly interval partitioning, especially in the to_date function on line 9 (thinking it might need hh24:MM:SS maybe)

Thank you,

LJ
Tom Kyte
December 03, 2007 - 7:45 am UTC

define week :)

when does a week start - whatever you say, I'll say "no, it starts on a different day"

and we'll both be right.

day to second, month to year - those are the basic interval types
'DAY'
'HOUR'
'MINUTE'
'SECOND'
'MONTH'
'YEAR'

7 days is great for a week, just pick the right "start" date.

Date based Partitoning

John, December 04, 2007 - 3:46 pm UTC

It is good to know about the new interval partitoning. We are hoping to move there soon.

In the mean time we are trying to acheive something like interval partioning in 10g. The idea is to truncate one days data and reload (truncates are faster than deletes) so we are range partioning for a date.

I don't know how oracle stores the data dictionary in 11 in terms of interval partioning, but having 13000 partitions (we also have a subpartion on another column , same idea) on a table with 83 million rows (avg row len 555), does it sound like an overkill.

Date Partitioning a table

Arun, December 11, 2007 - 1:51 am UTC

Hi Tom,
Is there any option to control on the name of partition?
Instead of system generated,SYS_P180.

Thanks
Arun

High_value format

Rustam, October 29, 2009 - 3:22 am UTC

Hi, Tom.
In column high_value we can see:
TIMESTAMP' 2007-09-22 00:00:00'
I got issue on some database 10.2.0.3 where same value stored as
TIMESTAMP'2007-09-22 00:00:00' (without leading blank symbol). What are settings (NLS or else) could effect on format of high_value on TIMESTAMP column?

Tom Kyte
October 29, 2009 - 8:39 am UTC

have you identified a problem with anything?

you say you have an issue - can you describe the issue?

it just isn't have the sign there - which is fine.

High_value format

Rustam Kafarov, October 30, 2009 - 1:40 am UTC

Issue is in my procedure that parse high_value column to identify which partition should be splited. I'm excpecting format in high_value column for timestamp columns like TIMESTAMP'SYYYY-MM-DD HH24:MI:SS'. But in Oracle 10.2.0.3 it seems like TIMESTAMP'YYYY-MM-DD HH24:MI:SS'. I don't know why and on which versions it could be different. Is there any approachs to identify that format?
Tom Kyte
October 30, 2009 - 10:25 am UTC

but they evaluate the same? why does this matter?

ops$tkyte%ORA11GR1> select TIMESTAMP' 2007-09-22 00:00:00', TIMESTAMP'2007-09-22 00:00:00' from dual; 
TIMESTAMP'2007-09-2200:00:00'
---------------------------------------------------------------------------
TIMESTAMP'2007-09-2200:00:00'
---------------------------------------------------------------------------
22-SEP-07 12.00.00.000000000 AM
22-SEP-07 12.00.00.000000000 AM



if you are substringing the value out - just use the length of the string minus the length of your format to get the end of the string (you never need the blank)

High_value format

Rustam Kafarov, November 02, 2009 - 3:26 am UTC

Yes, I agree with you. I'm substringing value out. Column high_value_length could be usefull but if partition based on several columns it's hard to get real length of timestamp value. Currently I'm using like comparision with
'TIMESTAMP''%YYYY-MM-DD HH24:MI:SS''' value.

David Aldridge, November 04, 2009 - 6:10 am UTC

Rustam,

If you're just looking for the highest partition then use the one with the highest value of partition_position in %_tab_partitions. I just mention this cos I messed around with parsing dates etc before thinking of that.

indicating wich partition helps?

Martin, November 04, 2009 - 12:44 pm UTC

it would be of any help if we indicate in wich partition the data goes ?

insert into table_x partition (partition_x)
select
....


thanks
regards
Tom Kyte
November 11, 2009 - 9:24 am UTC

help for what? Not sure what you are responding to?

Partition Names

Anil, December 09, 2011 - 3:44 am UTC

Hi Tom
Very well explained interval Partitioning.
Is this possible to give Partition Names in case of auto creation of partition istead of system genrated partition names or
can we define some expresion that can choose the partition name logically from expersion evaluvation.?
Tom Kyte
December 09, 2011 - 4:06 pm UTC

you can rename them after the fact, that is all - there is no way to name them immediately as they are created.

Partitioning a table

A reader, January 05, 2012 - 1:24 pm UTC

Hi Tom,

I have few tables whose estimated size is going to be increased at the rate of 20K per day. It has two columns apart from CreateDate and UpdateDate columns.

Create table demo(col1 carchar2(100), col2 varchar2(3000), createdate date, updatedate date).

In the table col2 will hold data whose length varies from 2000 to 3000 BYTES.

Considering the high volume of data which is coming 20K per day, is it advisable to create a partitioned table? Does partitioning a table helps in performance or only ease in administration?

I was thinking to make partition over YEAR and then subpartition over MONTH. In case you have any other idea please suggest. It will be highly appreciated if you could provide a little explanation.


Thanks a lot.

Tom Kyte
January 05, 2012 - 2:31 pm UTC

.. 20K per day. .. so it is growing fairly slowly.

Considering the high volume of data which is coming 20K per day, is it
advisable to create a partitioned table?


Let's phrase this differently - considering that the table grows at a rate of about 7.3 million records per year (assuming a 24x7 operation here) meaning every year we add about:

ops$tkyte%ORA11GR2> select 20000*365 * 4/1024/1024 gb from dual;

        GB
----------
  27.84729




30 gb of data give or take a few gb. It might make sense to partition by some long period of time (months or years) to make it easier to administer.

Does partitioning a table helps in
performance or only ease in administration?


it depends. If you use an index to get the same amount of data out of the table - I don't care if the table is 1,000,000,000 rows or 10 rows - it'll take about the same amount of time.

If on the other hand, you full scan the table - and you could full scan less of the table with partition elimination - it can have a profound effect.

In short - you have to understand the "physics" behind partitioning before you design your system to use it. You have to be able to close your eyes and envision "what would happen if I did this" where 'this' is some partitioning scheme.


I was thinking to make partition over YEAR and then subpartition over MONTH

why? I'm not saying "that is not a good idea", I'm honestly asking "why" - what will you achieve (you might have really good reasons, I can come up with many examples whereby this would be a great idea - and just as many whereby this would be "not smart").

I don't know your system.
I don't know what type of queries you use.
I don't know how this table will be used

But you do - so you can tell us the technical reasons you picked that partitioning scheme and then we can tell you if you were correct in doing so.

Followup

A reader, January 07, 2012 - 2:37 pm UTC

hi Tom,

Actually the table has 84 columns with one BLOB column in it. Now the scenario is around 20K per day records will be inserted in it.

There are two ways a user can search data. One is with the primary key lets say the key_id column and second functionality is advanced search: user can search data in different ways e.g Date Range, Id, Supervisor, how many records in a month or year...

Since the data comes every day and we have to implement the search in an efficient way so that it fetches within 1-2 seconds, so I was assuming that partitioning the table would be an option.

Now since I get 20k*30 every month so I thought of partitioning the table on Year and thereby on month.

I am not sure whether that is the most suited approach when we deal with Advanced search on such data volume.

Please suggest and let me know if I am still unclear.
Tom Kyte
January 10, 2012 - 9:56 pm UTC

original:
I have few tables whose estimated size is going to be increased at the rate of 20K per day. It has
two columns apart from CreateDate and UpdateDate columns..


now:
Actually the table has 84 columns with one BLOB column in it. Now the scenario
is around 20K per day records will be inserted in it


what the heck? that second sentence doesn't mean very much. At least the first time around, when you lied to us about what you were doing, the size was there - about 4k of data - 20,000 records.

Now we have 20,000 records of COMPLETELY and UTTERLY unknown size - meaningless. What do you want anyone to actually say here?


so I was assuming that
partitioning the table would be an option.


explain how partitioning would help you, use math to explain.


this:

ser can search
data in different ways e.g Date Range, Id, Supervisor, how many records in a
month or year...


is meaningless. what does "search" mean in this context?

If I search by "date range" - do you want to use an index or a full scan?
If I search by "Id" - do you want to use an index or full scan (is ID unique? you told us key_id is, what is id???)

if I search by "supervisor" - you are getting the point.


for things like how many records in month X, in year Y - a range partition by date would definitely be useful as we'd full scan a table segment or index segment and use partition elimination.


Followup

A reader, January 11, 2012 - 1:50 am UTC

Hi tom,

Apology for confusion created.

The table has 84 columns with 4-5 columns as DATE and 1 column as BLOB. Other columns have average length of 30 characters.

Now the user can search on the basis of:

1. Date Range
2. ID - Primary Key
3. Number of cases created per month/per year

Now if we have these three conditions, does partitioning the table makes sense?

I have a bit of idea like:

-I can create a non-unique index on date columns
-ID is primary key so index gets picked up
-Reporting based on per month and per year can be handled in partition..


Should I go ahead and make partitions or we continue on the large table without partitioning?
Tom Kyte
January 11, 2012 - 9:31 am UTC

again...

if you partition this by your created date column - would you

a) be able to use partition elimination to remove a FULL SCAN (not an index range scan, a FULL SCAN of either an index segment or a table segment)

b) or not


If (a) is true, partitioning would help your query.
If (b) is true, partitioning would have to be employed in a manner so as to not negatively impact query performance, it will not make it go faster.


So, if you are using an index to retrieve/count things based on a date range - partitioning will not help.

If on the other hand, you are currently full scanning, it will help.


So, if you are using an index on ID - the primary key (which apparently is not called KEY_ID as documented above) - partitioning will do nothing for this query.

If on the other hand, you are currently full scanning - it still won't help since you've partitioned by a date, not by this key.


So, if you are using an index range scan to retrieve/count the number of cases created per month/per year - partitioning will not do much for you (nothing really)

If on the other hand, you are doing an index fast full scan or a full table scan - it would have an impact.




Should I go ahead and make partitions or we continue on the large table without partitioning?


do you want to be able to manage this segment as it grows? Partitioning is not about performance - it is about administration in most cases. Think of it from that perspective.

partitioning on interval

Maggie, April 26, 2012 - 3:52 pm UTC

Hi Tom,

Going back to the question on partitioning. If it possible to have 4 partitions and each partition is on the specific date of the month like the 1st, 8th, 15th and 22nd? the partitions are in round robin manner and data is stored in 4 different tablespaces.

thanks in advance!

cheers,
Maggie
Tom Kyte
April 26, 2012 - 6:07 pm UTC

describe this more.


do you mean "i want everything from the 1st up to but not including the 8th of ANY month in partition P1, everything from the 8th up to but not including the 15th of ANY month" and so on???

or do you want this month by month (meaning way more than four partitions)?

A reader, April 27, 2012 - 3:07 pm UTC

yes. this is how i wanted

P1 contains from the 1st - 7th of ANY month
P2 contains from the 8th - 14th of ANY month
P3 contains from the 15th - 21st of ANY month
P4 contains from the 22nd - end of the month

is it possible?

thanks!
Tom Kyte
April 30, 2012 - 7:58 am UTC

ops$tkyte%ORA11GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(30),
  6    theday as (to_number(to_char(dt,'dd')) )
  7  )
  8  PARTITION BY RANGE (theday)
  9  (
 10    PARTITION part1 VALUES LESS THAN (8),
 11    PARTITION part2 VALUES LESS THAN (15),
 12    PARTITION part3 VALUES LESS THAN (22),
 13    PARTITION part4 VALUES LESS THAN (maxvalue)
 14  )
 15  /

Table created.

ops$tkyte%ORA11GR2> insert into t (dt,x,y)
  2  select sysdate+rownum, rownum, 'x'
  3    from dual
  4  connect by level <= 60;

60 rows created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select * from t partition(part2);

DT                 X Y                                  THEDAY
--------- ---------- ------------------------------ ----------
08-MAY-12          8 x                                       8
09-MAY-12          9 x                                       9
10-MAY-12         10 x                                      10
11-MAY-12         11 x                                      11
12-MAY-12         12 x                                      12
13-MAY-12         13 x                                      13
14-MAY-12         14 x                                      14
08-JUN-12         39 x                                       8
09-JUN-12         40 x                                       9
10-JUN-12         41 x                                      10
11-JUN-12         42 x                                      11
12-JUN-12         43 x                                      12
13-JUN-12         44 x                                      13
14-JUN-12         45 x                                      14

14 rows selected.


pretty straightforward ...

Sokrates, April 30, 2012 - 9:17 am UTC

... usage of virtual columns

tes@11.2 > CREATE TABLE t
 (
   dt  date,
   x   int,
   y   varchar2(30),
   theday as (extract(day from dt))
)
PARTITION BY RANGE (theday)
(
  PARTITION part1 VALUES LESS THAN (8),
  PARTITION part2 VALUES LESS THAN (15),
  PARTITION part3 VALUES LESS THAN (22),
  PARTITION part4 VALUES LESS THAN (maxvalue)
)
/


works also.

However, both
select * from t partition for (to_number(to_char(sysdate, 'dd')));

and

select * from t partition for (extract(day from sysdate));


yield in
ORA-14108: illegal partition-extended table name syntax

:-(


Tom Kyte
May 01, 2012 - 3:10 pm UTC

of course they do, you cannot "bind" identifiers, identifiers must be - MUST BE - known at parse time.

You are trying to bind an identifier (again :) )

using sysdate makes the identifier name non-deterministic, of course you cannot compile a plan from that.


If you wanted to have your cake and eat it to - you would just

where theday = to_number(to_char(sysdate,'dd'))

then you would see a partition KEY KEY in the plan.


An identifier must be known and fixed.

sorry

Sokrates, April 30, 2012 - 9:26 am UTC

it is clear that
select * from t partition for (to_number(to_char(sysdate, 'dd')));

fails because the partition cannot be determined by compile-time.

select * from t partition for (to_number(to_char(DATE'2012-04-30', 'dd')));


works great as expected.

but why does
select * from t partition for (extract(day from DATE'2012-04-30'));


gives
ORA-14763: Unable to resolve FOR VALUES clause to a partition number

?
Tom Kyte
May 01, 2012 - 3:14 pm UTC

I suspect they might be marking it non-deterministic due to timezone related issues - which granted wouldn't affect a DATE but they might not be looking "that deep"

a pity ...

Sokrates, May 02, 2012 - 1:40 am UTC

ok, thanks, that could be a cause.

I just find it a pity that you a forced to use constructs like
to_number(to_char(dt,'dd'))

in some circumstances - I personally think this is an ugly construct.

You have a date and you want the day of the month of it - it is quite natural that you just extract the day out of it, isn't it ?
Why should we convert this date into a string ( thinking carefully about using the right format, probably thinking about NLS-issues ) and then convert this sting again into a number ( again: do we have to think about the format, do we have to think about other issues ?) just to perform this simple task ?
It seems a bit like cracking a nut with a sledgehammer, doesn't it ?
Tom Kyte
May 02, 2012 - 1:30 pm UTC

sort of..

Partition table related design issue

Arvind, May 03, 2012 - 9:35 am UTC

Hi Tom,

I have a table having 16 partitions. We have a business requirement for which I need to access each partition and do complex processing. During this complex processing, there are few query which access more than one partition of the same table. This table is really big (having 126 million records). To make the processing faster we can do two things:

1) Write separate procedures which access only one partition data of big table. Once we have these procedure we can do it yourself(DIY) parallel execution.

2) Write one procedure only and may be use oracle parallel execution.

Please suggest us which one will be better and why. Do you have any other way of doing it?

Thanks,

Arvind

Tom Kyte
May 03, 2012 - 2:24 pm UTC

I would go with #2 in general - anytime the software can do my work for me - I'll let it.

126 million is pretty small to average today - don't let that number overwhelm you. Try to do as much as you can in as few big sql statements as possible with no procedural code! If you start processing row by row in code - it'll take hours. If you start bulking processing in single sql statements - it'll take much less time. Whenever possible - skip using update/delete in favor of create table as select (no redo is possible, no undo is generated, very efficient)

Partition table related design issue

Arvind, May 04, 2012 - 5:27 pm UTC

Thanks Tom.

Naming numtodsinterval() partitions

Nishtha, May 11, 2012 - 2:29 pm UTC

Hi Tom,

This concept of Interval Partitioning really helped me to resolve my issue, but partially only.
I would want to name the Partitions rather than use system
generated partition names.
Is this possible. I have to partition my table, for every day auditing, and identify the partitions as the date they are created on.
Tom Kyte
May 12, 2012 - 12:32 am UTC

you would have to create a job that did an alter rename on the SYS_* generated names using the HIGH_VALUE column from user_tab_partitions to determine what the name should be.

Hourly vs Daily Partitions

A reader, April 08, 2013 - 8:42 pm UTC

Hi Tom

We have a very large table (average size 1Tb of which 70% is index size , average row count = 1.5 to 2 billion) that is partitioned daily using the oracle 11g interval partitioning method.

Data is loaded every 15 minutes via a insert into select with a group by clause.

The inserts into this table take longer when we increase the load (this insert process is the primary process running on the server).

Would partitioning the table on a hourly basis speed up the inserts ( all indexes are local) ? There are about 6 indexes each varying from 100G to 150G in size. I was thinking along the lines that the index lookup during inserts would be faster with a smaller index due to the hourly partitioning.
Tom Kyte
April 22, 2013 - 2:29 pm UTC

... average size 1Tb of which 70% is index size ...

what does that mean ????

300gb table, 700gb of indexes???? something else?


... Would partitioning the table on a hourly basis speed up the inserts ....

possibly, but it could kill retrieval for all I know.


how much data do you actually load every 15 minutes.

do the queries all use partition elimination and would they continue to eliminate the same number of partitions if you went to 24 a day?

do you direct path load this table or conventional path?

Hourly vs Daily Partitoins

A reader, May 02, 2013 - 3:56 pm UTC

300G table size, 700G indexes size (5 indexes...each appx 100-150G)

Conventional load is used.

We notice heavy degradation during the analyze window. Our dba recommends not using partitioning and instead use fixed tables with the date appended to table name as this would eliminate overheads due to partitioning. However, this is a big change for all the apps using the table.
Tom Kyte
May 06, 2013 - 4:57 pm UTC

what "overhead due to partitioning"??

and how would that help even remotely with the "analyze window"

by analyze to you mean "dbms_stats.gather_table_stats"?


using tables with dates would cause you to be constantly parsing as all sql would be "new" sql, not a really good idea...

hourly vs daily partitions

A reader, May 02, 2013 - 10:35 pm UTC

I forgot to add:

Every 15 minutes, about 2 - 3 million rows get loaded.

Hourly vs Daily Partitions

A reader, May 07, 2013 - 8:36 pm UTC

The gather stats process runs for a couple hrs every day during which IO load spikes(as expected) and the inserts into this large table slows down.

Does the use of separate tables (not partitioned) per day reduce the IO during gather stats or make it finish faster due to not having partitions(this is what our dba implied)
Tom Kyte
May 07, 2013 - 8:42 pm UTC

You would only typically be gathering local partition statistics - so no, having real tables would not cut down on the work performed for statistics gathering.

A question you have to ask yourself though is "do we need to GATHER statistics, is our data so so so different from day to day that we need new statistics".

It could very well be that as you add a new partition, you should be copying statistics from the last partition - updating (using dbms_stats.SET_* routines) just a few column stats to represent new high and low values for some columns (just some, most columns would not need any special treatment)

and maybe gather stats once in a blue moon on a single partition to start copying that in the future, to catch any subtle changes in the data patterns.

the optimizer doesn't need "fresh" statistics

the optimizer doesn't need "gathered" statistics

the optimizer needs only representative statistics - you can set them a lot faster than gathering them.

the fastest way to do something is not to do it - you should look into that. This is how AWR handles its partitions - we copy the stats from an old partition so we don't have empty partition statistics. They are representative of the data (close enough - they don't have to be perfect - just representative)

Hourly vs Daily partitions

A reader, May 07, 2013 - 9:14 pm UTC

Thanks for your advice Tom,

I will push for not collecting stats daily (and use a copy)...that seems to make sense as data distribution does not change enough to warrant fresh collection every day.

DP import into partitioned table

Ben, May 08, 2013 - 11:42 am UTC

Hi Tom,

After reading this post I see it is all about inserting into a partitioned table. I recently did a data pump import into an existing partitioned table using table_exists_action=append and all the data went in as I hoped. Each partition was populated with the correct data ranges.
I exported the unpartitioned table with DP using parallel=10 (created 10 .dmp files nicely), precreated the empty partitioned table with _P appended to the table_name, imported using DP using remap_table=LARGE_TBL:LARGE_TBL_P and also got the 10 threads from the parallel (sweet) and then when the import was done I renamed the tables.
I was pleasantly surprised by this and wanted to share with others, perhaps it will help someone who wasn't aware. I found it more convenient in my case than using online table redefinition (because I could suffer the downtime).
Ben.
Tom Kyte
May 08, 2013 - 1:28 pm UTC

what would have been even MORE convenient would have been a simple:

ops$tkyte%ORA11GR2> create table t1 ( dt , x , y )
  2  as
  3  select created, object_id, object_name from all_objects;

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> CREATE TABLE t2
  2  (
  3    dt ,
  4    x  ,
  5    y
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  as select * from t1;

Table created.


using parallel if you like - that'll skip undo, can skip redo generation - does a single read from disk to read T1 and a single write to disk to write T2 (instead of read t1, write t1 to file system, read filesystem, write t2).

much easier...

skipping undo / redo

Ben, May 13, 2013 - 12:03 pm UTC

"using parallel if you like - that'll skip undo, can skip redo generation - does a single read from disk to read T1 and a single write to disk to write T2 (instead of read t1, write t1 to file system, read filesystem, write t2)."

Do I understand correctly that by using a parallel hint I can skip the undo and redo generation? That would be great if it were that easy. Or do I need to specify other things as well as parallel to skip undo and redo generation in this example?
Tom Kyte
May 13, 2013 - 1:30 pm UTC

using direct path operations - sqlldr direct=y, insert /*+APPEND*/,, alter table t move, alter index i rebuilt, create table as select - etc - they all skip undo generation and optionally may skip redo generation (in noarchive log mode, they skip redo, in archive log mode if the segment is set NOLOGGING they skip redo as long as the DBA has not set force logging)

parallel insert always uses append mode.


so, do something that uses direct path (parallel insert does), have the segment set to nologging (alter table t nologging) and make sure DBA doesn't have force logging set on (if they do, you will have to generate redo, they do that for standby databases)

Ben, May 22, 2013 - 12:43 pm UTC

So then is doing a network_link import with data pump the same as doing an insert /*+append */ into as select operation?
Tom Kyte
May 22, 2013 - 1:54 pm UTC

as long as the conditions for direct path are met, yes.

http://docs.oracle.com/cd/E11882_01/server.112/e22490/dp_overview.htm#CJAFDGIC

Best way to drop old partitions

Shimmy, October 15, 2013 - 2:22 pm UTC

DROP TABLE sk_20131015;
CREATE TABLE SK_20131015
(CREATE_DATETIME    DATE,
 DATA  varchar2(30))
PARTITION BY RANGE(CREATE_DATETIME)
INTERVAL (NUMTODSINTERVAL(7,'DAY')) 
(PARTITION P0 VALUES LESS THAN
 (TO_DATE('23-SEP-2007','DD-MON-YYYY')));


insert into sk_20131015 values ( SYSDATE-20, 'A' );
insert into sk_20131015 values ( SYSDATE-13, 'B' );
insert into sk_20131015 values ( SYSDATE, 'x' );
insert into sk_20131015 values ( SYSDATE+ (1/60), 'x1' );
insert into sk_20131015 values ( SYSDATE+1, 'y' );
insert into sk_20131015 values ( SYSDATE+7, 'z' );

SELECT partition_name, tablespace_name, high_value
FROM user_tab_partitions
WHERE table_name = 'SK_20131015';


My partitions are created with a range of Sunday thru Saturday.
What's the best way to drop partitions that are older than 2 weeks?
Meaning, if I run on Monday Oct 21, 2013, I need to drop partitions with CREATE_DATETIME values less than or equal to 10/15/2013?
Tom Kyte
November 01, 2013 - 6:38 pm UTC

write a block of code in plsql


query user_tab_partitions

get the partition high key value and partition name in host variables

using a tiny bit of dynamic sql to convert it to a date.

have an if/then statement to drop that partition if the selected date is more than two weeks old.

ops$tkyte%ORA11GR2> declare
  2          l_date date;
  3  begin
  4          for x in ( SELECT partition_name, tablespace_name, high_value
  5                   FROM user_tab_partitions
  6                  WHERE table_name = 'SK_20131015' )
  7          loop
  8                  execute immediate 'begin :x := ' || x.high_value || '; end;' using OUT l_date;
  9                  dbms_output.put_line( to_char( l_date, 'dd-mon-yyyy hh24:mi:ss' ) );
 10          end loop;
 11  end;
 12  /
23-sep-2007 00:00:00
13-oct-2013 00:00:00
20-oct-2013 00:00:00
03-nov-2013 00:00:00
10-nov-2013 00:00:00

PL/SQL procedure successfully completed.



note, since the high value is coming from the data dictionary, and since the table name is hard coded and known to be range partitioned on a date - this is not going to be subject to sql injection. you would need to validate the high value contents in general if used in other ways however.

we can do it without parsing high_value

Sokrates, November 01, 2013 - 7:19 pm UTC


sokrates@11.2 > CREATE TABLE SK_20131015
2 (CREATE_DATETIME DATE,
3 DATA varchar2(30))
4 PARTITION BY RANGE(CREATE_DATETIME)
5 INTERVAL (NUMTODSINTERVAL(7,'DAY'))
6 (PARTITION P0 VALUES LESS THAN
7 (TO_DATE('23-SEP-2007','DD-MON-YYYY')));

Table created.

sokrates@11.2 >
sokrates@11.2 >
sokrates@11.2 > insert into sk_20131015 values ( SYSDATE-20, 'A' );

1 row created.

sokrates@11.2 > insert into sk_20131015 values ( SYSDATE-13, 'B' );

1 row created.

sokrates@11.2 > insert into sk_20131015 values ( SYSDATE, 'x' );

1 row created.

sokrates@11.2 > insert into sk_20131015 values ( SYSDATE+ (1/60), 'x1' );

1 row created.

sokrates@11.2 > insert into sk_20131015 values ( SYSDATE+1, 'y' );

1 row created.

sokrates@11.2 > insert into sk_20131015 values ( SYSDATE+7, 'z' );

1 row created.

sokrates@11.2 >
sokrates@11.2 > SELECT partition_name, high_value
2 FROM user_tab_partitions
3 WHERE table_name = 'SK_20131015';

PARTITION_NAME HIGH_VALUE
------------------------------ --------------------------------------------------------------------------------
P0 TO_DATE(' 2007-09-23 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P3215 TO_DATE(' 2013-10-13 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P3216 TO_DATE(' 2013-10-20 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P3217 TO_DATE(' 2013-11-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P3218 TO_DATE(' 2013-11-10 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

sokrates@11.2 > declare
2 spec_partition_does_not_exist exception;
3 pragma exception_init(spec_partition_does_not_exist, -02149); -- "Specified partition does not exist"
4 last_partition_cant_be_dropped exception;
5 pragma exception_init(last_partition_cant_be_dropped, -14758); -- "Last partition in the range section cannot be dropped"
6 begin
7 for d in (select trunc(sysdate)-14-level d from dual connect by level>0) loop
8 begin
9 execute immediate 'alter table sk_20131015 drop partition for(date''' || to_char(d.d, 'yyyy-mm-dd') || ''')';
10 exception
11 when spec_partition_does_not_exist then null;
12 when last_partition_cant_be_dropped then exit;
13 end;
14 end loop;
15 end;
16 /

PL/SQL procedure successfully completed.

sokrates@11.2 > SELECT partition_name, high_value
2 FROM user_tab_partitions
3 WHERE table_name = 'SK_20131015';

PARTITION_NAME HIGH_VALUE
------------------------------ --------------------------------------------------------------------------------
P0 TO_DATE(' 2007-09-23 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P3217 TO_DATE(' 2013-11-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
SYS_P3218 TO_DATE(' 2013-11-10 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA

Tom Kyte
November 01, 2013 - 9:46 pm UTC

what happens to this code in 1 year
2 years
3 years
10 years

as it executes thousands and thousands of sql statements - hitting the shared pool, hard parsing, locking things - taking time.


ops$tkyte%ORA11GR2> CREATE TABLE SK_20131015
  2  (CREATE_DATETIME    DATE,
  3   DATA  varchar2(30))
  4  PARTITION BY RANGE(CREATE_DATETIME)
  5  INTERVAL (NUMTODSINTERVAL(7,'DAY'))
  6  (PARTITION P0 VALUES LESS THAN
  7   (TO_DATE('23-SEP-1997','DD-MON-YYYY')));

Table created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> insert into sk_20131015 values ( SYSDATE-20, 'A' );

1 row created.

ops$tkyte%ORA11GR2> insert into sk_20131015 values ( SYSDATE-13, 'B' );

1 row created.

ops$tkyte%ORA11GR2> insert into sk_20131015 values ( SYSDATE, 'x' );

1 row created.

ops$tkyte%ORA11GR2> insert into sk_20131015 values ( SYSDATE+ (1/60), 'x1' );

1 row created.

ops$tkyte%ORA11GR2> insert into sk_20131015 values ( SYSDATE+1, 'y' );

1 row created.

ops$tkyte%ORA11GR2> insert into sk_20131015 values ( SYSDATE+7, 'z' );

1 row created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> declare
  2     spec_partition_does_not_exist exception;
  3     pragma exception_init(spec_partition_does_not_exist, -02149); -- "Specified partition does not exist"
  4     last_partition_cant_be_dropped exception;
  5     pragma exception_init(last_partition_cant_be_dropped, -14758); -- "Last partition in the range section cannot be dropped"
  6     l_cnt Number := 0;
  7  begin
  8     for d in (select trunc(sysdate)-14-level d from dual connect by level>0) loop
  9        begin
 10                    l_cnt := l_cnt + 1;
 11            execute immediate 'alter table sk_20131015  drop partition for(date''' || to_char(d.d, 'yyyy-mm-dd') || ''')';
 12        exception
 13           when spec_partition_does_not_exist then null;
 14           when last_partition_cant_be_dropped then exit;
 15        end;
 16     end loop;
 17     dbms_output.put_line( l_cnt || ' iterations' );
 18  end;
 19  /
<b>5870 iterations</b>

PL/SQL procedure successfully completed.



I'd rather not write code that is programmed to fail ;)

program to succeed.

A reader, October 11, 2018 - 1:06 pm UTC


Vishal, May 21, 2020 - 3:08 pm UTC

Hi Tom,

Below code works fine in oracle 12c, but it is not working in 19c as last_partition_cant_be_dropped exception is not caught. Instead after deleting older partition it starts dropping latest partitions and stops at last partition. Could you please suggest, how i can handle last_partition_cant_be_dropped exception in 19c. Thank you very much for your help in advance.

ops$tkyte%ORA11GR2> declare
2 spec_partition_does_not_exist exception;
3 pragma exception_init(spec_partition_does_not_exist, -02149); -- "Specified partition does not exist"
4 last_partition_cant_be_dropped exception;
5 pragma exception_init(last_partition_cant_be_dropped, -14758); -- "Last partition in the range section cannot be dropped"
6 l_cnt Number := 0;
7 begin
8 for d in (select trunc(sysdate)-14-level d from dual connect by level>0) loop
9 begin
10 l_cnt := l_cnt + 1;
11 execute immediate 'alter table sk_20131015 drop partition for(date''' || to_char(d.d, 'yyyy-mm-dd') || ''')';
12 exception
13 when spec_partition_does_not_exist then null;
14 when last_partition_cant_be_dropped then exit;
15 end;
16 end loop;
17 dbms_output.put_line( l_cnt || ' iterations' );
18 end;
19 /
5870 iterations

PL/SQL procedure successfully completed.
Connor McDonald
May 22, 2020 - 12:12 am UTC

We "improved" things in 18c onwards in that we automatically adjust interval partitions so that you can continue to drop them just like normal partitions.

Thus you should not rely on that to be the "stop" flag. You should check the value of the date that want to keep and only drop up to that moment.

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.