Skip to Main Content
  • Questions
  • Partitioning on column not used in a query

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Frank.

Asked: April 24, 2013 - 9:19 pm UTC

Last updated: May 07, 2013 - 3:48 pm UTC

Version: 11.2.0.3

Viewed 10K+ times! This question is

You Asked

Hello Tom,

Just started using Oracle and databases in general and trying to understand partitioning a little better. If I have a Range Partition on a Dat column, but then don't use this date column in the FROM or WHERE clause of my sql statement, will the query consider using the partition option? Or will the optimizer simply ignore the partitions and threat the table a one big object like it would without partitioning?

I think I've read in the docs that paritioning can be used for managabiliy and administration as well as performance benefits, but in the cause mentioned above I don't think it will have any performance benefits. Are there cases where this could be a benefit, or would it just be for admin / manageability reasons?

Thank you for helping me understand this
Frank S.

and Tom said...

the optimizer will never ignore the partitions, it will develop a plan that uses as few partitions as possible.

In your example - table T partioned by range on a date column D but D is not referenced in the where clause - we'll either access the table using

a) full scan. the fact the table is partitioned will have no bearing positive or negative on performance. We'll do the same amount of work we would regardless of how the table was stored.

b) index access. This is trickier, the index could be either locally partitioned or globally partitioned.

If the index were locally partitioned - we would likely suffer a negative performance impact. If the table were in 12 partitions for example, we would have to perform 12 index range scans to locate information - instead of one that would be true on a non-partitioned example.

If the index was globally partitioned - and we were using the index to scan the data - that would imply that the predicate involved the indexed columns - and the indexed columns where used to partition the global index - and hence we would be able to partition eliminate on the index - maybe down to a single partition. In this case - the fact we partitioned was neither beneficial nor negative in a performance aspect.



partitioning is for query performance when we can perform partition elimination or other tricks like partition-wise joins. If we cannot do that - then partitioning will not really benefit a SELECT statement, in fact you have to exercise care to not negatively impact performance.


Partitioning is most useful performance wise in an environment such as warehousing or report systems where a full scan can significantly reduce the amount of data it has to scan.

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Great explanation

Frank, April 26, 2013 - 9:02 am UTC

Thanks Tom, that really clarifies the concept for me and I am starting to understand it better now.

performance on insert for partition

Anand, May 05, 2013 - 2:52 pm UTC

Hi Tom,

We have plan to partition a table of 300 GB.But this table in our application have no select or update or delete anywhere but it has so many inserts in batch parallel stream.As per DBA if we partition this table there will no performance gain in the batch.Is that true ? Should we only consider those big table which have more select or update then insert ?
Tom Kyte
May 06, 2013 - 7:25 pm UTC

insufficient data to comment.

will the batches each load their own partition? could this enable them to direct path load whereas before they couldn't? how does the 'batch' work? can we do this as a single parallel direct path load instead of user written (slow) code?

partitioning on one itme in the WHERE clause

Ben, May 07, 2013 - 1:13 pm UTC

Hi Tom,

In your first reply I concluded that ideally the partitioned column would be in the WHERE clause (column D of table T partitioned with RANGE).

I have a query that uses column D, but also column A, column B and column C in the WHERE clause. But I only have column A indexed and a composite index on A and D. I am not seeing any partition elimination when running the query through an explain plan (which I'm not surprised by after typing this out and thinking more about it). Is it necessary to use ONLY column D if we want partition elimination? How can I ensure partition-elimination is used when I have more columns in the WHERE clause in addition to the partitioned column?


Small example:

 create table T (
a varchar2(10), 
b varchar2(10), 
c varchar2(10), 
acc_date date) 
 partition by range (acc_date) 
(partition acc1 values less than (to_date(' 2013-01-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
partition acc1 values less than (to_date(' 2013-02-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')), 
partition acc1 values less than (to_date(' 2013-03-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')), 
partition acc1 values less than (to_date(' 2013-04-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')), 
partition acc1 values less than (to_date(' 2013-05-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'));

-- please not there are over 90 partitions in the real situation, dating back to around 2006. I didn't list all 90 for brevity sake. In reality, each partition has about 1 million rows.

 create index a_indx on T (a);
create index a_d_index on T (acc_date, a);


 select a, b, c, acc_date
from T
where a='aq12ws'
and b<>'fr45tg'
and c='e34r5t'
and acc_date > '01-JAN-2010';


Being that I have over 90 partitions in reality, going back to 2006, I was hoping that the acc_date condition in the WHERE clause would at least be able to only look at the last 40 or so partitions and avoid a full table scan.

 Execution Plan
----------------------------------------------------------
Plan hash value: 1950637564

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          | 11197 |  1191K|   366K  (3)| 01:13:15 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 | 11197 |  1191K|   366K  (3)| 01:13:15 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          | 11197 |  1191K|   366K  (3)| 01:13:15 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T         | 11197 |  1191K|   366K  (3)| 01:13:15 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------- 


Any suggestions on what I need to do to benefit from partition-elimination here.

Many thanks,
Ben

Tom Kyte
May 07, 2013 - 3:46 pm UTC

are you sure you are not just cutting off the pstart-pstop columns in the plan?

ops$tkyte%ORA11GR2> create table T (
  2  a varchar2(10),
  3  b varchar2(10),
  4  c varchar2(10),
  5  acc_date date)
  6   partition by range (acc_date)
  7  (partition acc1 values less than (to_date(' 2013-01-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
  8  partition acc2 values less than (to_date(' 2013-02-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
  9  partition acc3 values less than (to_date(' 2013-03-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 10  partition acc4 values less than (to_date(' 2013-04-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 11  partition acc5 values less than (to_date(' 2013-05-01 00:00:00','SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
 12  );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create index a_indx on T (a);

Index created.

ops$tkyte%ORA11GR2> create index a_d_index on T (acc_date, a);

Index created.

ops$tkyte%ORA11GR2> alter table t parallel;

Table altered.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T', numrows => 1000000000, numblks => 10000000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2>  select /*+ full( t ) parallel */ a, b, c, acc_date
  2  from T
  3  where a='aq12ws'
  4  and b<>'fr45tg'
  5  and c='e34r5t'
  6  and acc_date > TO_DATE( '01-MAR-2013', 'dd-mon-yyyy' );

Execution Plan
----------------------------------------------------------
Plan hash value: 486055342

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   250 |  7500 |     8   (0)| 00:00:01 |       |       |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   250 |  7500 |     8   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   250 |  7500 |     8   (0)| 00:00:01 |     4 |     5 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T        |   250 |  7500 |     8   (0)| 00:00:01 |     4 |     5 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("A"='aq12ws' AND "C"='e34r5t' AND "B"<>'fr45tg' AND "ACC_DATE">TO_DATE(' 2013-03-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))



it would do partition elimination for the full scan "naturally". Look in the plan table you generate the plan into directly if you want.


AND USE TO_DATE!!!!!

Parition pruning using virtual column based on date partition key

Nicholas Hurt, May 07, 2013 - 2:31 pm UTC

Hello Tom

We have an unfortunate scenario which I feel has relevance in this thread and to date I have not found a concise answer such as the ones you regularly provide. Stewart Bryson has touched on this here http://www.rittmanmead.com/2010/08/partitioning-fact-tables-part-1/ and also in one of your other older threads http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4632159445946

We have a large fact table, interval partitioned by day based on the date field, however this date field also contains the time component. As a result we previously had two columns, one with the date and time (data type: date) and the other just trunc(date). Most of our queries used both so that one did the partition pruning and the other joined to our time dimension... yuk! Then in our infinite wisdom and some prelim trials with explain plans, we found that dropping the trunc(date) field and replacing it with a virtual column of trunc(date) seemed to tell the optimizer that what we really want to do when querying on the virtual column is use the underlying partition key and prune away. At the expense of a few more cpu cycles, that would allow us to keep the existing partition key and prevent us from having to store two date fields on the fact table, one for the date with time component and one without. Even though the explain plans told us the optimizer path chosen was the same when using either column, sadly after timing the queries and running an sql trace in production it turned out that using the virtual column resulted in lots more disk IO as if it had ignored the partition key. Our analysis seemed to conclude that After little help from support, we are on the verge of rebuilding our fact table, probably using a surrogate key for time (hours and minutes) and a trunc(date) column for the partition key, we are however still faced with the added expense of having to store a date and numeric field (7 + 4 bytes?) rather than some just a date. Would appreciate your thoughts and any insights. Thanks
Tom Kyte
May 07, 2013 - 3:48 pm UTC

need some set up to have this be reproducible for us... create tables and such...

You're right!

Ben, May 08, 2013 - 9:14 am UTC

Hi Tom,

You're right, the pstart and pstop columns got truncated when viewing through set autotrace traceonly explain. Once I did explain plan for and then ran utlxpls.sql I could see the pstart and pstop values. Thanks for that insight.

Also, just by adding TO_DATE in the query, it shaved of 20 minutes in the Time colums (I know milage varies at actual runtime but it's heading in the right direction now). And if I shorten the ACC_DATE range to 18 months instead of 3.5 years it (obviously) looks even better. Thanks.

And what you did here:

ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T', numrows => 1000000000, numblks =>
10000000 );

That's wicked cool, I learned something new here for sure.

I also tried something from the original poster, and removed the ACC_DATE column from the WHERE clause. I wanted to see if partitioning would be used. What happened is exactly as you explained - the paritioning was used "naturally" but was not actually "used". i.e. I have a pstart value of 1 and a pstop value of 1049873, basically recognizing the partitioning exists but going through all partitions and not eliminating any.

This was a huge help for me, thanks Tom!

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.