Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Mohd..

Asked: November 10, 2016 - 2:45 pm UTC

Last updated: November 16, 2016 - 4:09 am UTC

Version: 12c

Viewed 1000+ times

You Asked

Dear Tom,

Greetings and thanks for helping community here.

I am in process of partitioning and using below logic...

1 MB can store = No Of Rows / Current Table Size(MB) => will give me the number of rows in 1MB
Then I can decide how many rows I can store in a e.g. 4 GB. ( determining Range of Partition key here )

e.g. my 500 MB table consists of 100,000,000 rows
Below is the calculation...

1 MB can store = 100,000,000 / 500 (MB) = 200,000 => 1 MB can store 200,000 rows
Now I know my 1GB data file size can store 200,000 * 1024 = 204,800,000 rows

to make it more clear if I choose RANGE of < 204,800,000 in 1st Partititon and INTERVAL (204,800,000) for next partition and so on.....that way my each Partition will be of 1GB.

Now My question is how big my partition size should be ? 2GB, 3GB or 4GB ( So that I will know the exact PARTITION RANGE)

Thanks in advance :)




and Connor said...

The size of the partition *is* relevant but probably not the most important decision factor.

Example(s):

1) I archive my data based on old months - so I might partition by month

2) All my reports are for Year to Date - so I might partition by year or by month

3) I join several partitioned tables together - i might align all of their partition range definitions to get partition-wise joins

etc etc

This is not to say that you should forget about size, because size is related to manageability (eg time taken to move a single partition, or rebuild a local index partition), but it is just one piece of the equation.

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

partition-wise joins

Mohd. Alam, November 11, 2016 - 4:11 pm UTC

Dear Connor,

1. So if my one month data in "One Partition" is 100 GB (which is violating the Oracle guideline "if Table is more than 2GB, we should partition it)......is that ok?

2. As you mentioned in point 3 -> partition-wise joins -> If my one table T1 partition is having 3 months range and other tables T2 and T3 are partitioned "month-wise"

And I query these tables together...is there any issue to get "partition-wise joins" between these table since T1 having a different partition range??
Connor McDonald
November 14, 2016 - 2:39 am UTC

1) Where did you see that recommendation ? Because its not a good one. As I said before - choose a partition strategy that best suits *all* of your needs, eg, query performance, administration benefit etc.

2) Here's some examples of partition-wise joins - you can see the partitions need to *align*


SQL> --
SQL> -- Example 1
SQL> --
SQL>
SQL> drop table t1 purge;

Table dropped.

SQL> drop table t2 purge;
drop table t2 purge
           *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL>
SQL> create table t1 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (10000),
  5  partition p2 values less than (20000),
  6  partition p3 values less than (30000)
  7  );

Table created.

SQL>
SQL>
SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (10000),
  5  partition p2 values less than (20000),
  6  partition p3 values less than (30000)
  7  );

Table created.

SQL>
SQL>
SQL> insert into t1 select rownum, rownum from dual connect by level < 30000;

29999 rows created.

SQL> insert into t2 select * from t1;

29999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- all partitions align, so we see partition-wise join
SQL> -- (ie, hash join "under" the partition iterator
SQL> --
SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 3155849676

---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    20 |  1050   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |    20 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      | 29999 |   585K|  1050   (1)| 00:00:01 |     1 |     3 |
|*  3 |    HASH JOIN         |      | 29999 |   585K|  1050   (1)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS FULL| T1   | 29999 |   292K|   525   (1)| 00:00:01 |     1 |     3 |
|   5 |     TABLE ACCESS FULL| T2   | 29999 |   292K|   525   (1)| 00:00:01 |     1 |     3 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."X"="T2"."X")

SQL> set autotrace off
SQL>
SQL> --
SQL> -- Example 2
SQL> --
SQL>
SQL>
SQL> drop table t2 purge;

Table dropped.

SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (15000),
  5  partition p3 values less than (30000)
  6  );

Table created.

SQL>
SQL> --
SQL> -- all partitions do NOT align, so we do NOT see partition-wise join
SQL> --
SQL>
SQL> insert into t2 select * from t1;

29999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 666786458

---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |    20 |   875   (1)| 00:00:01 |       |    |
|   1 |  SORT AGGREGATE               |         |     1 |    20 |            |          |       |    |
|*  2 |   HASH JOIN                   |         | 29999 |   585K|   875   (1)| 00:00:01 |       |    |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 | 29999 |   585K|   875   (1)| 00:00:01 |       |    |
|   4 |     PARTITION RANGE ALL       |         | 29999 |   292K|   525   (1)| 00:00:01 |     1 |     3 |
|   5 |      TABLE ACCESS FULL        | T1      | 29999 |   292K|   525   (1)| 00:00:01 |     1 |     3 |
|   6 |    PARTITION RANGE JOIN-FILTER|         | 29999 |   292K|   351   (1)| 00:00:01 |:BF0000|:BF0000|
|   7 |     TABLE ACCESS FULL         | T2      | 29999 |   292K|   351   (1)| 00:00:01 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."X"="T2"."X")

Note
-----
   - this is an adaptive plan

SQL> set autotrace off
SQL>
SQL>
SQL> --
SQL> -- Example 3
SQL> --
SQL> drop table t2 purge;

Table dropped.

SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (10000),
  5  partition p2 values less than (20000),
  6  partition p3 values less than (30000),
  7  partition p4 values less than (40000)
  8  );

Table created.

SQL>
SQL> --
SQL> -- some partitions do align, but still a mismatch so we do NOT see partition-wise join
SQL> --
SQL>
SQL> insert into t2 select rownum, rownum from dual connect by level < 40000;

39999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 666786458

---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |    20 |  1224   (1)| 00:00:01 |       |    |
|   1 |  SORT AGGREGATE               |         |     1 |    20 |            |          |       |    |
|*  2 |   HASH JOIN                   |         | 29999 |   585K|  1224   (1)| 00:00:01 |       |    |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 | 29999 |   585K|  1224   (1)| 00:00:01 |       |    |
|   4 |     PARTITION RANGE ALL       |         | 29999 |   292K|   525   (1)| 00:00:01 |     1 |     3 |
|   5 |      TABLE ACCESS FULL        | T1      | 29999 |   292K|   525   (1)| 00:00:01 |     1 |     3 |
|   6 |    PARTITION RANGE JOIN-FILTER|         | 39999 |   390K|   699   (1)| 00:00:01 |:BF0000|:BF0000|
|   7 |     TABLE ACCESS FULL         | T2      | 39999 |   390K|   699   (1)| 00:00:01 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."X"="T2"."X")

Note
-----
   - this is an adaptive plan

SQL> set autotrace off
SQL>
SQL>


Partition-wise joins

Mohd. Alam, November 11, 2016 - 6:40 pm UTC

Dear Connor,

1. So if my one month data in "One Partition" is 100 GB (which is violating the Oracle guideline "if Table is more than 2GB, we should partition it)......is that ok?

2. As you mentioned in point 3 -> partition-wise joins -> If my one table T1 partition is having 3 months range and other tables T2 and T3 are partitioned "month-wise"

And I query these tables joining together...is there any issue to get "partition-wise joins" between these table since T1 having a different partition range??

Partition-wise joins

Mohd. Alam, November 11, 2016 - 7:03 pm UTC

Dear Connor,

1. So if my one month data in "One Partition" is 100 GB (which is violating the Oracle guideline "if Table is more than 2GB, we should partition it)......is that ok?

2. As you mentioned in point 3 -> partition-wise joins -> If my one table T1 partition is having 3 months range and other tables T2 and T3 are partitioned "month-wise"

And I query these tables joining together...is there any issue to get "partition-wise joins" between these table since T1 having a different partition range??

Mohd. Alam, November 14, 2016 - 3:15 pm UTC

Dear Connor,

Thanks a lot for detailed answer,

1. For partition of table size >2GB => https://docs.oracle.com/database/121/VLDBG/GUID-6D369646-16AF-487B-BF32-5F6569D27C8A.htm#VLDBG1046

2. My case being Example 2, I still see a small CPU cost compare to the Example 1.

My data model is like below
t1 will have master data with a single row
t2 will have 1,000 rows per t1 rows and so t3.

since I can not use time based partition, and my partition is going to be based on a sequence so t1 will have 1 row and then t2 and t3 will have 1,000 rows for every row in t1.
so if I create SAME partition range same for these tables, my t1 will have only few rows in every partition and size of each partition may be less than 10 MB, on the other side t2 and t3 will have millions of rows and size may be few GBs.

So can I have a different range for t1 as you example shows still a less CPU cost.

Connor McDonald
November 15, 2016 - 1:12 am UTC

1) Thanks, I've submitted some feedback suggesting that recommendation is outdated.

2) As tables get *huge* (millions and billions of rows), partition-wise joins become more and more important. But if your scale is not that, then choose the mechanism that works best for you.

Mohd. Alam, November 15, 2016 - 3:27 pm UTC

Just to close this thread...

1) If >2GB is not the bench mark then what is the size I should think of partition in 12c Data warehouse or OLTP environments?

2) I just wanted to remove a partition which is as small as 10-50 MB for table T1, but will it make SQL OPTIMIZER to go haywire in doing the semi-partition-wise join ( as I explain above similar to your Example2)
Connor McDonald
November 16, 2016 - 4:09 am UTC

"1) If >2GB is not the bench mark then what is the size ..."

There is *no* right size. On my laptop, a good partition size might be 1G because

a) it takes 10 seconds to query 1G
b) I need my queries that access a single partition to finish in 10 seconds
c) I need my partition archive compression to be less than 30 seconds
etc etc

But on an Exadata machine, 50G might be a good size I might spread my 400G table over the storage cells best in that way etc

*Business requirements* and your technical infrastructure drive the partition decisions you make.


"2) I just wanted to remove a partition ..."

Anything which changes the structure or size of a table can impact the optimizer. But yes, if you had partitions all aligned between table (and hence was getting partition-wise joins), then altering the partition structure of one table without doing the same on the other ,would most likely stop partition-wise join from being available. Your query will *still* run, but it may well consume more resources to do it

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.