Hash partitioning and number of partitions
Question and Answer

Chris Saxon

Thanks for the question, Cyrille.

Asked: July 16, 2016 - 4:05 pm UTC

Last updated: July 20, 2016 - 11:25 pm UTC

Version: 12.1.0.2

Viewed 1000+ times

You Asked

Hi Tom,

I have a question regarding the choice of the number of partitions when using hash partitioning.
I have the following table used to store documents

CREATE TABLE "DOCUMENT" 
   ( "DOCUMENT_ID" NUMBER(38,0), 
 "CREATED_BY" VARCHAR2(30 BYTE), 
 "CREATED_DATE" TIMESTAMP (6), 
 "DOCUMENT" CLOB, 
 "DOCUMENT_PRODUCTION_DATE" TIMESTAMP (6) WITH TIME ZONE, 
 "DOCUMENT_SIZE" NUMBER(10,0), 
 "DOCUMENT_TYPE" VARCHAR2(20 BYTE), 
 "ENTITY_IDENTIFIER" VARCHAR2(200 BYTE), 
 "MEDIA_TYPE" VARCHAR2(60 BYTE), 
 "PART_NUMBER" VARCHAR2(200 BYTE), 
 "NIU" VARCHAR2(50 BYTE), 
 "REQUEST_NUMBER" VARCHAR2(200 BYTE), 
 "SENDER_EXTENSION" VARCHAR2(21 BYTE), 
 "SENDER_ROOT" VARCHAR2(240 BYTE), 
 "TRACKING_ID" VARCHAR2(36 BYTE)
   ) SEGMENT CREATION IMMEDIATE 
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 
 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "CLINICAL_DATA" 
 LOB ("DOCUMENT") STORE AS SECUREFILE (
  TABLESPACE "CLINICAL_LOB" ENABLE STORAGE IN ROW CHUNK 8192
  NOCACHE LOGGING  NOCOMPRESS  KEEP_DUPLICATES 
  STORAGE(INITIAL 106496 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))


This table will hold terabytes of documents.
This table will not be joined with others.
We decided to partition this table by hash with the partition key column being "NIU" as this table will always be filtered by this column.

The NIU is a unique identifier for a patient but it is not unique in the table because we will have multiple documents for the same patient.
There are approximately 10 million distinct values, and this will grow a little over time but is fairly static compared to the number of documents.
The table itself will hold hundreds of millions of rows.

My question concerns the number of partitions to choose. What should influence this choice?
Is it only the size of each partition?
Are there any drawbacks to having a very large number of partitions? Does it add overhead?

Thank you very much.

Cyrille

and Connor said...

Some things to think about

1) Does it need to be partitioned at all ? Are you worried about overall size, or are you after the scans on patient data being efficient ? Perhaps an IOT with primary key being (NIU, DOCUMENT_ID) would give you data nicely clustered per patient. (Average size of each row is potentially also a factor here)

2) What sizes are the documents (ie, the clob) ? Will you be examining patient data always in conjunction with the document, or just the other metadata? (ie, pertains to storing that document inline or out of line)

3) hash partitions should be a power of 2. I'd be hesitant to go heavily into (say) "thousands" of partitions without a lot of careful testing. When an object consists of that many segments, think about how much dictionary metadata you have floating around for that object, some or all of which might need to be loaded/read/processed when parsing queries. eg 1000 partitions x 10 columns x (say) 200 histogram buckets per column ... that's a LOT of stuff.
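For instance, a minimal sketch of a power-of-2 hash-partitioned version of the table (the 64-partition count and trimmed column list are purely illustrative, not a recommendation):

CREATE TABLE document_hashed (
  document_id NUMBER(38,0),
  niu         VARCHAR2(50 BYTE),
  document    CLOB
  -- remaining columns as per the original DDL
)
PARTITION BY HASH (niu)
PARTITIONS 64;

With a power-of-2 partition count the hash function spreads rows evenly; with any other count, some partitions end up roughly twice the size of others.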

Hope this helps.


Comments

About the IOT

Cyrille, July 18, 2016 - 3:57 am UTC

Hi Connor,

Thanks for your useful advice.

1) My concern is of course the size of the table and the related performance and manageability.
Select statements must remain fast, and computing stats, for example, will be easier on a partitioned table. Partitioning would definitely improve the manageability of such a big table. I never thought about an IOT. I can see your point about grouping the NIUs together, but that's also what hash partitioning does; what other benefits would an IOT have?

2) The documents can be from 1 MB to 10 MB, even more in the future, and they will always be selected in conjunction with other columns.

3) I understand your point; that's what I was worried about. But if I create fewer partitions they will be bigger. Which is worse: a bigger partition, or the increase in metadata from having more partitions?


I forgot to mention that data will very rarely be deleted from this table; the activity will be mostly select and insert statements.

Our list of priorities is as follows:

1- select statements performance
2- insert statement performance
3- manageability (stats, backups...)

Thanks

Cyrille

Connor McDonald
July 18, 2016 - 5:01 am UTC

1) With an IOT, the data per NIU is (by definition) co-located. With hash partitions, it's *approximately* co-located, because if you have 'n' NIUs and 'm' partitions, with m < n (which sounds like it will definitely be the case), then you have NIUs interspersed within the same partition.

At which point, we think... "We can index it"... and you start heading back to: why didn't we do it with an IOT? :-)

2) My question was the opposite. Will any of your queries select the other columns and *not* select the document column ? If that's a high volume, then out of line might be worth looking at.

3) It's easy enough to benchmark... create the document table with 100 partitions, 1000 partitions, 10000 partitions and so on, and run some typical queries. Even if you have minimal data in the table, you'll get a good approximation of what the *parsing* costs will be.

"4") For insert performance, it's unlikely the table will be the issue - generally it's only when you have contention around index blocks. Remember that tables and indexes can be partitioned separately from each other. Once again, something you could benchmark (eg a globally hash-partitioned index).
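As a sketch, a globally hash-partitioned index might look like this (index name and partition count are placeholders):

CREATE INDEX document_id_ix ON document (document_id)
  GLOBAL PARTITION BY HASH (document_id)
  PARTITIONS 32;

Hashing a sequence-fed key across partitions spreads concurrent inserts over many leaf blocks instead of one hot right-hand block.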

Hope this helps.

What about the size limit

Cyrille, July 18, 2016 - 1:56 pm UTC

Hi Connor,

Thanks again for your quick and helpful replies.

1) That makes a lot of sense actually. But let's say my table will be 200 TB; if I'm not wrong, the maximum size of a tablespace with an 8 KB block size is 32 TB, so how can I store this table without partitioning? Maybe we can both use an IOT and partition it?
The access to the table is done by the following sets of columns:

- NIU + DOCUMENT_ID
- NIU + REQUEST_NUMBER + SENDER_ROOT + PART_NUMBER
- NIU + TRACKING_ID + DOCUMENT_TYPE + ENTITY_IDENTIFIER

2) 95% of the queries will return the document (CLOB), but even if I use ENABLE STORAGE IN ROW, it will most likely be stored out of line because of the size of the documents.

3) I will do the benchmark

4) That would actually be a good idea to hash partition the index globally to avoid these hot blocks during massive inserts.

Thanks

Cyrille
Chris Saxon
July 18, 2016 - 4:03 pm UTC

Yes, the maximum size of a data file is 32 TB with an 8 KB block size. So if your table will be 200 TB, you're going to need to split it across several of these.

Maybe we can both use an IOT and partition it?

You can indeed! Though the partition columns must be a subset of the primary key columns.
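For the size limit specifically: a partitioned table can spread its partitions across several tablespaces, for example with the STORE IN clause for hash partitions. A rough sketch, where the tablespace names and the partition count are placeholders:

CREATE TABLE document_part (
  niu         VARCHAR2(50 BYTE),
  document_id NUMBER(38,0),
  document    CLOB,
  CONSTRAINT document_part_pk PRIMARY KEY (niu, document_id)
)
ORGANIZATION INDEX
PARTITION BY HASH (niu) PARTITIONS 128
STORE IN (clin_ts_01, clin_ts_02, clin_ts_03, clin_ts_04);

Partitions are assigned to the listed tablespaces round-robin, so no single tablespace has to hold the whole table.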

Chris

partitioning of the IOT

Cyrille, July 18, 2016 - 6:16 pm UTC

thanks for your input.

So it means that partitioning the table is mandatory for a table of this size.

The primary key is DOCUMENT_ID, which is the only unique column in this table, but partitioning on this primary key doesn't make much sense given the access paths in my previous post. So if the partition key must be a subset of the primary key for an IOT, the IOT won't work for me, correct?

thanks

Cyrille
Chris Saxon
July 19, 2016 - 9:27 am UTC

Hmmm, the docs are perhaps a bit misleading there. It means if you're partitioning an IOT, you can't include non-PK columns. At least in 12.1.0.2:

SQL> create table t (
  2    x int primary key,
  3     y int
  4  ) organization index
  5    partition by hash (x) partitions 4;

Table created.

SQL>
SQL> insert into t
  2    select rownum, rownum from dual connect by level <= 10;

10 rows created.

SQL> drop table t purge;

Table dropped.

SQL> create table t (
  2    x int primary key,
  3     y int
  4  ) organization index
  5    partition by hash (y) partitions 4;
  partition by hash (y) partitions 4
                     *
ERROR at line 5:
ORA-25199: partitioning key of a index-organized table must be a subset of the primary key

In any case, NIU is the common column in all your queries. So making an IOT with DOCUMENT_ID as the PK won't help 2/3 of your statements. In fact, it's likely to make them worse!

You could get around this by creating a "fake" primary key of NIU + DOCUMENT_ID. Then add a unique constraint on DOCUMENT_ID to ensure it stays unique; you can point FKs to it, etc. e.g.:

create table t (
  x int unique,
  y int,
  primary key (y, x)
) organization index 
  partition by hash (y) partitions 4;


https://mwidlake.wordpress.com/2011/08/17/iot-part-5-primary-key-drawback-and-workaround/

Still confused

Cyrille, July 20, 2016 - 3:23 pm UTC

Hi,

I thought I understood the advantage of using an IOT but I'm still confused,
I created both the table as an IOT and a normal heap table with the following indexes:

CREATE INDEX "DOCUMENT_IDX_01" ON "DOCUMENT" ("NIU","REQUEST_NUMBER", "DOCUMENT_TYPE", "SENDER_ROOT") LOCAL; 

CREATE UNIQUE INDEX "DOCUMENT_UQ_01" ON "DOCUMENT" ("NIU","TRACKING_ID", "DOCUMENT_TYPE", "ENTITY_IDENTIFIER") LOCAL;


According to the design of the application we will never query more than one NIU at a time, meaning we will always query one partition at a time, so I created local indexes.

Here is a test query:

select * from DOCUMENT where NIU='10000017400' and DOCUMENT_ID=81873;

The plan with the IOT:

-----------------------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |   302 |     1   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE|             |     1 |   302 |     1   (0)| 00:00:01 |     1 |     1 |
|*  2 |   INDEX UNIQUE SCAN   | DOCUMENT_PK |     1 |   302 |     1   (0)| 00:00:01 |     1 |     1 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DOCUMENT_ID"=81873 AND "NIU"='10000017400')


The plan with the heap table:
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                   |     1 |  2421 |     2   (0)| 00:00:01 |       |       |
|*  1 |  TABLE ACCESS BY GLOBAL INDEX ROWID| DOCUMENT      |     1 |  2421 |     2   (0)| 00:00:01 |  1025 |  1025 |
|*  2 |   INDEX UNIQUE SCAN                | DOCUMENT_PK       |     1 |       |     1   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NIU"='10000017400')
   2 - access("DOCUMENT_ID"=81873)



Here is another query using one of the local indexes created above:

select * from DOCUMENT where niu='10000017400' and REQUEST_NUMBER='2014148000366-161751620' and SENDER_ROOT='2.16.124.10.101.1.60.1.4503' and PART_NUMBER='539013823';


The plan with the IOT:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                          |     1 |   302 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE|                          |     1 |   302 |     2   (0)| 00:00:01 |     1 |     1 |
|*  2 |   INDEX UNIQUE SCAN   | DOCUMENT_PK              |     1 |   302 |     2   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN   | DOCUMENT_IDX_01    |     1 |       |     1   (0)| 00:00:01 |     1 |     1 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("REQUEST_NUMBER"='2014148000366-161751620' AND "SENDER_ROOT"='2.16.124.10.101.1.60.1.4503'
              AND "NIU"='10000017400')
       filter("PART_NUMBER"='539013823')
   3 - access("REQUEST_NUMBER"='2014148000366-161751620' AND "SENDER_ROOT"='2.16.124.10.101.1.60.1.4503'
              AND "NIU"='10000017400')
       filter("NIU"='10000017400' AND "SENDER_ROOT"='2.16.124.10.101.1.60.1.4503')


The plan with the heap table:

---------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                          |     1 |  2421 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |                          |     1 |  2421 |     2   (0)| 00:00:01 |  1025 |  1025 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| DOCUMENT            |     1 |  2421 |     2   (0)| 00:00:01 |  1025 |  1025 |
|*  3 |    INDEX RANGE SCAN                        | DOCUMENT_IDX_01     |     1 |       |     1   (0)| 00:00:01 |  1025 |  1025 |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PART_NUMBER"='539013823')
   3 - access("NIU"='10000017400' AND "REQUEST_NUMBER"='2014148000366-161751620' AND
              "SENDER_ROOT"='2.16.124.10.101.1.60.1.4503')
       filter("SENDER_ROOT"='2.16.124.10.101.1.60.1.4503')




I still wonder why using an IOT is an advantage. Why does grouping the rows for an NIU together benefit performance?

Thanks

Cyrille
Chris Saxon
July 20, 2016 - 3:37 pm UTC

I still wonder why using an IOT is an advantage. Why does grouping the rows for an NIU together benefit performance?

This is based on the assumption there's more than one row for each NIU.

Say you have 10 rows per NIU. In a heap table these could easily be scattered across 10 different table blocks. So to fetch them, that's 10 I/O operations plus reading the index.

But with an IOT, Oracle Database places rows in the same block according to the logical order imposed by the index. So if NIU is the first column, all ten rows will be "together" in one block. Or maybe two if you're unlucky. Plus the index has all the information you need, so you save yourself the table lookup.

So instead of 11+ I/Os to read 10 rows, you only need one. This benefit increases the more rows there are per NIU.

If there's always at most one row per NIU, the benefit of an IOT is limited. You save yourself a table lookup, but that's about it. But if that's the case, you should have a unique constraint on NIU!

If you want to know more, read Martin Widlake's series on these:

https://mwidlake.wordpress.com/2011/07/18/index-organized-tables-the-basics/

Or Richard Foote's:

https://richardfoote.wordpress.com/2012/01/10/index-organized-tables-an-introduction-of-sorts-pyramid-song/

Chris

Thank you

Cyrille, July 20, 2016 - 5:21 pm UTC

Thank you very much.
Chris Saxon
July 20, 2016 - 11:25 pm UTC

Glad we could help.

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

Documentation set on VLDB and Partitioning.