You Asked

My question is generally how to determine the size to set the size value in a create cluster statement. And specifically for a parent table 400 bytes wide with 15 million rows, and a child 120 bytes wide with 40 million rows. There may be any number of children but the average is less then 4 and median is 2.

What should the ???? in the code below be, or better what are the steps to arrive at an optimal value?

create cluster tr_cluster ( region number(2),customer number(10),invoice number(10)) size ???? ;

create index tr_cluster_idx on cluster tr_cluster;

/* 15000000 rows ave length 410 bytes */
create table invoices(region number(2),customer number(10),invoice number(10), other_data char(400));

/* 42000000 rows ave length 102 bytes , avgerage number of lines is 3.8, median number of lines is 2  */
create table invoice_lines(region number(2),customer number(10),invoice number(10), line integer, line_data char(100));

Thank you

with LiveSQL Test Case:

https://livesql.oracle.com/apex/livesql/file/content_GTPCI7OMXORIX0UFL2ZNO3T6G.html

and Connor said...

From the docs:

SIZE

Specify the amount of space in bytes reserved to store all rows with the same cluster key value or the same hash value. This space determines the maximum number of cluster or hash values stored in a data block. If SIZE is not a divisor of the data block size, then Oracle Database uses the next largest divisor. If SIZE is larger than the data block size, then the database uses the operating system block size, reserving at least one data block for each cluster or hash value.

So essentially it is how much space will *all* rows for a given cluster key occupy. You are after a balance here depending on your business requirements, eg

1) Pick a really *big* size (lets say you omit it, so one cluster key per block)

Then you get absolutely the best concurrency because no two keys will sit in the same block. *But* you also now have a much larger table - so any scans of data (eg "this months invoices") will take a lot longer. And you might have wasted RAM because much of each of those blocks is empty.

2) Pick a really *small* size

Your table occupies less space on disk - so better for scans, and you get best utilization of those blocks when they are in the buffer cache. *But* you have more chance of cluster keys not fitting in their designated block, so more chance for contention.

So its like Goldilocks :-) the size you choose is balance of your business requirements, ie, volume/frequency of inserts, amount of access by cluster key versus amount of access via other means (date range etc).

Certainly a reasonable starting point would be to fill each block, so something like 1*410 + 3*102 is going to be the ballpark.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

Design

Sizing clusters

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

Design