
Question and Answer

Tom Kyte

Thanks for the question, Sudhanshu.

Asked: July 06, 2002 - 1:14 pm UTC

Last updated: August 28, 2013 - 5:01 pm UTC

Version: 8.1.7

Viewed 10K+ times

You Asked

I have a table with around 10 million rows. Data can be grouped by a key, each key generally having around 10-200 rows. Data for a given key gets inserted over a long period of time. There are currently about 0.7 million distinct keys in this table.

All of our queries search on the basis of this key, and as the data is scattered across files it takes time for all of the queries to execute. Ours is an OLTP application, so insertions into this table happen online.

Should I create a single-table cluster on this table with my key as the cluster column?

Thanks.

and Tom said...

This might be a candidate for a cluster, yes.

You will of course want to benchmark -- you have to understand the cluster will take marginally longer to insert into since the data now has a "place" to go -- and maintaining this structure will take longer than maintaining the HEAP table (the normal table you have now).

If you have my book -- I go into the ins and outs of clusters. If you can take the insert hit, it could be very good for your retrieval. Remember -- you only insert a row once, you might query it up 1,000 times -- therefore, this shifting of overhead from one function to another (from query to insert) could be well worth it in the long term.
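For illustration only, a minimal sketch of the kind of single-table B*Tree cluster being suggested (all names, the SIZE value, and the payload column are assumptions; SIZE should be derived from the real average bytes stored per key value):

create cluster key_cluster
( grp_key number )
size 2048            -- assumption: average bytes of row data per key value
/

create index key_cluster_idx on cluster key_cluster
/

create table t
( grp_key  number,
  payload  varchar2(100)   -- stand-in for the real columns
)
cluster key_cluster(grp_key)
/

All rows inserted with the same GRP_KEY value will then tend to land on the same block (or the same short chain of blocks).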


Comments

Is IOT appropriate here?

andrew, July 08, 2002 - 12:38 pm UTC

Tom
Please could you comment on the appropriateness of an IOT for this problem. If appropriate - under what circumstances would the cluster solution be better?

Tom Kyte
July 08, 2002 - 4:40 pm UTC

probably not.

A cluster just ensures that data that shares a common key is stored together.

An IOT goes a step further and imposes "order" on the data -- 1, 2, 3, 4, 5, ... type of order.

It is more expensive to insert into an IOT than a cluster -- as we might have to physically MOVE data to make room for new data (index splits and such).

In a cluster, we just make sure that like data is stored together. It doesn't matter that physically we store 1, 5, 2, 3, 4 -- just that all of the 1's are together, all of the 5's and so on.

Since the goal here was to cluster like values together (to reduce the physical IO's and make better, more efficient use of the buffer cache) a cluster would probably be more appropriate.



confused by 'single table' hash cluster in Chapter 6

Doug, July 08, 2002 - 8:29 pm UTC

Tom - in this question here, you are suggesting a single table might be worth clustering to co-locate the data. In your book (pg. 238) you make reference to a special case of the single table hash cluster. You say this is designed for those occasions when you want to access a table by primary key and don't wish to cluster other tables with it. "If you need fast access to an employee record by EMPNO - a single table hash cluster might be called for".

In the example in the book it looks like just a regular old table with a standard primary key. In this example in the thread it looks like there are many records with the same value.

So..
If I have a table that is not joined to other tables and I need fast access by its primary key, do I want a single table hash cluster? If so, isn't it somewhat more common to just make use of the index on the primary key? If I can predict the size of this table, is it worth single table hash clustering? In this question thread, it seems we can store many values together, but in the book example, I think you imply that empno is guaranteed to be completely unique between 1 and 50,000.

The example here makes sense but the book example is confusing me. Tables are accessed by primary key quite frequently - I would assume I would see more single table hash clusters if they are worth doing in single table mode.

Tom Kyte
July 09, 2002 - 7:05 am UTC

I think that either

o I'm confused by your question
o you're confusing topics

In this question -- the user wants to physically colocate related data by some value. "... each key generally having around 10-200 rows." They would like all data with the same key value physically stored near all of the data with the same key.

The answer to that is CLUSTER (well, more explicitly -- B*Tree CLUSTER). In their case, it'll be a single table b*tree cluster, but that is not any different than a multi-table one in our eyes.


Now, you are talking about a table with a primary key (unique) AND that you want to access purely by primary key. That is when a HASH cluster comes into play. Normally, you would create a table with a primary key index and then when accessing the table -- we would:

o index unique scan the index (say that is 2-3 IO's to find the key)
o table access by rowid (that is 1 IO to get the data)

Using a HASH cluster however we can take that above work and instead:

o hash the primary key (CPU operation) into a physical location
o do the single IO needed to read that block

So, we can cut out the index access altogether. "the data is the index"


Even if that table is JOINED to other tables -- this still makes sense, as long as the access path is generally via the primary key (and don't forget -- we can create other indexes on this object as well!)
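For illustration, a sketch of the single-table hash cluster being described (the table and cluster names, HASHKEYS, and SIZE values are assumptions):

create cluster emp_hash_cluster
( empno number(4) )
single table
hashkeys 50000   -- assumption: distinct EMPNO values expected over time
size 100         -- assumption: bytes of row data per key
/

create table emp_copy
( empno number(4) primary key,
  ename varchar2(10),
  sal   number
)
cluster emp_hash_cluster(empno)
/

A query "where empno = :x" can then hash straight to the block -- no index IO at all.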


So, in this thread, I was talking about b*tree clusters (see the book on page 224). The example I used there was EMP+DEPT and a cluster key on DEPTNO, not EMPNO. The goal was to get the DEPT info physically colocated with the EMP info for that DEPTNO.


Later, I go onto HASH clusters.

Why are hash clusters not the norm, then?

a) you don't know the number of records ultimately
b) there is a performance penalty on insertions (see above -- i discuss that in the answer)
c) 99% of people aren't aware of them, most people never bother to learn about them


I've used them to great success with some very large tables.

collision ....

Reader, November 01, 2003 - 10:31 pm UTC

Tom, in your new book on page 385, you mention collisions in a hash table. Could you please elaborate on what a collision is and how exceeding the value of the hashkeys parameter affects the hash table? Thanks for your time. Your book is great.

Tom Kyte
November 02, 2003 - 9:59 am UTC

a collision occurs when 2 "hash key values" -- the values that are hashed -- result in the same "location", the same place, the same hash value.

You are taking a possibly infinite set of values (the keys you hash) and are hashing them into a finite set of values. You will have unavoidable "collisions" that result in a little extra lookup time (we hash to where we expect to find the data and then have to look for the data -- it might not be "right there" since something else may well have hashed to that value already as well).

If you set hashkeys too low and many values hash to the same location, we'll have to link together lots of blocks in order to find your data. Say you set hashkeys to 5 (for illustration purposes). Oracle will conceptually set up a 5 block table:

+----+----+----+----+----+
|    |    |    |    |    |
|    |    |    |    |    |
+----+----+----+----+----+

Now, you create a table in this hash cluster based on a key (say "id"). It turns out that 50 rows will fit onto a single block with your data. You start inserting information and get 'lucky', the data spreads out evenly so that you have 250 rows spread across 5 blocks.

Now, any retrieval such as "select * from T where id = 55" will result in 1 logical IO since Oracle will hash 55 to a database block address and read that block, finding all of the data for id=55.

Now, you insert row 251 -- with id=60. It cannot fit onto the block it hashed to since that block is totally full. Assume further that 55 and 60 "hash" to the same block address. Oracle will link another block to it:


hash(55) = block 3 = hash(60)
               |
               V
+----+----+----+----+----+
|    |    | 50 |    |    |
|    |    | 55 |    |    |
+----+----+----+----+----+
               |
               V
          +----+
          | 60 |
          |    |
          +----+


So, now

"select * from t where id = 60"

will take 2 LIOs (as will "where id = 55" and "where id = 50") since we had so many "collisions" that we filled up that block and had to link other blocks to it.
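A quick way to observe this (a sketch, assuming the hash-clustered table T from the example above):

set autotrace traceonly statistics
select * from t where id = 60;

If the consistent gets per single-key lookup are noticeably more than one, the keys are colliding and the blocks are chaining.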

Great explanation - collision

Reader, November 02, 2003 - 10:53 am UTC

I understand that "where id=60" will take two LIOs. Why do id=50 and id=55 also take 2 LIOs? Thanks again.

Tom Kyte
November 02, 2003 - 12:27 pm UTC

because the data for 50 and 55 could also be on that linked block. If you were to insert yet another row with id=50, it would hash to that block; that block is full -- hence it goes onto the linked blocks. Once linked -- any of the data that could be on the head block might be on the linked blocks as well -- we have to check for it.

Comparing Lookup Methods

A reader, November 09, 2003 - 1:20 pm UTC

In your new book, you compared the different methods of doing lookups.

For the heap table, you used an example of TABLE ACCESS BY INDEX ROWID, which is not as efficient as a SINGLE-TABLE HASH CLUSTER or IOT.

I am wondering if we could achieve the same or better performance if we use INDEX FAST FULL SCAN on the heap table?

Thanks.

Tom Kyte
November 09, 2003 - 2:07 pm UTC



an index fast full scan ??? to do a keyed read????

nope, not a chance (think about it -- do you really want to full scan the entire index each time??????)

Single-Table B-Tree Cluster

A reader, November 09, 2003 - 3:34 pm UTC

Thanks Tom.

According to Oracle DBA Guide:

"The SINGLE TABLE option is valid only for hash clusters. HASHKEYS must also be specified."

Is it right that I should not use a single-table b-tree index cluster?


Tom Kyte
November 09, 2003 - 5:09 pm UTC

no, it is not right.

you can use a b*tree cluster very effectively with a single table in it. Consider a single table in a b*tree cluster that has a key "deptno" on it. The single table is EMP. Now -- all emps in the same deptno will be physically "colocated" together. If that is advantageous to your application, it'll be a good thing.
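For illustration, a minimal sketch of that single-table b*tree cluster (the names and the SIZE value are assumptions):

create cluster emp_btree_cluster
( deptno number(2) )
size 1024        -- assumption: bytes of EMP data per department
/

create index emp_btree_cluster_idx on cluster emp_btree_cluster
/

create table emp_clustered
cluster emp_btree_cluster(deptno)
as
select * from scott.emp
/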

IOT versus clustered table

Dushan, November 10, 2003 - 2:32 am UTC

As it was said, inserting into an IOT is more expensive than inserting into a clustered table. Is this the clustered table's only benefit?

For me, the biggest cons of clustered tables in general are:
1) Not possible to partition
2) Not possible to direct path load (same for IOT, I think)

So, when we store a huge amount of data in a clustered table, deleting from this table turns into a nightmare. From this point of view the IOT wins (truncate/drop partition).



Tom Kyte
November 10, 2003 - 7:40 am UTC

it was said? where?


A cluster allows you to store rows from many tables together on a single block. All of the data from "DEPT" and "EMP" can be physically stored on a single block.

IOT's do not have that.

They all are just TOOLS -- to be used at the appropriate time. Look at the Oracle data dictionary -- when you create a table (with columns, indexes, constraints, etc) -- we store that in a cluster so that most of the information about a TABLE will be on the same block.

You need to be aware of how each works, what it can and cannot do for you, and so on.



Clustered table experience

Dushan, November 10, 2003 - 10:52 am UTC

Understood; the Oracle data dictionary is an excellent example - changes to the data dictionary are rare (usually), quick response is required, low IO.

Unfortunately, I have seen rather strange table designs involving clusters on two billing systems.
In the first case, weekly CDR tables (big tables) were clustered on the customer code column and joined in a partitioned view. So, no problem getting rid of data for one week; nevertheless, inserts of new CDRs suffered. Everybody believed the billing process was much faster using these clustered tables, until we tested (by accident) the whole process on regular heap tables with regular B-tree indexes. No impact on performance at all!

Second case - different client, different billing system, but the same CDR story. Even worse - a really huge clustered table and a problem with deleting. Just a nightmare!

Of course, I agree with you that this all about TOOLS to be used in proper way.

(Just reading your Effective Oracle by Design, excellent book!)


IOT / hash / heap

Marcio, January 22, 2004 - 9:23 pm UTC

Just to clarify:
Scenario:

DB A

t_source ( id_contract varchar2(17), name_con varchar2(40), ... )

DB B
t_interface ( id_contract varchar2(17), dt_load date )

t_process (id_contract varchar2(17), dt_process date )

t_contract (id varchar2(17) pk, name ..., state check ('A','B','C','D','E'))

t_history (qty_rows_process number(7), qty_files ...)

There is no fk references between tables.

1) Job #1 wakes up --> collects all data from t_source between dt1 and dt2 and inserts into t_interface with dt_load set to null. About 2,000,000 rows at a time -- at a 1h interval (8am - 5pm), we are talking about 20,000,000 rows/day.

2) Another job, #2, wakes up --> inserts into t_process from t_source where dt_load is null, and sets dt_load.

3) Yet another job, #3, wakes up --> selects from t_process, t_contract where id_contract = id and state in ('A','B') -- exports some files with information based on state, inserts stats into t_history and sets dt_process.

I can't touch the tables t_contract and t_source.
How should I create the rest of them (t_interface, t_process and t_history)? IOT / b*tree cluster / hash cluster / heap / other? Please explain...


Tom Kyte
January 23, 2004 - 7:09 am UTC

those structures you mention

o IOT's
o b*tree cluster
o hash cluster

are all about ACCESS -- they are used to optimize your frequent retrievals from the tables you load into.

Hence, a description of a load cannot tell us what sort of structures you should use to load. You have to look at

o how you use the data after it is loaded
o whether locality of data would be useful

if you want examples and discussions on this -- I go over it in both "Expert one on one Oracle" and "Effective Oracle by Design" in some detail.

IOT / hash / heap - Continued

Marcio, January 23, 2004 - 7:38 am UTC

The system above is all of it. As is!
I've found plenty of relevant information about IOTs, b*tree clusters and so on in your book, but I'd like to know your opinion on what type each of the tables in the scenario above should be -- if you don't mind.

Btw - fyi: you still have the warning on the followup page about your trip. "Bear in mind, I am out of the country from Jan 10th ..."

Tom Kyte
January 23, 2004 - 8:12 am UTC

you only load, you never query... hmmm

fastest way to do that is -- don't load. you don't need it i guess. if that is "the system", just don't load -- you never query ;)


there you go.

(there MUST be more to the system, else this system does nothing. If you want to learn what structures would be best -- you need to understand how these structures work

a) server concepts
b) my books

you can read those to gain understanding!)


the scenario above would call for simple "heap" tables.

the missing part is "what really happens to the data later" -- that is where iot/hash clusters/b*tree clusters etc come into play


Load performance...

Scott Watson, March 18, 2004 - 4:50 pm UTC

Tom,

One of the apps I support has a query like this.

select many columns from part_iot_table
where ctry_cd = 'XX' and id = 999;

On average there are 9 rows per id, ctry_cd will always be the same.

I decided to create the table as a partitioned IOT to colocate the data around the id column. However, the team has decided that they also want to query the table like so.

select many columns from part_iot_table
where id = 999;

Now my partition elimination scheme is out the window. I could create a global secondary index on the subj_id; however, I was thinking that maybe a single table hash cluster on id might solve both problems at once.

My problem is that loading the single table hash cluster is so slow that this option does not seem feasible. Hopefully you can share your experience with increasing load performance for these types of objects. So far it has been running for about 8 hours and I have no way to estimate when it will complete.

Since I am loading from the IOT and the data is already sorted by id, I would expect that most blocks would be cached after the first read and not accessed again.

I tried to test the single hash cluster by loading a single partition of 30M rows.

My cluster was created with the following syntax.

create cluster match_data_cluster
( subj_id number(13))
single table
hashkeys 30000000 -- should have used 30M/9 rows per key
size 94
tablespace SCOTT_DONT_USE
pctfree 0
parallel 4

Can you suggest anything I can do to increase the load performance. I am not running in Archive log mode on my test server.

My next option, if I cannot get the load to speed up, would be to try to create a hash partitioned IOT on the id column and add a dummy column using rownum to ensure that I have a primary key. A heap table was out of the question as we sometimes need to look up 5000 ids at a time in the 300M row table and there is too much bouncing around.

I would appreciate your thoughts and comments. I am running 9.2.0.3 on HP 11.

Thanks once again
Scott Watson

Tom Kyte
March 18, 2004 - 5:17 pm UTC

I'm confused -- if they query:

select many columns from part_iot_table
where ctry_cd = 'XX' and id = 999;

select many columns from part_iot_table
where id = 999;


the primary key of the IOT for clustering should be:

(id,ctry_cd,.....anything else you need...)

and the data would be partitioned by ID ranges/hashes/whatever...


so, why doesn't the partitioned IOT work for you?
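For illustration, a minimal sketch of such a partitioned IOT (the column names, the surrogate SEQ column added to make the key unique, and the partition boundaries are all assumptions; range partitioning is shown, hash would be structured similarly):

create table part_iot_table
( id        number,
  ctry_cd   varchar2(2),
  seq       number,            -- hypothetical surrogate to make the key unique
  many_cols varchar2(100),
  constraint part_iot_pk primary key (id, ctry_cd, seq)
)
organization index
partition by range (id)
( partition p1 values less than (500000),
  partition p2 values less than (maxvalue)
)
/

Both "where id = :x" and "where id = :x and ctry_cd = :y" can then use the primary key, and partition elimination on ID still applies.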

More information

Scott Watson, March 19, 2004 - 7:21 am UTC

That was my plan if the hash cluster didn't work. However, I didn't want to give up on the single table hash cluster just yet. I was hoping you might know why the load would be taking so long.



Tom Kyte
March 19, 2004 - 9:47 am UTC

funny -- thought the IOT was the plan but abandoned because you thought it would not work, when it would have to be built with the key in that order in the first place :)



you have size 94, is 94 bytes sufficient to hold 9 rows per hash key?

More info cont..

Scott Watson, March 19, 2004 - 11:38 am UTC

The 94 bytes is the space used per row. I guess I should be using 94*9 instead; an oversight on my part.

The IOT is a great option and it is what I had been recommending. The problem was that they added another query that wasn't suited to how the table was originally built.

So given the new set of requirements, I thought I would build one IOT version and one single table hash cluster and compare the performance difference between the two. ( I would have never thought of using the cluster until I read your new book) Once I saw the terrible performance of the load into the cluster I decided I did something wrong and sought your advice. I did not mean for you to think that I was not considering the IOT.


Thanks



Tom Kyte
March 19, 2004 - 1:55 pm UTC

well, part of the horrid performance would be caused by the massive collisions and chaining brought about by undersizing the cluster by an order of magnitude!

"size 1000"

Follow up on index clusters

Christo Kutrovsky, April 27, 2004 - 4:36 pm UTC

When you have b*tree single table clusters, and you set the size low, i.e. several rows with different cluster keys share the same block, how are the blocks linked when 1 block is not enough to hold the data?

Is it simply "all those cluster keys share that linked list of data blocks" ? Or is it "the next block for cluster key <bla> is this <rowid>" ?

We have a big table (300 million rows, ~120 bytes per row) which has a list_id with ~150,000 distinct values, very skewed. For some list_id values we will have 1 row, for others 1 million rows. I was thinking of using a btree cluster index. I decided I will first try the cluster with 2 columns (list_id, domain_id). I am testing it now, but the problem is with those small list_ids: I would have 10 rows with the same list_id but different domain_ids, and thus they will be spread over 10 data blocks. Initially I decided that I will have 1 cluster key per block, but now I may need to reconsider my decision, and depending on your answers I may just drop the idea.

I am also open to other ideas.

9.2


Tom Kyte
April 28, 2004 - 2:11 pm UTC

well, it is not 100% about size -- it is related more to how the data is inserted.


Lets say you created a key on "X".

You insert 100 records with X=5
then
You insert 100 records with X=6

and assume 75 rows fit on a block.....

Well, the first 75 rows for x=5 would go onto a block (the first x=5 sets the cluster key -> rowid entry) -- and then the next 25 would go onto a chained block.

Then, the first row for x=6 would go onto its own block (discovering that the block it *wanted* to go onto was full) and the next 74 would follow it and then the next 25 would be chained.


Now, if you loaded x=5, x=6, and then 99 x=5's and 99 x=6's -- then x=5 and x=6 would share the same block and both would chain -- so you would have 200 rows on two blocks.


IOTs might be something to look into for locality of data (getting the rows together) without having to "right size" it. That, and you can partition as well.
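A quick way to see how the rows actually landed (a sketch; assumes a clustered table T with cluster key X):

select x,
       dbms_rowid.rowid_block_number(rowid) blk,
       count(*) cnt
  from t
 group by x, dbms_rowid.rowid_block_number(rowid)
 order by x, blk;

One block per key (or a short chain of blocks) is what you are hoping to see.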

clustered materialized view on heap table

Harry, January 08, 2005 - 10:01 pm UTC

Is it possible to create a clustered MV for queries and use the heap table for DML?

Tom Kyte
January 09, 2005 - 11:32 am UTC

well, as long as you understand that the DML applied to the heap table is ultimately also applied to the MV... sure, the MV can have very different structural qualities from the parent table.


ops$tkyte@ORA9IR2> create cluster single_table_hash_cluster
  2  ( hash_key number(4) )
  3  hashkeys 50000
  4  size 45
  5  single table
  6  /
 
Cluster created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table emp as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2> alter table emp add constraint emp_pk primary key(empno);
 
Table altered.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create materialized view emp_mv
  2  cluster single_table_hash_cluster(empno)
  3  as
  4  select * from emp;
 
Materialized view created.
 
 

HASHKEYS

Jim, February 10, 2005 - 8:18 pm UTC

Is there a way to change the HASHKEYS parameter value after the cluster is created and populated? I was wondering whether that could help to avoid "collisions". Thanks.

Tom Kyte
February 11, 2005 - 7:53 pm UTC

hashkeys is used to size the cluster initially and set up the hash function to map the hash key to a DBA (data block address)

they take (hashkeys * size / block size) and allocate that much space.

so, changing hashkeys would require a rebuild.
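For illustration, a sketch of such a rebuild (all names and values are assumptions):

create cluster hash_cluster_big
( id number )
single table
hashkeys 150000    -- the new, larger value
size 125
/

create table t_new
cluster hash_cluster_big(id)
as
select * from t
/

-- then drop the old table and cluster, and rename T_NEW to T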

Loading data

Peter Tran, February 16, 2005 - 2:48 pm UTC

Tom,

In your above example with X=5 and X=6, are you suggesting that we should try to load the data with the same clustered key at the same time to get the most benefit?

If this is not possible, then would using a cluster still make sense?

Thanks,
-Peter

Tom Kyte
February 16, 2005 - 3:04 pm UTC

if you want the most perfect cluster ever -- yes, you would attempt to load "in order"

However, if they do arrive out of order -- but the cluster was sized for it (the keys and size were "right") -- the cluster would still be beneficial.

In the example above, with x=5/6, we had sized the cluster to allow for multiple x= values to be on a block -- and you load MORE than fits; that is when the problems come into play. (So it was a wrongly sized cluster that led to the deterioration because of the out of order arrival. Had the cluster been sized properly -- it would not have deteriorated -- it might waste space though.)
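A sketch of such an ordered load (the names are assumptions; note this is a conventional-path insert, since direct path loads are not supported for clusters):

insert into t_clustered
select *
  from staging_table
 order by cluster_key;   -- rows sharing a key arrive together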

Big tables or many reads?

Flavio Casetta, March 29, 2005 - 8:29 am UTC

Hi Tom,
I have got an application where there are some tables storing geographical data like cities, regions, states and so on. Primary keys are numbers generated by sequences and a typical "where" clause contains only equality operators against primary keys, like for instance "where region_id = :1 and location_id = :2 ".
This kind of data doesn't change over time; indeed, it has been loaded once for good, therefore the correct size of the hash key could be easily calculated. These tables aren't big, say the biggest will contain 20,000 rows, while the smallest is only 50 rows, but they are likely to be queried at all times.
Are they a candidate for some sort of multi-table cluster because, in the end, the fewer blocks you read, the better, or is it a completely useless exercise because it would be much better to cache the smaller ones directly and let an ordinary unique index scan do the rest of the job?

Thanks and bye,
Flavio

PS: great books, I love them!

Tom Kyte
March 29, 2005 - 10:24 am UTC

are you joining these tables and do you query them by the same key? Doesn't look that way.

for example,

emp( ..., DEPTNO )
dept( DEPTNO, .... )

create them in a multi-table cluster by DEPTNO and

from emp, dept where emp.deptno = dept.deptno and dept.deptno = :x;

might benefit.

Yes, tables are joined by the same keys

A reader, March 29, 2005 - 10:43 am UTC


Yes Tom,
I forgot to include the joins...

region_id is on the "provinces" table and is a foreign key to the "regions" table. Province_id is on the "locations" table and is a foreign key to the "provinces" table.
This is a snowflake sort of thing.

So, when I query the tables there are always the joins between the tables in the where clause.

Bye,
Flavio

Tom Kyte
March 29, 2005 - 11:07 am UTC

right but they don't use the same keys do they?

lets see a sample query (small)

here is the actual query

Flavio Casetta, March 29, 2005 - 11:35 am UTC

It's an HTMLDB based application query.

select i.item, i.ean, i.id, i.unit, i.scale, i.category_name, l.net_price - l.discount price, l.net_price, l.last_date, l.store_name, l.store_group_id,
l.location_id, s.province_id, p.region_id
from items_list i, last_prices l, locations s, provinces p
where i.id = l.item_id
and l.location_id = s.id
and s.province_id = p.id
and (l.store_id = :P120_STORE_ID or :P120_STORE_ID = 0)
and (l.location_id = :P120_LOCATION_ID or :P120_LOCATION_ID = 0)
and (s.province_id = :P120_PROVINCE_ID or :P120_PROVINCE_ID = 0)
and (p.region_id = :P120_REGION_ID or :P120_REGION_ID = 0)
and (l.store_group_id = :P120_STORE_GROUP_ID or :P120_STORE_GROUP_ID = 0)
and (i.category_id = :P120_CATEGORY_ID or :P120_CATEGORY_ID = 0)


Tom Kyte
March 29, 2005 - 12:27 pm UTC

"I" and "L" go together by ID/ITEM_ID

but "L" and "S" by location_id/id
and "S" and "P" by province_id/id

I don't see how you would be able to cluster them all together since they join on different keys.

If you cluster "I" and "L" by ID/ITEM_ID, then "S" could not be clustered with "L", since "L" would have to be clustered by location_id...

and so on.


that "or :Pxxx_yyyy_zzz = 0" troubles me, what is that for?

explanation

Flavio Casetta, March 29, 2005 - 4:45 pm UTC

Tom,
It means any region or any location and so on.
When I first wrote the clause instead of "OR :Pxxx_yyy_zzz = 0" there was "OR :Pxxx_yyy_zzz is null" because the logic was that a null value should mean any value.
Then I found out that HTMLDB is a bit naive in the way it represents null values in drop-down lists: a null value is actually represented by the string '%null%' if you don't specify something else, but only after submitting the page.
So you end up with a query that needs to check if a value is null or if it is equal to '%null%'. That's why I preferred to use my own "null" value and I opted for 0.

Back to clusters, what you say is also my understanding, but do you see any advantage in clustering together just the two biggest tables (locations and provinces)?

The other two tables are not eligible I guess, one is a view containing OLAP functions.

Thanks!
Flavio

Tom Kyte
March 29, 2005 - 7:46 pm UTC

I would construct the query based on the inputs then. Use a "plsql function returning a query" -- build the query with the where clauses you need.
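For illustration, a minimal sketch of that approach, trimmed to two of the page inputs (the function name is an assumption; in HTMLDB this would feed a "PL/SQL function body returning SQL query" report region):

create or replace function price_query
( p_store_id    in number,
  p_location_id in number
) return varchar2
as
    l_query varchar2(4000) := 'select l.* from last_prices l where 1 = 1';
begin
    -- append only the predicates the user actually supplied, still using
    -- bind variables so the finite set of cursors remains shareable
    if p_store_id <> 0
    then
        l_query := l_query || ' and l.store_id = :P120_STORE_ID';
    end if;
    if p_location_id <> 0
    then
        l_query := l_query || ' and l.location_id = :P120_LOCATION_ID';
    end if;
    return l_query;
end;
/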


as for the clustering, do you have a physical IO issue now? The cluster would possibly help decrease that.

and the 11th commandment ?

Flavio, March 30, 2005 - 3:38 am UTC

Tom,
if I understand correctly, you suggest fine-tuning the WHERE clause, omitting some ORs, instead of having a single sql statement using bind variables.
Isn't that a breach of the 11th commandment????
"Don't clutter up the shared pool with unrecyclable sql statements" (Kyte, 17, 11)

;-)

Now, seriously, what is the exact risk behind using that kind of boolean logic?

Currently there are no I/O issues, but, as you teach, the sooner you tackle possible bottlenecks, the better...

Bye,
Flavio

Tom Kyte
March 30, 2005 - 7:20 am UTC

they are reusable, you will have a very finite set of fully shareable cursors with the best plans possible.




tradeoffs

Flavio Casetta, March 30, 2005 - 8:05 am UTC

that's certainly true Tom, I made a comparison between the execution plans and what you say is perfectly reasonable.
However this leads me to ask how and if you normally weigh the tradeoffs between higher performance and coding readability.
Are there situations where you privilege the latter or do you always favor performance, even if there is only a slight difference?
In other words, where is the turning point, if any?

Bye,
Flavio

Tom Kyte
March 30, 2005 - 9:07 am UTC

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:12691263416650#36755424747836 <code>

shows my advanced search page query block from HTMLDB, building a simple query based on inputs is a fairly linear process -- the if statements don't seem to get too much in the way of readability.

Maintainable code is extremely important and performance is always a consideration, I don't think either has to be traded off necessarily.


thanks!

Flavio Casetta, March 30, 2005 - 3:56 pm UTC

I'll modify the query type then and see the results.

Thanks for your time and patience!

Bye,
Flavio

Pls. clarify

A reader, July 12, 2005 - 11:26 pm UTC

Tom,

At the start of the discussion you mention that a primary key lookup with a hash cluster will require only 1 I/O. However, subsequently you mention that 50 could be in chained blocks also. How can it be possible if that column is the primary key?

Please clarify

Tom Kyte
July 13, 2005 - 10:59 am UTC

"50"?

not sure what you mean

RE : Hash Cluster

A reader, July 13, 2005 - 4:52 pm UTC

Tom,

Sorry for not being very clear. The "50" in my question refers to the value that we look up against. You mention that we need to keep looking in the chained list (once the data block address of the starting block is determined using the hashing algorithm) because there is a possibility the value of 50 could also be in another block in the same list of blocks. But the initial thread says these are primary keys. How can 50 be in another block in the chain?



Tom Kyte
July 13, 2005 - 4:59 pm UTC

Yes, if your hash cluster is sized improperly and too many rows hash to the same block - then Oracle will chain the blocks together.

However, you can have the same issue with migrated rows.

A properly sized hash cluster for a lookup table however will give you pretty much a SINGLE IO for a keyed lookup

as compared to an index which will give you 3 or so IO's in the index and 1 IO against the table.

Howard J. Rogers, July 13, 2005 - 9:29 pm UTC

I might just add that in 10g we get a SORTED hash cluster, so that example of Tom's to which you refer (I think it was from 2003, so not surprising if Tom has forgotten about it!) would not have to take place, even if the hash cluster was not clustered on the primary key.

If there were a number of rows with values 50, 55 and 60, then we have to search all possible hash buckets in the chain, even for the 50 and 55 values, because they might be hiding anywhere in the chain. But if you sort the values, then all the 50s would be together, all the 55s would be together and so forth... at which point, as soon as you find your first 55, you know there cannot be any more 50s, and you can stop searching.

I like sorted hash clusters a lot!

The point from Tom's original example was that in an unsorted hash cluster, you know that 50, 55, 60, 65 and so on are all going to be found in the one hash bucket (and that 51, 56, 61, 66 would all be in another hash bucket). But you don't know where in the bucket they are, and so you search until you find the one you want. I can't recall if the optimiser understood that the hash key was a primary key, and therefore having found one value of 50 it could assume there would be no others, but I don't think that's the main point even so: in the absence of a sort within the cluster, you cannot know that 50 will be in the first part of the particular hash bucket, and therefore *in principle*, hash collisions give rise to extra I/O.


Tom Kyte
July 14, 2005 - 10:23 am UTC

sorted hash clusters are good for a fifo queue, but they break down seriously fast if the data arrives out of order. Good therefore for stock quotes in hindsight (I'd set the size of the data associated with a key to a block, so you would get a block per key in that case)
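For reference, a sorted hash cluster sketch modeled on the 10g documentation example (the names and sizes are illustrative):

create cluster call_detail_cluster
( telephone_number number,
  call_timestamp   number sort )
hashkeys 10000
hash is telephone_number
size 256
/

Rows for a given TELEPHONE_NUMBER are then returned ordered by CALL_TIMESTAMP without a sort.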

view on cluster segments

reader, July 27, 2005 - 1:26 pm UTC

(1) Can one create a view on cluster segments? Any restrictions?
(2) Can one implement FGA control on clustered tables?
Thanks.

Tom Kyte
July 27, 2005 - 2:46 pm UTC

one creates views of tables (and other views)

a table can be in a cluster.

a cluster is not queryable directly, you need to put a table in it.

So, it doesn't really make sense to create a view of a cluster segment (not a table), but tables in clusters -- sure, your databases data dictionary is chock full of examples of that!

you can implement FGA (fine grained auditing) on tables, yes. Even if you meant FGAC (fine grained access control), you can do that on tables and views.

Performance

Anil, October 17, 2005 - 6:45 pm UTC

Hi tom,

If we create a cluster having m tables ("key1" as pk) on a cluster key, say "key1", then selects on these will be faster than joining all these tables. But then:
1. will insertion/update be slow?
2. might LIOs be high, depending on the query you fire?
3. if it's an OLTP env, will there be any contention for latches in memory while doing select/DML operations?

Thankx
Anil


Tom Kyte
October 18, 2005 - 8:25 am UTC

the select might be faster - what you have done is make it so that the related data is stored together - so one physical IO might get all of the data you need, whereas if the data were stored all over the place - it might take many more physical IO's to get the data together.

1) modifications are penalized, to the possible benefit of data retrieval, yes.

2) no, you would expect LIO's to stay the same or more likely - DECREASE. The data is stored on a single block (or a small set of blocks). We don't have to go to 50 blocks to get 50 rows - we go to one block to get the 50 rows.

3) the only answer that can be given is one of three. Yes, No, Maybe - eg: it depends.

if a single session tends to work on "KEY1=X" by itself - you'll likely reduce contention (as all of the key1=x data is together - and key1=y isn't on the same block in general - you've segregated the data out)

but that is not assured.

performance

anil, October 18, 2005 - 10:27 am UTC

Hi tom,

Thankx for the quick turnaround.

Regarding the LIOs: if the queries being fired are somewhat like this
select * from a;
select * from b;
select * from c;
etc.
and a, b, c are part of a cluster, then in this case won't the LIOs be higher, as the data is spread across multiple blocks?

Thankx,
ANIL





Tom Kyte
October 18, 2005 - 10:44 am UTC

no, that is not true.


select * from a where key1 = 'x'
select * from b where key1 = 'x'
select * from c where key1 = 'x'

could incur less LIO/PIO if a, b, c are clustered on the key1 column, since all of the x values would be near each other, clustered.


just "select * from a", however, will incur the same LIO/PIO regardless of clustering; you are getting ALL of the rows.


performance

Anil, October 18, 2005 - 2:21 pm UTC

tom,

Please help me investigate why the LIOs and some wait events are on the higher side (like db file sequential read, log file sync and latch free).


STATSPACK report for

DB Name DB Id Instance Inst Num Release OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
MRDP 354944511 MRDP 1 8.1.7.4.0 NO mrdutpc1pr

Snap Id Snap Time Sessions
------- ------------------ --------
Begin Snap: 58 14-Oct-05 10:46:08 512
End Snap: 59 14-Oct-05 11:49:20 512
Elapsed: 63.20 (mins)

Cache Sizes
~~~~~~~~~~~
db_block_buffers: 180000 log_buffer: 5242880
db_block_size: 8192 shared_pool_size: 350000000

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 84,660.44 3,760.70
Logical reads: 4,542.45 201.78
Block changes: 476.59 21.17
Physical reads: 69.20 3.07
Physical writes: 15.38 0.68
User calls: 543.61 24.15
Parses: 218.49 9.71
Hard parses: 16.06 0.71
Sorts: 4.40 0.20
Logons: 1.35 0.06
Executes: 359.59 15.97
Transactions: 22.51

% Blocks changed per Read: 10.49 Recursive Call %: 50.75
Rollback per transaction %: 11.52 Rows per Sort: 11.23

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.98 Redo NoWait %: 100.00
Buffer Hit %: 98.48 In-memory Sort %: 99.99
Library Hit %: 97.84 Soft Parse %: 92.65
Execute to Parse %: 39.24 Latch Hit %: 99.11
Parse CPU to Parse Elapsd %: 80.91 % Non-Parse CPU: 88.36

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 39.64 83.23
% SQL with executions>1: 7.69 5.97
% Memory for SQL w/exec>1: 10.83 9.05

Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
db file sequential read 193,827 165,114 67.98
log file sync 87,051 22,982 9.46
SQL*Net break/reset to client 141,005 16,198 6.67
latch free 155,806 11,999 4.94
log file parallel write 74,366 9,773 4.02
-------------------------------------------------------------
Wait Events for DB: MRDP Instance: MRDP Snaps: 58 -59
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (cs) (ms) /txn
---------------------------- ------------ ---------- ----------- ------ ------
db file sequential read 193,827 0 165,114 9 2.3
log file sync 87,051 2 22,982 3 1.0
SQL*Net break/reset to clien 141,005 9 16,198 1 1.7
latch free 155,806 86,079 11,999 1 1.8
log file parallel write 74,366 0 9,773 1 0.9
SQL*Net more data to client 260,636 23 7,958 0 3.1
db file scattered read 7,732 0 3,469 4 0.1
db file parallel write 6,948 0 2,839 4 0.1
log file sequential read 4,068 0 995 2 0.0
buffer busy waits 3,014 0 626 2 0.0
enqueue 2,047 0 356 2 0.0
control file parallel write 1,368 0 341 2 0.0
direct path read 1,927 0 64 0 0.0
direct path write (lob) 440 0 59 1 0.0
file open 3,636 0 50 0 0.0
direct path read (lob) 333 0 28 1 0.0
LGWR wait for redo copy 918 0 8 0 0.0
log file switch completion 3 0 7 23 0.0
buffer deadlock 29 29 3 1 0.0
direct path write 560 0 2 0 0.0
control file sequential read 349 0 2 0 0.0
file identify 14 0 1 1 0.0
library cache pin 12 0 1 1 0.0
refresh controlfile command 10 0 0 0 0.0
log file single write 4 0 0 0 0.0
library cache load lock 2 0 0 0 0.0
virtual circuit status 1,272,864 83 6,374,114 50 14.9
SQL*Net message from client 1,594,984 0 71,694 0 18.7
SQL*Net more data from clien 16,944 0 15,379 9 0.2
SQL*Net message to client 1,594,984 0 1,067 0 18.7

thanks a lot,
Anil


Single-table hash clusters

A reader, January 12, 2006 - 5:39 am UTC

Hi Tom,
I've got a question regarding single-table hash clusters. If your table primary key is a varchar2 type column, Oracle doesn't allow you to specify it in the 'HASH IS' part, since that only takes a NUMBER type. Is there any negative impact in this case of using the internal hashing function? What is the reasoning behind only allowing you to specify a NUMBER type for the Hash function?
Thanks

Tom Kyte
January 12, 2006 - 11:01 am UTC

ops$tkyte@ORA10GR2> create cluster Hash_Cluster
  2  ( id varchar2(200) )
  3  SINGLE TABLE
  4  hashkeys 75000   -- number of distinct lookups we expect over time
  5  size 125         -- size of data associated with key
  6  /

Cluster created.


not sure what you are trying to do exactly -- have an example?

Single table hash cluster - more info

A reader, January 16, 2006 - 4:15 am UTC

Hi Tom,
  This is what I'm trying to do:

SQL> create cluster hash_cluster
  2  (id varchar2(2000)) 
  3  single table
  4  hashkeys 200
  5  hash is id 
  6  size 125;
create cluster hash_cluster
*
ERROR at line 1:
ORA-02456: The HASH IS column specification must be NUMBER(*,0)


  I was trying to use the cluster key as the hash function.  But Oracle doesn't like it since it is a varchar2 column.   

Tom Kyte
January 16, 2006 - 9:37 am UTC

Right - you need to apply some function to your string resulting in some number, eg:

ops$tkyte@ORA10GR2> create cluster hash_cluster
  2  (id varchar2(2000))
  3  single table
  4  hashkeys 200
  5  hash is TO_NUMBER(id)
  6  size 125;

Cluster created.

It need not be TO_NUMBER, just some function that takes your inputs and hashes them - returning a number.

but in general, you don't even need the "hash is", we'll hash your string into a number for you. 

Thanks for the info.

A reader, January 16, 2006 - 11:00 am UTC

Hi Tom,
Thanks for the quick response. I do realise that if you don't specify the 'HASH IS' clause, Oracle uses its own hashing function.

My question was how different will this be in terms of performance, if any? The manual says

'If no HASH IS clause is specified, the database uses an internal hash function. If the cluster key is already a unique identifier that is uniformly distributed over its range, you can bypass the internal hash function and specify the cluster key as the hash value, as is the case in the preceding example. You can also use the HASH IS clause to specify a user-defined hash function.'

I was just curious if there would be any performance difference in using the different hash functions (internal, user-defined and just using the cluster key)
Regards

Tom Kyte
January 16, 2006 - 12:49 pm UTC

depends - can you efficiently hash your string, more so than we can? If so, go ahead and write your own hash - but just remember, it needs to return a NUMBER.

You can "hash is" your string, with the rule that your hash function returns a NUMBER.

How can I set the right size for a Cluster (hash/btree)???

Jairo Ojeda, February 10, 2006 - 1:01 pm UTC

This is the first time I will use clustered tables and I don't know how to set the right size,
whether using a single table hash cluster, a multi-table hash cluster or a b-tree cluster.
Can you tell me how to calculate that size?
Thanks,


Tom Kyte
February 10, 2006 - 1:27 pm UTC

do you have access to my books where I cover this in great detail?

the size is simply the sum of the bytes associated with each key.

If you hash by "id", the amount of data associated with each id - that is the size.
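A rough way to derive SIZE from an existing heap table (a sketch; assumes a table T keyed by ID with current statistics):

select round( t.avg_row_len * k.avg_rows_per_key ) suggested_size
  from ( select avg(cnt) avg_rows_per_key
           from ( select id, count(*) cnt
                    from t
                   group by id )
       ) k,
       ( select avg_row_len
           from user_tables
          where table_name = 'T'
       ) t;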

Cluster collision

Jairo Ojeda, February 14, 2006 - 10:10 am UTC

Thanks for your comments about cluster size! But now I want to know how to check whether I am getting cluster collisions.

Tom Kyte
February 14, 2006 - 11:12 am UTC

read out each key value (assuming the hash key is unique).

I would expect one LIO per retrieval. For example, this is a quick demo I've used:

create cluster Hash_Cluster
( id number )
SINGLE TABLE
hashkeys 75000   -- number of distinct lookups we expect over time
size 125         -- size of data associated with key
/
pause

clear screen
create table heap_lookup
( OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
  DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME,
  TIMESTAMP, STATUS, TEMPORARY, GENERATED, SECONDARY,
  primary key(object_id)
)
as
select * from all_objects;

create table hash_lookup
( OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
  DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME,
  TIMESTAMP, STATUS, TEMPORARY, GENERATED, SECONDARY,
  primary key(object_id)
)
cluster hash_cluster(object_id)
as
select * from heap_lookup;
pause
clear screen
begin
    dbms_stats.gather_table_stats
    ( user, 'HEAP_LOOKUP', cascade=>true );
    dbms_stats.gather_table_stats
    ( user, 'HASH_LOOKUP', cascade=>true );
end;
/
pause

clear screen
declare
    type array is table of number;
    l_object_ids array;
    l_rec        heap_lookup%rowtype;
begin
    select object_id bulk collect into l_object_ids
      from heap_lookup;

    execute immediate q'|
    alter session set events '10046 trace name context forever, level 12'
    |';

    for k in 1 .. l_object_ids.count
    loop
        select * into l_rec
          from heap_lookup
         where object_id = l_object_ids(k);
        select * into l_rec
          from hash_lookup
         where object_id = l_object_ids(k);
    end loop;
end;
/



In the tkprof output, I expect something like this:

SELECT * FROM HEAP_LOOKUP WHERE OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total    99115      7.96       7.96          0     148671          0      49557

Rows Row Source Operation
------- ---------------------------------------------------
49557 TABLE ACCESS BY INDEX ROWID HEAP_LOOKUP (cr=148671 pr=0 pw=0 time=1798682 us)
49557 INDEX UNIQUE SCAN SYS_C006949 (cr=99114 pr=0 pw=0 time=959911 us)(object id 57938)
********************************************************************************
SELECT * FROM HASH_LOOKUP WHERE OBJECT_ID = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total    99115      7.88       7.93          0      49559          0      49557

Rows Row Source Operation
------- ---------------------------------------------------
49557 TABLE ACCESS HASH HASH_LOOKUP (cr=49557 pr=0 pw=0 time=1352434 us)



the heap table did the io's via the INDEX and then into the table, the number of IO's against the table = number of rows retrieved.

the hash clustered table did the IO's directly - number of IO's against the table exactly and precisely equal the number of rows retrieved. "best case".


You'll always get "collisions" (they are expected, we get more than one row per block), you want to know if it is taking more IO than necessary to get your data.

In this case, the answer is "no", but if we lowered the cluster size - eventually the answer would be "yes".

Cluster Table

Arindam Mukherjee, August 18, 2006 - 9:08 am UTC

Respected Mr. Tom,

I have read the following lines from Oracle 9i Database Concepts, Chapter 10, ten times and sorry, I could not understand the following two (2) sentences. If possible, could you kindly explain them to me with an example?

Sentence -1
--------------
Storage parameters cannot be specified when a clustered table is created or altered.

Sentence -2
--------------
The storage parameters set for the cluster always control the storage of all tables in the cluster.

It's true; I am not very familiar with clustered tables.


Tom Kyte
August 18, 2006 - 10:25 am UTC

when you create a CLUSTER - you are creating a segment, the segment (the storage) is the CLUSTER.

when you create a table in a cluster, you do not create a segment, you are just creating a table.  the segment (the storage) is the cluster segment, not the table.


So, when you create a table in a cluster (also known as a clustered table), you cannot specify storage parameters like initial, next and so on because the TABLE doesn't control the storage, the cluster itself does.  And the table just resides in the cluster segment.


Here is a short writeup from Expert Oracle database Architecture on this:

<quote>
So, let's look at how we might go about creating a clustered object. Creating a cluster of tables in the object is straightforward. The definition of the storage of the object (PCTFREE, PCTUSED, INITIAL, and so on) is associated with the CLUSTER, not the tables. This makes sense since there will be many tables in the cluster, and they will be on the same block. Having different PCTFREEs would not make sense. Therefore, a CREATE CLUSTER looks a lot like a CREATE TABLE with a small number of columns (just the cluster key columns):

ops$tkyte@ORA10GR1> create cluster emp_dept_cluster
  2  ( deptno number(2) )
  3  size 1024
  4  /
Cluster created.

Here, we have created an index cluster (the other type being a hash cluster, which we'll look at in the next section). The clustering column for this cluster will be the DEPTNO column. The columns in the tables do not have to be called DEPTNO, but they must be NUMBER(2), to match this definition. We have, on the cluster definition, a SIZE 1024 option. This is used to tell Oracle that we expect about 1,024 bytes of data to be associated with each cluster key value. Oracle will use that to compute the maximum number of cluster keys that could fit per block. Given that we have an 8KB block size, Oracle will fit up to seven cluster keys (but maybe less if the data is larger than expected) per database block. That is, the data for departments 10, 20, 30, 40, 50, 60, and 70 would tend to go onto one block, and as soon as we insert department 80, a new block will be used. That does not mean that the data is stored in a sorted manner; it just means that if we inserted the departments in that order, they would naturally tend to be put together. If we inserted the departments in the order 10, 80, 20, 30, 40, 50, 60, and then 70, the final department, 70, would tend to be on the newly added block. As we'll see below, both the size of the data and the order in which the data is inserted will affect the number of keys we can store per block.

The SIZE parameter therefore controls the maximum number of cluster keys per block. It is the single largest influence on the space utilization of our cluster. Set the size too high, and we'll get very few keys per block and we'll use more space than we need. Set the size too low, and we'll get excessive chaining of data, which offsets the purpose of the cluster to store all of the data together on a single block. It is the most important parameter for a cluster.

Now for the cluster index on our cluster. We need to index the cluster before we can put data in it. We could create tables in the cluster right now, but we're going to create and populate the tables simultaneously, and we need a cluster index before we can have any data. The cluster index's job is to take a cluster key value and return the block address of the block that contains that key. It is a primary key in effect, where each cluster key value points to a single block in the cluster itself. So, when we ask for the data in department 10, Oracle will read the cluster key, determine the block address for that, and then read the data. The cluster key index is created as follows:

ops$tkyte@ORA10GR1> create index emp_dept_cluster_idx
  2  on cluster emp_dept_cluster
  3  /
Index created.

It can have all of the normal storage parameters of an index and can be stored in another tablespace. It is just a regular index, so it can be on multiple columns; it just happens to index into a cluster and can also include an entry for a completely null value (see Chapter 11 for the reason why this is interesting to note). Note that we do not specify a list of columns in this CREATE INDEX statement—that is derived from the CLUSTER definition itself. Now we are ready to create our tables in the cluster:

ops$tkyte@ORA10GR1> create table dept
  2  ( deptno number(2) primary key,
  3    dname  varchar2(14),
  4    loc    varchar2(13)
  5  )
  6  cluster emp_dept_cluster(deptno)
  7  /
Table created.
 
ops$tkyte@ORA10GR1> create table emp
  2  ( empno    number primary key,
  3    ename    varchar2(10),
  4    job      varchar2(9),
  5    mgr      number,
  6    hiredate date,
  7    sal      number,
  8    comm     number,
  9    deptno number(2) references dept(deptno)
 10  )
 11  cluster emp_dept_cluster(deptno)
 12  /
Table created.

Here, the only difference from a “normal” table is that we used the CLUSTER keyword and told Oracle which column of the base table will map to the cluster key in the cluster itself. Remember, the cluster is the segment here, therefore this table will never have segment attributes such as TABLESPACE, PCTFREE, and so on—they are attributes of the cluster segment, not the table we just created. 
</quote> 

IOT with sys_guid() values

A reader, November 24, 2006 - 7:00 am UTC

Hi Tom -

Can you explain how an IOT with a primary key populated by sys_guid() would work?

Just not sure how the IOT would determine what order to put RAW values like 6FB56BAF0F5140E184C5E9FDA101389F, B3457FA7049F4C41A48A6E250982CEF6, 8DC7D39212FD4BA390A53A0925D47D7D, . . .

How does it determine if 6FB56BAF0F5140E184C5E9FDA101389F should be stored before or after the value B3457FA7049F4C41A48A6E250982CEF6 ???

Thanks for your advice

Tom Kyte
November 24, 2006 - 6:41 pm UTC

the same as an IOT populated by anything.

the rows are stored in sorted order by primary key.

So, is 6F > B3 or is 6F < B3? It just compares bytes (that is the rule of RAW)
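A quick demo of that byte-by-byte ordering, using the two values from the question:

select guid
  from ( select hextoraw('6FB56BAF0F5140E184C5E9FDA101389F') guid from dual
         union all
         select hextoraw('B3457FA7049F4C41A48A6E250982CEF6') from dual )
 order by guid;

This returns the 6F... value first, because the first byte 0x6F is less than 0xB3.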

Too many consistent gets in cluster access

Lino Bort, March 21, 2007 - 12:12 pm UTC

Hi Tom,

I am considering the use of a cluster to store together data from 5 tables that are joined frequently.

I have set up the following test:

create cluster simple_cluster (id number) size 128;

create index idx_simple_cluster on cluster simple_cluster;

create table t1 (id number primary key, a varchar2(40)) cluster simple_cluster (id);

create table t2 (id number primary key, b date) cluster simple_cluster (id);

create table ht1 (id number primary key, a varchar2(40));

create table ht2 (id number primary key, b date);

insert
when (1=1) then
  into t1 values (id,a)
when (1=1) then
  into t2 values (id,b)
select object_id id, object_name a, created b from all_objects where rownum <= 10000;

commit;

insert into ht1 select * from t1;

insert into ht2 select * from t2;

commit;

analyze cluster simple_cluster compute statistics;

analyze index idx_simple_cluster compute statistics;

analyze table t1 compute statistics for table for all indexes for all indexed columns;

analyze table t2  compute statistics for table for all indexes for all indexed columns;

analyze table ht1 compute statistics for table for all indexes for all indexed columns;

analyze table ht2 compute statistics for table for all indexes for all indexed columns;

analyze index idx_simple_cluster validate structure;

select lf_blks, br_blks, height, distinct_keys from index_stats;

   LF_BLKS    BR_BLKS     HEIGHT DISTINCT_KEYS
---------- ---------- ---------- -------------
        27          1          2         10000

select dbms_rowid.rowid_block_number(t1.rowid) t1_block,
       dbms_rowid.rowid_block_number(t2.rowid) t2_block
from t1, t2
where t1.id = t2.id
  and t1.id = 258;

  T1_BLOCK   T2_BLOCK
---------- ----------
     22085      22085

set autotrace traceonly

select *
from t1, t2
where t1.id = t2.id
  and t1.id = 258;

...
--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |    38 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |             |     1 |    38 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T2          |     1 |    11 |     2   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C006814 |     1 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1          |     1 |    27 |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C006813 |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
...

select *
from ht1, ht2
where ht1.id = ht2.id
  and ht1.id = 258;
...
--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |    39 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |             |     1 |    39 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| HT2         |     1 |    13 |     2   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C006816 |     1 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| HT1         |     1 |    26 |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN         | SYS_C006815 |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
...

alter index idx_simple_cluster rebuild;

analyze index idx_simple_cluster validate structure;

set autotrace off

select lf_blks, br_blks, height, distinct_keys from index_stats;

   LF_BLKS    BR_BLKS     HEIGHT DISTINCT_KEYS
---------- ---------- ---------- -------------
        24          1          2         10000

set autotrace traceonly

select *
from t1, t2
where t1.id = t2.id
  and t1.id = 258;
/
...
skipped the first plan
...
--------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |     1 |    38 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS         |                    |     1 |    38 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS CLUSTER| T2                 |     1 |    11 |     2   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN  | IDX_SIMPLE_CLUSTER |     1 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS CLUSTER| T1                 |     1 |    27 |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN  | IDX_SIMPLE_CLUSTER |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
...
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
...



The cluster index has a height of 2. Therefore, with 3 blocks of I/O, in theory, we can get the data from both tables we are looking for. How is it possible that it takes 8 consistent gets?

Thanks in advance,
Tom Kyte
March 21, 2007 - 12:25 pm UTC

tkprof it. with extended statistics, you'll see the IO's performed step by step.
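
For example (a minimal sketch - the trace file name here is made up; yours will be in user_dump_dest):

alter session set events '10046 trace name context forever, level 12';

select *
from t1, t2
where t1.id = t2.id
  and t1.id = 258;

alter session set events '10046 trace name context off';

-- then, from the OS prompt:
-- tkprof orcl_ora_12345.trc report.txt sys=no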

Statistics for cluster access path

Lino Bort, March 21, 2007 - 1:16 pm UTC

Hi Tom,

I have obtained the following statistics from a trace file:

STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='NESTED LOOPS  (cr=8 pr=0 pw=0 time=1440 us)'
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=50728 op='TABLE ACCESS CLUSTER T2 (cr=4 pr=0 pw=0 time=1362 us)'
STAT #1 id=3 cnt=1 pid=2 pos=1 obj=50725 op='INDEX UNIQUE SCAN IDX_SIMPLE_CLUSTER (cr=2 pr=0 pw=0 time=635 us)'
STAT #1 id=4 cnt=1 pid=1 pos=2 obj=50726 op='TABLE ACCESS CLUSTER T1 (cr=4 pr=0 pw=0 time=51 us)'
STAT #1 id=5 cnt=1 pid=4 pos=1 obj=50725 op='INDEX UNIQUE SCAN IDX_SIMPLE_CLUSTER (cr=2 pr=0 pw=0 time=26 us)'


Meaning that I performed 4 consistent gets for the cluster index and another 4 for the cluster access. But, actually, all the data for that cluster key is in one block?

Thanks

Tom Kyte
March 21, 2007 - 7:19 pm UTC

sure it is all in one block - but we got the block once to access T1 and again to access T2.

this is normal.

It put all of the rows for T1 onto as few blocks as possible. Same for T2. And it tries to put T1 and T2 together (to reduce physical IOs)

but we still have to do a "get" for T1 data and a "get" for T2 data.

Clustering made it so that physical IO is reduced (1 IO, get the data) and logical IO is reduced as much as possible.

More tests in my cluster scenario

Lino Bort, March 22, 2007 - 7:02 am UTC

Hi Tom,

I researched my test case a bit more and found something interesting.

I looked at the data distribution:

select table_name, blocks, empty_blocks,  avg_space
from user_tables
where table_name in ('T1','T2','HT1','HT2')
order by table_name;

TABLE_NAME                         BLOCKS EMPTY_BLOCKS  AVG_SPACE
------------------------------ ---------- ------------ ----------
HT1                                   370           14       1694
HT2                                   370           14       1694
T1                                    748           20       1016
T2                                    748           20       1016

select i.table_name, i.index_name, s.blocks
from user_segments s, user_indexes i
where (i.table_name in ('T1','T2','HT1','HT2') or i.index_name = 'IDX_SIMPLE_CLUSTER')
  and s.segment_name = i.index_name
order by 1,2;

TABLE_NAME                     INDEX_NAME                         BLOCKS
------------------------------ ------------------------------ ----------
HT1                            SYS_C006875                           104
HT2                            SYS_C006876                           104
SIMPLE_CLUSTER                 IDX_SIMPLE_CLUSTER                    104
T1                             SYS_C006873                            96
T2                             SYS_C006874                            96


So, the cluster is more or less the same size as the two heap tables together. And we have two indexes for the heap tables versus three for the cluster (one primary key index per table, plus the cluster index).

Depending on the plan, a different index may be used. Therefore, we are using more space with the cluster approach.

I set up another test, based on chapter 7, p391 of your book "Effective Oracle by Design".

********************************************************************************
SELECT T1.ID, T1.A, T2.B
FROM
 T1, T2 WHERE T1.ID = T2.ID AND T1.ID = :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 387590     28.20      22.90          0          0          0           0
Fetch   387590     45.36      31.64        151    1275180          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   775181     73.57      54.55        151    1275180          0      100000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 58     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
 100000  NESTED LOOPS  (cr=1275180 pr=151 pw=0 time=40365241 us)
 100000   TABLE ACCESS CLUSTER T2 (cr=975180 pr=151 pw=0 time=21235320 us)
 100000    INDEX UNIQUE SCAN IDX_SIMPLE_CLUSTER (cr=775180 pr=22 pw=0 time=9269972 us)(object id 50725)
 100000   TABLE ACCESS CLUSTER T1 (cr=300000 pr=0 pw=0 time=7004082 us)
 100000    INDEX UNIQUE SCAN IDX_SIMPLE_CLUSTER (cr=200000 pr=0 pw=0 time=2838852 us)(object id 50725)


********************************************************************************

SELECT HT1.ID, HT1.A, HT2.B
FROM
 HT1, HT2 WHERE HT1.ID = HT2.ID AND HT1.ID = :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 387590     29.00      23.69          0          0          0           0
Fetch   387590     46.24      31.98        130    1175180          0      100000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   775181     75.25      55.68        130    1175180          0      100000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 58     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
 100000  NESTED LOOPS  (cr=1175180 pr=130 pw=0 time=40937034 us)
 100000   TABLE ACCESS BY INDEX ROWID HT2 (cr=875180 pr=54 pw=0 time=21063360 us)
 100000    INDEX UNIQUE SCAN SYS_C006816 (cr=775180 pr=30 pw=0 time=9443417 us)(object id 50733)
 100000   TABLE ACCESS BY INDEX ROWID HT1 (cr=300000 pr=76 pw=0 time=7366452 us)
 100000    INDEX UNIQUE SCAN SYS_C006815 (cr=200000 pr=31 pw=0 time=3148368 us)(object id 50731)



As you can see, with my cluster approach I am doing more physical and logical I/O. With these results, I am discouraged from using clusters in our design.

Thanks,

Altering the cluster key of a record, what happens?

Adam, April 12, 2007 - 9:42 pm UTC

Hello Tom,

If I set up the following example:

create cluster scott.emp_dept_cluster
( deptno number(2) )
tablespace users
size 1024
/

Cluster created.

create index scott.emp_dept_cluster_idx
on cluster scott.emp_dept_cluster
tablespace users
/

Index created.

create table scott.emp
( empno    number primary key,
ename    varchar2(10),
job      varchar2(9),
mgr      number,
hiredate date,
sal      number,
comm     number,
deptno number(2) references scott.dept(deptno)
)
cluster scott.emp_dept_cluster(deptno)
/

Table created.

insert into scott.emp (empno,ename,job,mgr,hiredate,sal,comm,deptno) values (7788,'SCOTT','ANALYST',7566,'19-APR-87',3000,null,20);

1 row inserted.

commit;

update scott.emp set DEPTNO = 30 where ENAME = 'SCOTT';

1 row updated.

commit;

Can one conclude the record for employee Scott has been moved from the block with the cluster key id = 20 to the block with the cluster key id = 30?


Tom Kyte
April 13, 2007 - 1:13 pm UTC

it is more complex than that - it is similar to a migrated row.

the rowid will not change.



ops$tkyte%ORA9IR2> create cluster emp_dept_cluster
  2  ( deptno number(2) )
  3  tablespace users<b>
  4  size 8k -- to make a deptno have a block to itself in my 8k block db</b>
  5  /

Cluster created.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> create index emp_dept_cluster_idx
  2  on cluster emp_dept_cluster
  3  tablespace users
  4  /

Index created.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> create table emp
  2  ( empno    number primary key,
  3  ename    varchar2(10),
  4  job      varchar2(9),
  5  mgr      number,
  6  hiredate date,
  7  sal      number,
  8  comm     number,
  9  deptno number(2)
 10  )
 11  cluster emp_dept_cluster(deptno)
 12  /

Table created.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> insert into emp select * from scott.emp;

14 rows created.

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2> select deptno, ename, rowid, dbms_rowid.rowid_block_number(rowid) from emp order by deptno;

    DEPTNO ENAME      ROWID              DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
---------- ---------- ------------------ ------------------------------------
        10 CLARK      AAAH68AAJAAAA+3AAA                                 4023
        10 KING       AAAH68AAJAAAA+3AAB                                 4023
        10 MILLER     AAAH68AAJAAAA+3AAC                                 4023
        20 SMITH      AAAH68AAJAAAA+4AAA                                 4024
        20 ADAMS      AAAH68AAJAAAA+4AAD                                 4024
        20 JONES      AAAH68AAJAAAA+4AAB                                 4024
        20 SCOTT      AAAH68AAJAAAA+4AAC                                 4024
        20 FORD       AAAH68AAJAAAA+4AAE                                 4024
        30 ALLEN      AAAH68AAJAAAA+0AAA                                 4020
        30 WARD       AAAH68AAJAAAA+0AAB                                 4020
        30 MARTIN     AAAH68AAJAAAA+0AAC                                 4020
        30 BLAKE      AAAH68AAJAAAA+0AAD                                 4020
        30 TURNER     AAAH68AAJAAAA+0AAE                                 4020
        30 JAMES      AAAH68AAJAAAA+0AAF                                 4020

14 rows selected.

ops$tkyte%ORA9IR2> update emp set DEPTNO = 30 where ENAME = 'SCOTT';

1 row updated.

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2> select deptno, ename, rowid, dbms_rowid.rowid_block_number(rowid) from emp order by deptno;

    DEPTNO ENAME      ROWID              DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
---------- ---------- ------------------ ------------------------------------
        10 CLARK      AAAH68AAJAAAA+3AAA                                 4023
        10 KING       AAAH68AAJAAAA+3AAB                                 4023
        10 MILLER     AAAH68AAJAAAA+3AAC                                 4023
        20 SMITH      AAAH68AAJAAAA+4AAA                                 4024
        20 FORD       AAAH68AAJAAAA+4AAE                                 4024
        20 ADAMS      AAAH68AAJAAAA+4AAD                                 4024
        20 JONES      AAAH68AAJAAAA+4AAB                                 4024
        30 ALLEN      AAAH68AAJAAAA+0AAA                                 4020
        30 WARD       AAAH68AAJAAAA+0AAB                                 4020
        30 MARTIN     AAAH68AAJAAAA+0AAC                                 4020
        30 BLAKE      AAAH68AAJAAAA+0AAD                                 4020
        30 TURNER     AAAH68AAJAAAA+0AAE                                 4020
        30 JAMES      AAAH68AAJAAAA+0AAF                                 4020
        30 SCOTT      AAAH68AAJAAAA+4AAC                                 4024

14 rows selected.



So, scott is still on block 4024 (sort of) - it has to keep the rowid, since an index on ENAME for example would have that rowid.

We can see the row migrated - and left the forwarding address behind...

you can analyze the table to see the chained row count goes up - or, if we add an index on ENAME and query:

select * from emp where ename= 'SCOTT';

before and after and measure

a) logical IO's
b) the table fetch continued row statistic

we'd see the IO goes up by one as does the continued row statistic:

ops$tkyte%ORA9IR2> create index ename_idx on emp(ename);

Index created.

ops$tkyte%ORA9IR2> insert into emp select * from scott.emp;

14 rows created.

ops$tkyte%ORA9IR2> select * from emp where ENAME='SCOTT';

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7788 SCOTT      ANALYST         7566 09-DEC-82       3000                    20

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> @mystat continued
ops$tkyte%ORA9IR2> set echo off

NAME                        VALUE
---------------------- ----------
table fetch continued           5
row


ops$tkyte%ORA9IR2> set autotrace traceonly statistics
ops$tkyte%ORA9IR2> select * from emp where ENAME='SCOTT';


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        805  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA9IR2> set autotrace off
ops$tkyte%ORA9IR2> @mystat2
ops$tkyte%ORA9IR2> set echo off

NAME                        VALUE DIFF
---------------------- ---------- ------------------
table fetch continued           5                0
row


ops$tkyte%ORA9IR2> update emp set DEPTNO = 30 where ENAME = 'SCOTT';

1 row updated.

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2> @mystat continued
ops$tkyte%ORA9IR2> set echo off

NAME                        VALUE
---------------------- ----------
table fetch continued           5
row


ops$tkyte%ORA9IR2> set autotrace traceonly statistics
ops$tkyte%ORA9IR2> select * from emp where ENAME='SCOTT';


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        805  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA9IR2> set autotrace off
ops$tkyte%ORA9IR2> @mystat2
ops$tkyte%ORA9IR2> set echo off

NAME                        VALUE DIFF
---------------------- ---------- ------------------
table fetch continued           6                1
row
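
(For reference, @mystat and @mystat2 above are small helper scripts that snapshot v$mystat - this is a sketch of what they might contain, an assumption rather than the exact scripts:)

-- mystat.sql: show a session statistic, remembering its value in &V
set echo off
set verify off
column value new_val V
define S="&1"
select a.name, b.value
  from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
   and lower(a.name) like '%'||lower('&S')||'%'
/
set echo on

-- mystat2.sql: show it again, along with the difference from &V
set echo off
set verify off
select a.name, b.value V, to_char(b.value-&V,'999,999,999,999') diff
  from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
   and lower(a.name) like '%'||lower('&S')||'%'
/
set echo on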


Thank you

A reader, April 13, 2007 - 1:23 pm UTC

Tom,

Thank you for the exceptionally detailed walk through. Professional and to the point as always.

1753?

Stephan, July 26, 2007 - 10:51 pm UTC

Hi Tom,

Trying the hash cluster examples from your new book (pg 385). I've changed the table and cluster names, but I think I've got everything straight. I can't seem to get past this error - any quick thoughts?

ops$us\suzzell@OPERA> create cluster single_tab_hash_clust
  2  ( hash_key number(10) )
  3  hashkeys 75000
  4  size 150
  5  single table
  6  hash is HASH_KEY
  7  /

Cluster created.

Elapsed: 00:00:01.00
ops$us\suzzell@OPERA> create table t10_9_hashed
  2  cluster single_tab_hash_clust(object_id)
  3  as
  4  select owner, object_name, subobject_name,
  5         cast( object_id as number(10) ) object_id,
  6         data_object_id, object_type, created,
  7         last_ddl_time, timestamp, status, temporary,
  8         generated, secondary
  9    from dba_objects
 10  /
cluster single_tab_hash_clust(object_id)
                              *
ERROR at line 2:
ORA-01753: column definition incompatible with clustered column definition


Should probably mention I'm using 9i...

Thanks,
Tom Kyte
July 27, 2007 - 2:00 pm UTC

I cannot reproduce in 10.2.0.3 or 9.2.0.8

I guess I concur

Stephan, July 27, 2007 - 5:23 pm UTC

I tested in 10.2.0.2 and it worked.

At least I can rest knowing I didn't miss anything stupid.

Thanks, Tom

single table hash cluster still uses more CPU than index access

Qihua, August 23, 2008 - 4:55 am UTC

Hash access consumes much more CPU than index access, even though hash access uses 1 LIO and index access uses at least 2 LIOs.
In Expert One-on-One, page 303:
"The hash cluster query took significantly more CPU. This too could be anticipated. The act of performing a hash is very CPU-intensive"

But for the test with a table using "hash is" (page 304), since Oracle doesn't need to perform a hash, it shouldn't take much CPU. Yet the result of your testing is that the CPU is still much higher than for index access.

I did a test on a huge single table hash cluster: an index access with 5 LIOs uses nearly the same CPU as hash access with 1 LIO. I don't know where the extra CPU goes. And based on that test, there seems to be no advantage to using a hash cluster: in a normal table the data is tightly packed, but in a cluster table the data is loosely packed. That means that with a cluster table we will get a much lower cache hit ratio.
Tom Kyte
August 26, 2008 - 7:55 pm UTC

Post full test case please - in most cases, the hash access takes a little less cpu (it'll vary by machine of course)

and remember the main goal - reduce contention, reduce IO.

when you use an index range scan + table access by index rowid - you have what is called a "hot block" - the root index block. Everyone needs it. The more people that go after it, the more contention for the cache buffers chains (cbc) latch there will be. The more people contending for the same latch EVEN MORE cpu will be used (we use spin locks in most all cases - we spin, in a loop, waiting for a latch, the more people trying to get a latch, the more cpu you expend in each session getting that latch)


did you think to take your test to 10 or more concurrent users?

hash uses 1 LIO and index uses 5 LIOs, cpu times are nearly the same

Qihua, August 30, 2008 - 2:50 am UTC

Thanks Tom, 

I post my test results below, and I also have several questions/guesses - please confirm:

1: What is the overhead of hashing? Can the hashing overhead be ignored in comparison with 1 LIO?

My test told me hashing does not seem very expensive: using "hash is" instead of the default hash function gave only a very small improvement in elapsed time.

2: It seems one LIO against an index block is much cheaper than one LIO against a block accessed by hash. When accessing a block by hash, Oracle has to scan the whole block, but in an index block Oracle can quickly locate the row, which saves a lot of CPU. Only in this way can I explain why 1 LIO for hash access costs almost as much as 5 LIOs for index access (4 LIOs for index blocks and 1 LIO for the data block access by rowid).

3: The big table (30G) I want to turn into a cluster is a partitioned table, so the contention is not that huge. And to avoid hash collisions, I'd like to allocate double the space of the original table - but then the cache hit ratio could drop a lot.

If 1 LIO for hash access costs nearly the same as 5 LIOs for index access, then it seems there is no need to make the change - am I right?


SQL> exec runstats_pkg.rs_start;
PL/SQL procedure successfully completed.
SQL> declare
  2   v_record LOOKUP%rowtype;
  3  begin
  4     for i in 1..10000
  5     loop
  6        select * into v_record from LOOKUP where user_id=74560223;
  7     end loop;
  8  end;
  9  /
PL/SQL procedure successfully completed.
SQL>
SQL> exec runstats_pkg.rs_middle
PL/SQL procedure successfully completed.
SQL> declare
  2   v_record LOOKUP%rowtype;
  3  begin
  4     for i in 1..10000
  5     loop
  6        select /*+ INDEX(LOOKUP,LOOKUP_PK) */* into v_record from LOOKUP where user_id=74560223;
  7     end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> exec runstats_pkg.rs_stop(5)
Run1 ran in 149 hsecs
Run2 ran in 160 hsecs
run 1 ran in 93.13% of the time
      
Name                                  Run1        Run2        Diff
STAT...db block gets                    18          24           6
STAT...db block gets from cach          18          24           6
LATCH.channel operations paren          56          50          -6
LATCH.active service list               14          21           7
STAT...recursive cpu usage           1,089       1,069         -20
STAT...CPU used by this sessio       1,110       1,090         -20
STAT...DB time                       1,122       1,090         -32
STAT...CPU used when call star       1,123       1,090         -33
STAT...Elapsed Time                  1,080       1,134          54
STAT...bytes received via SQL*       1,577       1,635          58
STAT...redo size                     3,044       3,104          60
LATCH.checkpoint queue latch         2,560       2,496         -64
LATCH.undo global data                   5          74          69
LATCH.enqueues                         132         206          74
LATCH.enqueue hash chains              138         213          75
STAT...session uga memory           -2,336       3,944       6,280
STAT...cluster key scans            10,000           0     -10,000
STAT...table fetch by rowid              2      10,002      10,000
STAT...buffer is not pinned co      10,004      20,004      10,000
STAT...cluster key scan block       10,000           0     -10,000
STAT...rows fetched via callba           2      10,002      10,000
STAT...no work - consistent re      10,000           0     -10,000
STAT...index fetch by key                2      10,002      10,000
LATCH.cache buffers chains          20,158      50,169      30,011
STAT...consistent gets              10,015      50,017      40,002
STAT...consistent gets from ca      10,015      50,017      40,002
STAT...session logical reads        10,033      50,041      40,008 
           ===> hash access: 1 LIO, index access: 5 LIO 
STAT...consistent gets - exami          10      50,012      50,002
      
Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
23,590      53,763      30,173     43.88%

PL/SQL procedure successfully completed.

Tom Kyte
August 30, 2008 - 10:09 am UTC

in a SINGLE USER TEST the cpu needed to do a SINGLE LOGICAL IO will be the same for everyone.

So, you have a query that takes 1 LIO, it takes X cpu to accomplish that. Of that X, some percentage is for the hash and some percentage is for the LIO. Let's call that percentage for the LIO "x" (lowercase), and all we know is that "x" < X and X-"x" is the cpu used for the hashing (let us call that "y")

"y" is constant regardless of the number of concurrent users. "y" is fixed.

"x" on the other hand is a function of the number of concurrent users, "x" is variable, "x" GOES UP as the user load goes up (you need to latch in order to do "x", latching is a serialization process, we use spin locks in general, the more people going after a latch at the same time, the more cpu each needs to do so).

So, as we add more users, the amount of cpu needed to perform that 1 LIO query is:

"y" + "x" * some_multiplier


You have that 5 LIO query, it is all LIO work (no hashing). In single user mode it also used X cpu time. Here we know that X = 5 * "x" (5 LIOS). As the concurrency goes up you have:

5 * ("x" * some_multiplier)


The rate of cpu increase is much higher for this second query - potentially much worse, even (the multiplier for this second query could be much higher than the first). Why? Because one of the blocks needed by this 5 LIO query is the root index block - everyone needs it, it'll be a hot block, and hot blocks lead to cache buffers chains (a latch name) contention.
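
To put purely illustrative numbers on that (mine, not measured): say "y" = 30 units and "x" = 10 units in single user mode. The hash query then costs 30 + 10 = 40 and the index query costs 5 * 10 = 50 - roughly comparable. Now let contention triple the cost of each LIO: the hash query costs 30 + 3*10 = 60, while the index query costs 5 * 3*10 = 150 - two and a half times as much, and likely worse still because of that hot root block.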



hash clustered access takes about the same amount of cpu to retrieve a row in single user mode as the index access would.

it is when you scale up that you'll see a marked difference in the ability to scale: you'll have removed that hot block, you'll have reduced contention (every LIO you do is a point of contention).

Hagi, August 01, 2009 - 9:31 pm UTC

Sir
Regarding the index cluster tables.
Is it necessary to load the data by cluster key?

Assume I have two tables that are clustered.

First, I insert a big amount of data into table1.
All the blocks in the cluster will be filled.

Now I insert into table2.

How can Oracle put table2's rows onto the same blocks as the table1 rows that share their cluster key?

The clustered blocks are already full.
Tom Kyte
August 04, 2009 - 1:23 pm UTC

I think you mean to say "is it necessary to load ALL of the rows with the same cluster key at the same time, in order"

The answer is:

It would be nice, but it is certainly not necessary, nor a requirement.


When you create a b*tree cluster - you set a SIZE parameter. The SIZE parameter is the estimate of the size in bytes of the data you expect to store with a given cluster key.

So, say you have a cluster key "ID". If you estimate that table1 will have 500 bytes of data associated with a given ID on average (overestimate a bit), and table2 will have 1500 bytes of data on average, you would set SIZE for the cluster to be about 2000.

Now, say you have an 8k block. 8k/2000 is about 4. When you load table1, we'll put 4 cluster keys on a block and then move on to the next block. We won't use pctfree, we'll use the size parameter.


see the difference in rows/block when we use different cluster SIZE settings?

ops$tkyte%ORA10GR2> drop table t1;

Table dropped.

ops$tkyte%ORA10GR2> drop cluster my_cluster;

Cluster dropped.

ops$tkyte%ORA10GR2> create cluster my_cluster ( id number ) size 2000;

Cluster created.

ops$tkyte%ORA10GR2> create index my_cluster_idx on cluster my_cluster;

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1 ( id number, data char(450) ) cluster my_cluster( id );

Table created.

ops$tkyte%ORA10GR2> insert into t1 select rownum, 'x' from dual connect by level <= 100;

100 rows created.

ops$tkyte%ORA10GR2> select min(cnt), max(cnt), avg(cnt) , count(*)
  2    from (
  3  select count(*) cnt
  4    from t1
  5   group by dbms_rowid.rowid_block_number( rowid )
  6         );

  MIN(CNT)   MAX(CNT)   AVG(CNT)   COUNT(*)
---------- ---------- ---------- ----------
         4          4          4         25

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t1;

Table dropped.

ops$tkyte%ORA10GR2> drop cluster my_cluster;

Cluster dropped.

ops$tkyte%ORA10GR2> create cluster my_cluster ( id number ) size 1000;

Cluster created.

ops$tkyte%ORA10GR2> create index my_cluster_idx on cluster my_cluster;

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1 ( id number, data char(450) ) cluster my_cluster( id );

Table created.

ops$tkyte%ORA10GR2> insert into t1 select rownum, 'x' from dual connect by level <= 100;

100 rows created.

ops$tkyte%ORA10GR2> select min(cnt), max(cnt), avg(cnt) , count(*)
  2    from (
  3  select count(*) cnt
  4    from t1
  5   group by dbms_rowid.rowid_block_number( rowid )
  6         );

  MIN(CNT)   MAX(CNT)   AVG(CNT)   COUNT(*)
---------- ---------- ---------- ----------
         4          8 7.69230769         13

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> drop table t1;

Table dropped.

ops$tkyte%ORA10GR2> drop cluster my_cluster;

Cluster dropped.

ops$tkyte%ORA10GR2> create cluster my_cluster ( id number ) size 8000;

Cluster created.

ops$tkyte%ORA10GR2> create index my_cluster_idx on cluster my_cluster;

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1 ( id number, data char(450) ) cluster my_cluster( id );

Table created.

ops$tkyte%ORA10GR2> insert into t1 select rownum, 'x' from dual connect by level <= 100;

100 rows created.

ops$tkyte%ORA10GR2> select min(cnt), max(cnt), avg(cnt) , count(*)
  2    from (
  3  select count(*) cnt
  4    from t1
  5   group by dbms_rowid.rowid_block_number( rowid )
  6         );

  MIN(CNT)   MAX(CNT)   AVG(CNT)   COUNT(*)
---------- ---------- ---------- ----------
         1          1          1        100




A reader, August 07, 2009 - 10:03 pm UTC

Sir;

As far as I know, the purpose of the cluster is that two or more tables share the same block.

In this scenario,
If I insert into table1, we'll put 4 cluster keys on block1 and then move on to block2.
Then I insert into table2; we'll again put 4 cluster keys on a different block3 and then move on to block4.

The thing is, the cluster key values of table1 and table2 will not reside in the same block.
They are all using different blocks.

Tom Kyte
August 11, 2009 - 12:38 pm UTC

... As far as I know, the purpose of the cluster is that two or more tables share the same block.
...

No, that isn't right.

the goal of the cluster is to:

store on a single block, or as few blocks as possible, all of the rows from all of the tables that share a common key.




You are missing the concept of the b*tree cluster entirely. When we insert the 4 rows for table1 - the 4 keys we created were put onto block X. Later, when those four keys are again inserted for any row in any table in that cluster - we'll want to put them onto block X again.

the cluster "clusters related data together". it forces "location" on the data. Before we insert into a table in the cluster, we figure out where the key values in the newly inserted rows say the row goes.


Say emp and dept are clustered together on DEPTNO.

Say you use a SIZE of 2k - so we get 4 keys/block.

Say you start with everything empty.

You insert DEPT=100, we look up 100 in the b*tree cluster index, we don't find it (empty tables in the cluster). So, we find a block with sufficient free space AND that does not already have 4 cluster keys on it (call this block A) and insert DEPT=100 on that block and update the b*tree cluster index to show "dept=100 goes to block A"

You insert DEPT=10, we look up 10 in the b*tree cluster index, we don't find it. We find a block with sufficient free space AND that does not have 4 cluster keys on it (this will be block A again probably). We put dept=10 on that block and update the b*tree cluster to show "dept=10 goes to block A"

You insert DEPT=200, the same thing that happened to DEPT=10 happens again.
You insert DEPT=1, the same thing that happened to DEPT=10 happens again.

You insert DEPT=50, this time, when we go to find a free block, we cannot use block A again because block A already has 4 keys on it, so we put DEPT=50 (and the next three DEPT=101,33,42 values) onto block B.

So, you have two blocks

block A has 100, 10, 200, 1
block B has 50, 101, 33, 42

when you insert an employee with a deptno = 42, we look up in the b*tree cluster and discover "42 goes onto block B", so we put that emp row onto block B.

when you insert an employee with a deptno = 100, we look up in the b*tree cluster index and discover "100 goes onto block A"

and so on....

that is the way the b*tree cluster works.
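
A small sketch to observe that placement (a SIZE of 2048 gives roughly 4 keys per 8k block, as in the walkthrough above):

create cluster emp_dept_cluster ( deptno number(2) ) size 2048;
create index emp_dept_cluster_idx on cluster emp_dept_cluster;
create table dept ( deptno number(2), dname varchar2(14) ) cluster emp_dept_cluster(deptno);
create table emp ( empno number, ename varchar2(10), deptno number(2) ) cluster emp_dept_cluster(deptno);

insert into dept values ( 100, 'DEPT 100' );
insert into dept values ( 42, 'DEPT 42' );
insert into emp values ( 7788, 'SCOTT', 42 );

-- rows sharing a deptno should report the same block number:
select 'DEPT' src, deptno, dbms_rowid.rowid_block_number(rowid) blk from dept
union all
select 'EMP', deptno, dbms_rowid.rowid_block_number(rowid) from emp
order by 2;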

A reader, August 14, 2009 - 9:43 pm UTC

Sir.
Thanks a lot for your enlightening explanation. I am now clear.
I appreciate your time.
Sorry for any inconvenience.

converting to a single table hash cluster

A reader, September 23, 2009 - 11:11 pm UTC

Tom,

Having read this article, we've done some benchmarks and have decided that we will implement single table hash clusters (STHC).

We now need to implement this in a production environment by converting an existing HEAP table (referenced by foreign keys) to a STHC. It's not a huge table, approx 4 million rows.

Is there a "quick" command to do this?
Tom Kyte
September 28, 2009 - 2:37 pm UTC

if you want, online operations are permitted:

ops$tkyte%ORA10GR2> create table t1
  2  as
  3  select * from all_objects
  4   order by dbms_random.random;

Table created.

ops$tkyte%ORA10GR2> alter table t1 add constraint t1_pk primary key(object_id);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create cluster Hash_Cluster
  2  ( id number )
  3  SINGLE TABLE
  4  hashkeys 80000 -- number of distinct lookups we expect over time
  5  size 125         -- size of data associated with key
  6  /

Cluster created.

ops$tkyte%ORA10GR2> create table t2 (
  2  OWNER                  VARCHAR2(30),
  3  OBJECT_NAME            VARCHAR2(30),
  4  SUBOBJECT_NAME         VARCHAR2(30),
  5  OBJECT_ID              NUMBER,
  6  DATA_OBJECT_ID         NUMBER,
  7  OBJECT_TYPE            VARCHAR2(19),
  8  CREATED                DATE,
  9  LAST_DDL_TIME          DATE,
 10  TIMESTAMP              VARCHAR2(19),
 11  STATUS                 VARCHAR2(7),
 12  TEMPORARY              VARCHAR2(1),
 13  GENERATED              VARCHAR2(1),
 14  SECONDARY              VARCHAR2(1),
 15  NAMESPACE              NUMBER,
 16  EDITION_NAME           VARCHAR2(30)
 17  )
 18  cluster hash_cluster( object_id );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_redefinition.can_redef_table( user, 'T1' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_redefinition.start_redef_table( user, 'T1', 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> variable nerrors number
ops$tkyte%ORA10GR2> begin
  2          dbms_redefinition.copy_table_dependents
  3          ( user, 'T1', 'T2',
  4            copy_indexes => dbms_redefinition.cons_orig_params,
  5            num_errors => :nerrors );
  6  end;
  7  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec dbms_redefinition.finish_redef_table( user, 'T1', 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> drop table t2;

Table dropped.

ops$tkyte%ORA10GR2> select dbms_metadata.get_ddl( 'TABLE', 'T1' ) from dual;

DBMS_METADATA.GET_DDL('TABLE','T1')
-------------------------------------------------------------------------------

  CREATE TABLE "OPS$TKYTE"."T1"
   (    "OWNER" VARCHAR2(30) CONSTRAINT "SYS_C0039325" NOT NULL ENABLE NOVALIDATE,

        "OBJECT_NAME" VARCHAR2(30) CONSTRAINT "SYS_C0039326" NOT NULL ENABLE NOVALIDAT
E,
        "SUBOBJECT_NAME" VARCHAR2(30),
        "OBJECT_ID" NUMBER CONSTRAINT "SYS_C0039327" NOT NULL ENABLE NOVALIDATE,
        "DATA_OBJECT_ID" NUMBER,
        "OBJECT_TYPE" VARCHAR2(19),
        "CREATED" DATE CONSTRAINT "SYS_C0039328" NOT NULL ENABLE NOVALIDATE,
        "LAST_DDL_TIME" DATE CONSTRAINT "SYS_C0039329" NOT NULL ENABLE NOVALIDATE,
        "TIMESTAMP" VARCHAR2(19),
        "STATUS" VARCHAR2(7),
        "TEMPORARY" VARCHAR2(1),
        "GENERATED" VARCHAR2(1),
        "SECONDARY" VARCHAR2(1),
        "NAMESPACE" NUMBER,
        "EDITION_NAME" VARCHAR2(30),
         CONSTRAINT "T1_PK" PRIMARY KEY ("OBJECT_ID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "USERS"  ENABLE NOVALIDATE
   ) CLUSTER "OPS$TKYTE"."HASH_CLUSTER" ("OBJECT_ID")




else take an outage and:

a) create table new as select (put new table into hash cluster)
b) add constraints, grants, etc to it
c) drop old table
d) rename new
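
(a sketch of that outage approach, reusing the t1/hash_cluster names from above - your constraint names, grants and so on will differ:)

create table t1_new
cluster hash_cluster( object_id )
as
select * from t1;

-- use a new constraint name; T1_PK is taken until the old table is dropped
alter table t1_new add constraint t1_new_pk primary key(object_id);
-- plus grants, foreign keys, triggers as they exist on the original

drop table t1;
rename t1_new to t1;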


dbms_redefinition

A reader, September 29, 2009 - 3:50 pm UTC

Thank you. dbms_redefinition - most useful to know!

Single-table Hash Cluster vs. IOT

Stacey Jones, October 13, 2009 - 3:29 pm UTC

Tom,

In one of our data warehouses we have a currency conversion table that gets populated with conversion rates on a daily
basis. The data is kept historically so that reports can compute the conversion rate for any day in the past. The math
to do the rate conversion is very simple, but the problem is that it can be done up to 100 million times per day and
this consumes quite a bit of CPU when the reports are running. The conversion rate table has about 500,000 rows and I
thought I would pre-compute the conversions with a materialized view so that there was no math involved - only a select.
The resulting MV will contain about 170 million rows. Initially I built this with a MV with an IOT table, but was
considering a single-table hash cluster instead, since this seems to be your suggestion in "Effective Oracle by Design".
I built both versions and was a little surprised that the hash cluster solution is slower (elapsed time) and also
consumes more CPU when I check the consumption in v$sqlarea. Could you check out my solution and let me know if:

A) I've missed something with my hash cluster solution and this could be improved.
B) I'm better off with the IOT solution
C) I'm better off with something else entirely.

I'm running Oracle 9.2.0.7 on Sun Solaris. Many thanks in advance, your book does rock!

Thanks

Stacey

-- Create the IOT table

--DROP TABLE STG.MV_WSG_IOT;
CREATE TABLE STG.MV_WSG_IOT
(
CONV_DATE DATE NOT NULL,
FROM_CURRENCY_CODE VARCHAR2(6 BYTE) NOT NULL,
TO_CURRENCY_CODE VARCHAR2(6 BYTE) NOT NULL,
CONV_RATE NUMBER NOT NULL,
A_ROWID ROWID NOT NULL,
B_ROWID ROWID NOT NULL,
CONSTRAINT MV_WSG_PK_IOT
PRIMARY KEY
(CONV_DATE, FROM_CURRENCY_CODE, TO_CURRENCY_CODE)
)
ORGANIZATION INDEX
NOLOGGING
TABLESPACE AUDIT_SHIPMENT_DATA
STORAGE (
INITIAL 1M
NEXT 1M
)
COMPRESS 2
NOPARALLEL;

-- Load the IOT with 500000 rows of data
begin
insert/*+ APPEND */ into STG.MV_WSG_IOT select * from stg.mv_wsg where rownum < 500001;
commit;
end;
/

-- Create the cluster

--DROP TABLE STG.MV_WSG_HASH_CLUSTER;
--DROP CLUSTER STG.CLU_CURR_CONV_LKP;
CREATE CLUSTER STG.CLU_CURR_CONV_LKP
(
CONV_DATE DATE,
FROM_CURRENCY_CODE VARCHAR2(6 BYTE),
TO_CURRENCY_CODE VARCHAR2(6 BYTE)
)
TABLESPACE AUDIT_SHIPMENT_DATA
SIZE 61
STORAGE (
INITIAL 1M
NEXT 1M
)
SINGLE TABLE
HASHKEYS 500000
NOROWDEPENDENCIES
NOCACHE
NOPARALLEL;

-- Create the hash clustered table

--DROP TABLE STG.MV_WSG_HASH_CLUSTER
CREATE TABLE STG.MV_WSG_HASH_CLUSTER
(
CONV_DATE DATE NOT NULL,
FROM_CURRENCY_CODE VARCHAR2(6 BYTE) NOT NULL,
TO_CURRENCY_CODE VARCHAR2(6 BYTE) NOT NULL,
CONV_RATE NUMBER NOT NULL,
A_ROWID ROWID NOT NULL,
B_ROWID ROWID NOT NULL
)
CLUSTER STG.CLU_CURR_CONV_LKP(CONV_DATE,FROM_CURRENCY_CODE,TO_CURRENCY_CODE)
;

-- Load the Hash clustered table with 500000 rows of data
begin
insert/*+ APPEND */ into STG.MV_WSG_HASH_CLUSTER select * from stg.mv_wsg where rownum < 500001;
commit;
end;
/

-- Analyze both tables

BEGIN
SYS.DBMS_STATS.GATHER_TABLE_STATS (
OwnName => 'STG'
,TabName => 'MV_WSG_HASH_CLUSTER'
,Estimate_Percent => NULL
,Method_Opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO '
,Degree => 4
,Cascade => TRUE
,No_Invalidate => FALSE);
commit;
SYS.DBMS_STATS.GATHER_TABLE_STATS (
OwnName => 'STG'
,TabName => 'MV_WSG_IOT'
,Estimate_Percent => NULL
,Method_Opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO '
,Degree => 4
,Cascade => TRUE
,No_Invalidate => FALSE);
commit;
END;
/

--Run an explain plan for each of the queries:

SELECT * FROM STG.MV_WSG_IOT t
WHERE 1=1
AND t.conv_date = '05-JUL-04'
AND t.from_currency_code = 'JPY'
AND t.to_currency_code = 'CNY'

Plan
SELECT STATEMENT CHOOSE Cost: 2 Bytes: 43 Cardinality: 1 IO Cost: 2
1 INDEX UNIQUE SCAN UNIQUE STG.MV_WSG_PK_IOT Search Columns: 3 Access Predicates: "T"."CONV_DATE"='05-JUL-04' AND "T"."FROM_CURRENCY_CODE"='JPY' AND "T"."TO_CURRENCY_CODE"='CNY' Cost: 2 Bytes: 43 Cardinality: 1 IO Cost: 2

SELECT * FROM STG.MV_WSG_HASH_CLUSTER t
WHERE 1=1
AND t.conv_date = '05-JUL-04'
AND t.from_currency_code = 'JPY'
AND t.to_currency_code = 'CNY'

Plan
SELECT STATEMENT CHOOSE Bytes: 46 Cardinality: 1
1 TABLE ACCESS HASH STG.MV_WSG_HASH_CLUSTER Object Instance: 1 Search Columns: 3 Access Predicates: "T"."CONV_DATE"='05-JUL-04' AND "T"."FROM_CURRENCY_CODE"='JPY' AND "T"."TO_CURRENCY_CODE"='CNY' Bytes: 46 Cardinality: 1

-- Create a jumbled list to use in querying both tables.

create table stg.test_params
tablespace AUDIT_SHIPMENT_DATA
compress
as
select conv_date,from_currency_code,to_currency_code from (
select * from stg.mv_wsg x where rownum < 500001) x2
order by to_char(conv_date,'dd'),to_char(conv_date,'yy') desc, to_char(conv_date,'mm'), substr(from_currency_code,2,1),substr(to_currency_code,3,1) desc;

-- Run the following test block to time random conversion rate lookups...

DECLARE
CURSOR c_master IS SELECT * FROM stg.test_params WHERE rownum < 100001;
CURSOR c_baseline_query(b1 IN stg.mv_wsg_iot.conv_date%TYPE
,b2 IN stg.mv_wsg_iot.from_currency_code%TYPE
,b3 IN stg.mv_wsg_iot.to_currency_code%TYPE
) IS
SELECT t.conv_rate FROM STG.MV_WSG_IOT t WHERE t.conv_date = b1 AND t.from_currency_code = b2 AND t.to_currency_code = b3;
CURSOR c_replacement_query(
b1 IN stg.mv_wsg_hash_cluster.conv_date%TYPE
,b2 IN stg.mv_wsg_hash_cluster.from_currency_code%TYPE
,b3 IN stg.mv_wsg_hash_cluster.to_currency_code%TYPE
) IS
SELECT t.conv_rate FROM STG.MV_WSG_HASH_CLUSTER t WHERE t.conv_date = b1 AND t.from_currency_code = b2 AND t.to_currency_code = b3;
-- Constants and Variables
num_seconds_in_a_day NUMBER := 24*60*60;
start_time DATE;
end_time DATE;
baseline_elapsed_time NUMBER := 0;-- In seconds
replacement_elapsed_time NUMBER := 0;-- In seconds
percent_change NUMBER := 0;
BEGIN
DBMS_OUTPUT.ENABLE(1000000);
COMMIT;
-- Run and time the baseline query
start_time := SYSDATE;
DBMS_OUTPUT.put_line('Baseline run began at: ' || TO_CHAR(start_time, 'mm/dd/yyyy hh24:mi:ss'));
COMMIT;
FOR m IN c_master LOOP
FOR r IN c_baseline_query(m.conv_date,m.from_currency_code,m.to_currency_code) LOOP
NULL;
END LOOP;-- c_test
END LOOP;--c_master
end_time := sysdate;
DBMS_OUTPUT.put_line('Baseline run ended at: ' || TO_CHAR(end_time, 'mm/dd/yyyy hh24:mi:ss'));
COMMIT;
baseline_elapsed_time := (end_time - start_time) * num_seconds_in_a_day;
-- Run and time the replacement query
start_time := SYSDATE;
DBMS_OUTPUT.put_line('Replacement query run began at: ' || TO_CHAR(start_time, 'mm/dd/yyyy hh24:mi:ss'));
COMMIT;
FOR m IN c_master LOOP
FOR r IN c_replacement_query(m.conv_date,m.from_currency_code,m.to_currency_code) LOOP
NULL;
END LOOP;-- c_test
END LOOP;--c_master
end_time := sysdate;
DBMS_OUTPUT.put_line('Replacement query run ended at: ' || TO_CHAR(end_time, 'mm/dd/yyyy hh24:mi:ss'));
COMMIT;
replacement_elapsed_time := (end_time - start_time) * num_seconds_in_a_day;
-- Summarize the results
DBMS_OUTPUT.put_line('******************************* SUMMARY *******************************');
DBMS_OUTPUT.put_line('Baseline elapsed time in seconds -> '||to_char(baseline_elapsed_time));
DBMS_OUTPUT.put_line('New query elapsed time in seconds -> '||to_char(replacement_elapsed_time));
percent_change := ROUND(ABS(((baseline_elapsed_time-replacement_elapsed_time)/baseline_elapsed_time)*100));
IF (replacement_elapsed_time < baseline_elapsed_time) THEN
dbms_output.put_line('New query is '||to_char(percent_change)||'% faster than the original.');
ELSIF (replacement_elapsed_time > baseline_elapsed_time) THEN
dbms_output.put_line('New query is '||to_char(percent_change)||'% slower than the original.');
ELSE
dbms_output.put_line('No significant performance difference detected, you might try a larger test case');
END IF;
COMMIT;
END;
/

Gives these results:


Baseline run began at: 10/13/2009 20:04:43
Baseline run ended at: 10/13/2009 20:05:10
Replacement query run began at: 10/13/2009 20:05:10
Replacement query run ended at: 10/13/2009 20:05:41
******************************* SUMMARY *******************************
Baseline elapsed time in seconds -> 27
New query elapsed time in seconds -> 30.99999999999999999999999999999999999997
New query is 15% slower than the original.



Computing index cluster size

Johann, April 12, 2011 - 2:48 am UTC

Hi Tom,

As usual, I love the learnings I get from this site. However, while browsing I still cannot find a good way to calculate what I should put in as size given the ff:

1. Have two tables, A and B.
2. For Table A, AVG_ROW_LEN is 172.
3. For table B, it's just 26, but I know this is because there's one column there that people haven't started populating. The possible values for that column have an average length of 226, so I'm thinking I should for now use 26+226=252 as the average row length for Table B (I know that's a bit simplistic, but until I really have a good idea of the values for that column, it will do for now)
4. And on the average, there are 3.77 entries on table B for each entry on table A.

Given the above, what I have so far as candidate cluster size is 172 + (252 * 3.77) = 1122. That still doesn't take into consideration indexes I want to put in for the columns aside from the clustering column and I currently have no idea how to factor them in. Would appreciate pointers. Thanks.
Tom Kyte
April 13, 2011 - 9:14 am UTC

"ff" ??? not even a tiny idea what that is.
http://acronyms.thefreedictionary.com/FF

lots of possibilities.



If you are trying to size the cluster, it is just the amount of data associated with the key - indexes don't matter, they don't count in the sizing. Just the sum of the row sizes associated with a given cluster key value.
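
So, applying that to the numbers above (just the arithmetic from the question, rounded up a little as suggested earlier in this thread): 172 + (252 * 3.77) is about 1122 bytes per cluster key, meaning something like SIZE 1200 would be a reasonable starting point - and the indexes simply drop out of the calculation.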

Thanks!

Johann, April 14, 2011 - 3:09 am UTC

Sorry about the "ff" - I've always used it as "following" and I thought that was the norm even before the days of IM and SMS.

Thanks for the clarification. I guess I can proceed with my computation.
Tom Kyte
April 14, 2011 - 9:50 am UTC

I've honestly never heard of using ff as "following"

Clusters vs Partitioning

Nathan Marston, April 20, 2011 - 10:30 pm UTC

Hi Tom,

Similar question to the ones before - what are the performance implications of choosing between a clustered table and a partitioned table (list partitioned by what would have been the cluster key) - both for selects based on the cluster key, and DML?

(I guess I'm struggling to see how the organisation of the table data would be different in any meaningful way - and hence, if there would be any performance difference)
Tom Kyte
April 25, 2011 - 7:54 am UTC

they are about as different as night and day.

they are the same in that both force "organization" on the data.

A cluster ensures that rows with a common key are stored very very very near each other - typically in the same block. All of the rows in the emp table with DEPTNO=10 would be stored on the same block for example.

Partitioning ensures that rows are stored in a partition (which has many many many blocks) based on the value of their partition key. In a HASH partitioned table using the hash key of DEPTNO, all records with deptno=10 would be stored in the same partition, but it is highly unlikely they would be on the same block.

Clustering is typically about getting data on the same block, partitioning is not

Re: Clusters vs Partitioning

Nathan Marston, April 26, 2011 - 11:02 pm UTC

Thanks for the quick reply.

FYI, this isn't a work-related question so there's no urgency for an answer on my end. I think we agree that clustering is specifically aimed at improving performance while partitioning is more general-purpose in nature. The motivation for my question is just academic curiosity :).

Getting back to your response, sure - I understand that in the general case, partitioning will not get rows with equal keys organised together. That's because the relationship between partition key values and partitions is typically many keys to one partition, not one-to-one, and Oracle's under no obligation to put the same key values together.

So in general, a query on the partitioned field still has to scan the partition and that partition contains data that doesn't match the key you're looking for, so relative to clustering you're performing more LIOs. Even if you had (say) a local index on that partition key - and Oracle actually used it (which is doubtful) you're still worse off than if you'd clustered in the first place.

That wasn't my question though - I was asking about a more specific case. The case I'm interested in is where you are list partitioning the table by what would have been the cluster key, and you have set up the partitioning so there is one partition per key value.

For example, say you had this table containing a code field and the table is frequently accessed by that field:

create table blah ( ..., code varchar2(50) not null, ... );

Say code has 10 distinct values. In one scenario you cluster it by "code", in the other you list partition it by "code" with one partition per distinct "code" value (so there are 10 partitions).

I think we both agree that clustering still makes more sense even in this more specific case, if our goal is to make the "code" values sit together in the table and hence reduce LIO. What clinches it for me is that clustering just says what I mean. What I'm trying to see is what the performance implications of choosing partitioning instead would be.

Having explained the difference between my question and what I think you thought my question was... I think I now know how Oracle would treat them differently.

I think the issue is that with my partitioning scheme, "it just so happens" that partition keys and partitions map one to one - this kind of setup is a special corner case. My guess would be that Oracle would (under some circumstances) be unable to take advantage of that fact when optimising queries against the partitioned table.
Tom Kyte
April 27, 2011 - 7:51 am UTC

I think we both agree that clustering still makes more sense even in this more specific case, if our goal is to make the "code" values sit together in the table and hence reduce LIO.


No, we wouldn't agree, in that narrow very specific case - partitioning and clustering would achieve effectively the same goal.

If you list partition by CODE and each CODE has its own partition - it would be very much like clustering - even a bit better in fact. Both cases would have blocks that are filled with just a single CODE value - but the partitioning example would also have these blocks adjacent to each other (better for the full scan) AND it would have it so that we'd know all of the blocks that need to be full scanned for a partition - whereas the cluster would not, you'd have to full scan the entire cluster if you were to full scan.
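
(a sketch of that one-partition-per-code layout - the column names and code values are illustrative, and with 10 distinct codes you would list all 10:)

create table blah
( id   number,
  code varchar2(50) not null,
  data varchar2(100)
)
partition by list (code)
( partition p_a values ('A'),
  partition p_b values ('B'),
  partition p_c values ('C')
);

-- "where code = 'B'" now reads only partition P_B - blocks that are adjacent
-- and contain nothing but that code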



Clusters vs Partitions

Nathan Marston, April 28, 2011 - 9:07 pm UTC

If you list partition by CODE and each CODE has its own partition - it would be very much like clustering - even a bit better in fact. Both cases would have blocks that are filled with just a single CODE value - but the partitioning example would also have these blocks adjacent to each other (better for the full scan) AND it would have it so that we'd know all of the blocks that need to be full scanned for a partition - whereas the cluster would not, you'd have to full scan the entire cluster if you were to full scan.

Ah... I see where my confusion has come from. I didn't read the manual properly in the first place ;).

What I was *thinking* clustering did was physically order rows by the cluster key (a kind of generalised IOT, except the key fields didn't need to be the primary key). I couldn't see a difference because I thought that advantage (row ordering) applied in both cases.

If I read the docs correctly, what clustering actually does is organise the row data so that for any cluster key value the rows corresponding to that value occupy the minimum possible number of blocks. If there's multiple blocks involved, they could potentially be anywhere.

Running with the example further, I can't think of anything I could do (short of adding a parallel clause - and that's cheating) to make that clustered table outperform the list partitioned table for queries that have "where code = :x" in the SQL.

The data I'm after is already in the minimum possible number of blocks, Oracle knows exactly where those blocks are, and those blocks are all together. In terms of data organisation, I think that's as good as it gets.

Indexing the cluster key on my clustered table might improve its performance (such an index might not get used), but even if it did I don't think it would outperform the list partitioned table for the same query.
Tom Kyte
April 29, 2011 - 8:12 am UTC

If I read the docs correctly, what clustering actually does is organise the row data so that for any cluster key value the rows corresponding to that value occupy the minimum possible number of blocks. If there's multiple blocks involved, they could potentially be anywhere.

exactly.

Running with the example further, I can't think of anything I could do (short of adding a parallel clause - and that's cheating) to make that clustered table outperform the list partitioned table for queries that have "where code = :x" in the SQL.

Neither can I.

The data I'm after is already in the minimum possible number of blocks, Oracle knows exactly where those blocks are, and those blocks are all together. In terms of data organisation, I think that's as good as it gets.

well, there are always opportunities to perhaps compress the data onto fewer blocks...

Clusters vs Partitions

Nathan Marston, April 29, 2011 - 1:31 pm UTC

Thanks Tom - that makes complete sense to me now. Funny how that happens when I go back and re-check my assumptions, isn't it? ;)

Two more questions - I mentioned earlier how I *thought* clusters worked (i.e. that they imposed a partial ordering on the data based on some combination of fields, so data with the same "key" sits together on disk like it does with an IOT).

First - I'm wondering whether Oracle currently supports this kind of table? It seems to me having "a table where the rows are ordered by columns x,y,z" where those columns aren't necessarily unique could be quite useful.

Second - any idea when you're likely to start taking new questions?

I do have another one regarding a situation where I think the MV refresh engine is doing "the wrong thing", but I won't take up your time here since it isn't really related to clustering at all.
Tom Kyte
April 30, 2011 - 10:03 am UTC

To accomplish storing all data with the same key together would require either preallocating a ton of space for each key and hoping we didn't run out of room, or shifting an entire table down when you insert a key at the top, or moving all of the rows with the same key somewhere else when they don't fit any more. In other words - hugely expensive. I don't foresee that happening.

I take new questions all of the time... took at least 36 last week...

Clusters vs Partitioning

Nathan Marston, May 01, 2011 - 8:38 pm UTC

To accomplish storing all data with the same key together would require either preallocating a ton of space for each key and hoping we didn't run out of room, or shifting an entire table down when you insert a key at the top, or moving all of the rows with the same key somewhere else when they don't fit any more.

Hm... fair enough, but I think I'm still missing something then. Why isn't this an issue with IOTs?

I take new questions all of the time... took at least 36 last week...

Perhaps it has something to do with the time of day that I'm looking to submit questions? Seems almost every time I look at the main page it says you're unable to take new questions.
Tom Kyte
May 04, 2011 - 11:45 am UTC

IOTs do not store the data for the entire table sorted on disk - they store a bunch of blocks that, if you read them with a full scan, would return the data unsorted.

In an IOT, the rows on a given block might be sorted by the primary key - but the rows on disk, in whole, are NOT.

An IOT stores data just like an index, an index that is fast full scanned using multiblock IO doesn't return data sorted, it returns it as it sees it on disk.
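
A minimal sketch that demonstrates this, with hypothetical names; inserting the keys in random order forces index block splits, so the leaf blocks end up on disk out of key order and a fast full scan returns the rows unsorted (exact output will vary):

create table iot_demo
( id   number primary key,
  data varchar2(30) )
organization index;

-- keys 1..100000 inserted in random order => block splits, so the
-- physical block order no longer matches the key order
insert into iot_demo
select id, 'row ' || id
  from ( select rownum id
           from dual
        connect by level <= 100000
          order by dbms_random.value );
commit;

-- multiblock-IO read of the IOT: rows come back in the order the
-- blocks are found on disk, NOT sorted by id
select /*+ index_ffs(iot_demo) */ id from iot_demo;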

Cluster or partition

Jayadevan, September 17, 2012 - 2:39 am UTC

Hello Tom,
I have 6-7 tables involved in a number of left outer joins. There is an activity table which tracks activities by members of a loyalty program. This table has many millions of records with unique activity numbers (millions of customers, each with many activities). Depending on the 'type' of activity - redemption, accrual, tier change, etc. - the details of the activity will be in the child tables (the other 6 tables). They are joined by the activitynum.
We have one query which lists the activities along with the details (from the child tables) for any one member, for the past year. To tune the query, we tried partitioning (i.e. the activity table has a curr_year flag - set to Y or N). We have indexes on the child tables and the query is doing mostly PK-index unique scans. Still the query takes a few seconds while we are trying to make it retrieve data in under a second. Will clustering all these tables on the activity number help?
Tom Kyte
September 20, 2012 - 1:54 am UTC

To tune the query, we tried partitioning


unless you were full scanning, partitioning isn't really going to be useful to "tune a query" with. Unless you can employ partition elimination to remove some data from being scanned, partitioning isn't going to make you go faster (in fact, it could make you go much slower if you are not careful!!!)

I cannot answer the last question without understanding the nature of the data as it exists right now and how much of your time is spent doing physical IO's.

Here would be a quick way to answer this, to see if it would help - what is the response time of the query when it does ZERO physical IO's (eg: the second time you run it with the same inputs).

If that is not less than a second, then no, clustering (which would be useful to reduce physical IO's) by itself would not accomplish your goal.
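
A sketch of that test in SQL*Plus, using autotrace; the query below is just a stand-in for the real statement (the table and column names are invented for illustration):

set autotrace traceonly statistics
set timing on

-- first run: "physical reads" will be non-zero while the blocks
-- are read from disk into the buffer cache
select * from memberactivity where membernum = 12345;

-- second run, same inputs: "physical reads" should be at or near
-- zero - this elapsed time is the best that clustering alone could
-- buy you, since clustering only reduces physical IO
select * from memberactivity where membernum = 12345;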


How about showing us the tkprof report of the query (with the row source operation section - that is vital!!!) and explaining the schema (indexes and the like)

Partitioning and clustering

Jayadevan, September 20, 2012 - 11:40 pm UTC

Hi Tom,
I am trying to answer all your questions - though not in the same order.
We did not have full scans. But I thought that if the active - i.e. current year's - data is in a different partition, and we ensure the queries filter on the partition key (which they do), it would help. We made the indexes local.
The second time the query is executed, it does come back very fast - i.e. physical IO does seem to contribute to the delay.
TKPROF -
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID DOMPARAMS (cr=3 pr=0 pw=0 time=0 us cost=2 size=61 card=1)
      1   INDEX UNIQUE SCAN DOMPARAMS_PK (cr=2 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 245981)
      1  TABLE ACCESS BY INDEX ROWID DOMPARAMS (cr=3 pr=0 pw=0 time=0 us cost=2 size=61 card=1)
      1   INDEX UNIQUE SCAN DOMPARAMS_PK (cr=2 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 245981)
      0  TABLE ACCESS BY INDEX ROWID DOMPARAMS (cr=0 pr=0 pw=0 time=0 us cost=2 size=61 card=1)
      0   INDEX UNIQUE SCAN DOMPARAMS_PK (cr=0 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 245981)
      0  TABLE ACCESS BY INDEX ROWID DOMPARAMS (cr=0 pr=0 pw=0 time=0 us cost=2 size=61 card=1)
      0   INDEX UNIQUE SCAN DOMPARAMS_PK (cr=0 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 245981)
      0  TABLE ACCESS BY INDEX ROWID MEMBERACTIVITY (cr=0 pr=0 pw=0 time=0 us cost=3 size=32 card=1)
      0   INDEX UNIQUE SCAN MEMBERACTIVITY_PK (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)(object id 246166)
      0  TABLE ACCESS BY INDEX ROWID DOMPARAMS (cr=0 pr=0 pw=0 time=0 us cost=2 size=61 card=1)
      0   INDEX UNIQUE SCAN DOMPARAMS_PK (cr=0 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 245981)
   1688  SORT AGGREGATE (cr=3 pr=0 pw=0 time=0 us)
      0   NESTED LOOPS  (cr=3 pr=0 pw=0 time=0 us cost=2 size=61 card=1)
      0    TABLE ACCESS BY INDEX ROWID MEMBERMASTER (cr=3 pr=0 pw=0 time=0 us cost=2 size=30 card=1)
      0     INDEX RANGE SCAN MEMBERMASTER_PK (cr=3 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 246200)
      0    INDEX UNIQUE SCAN PRGMEMSTMACTDTL_PK (cr=0 pr=0 pw=0 time=0 us cost=0 size=31 card=1)(object id 246197)
  13504  SORT ORDER BY (cr=15651 pr=0 pw=0 time=24305 us cost=50 size=10801 card=7)
  13504   NESTED LOOPS OUTER (cr=15651 pr=0 pw=0 time=642539 us cost=49 size=10801 card=7)
  13504    NESTED LOOPS OUTER (cr=15643 pr=0 pw=0 time=602157 us cost=41 size=10458 card=7)
  13504     NESTED LOOPS OUTER (cr=15643 pr=0 pw=0 time=580501 us cost=40 size=10269 card=7)
  13504      VIEW  (cr=11553 pr=0 pw=0 time=515406 us cost=20 size=9961 card=7)
  13504       NESTED LOOPS OUTER (cr=11553 pr=0 pw=0 time=508655 us cost=20 size=10052 card=7)
   1688        VIEW  (cr=8174 pr=0 pw=0 time=58322 us cost=16 size=2802 card=2)
   1688         COUNT  (cr=8165 pr=0 pw=0 time=43982 us)
   1688          FILTER  (cr=8165 pr=0 pw=0 time=40617 us)
   1688           NESTED LOOPS OUTER (cr=8165 pr=0 pw=0 time=41572 us cost=16 size=2362 card=2)
   1688            VIEW  (cr=4918 pr=0 pw=0 time=30968 us cost=12 size=2328 card=2)
   1688             NESTED LOOPS OUTER (cr=4918 pr=0 pw=0 time=30245 us cost=12 size=464 card=2)
   1688              NESTED LOOPS OUTER (cr=4270 pr=0 pw=0 time=24702 us cost=9 size=416 card=2)
   1688               NESTED LOOPS OUTER (cr=668 pr=0 pw=0 time=10242 us cost=6 size=366 card=2)
   1688                TABLE ACCESS BY INDEX ROWID MEMBERACTIVITY (cr=439 pr=0 pw=0 time=4820 us cost=5 size=266 card=2)
   1688                 INDEX RANGE SCAN TCC_INDEX (cr=14 pr=0 pw=0 time=1566 us cost=3 size=0 card=2)(object id 299024)
      0                TABLE ACCESS BY INDEX ROWID PRGMANACCUPDACT (cr=229 pr=0 pw=0 time=0 us cost=1 size=50 card=1)
      0                 INDEX UNIQUE SCAN PRGMANACCUPDACT_PK (cr=229 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 246144)
   1688               TABLE ACCESS BY INDEX ROWID MEMACRACT (cr=3602 pr=0 pw=0 time=0 us cost=2 size=25 card=1)
   1688                INDEX UNIQUE SCAN MEMACRACT_PK (cr=1914 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 246164)
      0              TABLE ACCESS BY INDEX ROWID PRGTIRCHGACT (cr=648 pr=0 pw=0 time=0 us cost=2 size=24 card=1)
      0               INDEX UNIQUE SCAN PRGTIRCHGACT_PK (cr=648 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 246225)
   1688            TABLE ACCESS BY INDEX ROWID ACTIVITYPRG (cr=3247 pr=0 pw=0 time=0 us cost=2 size=17 card=1)
   1688             INDEX UNIQUE SCAN ACRACTPGM_PK (cr=1559 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 245844)
  13504        TABLE ACCESS BY INDEX ROWID PNTTYPEMASTER (cr=3379 pr=0 pw=0 time=0 us cost=3 size=140 card=4)
  13504         INDEX RANGE SCAN PNTTYPEMASTER_PK (cr=3 pr=0 pw=0 time=3920 us cost=1 size=0 card=8)(object id 246216)
   1200      TABLE ACCESS BY INDEX ROWID PRGMEMPNTTXN (cr=4090 pr=0 pw=0 time=0 us cost=3 size=44 card=1)
   1200       INDEX RANGE SCAN TCC_INDEX1 (cr=3702 pr=0 pw=0 time=0 us cost=2 size=0 card=1)(object id 298527)
      0     TABLE ACCESS BY INDEX ROWID COMBNSMST (cr=0 pr=0 pw=0 time=0 us cost=1 size=27 card=1)
      0      INDEX UNIQUE SCAN COMBNSMST_PK (cr=0 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 245953)
      0    TABLE ACCESS BY INDEX ROWID MEMBERACTIVITYOTHLNGDES (cr=8 pr=0 pw=0 time=0 us cost=2 size=49 card=1)
      0     INDEX RANGE SCAN MEMACTMUTLINGDES_PK (cr=8 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 246171)
********************************************************************************

Since the tables are linked on the activity/activity reference number, my thought was to cluster the tables on that column. But the tables involved are big - each one a few hundred MB. So I have concerns about the insert/update performance.
Tom Kyte
September 26, 2012 - 12:11 pm UTC

that says none of your time is spent doing physical IO, I don't see any pr's or pw's.

there are no full scans (so partitioning != (fast=true) for you)


and where is the rest of the tkprof??? the query? the report of parse/fetch/execute? the wait events????

True, I said the row source was important, but I didn't say "forget everything else"

usually people just post the other bits...

Single table hash cluster

A reader, August 14, 2013 - 8:45 am UTC

I'm following an example in your book where I implement a Single Table Hash Cluster when the HASHKEY is a surrogate primary key generated from a sequence, ie. it will never have a hash collision.

As the column is the primary key - I want to declare a constraint to state as such - however, that means it will also create a UNIQUE INDEX to enforce the constraint?

Can that cause the optimizer to use the UNIQUE index when accessing the table? Kind of defeats the purpose of having the cluster to reduce I/O ?
Tom Kyte
August 14, 2013 - 3:20 pm UTC

ie. it will never have a hash collision.


it absolutely would have hash collisions, there is always the chance of a hash collision (by design) in a hash cluster. You are mapping a potentially infinite set of values into a finite number of buckets.

Even if you size your hash cluster at a billion entries and only put in two rows with different hash keys - you have a chance of a hash collision!

however, that means it will also create a UNIQUE INDEX to enforce the constraint?


absolutely


Can that cause the optimizer to use the UNIQUE index when accessing the table? Kind of defeats the purpose of having the cluster to reduce I/O?


no, the optimizer sees the three access paths:

a) full scan
b) index unique scan
c) hash

the optimizer will cost out each one and pick the "cheapest" path - which would almost certainly be hash access. That is what the optimizer does.
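
A minimal sketch of the setup under discussion; the names and sizing below are hypothetical, not from the book:

create cluster lookup_cluster ( id number )
  size 100          -- bytes of cluster space reserved per key
  single table
  hashkeys 100000;  -- expected number of distinct keys

create table lookup
( id  number primary key,   -- the constraint creates a unique index
  val varchar2(40) )
cluster lookup_cluster ( id );

-- "where id = :x" will typically show TABLE ACCESS HASH in the plan,
-- not a scan of the unique index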

Single table hash cluster

A reader, August 16, 2013 - 11:27 am UTC

Thank you.

One more question;

If I had a lookup table where I always go in by primary key (generated from a sequence) - what's the difference between using a Single Table Hash cluster and an IOT? What are the reasons I would choose one over the other?
Tom Kyte
August 28, 2013 - 5:01 pm UTC

single table hash cluster = hash(key) to get file.block and retrieve record in a single IO.


IOT = index range scan, read root, branch (maybe more than one branch) and then leaf block to get row. probably 3 or 4 single block IO's.


so, hash cluster = cpu(to hash key) + one single block IO

IOT = three/four single block IO's, where everyone is trying to read the same root block (contention for cache buffers chains latch possible).
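
For comparison, a sketch of the IOT alternative for the same hypothetical lookup table - no up-front sizing decision is needed and the structure supports range scans, at the cost of the root/branch/leaf reads described above:

create table lookup_iot
( id  number primary key,
  val varchar2(40) )
organization index;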



Single table hash cluster Vs IOT

A reader, August 30, 2013 - 12:21 pm UTC

Thank you.

I was perhaps thinking you would use an IOT for a lookup table where you don't know with certainty how many rows there are going to be? Looking at your book again, that would appear logical, as you state that hash clusters are suitable when "you know with a good degree of accuracy how many rows the table will have..."

Following on from that, another reason would be when you need to perform range scanning of the cluster key?

Think I kind of answered my own question based on the "opposites" theories for using hash clusters in your book!

View Sizing Post Loading

Martin Rose, August 18, 2015 - 6:56 pm UTC

Tom, is there a way of seeing how well you've sized a hash cluster after you've loaded it up?

I have a single table hash cluster with a fixed, never-changing number of rows. I thought I had correctly sized it, but the hash cluster version of the data takes up 10G, whilst the IOT version takes just 300K!

Clearly I've sized it wrongly, but how can I see where I've gone wrong?
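
One way to investigate (a sketch, not from the thread - the cluster name is a placeholder): a hash cluster pre-allocates space for roughly HASHKEYS * SIZE bytes when it is created, regardless of how many rows you actually load, so comparing the dictionary's definition of the cluster against its segment size usually shows where the estimate went wrong:

-- a 10G segment holding 300K of data generally means HASHKEYS
-- and/or SIZE were set far larger than the data required
select c.hashkeys, c.key_size, s.bytes
  from user_clusters c, user_segments s
 where s.segment_name = c.cluster_name
   and c.cluster_name = 'MY_CLUSTER';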