Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, OP.

Asked: January 22, 2001 - 12:45 am UTC

Last updated: June 07, 2007 - 3:19 pm UTC

Version: 8.1.6.0

Viewed 1000+ times

You Asked

Dear Tom,

Hi !

Q) Is clustrering and Cluster Index are same things ?
How a cluseter index can improve the storage space.
Kindly expalin the scenarios where creating a cluster
index will benefit in the database.


Kindly give the syntax for the creation of CLUSTER
and HASH INDEX ?



Thanks and Regards
OP Lalwani


and Tom said...

A cluster index is needed in support of a cluster. One is used to allow the other to exist. Much like we need an index in support of a primary key. They are not the same things.

Clusters are useful in the database to store related pieces of information from more then 1 table in the same physical database block. It in effect stores data "prejoined". I can use this technique to store all of the data from the DEPT and EMP table for a given DEPTNO on the same block so that all employees of department 10 as well as the department 10 master record are all physically stored on the same exact block. When I go to "join" this data -- it is already done for me -- in a very few IOs I can get all of the data I need. Using conventional tables, this data could be scattered onto many dozens of blocks.

Clusters are useful when you want data with the same cluster key values to be physically stored near eachother.

Hash clusters are useful when you always access the data by primary key. For example, say you have some data in a table T and you ALWAYS query:

select * from T where id = :x;

That might be a good candidate for a hash cluster since there will be no index needed. Oracle will hash the value of :x into a physical address and go right to the data. No index range scan, just a table access by the hash value.

The cluster syntax can be viewed at:
</code> http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76989/ch4e6.htm#13140 <code>

it gives examples of creating a cluster, the cluster key index, and the tables in the cluster.

Also, with a hash cluster there is no index by definition so no syntax to inspect. We hash the cluster key to determine where the data goes -- we do not index it. the data is the index.



Rating

  (12 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Helena Markova, November 16, 2001 - 2:20 am UTC


why not use cluster always?

Thiyag, October 13, 2004 - 8:06 am UTC

Hi Tom,

Why not use cluster always? In other if the cluster does not qualify the below criteria cluster is not to be used;

1. Have lot of DML activity
2. Have non-uniform data.

Can you let us know why? Does cluster backfire anyway? Any idea if the usage of cluster in MS-Sql server is different from Oracle?


Tom Kyte
October 13, 2004 - 8:38 am UTC

do you have Expert One on One Oracle?

In there I spend many pages explaining what they are, how to use them and in fact how they are very different from the clustered index in sybase (later bought by MS)

Which of your books should I read first.

Stewart W. Bryson, February 25, 2005 - 3:22 pm UTC

I'm a data warehouse DBA (9.2 and 10) working mostly on tuning. Which one would benefit me the most. You seem to reference Expert One on One more in the threads I read, but I wanted to make sure.

Thanks.

Tom Kyte
February 25, 2005 - 6:44 pm UTC

design, best practices: Effective Oracle by Design (9ir2 and before)

examples, factual, text book like: Expert one on one Oracle (8ir3 and before)

I won't have all of the new 9i/10g stuff in Expert until end of this year.

Advice

Stewart W. Bryson, February 28, 2005 - 10:48 am UTC

If you didn't have to take our questions all day long, maybe you get get the new version out quicker.

Thanks as usual.

Tom Kyte
February 28, 2005 - 10:54 am UTC

where do you think I get material for the books from :)

hash vs index cluster

Kumar, May 12, 2005 - 10:03 am UTC

Does oracle preallocate space for hash cluster and not for index cluster? If so, one has to estimate really well the space for hash cluster and that may not be necessary for index cluster? Is this a fair statement? Thanks.

Tom Kyte
May 12, 2005 - 1:15 pm UTC

hash cluster --> preallocated based on number of keys and key size.

index cluster --> not preallocated

in both cases, you want to get the key size (amount of data associated with a given key) as good as possible, but with hash clusters understanding the uppper bound on the ultimate number of keys is very relevant, yes.

migrate sql server clustered index

A reader, December 22, 2005 - 3:13 am UTC

Hi

I am migrating a SQL Server 2000 Application to Oracle 10gR2. There are tons of clustered index in this application I was wondering how to transform that to Oracle type of clustered index?

Is it an IOT? A index cluster, a hash cluster, a sort hash cluster? I think it´s more like an IOT however I am not sure if its a good idea change over 50 tables to IOT....

Tom Kyte
December 22, 2005 - 10:44 am UTC

answer this very very important question first:

why did you use the clustered index in sqlserver. Was the primary technical reason behind using that structure to "permit more than one person at a time to insert into the table"?

migrate sql server clustered index

A reader, December 22, 2005 - 12:40 pm UTC

Hi

The reason is this is a GIS Application and making the index clustered the query runs much faster since the index contains the data the query is looking for.

I looked IOT and it wont work neither since IOT only supports primary keys! And these sql server clustered indexes are not used on PK :-(


Tom Kyte
December 22, 2005 - 4:47 pm UTC

well, you can create "just indexes" in Oracle to do that - create index i on t(a,b,c)

And same in sqlserver - the clustered index would not have changed that? It would have stored the data "sorted" on disk.

Is it that you want all of the rows with the same "key" values stored physically together?

migrate sql server clustered index

A reader, December 24, 2005 - 4:28 pm UTC

Hi

I talked with the developers and they say that clustered index store index and the data, same as IOT. They created the index for Spatial queries which accoding to them run much faster. They say the data is clustered in the disk (groupped) so the access time is much better

I think this has to be an IOT but IOT only works with PKs so I think at end of the day Clustered index as such in Oracle does not exist...

I would have to see if create an index with the data would be the solution, but if that is the only Oracle solution then why SQL Server invented clustered index? I am sure they can create composite indexes as well?

Tom Kyte
December 24, 2005 - 5:24 pm UTC

clustered indexes in sqlserver can have a 110% overhead - but anyway.

You are likely looking at a B*TREE CLUSTER, you create an index on the cluster key (say DEPTNO from EMP). There will be an entry in this cluster key once for each distinct key value - it'll point to the block where all of the data that has the same deptno is stored together.

Have you look through the Oracle Concepts Guide? (or if you have access to any of my books - I've written about these physical structures - b*tree clusters, hash clusters, sorted hash clusters, heap tables, external tables, index organized tables ..... in every one)

what do you mean 110$ overhead

A reader, December 25, 2005 - 4:32 pm UTC

Hi

What is the 110% overhead you mentioned in SQL Server clustered index?

Tom Kyte
December 25, 2005 - 4:45 pm UTC

extra storage in subsidiary data structures - such as other indexes. Sort of like secondary indexes on IOT's in Oracle has. Secondary indexes on these structures (iots/clustered indexes in sqlserver) cannot point to "where the data is", they have to store the "index key" in the secondary index and in effect do two index range scans to find the data)

And I believe the index key is stored "twice", once with the index entry in the intermediate index nodes - and again with the row data - so if the key is a sizeable component of the row - and is stored twice...

If the posters goal was to CLUSTER LIKE DATA together by some key, what they are looking for would be the b*tree cluster in Oracle.

Many times - clustered indexes in SQLServer are used to simply spread the data out - to avoid contention. If that was the reason the poster wanted to mimick their behaviour in Oracle - the answer would be "don't - you just need a normal table"



b*tree clusters

A reader, December 25, 2005 - 5:56 pm UTC

Hi

Where is b*tree clusters in the doc explained? Cant find anything :-?


Tom Kyte
December 26, 2005 - 10:08 am UTC

Hash vs. B*Tree cluster

Alen, June 07, 2007 - 12:34 pm UTC

Hi Tom,

in your book "Efficient Oracle by design" i don't understand something - maybe you could clarify. On the pages 393-394 you compared hash clusters vs. b*tree clusters and you wrote:
"If a B*Tree index can increase the buffer cache utilization by simply storing similar data together, a hash cluster can go one step further by eliminating the index blocks altogether."

However, the tkproof report doesn't show that - the numbers in the "query" column are almost the same (49234 vs. 49730), why is that? Since we only access the table blocks, no index blocks, why is the ammount of consistent gets almost the same?

The second thing i don't understand is the number of "fetch" and "rows" - meanwhile in the first and second test they are the same (47620 fetches, 47170 rows) - the third time they are different, but they shouldn't be (47610 fetches, 47160 rows) - 10 fetches less, 10 rows less.
Tom Kyte
June 07, 2007 - 3:19 pm UTC

almost the same is NOT 'the same', it reduced the logical IO's against the data stored in the hash cluster

i had 10 less rows when I created the tables the second time around apparently.

Hash vs. B*Tree cluster

Alen, June 07, 2007 - 5:02 pm UTC

Well I was expecting a large drop in LIO, since the use of HEAP table produced 95930 LIO and the use of B*Tree cluster produced 49730 LIO, thats only 51% of the previous one. Having in mind that "If a B*Tree index can increase the buffer cache utilization by simply storing similar data together, a hash cluster can go one step further by eliminating the index blocks altogether." the result is poor - only 1% LIO less. That is really not a "one step further".

What amazes me more is my test. I wanted to try reproduce your tests, here are my results:
heap table = 38.183 consistent gets
btree cluster = 51.467 consistent gets
hash cluster = 91.845 consistent gets

When I compared yours and mine TKPROF report, i see the difference.

Your's output:
MERGE JOIN CARTESIAN
 TABLE ACCESS HASH USERS_OBJECTS
 BUFFER SORT
  TABLE ACCESS HASH USER_INFO

Mine output:
MERGE JOIN CARTESIAN
 TABLE ACCESS HASH USER_INFO
 BUFFER SORT
  TABLE ACCESS HASH USERS_OBJECTS

See, the tables "switched" the position. I guess that's because of the different data in the tables -> producing different statistics -> different "optimal" plan.

But if you think about it for a second: isn't it ALWAYS better to first access the USER_INFO table, since there is at most one row for a given username (we are using user_info.username = :value), and then join the result to USERS_OBJECTS?

I'm confused...