Skip to Main Content
  • Questions
  • B*tree cluster -or- hash cluster -or- IOT applications?


Question and Answer

Tom Kyte

Thanks for the question, Duke.

Asked: March 10, 2008 - 1:21 pm UTC

Last updated: April 27, 2009 - 11:11 am UTC

Version: 10.2.0

Viewed 1000+ times

You Asked

During your HOTSOS training day Thursday, you covered the various non-heap table possibilities. But I'm still not clear on what applications would be appropriate for each. Can you give me some typical area where each would be considered 'best practice' implementations?

I've thought that IOT was primarily for small, "narrow" lookup tables. Perhaps in a datamart (star schema) -- but not for "wide" (dozens of columns) dimension tables.

I can't imagine using a cluster for a fact and dimension. Perhaps in some OLTP implementation? Clearly Oracle itself uses it internally, but where might we?

Perhaps I'm exhibiting a failure of imagination, but I'm not getting any traction without some concrete examples. Thank you!

and Tom said...

hash cluster - think "lookup, optimized perfect lookup"

If you have lots of "key reads" - you know, "where column = ?" - a hash cluster could retrieve that information *fast* and without having to use the index (reduced contention, less work on retrieval)

evenif "where column = ?" returns LOTS of rows (we would make sure they were all on as FEW blocks as possible).

Think that OLTP system with "where customer = ?", call center with "where caller = ?", HR system with "where empno = ?", that ETL (extract transform load process) with "where zip_code = ?" and so on.

B*Tree's - envision an application that stores "objects" - like the data dictionary. When you create a table, you give us the table name, the columns, their types, constraints, constrained columns, etc.

When we parse (process your data), we need all of that stuff back - together. So, we use a b*tree cluster to store them all together. When you describe table T, we do the minimum number of IO's possible to load the cache (the data is forced to be stored together upon insert). When you describe that table, we use as few blocks as possible to cache that information. We do as little work as possible to retrieve that data - we make sure all of the data from multiple tables are stored together.

Think of any parent child relation where the parent and many/most of the child records are created "at once" (like create table does, like create index does). Think "order/line items" for example.

If you have access to either of Expert one on one Oracle or Expert Oracle Database Architecture - my talk was basically based on those books - there are more examples and more words to be had there as well.


  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment


Example for IOT please

A reader, April 23, 2009 - 8:40 pm UTC

Hi Tom,
Can you please provide an example for IOT please?
Tom Kyte
April 27, 2009 - 11:11 am UTC

create table t ( x int primary key ) organization index;

but, that is probably not what you meant. Here is a short excerpt from Expert Oracle Database Architecture:


Index Organized Tables
Index organized tables (IOTs) are quite simply tables stored in an index structure. Whereas a table stored in a heap is unorganized (i.e., data goes wherever there is available space), data in an IOT is stored and sorted by primary key. IOTs behave just like "regular" tables do as far as your application is concerned; you use SQL to access them as normal. They are especially useful for information retrieval, spatial, and OLAP applications.

What is the point of an IOT? You might ask the converse, actually: what is the point of a heap organized table? Since all tables in a relational database are supposed to have a primary key anyway, isn't a heap organized table just a waste of space? We have to make room for both the table and the index on the primary key of the table when using a heap organized table. With an IOT, the space overhead of the primary key index is removed, as the index is the data, and the data is the index. The fact is that an index is a complex data structure that requires a lot of work to manage and maintain, and the maintenance requirements increase as the width of the row to store increases. A heap, on the other hand, is trivial to manage by comparison. There are efficiencies in a heap organized table over an IOT. That said, IOTs have some definite advantages over their heap counterparts. For example, I remember once building an inverted list index on some textual data (this predated the introduction of interMedia and related technologies). I had a table full of documents, and I would parse the documents and find words within them. I had a table that then looked like this:
create table keywords
( word  varchar2(50),
  position   int,
  doc_id int, 
  primary key(word,position,doc_id)
Here I had a table that consisted solely of columns of the primary key. I had over 100 percent overhead; the size of my table and primary key index were comparable (actually, the primary key index was larger since it physically stored the rowid of the row it pointed to, whereas a rowid is not stored in the table-it is inferred). I only used this table with a WHERE clause on the WORD or WORD and POSITION columns. That is, I never used the table-I used only the index on the table. The table itself was no more than overhead. I wanted to find all documents containing a given word (or "near" another word, and so on). The heap table was useless, and it just slowed down the application during maintenance of the KEYWORDS table and doubled the storage requirements. This is a perfect application for an IOT.

Another implementation that begs for an IOT is a code lookup table. Here you might have ZIP_CODE to STATE lookup, for example. You can now do away with the heap table and just use an IOT itself. Anytime you have a table that you access via its primary key exclusively, it is a candidate for an IOT.

The same would be true if you frequently use BETWEEN queries on a primary or unique key. Having the data stored physically sorted will increase the performance of those queries as well. For example, I maintain a table of stock quotes in my database. Every day, for hundreds of stocks, I gather together the stock ticker, date, closing price, days high, days low, volume, and other related information. The table looks like this:
ops$tkyte@ORA10GR1> create table stocks
  2  ( ticker      varchar2(10),
  3    day         date,
  4    value       number,
  5    change      number,
  6    high        number,
  7    low         number,
  8    vol         number,
  9    primary key(ticker,day)
 10  )
 11  organization index
 12  /
Table created.
I frequently look at one stock at a time for some range of days (e.g., computing a moving average). If I were to use a heap organized table, the probability of two rows for the stock ticker ORCL existing on the same database block are almost zero. This is because every night, I insert the records for the day for all of the stocks. That fills up at least one database block (actually, many of them). Therefore, every day I add a new ORCL record, but it is on a block different from every other ORCL record already in the table. If I query as follows:
Select * from stocks 
 where ticker = 'ORCL' 
   and day between sysdate-100 and sysdate;
Oracle would read the index and then perform table access by rowid to get the rest of the row data. Each of the 100 rows I retrieve would be on a different database block due to the way I load the table-each would probably be a physical I/O. Now consider that I have this same data [NL19]in an IOT. That same query only needs to read the relevant index blocks, and it already has all of the data. Not only is the table access removed, but all of the rows for ORCL in a given range of dates are physically stored "near" each other as well. Less logical I/O and less physical I/O is incurred.