Yes, normal inserts won't use attribute clustering. If you want to apply the clustering, you can do so by moving the table.
The following shows this by creating a table with linear clustering by (c1, c2). But inserting rows in a random order:
create table t (
c1 number, c2 number, junk varchar2(20)
);
alter table t add clustering by linear order ( c1, c2 );
insert into t
select floor ( level / 10 ) c1, mod ( level, 100 ) c2,
lpad ( 'x', 20, 'x' ) junk
from dual
connect by level <= 1000
order by dbms_random.value;
commit;
create index i1 on t ( c1 );
create index i2 on t ( c2 );
exec dbms_stats.gather_table_stats ( user, 't' ) ;
select index_name, clustering_factor
from user_indexes
where table_name = 'T';
INDEX_NAME CLUSTERING_FACTOR
I1 430
I2 431
Notice that the clustering factor is about the same for each column.
When you move the table, the attribute clustering kicks in. So now the rows are sorted by c1, then c2. So the clustering factor for c1 is "perfect":
alter table t move online update indexes;
exec dbms_stats.gather_table_stats ( user, 't' ) ;
select index_name, clustering_factor
from user_indexes
where table_name = 'T';
INDEX_NAME CLUSTERING_FACTOR
I1 5
I2 496
Note the online move is a 12.2 option.
IOTs and table clusters always insert rows in the "correct" place. Regardless of your insert method.
Despite this, attribute clustering has a couple of advantages.
You can add clustering to an existing table. To convert a table to an IOT or add it to a table cluster, you need to rebuild it.
Interleaved clustering also allows you to have a "hybrid" clustering.
Often when you sort rows by one column, they become less sorted on the other columns. Interleaved clustering sorts the data in a Z-order curve. So while neither column will have perfect clustering, the combined order is likely to be better than sorting by one or the other.
You can see this by switching the clustering from linear to interleaved and moving the table again:
alter table t drop clustering ;
alter table t add clustering by interleaved order ( c1, c2 ) ;
alter table t move online update indexes;
exec dbms_stats.gather_table_stats ( user, 't' ) ;
select index_name, clustering_factor
from user_indexes
where table_name = 'T';
INDEX_NAME CLUSTERING_FACTOR
I1 40
I2 221
While the clustering factor for I1 has gone up (5 -> 40), it's halved for I2. The overall clustering factor is much lower than we started with (400+ for both indexes).
So why use IOTs or table clusters?
IOTsThese merge the primary key index and table data into one structure. So want you search by primary key, you only need to access the index. Whereas a heap table you need to access the primary key index, then the table. This often means you're accessing one less block.
Let's add a primary key to the heap table. Then build an IOT clone of it:
alter table t add primary key ( c1, c2 );
create table t_iot (
c1, c2, junk,
primary key ( c1, c2 )
) organization index as
select * from t;
Getting one row by primary key for the heap table uses three gets. Two for the index, one for the table:
set serveroutput off
alter session set statistics_level = all;
select * from t
where c1 = 1
and c2 = 10;
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 5ryynu02p2734, child number 0
-------------------------------------
select * from t where c1 = 1 and c2 = 10
Plan hash value: 635963422
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 1 | 1 |00:00:00.01 | 3 |
|* 2 | INDEX UNIQUE SCAN | SYS_C008324 | 1 | 1 | 1 |00:00:00.01 | 2 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C1"=1 AND "C2"=10)
But getting the same row from the IOT only uses two gets. There's no extra table access:
select * from t_iot
where c1 = 1
and c2 = 10;
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID gpjv7d0384n8d, child number 0
-------------------------------------
select * from t_iot where c1 = 1 and c2 = 10
Plan hash value: 3821691147
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 2 |
|* 1 | INDEX UNIQUE SCAN| SYS_IOT_TOP_75219 | 1 | 1 | 1 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C1"=1 AND "C2"=10)
Because IOTs store the rows sorted by the primary key, this can help when you have multi-column primary keys. And you (almost always) query the table with an equality (=) on the leading column of the PK.
For example, a common use case is to show customers their recent orders/transactions. An IOT with a primary key of:
( customer_id, transaction_datetime )
Stores all rows for a given customer "next" to each other, sorted by date. So queries like:
select *
from orders
where customer_id = :cust
and transaction_datetime > sysdate - 30;
Will be able to scan this efficiently. Whereas with a heap, the transactions could be spread throughout the table.
create table t_iot (
c1, c2, junk,
primary key ( c1, c2 )
) organization index as
select * from t;
select * from t_iot
where c1 = 1;
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 5ku20fvctdnjq, child number 0
-------------------------------------
select * from t_iot where c1 = 1
Plan hash value: 1749615820
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 2 |
|* 1 | INDEX RANGE SCAN| SYS_IOT_TOP_137270 | 1 | 10 | 10 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("C1"=1)
Table clustersThese are more about locating rows from separate tables in the same place. When you join non-clustered tables, the rows for each table will be in separate blocks. If you cluster on the join key, then the rows are in the same block.
For example, let's add a second table. This is the parent table for t, with a single column PK:
create table t2 ( c1 primary key, junk ) as
select distinct c1, junk from t;
The rows for these exist in separate blocks. And joing for a given t2 row does five gets:
select t1.rowid, t2.rowid, t1.junk, t2.junk,
dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,
dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk
from t1_clus t1
join t2_clus t2
on t1.c1 = t2.c1
where t2.c1 = 1;
ROWID ROWID JUNK JUNK T1_BLK T2_BLK
AAASXKAAMAAAAJ7AAZ AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7AAa AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7AAb AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7AAc AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7AAd AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7AA9 AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7AA+ AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7AA/ AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7ABA AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
AAASXKAAMAAAAJ7ABB AAASXVAAMAAAAL7AAf xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx 635 763
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 8sqzjv6qmcytn, child number 0
-------------------------------------
select t1.rowid, t2.rowid, t1.junk, t2.junk,
dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,
dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk from t t1 join t2
t2 on t1.c1 = t2.c1 where t2.c1 = 1
Plan hash value: 731811485
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 5 |
| 1 | NESTED LOOPS | | 1 | 10 | 10 |00:00:00.01 | 5 |
| 2 | TABLE ACCESS BY INDEX ROWID | T2 | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 3 | INDEX UNIQUE SCAN | SYS_C008326 | 1 | 1 | 1 |00:00:00.01 | 1 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 10 | 10 |00:00:00.01 | 3 |
|* 5 | INDEX RANGE SCAN | I1 | 1 | 10 | 10 |00:00:00.01 | 2 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T2"."C1"=1)
5 - access("T1"."C1"=1)
If we place these tables in hash cluster instead, then all the rows for each C1 are in the same block. And the join does just 2 gets:
create cluster clus (
c1 number
) hashkeys 1024;
create table t1_clus (
c1 number, c2 number, junk varchar2(20)
) cluster clus ( c1 );
create table t2_clus (
c1 number, junk varchar2(20)
) cluster clus ( c1 );
insert all
when rn = 1 then into t2_clus values ( c1, junk )
when 1=1 then into t1_clus values ( c1, c2, junk )
select t.*, row_number () over ( partition by c1 order by c2 ) rn
from t
order by c1, c2;
commit;
exec dbms_stats.gather_table_stats ( user, 't1_clus' ) ;
exec dbms_stats.gather_table_stats ( user, 't2_clus' ) ;
select t1.rowid, t2.rowid, t1.junk, t2.junk,
dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,
dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk
from t1_clus t1
join t2_clus t2
on t1.c1 = t2.c1
where t2.c1 = 1;
select *
from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
SQL_ID 1mnxd2yqptkqv, child number 0
-------------------------------------
select t1.rowid, t2.rowid, t1.junk, t2.junk,
dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,
dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk from t1_clus t1
join t2_clus t2 on t1.c1 = t2.c1 where t2.c1 = 1
Plan hash value: 3541621931
------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 2 |
| 1 | MERGE JOIN CARTESIAN| | 1 | 10 | 10 |00:00:00.01 | 2 |
|* 2 | TABLE ACCESS HASH | T2_CLUS | 1 | 1 | 1 |00:00:00.01 | 1 |
| 3 | BUFFER SORT | | 1 | 10 | 10 |00:00:00.01 | 1 |
|* 4 | TABLE ACCESS HASH | T1_CLUS | 1 | 10 | 10 |00:00:00.01 | 1 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T2"."C1"=1)
4 - access("T1"."C1"=1)
Because it's a hash cluster, the database can locate rows using a hash function. No index needed!
So if table clusters are so great, why are they so rare?
Well, aside from most poeople not knowing about them, hash clusters have a big drawback.
You need to be able to estimate how big each row will be and the total number of rows. This is so you can set the number of hashkeys large enough to avoid collisions. Which make them less efficient. How often can you do that?
Index clusters avoid the problem of needing to know the total number of rows. But make full table scans slower. And have other caveats.
That said, there is a type of cluster I think should be much more widely used:
Single table hash clusters.
These are hash clusters you can only store one table in. These are perfect for small lookup tables for things like status codes, clustered on their primary key.
Here the data rarely changes. Even when you do the total number of rows will stay small. While full scans may be slower, the table is small enough it doesn't matter. Primary key lookups can locate the row by hash, saving you that little bit of work.