Thanks for the question, Lal.

Asked: September 19, 2018 - 4:19 am UTC

Last updated: February 14, 2023 - 2:06 pm UTC

Version: 12.2.0.1

Viewed 1000+ times

You Asked

Hi,

The attribute clustering works only for the direct path inserts?
Will it support the normal inserts? If not whats the way to have the data in table stored in an ordered way especially for normal inserts (not direct path).

Can you explain whether index cluster supports the same, also please shed some light on differences between IOT,Index Cluster,Hash cluster and when to use each.

Thanks very much for your time

and Chris said...

Yes, normal inserts won't use attribute clustering. If you want to apply the clustering, you can do so by moving the table.

The following shows this by creating a table with linear clustering by (c1, c2). But inserting rows in a random order:

create table t (
  c1 number, c2 number, junk varchar2(20)
);

alter table t add clustering by linear order ( c1, c2 );

insert into t 
  select floor ( level / 10 ) c1, mod ( level, 100 ) c2,
         lpad ( 'x', 20, 'x' ) junk
  from   dual
  connect by level <= 1000
  order  by dbms_random.value;
commit;
  
create index i1 on t ( c1 );
create index i2 on t ( c2 );

exec dbms_stats.gather_table_stats ( user, 't' ) ;

select index_name, clustering_factor 
from   user_indexes
where  table_name = 'T';

INDEX_NAME   CLUSTERING_FACTOR   
I1                           430 
I2                           431

Notice that the clustering factor is about the same for each column.

When you move the table, the attribute clustering kicks in. So now the rows are sorted by c1, then c2. So the clustering factor for c1 is "perfect":

alter table t move online update indexes;

exec dbms_stats.gather_table_stats ( user, 't' ) ;

select index_name, clustering_factor 
from   user_indexes
where  table_name = 'T';

INDEX_NAME   CLUSTERING_FACTOR   
I1                             5 
I2                           496

Note the online move is a 12.2 option.

IOTs and table clusters always insert rows in the "correct" place. Regardless of your insert method.

Despite this, attribute clustering has a couple of advantages.

You can add clustering to an existing table. To convert a table to an IOT or add it to a table cluster, you need to rebuild it.

Interleaved clustering also allows you to have a "hybrid" clustering.

Often when you sort rows by one column, they become less sorted on the other columns. Interleaved clustering sorts the data in a Z-order curve. So while neither column will have perfect clustering, the combined order is likely to be better than sorting by one or the other.

You can see this by switching the clustering from linear to interleaved and moving the table again:

alter table t drop clustering ;
alter table t add clustering by interleaved order ( c1, c2 ) ;
alter table t move online update indexes;

exec dbms_stats.gather_table_stats ( user, 't' ) ;

select index_name, clustering_factor 
from   user_indexes
where  table_name = 'T';

INDEX_NAME   CLUSTERING_FACTOR   
I1                            40 
I2                           221

While the clustering factor for I1 has gone up (5 -> 40), it's halved for I2. The overall clustering factor is much lower than we started with (400+ for both indexes).

So why use IOTs or table clusters?

IOTs

These merge the primary key index and table data into one structure. So want you search by primary key, you only need to access the index. Whereas a heap table you need to access the primary key index, then the table. This often means you're accessing one less block.

Let's add a primary key to the heap table. Then build an IOT clone of it:

alter table t add primary key ( c1, c2 );

create table t_iot ( 
  c1, c2, junk, 
  primary key ( c1, c2 )
) organization index as 
  select * from t;

Getting one row by primary key for the heap table uses three gets. Two for the index, one for the table:

set serveroutput off
alter session set statistics_level = all;

select * from t
where  c1 = 1
and    c2 = 10;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                       
SQL_ID  5ryynu02p2734, child number 0                                                                   
-------------------------------------                                                                   
select * from t where  c1 = 1 and    c2 = 10                                                            
                                                                                                        
Plan hash value: 635963422                                                                              
                                                                                                        
-----------------------------------------------------------------------------------------------------   
| Id  | Operation                   | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
-----------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT            |             |      1 |        |      1 |00:00:00.01 |       3 |   
|   1 |  TABLE ACCESS BY INDEX ROWID| T           |      1 |      1 |      1 |00:00:00.01 |       3 |   
|*  2 |   INDEX UNIQUE SCAN         | SYS_C008324 |      1 |      1 |      1 |00:00:00.01 |       2 |   
-----------------------------------------------------------------------------------------------------   
                                                                                                        
Predicate Information (identified by operation id):                                                     
---------------------------------------------------                                                     
                                                                                                        
   2 - access("C1"=1 AND "C2"=10)

But getting the same row from the IOT only uses two gets. There's no extra table access:

select * from t_iot
where  c1 = 1
and    c2 = 10;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                   
SQL_ID  gpjv7d0384n8d, child number 0                                                               
-------------------------------------                                                               
select * from t_iot where  c1 = 1 and    c2 = 10                                                    
                                                                                                    
Plan hash value: 3821691147                                                                         
                                                                                                    
-------------------------------------------------------------------------------------------------   
| Id  | Operation         | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
-------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT  |                   |      1 |        |      1 |00:00:00.01 |       2 |   
|*  1 |  INDEX UNIQUE SCAN| SYS_IOT_TOP_75219 |      1 |      1 |      1 |00:00:00.01 |       2 |   
-------------------------------------------------------------------------------------------------   
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   1 - access("C1"=1 AND "C2"=10)

Because IOTs store the rows sorted by the primary key, this can help when you have multi-column primary keys. And you (almost always) query the table with an equality (=) on the leading column of the PK.

For example, a common use case is to show customers their recent orders/transactions. An IOT with a primary key of:

( customer_id, transaction_datetime )

Stores all rows for a given customer "next" to each other, sorted by date. So queries like:

select * 
from   orders
where  customer_id = :cust
and    transaction_datetime > sysdate - 30;

Will be able to scan this efficiently. Whereas with a heap, the transactions could be spread throughout the table.

create table t_iot ( 
  c1, c2, junk, 
  primary key ( c1, c2 )
) organization index as 
  select * from t;
  
select * from t_iot
where  c1 = 1;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                   
SQL_ID  5ku20fvctdnjq, child number 0                                                               
-------------------------------------                                                               
select * from t_iot where  c1 = 1                                                                   
                                                                                                    
Plan hash value: 1749615820                                                                         
                                                                                                    
-------------------------------------------------------------------------------------------------   
| Id  | Operation        | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
-------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT |                    |      1 |        |     10 |00:00:00.01 |       2 |   
|*  1 |  INDEX RANGE SCAN| SYS_IOT_TOP_137270 |      1 |     10 |     10 |00:00:00.01 |       2 |   
-------------------------------------------------------------------------------------------------   
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   1 - access("C1"=1)

Table clusters

These are more about locating rows from separate tables in the same place. When you join non-clustered tables, the rows for each table will be in separate blocks. If you cluster on the join key, then the rows are in the same block.

For example, let's add a second table. This is the parent table for t, with a single column PK:

create table t2 ( c1 primary key, junk ) as 
  select distinct c1, junk from t;

The rows for these exist in separate blocks. And joing for a given t2 row does five gets:

select t1.rowid, t2.rowid, t1.junk, t2.junk, 
       dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,
       dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk
from   t1_clus t1
join   t2_clus t2
on     t1.c1 = t2.c1
where  t2.c1 = 1;

ROWID                ROWID                JUNK                   JUNK                   T1_BLK   T2_BLK   
AAASXKAAMAAAAJ7AAZ   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7AAa   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7AAb   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7AAc   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7AAd   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7AA9   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7AA+   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7AA/   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7ABA   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 
AAASXKAAMAAAAJ7ABB   AAASXVAAMAAAAL7AAf   xxxxxxxxxxxxxxxxxxxx   xxxxxxxxxxxxxxxxxxxx        635      763 

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                                
SQL_ID  8sqzjv6qmcytn, child number 0                                                                            
-------------------------------------                                                                            
select t1.rowid, t2.rowid, t1.junk, t2.junk,                                                                     
dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,                                                               
dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk from   t t1 join   t2                                          
t2 on     t1.c1 = t2.c1 where  t2.c1 = 1                                                                         
                                                                                                                 
Plan hash value: 731811485                                                                                       
                                                                                                                 
--------------------------------------------------------------------------------------------------------------   
| Id  | Operation                            | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
--------------------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT                     |             |      1 |        |     10 |00:00:00.01 |       5 |   
|   1 |  NESTED LOOPS                        |             |      1 |     10 |     10 |00:00:00.01 |       5 |   
|   2 |   TABLE ACCESS BY INDEX ROWID        | T2          |      1 |      1 |      1 |00:00:00.01 |       2 |   
|*  3 |    INDEX UNIQUE SCAN                 | SYS_C008326 |      1 |      1 |      1 |00:00:00.01 |       1 |   
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T           |      1 |     10 |     10 |00:00:00.01 |       3 |   
|*  5 |    INDEX RANGE SCAN                  | I1          |      1 |     10 |     10 |00:00:00.01 |       2 |   
--------------------------------------------------------------------------------------------------------------   
                                                                                                                 
Predicate Information (identified by operation id):                                                              
---------------------------------------------------                                                              
                                                                                                                 
   3 - access("T2"."C1"=1)                                                                                       
   5 - access("T1"."C1"=1)

If we place these tables in hash cluster instead, then all the rows for each C1 are in the same block. And the join does just 2 gets:

create cluster clus (
  c1 number
) hashkeys 1024;

create table t1_clus (
  c1 number, c2 number, junk varchar2(20)
) cluster clus ( c1 );

create table t2_clus (
  c1 number, junk varchar2(20)
) cluster clus ( c1 );

insert all
  when rn = 1 then into t2_clus values ( c1, junk )
  when 1=1 then into t1_clus values ( c1, c2, junk )
  select t.*, row_number () over ( partition by c1 order by c2 ) rn 
  from   t
  order  by c1, c2;
commit;

exec dbms_stats.gather_table_stats ( user, 't1_clus' ) ;
exec dbms_stats.gather_table_stats ( user, 't2_clus' ) ;

select t1.rowid, t2.rowid, t1.junk, t2.junk, 
       dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,
       dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk
from   t1_clus t1
join   t2_clus t2
on     t1.c1 = t2.c1
where  t2.c1 = 1;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                            
SQL_ID  1mnxd2yqptkqv, child number 0                                                        
-------------------------------------                                                        
select t1.rowid, t2.rowid, t1.junk, t2.junk,                                                 
dbms_rowid.rowid_block_number ( t1.rowid ) t1_blk,                                           
dbms_rowid.rowid_block_number ( t2.rowid ) t2_blk from   t1_clus t1                          
join   t2_clus t2 on     t1.c1 = t2.c1 where  t2.c1 = 1                                      
                                                                                             
Plan hash value: 3541621931                                                                  
                                                                                             
------------------------------------------------------------------------------------------   
| Id  | Operation            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT     |         |      1 |        |     10 |00:00:00.01 |       2 |   
|   1 |  MERGE JOIN CARTESIAN|         |      1 |     10 |     10 |00:00:00.01 |       2 |   
|*  2 |   TABLE ACCESS HASH  | T2_CLUS |      1 |      1 |      1 |00:00:00.01 |       1 |   
|   3 |   BUFFER SORT        |         |      1 |     10 |     10 |00:00:00.01 |       1 |   
|*  4 |    TABLE ACCESS HASH | T1_CLUS |      1 |     10 |     10 |00:00:00.01 |       1 |   
------------------------------------------------------------------------------------------   
                                                                                             
Predicate Information (identified by operation id):                                          
---------------------------------------------------                                          
                                                                                             
   2 - access("T2"."C1"=1)                                                                   
   4 - access("T1"."C1"=1)

Because it's a hash cluster, the database can locate rows using a hash function. No index needed!

So if table clusters are so great, why are they so rare?

Well, aside from most poeople not knowing about them, hash clusters have a big drawback.

You need to be able to estimate how big each row will be and the total number of rows. This is so you can set the number of hashkeys large enough to avoid collisions. Which make them less efficient. How often can you do that?

Index clusters avoid the problem of needing to know the total number of rows. But make full table scans slower. And have other caveats.

That said, there is a type of cluster I think should be much more widely used:

Single table hash clusters.

These are hash clusters you can only store one table in. These are perfect for small lookup tables for things like status codes, clustered on their primary key.

Here the data rarely changes. Even when you do the total number of rows will stay small. While full scans may be slower, the table is small enough it doesn't matter. Primary key lookups can locate the row by hash, saving you that little bit of work.

Rating

(5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Attribute clustering in Data modeller.

Rajeshwaran, Jeyabal, September 28, 2020 - 8:14 am UTC

Team:

How to define Attribute clustering in SQL Developer Data modeller - "Physical" modelling section. (was using sql developer 19.4 version for this)

Here is what i did, got to Physical model, select the table - do right click and select "properties"

In that "general" tab, dont see an option for defining/enabling "attribute clustering" at table level,
in the next tab "Clustering columns" section all got completely disable, what should we do to have it enabled?

September 29, 2020 - 3:57 am UTC

Replicated your experience with 20.2 as well.

Checking with the product team and will get back to you.

Attribute clustering in Data modeller.

Rajeshwaran Jeyabal, March 31, 2022 - 11:26 am UTC

Team,

Tried with the SQL Developer Data modeller version 21.4.3 and dont find the option for defining the attribute clustering in the physical model yet, any updates to this from product team on this?

April 05, 2022 - 5:18 am UTC

Have asked the product team and will update when I get a response

Attribute clustering Vs Automatic Indexing(AI)

Rajeshwaran Jeyabal, March 31, 2022 - 12:00 pm UTC

Team,

The below demo was from Oracle 19c (19.13) running on Exadata cloud@customer platform.

though we got attribute clustering defined on the table, why automatic indexing got poor clustering factor
when compared to manual indexing?

please do let us know if this has to be raised as new question rather than a follow up here

ai_demo@PDB19> create table demo015
  2  nologging
  3  as
  4  select rownum as id,
  5        mod(rownum,100)+1 as C1 ,
  6        a.*
  7  from all_objects a, all_users b,
  8         all_users c
  9  where rownum <= 10000000;

Table created.

ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 57;

MAX(OBJECT_ID)   COUNT(*)
-------------- ----------
          6953     100000

ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 57;

MAX(OBJECT_ID)   COUNT(*)
-------------- ----------
          6953     100000

ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 57;

MAX(OBJECT_ID)   COUNT(*)
-------------- ----------
          6953     100000

ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 57;


Execution Plan
----------------------------------------------------------
Plan hash value: 1474407985

--------------------------------------------------------------------------------------
| Id  | Operation                  | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |         |     1 |     7 | 51034   (1)| 00:00:02 |
|   1 |  SORT AGGREGATE            |         |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS STORAGE FULL| DEMO015 |   100K|   683K| 51034   (1)| 00:00:02 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("C1"=57)
       filter("C1"=57)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     187097  consistent gets
     187091  physical reads
          0  redo size
        428  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ai_demo@PDB19> set autotrace off
ai_demo@PDB19> 
ai_demo@PDB19> host timeout /T 900

ai_demo@PDB19> select dbms_auto_index.report_activity( activity_start=> systimestamp - 1/24 ) report from dual;

REPORT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
 Activity start               : 29-MAR-2022 05:15:12
 Activity end                 : 29-MAR-2022 06:15:13
 Executions completed         : 4
 Executions interrupted       : 0
 Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
 Index candidates            : 4
 Indexes created             : 0
 Space used                  : 0 B
 Indexes dropped             : 0
 SQL statements verified     : 0
 SQL statements improved     : 0
 SQL plan baselines created  : 0
 Overall improvement factor  : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
 Unused indexes    : 0
 Space used        : 0 B
 Unusable indexes  : 0
-------------------------------------------------------------------------------

ERRORS
---------------------------------------------------------------------------------------------
No errors found.
---------------------------------------------------------------------------------------------


ai_demo@PDB19> select table_name,index_name,tablespace_name,clustering_factor,
  2        status,visibility,indexing,auto
  3  from  user_indexes
  4  where table_name = 'DEMO015'
  5  order by 1,2;

TABLE_NAME      INDEX_NAME                TABLESPACE_NAME           CLUSTERING_FACTOR STATUS   VISIBILIT INDEXIN AUT
--------------- ------------------------- ------------------------- ----------------- -------- --------- ------- ---
DEMO015         SYS_AI_cvby0u1d95hd2      TS_INDEX_DEMO                       4234524 UNUSABLE INVISIBLE FULL    YES

ai_demo@PDB19> select table_name,index_name,column_name,column_position
  2  from user_ind_columns
  3  where table_name = 'DEMO015'
  4  order by 1,2,3;

TABLE_NAME      INDEX_NAME                COLUMN_NAM COLUMN_POSITION
--------------- ------------------------- ---------- ---------------
DEMO015         SYS_AI_cvby0u1d95hd2      C1                       1

ai_demo@PDB19> alter table demo015 add clustering by linear order(c1);

Table altered.

ai_demo@PDB19> alter table demo015 move online;

Table altered.

ai_demo@PDB19> select table_name,index_name,tablespace_name,clustering_factor,
  2        status,visibility,indexing,auto
  3  from  user_indexes
  4  where table_name = 'DEMO015'
  5  order by 1,2;

no rows selected

ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 21;

MAX(OBJECT_ID)   COUNT(*)
-------------- ----------
          6953     100000

ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 22;

MAX(OBJECT_ID)   COUNT(*)
-------------- ----------
          6953     100000

ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 23;

MAX(OBJECT_ID)   COUNT(*)
-------------- ----------
          6953     100000

ai_demo@PDB19> host timeout /T 900

ai_demo@PDB19> select dbms_auto_index.report_activity( activity_start=> systimestamp - 1/24 ) report from dual;

REPORT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
 Activity start               : 29-MAR-2022 08:36:24
 Activity end                 : 29-MAR-2022 09:36:24
 Executions completed         : 4
 Executions interrupted       : 0
 Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
 Index candidates            : 5
 Indexes created             : 0
 Space used                  : 0 B
 Indexes dropped             : 0
 SQL statements verified     : 0
 SQL statements improved     : 0
 SQL plan baselines created  : 0
 Overall improvement factor  : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
 Unused indexes    : 0
 Space used        : 0 B
 Unusable indexes  : 0
-------------------------------------------------------------------------------

ERRORS
---------------------------------------------------------------------------------------------
No errors found.
---------------------------------------------------------------------------------------------


ai_demo@PDB19> select table_name,index_name,tablespace_name,clustering_factor,
  2        status,visibility,indexing,auto
  3  from  user_indexes
  4  where table_name = 'DEMO015'
  5  order by 1,2;

TABLE_NAME      INDEX_NAME                TABLESPACE_NAME           CLUSTERING_FACTOR STATUS   VISIBILIT INDEXIN AUT
--------------- ------------------------- ------------------------- ----------------- -------- --------- ------- ---
DEMO015         SYS_AI_cvby0u1d95hd2      TS_INDEX_DEMO                       4234490 UNUSABLE INVISIBLE FULL    YES

ai_demo@PDB19> select table_name,index_name,column_name,column_position
  2  from user_ind_columns
  3  where table_name = 'DEMO015'
  4  order by 1,2,3;

TABLE_NAME      INDEX_NAME                COLUMN_NAM COLUMN_POSITION
--------------- ------------------------- ---------- ---------------
DEMO015         SYS_AI_cvby0u1d95hd2      C1                       1

ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 37;


Execution Plan
----------------------------------------------------------
Plan hash value: 1474407985

-------------------------------------------------------------------------------------
| Id  | Operation                  | Name    | Rows  | Bytes | Cost (%CPU)| Time
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |         |     1 |     7 | 51018   (1)| 00:00:02
|   1 |  SORT AGGREGATE            |         |     1 |     7 |            |
|*  2 |   TABLE ACCESS STORAGE FULL| DEMO015 |   100K|   683K| 51018   (1)| 00:00:02
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("C1"=37)
       filter("C1"=37)


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
     187042  consistent gets
     187033  physical reads
          0  redo size
        428  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ai_demo@PDB19> set autotrace off
ai_demo@PDB19> create index demo015_c1_idx
  2  on demo015(c1)
  3  nologging
  4  compress advanced low;

Index created.

ai_demo@PDB19> select table_name,index_name,column_name,column_position
  2  from user_ind_columns
  3  where table_name = 'DEMO015'
  4  order by 1,2,3;

TABLE_NAME      INDEX_NAME                COLUMN_NAM COLUMN_POSITION
--------------- ------------------------- ---------- ---------------
DEMO015         DEMO015_C1_IDX            C1                       1
DEMO015         SYS_AI_cvby0u1d95hd2      C1                       1

ai_demo@PDB19> select table_name,index_name,tablespace_name,clustering_factor,
  2          status,visibility,indexing,auto
  3  from  user_indexes
  4  where table_name = 'DEMO015'
  5  order by 1,2;

TABLE_NAME      INDEX_NAME                TABLESPACE_NAME           CLUSTERING_FACTOR STATUS   VISIBILIT INDEXIN AUT
--------------- ------------------------- ------------------------- ----------------- -------- --------- ------- ---
DEMO015         DEMO015_C1_IDX            TS_INDEX_DEMO                        187035 VALID    VISIBLE   FULL    NO
DEMO015         SYS_AI_cvby0u1d95hd2      TS_INDEX_DEMO                       4234490 UNUSABLE INVISIBLE FULL    YES

ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select max(object_id), count(*) from demo015 where c1 = 37;


Execution Plan
----------------------------------------------------------
Plan hash value: 3643747066

-------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |     1 |     7 |  2030   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |                |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| DEMO015        |   100K|   683K|  2030   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | DEMO015_C1_IDX |   100K|       |   157   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("C1"=37)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2027  consistent gets
        155  physical reads
      11824  redo size
        428  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ai_demo@PDB19> set autotrace off
ai_demo@PDB19>

April 08, 2022 - 2:26 am UTC

Well..its a black box algorithm, so there's lot of things in play here.

For example, if we build an index that ultimately was not useful due to any reason (eg clustering factor), then we have an internal setting to stops that index being considered again for a certain duration. This stops us repeatedly building, testing, abandoning the same index over and over and over.

That is probably what has happened here.

Does attribute clustering work with direct path insert?

Narendra, February 11, 2023 - 12:25 am UTC

Hello Connor/Chris,

Thank you for another informative thread.
I tried to see attribute clustering in action with direct path insert but am getting unexpected results. What am I missing?

Below is the typical example of how attribute clustering is causing plan change for a simple SQL

SQL> select banner_full from v$version ;

BANNER_FULL
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0


1 row selected.

Elapsed: 00:00:00.01

SQL> drop table ot1 purge ;

Table dropped.

Elapsed: 00:00:00.12
SQL> create table ot1 as select * from objects_table order by dbms_random.random ;

Table created.

Elapsed: 00:00:00.90
SQL> create index ot1_ix on ot1(object_id) ;

Index created.

Elapsed: 00:00:00.78
SQL> set feedback only
SQL> select * from ot1 where object_id <= 1000 ;

27 rows selected.

Elapsed: 00:00:00.27
SQL> set feedback on
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last +HINT_REPORT'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 49ph58ajrk7k5, child number 0
-------------------------------------
select * from ot1 where object_id <= 1000

Plan hash value: 602138737

---------------------------------------------------------------------------------------------
| Id  | Operation   | Name | Starts | E-Rows | A-Rows | A-Time  | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |  | 1 |    |  27 |00:00:00.25 |    1200 |   1195 |
|*  1 |  TABLE ACCESS FULL| OT1  | 1 |    698 |  27 |00:00:00.25 |    1200 |   1195 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"<=1000)


18 rows selected.

Elapsed: 00:00:00.07
SQL> alter table ot1 add clustering by linear order(object_id) without materialized zonemap;

Table altered.

Elapsed: 00:00:00.01
SQL> alter table ot1 move ;

Table altered.

Elapsed: 00:00:00.65
SQL> alter index ot1_ix rebuild ;

Index altered.

Elapsed: 00:00:01.36
SQL> set feedback only
SQL> select * from ot1 where object_id <= 1000 ;

27 rows selected.

Elapsed: 00:00:00.04
SQL> set feedback on
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last +HINT_REPORT'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 49ph58ajrk7k5, child number 0
-------------------------------------
select * from ot1 where object_id <= 1000

Plan hash value: 3116241865

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation       | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |        |     27 |00:00:00.03 |      5 |      2 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| OT1    |     1 |    698 |     27 |00:00:00.03 |      5 |      2 |
|*  2 |   INDEX RANGE SCAN      | OT1_IX |     1 |    698 |     27 |00:00:00.01 |      3 |      1 |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"<=1000)


19 rows selected.

Elapsed: 00:00:00.07

Now I truncate the table and load same data ordered randomly using direct path insert but the attribute clustering (on OBJECT_ID) does not appear to affect the loaded data because the same SELECT goes for full table scan.

SQL> truncate table ot1 reuse storage ;

Table truncated.

Elapsed: 00:00:00.52
SQL> insert /*+ APPEND */ into ot1 select * from objects_table order by dbms_random.random ;

60239 rows created.

Elapsed: 00:00:01.04
SQL> commit ;

Commit complete.

Elapsed: 00:00:00.00

SQL> set feedback on
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last +HINT_REPORT'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 54xqvns3fu3xm, child number 0
-------------------------------------
insert /*+ APPEND */ into ot1 select * from objects_table order by
dbms_random.random

Plan hash value: 3569990740

---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation    | Name   | Starts | E-Rows | A-Rows | A-Time  | Buffers | Reads  | Writes | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT   |   | 1 |    |   0 |00:00:01.01 |    6409 |  41 | 1195 |      |      |  |
|   1 |  LOAD AS SELECT    | OT1   | 1 |    |   0 |00:00:01.01 |    6409 |  41 | 1195 | 2070K| 2070K| 2070K (0)|
|   2 |   OPTIMIZER STATISTICS GATHERING |   | 1 |  60239 |  60239 |00:00:00.47 |    1205 |   1 |    0 |  256K|  256K|  |
|   3 |    SORT ORDER BY   |   | 1 |  60239 |  60239 |00:00:00.45 |    1199 |   0 |    0 |   12M| 1356K|  11M (0)|
|   4 |     TABLE ACCESS FULL   | OBJECTS_TABLE | 1 |  60239 |  60239 |00:00:00.03 |    1199 |   0 |    0 |      |      |  |
---------------------------------------------------------------------------------------------------------------------------------------------------------


17 rows selected.

Elapsed: 00:00:00.06
SQL> set feedback only
SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'OT1', method_opt=>'FOR ALL COLUMNS SIZE 1', cascade => true);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.75
SQL> select * from ot1 where object_id <= 1000 ;

27 rows selected.

Elapsed: 00:00:00.45
SQL> set feedback on
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last +HINT_REPORT'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 49ph58ajrk7k5, child number 0
-------------------------------------
select * from ot1 where object_id <= 1000

Plan hash value: 602138737

---------------------------------------------------------------------------------------------
| Id  | Operation   | Name | Starts | E-Rows | A-Rows | A-Time  | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |  | 1 |    |  27 |00:00:00.44 |    1200 |   1195 |
|*  1 |  TABLE ACCESS FULL| OT1  | 1 |    698 |  27 |00:00:00.44 |    1200 |   1195 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"<=1000)


18 rows selected.

Elapsed: 00:00:00.07

I was expecting the attribute clustering would ensure that data is loaded as clustered around OBJECT_ID when used direct path insert and hence would allow the SELECT query to use the index on OBJECT_ID column. But the clustering factor on OT1_IX index (after direct path load) suggests the data is not clustered.
Can you please help me understand why direct path loading of data is not storing data clustered?

February 13, 2023 - 3:53 pm UTC

The database obeys the ORDER BY for the INSERT; this overrides attribute clustering.

Repeating the demo from TRUNCATE with & without ORDER BY for the INSERT:

truncate table ot1 reuse storage ;
 
insert /*+ APPEND */ into ot1 
  select * from objects_table 
  order by dbms_random.random ;
 
commit ;
set feedback on
select * from table(dbms_xplan.display_cursor(format=>'IOSTATS LAST +HINT_REPORT'));
/*
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
-----------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT    |               |      1 |        |      0 |00:00:01.56 |    7637 |     51 |   1446 |
|   1 |  LOAD AS SELECT     | OT1           |      1 |        |      0 |00:00:01.56 |    7637 |     51 |   1446 |
|   2 |   SORT ORDER BY     |               |      1 |  73407 |  74750 |00:00:00.70 |    1451 |      0 |      0 |
|   3 |    TABLE ACCESS FULL| OBJECTS_TABLE |      1 |  73407 |  74750 |00:00:00.05 |    1451 |      0 |      0 |
-----------------------------------------------------------------------------------------------------------------
*/

exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'OT1', method_opt=>'FOR ALL COLUMNS SIZE 1', cascade => true);

select clustering_factor, num_rows, leaf_blocks 
from   user_indexes
where  index_name = 'OT1_IX';
/*
CLUSTERING_FACTOR   NUM_ROWS LEAF_BLOCKS
----------------- ---------- -----------
            74691      74747         149
*/

set feedback only
select * from ot1 where object_id <= 1000 ;
set feedback on

select * from table(dbms_xplan.display_cursor(format=>'iostats last +HINT_REPORT'));
/*
--------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |      1 |        |    998 |00:00:00.01 |    1010 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| OT1    |      1 |    385 |    998 |00:00:00.01 |    1010 |
|*  2 |   INDEX RANGE SCAN                  | OT1_IX |      1 |    385 |    998 |00:00:00.01 |      12 |
--------------------------------------------------------------------------------------------------------
*/

truncate table ot1 reuse storage ;
 
insert /*+ APPEND */ into ot1 
  select * from objects_table ;
 
commit ;
set feedback on
select * from table(dbms_xplan.display_cursor(format=>'IOSTATS LAST +HINT_REPORT'));
/*
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
-----------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT    |               |      1 |        |      0 |00:00:00.96 |    7598 |     51 |   1446 |
|   1 |  LOAD AS SELECT     | OT1           |      1 |        |      0 |00:00:00.96 |    7598 |     51 |   1446 |
|   2 |   SORT ORDER BY     |               |      1 |  73407 |  74750 |00:00:00.19 |    1451 |      0 |      0 |
|   3 |    TABLE ACCESS FULL| OBJECTS_TABLE |      1 |  73407 |  74750 |00:00:00.03 |    1451 |      0 |      0 |
-----------------------------------------------------------------------------------------------------------------
*/

exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'OT1', method_opt=>'FOR ALL COLUMNS SIZE 1', cascade => true);

select clustering_factor, num_rows, leaf_blocks 
from   user_indexes
where  index_name = 'OT1_IX';
/*
CLUSTERING_FACTOR   NUM_ROWS LEAF_BLOCKS
----------------- ---------- -----------
             1446      74747         149
*/

set feedback only
select * from ot1 where object_id <= 1000 ;
set feedback on

select * from table(dbms_xplan.display_cursor(format=>'IOSTATS LAST +HINT_REPORT'));
/*
--------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |      1 |        |    998 |00:00:00.01 |      38 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| OT1    |      1 |    385 |    998 |00:00:00.01 |      38 |
|*  2 |   INDEX RANGE SCAN                  | OT1_IX |      1 |    385 |    998 |00:00:00.01 |      12 |
--------------------------------------------------------------------------------------------------------
*/

Re: Does attribute clustering work with direct path insert?

Narendra, February 13, 2023 - 11:33 pm UTC

Thank you Chris.
I was certain that I was missing something obvious but could not spot it.
Thank you for taking time to answer my question

February 14, 2023 - 2:06 pm UTC

Happy to help

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Attribute clustering in Data modeller.

Attribute clustering in Data modeller.

Attribute clustering Vs Automatic Indexing(AI)

Does attribute clustering work with direct path insert?

Re: Does attribute clustering work with direct path insert?

More to Explore

Design

12c attribute clustering vs old index cluster

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Attribute clustering in Data modeller.

Attribute clustering in Data modeller.

Attribute clustering Vs Automatic Indexing(AI)

Does attribute clustering work with direct path insert?

Re: Does attribute clustering work with direct path insert?

More to Explore

Design