## Question and Answer

## You Asked

I would be grateful if you can shed some light on the question of using the 12c clustering (sorting) table data to improve the effectiveness of storage indexes.

The question is:

If there is

(A) a widely used predicate on columns SUBMISSION_UNIT_COB_DATE and MEASURE_NAME on a table named MEASURES which is partitioned by date on SUBMISSION_UNIT_COB_DATE

(B) A widely used join of the measure table columns SUBMISSION_UNIT_COB_DATE and SUBMISSION_UNIT_ID which results in a BLOOM FILTER on these columns

Q1. Can Bloom Filters benefit from storage indexes taking advantage of the clustered/sorted data on the bloom filter columns?

Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or

(B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?

Q3. If the answer to Q1 is no, then should I ignore the bloom filter columns and always cluster on the standard WHERE predicates? i.e. ignore the join columns for clustering the table data?

I have copied the relevant info from the plan below:

Object: Measures

Access Predicates:Z>=:Z AND :Z<=:Z AND ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER(:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))

Filter Predicates: ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER (:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))

The question is:

If there is

(A) a widely used predicate on columns SUBMISSION_UNIT_COB_DATE and MEASURE_NAME on a table named MEASURES which is partitioned by date on SUBMISSION_UNIT_COB_DATE

(B) A widely used join of the measure table columns SUBMISSION_UNIT_COB_DATE and SUBMISSION_UNIT_ID which results in a BLOOM FILTER on these columns

Q1. Can Bloom Filters benefit from storage indexes taking advantage of the clustered/sorted data on the bloom filter columns?

Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or

(B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?

Q3. If the answer to Q1 is no, then should I ignore the bloom filter columns and always cluster on the standard WHERE predicates? i.e. ignore the join columns for clustering the table data?

I have copied the relevant info from the plan below:

Object: Measures

Access Predicates:Z>=:Z AND :Z<=:Z AND ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER(:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))

Filter Predicates: ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER (:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))

## and we said...

Q1. Can a bloom filter benefit from a storage index?

The answer is yes, a bloom filter can benefit from a storage index.

Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or (B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?

The order in which the predicates are evaluated will impact the benefit of using the storage indexes. All where clause predicates (including Bloom Filters) will always be evaluated after partition pruning has occurred.

Since your table is partitioned on SUBMISSION_UNIT_COB_DATE and the query contains an equality predicate on that column, then this predicate will be applied first but it's unlike to get a huge benefit from the storage index as it will only access data within the partition where the SUBMISSION_UNIT_COB_DATE=='2017-02-02'. So, I wouldn't bother sorting on the SUBMISSION_UNIT_COB_DATE column.

The remaining predicates will then be applied, with the most selective predicate being applied first. So, to answer your question regarding which column to sort on, I would recommend you sort on the most commonly used selective, equality predicate.

The answer is yes, a bloom filter can benefit from a storage index.

Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or (B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?

The order in which the predicates are evaluated will impact the benefit of using the storage indexes. All where clause predicates (including Bloom Filters) will always be evaluated after partition pruning has occurred.

Since your table is partitioned on SUBMISSION_UNIT_COB_DATE and the query contains an equality predicate on that column, then this predicate will be applied first but it's unlike to get a huge benefit from the storage index as it will only access data within the partition where the SUBMISSION_UNIT_COB_DATE=='2017-02-02'. So, I wouldn't bother sorting on the SUBMISSION_UNIT_COB_DATE column.

The remaining predicates will then be applied, with the most selective predicate being applied first. So, to answer your question regarding which column to sort on, I would recommend you sort on the most commonly used selective, equality predicate.

## and you rated our response

Is this answer out of date? If it is, please let us know via a Review

## Reviews

Team,

was reading through this blog post

<quote>

While on the Exadata storage cells there is a limit on the number of Storage Indexes created on each table.

Prior Exadata 12.2.1.1.0 the limit was 8 columns per table.

From Exadata 12.2.1.1.0 onwards the limit has been increase to 24 columns per table,

which means all of the critical columns used in the where clause predicates of your queries will definitely be covered

</quote>

questions

a) is that the storage index will be created during CTAS / any direct path loads or post the query access?

b) How the storage server understands which columns in the table are the potential candidate for Storage index?

c) At any given point in time, how do we tell which columns in a table are part of storage index or not ?

d) Can you please help us with a test case to show how to prove storage index has the limit of 24 columns per table?

was reading through this blog post

__https://sqlmaria.com/2017/02/21/oracle-storage-index/__<quote>

While on the Exadata storage cells there is a limit on the number of Storage Indexes created on each table.

Prior Exadata 12.2.1.1.0 the limit was 8 columns per table.

From Exadata 12.2.1.1.0 onwards the limit has been increase to 24 columns per table,

which means all of the critical columns used in the where clause predicates of your queries will definitely be covered

</quote>

questions

a) is that the storage index will be created during CTAS / any direct path loads or post the query access?

b) How the storage server understands which columns in the table are the potential candidate for Storage index?

c) At any given point in time, how do we tell which columns in a table are part of storage index or not ?

d) Can you please help us with a test case to show how to prove storage index has the limit of 24 columns per table?

Followup

November 07, 2019 - 4:39 am UTC

b) The algorithm is not public but the storage servers decide on the "best" columns to index. The definition of "best" is internal, but I would expect levels of usage in predicates, and the clustering of the data to be dominant factors.

c) I don't know of a view or similar structure that exposes that. The only way I know would be to run a test query for each column. There are some trace settings that can be applied on the storage servers - you'd need to talk to Support about them because obviously tracing *every* I/O is generally a bad idea :-)

d) Here's an example

SQL> create table t as 2 select 3 trunc(rownum/1000) c1, 4 trunc(rownum/1000) c2, 5 trunc(rownum/1000) c3, 6 trunc(rownum/1000) c4, 7 trunc(rownum/1000) c5, 8 trunc(rownum/1000) c6, 9 trunc(rownum/1000) c7, 10 trunc(rownum/1000) c8, 11 trunc(rownum/1000) c9, 12 trunc(rownum/1000) c10, 13 trunc(rownum/1000) c11, 14 trunc(rownum/1000) c12, 15 rpad(rownum,500) data 16 from 17 ( select 1 from dual connect by level < 5000 ), 18 ( select 1 from dual connect by level < 5000 ); Table created. -- -- lets do a scan on C1 -- SQL> select max(data) from t where c1 < 1000; MAX(DATA) ------------------------------------------------------------ 999999 -- -- we can see that just one scan was not enough to convince -- the cells to make an index -- SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ---------- cell physical IO bytes saved by storage index 0 cell XT granule IO bytes saved by storage index 0 -- -- Now we'll do 20 scans on C1 -- SQL> declare 2 x varchar2(1000); 3 begin 4 for i in 1 .. 20 5 loop 6 execute immediate 'select max(data) from t where c1 < '||(i*10+1000) into x; 7 end loop; 8 end; 9 / PL/SQL procedure successfully completed. -- -- and we can see that we got some savings -- SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ---------------- cell physical IO bytes saved by storage index 316601630720 cell XT granule IO bytes saved by storage index 0 -- -- now I'll do 20 scans on columns C1.. C12, to see if they ALL can get an index -- SQL> declare 2 x varchar2(1000); 3 begin 4 for c in 1 .. 12 loop 5 for i in 1 .. 20 6 loop 7 execute immediate 'select max(data) from t where c'||c||' < '||(i*10+1000) into x; 8 end loop; 9 end loop; 10 end; 11 / PL/SQL procedure successfully completed. SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 3950481989632 cell XT granule IO bytes saved by storage index 0 -- -- So this is our starting point (3950481989632), now I'll run a scan on -- each column C1.. C12, and see if with each one we got a savings -- SQL> select max(data) from t where c1 < 2000; MAX(DATA) ------------------------------- 999999 SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 3965752631296 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c2 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 3981023272960 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c3 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 3996293914624 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c4 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4011564556288 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c5 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4026835197952 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c6 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4042105839616 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c7 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4057376481280 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c8 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4072647122944 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c9 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4087917764608 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c10 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4103188406272 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c11 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4118459047936 cell XT granule IO bytes saved by storage index 0 SQL> SQL> select max(data) from t where c12 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4133729689600 cell XT granule IO bytes saved by storage index 0 -- -- And just to double check, lets make sure that C1 still has its index -- SQL> SQL> select max(data) from t where c1 < 2000; MAX(DATA) ------------------------------- 999999 SQL> SQL> select name, value 2 from v$mystat s, v$statname n 3 where s.statistic# = n.statistic# 4 and name like '%storage index%'; NAME VALUE ---------------------------------------------------------------- ------------------- cell physical IO bytes saved by storage index 4149000331264 cell XT granule IO bytes saved by storage index 0 SQL> SQL> SQL> SQL>

So you can see, we have (at least) 12 indexes. This routine could be scaled up to as many columns as you like.