Skip to Main Content
  • Questions
  • 12c Table Clustering feature and storage indexes, bloom filters and where filters

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question.

Asked: March 01, 2017 - 1:37 am UTC

Last updated: November 07, 2019 - 4:39 am UTC

Version: 12.1.0.1

Viewed 1000+ times

You Asked

I would be grateful if you can shed some light on the question of using the 12c clustering (sorting) table data to improve the effectiveness of storage indexes.
The question is:
If there is
(A) a widely used predicate on columns SUBMISSION_UNIT_COB_DATE and MEASURE_NAME on a table named MEASURES which is partitioned by date on SUBMISSION_UNIT_COB_DATE
(B) A widely used join of the measure table columns SUBMISSION_UNIT_COB_DATE and SUBMISSION_UNIT_ID which results in a BLOOM FILTER on these columns
Q1. Can Bloom Filters benefit from storage indexes taking advantage of the clustered/sorted data on the bloom filter columns?
Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or
(B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?
Q3. If the answer to Q1 is no, then should I ignore the bloom filter columns and always cluster on the standard WHERE predicates? i.e. ignore the join columns for clustering the table data?

I have copied the relevant info from the plan below:

Object: Measures

Access Predicates:Z>=:Z AND :Z<=:Z AND ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER(:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))

Filter Predicates: ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER (:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))


and we said...

Q1. Can a bloom filter benefit from a storage index?

The answer is yes, a bloom filter can benefit from a storage index.

Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or (B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?

The order in which the predicates are evaluated will impact the benefit of using the storage indexes. All where clause predicates (including Bloom Filters) will always be evaluated after partition pruning has occurred.

Since your table is partitioned on SUBMISSION_UNIT_COB_DATE and the query contains an equality predicate on that column, then this predicate will be applied first but it's unlike to get a huge benefit from the storage index as it will only access data within the partition where the SUBMISSION_UNIT_COB_DATE=='2017-02-02'. So, I wouldn't bother sorting on the SUBMISSION_UNIT_COB_DATE column.

The remaining predicates will then be applied, with the most selective predicate being applied first. So, to answer your question regarding which column to sort on, I would recommend you sort on the most commonly used selective, equality predicate.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Few more question on storage index

Rajeshwaran Jeyabal, November 06, 2019 - 12:58 am UTC

Team,

was reading through this blog post

https://sqlmaria.com/2017/02/21/oracle-storage-index/

<quote>
While on the Exadata storage cells there is a limit on the number of Storage Indexes created on each table.
Prior Exadata 12.2.1.1.0 the limit was 8 columns per table.
From Exadata 12.2.1.1.0 onwards the limit has been increase to 24 columns per table,
which means all of the critical columns used in the where clause predicates of your queries will definitely be covered
</quote>

questions
a) is that the storage index will be created during CTAS / any direct path loads or post the query access?
b) How the storage server understands which columns in the table are the potential candidate for Storage index?
c) At any given point in time, how do we tell which columns in a table are part of storage index or not ?
d) Can you please help us with a test case to show how to prove storage index has the limit of 24 columns per table?

Connor McDonald
November 07, 2019 - 4:39 am UTC

a) Yes. Anything that will touch *all* block for the table can yield a storage index.

b) The algorithm is not public but the storage servers decide on the "best" columns to index. The definition of "best" is internal, but I would expect levels of usage in predicates, and the clustering of the data to be dominant factors.

c) I don't know of a view or similar structure that exposes that. The only way I know would be to run a test query for each column. There are some trace settings that can be applied on the storage servers - you'd need to talk to Support about them because obviously tracing *every* I/O is generally a bad idea :-)

d) Here's an example

SQL> create table t as
  2  select
  3    trunc(rownum/1000) c1,
  4    trunc(rownum/1000) c2,
  5    trunc(rownum/1000) c3,
  6    trunc(rownum/1000) c4,
  7    trunc(rownum/1000) c5,
  8    trunc(rownum/1000) c6,
  9    trunc(rownum/1000) c7,
 10    trunc(rownum/1000) c8,
 11    trunc(rownum/1000) c9,
 12    trunc(rownum/1000) c10,
 13    trunc(rownum/1000) c11,
 14    trunc(rownum/1000) c12,
 15    rpad(rownum,500) data
 16  from
 17    ( select 1 from dual connect by level < 5000 ),
 18    ( select 1 from dual connect by level < 5000 );

Table created.

--
-- lets do a scan on C1
--

SQL> select max(data) from t where c1 < 1000;

MAX(DATA)
------------------------------------------------------------
999999

--
-- we can see that just one scan was not enough to convince
-- the cells to make an index
--


SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell XT granule IO bytes saved by storage index                           0

--
-- Now we'll do 20 scans on C1
--

SQL> declare
  2    x varchar2(1000);
  3  begin
  4  for i in 1 .. 20
  5  loop
  6    execute immediate 'select max(data) from t where c1 < '||(i*10+1000) into x;
  7  end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

--
-- and we can see that we got some savings
--
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                        VALUE
---------------------------------------------------------------- ----------------
cell physical IO bytes saved by storage index                        316601630720
cell XT granule IO bytes saved by storage index                                 0

--
-- now I'll do 20 scans on columns C1.. C12, to see if they ALL can get an index
--

SQL> declare
  2    x varchar2(1000);
  3  begin
  4  for c in 1 .. 12 loop
  5    for i in 1 .. 20
  6    loop
  7      execute immediate 'select max(data) from t where c'||c||' < '||(i*10+1000) into x;
  8    end loop;
  9  end loop;
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3950481989632
cell XT granule IO bytes saved by storage index                                    0

--
-- So this is our starting point (3950481989632), now I'll run a scan on 
-- each column C1.. C12, and see if with each one we got a savings
--

SQL> select max(data) from t where c1 < 2000;

MAX(DATA)
-------------------------------
999999






SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3965752631296
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c2 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3981023272960
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c3 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3996293914624
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c4 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4011564556288
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c5 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4026835197952
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c6 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4042105839616
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c7 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4057376481280
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c8 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4072647122944
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c9 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4087917764608
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c10 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4103188406272
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c11 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4118459047936
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c12 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4133729689600
cell XT granule IO bytes saved by storage index                                    0

--
-- And just to double check, lets make sure that C1 still has its index
--

SQL>
SQL> select max(data) from t where c1 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4149000331264
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL>
SQL>
SQL>


So you can see, we have (at least) 12 indexes. This routine could be scaled up to as many columns as you like.