Skip to Main Content
  • Questions
  • 12c Table Clustering feature and storage indexes, bloom filters and where filters

Breadcrumb

more

Connor and Chris don't just spend all day on AskTOM. You can also catch regular content via Connor's blog and Chris's blog. Or if video is more your thing, check out Connor's latest video and Chris's latest video from their Youtube channels. And of course, keep up to date with AskTOM via the official twitter account.

Question and Answer

Maria Colgan

Thanks for the question.

Asked: March 01, 2017 - 1:37 am UTC

Answered by: Maria Colgan - Last updated: November 07, 2019 - 4:39 am UTC

Category: Database - Version: 12.1.0.1

Viewed 100+ times

Whilst you are here, check out some content from the AskTom team: All My OOW14 Sessions

You Asked

I would be grateful if you can shed some light on the question of using the 12c clustering (sorting) table data to improve the effectiveness of storage indexes.
The question is:
If there is
(A) a widely used predicate on columns SUBMISSION_UNIT_COB_DATE and MEASURE_NAME on a table named MEASURES which is partitioned by date on SUBMISSION_UNIT_COB_DATE
(B) A widely used join of the measure table columns SUBMISSION_UNIT_COB_DATE and SUBMISSION_UNIT_ID which results in a BLOOM FILTER on these columns
Q1. Can Bloom Filters benefit from storage indexes taking advantage of the clustered/sorted data on the bloom filter columns?
Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or
(B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?
Q3. If the answer to Q1 is no, then should I ignore the bloom filter columns and always cluster on the standard WHERE predicates? i.e. ignore the join columns for clustering the table data?

I have copied the relevant info from the plan below:

Object: Measures

Access Predicates:Z>=:Z AND :Z<=:Z AND ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER(:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))

Filter Predicates: ("MEASURES"."MEASURE_NAME"='IR01 OR Bkt MRS' AND "MEASURES"."SCENARIO"<>'Base' AND "MEASURES"."SUBMISSION_UNIT_COB_DATE"='2017-02-02' AND SYS_OP_BLOOM_FILTER (:BF0000,"MEASURES"."SUBMISSION_UNIT_ID","MEASURES"."SUBMISSION_UNIT_COB_DATE"))


and we said...

Q1. Can a bloom filter benefit from a storage index?

The answer is yes, a bloom filter can benefit from a storage index.

Q2. If the answer to Q1 is yes, then which will benefit me more, clustering on the standard WHERE predicates or Bloom filter columns i.e. (A) or (B) ? Does this depend on whether the standard WHERE PREDICATES are executed first before the Bloom filters at the storage cells?

The order in which the predicates are evaluated will impact the benefit of using the storage indexes. All where clause predicates (including Bloom Filters) will always be evaluated after partition pruning has occurred.

Since your table is partitioned on SUBMISSION_UNIT_COB_DATE and the query contains an equality predicate on that column, then this predicate will be applied first but it's unlike to get a huge benefit from the storage index as it will only access data within the partition where the SUBMISSION_UNIT_COB_DATE=='2017-02-02'. So, I wouldn't bother sorting on the SUBMISSION_UNIT_COB_DATE column.

The remaining predicates will then be applied, with the most selective predicate being applied first. So, to answer your question regarding which column to sort on, I would recommend you sort on the most commonly used selective, equality predicate.

and you rated our response

  (1 rating)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Few more question on storage index

November 06, 2019 - 12:58 am UTC

Reviewer: Rajeshwaran Jeyabal

Team,

was reading through this blog post

https://sqlmaria.com/2017/02/21/oracle-storage-index/

<quote>
While on the Exadata storage cells there is a limit on the number of Storage Indexes created on each table.
Prior Exadata 12.2.1.1.0 the limit was 8 columns per table.
From Exadata 12.2.1.1.0 onwards the limit has been increase to 24 columns per table,
which means all of the critical columns used in the where clause predicates of your queries will definitely be covered
</quote>

questions
a) is that the storage index will be created during CTAS / any direct path loads or post the query access?
b) How the storage server understands which columns in the table are the potential candidate for Storage index?
c) At any given point in time, how do we tell which columns in a table are part of storage index or not ?
d) Can you please help us with a test case to show how to prove storage index has the limit of 24 columns per table?

Connor McDonald

Followup  

November 07, 2019 - 4:39 am UTC

a) Yes. Anything that will touch *all* block for the table can yield a storage index.

b) The algorithm is not public but the storage servers decide on the "best" columns to index. The definition of "best" is internal, but I would expect levels of usage in predicates, and the clustering of the data to be dominant factors.

c) I don't know of a view or similar structure that exposes that. The only way I know would be to run a test query for each column. There are some trace settings that can be applied on the storage servers - you'd need to talk to Support about them because obviously tracing *every* I/O is generally a bad idea :-)

d) Here's an example

SQL> create table t as
  2  select
  3    trunc(rownum/1000) c1,
  4    trunc(rownum/1000) c2,
  5    trunc(rownum/1000) c3,
  6    trunc(rownum/1000) c4,
  7    trunc(rownum/1000) c5,
  8    trunc(rownum/1000) c6,
  9    trunc(rownum/1000) c7,
 10    trunc(rownum/1000) c8,
 11    trunc(rownum/1000) c9,
 12    trunc(rownum/1000) c10,
 13    trunc(rownum/1000) c11,
 14    trunc(rownum/1000) c12,
 15    rpad(rownum,500) data
 16  from
 17    ( select 1 from dual connect by level < 5000 ),
 18    ( select 1 from dual connect by level < 5000 );

Table created.

--
-- lets do a scan on C1
--

SQL> select max(data) from t where c1 < 1000;

MAX(DATA)
------------------------------------------------------------
999999

--
-- we can see that just one scan was not enough to convince
-- the cells to make an index
--


SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell XT granule IO bytes saved by storage index                           0

--
-- Now we'll do 20 scans on C1
--

SQL> declare
  2    x varchar2(1000);
  3  begin
  4  for i in 1 .. 20
  5  loop
  6    execute immediate 'select max(data) from t where c1 < '||(i*10+1000) into x;
  7  end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

--
-- and we can see that we got some savings
--
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                        VALUE
---------------------------------------------------------------- ----------------
cell physical IO bytes saved by storage index                        316601630720
cell XT granule IO bytes saved by storage index                                 0

--
-- now I'll do 20 scans on columns C1.. C12, to see if they ALL can get an index
--

SQL> declare
  2    x varchar2(1000);
  3  begin
  4  for c in 1 .. 12 loop
  5    for i in 1 .. 20
  6    loop
  7      execute immediate 'select max(data) from t where c'||c||' < '||(i*10+1000) into x;
  8    end loop;
  9  end loop;
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3950481989632
cell XT granule IO bytes saved by storage index                                    0

--
-- So this is our starting point (3950481989632), now I'll run a scan on 
-- each column C1.. C12, and see if with each one we got a savings
--

SQL> select max(data) from t where c1 < 2000;

MAX(DATA)
-------------------------------
999999






SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3965752631296
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c2 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3981023272960
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c3 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          3996293914624
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c4 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4011564556288
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c5 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4026835197952
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c6 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4042105839616
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c7 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4057376481280
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c8 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4072647122944
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c9 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4087917764608
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c10 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4103188406272
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c11 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4118459047936
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL> select max(data) from t where c12 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4133729689600
cell XT granule IO bytes saved by storage index                                    0

--
-- And just to double check, lets make sure that C1 still has its index
--

SQL>
SQL> select max(data) from t where c1 < 2000;

MAX(DATA)
-------------------------------
999999






SQL>
SQL> select name, value
  2  from v$mystat s, v$statname n
  3  where s.statistic# = n.statistic#
  4  and name like '%storage index%';

NAME                                                                           VALUE
---------------------------------------------------------------- -------------------
cell physical IO bytes saved by storage index                          4149000331264
cell XT granule IO bytes saved by storage index                                    0

SQL>
SQL>
SQL>
SQL>


So you can see, we have (at least) 12 indexes. This routine could be scaled up to as many columns as you like.