Skip to Main Content
  • Questions
  • Index strategy for wide and deep table

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, JAM.

Asked: October 15, 2007 - 4:31 pm UTC

Last updated: October 16, 2007 - 7:55 am UTC

Version: 9.2.0.7

Viewed 1000+ times

You Asked

Hi Tom,

I have a 9207 database with a particularly nasty table, 299 columns, about 38 million rows and growing. The table is about 28G in size, most of the columns are empty and few rows will have more than 30% of the columns populated. (poor design, but I'm sure you can see that from the brief description.

The table has about 27 indexes on it, 24 of those indexes have the same column in the number one or number 2 position. That particular column has 9 unique values. Most of the time, cardinality is poor since most indexes still return over 1 million rows to the table.

As proof of concept, I've shown the user/app people that by adding one more column to the index, the queries will return in much faster time. (I've tried talking about partitioning, table redesign, etc. but do to political factors the options are limited.)

My question is very general: How would you approach dealing with the same column on so many indexes? Should I simply extend the approprite existing indexes by adding additional columns to improve cardinality? Should focus on eliminating that column from the indexes since their are so few unique values? Or is it possible(or even recommended) to have one index with that column and then the rest with more appropriate columnds(higher cardinality).

I ask this as a general question because I'm trying to decide how to approach this beast. Right now I don't have too many options so I'm doing the best to work with what I got.

Thanks.

and Tom said...

(on a 38 million record table, 1 million rows is 1/38th of the data or about 2.6%, that would be selective....)

but if you can add columns to the index to reduce the table fetch by index rowid and a filter - that would be GREAT. for example:

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select mod(rownum,38) id, a.*
  4    from (select * from all_objects order by dbms_random.random) a
  5  /

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> create index t_idx1 on t(id,owner);

Index created.

ops$tkyte%ORA10GR2> create index t_idx2 on t(id,owner,object_name);

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set termout off
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace traceonly
ops$tkyte%ORA10GR2> select /*+ index( t, t_idx1 ) */ * from t where id = 0 and owner = 'SYS' and object_name = 'X';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 4055207394

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    96 |    70   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    96 |    70   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX1 |    82 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_NAME"='X')
   2 - access("ID"=0 AND "OWNER"='SYS')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        496  consistent gets
          0  physical reads
          0  redo size
       1047  bytes sent via SQL*Net to client
        374  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

ops$tkyte%ORA10GR2> select /*+ index( t, t_idx2 ) */ * from t where id = 0 and owner = 'SYS' and object_name = 'X';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 3371054274

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    96 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    96 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX2 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=0 AND "OWNER"='SYS' AND "OBJECT_NAME"='X')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
       1047  bytes sent via SQL*Net to client
        374  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

ops$tkyte%ORA10GR2> set autotrace off


the first index - t_idx1 - is "OK"
the second index - t_idx2 - is "PERFECTION"


I cannot address the "same column on so many indexes", you would have to verify that the index is actually used - that you need "COLUMN1, X" and "COLUMN1, Y" and "COLUMN1, Z". The only thing you would look for would be indexes that are not used.

This column is pretty selective - 1/9th of the data is about 11%. When you start adding in other attributes - the INDEX becomes even more selective (and that is what counts, the index selectivity - not the selectivity of the individual attributes)


Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

And index compression?

Stew Ashton, October 16, 2007 - 10:20 am UTC


When your 9-value column is in position one of the index, consider compressing the index to save space and reduce logical I/Os. Here's one of the places where Tom talks about this:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8806017524595

index column strategy

JAM, October 16, 2007 - 2:12 pm UTC

Thanks Tom, the issue I'm struggling the most with is deciding how selective is "good enough".

As a follow up -- In the 38M row table, column A has 9 unique values, column B has unique values.

Query has "...where A = :a and B = :b." There is an index on both columns.

Conceptually, If column B is unique do I really need to add column A to the index? Isn't an index with just column B perfect?

Do you think it is worth pursuing performance testing with and without this column on the 23 other indexes that have this column? Column A is not null, varchar2 20 by definition but all values are 10 char, and is never updated once the initial insert of the row is performed.

We're adding 2-3k rows per day to the table so to me, the performance issue I should be focusing in on is the increased response time of indexing this column vs the additional overhead of inserting an entry to 24 indexes.
That sound right?

Several of the indexes that use column A are very selective (< 5% of rows returned) without column A. It seems that including column A in the index doesn't increase selectivity by much.

One more thing.

JAM, October 16, 2007 - 2:22 pm UTC

In case you're wondering, even though column B has all unique values, the only constraint on the table are check for not null. There is no PK or unique constraints. column B relies on seq. generator for uniqueness but this is not guaranteed.


To Jam

martina, October 17, 2007 - 4:09 am UTC

And why do you not create a primary key ?

regards, martina

PK

JAM, October 17, 2007 - 3:08 pm UTC

Quote: "And why do you not create a primary key ?"

When it comes to tuning this DB, there are dozens of "why don't you...."

It's all poiltical. :-(
I'm doing the best I can to make meaningful changes but the user community is fearful becaue they've been burned before by other DBAs.


Index strategy for wide and deep table

Ravi Vedala, November 28, 2007 - 11:40 am UTC

Did you try to partition the table (with compression) and then gather statistics ? I think it is worth giving a try.

It would be good if you can share your findings, after trying out the different options mentioned above.

Ravi Vedala.

More to Explore

VLDB

If you are new to partitioning, check out Connor McDonald's introduction series here.

VLDB

Documentation set on VLDB and Partitioning.