Skip to Main Content
  • Questions
  • How to decide to create index on column

Breadcrumb

XMAS_TREE

The AskTOM team is taking a break over the holiday season, so we're not taking questions or responding to comments, but we might still sneak in a few Office Hours sessions! Thanks for being a member of the AskTOM community.

Question and Answer

Chris Saxon

Thanks for the question, Sunny.

Asked: March 11, 2019 - 3:55 pm UTC

Last updated: March 14, 2019 - 8:37 am UTC

Version: 12c

Viewed 10K+ times! This question is

You Asked

Dear,
I talk about single table contain approx 20 to 25 columns. And in production database this table used with different column combination or single column also used in different where clause at different query.

When some time query run slow so how to decide which column is actually require in index or if some existing column also degreed the performance of query.

How DBA can identify those column actually require in index column.

and Chris said...

The short answer is:

- When the columns appear in the where clause of a SQL statement
- And adding the index enables the database to do less work than is needed for a full table scan => the query is faster

But there's a lot more to it than that ;)

I have an introduction to this on my YouTube channel; the first is:



You can access the full playlist at:

https://www.youtube.com/playlist?list=PL78V83xV2fYlLA-bjMU2ZvUKQOZNrqLEa

The site "Use the Index, Luke!" by Markus Winand also gives a great introduction to indexing

https://use-the-index-luke.com/

Rating

  (5 ratings)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

Richard's blog

Rajeshwaran, Jeyabal, March 13, 2019 - 6:39 am UTC

Ofcourse don't forget to checkout the RichardFoote's blog, that got a lot many goodies on Oracle indexes.

https://richardfoote.wordpress.com
Chris Saxon
March 13, 2019 - 11:12 am UTC

Yep, that's a great resource.

Selectivity

David D., March 13, 2019 - 9:00 am UTC


"How DBA can identify those column actually require in index column" : SELECTIVITY and CARDINALITY are very important.

In the Richard NIEMEC's book "Oracle Database 12c Release 2 Performance Tuning Tips & Techniques", I read that you can index columns used in WHERE clause or joins if their selectivity is good enought for the CBO :

1) If the table is short (less then 1 000 rows) : selectivity of 1%

2) If the table is big (more then 1 000 rows) : selectivity of 4 or 5%

It is not interesting to index a column of a table of 1 000 000 rows with only two values AND selectivity of 50% for each column or 10% and 90% : the CBO will use a full table scan. OK, OK, there are exceptions to this rule (like FULL INDEX SCANS, FAST FULL INDEX SCANS) but it is out of this question.

I confirm, "Use the Index, Luke!" by Markus Winand is very good (and there is a french traduction) :-)

David D.

Chris Saxon
March 13, 2019 - 11:17 am UTC

There's no hard cut off percentage where a full table scan (FTS) is more efficient than using an index.

A FTS can be faster when fetching as little at 0.5% of the rows from a table.
An index can be faster when fetching as much as 50% of the rows from a table.

What you really need to look at is the number of blocks processed.

How many blocks are there in the table? => number a FTS will read
And how many block accesses will you have using the index?

Broadly speaking the approach that reads fewer blocks will be faster.

.

David D., March 13, 2019 - 2:31 pm UTC


I was speaking about common tables, with thousands rows at least. For very little tables or special select, there are special rules if we speak with hard cut off.

I agree that a table with only 50 rows and, for example, two columns (id01 number id02 number) will be read faster with a Full Table Scan than with using an index : the FTS will read 1 block and the use of index will use 2 blocks : 1 index block and 1 table block. If I was executing
SELECT * from T1 where id01 = 1; 
then I read 1 block with FTS but I want only 2% off the table.
Maybe the FTS will read 4 blocks because Oracle created an extent with the table and under the HWM we have 4 blocks but it is 4 blocks read in only one call to the OS read procedure.

Well, If I have a million table rows with id, first_name, last_name, mail, with an index on mail, Oracle will use a Full Index Scan and only that for a selectivity of 100% if I execute
SELECT mail from t1;



Well, there are so many special cases...

Chris Saxon
March 13, 2019 - 2:38 pm UTC

There is no hard cut off. Even tiny tables still often benefit from an index:

https://richardfoote.wordpress.com/2009/04/16/indexes-on-small-tables-part-i-one-of-the-few/

It doesn't matter how many rows there are in the table. The point is you can't say fetching below X% => an index is always good, above that it's always bad.

David D., March 13, 2019 - 3:15 pm UTC


Very interesting but the test case is very specific.

I keep the rules given by Richard NIEMEC as general rules but, we agree, there are always particular cases.

Chris Saxon
March 13, 2019 - 5:06 pm UTC

I don't think you can follow those guidelines at all.

For example:

One table. Two queries.

One fetches < 1% of the rows from the table. But FTS is the chosen approach.

The over > 20% of the rows. Yet an index range scan is faster:

create table m_and_ms (
  m_and_m_id  not null,
  insert_datetime not null,
  expiry_datetime not null,
  colour not null,
  weight not null,
  type not null,
  junk,
  constraint m_and_ms_pk primary key (m_and_m_id)
) 
as
  select rownum m_and_m_id, 
         date'2018-01-01'+(rownum/550) insert_datetime,
         sysdate + 100 + floor ( rownum / 100 ) expiry_datetime,
         colour,
         weight,
         type,
         junk
  from   (
    select case 
             when rownum = 5500 then 'silver'
             else 
               case mod(rownum, 6)
                 when 0 then 'blue'
                 when 1 then 'brown'
                 when 2 then 'green'
                 when 3 then 'orange'
                 when 4 then 'red'
                 when 5 then 'yellow'
               end
           end colour,
           round(dbms_random.value(0.7, 1.1), 3) weight,
           case 
             when rownum<(5500/2) then 'chocolate' 
             else 'peanut'
           end type,
           rpad('x', 10, 'x') junk
    from   dual
    connect by level <= 5500
    order  by dbms_random.value
  );
  
create index mm_colour_i on m_and_ms (colour);
create index mm_weight_i on m_and_ms (weight);

exec dbms_stats.gather_table_stats ( user, 'm_and_ms' ) ;

alter session set statistics_level = all;
set serveroutput off

select count ( junk ) from m_and_ms m 
where  weight between 0.7 and 0.703;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

select count ( junk ) from m_and_ms m  where  weight between 0.7 and                        
0.702                                                                                       
                                                                                            
SQL_ID  0z55624qq8fz0, child number 0                                                       
-------------------------------------                                                       
select count ( junk ) from m_and_ms m  where  weight between 0.7 and                        
0.703                                                                                       
                                                                                            
Plan hash value: 3457968548                                                                 
                                                                                            
-----------------------------------------------------------------------------------------   
| Id  | Operation          | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
-----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT   |          |      1 |        |      1 |00:00:00.01 |      44 |   
|   1 |  SORT AGGREGATE    |          |      1 |      1 |      1 |00:00:00.01 |      44 |   
|*  2 |   TABLE ACCESS FULL| M_AND_MS |      1 |     55 |     51 |00:00:00.01 |      44 |   
----------------------------------------------------------------------------------------- 

select count ( junk ) from m_and_ms m
where  m_and_m_id between 1 and 1250;

select * 
from   table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

SQL_ID  ab9rcyx39pz2n, child number 0                                                                            
-------------------------------------                                                                            
select count ( junk ) from m_and_ms m where  m_and_m_id between 1 and                                            
1250                                                                                                             
                                                                                                                 
Plan hash value: 1872607669                                                                                      
                                                                                                                 
--------------------------------------------------------------------------------------------------------------   
| Id  | Operation                            | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
--------------------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT                     |             |      1 |        |      1 |00:00:00.01 |      14 |   
|   1 |  SORT AGGREGATE                      |             |      1 |      1 |      1 |00:00:00.01 |      14 |   
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| M_AND_MS    |      1 |   1250 |   1250 |00:00:00.01 |      14 |   
|*  3 |    INDEX RANGE SCAN                  | M_AND_MS_PK |      1 |   1250 |   1250 |00:00:00.01 |       4 |   
--------------------------------------------------------------------------------------------------------------   

the art of the index (is elusive)

Racer I., March 14, 2019 - 7:08 am UTC

Hi,

Naughty trickster ;) Using an index with perfect clustering. That's not stable (unless you have no housekeeping and no parallel inserts). Also who does range scans on technical ID-PKs?

But it does show that doing it by cookbook will never find the best solutions (here because the data will change). There are also the cases where an index is good for some statements yet sends others into a coma (see for example https://asktom.oracle.com/pls/apex/asktom.search?tag=index-skip-scan-with-high-ndv-leading-column ). Yet other cases are if the best indexes are over multiple columns rather than one per column (including the question of their order/compression).

Our approach is to have as few indexes as possible. We trawl our dbs for unused indexes (via MONITORING or even better techniques in newer versions) and question and possible drop them. So maybe you could add a few too much and weed them out later. But don't go hog wild because the maintenance in the interim is not cheap.
The inverse is of course to monitor for slow statements and add indexes if it looks it might help. Better yet train your SQL-developers to check their statements for performance and missing indexes while they write them or have at least some in-house experts that can sanity check them.

Special cases are the indexes supporting referential constraints. If they point to small lookup tables they will hardly ever be used for selects. If you can avoid deleting (and you should never update the pk) the referenced rows (simply add an 'yn_obsolete' column and set it to 'Y') you don't need that index at all. You can also have housekeeping windows to delete all obsolete master rows where you temporarily create the indexes and drop them afterwards.

regards,
Chris Saxon
March 14, 2019 - 8:37 am UTC

Also who does range scans on technical ID-PKs?

I've seen it often. Many people assume because sequence values always increase, that means they match the insert order. So use it to do things like "get all rows since the last extract".

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database