Thanks for the question, Tom .

Asked: September 17, 2001 - 8:56 am UTC

Last updated: September 15, 2007 - 7:31 pm UTC

Version: 8.1.7.,2

Viewed 1000+ times

You Asked

Tom,

Can you please explain whether or not it matters to use the most selective column first in the order of column that consists the index ?.

I had always assumed that the most selective column should be the first column of an index. But if this rule of thumb is used, we cannot take advantage of "compress" indexes (this is my understanding of how compress works).

Can you please comment on what the best practise is ?.

Thanks & Regards

and Tom said...

Well, I actually go into this in some depth in my book.

It has not been true since at least version 6 that having the most selective first is better.

The PRIMARY rule that drive the order of columns in a concatenated index should be the types of predicates you use. If you query:

select * from t where a = :a and b = :b and c = :c;
select * from t where b = :b and c = :c;
select * from t where c = :c;

The most logical index might be on (c,b,a) as it can EASILY be used for all three. Note that I don't really care about the selectivity of anything here -- the optimizer will take care of that for me (if C is not selective enough by itself, it'll not use the index, if it is -- it will).

In 8i, as you pointed out -- there is a compelling reason to put the LEAST discriminating columns first -- compression.

In 9i, there is even a MORE compelling reason. Index skip scans -- see
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1156161921465 <code>
here, having the least discriminating entry first can allow us to use an index even when the columns in the predicate are NOT on the leading edge of the index. For example given a table t with an index t_idx(a,b), if A is not selective, we might be able to use a skip scan on a query like:

select * from t where b = 5;

Rating

(9 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Tom, could you please elaborate on "compress index" feature

Andrew, May 09, 2002 - 1:44 pm UTC

I always find something new in your answers. This time "skip index" (explained very well in your previous link) and "compressed index". Could you please elaborate on this compressed index feature, how it works and how it is related to the "most selective" column. I probably shouldn't ask this, because it is probably in the docs, but, Tom, you have such a talent to make it clear....

May 09, 2002 - 7:43 pm UTC

Better yet -- I've explained it in great deatil in my book in the chapter on indexes! I explained tons of stuff in there.

column order - oracle docs

kapil, May 22, 2002 - 10:57 am UTC

Hi Tom,
You have said that since version 6 it is not better to have the most selective column in the leading position
But in the oracle documentation - Oracle8i Designing and Tuning for Performance - Data Access Methods
the following statement is mentioned
"If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance"
Could you throw some light on that statement?

Kapil

May 22, 2002 - 10:54 pm UTC

it is in my experience -- a mistatement on their part. I would say it is inaccurate.

In my book -- I actually benchmark it:

tkyte@TKYTE816> create table t
2 nologging
3 as
4 select * from all_objects;
Table created.

tkyte@TKYTE816> create index t_idx_1 on t(owner,object_type,object_name)
2 nologging pctfree 0;
Index created.

tkyte@TKYTE816> create index t_idx_2 on t(object_name,object_type,owner)
2 nologging pctfree 0;
Index created.

tkyte@TKYTE816> select count(distinct owner), count(distinct object_type),
2 count(distinct object_name ), count(*)
3 from t;

(DISTINCTOWNER) (DISTINCTOBJECT_TYPE) (DISTINCTOBJECT_NAME) COUNT(*)
--------------- --------------------- --------------------- ----------
24 23 12265 21975

Now, to show that neither is more efficient space wise, we?ll measure their space utilization:

tkyte@TKYTE816> exec show_space( 'T_IDX_1', user, 'INDEX' );
Free Blocks.............................0
Total Blocks............................192
Total Bytes.............................1572864
Unused Blocks...........................51
Unused Bytes............................417792
Last Used Ext FileId....................6
Last Used Ext BlockId...................4745
Last Used Block.........................13

PL/SQL procedure successfully completed.

tkyte@TKYTE816> exec show_space( 'T_IDX_2', user, 'INDEX' );
Free Blocks.............................0
Total Blocks............................192
Total Bytes.............................1572864
Unused Blocks...........................51
Unused Bytes............................417792
Last Used Ext FileId....................6
Last Used Ext BlockId...................4937
Last Used Block.........................13

PL/SQL procedure successfully completed.

They use exactly the same amount of space ? there are no differences there (however, the first index is a lot more COMPRESSABLE if we used index key compression! There is an argument for going from least discriminating to most discriminating). Now, lets see how they perform ? if either index is generally more efficient than the other. In order to test that, I used a PLSQL block with hinted queries (so as to use one index or the other) like this:

tkyte@TKYTE816> alter session set sql_trace=true;
Session altered.

tkyte@TKYTE816> declare
2 cnt int;
3 begin
4 for x in ( select owner, object_type, object_name from t )
5 loop
6 select /*+ INDEX( t t_idx_1 ) */ count(*) into cnt
7 from t
8 where object_name = x.object_name
9 and object_type = x.object_type
10 and owner = x.owner;
11
12 select /*+ INDEX( t t_idx_2 ) */ count(*) into cnt
13 from t
14 where object_name = x.object_name
15 and object_type = x.object_type
16 and owner = x.owner;
17 end loop;
18 end;
19 /

PL/SQL procedure successfully completed.

That read ever single row in the index one at a time. The TKPROF report shows us:

SELECT /*+ INDEX( t t_idx_1 ) */COUNT(*)
FROM
T WHERE OBJECT_NAME = :b1 AND OBJECT_TYPE = :b2 AND OWNER = :b3

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ----- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 21975 2.35 2.55 0 0 0 0
Fetch 21975 1.40 1.57 0 44088 0 21975
------- ------ -------- ---------- ----- ---------- ---------- ----------
total 43951 3.75 4.12 0 44088 0 21975

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
21975 SORT (AGGREGATE)
21975 INDEX (RANGE SCAN) OF 'T_IDX_1' (NON-UNIQUE)

********************************************************************************

SELECT /*+ INDEX( t t_idx_2 ) */COUNT(*)
FROM
T WHERE OBJECT_NAME = :b1 AND OBJECT_TYPE = :b2 AND OWNER = :b3

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ----- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 21975 2.10 2.44 0 0 0 0
Fetch 21975 1.65 1.60 0 44088 0 21975
------- ------ -------- ---------- ----- ---------- ---------- ----------
total 43951 3.75 4.04 0 44088 0 21975

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
21975 SORT (AGGREGATE)
21975 INDEX (RANGE SCAN) OF 'T_IDX_2' (NON-UNIQUE)

They processed the same exact number of rows, blocks, used equivalent amounts of CPU time and ran in about the same elapsed time (run this same test again and the CPU/ELAPSED numbers will be a little different but on average ? they will be the same). There are no inherit efficiencies to be had ordering the columns in order how discriminating they are.

comparison

kapil, May 24, 2002 - 10:27 am UTC

Hi Tom
That was helpful indeed.
Well the only thing which could be added to the above analysis is comparing the space and performance with the same index in the compressed form.
Would it be possible for you to draw that comparison?
Also, could you share the code of the PL/SQL block "show_space"?

May 24, 2002 - 11:21 am UTC

simple search for show_space on this site will find the code (its in the book as well)

Sort of thought with the above -- anyone would be able to do the compress example if interested.... go for it.

thanks

Kapil, May 27, 2002 - 10:34 am UTC

Hi Tom,
No problem
Thanks for all your help

Sushanta, March 23, 2004 - 6:29 am UTC

Hi Tom,
  I have a problem in the Index Picking.

#----------------------------I have a table 
SQL> desc check_008
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------
 A                                                  NUMBER
 B                                                  NUMBER
 C                                                  VARCHAR2(2000)


#----------------------------Number of Records present in the table
SQL> SELECT COUNT(*)
  2    FROM check_008
  3  /

  COUNT(*)
----------
     99001

#----------------------------I have a index on the table on column (A,B)
SQL> l
  1  SELECT index_name,SUBSTR(column_name,1,20),column_position
  2    FROM all_ind_columns
  3*  WHERE table_name='CHECK_008'
SQL> /

INDEX_NAME                     SUBSTR(COLUMN_NAME,1 COLUMN_POSITION
------------------------------ -------------------- ---------------
DEL_001                        B                                  2
DEL_001                        A                                  1


SQL> set autotrace traceonly exp
SQL> DELETE FROM check_008
  2   WHERE (a=10 OR 10 IS NULL)
  3     AND (b=2  OR 2 IS NULL)
  4     AND (c='23-MAR-049218' OR '23-MAR-049218' IS NULL)
  5  /

1 row deleted.

Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=19)
   1    0   DELETE OF 'CHECK_008'
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'CHECK_008' (Cost=14 Card=1 Bytes=19)
   3    2       INDEX (RANGE SCAN) OF 'DEL_001' (NON-UNIQUE) (Cost=7 Card=1)


Then i create a procedure with the same Delete statement like this
SQL>CREATE OR REPLACE PROCEDURE pr_without_nvl(p_a IN NUMBER,p_b IN NUMBER,p_c IN VARCHAR2,p_rows_affected OUT NUMBER)
   IS
BEGIN
 DELETE FROM check_008
  WHERE (a = p_a OR p_a IS NULL)
    AND (b = p_b OR p_b IS NULL)
    AND (c = p_c OR p_c IS NULL);
 p_rows_affected := SQL%ROWCOUNT;
END;
/
Procedure created.

SQL> alter session set sql_trace=true;

Session altered.

SQL> alter session set timed_statistics=true;

Session altered.

SQL> variable delete_count NUMBER;
SQL> exec pr_without_nvl(10,2,'23-MAR-049218',:delete_count);

PL/SQL procedure successfully completed.

SQL> print :delete_count

DELETE_COUNT
------------
           1

By seeing the Tkprof File it looks like 



DELETE FROM CHECK_008
WHERE
 (A = :b1  OR :b1 IS NULL ) AND (B = :b3  OR :b3 IS NULL ) AND (C = :b5  OR
  :b5 IS NULL )


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.00          0         70          0           0
Execute      1      0.11       0.16        206        831         24           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.13       0.16        206        901         24           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 46  (ATLAS)   (recursive depth: 1)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  DELETE STATEMENT   GOAL: CHOOSE
      0   DELETE OF 'CHECK_008'
      0    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'CHECK_008'



Where the Plan is shows as FULL Table Scan ........................????

However from this it is picking Index.

SQL> DELETE FROM check_008
  2   WHERE (a=10 OR 10 IS NULL)
  3     AND (b=2  OR 2 IS NULL)
  4     AND (c='23-MAR-049218' OR '23-MAR-049218' IS NULL)
  5  /

1 row deleted.

Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=19)
   1    0   DELETE OF 'CHECK_008'
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'CHECK_008' (Cost=14 Card=1 Bytes=19)
   3    2       INDEX (RANGE SCAN) OF 'DEL_001' (NON-UNIQUE) (Cost=7 Card=1)


My Point is is it better to change the Query from
DELETE check_008 
 WHERE a = NVL(p_a,a)
   AND b = NVL(p_b,b)
   AND c = NVL(p_c,c);

to

DELETE check_008 
 WHERE (a = p_a OR p_a IS NULL)
   AND (b = p_b OR p_b IS NULL)
   AND (c = p_c OR p_c IS NULL)

will increase performance when there is a composite index on a and b.
Also why there is a deviation in the Plan, from the SQL*PLUS to Procedure .


Thanks And Best Regards
Sushanta

March 23, 2004 - 7:26 am UTC

when you use bind variables -- think about it...

sometimes A is provided. sometimes not.
sometimes B is provided. sometimes not.

when neither of A or B is provided -- full scan
when A is not provided -- full scan (or index skip scan but probable full scan)
when A is provided -- possible to use index

but -- no one knows IF or WHEN A would be provided.....

in the example with constants, it is well known that A is provided (try your sqlplus example with binds and you'll get the same result)

there is a deviation in the plan because you are comparing APPLES to Toaster ovens and there is no possible comparision there....

Now, 100,000 rows is pretty tiny, so if this delete is not executed very often, what you have is probably fine. Else you can

l_string := 'delete from t ';
if ( p_a is not null )
then
l_string := l_string || ' where a = :a ';
else
l_string := l_string || ' where :a is null ';
end if;
if ( p_b is not null )
then
l_string := l_string || ' and b = :b ';
else
l_string := l_string || ' and :b is null ';
end if;
if ( p_c is not null )
then
l_string := l_string || ' and c = :c ';
else
l_string := l_string || ' and :c is null ';
end if;
execute immediate l_string using p_a, p_b, p_c;

if A is provided, index.
if A is not provided, probable full scan

A reader, March 25, 2004 - 5:39 am UTC

Hi Tom,
  Many thanks for ur input. One thing to clear and it will overall clear on this concept
  When i try for this one in SQL*PLUS

SQL> variable b1 NUMBER
SQL> variable b2 NUMBER
SQL> variable b3 VARCHAR2
STAGE - I
=========
SQL> DELETE FROM CHECK_008
WHERE
 (A = :b1 ) AND (B = :b2  OR :b2 IS NULL ) AND (C = :b3  OR :b3 IS NULL )  2    3
  4  /

0 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=2028)
   1    0   DELETE OF 'CHECK_008'
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'CHECK_008' (Cost=1 Car
          d=1 Bytes=2028)

   3    2       INDEX (RANGE SCAN) OF 'DEL_008' (NON-UNIQUE)



STAGE-II
========
SQL> DELETE FROM CHECK_008
WHERE
 (A = :b1 OR :b1 IS NULL) AND (B = :b2  OR :b2 IS NULL ) AND (C = :b3  OR :b3 IS NULL )  2    3
  4  /

0 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=2028)
   1    0   DELETE OF 'CHECK_008'
   2    1     TABLE ACCESS (FULL) OF 'CHECK_008' (Cost=1 Card=1 Bytes=
          2028)

Colud you please clear on this that
In case of STAGE-I, ORACLE-OPTIMIZER knows that the value of the bind variable(:b1) must come,whereas in case 
of STAGE-II, ORACLE-OPTIMIZER is in "Doubt!!" , whether the value of the bind variable (:b1) will come or not, 
for that reason it will go for "FULL TABLE SCAN", without taking any RISK.

Is it so ??

Also how ORACLE generates PLAN in case of "OR" Operator ?

Many Thanks ...

Sushanta

March 25, 2004 - 9:16 am UTC

the plan is set up ONCE and must be capable of doing ANY set of inputs, not just yours.

Hence, sometimes b1 could be null and sometimes not. therefore -- well, it has to do what it has to do (full scan)

first query -- you can always use an index.
second query -- you can use one when b1 is not null, you cannot when b1 is null.

sushanta, March 26, 2004 - 2:22 am UTC

Tom,
Thanks a Lot Tom for ur valuable Time.

Regards
Sushanta

Whether or not to use index

Sanji, September 12, 2007 - 5:28 pm UTC

Tom,
I have a doubt regarding index selectivity.
There are over 37 million records in a table and there is an index on a column which has only 17 distinct rows.

select count(distinct ovrtyp_id) from override; ===> 17 rows selected.

The statistics have been collected for all columns, size auto. Following is the data distribution for this column.

MIN(OVRTYP_ID) MAX(OVRTYP_ID) COUNT(OVRTYP_ID) WB
-------------- -------------- ---------------- ----------
102 108 783925 1
199 204 202747 2
300 300 32292746 4
401 450 177653 6
452 501 299539 7
700 700 402911 11
800 800 145645 13
900 900 256526 14
1100 1100 3303698 18
1206 1206 42410 19

The question is, is there a need for this index, theoretically ? The column has only 17 distinct rows.

Now here's the test case. When we run the following query, with the index in place, it takes over 3 minutes to execute

select ovr_status from OVERRIDE
where OVR_START_DATE='08-sep-2007'
and WBU_NAME='HR_REFRESH'
and ovrtyp_id=700
and substr(ovr_old_value,2,10)='EMP_PAYGRP'

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 703 6.00 196.51 122681 137319 0 10522
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 705 6.00 196.52 122681 137319 0 10522

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 20

Rows Row Source Operation
------- ---------------------------------------------------
10522 TABLE ACCESS BY INDEX ROWID OVERRIDE
402911 INDEX RANGE SCAN IDX_OVR_OVRTYP_ID (object id 3864)

If i drop the index, reanalyze the table(all columns size auto) and reexecute the same query, it continues to run for over an hour.

From the explain plan, it does show that 402911 records out of 37 million records, would require an index for faster selectivity, but the index which is being used is only on 17 distinct records.
Why is the index efficient in this case ?

Thanks
Sanji

Sorry, typo

Sanji, September 13, 2007 - 9:31 am UTC

It's actually 17 distinct values for the field ovrtyp_id.
Question is, why would an index be required on a table with over 37 million records and the index key selectivity being only 17.

Thanks
Sanji

September 15, 2007 - 7:31 pm UTC

maybe the values 1..16 each point to a single row

and the value of 17 points to 37,000,000-16 rows.

so where ovrtyp_id = 1 (thru 16) is very selective.

this index *might* be useful.

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Tom, could you please elaborate on "compress index" feature

column order - oracle docs

comparison

thanks

Whether or not to use index

Sorry, typo

More to Explore

Performance

Question on indexing strategy

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Tom, could you please elaborate on "compress index" feature

column order - oracle docs

comparison

thanks

Whether or not to use index

Sorry, typo

More to Explore

Performance