Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Von.

Asked: September 03, 2005 - 10:52 am UTC

Last updated: May 29, 2012 - 6:35 am UTC

Version: 9.2

Viewed 1000+ times

You Asked

Tom
Thanks for your help to the oracle community. I have a question regarding Indexing....Say i have a table t
desc t
name
ssn
emp_id
emp_name
emp_status
dept


I am querying this table on multiple conditions
like

select * from t
where ssn =1245

select * from t
where emp_id =
and dept =



select * from t
where emp_id =
and emp_status =
and dept =

Say the table has 10million rows
Does it make sense to have 3 indexes on this table
index1 (ssn)

index2(emp_id,dept)

index3(emp_id,emp_status,dept)

I think they are redundant..but how can i convince the folks at my work that this is reduntant?
Thanks



and Tom said...

I would just have

index1(ssn)
index2(emp_id,dept,emp_status)


and that is it, the current index2 you have is making inserts/updates of the indexed fields take longer and if you fix the order of columns in index3, it can be used for both the second and third queries effectively.


don't know how to convince someone of "something that appears obvious"? Do they have any reasons they believe they need three? We can "prove" that the two indexes will be used to answer all three queries (assuming they should be used of course)

Rating

  (18 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Von, September 04, 2005 - 11:00 am UTC

Tom,
The reason they believe it gives more selectivity by just having an index on all column combinations that appears in the where clause conditions..for example
select * from t
where emp_id =
and dept =
Their argument is the above query will run faster if it uses index2 rather than index3..
Also
" if you fix the order of columns in index3, it can
be used for both the second and third queries effectively."

why the order matters??

Tom Kyte
September 04, 2005 - 6:13 pm UTC

arg, tell them to read Expert one on one Oracle where I sort of showed that myth of putting the most selective things first to be, well....

A MYTH.

say you have 1 million records.
Say a is ALMOST unique.
Say b has a single value.


the index in (a,b) and the index on (b,a) will perform the same for queries of the form

select * from t where a = ? and b = ?;

absolutely the same. Well, unless of course you put B first and compress the index - in which case the index on (b,a) might perform better as it could be a smaller index with less height.



In order to use a single index for the following predicates:

where c = ?
where c = ? and b = ?
where c = ? and b = ? and a = ?

you would want an index in (c,b,a) -- and forget all about the "selectivity", it doesn't really come into play here.


(you are looking for a key, the order of the columns in the key isn't going to affect finding that key)

Von

A reader, September 05, 2005 - 9:56 am UTC

Thanks for your response
In the following case

index2(emp_id,dept,emp_status)

select emp_status,dept from t
where emp_id =
The above sql doesnt need to perform the I/O to access the table to get the rows right as it just need to access the index right??


Tom Kyte
September 05, 2005 - 11:32 am UTC

correct, everything will come from the index itself.

So order doesn't matter at all?

A reader, September 05, 2005 - 5:08 pm UTC

Just to clarify - how would you pick the order yourself, does it matter one bit?

Tom Kyte
September 05, 2005 - 6:58 pm UTC

the order matters a lot after you look at the predicates. selectivity of a given column *does not* matter.


say you have:

where a=
where a= and b=
where a= and b= and c=

easy -- index(a,b,c) in that order.

Say you have

where b=? and a > ?

easy again, index (b,a) -- goto "b" range scan "a". Doing it (a,b) would in general be much worse -- say A ranges from 1 to 1,000,000. and you enter "where b=5 and a > 5", you have to range scan through 999,995 A's to find the "single" (assumed) B=5.

If b had been first, it would have been a single compare instead.


Think about the data, think about the questions being asked of the data, think about an index and how the orders of the columns "clump" the data.


Selectivity, not really a consideration.
How you use the data -- that, that is the consideration.

Change of Scenario

A reader, September 06, 2005 - 2:41 am UTC

Hi Tom,
With refernce to the original question, your explanation is correct as always.

select * from t
where emp_id =
and emp_status =
and dept =

Say the table has 10million rows
Does it make sense to have 3 indexes on this table
index1 (ssn)

index2(emp_id,dept)

index3(emp_id,emp_status,dept)

For this query you suggested the following solution:
An index on ssn
and another index on emp_id, dept, emp_status

But what if the selection criteria changes:

select * from t
where emp_id > ?
and emp_status in ('?','?')
and dept != ?

In this scenario will the second index hold good? I am not sure whether it will. But then what should be the strategy.
Also can an index be created to cater to both the select statements.
Thanks as always.

Tom Kyte
September 06, 2005 - 8:16 am UTC

that second index can be used to range scan by emp_id, and look at the emp_status, dept to see if we want it.


In that case, if you wanted to index "just for that query", it would make sense to have emp_status,emp_id,dept -- but remember, every single index you add increases the work modifications must do by a lot.



Order of index

A reader, September 08, 2005 - 8:42 am UTC

"say you have:

where a=
where a= and b=
where a= and b= and c=

easy -- index(a,b,c) in that order.

Say you have
where b=? and a > ? "

In the above case - if we have index(a,b,c) , won't that be used for "b=? and a > ?" ?

Thanks



Tom Kyte
September 08, 2005 - 8:54 am UTC

it would/could be used for b=? and a>?

However, having b and then a in the index would be optimal for that specific case (always a tradeoff)

say A is a number between 1 and 1,000,000
say B is also a number between 1 and 1,000,000
say B is either unique or close to being unique

where b = 5 and a > 2

that'll have to look at a's from 2..1,000,000 if the index has (a,b)

that'll have to look at b=5 only (one or two records) for a>2 if the index has(b,a)



vector/scalar

A reader, September 08, 2005 - 9:05 am UTC

Thanks Tom.

Another doubt - was using index a vector in all earlier versions also?

Was there a restriction like for index to be used, leading columns in index should be used in query, in earlier versions?



Tom Kyte
September 08, 2005 - 9:55 am UTC

yes, think of how an index structure would be stored (sorted by the keys in column order)

if you have an index on t(a,b,c)

and you query "where c = ?"

it is unlikely that index would be used to range scan (there are special cases of an index skip scan new in 9i - but they demand that the least selective columns come first actually!)

Myth Index

Lawang, September 08, 2005 - 9:28 am UTC

It was good understsnading the index myth from Tom.

Thx


index skip scan with least selective column last

A reader, September 08, 2005 - 11:08 am UTC

index skip scan can happen with least selective column last in the index ( version 10g).

--------------------------------------------------
drop table test1;

create table test1 as select rownum ID,owner, object_type,object_name from dba_objects;

analyze table test1 compute statistics
for table
for all indexes
for all columns
/

select count(*) from test1;

COUNT(*)
----------
50071

select * from test1 where id = 101;

ID OWNER OBJECT_TYPE OBJECT_NAME
101 SYS TABLE RESOURCE_COST$

Elapsed: 00:00:01.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=45 Card=1 Bytes=41)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST1' (TABLE) (Cost=45 Card=1 Bytes=41)
2 1 INDEX (SKIP SCAN) OF 'IND_TEST1' (INDEX) (Cost=44 Card=1)




Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
50 consistent gets
46 physical reads
0 redo size
570 bytes sent via SQL*Net to client
504 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed




Tom Kyte
September 08, 2005 - 3:51 pm UTC

sorry -- missing something here

first, id is "very selective"
second, I don't see the index?

If "they" require more evidence

James, September 08, 2005 - 5:36 pm UTC

If I was in your shoes I would suggest doing it both ways...

Try creating the three indexes like the folks you are wanting to convince think they want. Then drop those indexes and try it Tom's way. Run those variations of queries with "SET AUTOTRACE ON" for both indexing approaches. At minimum check the generated explain plans and compare your results.

This will take a couple extra minutes but you and your "folks" will learn a little about indexing. You will also have some hard evidence of the two approaches.


not in

A reader, September 08, 2005 - 9:14 pm UTC

How exactly we should deal with NOT IN type queries
say for example
table t
a number
b number
c number
d varchr2(1)
I issue
select * from t
where nvl(a,0) not in (3,4,5)
and b =5
and c=10
table t has 10 million rows and the first condition being very selective..meaning nvl(a,0) not in (3,4,5) alone filters out 9 million rows and coupled with b=5 the output just 60,000 rows? how should i create the index??
Thanks


Tom Kyte
September 08, 2005 - 9:35 pm UTC

b,c,a would seem to make the most sense for this one, if the goal was to make this query "as fast as possible"


well, b,c,a,d -- but then the index would likely be larger than the table itself.


depends on the nature of A as well, A is number? or int?

Thanks

A reader, September 08, 2005 - 9:44 pm UTC

Thanks for the reply..
But b is not very selective...there are around 6 million rows with b=5 and with c=10 around 5 million..the important field is a...a is a number but can be null.....but i heard indexes will not be used if a NOT IN condition or NVL(function) is used??


Tom Kyte
September 09, 2005 - 6:53 am UTC

that is why you need to tell us MORE about a. is a value of 3.14 valid for a or is a really a number(38) or what.

tell us more about the query in general -- is is 3,4,5 always, or is is variable?


How index works in COMPOSITE keys

Bhavesh Ghodasara, September 09, 2005 - 8:43 am UTC

hi tom,
I have a table say tbl_personal
the fields are :
unit
location
category
empno
.
.
.
and i have composite primary key on
(unit,location,category,empno)
it means they have index by default
now i want to know how they behave??
I mean once I heard that when we using Like it breaks the index.
i dont know how??
e.g. suppose i want to search for unit = 01 and location =all
then query like unit=:p_unit and location like '%'
any idea is it good or not??
Thanks in advance..
(dont say to read database concepts ...because i am going to read it after some days but please just provide overview)

Tom Kyte
September 09, 2005 - 9:13 am UTC

"breaks the index"?


unit = ? and location like '%' would be better written as

unit = ? and location is not null, but actually since location is part of the primary key just code:

unit = ?


but, even so -- unit = ? and location like '%' can use the index.

Clarification

Bhavesh Ghodasara, September 09, 2005 - 9:25 am UTC

hi tom,
thanks for your reply,
my question is that:
index(unit,location,category,number)
now
query-1) unit=01 and location=01 and category=01 and emp_number='0001' works fine OK
query-2) unit=01 and location=01 and category=01 and
emp_number like % may works fine(dont know though because dont have rights to tkprof output)
query-3) unit=01 and location is like '%' and category =01
and emp_number='0001'
Is query-3 use index ??
and how this query behave???
Bhavesh


Tom Kyte
September 09, 2005 - 9:38 am UTC

they all work *fine*


they all can use the index if appropriate.


query-3 will use the index likely, yes. it will look at all of the unit-01 entries in it and see of the rest of the criteria is satisfied and if so - goto the table to get the rest of the row.

A reader, September 09, 2005 - 9:45 am UTC

"that is why you need to tell us MORE about a. is a value of 3.14 valid for a or
is a really a number(38) or what.

tell us more about the query in general -- is is 3,4,5 always, or is is
variable?
""

--
A is actually a number..always single digit..0- to 10
Thanks


Tom Kyte
September 09, 2005 - 11:04 am UTC

A number is 3.14

An integer is 1,2,3,4

A single digit? (10?)

can you be more precise? is it possible that this A has a very small set of discrete values (like 10 of them plus null?)

Likely yes????

Bhavesh Ghodasara, September 09, 2005 - 9:50 am UTC

hi tom,
what do you mean by likely yes??
i want clarification...because i have to convince it to others too..so pls describe in brief..
Bhavesh

Tom Kyte
September 09, 2005 - 11:05 am UTC

the optmizer will decide.

It would likely use the index if the resulting computed cardinality is OK with it.

Bhavesh Ghodasara, September 09, 2005 - 10:03 am UTC

whats about unit like '%'and location='01' and category='01' and empno='0001'
Bhavesh

Tom Kyte
September 09, 2005 - 11:10 am UTC

that would likely not use the index since it would have to inspect every index entry. It *might* fast full scan the index to get the rowids, but it is unlikely to index range scan it.

A reader, September 09, 2005 - 1:06 pm UTC

"A number is 3.14

An integer is 1,2,3,4

A single digit? (10?)

can you be more precise? is it possible that this A has a very small set of
discrete values (like 10 of them plus null?)"

-- Sorry about the confusion..
A -- can be only be in (1,2,3,4,5,6,7,8,9) or NULL

Out of the 10 million
A with value 1 (just 10,000)
A with value 2( Just 15,000)
A with values 3,4,5 ?(Close to 8 million)
A with NULLS close to (80,000)
A with values 6,7,8,9 (around 30,000)


Tom Kyte
September 09, 2005 - 1:13 pm UTC

ops$tkyte@ORA10G> create table t ( a int, b int, c int, d char(2000) );
 
Table created.
 
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'T', numrows => 10000000, numblks => 10000);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> create index t_idx on t(a,b,c);
 
Index created.
 
ops$tkyte@ORA10G> exec dbms_stats.set_index_stats( user, 'T_IDX', numrows => 10000000, NUMLBLKS=>1000);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> select * from t
  2  where a in (1,2,6,7,8,9)
  3    and b = 10
  4    and c = 5
  5   union all
  6  select * from t
  7  where a is null
  8    and b = 10
  9    and c = 5;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=51 Card=60 Bytes=6000)
   1    0   UNION-ALL
   2    1     INLIST ITERATOR
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=1 Card=10 Bytes=1000)
   4    3         INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=1)
   5    1     TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=50 Card=50 Bytes=5000)
   6    5       INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=50 Card=500000)
 
 
 
ops$tkyte@ORA10G> set autotrace off
 

column sequence in index

A reader, May 29, 2012 - 1:30 am UTC

Hi tom,

I tried a test with below:
table is 50000 rows, b is more selective with value from 1-5
index(a,b) with a>=10 and b =5
index(b,a) with a>=10 and b =5

I saw below differences.
1. consistent gets (975<-> 873)
2. CPU cost (37 <-> 34)
I do not think there is significant differences in terms of performance. Any idea?

3. index access method (index fast full scan <-> index range scan)
Here is the key point you want to tell us in this thread, right?
index(a,b) will go through the whole index, while index(b,a) just locate those b=5 and then do range scan?

4. filter/access ( filter("B"=5 AND "A">=10) <-> access("B"=5 AND "A">=10 AND "A" IS NOT NULL))
why here one is filter and one is access? and why there is extra 'AND "A" IS NOT NULL'?

below is my test sqls:

create table test(a int, b int, c int);
insert into test select rownum,ceil(dbms_random.value(1,5)), rownum from dual connect by level<=50000;
create index ind1 on test(a,b);
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'TEST');
select a,b from test where a>=10 and b = 5;

Execution Plan
----------------------------------------------------------
Plan hash value: 4126477464

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12498 | 99984 | 37 (0)| 00:00:01 |
|* 1 | INDEX FAST FULL SCAN| IND1 | 12498 | 99984 | 37 (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("B"=5 AND "A">=10)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
975 consistent gets
0 physical reads
0 redo size
230019 bytes sent via SQL*Net to client
9637 bytes received via SQL*Net from client
840 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12575 rows processed

drop index ind1;
create index ind2 on test(b,a);
alter system flush buffer_cache;
exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'TEST');
select a,b from test where a>=10 and b = 5;
Execution Plan
----------------------------------------------------------
Plan hash value: 1519675770

-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12456 | 99648 | 34 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| IND2 | 12456 | 99648 | 34 (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("B"=5 AND "A">=10 AND "A" IS NOT NULL)


Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
873 consistent gets
0 physical reads
0 redo size
230019 bytes sent via SQL*Net to client
9637 bytes received via SQL*Net from client
840 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12575 rows processed

Tom Kyte
May 29, 2012 - 6:35 am UTC

1, 2) it depends on the nature of your data. think about it, close you eyes and IMAGINE the work that needs to be performed in each case (and if you cannot do that, study indexes a bit more until you can - seriously, unless you can visualize them, you'll never be able to optimally use them!)

ops$tkyte%ORA11GR2> /*
ops$tkyte%ORA11GR2> drop table t;
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create table t
ops$tkyte%ORA11GR2> as
ops$tkyte%ORA11GR2> select case when mod(rownum,6) = 5 then -rownum else rownum end a, mod(rownum,6) b from all_objects;
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create index t_ab on t(a,b);
ops$tkyte%ORA11GR2> create index t_ba on t(b,a);
ops$tkyte%ORA11GR2> */
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace traceonly
ops$tkyte%ORA11GR2> select /*+ index( t t_ab ) */ * from t where a >= 10 and b = 5;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 840535400

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     2 |    52 |   173   (1)| 00:00:03 |
|*  1 |  INDEX RANGE SCAN| T_AB |     2 |    52 |   173   (1)| 00:00:03 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A">=10 AND "B"=5 AND "A" IS NOT NULL)
       filter("B"=5)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        160  consistent gets
          0  physical reads
          0  redo size
        336  bytes sent via SQL*Net to client
        409  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

ops$tkyte%ORA11GR2> select /*+ index( t t_ba ) */ * from t where a >= 10 and b = 5;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1836128108

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     2 |    52 |     2   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| T_BA |     2 |    52 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("B"=5 AND "A">=10 AND "A" IS NOT NULL)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        336  bytes sent via SQL*Net to client
        409  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

ops$tkyte%ORA11GR2> set autotrace off




There will never be a case where the index on (a,b) is 'better' than (b,a) in this case (where a >= ? and b = ?) and in many cases inferior to (b,a)

there are many cases where the index on (b,a) will be superior to the index on (a,b) for this predicate and none where it is inferior.


So tell me, would you like a solution that is never inferior and likely superior - or one that is never superior and likely to be inferior?

3. index access method (index fast full scan <-> index range scan)
Here is the key point you want to tell us in this thread, right?
index(a,b) will go through the whole index, while index(b,a) just locate
those b=5 and then do range scan?


I never said that did I?

the index on (a,b) will be a range scan in many cases - it was a range scan here in my example.

the index on (a,b) will have to, in general, inspect many more leaf key entries than the index on (b,a) would.

4. filter/access ( filter("B"=5 AND "A">=10) <-> access("B"=5 AND "A">=10 AND
"A" IS NOT NULL))
why here one is filter and one is access? and why there is extra 'AND "A" IS
NOT NULL'?


because in your case, it decided to use the index as if it were a table - and it read the data unsorted using a fast full scan. So, it filtered the data rather than using the index to access the data sorted. it applied the filter to every single row, it did not use the index structure to reduce the number of rows.