You Asked

Hopefully this is an easy one. I would like to know why an index is ignored when the SELECT clause contains multiple columns. It was my understanding that the index points to the position of the row so that whether you select one column or ten should not matter (assuming the row is not chained).

In a nutshell, why does this statement force a full table scan:

select *
from my_table
where my_indexed_column = x;

While this uses the index:

select column1
from my_table
where my_indexed_column = x;

and Tom said...

in a nutshell - it doesn't force a full scan.

probably - the second query only used the index - that is, the index was on (my_indexed_column,column1). The table was avoided.

So, I can give you examples whereby the index will be used, the index will not be used - "it depends".

Also, the index was not ignored, it was decided that "the index would not be the efficient way to process the data"

So, I'll guess (no test case :( :( :( no example to work from :( )

column1 was in the index

the other columns selected with * was not

the clustering factor of the index was rather 'poor' (near the number of rows in the table, not the number of blocks)

that would inhibit large range scans that need to hip hop from index to table - since each row we were to retrieve is presumed to be on a block different from any other block in the table we had already accessed (eg: we'd be reading a lot of the table, a block at a time)

For example:

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select *
  4    from all_objects
  5   order by dbms_random.random;
Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(owner, object_name);
Index created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100, method_opt => 'for columns owner size 254' );
PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select table_name, blocks, num_rows,
  2         (select clustering_factor from user_indexes where index_name = 'T_IDX') cf
  3    from user_tables where table_name = 'T';

TABLE_NAME                         BLOCKS   NUM_ROWS         CF
------------------------------ ---------- ---------- ----------
T                                     704      49741      49651

<b> clustering factor is near the number of rows in the table, not blocks.  Therefore, the optimizer knows that
if you do a large range scan, it will hit LOTS of blocks - the data we want to get is scattered all over the place.

That is why I ordered by random above, to get the data spread out, disorganized</b>

ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select owner, object_name from t where owner = 'WMSYS';

Execution Plan
----------------------------------------------------------
Plan hash value: 2946670127

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |   232 |  5336 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| T_IDX |   232 |  5336 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

ops$tkyte%ORA10GR2> select owner, object_name, object_type from t where owner = 'WMSYS';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   232 |  7888 |   222   (3)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    |   232 |  7888 |   222   (3)| 00:00:02 |
--------------------------------------------------------------------------

It did the right thing here, the full scan is cheaper (just, the index plan would cost about 235/240 - because it would do some IO's thru the index and 232 against the table)..

So, when we need to hit the table, it becomes too expensive to use the index.

If the SAME EXACT DATA were stored just a little differently on disk, things might be very different, for example:

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select *
  4    from all_objects
  5   order by owner, object_name;
Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(owner, object_name);
Index created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100, method_opt => 'for columns owner size 254' );
PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select table_name, blocks, num_rows,
  2         (select clustering_factor from user_indexes where index_name = 'T_IDX') cf
  3    from user_tables where table_name = 'T';

TABLE_NAME                         BLOCKS   NUM_ROWS         CF
------------------------------ ---------- ---------- ----------
T                                     704      49743        684

ops$tkyte%ORA10GR2> select owner, object_name from t where owner = 'WMSYS';
--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |   232 |  5336 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| T_IDX |   232 |  5336 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

ops$tkyte%ORA10GR2> select owner, object_name, object_type from t where owner = 'WMSYS';
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   232 |  7888 |     7   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   232 |  7888 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |   232 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

since the data we wanted was easy to retrieve from the table (it was all right next to each other) it used the index - the cost was very low. It believes it'll hit about 4 blocks (since about 73 rows/block fit in my 8k block database - this is right) and the cost is very low.

So, you started with "hopefully this is an easy one". It is actually quite complex

Rating

(2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

You are Great

Sharad Verma, November 12, 2008 - 11:37 pm UTC

Tom - You are great, wonderful explaination.

Re:Index ignored when selecting multiple columns

Scott, November 14, 2008 - 5:02 pm UTC

A similar very nice explanation ....

http://ravivedala.blogspot.com/2008/11/how-oracle-decides-whether-to-use-index.html

November 18, 2008 - 6:25 pm UTC

Scott/Dave/Ravi/A Reader (you've used them all, all of those names, to point to this blog and say "what a great answer over here"...)

It is very nice you like to point to this blog - over and over - however.........

I would like to point out that (quoted from the article referenced):

...Clustering_factor offers information on how the table rows are synchronized with the index. When the clustering factor is close to the number of data blocks and the column value is not row ordered when the clustering_factor approaches the number of rows in the table, the table rows are synchronized with the index. ...

is backwards.

When the clustering factor is NEAR the number of blocks, the table data is well sorted by the index key.

When the clustering factor is NEAR the number of rows, the table data is NOT ordered by the index key.

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

You are Great

Re:Index ignored when selecting multiple columns

More to Explore

PL/SQL demos

PL/SQL docs