Tom, could you please elaborate on "compress index" feature
Andrew, May 09, 2002 - 1:44 pm UTC
I always find something new in your answers. This time "skip index" (explained very well in your previous link) and "compressed index". Could you please elaborate on this compressed index feature, how it works and how it is related to the "most selective" column. I probably shouldn't ask this, because it is probably in the docs, but, Tom, you have such a talent to make it clear....
May 09, 2002 - 7:43 pm UTC
Better yet -- I've explained it in great deatil in my book in the chapter on indexes! I explained tons of stuff in there.
column order - oracle docs
kapil, May 22, 2002 - 10:57 am UTC
Hi Tom,
You have said that since version 6 it is not better to have the most selective column in the leading position
But in the oracle documentation - Oracle8i Designing and Tuning for Performance - Data Access Methods
the following statement is mentioned
"If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance"
Could you throw some light on that statement?
Kapil
May 22, 2002 - 10:54 pm UTC
it is in my experience -- a mistatement on their part. I would say it is inaccurate.
In my book -- I actually benchmark it:
tkyte@TKYTE816> create table t
2 nologging
3 as
4 select * from all_objects;
Table created.
tkyte@TKYTE816> create index t_idx_1 on t(owner,object_type,object_name)
2 nologging pctfree 0;
Index created.
tkyte@TKYTE816> create index t_idx_2 on t(object_name,object_type,owner)
2 nologging pctfree 0;
Index created.
tkyte@TKYTE816> select count(distinct owner), count(distinct object_type),
2 count(distinct object_name ), count(*)
3 from t;
(DISTINCTOWNER) (DISTINCTOBJECT_TYPE) (DISTINCTOBJECT_NAME) COUNT(*)
--------------- --------------------- --------------------- ----------
24 23 12265 21975
Now, to show that neither is more efficient space wise, we?ll measure their space utilization:
tkyte@TKYTE816> exec show_space( 'T_IDX_1', user, 'INDEX' );
Free Blocks.............................0
Total Blocks............................192
Total Bytes.............................1572864
Unused Blocks...........................51
Unused Bytes............................417792
Last Used Ext FileId....................6
Last Used Ext BlockId...................4745
Last Used Block.........................13
PL/SQL procedure successfully completed.
tkyte@TKYTE816> exec show_space( 'T_IDX_2', user, 'INDEX' );
Free Blocks.............................0
Total Blocks............................192
Total Bytes.............................1572864
Unused Blocks...........................51
Unused Bytes............................417792
Last Used Ext FileId....................6
Last Used Ext BlockId...................4937
Last Used Block.........................13
PL/SQL procedure successfully completed.
They use exactly the same amount of space ? there are no differences there (however, the first index is a lot more COMPRESSABLE if we used index key compression! There is an argument for going from least discriminating to most discriminating). Now, lets see how they perform ? if either index is generally more efficient than the other. In order to test that, I used a PLSQL block with hinted queries (so as to use one index or the other) like this:
tkyte@TKYTE816> alter session set sql_trace=true;
Session altered.
tkyte@TKYTE816> declare
2 cnt int;
3 begin
4 for x in ( select owner, object_type, object_name from t )
5 loop
6 select /*+ INDEX( t t_idx_1 ) */ count(*) into cnt
7 from t
8 where object_name = x.object_name
9 and object_type = x.object_type
10 and owner = x.owner;
11
12 select /*+ INDEX( t t_idx_2 ) */ count(*) into cnt
13 from t
14 where object_name = x.object_name
15 and object_type = x.object_type
16 and owner = x.owner;
17 end loop;
18 end;
19 /
PL/SQL procedure successfully completed.
That read ever single row in the index one at a time. The TKPROF report shows us:
SELECT /*+ INDEX( t t_idx_1 ) */COUNT(*)
FROM
T WHERE OBJECT_NAME = :b1 AND OBJECT_TYPE = :b2 AND OWNER = :b3
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ----- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 21975 2.35 2.55 0 0 0 0
Fetch 21975 1.40 1.57 0 44088 0 21975
------- ------ -------- ---------- ----- ---------- ---------- ----------
total 43951 3.75 4.12 0 44088 0 21975
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
21975 SORT (AGGREGATE)
21975 INDEX (RANGE SCAN) OF 'T_IDX_1' (NON-UNIQUE)
********************************************************************************
SELECT /*+ INDEX( t t_idx_2 ) */COUNT(*)
FROM
T WHERE OBJECT_NAME = :b1 AND OBJECT_TYPE = :b2 AND OWNER = :b3
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ----- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 21975 2.10 2.44 0 0 0 0
Fetch 21975 1.65 1.60 0 44088 0 21975
------- ------ -------- ---------- ----- ---------- ---------- ----------
total 43951 3.75 4.04 0 44088 0 21975
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
21975 SORT (AGGREGATE)
21975 INDEX (RANGE SCAN) OF 'T_IDX_2' (NON-UNIQUE)
They processed the same exact number of rows, blocks, used equivalent amounts of CPU time and ran in about the same elapsed time (run this same test again and the CPU/ELAPSED numbers will be a little different but on average ? they will be the same). There are no inherit efficiencies to be had ordering the columns in order how discriminating they are.
comparison
kapil, May 24, 2002 - 10:27 am UTC
Hi Tom
That was helpful indeed.
Well the only thing which could be added to the above analysis is comparing the space and performance with the same index in the compressed form.
Would it be possible for you to draw that comparison?
Also, could you share the code of the PL/SQL block "show_space"?
May 24, 2002 - 11:21 am UTC
simple search for show_space on this site will find the code (its in the book as well)
Sort of thought with the above -- anyone would be able to do the compress example if interested.... go for it.
thanks
Kapil, May 27, 2002 - 10:34 am UTC
Hi Tom,
No problem
Thanks for all your help
Sushanta, March 23, 2004 - 6:29 am UTC
Hi Tom,
I have a problem in the Index Picking.
#----------------------------I have a table
SQL> desc check_008
Name Null? Type
----------------------------------------- -------- ---------------
A NUMBER
B NUMBER
C VARCHAR2(2000)
#----------------------------Number of Records present in the table
SQL> SELECT COUNT(*)
2 FROM check_008
3 /
COUNT(*)
----------
99001
#----------------------------I have a index on the table on column (A,B)
SQL> l
1 SELECT index_name,SUBSTR(column_name,1,20),column_position
2 FROM all_ind_columns
3* WHERE table_name='CHECK_008'
SQL> /
INDEX_NAME SUBSTR(COLUMN_NAME,1 COLUMN_POSITION
------------------------------ -------------------- ---------------
DEL_001 B 2
DEL_001 A 1
SQL> set autotrace traceonly exp
SQL> DELETE FROM check_008
2 WHERE (a=10 OR 10 IS NULL)
3 AND (b=2 OR 2 IS NULL)
4 AND (c='23-MAR-049218' OR '23-MAR-049218' IS NULL)
5 /
1 row deleted.
Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=19)
1 0 DELETE OF 'CHECK_008'
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'CHECK_008' (Cost=14 Card=1 Bytes=19)
3 2 INDEX (RANGE SCAN) OF 'DEL_001' (NON-UNIQUE) (Cost=7 Card=1)
Then i create a procedure with the same Delete statement like this
SQL>CREATE OR REPLACE PROCEDURE pr_without_nvl(p_a IN NUMBER,p_b IN NUMBER,p_c IN VARCHAR2,p_rows_affected OUT NUMBER)
IS
BEGIN
DELETE FROM check_008
WHERE (a = p_a OR p_a IS NULL)
AND (b = p_b OR p_b IS NULL)
AND (c = p_c OR p_c IS NULL);
p_rows_affected := SQL%ROWCOUNT;
END;
/
Procedure created.
SQL> alter session set sql_trace=true;
Session altered.
SQL> alter session set timed_statistics=true;
Session altered.
SQL> variable delete_count NUMBER;
SQL> exec pr_without_nvl(10,2,'23-MAR-049218',:delete_count);
PL/SQL procedure successfully completed.
SQL> print :delete_count
DELETE_COUNT
------------
1
By seeing the Tkprof File it looks like
DELETE FROM CHECK_008
WHERE
(A = :b1 OR :b1 IS NULL ) AND (B = :b3 OR :b3 IS NULL ) AND (C = :b5 OR
:b5 IS NULL )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.00 0 70 0 0
Execute 1 0.11 0.16 206 831 24 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.13 0.16 206 901 24 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 46 (ATLAS) (recursive depth: 1)
Rows Execution Plan
------- ---------------------------------------------------
0 DELETE STATEMENT GOAL: CHOOSE
0 DELETE OF 'CHECK_008'
0 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'CHECK_008'
Where the Plan is shows as FULL Table Scan ........................????
However from this it is picking Index.
SQL> DELETE FROM check_008
2 WHERE (a=10 OR 10 IS NULL)
3 AND (b=2 OR 2 IS NULL)
4 AND (c='23-MAR-049218' OR '23-MAR-049218' IS NULL)
5 /
1 row deleted.
Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=14 Card=1 Bytes=19)
1 0 DELETE OF 'CHECK_008'
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'CHECK_008' (Cost=14 Card=1 Bytes=19)
3 2 INDEX (RANGE SCAN) OF 'DEL_001' (NON-UNIQUE) (Cost=7 Card=1)
My Point is is it better to change the Query from
DELETE check_008
WHERE a = NVL(p_a,a)
AND b = NVL(p_b,b)
AND c = NVL(p_c,c);
to
DELETE check_008
WHERE (a = p_a OR p_a IS NULL)
AND (b = p_b OR p_b IS NULL)
AND (c = p_c OR p_c IS NULL)
will increase performance when there is a composite index on a and b.
Also why there is a deviation in the Plan, from the SQL*PLUS to Procedure .
Thanks And Best Regards
Sushanta
March 23, 2004 - 7:26 am UTC
when you use bind variables -- think about it...
sometimes A is provided. sometimes not.
sometimes B is provided. sometimes not.
when neither of A or B is provided -- full scan
when A is not provided -- full scan (or index skip scan but probable full scan)
when A is provided -- possible to use index
but -- no one knows IF or WHEN A would be provided.....
in the example with constants, it is well known that A is provided (try your sqlplus example with binds and you'll get the same result)
there is a deviation in the plan because you are comparing APPLES to Toaster ovens and there is no possible comparision there....
Now, 100,000 rows is pretty tiny, so if this delete is not executed very often, what you have is probably fine. Else you can
l_string := 'delete from t ';
if ( p_a is not null )
then
l_string := l_string || ' where a = :a ';
else
l_string := l_string || ' where :a is null ';
end if;
if ( p_b is not null )
then
l_string := l_string || ' and b = :b ';
else
l_string := l_string || ' and :b is null ';
end if;
if ( p_c is not null )
then
l_string := l_string || ' and c = :c ';
else
l_string := l_string || ' and :c is null ';
end if;
execute immediate l_string using p_a, p_b, p_c;
if A is provided, index.
if A is not provided, probable full scan
A reader, March 25, 2004 - 5:39 am UTC
Hi Tom,
Many thanks for ur input. One thing to clear and it will overall clear on this concept
When i try for this one in SQL*PLUS
SQL> variable b1 NUMBER
SQL> variable b2 NUMBER
SQL> variable b3 VARCHAR2
STAGE - I
=========
SQL> DELETE FROM CHECK_008
WHERE
(A = :b1 ) AND (B = :b2 OR :b2 IS NULL ) AND (C = :b3 OR :b3 IS NULL ) 2 3
4 /
0 rows deleted.
Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=2028)
1 0 DELETE OF 'CHECK_008'
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'CHECK_008' (Cost=1 Car
d=1 Bytes=2028)
3 2 INDEX (RANGE SCAN) OF 'DEL_008' (NON-UNIQUE)
STAGE-II
========
SQL> DELETE FROM CHECK_008
WHERE
(A = :b1 OR :b1 IS NULL) AND (B = :b2 OR :b2 IS NULL ) AND (C = :b3 OR :b3 IS NULL ) 2 3
4 /
0 rows deleted.
Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=2028)
1 0 DELETE OF 'CHECK_008'
2 1 TABLE ACCESS (FULL) OF 'CHECK_008' (Cost=1 Card=1 Bytes=
2028)
Colud you please clear on this that
In case of STAGE-I, ORACLE-OPTIMIZER knows that the value of the bind variable(:b1) must come,whereas in case
of STAGE-II, ORACLE-OPTIMIZER is in "Doubt!!" , whether the value of the bind variable (:b1) will come or not,
for that reason it will go for "FULL TABLE SCAN", without taking any RISK.
Is it so ??
Also how ORACLE generates PLAN in case of "OR" Operator ?
Many Thanks ...
Sushanta
March 25, 2004 - 9:16 am UTC
the plan is set up ONCE and must be capable of doing ANY set of inputs, not just yours.
Hence, sometimes b1 could be null and sometimes not. therefore -- well, it has to do what it has to do (full scan)
first query -- you can always use an index.
second query -- you can use one when b1 is not null, you cannot when b1 is null.
sushanta, March 26, 2004 - 2:22 am UTC
Tom,
Thanks a Lot Tom for ur valuable Time.
Regards
Sushanta
Whether or not to use index
Sanji, September 12, 2007 - 5:28 pm UTC
Tom,
I have a doubt regarding index selectivity.
There are over 37 million records in a table and there is an index on a column which has only 17 distinct rows.
select count(distinct ovrtyp_id) from override; ===> 17 rows selected.
The statistics have been collected for all columns, size auto. Following is the data distribution for this column.
MIN(OVRTYP_ID) MAX(OVRTYP_ID) COUNT(OVRTYP_ID) WB
-------------- -------------- ---------------- ----------
102 108 783925 1
199 204 202747 2
300 300 32292746 4
401 450 177653 6
452 501 299539 7
700 700 402911 11
800 800 145645 13
900 900 256526 14
1100 1100 3303698 18
1206 1206 42410 19
The question is, is there a need for this index, theoretically ? The column has only 17 distinct rows.
Now here's the test case. When we run the following query, with the index in place, it takes over 3 minutes to execute
select ovr_status from OVERRIDE
where OVR_START_DATE='08-sep-2007'
and WBU_NAME='HR_REFRESH'
and ovrtyp_id=700
and substr(ovr_old_value,2,10)='EMP_PAYGRP'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 703 6.00 196.51 122681 137319 0 10522
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 705 6.00 196.52 122681 137319 0 10522
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 20
Rows Row Source Operation
------- ---------------------------------------------------
10522 TABLE ACCESS BY INDEX ROWID OVERRIDE
402911 INDEX RANGE SCAN IDX_OVR_OVRTYP_ID (object id 3864)
If i drop the index, reanalyze the table(all columns size auto) and reexecute the same query, it continues to run for over an hour.
From the explain plan, it does show that 402911 records out of 37 million records, would require an index for faster selectivity, but the index which is being used is only on 17 distinct records.
Why is the index efficient in this case ?
Thanks
Sanji
Sorry, typo
Sanji, September 13, 2007 - 9:31 am UTC
It's actually 17 distinct values for the field ovrtyp_id.
Question is, why would an index be required on a table with over 37 million records and the index key selectivity being only 17.
Thanks
Sanji
September 15, 2007 - 7:31 pm UTC
maybe the values 1..16 each point to a single row
and the value of 17 points to 37,000,000-16 rows.
so where ovrtyp_id = 1 (thru 16) is very selective.
this index *might* be useful.