Team,
Why don't oracle use index in this case, 10053 trace shows that Oracle don't even consider costing the index, just cost the Full table scan and picks that one as the final cost. is that NOT-IN always biased towards FTS?
rajesh@ORA12C> create table t as
2 select a.*,
3 decode(rownum,1,1,99) x
4 from all_objects a;
Table created.
rajesh@ORA12C>
rajesh@ORA12C> create index t_idx on t(x);
Index created.
rajesh@ORA12C>
rajesh@ORA12C> begin
2 dbms_stats.gather_table_stats
3 (ownname=> user,
4 tabname=> 'T',
5 method_opt=>'for all indexed columns size 5') ;
6 end;
7 /
PL/SQL procedure successfully completed.
rajesh@ORA12C> set autotrace traceonly explain
rajesh@ORA12C> select min(object_id) from t where x <> 99;
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 432 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
|* 2 | TABLE ACCESS FULL| T | 2 | 16 | 432 (1)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("X"<>99)
10053 trace contentsAccess path analysis for T
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T[T]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
Column (#19):
NewDensity:0.000006, OldDensity:0.000006 BktCnt:89955.000000, PopBktCnt:89954.000000, PopValCnt:1, NDV:2
Column (#19): X(NUMBER)
AvgLen: 3 NDV: 2 Nulls: 0 Density: 0.000006 Min: 0.000000 Max: 1.000000
Histogram: Freq #Bkts: 2 UncompBkts: 89955 EndPtVals: 2 ActualVal: yes
Table: T Alias: T
Card: Original: 89955.000000 Rounded: 2 Computed: 1.500000 Non Adjusted: 1.500000
Scan IO Cost (Disk) = 430.000000
Scan CPU Cost (Disk) = 57157410.960000
Total Scan IO Cost = 430.000000 (scan (Disk))
+ 0.000000 (io filter eval) (= 0.000000 (per row) * 89955.000000 (#rows))
= 430.000000
Total Scan CPU Cost = 57157410.960000 (scan (Disk))
+ 4497750.000000 (cpu filter eval) (= 50.000000 (per row) * 89955.000000 (#rows))
= 61655160.960000
Access Path: TableScan
Cost: 431.605050 Resp: 431.605050 Degree: 0
Cost_io: 430.000000 Cost_cpu: 61655161
Resp_io: 430.000000 Resp_cpu: 61655161
Best:: AccessPath: TableScan
Cost: 431.605050 Degree: 1 Resp: 431.605050 Card: 1.500000 Bytes: 0.000000
***************************************
The index path is indeed *available* but the optimizer doesn't go for it unless prompted
eg
SQL> create table T as
2 select d.*,
3 case when rownum < 20 then object_id end low_freq_val
4 from dba_objects d,
5 ( select 1 from dual connect by level <= 50 )
6 /
Table created.
SQL> create index IX on T ( low_freq_val );
Index created.
So there's only 20 rows that have a 'low_freq_val' present, making it a tiny tiny index. But its not picked
SQL> explain plan for
2 select * from T
3 where low_freq_val != 12;
Explained.
SQL> @exp
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 2088 | 22058 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 18 | 2088 | 22058 (1)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LOW_FREQ_VAL"<>12)
13 rows selected.
The standard range scan is available obviously
SQL> explain plan for
2 select * from T
3 where low_freq_val < 12;
Explained.
SQL> @exp
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 464 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 4 | 464 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX | 4 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("LOW_FREQ_VAL"<12)
14 rows selected.
Yet, we cant coerce the optimizer using a concat because the OR is transformed to a "!=" anyway.
SQL> explain plan for
2 select /*+ use_concat */ * from T
3 where low_freq_val < 12 or low_freq_val > 12;
Explained.
SQL> @exp
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 2088 | 22058 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 18 | 2088 | 22058 (1)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LOW_FREQ_VAL"<>12)
13 rows selected.
We can however demand the index by used
SQL> explain plan for
2 select /*+ index(t) */ * from T
3 where low_freq_val != 12 ;
Explained.
SQL> @exp
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 18 | 2088 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 18 | 2088 | 2 (0)| 00:00:01 |
|* 2 | INDEX FULL SCAN | IX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("LOW_FREQ_VAL"<>12)
Its just one of those things. My hypothesis is that the optimizer doesnt make inferences about values based on subtraction, ie, "I've got 'n' values in total, and 'm' values of a specific case, thus there is n-m values representing the rest". So (my hypothesis) is that it knows there are 20 values in the index, and 'n' values are '12' (where 'n' depends on histogram presence), but it does not come to conclusion that there hence is 20-n potential index keys to scan.