Skip to Main Content
  • Questions
  • no index(range) scan on NOT IN operator

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Rajeshwaran.

Asked: January 03, 2016 - 11:57 am UTC

Last updated: January 05, 2016 - 7:00 am UTC

Version: 12.1.0.2

Viewed 1000+ times

You Asked

Team,

Why don't oracle use index in this case, 10053 trace shows that Oracle don't even consider costing the index, just cost the Full table scan and picks that one as the final cost. is that NOT-IN always biased towards FTS?

rajesh@ORA12C> create table t as
  2  select a.*,
  3     decode(rownum,1,1,99) x
  4  from all_objects a;

Table created.

rajesh@ORA12C>
rajesh@ORA12C> create index t_idx on t(x);

Index created.

rajesh@ORA12C>
rajesh@ORA12C> begin
  2     dbms_stats.gather_table_stats
  3     (ownname=> user,
  4      tabname=> 'T',
  5      method_opt=>'for all indexed columns size 5') ;
  6  end;
  7  /

PL/SQL procedure successfully completed.

rajesh@ORA12C> set autotrace traceonly explain
rajesh@ORA12C> select min(object_id) from t where x <> 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |   432   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |     2 |    16 |   432   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("X"<>99)


10053 trace contents

Access path analysis for T
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for T[T] 
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#19): 
    NewDensity:0.000006, OldDensity:0.000006 BktCnt:89955.000000, PopBktCnt:89954.000000, PopValCnt:1, NDV:2
  Column (#19): X(NUMBER)
    AvgLen: 3 NDV: 2 Nulls: 0 Density: 0.000006 Min: 0.000000 Max: 1.000000
    Histogram: Freq  #Bkts: 2  UncompBkts: 89955  EndPtVals: 2  ActualVal: yes
  Table: T  Alias: T
    Card: Original: 89955.000000  Rounded: 2  Computed: 1.500000  Non Adjusted: 1.500000
  Scan IO  Cost (Disk) =   430.000000
  Scan CPU Cost (Disk) =   57157410.960000
  Total Scan IO  Cost  =   430.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 89955.000000 (#rows))
                       =   430.000000
  Total Scan CPU  Cost =   57157410.960000 (scan (Disk))
                         + 4497750.000000 (cpu filter eval) (= 50.000000 (per row) * 89955.000000 (#rows))
                       =   61655160.960000
  Access Path: TableScan
    Cost:  431.605050  Resp: 431.605050  Degree: 0
      Cost_io: 430.000000  Cost_cpu: 61655161
      Resp_io: 430.000000  Resp_cpu: 61655161
  Best:: AccessPath: TableScan
         Cost: 431.605050  Degree: 1  Resp: 431.605050  Card: 1.500000  Bytes: 0.000000

***************************************

and Connor said...

The index path is indeed *available* but the optimizer doesn't go for it unless prompted
eg

SQL> create table T as
  2  select d.*,
  3         case when rownum < 20 then object_id end low_freq_val
  4  from dba_objects d,
  5    ( select 1 from dual connect by level <= 50 )
  6  /

Table created.

SQL> create index IX on T ( low_freq_val );

Index created.

So there's only 20 rows that have a 'low_freq_val' present, making it a tiny tiny index. But its not picked

SQL> explain plan for
  2  select * from T
  3  where low_freq_val != 12;

Explained.

SQL> @exp

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    18 |  2088 | 22058   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    18 |  2088 | 22058   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("LOW_FREQ_VAL"<>12)

13 rows selected.


The standard range scan is available obviously

SQL> explain plan for
  2  select * from T
  3  where low_freq_val < 12;

Explained.

SQL> @exp

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |     4 |   464 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |     4 |   464 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX   |     4 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOW_FREQ_VAL"<12)

14 rows selected.


Yet, we cant coerce the optimizer using a concat because the OR is transformed to a "!=" anyway.

SQL> explain plan for
  2  select /*+ use_concat */ * from T
  3  where low_freq_val < 12 or low_freq_val > 12;

Explained.

SQL> @exp

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    18 |  2088 | 22058   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    18 |  2088 | 22058   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("LOW_FREQ_VAL"<>12)

13 rows selected.


We can however demand the index by used

SQL> explain plan for
  2  select /*+ index(t) */ * from T
  3  where low_freq_val != 12 ;

Explained.

SQL> @exp

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |    18 |  2088 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |    18 |  2088 |     2   (0)| 00:00:01 |
|*  2 |   INDEX FULL SCAN                   | IX   |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LOW_FREQ_VAL"<>12)



Its just one of those things. My hypothesis is that the optimizer doesnt make inferences about values based on subtraction, ie, "I've got 'n' values in total, and 'm' values of a specific case, thus there is n-m values representing the rest". So (my hypothesis) is that it knows there are 20 values in the index, and 'n' values are '12' (where 'n' depends on histogram presence), but it does not come to conclusion that there hence is 20-n potential index keys to scan.

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Index Full scan

Rajeshwaran Jeyabal, January 04, 2016 - 6:41 am UTC

Thanks. this is not what i am looking for, since this would lead to full scan all the leaf blocks rather than range scan a portion of leaf blocks.

SQL> explain plan for
  2  select /*+ index(t) */ * from T
  3  where low_freq_val != 12 ;

Explained.

SQL> @exp

--------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |    18 |  2088 |     2   (0)
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |    18 |  2088 |     2   (0)
|*  2 |   INDEX FULL SCAN                   | IX   |     1 |       |     1   (0)
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("LOW_FREQ_VAL"<>12)



Its just one of those things. My hypothesis is that the optimizer doesnt make inferences about values based on subtraction, ie, "I've got 'n' values in total, and 'm' values of a specific case, thus there is n-m values representing the rest". So (my hypothesis) is that it knows there are 20 values in the index, and 'n' values are '12' (where 'n' depends on histogram presence), but it does not come to conclusion that there hence is 20-n potential index keys to scan.

Well said, this seem to be one of the area, where optimizer has to be enhanced.
Connor McDonald
January 04, 2016 - 7:02 am UTC

How could it *not* be a full scan.

If I ask for "McDonald" in the telephone book, I can jump (ie, range scan) the "M" section.

If I ask for NOT "McDonald", I have to scan A, B, C, .... Z


NDV and Histograms in place

Rajeshwaran Jeyabal, January 05, 2016 - 5:31 am UTC

If I ask for NOT "McDonald", I have to scan A, B, C, .... Z

BTW: dont forget about histograms and NDV for that column.

As per the initial test case provided
a) column X has only 2 distinct values. that is recorded in user_tab_col_statistics.NUM_DISTINCT
b) the presence of histograms says the column X is massively skewed, with value 99 as popular and 1 as infrequent.

Now when this query select * from t where x <> 99 goes for optimization, the optimizer has to look for NDV and histograms and should realize that query is not looking for popular values, so all i need is only rows mapping to "infrequent" values to be returned, so fastest way to return those values should be using an index range scan on those leaf blocks. (but 10053 trace shows that, optimizer dont even consider costing the index access path)

But as you said, this seem to be one of the area, where optimizer has to be enhanced.
<quote>
Its just one of those things. My hypothesis is that the optimizer doesnt make inferences about values based on subtraction, ie, "I've got 'n' values in total, and 'm' values of a specific case, thus there is n-m values representing the rest". So (my hypothesis) is that it knows there are 20 values in the index, and 'n' values are '12' (where 'n' depends on histogram presence), but it does not come to conclusion that there hence is 20-n potential index keys to scan.
</quote>
Connor McDonald
January 05, 2016 - 7:00 am UTC

"so all i need is only rows mapping to "infrequent" values to be returned"

I'd contend thats a non-trivial task. In the simplest form, I must start at the very "left" of the index, and scan all the way along the index until I get to the "right" of the index (which is what we *can* do if hinted).

The alternative would be a lot of hopping back and forth to branch blocks to try identify subsets of leaf blocks to "skip". That's a big ask.

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library