Skip to Main Content
  • Questions
  • null conditions in SQL and index usage

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, khalid.

Asked: March 15, 2010 - 2:48 pm UTC

Answered by: Tom Kyte - Last updated: May 27, 2010 - 7:29 am UTC

Category: Database - Version: 10.2.0.4

Viewed 10K+ times! This question is

You Asked

Dear Tom, How does the optimizer deals with checking conditions for null values for indexed column example (where table_name.column_name is null) or (where table_name.column_name is not null) if the filed column_name have index on it or is part of an index? In another word when the optimizer uses the index on the column have null values and if the value of null columns is very less (less than 1%-5%) of the rows in the table.

and we said...

see

http://asktom.oracle.com/Misc/something-about-nothing.html


we do not do things by "%'s", we do them based on cardinality estimates and clustering factors.

We estimate the number of rows that would be found using an index, then we look at how spread out the data is (clustering factor) in the table to see how many database table blocks we'd have to read - and then we decide if reading every block without the index overhead is cheaper then reading the blocks through the index.

We might stop using one index at 0.5% (or less even) whereas we might continue using some other index to 20% (or more even).


http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1032431852141

and you rated our response

  (8 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

March 16, 2010 - 1:36 pm UTC

Reviewer: A reader

Respected Sir,

<quote>

We might stop using one index at 0.5% (or less even) whereas we might continue using some other index to 20%
(or more even).......

<quote>

It would be very appreciate if you give an example ?


Many Thnaks



Tom Kyte

Followup  

March 16, 2010 - 2:46 pm UTC

ops$tkyte%ORA11GR2> create table organized
  2  as
  3  select x.*
  4    from (select * from all_objects order by object_name) x
  5  /

Table created.

ops$tkyte%ORA11GR2> create table disorganized
  2  as
  3  select x.*
  4    from (select * from organized order by dbms_random.random) x
  5  /

Table created.

<b>those tables are arguably "the same", but they are very very different - because the rows are in different orders on the disk</b>

ops$tkyte%ORA11GR2> create index organized_idx on organized(object_name);

Index created.

ops$tkyte%ORA11GR2> create index disorganized_idx on disorganized(object_name);

Index created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> begin
  2  dbms_stats.gather_table_stats
  3  ( user, 'ORGANIZED',
  4    estimate_percent => 100,
  5    method_opt=>'for all indexed columns size 254'
  6  );
  7  dbms_stats.gather_table_stats
  8  ( user, 'DISORGANIZED',
  9    estimate_percent => 100,
 10    method_opt=>'for all indexed columns size 254'
 11  );
 12  end;
 13  /

PL/SQL procedure successfully completed.

<b>both are indexed similarly and have the same level of statistics</b>

ops$tkyte%ORA11GR2> select table_name, blocks, num_rows, 0.05*num_rows, 0.10*num_rows from user_tables
  2  where table_name like '%ORGANIZED' order by 1;

TABLE_NAME                         BLOCKS   NUM_ROWS 0.05*NUM_ROWS 0.10*NUM_ROWS
------------------------------ ---------- ---------- ------------- -------------
DISORGANIZED                         1053      72082        3604.1        7208.2
ORGANIZED                            1054      72082        3604.1        7208.2

ops$tkyte%ORA11GR2> select table_name, index_name, clustering_factor from user_indexes
  2  where table_name like '%ORGANIZED' order by 1;

TABLE_NAME                     INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ------------------------------ -----------------
DISORGANIZED                   DISORGANIZED_IDX                           72030
ORGANIZED                      ORGANIZED_IDX                               1028

<b>but their stats are different - because the organization of the data is different.  One has a clustering factor near the number of blocks in the table (good) and the other near the number of rows (bad).  The index with the low clustering factor will be used for large range scans - a high percentage of the table can be retrieved efficiently with that one.  The other - not so much

Notice that 5% of the table is about 3,600 rows...</b>

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> column PLAN_TABLE_OUTPUT format a120 truncate
ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select * from organized where object_name like 'X%';

Execution Plan
----------------------------------------------------------
Plan hash value: 1925627673

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |   383 | 37151 |    10   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ORGANIZED     |   383 | 37151 |    10   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ORGANIZED_IDX |   383 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_NAME" LIKE 'X%')
       filter("OBJECT_NAME" LIKE 'X%')

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select * from disorganized where object_name like 'X%';

Execution Plan
----------------------------------------------------------
Plan hash value: 2727546897

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |   383 | 37151 |   288   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| DISORGANIZED |   383 | 37151 |   288   (1)| 00:00:04 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_NAME" LIKE 'X%')

<b>Notice that one used an index and the other a full scan.

The costs are very different.

The second figured it was better to full scan rather than do 383 single block IO's (because we scattered the data all over the place) to retrieve the rows from the table. 

383 is about 0.5% of the table, we stopped using it.  In fact, we'd have to be getting less than about 0.3/0.4% of that table before we'd use an index:</b>

ops$tkyte%ORA11GR2> select * from disorganized where object_name like 'R%';

Execution Plan
----------------------------------------------------------
Plan hash value: 3767053355

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |   250 | 24250 |   254   (0)| 00:00:04 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DISORGANIZED     |   250 | 24250 |   254   (0)| 00:00:04 |
|*  2 |   INDEX RANGE SCAN          | DISORGANIZED_IDX |   250 |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_NAME" LIKE 'R%')
       filter("OBJECT_NAME" LIKE 'R%')


Thank you.

March 16, 2010 - 1:46 pm UTC

Reviewer: Khalid from Yemen

I was not aware of the “secret sauce” We simply added “0” to the index. Thank you very much Tom. hope its not new question,but is there any web page or book talking about these "secret sauce" I think there are many and we do really need to know them.
Tom Kyte

Followup  

March 16, 2010 - 2:54 pm UTC

I wouldn't call this "secret sauce".

0 isn't special, all you need is any not nullable attribute in the index. It is documented that an entirely null key is not placed in a b*tree index, but if any attribute is not null - then a key is placed in there. Hence, if you have a single column index on (x) and say "where x is null", it follows (it has to follow) that where x is null *cannot* use the index - since x is the only thing in there and entirely null keys are never made in b*tree indexes. But, if you index (x,<anything not nullable>) we KNOW now that all rows are in the index - and hence "x is null" can in fact use the index.

What you really need to know is "how things work", then you have access to an infinite set of 'secret sauces' - all known by you implicitly. It is how I know what I know.

I didn't to know Oracle the way I know Oracle by reading books of silver bullets, magic solutions, tips and tricks.

I got to know Oracle the way I know Oracle by knowing Oracle - the concepts, the basics. Everything else comes from that.

curious case

May 20, 2010 - 6:45 am UTC

Reviewer: A reader

Oracle documentation says that if we have an index on a column which has NULL values and if we use "IS NULL" or "IS NOT NULL" for it then the query should not use Index. Here is a simple test case for it, is there anything which has changed in 11.1.0.7.0 ?

SELECT * FROM v$version 
/
DROP INDEX XYZ_M1
/
DROP TABLE XYZ
/
CREATE TABLE XYZ (TRANSACTION_PROJECT_ID NUMBER(18,0),TRANSACTION_TASK_ID NUMBER(18,0))
/
VAR i NUMBER
VAR j NUMBER
begin
for i in 1 .. 1000 loop
   insert into XYZ values (NULL, 77);
end loop;
for j in 1 .. 10000 loop
   insert into XYZ values (33, 77);
end loop;
commit;
end;
/
select count(*) from xyz
/
create index XYZ_M1 on XYZ(TRANSACTION_PROJECT_ID)
/
exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'FUSION', tabname=>'XYZ', cascade => TRUE, estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO')
/
set autotrace traceonly explain
select count(*) from xyz where TRANSACTION_PROJECT_ID IS NOT NULL
/
select count(*) from xyz where TRANSACTION_PROJECT_ID IS NULL
/
set autotrace off
=====================================================
Output which i get 

SQL> @testcase.sql

Execution Plan
----------------------------------------------------------
Plan hash value: 1078166315

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |    54 |     0   (0)| 00:00:01 |
|*  1 |  FIXED TABLE FULL| X$VERSION |     1 |    54 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("INST_ID"=USERENV('INSTANCE'))


Index dropped.


Table dropped.


Table created.


PL/SQL procedure successfully completed.


Execution Plan
----------------------------------------------------------
Plan hash value: 4074547944

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| XYZ  | 11000 |     9   (0)| 00:00:01 |
-------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement


Index created.


PL/SQL procedure successfully completed.


Execution Plan
----------------------------------------------------------
Plan hash value: 1300267402

--------------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |     3 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |        |     1 |     3 |            |          |
|*  2 |   INDEX FAST FULL SCAN| XYZ_M1 | 10000 | 30000 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANSACTION_PROJECT_ID" IS NOT NULL)


Execution Plan
----------------------------------------------------------
Plan hash value: 4074547944

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| XYZ  |  1000 |  3000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANSACTION_PROJECT_ID" IS NULL)

Tom Kyte

Followup  

May 24, 2010 - 11:36 am UTC

http://asktom.oracle.com/Misc/something-about-nothing.html

read it again, it was in the original answer, it has examples, it spells it all out.

May 20, 2010 - 6:48 am UTC

Reviewer: A reader

Sorry some mistake in my earlier post, here is the correct output.
SQL> @testcase.sql

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
PL/SQL Release 11.1.0.7.0 - Production
CORE    11.1.0.7.0      Production
TNS for Linux: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production


Index dropped.


Table dropped.


Table created.


PL/SQL procedure successfully completed.


  COUNT(*)
----------
     11000


Index created.


PL/SQL procedure successfully completed.


Execution Plan
----------------------------------------------------------
Plan hash value: 1300267402

--------------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |     1 |     3 |     7   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |        |     1 |     3 |            |          |
|*  2 |   INDEX FAST FULL SCAN| XYZ_M1 | 10000 | 30000 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANSACTION_PROJECT_ID" IS NOT NULL)


Execution Plan
----------------------------------------------------------
Plan hash value: 4074547944

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     9   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| XYZ  |  1000 |  3000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANSACTION_PROJECT_ID" IS NULL)

SQL> 

May 20, 2010 - 6:53 am UTC

Reviewer: A reader

Instead of count(*) if we do * then it does behave appropriately.

SQL> select * from xyz where transaction_project_id is NULL;

Execution Plan
----------------------------------------------------------
Plan hash value: 2404645487

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 |  5000 |     9   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| XYZ  |  1000 |  5000 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TRANSACTION_PROJECT_ID" IS NULL)

SQL> select * from xyz where transaction_project_id is NOT NULL;

Execution Plan
----------------------------------------------------------
Plan hash value: 2404645487

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 | 50000 |     9   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| XYZ  | 10000 | 50000 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TRANSACTION_PROJECT_ID" IS NOT NULL)


Tom Kyte

Followup  

May 24, 2010 - 11:36 am UTC

define "appropriately"

please see above for my response to you regarding nulls and indexes.

May 26, 2010 - 4:54 am UTC

Reviewer: A reader

Sorry, still didn't get you.

http://tkyte.blogspot.com/2006/01/something-about-nothing.html
From the above link we are trying to workaround the problem of using IS NULL/IS NOT NULL on a NULLABLE column by concatinating with other not null column or with a constant (eg : 0 in the link above). But in my example i have not tried to use any of the mentioned tricks for Optimizer to choose the index ? (theoretically)

OR

are you trying to refer to the answer given in this post where you mention that access path is not dependent on any percentages ? So, are we saying that there is no hard and fast rule on IS NULL/IS NOT NULL even if oracle doc says so ?


Tom Kyte

Followup  

May 26, 2010 - 8:03 am UTC

... So, are we saying that there
is no hard and fast rule on IS NULL/IS NOT NULL even if oracle doc says so ?
...

but the oracle does do not say so, you wrote:

Oracle documentation says that if we have an index on a column which has NULL values and if we use
"IS NULL" or "IS NOT NULL" for it then the query should not use Index.


I posted the link above to show that the statement as you typed it in is 100% provably FALSE. The query can and will use indexes when appropriate. I gave a counter case that proves that statement FALSE. That was the purpose of the link.


Also, you just say "oracle doc says so" but you never quoted a reference. I would like to see it in context. It will either be:

a) there is a documentation bug which needs be fixed
b) you have misread what they were saying - which could be either because
1) what they were saying was not clear
2) you misread it


So, post the link to otn.oracle.com and we'll see.

May 27, 2010 - 3:34 am UTC

Reviewer: A reader

Its quite that my interpretation of the documented statement may differ.

Here is the document link

http://download.oracle.com/docs/cd/E11882_01/server.112/e10595/indexes002.htm#ADMIN11711

"Columns That Are Suitable for Indexing

Some columns are strong candidates for indexing. Columns with one or more of the following characteristics are candidates for indexing:
* Values are relatively unique in the column.
* There is a wide range of values (good for regular indexes).
* There is a small range of values (good for bitmap indexes).
* The column contains many nulls, but queries often select all rows having a value. In this case, use the following phrase:
WHERE COL_X > -9.99 * power(10,125)
Using the preceding phrase is preferable to:
WHERE COL_X IS NOT NULL

This is because the first uses an index on COL_X (assuming that COL_X is a numeric column).

Columns That Are Not Suitable for Indexing

Columns with the following characteristics are less suitable for indexing:

* There are many nulls in the column and you do not search on the not null values.
LONG and LONG RAW columns cannot be indexed."
Tom Kyte

Followup  

May 27, 2010 - 7:29 am UTC

if col_x is nullable AND col_x is the only thing in the index, then what they say is true.

if col_x is nullable AND col_x is on the leading edge of the index, AND there is some other NOT NULL attribute in the index, then what they say is not really true.




And note: they say things like "less suitable", not "does not work"


Understand how indexes work and you don't need anyones list, you could write your own.

Optimizer using wrong index

August 08, 2011 - 8:09 am UTC

Reviewer: Rakesh from Canda

SQL> EXPLAIN PLAN FOR SELECT    ST.CELL_NUMBER FROM ABCDPROD.ABCD_INVT_MST_CELLNO_CAT_STAG ST, ABCDPROD.ABCD_INVT_MST_ITEM_MSISDN_DTL MD, CP
ST_PARAM P WHERE ST.CELL_NUMBER = MD.MSISDN_NUMBER AND MD.MSISDN_STATUS_ID = P.PARAM_ID AND MD.CIRCLE_ID = 11 AND MD.IS_ACTIVE = 'Y' AND ST.
D ST.IS_INCLUDED_FLAG = 'X' AND ( UPPER(P.SHORT_DESC) = 'AVAILABLE' OR UPPER(P.SHORT_DESC) = 'UNAVAILABLE' OR UPPER(P.SHORT_DESC) = 'IDLE' )

Explained.

SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3537718687

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                               |  1437 | 60354 |  5077   (1)| 00:00:38 |       |       |
|*  1 |  HASH JOIN                           |                               |  1437 | 60354 |  5077   (1)| 00:00:38 |       |       |
|*  2 |   TABLE ACCESS FULL                  | ABCD_CMN_MST_PARAM            |     4 |    56 |     2   (0)| 00:00:01 |       |       |
|   3 |   NESTED LOOPS                       |                               |  4846 |   132K|  5074   (1)| 00:00:38 |       |       |
|*  4 |    INDEX FAST FULL SCAN              | IDX_STG_CELL                  |  4846 | 58152 |    96   (3)| 00:00:01 |       |       |
|*  5 |    TABLE ACCESS BY GLOBAL INDEX ROWID| ABCD_INVT_MST_ITEM_MSISDN_DTL |     1 |    16 |     2   (0)| 00:00:01 |    11 |    11 |
|*  6 |     INDEX UNIQUE SCAN                | UK_C0012643_UK                |     1 |       |     1   (0)| 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MD"."MSISDN_STATUS_ID"="P"."PARAM_ID")
   2 - filter(UPPER("P"."SHORT_DESC")='AVAILABLE' OR UPPER("P"."SHORT_DESC")='IDLE' OR UPPER("P"."SHORT_DESC")='UNAVAILABLE')
   4 - filter("ST"."CIRCLE_ID"=11 AND "ST"."IS_INCLUDED_FLAG"='X')
   5 - filter("MD"."IS_ACTIVE"='Y' AND "MD"."CIRCLE_ID"=11)
   6 - access("ST"."CELL_NUMBER"="MD"."MSISDN_NUMBER")

22 rows selected.

SQL> select clustering_factor,last_analyzed from dba_indexes where index_name='IDX_STG_CELL';

CLUSTERING_FACTOR LAST_ANAL
----------------- ---------
            16569 05-AUG-11


SQL>  select table_name,last_analyzed,blocks  from dba_tables where table_name='ABCD_INVT_MST_CELLNO_CAT_STAG';

TABLE_NAME                     LAST_ANAL     BLOCKS
------------------------------ --------- ----------
ABCD_INVT_MST_CELLNO_CAT_STAG  05-AUG-11       4024


1 row selected.

SQL> select blocks from dba_Segments where segment_name='ABCD_INVT_MST_CELLNO_CAT_STAG';


    BLOCKS
----------
      4096


SQL> SELECT COLUMN_NAME,COLUMN_POSITION FROM DBA_IND_COLUMNS WHERE INDEX_NAME='IDX_STG_CELL';

COLUMN_NAME                    COLUMN_POSITION
------------------------------ ---------------
IS_INCLUDED_FLAG                             3
CIRCLE_ID                                    2
CELL_NUMBER                                  1



Tom,

If optimizer goes for an index scan,it should switch 16569 blocks, But if it choses fullscan it should scan just 4096 blocks. Why is the optimizer choosing index scan instead of full scan here.