The Tom Kyte Blog: Something about nothing...

Tuesday, January 24, 2006

Something about nothing...

Something about nothing. A frequently misunderstood concept with NULLS and indexes is the myth that NULL values are not tracked in “normal” indexes such as a conventional B*Tree index. Therefore, many people conclude erroneously that “WHERE COLUMN IS NULL” will not use, in fact cannot, use an index.

The facts are:

Entirely NULL keys are not entered into a ‘normal’ B*Tree in Oracle

Therefore, if you have a concatenated index on say C1 and C2, then you will likely find NULL values in it – since you could have a row where C1 is NULL but C2 is NOT NULL – that key value will be in the index.

As long as you have some not null column in your set of indexed columns, you will find NULLs in the index. Additionally – you always have a NOT NULL (virtual) column to use no matter what!

First, consider this example, supposed we want to leave OBJECT_ID as NULLABLE, but did have a column OWNER that was NOT NULL:


ops$tkyte@ORA9IR2> create table t
2  as
3  select object_id, owner, object_name
4    from dba_objects;
Table created.

ops$tkyte@ORA9IR2> alter table t modify (owner NOT NULL);
Table altered.

ops$tkyte@ORA9IR2> create index t_idx on t(object_id,owner);
Index created.

ops$tkyte@ORA9IR2> desc t
Name                    Null?    Type
----------------------- -------- ----------------
OBJECT_ID                        NUMBER
OWNER                   NOT NULL VARCHAR2(30)
OBJECT_NAME                      VARCHAR2(128)

ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats(user,'T');
PL/SQL procedure successfully completed.

Well, that index can certainly be used to satisfy “IS NOT NULL” when applied to OBJECT_ID:


ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where object_id is null;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=34)
1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=1 Bytes=34)
2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=1)

In fact – even if the table did not have any NOT NULL columns, or we didn’t want/need to have a concatenated index involving OWNER – there is a transparent way to find the NULL OBJECT_ID values rather easily:


ops$tkyte@ORA9IR2> drop index t_idx;
Index dropped.

ops$tkyte@ORA9IR2> create index t_idx_new on t(object_id,0);
Index created.

ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where object_id is null;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=34)
1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=3 Card=1 Bytes=34)
2    1     INDEX (RANGE SCAN) OF 'T_IDX_NEW' (NON-UNIQUE) (Cost=2 Card=1)

The “secret sauce” here is to create what is known as a function based index (although some would argue there is no such thing! I detect a Jonathan Lewis write up there…). We simply added “0” to the index – we can search on OBJECT_ID as before, plus find all records where “OBJECT_ID IS NULL” without changing a thing.

I have used (and will continue to use!) the fact that entirely null entries are not made in B*Tree indexes as a positive thing (selective uniqueness for example – ensure that “project name is unique for all active projects” can be enforced via: create unique index proj_name_unique on t(case when status = ‘ACTIVE’ then project_name end ); - that uniquely indexes only the STATUS=’ACTIVE’ records..). But here, we can use the “reverse” of that trick to ensure we never index an entirely NULL key!

34 Comments:

Raymond said....: Very, very neat trick and very helpfull to fix a problem in a off-the-shelf application. To coin your phrase, "I learn something new every day".; Tue Jan 24, 08:05:00 PM EST
R Menon said....: Hmm..Did not quite understand what the addition of "0" means here?
create index t_idx_new on t(object_id,0);

How did that make t_idx_new a function based index?

Also, are we not supposed to use cascade=>true in the dbms_stats invocation to get stats for index as well?; Tue Jan 24, 08:41:00 PM EST
Thomas Kyte said....: "0" is not a column in the table, it is therefore a "function", an expression. It makes this a function based index with a non-null attribute in it.

All I needed was to use the CBO, you could gather stats on the index (in 10gr2 - create index would have done it by default actually in this example). You could gather other kinds of stats as well - histograms and such.

Was not necessary to demonstrate that "where column is null" can and will use an index easily.; Tue Jan 24, 08:48:00 PM EST
David Aldridge said....: The thought occurs that those repeating values of 0 are a bit of a waste of space, and by reversing the order of the columns we can compress the 0's.

This means relying on an index skip-scan on a function-based index -- any downside you can think of to this?

create index t_idx_new on t(object_id,0);
create index t_idx_new_rvs on t(0,object_id) compress 1;

select INDEX_NAME,LEAF_BLOCKS
from user_indexes where table_name = 'T';

INDEX_NAME LEAF_BLOCKS
------------------------------ -----------
T_IDX_NEW 142
T_IDX_NEW_RVS 126

An 11% saving there.; Tue Jan 24, 08:50:00 PM EST
R Menon said....: got it - thanx! That is a neat one! Did you "discover" it recently?; Tue Jan 24, 08:50:00 PM EST
Anthony Wilson said....: G'day Tom,

Did you miss a "UNIQUE" in your last example (the proj_name_uniq index)?

cheers,
Anthony

PS: You just helped me solve a nagging problem in a very elegant way. Thanks!; Tue Jan 24, 10:54:00 PM EST
David Aldridge said....: Tut. This stupid font is making o's look like 0's when you view the comments, and thgen they lok different when you are posting a comment.; Wed Jan 25, 12:39:00 AM EST
Robert said....: Tom said:
I’ve used the fact that entirely null entries are not made in B*Tree....
But here, we can use the “reverse” of that trick to ensure we never index an entirely NULL key!

Tom,
"entirely null entries" and "entirely NULL key" - Are they the same thing (here in this context) ?
If yes, could you please be consistent for the benefit of us less experienced. Thanks.; Wed Jan 25, 03:02:00 AM EST
Martin said....: A really good insight - thank you for this!; Wed Jan 25, 03:06:00 AM EST
Thomas Kyte said....: Did you miss a "UNIQUE"
Indeed, I had - it is there now, thanks much for the good spot.

"entirely null entries" and "entirely NULL key" - Are they the same thing

Yes, they are the same thing.

A null attribute/column will be entered into the B*Tree - if some other attribute/column of that entry/key is NOT NULL; Wed Jan 25, 06:57:00 AM EST
Todor Botev said....: Tom said:
I’ve used the fact that entirely null entries are not made in B*Tree indexes as a positive thing in the past

Why in the past? Don't you use it any longer? I works perfect for me - do you now have anything better to propose? :); Wed Jan 25, 09:00:00 AM EST
Thomas Kyte said....: Why in the past

Updated to clarify :); Wed Jan 25, 09:05:00 AM EST
Joel Garry said....: This means relying on an index skip-scan on a function-based index -- any downside you can think of to this?

See bug 4904838 to start...; Wed Jan 25, 09:53:00 AM EST
Thomas Kyte said....: Joel Garry said...

that bug will likely be closed as "not a bug", but rather an implementation restriction with in-lists.

But I was having an offline with David A. about this skip scan. It is not available with the "function" (0) on the leading edge - skip scans won't be done in that case - so it is not an option.; Wed Jan 25, 10:03:00 AM EST
Robert Vollman said....: I pick a bone with your title. You're talking about NULL, not nothing. NULL is not nothing.

A lot of programmers assume NULL is the same as nothing (as is the case in other languages). But NULL is "Unknown", not nothing.

You should call it "Something about NULL..."

http://thinkoracle.blogspot.com/2005/05/null-vs-nothing.html; Wed Jan 25, 12:07:00 PM EST
Thomas Kyte said....: but null doesn't rhyme with something...; Wed Jan 25, 12:12:00 PM EST
Anonymous said....: To Rob Vollman,

NULL is not "Unknown". "Unknown" is a value, and NULL is not a value.

NULL is simply a marker to say 'no value' (which is pretty close to 'no thing'). It may be INTERPRETED as, e.g.
Not known,
Will never be known
Not yet known
To be decided,
Not relevant,
Not yet relevant
and, given the eccentricities of the English language, many other things.

Jonathan Lewis; Wed Jan 25, 12:27:00 PM EST
yas said....: that bug will likely be closed as "not a bug", but rather an implementation restriction with in-lists.
I also have a new open bug for this. Index skip scan cost shows less than an index fast full scan in the 10053 trace but the optimizer chooses index fast full scan. Unfortunately the bug is not open to public yet. This is in 9.2.0.7 (not existent in 9.2.0.5).; Wed Jan 25, 03:52:00 PM EST
Peter Lewis said....: "but null doesn't rhyme with something..."

How about.... "A mull about null" ??; Wed Jan 25, 05:35:00 PM EST
Robert Vollman said....: Bah, Jonathan Lewis - what do you know about Oracle? :)

I'm not exactly clear on the distinction you're making, but it does appear that you agree that NULL is definitely NOT nothing.

I prefer Pete's suggestion! "A mull about NULL."

(Which, according to Jonathan, might mean Tom is blogging about something irrelevant ... :); Wed Jan 25, 05:57:00 PM EST
Joel Garry said....: that bug will likely be closed as "not a bug", but rather an implementation restriction with in-lists.
Yeah, it seems like a silly thing to want a skip scan with an in list, unless one happens to know the list members would be found better with a skip scan, in which case one should use histograms or maybe even the hint (SAP too blunt to do that notwithstanding).

I did say "to start" on purpose, so people would go on to look at the included bugs like 4479933.

I think the basic functionality that is being desired is to be able to use skip scans as range scans, since enterprise software is likely to prepend global identifiers like "company" to most every compound key in the product. So you'd want Oracle to be able to aggregate the current financial results for all companies by listing the part of the key that contains the date plus whatever other filter conditions. Using bind variables, right?; Wed Jan 25, 06:32:00 PM EST
Anonymous said....: Speaking of indexes, has anyone read this book :
http://www.amazon.com/gp/product/0471719994/qid=1138234511/sr=1-1/ref=sr_1_1/002-6837186-8172853?s=books&v=glance&n=283155; Wed Jan 25, 07:28:00 PM EST
Laurent Schneider said....: NULL is not "Unknown"

well, at least I can proof that unknown is null :

1) a = NULL is UNKNOWN
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements005.htm#sthref491
2) exec if ( (1=null) is null ) then dbms_output.put_line('unknown is null'); end if
unknown is null; Thu Jan 26, 07:41:00 AM EST
Anonymous said....: Laurent,

In this context, "UNKNOWN" is being used to represent a third boolean value and Oracle (Corp) has used "null" in two different ways - once as the marker for an absence of data and once as the boolean value.

Jonathan Lewis; Thu Jan 26, 02:58:00 PM EST
Noons said....: Very interesting trick, this one. I wonder if something similar could be done for varchar2 column leading indexes?

I can see a few interesting applications of this already. Come to think of it: when an index is the concatenation of two or more columns of different data types, what's the type of the concatenated indexed value and is there some FBI/implicit conversion going on behind the scenes?

Say for example:

create index iblah
on blah(col1, col2,0);

where col1 = DATE and col2 = VARCHAR2, what is the data type of the actual index key concatenation stored in the index entry? And is there any implicit conversion happening if I then use:
"where col1= sysdate" ?; Thu Jan 26, 08:39:00 PM EST
Anonymous said....: Nuno,

Your question demonstartes EXACTLY the reason why I wrote the note "There's no such thing as a function-based index" that Tom highlighted in his blog.

Your example is a three column index, of which the third column is a virtual column of type number.

Columns don't lose their identity (or type) by going into an index; and an index does not have an 'index entry type'. But the name 'FBI' distracts you into thinking that the index definition is more than the sum of its parts, and invites you to forget about the significance of the columns.

The only "loss of identity" occurs when you use columns as inputs to functions - then the type of the function IS the type of the virtual column, and is completely independent of the types of the input columns.

Jonathan Lewis; Fri Jan 27, 02:55:00 AM EST
Anonymous said....: What are the minimal requirements for this trick to work? I tried it in Oracle 9.2.0.1.0 and the execution plan did not change at all.; Tue Apr 11, 04:27:00 AM EDT
Thomas Kyte said....: What are the minimal requirements for this trick to work?

Minimal requirements are an optimizer with a brain - the Cost Based One.

Make sure you are using it (gather stats). The autotraces above - since they include the card= cost= bits - shows I was using it.; Tue Apr 11, 07:10:00 AM EDT
Anonymous said....: In the interests of making it easier to find, here's the updated link to the article referenced.

No FBI post; Tue Nov 28, 09:11:00 AM EST
Chris Taylor said....: Hey, Tom, I think I found a bug in this in 10.2.0.3.

I added an Index to a table using
(col1, col2, col3, 0)
for testing because some queries use
"WHERE COLx is NULL" or IS NOT NULL.

Well, after adding the index, one of our reports that joins many tables and using a NVL function quit returning rows. If I drop the index and add it without the ,0 the query returns fine. Very wierd. Not sure even how to search for this on metalink for an existing bug. I'm thinking about creating a case myself.

Very bizarre...

Thoughts?; Mon Oct 08, 10:57:00 AM EDT
Chris Taylor said....: Looks like this might be the bug:

Note:4621590.8
Subject: Bug 4621590 - Wrong result (NULL) from function based index with OUTER join; Mon Oct 08, 11:03:00 AM EDT
Chris Taylor said....: You can use a "1", instead of a "0" and the results will return correctly for anyone that comes along to read this.; Wed Oct 10, 09:18:00 AM EDT
Chris Taylor said....: Well, interesting, if you specify a '1' instead of '0' you get an error on DBMS_STATS.GATHER_TABLE_STATS with CASCADE=>TRUE.; Mon Oct 15, 08:33:00 AM EDT
Anesha said....: Hi Nice Blog .Data processing services are helpful in streamlining a wide range of corporate activities and operations. Data processing and related other services are not only good to present the full and processed data that is to be used for the overall benefit rather their primary function is to present an insightful explanation of the data.data conversion services india; Tue Dec 02, 12:03:00 AM EST