Mull about Null...
YAMAN. Yet Another Mull About NULL. This current title was suggested based on a discussion about what exactly “NULL” in the database means. Null is the absence of a value. In a Boolean expression, NULL is “unknown” (it is neither true, nor false – leaving “I don’t know” as the only possible other answer). When persistently stored as a value in a database column – I would say NULL is “the absence of a value”, or more succinctly “nothing”. So, the last title “Something about Nothing” makes good sense to me.
In the last article we explored one of the big myths associated with NULL values and indexes in Oracle and saw that predicates of the form “WHERE COLUMN IS NULL” can easily and naturally use indexes (and have been using indexes in fact for a long time) in Oracle – contrary to popular thought. NULLs are in fact indexed in Oracle in many cases. Whenever you have an index that includes a NOT NULL attribute – all rows in the base table will in fact appear in the index. The result is that that index can be used to search for “NULLS” in any of the other attributes quite naturally without resorting to an “evil” full scan.
I will say that I seriously wish the default state of a database column was NOT NULL – that is, instead of columns by default being NULLABLE, and therefore we would have to go out of our way to make them NULLABLE. My experience is that most columns are in fact NOT NULL – and we just “forget” to state the obvious. This leads to indexes that sometimes cannot be used for a particular class of questions. For example:
Many people will say “but we could just use the index to count these rows, and the index is small – why isn’t the database doing that?”. In fact, they’ll try hints – everything to no avail. The database will steadfastly refuse to use the index to count the rows in this case. The reason – it is entirely possible that an index key entry in the index we created could have an entirely NULL entry – and hence not every row in the table is represented in the index!
Now, we could “fix” that by using the function based index trick in the prior article – but the reality of the situation is that OBJECT_ID is in fact “NOT NULL” and all we need to do is tell the database this fact. And when we do – good things happen
In fact, the optimizer makes use of constraints all of the time (the cost base optimizer mostly, the rule based optimizer is pretty much brain dead in this respect:
In subsequent articles I will be exploring in more depth two related topics:
On an unrelated note, it seems the Wiki that got started is going rather well. After seeing the Wiki next to the Q&A site – I will concur that that Wiki seems to be a much better format for this sort of information. Having the comments on a secondary page as the Q&A site does is an impediment and I like having the ability to edit the Wiki pages. The organization of the Wiki with a table of contents and some structure is nicer as well. I still think it was useful to have both be tried out – for sometimes only time will tell what really works.
If someone had asked me years ago my opinion of a web site that allows people to sell junk from their garage to each other (eBay) I would have laughed at them. I don’t think any of us knows exactly what will work until we see it.
In the last article we explored one of the big myths associated with NULL values and indexes in Oracle and saw that predicates of the form “WHERE COLUMN IS NULL” can easily and naturally use indexes (and have been using indexes in fact for a long time) in Oracle – contrary to popular thought. NULLs are in fact indexed in Oracle in many cases. Whenever you have an index that includes a NOT NULL attribute – all rows in the base table will in fact appear in the index. The result is that that index can be used to search for “NULLS” in any of the other attributes quite naturally without resorting to an “evil” full scan.
I will say that I seriously wish the default state of a database column was NOT NULL – that is, instead of columns by default being NULLABLE, and therefore we would have to go out of our way to make them NULLABLE. My experience is that most columns are in fact NOT NULL – and we just “forget” to state the obvious. This leads to indexes that sometimes cannot be used for a particular class of questions. For example:
ops$tkyte@ORA9IR2> create table we_dont_use_pks
2 as
3 select *
4 from dba_objects
5 where object_id is not null
6 /
Table created.
ops$tkyte@ORA9IR2> create unique index
2 we_dont_use_pks on
3 we_dont_use_pks(object_id)
4 /
Index created.
ops$tkyte@ORA9IR2> begin
2 dbms_stats.gather_table_stats
3 ( user,
4 'we_dont_use_pks',
5 cascade => TRUE );
6 end;
7 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select count(*)
2 from we_dont_use_pks;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=67 Card=1)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF
'WE_DONT_USE_PKS' (Cost=67 Card=31285)
ops$tkyte@ORA9IR2> set autotrace off
Many people will say “but we could just use the index to count these rows, and the index is small – why isn’t the database doing that?”. In fact, they’ll try hints – everything to no avail. The database will steadfastly refuse to use the index to count the rows in this case. The reason – it is entirely possible that an index key entry in the index we created could have an entirely NULL entry – and hence not every row in the table is represented in the index!
Now, we could “fix” that by using the function based index trick in the prior article – but the reality of the situation is that OBJECT_ID is in fact “NOT NULL” and all we need to do is tell the database this fact. And when we do – good things happen
ops$tkyte@ORA9IR2> alter table we_dont_use_pks
2 modify object_id NOT NULL
3 /
Table altered.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select count(*)
2 from we_dont_use_pks;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1)
1 0 SORT (AGGREGATE)
2 1 INDEX (FAST FULL SCAN) OF 'WE_DONT_USE_PKS'
(UNIQUE) (Cost=12 Card=31285)
ops$tkyte@ORA9IR2> set autotrace off
In fact, the optimizer makes use of constraints all of the time (the cost base optimizer mostly, the rule based optimizer is pretty much brain dead in this respect:
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select /*+ RULE */ count(*)
2 from we_dont_use_pks;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'WE_DONT_USE_PKS'
ops$tkyte@ORA9IR2> set autotrace off
In subsequent articles I will be exploring in more depth two related topics:
- How constraints can really impact us (rather how the appalling lack of constraints can adversely affect us). Especially in a data warehouse where they are most important (and people seem most hesitant to use them – too “slow” they say – when in fact the opposite is quite the truth)
- How NULL is not evil as a foreign key value – how outer joins and the need for them are not bad. Why you don’t want to make up some dummy default value for a NULLABLE field just to avoid an outer join. (full scans are not evil, outer joins are not evil and ‘fixing’ the perceived outer join problem with a default value is really not a good idea)
On an unrelated note, it seems the Wiki that got started is going rather well. After seeing the Wiki next to the Q&A site – I will concur that that Wiki seems to be a much better format for this sort of information. Having the comments on a secondary page as the Q&A site does is an impediment and I like having the ability to edit the Wiki pages. The organization of the Wiki with a table of contents and some structure is nicer as well. I still think it was useful to have both be tried out – for sometimes only time will tell what really works.
If someone had asked me years ago my opinion of a web site that allows people to sell junk from their garage to each other (eBay) I would have laughed at them. I don’t think any of us knows exactly what will work until we see it.
16 Comments:
Interesting stuff, Tom!
I almost fear to put this in front of you, but you might find my old post about Oracle's null handling interesting.
The reason – it is entirely possible that an index key entry in the index we created could have an entirely NULL entry – and hence not every row in the table is represented in the index!
I know what you meant ... but the above doesn't sound good.
Kris,
Here is my response, which I also posted over at your blog:
Hmm - I am afraid I must disagree a bit. Once you set the NOT NULL constraint, Oracle is handling the situation correctly. Your rank is not blank, it is CIVILIAN or UNCLEARED or BIN_LADEN_FAMILY_MEMBER or some other actual, defined value that correctly describes you to that attribute.
In other words, the business rule is: people accessing the database must have a rank. A perhaps unstated correlary of that is "there must be a rank that corresponds to every type of person in the world". You haven't created all the members of that set, so the application rightly barfs.
sPh
Hey sPh, I've posted a response back on my blog entry, but I'll let you hunt it down. I wouldn't want to squat my blog on Tom's. :-)
Kris your blog entry shows an oracle anomaly that has been around for ever. It's not exactly new knowledge and unfortunately probably too many applications that depend on this anomaly would croak if oracle were to even think ( which they won't ) about changing it.
But you could fix your application by putting in a DEFAULT value ( a lot of people think anytime you put in NOT NULL you should seriously consider having a DEFAULT value ).
John,
Not null in the column definition means -at least as I read it-: you must put some meaningfull value in the column.
For example;
table t(id number not null, last_name varchar2(100) not null, salary number(10,2) not null)
This means: a person necessarily identified by the number id has the necessary name of last_name and must have a salary of value salary.
These are business rules. The point is, that when you put a default value in the column definition of salary, you could get some really interesting effect when the 'input' proces makes a mistakes and inserts a null value for salary.
I do not think this would be good business practice..
Regards
"When persistently stored as a value in a database column – I would say NULL is “the absence of a value”, or more succinctly “nothing”."
Ok, when stored in the database, your point is valid. In certain contexts, NULL is nothing. But in most cases, NULL and nothing are two different things.
But take for example the process of inserting into the database for purposes of persistence. There, NULL and nothing are treated differently.
In this case, "nothing" is NULL, but NULL is not nothing. NULL is considered something, otherwise default values would be used when inserting NULL (why they aren't).
Oops, I just got a nosebleed...
Rob, I cannot exactly agree that database table column constraints including the DEFAULT ( optional ) constraint are business rules. Certainly business rules might be considered by the people responsible for the logical database design as well as (potentially maybe) the people responsible for logical database design.
Oracle's implementation of how it supports empty character strings in this example is an anomaly. The database specific exceptions to standards are something that anyone doing the physical database implementation should be aware of.
Does this help?
The way I see this is that it is not possible to state explicitly whether DEFAULT value column constraints are business rules or not without knowing about both the business in question and the development guidelines and rules in place.
It has to be a situation where appropriate definitions are assumed and which are dependent on the prevailing business conditions.
However where database development is regulated by business rules and methodologies there is a good chance that any DEFAULT value constraints will be defined as a result of one or more business rules or dependencies. Well, at least they should be, shouldn't they?
I'm not going to mull about NULLs though - I'd only embarrass myself...
DEFAULT value column constraints
The DEFAULT clause is not a constraint.
Fair comment, but your point is?
table t(id number not null, last_name varchar2(100) not null, salary number(10,2) not null)
This means: a person necessarily identified by the number id has the necessary name of last_name and must have a salary of value salary.
These are business rules.
Until you start doing business with people or societies who use singular names...
I don’t think any of us knows exactly what will work until we see it.
I don't think any of us knows how fragile the design is until it breaks.
Martin Mull is an indexing expert?
No, that would have been "Mull on Null", this is "Mull about Null" - distinctly different :)
I think that first definition, to grind or pulverize, describes this post well! (and that is a good thing)
ikiocr
>> "Martin Mull is an indexing expert?"
No, it's just one of my common typos:
WHERE last_name IS NOT MULL
POST A COMMENT
<< Home