Thanks for the question, Eli.

Asked: February 21, 2002 - 8:53 am UTC

Last updated: February 18, 2005 - 2:20 pm UTC

Version: 8.1.7

Viewed 1000+ times

You Asked

Hi
I would like to get your opinion regarding what i am traing to do.
I have vary big table which include millions of records. The "NAME" column is a nullble column, but in my case include only 2000 null values. My question is how I can index only the null values of this column. also did you see any problem with what I am doing below ?(I am familiar with the restriction of storing null values on an index, and base on this limit i am trying to solve this)
also did you see a problem to create concatenate index with my method below ?

The below is my test case :
REM ---------The function -----------
CREATE OR REPLACE FUNCTION MY_LENGTH (MY_STARING IN varchar2)
RETURN number
DETERMINISTIC
AS
TMP_NUMBER number;
BEGIN
TMP_NUMBER:=LENGTH(MY_STARING);
IF TMP_NUMBER IS NULL
THEN
RETURN 0;
ELSE
RETURN NULL;
END IF;

end MY_LENGTH;
/
REM ---- TEST table
create table foo (id number(10) not null,
name varchar(10),
mydate date not null);

REM ------ Insert data

insert into foo values (1,'ELI',sysdate);
insert into foo values (2,NULL,sysdate);
insert into foo values (3,'STAM',sysdate);
insert into foo values (4,NULL,sysdate);
insert into foo values (5,NULL,sysdate);
commit;

alter session set Optimizer_mode=first_rows;
alter session set QUERY_REWRITE_ENABLED=TRUE;
alter session set QUERY_REWRITE_INTEGRITY=TRUSTED;

REM - create the index
create index null_ix on foo(MY_LENGTH(name));

set autot on
select * from foo where MY_LENGTH(name)=0;
select * from foo where MY_LENGTH(name)is null;

and Tom said...

What I would suggest is you use DECODE instead of writing you own function. It'll be easier/faster then using your own PLSQL function. I would have decode return 1 when name is NULL and NULL otherwise (seems more intuitive to have it return 1 instead of 0 to me for some reason).

So, my example would look like this:

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set QUERY_REWRITE_ENABLED=TRUE;

Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set QUERY_REWRITE_INTEGRITY=TRUSTED;

Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> drop table t;

Table dropped.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( name varchar2(30), object_type varchar2(255) );

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select object_name, object_type from all_objects;

17164 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select NULL, object_type from all_objects where rownum <= 200;

200 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx on t( decode( name, NULL, 1, NULL ) );

Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx2 on t( decode( name, NULL, 1, NULL ), object_type );

Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t
2 compute statistics
3 for table
4 for all indexes
5 for all indexed columns;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace on explain
ops$tkyte@ORA817DEV.US.ORACLE.COM> select count(*) from t where decode(name,NULL,1,NULL) = 1;

COUNT(*)
----------
200

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=17)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=200 Bytes=3400)

ops$tkyte@ORA817DEV.US.ORACLE.COM> select count(*) from t where decode(name,NULL,1,NULL) is null;

COUNT(*)
----------
17164

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=17)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'T' (Cost=3 Card=17164 Bytes=291788)

ops$tkyte@ORA817DEV.US.ORACLE.COM> select count(*) from t where decode(name,NULL,1,NULL) = 1 and object_type = 'PROCEDURE';

COUNT(*)
----------
0

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=26)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'T_IDX2' (NON-UNIQUE) (Cost=1 Card=1 Bytes=26)

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off

That third query there shows concatenated indexes will work just fine as well

We can prove that our indexes only indexed the entirely NULL entries via:

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze index t_idx validate structure;

Index analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select lf_rows from index_stats;

LF_ROWS
----------
200

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze index t_idx2 validate structure;

Index analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select lf_rows from index_stats;

LF_ROWS
----------
17364

ops$tkyte@ORA817DEV.US.ORACLE.COM>

The first index only has 200 entries since only 200 rows have NULL names. The other index has a row for EVERY entry since the object_type column is never null...

Rating

(6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Very good

Eli, February 24, 2002 - 3:23 am UTC

Can you explain in more details why concatenate index include an entry for each column on the table .I will expectant that the first column will filter the second column values .Did in this case NULL will be store on the index ?

February 24, 2002 - 9:34 am UTC

If ANY column value in a concatenated index is NOT NULL -- an entry is made. So, if I have an index on

t(a,b,c)

and ANY OF a,b,c is NOT NULL in a row -- an index entry is made.

Very good

ELI, February 24, 2002 - 10:16 am UTC

I need to close the picture . for concatenate indexes shows above it's look like the NULL's are storing on the index. (index include the same amount of record that the table have) . did in this case null are storing on the index ? if yes it is any why to access the table via the nullble values on the index .

February 24, 2002 - 12:54 pm UTC

Think of it like this, when you 

create index t_idx on t(a,b)

it is "as if" it were creating the index on
   

   a || b

if the result of 

  a || b

is NOT NULL -- an entry will be made in the index structure.  If it is NULL, it will not be in the index.  

NULL columns may be stored in an index (a b*tree index) however, entirely NULL index entries are NOT stored in the b*tree. 

If at least one of the columns in the index is not nullable, we have great opportunities to use this index to find nulls in the table, consider:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( a int, b int NOT NULL, c int );

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t
  2  select decode( mod(rownum,100), 0, null, rownum ), rownum, rownum
  3  from all_objects;

17148 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx on t(a,b,c);

Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly explain
ops$tkyte@ORA817DEV.US.ORACLE.COM> select a,b from t where a is null;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=171 Bytes=1368)
   1    0   INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=171 Bytes=1368)

<b>see how it can use an index range scan now to satisfy IS NULL.  Going further, even if B is NULLABLE in this example:</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table t modify b null;

Table altered.

<b>under the appropriate cases, we can still use the index to satisfy the is null.  Here I just added a predicate on b so that the leading edge of the index is used:
</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> select a,b from t where a is null and b = 5;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=8)
   1    0   INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=2 Card=1 Bytes=8)



ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off

<b>and again, we can scan it even though we are looking for NULLs (since the predicate precludes B from being null, we can index range scan safely)</b>

A reader, February 24, 2002 - 12:04 pm UTC

null with index

Abhi, August 11, 2003 - 4:13 am UTC

hello Sir ..

sir if column is not indexed .. then what would be better..

where col1 is not null
or
where length(col1) > 0 ...
column length is varchar2(1000);

Thanx

August 11, 2003 - 7:38 am UTC

the first is semantically more meaningful, hence more correct.

RE : Updates

A reader, February 18, 2005 - 9:34 am UTC

Tom,

I think this approach is great for identifying NULL columns in a huge-table for query purposes. But we have situations where we want to first identify rows in a column (not-indexed) with NULL values and then update that column in those rows with a non-null value from another table/file. This would be performance impacting if we need to create indexes on these columns, since the indexes needs to be updated as well. What would be your approach if you have to perform this operation on a daily basis.
1. You have a table with large number of rows say 20 million rows
2. Relatively fewer rows have NULL values in that column say 20000 rows
3. Identify those rows with NULL values and update those rows with a non-null value.

February 18, 2005 - 9:39 am UTC

my approach would be to ask the fundemental question

"where did we go so wrong in our design that we are forced to do this"

And remember -- physcial schemas are a little like newtonian physics. The statement "for every action there is an equal and opposite reaction" pops into mind.

You say "This would be performance
impacting if we need to create indexes on these columns,"

I might say "Yeah, but the penalty for NOT having them is huge so the miniscule hit you take over here is more than paid for by the speedup over here"

Meaning -- everything is a trade off, you just have to look at what is "best" in your case.

Full scanning 20,000,000 rows won't take very long at the end of the day however. But I'd be much much more concerned with the "why, why are we doing this again?"

Why not fix the code and not put nulls in there in the first place? Or, upon retrieval, if the column is NULL, pull it from the other table?

The best way to speed something up? Not do it AT ALL.

select ...., nvl( column, (select value from other_table where key=T.fk ) )
from T;

that would allow me to not have to update that column

RE : Nulls

A reader, February 18, 2005 - 10:50 am UTC

Tom,

This scenario happens in our system where the file containing that piece of data comes in at a later point in time. However, in order to meet the SLA, we process part of the information and send it to downstream applications. When the file containing that relevant piece of data comes later, we go back and update the data that we had already populated.
Suppose the columns that we are updating are set to NOT NULL and if we put some default values, what is the suggestion you provide that would make us to go back and update this column with the true value from the file at a later point in time to make this process more efficient?

Thanks

February 18, 2005 - 2:20 pm UTC

why not "just join" like suggested? (eg: that field should perhaps not be in that table at all? it is in some other table and you join to pick it up)

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Very good

Very good

null with index

RE : Updates

RE : Nulls

More to Explore

PL/SQL demos

PL/SQL docs

A way to index only null values

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Very good

Very good

null with index

RE : Updates

RE : Nulls

More to Explore

PL/SQL demos

PL/SQL docs