Fast way to determine non-null columns for an arbitrary query on a single table

Tom Kyte

Thanks for the question, Mark.

Asked: September 21, 2011 - 4:03 pm UTC

Last updated: September 23, 2011 - 11:07 am UTC

Version: 11.2.0.2

You Asked

I have a question that might be of general interest.

I need to build, for a potentially large dynamic table (where the columns are defined by our end user), a query that will tell me which columns are entirely null under given filter criteria.

I could do that with:

    SELECT COUNT(COL1), ..., COUNT(COLN)
      FROM TABLE_NAME
     WHERE (some user-supplied filter criteria)


If the criteria do not allow the use of an index, this will end up scanning the entire table to return values I don't really care about (other than whether they are zero).
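
For reference, COUNT(col) counts only non-null values, so a zero result means the column is entirely null within the filtered rows; a quick sanity check:

    SELECT COUNT(NULL) AS C FROM DUAL;   -- returns 0: NULLs are never counted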

I'm wondering if there is a smarter way that might not have to scan the entire table. What I was hoping for is a scan that could stop looking at a column once it found some non-null value there; once it had found a non-null value in every selected column, it could stop before reading every row.

I suppose I could say

    WITH X AS (
       SELECT * FROM TABLE_NAME
        WHERE (criteria)
              )
    SELECT 
        CASE WHEN EXISTS (
              SELECT * 
                FROM X 
               WHERE COL1 IS NOT NULL) THEN 1 
             ELSE 0 END,   
      ...
        CASE WHEN EXISTS (
              SELECT * 
                FROM X 
               WHERE COLN IS NOT NULL) THEN 1 
             ELSE 0 END   
      FROM DUAL


That expresses that all I care about is whether some value exists in each column, but the explain plan shows that Oracle creates a temporary table with just the rows that match the criteria, which is not going to be better than the simple COUNT() approach.
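
A variant of the same idea (assuming the user-supplied criteria can simply be repeated verbatim inside each subquery) would be to run each EXISTS directly against the base table, so nothing has to be materialized:

    SELECT
        CASE WHEN EXISTS (
              SELECT NULL
                FROM TABLE_NAME
               WHERE (criteria)
                 AND COL1 IS NOT NULL) THEN 1
             ELSE 0 END,
      ...
        CASE WHEN EXISTS (
              SELECT NULL
                FROM TABLE_NAME
               WHERE (criteria)
                 AND COLN IS NOT NULL) THEN 1
             ELSE 0 END
      FROM DUAL

Each EXISTS can stop at the first qualifying row it finds, but a column that is entirely null under the criteria still forces its subquery to examine every matching row, so the worst case is N separate scans instead of one.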

Since I don't know all of the possible criteria a user could request, I don't think a materialized view would help.

I'm guessing a user-defined aggregate function would probably perform worse than the built-in COUNT().


Thanks,

Mark Z.

and Tom said...

I cannot think of a method other than procedurally processing the data -

declare
   type num_tab is table of pls_integer index by pls_integer;
   l_c    num_tab;
   total  pls_integer;
begin
   l_c(1) := 0; l_c(2) := 0;  -- ... one zeroed slot per column of interest
   for x in ( select nvl2(c1,1,0) c1, nvl2(c2,1,0) c2 /* , ... */ from t where ... )
   loop
      l_c(1) := greatest( x.c1, l_c(1) );
      l_c(2) := greatest( x.c2, l_c(2) );
      -- ....
      total := 0;
      for i in 1 .. l_c.count loop total := total + l_c(i); end loop;
      exit when total = l_c.count;  -- stop once every column has shown a value
   end loop;
end;


or something similar.
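
Since the columns here are user-defined, a minimal DBMS_SQL sketch of that loop for an arbitrary column list might look like the following; it assumes a table named T, leaves a placeholder where the user criteria would go, and stops fetching once every column has produced a non-null value:

    declare
       l_cur    integer;
       l_sql    varchar2(32767);
       l_names  dbms_sql.varchar2a;
       l_seen   dbms_sql.number_table;
       l_flag   number;
       l_found  pls_integer := 0;
       l_n      pls_integer := 0;
       l_status integer;
    begin
       -- collect the column names of the user-defined table ('T' is a placeholder)
       for c in ( select column_name
                    from user_tab_columns
                   where table_name = 'T'
                   order by column_id )
       loop
          l_n := l_n + 1;
          l_names(l_n) := c.column_name;
          l_seen(l_n)  := 0;
       end loop;

       -- build: select nvl2("C1",1,0), nvl2("C2",1,0), ... from t where ...
       l_sql := 'select ';
       for i in 1 .. l_n loop
          l_sql := l_sql || case when i > 1 then ', ' end
                         || 'nvl2("' || l_names(i) || '",1,0)';
       end loop;
       l_sql := l_sql || ' from t where 1=1';  -- user criteria would go here

       l_cur := dbms_sql.open_cursor;
       dbms_sql.parse( l_cur, l_sql, dbms_sql.native );
       for i in 1 .. l_n loop
          dbms_sql.define_column( l_cur, i, l_flag );
       end loop;
       l_status := dbms_sql.execute( l_cur );

       -- fetch until every column has shown at least one non-null value
       while dbms_sql.fetch_rows( l_cur ) > 0 and l_found < l_n
       loop
          for i in 1 .. l_n loop
             if l_seen(i) = 0 then
                dbms_sql.column_value( l_cur, i, l_flag );
                if l_flag = 1 then
                   l_seen(i) := 1;
                   l_found   := l_found + 1;
                end if;
             end if;
          end loop;
       end loop;
       dbms_sql.close_cursor( l_cur );

       -- 1 = at least one non-null value, 0 = entirely null under the criteria
       for i in 1 .. l_n loop
          dbms_output.put_line( l_names(i) || ': ' || l_seen(i) );
       end loop;
    end;

Whether this beats a single full scan depends on how early the non-null values show up; a column that really is entirely null still forces the fetch loop to read every matching row.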


Unless this table was in the tens of millions of rows, however, I would not consider doing that. We are pretty good at scanning; it can be brutally efficient.
