Skip to Main Content
  • Questions
  • Update multiple columns of a table in a single pass

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Drazen.

Asked: January 05, 2010 - 2:31 am UTC

Last updated: January 06, 2010 - 1:27 pm UTC

Version: 10.2

Viewed 10K+ times! This question is

You Asked

Hi Tom,

somehow I just can not find the solution for it and would very much appreciate your help. The task is simple: all NULL values of a table should be updated to a certain not null value, let's say to 0,. whereas the not NULL values remain the same. If there are many columns to be updated and if a table is a little bit bigger, for example > 5 GB, then it would take quite a long time to update each column separately. Could it be done in a just one single run? Of course there is also a question, how the table blocks would "look" like afterwards, e.g. if there would be very many chained rows, since we update a NULL to not null. The next question arises as well, if the existing indexes would also be "disturbed", although none of the indexed columns will be updated. Many thanks & best regards from Vienna, Drazen.

and Tom said...

this is a horrible idea.

http://asktom.oracle.com/Misc/use-null-for-unknonw-data.html

I recommend you rethink this entirely. Really and truly I do.


update t
set c1 = case when c1 is null then 0 else c1 end, 
    c2 = case when c2 is null then 0 else c2 end,
    ......


that hurt me to type that in :(

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

NULLs in data warehouse

James Su, January 05, 2010 - 2:36 pm UTC

Hi Tom,
I read from a book that we need to try to avoid NULLs in a data warehouse fact table. Instead, we create an entry called "unknown" in the dimension table, and populate the field in the fact table with the dimension key that points to this "unknown" entry. The reason to do so is to avoid outer join because it's not friendly to data presentation tools. What do you think?
Tom Kyte
January 06, 2010 - 7:11 am UTC

I think that all things should be presented with "why"

If I were to say to you:

do not do that, it is wrong.

that in this context means *anything* - when then would you say? Would you say "person X said this, it must be true"

I read it in a book, it must be true.


Please go back to the book, find the reference and post back here their logic, their thought process.

The reason to do so is to avoid outer join because it's not
friendly to data presentation tools.


that is a silly statement. Next they'll say "do not do facts and dimensions, the reason is to avoid joins because it's not friendly to data presentation tools"


here is one reason to use nulls
http://asktom.oracle.com/Misc/use-null-for-unknonw-data.html


Update of multiple columns in a single run

Drazen Eror, January 06, 2010 - 8:19 am UTC

Hi Tom,
you're right, it really seems to be a horrible idea. The request came from our development and competence center. I assure you, they do not understand the consequences of a such operation, nor do they really try to avoid NULLs. And when the NULLs already exists, nothing can be done any more. They did not accept my suggestion engaging defaults, I already forgot what the reason was.
Let's presume there are only 5% NULL values in total which are spread among many columns, which luckily aren't indexed. The solution with the usage of case is of course very unpractical, since it does an update even if the value was not null.
Could you please think of some possibility using cursors in a PL-SQL block?
Thank you very much!
Drazen.

Tom Kyte
January 06, 2010 - 9:55 am UTC

... The request came from our
development and competence center....

laughed at that :)

update t
set c1 = case when c1 is null then 0 else c1 end, 
    c2 = case when c2 is null then 0 else c2 end,
    ......<b>
where c1 is null or c2 is null or c2 is null ....;</b>



will be the only way to do it in a single SQL - it'll update the fewest rows possible, but if you want to update in a single pass, you will update columns that are NOT NULL if at least one of them is NULL - no avoiding that without using many passes.

Nulls in data warehouse

James Su, January 06, 2010 - 8:23 am UTC

Hi Tom,
Thank you for your reply. I went back to the book and found these: (it doesn't say not friendly to data presentation tools, but to users. That was my mistake)

The Data Warehouse Toolkit (Second Edition)
The Complete Guide toDimensional Modeling
by Ralph Kimball, Margy Ross

page 49, Chapter 1:
...Typically, many sales transaction line items involve products that are not being
promoted. We will need to include a row in the promotion dimension, with its
own unique key, to identify "No Promotion in Effect" and avoid a null promotion
key in the fact table. Referential integrity is violated if we put a null in a
fact table column declared as a foreign key to a dimension table. In addition to
the referential integrity alarms, null keys are the source of great confusion to
our users because they can't join on null keys.

You must avoid null keys in the fact table. A proper design includes a row in the
corresponding dimension table to identify that the dimension is not applicable
to the measurement.

page 261, Chapter 13:
...The surrogate date key in the fact table must not be null, but the full date description
in the corresponding date dimension table row can indeed be null.

page 270, Chapter 13:
...We also eliminate the nulls in the classic positional
fact table design because a row exists only if the measurement exists.

Page 408, Glossary
Null: A data field or record for which no value exists. We avoid null keys
in the fact table by assigning a dimension surrogate key to identify "Not
Applicable," "To Be Determined," or other "Empty" conditions.
Tom Kyte
January 06, 2010 - 10:02 am UTC

... but to users....

to which I say the same thing. If users are smart enough to join, users are smart enough to join, period.

If they are not, they use a BI tool and the developers set up the "end user layer", what is presented to the users - and that would not include "having to join themselves", that would have been done in the presentation layer already.

Back to square one.



... You must avoid null keys in the fact table. A proper design includes a row in
the
corresponding dimension table to identify that the dimension is not applicable
to the measurement.
...

I disagree with that sentiment.


... Referential integrity is violated if we put a null in a
fact table column declared as a foreign key to a dimension table. ...

that is not a true statement in any RDBMS. Not only is it not true, it is the opposite of true in more ways than one. A true statement would be "referential integrity is violated if we use a fake value for missing data" (see referenced link again above)


... null keys are the source of great confusion
to
our users because they can't join on null keys.
...

if that is true, the users cannot do joins, they are not smart enough in the first place, they need to use a tool that already includes all of this and just gives them a "big old spread sheet" to query.


NULLS

A reader, January 06, 2010 - 10:44 am UTC


issue with joins and null

Mike Kutz, January 06, 2010 - 12:11 pm UTC

tom:
from above
... null keys are the source of great confusion
to
our users because they can't join on null keys.
...

if that is true, the users cannot do joins, they are not smart enough in the first place, they need to use a tool that already includes all of this and just gives them a "big old spread sheet" to query.
----
I've ran into issue with joins on NULL values.
I have a 'hack' but i don't know if it is a good/proper way to do it.

Is my 'hack' valid?
is there a better way to do it?
or did i just skip the page(s) in the books that go over joins with NULL values?

the set up:
I have a table that has zero or one row per "ID,sub_type" pair.
(initially 9iR2, but will be upgrading to 11gR2 this year)

create table calc_values (
  item_id  varchar2(10) not null,
  sub_type varchar2(10),
  result   number
);

-- force maximum of one row
create unique index cv_uix on calc_values(item_id,sub_type);


create or replace procedure upsertCalcValues( p_item_id in varchar2, p_sub_type in varchar2, p_result in number )
as
begin
  merge into calc_values CV
  using ( select p_item_id as item_id,
            p_sub_type as sub_type,
            p_result as result
         from dual ) DI
  on ( CV.item_id=DI.item_id
    and CV.sub_type=DI.sub_type )
  when matched then update
    set result=DI.result
  when not matched then insert
    (item_id, sub_type, result) values (DI.item_id,DI.sub_type,DI.result);

  -- other stuff goes here
end;
/

-- add some data
execute upsertCalcValues('alice', null, 1);
execute upsertCalcValues('alice', 'strange', -3/2);
execute upsertCalcValues('alice', 'charm', 2/3);
execute upsertCalcValues('bob', null, 1);
execute upsertCalcValues('bob', 'up', 1/2);
execute upsertCalcValues('bob', 'down', -1/2);
-- update a row
execute upsertCalcValues('alice', 'charm', 3/2);
commit;
select * from calc_values;

ITEM_ID    SUB_TYPE       RESULT
---------- ---------- ----------
alice                          1
alice      strange          -1.5
alice      charm             1.5 <-- notice, only one 'alice','charm' row and RESULT is 3/2 not 2/3
bob                            1
bob        up                 .5
bob        down              -.5

6 rows selected.


-- try to update a row with a NULL value
execute upsertCalcValues('alice', null, 1.1);

ORA-00001: unique constraint (CV_UIX) violated

================
My hack to MERGE statement
-- There is no such thing as SUB_TYPE named 'null'.
-- SUB_TYPE IS NULL is valid. (and, actually, end-users prefer IS NULL over 'null')
alter table calc_values
  add constraint cv_nonull_chk check ( sub_type != 'null' );


-- modify MERGE statement
...
on ( CV.item_id=DI.item_id
and nvl(CV.sub_type,'null')=nvl(DI.sub_type,'null') )
...



In 11g, i'd create a virtual column for the NVL() value and a PK with that column.

Tom Kyte
January 06, 2010 - 1:27 pm UTC

ops$tkyte%ORA10GR2> create or replace procedure upsertCalcValues( p_item_id in varchar2, p_sub_type in varchar2, p_result in number )
  2  as
  3  begin
  4    merge into calc_values CV
  5    using ( select p_item_id as item_id,
  6              p_sub_type as sub_type,
  7              p_result as result
  8           from dual ) DI
  9    on ( CV.item_id=DI.item_id<b>
 10      and decode( CV.sub_type, DI.sub_type, 1, 0 ) = 1 )</b>
 11    when matched then update
 12      set result=DI.result
 13    when not matched then insert
 14      (item_id, sub_type, result) values (DI.item_id,DI.sub_type,DI.result);
 15
 16    -- other stuff goes here
 17  end;
 18  /

Procedure created.

<b>or</b>

ops$tkyte%ORA10GR2> create or replace procedure upsertCalcValues( p_item_id in varchar2, p_sub_type in varchar2, p_result in number )
  2  as
  3  begin
  4    merge into calc_values CV
  5    using ( select p_item_id as item_id,
  6              p_sub_type as sub_type,
  7              p_result as result
  8           from dual ) DI
  9    on ( CV.item_id=DI.item_id<b>
 10      and ( CV.sub_type= DI.sub_type or (cv.sub_type is null and di.sub_type is null) )</b>
 11          )
 12    when matched then update
 13      set result=DI.result
 14    when not matched then insert
 15      (item_id, sub_type, result) values (DI.item_id,DI.sub_type,DI.result);
 16
 17    -- other stuff goes here
 18  end;
 19  /

Procedure created.



In that fashion, you don't have to pick some value - who is to say that subtype 'null' isn't a valid value?

Mike Kutz, January 11, 2010 - 8:09 am UTC

Tom
Thank you!

I like the 2nd one better as it seems to say exactly why we need the extra check.

Enjoy the New Year.

MK