Our product includes a Pro*C program that, once a month, gathers all of the month's transactions into a staging table for billing purposes. After putting the raw data into the staging table, the program updates a column in the table:
EXEC SQL commit;
EXEC SQL update the_staging_table tst
set (varcharcolumn) =
(select obt.varcharcolumn
from other_big_table obt
where obt.numbercolumn = tst.numbercolumn
and obt.varcharcolumn2 = tst.varcharcolumn2)
where tst.varcharcolumn is NULL
and tst.numbercolumn != 0;
One of our customers runs this program on its own computer, controlled by a service provider. At this point in the program, the staging table has about 200,000 rows in it, and as a practical matter this update will touch almost all of them. The average row length is about 125 bytes. The other_big_table has maybe 20,000,000 rows with an average row length of 200 bytes, and the join is on a unique index of other_big_table. About every third time the program runs, this happens while executing that update:
ORA-30036: unable to extend segment by 16 in undo tablespace 'UNDO'
The conversation with the customer's service provider goes like this:
SP: "Fix your program. The UNDO tablespace is 2.5GB and you used all of it."
Me: "Give it more UNDO space. Disk is cheap."
SP: "If you used more commits, this wouldn't happen."
Me: "It's already written as one transaction, and I'm much too stubborn to break this simple update into updates of 1000 rows at a time or something."
The service provider won't let us look at the database to see what unusual thing might be going on. (If it were here, I'd change the program to set a trace and then do a tkprof and see if there's something wacky in the I/O, I guess. I can't change the program and send it, though, because changes to the production environment are rarely permitted. I can route DBA requests, but they must be precise and the turnaround is a week.)
My question is three-part:
First, is there really something about a simple update that could gobble up even a big UNDO tablespace (and 2.5GB doesn't seem all that big)? I know that at least those 200,000 records would be in UNDO until the next commit (at least), but does the fact that there's a correlated update with a much larger table make a difference?
Second, if the answer to the first is "no", are there any arguments I can use? In other contexts you've said that adding more commits can make things worse, but I don't think that was related to undo usage.
Third, if the answer to the first might be "yes", is there a precise question or test I can send to the DBA that might answer what's going on?
2.5GB is small.
If the column is indexed, updating it could easily consume a lot of undo space.
And since the column is NULL and being updated to a non-null value, you might be migrating a lot of rows; that will generate additional undo - not much, but some (this table should have a good PCTFREE setting on it!).
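You can check whether row migration is even a factor with something like this (a sketch; it assumes you can analyze the staging table - chain_cnt is only populated by ANALYZE, and a higher PCTFREE only helps blocks formatted after the change):

analyze table the_staging_table compute statistics;

select chain_cnt
  from user_tables
 where table_name = 'THE_STAGING_TABLE';

alter table the_staging_table pctfree 20;   /* 20 is just an example value */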
The statement:
"If you used more commits, this wouldn't happen." is, to put it rather bluntly, really "not smart"
Transactions are important, they are relevant. Putting a commit in willy nilly is stupid and dangerous. You'd have to take that single, simple update and break it into a long complex process with the ability to restart itself when it fails (committing frequently is the leading cause of ora-1555, if you commit frequently, you'll almost assure yourself that it WILL FAIL and will have to be restarted.
The correlated subquery is not generating additional undo, no. You can measure the amount of undo you generate by printing out used_ublk from v$transaction after the update.
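For example, run something like this in the same session right after the update and before the commit (a sketch; the MB figure assumes an 8KB undo block size, and you need select access on v$transaction and v$session):

select t.used_ublk,
       round(t.used_ublk*8/1024,1) mb
  from v$transaction t, v$session s
 where s.taddr = t.addr
   and s.sid = sys_context('userenv','sid');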
2.5GB of undo is tiny, teeny tiny. Your code is correct; your code is proper. They need to allocate additional undo - until they do, they will be stuck, as in "not able to do what they need to do."
But something else must be going on here. I think they must have a really long-running transaction (it could be very small, just long running). If this column is not indexed, you wouldn't generate more than a couple of tens of megabytes (like 70-80MB) of undo.
I just ran:
create table t
as
select * from big_table.big_table
union all
select * from big_table.big_table
/
alter table t add data varchar2(4000);
update t set data = rpad('*',4000,'*');
select used_ublk,
       round(used_ublk*8/1024,1) mb   /* assumes an 8KB undo block size */
  from v$transaction;
and the result was:
 USED_UBLK         MB
---------- ----------
      8999       70.3
So, I think they are actually the cause of the problem (unless you have this column indexed; if so, drop the index, do the update, then recreate it, as sketched below).
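Something along these lines (a sketch; the index name here is made up, use the real one, and rebuild any other indexes on that column the same way):

drop index staging_varcharcolumn_idx;

update the_staging_table tst
   set (varcharcolumn) =
       (select obt.varcharcolumn
          from other_big_table obt
         where obt.numbercolumn = tst.numbercolumn
           and obt.varcharcolumn2 = tst.varcharcolumn2)
 where tst.varcharcolumn is NULL
   and tst.numbercolumn != 0;

create index staging_varcharcolumn_idx
    on the_staging_table(varcharcolumn);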
They must have other stuff going on and IT is using the vast majority of the undo, OR they have a transaction that has been open for a very, very long time.
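That is something the DBA can see in one query; something like this, run while your job is failing, shows every open transaction, how long it has been open, and how much undo it is holding (a sketch; needs select access on v$transaction and v$session):

select s.sid, s.username, s.program,
       t.start_time, t.used_ublk
  from v$transaction t, v$session s
 where s.taddr = t.addr
 order by t.start_time;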
Suggestion: start logging the amount of undo you generate, then tell them "ummm, sorry, I only need 100MB - you have a problem elsewhere."
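In the Pro*C program, that could be as little as this, right after the update and before the commit (a sketch; undo_blocks is a hypothetical long host variable in your declare section, and the schema needs select access on v$transaction and v$session):

EXEC SQL SELECT t.used_ublk
           INTO :undo_blocks
           FROM v$transaction t, v$session s
          WHERE s.taddr = t.addr
            AND s.sid = SYS_CONTEXT('USERENV','SID');
/* write the number to the program's log before committing */
printf("undo blocks used by the staging update: %ld\n", undo_blocks);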