David -- Thanks for the question regarding "Settling argument over UNDO", version 10.2.0
Submitted on 20-Aug-2008 12:46 Central time zone
Last updated 2-Sep-2008 12:02
You Asked
Our product includes a Pro*C program that, once a month, gathers all of the month's transactions into a staging table for billing purposes. After putting the raw data into the staging table, the program updates a column in the table:
EXEC SQL commit;
EXEC SQL update the_staging_table tst
set (varcharcolumn) =
(select obt.varcharcolumn
from other_big_table obt
where obt.numbercolumn = tst.numbercolumn
and obt.varcharcolumn2 = tst.varcharcolumn2)
where tst.varcharcolumn is NULL
and tst.numbercolumn != 0;
One of our customers runs this program on its own computer controlled by a service provider. At this point in the program, the staging table has about 200,000 rows in it and as a practical matter this will update almost all of them. The average row length is about 125. The other_big_table has maybe 20,000,000 rows, average row length 200, and the join is on a unique index of other_big_table. About every third time the program runs, this happens while executing that update:
ORA-30036: unable to extend segment by 16 in undo tablespace 'UNDO'
The conversation with the customer's service provider goes like this:
SP: "Fix your program. The UNDO tablespace is 2.5GB and you used all of it."
Me: "Give it more UNDO space. Disk is cheap."
SP: "If you used more commits, this wouldn't happen."
Me: "It's already written as one transaction, and I'm much too stubborn to break this simple update into updates of 1000 rows at a time or something."
The service provider won't let us look at the database to see what unusual thing might be going on. (If it were here, I'd change the program to set a trace and then do a tkprof and see if there's something wacky in the I/O, I guess. I can't change the program and send it, though, because changes to the production environment are rarely permitted. I can route DBA requests, but they must be precise and the turnaround is a week.)
My question is three-part:
First, is there really something about a simple update that could gobble up even a big UNDO tablespace (and 2.5GB doesn't seem all that big)? I know that at least those 200,000 records would be in UNDO until the next commit (at least), but does the fact that there's a correlated update with a much larger table make a difference?
Second, if the answer to the first is "no", are there any arguments I can use? In other contexts you've said that adding more commits can make things worse, but I don't think that was related to undo usage.
Third, if the answer to the first might be "yes", is there a precise question or test I can send to the DBA that might answer what's going on?
and we said...
2.5 gb is small.
if the column is indexed, it could easily consume a lot of space.
and since the column is NULL and being updated to not null - you might be migrating a lot of rows, that'll generate additional undo - not much but some (this table should have a good pctfree setting on it!)
The statement:
"If you used more commits, this wouldn't happen."
is, to put it rather bluntly, really "not smart"
Transactions are important, they are relevant. Putting a commit in willy nilly is stupid and dangerous. You'd have to take that single, simple update and break it into a long complex process with the ability to restart itself when it fails (committing frequently is the leading cause of ora-1555, if you commit frequently, you'll almost assure yourself that it WILL FAIL and will have to be restarted.
The correlated subquery is not generating additional undo, no. You can measure the amount of undo you generate by printing out used_ublk from v$transaction after the update.
2.5gb of undo is tiny, teeny tiny. Your code is correct, your code is proper. They need to allocate addition undo - until they do, they will be stuck - as in "not able to do what they need to do"
but - something else must be going on here - I think that they must have a really long running transaction (it could be very small - just long running) - if this column is not indexed- you wouldn't generate more than a couple tens of megabytes (like 70-80mb) of undo.
I just ran:
create table t
as
select * from big_table.big_table
union all
select * from big_table.big_table
/
alter table t add data varchar2(4000);
update t set data = rpad('*',4000,'*');
select used_ublk, round(used_ublk*8/1024,1) mb from v$transaction;
and the result was:
USED_UBLK MB
---------- ----------
8999 70.3
So, I think they are actually the cause of the problem (unless you have this column indexed - if so, drop it, update, create it)
They must have other stuff going on and IT is using the vast majority of the undo OR they have a transaction that has been open for a very very long time.
suggestion: start logging the amount of undo your generate, then tell them "ummm, sorry, I only need 100mb - you have a problem elsewhere"
Thanks!
August 21, 2008 - 5pm Central time zone
Reviewer: David Weigel from Villanova, PA USA
Thanks much for the deep thoughts. I hadn't thought about the effect of updating an index on UNDO.
I've checked the updated column and it's not part of an index, so that's clear. Now I can go back
to the customer's service provider....
Other possible causes
August 21, 2008 - 8pm Central time zone
Reviewer: Gary from Sydney, Aus
How long does this update take ? While it may not generate a lot of UNDO in its own right, if it
does take a long time it may be the cause for undo retention from other transactions. It may be the
transaction Tom postulates with "I think that they must have a really long running transaction (it
could be very small - just long running)"
Maybe sometimes it gets stuck waiting on a lock (and since you are updating most of the table
locking the entire table in exclusive mode before the update could resolve that).
Also, as well as indexes, check whether there's triggers, auditing or other exotica (materialized
view log) which may account for additional undo.

August 22, 2008 - 3am Central time zone
Reviewer: A reader from France
By the way
<quote>
suggestion: start logging the amount of undo your generate, then tell them "ummm, sorry, I only
need 100mb - you have a problem elsewhere"
</quote>
How do we do that ?
Followup August 22, 2008 - 9am Central time zone:
used_ublk from v$transaction. just like I did above, for example.
historical undo information in 10g
August 22, 2008 - 4am Central time zone
Reviewer: Dhairyasheel from India-Mumbai
Hi,
you can find historical undo usage information from the view
DBA_HIST_UNDOSTAT in Oracle 10g Release 2.
If you dont know which exact table to query just fire the query
select * from dict where table_name like '%UNDO%' ORDER BY TABLE_NAME;
by the names itself you can understand which tables to use.
Regards.

August 29, 2008 - 10am Central time zone
Reviewer: Alexander
Tom,
From above, I am trying to expand on this query:
select used_ublk, round(used_ublk*8/1024,1) mb from v$transaction;
I think it would be useful to have a script that grabs a few other thing in addition to what you have.
One of those things is the sql using the redo. I can't seem to get it right. I can join v$transaction to v$session to identify user information successfully.
On another thread you showed how to see user's sql, it was not trivial however. I am wondering if it can be done for this purpose on 10g (the thread was pretty old).
This is what I have so far
select y.username,
y.osuser,
x.log_io,
x.phy_io,
round(used_ublk*(select value/1024 from v$parameter where name = 'db_block_size')/1024,1)
mb,
x.start_date,
y.status
from v$transaction x,
v$session y
where saddr = ses_addr
Followup August 30, 2008 - 9am Central time zone:
... One of those things is the sql using the redo. I can't seem to get it right. ...
that is because we don't generate redo generated at the statement level.
You can see the current sql (or latest sql) a session has run, and that is pretty easy - there is a sql_address in v$session

September 2, 2008 - 10am Central time zone
Reviewer: Alexander
I misspoke, I meant to say undo. I don't know if that changes anything though.
Are you saying I can join on sql_address and get the sql in the query I have? Or do you mean I can
get it from writing a separate query? I tried using just about everything in v$sqltext, v$sql,
v$session, and v$transaction to try and get this.
Followup September 2, 2008 - 12pm Central time zone:
we don't track undo at that level either.
We do that by session.

September 2, 2008 - 11am Central time zone
Reviewer: Pedro
Check Metalink note 5442919.8 for a bug regarding the UNDO.
|