The Tom Kyte Blog: That old restart problem again...

Tuesday, April 27, 2010

That old restart problem again...

Not something entirely learned new today - but rather a revisited "something I learned" coupled with an "I didn't necessarily expect it in this case".

It is the old "statement restart" and "evil triggers" issue. I had an asktom question asking why a row seemed to be getting deleted twice. You should read that link, I'll be referring to it here.

I immediately knew what the issue was (I was pretty sure). Before I even read the test case - I had already pointed them to three articles I wrote on the subject a while ago...

But when I looked at their test case and modified it slightly to be smaller and easier to read - I was frankly surprised at what was happening - but ok with the behavior.

It takes a series of bad things to happen for this issue the poster was seeing to have happen, you have to

a) use a trigger (evil)
b) do something non-transactional in that trigger - for example modify a package global variable
c) do slow by slow processing (if they did a big mass operation - they could still see a restart, but it would be less likely perhaps)

Here is a snippet of the code again:


ops$tkyte%ORA11GR2> create or replace PACKAGE pacepack
2  as
3      type array is table of number index by varchar2(40);
4      g_data array;
5      g_cnt  number;
6  end pacepack;
7  /

Package created.

That package just has some global variables (evil global variables - evil evil - they didn't use globals in their API, they had getter/setter functions - but they behaved like globals as well). We'll create a row trigger to save the rowids of the rows our trigger has processed - even if Oracle rolls back our firing statement - modifications we've made in our trigger to the globals won't rollback.

Now, we'll code the trigger:


ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create table delete_data
2  as
3  select owner, object_name
4  from dba_objects
5  where rownum  < 53001;

Table created.




ops$tkyte%ORA11GR2> create or replace trigger delete_data_bt
2  before delete or update on delete_data
3  for each row
4  begin
5      if ( pacepack.g_data.exists(rowidtochar(:old.rowid)) )
6      then
7          dbms_output.put_line( 'doing "' || :old.rowid ||
                   '" again was called ' || pacepack.g_cnt );
8      else
9          pacepack.g_data(rowidtochar(:old.rowid)) := 1;
10      end if;
11      pacepack.g_cnt := pacepack.g_cnt + 1;
12  end;
13  /

Trigger created.

The trigger checks to see if the PLSQL global table variable has the rowid already entered in it - if so, it says "I already did this one", else it sets a flag showing it processed that one for the first time. We also increment a count (you know, to count the rows we've deleted or updated).

Now we process the data:


ops$tkyte%ORA11GR2> declare
2      CURSOR L_DELETE_CSR IS
3      SELECT *
4        FROM delete_data
5         for update ;
6      l_cnt number := 0;
7  BEGIN
8       pacepack.g_data.delete;
9       pacepack.g_cnt := 0;
10      for l_delete_row in l_delete_csr
11      loop
12            update delete_data
13             set owner = lower(owner)
14           where current of L_delete_CSR;
15          l_cnt := l_cnt + 1;
16      end loop;
17      dbms_output.put_line( 'trigger count = ' || pacepack.g_cnt ||
                     ' local count = ' || l_cnt );
18  END;
19  /
doing "AAAYMdAAEAABFrnABH" again was called 8827
doing "AAAYMdAAEAABFwiAAx" again was called 20140
doing "AAAYMdAAEAABFyeACk" again was called 31405
doing "AAAYMdAAEAABFzaABE" again was called 42670
trigger count = 53004 local count = 53000

PL/SQL procedure successfully completed.

As you can see - our trigger was fired 53,004 times - for 53,000 rows. Our counts do not match and we can see the four updates that were restarted.

Why they were restarted - not really relevant - the fact is - they CAN be restarted, we can show that they can be restarted, it is documented that they can be. Any statement can be restarted - we do a savepoint before executing the statement and if we deem necessary (which we did obviously), we rollback to that savepoint and do it over.

So, expect the restart - and stop using triggers. If you do use triggers - be very very very careful to make sure they are not subject to issues with restarts (eg: NO autonomous transactions - NO modifications of global variables - NO API calls that have side effects that don't roll back (setter functions for example)!

15 Comments:

Curtis said....: So... are autonomous transactions in triggers always evil or is there a good use of them?

For example they only good use i have seen for them in triggers is to spawn a scheduler job to do something at a later time because something happened (new insert, update, delete maybe).

For instance we have a package that automatically creates new range (date) partitions on a base table (that has multiple domain indexes so we couldn't use SET INTERVAL in 11g). So when a dml operation happens, we create a scheduler job (if the job doesn't exist) to verify that all proper partitions exist. The job doesn't assume that a partition needs to be split/merged but queries the base table to figure it out in the case of a rollback was performed on the dml.

We figured creating the job post dml through a trigger was preferable to having a job poll the table for changes.

Opinions in your post's context?; Tue Apr 27, 07:44:00 PM EDT
Thomas Kyte said....: @Curtis

... For example they only good use i have seen for them in triggers is to spawn a scheduler job to do something at a later time ....

Oh, so in this case you spawned 4 extra jobs.

Or, if I rollback you spawned 53,004 jobs in response to something that *never happened*.

This is truly evil - yes. Autonomous transactions have exactly ONE use:

to log the error message that just happened - right before the rollback to undo any outstanding work is issued.

I don't get your example - why would you do that upon DML - this just sounds like a job that should be SCHEDULED - once, as in only one time - to run at the end of each month or something. I don't see why you would check with each DML operation - since the partition the row is inserted into must already exist - why would you check over and over and over again?; Tue Apr 27, 07:49:00 PM EDT
Thomas Kyte said....: @Curtis,

if doing this async makes sense (and it doesn't yet to me), you should just use dbms_job - that'll roll back with you - it is transactional; Tue Apr 27, 07:50:00 PM EDT
Curtis said....: @Tom,

In the 53004 case, if they all happened in the same day, we would only spawn 1 job, but we would attempt to spawn 53003 jobs. The job would actually run during a low usage time of the day.

As for our use case, we have a horrible (non-existant) data model where one database may have 300 identical schemas with identical tables inside. Also, our data arrives out of date order. So today i may get records for today, but tomorrow i may get records for 10 years in the past. We also don't precreate our partitions for all possible date ranges, but we have a MAXVALUE partition so data may fall into a partition that doesn't exist yet, and we will alter partition split it out in our job. We also don't always need a "this" month partition, so we could go a year without updates before they resume again.

Needless our use case is odd, and I was just wondering your opinion on a valid use for an autonomous transaction.

Also, we were using scheduler jobs instead of dbms_jobs due to dbms_scheduler familiarity and eventual goal of using the rest of dbms_scheduler's abilities schedules/programs/job classes/windows. Also, the scheduler jobs were much easier for us to reschedule the job if the table was unable to be partitioned due to locks.; Tue Apr 27, 09:16:00 PM EDT
Narendra said....: Tom,

Reading the comments (from Curtis) so far, I think it is high time non-transactional nature of DBMS_SCHEDULER is filed as a "bug", especially since Oracle is promoting it as enhanced (??) version of DBMS_JOB.

p.s. If "bug" sounds harsh :), then at least an enhancement request should be filed to have an option to make DBMS_SCHEDULER transactional.; Wed Apr 28, 04:20:00 AM EDT
Jon T. said....: Just a quick point, unless I'm missing something. If you want to count the number of rows then you need to refer to SQL%ROWCOUNT. The count is actually indicating the nunber of times the trigger is run. In this scenario the two, I guess, are synonymous, and maybe I'm being a bit too pedantic.; Wed Apr 28, 12:30:00 PM EDT
Thomas Kyte said....: @Jon T.

yes, you are correct, in real life, I should have:

l_cnt := l_cnt + sql%rowcount;

In this case - single user, delete where current of, nothing else happening - it should all be OK.

but yes, you are correct.; Wed Apr 28, 01:29:00 PM EDT
Connor McDonald said....: re Narendra's comment.

I agree - I was all excited when 11g talked about "lightweight" scheduler jobs...but no, still that commit whacked in there :-); Thu Apr 29, 06:55:00 AM EDT
ksun said....: @Tom

you said: this restart is documented that they can be.

Can you please tell us which Oracle Documentation we can refer to ?; Wed Feb 16, 10:43:00 AM EST
Thomas Kyte said....: It is in the PL/SQL language guide where they state:

If the triggering statement of a BEFORE statement trigger is an UPDATE or DELETE statement that conflicts with an UPDATE statement that is running, then the database does a transparent ROLLBACK to SAVEPOINT and restarts the triggering statement. The database can do this many times before the triggering statement completes successfully. Each time the database restarts the triggering statement, the trigger fires. The ROLLBACK to SAVEPOINT does not undo changes to package variables that the trigger references. To detect this situation, include a counter variable in the package.; Wed Feb 16, 11:25:00 AM EST
ksun said....: @Tom

Thanks.

The pre-condition is:
"the triggering statement of a BEFORE statement trigger is an UPDATE or DELETE statement",
but trigger delete_data_bt in your example has no UPDATE/DELETE statement.

Also the example in your Book: Expert Oracle Database Architecture 9i and 10g (page p249),
the restart is caused by a update lock.

I wonder how one can interpret the Oracle Document.; Thu Feb 17, 11:06:00 AM EST
ksun said....: @Tom

Following modified script shows some coincidence of undo_extent and restart.
Tested on 10g and 11g.

-----------------------------------
-- assume no other active transactions

set serveroutput on

create or replace package pacepack
as
type array is table of number index by varchar2(40);
g_data array;
g_cnt number;
g_restart_cnt number;
g_start_undo_extent_cnt number;
end pacepack;
/

drop table delete_data;

create table delete_data as
select owner
,object_name
from dba_objects
where rownum < 53001;

create or replace trigger delete_data_bt
before delete or update on delete_data
for each row
begin
if ( pacepack.g_data.exists(rowidtochar(:old.rowid)) )
then
pacepack.g_restart_cnt := pacepack.g_restart_cnt + 1;

for c in (select count(*) cnt from dba_undo_extents where status = 'ACTIVE') loop -- assume only one active transaction
dbms_output.put_line(' Restart#: ' || pacepack.g_restart_cnt || ', Undo Extent Count: ' || c.cnt);
end loop;

dbms_output.put_line(' Row#: ' || (pacepack.g_cnt - pacepack.g_restart_cnt) ||
' with Rowid: ' || :new.rowid ||
' was called again ' || pacepack.g_cnt);
dbms_output.put_line(' Block: ' || dbms_rowid.rowid_block_number(:old.ROWID) ||
' ,Row: ' || dbms_rowid.rowid_row_number (:old.ROWID));
dbms_output.put_line(' ----------------------------------');
else
pacepack.g_data(rowidtochar(:old.rowid)) := 1;
end if;

pacepack.g_cnt := pacepack.g_cnt + 1;

if pacepack.g_cnt = 1 then -- only print for first update
for c in (select count(*) cnt from dba_undo_extents where status = 'ACTIVE') loop
pacepack.g_start_undo_extent_cnt := c.cnt;
dbms_output.put_line('Start Undo Extent Count: ' || c.cnt);
dbms_output.put_line('----------------------------------');
end loop;
end if;
end;
/

declare
cursor l_delete_csr is
select *
from delete_data
for update;
l_cnt number := 0;
begin
pacepack.g_data.delete;
pacepack.g_cnt := 0;
pacepack.g_restart_cnt := 0;

for l_delete_row in l_delete_csr loop
update delete_data
set owner = lower(owner)
where current of l_delete_csr;
l_cnt := l_cnt + 1;
end loop;

dbms_output.put_line( pacepack.g_restart_cnt || ' Rows Restarted.' ||
' trigger count = ' || pacepack.g_cnt ||
' ,local count = ' || l_cnt
);

for c in (select count(*) cnt from dba_undo_extents where status = 'ACTIVE') loop
dbms_output.put_line('----------------------------------');
dbms_output.put_line('End Undo Extent Count: ' || c.cnt ||
', New Created Undo Extents: ' || (c.cnt - pacepack.g_start_undo_extent_cnt));

dbms_output.put_line('----------------------------------');
if (c.cnt - pacepack.g_start_undo_extent_cnt) = pacepack.g_restart_cnt then
dbms_output.put_line('! The number of new Undo Extents is equal to number of Restarts !');
else
dbms_output.put_line('? The number of new Undo Extents is NOT equal to number of Restarts ?');
end if;
end loop;
end;
/; Fri Feb 18, 08:01:00 AM EST
Thomas Kyte said....: @ksun

care to share the "motivation", I mean- what were you trying to show/figure out/whatever.

Right now we just have a bunch of code - but no "motivation", what is the motivation behind the code.; Fri Feb 18, 08:03:00 AM EST
ksun said....: @Tom

The "motivation" is to figure out if the restart is triggered by new undo_extent requirement (creation).; Fri Feb 18, 08:12:00 AM EST
ksun said....: @Tom

I slightly extended your test code to show the coincidence of new active undo_extent and restart
in Blog: "Update Restart and new Active Undo Extent" (

http://ksun-oracle.blogspot.com/2011/05/update-restart-and-new-undo-extent.html ); Wed May 11, 04:49:00 AM EDT