After reviewing the Oracle documentation on isolation levels, I'm trying to make sure I correctly understand how my DML will work. The basic use case is that I am developing a job management service that triggers jobs implemented by other microservices. It triggers these jobs by sending them a message via a message broker. There are multiple instances of a centralized job management service, each with a timer that fires every second. During each 1-second run, a transaction is started on the Java side using Spring's @Transactional annotation. The Java app then calls a stored procedure (using a stored proc is a security requirement at our company; we cannot use SQL directly from our Java code) that looks for all jobs that are in a WAITING state and whose trigger time has arrived. The stored procedure flips the state from WAITING to RUNNING and returns a resultset with the jobs that are ready to run. The Java side then creates messages from the resultset and pushes them to the broker. If pushing messages fails, the transaction is rolled back, which puts the affected jobs back in the WAITING state.
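For what it's worth, here is the claim-then-publish-or-rollback flow sketched end to end. This is a minimal Python illustration against SQLite (a stand-in, not Oracle: SQLite >= 3.35 supports UPDATE ... RETURNING, which plays the role of Oracle's RETURNING clause here); the table shape and the failing publish function are made up for the example, not our real schema:

```python
import sqlite3
import time

# Stand-in for the jobs table; SQLite >= 3.35 supports UPDATE ... RETURNING.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table jobs (job_id integer primary key, state text, trigger_time real)")
now = time.time()
conn.executemany("insert into jobs values (?, ?, ?)",
                 [(1, 'WAITING', now - 5),    # ready
                  (2, 'WAITING', now - 1),    # ready
                  (3, 'WAITING', now + 60)])  # not due yet

def publish(job_id):
    # Simulate the message broker being down.
    raise RuntimeError("broker down")

conn.execute("begin")
# Atomically claim every due WAITING job and collect its id.
claimed = sorted(r[0] for r in conn.execute(
    "update jobs set state = 'RUNNING' "
    "where state = 'WAITING' and trigger_time <= ? "
    "returning job_id", (now,)))
try:
    for job_id in claimed:
        publish(job_id)
    conn.commit()
except RuntimeError:
    conn.rollback()  # failed push: claimed jobs revert to WAITING

states = dict(conn.execute("select job_id, state from jobs"))
```

After the rollback, every row is back in WAITING, so the next 1-second tick would pick jobs 1 and 2 up again.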
I have evaluated a couple of options in the stored proc for getting the jobs that are ready to run and updating them to the RUNNING state. My leaning is:
TYPE pk_list IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
pk pk_list;
...
UPDATE jobs
SET state = 'RUNNING'
WHERE state = 'WAITING'
AND SYSTIMESTAMP >= trigger_time
RETURNING job_id BULK COLLECT INTO pk;
...
-- Use pk to SELECT * FROM jobs returning as SYS_REFCURSOR
Since there will be multiple (about 6) instances of the job management service each calling this stored proc, I'm concerned about race conditions and two instances updating the same rows. The update itself is not a big issue, because they would be updated to the same state. The bigger issue is returning those jobs and double-triggering them.
Here is how I think this works. If I use READ_COMMITTED isolation, each instance will get a consistent view of the data at the start of each statement (the UPDATE in this case). The first instance to get there will put an exclusive lock on the rows matched by the WHERE clause. If a second instance were to start its query, it would see a view of the data from before the first instance committed. It would be blocked by the lock and unable to update the rows, but once the rows were unlocked, it would perform the update and return the same set of rows in its bulk collect.
If I use SERIALIZABLE isolation, each instance will get a consistent view of the data when starting the transaction. The same locking and blocking would happen, but when the second instance tries to perform the update, I should see an "ORA-08177: can't serialize access for this transaction" error raised. This could basically be ignored, since I know the jobs were just found by a different instance.
I also considered using a SELECT FOR UPDATE followed by an UPDATE instead of the UPDATE ... RETURNING BULK COLLECT, but I think that would give me the same results. It might also have a larger chance of collision, since the SELECT and UPDATE are split across two statements.
I'd prefer to use the simpler READ_COMMITTED isolation if I can be guaranteed that collisions won't actually cause an issue. Otherwise, I'm happy to move to SERIALIZABLE isolation.
I'm also open to other options that meet the basic requirement: only one instance should be allowed to see the data as WAITING and ready.
Updates always get rows in current mode, i.e. as they exist right now. So when one session commits, any update in another session that was waiting on its locks will see the change.
For example, run this code in one session:
create table t (
c1 int, c2 varchar2(10)
);
insert into t values ( 1, 'WAITING' );
insert into t values ( 2, 'WAITING' );
insert into t values ( 3, 'RUNNING' );
commit;
declare
recs dbms_sql.number_table;
begin
update t
set c2 = 'RUNNING'
where c2 = 'WAITING'
returning c1 bulk collect into recs;
dbms_output.put_line ( 'Updated ' || recs.count );
for i in 1 .. recs.count loop
dbms_output.put_line ( recs(i) );
end loop;
end;
/
Updated 2
1
2

If you start a second session at this point and run the anonymous block, it'll wait for the first to commit or roll back. When you do, you'll see this output in the second session:
Updated 0
This is because the update in the second session gets the rows as they are after the first committed; this means it sees no rows with the value WAITING.
You could make the transactions serializable and ignore the ORA-08177 errors, but there's no need to.
You can use SELECT ... FOR UPDATE too if you want. To avoid read consistency issues with this, bulk collect the query output and use this array to drive the update:
declare
recs dbms_sql.number_table;
begin
select c1
bulk collect into recs
from t
where c2 = 'WAITING'
for update;
forall i in 1 .. recs.count
update t
set c2 = 'RUNNING'
where c1 = recs (i);
dbms_output.put_line ( 'Updated ' || recs.count );
for i in 1 .. recs.count loop
dbms_output.put_line ( recs(i) );
end loop;
end;
/

There are a few reasons you may prefer this over a plain update:
The nowait | wait N | skip locked clause
If one update takes a long time to commit for some reason, sessions can start piling up quickly. If you're unlucky this may make your app unusable. These options allow the transaction to stop processing, instead of waiting (potentially indefinitely).
The transaction does (lots of) other work before the update
Select for update also works in current mode, seeing the data as it is at the time it's able to lock the rows. If the query returns no rows, you may be able to stop the transaction immediately, avoiding wasted work because the update will change nothing.
You have other DML in the transaction that may lead to deadlocks
If the transaction runs many updates/deletes, you need to take care to avoid deadlocks. Doesn't sound like this is the case here, but it's worth bearing in mind for general transaction management.
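As a rough illustration of the nowait idea (once more in Python against SQLite as a stand-in; Oracle raises ORA-00054 for FOR UPDATE NOWAIT, while SQLite raises "database is locked" when its busy timeout is zero), the second session fails immediately instead of queueing behind the first:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t.db")
holder = sqlite3.connect(path, isolation_level=None)
holder.execute("create table t (c1 integer, c2 text)")
holder.execute("insert into t values (1, 'WAITING')")
holder.execute("begin immediate")  # first session takes the write lock...
holder.execute("update t set c2 = 'RUNNING' where c1 = 1")

# ...so a second session with a zero busy timeout fails fast
# instead of piling up behind it.
waiter = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    waiter.execute("begin immediate")
    failed_fast = False
except sqlite3.OperationalError:   # SQLite's analogue of ORA-00054
    failed_fast = True

holder.rollback()
```

With wait N you'd retry for a bounded time instead, and with skip locked you'd simply claim whichever rows are still unlocked.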