After reviewing the Oracle documentation on isolation levels, I'm trying to make sure I correctly understand how my DML will work. The basic use case is that I am developing a job management service that triggers jobs implemented by other microservices. It triggers these jobs by sending them a message via a message broker. There are multiple instances of a centralized job management service, each with a timer that fires every second. During each 1-second run, a transaction is started on the Java side using Spring's @Transactional annotation. The Java app then calls a stored procedure (using a stored proc is a security requirement at our company; we cannot use SQL directly from our Java code) that looks for all jobs that are in a WAITING state and whose trigger time has arrived. The stored procedure flips the state from WAITING to RUNNING and returns a resultset with the jobs that are ready to run. The Java side then creates messages from the resultset and pushes them to the broker. If pushing messages fails, the transaction is rolled back, which puts the affected jobs back in the WAITING state.
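For what it's worth, here is the claim-then-publish-or-rollback flow sketched end to end. This is a minimal Python illustration against SQLite (a stand-in, not Oracle: SQLite >= 3.35 supports UPDATE ... RETURNING, which plays the role of Oracle's RETURNING clause here); the table shape and the failing publish function are made up for the example, not our real schema:

```python
import sqlite3
import time

# Stand-in for the jobs table; SQLite >= 3.35 supports UPDATE ... RETURNING.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table jobs (job_id integer primary key, state text, trigger_time real)")
now = time.time()
conn.executemany("insert into jobs values (?, ?, ?)",
                 [(1, 'WAITING', now - 5),    # ready
                  (2, 'WAITING', now - 1),    # ready
                  (3, 'WAITING', now + 60)])  # not due yet

def publish(job_id):
    # Simulate the message broker being down.
    raise RuntimeError("broker down")

conn.execute("begin")
# Atomically claim every due WAITING job and collect its id.
claimed = sorted(r[0] for r in conn.execute(
    "update jobs set state = 'RUNNING' "
    "where state = 'WAITING' and trigger_time <= ? "
    "returning job_id", (now,)))
try:
    for job_id in claimed:
        publish(job_id)
    conn.commit()
except RuntimeError:
    conn.rollback()  # failed push: claimed jobs revert to WAITING

states = dict(conn.execute("select job_id, state from jobs"))
```

After the rollback, every row is back in WAITING, so the next 1-second tick would pick jobs 1 and 2 up again.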
I have evaluated a couple of options in the stored proc for getting the jobs that are ready to run and updating them to the RUNNING state. My leaning is:
TYPE pk_list IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
pk pk_list;
...
UPDATE jobs
SET state = 'RUNNING'
WHERE state = 'WAITING'
AND SYSTIMESTAMP >= trigger_time
RETURNING job_id BULK COLLECT INTO pk;
...
-- Use pk to SELECT * FROM jobs returning as SYS_REFCURSOR
Since there will be multiple (about 6) instances of the job management service each calling this stored proc, I'm concerned about race conditions and two instances updating the same rows. The update itself is not a big issue, because they would be updated to the same state. The bigger issue is returning those jobs and double-triggering them.
Here is how I think this works. If I use READ_COMMITTED isolation, each instance will get a consistent view of the data at the start of each statement (the UPDATE in this case). The first instance to get there will put an exclusive lock on the rows matched by the WHERE clause. If a second instance were to start its query, it would see a view of the data from before the first instance committed. It would be blocked by the lock and unable to update the rows, but once the rows were unlocked, it would perform the update and return the same set of rows in its bulk collect.
If I use SERIALIZABLE isolation, each instance will get a consistent view of the data when starting the transaction. The same locking and blocking would happen, but when the second instance tries to perform the update, I should see an "ORA-08177: can't serialize access for this transaction" error raised. This could basically be ignored, since I know the jobs were just found by a different instance.
I also considered using a SELECT FOR UPDATE followed by an UPDATE instead of the UPDATE ... RETURNING BULK COLLECT, but I think that would give me the same results. It might also have a larger chance of collision, since the SELECT and UPDATE are split across two statements.
I'd prefer to use the simpler READ_COMMITTED isolation if I can be guaranteed that collisions won't actually cause an issue. Otherwise, I'm happy to move to SERIALIZABLE isolation.
I'm also open to other options that meet the basic requirement: only one instance should be allowed to see the data as WAITING and ready.
Updates always get rows in current mode, i.e. as they exist right now. So when one session commits, any update in another session that was waiting on its locks will see the change.
For example, run this code in one session:
create table t (
c1 int, c2 varchar2(10)
);
insert into t values ( 1, 'WAITING' );
insert into t values ( 2, 'WAITING' );
insert into t values ( 3, 'RUNNING' );
commit;
declare
recs dbms_sql.number_table;
begin
update t
set c2 = 'RUNNING'
where c2 = 'WAITING'
returning c1 bulk collect into recs;
dbms_output.put_line ( 'Updated ' || recs.count );
for i in 1 .. recs.count loop
dbms_output.put_line ( recs(i) );
end loop;
end;
/
Updated 2
1
2

If you start a second session at this point and run the anonymous block, it'll wait for the first to commit or roll back. When you do, you'll see this output in the second session:
Updated 0
This is because the update in the second session gets the rows as they are after the first committed; this means it sees no rows with the value WAITING.
You could make the transactions serializable and ignore the ORA-08177 errors, but there's no need to.
You can use SELECT ... FOR UPDATE too if you want. To avoid read consistency issues with this, bulk collect the query output and use this array to drive the update:
declare
recs dbms_sql.number_table;
begin
select c1
bulk collect into recs
from t
where c2 = 'WAITING'
for update;
forall i in 1 .. recs.count
update t
set c2 = 'RUNNING'
where c1 = recs (i);
dbms_output.put_line ( 'Updated ' || recs.count );
for i in 1 .. recs.count loop
dbms_output.put_line ( recs(i) );
end loop;
end;
/

There are a few reasons you may prefer this over a plain update:
The nowait | wait N | skip locked clause
If one update takes a long time to commit for some reason, sessions can start piling up quickly. If you're unlucky this may make your app unusable. These options allow the transaction to stop processing, instead of waiting (potentially indefinitely).
The transaction does (lots of) other work before the update
Select for update also works in current mode, seeing the data as it is at the time it's able to lock the rows. If the query returns no rows, you may be able to stop the transaction immediately, avoiding wasted work because the update will change nothing.
You have other DML in the transaction that may lead to deadlocks
If the transaction runs many updates/deletes, you need to take care to avoid deadlocks. Doesn't sound like this is the case here, but it's worth bearing in mind for general transaction management.
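As a rough illustration of the nowait idea (once more in Python against SQLite as a stand-in; Oracle raises ORA-00054 for FOR UPDATE NOWAIT, while SQLite raises "database is locked" when its busy timeout is zero), the second session fails immediately instead of queueing behind the first:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t.db")
holder = sqlite3.connect(path, isolation_level=None)
holder.execute("create table t (c1 integer, c2 text)")
holder.execute("insert into t values (1, 'WAITING')")
holder.execute("begin immediate")  # first session takes the write lock...
holder.execute("update t set c2 = 'RUNNING' where c1 = 1")

# ...so a second session with a zero busy timeout fails fast
# instead of piling up behind it.
waiter = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    waiter.execute("begin immediate")
    failed_fast = False
except sqlite3.OperationalError:   # SQLite's analogue of ORA-00054
    failed_fast = True

holder.rollback()
```

With wait N you'd retry for a bounded time instead, and with skip locked you'd simply claim whichever rows are still unlocked.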