Uncompleted/Long Running Batch Jobs and Instance Crash


Question and Answer

Connor McDonald

Thanks for the question, Sandeep.

Asked: November 28, 2015 - 10:30 am UTC

Last updated: December 16, 2015 - 1:27 am UTC

Version: 11.2.0.4.0

Viewed 1000+ times

You Asked

Hi Tom

Could you please help,
1 > What is a batch job? Is it a job which runs without human intervention, at a specified time?
2 > If 1 is true, then is dbms_scheduler the only method of job scheduling provided by Oracle?
3 > What immediately happens to the batch job in case of an instance crash? If the instance crashed while the batch job was in the middle, and considering a batch commit is in place, the transactions after the last commit and before the next commit should be rolled back! Does anything else happen here?
4 > Given a scenario: the batch job runs from 00:01 am till 4:00 am,
the instance crashed at 02:00 am,
and I brought the instance up at 03:00 am.
In this scenario what would be the behaviour of the job and its transactions (if it is like [insert commit; update commit; insert commit;])?
A >> after 02:00 am? To me it is void: no instance = no database = no jobs.
B >> after 03:00 am? I am not able to imagine how the batch job should behave.
5 > How do I find out which transactions succeeded and which failed, considering the check happens after 09:00 am in the morning?

Thanks and Regards,
Sandeep

and Connor said...

A batch job is basically anything that runs in a typically "unattended" fashion, ie, there is no screen or terminal that a user is sitting at waiting for it to finish.

There are many ways of running them - via scheduler, via external script, via cron, via 3rd party scheduling products etc etc

In terms of transactions, batch jobs are exactly the same as anything else in Oracle - in an instance crash, everything up to the last commit is saved, and all changes that are not yet committed will be rolled back.

So the easiest way - don't commit until the end. (Then you just re-run your batch job.)

Or, commit at particular points and make sure you log sufficient information to make your jobs restartable.
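
As a rough sketch of that restartable approach - assuming a hypothetical BATCH_PROGRESS checkpoint table and a hypothetical SOURCE_ROWS table, used here only for illustration - it could look something like this:

create table batch_progress (
  job_name     varchar2(30) primary key,
  last_done_id number,
  updated_at   date
);

declare
  l_last_id number;
begin
  -- pick up where the previous (possibly crashed) run left off
  select nvl(max(last_done_id), 0)
    into l_last_id
    from batch_progress
   where job_name = 'MY_BATCH';

  for r in ( select id
               from source_rows
              where id > l_last_id
              order by id )
  loop
    -- ... do the real work for this row here ...

    -- record progress in the same transaction as the work,
    -- so a crash can never leave the two out of step
    merge into batch_progress p
    using ( select 'MY_BATCH' job_name from dual ) s
       on ( p.job_name = s.job_name )
    when matched then
      update set last_done_id = r.id, updated_at = sysdate
    when not matched then
      insert ( job_name, last_done_id, updated_at )
      values ( s.job_name, r.id, sysdate );

    commit;  -- commit point: everything up to here is durable
  end loop;
end;
/

The key point is that the progress row is updated in the same transaction as the work it describes, so after a crash the next run simply continues from the last recorded id.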


Comments

A reader, December 03, 2015 - 6:26 am UTC

Hi Connor,

Thanks for the response.

I totally agree with you. I would also like to know [maybe there is some document you could direct me to] how to find out which update statements were rolled back at the time of the instance crash.
Considering it is an Oracle job (dbms_scheduler), what will happen to the job status on an instance crash: killed or suspended? If killed, then OK; if not, will it try to resume once the database is up?

Regards,
Sandeep
Connor McDonald
December 03, 2015 - 7:30 am UTC

Here's an example - this job runs for one minute

SQL> BEGIN
  2    DBMS_SCHEDULER.create_job (
  3      job_name        => 'ONE_MINUTE_JOB',
  4      job_type        => 'PLSQL_BLOCK',
  5      job_action      => 'BEGIN dbms_lock.sleep(60); END;',
  6      start_date      => SYSTIMESTAMP,
  7      enabled         => TRUE);
  8  END;
  9  /

PL/SQL procedure successfully completed.



I then did a shutdown abort from another session. When the database is restarted, I can look at DBA_SCHEDULER_JOB_LOG, and I'll see

SQL> @pt "select * from DBA_SCHEDULER_JOB_LOG where job_name like 'ON%'"
LOG_ID                        : 21250
LOG_DATE                      : 03-DEC-15 03.07.44.348000 PM +08:00
OWNER                         : MCDONAC
JOB_NAME                      : ONE_MINUTE_JOB
JOB_SUBNAME                   :
JOB_CLASS                     :
OPERATION                     : RUN
STATUS                        : STOPPED
USER_NAME                     :
CLIENT_ID                     :
GLOBAL_UID                    :
CREDENTIAL_OWNER              :
CREDENTIAL_NAME               :
DESTINATION_OWNER             :
DESTINATION                   :
ADDITIONAL_INFO               : REASON="ORA-01014: ORACLE shutdown in progress"
-----------------


You might want to repeat that with a more brutal test (ie, server power off)
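
If you also want to see how the scheduler itself regards the job after the restart (rather than just what the log recorded), a query along these lines against DBA_SCHEDULER_JOBS shows the current state and whether it is due to run again. Note that a one-off job like the one above may have been auto-dropped by then, in which case no row comes back:

-- current scheduler view of the job after the instance restart
select job_name, state, enabled, next_run_date, run_count, failure_count
from   dba_scheduler_jobs
where  job_name = 'ONE_MINUTE_JOB';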



A reader, December 04, 2015 - 6:56 am UTC

Hi Connor,

Many Thanks for the lucid example.

Could you please also help with how to find out which update statements were rolled back.

Thanks and Regards,
Sandeep
Connor McDonald
December 05, 2015 - 4:11 pm UTC

You can't tell unless you have good instrumentation in your code, ie

logger('Commencing update 1');
update ...
logger('update 1 completed');

etc etc

where 'logger' writes to a table or file etc.

Good instrumentation is *everything* in any software
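
A minimal sketch of such a 'logger' - the APP_LOG table name is just an example - would be an autonomous-transaction procedure, so the log rows survive even if the business transaction itself is rolled back:

create table app_log (
  log_time timestamp default systimestamp,
  msg      varchar2(4000)
);

create or replace procedure logger( p_msg in varchar2 ) as
  -- autonomous transaction: the log row is committed independently,
  -- so it persists even if the calling transaction rolls back or the
  -- instance crashes after this point
  pragma autonomous_transaction;
begin
  insert into app_log ( msg ) values ( p_msg );
  commit;
end;
/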

A reader, December 06, 2015 - 2:37 am UTC

Hi Connor,

Thank You.
Please confirm if the below analysis is TRUE.
For
update tableA set status=0 where status=1;
the UNDO is
update tableA set status=1 where status=0;
and the REDO is
update tableA set status=0 where status=1;

The UNDO + REDO => goes to => the redo log buffer.
In case of a commit, the redo info becomes permanent and gets recorded in the online redo log files.

In case of a rollback, it is also enforcing a change back in the database, so it is also a kind of commit, so this information should also be present in the online redo log files, and LogMiner can be used.

The above assumption will only hold if Oracle uses the algorithm of recording all changes [change the original to something different + change back to the original from something different] on commit.

Thanks and Regards,
Sandeep
Connor McDonald
December 06, 2015 - 11:12 am UTC

"in case of commit the redo info becomes permanent and get recorded in on-line-redo-log-files"

No. Both committed and uncommitted changes are written to the redo logs. It's not like we buffer all the changes *until* a commit. We continuously write redo information from the redo buffer to disk.

A reader, December 08, 2015 - 12:51 pm UTC

Hi Connor,

I missed the point, sorry about that.
The conditions where the log buffer is flushed include:
A session issues a commit or a rollback command.
The log buffer becomes 1/3 full.
A timeout (every 3 seconds) occurs.
A checkpoint occurs.
So definitely the redo log files will have uncommitted transaction information as well.

In the case of an instance crash, the SGA is gone, and the changed buffers are gone as well.

So is it like this: during instance recovery, the online redo log files (as the SGA is gone), or if required the archive log files, are referred to in order to find out which transactions are committed, and DBWR makes those changes permanent; and for uncommitted transactions nothing is done, because the blocks in the datafiles would not have changed?
So what is the point in trying to undo something which has not yet been done - what exactly does 'roll back uncommitted transactions' do? Removing the uncommitted transactions from the online redo log files and archive log files?

Regards,
Sandeep

Connor McDonald
December 08, 2015 - 2:10 pm UTC

You may have written *uncommitted* changes to the datafiles. So for instance recovery, you need to undo those.

A reader, December 09, 2015 - 1:34 am UTC

Hi Connor,

So is this how things work?

When a log switch happens it triggers a checkpoint, or maybe DBWn gets fired because of a space crunch in the database buffer cache,
and the uncommitted data present in the buffer cache [dirty buffers not yet committed] also gets written into the datafiles.
And the rollback process undoes this uncommitted information from within the datafiles.

The information needed to do the undo is fetched from the redo log [as for every undo image there is also a corresponding redo entry present].

Please confirm.

Regards,
Sandeep
Connor McDonald
December 09, 2015 - 9:08 am UTC

That's a reasonable summation.

A reader, December 12, 2015 - 1:41 am UTC

Hi Connor,

Thank You.

Is the 'uncommitted undo' information removed from the redo log files after a successful rollback, and is that the reason why rolled-back transactions can't be seen even through LogMiner?

If it is not removed, but just marked as rolled back, then ideally it should be visible?

Regards,
Sandeep
Connor McDonald
December 14, 2015 - 4:08 am UTC

Nothing is ever "removed" from a redo log.

What LogMiner decides to expose through the API is entirely up to the developers who look after LogMiner, but I think it makes sense to display committed transactions.
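
For what it's worth, when you do mine the redo with LogMiner you can ask it explicitly for committed work only. A rough sketch (the log file path is just a placeholder, and TABLEA is the example table from earlier in this thread):

begin
  dbms_logmnr.add_logfile(
    logfilename => '/u01/app/oracle/oradata/DB/redo01.log',   -- placeholder path
    options     => dbms_logmnr.new );

  dbms_logmnr.start_logmnr(
    options => dbms_logmnr.dict_from_online_catalog +
               dbms_logmnr.committed_data_only );             -- skip uncommitted / rolled-back work
end;
/

select timestamp, operation, sql_redo
from   v$logmnr_contents
where  seg_name = 'TABLEA';

exec dbms_logmnr.end_logmnr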

A reader, December 15, 2015 - 11:29 pm UTC

Hi Connor,

Thank You.

I am sorry, but when I look at it from a customer's point of view, it makes sense to me to find out which activities were not done. Suppose out of 100 things 98 are done; finding only the two things which are not done is easier than listing all 98 things that are done and concluding which two are missing because you can't see them in the done-list.

I wonder how I can convince the upset customer.

Regards,
Sandeep

Connor McDonald
December 16, 2015 - 1:27 am UTC

Redo and undo are there for us to recover an instance, and to provide rollback and read consistency mechanisms.

They are not there to supplement application logic.

As with any application, you should have a thorough level of instrumentation throughout your code. If you need to use LogMiner (or equivalent) to work out the state of your data after a crash...then that's an application that needs some work.

http://asktom.oracle.com/Misc/instrumentation.html
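
And with that kind of instrumentation in place, the "which 2 of 100 are not done" question becomes a simple query. A sketch, assuming a hypothetical BATCH_STEP_LOG table where each step writes a 'STARTED' row and a 'DONE' row via the logger approach above:

-- steps that started today but never logged a 'DONE' row
select step_name
from   batch_step_log
where  log_time >= trunc(sysdate)
group  by step_name
having max(case when status = 'DONE' then 1 else 0 end) = 0;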
