Difference between checkpoint, incremental checkpoint & rolling checkpoint
Tom Kyte

Thanks for the question, John.

Asked: June 06, 2004 - 10:15 pm UTC

Last updated: August 01, 2012 - 6:48 am UTC

Version: 9.2.0.5.


You Asked

Tom,

What is the difference between Checkpoint, incremental checkpoint & rolling checkpoints?

I've read that an incremental checkpoint occurs every 3 seconds, as opposed to a checkpoint, which occurs based on fast_start_mttr_target, log switches, instance shutdown, alter tablespace ... begin backup, and alter system checkpoint. If this is true, then a checkpoint will always occur every three seconds and will overshadow the "regular" checkpoints caused by fast_start_mttr_target, log switches, etc., since three seconds will come around more frequently. So what is the point of having a checkpoint after log switches, fast_start_mttr_target, etc.? Or is the checkpoint that occurs every 3 seconds not the checkpoint that is used to mark where instance recovery will roll forward and roll back during an instance recovery?

What about a rolling checkpoint? Did some author just make this term up, or is it a special type of checkpoint? If it is, what is its purpose?

I've been searching for the answer to this question and asking metalink for a while, but I haven't found the answer nor has anybody been able to answer this question.





and Tom said...

they are all checkpoints of sorts. the goal of a checkpoint is to get dirty buffers from the SGA onto disk safely.

incremental checkpoints are looked for when dbwr times out, which is every 3 seconds. They may or may not be performed every three seconds.


fast_start_mttr_target doesn't fire checkpoints -- it uses incremental checkpoints to incrementally keep the buffer cache "cleaner". the fewer dirty blocks in the cache, the less time it takes to recover.

But basically -- think about it like this -- checkpointing, the act of cleaning out the buffer cache of dirty blocks in order to make recovery faster -- is something that in most systems is happening almost constantly. There are still major 'checkpoint' events -- such as "you cannot reuse a redo log file until the checkpoint it fired off has completed", and these will 'pause' the database (eg: if we try to advance into a log file whose checkpoint has not completed -- you get the infamous "checkpoint not complete, cannot allocate new log" message -- database pauses while dbwr writes like mad). The other checkpoint events are lower priority - more like a "hey, if you have time, why don't you do this so as to keep up better over time". They will not pause the database.


But checkpoints do not occur every N units of time, dbwr (like many background processes -- smon, pmon and so on) time out every N units of time (3 seconds is typical) and look to see if they need to do anything. If so, they do it, else they go back to sleep.


a rolling checkpoint is a made up term as far as I can tell. I'm not familiar with it.
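One way to watch this machinery, on 9i and later, is V$INSTANCE_RECOVERY, which exposes the incremental checkpoint targets; a minimal sketch (run as a user with access to the V$ views):

```sql
-- ACTUAL_REDO_BLKS vs TARGET_REDO_BLKS shows how far the incremental
-- checkpoint is lagging behind its target; TARGET_MTTR/ESTIMATED_MTTR
-- relate to fast_start_mttr_target.
select actual_redo_blks,
       target_redo_blks,
       target_mttr,
       estimated_mttr,
       ckpt_block_writes
  from v$instance_recovery;
```

Sampling this every few seconds during a busy period shows the checkpoint position advancing even though no "checkpoint" message ever appears in the alert log.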


Comments

I can see clearer now

john cantu, June 25, 2004 - 9:51 pm UTC

Tom,

Thank you for clearing things up for me! However, as the fog cleared, I can now clearly see another obstacle. According to the Oracle documentation, instance recovery will only roll forward and roll back from the time of the last checkpoint. Are they referring to "major" checkpoints and/or incremental checkpoints?

Also, does an incremental checkpoint involve about the same series of steps as the "major 'checkpoint' events"? According to Oracle support, they do the same thing but are triggered by different events. I just want a second confirmation since I can't test this out for myself.

Tom Kyte
June 26, 2004 - 6:37 pm UTC

last completed checkpoint - incremental or whatever (they are all 'checkpoints', with the goal of getting blocks onto disk so that no recovery would be needed to be applied to them). It asks the question "hey, how far back into the logs do I have to go and start applying from -- knowing the further back, the longer we take"

the goal of incremental checkpoints is to advance the checkpoint marker so that less redo would have to be read and applied.

All a checkpoint does is write blocks out and advance a marker. It looks and says "hmm, what blocks should I write out so I can advance this marker so that we have less redo to process". A "major checkpoint" is trying really hard to advance the marker into the next redo log, an "incremental" is just trying to advance the marker far enough so that only X blocks of redo would have to be applied during recovery. Same jobs, different triggers.
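That marker advancing can be observed from SQL; a sketch comparing the checkpoint SCN recorded in the controlfile with the datafile headers (assuming access to the V$ views):

```sql
-- The checkpoint SCN recorded in the controlfile...
select checkpoint_change# from v$database;

-- ...versus what each datafile header says was last fully checkpointed.
-- Immediately after "alter system checkpoint" these should agree.
select file#, checkpoint_change#, checkpoint_time
  from v$datafile_header;
```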

Thanks for your help.

John Cantu, June 30, 2004 - 9:31 am UTC

Tom,

Understood. Thanks once again for clearing things up.



DBWR Checkpoint

Dilip Patel, July 20, 2004 - 4:34 am UTC

Tom,

Thanks for the book "Effective Oracle by Design" and of course for this site.

I have sized redo log file to 50 MB, and set following for checkpoints.

log_checkpoint_interval 25600
log_checkpoint_timeout 0
log_checkpoints_to_alert TRUE

So a checkpoint should happen 4 times before each redo log switch.

But I am observing the events in the alert.log file as following.
Mon Jul 19 06:05:58 2004
Beginning log switch checkpoint up to RBA [0x248e2.2.10], SCN: 0x0000.4feed4ed
Thread 1 advanced to log sequence 149730
Current log# 3 seq# 149730 mem# 0: /u05/attens/redo03.log
Mon Jul 19 06:05:58 2004
ARC0: Beginning to archive log# 2 seq# 149729
ARC0: Completed archiving log# 2 seq# 149729
Mon Jul 19 06:33:46 2004
Completed checkpoint up to RBA [0x248e2.2.10], SCN: 0x0000.4feed4ed
Mon Jul 19 07:59:10 2004
Beginning log switch checkpoint up to RBA [0x248e3.2.10], SCN: 0x0000.4fef7cf8
Thread 1 advanced to log sequence 149731
Current log# 4 seq# 149731 mem# 0: /u05/attens/redo04.log
Mon Jul 19 07:59:10 2004
ARC0: Beginning to archive log# 3 seq# 149730
ARC0: Completed archiving log# 3 seq# 149730

And during the same time my statspack report shows following activity.

The statspack report is between 6:05 to 7:50. (I know it's a longer time for snaps, but I am monitoring the batch activity)

Instance Activity Stats for DB: ATTENS Instance: attens Snaps: 403 -405

Statistic Total per Second per Trans
--------------------------------- ---------------- ------------ ------------
CPU used by this session 719,580 114.4 36.2
CPU used when call started 719,580 114.4 36.2
CR blocks created 11 0.0 0.0
DBWR buffers scanned 814,895 129.6 41.0
DBWR checkpoint buffers written 2,824 0.5 0.1
DBWR checkpoints 23 0.0 0.0
DBWR free buffers found 812,030 129.1 40.9
DBWR lru scans 1,475 0.2 0.1
DBWR make free requests 1,782 0.3 0.1
DBWR summed scan depth 814,895 129.6 41.0
DBWR transaction table writes 76 0.0 0.0
DBWR undo block writes 2,263 0.4 0.1
SQL*Net roundtrips to/from client 415,051 66.0 20.9
SQL*Net roundtrips to/from dblink 0 0.0 0.0
background checkpoints completed 1 0.0 0.0
background checkpoints started 1 0.0 0.0

I have following queries.

1. DBWR checkpoints shows 23, while "background checkpoints completed" and "background checkpoints started" show only 1
(there is no activity like dropping objects)

What can be the reason?



2. Do all incremental checkpoints get logged in the alert.log file or not?

3. What is the event/view I can trace for incremental checkpoints?




Tom Kyte
July 20, 2004 - 9:08 am UTC

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:344617462392

they do incremental checkpoints. and don't forget -- it is based on redo blocks -- which is tied to the OS -- not oracle -- blocksize.

1) i only see one background checkpoint in your alert log between those times? why would it be any different? incremental checkpoints are not full blown checkpoints -- they are the tail chasing the head.

2) no

3) they just happen - there is no statistic directly linked to that. dbwr checkpoints happen for lots of reasons. in this case, they were probably linked to the incrementals -- since you say you dropped no objects (but PQ and other things can fire them as well)
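Rather than waiting for a statspack snapshot, these counters can be sampled directly; a sketch:

```sql
-- Cumulative since instance startup; sample twice and subtract to get
-- the activity for an interval (which is exactly what statspack does).
select name, value
  from v$sysstat
 where name in ('DBWR checkpoints',
                'DBWR checkpoint buffers written',
                'background checkpoints started',
                'background checkpoints completed');
```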

checkpoint incomplete

A reader, July 21, 2004 - 4:23 am UTC

Hi

When we see the checkpoint incomplete message in the alert log, we know that dbwr is not writing fast enough.

Will adding more redo log groups help this issue? If so, how does it help? I understand that during a checkpoint lgwr will always have to wait for dbwr to finish writing dirty buffers before advancing to the next redo group, so the number of redo groups doesn't matter here, correct?

Also, lgwr waits for dbwr when there is a checkpoint; does this apply only to background checkpoints (log switches) or to incremental checkpoints too? I don't think it applies to incremental checkpoints, because if that were the case then the workaround metalink offers would not work, except to increase the number of dbwr processes.

Tom Kyte
July 21, 2004 - 8:12 am UTC

lgwr cannot reuse a logfile before dbwr has flushed the blocks protected by that redo log.


many times adding more redo logs "solves" this issue. If you have a system that processes in "spurts" -- a spurt of activity could flood dbwr. By having additional log files for lgwr to use, dbwr can "catch up".

lgwr does not "always have to wait for dbwr to finish writing before advancing to the next group". lgwr only waits for dbwr to finish writing if the group it is advancing into has not yet completed its checkpoint.

So, if you just started up, and have 6 groups -- we know that lgwr won't wait for dbwr until it has used AT LEAST 6 logfiles up -- since it can advance into 2, 3, 4, 5, 6 safely -- they are not protecting any blocks in the cache. It is only when we go back to 1 that we have to ask "dbwr, did you finish the checkpoint for logfile 1? if not, wait, if so, advance"



It is only background checkpoints -- not incrementals -- that would/could cause lgwr to have to wait.
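Whether lgwr is getting close to such a wait can be read from V$LOG; a sketch:

```sql
-- STATUS = 'ACTIVE' means the group is no longer current but its
-- checkpoint has not completed, so lgwr cannot reuse it yet.  If all
-- non-CURRENT groups are ACTIVE at switch time, you get
-- "checkpoint not complete" and the pause described above.
select group#, sequence#, status, archived
  from v$log
 order by sequence#;
```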

fast_start_mttr_target

reader, September 04, 2004 - 12:51 pm UTC

In my 10g database,

SQL> show parameter fast

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_io_target                 integer     0
fast_start_mttr_target               integer     0
fast_start_parallel_rollback         string      LOW

Does it mean that incremental checkpointing is disabled? I read in the online doc that in 10g oracle performs automatic tuning of checkpoints. If fast_start_mttr_target is set to zero by default, does oracle not do checkpoint tuning by default? Thanks.

Tom Kyte
September 04, 2004 - 1:07 pm UTC

the fast start stuff does the incremental checkpointing. If you do not have it on, it is not done.

Checkpoint Queue Latch

Vivek Sharma, September 22, 2004 - 8:06 am UTC

Dear Tom,

I have an Oracle 8174 Database which is basically a Datawarehouse DB.

I can see that many times the background processes are waiting on latch free wait events, the latch name being "Checkpoint Queue Latch". Metalink does not give any explanation about this latch, but in one of the forums an Oracle Support executive says that this could be because log_checkpoint_interval, or any such parameter which causes incremental checkpoints, is set to a very low value. I feel it is the other way round.

I need your explanation for better understanding. What is this latch for and how do I eliminate the contention on this latch.


Tom Kyte
September 22, 2004 - 8:25 am UTC

why do you care that a background is waiting on this?

what background is waiting on it and are your end users actually waiting on the background?

Speed up DBWR dirty writes

DBA, January 22, 2005 - 1:10 pm UTC

We experienced a specific scenario...

Oracle9i 9.2.0.4 (64 bit)
Solaris 9
Data Guard - Physical standby using LGWR async
A pl/sql procedure was performing a massive update in addition to normal db activity (OLTP db).
NO DG errors

log_checkpoint_interval integer 0
log_checkpoint_timeout integer 1800
log_checkpoints_to_alert boolean TRUE
fast_start_mttr_target integer 60
db_cache_size big integer 4194304000
db_writer_processes integer 2

From the alert logfile:

Tue Jan 18 12:53:50 2005 --> started checkpoint
Thread 1 cannot allocate new log, sequence 16046
Checkpoint not complete
Current log# 21 seq# 16045 mem# 0: /u01/oradata/tst/redo09a.log
Current log# 21 seq# 16045 mem# 1: /u07/oradata/tst/redo09b.log
Tue Jan 18 12:56:01 2005 --> completed checkpoint
Completed checkpoint up to RBA [0x3ea3.2.10], SCN: 0x0000.40c7900b
Tue Jan 18 12:56:01 2005
LGWR: Completed archiving log 21 thread 1 sequence 16045
Creating archive destination LOG_ARCHIVE_DEST_2: 'TSTSTBY'
LGWR: Beginning to archive log 1 thread 1 sequence 16046
Beginning log switch checkpoint up to RBA [0x3eae.2.10], SCN: 0x0000.40c90c03
Thread 1 advanced to log sequence 16046
Current log# 1 seq# 16046 mem# 0: /u01/oradata/tst/redo01a.log
Current log# 1 seq# 16046 mem# 1: /u07/oradata/tst/redo01b.log
Tue Jan 18 12:56:02 2005
ARC1: Evaluating archive log 21 thread 1 sequence 16045
ARC1: Archive destination LOG_ARCHIVE_DEST_2: Previously completed
ARC1: Beginning to archive log 21 thread 1 sequence 16045
Creating archive destination LOG_ARCHIVE_DEST_1: '/u01/app/oracle/admin/tst/arch/arch16045.arc'
ARC1: Completed archiving log 21 thread 1 sequence 16045
Tue Jan 18 12:56:47 2005
Thread 1 cannot allocate new log, sequence 16047
Checkpoint not complete
Current log# 1 seq# 16046 mem# 0: /u03/oradata/tst/redo01a.log
Current log# 1 seq# 16046 mem# 1: /u07/oradata/tst/redo01b.log
Tue Jan 18 12:59:08 2005
Completed checkpoint up to RBA [0x3ea4.2.10], SCN: 0x0000.40c7aab4

Observations:
1. ARCn process was not slow in archiving (v$archive showed no files required archiving, also checked the timestamps of the archived files).
2. We were switching 100MB logs every 2 minutes

Question:
To prevent the "Checkpoint not complete" - db stalled for nearly 2 minutes, I plan to do the following
1. Change the fast_start_mttr_target value from 60 to 30 to force DBWR to write dirty blocks to disk more often
2. Increasing the DBWR process from 2 to 4
3. Increasing the size of the online redo logfile from 100MB to 500MB (to prevent frequent log switch and checkpoints)

I don't see any reason to increase the number of online redo log file groups, since online redo logfiles were available and it appears the problem was due to a slow DBWR process. Am I correct?

Do you agree this problem is due to a slow DBWR? Any suggestions to fix this?

Thanks very much for your time.


Tom Kyte
January 22, 2005 - 4:25 pm UTC

1) ok
2) depends on whether you are using async io or not, if not, that would be my first approach
3) or -- add more 100meg log files to keep the checkpoints frequent.... else, you may just have pushed the wall back 5 feet -- only to give yourself that much more running room to build up speed before you smack into it really hard...
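The "more 100meg log files" variant would look like this; a sketch with hypothetical file names and group numbers -- adjust for your own layout:

```sql
-- More groups of the same size give dbwr more time to finish each
-- checkpoint before lgwr wraps back around to a log that is still
-- protecting unwritten blocks.
alter database add logfile group 5
  ('/u01/oradata/tst/redo05a.log', '/u07/oradata/tst/redo05b.log')
  size 100m;

alter database add logfile group 6
  ('/u01/oradata/tst/redo06a.log', '/u07/oradata/tst/redo06b.log')
  size 100m;
```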

Speed up DBWR dirty writes

DBA, January 23, 2005 - 2:17 pm UTC

Hi Tom,

Thanks for the info. I still would like to understand the behaviour of DBWR in this specific scenario. Here is some addl. info reg. the "checkpoint not complete problem".
OS supports async i/o - Solaris 2.9
disk_asynch_io = TRUE
log_buffer = 1572864
log_checkpoint_interval=0
log_checkpoint_timeout=1800
log_checkpoints_to_alert=TRUE
fast_start_mttr_target=60
db_cache_size=4194304000
db_writer_processes=2

After looking closely at the alert logfile, the checkpoints took an average of ~ 15 minutes. From the alert logfile

RBA          SCN              CKP_STIME  CKP_ETIME  CKP_TIME (min)
0x3ea3.2.10  0x0000.40c7900b  12:43:30   12:56:01   ~ 12
0x3ea4.2.10  0x0000.40c7aab4  12:44:20   12:59:08   ~ 15
0x3ea5.2.10  0x0000.40c7c7a3  12:45:06   13:02:40   ~ 18
0x3ea7.2.10  0x0000.40c7fd83  12:46:33   13:04:37   ~ 18

It appears that the DBWR checkpoint did not complete (from your site, quote: "the interim checkpoints *did not get ignored*, they just did not complete. big difference.")

Questions:
1. Could you confirm the following
LGWR flushes blocks from the log_buffer to the online redo logfiles based on the following events
- every 3 seconds
- log_buffer is 1/3 full
- when log_buffer is 1 MB full - does this apply in 9i/10G also?
- commits (via piggyback commits)

Anything else?

2. DBWR writes dirty buffers from the db_cache to disk based on the following events
- every 3 seconds
- check point at a log switch (complete/full checkpoint)
- incremental checkpoint (based on fast_start_mttr_target value)
- incremental checkpoint (based on the log_checkpoint_timeout interval value)
- when a shadow process scans more than 1/4 of db_cache (as per Metalink Note ID: 147468.1)

Anything else?


3) Oracle switched and archived successfully (from v$archived_log) 10 online redo logfiles before the first "checkpoint not complete" error was reported. You indicate in one of your answers the following:
" Now, given how fast the archive went -- and how long it took to catch up -- I'd be looking at "why" - perhaps you are not checkpointing frequently enough (too many dirty blocks) or dbwr is not "tuned" -- the disks are really slow?"
a) What could cause DBWR to take ~ 15 minutes to complete a checkpoint? Even if I have 3/4th of the buffer_cache dirty it shouldn't take 15 minutes. I observed a ton of buffer busy waits and creation of about 150 _UNDO segments during this period. sar showed 10-15% of wait i/o which is typical on our server.
b) even if I change the fast_start_mttr_target from 60 to 30, its not going to help nor would setting the log_checkpoint_timeout=60 (every minute). In our case the checkpoints JUST DON'T COMPLETE quickly enough.

Please help me understand this better and if possible provide your suggestions.

Can't thank you enough.




Tom Kyte
January 23, 2005 - 3:08 pm UTC

1) yes -- and before dbwr is allowed to checkpoint a block too -- the redo for that block has to be on disk.

2) every 3 seconds it wakes up if it was sleeping to see if something is there that needs be done. It does not necessarily *do* anything every three seconds. If none of the triggers to flush have happened -- it won't

it would put blocks to disk in the event the buffer cache is getting full too -- to make room for new blocks.


3) arch and lgwr have the advantage of sequential io -- they do not have to seek all over the place to put data back down on disk (that is why we have redo, we could just write blocks out when you commit after all). dbwr has to put blocks back where they go which is here, there, everywhere. lots of random seeks and writes.

dbwr checkpointing 15 minutes worth of work isn't just writing (say) 5 meg out contiguously, it is writing 5 meg out in 8k pieces all over the place -- seek, write, seek, write, seek, write...


You were flooding the buffer cache at that point; dbwr was apparently treading water here -- trying to keep afloat. no one was really waiting on IO seriously here, async io => not waiting -- fire it off and the OS will let us know when it completes.




Speed up DBWR dirty writes

DBA, January 23, 2005 - 6:41 pm UTC

Tom,

1) Which Oracle background process writes the "Beginning checkpoint" and "checkpoint completed" info to the alert logfile? DBWR or CKPT? Just curious, since CKPT is the one that updates the file headers.

2) Is there a way to determine the number of dirty blocks in the db_cache (SGA) at a given time?

3) How can I verify if an incremental checkpoint via fast_start_mttr_target was actually triggered? I understand that it MAY not complete due to DBWR async i/o.

4) A checkpoint due to log_checkpoint_timeout can be verified since Oracle writes it to the alert logfile? Correct?

Thank You




Buffer busy wait

Nilanjan Ray, January 24, 2005 - 2:37 am UTC

In response to the previous question:

buffer busy waits could be independent of dbwr. it seems to me that heavy delete or update statement(s) may be hitting too many hot blocks. might be having cache buffers chains latch waits too. Unselective indexes or block-level contention for data blocks(?)

what does v$waitstat reveal ?

Regards

Tom Kyte
January 24, 2005 - 8:28 am UTC

they had the message "checkpoint not complete" and the database paused while dbwr flushed blocks.

dbwr & buffer busy waits

Nilanjan Ray, January 27, 2005 - 2:26 am UTC

Will it be appropriate to assume that the buffer cache is small compared to the workload?

Tom Kyte
January 27, 2005 - 8:18 am UTC

no. buffer cache could be infinite in size here.


When we try to advance into a redo log file, we must make sure that all of the blocks protected by the redo in that log file are safely checkpointed onto disk -- if they are not we must WAIT.

In fact, a really large buffer cache could even be part of the cause of this -- too many blocks in the cache to be flushed in too short a period of time (we saved up lots of changes and then had to hurry up and write them out -- a smaller cache might have forced dbwr to clean them out sooner)

dbwr & buffer busy waits

Nilanjan Ray, January 28, 2005 - 3:44 am UTC

Thanks Tom,

Should have got it earlier: "a large buffer cache could be a problem too".

It seems that we have arrived at a point to review the IO subsystem in the OS. The original poster made no mention about disks & IO. Could raid 5 (if at all) be kicking in ?

Regards


Tom Kyte
January 28, 2005 - 7:25 am UTC

anything that makes IO slow would be kicking in there.

Speed up DBWR dirty writes

DBA, January 29, 2005 - 1:32 pm UTC

Tom/Nilanjan,

Update: Tom is correct here (the statspack report also confirms this).

The checkpoint incomplete problem in our system was due to the following

- 4 GB buffer
- slow disks (write complete waits were very high (4k) during the sudden burst of DML activity - on a normal day its less than 100).
- the datafile was on one disk (RAID 1)- less bandwidth for DBWR to write. This disk was being pounded by other processes (from v$filestat).

Tom,

If you have the time could you answer the following

1) Which Oracle background process writes the "Beginning checkpoint" and "checkpoint completed" info to the alert logfile? DBWR or CKPT? Just curious, since CKPT is the one that updates the file headers/control files.

2) Is there a way to determine the number of dirty blocks in the db_cache (SGA) at a given time?

3) How can I verify if an incremental checkpoint via fast_start_mttr_target was actually triggered? I understand that it MAY not complete due to DBWR async i/o.

4) How can I determine the exact time (duration) for a checkpoint to complete from the data dictionary or internal tables (not from the alert logfile)? v$datafile_header has the completion time only.

Thanks for your time.

Regards



Tom Kyte
January 30, 2005 - 9:17 am UTC

1) not sure, not really 'relevant' though, it could provide us no useful information. Think of the oracle instance as just that -- the instance.

2) v$bh

3) triggered -- you can just assume it was, that is what the code does. If you tell us to limit recovery, it'll compute how many dirty blocks we should have and start flushing them out when we exceed that. dbwr wakes up every three seconds to check for that condition. condition encountered and it starts flushing.

4) alert log is the only way I know/have done.
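For 2), the V$BH query might look like the following; a sketch (V$BH has one row per buffer, so this counts dirty buffers -- note it can be expensive to query against a very large cache):

```sql
-- DIRTY = 'Y' marks buffers modified since they were last written.
select count(*) as dirty_buffers
  from v$bh
 where dirty = 'Y';
```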

Incremental checkpoints

A reader, February 14, 2005 - 9:12 am UTC

Tom,
Where can I find incremental checkpoints in documentation. I looked in Performance Tuning and Administrator's Guide but cannot find the term incremental checkpoints.
Thanks


Checkpoint

Nadeesh, February 25, 2005 - 2:01 am UTC

Tom,

A small query.

If an incremental checkpoint does not update the datafile headers, then in the case of an inconsistent shutdown, e.g. "shutdown abort", does oracle apply redo entries from the beginning of the current redo log, since the datafile headers contain information from the full checkpoint after the log switch?

Tom Kyte
February 25, 2005 - 4:58 pm UTC

it records what it needs, it is there to minimize the redo we need to apply. It knows how much redo it needs to catch things up.

Full Information about Check Point

Santosh Hiremath, April 29, 2005 - 8:56 am UTC

It has resolved all my doubts about the types of checkpoints and how they work.

Tom,

It would be very helpful if you could answer this problem.
I have one more query on checkpoints: how is this RBA generated, and what is the purpose of having a checkpoint queue?

Correct me please,
because I am not finding full information about how exactly the instance recovery will be done. As far as I know, whenever a full checkpoint happens, the info about the RBA is written to the control file and datafile
headers, so no recovery is needed for those blocks. Recovery has to be done only for dirty buffers not written to disk, whose RBA is recorded in the control file.

But how exactly will these RBAs and SCNs help in rebuilding the instance, i.e. the rolling forward and rolling backward phases?

Thanks and Regards
Santosh from Bangalore.




Tom Kyte
April 29, 2005 - 9:32 am UTC

I like thinking conceptually about these. I find it easier to understand.

but the RBA is just:

RBA, Redo Byte Address, "log seq no:block:offset"

and think of the SCN as a "clock", it ticks upon commit.

So the rba isn't really "generated" so much as "noted".
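With that layout in mind, the hex RBAs printed in the alert log excerpts earlier on this page can be decoded by hand; a sketch:

```sql
-- [0x248e2.2.10] from the alert log above decodes as:
--   log sequence 0x248E2 = 149730 (matching "advanced to log
--   sequence 149730"), block 2, byte offset 0x10 = 16.
select to_number('248E2', 'XXXXX') as log_sequence,
       to_number('2', 'X')         as block_no,
       to_number('10', 'XX')       as byte_offset
  from dual;
```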


Thanks

Santosh, May 02, 2005 - 8:09 am UTC

Hi Tom,
Santosh from Bangalore-India

Thanks your followup.

It has cleared the clouds on this matter. Now I can think more deeply on this one.

Thanks and Regards,
Santosh

checkpoint

ARU, March 08, 2006 - 10:36 pm UTC

Hi Tom,

A quick question about checkpoint if I may.
When you say that the incremental checkpoint will keep dirty blocks in the buffer to a manageable level, what does it exactly mean?
Does it actually trigger DBWR to flush the dirty buffers to disk? If not, how does it help?
Regards,
ARU.


Tom Kyte
March 09, 2006 - 1:02 pm UTC

yes, it does. an incremental checkpoint is designed to limit the number of dirty blocks - it does this by checkpointing them.

need some clarification

shams, March 30, 2006 - 10:55 pm UTC

if an incremental checkpoint causes dbwr to flush dirty buffers to disk, then why is the datafile header not updated?

Looking for explanation for RBA and SCN

Michael Schmitt, May 28, 2006 - 10:52 am UTC


Archive process performance

Roger, May 28, 2006 - 10:56 pm UTC

Hi Tom

In the alertlog, i got the following error:

Thread 1 cannot allocate new log, sequence 416
All online logs needed archiving

I only have 2 logfile groups. I know I can address this by adding new groups, but is there any other way, like setting up a second arc process? I did so, but it seems there is no change. I only have one archive dest and 1 processor; will 2 or more archive processes increase the performance in my case?

Thanks

Tom Kyte
May 29, 2006 - 7:34 am UTC

do you have automatic archiving ENABLED?

Archive process performance

Roger, May 29, 2006 - 9:40 am UTC

Hi Tom

I run 10R2, archive is automatically enabled.

Tom Kyte
May 30, 2006 - 8:21 am UTC

and so, do you have sufficient IO bandwidth to actually copy log1 to arch_destination as fast as log1 is created?

What I mean is - did you look at your physical layout and set it up such that ARCH is not contending with LGWR and other things are not contending with ARCH and that ARCH has the resources necessary to actually copy the redo generated in the time permitted?

Archive process performance

A reader, May 30, 2006 - 8:38 am UTC

Tom

Yes, I/O is the bottleneck. I am wondering if 2 arc processes would be helpful when the other resources are all available -- would 2 arc processes copying one online log to one dest be faster than one arc doing that?

Tom Kyte
May 30, 2006 - 9:35 am UTC

Not if you don't have sufficient IO for one of them, having two will likely compound the issue.



Meaning of Turn off Checkpoint

Arindam Mukherjee, July 18, 2006 - 11:49 am UTC

Respected Mr. Tom,

In the Oracle 9i Database Performance Tuning Guide, on page 17-12, I read the following lines.

Oracle responds with the following:

NAME                              VALUE
physical reads                     2376
physical writes                   14932
physical writes non checkpoint    11165

The first row shows the number of data blocks retrieved from disk.
The second row shows the number of data blocks written to disk.
The last row shows the number of writes to disk that would occur if you turned off checkpointing.

My question is: what does it mean to turn off checkpointing, and why would I do that? Sorry!! I could not understand the third row name and its value. Please help me.

Regards,
Arindam Mukherjee

Tom Kyte
July 19, 2006 - 8:15 am UTC

it is perhaps loosely worded - you do not "turn it off", but you can increase the times between them (checkpoints).
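Put another way, the difference between the second and third rows is the write activity attributable to checkpointing; a sketch of the same arithmetic against V$SYSSTAT:

```sql
-- With the values quoted above: 14932 - 11165 = 3767 writes that
-- happened only because of checkpointing; less aggressive
-- checkpointing would save (some of) those writes, at the cost of a
-- longer crash recovery.
select (select value from v$sysstat
         where name = 'physical writes')
     - (select value from v$sysstat
         where name = 'physical writes non checkpoint')
       as checkpoint_writes
  from dual;
```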

Meaning of Turn off Checkpoint

Andr, August 02, 2006 - 9:32 am UTC

Hi,Tom
If ACTUAL_REDO_BLKS has become more than log_checkpoint_interval and some sessions needed free buffers at the same time,
what task has more priority for DBWR?
The free buffer request (writing certain dirty buffers for the request) or adjusting the checkpoint lag (writing according to the RBA sequence)? What will DBWR do first?

Tom Kyte
August 02, 2006 - 12:04 pm UTC

nothing happens exactly at truly the same time for a serial piece of code.

Meaning of Turn off Checkpoint

Andr, August 03, 2006 - 3:25 am UTC

Hi Tom.
i.e. DBWR continues to write the checkpoint queue (until ACTUAL_REDO_BLKS <= log_checkpoint_interval) and will ignore free buffer requests?
Sorry, I don't understand your answer.

Tom Kyte
August 03, 2006 - 9:23 am UTC

if dbwr is checkpointing, dbwr is certainly not ignoring anything as the outcome of the checkpointing of a block is to make it "freeable".

My answer was "dbwr doesn't get these things at the same time, dbwr is a serial process, linear"

but if dbwr checkpoints, dbwr has just freed a block that can be used for a free buffer request.

Meaning of Turn off Checkpoint

Andr, August 04, 2006 - 2:54 am UTC

Hi Tom.
Thank you for the explanation.
i.e. if a block is dirty and there is a free buffer request, "DBWR has just freed a block" because this block is in the checkpoint queue already?
Does the checkpoint queue "equal" the dirty queue?


Tom Kyte
August 04, 2006 - 7:54 am UTC

there is conceptually a list of blocks that will need checkpointing at some point.

there is conceptually a list of blocks ordered by how they should be checkpointed in order to release redo from being needed to perform crash recovery (to meet your mean time to repair MTTR requirements)

Meaning of Turn off Checkpoint

Andr, August 04, 2006 - 3:07 am UTC

But if the dirty buffer hasn't been written yet, DBWR can't free it for another block's use. And if this buffer is at the tail of the checkpoint queue, would DBWR write this buffer before other buffers from the queue?

Tom Kyte
August 04, 2006 - 7:58 am UTC

it just really doesn't matter does it. You have a condition such that:

a) a block should be checkpointed to meet your MTTR
b) someone needs a free buffer

both a and b are completely satisfied simultaneously here if we flush that block.

don't over analyze this stuff - the algorithms, in addition to quite simply not being documented, change from release to release. I cannot imagine how this level of detail possibly helps us as database developers?

Partial Checkpoint

Mukhtar Shaikh, August 11, 2006 - 8:01 am UTC

Hi Tom

Extremely thankful for the tips.

If i put the tablespace in backup mode.

ALTER TABLESPACE tbs1 BEGIN BACKUP;

This will trigger the PARTIAL CHECKPOINT, which will flush the dirty buffers of the datafiles belonging to tablespace TBS1.

Later i issued ALTER TABLESPACE tbs1 END BACKUP;

Now please tell how the instance recovery will be done.

Thanks in Advance

Tom Kyte
August 11, 2006 - 11:04 am UTC

there is no instance recovery to be done in your example. There was no crash.

OLTP commit problem

A reader, October 13, 2006 - 11:25 pm UTC

Hi Tom,

i would need your help in my problem, i have OLTP system.

with statspack report (elapsed time of 48Hour - too large but we are checking the overall performance)

SQL ordered by Executions for DB: FARMA Instance: farma Snaps: 1 -180
-> End Executions Threshold: 100

                                               CPU per   Elap per
  Executions  Rows Processed   Rows per Exec  Exec (s)   Exec (s)  Hash Value
------------ --------------- --------------- ---------- ---------- ----------
  63,351,785               0             0.0       0.00       0.00 1053795750
COMMIT

but in the section "Instance Activity Stats for DB: FARMA Instance: farma Snaps: 1 -180"

user commits 865,562

how can we have 63,351,785 commits issued but only 865,562 reported in the instance activity section ?

thank you for your cooperation

Tom Kyte
October 14, 2006 - 8:13 am UTC

...
(elapsed time of 48 hours - too large, but we are checking
the overall performance)

....


ARG - why do you do that, even though you know it is wrong. You cannot make sense of anything over 48 hours unless you had a sustained, constant, invariable load doing the same sort of work for 48 hours.

read through this:

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:69099514810001#74175033266034

You are suffering the same issue. SQL comes, sql goes. Statspack looks at exactly two things - your begin snap and end snap. your sql statement might have aged in and out of the cache dozens of times over that period.

This 48 hour statspack is almost useless (some of the stats like transactions per second MIGHT make sense, but only if you had a constant invariant load)

In fact, I would term it as less than useful, of negative benefit. For it will have you chasing white rabbits down holes.

Larger Online Redo Logs and Fast_start_mttr_target

VLS, October 30, 2006 - 12:10 am UTC

Hi Tom,

In your new book "Expert Oracle 9i and 10g...", in chapter 3 on Online Redo Logs, you have said that "if many users modify the same block frequently, then having larger online redo logs will be beneficial, but will result in longer instance recovery". Just wanted to know a few things on this:

1. If I set fast_start_mttr_target to 120 seconds and have larger online redo logs, it should serve both purposes, i.e. faster instance recovery and users modifying the same block while it stays in the cache for a longer time. Am I right?

2. Fast_start_mttr_target and log_checkpoint_* parameters result in incremental checkpoints, and these are not recorded in the alert log.

3. During checkpoints, whether it is an incremental checkpoint or a complete checkpoint (triggered during log switches), the dirty blocks are written to disk but remain available in the cache. These blocks, once written, can either be overwritten by another block or be modified by a user, if required, instead of requiring disk I/O.

I mean, suppose a block x was a dirty block and has been written to disk during an incremental or complete checkpoint.
Scenario 1: A user requires block x again for modification; since it is still available in the cache and a copy of it has been written to disk, no disk I/O will be initiated and the copy available in the cache will be modified.

Scenario 2: Block x has been written to disk; now a user reads block y, and since block x has been written to disk and its buffer is free to be overwritten, block x will be overwritten by block y in the cache.



4. If my concepts in question 3 are correct, then larger online redo logs will only be a better solution for the scenario depicted in your book when fast_start_mttr_target and log_checkpoint_* parameters are not set and checkpoints occur at log switches only.

Please correct me.

Thanks and Regards

Tom Kyte
October 30, 2006 - 8:52 am UTC

1) yes, that is the goal of the mttr (mean time to repair) settings - to reduce crash recovery times by having dbwr be more aggressive in its cleansing of the buffer cache.

2) yes

3) correct

4) in the absence of any parameter settings - the presumption is you are using defaults, as most people would/should be (the fewer the better).

did you read the bullet point right below the one you are referring to? I pointed out the obvious conflict between "larger redo logs" and "lowered time to repair" right there.

Mean Time to Recover and Larger Online redo logs

VLS, October 30, 2006 - 9:37 am UTC

Hi Tom,

Thanks for your reply.

I have read the next bullet point as well, and it says that to ensure faster instance recovery one should have smaller online redo logs, as against larger redo logs, which are meant for concurrent access to the same blocks multiple times. In other words, I can have either faster instance recovery or performance. I want both, so can I have larger redo logs along with fast_start_mttr_target or log_checkpoint_* parameters? But again, this won't work.

Also, what if I have one redo log file of a smaller size and the other redo logs of a larger size, and leave the other parameters unset?

Regards


Tom Kyte
October 30, 2006 - 9:48 am UTC

what wouldn't work about it?

if you have mixed size redo logs, just expect checkpoints to be initiated at different rates of time.

MTTR Parameters and larger online redo logs

VLS, October 30, 2006 - 10:36 am UTC

Hi Tom,

In my previous posting, I meant that if I have larger online redo log files for performance reasons (lots of users modifying the same block), then I have to sacrifice instance recovery, and if I set fast_start_mttr_target or log_checkpoint_* parameters, then I have to sacrifice performance. There is no way that I can have both.

Am I correct ?

Regards


Tom Kyte
October 30, 2006 - 12:43 pm UTC

you will not necessarily sacrifice performance by using mttr targets, it depends. it depends on your system, your IO capabilities. You will see one of three things

a) no change in performance
b) decreased performance
c) increased performance

for example c) could happen because you stop hitting "checkpoint not complete" issues.

a) could happen because dbwr has plenty of spare cycles to do work and the increased write IO doesn't affect you.



Slow and Fast Checkpoint

Parikshit Paul, April 04, 2007 - 6:35 am UTC

Hi Tom,
I have recently read about the slow and fast checkpoint in this DBASupport forum thread: http://www.dbasupport.com/forums/showthread.php?t=54451
It was said that during a slow checkpoint (which is triggered by a log switch) not all of the dirty buffers may be written to disk, hence the requirement that the said log be available during crash recovery. Is this true?
Tom Kyte
April 04, 2007 - 10:32 am UTC

the dirty buffers will be written to disk - in the background, as needed.

The log file will be needed during crash recovery if dbwr hasn't completed checkpointing all of the blocks.

The dirty blocks however will ultimately be written to disk - and definitely will be written to disk prior to the log file that triggered the checkpoint is reused.

So...

Parikshit Paul, April 05, 2007 - 6:16 am UTC

So a slow checkpoint isn't limited to 1000 blocks or db_checkpoint_batch_size as was said in the thread. Once a checkpoint has started (slow or fast), it will flush all the dirty buffers to disk, right?
Tom Kyte
April 05, 2007 - 10:59 am UTC

once started, it'll flush them before that redo log file can be reused - yes.

"lgwr cannot reuse a logfile before dbwr has flushed the blocks protected by that redo log."

Naresh, November 03, 2007 - 10:05 am UTC

"lgwr cannot reuse a logfile before dbwr has flushed the blocks protected by that redo log.
"

hello Tom,

Why is this so? Is it because instance recovery has to work only with the online redo logs?

Thanks,
Naresh
Tom Kyte
November 05, 2007 - 11:21 am UTC

correct

A reader, April 28, 2008 - 4:28 pm UTC

Hi Tom; just a quick question: if DBWR wakes up every 3 seconds to write dirty buffers to the data files, then what is the point of having a checkpoint, which comes at a much longer interval?
Tom Kyte
April 29, 2008 - 8:35 am UTC

it wakes up to see if it needs to do anything. did you see:

... incremental checkpoints are looked for when dbwr times out, which is 3
seconds. They may or may not be performed every three seconds.
....

today, checkpoints are typically almost always happening - sort of continuously, they are not the "major event" they used to be in most cases.

A reader, May 26, 2008 - 5:12 am UTC

Hi Tom,
You said:
>>
"Last completed checkpoint - incremental or whatever (they are all 'checkpoints', with the goal of
getting blocks onto disk so that no recovery would be needed to be applied to them). It asks the
question "hey, how far back into the logs do I have to go and start applying from -- knowing the
further back, the longer we take""

the goal of incremental checkpoints is to advance the checkpoint marker so that less redo would
have to be read and applied.

All a checkpoint does is write blocks out and advance a marker. It looks and says "hmm, what
blocks should I write out so I can advance this marker so that we have less redo to process". A
"major checkpoint" is trying really hard to advance the marker into the next redo log, an
"incremental" is just trying to advance the marker far enough so that only X blocks of redo would
have to be applied during recovery. Same jobs, different triggers.
>>


What I am unable to understand is what's the relation between a checkpoint and the redo logs.
A checkpoint flushes dirty buffers from the data buffer cache to the datafiles and marks the SCN in the headers of the datafiles and control files.
Where in the process does it interact with the redo log buffer or redo log files?
So how would it know how far it has to go back in the redo log files to apply redo?
A checkpoint advances a marker in the datafiles and control files.

I am really confused about checkpoint SCN versus commit SCN.
Is the checkpoint SCN also maintained in the redo log files by CKPT?

Thanks
Tom Kyte
May 27, 2008 - 7:59 am UTC

we know what is in each redo log - look at v$archived_log, it tells you what is covered in there.

If we know that everything up to change# X was written to datafile Y, we know we don't need to apply archive Z to it if archive Z has changes that all pre-date that change# X.

A commit scn is personal, it is for a transaction. A checkpoint scn is global, it is "the scn right now", it is like a clock.

So, a commit scn is what time that transaction committed, a checkpoint scn is a time reflecting the checkpoint.


Just think of the SCN like a clock, like a timestamp. Don't think of it as a number - as a system change number - just think of it like a timestamp.
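The "do we need archive Z" decision above can be sketched in a toy example (illustrative only; it mimics the FIRST_CHANGE#/NEXT_CHANGE# columns that v$archived_log exposes, but it is not Oracle code):

```python
# Toy model (not Oracle internals): each archived log covers a range of
# change numbers, as v$archived_log shows via FIRST_CHANGE# and
# NEXT_CHANGE#. A log whose changes all pre-date the datafile's
# checkpoint SCN is not needed to recover that file.

def logs_needed(archives, file_checkpoint_scn):
    """archives: list of (name, first_change, next_change) tuples."""
    return [name for name, first, nxt in archives
            if nxt > file_checkpoint_scn]   # has changes at/after the SCN

archives = [
    ("arch_1", 100, 200),
    ("arch_2", 200, 300),
    ("arch_3", 300, 400),
]
# everything up to change# 250 is already in the datafile, so arch_1
# (whose changes all pre-date 250) can be skipped entirely
needed = logs_needed(archives, file_checkpoint_scn=250)
```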

checkpoint SCN and commit SCN

A reader, May 30, 2008 - 4:48 am UTC

Tom

You mean to say that the checkpoint SCN and the commit SCN can be different? As far as I know there is only one SCN clock.

Do you mean that in the case of a commit the SCN is only written to the log file, while in the case of a checkpoint the SCN is written to all the datafiles and the control file?
Tom Kyte
May 30, 2008 - 7:22 am UTC

Ok, let me try an analogy.

at 11am, you filed your inbox into the filing cabinet.

however, time marches on.

it is now 11:30am and your desk catches fire, your inbox is burnt up.


what do you have. Well, you have everything safely in the filing cabinet up to and including 11am inputs.

You lost in the fire everything that happened after 11am - up to 11:30am - that needs to be reproduced (replayed, recovered).


Your checkpoint time - 11am.
Your last commit scn - 11:30am


do you get it - they are just different points in time, they are INSTANCES of scn values as of a particular point in time. There is one clock, the clock is forever advancing - it would be impossible for "everything" to be as of the same exact point in time if the database is open - the different pieces all move at different rates.

Until the quantum version of Oracle that is, when everything can happen simultaneously.


Each file in an active database will have a slightly different "time of consistency" - they are all written to at different times. The 'scn' that corresponds to them is a particular value of the single, sole scn that controls the database - but it is a value like 11:01am, 11:02am, 11:03am are just single instances of time - all from the same clock.

Aman...., September 27, 2008 - 12:28 am UTC

Sir,
The checkpoint SCN is the SCN that is part of the datafile header, right? When is this number changed in the datafile header, and when is it updated in the controlfile? What I understand is that when we issue BEGIN BACKUP on the tablespace, this checkpoint SCN is frozen. So in the case of recovery, when we restore this datafile with the frozen checkpoint SCN from the backup, recovery must happen from this number up to the current commit SCN stored in the controlfile to make the datafile consistent. Does that sound correct?
Thanks and regards
Aman....
Tom Kyte
September 27, 2008 - 11:38 am UTC

do not go to this level.

Just understand this:

a) there is a clock
b) Oracle calls the clock a system change/commit number
c) every commit advances the clock
d) no operations are atomic - a checkpoint happens over a long period of time, the scn is advancing continously

in the files we know from what time (what scn) we need redo in order to recover that file. That is all - that is all we need to really understand here.

The file knows the oldest bit of redo it might need to recovery in the event of an instance failure. As the checkpoints occur, the scn timestamp the file records will advance accordingly.

A question about incremental checkpoint

liping, August 19, 2009 - 9:50 pm UTC

Hi Tom,
I have a question about incremental checkpoint.
I recently read a book called "Oracle Database 10g OCP Certification All-in-One Exam Guide", published by Oracle Press. When talking about the database buffer cache, the book said:

"When a server process looks for a free buffer, starting at the least recently used end of the LRU list, whenever it finds a dirty buffer in the course of its search it will transfer its address to the checkpoint queue."

"This means that a very busy buffer, one that is being continuously updated, will never be written to disk, because it will always be at the most recently used end of the LRU list".

"But what if DBWR has written some blocks to disk before the crash? It might be that JOHN (or another user) was continually requerying his data, but that DAMIR had made his uncommitted change and not looked at the data again. DBWn will therefore decide to write DAMIR¿s changes to disk in preference to JOHN¿s; DBWn will always tend to write inactive blocks rather than active blocks. So now, the datafiles are storing DAMIR¿s uncommitted transaction but missing JOHN¿s committed transaction.
This is as bad a corruption as you can have. But think it through. If the instance crashes now¿a power cut, perhaps, or a shutdown abort¿the roll forward will still be able to sortout the mess."

JOHN updated a block before DAMIR did (I mean the block JOHN updated has a lower RBA than DAMIR's). How does the instance recovery work? How does the incremental checkpoint ensure there are no corruptions?

Tom Kyte
August 24, 2009 - 5:05 pm UTC

... "This means that a very busy buffer, one that is being continuously updated,
will never be written to disk, because it will always be at the most recently
used end of the LRU list". ...

that is false.


A question about incremental checkpoint

A reader, August 25, 2009 - 12:36 am UTC

Hi Tom,
Thanks for your reply!

You said:
>>
... "This means that a very busy buffer, one that is being continuously updated,
will never be written to disk, because it will always be at the most recently
used end of the LRU list". ...

that is false.
>>

Do you mean there are errors in the book?
If so, what does the incremental checkpoint do?
Can you answer the question I asked last time?

It has been bothering me for a long time. Thanks a lot!

Tom Kyte
August 25, 2009 - 10:00 am UTC

... Do you mean there are errors in the book? ...

You took something out of context, I see this one sentence:

"This means that a very busy buffer, one that is being continuously
updated,
will never be written to disk, because it will always be at the most recently
used end of the LRU list".


and I say "false", because eventually the buffer MUST be written to disk - we'll eventually overwrite the online redo log that protects that buffer in memory and before we do that - we HAVE to checkpoint that buffer to disk.

so, as an atomic sentence, by itself, it is wrong.


... Can you answer the question I asked last time? ...

and that was what exactly?



if you are asking about recovery - basically, forget the incremental checkpoint, just call it "checkpointing". The act of checkpointing a block to disk makes it so the block won't need any redo generated prior to the checkpoint to recover that block.

While we are checkpointing blocks we are advancing the "checkpoint marker", we are advancing a pointer in the redo logs basically (think of them as a big linear file). As we checkpoint information, we can advance that pointer and say "before this pointer - we do not need the redo to perform crash recovery, after this pointer - we may need the redo to perform crash recovery"

If we crash, we know where the checkpoint pointer was in the redo stream, we read the redo starting at that pointer and apply the redo to the database blocks IF NECESSARY (we can look at the block and see if that redo needs to be applied, or not).
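That recovery description can be sketched as a toy model (illustrative only, not Oracle's actual crash recovery code; the block "version" here stands in for the SCN information a real block header carries):

```python
# Toy model (not Oracle internals): crash recovery starts reading redo
# at the checkpoint position and applies a change vector to a block only
# IF NECESSARY - i.e. only if the block on disk is older than the change.

def crash_recover(blocks, redo_stream, checkpoint_pos):
    """blocks: {block_id: version_scn} as found on disk after the crash.
    redo_stream: list of (position, block_id, change_scn).
    Redo before the checkpoint position is skipped entirely."""
    applied = []
    for pos, block_id, scn in redo_stream:
        if pos < checkpoint_pos:
            continue                       # guaranteed already on disk
        if blocks.get(block_id, 0) < scn:  # apply only if block is older
            blocks[block_id] = scn
            applied.append((block_id, scn))
    return applied

blocks = {"A": 50, "B": 80}                # B happened to be written late
redo = [(1, "A", 40), (2, "A", 60), (3, "B", 70)]
applied = crash_recover(blocks, redo, checkpoint_pos=2)
```

Note the second check: even after the checkpoint pointer, a change may already be on disk (DBWn writes continuously), so each block is inspected before redo is applied to it.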


instance recovery

A reader, March 05, 2011 - 10:56 pm UTC

Hi Tom,

We have been discussing under the presumption that the change gets written to the online redo log, now what would happen (just theoretically) if there was a change made, a commit issued and the instance crashes before the change gets written to the online redo? How will the instance recovery happen?

Thanks,

Tom Kyte
March 07, 2011 - 12:23 pm UTC

If the commit had not finished - was not fully recorded in the online redo log - before the "thing" failed (database, operating system - whatever) - then it would be as if the commit had never happened.

It would be no different than if you updated a million rows - did not commit yet - and the machine failed.

If on the other hand, the commit had completed after being issued - and the commit finished - then the commit would have made the data durable, it would be recorded in the online redo logs as "committed" and would be fully recovered during crash recovery.


It all depends on whether the commit, after being issued, had completed or not.

Thank you much!

A reader, March 07, 2011 - 7:36 pm UTC

Thank you much! Very clear!

question

Shravan, April 06, 2011 - 7:17 am UTC

Hi Tom,

please help me to understand the concept of RMAN. I'll be very thankful to you if you could suggest which book can give me a brief conceptual overview.

Tom Kyte
April 12, 2011 - 12:59 pm UTC

clarifications

doro, April 10, 2012 - 4:33 am UTC

Tom,
Oracle database concepts 11gR2 manual states that "An incremental checkpoint is a type of thread checkpoint partly intended to avoid writing large numbers of blocks at online redo log switches. DBWn checks at least every three seconds to determine whether it has work to do. When DBWn writes dirty buffers, it advances the checkpoint position, causing CKPT to write the checkpoint position to the control file, but not to the data file headers."
Also
"During instance recovery, the database must apply the changes that occur between the checkpoint position and the end of the redo thread. As shown in Figure 13-5, some changes may already have been written to the data files. However, only changes with SCNs lower than the checkpoint position are guaranteed to be on disk."
My question is: if DBWn is moving the checkpoint position, how come there are changes written to the database files without checkpoint information in the control file and without any CKPT RBA in the logs? Could it be the case that the instance collapses after the DBWn process has written the dirty buffers but CKPT hasn't had time to update, or have I misunderstood the whole process?

Tom Kyte
April 10, 2012 - 7:49 am UTC

how come there are
changes written to the database files without checkpoint information in the
control file and without any CKPT RBA in the logs


because the act of writing to datafiles and the act of updating the other information happen sequentially. One after the other.

When does the checkpoint happen?

A reader, July 10, 2012 - 7:14 pm UTC

Hi, Tom
I have read the questions and answers in this article, and I'm still confused about checkpoints.
Q1: When does a normal checkpoint happen?
Q2: When does an incremental checkpoint happen?
Q3: When does the checkpoint queue change?
thank you
Tom Kyte
July 12, 2012 - 4:42 pm UTC

they are all the same for all intents and purposes.

A checkpoint can be thought of as the act of writing out dirty blocks from the buffer cache to disk and updating internal structures that tell us this has happened so that during crash recovery - we know how far along in online redo log stream we can start - how much of the online redo log we can skip.

many things trigger these sorts of events, but they are all "checkpoints". I myself do not differentiate between them really - they are all conceptually the same.

things that can trigger them are (but not necessarily limited to)

o estimated time to recover (you can bound our recovery time and we'll checkpoint anytime we estimate we are near that amount of time)

o amount of dirty blocks in the cache

o log switches

o administrator initiated


checkpoint queue, SCN

A reader, August 01, 2012 - 12:25 am UTC

Hi, Tom
I have read many descriptions of checkpoints and still have some confusion. Hoping for your answers.
After Oracle 8 introduced the incremental checkpoint, the DBWR process flushes the data in the buffer cache to disk via the checkpoint queue. I want to know: after DBWR has just finished flushing a block from the queue to disk, what will Oracle do to the control file and datafile header, so that Oracle knows whether the block should be recovered if Oracle crashes at this time?
Tom Kyte
August 01, 2012 - 6:48 am UTC

we just update them. an incremental checkpoint doesn't really work any differently than any other sort of checkpoint.

During *any* sort of checkpoint activity, dirty blocks are flushed to disk (to datafiles) and eventually meta data about what has been flushed will be recorded in control files and data file headers.


That is what we need to know -

checkpoint => dirty blocks written to datafiles and later metadata about what has been flushed is recorded somewhere.


incremental, log switch initiated, alter system checkpoint initiated, whatever - they are all the same.


don't get too wrapped up in the nuances of the initiating event (incremental, etc).