dont fully agree on the advise re redo log sizeing
john felstead, January 10, 2002 - 9:11 am UTC
Not so sure your last statement about sizing at 25M is a good one. Isnt that a bit of a broad statement as you arent taking into account the size/throughput of the database. Everything else makes sense but you could be giving people a bumb steer with that size?
I try and aim at log switches every 20 minutes and never below 5 minutes during highest batch updates. This needs a size of 350Mb per redo log on the DB's i work on. I typically use 6 redo log groups. Am i doing something wrong in this?
January 10, 2002 - 10:04 am UTC
I'll just add "your mileage may vary". There are times when gigabyte sized redo logs are good, times when 25meg are good, there may even be times when smaller then 25m are good.
rules of thumb (ROT) are always made to be broken -- i don't have many "ROT" myself. 25m is a "safe" starting spot (safer then the old starter size of 512k anyway)..
Can not allocate Log file
sivababu, March 18, 2002 - 5:16 am UTC
Hello TOM
I got more database problems at different customer places. Today is not good day. I'm getting the same error like above. I think i have to increase the log_buffer, Log_checkpoint_interval and log_checkpoint_timeout. But you told in the other question that we have to drop the log file and create it. Will it effect existing database and how we can do it?. Can you pl give me the steps?.
Entry from alert.log:
Mon Mar 18 08:20:42 2002
Thread 1 advanced to log sequence 20021
Current log# 1 seq# 20021 mem# 0: /opt/oracle/dbs/oradata/mangdbs/redo01.log
Mon Mar 18 08:21:15 2002
Thread 1 cannot allocate new log, sequence 20022
Checkpoint not complete
Current log# 1 seq# 20021 mem# 0: /opt/oracle/dbs/oradata/mangdbs/redo01.log
Thread 1 advanced to log sequence 20022
Current log# 2 seq# 20022 mem# 0: /opt/oracle/dbs/oradata/mangdbs/redo02.log
Mon Mar 18 08:21:50 2002
Thread 1 cannot allocate new log, sequence 20023
Checkpoint not complete
Current log# 2 seq# 20022 mem# 0: /opt/oracle/dbs/oradata/mangdbs/redo02.log
Thread 1 advanced to log sequence 20023
Current log# 1 seq# 20023 mem# 0: /opt/oracle/dbs/oradata/mangdbs/redo01.log
Mon Mar 18 08:24:14 2002
Thread 1 cannot allocate new log, sequence 20024
Checkpoint not complete
Entry from init.ora:
db_block_buffers = 8192
db_block_size = 2048
shared_pool_size = 125000000
large_pool_size=62914560
log_archive_start = FALSE
log_buffer = 163840
log_checkpoint_interval=10000
lock_checkpoint_timeout=1800
Is it possible to increase this way..
log_buffer = 2163840
log_checkpoint_interval=10000
lock_checkpoint_timeout=1800
Thanks in advance.
regards,
sivababu
March 18, 2002 - 8:29 am UTC
nothing in your init.ora is going to affect this. You need more log, you have insufficient log to carry you through your load of work. Add more logfiles.
Just add more logfiles -- see the alter database command for the syntax.
(you know, a 16meg buffer cache and a 125m shared pool, 62m large pool -- you might have that BACKWARDS.... thats a HUGE shared pool, simply HUGE. I normally start at 50/60m or thereabouts). A 160k log buffer is a tad small as well.
If this is a production instance and you are not in archive log mode (I'm ASSUMING by having log_archive_start=false you are not) your DAY WILL ONLY GET WORSE as the sudden loss of data tends to ruin peoples day and running in noarchivelog is a 100% sure way to have a sudden loss of lots of data!
clarification
David, February 25, 2003 - 2:01 pm UTC
Tom,
I wonder if you could clarify something for me regarding the "cannot allocate new log" message. From our alert log (see below), it seems that log#2 would have been available for Oracle to use. Instead, our system grinds to a halt until a checkpoint completes.
Alert Log Snippet (blank lines added for clarity):
----------------------------------------------------
Tue Feb 25 03:42:29 2003
Beginning log switch checkpoint up to RBA [0x19ff.2.10], SCN: 0x0000.644a08a2
Thread 1 advanced to log sequence 6655
Current log# 1 seq# 6655 mem# 0: /redo01/oradata/eisp/redo1a.log
Current log# 1 seq# 6655 mem# 1: /redo02/oradata/eisp/redo1b.log
Tue Feb 25 03:42:29 2003
ARC0: Beginning to archive log# 3 seq# 6654
Tue Feb 25 03:43:08 2003
ARC0: Completed archiving log# 3 seq# 6654
Tue Feb 25 04:11:27 2003
Beginning log switch checkpoint up to RBA [0x1a00.2.10], SCN: 0x0000.644a1165
Thread 1 advanced to log sequence 6656
Current log# 2 seq# 6656 mem# 0: /redo01/oradata/eisp/redo2a_NEW.log
Current log# 2 seq# 6656 mem# 1: /extra01/oradata/eisp/redo2b.log
Tue Feb 25 04:11:27 2003
ARC0: Beginning to archive log# 1 seq# 6655
Tue Feb 25 04:11:55 2003
ARC0: Completed archiving log# 1 seq# 6655
Tue Feb 25 04:19:39 2003
Completed checkpoint up to RBA [0x19ff.2.10], SCN: 0x0000.644a08a2
Tue Feb 25 05:00:05 2003
Completed checkpoint up to RBA [0x1a00.2.10], SCN: 0x0000.644a1165
Tue Feb 25 05:06:35 2003
Beginning log switch checkpoint up to RBA [0x1a01.2.10], SCN: 0x0000.644a1dbb
Thread 1 advanced to log sequence 6657
Current log# 3 seq# 6657 mem# 0: /redo02/oradata/eisp/redo3a.log
Current log# 3 seq# 6657 mem# 1: /extra01/oradata/eisp/redo3b.log
Tue Feb 25 05:06:35 2003
ARC0: Beginning to archive log# 2 seq# 6656
Tue Feb 25 05:07:00 2003
ARC0: Completed archiving log# 2 seq# 6656
Tue Feb 25 05:51:12 2003
Beginning log switch checkpoint up to RBA [0x1a02.2.10], SCN: 0x0000.644a251e
Thread 1 advanced to log sequence 6658
Current log# 1 seq# 6658 mem# 0: /redo01/oradata/eisp/redo1a.log
Current log# 1 seq# 6658 mem# 1: /redo02/oradata/eisp/redo1b.log
Tue Feb 25 05:51:12 2003
ARC0: Beginning to archive log# 3 seq# 6657
Tue Feb 25 05:51:35 2003
ARC0: Completed archiving log# 3 seq# 6657
Tue Feb 25 06:39:05 2003
Thread 1 cannot allocate new log, sequence 6659
Checkpoint not complete
Current log# 1 seq# 6658 mem# 0: /redo01/oradata/eisp/redo1a.log
Current log# 1 seq# 6658 mem# 1: /redo02/oradata/eisp/redo1b.log
Tue Feb 25 06:56:30 2003
Completed checkpoint up to RBA [0x1a01.2.10], SCN: 0x0000.644a1dbb
Tue Feb 25 06:56:30 2003
Beginning log switch checkpoint up to RBA [0x1a03.2.10], SCN: 0x0000.644a2b29
Thread 1 advanced to log sequence 6659
Current log# 2 seq# 6659 mem# 0: /redo01/oradata/eisp/redo2a_NEW.log
Current log# 2 seq# 6659 mem# 1: /extra01/oradata/eisp/redo2b.log
****************************************
As soon as a checkpoint was completed for [0x1a01.2.10], then log# 2 was able to be allocated. If I am reading the log correctly, it was not doing anything anyway so why could it not be used prior to the checkpoint completing? I'm matching up the "beginning log switch checkpoint up to RBA [xxxx]" with the "completed checkpoint up to RBA [xxxx]" to see that #1 and #3 were busy, but what was #2 doing that it could not be allocated? I think I'm repeating myself...
Thanks for any insight,
David
February 25, 2003 - 8:27 pm UTC
here is the begin of the checkpoint on the data protected by log 2
Beginning log switch checkpoint up to RBA [0x1a01.2.10], SCN: 0x0000.644a1dbb
Thread 1 advanced to log sequence 6657
Current log# 3 seq# 6657 mem# 0: /redo02/oradata/eisp/redo3a.log
Current log# 3 seq# 6657 mem# 1: /extra01/oradata/eisp/redo3b.log
Arch finished copying log 2 for us here
Tue Feb 25 05:06:35 2003
ARC0: Beginning to archive log# 2 seq# 6656
Tue Feb 25 05:07:00 2003
ARC0: Completed archiving log# 2 seq# 6656
Tue Feb 25 05:51:12 2003
Beginning log switch checkpoint up to RBA [0x1a02.2.10], SCN: 0x0000.644a251e
Thread 1 advanced to log sequence 6658
Current log# 1 seq# 6658 mem# 0: /redo01/oradata/eisp/redo1a.log
Current log# 1 seq# 6658 mem# 1: /redo02/oradata/eisp/redo1b.log
Tue Feb 25 05:51:12 2003
ARC0: Beginning to archive log# 3 seq# 6657
Tue Feb 25 05:51:35 2003
ARC0: Completed archiving log# 3 seq# 6657
Tue Feb 25 06:39:05 2003
Thread 1 cannot allocate new log, sequence 6659
Checkpoint not complete
Current log# 1 seq# 6658 mem# 0: /redo01/oradata/eisp/redo1a.log
Current log# 1 seq# 6658 mem# 1: /redo02/oradata/eisp/redo1b.log
Here is where the checkpoint finished up -- the one started for number 2
Tue Feb 25 06:56:30 2003
Completed checkpoint up to RBA [0x1a01.2.10], SCN: 0x0000.644a1dbb
Tue Feb 25 06:56:30 2003
Beginning log switch checkpoint up to RBA [0x1a03.2.10], SCN: 0x0000.644a2b29
Thread 1 advanced to log sequence 6659
Current log# 2 seq# 6659 mem# 0: /redo01/oradata/eisp/redo2a_NEW.log
Current log# 2 seq# 6659 mem# 1: /extra01/oradata/eisp/redo2b.log
that is why is could not be reused -- the blocks it protects in the buffer cache were NOT yet on disk...
Now, given how fast the archive went -- and how long it took to catch up -- I'd be looking at "why" - perhaps you are not checkpointing frequently enough (too many dirty blocks) or dbwr is not "tuned" -- the disks are really slow?
archive log file on OPS
DoQuangHuy, February 25, 2003 - 9:53 pm UTC
My database runs on OPS, redo log size is 8M, could you please tell me why archive log file even increase (about 4-6 minutes) when my database has no user connect to whole day. How can I check my database is running on load balancing mode.
February 25, 2003 - 10:23 pm UTC
8m is pretty small for a high end system...
but anyway, use log miner (see the admin guide) to analyze what is in there. tons of stuff -- queues, jobs running in the background with dbms_job, smon - they all can generate redo...
You check your listener.ora and tnsnames.ora to see that it is setup that way.
archive log
Reader, February 26, 2003 - 1:16 am UTC
Tom,
I have 2 database, one runs on 8i OPS and the other runs on 8i (no OPS). When no user connect to, database runs on OPS genarates 1 archivelog file per 4-6 minutes. I analyse archivelog files and select from v$logmnr_contents:
SQLWKS> SELECT operation,count(*) FROM v$logmnr_contents group by operation
2>
OPERATION COUNT(*)
-------------------------------- ----------
COMMIT 8696
DELETE 5
INSERT 5
INTERNAL 17903
START 8697
UPDATE 30
6 rows selected.
When no user connect to, database runs on 8i (no OPS) genarates only 1 archivelog file when database startup.
Could you explain the difference.
February 26, 2003 - 8:43 am UTC
look at what "internal" operations are doing.
as i said, look for jobs, look for active sql, look for aq, log in and look
Awesome
David, February 26, 2003 - 10:30 am UTC
Thanks so much! I was thinking the "Beginning checkpoint..." applied to the log number on the next line in the alert log. I now see that all three logs were "busy".
Our log_checkpoint_interval is 1000000 blocks and, yes, our disks are raid 5. Changing the raid is possible long term. In the near term we are going to try reducing the size of the logs (256MB) and maybe shorten the checkpoint interval...
Excellent
A reader, February 26, 2003 - 8:32 pm UTC
Tom,
Great explaination of the alert log
Jim, March 02, 2003 - 11:02 pm UTC
Nicely done
Submitted Enhancement Request
Ted Persky, June 17, 2003 - 11:32 am UTC
Dear Tom:
FYI, with the advent of Oracle-managed files, I have submitted Enhancement Request No. 3004256 to create a new redo logfile group in the case of checkpoint or archive not completed. We'll see if it is accepted for a future release.
Best regards,
LGWR and DBWR for performance
ygp, August 07, 2003 - 11:11 am UTC
In your book, you mention that Log Writer is useful to improve the performance since it does sequnetial writing. Using only DBWR (non-sequential writing) will degrade the performance.
But, as I understand, the "checkpoint not complete, cannot allocate new log", error *indicates* "DBWR cannot compete with LGWR".
Where do I fail to understand ?
August 09, 2003 - 4:53 pm UTC
Umm, it is not an "either or" thing - I don't know how you got that idea.
lgwr is a point of serialization in the database -- we all wait on him.
dbwr is a guy that goes on in the background.
the checkpoint not complete indicates that dbwr was not keeping up with lgwr and is generally caused by undersizing your log files (or number thereof)
DBWR & LGWR
ygp, August 31, 2003 - 7:17 am UTC
You said
"the checkpoint not complete indicates that dbwr was not keeping up with lgwr and
is generally caused by undersizing your log files (or number thereof) "
I understand that they do totally diffeent works, but, LGWR has to wait for DBWR to complete. What I was refering the *only* benifit of sequential writing by the LGWR.
Suppose there are 10 Log files. By making it 100 or even 1000 does not guarantee that the error would not occur.
At some point, sooner or later, DBWR has to go ahead of LGWR.
August 31, 2003 - 9:40 am UTC
not ahead, just keep up with.
Yes, if you constantly run at full steam -- 100% -- you must architect a system such that DBWR is able to keep up with lgwr. this is done by having sufficient devices and cpus to spread the scattered write work out.
what I said stands 100% though "IS GENERALLY caused by ....." Most systems run with peaks and valleys -- on these systems (the vast majority out there) the solution is in fact to size logs such that they carry you over the peak and lets the valleys catch up.
rollback?
Duncan, December 09, 2003 - 7:16 am UTC
Tom
I ran an update during the day which took 5hrs 45 mins to complete. It finished at 6 pm and our cold backup didn't kick in until 3am the
next morning. When i got in the next day i noticed that there were lots of messages like:
"...Thread 1 cannot allocate new log, sequence 177759...."
in the alert log, which i gather is from redo logs not being large enough (we don't normally get this).
I assume this was due to the disk intensive update i was running that day?
But also the database shutdown immediate began at 3am and didn't finish til 4.45am. Can you shed some light on why this
would happen if the update finished at 6pm the previous evening?
The relevant parts of the ALERT log are listed below:
Mon Dec 1 18:03:04 2003
Thread 1 cannot allocate new log, sequence 177753
Checkpoint not complete
Current log# 1 seq# 177752 mem# 0: /general/oradata/OP3/redo_1.log
Current log# 1 seq# 177752 mem# 1: /tmptblsp/OP3/logfile/redo_2.log Mon Dec 1 18:03:26 2003 Thread 1 advanced to log sequence 177753
Current log# 4 seq# 177753 mem# 0: /general/oradata/OP3/redo_7.log
Current log# 4 seq# 177753 mem# 1: /tmptblsp/OP3/logfile/redo_8.log Mon Dec 1 18:03:26 2003
ARC0: Beginning to archive log# 1 seq# 177752
Mon Dec 1 18:03:39 2003
ARC0: Completed archiving log# 1 seq# 177752
Mon Dec 1 18:03:43 2003
Thread 1 cannot allocate new log, sequence 177754
Checkpoint not complete
Current log# 4 seq# 177753 mem# 0: /general/oradata/OP3/redo_7.log
Current log# 4 seq# 177753 mem# 1: /tmptblsp/OP3/logfile/redo_8.log Mon Dec 1 18:04:09 2003 Thread 1 advanced to log sequence 177754
Current log# 5 seq# 177754 mem# 0: /general/oradata/OP3/redo_9.log
Current log# 5 seq# 177754 mem# 1: /tmptblsp/OP3/logfile/redo_10.log Mon Dec 1 18:04:09 2003
ARC0: Beginning to archive log# 4 seq# 177753
Mon Dec 1 18:04:21 2003
ARC0: Completed archiving log# 4 seq# 177753
Tue Dec 2 03:00:02 2003
Restarting dead background process EMN0
EMN0 started with pid=16
Tue Dec 2 03:00:07 2003
Shutting down instance (immediate)
License high water mark = 45
Tue Dec 2 03:00:10 2003
ALTER DATABASE CLOSE NORMAL
Tue Dec 2 03:00:14 2003
ARC0: Beginning to archive log# 5 seq# 177754
Tue Dec 2 03:00:14 2003
Thread 1 advanced to log sequence 177755
Current log# 2 seq# 177755 mem# 0: /general/oradata/OP3/redo_3.log
Current log# 2 seq# 177755 mem# 1: /tmptblsp/OP3/logfile/redo_4.log Tue Dec 2 03:00:21 2003
ARC0: Completed archiving log# 5 seq# 177754
Tue Dec 2 03:00:22 2003
Thread 1 advanced to log sequence 177756
Current log# 3 seq# 177756 mem# 0: /general/oradata/OP3/redo_5.log
Current log# 3 seq# 177756 mem# 1: /tmptblsp/OP3/logfile/redo_6.log Tue Dec 2 03:00:22 2003
ARC0: Beginning to archive log# 2 seq# 177755
ARC0: Completed archiving log# 2 seq# 177755
Tue Dec 2 03:00:34 2003
Thread 1 advanced to log sequence 177757
Current log# 1 seq# 177757 mem# 0: /general/oradata/OP3/redo_1.log
Current log# 1 seq# 177757 mem# 1: /tmptblsp/OP3/logfile/redo_2.log Tue Dec 2 03:00:34 2003
ARC0: Beginning to archive log# 3 seq# 177756
ARC0: Completed archiving log# 3 seq# 177756
Tue Dec 2 03:00:46 2003
Thread 1 advanced to log sequence 177758
Current log# 4 seq# 177758 mem# 0: /general/oradata/OP3/redo_7.log
Current log# 4 seq# 177758 mem# 1: /tmptblsp/OP3/logfile/redo_8.log Tue Dec 2 03:00:46 2003
ARC0: Beginning to archive log# 1 seq# 177757
ARC0: Completed archiving log# 1 seq# 177757
Tue Dec 2 03:00:59 2003
Thread 1 cannot allocate new log, sequence 177759
Checkpoint not complete
Current log# 4 seq# 177758 mem# 0: /general/oradata/OP3/redo_7.log
Current log# 4 seq# 177758 mem# 1: /tmptblsp/OP3/logfile/redo_8.log Tue Dec 2 03:01:09 2003 Thread 1 advanced to log sequence 177759
Current log# 5 seq# 177759 mem# 0: /general/oradata/OP3/redo_9.log
Current log# 5 seq# 177759 mem# 1: /tmptblsp/OP3/logfile/redo_10.log Tue Dec 2 03:01:09 2003
ARC0: Beginning to archive log# 4 seq# 177758
ARC0: Completed archiving log# 4 seq# 177758
Tue Dec 2 03:01:20 2003
Thread 1 cannot allocate new log, sequence 177760
Checkpoint not complete
Current log# 5 seq# 177759 mem# 0: /general/oradata/OP3/redo_9.log
Current log# 5 seq# 177759 mem# 1: /tmptblsp/OP3/logfile/redo_10.log Tue Dec 2 03:01:41 2003 Thread 1 advanced to log sequence 177760
Current log# 2 seq# 177760 mem# 0: /general/oradata/OP3/redo_3.log
Current log# 2 seq# 177760 mem# 1: /tmptblsp/OP3/logfile/redo_4.log Tue Dec 2 03:01:41 2003
ARC0: Beginning to archive log# 5 seq# 177759
ARC0: Completed archiving log# 5 seq# 177759
Tue Dec 2 03:01:51 2003
...
...
...continues with these type of messages until:
...
...
...
Tue Dec 2 04:37:19 2003
Thread 1 cannot allocate new log, sequence 177836
Checkpoint not complete
Current log# 2 seq# 177835 mem# 0: /general/oradata/OP3/redo_3.log
Current log# 2 seq# 177835 mem# 1: /tmptblsp/OP3/logfile/redo_4.log Tue Dec 2 04:37:30 2003 Thread 1 advanced to log sequence 177836
Current log# 3 seq# 177836 mem# 0: /general/oradata/OP3/redo_5.log
Current log# 3 seq# 177836 mem# 1: /tmptblsp/OP3/logfile/redo_6.log Tue Dec 2 04:37:30 2003
ARC0: Beginning to archive log# 2 seq# 177835
ARC0: Completed archiving log# 2 seq# 177835
Tue Dec 2 04:38:12 2003
Thread 1 cannot allocate new log, sequence 177837
Checkpoint not complete
Current log# 3 seq# 177836 mem# 0: /general/oradata/OP3/redo_5.log
Current log# 3 seq# 177836 mem# 1: /tmptblsp/OP3/logfile/redo_6.log Thread 1 advanced to log sequence 177837
Current log# 1 seq# 177837 mem# 0: /general/oradata/OP3/redo_1.log
Current log# 1 seq# 177837 mem# 1: /tmptblsp/OP3/logfile/redo_2.log Tue Dec 2 04:38:19 2003
ARC0: Beginning to archive log# 3 seq# 177836
ARC0: Completed archiving log# 3 seq# 177836
Tue Dec 2 04:38:47 2003
SMON: disabling tx recovery
SMON: disabling cache recovery
Tue Dec 2 04:41:08 2003
Thread 1 closed at log sequence 177837
Tue Dec 2 04:41:11 2003
Completed: ALTER DATABASE CLOSE NORMAL
Tue Dec 2 04:41:12 2003
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
archiving is disabled
Tue Dec 2 04:41:13 2003
ARCH shutting down
ARC0: Archival stopped
Could this have been due to the transaction being uncommitted and thus all the activity on shutdown was actually the
db trying to rollback?
December 09, 2003 - 8:17 am UTC
if the transaction was uncommitted -- yes, this activity is the database rolling back.
my checkpoints takes 1 hour?
Marvin, July 27, 2004 - 4:29 am UTC
Hi
I see checkpoint not complete msg in my alert:
Thread 1 advanced to log sequence 637
Current log# 8 seq# 637 mem# 0: /u02/oradata/SIMP/redo01/redo08_a.log
Current log# 8 seq# 637 mem# 1: /u03/oradata/SIMP/redo02/redo08_b.log
Thu Jul 22 15:16:57 2004
ARC1: Evaluating archive log 7 thread 1 sequence 636
Thu Jul 22 15:16:57 2004
SIMP; ARC1: Beginning to archive log 7 thread 1 sequence 636 (0.320627198:0.320650582)
ARCH: Connecting to console port...
Thu Jul 22 15:16:57 2004
SIMP; ARC1: Creating local archive destination LOG_ARCHIVE_DEST_1: '/u03/oradata/SIMP/arch/SIMP_531695692_1_636.arc' (thread 1 sequence 636)
ARCH: Connecting to console port...
Thu Jul 22 15:16:57 2004
SIMP; ARC1: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u03/oradata/SIMP/arch/SIMP_531695692_1_636.arc'
ARCH: Connecting to console port...
Committing creation of archivelog '/u03/oradata/SIMP/arch/SIMP_531695692_1_636.arc'
Thu Jul 22 15:16:57 2004
SIMP; ARC1: Completed archiving log 7 thread 1 sequence 636
ARCH: Connecting to console port...
Thu Jul 22 16:29:53 2004
Thread 1 cannot allocate new log, sequence 638
Checkpoint not complete
Current log# 8 seq# 637 mem# 0: /u02/oradata/SIMP/redo01/redo08_a.log
Current log# 8 seq# 637 mem# 1: /u03/oradata/SIMP/redo02/redo08_b.log
The thing is if I see my log history my last log switch is 1 hour ago so how can I get this message?
634 532188010 1 634 320590412 20040722 13:44:09 320618825 1 20040716 21:14:52
635 532188863 1 635 320618825 20040722 14:00:10 320627198 1 20040716 21:14:52
636 532192617 1 636 320627198 20040722 14:14:23 320650582 1 20040716 21:14:52
637 532196995 1 637 320650582 20040722 15:16:57 320693752 1 20040716 21:14:52
638 532198486 1 638 320693752 20040722 16:29:55 320704883 1 20040716 21:14:52
639 532198500 1 639 320704883 20040722 16:54:46 320708220 1 20040716 21:14:52
Are my checkpoints taking over one hour?!
July 27, 2004 - 7:27 am UTC
could well be - you have 500meg log files -- that could protect gigabytes and gigabytes of changes in the buffer cache (remember a 50 byte change to a block is going to create a FULL BLOCK to flush -- say 8k, but only a couple of bytes of redo)
either use more smaller logs (same total amount of log) to have more frequent checkpoints or use incremental checkpointing -- log_checkpoint_interval, log_checkpoint_timeout -- to make dbwr more aggresive.
my redo log are 50mb
Marvin, July 27, 2004 - 9:40 am UTC
Hi
the redo logs are 50mb not 500mb!
July 27, 2004 - 9:55 am UTC
634 532188010 1 634 320590412 20040722 13:44:09
320618825 1 20040716 21:14:52
635 532188863 1 635 320618825 20040722 14:00:10
320627198 1 20040716 21:14:52
636 532192617 1 636 320627198 20040722 14:14:23
320650582 1 20040716 21:14:52
637 532196995 1 637 320650582 20040722 15:16:57
320693752 1 20040716 21:14:52
638 532198486 1 638 320693752 20040722 16:29:55
320704883 1 20040716 21:14:52
639 532198500 1 639 320704883 20040722 16:54:46
320708220
given I don't know what query produced that, I assumed column 2 was the size.
so, what query was that.
I make a pardon
Marvin, July 27, 2004 - 10:16 am UTC
Sorry should have posted the query!
select * from v$log_history where sequence# between 632 and 640;
RECID STAMP THREAD# SEQUENCE# FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# RESETLOGS_CHANGE# RESETLOGS_TIME
---------- ---------- ---------- ---------- ------------- ----------------- ------------ ----------------- -----------------
632 532184671 1 632 320529820 20040722 12:28:19 320557098 1 20040716 21:14:52
633 532187049 1 633 320557098 20040722 13:04:31 320590412 1 20040716 21:14:52
634 532188010 1 634 320590412 20040722 13:44:09 320618825 1 20040716 21:14:52
635 532188863 1 635 320618825 20040722 14:00:10 320627198 1 20040716 21:14:52
636 532192617 1 636 320627198 20040722 14:14:23 320650582 1 20040716 21:14:52
637 532196995 1 637 320650582 20040722 15:16:57 320693752 1 20040716 21:14:52
638 532198486 1 638 320693752 20040722 16:29:55 320704883 1 20040716 21:14:52
639 532198500 1 639 320704883 20040722 16:54:46 320708220 1 20040716 21:14:52
640 532198516 1 640 320708220 20040722 16:55:00 320711583 1 20040716 21:14:52
9 rows selected.
It was STAMP.
So I am puzzled how can a checkpoint take one hoourr
This is 10g btw
rgds
July 27, 2004 - 11:08 am UTC
checkpoints are by design done slowly, lazy in the background.
50meg of log can still be covering lots and lots and lots of database blocks.
enable incremental check pointing
or
increase the number of logs
I have many logs
Marvin, July 27, 2004 - 11:16 am UTC
Hi
I have 8 groups of 50mb redologs!
Guess 16 will do
thx
July 27, 2004 - 12:54 pm UTC
or use incremental checkpointing.
is this incremental checkpointing?
A reader, July 28, 2004 - 10:54 am UTC
I have this to achive incremental checkpoints
fast_start_mttr_target integer 1200
not good enough? :-?
July 28, 2004 - 1:19 pm UTC
if you are the reader from above
and
if you have this set
then
apparently it is not set small enough
Checkpoints - concept
A reader, July 29, 2004 - 12:25 am UTC
Tom,
Ive read the concepts manual on the DBWR, LGWR, CKPT background processes about 3 times before posting this question :). I still find it a little hard to grasp as to why checkpoints take very long sometimes.
From what I understand, 1) DBWRs usually do a good job in flushing dirty buffers to disk. One could ofcourse configure multiple DBWRs if they see a need. 2) LGWRs also seem to do a good job of flushing from redo buffers to redolog files. It is the checkpointing process (either using CKPT or LGWR) that is invariably the real bottleneck. But why??
Checkpoints merely update file headers with updated SCNs upon a commit correct? I realize that it is proportional to number of datafiles, io/disk speed. But conceptually, I dont understand why it should take very long sometimes to update a few datafiles (say 100), controlfiles with a few SCNs? I can understand if every data block had something that had to be updated with this information. But that isnt the case right? Just the datafile headers and controlfiles get updated with the new SCN. The data has already been flushed to disk (the individual data blocks) by the DBWR. So why the big delay now to just update file headers? Even if the disks were really slow, why should it take all that long??
Thanks for your time.
July 29, 2004 - 7:41 am UTC
lgwr signals a checkpoint,
dbwr is the main checkpoint process, ckpt has the relatively simple job of updating file headers.
dbwr is invariably the bottleneck in a checkpoint.
checkpoints flush the blocks in the buffer cache that are protected by redo. if you generate 50meg of redo -- you might be protecting thousands and thousands of blocks in the buffer cache. before we can reuse that 50meg redo log file -- we must flush to disk all of those thousands of dirty blocks.
remember:
update t set indexed_number_column = 0 where primary_key = 1;
updates a single row and generates very little redo (lets say 228 bytes for grins
ops$tkyte@ORA9IR2> update t set x = 0 where pk = 1;
1 row updated.
Statistics
----------------------------------------------------------
0 recursive calls
1 db block gets
1 consistent gets
0 physical reads
228 redo size
784 bytes sent via SQL*Net to client
800 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
(just enough redo to say "change indexed_number_column to zero for rowid "x").
but -- in the buffer cache, we have just
a) changed an 8k table block that will need to be written
b) maybe caused a row migration (update a null to a non-null) so we might
have 2 8k table blocks
c) updated perhaps 2 index leaf blocks (column was 1000000000000000, is now 0,
index entry moved from the right to the left.
d) maybe caused a split up an index -- say 2 more blocks.
so that measely 228 bytes of redo might be protecting 4 8k blocks (32k of data) that dbwr must flush out at some point.
this is why we use redo -- instead of just writing to the datafiles when you commit. we can write 228 bytes sequentially to the log files -- instead of writing 32k of data out using random IO (lots of seeking around)
the checkpoint writes out all of these dirty blocks -- that is what takes so "long"
LGWR, DBWR concepts
A reader, July 29, 2004 - 1:43 pm UTC
Tom,
Thanks for the response. So, DBWR is the essentially culprit. And I used to think (wrongly?) that DBWR was intelligent enough to know when to flush to disk in order to generate optimal performance. But that isn't the case.
When a massive update/delete happens, I conclude from your response that DBWR lazily appears to sit there doing nothing, while the LGWR works fervently in the background. The only time the DBWR wakes up is 1) when Oracle needs a free db_buffer and there isnt available one. Then the DBWR will flush it even if there are uncommitted txns OR 2) The LGWR isnt able to do its job the logfiles get wrapped around too fast and so it eventually forces a checkpoint so it could reuse redo group 1. The interim checkpoints during log switches funnily got ignored by Oracle in order to satisfy the load of the single massive update/delete (no matter what the log_checkpoint_interval is set to). So, CKPT is pretty lazy too. The only guy doing any work under pressure is the LGWR.
Did I get it straight?
> so that measely 228 bytes of redo might be protecting 4 8k blocks (32k of data) that dbwr must flush out at some point
One other thing how can I verify this? How can I verify that the DBWR wrote 4 blocks? With a trace and/or v$sysstat?
Many thanks for your time and patience.
July 29, 2004 - 2:15 pm UTC
dbwr is a piece of software that reacts according to the parameters set for it.
lgwr works in the background whilst you are filling up the redo log buffer and lgwr works "in the foreground" when you commit (you wait for lgwr)...
dbwr flushes blocks because we ran out of them (cache is full)
or lgwr signals a checkpoint on the log switch (not just when it switches all of the way around)
or because an incremental checkpoint was signaled (fast_start_mttr, log_checkpoint*)
and it does this every 3 seconds at least (makes the checks)
It is not that lgwr isn't able to do its job at all -- it is because lgwr did its job!
the interim checkpoints *did not get ignored*, they just did not complete. big difference.
ops$tkyte@ORA9IR2> select * from v$sysstat where upper(name) like '%DBWR%';
STATISTIC# NAME CLASS VALUE
---------- ------------------------------ ---------- ----------
49 DBWR checkpoint buffers writte 8 155
n
50 DBWR transaction table writes 8 21
51 DBWR undo block writes 8 45
52 DBWR revisited being-written b 8 0
uffer
53 DBWR make free requests 8 0
54 DBWR free buffers found 8 0
55 DBWR lru scans 8 0
56 DBWR summed scan depth 8 0
57 DBWR buffers scanned 8 0
58 DBWR checkpoints 8 0
59 DBWR cross instance writes 40 0
60 DBWR fusion writes 40 0
12 rows selected.
v$sysstat
A reader, July 29, 2004 - 2:42 pm UTC
Thank you for the explanation.
> select * from v$sysstat where upper(name) like '%DBWR%';
Which statistic in the output tells me # of blocks written? I see # of checkpoint buffers written - but not # of blocks written to the datafiles. I guess it is the "physical writes" parameter that I should be looking at - correct?
July 29, 2004 - 3:04 pm UTC
well, physical writes is number of write calls -- and we could batch them up.
checkpoint buffer is the number of block buffers that were written for checkpoints.
ops$tkyte@ORA9IR2> select * from v$sysstat where upper(name) like '%DBWR%';
STATISTIC# NAME CLASS VALUE
---------- ------------------------------ ---------- ----------
49 DBWR checkpoint buffers writte 8 1056
n
50 DBWR transaction table writes 8 275
51 DBWR undo block writes 8 586
52 DBWR revisited being-written b 8 0
uffer
53 DBWR make free requests 8 35
54 DBWR free buffers found 8 47717
55 DBWR lru scans 8 35
56 DBWR summed scan depth 8 47747
57 DBWR buffers scanned 8 47747
58 DBWR checkpoints 8 1
59 DBWR cross instance writes 40 0
60 DBWR fusion writes 40 0
12 rows selected.
ops$tkyte@ORA9IR2> update big_table.big_table set object_name = object_name where rownum <= 10000;
10000 rows updated.
ops$tkyte@ORA9IR2> commit;
Commit complete.
ops$tkyte@ORA9IR2> alter system checkpoint;
System altered.
ops$tkyte@ORA9IR2> select * from v$sysstat where upper(name) like '%DBWR%';
STATISTIC# NAME CLASS VALUE
---------- ------------------------------ ---------- ----------
49 DBWR checkpoint buffers writte 8 1885
n
50 DBWR transaction table writes 8 295
51 DBWR undo block writes 8 756
52 DBWR revisited being-written b 8 0
uffer
53 DBWR make free requests 8 35
54 DBWR free buffers found 8 47717
55 DBWR lru scans 8 35
56 DBWR summed scan depth 8 47747
57 DBWR buffers scanned 8 47747
58 DBWR checkpoints 8 2
59 DBWR cross instance writes 40 0
60 DBWR fusion writes 40 0
12 rows selected.
sendmail when logswitch happens.
reader, September 16, 2004 - 5:20 am UTC
is it possiable to send a alert to know when log switch happens?(possiable mail)... or trigger a shell.or at job...
thanks in advance .
September 16, 2004 - 8:15 am UTC
wouldn't that be annoying after a while?
what is the goal here, why would you do this? (maybe I can help you achieve the goal -- not sending the email, but whatever it is you were trying to "achieve" in the end)
Yes
Invisible, March 18, 2005 - 10:42 am UTC
Our database definitely has peaks and valleys. In particular, data is slowly entered into the DB, and then when a project is finished, the data gets "exported" into some archival format, and the entire project is then deleted from the DB in a single operation. This deletion, I would imagine, generates a pretty big redo spike. (After all, until the transaction commits, we might want to UNDO all that deletion...)
Anyway, I too am getting the message about no logs being free. (I can well believe it - I saw 41 log switches in less than 3 minutes in one case!)
I currently have 4 x 1MB redo logs. In your opinion, should I make the logs bigger, or just have more of them? (Or maybe I should do both?)
Oracle 8i - if it makes any difference.
Thanks - you rule BTW. ;-)
March 18, 2005 - 11:01 am UTC
4meg is tiny.
I'd make them larger (will increase the time between checkpoints -- could increase instance recovery time, but there are other parameters you can set to control that as well)
OK
Invisible, March 18, 2005 - 11:33 am UTC
Right. Well, I've never had a database recovery session take longer than about... oh... 17 seconds. (Once you find the necessary file, if they have been moved.)
Anyway, thanks for your time Tom.
March 18, 2005 - 11:55 am UTC
I'm talking instance recovery -- the more blocks dirty in the cache, the longer rolling forward takes after a crash
Typical
Invisible, April 12, 2005 - 6:45 am UTC
OK, well... the logs were too small. And now it seems I've made them too big. Heh!
Before I make any more changes to my DB, is there any way to figure out "how full" the current online redo log is? (I've looked at all the v$ views I can find, but none seems to tell me what I'm after...)
April 12, 2005 - 7:30 am UTC
what is your goal? If it is to mimimize or bound the recovery time after a crash, we have parameters for that.
redo logs are not really "full", they are a big circular buffer. What would you be wanting to measure by "fullness"
Bekka
Invisible, April 12, 2005 - 7:37 am UTC
I'd just like it so that at least 1 log switch happen each day. (After all, only *archived* redo log gets onto tape...)
Yes, the logs are used in a circular fasion. Once the current log is filled up, Oracle starts filling the next one (while at the same time archiving the one it just filled). I'd like to know how close that is to happening - basically, to know how much redo we generate in a day. Currently, the only way I can figure out is to ask for a log switch, and see how big the archived log is.
As I say, I'd like one or more log switch per day. As I see it, I can make the logs smaller again, or I can schedule a job to run each day to ask Oracle to switch logs. (I assume it's possible to do such a thing.)
...alternatively, it might just be that they aren't using the DB much right now. Who knows? ;-) Anyway, would be nice to measure this thing...
April 12, 2005 - 7:42 am UTC
in 9i, there is an archive_lag_target you can use (in the future...)
You can use the v$sysstat table to see redo generated over a period of time, look at it, come back later, look again and subtract.
*
Invisible, April 12, 2005 - 7:55 am UTC
Ooo... *lots* of statistics there... mmm. :-)
User commits = user rollbacks = 0... OOC, do these stats reset when you close the DB? Or are they permanently saved on disk?
Anyway, I assume it's the class 2 stuff I need to stare at...
("Blocks written" or something.)
BTW - thanks for the 9i tip. Can't wait to upgrade. LOL! (That is, assuming they don't decide to centralise the thing - that would be bad.)
April 12, 2005 - 8:47 pm UTC
ops$tkyte@ORA9IR2> select name, value from v$sysstat where name like '%redo%';
NAME VALUE
------------------------------ ----------
redo synch writes 3
redo synch time 0
redo entries 492
redo size 126044
redo buffer allocation retries 0
redo wastage 11120
redo writer latching time 0
redo writes 38
redo blocks written 277
redo write time 0
redo log space requests 0
redo log space wait time 0
redo log switch interrupts 0
redo ordering marks 0
14 rows selected.
redo size... lots of redo stats.
Good
Invisible, April 13, 2005 - 5:10 am UTC
OK, well thanks for your help. I'm going to go away and do some testing now. But what I'm probably going to do is schedule a job to switch the logs periodically.
One last thing... do I have to use an "at" job to schedule this? Or does Oracle itself have some built-in facility to run a command periodically?
April 13, 2005 - 9:13 am UTC
dbms_job
Highly Active DB
VInayak, April 14, 2005 - 10:16 am UTC
I'd just like to share on eof my past experinces. I was with a company that does processing related to credit cards. One of their databasa was very very actively used. Just to give an idea, they had E10K machine, 22 CPUs , 18GB memory and they were using EMC PowerPath for I/O balancing. The amount of redo generated was HUGE. When I joined there, this databse was having really very bad performance. By checking the ALERT logfile, found that they are doing checkpoint almost EVERY SECOND. their redolog file size was only 2MB. Increased the redo log file to 64MB and it was still doing the checkpoint almost every 30 seconds. and we were still getting checkpoint not complete messages in alert log. To make it short, in the end, what we did was we created a total of 12 redo log files each with 2GB of size. With that size, we were checkpoint after about 15 minutes.
file size and checkpoint co-relation
hrishy, April 21, 2005 - 9:26 am UTC
Hi Tom
My db size is growing and approximately it would be around 5Tb.I am just wundering what should be the size of the datafiles .Should i restrict them to 2Gb each or we should allow them to grow 20Gb each..
My concern is if i have 2Gb datafile recovery would be easier isnt it
If i have 20Gb datafile checkpointing will be faster (i think )how much overhead does the checkpointing add anyway if we go for 2gb each file
we will be taking backups through flash snap from veritas.Which puts all the tablespaces in hotbackup mode and just breaks the mirrors.
how do i strike a balance ?
My linux os does not have problems with 2gb file systems.
How do find the blocksize for ext3 file systems on linux like its 8kb 4kb etc ?
April 21, 2005 - 12:00 pm UTC
I would not use a checkpoint as my criteria for a filesize. ckpt does the updating of the file headers, dbwr does single block (in general) writes to flush to disk
you use the file size that makes you feel comfortable.
file size and checkpoint co-relation
Vinayak, April 22, 2005 - 12:05 am UTC
To : hrishy
I had 5 TB database and I'm using some datafiles with 64GB sizes ....
In 10g, they have BIGfile tablespace ... you'll working just on tablespaces, not on datafiles anymore ...like
ALTER TABLESPACE bigts RESIZE <whatever> ;
For Invisible
whizkid, April 25, 2005 - 4:49 am UTC
SELECT
le.leseq CURRENT_LOG_SEQUENCE#,
100*cp.cpodr_bno/LE.lesiz PERCENTAGE_FULL
from
x$kcccp cp,x$kccle le
WHERE
LE.leseq =CP.cpodr_seq
/
The above script will tell you how of your current redo log file is full.. i dont know whether it will fulfill your purpose..
One more questions
Sandra, June 20, 2005 - 2:04 pm UTC
I talked to one of Oracle analyist through Metalink,
he said this could bring the database down eventually, is this true ? If not what is the worse case of scenario could happened ? Is this could cause the whole database hanging even for query or not allowed any dml any more?
June 20, 2005 - 2:52 pm UTC
virtually all x$ and v$ queries are "expensive" and if you run them frequently, you will likely be preventing someone from doing something.
the reason: they are looking at in memory data structures typically and therefore need to latch (lock) them so they don't change a pointer or something while you are reading it. So, the act of you reading it will prevent others from modifying it.
I've not understood why you would need on a regular repeating basis to know "how full" a logfile is. Maybe once out of curiosity
I am sorry
Sandra, June 21, 2005 - 2:54 pm UTC
Sorry , I didn't make it clear. What I mean is the message "checkpoint no complete, can not add new logs".
Since it won't give you any ORA- error and the analyst from Oracle said it could bring the instance down without warnings, that why I what to ask if this is true. And how to be precautious .
Thanks very much
June 21, 2005 - 8:59 pm UTC
no, it will not bring down the database, it will cause the database to appear to 'pause' while it happens, while we finish the checkpoint.
Archiving and checkpoint
A reader, July 12, 2005 - 1:43 am UTC
Can I assume safely that the log switch check point ensures that the archive log copy process is successfully over??
July 12, 2005 - 4:41 pm UTC
there is a check point fired by a log switch.
there is a ARCH process that archives log files and it'll be notified when the log switches as well.
but they are two independent events, related only in that both are signalled on a log switch.
ORA-12519
Marcio Portes, July 18, 2005 - 9:18 pm UTC
Tom, have you heard some side effect as ORA-12519 by having "cannot allocate new log, sequence ..." contention before?
It's about a java application under Tomcat.
alert is pointing "cannot allocate new log, ..."
Before increasing redolog, I'd like to know if it's related each other.
Regards,
Marcio Portes.
July 19, 2005 - 7:21 am UTC
well, if the logs got stuck and the middle tier tried to connect - it would get stuck (as the database is stuck until you clear the archive destination and let it archive or the checkpoint completes -- totally a configuration issue, should never actually happen)
As the middle tier gets stuck, more people try to come in and more try to connect.
Eventually I could see the database server having issues and returning errors, definitely. All new connections would at best "hang" and if you swamped the listener, it could start failing.
I have not observed this error in that situation, but that situation is something that needs to be fixed regardless.
A reader, September 17, 2005 - 10:54 am UTC
What is the use of ALERT LOG FILES for DBA(checkpoint information is needed for DBA or anything else also)
what is the important information DBA will get from ALERT LOG FI
September 17, 2005 - 11:36 am UTC
You get informational messages (startup, shutdown, log switch)
You get ora-600 messages
You get job failure messages
You get space management related failure messages
and so on.
Checkpoint and DBWR
sushil, December 08, 2005 - 6:25 pm UTC
Dear Tom
Suddendly after increasing the number of dbwr from 4 to 6(which we did after free buffer waits were encountered as top event in the staspack and the application was hanging) now that "Checkpoint not complete" have started appearing in the log files too frequently since last few days.Could you explain why by increasing the dbwr that message should start appearing all of a sudden?
December 09, 2005 - 1:32 am UTC
because perhaps you removed one bottleneck so people are generating more changes more frequently and the 6 dbwrs are not sufficient to keep up with that workload.
the removal of a bottleneck frequently lets some other bottleneck pop to the top.
Perhaps you need to add additional redo log or figure out why the dbwrs cannot checkpoint sufficiently fast (is something wrong with the IO on that system)
Checkpoint and DBWR
Sushil, December 12, 2005 - 6:58 am UTC
I checked with my system admin for any io contention, he says it looks alright to him.In the meantime that particular wait "Checkpoint not complete" has vanished from the statspack report as well eventhough we have not done any specific change at all.
Also while that wait event came in we did not do less transactions as suggested we were still doing 310 transactions/sec.So after increasing the dbwr from 4 to 6 that wait popped up for 4 days and now since couple of days theres no sign of it.Could you shed some light on this as your comments are invaluable.
Thanks.
December 12, 2005 - 8:12 am UTC
insufficient data. How can the SA say "it was alright" unless they were looking at it right then and there?
Sounds like you had a burst of activity, you had insufficient online redo log configured to carry you through this burst, lgwr had to wait for the checkpoints to therefore complete.
Sushil, December 13, 2005 - 11:15 am UTC
Checked with the app team they say they dint run any reports nor there was any module added.In short there was no burst at that time and my concern is that it wasnt for a hour or so it came continuosly for 4 days then disappeared again.
My logs are sized at 500MB with 8 groups and log_bugger=10M.
Now since that event has gone and application is running fine do you still think that increasing the size or groups will help or should i leave it as it is?
If i do increase the size of redo logs is there any direct corealation between log_buffer and redo log size.
Thanks
December 13, 2005 - 1:20 pm UTC
something happened during that period of time, doesn't matter that no one is owning up to it, something else was going on.
period. You had some processing happening that the shutdown "killed" and it didn't get started again.
Rather than shutdown next time, look around, see what is actually going on. You might be surprised.
Sushil, December 14, 2005 - 6:55 am UTC
Thanks Tom
And what about my second query as regarding the increasing log buffer and log size?
December 14, 2005 - 8:41 am UTC
log buffer won't affect the amount of generated redo log directly, it will affect how fast you can generate it, which in turn may affect the amount of generated redo log in total! (think about it....)
a larger log buffer might let you generate more log in the same amount of time.
a larger log buffer will not cause you to generate more log than you would for the same operation performed with a smaller log buffer.
Checkpoint not complete
Vikas Khanna, April 07, 2006 - 7:51 am UTC
Hi Tom,
The alert log contents are following:
Fri Apr 7 00:44:09 2006
Thread 1 cannot allocate new log, sequence 1114
Checkpoint not complete
Current log# 2 seq# 1113 mem# 0: /redo-02-a/databases/admw/redo-t01-g02-m1.log
Current log# 2 seq# 1113 mem# 1: /redo-04-a/databases/admw/redo-t01-g02-m2.log
Private_strands 121 at log switch
Beginning log switch checkpoint up to RBA [0x45a.2.10], SCN: 0x0000.7040ac15
Thread 1 advanced to log sequence 1114
Current log# 4 seq# 1114 mem# 0: /redo-02-a/datab Current log# 4 seq# 1114 mem# 1: /redo-04-a/databases/admw/redo-t01-g04-m2.log
Fri Apr 7 00:44:12 2006
ARC2: Evaluating archive thread 1 sequence 1113
ARC2: Destination LOG_ARCHIVE_DEST_2 archival not expedited
Fri Apr 7 00:44:12 2006
ARC2: Beginning to archive thread 1 sequence 1113 (1883282802-1883286549) (admw)
ARCH: Connecting to console port...
Fri Apr 7 00:44:12 2006
ARC2: Creating local archive destination LOG_ARCHIVE_DEST_1: '/arch-01/databases/admw/redolog-1113-1-583227512.arc' (thread 1 sequence 1113)
(admw)
ARCH: Connecting to console port...
Fri Apr 7 00:44:12 2006
ARC2: Closing local archive destination LOG_ARCHIVE_DEST_1: '/arch-01/databases/admw/redolog-1113-1-583227512.arc'
(admw)
ARCH: Connecting to console port...
Committing creation of archivelog '/arch-01/databases/admw/redolog-1113-1-583227512.arc'
Invoking non-expedited destination LOG_ARCHIVE_DEST_2 thread 1 sequence 1113 host admw_b
Fri Apr 7 00:44:12 2006
ARC2: Completed archiving thread 1 sequence 111ARCH: Connecting to console port...
FAL[server, ARC2]: Begin FAL noexpedite archive (branch 583227512 thread 1 sequence 1113 dest admw_b)
Fri Apr 7 00:44:12 2006
ARC2: Creating remote archive destination LOG_ARCHIVE_DEST_2: 'admw_b' (thread 1 sequence 1113)
(admw)
ARCH: Connecting to console port...
ARC2: Standby redo logfile selected for thread 1 sequence 1113 for destination LOG_ARCHIVE_DEST_2
Fri Apr 7 00:44:12 2006
ARC2: Closing remote archive destination LOG_ARCHIVE_DEST_2: 'admw_b'
(admw)
ARCH: Connecting to console port...
FAL[server, ARC2]: Complete FAL noexpedite archive (thread 1 sequence 1113 destination admw_b)
Fri Apr 7 00:49:14 2006
Completed checkpoint up to RBA [0x45a.2.10], SCN: 1883286549
Fri Apr 7 00:53:29 2006
Incremental checkpoint up to RBA [0x45a.7a1.0], current log tail at RBA [0x45a.ec1.0]
Fri Apr 7 01:04:25 2006
Fri Apr 7 01:14:09 2006
Thread 1 cannot allocate new log, sequence 1115
Checkpoint not complete
Current log# 4 seq# 1114 mem# 0: /redo-02-a/databases/admw/redo-t01-g04-m1.log
Current log# 4 seq# 1114 mem# 1: /redo-04-a/databases/admw/redo-t01-g04-m2.log
Private_strands 121 at log switch
Beginning log switch checkpoint up to RBA [0x45b.2.10], SCN: 0x0000.70414f38
Thread 1 advanced to log sequence 1115
Current log# 3 seq# 1115 mem# 0: /redo-01-a/databases/admw/redo-t01-g03-m1.log
Current log# 3 seq# 1115 mem# 1: /redo-03-a/databases/admw/redo-t01-g03-m2.log
Fri Apr 7 01:14:12 2006
ARC2: Evaluating archive thread 1 sequence 1114
ARC2: Destination LOG_ARCHIVE_DEST_2 archival not expedited
Fri Apr 7 01:14:12 2006
ARC2: Beginning to archive thread 1 sequence 1114 (1883286549-1883328312) (admw)
ARCH: Connecting to console port...
Fri Apr 7 01:14:12 2006
ARC2: Cre (admw)
ARCH: Connecting to console port...
Fri Apr 7 01:14:14 2006
ARC2: Closing local archive destination LOG_ARCHIVE_DEST_1: '/arch-01/databases/admw/redolog-1114-1-583227512.arc'
(admw)
ARCH: Connecting to console port...
Committing creation of archivelog '/arch-01/databases/admw/redolog-1114-1-583227512.arc'
Invoking non-expedited destination LOG_ARCHIVE_DEST_2 thread 1 sequence 1114 host admw_b
Fri Apr 7 01:14:14 2006
ARC2: Completed archiving thread 1 sequence 1114 (1883286549-1883328312) (admw)
ARCH: Connecting to console port...
FAL[server, ARC2]: Begin FAL noexpedite archive (branch 583227512 thread 1 sequence 1114 dest admw_b)
Fri Apr 7 01:14:14 2006
ARC2: Creating remote archive destination LOG_ARCHIVE_DEST_2: 'admw_b' (thread 1 sequence 1114)
(admw)
ARCH: Connecting to console port...
ARC2: Standby redo logfile selected for thread 1 sequence 1114 for destination LOG_ARCHIVE_DEST_2
Fri Apr 7 01:14:17 2006
ARC2: Closing remote archive destination LOG_ARCHIVE_DEST_2: 'a (admw)
ARCH: Connecting to console port...
FAL[server, ARC2]: Complete FAL noexpedite archive (thread 1 sequence 1114 destination admw_b)
Fri Apr 7 01:19:12 2006
Completed checkpoint up to RBA [0x45b.2.10], SCN: 1883328312
Fri Apr 7 01:23:29 2006
Incremental checkpoint up to RBA [0x45b.1e8.0], current log tail at RBA [0x45b.1237.0]
Fri Apr 7 01:44:09 2006
Thread 1 cannot allocate new log, sequence 1116
Checkpoint not complete
Current log# 3 seq# 1115 mem# 0: /redo-01-a/databases/admw/redo-t01-g03-m1.log
Current log# 3 seq# 1115 mem# 1: /redo-03-a/databases/admw/redo-t01-g03-m2.log
Private_strands 121 at log switch
Beginning log switch checkpoint up to RBA [0x45c.2.10], SCN: 0x0000.70415e20
Thread 1 advanced to log sequence 1116
Current log# 1 seq# 1116 mem# 0: /redo-01-a/databases/admw/redo-t01-g01-m1.log
Current log# 1 seq# 1116 mem# 1: /redo-03-a/databases/admw/redo-t01-g01-m2.log
Fri Apr 7 01:44:12 2006
ARC2: Evaluating archive thread 1 sequence 1115
ARC2: Destination LFri Apr 7 01:44:12 2006
ARC2: Beginning to archive thread 1 sequence 1115 (1883328312-1883332128) (admw)
ARCH: Connecting to console port...
Fri Apr 7 01:44:12 2006
ARC2: Creating local archive destination LOG_ARCHIVE_DEST_1: '/arch-01/databases/admw/redolog-1115-1-583227512.arc' (thread 1 sequence 1115)
(admw)
ARCH: Connecting to console port...
Fri Apr 7 01:44:12 2006
ARC2: Closing local archive destination LOG_ARCHIVE_DEST_1: '/arch-01/databases/admw/redolog-1115-1-583227512.arc'
(admw)
ARCH: Connecting to console port...
Committing creation of archivelog '/arch-01/databases/admw/redolog-1115-1-583227512.arc'
Invoking non-expedited destination LOG_ARCHIVE_DEST_2 thread 1 sequence 1115 host admw_b
Fri Apr 7 01:44:12 2006
ARC2: Completed archiving thread 1 sequence 1115 (1883328312-1883332128) (admw)
ARCH: Connecting to console port...
FAL[server, ARC2]: Begin FAL noexpedite archive (branch 583227512 thread 1 sequence 1115 dest admw_b)
Fri Apr 7 01:44:12 2006
ARC2: Creating rem (admw)
ARCH: Connecting to console port...
ARC2: Standby redo logfile selected for thread 1 sequence 1115 for destination LOG_ARCHIVE_DEST_2
Fri Apr 7 01:44:13 2006
ARC2: Closing remote archive destination LOG_ARCHIVE_DEST_2: 'admw_b'
(admw)
ARCH: Connecting to console port...
FAL[server, ARC2]: Complete FAL noexpedite archive (thread 1 sequence 1115 destination admw_b)
Fri Apr 7 01:49:14 2006
Completed checkpoint up to RBA [0x45c.2.10], SCN: 1883332128
Fri Apr 7 01:53:29 2006
Incremental checkpoint up to RBA [0x45c.28a.0], current log tail at RBA [0x45c.c48.0]
Fri Apr 7 02:14:07 2006
Thread 1 cannot allocate new log, sequence 1117
Checkpoint not complete
Even after setting the few parameters of the init.ora as, still getting the error message :
sys@ADMW>show parameter db_writer
NAME TYPE VALUE
------------------------------------ ----------- ----------db_writer_processes integer 4
sys@ADMW>show parameter fast_start_mttr
NAME TYPE VALUE
------------------------------------ ----------- ----------fast_start_mttr_target integer 120
The V$log is also displayed :
sys@ADMW>/
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- -----------
1 1 1132 536870912 2 YES INACTIVE 1883674146 07-APR-2006
2 1 1133 536870912 2 YES INACTIVE 1883708203 07-APR-2006
3 1 1135 536870912 2 NO CURRENT 1883780045 07-APR-2006
4 1 1134 536870912 2 YES INACTIVE 1883746995 07-APR-2006
What else can be done to get rid of the error. The redo logs are on independent disks and the ARCH is reading them from these filesystems and creating them on a totally different file system on different controllers and disks.
Please advice, what to do to solve this error. Do I need to add more groups, so that DBWR gets sufficient time to flush the dirty buffers from the queue which have already been logged when LGWR attempts to reuse the file.
Your advice will be appreciated.
Thanks
April 08, 2006 - 8:48 am UTC
there is no error, there is a message.
You either
a) checkpoint more aggresively (lower fast start mttr target, MORE redo log files of smaller sizes - each switch triggers one, log_checkpoint_* parameters)
b) make dbwr faster if it is slow (async io, io slaves, multiple dbwrs)
c) add more log files to get you through the peak processing time without having to write around
Why add a logfile?
Nish Garg, April 07, 2006 - 4:32 pm UTC
Tom,
You are advising to add new logfile to minimize 'Checkpoint not comple' error. I don't understand this. If check checkpoint occurs at every logswitch, the if you have 3 logfiles or 1000, how it is going to help? Database will not switch to next logfile unless check point is not complete. So in my opinion increasing checkpoint frequency or increasing logfile size are the only two options to handle 'checkpoint not complete' error. Please correct me if I am wrong?
April 08, 2006 - 9:44 am UTC
database will not reuse a logfile that is still protecting data that exists only in the buffer cache.
If you had 3 logfiles (a,b,c) when we switched from A to B - that triggered a checkpoint. All of the blocks protected by the redo in A must be on disk before A can be reused.
So, we are using B now, then we switch to C. When C is filled we switch to A.
IF the checkpoint triggered by switching out of A is not yet complete - we cannot use A right now, we must wait.
But - what if there were a fourth log file - D. When C filled - we do not try to use A again (not yet). We use D. No waiting. We have a bit more time to checkpoint the data protected by A. Perhaps now when D is filled - sufficient time has passed and A is safe to reuse - without waiting.
If you have access to either Expert One on Oracle Oracle (my first book) or Expert Oracle Database Architecture (most recent book) I go into this in more detail.
In short, adding another logfile will postpone the reuse of any given logfile (by the time it takes to fill that newly added logfile). It can make it so that you get through bursts of activity without encountering the checkpoint not complete message.
Great response
Nish Garg, April 10, 2006 - 2:49 pm UTC
Thanks for your response.
Is there any drawback of having many online redo log files. I have currently 14 logfiles, why not 100 ?
April 11, 2006 - 11:41 am UTC
each switch will initiate a checkpoint.
the real question is - do you want 14 large redo logs that give you X gig of online redo or do you want 100 smaller redo logs that give you X gig of online redo log.
The answer comes from within - do you want to initiate checkpoints that often, or not.
but the discussion only applies if we are discussing "X gig" of online redo log and varying only the number of files. If you have 14-50mb files or 100-50mb files - don't really care, pretty much the same. Although 100 is sort of extreme (high) in practice.
Add logfiles
Vikas, April 11, 2006 - 5:53 am UTC
Hi tom,
Based on your suggestion added 4 more log groups of SIZE 512M, to already existing 4 redo log groups but still the message in the alert log doesn't goes off.
sys@ADMW>/
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- -----------
1 1 1332 536870912 2 YES INACTIVE 1888231924 11-APR-2006
2 1 1333 536870912 2 YES INACTIVE 1888251634 11-APR-2006
3 1 1331 536870912 2 YES INACTIVE 1888202124 11-APR-2006
4 1 1334 536870912 2 NO CURRENT 1888283242 11-APR-2006
5 1 1327 536870912 2 YES INACTIVE 1888127096 11-APR-2006
6 1 1328 536870912 2 YES INACTIVE 1888159889 11-APR-2006
7 1 1329 536870912 2 YES INACTIVE 1888164314 11-APR-2006
8 1 1330 536870912 2 YES INACTIVE 1888197375 11-APR-2006
What is your recommendation, should I continue to add more redo log groups.
Is there a way to see how many more redo log groups do we require to get rid of the message.
Thanks
April 11, 2006 - 2:29 pm UTC
is this happening during a furious burst of activity - or is this more or less your continous activity.
eg: is this a spike, a peak or is this "normal processing, happening continously"
About Processing
Vikas Khanna, April 12, 2006 - 12:26 am UTC
Hi Tom,
This is the normal processing this database undertakes, every hour the data is getting loaded in the tables especially 2 hot spot tables worth 7 million records.
Please suggest, the next course of action. I have even alterted the FAST_START_MTTR_TARGET , low to 60 secs but still no effect.
Thanks
April 12, 2006 - 8:04 am UTC
you'll need to be more clear.
It SOUNDS LIKE:
every hour we reload some big tables, then there is a period of no-loading, then there is a period of loading again, then no load, then load
My question is: is this CONTINOUS - forever - without stopping. Or does the redo generation come and go and come and go and come and go
I'm leaning towards "it comes and goes and comes and goes"
(and if you are bulk loading these objects EVERY HOUR - is that a "truncate+reload" or a "delete+reload" and can we perhaps employ direct path bulk loads with nologging in this case - since it would seem you could reload the data in the event of media failure)
In order to help you - you'll have to give us as much detail about your actual processing as possible
redolog size
don, August 30, 2006 - 8:30 am UTC
We have redolog of 100m and getting checkpoint no completed error. Do you think we should increase size from 100 to 500M?
August 30, 2006 - 8:51 am UTC
or add more 100m logs. see the original answer above, this page discusses this.
Excellent information
steve, September 14, 2006 - 3:29 pm UTC
Oracle Version: 9.2.0.5
Operating System: Solaris 8
setup: 4 redo log files, each one sized at 200M.
I just want to make sure I understood this correctly. We get
a lot of the following errors:
Thread <x> cannot allocate new log, sequence 104316
All online logs needed archiving
But we only had one:
Thread 1 cannot allocate new log, sequence 104300
Checkpoint not complete
Would it be correct to assume that DBWR is not the main
problem. The problem may be more the speed at which the
redo log files are written to the archive destination? Or
am I out in left field on this.
In either case though, I guess the easiest solution is
to add more redo files?
Thanks
Steve
September 14, 2006 - 4:40 pm UTC
the first cannot allocate new log is because ARCH cannot keep up
the second is because that one time, DBWR wasn't fast enough checkpointing.
ARCH is the "main" culprit in this case.
Thanks!
steve, September 14, 2006 - 4:53 pm UTC
Thanks Tom!
So there's nothing we can do to speed up ARCH (other than
faster disks I guess) - correct?
More redo log files should do the trick - correct?
September 14, 2006 - 4:55 pm UTC
configure more than one arch maybe.
make sure arch and lgwr do not contend with eachother.
make sure arch does not contend with something else.
eg; when lgwr writes to "log1" on "device1", arch can read "log2" on "device2" without contention in a perfect world. arch would write log2 to "archive_dest" - on a device no one else is really using.
Dirty blocks and heavy updates
Keith, February 06, 2007 - 7:55 am UTC
We are seeing checkpoint not complete messages, and I appreciate your comments about DBWR not keeping up. I think I see some correlation between the problems and jobs which update the same records multiple times over a long period in the same unit of work. Could this be a cause? In general, how does Oracle handle the flushing out of data blocks which have been dirtied bewteen the start of the checkpoint and DBWR pushing them out to disk? Might the checkpoint complete message get delayed because the block got dirty again?
February 06, 2007 - 11:21 am UTC
it is caused by dbwr not being to keep up and keep the buffer cache clean in the amount of time it takes to fill up all of the online redo logs.
whether you update the same records or different ones doesn't really matter, you are having dirty blocks in cache, they need to get on disk and it is not happening fast enough (actually, updating the same rows - a small set of rows - will cause a small set of blocks to be dirtied - it would REDUCE in general this checkpoint not complete, because a small set of blocks is getting dirtied)
Checkpoint not complete
Prashant, February 27, 2007 - 2:15 pm UTC
Tom,
I read your book Expert Oracle Database Architecture. Well, at least the part explaining 'Checkpoint not complete' error. I also read your very first answer on this page to a 'Checkpoint not complete' issue that William was having. I am confused after reading both the answers.
I have bolded the questions that I need answers for below.
The second square bullet on page 177 in your book says:
"Recreate the log files with a larger size. This will extend the amount of time between the time we fill the online redo log, and the time we need to reuse it."
One of the answers on this page says:
"Another way is to make the log files smaller, hence increasing the frequency with which we checkpoint"
These points seem to contradict each other. Can you qualify the points? In which case would you increase the log file size and in which case would you make it small?
We are getting "Checkpoint not complete" message in one of our databases. There are 4 redo log groups with single member in them - each sized to 200MB. Is it too big size? I would also like to say that it is not related to a slow ARCH process since I don't see "Archival required" message in the alert log.
DB_WRITER_PROCESSES is set to 2. FAST_START_MTTR_TARGET is set to 0. I don't care about recovery time right now. On page 94 of your book you say to increase DB_BLOCK_LRU_LATCHES parameter when I increase DB_WRITER_PROCESSES. This parameter is obsolete in 10.1.0.4. Then what's the equivalent parameter in 10.1.0.4? Our database is on RHEL4U3, Oracle 10.1.0.4.
DISK_ASYNC_IO is set to TRUE. The "Checkpoint not complete" message has occurred 3 times in the last 4 days since I have made the redo log file size 200MB and set up 4 redo log files. I am not sure what I should do now.
On a general note, I don't understand the logic behind making redo log file size smaller. The solution to get rid of this error is to spread out checkpoints as farther as possible. Making redo log file size small would cause more frequent log switches, in turn more frequent checkpoints - increasing the likelihood of this issue. What am I not understanding here?
You have also mentioned in the book about postponing the message. Is our aim only to postpone it? Is this message inevitable then?
Can you help?
February 27, 2007 - 2:37 pm UTC
they do not contradict.
think about the physics here.
make them bigger - less frequent need to checkpoint, might be able to make it from log to log without having to wait for the checkpoint to complete - you have more time.
make them smaller (but keep the same volume - eg: if you had 5gig of logs, keep 5gig of logs but make them smaller). Now you checkpoint more often, but more continously and maybe you keep up.
different ways to achieve the same goal.
ahh, but you do care about recovery time, if you set the fast start mttr target, you'll checkpoint more continuously and won't have so much to do as you switch from log to log - it is sort of like using smaller redo logs.
What should I do?
Prashant, February 27, 2007 - 2:58 pm UTC
I understand both the points. I guess the answer is 'depends'. ;-)
About the recovery, when I said I don't care about the recovery, I meant just NOW. I would rather get rid of these abrupt pauses in the database operation than to worry about the recovery time. Besides, FAST_START_MTTR_TARGET is set to 0 by default. I haven't changed it so it shouldn't affect my checkpointing. Correct?
But back to my main question, what do you think I should do? Why should I be getting this message with 200MB redo log file?
Thank you,
Prashant
February 27, 2007 - 3:10 pm UTC
fast start mttr is all about checkpointing, it sets up incremental checkpointing to make it so dbwr keeps the buffer cache clean and if the cache is clean, it doesn't take long to do a "full" checkpoint hence you don't have the checkpoint not completes
sort of like using lots of small logs! continuous checkpointing.
FAST_START_MTTR_TARGET
Prashant, February 27, 2007 - 4:40 pm UTC
How do I find out the history of when the checkpoints occur? Any v$ table that will tell me at what time an incremental or full checkpoint is occurring?
February 28, 2007 - 2:32 pm UTC
log checkpoints to alert init.ora parameter.
This is very puzzling...
Prashant, February 27, 2007 - 5:16 pm UTC
What am I missing here?
I have 5 redo log files each 200MB. I set FAST_START_MTTR_TARGET to 120 seconds. I am still getting "Checkpoint not complete" message at the bottom.
Tue Feb 27 16:41:38 2007
ALTER SYSTEM SET fast_start_mttr_target=120 SCOPE=BOTH;
Tue Feb 27 16:42:18 2007
Completed checkpoint up to RBA [0x32d6.2.10], SCN: 619134734
Tue Feb 27 16:51:21 2007
Incremental checkpoint up to RBA [0x32d6.13777.0], current log tail at RBA [0x32d6.21ec5.0]
Tue Feb 27 17:07:13 2007
Private_strands 48 at log switch
Beginning log switch checkpoint up to RBA [0x32d7.2.10], SCN: 0x0000.24e9eecf
Thread 1 advanced to log sequence 13015
Current log# 2 seq# 13015 mem# 0: /u03/flash_recovery_area/UP10G/onlinelog/o1_mf_2_2y1bz3fn_.log
Tue Feb 27 17:07:13 2007
ARC1: Evaluating archive thread 1 sequence 13014
Tue Feb 27 17:07:13 2007
ARC1: Beginning to archive thread 1 sequence 13014 (619134734-619310799) (up10g)
ARCH: Connecting to console port...
Tue Feb 27 17:07:13 2007
ARC1: Creating local archive destination LOG_ARCHIVE_DEST_10: '/u03/flash_recovery_area/UP10G/archive
log/2007_02_27/o1_mf_1_13014_%u_.arc' (thread 1 sequence 13014)
(up10g)
ARCH: Connecting to console port...
Tue Feb 27 17:07:23 2007
ARC1: Closing local archive destination LOG_ARCHIVE_DEST_10: '/u03/flash_recovery_area/UP10G/archivel
og/2007_02_27/o1_mf_1_13014_2y9bwkx0_.arc'
(up10g)
Tue Feb 27 17:07:24 2007
ARCH: Connecting to console port...
Committing creation of archivelog '/u03/flash_recovery_area/UP10G/archivelog/2007_02_27/o1_mf_1_13014
_2y9bwkx0_.arc'
Created Oracle managed file /u03/flash_recovery_area/UP10G/archivelog/2007_02_27/o1_mf_1_13014_2y9bwk
x0_.arc
Tue Feb 27 17:07:25 2007
ARC1: Completed archiving thread 1 sequence 13014 (619134734-619310799) (up10g)
ARCH: Connecting to console port...
Tue Feb 27 17:08:40 2007
Thread 1 cannot allocate new log, sequence 13016
Checkpoint not complete
Current log# 2 seq# 13015 mem# 0: /u03/flash_recovery_area/UP10G/onlinelog/o1_mf_2_2y1bz3fn_.log
What can I do?
February 28, 2007 - 2:33 pm UTC
so how fast are you switching logs at this point in time.
this /u03/ file system - got contention?
Log switching
Prashant, March 02, 2007 - 12:27 pm UTC
Log switching is happening every 20 to 30 minutes.
/u03/flash_recovery_area filesystem does have some contention.
My online redo logs, archive logs and backup image copies are on this filesystem.
How do I find out how much contention though?
March 04, 2007 - 6:19 pm UTC
operating system utilities specific to your operating system are typically used to measure the amount of activity and response times.
Checkpoint not complete
iwan, March 23, 2008 - 11:15 am UTC
Hi Tom,
We have redolog of 40 group 25m and still getting checkpoint no completed error. what should we do now?
March 24, 2008 - 11:18 am UTC
perhaps 40x25 is not sufficient - but re-read the original response and all of the review/followups above, they describe what causes this and what to do.
Parameters that Influence Checkpoints
Christophe, June 05, 2008 - 11:13 am UTC
Hi Tom,
First of all, thank you for all you explain us since so long.
I am working on a DB which gives ""checkpoint not complete, cannot allocate new log" alerts, but I don't understand what an Oracle doc explains.
Especially the "90%" point and the priority between checkpoints and actual work they do each.
Can You please help me ?
Thank you, best regards,
Christophe.
Metalink note 76713.1
(...)
"So, when Oracle does a checkpoint due to a log switch, it does not really have to do anything. The checkpoint position is already advanced out of the soon to be reused log. The log switch checkpoint does not actually do
anything right away. It is a "zero priority" checkpoint: it does not cause any writes to happen on its own, but is marked as complete only when checkpointing for other reasons causes the checkpoint position to advance
beyond the rba that marked the end of the log when the checkpoint was initiated.
Example:
--------
Assume you have two logs, A and B. When B fills, Oracle initiates a zero priority checkpoint to advance the checkpoint position to the end of log B.
However, because the checkpoint position is no more than 90% of the size of the smallest log behind the end of the log, the checkpoint position is roughly
10% of the way into log B when this checkpoint is initiated. There is no reason to do any writes, as log A can be reused immediately. So Oracle does nothing and lets the checkpoint position continue to advance based upon other factors. When the other factors cause the checkpoint position to advance beyond the end of log B, the zero priority checkpoint previously initiated is
declared complete and you see the checkpoint complete message.
Note: If you manually switch log files, the checkpoint position may not have advanced out of the log to be reused. In this case, Oracle initiates a real checkpoint and advances the checkpoint position up to the rba that marked the end of the log when the checkpoint
was initiated."
June 05, 2008 - 1:37 pm UTC
skipping all of the note stuff there - just read my original response above if you want to hear how I would say it.
basically the note is just saying "when we switch logs from A to B, we signal a checkpoint that must be completed before we can advance into A again"
ignore the 90% blah blah blah. It is just saying "that checkpoint must complete before A is reused, if it isn't you get checkpoint not complete - by default, when that checkpoint is signaled - we'd already have begun work on it - but we have a ways to go - if your configured online redo log is insufficient - we might not finish in time and have to signal the checkpoint not complete"
but - just read my original reply for my wording.
Pasko, June 06, 2008 - 10:12 am UTC
Hi,
if you are on 10GR2 you can use the following view to set the optimal values for fast_start_mttr_target and log file size.
select ir.optimal_logfile_size,
ir.log_file_size_redo_blks,
ir.estimated_mttr,
ir.target_mttr
from v$instance_recovery ir
start with fast_start_mttr_target=v$instance_recovery.estimated_mttr
and then keep watching the value of v$instance_recovery.optimal_logfile_size
Regards,
Pasko
Your opinion
Chaman, June 08, 2008 - 8:11 pm UTC
Hello Tom,
Is that the right way to set the optimal value ?
<<START>>
select ir.optimal_logfile_size,
ir.log_file_size_redo_blks,
ir.estimated_mttr,
ir.target_mttr
from v$instance_recovery ir
start with fast_start_mttr_target=v$instance_recovery.estimated_mttr
and then keep watching the value of v$instance_recovery.optimal_logfile_size
<<END>>
In our case the value of fast_start_mttr_target = 60 and the v$instance_recovery.estimated_mttr = 23.
No ARC0 meesages after the "Checkpoint not complete" message
Sandeep Kurian, January 14, 2009 - 9:54 am UTC
Hi Tom,
Great explanation for the alert log messages..
But something that baffles me is that after I had received the Checkpoint meesages in my alert log .. there are no further ARC0 messages similar to the following
ARC0: Evaluating archive log 12 thread 1 sequence 2595
ARC0: Beginning to archive log 12 thread 1 sequence 2595
Creating archive destination LOG_ARCHIVE_DEST_1: '/oracle/XXX/oraarch/XXXarch1_2595.dbf'
ARC0: Completed archiving log 12 thread 1 sequence 2595
Wed Jan 10 07:30:03 2007
When I got the Checkpoint error , the redolog were defined as such
4 online redolog groups with 2 members each and 50 MB each
Then I resized the redologs to 100MB.
The log buffer is 2 MB.
I still get the Checkpoint message.. and in addition to that there are no ARC0 messages though the archives are generated successfully.
The messages received during the checkpoint error are the following
************
Thread 1 advanced to log sequence 14288
Current log# 4 seq# 14288 mem# 0: /oracle/XXX/origlogB/log_g4m1.dbf
Current log# 4 seq# 14288 mem# 1: /oracle/XXX/mirrlogB/log_g4m2.dbf
Wed Jan 14 12:29:23 2009
Thread 1 cannot allocate new log, sequence 14289
Checkpoint not complete
Current log# 4 seq# 14288 mem# 0: /oracle/XXX/origlogB/log_g4m1.dbf
Current log# 4 seq# 14288 mem# 1: /oracle/XXX/mirrlogB/log_g4m2.dbf
Wed Jan 14 12:29:26 2009
Completed checkpoint up to RBA [0x37ce.2.10], SCN: 839619203
Wed Jan 14 12:29:30 2009
Beginning log switch checkpoint up to RBA [0x37d1.2.10], SCN: 839620666
Thread 1 advanced to log sequence 14289
Current log# 1 seq# 14289 mem# 0: /oracle/XXX/origlogA/log_g1m1.dbf
Current log# 1 seq# 14289 mem# 1: /oracle/XXX/mirrlogA/log_g1m2.dbf
Wed Jan 14 12:30:01 2009
Thread 1 cannot allocate new log, sequence 14290
Checkpoint not complete
Current log# 1 seq# 14289 mem# 0: /oracle/XXX/origlogA/log_g1m1.dbf
Current log# 1 seq# 14289 mem# 1: /oracle/XXX/mirrlogA/log_g1m2.dbf
Wed Jan 14 12:30:04 2009
Completed checkpoint up to RBA [0x37cf.2.10], SCN: 839620446
Wed Jan 14 12:30:06 2009
Beginning log switch checkpoint up to RBA [0x37d2.2.10], SCN: 839620958
**********************
Any advice would be much appreciated.
January 16, 2009 - 4:49 pm UTC
what init.ora changes did you also make....
Sandeep Kurian, January 15, 2009 - 5:45 am UTC
Meanwhile I am also getting the below messages in trace files XXX_arc0_22279.trc and XXX_arc1_22281.trc
*** 2009-01-15 11:34:23.950
kcrrwkx: nothing to do (end)
*** 2009-01-15 11:35:23.951
kcrrwkx: nothing to do (end)
*** 2009-01-15 11:36:23.951
kcrrwkx: nothing to do (end)
Metalink suggests this is a bug . But would this error suppress the ARC0 meesages in the alert log?
THanks in advance ,
Sandeep Kurian
Checkpoint not complete
Laksiri, April 01, 2009 - 5:51 am UTC
Dear Tom,
the infamous "checkpoint not complete, cannot allocate new log" message.
this occurrs when Oracle attempts to reuse a log file but the checkpoint that would flush
the blocks that may have redo in this log file has not yet completed -- we must wait
until that checkpoint completes before we can reuse that file -- thats when this message
is printed. during this time (when we cannot allocate a new log) processing is suspended
in the database while the checkpoint is made to complete ASAP.
In the above explanation System will suspend till checkpoint completes. I have a doubt there.
This will happen only when oracle try to overwrite the log file to restart the log cycle and again if there is commited data in the log file which is going to be over written. Other wise system function will continue with out suspending
AM i correct please explain
April 01, 2009 - 9:45 am UTC
well, you can also get "archival required", if ARCH cannot copy the redo logs fast enough - lgwr has to wait....
so, this will happen anytime we try to reuse a redo logfile and something that needs to be done to that redo hasn't finished yet.
Bernard L Mathysse, May 19, 2009 - 8:56 am UTC
We are also getting this error on our 11G database:
Thread 1 cannot allocate new log, sequence 1858
Private strand flush not complete
Oracle Metalink note 372557.1 says that a "strand" is a new 10gr2 term for redo latches and is not an error?
But it has lead to a database crash on our system:
Thread 1 cannot allocate new log, sequence 1851
Checkpoint not complete
Current log# 2 seq# 1850 mem# 0: M:\ORACLE\REDOB\BUKSPROD\REDO02B.LOG
Current log# 2 seq# 1850 mem# 1: N:\ORACLE\REDOA\BUKSPROD\REDO02A.LOG
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 2264'
The system crashed.
We have our current redo log set to 50Mbytes and find that they switch 1 to 2 times per hour during the day and 4 to 5 times during backups and exports ( the previous error happended during an export). We have our parameters set as so:
log_checkpoint_interval integer 0
log_checkpoint_timeout integer 0
fast_start_io_target integer 0
fast_start_mttr_target integer 360
When we query the v$instance_recovery view we get very large values for the OPTIMAL_LOGFILE_SIZE (e.g. 6836) That's in megabytes!
In order to prevent the 'cannot allocate' maybe we should be looking at changing the fast_start_mttr_target or even switching it off in favour of the log_checkpoint parameters?
May 23, 2009 - 12:04 pm UTC
please utilize support if you are having a crash? why would you come to a forum?
Checkpoints and hot buffers
just reader, June 05, 2009 - 2:21 pm UTC
Tom,
1. Will DBWR flash dirty buffers triggered by log switch or continuous checkpoint if one transaction updates blocks and another continually re-reads the same data effectively pinning those blocks in buffers at MRU end of LRU?
2. Is checkpoint information actually written to redo logs and if so does it happen right after DBWR finished?
June 08, 2009 - 12:41 pm UTC
1) sure, a dirty buffer needs to be written to disk eventually, it cannot stay in the cache only forever.
2) checkpoint *information* is written all over the place, the redo logs, control files, datafiles, etc. It is more or less constantly happening.
phs, June 10, 2009 - 9:33 pm UTC
hi tom
i read 435887.1 in metalink.
it said can use this method to simulate the isuus "log can't allocate';
SQL> alter system switch logfile;
System altered.
Alert log shows the following.
Thread 1 advanced to log sequence 206
Current log# 3 seq# 206 mem# 0: +GROUP01/oasm/onlinelog/group_3.258.609176403
Current log# 3 seq# 206 mem# 1: +GROUP01/oasm/onlinelog/group_3.263.609176613
Now connect to another session (using scott/tiger schema object in the example) and leave an update TX active.
SQL> update emp set sal = sal * 1;
14 rows updated.
... do not commit and keep session active
---in my opinion, this step only modify the db block in db cache,lgwr did't wirte the modify log into redo log,is it right?
Then manually switch the logs in the original session, and now the alert log shows the following messages.
Or the following.
Thread 1 cannot allocate new log, sequence 85
Private strand flush not complete
Current log# 3 seq# 84 mem# 0: C:\ORACLE\ORA11\ORADATA\ORCL11\REDO03.LOG
--this happen i guess , because we did't commit the update,so lgwr did't write the update log to redo log,
is that right?
so my question is when i did't commit a transcation,
will lgwr write the redo to redo log, and if it write,
how can you explain the above simulate issue happen?
June 11, 2009 - 8:10 am UTC
... ... do not commit and keep session active
---in my opinion, this step only modify the db block in db cache,lgwr did't
wirte the modify log into redo log,is it right?
...
your opinion is inaccurate. We write uncommitted transaction data into the redo log all of the time. the redo log buffer is flushed frequently, constantly - and out goes the committed redo stream as well as the uncommitted redo stream.
Think about this - you have a 1mb redo log buffer in the SGA (suppose you do, you configure the size of it). You update 100mb of data generating 200mb of redo (indexes and all). Since we have only 1mb of redo buffer in the cache, we must have put out to disk at least 199mb of redo before you get to commit...
Their test is very time sensitive. It may or may not happen that way.
My approach to doing this would be to flood the system with tons of transactions, after having configured far too few online redo logs. That would generate the cannot allocate new log.
Scofield, June 13, 2009 - 9:57 pm UTC
Tom;
I am in Chapter 3 in your Expert Oracle Database architecture book;
I have some issues during reading redolog files topic.
Suppose
I issue insert:
It is in buffer cache now, and it is in redolog buffer.(not yet written to redolog files)
1-)What if system crushes immediately before lgwr writes the data to redolog files? Is the data lost?
2-)Assume the lgwr writes the data to redologs, and system crashes before checkpoint.
Does oracle get the transaction from redolog and reexecute it.( so that it will again be buffer cache,and waiting for checkpoint)
or
put the data directly into datafiles bypass buffer cache?
3-) When I perform simple insert or update :
Does Oracle write the new row to buffercache and after checkpoint this data is written to disk (datafiles).
or
At First, Oracle retrieves the empty block from disk and put it into buffer cache and then perform insert into buffer cache, and again after checkpoint this will be written to disks
June 15, 2009 - 12:02 pm UTC
1) you did not commit, how could the data be lost? it doesn't exist when we come back up, you never committed - there was nothing to "lose"
2) if the redo was written to the redo log files - when we recover, we'll replay that transaction (roll forward) and then discover the session that did it did not commit - so we'll roll it back
3) when you insert we find a block that has sufficient free space on it and modify it in the buffer cache. when you update, we have to find the existing row on some existing block and we update that in the buffer cache.
either operation might require a read from disk to get the block into the cache.
Freeze durations?
Loïc, August 07, 2009 - 8:27 am UTC
Hi Tom,
is there a way to measure the duration of the database freeze while waiting for checkpoint to complete?
I would of course prefer "a database way" to "an operating system way" :)
Thanks in advance,
Loïc
August 07, 2009 - 10:22 am UTC
In the original question - we see:
Sun Feb 13 03:30:26 2000
>>>>>Thread 1 cannot allocate new log, sequence 41801
>>>>>Checkpoint not complete
Current log# 1 seq# 41800 mem# 0: /kla/klaquest/dym/keep/redo1.log
Current log# 1 seq# 41800 mem# 1: /kla/qdb1/dym/keep/redo1a.log
Sun Feb 13 03:30:38 2000
Thread 1 advanced to log sequence 41801
so, the duration was 12 seconds.
that is the most straight forward method I know. You could use ASH (active session history) to see if people have waited on "log file switch" - but that would mean you would have to have people blocked during that period...
A try
Loïc, August 07, 2009 - 1:24 pm UTC
Thanks Tom to point ASH.
Could you maybe advise for the following query:
WITH my_data AS (SELECT sample_id,
cast(sample_time AS DATE) AS sample_time,
MAX(time_waited) / 1000000 AS time_waited_s,
min(event) AS event
FROM v$active_session_history
WHERE event like '%log file switch%'
AND session_state='WAITING'
GROUP BY sample_id, sample_time),
my_data2 AS (SELECT row_number() over (ORDER BY sample_id) r,
sample_id,
sample_time,
decode(sample_id - lag(sample_id) over (ORDER BY sample_id),1,1,NULL) AS diff,
time_waited_s,
event
FROM my_data)
SELECT d.sample_time,
d.event,
(SELECT SUM(time_waited_s) FROM my_data2 d2 WHERE d2.r BETWEEN d.r AND d.r2) AS time_waited_s
FROM (SELECT d.*,
nvl((SELECT MIN(d2.r)-1 FROM my_data2 d2 WHERE d2.r > d.r AND d2.diff IS NULL),
(SELECT MAX(d2.r) FROM my_data2 d2))AS r2
FROM my_data2 d
) d
WHERE diff IS NULL;
Which aims to say when a freeze occured and of course the duration of the freeze:
SAMPLE_TIME EVENT TIME_WAITED_S
----------- -------------------------------------------------- -------------
07/08/2009 log file switch completion 10.372336
07/08/2009 log file switch completion 5.975531
07/08/2009 log file switch (private strand flush incomplete) 0.403701
07/08/2009 log file switch completion 1.864275
07/08/2009 log file switch completion 0.043123
07/08/2009 log file switch (private strand flush incomplete) 2.619294
07/08/2009 log file switch (private strand flush incomplete) 0.549646
07/08/2009 log file switch (private strand flush incomplete) 0.940027
07/08/2009 log file switch completion 6.762621
07/08/2009 log file switch completion 6.83921
07/08/2009 log file switch (checkpoint incomplete) 1.609012
07/08/2009 log file switch (checkpoint incomplete) 2.531256
07/08/2009 log file switch completion 0.643473
07/08/2009 log file switch (checkpoint incomplete) 1.954845
07/08/2009 log file switch (checkpoint incomplete) 0.103907
07/08/2009 log file switch completion 0.97085
07/08/2009 log file switch (private strand flush incomplete) 0.976165
07/08/2009 log file switch completion 0.167953
07/08/2009 log file switch (checkpoint incomplete) 1.856985
07/08/2009 log file switch completion 0.734215
07/08/2009 log file switch (checkpoint incomplete) 3.642008
07/08/2009 log file switch (checkpoint incomplete) 3.733871
07/08/2009 log file switch (private strand flush incomplete) 2.666382
07/08/2009 log file switch (private strand flush incomplete) 0.824016
07/08/2009 log file switch (private strand flush incomplete) 0.343181
07/08/2009 log file switch completion 0.015503
07/08/2009 log file switch (private strand flush incomplete) 0.536985
07/08/2009 log file switch completion 0.976511
07/08/2009 log file switch (private strand flush incomplete) 0.978144
07/08/2009 log file switch completion 0.060924
Thanks in advance,
Loïc
August 07, 2009 - 2:41 pm UTC
... Which aims to say when a freeze occured and of course the duration of the freeze: ...
It would be more accurate to say
It shows me when my users were impacted by the checkpoint not complete - it might have happened hundreds of other times, but no one waited on it.
And it shows me how long they waited, it might have been much longer in duration but my users were not doing anything right then, so they were NOT waiting on it.
Oups
Loïc, August 07, 2009 - 1:30 pm UTC
Of course a
...
SELECT to_char(d.sample_time,'DD/MM/YYYY HH24:MI:SS') AS sample_time,
d.event,
round((SELECT SUM(time_waited_s) FROM my_data2 d2 WHERE d2.r BETWEEN d.r AND d.r2),1) AS time_waited_s
...
to format would be better.
Loïc
Private Strand Flush Not Complete
Santosh Kumar, August 08, 2009 - 3:46 am UTC
Today I was going through an OTN discussion on "When log_buffer flush?".
http://forums.oracle.com/forums/thread.jspa?threadID=940809 The discussion has used a new terminology(new for me) "Private Redo", which I think is introduced in 10g as above mentioned metalink note states.
I request you to please explain the terminology "Private Redo" and "strand".
Thanks,
Santosh
August 11, 2009 - 12:59 pm UTC
private redo should be somewhat self defining. It is not commingled in the redo buffer with everything else.
now, one definition of strand is "To separate from other elements in a construction, either by moving it out of the construction or moving the rest of the construction."
so basically, it is a private redo strand (separated) in hopes of optimizing the commit and redo generation processing. If your transaction is relatively small, you'll generate all of your redo into a private strand (which increases concurrency, instead of fighting to get into the redo log buffer in general) and when you commit, you write your private strand out. If your transaction gets large - you'll end up going "public" into the greater redo log buffer in general.
Private Strands
Santosh Kumar, August 12, 2009 - 12:57 am UTC
Thanks Tom,
Your simple words are more elaborative and make things look easy.
Will you please clear one more thing, what do you mean by "relatively small". Can we quantify this? Is similar sort of structure present for undo also?
Regards,
Santosh Kumar
August 13, 2009 - 11:35 am UTC
I use 'relatively small' just to be vague - It would be too much information to go further. We can change it (the size), we probably have changed it, it will change. You cannot control it...
about the archive log files and dataguard setup
Habeeb, January 07, 2010 - 12:45 am UTC
hi tom
the redolog files & lgwr activites explained by you are really greate
however i would like you to answer the same with respect to a dataguard configuration
i get messages as
ARC7: Evaluating archive log 9 thread 1 sequence 422493
ARC7: Destination LOG_ARCHIVE_DEST_3 archival not expedited
ARC7: Destination LOG_ARCHIVE_DEST_2 archival not expedited
ARC7: Beginning to archive log 9 thread 1 sequence 422493
Creating archive destination LOG_ARCHIVE_DEST_1: '/archival/ONLINEPD/arch/ONLINEPD_1_422493.arc'
Invoking non-expedited destination LOG_ARCHIVE_DEST_2 thread 1 sequence 422493 host ONLINEPD_DR
Invoking non-expedited destination LOG_ARCHIVE_DEST_3 thread 1 sequence 422493 host ONLINEPD_FALL
ARC7: Completed archiving log 9 thread 1 sequence 422493
ARC7: Begin FAL noexpedite archive (thread 1 sequence 422456 destination ONLINEPD_FALL)
in my prod alert ,so what should i get
i am really confused
January 11, 2010 - 8:13 am UTC
looks OK to me, what are you thinking is wrong?
this just shows you are using the archive moving of redo - not lgwr - but arch - to move the redo.
A reader, February 05, 2010 - 8:17 pm UTC
Hi sir;
Just for the experiement:
I shutdown the database and restore old redologs and create a controlfile with noresetlogs option
It fails with:
ORA-01229: data file 4 is inconsistent with logs.
How can oracle understand if the current online redologs are consistent or not?
Is there a data dictionary view that I can check wheter redologs are consistent or not?
For the controlfiles and datafiles, I can compare the checkpointScn in v$datafile and v$datafile_header.
However I have no idea how to compare redologs?
February 08, 2010 - 9:36 pm UTC
because the data file headers have information in them, that is where your create controlfile statement filled in all of the missing bits of information in the control files - from the datafiles themselves.
A reader, February 13, 2010 - 4:54 pm UTC
Thanks sir.
Can I see this information in v$datafile_header view? or any other view?
February 16, 2010 - 12:48 pm UTC
see what information - v$datafile and v$datafile_header is totally documented, you know what is there. What are you looking for specifically (and read the docs on these views to see if you cannot answer your own question)
for the original question
mammut, March 09, 2010 - 6:41 am UTC
there is a very nice troubleshooting guide on metalink
"Checkpoint Tuning and Troubleshooting Guide [ID 147468.1]"
plz don't make fun of me thnks
March 09, 2010 - 11:51 am UTC
don't use German postal codes (PLZ) and I won't make fun of you.
If you use vowels where they are called for - I'll even say thanks from time to time.
Checkpoints Complete At Once
Jim, October 27, 2010 - 2:00 pm UTC
Hi Tom
i just recently upgraded from 10.2.0.4 to 10.2.0.5 on a test server and was running some checks
In creating some test tables using a select statement such as
create table <tablename> as select * from <tablename>;
Note: the table has a total 5377922 records
I noticed that i was getting the alert log entries as stated in this thread
afterwards then the checkpoints completed all at once
Wed Oct 27 11:37:15 Pacific Daylight Time 2010
Beginning log switch checkpoint up to RBA [0x10bca.2.10], SCN: 1263038699
Wed Oct 27 11:37:15 Pacific Daylight Time 2010
Thread 1 advanced to log sequence 68554 (LGWR switch)
Current log# 1 seq# 68554 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_01A.LOG
Current log# 1 seq# 68554 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_01B.LOG
Wed Oct 27 11:37:26 Pacific Daylight Time 2010
Thread 1 cannot allocate new log, sequence 68555
Checkpoint not complete
Current log# 1 seq# 68554 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_01A.LOG
Current log# 1 seq# 68554 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_01B.LOG
Wed Oct 27 11:37:26 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bc7.2.10], SCN: 1263032435
Wed Oct 27 11:37:27 Pacific Daylight Time 2010
Beginning log switch checkpoint up to RBA [0x10bcb.2.10], SCN: 1263040775
Wed Oct 27 11:37:27 Pacific Daylight Time 2010
Thread 1 advanced to log sequence 68555 (LGWR switch)
Current log# 2 seq# 68555 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_02A.LOG
Current log# 2 seq# 68555 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_02B.LOG
Wed Oct 27 11:37:40 Pacific Daylight Time 2010
Thread 1 cannot allocate new log, sequence 68556
Checkpoint not complete
Current log# 2 seq# 68555 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_02A.LOG
Current log# 2 seq# 68555 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_02B.LOG
Wed Oct 27 11:37:40 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bc8.2.10], SCN: 1263034573
Wed Oct 27 11:37:41 Pacific Daylight Time 2010
Beginning log switch checkpoint up to RBA [0x10bcc.2.10], SCN: 1263042841
Wed Oct 27 11:37:41 Pacific Daylight Time 2010
Thread 1 advanced to log sequence 68556 (LGWR switch)
Current log# 4 seq# 68556 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_04A.LOG
Current log# 4 seq# 68556 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_04B.LOG
Wed Oct 27 11:37:52 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bc9.2.10], SCN: 1263036633
Wed Oct 27 11:37:53 Pacific Daylight Time 2010
Beginning log switch checkpoint up to RBA [0x10bcd.2.10], SCN: 1263044915
Wed Oct 27 11:37:53 Pacific Daylight Time 2010
Thread 1 advanced to log sequence 68557 (LGWR switch)
Current log# 3 seq# 68557 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_03A.LOG
Current log# 3 seq# 68557 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_03B.LOG
Wed Oct 27 11:38:05 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bca.2.10], SCN: 1263038699
Wed Oct 27 11:38:06 Pacific Daylight Time 2010
Beginning log switch checkpoint up to RBA [0x10bce.2.10], SCN: 1263046977
Wed Oct 27 11:38:06 Pacific Daylight Time 2010
Thread 1 advanced to log sequence 68558 (LGWR switch)
Current log# 5 seq# 68558 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_05A.LOG
Current log# 5 seq# 68558 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_05B.LOG
Wed Oct 27 11:38:19 Pacific Daylight Time 2010
Thread 1 cannot allocate new log, sequence 68559
Checkpoint not complete
Current log# 5 seq# 68558 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_05A.LOG
Current log# 5 seq# 68558 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_05B.LOG
Wed Oct 27 11:38:19 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bcb.2.10], SCN: 1263040775
Wed Oct 27 11:38:20 Pacific Daylight Time 2010
Beginning log switch checkpoint up to RBA [0x10bcf.2.10], SCN: 1263049037
Wed Oct 27 11:38:20 Pacific Daylight Time 2010
Thread 1 advanced to log sequence 68559 (LGWR switch)
Current log# 1 seq# 68559 mem# 0: K:\ORACLE\ORADATA\LLPROD\REDO_01A.LOG
Current log# 1 seq# 68559 mem# 1: I:\ORACLE\ORADATA\LLPROD\REDO_01B.LOG
Wed Oct 27 11:38:42 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bcc.2.10], SCN: 1263042841
Wed Oct 27 11:42:57 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bcd.2.10], SCN: 1263044915
Wed Oct 27 11:43:09 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bce.2.10], SCN: 1263046977
Wed Oct 27 11:43:21 Pacific Daylight Time 2010
Completed checkpoint up to RBA [0x10bcf.2.10], SCN: 1263049037
******************************************************
Redo Logs (Redo Log Groups/Status/Size)
******************************************************
Member Member Group Group
Group# Status Archived Sequence# Members Bytes MBytes Bytes MBytes
---------- ---------------- -------- ---------- ---------- ---------------- ------- -------------- -------
1 INACTIVE YES 68549 2 134,217,728 128 268,435,456 256
2 CURRENT NO 68550 2 134,217,728 128 268,435,456 256
3 INACTIVE YES 68547 2 134,217,728 128 268,435,456 256
4 INACTIVE YES 68546 2 134,217,728 128 268,435,456 256
5 INACTIVE YES 68548 2 134,217,728 128 268,435,456 256
---------------- ------- -------------- -------
sum 671,088,640 640 1,342,177,280 1,280
Can you tell me where i need to look in order to determine what the cause of "Checkpoint not complete" is ?
October 27, 2010 - 2:12 pm UTC
the cause of the checkpoint not complete is that you generated more blocks in the buffer cache that were dirty than DBWR was able to flush out in time.
There would have been dirty blocks in the cache (NOT from the create table as select - those go straight to disk). Those blocks would have been protected by the redo in the various redo log files. We cannot reuse the redo log files that protect those blocks until DBWR gets the blocks onto disk.
So, for example, say block 42 is dirty in the cache. Suppose further that block 42 has some redo in redo log file A. And suppose you have just two log files.
You are running your create table as select. Assuming you are in archive log mode - that will generate REDO (but not buffers in the cache to clean out). So, redo log A fills up and we switch to redo log B. The create table as select's you are running are filling up B like crazy. DBWR is doing a checkpoint - but hasn't gotten to block 42 yet (there were lots of blocks to flush).
Now the create table as select's fill up redo log B - and we need to switch to A. But - we cannot - redo log A is needed to recover block 42 in the event of a crash - and since block 42 isn't checkpointed yet, we have to wait. We get "checkpoint not complete"
So, that is WHY it happens. It could have been (probably was) caused by the sudden and large redo stream you generated with the create table as selects. The way to fix that would be to:
a) add more redo log files - so that DBWR has a chance to catch up from this sudden big spurt of redo generation
b) use the fast_start_mttr_target to make DBWR more aggressive about keeping the buffer cache clean
c) make dbwr 'faster' (more disks, less contention on disks, whatever it takes)
Follow-up
Jim, October 28, 2010 - 6:06 pm UTC
Wow !
Thanks Tom for taking the time to assit me
When you advised as one of the options to add more redo files, did you mean additional group(s) or additonal members within the exiting groups ?
November 01, 2010 - 4:56 am UTC
adding another member would just increase redundancy
you want to add another group with members - to give you more overall online redo log storage.
Manoj Kaparwan, October 29, 2010 - 7:58 pm UTC
Additional groups.
A reader, November 02, 2010 - 11:20 am UTC
Dear Sir,
In above explanation regarding checkpoint you said that the block created by CTAS goes directly to disk and than later in the next paragraph you mention
"...The create table as select's you are running are filling up B like crazy. DBWR is doing a checkpoint - but hasn't gotten to block 42 yet (there were lots of blocks to flush)."
Do you mean to say that the redo which is produce during CTAS is to protect undo related to data dictionary?
Could you please explain for what redo is generate during CTAS?
Many thanks
November 02, 2010 - 11:37 am UTC
the redo generated during the create table as select is the redo generated to protect the table from media (disk) failure whilst running in archive log mode. In archive log mode (which I assume most situations reflect), CTAS logs the table to redo - it generates redo for everything.
Follow-Up
Jim, November 02, 2010 - 1:18 pm UTC
Thanks Tom and Manoj Kaparwan
I was also looking at the OPTIMAL_LOGFILE_SIZE as mentioned above in v$instance_recovery and according to it i need 4GB of REDO:
select OPTIMAL_LOGFILE_SIZE "RedoSize" from v$instance_recovery;
RedoSize
----------
4342
In your opinion, how much stock should I place in this value ?
I would have to add about another 3GB of REDO, if i go by the value.
Thanks
Jim
November 03, 2010 - 1:20 am UTC
here is a good quick overview of that bit of advice
http://www.orafaq.com/node/1437 it is driven by your fast_start_mttr_target basically - it shows how large to make the redo logs so that checkpoints are initiated by your mttr target - not so much by log switches.
Thanks For The Assist
Jim, November 04, 2010 - 2:08 pm UTC
Just a thank you for your time
Jim
Is logfile multiplexing playing a role ?
Christophe, December 02, 2010 - 4:25 am UTC
Hello,
I've followed your advice and added a new logfile group, but the problem is still hanging around (certainly less frequent but still there).
The instance is a migration from and old server to a new one and from 9i to 11gR2, and we have the exact same workload against the instance, but I can't find even once the problem into the old instance (9i) alert.log ! the only difference (concerning logfiles) between the two configuration is multiplexing, with 9i we had only two copies and then with 11g we have 3 copies of the logfiles. I'm wondering if in that matter multiplexing can play a role ?
December 07, 2010 - 9:04 am UTC
It is more likely that either:
a) you are generating more redo log on 11g, since you have the old alerts - you should be able to see how frequently you were switching in the past.
b) your redo log devices are not as efficient on the new hardware as the old.
Having a 3rd member would not likely affect this as the log files are written to in parallel - all three should be written do simultaneously.
Checkpoing and mttr
rishi, December 22, 2010 - 8:55 am UTC
I gone through this discussions,most of time tom recommend sizing log files or increasing number for handling issues like checkpoing not complete, can tom suggest fast_mttr_target parameter helps in this case if it is not set and we are getting the error,
2)if we enable fast_mttr_target parameter can it help in reducing number of redo logs during batch job or it will have no effect
thanks
December 22, 2010 - 2:32 pm UTC
yes, using something that makes dbwr aggressively clean the buffer cache - like fast start mttr target, will alleviate the "cannot allocate new logs".
All you need to do is make it so that somehow dbwr can get the blocks protected by some redo log file into the datafiles before we need to reuse that redo log file. More redo is one approach, keeping the cache 'clean' is another.
fast_start_mttr_target cannot affect the aggregate amount of redo generated.
Alexander, May 04, 2011 - 2:29 pm UTC
Hi,
You wrote in your last book checkpoint not complete messages are bad and the database basically freezes up until allocate a new log. I'm seeing many Thread 1 cannot allocate new log, sequence xxxxx without being accompanied by the checkpoint message. What does these mean? Is it still bad? I assume this means I still need to add more redo logs.
Thanks.
May 04, 2011 - 3:27 pm UTC
show me surrounding text from the alert log.
Alexander, May 04, 2011 - 3:34 pm UTC
It seems to be coinciding with our archivelog backups:
Starting control autobackup
Control autobackup written to SBT_TAPE device
comment 'API Version 2.0,MMS Version 5.3.3.0',
media '542110'
handle 'OCMGW01P_c-3268357615-20110504-3d.bkp'
Wed May 4 15:45:11 2011
ALTER SYSTEM ARCHIVE LOG
Wed May 4 15:45:11 2011
Thread 1 cannot allocate new log, sequence 24243
Private strand flush not complete
Current log# 2 seq# 24242 mem# 0: /ocmgw01p/oralogs01/redoOCMGW01P_02.log
Current log# 2 seq# 24242 mem# 1: /ocmgw01p/oralogs02/redoOCMGW01P_02.log
Wed May 4 15:45:11 2011
Thread 1 advanced to log sequence 24243 (LGWR switch)
Current log# 3 seq# 24243 mem# 0: /ocmgw01p/oralogs02/redoOCMGW01P_03.log
Current log# 3 seq# 24243 mem# 1: /ocmgw01p/oralogs03/redoOCMGW01P_03.log
Wed May 4 15:45:11 2011
ALTER SYSTEM ARCHIVE LOG
Wed May 4 15:45:11 2011
Thread 1 cannot allocate new log, sequence 24244
Private strand flush not complete
Current log# 3 seq# 24243 mem# 0: /ocmgw01p/oralogs02/redoOCMGW01P_03.log
Current log# 3 seq# 24243 mem# 1: /ocmgw01p/oralogs03/redoOCMGW01P_03.log
Wed May 4 15:45:11 2011
May 04, 2011 - 3:41 pm UTC
This message just means we haven't completed a LGWR write yet and something is trying to make us advance the log - we have to wait for that write to complete.
Checkpint not complete
Rakesh, July 13, 2011 - 4:24 am UTC
Tom,
If we get checkpoint not complete error. Is it possible that committed data will not be wriiten to the disk?
July 13, 2011 - 8:12 pm UTC
nope, not a chance of that happening. you get checkpoint not complete, cannot allocate new log when we try to switch log files - when it happens - the database is suspended (so no one is committing right then and there and anyone that did - already has their stuff in the log files - it is safe)
Moving to SSD drives
Allen, November 22, 2011 - 10:00 am UTC
Tom,
In the past redolog files where put on alternating drive letters while archive logs where put on a single drive. We saw an improvement when loading a large number or records from one Oracle DB to another.
Why log? The data was being moved from SDE (spacial database engine) database to a non SDE database. Logically an Oracle link can not move a source data type to a destination where that datatype is not. Because of online updates to other parts of the table, the table cannot be taken offline to load.
We are looking at SSD drives. Putting all the redologs on one drive seems ok. Faith in hardware has gone way up since 1980.
My concern is that testing will show no difference whith putting all the redo logs on one drive untill the user creates a task that was not tested for.
I know there are few to none rules of thumb as "my milage may vary" but any thoughts would be appreciated.
Windows 2008R2, Oracle 11gR2 for the destination, IBM supplied vendor package on the destination so modifications / tuning is difficult at best.
Thanks in advance.
November 22, 2011 - 12:58 pm UTC
We are looking at SSD drives. Putting all the redologs on one drive seems
ok. Faith in hardware has gone way up since 1980.
In my experience that might not do very much. redo writing is large sequential writes. spinning disks are good at that. No seeking really. SSD is "ok performance for writes" and "ok performance for large sequential reads (you tend to max out IO bandwidth for both magnetic and ssd transfers)" and "awesome performance for random reads".
My concern is that testing will show no difference whith putting all the
redo logs on one drive untill the user creates a task that was not tested for.
I'm sorry, but I could not parse that sentence. What are you trying to ask?
SSD for redo logs follow up
Allen, November 22, 2011 - 3:11 pm UTC
Hi,
Thanks for the quick response. Yea the SSD are great for IOPS and random reads. Without having tried them – hmmm - their throughput 6gb/s may be maxed out. Actually the network may be the limiting factor. We only have a 10Gig-E, not the 40gig backplane of an ExaData, for the disks.
As for the second part that could not be parsed (turns red) the other DBA did not understand it either. Sorry. I meant that even tons of application testing may not bring up issues. The DB works great with all the redo-logs on one SSD drive instead of separate logs on separate drives. That is *until* the user does something that was never tested for. Then performance goes to heck in a hand basket.
Looking back the question is more of a statement since everything works great under known test conditions. It’s the new ones coming out in the next upgrade that may, just may, kill performance.
As a new question on redo logs, perhaps I’m confused about disk activity with redo log reads & writes. Say users A, B and C start working. Tables are updated, rows inserted – the world is great. These changes are written to the redo logs until either a ‘commit work;’ or “snap shot too old” occurs.
If the rare “roll back” occurs or when the redo logs are archived; the logs are read. Hence after a log switch, redo log N is being read while the next redo log is being written to. [warning blanket statement follows] Even with disk cache and RAID-10 there is still physical disk drive head contention. I have no proof of this, just theories.
Here SSD drives are great. No seek time, no latency time. While the network backplane may be a bottle neck – moving the redo logs to separate disk drive letters pointing to physically separate drive arrays made a difference. Well in one case. One site out of millions of ORACLE sites.
Please let me know what you think. Thanks.
November 22, 2011 - 7:10 pm UTC
These changes are written to the
redo logs until either a ‘commit work;’ or “snap shot too old” occurs.
Sorry - could not parse that :)
what does a commit or ora-1555 have to do with anything? A commit will wait (by default) for the rest of the redo to be written, an ora-1555 has no relevance to redo (it is all about UNDO). But a commit will not cause a sudden flood of redo to be written as we are constantly streaming that stuff in the background.
the redo logs are NOT READ during a rollback, again that is all about UNDO - *not* redo. We do not read the redo logs during normal database activity. We use UNDO to rollback, we use UNDO for read consistency. We read redo during instance recovery and media recovery. Not during normal database processing.
I
have no proof of this, just theories.
theory be gone, we don't read redo during normal database processing.
ARCH reads redo to archive it.
Streams might read redo if you are using it.
But the database itself does not.
cio
lh, November 23, 2011 - 7:23 am UTC
One way to boost dbwr performance could be to use CIO (concurrent io) where it is supported.
Final on SSD for redo logs
Allen Shepard, November 23, 2011 - 9:49 am UTC
Tom,
Thanks.
"But a commit will not cause a sudden flood of redo to be written as we are constantly streaming that stuff in the background." - did not know, or had forgotten, that part.
Hmmm "ARCH reads redo to archive it. Streams might read redo if you are using it." - more to look into. Hope to post real world results. We run two I/O sensitive applications.
Plus new things to play with - (clears throat) "Evaluate different archetutal constructs" - yea. That's my story and I'm sticking too it.
Happy Thanksgiving. Be back in December.
Allen
is sequence unique across different threads ?
j.h zheng, March 04, 2012 - 7:55 am UTC
hi tom
i was wondering if redo log sequence is unique across different threads or just unique within the specific thread when redo advances in the rac ?
thanks .
March 04, 2012 - 9:35 am UTC
sequences are unique within a thread.
On checkpoints, MTTR and REDO logs
Mette, April 09, 2012 - 3:23 pm UTC
Hello Tom,
One out of customers has come to us with a database set up like this:
Version 10.2.0.5 on Windows
log_checkpoint_timeout = 0 - set to zero ... how long may a block remain dirty before written to disk
fast_start_mttr_target = 0 - set to zero ... etstimated time go recover from a crash
log:_checpoint_interval = 0 - unset ... how many blocks written to redo before a checkpoint
8 redo groups with 50 MB logfiles each.
They automatically switch logfiles every half hour.
When we look at the v$log it always looks like this (with one current and the other files ACTIVE)
I know it's becase the DBWR has a lazy settnig with the above mentioned parameters, but could you explain
it to us what actually happens when the 3 params are 0.
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- -------------------
1 1 37626 262144000 2 YES ACTIVE 1233427710 03 04 2012 09:49:07
2 1 37627 262144000 2 YES ACTIVE 1233451270 03 04 2012 10:19:10
3 1 37628 262144000 2 YES ACTIVE 1233468362 03 04 2012 10:49:10
4 1 37629 262144000 2 YES ACTIVE 1233486682 03 04 2012 11:19:09
5 1 37630 262144000 2 YES ACTIVE 1233502929 03 04 2012 11:49:09
6 1 37631 262144000 2 YES ACTIVE 1233520608 03 04 2012 12:19:09
7 1 37632 262144000 2 NO CURRENT 1233536309 03 04 2012 12:49:08 ---<<<<<<<<
8 1 37625 262144000 2 YES ACTIVE 1233406028 03 04 2012 09:19:08 ---<<<< Still active after 4 hours - has DBWR fallen asleep?
At every logfile shift they get a "checkpoint not completed" - and the customer does not like it, even though we state that
this is not an error - just a wait situation. But I would like to change it for him. I'm tempted just to reset til checkpoint_timeout to default (1800 sec).
In the manual it is clearly specified that log_checkpoint_timeout is NOT set to 0 unless the mttr-taget is set (everything is 0 in this inherited system).
Or should I set the mttr_target to maybe 5-10 min instead? Or both?
I can see what is happening but could you shed some light on the 3 above parameters effect here?
best regards
Mette
undesirede settings
Mette, April 10, 2012 - 12:45 am UTC
Hi Tom, and thanks for the clerification.
So basicaly you say that there have not been any completed checkpoints the last 4 hours because of these weird settings ?
So we have a very very lazy dbwr ?
---
Yes, I'll go for chaning to defaults.
Mette
April 10, 2012 - 7:40 am UTC
I refuse to call it "lazy", you told it "no worries - do whatever".
Private strand flush not complete after completed checkpoint
Milan, June 02, 2012 - 4:54 am UTC
Hello Tom,
I have the problem "cannot allocate" also. My reason is "Private strand flush not complete". I can see from the alert.log the begining and completing of the log switch checkpoint (the data protected by log 2):
Sat Jun 02 10:00:20 2012
Thread 1 cannot allocate new log, sequence 43733
Private strand flush not complete
Current log# 2 seq# 43732 mem# 0: C:\ORACLE\ORADATA\OMGN\REDO02.RDO
Current log# 2 seq# 43732 mem# 1: C:\ORACLE\ONLINELOGS\OMGN\REDO02.RDO
Beginning log switch checkpoint up to RBA [0xaad5.2.10], SCN: 4639864294
Thread 1 advanced to log sequence 43733 (LGWR switch)
Current log# 3 seq# 43733 mem# 0: C:\ORACLE\ORADATA\OMGN\REDO03.RDO
Current log# 3 seq# 43733 mem# 1: C:\ORACLE\ONLINELOGS\OMGN\REDO03.RDO
Sat Jun 02 10:00:26 2012
Archived Log entry 87461 added for thread 1 sequence 43732 ID 0xf6b593e5 dest 1:
Archived Log entry 87462 added for thread 1 sequence 43732 ID 0xf6b593e5 dest 2:
Sat Jun 02 10:03:26 2012
Completed checkpoint up to RBA [0xaad5.2.10], SCN: 4639864294
So I suppose the checkpoint is completed (the data protected by log 2). But next time the log 2 should be used I see the same message. Could you please explain what does "Private strand flush not complete" mean in this case? Thanks.
June 04, 2012 - 7:25 am UTC
Have you demonstrated in any way that this is truly causing a performance issue of any sort? There are insufficient timestamps on this unfortunately - we don't know how much time transpires during this:
Sat Jun 02 10:00:20 2012
Thread 1 cannot allocate new log, sequence 43733
Private strand flush not complete
Current log# 2 seq# 43732 mem# 0: C:\ORACLE\ORADATA\OMGN\REDO02.RDO
Current log# 2 seq# 43732 mem# 1: C:\ORACLE\ONLINELOGS\OMGN\REDO02.RDO
Beginning log switch checkpoint up to RBA [0xaad5.2.10], SCN: 4639864294
Thread 1 advanced to log sequence 43733 (LGWR switch)
and that is all that counts - the next timestamp is for the archival which we don't care about.
This message is written anytime there is a small active transaction (small enough to be using a private strand) during a log switch. We have to flush their redo and then proceed. By definition - this should be a very small event (since if the transaction gets "big", it wouldn't be using a private strand)
Can "Checkpoint not complete" error cause ORA-01033
A reader, October 15, 2012 - 5:47 pm UTC
I have a followup query regarding this error in relation to an ORA-01033 error which is being generated by the Oracle instance.
Upon trawling through the alert log, I find this error (cannot allocate new log,Checkpoint not complete) listed a minute or so AFTER the actual ORA-01033 event but nothing else directly pointing to the immediate shutdown.
Are/Can these events be related?
The earlier posts also mention that this checkpoint error suspends all processing in the database, Can you elucidate further on this aspect sil vous plait?
October 16, 2012 - 9:29 am UTC
no.
and it is not an error, it is a warning, the database will be "frozen" during that time as dbwr goes into overdrive to catch up.
everything not related to flushing blocks to disk will be put on hold, everyone will experience a wait event while dbwr goes to work.
sam, May 07, 2014 - 10:29 pm UTC
Tom,
I have an 11g very small idle database and I keep seeing this message in the alert log
Wed May 07 01:04:26 2014
Thread 1 cannot allocate new log, sequence 2331
Private strand flush not complete
Current log# 2 seq# 2330 mem# 0: /u01/app/oracle/oradata/finp/redo02.log
Thread 1 advanced to log sequence 2331 (LGWR switch)
Current log# 1 seq# 2331 mem# 0: /u01/app/oracle/oradata/finp/redo01.log
Wed May 07 01:04:30 2014
Archived Log entry 2263 added for thread 1 sequence 2330 ID 0x8c16f947 dest 1:
Wed May 07 01:13:50 2014
TABLE SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY: ADDED INTERVAL PARTITION SYS_P422 (118) VALUES LESS THAN (TO_DATE(' 2014-05-07 06:55:28', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
Wed May 07 02:00:00 2014
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Wed May 07 02:57:33 2014
Thread 1 cannot allocate new log, sequence 2332
Private strand flush not complete
Current log# 1 seq# 2331 mem# 0: /u01/app/oracle/oradata/finp/redo01.log
Thread 1 advanced to log sequence 2332 (LGWR switch)
I saw your answer on May 4, 2011 but you do not say what do to fix it or just simply ignore it. Do you do anything to resolve this?
<<<This message just means we haven't completed a LGWR write yet and something is trying to make us advance the log - we have to wait for that write to complete. >>>
I have 3 redo logs sized at 50 meg and 2 groups (defaults).