Skip to Main Content
  • Questions
  • Uncommitted transactions in Data Files and Redo Log files

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Kamran.

Asked: April 28, 2009 - 12:57 am UTC

Last updated: December 13, 2019 - 3:14 am UTC

Version: 10.2.0

Viewed 10K+ times! This question is

You Asked

Hi Tom
First of all I want to inform you that, I love you very much and I have a big respect to you

I have a question related uncommitted transactions. I know that while we issue COMMIT, LGWR writes committed and uncommitted transactions from redo log buffer into redo log files. And while we do CHECKPOINT, DBWR writes committed and "uncommitted" transactions from database buffer cache into data files. My question is:

1. Why LGWR process writes uncommitted transactions to Redo Log files?
2. Why DBWR writes uncommitted transactions to Data Files?

Why do we need these uncommitted transactions in our Redo Log and Data Files?

Thank you very much

----------------------
Kamran Agayev A.
http://kamranagayev.wordpress.com

and we said...

Well, just to clarify, log writer is writing committed and uncommitted transactions from the redo log buffer to the log files more or less continuously - not just on commit (when the log buffer is 10mb full, 1/3 full, every 3 seconds or every commit - whichever comes first - those all trigger redo writes).

And DBWR is more or less continuously checkpointing these days - with fast start mttr target and the other log related parameters, dbwr is flushing dirty blocks to disk more or less "all of the time". There is this major event called a checkpoint, but it is more of a marker than anything else since the work is ongoing.

1) because you can do transactions that are much larger than available memory - there isn't room to buffer it all. That and the fact that if you saved it all up, a commit could take a very very very long time - we want to get ahead of the curve.

You can have transactions that are much larger than available memory.

You can have transactions that are much larger than available online redo log (which leads to #2 in a second)

You want commit times (we optimize for commit, we do not plan on rolling back, rolling back is the EXCEPTION, not the rule, commits are the rule) to be flat - even - regardless of transaction size (it takes about the same amount of time to commit 1 row update as it does a 1,000,000 row update)


2) DBWR will write out dirty blocks onto disk in order to

a) allow an online redo log file to be reused, if it did not put the block safely onto disk - we could not reuse the redo log file (that contains the redo to redo the changes). If we cannot reuse the redo log file - we would STOP DEAD. Hence, since stopping dead in our tracks is a bad thing, we put the dirty blocks onto disk so we can reuse the redo information.

b) make more space in the buffer cache, you can have modifications that exceed the available SGA by many times. Also, we are multi-user so many people can be doing things at once. We cannot be constrained by memory limitations

c) limit the time it takes to recover from a crash. Say we were doing a big update and updated ton of information. We commit. We crash immediately after the commit. If DBWR left all of the dirty blocks in the cache, we'd have to wait during startup for that transaction to be replayed - that could take a long time. In order to reduce the mean time to repair the database after a crash - we have things like the fast_start_mttr_target

http://docs.oracle.com/cd/B19306_01/server.102/b14237/initparams.htm#REFRN10058

that limits the amount of dirty blocks in the cache that would need recovery after a failure.

Rating

  (38 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

oracle certifications-- no longer degree of honor

A reader, April 28, 2009 - 8:29 am UTC

Tom,
The OP who asked the question is a 10g OCP as is obvious from his blog.Strangely/Sadly enough the question he asked is the most basic one.Doesn't these concepts are taught in OCP preparation?Or are these concepts so difficult to grasp?
Or are these concepts so poorly documented?
Why every now and then such questions are raised no matter whether its a newbie or a certified?

to "A reader" (the one before me):

Another reader, April 28, 2009 - 8:55 am UTC

he's a student, see
http://kamranagayev.wordpress.com/2008/12/06/how-delete-duplicate-rows/

and the ordering of his "useful links" shows, that he does not know about the proper use of datatypes

Datatype of id column in useful links is VARCHAR2

Kamran Agayev, April 28, 2009 - 9:15 am UTC

Don't worry with datatypes there's no problem :) 
The type of id in front of all useful links is varchar2, that's why ordered in that way. OK? 

SQL> create table order1 (id varchar2(20));

Table created.

SQL> begin
  2  for i in 1..10 loop
  3  insert into order1 values(i);
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select * from order1 order by 1;

ID
--------------------
1
10
2
3
4
5
6
7
8
9

10 rows selected.

SQL>

Tom Kyte
April 28, 2009 - 10:26 am UTC

... The type of id in front of all useful links is varchar2, that's why ordered in
that way. OK?
....

their point was - you put numbers into strings and it doesn't sort right. #10 comes before #2? quick fix for you - use fixed width string formats for your nubmers, 01, 02, 03, .... (unless you think you'll have 100 links some day of course!)


documentation

Kamran Agayev, April 28, 2009 - 9:18 am UTC

Could you please show me from documentation that DBWR writes uncommitted transactions as well?
Tom Kyte
April 28, 2009 - 10:41 am UTC

well, if you think about it - it has too.... It doesn't really even need documentation.


Look at the number of blocks in your SGA, create a table bigger than that, update every row, watch it succeed!


some things - which obviously must be true - go without saying.


the buffer cache is just that - a CACHE. Blocks are transparently accessed via the cache, regardless of whether we had to read them from disk into the cache and then return them, or whether they were already in the cache.


http://docs.oracle.com/docs/cd/B19306_01/server.102/b14220/process.htm#sthref1500

show me where it says "dbwr only writes committed blocks"? (which actually might be IMPOSSIBLE in a high update, small table situation - the block is constantly undergoing modifications by simultaneous transactions - it is never in a "completely committed state" - nothing says that it has to be)



another proof point -

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14220/backrec.htm#sthref2419

well, you know a transaction can be larger than the available online redo log (you might have 2x5mb redo logs - you can update 1gb of data). As per the recovery process:

"Use only the online redo logs and never require the use of the archived logs"


well, if you can only use online redo logs - you only have 10mb of redo available, but 1gb of changes... Where is the other 1gb-10mb of change information?

There is but only one place where it could be - in the datafiles. That is why we complete a CHECKPOINT before we advance into an online redo log - we have to make sure the blocks (committed or otherwise) are safely on disk before reusing that datafile.




student??

Kamran Agayev, April 28, 2009 - 9:22 am UTC

Creating table with name "students" and inserting my name into that table, doesn't mean that "I'm a student". Am I right? :)

I can insert there your name also :)

Good question

Hashmi, April 28, 2009 - 10:09 am UTC

These concepts are really difficult to grasp.The documentation has put little effort to explain such concepts properly and in less ambiguous language but then we have luckily the great Tom.
You need to understand that DBWR and commit are mutually exclusive things.A DBWR is totally unaware of commit.All DBWR cares for is writing dirty buffers to datafiles.A DBWR does not know whether the dirty buffer is committed or not.There is no algorithm designed to let DBWR know whether the dirty buffers it gonna write are committed or not because it does not need.
Similarly LGWR is unaware of a checkpoint.An LGWR does not know/care whats happening in the buffer cache.
DBWR<--->checkpoint
LGWR<---->commit

As you can see here:
http://download.oracle.com/docs/cd/B10501_01/server.920/a96524/c09procs.htm#21919
It does not even talk about commit.

I know the difference

Kamran Agayev, April 28, 2009 - 10:15 am UTC

Hi Hashimi
I know that DBWR is unaware of COMMIT and LGWR is unaware of CHECKPOINT.
I asked a question only about committed and uncommitted transactions and wanted to know the basic principle. I'm surprised that Tom answered my question very patiently and briefly, but others began to treat me roughly

Thiru, April 29, 2009 - 4:48 am UTC

Quote:
"when the log buffer is 10mb full, 1/3 full, every 3 seconds or every commit - whichever comes first - those all trigger redo writes"

10 MB full or 3 MB?? I check the doc, but I am not able to find it.
Tom Kyte
April 29, 2009 - 9:11 am UTC

where did 3mb come from?


actually, 10 was wrong, it is 1mb.

but, it is actually not "that relevant" - let's put it this way:

lgwr is constantly flushing the redo log buffer to disk, many things trigger the write, the goal is to always have the redo log buffer as empty as possible - so when you commit - the amount of remaining redo is very very tiny. That is why commit times are flat regardless of transaction size.

what is instance recovery means and what is media recovery means?

A reader, May 07, 2009 - 8:21 am UTC


Tom Kyte
May 11, 2009 - 11:42 am UTC

instance recovery is done after an instance fails, it requires only the ONLINE REDO logs. It is also known as "crash recovery" in some cases.

If you shutdown abort
and then startup, you'll do an instance recovery, it requires the online redo logs only.


media recovery is when you restore a file from backup. The scenario is that the media (disk) that the datafile was on has broken. You cannot access that datafile anymore. So, you go to your backups and find an old copy of that datafile. You restore it to a new disk and then apply all of the archived redo logs that were generated since you backed up that file AND THEN we apply the online redo log files to it to catch it up to the rest of the database.

Many Thanks TOM

Danyc, May 09, 2009 - 4:44 am UTC

Very well explained Tom, wish you all the best and keep going to teach us!

Regards!

Where exactly DML commands will fall into ?? Buffer Cache or to Redo Log Buffer ?

SreeNIVASa SuMAN, May 10, 2009 - 2:53 am UTC

Hi Tom,
Please to meet u
Ofcourse This is myfirst question in the Reviews
1.Kindly let me know the concept on DML operations with Redo log cache and Buffer cache.
2.Operation of "Automatic Managed Undo" with "Recover the database"
Tom Kyte
May 11, 2009 - 4:51 pm UTC

... Where exactly DML commands will fall into ?? Buffer Cache or to Redo Log Buffer ? ...

both of course, you make changes to blocks (that is in the buffer cache) and those changes are protected by redo which is generated into the redo log buffer and then flushed to disk...


1) not sure what you are looking for, but if you want "how does it all work" - that is a book and the book I wrote on it is called "Expert Oracle Database Architecture"

2) we use undo to rollback uncommitted transactions. During a recovery - you roll forward with the redo (archive+online for media recovery, online only for instance recovery following a 'crash') and then we roll back any outstanding, uncommitted transactions using UNDO.

A reader, May 30, 2009 - 11:23 pm UTC

Mr Tom;
For example I issue a bid update and do not commit;
After that shutdown abort occurs immediately.
1-)This uncommitted transaction is both stored in redologs and datafiles right?

2-)Since lgwr writes every 3 seconds, what if shutdown abort occurs after 1 second from the transaction?

Tom Kyte
June 01, 2009 - 7:48 pm UTC

1) maybe, maybe not

it could be

a) in neither the redo nor datafiles
b) in the redo but not the datafiles
c) in both

(if you have access to expert oracle database architecture - i cover these cases in great detail)

2) see #1. but lgwr writes AT LEAST every three seconds, not "just every three". when you commit - lgwr writes too.

To Kamran

Aman...., June 01, 2009 - 11:35 pm UTC

Kamran,

The point that you have mentioned that DBWR doesn't seem to wite into the the datafiles the uncommitted data, is actually not correct. DBWR, has to flush the buffer cache for numerous reasons and this means, the buffers which are marked "drity" has to sent to the data files in order to get them protected, without being bothered that they have been committed or not.Its LGWRs job to make sure that when committed, all teh enteries MUST go into the log files as there may be a chance that before they could get into the datafiles, the system may crash due to any unexpected issue.So the committed data must be sent to the log files. The uncommitted buffers are sent to the datafiles, the only place where they are safe. In the mean while , if the transaction gets over, next time these buffers would be cleaned with the "delayed block cleanout" . I guess, on Oracle forums, there are couple of discussions over the same topic. Just search over there and I am sure you would find it.


A reader, July 11, 2009 - 11:10 pm UTC

Dear Tom;
I really appreciate if you clear my doubts about disk access issue in insert,update,delete.

1-) Insert:
Suppose I issue a simple insert,
First,Oracle has to check if the segment has free blocks.(freelist or bitmaps)
Does this opreation require going to disk? I mean where does the freelist or bitmaps stored?
Could you explain the steps that Oracle perform during insert?

2-)Update and delete:

These operations require disk access, am I right?
First oracle reads blocks from cache to disk and then "update or delete" rows in cache?
What is the sequential order in update or delete?
Does oracle again check free blocks and then retrive blocks, and modify?


Tom Kyte
July 14, 2009 - 5:26 pm UTC

1 & 2) does that operation require going to disk?

maybe - think about it, if what we need is in the buffer cache, we don't need to go to disk. If this is a freshly started database and NO ONE has touched this table, then yes, the first insert into it would have to read data in from disk.

The second insert into it - probably would not have to read anything from disk, but it might.

There is nothing special here (with regards to modifications). Every SQL statement needs a set of blocks - those blocks are either

a) in the cache
b) not in the cache

if (a) is true, no physical IO. If (b) is true, then we do a physical IO.


A reader, July 17, 2009 - 10:23 pm UTC

"insert into it would have to read data in from disk."

Sir, Update and delete,select can read data from disk but
Why does insert into read data from disk?
There are free blocks in buffer cache,The data should be inserted into empty blocks in buffer cache.
Why it is necessary to read data from disk during insert?

Tom Kyte
July 24, 2009 - 9:01 am UTC

insert MIGHT read from disk
insert MIGHT NOT read from disk

replace insert with select, update, delete, merge - it is all the same.


look at the sentence in full:

maybe - think about it, if what we need is in the buffer cache, we don't need to go to disk. If this is a freshly started database and NO ONE has touched this table, then yes, the first insert into it would have to read data in from disk.

nowhere there does it say INSERT MUST. It says

a) if what we need is in the cache, we don't read disk
b) if what we need is NOT in the cache, we obviously have to perform a disk read.

A reader, July 25, 2009 - 6:49 pm UTC

"insert MIGHT read from disk"


Sir, can you explain in what condition insert read from disk?


AS far as I know,
We dont need to retrieve blocks from disk to buffer cache during insert.
We just simply insert into empty blocks in buffer cache.


Tom Kyte
July 26, 2009 - 7:55 am UTC

insert sort of needs to figure out where to put stuff. We do not create a new block for every row. When you insert, we need to read things like freelists (manual segment space management) or bitmap blocks (automatic segment space management). When we read those blocks - they will tell us where to put the data.

And when we figure out that the block in file 5, block #42 is the block we should insert into - we need to get that into the cache. It probably already has some data on it, we need to add to it.

we might have to physical IO the freelist data, the bitmap data and the target block.

then again, we might not.


That is for HEAP data of course, now YOU need to think about what would happen with an index organized table, a b*tree cluster, a hash cluster, a partitioned table, etc.........


A reader, August 01, 2009 - 12:11 am UTC

Thanks a lot sir, now I am clear.

When I perform coventional insert:

1-)First I go the disk and, get the info from freelist or bitmap blocks.

2-)Retrive that block to cache.

3-)Insert the data into cache.

4-)After checkpoint this modified block will be written disk again.


Is that the correct order?

You also said that we might not need to read from disk
In what condition we might not?
Tom Kyte
August 04, 2009 - 12:35 pm UTC

NO


when you perform an insert

you MIGHT, you MAYBE SOMETIMES, you COULD, you MIGHT HAVE TO go to disk.

How about I write this way instead:

inserting data into a table does not require physical IO's to take place.


... You also said that we might not need to read from disk ...

isn't that obvious? WHEN THE BLOCK IS ALREADY IN THE CACHE.


Just read the first response over and over:


insert MIGHT read from disk
insert MIGHT NOT read from disk

replace insert with select, update, delete, merge - it is all the same.


look at the sentence in full:

maybe - think about it, if what we need is in the buffer cache, we don't need to go to disk. If this is a freshly started database and NO ONE has touched this table, then yes, the first insert into it would have to read data in from disk.

nowhere there does it say INSERT MUST. It says

a) if what we need is in the cache, we don't read disk
b) if what we need is NOT in the cache, we obviously have to perform a disk read.

A reader, August 07, 2009 - 10:26 pm UTC

Hello Tom;
Where does bitmap for LMT and freelist for DMT stored? Do they stored in every block header? or at one block in every segment

Tom Kyte
August 11, 2009 - 12:43 pm UTC

you are confusing terms.

LMT (locally managed tablespaces) can use ASSM (automatic segment space management) - free space is managed in bitmap blocks in the extent itself, the bitmap blocks are intermingled with your data in the extent. LMT can use MSSM (manual segment space management) where free space is managed in freelists in the extent header - outside of the extent itself - not intermingled with your data.

With ASSM you set pctfree and not pctused, not freelists, not freelist groups.


With MSSM you set pctfree AND pctused AND freelists AND freelist groups (and if you do not, you suffer the concurrency penalty from not having enough freelist/freelist groups and inefficient space utilization by not setting pctused appropriately.

DMT (dictionary managed tablespaces) only do MSSM.

For Hashmi

Aman...., August 07, 2009 - 10:26 pm UTC

Hashmi,

I don't think that its correct to say that DBWR and LGWR are mutually exclusive. They actually work very closely bound with each other. It can't happen that they work independantly. Yes its correct that DBWR is working at Checkpoint(not always, there are changes in its writing) and LGWR at commit(not always and there are changes from 10g onwards).

Just my 2 cents.
Aman....
Tom Kyte
August 11, 2009 - 12:45 pm UTC

In the context of what Hashmi was talking about - he is correct and I would characterize it in similar terms.

Scofield

A reader, August 08, 2009 - 4:54 am UTC

Tom, sorry for this silly question:

Regarding the insert, why are we retrieving empty blocks from disk to cache?
There are already empty blocks in the cache, why dont we simply insert into empty blocks in buffer cache without disk access.


Tom Kyte
August 11, 2009 - 1:10 pm UTC

I wrote (a couple of times :) )

if what we need is in the cache, we don't read disk

If what we NEED is in the cache, we won't read the disk.
If what we NEED is not in the cache, we will read the disk.

That is all - when we insert (assuming manual segment space management first) - we need to find a block with sufficient freespace on it. That block is found by walking the freelists (extent header information). The freelists may or may not be in the buffer cache, if they are not, we have to read them in. When we do - we may find that the block we want to insert into (block 55 in file 42 - the freelist told us that is our block) may or may not be in the buffer cache - if it is not, we read it in. Then we have to maintain the indexes - we may or may not have to read them into the buffer cache and so on.

assuming automatic segment space management - we have to read the bitmap blocks in the extent themselves - these may or may not be in the buffer cache - if they are not, we NEED to read them in. We read them and then we find the block we'd like to insert into - and if it is not in the cache - we have to read that in. And then the indexes.


We don't just find a block willy nilly by accident in the buffer cache that is associated with the table and has free space on it - we walk freelists or bitmaps to find a block and then insert into that block.

Scofield

A reader, August 14, 2009 - 10:21 pm UTC

Thanks a lot sir;
1-)Assume I am using freelist and have one freelist on each segment.
Is the freelist stored in the first block of each segment or is there any particular block it reside?

2-)Assume ASSM. I am using bitmaps.
Is bitmap block stored in one block in each extent? How is it stored?


Tom Kyte
August 24, 2009 - 8:04 am UTC

1) it is in the segment header but can expand out from there over time if it gets really large.

2) they are sprinkled throughout the entire segment - all over the place.

Aman...., August 22, 2009 - 11:49 am UTC

n the context of what Hashmi was talking about - he is correct and I would characterize it in similar terms.


Sir, can you please explain it a bit more?

regards
Tom Kyte
August 25, 2009 - 9:02 am UTC

succinctly, the gist is:

dbwr isn't a commit process, lgwr is
lgwr isn't a checkpointing process, dbwr is

Scofield

A reader, September 05, 2009 - 2:19 am UTC

Respected Sir;


Say , my transaction started at T1.

Someone did some modification after T1, now ITL entries of the block is open.
Then a commit issued and itl entries are cleaned.


Now, during my transaction how does oracle understand if the row was modified after T1 and read the before image from undo?
ITL entries are already cleaned, so it has to check something else..
Tom Kyte
September 06, 2009 - 9:12 am UTC

the block is 'cleaned' to signify "these are done, no need to travel to the undo segment to figure it out, we are telling you these are done"

do not think of cleaning as erasing. think of cleaning as adding more information, more detail

Aman...., September 05, 2009 - 8:22 am UTC

succinctly, the gist is:

dbwr isn't a commit process, lgwr is
lgwr isn't a checkpointing process, dbwr is


Oh in that way! I agree in this context that they are different sir :) .

regards
Aman....

A reader, September 05, 2009 - 6:26 pm UTC

Respected sir,


Suppose I issue insert.
Oracle first checks freelist or bitmaps and according to this info, it will perform the action.

When I first allocate size to SGA, buffer cache is also sized automatically.
What I want to know is, When I startup the database, Are there free blocks in buffer cache?
Or free blocks are retrieved from disk according to size of the buffer cache?
I mean are free blocks in buffer cache used for the blocks that should be retrieved from disk?
or can we use them directly?


Say, I have just started the database.
I issue an insert, oracle finds where to insert data by reading freelist or bitmap and then
since I have just started the database, It will retrive this block to cache.
and so on,
The thing is even there are initially freeblocks in buffer cache they are never used.Block are always retrived from disk.




Tom Kyte
September 06, 2009 - 9:37 am UTC

... Are there free blocks in
buffer cache?
...

of course, most of them are. They could not have been used yet - could they....

But seeing your next sentence, I think you mean "blocks with free space from various segments - for future inserts"

Your terminology is a little loose. A free block in the buffer cache is an empty block, one that is not used by anything - an empty block buffer, a free block buffer, an unused thing.

A block with free space on it is entirely different - and we would not 'preload' them into the cache, no. The first insert into a table after a restart would almost certainly result in a physical IO.


please re-read this page, we've gone over this time after time.

You insert into a table
we need to find a free block IN THE TABLE - a block IN THE TABLE that we can insert into.

we process the freelist or bitmap freelist. In order to process the freelist/bitmap freelist we MIGHT have to do physical IO in order to get the required information into the cache.

When we process the freelist/bitmap freelist, it will tell us "block 5 from file 42 is what you should use". We might have to do a physical IO from file 42 of block 5 in order to get it into the cache. Or not.

Scofield

A reader, September 13, 2009 - 4:48 am UTC

Thanks sir;
Other than insert,
Does oracle also check freelist or bitmaps prior to update and delete?

Tom Kyte
September 14, 2009 - 1:15 pm UTC

sure, an update can migrate rows (move them to another block) - freeing space on one block and needing to find space on another.

an update/delete can free up space, causing a block to become a candidate for future inserts, that would be necessary to note

an update can increase the space used on a block, causing a block to no longer be a candidate - that would be necessary to note.

Lock is preventing reads from dirty blocks or uncommited data from data files

V.Hariharaputhran, May 30, 2010 - 6:08 am UTC

Mr.Oracle,

This is about the base question of the thread."Data files do contain uncommited transactions".Do agree this, so as to freeup online redo logs.

Please lets me know if my understanding is correct, Lets take a scenario

Create table a(x integer,y number);
Insert into a values(1,100);
Insert into a values(2,200);

Session S1
Update a set y = 201
where x = 2;

This DML leads to -->
Lock is established.This data might/may not have been moved to the datafiles.

session S2

I issue (SELECT * from a where x = 2)- i will get blocked.

session S3
The DBA issues a CHECKPOINT.

Now definetly the DBWR flushes the dirty blocks into the data files and the UNCOMMITED data is now definetly present in the Data files.

Now even though if the uncommited data is present in the DATAFILES, the session 2 user(any other user) will NOT be able to read the uncommited data from datafiles, BECAUSE

1. The lock is yet not released, this row level lock is still there untill we issue the commit.

Thus even though uncommited data is present in the datafiles, this DIRTY READ(READ UNCOMMITED) is not permitted because of the lock acquired. Irrecpective of the data being in CACHE or in DATAFILES this very same locking mechanisim is the base for the default read commited isolation level.

Thanks for your time.

Regards
V.Hari


Tom Kyte
May 30, 2010 - 6:59 am UTC

do you have access to my book Expert Oracle Database Architecture - I go through this sort of stuff in great detail.


session 2 will NOT get blocked, reads are not blocked by writes in Oracle.

Oracle does not work even remotely in the way you suggest.


I suggest you read through the Server Concepts Manual at the very least, it covers topics like this (and I was one of the two authors working on it - so I know it covers this stuff..)

http://docs.oracle.com/docs/cd/E11882_01/server.112/e10713/toc.htm

some sort of writes blocks reads in Oracle?

Pasko, May 30, 2010 - 11:59 am UTC

Hi Tom,

You said...reads are not blocked by writes in Oracle.

We have known this Fact for years, but i am not so sure as of right now ;)

I have been seeing sort of Blocking Behavior recently. Sometimes, although Readers are not completely blocked when reading from a heavily updated Table by other Sessions, we have noticed severe Performance Problems because of the "read by other session" Wait Event. This Wait Event is even worse in RAC Environments.

Regards,

Pasko





Tom Kyte
May 31, 2010 - 8:13 am UTC

the read by other session means simply:

Session A needs block X
Block X is not in the cache

therefore, A needs to do a phyisical IO to read block X into cache

However - when A goes to do the physical IO, A notices that session B is already doing the IO. So, rather than do a double IO - A just waits for B

In short, A gets the block FASTER because B has already started the IO and is just waiting for B to complete it, it would take longer if A decided to not wait for B as A would have to do the work that B is already halfway through doing.


What it means is......


Apparently, your IO pattern is such that you are doing lots of random IOs under heavy load and your cache is not sufficient to hold all of them and hence you have to wait for IO to complete. This read by other session is just another way of seeing a wait for physical IO.




Waiting for an event to complete (physical IO) is not blocking, not even remotely.

Lock is preventing reads from dirty blocks or uncommited data from data files

V.Hariharaputhran, May 30, 2010 - 12:22 pm UTC

Mr.Oracle,
Thanks for your reply.Sorry the sentence got trucated, it is

session S2

I issue (SELECT * from a where x = 2)- i will get blocked if i do an update on the same record.

Again i would like to know your comments on the below

session S3
The DBA issues a CHECKPOINT.

Now definetly the DBWR flushes the dirty blocks into the data files and the UNCOMMITED data is now definetly present in the Data files.

Now even though if the uncommited data is present in the DATAFILES, the session 2 user(any other user) will NOT be able to read the uncommited data from datafiles, BECAUSE

1. The lock is yet not released, this row level lock is still there untill we issue the commit.

Thus even though uncommited data is present in the datafiles, this DIRTY READ(READ UNCOMMITED) is not permitted because of the lock acquired. Irrecpective of the data being in CACHE or in DATAFILES this very same locking mechanisim is the base for the default read commited isolation level.

Thanks for your time.

Regards
V.Hari


Tom Kyte
May 31, 2010 - 8:17 am UTC

I issue (SELECT * from a where x = 2)- i will get blocked if i do an update on the same record.

absolutely FALSE.



Thus even though uncommited data is present in the datafiles, this DIRTY READ(READ UNCOMMITED) is not permitted because of the lock acquired. Irrecpective of the data being in CACHE or in DATAFILES this very same locking mechanisim is the base for the default read commited isolation level.


You truly need to read the books I referenced for you - you do not understand at all how it works.


First, the checkpoint is a red herring, it is meaningless, the same thing happens process wise regardless of whether the block is in the cache or on disk - it can be in either place, we don't care, the result is the same.

Second, whenever your query hits a block that has been modified since your query began (or your transaction began - depending on the isolation level of your transction) - regardless of its committed state, we don't care if the modification you are seeing has been committed OR NOT - we roll back that change using the UNDO.

*we roll back that change - regardless of the committed state, regardless of where the block was retrieved from* and you do not wait for any locks.



You can easily see this - just open up three sessions - two 'regular', one 'DBA'


do your scenario, go ahead and try to get "select * from a where x=2" blocked - when you do, post the steps to reproduce (we'll be waiting a very very very veeeerrrryyyyy long time to hear from you on that)

Dirty Data present in data files

v.hariharaputhran, May 31, 2010 - 10:59 am UTC

Mr.Oracle


Thanks a tone for your time and patient explanations.

Looks like the way i have asked/presented my doubt is totaly wrong.

Trying to frame better this time, In short what i would like to know is

Part 1
Even though the dirty data is present in the data files, which are written by DBWR, How the other sessions are restricted from viewing this uncommitted data from the data file. (even if they are in memory still we will not be able to see untill we issue commit).

- Answer is it all depends on the isolation level.

Part 2

Session s1:-

Update t1 set x = 2 where y = 1;

Session 2;

So this session would not be able to view the new value of x(modified by session s1 of x to "2" unless untill session1 commits it).

So when session 2 executes "select * from t1 where y =1" will not have the NEW VALUE of x because

1. Still the modified blocks from session 1 are not yet commited.Hence the row is still locked.

2. Hence the session 2 while quering the data it would create a image copy of data for read consistency irrespective of some other session modifying the data, and will contain only commited data till at the point of time

I think i have tried to explain better this time.


Regards
V.Hari




Tom Kyte
June 08, 2010 - 7:55 am UTC

part 1 -

when you hit a block that has changed or is in the process of being changed (changed - it could be committed, or not committed - the only relevant thing is "it changed") since

a) the time at which your query was OPENED (this is typical, it is read committed)
b) the time at which your transaction was STARTED (this is serializable or read only isolation)
c) the flashback time requested in the query itself (using flashback query)

we roll back those changes. We don't care if the block was in the cache, read from disk into the cache, gotten from another node in a RAC system - whatever. The only thing that matters is:

the block was modified since the point in time our query must be as of, when that is found to be true (it was modified - whether committed or not) we'll use the UNDO information to roll back that change.



part2 -

correct, an uncommitted change is NOT visible ever to another transaction, session 1 will NOT see the effect of the update from session 1 unless and until session 1 commits that change.


1) the fact that the row is (or is not) locked is quite simply "not relevant" to this discussion really....

Uncommited data in datafiles

V.Hariharaputhran, June 09, 2010 - 3:32 am UTC

Mr Oracle,
Thanks for your time,reply and for your site. Iam not very clear with the Part 1. Request you to breif it a little more,

"
the block was modified since the point in time our query must be as of, when that is found to be true (it was modified - whether committed or not) we'll use the UNDO information to roll back that change."

Does it mean that

when ever i try to select a data(who's block is modified), then the server process on identifying it being a modified block would apply the UNDO changes and return the result.


Regards
V.Hari

Tom Kyte
June 10, 2010 - 11:36 am UTC

yes, that is exactly what I mean. You said the same thing I did.

It is called read consistency. See the server concepts guide for details.

uncommited data in datafiles

Manish Orkey., January 13, 2011 - 3:46 am UTC

What happens to the uncommited data "occupying space" in data files assuming the transaction was not committed but was rolled back ? To rephrase, the uncommitted data is occupying SPACE on disk(datafiles) but is of no use. So is this uncommitted data, ever purged from the datafiles so that space is released ? If yes, When ? and Who does this purging ?
Tom Kyte
January 13, 2011 - 10:35 am UTC

rollback is a logical operation, it logically restores the table to what it was, but it does that by running a compensating transaction. It is not like recovery which physically restores - bit for bit, byte for byte - the database to the way it way, rolling back logically 'fixes' the database.


If you insert 5,000 rows and rollback, we issue 5,000 delete by rowids. You will have the free space in the table of 5,000 rows.

If you delete 5,000 rows and rollback - we issue 5,000 positioned inserts in effect - the rows go back to where they were.

If you update 5,000 rows and rollback - we issue 5,000 updates to unupdate the data in effect

I say in effect because in reality it is more complex - but the concept holds true.


So, if you have an empty table:

ops$tkyte%ORA11GR2> create table t as select * from all_objects where 1=0;

Table created.

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select blocks from user_tables where table_name = 'T';

    BLOCKS
----------
         0


and you insert a bunch of stuff into it:

ops$tkyte%ORA11GR2> insert into t select * from all_objects;

72143 rows created.


ops$tkyte%ORA11GR2> rollback;

Rollback complete.


and then measure again:

ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select blocks from user_tables where table_name = 'T';

    BLOCKS
----------
      1126



You'll see the table has grown - but it is empty.


Tables only grow - unless and until you shrink the, reorganize them, rebuild them.

And those operations are rarely if ever needed since the space will be reused within the table quite nicely in general over time. It is almost never necessary to shrink/reorg - it is sometimes necessary but that is the exception - not the rule.

redo in redo logs

Manish Orkey, January 18, 2011 - 6:42 am UTC

Dear Tom,

First of all a BIG thank you for your quick and convincing explanation to my earlier query.

If redo log and datafiles, both are the seat of data then :-
(i) how is redolog different from datafile...apart from the fact that redo log is overwritten and reused.
(ii) why does ORACLE needs redo log ALSO (in addition to datafiles), if data is what it stores? Why not only have just datafiles and back them up continuously?

Thanks,
Manish Orkey.
Tom Kyte
January 23, 2011 - 3:29 pm UTC

(i) I don't know what to say here. redo is just that - binary data that tells us how to redo a set of operations on the datafiles. Datafiles contain database blocks that have your data on them. You query datafiles, we use redo to manage data in datafiles, to recover it specifically.

(ii) because when you commit, we don't update the randomly distributed blocks in the datafiles, we write redo out that tells us how to redo those changes against those blocks in the datafiles. We'd rather write out a couple hundred bytes to a sequential file than have to write hundreds of 8k blocks in a random fashion to datafiles - a lot more efficient.

does dbwr procecess write uncomitted block in datafile?

riyaz, March 14, 2013 - 3:44 pm UTC

spcial Thank Tom for guidlines of us i have dought about does dbwr process write uncommited block in data file if yes then which situation?and what is mean of irrespective block?
Thanks.
Tom Kyte
March 25, 2013 - 9:16 am UTC

yes, it has to. otherwise a transaction could never be larger than the buffer cache.


uncommitted changes are written to disk constantly, in all circumstances, it is very normal.

Difference between Rollback and Recovery in case of uncommited transaction

Amiya, May 18, 2015 - 1:32 pm UTC

Hi Tom,

You have explained above
rollback is a logical operation, it logically restores the table to what it was, but it does that by running a compensating transaction. It is not like recovery which physically restores - bit for bit, byte for byte - the database to the way it way, rolling back logically 'fixes' the database.

I want to know why cannot it restore tha data from undo tablespace in case of Rollback. instead of doing it logically compensating transactions.

And how recovery is physical restore (bit for bit, byte for byte ). As per documentation recovery is done by using redo log files which contains binary data that tells us how to redo a set of operations on the datafiles.

thanks in Advance

Doubt

A reader, September 04, 2017 - 12:17 pm UTC

hi every one,

this is satya. i have doubt, you can said above redo log writing committed and uncommitted data also..let assume suddenly data base is crash. now we all are apply the backup data to database
and apply the redo log data(archive log data) also, here my doubt
is what data is apply to data base committed data or uncommitted data will apply please explain...........

thank you.
Chris Saxon
September 04, 2017 - 2:09 pm UTC

After applying all the changes from the redo logs (roll forward), any uncommitted transactions are undone (rollback). So after recovery is complete, you only have data from transactions committed before the crash.

How does rollback happen

Chayan, May 03, 2018 - 10:31 pm UTC

Hi Tom,
Very nice explanation that clarified a lot of my doubts.
So LGWR and DBWR writes both committed and uncommitted data of a transaction to disk i.e. to redo log files and data files on disks respectively.

Another question:-

I am wondering with host of these uncommitted transactions in redo log files as well as data files, suppose Oracle recovers from a crash, it needs to rollback any uncommitted transactions/data - how does it do that? How far back in the redo logs it checks to see for transactions that need to be rolled back and what parameters it uses to track and rollback the uncommitted data.

Connor McDonald
May 07, 2018 - 4:48 am UTC

Before we can re-use a redo log (ie, lets say we've filled 1 => 2 => 3 and we want to cycle back around to redo #1) we ensure that any changes 'contained' in that redo have already been flushed (checkpointed) to the data files.

That way, in an instance crash recovery, we do not need to hunt back through 10's or 1000's of archived redo - because the datafiles are *at least* as up to date as the oldest entries in the redo logs.

Uncommitted data in which datafile

Krishna yadav, December 01, 2019 - 1:20 pm UTC

We see from above all conversations uncommitted data goes into datafile , now does it move to undo tablespace or segment related tablespace of datafile ?

Now extending it bit further if my uncommitted data is in my datafile during instance recovery , this uncommitted data is how identified to be rolled back and how data from redo log is used since that to stores similar data ???
Connor McDonald
December 13, 2019 - 3:14 am UTC

Both - lets assume we have "no" memory on our server, so changes almost immediately are pushed to disk. A typical work flow is:

We want to update a row. Which means we need to record some undo for that change. But *any* change to anything in oracle is recorded in redo. So

1) we record some redo for the undo
2) we write some undo
3) we record some redo for the actual change
4) we write the change to the datafiles (because this scnenario has no memory)

In a real situation, (2) and (4) might be done just in memory, and the disk write happens later. It doesn't really matter.

During instance recovery, we are looking at redo. So

We see the redo from (1), and thus put the undo back in place
We see the redo from (3), and hence put the uncommitted change back in place.

We can now use the undo to rollback that uncommitted change.