Skip to Main Content
  • Questions
  • How does buffer access happens for a particular table's buffers?

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Aman.

Asked: March 04, 2010 - 10:09 pm UTC

Last updated: March 09, 2010 - 4:13 pm UTC

Version: 11.2.0.1

Viewed 1000+ times

You Asked

Hi sir,

When for a table, a select command is issued, how does the lookup happens? I am aware that in the buffer cache, the buffers are managed by the DBA but how the buffer lookup happens? I mean to say, for example, I ran a query like 'select * from emp". For the EMP table, how the lookup would be done that where the blocks of this table are located?

Regards
Aman....

and Tom said...

we would first come up with a plan.

Let's say that plan is a full scan.


Next we would read information to help us get the plan started - in this case, we need the extent information for EMP to full scan it, so we load that up.

Now, we know the first N blocks we want to read (full scans use multiblock IO). We start by looking in the buffer cache for the first block - to do this, we take the DBA (data block address - the file/block information) and hash it to a number between 1 and M - where M is the number of "list of blocks we maintain in the buffer cache". In the cache - we store an array of pointers to linked lists, each linked list is a list of cached blocks. We start by figuring out what list would/could have our block on it. Then, we latch that list (cache buffers chain latch) and search it for our block.

If we find it - ok, go onto the next block, else - we say "ok, we need to physical IO block 1, we need block 2 now" and do that process over and over. If we find a block (say block 4) in the cache - we know we need to get blocks 1,2,3 from disk - so we issue the multiblock IO, read them in, hash their DBA's and put them on the right lists in the cache. Now we have blocks 1..4 - we continue with 5 and so on.


Basically, to find a block we

a) hash the DBA
b) that tells us an index slot to look at
c) walk the list of blocks that slot points to (in isolation, using a latch)
d) either find or not the block in question
e) if not found, physical IO and put it in the cache


Rating

  (7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thomas, March 05, 2010 - 7:36 am UTC

What controls how many index slots will be used? I assume
that index slot is a different term for hash bucket, right?

Does Oracle ever have to re-hash the entire buffer cache
data structure, e.g., to increase the number of hash buckets?

How does multiversioning come into play here, as the same
block can be in the buffer cache several times as of
different points in time? Or will only the corresponding
undo blocks that allow the reconstruction of an earlier
point in time be cached?
Tom Kyte
March 05, 2010 - 11:21 am UTC

what controls size of array of pointers? the sga of the buffer cache does.

resizing the sga components can cause things to be moved around - yes.

We cache multiple versions of a block (you can query v$bh and count/group by to see this).


Thanks sir.

Aman...., March 06, 2010 - 9:41 pm UTC

Thanks so much sir. So it means once the extent information is loaded for the requested segment, the available blocks of it are hashed using their DBA. Once key-value pair using the hash function is generated, the lookup for the similar ky-value pair is started using the Cache Buffers Chain latch. Is it correct? In the hash table, I believe, each hash value owns a bucket which is referred as Hash Buckets. Does hash Bucket represents the same 'index slot' that you have mentioned? Also, is it mandatory that the buffers of one segment must remain in the same hash bucket or they can exist in other buckets as well?

Thanks once again sir!

Regards
Tom Kyte
March 09, 2010 - 10:48 am UTC

there are N (linked) lists of blocks in the buffer cash. To figure out what list a given block would be found on, you take the DBA (data block address) and hash it - to a number between 1 and N. We latch onto that list and walk it to find the block.


index slot, hash bucket - it is an index into a set of 1..N things.

thanks sir.

Aman, March 09, 2010 - 11:00 am UTC

Thanks so much sir! A follow up question which came to my mind today about the buffer cache. We have a LRU list and the dirty list(containing the dirty buffers). When the buffer becomes dirty and there is a need to make space in the buffer cache, dirty buffers are moved to the dirty list which are eventually flushed out the DBWR. The question is that since both are just linked lists, when an address of a buffer is moved from lru list to the dirty list, there is no movement of the actual buffer to anywhere. Than how come its termed that this movement makes a space of a new incoming buffer ?

Thanks so much once again sir!

Regards

Tom Kyte
March 09, 2010 - 11:52 am UTC

because there isn't even an LRU list in reality. Using those concepts however makes it easy to *talk* about the things.

Aman, March 09, 2010 - 12:05 pm UTC

because there isn't even an LRU list in reality.
Oh! I did know that LRU is now called REPL list but I didn't know that there is no such list at all!! That's a complete shock sir! So than what does it mean by Free Buffer Wait event which happens when DBWR is writing dirty buffers to the data file to make space? It doesn't write on every 3seconds now I believe as it used to in the previous. I can't seem to recall the parameter which take care of the percentage of the batch that DBWR looks for before it starts writing the buffers.In the Performance Tuning guide of Oracle Univ, there is a reference of such a list where the dirty buffers are collected.So what that book is referring to sir? Even the same is mentioned almost everywhere whenever there is a talk about buffer cache.

regards
Regards
Tom Kyte
March 09, 2010 - 12:48 pm UTC

"REPL" list? I don't know what that is.

There is a list, it is not your typical LRU list with blocks moving around based on their last reference.

Free buffer waits are what the always were - and they do not happen when dbwr is writing to disk, they happen when you need a buffer and for (many different) reasons, there isn't one available. One thing that might happen as a result of you waiting for a buffer is that DBWR will be asked to make room, but that isn't the cause - it is an effect.


Aman, March 09, 2010 - 1:07 pm UTC

Sir, the reference of REPL list was taken from here.
http://prutser.files.wordpress.com/2008/12/checkpointsukoug.pdf

There is a list, it is not your typical LRU list with blocks moving around based on their last reference.


So which list is this one sir?

Free buffer waits are what the always were - and they do not happen when dbwr is writing to disk, they happen when you need a buffer and for (many different) reasons, there isn't one available. One thing that might happen as a result of you waiting for a buffer is that DBWR will be asked to make room, but that isn't the cause - it is an effect.


Okay, this I understood sir, thanks a bunch for it! Okay, so let me ask like this, there is a reference of a list containing all the addresses of the buffers which are dirty which is mentioned at many places as Dirty list. So why does that list is maintained? What I understand is that using that list, incremental checkpointing is maintained but it would be great if you would explain about it.

Thanks and regards

Tom Kyte
March 09, 2010 - 4:13 pm UTC

Well, REPL is definitely not a standard term.

We call it an LRU, but it isn't. We use a touch count algorithm along with 'aging'.


The dirty list you talk of is just a list of pointers to blocks (that are pointed to by other lists themselves - used to organize and access the blocks). The dirty list is just organized in a manner to make it so that the 'older' changes (blocks with older changes) get written before blocks with 'newer' changes - to reduce the amount of redo necessary for crash recovery.



Similar Problem of buffer is not access

Prafulla, April 07, 2010 - 1:32 pm UTC

I have two schemas in a databse PRDPUB and TOOLSERVER having one table bp_user_profile on both the tables having same amount of data ,index and constraints.I am using one query to delete about 26000 rows from both the table ,but in PRDPUB schemas it is hanging but in TOOLSERVER schemas is is taking very less time .While i tried deleting 100 rows in PRDPUB it took 44 sec and in TOOLSERVER it is taking 13 ms.I checked the explain plan and it is very much similar. But when i checked the trace file i got some difference like in PRDPUB, it is parsing and executing 2 times also doing the manipulation in the disk while in TOOLSERVER it is parsing and executing once also doing the manipulation in the memory(buffer).

I can't able to find the reason . I tried to generate the statistics and rebuild the index but no improvement.For your reference i am attaching the explain plan and trace file output. Could you please check and suggest me some solution


explain plan for "delete from bp_user_profile where id_user in (select id_user from (SELECT * FROM bp_user_profile WHERE
id_home_category=5005 AND DT_USER_LAST_UPDATED <'01-DEC-09' AND ID_USER_STATUS_ACTIVE ='DELETED_BY_ADMIN'
OR ID_USER_STATUS_ACTIVE ='DELETED_BY_USER') where rownum<=100);"

PLAN_TABLE_OUTPUT in PRDPUB schema
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 288284804

-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 100 | 7600 | 1906 (1)| 00:00:23 |
| 1 | DELETE | BP_USER_PROFILE | | | | |
| 2 | NESTED LOOPS | | 100 | 7600 | 1906 (1)| 00:00:23 |
| 3 | VIEW | VW_NSO_1 | 100 | 1300 | 1885 (1)| 00:00:23 |
| 4 | SORT UNIQUE | | 100 | 2100 | | |
|* 5 | COUNT STOPKEY | | | | | |
|* 6 | TABLE ACCESS BY INDEX ROWID | BP_USER_PROFILE | 86428 | 1772K| 1885 (1)| 00:00:23 |
| 7 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 8 | BITMAP OR | | | | | |
| 9 | BITMAP CONVERSION FROM ROWIDS | | | | | |
|* 10 | INDEX RANGE SCAN | BP_USER_PROFILE_ID_USER_STATUS | | | 12 (0)| 00:00:01 |
| 11 | BITMAP AND | | | | | |
| 12 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 13 | INDEX RANGE SCAN | BP_USER_PROFILE_ID_USER_STATUS | | | 2 (0)| 00:00:01 |
| 14 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 15 | INDEX RANGE SCAN | BP_USR_PROF_IDCATG | | | 79 (2)| 00:00:01 |
|* 16 | INDEX UNIQUE SCAN | BP_USR_PROF_PK_ID_USER | 1 | 63 | 1 (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter(ROWNUM<=100)
6 - filter("ID_USER_STATUS_ACTIVE"='DELETED_BY_USER' OR "ID_USER_STATUS_ACTIVE"='DELETED_BY_ADMIN' AND
"ID_HOME_CATEGORY"=5005 AND "DT_USER_LAST_UPDATED"<'01-DEC-09')
10 - access("ID_USER_STATUS_ACTIVE"='DELETED_BY_USER')
13 - access("ID_USER_STATUS_ACTIVE"='DELETED_BY_ADMIN')
15 - access("ID_HOME_CATEGORY"=5005)

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
16 - access("ID_USER"="$nso_col_1")

PLAN_TABLE_OUTPUT in TOOLSERVER schema
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 288284804

-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 102 | 7752 | 2892 (2)| 00:00:35 |
| 1 | DELETE | BP_USER_PROFILE | | | | |
| 2 | NESTED LOOPS | | 102 | 7752 | 2892 (2)| 00:00:35 |
| 3 | VIEW | VW_NSO_1 | 100 | 1300 | 2871 (1)| 00:00:35 |
| 4 | SORT UNIQUE | | 100 | 2100 | | |
|* 5 | COUNT STOPKEY | | | | | |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
|* 6 | TABLE ACCESS BY INDEX ROWID | BP_USER_PROFILE | 170K| 3500K| 2871 (1)| 00:00:35 |
| 7 | BITMAP CONVERSION TO ROWIDS | | | | | |
| 8 | BITMAP OR | | | | | |
| 9 | BITMAP CONVERSION FROM ROWIDS | | | | | |
|* 10 | INDEX RANGE SCAN | BP_USER_PROFILE_ID_USER_STATUS | | | 22 (0)| 00:00:01 |
| 11 | BITMAP AND | | | | | |
| 12 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 13 | INDEX RANGE SCAN | BP_USER_PROFILE_ID_USER_STATUS | | | 22 (0)| 00:00:01 |
| 14 | BITMAP CONVERSION FROM ROWIDS| | | | | |
|* 15 | INDEX RANGE SCAN | BP_USR_PROF_IDCATG | | | 78 (2)| 00:00:01 |
|* 16 | INDEX UNIQUE SCAN | BP_USR_PROF_PK_ID_USER | 1 | 63 | 1 (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter(ROWNUM<=100)
6 - filter("ID_USER_STATUS_ACTIVE"='DELETED_BY_USER' OR "ID_USER_STATUS_ACTIVE"='DELETED_BY_ADMIN' AND
"ID_HOME_CATEGORY"=5005 AND "DT_USER_LAST_UPDATED"<'01-DEC-09')
10 - access("ID_USER_STATUS_ACTIVE"='DELETED_BY_USER')
13 - access("ID_USER_STATUS_ACTIVE"='DELETED_BY_ADMIN')
15 - access("ID_HOME_CATEGORY"=5005)

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
16 - access("ID_USER"="$nso_col_1")

34 rows selected.



trace file output

delete from prdpub.bp_user_profile where id_user in (select id_user from (SELECT * FROM prdpub.bp_user_profile WHERE
id_home_category=:"SYS_B_0" AND DT_USER_LAST_UPDATED <:"SYS_B_1" AND ID_USER_STATUS_ACTIVE =:"SYS_B_2"
OR ID_USER_STATUS_ACTIVE =:"SYS_B_3") where rownum<=:"SYS_B_4")

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.61 0.61 1027 1574 8292 200
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.61 0.61 1027 1574 8292 200

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE BP_USER_PROFILE (cr=1542994 pr=527 pw=0 time=43954914 us)
100 NESTED LOOPS (cr=794 pr=13 pw=0 time=125348 us)
100 VIEW VW_NSO_1 (cr=558 pr=0 pw=0 time=95818 us)
100 SORT UNIQUE (cr=558 pr=0 pw=0 time=95618 us)
100 COUNT STOPKEY (cr=558 pr=0 pw=0 time=94328 us)
100 TABLE ACCESS BY INDEX ROWID BP_USER_PROFILE (cr=558 pr=0 pw=0 time=94325 us)
103 BITMAP CONVERSION TO ROWIDS (cr=481 pr=0 pw=0 time=93891 us)
1 BITMAP OR (cr=481 pr=0 pw=0 time=93879 us)
1 BITMAP CONVERSION FROM ROWIDS (cr=88 pr=0 pw=0 time=11643 us)
22224 INDEX RANGE SCAN BP_USER_PROFILE_ID_USER_STATUS (cr=88 pr=0 pw=0 time=77 us)(object id 89712)
1 BITMAP AND (cr=393 pr=0 pw=0 time=80750 us)
1 BITMAP CONVERSION FROM ROWIDS (cr=10 pr=0 pw=0 time=1150 us)
1911 INDEX RANGE SCAN BP_USER_PROFILE_ID_USER_STATUS (cr=10 pr=0 pw=0 time=21 us)(object id 89712)
3 BITMAP CONVERSION FROM ROWIDS (cr=383 pr=0 pw=0 time=105831 us)
181506 INDEX RANGE SCAN BP_USR_PROF_IDCATG (cr=383 pr=0 pw=0 time=26 us)(object id 88278)
100 INDEX UNIQUE SCAN BP_USR_PROF_PK_ID_USER (cr=236 pr=13 pw=0 time=8945 us)(object id 88279)

********************************************************************************
delete from toolserver.bp_user_profile where id_user in (select id_user from (SELECT * FROM toolserver.bp_user_profile WHERE
id_home_category=:"SYS_B_0" AND DT_USER_LAST_UPDATED <:"SYS_B_1" AND ID_USER_STATUS_ACTIVE =:"SYS_B_2"
OR ID_USER_STATUS_ACTIVE =:"SYS_B_3") where rownum<=:"SYS_B_4")

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.13 0.12 0 741 3144 100
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.13 0.12 0 741 3144 100

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE BP_USER_PROFILE (cr=741 pr=0 pw=0 time=123855 us)
100 NESTED LOOPS (cr=741 pr=0 pw=0 time=95704 us)
100 VIEW VW_NSO_1 (cr=528 pr=0 pw=0 time=94093 us)
100 SORT UNIQUE (cr=528 pr=0 pw=0 time=94091 us)
100 COUNT STOPKEY (cr=528 pr=0 pw=0 time=93729 us)
100 TABLE ACCESS BY INDEX ROWID BP_USER_PROFILE (cr=528 pr=0 pw=0 time=93724 us)
102 BITMAP CONVERSION TO ROWIDS (cr=490 pr=0 pw=0 time=93380 us)
1 BITMAP OR (cr=490 pr=0 pw=0 time=93369 us)
1 BITMAP CONVERSION FROM ROWIDS (cr=93 pr=0 pw=0 time=11663 us)
22722 INDEX RANGE SCAN BP_USER_PROFILE_ID_USER_STATUS (cr=93 pr=0 pw=0 time=78 us)(object id 87755)
1 BITMAP AND (cr=397 pr=0 pw=0 time=80340 us)
1 BITMAP CONVERSION FROM ROWIDS (cr=11 pr=0 pw=0 time=1118 us)
1911 INDEX RANGE SCAN BP_USER_PROFILE_ID_USER_STATUS (cr=11 pr=0 pw=0 time=15 us)(object id 87755)
3 BITMAP CONVERSION FROM ROWIDS (cr=386 pr=0 pw=0 time=104382 us)
181849 INDEX RANGE SCAN BP_USR_PROF_IDCATG (cr=386 pr=0 pw=0 time=24 us)(object id 87758)
100 INDEX UNIQUE SCAN BP_USR_PROF_PK_ID_USER (cr=213 pr=0 pw=0 time=1194 us)(object id 87754)

********************************************************************************

linked list implelmtation in Oracle internal

Andy, July 30, 2012 - 11:55 am UTC

Hi All,

this is very good and well explained linked list ADT implementation in Oracle internals

cheers!!