
Question and Answer

Chris Saxon

Thanks for the question, Raymond.

Asked: June 20, 2000 - 9:43 am UTC

Last updated: October 01, 2019 - 1:19 pm UTC

Version: 8.0.5

Viewed 10K+ times

You Asked

What benefits can be gained from changing from 2k blocks to 4k blocks or even 8k blocks?


and Tom said...

Here are a couple of pros and cons (some of them will be both pros and cons -- depends on your perspective)....

going from 2k to Xk where X > 2...

o the amount of block overhead in your database will decrease. There is a fixed amount of overhead per block. If you have a block that is 4 times LARGER, you will have less than 1/4 the block overhead after you change over.

o you will have less row chaining and perhaps less row migration. Row chaining happens when a row is too large to fit on one contiguous block -- this is much less likely in an Xk block than a 2k block. Row migration happens when we attempt to update a row but discover there is insufficient space on the block to do so. We leave the "head" of the row on the original block (we cannot change the rowid) and put the row itself on another block entirely. Going from 2k with a 10% pctfree (the default) to an Xk block with 10% free means we have perhaps more free space on each block to accommodate this (but we also have more rows per block so it might not matter all of the time).

o you will have more people contending for the same block perhaps. We do lots of stuff at the block level -- you may notice more contention in this area.

o you will have the same amount of data cached (volume) but the granularity of the data will be N times larger. For every row you want, you might get N times as many rows pulled along. This can be good -- this can be bad. It really depends on what you are doing. In a heavy duty OLTP system with lots and lots of scattered reads all over the place -- this might be bad as no one is going after the same blocks (eg: I read a row on a block and no one else ever needs that block -- i cached lots of stuff no one is going to use and I may have bumped out some other data that we could have used). In a query intensive system (DSS, Data Warehouse), it may be very good as I was going to read the blocks right around my data anyway. And - people are going to be needing that data again and again.

o you'll get more rows / block in indexes. This will result in indexes that require less maintenance as they tend to need to reorganize themselves less frequently.

o larger block sizes allow for larger index keys (although whether or not THIS is a good thing is in question as well, there is something to be said for reasonably sized index entries). The maximum index key size is a little less than about 40% of the block size. In an 8k database for example, you'll get:
create index t_idx on t(x,y,z)
*
ERROR at line 1:
ORA-01450: maximum key length (3218) exceeded

On a 2k block -- it would be less.




Also -- try to find the native OS blocksize of your system and make sure to use a block size that is at least as large as this AND is a multiple of this if possible. Some of the old considerations (eg: pre 7.3 -- you had to consider the blocksize would limit the maxextents of a table) do not apply.



Comments

OS block size

Erik, October 23, 2002 - 7:27 am UTC

Okay, so we need to set the db block size as a multiple of the OS block size, but how do we find the OS block size, for windows or unix systems?

Tom Kyte
October 23, 2002 - 7:28 am UTC

Ask your OS admin. the tools vary from platform to platform.

Sagi, October 23, 2002 - 8:34 am UTC

Hi Tom,

If I am not wrong on unix we can use

df -g

to find the OS block size. But how can we check on Win-NT/2000.

I have tried to get this by searching google but was not able to. Would appreciate your answer.

Regards,
Sagi

Tom Kyte
October 23, 2002 - 8:47 am UTC

[tkyte@tkyte-pc-isdn tkyte]$ df -g
df: invalid option -- g
Try `df --help' for more information

[tkyte@tkyte-pc-isdn tkyte]$ uname -a
Linux tkyte-pc-isdn.us.oracle.com 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux

so, not all "unix" (i count linux as unix personally)

as I said, -- varies OS by OS by OS. As an NT "guru", I don't even have a windows machine to play with anymore.

NT Block Size

Mark A. Williams, October 23, 2002 - 9:49 am UTC

IIRC... block size on NT/WinMM is 512 bytes...

HTH,

Mark

kkrause, February 14, 2003 - 4:58 pm UTC

perl -e '$a=(stat ".")[11]; print $a' will get it on a Linux box (field 11 of perl's stat is the preferred I/O block size)

NTFS block size

Alex Smelyansky, March 25, 2003 - 7:55 pm UTC

1. For Wintel the default size is 512 bytes, but you can change it when formatting the hard drive.

2. Use fsutil fsinfo ntfsinfo c: to get

c:\>fsutil fsinfo ntfsinfo c:
NTFS Volume Serial Number : 0x1e345b64345b3e49
Version : 3.1
Number Sectors : 0x0000000004a852c0
Total Clusters : 0x0000000000950a58
Free Clusters : 0x00000000003ae433
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Cluster : 4096 -- this is the answer
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x000000000d72a000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x00000000004a852c
Mft Zone Start : 0x0000000000800020
Mft Zone End : 0x000000000083b520

3. To get all possible OS block sizes use
c:\ format /?

4. In explorer select your hard drive, click right button, select "Format..." from
context menu, and see your OS block size as "Allocation unit size".

5. Tom, I wonder why they know their systems so badly!

Tom Kyte
March 26, 2003 - 7:19 am UTC

Hey, it's windows, you aren't supposed to have to know anything.


Seems we're not all as clever as we think!

Steve, March 26, 2003 - 11:16 am UTC

Microsoft Windows 2000 [Version 5.00.2195]
(C) Copyright 1985-2000 Microsoft Corp.

C:\>fsutil c:
'fsutil' is not recognized as an internal or external command,
operable program or batch file.

C:\>fsinfo c:
'fsinfo' is not recognized as an internal or external command,
operable program or batch file.

C:\>ntfsinfo c:
'ntfsinfo' is not recognized as an internal or external command,
operable program or batch file.

C:\>

And right clicking on the drive - selecting Format shows a format box which only says "Default Allocation Size" - not the actual size!

Updated response.

MSaeed, April 10, 2003 - 2:11 pm UTC

It's been a while since you responded to the original question. Would you say that response still holds true in an Oracle 8i environment?

I am faced with a schema that uses a lot of Longs, Varchar2(2000), varchar2(255) etc.... in an OLTP env.

What should be my primary driving factor for choosing a block size: Row chaining, row migration, block contention or something else ?

If you feel you have answered this somewhere else, could you please provide me with a link. I have access to metalink as well, but your answers are usually to the point.

Greatly appreciate your updated response.


Tom Kyte
April 11, 2003 - 7:52 am UTC

The information holds true -- but if you have lots of fat rows -- smaller blocks would be a pain, lots of chaining.

We can fix block contention via other methods (pctfree, minimize records per block) so the chaining/migration would be it.

LONGS -- they will be chained, period, no avoiding it. You should use CLOBS instead.
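For reference, a minimal sketch of that LONG-to-CLOB move (table and column names here are made up, and the in-place conversion assumes 9i or later):

alter table documents modify (body clob);

-- on 8i, where the in-place modify is not available, a copy using TO_LOB is
-- the usual route instead:
create table documents_new as
  select doc_id, to_lob(body) body
    from documents;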

Thin rows initially becoming fat later

A reader, April 14, 2003 - 5:07 pm UTC

Hi Tom:
Thanks for the response.

"The information holds true -- but if you have lots of fat rows -- smaller blocks would be a pain, lots of chaining."

We can't control that since we have tons of varchar2(2000) columns that can expand later on.

"We can fix block contention via other methods(pctfree, minimize records per block) so the chaining/migration would be it."

Good point.

"LONGS -- they will be chained, period, no avoiding it. You should use CLOBS instead. "

We are working on that migration.

Is there any truth in the relation of VLM (very large memory addressing for SGA sizes) and the d/b block size ?
What I mean is that to have an SGA size of over 6 GB, must I have a 16k block size? Is this dependent on the underlying OS?

Thanks.
MSaeed


Tom Kyte
April 14, 2003 - 5:21 pm UTC

The larger the SGA, the more optimum larger blocks become -- less blocks to manage and look at when looking for a block in the cache.

If you have a transactional system, I might wonder why you would need a super large SGA. Lots of single row, keyed reads of disparate data...

A reader, April 16, 2003 - 1:45 pm UTC

I guess that is one indication that the sales guys are doing their job...well.

Seriously though.

But why would a super large SGA not help for an OLTP system? Is achieving the "fully" cached d/b not good in certain situations?

It seems that a minimum of 8GB with at least a quad CPU config is becoming the norm for any server hosting a db app, with a SAN thrown in for data storage.

Thanks.
MS


Tom Kyte
April 16, 2003 - 6:33 pm UTC

transactional systems create tons of data that is not read right back -- maybe never read, maybe read by batch much later. The db would be fully cached but only 1% of it is accessed. So, you have 99% junk in the SGA getting in the way -- having to be managed.

bigger is not always better.

and on an 8gb machine, my SGA would still be in the 1-3gig range at best I would suspect, unless I'm using shared server -- then my large pool needs may well cause it to exceed that.


Different block sizes for different tablespaces

Steve, May 23, 2003 - 5:24 am UTC

We have an oltp system and the database has a block size of 4k. However, we do run reports off of some of the tables for which I have created mv's to ease the pressure off the oltp side of things. Is there a case for putting these mv's in their own tablespace with a different block size say 8k or even 16k?

Tom Kyte
May 23, 2003 - 8:51 am UTC

I've not seen anything one way or the other on this really.

There are theoretical pros and cons to both.

pros -- bigger blocks better for full scan, less block overhead
cons -- memory set aside for 4k cannot be used for 8k blocks and vice versa. You might have a ton of free memory that the 4k system wants but cannot use or vice versa.

But.......

Steve, May 23, 2003 - 9:42 am UTC

they must've been created to have some advantage surely?

Tom Kyte
May 23, 2003 - 10:02 am UTC

to allow you to transport your 2k block sized tablespaces from your OLTP system to your 16k block size data warehouse for ETL'ing without doing the dump and load.
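For context, a rough sketch of that transport flow, assuming a 2k OLTP tablespace being plugged into an 8k-blocksize warehouse (the tablespace name and cache size are only illustrative):

-- on the warehouse side, a matching cache must exist before a 2k tablespace can be plugged in
alter system set db_2k_cache_size = 16m;

-- on the OLTP side: make the tablespace self-contained and read only, export its
-- metadata with the transportable tablespace option of exp/expdp, copy the
-- datafiles across, then import the metadata on the warehouse
alter tablespace oltp_data read only;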

Is more than one Buffer Cache possible?

Tony, June 12, 2003 - 2:39 am UTC

Tom, I think it's possible to define a different block size for each tablespace in Oracle9i. Do we have to create a separate buffer cache for the different blocksizes?

Note: tablespaces with different block sizes are also possible in DB2, and there you need to create a different buffer pool for each.


Tom Kyte
June 12, 2003 - 8:47 am UTC

if you use more than one blocksize, you need to configure separate caches for each one.
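A minimal sketch of what that looks like (the cache size, tablespace name and datafile path are just examples):

-- configure a cache for the non-default block size first...
alter system set db_16k_cache_size = 64m;

-- ...then a tablespace using that block size can be created
create tablespace dw_16k
  datafile '/u01/oradata/ORCL/dw_16k01.dbf' size 1024m
  blocksize 16k;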



Hybrid Systems

A reader, August 28, 2003 - 12:43 pm UTC

Hi

We have an OLTP and Reporting hybrid system (under two schemas), currently running Oracle 9.2.0.1 on Redhat Advanced Server 2.1. We set db_block_size to 8k. My question is: can I recreate my reporting schema's tables in 16k tablespaces and set up a db cache for that? I am not sure if Oracle's multi block size feature is meant to be used in my situation.

Also I want to know if a bigger db_block_size can "waste" the data block buffer. For example, in 4k data blocks let's say I can fit 100 rows and in 8k I can fit 200 rows; however, queries don't always visit the same blocks to get the desired result. For example, I have table X with 1000000 rows and a query returns 50000 rows; if these 50000 rows are in 500 blocks in both 4k and 8k, in the latter case I am wasting my db cache, right?

Cheers

Tom Kyte
August 29, 2003 - 7:58 am UTC


you can -- but you would want to ask yourself "why". do you know that 16k is better? have you tested it out?

the multiple block size feature was designed to allow you to transport a tablespace from a database with blocksize X to a database with blocksize Y so you can read that data, transform it and load it into the database quickly. you would then get rid of the blocksize X tablespace after you loaded it.



Obviously -- in the example you propose -- if the data is in 500 4k blocks, it will take 50% of the buffer cache of 500 8k blocks. Have you "wasted" your buffer cache? no -- you have simply cached 2x the information and undoubtedly other sessions may well be interested in it. It could work to your advantage or disadvantage just as easily.



Windows file system utilities

Paul, September 12, 2003 - 9:46 am UTC

fsutil is available on WindowsXP, but MS indicates that it should run fine on W2K.

Windows is terrible at these underlying system tasks. You need to correlate the "Default allocation size" and your hard drive size with this information below to determine the actual size.

Drive Size               Cluster Size       Number of Sectors
----------------------   ----------------   -----------------
512MB or less            512 bytes          1
513MB to 1024MB (1GB)    1024 bytes (1KB)   2
1025MB to 2048MB (2GB)   2048 bytes (2KB)   4
2049MB and larger        4096 bytes (4KB)   8


Of course, you could always upgrade to a flavour of *nix. ;)

Will Oracle Fuse O/S blocks?

Thaha Hussain, September 14, 2003 - 3:35 am UTC

Dear Tom,

If we are using a block size higher than the o/s block size, will Oracle tamper with the real block size of the O/S, or will it create a logical layer on top of it?

- Thaha

Tom Kyte
September 14, 2003 - 9:55 am UTC


we will just ask the OS to write Nk of data -- the OS will do what it does -- at the block size it does things at. we need not do anything special -- not any more than you would

Very good information

sudhir, September 14, 2003 - 4:21 pm UTC

Some other point and questions

1) 9.2 has tablespace compression option - Having 8k block size would be better than 2k for more efficient compression (likely)

2) Is fast-full-scan better with 8k vs 2k? I guess at block overhead level, 8k would offer better. But overall?

3) If the size of SGA is kept same, having bigger block size means, less blocks to manage, thus better use of latches? May be though, concurrency may be impacted due to lack of free buffers in SGA?

4) Is DBW performance better, due to fact it has to manage less number of blocks?

Thanks

Tom Kyte
September 14, 2003 - 10:12 pm UTC

1) yes

2) 8k will full scan "faster" due to the fact there will be less blocks, yes.

3) as long as you can manage LESS BLOCKS. ponder that... you might have AS MANY or MORE or LESS blocks with 8k vs 2k....

4) see #3. you might have more blocks to flush in order to make room. if you managed lots of 2k blocks, you might have less flushing going on since you can keep your working set in ram. that 8k block might have 7k of non-relevant data. that 2k block might have only 1k of non-relevant data. that is another way to look at it.

DB_BLOCK_SIZE

Jeff, September 19, 2003 - 10:01 am UTC

Tom,

Our Unix OS block size is 8K. We have a Data Warehouse

We have large memory available, please don't ask me how much, that is admin's job.

What is optimal multiblock read count? Currently set at 32.

What is optimal block size? Currently 8k

Now even though the OS is set at 8k, perhaps that should be larger, and then set the DBMS accordingly -- rather than accepting the OS 8k size as a given.

Doesn't network transport packet size have some impact here as well in distributed queries, to avoid internetwork packet fragmentation?

I would think why not make multi block 128 and block size 128k - but not sure if too large or too small.

Can you keep it simple, since I am a user, not a DBA, and this "should" be transparent to me. In other words, the DBMS should handle this sizing, not me and not the DBA. But I find I end up doing work the DBA should be doing, since this is so complicated and even the DBA doesn't have time to look at it in such detail. Same with the UNIX admin: he has no time or inclination to look at such detail, and when I look, he/she says that's not my job. So nothing gets done or changed. Sound familiar?

Please advise db block size, multi block read, for 8.1.7 data warehouse environment.


Best regards,
Jeff







Tom Kyte
September 20, 2003 - 5:34 pm UTC

8k is a dandy size.

do you full scan frequently? you might consider finding out your system's maximum read size and adjusting up.

i don't see the tie in between "network packets" and "blocksizes" at all.


you cannot have a 128k blocksize.


sounds like you should just stripe and mirror everything and forget about it. since you and the admin don't really want to work together.

and why the dba wouldn't be concerned about the amount of ram available is sort of "way beyond my ability to comprehend".

Rows per block

Ma$e, October 01, 2003 - 12:14 pm UTC

Hi Tom:

I'm sure you have covered this somewhere but I could not find it and this was the closest thread where I think I could post this question. So forgive me if I'm wrong.

How can I compute the number of rows per block for a given table ( "block #", "# of rows" )? Fixed length rows I guess would be easy, but how about rows that have variable length datatypes ?

Thanks
Ma$e


Tom Kyte
October 01, 2003 - 1:54 pm UTC

well, with chained and migrated rows -- it's not really possible. the closest thing you can get is

select dbms_rowid.rowid_block_number( rowid ), count(*) from t group by dbms_rowid.rowid_block_number( rowid )

Thanks a lot.

Ma$e, October 01, 2003 - 3:37 pm UTC

Hi Tom:

Thanks for the response. That helps a lot.

If I may.... I have a followup question.

I'm trying to determine the "fullness" of the block (i.e. is there room for additional ITL entries to grow beyond INITRANS).

How do I determine that a block is so full that it can't create another tx slot? Could this lead to severe locking on this table? How do I determine that the locking is due to a low value of INITRANS?

Thanks.
Ma$e

A reader, December 23, 2003 - 12:07 am UTC

Hi Tom,

Two quick questions:

1. What are the cons of having the os blocksize larger than db blocksize? I mean, can't we adjust it by appropriately changing the db_multiblock_read_count anyway?

2. If we change the os block size, can we restore a physical backup (taken with a different os blocksize) successfully (please talk about both rman and filesystem)?

Thanks for your great site.

Tom Kyte
December 23, 2003 - 10:00 am UTC

1) the OS is going to read in that size -- and many times -- most all IO's are single block IO's (not multi-block IO's) via indexes, hence the dbfmbrc doesn't even come into play.

so, you read more than you need.

2) yes, it is just a file to us. you can do whatever.

Not able to use IOT because of BLOCK_SIZE

Saminathan Seerangan, January 08, 2004 - 2:59 pm UTC

Dear Tom,

Thanks for your kind help -- I admire your knowledge of Oracle!

The reason I am posting my question here is that you suggested NOT to use IOT because of my DB BLOCK_SIZE(which is 2K).

<ASKTOM>
your blocksize is way too small to make that fat of a table effective as an IOT. The rows will overflow and this will degenerate into:

overflow segment access by rowid pointer
    index range scan

not any different than

table access by index rowid
     index range scan


except that your inserts will be slower than they would be with normal "table plus index"

From: 
http://asktom.oracle.com/pls/ask/f?p=4950:61:::::P61_ID:14338694728363

</ASKTOM>


Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning and Parallel Server options
JServer Release 8.1.7.3.0 - Production


My OS BLOCKSIZE is 8192

SQL> show parameter block
 
NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
db_block_buffers                     integer 1000000
db_block_checking                    boolean FALSE
db_block_checksum                    boolean FALSE
db_block_lru_latches                 integer 26
db_block_max_dirty_target            integer 1000000
db_block_size                        integer 2048      <=====================
db_file_multiblock_read_count        integer 8        <=====================
hash_multiblock_io_count             integer 0
sort_multiblock_read_count           integer 2
SQL> 




Now I am really concerned about our DB BLOCK_SIZE after reading your followup for my IOT related question.


Also I read PROS & CONS of BLOCK SIZE issue  here.

8K(>2K) BLOCK_SIZE PROS
=======================
1)The amount of block overhead in the database will decrease.  There is a fixed 
amount of overhead per block

2)Bigger blocks better for full scan, less block overhead

3)Less row chaining and perhaps less row migrations

4)More rows/block in indexes.  This will result in indexes that 
require less maintenance as they tend to need to reorganize themselves less 
frequently. 

5) In an IOT, less OVERFLOW compared to 2K

6)Larger block sizes allow for larger index keys.

7)9.2 has tablespace compression option - Having 8k block size would be better 
than 2k for more efficient compression (likely)


8K (>2K) BLOCK_SIZE CONS
========================
1)8k block might have 7k of non-relevant data but 2k block might have only 1k of non-relevant data.

2)More people contending for the same block perhaps.  We do lots 
of stuff at the block level -- you may notice more contention in this area.





Ok. Ultimately I do not see any compelling reason to have a 2K block size since we have an OS BLOCK SIZE of 8K. Also I see more PROS than CONS.

So it will be appropriate to change the BLOCK_SIZE from 2K to 8K when we get a chance (at the time we upgrade to 9i -- maybe within a month or so).

Please let me know your comments.

Thanks again,
Sami
 

Tom Kyte
January 08, 2004 - 3:36 pm UTC

so, your oracle blocksize is 2k, that's the gating factor here.

multi block read count only concerns us during full scan, which never happens during "lookup" type processing. it is not relevant.


when you go to 9i, you can easily test (create an 8k blocksize tablespace and alter table move stuff into it and see how it goes).

but if you are going to rebuild during the upgrade anyway and your os blocksize is truly 8k (confirm with your SA), 8k is a fine blocksize to use in general
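A rough sketch of that 9i test, assuming the database keeps its 2k default block size (the names and sizes below are made up):

alter system set db_8k_cache_size = 32m;

create tablespace test_8k
  datafile '/u02/oradata/ORCL/test_8k01.dbf' size 500m
  blocksize 8k;

alter table big_table move tablespace test_8k;
-- a move leaves the table's indexes unusable, so rebuild them afterwards
alter index big_table_pk rebuild tablespace test_8k;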

OPS/RAC and DB_BLOCK_SIZE

A reader, May 03, 2004 - 4:33 pm UTC

Dear Tom,

Does OPS/RAC have anything to do with choosing the DB_BLOCK_SIZE value? My co-worker is telling me that a smaller DB_BLOCK_SIZE is better for OPS/RAC because of pinging between instances.

Is that true?


Tom Kyte
May 03, 2004 - 7:58 pm UTC

in 9i, no, not really -- the pinging is happening over the wire.

in 8i, when pinging was done to disk -- maybe, maybe not -- depends if you could save significantly on the speed of a 2k write vs 4k write vs 8k write to disk.

Impact of BLOCKSIZE on SQl*Loader

Lisa Spory, May 25, 2004 - 10:42 am UTC

Tom,

We built our new Production data warehouse with a 32K Blocksize. We are now attempting to run our "History Load" SQL*Loader routine which will load 8 years of historical data into the warehouse.

The Tablespace that we are loading history data into is 2K, we created a 2K cache and a 2K Tablespace assuming that it was more suitable for such high insert activity. We then plan to move the data into the 32K partitioned tables.

However, the SQL*Loader routine runs 50% slower in the data warehouse than it did in test databases having an 8K blocksize.

How would the database 32K blocksize, and the 2K tablespaces/cache factor into this slowdown?

I looked at a tkprof report of a sample run thinking that perhaps the db_file_multiblock_readcount of 32 with a 2K cache would make Oracle favor FTS, but the query plans looked good. The only thing I did notice was that the (fetch count / rows fetched) was > 1 for many of the lookup queries.

Any light you can shed on this would be greatly appreciated. The code is poorly designed (row-by-row), but without permission or time to rewrite the load routine it is very important to at least see the level of performance experienced on development boxes.

Thanks!

Tom Kyte
May 25, 2004 - 11:21 am UTC

2k only makes sense in a highly CONCURRENT modification environment. Less contention for a given block.

with 2k, you were doing lots more work (managing lots of little things) for your sqlldr session. best to go with the standard blocksize, period.







Pinging over the wire

Naresh, July 13, 2004 - 10:52 am UTC

"in 9i, no, not really -- the pinging is happening over the wire."

Tom,

How much is the overhead for the ping in 9i?

Consider a situation where the PK is a sequence and the table is being populated from two instances. The table can get up to 40 records per second (20 per instance) at peak times.

How would the contention for the PK block impact this in 9i?



Tom Kyte
July 13, 2004 - 12:25 pm UTC

the block would be going back and forth over the interconnect -- it would be as fast as the wire permitted it.


Choice of block size from third party tool

Sean, May 27, 2005 - 4:12 pm UTC

Hi Tom,

We are using 16k blocks for our tablespaces and our OS block is 8K. Now we are using a third party tool to create the file system on top of the OS file system. The tool gives us choices to pick a block size: 8k or 16k. Which one should we use?

Thanks so much for your help.



Tom Kyte
May 27, 2005 - 5:33 pm UTC

sorry, not my area of expertise -- I added a new tab on my home page "Other Resources", you might ping one of them for their advice.

I know what I would guess, but I don't like guessing :)

Steve Adams on Block Size

Mark A. Williams, May 27, 2005 - 9:33 pm UTC

This advice from Steve Adams re: block size might be interesting...

</code> http://www.ixora.com.au/tips/buffered_block_size.htm <code>

- Mark

Tom Kyte
May 28, 2005 - 9:11 am UTC

"File system buffer size", is that the same as file system BLOCK size.

file system block size vs db block size

Sean, May 28, 2005 - 7:11 pm UTC

Hi Tom,

Then do you agree that file system block size should be equal to db block size if possible, assuming there is no third party tool?

Thanks so much for your help.

By the way, Mark, thanks for the useful link.


Tom Kyte
May 28, 2005 - 7:36 pm UTC

that would be my *guess*, yes.

File System Block Size and File System Buffer Size

Mark A. Williams, May 30, 2005 - 1:48 am UTC

> "File system buffer size", is that the same as file system BLOCK size.

I sent a note to Steve and this is his reply:

[quote]
Hi Mark,

No, file system buffer size is not the same as file system block size.
File systems with small blocks buffer multiple blocks per buffer, and file systems with large blocks use multiple buffers per block.
Blocks are the unit of file system space allocation, whereas buffers are the unit of buffer cache space allocation and thus of buffered I/O.
When using buffered I/O, the database block size must match the file system buffer size, regardless of the file system block size.

PS. You can quote me if you think it would be helpful.
[/quote]

So it seems that when using buffered i/o the Oracle block size should match the file system buffer size. I don't know if the original poster is using buffered i/o or not, but I thought it was an interesting point from Steve all the same.

- Mark

Tom Kyte
May 30, 2005 - 8:50 am UTC

Thanks, I know Steve is very precise in his terminology, so that "buffer" word raised a flag.

file system buffer size

Sean, June 15, 2005 - 12:00 pm UTC

Hi Tom,

I asked support engineers from different vendors. None of them seems to know the meaning of this term. Let me put it this way: do you consider the file system buffer size when you decide db_block_size? Or do you just consider the file system block size?

Thanks so much for your help.

Sean

Tom Kyte
June 16, 2005 - 3:23 am UTC

I pretty much use 8k or 16k in most all cases these days personally. 32k if appropriate (warehouse, on an OS that supports it)



OS Block size vs. FS Block size Vs FS buffer size (Steve's term)

A reader, August 04, 2005 - 10:07 am UTC

in terms of general guidelines, I found the following
article at Howard's site useful. It makes sense to me
and Howard writes in a very clear fashion:
</code> http://www.dizwell.com/html/db_block_size.html <code>

then I looked at this thread, and noticed that there
are other criteria which may override the criteria
discussed in howard's article. So, is there a nice summary
of guidelines that you think makes sense in regards to db block size vs. fs block size vs. fs buffer size (Steve's term.)

My "guess" is to go with what Howard says in general unless you have a specific requirement for which you want to try out an alternative block size.

In other words, have you ever used a db block size not equal to the OS block size yourself and, if so, what were the results?


Tom Kyte
August 04, 2005 - 10:27 am UTC

I think this sums it up nicely
</code> http://www.ixora.com.au/tips/buffered_block_size.htm <code>

on our real systems, we tend to not use buffered file systems (and with the move to asm and all, those are never buffered)

Now, would I go out and rebuild all of my 8k databases to be 4k to fit the chart? Not unless I identified IO as being a serious problem first.

hmmm...

A reader, August 04, 2005 - 11:20 am UTC

"on our real systems, we tend to not use buffered file systems "

Do you mean that in most systems, we use raw partitions?

I thought that was not the case (not with the systems
we worked on.)

Also, there seem to be some new file systems that allow you to do direct I/O as well. Do you know of any such systems? Thanx!

Tom Kyte
August 04, 2005 - 12:15 pm UTC

or directio, or mount options that turn off OS buffering.


/dev/dsk/c0t3d0s3 /dev/rdsk/c0t3d0s3 /d01 ufs 2 yes forcedirectio
/dev/dsk/c0t1d0s0 /dev/rdsk/c0t1d0s0 /d02 ufs 2 yes forcedirectio
/dev/dsk/c2t2d0s3 /dev/rdsk/c2t2d0s3 /d03 ufs 2 yes forcedirectio
/dev/dsk/c3t1d0s1 /dev/rdsk/c3t1d0s1 /d04 ufs 2 yes forcedirectio
/dev/dsk/c3t0d0s0 /dev/rdsk/c3t0d0s0 /log1 ufs 2 yes forcedirectio
/dev/dsk/c2t1d0s0 /dev/rdsk/c2t1d0s0 /log2 ufs 2 yes forcedirectio


from one of my solaris boxes for example

ok..

A reader, August 04, 2005 - 3:15 pm UTC

very briefly, what is the difference between raw partitions, directio, and mount options that turn off OS buffering? The latter seems like the FS where you can use direct IO optionally and also gain the advantages of file systems ("ease of use" etc.)

Tom Kyte
August 04, 2005 - 8:39 pm UTC

they are, in the year 2005, very close to each other performance-wise.

Your last sentence is more or less "dead on". OCFS is an implementation that does that, good for datafiles, but I wouldn't want my binaries on there.

Oracle 10g block size

thirumaran, August 05, 2005 - 10:28 am UTC

Hi Tom,

I am new to oracle 10g and last worked on oracle projects 3 years back.

For a 10g DB, how do I decide on block size? I need to recommend a block size for a product DB. What factors should I consider while recommending a block size, and what checklist should I go through before recommending one?

Examples will be a great help to me.

Thanks in adv
Thirumaran







Tom Kyte
August 05, 2005 - 1:46 pm UTC

that is precisely what is contained on this page?

How about RAC?

A reader, August 05, 2005 - 10:30 am UTC

I know there is no simple answer before the testing is done but .... would you consider larger blocks (16k) for RAC to speed up cache fusion?

Tom Kyte
August 05, 2005 - 1:48 pm UTC

so -- more data on a block, more people concurrently interested perhaps in the block, consuming more ram and hence network bandwidth on the interconnect, perhaps being sent back and forth more frequently because it holds more data

(devils advocate).

No, if you are worried about minimizing interconnect traffic, you'll look "higher" than the block, you'll be looking at partitioning of data and workload accordingly.

How about RAC?

A reader, August 05, 2005 - 2:41 pm UTC

Thanks Tom, my gut feeling was telling me the same ... it can get worse with 16k before it gets better. Thanks for your time.

what is the diff. between these two

Nikunj, September 12, 2005 - 9:34 am UTC

Hi Tom,

If my database server have below
DB_BLOCK_SIZE=4096
DB_CACHE_SIZE=12M
DB_2K_CACHE_SIZE=8M
DB_8K_CACHE_SIZE=4M

and if my database server have
DB_BLOCK_SIZE=4096
DB_CACHE_SIZE=12M
DB_2K_CACHE_SIZE=0
DB_8K_CACHE_SIZE=0

what are the pros and cons between these two?

Regards,
Nikunj

Tom Kyte
September 12, 2005 - 10:24 am UTC

if you don't have any 2k or 8k tablespaces - you would not configure the 2k/8k caches - they only get used if you have them.

If you "unconfigure" them and have tablespaces with those blocksizes - you'll not be able to use them.



Which average size is closer to the actual?

A reader, September 12, 2005 - 2:13 pm UTC

Select avg(CNT)*8192/1024 "AVERAGE_ROW_SIZE", 'PACKAGE' "SOURCE"
from
(select dbms_rowid.rowid_block_number( rowid ), count(*) CNT from test
group by dbms_rowid.rowid_block_number( rowid ))
UNION
Select dt.AVG_ROW_LEN "AVERAGE_ROW_SIZE", 'DICTIONARY' "SOURCE"
from dba_tables dt
where table_name='TEST';

224 DICTIONARY
229.31 PACKAGE

Or are they not even the same thing?????

This is an instance with 8k db block size.

Tom Kyte
September 12, 2005 - 3:17 pm UTC

statistics are stale immediately after they are gathered and a modification happens.

Also, there are nuances to do with migrated and chained rows and so on.

They sure seem closer than close enough to me.

DB Block Size

Maruthi, May 11, 2006 - 11:41 am UTC

Hi Tom,
The responses are very interesting. I have a question about increasing the database tablespace size in Oracle 10g.
At our site, for the USERS tablespace we are using around 32 Gig of memory space. We need to increase this tablespace to 64 GIG. We are using Solaris and it contains around 80 GIG of memory.
We tried to increase the USERS tablespace size by increasing the file size and the maximum file size value using Enterprise Manager (HTML GUI).
But it actually throws an error: "larger than the allowed number of blocks". After some searching, I came to know that a db_block_size of 8k will support up to 32 g of memory space for each datafile of the USERS tablespace. Now I want to increase our memory. I tried to set db_block_size to 16k in init.ora and restarted the oracle server. But it doesn't get set and I wasn't able to increase the USERS tablespace size.
Could you please tell me how to increase the database tablespace size?



Tom Kyte
May 11, 2006 - 7:52 pm UTC

32gig of MEMORY? hmm, tablespaces usually take disk space.

you have a solaris box with 80gb of MEMORY? that is a big machine, not unheard of but big..


Add another datafile. That is all you need to do, alter tablespace users add datafile 'whatever' size XXgb
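The arithmetic behind that 32 gig ceiling, roughly: a smallfile datafile is limited to about 2^22 (~4 million) blocks, and 4 million x 8k is about 32 GB, so it is the file -- not the tablespace -- hitting its limit. Adding a file sidesteps it (the path and size below are hypothetical):

alter tablespace users add datafile '/u03/oradata/PROD/users02.dbf' size 30g;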

Impact of BLOCKSIZE on SQl*Loader #2

Serge Shmygelsky, June 09, 2006 - 2:58 am UTC

Hello Tom,

hope you're well.
I have the same question as one of the guys on this page. We're loading a lot of data with SQL*Loader and suddenly found that the loading speed in the production system is way slower than in the testing one (about 6-8 times slower). The only difference I can observe is that in the production system we have a 16K tablespace and in testing it is 8K. Why could this happen and what can be done to make things better? I wouldn't like to change the production blocksize as it is a kind of DWH system and we increased the blocksize deliberately to have maximum performance for selects.

Tom Kyte
June 09, 2006 - 6:40 am UTC

are you sure that is all - same disks, same archivelog mode setting, same number of users (same load on machine), same data...


tkprof it. show us the difference.

Impact of BLOCKSIZE on SQl*Loader #2 - ctd

Serge Shmygelsky, June 15, 2006 - 9:31 am UTC

Hello Tom,

thanks for your attention. I put a logon trigger in the database to capture what SQL*Loader is doing. So the situation is as follows:
we have 2 users created by the same sql clause in the same instance. One of them has a tablespace of 16K blocks (production) and the other one 8K (test):

TABLESPACE_NAME BLOCK_SIZE INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS MAX_EXTENTS PCT_INCREASE MIN_EXTLEN STATUS CONTENTS LOGGING FOR EXTENT_MAN ALLOCATIO PLU SEGMEN DEF_TAB_ RETENTION BIG
------------------------------ ---------- -------------- ----------- ----------- ----------- ------------ ---------- --------- --------- --------- --- ---------- --------- --- ------ -------- ----------- ---
GUARDATA 16384 65536 1 2147483645 65536 ONLINE PERMANENT LOGGING NO LOCAL SYSTEM NO AUTO DISABLED NOT APPLY NO
GUARTEST 8192 65536 1 2147483645 65536 ONLINE PERMANENT LOGGING NO LOCAL SYSTEM NO AUTO DISABLED NOT APPLY NO

Datafiles are on the same disk array. Tables are created by the same clause. The only difference is that in production environment the table contains partitions for each day of this and the next year and in the test environment it contains partitions only for 3 months.
I'm loading the same file on the same machine with the same amount of users.

Here are the trailers from SQL*Loader logs:

Production enviroment:

Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576

Total logical records skipped: 0
Total logical records read: 192541
Total logical records rejected: 0
Total logical records discarded: 0
Total stream buffers loaded by SQL*Loader main thread: 61
Total stream buffers loaded by SQL*Loader load thread: 62

Run began on Thu Jun 15 14:51:45 2006
Run ended on Thu Jun 15 14:52:34 2006

Elapsed time was: 00:00:49.03
CPU time was: 00:00:06.78

Test environment:

Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576

Total logical records skipped: 0
Total logical records read: 192541
Total logical records rejected: 0
Total logical records discarded: 0
Total stream buffers loaded by SQL*Loader main thread: 61
Total stream buffers loaded by SQL*Loader load thread: 62

Run began on Thu Jun 15 15:22:37 2006
Run ended on Thu Jun 15 15:22:54 2006

Elapsed time was: 00:00:16.87
CPU time was: 00:00:06.91

I'm not sure that I can put entire SQL trace in here as it is quite big so I'll put only the trailers.

Production environment:

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 84 0.15 0.14 0 1 0 0
Execute 83 1.02 1.01 0 4 0 0
Fetch 110 1.98 1.91 0 90298 0 129
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 277 3.15 3.06 0 90303 0 129

Misses in library cache during parse: 17
Misses in library cache during execute: 12

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 248 0.00 0.00
SQL*Net message from client 248 0.24 6.04
SQL*Net more data to client 4 0.00 0.00
control file sequential read 10 0.00 0.00
SQL*Net more data from client 123 0.00 0.26
db file sequential read 9 0.00 0.04
log file sync 1 0.01 0.01
direct path write 20 0.00 0.00
rdbms ipc reply 17 0.01 0.03


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3756 0.34 0.35 0 0 0 0
Execute 4328 0.43 0.42 0 531 218 110
Fetch 13492 1.86 1.78 0 34539 1 16424
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 21576 2.63 2.56 0 35070 219 16534

Misses in library cache during parse: 34
Misses in library cache during execute: 32

89 user SQL statements in session.
3823 internal SQL statements in session.
3912 SQL statements in session.
********************************************************************************
Trace file: rash_ora_20487.trc
Trace file compatibility: 10.01.00
Sort options: default

1 session in tracefile.
89 user SQL statements in trace file.
3823 internal SQL statements in trace file.
3912 SQL statements in trace file.
54 unique SQL statements in trace file.
48079 lines in trace file.
47 elapsed seconds in trace file.

Test environment:

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 84 0.03 0.02 0 1 0 0
Execute 83 0.02 0.01 0 0 0 0
Fetch 110 1.28 1.24 0 48410 0 129
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 277 1.33 1.28 0 48411 0 129

Misses in library cache during parse: 2

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 248 0.00 0.00
SQL*Net message from client 248 0.23 6.12
SQL*Net more data to client 4 0.00 0.00
log file sync 2 0.00 0.00
control file sequential read 10 0.00 0.00
SQL*Net more data from client 123 0.00 0.26
direct path write 20 0.00 0.00
rdbms ipc reply 288 1.96 2.99
db file sequential read 148 0.01 0.05


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 461 0.06 0.06 0 0 0 0
Execute 485 0.20 0.21 0 1480 639 347
Fetch 235 0.01 0.00 0 290 1 114
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1181 0.27 0.28 0 1770 640 461

Misses in library cache during parse: 5
Misses in library cache during execute: 4

89 user SQL statements in session.
478 internal SQL statements in session.
567 SQL statements in session.
********************************************************************************
Trace file: rash_ora_24055.trc
Trace file compatibility: 10.01.00
Sort options: default

1 session in tracefile.
89 user SQL statements in trace file.
478 internal SQL statements in trace file.
567 SQL statements in trace file.
43 unique SQL statements in trace file.
8083 lines in trace file.
16 elapsed seconds in trace file.

So there is an obvious huge difference in recursive calls. Why could it happen?

Thanks in advance

Tom Kyte
June 15, 2006 - 4:38 pm UTC

...
Datafiles are on the same disk array. Tables are created by the same clause. The
only difference is that in production environment the table contains partitions
for each day of this and the next year and in the test environment it contains
partitions only for 3 months.
...

funny how you can say "same clause" yet the tables are like night and day different?

How many partitions loaded into in each environment (that would definitely affect this)


how many rows are you actually loading here?

Impact of BLOCKSIZE on SQl*Loader #2 - ctd

Serge Shmygelsky, June 16, 2006 - 4:23 am UTC

Hello Tom,

sorry for my ignorance. I didn't think that partition overhead could be so huge.

Actually I have partitions for each day. And the data is loaded as it comes in, in real time -- so all the records in the file contain calls for one day (or two, if the file is collected just after midnight) and will fall into one or two partitions.
Now that I have deleted the partitions for the next year, performance is 8 times better.
And I think that now I can understand why I had this huge amount of recursive calls.
Say, we have a table which has a lot of range partitions, e.g.
partition a1 values less than to_date ('01.01.2008')
partition a2 values less than to_date ('31.12.2007')
partition a3 values less than to_date ('30.12.2007')
...
partition an values less than to_date ('16.06.2006')

When Oracle is trying to find a partition to put a record into, it first checks whether the partition key of the record being inserted falls into the highest partition. If it does not, it goes down partition by partition trying to find the correct one.
Is my understanding correct?

Tom Kyte
June 16, 2006 - 7:00 pm UTC

but you didn't answer my questions :(

are you using direct and are the row counts really small - that would be "not a good idea"

really small here would be 10's of thousands of records or less.

Impact of BLOCKSIZE on SQl*Loader #2 - ctd

Serge Shmygelsky, June 19, 2006 - 4:10 am UTC

Yes, I'm loading data in direct mode. Here is my parameter file:
DIRECT=Y
MULTITHREADING=y
DATE_CACHE=2500
PARALLEL=y
SKIP_INDEX_MAINTENANCE=TRUE
COLUMNARRAYROWS=10000
STREAMSIZE=128000
READSIZE=2097152

A datafile usually contains about 100,000 - 200,000 records and I'm loading about 100 files in 1 run (every 2 hours)

Tom Kyte
June 19, 2006 - 6:06 pm UTC

doubtful you want to use direct for so few rows. what if you did a conventional path load? How does that work out?

(direct is NOT always FAST=TRUE)

Impact of BLOCKSIZE on SQl*Loader #2 - ctd

Serge Shmygelsky, June 29, 2006 - 4:30 am UTC

This is the log from direct path:

Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576

Total logical records skipped: 0
Total logical records read: 74430
Total logical records rejected: 0
Total logical records discarded: 0
Total stream buffers loaded by SQL*Loader main thread: 26
Total stream buffers loaded by SQL*Loader load thread: 39

Run began on Tue Jun 20 10:06:25 2006
Run ended on Tue Jun 20 10:06:36 2006

Elapsed time was: 00:00:11.55
CPU time was: 00:00:03.27


This is the log from conventional path:

Space allocated for bind array: 250776 bytes(18 rows)
Read buffer bytes: 1048576

Total logical records skipped: 0
Total logical records read: 74430
Total logical records rejected: 0
Total logical records discarded: 0

Run began on Tue Jun 20 10:11:22 2006
Run ended on Tue Jun 20 10:11:53 2006

Elapsed time was: 00:00:30.68
CPU time was: 00:00:07.78

So I need 3 times less time when loading in direct path. As I don't have to maintain any indexes or constraints and the main idea is to load the data ASAP, I think it is quite reasonable to use direct path.

8 vs 16 vs 32

Elaine H, July 20, 2006 - 9:17 am UTC

I have been reading through this question with great interest. I understand that there is no definitive answer, but can you please give your best recommendation on the following.

Mixed system, mass data loads and online users accessing data. 9iR2 on AIX

We are in the process of testing on a new platform (Solaris 10 x86) and 10g. We were thinking to make the blocksize for the database 32k since the direct loading and append operations should perform better, if I understand correctly.

However this could make the online performance less efficient?

In trying to create the instance, I used the example from the 10gR2 documentation (I do not like dbca and prefer to create manually) and set the system tablespace to 325M. db_block_size was set to 32k. In just trying to create the tablespaces (no objects) I ran out of space only 7 tablespaces into the create script -- there are about 30 tablespaces for the application. What was eating all the space? Was it a function of the blocksize? I recreated this database, still with 32k blocks, and after creation and running of some of the dictionary scripts, had already used 311M of space. This is by far the largest system tablespace that I have had to create and I have been working with Oracle since v6.0.31. I would have assumed that using locally managed tablespaces with auto segment management would have compensated for the block size. Is this an invalid assumption on my part?

If I need to recreate system with a smaller block size, what are the inherent problems with maintaining larger blocksizes in some of the application tablespaces? I know I would need to configure the caches for the other sizes, but from an administration standpoint, what other considerations need to be taken into account? Also is this a viable strategy?

If you have answered this elsewhere, would you please provide a link? I have both Effective Oracle by Design and Expert Oracle but couldn't find something related to this. I am sort of sure you would have included something, but I don't seem to be looking with the right keywords, so if it is there, would you please provide either the keyword, chapter or page number? If none of the above, would you please give what your personal preference would be? I know it may not fit in all cases, but since I have to determine block size at database creation, it would be nice to have your expert, albeit best guess to work from.

Thanks so much.

Tom Kyte
July 22, 2006 - 4:46 pm UTC

If you want the rule of thumb, go with 8k.

Else you would need to test on your system to see which gives you the over all best - I doubt 32k is going to be significantly faster for your loads - true there could be less block overhead, you could compress better (if you are even using that), but 8k doesn't "stink"

I would not recommend multiple block sizes and the multiple caches at all for this.


you don't really give sufficient information to answer the question about running out of space. I'm not sure what ran out of space here - your file system, the dictionary, whatever. If you have "massive amounts of data loads" - worrying about a couple of megabytes in system is being penny wise and pound foolish - it is going to be a couple hundred megabytes - period.

How to catch up the block..

Ken Acker, August 26, 2006 - 1:42 pm UTC

I guess I was not able to put the right words into the search... Sorry for causing inconvenience.

Just wanted to know what's filling up my system tablespace. We were doing a 3rd party installation and I found that the system tablespace is growing huge. I was trying to get some sort of script but it was of no good. I would really appreciate it if you could provide me something which could help me find out what's eating up my tablespaces.
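A simple starting point for that, sketched with the standard dictionary views only:

select owner, segment_name, segment_type, round(bytes/1024/1024) mb
  from dba_segments
 where tablespace_name = 'SYSTEM'
 order by bytes desc;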



Page size, block size and Oracle block size

A reader, September 01, 2006 - 2:48 pm UTC

Hi Tom,

1. In a Datawarehousing environment (AIX and 10gR2), when the page size and the o.s block size (the one that is set in the LVM) is 8K, what would you suggest the database block size to be for best performance?

2. Is the performance bad if database block sizes are greater than 8K on AIX?

Thanks,


Chosing on the block size

Sanji, December 13, 2006 - 4:44 pm UTC

Tom,

In one of your replies, you mentioned

"If you want the rule of thumb, go with 8k.

Else you would need to test on your system to see which gives you the over all best -"

What is the best way of testing the system ?
What wait events should we look for that "might" influence the idea of moving to a higher/lower db_block_size.

Thanks
Sanji

Tom Kyte
December 15, 2006 - 7:57 am UTC

set it up, test it out - you know, simulate, benchmark.

Frankly - use 8k pretty much.


Choosing the largest block size for indexes?

Dimosthenis Vodentsis, January 14, 2007 - 12:08 pm UTC

Hello Tom,

In an Oracle Press book for DBAs (Oracle 9i version) I've read that for indexes the best practice is to use the biggest block size you can (so for Linux, 32k). That is what I have done: the primary db_block_size is 8k and all the indexes reside in 32k.
1) Is this wise?
2) Is moving all the materialized views that exist in the database into that tablespace also wise?

Thank you.

File system cluster size vs DB Block size

Julian, February 08, 2007 - 11:42 pm UTC

Hello Tom,

We have a W2K3 cluster solution for a SAP/Oracle 10g system.
The Oracle DB block size is now 8kb but the file system cluster size is 4kb.

1. Do you see any problem for I/O performance with this configuration?
2. We plan to change the file system cluster size from 4kb to 32kb. Do you see any problem with this move?

Thank you,
Julian

Multiple block sizes and the large pool

Stewart W. Bryson, May 09, 2007 - 11:17 am UTC

Tom:

I've been thinking about the hybrid environments proposed by some of the postings, and am trying to figure out how parallel query and the large pool play into this. Supposing that I have an 8K block size, and configured 32K tablespaces holding historical or DSS data, how would the default block size of 8K affect parallel query reading 32K blocks into the large pool, which is configured using the default block size of 8K? It seems that the 32K blocks in the DSS tablespace would be read using 4 8K blocks in the large pool. Would this be detrimental? I understand the overhead of more blocks when it comes to the buffer cache (serialized latches looking for blocks), but is there a per-block overhead associated with the large pool, which holds the results of parallelized reads against tables and indexes?

Thanks.
Tom Kyte
May 11, 2007 - 9:51 am UTC

large pool is used for message buffers, it is not "read into"


Multiple block sizes and the large pool cont.

Stewart W. Bryson, May 11, 2007 - 11:13 am UTC

What is the content of the parallel message buffers? I was under the impression it was "data", being assembled from all the producers into one result set ready for consumption. If it is simply just messages, meaning "this producer goes to that consumer", then that seems like a relatively small data set, and putting it in the large pool would seem like overkill.
Tom Kyte
May 11, 2007 - 11:49 am UTC

messages are the data transferred between the processes - they are not "blocks", they are data structures.

the large pool is not used as a cache for blocks, the large pool is not "read into", the large pool does not have "blocks" in it

the large pool is a place to allocate shared memory from by processes that need memory that is to be managed with "allocate and free" calls - not based on an LRU type of usage (like the shared pool is)

Looking for blocks in buffer cache

Matt, May 11, 2007 - 6:18 pm UTC

In one of your previous responses on this page you said...

The larger the SGA, the more optimum larger blocks become -- less blocks to manage and look at when
looking for a block in the cache.


Does that mean that when Oracle checks to see if a block is in the buffer cache it has to scan every block every time?
Tom Kyte
May 14, 2007 - 12:40 pm UTC

no, it keeps lists of lists.

we take the dba (database block address - file+block) and hash it, that'll tell us which of N lists to look through.
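As a small illustration of the "dba" being hashed -- it is just a file number plus a block number packed into one value (the numbers below are arbitrary):

-- dbms_utility.make_data_block_address packs file# and block# into the single
-- address that the cache then hashes onto one of those lists
select dbms_utility.make_data_block_address(4, 123) dba from dual;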

Increasing db_block_size from 2k to 8k

Yanni, July 16, 2007 - 4:22 pm UTC

What database parameters do we need to verify when increasing db_block_size from 2k to 8k in 9iR2? Here are my settings with the 2k block size:

The actual OS block size = 4k

Thanks
Tom Kyte
July 17, 2007 - 11:40 am UTC

not sure what you mean by "verify"

control file block size

A reader, September 04, 2007 - 1:16 am UTC

Hello,Mr. Tom:
I found my control file block size and data file block size are different. I do not think I set a control file block size when I created the db; where does it come from?

SQL> select name, block_size, file_size_blks from v$controlfile;

NAME                                                                    BLOCK_SIZE FILE_SIZE_BLKS
----------------------------------------------------------------------- ---------- --------------
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\CONTROLFILE\O1_MF_390MZRBL_.CTL        16384            450

SQL> show parameter block_size;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_size                        integer     8192

SQL> select name, block_size from v$datafile;

NAME                                                                        BLOCK_SIZE
--------------------------------------------------------------------------- ----------
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\DATAFILE\O1_MF_SYSTEM_390N2CK3_.DBF         8192
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\DATAFILE\O1_MF_UNDOTBS1_390N2D0W_.DBF       8192
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\DATAFILE\O1_MF_SYSAUX_390N2CW2_.DBF         8192
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\DATAFILE\O1_MF_TEST_390N2DBX_.DBF           8192
C:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\DATAFILE\O1_MF_DEFAULT__390N2DJK_.DBF       8192
C:\ORACLE\PRODUCT\10.2.0\DBDATA\ORCL\DATAFILE\O1_MF_TEST100_3BYJR6R3_.DBF         8192
C:\DUL.DBF                                                                        8192

7 rows selected.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for 32-bit Windows: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production



Regards !
Alan

Tom Kyte
September 05, 2007 - 1:40 pm UTC

the redo and control file block sizes are OS block sizes.

you only pick the db-block size.

Impact on database parameters

Yannick, September 06, 2007 - 3:53 pm UTC

What database parameters are impacted and need adjustment when increasing db_block_size from 2k to 8k in 9iR2?

Thanks


Tom Kyte
September 11, 2007 - 8:28 am UTC

probably......

none. you are using db_cache_size already right? so there you go.

you should have defaulted most others - like multi-block read counts and such - so none.
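If it helps, a quick sanity check of the settings mentioned above in SQL*Plus (a minimal sketch; nothing needs to be changed):

show parameter db_block_size
show parameter db_cache_size
show parameter db_file_multiblock_read_count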

Contention

Aru, June 22, 2008 - 5:39 pm UTC

Hi Tom,
I am trying to get my head around block contention.
You said above: "you will have more people contending for the same block perhaps. We do lots of stuff at the block level -- you may notice more contention in this area."

Please can you elaborate? Does this refer to DML? I thought that once the block is read into the buffer cache, two or more sessions can simultaneously read or write to the buffer -- or is it serialized?
Please explain how exactly there is any contention for data blocks.
Thanks Tom,
Regards,
Aru.

Tom Kyte
June 22, 2008 - 9:50 pm UTC

well, in the buffer cache, the block is a data structure. At the lowest level, the modifications to that block must be serialized (eg: in order to MODIFY the bits and bytes on a block, you must have it in current mode. Only one transaction can have that block in current mode at the same time. So, the modifications to the bits and bytes are serialized - not by commit/rollback - but by latches)

Also, there is a maximum number of transactions that can have access to the block (255) at the same time - and that number (255) might be much lower in practice if the block is near full when lots of transactions try to go after it... (there is a transaction header in the block - it starts at INITRANS size and ends at <end of space, cannot grow the transaction list> or <MAXTRANS which is 255 in current releases> - whichever comes first).
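As a rough sketch (names and values are illustrative), this is how you would reserve more transaction slots and free space up front on a table whose blocks many sessions hit concurrently:

create table hot_rows
( id  number,
  val varchar2(30)
)
initrans 10
pctfree  20;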

16K block size

A, April 09, 2009 - 11:03 am UTC

Hello Tom,
I created a tablespace with a 16K block size and moved one of the big tables into it.

db_file_multiblock_read_count = 16

When I ran a few queries against this table (full table scans) there was no difference in the timings compared to the timings with the 8k block size.

The UNIX (SunOS 5.10) I/O block size is 8k:

df -g
8192 block size

Is this the reason why I'm not seeing any improvement?

Or what other parameters/setting do I need to check?

Tom Kyte
April 13, 2009 - 3:58 pm UTC

think about it

you still read Xmb of data....
you still read Xmb of data....



db_file_multiblock_read_count is used to determine the IO size. We take it at startup and multiply it by the default block size. Your IO size is therefore 16 x 8k = 128k.

You'll either read 16x8k blocks in a multi-block read (at best, probably less, we have to read around blocks that are in the cache)

Or you read 8x16k blocks in a multi-block read (at best....)


But either way, you are reading Xmb of data, 128k at a time.


You do not mention a version, but in current releases (10g and up), you should NOT set this. We'll compute your observed multi-block read count and use that to cost queries, but we'll always use the MAXIMUM IO SIZE for your platform to do the reads (best of all possible worlds)
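A hedged sketch of removing an explicit setting so the database can compute it itself (this assumes an spfile is in use, and the instance needs a restart for it to take effect):

alter system reset db_file_multiblock_read_count scope=spfile sid='*';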


Why did you think they might be different in response times?

How to..?

Jatin, July 29, 2009 - 4:17 am UTC

In case we are convinced to move from 4k to 8k, what could be the best steps:

1. alter system set db_8K_cache_size = <some value>
2. create a tablespace with 8K storage parameters
3. move tablespace from 4K to 8K

repeat the above steps for all tablespaces (sketched below)?

My question is:

1. Can we do this for the SYSTEM tablespace as well?
2. Once all tablespaces are moved, will db_nK_cache_size still hold relevance?
3. Our warehouse has a 4K block size and all the reasons in the reply to the original post sound convincing. Is it advisable to move the entire system from 4K to 8K or 16K (of course first on QA and so on)? Can you give me some checkpoints before taking this step?

Thanks
Jatin
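For clarity, a sketch of the steps listed in this question, with illustrative names and sizes (see the follow-up answers below before deciding to mix block sizes):

alter system set db_8k_cache_size = 256m;

create tablespace data_8k
  datafile 'data_8k_01.dbf' size 10g
  blocksize 8k;

alter table big_table move tablespace data_8k;
alter index big_table_pk rebuild tablespace data_8k;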

Higher Block Size

Jatin, August 04, 2009 - 9:25 am UTC

Hi Tom

Is there really an issue in moving up from 4k to 8k (or 16k) for the whole warehouse database? Kindly shed some light.

Best Regards
Jatin
Tom Kyte
August 04, 2009 - 3:22 pm UTC

the major issue is rebuilding the entire database.

other than that, no.

Export/Import for rebuild..?

Jatin, August 05, 2009 - 2:41 am UTC

Hi Tom

Thanks for the reply.

Does rebuilding an entire DB mean recreating a new DB with a larger block size and doing an export/import? I don't see any other way to move the entire database to a larger block size -- please suggest; the DB I have in mind is 1 TB in size.

Best Regards
Jatin
Tom Kyte
August 05, 2009 - 8:51 am UTC

it would require an entire rebuild - yes. export/import, dump/load, data pump, create table as select from table@remote, whatever you want to do - but it would involve taking all of the data out of one database and putting it into another.

You could use multiple block sizes - but you would still have to rebuild EVERYTHING but system (alter table t move, alter index i rebuild) and then you would end up with a complete disaster - multiple buffer caches, no automatic anything (non-standard block sizes do not participate in automatic memory anything), completely non-standard configuration that makes you totally different from everyone else (meaning you'll have a good chance of encountering an oddity that no one else has). Strongly recommend against this.
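To illustrate one of the rebuild routes listed above, here is a hedged sketch of pulling a table into the new (larger block size) database over a database link -- link, user and table names are illustrative:

create database link old_db
  connect to scott identified by tiger
  using 'OLDDB';

create table big_table as
  select * from big_table@old_db;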

blocksize

p, June 08, 2010 - 2:12 am UTC

What are the problems if the database block size is better then the os blocksize
Tom Kyte
June 09, 2010 - 8:46 am UTC

is better?

or "is bigger"?

It is almost always bigger since OS block sizes are typically 0.5k or 1k in size. No problem.

What about the effect on inserts?

A reader, September 02, 2010 - 3:13 pm UTC

Tom,

How do larger block sizes impact inserts? Since there will be a relatively smaller number of physical I/O calls, will that make inserts any faster? Let's consider a warehouse with typical high-load jobs and an extremely quick operational data store doing around 500 inserts/sec.

Thanks,

Tom Kyte
September 09, 2010 - 6:59 pm UTC

DBWR does the physical IO's - they happen in the background, they will not really impact YOU the end user and your inserts.

Does a larger block size impact inserts - sure. But how is the question....

it could make them slower (you have more contention with bigger blocks of course, more concurrent attempts to use it, less scalable).

it could make them faster (if you are doing a direct path load in a data warehouse and are the only thing writing to the block)

it might make no difference at all (in most all cases, probably this is true)

We tend to use smaller blocksizes in highly concurrent operations - bigger block sizes in warehouses.

but in general - 8k is "perfect"

Thank you!

A reader, September 16, 2010 - 6:46 pm UTC

Thank you very much for the explanation!

Data blocks - Data from different tables

Kailash, March 31, 2011 - 7:45 am UTC

Dear Mr. Tom,
I was reading about database blocks in your Expert One-On-One Oracle book. Based on that, the following is my understanding of Oracle storage organization:
1 tablespace can have multiple segments
1 segment can span multiple extents
1 extent can be across multiple database blocks (the most granular)

Each database block has an overhead section and a data section. The overhead section has an entry for the table directory (i.e. the list of tables whose data is stored in this block).

Question
#1. Is my understanding correct?
#2. If an Oracle block stores a table directory, then, for example, if table A and table B have storage attributes defined to store them in the same tablespace TS1, is it possible that a block in TS1 can store data from both tables A and B?
#3. When Oracle does a physical read of table A from the datafiles to bring this block into the DB buffer cache in the SGA (assuming the block is not already there), it is very possible that the data for table B is also loaded into the buffer cache. Is that correct?

As always, your explanations are very helpful
Tom Kyte
April 12, 2011 - 10:10 am UTC

1) yes, a tablespace has zero or more segments.
a segment is comprised of zero (in 11g, one in 10g and before) or more extents
an extent is comprised of some number of blocks (greater than zero)

2) yes, read about clusters in the TABLE chapter in that book, I cover that.

3) correct, if you are using a cluster.
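A minimal sketch of an index cluster (all names are illustrative), where rows from two tables that share the cluster key are stored on the same blocks:

create cluster emp_dept_cluster ( deptno number ) size 1024;

create index emp_dept_cluster_idx on cluster emp_dept_cluster;

create table dept_c
( deptno number primary key,
  dname  varchar2(30)
) cluster emp_dept_cluster(deptno);

create table emp_c
( empno  number primary key,
  ename  varchar2(30),
  deptno number
) cluster emp_dept_cluster(deptno);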

scalability problem

A reader, September 28, 2012 - 8:55 am UTC

Hello,

Our data load process works row by row. This is inevitable due to the way the tables are designed. For each row it extracts from the source, the program issues around 50 different kinds of DML statements that work on a few tables. The average time taken is around 400ms for each row of operation (that is, 400ms for those 50 DMLs, which is around 8ms per DML). This is not bad. However, to process 200M rows, one job instance will take around 600 days to complete.

To reduce the overall time, the plan is to run 100 such jobs in the hope that together they complete it in 6 days.

However, it doesn't work that way. All 100 jobs are going to work on the same set of tables and indexes. Besides this, there are a bunch of "gc" waits as well. I know this is a bad design (it should have done bulk operations, but it couldn't).

All the DMLs are very OLTP in nature. Each inserts/updates just one row.

Some action items we have been thinking of:
1. Shut down all instances except one to avoid "gc" events
2. Run around 30 jobs on one instance until CPU utilization gets close to 90%
3. For certain big tables, create indexes as global hash partitioned indexes with pctfree 80
4. Create 2K block size tablespaces and move very active tables and indexes to them

I am interested in the 4th item. We haven't tested the pros and cons of this. Would it work? When reporting people run queries or other SELECT statements, would it impact them, since the number of IOs would be higher in that case?

As a DBA, what other options can the database provide to enhance the speed and at the same time reduce contention?

Thanks,

Tom Kyte
September 28, 2012 - 4:49 pm UTC

Our data load process works row by row. This is inevitable due to the way the tables are designed.


hmmm, inevitable because your software was designed one way. It is software because.... it is changeable...

if you do row by row and you take 1ms to process a row (very very very good and fast) and you have 5,000,000 rows to load, you are looking at almost an hour and a half. Take that up to tens of millions, or hundreds of millions, and you see where this goes.

It scales very very well. It scales perfectly linearly. It just isn't very fast.

The average time taken is around 400ms for each row of operation (that is, 400ms for those 50 DMLs, which is around 8ms per DML). This is not bad.


wow, you and I have differing opinions of what bad means. very different. that sounds horrifically bad - 1/2 second? per row?

To reduce the overall time, the plan is to run 100 such jobs in the hope that together they complete it in 6 days.


hah, hope you have more than 100 cpus.

(it should have done bulk operations, but it couldn't).


yes, yes it could. It would have taken someone else to code it maybe but it could definitely utilize at least SOME bulk operations - at the very very very least. If not entirely.
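A hedged sketch (table and column names are illustrative) of moving from row-by-row DML towards set-based and array processing:

-- set based, where possible:
insert /*+ append */ into target_t ( id, val )
select id, val
  from source_t;

-- where procedural logic is unavoidable, at least array process:
declare
  type t_rows is table of source_t%rowtype;
  l_rows t_rows;
  cursor c is select * from source_t;
begin
  open c;
  loop
    fetch c bulk collect into l_rows limit 1000;
    forall i in 1 .. l_rows.count
      insert into target_t values l_rows(i);
    exit when c%notfound;
  end loop;
  close c;
  commit;
end;
/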

All the DMLs are very OLTP in nature. Each inserts/updates just one row.


common problem, people turn their batch systems into huge OLTP systems and then try to scale up by running hundreds of threads - but then you have massive contention and cpu overload and it doesn't work anyway.

1) probably not, use what you have, use LESS threads

2) running up to 90% utilization in OLTP is death, you need to keep it at 66% or less.

3) why? pctfree only counts on the build - after that we go to 100%, pctfree doesn't make sense for an index so much as a table. It is only used for builds and rebuilds (a sketch follows this list).

4) ugh, don't even - no. would you like to manage the caching and exchanging of four times as much stuff??? in RAC???
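A minimal sketch of the point in 3) above (index and table names are illustrative) -- the pctfree you give an index is only honoured while it is built or rebuilt, not during later inserts and updates:

create index txn_big_idx on txn_big(acct_id, txn_date) pctfree 80;

alter index txn_big_idx rebuild pctfree 10;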


As a DBA, what other options can the database provide to enhance the speed and at the same time reduce contention?


in this case, not too much. You are hitting pure "math" here. 400ms *per row* is absurd, it is huge. especially as the designers of this system knew how much data they would have to crunch.

the only way this is going to go fast - dead serious - is a serious long hard look at the approach the application takes to the database. DBA's - they hope for a 10/20% change in performance. If you want orders of magnitude (and you need it), you need to look to the developers, to the application.

sorry - there quite simply is no quick switch or configuration change you can make here. The math is overwhelming.





follow up,

A reader, October 01, 2012 - 9:04 am UTC

Thank you. I am glad and sad. Glad that we are both on the same page about the performance enhancement. Sad because it is a pretty big challenge to steer people in the direction of redesign/redevelopment.

Coming back to the 2K block size, would it at least help the OLTP-style load process when multiple processes are performing the loads?

Thanks,

Tom Kyte
October 08, 2012 - 3:21 pm UTC

I can see it both helping in one way, hurting in another.

more blocks, more block overhead, less caching of your data.

more blocks, less possible concurrency issues, maybe slightly reduced concurrency based waits (which may or may not reduce runtimes!!! less waits <> faster)

more blocks, more index maintenance


DB_FILE_MULTIBLOCK_READ_COUNT

John Davey, October 05, 2012 - 10:32 am UTC

I am creating a new DW application on 11.2.0.3

I get the impression that you don't think that the parameter DB_FILE_MULTIBLOCK_READ_COUNT needs to be set or configured in any way - is that because the mbrc measure is now calculated by the collection of system statistics?

I currently run this DW on an 8k block size but am thinking of switching to 16k. We update rarely - once a week - and have our biggest performance issues around full table and index scans. Do you think this is worth a try?
Tom Kyte
October 09, 2012 - 12:32 pm UTC

it is automatically set for you - it does not need to be set.

we'll cost queries using the observed read count (what you really achieve on your system)

we'll run queries with the maximum IO size requested - just in case we can read that many blocks every now and then.

so you get the benefit of the right cost and the maximum io size when possible.


if you set it - that'll be used for the cost and the read - even if we could have read more.



it is doubtful a larger block size would help you - whether the blocks are 1k, 8k, or 16k, we usually just do 1MB IOs. The only benefit it might have would be to reduce the block overhead marginally.


A much bigger bang for the buck would be table compression and index key compression.
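As a hedged sketch of those two options (table and index names are illustrative):

alter table fact_sales move compress;

alter index fact_sales_ix rebuild compress 2;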

Creating an index on a small key raises ORA-01450: maximum key length (6398) exceeded

Suresh, August 21, 2018 - 8:22 am UTC

When creating an index on a value that should only be around 13 bytes, it raises an error saying the maximum key length of 6398 bytes is exceeded (ORA-01450).

SQL> CREATE INDEX test003_fbi02 ON test003 ( damotest('HBIL','PCTPCPAM','RRN',c01) ASC );
CREATE INDEX test003_fbi02 ON test003 ( damotest('HBIL','PCTPCPAM','RRN',c01) ASC )
*
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded


But when I use the process below, it works. Why?!

SQL> create tablespace testidx01 datafile 'testidx01.dbf' size 20m blocksize 16K;

Tablespace created.

SQL>
SQL> create index t1_idx on test003 (substrb(C01,1,100)) tablespace testidx01;

Index created.

SQL>
SQL> show parameter max_string_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
max_string_size                      string      EXTENDED
SQL>
Connor McDonald
August 21, 2018 - 12:01 pm UTC

I'm assuming your function 'damotest' returns a varchar2?

We assume the largest possible varchar2 that might come back when deciding if you might exceed the index key limit.

So try this:

CREATE INDEX test003_fbi02 ON test003 (
  cast(damotest('HBIL','PCTPCPAM','RRN',c01) as varchar2(50))
);


where you choose an appropriate size

Is this recommendation still valid in Oracle 12c or 18?

Herta, November 14, 2018 - 4:15 pm UTC

The documentation suggests that it still is, but our ERP vendor (OLTP database) wants us to set db_block_size to 32k, claiming the minimum value for db_block_size is 16k in Oracle 12.
Chris Saxon
November 15, 2018 - 11:00 am UTC

The default block size is still 8k in 18c. And as the docs say:

A block size of 8 KB is optimal for most systems.

https://docs.oracle.com/en/database/oracle/oracle-database/18/tgdba/IO-configuration-and-design.html#GUID-1D9458EB-7D82-47B7-81D8-A5A34D11C92C

Best export strategy

mar Rodríguez, September 28, 2019 - 12:25 pm UTC

Hi
Which do you think is the best export strategy to go from a 32K block size database on Unix into a 16K block size database on Windows Server 2012 R2?

Thanks so much
Connor McDonald
October 01, 2019 - 1:04 am UTC

I don't think Windows supports 32k blocks, so I don't think you can use transportable tablespaces.

So that leaves datapump as the only real option here.
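A hedged sketch of the Data Pump route (directory object, schema and file names are illustrative):

on the 32K Unix source:
  expdp system schemas=APP directory=DP_DIR dumpfile=app.dmp logfile=app_exp.log

on the 16K Windows target, after copying app.dmp and creating a matching directory object:
  impdp system schemas=APP directory=DP_DIR dumpfile=app.dmp logfile=app_imp.log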

Table structure vs block size

Johan Snyman, October 01, 2019 - 9:34 am UTC

From the following article: http://jonathanlewis.wordpress.com/2014/12/23/just-in-case/
the planned structure of any high-volume tables could play a role in selecting the block size, given that the throughput of, and redo generated by, inserts into a table whose row definition is larger than the block size can be severely impacted.

Regards
Johan
Chris Saxon
October 01, 2019 - 1:19 pm UTC

Thanks for fishing that out!