Extents are contiguous space on disk
W. I. Brown, July 26, 2001 - 4:02 pm UTC
You told Tianhua that an extent is not contiguous space on disk. Both my experience and Oracle's documentation tell me that you are wrong. "Oracle8: The Complete Reference", chapter 19 says that a Tablespace is a file. This is the level that the operating system manages. Space management within a Tablespace is handled by the database engine. Within a Tablespace are Tables, Indexes, and Clusters. Tables are made up of Segments, which are made up of Extents. Extents allocated to the same Table do not have to be, and normally are not, contiguous, but each extent is a contiguous block (not an OS block) of space within the Tablespace. The size of the Extent is determined by by the storage parameter specification for the Table or Tablespace. If you attempt to allocate a new extent that is larger than the largest contiguous block of space available within a Tablespace, the allocation will either extend the Tablespace if Autoextend is turned on, or it will fail. If you attempt to allocate a new Extent for a Tablespace that is larger than the largest contiguous block of space on the disk, it will fail.
TO: W.I. Brown
Mark, July 26, 2001 - 8:16 pm UTC
W.I. Brown:
Take a look at this...
The formatting will be hard to follow, but the query illustrates that extents are, in fact, NOT contiguous ON DISK.
SQL> select segment_type, tablespace_name, extent_id, file_id, block_id from dba_extents where segment_name = 'DE_TX_NODE' and extent_id < 3;
SEGMENT_TYPE TABLESPACE_NAME EXTENT_ID FILE_ID BLOCK_ID
---------------- ---------------- ---------- ---------- ----------
TABLE USERS 0 16 1321
TABLE USERS 1 15 1241
TABLE USERS 2 16 63417
3 rows selected.
Given your experience, please explain how extent_id 0 is in file_id 16 and block 1321 while extent_id 1 is in file_id 15 and block 1241? Extent 0 and extent 1 are in separate files! If extents are contigious on disk, how could it be? Also, notice that extent_id 2 is back in file_id 16, but at block 63417...
What Tom is saying makes perfect sense!
Got it...
Mark, July 26, 2001 - 9:06 pm UTC
I think I understand "their" argument now... Using my data as an example... The tablespace (users) is a LMT with UNIFORM 128K extents. So "they" are saying that all 128K of each extent is physically contiguous on disk? No way! My OS block size is 512 bytes. I can guarantee that that all 256 OS blocks that are required to make up one of my 128K extents are *not* next to each other on disk.
Anyway, sorry for adding confusion to an already 'beaten to death topic.' You must have the patience of Job. Have a great vacation.
Perfect Explain
A reader, July 26, 2001 - 11:37 pm UTC
Hi, Tom,
hope you climb the mountain, and relax.
And also hope they are "contiguous" on the road. ;)
Have fun and we are waiting for you back.
Thanks
A reader, July 27, 2001 - 4:02 am UTC
A reader, July 27, 2001 - 2:19 pm UTC
not
Another question then arises
Ganesh Aditham, July 27, 2001 - 3:20 pm UTC
PHYSICALLY -- no, block 1 of an
extent might be very very far away from block 2 on disk. LOGICALLY -- yes,
block 1 is right before block 2 in the file.
Then what's the point in decreasing number of extents into 1 extent for objects by reorg ,as even in 1 extent the object gets dispersed all over the disk and same will be the situation for objects having multiple extents,?
How will a object having less number of extents give a better performance?
Can somebody clearify on this
Can't let it die
Mark, July 27, 2001 - 10:46 pm UTC
OK, I can't let this die, so... I downloaded a utility from sysinternals
(formerly ntinternals) that displays the distribution of a file across
clusters on an NTFS disk.
This is what I did:
SQL> create tablespace ts_temp
datafile 'C:\ORACLE\ORADATA\CDS1\TSTEMP01.DBF' size 65560K reuse
autoextend off extent management local uniform size 64M;
This is not a contrived example, as I have a tablespace for rollback segments
that are 64M each. Notice that this is a LMT with UNIFORM extent allocation of 64M.
The size of the file is 64M + 64K (for the bitmap). Therefore, I am forcing the
situation where this file can hold only a single extent. I believe that the debate
is whether or not a single extent will be physically contiguous on disk. I then used
the sysinternals utility to look at the file. It is in *two* fragments (physically
located on *two* non-adjacent disk clusters), *not* a single contiguous fragment. Remember this file has a *single* extent in it.
The output of the utility is:
NT 4.0 Defragmentation Demonstration Program V1.0
Copyright (C) 1997 Mark Russinovich
[ snip ]
Opening volume: C:
Open status: The operation completed successfully.
Enter commands ('?' for help):
: n C:\ORACLE\ORADATA\CDS1\TSTEMP01.DBF
Clusters for file: C:\ORACLE\ORADATA\CDS1\TSTEMP01.DBF
VCN: 0 LCN: 1966165 LEN: 44
VCN: 44 LCN: 2608488 LEN: 16348
Enumerate file clusters: STATUS_SUCCESS
The cluster size on my system is 4k, and you can see that the file occupies 16,392
clusters == len fragment1 + len fragment2. Notice that the 'break' is not at a location
where the bitmap would be in one cluster and the 'real' extent in another cluster.
(It is 176K into the file). Now I know someone will say, "Yeah, but that's Windows!
A 'real' system wouldn't do it that way." I believe that this would be the case on
whatever platform you happen to be running. I am also running on Solaris and AIX, but I
don't have a utility to map the file allocation on disk, so I can't verify my suspicion.
In any case, I simply don't understand why it is so difficult to accept that a filesystem
is not going to allocate the space requested by Oracle in contiguous blocks on disk. If
someone can convince me otherwise, I would admit 'defeat'. But at least try to make a
stronger argument than 'not'. Sorry about the formatting, I don't know how (or if it is possible) to change the formatting in the 'feedback' facility.
Database files are (almost) never contiuous on a disk
Harrison, August 01, 2001 - 10:45 am UTC
Hi: Just to point out how little the OS cares about our concept of being physically contiguous, due to their size, non-trivial database files are large enough that they are always on several platters, making up at least one cylinder in a disk. You can be assured that if the controller can connect parts of many platters so the file is read correctly, it is not more difficult to skip parts because they are in another file. The controller also skips bad parts of disks (and reads a spare side to fill in the data).
The pictures that utilities make, showing a file system as one big rectangle is just a logical concept that combines many different platters, with multiple heads on each side, and not the physical reality. The important part is that the file system can read the file in order, not that it is all in one spot. It isn't.
Who says that there is no use of defraging a table.
sarb, August 02, 2001 - 11:52 am UTC
Its really usefull or i will say performance enhancement by defraging a table . Eg. its better to have 5-10 larger extents for a table sized 10 MB rather than 1280 Extents of 8K each. And what about at table sized 530 MB with 2200 extents rather same table with 53 extents, certainly # of extents effects the performance thoug they r not congiguous on disk physically. ( other parameters also effects the size for the extent)
Ganesh - Extents
A Reviewer, August 03, 2001 - 5:41 am UTC
Ganesh, To answer your question, there is no benefit to be had from 1 large extent instead of a number of smaller ones. The 1 extent is just a logical view. The data blocks are dispersed accross the disks by the OS, you cannot govern this. If you were importing data into a new table it makes sense to put all data into 1 large extent initially. After that there is no need to routinely export/import to maintain this large 1 extent. Fragmentation on Disk is a different matter!!.
An Extent minimalist
Steve H, August 03, 2001 - 6:40 am UTC
Thanks Tom (and other contributors). Before the days of AUTOEXTEND (7.3), to avoid running out of extents, we needed to keep an eye on the number of extents allocated. I guess this habit just kept rolling.
I wonder how many other things we still do out of habit ?
How many extents are too many.
Richard, August 03, 2001 - 12:21 pm UTC
Other RDBMS have been using bitmaps for space management since the 80s and early 90s. Having worked with DB2 and its segmented tablespaces and uniform extent allocations, the number of extents was never an issue. Exporting and importing tables just for the sake of singular extents is a wasteful enterprise. It's about time the Oracle community caught up with the rest of the database world in space management methodology.
Catherine., October 01, 2001 - 10:11 am UTC
If there was an opportunity to ask questions at ask tom I would but there is always a backlog.....
Anyway, I have a colleague who insists that when a datafile is created on a new disk thats not used for anything but Oracle datafiles, then that datafile will be created as a contiguous chunk of disk. And therefore, seeing as the datafile is contiguous, if you allocate 1 large extent, then this means that this extent will be contiguous physically. He explained that if there is contiguous disk space available, the datafile will be created contiguously.
Opinions and hard facts would be most welcome because like a lot of people I also believe that it doesn't matter how many extents you have.
October 01, 2001 - 3:09 pm UTC
and just how many times in your life do you get a virgin disk? thats not in a raid set? or using a volume manager?
It is a MOOT discussion. Even if they (files) were contigous (and IN GENERAL they are not), we read in chunks. In between our reads - the disk head will be sent elsewhere for someone else (unless you are running on virgin disks in single user mode -- I haven't done that since the late 80's myself -- everything has been multi-user since then and "disks" generally aren't "disks" anymore -- they are volumes under the control of some software).
In short, the facts are life are:
- lots of extents are OK
- The disk will be fragmented
- it is not something to lose any sleep over.
Sarb extents does NOT matter
lsc, October 02, 2001 - 6:50 pm UTC
Sarb number of extents does not matter because in OLTP system most reads are indexed which means that the actual data is obtained by first reading the index segment then with the rowid in the leaf the data is obtained, back again to read index then again data and so on. That is probably why wait events
db_file_scattered_read is referred as table scans (random read)
and
db_file_sequential_read as index scans (sequential read)
So even you have a huge extent instead of 1000 you still do random read, the head of your hard disk is always moving around from one side to another
A huge extent could benefit in Full Table Scans (but well assuming your extent is contiguos PHYSICALLY in the disk which is very very rare)
There are two levels of contiguoness, one is OS the other is Oracle, OS is PHYSICAL, Oracle is logical, from Oracle you see the segments/extents/blocks (whatever) contiguos but physically they are NOT
The only problem you may have with small extents in dictionary managed tablespace is that frequent/continuous allocation of extents would lead to contention of ST enqueue (dictionary contention) a problem which does not arise in locally managed tablespaces
I remember there was a way to dump the datafile header to a trec file then it is used to check the physical location of the datafile and you could see that the datafile had random physical addresses
logical volume manager
Reader, July 03, 2003 - 12:22 pm UTC
Tom, within oracle, is there any tool to find where exactly the data for a table is on which disk especially when we have hundreds of disks in a logical volume manager? Thanks.
July 03, 2003 - 8:08 pm UTC
with 9i OEM, there is for some setups.
otherwise, no, not really.
Related Question
Matt, July 03, 2003 - 9:53 pm UTC
I think it was James Morle (Scaling Oracle 8i) that highlighted that, for a given disk it is always better to only use the outside of a disk platter for db file storage. The premise is that the outside of the disk is furthest away from the centre of the disk and so spins fastest. As a result disk seek times are reduced.
I always wondered, how on earth do you manage to specify where on a disk a file will go.
I've never got to the stage where this has needed to be a consideration.
Does anyone have any comments on how this is achieved?
Best Regards,
July 04, 2003 - 7:36 am UTC
today -- with SAN, NAS, 96gig disks (i've heard you can get 300gig disks now even -- just think, 4disks and you too can run a terabyte data warehouse at home!), pretty much "not going to be in the realm of possibility"
link?
Reader, July 04, 2003 - 12:33 pm UTC
<quote> with 9i OEM, there is for some setups.
otherwise, no, not really. <quote>
Tom, could you direct me to a link to learn how exactly I can do it using oem?
Please find time enjoy 4th of July. No need to reply to me now. We all appreciate your dedication and you do work 365 days a year. You are truly amazing.
What about redundancy and throughput?
Matt, July 07, 2003 - 4:57 am UTC
[SNIP...]
today -- with SAN, NAS, 96gig disks (i've heard you can get 300gig disks now
even -- just think, 4disks and you too can run a terabyte data warehouse at
home!), pretty much "not going to be in the realm of possibility"
[...SNIP]
I think that J Morle also puts forward the argument for using many smaller disks rather then fewer larger ones. The more disks the less the likelihood of a failure and lengthy recovery (I take on board the comments about SAN and NAS). I guess that 300 Gig disks are fine if they satisfy your I/O requirements and you have covered the redundancy with your hardware architecture.
July 07, 2003 - 7:34 am UTC
Oh yes, 100% -- we have lots to say about RAID levels too. (see BAARF) </code>
http://www.oaktable.net/main.jsp <code>
It is just that nobody does that - they buy whatever is on sale that week and live with it. 96gig drives seem to be the norm these days.
I probably get a question along the lines of "Ok, I have 4 disks for my 300gig database, what should I do with them to maximize performance" ugg.
baarf initiative
bob, April 16, 2004 - 10:51 am UTC
where is the commentary? The link just says they are done talking about it. Your in or your out..
I would like to read some of this commentary from oaktable folks like yourself.
April 16, 2004 - 12:51 pm UTC
raid 5 can be made to work in certain situations just fine.
would i choose it in a system where I had lots of direct writes (lots of index creates, lots of lob writes, lots of direct path operations) -- not if I could avoid it.
would i choose it for temp, redo, arch? not if I could avoid it
Might i use it for datafiles (without the direct path operations above) -- which are written to by DBWR in the background (users don't wait for it) on a modestly busy system -- perhaps.
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:359617936136 <code>