How to spread a table across multiple disks without using the partitioning feature?


Question and Answer

Tom Kyte

Thanks for the question, Frank.

Asked: January 02, 2003 - 11:05 pm UTC

Answered by: Tom Kyte - Last updated: September 09, 2010 - 7:13 pm UTC

Category: Database - Version: 8.1.7

Viewed 1000+ times

You Asked

Tom,
(1) If i want to do what the subject asks in order to take advantage of the parallel disk read for the full table scan, is it possible without using partition table?
(2) Each table has one HWM even the table has multiple extents. For the partitioned table, I guess each partition has its own HWM, so a full table scan will read all blocks under HWM of every partition, am i right?
Thank you!
Frank

and we said...

1) sure, using file system striping is the most common approach -- just stripe the table's datafiles across a bunch of disks.

short of that, create a tablespace with many datafiles. Use a locally managed tablespace, of course, with a uniform extent size that will cause the table to go into 100 or so extents. Create the table in that tablespace. We'll allocate extents for it in a round-robin fashion, spreading the data out.
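As a rough illustration (plain Python, not Oracle internals -- the function name and the exact allocation policy are a simplification for this sketch), round-robin extent allocation over a tablespace's datafiles looks like:

```python
# Hypothetical sketch, NOT Oracle internals: with uniform extents in an
# LMT, extent i of a segment lands (roughly) in datafile i mod n.
def allocate_extents(n_extents, datafiles):
    """Assign extent i to datafile i mod len(datafiles) (round robin)."""
    return [datafiles[i % len(datafiles)] for i in range(n_extents)]

layout = allocate_extents(6, ["df1", "df2", "df3"])
# -> ['df1', 'df2', 'df3', 'df1', 'df2', 'df3']
```

With the datafiles placed on different disks, consecutive extents land on different spindles, which is the whole point of the technique.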


2) yes.

and you rated our response

  (24 ratings)


Reviews

allocate datafiles

January 03, 2003 - 10:29 am UTC

Reviewer: Frank from west palm beach, FL

Tom,
If a tablespace has datafiles such as df1, df2, df3...dfn, I thought Oracle would allocate extents in df1 first until it is full, then df2, df3...etc.

But your comment seems to say Oracle allocates extents across the datafiles in round-robin fashion even when the lower-numbered datafiles still have available space. Let's assume a segment's last allocated extent was in datafile n; when it needs another extent, Oracle will allocate it in datafile n+1, then n+2, ... max#, 1, 2, ... n. Is that true?
But, my test shows it's not true. Here it goes,

I had a tablespace named test with two datafiles, #10 and #11, of size 512k each. Then I created a table in tablespace test and inserted records. I found that file #11 started to fill up first, then file #10 -- not (ext#0) in file#11, (ext#1) in file#10, (ext#2) in file#11, (ext#3) in file#10....
So if I want to spread a 500M table over 5 disks, I can create 10 datafiles and spread them accordingly, like this: file#1,#6 on disk1, file#2,#7 on disk2...file#5,#10 on disk5. If I need more space in the future, I need to add datafiles in the same way: file#11 on disk1, file#12 on disk2...etc. If Oracle did allocate extents in round-robin fashion, all I would need is to create 5 big datafiles (200M each) and set the extent size to a relatively small number, i.e. 2M each; then Oracle would put the extents evenly into the 5 datafiles. But according to my observation this is not true: Oracle does NOT allocate extents in round-robin fashion by itself. The DBA forces Oracle to spread the data by spreading the datafiles in round-robin fashion -- it is not Oracle doing it.

SQL>  drop tablespace test including contents and datafiles;

Tablespace dropped.

SQL> create tablespace test 
  2  datafile 'd:\frank\test01.dbf' size 512k
  3  extent management local;

Tablespace created.

SQL> alter tablespace test
  2  add datafile 'd:\frank\test02.dbf' size 512k;

Tablespace altered.

SQL>  create table scott.test 
  2   (c char(2000))
  3   tablespace test;

Table created.

SQL> select extent_id, file_id, block_id, bytes, blocks from dba_extents
  2  where segment_name='TEST'
  3  and owner='SCOTT';

 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS
---------- ---------- ---------- ---------- ----------
         0         11         17      65536         16

SQL> begin
  2    for i in 1..32 loop
  3      insert into scott.test values (i);
  4      commit;
  5    end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> select extent_id, file_id, block_id, bytes, blocks from dba_extents
  2  where segment_name='TEST'
  3  and owner='SCOTT';

 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS
---------- ---------- ---------- ---------- ----------
         0         11         17      65536         16
         1         11         33      65536         16
         2         11         49      65536         16

SQL>  begin
  2     for i in 1..300 loop
  3      insert into scott.test values (i);
  4      commit;
  5    end loop;
  6  end;
  7  /
 begin
*
ERROR at line 1:
ORA-01653: unable to extend table SCOTT.TEST by 16 in tablespace TEST
ORA-06512: at line 3


SQL> select count(*) from scott.test;

  COUNT(*)
----------
       223

SQL>  select extent_id, file_id, block_id, bytes, blocks from dba_extents
  2   where segment_name='TEST'
  3   and owner='SCOTT';

 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS
---------- ---------- ---------- ---------- ----------
         0         11         17      65536         16
         1         11         33      65536         16
         2         11         49      65536         16
         3         11         65      65536         16
         4         11         81      65536         16
         5         11         97      65536         16
         6         11        113      65536         16
         7         10         17      65536         16
         8         10         33      65536         16
         9         10         49      65536         16
        10         10         65      65536         16

 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS
---------- ---------- ---------- ---------- ----------
        11         10         81      65536         16
        12         10         97      65536         16
        13         10        113      65536         16

14 rows selected.

SQL> select file#, substr(name, 1, 30) from v$datafile
  2  where file# in (10,11);

     FILE# SUBSTR(NAME,1,30)
---------- ------------------------------
        10 D:\FRANK\TEST01.DBF
        11 D:\FRANK\TEST02.DBF 

Tom Kyte

Followup  

January 03, 2003 - 11:17 am UTC

ops$tkyte@ORA817DEV> drop tablespace testing including contents;

Tablespace dropped.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> create tablespace testing
  2  datafile '/tmp/testing01.dbf' size 1m reuse
  3  extent management local uniform size 128k
  4  /

Tablespace created.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> alter tablespace testing
  2  add datafile '/tmp/testing02.dbf' size 1m reuse
  3  /

Tablespace altered.

ops$tkyte@ORA817DEV> drop table t;

Table dropped.

ops$tkyte@ORA817DEV> create table t ( x char(255) ) tablespace testing;

Table created.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> begin
  2     for i in 1 .. 20000000
  3     loop
  4          insert into t values (i);
  5          commit;
  6     end loop;
  7  end;
  8  /
begin
*
ERROR at line 1:
ORA-01653: unable to extend table OPS$TKYTE.T by 16 in tablespace TESTING
ORA-06512: at line 4


ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> select extent_id, file_id from dba_extents where segment_name = 'T' and owner = user
  2  order by 1
  3  /

 EXTENT_ID    FILE_ID
---------- ----------
         0         10
         1         12
         2         10
         3         12
         4         10
         5         12
         6         10
         7         12
         8         10
         9         12
        10         10
        11         12
        12         10
        13         12

14 rows selected.

ops$tkyte@ORA817DEV> drop tablespace testing including contents;

Tablespace dropped.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> create tablespace testing
  2  datafile '/tmp/testing01.dbf' size 1m reuse
  3  /

Tablespace created.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> alter tablespace testing
  2  add datafile '/tmp/testing02.dbf' size 1m reuse
  3  /

Tablespace altered.

ops$tkyte@ORA817DEV> drop table t;
drop table t
           *
ERROR at line 1:
ORA-00942: table or view does not exist


ops$tkyte@ORA817DEV> create table t ( x char(255) ) tablespace testing;

Table created.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> begin
  2     for i in 1 .. 20000000
  3     loop
  4          insert into t values (i);
  5          commit;
  6     end loop;
  7  end;
  8  /
begin
*
ERROR at line 1:
ORA-01653: unable to extend table OPS$TKYTE.T by 93 in tablespace TESTING
ORA-06512: at line 4


ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> select extent_id, file_id from dba_extents where segment_name = 'T' and owner = user
  2  order by 1
  3  /

 EXTENT_ID    FILE_ID
---------- ----------
         0         10
         1         12
         2         10
         3         12
         4         10
         5         12
         6         10
         7         12

8 rows selected.


<b>i cannot reproduce using either of LMT with UNIFORM or DMT on 817 -- however, using auto-allocate:</b>

ops$tkyte@ORA817DEV> drop tablespace testing including contents;

Tablespace dropped.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> create tablespace testing
  2  datafile '/tmp/testing01.dbf' size 1m reuse
  3  extent management local
  4  /

Tablespace created.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> alter tablespace testing
  2  add datafile '/tmp/testing02.dbf' size 1m reuse
  3  /

Tablespace altered.

ops$tkyte@ORA817DEV> drop table t;
drop table t
           *
ERROR at line 1:
ORA-00942: table or view does not exist


ops$tkyte@ORA817DEV> create table t ( x char(255) ) tablespace testing;

Table created.

ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> begin
  2     for i in 1 .. 20000000
  3     loop
  4          insert into t values (i);
  5          commit;
  6     end loop;
  7  end;
  8  /
begin
*
ERROR at line 1:
ORA-01653: unable to extend table OPS$TKYTE.T by 128 in tablespace TESTING
ORA-06512: at line 4


ops$tkyte@ORA817DEV>
ops$tkyte@ORA817DEV> select extent_id, file_id from dba_extents where segment_name = 'T' and owner = user
  2  order by 1
  3  /

 EXTENT_ID    FILE_ID
---------- ----------
         0         10
         1         10
         2         10
         3         10
         4         10
         5         10
         6         10
         7         10
         8         10
         9         10
        10         10
        11         10
        12         10
        13         10
        14         10
        15         12

16 rows selected.


<b>I did reproduce it with auto-allocate -- so use uniform extent sizes</b>

just like I said:

Use a locally managed tablespace of course. <b>Use a uniform extent size that will cause the table to go into 100 or so extents.</b> Create the table in that tablespace.
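The "100 or so extents" sizing above is just a division; a hypothetical helper (the function name is made up for illustration) shows the arithmetic:

```python
# Hypothetical helper (name made up for illustration): pick a uniform
# extent size so a table of a given size ends up in roughly
# target_extents extents, as the answer above recommends.
def uniform_extent_size_mb(table_size_mb, target_extents=100):
    # Integer-divide, but never go below 1 MB for tiny tables.
    return max(1, table_size_mb // target_extents)

size = uniform_extent_size_mb(500)  # a 500 MB table -> 5 MB uniform extents
```

A 500 MB table in a tablespace created with `uniform size 5m` would then occupy about 100 extents, distributed round-robin over the datafiles.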



 

Excellent stuff.

March 10, 2003 - 12:26 pm UTC

Reviewer: Dave from Colorado Springs

I'm implementing the poor-man's striping methodology (through round-robin extent allocation) on a DW fact table on 9iR2, and in order to get the most out of the excellent data segment compression feature I am defining the (LM) tablespace with a 32K block size.

Can you confirm whether it makes sense to aim at an extent size that is a multiple of (db_file_multiblock_read_count*32K) in this case?

I also have no hang-ups about segments having a great many extents, but where would you draw the line -- i was thinking that 2048-4096 might be around the most i wanted to deal with? (side note: i would like the extent size to be small if possible as i want to get bitmap index partitions in the same TS as their table partitions)

Thanks for any advice you can give with this.

Tom Kyte

Followup  

March 10, 2003 - 5:58 pm UTC

extent size should be a multiple of that, yes.

in an lmt you really don't have to "deal" with extents -- they deal with themselves. 2,048-4,096 of them would work.
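The multiple-of rule can be sketched as a divisibility check; the `dfmrc` default of 16 here is just an assumed value for illustration, not a recommendation:

```python
# Hypothetical check (names and the dfmrc value are assumptions for this
# sketch): an extent that is a whole multiple of one multiblock-read I/O
# (db_file_multiblock_read_count * block size) avoids short reads at
# extent boundaries during a full scan.
def is_io_aligned(extent_size_bytes, block_size=32 * 1024, dfmrc=16):
    io_size = dfmrc * block_size          # one multiblock read, in bytes
    return extent_size_bytes % io_size == 0

a = is_io_aligned(4 * 1024 * 1024)   # 4 MB extent, 512k I/O size -> True
b = is_io_aligned(1_000_000)         # not a multiple of 512k -> False
```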

why does the extent size matter for getting an index in the same tablespace??

March 10, 2003 - 11:26 pm UTC

Reviewer: Dave from Colorado Springs

Thanks Tom,

The only reason for preferring a small extent size is that I have very many (maybe >30) local bitmap indexes. Most of them will have very few distinct values (in some cases only one or two distinct values per partition), so they can be very compact. If it were not for this I would be happy to take 100 extents per segment, but that would involve some wasted space on the indexes. I want them in the same TS as the table partitions in order to make TS Transport feasible.

If the overhead on a couple of thousand extents is not significant, and it will reduce my disk usage by 10%, then I'll go for that. 10% is not a hugely significant amount but if I can put off the bureaucracy of needing any money for more disks by 10% more minutes then it's effort well spent.

Thanks again.

March 10, 2003 - 11:31 pm UTC

Reviewer: Dave from Colorado Springs

Whoa, just thought of a supplemental question.

The 32K TS is on a database where the "default" (is that the right terminology?) block size is 8K. Does the db_file_multiblock_read_count apply as a multiple of the 8K for all datafiles, or is it TS dependent -- i.e. (32K*dfmrc) for the 32K TS, and (8K*dfmrc) for 8K TS's on the same instance?

Tom Kyte

Followup  

March 11, 2003 - 7:42 am UTC

Interesting question.  I set up this test (normal blocksize is 8k)


ops$tkyte@ORA920> alter session set db_file_multiblock_read_count = 8;
Session altered.

ops$tkyte@ORA920> alter session set events '10046 trace name context forever, level 12';
Session altered.

-- flush buffer caches
ops$tkyte@ORA920> alter tablespace sixteen_k offline;
Tablespace altered.

ops$tkyte@ORA920> alter tablespace sixteen_k online;
Tablespace altered.

ops$tkyte@ORA920> alter tablespace users offline;
Tablespace altered.

ops$tkyte@ORA920> alter tablespace users online;
Tablespace altered.


-- this is in 8k users tablespace
ops$tkyte@ORA920> select /*+ full(b) */ count(*) from big_table.big_table b;

  COUNT(*)
----------
   1833792

-- this is in 16k tablespace
ops$tkyte@ORA920> select /*+ full(b) */ count(*) from big_table b;

  COUNT(*)
----------
   1833792


select /*+ full(b) */ count(*) from big_table.big_table b

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.12          0          4          3           0
Execute      1      0.01       1.26        197        196          4           0
Fetch        2      2.12      22.30      21795      22002          4           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.14      23.69      21992      22202         11           1

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2774        0.63         21.16
********************************************************************************
select /*+ full(b) */ count(*) from big_table b

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.03          1          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      1.53       4.47      10853      10865          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      1.53       4.50      10854      10866          0           1

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2714        0.04          3.45


<b>so, both had the same number of IO waits more or less.  The max IO size on my machine is 1m (tested that already) </b>

Now, if we take the number of PIO's and divide:

ops$tkyte@ORA920> select 21795/8, 10853/4 from dual;

   21795/8    10853/4
---------- ----------
  2724.375    2713.25

we can see that the 8k blocksize tablespace did 8-block IOs whereas the 16k did 4-block IOs.

Hence, they must be setting

IO_SIZE = dbmrc * default_blocksize = 64k on my system

and using that....  Looking at the raw trace file confirms this:


select /*+ full(b) */ count(*) from big_table.big_table b
END OF STMT
...
WAIT #3: nam='db file scattered read' ela= 1139 p1=9 p2=32737 p3=8
WAIT #3: nam='db file scattered read' ela= 315 p1=9 p2=33035 p3=8
WAIT #3: nam='db file scattered read' ela= 278 p1=9 p2=33043 p3=8
WAIT #3: nam='db file scattered read' ela= 283 p1=9 p2=33051 p3=8
WAIT #3: nam='db file scattered read' ela= 308 p1=9 p2=33059 p3=8
WAIT #3: nam='db file scattered read' ela= 381 p1=9 p2=33067 p3=8
WAIT #3: nam='db file scattered read' ela= 336 p1=9 p2=33075 p3=8
WAIT #3: nam='db file scattered read' ela= 356 p1=9 p2=33083 p3=8
WAIT #3: nam='db file scattered read' ela= 360 p1=9 p2=33091 p3=8
.........
select /*+ full(b) */ count(*) from big_table b
END OF STMT
...
WAIT #3: nam='db file scattered read' ela= 1633 p1=15 p2=9 p3=4
WAIT #3: nam='db file scattered read' ela= 1064 p1=15 p2=13 p3=4
WAIT #3: nam='db file scattered read' ela= 1788 p1=15 p2=17 p3=4
WAIT #3: nam='db file scattered read' ela= 785 p1=15 p2=21 p3=4
WAIT #3: nam='db file scattered read' ela= 911 p1=15 p2=25 p3=4
WAIT #3: nam='db file scattered read' ela= 783 p1=15 p2=29 p3=4
WAIT #3: nam='db file scattered read' ela= 842 p1=15 p2=33 p3=4
WAIT #3: nam='db file scattered read' ela= 797 p1=15 p2=37 p3=4



the first query, in the 8k tablespace, used reads of 8 blocks, whereas the second used reads of 4 blocks.
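The blocks-per-read arithmetic from the trace above can be replayed in a few lines (plain Python, numbers taken directly from the tkprof output shown):

```python
# Replaying the arithmetic from the tkprof output above: physical block
# reads divided by the number of scattered-read waits gives blocks per
# multiblock read in each tablespace.
reads_8k, waits_8k = 21795, 2774      # 8k-blocksize tablespace
reads_16k, waits_16k = 10853, 2714    # 16k-blocksize tablespace

blocks_per_read_8k = round(reads_8k / waits_8k)     # ~8 blocks per read
blocks_per_read_16k = round(reads_16k / waits_16k)  # ~4 blocks per read

# Both correspond to the same physical I/O size:
# 8 blocks * 8k == 4 blocks * 16k == 64k
same_io_size = (8 * 8192 == 4 * 16384 == 65536)
```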

Better and better

March 11, 2003 - 8:24 am UTC

Reviewer: Dave from Colorado Springs

Great demo Tom.

Is it naive of me to think that the difference in db file scattered reads is nearly entirely due to the different block sizes?

Tom Kyte

Followup  

March 11, 2003 - 9:06 am UTC

If you mean the wait time -- not naive but incorrect ;)

Here, I just reran this:

select /*+ full(b) */ count(*) from big_table.big_table b

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.02       0.41        197        196          4           0
Fetch        2      1.68       4.73      21795      22002          4           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      1.70       5.15      21992      22198          8           1

Elapsed times include waiting on following events:<b>
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file scattered read                       2774        0.08          3.73
</b>

which is for all intents and purposes the same as the other 16k one.

So -- what happened?  How could this be?

It had everything to do with my test -- my test was to show the dbmrc -- not the performance difference between the two ;)  I demonstrated that the IO size is set -- and we use the IO size to read the data.

What happened here was:

a) i hadn't touched big_table.big_table in a while
b) I had been playing with ops$tkyte.big_table (it was "hot")
c) I'm using a cooked (ufs) filesystem, not raw.
d) my OS buffers cooked files.

Hence, when I read from big_table.big_table the first time, that was true PIO -- it really did go to disk.

queries against ops$tkyte.big_table were PIO's but they used the file system cache to read from!  Hence the waits were dramatically different.


Just running the big_table.big_table query again -- "fixed" that.


This is why I always laugh when people say "we need to flush the buffer cache in order to get a baseline for our benchmark". Unless they are using raw, or have mounted their disks specially to bypass the OS buffer cache, they are only kidding themselves (and even then, they are kidding themselves, because flushing the cache is even more artificial than leaving it there -- the cache is never empty except right after startup, and you do that what, once in a blue moon?)  anyway -- I digress....


This is a good example though of why moving to RAW can actually slow you down.  Suppose you had a database on UFS.  You look and say "wow, doing tons of PIO, i've read raw is much faster, lets move to raw".  Next day performance is really down the tubes.  What happened?

Well, by moving to raw you turned all of your PIO into true PIO, real live honest disk reads as you've bypassed the OS filesystem cache!  In reality your PIO wasn't all true PIO in the first place ;)  So now you are really reading disks.   Now you make the false conclusion "raw is slow, they were wrong"....

feedback loop repeat endlessly... 
 

March 11, 2003 - 9:30 am UTC

Reviewer: Dave from Colorado Springs

Now I don't know what's worse -- being incorrect or being naive. I saw the TS offline/online and thought you had done that to flush the buffer cache. I see now that I'm only damning myself more with every word.

Thanks again Tom

Tom Kyte

Followup  

March 11, 2003 - 9:45 am UTC

well -- i did do that, because even a PIO satisfied from the OS buffer cache will cause some wait and I wanted it all to be PIOs -- the alter tablespace does that -- but the speed difference was due to other, external differences ;)

July 07, 2003 - 9:28 pm UTC

Reviewer: Vipin from New york

Hi Tom,


Can't we achieve the striping in yet another way, say by the following method:


using the CREATE TABLE /*+ PARALLEL (DEGREE N)*/ ... tablespace tb1 AS SELECT * FROM ... option.

Here the datafiles for the tablespace tb1 into which the table is created will be striped across different disks.

Please correct me if I am wrong.



Tom Kyte

Followup  

July 08, 2003 - 7:11 am UTC

you would

a) need a tablespace with more than one file
b) not need parallel


say you have a tablespace with 4 files -- an LMT with uniform extents. Extent 1 will be in file 1, extent 2 in file 2, extent 3 in file 3, extent 4 in file 4, extent 5 in file 1 and so on.


you just need to load the data to achieve that. even with LMTs that are autoallocated, it'll put the first couple of small extents in the same file and then start moving from file to file.

DMT's will do the same thing -- round robin around the available files.

July 08, 2003 - 9:25 am UTC

Reviewer: Vipin

Hi Tom,

The method you describe is the perfect way to do it, but my doubt is: can't we also do it using the method I described above?

I just wanted to confirm that my understanding about this was correct.

Tom Kyte

Followup  

July 08, 2003 - 11:56 am UTC

creating the table in parallel would work as well, yes. my point was, you don't *need* parallel in order to do it. It is all about extents.


July 08, 2003 - 1:03 pm UTC

Reviewer: Vipin

Thanks Tom

alter tbs offline ?

July 09, 2003 - 1:46 am UTC

Reviewer: Alvin from Philippines

What does taking a tablespace offline do ?

1. does it flush all the dirty blocks in the buffer cache to disk ?
2. and remove all the entries of all segments residing on that tablespace from memory ?
3. Does 'alter tablespace begin backup' have the same effect? Or does it just freeze the SCN?


Tom Kyte

Followup  

July 09, 2003 - 11:01 am UTC

1) yes it would

2) depends on the version -- in all it "invalidates" them at least. in 8i, they appear to "go away", in 9i, they'll be in v$bh still but can be considered "not used blocks" since we don't use them when you online the tablespace

3) The dirty buffers that belong to the database files are written out to the files and the datafiles are checkpointed. The datafile headers are updated to the SCN captured when the begin backup is issued. The datafile headers are not updated until the 'alter tablespace end backup' command is issued; however, the datablock within the database files can continue to be read and updated.

how to spread a table across multiple tablespaces

April 21, 2004 - 2:09 pm UTC

Reviewer: Mir from Houston, TX

Tom,

Is it possible to spread a table across multiple tablespaces without partitioning?

Tom Kyte

Followup  

April 21, 2004 - 9:03 pm UTC

no

Multiple Datafiles and Parallel Query

August 19, 2004 - 6:20 am UTC

Reviewer: Vivek Sharma from Bombay, India

Dear Tom,

I have a few questions.

1. Is there any relation between parallel query slaves and the number of datafiles? I mean, is there any limit on the number of parallel query processes a query will spawn based on the number of datafiles on a multi-CPU machine?

2. I have a 36 CPU machine. I have a partitioned table with 6 partitions and a size of around 30GB. I have 3 tablespaces, each with a 10GB datafile holding 2 partitions, and these datafiles are all on the same controller. What will be the impact on performance?

3. Referring to the scenario in step 2, will I get any performance gain if I distribute each partition to an individual tablespace and datafiles, probably on different controllers? 30GB table, 6 partitions, hence 5GB per partition. If I keep the datafile size limit at 2GB, thus creating 3 datafiles for each partition, will it help?

4. Is there any disadvantage, in terms of performance, in using datafiles larger than 2GB? We have many datafiles of size 10GB; shall I consider reducing them to some limit so that I get a good performance gain?

As usual, your explanation and answer will help me gain a good understanding on I/O.

Awaiting your response.

Thanks and Regards
Vivek

Tom Kyte

Followup  

August 19, 2004 - 9:52 am UTC

1) PQ uses dynamic ranges of the table. the number of files (which could all be on the same disk at the end of the day) isn't really what matters; the size of the table is what determines how big each range is.

2) impossible to say?!? think about it -- how do you use this file? do you full scan it? do you partition eliminate to just the most current partition most often? do you index range scan it? you and only you can truly answer this. Given the way you use this data -- is what you've just done creating a hot spot or is it spreading the IO out evenly?

think "SAME" -- stripe and mirror everything. I'd want just one big logical volume and perhaps one or two files created on it. let the underlying stuff spread the IO out evenly no matter how I access/use the data.

3) same, same, same -- stripe and mirror everything, or at the very least stripe it.

4) it would be rare and extreme for a large file to become a serious bottleneck.

I/O Contention

August 20, 2004 - 2:20 pm UTC

Reviewer: Vivek Sharma from Bombay, India

Dear Tom,

Thanks for your reply.

This is in reference to my above question (point 4). You say that it is rare and extreme that Large Datafile will have an I/O Contention.

Oracle 9i Documentation

</code> http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96520/hardware.htm#98196 <code>
says that

"To avoid I/O bottlenecks during parallel processing or concurrent query access, all tablespaces accessed by parallel operations should be striped. Striping divides the data of a large table into small portions and stores them on separate datafiles on separate disks."

I understand from this paragraph that if I have a datafile of 10GB and many concurrent users read data from this datafile, then there could be serious I/O contention; whereas if I spread the data into 5 datafiles of 2GB each and spread them onto separate disks, then performance should be better, with no I/O contention.

Further, the documentation says

"When using manual striping, the degree of parallelism (DOP) is more a function of the number of disks than of the number of CPUs. First, it is necessary to have one server process for each datafile to drive all the disks and limit the risk of experiencing I/O bottlenecks. Second, manual striping is very sensitive to datafile size skew, which can affect the scalability of parallel scan operations. "

It co-relates the parallel query process with the number of datafiles to reduce I/O bottlenecks.

Awaiting your reply.
Thanks & Regards
Vivek

Tom Kyte

Followup  

August 21, 2004 - 11:08 am UTC

that is "DIY" striping -- poor man's striping.

it would be infinitely easier, better, and more maintainable to take that 10g file and just put it onto a logical volume that is striped over many disks. you achieve the same exact effect without any of the admin overhead associated with lots of little files. that, and you can use a more reasonable stripe size of, say, 1meg instead of 2gig.


use filesystem striping (9i and before) or ASM striping (10g and above) and forget about this level of mini-micro management.


ASSM and table spread across different disks

April 15, 2005 - 12:16 pm UTC

Reviewer: Parag J Patankar from India

Hi Tom,

I had tried to check the same results with ASSM ( Auto segment space management ) in Oracle 9iR2

create tablespace testing
datafile '/tmp/testing01.dbf' size 1m
extent management local segment space management auto;

alter tablespace testing 
add datafile '/tmp/testing02.dbf' size 1m;

drop table t;

create table t ( x char(255) ) tablespace testing;

begin
  2     for i in 1 .. 20000000
  3     loop
  4          insert into t values (i);
  5          commit;
  6     end loop;
  7  end;
  8  /
begin
*
ERROR at line 1:
ORA-01653: unable to extend table OPS$ATLAS.T by 16 in tablespace TESTING
ORA-06512: at line 4
 
22:01:59 SQL> select extent_id, file_id from dba_extents where 
22:02:24   2  segment_name ='T' and owner='ATLAS';

 EXTENT_ID    FILE_ID
---------- ----------
         0          6
         1          6
         2          6
         3          6
         4          6
         5          6
         6          6
         7          6
         8          6
         9          6
        10          6
        11          6
        12          6
        13          6
        14          7

15 rows selected.

it shows that in ASSM I am not able to spread the table across multiple disks. Am I correct, Sir?

if it is true then I have to consider this very important point of I/O before setting up ASSM.

regards & thanks
pjp 

Tom Kyte

Followup  

April 15, 2005 - 1:20 pm UTC

the first couple of 64k extents in locally managed tablespaces using system-allocated extents "stay together". See, it started going to file 7 only after the first several.

see the first review at top, we discussed that.

Tables and tablespaces

June 19, 2005 - 12:15 pm UTC

Reviewer: Ravindran B from India

Tom,

Is there any possibility of splitting a single table's data across multiple tablespaces? For example, I want all names starting with A-D to go to tablespace1, E-K to tablespace2, etc.

Tom Kyte

Followup  

June 19, 2005 - 12:27 pm UTC

partitioning accomplishes that -- yes.

You would range partition the table.

Tables and columns

June 20, 2005 - 11:30 am UTC

Reviewer: Ravindran B from India

Tom,

Thanks a lot. BTW, performance is very slow when I stress-test inserting records into a table with 250 columns compared to one with 1000 columns (the 1000-column insert is fast). Both tables have a PK on a column with the same datatype and length. Is there any specific reason for this? Will partitioning in any way help increase performance on the 250-column table?

Tom Kyte

Followup  

June 20, 2005 - 12:53 pm UTC

not that I don't believe you -- but I believe you've done something wrong in the metric gathering here.

care to share your test?

July 21, 2005 - 6:23 pm UTC

Reviewer: Andrew

Tom,
I am trying to understand disk striping. So please bear with me if my question is stupid.

The 9i Performance Manual states :
</code> http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/iodesign.htm#20395 <code>
" # Stripe depth is the size of the stripe, sometimes called stripe unit.
# Stripe width is the product of the stripe depth and the number of drives in the striped set."

Further down it states :
"
On low-concurrency (sequential) systems, ensure that no single I/O visits the same disk twice. For example, assume that the stripe width is four disks, and the stripe depth is 32k. If a single 1MB I/O request (for example, for a full table scan) is issued by an Oracle server process, then each disk in the stripe must perform four I/Os to return the requested data. To avoid this situation, the size of the average I/O should be smaller than the stripe width multiplied by the stripe depth. If this is not the case, then a single I/O request made by Oracle to the operating system results in multiple physical I/O requests to the same disk.
"

My understanding of striping was that if you "stripe" a file across 10 disks then you are breaking the file into 10 chunks and dividing it over 10 disks. So each chunk(stripe depth) multiplied by the number of disks (stripe width) will be equal to the total size of the file.

But based on the above definition of stripe width which is : stripe depth multiplied by number of disks,
it would seem that
<conclusion1>
stripe width = total size of file.
</conclusion1>

The second blurb from the manual says that " the average IO should be smaller than the stripe depth multiplied by stripe width" . But this equates to
Average IO < stripe_depth * stripe_depth * number_of_disks
(which doesn't make sense).

So this seems to suggest that maybe the definition is incorrect and actually
<conclusion2>
stripe width = number of disks
</conclusion2>

conclusion1 and conclusion2 contradict each other. What is the difference (i.e what am I missing here ?)

Thanks

Tom Kyte

Followup  

July 22, 2005 - 8:36 am UTC

<quote>
My understanding of striping was that if you "stripe" a file across 10 disks
then you are breaking the file into 10 chunks</quote>

that is not likely the case, you would be breaking that file into hundreds or thousands or more chunks distributed over the ten physical disks (depends on file size of course, somewhere between 1 and thousands or more chunks)

Thank You

July 22, 2005 - 11:30 am UTC

Reviewer: Andrew


JOB Queue

July 25, 2005 - 11:40 am UTC

Reviewer: Ravindran B from India

Tom,

You advised earlier that we have to use AQ to replace JOB_QUEUE_INTERVAL and dbms_job in 10g.

I recently read something about DBMS_SCHEDULER, which can replace dbms_job in 10g. Is it true that DBMS_SCHEDULER can run a job immediately as it is scheduled? (i.e., achieve the dbms_job.submit and JOB_QUEUE_INTERVAL=1 behavior of 9i?)

Tom Kyte

Followup  

July 25, 2005 - 1:17 pm UTC

Um, where did I say that?

and what does file striping have to do with jobs and queues?

dbms_scheduler is in fact a "bigger, better" dbms_job. dbms_job is alive and kicking in 10g. AQ is not a replacement for jobs or the scheduler but a different technology altogether.

http://docs.oracle.com/docs/cd/B14117_01/appdev.101/b10802/d_sched.htm#996728
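A hedged sketch of the scheduler behavior the reviewer asked about; the job name and procedure are hypothetical:

```sql
-- Hypothetical sketch: a job that starts immediately on creation
-- and then repeats every second - the 10g analogue of
-- dbms_job.submit with job_queue_interval=1.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'MY_TEST_JOB',              -- hypothetical name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN my_proc; END;',      -- hypothetical procedure
    start_date      => SYSTIMESTAMP,               -- run right away
    repeat_interval => 'FREQ=SECONDLY;INTERVAL=1',
    enabled         => TRUE);
END;
/
```

With enabled => TRUE and start_date => SYSTIMESTAMP the job becomes eligible to run as soon as it is created.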

JOB_QUEUE

July 25, 2005 - 11:57 am UTC

Reviewer: Ravindran B from India

OOPS!!!!

Apologies. Question posted against irrelevant forum.
Pls ignore

EXTENT SIZE

March 24, 2006 - 6:37 am UTC

Reviewer: A reader

From what I have understood:

1- There is no single good reason to use DMTs now, right?
2- If an LMT tablespace with a uniform extent size spreads
the data over multiple files (which is good for I/O performance),
then what is the use of an LMT with non-uniform sizes? Also,
you have said that even in an LMT with non-uniform sizes it will
put some initial extents in one file and then move on in
round-robin fashion. So can one say that non-uniform sizes are
of no use now? Or what is the advantage of using non-uniform
sizes over uniform sizes?
3- Even if Oracle is using round robin, it is just distributing
the data over different files; this does not ensure that the data
is distributed over I/O devices unless we ourselves actually put
the files on different I/O devices. So a question arises: how does
Oracle number the datafiles? Is it the sequence in which we add them?

Thanks


Tom Kyte

Followup  

March 24, 2006 - 9:59 am UTC

1) correct, there is no good reason to CREATE a DMT. If you have existing ones and are happy - let them be.

2) the uniform versus non-uniform has nothing to do with spreading data out. non-uniform (system allocated extents) do the same spreading out.

I use system allocated extents as my default these days - since many times we are not 100% sure of the final exact size of things, and if you want to do parallel direct path operations, the extent trimming that takes place with system allocated extents (but cannot with UNIFORM) is nice.

3) correct. This is just Oracle doing "poor man's striping", trying to stripe the tablespace over many devices. It is up to you to ensure
a) the devices are different OR
b) the devices are already striped

if you want even IO patterns.
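A minimal sketch of the "poor man's striping" setup described in the original answer - a locally managed tablespace with several datafiles, ideally each on a separate device; the paths and sizes are hypothetical:

```sql
-- Hypothetical sketch: an LMT whose datafiles sit on different
-- devices, so round-robin extent allocation spreads the I/O.
CREATE TABLESPACE striped_ts
  DATAFILE '/disk1/oradata/striped_ts_01.dbf' SIZE 100M,
           '/disk2/oradata/striped_ts_02.dbf' SIZE 100M,
           '/disk3/oradata/striped_ts_03.dbf' SIZE 100M
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE;  -- system allocated extents
```

Oracle does the round-robin placement across the files; putting the files on genuinely different devices is up to you.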

but what about file numbering

March 24, 2006 - 10:11 am UTC

Reviewer: A reader

Thanks,

but what about file numbering - how does Oracle order the
datafiles when allocating extents?
For example:
first I add datafile file1, then file3, then file2. So will
Oracle put the 1st extent on file1, 2nd on file3, 3rd on file2,
or 1st on file1, 2nd on file2, 3rd on file3?


Tom Kyte

Followup  

March 24, 2006 - 10:33 am UTC

round and round it goes - not documented. but you may observe it round robin'ing.

You cannot control it.
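Although the ordering is undocumented, you can observe how a segment's extents landed across files with a query like this (the owner and table names are hypothetical):

```sql
-- Count extents per datafile for one segment, to observe the
-- (undocumented) round-robin placement after the fact.
SELECT file_id, COUNT(*) AS extents
  FROM dba_extents
 WHERE owner = 'SCOTT'          -- hypothetical owner
   AND segment_name = 'T'       -- hypothetical table
 GROUP BY file_id
 ORDER BY file_id;
```

Run it after loading some data; the per-file counts show the distribution, even though you cannot control it.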

April 11, 2006 - 6:35 am UTC

Reviewer: Vikas from INDIA

Hi tom,

This was the analysis we got after running ADDM. However, we have 8 disks, all configured as RAID 0+1, which means mirrored and striped. We have two TEMP tablespaces, TEMP and DBA_TEMP, whose files are located at:

sys@ADMW>select name from V$tempfile;

NAME
----------------------------------------------------------------------------------------------------
/fs-c01-a/databases/admw/temp-01.dbf
/fs-c01-a/databases/admw/dba_temp-01.dbf
/fs-b01-b/databases/admw/temp-02.dbf
/fs-a01-b/databases/admw/dba_temp-02.dbf
/fs-c01-b/databases/admw/dba_temp-03.dbf
/fs-a01-c/databases/admw/dba_temp-04.dbf
/fs-b01-a/databases/admw/temp-03.dbf
/fs-c01-b/databases/admw/temp-04.dbf

8 rows selected.
wherein each file is located on a different disk, to redistribute the I/O across many disks.

Still we got:

Finding The throughput of the I/O subsystem was significantly lower than expected.
Impact (minutes) 7.22
Impact (%) 56.68

The performance of file /fs-b01-a/databases/admw/temp-03.dbf was significantly worse than other files. If striping all files using the SAME methodology is not possible, consider striping this file over multiple disks.

What are your suggestions? Please guide.

Tom Kyte

Followup  

April 11, 2006 - 2:33 pm UTC

if the system is striped already, not sure why multiple datafiles.

are you experiencing tons of IO to your temp tablespace? You don't say.

Rebalancing across datafiles

September 03, 2010 - 10:53 am UTC

Reviewer: Zach from St. Paul, MN

We have a tablespace that was originally designed with one datafile. When that datafile was near capacity, a second one was added on another file system, and that one is nearing capacity. Over time this tablespace has become by far our busiest from a physical IO perspective (OLTP system). We are again needing to add space.

We have 34 different mounts set up on different chunks of disk in our SAN. What I'd like our DBA to do is to create 34 files, one on each disk, to spread the IO out as much as possible. We have no ability to restructure our database DDL at this time, as this is a highly regulated system, so this seems like the easiest way to spread out the IO and allow for much more growth of this tablespace.

After scouring this forum and other resources, I'm struggling to find an easy way to accomplish this rebalancing. I realize we could just add all the new datafiles and new data extents would be round robin'd across these files, but what I really need to accomplish is spreading out all the existing data in the original 2 files across all the files. What is the easiest way to accomplish this?
Tom Kyte

Followup  

September 09, 2010 - 7:13 pm UTC

you should just use an OS utility to stripe a bunch of disks together...

I don't know what you mean by "we cannot restructure our database ddl" - but, if you want to have this tablespace striped over the 34 disks you would:

a) create a new tablespace with these 34 files
b) move all of the segments in the old tablespace to the new (alter table t move, rebuild the indexes)
c) drop old tablespace
d) rename new tablespace to old tablespace name

if you need to do this online, use dbms_redefinition to move the segments.
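The steps (a) through (d) above can be sketched as follows; the tablespace, table, and index names are hypothetical, and each segment and index in the old tablespace would need its own statement:

```sql
-- Hypothetical sketch of steps (a)-(d), offline variant.
-- (a) the new tablespace would already exist with the 34 files.
-- (b) move each segment and rebuild its indexes:
ALTER TABLE t MOVE TABLESPACE new_ts;         -- repeat per table
ALTER INDEX t_pk REBUILD TABLESPACE new_ts;   -- repeat per index
-- (c) once the old tablespace is empty:
DROP TABLESPACE old_ts INCLUDING CONTENTS AND DATAFILES;
-- (d) take over the old name (tablespace rename is 10g and later):
ALTER TABLESPACE new_ts RENAME TO old_ts;
```

For the online variant Tom mentions, dbms_redefinition would replace the ALTER TABLE ... MOVE step for each table.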


Consider moving to ASM (automatic storage management) to provide yourself with striping, rebalancing and lots of other stuff in the future