Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Jeff.

Asked: August 25, 2004 - 3:39 pm UTC

Last updated: June 15, 2007 - 7:51 am UTC

Version: 9.2.0

Viewed 10K+ times! This question is

You Asked

Tom,
Thanks for your time and this site, it is a valuable resource that I use almost every day.

My question is how to estimate the ultimate size of an index organized table.

Our high volume table is an IOT and we need to be able to estimate the ultimate size of the table at client sites with widely different row volumes. We can determine the average size the data in all of the columns, and the row count by working with the client. But we have been unable to find a formula to compute the total amount of storage that the IOT will take in the documentation or online.

I have tried to run a few different volumes through a test machine to derive a formula, but the client volumes far exceed the hardware that I have available to perform these tests.

Any help that you could provide in finding an approach to estimating the size of an IOT would be much appreciated.

Thanks,
Jeff


and Tom said...

i always answer this the same way -- table, index, iot, cluster, whatever:

a) create it
b) load it with REPRESENTATIVE data (a number 1,000,000,000,000 takes less storage than the number 9,999,999)
c) load it with a reasonable pct -- somewhere between 0.01 and 10% of the target volume (the bigger the table, the smaller the percent)
d) analyze it and multiply

But bear in mind that indexes grow differently based on how they are inserted into. A right hand side index grows much slower than an index that is randomly inserted into over time (if primary key is a monotomically increasing value, Oracle recognizes that and does 90/10 block splits, if randomly inserting it is more of a 50/50 split)

Rating

  (20 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

some clarifications

Jeff, August 25, 2004 - 4:56 pm UTC

Thanks for your quick response!

I understand the importance of using representative data and have developed programs to generate the data based on client specific needs. I will make sure it varies the amounts appropriately.

The IOT in question is always built using the CREATE TABLE AS SELECT syntax with roughly half the data (the inputs) and then multiple parallel processes will insert the other half (the results). But sometimes the ratio of inputs to results can be more like 1:5 or even 1:10.

Can you comment on how we might expect the the block splits to work for the CTAS followed by the incremental inserts, and how the block splits should affect our analysis?

Finally, when you say analyze it and multiply are you referring to gathering Oracle statistics on the table and then using those numbers to calculate the potential size, or simply determining the size of the IOT with 10% client-representative data and multiplying that number by 10?

Thanks!






Tom Kyte
August 26, 2004 - 9:22 am UTC

the CTAS will build a "perfect structure" (pctfree=whatever you set it to)

the incremental inserts are what will break it down over time. and then you are back to "right hand side" or "random". if you have a RHS index, one could reasonably expect it to be 10% white space over time (90/10 splits). if you have a randomly inserted into index (data goes on the left, the right, the middle, where ever) you would reasonably expect it to be closer to the 50% whitespace level over time (worst case).

So, perhaps a:

a) fill it with N% of the data
b) analyze it (yes, analyze, an IOT always uses the CBO so you'll be analyzing this in real life too!)
c) multiply out to 100% of the data and then double it to be safe.

temp tablespace

friend :), August 26, 2004 - 2:04 am UTC

Hi Tom

How to deallocate temporary tablespace space ?

i tried
alter session set events 'immediate trace name DROP_SEGMENTS level TS#+1';

but dont know what happens with this?


Tom Kyte
August 26, 2004 - 9:41 am UTC

why? what is the goal here...

method depends on the implementation, how did you do temp and whats the version.

but why?

Still not clear

Jeff, August 26, 2004 - 12:08 pm UTC

Tom, I apologize if I am being slow, but I want to make sure my space estimates are as accurate as possible.

The inserts of data after the CTAS will be random, definitely not "right handed". So we can expect the whitespace level to be closer to 50%.

I create the table with a CTAS and a series of inserts using representative data and the expected number of inserts, ending up with a table that is 10% of the expected final volume.

I then analyze the table, computing statistics, and have the following statistics available:

user_tables
NUM_ROWS Number of rows in the table
AVG_SPACE Average amount of free space in a data block
CHAIN_CNT Number of rows in the table that are chained
AVG_ROW_LEN Average length of a row in the table in bytes

user_indexes
BLEVEL B*-Tree level: depth of the index
LEAF_BLOCKS Number of leaf blocks in the index
DISTINCT_KEYS Same as the number of rows in the table
AVG_LEAF_BLOCKS_PER_KEY IOT has a primary key this value is always 1
AVG_DATA_BLOCKS_PER_KEY Average number of data blocks in the table for a key
CLUSTERING_FACTOR Indicates the amount of order of the rows in the table

Even without doing the analyze I have the size of the IOT segment available

user_segments
BYTES Size in bytes, of the segment

Is my formula simply

user_segments.BYTES * 10

Or should I use the data gathered during the analyze, and if so how?



Tom Kyte
August 26, 2004 - 2:02 pm UTC

((leaf blocks+branch_blocks) * 10)

if you did a 10% load.

you want to count "used" blocks (you could use dbms_space as well, without the analyze if you like)



temp

friend, August 26, 2004 - 11:16 pm UTC

I read it from metalink

Doc ID:
Note:47400.1
Subject:
EVENT: DROP_SEGMENTS - Forcing cleanup of TEMPORARY segments
Type:
REFERENCE
Status:
PUBLISHED
Content Type:
EXT/PLAIN
Creation Date:
22-SEP-1997
Last Revision Date:
11-AUG-2004
**************************

Please suggest best way to deallocate the temporary space as its doesnot get till 2-3 hours.
Thanks!


The DROP_SEGMENTS event


Tom Kyte
August 27, 2004 - 7:38 am UTC

umm, that note is ancient history.

are you using a true temporay tablespace with tempfiles et.al. ? if so, no cleaup necessary. if not, do so.

Should be the last question

Jeff, August 27, 2004 - 10:08 am UTC

Tom,
Thanks for your answers, I think I have the information that I need.

Can you please verify that the following is correct?

In order to determine the number of branch blocks I need to analyze my IOT and validate the structure, then query INDEX_STATS.

ANALYZE TABLE iot_test VALIDATE STRUCTURE;

SELECT blocks, lf_blks, br_blks FROM INDEX_STATS;

There is no other way to determine the number of index blocks mathematically from the information provided by DBMS_SPACE, or the statistics columns in USER_TABLES or USER_INDEXES.

At this point I would have the information need to estimate with a fair degree of accuracy the ultimate size if my IOT. I can then take this number and add in some % of additional space for growth and some more just "to be safe".

Thanks again for your time.


Tom Kyte
August 27, 2004 - 10:18 am UTC

sure.



plsql

friend, September 09, 2004 - 1:07 am UTC

Hi tom

Its 3 years back i worked on plsql and after few days i have client interview of plsql
can you suggest any link or idea to get prepare for plsql interview?

Thanks to be in this world


Tom Kyte
September 09, 2004 - 8:13 am UTC

otn.oracle.com -> documentation. plsql programmers guide.

long

friend, September 10, 2004 - 1:51 am UTC

Tom,

whats the way to select long datatype over dblink?

Regards,



Tom Kyte
September 10, 2004 - 8:34 am UTC

scott@ORA9IR2> desc user_views@ora920.us.oracle.com;
Name Null? Type
---------------------------------------- -------- ----------------------------
VIEW_NAME NOT NULL VARCHAR2(30)
TEXT_LENGTH NUMBER
TEXT LONG
TYPE_TEXT_LENGTH NUMBER
TYPE_TEXT VARCHAR2(4000)
OID_TEXT_LENGTH NUMBER
OID_TEXT VARCHAR2(4000)
VIEW_TYPE_OWNER VARCHAR2(30)
VIEW_TYPE VARCHAR2(30)
SUPERVIEW_NAME VARCHAR2(30)

scott@ORA9IR2> select text from user_views@ora920.us.oracle.com;

TEXT
-------------------------------------------------------------------------------
select "X","Y" from t

scott@ORA9IR2> select max(text_length) from all_views@ora920.us.oracle.com;

MAX(TEXT_LENGTH)
----------------
11617

scott@ORA9IR2> select text from all_views@ora920.us.oracle.com where text_length = 11617;

TEXT
-------------------------------------------------------------------------------
select triguser.name, trigobj.name,
decode(t.type#, 0, 'BEFORE STATEMENT',
1, 'BEFORE EACH ROW',
2, 'AFTER STATEMENT',
3, 'AFTER EACH ROW',

..............



mutuatin

friend, September 13, 2004 - 12:50 am UTC

Hi Tom,

What is mutuation in oralce?
Please explain with practical aspect.

Thanks!


Tom Kyte
September 13, 2004 - 7:54 am UTC

what does this have to do with an IOT sizing exercise?

mutuation

frined, September 13, 2004 - 11:57 pm UTC

Hi Tom,
I tried to find the article with "mutuation" keyword but did not get single one so i posted here (no link with IOTs).
Sorry....

Now please provide update on following:
What is mutuation in oralce?
Please explain with practical aspect.

Thanks!


Tom Kyte
September 14, 2004 - 7:49 am UTC

mutating

the word is mutating.

</code> http://www.brainydictionary.com/words/mu/mutuation192859.html <code>is what mutuation is (and it is not related to the database)

so, please search on the correct word

Further help for Jeff in sizing IOTs

Scott Martin, September 22, 2004 - 11:48 pm UTC

Jeff;

If you are still interested in sizing of index only tables,  you may be interested in some of the views we’ve written above the raw index data itself.  For example, to generate a histogram of available space in your index blocks, you could simply execute:

select round(availablespace, -2)   availablespace
     , count(*)                    nblocks
     , min(mydbadot)               sample
  from INDEX_BLOCKS
 where owner = 'SCOTT'
   and name  = 'ACCOUNTI'
group by round(availablespace, -2)
;
AVAILABLESPACE    NBLOCKS SAMPLE
-------------- ---------- ------------
             0          1 1.26710
          2600       5991 1.33803
          2700          9 1.33802
          2800        659 1.26858
          2900          7 1.26711
          3200          1 1.40262

If you wanted a detailed look at the distribution of keys across your branch and leaf blocks try
SQL> select min(indexlevel)       indexlevel
  2       , count(*)              nkeys
  3       , mydbadot              dba
  4       , rpad(min(char01), 11) minkey
  5       , rpad(max(char01), 11) maxkey
  6    from index_keys
  7   where owner      = 'SCOTT'
  8     and name       = 'PAT_SSN'
  9  group by mydbadot
 10  order by 1 desc, 4
 11  ;

INDEXLEVEL      NKEYS DBA          MINKEY      MAXKEY
---------- ---------- ------------ ----------- -----------
         1         15 1.16496      1           9
         0          2 1.16497      081-08-7388 084-73-5622
         0          4 1.17539      110-96-8149 184-45-1682
         0          1 7.4938       193-72-7052 193-72-7052
         0          4 1.17538      219-69-5713 233-02-2989
         0          1 7.4942       258-72-1800 258-72-1800
         0          1 7.4940       278-87-0378 278-87-0378
         0          2 1.17540      281-77-2553 289-39-5390
         0          4 1.17536      357-59-5655 363-27-3772
         0          4 7.4103       436-68-8596 479-19-4054
         0          1 7.4941       482-73-1638 482-73-1638
         0          3 7.4102       549-38-2766 580-86-3383
         0          3 7.4937       629-66-0887 699-68-6369
         0          4 7.4101       723-91-9173 784-73-5471
         0          1 7.4943       798-80-2422 798-80-2422
         0          3 1.17537      819-45-0329 876-76-8588
         0          2 7.4939       949-46-3452 959-96-9589

We can also generate a list of all key values which required the use of the overflow table if that is of interest to you.  Check out 
http://www.tlingua.com/articles/iot.html
for more cool stuff.

Hope this helps,
SCott.
 

index size estimation

A reader, October 17, 2005 - 3:33 pm UTC

Hi

I need to estimate how much space I need to for two indexes based on a column of number(9) and a column of vharchar2(9). The table has over 2000 million of rows.

The method you mentioned load some sample data, analyze and multiply is fine however I dont understand the multiply part.... For an index what do I use to multiply? What values from dba_indexes or dba_ind_columns?

The other question is you said that a right hand index grows much slower, is a right hand index a monotomically increasing value for example a PK? If that is the case I dont understand, doesnt that make the index grow faster?


Tom Kyte
October 17, 2005 - 9:04 pm UTC

more likely from dba_segments - the size occupied by the index itself.


right hand indexes under go "90/10" splits (when the right block fills - it puts 90% of the data to the left, 10% to the right - the index is about 90% full)....

randomly inserted into indexes under go closer to a 50/50 split - about half over there, about half over here - so they may be 50/40% "full" in general - if the data is truly randomly inserted.

index size estimation

A reader, October 18, 2005 - 3:15 am UTC

Hi

To estimate the index size, if I have to use dba_segments I dont have to analyze correct?
With dba_segments I have a problem, out extent sizes are 4GB for this table so it might show more allocated space than needed. I understand if this is the case I should dbms_space and find the real occupied size right?

I have read Metalink and it talks about

LF_ROWS
LF_BLKS
LF_ROWS_LEN
LF_BLK_LEN
DEL_LF_ROWS
DEL_LF_ROWS_LEN
PCT_USED

Are those really useful to estimate an index size...?

Tom Kyte
October 18, 2005 - 9:05 am UTC

4gig really? (~2gig extents maybe?)

ops$tkyte@ORA9IR2> create tablespace testing extent management local uniform size 4294967296;
create tablespace testing extent management local uniform size 4294967296
                                                               *
ERROR at line 1:
ORA-25147: UNIFORM SIZE value greater than maximum extent size


ops$tkyte@ORA9IR2> !oerr ora 25147
25147, 00000, "UNIFORM SIZE value greater than maximum extent size"
// *Cause: In CREATE/ALTER TABLESPACE, the value specified for the UNIFORM SIZE         option was greater than the maximum extent size.
// *Action: Choose a lower value for the UNIFORM SIZE option.



or analyze and use index_stats...

or create a tablespace with system allocated extent sizes for sizing - the extents start small and grow over time - so 4gig won't come into play as a fixed size

 

explanation on IOT estimatoin really helpful

Kris, December 06, 2006 - 11:38 am UTC

How ever, I would like to know how to caliculate size.

1) I analyze and query index_stats and results are

SQL> SELECT blocks, lf_blks, br_blks FROM INDEX_STATS;

    BLOCKS    LF_BLKS    BR_BLKS
---------- ---------- ----------
        56         44          1

My database block size is 8k, so total size is 56*8k?
Please let me know. I think am wrong.

Kris 

Tom Kyte
December 07, 2006 - 8:45 am UTC

why do you think that, don't be so hard on yourself.

you have 56 blocks allocated, 44 of which are leaf blocks and 1 is a branch (root block in this case) block.



ya correct

Kris, December 07, 2006 - 10:26 am UTC

ya thatz correct

reader

A reader, March 30, 2007 - 5:50 pm UTC

Is there a reliable way of finding a avg_row_len for an index

For tables dba_tables.avg_row_len gives the average row length of a table

For index , I did a "alter index ... validate structure" and calculated (LF_ROWS_LEN+BR_ROWS_LEN)/(BR_ROWS+LF_ROWS). Does not seems to be correct. DBA_INDEXES does not have a average row length

Tom Kyte
March 31, 2007 - 12:50 pm UTC

branch rows don't have full keys in them.

It depends on what you want to use this number for - lf_rows_len / lf_rows seems to me to be a 'sensible' number

but, not knowing what you could possible do with this number - and hence "why" you want it - I don't know.

reader

A reader, March 31, 2007 - 1:06 pm UTC

I am using it for sizing calcultion. Finding the disk space used by the index
Tom Kyte
March 31, 2007 - 1:17 pm UTC

you do not need the average "row" length then

you need the size of the segment, and you have that.

A reader, March 31, 2007 - 2:45 pm UTC

There is a sizing calculation for a brand new databse. This is a 3rd party vendor. They have a spread sheet based on the average row length of table and average row length of index. For indexes, in my calculations, the sizing based on lf_row_len/lf_rows = 19 where as the index column (index on only one column) is DATE and the length is only 7

Does date datatype stored in a special structure
Tom Kyte
April 01, 2007 - 7:59 pm UTC

you won't get there from here.

The only way I know to size is actually load some representative data, measure and extrapolate.

And sometimes with that you get really lucky and get a number close to what reality will show.

Because in real life, there are way too many variables....

Table size estimation

A reader, June 15, 2007 - 6:21 am UTC

Tom,

I go though your site whenver I am free just browsing thru it.....Its amazing.

Question:
Can you provide some article/document on how to calculate the average row length and
How do you decide intital storage parameters for the Table when you create them

Thanks,
Dheeraj

A reader, November 22, 2007 - 12:21 pm UTC

Is there a way to measure waits due to index block splits?