Question and Answer

Tom Kyte

Thanks for the question, Keith.

Asked: November 11, 2009 - 12:48 pm UTC

Last updated: September 24, 2013 - 4:57 pm UTC

Version: 10.2.0.3

Viewed 10K+ times

You Asked

Mr. Kyte,

I am working on databases where the most important concern is insert rate. The databases are on 32-bit RHEL (free buffer waits, of course). We are currently inserting at a rate of approx. 4 MB/sec into tables (standard DDL below) that are range partitioned by date (time span = 10 mins, converted into seconds since 1.1.1970). Table size is approx. 200 GB. Extents are currently a uniform size of 10M (2M for indexes) using LMT and ASSM. Only one session inserts into each table. Inserts are bulk inserts of 5000 rows at a time via OCI. Each table has approx. 6 local indexes that cannot be dropped and/or rebuilt. The average row sizes are 200-400k. There are no PKs or FKs. The database is in NOARCHIVELOG mode as there is currently no recovery requirement.

Outside of improving the disk layout, disk performance, using ASM, etc. (which are all in the works, especially for redo), what advice would you give regarding block size, subpartitioning, extent management, etc. for these tables to increase the insert rate?

Thanks.

Keith

Table DDL:

CREATE TABLE sample_table_ddl
(
row1 RAW(8),
TIMETAG NUMBER NOT NULL,
row3 NUMBER,
row4 NUMBER,
row5 NUMBER,
row6 NUMBER,
row7 NUMBER,
row8 NUMBER,
row9 NUMBER,
row10 NUMBER,
row11 NUMBER,
row12 NUMBER,
row13 NUMBER,
row14 NUMBER,
row15 VARCHAR2(33 BYTE),
row16 NUMBER,
row17 NUMBER,
row18 NUMBER,
row19 NUMBER,
row20 NUMBER,
row21 NUMBER,
row22 NUMBER,
row23 NUMBER,
row24 NUMBER,
row25 NUMBER,
row26 NUMBER,
row27 NUMBER,
row28 NUMBER,
row29 NUMBER,
row30 NUMBER,
row31 NUMBER,
row32 NUMBER,
row33 NUMBER,
row34 NUMBER,
row35 NUMBER,
row36 VARCHAR2(25 BYTE),
row37 VARCHAR2(25 BYTE),
row38 VARCHAR2(25 BYTE),
row39 VARCHAR2(25 BYTE),
row40 NUMBER,
row41 NUMBER,
row42 NUMBER,
row43 NUMBER,
row44 NUMBER,
row45 NUMBER,
row46 NUMBER,
row47 NUMBER,
row48 NUMBER,
row49 NUMBER,
row50 NUMBER,
row51 NUMBER,
row52 NUMBER,
row53 VARCHAR2(9 BYTE),
row54 RAW(3),
row55 VARCHAR2(7 BYTE),
row56 NUMBER,
row57 VARCHAR2(25 BYTE),
row58 NUMBER,
row59 NUMBER,
row60 NUMBER,
row61 NUMBER,
row62 NUMBER,
row63 NUMBER,
row64 VARCHAR2(5 BYTE),
row65 NUMBER,
row66 NUMBER,
row67 NUMBER,
row68 RAW(8),
row69 NUMBER,
row70 NUMBER,
row71 NUMBER,
row72 NUMBER,
row73 NUMBER
)
TABLESPACE DATA_CDR
PARTITION BY RANGE (TIMETAG)
(
PARTITION P1257533000 VALUES LESS THAN (1257533000)
LOGGING
NOCOMPRESS
TABLESPACE DATA,
PARTITION P1257533600 VALUES LESS THAN (1257533600)
LOGGING
NOCOMPRESS
TABLESPACE DATA,
PARTITION P1257534200 VALUES LESS THAN (1257534200)
LOGGING
NOCOMPRESS
TABLESPACE DATA
)
NOCOMPRESS
NOCACHE
NOPARALLEL
MONITORING;

and Tom said...

if only one thing at a time inserts into a table, partitioning will not make it any faster - partitioning can only make inserts faster if it removes contention. Partitioning actually makes an insert take a small bit longer in single user mode, as we have to figure out "where to put the data" before putting it there. You are partitioning, I assume, so as to make it easy to purge old data...

noarchivelog mode here is not relevant - since arch would do its work in the background and not really impact performance of the inserts. You are generating redo because as far as I can tell - you are using conventional path inserts.

extent management isn't going to be a big deal either - locally managed can add extents blindingly fast.

What might be of use would be direct path loads and larger bulk inserts. Direct path will always write above the high water mark of the table (I'm assuming that you don't delete, you always truncate/drop old partitions - if so, direct path is perfectly ok for you space wise). It'll also take the newly inserted data and create a mini index of just the new stuff and merge it into the existing indexes in bulk (more efficient index maintenance). A direct path load can skip redo and undo generation for the TABLE (but not the indexes), the direct path index maintenance however will tend to generate less redo/undo in general.

http://docs.oracle.com/cd/B19306_01/appdev.102/b14250/oci12obn.htm#sthref1920


that would be something to consider.
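As a sketch of what that looks like on the SQL side (the staging table here is hypothetical, not part of the original question):

```sql
-- Direct path insert: the APPEND hint asks Oracle to write new blocks
-- above the high water mark, bypassing the buffer cache. Index
-- maintenance is deferred: mini indexes are built on the new data and
-- merged into the existing indexes at the end of the statement.
INSERT /*+ APPEND */ INTO sample_table_ddl
SELECT * FROM staging_table;

-- the newly loaded data cannot be read by this session until commit
COMMIT;
```

In OCI itself, the equivalent is the direct path load API (OCIDirPathPrepare and related calls) described in the linked documentation.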


Comments

Oleksandr Alesinskyy, November 13, 2009 - 2:57 pm UTC

What do you mean by "merging an index"? Could you provide an example?

Tom answered the question in a different thread

A reader, November 15, 2009 - 3:03 pm UTC

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:486472300346315675#489412000346609386

when we direct path - we maintain "mini indexes" for the newly loaded data, and AFTER the load is done, we merge these mini indexes with the main index in bulk.

Instead of adding the index entries one by one as we load, it is done in bulk - less 'damaging' to the index structure that way.
Tom Kyte
November 15, 2009 - 3:36 pm UTC

thanks

and that is also why indexes (unique ones) can be left in a direct load state - because we cannot necessarily detect unique violations at the time of insert (also why direct path insert + dml error logging isn't supported - the insert will fail, not error log...)


direct path and dml error logging

A reader, November 16, 2009 - 10:18 am UTC

Tom,

I think (though I haven't tested it just now) that direct path does not support DML error logging only for unique and primary key violations. It does support logging other errors such as foreign key violations, NOT NULL constraints, etc.

Isn't that right?

Mohamed Houri


Tom Kyte
November 23, 2009 - 12:12 pm UTC

correct, it is documented as such.

Batch Commit

A reader, November 17, 2009 - 3:59 am UTC

Hi,

Direct path operations would definitely be the first choice.

But wouldn't a large LOG_BUFFER size and batch commits (COMMIT_WRITE='BATCH,NOWAIT') also be a good idea for this one as the data is obviously not recovery critical?

Cheers,
Markus
Tom Kyte
November 23, 2009 - 1:49 pm UTC

well, if they just write their code in plsql, they'll get batch,nowait - plsql has been doing that since version 7.0

but, I hesitate to suggest this in an external program, for the simple reason that a lot of people don't know how to use it properly. Improper use can lead to data loss (because of the application doing the data load NOT because of the database) or duplication (NOT because of the database)
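For reference, the 10gR2 syntax Markus is referring to - a sketch only, given Tom's caveat that an unacknowledged commit can mean silent data loss if the instance fails before the redo is written:

```sql
-- per-transaction form:
COMMIT WRITE BATCH NOWAIT;

-- or session-wide, via the 10.2 COMMIT_WRITE parameter:
ALTER SESSION SET COMMIT_WRITE = 'BATCH,NOWAIT';
```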

adderek, November 17, 2009 - 5:58 pm UTC


Free Buffer Waits

David Eads, November 19, 2009 - 4:57 pm UTC

Keith,

If you are seeing a lot of free buffer waits then you may need to add more DB writer processes, or possibly turn on asynchronous I/O. Direct path inserts will be the fastest way to insert with a single process, but if you can break the load down and do inserts in parallel from multiple processes, then conventional inserts can be much faster.
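A sketch of the settings David mentions (the values are illustrative, not recommendations - size them for your system; both parameters are static and require an instance restart):

```sql
-- more database writer processes
ALTER SYSTEM SET db_writer_processes = 4 SCOPE = SPFILE;

-- enable asynchronous (and direct) I/O on filesystem datafiles
ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE = SPFILE;
```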

Moving to multiple sessions writing to the table first mentioned

Keith Cutler, April 29, 2010 - 2:43 pm UTC

Tom,

Thanks as always for doing what you do!!

We are now moving to multiple sessions writing to the table mentioned in the original question.

I am thinking that adding hash sub-partitions to this range partitioned table will stave off block contention. Would you agree?

Thanks again!!

Keith
Tom Kyte
April 29, 2010 - 3:00 pm UTC

... will stave off block contention. Would you agree?...

the free buffer waits you were experiencing were not contention induced (remember, you started with "only one thing inserts at a time"). They were likely caused by you trying to create 5,000 new rows in the *buffer* cache. The free buffers we needed were buffers in the cache.

Adding hash partitioning would only make that worse (as the 5,000 rows now have to be put away into probably even more blocks).


direct pathing would bypass that - we'd write to the datafiles directly, not via the cache. However, only one thing can direct path into a single segment at a time.

Clarification

Keith Cutler, April 29, 2010 - 3:16 pm UTC

Sorry, I did not mean in order to reduce free buffer waits. I know now they are caused by the buffer cache size (on 32-bit) and poor I/O design. We are moving to 64-bit and ASM to resolve that issue.

That aside, my thinking was that if multiple sessions will now be writing to the same (range) partition, they would compete for the next free block. So if I hash sub-partition, each session would possibly (if the sub-partition column is unique enough) write to different blocks.

I used hash partitioning to reduce "hot block" contention when designing tables that supported ticket sales (where everyone wants the best seats/front row) along with PCTFREE to reduce the number of rows per block and thought it might similarly apply in this situation as well.

Thanks again!! (Especially for the fast response)
Tom Kyte
April 29, 2010 - 3:35 pm UTC

... That aside, my thinking was that if multiple sessions will now be writing to
the same partition (range), that they would compete for the next free block....

in the old days - multiple free lists would be called for. today, ASSM (automatic segment space management), which you are using, handles that. They (the concurrent sessions) would not, in general, be inserting into the same block

The hash partitioning could help with locally partitioned, monotonically increasing indexes - indexes on sequences, or dates that are always increasing. Those indexes would have a 'hot right hand side' and the more of them you have, the cooler they will be.

But the indexes would have to be locally partitioned in order for partitioning the table to achieve that. If they are not locally partitioned, they would be global indexes and can be partitioned entirely independently of the underlying table (meaning it would not be necessary to partition the table to achieve the index contention reduction)
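To illustrate, a range-hash composite version of the original DDL might look like this (a sketch only - the hash key and subpartition count are illustrative, and the column list is elided):

```sql
CREATE TABLE sample_table_ddl
(
row1 RAW(8),
timetag NUMBER NOT NULL
-- ... remaining columns as in the original DDL ...
)
PARTITION BY RANGE (timetag)
SUBPARTITION BY HASH (row1) SUBPARTITIONS 8
(
PARTITION p1257533000 VALUES LESS THAN (1257533000),
PARTITION p1257533600 VALUES LESS THAN (1257533600)
);
```

Any locally partitioned index on this table would then be split into 8 hash subpartitions per range partition, spreading a 'hot right hand side' across 8 smaller index structures.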

ASSM Issue & Concurrent DML

A reader, July 06, 2010 - 10:56 am UTC

Hi Tom,

In our database we have an online tablespace with two datafiles in it and segment space management set to AUTO (ASSM), and in it we have a regular table (e.g. transaction_table).
When the following procedure is executed in 10 concurrent processes (from a shell script) with different input parameters, the size of the table (transaction_table) grows drastically.
In our case it grows from 6 GB to 25 GB, whereas it should have grown from 6 GB to about 7 GB.
We are using Oracle version 10.2.0.3.

TRANSACTION_TABLE details
--------------------------
TABLESPACE ABC
PCTUSED 0
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 300M
NEXT 2M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
)
NOLOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
MONITORING

procedure procedure1 (input_parameter in number)
is
    ......
    cursor c1 is select * from table1 where column1 = input_parameter;
begin
    open c1;
    loop
        fetch c1 into ......;
        exit when c1%notfound;

        -- lots of validation of data; based on the validation,
        -- one of the following IF conditions executes

        if condition1 then
            delete from transaction_table where column1 = ......;
            goto last_step;
        end if;

        if condition2 then
            update transaction_table set ...... where column1 = ......;
            goto last_step;
        end if;

        if condition3 then
            insert into transaction_table ......;
            goto last_step;
        end if;

        <<last_step>>
        ......;
    end loop;
    close c1;
end procedure1;

To cross-check: when we move the table (transaction_table) to a different tablespace with an ALTER command and then move it back to the original tablespace, the size of the table is 7 GB.
Please suggest what the problem could be here...

Thanks

Tom Kyte
July 06, 2010 - 4:43 pm UTC

you don't say where the commits are here at all.

you seem to indicate that this must be ASSM, but have you reproduced anything similar using manual segment space management?

ASSM Issue & Concurrent DML

A reader, July 06, 2010 - 11:55 pm UTC

We commit every 10,000 rows processed in the cursor loop, and commit once more after the loop (at the end of the procedure).
This scenario is not reproducible with manual segment space management.

Thanks
Tom Kyte
July 08, 2010 - 11:22 am UTC

please utilize support for this - I don't have anything to grab onto here to diagnose with.

ASSM Issue & Concurrent DML

A reader, July 09, 2010 - 12:14 am UTC

Thanks Tom

Insert on Tables with Lots of Indexes.

Suhas Patil, September 16, 2013 - 1:38 pm UTC

Hi Tom,

Good Day.

While loading data into a table that is highly indexed, we are facing severe performance issues. The table has around 70 million records and we are trying to insert about 1 million records in 2 batches of 500K. The same data, when loaded into an un-indexed table, takes a fraction of a second. So we wanted to know the following.

1. Is there a rule of thumb to calculate the extra time required to maintain indexes during a batch load? Say our table has 200 columns and 100 columns are indexed (about 80 indexes) - can I calculate the time required to maintain these indexes for a load of, say, 100K records?

2. Is there any better way of loading the data without dropping the indexes before the load and re-creating them afterwards? The tables are used by applications which are online, and dropping the indexes would result in downtime.


Regards,
Suhas.
Tom Kyte
September 24, 2013 - 4:57 pm UTC

1) it can be as much as 2 to 3 times as long for each index (eg: if a row can be inserted in N units of time, it might take as much as 2N to 3N for a row with one index). so your load might be taking as much as 100 times longer due to the indexes (really - 100 indexes???? really?)


2) absolutely. Load into a new empty table. Index that table. Add an empty partition to the existing table. Exchange this partition in. If you need global indexes, you can have them maintained during the exchange operation (hopefully you will have far fewer global indexes than you have indexes right now).

also, ensure you are using a direct path load - that will maintain the indexes on a single table (no partitioning) as efficiently as possible. the direct path load will create "mini indexes" on the newly loaded data and at the end of the operation - it will merge them into the existing indexes. bear in mind a direct path load will lock the table during the load. You would want to add all 1,000,000 records in a single insert /*+ append */
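A sketch of that load-and-exchange sequence (all object names here are hypothetical):

```sql
-- 1. build an empty table of the same shape and direct path load it
CREATE TABLE staging AS SELECT * FROM big_table WHERE 1 = 0;

INSERT /*+ APPEND */ INTO staging SELECT * FROM source_of_new_rows;
COMMIT;

-- 2. create on STAGING the same indexes the local index partitions have

-- 3. add an empty partition and swap the loaded segment in
ALTER TABLE big_table ADD PARTITION p_new VALUES LESS THAN (20131101);
ALTER TABLE big_table EXCHANGE PARTITION p_new WITH TABLE staging
INCLUDING INDEXES WITHOUT VALIDATION
UPDATE GLOBAL INDEXES;
```

The exchange itself is a data-dictionary operation, so the swap is near-instant; only the global index maintenance (if requested) takes time proportional to the data volume.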

if you ...

Sokrates, September 24, 2013 - 6:15 pm UTC

... Load into a new empty table. Index table. ...,
then you can calculate the time required to build the indexes, see
http://www.sqlfail.com/2013/08/26/index-size-estimatation/

A reader, September 25, 2013 - 2:39 pm UTC

Hi Tom,

"the direct path load will create "mini indexes" on the newly loaded data and at the end of the operation - it will merge them into the existing indexes."

when it merges into the existing indexes, will it rebuild them or do insert+delete?

Thank you,

Thanks Tom

Suhas Patil, October 02, 2013 - 9:27 am UTC

Hi Tom,

Thanks a lot for your reply.

The tables are Siebel CRM tables, hence the high number of indexes.

Please let me know if the time of index maintenance (2 to 3 times per index) can be reduced.

Thanks,
Suhas
