Data Extract performance issue from ORACLE TABLE
Tom Kyte

Thanks for the question, PRS.

Asked: October 14, 2006 - 10:20 am UTC

Last updated: October 03, 2007 - 1:30 pm UTC

Version: 9.2.07

Viewed 10K+ times.

You Asked

Tom,
I have a table with around 12 million rows. We are using third-party software that requires flat files as input, so we produce flat files by extracting data from the Oracle tables and sending them to that software's inbox.
The problem is extracting the data from a large table (12 million to 15 million rows): the extract of a single large table takes 30 to 40 minutes, and writing to disk takes more time.
Is there any way to improve the performance of the extract? We are extracting the data through a single SELECT statement. I have tried a PARALLEL hint, but performance does not improve.


I have 20 large tables and produce 20 extract feeds for the third-party software. Any help is appreciated.

Thanks,
PRS

and Tom said...

Ok, answer this: how could running a full scan in parallel - but then feeding the results to a serial process - improve performance? :) You have a serial bit of code; you are not aggregating the data, you are just reading it row by row.

Now, that said, it should not take 30 to 40 minutes for a "normal sized" table to be unloaded like that. My "big_table" here has 10,000,000 rows and an average row width of about 100 bytes (that is what is relevant - the SIZE, not the number of rows really....)

$ time array_flat userid=big_table/big_table 'sqlstmt=select * from big_table' arraysize=500 > big_table.dat

Connected to ORACLE as user: big_table/big_table

Unloading 'select * from big_table'
Array size = 500
ID,OWNER,OBJECT_NAME,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE,CREATED,LAST_DDL_TIME,TIMESTAMP,STATUS,TEMPORARY,GENERATED,SECONDARY,X
10000000 rows extracted

real 2m59.615s
user 2m12.010s
sys 0m8.260s

$ ls -l big_table.dat
-rw-rw-r-- 1 tkyte tkyte 1076782089 Oct 15 04:26 big_table.dat


So, on my machine - about 3 minutes to unload 10 million x 100 bytes of data. I rather think the bottleneck is in whatever bit of client code you've developed - for example, it is likely fetching a row at a time rather than array fetching, and performing its IO suboptimally.

I used:

http://asktom.oracle.com/~tkyte/flat/index.html

the pro*c code I have there.



Comments

PRS, October 15, 2006 - 11:57 am UTC

I will try the Pro*C program. My client code is only a .sql script called from a shell script.

Will the arraysize setting in the .sql script help?

Tom Kyte
October 15, 2006 - 1:07 pm UTC

it will help a little, default is 15.

Funny that you want high performance, on big tables, and you are using a simple command line utility as your approach.....
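
For reference, here is one way the arraysize setting might look in a SQL*Plus unload script - a sketch only, reusing the big_table columns and credentials from the demo above rather than the poster's actual tables:

$ cat unload.sql
-- minimal SQL*Plus flat-file unload (sketch; adjust columns and delimiter as needed)
set arraysize 500
set termout off
set feedback off
set heading off
set pagesize 0
set linesize 1000
set trimspool on
spool big_table.dat
select id||','||owner||','||object_name from big_table;
spool off
exit

$ sqlplus -s big_table/big_table @unload.sql

Even with a 500-row array size, SQL*Plus formatting overhead generally keeps it slower than the Pro*C unloader, which is the point Tom is making above.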

PRS, October 17, 2006 - 9:40 am UTC

Tom,
I used your Pro*C program, but the speed is still not great. It takes 9 minutes to extract 3.5 million records with a record size of 250 characters.

I checked the server; CPU and memory are available. It is a Sun V480 with 4 CPUs and 16 GB of RAM.

I tried playing with the arraysize parameter in the call, but it does not help much.

Any help is appreciated.

Thanks,
PRS

Tom Kyte
October 17, 2006 - 9:56 am UTC

well, unless YOU split up the unload (unloading various ranges...) we will use precisely 1 CPU (there is no sense in doing a parallel full scan when you have a SERIAL OUTPUT process).

Memory - we used it just fine I'm sure.


Most people would say "gee, you took it from 30/40 minutes down to 9 minutes, that is AWESOME".

Anyway - do-it-yourself parallelism: you can run a query to get rowid pairs that slice the table into 8 (since you have 4 CPUs) non-overlapping slices that completely cover the table.

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10498431232211
split.sql, use it to get 8 rowid ranges and call array_flat 8 times in parallel.
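
A sketch of what that do-it-yourself parallel unload might look like from a shell script, assuming split.sql has been used to produce a ranges.txt file with one "low-rowid high-rowid" pair per line (that file format is an assumption, not something prescribed in the thread):

#!/bin/sh
# parallel_unload.sh - sketch only; assumes ranges.txt holds 8 lines of
# "low_rowid high_rowid" pairs that together cover the whole table.
i=0
while read lo hi
do
  i=`expr $i + 1`
  array_flat userid=big_table/big_table \
    "sqlstmt=select * from big_table where rowid between '$lo' and '$hi'" \
    arraysize=500 > big_table_$i.dat &
done < ranges.txt
wait
# note: each slice file starts with its own column-header line (array_flat
# prints one), so deal with that while concatenating if the target software cares
cat big_table_*.dat > big_table.dat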



Awesome?

John G, October 17, 2006 - 10:17 am UTC

"Most people would say "gee, you took it from 30/40 minutes down to 9 minutes,
that is AWESOME".

The original throughput was stated as 12 to 15 million rows in 30 to 40 minutes. This is 375K to 400K records per minute.
The followup stated a throughput of 3.5 million rows in 9 minutes. This is 388K rows per minute.

This doesn't pass the "awesome" threshold.


Tom Kyte
October 17, 2006 - 10:33 am UTC

ahh, that was not clear in a quick read (which is what reviews get - I don't reread the entire page); you sort of changed the metric on me.

Anyway

so likely, you have hit the "ceiling" on your machine - parallelize it, and make sure you are not bottlenecked on OUTPUT IO (eg: everything writing to the same disk) as well as INPUT IO (eg: make sure you light up all of the disks when you do your own parallel extract).
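
A quick way to check the output-IO side while the parallel slices are running - just a sketch, using the standard Solaris iostat available on a Sun V480 like the one mentioned above:

$ iostat -xn 5
# watch the %b (percent busy) column: one device pegged near 100% while the
# others sit idle means the readers or writers are all queued on one disk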

Awesome?

Jan van Mourik, October 17, 2006 - 10:45 am UTC

awe·some (ôʹsəm)
adj.
1. Inspiring awe: an awesome thunderstorm.
2. Expressing awe: stood in awesome silence before the ancient ruins.
3. Slang. Remarkable; outstanding: “a totally awesome arcade game” (Los Angeles Times).

- Slow process, 30-40 minutes for a large table
- 20 large tables
- So maybe 600~800 minutes
- AskTom question
- Answer gets me to 9 minutes
- So maybe 180 minutes total
- I went from 10+ hours to 3 hours without doing much myself
- Looks "Remarkable" to me!

Now maybe by taking Tom's hints and examples I can get it to perform even better. Plus I learned a bunch about how to approach this problem - now how can that not be awesome???


extract performance from Oracle

Mark Johnson, November 14, 2006 - 6:47 pm UTC

PRS, check FastReader from wisdomforce.com if Pro*C is still too slow for you. FastReader can probably unload at about 100-150K rows per second (if the average width is 250 characters as you mentioned and you use its parallel settings). Hope the storage I/O is fast, otherwise it will be your major bottleneck.

Follow up Question

A reader, September 28, 2007 - 7:24 pm UTC

Tom :

I appreciate your valued comments/suggestions. My question is related to the original question in this thread; here is my scenario:

---------
Scenario
---------

Our DB is on Host A, and a third-party Software Batch Script runs on Host B. The Batch Script
can read a file or a named pipe. The source for the Batch Script is a bunch of rows
(which could be huge, depending on the scenario) from a table.


---------
Question
---------

1. Which one of the following is the most efficient (= FAST) way to pass the data to the Batch Script running on Host B from the Database running on Host A?

Kindly add yours if I am missing something here.

a. Use JDBC to get the data and write it to the named pipe that the Batch Script can read.
b. Same as a, except write all the data to a file and, once done, invoke the Batch Script.
c. Same as a or b, except use OCI.
d. Same as a or b, except use PL/SQL.
e. Same as a or b, except use array_flat (the Pro*C implementation)?


2. Would your answer be any different if the DB and the Third Party Software were running on the same server?

Tom Kyte
October 03, 2007 - 1:30 pm UTC

you do not tell me anything about what this batch script does or is.

I know it is not a sql script, since it can "read"

so - I have no comment, as I have no idea what you are talking about.

Follow up question

A reader, October 05, 2007 - 2:04 am UTC

The batch script, for instance, could take the name, address1, address2, city, state and address fields from a customer table and write the cleansed (standardized/formatted/validated) values to a file.

Follow up

A reader, October 10, 2007 - 7:40 pm UTC

Tom :

This is a follow-up to the previous question; here is the clarification.

The Batch Script mentioned in the previous posting takes customer data row by row, does some enrichment, standardization and validation, and reports the results to a file. The Batch Script can read either from a file or from a pipe.

My question is basically: what is the fastest way to feed the data to the Script from an Oracle table, since the Batch Script cannot read from a table directly?
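
Since the Batch Script can read from a named pipe, one pattern worth measuring - a sketch only, where the paths, credentials, TNS alias and script name are placeholders rather than anything endorsed earlier in the thread - is to run the unloader on Host B, connect over SQL*Net to the database on Host A, and write straight into a pipe so the script starts processing as rows arrive instead of waiting for a complete file:

#!/bin/sh
# feed_pipe.sh - sketch only; runs on Host B where the Batch Script lives.
# "custdb" is a placeholder TNS alias for the database on Host A.
mkfifo /tmp/cust_feed.pipe

# writer: unload the customer rows straight into the pipe, in the background
array_flat userid=scott/tiger@custdb \
  "sqlstmt=select name, address1, address2, city, state from customer" \
  arraysize=500 > /tmp/cust_feed.pipe &

# reader: the third-party batch script consumes the pipe as if it were a file
batch_script.sh /tmp/cust_feed.pipe

wait
rm -f /tmp/cust_feed.pipe

Whether this beats spooling a complete file first depends on how fast the Batch Script consumes rows; if it reads more slowly than the unload produces them, the pipe just moves the wait, so both approaches are worth timing.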

Follow up

A reader, October 17, 2007 - 3:52 pm UTC

Tom :

Do you suggest that I post my response as a separate question?