PRS, October 15, 2006 - 11:57 am UTC
I will try a Pro*C program. My client code is just a .sql script being called from a shell script.
Will the arraysize setting in the .sql help?
October 15, 2006 - 1:07 pm UTC
It will help a little; the default is 15.
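In a SQL*Plus .sql script, that looks like the sketch below. This is a hypothetical unload script; the table, columns, and spool file name are made-up placeholders.

```sql
-- Hypothetical sketch of a SQL*Plus unload script run from a shell wrapper.
-- ARRAYSIZE controls how many rows SQL*Plus fetches per round trip
-- (default 15); raising it reduces round trips, with diminishing returns.
set arraysize 100
set termout off
set pagesize 0
set linesize 300
set trimspool on
spool extract.dat
select empno || ',' || ename || ',' || sal from emp;
spool off
exit
```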
Funny that you want high performance on big tables, and you are using a simple command-line utility as your approach...
PRS, October 17, 2006 - 9:40 am UTC
Tom,
Used your Pro*C program, but the speed is not great.
It takes 9 minutes to extract 3.5 million records with a record size of 250 characters.
I checked the server: CPU and memory are available. It is a Sun V480 with 4 CPUs and 16 GB RAM.
I tried playing with the arraysize parameter in the call, but it does not help much.
Any help is appreciated.
Thanks,
PRS
October 17, 2006 - 9:56 am UTC
Well, unless YOU split up the unload (unloading various ranges...), we will use precisely 1 CPU (there is no sense in doing a parallel full scan; you have a SERIAL OUTPUT process).
Memory - we used it just fine I'm sure.
Most people would say "gee, you took it from 30/40 minutes down to 9 minutes, that is AWESOME".
Anyway, do-it-yourself parallelism: you can run a query to get rowid pairs that slice the table into 8 non-overlapping ranges (since you have 4 CPUs) that completely cover the table.
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10498431232211
Use split.sql from that page to get 8 rowid ranges, then call array_flat 8 times in parallel.
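For readers without access to the link, here is a sketch in the spirit of split.sql. The table name MY_TABLE and the 8-way bucket count are placeholders, and it reads dba_extents, so the schema needs SELECT on that view; consider it an illustration, not the original script.

```sql
-- Hypothetical sketch: carve MY_TABLE into 8 non-overlapping rowid ranges
-- by bucketing its extents with NTILE, then building rowid endpoints with
-- DBMS_ROWID. The row number 10000 in max_rid is simply larger than any
-- possible row slot in a block, so the range covers the whole last block.
select e.grp,
       dbms_rowid.rowid_create(1, o.data_object_id,
                               e.lo_fno, e.lo_block, 0)     as min_rid,
       dbms_rowid.rowid_create(1, o.data_object_id,
                               e.hi_fno, e.hi_block, 10000) as max_rid
  from (select grp,
               min(relative_fno) keep (dense_rank first
                     order by relative_fno, block_id)       as lo_fno,
               min(block_id)     keep (dense_rank first
                     order by relative_fno, block_id)       as lo_block,
               max(relative_fno) keep (dense_rank last
                     order by relative_fno, block_id)       as hi_fno,
               max(block_id + blocks - 1) keep (dense_rank last
                     order by relative_fno, block_id)       as hi_block
          from (select relative_fno, block_id, blocks,
                       ntile(8) over (order by relative_fno, block_id) as grp
                  from dba_extents
                 where segment_name = 'MY_TABLE'
                   and owner = user)
         group by grp) e,
       (select data_object_id
          from user_objects
         where object_name = 'MY_TABLE') o
 order by e.grp;
```

Each (min_rid, max_rid) pair then drives one array_flat invocation, restricted to its own slice of the table.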
Awesome?
John G, October 17, 2006 - 10:17 am UTC
"Most people would say "gee, you took it from 30/40 minutes down to 9 minutes,
that is AWESOME".
The original throughput was stated as 12 to 15 million rows in 30 to 40 minutes. This is 375K to 400K records per minute.
The follow-up stated a throughput of 3.5 million rows in 9 minutes. This is about 389K rows per minute.
This doesn't pass the "awesome" threshold.
October 17, 2006 - 10:33 am UTC
Ahh, that was not clear in a quick read (which is what reviews get; I don't reread the entire page). You sort of changed the metric on me.
Anyway, so likely you have hit the "ceiling" on your machine. Parallelize it, and make sure you are not bottlenecked on OUTPUT IO (e.g. everything writing to the same disk) as well as INPUT IO (e.g. make sure you light up all of the disks when you do your own parallel extract).
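Each slice then becomes one independent extract; a hypothetical per-slice query (the table name and bind names are made up) might look like:

```sql
-- One of the 8 parallel unload sessions: restrict the scan to this
-- session's own rowid range so the slices never overlap. :lo_rid and
-- :hi_rid come from the split query above; each session writes its
-- output to its own file, ideally on its own disk, to spread OUTPUT IO.
select *
  from my_table t
 where rowid between :lo_rid and :hi_rid;
```

Running 8 of these at once spreads the INPUT IO across the table's extents and, if the output files land on different disks, avoids serializing on a single spindle.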
Awesome?
Jan van Mourik, October 17, 2006 - 10:45 am UTC
awe·some (ô'səm)
adj.
1. Inspiring awe: an awesome thunderstorm.
2. Expressing awe: stood in awesome silence before the ancient ruins.
3. Slang. Remarkable; outstanding: a totally awesome arcade game (Los Angeles Times).
- Slow process, 30-40 minutes for a large table
- 20 large tables
- So maybe 600~800 minutes
- AskTom question
- Answer gets me to 9 minutes
- So maybe 180 minutes total
- I went from 10+ hours to 3 hours without doing much myself
- Looks "Remarkable" to me!
Now maybe, by taking Tom's hints and examples, I can get it to perform even better. Plus, I learned a bunch about how to approach this problem. Now how can that not be awesome???
extract performance from Oracle
Mark Johnson, November 14, 2006 - 6:47 pm UTC
PRS, check FastReader from wisdomforce.com if Pro*C is still slow for you. FastReader could probably unload at about 100-150K rows per second (if the average width is 250 chars as you mentioned and you use its parallel settings). Hopefully the storage I/O is fast; otherwise it will be your major bottleneck.
Follow up Question
A reader, September 28, 2007 - 7:24 pm UTC
Tom :
I appreciate your valued comment/suggestion. My question is related to the original question in the thread and here is my scenario:
---------
Scenario
---------
Our DB is on Host A and a third-party Software Batch Script is running on Host B. The Batch Script
can read a file or a named pipe. The source for the Batch Script is a bunch of rows
(could be huge, depending on the scenario) from a table.
---------
Question
---------
1. Which one of the following is the most efficient (= FAST) way to pass the data from the Database running on Host A to the Batch Script running on Host B?
Kindly add yours if I am missing something here.
a. Use JDBC to get the data and write it to the named pipe that the Batch Script can read.
b. Same as a, except write all the data to a file and, once done, invoke the Batch Script.
c. Same as a or b, except use OCI.
d. Same as a or b, except use PL/SQL
e. Same as a or b, except use array_flat (Pro*c Implementation) ?
2. Would your answer be any different, if the DB and the Third Party Software are running on the same Server ?
October 03, 2007 - 1:30 pm UTC
You do not tell me anything about what this batch script does or is.
I know it is not a SQL script, since it can "read".
So - I have no comment, as I have no idea what you are talking about.
Follow up question
A reader, October 05, 2007 - 2:04 am UTC
The batch script, for instance, could take the name, address1, address2, city, state and address fields from a customer table and write the cleansed (standardized/formatted/validated, etc.) values to a file.
Follow up
A reader, October 10, 2007 - 7:40 pm UTC
Tom :
This is a follow up of the previous question and this is the clarification to the question.
The Batch Script mentioned in the previous posting will take customer data row by row, do some enrichment, standardization and validation, and report the results to a file. The Batch Script can either read from a file or read from a pipe.
My question is basically: what is the fastest way to feed the data to the Script from an Oracle table, since the Batch Script cannot read from a table directly?
Follow up
A reader, October 17, 2007 - 3:52 pm UTC
Tom :
Do you suggest that I post my response as a separate question ?