For the last 17 years I've been focused on real-time apps where the number of rows retrieved from Oracle is usually 1 and almost always fewer than 10. I have a good reputation for writing very performant real-time applications, but suddenly I'm faced with reading millions (and possibly billions) of rows from a massive batch database server and am looking for ways to improve the speed.
So here's the big question: What is hands down the absolute fastest way to extract data from an Oracle table or view, given these constraints:
1) There is exactly one query being executed which traverses the entire table or view.
2) The rows being extracted from Oracle need to be pre-processed by a formatting routine that will essentially convert the entire row into a fixed-length record. The layout for the fixed-length record is similar to a COBOL data layout, however with no packed binary fields, just good old plain ASCII.
3) The fixed length records will be grouped into bundles and the bundles will be handed off to background threads that will further process the data and produce output files and/or summary information. Note that for our benchmark speed test, this last piece is omitted.
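Step 2 above (row-to-fixed-length conversion) can be sketched as a simple padding/truncation routine. This is a minimal illustration only; the field widths and justification rules below are hypothetical, not the poster's actual layout:

```java
// Hypothetical fixed-length layout: each field has a declared width and is
// either right-justified (numeric style, space-padded on the left) or
// left-justified (text style, space-padded on the right), plain ASCII.
public class FixedLengthFormatter {
    private final int[] widths;
    private final boolean[] rightJustify; // true = pad on the left

    public FixedLengthFormatter(int[] widths, boolean[] rightJustify) {
        this.widths = widths;
        this.rightJustify = rightJustify;
    }

    // Convert one row (already fetched as strings) into one fixed-length record.
    public String format(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < widths.length; i++) {
            String v = row[i] == null ? "" : row[i];
            if (v.length() > widths[i]) {
                v = v.substring(0, widths[i]); // truncate overflow
            }
            int pad = widths[i] - v.length();
            if (rightJustify[i]) {
                for (int p = 0; p < pad; p++) sb.append(' ');
                sb.append(v);
            } else {
                sb.append(v);
                for (int p = 0; p < pad; p++) sb.append(' ');
            }
        }
        return sb.toString();
    }
}
```

Because this runs per row on millions of rows, keeping it allocation-light (one reusable StringBuilder, no regex) matters for throughput.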
We found this blurb on the OracleDriver documentation page, which states that using OCI over IPC is faster than a network connection, and we are trying to test that premise using a home-made Java-based speed-test/benchmark program that just does steps 1 and 2 from the above constraints:
https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html — "An IPC connection is much faster than a network connection."
Our database hardware consists of a bare-metal AIX database server. We also have several Linux-based VMs in the same datacenter. The remote client for our testing purposes is just one of the Linux VMs in the same datacenter as the AIX database server.
In the Java speed-test app, we've set the statement fetch size to 4196 rows. The results are confusing me; see the table below. According to our results, using JDBC's thin driver (over the network) is faster than OCI over IPC. Note: we set a max-records value to stop the benchmark at a fixed point rather than traversing the entire table.
Library  Protocol  Connect String                           Client  Records    Seconds  MiB/Second
jdbc     ojdbc     jdbc:oracle:thin:@MyDbServer:1521/MySID  remote  3,629,265  453.569  9.195
oci      sql*net   jdbc:oracle:oci8:@MyDbServer:1521/MySID  remote  3,629,265  631.424  6.605
oci      ipc       jdbc:oracle:oci8:@                       local   3,629,265  667.554  6.248
Due to the poor MiB/second rate we see using the Java app, I decided to write a C++ OCCI program, and it is much, much faster. But alas, the database server is AIX and I can't for the life of me figure out how to compile the program on AIX. Would the program be faster if I ran it on the database server? Is there conceivably another method of data extraction that might be faster than the Java or C++ programs we've written?
In the C++ program, I've set the prefetch memory size to 1 GiB and have played with different prefetch row counts. The client is a remote Linux server in the same datacenter as the database server itself.
Prefetch Memory  Max Rows    Prefetch Rows  MiB/Second
1,073,741,824    100,000     10             2.529
1,073,741,824    100,000     100            11.135
1,073,741,824    100,000     1,000          29.115
1,073,741,824    100,000     10,000         32.064
1,073,741,824    100,000     100,000        23.067
1,073,741,824    1,000,000   100,000        32.970
1,073,741,824    10,000,000  10,000         33.358
1,073,741,824    3,000,000   10,000         34.014
1,073,741,824    3,000,000   10,000         33.990  (101.990 seconds)
Example remote connection string used by the OCCI program:
(DESCRIPTION = (ADDRESS=(PROTOCOL = TCP)(HOST = MyHostName)(PORT = 1521)) (CONNECT_DATA= (SERVICE_NAME = MyDbSID) (SERVER = DEDICATED)))
Now, after dropping all this in your lap, I have a question about the "ResultSet". I've looked at the documentation for Java, C++, and C#, and all three appear to use a "ResultSet". What is disturbing to me is that "ResultSet" requires one to traverse the results exactly one row at a time. What would be better is to break the results up into micro-batches. We know the data is coming into the driver in micro-batches, per the Statement's "setFetchSize()" method.

So instead of saying "resultSet.next()", we would say "resultSet.nextBatch()", where "nextBatch()" would return a distinct ResultSet object containing either the number of rows set by the Statement's "setFetchSize()" method, or the number of rows left over from the driver's last fetch from the database, whichever is smaller. A call to "ResultSet.nextBatch()" when no rows are left would throw an exception that clearly indicates the end of the query results. Each micro-batch could then be independently processed by background thread(s)/task(s).
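For what it's worth, the standard JDBC ResultSet has no nextBatch(), but the pattern can be approximated by draining up to fetch-size rows at a time into a list and handing each list to a worker. A minimal sketch of that idea, using a plain Iterator of string arrays in place of a live ResultSet (the MicroBatcher name and row representation are illustrative, not a real Oracle API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Drain up to batchSize rows from an underlying row source into one list.
// With a real JDBC ResultSet the inner loop would call rs.next() and read
// each column; here a plain Iterator<String[]> stands in for it.
public class MicroBatcher {
    public static List<String[]> nextBatch(Iterator<String[]> rows, int batchSize) {
        if (!rows.hasNext()) {
            throw new NoSuchElementException("end of query results");
        }
        List<String[]> batch = new ArrayList<>(batchSize);
        while (rows.hasNext() && batch.size() < batchSize) {
            batch.add(rows.next());
        }
        return batch; // ready to hand off to a background thread
    }
}
```

Matching batchSize to the statement's setFetchSize() means each call to nextBatch() roughly corresponds to one driver fetch, so the hand-off granularity follows the wire granularity.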
Is there an Oracle API or product that works this way?
Thank you for your time and I look forward to hearing from you!
_wade
I would say take a look at this Pro*C program (which is based on one from AskTom many years back). It performs array fetching on the data to keep it as fast as possible.
https://github.com/gregrahn/oracle-ascii-unload I popped that into my virtual box (which is running on my laptop) so we're not talking high power server spec here :-)
I got the following running *locally* on the server, where table "LOTS_DATA" is 1,000,000 copies of SCOTT.EMP, hence 14 million rows in total.
[oracle@db19 flat]$ time ./oracle-ascii-unload userid=system/admin sqlstmt='select * from lots_data' arraysize=1000 > /dev/null
Unloading 'select * from mcdonac.lots_data'
Array size = 1000
EMPNO|ENAME|JOB|MGR|HIREDATE|SAL|COMM|DEPTNO
14000000 rows extracted
Elapsed: 00:01:13
real 1m13.319s
user 1m6.283s
sys 0m0.484s
So just under 14 million rows per minute. I then bumped my VM up to 2 cores and ran 2 concurrently, and the second one ran:
[oracle@db19 flat]$ time ./oracle-ascii-unload userid=system/admin sqlstmt='select * from mcdonac.lots_data' arraysize=1000 > /dev/null
Unloading 'select * from mcdonac.lots_data'
Array size = 1000
EMPNO|ENAME|JOB|MGR|HIREDATE|SAL|COMM|DEPTNO
14000000 rows extracted
Elapsed: 00:01:14
real 1m13.705s
user 1m6.398s
sys 0m0.644s
So scaling-wise, you could run multiple of these concurrently, each unloading different tables, different partitions of the same table, different subsets of the data, etc.
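That fan-out can be driven from a small harness: one task per partition (or per predicate), each running its own unload on its own connection. A sketch with simulated work standing in for the real extract (the partition names, unloadPartition body, and row counts are all illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUnload {
    // Stand-in for one unload job; a real version would open its own
    // connection and extract one partition or one subset of the table.
    static long unloadPartition(String partition) {
        return 1_000_000L; // pretend each job yielded 1M rows
    }

    // Run one unload task per partition concurrently and sum the row counts.
    static long unloadAll(List<String> partitions, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String p : partitions) {
                futures.add(pool.submit(() -> unloadPartition(p)));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count should roughly match the cores (and I/O bandwidth) available, as the two-concurrent-runs timing above suggests.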
Note that this program takes the SQL as an argument, so a program with the SQL statement burnt into the code could potentially run faster, because you know in advance how to build/format the output records.
But ultimately the limiting factors for fast unloading here are going to be:
a) how fast you can read the source data
on my VM I ran this
SQL> begin
2 for i in ( select * from lots_data ) loop
3 null;
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:20.56
So the act of just getting 14m rows on my VM is going to take a minimum of 20 seconds.
b) running as close to the data as possible (ie, unload the data on the server not across the network). If you *must* run across the network, you'd be looking at huge arraysizes to minimize trips and also the network infrastructure (bandwidth, latency, jumbo frames etc)
c) minimizing system calls. In the Pro*C above, we're calling 'printf' for each column for each row. I could optimize that a *lot* by sending output in larger chunks (eg building a string for 'n' rows and then dumping that).
For example, concatenating the data for each column and just one printf call per row gave me:
[oracle@db19 flat]$ time ./oracle-ascii-unload userid=system/admin sqlstmt='select * from mcdonac.lots_data' arraysize=1000 > /dev/null
Unloading 'select * from mcdonac.lots_data'
Array size = 1000
EMPNO|ENAME|JOB|MGR|HIREDATE|SAL|COMM|DEPTNO
14000000 rows extracted
Elapsed: 00:00:52
real 0m52.400s
user 0m45.527s
sys 0m0.444s
If I take out *all* of the printf elements, I got down to around 46 seconds... so I'm pretty much as close to as good as my VM is going to do.
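The chunking idea in (c) carries over to any language: accumulate n rows into one buffer and emit it with a single write, rather than one write per column or per row. A sketch (the delimiter, chunk size, and method names are arbitrary choices for illustration):

```java
import java.io.IOException;
import java.io.Writer;

public class ChunkedOutput {
    // Append rows to an in-memory buffer and flush it to the writer only
    // every flushEvery rows, so the underlying write() call happens once
    // per chunk instead of once per column.
    static void emit(Writer out, String[][] rows, int flushEvery) throws IOException {
        StringBuilder buf = new StringBuilder();
        int pending = 0;
        for (String[] row : rows) {
            buf.append(String.join("|", row)).append('\n');
            if (++pending == flushEvery) {
                out.write(buf.toString()); // one write per chunk
                buf.setLength(0);
                pending = 0;
            }
        }
        if (pending > 0) out.write(buf.toString()); // flush the remainder
    }
}
```

The output is byte-for-byte identical to the per-row version; only the number of calls into the OS changes.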
As an aside, if you don't need the flexibility of dynamic SQL, then if you install the Pro*C demos, you can use sample3.pc as a base for a static SQL version that uses array fetching. Hard-coding the query, the array size (500), and the columns from my LOTS_DATA table into that one, with no printf calls, gave:
[oracle@db19 flat]$ time ./sample3
Connected to ORACLE as user: system
real 0m20.506s
user 0m17.191s
sys 0m0.171s
which shows me that I can read the data in Pro*C just as fast as PL/SQL.