For the last 17 years I've been focused on real-time apps where the number of rows retrieved from Oracle is usually 1 and almost always fewer than 10. I have a good reputation for writing very performant real-time applications, but suddenly I'm faced with reading millions (and possibly billions) of rows from a massive batch database server and am looking for ways to improve the speed.
So here's the big question: What is hands down the absolute fastest way to extract data from an Oracle table or view, given these constraints:
1) There is exactly one query being executed which traverses the entire table or view.
2) The rows being extracted from Oracle need to be pre-processed by a formatting routine that will essentially convert the entire row into a fixed-length record. The layout for the fixed-length record is similar to a COBOL data layout, however with no packed binary fields, just good old plain ASCII.
3) The fixed length records will be grouped into bundles and the bundles will be handed off to background threads that will further process the data and produce output files and/or summary information. Note that for our benchmark speed test, this last piece is omitted.
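Step 2 above (row-to-fixed-length conversion) can be sketched as a simple padding/truncation routine. This is a minimal illustration only; the field widths and justification rules below are hypothetical, not the poster's actual layout:

```java
// Hypothetical fixed-length layout: each field has a declared width and is
// either right-justified (numeric style, space-padded on the left) or
// left-justified (text style, space-padded on the right), plain ASCII.
public class FixedLengthFormatter {
    private final int[] widths;
    private final boolean[] rightJustify; // true = pad on the left

    public FixedLengthFormatter(int[] widths, boolean[] rightJustify) {
        this.widths = widths;
        this.rightJustify = rightJustify;
    }

    // Convert one row (already fetched as strings) into one fixed-length record.
    public String format(String[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < widths.length; i++) {
            String v = row[i] == null ? "" : row[i];
            if (v.length() > widths[i]) {
                v = v.substring(0, widths[i]); // truncate overflow
            }
            int pad = widths[i] - v.length();
            if (rightJustify[i]) {
                for (int p = 0; p < pad; p++) sb.append(' ');
                sb.append(v);
            } else {
                sb.append(v);
                for (int p = 0; p < pad; p++) sb.append(' ');
            }
        }
        return sb.toString();
    }
}
```

Because this runs per row on millions of rows, keeping it allocation-light (one reusable StringBuilder, no regex) matters for throughput.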
We found this blurb on the OracleDriver documentation page, which states that using OCI over IPC is faster than a network connection, and we are trying to test that premise using a home-made Java-based speed-test/benchmark program that just does steps 1 and 2 from the above constraints:
https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html — "An IPC connection is much faster than a network connection."
Our database hardware consists of a bare-metal AIX database server. We also have several Linux-based VMs in the same datacenter. The remote client for our testing purposes is just one of the Linux VMs in the same datacenter as the AIX database server.
In the Java speed-test app, we've set the statement fetch size to 4196 rows. The results are confusing me; see the table below. According to our results, using JDBC's thin driver (over the network) is faster than OCI over IPC. Note: we set a max-records value to stop the benchmark at a fixed point rather than traversing the entire table.
Library  Protocol  Connect String                           Client  Records    Seconds  MiB/Second
jdbc     ojdbc     jdbc:oracle:thin:@MyDbServer:1521/MySID  remote  3,629,265  453.569  9.195
oci      sql*net   jdbc:oracle:oci8:@MyDbServer:1521/MySID  remote  3,629,265  631.424  6.605
oci      ipc       jdbc:oracle:oci8:@                       local   3,629,265  667.554  6.248
Due to the poor MiB/second rate we see using the Java app, I decided to write a C++ OCCI program, and it is much, much faster. But alas, the database server is AIX and I can't for the life of me figure out how to compile the program on AIX. Would the program be faster if I ran it on the database server? Is there conceivably another method of data extraction that might be faster than the Java or C++ programs we've written?
In the C++ program, I've set the prefetch memory size to 1 GiB and have played with different prefetch row counts. The client is a remote Linux server in the same datacenter as the database server itself.
Prefetch Memory  Max Rows    Prefetch Rows  MiB/Second
1,073,741,824    100,000     10             2.529
1,073,741,824    100,000     100            11.135
1,073,741,824    100,000     1,000          29.115
1,073,741,824    100,000     10,000         32.064
1,073,741,824    100,000     100,000        23.067
1,073,741,824    1,000,000   100,000        32.970
1,073,741,824    10,000,000  10,000         33.358
1,073,741,824    3,000,000   10,000         34.014
1,073,741,824    3,000,000   10,000         33.990  (101.990 seconds)
Example remote connection string used by the OCCI program:
(DESCRIPTION = (ADDRESS=(PROTOCOL = TCP)(HOST = MyHostName)(PORT = 1521)) (CONNECT_DATA= (SERVICE_NAME = MyDbSID) (SERVER = DEDICATED)))
Now, after dropping all this in your lap, I have a question about the "ResultSet". I've looked at the documentation for Java, C++, and C#, and all three appear to use a "ResultSet". What is disturbing to me is that "ResultSet" requires one to traverse the results exactly one row at a time. What would be better is to break the results up into micro-batches. We know the data is coming into the driver in micro-batches, per the Statement's "setFetchSize()" method.

So instead of saying "resultSet.next()", we would say "resultSet.nextBatch()", where "nextBatch()" would return a distinct ResultSet object containing either the number of rows set by the Statement's "setFetchSize()" method, or the number of rows left over from the driver's last fetch from the database, whichever is smaller. A call to "ResultSet.nextBatch()" when no rows are left would throw an exception that clearly indicates the end of the query results. Each micro-batch could then be independently processed by background thread(s)/task(s).
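For what it's worth, the standard JDBC ResultSet has no nextBatch(), but the pattern can be approximated by draining up to fetch-size rows at a time into a list and handing each list to a worker. A minimal sketch of that idea, using a plain Iterator of string arrays in place of a live ResultSet (the MicroBatcher name and row representation are illustrative, not a real Oracle API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Drain up to batchSize rows from an underlying row source into one list.
// With a real JDBC ResultSet the inner loop would call rs.next() and read
// each column; here a plain Iterator<String[]> stands in for it.
public class MicroBatcher {
    public static List<String[]> nextBatch(Iterator<String[]> rows, int batchSize) {
        if (!rows.hasNext()) {
            throw new NoSuchElementException("end of query results");
        }
        List<String[]> batch = new ArrayList<>(batchSize);
        while (rows.hasNext() && batch.size() < batchSize) {
            batch.add(rows.next());
        }
        return batch; // ready to hand off to a background thread
    }
}
```

Matching batchSize to the statement's setFetchSize() means each call to nextBatch() roughly corresponds to one driver fetch, so the hand-off granularity follows the wire granularity.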
Is there an Oracle API or product that works this way?
Thank you for your time and I look forward to hearing from you!
_wade
I would say take a look at this Pro*C program (which is based on one from AskTom many years back). It performs array fetching on the data to keep it as fast as possible.
https://github.com/gregrahn/oracle-ascii-unload I popped that into my virtual box (which is running on my laptop) so we're not talking high power server spec here :-)
I got the following running *locally* on the server, where table "LOTS_DATA" is 1,000,000 copies of SCOTT.EMP, hence 14 million rows in total.
[oracle@db19 flat]$ time ./oracle-ascii-unload userid=system/admin sqlstmt='select * from lots_data' arraysize=1000 > /dev/null
Unloading 'select * from mcdonac.lots_data'
Array size = 1000
EMPNO|ENAME|JOB|MGR|HIREDATE|SAL|COMM|DEPTNO
14000000 rows extracted
Elapsed: 00:01:13
real 1m13.319s
user 1m6.283s
sys 0m0.484s
So just under 14 million rows per minute. I then bumped my VM up to 2 cores and ran 2 concurrently, and the second one ran:
[oracle@db19 flat]$ time ./oracle-ascii-unload userid=system/admin sqlstmt='select * from mcdonac.lots_data' arraysize=1000 > /dev/null
Unloading 'select * from mcdonac.lots_data'
Array size = 1000
EMPNO|ENAME|JOB|MGR|HIREDATE|SAL|COMM|DEPTNO
14000000 rows extracted
Elapsed: 00:01:14
real 1m13.705s
user 1m6.398s
sys 0m0.644s
So scaling-wise, you could run multiple of these concurrently, each unloading different tables, different partitions of the same table, different subsets of the data, etc.
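That fan-out can be driven from a small harness: one task per partition (or per predicate), each running its own unload on its own connection. A sketch with simulated work standing in for the real extract (the partition names, unloadPartition body, and row counts are all illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUnload {
    // Stand-in for one unload job; a real version would open its own
    // connection and extract one partition or one subset of the table.
    static long unloadPartition(String partition) {
        return 1_000_000L; // pretend each job yielded 1M rows
    }

    // Run one unload task per partition concurrently and sum the row counts.
    static long unloadAll(List<String> partitions, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String p : partitions) {
                futures.add(pool.submit(() -> unloadPartition(p)));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread count should roughly match the cores (and I/O bandwidth) available, as the two-concurrent-runs timing above suggests.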
Note that this program takes the SQL as an argument, so a program with the SQL statement burnt into the code could potentially run faster, because you know in advance how to build/format the output records.
But ultimately the limiting factors for fast unloading here are going to be:
a) how fast you can read the source data
on my VM I ran this
SQL> begin
2 for i in ( select * from lots_data ) loop
3 null;
4 end loop;
5 end;
6 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:20.56
So the act of just getting 14m rows on my VM is going to take a minimum of 20 seconds.
b) running as close to the data as possible (ie, unload the data on the server not across the network). If you *must* run across the network, you'd be looking at huge arraysizes to minimize trips and also the network infrastructure (bandwidth, latency, jumbo frames etc)
c) minimizing system calls. In the Pro*C above, we're calling 'printf' for each column for each row. I could optimize that a *lot* by sending output in larger chunks (eg building a string for 'n' rows and then dumping that).
For example, concatenating the data for each column and just one printf call per row gave me:
[oracle@db19 flat]$ time ./oracle-ascii-unload userid=system/admin sqlstmt='select * from mcdonac.lots_data' arraysize=1000 > /dev/null
Unloading 'select * from mcdonac.lots_data'
Array size = 1000
EMPNO|ENAME|JOB|MGR|HIREDATE|SAL|COMM|DEPTNO
14000000 rows extracted
Elapsed: 00:00:52
real 0m52.400s
user 0m45.527s
sys 0m0.444s
If I take out *all* of the printf elements, I got down to around 46 seconds... so I'm pretty much as close to as good as my VM is going to do.
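The chunking idea in (c) carries over to any language: accumulate n rows into one buffer and emit it with a single write, rather than one write per column or per row. A sketch (the delimiter, chunk size, and method names are arbitrary choices for illustration):

```java
import java.io.IOException;
import java.io.Writer;

public class ChunkedOutput {
    // Append rows to an in-memory buffer and flush it to the writer only
    // every flushEvery rows, so the underlying write() call happens once
    // per chunk instead of once per column.
    static void emit(Writer out, String[][] rows, int flushEvery) throws IOException {
        StringBuilder buf = new StringBuilder();
        int pending = 0;
        for (String[] row : rows) {
            buf.append(String.join("|", row)).append('\n');
            if (++pending == flushEvery) {
                out.write(buf.toString()); // one write per chunk
                buf.setLength(0);
                pending = 0;
            }
        }
        if (pending > 0) out.write(buf.toString()); // flush the remainder
    }
}
```

The output is byte-for-byte identical to the per-row version; only the number of calls into the OS changes.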
As an aside, if you don't need the flexibility of dynamic SQL, then if you install the Pro*C demos, you can use sample3.pc as a base for a static SQL version that uses array fetching. Hard-coding the query, the array size (500), and the columns from my LOTS_DATA table into that one, with no printf calls, gave:
[oracle@db19 flat]$ time ./sample3
Connected to ORACLE as user: system
real 0m20.506s
user 0m17.191s
sys 0m0.171s
which shows me that I can read the data in Pro*C just as fast as PL/SQL.