Java Oracle ETL using CursorExpressions

Chris Saxon

Thanks for the question.

Asked: October 19, 2017 - 11:18 am UTC

Last updated: October 19, 2017 - 1:49 pm UTC

Version: 11g

Viewed 1000+ times

You Asked

Hi, I am attempting to export a large amount of data from multiple separate tables from Oracle 11 into a NoSQL database via a Java app utilising JDBI.

The data is being read from the following tables: store, store2, staff and product.

The final desired data structure is a multi-tiered structure like so:

Country
  Store1
    StoreFloorSize
    StoreAddress1
    StorePostcode
    StoreStaff
      StaffMember1
        StaffForename
      StaffMember2
        StaffForename
    StoreProducts
      Product1
        ProductName
      Product2
        ProductName
  Store2
...


There will be many countries and each country can have many stores and each store can have many staff members/products.

So far I've attempted to perform this export by querying the data from Oracle in that structure using cursor expressions (ref cursors), then mapping the results to Java objects before saving them to the new NoSQL database.

A very simplified version of the query used to extract data from Oracle is below:

select countryName,
       cursor(select storeFloorSize, storeAddress1, storePostcode,
                     cursor(select staffForename from staff
                            where  staff.storeId = store.id),
                     cursor(select productName from product
                            where  product.storeId = store.id)
              from   (select * from store union all select * from store2) store
              where  store.countryId = country.id)
from   country
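On the Java side, each cursor expression arrives through JDBC as a nested ResultSet that you read like any other. A minimal sketch of the read loop is below; the SQL string is a hypothetical complete version of the simplified query, the column positions follow it, and the connection setup is left to the caller:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorExport {

    // Hypothetical query mirroring the simplified one in the question.
    static final String SQL =
        "select countryName, " +
        "       cursor(select storeFloorSize, storeAddress1, storePostcode, " +
        "                     cursor(select staffForename from staff " +
        "                            where staff.storeId = store.id), " +
        "                     cursor(select productName from product " +
        "                            where product.storeId = store.id) " +
        "              from (select * from store union all select * from store2) store " +
        "              where store.countryId = country.id) " +
        "from country";

    static void export(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet countries = stmt.executeQuery(SQL)) {
            while (countries.next()) {
                String country = countries.getString(1);
                // Column 2 is the cursor expression: a nested ResultSet of stores.
                try (ResultSet stores = (ResultSet) countries.getObject(2)) {
                    while (stores.next()) {
                        String address = stores.getString(2);   // storeAddress1
                        // Columns 4 and 5 are themselves nested cursors.
                        try (ResultSet staff = (ResultSet) stores.getObject(4)) {
                            while (staff.next()) {
                                String forename = staff.getString(1);
                                // map to the target NoSQL document here
                            }
                        }
                        try (ResultSet products = (ResultSet) stores.getObject(5)) {
                            while (products.next()) {
                                String productName = products.getString(1);
                                // map to the target NoSQL document here
                            }
                        }
                    }
                }
            }
        }
    }
}
```

Each nested ResultSet corresponds to an open server-side cursor until it is closed, which is why the open-cursor count climbs quickly under parallelism.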

This approach works; however, due to the volume of data it's taking a long time (a number of days to complete), and there are a few constraints with it.

The entire process takes two to three days to complete; however, looking at Oracle stats, the time spent executing on Oracle is only approximately 6 hours. (Note: 6 hours for query completion is good for this volume of data.)

So far, in trying to track down where this additional time is spent, I've done/checked the following:

1. First, the NoSQL database has been removed from the equation entirely and the performance remains the same.
2. The Oracle server and the machine on which the Java application runs are both fine in terms of CPU and memory (very little usage of either resource on either machine).
3. I've broken the task up across multiple threads, each working on a separate partition of the table (country in the above example); each thread performs select from Oracle -> map to Java objects -> save to NoSQL. This parallel processing, when done across a large number of threads, reduced the execution time on Oracle but had no real effect on the overall time. (These are separate threads in Java, each with its own connection to Oracle via a connection pool.)
4. I've tried modifying the fetchSize property, but it seems to make very little difference. (This adds another complication: each result row contains three cursors, so when running in parallel across a large number of threads the MAX_OPEN_CURSORS setting on Oracle needs to be increased drastically.)
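The interaction in point 4 can be roughly quantified: each prefetched parent row can hold its nested cursors open, so the worst-case open-cursor count scales with threads × cursors-per-row × fetch size. A small back-of-envelope helper for sizing the open_cursors parameter; the formula is an assumption about worst-case behaviour, not a documented Oracle limit:

```java
public class CursorBudget {
    // Rough worst-case estimate of concurrently open server-side cursors:
    // each thread holds one parent cursor, and every prefetched row may
    // carry `cursorsPerRow` nested cursors that stay open until read/closed.
    // A sizing sketch only, not a documented formula.
    public static long worstCaseOpenCursors(int threads, int cursorsPerRow, int fetchSize) {
        return (long) threads * (1L + (long) cursorsPerRow * fetchSize);
    }
}
```

For example, 10 threads with 3 nested cursors per row and a fetch size of 100 could need on the order of 3,010 open cursors per session budget, which matches the questioner's observation that raising fetchSize forces MAX_OPEN_CURSORS up drastically.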

I can't seem to identify any particular bottleneck, yet resource utilisation is still very low.

As mentioned in the first line, I'm using the JDBI wrapper around JDBC to perform the query and map the results to Java objects; however, if this were the bottleneck, I believe I'd see high usage on the machine running the Java application.

Is there anything I may have overlooked with regard to the above, or any Oracle settings that may be pertinent to investigate further? Or might I be better off moving back to plain SQL queries and performing the transformation in Java?

and Chris said...

Hmmm, converting relational data from Oracle to NoSQL using Java. Sounds painful ;)

Anyway, if the time spent in the database is only 6 hours, yet the process takes days to complete, you need to figure out what the application is doing in those other 50+ hours.

Until you know this you're shooting in the dark. Maybe you'll get lucky and figure out the problem. More likely you'll keep tweaking things which have little or no impact.

Profile the whole application, end-to-end. Products such as Dynatrace can help you with this.

Once you've identified where the time is going, you can start figuring out what to change.
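Even without a full APM product, a coarse wall-clock timer charged to each phase (fetch, map, write) inside the existing worker loop will show where the hours go. A minimal sketch; the phase names and the idea of wrapping each step in a Runnable are assumptions about how the app is structured:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    private final Map<String, Long> totals = new LinkedHashMap<>();

    // Run one unit of work and charge its wall-clock time to a named phase.
    public void time(String phase, Runnable work) {
        long start = System.nanoTime();
        work.run();
        totals.merge(phase, System.nanoTime() - start, Long::sum);
    }

    public long totalNanos(String phase) {
        return totals.getOrDefault(phase, 0L);
    }

    // Report each phase's share of the overall measured time.
    public String report() {
        long grand = totals.values().stream().mapToLong(Long::longValue).sum();
        StringBuilder sb = new StringBuilder();
        totals.forEach((phase, nanos) -> sb.append(
            String.format("%-8s %6.2f%%%n", phase, 100.0 * nanos / Math.max(grand, 1))));
        return sb.toString();
    }
}
```

Used as `timer.time("fetch", () -> rs.next())` and so on, then printing `report()` per thread at the end, this splits the 50+ unaccounted hours across fetch, map and write with a few lines of code.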

If you need further help once you have this analysis, post your findings and we'll see what we can do.

Of course, the easier solution may be to use a replication product to do the conversion, such as GoldenGate:

http://www.oracle.com/us/products/middleware/data-integration/goldengate-for-big-data-ds-2415102.pdf

