A replacement technology in Oracle database for ETL process

Question and Answer

Connor McDonald

Thanks for the question, Mark.

Asked: January 28, 2021 - 8:25 pm UTC

Last updated: January 29, 2021 - 10:04 am UTC

Version: 19.6.0.0.0


You Asked

Hi,

We currently use a third-party tool for the ingestion of files into Oracle database tables. This piece of software is quite expensive and I am investigating the possibility of replicating a small part of its functionality directly within Oracle technologies instead. I appreciate that this question is vague, so I will keep to the high-level features that need to be replicated. If you need more information, I am more than happy to discuss this further.

* The ETL tool has its own scripting language that is dynamically generated for each invocation of the ETL process. Most of the dynamic parts of the script are driven from data values within the Oracle database. Once generated, the script is then executed to perform the work.

* The ability to read text and binary files, including multiple file formats within a single input file.
Dynamic file formats are an example of the previous requirement.

* To be able to perform optional enrichments by looking up a key value.

* Bulk loading of the transformed data directly into an Oracle table partition.

* Multiple executions of the same ETL process may run simultaneously, each working on a different set of files.

I have experimented with the obvious options such as external tables, global temporary tables, SQL, in-memory, and PL/SQL pipelined functions. Whilst I can get the correct output for a simple use case, I cannot compete with the speed of the ETL tool using Oracle technologies. I discounted the use of SQL joins because some of the enrichment lookups are conditional, sometimes based on another value in the data row being processed. There could be multiple lookups per row and tens of millions of rows in a single file. I discounted the use of SQL inside PL/SQL because of the number of context switches this would incur.

I am stuck for ideas right now and would really appreciate your input as to whether there is another technique for implementing this type of ETL process directly in SQL or PL/SQL.

Many thanks in advance for any help and advice you can give.
Mark.

and we said...

I'm sure you can replicate all the logic in the database, but it sounds like you'll have to build a complete framework for processing files. This is a lot of work!

The big challenge is here:

* The ability to read text and binary files, including multiple file formats within a single input file.
Dynamic file formats are an example of the previous requirement.


External tables are great when you know the input file structure. If you're working with a relatively small number of known record structures, it may be feasible to write routines to process each record type.
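For instance, a routine for one known record type might start from an external table definition along these lines; the directory object, file name, and column names here are all hypothetical:

```sql
-- Assumes a directory object the database can read, e.g.:
--   create directory etl_dir as '/data/inbound';
create table orders_ext (
  order_id   number,
  order_date date,
  amount     number
)
organization external (
  type oracle_loader
  default directory etl_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
    missing field values are null
    ( order_id,
      order_date date 'YYYY-MM-DD',
      amount )
  )
  location ('orders.csv')
)
reject limit unlimited;
```

From there the transformed rows can go into the target partition with a single set-based, direct-path insert (e.g. insert /*+ append */ into orders select ... from orders_ext), which also covers the bulk-loading requirement.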

I discounted the use of SQL in PL/SQL because of the number of context switches this will incur.

You're on 19.6, which means you have access to (table) SQL macros! I'm not sure these will help, but they're worth looking into:

https://blogs.oracle.com/datawarehousing/sql-macros-have-arrived-in-autonomous-database
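As a rough illustration of the idea (the function and parameter names below are made up), a table macro returns SQL text that the optimizer splices into the calling query at parse time, so there is no row-by-row PL/SQL execution and no context switching at run time:

```sql
create or replace function first_n_rows (
  t dbms_tf.table_t,
  n number
) return varchar2 sql_macro is
begin
  -- the returned text is substituted into the calling query;
  -- t and first_n_rows.n are resolved against the actual arguments
  return 'select * from t
          fetch first first_n_rows.n rows only';
end first_n_rows;
/

-- usage: behaves like a parameterized view over any table
select * from first_n_rows(employees, 5);
```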

You could also check out polymorphic table functions. These allow you to create routines such as dynamic CSV-to-columns converters:

https://livesql.oracle.com/apex/livesql/file/content_F99JG73Z169WENDTTQFDQ0J09.html
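The shape of such a converter, heavily abridged and with hypothetical names (the LiveSQL link above has a complete, tested version), is roughly: a DESCRIBE function that declares the output columns, and a FETCH_ROWS procedure that splits each input value at run time:

```sql
create or replace package csv_pkg as
  function describe (
    tab       in out dbms_tf.table_t,
    col_names varchar2
  ) return dbms_tf.describe_t;

  procedure fetch_rows (col_names varchar2);
end csv_pkg;
/

create or replace package body csv_pkg as
  function describe (
    tab       in out dbms_tf.table_t,
    col_names varchar2
  ) return dbms_tf.describe_t as
    new_cols dbms_tf.columns_new_t;
  begin
    -- read the raw CSV column; don't pass it through to the output
    tab.column(1).for_read     := true;
    tab.column(1).pass_through := false;

    -- one new output column per name in the comma-separated list
    for i in 1 .. regexp_count(col_names, ',') + 1 loop
      new_cols(i) := dbms_tf.column_metadata_t(
        name => trim(regexp_substr(col_names, '[^,]+', 1, i))
      );
    end loop;

    return dbms_tf.describe_t(new_columns => new_cols);
  end describe;

  procedure fetch_rows (col_names varchar2) as
    in_rows   dbms_tf.row_set_t;
    row_count pls_integer;
    out_col   dbms_tf.tab_varchar2_t;
  begin
    dbms_tf.get_row_set(in_rows, row_count => row_count);

    for c in 1 .. regexp_count(col_names, ',') + 1 loop
      for r in 1 .. row_count loop
        -- c-th comma-separated field of the r-th input value
        out_col(r) :=
          regexp_substr(in_rows(1).tab_varchar2(r), '[^,]+', 1, c);
      end loop;
      dbms_tf.put_col(c, out_col);
    end loop;
  end fetch_rows;
end csv_pkg;
/

create or replace function csv_to_columns (
  tab table, col_names varchar2
) return table pipelined row polymorphic using csv_pkg;
/
```

You would then query it as, say, select * from csv_to_columns(csv_data, 'c1,c2,c3'), assuming csv_data has a single varchar2 column of raw lines.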

But as I say, this will (probably) only be worth it if you're dealing with a few record types. If you're dealing with a large number of record types, and/or you have to process new record types with little (or no!) warning, you'll want to build a complete dynamic framework.

Remember that your time costs the company money too! In my experience, many companies avoid buying software because it's "too expensive" and build it themselves... only to spend significantly more in salaries for developers, testers, etc. to get the same result.


Comments

SQL Macros in 19.6 ?!?!?!

Rajeshwaran, Jeyabal, January 29, 2021 - 9:44 am UTC

....
You're on 19.6, which means you have access to (table) SQL macros! I'm not sure these will help, but they're worth looking into:

https://blogs.oracle.com/datawarehousing/sql-macros-have-arrived-in-autonomous-database
....


But the docs say this:

....
Starting with Oracle Database release 19c, version 19.7, SQL table macros are supported. SQL table macros are expressions, typically used in a FROM clause, to act as a sort of polymorphic (parameterized) views.
....


https://docs.oracle.com/en/database/oracle/oracle-database/19/lnpls/release-changes.html#GUID-0A638FCA-89C2-44E0-A5D5-0D09800D920D
Chris Saxon
January 29, 2021 - 10:04 am UTC

The docs are incorrect; SQL macros were added in 19.6.

Confirmed

Graht, January 29, 2021 - 9:43 pm UTC

I can confirm that you can do it, because the company I work for uses Oracle for its ETL process.

But it is a ton of work. We hired an outside contractor with *exceptional* Oracle PL/SQL Development skills and it took him about 4 months to write all of the required packages. And we've had to tweak it several times over the years.
