Question and Answer

Chris Saxon

Thanks for the question, Sam.

Asked: May 29, 2020 - 8:44 pm UTC

Last updated: June 03, 2020 - 5:35 am UTC

Version: 12c

Viewed 1000+ times

You Asked

Hi There,

Would like some thoughts on the following (working on 12c).

We get flat files from various sources, and each of these might come in with a different delimiter ('|', ',', tab).

Is there a way to create an automated process to do the following:
- keep track of recent file drops in the directory
- fire off the process which would
* Find the delimiter in the file
* Find the number of columns in the file
* Parse through the file and identify columns enclosed in specific characters such as double quotes (for OPTIONALLY ENCLOSED BY ...)
* Dynamically create the External Table using the above (could use VARCHAR2(4000) for all the columns)
* Create a staging table from the External Table (CTAS)
* Create a log of all the processed files

Once the data has been moved into the Staging Tables we can review/process the same as per requirement.
I am not sure if this is doable purely in an Oracle PL/SQL environment, and I am also relatively new to Unix scripting.

Please do share some examples/code if you can (Unix or PL/SQL).
Thanks!

and Chris said...

The scheduler has a file watcher object which can monitor a location and start a job when a new file arrives. So that handles the job kick-off:

https://oracle-base.com/articles/11g/scheduler-enhancements-11gr2#file_watcher
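A minimal sketch of wiring this up with DBMS_SCHEDULER follows; the credential, path, and procedure names (file_cred, /u01/incoming, load_pkg.load_file) are hypothetical placeholders:

  begin
    -- watch the landing directory for new text files
    dbms_scheduler.create_file_watcher (
      file_watcher_name => 'incoming_files_fw',
      directory_path    => '/u01/incoming',
      file_name         => '*.txt',
      credential_name   => 'file_cred'          -- OS credential; hypothetical
    );

    dbms_scheduler.create_program (
      program_name        => 'load_file_prog',
      program_type        => 'stored_procedure',
      program_action      => 'load_pkg.load_file',  -- hypothetical loader proc
      number_of_arguments => 1,
      enabled             => false
    );

    -- pass the file-watch result (file name, size, etc.) as argument 1
    dbms_scheduler.define_metadata_argument (
      program_name       => 'load_file_prog',
      metadata_attribute => 'event_message',
      argument_position  => 1
    );
    dbms_scheduler.enable ('load_file_prog');

    -- job fires whenever the watcher raises an event
    dbms_scheduler.create_job (
      job_name     => 'load_file_job',
      program_name => 'load_file_prog',
      queue_spec   => 'incoming_files_fw',
      enabled      => true
    );
  end;
  /

The loader procedure's parameter should be of type SYS.SCHEDULER_FILEWATCHER_RESULT so it can read the arriving file's name.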

* Find the delimiter in the file

This is a HARD problem!

What if a file contains two or more possible delimiters? How do you choose the correct one? How will you resolve issues if the automated process chooses the wrong delimiter? How do you even know it was wrong?

Building a generic "get any file and deal with it" loader is a lot of work.

So the question is do you really receive files which can be in any format? Or do you have to deal with several different (known) formats?

If you're working with files that have (many) different formats, but you know what they all are, I prefer to create a table and write a package to handle each type. This makes it easy to do file validation (fields have the correct data type, it's not corrupted, etc.) and report corrupt files.
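One way to keep those known formats manageable is a small control table that the dispatcher reads before routing each file to its handler. A sketch, with illustrative names:

  create table file_format_config (
    format_name   varchar2(30) primary key,
    file_pattern  varchar2(100),  -- e.g. 'sales_*.txt', matched against the file name
    field_delim   varchar2(5),    -- '|', ',' or chr(9) for tab
    enclosed_by   varchar2(1),    -- e.g. '"'; null if fields are not enclosed
    column_count  number,
    loader_proc   varchar2(61)    -- package.procedure that handles this format
  );

The dispatching job matches the incoming file name against file_pattern and calls the registered procedure, so adding a new feed is a config row plus a handler rather than new plumbing.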

If you really must write a "get any file and deal with it" I recommend upgrading to Oracle Database 18c. This has a couple of features which make this a lot easier:

* Polymorphic table functions (PTFs)
* Inline external tables

With PTFs you can write a delimiter-to-columns converter. For example, this CSV to columns converter I wrote:

https://livesql.oracle.com/apex/livesql/file/content_F99JG73Z169WENDTTQFDQ0J09.html

Inline external tables are "tables" that only exist within your query. These use the "table" name EXTERNAL, with the column definitions and external table clauses placed inside it.

For example, this reads the first two columns from test.txt in the directory TMP:

  select * from external ( (
      c1 varchar2(10), c2 varchar2(10)
    )     
    default directory tmp 
    location ('test.txt') 
  );


These save you having to create and drop a table/file.
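You can also spell out the delimiter and optional enclosure in the access parameters. For example, for a pipe-delimited file with double-quoted fields (same directory and file as above):

  select * from external ( (
      c1 varchar2(4000), c2 varchar2(4000)
    )
    type oracle_loader
    default directory tmp
    access parameters (
      records delimited by newline
      fields terminated by '|' optionally enclosed by '"'
      missing field values are null
    )
    location ('test.txt')
    reject limit unlimited
  );

MISSING FIELD VALUES ARE NULL lets rows with fewer fields than columns load with trailing NULLs instead of being rejected.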

* Create a log of all the processed files

What exactly should go in this log? External tables can create log files with details about the file read. You can create a table storing file names, rows processed, etc. It really depends on what you want to capture.
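For example, a simple log table the load procedure inserts into after each file (names are illustrative; identity columns need 12c or later, which you have):

  create table file_load_log (
    log_id         number generated always as identity primary key,
    file_name      varchar2(255),
    loaded_at      timestamp default systimestamp,
    rows_loaded    number,
    rows_rejected  number,
    status         varchar2(20)   -- e.g. SUCCESS / FAILED
  );

  -- at the end of each load:
  insert into file_load_log ( file_name, rows_loaded, rows_rejected, status )
  values ( :file_name, :rows_loaded, :rows_rejected, 'SUCCESS' );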


Comments

the APEX_DATA_PARSER package

Stew Ashton, June 01, 2020 - 3:53 pm UTC

If I may, the APEX_DATA_PARSER package appears to do almost everything you want. It is available if you install version 19.1 or later of APEX. I would highly recommend installing the latest APEX version, which has the most up-to-date version of the package; also install the patch sets if you have support.

Here is an explanation of the package:

https://blogs.oracle.com/apex/super-easy-csv-xlsx-json-or-xml-parsing-about-the-apex_data_parser-package

I don't know if the performance or scalability of this package would meet your requirements, but I think it's the best thing available from Oracle for a generic requirement. Note that it doesn't really use external tables.
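For reference, a typical call looks like the following, assuming the file has already been loaded into a BLOB bind variable (e.g. read from a directory with DBMS_LOB, or staged in a table):

  select line_number, col001, col002, col003
  from   table (
           apex_data_parser.parse (
             p_content   => :file_blob,   -- BLOB holding the file contents
             p_file_name => 'data.csv'    -- the extension drives format detection
           )
         );

The parser returns generic columns (COL001, COL002, ...), so the same query shape works whatever the file's column count, and the delimiter is detected for you.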

Best regards,
Stew Ashton

Sam, June 02, 2020 - 1:49 am UTC

Thanks for your reply Chris.
I will look into the file watcher objects. As for the delimiters, maybe we can try to educate our counterparts to provide a standard delimiter. This has been a bit of a challenge for us.
As for the columns being submitted, is there a way to build a generalized solution which could accept all the columns submitted? Thinking in terms of automation once again: let's say we receive 20 columns in 1 file and the same changes to 23 the following day. Is there an easy way to handle this? I was initially thinking of having, say, 500 columns in the external table (Col1 .. Col500) and then loading only the first 20 columns as supplied in the file. But external tables don't seem to work this way.
I will look into the Apex Parser option as well. Thanks Stew! Would this work with 12c?
Thanks!

APEX support

Stew Ashton, June 02, 2020 - 5:24 am UTC

From https://www.oracle.com/tools/downloads/apex-downloads.html

"Oracle APEX 20.1 is fully supported through Oracle Support Services on all Editions (EE, SE2, SE, and SE1) of the Oracle database, 11.2.0.4 or higher with a valid Oracle Database Technical Support agreement."

APEX_DATA_PARSER

Sam, June 02, 2020 - 10:32 pm UTC

Hi Stew,

Did some further reading on this utility; thanks for sharing this piece of information. We might be able to use it.

Although I am still exploring it, do you know if we can schedule some jobs to auto load files from a specific directory location on an hourly basis?

Thanks again!
Connor McDonald
June 03, 2020 - 5:34 am UTC

Chris has mentioned the file watcher, but if you are not using that, the standard scheduler can be used. You can also easily use an external table to get a listing of files in a directory if you need to check for existence.

An example of that here

http://www.oracle-developer.net/display.php?id=513
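The approach in that article uses an external table whose PREPROCESSOR clause runs a shell script that lists the directory. A condensed sketch, with illustrative names (the script list_files.sh lives in the TMP directory and must be executable):

  -- list_files.sh:
  --   #!/bin/sh
  --   /bin/ls /u01/incoming

  create table dir_listing (
    file_name varchar2(255)
  )
  organization external (
    type oracle_loader
    default directory tmp
    access parameters (
      records delimited by newline
      preprocessor tmp:'list_files.sh'
      fields terminated by whitespace
    )
    location ('list_files.sh')
  )
  reject limit unlimited;

The LOCATION file is never read as data; it only has to exist, because the preprocessor's output replaces it. Querying dir_listing then returns the current directory contents, which a scheduled job can compare against the processed-files log.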

Sam, June 02, 2020 - 10:38 pm UTC

This seems to be a front-end, GUI-based application. Hence, just wondering if we can schedule some of the SQL Workshop -> Data Workshop jobs to run on a continuous basis.

Thanks!
Connor McDonald
June 03, 2020 - 5:35 am UTC

APEX is a GUI but if you install it on *your* database, then you also get all the database backend facilities it offers (including APEX_DATA_PARSER).

Click the download link from apex.oracle.com to get the installation package.

A reader, June 02, 2020 - 11:09 pm UTC

Sorry for yet another post (haven't figured out how to edit/remove my feedback as of yet :( ).

I signed up for a free workspace on apex.oracle.com and tried the SQL Workshop option. I can't seem to find the APEX_DATA_PARSER utility under SQL Workshop or App Builder. The production version is 20.1.

Please advise if you have any thoughts...
Connor McDonald
June 03, 2020 - 5:35 am UTC

See my previous reply. You need to install it on *your* database

Sam, June 03, 2020 - 6:03 pm UTC

Thanks for your feedback. Will get the utility installed on my instance.
