Skip to Main Content
  • Questions
  • Uncompressing externally zipped documents

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Martin.

Asked: January 22, 2019 - 10:30 am UTC

Last updated: January 24, 2019 - 5:00 pm UTC

Version: 11g2

Viewed 1000+ times

You Asked

Good morning/afternoon guys.

I am loading XML files from Windows into the database as BLOBs via an Apex front-end, however the performance leaves much to be desired (unsurprising, due to the 1Gb file sizes).

I have a cunning plan to first zip the files (with Windows 7zip) down to a far more manageable 16K-ish and then uncompress them with UTL_COMPRESS once loaded.

I'm pretty ignorant when it comes to anything to do with compression, so I had a good hunt around the docs. & forums, but cannot find anything that indicates this may be possible.


Could you tell me if/how it is possible to do this. Is UTL_COMPRESS even compatible with such external compression algorithms? Note, any solution must be entirely Windows-based as the users need to manually do this compression from their PCs. If 7zip is a no-go, I'm perfectly prepared to swap to another.

(Sadly, I'm also still stuck on 11g2).

and Chris said...

This sounds like a job for the preprocessor option of external tables!

Using this to read zipped files is a key use-case.

For example, if you have the following file:

[oracle@localhost tmp]$ cat test.txt
1,xxxxxxx,yyyyyyy
2,xxxxxxx,yyyyyyy
3,xxxxxxx,yyyyyyy
4,xxxxxxx,yyyyyyy
5,xxxxxxx,yyyyyyy
6,xxxxxxx,yyyyyyy
[oracle@localhost tmp]$ gzip test.txt 
[oracle@localhost tmp]$ ls test.txt*
test.txt.gz


You then read it like so:

create or replace directory exec_dir as '/bin';
grant execute on directory exec_dir to chris;

conn chris/chris

create table t (
  c1 int, 
  c2 varchar2(20), 
  c3 varchar2(20)
) organization external (
  default directory tmp
  access parameters ( 
    records delimited by newline
    disable_directory_link_check
    preprocessor exec_dir:'zcat'
    fields terminated by ',' (
      c1, c2, c3
    )
  )
  location ( 'test.txt.gz' )
);

select * from t;

C1   C2        C3        
   1 xxxxxxx   yyyyyyy   
   2 xxxxxxx   yyyyyyy   
   3 xxxxxxx   yyyyyyy   
   4 xxxxxxx   yyyyyyy   
   5 xxxxxxx   yyyyyyy   
   6 xxxxxxx   yyyyyyy 


Note this allows you to do some dangerous stuff!

To limit this, create a folder for the scripts you'll call in the preprocessor. Ensure you tightly control access to this folder!

If you want a Windows example, you can find one at:

https://blogs.oracle.com/oraclemagazine/preprocess-external-tables

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Martin Rose, January 22, 2019 - 3:48 pm UTC

Yes, but I can't use External Tables to reference a file on a Windows client, if the database resides under Unix.

Doesn't the DIR of the file being loaded have to be on the same machine as the server?
Chris Saxon
January 22, 2019 - 5:34 pm UTC

Yes. The file location must be accessible from the database server.

Martin Rose, January 24, 2019 - 4:46 pm UTC

For those interested, it has proven possible to use 7zip after all.

I created a one line Windows .BAT file as follows: "c:\program files\7-zip\7z" a -tGZip -mx1 %1.gzip %1

[-tGZip chooses the LZ method of compression (which is compatible with Oracle's UTL_COMPRESS)].

I then drag onto it the file (1Gb) I am compressing. That compressed file is then loaded into Apex as a BLOB using the usual 'File Browse' method, before de-compressing it at 'Submit Time' processing via a call to a PL/SQL package.
SELECT UTL_COMPRESS.LZ_UNCOMPRESS(BLOB_CONTENT)   AS UNCOMPRESSED_FILE
FROM   APEX_APPLICATION_TEMP_FILES
WHERE  NAME = (filename);

Upload times (incl. the compression/de-compression) are more than halved & temporary space is barely used (the file size dropping to 1/25th what it used to be), proving it to be a very worthwhile technique for large files.
Chris Saxon
January 24, 2019 - 5:00 pm UTC

Thanks for sharing!

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.