Question and Answer

Connor McDonald

Thanks for the question, Dusan.

Asked: December 02, 2021 - 9:41 am UTC

Last updated: December 03, 2021 - 4:31 am UTC

Version: 19c


You Asked

Dear Asktom team,

We are processing external tables, where it is agreed that the incoming files are in character set EE8MSWIN1250.
Nevertheless, it can happen that an incoming file arrives in a different character set. The file is still processed, but the characters come out scrambled.
That leads to unwanted results.
Is it possible to check the character set of the file from PL/SQL before processing it?


Thanks,

Dusan

and Connor said...

Not really, because a file might not contain any header information that indicates its encoding. There are tools out there that will "guess" at the encoding, which might help, e.g.

[oracle@db192 ~]$ file -i emp.csv
emp.csv: text/plain; charset=us-ascii


You could take advantage of this by using *another* external table to query this result, and then decide whether you need to modify the true external table, e.g.

create table file_check
( ftype varchar2(64)
)
organization external
( type oracle_loader
  default directory temp
  access parameters
  ( records delimited by newline
    preprocessor bin:'run_file_command.sh'
  )
  location ( temp:'dummy.txt' )
)
reject limit unlimited;


where "run_file_command.sh" is something like:

#!/bin/bash
file -i emp.csv | sed 's/.*charset=//g'
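The sed step simply strips everything up to and including `charset=`, so the external table sees a single value per line. As a quick sanity check (feeding in the sample `file -i` output line shown earlier, without needing the `file` utility itself):

```shell
#!/bin/sh
# Pipe a sample 'file -i' output line through the same sed filter
# to show what the FILE_CHECK.FTYPE column would contain.
echo 'emp.csv: text/plain; charset=us-ascii' | sed 's/.*charset=//g'
# prints: us-ascii
```

From there, a PL/SQL routine could select FTYPE from FILE_CHECK and compare it against the expected character set before processing the real external table.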


