Skip to Main Content
  • Questions
  • oracle utl_file encoding from utf8; missing character

Breadcrumb

Question and Answer

Sergiusz Wolicki

Thanks for the question, Zoltan.

Asked: July 03, 2018 - 8:04 am UTC

Last updated: August 18, 2018 - 4:08 pm UTC

Version: 12.1.0.2.0

Viewed 10K+ times! This question is

You Asked

I like to export large amount of text data from db to file.
The characterset in db is UTF8.
The excepted result in the file is ISO8859P2 or MSWIN1250.

My db settings:

    SELECT * FROM v$nls_parameters;
    1 NLS_LANGUAGE HUNGARIAN 0
    2 NLS_TERRITORY HUNGARY 0
    9 NLS_CHARACTERSET UTF8 0
    10 NLS_SORT HUNGARIAN 0
    16 NLS_NCHAR_CHARACTERSET UTF8 0
    17 NLS_COMP BINARY 0
    18 NLS_LENGTH_SEMANTICS CHAR 0
    19 NLS_NCHAR_CONV_EXCP FALSE 0

    select * from nls_database_parameters;
    1 NLS_RDBMS_VERSION 12.1.0.2.0
    2 NLS_NCHAR_CONV_EXCP FALSE
    15 NLS_NCHAR_CHARACTERSET UTF8
    16 NLS_CHARACTERSET UTF8
    19 NLS_TERRITORY AMERICA
    20 NLS_LANGUAGE AMERICAN

    select * from nls_session_parameters;
    1 NLS_LANGUAGE HUNGARIAN
    2 NLS_TERRITORY HUNGARY
    9 NLS_SORT HUNGARIAN
    15 NLS_COMP BINARY
    16 NLS_LENGTH_SEMANTICS CHAR
    17 NLS_NCHAR_CONV_EXCP FALSE


The file created at server directory (linux). I haven't more information for linux characterset settings.

The PLSQL code:

    DECLARE
       v_fh     UTL_FILE.FILE_TYPE;
       v_eol    VARCHAR2(2);
       v_eollen PLS_INTEGER;
       CURSOR cur_sql IS
          
    SELECT T3.ID_RESULT
          ,T3.column1 
    FROM   table1
    WHERE  id_result = 999999
      ;
       "ID_RESULT" DBMS_SQL.NUMBER_TABLE;
       "column1" DBMS_SQL.VARCHAR2A;
    
    BEGIN
       EXECUTE IMMEDIATE 'ALTER SESSION SET NLS_DATE_FORMAT = ''YYYY.MM.DD HH24:MI:SS''';
       v_eol := CHR(13)||CHR(10);
       v_eollen := LENGTH(v_eol);
       v_fh := UTL_FILE.FOPEN('REP_DIR','result_test.csv','W', 32000);
    
       OPEN cur_sql;
       LOOP
          FETCH cur_sql
          BULK COLLECT INTO "ID_RESULT",
                            "column1",
                            LIMIT 1000;
          IF "ID_RESULT".COUNT > 0 THEN
             FOR i IN "ID_RESULT".FIRST .. "ID_RESULT".LAST LOOP
    
               UTL_FILE.PUT(v_fh, CONVERT("column1"(i),'EE8ISO8859P2','UTF8'));
               UTL_FILE.PUT_nchar(v_fh, v_eol);
    
               UTL_FILE.PUT(v_fh, CONVERT("column1"(i),'EE8MSWIN1250','UTF8'));
               UTL_FILE.PUT(v_fh, v_eol);
    
               UTL_FILE.PUT(v_fh, CONVERT("column1"(i),'EE8ISO8859P2'));
               UTL_FILE.PUT(v_fh, v_eol);
    
               UTL_FILE.PUT(v_fh, CONVERT("column1"(i),'EE8MSWIN1250'));
               UTL_FILE.PUT(v_fh, v_eol);
          
               UTL_FILE.PUT(v_fh, "column1"(i));
               UTL_FILE.PUT(v_fh, v_eol);             
    
               UTL_FILE.PUT(v_fh, utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw("column1"(i) ),'HUNGARIAN_HUNGARY.EE8MSWIN1250', 'ENGLISH_UNITED KINGDOM.UTF8')));
               UTL_FILE.PUT(v_fh, v_eol);               
    
               UTL_FILE.fflush(v_fh);
             END LOOP;
          END IF;
          EXIT WHEN cur_sql%NOTFOUND;
       END LOOP;
       CLOSE cur_sql;
       UTL_FILE.FCLOSE(v_fh);
    EXCEPTION
       WHEN 
            .........
          RAISE;
    END;


The original value in db (hexa):
4373 6572 6520 4C61 6A6F 736E E9
Cs er e La jo sn é

The result in Notepad++ (encode in UTF8):

Csere Lajosn

Csere Lajosn

Csere Lajosn

Csere Lajosn

Csere Lajosné

Csere Lajosn

The result in Notepad++ (encode in ANSI, char set: windows-1250):

Csere Lajosn

Csere Lajosn

Csere Lajosn

Csere Lajosn

Csere Lajosné

Csere Lajosn

The result in Notepad++ (encode in ANSI, char set: iso-8859-2):

Csere Lajosn

Csere Lajosn

Csere Lajosn

Csere Lajosn

Csere LajosnĂŠ

Csere Lajosn


When i use CONVERT i lost "é" character.
How can i do convert the string from UTF8 to ANSI?

Thanks,
Zoltan

and we said...



You should never use the CONVERT function for anything other than specific character set repair tasks when advised by Oracle Support.

To generate a file in an encoding other than the database character set, open the file in write byte mode ('wb'). Then, use UTL_I18N.STRING_TO_RAW to convert your text into the desired character set making it at the same time to a RAW value. Then, write the RAW value with UTL_FILE.PUT_RAW.

I recommend defining a local procedure for doing the conversion and writing, to simplify coding of multiple write requests.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database