I have a table with 100 million rows. Table size is around 25 GB in the DB.
I need to unload this data to a csv file.
What is the fastest way to do this?
I was reading about using external tables for this via CTAS, but I don't think there is an option to create-table-as-select directly into CSV format, i.e. a comma-separated ASCII file. It only gives the option of a binary Oracle dump (.dmp) file.
Is there a way to do an external table CTAS with the output on the server as a CSV/ASCII file? That would be ideal, I think.
SQL*Loader is the native industry king for loading millions of rows into an Oracle database, but why is there nothing equivalent for the other direction?
I tried using SQL Developer's export to CSV and it is taking ages to complete due to the 100 million rows. It takes around 2 hours to export one million rows to CSV, so you can imagine this would be a 200-hour job using SQL Developer.
Toad just crashes on this attempt.
I tried SQL*Plus spool and that also takes a long time, as it has to scan through the entire 100 million row dataset.
I have not tried the old utl_file approach we used in the olden days. I don't know what the 12c enhancements for this are, or whether utl_file is even supported in this new cloudy world. But anything sequential is going to take a long time.
Spark and Cassandra can parallelize and do this in a few minutes, but that's a different architecture. My data resides in Oracle.
Is there a way to parallelize this and unload it in chunks of a million rows each? Then maybe I can merge the 100 files in unix.
Please suggest?
You're right, you can't do a CTAS export to CSV using an external table.
Here are a couple of things you could try instead. I'll assume you have a table like this:
create table t as
select rownum x, sysdate y from dual
connect by level <= 1000;
SQLcl parallel spool

You can take advantage of the CSV sqlformat in SQLcl to pump your data out in that format. And use a parallel hint to have many processes running your query. Then you just need to spool the output:
set term off
set feed off
set sqlformat csv
spool out.csv
select /*+ parallel */ * from t;
spool off
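If the spool is still slow, it may also be worth bumping the fetch size before running the query; arraysize is a standard SQL*Plus/SQLcl setting that controls how many rows come back per round trip, so something like this is worth trying:

set arraysize 5000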
DIY parallelism with UTL_FILE

You can build a PL/SQL routine using utl_file to export a range of data from a table:
create or replace procedure csv_unload (
  start_id t.x%type, end_id t.x%type
) as
  type t_tab is table of t%rowtype;
  rws      t_tab;
  output   utl_file.file_type;
  filename varchar2(100);
begin
  -- fetch every row in this ID range into memory
  select *
  bulk collect into rws
  from   t
  where  x between start_id and end_id;

  filename := 't_' || start_id || '_' || end_id || '.csv';
  execute immediate 'alter session set nls_date_format=''YYYY/MM/DD''';

  -- TMP is a directory object the owner can write to
  output := utl_file.fopen ( 'TMP', filename, 'w' );

  utl_file.put_line ( output, 'X,Y' );  -- header row
  for i in 1 .. rws.count loop
    utl_file.put_line ( output, rws(i).x || ',' || rws(i).y );
  end loop;

  utl_file.fclose ( output );
end csv_unload;
/
As you're exporting millions of rows you'll probably want to change the bulk collect to use explicit cursors with a limit. Or you may run out of PGA! ;)
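For example, a minimal sketch of that limit-based variant (reusing the same table t and TMP directory as above; csv_unload_limit is just a hypothetical name) might look like this:

create or replace procedure csv_unload_limit (
  start_id t.x%type, end_id t.x%type
) as
  cursor c is
    select * from t
    where  x between start_id and end_id;
  type t_tab is table of t%rowtype;
  rws    t_tab;
  output utl_file.file_type;
begin
  output := utl_file.fopen ( 'TMP', 't_' || start_id || '_' || end_id || '.csv', 'w' );
  utl_file.put_line ( output, 'X,Y' );
  open c;
  loop
    -- fetch at most 10,000 rows at a time so PGA use stays bounded
    fetch c bulk collect into rws limit 10000;
    for i in 1 .. rws.count loop
      utl_file.put_line ( output, rws(i).x || ',' || rws(i).y );
    end loop;
    exit when c%notfound;
  end loop;
  close c;
  utl_file.fclose ( output );
end csv_unload_limit;
/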
Then call this using dbms_parallel_execute. This will submit N jobs producing N files you can merge together:
declare
  task  varchar2(30) := 'parallel_unload';
  plsql varchar2(1000);
begin
  dbms_parallel_execute.create_task ( task_name => task );

  -- split the table into 10 roughly equal ID ranges
  dbms_parallel_execute.create_chunks_by_sql (
    task_name => task,
    sql_stmt  => 'with grps as (
      select ntile(10) over (order by x) grp, t.*
      from   t
    )
    select min(x) start_id, max(x) end_id from grps
    group  by grp',
    by_rowid  => false
  );

  -- each chunk runs csv_unload with its own start/end IDs
  plsql := q'[begin
    csv_unload ( :start_id, :end_id );
  end;]';

  dbms_parallel_execute.run_task (
    task_name      => task,
    sql_stmt       => plsql,
    language_flag  => dbms_sql.native,
    parallel_level => 10
  );

  dbms_parallel_execute.drop_task ( task_name => task );
end;
/
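If any of the jobs hit errors, you can see which chunks had problems by querying the chunk view before the drop_task call; a quick check along these lines (using the task name above) should show anything that didn't complete cleanly:

select chunk_id, start_id, end_id, status, error_message
from   user_parallel_execute_chunks
where  task_name = 'parallel_unload'
and    status   <> 'PROCESSED';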
HT to Sean Stuber for PL/SQL dbms_parallel_execute examples:
https://seanstuber.wordpress.com/2013/10/17/parallel-plsql-with-dbms_parallel_execute/