How to unload table data to csv file - fastest way for millions of records

Question and Answer

Chris Saxon

Thanks for the question.

Asked: August 15, 2017 - 5:38 am UTC

Answered by: Chris Saxon - Last updated: November 23, 2017 - 12:50 am UTC

Category: Database Development - Version: 12c R1

Viewed 50K+ times!

You Asked

I have a table with 100 million rows. Table size is around 25 GB in the DB.
I need to unload this data to a csv file.
What is the fastest way to do this?

I was reading about doing this with external tables via CTAS, but I don't think there is an option to create-table-as-select directly in CSV format, i.e. an ASCII comma-separated file. It only gives the option of a dmp Oracle Loader binary file.

Is there a way to do an external table CTAS with the output on the server as a CSV (ASCII) file? That would be ideal, I think.

SQL*Loader is the industry king for loading millions of rows into the Oracle database, but why not the other way round?

I tried using SQL Developer's export to csv and it is taking ages to complete due to the 100 million rows. It takes around 2 hours for a one million row csv export, so you can imagine this would be a 200 hour job using SQL Developer.


Toad just crashes on this attempt.

I tried sqlplus spool and that also takes a long time, as it has to scan through the entire 100 million row dataset.

I did not try UTL_FILE, which we used in the olden days. I don't know what 12c enhancements there are for this, or whether UTL_FILE is even supported in this new cloudy world. But anything sequential is going to take a long time.

Spark and Cassandra can parallelize and do this in a few minutes, but that's a different architecture. My data resides in Oracle.

Is there a way to parallelize and do this in chunks of a million rows each? Then maybe I can merge the 100 files in unix.



Please suggest?

and we said...

You're right, you can't do a CTAS export to CSV using an external table.

Here are a couple of things you could try instead. I'll assume you have a table like this:

create table t as 
  select rownum x, sysdate y from dual
  connect by level <= 1000;


SQLcl parallel spool

You can take advantage of SQLcl's CSV sqlformat to pump your data out in that format, and use a parallel hint to have many processes running your query. Then you just need to spool the output:

set term off
set feed off
set sqlformat csv
spool out.csv
select /*+ parallel */ * from t;
spool off
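
Once the spool completes, a quick sanity check is to count the lines in the file from the same session (this assumes a Unix-like wc on the host, as in the followup at the bottom of this page; expect the row count plus one header line):

host wc -l out.csv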


DIY parallelism with UTL_FILE

You can build a PL/SQL routine using utl_file to export a range of data from a table:

create or replace procedure csv_unload ( 
  start_id t.x%type, end_id t.x%type
) as
  type t_tab is table of t%rowtype;
  rws      t_tab;
  output   utl_file.file_type;
  filename varchar2(100);
begin

  -- fetch the requested ID range in one pass
  select * 
  bulk   collect into rws
  from   t
  where  x between start_id and end_id;
  
  filename := 't_' || start_id || '_' || end_id || '.csv';
  
  execute immediate 'alter session set nls_date_format=''YYYY/MM/DD'''; 
  
  -- TMP is a directory object pointing at the target location on the server
  output := utl_file.fopen ( 'TMP', filename, 'w' );
  
  -- header row; put_line (not put) so the data starts on a new line
  utl_file.put_line ( output, 'X,Y' );
  
  for i in 1 .. rws.count loop
    utl_file.put_line ( output, rws(i).x || ',' || rws(i).y );
  end loop;
  
  utl_file.fclose ( output );
  
end csv_unload;
/
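
For this to run, the TMP directory object referenced in utl_file.fopen must exist and you need write access to it. A minimal sketch of the setup and a test call, with a hypothetical path:

-- hypothetical location; point this at a real directory on the database server
create or replace directory tmp as '/tmp';

-- unloads rows with x between 1 and 100 to /tmp/t_1_100.csv
exec csv_unload ( 1, 100 )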


As you're exporting millions of rows, you'll probably want to change the bulk collect to an explicit cursor fetched with a limit, as sketched below. Otherwise you may run out of PGA! ;)
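
Here's a minimal sketch of that pattern against the same demo table t (the limit of 10,000 is an arbitrary batch size; tune it to your PGA budget):

declare
  cursor c is select * from t;
  type t_tab is table of t%rowtype;
  rws t_tab;
begin
  open c;
  loop
    fetch c bulk collect into rws limit 10000; -- fetch at most 10,000 rows per batch
    exit when rws.count = 0;
    for i in 1 .. rws.count loop
      null; -- write rws(i).x || ',' || rws(i).y with utl_file.put_line, as above
    end loop;
  end loop;
  close c;
end;
/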

Then call this using dbms_parallel_execute. This will submit N jobs producing N files you can merge together:

declare
  task  varchar2(30) := 'parallel_unload';
  plsql varchar2(1000);
begin
  dbms_parallel_execute.create_task ( task_name => task );
  
  -- split the table into 10 roughly equal ID ranges
  dbms_parallel_execute.create_chunks_by_sql (
    task_name => task,
    sql_stmt  => 'with grps as (
                    select ntile(10) over (order by x) grp, t.*
                    from   t
                  )
                  select min(x) start_id, max(x) end_id from grps
                  group  by grp',
    by_rowid  => false
  );
   
  plsql := q'[begin 
    csv_unload ( :start_id, :end_id );
  end;]';

  -- run up to 10 chunks at a time, each writing its own CSV file
  dbms_parallel_execute.run_task (
    task_name      => task,
    sql_stmt       => plsql,
    language_flag  => dbms_sql.native,
    parallel_level => 10
  );
  
  dbms_parallel_execute.drop_task ( task_name => task );
end;
/
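
If any chunk fails, its file will be missing or incomplete, so it's worth checking the chunk outcomes before merging. While the task still exists (i.e. before the drop_task call above runs), you can query the data dictionary:

select status, count(*)
from   user_parallel_execute_chunks
where  task_name = 'parallel_unload'
group  by status;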


HT to Sean Stuber for PL/SQL dbms_parallel_execute examples: https://seanstuber.wordpress.com/2013/10/17/parallel-plsql-with-dbms_parallel_execute/

and you rated our response

  (3 ratings)


Reviews

sqlplus settings

August 16, 2017 - 10:34 am UTC

Reviewer: anonymous from finland

Hi

SQL*Plus can write these kinds of files quite fast, but please also take note of the sqlplus settings, such as arraysize, trimspool, trimout, flush etc. They can have a major impact on the efficiency of writing to a file.



Chris Saxon

Followup  

August 16, 2017 - 1:29 pm UTC

Yep, SQLcl has all the same settings as SQL*Plus.
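
For example, a typical combination before spooling might look like this (the arraysize value is illustrative; tune it to your row size):

set arraysize 500   -- rows fetched per round trip
set trimspool on    -- strip trailing blanks from spooled lines
set trimout on      -- strip trailing blanks from screen output
set termout off     -- don't render rows to the screen while spooling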

November 16, 2017 - 8:03 pm UTC

Reviewer: A reader

I was trying to spool over 130,000 rows in SQLcl with sqlformat csv. How come it only sent about 11 rows to the spool file and stopped?
Chris Saxon

Followup  

November 17, 2017 - 4:54 pm UTC

What exactly was your script? Are you sure the connection between SQLcl and the database wasn't severed for some reason?

continue

November 20, 2017 - 6:19 pm UTC

Reviewer: A reader

I just logged in with sql (SQLcl) and set the following, then copied and pasted the SQL into SQLcl.

set term off
set feed off
set sqlformat csv
spool out.csv
select * from 130000rows_table


Connor McDonald

Followup  

November 23, 2017 - 12:50 am UTC

Sorry - I can't reproduce

SQL> create table t130000rows_table as select d.* from dba_objects d, ( select 1 from dual connect by level < 10 )
  2  where rownum <= 130000;

Table T130000ROWS_TABLE created.

SQL> host cat c:\temp\unload.sql
set term off
set feed off
set sqlformat csv
spool out.csv
select * from t130000rows_table;
spool off

SQL> @c:\temp\unload.sql
SQL> host wc -l out.csv
 130001 out.csv


