I have a table with 100 million rows. Table size is around 25 GB in the DB.
I need to unload this data to a csv file.
What is the fastest way to do this?
I was reading about using external tables for this via CTAS, but I don't think there is an option to create-table-as-select directly into CSV format, i.e. a comma-separated ASCII file. It only gives the option of a binary Oracle dump (.dmp) file.
Is there a way to do an external table CTAS with the output on the server as a CSV/ASCII file? That would be ideal, I think.
SQL*Loader is the native industry king for loading millions of rows into an Oracle database, but why is there nothing equivalent for the other direction?
I tried using SQL Developer's export to CSV and it is taking ages to complete due to the 100 million rows. It takes around 2 hours to export one million rows to CSV, so you can imagine this would be a 200-hour job using SQL Developer.
Toad just crashes on this attempt.
I tried SQL*Plus spool and that also takes a long time, as it has to scan through the entire 100 million row dataset.
I have not tried the old utl_file approach we used in the olden days. I don't know what the 12c enhancements for this are, or whether utl_file is even supported in this new cloudy world. But anything sequential is going to take a long time.
Spark and Cassandra can parallelize and do this in a few minutes, but that's a different architecture. My data resides in Oracle.
Is there a way to parallelize this and unload it in chunks of a million rows each? Then maybe I can merge the 100 files in unix.
Please suggest?
You're right, you can't do a CTAS export to CSV using an external table.
Here are a couple of things you could try instead. I'll assume you have a table like this:
create table t as
select rownum x, sysdate y from dual
connect by level <= 1000;
SQLcl parallel spool

You can take advantage of the CSV sqlformat in SQLcl to pump your data out in that format. And use a parallel hint to have many processes running your query. Then you just need to spool the output:
set term off
set feed off
set sqlformat csv
spool out.csv
select /*+ parallel */ * from t;
spool off
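If the spool is still slow, it may also be worth bumping the fetch size before running the query; arraysize is a standard SQL*Plus/SQLcl setting that controls how many rows come back per round trip, so something like this is worth trying:

set arraysize 5000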
DIY parallelism with UTL_FILE

You can build a PL/SQL routine using utl_file to export a range of data from a table:
create or replace procedure csv_unload (
  start_id t.x%type, end_id t.x%type
) as
  type t_tab is table of t%rowtype;
  rws      t_tab;
  output   utl_file.file_type;
  filename varchar2(100);
begin
  -- fetch every row in this ID range into memory
  select *
  bulk collect into rws
  from   t
  where  x between start_id and end_id;

  filename := 't_' || start_id || '_' || end_id || '.csv';
  execute immediate 'alter session set nls_date_format=''YYYY/MM/DD''';

  -- TMP is a directory object the owner can write to
  output := utl_file.fopen ( 'TMP', filename, 'w' );

  utl_file.put_line ( output, 'X,Y' );  -- header row
  for i in 1 .. rws.count loop
    utl_file.put_line ( output, rws(i).x || ',' || rws(i).y );
  end loop;

  utl_file.fclose ( output );
end csv_unload;
/
As you're exporting millions of rows you'll probably want to change the bulk collect to use explicit cursors with a limit. Or you may run out of PGA! ;)
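For example, a minimal sketch of that limit-based variant (reusing the same table t and TMP directory as above; csv_unload_limit is just a hypothetical name) might look like this:

create or replace procedure csv_unload_limit (
  start_id t.x%type, end_id t.x%type
) as
  cursor c is
    select * from t
    where  x between start_id and end_id;
  type t_tab is table of t%rowtype;
  rws    t_tab;
  output utl_file.file_type;
begin
  output := utl_file.fopen ( 'TMP', 't_' || start_id || '_' || end_id || '.csv', 'w' );
  utl_file.put_line ( output, 'X,Y' );
  open c;
  loop
    -- fetch at most 10,000 rows at a time so PGA use stays bounded
    fetch c bulk collect into rws limit 10000;
    for i in 1 .. rws.count loop
      utl_file.put_line ( output, rws(i).x || ',' || rws(i).y );
    end loop;
    exit when c%notfound;
  end loop;
  close c;
  utl_file.fclose ( output );
end csv_unload_limit;
/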
Then call this using dbms_parallel_execute. This will submit N jobs producing N files you can merge together:
declare
  task  varchar2(30) := 'parallel_unload';
  plsql varchar2(1000);
begin
  dbms_parallel_execute.create_task ( task_name => task );

  -- split the table into 10 roughly equal ID ranges
  dbms_parallel_execute.create_chunks_by_sql (
    task_name => task,
    sql_stmt  => 'with grps as (
      select ntile(10) over (order by x) grp, t.*
      from   t
    )
    select min(x) start_id, max(x) end_id from grps
    group  by grp',
    by_rowid  => false
  );

  -- each chunk runs csv_unload with its own start/end IDs
  plsql := q'[begin
    csv_unload ( :start_id, :end_id );
  end;]';

  dbms_parallel_execute.run_task (
    task_name      => task,
    sql_stmt       => plsql,
    language_flag  => dbms_sql.native,
    parallel_level => 10
  );

  dbms_parallel_execute.drop_task ( task_name => task );
end;
/
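If any of the jobs hit errors, you can see which chunks had problems by querying the chunk view before the drop_task call; a quick check along these lines (using the task name above) should show anything that didn't complete cleanly:

select chunk_id, start_id, end_id, status, error_message
from   user_parallel_execute_chunks
where  task_name = 'parallel_unload'
and    status   <> 'PROCESSED';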
HT to Sean Stuber for PL/SQL dbms_parallel_execute examples:
https://seanstuber.wordpress.com/2013/10/17/parallel-plsql-with-dbms_parallel_execute/