I have a requirement to archive historical data: delete data more than 36 months old from a number of tables, but only a subset of that data - for argument's sake, let's call them quotes given to customers that have not been taken up. We currently have 54 tables from which data will be removed, although this is a dynamic list. As part of the removal we need to back up the data being deleted (but not the complete tables), in case any quote needs to be recovered.
I was unable to work out how to use DBMS_DATAPUMP to export only limited data from a table, say a maximum of 5,000 quotes. So what I have done is create temporary tables that are copies of the definitions of the source tables, copy the data to be deleted into these temporary tables, use DBMS_DATAPUMP to export the temporary tables, and then delete the data and the temporary tables.
Is this a valid approach, and am I missing or losing something by taking it? Is there any way to avoid using temporary tables?
The code to create the temporary tables is:
PROCEDURE build_temporary_tables(p_report_file       IN UTL_FILE.FILE_TYPE,
                                 p_report_filename   IN VARCHAR2,
                                 p_current_timestamp IN TIMESTAMP) IS
  v_ddl_cmd          VARCHAR2(500);  -- CREATE TABLE is DDL, not DML
  v_temp_table_count NUMBER(15);
BEGIN
  -- Create a set of empty temporary tables that are otherwise identical to
  -- the tables identified for data removal
  UTL_FILE.PUT_LINE(p_report_file, 'Build Temporary Tables started');
  v_temp_table_count := 0;
  FOR i IN (SELECT table_name FROM archive_table_list)
  LOOP
    v_ddl_cmd := 'CREATE TABLE ABC.' || i.table_name || '_T' ||
                 ' AS (SELECT * FROM ' || i.table_name || ' WHERE 1 = 2)';
    EXECUTE IMMEDIATE v_ddl_cmd;
    v_temp_table_count := v_temp_table_count + 1;
  END LOOP;
  UTL_FILE.PUT_LINE(p_report_file, ' Number of temp tables created ' ||
                    TRIM(TO_CHAR(v_temp_table_count, '999999999')));
  UTL_FILE.NEW_LINE(p_report_file, 1);
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    RAISE;
END build_temporary_tables;
The temporary tables are then populated with INSERT statements built using dynamic SQL.
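A minimal sketch of that population step. The ARCHIVE_QUOTE_IDS table and the QUOTE_ID column are assumptions for illustration; substitute whatever actually identifies the quotes selected for removal:

```sql
-- Sketch only: assumes a hypothetical ARCHIVE_QUOTE_IDS table holding the
-- (at most 5,000) quote IDs selected for removal, and that every table in
-- ARCHIVE_TABLE_LIST has a QUOTE_ID column.
FOR i IN (SELECT table_name FROM archive_table_list)
LOOP
  EXECUTE IMMEDIATE
    'INSERT INTO ABC.' || i.table_name || '_T' ||
    ' SELECT * FROM ' || i.table_name ||
    ' WHERE quote_id IN (SELECT quote_id FROM archive_quote_ids)';
END LOOP;
COMMIT;
```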
The archiving code looks like this:
PROCEDURE archive_quotes(p_report_file       IN UTL_FILE.FILE_TYPE,
                         p_archive_directory IN VARCHAR2,
                         p_current_timestamp IN TIMESTAMP,
                         p_archive_filename  IN VARCHAR2) IS
  h1           NUMBER;          -- Data Pump job handle
  v_job_state  VARCHAR2(4000);  -- Final state returned by WAIT_FOR_JOB
  v_table_list VARCHAR2(3000);  -- Quoted, comma-separated temp table names
BEGIN
  -- Use DBMS_DATAPUMP to copy all the data to be deleted to a dump file.
  -- A report log is also created.
  SELECT LISTAGG('''' || table_name || '_T' || '''', ',')
           WITHIN GROUP (ORDER BY table_name)
    INTO v_table_list
    FROM archive_table_list;

  -- Create a (user-named) Data Pump job to do a table-mode export.
  h1 := DBMS_DATAPUMP.OPEN(
          operation => 'EXPORT',
          job_mode  => 'TABLE',
          job_name  => 'SYS_EXPORT_ARCHIVE_' ||
                       TRIM(TO_CHAR(archive_job_seq.NEXTVAL, '0000')),
          version   => 'COMPATIBLE');

  -- Specify a dump file and a log file for the job (using the handle just
  -- returned) and a directory object, which must already be defined and
  -- accessible to the user running this procedure.
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h1,
    filename  => p_archive_filename || '.dmp',
    filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE,
    directory => p_archive_directory,
    reusefile => 1);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h1,
    filename  => p_archive_filename || '.log',
    filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE,
    directory => p_archive_directory,
    reusefile => 1);

  -- Metadata filters restrict the export to the temporary tables in our schema.
  DBMS_DATAPUMP.METADATA_FILTER(
    handle => h1,
    name   => 'SCHEMA_EXPR',
    value  => 'IN (''ABC'')');
  DBMS_DATAPUMP.METADATA_FILTER(
    handle      => h1,
    name        => 'NAME_LIST',
    value       => v_table_list,
    object_type => 'TABLE');

  DBMS_DATAPUMP.SET_PARAMETER(handle => h1,
                              name   => 'INCLUDE_METADATA',
                              value  => 1);
  DBMS_DATAPUMP.SET_PARAMETER(handle => h1,
                              name   => 'DATA_ACCESS_METHOD',
                              value  => 'AUTOMATIC');

  -- Start the job. An exception is raised if something is not set up properly.
  DBMS_DATAPUMP.START_JOB(h1);
  DBMS_DATAPUMP.WAIT_FOR_JOB(h1, v_job_state);
  DBMS_DATAPUMP.DETACH(h1);
  DBMS_OUTPUT.PUT_LINE(v_job_state);
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    RAISE;
END archive_quotes;
Instead of DBMS_DATAPUMP, I'd use the Data Pump driver for external tables. With this you can unload the results of a query!
create table t as
select level c1, mod ( level, 13 ) c2, sysdate c3
from dual
connect by level <= 100;
create table ext (
c1, c2, c3
) organization external (
type oracle_datapump
default directory data_pump_dir
location ( 'ext.dmp' )
) as
select * from t
where c2 = 0;
select * from ext;
C1    C2    C3
13    0     04-FEB-2020 14:40:07
26    0     04-FEB-2020 14:40:07
39    0     04-FEB-2020 14:40:07
52    0     04-FEB-2020 14:40:07
65    0     04-FEB-2020 14:40:07
78    0     04-FEB-2020 14:40:07
91    0     04-FEB-2020 14:40:07
This way the data remains accessible should you want to query it, without having to load it into a temporary table first.
You could add the files created to a single external table, so you have one place to query everything. But selecting from an external table reads all the files it points to, so this could be slow.
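For example (a sketch; the file names are illustrative), one external table can list several dump files in its LOCATION clause, and you can add later files with ALTER TABLE:

```sql
-- Sketch: combine several previously created dump files in one external table.
-- File names are illustrative.
create table ext_all (
  c1 number, c2 number, c3 date
) organization external (
  type oracle_datapump
  default directory data_pump_dir
  location ( 'ext_jan.dmp', 'ext_feb.dmp' )
);

-- Add a newly created file to the existing external table
alter table ext_all location ( 'ext_jan.dmp', 'ext_feb.dmp', 'ext_mar.dmp' );
```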
Oracle Database 12.2 addressed this with partitioned external tables. And from 19c you can have hybrid partitioned tables: one table where each partition can be regular row data or files on the database server.
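A sketch of the 19c hybrid approach (table, column, file names and date ranges are all illustrative): archived quotes live in a dump file, current quotes stay as regular rows, and one table covers both:

```sql
-- Sketch: 19c hybrid partitioned table; names and ranges are illustrative.
create table quotes_hybrid (
  quote_id   number,
  quote_date date
)
external partition attributes (
  type oracle_datapump
  default directory data_pump_dir
)
partition by range ( quote_date ) (
  -- archived data held in a dump file on the database server
  partition p_archived values less than ( date '2017-01-01' )
    external location ( 'quotes_2016.dmp' ),
  -- current data stored as regular rows
  partition p_current values less than ( maxvalue )
);
```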
NB - although this uses the Data Pump driver, the files it creates are incompatible with Data Pump Export/Import dump files. So you can't use this to read your existing files :(
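That said, if you do want to stick with DBMS_DATAPUMP and skip the temporary tables, it also has a DATA_FILTER procedure: with the SUBQUERY filter you can restrict the rows exported from a table. A sketch - the predicate and the QUOTE_DATE column are assumptions about your schema:

```sql
-- Sketch: export only the rows matching a predicate for one table.
-- QUOTE_DATE is an assumed column name; adjust the predicate to your schema.
-- Omit table_name to apply the filter to every table in the job.
DBMS_DATAPUMP.DATA_FILTER(
  handle      => h1,
  name        => 'SUBQUERY',
  value       => 'WHERE quote_date < ADD_MONTHS(SYSDATE, -36)',
  table_name  => 'QUOTES',
  schema_name => 'ABC');
```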