I have a requirement to archive historical data: delete data more than 36 months old from a number of tables, but only a subset of that data - for argument's sake, let's call them quotes given to customers that have not been taken up. We currently have 54 tables from which data will be removed, although this is a dynamic list. As part of the removal we need to back up the data being deleted (but not the complete tables), in case any quote needs to be recovered.
I was unable to work out how to use DBMS_DATAPUMP to export only limited data from a table, say a maximum of 5,000 quotes. So what I have done is create temporary tables that are copies of the definitions of the source tables, copy the data to be deleted into these temporary tables, use DBMS_DATAPUMP to export the temporary tables, and then delete the data and the temporary tables.
Is this a valid approach, and am I missing or losing something by taking it? Is there any way to avoid using temporary tables?
The code to create the temporary tables is:
PROCEDURE build_temporary_tables(p_report_file       IN UTL_FILE.FILE_TYPE,
                                 p_report_filename   IN VARCHAR2,
                                 p_current_timestamp IN TIMESTAMP) IS
  v_ddl_cmd          VARCHAR2(500);  -- CREATE TABLE is DDL, not DML
  v_temp_table_count NUMBER(15);
BEGIN
  -- Create a set of empty temporary tables that are otherwise identical to
  -- the tables identified for data removal
  UTL_FILE.PUT_LINE(p_report_file, 'Build Temporary Tables started');
  v_temp_table_count := 0;
  FOR i IN (SELECT table_name FROM archive_table_list)
  LOOP
    v_ddl_cmd := 'CREATE TABLE ABC.' || i.table_name || '_T' ||
                 ' AS (SELECT * FROM ' || i.table_name || ' WHERE 1 = 2)';
    EXECUTE IMMEDIATE v_ddl_cmd;
    v_temp_table_count := v_temp_table_count + 1;
  END LOOP;
  UTL_FILE.PUT_LINE(p_report_file, ' Number of temp tables created ' ||
                    TRIM(TO_CHAR(v_temp_table_count, '999999999')));
  UTL_FILE.NEW_LINE(p_report_file, 1);
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    RAISE;
END build_temporary_tables;
The temporary tables are then populated with INSERT statements built using dynamic SQL.
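A minimal sketch of that population step. The ARCHIVE_QUOTE_IDS table and the QUOTE_ID column are assumptions for illustration; substitute whatever actually identifies the quotes selected for removal:

```sql
-- Sketch only: assumes a hypothetical ARCHIVE_QUOTE_IDS table holding the
-- (at most 5,000) quote IDs selected for removal, and that every table in
-- ARCHIVE_TABLE_LIST has a QUOTE_ID column.
FOR i IN (SELECT table_name FROM archive_table_list)
LOOP
  EXECUTE IMMEDIATE
    'INSERT INTO ABC.' || i.table_name || '_T' ||
    ' SELECT * FROM ' || i.table_name ||
    ' WHERE quote_id IN (SELECT quote_id FROM archive_quote_ids)';
END LOOP;
COMMIT;
```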
The archiving code looks like this:
PROCEDURE archive_quotes(p_report_file       IN UTL_FILE.FILE_TYPE,
                         p_archive_directory IN VARCHAR2,
                         p_current_timestamp IN TIMESTAMP,
                         p_archive_filename  IN VARCHAR2) IS
  h1           NUMBER;          -- Data Pump job handle
  v_job_state  VARCHAR2(4000);  -- Final state returned by WAIT_FOR_JOB
  v_table_list VARCHAR2(3000);  -- Quoted, comma-separated temp table names
BEGIN
  -- Use DBMS_DATAPUMP to copy all the data to be deleted to a dump file.
  -- A report log is also created.
  SELECT LISTAGG('''' || table_name || '_T' || '''', ',')
           WITHIN GROUP (ORDER BY table_name)
    INTO v_table_list
    FROM archive_table_list;

  -- Create a (user-named) Data Pump job to do a table-mode export.
  h1 := DBMS_DATAPUMP.OPEN(
          operation => 'EXPORT',
          job_mode  => 'TABLE',
          job_name  => 'SYS_EXPORT_ARCHIVE_' ||
                       TRIM(TO_CHAR(archive_job_seq.NEXTVAL, '0000')),
          version   => 'COMPATIBLE');

  -- Specify a dump file and a log file for the job (using the handle just
  -- returned) and a directory object, which must already be defined and
  -- accessible to the user running this procedure.
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h1,
    filename  => p_archive_filename || '.dmp',
    filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE,
    directory => p_archive_directory,
    reusefile => 1);
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h1,
    filename  => p_archive_filename || '.log',
    filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE,
    directory => p_archive_directory,
    reusefile => 1);

  -- Metadata filters restrict the export to the temporary tables in our schema.
  DBMS_DATAPUMP.METADATA_FILTER(
    handle => h1,
    name   => 'SCHEMA_EXPR',
    value  => 'IN (''ABC'')');
  DBMS_DATAPUMP.METADATA_FILTER(
    handle      => h1,
    name        => 'NAME_LIST',
    value       => v_table_list,
    object_type => 'TABLE');

  DBMS_DATAPUMP.SET_PARAMETER(handle => h1,
                              name   => 'INCLUDE_METADATA',
                              value  => 1);
  DBMS_DATAPUMP.SET_PARAMETER(handle => h1,
                              name   => 'DATA_ACCESS_METHOD',
                              value  => 'AUTOMATIC');

  -- Start the job. An exception is raised if something is not set up properly.
  DBMS_DATAPUMP.START_JOB(h1);
  DBMS_DATAPUMP.WAIT_FOR_JOB(h1, v_job_state);
  DBMS_DATAPUMP.DETACH(h1);
  DBMS_OUTPUT.PUT_LINE(v_job_state);
EXCEPTION
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    RAISE;
END archive_quotes;
Instead of DBMS_DATAPUMP, I'd use the Data Pump driver for external tables. With this you can unload the results of a query!
create table t as
select level c1, mod ( level, 13 ) c2, sysdate c3
from dual
connect by level <= 100;
create table ext (
c1, c2, c3
) organization external (
type oracle_datapump
default directory data_pump_dir
location ( 'ext.dmp' )
) as
select * from t
where c2 = 0;
select * from ext;
C1    C2    C3
13    0     04-FEB-2020 14:40:07
26    0     04-FEB-2020 14:40:07
39    0     04-FEB-2020 14:40:07
52    0     04-FEB-2020 14:40:07
65    0     04-FEB-2020 14:40:07
78    0     04-FEB-2020 14:40:07
91    0     04-FEB-2020 14:40:07
This way the data remains accessible should you want to query it, without having to load it into a temporary table first.
You could add the files created to a single external table, so you have one place to query everything. But selecting from an external table reads all the files it points to, so this could be slow.
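For example (a sketch; the file names are illustrative), one external table can list several dump files in its LOCATION clause, and you can add later files with ALTER TABLE:

```sql
-- Sketch: combine several previously created dump files in one external table.
-- File names are illustrative.
create table ext_all (
  c1 number, c2 number, c3 date
) organization external (
  type oracle_datapump
  default directory data_pump_dir
  location ( 'ext_jan.dmp', 'ext_feb.dmp' )
);

-- Add a newly created file to the existing external table
alter table ext_all location ( 'ext_jan.dmp', 'ext_feb.dmp', 'ext_mar.dmp' );
```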
Oracle Database 12.2 addressed this with partitioned external tables. And from 19c you can have hybrid partitioned tables: one table where each partition can be regular row data or files on the database server.
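A sketch of the 19c hybrid approach (table, column, file names and date ranges are all illustrative): archived quotes live in a dump file, current quotes stay as regular rows, and one table covers both:

```sql
-- Sketch: 19c hybrid partitioned table; names and ranges are illustrative.
create table quotes_hybrid (
  quote_id   number,
  quote_date date
)
external partition attributes (
  type oracle_datapump
  default directory data_pump_dir
)
partition by range ( quote_date ) (
  -- archived data held in a dump file on the database server
  partition p_archived values less than ( date '2017-01-01' )
    external location ( 'quotes_2016.dmp' ),
  -- current data stored as regular rows
  partition p_current values less than ( maxvalue )
);
```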
NB - although this uses the Data Pump driver, the files it creates are incompatible with Data Pump Export/Import dump files. So you can't use this to read your existing files :(
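That said, if you do want to stick with DBMS_DATAPUMP and skip the temporary tables, it also has a DATA_FILTER procedure: with the SUBQUERY filter you can restrict the rows exported from a table. A sketch - the predicate and the QUOTE_DATE column are assumptions about your schema:

```sql
-- Sketch: export only the rows matching a predicate for one table.
-- QUOTE_DATE is an assumed column name; adjust the predicate to your schema.
-- Omit table_name to apply the filter to every table in the job.
DBMS_DATAPUMP.DATA_FILTER(
  handle      => h1,
  name        => 'SUBQUERY',
  value       => 'WHERE quote_date < ADD_MONTHS(SYSDATE, -36)',
  table_name  => 'QUOTES',
  schema_name => 'ABC');
```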