Hi Tom.
I'm a great believer that a backup you haven't tested isn't a backup, so I'm trying to ensure that our RMAN backups get automatically verified, however I've come across some weird behaviour with rman backup validation.
My aim is to run full verification of our backups on a regular basis using RMAN and send an email to someone if it finds any errors. I've been testing it out and found some strange behaviour which I'd like to understand
I start by creating a full backup of the database using RMAN. We use incrementally updated image copies so I'm using "backup incremental level 1 for recover of copy with tag 'database' database;". Once I've performed the backup
1. If I run "restore database validate" and "restore archivelog all validate" it (unsurprisingly) says all is fine
2. If I hex edit one of the data files and run the same commands, it picks up the corruption and returns an error code to the command prompt
3. If I hex edit one of the archive logs and run the same commands, it picks up the corruption and returns an error code to the command prompt
4. If I delete one of the archive logs and run the same commands, it picks up the missing file and returns an error code to the command prompt
However, if I delete one of the data file copies, RMAN shows an error on screen but then says "datafile X will be created automatically during restore operation" and does not return an error code to the command prompt. See RMAN log below
channel ORA_DISK_1: scanning datafile copy C:\BACKUPS\MYDB\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_UNDOTBS1_LSJCL5G5_.DBF
channel ORA_DISK_1: scanning datafile copy C:\BACKUPS\MYDB\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_TBS1_LSJCDDKS_.DBF
channel ORA_DISK_1: scanning datafile copy C:\BACKUPS\MYDB\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_TBS1_LSJCDVOT_.DBF
channel ORA_DISK_1: scanning datafile copy C:\DBACKUPS\MYDB\C3\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_TBS1_LSJCHCYH_.DBF
ORA-19505: failed to identify file "C:\BACKUPS\MYDB\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_TBS1_LSJCHCYH_.DBF"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
ORA-19600: input file is datafile copy 0 (C:\BACKUPS\MYDB\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_TBS1_LSJCHCYH_.DBF)
channel ORA_DISK_1: scanning datafile copy C:\BACKUPS\MYDB\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_TBS1_LSJCG8VQ_.DBF
channel ORA_DISK_1: scanning datafile copy C:\BACKUPS\MYDB\E812A6F4CE05460DBEABDF29CB7A5D6F\DATAFILE\O1_MF_TBS1_LSJCC7MK_.DBF
failover to previous backup
datafile 13 will be created automatically during restore operation
Finished restore at 05-JAN-24
So my questions are:
1. Why does RMAN not consider a completely missing data file a serious enough error in backup validation to throw an error? The documentation says "The main purpose of RMAN validation is to check for corrupt blocks and missing files. You can also use RMAN to determine whether backups can be restored.". In this case I have a completely missing file which I would have thought was a fairly serious failure in whether backups can be restored and yet RMAN is saying this is fine.
2. Is there a way to make RMAN return an error code if a file is missing so we can use it via a batch file and send an email to the DBAs if any files are missing? As far as I know crosscheck does not return an error code and restore validate doesn't either so I can't see a way this can be scripted without grepping through the logs for "ORA-19505" which isn't elegant.
Reproduction
------------
1. create a database which is in archivelog mode
2. from rman run: "backup incremental level 1 for recover of copy with tag 'daily' database"
3. switch redo logs a few times to ensure we have some archive logs
4. from rman run: "restore database validate" (all good)
5. from rman run: "restore archivelog all validate" (all good)
6. from os: rename one of the archive logs
7. from rman run: "restore archive log all validate" - you should get an error and RMAN returns -1 if you run it in a script
8. from os: delete or rename one of the data file copies
9. from rman run: "restore database validate" - you will get an error on screen but RMAN returns 0 if you run it in a script
Are you sure there is not some other backup available to RMAN (which would mean a restore would indeed be available?)
Because I dont see your observations
C:\Users\connor>rman target /
Recovery Manager: Release 19.0.0.0.0 - Production on Wed Jan 17 14:41:02 2024
Version 19.19.0.0.0
Copyright (c) 1982, 2023, Oracle and/or its affiliates. All rights reserved.
connected to target database: DB19S (DBID=857515531)
RMAN> RUN {
2> recover copy of database with tag 'daily' until time 'sysdate - 1';
3> backup incremental level 1 for recover of copy with tag 'daily' database;
4> }
Starting recover at 17-JAN-24
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=859 device type=DISK
no copy of datafile 1 found to recover
no copy of datafile 3 found to recover
no copy of datafile 4 found to recover
no copy of datafile 5 found to recover
no copy of datafile 7 found to recover
Finished recover at 17-JAN-24
Starting backup at 17-JAN-24
using channel ORA_DISK_1
no parent backup or copy of datafile 1 found
no parent backup or copy of datafile 5 found
no parent backup or copy of datafile 4 found
no parent backup or copy of datafile 3 found
no parent backup or copy of datafile 7 found
channel ORA_DISK_1: starting datafile copy
input datafile file number=00001 name=X:\ORACLE\ORADATA\DB19S\SYSTEM01.DBF
output file name=X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-SYSTEM_FNO-1_012GQNPK tag=DAILY RECID=1 STAMP=1158504256
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00005 name=X:\ORACLE\ORADATA\DB19S\LARGETS.DBF
output file name=X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-LARGETS_FNO-5_022GQNQ3 tag=DAILY RECID=2 STAMP=1158504270
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00004 name=X:\ORACLE\ORADATA\DB19S\UNDOTBS01.DBF
output file name=X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-UNDOTBS1_FNO-4_032GQNQI tag=DAILY RECID=3 STAMP=1158504284
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:15
channel ORA_DISK_1: starting datafile copy
input datafile file number=00003 name=X:\ORACLE\ORADATA\DB19S\SYSAUX01.DBF
output file name=X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-SYSAUX_FNO-3_042GQNR1 tag=DAILY RECID=4 STAMP=1158504295
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
channel ORA_DISK_1: starting datafile copy
input datafile file number=00007 name=X:\ORACLE\ORADATA\DB19S\USERS01.DBF
output file name=X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-USERS_FNO-7_052GQNR8 tag=DAILY RECID=5 STAMP=1158504297
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 17-JAN-24
Starting Control File and SPFILE Autobackup at 17-JAN-24
piece handle=C:\ORACLE\PRODUCT\19\DATABASE\C-857515531-20240117-00 comment=NONE
Finished Control File and SPFILE Autobackup at 17-JAN-24
RMAN> restore database validate;
Starting restore at 17-JAN-24
using channel ORA_DISK_1
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-SYSTEM_FNO-1_012GQNPK
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-SYSAUX_FNO-3_042GQNR1
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-UNDOTBS1_FNO-4_032GQNQI
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-LARGETS_FNO-5_022GQNQ3
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-USERS_FNO-7_052GQNR8
Finished restore at 17-JAN-24
RMAN> restore archivelog all validate;
Starting restore at 17-JAN-24
using channel ORA_DISK_1
channel ORA_DISK_1: scanning archived log D:\ORACLE\ARCH\DB19S_ARC0000000088_1126560222.0001
channel ORA_DISK_1: scanning archived log D:\ORACLE\ARCH\DB19S_ARC0000000089_1126560222.0001
channel ORA_DISK_1: scanning archived log D:\ORACLE\ARCH\DB19S_ARC0000000090_1126560222.0001
Finished restore at 17-JAN-24
RMAN> restore database validate;
Starting restore at 17-JAN-24
using channel ORA_DISK_1
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-SYSTEM_FNO-1_012GQNPK
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-SYSAUX_FNO-3_042GQNR1
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-UNDOTBS1_FNO-4_032GQNQI
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-LARGETS_FNO-5_022GQNQ3
channel ORA_DISK_1: scanning datafile copy X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-USERS_FNO-7_052GQNR8
ORA-19505: failed to identify file "X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-USERS_FNO-7_052GQNR8"
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.
ORA-19600: input file is datafile copy 0 (X:\ORACLE\BKP\DB19S\DATA_D-DB19S_I-857515531_TS-USERS_FNO-7_052GQNR8)
failover to previous backup
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 01/17/2024 14:47:15
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 7 found to restore
RMAN> exit
Recovery Manager complete.
C:\Users\connor>echo %ERRORLEVEL%
1