These Officers, with the Men belonging to the Engine, at their Quarterly Meetings, discourse of Fires, of the Faults committed at some, the good Management in some Cases at others, and thus communicating their Thoughts and Experience they grow wise in the Thing, and know how to command and to execute in the best manner upon every Emergency.
—Benjamin Franklin in a letter to the Pennsylvania Gazette, following a disastrous fire in Philadelphia in the 18th century
In the previous chapter, you learned how to make backup copies of the database. You now turn your attention to repairing the database if it gets damaged. For the purposes of this chapter, you should assume the availability of backup copies made using Recovery Manager (RMAN) but you will discover that backup copies are not the only repair option available. This chapter also acquaints you with a powerful tool called Data Recovery Advisor (DRA), which greatly simplifies the database administrator’s job.
The following are true stories based on personal experiences.
Sometimes we make a mountain out of a molehill. Early in my career, in the days of Oracle 7, a few blocks of data in the company’s most critical database were found to be corrupted. Any transaction that touched the affected blocks could not complete, but the database was otherwise functional. This was a large Enterprise Resource Planning (ERP) database that the business depended on for all aspects of its functioning, so the application administrators went into high gear. There were no disk backups because of the sheer size of the database, so the application administrators recalled the tapes from the offsite storage facility. They shut down the database at the end of the day and began restoring it from tapes, hoping to have the database ready for business the next day. However, three-quarters of the way into the restore operation, a defect was found in the tapes, and the application administrators had to requisition the previous set of tapes from the offsite storage facility. The offsite storage facility was 100 miles away, and this necessitated a substantial delay.
After the database files were successfully restored from tape, the administrators began the task of recovering transactions from the archived redo log files; this took a huge amount of time. From start to finish, the restore and recovery operation took 48 hours instead of the original estimate of 12 hours, and the business lost 2 days of work. The mistake was using a one-size-fits-all repair strategy that did not take into consideration the amount of damage; it is not necessary to shut down and restore the entire database if only one data file is damaged. In Oracle Database 12c, it is even possible to repair individual blocks without affecting the availability of the rest of the database.
Tip An appropriate repair strategy is one that is tailored to the situation and causes minimum downtime. Individual data files and data blocks can be repaired without impacting the rest of the database.
In my second story, which occured more recently in my career, all the king’s horses and all the king’s men couldn’t put Humpty Dumpty together again. The disks crashed, and a large data-warehouse database became unusable. Unfortunately, the disk backups were being stored on the same disks as the database, and restoration from tapes was the only option. Needless to say, any archived redo logs that were not on tape were lost forever, which meant that a significant number of transactions could never be recovered. In a bad case of déjà vu, the first set of tapes was found to be defective, and restoration had to be started afresh using a second set of tapes. But that was not the end of the story; the database was still unusable even after all available redo logs had been applied. It turned out that data was regularly loaded in NOLOGGING mode, which means redo information was not being recorded in the redo logs. Large parts of the database had to be re-created from other sources, and it was several weeks before it was fully functional again.
It happened to me—it could happen to you.
Tip Issue the ALTER DATABASE FORCE LOGGING command to prevent users from using NOLOGGING mode without the knowledge of the DBAs.
There are several kinds of repair operations. This section describes each in turn.
These two terms are often used synonymously in everyday usage—for example, the title of this chapter is simply “Recovery,” and the title of this section is “Types of Recovery.” However, the words restore and recover have specialized technical meanings for the DBA. The word restore refers to the operation of replacing a damaged or missing file such as a data file, a control file, or an archived redo log file from a backup copy. The word recover refers to the process of replaying transactions recorded in the redo logs and making the database usable again. Usually, the recovery process is preceded by the restore process. However, Oracle performs automatic recovery operations (a.k.a. crash recovery) if the database is restarted after an ungraceful shutdown such as a system crash.
Coincidentally enough, the two most important RMAN commands for repairing a database are RESTORE and RECOVER—they are powerful commands with a rich set of options. Later in this chapter, you see how DRA simplifies the DBA’s job.
If one part of a database is damaged, the rest of the database usually continues to function. The most common examples are loss of a data file (except a data file from the SYSTEM tablespace) or corruption of a few blocks of data. It is usually possible to restore and recover the affected parts of the database without impacting the availability of the rest of the database.
Note Single-block recovery is available only in the Enterprise Edition.
After the affected parts of the database are restored from the backup, the transactions in the redo log files must be recovered. Sometimes the recovery phase cannot be completed. For example, an archived redo log may be lost or damaged. Or you might intentionally wish to stop the recovery process at a point in time in the past; perhaps you wish to erase the effects of mistakes made by a user. You will hear the term RESETLOGS used in conjunction with an incomplete recovery, and the incarnation number of the database is incremented in such cases. Incomplete recovery is also referred to as point-in-time recovery.
Point-in-time recovery after user error can be very time-consuming when the database is large or when a large number of redo logs have to be processed. Partial recovery is not an option in such cases because the integrity of the database might be compromised if different parts of the database reflected different points in time. The Enterprise Edition of Oracle Database 12c offers the option to unwind transactions by using a special type of log called the flashback log. This can be orders of magnitude faster than traditional recovery but is limited by the amount of space reserved for the storage of flashback logs. Note that the flashback method of recovering a database can only be used to perform logical recovery; it cannot be used to recover from physical damage to the database such as corruption of data blocks.
All the methods of recovery that I have covered so far can be classified as physical recovery because they are not tailored to the affected data. An alternative to physical recovery following user error is logical recovery, which is tailored to the affected data. Here are some examples:
Oracle Database 12c also offers a variety of methods for logical recovery based on the information contained in the undo segments. These methods are called flashback methods; I’ll present more information about them later in this chapter.
The information contained in the undo segments can be used to reconstruct prior versions of the data. The amount of information available in the undo segments is governed by the size of the undo segments and by the UNDO_RETENTION setting. The default value of the UNDO_RETENTION setting is only 900 seconds (15 minutes)—you should increase it to a more appropriate value. Also make sure the undo segments are fairly large.
Note The more-advanced flashback features such as Flashback Transaction Query, Flashback Table, and Flashback Database are available only with the Enterprise Edition of Oracle Database 12c.
If data in a table is inadvertently changed, you can obtain the prior values of the data items by using the undo segments and construct suitable SQL statements to correct the data, as in the following example. First, let’s increase the salary of employee 101 by $1,000:
SQL> UPDATE employees
2 SET salary=salary + 1000
3 WHERE employee_id = 101;
1 row updated.
SQL> SELECT salary
2 FROM employees
3 WHERE employee_id = 101;
SALARY
----------
18000
SQL> COMMIT;
Commit complete.
Next, determine what the employee’s salary used to be, one hour ago. You can do so with the AS OF TIMESTAMP clause:
SQL> SELECT salary
2 FROM employees AS OF TIMESTAMP SYSDATE - 1/24
3 WHERE employee_id = 101;
SALARY
----------
17000
You can formulate an UPDATE command to change the employee’s salary back to the original value. Notice the use of the AS OF TIMESTAMP clause once again:
SQL> UPDATE employees
2 SET salary = (SELECT salary
3 FROM employees AS OF TIMESTAMP SYSDATE - 1/24
4 WHERE employee_id = 101)
5 WHERE employee_id = 101;
1 row updated.
SQL> COMMIT;
Commit complete.
SQL> SELECT salary
2 FROM employees
3 WHERE employee_id = 101;
SALARY
----------
17000
If a row of data is changed multiple times, you can use the VERSIONS BETWEEN clause to search for prior versions of the row in the undo segments. VERSIONS_STARTTIME and VERSIONS_XID are pseudocolumns that tell you when each version was created and which transaction created it. The absence of these values indicates the time prior to the window specified in the query. In the following example, you can see the original value of the employee’s salary ($17,000), the transaction that changed the value to $18,000, and the transaction that reversed the change:
SQL> COLUMN versions_starttime FORMAT a32
SQL> SELECT versions_starttime,
2 versions_xid,
3 salary
4 FROM employees VERSIONS BETWEEN TIMESTAMP SYSDATE - 1/24 AND SYSDATE
5 WHERE employee_id = 101;
VERSIONS_STARTTIME VERSIONS_XID SALARY
-------------------------------- ---------------- ----------
28-NOV-07 02.15.02 PM 05000F0031030000 17000
28-NOV-07 02.14.31 PM 09001000D4030000 18000
17000
The SQL statements required to reverse changes can be automatically generated from the undo segments. For example, an INSERT statement can be reversed with a DELETE statement, and vice versa. All you need to do is to select the necessary SQL statements from the flashback_transaction_query view—a DBA-level privilege called SELECT ANY TRANSACTION is required. The transaction identifier (XID) in the following example is the one obtained in the previous section:
SQL> SELECT undo_sql
2 FROM flashback_transaction_query
3 WHERE XID = ’09001000D4030000’;
UNDO_SQL
--------------------------------------------------------------------------------
update "HR"."EMPLOYEES" set "SALARY" = ’17000’ where ROWID = ’AAARAgAAFAAAABYABj’;
All changes made to a table can be removed by using the FLASHBACK TABLE command if the information in the undo segments has not yet been overwritten by newer transactions. For example, suppose you inadvertently increase the salary of all employees, as in the following example:
SQL> UPDATE employees
2 SET salary=salary + 1000;
107 rows updated.
SQL> COMMIT;
Commit complete.
Before you use the FLASHBACK TABLE command, you must give Oracle permission to restore rows to new locations if necessary. You do this using the ENABLE ROW MOVEMENT clause of the ALTER TABLE command. You can then issue the FLASHBACK TABLE command:
SQL> FLASHBACK TABLE employees TO TIMESTAMP SYSDATE - 1/24;
FLASHBACK TABLE employees TO TIMESTAMP SYSDATE - 1/24
*
ERROR at line 1:
ORA-08189: cannot flashback the table because row movement is not enabled
SQL> ALTER TABLE employees ENABLE ROW MOVEMENT;
Table altered.
SQL> FLASHBACK TABLE employees TO TIMESTAMP SYSDATE - 1/24;
Flashback complete.
If an entire table is inadvertently deleted, you can simply recover it from the Recycle Bin by using the FLASHBACK TABLE...TO BEFORE DROP command. Dropped tables remain in the Recycle Bin until they are explicitly purged by using the PURGE command:
SQL> CREATE TABLE employees_backup AS SELECT * FROM employees;
Table created.
SQL> SELECT COUNT (*)
2 FROM employees_backup;
COUNT(*)
----------
107
SQL> DROP TABLE employees_backup;
Table dropped.
SQL> DESCRIBE user_recyclebin;
Name Null? Type
----------------------------------------- -------- ----------------------------
OBJECT_NAME NOT NULL VARCHAR2(30)
ORIGINAL_NAME VARCHAR2(32)
OPERATION VARCHAR2(9)
TYPE VARCHAR2(25)
TS_NAME VARCHAR2(30)
CREATETIME VARCHAR2(19)
DROPTIME VARCHAR2(19)
DROPSCN NUMBER
PARTITION_NAME VARCHAR2(32)
CAN_UNDROP VARCHAR2(3)
CAN_PURGE VARCHAR2(3)
RELATED NOT NULL NUMBER
BASE_OBJECT NOT NULL NUMBER
PURGE_OBJECT NOT NULL NUMBER
SPACE NUMBER
SQL> SELECT object_name,
2 original_name
3 FROM user_recyclebin;
OBJECT_NAME ORIGINAL_NAME
------------------------------ --------------------------------
BIN$1ethyUzGQEW849KhDJOcJg==$0 EMPLOYEES_BACKUP
SQL> FLASHBACK TABLE employees_backup TO BEFORE DROP;
Flashback complete.
SQL> SELECT COUNT (*)
2 FROM employees_backup;
COUNT(*)
----------
107
SQL> DROP TABLE employees_backup;
Table dropped.
SQL> PURGE user_recyclebin;
Recyclebin purged.
SQL> FLASHBACK TABLE employees_backup TO BEFORE DROP;
FLASHBACK TABLE employees_backup TO BEFORE DROP
*
ERROR at line 1:
ORA-38305: object not in RECYCLE BIN
This feature was introduced in Oracle Database 12c. An archive can be created to store all changes made to data rows during the lifetime of a table. The archive is not dependent on the undo segments, and, consequently, its contents are not lost if the information in the undo segments is overwritten by new transactions. A complete discussion is outside the scope of this chapter.
The biggest weapon in the armory of flashback features is the FLASHBACK DATABASE command. For example, let’s suppose that major changes are being made to the database. You simply need to create a restore point by using the CREATE RESTORE POINT command. If the changes need to be tested and test data created before releasing the database to users, you can create a second restore point. If the testing is successful, the test data can be completely erased by reverting to the second restore point, and the database can then be released to users. If the testing is unsuccessful, you can revert to the first restore point to wipe out all traces of your changes.
Using the FLASHBACK DATABASE command requires that the database be shut down and then brought into the MOUNT state. After the command has completed, you must open the database with the RESETLOGS option.
Note that the flashback-database technique depends on a special type of log file called a flashback log. The retention of information in flashback logs is governed by the Oracle parameter DB_FLASHBACK_RETENTION_TARGET. Also, the DBA must explicitly enable logging by using the command ALTER DATABASE FLASHBACK ON before opening the database.
You can also recover prior versions of data from the archived redo logs by using a utility called LogMiner, as illustrated in Listing 13-2. LogMiner can construct undo SQL that you can use to selectively reverse changes to your data.
Note that LogMiner works as advertised only if you have enabled minimal supplemental logging. According to the Oracle Database 12c documentation, this “ensures that LogMiner (and any products building on LogMiner technology) has sufficient information to support chained rows and various storage arrangements such as cluster tables and index-organized tables.” The command that enables minimal supplemental logging is ALTER DATABASE ADD SUPPLEMENTAL LOG DATA.
A complete discussion of LogMiner can be found in Chapter 18 of Oracle Database 12c Utilities.
Physical recovery usually requires the use of RMAN . The two most powerful RMAN repair commands are RESTORE DATABASE and RECOVER DATABASE. The first command restores all data files from the last backup. The second command uses the redo logs and recovers all the changes made since the backup. But full database recovery using the RESTORE DATABASE and RECOVER DATABASE commands is not required in all cases. The most appropriate database strategy is one that is tailored to the situation, and RMAN provides a wide range of repair options that may overwhelm the beginner. However, Oracle Database 12c also provides a powerful tool called Data Recovery Advisor (DRA) that greatly simplifies the DBA’s job.
Let’s demonstrate physical recovery by using DRA to repair corrupt blocks—the same situation described in my first horror story. In those days, Oracle did not offer the capability to repair individual blocks while the rest of the database was still being used, although it did offer the capability to repair individual files. Block recovery is the least invasive of all recovery operations, and DRA makes the process as easy as it is possible to imagine.
I decided to intentionally corrupt one of the data blocks in the Employees table in the sample HR schema, and the DBA_EXTENTS view gave me the information I needed to determine exactly which blocks to corrupt:
[oracle@localhost ~]$ sqlplus sys/oracle@pdb1 as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Wed Dec 3 18:22:19 2014
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
PDB1@ORCL> select block_id from dba_extents where segment_name=’EMPLOYEES’;
BLOCK_ID
----------
200
PDB1@ORCL> select extent_id, file_id, block_id, bytes, blocks
2 from dba_extents
3 where segment_name = ’EMPLOYEES’;
EXTENT_ID FILE_ID BLOCK_ID BYTES BLOCKS
---------- ---------- ---------- ---------- ----------
0 13 200 65536 8
PDB1@ORCL> column file_name format a80
PDB1@ORCL> set linesize 80
PDB1@ORCL> select file_name from dba_data_files where file_id=13;
FILE_NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/ORCL/F0832BAF14721281E045000000000001/datafile/o1_mf_use
rs_9fxvoh6n_.dbf
PDB1@ORCL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
I shut down the database and modified the file containing those blocks using a Linux utility called dd (device dump):
[oracle@localhost ~]$ dd if=/dev/null of=/u01/app/oracle/oradata/ORCL/F0832BAF14721281E045000000000001/datafile/o1_mf_users_9fxvoh6n_.dbf ibs=8192 obs=8192 seek=200 count=1 conv=notrunc
0+0 records in
0+0 records out)
Then I started the database. Everything worked normally until I tried to list the contents of the table. SQL*Plus then displayed an error:
[oracle@localhost ~]$ sqlplus sys/oracle@pdb1 as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Wed Dec 3 19:00:58 2014
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
PDB1@ORCL> select * from hr.employees;
select * from hr.employees
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 13, block # 202)
ORA-01110: data file 13:
’/u01/app/oracle/oradata/ORCL/F0832BAF14721281E045000000000001/datafile/o1_mf_us
ers_9fxvoh6n_.dbf’
PDB1@ORCL> exit
The Oracle-recommended action was, “Try to restore the segment containing the block indicated. This may involve dropping the segment and re-creating it. If there is a trace file, report the error in it to your Oracle representative.” But DRA makes short work of the problem, as you shall see:
[oracle@localhost ~]$ oerr ora 1578
01578, 00000, "ORACLE data block corrupted (file # %s, block # %s)"
// *Cause: The data block indicated was corrupted, mostly due to software
// errors.
// *Action: Try to restore the segment containing the block indicated. This
// may involve dropping the segment and recreating it. If there
// is a trace file, report the errors in it to your ORACLE
// representative.
First, let’s check whether any other blocks are also corrupted. That seems like the smart thing to do—if one block has somehow been corrupted, then other blocks may also have been corrupted, and you can fix all of them at the same time. You use the VALIDATE DATABASE command to check the entire database. (Note that checking the entire database may take a lot of time, and RMAN lets you check individual data files.) The output is shown next (failure messages only):
[oracle@localhost ~]$ rman
Recovery Manager: Release 12.1.0.1.0 - Production on Wed Dec 3 19:02:12 2014
Copyright (c) 1982, 2013, Oracle and/or its affiliates. All rights reserved.
RMAN> connect target;
connected to target database: ORCL (DBID=1365223133)
RMAN> validate database;
File Name: /u01/app/oracle/oradata/ORCL/F0832BAF14721281E045000000000001/datafile/o1_mf_users_9fxvoh6n_.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 96
Index 0 46
Other 64 438
validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_6537.trc for details
You then issue the LIST FAILURE, ADVISE FAILURE, and REPAIR FAILURE commands in turn. The LIST FAILURE command tells you what you already know at this point—that the file EXAMPLE01.DBF contains a corrupted block:
RMAN> list failure;
Database Role: PRIMARY
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
202 HIGH OPEN 03-DEC-14 Datafile 13: ’/u01/app/oracle/oradata/ORCL/F0832BAF14721281E045000000000001/datafile/o1_mf_users_9fxvoh6n_.dbf’ contains one or more corrupt blocks
Next you ask DRA to analyze the problem and recommend a course of action. If possible, DRA lists manual methods that can be used to recover from the problem. For example, a data file that has been inadvertently moved to a different directory can be restored to its previous location. In this case, there are no such alternatives, and DRA tells you that the only course is to restore the corrupted block from the backup copies of the database and redo any modifications made to the block since the backup copy was created. DRA automatically creates a RMAN script for the purpose, and the REPAIR FAILURE PREVIEW command lists the contents of the script:
RMAN> advise failure;
Database Role: PRIMARY
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
202 HIGH OPEN 03-DEC-14 Datafile 13: ’/u01/app/oracle/oradata/ORCL/F0832BAF14721281E045000000000001/datafile/o1_mf_users_9fxvoh6n_.dbf’ contains one or more corrupt blocks
analyzing automatic repair options; this may take some time
using channel ORA_DISK_1
analyzing automatic repair options complete
Mandatory Manual Actions
========================
no manual actions available
Optional Manual Actions
=======================
no manual actions available
Automated Repair Options
========================
Option Repair Description
------ ------------------
1 Recover multiple corrupt blocks in datafile 13
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_1931933171.hm
RMAN> repair failure preview;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_1931933171.hm
contents of repair script:
# block media recovery for multiple blocks
recover datafile 13 block 200 to 263;
All that is left to do is to execute the script created by DRA. This is done via the simple command REPAIR FAILURE. You can use the NOPROMPT qualifier if you don’t want to be prompted for confirmation at various points during the process.
First the corrupted block is restored from the backup copies of the database:
RMAN> repair failure;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_1931933171.hm
contents of repair script:
# block media recovery for multiple blocks
recover datafile 13 block 200 to 263;
Do you really want to execute the above repair (enter YES or NO)? y
executing repair script
Starting recover at 03-DEC-14
using channel ORA_DISK_1
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00013
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/F0832BAF14721281E045000000000001/backupset/2014_12_03/o1_mf_nnndf_TAG20141203T174627_b7z4xync_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/F0832BAF14721281E045000000000001/backupset/2014_12_03/o1_mf_nnndf_TAG20141203T174627_b7z4xync_.bkp tag=TAG20141203T174627
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:03
Finally, any modifications made to the block since the backup copy was created are redone with the help of the redo logs:
starting media recovery
archived log for thread 1 with sequence 128 is already on disk as file /u01/app/oracle/fast_recovery_area/ORCL/archivelog/2014_12_03/o1_mf_1_128_b7z6c80z_.arc
archived log for thread 1 with sequence 129 is already on disk as file /u01/app/oracle/fast_recovery_area/ORCL/archivelog/2014_12_03/o1_mf_1_129_b7z6dnd4_.arc
archived log for thread 1 with sequence 130 is already on disk as file /u01/app/oracle/fast_recovery_area/ORCL/archivelog/2014_12_03/o1_mf_1_130_b7z6g2qv_.arc
media recovery complete, elapsed time: 00:00:07
Finished recover at 03-DEC-14
repair failure complete
RMAN> exit
Recovery Manager complete.
I’d like to conclude this introduction to the subject of recovery with a brief mention of testing. Not testing recovery procedures is a terrible mistake. A recovery test should always be performed before a new database is put to use. However, it is not feasible to perform periodic testing on a live database. You can use the RMAN command DUPLICATE DATABASE to verify the usability of backups without harming the live database. If the database is too large, you can exclude some tablespaces. A very useful validation technique is the PREVIEW clause, which lets you validate scripts without actually performing the commands in the script.
It is also important to document your recovery procedures. Chapter 15 covers the importance of standard operating procedures (SOPs). Database recovery is the most stressful task a DBA has to perform, and an SOP is exactly what you need to guide you through the situation.
Here is a short summary of the concepts touched on in this chapter:
18.191.139.169