“Sometimes,” he added, “there is no harm in putting off a piece of work until another day. But when it is a matter of baobabs, that always means a catastrophe. I knew a planet that was inhabited by a lazy man. He neglected three little bushes …”
—The Little Prince by Antoine de Saint-Exupéry
In my first job as a database administrator, working for a Fortune 100 company, I managed only three databases, each no more than 100MB in size, and had ample time for database maintenance. I performed regular database-maintenance procedures to keep the databases in peak operating condition, included data archiving and purging and rebuilding of tables and indexes. I also published graphs of database and application performance every week.
Gartner Research estimated in 2007 that the average amount of data managed by each DBA is 1TB. This allows very little time for database maintenance.
Here are the components of a simple maintenance plan—they should be automated as much as possible. The following sections of this chapter cover each component in detail.
Caution Index rebuilding and tablespace defragmentation are often conducted without sufficient justification. An old paper by Juan Loaiza, “How to Stop Defragmenting and Start Living: The Definitive Word on Fragmentation,” (which you can find on the web) can be considered the last word on the subject.
Backups are the most important aspect of database maintenance, but the honest truth is that most companies don’t pay enough attention to them. Here are some best practices for backups; please refer to Chapter 12 and Chapter 13 for specific information on methods of backups (and recovery).
The following practices apply to all database backups, not just Oracle backups:
Tip Don’t forget to perform backups of databases used for development and testing.
The following practices apply specifically to Oracle database backups:
Tip Backups performed using third-party software such as Network Appliance Snapshot can be integrated with RMAN.
Tip Oracle Database 11g introduced a single parameter called DB_ULTRA_SAFE that controls the values of DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_UPDATE.
The query optimizer requires statistical information about the data, such as the number of rows in each table, in order to generate efficient query plans. This information is collected using the procedures in the DBMS_STATS package. Beginning with Oracle Database 10g, statistical information about the data is refreshed automatically by a scheduled nightly job called GATHER_STATS_JOB; you are free to disable this job in favor of a custom approach. Oracle supplies a wide collection of procedures to collect and manage statistics, including GATHER, DELETE, IMPORT, EXPORT, LOCK, UNLOCK, PUBLISH, and RESTORE procedures. A complete list of them follows, and you can find details in Oracle Database PL/SQL Packages and Types Reference.
Caution Small changes in statistics can sometimes produce big changes in query plans—not always for the better—and this problem can challenge even the most experienced DBAs. The LOCK and RESTORE procedures can be very useful in improving plan stability.
Database performance commonly degrades as the database grows large. To keep the database performing optimally, you should remove unnecessary data. In some cases, it may be necessary to save a copy of the data in a data archive, such as flat files, XML documents, or the like, before the data is purged from the database. The archiving phase also involves a special backup that is tagged with a high-retention setting, such as seven years.
The freed space is automatically used by new data. Regular purging keeps the size of the database under control and helps keep performance stable. If a large amount of data is purged at a time, the sizes of the tables and indexes are not commensurate with the amount of data they actually contain, and additional maintenance is required to compact and shrink the bloated tables and indexes. This is discussed in the section on rebuilding tables and indexes.
The use of partitioning can completely eliminate the need for archiving, purging, and rebuilding. For example, a table can be divided into partitions in such a way that each month’s data is inserted into a separate partition. Queries that require the most recent data perform optimally because the required data is clustered in a relatively small number of data blocks instead of being randomly scattered throughout the table as happens when partitioning is not used. In the case of partitioned tables, purging the data requires only that a partition be dropped; this is a very quick operation. If the old partitions are in a separate tablespace, an alternative to purging is to convert the tablespaces to read-only mode and move them to cheap storage such as WORM drives or unmirrored SATA disks; this is a wonderful way to keep performance optimal without having to remove data. Even the overhead of backups can be kept under control; read-only tablespaces must be backed up only one more time with a high-retention setting and then no longer need to be backed up.
Caution Partitioning is an extra-cost option for which Enterprise Edition is a prerequisite. If you are not licensed for Partitioning, you can use partition views or use a do-it-yourself approach.
The size of a table does not change when records are deleted from it—the indexes associated with the table do not shrink in size either. The wasted space causes three problems:
If you are licensed for Diagnostics Pack, you can use the Automatic Segment Advisor to identify tables that would benefit from rebuilding. Alternatively, you can conduct your own investigation of wasted space, using the SPACE_USAGE procedure. Listing 14-1 shows a sample run; you see that 1161 blocks are empty.
Once you have identified tables and indexes that will benefit from compaction, you can reclaim the wasted space using the SHRINK SPACE command, as in Listing 14-2. Note that you have to give Oracle permission explicitly to relocate rows, using the ENABLE ROW MOVEMENT command—Oracle does not do this automatically, so as not to affect any application that has taken advantage of row addresses (ROWIDs).
Log files of all kinds accumulate over time and can fill up the available space. For example, a trace file is created whenever certain errors occur, such as ORA-00060 (deadlock) and ORA-00600 (internal error). Listing 14-3 shows the first few lines from such a trace file.
The log files are organized into an Automatic Diagnostics Repository (ADR), whose location is controlled by the DIAGNOSTIC_DEST setting. Log files are automatically purged after 30 days—you can change the retention window using the SET CONTROL option of the adrci tool. You can also purge log files manually using the tool’s PURGE command, as in Listing 14-4.
Tip Trace files can be very large. The maximum size of a trace file is controlled by the max_dump_file_size initialization setting, whose default value is unlimited. Consider changing the setting to a value that is appropriate for your environment.
Oracle provides the ability to audit all aspects of database usage. Audit records can be stored in the AUD$ table owned by SYS or in files outside the database—this is controlled by the AUDIT_TRAIL and AUDIT_FILE_DEST settings. There are a number of useful views on the AUD$ table, the chief ones being DBA_AUDIT_TRAIL and DBA_AUDIT_SESSION. Listing 14-5 shows a sample of the data in the DBA_AUDIT_SESSION table.
When creating a database using DBCA, you can choose to enable auditing for logon and logoff events and certain privileged activities such as ALTER DATABASE and ALTER USER. A number of views can be used to determine which activities are being audited, the chief ones being DBA_STMT_AUDIT_OPTS (which SQL commands are being monitored) and DBA_OBJ_AUDIT_OPTS (which tables are being monitored). Listing 14-6 shows a sample of the data in the DBA_STMT_AUDIT_OPTS table.
Reviewing audit records regularly is a good practice. Note that Oracle does not delete old audit data automatically—that is the responsibility of the DBA. You should consider archiving the data using the exp or expdp utility before you delete it.
Tip The AUD$ table is placed in the SYSTEM tablespace when the database is created. The AUD$ table can grow very large, so the SYSTEM tablespace is not a good location for it—it should be housed in a dedicated tablespace instead. Metalink notes 72460.1 Moving AUD$ to Another Tablespace and Adding Triggers to AUD$ and 1019377.6 Script to move SYS.AUD$ table out of SYSTEM tablespace discuss this issue.
The security of the database is the DBA’s responsibility. Here are some simple security rules:
Regular capacity reviews are part of good database management. Database size and free space should be recorded at regular intervals—the required information can be found in the DBA_DATA_FILES and DBA_FREE_SPACE views, as shown in Listing 14-7. Database size and free space can also be checked using Enterprise Manager. Space should be proactively added to the database as necessary. You should make growth projections and identify opportunities to reclaim disk space. Capacity management also refers to use of system resources such as CPU and memory. The information in the Statspack reports and tables can be used to track system utilization, or you can develop a simple system that periodically captures system utilization metrics from the V$SYSSTAT table.
A time series is a sequence of measurements made at periodic intervals. The Statspack tables contain much time series data and can be used for capacity reviews. The Statspack tables STATS$OSSTAT, STATS$SYSSTAT, STATS$SYSTEM_EVENT, and STATS$SYS_TIME_MODEL are of particular interest—they store snapshots of the dynamic performance views V$OSSTAT, V$SYSSTAT, V$SYSTEM_EVENT, and V$SYS_TIME_MODEL, respectively. As illustrated in Listing 14-8, V$OSSTAT offers cumulative values of various OS metrics such as CPU usage, and STATS$OSSTAT contains snapshots of this data; the snapshots are identified by the SNAP_ID column.
The time series data in the Statspack tables can be manipulated with SQL queries into a format that is suitable for reports and graphs. First you use the PIVOT operator to create a two-dimensional table of the cumulative values (Listing 14-9). Then you use an analytic function called LAG to compute the difference between the values in successive rows (Listing 14-10). The CPU utilization graph shown in Figure 14-1 was generated using the data produced by the SQL query shown in Listing 14-11.
There is a wealth of other useful data in the Statspack tables. For example, STATS$SYSSTAT offers cumulative values for Oracle metrics such as logical reads and physical reads; STATS$SYSTEM_EVENT reports the time spent waiting for I/O, locks, and other resources; and STATS$SYS_TIME_MODEL offers summary information such as the total of the elapsed times of all SQL queries processed by Oracle.
Oracle periodically releases fixes, known as patches, for software bugs. A one-off patch is a fix for an individual bug, whereas a patchset is a collection of numerous bug fixes. A critical patch update (CPU) is a collection of fixes for security-related bugs; Oracle releases one such CPU every quarter. A patchset has an Oracle database version string associated with it; for example, 11.2.0.4 is a patchset associated with Oracle 11g Release 2. By contrast, one-off patches and CPUs are identified by a single number; for example, 7210195 was the number of the July 2008 CPU. One-off patches, patches, and patchsets can be downloaded from the Patches & Updates section of the Metalink customer portal.
Some organizations adopt a policy of applying software patches and CPUs whenever they become available. Listing 14-12 is a typical notice regarding the availability of a new CPU that you will receive from Oracle if you have a support contract.
The notification about a CPU directs you to a web page where you can find more information. Once you have determined the specific patch number applicable to your OS platform, you can download the CPU from My Oracle Support.
CPUs are applied using the opatch utility. Complete instructions are provided with the patch.
Here is a short summary of the concepts touched on in this chapter:
3.15.172.195