The goal of this chapter is to introduce you to the various health check utilities, both new in 12.2 and from previous versions, that can greatly ease management and troubleshooting efforts for High Availability and DR solutions. This chapter will focus mostly on features that are not necessarily Exadata specific but that can and should be used in Exadata and other engineered environments. The chapter will highlight important and often hidden aspects of the different features and software that can be used for health checks as well as cover a few of the new features available in the 12.2 GI and RDBMS software.
Cluster Verification Utility
The Cluster Verification Utility (CVU) is an extremely important component of the Grid Infrastructure installation. It can perform a variety of health checks on an Oracle Real Application Clusters (RAC) environment, whether that environment runs on commodity hardware or on an engineered system. The tool is essential during the installation of Grid Infrastructure and is also used by the Database Configuration Assistant (DBCA). However, it also has several valuable and often underutilized one-off use cases, which this chapter will go over. The Cluster Verification Utility is available in $GRID_HOME. You will need to set your environment to +ASM to be able to successfully launch CVU.
Software Home Checks
Sometimes a DBA may come to question the integrity of a software home, such as the RDBMS installation home or the Grid Infrastructure installation itself. This can happen, for example, after patches have had to be rolled back or after a particularly problem-ridden Grid Infrastructure upgrade or patch that required manual intervention. It is always a good thing to be able to say that the file permissions and user/group ownerships of all files within an Oracle software installation are correct and as they should be.
One of the great things about the Cluster Verification Utility is that the return code of many of the commands signifies whether an issue was encountered. Because of this, it is actually easy to set up shell scripts that run cluvfy comp software and then email an administrator if an issue is found so that it can be investigated.
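The return-code approach described above can be sketched in a few lines. The following is a minimal Python illustration, not a definitive implementation: the function names run_check and build_alert are hypothetical, the cluvfy arguments in the closing comment are an assumed invocation that should be confirmed against the CVU help output for your release, and mail delivery is deliberately left to whatever mechanism your site already uses.

```python
import subprocess


def run_check(command):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


def build_alert(command, exit_code, output):
    """Return an alert message for a failed check, or None if it passed."""
    if exit_code == 0:
        return None
    return "Check '{}' exited with code {}\n\n{}".format(
        " ".join(command), exit_code, output
    )


# Example use (assumed invocation; adjust nodes and options to your cluster):
#   cmd = ["cluvfy", "comp", "software", "-n", "all"]
#   alert = build_alert(cmd, *run_check(cmd))
#   if alert is not None:
#       ...pass the text to the mail mechanism you already use...
```

Keeping the alert text separate from the delivery step means the same two functions can wrap any CVU command whose return code reflects success or failure.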
File System Space Checks
Database administrators must keep an eye on the free space available on the file systems that contain the Grid Infrastructure and Oracle Database software homes, because logging and other environment-specific factors can fill them up. For this reason, every database administrator likely has their own custom script to check directories or df output for space usage. However, the Cluster Verification Utility can greatly simplify these scripts if Grid Infrastructure is installed.
Grid Infrastructure–Only Check
While cluvfy comp freespace is handy, it is limited in what it can do. Still, it is useful as a quick check of free space in the Grid Infrastructure home. It checks only the Grid Infrastructure home, and only whether that file system has at least 5 percent free space. Moreover, the command does not signal a failed space check through its return code, which can complicate scripting.
Generic Space Checks
There exists another type of file system free space check within the Cluster Verification Utility, and that is the one provided by cluvfy comp space. This type of verification is much more robust and comes with the added benefit of return codes that change depending on whether the check fails or succeeds.
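Because cluvfy comp space reports failure through its return code, several locations can be checked in a loop. The sketch below is only an illustration under stated assumptions: the -n, -l, and -z options reflect the documented form of the command as best understood here and should be verified for your release, and the function names, directories, and sizes are hypothetical.

```python
import subprocess


def space_check_command(location, size, nodes="all"):
    """Build an assumed 'cluvfy comp space' command line for one location."""
    return ["cluvfy", "comp", "space", "-n", nodes, "-l", location, "-z", size]


def failed_locations(requirements, runner=subprocess.call):
    """Return the locations whose space check exited with a nonzero code.

    requirements maps a directory to the free space it must have, e.g. "5G".
    runner takes a command list and returns its exit code; the default
    actually executes the command, and a stub can be passed in for testing.
    """
    failed = []
    for location, size in requirements.items():
        if runner(space_check_command(location, size)) != 0:
            failed.append(location)
    return failed


# Example use (hypothetical paths and thresholds):
#   failed_locations({"/u01": "5G", "/u02": "10G"})
```

Passing the runner as a parameter keeps the loop testable on a machine where Grid Infrastructure is not installed.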
Cluster Verification Utility Health Checks
The Cluster Verification Utility can be used to perform health checks on three major components: databases, ASM, and clusters. We will go in depth on the health checks available for all three.
Database Health Checks
This type of check cannot be run without first performing some setup. Because CVU needs to connect to a database to run health checks and check for best-practice adherence, credentials need to be stored. Oracle uses a secure external password store, also known as the wallet. The wallet must be set up for health checks to work properly for the database component. In this chapter, we will go over how to create such a wallet and how to run a health check using CVU.
CVU Wallets
Unlike the stand-alone Oracle connection wallets that can be used by regular clients, the wallet used by the Cluster Verification Utility is managed by the Grid Infrastructure software and is created, modified, and deleted via crsctl commands.
Running the Health Check
Cluster Health Checks
The Cluster Verification Utility can also perform ASM- and OS-level health checks. The -collect cluster flag will indicate that both ASM- and OS-level checks need to be done, although -collect asm can be used to gather only ASM-level information.
Cluster Verification Utility Baselines
A feature related to the health checks is the ability to create baselines and then compare them. The Cluster Verification Utility will store the contents of a health check so that it can be referenced in the future to help highlight changes in a cluster. The following example shows a way to collect best-practice information on all components of an Oracle Real Application Clusters environment and save it as a baseline called baseline1. The raw output is shown to help you understand the type of data that is collected by baselines.
Orachk
Orachk is a tool that can be used to run more comprehensive health checks and best-practice tests in an Oracle environment. Orachk comes in two flavors: Orachk and Exachk. Exachk can be used on all engineered systems other than Oracle Database Appliance, and Orachk can be used on all other commodity or engineered systems running the Oracle stack. Orachk and Exachk are similar and overlap in almost all options that are not specific to the Exadata platform (such as cell storage server checks, and so on).
Orachk is installed in the Grid Infrastructure home as well as the Oracle database software home in $ORACLE_HOME/suptools/ and can also be downloaded from Oracle directly. Each PSU that is applied to these software homes stages the latest Orachk version in the $ORACLE_HOME/suptools directory. It is recommended that you download the latest Orachk version from the Oracle web site and install that instead of using the Orachk version that is bundled with the Grid Infrastructure home.
Upgrading Orachk
Orachk/Exachk and Oracle REST Data Services
The latest versions of the Orachk and Exachk tools can now be invoked through REST calls. This is made possible by Oracle REST Data Services (ORDS). ORDS can be set up for Orachk only by the root user and is available only on operating systems that are compatible with Orachk daemon mode.
The following is an example of how to configure Orachk with ORDS; it enables the automatic restart of the Orachk daemon in the case of server restarts.
Recommended Settings for Orachk Daemons
There are a few settings that should be configured for the Orachk daemon to ensure that regularly scheduled health checks do not silently fail and leave a system unmonitored for best-practice usage.
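Daemon settings of this kind are applied with orachk -set, which accepts one or more KEY=VALUE pairs separated by semicolons in a single argument. The following Python sketch only assembles such a command; the helper name orachk_set_command is hypothetical, and the keys in the closing comment (NOTIFICATION_EMAIL, COLLECTION_RETENTION, PASSWORD_CHECK_INTERVAL) and their values are assumptions to confirm against the Orachk documentation for your version.

```python
def orachk_set_command(settings):
    """Build an 'orachk -set' command from a dict of daemon settings.

    Multiple KEY=VALUE pairs are joined with ';' into one argument.
    """
    if not settings:
        raise ValueError("at least one setting is required")
    pairs = ";".join(
        "{}={}".format(key, value) for key, value in settings.items()
    )
    return ["orachk", "-set", pairs]


# Example use (assumed keys, hypothetical values):
#   orachk_set_command({
#       "NOTIFICATION_EMAIL": "dba-team@example.com",
#       "COLLECTION_RETENTION": 30,
#       "PASSWORD_CHECK_INTERVAL": 1,
#   })
```

Building the argument as a list element, rather than as a shell string, avoids quoting problems when a value contains spaces.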
Notification Emails
Retention Periods
Automated Password Verification Checks
Trace File Analyzer
Trace File Analyzer (TFA) is arguably one of the most useful tools that Oracle has released for the Oracle Database and Grid Infrastructure software, because it enables Oracle DBAs to react quickly and analyze all the different logfiles to which the Oracle software writes errors. Trace File Analyzer, along with the Support Tools Bundle, is extremely powerful as a troubleshooting and diagnosis tool. Most of the features related to TFA are somewhat out of scope for this book, but some of the new and exciting features that can be extremely useful will be touched upon in this chapter.
Upgrading TFA to Include the Support Tools Bundle
By default, TFA is upgraded with every quarterly update patch that comes out for Grid Infrastructure and the database software (12.2 and onward). However, the version of TFA that comes shipped with PSUs is generally three months behind what is available for download from Oracle Support. Furthermore, the version of TFA that comes with quarterly patches does not include the Support Tools Bundle, which has many useful features for diagnosing problems. The Support Tools Bundle can be downloaded from TFA Collector - TFA with Database Support Tools Bundle (Doc ID 1513912.1). While it is perfectly acceptable to use the TFA that is bundled with a standard Grid Infrastructure installation, you should download and install the Support Tools Bundle as shown in this chapter. This will upgrade your TFA with a multitude of new and useful features.
TFA must be installed and patched as the root user. This is because certain TFA files are owned by root in a Grid Infrastructure software home and because init scripts need to be modified.
Using Trace File Analyzer to Manage Logfiles
Old logs need to be deleted once they are no longer relevant. Every DBA knows this, and most DBAs have scripts that automate the task, because it quickly becomes burdensome as the number of databases and clusters under management grows. Up until 11g, this was a manual process that required custom shell scripts to help with the maintenance; however, in 11g Oracle introduced the Automatic Diagnostic Repository (ADR). Now, the latest versions of Trace File Analyzer have become integrated with ADR, which allows for streamlined maintenance of database and Grid Infrastructure logs.
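TFA exposes this log maintenance through the tfactl managelogs command. The sketch below assembles the two invocations most relevant here, one that reports space usage and one that purges by age. It is a hedged illustration: the options -show usage, -purge, -older, and -dryrun are assumptions based on the tfactl help text and may vary by TFA version, and the helper names are hypothetical.

```python
import re

# Age is a positive integer followed by m (minutes), h (hours), or d (days).
_AGE_PATTERN = re.compile(r"^[1-9][0-9]*[mhd]$")


def show_usage_command():
    """Build the assumed command that reports logfile space usage."""
    return ["tfactl", "managelogs", "-show", "usage"]


def purge_command(older_than, dry_run=True):
    """Build the assumed command that purges logs older than the given age.

    dry_run defaults to True so that nothing is deleted unless the caller
    explicitly asks for a real purge.
    """
    if not _AGE_PATTERN.match(older_than):
        raise ValueError("age must look like 30d, 12h, or 45m")
    command = ["tfactl", "managelogs", "-purge", "-older", older_than]
    if dry_run:
        command.append("-dryrun")
    return command


# Example use:
#   purge_command("30d")                 # estimate only
#   purge_command("30d", dry_run=False)  # actual purge
```

Validating the age string before building the command guards against a typo turning into an unintended purge window.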
Analyzing Logfile Space Usage
Purging Database Logfiles
TFA as a Health Check Tool
Trace File Analyzer can be used to analyze logfiles on several components over a range of time for errors or warnings. This can be extremely useful as an ongoing monitoring solution for large clusters.
New Health Check and Troubleshooting Features in 12.2
Oracle introduced a new health check and troubleshooting feature in 12.2 that can aid in identifying and resolving issues. Using this new feature in conjunction with the rest of the features outlined in this chapter should help to prepare a DBA for even the most vexing of performance issues.
Cluster Health Advisor
Summary
This chapter covered many troubleshooting and health check topics, both new and old, in the hope of raising your awareness of the options available to DBAs, especially the new features of TFA and Orachk and the latest features in 12.2.