CHAPTER 2


Clusterware Stack Management and Troubleshooting

by Syed Jaffar Hussain, Kai Yu

In Chapter 1, we mentioned that the Oracle RAC cluster database environment requires cluster manager software (“Clusterware”) that is tightly integrated with the operating system (OS) to provide the cluster management functions that enable the Oracle database in the cluster environment.

Oracle Clusterware was originally introduced in Oracle 9i on Linux under the name Oracle Cluster Management Services. Cluster Ready Services (CRS) as a generic cluster manager was introduced in Oracle 10.1 for all platforms and was renamed to today’s name, Oracle Clusterware, in Oracle 10.2. Since Oracle 10g, Oracle Clusterware has been a required component for Oracle RAC. On Linux and Windows systems, Oracle Clusterware is the only clusterware we need to run Oracle RAC, while on Unix, Oracle Clusterware can be combined with third-party clusterware such as Sun Cluster and Veritas Cluster Server.

Oracle Clusterware combines a group of servers into a cluster environment by enabling communication between the servers so that they work together as a single logical server. Oracle Clusterware serves as the foundation of the Oracle RAC database by managing its resources. These resources include Oracle ASM instances, database instances, Oracle databases, virtual IPs (VIPs), the Single Client Access Name (SCAN), SCAN listeners, Oracle Notification Service (ONS), and the Oracle Net listener. Oracle Clusterware is responsible for the startup and failover of these resources. Because Oracle Clusterware plays such a key role in the high availability and scalability of the RAC database, the system administrator and the database administrator should pay careful attention to its configuration and management.

This chapter describes the architecture and complex technical stack of Oracle Clusterware and explains how those components work. The chapter also describes configuration best practices and explains how to manage and troubleshoot the clusterware stack. The chapter assumes the latest version of Oracle Clusterware 12cR1.

The following topics will be covered in this chapter:

  • Oracle Clusterware 12cR1 and its components
  • Clusterware startup sequence
  • Clusterware management
  • Troubleshooting cluster stack startup failure
  • CRS logs and directory structure
  • RACcheck, diagcollection.sh, and oratop
  • Debugging and tracing CRS components
  • RAC database hang analysis

Clusterware 12cR1 and Its Components

Before Oracle 11gR2, Oracle Clusterware was a distinct product installed in a home directory separate from Oracle ASM and the Oracle RAC database. Since Oracle 11gR2, and in a standard 12cR1 cluster, Oracle Clusterware and Oracle ASM are combined into a single product called Grid Infrastructure and installed together into a single home directory. In Unix and Linux environments, some parts of the Grid Infrastructure installation are owned by the root user, and the rest is owned by a special user such as grid, distinct from the owner of the Oracle database software, typically oracle. The grid user also owns the Oracle ASM instance.

Only one version of Oracle Clusterware can be active at a time in the cluster, no matter how many different versions of Oracle Clusterware are installed on the cluster. The Clusterware version has to be the same as or higher than the Oracle Database version. Oracle 12cR1 Clusterware supports all Oracle RAC Database versions from 10gR1 to 12cR1. ASM is always the same version as Oracle Clusterware and can support Oracle Database versions from 10gR1 to 12cR1.

Oracle 12cR1 introduced Oracle Flex Cluster and Flex ASM, in which the architecture of Oracle Clusterware and Oracle ASM differs from the standard 12cR1 cluster. We will discuss Oracle Flex Cluster and Flex ASM in Chapter 5; this chapter focuses on the standard 12cR1 cluster.

Storage Components of Oracle Clusterware

Oracle Clusterware consists of a storage structure and a set of processes running on each cluster node. The storage structure consists of two pieces of shared storage, the Oracle Cluster Registry (OCR) and the voting disk (VD), plus two local files, the Oracle Local Registry (OLR) and the Grid Plug and Play (GPnP) profile.

OCR is used to store the cluster configuration details. It stores the information about the resources that Oracle Clusterware controls. The resources include the Oracle RAC database and instances, listeners, and virtual IPs (VIPs) such as SCAN VIPs and local VIPs.

The voting disk (VD) stores the cluster membership information. Oracle Clusterware uses the VD to determine which nodes are members of a cluster. Oracle Cluster Synchronization Service daemon (OCSSD) on each cluster node updates the VD with the current status of the node every second. The VD is used to determine which RAC nodes are still in the cluster should the interconnect heartbeat between the RAC nodes fail.

Both OCR and VD have to be stored in shared storage that is accessible to all the servers in the cluster. They can be stored in raw devices for 10g Clusterware or in block devices in 11gR1 Clusterware. With 11gR2 and 12cR1, a fresh installation must store them in an ASM disk group or a cluster file system. They are allowed to remain on raw devices and block devices if the Clusterware was just upgraded from 10g or 11gR1 to 11gR2; however, it is recommended that they be migrated to an ASM disk group or a cluster file system soon after the upgrade. If you want to upgrade your Clusterware and Database stored on raw or block devices to Oracle Clusterware 12c and Oracle Database 12c, you must move the database and OCR/VDs to ASM first, because Oracle 12c no longer supports raw or block storage. To avoid a single point of failure, Oracle recommends that you have multiple OCR copies, up to five. You should also have at least three VDs, always keeping an odd number of VDs. On Linux, the /etc/oracle/ocr.loc file records the OCR location:

$ cat /etc/oracle/ocr.loc
ocrconfig_loc=+VOCR
local_only=FALSE

In addition, you can use the following command to find the VD location:

$ ./crsctl query css votedisk

The Oracle ASM disk group is the recommended primary storage option for OCR and VD. Chapter 5 includes a detailed discussion of storing OCR and VDs in an ASM disk group.

Two other files, the OLR and the GPnP profile, are stored on the local file system of each RAC node, in the Grid home. The OLR is the local version of the OCR: it stores the metadata for the local node and is managed by the Oracle High Availability Services daemon (OHASD). The OLR stores less information than the OCR, but it can provide this metadata directly from local storage without the need to access the OCR stored in an ASM disk group. One OLR is configured for each node, and the default location is $GRID_HOME/cdata/<hostname>.olr. The location is also recorded in /etc/oracle/olr.loc, or you can check it with the ocrcheck command:

$ cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/12.1.0/grid/cdata/knewracn1.olr
crs_home=/u01/app/12.1.0/grid
 
$ ocrcheck -local -config
Oracle Local Registry configuration is :
  Device/File Name         : /u01/app/12.1.0/grid/cdata/knewracn1.olr

The GPnP profile records important information about the cluster, such as the network profile and the VD location. The information stored in the GPnP profile is used when adding a node to a cluster. Figure 2-1 shows an example of the GPnP profile. By default, this file is stored in $GRID_HOME/gpnp/<hostname>/profiles/peer/profile.xml.


Figure 2-1. GPnP profile
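Besides opening profile.xml directly, you can print the current profile with the gpnptool utility that ships with Grid Infrastructure; a minimal sketch (the get verb writes the profile XML to standard output):

$ $GRID_HOME/bin/gpnptool get    -- prints the current GPnP profile XML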

Clusterware Software Stack

Beginning with Oracle 11gR2, Oracle redesigned Oracle Clusterware into two software stacks: the High Availability Services stack and the CRS stack. Each of these stacks consists of several background processes, and together the processes of the two stacks provide the Clusterware functionality. Figure 2-2 shows the processes of the two stacks in Oracle 12cR1 Clusterware.


Figure 2-2. Oracle Clusterware 12cR1 stack

High Availability Cluster Service Stack

The High Availability Cluster Service stack is the lower stack of Oracle Clusterware. It is based on the Oracle High Availability Services (OHAS) daemon, which is responsible for starting all other clusterware processes. In the next section, we will discuss the details of the clusterware startup sequence.

OHAS uses and maintains the information in OLR. The High Availability Cluster Service stack consists of the following daemons and services:

GPnP daemon (GPnPD): This daemon accesses and maintains the GPnP profile and ensures that all the nodes have a current copy of it. When the OCR is stored in an ASM disk group, the OCR is not available during the initial startup of the clusterware because ASM itself is not yet up; the GPnP profile contains enough information to start the Clusterware.

Oracle Grid Naming Service (GNS): This process provides name resolution within the cluster. With 12cR1, a single GNS can serve multiple clusters, in contrast to the earlier single-cluster version.

Grid Interprocess Communication (GIPC): This daemon supports Grid Infrastructure communication by enabling Redundant Interconnect Usage.

Multicast Domain Name Service (mDNS): This daemon works with GNS to perform name resolution.

This stack also includes the System Monitor Service daemon (osysmond) and Cluster Logger Service daemon (ologgerd).

The CRS Stack

The CRS stack is the upper-level stack of Oracle Clusterware, which requires the support of the services in the lower High Availability Cluster Service stack. The CRS stack includes the following daemons and services:

CRS: This service is primarily responsible for managing high availability operations. The CRS daemon (CRSD) manages cluster resources’ start, stop, monitor, and failover operations. CRS maintains the configuration information in the OCR. If the cluster has an Oracle RAC database, the resources managed by CRS include the Oracle database and its instances, listeners, ASM instances, VIPs, and so on. This service runs as the crsd.bin process on Linux/Unix and as OracleOHService on Windows.

CSS: This service manages and monitors the node membership in the cluster and updates the node status information in VD. This service runs as the ocssd.bin process on Linux/Unix and OracleOHService (ocssd.exe) on Windows.

CSS Agent: This process monitors, starts, and stops the CSS. This service runs as the cssdagent process on Linux/Unix and cssdagent.exe on Windows.

CSS Monitor: This process works with the cssdagent process to provide I/O fencing, ensuring data integrity by rebooting the RAC node in case of an issue with the ocssd.bin process, CPU starvation, or an OS lockup. This service runs as cssdmonitor on Linux/Unix or cssdmonitor.exe on Windows. Both cssdagent and cssdmonitor are new features introduced in 11gR2 that replace the Oracle Process Monitor daemon (oprocd) used in 11gR1.

Cluster Time Synchronization Service (CTSS): A daemon process introduced with 11gR2 that handles time synchronization among all the nodes in the cluster. You can use the OS’s Network Time Protocol (NTP) service to synchronize the time; if you disable the NTP service, CTSS will provide the time synchronization service instead. This service runs as the octssd.bin process on Linux/Unix or octssd.exe on Windows.
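You can check whether CTSS is actively synchronizing time or merely observing NTP with crsctl; a quick sketch (the exact message text may vary by version):

$ crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.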

Event Management (EVM): This background process publishes events to all the members of the cluster. On Linux/Unix, the process name is evmd.bin, and on Windows, it is evmd.exe.

ONS: This is the publish and subscribe service that communicates Fast Application Notification (FAN) events. This service is the ons process on Linux/Unix and ons.exe on Windows.

Oracle ASM: Provides the volume manager and shared storage management for Oracle Clusterware and Oracle Database.

Clusterware agent processes: the Oracle Agent (oraagent) and the Oracle Root Agent (orarootagent). The oraagent is responsible for managing all Oracle-owned resources, and the orarootagent for managing all root-owned resources; as the startup sequence in the next section shows, both OHASD and CRSD spawn their own pairs of these agents.

Clusterware Startup Sequence

Oracle Clusterware is started up automatically when the RAC node starts. This startup process runs through several levels. Figure 2-3 shows the multiple-level startup sequences to start the entire Grid Infrastructure stack plus the resources that Clusterware manages.


Figure 2-3. Startup sequence of 12cR1 Clusterware processes

Level 0: The OS automatically starts Clusterware through the OS’s init process. The init process spawns only one init.ohasd process, which in turn starts the OHASD process. This is configured in the /etc/inittab file:

$ cat /etc/inittab | grep init.d | grep -v grep
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

Oracle Linux 6.x and Red Hat Enterprise Linux 6.x have deprecated inittab; on those releases, init.ohasd is configured to start through /etc/init/oracle-ohasd.conf:

$ cat /etc/init/oracle-ohasd.conf
......
 
start on runlevel [35]
stop  on runlevel [!35]
respawn
exec /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

This starts up "init.ohasd run", which in turn starts up the ohasd.bin background process:

$ ps -ef | grep ohasd | grep -v grep
root      4056     1  1 Feb19 ?        01:54:34 /u01/app/12.1.0/grid/bin/ohasd.bin reboot
root     22715    1  0 Feb19 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run

Once OHASD is started at Level 0, OHASD is responsible for starting the rest of the Clusterware and the resources that Clusterware manages, directly or indirectly, through Levels 1-4. The following discussion walks through the four levels of the cluster startup sequence shown in Figure 2-3.

Level 1: OHASD directly spawns four agent processes:

  • cssdmonitor: CSS Monitor
  • OHASD orarootagent: High Availability Service stack Oracle root agent
  • OHASD oraagent: High Availability Service stack Oracle agent
  • cssdagent: CSS Agent

Level 2: On this level, OHASD oraagent spawns five processes:

  • mDNSD: mDNS daemon process
  • GIPCD: Grid Interprocess Communication
  • GPnPD: GPnP profile daemon
  • EVMD: Event Monitor daemon
  • ASM: Resource for monitoring ASM instances

Then, OHASD orarootagent spawns the following processes:

  • CRSD: CRS daemon
  • CTSSD: CTSS daemon
  • Diskmon: Disk Monitor daemon (Exadata Storage Server storage)
  • ACFS: (ASM Cluster File System) Drivers

Next, the cssdagent starts the CSSD (CSS daemon) process.

Level 3: The CRSD spawns two CRSD agents: CRSD orarootagent and CRSD oraagent.

Level 4: On this level, the CRSD orarootagent is responsible for starting the following resources:

  • Network resource: for the public network
  • SCAN VIPs
  • Node VIPs: VIPs for each node
  • ACFS Registry
  • GNS VIP: VIP for GNS if you use the GNS option

Then, the CRSD oraagent is responsible for starting the rest of the resources as follows:

  • ASM Resource: ASM Instance(s) resource
  • Diskgroup: Used for managing/monitoring ASM diskgroups.
  • DB Resource: Used for monitoring and managing the DB and instances
  • SCAN listener: Listener for SCAN listening on SCAN VIP
  • SCAN VIP: Single Client Access Name VIP
  • Listener: Node listener listening on the Node VIP
  • Services: Database services
  • ONS
  • eONS: Enhanced ONS
  • GSD: For 9i backward compatibility
  • GNS (optional): performs name resolution

ASM and Clusterware: Which One is Started First?

If you have used Oracle RAC 10g or 11gR1, you might remember that the Oracle Clusterware stack had to be up before the ASM instance could start on the node. Because since 11gR2 the OCR and VD can also be stored in ASM, the million-dollar question in everyone’s mind is, “Which one is started first?” This section will answer that interesting question.

The Clusterware startup sequence that we just discussed gives the answer: ASM is part of the CRS stack of the Clusterware, and it is started (at Level 2 in Figure 2-3) after the High Availability Services stack is up and before CRSD is started. Then, the question becomes, “How does the Clusterware get the stored cluster configuration and the cluster membership information, which are normally stored in OCR and VD, respectively, without starting an ASM instance?” The answer is that during the startup of the High Availability Services stack, Oracle Clusterware gets the clusterware configuration from the OLR and the GPnP profile instead of from the OCR. Because these two components are stored in $GRID_HOME on the local disk, neither the ASM instance nor an ASM disk group is needed for the startup of the High Availability Services stack. Nor does Oracle Clusterware rely on an ASM instance to access the VD: the location of the VD file is recorded in the ASM disk header, and we can see that location information with the following command:

$  kfed read /dev/dm-8 | grep -E 'vfstart|vfend'
kfdhdb.vfstart:                     352 ; 0x0ec: 0x00000160
kfdhdb.vfend:                       384 ; 0x0f0: 0x00000180

The kfdhdb.vfstart value is the beginning AU offset of the VD file, and kfdhdb.vfend indicates the ending AU offset. Oracle Clusterware uses the values of kfdhdb.vfstart and kfdhdb.vfend to locate the VD file.
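To translate these AU offsets into byte offsets, you can read the AU size from the same disk header. A small sketch, assuming the common default AU size of 1 MB (output abbreviated):

$ kfed read /dev/dm-8 | grep ausize
kfdhdb.ausize:                  1048576 ; ...

With a 1 MB AU, kfdhdb.vfstart = 352 corresponds to byte offset 352 x 1,048,576 = 369,098,752 from the beginning of the disk.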

In this example, /dev/dm-8 is the disk of the ASM disk group VOCR, which stores the VD file, as shown by running the following command:

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   7141f13f99734febbf94c73148c35a85 (/dev/dm-8) [VOCR]
                                               Located 1 voting disk(s).

Clusterware Management

The Grid Infrastructure Universal Installer takes care of the installation and configuration of Oracle Clusterware and the ASM instance. After installation, the Clusterware and ASM are restarted automatically every time the server starts. Most of the time, this entire stack works well without much manual intervention. However, as the most important infrastructure for Oracle RAC, this stack does need proper management and ongoing maintenance. Oracle Clusterware provides several tools, utilities, and log files to help a Clusterware admin perform management, troubleshooting, and diagnostic work. This section will discuss tools and Clusterware management, and the next few sections will discuss Clusterware troubleshooting and diagnosis.

Clusterware Management Tools and Utilities

Oracle provides a set of tools and utilities that can be used for Oracle Grid Infrastructure management. The most commonly used tool is the Clusterware control utility crsctl, a command-line tool for managing Oracle Clusterware. Oracle Clusterware 11gR2 added cluster-aware commands to crsctl that allow you to check, start, and stop the clusterware from any node. Use crsctl -help to print Help for all crsctl commands.

$ crsctl -help
Usage: crsctl add       - add a resource, type, or other entity
crsctl backup    - back up voting disk for CSS
crsctl check     - check a service, resource, or other entity
crsctl config    - output autostart configuration
crsctl debug     - obtain or modify debug state
crsctl delete    - delete a resource, type, or other entity
crsctl disable   - disable autostart
crsctl discover  - discover DHCP server
crsctl enable    - enable autostart
crsctl eval      - evaluate operations on resource or other entity without performing them
crsctl get       - get an entity value
crsctl getperm   - get entity permissions
crsctl lsmodules - list debug modules
crsctl modify    - modify a resource, type, or other entity
crsctl query     - query service state
crsctl pin       - pin the nodes in the nodelist
crsctl relocate  - relocate a resource, server, or other entity
crsctl replace   - replace the location of voting files
crsctl release   - release a DHCP lease
crsctl request   - request a DHCP lease or an action entrypoint
crsctl setperm   - set entity permissions
crsctl set       - set an entity value
crsctl start     - start a resource, server, or other entity
crsctl status    - get status of a resource or other entity
crsctl stop      - stop a resource, server, or other entity
crsctl unpin     - unpin the nodes in the nodelist
crsctl unset     - unset an entity value, restoring its default

You can get the detailed syntax of a specific command with, for example, crsctl status -help. Starting with 11gR2, crsctl commands replace a few deprecated crs_* commands, such as crs_start, crs_stat, and crs_stop. In the following sections, we discuss the management tasks together with the corresponding crsctl commands.

Another set of command-line tools are based on the srvctl utility. These commands are used to manage the Oracle resources managed by the Clusterware.

A srvctl command consists of four parts: the srvctl keyword itself, a command, an object, and options:

$ srvctl <command> <object> [<options>]

The command part specifies the operation, and the object part specifies the resource on which the operation will be executed. You can get Help on the detailed syntax of srvctl by running srvctl -h. For detailed Help on each command and object and its options, run the following commands:

$ srvctl <command> -h or
$ srvctl <command> <object> -h
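For example, using the knewdb database that appears in the resource listing later in this chapter, a few illustrative srvctl commands might look like this (the database name is taken from that output; adjust for your environment):

$ srvctl status database -d knewdb               -- shows the state of all knewdb instances
$ srvctl config database -d knewdb               -- shows the stored configuration of knewdb
$ srvctl stop database -d knewdb -o immediate    -- stops all instances with the immediate option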

There are also other utilities:

  • oifcfg is a command-line tool that can be used to configure network interfaces (see the example following this list).
  • ocrconfig is a command-line tool that can be used to administer the OCR and OLR.
  • ocrcheck is the OCR Check tool to check the state of the OCR.
  • ocrdump is the Oracle Clusterware Registry Dump tool that can be used to dump the contents of OCR.
  • Oracle Enterprise Manager Database Control 11g, Enterprise Manager Grid Control 11g, and Enterprise Manager Cloud Control 12c can be used to manage the Oracle Clusterware environment.
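As an illustration of the first utility, oifcfg getif lists the interfaces registered with the Clusterware. The output below is a sketch; the interface names and subnets are illustrative, loosely based on the SCAN configuration shown later in this chapter:

$ oifcfg getif
eth0  172.16.0.0   global  public
eth1  192.168.9.0  global  cluster_interconnect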

Start Up and Stop Clusterware

As we discussed in the previous section, through the OS init process, Oracle Clusterware is automatically started up when the OS starts. The clusterware can also be manually started and stopped by using the crsctl utility.

The crsctl utility provides the commands to start up the Oracle Clusterware manually:

Start the Clusterware stack on all servers in the cluster, or on one or more named servers in the cluster:

$ crsctl start cluster [-all | -n server1[,...]]

For example:

$ crsctl start cluster -all
$ crsctl start cluster -n k2r720n1

Start the Oracle High Availability Services daemon (OHASD) and the Clusterware service stack together on the local server only:

$ crsctl start crs

Both of these crsctl startup commands require root privilege on Linux/Unix to run. The 'crsctl start crs' command will fail if OHASD has already been started.

The crsctl utility also provides similar commands to stop Oracle Clusterware manually. Stopping the clusterware manually also requires root privilege on Linux/Unix.

The following command stops the clusterware stack on the local node, on all nodes, or on specified local or remote nodes. Without the [-f] option, this command stops the resources gracefully; with the [-f] option, it forces the Oracle Clusterware stack to stop, along with the resources that Oracle Clusterware manages.

$ crsctl stop cluster [-all | -n server_name[,...]] [-f]

The following command stops the Oracle High Availability service on the local server. Use the [-f] option to force any resources to stop, as well as to stop the Oracle High Availability service:

$ crsctl stop crs [-f]

Managing Oracle Clusterware

You can use the following command to check the cluster status:

$ crsctl check cluster [-all]
 
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Check the CRS status with the following command:

$ crsctl check crs
 
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Check the OHASD status:

$GRID_HOME/bin/crsctl check has
 
CRS-4638: Oracle High Availability Services is online

Check the current status of all the resources using the following command, which replaces the crs_stat -t command of 11gR1 and earlier:

[grid@knewracn1 ~]$ crsctl status resource -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       knewracn1                STABLE
               ONLINE  ONLINE       knewracn2                STABLE
               ONLINE  ONLINE       knewracn4                STABLE
ora.DATA1.dg
               ONLINE  ONLINE       knewracn1                STABLE
               ONLINE  ONLINE       knewracn2                STABLE
               ONLINE  ONLINE       knewracn4                STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       knewracn1                STABLE
               ONLINE  ONLINE       knewracn2                STABLE
               ONLINE  ONLINE       knewracn4                STABLE
ora.LISTENER_LEAF.lsnr
               OFFLINE OFFLINE      knewracn5                STABLE
               OFFLINE OFFLINE      knewracn6                STABLE
               OFFLINE OFFLINE      knewracn7                STABLE
               OFFLINE OFFLINE      knewracn8                STABLE
ora.net1.network
               ONLINE  ONLINE       knewracn1                STABLE
               ONLINE  ONLINE       knewracn2                STABLE
               ONLINE  ONLINE       knewracn4                STABLE
ora.ons
               ONLINE  ONLINE       knewracn1                STABLE
               ONLINE  ONLINE       knewracn2                STABLE
               ONLINE  ONLINE       knewracn4                STABLE
ora.proxy_advm
               ONLINE  ONLINE       knewracn1                STABLE
               ONLINE  ONLINE       knewracn2                STABLE
               ONLINE  ONLINE       knewracn4                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       knewracn2                STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       knewracn4                STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       knewracn1                STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       knewracn1                169.254.199.3 192.168.9.41,STABLE
ora.asm
      1        ONLINE  ONLINE       knewracn1                STABLE
      2        ONLINE  ONLINE       knewracn2                STABLE
      3        ONLINE  ONLINE       knewracn4                STABLE
ora.cvu
      1        ONLINE  ONLINE       knewracn1                STABLE
ora.gns
      1        ONLINE  ONLINE       knewracn1                STABLE
ora.gns.vip
      1        ONLINE  ONLINE       knewracn1                STABLE
ora.knewdb.db
      1        ONLINE  ONLINE       knewracn2                Open,STABLE
      2        ONLINE  ONLINE       knewracn4                Open,STABLE
      3        ONLINE  ONLINE       knewracn1                Open,STABLE
ora.knewracn1.vip
      1        ONLINE  ONLINE       knewracn1                STABLE
ora.knewracn2.vip
      1        ONLINE  ONLINE       knewracn2                STABLE
ora.knewracn4.vip
      1        ONLINE  ONLINE       knewracn4                STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       knewracn1                Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       knewracn1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       knewracn2                STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       knewracn4                STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       knewracn1                STABLE
-------------------------------------------------------------------------------------

These commands can be executed by the root user, grid (the GI owner), or oracle (the RAC owner). You can also disable or enable the automatic startup of the entire Clusterware stack:

$GRID_HOME/bin/crsctl disable crs
$GRID_HOME/bin/crsctl enable crs

Managing OCR and the Voting Disk

Oracle provides three tools to manage OCR: ocrconfig, ocrdump, and ocrcheck. The ocrcheck command lists the OCR and its mirrors.

The following example lists the OCR location in the +VOCR diskgroup and its mirror in the +DATA1 diskgroup. In 11gR2 and 12cR1, the OCR can have up to five mirrored copies; each copy can be stored in an ASM diskgroup or on a cluster file system:

$ ocrcheck
Status of Oracle Cluster Registry is as follows:
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3192
         Available space (kbytes) :     258928
         ID                       : 1707636078
         Device/File Name         :      +VOCR
                                    Device/File integrity check succeeded
         Device/File Name         :     +DATA1
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
 
         Cluster registry integrity check succeeded
 
         Logical corruption check bypassed due to non-privileged user
You can also use the ocrconfig command to add/delete/replace OCR files. For example, add another mirror of the OCR in +DATA2:

$GRID_HOME/bin/ocrconfig -add +DATA2
 
Or remove the OCR copy from +DATA1:

$GRID_HOME/bin/ocrconfig -delete +DATA1

The ocrdump command can be used to dump the contents of the OCR to a .txt or .xml file. It can be executed only by the root user, and the default file name is OCRDUMPFILE:

$ ./ocrdump
$ ls -l OCRDUMPFILE
-rw------- 1 root root 212551 Dec 28 20:21 OCRDUMPFILE

The OCR is backed up automatically every four hours on at least one of the nodes in the cluster. The backups are stored in the $GRID_HOME/cdata/<cluster_name> directory. To show the backup information, use the ocrconfig -showbackup command:

$GRID_HOME/bin/ocrconfig -showbackup
 
knewracn1     2013/03/02 07:01:37     /u01/app/12.1.0/grid/cdata/knewrac/backup00.ocr
knewracn1     2013/03/02 03:01:33     /u01/app/12.1.0/grid/cdata/knewrac/backup01.ocr
 
knewracn1     2013/03/01 23:01:32     /u01/app/12.1.0/grid/cdata/knewrac/backup02.ocr
knewracn1     2013/03/01 03:01:21     /u01/app/12.1.0/grid/cdata/knewrac/day.ocr
knewracn1     2013/02/20 02:58:55     /u01/app/12.1.0/grid/cdata/knewrac/week.ocr
knewracn1     2013/02/19 23:15:34     /u01/app/12.1.0/grid/cdata/knewrac/backup_20130219_231534.ocr
knewracn1     2013/02/19 23:05:26     /u01/app/12.1.0/grid/cdata/knewrac/backup_20130219_230526.ocr
.....
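In addition to the automatic backups, you can take an on-demand OCR backup with the -manualbackup option; a short sketch, run as the root user:

$ ocrconfig -manualbackup        -- takes a manual OCR backup now
$ ocrconfig -showbackup manual   -- lists only the manual backups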

The steps to restore OCR from a backup file are as follows (a consolidated command sketch follows the list):

  1. Identify the backup by using the ocrconfig -showbackup command.
  2. Stop the clusterware on all the cluster nodes.
  3. Perform the restore with the restore command:
    ocrconfig -restore file_name
  4. Restart the crs and do an OCR integrity check by using cluvfy comp ocr.
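Put together, a minimal restore session might look like the following sketch, run as the root user and reusing the backup00.ocr file from the earlier showbackup listing. Note that when the OCR resides in an ASM disk group, you may first need to start the stack in exclusive mode (see the “Clusterware Exclusive Mode” section later in this chapter):

$ crsctl stop cluster -all
$ ocrconfig -restore /u01/app/12.1.0/grid/cdata/knewrac/backup00.ocr
$ crsctl start cluster -all
$ cluvfy comp ocr -n all -verbose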

You can use the following command to check the VD location:

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   7141f13f99734febbf94c73148c35a85 (/dev/dm-8) [VOCR]
                                 Located 1 voting disk(s).

To move the VD to another location, you can use the following crsctl command:

$GRID_HOME/bin/crsctl replace votedisk +DATA3

Managing CRS Resources

The srvctl utility can be used to manage the resources that the Clusterware manages. The resources include database, instance, service, nodeapps, vip, asm, diskgroup, listener, scan, scan listener, server pool, server, oc4j, home, file system, and gns. The managed resource is specified in the <object> part of the command, and the management operation to perform on it is specified in the <action> part. The operations include enable, disable, start, stop, relocate, status, add, remove, modify, getenv, setenv, unsetenv, config, convert, and upgrade.

srvctl <action> <object> [<options>]

Here are a few examples of SRVCTL commands.

Check the SCAN configuration of the cluster:

$ srvctl config scan
SCAN name:knewrac-scan.kcloud.dblab.com, Network: 1
Subnet IPv4: 172.16.0.0/255.255.0.0/eth0
Subnet IPv6:
SCAN 0 IPv4 VIP: -/scan1-vip/172.16.150.40
SCAN name:knewrac-scan.kcloud.dblab.com, Network: 1
Subnet IPv4: 172.16.0.0/255.255.0.0/eth0
Subnet IPv6:
SCAN 1 IPv4 VIP: -/scan2-vip/172.16.150.83
SCAN name:knewrac-scan.kcloud.dblab.com, Network: 1
Subnet IPv4: 172.16.0.0/255.255.0.0/eth0
Subnet IPv6:
SCAN 2 IPv4 VIP: -/scan3-vip/172.16.150.28

Check the node VIP status on knewracn1:

$ srvctl status vip  -n knewracn1
VIP 172.16.150.37 is enabled
VIP 172.16.150.37 is running on node: knewracn1

Check the node apps on knewracn1:

$ srvctl status nodeapps  -n knewracn1
VIP 172.16.150.37 is enabled
VIP 172.16.150.37 is running on node: knewracn1
Network is enabled
Network is running on node: knewracn1
ONS is enabled
ONS daemon is running on node: knewracn1

Adding and Removing Cluster Nodes

The flexibility of Oracle Clusterware is exhibited through its ability to scale the existing cluster up and down online, adding and removing nodes to meet the demands of the business. This section outlines the procedures to add and remove nodes from an existing cluster.

Adding a Node

Assume that you have a two-node cluster environment and want to bring in an additional node (named rac3) to scale up the existing cluster, and that the node that is going to join the cluster meets all prerequisites essential to begin the node addition procedure.

Adding a new node to the existing cluster typically consists of the following stages:

  • Cloning Grid Infrastructure Home (cluster/ASM)
  • Cluster configuration
  • Cloning RDBMS home

When the new node is ready with all necessary prerequisites to become part of the existing cluster, such as storage, network, OS, and patches, use the following step-by-step procedure to add the node:

From the first node of the cluster, execute the following command to initiate integrity verification checks for the cluster and on the node that is going to be part of the cluster:

$ cluvfy stage -pre nodeadd -n rac3 -fixup -verbose

When no verification check failures are reported, use the following example to launch the procedure to add the node, assuming that the Dynamic Host Configuration Protocol (DHCP) and Grid Naming Service (GNS) are not configured in the current environment:

$ $GRID_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={rac3}"
              "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac3-vip}"

Use the following example when adding to the Flex Cluster setup:

$ $GRID_HOME/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={rac3}"
 "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac3-vip}" "CLUSTER_NEW_NODE_ROLES={hub}"

Execute the root.sh script as the root user when prompted on the node that is joining the cluster. The script will initialize cluster configuration and start up the cluster stack on the new node.

After successfully completing the procedure to add a new node, perform post node add verification checks from any cluster node using the following example:

$ cluvfy stage -post nodeadd -n rac3
$ crsctl check cluster -all   -- verify the cluster health from all nodes
$ olsnodes -n                 -- list all existing nodes in the cluster

After a successful node addition, execute the following from $ORACLE_HOME to clone the Oracle RDBMS software over the new node to complete the node addition procedure:

$ORACLE_HOME/oui/bin/addNode.sh "CLUSTER_NEW_NODES={rac3}" 

When prompted, execute the root.sh script as the root user on the new node.

Once the new node is successfully added to the cluster, run the post-nodeadd verification once more to confirm the final state:

$ cluvfy stage -post nodeadd -n rac3 -verbose

Removing a Node

Assume that you have a three-node cluster environment and want to delete the rac3 node from the existing cluster. Ensure that the node to be dropped has no databases, instances, or other services running; if any exist, either drop them or move them to other nodes in the cluster. The following steps outline the procedure to remove a node from an existing cluster:

The node that is going to be removed shouldn’t be pinned. If it is, unpin the node prior to starting the procedure. The following examples demonstrate how to identify whether a node is pinned and how to unpin it:

$ olsnodes -n -s -t

You will get the following typical output if the nodes are pinned in the cluster:

rac1        1       Active  Pinned
rac2        2       Active  Pinned
rac3        3       Active  Pinned

Ensure that the cluster stack is up and running on node rac3. If the cluster is inactive on the node, you first need to bring the cluster up on that node before commencing the procedure to delete it.

Execute the following command as the root user from any node if the node that is going to be removed is pinned:

$ crsctl unpin css -n rac3

Run the following command as the Grid Infrastructure software owner on the node that is going to be removed:

$GRID_HOME/deinstall/deinstall -local

Note   The -local argument must be specified to remove the local node; otherwise, the cluster will be deinstalled from every node of the cluster.

Run the following command as the root user from an active node in a cluster:

$ crsctl delete node -n rac3

From any active node, execute the following command to update the Oracle inventory for GI and RDBMS homes across all nodes:

$ $GRID_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$GRID_HOME "CLUSTER_NODES={rac1,rac2}" CRS=TRUE -silent
 
$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={rac1,rac2}" -silent

When you specify the -silent option, the installer runs in silent mode and therefore doesn’t display any interactive screens; in other words, it runs in non-interactive mode.

From any active node, verify the post-node deletion:

$ cluvfy stage -post nodedel -n rac3 -verbose
$ olsnodes -n -s -t

Clean up the following directories manually on the node that was just dropped:

/etc/oraInst.loc, /etc/oratab, /etc/oracle/, /tmp/.oracle, /opt/ORCLfmap
Also remove the file systems where the cluster and RDBMS software was installed.

Troubleshooting Common Clusterware Stack Start-Up Failures

Various factors can prevent the cluster stack from coming up automatically after a node eviction, failure, or reboot, or when cluster startup is initiated manually. This section covers some key facts and guidelines that help in troubleshooting common causes of cluster stack startup failures. Though the symptoms discussed here are not exhaustive, the key points explained in this section provide a solid perspective for diagnosing common startup failures of the various cluster daemon processes.

Imagine this scenario: after a node failure or a manual cluster shutdown, the subsequent cluster startup doesn’t start the Clusterware as expected. Upon verifying the cluster or CRS health status, the DBA encounters one of the following error messages:

$GRID_HOME/bin/crsctl check cluster
 
CRS-4639: Could not contact Oracle High Availability Services
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Check failed, or completed with errors
OR
 
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

OHASD startup failures: This section explains how to diagnose common startup failures of the Oracle High Availability Services (OHAS) daemon process and provides workarounds for the following issues:

CRS-4639: Could not contact Oracle High Availability Services
 
OR
 
CRS-4124: Oracle High Availability Services startup failed
CRS-4000: Command Start failed, or completed with errors

First, review the Clusterware alert and ohasd.log files to identify the root cause for the daemon startup failures.

Verify the existence of the ohasd pointer in the OS-specific configuration file (/etc/inittab, or the /etc/init directory on newer Linux releases), as follows:

h1:3:respawn:/sbin/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

This pointer should have been added automatically upon cluster installation or upgrade. If no pointer is found, add the preceding entry toward the end of the file and, as the root user, either start the cluster manually or have init re-read the file so that the process starts automatically.

If the ohasd pointer exists, the next thing to check is the cluster high availability daemon auto-start configuration. Use the following commands as the root user to confirm the auto startup configuration:

$ $GRID_HOME/bin/crsctl config has   -- High Availability Service
$ $GRID_HOME/bin/crsctl config crs   -- Cluster Ready Service
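When autostart is enabled, these commands return output similar to the following:

CRS-4622: Oracle High Availability Services autostart is enabled.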

Optionally, you can also check the files under the /var/opt/oracle/scls_scr/hostname/root or /etc/oracle/scls_scr/hostname/root location (the path is platform dependent) to identify whether the auto config is enabled or disabled.

As the root user, enable auto start and bring up the cluster manually on the local node when auto startup is not configured. If autostart is disabled, you will see the following message:

CRS-4621: Oracle High Availability Services autostart is disabled.
 
Use the following examples to enable has/crs auto-start and to start the stack:

$ $GRID_HOME/bin/crsctl enable has   -- turns on auto startup option of ohasd
$ $GRID_HOME/bin/crsctl enable crs   -- turns on auto startup option of crs
 
$ $GRID_HOME/bin/crsctl start has    -- initiates OHASD daemon startup
$ $GRID_HOME/bin/crsctl start crs    -- initiates CRS daemon startup

If, despite the preceding steps, the ohasd daemon process doesn’t start and the problem persists, you need to examine the component-specific trace files to troubleshoot and identify the root cause. Follow these guidelines:

Verify the existence of the ohasd daemon process on the OS. From the command-line prompt, execute the following:

$ ps -ef | grep init.ohasd

Examine OS platform-specific log files to identify any errors (refer to the operating system logs section later in this chapter for more details).

Refer to the ohasd.log trace file under the $GRID_HOME/log/hostname/ohasd location, as this file contains useful information about the symptoms.

Address any OLR issues that are reported in the trace file. If OLR corruption or inaccessibility is reported, repair or resolve the issue by taking the appropriate action. In the case of a restore, restore the OLR from a previous valid backup using the ocrconfig -local -restore $backup_location/backup_filename.olr command.

Verify Grid Infrastructure directory ownership and permission using OS level commands.

Additionally, remove the cluster startup socket files from the /var/tmp/.oracle, /usr/tmp/.oracle, or /tmp/.oracle directory and start up the cluster manually. Which of these directories exists depends on the operating system.
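For example, on Linux the socket files usually live under /var/tmp/.oracle. A cautious sketch, run as the root user only after confirming that the stack is completely down on the node:

$ crsctl stop crs -f           -- make sure the stack is down on this node
$ rm -rf /var/tmp/.oracle/*    -- remove the stale socket files
$ crsctl start crs             -- start the stack again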

CSSD startup issues: If the CSSD process fails to start or is reported as unhealthy, the following guidelines help in identifying the root cause of the issue:

Error: CRS-4530: Communications failure contacting Cluster Synchronization Services daemon:

Review the Clusterware alert.log and ocssd.log file to identify the root cause of the issue.

Verify the CSSD process on the OS:

$ ps -ef | grep cssd.bin

Examine the alert_hostname.log and ocssd.log logs to identify the possible causes that are preventing the CSSD process from starting.

Ensure that the node can access the VDs; run the crsctl query css votedisk command to verify accessibility. If the node can’t access the VD files for any reason, check the disk permissions and ownership and check for logical corruption, then take the appropriate action to resolve the issue, either by resetting the ownership and permissions or by restoring the corrupted file.
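A quick way to inspect the permissions and ownership of the underlying device, reusing the /dev/dm-8 disk from the earlier examples (the output is illustrative; the expected owner and group, such as grid:asmadmin, depend on your installation):

$ ls -l /dev/dm-8
brw-rw---- 1 grid asmadmin 253, 8 Mar  2 10:12 /dev/dm-8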

If any heartbeat (network|disk) problems are reported in the logs mentioned earlier, verify the private interconnect connectivity and other network-related settings on the node.

If the VD files are placed on ASM, ensure that the ASM instance is up. In case the ASM instance is not up, refer to the ASM instance alert.log to identify the instance’s startup issues.

Use the following command to verify the status of asm, cluster_interconnect, cssd, and other cluster resources:

$ crsctl stat res -init -t
 
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started,STABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE      rac1                     STABLE
ora.crsd
      1        ONLINE  OFFLINE      rac1                     STABLE
ora.cssd
      1        ONLINE  OFFLINE      rac1                     STABLE
ora.cssdmonitor
      1        ONLINE  UNKNOWN      rac1                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac1                     STABLE
 
If you find that the ora.cluster_interconnect.haip resource is OFFLINE, you might need to verify the interconnect connectivity and check the network settings on the node. You can also try to start the offline resource manually using the following command:

$GRID_HOME/bin/crsctl start res ora.cluster_interconnect.haip -init

Bring up the offline cssd daemon manually using the following command:

$GRID_HOME/bin/crsctl start res ora.cssd -init

The following output will be displayed on your screen:

CRS-2679: Attempting to clean 'ora.cssdmonitor' on 'rac1'
CRS-2681: Clean of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded

CRSD startup issues: When CRSD-related startup and other issues are reported, the following guidelines help in troubleshooting the root cause of the problem:

CRS-4535: Cannot communicate with Cluster Ready Services:

Verify the CRSD process on the OS:

$ ps -ef | grep crsd.bin

Examine the crsd.log to look for any possible causes that prevent the CRSD from starting.

Ensure that the node can access the OCR files; run the ocrcheck command to verify. If the node can’t access the OCR files, check the following:

Check the OCR disk permission and ownership.

If OCR is placed on the ASM diskgroup, ensure that the ASM instance is up and that the appropriate diskgroup is mounted.

Repair any OCR-related issue encountered, if needed.

Use the following command to ensure that the CRSD daemon process is ONLINE:

$ crsctl stat res -init -t
 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.crsd
      1        ONLINE  OFFLINE       rac1

You can also start the individual daemon manually using the following command:

$GRID_HOME/bin/crsctl start res ora.crsd -init

In case the Grid Infrastructure malfunctions or its resources are reported as being unhealthy, you need to ensure the following:

  • Sufficient free space must be available under the $GRID_HOME and $ORACLE_HOME filesystem for the cluster to write the events in the respective logs for each component.
  • Ensure enough system resource availability, in terms of CPU and memory.
  • Start up any individual resource that is OFFLINE.

Clusterware Exclusive Mode

Beginning with 11gR2 (11.2.0.2), the cluster stack can be started in an exclusive mode to carry out a few exclusive cluster maintenance tasks, such as restoring OCR and VDs, troubleshooting root.sh issues, and so on. To start the cluster in this mode on a particular node, the cluster stack must not be active on any other node in the cluster. When a cluster is started in exclusive mode, no VD or network is required. Use the following command as the root user to bring up the cluster in exclusive mode:

$ crsctl start crs -excl [-nocrs] [-nowait]

With the -nocrs argument, Oracle Clusterware is started without the CRSD process, and with the -nowait argument, the Clusterware start doesn’t depend on the start of the Oracle High Availability Services (ohasd) daemon.
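As an illustration, a typical VD repair session in exclusive mode might look like the following sketch, run as the root user (the disk group name is reused from the earlier replace votedisk example):

$ crsctl start crs -excl -nocrs    -- start the stack in exclusive mode, without CRSD
$ crsctl replace votedisk +DATA3   -- re-create the voting files in the disk group
$ crsctl stop crs -f               -- stop the exclusive-mode stack
$ crsctl start crs                 -- restart the stack normally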

Troubleshooting OCR Issues

In the event of any OCR issues, such as logical corruption, wrong permissions or ownership, integrity failures, or the loss of a mirror copy, the following troubleshooting and workaround methods are extremely helpful for identifying the root cause and resolving the issues:

Verify the OCR file integrity using the following cluster utilities:

$ ocrcheck                           -- verifies OCR integrity & logical corruption
$ ocrcheck -config                   -- lists OCR disk location and names
$ ocrcheck -local -config            -- lists OLR name and location
$ cluvfy comp ocr -n all -verbose    -- verifies integrity from all nodes
$ cluvfy comp ocr -n rac1 -verbose   -- verifies integrity on the local node

With the ocrdump utility, you can dump either the entire contents or just a section from the OCR file into a text file. The following commands achieve that:

$ ocrdump <filename.txt>
-- to obtain detailed output, run the command as the root user

With the preceding command issued, the OCR contents will be dumped into a text file; if the output filename is not given, a file named OCRDUMPFILE will be generated in the current directory.

$ ocrdump -stdout -keyname SYSTEM.css [-xml]

The preceding command lists the css section-specific contents of the current OCR file; the contents are displayed at the prompt unless the output is redirected to a file.

$ ocrdump -backupfile <filename and location>
-- will dump specific backup contents

Diagnose, Debug, Trace Clusterware and RAC Issues

When the default debugging information generated by the Oracle Clusterware processes doesn’t provide enough clues to reach a conclusion about a problem, it is necessary to increase the default trace levels of specific components and their subcomponents to get comprehensive information about the problem. The default tracing level of Clusterware components is set to 2, which is sufficient in most cases.

In the following sections, we will demonstrate how to modify, enable, and disable the debugging tracing levels of various cluster components and their subcomponents using the cluster commands.

To understand and list various cluster attributes and their default settings under a specific Clusterware component, use the following example command:

$ crsctl stat res ora.crsd -init -t -f

The output from the preceding example helps you find the default settings for all attributes of a specific component, like stop/start dependencies, logging/trace levels, auto-start behavior, failure settings, timeouts, and so on.

Debugging Clusterware Components and Resources

Oracle lets you dynamically modify and disable the default tracing levels of any of the cluster daemon (CRSD, CSSD, EVMD) processes and their subcomponents. The crsctl set {log|trace} command allows you to modify the default debug settings dynamically. The trace levels range from 1 to 5, where the value 0 turns tracing off. Higher trace levels generate more detailed diagnostic information about the component.

The following example lists the default log settings for all modules of a component; the command must be executed as the root user to avoid an Insufficient User Privileges error:

$ crsctl get log {css|crs|evm} ALL

The following output fetches the default trace levels of various subcomponents of CSSD:

Get CSSD Module: BCCM  Log Level: 2
Get CSSD Module: CLSF  Log Level: 0
Get CSSD Module: CLSINET  Log Level: 0
Get CSSD Module: CSSD  Log Level: 2
Get CSSD Module: GIPCBCCM  Log Level: 2
Get CSSD Module: GIPCCM  Log Level: 2
Get CSSD Module: GIPCGM  Log Level: 2
Get CSSD Module: GIPCNM  Log Level: 2
Get CSSD Module: GPnP  Log Level: 1
Get CSSD Module: OLR  Log Level: 0
Get CSSD Module: SKGFD  Log Level: 0
 
To list all components underneath a module, use the following examples as the root user:
 
$ crsctl lsmodules                              -- displays the list of modules
$ crsctl lsmodules {css|crs|evm}                -- displays the subcomponents of a module

To set a non-default tracing level, use the following syntax as the root user:

Syntax:
$ crsctl set log {module} "component_name=debug_level"
$ crsctl set log res "resource_name=debug_level"

Example:
$ crsctl set log crs crsmain=3
$ crsctl set log crs crsmain=3,crsevt=4   -- lets you set different log levels for multiple modules
  
$ crsctl set log crs all=5
$ crsctl set log res ora.rondb.db:5

If a node is being evicted due to some mysterious network heartbeat (NHB) issue and the default information is not sufficient to diagnose the cause, you can increase the CSSD tracing to a higher level as the root user, as shown in the following example:

$ crsctl set log css ocssd=4

The following examples disable the tracing:

$ crsctl set log crs crsmain=0
$ crsctl set log res ora.rondb.db:0
$ crsctl set log res ora.crsd:0 -init

The -init flag must be specified when modifying the debug mode of a key cluster daemon process. To list the current logging and tracing levels of a particular component and its subcomponents, use the following example:

$ crsctl stat res ora.crsd -init -f | grep LEVEL

Tracing levels also can be set by specifying the following environmental variables on the local node (however, you need to restart the cluster on the local node to enforce the logging/tracing changes):

$ export ORA_CRSDEBUG_ALL=1   -- sets debugging level 1 for all modules
$ export ORA_CRSDEBUG_CRS=2   -- sets debugging level 2 for the CRS module

OS Level Tracing

You should also be able to use the OS-specific tracing utility (gdb, pstack, truss, strace, and so on) to dump the debug information of an OS process. The following exercise demonstrates the procedure:

Identify the process ID of the process you want to trace at the OS level; for example:

$ ps -ef | grep orarootagent.bin

Attach the OS-specific debug utility to the process; for example, on the HP-UX platform (4558 being the PID found above):

$ pstack 4558

You can then provide the information to Oracle Support or consult your OS admin team to help you identify any issues that were raised from the OS perspective.
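On Linux, a comparable approach is to attach strace to the process; a sketch, reusing the PID from the preceding example:

$ strace -f -p 4558 -o /tmp/orarootagent_strace.txt    -- follow forks and write the trace to a file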

Diagnose cluvfy Failures

The cluvfy (and runcluvfy.sh) utility can be used to accomplish pre-component and post-component verification checks, including OS, network, storage, overall system readiness, and clusterware best practices. When the utility fails to execute for no apparent reason, and the -verbose argument doesn’t yield sufficient diagnostic information about the issue, enable debugging mode for the utility and re-execute the command to acquire adequate information about the problem. The following example demonstrates enabling debugging mode:

$ export SRVM_TRACE=true

Rerun the failed command after setting the preceding environment variable. A detailed output file will be generated under the $GRID_HOME/cv/log location, which can be used to diagnose the real cause. When debug settings are modified, the details are recorded in the OCR, and the changes take effect on that node only.

In addition, when Java-based Oracle tools (such as srvctl, dbca, dbua, cluvfy, and netca) fail for unknown reasons, the preceding setting will also help to generate additional diagnostic information that can be used to troubleshoot the issues.

Example:
$ srvctl status database -d <db_name>

image Note   When the basic information from the CRS logs doesn't provide sufficient evidence to conclude the root cause of a cluster or RAC database issue, setting different levels of trace mode might produce useful additional information to resolve the problem. However, raising the debug level will have an impact on overall cluster performance and can also generate a huge amount of information in the respective log files. It is therefore highly advisable to seek the advice of Oracle Support prior to changing the default settings of cluster components.

Grid Infrastructure Component Directory Structure

Each component in Grid Infrastructure maintains a separate log file and records sufficient information under normal and critical circumstances. The information written to these log files will assist in diagnosing and troubleshooting Clusterware components or cluster health-related problems. By exploring the appropriate information in these log files, the DBA can diagnose the root cause of frequent node evictions or any fatal Clusterware problems, in addition to Clusterware installation and upgrade difficulties. In this section, we explain some of the important CRS logs that can be examined when various Clusterware issues occur.

alert<HOSTNAME>.log: Similar to a typical database alert log file, Oracle Clusterware maintains an alert log file under the $GRID_HOME/log/$hostname location and posts messages whenever important events take place, such as when a cluster daemon process starts, when a process aborts or fails to start a cluster resource, when node eviction occurs, or when a voting disk or OCR file becomes inaccessible on the node.

Whenever Clusterware confronts any serious issue, this should be the very first file to be examined by the DBA seeking additional information about the problem. The error message also points to a trace file location where more detailed information will be available to troubleshoot the issue.

Following are a few sample messages extracted from the alert log file, which show the nature of events such as node eviction, CSSD termination, and the inability to auto-start the cluster:

[ohasd(10937)]CRS-1301:Oracle High Availability Service started on node rac1.
[/u00/app/12.1.0/grid/bin/oraagent.bin(11137)]CRS-5815:Agent '/u00/app/12.1.0/grid/bin/oraagent_oracle' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:1:2} in /u00/app/12.1.0/grid/log/rac1/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
[cssd(11168)]CRS-1713:CSSD daemon is started in exclusive mode
[cssd(11168)]CRS-1605:CSSD voting file is online: /dev/rdsk/oracle/vote/ln1/ora_vote_002; details in /u00/app/12.1.0/grid/log/rac1/cssd/ocssd.log.
 
[cssd(11052)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u00/app/12.1.0/grid/log/rac1/cssd/ocssd.log
[cssd(3586)]CRS-1608:This node was evicted by node 1, rac1; details at (:CSSNM00005:) in /u00/app/12.1.0/grid/log/rac2/cssd/ocssd.log.

ocssd.log: The Cluster Synchronization Service daemon (CSSD) is undoubtedly one of the most critical components of Clusterware, whose primary functionality includes node monitoring, group services management, lock services, and cluster heartbeats. The process maintains a log file named ocssd.log under the $GRID_HOME/log/<hostname>/cssd location and writes all important event messages to it. This is one of the busiest CRS log files and is written to continuously; when the debug level of the process is raised, the file receives more detailed information about the underlying issue. Before a node eviction occurs, CSSD writes warning messages to this log file. If a situation arises such as node eviction, voting disk (VD) access issues, or the inability of Clusterware to start up on the local node, it is strongly recommended that you examine this file to find out the reasons.

Following are a few sample entries of the log file:

2012-11-30 10:10:49.989: [CSSD][21]clssnmvDiskKillCheck: not evicted, file /dev/rdsk/c0t5d4 flags 0x00000000, kill block unique 0, my unique 1351280164

ocssd.l04:2012-10-26 22:17:26.750: [CSSD][6]clssnmvDiskVerify: discovered a potential voting file
ocssd.l04:2012-10-26 22:36:12.436: [CSSD][1]clssnmvDiskAvailabilityChange: voting file /dev/rdsk/c0t5d4 now online
ocssd.l04:2012-10-26 22:36:10.440: [CSSD][1]clssnmReadDiscoveryProfile: voting file discovery string(/dev/rdsk/c0t5d5,/dev/rdsk/c0t5d4)
2012-12-01 09:54:10.091: [CSSD][30]clssnmSendingThread: sending status msg to all nodes
ocssd.l01:2012-12-01 10:24:57.116: [CSSD][1]clssnmInitNodeDB: Initializing with OCR id 1484043234
[cssd(7335)]CRS-1612:Network communication with node rac2 (02) missing for 50% of timeout interval.  Removal of this node from cluster in 14.397 seconds
2013-03-15 17:02:44.964
[cssd(7335)]CRS-1611:Network communication with node rac2 (02) missing for 75% of timeout interval.  Removal of this node from cluster in 7.317 seconds
2013-03-15 17:02:50.024
[cssd(7335)]CRS-1610:Network communication with node rac2 (02) missing for 90% of timeout interval.  Removal of this node from cluster in

Oracle certainly doesn't recommend removing the log file manually for any reason, as it is managed by Oracle automatically. Upon reaching a size of 50 MB, the file is automatically archived as ocssd.l01 in the same location as part of the predefined rotation policy, and a fresh log file (ocssd.log) is generated. Ten archived copies (ocssd.l01 through ocssd.l10) are kept in the same directory for future reference as part of the built-in log rotation and retention policy.
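
A quick way to observe the rotation policy in action is to list the CSSD log directory; a hedged example (file names and counts will vary by system):

$ ls -tr $GRID_HOME/log/`hostname -s`/cssd/
cssdOUT.log  ocssd.l10  ocssd.l09 ... ocssd.l02  ocssd.l01  ocssd.log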

If the log file is removed before it reaches 50 MB, Clusterware, unlike the database with its alert.log file, will not instantly generate a new log file. This is because, despite the removal of the file from the OS, the CSS process keeps writing messages to the (now unlinked) file until it becomes a candidate for the rotation policy. When the removed file reaches a size of 50 MB, a new log file appears and becomes available in the usual way. However, the messages written in the interim cannot be recovered.

When CSSD fails to start up, or is reported to be unhealthy, refer to this file to ascertain the root cause of the problem.

crsd.log: CRSD is another critical component of Clusterware, whose primary functionality includes resource monitoring, resource failover, and managing the OCR. The process maintains a log file named crsd.log under the $GRID_HOME/log/<hostname>/crsd location and writes all important event messages to it. Whenever a cluster or non-cluster resource stops or starts, a failover action is performed, or any resource-related warning or communication error occurs, the relevant information is written to the file. If you face issues such as resources failing to start, examine this file for relevant information that could assist in resolving the issue.

Deleting the log file manually is not recommended, as it is governed by Oracle and archived automatically. The file will be archived as crsd.l01 under the same location on reaching a size of 10 MB, and a fresh log file (crsd.log) will be generated. Ten archived copies are kept in the same directory for future reference.

When the CRSD fails to start up, or is unhealthy, refer to this file to find out the root cause of the problem.

ohasd.log: The Oracle High Availability Service daemon (OHASD), first introduced with 11gR2, manages and controls the rest of the cluster stack. Its primary responsibilities include managing OLR; starting, stopping, and verifying cluster health status on the local and remote nodes; and supporting cluster-wide commands. The process maintains a log file named ohasd.log under the $GRID_HOME/log/<hostname>/ohasd location and writes all important event messages to it. Examine the file when you face issues running the root.sh script, when the ohasd process fails to start up, or in case of OLR corruption.

Oracle certainly doesn't encourage deleting the log file for any reason, as it is governed by Oracle automatically. The file will be archived as ohasd.l01 under the same location on reaching a size of 10 MB, and a fresh log file (ohasd.log) will be generated. Like ocssd.log and crsd.log, there will be ten archived copies kept for future reference in the same directory. Following are a few sample entries from the log file:

2013-04-17 11:32:47.096: [ default][1] OHASD Daemon Starting. Command string :reboot
2013-04-17 11:32:47.125: [ default][1] Initializing OLR
2013-04-17 11:32:47.255: [  OCRRAW][1]proprioo: for disk 0 (/u00/app/12.1.0/grid_1/cdata/rac2.olr), id match (1), total id sets, need recover (0), my votes (0), total votes (0), commit_lsn (3118), lsn (3118)
2013-04-17 11:32:47.368: [ default][1] Loading debug levels . . .
2013-04-17 11:32:47.803: [  clsdmt][13]Creating PID [6401] file for home /u00/app/12.1.0/grid_1 host usdbp10 bin ohasd to /u00/app/12.1.0/grid_1/ohasd/init/

Upon successful execution of the ocrdump, ocrconfig, olsnodes, oifcfg, and ocrcheck commands, a log file will be generated under the $GRID_HOME/log/<hostname>/client location. For EVM daemon (EVMD) process-relevant details, look at the evmd.log file under the $GRID_HOME/log/<hostname>/evmd location. Cluster Health Monitor (CHM) and logger service logs are maintained under the $GRID_HOME/log/<hostname>/crfmond and crflogd directories.

Figure 2-4 depicts the hierarchy of the Clusterware component directory structure.

9781430250449_Fig02-04.jpg

Figure 2-4. Unified Clusterware log directory hierarchy

Operating system (OS) logs: Referring to the OS-specific log file can be hugely helpful in identifying Clusterware startup and shutdown issues. Different platforms maintain logs at different locations, as shown in the following list; a quick Linux example follows the list:

HP-UX   - /var/adm/syslog/syslog.log
AIX     - /bin/errpt -a (command to display the error log)
Linux   - /var/log/messages
Windows - Application and System logs, viewed with the Windows Event Viewer
Solaris - /var/adm/messages
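
For instance, on Linux you might scan the system log for cluster-related entries around the time of an incident; a minimal sketch, run as root since /var/log/messages is typically root-readable:

# grep -iE 'cssd|crsd|ohasd|eviction|reboot' /var/log/messages | tail -20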

image Caution   Oracle Clusterware generates and maintains 0-sized socket files in the hidden '.oracle' directory under /etc or /var/tmp (depending on the platform). Removing these files as part of regular log cleanup, or removing them unintentionally, might leave the cluster in a hung state.

image Note   It is mandatory to maintain sufficient free space in the file system on which the Grid and RDBMS software are installed to prevent Clusterware issues; in addition, Oracle suggests not removing the logs manually.

Oracle Clusterware Troubleshooting - Tools and Utilities

Managing and troubleshooting various issues related to Clusterware and its components are two of the key responsibilities of any Oracle DBA. Oracle provides a variety of tools and utilities in this context that the DBA can use to monitor Clusterware health and also diagnose and troubleshoot any serious Clusterware issues. Some of the key tools and utilities that Oracle provides are as follows: CHM, diagcollection.sh, ProcWatcher, RACcheck, oratop, OS Watcher Black Box Analyzer (OSWbba), the Light Onboard Monitor (LTOM), and the Hang File Generator (HANGFG).

In the following sections, we will cover some of the uses of these very important tools and describe their advantages.

Cluster Health Check with CVU

Starting with 11.2.0.3, the cluster verification utility (cluvfy) is capable of carrying out post-installation health checks for both Clusterware and the database. With the new -healthcheck argument, cluster and database component best practices, mandatory requirements, deviations, and proper functionality can be verified.

The following example collects detailed information about best-practice recommendations for Clusterware in an HTML file named cvucheckreport_<timestamp>.htm:

$ ./cluvfy comp healthcheck -collect cluster -bestpractice -html

When no further arguments are attached to the -healthcheck parameter, both the Clusterware and database checks are carried out. Use the following example to perform the health checks on the cluster and database; because no -html argument is specified, the output will be stored in a text file:

$ ./cluvfy comp healthcheck

The cluvfy comp healthcheck command supports the following arguments; a combined example follows the list:

-collect cluster|database
-bestpractice|-mandatory|-deviations
-save -savedir <dir>     -- saves the output under a particular location
-html                    -- writes the output to an HTML file
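
For instance, the following hedged sketch collects the mandatory database checks as an HTML report and saves it under a directory of your choice (/tmp/cvu_reports is just an illustrative path):

$ ./cluvfy comp healthcheck -collect database -mandatory -html -save -savedir /tmp/cvu_reports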

Real-Time RAC Database Monitoring - oratop

The oratop utility, currently restricted to the Linux operating system, resembles the OS top utility and provides near-real-time resource monitoring capability for RAC and single-instance databases from 11.2.0.3 onward. It is a very lightweight monitoring utility that consumes minimal resources on the server (about 0.20% memory and less than 1% CPU). With this utility, you can monitor a RAC database or a stand-alone database, local as well as remote.

Invoke the oratop Utility

Download the oratop.zip file from My Oracle Support (MOS) at https://support.oracle.com. Unzip the file and set the appropriate permission on the oratop file (chmod 755 oratop on the Linux platform). Ensure the following database initialization parameters are set: timed_statistics to TRUE and statistics_level to TYPICAL. Also, the following environment settings need to be made on the local node before invoking the utility:

 
$ export ORACLE_UNQNAME=<dbname>
$ export ORACLE_SID=<instance_name1>
$ export ORACLE_HOME=<db_home>
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib
$ export PATH=$ORACLE_HOME/bin:$PATH

The following example runs the utility and sets the window refresh interval to ten seconds (the default is three seconds):

$ oratop -i 10
 
$ oratop -t <tns_name_for_remote_db>   -- to monitor a remote database

Input the database user name and password when prompted. When no credentials are entered, the utility attempts to connect with the default user SYSTEM and the default password MANAGER. If you are using a non-SYSTEM database user, ensure that the user has read permission on some of the dynamic performance views, such as v_$SESSION, v_$SYSMETRIC, v_$INSTANCE, v_$PROCESS, v_$SYSTEM_EVENT, and so on.
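
If you prefer a dedicated monitoring account over SYSTEM, the following is a minimal sketch of the grants involved; oratop_mon is a hypothetical user, and the exact set of views oratop reads may vary by version:

$ sqlplus / as sysdba
SQL> create user oratop_mon identified by <password>;
SQL> grant create session to oratop_mon;
SQL> grant select on v_$session to oratop_mon;
SQL> grant select on v_$sysmetric to oratop_mon;
SQL> grant select on v_$instance to oratop_mon;
SQL> grant select on v_$process to oratop_mon;
SQL> grant select on v_$system_event to oratop_mon;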

Figure 2-5 shows the output window of the oratop utility.

9781430250449_Fig02-05.jpg

Figure 2-5. oratop output screen shot

The granular statistics that appear in the window help to identify database performance contention and bottlenecks. The live window is categorized into three major sections: 1) the top five wait events (similar to an AWR/ASH report), 2) the top Oracle sessions on the server in terms of high I/O and memory consumption, and 3) the DB load (which also provides blocking session details, etc.). Press q or Q to quit the utility; press Ctrl+C to abort.

image Note   The tool is available for download only through MOS, which requires a valid support license.

RAC Configuration Audit Tool - RACcheck

RACcheck is a tool that performs audits on various important configuration settings and provides a comprehensive HTML-based assessment report on the overall health check status of the RAC environment.

The tool is currently certified on most major operating systems. It can be used in interactive and non-interactive modes and supports auditing multiple databases in a single run. It can be run across all nodes, on a subset of cluster nodes, or on the local node only. When the tool is invoked, it carries out health checks on various components, such as cluster-wide settings, CRS, Grid, RDBMS, ASM, general database initialization parameters, OS kernel settings, and OS packages. The most suitable times for performing health checks with this tool are immediately after deploying a new RAC environment, before and after planned system maintenance, prior to major upgrades, and quarterly.

With its Upgrade Readiness Assessment module, the tool simplifies upgrade preparation and improves its reliability. Apart from regular upgrade prerequisite verifications, the module lets you perform automatic prerequisite verification checks for patches, best practices, and configuration. This will be of great assistance before planning any major cluster upgrades.

Invoke the RACcheck Tool

Download the raccheck.zip file from MOS, unzip it, and set the appropriate permission on the raccheck file (chmod 755 raccheck on Unix platforms). To invoke the tool in interactive mode, run the following at the command prompt as the Oracle software owner and provide input when prompted:

$./raccheck

To perform RAC upgrade readiness verification checks, use the following example and respond to the prompts:

$ ./raccheck -u -o pre

Following are the arguments supported by the tool:

$ ./raccheck -h
 
Usage : ./raccheck [-abvhpfmsuSo:c:rt:]
        -a      All (perform best practice check and recommended patch check)
        -b      Best practice check only (no recommended patch check)
        -h      Show usage
        -v      Show version
        -p      Patch check only
        -m      Exclude checks for Maximum Availability Architecture
        -u      Run raccheck to check pre-upgrade or post-upgrade best practices;
                -o pre or -o post is mandatory with the -u option, e.g., ./raccheck -u -o pre
        -f      Run offline; checks will be performed on data already collected
        -o      Argument to an option; if -o is followed by v, V, Verbose, or VERBOSE,
                checks that pass are printed on the screen; if -o is not specified,
                only failures are printed, e.g., raccheck -a -o v
        -r      Include High Availability best practices in the regular
                health check, e.g., ./raccheck -r (not applicable for exachk)
        -c      Pass a specific module or component to check best practices for

The assessment report provides a better picture of the RAC environment and includes an overall system health check rating (out of 100), Oracle Maximum Availability Architecture (MAA) best practices, bug fixes, and patch recommendations.

image Note   The tool is available for download only through MOS, which requires a valid support license. Executing the tool when the systems are heavily loaded is not recommended. It is also recommended to test the tool in a non-production environment first, as it doesn't come by default with the Oracle software.

Cluster Diagnostic Collection Tool - diagcollection.sh

Every time you run into serious Clusterware issues or confront node eviction, you typically look at various CRS-level and OS-level logs to gather the information required to comprehend the root cause of the problem. Because Clusterware manages a huge number of log and trace files, it can be cumbersome to review the many logs on each cluster node. The diagcollection.sh tool, located under $GRID_HOME/bin, is capable of gathering the required diagnostic information from various important sources, such as CRS logs, trace and core files, OCR data, and OS logs.

With the diagnostic collection tool, you have the flexibility to collect diagnostic information at different levels, such as cluster, Oracle RDBMS home, Oracle base, and core analysis. The information gathered from the various sources is packed into a few zip files; you can then upload these files to Oracle Support for further analysis to resolve the problem.

The following example, executed as the root user, collects the $GRID_HOME diagnostic information:

./diagcollection.sh --collect --crs $GRID_HOME

The following CRS diagnostic archives will be created in the local directory:

crsData_usdbt43_20121204_1103.tar.gz  -> logs, traces, and cores from CRS home

image Note   Core files will be packaged only with the --core option.

ocrData_usdbt43_20121204_1103.tar.gz  -> ocrdump, ocrcheck, etc.
coreData_usdbt43_20121204_1103.tar.gz -> contents of CRS core files in text format
osData_usdbt43_20121204_1103.tar.gz   -> logs from the operating system
Collecting crs data
log/usdbt43/cssd/ocssd.log: file changed as we read it
 
Collecting OCR data
Collecting information from core files
Collecting OS logs

After data collection is complete, the following files will be created in the local directory:

crsData_$hostname_20121204_1103.tar.gz
ocrData_$hostname_20121204_1103.tar.gz
coreData_$hostname_20121204_1103.tar.gz
osData_$hostname_20121204_1103.tar.gz

The following example lists the supported parameters that can be used with the tool (output is trimmed):

./diagcollection.sh -help
    --collect
             [--crs] For collecting crs diag information
             [--adr] For collecting diag information for ADR; specify ADR location
             [--chmos] For collecting Cluster Health Monitor (OS) data
             [--all] Default. For collecting all diag information.
             [--core] Unix only. Package core files with CRS data
             [--afterdate] Unix only. Collects archives from the specified date.
             [--aftertime] Supported with -adr option. Collects archives after the specified
             [--beforetime] Supported with -adr option. Collects archives before the specified
             [--crshome] Argument that specifies the CRS Home location
             [--incidenttime] Collects Cluster Health Monitor (OS) data from the specified
             [--incidentduration] Collects Cluster Health Monitor (OS) data for the duration
 
             NOTE:
             1. You can also do the following
                ./diagcollection.pl --collect --crs --crshome <CRS Home>
  
     --clean        cleans up the diagnosability
                    information gathered by this script
  
     --coreanalyze  Unix only. Extracts information from core files
                    and stores it in a text file
Use the --clean argument with the script to clean up previously generated files.

image Note   Ensure that enough free space is available at the location where the files are being generated. Furthermore, depending upon the level used to collect the information, the script might take a considerable amount of time to complete the job. Hence, keep an eye on resource consumption on the node. The tool must be executed as the root user.

CHM

The Oracle CHM tool is designed to detect and analyze OS- and cluster resource-related degradations and failures. Formerly known as Instantaneous Problem Detector for Clusters (IPD/OS), this tool tracks OS resource consumption on each RAC node at the process and device level, and it collects and analyzes the cluster-wide data. The tool stores real-time operating metrics in the CHM repository and reports an alert when certain metrics pass their resource utilization thresholds. The collected data can be replayed to trace back what was happening at the time of a failure, which can be very useful for root cause analysis of many cluster issues, such as node eviction.

For Oracle Clusterware 10.2 to 11.2.0.1, the CHM/OS tool is a standalone tool that you need to download and install separately. Starting with Oracle Grid Infrastructure 11.2.0.2, the CHM/OS tool is fully integrated with Oracle Grid Infrastructure. In this section we focus on this integrated version of CHM/OS.

The CHM tool is installed to the Oracle Grid Infrastructure home and is activated by default in Grid Infrastructure 11.2.0.2 and later for Linux and Solaris and 11.2.0.3 and later for AIX and Windows. CHM consists of two services: osysmond and ologgerd. osysmond runs on every node of the cluster to monitor and collect the OS metrics and send the data to the cluster logger services. ologgerd receives the information from all the nodes and stores the information in the CHM Repository. ologgerd runs in one node as the master service and in another node as a standby if the cluster has more than one node. If the master cluster logger service fails, the standby takes over as the master service and selects a new node for standby. The following example shows the two processes, osysmond.bin and ologgerd:

$ ps -ef | grep  -E 'osysmond|ologgerd' | grep -v grep
root      3595     1  0 Nov14 ?        01:40:51 /u01/app/11.2.0/grid/bin/ologgerd -m k2r720n1 -r -d /u01/app/11.2.0/grid/crf/db/k2r720n2
root      6192     1  3 Nov08 ?        1-20:17:45 /u01/app/11.2.0/grid/bin/osysmond.bin

The preceding ologgerd daemon uses '-d /u01/app/11.2.0/grid/crf/db/k2r720n2', which is the directory where the CHM repository resides. The CHM repository is a Berkeley DB-based database stored as *.bdb files in that directory, which requires about 1 GB of disk space per node in the cluster.

$ pwd
/u01/app/11.2.0/grid/crf/db/k2r720n2
$ ls *.bdb
crfalert.bdb  crfclust.bdb  crfconn.bdb  crfcpu.bdb  crfhosts.bdb  crfloclts.bdb  crfts.bdb  repdhosts.bdb

Oracle Clusterware 12cR1 has enhanced the CHM by providing a highly available server monitor service and also support for the Flex Cluster architecture. The CHM in Oracle Clusterware 12cR1 consists of three components:

  • osysmond
  • ologgerd
  • Oracle Grid Infrastructure Management Repository

The System Monitor Service process (osysmond) runs on every node of the cluster. The System Monitor Service monitors OS and cluster resource-related degradation and failure, collects the real-time OS metric data, and sends the data to the cluster logger service.

Instead of running on every cluster node as in Oracle Clusterware 11gR2, there is only one cluster logger service for every 32 nodes in Oracle Clusterware 12cR1. For high availability, this service is restarted on another node if it fails.

On the node that runs both osysmon and ologgerd:

[grid@knewracn1 ~]$ ps -ef | grep -E 'osysmond|ologgerd' | grep -v grep
root      4408     1  3 Feb19 ?        08:40:32 /u01/app/12.1.0/grid/bin/osysmond.bin
root      4506     1  1 Feb19 ?        02:43:25 /u01/app/12.1.0/grid/bin/ologgerd -M -d /u01/app/12.1.0/grid/crf/db/knewracn1

On other nodes that run only osysmon:

[grid@knewracn2 product]$ ps -ef | grep  -E 'osysmond|ologgerd' | grep -v grep
root      7995     1  1 Feb19 ?        03:26:27 /u01/app/12.1.0/grid/bin/osysmond.bin

In Oracle Clusterware 12cR1, all the metrics data that the cluster logger service receives are stored in the central Oracle Grid Infrastructure Management Repository (the CHM repository), which is a new feature in 12c Clusterware. The repository is configured during the installation or upgrade to Oracle Clusterware by selecting the “Configure Grid Infrastructure Management Repository” option in Oracle Universal Installer (OUI), as shown in Figure 2-6.

9781430250449_Fig02-06.jpg

Figure 2-6. Configure Grid Infrastructure Management Repository in OUI

This repository is an Oracle database. Only one node runs this repository in a cluster. If the cluster is a Flex Cluster, this node must be a hub node. Chapter 4 will discuss the architecture of Oracle Flex Clusters and different types of cluster nodes in a Flex Cluster.

To reduce the private network traffic, the repository database (MGMTDB) and the cluster logger service can be located on the same node, as shown here:

$ ps -ef | grep -v grep | grep pmon | grep MGMTDB
grid     31832     1  0 Feb20 ?        00:04:06 mdb_pmon_-MGMTDB
 
$ ps -ef | grep -v grep | grep  'osysmon'
root      2434     1  1 Feb 20 ?        00:04:49 /u01/app/12.1.0/grid/bin/osysmond.bin

This repository database runs under the owner of the Grid Infrastructure, which is the "grid" user in this example. The database files of the CHM repository are located in the same diskgroup as the OCR and VD. To accommodate the repository, the size requirement of this diskgroup has increased beyond what the OCR and VD alone require. The actual size and the retention policy of the repository can be managed with the oclumon tool, which provides a command interface to query the CHM repository and perform various administrative tasks on it.
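
For instance, the repository size can be changed in megabytes with oclumon. The following is a hedged sketch using the 12c syntax; verify the exact verb with oclumon manage -h on your version, as the syntax has changed across releases:

$ oclumon manage -repos changerepossize 4000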

For example, we can get the repository information such as size, repository path, the node for the cluster logger service, and all the nodes that the statistics are collected from using a command like this:

$ oclumon manage -get repsize reppath alllogger -details
 
CHM Repository Path = +DATA1/_MGMTDB/DATAFILE/sysmgmtdata.260.807876429
CHM Repository Size = 38940
Logger = knewracn1
Nodes = knewracn1,knewracn2,knewracn4,knewracn7,knewracn5,knewracn8,knewracn6

The CHM admin directory $GRID_HOME/crf/admin contains a file crf<hostname>.ora, which records the information about the CHM repository:

cat /u01/app/12.1.0/grid/crf/admin/crfknewracn1.ora
BDBLOC=default
PINNEDPROCS=osysmond.bin,ologgerd,ocssd.bin,cssdmonitor,cssdagent,mdb_pmon_-MGMTDB,kswapd0
MASTER=knewracn1
MYNAME=knewracn1
CLUSTERNAME=knewrac
USERNAME=grid
CRFHOME=/u01/app/12.1.0/grid
knewracn1 5=127.0.0.1 0
knewracn1 1=127.0.0.1 0
knewracn1 0=192.168.9.41 61020
MASTERPUB=172.16.9.41
DEAD=
knewracn1 2=192.168.9.41 61021
knewracn2 5=127.0.0.1 0
knewracn2 1=127.0.0.1 0
knewracn2 0=192.168.9.42 61020
ACTIVE=knewracn1,knewracn2,knewracn4
HOSTS=knewracn1,knewracn2,knewracn4
knewracn5 5=127.0.0.1 0
knewracn5 1=127.0.0.1 0
knewracn4 5=127.0.0.1 0
knewracn4 1=127.0.0.1 0
knewracn4 0=192.168.9.44 61020
knewracn8 5=127.0.0.1 0
knewracn8 1=127.0.0.1 0
knewracn7 5=127.0.0.1 0
knewracn7 1=127.0.0.1 0
knewracn6 5=127.0.0.1 0
knewracn6 1=127.0.0.1 0

You can collect CHM data on any node by running the diagcollection.pl utility on that node as the privileged user root. The steps are as follows:

First, find the cluster node where the cluster logger service is running:

$ /u01/app/12.1.0/grid/bin/oclumon manage -get master
 
Master = knewracn1

Log in to the cluster node that runs the cluster logger service as a privileged user (in other words, the root user) and run the diagcollection.pl utility. This utility collects all the available data stored in the CHM repository. You can also specify a specific time and duration for which to collect the data:

[root@knewracn1 ∼]# /u01/app/12.1.0/grid/bin/diagcollection.pl -collect -crshome /u01/app/12.1.0/grid
Production Copyright 2004, 2010, Oracle. All rights reserved
CRS diagnostic collection tool
The following CRS diagnostic archives will be created in the local directory.
crsData_knewracn1_20130302_0719.tar.gz -> logs,traces and cores from CRS home. Note: core files will be packaged only with the --core option.
ocrData_knewracn1_20130302_0719.tar.gz -> ocrdump, ocrcheck etc
coreData_knewracn1_20130302_0719.tar.gz -> contents of CRS core files in text format
 
osData_knewracn1_20130302_0719.tar.gz -> logs from operating system
Collecting crs data
/bin/tar: log/knewracn1/cssd/ocssd.log: file changed as we read it
 
Collecting OCR data
Collecting information from core files
No corefiles found
The following diagnostic archives will be created in the local directory.
acfsData_knewracn1_20130302_0719.tar.gz -> logs from acfs log.
Collecting acfs data
Collecting OS logs
Collecting sysconfig data

This utility creates several .tar.gz archives in the current working directory, such as crsData_<host>_<timestamp>.tar.gz and osData_<host>_<timestamp>.tar.gz:

[root@knewracn1 ∼]# ls -l *.gz
-rw-r--r--. 1 root root     1481 Mar  2 07:24 acfsData_knewracn1_20130302_0719.tar.gz
-rw-r--r--. 1 root root 58813132 Mar  2 07:23 crsData_knewracn1_20130302_0719.tar.gz
-rw-r--r--. 1 root root    54580 Mar  2 07:24 ocrData_knewracn1_20130302_0719.tar.gz
-rw-r--r--. 1 root root    18467 Mar  2 07:24 osData_knewracn1_20130302_0719.tar.gz

These .gz files include various log files that can be used for the diagnosis of your cluster issues.

You can also use the OCLUMON command-line tool to query the CHM repository and display node-specific metrics for a specified time period. You can also print the durations and states for a resource on a node during a specified time period. The states are based on predefined thresholds for each resource metric and are denoted as red, orange, yellow, and green, in decreasing order of criticality. The OCLUMON command syntax is as follows:

$ oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] |
[-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]

-s indicates the start timestamp and -e indicates the end timestamp

For example, we can run the command like this to write the report into a text file:

$GRID_HOME/bin/oclumon dumpnodeview -allnodes -v -s "2013-03-02 06:20:00" -e "2013-03-02 06:30:00" > /home/grid/chm.txt

A segment of /home/grid/chm.txt looks like this:

$ less /home/grid/chm.txt
 
----------------------------------------
Node: knewracn1 Clock: '13-03-02 06.20.04' SerialNo:178224
----------------------------------------
 
SYSTEM:
#pcpus: 1 #vcpus: 2 cpuht: Y chipname: Intel(R) cpu: 7.97 cpuq: 2 physmemfree: 441396 physmemtotal: 5019920 mcache: 2405048 swapfree: 11625764 swaptotal: 12583912 hugepagetotal: 0 hugepagefree: 0 hugepagesize: 2048 ior: 93 iow: 242 ios: 39 swpin: 0 swpout: 0 pgin: 90 pgout: 180 netr: 179.471 netw: 124.380 procs: 305 rtprocs: 16 #fds: 26144 #sysfdlimit: 6815744 #disks: 5 #nics: 4 nicErrors: 0
 
TOP CONSUMERS:
topcpu: 'gipcd.bin(4205) 5.79' topprivmem: 'ovmd(719) 214072' topshm: 'ora_ppa7_knewdb(27372) 841520' topfd: 'ovmd(719) 1023' topthread: 'crsd.bin(4415) 48'
 
CPUS:
cpu0: sys-4.94 user-3.10 nice-0.0 usage-8.5 iowait-10.93
cpu1: sys-5.14 user-2.74 nice-0.0 usage-7.88 iowait-4.68
 
PROCESSES:
 
name: 'ora_smco_knewdb' pid: 27360 #procfdlimit: 65536 cpuusage: 0.00 privmem: 2092 shm: 17836 #fd: 26 #threads: 1 priority: 20 nice: 0 state: S
name: 'ora_gtx0_knewdb' pid: 27366 #procfdlimit: 65536 cpuusage: 0.00 privmem: 2028 shm: 17088 #fd: 26 #threads: 1 priority: 20 nice: 0 state: S

name: 'ora_rcbg_knewdb' pid: 27368 #procfdlimit: 65536 cpuusage: 0.00 privmem: ......

RAC Database Hang Analysis

In this section, we will explore the conceptual basis for invoking and interpreting a hang analysis dump to diagnose a potentially hung, slow, or blocked RAC database. When a database is running unacceptably slowly, is hung because of an internal deadlock or a latch, or is suffering from a prolonged blocking situation that hurts overall database performance, it is advisable to perform a hang analysis, which helps greatly in identifying the root cause of the problem. The following set of examples explains how to invoke and use the hang analysis:

$ sqlplus / as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug setinst all            -- enables cluster-wide hang analysis
SQL> oradebug -g all hanganalyze 3   -- level 3 is the most commonly used level
<< wait for a couple of minutes >>
SQL> oradebug -g all hanganalyze 3

The hang analysis level can be set to a value between 1 and 5, or to 10. When hanganalyze is invoked, the diagnostic information is written to a dump file under $ORACLE_BASE/diag/rdbms/dbname/instance_name/trace, which can be used to troubleshoot the problem.

We have built the following test case to develop a blocking scenario in a RAC database to demonstrate the procedure practically. We will then interpret the trace file to understand the contents to troubleshoot the issue. The following steps were performed as part of the test scenario:

Create an EMP table:

SQL> create table emp (eno number(3),deptno number(2), sal number(9));

Load a few records into the table.

From instance 1, execute an update statement:

SQL> update emp set sal=sal+100 where eno=101;   -- no commit performed

From instance 2, execute an update statement for the same record to develop a blocking scenario:

SQL> update emp set sal=sal+200 where eno=101;

At this point, the session on instance 2 is hanging and the cursor doesn’t return to the SQL prompt, as expected.

Now, from another session, run the hang analysis as follows:

SQL> oradebug setmypid
Statement processed.
SQL> oradebug setinst all
Statement processed.
SQL> oradebug -g all hanganalyze 3   -- level 3 is most suitable in many circumstances
Hang Analysis in /u00/app/oracle/diag/rdbms/rondb/RONDB1/trace/RONDB1_diag_6534.trc

Let's walk through the contents of the trace file to interpret them and identify the waiter and holder details. Here is an excerpt from the trace file:

Node id: 1
List of nodes: 0, 1,       << nodes (instance) count >>
 
*** 2012-12-16 17:19:18.630
===============================================================================
HANG ANALYSIS:
  instances (db_name.oracle_sid): rondb.rondb2, rondb.rondb1
  oradebug_node_dump_level: 3        << hang analysis level >>
  analysis initiated by oradebug
  os thread scheduling delay history: (sampling every 1.000000 secs)
    0.000000 secs at [ 17:19:17 ]
      NOTE: scheduling delay has not been sampled for 0.977894 secs
    0.000000 secs from [ 17:19:14 - 17:19:18 ], 5 sec avg
    0.000323 secs from [ 17:18:18 - 17:19:18 ], 1 min avg
    0.000496 secs from [ 17:14:19 - 17:19:18 ], 5 min avg
===============================================================================
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
     Chain 1 Signature Hash: 0x38c48850
 
===============================================================================
Non-intersecting chains:
 
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:       << waiter >>
    {
                instance: 2 (rondb.rondb2)
                   os id: 12250
              process id: 40, oracle@hostname (TNS V1-V3)
              session id: 103
        session serial #: 1243
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0x20001b
                      p3: 'sequence'=0x101fc
            time in wait: 21.489450 sec
           timeout after: never
                 wait id: 33
                blocking: 0 sessions
             current sql: update emp set sal=sal+200 where eno=101
 
and is blocked by
 => Oracle session identified by:              << holder >>
    {
                instance: 1 (rondb.rondb1)
                   os id: 8047
              process id: 42, oracle@usdbt42 (TNS V1-V3)
              session id: 14
        session serial #: 125
    }
    which is waiting for 'SQL*Net message from client' with wait info:
    {
                      p1: 'driver id'=0x62657100
                      p2: '#bytes'=0x1
            time in wait: 27.311965 sec
           timeout after: never
                 wait id: 131
                blocking: 1 session
 
*** 2012-12-16 17:19:18.725
 
State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[102]/2/103/1243/c0000000e4ae9518/12250/NLEAF/[262]
[262]/1/14/125/c0000000d4a03f90/8047/LEAF/
*** 2012-12-16 17:19:47.303
===============================================================================
HANG ANALYSIS DUMPS:
  oradebug_node_dump_level: 3
===============================================================================
 
State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[102]/2/103/1243/c0000000e4ae9518/12250/NLEAF/[262]
 
===============================================================================
END OF HANG ANALYSIS
===============================================================================

In the preceding example, the session with node number 102 (SID 103) on instance 2 is blocked by the session with node number 262 (SID 14) on instance 1. Upon identifying the holder, either complete the transaction or kill the session to release the lock from the database.

It is sometimes advisable to capture a SYSTEMSTATE dump along with the HANGANALYZE output to generate more detailed diagnostic information for identifying the root cause of the issue. Depending upon the level used to dump the SYSTEMSTATE, the cursor might take a very long time to return to the SQL prompt. The trace file details can also be found in the database alert.log file.
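
If Oracle Support does ask for a SYSTEMSTATE dump alongside the hang analysis, a commonly used sketch follows; level 258 is a frequently suggested level, but confirm the appropriate level with Support before dumping on a busy system:

SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 258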

You shouldn't generate a SYSTEMSTATE dump under normal circumstances; in other words, do so only when you have serious issues in the database or when advised by Oracle Support. Moreover, SYSTEMSTATE tends to generate a vast trace file, and it can even cause an instance crash under unpredictable circumstances.

In addition, Oracle provides the HANGFG tool to automate the collection of SYSTEMSTATE and hang analysis dumps for RAC and non-RAC database environments. You can download the tool from My Oracle Support (previously known as MetaLink). Once you invoke the tool, it generates a couple of output files, named hangfiles.out and hangfg.log, under the $ORACLE_BASE/diag/rdbms/database/instance_name/trace location.

Summary

This chapter discussed the architecture and components of the Oracle Clusterware stack, including the updates in Oracle Clusterware 12cR1. We will talk about some other new Oracle Clusterware features introduced in Oracle 12cR1 in Chapter 4.

This chapter also discussed tools and tips for Clusterware management and troubleshooting. Applying the tools, utilities, and guidelines described in this chapter, you can diagnose many serious cluster-related issues and address Clusterware stack startup failures. In addition, you have learned how to modify the default tracing levels of various Clusterware daemon processes and their subcomponents to obtain detailed debugging information for troubleshooting various cluster-related issues. In a nutshell, the chapter has offered the essential cluster management and troubleshooting concepts and skills that will help you manage a medium- or large-scale cluster environment.
