Introduction to PowerHA SystemMirror for AIX
This chapter contains the following topics:
1.1, What is PowerHA SystemMirror for AIX
1.2, Availability solutions: An overview
1.3, History and evolution
1.4, High availability terminology and concepts
1.5, Fault tolerance versus high availability
1.6, Software planning
1.7, PowerHA software installation
1.1 What is PowerHA SystemMirror for AIX
PowerHA SystemMirror for AIX (also referred to as PowerHA in this book) is the IBM Power Systems™ data center solution that helps protect critical business applications from outages, planned or unplanned. One of the major objectives of PowerHA is to offer uninterrupted business services by providing redundancy in spite of different component failures.
1.1.1 High availability (HA)
In today’s complex environments, providing continuous service for applications is a key component of a successful IT implementation. High availability is one of the components that contributes to providing continuous service for the application clients, by masking or eliminating both planned and unplanned system and application downtime. A high availability solution ensures that the failure of any component of the solution, whether hardware, software, or system management, does not cause the application and its data to become permanently unavailable to the user.
High availability solutions should eliminate single points of failure through appropriate design, planning, selection of hardware, configuration of software, control of applications, a carefully controlled environment, and change management discipline.
In short, we can define high availability as the process of ensuring, through the use of duplicated or shared hardware resources, managed by a specialized software component, that an application is available for use.
1.1.2 Cluster multiprocessing
In addition to high availability, PowerHA also provides the multiprocessing component. The multiprocessing capability comes from the fact that in a cluster there are multiple hardware and software resources managed by PowerHA to provide complex application functionality and better resource utilization.
A short definition for cluster multiprocessing might be multiple applications running over several nodes with shared or concurrent access to the data.
Although desirable, the cluster multiprocessing component depends on the application capabilities and system implementation to efficiently use all resources available in a multi-node (cluster) environment. This must be implemented starting with the cluster planning and design phase.
PowerHA is only one of several high availability technologies. It builds on increasingly reliable operating systems, hot-swappable hardware, and increasingly resilient applications by offering monitoring and automated response.
A high availability solution based on PowerHA provides automated failure detection, diagnosis, application recovery, and node reintegration. With an appropriate application, PowerHA can also provide concurrent access to the data for parallel processing applications, thus offering excellent horizontal and vertical scalability (with the addition of the dynamic LPAR management capabilities).
PowerHA depends on Reliable Scalable Cluster Technology (RSCT). RSCT is a set of low-level operating system components that allow the implementation of clustering technologies such as PowerHA and General Parallel File System (GPFS). RSCT is distributed with AIX. On the current AIX release, AIX 7.1, RSCT is at Version 3.1.2.0. After the PowerHA and Cluster Aware AIX (CAA) file sets are installed, the RSCT Topology Services subsystem is deactivated and all of its functionality is performed by CAA.
PowerHA version 7.1 and later rely heavily on the CAA infrastructure available in AIX 6.1 TL6 and AIX 7.1. CAA provides the communication interfaces and monitoring infrastructure for PowerHA, and supports cluster-wide command execution through the clcmd command.
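For example, the CAA clcmd utility runs a given command on every node in the cluster and prefixes the output with each node name. A minimal illustration (output depends on your cluster):
clcmd date                    # run the date command on all cluster nodes
clcmd lssrc -s clstrmgrES     # check the cluster manager subsystem status on all nodes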
PowerHA also provides disaster recovery functionality such as cross-site mirroring, IBM HyperSwap®, and Geographical Logical Volume Mirroring. These cross-site clustering methods support PowerHA functionality between two geographic sites. Various methods exist for replicating the data to remote sites. For more information, see IBM PowerHA SystemMirror 7.1.2 Enterprise Edition for AIX, SG24-8106.
1.2 Availability solutions: An overview
Many solutions can provide a wide range of availability options. Table 1-1 lists various types of availability solutions and their characteristics.
Table 1-1 Types of availability solutions
Solution | Downtime | Data availability | Observations
Stand-alone | Days | From last backup | Basic hardware and software costs
Enhanced stand-alone | Hours | Until last transaction | Double the basic hardware cost
High availability clusters | Seconds | Until last transaction | Double hardware and additional services; more costs
Fault-tolerant computing | Zero downtime | No loss of data | Specialized hardware and software, very expensive
High availability solutions, in general, offer the following benefits:
Standard hardware and networking components (can be used with the existing hardware).
Works with nearly all applications.
Works with a wide range of disks and network types.
Excellent availability at reasonable cost.
The highly available solution for IBM POWER systems offers distinct benefits:
Proven solution (more than 20 years of product development)
Using “off the shelf” hardware components
Proven commitment for supporting our customers
IP version 6 (IPv6) support for both internal and external cluster communication
Smart Assist technology enabling high availability support for all prominent applications
Flexibility (virtually any application running on a stand-alone AIX system can be protected with PowerHA)
When you plan to implement a PowerHA solution, consider the following aspects:
Thorough HA design and detailed planning from end to end
Elimination of single points of failure
Selection of appropriate hardware
Correct implementation (do not take “shortcuts”)
Disciplined system administration practices and change control
Documented operational procedures
Comprehensive test plan and thorough testing
A typical PowerHA environment is shown in Figure 1-1. Heartbeating is performed both over the IP networks and, as a non-IP alternative, through the cluster repository disk.
Figure 1-1 PowerHA cluster
1.2.1 Downtime
Downtime is the period when an application is not available to serve its clients. Downtime can be classified in two categories, planned and unplanned:
Planned:
 – Hardware upgrades
 – Hardware/Software repair/replacement
 – Software updates/upgrades
 – Backups (offline backups)
 – Testing (periodic testing is required for cluster validation)
 – Development
Unplanned:
 – Administrator errors
 – Application failures
 – Hardware failures
 – Operating system errors
 – Environmental disasters
The role of PowerHA is to maintain application availability through the unplanned outages and normal day-to-day administrative requirements. PowerHA provides monitoring and automatic recovery of the resources on which your application depends.
1.2.2 Single point of failure (SPOF)
A single point of failure is any individual component that is integrated in a cluster and that, in case of failure, renders the application unavailable for users.
Good design can remove single points of failure in the cluster: nodes, storage, and networks. PowerHA manages these, and also the resources required by the application (including the application start/stop scripts).
Ultimately, the goal of any IT solution in a critical environment is to provide continuous application availability and data protection. High availability is just one building block in achieving the continuous operation goal. High availability is based on the availability of the hardware, software (operating system and its components), application, and network components.
To avoid single points of failure, you need these items:
Redundant servers
Redundant network paths
Redundant storage (data) paths
Redundant (mirrored, RAID) storage
Monitoring of components
Failure detection and diagnosis
Automated application fallover
Automated resource reintegration
As previously mentioned, a good design is able to avoid single points of failure, and PowerHA can manage the availability of the application through downtimes. Table 1-2 lists the cluster objects that, if they fail, can result in loss of availability of the application. Each cluster object can be a physical or logical component.
Table 1-2 Single points of failure
Cluster object | Single point of failure eliminated by
Node (servers) | Multiple nodes
Power supply | Multiple circuits, power supplies, or uninterruptible power supply (UPS)
Network adapter | Redundant network adapters
Network | Multiple networks connected to each node; redundant network paths with independent hardware between each node and the clients
TCP/IP subsystem | Use of non-IP networks to connect each node to its neighbor in a ring
I/O adapter | Redundant I/O adapters
Controllers | Use of redundant controllers
Storage | Redundant hardware, enclosures, disk mirroring or RAID technology, redundant data paths
Application | Configuring application monitoring and backup nodes to acquire the application engine and data
Sites | Use of more than one site for disaster recovery
Resource groups | Use of resource groups to control all resources required by an application
PowerHA also optimizes availability by allowing for dynamic reconfiguration of running clusters. Maintenance tasks such as adding or removing nodes can be performed without stopping and restarting the cluster.
In addition, other management tasks, such as modifying storage and managing users, can be performed on the running cluster by using the Cluster Single Point of Control (C-SPOC) without interrupting user access to the application running on the cluster nodes. C-SPOC also ensures that changes made on one node are replicated across the cluster in a consistent manner.
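As an illustration, most C-SPOC operations are reached through SMIT fast paths; the exact menu layout varies by release, so treat the following as a sketch to verify on your level:
smitty sysmirror     # top-level PowerHA SystemMirror menu
smitty cl_admin      # System Management (C-SPOC) menus: LVM, users, services, and so on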
1.3 History and evolution
IBM High Availability Cluster Multiprocessing (HACMP) development started in 1990 to provide high availability solutions for applications running on IBM RS/6000® servers. We do not provide information about the early releases, which are no longer supported or were not in use at the time of writing. Instead, we provide highlights about the most recent versions.
Originally designed as a stand-alone product (known as HACMP classic), after the IBM high availability infrastructure known as Reliable Scalable Cluster Technology (RSCT) became available, HACMP adopted this technology and became HACMP Enhanced Scalability (HACMP/ES), because it provides performance and functional advantages over the classic version. Starting with HACMP v5.1, there are no more classic versions. The HACMP name was later replaced with PowerHA in v5.5, and then with PowerHA SystemMirror in v6.1.
Starting with PowerHA 7.1, the Cluster Aware AIX (CAA) feature of the operating system is used to configure, verify, and monitor the cluster services. This major change improved the reliability of PowerHA because the cluster service functions now run in kernel space rather than user space. CAA was introduced in AIX 6.1 TL6. At the time of writing this book, the current release is PowerHA 7.1.3 SP1.
1.3.1 PowerHA SystemMirror Version 7.1.1
Released in late 2011, PowerHA 7.1.1 introduced improvements to PowerHA in terms of administration, security, and simplification of management tasks. The following list summarizes the improvements in PowerHA 7.1.1:
Federated Security allows cluster-wide single point of control, such as these:
 – Encrypted File System (EFS) support
 – Role-based access control (RBAC) support
 – Authentication by using LDAP methods
Logical volume manager (LVM) and C-SPOC enhancements, to name several:
 – EFS management by C-SPOC
 – Support for mirror pools
 – Disk renaming inside the cluster
 – Support for EMC, Hitachi, HP disk subsystems multipathing LUN as a clustered repository disk
 – Capability to display disk Universally Unique Identifier (UUID)
 – File system mounting feature (JFS2 Mount Guard), which prevents simultaneous mounting of the same file system by two nodes, which can cause data corruption
Repository resiliency.
Dynamic automatic reconfiguration (DARE) progress indicator.
Application management improvements such as new application startup option.
When you add an application controller, you can choose the application startup mode. You can choose background startup mode, which is the default; cluster activation moves forward while the application start script runs in the background. Alternatively, you can choose foreground startup mode; in this case, cluster activation is sequential, which means that cluster events wait for the application startup script to finish. If the application script ends with a failure (a non-zero return code), the cluster activation is also considered to have failed.
New network features, such as defining a network as private, use of netmon.cf file, and more network tunables.
1.3.2 PowerHA SystemMirror Version 7.1.2
Released in October 2012, PowerHA 7.1.2 continued to add features and functionality:
Two new cluster types (stretched and linked clusters):
 – Stretched cluster refers to a cluster that has sites that are defined in the same geographic location. It uses a shared repository disk. Extended-distance sites with only IP connectivity are not possible with this cluster type.
 – Linked cluster refers to a cluster with only IP connectivity across sites.
IPv6 support reintroduced
Backup repository disk
Site support reintroduced with Standard Edition
PowerHA Enterprise Edition reintroduced:
 – New HyperSwap support added for DS88XX
 – All storage replication options that were supported in PowerHA 6.1 continue to be supported:
 • IBM DS8000® Metro Mirror and Global Mirror
 • SAN Volume Controller Metro Mirror and Global Mirror
 • IBM Storwize® V7000 Metro Mirror and Global Mirror
 • EMC SRDF synchronous and asynchronous replication
 • Hitachi TrueCopy and HUR replication
 • HP Continuous Access synchronous and asynchronous replication
 – Geographic Logical Volume Manager (GLVM)
1.3.3 PowerHA SystemMirror Version 7.1.3
Released in October 2013, PowerHA 7.1.3 continued the development of SystemMirror by adding further improvements in the management, configuration simplification, automation, and performance areas. The following list summarizes the improvements in PowerHA 7.1.3:
Unicast heartbeat
Dynamic host name change
Cluster split and merge handling policies
The clmgr command enhancements (example invocations follow this list):
 – Embedded hyphen and leading digit support in node labels
 – Native HTML report
 – Cluster copying through snapshots
 – Syntactical built-in help
 – Split and merge support
CAA enhancements:
 – Scalability up to 32 nodes
 – Support for unicast and multicast
 – Dynamic host name or IP address support
HyperSwap enhancements:
 – Active-active sites
 – One node HyperSwap
 – Auto resynchronization of mirroring
 – Node level unmanage mode support
 – Enhanced repository disk swap management
PowerHA Plug-in enhancements for IBM Systems Director:
 – Restore snapshot wizard
 – Cluster simulator
 – Cluster split/merge support
Smart Assist for SAP enhancements
 
Note: More information about the new features in PowerHA 7.1.3 is in Guide to IBM PowerHA SystemMirror for AIX Version 7.1.3, SG24-8167.
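The following clmgr invocations sketch the kind of queries and built-in help mentioned in the clmgr item above; the available options vary by service pack, so verify them against your installed level:
clmgr                  # with no arguments, displays the built-in syntax help
clmgr query cluster    # summary of the cluster definition
clmgr query node       # list the configured nodes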
1.4 High availability terminology and concepts
To understand the functionality of PowerHA and to use it effectively, it helps to understand several important terms and concepts.
1.4.1 Terminology
The terminology used to describe PowerHA configuration and operation continues to evolve. The following terms are used throughout this book:
Cluster: Loosely coupled collection of independent systems (nodes) or logical partitions (LPARs) organized into a network for the purpose of sharing resources and communicating with each other.
PowerHA defines relationships among cooperating systems where peer cluster nodes provide the services offered by a cluster node if that node is unable to do so. These individual nodes are together responsible for maintaining the functionality of one or more applications in case of a failure of any cluster component.
Node: An IBM Power (System p, System i®, or BladeCenter) system (or LPAR) running AIX and PowerHA that is defined as part of a cluster. Each node has a collection of resources (disks, file systems, IP addresses, and applications) that can be transferred to another node in the cluster in case the node or a component fails.
Clients: A client is a system that can access the application running on the cluster nodes over a local area network (LAN). Clients run a client application that connects to the server (node) where the application runs.
1.4.2 Concepts
The basic concepts of PowerHA can be classified as follows:
Topology: Contains the basic cluster components: nodes, networks, communication interfaces, and communication adapters.
Resources: Logical components or entities that are being made highly available (for example, file systems, raw devices, service IP labels, and applications) by being moved from one node to another. All resources that together form a highly available application or service are grouped together in resource groups (RG).
PowerHA keeps the RG highly available as a single entity that can be moved from node to node in the event of a component or node failure. Resource groups can be available from a single node or, in the case of concurrent applications, available simultaneously from multiple nodes. A cluster can host more than one resource group, thus allowing for efficient use of the cluster nodes.
Service IP label: A label that corresponds to a service IP address and is used for communications between clients and the node. A service IP label is part of a resource group, which means that PowerHA can monitor it and keep it highly available.
IP address takeover (IPAT): The process whereby an IP address is moved from one adapter to another adapter on the same logical network. This adapter can be on the same node or on another node in the cluster. If aliasing is used as the method of assigning addresses to adapters, then more than one address can reside on a single adapter (a quick check of aliased addresses is shown after this terminology list).
Resource takeover: The operation of transferring resources between nodes inside the cluster. If one component or node fails because of a hardware or operating system problem, its resource groups are moved to another node.
Fallover: The movement of a resource group from one active node to another node (backup node) in response to a failure on that active node.
Fallback: The movement of a resource group back from the backup node to the previous node when it becomes available. This movement is typically in response to the reintegration of the previously failed node.
Heartbeat packet: A packet sent between communication interfaces in the cluster, used by the various cluster daemons to monitor the state of the cluster components (nodes, networks, adapters).
RSCT daemons: These consist of two types of processes (topology and group services) that monitor the state of the cluster and each node. The cluster manager receives event information generated by these daemons and takes corresponding (response) actions in case of any failure.
Group leader: The node with the highest IP address as defined in one of the PowerHA networks (the first network available), which acts as the central repository for all topology and group data coming from the RSCT daemons concerning the state of the cluster.
Group leader backup: The node with the next highest IP address on the same arbitrarily chosen network, which acts as a backup for the group leader. It takes over the role of group leader if the group leader leaves the cluster.
Mayor: A node chosen by the RSCT group leader (the node with the next highest IP address after the group leader backup), if one exists; otherwise, it is the group leader backup itself. The mayor is responsible for informing other nodes of any changes in the cluster as determined by the group leader.
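To illustrate IPAT through aliasing (referenced in the IPAT entry above), the service address appears as an additional alias on whichever interface currently hosts it. A quick, read-only check on any node (the adapter name en0 is only an example):
netstat -in     # one line per address; aliased service IP addresses appear on the same interface
ifconfig en0    # shows the base address and any aliases currently on adapter en0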
1.5 Fault tolerance versus high availability
Based on the response time and the response action to detected failures, clusters and systems can belong to one of the following classifications:
Fault-tolerant systems
High availability systems
1.5.1 Fault-tolerant systems
Systems that provide fault tolerance are designed to operate virtually without interruption, regardless of the failure that might occur (except perhaps for a complete site outage caused by a natural disaster). In such systems, all components are at least duplicated, in both software and hardware.
All components (CPUs, memory, and disks) have a special design and provide continuous service, even if one subcomponent fails. Only specialized software solutions can run on fault-tolerant hardware.
Such systems are expensive and extremely specialized. Implementing a fault tolerant solution requires a lot of effort and a high degree of customization for all system components.
For environments where no downtime is acceptable (life critical systems), fault-tolerant equipment and solutions are required.
1.5.2 High availability systems
The systems configured for high availability are a combination of hardware and software components configured to work together to ensure automated recovery in case of failure with a minimal acceptable downtime.
In such systems, the software involved detects problems in the environment and manages application survivability by restarting the application on the same machine or on another available machine (taking over the identity of the original node).
Therefore, eliminating all single points of failure (SPOF) in the environment is important. For example, if the machine has only one network interface (connection), provide a second network interface (connection) in the same node to take over in case the primary interface providing the service fails.
Another important issue is to protect the data by mirroring and placing it on shared disk areas, accessible from any machine in the cluster.
The PowerHA software provides the framework and a set of tools for integrating applications in a highly available system.
Applications to be integrated in a PowerHA cluster can require a fair amount of customization, possibly both at the application level and at the PowerHA and AIX platform level. PowerHA is a flexible platform that allows integration of generic applications running on the AIX platform, providing for highly available systems at a reasonable cost.
Remember, PowerHA is not a fault tolerant solution and should never be implemented as such.
1.6 Software planning
In the process of planning a PowerHA cluster, one of the most important steps is to choose the software levels to be running on the cluster nodes.
The decision factors in node software planning are as follows:
Operating system requirements: AIX version and recommended levels.
Application compatibility: Ensure that all requirements for the applications are met, and supported in cluster environments.
Resources: Types of resources that can be used (IP addresses, storage configuration, if NFS is required, and so on).
1.6.1 AIX level and related requirements
Before you install PowerHA, check the related software level requirements.
Table 1-3 shows the required PowerHA and AIX levels at the time this book was written.
Table 1-3 AIX level requirements
PowerHA version | AIX level | Required APAR | Minimum RSCT level
PowerHA v6.1 | 5300-09 | | 2.4.12.0
PowerHA v6.1 | 6100-02-01 | | 2.5.4.0
PowerHA v6.1 | 7100-02-01 | PowerHA 6.1 SP3 | 3.1.0.0
PowerHA v7.1 | 6100-06 | | 3.1.0.0
PowerHA v7.1 | 7100-00 | | 3.1.0.0
PowerHA v7.1.1 | 6100-07-02 | | 3.1.2.0
PowerHA v7.1.1 | 7100-01-02 | | 3.1.2.0
PowerHA v7.1.2 | 6100-08-01 | | 3.1.4.0
PowerHA v7.1.2 | 7100-02-01 | | 3.1.4.0
PowerHA v7.1.3 | 6100-09-01 | | 3.1.5.1
PowerHA v7.1.3 | 7100-03-01 | | 3.1.5.1
APAR: authorized program analysis report
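To verify the levels that are already on a node, commands such as the following can be used (the PowerHA file set is present only if PowerHA is already installed):
oslevel -s                               # AIX level, including technology level and service pack
/usr/bin/lslpp -l rsct.core.rmc          # installed RSCT level
/usr/bin/lslpp -l cluster.es.server.rte  # installed PowerHA level, if any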
The current list of recommended service packs for PowerHA is available at the following web page:
The following AIX base operating system (BOS) components are prerequisites for PowerHA (a quick way to check them is shown after this list):
bos.adt.lib
bos.adt.libm
bos.adt.syscalls
bos.ahafs
bos.cluster
bos.clvm.enh
bos.data
bos.net.tcp.client
bos.net.tcp.server
bos.rte.SRC
bos.rte.libc
bos.rte.libcfg
bos.rte.libcur
bos.rte.libpthreads
bos.rte.lvm
bos.rte.odm
cas.agent (optional, but required only for IBM Systems Director plug-in)
devices.common.IBM.storfwork.rte (optional, but required for sancomm)
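A simple way to confirm that these file sets are present is the lslpp command, for example (the quoted wildcard covers the bos.cluster sub-file sets):
/usr/bin/lslpp -L "bos.cluster*" bos.ahafs bos.clvm.enh      # CAA-related prerequisites
/usr/bin/lslpp -L bos.adt.lib bos.adt.libm bos.adt.syscalls  # application development prerequisites
/usr/bin/lslpp -L bos.rte.lvm bos.rte.odm bos.net.tcp.client bos.net.tcp.server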
Requirements for NFSv4
The cluster.es.nfs file set that is included on the PowerHA installation medium installs the NFSv4 support for PowerHA, along with an NFS Configuration Assistant. To install this file set, the following BOS NFS components must also be installed on the system (an example follows the list):
AIX Version 6.1:
 – bos.net.nfs.server 6.1.9.0
 – bos.net.nfs.client 6.1.9.0
AIX Version 7.1:
 – bos.net.nfs.server 7.1.3.0
 – bos.net.nfs.client 7.1.3.0
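For example, to confirm the NFS file set levels and then add the PowerHA NFSv4 support from the installation medium (the device name /dev/cd0 is only an example):
/usr/bin/lslpp -l bos.net.nfs.server bos.net.nfs.client   # must meet the levels listed above
installp -agXYd /dev/cd0 cluster.es.nfs                   # install the NFSv4 support and Configuration Assistant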
Requirements for RSCT
Install the RSCT file sets before installing PowerHA. Ensure that each node has the same version of RSCT.
To determine if the appropriate file sets are installed and what their levels are, issue the following commands:
/usr/bin/lslpp -l rsct.compat.basic.hacmp
/usr/bin/lslpp -l rsct.compat.clients.hacmp
/usr/bin/lslpp -l rsct.basic.rte
/usr/bin/lslpp -l rsct.core.rmc
If the file sets are not present, install the appropriate version of RSCT.
1.6.2 Licensing
Most software vendors require a separate license for each application on each physical machine, and often also on a per-core basis. Usually, the license activation code is entered at installation time.
However, in a PowerHA environment, in a takeover situation, if the application is restarted on a different node, be sure that you have the necessary activation codes (licenses) for the new machine; otherwise, the application might not start properly.
The application might also require a unique node-bound license (a separate license file on each node).
Some applications also have restrictions with the number of floating licenses available within the cluster for that application. To avoid this problem, be sure that you have enough licenses for each cluster node so the application can run simultaneously on multiple nodes (especially for concurrent applications).
For current information about PowerHA licensing, see the list of frequently asked questions:
1.7 PowerHA software installation
The PowerHA software provides a series of facilities that you can use to make your applications highly available. Remember, not all system or application components are protected by PowerHA.
For example, if all the data for a critical application resides on a single disk, and that specific disk fails, then that disk is a single point of failure for the entire cluster, and is not protected by PowerHA. AIX logical volume manager or storage subsystems protection must be used in this case. PowerHA only provides takeover for the disk on the backup node, to make the data available for use.
This is why PowerHA planning is so important, because your major goal throughout the planning process is to eliminate single points of failure. A single point of failure exists when a critical cluster function is provided by a single component. If that component fails, the cluster has no other way of providing that function, and the application or service dependent on that component becomes unavailable.
Also keep in mind that a well-planned cluster is easy to install, provides higher application availability, performs as expected, and requires less maintenance than a poorly planned cluster.
1.7.1 Checking for prerequisites
After you complete the planning worksheets, verify that your system meets the requirements of PowerHA. Many potential errors can be eliminated if you make this extra effort. See Table 1-3 on page 13.
1.7.2 New installation
PowerHA can be installed using the AIX Network Installation Management (NIM) program, including the Alternate Disk Migration option. You must install the PowerHA file sets on each cluster node. You can install PowerHA file sets either by using NIM or from a local software repository.
Installation using an NIM server
We suggest using NIM because it allows you to load the PowerHA software onto other nodes faster from the server than from other media. Furthermore, it is a flexible way of distributing, updating, and administering your nodes. It allows you to install multiple nodes in parallel and provides an environment for maintaining software updates. This is useful and a time saver in large environments; for smaller environments, a local repository might be sufficient.
If you choose NIM, you must copy all the PowerHA file sets onto the NIM server and define an lpp_source resource before proceeding with the installation.
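A minimal sketch of that preparation follows; the resource name, directory, and client node name (powerha713_lpp, /export/lpp_source/powerha713, and nodeA) are examples only:
# Define an lpp_source resource that contains the PowerHA file sets
nim -o define -t lpp_source -a server=master -a location=/export/lpp_source/powerha713 powerha713_lpp
# Install the base PowerHA file sets on a client node
nim -o cust -a lpp_source=powerha713_lpp -a filesets="cluster.es.server cluster.es.client cluster.cspoc" nodeA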
Installation from CD/DVD or hard disk
If your environment has only a few nodes, or if the use of NIM is more than you need, you can use CD/DVD installation or create a local repository by copying the PowerHA file sets locally and then using the exportfs command, which allows other nodes to access the data by using NFS.
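A possible sequence, with example device and directory names, is shown here:
bffcreate -d /dev/cd0 -t /export/powerha713 all   # copy the installp images to a local directory
exportfs -i -o ro /export/powerha713              # export the directory so other nodes can mount it over NFS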
1.7.3 Installing PowerHA
Before installing PowerHA SystemMirror for AIX, read the following release notes (in the /usr/es/sbin/cluster/ directory) for current information about requirements or known issues:
PowerHA Standard Edition: /usr/es/sbin/cluster/release_notes
PowerHA Enterprise Edition: /usr/es/sbin/cluster/release_notes_xd
Smart Assists: /usr/es/sbin/cluster/release_notes_assist
More details about installing and configuring are in Chapter 4, “Installation and configuration” on page 133.
To install the PowerHA software on a server node, complete the following steps (an installp alternative is shown after the procedure):
1. If you are installing directly from the installation media, such as a CD/DVD or from a local repository, enter the smitty install_all fast path command. The System Management Interface Tool (SMIT) displays the “Install and Update from ALL Available Software” panel.
2. Enter the device name of the installation medium or installation directory in the INPUT device/directory for software field and press Enter.
3. Enter the corresponding field values.
To select the software to install, press F4 for a software listing, or enter all to install all server and client images. Select the packages you want to install according to your cluster configuration. Some of the packages might require prerequisites that are not available in your environment.
The following file sets are required and must be installed on all servers:
 – cluster.es.server
 – cluster.es.client
 – cluster.cspoc
Read the license agreement and select Yes in the Accept new license agreements field. You must choose Yes for this item to proceed with installation. If you choose No, the installation might stop, and issue a warning that one or more file sets require the software license agreements. You accept the license agreement only once for each node.
4. Press Enter to start the installation process.
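As an alternative to SMIT, the same file sets can be installed with the installp command; the following is a sketch that uses an example source directory (the -Y flag accepts the license agreements):
installp -agXYd /export/powerha713 cluster.es.server cluster.es.client cluster.cspoc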
 
Tip: A good practice is to download and install the latest PowerHA Service Pack at the time of installation:
Post-installation steps
To complete the installation, complete the following steps:
1. Verify the software installation by using the AIX lppchk command, and check the installed directories to see if the expected files are present.
2. Run the lppchk -v and lppchk -c cluster* commands (shown after these steps). Both commands run clean if the installation is good; if not, use the proper problem determination techniques to fix any problems.
3. A reboot might be required if RSCT prerequisites have been installed since the last time the system was rebooted.
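The verification commands from steps 1 and 2 can be run as follows; clean output indicates a good installation:
lppchk -v                        # check for incomplete or inconsistent file set installations
lppchk -c "cluster*"             # verify checksums and sizes of the PowerHA file sets
/usr/bin/lslpp -l "cluster.*"    # confirm the installed PowerHA file sets and levels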
More information
For more information about upgrading PowerHA, see Chapter 5, “Migration” on page 151.