5.3. CSM administration

In 5.1, “CSM concepts and architecture” on page 212, we touch on the topics of CSM management and administration as a basic introduction to the main features of CSM and how they function.

In this section, we examine these administration topics in detail by using examples and sample scenarios, and discuss the following areas:

  • Log file management

  • Managing Node groups

  • Hardware control

  • Configuration File Manager (CFM)

  • Software Maintenance System (SMS)

  • CSM monitoring

  • Diagnostic probes

  • CSM backup

  • Querying CSM database

  • CSM problem determination and diagnostics

  • CSM hostname changes

5.3.1. Log file management

CSM logs to several different log files during installation and cluster management. These log files are available on the management server and managed nodes, and they help you determine the status of a command or troubleshoot a CSM issue.

Most of the CSM log files on the management server are located in the /var/log/csm directory. Table 5-1 lists the log files on the management server and their purpose.

Table 5-1. Log files on management server

Log file                                            Purpose
/var/log/csm/install.log                            csm.core install log
/var/log/csm/installms.log                          Output of the installms command
/var/log/csm/installnode.log                        Verbose output of the installnode command
/var/log/csm/installnode.node.log.*                 hmc_nodecond output for each node's installation
/var/log/csm/csmsetupyast.log                       Output of the csmsetupyast command
/var/log/csm/updatenode.log                         Output of the updatenode command
/var/log/csm/smsupdatenode.log                      Output of the smsupdatenode command
/var/log/csm/cfmerror.log                           CFM error log
/var/log/csm/cfmchange.log                          Output of CFM file updates
/var/log/csm/hw_logfile                             Hardware control daemon status log
/var/log/csm/hmc[IP_address].log.*                  HMC communication error messages
/var/log/csm/hmc[IP_address].java_trace             Tracing for openCIMOM calls to the HMC
/var/log/csm/hmc_logfile.314                        Tracing for libhmc_power.so
/var/log/csm/getadapters/getadapters.node.log.*     Output of the getadapters command for each node
/var/ct/RMstart.log                                 Resource manager status log
/var/ct/*.stderr                                    Standard error output of the RSCT daemons
Linux log files in /var/log                         Refer to problem determination section 3
Other Linux log files                               Refer to problem determination section 3

Table 5-2 on page 253 lists log files on managed nodes and their purpose.

Table 5-2. Log files on managed nodes

Log file                           Purpose
/var/log/csm/install.log           csm.core install log
/var/log/csm/updatekernel.log      Kernel update log written when smsupdatenode runs
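
For example, to follow a node installation or check for CFM problems directly from the management server (the paths are taken from Table 5-1):

# tail -f /var/log/csm/installnode.log
# grep -i error /var/log/csm/cfmerror.log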

5.3.2. Node groups

Managed nodes can be grouped together by using the nodegrp command. Distributed commands can be issued against groups for common tasks, instead of performing them on each node. Default node groups created at install time are shown in Example 5-27.

Example 5-27. nodegrp command
# nodegrp
ManagedNodes
AutoyastNodes
ppcSLES81Nodes
AllNodes
SuSE82Nodes
SLES72Nodes
pSeriesNodes
SLES81Nodes
LinuxNodes
PreManagedNodes
xSeriesNodes
EmptyGroup
APCNodes
RedHat9Nodes
MinManagedNodes

Node groups are created with the nodegrp command:

# nodegrp -a lpar1,lpar2 testgroup

This creates a group called testgroup that includes the nodes lpar1 and lpar2. For more information, refer to the nodegrp man page.

Distributed commands such as dsh can be run against node groups:

# dsh -N testgroup date
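
If ssh is used instead of rsh, the remote shell for dsh can be selected with the DSH_REMOTE_CMD environment variable (referenced later in the cfmupdatenode usage). A brief sketch, assuming the testgroup node group created above; the -N and -a flags follow the conventions of the other distributed commands in this chapter, so verify them against the dsh man page:

# export DSH_REMOTE_CMD=/usr/bin/ssh
# dsh -N testgroup "uname -r"
# dsh -a date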

5.3.3. Hardware control

The CSM hardware control feature is used to remotely control HMC-attached pSeries servers. Remote nodes can be powered on and off, their power status can be queried, and a remote console can be opened from the management server.

For the hardware control function, all pSeries servers must be connected to an HMC, and the HMC must be able to communicate with the management server. Figure 5-7 shows the hardware control design for a simple CSM cluster.

Figure 5-7. pSeries CSM cluster with hardware control using HMC


Hardware control uses openCIMOM (public software) and the conserver software to communicate with the HMC and issue remote commands. During startup, the IBM.HWCTRLRM daemon subscribes to HMC openCIMOM events and maintains their state. Conserver is started at boot time on the management server and reads its configuration from /etc/opt/conserver/conserver.cf.

The following hardware control commands are available on the management server:

rpower Powers nodes on and off and queries power status

rconsole Opens a remote serial console for nodes

chrconsolecfg Removes, adds and re-writes conserver config file entries

rconsolerefresh Refreshes conserver on the management server

getadapters Obtains MAC addresses of remote nodes

lshwinfo Collects node information from Hardware Control points

systemid Stores userid and encrypted password required to access remote hardware

The rpower and rconsole commands are frequently used hardware control commands and we discuss them in detail here:

Remote power

Remote power commands access the CSM database for node attribute information.

PowerMethod Node attribute must be set to hmc to access pSeries nodes.

HardwareControlPoint is the hostname or IP address of the Hardware Management Console (HMC).

HardwareControlNodeId is the hostname or IP address of the managed node which is attached to the HMC over a serial link.

Other Node attributes such as HWModel, HWSerialNum, HWType are obtained automatically using lshwinfo.

Remote power configuration is outlined in 5.2.5, “Installing CSM on the management server” on page 228.
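
As a brief sketch of typical usage (rpower -a query also appears in the problem determination table later in this section; targeting individual nodes with -n is an assumption to verify against the rpower man page):

# rpower -a query
# rpower -n lpar1,lpar2 off
# rpower -n lpar1,lpar2 on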

Remote console

The remote console command communicates with the console server to open a remote console to nodes over the management VLAN and serial connections. The HMC acts as the remote console server, listening for requests from the management server.

Only one read-write console, but multiple read-only consoles, can be opened to each node by using the rconsole command.
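
For example, to attach a node's console to the current terminal window (the -t flag is referenced in the problem determination table later in this section; the -n flag for selecting the node is an assumption to verify against the rconsole man page):

# rconsole -t -n lpar1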

5.3.4. Configuration File Manager (CFM)

Configuration File Manager (CFM) is a CSM component that centralizes and distributes files across the managed nodes in a CSM cluster. It is similar to file collections in IBM PSSP. Common files such as /etc/hosts are distributed across the cluster from the management server by using a push mechanism through root's crontab and/or event monitoring. CFM uses rdist to distribute files. Refer to 5.1.7, “CSM diagnostic probes” on page 220 for more information on hostname changes.

CFM uses /cfmroot as its main root directory, which is linked to /etc/opt/csm/cfmroot on the management server. File permissions are preserved during copying. Make sure that you have enough space in your root file system, or create /cfmroot on a separate partition and symlink it from /etc/opt/csm/cfmroot.

Example 5-28 shows cfmupdatenode usage.

Example 5-28. cfmupdatenode usage
Usage: cfmupdatenode [-h] [-v | -V]
                [-a | -N node_group[,node_group] | --file file ] [-b]
                [[-y] | [-c]] [-q [-s] ] [-r remote shell path]
                [-t timeout] [-M number of max children]
                [-d location for distfile] [-f filename] [[-n] node_list]
     -a         Files are distributed to all nodes. This option cannot be
                used with the -N or host positional arguments.
     -b         Backup. Preserve existing configuration file (on nodes) as
                "filename".OLD
     -c         Perform binary comparison on files and transfer them
                if they differ.
     -d distfile location
                cfmupdatenode will generate a distfile in the given (absolute)
                path and exit (without transferring files). This way the user
                can execute Rdist with the given distfile and any options
                desired.
     -f  filename
                Only update the given filename. The filename must be the
                absolute path name of the file and the file must reside in
                the cfmroot directory
     --file filename
                Specifies a file that contains a list of node names. If the
                file name is "-", the list is read from stdin. The file can
                contain multiple lines, and each line can have one or more
                node names, separated by spaces.
     -h         Writes the usage statement to standard out.
     [-n]  node_list
                Specifies a list of node hostnames, IP addresses, or node
                ranges on which to run the command. (See the noderange man
                page for information on node ranges.)
     -M  number of maximum children
                Set the number of nodes to update concurrently.
                (The default is 32.)
     -N Node_group[,Node_group...]
                Specifies one or more node groups on which to run the command.
     -q         Queries for out of date CFM files across the cluster.
     -s         Reports which nodes are up to date by comparing last CFM
                update times. Must be called with the -q option.
     -r remote  shell path.
                Path to remote shell. (The default is the DSH_REMOTE_CMD
                environment variable, or /usr/bin/rsh).
     -t timeout
                Set the timeout period (in seconds) for waiting for response
                from a remote process. (The default is 900).
     -v | V     Verbose mode.
     -y         Younger mode. Does not update files younger than master copy.

Note

CFM can be set up prior to running the installnode command, and common files are distributed at install time while installing nodes.


At CSM install time, root’s crontab is updated with an entry to run cfmupdatenode every day at midnight. This can be changed to suit your requirements.

# crontab -l | grep cfmupdate
0 0 * * * /opt/csm/bin/cfmupdatenode -a 1>>/var/log/csm/cfmerror.log 2>>/var/log/csm/cfmerror.log

Some common features of CFM, along with usage examples, are described here.

  • In general, it is important to have a single /etc/hosts file across the management cluster. The CSM database and other commands resolve hostnames either through /etc/hosts or DNS. To keep a single copy of /etc/hosts, symlink /etc/hosts to /cfmroot/etc/hosts:

    # ln -s /etc/hosts /cfmroot/etc/hosts
    
  • Run the cfmupdatenode command to copy the hosts file to all managed nodes defined in the CSM database:

    # cfmupdatenode -a
    
  • If you want to have a file that is different on the management server and all managed nodes, copy or create the file in /cfmroot instead of symlinking it, and then distribute it to the nodes. Files in /cfmroot are not distributed to the management server.

    # cp /etc/file.1 /cfmroot/etc/file.1
    # touch /cfmroot/etc/file.1
    # cfmupdatenode -a
    
  • Files can be distributed to selected node groups only, instead of to all managed nodes, by creating the files with a ._groupname extension. For example, to distribute the ntp.conf file to node group group1, create the file as ntp.conf._group1; the next time cfmupdatenode runs, it copies the ntp.conf file only to the group1 nodes.

    # cp /etc/ntp.conf /cfmroot/etc/ntp.conf._group1
    # cfmupdatenode -a
    
  • CFM customizations can be performed around file distribution by using pre-install and post-install scripts. Create script files with the .pre and .post extensions in cfmroot; when cfmupdatenode runs, the pre- and post-install scripts are run before and after the file is distributed. Example 5-29 shows a pre-install script that saves a copy of the existing file, distribution of the file, and a post-install script that restarts the NTP service.

    Example 5-29. Example of cfmupdatenode
    Create the pre- and post-install scripts:
    root@ms# cat >/cfmroot/etc/ntp.conf.pre
    #!/bin/sh
    cp /etc/ntp.conf /etc/ntp.conf.`date +%Y%m%d`
    ^D
    root@ms# cat >/cfmroot/etc/ntp.conf.post
    #!/bin/sh
    /sbin/service ntp restart
    ^D
    root@ms# chmod 755 /cfmroot/etc/ntp.conf.pre /cfmroot/etc/ntp.conf.post
    root@ms# cp /etc/ntp.conf /cfmroot/etc/ntp.conf._group1
    root@ms# cfmupdatenode -a
    

  • To have event monitoring monitor CFM file modifications and push the files whenever files are modified, start the condition and responses as below:

    # startcondresp CFMRootModTimeChanged CFMModResp
    

Whenever a file in /cfmroot is modified, the changes are propagated to all managed nodes in the cluster.

Note

Use caution while enabling CFM event monitoring, as it can impact system performance.


User id management with CFM

CFM can be used to implement centralized user id management in your management domain. User ids and passwords are generated on the management server, stored under /cfmroot, and distributed to nodes as scheduled.

Copy the following files to /cfmroot to set up effective user id management:

  • /etc/passwd -> /cfmroot/etc/password_useridmgmt.group

  • /etc/shadow -> /cfmroot/etc/shadow_useridmgmt.group

  • /etc/group -> /cfmroot/etc/group_useridmgmt.group

  • /etc/hosts -> /cfmroot/etc/hosts_useridmgmt.group

Be aware that any id and password changes made on the nodes will be lost once centralized user id management is implemented. However, you can force users to change their passwords on the management server instead of on nodes. Set up scripts or tools to centralize user id creation and password change by group on the management server, and disable password command privileges on managed nodes.

CFM distributes files to managed nodes, but never deletes them. If a file needs to be deleted, delete it manually or with a dsh command from the management server. All CFM updates and errors are logged to files /var/log/csm/cfmchange.log and /var/log/csm/cfmerror.log.

For more information, refer to IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873.

5.3.5. Software maintenance

The CSM Software Maintenance System (SMS) is used to install, query, update, and delete Linux RPM packages on the management server and managed nodes. Software maintenance is performed with the smsupdatenode command. The Autoupdate open source package is a prerequisite for using SMS.

SMS uses either install mode to install new RPM packages, or update mode to update existing RPM packages on cluster nodes. Preview or test mode only tests the update without actually installing the packages.

The SMS directory structure uses /csminstall/Linux/InstallOSName/InstallOSVersion/InstallOSArchitecture with RPMS, updates, and install subdirectories to hold the dependency RPMs, RPM updates, and new install packages, respectively. A sample SMS directory structure on SLES 8.1 looks like the following:

  • /csminstall/Linux/SLES/8.1/ppc64/RPMS - contains all dependent RPM packages

  • /csminstall/Linux/SLES/8.1/ppc64/updates - contains all RPM package updates

  • /csminstall/Linux/SLES/8.1/ppc64/install - contains all new RPM packages that are not installed with the OS and need to be installed. Third-party vendor software can also be placed in this subdirectory.

Copy the requisite RPM packages from the Install or Update CDs into the respective subdirectories.
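
For the base distribution CD-ROMs, the --copy flag shown in the usage below can populate the /csminstall tree automatically. A sketch for the SLES 8.1 ppc64 example above (verify the attribute values against your node definitions):

# smsupdatenode --path /mnt/cdrom --copy InstallDistributionName=SLES \
  InstallDistributionVersion=8.1 InstallPkgArchitecture=ppc64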

Note

SMS is only for maintaining RPM packages. OS patch CDs cannot be applied directly to update OS packages; the individual RPMs must be copied into the SMS directories first.

Follow these steps to copy the RPM packages from patch CDs to the respective subdirectories, and then run smsupdatenode:

1. Mount the patch CD on /mnt/cdrom.

2. Copy the RPM packages into the updates directory:
   # cd /mnt/cdrom; cp `find . -name "*.rpm"` /csminstall/Linux/SLES/8.1/ppc64/updates

3. Run the update:
   # smsupdatenode -v


Example 5-30 shows usage of smsupdatenode.

Example 5-30. smsupdatenode usage
Usage:
smsupdatenode   [-h] [-a | -N node_group[,node_group] | --file file ]
                [-v | -V] [-t | --test] [-q | --query [-c | --common]]
                [--noinsdeps] [-r "remote shell path"]
                [-i | --install packagename[,packagename]]
                [-e | --erase {--deps | --nodeps} packagename[,packagename]]
                [-p | --packages packagename[,packagename]] [[-n] node_list]
smsupdatenode   [--path pkg_path] --copy {attr=value... | hostname}
       -a       Run Software Maintenance on all nodes.
       --copy {attr=value... | hostname}
                Copy the distribution CD-ROMs corresponding to the given
                attributes or hostname to the correct /csminstall directory.
                If you give attr=value pairs they must come at the end of the
                command line. The valid attributes are:
                        InstallDistributionName
                        InstallDistributionVersion
                        InstallPkgArchitecture
                If a hostname is given, the distribution CD-ROMs and
                destination directory are determined by the node's
                attributes.
       -e | --erase {--deps | --nodeps} packagename[,packagename]
                Removes the RPM packages specified after either the --deps
                or --nodeps option.
            --deps
                Removes all packages dependent on the package targeted for
                removal.
            --nodeps
                Only removes this package and leaves the dependent packages
                installed.
       --file filename
                Specifies a file that contains a list of node names. If the
                file name is "-", the list is read from stdin. The file can
                contain multiple lines, and each line can have one or more
                node names, separated by spaces.
       -h       Writes the usage statement to standard out.
       [-n] node_list
                Specifies a list of node hostnames,  IP addresses, or node
                ranges on which to run the command. (See the noderange man
                page for information on node ranges.)
       -i | --install packagename[,packagename]
                Installs the given RPM packages.
       -N Node_group[,Node_group...]
                Specifies one or more node groups on which to run the
                command.
       --noinsdeps
                Do not install RPM dependencies.
       -p | --packages packagename[,packagename]
                Only update the given packages. The user does not have to
                give the absolute path. It will be determined by looking under
                directory structure corresponding to the node.
       --path pkg_path
                Specifies one or more directories, separated by colons, that
                contain copies of the distribution CD-ROMs. The default on a
                Linux system is /mnt/cdrom and the default on an AIX system is
                /dev/cd0. This flag may only be used with the --copy flag.
       -q | --query [-c | --common]
                Query all the RPMs installed on the target machines and report
                the RPMs installed that are not common to every node.
           -c | --common
                Also report the common set of RPMs (installed on every target
                node).
       -r "remote shell path"
                Path to use for remote commands. If this is not set, the
                default is determined by dsh.
       -t | --test
                Report  what would be done by this command without making any
                changes to the target system(s)
       -v | -V    Verbose mode.

SMS writes logs to /var/log/csm/smsupdatenode.log files.

Kernel packages are updated as normal RPM packages using SMS. Once upgraded, the kernel cannot be backed out, so use caution when running the smsupdatenode command with any kernel packages (kernel* prefix).

Also, make sure to run lilo to reload the boot loader if you upgrade the kernel and want to boot the new kernel.
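
For example, a cautious update sequence previews the changes first and then updates a single package on one node group (mypkg is a hypothetical package name; the -t, -N, and -p flags are documented in the usage above):

# smsupdatenode -t -a
# smsupdatenode -N testgroup -p mypkg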

5.3.6. CSM Monitoring

CSM uses the Reliable Scalable Cluster Technology (RSCT) infrastructure for event monitoring. RSCT has been proven to provide a highly available and scalable infrastructure in products such as GPFS and PSSP.

CSM Monitoring uses a condition and response-based system to monitor system resources such as processes, memory, CPU and file systems. A condition can be a quantified value of a monitored resource attribute, and is based on a defined event expression. If an event expression is true, then an event is generated.

For example, file system utilization (/var) is a resource to be monitored, and the condition can be the percent utilization of that resource. The expression /var > 90% means that if /var file system utilization rises above the 90% threshold, the event expression is true and an event is generated. To prevent a flood of events, a re-arm expression can be created; in that case, no further event is generated until the re-arm expression becomes true.

A response can be one or more actions performed when an event is triggered for a defined condition. Considering the file system resource example, if we define that a response action is to increase the file system by 1 MB if /var reaches above 90% and to notify the system administrator, then after monitoring is started, whenever /var goes above 90%, a response action is performed automatically.

A set of predefined conditions and responses is available after CSM installation. See the IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873, for more information.
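
The shipped conditions and responses, and the condition/response pairs that are currently active, can be listed from the management server (lscondition, lsresponse, and lscondresp are standard RSCT commands):

# lscondition
# lsresponse
# lscondresp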

Resource Monitoring and Control (RMC) and Resource Managers (RMs)

Resource Monitoring and Control (RMC) and Resource Managers (RM) are components of RSCT and are critical for monitoring.

  • RMC provides monitoring of system resources. The RMC daemon monitors resources and alerts RM.

  • RM can be defined as a process that maps resources and resource classes to commands for one or more resources. Resource classes contain resource attributes and descriptions and are available for query through the command line.

Table 5-3 lists available resource managers and their functions.

Table 5-3. Resource managers

Resource manager    Function
IBM.AuditRM         Audit logging
IBM.ERRM            Event response resource manager
IBM.HWCTRLRM        Hardware control
IBM.DMSRM           Domain node management
IBM.HostRM          Host resource management
IBM.SensorRM        Sensor monitoring
IBM.FSRM            File system management

Table 5-4 lists the predefined resource classes, which can be displayed with the lsrsrc command.

Table 5-4. Predefined resource classes

Resource class          Description
IBM.Association         Persistent resources
IBM.Auditlog            Event audit logging
IBM.AuditlogTemplate    Template for audit logging
IBM.Condition           Predefined conditions
IBM.EthernetDevice      Primary Ethernet device
IBM.EventResponse       Predefined responses
IBM.Host                Management server host
IBM.FileSystem          File system attributes
IBM.Program
IBM.TokenRing           Token ring device
IBM.Sensor              CFM root and MinManaged
IBM.ManagedNode         Managed node
IBM.ManagementServer    Management server on nodes
IBM.NodeAuthenticate    Node authentication
IBM.PreManagedNode      PreManaged node classification
IBM.NodeGroup           Node groups
IBM.NetworkInterface    Defined network interface
IBM.DmsCtrl             Domain control
IBM.NodeHwCtrl          Node hardware control point attributes
IBM.HwCtrlPoint         Hardware control point (HMC)
IBM.HostPublic

Running lsrsrc -l Resource_class lists the detailed attributes of each resource class. Check the lsrsrc man page for more details.
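
For example, to display the detailed attributes of the managed node class from Table 5-4:

# lsrsrc -l IBM.ManagedNode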

Customizing event monitoring

As explained, custom conditions and responses can be created and custom monitoring can be activated on one or more nodes as follows:

  • Create a custom condition with an event expression; for example, to monitor file system space used on node lpar1 only, in the management domain:

    #mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" -E "PercentTotUsed
    < 85" -n lpar1 -m d "File system space used"
    

    wherein:

    -r specifies the resource class

    -e creates the event expression

    -E creates the re-arm expression

    -n specifies a node

    -m specifies the management scope; -m d (management domain) is required when -n is used

    The final argument, "File system space used", is the name of the new condition

  • Create a custom response, such as e-mailing root through the notifyevent script, to run on Sunday, Monday, and Tuesday:

    #mkresponse -n "E-mail root" -d 1+2+3 -s "/usr/sbin/rsct/bin/notifyevent
    root" -e b "E-mail root any time"
    

    wherein:

    -n names the action

    -d specifies the days of the week on which to run (1+2+3 is Sunday, Monday, and Tuesday)

    -s specifies the script or command to run

    -e specifies the type of event that runs the action; b means both the event and the re-arm event

    The final argument, "E-mail root any time", is the name of the new response

  • Link the "File system space used" condition to the "E-mail root any time" response:

    #mkcondresp "File system space used" "E-mail root any time"
    
  • Start the created condition and responses linked above:

    #startcondresp "File system space used" "E-mail root any time"
    
  • List the condition and responses to check the status. State is listed as “Active” if started and “Not Active” if not started.

    #lscondresp
    

Example 5-31 shows the output of lscondresp.

Example 5-31. lscondresp output
#lscondresp
Displaying condition with response information:
Condition                      Response                         Node       State
"NodeFullInstallComplete"      "RunCFMToNode"                   "mgmt_server" "Active"
"NodeManaged"                  "GatherSSHHostKeys"              "mgmt_server" "Active"
"UpdatenodeFailedStatusChange" "UpdatenodeFailedStatusResponse" "mgmt_server" "Active"
"NodeChanged"                  "rconsoleUpdateResponse"         "mgmt_server" "Active"
"NodeFullInstallComplete"      "removeArpEntries"               "mgmt_server" "Active"
"FileSystem Space Used"        "E-mail root any time"          "mgmt_server" "Active"
						

If any file system on lpar1 exceeds 90% utilization, our newly created condition triggers an event and the response action e-mails root. Monitoring re-arms once utilization drops back below 85%.

Multiple response actions can be defined for a single condition, and a single response can be assigned to multiple conditions. For the file system example, other possible actions include extending the file system or deleting files older than 60 days to reclaim space.
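
For instance, a second response that runs a clean-up script can be linked to the same condition. A sketch that reuses the mkresponse and mkcondresp forms shown above (/usr/local/bin/clean_var.sh is a hypothetical script):

# mkresponse -n "Clean /var" -s "/usr/local/bin/clean_var.sh" -e b "Clean /var on threshold"
# mkcondresp "File system space used" "Clean /var on threshold"
# startcondresp "File system space used" "Clean /var on threshold"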

5.3.7. Diagnostic probes

CSM diagnostic probes help you diagnose system problems by using programs called probes. The CSM command probemgr runs these probes to determine problems; users can also write their own diagnostic scripts and run them through probemgr.

All predefined probes are located in the /opt/csm/diagnostics/probes directory; a user-defined probe directory, specified with the -D option, is searched before the predefined probes. Probes can depend on one another, so they are run in a defined order. Example 5-32 shows the usage of probemgr.

Example 5-32. Probemgr usage
probemgr [-dh] [-c {0|10|20|127}] [-l {0|1|2|3|4}]
         [-e prb,prb,...] [-D dir] [-n prb]
    -h        Display usage information
    -d        Show the probes dependencies and the run order
    -c        Highest level of exit code returned by a probe that the
              probe manager permits before terminating. The default value
              is 10.
                0   - Success
                10  - Success with Attention Messages
                20  - Failure
                127 - Internal Error
    -l        Indicates the message output level. The default is 3
                0 - Show probe manager messages, probe trace messages,
                    probe explanation and suggested action messages, probe
                    attention messages and probe error messages
                1 - Show probe trace messages, probe explanation and
                    suggested action messages, probe attention messages
                    and probe error messages
                2 - Show probe explanation and suggested action messages,
                    probe attention messages and probe error messages
                3 - Show probe attention messages and probe error
                    messages
                4 - Show probe error messages only
    -e prb,.. List of probes to exclude when creating the probe dependency
              tree. This also means that those probes will not be run
    -D dir    Directory where user specified probes reside
    -n prb    Run the specified probe

Table 5-5 lists the default pre-defined probes available and the probe dependencies.

Table 5-5. Probes and dependencies

Probe                 Dependent probes
dsh                   ssh-protocol
nfs                   network
rmc                   network
errm                  rmc
fs-mounts             none
network-ifaces        network-enabled
dmsrm                 rmc
network-routes        network-enabled, network-ifaces
network-hostname      none
network-enabled       none
network-ping          network-enabled, network-ifaces, network-routes
network               network-enabled, network-hostname, network-ifaces,
                      network-routes, network-ipforward, network-ping
network-ipforward     none
ssh-protocol          none
rsh-protocol          none

All probes are run from the management server by using the probemgr command. For detailed information on each probe, refer to the probemgr man page.
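
For example, to display the probe dependency tree and run order, and then run only the network probe with full message output (the -d, -n, and -l flags are documented in the usage above):

# probemgr -d
# probemgr -n network -l 0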

5.3.8. Querying the CSM database

CSM stores all cluster information, such as nodes, attributes of nodes, and so on, in a database at a centralized location in the /var/ct directory. This database is accessed and modified using tools and commands, but not directly with a text editor.

Table 5-6 on page 268 lists commands you can use to access the CSM database.

Table 5-6. CSM database commands

Command          Purpose
lsnode           Lists the defined nodes
definenode       Adds/defines nodes
chnode           Changes node definitions
rmnode           Removes defined nodes
smsupdatenode    Updates software on nodes
installnode      Installs nodes
csmsetupyast     Sets up the configuration for nodes to be installed
cfmupdatenode    Distributes files
rpower           Remote power
rconsole         Remote console
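
A short sketch of querying and changing node attributes with these commands (the -l flag of lsnode and the attribute=value form of chnode are assumptions to verify against the man pages):

# lsnode -l lpar1
# chnode lpar1 Mode=PreManaged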

5.3.9. Un-installing CSM

CSM is un-installed by using the uninstallms command on the management server. Not all packages are removed while running uninstallms. Table 5-7 identifies what is removed and what is not removed with uninstallms.

Table 5-7. Uninstallms features

Item                                                          Action
Node definitions                                              Removed
Node group definitions                                        Removed
Predefined conditions                                         Removed
CSM packages                                                  Removed
CSM log files                                                 Removed
CSM package prerequisites                                     Not removed
RSCT packages when rsct.basic is present                      Not removed
RSCT packages when csm.client is present on the mgmt. server  Not removed
RSCT packages when no rsct.basic is installed                 Removed
/cfmroot                                                      Not removed
/csminstall                                                   Not removed
/opt/csm                                                      Removed
SSH public keys                                               Not removed

To completely erase CSM, manually clean up the packages and directories that the uninstallms command does not remove. Refer to IBM Cluster Systems Management for Linux: Planning and Installation Guide, Version 1.3.2, SA22-7853, for detailed information.
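
A minimal sketch of the manual cleanup, based on Table 5-7 (make sure nothing under these directories is still needed before deleting them):

# rm -rf /cfmroot /csminstall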

5.3.10. Distributed Command Execution Manager (DCEM)

DCEM is a Cluster Systems Management graphical interface that is used to run a variety of tasks on networked computers. Currently, it is not available for pSeries machines.

5.3.11. Backing up CSM

Currently CSM backup and restore features are not available for pSeries Linux management server version 1.3.2. These will be available in the near future.

5.3.12. CSM problem determination and diagnostics

CSM logs detailed information in various log files on the management server and on the managed nodes. These log files are useful in troubleshooting problems. In this section, we discuss some common problems that may be encountered while setting up and running CSM. For more detailed information and diagnostics, refer to the IBM Cluster Systems Management for Linux: Administration Guide, SA22-7873.

Table 5-8 lists common CSM problems and their fixes.

Table 5-8. Common CSM problems and fixes

Problem: installms fails
Fix: Make sure the requisite packages are copied to the reqs directory in the temporary CSM package directory.

Problem: rpower reports <unknown> status for a query
Fix: The management server event subscriptions have either expired or hung on the HMC. Refresh openCIMOM on the HMC, or reboot the HMC.

Problem: rpower -a query reports "Hardware Control Socket Error"
Fix:
  • The Java path might have changed on the management server. Verify that the Java search path is /usr/lib/IBMJava2-1.3.1/jre/bin/java.
  • Restart the hardware control resource manager as follows:
    - stopsrc -s IBM.HWCTRLRM
    - startsrc -s IBM.HWCTRLRM -e "HC_JAVA_PATH=/usr/lib/IBMJava2-1.3.1"

Problem: Any other rpower or rconsole errors
Fix:
  • Stop and start RMC daemons such as IBM.HWCTRLRM on the management server.
  • Stop and start openCIMOM on the HMC by running "initCIMOM stop" and "initCIMOM start".
  • Reboot the HMC.
  • Start IBM.HWCTRLRM with trace hooks on to collect more data by running startsrc -s IBM.HWCTRLRM -e "HC_JAVA_VERBOSE=/tmp/jni.txt".
  • Run "rmcctrl -Z" and "rmcctrl -A" to stop/remove and add/start the RMC daemons.
  • Check IBM Cluster Systems Management for Linux: Hardware Control Guide, SA22-7856.

Problem: rconsole does not come up in the current window
Fix: Check the flags to make sure the -t option is specified.

Problem: rconsole fails with an "xinit failed to connect to console" error
Fix: The conserver.cf file is corrupted. Rewrite the configuration file entries and refresh conserver:
  • chrconsolecfg -a
  • rconsolerefresh -r

Problem: csmsetupyast fails with getadapters errors
Fix:
  • Run getadapters from the command line to populate the CSM node database.
  • Alternatively, use the chnode command to upload node network attributes such as InstallAdapterType, the MAC address, and so on.
  • getadapters may fail if it is run on multiple nodes; check the /var/log/csm/getadapters/getadapter*.* logs and fix any errors with locks.

Problem: installnode fails
Fix:
  • Run lsnode and check whether duplicate node entries are listed.
  • Check the log files to find and fix any errors.
  • Check the node attribute "Mode" to make sure it is set to PreManaged. If it is set to Installing and installnode fails, reset it to PreManaged with the chnode command.
  • Check the network cables for proper connectivity.
  • Check /etc/dhcpd.conf and restart dhcpd.
  • Check /etc/inetd.conf for tftp and restart inetd.
  • Open the read-only console and look for any packaging errors; installnode waits for input if any package errors are encountered. Open the read-write console to respond to install options interactively, then close the console.

Problem: updatenode fails
Fix: Check for hostname resolution errors.

Problem: dsh fails
Fix: Check for SSH authentication errors.

Problem: smsupdatenode fails
Fix: Check that the RPM packages are copied to the right directories.

Problem: Event monitoring fails
Fix: Check for proper network connectivity.

Refer to IBM Cluster Systems Management for Linux: Hardware Control Guide, SA22-7856, for more information on hardware control, HMC connectivity, and RMC issues.
