vSphere HA agent troubleshooting

vCenter Server reports a vSphere HA host state when something is wrong with one of your vSphere hosts. When working with vSphere infrastructure in a highly available environment, you may encounter different kinds of errors that prevent vSphere HA from working correctly, for example, HA agent on crimv1esx002.linxsol.com in cluster Cluster-ML-FT in DataCenter017-Milan has an error, or Insufficient resources to satisfy HA failover level on cluster. Other messages you may see include Agent error, vSphere HA agent cannot be correctly installed or configured, Internal AAM Errors - agent couldn't start, and so on.

In this topic, we will discuss possible causes and troubleshooting tips to solve these issues. A good starting point for troubleshooting HA agent errors is the VMkernel logs, which you can find in /var/log, as discussed in previous chapters.

As the VMware Knowledge Base suggests (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007739), check whether your vSphere host is in Lockdown mode. You can verify this from PowerShell (with PowerCLI) using the following command:

Get-VMHost crimv1esx001.linxsol.com | Select-Object Name, @{N="LockDown";E={$_.ExtensionData.Config.AdminDisabled}} | Format-Table -AutoSize Name, LockDown

If your vSphere host is in Lockdown mode, you can use the following command to exit Lockdown mode:

(Get-VMHost crimv1esx001.linxsol.com | Get-View).ExitLockdownMode()

You can also verify the Lockdown mode setting from the vSphere Web Client:

  1. Select your vSphere host and go to Manage.
  2. Go to Settings and then to Security Profile.
  3. Scroll down to Lockdown Mode.
  4. Check whether Lockdown Mode is set to Disabled, Normal, or Strict.

Unreachable or uninitialized state

The vSphere HA agent on a vSphere host can become uninitialized. When the master vSphere host or vCenter Server tries to contact the agent on a vSphere host and the agent doesn't respond, the agent is declared to be in the uninitialized state. There are several possible reasons for this. A common one is that the vSphere host cannot reach any datastore, not even the local datastore where vSphere HA caches the state information of the HA agent.

You should also check that the firewall ports on your vSphere hosts are open, so that the vSphere HA agent can communicate with other hosts and with vCenter Server. The vSphere HA agent uses port 8182 for communication. You can check the event log from the vSphere client to find out the reason for the failure.

The event is logged as a vSphere HA agent error for the affected host. You should make sure the datastores are accessible by the vSphere host. For troubleshooting datastores, you can follow the guidelines given in Chapter 5, Monitoring and Troubleshooting Storage. If the problem persists, you should reinstall the HA agent on the vSphere host (this is covered later in this chapter).

Incoming Ports    Outgoing Ports
TCP               UDP
--------------    --------------
8042
8045
8182              8182
                  2050
                  2250
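The port check described above can be tried from a management workstation. The following is a minimal sketch, not a definitive test: Test-TcpPort is a hypothetical helper name, it checks only TCP reachability of port 8182 from the machine running the script (not host-to-host traffic or UDP), and the 2-second timeout is an arbitrary assumption.

```powershell
# Hypothetical helper: returns $true if a TCP connection to the port succeeds.
# Assumption: TCP reachability from this workstation is a useful first hint;
# it does not prove that hosts can reach each other.
function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [int]$Port = 8182,        # HA agent port named in the text
        [int]$TimeoutMs = 2000    # arbitrary timeout chosen for this sketch
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($ComputerName, $Port)
        # Wait() returns $false on timeout and throws if the connection is refused
        return ($task.Wait($TimeoutMs) -and $client.Connected)
    }
    catch {
        return $false
    }
    finally {
        $client.Close()
    }
}

# Example usage with the host names from this section:
# 'crimv1esx001.linxsol.com', 'crimv1esx002.linxsol.com' | ForEach-Object {
#     [pscustomobject]@{ Host = $_; Port = 8182; Open = (Test-TcpPort -ComputerName $_) }
# }
```

A result of False only tells you that this workstation could not open the port; the host firewall settings and the event log remain the authoritative places to look.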

A vSphere HA agent is declared to be in an unreachable state when the vSphere master host is unable to contact a secondary vSphere host. In this scenario, vSphere HA stops monitoring the virtual machines on that host and might not restart them after a failure. The cause might be as simple as a networking problem, where vCenter Server is unable to reach the vSphere HA agent, or as serious as the failure of every vSphere host in the cluster. vCenter Server can also be unable to communicate with the HA agent after you disable vSphere HA on the hosts and then re-enable it. If a vSphere HA agent fails, the watchdog process tries to restart it; if the watchdog fails to restart the HA agent, the agent is declared to be unreachable.

To troubleshoot the network, you can follow the guidelines given in Chapter 4, Monitoring and Troubleshooting Networking. You should also make sure that your cluster has no failures. If this still doesn't resolve the HA agent problem, you should reinstall the vSphere HA agent on your vSphere hosts by following the topic Reinstalling HA agent discussed later in this chapter.
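To see which state vCenter Server currently reports for the HA agent on each host, you can query it with PowerCLI. The following is a minimal sketch, assuming PowerCLI is installed and a Connect-VIServer session already exists. Get-HaAgentState is a hypothetical helper name; it reads the Runtime.DasHostState.State property that the vSphere API exposes on each host object. The cluster name in the usage comment is the one from the example error message earlier in this section.

```powershell
# Hypothetical helper: maps each host object from the pipeline to its name
# and the HA agent state reported by vCenter Server.
# Assumption: input objects come from Get-VMHost, so they expose
# ExtensionData.Runtime.DasHostState (empty when vSphere HA is not enabled).
function Get-HaAgentState {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]$VMHost
    )
    process {
        [pscustomobject]@{
            Name    = $VMHost.Name
            HAState = $VMHost.ExtensionData.Runtime.DasHostState.State
        }
    }
}

# Example usage (requires a live vCenter Server connection):
# Get-Cluster -Name 'Cluster-ML-FT' | Get-VMHost | Get-HaAgentState |
#     Sort-Object Name | Format-Table -AutoSize
```

Hosts whose HAState shows a value such as uninitialized or fdmUnreachable are the ones to examine with the steps in this section.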
