Chapter 13. Hardware Maintenance

Once deployed, it’s not uncommon for XenServer installations to run without incident for years. Often, it’s a hardware issue or growth in requirements that prompts administrators to revisit their XenServer infrastructure. In this chapter, we’ll cover practices for everything related to hardware maintenance, be it storage, network, or compute.

Upgrades, Support, and Hardware

A chapter on hardware wouldn’t be complete without some discussion of how the XenServer hardware compatibility list (HCL) works. As mentioned in Chapter 4, the XenServer HCL is located at http://hcl.xenserver.org. A component gets added to the HCL when the hardware vendor and Citrix agree that they will jointly accept user support calls for a given version of XenServer on that hardware.

While in a perfect world, all hardware would be tested and certified for new versions, the reality is that often, hardware vendors wish to no longer certify XenServer for legacy or end-of-life hardware. This doesn’t mean XenServer won’t function on older hardware, but if you want to maintain a “supported platform” in the eyes of Citrix and your hardware vendor, you may find that upgrading XenServer past a certain version will place you into “unsupported” territory.

As a XenServer administrator, it’s important to pay attention to the HCL as you plan your upgrades. Newer hardware may not have been certified for older versions of XenServer, and older hardware may no longer be actively supported for newer versions of XenServer. If the hardware you wish to use isn’t on the XenServer HCL, start with your hardware vendor and determine whether certification is in progress. Once the hardware vendor performs its certification, it provides Citrix with the results, which Citrix then posts. By starting with the hardware vendor, you not only provide it with information about how its hardware is being used, but you may also add weight to efforts that raise the priority of future XenServer certifications.

Storage

Each recipe in this section will relate to ensuring you have sufficient free storage to operate the XenServer pool at peak efficiency.

Adding Local Storage

Problem

As part of your XenServer design, you are using local storage for virtual machine disks. The usage requirements for those virtual machines require additional storage.

Solution

Assuming your server has sufficient free drive bays, you can install additional drives and create new local storage repositories for them within XenServer.

Changing the Default Local Storage

When XenServer is installed, it uses the existing drive both for dom0 and as local storage. Changing the physical configuration of that default drive will require a complete reinstall of XenServer.

Discussion

As the XenServer deployment grows, so will the need for storage and networking. When dealing with local storage, additional drives can be added; however, they have to be added to the host as “local storage 2,” and so forth. As an example, if a spindle drive with a capacity of 2.2 TB or greater is added to the host, XenServer will not automatically pick it up on reboot. In the following sequence, we’ll be adding a 4 TB drive as additional local storage.

Within the control domain, the drive follows Linux conventions and is mapped to a device such as /dev/sdb. Verify that this device is the large spindle drive that was added by issuing the command in Example 13-1.

Example 13-1. Checking the geometry of device /dev/sdb
# fdisk -l /dev/sdb

Because the drive is known to be 4 TB in size, if /dev/sdb is the device entry within the control domain for this spindle drive, fdisk should return a warning followed by the geometry of /dev/sdb and the limit for a DOS partition table format (2.2 TB), as shown in Example 13-2.

Example 13-2. Using fdisk to validate /dev/sdb
# fdisk -l /dev/sdb
WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! 
The util fdisk doesn't support GPT. Use GNU Parted.
Note: sector size is 4096 (not 512)

WARNING: The size of this disk is 4.0 TB (4000787025920 bytes).
DOS partition table format can not be used on drives for volumes
larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID
partition table format (GPT).

Disk /dev/sdb: 4000.7 GB, 4000787025920 bytes
255 heads, 63 sectors/track, 60800 cylinders
Units = cylinders of 16065 * 4096 = 65802240 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       60801  3907018576   ee  EFI GPT

Because we have confirmed that device /dev/sdb is greater than 2.2 TB, fdisk cannot be used. The control domain offers gdisk to create GPT partitions both below and above 2.2 TB; however, this is not necessary because XAPI can accomplish the task for us.

To add /dev/sdb as “local storage 2,” the command in Example 13-3 can be issued on the XenServer host so the drive is formatted, checked, tagged with a UUID, and added to the XAPI database—appearing in XenCenter from this point on as a storage repository for that particular XenServer host.

Example 13-3. Create new local storage SR
# xe sr-create content-type=user device-config:device=/dev/sdb \
  host-uuid=<host-uuid> name-label="Local Storage 2" \
  shared=false type=lvm

Software RAID Configurations

XenServer doesn’t support the use of software RAID configurations. If you wish to use multiple disks in a local SR, specify each disk in comma-separated format as part of the device-config parameter when creating the SR.
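As a sketch of that syntax (the device names /dev/sdb and /dev/sdc are placeholders for illustration, following the pattern of Example 13-3), a single local SR spanning two disks would be created along these lines:

# xe sr-create content-type=user device-config:device=/dev/sdb,/dev/sdc \
  host-uuid=<host-uuid> name-label="Local Storage 2" \
  shared=false type=lvm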

USB Storage for Backup

Problem

You wish to back up virtual machine disks onto removable media.

Solution

Removable media is automatically identified by XenServer and can be used for multiple purposes including as VM storage. In this example, we wish to use USB storage to back up virtual machines.

Discussion

XenServer also has support for udev-detected removable devices, such as USB flash or spindle drives. It is recommended that these be used for backups and not for actual VM disk storage. In my own experience, any USB flash or spindle device above 32 GB should be preformatted, outside of XenServer, with an EXT3 filesystem, because most such devices ship formatted with NTFS. Preformatting ensures the drive is clean before plugging it in and determining its /dev/sdXY entry.

So, if I have a 4 TB volume attached to /dev/sde, I would execute the command in Example 13-4 to create a USB-based device to export backups to—allowing it to be disconnected from XenServer and reattached as needed.

Example 13-4. Create local storage SR for backup purposes
# xe sr-create content-type=user device-config:device=/dev/sde \
  host-uuid=<host-uuid> name-label="USB BACKUP" shared=false type=ext
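
When the backup drive needs to be rotated out, for example to take it off site, detach the SR cleanly before pulling the cable. A minimal sketch, assuming you look up the SR’s PBD first: unplug the PBD before removal, and plug it again once the drive is reattached:

# xe pbd-list sr-uuid=<backup-sr-uuid> params=uuid
# xe pbd-unplug uuid=<pbd-uuid>
# xe pbd-plug uuid=<pbd-uuid>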

Networking

Replacing a NIC

Problem

A network card has become defective and must be replaced.

Solution

Place the XenServer host into maintenance mode and replace the physical NIC, then configure XenServer to use the new NIC in place of the old NIC.

Discussion

During the installation or upgrade of XenServer, the host machine’s hardware is profiled and stored in the XAPI database for dom0. This information is highly detailed to ensure that after installation and subsequent reboots for maintenance, the host’s hardware is still in exact working order. This means that the administrator or IT department cannot simply shut down a XenServer host, remove the faulty card, and replace it: if something has been changed on the host (such as a network card), or if it fails to load, dom0 will run in a default “maintenance” mode where manual intervention is required.

An example of the type of information stored for a network card is shown in Example 13-5.

Example 13-5. NIC information
<row ref="OpaqueRef:028983cc-d0f7-316e-1d99-c822e3439f91" 
__ctime="314225446" 
__mtime="314225446" 
DNS="10.0.0.2,10.0.0.3" 
IP="10.0.0.10" IPv6="('')" 
MAC="f0:92:1c:13:b7:08"
MTU="1500" 
VLAN="-1" VLAN_master_of="OpaqueRef:NULL" VLAN_slave_of="()" 
_ref="OpaqueRef:028983cc-d0f7-316e-1d99-c822e3439f91" 
bond_master_of="('OpaqueRef:6a2b5648-af19-5636-5695-3e5385d0a81e')" 
bond_slave_of="OpaqueRef:NULL" currently_attached="true" 
device="bond0" 
device_name="bond0" 
disallow_unplug="false" 
gateway="10.0.0.1" 
host="OpaqueRef:1b25f88f-3c25-05c6-c00e-37859fd68ed4" 
ip_configuration_mode="Static" 
ipv6_configuration_mode="None" 
ipv6_gateway="" 
managed="true" 
management="true" 
metrics="OpaqueRef:d3d5e33f-a9b4-363e-1d9e-78b2ddc73f2d" 
netmask="255.0.0.0" network="OpaqueRef:471f9e43-5d50-f525-dbbc-6ae1c10e462a" 
other_config="()" 
physical="false" 
primary_address_type="IPv4" 
tunnel_access_PIF_of="()"
tunnel_transport_PIF_of="()" 
uuid="eda9c065-7bf6-ff9f-dc40-2ae53efc12c9"/>

If the card is replaced without following the proper steps for a standalone host or pool, this information is invalidated.

First we place the host into maintenance mode:

# xe host-disable

Then, if the NIC is a management interface, we need to disable it:

# xe host-management-disable

Next we determine the pif and find the device position for the NIC:

# xe pif-list params=all
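
On a host with many interfaces, that listing can be lengthy. A narrower query makes the device position and UUID easier to spot (the parameter selection here is simply one reasonable choice):

# xe pif-list host-uuid={host-uuid} params=uuid,device,MAC,management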

Armed with the pif, we need to instruct XAPI to forget about it, and then the host must be halted and the replacement NIC installed:

# xe pif-forget uuid={pif-uuid}
# halt

After installing the new NIC and restarting the host, determine the MAC address for the replacement NIC:

# ip addr

We now need to introduce the replacement pif in the same position as the original. This requires the original device position and the new MAC address:

# xe pif-introduce device={device position} host-uuid={host-uuid} mac={new MAC}

With the pif defined, we can now add in any fixed network address parameters:

# xe pif-reconfigure-ip uuid={pif-uuid} mode=static IP="10.0.0.20" \
  netmask="255.0.0.0" gateway="10.0.0.1" DNS="10.0.0.1"

and complete the task by reconfiguring the management network to use the new pif (assuming the original pif was a management network):

# xe host-management-reconfigure pif-uuid={pif-uuid}
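
Because the recipe began by disabling the host, a closing step worth noting is returning it to service once the management interface is back up; xe host-enable is the standard command for this:

# xe host-enable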

Hosts

Adding a New Host to a Pool

Problem

The existing pool capacity is insufficient, and a new host needs to be added to the pool.

Solution

Obtain a host with comparable capabilities as the original host, and prepare to add it to the pool.

Discussion

As discussed earlier, resource pools provide an aggregate virtualization environment consisting of multiple hosts. In order to function at peak efficiency, the capabilities of each host should be as close as possible. This compatibility extends to the host CPU feature set, which should, ideally, be identical. Unfortunately, even when processor vendors market a CPU as identical, individual steppings of a given processor may contain hardware fixes that make them not quite identical.

While XenServer does a good job of masking CPU features, that is, ensuring that one host does not expose a CPU feature that other pool members lack, this masking effectively diminishes the performance of the new host. Because operating systems are fully aware of the capabilities of the host CPU, it is imperative that all CPUs in a resource pool present identical features, lest the operating system crash when a VM migration occurs.

Don’t Forget About Network Configuration

While CPU compatibility is paramount, it’s important to recall that all hosts in a resource pool must have identical physical network configurations.

The first step in ensuring a consistent pool is to obtain the current CPU features of the pool master:

# xe host-get-cpu-features

Copy the returned feature set and attempt to apply it to the new host prior to any attempt at joining the new host to the pool:

# xe host-set-cpu-features features=[pool master CPU features]

If successful, reboot the host and add it to the resource pool. In the event the command fails, this typically means the CPU doesn’t adequately support feature masking. To verify this, you can obtain CPU information and compare it to the CPU information from the pool master, as shown in Example 13-6. If the “maskable” flag is set to anything other than “full,” you may not be able to create a viable feature mask for this CPU.

Example 13-6. Determine CPU information
# xe host-cpu-info
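
With a matching feature set applied and the host rebooted, the join itself is a single command. The address and credentials below are placeholders:

# xe pool-join master-address={pool-master-address} master-username=root \
  master-password={password}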

Recovery from Host Failure When HA Is Enabled

Problem

Host failure occurred while the host was accessing the heartbeat state file, and the file has an orphaned lock.

Solution

To remove all locks, you must perform an emergency HA reset.

Discussion

If a partial host failure occurs while the HA daemon on the host is accessing the state file on shared storage, it may be necessary to temporarily disable HA. This can be done using the following command:

# xe host-emergency-ha-disable --force

If the host was a pool master, then it should automatically recover, and member servers should automatically reconnect. If a member server does not automatically reconnect, it may be necessary to reset the pool master address. This can be done using the following command on the impacted member server:

Reset pool master address on member server
# xe pool-emergency-reset-master master-address={pool_master_address}
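
Once the master is reachable and the members have rejoined, HA can be re-enabled. The command below is a sketch that assumes the UUID of the heartbeat SR is at hand; confirm the pool is healthy before re-enabling:

# xe pool-ha-enable heartbeat-sr-uuids={heartbeat-sr-uuid}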