Fabric Interconnect device failure and recovery

UCS Fabric Interconnects are deployed in a cluster configuration from the control plane perspective. A Fabric Interconnect pair is in active/standby configuration. The active Fabric Interconnect is called primary and the standby Fabric Interconnect is called subordinate.

All control plane communication is handled by the primary Fabric Interconnect, which manages the main configuration database. The main configuration database is stored on the primary and replicated on the subordinate Fabric Interconnect. When configuration changes occur, the primary sends updates to the subordinate over dedicated Ethernet links called L1/L2.

If the Fabric Interconnect running the primary instance fails, the subordinate Fabric Interconnect takes over the primary role immediately. Your UCS Manager session is dropped, so you need to log out and log back in. In a split brain situation, where both Fabric Interconnects try to come online as the primary, the Fabric Interconnect database version is checked, and the Fabric Interconnect with the higher database revision number becomes the primary. We discussed a split brain situation in Chapter 11, Configuring Backup, Restore, and High Availability, and learned that the UCS chassis midplane is designed to help resolve split brain scenarios.
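The revision check described above can be sketched in a few lines of Python. This is an illustrative model only, not UCS Manager code: the `FabricInterconnect` class, its `db_revision` field, and the `resolve_split_brain` function are hypothetical names, and returning `None` on a tie is an assumption, since the text does not say how equal revisions are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FabricInterconnect:
    """Hypothetical model of one cluster member."""
    name: str
    db_revision: int  # revision number of its configuration database


def resolve_split_brain(
    a: FabricInterconnect, b: FabricInterconnect
) -> Optional[FabricInterconnect]:
    """Return the member that should become primary.

    The member with the higher database revision wins. A tie returns
    None because the tie-breaking rule is not described in the text
    (assumption made for this sketch).
    """
    if a.db_revision == b.db_revision:
        return None
    return a if a.db_revision > b.db_revision else b
```

For example, calling `resolve_split_brain` with a Fabric A at revision 41 and a Fabric B at revision 42 returns the Fabric B object.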

If the dedicated communication links L1/L2 between the two Fabric Interconnects fail, the Fabric Interconnects have special access to the blade chassis for communicating the heartbeat. In this situation, the roles of the Fabric Interconnects do not change; the primary remains primary, and the subordinate remains subordinate. However, any configuration changes made on the primary will not be reflected in the subordinate's database until the dedicated Ethernet links are restored.

The following procedure should be performed to replace a failed Fabric Interconnect:

  1. Upgrade the firmware of the replacement Fabric Interconnect to the same level as the running Fabric Interconnect with the following steps:
    1. Connect the new Fabric Interconnect to the management network (do not connect the L1 and L2 cables).
    2. Connect to the new Fabric Interconnect over SSH, and run through the setup wizard, configuring the Fabric Interconnect as standalone.
    3. Update both UCS Manager and the Fabric Interconnect firmware code to the code running on the existing cluster member.
    4. Once the upgrades are complete, verify that the running and startup versions match those of the existing cluster member.
  2. Once the firmware is updated, use the following commands to erase the configuration on the standalone Fabric Interconnect:
        UCS # connect local-mgmt
        UCS # erase configuration
        UCS # yes (to reboot)
  3. Connect the L1 and L2 cables between the two Fabric Interconnects.
  4. Erasing the configuration causes the setup wizard to run again on the new Fabric Interconnect and detect the presence of a peer Fabric Interconnect. When prompted about the detected peer Fabric Interconnect, type y to add the new Fabric Interconnect to the existing cluster. Save the configuration and reboot.
  5. Log in to UCS Manager, or use the following commands to verify the cluster state:
        UCS # connect local-mgmt
        UCS # show cluster state
        Cluster Id: 0x633acf7e9b7611e1-0x9587147fbb1fc579
        A: UP, PRIMARY
        B: UP, SUBORDINATE
        HA READY
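
If you capture the output of show cluster state in a script, the verification in the last step can be automated. The following is a minimal sketch that assumes the output has the same line layout as the sample above; `parse_cluster_state` and `is_healthy` are hypothetical helper names, and the sketch does not cover how the text is retrieved from the Fabric Interconnect.

```python
import re

SAMPLE = """Cluster Id: 0x633acf7e9b7611e1-0x9587147fbb1fc579
A: UP, PRIMARY
B: UP, SUBORDINATE
HA READY"""


def parse_cluster_state(text):
    """Parse `show cluster state` text into a dict.

    Assumes the layout of the sample output: a "Cluster Id:" line,
    one "<fabric>: <status>, <role>" line per fabric, and an
    "HA READY" line when the cluster is ready.
    """
    state = {"cluster_id": None, "fabrics": {}, "ha_ready": False}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Cluster Id:"):
            state["cluster_id"] = line.split(":", 1)[1].strip()
        elif re.match(r"^[AB]:", line):
            fabric, rest = line.split(":", 1)
            status, _, role = rest.partition(",")
            state["fabrics"][fabric] = {
                "status": status.strip(),
                "role": role.strip(),
            }
        elif line == "HA READY":
            state["ha_ready"] = True
    return state


def is_healthy(state):
    """True when both fabrics are UP, roles are split, and HA is ready."""
    fabrics = state["fabrics"].values()
    roles = sorted(f["role"] for f in fabrics)
    all_up = all(f["status"] == "UP" for f in fabrics)
    return state["ha_ready"] and all_up and roles == ["PRIMARY", "SUBORDINATE"]
```

Calling `is_healthy(parse_cluster_state(SAMPLE))` on the sample output returns `True`; output without the HA READY line, or with a fabric that is not UP, returns `False`.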

Sometimes, it may also become necessary to change the current primary Fabric Interconnect in the cluster. One such situation is a Fabric Interconnect firmware upgrade, where the cluster lead must be switched as part of upgrading code on the primary Fabric Interconnect.

Use either of the following commands to change the cluster lead; both can be used to make Fabric A the primary Fabric Interconnect:

UCS # cluster lead a
UCS # cluster force primary

Then check the cluster state (if done quickly, you'll see the status SWITCHOVER IN PROGRESS):

UCS # show cluster state
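
A script that changes the cluster lead may need to wait until the switchover has finished before continuing. The sketch below polls until the text SWITCHOVER IN PROGRESS no longer appears. It assumes a caller-supplied `fetch_state` function (hypothetical) that returns the current show cluster state output as a string; how that text is obtained, and the default timeout and interval values, are illustrative choices and not taken from Cisco documentation.

```python
import time


def wait_for_switchover(fetch_state, timeout_s=120.0, interval_s=2.0,
                        sleep=time.sleep):
    """Poll until the cluster no longer reports a switchover in progress.

    fetch_state: hypothetical callable returning the text of
                 `show cluster state`.
    Returns the first output without "SWITCHOVER IN PROGRESS".
    Raises TimeoutError if the switchover is still running after
    timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        text = fetch_state()
        if "SWITCHOVER IN PROGRESS" not in text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("switchover still in progress")
        sleep(interval_s)
```

The `sleep` parameter exists only so the function can be exercised without real delays; in normal use the default `time.sleep` is fine. The returned text should still be checked for HA READY before any further changes are made.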

Fabric Interconnects have redundant power supply and fan units so they can avoid a complete failure. With this built-in redundancy, it is very rare that Fabric Interconnects will go down completely. The redundant parts of Fabric Interconnects are hot swappable and can be changed without any disruption.