HyperFlex Architecture
The Cisco HyperFlex HX Data Platform is a high-performance, distributed file system that supports multiple hypervisors with a wide range of enterprise-grade data management and optimization services. The Cisco HyperFlex HX Data Platform combines the cluster’s storage devices into a single distributed, multitier, object-based datastore. It makes this data available through the file system protocols and mechanisms needed by the higher-level hypervisors, virtual machines, and containers. Performance scales linearly as a cluster scales because all components contribute both processing and storage capacity to the cluster. A Cisco HyperFlex cluster can be deployed in three different ways:
HyperFlex standard cluster: This type of cluster is composed of a pair of Cisco UCS fabric interconnects and at least three HX hyperconverged nodes.
HyperFlex stretch cluster: With a stretch cluster, half of the cluster nodes are physically located in one place, and the remaining half are located in a distant secondary location.
HyperFlex edge cluster: This type of cluster is composed of at least three HX hyperconverged nodes, with no UCS fabric interconnect required. Starting with HyperFlex version 4.0, it is possible to deploy a cluster with only two HX hyperconverged nodes.
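The node-count rules in the list above can be captured in a short sketch. The names used here (ClusterType, min_converged_nodes, is_even_stretch_split) are hypothetical and are not part of any Cisco tool; the numbers come only from the descriptions above.

```python
from enum import Enum
from typing import Tuple


class ClusterType(Enum):
    STANDARD = "standard"
    STRETCH = "stretch"
    EDGE = "edge"


def min_converged_nodes(cluster_type: ClusterType, hx_version: Tuple[int, int]) -> int:
    """Smallest converged-node count the text gives for a deployment type."""
    if cluster_type is ClusterType.STANDARD:
        return 3
    if cluster_type is ClusterType.EDGE:
        # Two-node edge clusters are possible starting with HyperFlex 4.0.
        return 2 if hx_version >= (4, 0) else 3
    # Assumption: the text states no minimum for stretch clusters, so none is invented here.
    raise ValueError("the text gives no minimum node count for a stretch cluster")


def is_even_stretch_split(site_a_nodes: int, site_b_nodes: int) -> bool:
    """A stretch cluster places half of its nodes at each of two sites."""
    return site_a_nodes > 0 and site_a_nodes == site_b_nodes
```

For example, min_converged_nodes(ClusterType.EDGE, (3, 5)) returns 3, while the same call with version (4, 0) returns 2.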
Figure 2-1 illustrates a Cisco HyperFlex solution.
This section covers the components that comprise the HyperFlex architecture.
The HyperFlex architecture requires a storage controller virtual machine (SCVM) with a dedicated number of processor cores and amount of memory—enough to allow the SCVM to deliver consistent performance and not affect the performance of the other virtual machines on the cluster. The controller can access all storage without hypervisor intervention, thanks to the VMware VM_DIRECT_PATH feature. It uses the node’s memory and SSDs as part of a distributed caching layer, and it uses the node’s HDDs for distributed capacity storage.
The controller integrates the data platform into the VMware vSphere cluster through the use of three preinstalled VMware ESXi vSphere Installation Bundles (VIBs) on each node:
IO Visor: This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disks that are attached to individual virtual machines. From the hypervisor’s perspective, it is simply attached to a network file system. The IO Visor intercepts guest VM IO traffic and intelligently redirects it to the HyperFlex SCVMs.
VMware API for Array Integration (VAAI): This storage offload API allows vSphere to request advanced file system operations such as snapshots and cloning. The controller implements these operations via manipulation of the file system metadata rather than actual data copying, providing rapid response and thus rapid deployment of new environments.
stHypervisorSvc: This VIB adds enhancements and features needed for HyperFlex data protection and VM replication.
A HyperFlex cluster requires a minimum of three HX-Series “converged” nodes (with disk storage). Data is replicated across at least two of these nodes, and a third node is required for continuous operation in the event of a single node failure. Each node that has disk storage is equipped with at least one high-performance SSD for data caching and rapid acknowledgment of write requests. Each node also is equipped with additional disks, up to the platform’s physical limit, for long-term storage and capacity.
Each SCVM running on the hyper-converged nodes runs an instance of the Linux services used by the HyperFlex system. Table 2-1 lists these services and their purposes.
Table 2-1 HyperFlex Controller Services
HX Service Name | Linux Service Name | What the Service Does |
Springpath file system | storfs | Core of the Springpath file system. Owns disks and read/write process |
SCVM client | scvmclient | Proxy NFS server running on ESXi that intercepts the file I/O operations and sends them to CRM |
System management service | stMgr | Overall system management and inter-process communication |
HyperFlex Connect server | hxmanager | HX Connect UI Manager Service |
Replication services | replsvc | Responsible for replication |
Cluster IP Monitor | cip-monitor | Responsible for assigning cluster management and data IP addresses |
Replication Cluster IP Monitor | repl-cip-monitor | Responsible for assigning cluster replication IP addresses |
Stats Cache Service | carbon-cache | Accepts metrics over various protocols and writes them to disk as efficiently as possible; requires caching metric values in RAM as they are received and flushing them to disk on an interval using the underlying Whisper library |
Stats Aggregator Service | carbon-aggregator | Can be run in front of carbon-cache to buffer metrics over time before reporting them to Whisper |
Stats Listener Service | statsd | A network daemon that runs on the Node.js platform and listens for statistics, such as counters and timers, sent over UDP or TCP and sends aggregates to one or more pluggable back-end services (such as Graphite) |
Cluster Manager Service | exhibitor | Controls stop/start and configuration of Zookeeper |
HX Device Connector | hx_device_connector | Responsible for connecting to Intersight |
Web server | tomcat | Web server using Java servlets |
Reverse proxy server | nginx | HTTP and reverse proxy server, as well as a mail proxy server |
Job scheduler | cron | Daemon to execute scheduled commands |
DNS and name server service | resolvconf | A simple database for runtime nameserver information and a simple framework for notifying applications of changes in that information |
Stats web server | graphite | A web application that uses the Twisted framework to provide real-time visualization and storage of numeric time-series data with a daemon that processes the incoming time-series data and saves it as Whisper database files |
Cluster Management IP | mgmtip | The cluster management IP service in pre-3.0 clusters |
IO Visor, which is the SCVM client, lives as a process in user space inside ESXi and can be thought of as a simple NFS proxy. It behaves as a server for the VMware NFS client, while looking like a client to the controller VMs (stCtlVM/SCVM server). It is installed into ESXi as a vSphere Installation Bundle (VIB) that is auto-deployed during cluster installation.
The data platform implements a log-structured file system that uses a caching layer in SSDs to accelerate read requests and write replies, and a persistence layer implemented with HDD data distribution. Incoming data is distributed across all nodes in the cluster to optimize performance using the caching tier. Efficient data distribution is achieved by mapping received data to stripe units that are stored equally across all nodes, with the number of data replicas determined by the policies you set.
Figure 2-2 illustrates a logical representation of the log-structured file system.
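To make the idea of spreading stripe units and their replicas across nodes concrete, here is a minimal sketch. It assumes a hash of the stripe-unit key picks the first node and that additional copies go to the following nodes in ring order; the actual HX placement logic is not described in this chapter, and the function name, key format, and node names are hypothetical.

```python
import hashlib
from typing import List


def place_stripe_unit(key: str, nodes: List[str], replicas: int) -> List[str]:
    """Pick `replicas` distinct nodes for one stripe unit.

    Assumption: hash-based start node plus ring-order copies. This is an
    illustration of even distribution, not the HX Data Platform algorithm.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Consecutive ring positions guarantee the copies land on different nodes.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]
```

Because the start position depends only on the key, the same stripe unit always maps to the same nodes, while different keys spread across the whole cluster.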
Overall space consumption in the HyperFlex clustered file system is optimized by the default deduplication and compression features.
Deduplication is always on, helping reduce storage requirements in virtualization clusters in which multiple operating system instances in guest virtual machines result in large amounts of replicated data.
Compression further reduces storage requirements, reducing costs. In addition, the log-structured file system is designed to store variable-sized blocks, reducing internal fragmentation.
Figure 2-3 provides a logical representation of the inline compression and deduplication of the HyperFlex system.
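The combined effect of deduplication and compression can be illustrated with a toy block store. This is a sketch under stated assumptions: it fingerprints whole blocks with SHA-256 and compresses with zlib, neither of which the text names as the HX implementation, and the DedupStore class is hypothetical.

```python
import hashlib
import zlib
from typing import Dict


class DedupStore:
    """Toy block store: identical blocks are kept once, in compressed form."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}  # fingerprint -> compressed payload

    def write(self, data: bytes) -> str:
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.blocks:
            # New content: compress it and store it once.
            self.blocks[fingerprint] = zlib.compress(data)
        # Duplicate content: nothing is stored, only the reference is returned.
        return fingerprint

    def read(self, fingerprint: str) -> bytes:
        return zlib.decompress(self.blocks[fingerprint])
```

Writing the same operating system block from many guest virtual machines therefore consumes space for a single compressed copy.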
When existing data is updated, the new data is appended to the log, and the metadata tree is updated as well. The original blocks of data are left behind with no references to them and must be garbage collected. Likewise, when a file is deleted, the segments that make up the file persist on disk as dead keys until the cleaner runs and compacts the segments to free up space.
Figure 2-4 provides a logical representation of the data blocks when they get updated or deleted.
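The append, dead-key, and cleaner behavior can be sketched with a small in-memory model. The LogStore class and its methods are hypothetical and greatly simplified: a Python list stands in for the on-disk log and a dictionary stands in for the metadata tree.

```python
from typing import Dict, List, Tuple


class LogStore:
    """Toy log-structured store with an append-only log and a cleaner."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, bytes]] = []  # append-only (key, value) records
        self.index: Dict[str, int] = {}         # metadata: key -> offset of live record

    def put(self, key: str, value: bytes) -> None:
        self.log.append((key, value))           # updates never overwrite in place
        self.index[key] = len(self.log) - 1     # any older record for key is now dead

    def delete(self, key: str) -> None:
        self.index.pop(key, None)               # the record stays in the log until cleaned

    def get(self, key: str) -> bytes:
        return self.log[self.index[key]][1]

    def dead_records(self) -> int:
        return len(self.log) - len(self.index)

    def clean(self) -> int:
        """Compact the log so that only live records remain; return records freed."""
        freed = self.dead_records()
        live_offsets = sorted(self.index.values())
        self.log = [self.log[off] for off in live_offsets]
        self.index = {key: i for i, (key, _) in enumerate(self.log)}
        return freed
```

After an update and a delete, dead_records() reports the space that only a call to clean() will actually reclaim.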
Cluster Resource Manager (CRM) has many responsibilities as it coordinates functions across a cluster. It is responsible for understanding all resources available in the cluster. In addition, CRM is responsible for the timely detection of failures and conducting the appropriate remediation.
CRM maintains a Zookeeper (ZK) database that is responsible for mapping data to the mirror set.
Cleaner removes stale data and releases storage. It typically runs continuously in the background. Cleaner goes into sleep mode when it is not needed and wakes when policy-defined conditions are met.
A cluster shutdown stops storage cluster operations and shuts it down. A shutdown cluster has all storage cluster processes, including the working VMs, powered down. This does not include powering down the nodes in the cluster or shutting down the vCenter or FI cluster. The Cisco HX cluster shutdown procedure does not shut down the ESXi hosts.
Rebalancing occurs when a CRM event is triggered, such as when resources are added (either a new node, a new disk, or both), resources are deleted (retiring a node or a failure condition), monitoring timers are expired, or users initiate requests.
Zookeeper is a service for maintaining configuration information, naming information, distributed synchronization, and group services for distributed applications within a cluster. It allows distributed processes to coordinate with each other via a shared hierarchical namespace, organized much like a file system, where each node in the tree is known as a znode.
Zookeeper also provides high availability and reliability, with no single point of failure characteristics for large distributed environments. It is replicated over a number of nodes called an ensemble. The nodes in the ensemble must know about each other in order to maintain an image of state and transactional logs.
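The hierarchical namespace can be pictured with a tiny in-memory model. This sketch is not the Zookeeper client API; the ZnodeTree class and the example paths are hypothetical, and it only mirrors the idea that each znode lives at a file-system-like path beneath an existing parent.

```python
from typing import Dict, List


class ZnodeTree:
    """Toy model of a znode namespace keyed by slash-separated paths."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {"/": b""}

    def create(self, path: str, data: bytes = b"") -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent znode %s does not exist" % parent)
        if path in self.nodes:
            raise KeyError("znode %s already exists" % path)
        self.nodes[path] = data

    def children(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

Creating "/cluster" and then "/cluster/nodes" succeeds, while creating "/missing/child" fails because its parent znode does not exist.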
The Cisco HyperFlex solution is composed of three different types of nodes:
HyperFlex hybrid nodes: Hybrid converged nodes use a combination of SSD for the short-term storage caching layer and HDDs.
HyperFlex all flash nodes: An all flash converged node uses only SSDs.
HyperFlex edge for remote branch offices: Hybrid or all flash converged nodes based on the HX220c.
Figure 2-5 illustrates the different HyperFlex node models.
The HX220c M5/M4 servers extend the capabilities of the Cisco HyperFlex portfolio in a 1U form factor with the addition of the Intel Xeon processor scalable family with 24 DIMM slots.
HXAF220c-M5SX All Flash is a small-footprint Cisco HyperFlex All Flash model that contains a 240 GB M.2 form factor SSD that acts as the boot drive, a 240 GB housekeeping SSD, either a single 375 GB Optane NVMe SSD or a 1.6 TB NVMe SSD or a 400 GB SAS SSD write-log drive, and six to eight 960 GB or 3.8 TB SATA SSDs for storage capacity. For configurations requiring self-encrypting drives, the caching SSD is replaced with an 800 GB SAS SED SSD, and the capacity disks are also replaced with either 800 GB, 960 GB, or 3.8 TB SED SSDs.
Figure 2-6 shows the front view of an HXAF220c-M5SX All Flash node.
HX220c-M5SX Hybrid is a small-footprint Cisco HyperFlex hybrid model that contains a minimum of six and up to eight 1.8 TB or 1.2 TB SAS HDDs that contribute to cluster storage capacity, a 240 GB SSD housekeeping drive, a 480 GB or 800 GB SSD caching drive, and a 240 GB M.2 form factor SSD that acts as the boot drive. For configurations requiring self-encrypting drives, the caching SSD is replaced with an 800 GB SAS SED SSD, and the capacity disks are replaced with 1.2 TB SAS SED HDDs.
Figure 2-7 shows the front view of an HX220c-M5SX Hybrid node.
HXAF220c-M4S All Flash is a small-footprint Cisco HyperFlex all flash model that contains two Cisco Flexible Flash (FlexFlash) Secure Digital (SD) cards that act as the boot drives, a single 120 GB or 240 GB SSD data-logging drive, a single 400 GB NVMe or a 400 GB or 800 GB SAS SSD write-log drive, and six 960 GB or 3.8 TB SATA SSDs for storage capacity. For configurations requiring self-encrypting drives, the caching SSD is replaced with an 800 GB SAS SED SSD, and the capacity disks are also replaced with either 800 GB, 960 GB, or 3.8 TB SED SSDs.
Figure 2-8 shows the front view of an HXAF220c-M4S All Flash node.
Cisco HyperFlex HX220c-M4S Hybrid Node is a small-footprint Cisco HyperFlex hybrid model that contains six 1.8 TB or 1.2 TB SAS HDD drives that contribute to cluster storage capacity, a 120 GB or 240 GB SSD housekeeping drive, a 480 GB SAS SSD caching drive, and two Cisco Flexible Flash (FlexFlash) SD cards that act as boot drives. For configurations requiring self-encrypting drives, the caching SSD is replaced with an 800 GB SAS SED SSD, and the capacity disks are replaced with 1.2 TB SAS SED HDDs.
Figure 2-9 shows the front view of an HX220c-M4S Hybrid node.
The HX240c M5/M4 servers extend the capabilities of Cisco’s HyperFlex portfolio in a 2U form factor with the addition of the Intel Xeon processor scalable family with 24 DIMM slots.
HXAF240c-M5SX All Flash is a capacity-optimized Cisco HyperFlex all flash model that contains a 240 GB M.2 form factor SSD that acts as the boot drive, a 240 GB housekeeping SSD, either a single 375 GB Optane NVMe SSD or a 1.6 TB NVMe SSD or a 400 GB SAS SSD write-log drive installed in a rear hot-swappable slot, and 6 to 23 960 GB or 3.8 TB SATA SSDs for storage capacity. For configurations requiring self-encrypting drives, the caching SSD is replaced with an 800 GB SAS SED SSD, and the capacity disks are also replaced with either 800 GB, 960 GB, or 3.8 TB SED SSDs.
Figure 2-10 shows the front view of an HXAF240c-M5SX All Flash node.
HX240c-M5SX Hybrid is a capacity-optimized Cisco HyperFlex hybrid model that contains a minimum of 6 and up to 23 1.8 TB or 1.2 TB SAS small form factor (SFF) HDDs that contribute to cluster storage, a 240 GB SSD housekeeping drive, a single 1.6 TB SSD caching drive installed in a rear hot-swappable slot, and a 240 GB M.2 form factor SSD that acts as the boot drive. For configurations requiring self-encrypting drives, the caching SSD is replaced with a 1.6 TB SAS SED SSD, and the capacity disks are replaced with 1.2 TB SAS SED HDDs.
Figure 2-11 shows the front view of an HX240c-M5SX Hybrid node.
HX240c-M5L Hybrid is a density-optimized Cisco HyperFlex hybrid model that contains a minimum of 6 and up to 12 6 TB or 8 TB SAS large form factor (LFF) HDDs that contribute to cluster storage, a 240 GB SSD housekeeping drive and a single 3.2 TB SSD caching drive (both installed in the rear hot-swappable slots), and a 240 GB M.2 form factor SSD that acts as the boot drive. Large form factor nodes cannot be configured with self-encrypting disks and are limited to a maximum of eight nodes in a cluster in the initial release of HyperFlex 3.0.
Figure 2-12 shows the front view of an HX240c-M5L Hybrid node.
HXAF240c-M4SX All Flash is a capacity-optimized Cisco HyperFlex All Flash model that contains two FlexFlash SD cards that act as boot drives, a single 120 GB or 240 GB SSD data-logging drive, a single 400 GB NVMe or a 400 GB or 800 GB SAS SSD write-log drive, and 6 to 23 960 GB or 3.8 TB SATA SSDs for storage capacity. For configurations requiring self-encrypting drives, the caching SSD is replaced with an 800 GB SAS SED SSD, and the capacity disks are also replaced with either 800 GB, 960 GB, or 3.8 TB SED SSDs.
Figure 2-13 shows the front view of an HXAF240c-M4SX All Flash node.
HX240c-M4SX Hybrid is a capacity-optimized Cisco HyperFlex hybrid model that contains a minimum of 6 and up to 23 1.8 TB or 1.2 TB SAS HDDs that contribute to cluster storage, a single 120 GB or 240 GB SSD housekeeping drive, a single 1.6 TB SAS SSD caching drive, and two FlexFlash SD cards that act as the boot drives. For configurations requiring self-encrypting drives, the caching SSD is replaced with a 1.6 TB SAS SED SSD, and the capacity disks are replaced with 1.2 TB SAS SED HDDs.
Figure 2-14 shows the front view of an HX240c-M4SX Hybrid node.
HXAF220c-M5N All-NVMe is a small-footprint Cisco HyperFlex all-NVMe model that contains a 240 GB M.2 form factor SSD that acts as the boot drive, a 1 TB housekeeping NVMe SSD, a single 375 GB Intel Optane NVMe SSD write-log drive, and six to eight 1 TB or 4 TB NVMe SSDs for storage capacity. Optionally, the Cisco HyperFlex Acceleration Engine card can be added to improve write performance and compression. Self-encrypting drives are not available as an option for the all-NVMe nodes.
Figure 2-15 shows the front view of an HXAF220c-M5N All-NVMe node.
Hybrid converged nodes use a combination of SSDs for the short-term storage caching layer and HDDs. All Flash nodes only make use of SSDs, and All-NVMe nodes use only Non-Volatile Memory Express (NVMe) drives.
All current model Cisco UCS M4 and M5 generation servers, except the C880 M4 and C880 M5, may be used as compute-only nodes connected to a Cisco HyperFlex cluster, along with a limited number of previous M3 generation servers. Any valid CPU and memory configuration is allowed in the compute-only nodes, and the servers can be configured to boot from SANs, local disks, or internal SD cards. The following servers may be used as compute-only nodes:
Cisco UCS B200 M3 blade servers
Cisco UCS B200 M4 blade servers
Cisco UCS B200 M5 blade servers
Cisco UCS B260 M4 blade servers
Cisco UCS B420 M4 blade servers
Cisco UCS B460 M4 blade servers
Cisco UCS B480 M5 blade servers
Cisco UCS C220 M3 rack-mount servers
Cisco UCS C220 M4 rack-mount servers
Cisco UCS C220 M5 rack-mount servers
Cisco UCS C240 M3 rack-mount servers
Cisco UCS C240 M4 rack-mount servers
Cisco UCS C240 M5 rack-mount servers
Cisco UCS C460 M4 rack-mount servers
Cisco UCS C480 M5 rack-mount servers
The servers connect to the fabric interconnects via the Cisco VIC 1227 mLOM card, the Cisco VIC 1387 mLOM card, or the Cisco VIC 1457 mLOM card. The VIC 1227 card is used in conjunction with the Cisco UCS 6248UP or 6296UP model fabric interconnects, the VIC 1387 is used in conjunction with the Cisco UCS 6332 or 6332-16UP model fabric interconnects, and the Cisco UCS VIC 1457 is used in conjunction with the Cisco UCS 6454 model fabric interconnect.
The Cisco UCS Virtual Interface Card (VIC) 1227 is a dual-port Enhanced Small Form-Factor Pluggable (SFP+) 10 Gbps Ethernet and Fibre Channel over Ethernet (FCoE)-capable PCI Express (PCIe) modular LAN-on-motherboard (mLOM) adapter installed in the Cisco UCS HX-Series rack servers.
Figure 2-16 shows a Cisco VIC 1227 mLOM card.
The Cisco UCS VIC 1387 card is a dual-port Enhanced Quad Small Form-Factor Pluggable (QSFP+) 40 Gbps Ethernet and Fibre Channel over Ethernet (FCoE)-capable PCI Express (PCIe) mLOM adapter installed in the Cisco UCS HX-Series rack servers.
Figure 2-17 shows a Cisco VIC 1387 mLOM card.
The Cisco UCS VIC 1457 is a quad-port Small Form-Factor Pluggable 28 (SFP28) mLOM card designed for the M5 generation of Cisco UCS C-Series rack servers. The card supports 10/25 Gbps Ethernet or FCoE. The card can present PCIe standards-compliant interfaces to the host, and these can be dynamically configured as either NICs or HBAs.
Figure 2-18 shows a Cisco VIC 1457 mLOM card.
Table 2-2 through Table 2-9 list the Small Form-Factor Pluggable (SFP), Quad Small Form-Factor Pluggable (QSFP), cables with integrated transceivers, and cable specifications that are supported by the Cisco VIC 1227 mLOM card, Cisco VIC 1387 mLOM card, and Cisco VIC 1457 mLOM card.
Table 2-2 Cisco VIC 1227 mLOM Card SFP and Twinax Support
Connector (Media) | Cable Type |
SFP-10G-USR | Ultra-short-range MMF |
SFP-10G-SR | Short-range MMF |
SFP-10G-LR | Long-range SMF |
10GBASE-CU | SFP+ cable, 1 meter |
10GBASE-CU | SFP+ cable, 3 meter |
10GBASE-CU | SFP+ cable, 5 meter |
10GBASE-CU | SFP+ cable, 7 meter |
SFP-10GB-ACU | Cable, 7 meter |
Table 2-3 Cisco VIC 1227 mLOM Card Cable Specifications
Connector (Media) | Cable | Distance | Power (Each Side) | Transceiver Latency (Link) | Standard |
SFP+ CU copper | Twinax | 5 meters | Approximately 0.1 W | Approximately 0.1 microseconds | SFF 8431 |
SFP+ ACU copper | Active Twinax | 7 meters/10 meters | Approximately 0.5 W | Approximately 6.8 nanoseconds | SFF 8461 |
SFP+ SR MMF and SR | MM OM2 MM OM3 | 82 meters/300 meters | 1 W | Approximately 0 microseconds | IEEE 802.3ae |
Table 2-4 Cisco VIC 1387 QSFP and QSA Support
Connector (Media) | Cable Type |
QSFP-40G-SR4 | 40GBASE-SR4 QSFP+ transceiver module for MMF, 4 lanes, 850 nm wavelength, 12-fiber MPO/MTP connector |
QSFP-40G-SR4-S | 40GBASE-SR4 QSFP+ transceiver module for MMF, 4 lanes, 850 nm wavelength, 12-fiber MPO/MTP connector, S-class |
QSFP-40G-CSR4 | 40GBASE-CSR4 QSFP+ transceiver module for MMF, 4 lanes, 850 nm wavelength, 12-fiber MPO/MTP connector |
QSFP-40G-SR-BD | 40 Gbps QSFP BiDi optical transceiver |
QSFP-40G-LR4 | 40GBASE-LR4 QSFP+ transceiver module for SMF, duplex LC, 10-kilometer reach |
QSFP-40G-LR4-S | 40GBASE-LR4 QSFP+ transceiver module for SMF, duplex LC, 10-kilometer reach, S-class |
QSFP-4SFP10G-CU1M | 40GBASE-CR4 QSFP+ to 4 10GBASE-CU SFP+ direct attach breakout cable assembly, 1 meter passive |
QSFP-4SFP10G-CU3M | 40GBASE-CR4 QSFP+ to 4 10GBASE-CU SFP+ direct attach breakout cable assembly, 3 meter passive |
QSFP-H40G-CU1M | Direct attach copper, 1 meter |
QSFP-H40G-CU3M | Direct attach copper, 3 meter |
QSFP-H40G-CU5M | Direct attach copper, 5 meter |
QSFP-H40G-ACU7M | 40GBASE-CR4 QSFP+ direct attach copper cable, 7 meter active |
QSFP-H40G-ACU10M | 40GBASE-CR4 QSFP+ direct attach copper cable, 10 meter active |
QSFP-4x10G-AC7M | 40GBASE-CR4 QSFP+ to 4 10GBASE-CU SFP+ direct attach breakout cable assembly, 7 meter active |
QSFP-4x10G-AC10M | 40GBASE-CR4 QSFP+ to 4 10GBASE-CU SFP+ direct attach breakout cable assembly, 10 meter active |
QSFP-H40G-AOC1M | 40 Gbps QSFP active optical cable, 1 meter |
QSFP-H40G-AOC2M | 40 Gbps QSFP active optical cable, 2 meter |
QSFP-H40G-AOC3M | 40 Gbps QSFP active optical cable, 3 meter |
QSFP-H40G-AOC5M | 40 Gbps QSFP active optical cable, 5 meter |
QSFP-H40G-AOC7M | 40 Gbps QSFP active optical cable, 7 meter |
QSFP-H40G-AOC10M | 40 Gbps QSFP active optical cable, 10 meter |
QSFP-4X10G-AOC1M | QSFP to 4 SFP 10 Gbps active optical cable, 1 meter |
QSFP-4X10G-AOC2M | QSFP to 4 SFP 10 Gbps active optical cable, 2 meter |
QSFP-4X10G-AOC3M | QSFP to 4 SFP 10 Gbps active optical cable, 3 meter |
QSFP-4X10G-AOC5M | QSFP to 4 SFP 10 Gbps active optical cable, 5 meter |
QSFP-4X10G-AOC7M | QSFP to 4 SFP 10 Gbps active optical cable, 7 meter |
QSFP-4X10G-AOC10M | QSFP to 4 SFP 10 Gbps active optical cable, 10 meter |
CVR-QSFP-SFP10G | QSFP to SFP+ adapter (QSA) |
Table 2-5 Cisco VIC mLOM 1387 SFP and Twinax Support with QSA
Connector (Media) | Cable Type |
SFP-10G-USR | Ultra-short-range MMF |
SFP-10G-SR | Short-range MMF |
SFP-10G-LR | Long-range SMF |
10GBASE-CU | SFP+ cable, 1 meter |
10GBASE-CU | SFP+ cable, 3 meter |
10GBASE-CU | SFP+ cable, 5 meter |
10GBASE-CU | SFP+ cable, 7 meter |
SFP-10GB-ACU | Cable, 7 meter |
Table 2-6 40 Gigabit Ethernet Cabling Specifications
Connector (Media) | Cable | Distance | Power (Each Side) | Transceiver Latency (Link) | Standard |
SFP+ copper (CU) | Twinax | 1, 3, and 5 meter | Approximately 0.1 W | Approximately 0.1 microsecond | SFF 8431 |
SFP+ ACU copper | Active Twinax | 7 and 10 meter | Approximately 0.5 W | Approximately 0.1 microsecond | SFF 8461 |
SFP+ FET | MM OM2 MM OM3 MM OM4 | 25 and 100 meter | 1 W | Approximately 0 microseconds | IEEE 802.3ae |
SFP+ short reach (SR) and MMF | MM OM2 MM OM3 MM OM4 | 82 and 300 meter | 1 W | Approximately 0 microseconds | IEEE 802.3ae |
SFP+ long reach (LR) | SMF | 10 kilometer | 1 W | Approximately 0 microseconds | IEEE 802.3ae |
SFP+ long reach (ER) | SMF | 40 kilometer | 1.5 W | Approximately 0 microseconds | IEEE 802.3ae |
SFP+ long reach (ZR) | SMF | 80 kilometer | 1.5 W | Approximately 0 microseconds | IEEE 802.3ae |
Table 2-7 Cisco VIC 1457 SFP Support: SFP+ 10 Gbps Transceivers
Connector (Media) | Cable Type |
SFP-10G-SR | 10GBASE-SR, 850 nm, MMF, 300 meter |
SFP-10G-SR-S | 10GBASE-SR, 850 nm, MMF, 300 meter, S-class |
SFP-10G-LR | 10GBASE-LR, 1310 nm, SMF, 10 kilometer |
SFP-10G-LR-S | 10GBASE-LR, 1310 nm, SMF, 10 kilometer, S-class |
Table 2-8 Cisco VIC 1457 SFP28 Support: SFP28 25 Gbps Transceivers
Connector (Media) | Cable Type |
SFP-25G-SR-S | 25G BASE SFP+, 850 nm, MMF, 400 meter, S-class |
SFP-10/25G-CSR-S(1) | 10/25GBASE-CSR SFP+, 850 nm, MMF, 400 meter, S-class |
Table 2-9 Cisco VIC 1457 Twinax Support
Connector (Media) | Cable Type |
SFP-H10GB-CU1M | 10GBASE-CU SFP+ direct attach copper cable, 1 meter |
SFP-H10GB-CU1-5M(4) | 10GBASE-CU SFP+ direct attach copper cable, 1.5 meter |
SFP-H10GB-CU2M | 10GBASE-CU SFP+ direct attach copper cable, 2 meter |
SFP-H10GB-CU2-5M(4) | 10GBASE-CU SFP+ direct attach copper cable, 2.5 meter |
SFP-H10GB-CU3M | 10GBASE-CU SFP+ direct attach copper cable, 3 meter |
SFP-H10GB-CU5M | 10GBASE-CU SFP+ direct attach copper cable, 5 meter |
SFP-H10GB-ACU7M | 10GBASE-CU SFP+ direct attach active copper cable, 7 meter |
SFP-H10GB-ACU10M | 10GBASE-CU SFP+ direct attach active copper cable, 10 meter |
SFP-10G-AOC1M | 10GBASE active optical SFP+ cable, 1 meter |
SFP-10G-AOC2M | 10GBASE active optical SFP+ cable, 2 meter |
SFP-10G-AOC3M | 10GBASE active optical SFP+ cable, 3 meter |
SFP-10G-AOC5M | 10GBASE active optical SFP+ cable, 5 meter |
SFP-10G-AOC7M | 10GBASE active optical SFP+ cable, 7 meter |
SFP-10G-AOC10M | 10GBASE active optical SFP+ cable, 10 meter |
SFP-H25G-CU1M | 25GBASE-CU SFP28 direct attach copper cable, 1 meter |
SFP-H25G-CU2M | 25GBASE-CU SFP28 direct attach copper cable, 2 meter |
SFP-H25G-CU3M | 25GBASE-CU SFP28 direct attach copper cable, 3 meter |
SFP-25G-AOC1M | 25GBASE-AOC SFP28 active optical cable, 1 meter |
SFP-25G-AOC2M | 25GBASE-AOC SFP28 active optical cable, 2 meter |
SFP-25G-AOC3M | 25GBASE-AOC SFP28 active optical cable, 3 meter |
SFP-25G-AOC5M | 25GBASE-AOC SFP28 active optical cable, 5 meter |
SFP-25G-AOC7M | 25GBASE-AOC SFP28 active optical cable, 7 meter |
SFP-25G-AOC10M | 25GBASE-AOC SFP28 active optical cable, 10 meter |
This section describes the supported topologies in which the HyperFlex HX Data Platform can be deployed. These topologies include the following:
HyperFlex standard cluster
HyperFlex stretch cluster
HyperFlex edge cluster
The Cisco HyperFlex system is composed of a pair of Cisco UCS fabric interconnects along with up to 32 HX-Series rack-mount servers per cluster. Up to 32 compute-only servers can also be added per HyperFlex cluster. Adding Cisco UCS rack-mount servers and/or Cisco UCS 5108 blade chassis, which house Cisco UCS blade servers, allows for additional compute resources in an extended cluster design. Up to eight separate HX clusters can be installed under a single pair of fabric interconnects. The two fabric interconnects both connect to every HX-Series rack-mount server, and both connect to every Cisco UCS 5108 blade chassis and Cisco UCS rack-mount server. Upstream network connections, also referred to as northbound network connections, are made from the fabric interconnects to the customer data center network at the time of installation.
Figure 2-19 illustrates the topology for a standard HyperFlex cluster.
Figure 2-20 illustrates the topology for an extended HyperFlex cluster.
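The scale limits stated for a standard cluster lend themselves to a simple pre-deployment check. The sketch below is illustrative only: the function and constant names are hypothetical, and it encodes only the limits quoted in this chapter (three to 32 converged nodes, up to 32 compute-only nodes, and up to eight clusters per fabric interconnect pair).

```python
from typing import List

MIN_CONVERGED_NODES = 3
MAX_CONVERGED_NODES = 32
MAX_COMPUTE_ONLY_NODES = 32
MAX_CLUSTERS_PER_FI_PAIR = 8


def standard_cluster_errors(converged: int, compute_only: int = 0) -> List[str]:
    """Return a list of limit violations; an empty list means the sizes fit."""
    errors = []
    if not MIN_CONVERGED_NODES <= converged <= MAX_CONVERGED_NODES:
        errors.append("converged node count %d is outside 3-32" % converged)
    if not 0 <= compute_only <= MAX_COMPUTE_ONLY_NODES:
        errors.append("compute-only node count %d is outside 0-32" % compute_only)
    return errors


def fits_under_fi_pair(cluster_count: int) -> bool:
    """Check the number of HX clusters sharing one pair of fabric interconnects."""
    return 1 <= cluster_count <= MAX_CLUSTERS_PER_FI_PAIR
```

A design with 8 converged and 16 compute-only nodes passes, whereas one with 2 converged nodes is flagged.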
Fabric interconnects (FIs) are deployed in pairs, and the two units operate as a management cluster while forming two separate network fabrics, referred to as the A side and B side fabrics. Therefore, many design elements refer to FI A or FI B, alternatively called fabric A or fabric B. Both fabric interconnects are active at all times, passing data on both network fabrics for a redundant and highly available configuration. Management services, including Cisco UCS Manager, are also provided by the two FIs, but in a clustered manner, where one FI is the primary, and one is secondary, with a roaming clustered IP address. This primary/secondary relationship is only for the management cluster and has no effect on data transmission.
Fabric interconnects have the following ports, which must be connected for proper management of the Cisco UCS domain:
Mgmt: A 10/100/1000 Mbps port for managing the fabric interconnect and the Cisco UCS domain via GUI and CLI tools. This port is also used by remote KVM, IPMI, and SoL sessions to the managed servers within the domain. This is typically connected to the customer management network.
L1: A cross-connect port for forming the Cisco UCS management cluster. This port is connected directly to the L1 port of the paired fabric interconnect using a standard CAT5 or CAT6 Ethernet cable with RJ-45 plugs. It is not necessary to connect this to a switch or hub.
L2: A cross-connect port for forming the Cisco UCS management cluster. This port is connected directly to the L2 port of the paired fabric interconnect using a standard CAT5 or CAT6 Ethernet cable with RJ-45 plugs. It is not necessary to connect this to a switch or hub.
Console: An RJ-45 serial port for direct console access to the fabric interconnect. This port is typically used during the initial FI setup process with the included serial to RJ-45 adapter cable. This can also be plugged into a terminal aggregator or remote console server device.
The HX-Series converged servers are connected directly to the Cisco UCS fabric interconnects in Direct Connect mode. This option enables Cisco UCS Manager to manage the HX-Series rack-mount servers using a single cable for both management traffic and data traffic. All the Cisco HyperFlex M4 generation servers are configured with the Cisco VIC 1227 or Cisco VIC 1387 network interface card (NIC) installed in the mLOM slot, which has dual 10 Gigabit Ethernet or 40 Gigabit Ethernet ports. Cisco HyperFlex M5 generation servers can be configured only with the Cisco VIC 1387 card or the Cisco VIC 1457. The standard and redundant connection practice is to connect port 1 of the VIC (the right-hand port) to a port on FI A and port 2 of the VIC (the left-hand port) to a port on FI B. An optional configuration method for servers containing the Cisco VIC 1457 card is to cable the servers with two links to each FI, using ports 1 and 2 to FI A, and ports 3 and 4 to FI B. The HyperFlex installer checks for this configuration and ensures that all servers’ cabling matches. Failure to follow this cabling practice can lead to errors, discovery failures, and loss of redundant connectivity.
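The cabling convention can be expressed as a small validation routine. This is only a sketch of the rule as described above, not the check that the HyperFlex installer performs; the function names and the dictionary layout for observed cabling are hypothetical.

```python
from typing import Dict, List


def expected_fabric(vic_port: int, links_per_fi: int = 1) -> str:
    """Return 'A' or 'B' for a 1-based VIC port number.

    links_per_fi=1 is the standard layout (port 1 to FI A, port 2 to FI B);
    links_per_fi=2 is the optional VIC 1457 layout (ports 1-2 to A, 3-4 to B).
    """
    if links_per_fi not in (1, 2):
        raise ValueError("the text describes one link per FI, or two with the VIC 1457")
    if not 1 <= vic_port <= 2 * links_per_fi:
        raise ValueError("port %d is not used in this layout" % vic_port)
    return "A" if vic_port <= links_per_fi else "B"


def cabling_errors(cabling: Dict[int, str], links_per_fi: int = 1) -> List[str]:
    """Compare observed {vic_port: fabric} cabling with the expected layout."""
    errors = []
    for port in range(1, 2 * links_per_fi + 1):
        want = expected_fabric(port, links_per_fi)
        got = cabling.get(port)  # None means the port is not cabled
        if got != want:
            errors.append("port %d: expected FI %s, found %s" % (port, want, got))
    return errors
```

For instance, cabling_errors({1: "B", 2: "A"}) reports both ports, which corresponds to a server whose two links are swapped between the fabrics.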
All nodes within a Cisco HyperFlex cluster must be connected at the same communication speed—for example, mixing 10 Gb with 25 Gb interfaces is not allowed. In addition, for clusters that contain only M5 generation nodes, all the nodes within a cluster must contain the same model of Cisco VIC cards.
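These two consistency rules are easy to check programmatically. The following sketch uses a hypothetical Node record and function name; it encodes only the two rules stated in the preceding paragraph and nothing about which speed or card is valid for a given fabric interconnect.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    generation: str        # "M4" or "M5"
    vic_model: str         # for example "VIC 1387"
    link_speed_gbps: int   # speed of the links to the fabric interconnects


def connectivity_errors(nodes: List[Node]) -> List[str]:
    """Return violations of the uniform-speed and uniform-VIC rules."""
    errors = []
    # Rule 1: every node must connect at the same speed.
    if len({n.link_speed_gbps for n in nodes}) > 1:
        errors.append("nodes use mixed link speeds")
    # Rule 2: clusters made up only of M5 nodes must use one VIC model.
    if all(n.generation == "M5" for n in nodes) and len({n.vic_model for n in nodes}) > 1:
        errors.append("M5-only cluster uses mixed VIC models")
    return errors
```

A mixed M4 and M5 cluster with different VIC models passes the second rule, because that rule applies only when every node is M5 generation.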
Various combinations of physical connectivity between the Cisco HX-Series servers and the fabric interconnects are possible, but only specific combinations are supported. For example, use of the Cisco QSA module to convert a 40 Gigabit Ethernet QSFP+ port into a 10 Gigabit Ethernet SFP+ port is not allowed with M4 generation servers but is allowed with M5 generation servers in order to configure a mixed cluster of M4 and M5 generation servers along with model 6248 or 6296 fabric interconnects. Table 2-10 lists the possible connections and which of these methods are supported.
Table 2-10 Supported Physical Connectivity
Fabric Interconnect Model | 6248 | 6296 | 6332 | 6332 | 6332-16UP | 6332-16UP | 6332-16UP | 6454 | 6454 |
Port Type | 10 Gigabit Ethernet | 10 Gigabit Ethernet | 40 Gigabit Ethernet | 10 Gigabit Ethernet Breakout | 40 Gigabit Ethernet | 10 Gigabit Ethernet Breakout | 10 Gigabit Ethernet Onboard | 10 Gigabit Ethernet | 25 Gigabit Ethernet |
M4 with VIC 1227 | ✓ | ✓ | — | — | — | — | — | ✓ | — |
M4 with VIC 1387 | — | — | ✓ | — | ✓ | — | — | — | — |
M4 with VIC 1387 + QSA | — | — | — | — | — | — | — | ✓ | — |
M5 with VIC 1387 | — | — | ✓ | — | ✓ | — | — | — | — |
M5 with VIC 1387 + QSA | ✓ | ✓ | — | — | — | — | — | ✓ | — |
M5 with VIC 1457 or 1455 | ✓ | ✓ | — | — | — | — | — | ✓ | ✓ |
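For planning scripts, Table 2-10 can be encoded as a simple lookup. The sketch below transcribes the table as shown here; the short port-type labels and the function name `is_supported` are illustrative choices, and current Cisco compatibility documentation should be treated as the authority.

```python
# Column order follows Table 2-10: (fabric interconnect model, port type).
COLUMNS = [
    ("6248", "10GbE"),
    ("6296", "10GbE"),
    ("6332", "40GbE"),
    ("6332", "10GbE breakout"),
    ("6332-16UP", "40GbE"),
    ("6332-16UP", "10GbE breakout"),
    ("6332-16UP", "10GbE onboard"),
    ("6454", "10GbE"),
    ("6454", "25GbE"),
]

# One character per column: Y = supported, N = not supported.
ROWS = {
    "M4 with VIC 1227":         "YYNNNNNYN",
    "M4 with VIC 1387":         "NNYNYNNNN",
    "M4 with VIC 1387 + QSA":   "NNNNNNNYN",
    "M5 with VIC 1387":         "NNYNYNNNN",
    "M5 with VIC 1387 + QSA":   "YYNNNNNYN",
    "M5 with VIC 1457 or 1455": "YYNNNNNYY",
}


def is_supported(server, fi_model, port_type):
    """Return True if Table 2-10 marks the combination as supported."""
    column = COLUMNS.index((fi_model, port_type))  # ValueError if unknown
    return ROWS[server][column] == "Y"
```

For example, `is_supported("M5 with VIC 1457 or 1455", "6454", "25GbE")` evaluates to `True`, while the same server against a 6332 40GbE port evaluates to `False`.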
Figure 2-21 illustrates the connectivity of an HX-Series server.
HyperFlex extended clusters also incorporate 1 to 16 Cisco UCS blade servers for additional compute capacity. A blade chassis comes populated with one to four power supplies and eight modular cooling fans. In the rear of the chassis are two bays for installation of Cisco fabric extenders. The fabric extenders (also commonly called IO modules, or IOMs) connect the chassis to the fabric interconnects. Internally, the fabric extenders connect to the Cisco VIC installed in each blade server across the chassis backplane. The standard practice is to connect one to eight 10 Gigabit Ethernet links, or one to four 40 Gigabit Ethernet links (depending on the IOMs and FIs purchased), from the left-side IOM, or IOM 1, to FI A, and to connect the same number of 10 Gigabit Ethernet or 40 Gigabit Ethernet links from the right-side IOM, or IOM 2, to FI B. All other cabling configurations are invalid and can lead to errors, discovery failures, and loss of redundant connectivity.
Figure 2-22 illustrates the connectivity for the Cisco UCS 5108 chassis.
A stretch cluster is geographically redundant, meaning that part of the cluster resides in one physical location, and another part resides in a second location. The cluster also requires a “tie breaker” or “witness” component, which should reside in a third, separate location. The goal of this design is to help ensure that the virtual infrastructure remains available even in the event of the complete loss of one site.
Figure 2-23 shows a logical representation of the components that are part of a HyperFlex stretch cluster.
Characteristics of a stretch cluster include the following:
A stretch cluster is a single cluster with nodes geographically distributed at different locations.
Storage is mirrored locally and across each site (but not to the tie-breaker witness).
Sites need to be connected over a low-latency network to meet the write requirements for applications and for a good end-user experience.
Geographic failover (using a virtual machine) is like failover in a regular cluster.
Node failure in a site is like node failure in a regular cluster.
Split brain is a condition in which nodes at either site cannot see each other. This condition can lead to problems if a node quorum cannot be determined (so that virtual machines know where to run). Split brain is caused by network failure or site failure.
Stretch clusters have a witness: an entity hosted on a third site that is responsible for deciding which site becomes primary after a split-brain condition.
A stretch cluster has some physical limitations to keep in mind. Some applications, specifically databases, require write latency of less than 20 milliseconds (ms). Many other applications require latency of less than 10 ms to avoid problems with the application. To meet these requirements, the round-trip time (RTT) network latency on the stretch link between sites in a stretch cluster should be less than 5 ms. Light travels at about 3e8 m/s in a vacuum and at roughly two-thirds of that speed in optical fiber, so the maximum recommended stretch cluster site distance of 100 km (approximately 62 miles) introduces about 1 ms of round-trip latency by itself. In addition, time is needed for code path and link hops (from node to fabric interconnect to switch), which also plays a role in determining the maximum site-to-site recommended distance.
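The propagation-delay figure can be checked with a short calculation. The sketch below assumes that signals in optical fiber travel at roughly two-thirds of the vacuum speed of light; that velocity factor is an approximation introduced for this example, and the function name is illustrative.

```python
C_VACUUM_M_PER_S = 3e8        # speed of light in a vacuum, as quoted in the text
FIBER_VELOCITY_FACTOR = 2 / 3  # assumed typical value for optical fiber


def propagation_rtt_ms(distance_km, velocity_factor=FIBER_VELOCITY_FACTOR):
    """Round-trip propagation delay in milliseconds over the given distance."""
    one_way_s = (distance_km * 1000) / (C_VACUUM_M_PER_S * velocity_factor)
    return 2 * one_way_s * 1000


def remaining_budget_ms(distance_km, rtt_limit_ms=5.0):
    """RTT budget left for switching and code path after propagation delay."""
    return rtt_limit_ms - propagation_rtt_ms(distance_km)
```

At 100 km this gives a round-trip propagation delay of about 1 ms in fiber (about 0.67 ms in a vacuum), leaving roughly 4 ms of the 5 ms RTT target for link hops and code path.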
Figure 2-24 shows a logical representation of a HyperFlex stretch cluster.
Figure 2-25 illustrates the topology for a HyperFlex stretch cluster.
A HyperFlex edge cluster is composed of two, three, or four Cisco UCS HX-Series converged nodes (with disk storage). Cluster node expansion is not supported but is planned for a future software release with 10 Gigabit Ethernet topologies. Each node is equipped with at least one high-performance SSD for data caching and rapid acknowledgment of write requests. Each node is also equipped with additional disks, up to the platform’s physical limit, for long-term storage capacity.
The following are the components of a Cisco HyperFlex Edge system:
Cisco HyperFlex HX-Series rack-mount servers, including the following models:
Cisco HyperFlex HX220c-M5SX rack-mount servers (2-, 3-, or 4-node configuration)
Cisco HyperFlex HXAF220c-M5SX All Flash rack-mount servers (2-, 3-, or 4-node configuration)
Cisco HyperFlex HX220c-M4S rack-mount servers (3-node configuration only)
Cisco HyperFlex HXAF220c-M4S All Flash rack-mount servers (3-node configuration only)
Cisco HyperFlex Data Platform software
VMware vSphere ESXi hypervisor
VMware vCenter Server (end user supplied)
In this solution, Cisco Intersight is the management platform that performs the deployment and administration of the HyperFlex Edge systems across multiple sites.
A Cisco HyperFlex edge cluster is built using Cisco HX-Series rack-mount servers that are not connected to Cisco UCS fabric interconnects. Upstream network connections, also referred to as northbound network connections, are made directly from the servers to the customer-chosen data center top-of-rack (ToR) switches at the time of installation.
Figure 2-26 shows a logical topology for a HyperFlex edge cluster.
The following section describes the physical connectivity of the 3-node cluster. Chapter 7, “Deploying HyperFlex Anywhere with Intersight,” reviews the 2-node edge cluster.
The Cisco HyperFlex Edge product supports three networking topologies: single Gigabit Ethernet switch, dual Gigabit Ethernet switch, and 10 Gigabit Ethernet switch (either single or dual) configurations, depending on requirements and the available switching hardware. This section describes specific requirements for each topology, as well as common network requirements that apply to all three topologies.
A single-switch configuration provides a simple topology requiring only a single switch and two Gigabit Ethernet ports per server. Link or switch redundancy is not provided. Access ports and trunk ports are the two supported network port configurations.
Figure 2-27 illustrates the logical network topology of an edge cluster with a single upstream switch.
The upstream network requirements are as follows:
A managed switch with VLAN capability
Six physical Gigabit Ethernet ports for three HyperFlex nodes
(Optional) Jumbo frames
PortFast or PortFast trunk configured on each port to ensure uninterrupted access
Two Gigabit Ethernet ports are required per server:
Port 1: For management (ESXi and Cisco Integrated Management Controller [CIMC]), vMotion traffic, and VM guest traffic
Port 2: For HyperFlex storage traffic
There are two supported network port configurations: access ports or trunk ports.
Spanning tree PortFast (access ports) or PortFast trunk (trunk ports) must be enabled for all network ports connected to HyperFlex servers. Failure to configure PortFast causes intermittent CIMC disconnects during ESXi bootup and longer-than-necessary network reconvergence during physical link failure.
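To make the PortFast requirement concrete, the following sketch generates an interface configuration for either port mode. It emits Cisco IOS-style commands as found on Catalyst switches; exact syntax varies by switch platform and software release, and the interface names and VLAN IDs in the usage note are hypothetical.

```python
def port_config(interface, mode, vlans):
    """Build an IOS-style interface stanza for a HyperFlex-facing port.

    mode is "access" (exactly one VLAN) or "trunk" (one or more VLANs).
    """
    if not vlans:
        raise ValueError("at least one VLAN is required")
    if mode == "access":
        if len(vlans) != 1:
            raise ValueError("an access port carries exactly one VLAN")
        lines = [
            f"interface {interface}",
            " switchport mode access",
            f" switchport access vlan {vlans[0]}",
            " spanning-tree portfast",
        ]
    elif mode == "trunk":
        allowed = ",".join(str(vlan) for vlan in vlans)
        lines = [
            f"interface {interface}",
            " switchport mode trunk",
            f" switchport trunk allowed vlan {allowed}",
            " spanning-tree portfast trunk",
        ]
    else:
        raise ValueError(f"unknown port mode: {mode}")
    return "\n".join(lines)
```

Calling `port_config("GigabitEthernet1/0/1", "trunk", [10, 20, 30])` yields a trunk stanza ending in `spanning-tree portfast trunk`, while the access variant ends in `spanning-tree portfast`.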
Figure 2-28 illustrates the physical topology for a single switch from a single edge node.
Some things to consider with this topology are as follows:
Cable both integrated LOM ports to the same ToR switch.
If desired, cable the dedicated CIMC port to the same switch or to an out-of-band management switch.
Do not use the 10 Gigabit Ethernet ports on the VIC.
Ethernet interfaces can be configured either as access ports or trunk ports, as follows:
An access port can have only one VLAN configured on the interface; it can carry traffic for only one VLAN.
A trunk port can have one or more VLANs configured on the interface; it can carry traffic for several VLANs simultaneously.
Table 2-11 summarizes the differences between access ports and trunk ports. You can use the details described in this table to determine which ports to use for a deployment.
Table 2-11 Trunk and Access Port Definitions
Trunk Ports | Access Ports |
Requires more setup and definition of VLAN tags within CIMC, ESXi, and HX Data Platform installer. | Provides a simpler deployment process than trunk ports. |
Provides the ability to logically separate management, vMotion, and VM guest traffic on separate subnets. | Requires that management, vMotion, and VM guest traffic share a single subnet. |
Provides flexibility to bring in additional L2 networks to ESXi. | Requires a managed switch to configure ports 1 and 2 on discrete VLANs; storage traffic must use a dedicated VLAN, with no exceptions. |
A dual-switch configuration provides a slightly more complex topology with full redundancy that protects against switch failure, link and port failure, and LOM/PCIe NIC HW failures. It requires two switches that may be standalone or stacked, and four Gigabit Ethernet ports and one additional PCIe NIC per server. Trunk ports are the only supported network port configuration.
Figure 2-29 illustrates the logical network topology of an edge cluster with dual upstream switch connectivity.
The upstream network requirements are as follows:
Two managed switches with VLAN capability (standalone or stacked)
12 physical Gigabit Ethernet ports for three HyperFlex nodes (All 12 ports must be configured as trunk ports and allow all applicable VLANs.)
(Optional) Jumbo frames
PortFast trunk configured on all ports (to ensure uninterrupted access to CIMC)
Four Gigabit Ethernet ports are required per server:
Port 1: For management (ESXi, HyperFlex controller, and CIMC) and VM guest traffic
Port 2: For HyperFlex storage traffic (and vMotion standby)
Port 3: For VM guest traffic (and management standby)
Port 4: For vMotion traffic (and storage standby)
Two ports use LOM, and two ports are from a PCIe add-in NIC:
1 LOM port and 1 PCIe port serve management and VM guest traffic in a redundant configuration
1 LOM port and 1 PCIe port serve storage data and vMotion traffic in a redundant and load-balanced configuration
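The port roles listed above can be captured in a small table and checked for NIC-level redundancy. The traffic assignments follow the port list in the text; the mapping of ports 1 and 2 to the LOM and ports 3 and 4 to the PCIe NIC is an assumption made for this sketch, as are all names in the code.

```python
# Port roles from the text; the LOM/PCIe assignment is an assumption.
PORTS = {
    1: {"nic": "LOM",  "active": {"management", "vm_guest"}, "standby": set()},
    2: {"nic": "LOM",  "active": {"storage"},  "standby": {"vmotion"}},
    3: {"nic": "PCIe", "active": {"vm_guest"}, "standby": {"management"}},
    4: {"nic": "PCIe", "active": {"vmotion"},  "standby": {"storage"}},
}


def nics_carrying(traffic):
    """Return the set of NICs with a port that can carry this traffic type."""
    return {
        port["nic"]
        for port in PORTS.values()
        if traffic in port["active"] or traffic in port["standby"]
    }


def survives_nic_failure(traffic):
    """True if the traffic type has a path on both the LOM and the PCIe NIC."""
    return nics_carrying(traffic) == {"LOM", "PCIe"}
```

Under these assumptions every traffic type (management, VM guest, storage, vMotion) has one port on each NIC, which is what lets the design tolerate a LOM or PCIe NIC hardware failure.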
The Intel i350 quad-port NIC (UCSC-PCIE-IRJ45) must be installed for this topology:
The NIC may be selected at ordering time and shipped preinstalled from the factory.
The NIC may also be field-installed if ordered separately. Either riser 1 or riser 2 may be used, although riser 1 is recommended.
Only trunk ports are supported in the dual-switch configuration.
Spanning tree PortFast trunk must be enabled for every network port connected to HyperFlex servers. Failure to configure PortFast causes intermittent CIMC disconnects during ESXi bootup and longer-than-necessary network reconvergence during physical link failure.
Figure 2-30 illustrates the physical topology for a dual-switch configuration from a single edge node.
Some things to consider with this topology are as follows:
Cable both integrated LOM ports to the same ToR switch.
Cable any two out of four PCIe NIC ports to the same ToR switch. Do not connect more than two PCIe NIC ports prior to installation. After cluster installation, you may freely use the remaining ports.
Redundancy occurs at the vSwitch level and includes one uplink port from the onboard LOM and one uplink port from PCIe NIC for each vSwitch.
If desired, cable the dedicated CIMC port to the same switch or to an out-of-band management switch.
Do not use the 10 Gigabit Ethernet ports on the VIC.
A 10 Gigabit Ethernet switch configuration provides a fully redundant design that protects against switch failures (if using dual or stacked switches) and against link and port failures. The 10 Gigabit Ethernet switch may be standalone or stacked. In addition, this configuration requires the following:
Two 10 Gigabit Ethernet ports and a VIC 1387 with two QSAs per server
Use of trunk mode
Deployment using the on-premises OVA installer and not Intersight
Figure 2-31 illustrates the logical network topology of an edge cluster with 10 Gigabit Ethernet upstream switch connectivity.
The upstream network requirements are as follows:
Two 10 Gbps ports are required per server using a VIC 1387:
Each physical VIC port is logically divided into 4 vNICs, as seen by the hypervisor.
Only 10 Gbps speeds are supported (no 40 Gbps).
M5 servers require VIC 1387 and two QSAs to reach 10 Gbps speeds.
M4 servers require VIC 1227 to reach 10 Gbps speeds.
Additional NICs:
Additional third-party NICs may be installed in the HX edge nodes, as needed.
All non-VIC interfaces must be shut down until the installation is complete.
Only a single VIC is supported per HX edge node.
Only trunk ports are supported in 10 Gigabit Ethernet switch configurations.
Spanning tree PortFast trunk should be enabled for all network ports connected to HX ports.
Figure 2-32 illustrates the physical topology for a 10 Gigabit Ethernet dual-switch configuration from a single edge node.
Some things to consider with this topology are as follows:
For M5 servers, ensure that a Cisco 40 Gigabit Ethernet-to-10 Gigabit Ethernet QSA is installed in both VIC ports.
If using a single 10 Gigabit Ethernet switch, cable both 10 Gigabit Ethernet ports to the same switch.
If using dual 10 Gigabit Ethernet switches or stacked switches, cable one 10 Gigabit Ethernet port to each switch, ensuring that port 1 on every node goes to the same switch and port 2 on every node goes to the other switch.
Cable the dedicated CIMC port to the same switch or to an out-of-band management switch.
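The dual-switch cabling rule in the list above (port 1 on every node to one switch, port 2 on every node to the other) lends itself to a quick consistency check. The record format and names in this sketch are hypothetical.

```python
def check_dual_switch_cabling(nodes):
    """Check 10 Gigabit Ethernet dual-switch cabling for a set of edge nodes.

    nodes maps a node name to {1: switch name, 2: switch name}.
    Returns a list of problems; an empty list means the checks passed.
    """
    problems = []
    port1_switches = {links[1] for links in nodes.values()}
    port2_switches = {links[2] for links in nodes.values()}
    if len(port1_switches) > 1:
        problems.append(f"port 1 is spread across switches: {sorted(port1_switches)}")
    if len(port2_switches) > 1:
        problems.append(f"port 2 is spread across switches: {sorted(port2_switches)}")
    # With dual switches, the two ports must land on different switches.
    if not problems and port1_switches == port2_switches:
        problems.append("port 1 and port 2 use the same switch; no switch redundancy")
    return problems
```

The last check applies only to dual or stacked switches; with a single 10 Gigabit Ethernet switch, both ports are intentionally cabled to the same device.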
This chapter presents the architecture of the HyperFlex HX Data Platform, including the components that make this solution possible. This chapter also reviews the different cluster topology configurations that are supported by the HyperFlex HX Data Platform. It discusses both the logical and physical topologies and the supported hardware for these topologies. Chapter 3, “Installing HyperFlex,” discusses deploying various types of HyperFlex clusters.