Chapter 11

Bringing a vSphere Design Together

In this chapter, we'll pull together all the various topics that we've covered so far throughout this book and put them to use in a high-level walkthrough of a VMware vSphere design. Along the way, we hope you'll get a better understanding of VMware vSphere design and the intricacies that are involved in creating a design.

This chapter will cover the following topics:

  • Examining the decisions made in a design
  • Considering the reasons behind design decisions
  • Exploring the impact on the design of changes to a decision
  • Mitigating the impact of design changes

Sample Design

For the next few pages, we'll walk you, at a high level, through a simple VMware vSphere design for a fictional company called XYZ Widgets. We'll first provide a business overview, followed by an overview of the major areas of the design, organized in the same fashion as the chapters in the book. Because VMware vSphere design documentation can be rather lengthy, we'll include only relevant details and explanations. A real-world design will almost certainly need to be more complete, more detailed, and more in-depth than what is presented in this chapter. Our purpose here is not to provide a full and comprehensive vSphere design but rather to provide a framework in which to think about how the various vSphere design points fit together and interact with each other and to help promote a holistic view of the design.

We'll start with a quick business overview and a review of the virtualization goals for XYZ Widgets.

Business Overview for XYZ Widgets

XYZ Widgets is a small manufacturing company. XYZ currently has about 60 physical servers, many of which are older and soon to be out of warranty and no longer under support. XYZ is also in the process of implementing a new ERP system. To help reduce the cost of refreshing the hardware, gain increased flexibility with IT resources, and reduce the hardware acquisition costs for the new ERP implementation, XYZ has decided to deploy VMware vSphere in its environment. As is the case with many smaller organizations, XYZ has a very limited IT staff, and the staff is responsible for all aspects of IT—there are no dedicated networking staff and no dedicated storage administrators.

XYZ has the following goals in mind:

  • XYZ would like to convert 60 existing workloads into VMs via a physical-to-virtual (P2V) process. These workloads should be able to run unmodified in the new vSphere environment, so they need connectivity to the same VLANs and subnets as the current physical servers.
  • Partly due to the ERP implementation and partly due to business growth, XYZ needs the environment to be able to support up to 200 VMs in the first year. This works out to more than a threefold increase in the anticipated number of VMs over the next year.
  • The ERP environment is critical to XYZ's operations, so the design should provide high availability for the ERP applications.
  • XYZ wants to streamline its day-to-day IT operations, so the design should incorporate that theme. XYZ management feels the IT staff should be able to “do more with less.”

These other requirements and constraints also affected XYZ's vSphere design:

  • XYZ has an existing Fibre Channel (FC) storage area network (SAN) and an existing storage array that it wants to reuse. An analysis of the array shows that adding drives and drive shelves to the array will allow it to handle the storage requirements (both capacity and performance) that are anticipated. Because this design decision is already made, it can be considered a design constraint.
  • There are a variety of workloads on XYZ's existing physical servers, including Microsoft Exchange 2007, DHCP, Active Directory domain controllers, web servers, file servers, print servers, some database servers, and a collection of application servers. Most of these workloads are running on Microsoft Windows Server 2003, but some are Windows Server 2008 and some are running on Red Hat Enterprise Linux.
  • A separate network infrastructure refresh project determined that XYZ should adopt 10 Gigabit Ethernet and Fibre Channel over Ethernet (FCoE) for network and storage connectivity. Accordingly, Cisco Nexus 5548 switches will be the new standard access-layer switch moving forward (replacing older 1 Gbps access-layer switches), so this is what XYZ must use in its design. This is another design constraint.
  • XYZ would like to use Active Directory as its single authentication point, as it currently does today.
  • XYZ doesn't have an existing monitoring or management framework in place today.
  • XYZ has sufficient power and cooling in its datacenter to accommodate new hardware (especially as older hardware is removed due to the virtualization initiative), but it could have problems supporting high-density power or cooling requirements. The new hardware must take this into consideration.

Your Requirements and Constraints Will Likely Be Much More Detailed
The requirements and constraints listed for XYZ Widgets are intentionally limited to major design vectors to keep this example simple while still allowing us to examine the impact of various design decisions. In real life, of course, your requirements and constraints will almost certainly be much more detailed and in-depth. In fact, you should ensure that your design constraints and requirements don't leave any loose ends that may later cause a surprise. Although some loose ends could be categorized as assumptions, you'll want to be careful about the use of assumptions. Assumptions should not encompass major design points or design points that could be considered pivotal to project success or failure. Instead, carefully consider all assumptions you make and, where possible, gather the information necessary to turn them into requirements or constraints. Don't be afraid of being too detailed here!

Now that you have a rough idea of the goals behind XYZ's virtualization initiative, let's review its design, organized topically according to the chapters in this book.

Hypervisor Design

XYZ's vSphere design calls for the use of VMware vSphere 5.1, which—like vSphere 5.0—only offers the ESXi hypervisor, not the older ESX hypervisor with the RHEL-based Service Console. Because this is its first vSphere deployment, XYZ has opted to keep the design as simple as possible and to go with a local install of ESXi, instead of using boot from SAN or AutoDeploy.

vSphere Management Layer

XYZ purchased licensing for VMware vSphere Enterprise Plus and will deploy VMware vCenter Server 5.1 to manage its virtualization environment. To help reduce the overall footprint of physical servers, XYZ has opted to run vCenter Server as a VM. To accommodate the projected size and growth of the environment, XYZ won't use the virtual appliance version of vCenter Server, but will use the Windows Server–based version instead. The vCenter Server VM will run vCenter Server 5.1, and XYZ will use separate VMs to run vCenter Single Sign-On and vCenter Inventory Service. The databases for the various vCenter services will be provided by a clustered instance of Microsoft SQL Server 2008 running on Windows Server 2008 R2 64-bit. Another VM will run vCenter Update Manager to manage updates for the VMware ESXi hosts. No other VMware management products are planned for deployment in XYZ's environment at this time.

Server Hardware

XYZ Widgets has historically deployed HP ProLiant rack-mount servers in its datacenter. In order to avoid retraining the staff on a new hardware platform or new operational procedures, XYZ opted to continue to use HP ProLiant rack-mount servers for its new VMware vSphere environment. It selected the HP DL380 G8, picking a configuration using a pair of Intel Xeon E5-2660 CPUs and 128 GB RAM. The servers will have a pair of 146 GB hot-plug hard drives configured as a RAID 1 mirror for protection against drive failure.

Network connectivity is provided by a total of four on-board Gigabit Ethernet (GbE) network ports and a pair of 10 GbE ports on a converged network adapter (CNA) that provides FCoE support. (More information about the specific networking and shared storage configurations is provided in an upcoming section.) Previrtualization capacity planning indicates that XYZ will need 10 servers to support the 200 workloads it would like to virtualize (a 20:1 consolidation ratio). This consolidation ratio works out to 1.25 VMs per physical core; the effective vCPU-to-core ratio—estimated at about 2:1—depends on how many of the VMs XYZ runs with a single vCPU versus multiple vCPUs. Older workloads will likely have only a single vCPU, whereas some of the VMs that will handle XYZ's new ERP implementation are likely to have more vCPUs.
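
To see where these ratios come from, here is a minimal sizing sketch. The 8-core count for the Intel Xeon E5-2660 is a known part specification, but the average vCPU count per VM (1.6) is purely an assumption added for illustration; XYZ's real figure would come from its capacity-planning data.

```python
# Back-of-the-envelope consolidation math for the proposed hosts.
HOSTS = 10
SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 8          # Intel Xeon E5-2660 is an 8-core part
VM_COUNT = 200
AVG_VCPUS_PER_VM = 1.6        # illustrative assumption only

total_cores = HOSTS * SOCKETS_PER_HOST * CORES_PER_SOCKET
consolidation_ratio = VM_COUNT / HOSTS              # VMs per host
vms_per_core = VM_COUNT / total_cores
vcpus_per_core = (VM_COUNT * AVG_VCPUS_PER_VM) / total_cores

print(f"Consolidation ratio    : {consolidation_ratio:.0f}:1 (VMs per host)")
print(f"VMs per physical core  : {vms_per_core:.2f}")
print(f"vCPUs per physical core: {vcpus_per_core:.2f}")
```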

Networking Configuration

As we mentioned, each of the proposed VMware vSphere hosts has a total of four 1 GbE and two 10 GbE network ports. XYZ Widgets proposes to use a hybrid network configuration that uses both vSphere Standard Switches as well as a vSphere Distributed Switch.

Each ESXi host will have a single vSwitch (vSphere Standard Switch) that contains all four on-board 1 GbE ports. This vSwitch will handle the management and vMotion traffic, and vMotion will be configured to use multiple NICs to improve live-migration performance. XYZ elected not to run vMotion across the 10 GbE ports because these ports are also carrying storage traffic via FCoE.

The vSphere Distributed Switch (VDS, or dvSwitch) will be uplinked to the two 10 GbE ports and will contain distributed port groups for the following traffic types:

  • Fault tolerance (FT)
  • VM traffic spanning three different VLANs

A group of Cisco Nexus 5548 switches provides upstream network connectivity, and every server will be connected to two switches for redundancy. Although the Nexus 5548 switches support multichassis link aggregation, the VDS won't be configured with the “Route based on IP hash” load-balancing policy; instead, it will use the default “Route based on originating virtual port ID.” XYZ may evaluate the use of load-based teaming (LBT) on the dvSwitch at a later date. Each Nexus 5548 switch has redundant connections to XYZ's network core as well as FC connections to the SAN fabric.

Shared Storage Configuration

XYZ Widgets already owned a FC-based SAN that was installed for a previous project. The determination was made, based on previrtualization capacity planning, that the SAN needed to be able to support an additional 15,000 I/O operations per second (IOPS) in order to virtualize XYZ's workloads. To support this workload, XYZ has added four 200 GB enterprise flash drives (EFDs), forty-five 600 GB 15K SAS drives, and eleven 1 TB SATA drives. These drives add approximately 38.8 TB of raw storage capacity and approximately 19,000 IOPS (without considering RAID overhead).
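
As a sanity check on those figures, here is a rough sizing sketch. The per-drive IOPS values (2,500 for an EFD, 180 for a 15K SAS drive, 80 for a SATA drive) are generic rules of thumb assumed for illustration, not numbers from XYZ's array vendor.

```python
# (count, capacity in TB, assumed IOPS per drive)
drives = {
    "EFD 200 GB":     (4,  0.2, 2500),
    "SAS 15K 600 GB": (45, 0.6, 180),
    "SATA 1 TB":      (11, 1.0, 80),
}

raw_tb = sum(count * cap_tb for count, cap_tb, _ in drives.values())
raw_iops = sum(count * iops for count, _, iops in drives.values())

print(f"Raw capacity: {raw_tb:.1f} TB")   # roughly 38.8 TB
print(f"Raw IOPS    : {raw_iops:,}")      # roughly 19,000, before RAID overhead
```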


Note
This environment was designed for server virtualization, so this drives the storage configuration. If you were designing for an environment to support virtual desktops (VDI), which has very different storage I/O requirements and I/O profiles, then your design would need to be adjusted accordingly. For example, VDI workloads are read heavy during boot, but write heavy during steady state—so the storage configuration needs to take that I/O profile into account.

The EFDs, SAS drives, and SATA drives will be placed into a single storage pool from which multiple LUNs will be provisioned. In addition to supporting vSphere APIs for Array Integration (VAAI) and vSphere APIs for Storage Awareness (VASA), the array has the ability to automatically tier data based on usage. XYZ will configure the array so that the most frequently used data is placed on the EFDs, and the data that is least frequently used is placed on the SATA drives. The EFDs will be configured as RAID 1 (mirror) groups, the SAS drives as RAID 5 groups, and the SATA drives as a RAID 6 group. The storage pool will be carved into 1 TB LUNs and presented to the VMware ESXi hosts.
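
The "without considering RAID overhead" caveat in the sizing numbers matters because each RAID level amplifies writes differently. Here is a minimal sketch of the standard write-penalty arithmetic, assuming a 70/30 read/write mix and 180 IOPS per SAS drive (our assumptions, not XYZ's measured I/O profile):

```python
# Effective (front-end) IOPS after RAID write penalties, for a given read/write mix.
# Standard write penalties: RAID 1 = 2, RAID 5 = 4, RAID 6 = 6.

def effective_iops(raw_iops: float, read_pct: float, write_penalty: int) -> float:
    """Front-end IOPS a RAID group can absorb given its raw back-end IOPS."""
    write_pct = 1.0 - read_pct
    # Each front-end write costs `write_penalty` back-end I/Os.
    return raw_iops / (read_pct + write_pct * write_penalty)

raw_sas_iops = 45 * 180          # assumed 180 IOPS per 15K SAS drive
print(f"RAID 5 SAS tier, 70% reads: {effective_iops(raw_sas_iops, 0.70, 4):.0f} IOPS")
print(f"RAID 6 (same spindles)    : {effective_iops(raw_sas_iops, 0.70, 6):.0f} IOPS")
```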

As described earlier, the VMware ESXi hosts are attached via FCoE CNAs to redundant Nexus 5548 FCoE switches. The Nexus 5548 switches have redundant uplinks to the FC directors in the SAN core, and the storage controllers of XYZ's storage array—an active/passive array according to VMware's definitions—have multiple ports that are also attached to the redundant SAN fabrics. The storage array is Asymmetric Logical Unit Access (ALUA) compliant.

VM Design

XYZ has a number of physical workloads that will be migrated into its VMware vSphere environment via a P2V migration. These workloads consist of various applications running on Windows Server 2003 and Windows Server 2008. During the P2V process, XYZ will right-size the resulting VM to ensure that it isn't oversized. The right-sizing will be based on information gathered during the previrtualization capacity-planning process.

For all new VMs moving forward, the guest OS will be Windows Server 2008 R2. XYZ will use a standard of 8 GB RAM per VM and a single vCPU. The single vCPU can be increased later if performance needs warrant doing so. A thick-provisioned 40 GB Virtual Machine Disk Format (VMDK) will be used for the system disk, using the LSI Logic SAS adapter (the default adapter for Windows Server 2008). XYZ chose the LSI Logic SAS adapter for the system disk because it's the default adapter for this guest OS and because support for the adapter is provided out of the box with Windows Server 2008. XYZ felt that using the paravirtual SCSI adapter for the system disk added unnecessary complexity. Additional VMDKs will be added on a per-VM basis as needed and will use the paravirtualized SCSI adapter. Because these data drives are added after the installation of Windows into the VM, XYZ felt that the use of the paravirtualized SCSI driver was acceptable for these virtual disks.

Given the relative newness of Windows Server 2012, XYZ decided to hold off on migrating workloads to this new server OS as part of this project.

VMware Datacenter Design

XYZ will configure vCenter Server to support only a single datacenter and a single cluster containing all 10 of its VMware ESXi hosts. The cluster will be enabled for vSphere High Availability (HA) and vSphere Distributed Resource Scheduler (DRS). Because the cluster is homogeneous with regard to CPU type and family, XYZ has elected not to enable vSphere Enhanced vMotion Compatibility (EVC) at this time. vSphere HA will be configured to perform host monitoring but not VM monitoring, and vSphere DRS will be configured as Fully Automated and set to act on recommendations of three stars or greater.

Security Architecture

XYZ will ensure that the firewall on the VMware ESXi hosts is configured and enabled, and only essential services will be allowed through the firewall. Because XYZ doesn't initially envision using any management tools other than vCenter Server and the vSphere Web Client, the ESXi hosts will be configured with Lockdown Mode enabled. Should the use of the vSphere command-line interface (vCLI) or other management tools prove necessary later, XYZ will revisit this decision.

To further secure the vSphere environment, XYZ will place all management traffic on a separate VLAN and will tightly control access to that VLAN. vMotion traffic and FT logging traffic will be placed on separate, nonroutable VLANs to prevent any sort of data leakage.

vCenter Server will be a member of XYZ's Active Directory domain and will use default permissions. XYZ's VMware administrative staff is fairly small and doesn't see a need for a large number of highly differentiated roles within vCenter Server. vCenter Single Sign-On will use XYZ's existing Active Directory deployment as an identity source.

Monitoring and Capacity Planning

XYZ performed previrtualization capacity planning. The results indicated that 10 physical hosts with the proposed specifications would provide enough resources to virtualize the existing workloads and provide sufficient room for initial anticipated growth. XYZ's VMware vSphere administrators plan to use vCenter Server's performance graphs to do both real-time monitoring and basic historical analysis and trending.

vCenter Server's default alerts will be used initially and then customized as needed after the environment has been populated and a better idea exists of what normal utilization will look like. vCenter Server will send email via XYZ's existing email system in the event a staff member needs to be alerted regarding a threshold or other alarm. After the completion of the first phase of the project—which involves the conversion of the physical workloads to VMs—XYZ will evaluate whether additional monitoring and management tools are necessary. Should additional monitoring and capacity-planning tools prove necessary, XYZ is leaning toward the use of vCenter Operations to provide additional insight into the performance and utilization of the vSphere environment.

Examining the Design

Now that you've seen an overview of XYZ's VMware vSphere design, we'd like to explore the design in a bit more detail through a series of questions. The purpose of these questions is to get you thinking about how the various aspects of a design integrate with and depend on each other. You may find it helpful to grab a blank sheet of paper and start writing down your thoughts as you work through these questions.

These questions have no right answers, and the responses that we provide here are simply to guide your thoughts—they don't necessarily reflect any concrete or specific recommendations. There are multiple ways to fulfill the functional requirements of any given design, so keep that in mind! Once again, we'll organize the questions topically according to the chapters in this book; this will also make it easier for you to refer back to the appropriate chapter where applicable.

Hypervisor Design

As you saw in Chapter 2, “The ESXi Hypervisor,” the decisions about how to install and deploy VMware ESXi are key decision points in vSphere designs and will affect other design decisions:

XYZ has selected local (stateful) installations of ESXi rather than boot from SAN or AutoDeploy. What are some drawbacks of this decision? For an organization that is new to VMware vSphere, using local (stateful) installations of ESXi is simpler and easier to understand and might be the best approach—operationally speaking—for that particular organization. Remember that it's important to consider not only the technical impacts of your design choices, but also the organizational and operational impacts. However, while this design choice does have some advantages, it also has disadvantages. Upgrading or patching the ESXi hosts might be more complex, and expanding the capacity of the vSphere environment requires new local installs on new hardware. This could make it difficult for XYZ's IT organization to respond quickly enough to changing business demands as XYZ Widgets' business grows.
What impact would it have on XYZ's design to switch to AutoDeploy for the ESXi hosts? Switching from local (stateful) installations to using AutoDeploy would have several impacts on the design. XYZ would need to add DHCP and TFTP services to the server subnet (where they might not have been present before), and this addition might impact other servers and equipment on the same subnet. Using AutoDeploy would introduce a dependency on these other network services and require that XYZ also use Host Profiles (if it wasn't using them already). The Host Profiles requirement, in turn, would introduce a dependency on vCenter Server as well.

vSphere Management Layer

We discussed design decisions concerning the vSphere management layer in Chapter 3, “The Management Layer.” In this section, we'll examine some of the design decisions XYZ made regarding its vSphere management layer:

XYZ is planning to run vCenter Server as a VM. What are the benefits of this arrangement? What are the disadvantages? As we discussed in Chapter 3, running vCenter Server as a VM can offer some benefits. For example, XYZ can protect vCenter Server from hardware failure using vSphere HA, which may help reduce overall downtime. Depending on XYZ's backup solution and strategy (not described here), it's possible that backups of vCenter Server may be easier to make and easier to restore. XYZ's hybrid design, illustrated in Figure 11.1, also sidesteps one perceived concern with running vCenter Server as a VM: the interaction between vCenter Server and the VDS it manages. Although changes in vSphere 5.1 greatly mitigate this concern (through VDS configuration rollback, for example), the placement of management traffic on a standard vSwitch eliminates this potential concern.

Are there other disadvantages that you see to running vCenter Server as a VM? What about other advantages of this configuration?

Figure 11.1 A hybrid network configuration for XYZ's vSphere environment

What would be the impact on XYZ's design if it wanted to run vCenter Server on a physical computer instead of as a VM? Ignoring the fact that VMware recommends running vCenter Server as a VM—and keeping in mind that best practices aren't to be followed blindly without an understanding of the reasoning behind them—it's possible to modify XYZ's design to run vCenter Server on a physical server. However, a number of impacts would result. How will XYZ provide HA for vCenter Server? Will all of vCenter Server's components run on a single system, or will multiple physical systems be required? What about HA for the other components? What about environmental (power, rack space, cooling) considerations?
What is the impact of running all of vCenter Server's components, SQL Server, and vCenter Update Manager in the same guest OS instance? XYZ has opted to distribute these workloads across multiple VMs. If it consolidated them into a single VM, the resource needs of that VM would clearly be much greater than they would have been without combining these applications. For running multiple components on the same system, VMware recommends a minimum of 10 GB RAM, and that doesn't account for the SQL database. Taking the SQL database into account, you're looking at a VM with at least 16 GB RAM and at least two vCPUs.
Overall, the configuration complexity is slightly reduced because there is no need for a dedicated service account for authentication to SQL Server and because there are fewer VMs to manage (only one VM running all three applications instead of three VMs, each running one application). On the downside, a fault in this VM will affect multiple services, and running all these services in a single VM might limit the scalability of XYZ's design.

Server Hardware

Server hardware and the design decisions around server hardware were the focus of our discussion in Chapter 4, “Server Hardware.” In this section we ask you a few questions about XYZ's hardware decisions and the impact on the company's design:

What changes might need to be made to XYZ's design if it opted to use blade servers instead of rack-mount servers? The answers to this question depend partially on the specific blade-server solution selected. Because XYZ was described as using primarily HP servers, if the blade-server solution selected was HP's c7000 blade chassis, a number of potential changes would arise:
  • The design description indicates that XYZ will use 10 physical servers. They will fit into a single physical chassis but may be better spread across two physical chassis to protect against the failure of a chassis. This increases the cost of the solution.
  • Depending on the specific type of blade selected, the number and/or type of NICs might change. If the number of NICs was reduced too far, this would have an impact on the networking configuration. Changes to the network configuration (for example, having to cut out NFS traffic due to limited NICs) could then affect the storage configuration. And the storage configuration might need to change as well, depending on the availability of CNAs for the server blades and FCoE-capable switches for the back of the blade chassis.
What if XYZ decided to use 1U rack-mount servers instead of 2U rack-mount servers like the HP DL380 specified in the design description? Without knowing the specific details of the 1U server selected, it would be difficult to determine the exact impact on the design. If you assume that XYZ has switched to an HP DL360 or equivalent 1U rack server, you should ensure that the company can maintain enough network and storage connectivity given the reduced number of PCI Express expansion slots. There might also be concerns over RAM density, which would impact the projected consolidation ratio and increase the number of servers required. This, in turn, could push the cost of the project higher. You should also ensure that the selected server model is fully supported by VMware and is on the hardware compatibility list (HCL).
Would a move to a quad-socket server platform increase the consolidation ratio for XYZ? We haven't given you the details to determine the answer to this question. You'd need an idea of the aggregate CPU and memory utilization of the expected workloads. Based on that information, you could determine whether CPU utilization might be a bottleneck.

In all likelihood, CPU utilization wouldn't be a bottleneck; memory usually runs out before CPU capacity, but it depends on the workload characteristics. Without additional details, it's almost impossible to say for certain if an increase in CPU capacity would help improve the consolidation ratio. However, based on our experience, XYZ is probably better served by increasing the amount of memory in its servers instead of increasing CPU capacity.
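
A quick way to frame this is to compute how many VMs a host can hold by RAM and by CPU separately, then see which limit is reached first. The per-VM demand figures below are hypothetical placeholders, not XYZ's measured data:

```python
# Which resource runs out first? A sketch with hypothetical aggregate numbers;
# XYZ's real figures would come from its previrtualization capacity-planning data.

host_ram_gb = 128
host_cores = 16                      # 2 x 8-core E5-2660
host_cpu_ghz = host_cores * 2.2      # E5-2660 base clock, ignoring Turbo/hyperthreading

avg_vm_ram_gb = 6.0                  # hypothetical working set per VM
avg_vm_cpu_ghz = 0.4                 # hypothetical average CPU demand per VM

vms_by_ram = host_ram_gb / avg_vm_ram_gb
vms_by_cpu = host_cpu_ghz / avg_vm_cpu_ghz

limit = "memory" if vms_by_ram < vms_by_cpu else "CPU"
print(f"VMs per host by RAM: {vms_by_ram:.0f}")
print(f"VMs per host by CPU: {vms_by_cpu:.0f}")
print(f"Binding constraint : {limit}")
```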

Keep in mind that there are some potential benefits to the “scale-up” model, which uses larger servers like quad-socket servers instead of smaller dual-socket servers. This approach can yield higher consolidation ratios, but you'll need to consider the impacts on the rest of the design. One such impact is the increased risk of, and greater impact from, a server failure in a scale-up model with high consolidation ratios. How many workloads will be affected? What will an outage to that many workloads do to the business? What is the financial impact of this sort of outage? What is the risk of such an outage? These are important questions to ask and answer in this sort of situation.

Networking Configuration

The networking configuration of any vSphere design is a critical piece, and we discussed networking design in detail in Chapter 5, “Designing Your Network.” XYZ's networking design is examined in greater detail in this section.

What would be the impact of switching XYZ's network design to use only 1 Gigabit Ethernet, instead of 1 and 10 Gigabit Ethernet? Naturally, XYZ would need to ensure that the servers could provide enough network throughput without the 10 GbE links. Further, because the 10 GbE links are running FCoE, XYZ would need to provide some other form of connectivity back to its existing SAN; that probably means adding FC HBAs to the servers. This raises additional questions—are FC HBAs available for this server model? Are enough FC ports available on the SAN? What other operational impacts might result from this change? Most likely, XYZ would need to add not only FC HBAs but additional 1 GbE ports as well, which could present an issue depending on the number of available PCIe slots in the server. Finally, XYZ would need to revisit the network equipment selection, because Nexus 5548 switches would no longer be needed to handle the 10 GbE/FCoE connectivity.
What benefit would there be, if any, to using link aggregation with the 1 GbE links in XYZ's design? The traffic that is going across the 1 GbE links is largely point-to-point; the management traffic is from the ESXi host to vCenter Server, and the vMotion traffic is host-to-host. Thus there would be very little benefit from the use of link aggregation, which mostly benefits one-to-many/many-to-one traffic patterns. Further, vSphere's support for multi-NIC vMotion already provides an effective mechanism for scaling vMotion traffic between hosts. The link-aggregation configuration is also more complex than a configuration that doesn't use link aggregation.

As a side note regarding link aggregation, the number of links in a link aggregate is important to keep in mind. Most networking vendors recommend the use of one, two, four, or eight uplinks due to the algorithms used to place traffic on the individual members of the link-aggregation group. Using other numbers of uplinks will most likely result in an unequal distribution of traffic across those uplinks.


Tip
For more information about how the load-balancing algorithms on Cisco's switches work, refer to www.cisco.com/en/US/tech/tk389/tk213/technologies_tech_note09186a0080094714.shtml.
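
To illustrate why one, two, four, or eight uplinks are recommended, the following sketch models a hash-based algorithm (such as Cisco's EtherChannel load balancing) that reduces each flow to one of eight hash buckets and pins each bucket to a link. This is a simplified model for illustration, not the exact switch implementation:

```python
# Simplified model of bucket-to-link assignment in a hash-based link aggregate.
# An 8-bucket hash divides evenly across 1, 2, 4, or 8 links; any other link
# count leaves some links carrying more buckets (and thus more flows) than others.

from collections import Counter

def bucket_distribution(num_links: int, num_buckets: int = 8) -> Counter:
    # Each hash bucket is pinned to link (bucket mod num_links).
    return Counter(bucket % num_links for bucket in range(num_buckets))

for links in (2, 3, 4, 6, 8):
    dist = bucket_distribution(links)
    print(f"{links} links -> buckets per link: {sorted(dist.values(), reverse=True)}")
```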

What changes would need to be made, if any, to XYZ's design if it decided to use only standard vSwitches instead of the current hybrid approach? What are the advantages and disadvantages of this approach? Switching to vSphere standard switches (vSwitches) offers a slight decrease in complexity but has significant operational impacts. Figure 11.2 shows how a vSwitch could be swapped for a dvSwitch in XYZ's design.

As the result of using only vSwitches, the administrative overhead is potentially increased because changes to the network configuration of the ESXi hosts must be performed on each individual host, instead of being centrally managed like the VDS. The fact that vSwitches are managed per host introduces the possibility of a configuration mismatch between hosts, and configuration mismatches could result in VMs being inaccessible from the network after a vMotion (either a manual vMotion or an automated move initiated by vSphere DRS).

On the flip side, given that XYZ is using Enterprise Plus licensing, it could opt to use host profiles to help automate the management of the vSwitches and help reduce the likelihood of configuration mismatches between servers.

There is also a loss of functionality, because a distributed switch supports features that a standard vSwitch doesn't support, such as Switched Port Analyzer (SPAN), inbound and outbound traffic shaping, and private VLANs. It's also important to understand that some additional VMware products, such as vCloud Director, require a VDS for full functionality. Although XYZ doesn't need vCloud Director today, switching to vSwitches might limit future growth opportunities, and this consideration must be included in the design analysis.

Figure 11.2 A potential configuration for XYZ using vSphere standard switches instead of a VDS


Shared Storage Configuration

Because so many of vSphere's most useful features, such as vMotion, require shared storage, shared storage design is a correspondingly important part of your overall design. Refer back to Chapter 6, “Storage,” if you need more information as we take a closer look at XYZ's shared storage design:

How would XYZ's design need to change if it decided to use NFS exclusively for all its storage? Are there any considerations to this design decision? Some changes are immediately apparent. First, XYZ would no longer need FCoE CNAs in its servers and could get by with straight Ethernet adapters, although it would most likely want to keep the 10 GbE support. Second, the way in which the storage is presented to the hosts would likely change; XYZ might opt to go with larger NFS exports. The size of the NFS exports would need to be gated by a few different factors:
  • The amount of I/O being generated by the VMs placed on that NFS datastore, because network throughput would likely be the bottleneck in this instance.
  • The amount of time it took to back up or restore an entire datastore. XYZ would need to ensure that these times fell within its agreed recovery time objective (RTO) for the business.
XYZ's design already calls for 10 GbE, so network throughput won't generally be an issue. The use of link aggregation won't provide any benefit in an NFS environment, so the fact that XYZ's design doesn't utilize link aggregation is a nonissue. On the other hand, XYZ might want to consider the use of Network I/O Control (NIOC) to more carefully regulate the IP-based traffic and ensure that the traffic types are given the appropriate priorities on the shared network links.
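
As a rough illustration of how NIOC would behave in that all-NFS scenario, the sketch below converts network resource-pool shares into approximate bandwidth on a saturated 10 GbE uplink. The share values are illustrative placeholders rather than recommended settings for XYZ, and shares only come into play when the link is under contention:

```python
# How NIOC shares translate into bandwidth during contention on a 10 GbE uplink.
LINK_GBPS = 10

shares = {
    "Management traffic": 20,
    "vMotion traffic": 50,
    "NFS / IP storage traffic": 100,
    "Virtual machine traffic": 100,
}

total_shares = sum(shares.values())
for traffic_type, share in shares.items():
    gbps = LINK_GBPS * share / total_shares
    print(f"{traffic_type:26s}: {share:3d} shares -> ~{gbps:.1f} Gbps under contention")
```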

Historically speaking, VMware has had a tendency to support new features and functionality on block storage platforms first, followed by NFS support later. Although this trend isn't guaranteed to continue in the future, it would be an additional fact XYZ would need to take into account when considering a migration to NFS.

Finally, a move to only NFS would prevent the use of raw device mappings (RDMs) for any applications in the environment, because RDMs aren't possible on NFS.

How would XYZ's design need to change if it decided it wanted to use datastore clusters and Storage DRS in its design? The use of datastore clusters and Storage DRS could offer XYZ some operational benefits with regard to storage management and VM placement, so there are reasons XYZ might consider this as part of its design. If XYZ also wanted to use the array's autotiering functionality, though, it would probably need to configure Storage DRS to make its decision only on capacity, not on I/O latency. The use of datastore clusters might also lead XYZ to use a larger number of smaller datastores, because the vSphere administrators wouldn't need to worry about managing more datastores (these would be managed via Storage DRS for capacity and/or latency).
How would XYZ's design need to change if it decided to use iSCSI (via the VMware ESXi software iSCSI initiator) instead of FC? As with replacing FCoE with NFS, the hardware configuration would need to change. XYZ would want to replace the FCoE CNA, although it would likely want to retain its 10 GbE connectivity. The company might also want to change the network configuration to account for the additional storage traffic, but the use of dual 10 GbE NIC ports doesn't provide much flexibility in that regard. Instead, XYZ might need to use NIOC and/or traffic shaping to help ensure that iSCSI traffic isn't negatively affected by other traffic patterns.

The storage configuration might also need to change, depending on the I/O patterns and amount of I/O generated by the VMs.

Finally, using the software iSCSI initiator would affect CPU utilization by requiring additional CPU cycles to process storage traffic. This could have a negative impact on the consolidation ratio and require XYZ to purchase more servers than originally planned.

The default multipathing policy for an active/passive array is usually most recently used (MRU). Does XYZ's array support any other policies? What would be the impact of changing to a different multipathing policy if one was available? We noted that XYZ's storage array is an active/passive array, so the multipathing policy would typically be MRU. However, we also indicated that XYZ's array supports ALUA, which means the Round Robin multipathing policy is, in all likelihood, also supported. Assuming that typical storage best practices were followed (redundant connections from each storage processor to each SAN fabric), this means the VMware ESXi hosts will see four optimal paths for each LUN (and four non-optimal paths) and can put traffic on all four of those active paths instead of only one. This would certainly result in a better distribution of traffic across storage paths and could potentially result in better performance. It's important, though, to ensure that you follow the configuration recommendations available from the storage vendor where applicable.
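
The following sketch shows where the "four optimal paths" figure comes from by enumerating initiator-to-target combinations and classifying them by the owning storage processor. The port counts are assumptions consistent with the design description (two CNA ports per host, two ports per storage processor), not a statement of XYZ's exact zoning:

```python
# Path counting for an ALUA-compliant active/passive array.
from itertools import product

host_ports = ["vmhba0", "vmhba1"]                    # 2 CNA ports per host (assumed)
sp_ports = {"SPA": ["SPA-0", "SPA-1"],               # 2 ports per storage processor (assumed)
            "SPB": ["SPB-0", "SPB-1"]}
owning_sp = "SPA"                                    # SP that owns this example LUN

for sp, ports in sp_ports.items():
    state = "active/optimized" if sp == owning_sp else "active/non-optimized"
    paths = list(product(host_ports, ports))
    print(f"{sp}: {len(paths)} paths, {state}")
    # Round Robin rotates I/O across the optimized paths only.
```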

VM Design

As we described in Chapter 7, “Virtual Machines,” VM design also needs to be considered with your vSphere design. Here are some questions and thoughts on XYZ's VM design:

Does the use of Windows Server 2003 present any considerations in a VMware vSphere environment? In general, the only real consideration with regard to Windows Server 2003 comes in the form of file-system alignment within the virtual disks. Windows Server 2003 is a fully supported guest OS, and VMware vSphere offers VMware Tools for Windows Server 2003. However, by default, NTFS partitions created in Windows Server 2003 aren't aligned on a 4 KB boundary, and this misalignment can potentially have a significant impact on storage performance as the environment scales. Based on the scenario given, the number of Windows Server 2003 workloads is and will be relatively small; therefore, the impact on the storage environment is likely to be quite limited in most cases. Nevertheless, XYZ should take the necessary steps to ensure that file-system partitions are properly aligned, both for systems that are converted via P2V and for systems that are built fresh in the virtual environment. For systems built fresh for the virtual environment, XYZ can streamline the process by using VM templates and correcting the file-system alignment in the VM templates.

Many variations of Linux are also affected, so XYZ should ensure that it corrects the file-system alignment on any Linux-based VMs as well.

Note that both Windows Server 2008 and Windows Server 2012 properly align partitions by default.
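
A quick way to see the alignment issue is to check a partition's starting byte offset against a boundary. Windows Server 2003 defaults to starting the first partition at sector 63, whereas Windows Server 2008 and later default to a 1 MB offset:

```python
# Quick check of whether a partition's starting offset is aligned.
SECTOR_BYTES = 512

def is_aligned(start_sector: int, boundary_bytes: int) -> bool:
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0

for label, start in (("Windows 2003 default", 63), ("Windows 2008+ default", 2048)):
    for boundary in (4 * 1024, 64 * 1024):
        ok = is_aligned(start, boundary)
        print(f"{label}: offset {start * SECTOR_BYTES} B, "
              f"{boundary // 1024} KB boundary -> {'aligned' if ok else 'misaligned'}")
```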

What impact would using thin-provisioned VMDKs have on the design? The performance difference between thick-provisioned VMDKs and thin-provisioned VMDKs is minimal and not an area of concern. Potential concerns over SCSI reservations due to frequent metadata changes aren't an issue in an environment of this size and would be eliminated entirely if XYZ used the VAAI support in its array (hardware-assisted locking is enabled by default when the array supports it). Operationally, XYZ would need to update its monitoring configuration to watch for datastore oversubscription and ensure that it didn't find itself in a situation where a datastore ran out of available space.
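
The kind of check XYZ would want to alarm on is sketched below; all of the capacity and usage numbers are hypothetical and serve only to show the oversubscription math:

```python
# Simple datastore-oversubscription check for thin-provisioned VMDKs.
datastore_capacity_gb = 1024          # one of the 1 TB LUNs
vmdks_gb = [40] * 30                  # thirty thin VMDKs provisioned at 40 GB each
used_gb = 12 * len(vmdks_gb)          # assume ~12 GB actually written per VMDK

provisioned_gb = sum(vmdks_gb)
oversub_ratio = provisioned_gb / datastore_capacity_gb
free_pct = 100 * (datastore_capacity_gb - used_gb) / datastore_capacity_gb

print(f"Provisioned: {provisioned_gb} GB on a {datastore_capacity_gb} GB datastore "
      f"({oversub_ratio:.2f}x oversubscribed)")
print(f"Actual free space: {free_pct:.0f}% -> alarm well before this reaches 0")
```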

VMware Datacenter Design

The logical design of the VMware vSphere datacenter and clusters was discussed at length in Chapter 8, “Datacenter Design.” Here, we'll apply the considerations mentioned in that chapter to XYZ's design:

What impact would it have on the design to use 2 clusters of 5 nodes each instead of a single cluster of 10 nodes? Cluster sizing affects a number of other areas. First, smaller clusters might give XYZ more flexibility in the definition of cluster-wide configuration settings. For example, does XYZ need an area where DRS is set to Partially Automated instead of Fully Automated? Are there regulatory factors that prevent XYZ from taking advantage of automated migrations and might drive such a requirement? It's possible to set DRS values on a per-VM basis, but this practice grows unwieldy as the environment scales in size. To reduce operational overhead, XYZ might need to create a separate cluster with this configuration.

Reducing cluster size means you reduce the ability of DRS to balance workloads across the entire environment, and you limit the ability of vSphere HA to sustain host failures. A cluster of 10 nodes might be able to support the failure of 2 nodes, but can a cluster of 5 nodes support the loss of 2 nodes? Or is the overhead to support that ability too great with a smaller cluster?
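
The trade-off is easy to quantify with the simple percentage view of vSphere HA admission control (this ignores slot-size calculations and is only meant to frame the question):

```python
# Reserved failover capacity as a fraction of the cluster, for the two layouts
# under discussion.

def reserve_pct(hosts: int, host_failures_to_tolerate: int) -> float:
    return 100.0 * host_failures_to_tolerate / hosts

for hosts in (10, 5):
    for failures in (1, 2):
        print(f"{hosts}-node cluster, tolerate {failures} host failure(s): "
              f"{reserve_pct(hosts, failures):.0f}% of capacity held in reserve")
```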

Does the use of vCenter Server as a VM impact XYZ's ability to use VMware Enhanced vMotion Compatibility? EVC will be very helpful to XYZ over time. As XYZ adds servers to its environment, EVC can help smooth over differences in CPU families to ensure that vMotion can continue to migrate workloads between old and new servers.

However, the use of vCenter Server as a VM introduces some potential operational complexity around the use of EVC. VMware has a Knowledge Base article that outlines the process required to enable EVC when vCenter Server is running as a VM; see kb.vmware.com/kb/1013111. To avoid this procedure, XYZ might want to consider enabling EVC in the first phase of its virtualization project.

Security Architecture

We focused on the security of vSphere designs in Chapter 9, “Designing with Security in Mind.” As we review XYZ's design in the light of security, feel free to refer back to our security discussions from Chapter 9 for more information:

Does the default configuration of vCenter Server as a domain member present any security issues? If so, how could those issues be addressed? Recall that, by default, the Administrators local group on the computer where vCenter Server is installed is given the Administrator role in vCenter Server. When vCenter Server is in a domain, the Domain Admins group is a member of the local Administrators group. This confers the Administrator vCenter role on the Domain Admins group, which may not be the intended effect. To protect against this overly broad assignment of rights, you should create a separate local group on the vCenter Server computer and assign that group the Administrator role within vCenter Server. Then, remove the local Administrators group from the Administrator role, which will limit access to vCenter Server to only members of the newly created group.

Monitoring and Capacity Planning

Chapter 10, “Monitoring and Capacity Planning,” centered on the use and incorporation of monitoring and capacity planning in your vSphere design. Here, we examine XYZ's design in this specific area:

If XYZ needs application-level awareness for some of the application servers in its environment, does the design meet that requirement? As currently described, no. The built-in tools provided by vCenter Server, which are what XYZ currently plans to use, don't provide application awareness. They can't tell if Microsoft Exchange, for example, is responding. The built-in tools can only tell if the guest OS instance is responding, and then only if VM Failure Monitoring is enabled at the cluster level.

If XYZ needed application-level awareness, it would need to deploy an additional solution to provide that functionality. That additional solution would increase the cost of the project, would potentially consume resources on the virtualization layer and affect the overall consolidation ratio, and could require additional training for the XYZ staff.

Summary

In this chapter, we've used a sample design for a fictional company to illustrate the information presented throughout the previous chapters. You've seen how functional requirements drive design decisions and how different decisions affect various parts of the design. We've also shown examples of both intended and unintended impacts of design decisions, and we've discussed how you might mitigate some of these unintended impacts. We hope the information we've shared in this chapter has helped provide a better understanding of what's involved in crafting a VMware vSphere design.
