Advanced Networking

Windows Server 2012 is changing how we design networking for servers, and particularly, for Hyper-V. In this section, you are going to look at converged fabrics. You will also look at QoS, which makes converged fabrics possible.

Quality of Service

Although it doesn’t create many headlines, QoS is an important feature of Windows Server 2012. Without it, many of the new possibilities that enable flexibility, cost savings, and easier deployment of Hyper-V would just not be possible.

Remembering a World without QoS

Why did a clustered Windows Server 2008 R2 Hyper-V host require so many physical NICs? Let’s count them, leaving iSCSI aside for a moment:

1. Virtual machines: There would be at least one virtual network (the technology preceding the virtual switch), and each virtual network required one connection to the LAN.
2. Management OS: Although the management OS could share a virtual network’s physical NIC, we were always concerned about management traffic affecting virtual workloads, or vice versa.
3. Cluster: A private network between the nodes in the cluster provided a path for the heartbeat, and we usually configured redirected I/O to pass over this network.
4. Live migration: A second private network provided us with bandwidth to move virtual machines within the cluster with zero loss in availability for their workloads.

That is four NICs so far; keep counting. NIC teaming wasn’t universal with Windows Server 2008 R2, but if you implemented it, you’d now have eight NICs (two for every function, to make four NIC teams). Many larger deployments would put in an additional backup network to isolate the impact of the backup traffic from the management OS and the services running on the virtual machines. With NIC teaming, you’re now adding two more NICs, and you have 10 NICs at this point. And now we can think about iSCSI with its two NICs (using MPIO for iSCSI NICs because NIC teaming is not supported with iSCSI), and you require 12 NICs per host. And we haven’t even considered the need to support multiple virtual networks.


The Live Migration Network
The Live Migration network also provided a second path for the cluster heartbeat in case the cluster network had a problem.
Most people deployed 1 GbE NICs. The time it took to put a host into maintenance mode was dependent on how quickly the memory of the virtual machines on that host could be synchronized with other hosts. Imagine an engineer visiting a customer site and being told that it was going to take a weekend to put the 1 TB RAM host into maintenance mode so they could shut it down! This is why the Live Migration NIC became the first network to run at 10 Gbps.
Even with every possible tweak, Windows Server 2008 R2 could never fill a 10 GbE NIC. However, Windows Server 2012 is able to fully utilize these high-bandwidth pipes.
Thanks to the ability to run an unlimited number (depending on your hosts and networking) of simultaneous Live Migrations in Windows Server 2012, you can probably take advantage of even more bandwidth for those 4 TB RAM hosts that you are planning to deploy!

All of those NICs demanded an incredible amount of cabling, documentation, and careful implementation (many mistakes were basic networking/cabling errors), and all of those switch ports required yet more spending on hardware, support, and electricity. Did the host hardware even have enough slots for all these quad-port NIC cards? Consider blade servers; they take only so many network expansion cards, and this usually meant buying another pair (or more) of those incredibly expensive blade chassis switches or Virtual Connect type of devices.

Why did we need so many NICs? Couldn’t we run these services on fewer NICs? Couldn’t larger organizations deploy 10 GbE networking and run everything through just a NIC team?


10 GbE Hardware Fabrics
Some hardware vendors have been selling 10 GbE fabric solutions that can slice up bandwidth at the hardware layer for years. These solutions are very expensive, have static QoS settings, are inflexible, and lock a customer into a single hardware vendor.

Without expensive hardware solutions, the answer was no; we needed to physically isolate each of these functions in the host. Live Migration needed bandwidth. Services running in virtual machines needed to be available to users, no matter what administrators did to move workloads. Monitoring agents couldn’t fail to ping the hosts, or alarms would be set off. The cluster heartbeat needed to be dependable, or we would end up with unwanted failover. We physically isolated these functions in the host to guarantee quality of service.

Introducing QoS

QoS has been in Windows Server for some time, offering basic functionality. But we never had a function that could guarantee a minimum level of service to a virtual NIC or a protocol (or an IP port). That is exactly what we are getting with QoS in Windows Server 2012. At its most basic level, we can configure the following rules in QoS:

Bits per Second–Based or Weight-Based You will choose between implementing QoS by using bps-based or weight-based rules.
The bps-based rules can be quite specific, guaranteeing a precise amount of bandwidth, which can be useful for some applications. For example, a hosting company (a public cloud) will usually create rules based on bandwidth that are easy to communicate to customers. However, bps rules can be considered inflexible, especially if workloads are mobile between hosts. Imagine two virtual machines with large bandwidth guarantees being placed on the same host: you could end up guaranteeing more bandwidth than the host can provide, and in a large cloud, placement rules specific enough to prevent that become unwieldy. Or what if virtual machines have to be moved to a host that cannot offer enough bandwidth?
You get more flexibility with weight-based rules. A weight-based rule is based on a share of bandwidth, with no consideration of the actual speed. For example, a virtual NIC in a virtual machine is guaranteed 50 percent of total bandwidth. On a 10 GbE NIC, that virtual machine is guaranteed 5 Gbps. That virtual machine can move to a different host that has a 1 GbE network. Now the virtual machine is guaranteed 500 Mbps.
While weight-based rules are extremely flexible and are usually going to be the correct choice, bps-based rules do have their uses because of the certainty that they can guarantee.
Minimum Bandwidth An engineer can guarantee a minimum share of the host’s bandwidth to a virtual NIC or a protocol. For example, if you guaranteed 25 percent of bandwidth to the SMB protocol, SMB would always be able to get 25 percent of the total bandwidth, and it could consume more if spare bandwidth was available. If SMB is not using its full 25 percent, the idle bandwidth becomes available to other protocols until SMB requires it.
Minimum-bandwidth rules are all about guaranteeing an SLA for a service or function. Because they guarantee a share, they are flexible rules and are the recommended approach to designing QoS. Because flexibility is what we want, you will typically use (but not always) the weight-based approach to creating minimum-bandwidth rules.
Maximum Bandwidth You can limit the bandwidth consumption of a virtual NIC or protocol with this type of rule. Configuring a virtual machine to not exceed a certain amount of bandwidth can be useful in some circumstances. A hosting company (or public cloud) might limit customers’ bandwidth based on how much they are willing to pay. In that scenario, you should note that QoS rules won’t understand concepts such as the entity called a customer; QoS rules apply to individual virtual NICs or protocols. Another scenario in which a maximum-bandwidth rule is useful is when some application in a virtual machine goes crazy and wants to consume every bit of bandwidth that it can find. An administrator could temporarily apply a maximum-bandwidth rule to the necessary virtual NIC(s) to limit the problem without disconnecting the virtual machine, fix the problem, and remove the rule.
Maximum-bandwidth rules are all about limiting, and that limits flexibility. There is no concept of guaranteeing a service. There is no burst feature of the minimum rule, whereby a protocol can consume bandwidth if there is no impact on other protocols. And cloud purists might argue that it removes one of the key attributes of a cloud: elasticity.
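To make the runaway-application scenario concrete, here is a minimal sketch of that temporary cap, using the Set-VMNetworkAdapter cmdlet that you will see later in this section (the virtual machine name and the 100 Mbps figure are hypothetical):

# Temporarily cap the runaway virtual machine at roughly 100 Mbps (bits per second)
Set-VMNetworkAdapter -VMName "Problem VM" -MaximumBandwidth 100000000
# ...troubleshoot and fix the application in the guest OS...
# Setting the maximum back to 0 removes the cap
Set-VMNetworkAdapter -VMName "Problem VM" -MaximumBandwidth 0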

The focus of Microsoft’s effort in Windows Server 2012 was minimum bandwidth. Most of the time, you will implement minimum-bandwidth QoS rules that are based on weight. This is the most flexible solution, because it makes no assumptions about hardware capacity, and it has an elastic nature, whereby virtual NICs or protocols can burst beyond the minimum guarantee.

Understanding the Three Ways to Apply QoS

There are three ways to apply QoS in Windows Server 2012. How we do it depends on how we design our Hyper-V hosts, and what kind of traffic we want to apply the rules to. Figure 4-18 illustrates how you will decide to apply QoS.

These are the three approaches to applying QoS:

Hyper-V Virtual Switch If the traffic in question is passing through a virtual switch, you can create QoS rules that are based on virtual NICs rather than on protocols. With this type of rule, you are creating minimum or maximum rules for connections.
Server Networking This category includes any networking that does not include a virtual NIC and a Hyper-V virtual switch, such as a physical NIC that is used by the management OS, or the NICs used by a file server. The rules are based on protocols rather than on virtual NICs, which can give you great granularity of control. There are two ways to apply QoS when you have physical networking.

Figure 4-18 Deciding on a QoS strategy

If you have NICs and physical networking (switches and adapters) that support Data Center Bridging (DCB), you can create rules for protocols or IP ports that will be applied by the hardware.
If you do not have end-to-end DCB-capable networking, you can create rules for protocols or IP ports that will be applied by the OS packet scheduler.

Data Center Bridging
Data Center Bridging, or DCB (IEEE 802.1), is an extension to Ethernet that allows the classification and prioritization of traffic for lossless transmission. DCB allows you to classify protocols and prioritize them. The QoS rules are applied at the hardware level, therefore not increasing the load on the OS packet scheduler.
You must have end-to-end support, including NICs and networking appliances, to use DCB.

Here’s an interesting twist: you can create QoS rules that are based on protocol or IP ports in the guest OS of a Windows Server 2012 virtual machine. QoS will be applied by the OS packet scheduler within the virtual machine, effectively carving up whatever bandwidth the virtual machine’s virtual NICs have access to.
Networking That Bypasses the Operating System Chapter 7 will teach you about something called Remote Direct Memory Access (RDMA). RDMA enables Server Message Block (SMB) Multichannel (powered by RSS) to use 100 percent of bandwidth in very high bandwidth networks without fully utilizing the processor of the server in question. This feature, called SMB Direct, does this by offloading the SMB transfer so that it effectively becomes invisible to the operating system. The throughput of SMB Direct with SMB Multichannel is able to match or exceed that of traditional storage connections. This has made it possible for Microsoft to support storing Windows Server 2012 virtual machines on Windows Server 2012 file shares instead of SAS/iSCSI/Fibre Channel SANs without any drop in performance. In fact, some file server designs will offer performance vs. price that cannot be matched by traditional storage. You should not skip Chapter 7!
High-end Hyper-V deployments that take advantage of SMB 3.0 will use RDMA. We cannot use the OS packet scheduler to apply QoS policies because the traffic is invisible to the management OS. For this reason, we must use DCB to apply QoS rules.
RDMA over Converged Ethernet (RoCE) is a type of high-capacity networking that supports SMB Direct. It is recommended that, if the NIC(s) support it, Priority-based Flow Control (PFC, IEEE 802.1Qbb) be enabled for the SMB Direct (RDMA) protocol to ensure that there is no data loss for Hyper-V host storage traffic.

QoS Is Not Just for Hyper-V
As you can see, only one of the three ways to classify traffic and apply QoS rules is specific to Hyper-V. QoS is useful for Hyper-V host design, but it can also be used outside the cloud in traditional physical servers.

Now you understand how to decide which type of QoS approach to use based on your network requirements. We can summarize as follows:

  • If the traffic comes from a virtual NIC, create rules for virtual NICs.
  • You should create rules for protocols or IP ports when the traffic is not coming from a virtual NIC. We’d prefer to do it with DCB, but we can use the OS packet scheduler. That last approach works inside virtual machines too.
  • Any protocol that the management OS cannot see must use DCB for applying QoS rules.

Implementing QoS

This section shows you how to implement each of the three approaches to QoS. There is no way to implement this functionality in the GUI in Windows Server 2012. Every step of the process will be done using PowerShell.

Applying QoS to Virtual NICs

When traffic is passing through a virtual switch, and therefore from a virtual NIC, we create QoS rules that are based on virtual NICs. This is the simplest of the three ways to configure QoS: you have a virtual NIC, and you give it a share of bandwidth.

The first thing we must deal with is the virtual switch, where the QoS rules will be applied. When you create a virtual switch by using PowerShell, you have the option to specify a bandwidth reservation mode (QoS rules guaranteeing a minimum amount of bandwidth) of either weight-based (a share of bandwidth) or absolute (bits per second). The virtual switch has to be one or the other. This does not affect your ability to specify maximum-bandwidth rules of either type. You also must do this at the time you create the virtual switch. If you do not specify an option, the virtual switch will be set to Absolute. The following snippet of PowerShell queries the bandwidth reservation mode of a virtual switch:

PS C:\> (Get-VMSwitch "External1").BandwidthReservationMode
Absolute

The following example creates a new external virtual switch called ConvergedNetSwitch that is linked to a NIC team called ConvergedNetTeam. The bandwidth reservation mode is set to weight-based:

New-VMSwitch "ConvergedNetSwitch" -MinimumBandwidthMode weight `
-NetAdapterName “ConvergedNetTeam” -AllowManagementOS 0

You should perform this configuration on each host that you want to use virtual NIC QoS rules for. For example, maybe a virtual machine is to be assigned an SLA guaranteeing at least 10 percent of bandwidth. In this case, you need to be sure that the weight-based bandwidth reservation mode is configured on the appropriate virtual switch on every host that the virtual machine could be placed on.
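For example, here is a quick sketch that checks this across a set of hosts (the host and switch names are hypothetical):

# Confirm that the virtual switch on each host uses the Weight reservation mode
"Host1", "Host2", "Host3" | ForEach-Object {
    Get-VMSwitch -ComputerName $_ -Name "ConvergedNetSwitch" |
        Select-Object ComputerName, Name, BandwidthReservationMode
}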

We now can create QoS rules for each required virtual NIC. This is a simple operation. The following rule guarantees a virtual machine 10 percent of bandwidth of any virtual switch it is connected to:

Set-VMNetworkAdapter -VMName "Virtual Machine 1" -MinimumBandwidthWeight 10

You can even specify one particular virtual NIC in a virtual machine. The next example retrieves the first virtual NIC of Virtual Machine 2. It then sets an SLA indicating that the virtual NIC will get 20 percent of bandwidth if required, but it will be capped at 1 Gbps. You can see in this example that we are not prevented from using absolute-bps rules to limit bandwidth, even if the switch is set to a weight-based bandwidth reservation mode:

(Get-VMNetworkAdapter -VMName "Virtual Machine 2")[0] |
Set-VMNetworkAdapter -MinimumBandwidthWeight 20 -MaximumBandwidth 1073741824

You will see more of this type of rule when you look at the topic of converged fabrics later in the chapter. Here are some important notes on this way of using QoS:

  • If you assign a virtual NIC a weight of 3 and another virtual NIC a weight of 1, then you have split the bandwidth to 75 percent and 25 percent.
  • Never assign a total weight of more than 100.
  • Be conservative when assigning weights to critical workloads such as cluster communications. Although a weight of 1 might be OK on a 10 GbE NIC for cluster communications, Microsoft is recommending that you use a weight of at least 5; the cluster heartbeat is critical in a Hyper-V cluster.
  • If you are going to assign QoS rules to virtual machines, consider creating classes such as Gold, Silver, and Bronze, with each class having a defined weight (see the sketch after this list). This will make operating and documenting the environment much easier.
  • At the time of writing this book (just after the release of Windows Server 2012), running Get-VMNetworkAdapter -VMName * | FL Ba* will not show you the results of your QoS policies.
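As a minimal sketch of that class-based approach (the class names, weights, and virtual machine naming pattern are assumptions, not recommendations), you could store the weights in one place and apply a class to a group of virtual machines:

# Hypothetical bandwidth classes; the weights are examples only
$classes = @{ Gold = 5; Silver = 3; Bronze = 1 }
# Apply the Silver weight to every virtual NIC of the matching virtual machines
Get-VMNetworkAdapter -VMName "Tenant1-*" |
    Set-VMNetworkAdapter -MinimumBandwidthWeight $classes["Silver"]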

Applying QoS to Protocols with the OS Packet Scheduler

For traffic that does not go through a virtual switch, we classify protocols and let the OS packet scheduler in the operating system apply the QoS. This type of rule applies in two scenarios:

  • Traffic on physical NICs for which we do not have end-to-end support for DCB.
  • Traffic inside the guest operating system of a virtual machine where we want to apply QoS, assuming that the guest is running Windows Server 2012.

The PowerShell cmdlet we will use is New-NetQosPolicy, a very flexible cmdlet for creating QoS rules. It supports the built-in protocol classifications, or filters, shown in Table 4-2.

Table 4-2: Built-in protocol filters for QoS

Label                     Protocol               Matching Criteria
-LiveMigration            Live Migration         TCP 6600
-iSCSI                    iSCSI                  TCP/UDP 3260
-SMB                      SMB or file services   TCP/UDP 445
-NetDirect <Port Number>  SMB Direct (RDMA)      Match the identified RDMA port
-NFS                      NFS                    TCP/UDP 2049

The cmdlet will allow you to create rules based on source and destination IP details including the following:

  • Ports or port ranges
  • Network address
  • Protocol, such as TCP or UDP

You can even specify the Windows network profile (Domain, Private, or Public), or the priority of QoS rules if there are multiple matches. We could write an entire chapter on this cmdlet alone, so here are a few useful examples:

This first example uses the built-in filter to create a QoS rule to guarantee 30 percent of bandwidth to SMB. The rule is given a priority of 3.

New-NetQosPolicy "SMB" -SMB -MinBandwidthWeight 30 -Priority 3

Note that -MinBandwidthWeight is short for -MinBandwidthWeightAction, and -Priority is short for -PriorityValue8021Action. You do not need to type in the full names of the flags.

An important protocol to protect is cluster communications; you don’t want the cluster heartbeat to be starved of bandwidth! Notice how this example specifies the destination IP port for cluster communications (3343):

New-NetQosPolicy "Cluster" -IPDstPort 3343 -MinBandwidthWeightAction 15 `
-Priority 255

The priority of this cluster example is set to the maximum allowable value of 255 (the default is 127 if the priority of the rule is not defined). If we had a scenario in which another rule matched this rule (which is quite unlikely in this case), then this rule would win. The rule with the highest priority always wins. The lowest possible value is 0.

A handy option for New-NetQosPolicy is the -Default flag. This allows you to create a QoS rule for all protocols not matching a filter in any other rule. The following does this for us:

New-NetQosPolicy "EverythingElse" -Default -MinBandwidthWeightAction 15

So far, every example applies QoS on a per-server basis using the OS packet scheduler. Another handy option is to use a Differentiated Services Code Point (DSCP) value to tag traffic (from 0 to 63) so that the physical network can apply QoS for you. This means that you can stop a single type of traffic from flooding the LAN.

A common source of pain for network administrators is the nightly backup. The next example uses a DSCP flag of 30 to mark backup traffic going to 10.1.20.101. The network administrators can use this flag to identify traffic on the LAN and apply QoS in the network devices to control all backup traffic.

New-NetQosPolicy –Name "Backup" -IPDstPrefixMatchCondition `
10.1.20.101/24 -DSCPAction 30

Remember that each of these examples can be used on Hyper-V hosts, traditional physical servers, and in virtual machines where the operating system is Windows Server 2012.

Applying QoS to Protocols Using DCB

The final option for QoS rules is to let the hardware do the work. It is the preferred option if you have DCB-capable NICs and network appliances, and it is the required option if you plan on creating QoS rules for SMB Direct (RDMA) to ensure that it has enough bandwidth through a host’s NIC(s).

DCB uses priorities to apply QoS policies. You will use the following process to classify protocols and create traffic classes that prioritize the traffic:

1. Install the DCB feature on the server by using Server Manager.
2. Create the required QoS policies by classifying protocols.
3. Create a traffic class that matches each created QoS policy.
4. Enable PFC if you are using RoCE networking and the NIC(s) support it.
5. Enable DCB settings to be applied to the NICs.
6. Specify the NICs that will use DCB.

Enabling the DCB feature can be done in Server Manager or by running the quicker PowerShell alternative:

Install-WindowsFeature Data-Center-Bridging

Next you will want to create each of your QoS policies. The following example classifies SMB Direct by using –NetDirectPort 445. Note how this cmdlet does not specify any minimum or maximum bandwidth settings.

New-NetQosPolicy "SMB Direct" -NetDirectPort 445 -Priority 2

We specify the bandwidth rule when we create the traffic class for DCB. We should match the name (for consistency) and priority (required) of the QoS classification when we create a DCB traffic class.

New-NetQosTrafficClass "SMB Direct" -Priority 2 -Algorithm ETS -Bandwidth 40

DCB can work slightly differently from the other examples. We are setting a bandwidth rule of 40 percent, but DCB can use an algorithm to ensure that lesser protocols are not starved. Enhanced Transmission Selection, or ETS (http://msdn.microsoft.com/library/windows/hardware/hh406697(v=vs.85).aspx), is an industry standard that guarantees lower-priority protocols a minimum level of bandwidth. The alternative is to specify Strict as the algorithm, which will cause DCB to apply the traffic class with no exceptions.

If we are using RoCE, we should enable PFC for SMB Direct if the NIC(s) support it. This should be done on both ends of the connection, such as a Hyper-V host using SMB storage for virtual machine files, and the file server that is providing that storage. You can enable PFC by running the next example. Once again, the priority will match the SMB Direct QoS policy and DCB traffic class:

Enable-NetQosFlowControl -Priority 2

The next step is to enable DCB settings to be pushed down to the NICs:

Set-NetQosDcbxSetting -Willing $false

You then enable DCB on each required NIC:

Enable-NetAdapterQos "Ethernet 1"

DCB does require a bit more work than the other two ways of implementing QoS. However, it is a hardware function, so it is going to be the most efficient option. And as you will learn in Chapter 7, using SMB Direct will be completely worth the effort.

Using NIC Teaming and QoS

NIC teaming and QoS are compatible features of Windows Server 2012; classification, tagging, and PFC all work seamlessly. However, there are some things to watch out for.

Minimum-bandwidth policies for virtual NICs passing through a Hyper-V Port NIC team can be complicated. Remember that virtual NICs are hashed to specific team members (physical NICs) in the team, and they are limited to that team member until a failover is required. The weights of those virtual NICs are used to divide up the bandwidth of a team member. They are not used to divide up the bandwidth of the entire team.

You cannot assign more than the bandwidth of a single team member by using absolute or bps-based rules.

How you configure your NIC team can also affect how DCB behaves, because traffic may not be distributed evenly between the various team members. Review the two modes and two load-distribution methods of NIC teaming from earlier in this chapter to see how this might impact your design.

Understanding the Microsoft Best Practice Design

There is no published best practice on designing QoS from Microsoft. The reason is simple; there are just too many variables in the design process that engineers and consultants will have to work through. Our advice is as follows:

  • Use the decision process illustrated in Figure 4-18 earlier to decide which type of QoS you need to implement.
  • Determine how much bandwidth each virtual NIC or protocol requires. One way to do that is to imagine that you could install a NIC of any speed for each function but, for budget reasons, always had to choose the smallest one that would do the job, remembering that QoS lets traffic temporarily burst beyond those limits.
  • Use the right NIC-team design for your network environment and workload. Ideally, you will be able to define the switch architecture to give you the best results.
  • Do not enable DCB and the OS packet scheduler QoS rules on the same NICs or networking stack. They are not designed to work together.

Converged Fabrics

We have presented many networking technologies that are new in Windows Server 2012. At times, you might have wondered whether each technology is important. They are, because they led you to this part of the chapter, and possibly one of the most important parts of the book. This is where you learn how to design converged fabrics.

Understanding Converged Fabrics

Fabric is a term that refers to a network. Earlier in the chapter, when we talked about a world without QoS, we listed all the network connections required for building a clustered Hyper-V host. Each one of those connections is referred to as a fabric. We questioned why we needed so many networks. In past versions of Windows Server and Hyper-V, we used physical isolation of those fabrics to guarantee a minimum level of service. However, we now have QoS in Windows Server 2012, which enables us to set a minimum-bandwidth reservation for virtual NICs or for protocols. And this makes it possible for us to converge each fabric into fewer NICs, possibly even as few as one.

The benefits should be immediately obvious. You are going to have fewer NICs. That means less cabling, fewer switch ports, less hardware being purchased, smaller support contracts, and the electricity bill being further reduced. Converged fabrics give us design flexibility too. We can use a NIC or aggregation of NICs (NIC team) and divide it logically to suit the needs of the infrastructure or business.

There are some less obvious benefits too. At this point, you have seen how to script the creation and configuration of all the networking components of Hyper-V. The entire construction of a converged fabric can be scripted. Write it once and reuse it many times; this is going to have huge time-saving and consistency-making rewards, whether you are an operator deploying hundreds of Hyper-V hosts in a public cloud or a field engineer deploying one host at a time for small-business customers.

Another less obvious benefit is that you can make the most of your investments in high-speed networking. Many organizations have started to implement 10 GbE networking. Many others would like to, but there is no way that they could pay to install that many 10 GbE NICs or pay for underutilized switches. Converged fabrics are about making full, but well-controlled, use of bandwidth. An investment in 10 GbE (or faster) networking with converged fabrics will allow more than just live migration to be faster. Other fabrics can make use of that high-capacity bandwidth when live migration doesn’t need it, balanced by administrator-defined QoS policies.

Having too many eggs in one basket is always bad. NIC teaming comes to the rescue: we can team two of those high-speed NICs to not only achieve failover, but also take advantage of link aggregation for better sharing of bandwidth.

How much convergence is too much? Or how much convergence is the right amount? Quite honestly, we have to give the classic consultant answer of “it depends” for those questions. You should then look at various factors such as actual bandwidth requirements for each fabric vs. what each NIC or NIC team can provide. We should also consider the hardware enhancements or offloads that we plan to use. For example, you learned earlier in this chapter that RSS and DVMQ can offer greatly improved network performance, but they cannot be enabled on the same NICs.

The days of the traditional physical isolation for every fabric are truly over. Microsoft has given us the tools in Windows Server 2012 to create more-elegant, more-economic, more-flexible, and more-powerful designs.

Designing Converged Fabrics

The beauty of converged fabrics is that there is no one right answer for everyone; the design process is a matter of balancing desired convergence with compatibility and with bandwidth requirements. This section presents designs that we can expect to be commonly deployed over the coming years.

There is no feature in Windows Server 2012 for converged fabrics. There is no control in Server Manager, no option in Network Connections, and no specific PowerShell module. If you have read this chapter from end to end, taking the time to understand each section, you already know what the components are and how to implement converged fabrics. All that remains is to know how to design them.

Using Management OS Virtual NICs

There is a reason that we made a point about virtual NICs in the management OS. You can create a design in which all of the host’s fabrics pass through the virtual switch, and on to the NIC or NIC team to reach the physical network, as you can see in Figure 4-19. Pairing this with QoS policies for the virtual NICs will guarantee each virtual NIC a certain amount of bandwidth.

Figure 4-19 Using virtual NICs to converge fabrics


Each management OS virtual NIC is connected to a port on the virtual switch. And each management OS virtual NIC will appear as a connection, with its own protocol configurations, such as IPv4 and/or IPv6 settings. You can even isolate these virtual NICs on the physical network by assigning VLAN IDs to them.

A QoS policy will be created to assign each management OS virtual NIC a minimum amount of bandwidth. We don’t need to use a protocol approach because we’re using each virtual NIC for a specific function.

Although you can assign QoS policies to the virtual machine virtual NICs, we’re going to try to avoid this unless absolutely necessary. It can become messy. Here’s an example why:

1. You assign a weight of 1 to many virtual machines.
2. You use functionality such as Power Optimization or Dynamic Optimization in System Center 2012 Virtual Machine Manager to automatically move virtual machines around a Hyper-V cluster using Live Migration.
3. At night, many idle virtual machines might be squeezed down to a few hosts in a cluster so that idle hosts can be powered down (to reduce power bills).
4. You find yourself assigning a sum weight of well over 100 to all the virtual NICs on the remaining powered-up hosts.

Who really wants to keep track of all those virtual NICs, especially when you might have up to 8,000 virtual machines in a single cloud? Experienced engineers like to manage by exception. We can take this approach with QoS:

1. Assign minimum bandwidth policies to the management OS virtual NICs.
2. Assign a default QoS policy to the virtual switch to reserve bandwidth for all virtual NICs that do not have an explicit QoS policy.

Here is the PowerShell to create a virtual switch and the management OS virtual NICs, assign VLAN IDs to the new virtual NICs, and apply the required QoS policies:

write-host "Creating external virtual switch, with no Management OS virtual `
NIC, with weight-based QoS”
#You could substitute the name of a NIC team for the name of the physical `
(NIC Ethernet 2)
New-VMSwitch "ConvergedNetSwitch" -MinimumBandwidthMode weight `
-NetAdapterName “Ethernet 2” -AllowManagementOS 0
write-host "Setting default QoS policy"
Set-VMSwitch "ConvergedNetSwitch" -DefaultFlowMinimumBandwidthWeight 40
write-host "Creating virtual NICs for the Management OS"
Add-VMNetworkAdapter -ManagementOS -Name "ManagementOS" -SwitchName `
“ConvergedNetSwitch”
Set-VMNetworkAdapter -ManagementOS -Name "ManagementOS" `
-MinimumBandwidthWeight 20
Set-VMNetworkAdapterVlan –ManagementOS –VMNetworkAdapterName `
“ManagementOS” –Access -VlanId 101
Add-VMNetworkAdapter -ManagementOS -Name "Cluster" -SwitchName `
“ConvergedNetSwitch”
Set-VMNetworkAdapter -ManagementOS -Name "Cluster" -MinimumBandwidthWeight 10
Set-VMNetworkAdapterVlan –ManagementOS –VMNetworkAdapterName "ManagementOS" `
–Access -VlanId 102
Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName `
“ConvergedNetSwitch”
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" `
-MinimumBandwidthWeight 30
Set-VMNetworkAdapterVlan –ManagementOS –VMNetworkAdapterName `
“LiveMigration” –Access -VlanId 103

In a matter of seconds, you have configured a simple converged fabric for a clustered Hyper-V host that uses a type of storage that is not based on iSCSI.

We should do some mathematics before moving forward. A total weight of 100 has been allocated. The percentage of bandwidth allocated to virtual NICs is calculated as follows:

Bandwidth percentage = (allocated weight / sum of weights) × 100

Keep that in mind in case you allocate a total weight of 80 (such as 30, 10, 20, and 20) and you find percentages that look weird when you run the following cmdlet:

Get-VMNetworkAdapter -ManagementOS | FL Name, BandwidthPercentage
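For example, here is a minimal sketch of that arithmetic with a hypothetical total weight of 80 (the names and the 30/10/20/20 split are assumptions):

# Sum of the weights is 80, so a weight of 30 yields 30/80 = 37.5 percent
$weights = @{ ManagementOS = 30; Cluster = 10; LiveMigration = 20; DefaultFlow = 20 }
$sum = ($weights.Values | Measure-Object -Sum).Sum
$weights.GetEnumerator() | ForEach-Object {
    "{0}: {1:P1}" -f $_.Key, ($_.Value / $sum)
}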

We could take the preceding script one step further by appending the following scripting that configures the IPv4 addresses of the host’s new virtual NICs:

write-host "Waiting 30 seconds for virtual devices to initialize"
Start-Sleep -s 30
write-host "Configuring IPv4 addresses for the Management OS virtual NICs"
New-NetIPAddress -InterfaceAlias "vEthernet (ManagementOS)" -IPAddress `
10.0.1.31 -PrefixLength 24
Set-DnsClientServerAddress -InterfaceAlias "vEthernet `
(ManagementOS)” -ServerAddresses “10.0.1.11”
New-NetIPAddress -InterfaceAlias "vEthernet (Cluster)" -IPAddress `
192.168.1.31 -PrefixLength “24”
New-NetIPAddress -InterfaceAlias "vEthernet (LiveMigration)" `
-IPAddress 192.168.2.31 -PrefixLength “24”

It doesn’t take too much imagination to see how quickly host networking could be configured with such a script. All you have to do is change the last octet of each IP address, and you can run it in the next host that you need to ready for production.
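A minimal sketch of that reuse (the $HostNumber parameter and the addressing scheme are assumptions based on the previous script):

# Reusable host-numbering sketch: pass a different host number for each new host
param([int]$HostNumber = 31)
New-NetIPAddress -InterfaceAlias "vEthernet (ManagementOS)" `
    -IPAddress "10.0.1.$HostNumber" -PrefixLength 24
New-NetIPAddress -InterfaceAlias "vEthernet (Cluster)" `
    -IPAddress "192.168.1.$HostNumber" -PrefixLength 24
New-NetIPAddress -InterfaceAlias "vEthernet (LiveMigration)" `
    -IPAddress "192.168.2.$HostNumber" -PrefixLength 24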

A Converged Fabrics Design Process

With so many possible options, you might want to know where to start and how to make decisions. Here is a suggested process:

1. Start with a single NIC in your host design.
2. Decide which kind of storage you want to use. Storage such as SAS and Fibre Channel will have no impact, because they use host bus adapters (HBAs) instead of NICs. Will you need to enable RSS? Will you use RDMA for high-bandwidth SMB 3.0, thus requiring end-to-end support for DCB on the network?
3. Do you want to enable RSS and/or DVMQ? Remember, they cannot share a NIC.
4. How much bandwidth will each connection require? For example, you need at least 1 GbE for live migration in a small-to-medium implementation. Will 1 GbE NICs be enough, or will you need 10 GbE NICs, or faster for SMB storage?
5. Now consider NIC teaming, to add link aggregation and failover.
6. Ensure that whatever approach you are considering for the storage connections will be supported by the storage manufacturer.

A whiteboard, a touch-enabled PC with a stylus, or even just a pen and paper will be tools you need now. We’re going to present a few possible converged fabric designs and discuss each one:

Virtualized NICs with NIC Teaming The design in Figure 4-19 can be modified slightly to use a NIC team for the external virtual switch, as shown in Figure 4-20. An additional management OS virtual NIC has been added to isolate the host’s backup traffic onto a specific VLAN.

Figure 4-20 Converged fabric using virtual NICs and NIC team

The NIC team will typically be created using Hyper-V Port load distribution because there are many more virtual NICs (including the virtual machines) than there are team members.
The physical makeup of the team members will depend on your requirements and your budget. You could make this solution by using four onboard 1 GbE NICs. But you do have to account for NIC failure in a NIC team; would the bandwidth of three NICs have been enough if you were using a traditional physical NIC fabric design? Maybe this design would be good for a small implementation. Maybe a medium one would do better with an additional quad-port NIC card being added to the team, giving it a total of 8×1 GbE NICs. That’s a lot of NICs, and at that point, you might be economically better off putting in 2×10 GbE NICs. Very large hosts (Hyper-V can handle 4 TB RAM in a host) probably want to go bigger than that again.
You could enable DVMQ in this design to enhance the performance of the virtual NICs.
This design is perfect for simple host designs that don’t want to leverage too many hardware enhancements or offloads.
Using SMB Storage with SMB Direct and SMB Multichannel This design (Figure 4-21) is a variation of the previous one, but it adds support for storing virtual machines on a Windows Server 2012 file share and using RSS (SMB Multichannel) and RDMA (SMB Direct).
An additional pair of RDMA-enabled NICs (iWARP, RoCE, or InfiniBand) are installed in the host. The NICs are not NIC teamed; there is no point, because RDMA will bypass the teaming and SMB Multichannel will happily use both NICs simultaneously. The RDMA NICs are placed into the same network as the file server that will store the virtual machines. We do not need to enable QoS for the NICs because the only traffic that should be passing through them is SMB.
You can enable DVMQ on the NICs in the NIC team to improve the performance of virtual-machine networking. SMB Multichannel is going to detect two available paths to the file server and therefore use both storage NICs at the same time. Enabling RSS on those NICs, thanks to their physical isolation, will enable SMB Multichannel to create multiple streams of traffic on each of the two NICs, therefore making full use of the bandwidth.

Figure 4-21 Dedicated NICs for SMB 3.0 storage

This design can provide a hugely scalable host, depending on the bandwidth provided. SMB 3.0 over InfiniBand has been demonstrated to provide a throughput of around 16 gigabytes (not gigabits) per second with a penalty of just 5 percent CPU utilization (Microsoft TechEd 2012). That is true enterprise storage scalability!
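If you want a starting point for the offload configuration in this design, a minimal sketch might look like the following (the adapter names are hypothetical, and RSS/DVMQ support depends on your NICs):

# Enable RSS on the dedicated, unteamed storage NICs for SMB Multichannel
Enable-NetAdapterRss -Name "Storage 1", "Storage 2"
# Enable VMQ (DVMQ) on the team members that carry the virtual switch traffic
Enable-NetAdapterVmq -Name "Team NIC 1", "Team NIC 2"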
Converged iSCSI Connections Both of the iSCSI connections will be created as management OS virtual NICs. Each virtual NIC will be in a different VLAN, therefore meeting the requirement that each of them be in a unique IP range. You can see this design in Figure 4-22. As usual with this style of design, QoS policies will be created for the virtual NICs. There are three important points to remember with this design.
First, you should reserve a minimum amount of bandwidth for each iSCSI virtual NIC that would match what you would use for a physical NIC implementation. Do not starve the storage path.
Second, you should verify that this design will not cause a support issue with your storage manufacturer. The manufacturers often require dedicated switches for iSCSI networking to guarantee maximum storage performance, and this goes against the basic concept of converged fabrics.
Finally, MPIO should be used as usual for the iSCSI connections. If you are using a storage manufacturer’s device-specific module (DSM), make sure that this design won’t cause a problem for it.

Figure 4-22 Converging the iSCSI connections

This design is a budget one. Ideally, you will use physical NICs for the iSCSI connections. This is because using nonconverged NICs will allow you to have dedicated switches for the iSCSI fabric, will provide better storage performance, and will have support from those storage manufacturers that require dedicated iSCSI switches.
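If you do converge iSCSI as described here, a minimal sketch of the extra virtual NICs might look like this (the VLAN IDs and weights are hypothetical, and the weights of the other virtual NICs would need to be rebalanced so that the total stays at or below 100):

# Two iSCSI virtual NICs in the management OS, one per iSCSI VLAN
Add-VMNetworkAdapter -ManagementOS -Name "iSCSI1" -SwitchName "ConvergedNetSwitch"
Set-VMNetworkAdapter -ManagementOS -Name "iSCSI1" -MinimumBandwidthWeight 10
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "iSCSI1" `
    -Access -VlanId 201
Add-VMNetworkAdapter -ManagementOS -Name "iSCSI2" -SwitchName "ConvergedNetSwitch"
Set-VMNetworkAdapter -ManagementOS -Name "iSCSI2" -MinimumBandwidthWeight 10
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "iSCSI2" `
    -Access -VlanId 202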
Physically Isolated iSCSI The supportability of this converged fabric design is maximized by implementing iSCSI using a pair of traditional physical NICs, as shown in Figure 4-23. No QoS needs to be applied to the iSCSI NICs because they are physically isolated.
Enabling SR-IOV SR-IOV-enabled NICs cannot be used in a converged fabric. In this case (Figure 4-24), we are going to abandon the virtual NIC approach for the management OS fabrics.
A pair of SR-IOV-capable NICs are installed in the host. One SR-IOV external virtual switch is created for each of the two NICs. In this example, Virtual Machine 1 has no network-path fault tolerance because it has just a single virtual NIC. Virtual Machine 2 does have network-path fault tolerance. It is connected to both virtual switches via two virtual NICs, and a NIC team is created in the guest OS.

Figure 4-23 Using physically isolated iSCSI

All of the management OS functions are passing through a single team interface on a NIC team, configured with address-hashing load distribution. This algorithm will enable communications to use the bandwidth of the team, within the bounds of QoS. DCB is being used in this example to apply QoS policies on a per-protocol basis, thus negating the need for even virtual isolation. We can further tag the traffic by using DSCP so that network administrators can control traffic at a VLAN level. If DCB NICs were not available, the QoS policies could be enforced by the OS packet scheduler. RSS will also be enabled, if possible, to provide more streams for SMB Multichannel.
The benefit of this design is that it uses fewer VLANs and requires just a single IP address per host for the management OS. Bigger hosts can also be used because the processing of virtual-machine networking is offloaded to SR-IOV rather than being processed by the management OS (virtual switches).
A variation of this design allows you to use RDMA for SMB storage. The NICs in the management OS would not be teamed because RDMA will bypass teaming. DCB would be required for this design, and PFC would be required for RoCE NICs. RSS would be enabled on the management OS NICs for SMB Multichannel.
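As a minimal sketch of the management OS team in this design (the team and NIC names are hypothetical, the switch-independent mode is an assumption, and TransportPorts corresponds to address-hashing load distribution in Windows Server 2012):

# Switch-independent team with address-hash (TransportPorts) load distribution
New-NetLbfoTeam -Name "MgmtTeam" -TeamMembers "Ethernet 3", "Ethernet 4" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm TransportPorts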

Figure 4-24 Nonconvergence of SR-IOV NICs


This is just a small sample of the possible converged fabric designs. Using the suggested process will help you find the right design for each project that you do.

With the converged fabric configured on your hosts, you can join them to a domain, install management agents, or start to build Hyper-V clusters.

Once you go converged fabrics, you’ll never go back. The flexibility that they provide, the abstraction from physical networks, and the ability to deploy a host configuration in seconds from a reusable script will bring you back every single time.
