Chapter 8

Performance Tuning


CERTIFICATION OBJECTIVES

8.01     Host and Guest Resource Allocation

8.02     Optimizing Performance

Images     Two-Minute Drill

Q&A   Self Test


Appropriately distributing compute resources is one of the most important aspects of a virtualized cloud environment. Planning for future growth and the ability to adjust compute resources on demand is one of the many benefits of a virtualized environment. This chapter explains how to configure compute resources on a host computer and a guest virtual machine and how to optimize the performance of a virtualized environment.

CERTIFICATION OBJECTIVE 8.01

Host and Guest Resource Allocation

It is important to allocate the correct resources for hosts and the guest virtual machines that reside on them. This section covers host resource allocation first since a host is needed to operate guests. This is followed by guest virtual machine resource allocation.

Host Resource Allocation

Building a virtualization host requires careful consideration and planning. First, you must identify which resources the host requires and plan how to distribute those resources to a virtual machine. Next, you must plan the configuration of the guest virtual machine that the host computer will serve.

You must attend to the configuration of resources and the licensing of the host in the process of moving to a virtualized environment or virtual cloud environment. This consists of the following:

Images   Compute resources

Images   Quotas and limits

Images   Licensing

Images   Reservations

Images   Resource pools

Compute Resources

Adequate compute resources are key to the successful operation of a virtualization host. Proper planning of the compute resources for the host computer ensures that the host can deliver the performance needed to support the virtualization environment.

Compute resources can best be defined as the resources that are required for the delivery of virtual machines. They are the disk, processor, memory, and networking resources that are shared across pools of virtual machines and underpin their ability to deliver the value of the cloud models, as covered in Chapter 1.

As a host is a physical entity, the compute resources that the host utilizes are naturally physical, too. However, cloud providers may allocate a subset of their available physical resources to cloud consumers to allocate to their own virtual machines. Compute resources are displayed in Figure 8-1.

FIGURE 8-1   Host compute resources: processor, disk, memory, and network


For disk resources, physical rotational disks and solid state hard drives are utilized, as well as their controller cards, disk arrays, host bus adapters, and networked storage transmission media. For network resources, network interface cards (NICs) and physical transmission media such as Ethernet cables are employed. Central processing units (CPUs) are employed for the processor, and physical banks of RAM are used to supply memory.

Quotas and Limits

Because compute resources are limited, cloud providers must protect them and make certain that their customers only have access to the amount that the cloud providers are contracted to provide. Two methods used to deliver no more than the contracted amount of resources are quotas and limits.

Limits are a defined floor or ceiling on the amount of resources that can be used, and quotas are limits that are defined for a system on the total amount of resources that can be utilized. When defining limits on host resources, cloud providers have the option of setting a soft or hard limit. With a soft limit of 100GB on a storage partition, for example, the system will still allow the user to save a file after the partition reaches 100GB, but it will log an alert and notify the user. A hard limit, on the other hand, is the maximum amount of resources that can be utilized. A hard limit of 100GB on a storage partition will not allow anything to be added to that partition once it reaches 100GB, and the system will log an event to note the occurrence and notify the user.
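
To make the difference concrete, here is a minimal sketch using Windows File Server Resource Manager quotas. The module, paths, and sizes are illustrative assumptions, not a particular provider's implementation:

# Sketch: a hard quota and a soft quota of 100GB each (requires the FileServerResourceManager module)
New-FsrmQuota -Path "D:\CustomerA" -Size 100GB -Description "Hard limit: writes fail at 100GB"
New-FsrmQuota -Path "D:\CustomerB" -Size 100GB -SoftLimit -Description "Soft limit: alert only at 100GB"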

The quotas that are typically defined for host systems have to do with allocation of the host compute resources to the host’s guest machines. These quotas are established according to service level agreements (SLAs) that are created between the cloud provider and cloud consumers to indicate a specific level of capacity.

Capacity management is essentially the practice of allocating the correct amount of resources in order to deliver a business service. The resources that these quotas enforce limits upon may be physical disks, disk arrays, host bus adapters, RAM chips, physical processors, and network adapters. They are allocated from the total pool of resources available to individual guests based on their SLA.

Quotas and limits on hosts can be compared to speed limits on the highway; very often there are both minimum and maximum speeds defined for all traffic on the roads. A quota can be defined as the maximum speed and a limit can be defined as the minimum speed for all vehicles using that road’s resources.

Licensing

After designing the host computer’s resources and storage limits, an organization or cloud provider needs to identify which vendor it is going to use for its virtualization software. Each virtualization software vendor has its own way of licensing products. Some vendors offer a free version of their product and require a license only for advanced feature sets that enable functionality such as high availability, performance optimization, and systems management. Others offer a completely free virtualization platform but may not include some of the more advanced features at all.

Choosing the virtualization platform is a critical step, and licensing is a factor in that decision. Before deploying a virtualization host and choosing a virtualization vendor, the organization must be sure to read the license agreements and determine exactly which features it needs and how those features are licensed. In addition to licensing the virtualization host, the guest requires a software license as well.

Reservations

Reservations work similarly to quotas. Whereas quotas are designed to ensure the correct capacity gets delivered to customers by defining an upper limit for resource usage, reservations are designed to operate at the other end of the capacity spectrum by ensuring that a lower limit is enforced for the amount of resources guaranteed to a cloud consumer for their guest virtual machine or machines.

The importance of a reservation for host resources is that it ensures certain virtual machines always have a defined baseline level of resources available to them regardless of the demands placed on them by other virtual machines. The reason these guest reservations are so important is that they enable cloud service providers to deliver against their SLAs.

Resource Pools

Resource pools are slices or portions of overall compute resources on the host or those allocated from the cloud provider to consumers. These pools include CPU, memory, and storage and they can be provided from a single host or a cluster of hosts. Resources can be partitioned off in resource pools to provide different levels of resources to specific groups or organizations, and they can be nested within a hierarchy for organizational alignment.

Resource pools provide a flexible mechanism with which to organize the sum total of the compute resources in a virtual environment and link them back to their underlying physical resources.

Guest Resource Allocation

Before creating a guest virtual machine, an organization needs to consider several factors. A guest virtual machine should be configured based on the intended application or task that the guest is going to support. For example, a guest running a database server may require special performance considerations, such as more CPUs or memory based on the designated role of the machine and the system load. In addition to CPUs and memory, a guest may require higher-priority access to certain storage or disk types.

An organization must consider not only the role of the virtual machine, the load of the machine, and the number of clients it is intended to support, but must also perform ongoing monitoring and assessment based on these factors. The amount of disk space the guest is using should be monitored and considered when deploying and maintaining storage.

The allocation of resources to virtual machines must be attended to in the process of moving to a virtualized environment or virtual cloud environment because the organization will either be allocating these resources from its available host resources or paying for them from a cloud provider. Organizations should evaluate each of the following resources:

Images   Compute resources

Images   Quotas and limits

Images   Licensing

Images   Physical resource redirection

Images   Resource pools

Images   Dynamic resource allocation

Compute Resources

The compute resources for virtual machines enable service delivery in the same way that compute resources for hosts do. However, the resources themselves are different in that they are virtualized components instead of physical components that can be held in your hand or plugged into a motherboard.

Guest compute resources are still made up of disk, network, processor, and memory components, but these components are made available to virtual machines not as physical resources, but as abstractions of physical components presented by a hypervisor that emulates those physical resources for the virtual machine.

Physical hosts have a Basic Input/Output System (BIOS) that presents physical compute resources to a host so they can be utilized to provide computing services, such as running an operating system and its component software applications. With virtual machines, the BIOS is emulated by the hypervisor to provide the same functions. When the BIOS is emulated and these physical resources are abstracted, administrators have the ability to divide the virtual compute resources from their physical providers and distribute those subdivided resources across multiple virtual machines. This ability to subdivide physical resources is one of the key elements that make cloud computing and virtualization so powerful.

When splitting resources among multiple virtual machines, there are vendor-specific algorithms that help the hypervisor make decisions about which resources are available for each request from its specific virtual machine. There are requirements of the host resources for performing these activities, including small amounts of processor, memory, and disk. These resources are utilized by the hypervisor for carrying out the algorithmic calculations to determine which resources will be granted to which virtual machines.

Quotas and Limits

As with host resources, virtual machines utilize quotas and limits to constrain the ability of users to consume compute resources and thereby prevent users from either monopolizing or completely depleting those resources. Quotas can be defined either as hard or soft. Hard quotas set limits that users and applications are barred from exceeding. If an attempt to use resources beyond the set limit is registered, the request is rejected, and an alert is logged that can be acted upon by a user, administrator, or management system. The difference with a soft quota is that the request is granted instead of rejected, and the resources are made available to service the request. The same alert, however, is still logged so that action can be taken to either address the issue with the requester for noncompliance with the quota or charge the appropriate party for the extra usage of the materials.

Licensing

Managing hardware resources can be less of a challenge than managing license agreements. Successfully managing software license agreements in a virtual environment is a tricky proposition. The software application must support licensing a virtual instance of the application.

Some software vendors still require the use of a dongle or a hardware key when licensing their software. Others have adapted their licensing agreements to coexist with a virtual environment. A guest requires a license to operate just as a physical server does. Some vendors have moved to a per-CPU-core type of license agreement to adapt to virtualization. Whether the application is installed on a physical server or a virtual server, it still requires a license.

Organizations have invested heavily in software licenses. Moving to the cloud does not always mean that those licenses are lost. Bring Your Own License (BYOL), for example, is a feature for Azure migrations that allows existing supported licenses to be migrated to Azure so that companies do not need to pay for the licenses twice. Software assurance with license mobility allows for licenses to be brought into other cloud platforms such as Amazon Web Service (AWS) or VMware vCloud.

Soft Quota Cell Phone Pain

A painful example that most people can relate to regarding soft quotas is cell phone minutes usage. With most carriers, if a customer goes over the limit of their allotted cell phone minutes on their plan, they are charged an additional nominal amount per minute over. They will receive a warning when they go over the limit if their account is configured for such alerts, or they will receive an alert in the form of their bill that lets them know just how many minutes over quota they have gone and what they owe because of their overage. They are not, however, restricted from using more minutes once they have gone over their quota. If their cell phone minutes were configured as a hard quota, customers would be cut off in the middle of a phone call as soon as they eclipsed their quota. This use of soft quotas has served the phone companies well, and providers can apply the same model to many other cloud services.

Physical Resource Redirection

There are so many things that virtual machines can do that sometimes we forget that they even exist on physical hardware. However, there are occasions when you will need a guest to interface with physical hardware components. Some physical hardware components that are often mapped to virtual machines include USB drives, parallel ports, serial ports, and USB ports.

In some cases, you may want to utilize USB storage exclusively for a virtual machine. You can add a USB drive to a virtual machine by first adding a USB controller. When a USB drive is attached to a host computer, the host will typically mount that drive automatically. However, only one device can access the drive at a single time without corrupting the data, so the host must release access to the drive before it can be mapped to a virtual machine. Unmount the drive from the host and then you will be ready to assign the drive to the virtual machine.

Parallel and serial ports are interfaces that allow for the connection of peripherals to computers. There are times when it is useful to have a virtual machine connect its virtual serial port to a physical serial port on the host computer. For example, a user might want to install an external modem or a handheld device on the virtual machine, which would require the guest to use a physical serial port on the host computer. It can also be useful to connect a virtual serial port to a file on the host computer and have the guest virtual machine send its output to that file, for example, to capture data from a program running on the guest and transfer that information to the host computer.

In addition to using a virtual serial port, it is also helpful in certain instances to connect to a virtual parallel port. Parallel ports are used for a variety of devices, including printers, scanners, and dongles. Much like the virtual serial port, a virtual parallel port allows for connecting the guest to a physical parallel port on the host computer.

In addition to supporting serial and parallel port emulation for virtual machines, some virtualization vendors support USB device pass-through from a host computer to a virtual machine. USB pass-through allows a USB device plugged directly into a host computer to be passed through to a virtual machine. USB pass-through allows for multiple USB devices such as security tokens, software dongles, temperature sensors, or webcams that are physically attached to a host computer to be added to a virtual machine.

The process of adding a USB device to the virtual machine usually consists of adding a USB controller to the virtual machine, removing the device from the host configuration, and then assigning the USB device to the virtual machine. When a USB device is attached to a host computer, that device is available only to the virtual machines that are running on that host computer and only to one virtual machine at a time.

Resource Pools

A resource pool is a hierarchical abstraction of compute resources that can give relative importance, or weight, to a defined set of virtualized resources. Pools at the higher level in the hierarchy are called parent pools; these parents can contain either child pools or individual virtual machines. Each pool can have a defined weight assigned to it based on either the business rules of the organization or the SLAs of a customer.

Resource pools also allow administrators to define a flexible hierarchy that can be adapted at each pool level as required by the business. This hierarchical structure makes it possible to maintain access control and delegation of the administration of each pool and its resources; to ensure isolation between the pools, as well as sharing within the pools; and finally to separate the compute resources from discrete host hardware. This last feature frees administrators from the typical constraints of managing the available resources from the host they originated from. Those resources are bubbled up to a higher level for management and administration when utilizing pools.

Dynamic Resource Allocation

Just because administrators can manage their compute resources at a higher level with resource pools, it does not mean they want to spend their precious time doing it. Enter dynamic resource allocation. Instead of relying on administrators to evaluate resource utilization and apply changes to the environment that result in the best performance, availability, and capacity arrangements, a computer can do it for them based on business logic that has been predefined by either the management software’s default values or the administrator’s modification to those values.

Management platforms can manage compute resources not only for performance, availability, and capacity reasons but also to realize more cost-effective implementation of those resources in a data center, employing only the hosts required at the given time and shutting down any resources that are not needed. By employing dynamic resource allocation, providers can both reduce power costs and go greener by shrinking their power footprint and waste.
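
In a VMware environment, for example, dynamic resource allocation is provided by the Distributed Resource Scheduler (DRS), and enabling it can itself be scripted. A minimal PowerCLI sketch follows, in which the vCenter and cluster names are placeholders:

# Sketch: enable fully automated DRS on a cluster
Connect-VIServer -Server vcenter.example.com
Get-Cluster -Name "ProductionCluster" |
    Set-Cluster -DrsEnabled:$true -DrsAutomationLevel FullyAutomated -Confirm:$false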

CERTIFICATION OBJECTIVE 8.02

Optimizing Performance

Utilization of the allocation mechanisms we have talked about thus far in this chapter allows administrators to achieve the configuration states that they seek within their environment. The next step is to begin optimizing performance. Optimization includes the following:

Images   Configuration best practices

Images   Common issues

Images   Scalability

Images   Performance concepts

Images   Performance automation

Configuration Best Practices

There are some best practices for the configuration of each of the compute resources within a cloud environment. The best practices for these configurations are the focus for the remainder of this section. These best practices center on those allocation mechanisms that allow for the greatest value to be realized by service providers. To best understand their use cases and potential impact, we investigate common configuration options for memory, processor, and disk.

Memory

Memory may be the most critical of all compute resources, as it is usually the limiting factor on the number of guests that can run on a given host, and performance issues appear when too many guests are fighting for enough memory to perform their functions. Two configuration options available for addressing shared memory concerns are memory ballooning and swap disk space.

Memory Ballooning   Hypervisor vendors provide tool sets that install device drivers into the guest operating system. Part of this installed tool set is a balloon driver, which can be observed inside the guest. The balloon driver communicates with the hypervisor to reclaim memory inside the guest when that memory is no longer valuable to the guest operating system. If the host begins to run low on memory, it inflates the balloon driver to reclaim memory from the guest. This reduces the chance that the physical host will start to utilize virtualized memory from a defined paging file on its available disk resource, which causes performance degradation. An illustration of the way this ballooning works can be found in Figure 8-2.

FIGURE 8-2   How memory ballooning works

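Administrators can observe ballooning in action. The following PowerCLI sketch, in which the vCenter name is a placeholder, reports how much memory the balloon driver has currently reclaimed from each virtual machine:

# Sketch: show ballooned memory per VM, in MB, from the vSphere quick statistics
Connect-VIServer -Server vcenter.example.com
Get-VM | Select-Object Name,
    @{Name = "BalloonedMemoryMB"; Expression = { $_.ExtensionData.Summary.QuickStats.BalloonedMemory }}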

Swap Disk Space   Swap space is disk space that is allocated to service memory requests when the physical memory capacity limit has been reached. When virtualizing and overcommitting memory resources to virtual machines, administrators must make certain to reserve enough swap space for the host to balloon memory in addition to reserving disk space within the guest operating system for it to perform its swap operations.

Virtual machines and the applications that run on them will take a significant performance hit when memory is swapped out to disk. However, you do not need that large of a disk for swap space, so it is a good practice to keep a solid state drive in the host that can be used for swap space if necessary. This will ensure that those pages moved to swap space are transferred to high-speed storage, and it will lessen the impact of memory paging operations.

Processor

CPU time is the amount of time a process or thread spends executing on a processor core. For multiple threads, the CPU time of the threads is additive. The application CPU time is the sum of the CPU time of all the threads that run the application. Wait time is the amount of time that a given thread is ready to be processed but must wait on other factors, such as synchronization waits and I/O waits.

High CPU wait times signal that there are too many requests for a given queue on a core to handle, and performance degradation will occur. While high CPU wait time can be alleviated in some situations by adding processors, these additions sometimes hurt performance as well. Caution must be exercised when adding processors as there is a potential for causing even further performance degradation if the applications using them are not designed to be run on multiple CPUs. Another solution for alleviating CPU wait times is to scale out instead of scaling up, two concepts that we explore in more detail later in this chapter.

CPU Affinity   It is also important to properly configure CPU affinity, also known as processor affinity. CPU affinity is where threads from a specific virtual machine are tied to a specific processor or core, and all subsequent requests from that process or thread are executed by that same processor or core. CPU affinity overrides the built-in processor scheduling mechanisms so that threads are bound to specific processor cores.

Images   Benefits   The primary benefit of CPU affinity is to optimize cache performance. Processor cache is local to that processor, so if operations are executed on another processor, they are unable to take advantage of the cache on the first processor. Furthermore, the same data cannot be kept in more than one processor cache, so when the second processor caches new content, it must first invalidate the cache from the first processor. This can happen when a performance-heavy thread moves from one processor to another, and it can be prevented by assigning the virtual machine thread to a processor so that its cache never moves. This also ensures that the cache that has been created for that processor is utilized more often for that virtual machine thread.

Images   Caveats   Assigning CPU affinity can cause many problems and should be used sparingly. In many cases, the best configuration will be not to configure CPU affinity and to let the hypervisor choose the best processor for the task at hand. This is primarily because CPU affinity does not prevent other virtual machines from using the processor core, but it restricts the configured virtual machine from using other cores; thus, the preferred CPU could be overburdened with other work. Also, where the host would normally assign the virtual machine’s thread to another available CPU, CPU affinity would require the virtual machine to wait until the CPU became available before its thread would be processed.

Test CPU affinity before implementing it in production. You may need to create CPU affinity rules for all other virtual machines to ensure that they do not contend for CPU cores. Document affinity settings so that other administrators will be aware of them when migrating virtual machines or performing other changes to the environment.
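
Affinity can also be set at the operating system level on individual processes. The following sketch is for a Windows guest; the process name and core mask are examples only:

# Sketch: pin a process to cores 0 and 1 by setting its affinity bitmask
$process = Get-Process -Name "sqlservr"     # example process name
$process.ProcessorAffinity = 0x3            # binary 11 = core 0 and core 1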

Disk

Poor disk performance, or poorly designed disk solutions, can have performance ramifications in traditional infrastructures, slowing users down as they wait to read or write data for the server they are accessing. In a cloud model, however, disk performance issues can limit access to all organization resources because multiple virtualized servers in a networked storage environment might be competing for the same storage resources, thereby crippling their entire deployment of virtualized servers or desktops. The following sections describe some typical configurations and measurements that assist in designing a high-performance storage solution. These consist of the following:

Images   Disk performance

Images   Disk tuning

Images   Disk latency

Images   I/O throttling

Images   I/O tuning

Disk Performance   Disk performance can be configured with several different configuration options. Media type can affect performance, and administrators can choose between the most standard types of traditional rotational media or chip-based solid state drives. Solid state drives are much faster than their rotational counterparts as they are not limited by the physical seek arm speed that reads the rotational platters. Solid state drives, while becoming more economical in the last few years, are still much more expensive than rotational media and are not utilized except where only the highest performance standards are required.

The next consideration for disk performance is the speed of the rotational media, should that be the media of choice. Server-class disks start at 7,200 rpm and go up to 15,000 rpm, with seek times for the physical arm reading the platters being considerably lower on the high-end drives. In enterprise configurations, the price per gigabyte is driven primarily by rotational speed and only marginally by capacity. When considering enterprise storage, the adage is that you pay for performance, not space.

Once the media type and speed have been determined, the next consideration is the type of RAID array that the disks are placed in to meet the service needs. Different levels of RAID can be employed based on the deployment purpose. These RAID levels should be evaluated and configured based on the type of I/O and on the need to read, write, or a combination of both.

Disk Tuning   Disk tuning is the activity of analyzing what type of I/O traffic is taking place across the defined disk resources and moving it to the most appropriate set of resources. Virtualization management platforms enable the movement of storage, without interrupting current operations, to other disk resources within their control.

Virtualization management platforms allow either administrators or dynamic resource allocation programs to move applications, storage, databases, and even entire virtual machines among disk arrays with no downtime to make sure that those virtualized entities get the performance they require based on either business rules or SLAs.

Disk Latency   Disk latency is a counter that provides administrators with the best indicator of when a resource is experiencing degradation due to a disk bottleneck and needs to have action taken against it. If high latency counters are experienced, a move to either another disk array with quicker response times or a different configuration, such as higher rotational speeds or a different array configuration, is warranted. Another option is to configure I/O throttling.

I/O Throttling   I/O throttling does not eliminate disk I/O as a bottleneck for performance, but it can alleviate performance problems for specific virtual machines based on a priority assigned by the administrator. I/O throttling defines limits that can be utilized specifically for disk resources allocated to virtual machines to ensure that they are not performance or availability constrained when working in an environment that has more demand than the availability of disk resources.

I/O throttling may be a valuable option when an environment contains both development and production resources. The production I/O can be given a higher priority than the development resources, allowing the production environment to perform better for end users.

Prioritization does not eliminate the bottleneck. Rather, prioritizing production machines just passes the bottleneck on to the development environment, which becomes even further degraded in performance as it waits for all production I/O requests when the disk is overallocated. Administrators can then assign a pecking order so that the most essential components receive the highest priority.

I/O Tuning   When designing systems, administrators need to analyze I/O needs from the top down, determining which resources are necessary to achieve the required performance levels. In order to perform this top-down evaluation, administrators first need to evaluate the application I/O requirements to understand how many reads and writes are required by each transaction and how many transactions take place each second. Once those application requirements are understood, they can build the disk configuration (specifically, which types of media, what array configuration, the number of disks, and the access methods) to support that number.
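
A short worked example of that arithmetic follows. The transaction counts and the RAID 5 write penalty of 4 are illustrative assumptions:

# Sketch: estimate front-end and back-end IOPS from application requirements
$transactionsPerSecond = 200
$readsPerTransaction   = 4
$writesPerTransaction  = 2
$raidWritePenalty      = 4    # RAID 5 turns each front-end write into roughly four back-end I/Os

$frontEndIops = $transactionsPerSecond * ($readsPerTransaction + $writesPerTransaction)
$backEndIops  = ($transactionsPerSecond * $readsPerTransaction) +
                ($transactionsPerSecond * $writesPerTransaction * $raidWritePenalty)

"Front-end IOPS required: $frontEndIops"    # 1200
"Back-end IOPS required:  $backEndIops"     # 2400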

Common Issues

There are some failures that can occur within a cloud environment, and the system must be configured to be tolerant of those failures and provide availability in accordance with the organization’s SLA or other contractual agreements.

Mechanical components in an environment will experience failure at some point. It is just a matter of time. Higher-quality equipment may last longer than cheaper equipment, but it will still break down someday. This is something you should be prepared for.

Failures occur mainly on each of the four primary compute resources: disk, memory, network, and processor. This section examines each of these resources in turn.

Common Disk Issues

Disk-related issues can happen for a variety of reasons, but disks fail more frequently than the other compute resources because they are the only compute resource that has a mechanical component. Due to the moving parts, failure rates are typically quite high. Some common disk failures include

Images   Physical hard disk failures

Images   Controller card failures

Images   Disk corruption

Images   HBA failures

Images   Fabric and network failures

Physical Hard Disk Failures   Physical hard disks frequently fail because they are mechanical, moving devices. In enterprise configurations, they are deployed as components of drive arrays, and single failures do not affect array availability.

Controller Card Failures   Controller cards are the elements that control arrays and their configurations. Like all components, they fail from time to time. Redundant controllers are costly to run in parallel as they require double the amount of drives to become operational, and that capacity is lost because it is never in use until failure. Therefore, an organization should do a return-on-investment analysis to determine the feasibility of making such devices redundant.

Disk Corruption   Disk corruption occurs when the structured data on the disk is no longer accessible. This can happen as a result of malicious acts or programs, skewing of the mechanics of the drive, or even a lack of proper maintenance. Disk corruption is hard to repair, as the full contents of the disks often need to be reindexed or restored from backups. Backups can also be unreliable for these failures if the corruption began before its identification, as the available backup sets may also be corrupted.

Host Bus Adapter Failures   HBA failures, while not as common as physical disk failures, need to be expected and storage solutions need to be designed with them in mind. HBAs have the option of being multipathed, which prevents a loss of availability in the event of a failure.

Fabric and Network Failures   Similar to controller card failure, fabric or network failures can be relatively expensive to design around, as they happen when a storage networking switch or switch port fails. The design principles to protect against such a failure are similar to those for HBAs, as multipathing needs to be in place to make certain all hosts that depend on the fabric or network have access to their disk resources through another channel.

Common Memory Issues

Memory-related issues, while not as common as disk failures, can be just as disruptive. Good system design in cloud environments will take RAM failure into account as a risk and ensure that there is always some RAM available to run mission-critical systems in case of memory failure on one of their hosts. The following are some types of memory failures:

Images   Memory chip failures

Images   Motherboard failures

Images   Swap files that run out of space

Memory Chip Failures   Memory chip failures happen less frequently than physical device failures since memory chips have no moving parts and mechanical wear does not play a part. They will, however, break from time to time and need to be replaced.

Motherboard Failures   Similar to memory chips, motherboards have no moving parts, and because of this, they fail less frequently than mechanical devices. When they do fail, however, virtual machines are unable to operate, as they have no processor, memory, or networking resources that they can access. In this situation, they must be moved immediately to another host or go offline.

Swap Files Out of Space   Swap space failures often occur in conjunction with a disk failure, when disks run out of available space to allocate to swap files for memory overallocation. They do, however, result in out-of-memory errors for virtual machines and hosts alike.

Network Issues

Similar to memory components, network components are relatively reliable because they do not have moving parts. Unlike memory, network resources are highly configurable and prone to errors based on human mistakes during implementation. Some common types of network failures include

Images   Physical NIC failures

Images   Speed or duplex mismatches

Images   Switch failures

Images   Physical transmission media failures

Physical NIC Failures   Network interface cards can fail in a similar fashion to other printed circuit board components like motherboards, controller cards, and memory chips. Because they fail from time to time, redundancy needs to be built into the host through multiple physical NICs and into the virtualization through designing multiple network paths using virtual NICs for the virtual machines.

Speed or Duplex Mismatches   Mismatch failures happen only on physical NICs and switches, as virtual networks negotiate these settings automatically. Speed and duplex mismatches result in dropped packets between the two connected devices and can be identified by large numbers of cyclic redundancy check (CRC) errors on the devices.

Switch Failures   Similar to fabric and network failures, network switch failures are expensive to plan for as they require duplicate hardware and cabling. Switches fail wholesale only a small percentage of the time, but more frequently have individual ports fail. When these individual ports do fail, the resources that are connected to them need to have another path available or their service will be interrupted.

Physical Transmission Media Failures   Cables break from time to time when the wires inside are crimped or cut. This can happen when they are moved, when they are stretched too far, or when they become old and the connector breaks loose from its associated wires. As with other types of network failures, providing multiple paths to the resource using that cable is the way to prevent a failure from interrupting operations.

Physical Processor Issues

Processors fail for one of three main reasons: they get broken while getting installed, they are damaged by voltage spikes, or they are damaged due to overheating from failed or ineffective fans. Damaged processors either take hosts completely off-line or degrade performance based on the damage and the availability of a standby or alternative processor in some models.

Scalability

Most applications will see increases in workloads in their life cycles. For this reason, the systems supporting those applications must be able to scale to meet increased demand. Scalability is the ability of a system or network to manage a growing workload in a proficient manner or its ability to be expanded to accommodate the workload growth. All cloud environments need to be scalable, as one of the chief tenets of cloud computing is elasticity, or the ability to adapt to growing workload quickly.

Scalability can be handled either vertically or horizontally, more commonly referred to as “scaling up” or “scaling out,” respectively.

Vertical Scaling (Scaling Up)

To scale vertically means to add resources to a single node, thereby making that node capable of handling more of a load within itself. This type of scaling is most often seen in virtualization environments where individual hosts add more processors or more memory with the objective of adding more virtual machines to each host.

Horizontal Scaling (Scaling Out)

To scale horizontally, more nodes are added to a configuration instead of increasing the resources for any one node. Horizontal scaling is often used in application farms, where more web servers are added to a farm to handle distributed application delivery better. The third type of scaling, diagonal scaling, is a combination of both, increasing resources for individual nodes and adding more of those nodes to the system. Diagonal scaling allows for the best configuration to be achieved for a quickly growing, elastic solution.


Know the difference between scaling up and scaling out.

Performance Concepts

There are some performance concepts that underlie each of the failure types and the allocation mechanisms discussed in this chapter. As we did with the failure mechanisms, let’s look at each of these according to their associated compute resources.

Disk

The configuration of disk resources is an important part of a well-designed cloud system. Based on the user and application requirements and usage patterns, there are numerous design choices that need to be made to implement a storage system that cost-effectively meets an organization’s needs. Some of the considerations for disk performance include

Images   IOPS

Images   Read and write operations

Images   File system performance

Images   Metadata performance

Images   Caching

IOPS   IOPS, or input/output operations per second, are the standard measurement for disk performance. They are usually gathered as read IOPS, write IOPS, and total IOPS to distinguish between the types of requests that are being received.
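
On a Windows system, these values can be sampled directly from the physical disk performance counters, as in this sketch:

# Sketch: sample read, write, and total IOPS across all physical disks every 5 seconds for one minute
Get-Counter -Counter '\PhysicalDisk(_Total)\Disk Reads/sec',
                     '\PhysicalDisk(_Total)\Disk Writes/sec',
                     '\PhysicalDisk(_Total)\Disk Transfers/sec' -SampleInterval 5 -MaxSamples 12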

Read and Write Operations   As just mentioned, there are two types of operations that can take place: reading and writing. As their names suggest, reads occur when a resource requests data from a disk resource, and writes occur when a resource requests new data be recorded on a disk resource. Based on which type of operation takes place, different configuration options exist both for troubleshooting and performance tuning.

File System Performance   File system performance is debated as a selling point among different technology providers. File systems can be formatted and cataloged differently based on the proprietary technologies of their associated vendors. There is little to do in the configuration of file system performance outside of evaluating the properties of each planned operation in the environment.

Metadata Performance   Metadata performance refers to how quickly files and directories can be created, removed, or checked. Applications exist now that produce millions of files in a single directory and create very deep and wide directory structures, and this rapid growth of items within a file system can have a huge impact on performance. The importance of being able to create, remove, and check items efficiently grows in direct proportion to the number of items in use on the file system.

Caching   To improve performance, hard drives are architected with a mechanism called a disk cache that reduces both read and write times. On a physical hard drive, the disk cache is usually a RAM chip that is built in and holds data that is likely to be accessed again soon. On virtual hard drives, the same caching mechanism can be employed by using a specified portion of a memory resource.

Network

Similar to disk resources, the configuration of network resources is critical. Based on the user and application requirements and usage patterns, numerous design choices need to be made to implement a network that cost-effectively meets an organization’s needs. Some of the considerations for network performance include

Images   Bandwidth

Images   Throughput

Images   Jumbo Frames

Images   Network latency

Images   Hop counts

Images   Quality of service (QoS)

Images   Multipathing

Images   Load balancing

Bandwidth   Bandwidth is the measurement of available or consumed data communication resources on a network. Performance of all networks is dependent on having available bandwidth.

Throughput   Throughput is the amount of data actually transferred between two network resources over a given period of time. Throughput can be substantially increased through the use of bonding or teaming of network adapters, which allows resources to see multiple interfaces as one single interface with aggregated resources.
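
A sketch of adapter teaming on Windows Server follows; the team and adapter names are placeholders:

# Sketch: aggregate two physical NICs into one logical interface
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic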

Jumbo Frames   Jumbo Frames are Ethernet frames with more than 1,500 bytes of payload. These frames can carry up to 9,000 bytes of payload, but depending on the vendor and the environment they are deployed in, there may be some deviation. Jumbo Frames are utilized because they are much less processor intensive to consume than a large number of smaller frames, therefore freeing up expensive processor cycles for more business-related functions.
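
Enabling Jumbo Frames is typically an adapter-level setting. The following sketch is for a Windows host; the adapter name and the exact DisplayValue string vary by NIC driver:

# Sketch: enable jumbo frames (roughly 9,000-byte payloads) on one adapter
Set-NetAdapterAdvancedProperty -Name "Ethernet" -DisplayName "Jumbo Packet" -DisplayValue "9014 Bytes"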

Network Latency   Network latency refers to any performance delays experienced during the processing of any network data. A low-latency network connection is one that experiences small delay times, such as a dedicated T-1, while a high-latency connection frequently suffers from long delays, like DSL or a cable modem.

Hop Counts   A hop count represents the total number of devices a packet passes through to reach its intended network target. The more hops data must pass through to reach its destination, the greater the delay is for the transmission. Network utilities such as ping and traceroute can be used to determine the hop count to an intended destination. These utilities generate packets that include a field reserved for the hop count, typically referred to as the TTL, or time-to-live. Each packet is sent out with a particular time-to-live value, up to a maximum of 255, and each capable device (usually a router) along the path to the target decrements the TTL by one before forwarding the packet. The original IP specification also calls for the TTL to be decremented by one for every second a packet is held in a router's memory. If the TTL is decremented to zero at any point during its transmission, the device discards the packet and sends an ICMP time exceeded message, which includes the IP address of that router or device, back to the originator. This finite TTL prevents packets from endlessly bouncing around the network due to routing errors.
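
A quick way to see the hops from a Windows machine is to trace the route, as in this sketch; the destination is a placeholder:

# Sketch: list each hop between this machine and the destination; the hop count is the number of entries
Test-NetConnection -ComputerName www.example.com -TraceRoute |
    Select-Object -ExpandProperty TraceRoute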

Quality of Service (QoS)   QoS is a set of technologies that can identify the type of data in data packets and divide those packets into specific traffic classes that can be prioritized according to defined service levels. QoS technologies enable administrators to meet their service requirements for a workload or an application by measuring network bandwidth, detecting changing network conditions, and prioritizing the network traffic accordingly. QoS can be targeted at a network interface, toward a given server’s or router’s performance, or regarding specific applications. A network monitoring system is typically deployed as part of a QoS solution to ensure that networks are performing at the desired level.
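
On Windows Server, for example, a QoS policy can classify and mark traffic so that network equipment can prioritize it. The following sketch tags SQL Server traffic with a high-priority DSCP value; the policy name and port are examples:

# Sketch: mark traffic destined for TCP port 1433 with DSCP 46 (expedited forwarding)
New-NetQosPolicy -Name "SQL Traffic" -IPProtocolMatchCondition TCP -IPDstPortMatchCondition 1433 -DSCPAction 46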

Multipathing   Multipathing is the practice of defining and controlling redundant physical paths to I/O devices so that when an active path to a device becomes unavailable, the multipathing configuration can automatically switch to an alternate path to maintain service availability. The capability of performing this operation without intervention from an administrator is known as automatic failover.


It is important to remember that multipathing is almost always an architectural component of redundant solutions.

A prerequisite for taking advantage of multipathing capabilities is to design and configure the multipathed resource with redundant hardware, such as redundant network interfaces or host bus adapters.
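
On a Windows Server host, for instance, multipath support is a feature that must be installed and then told which bus types to claim. A minimal sketch, assuming iSCSI-attached storage:

# Sketch: install MPIO and let it claim iSCSI-attached devices (a reboot is typically required)
Install-WindowsFeature -Name Multipath-IO
Enable-MSDSMAutomaticClaim -BusType iSCSI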

Load Balancing   A load balancer is a networking solution that distributes incoming traffic among multiple servers hosting the same application content. Load balancers improve overall application availability and performance by preventing any application server from becoming a single point of failure.

If deployed alone, however, the load balancer becomes a single point of failure by itself. Therefore, it is always recommended to deploy multiple load balancers in parallel. In addition to improving availability and performance, load balancers add to the security profile of a configuration by the typical usage of network address translation, which obfuscates the IP address of the back-end application servers.

Performance Automation

Various tasks can be performed to improve performance on machines. It is typical for these tasks to be performed at regular intervals to maintain consistent performance levels. However, it can be quite a job to maintain a large number of systems, and organizational IT departments are supporting more devices per person than ever before. They accomplish this through automation. Automation uses scripting, scheduled tasks, and automation tools to do the routine tasks so that IT staff can spend more time solving the real problems and proactively looking for ways to make things better and even more efficient.

PowerShell commands are provided in many examples because these commands can be used with the AWS Command Line Interface (CLI) or the Microsoft Azure Cloud Shell. PowerShell was chosen for its versatility. However, other scripting languages can also be used depending on the platform. Scripts can be combined into tasks using AWS Systems Manager or Microsoft Azure runbooks.

This section discusses different performance-enhancing activities that can be automated to save time and standardize. They include the following:

Images   Archiving logs

Images   Clearing logs

Images   Compressing drives

Images   Scavenging stale DNS entries

Images   Purging orphaned resources

Images   Reclaiming resources

Archiving Logs

Logs can take up a lot of space on servers, but you will want to keep logs around for a long time in case they are needed to investigate a problem or a security issue. For this reason, you might want to archive logs to a logging server and then clear the log from the server.

A wide variety of cloud logging and archiving services are available that can be leveraged instead of setting up a dedicated logging server. Some services include Logentries, OpenStack, Sumo Logic, Syslog, Amazon S3, Amazon CloudWatch, and Papertrail. Cloud backup services can also be used to archive logs. Services such as AWS Glacier can be configured to pull log directories and store them safely on another system so they are not lost. These systems can consolidate logs, then correlate and deduplicate them to save space and gain network intelligence.
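
As a simple illustration, the following sketch exports the Windows System event log to a dated file and copies it to an S3 bucket. It assumes the AWS Tools for PowerShell module is installed, and the bucket name is a placeholder:

# Sketch: export the System event log and upload the archive to S3
New-Item -ItemType Directory -Path "C:\LogArchive" -Force | Out-Null
$archive = "C:\LogArchive\System-$(Get-Date -Format yyyyMMdd).evtx"
wevtutil epl System $archive
Write-S3Object -BucketName "example-log-archive" -File $archive -Key "logs/$(Split-Path $archive -Leaf)"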

Clearing Logs

There is very little reason to clear logs unless you have first archived them to another service or server. The previous section outlined how to archive logs to a local logging server or to cloud services. Ensure that these are configured and that they have been fully tested before clearing logs that could contain valuable data. Logs are there for a reason. They show the activity that took place on a device, and they can be very valuable in retracing the steps of an attacker or in troubleshooting errors. You do not want to be the person who is asked, “How long has this been going on?” and you have to answer, “I don’t know because we cleared the logs last night.”

Here is a PowerShell function, called ClearComputer1-4Logs, that clears the logs from computers 1 through 4. You first provide the function with a list of computers. It then puts together a list of all logs on each computer, goes through each log, and clears it.

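A minimal sketch of such a function, assuming classic Windows event logs and computers that are reachable over the network, might look like this:

# Sketch: clear every event log on each computer in the list
function ClearComputer1-4Logs {
    param([string[]]$ComputerNames = @("Computer1", "Computer2", "Computer3", "Computer4"))

    foreach ($computer in $ComputerNames) {
        # Build the list of logs on this computer, then clear each one
        $logs = Get-EventLog -ComputerName $computer -List
        foreach ($log in $logs) {
            Clear-EventLog -ComputerName $computer -LogName $log.Log
        }
    }
}

ClearComputer1-4Logs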

Compressing Drives

Compressing drives can reduce the amount of space consumed. However, accessing files on the drives will require a bit more CPU power to decompress them before they can be opened. Here is a command you can use to compress an entire drive. You can place this in a Windows group policy to compress the data drives (D:) of various machines depending on how you apply the group policy. The following command specifies that the D: drive and everything below it should be compressed. The –Recurse parameter is what causes the compression to take place on all subfolders.

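A minimal sketch, assuming an NTFS D: volume and using the built-in compact.exe utility:

# Sketch: apply NTFS compression to D:\ and everything below it
compact.exe /C "D:\" | Out-Null
Get-ChildItem -Path "D:\" -Recurse | ForEach-Object {
    compact.exe /C $_.FullName | Out-Null
}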

Scavenging Stale DNS Entries

As mentioned in Chapter 4, DNS distributes the responsibility for both the assignment of domain names and the mapping of those names to IP addresses to the authoritative name servers within each domain. DNS servers register IP address assignments as host records in their database. Sometimes a record is created, and then that host is removed, or it is assigned a new address. The DNS server would retain a bogus record in the former case and redundant addresses in the latter case.

Scavenging is the process of removing DNS entries for hosts that no longer respond on that address. You can configure automatic scavenging on DNS servers. All you have to do is enable the scavenging feature and set the age for when DNS records will be removed. If a host cannot be reached for the specified number of days, its host record in DNS will automatically be deleted.
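
On a Windows DNS server, for example, this can be scripted; the seven-day intervals below are illustrative:

# Sketch: enable scavenging on all zones and age out records not refreshed within 7 days
Set-DnsServerScavenging -ScavengingState $true -ApplyOnAllZones `
    -RefreshInterval 7.00:00:00 -NoRefreshInterval 7.00:00:00 -ScavengingInterval 7.00:00:00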

Purging Orphaned Resources

Applications, hypervisors included, do not always clean up after themselves. Sometimes child objects or resources from deleted or moved objects still remain on systems. These are known as orphaned resources.

In Microsoft System Center Virtual Machine Manager (SCVMM), you can view orphaned resources by opening the Library workspace and clicking Orphaned Resources. You can right-click the object to delete it, but we want to automate the task. A script to remove all orphaned resources from SCVMM would take many pages of this book, so we will point you to a resource where you can obtain an up-to-date script for free:

https://www.altaro.com/hyper-v/free-script-find-orphaned-hyper-v-vm-files/

Orphaned resources show up in the VMware vSphere web client with “(Orphaned)” after their name. You can remove them with a script, either from the command line on the host or from a management workstation.

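A minimal sketch of this cleanup using VMware PowerCLI from a management workstation, in which the vCenter name is a placeholder:

# Sketch: unregister every VM that vSphere reports as orphaned
Connect-VIServer -Server vcenter.example.com
Get-VM |
    Where-Object { $_.ExtensionData.Runtime.ConnectionState -eq "orphaned" } |
    Remove-VM -Confirm:$false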

Reclaiming Resources

Many companies have inactive virtual machines that continue to consume valuable resources while providing no business value. Metrics can identify machines that might not be used, at which point, a standard message can be sent to the owner of the machine notifying them that their machine has been flagged for reclamation unless they confirm that it is still providing business value. Alternatively, you can give the owner the option of keeping or reclaiming the resources themselves rather than automatically doing it. However, if the owner of the virtual machine does not respond in a timely manner, the organization may decide to have the machine reclaimed automatically.

If reclamation is chosen, the machine can be archived and removed from the system and the resources can be freed up for other machines. The automation can be initiated whenever metrics indicate an inactive machine. VMware vRealize has this capability built in for vCenter, and similar automation can be created for other tools. In Microsoft Azure, the Resource Manager can be configured to reclaim resources.
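
As an illustration, the following PowerCLI sketch flags virtual machines whose average CPU usage over the past 30 days falls below 2 percent. The threshold, the time window, and the vCenter name are all assumptions:

# Sketch: list candidate VMs for reclamation based on low 30-day average CPU usage
Connect-VIServer -Server vcenter.example.com
Get-VM | Where-Object {
    $stats = Get-Stat -Entity $_ -Stat "cpu.usage.average" -Start (Get-Date).AddDays(-30)
    ($stats | Measure-Object -Property Value -Average).Average -lt 2
} | Select-Object Name, PowerState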

CERTIFICATION SUMMARY

When building a virtualization host, special consideration needs to be given to adequately planning the resources to ensure that the host is capable of supporting the virtualized environment. Creating a virtual machine requires thorough planning regarding the role the virtual machine will play in the environment and the resources needed for the virtual machine to accomplish that role. Planning carefully for the virtual machine and the primary resources of memory, processor, disk, and network can help prevent common failures.

KEY TERMS

Use the following list to review the key terms discussed in this chapter. The definitions also can be found in the glossary.

bandwidth   A measurement of available or consumed data communication resources on a network.

caching   A mechanism for improving the time it takes to read from or write to a disk resource.

compute resources   The resources that are required for the delivery of virtual machines: disk, processor, memory, and networking.

CPU wait time   The delay that results when the CPU cannot perform computations because it is waiting for I/O operations.

hop count   The total number of devices a packet passes through to reach its intended network target.

horizontal scaling   A scalability methodology whereby more nodes are added to a configuration instead of increasing the resources for any one node. Horizontal scaling is also known as scaling out.

I/O throttling   Defined limits utilized specifically for disk resources assigned to virtual machines to ensure they are not performance or availability constrained when working in an environment that has more demand than availability of disk resources.

input/output operations per second (IOPS)   A common disk performance measurement of how many read and write operations a disk resource can service over a period of time.

Jumbo Frames   Large frames that are used for large data transfers to lessen the burden on processors.

limit   A floor or ceiling on the amount of resources that can be utilized for a given entity.

load balancing   Networking solution that distributes incoming traffic among multiple resources.

memory ballooning   A device driver loaded inside a guest operating system that identifies underutilized memory and allows the host to reclaim memory for redistribution.

metadata performance   A measure of how quickly files and directories can be created, removed, or checked on a disk resource.

multipathing   The practice of defining and controlling redundant physical paths to I/O devices.

network latency   Any delays typically incurred during the processing of any network data.

orphaned resource   A child object or resource from deleted or moved objects that remains on a system.

quality of service (QoS)   A set of technologies that provides the ability to manage network traffic and prioritize workloads to accommodate defined service levels as part of a cost-effective solution.

quota   The total amount of resources that can be utilized for a system.

read operation   Operation in which a resource requests data from a disk resource.

reservation   A mechanism that ensures a lower limit is enforced for the amount of resources guaranteed to an entity.

resource pool   Partition of compute resources from a single host or a cluster of hosts.

scalability   Ability of a system or network to manage a growing workload in a proficient manner or its ability to be expanded to accommodate the workload growth.

scavenging   The process of removing stale DNS records for hosts that no longer respond at their registered addresses.

throughput   The amount of data that can be transferred between two network resources in a given period of time.

vertical scaling   A scalability methodology whereby resources such as additional memory, vCPUs, or faster disks are added to a single node, thereby making that node capable of handling more of a load within itself. Vertical scaling is also known as scaling up.

write operation   Operation in which a resource requests that new data be recorded on a disk resource.

Images TWO-MINUTE DRILL

Host and Guest Resource Allocation

Images  Proper planning of the compute resources for a host computer ensures that the host can deliver the performance needed to support its virtualized environment.

Images  Quotas and limits allow cloud providers to control the amount of resources a cloud consumer can access.

Images  A reservation helps to ensure that a host computer receives a guaranteed amount of resources to support its virtual machines.

Images  Resource pools allow an organization to organize the total compute resources in the virtual environment and link them back to their underlying physical resources.

Images  Guest virtual machines utilize quotas and limits to constrain the ability of users to consume compute resources and can prevent users from either completely depleting or monopolizing those resources.

Images  Software applications and operating systems must support the ability to be licensed in a virtual environment, and the licensing needs to be taken into consideration before a physical server becomes a virtual server.

Images  Guest virtual machines can support the emulation of parallel and serial ports, and some can also support the emulation of USB ports.

Images  Dynamic resource allocation can be used to automatically assign compute resources to a guest virtual machine based on utilization.

Optimizing Performance

Images  There are a number of best practices for configuration of compute resources within a cloud environment. Cloud administrators must be able to optimize memory, disk, and processor resources as discussed in this chapter.

Images  There are multiple failures that can occur within a cloud environment, including hard disk failure, controller card failure, disk corruption, HBA failure, network failure, RAM failure, motherboard failure, network switch failure, and processor failure.

Images  Node capacity can be increased to vertically scale (scale up) or additional nodes can be added to horizontally scale (scale out).

Images  Log archiving and clearing, drive compression, DNS scavenging, purging orphaned resources, and reclaiming resources can be automated to improve performance on machines.

Images SELF TEST

The following questions will help you measure your understanding of the material presented in this chapter. As indicated, some questions may have more than one correct answer, so be sure to read all the answer choices carefully.

Host and Guest Resource Allocation

1.   Which of the following would be considered a host compute resource?

A.   Cores

B.   Power supply

C.   Processor

D.   Bandwidth

2.   Quotas are a mechanism for enforcing what?

A.   Limits

B.   Rules

C.   Access restrictions

D.   Virtualization

3.   How are quotas defined?

A.   By management systems

B.   According to service level agreements that are defined between providers and their customers

C.   Through trend analysis and its results

D.   With spreadsheets and reports

4.   When would a reservation be used?

A.   When a maximum amount of resources needs to be allocated to a specific resource

B.   When a minimum amount of capacity needs to be available at all times to a specific resource

C.   When capacity needs to be measured and controlled

D.   When planning a dinner date

5.   How does the hypervisor enable access for virtual machines to the physical hardware resources on a host?

A.   Over Ethernet cables

B.   By using USB 3.0

C.   Through the system bus

D.   By emulating a BIOS that abstracts the hardware

6.   What mechanism allows all requests from a specific thread or process to be handled by the same processor core?

A.   V2V

B.   CPU affinity

C.   V2P

D.   P2V

7.   In a scenario where an entity exceeds its defined quota but is granted access to the resources anyway, what must be in place?

A.   Penalty

B.   Hard quota

C.   Soft quota

D.   Alerts

8.   Which of the following must be licensed when running a virtualized infrastructure?

A.   Hosts

B.   Virtual machines

C.   Both

D.   Neither

9.   What do you need to employ if you have a serial device that needs to be utilized by a virtual machine?

A.   Network isolation

B.   Physical resource redirection

C.   V2V

D.   Storage migration

10.   You need to divide your virtualized environment into groups that can be managed by separate groups of administrators. Which of these tools can you use?

A.   Quotas

B.   CPU affinity

C.   Resource pools

D.   Licensing

Optimizing Performance

11.   Which tool allows guest operating systems to share noncritical memory pages with the host?

A.   CPU affinity

B.   Memory ballooning

C.   Swap file configuration

D.   Network attached storage

12.   Which of these options is not a valid mechanism for improving disk performance?

A.   Replacing rotational media with solid state media

B.   Replacing rotational media with higher-speed rotational media

C.   Decreasing disk quotas

D.   Employing a different configuration for the RAID array

Images SELF TEST ANSWERS

Host and Guest Resource Allocation

1.   Images   C. The four compute resources used in virtualization are disk, memory, processor, and network. On a host, these are available as the physical entities of hard disks, memory chips, processors, and network interface cards (NICs).

Images   A, B, and D are incorrect. Cores are a virtual compute resource. Power supplies, while utilized by hosts, are not compute resources, because they do not contribute resources toward the creation of virtual machines. Bandwidth is a measurement of network throughput capability, not a resource itself.

2.   Images   A. Quotas are limits on the resources that can be utilized for a specific entity on a system. For example, a user could be limited to storing up to 10GB of data on a server, or a virtual machine could be limited to 500GB of bandwidth each month.

Images   B, C, and D are incorrect. Quotas cannot be used to enforce rules or set up virtualization. Access restrictions are security entities, not quantities that can be limited, and virtualization is the abstraction of hardware resources, which has nothing to do with quotas.

3.   Images   B. Quotas are defined according to service level agreements that are negotiated between a provider and its customers.

Images   A, C, and D are incorrect. Management systems and trend analysis provide measurement of levels of capacity, and those levels are reported using spreadsheets and reports, but these are all practices and tools that are used once the quotas have already been negotiated.

4.   Images   B. A reservation should be used when there is a minimum amount of resources that needs to have guaranteed capacity.

Images   A, C, and D are incorrect. Dealing with maximum capacity instead of minimums is the opposite of a reservation. Capacity should always be measured and controlled, but not all measurement and control of capacity deals with reservations. Obviously, if you are planning for a dinner date you will want to make reservations, but that has nothing to do with cloud computing.

5.   Images   D. The hypervisor emulates a BIOS that abstracts the host's physical hardware, providing compute resources to the virtual machine.

Images   A, B, and C are incorrect. These options do not allow a host computer to emulate compute resources and distribute them among virtual machines.

6.   Images   B. CPU affinity allows all requests from a specific thread or process to be handled by the same processor core.

Images   A, C, and D are incorrect. You can use a V2V to copy or restore files and programs from one guest virtual machine to another. V2P allows you to migrate a guest virtual machine to a physical server. P2V allows you to migrate a physical server’s operating system, applications, and data from the physical server to a newly created guest virtual machine on a host computer.

7.   Images   C. Soft quotas enforce limits on resources, but do not restrict access to the requested resources when the quota has been exceeded.

Images   A, B, and D are incorrect. Penalties may be incurred if soft quotas are exceeded, but the quota must first be in place. A hard quota denies access to resources after it has been exceeded. Alerts should be configured, regardless of the quota type, to be triggered when the quota has been breached.

8.   Images   C. Both hosts and guests must be licensed in a virtual environment.

Images   A, B, and D are incorrect. Both hosts and guests must be licensed in a virtual environment.

9.   Images   B. Physical resource redirection enables virtual machines to utilize physical hardware as if they were physical hosts that could connect to the hardware directly.

Images   A, C, and D are incorrect. These options do not allow you to redirect a guest virtual machine to a physical port on a host computer.

10.   Images   C. Resource pools allow the creation of a hierarchy of guest virtual machine groups that can have different administrative privileges assigned to them.

Images   A, B, and D are incorrect. Quotas are employed to limit the capacity of a resource, CPU affinity is used to isolate specific threads or processes to one processor core, and licensing has to do with the acceptable use of software or hardware resources.

Optimizing Performance

11.   Images   B. Memory ballooning allows guest operating systems to share noncritical memory pages with the host.

Images   A, C, and D are incorrect. CPU affinity is used to isolate specific threads or processes to one processor core. Swap file configuration is the configuration of a specific file to emulate memory pages as an overflow for physical RAM. Network attached storage is a disk resource that is accessed across a network.

12.   Images   C. Decreasing disk quotas helps with capacity issues, but not with performance.

Images   A, B, and D are incorrect. Changing from rotational to solid state media increases performance since it eliminates the dependency on the mechanical seek arm to read or write. Upgrading rotational media to higher rotational speed also speeds up both read and write operations. Changing the configuration of the array to a different RAID level can also have a dramatic effect on performance.
