Virtualization is a maturing technology. This chapter provides an introduction to basic virtualization concepts and issues. It begins by defining virtualization and examining the motivations for using it, then turns to system virtualization models.
Technology is developed in response to a need. Virtualization technologies were invented to address gaps in the functionality of existing computer systems. Let’s take a look at the meaning of computer virtualization and consider the various needs that it fulfills.
System virtualization is technology that creates “virtual environments,” which allow multiple applications to run on one computer as if each has its own private computer. Virtualization is achieved by creating a virtual computer or virtual operating system that behaves like a real one, but isn’t. Workload isolation is the primary goal of system virtualization.
Three virtualization models are commonly used. The first model is based on the ability to provide multiple isolated execution environments in one operating system (OS) instance—an approach called operating system virtualization (OSV). In this model, each environment contains what appears to be a private copy of the operating system in a container (a similar technology calls these environments jails). In the second model of virtualization, multiple operating system instances run on one set of hardware resources. This model takes advantage of virtual machines (VMs), which may run the same or different operating systems. Virtual machines are provided by hardware and/or software called a hypervisor, which creates the illusion of a private machine for each “guest” OS instance. In the third model, hardware partitioning ensures the electrical segregation of computer hardware resources—CPUs, RAM, and I/O components—so as to create multiple independent computers within one computer. Each isolated grouping of hardware is called a partition or domain.
Through the course of this book, we will describe different forms of computer virtualization in detail. We will use the phrase “virtual environment” (VE) to refer to any of these three models of virtualization.
The original and most common reason to virtualize is to facilitate server consolidation, although the implementation of compute clouds is emerging as an important alternative type of consolidation. Today’s data center managers face a series of extreme challenges: They must continue to add to compute workloads while minimizing operational costs, which include the electricity to power and cool those computing systems. In data centers with little empty rack space, this requirement necessitates squeezing existing workloads into unused compute capacity on existing systems and increasing the workload density of new systems.
Virtualization also provides a convenient layer of separation. This greatly simplifies the provisioning of a workload into a pool of compute resources, as well as the movement of workloads among compute nodes in a pool. These abilities greatly increase the flexibility and ease of creating compute clouds such as Database as a Service (DBaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).
Newer servers are better able to handle this kind of consolidation than older ones. In many cases, data centers have achieved consolidation ratios of 20:1 or even 100:1, thereby reducing annual costs by hundreds of thousands or millions of dollars.
How can you achieve such savings? Workload consolidation is the process of implementing several computer workloads on one computer. This type of consolidation is not the same as the concept exemplified by installing multiple programs on a desktop computer, where you are actively using only one program at a time. Instead, a server with consolidated workloads will run multiple programs on the same CPU(s) at the same time.
Most computers that are running only one workload are under-utilized, meaning that more hardware resources are available than the workload needs. The result is inefficient use of an organization’s financial resources. Which costs more: five computers to run five workloads, or one computer that can run all five workloads? Of course, it is impossible to purchase exactly the right amount of “computer.” For example, it is not possible to purchase a computer with half of a CPU for a workload that needs only half of the compute power of a single CPU.
Figure 1.1 shows the amount of CPU capacity used by two workloads, each residing on its own system. Approximately 70% of the investment in the first system’s CPU is wasted, and 60% of the second system’s CPU goes unused. In this arrangement, the cost of a CPU—not to mention other components, such as the frame and power supplies—is wasted.
It is possible—and often desirable—to run another workload on the same system instead of purchasing a second computer to handle the second workload. Figure 1.2 shows the CPU consumption for the same two workloads after consolidation onto one system. As you can see, the amount of wasted investment has decreased by an entire CPU. In this system, the OS will spend some compute cycles managing resource usage of the two workloads and reducing the impact that one workload has on the other. This mediation increases CPU utilization, thereby reducing available CPU capacity, but we will ignore this effect for now.
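To make the arithmetic concrete, the following Python sketch reproduces the reasoning behind Figures 1.1 and 1.2. It assumes two one-CPU systems whose workloads average 30% and 40% utilization, matching the percentages quoted above; the numbers are illustrative, not measurements.

    # A minimal sketch of the consolidation arithmetic behind Figures 1.1 and 1.2,
    # assuming one-CPU systems and the average utilizations implied above.
    workloads = {"workload_1": 0.30, "workload_2": 0.40}  # average CPU utilization

    # Separate systems: each system wastes its unused CPU capacity.
    wasted_separate = sum(1.0 - u for u in workloads.values())      # 0.70 + 0.60

    # Consolidated onto one system: average utilizations add.
    wasted_consolidated = 1.0 - sum(workloads.values())             # 1.0 - 0.70

    print(f"wasted capacity on separate systems: {wasted_separate:.2f} CPUs")
    print(f"wasted capacity after consolidation: {wasted_consolidated:.2f} CPUs")
    print(f"savings: {wasted_separate - wasted_consolidated:.0f} CPU")

The one-CPU difference reported by the last line matches the comparison between the two figures.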
Of course, these examples use average values, and real-world computers do not run well at 100% CPU utilization. Given this fact of life, you must avoid being overly aggressive when consolidating workloads.
In the early days of the computer industry, computers were often so expensive that a corporation might be able to own only one of these machines. Given the precious nature of this resource, it was important for an organization to make the most of a computer by keeping it as busy as possible doing useful work.
At first, computers could run only one program at a time. This arrangement was unwieldy when a group of users sought to use the same computer, creating a need for a multiuser operating system. Software engineers designed such operating systems with features that prevented one program from affecting another program in a harmful way and prevented one user from causing harm to the other users’ programs and data. Other features were designed to prevent one program from consuming more system resources—CPU cycles, physical memory (RAM), or network bandwidth—than it should. This consideration was important for batch systems like OS/360 on early mainframes, but it became even more important for influential time-sharing systems such as CTSS and Multics.
A further issue was that computers, prior to the development of virtualization, were able to run only one operating system at a time. This constraint created operational problems and increased costs if different users of a system needed different operating systems for their applications. Some operating systems were good for time-sharing and others were oriented toward batch processing, but applications during that era were inevitably written for a specific operating system or for a specific version. Without virtualization, companies had to purchase additional systems to run the different operating systems, or schedule dedicated time for each group of users needing their own OS version.
For example, an IBM mainframe customer in the 1970s might want to upgrade from an older IBM operating system like OS/360 to the newly released MVS. This change required the purchase of additional, very expensive hardware, and perhaps created the need to schedule system conversion and test times on weekends and late at night. VM/370—one of the first virtualization technologies—became very popular with systems programmers because they were able to run both the old and new operating systems at the same time, during normal working hours. The same programmers found that VM/370 provided a time-sharing environment that was pleasant and efficient compared to the batch-oriented operating systems otherwise in use. Different operating systems with different strengths could run side by side on the same hardware without incurring the capital expense needed to acquire another system.
Later, computer manufacturers developed inexpensive microcomputers, which were more affordable than the relatively expensive minicomputers and mainframes. Unfortunately, these microcomputers and their early operating systems were not well suited to running multiple production applications. This led to a common practice of running one application per computer.
Some virtualization approaches offer little choice of operating systems for guests. Current virtual machine platforms typically support multiple operating system types in their virtual machines, but operating system virtualization generally offers only environments of the same OS type as the host, and hardware partitions are limited to the operating systems that run natively on that hardware.
More recently, progress in computer performance has led to a desirable problem: too much compute capacity. Many servers in data centers run at a very low average CPU utilization rate. Many users would like to put most of the rest of this capacity to work—a feat that can be achieved by consolidating workloads.
Operating systems designed for use by multiple users (e.g., most UNIX derivatives) have a long history of running multiple applications simultaneously. These operating systems include sophisticated features that isolate running programs, preventing them from interfering with one another, and they attempt to give each program its fair share of system resources. Even these systems have limitations, however. For example, an application might assume that only one instance of it will be running on the system, and it might acquire exclusive access to a singular, non-shareable system resource, such as a lock file with a fixed name. The first instance of such an application would lock the file to ensure that it is the only application modifying its data files. A second instance of that application would then attempt to lock the same file, and the attempt would inevitably fail. Put simply, multiple instances of that application cannot coexist unless they can be isolated from each other.
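The lock-file conflict is easy to demonstrate. The short Python sketch below simulates two instances of such an application in one environment; the file name is hypothetical, and POSIX advisory locks stand in for whatever exclusion mechanism a real application uses.

    # A minimal sketch of the fixed-name lock-file conflict, using POSIX
    # advisory locks; the path is hypothetical.
    import fcntl

    LOCK_FILE = "/var/tmp/app.lock"   # the fixed name the application assumes

    first = open(LOCK_FILE, "w")
    fcntl.flock(first, fcntl.LOCK_EX | fcntl.LOCK_NB)       # first instance succeeds

    second = open(LOCK_FILE, "w")
    try:
        fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)  # second instance fails
    except BlockingIOError:
        print("second instance cannot start: lock already held")

In a VE with its own file system namespace, each instance sees a private /var/tmp, so each can hold its own lock without conflict.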
Even if multiple workloads can coexist, other obstacles to consolidation may be present. Corporate security or regulatory rules might dictate that one group of users must not be able to know anything about programs being run by a different group of users. Either a software barrier is needed to prevent undesired observation and interaction, or those two user groups must be restricted to the use of different systems. The different user groups might also require different OS patch levels for their applications, or require the applications to operate with different system availability and maintenance windows. In general, however, UNIX-like operating systems are good platforms for consolidation because they provide user separation and resource management capabilities and scale well on large platforms.
Some other operating systems—particularly those that were originally designed to be single-user systems—cannot be used as a base for consolidated workloads quite so easily. Their architecture can make coexistence of similar workloads impossible and coexistence of different workloads difficult. Modifying a single-user OS so that it can run multiple workloads concurrently can be much more difficult than designing this capability into the system at the beginning. The use of these platforms as single-application servers led to the industry mindset of one application per server, even on systems that can effectively run multiple applications simultaneously.
Another solution is needed: the ability—or apparent ability—to run multiple copies of the operating system concurrently with one workload in each OS, as shown in Figure 1.3. To the hardware, this arrangement does not differ dramatically from previous ones: The two workloads have become slightly more complex, but they are still two workloads.
To achieve the consolidation of multiple workloads onto one computer, software or firmware barriers between the workloads might be used, or entire copies of different operating systems might be run on the same system. The barriers separate virtual environments, which behave like independent computers to various degrees, depending on the virtualization technology. Once virtualization is accomplished, several benefits can be achieved, which fall into two categories:
Cost reductions
Reduced aggregate acquisition costs of computers
Reduced aggregate support costs of computer hardware
Reduced data center space for computers
Reduced need for electricity to power computers and cooling systems
In some cases, reduced support costs for operating systems
In some cases, reduced license and support costs for application software
Nonfinancial benefits
Increased architectural flexibility
Increased business agility due to improved workload mobility
Of course, nothing in life is free: There is a price to pay for these benefits. Some of the drawbacks of consolidation and virtualization are summarized here:
Perceived increase in complexity: One physical computer will have multiple VEs; this is balanced by having fewer computers.
Changes to asset tracking and run books: For example, rebooting a physical computer might require rebooting all of its VEs.
Additional care needed when assigning workloads to computers: The computer and the virtualization technology represent a single point of failure for almost all technologies. It is important to balance availability with consolidation density.
Potential for increased or new costs:
Some virtualization technologies have an upfront cost or recurring charge associated with licensing or support of the software.
A computer that supports virtualization may be more expensive than a single-workload system.
The level of support needed for a computer using virtualization may cost more than the level of support for the least important of the workloads being consolidated; if most of the workloads were running on unsupported systems, support costs might actually increase.
Data center architects and system administrators will need training on the virtualization technologies to be used.
After years of implementing VEs solely to isolate consolidated workloads, some users realized that certain benefits of virtualization can be worth the effort even if only one workload is present on a system. For example, the business agility gained from simple VE mobility can prove highly useful. The ability to move a workload (a process called migration) enables businesses to respond more quickly to changing business needs. For example, you might move a VE to a larger system during the day, instead of planning the acquisition of a new, larger system and the reimplementation of the workload on that system. The VE provides a convenient “basket” of jobs that can be moved from one system to another. Virtual machines are particularly effective at providing this benefit.
Some tools even enable regular migrations to respond to periodic fluctuations in demand. For example, a batch processing workload might have minimal processing needs during the day but perform significant work at night. This workload could be migrated to a small system with other light loads in the early morning and then migrated to a larger, more powerful system in the early evening. This might avoid a lengthy start-up time before each day’s processing.
Because VEs are convenient, manageable objects, other business needs can also be addressed with virtualization. A snapshot (i.e., a complete copy of a VE) can be made before the VE boots, or after its workload is quiesced. If the VE becomes damaged while it runs, whether accidentally or maliciously, the workload can be quickly restored from the snapshot. The data in the damaged copy can then be methodically inspected, both for valid transactions that should be rerun against the workload and as part of a thorough security analysis. Many file systems and storage systems include the ability to copy a storage object very quickly, reducing the effects of this operation on the service being provided.
Another advantage of VEs, even in a nonconsolidated configuration, is realized more fully by some virtualization technologies than by others—namely, security. Some types of VEs can be hardened to prevent users of the VE (even privileged ones) from making changes to the system. Operating system virtualization provides the most opportunities for novel security enhancements.
Virtualization can also help prepare the organization to handle future workloads. If the needs of future workloads are not known, it may be easier to meet those needs on a per-VE basis. For example, hypervisors can host different types of operating systems, so a subsequent workload can use software that is available only on an operating system different from the one used by the first workload.
In summary, consolidation is used to reduce the costs of acquisition and operation, and virtualization is needed to isolate one workload from another workload so that each can operate as if it is the only workload running on the computer. Further, virtualization can be used in some cases for the unique advantages it brings, even on unconsolidated systems.
As computer architects and administrators gained experience with virtualization, they realized that a new model of workload architecture could be achieved. This model has been named cloud computing. As we mentioned earlier, this is a subset of workload consolidation.
According to the U.S. National Institute of Standards and Technology (NIST), cloud computing is a “model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources.” Virtualization addresses the needs of several “essential characteristics”:
On-demand self-service: Many types of VEs boot more quickly than an entire computer would, because hardware self-tests are not necessary.
Resource pooling: Mature virtualization technologies must include the isolation features necessary to ensure orderly pooling of resources.
Rapid elasticity: Many virtualization solutions permit dynamic addition or removal of hardware resources from individual VEs, and all of them simplify and accelerate on-demand start-up of VEs.
Further, NIST defines three service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Virtualization tools can be leveraged to enable each of these models.
Many of the capabilities of VEs can be put into context and further investigated with common use cases.
All consolidated systems require resource management features to prevent one VE from overwhelming a finite resource and thereby preventing other VEs from using it. This possibility is discussed in the use case “General Workload Consolidation” later in this chapter.
In addition, all consolidated systems need firm security boundaries between VEs to prevent one VE from interacting with another in an unintended fashion. This concept is discussed in Chapter 2, “Use Cases and Requirements.”
Virtualization creates the sense—and sometimes the reality—that there is something extra between the VE and the hardware. In some virtualization technologies, the extra layer creates performance overhead, reducing workload performance. This overhead also reduces scalability—the ability to run many workloads on the system. Nevertheless, the separation provided by this layer can be very beneficial, as it provides a well-defined boundary between workload and hardware. That boundary makes it easier to move a VE from one computer to another than it is to move a non-virtualized OS instance.
Learning to manage virtualized environments requires modifying your thinking and changing some practices. Some tasks that were difficult or impossible become easier. Other tasks that were relatively simple become more complicated. The lack of a one-to-one relationship between workload and physical computer presents a new challenge for many people. Also, a new opportunity afforded by virtualization—moving a workload from one computer to another while the workload is active—brings its own challenges, including keeping track of the workloads that you are trying to manage.
Fortunately, you can choose from a set of tools that aim to simplify the task of virtualization management. These tools perform hardware discovery, OS provisioning, workload provisioning, configuration management, resource management, performance monitoring and tuning, security configuration and compliance reporting, network configuration, and updating of OS and application software. This topic is discussed in Chapter 7, “Automating Virtualization.”
Many different models for system virtualization have been developed. These models share many traits, but differences between them abound. Some virtualization features are appropriate for some models; others are not.
Each model can be described in terms of two characteristics: flexibility and isolation. Those two characteristics have an inverse relationship: Typically, the more isolation between VEs, the less flexibility in resource allocation. Conversely, flexibility requires sharing, which reduces isolation. Based on these characteristics, you can create a spectrum of resource flexibility versus workload isolation and place any particular virtualization model or implementation on that continuum, as shown in Figure 1.4.
As described in detail later in this chapter, hardware partitions offer the most isolation but the least flexibility. This arrangement is appropriate for business-critical workloads where service availability is the most important factor. Each partition has complete control over its hardware. At the other end of the spectrum, operating system virtualization (OSV) offers the most flexible configurations but the least isolation between the VEs, which are often called containers. Containers also provide the best scalability and have demonstrated the highest virtualization density. OSV is also discussed later in this chapter.
Between those two extremes, the virtual machines model creates the illusion that many computers are present, using one computer and a layer of firmware and/or software. That layer is the hypervisor, which provides multiplexed access from each operating system instance to the shared hardware. It also provides the ability to install, start, and stop each of those instances. Two types of VM implementations are possible: A Type 1 hypervisor runs directly on the hardware, while a Type 2 hypervisor runs on an operating system. Both types of hypervisors are discussed later in more detail.
Some of these virtualization models can be combined in one system, as shown in Figure 1.5. For example, one virtual machine can run an OS that also supports OSV. You can use layered virtualization to take advantage of the strengths of each type. Note that this strategy does add complexity, which is most noticeable when troubleshooting problems.
The next few sections of this chapter describe each of the three virtualization categories shown in Figure 1.4: hardware partitions, virtual machines, and operating system virtualization. The descriptions provided here are generic, discussing factors common to related solutions in the industry. Implementations specific to the Oracle Solaris ecosystem are described in the next few chapters.
Each of the descriptions in this chapter mentions that model’s traits and strengths. A detailed analysis of their relative strengths and weaknesses is provided in Chapter 8, “Choosing a Virtualization Technology.” Also, the Appendix contains a detailed narrative of the history of virtualization.
Maximizing isolation within the same computer requires a complete separation of compute resources—software and hardware—that still achieves some level of savings or flexibility compared to separate systems. In the ideal case, an electrically isolated environment (a partition) is an independent system resource that runs its own copy of an operating system. The OS runs directly on the hardware, just as in a completely non-virtualized environment. With this approach, any single failure, either in hardware or software, in a component of one VE cannot affect another VE in the same physical computer. Hardware partitions are used, for example, to consolidate servers from different company departments when maximum isolation and technical support chargebacks are required.
In some implementations, the only shared component is the system cabinet, although such an approach yields flexibility but little cost savings. This is especially true if the resources in different partitions cannot be merged into larger VEs. Other implementations share interconnects, clock control, and, in some cases, multiple hardware partitions on a single system board. On a practical level, the minimum components held in common would consist of the system cabinet, redundant power supplies and power bus, and, to promote flexible configurations and minimally qualify as virtualization, a shared but redundant backplane or interconnect. The label “virtualization” can be applied to some of these systems because the CPUs, memory, and I/O components can be reconfigured on the fly to any partition while still maintaining fault isolation. This limited set of common components provides the best failure isolation possible without using separate computers.
Because of these characteristics, some people do not consider hardware partitioning to really be virtualization. Nevertheless, because of the role that hardware partitioning plays in consolidating and isolating workload environments, we will include this model in our discussions. The next few sections discuss some of the relevant factors related to hardware isolation.
Several factors should be considered when choosing the type of virtualization for a particular situation. We will describe the application of those factors to each of the virtualization types, in the following sections.
Limiting the set of components shared by two different partitions increases the failure isolation of those environments. With this approach, a failure of any hardware component in one partition will not affect another partition in the same system. Any component that can be shared, such as the backplane, must also be partitionable so that a failure there affects only the one partition using that component. This isolation scheme is shown in Figure 1.6.
A complete implementation of partitions prevents the use of any type of covert communication channel. In essence, partitions are as secure as separate computers. The level of protection from security attacks relies solely on the applications, operating systems, and hardware. Denial-of-service attacks between partitions are not possible because nothing above the system’s power grid is shared.
Separate hardware requires a distinct copy of an operating system for each partition. This arrangement reinforces the separation of the partitions but retains both the benefits and the effort of per-partition OS maintenance, such as OS installation and patching. To maximize partition independence, a failure in one OS instance must be prevented from affecting another partition.
Because each partition runs a separate operating system instance, it is possible to run different versions of an operating system, or different operating systems, in different partitions.
When using specialized software such as high-availability (HA) clustering, certain configurations may not achieve the desired availability goals. Proper configuration will prevent one failure from affecting both partitions. Some implementations used in the industry achieve this level of independence more effectively than others.
Most hard-partitioning systems allow the partitions to be different sizes. A partition can usually be resized. With some types, this operation requires stopping all software, including the operating system, that was using the resources being reconfigured. The ability to reconfigure the quantity of resources contained in each partition without a service outage can be a powerful feature, enabling nondisruptive load-balancing. Changing the sizes of two partitions can be viewed as moving the barrier between them, as depicted in Figure 1.7.
Most of these systems are large-scale systems (more than four CPU sockets per system) and contain multiple CPU sockets on each circuit board. If such a system is configured with multiple partitions per CPU board, a hardware failure on that CPU board can cause multiple partitions to fail. CPU failures affect only the partition that was using that CPU. For that reason, where failure isolation is the most important consideration, only one partition should be configured per CPU board. In contrast, if partition density is the most important consideration, the ability to configure multiple partitions per CPU board will be an important feature.
Two related types of scalability exist in the context of system virtualization: guest scalability and per-VE performance scalability. Guest scalability is the number of VEs that can run on the system without significantly interfering with one another. Hardware partitions are limited by the number of CPUs or CPU boards in the system, but can also be limited by other hardware factors. Some of these systems can be configured with only two partitions; others support as many as 24.
Because these systems are generally intended to perform well with dozens of CPUs in a single system image, they usually run large workloads on a small number of partitions. Their value derives from their combination of resource flexibility, failure isolation, and per-VE performance scalability.
Because hardware partitioning does not require an extra layer of software, there should be no performance overhead inherent in this type of virtualization. Applications will run with the same performance as in a nonpartitioned system with the same hardware.
Hardware isolation requires specialized hardware. This requirement usually includes components that aid in the management of the partitions, including the configuration of hardware resources into those partitions. These components may also assist in the installation, basic management, and health monitoring of the OS instances running on the partitions. Specialized ASICs control data paths and enforce partition isolation.
Hardware partitions offer the best isolation in the virtualization spectrum. Whenever isolation is the most important factor, hardware partitions should be considered.
Partitions are the only virtualization method that achieves native performance and zero performance variability. Whether the workload is run on an eight-CPU partition or an eight-CPU nonpartitioned system, the performance will be exactly the same.
Compared to other virtualization methods, partitions offer some other advantages as well. Most notably, few changes to data center processes are required: Operating systems are installed and maintained in the same fashion as on non-virtualized systems.
Several products offer excellent hardware isolation. This section provides a representative list of examples.
The first server to use SPARC processors and Solaris to implement hard partitioning was the Cray CS6400, in 1993. Sun Microsystems included Dynamic Domains on the Enterprise 10000 in 1997 and has continued this pattern in every subsequent SPARC generation, including the M6-32 and recently released M7 systems. The implementation of hardware isolation in the most recent generation of SPARC processors is described in Chapter 5, “Physical Domains.”
On the CS6400, E10000, and the following generations of systems, this implementation provides complete electrical isolation between Dynamic Domains. There is no single point of failure in a domain that would affect all of the domains. However, a hardware failure of a component in the shared backplane can affect multiple domains. Starting in 1993, Dynamic Domains could be reconfigured without rebooting them.
Hewlett-Packard’s (HP’s) nPars feature was first made available on some members of the PA-RISC–based HP 9000 series. It is also a feature of some of HP’s Integrity systems. In 2007, HP added the ability to reconfigure these partitions without rebooting them.
Amdahl’s Multiple Domain Facility (MDF) and subsequently IBM’s mainframe Logical Partitions (LPARs) were among the earliest implementations of hardware-based partitioning, available since the 1980s. MDF and LPARs use specialized hardware and firmware to create separate execution contexts with assigned CPUs, RAM, and I/O channels. A domain or partition may have dedicated physical CPUs or logical CPUs that are implemented on a physical CPU shared with other domains and shared according to a priority weighting factor. Physical RAM is assigned to one partition at a time, and can be added or removed from a partition without rebooting it.
The first type of virtualization to become possible, and still one of the most popular approaches, is virtual machines. This model provides the illusion that many independent computers are present in the system, each running a copy of an OS. Each of these VEs is called a virtual machine. Software or firmware, or a combination of both, manages the OS instances and provides multiplexed access to the hardware. This supporting layer, which acts as the hypervisor, gives this model its flexibility but adds a certain amount of performance overhead while it performs its tasks.
Failure isolation of hypervisors varies with the implementation. Each shared resource is a single point of failure, including the hypervisor itself.
Most hypervisors provide virtual machines that mimic the physical hardware. A few of them emulate a completely different hardware architecture. Some of these hypervisors are used to develop new hardware, simulating the hardware in software or testing software that will run on the hardware. Others are used to run software compiled for a CPU architecture that is not available or is not economical to continue operating.
A Type 1 hypervisor comprises software or firmware that runs directly on the computer’s hardware. It typically has components found in a complete operating system, including device drivers. Some implementations offer the ability to assign a set or quantity of physical CPUs or CPU cores to a specific VE. Other implementations use a scheduler to give each operating system instance a time slice on the CPU(s). Some versions offer both choices. Each VE appears to be its own computer, and each appears to have complete access to the hardware resources assigned to it, including I/O devices. Although hypervisors also provide shared access to I/O devices, this capability inflicts a performance penalty.
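As a rough illustration of the time-slice approach, the Python sketch below multiplexes one physical CPU among several guests in round-robin fashion. It is a toy model: the names and the fixed slice length are illustrative, and real hypervisor schedulers use weights, priorities, and far more sophisticated policies.

    # A minimal round-robin sketch of hypervisor CPU time-slicing; all names
    # are illustrative, not any real hypervisor's interface.
    from collections import deque

    SLICE_MS = 10

    def run_cpu(guests, total_ms):
        """Give each runnable guest OS instance a slice of one CPU in turn."""
        queue = deque(guests)
        clock = 0
        while clock < total_ms:
            guest = queue.popleft()
            print(f"t={clock:3d}ms: running {guest}")
            clock += SLICE_MS
            queue.append(guest)          # back of the line until its next turn

    run_cpu(["guest_A", "guest_B", "guest_C"], total_ms=60)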
Type 1 hypervisors implement a small feature set designed exclusively for hosting virtual machines. When the system starts, the hypervisor is loaded into main system RAM or a specific area of reserved memory; in some architectures, additional elements reside in firmware, hardware, and BIOS. The hypervisor may make use of or require specialized hardware-assist technology to decrease its overhead and increase performance and reliability.
The Type 1 hypervisor is a small environment designed specifically for the task of hosting virtual machines. This model has several advantages over Type 2 hypervisors—namely, simplicity of design, a smaller attack surface, and less code to analyze for security validation. The primary disadvantages of Type 1 hypervisors are that they require more coding and they do not allow a base operating system to run any applications with native performance. Also, they cannot freely leverage services provided by a host OS. Even mundane features such as a management interface or a file system may need to be built “from scratch” for the hypervisor. Adding these features increases the complexity of the hypervisor, making it more like an OS.
In most cases, Type 1 hypervisors use one VE as a management environment. The administrator interacts with that VE via the system console or via the network. The VE contains tools to create and otherwise manage the hypervisor and the other VEs, usually called guests. Some Type 1 systems also allow for one or more specialized VEs that virtualize I/O devices for the other VEs. Figure 1.8 shows the overall structure of a Type 1 hypervisor implementation, including a virtual management environment (VME) and a virtual I/O (VIO) VE.
Hypervisors that offer VIO guests typically provide an alternative to their use—namely, direct, exclusive access to I/O devices for some VEs. Direct access offers better I/O performance but limits the number of VEs that can run on the system. If the system has only four network connectors, for example, then only three VEs can have direct network access, because the virtualization management console (VMC) needs at least one NIC for its own use. This limit can be relieved on some virtualization platforms by Single Root I/O Virtualization (SR-IOV), a hardware specification that lets a single PCIe device appear to be multiple devices, each of which can be assigned to different VEs.
The VEs of Type 1 hypervisors offer good failure isolation. A software failure in one guest is no more likely to cause the failure of another guest than it would be if the two guests ran on separate computers.
Failure of a shared hardware component, however, may cause a service disruption in multiple guests. Also, failure of the hypervisor will stop all of the guests. Because of the potential for failure of shared components, both software and hardware, multiple guests in one computer should not be used to achieve high availability.
Type 1 hypervisors tend to be very secure, but the details of that security depend on the implementation. Access to the hypervisor must be strongly protected because a successful attack on the hypervisor will enable the attacker to control all of the virtual machines that the hypervisor manages, including their storage.
In most implementations, guests share hardware resources. When those implementations are used, denial-of-service attacks between guests will be possible unless suitable resource controls are used.
Because each guest runs a separate copy of an operating system, Type 1 hypervisors have the potential to support heterogeneous operating systems, and many of them do. Even hypervisors that are able to run only one operating system type can use different versions of that operating system in different guests.
With hypervisors, the potential exists for one guest to consume a sufficient volume of resources to cause problems for other guests. Although some early hypervisors did not include resource controls, the potential for problems led to the implementation of fairly complete controls. Most Type 1 hypervisors implement controls on CPU usage, and dedicate a section of RAM for each guest. In addition, some of these hypervisors place controls on the amount of I/O throughput that may be consumed by each guest.
The scalability of hypervisors is limited by the efficiency of the hypervisor and the available hardware. Software hypervisors run on the computer’s CPU(s), reducing the effective CPU time available to guests, though this amount depends on the available features and the efficiency of resource usage. Each guest must be configured with its own RAM (usually at least a few gigabytes). Scalability of hypervisor guests is constrained more often by the amount of RAM than by any other resource.
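A back-of-the-envelope calculation shows why RAM is usually the binding constraint. All of the capacity figures below are assumptions chosen for illustration:

    # Assumed figures: a 256 GB host, 8 GB reserved for the hypervisor and its
    # management environment, and 4 GB dedicated to each guest.
    host_ram_gb  = 256
    reserved_gb  = 8
    per_guest_gb = 4

    max_guests = (host_ram_gb - reserved_gb) // per_guest_gb
    print(f"RAM alone limits this host to {max_guests} guests")   # 62

With many lightly loaded guests, the CPUs may still be largely idle when the RAM is fully committed.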
Databases are particularly sensitive to resource availability. They require the ability to run certain software threads without delay, and they require large amounts of RAM. Further, they may not run well when virtual I/O is in use because they depend on low-latency I/O transactions.
Some workloads require only a small portion of the resource capacity available in modern systems. In this case, dozens of guests may run effectively with a Type 1 hypervisor.
Virtualization may reduce the quantity of computers needed, but hypervisors do not reduce the management burden. Each guest includes an operating system that must be managed.
IBM developed the first hypervisors, and coined the term by which they became known, for its mainframes in the 1960s. VM/370, a popular hypervisor for IBM mainframes, was introduced in 1972. Its descendant on current mainframes is z/VM, which supports virtual machines running IBM operating systems such as z/OS and z/VM, and open operating systems including several distributions of Linux.
VMware’s ESX is a Type 1 hypervisor for x86 systems. It supports common operating systems such as Microsoft Windows, Oracle Solaris, and OS X, and many releases of Linux distributions such as CentOS, Debian, Oracle Linux, Red Hat Enterprise Linux, SUSE Linux, and Ubuntu. Its virtualization management console (VMC) is called the service console.
Oracle VM Server for x86 and Citrix XenServer are commercial implementations of the open-source Xen hypervisor for x86 systems. Xen supports a variety of guest operating systems, but differs in architecture from VMware ESX: Xen uses specialized guest domains for parts of the virtualization infrastructure. A specially privileged “dom0” guest, running a Linux or BSD distribution, provides a VMC and often provides virtual I/O to other guests.
Oracle VM Server for SPARC, which is discussed in detail later, is a SPARC hypervisor on chip multithreading (CMT) servers that is used to support guests running Solaris. The VMC is called the control domain. Virtual devices are implemented by the control domain or one or more specialized VEs called service domains and I/O domains. Service domains can be grouped into HA pairs to improve availability of I/O devices. System administrators can also assign devices directly to VEs.
IBM’s PowerVM Hypervisor is a combination of firmware and software that creates and manages LPARs (logical partitions) on Power CPU cores. These systems also support a virtualization technology called Micro-Partitioning that can run multiple OS instances on a CPU core.
PowerVM LPARs may run AIX or Linux operating systems. PowerVM offers VIO partitions and direct device assignment.
Type 2 hypervisors run within a conventional operating system environment, enabling virtualization within an OS. The computer’s OS (e.g., Oracle Solaris, Linux distributions, Microsoft Windows) boots first and manages access to physical resources such as CPUs, memory, and I/O devices. The Type 2 hypervisor operates as an application on that OS. Like the Type 1 hypervisor, the Type 2 hypervisor may make use of or require hardware-assist technology to decrease overhead attributable to the hypervisor.
Type 2 hypervisors do have some disadvantages. For example, they must depend on the services of a hosting operating system that has not been specifically designed for hosting virtual machines. Also, the larger memory footprint and the CPU time consumed by unrelated features of a conventional OS may reduce the amount of physical resources remaining for guests.
The primary advantage of Type 2 hypervisors for desktop systems is that the user can continue to run some applications—such as e-mail, word processing, and software development programs—with the user’s favorite OS and its tools, without incurring a performance penalty. Other advantages include the ability to leverage features provided by the OS: process abstractions, file systems, device drivers, web servers, debuggers, error recovery, command-line interfaces, and a network stack. Similar advantages apply in server environments: Some applications on a server may run directly on the OS, whereas other applications are hosted in virtual machines, perhaps to provide increased isolation for security purposes, or to host a different OS version without the disruption of installing a Type 1 hypervisor on the bare metal. These advantages can be compelling enough to compensate for the memory footprint of the hosting OS and the performance penalty of the hypervisor.
It is sometimes assumed that a Type 2 hypervisor is inherently less efficient than a Type 1 hypervisor, but this need not be the case. Further, a Type 2 hypervisor may benefit from scalability and performance provided by the underlying OS that would be challenging to replicate in a Type 1 hypervisor.
Examples of Type 2 hypervisors include Oracle VM VirtualBox, VMware Server, VMware Fusion, Parallels Workstation from Parallels, Inc., and Microsoft Windows Virtual PC.
The isolation of software and hardware failures with Type 2 hypervisors is similar to that provided by Type 1 hypervisors. Instead of relying on a Type 1 hypervisor to protect guests from hardware failures, a Type 2 guest relies on both the Type 2 hypervisor and the underlying operating system. The host operating system is more complex, however, and this complexity offers additional opportunities for failure.
Because Type 2 hypervisors are typically a component of a desktop computer environment, rather than a server, workloads in Type 2 hypervisor guests are less valuable targets and are rarely attacked specifically. Generally, the responsibility for their protection falls on the operating system. The security characteristics of the operating system should be considered before implementing a Type 2 hypervisor, if sensitive data will reside on the guests. When security is a concern, users should prohibit remote access and use antivirus software.
Many Type 2 hypervisors initially ran on only one type of host operating system, and supported only one type of guest operating system. Most of them now support multiple guest OS types, and many run on multiple types of hosts.
Resource management features vary greatly among the Type 2 hypervisors.
The ability to run multiple guests efficiently depends on the implementations of both the hypervisor and the host operating system. Physical resource limitations restrict the absolute quantity of compute and memory capacity available to the guests. Insufficient RAM is more frequently a problem than insufficient CPU.
Each Type 2 hypervisor includes a user interface for management of guests. Because these are desktop environments, tools for centralized management of multiple systems running Type 2 hypervisors are difficult to find.
Another distinction between different forms of hypervisor-based virtualization is whether the hypervisor offers full virtualization, paravirtualization, or both. When full virtualization is used, the hypervisor creates virtual machines that are architecturally consistent with the “bare metal” physical machine. With paravirtualization, the hypervisor provides software interfaces that virtual machines can use to communicate with the hypervisor to efficiently request services and receive event notifications. Paravirtualization can greatly reduce performance overhead—a factor that plagues many hypervisor solutions.
The advantage of full virtualization is that unmodified operating systems can be run as guests, simplifying migration and technology adoption, albeit at the cost of requiring the hypervisor to implement all platform details. This approach can create substantial overhead depending on the platform, especially for I/O, timer, and memory management.
Paravirtualization offers the opportunity to optimize performance by providing more efficient software interfaces to these and other functions. It can include cooperative processing between guest and hypervisor for memory management (e.g., memory ballooning, shared pages for I/O buffers), shortened I/O path length (e.g., device drivers making direct calls to the hypervisor or combining multiple I/O requests into a single request), clock skew management, and other optimizations. The main disadvantage of paravirtualization is that it requires source code to port the guest OS onto the virtualized platform, or at least the ability to add optimized device drivers.
Examples of paravirtualization include Xen implementations such as Oracle VM Server for x86 and Citrix XenServer, Oracle VM Server for SPARC, the guest/host additions in Oracle VM VirtualBox, the VMware Tools for VMware ESX, and the Conversational Monitor System (CMS) under VM/370.
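The following Python sketch caricatures one of the optimizations mentioned above: a paravirtualized driver batching I/O requests into a single guest-to-hypervisor transition. The hypercall function is an illustrative stand-in; the real interface differs for every hypervisor.

    # Each hypercall models one expensive guest-to-hypervisor transition.
    def hypercall(request):
        print(f"hypervisor handles: {request}")

    def emulated_writes(blocks):
        """Full virtualization: the device model traps once per operation."""
        for block in blocks:
            hypercall(("write", block))           # N transitions for N blocks

    def paravirt_writes(blocks):
        """Paravirtualization: one batched request for the whole queue."""
        hypercall(("write_batch", list(blocks)))  # 1 transition for N blocks

    emulated_writes(range(4))
    paravirt_writes(range(4))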
Hypervisors typically represent a “middle ground” between the isolation of hard partitions and the flexibility of OSV. The additional isolation of separate OS instances compared to that afforded by OSV allows for the consolidation of completely different operating systems. The hypervisor layer also provides a convenient point of separation for VEs, thereby facilitating and simplifying VE mobility.
Some hypervisors offer optional CPU and I/O partitioning, which can significantly reduce the overhead of the hypervisor. Of course, the scalability of this method is limited by the number of CPUs and I/O buses. Systems with few CPUs must share these resources among the VEs.
Hardware partitioning and virtual machine technologies share a common trait: Each virtual environment contains an instance of an operating system. Most of those technologies allow different operating systems to run concurrently.
In contrast, operating system virtualization (OSV) uses features of the operating system to create VEs that are not separate copies of an operating system. This approach provides the appearance of an individual operating system instance for each VE. Most OSV implementations provide the same OS type as the hosting OS. The two most common terms for guests of OSV implementations are “containers” and “zones.”
Figure 1.9 shows the relationships among the hardware, OS, and VEs when using OSV.
We have already discussed the importance of isolation to virtualization. The isolation between OSV VEs is just as important as the isolation noted in other models. For OSV, this isolation is enforced by the OS kernel, rather than by a hypervisor or hardware.
In the OSV model, all processes share the same operating system kernel, which must provide a robust mechanism to prevent two different VEs from interacting directly. Without this isolation, one VE could potentially affect the operation of another VE. The kernel must be modified so that the typical interprocess communication (IPC) mechanisms do not work between processes in different VEs, at least in a default configuration. The network stack can be modified to block network traffic between VEs, if desired. Existing security features can be enhanced to provide this level of isolation. In some implementations, it is easier to achieve a security goal in a container than in a non-virtualized environment.
OSV implementations are usually very lightweight, taking up less disk space and consuming less RAM than virtual machines. They also add very little CPU overhead. Nevertheless, although they can easily mimic the same operating system, most of them do not support the ability to appear as either another operating system or an arbitrary version of the same OS.
Another strength of this model of virtualization relates to the possibility of hardware independence. Because a physical computer is not being simulated, an operating system that runs on multiple CPU architectures can potentially provide the same feature set, including OSV features, on different types of computers.
Later in this book we will discuss an approach that can be used to choose a virtualization technology based on prioritizing certain factors, including the ability to isolate software and hardware faults. This section describes these factors in the context of operating system virtualization.
With OSV, all isolation of software and hardware failures must be provided by the operating system, which may utilize hardware failure isolation features if they exist. For example, the operating system may be able to detect a hardware failure and limit the effects of that failure to one VE. Such detection may require supporting hardware features.
The isolation between processes in different VEs can also be used to minimize propagation of software or hardware failures. A failure in one VE should not affect other VEs. This kind of isolation is easier to achieve if each VE has its own network services, such as sshd.
Further, the operating system must prevent any event that is occurring in one VE from affecting another VE. This includes unintentional events such as software failures as well as actions taken by a successful intruder.
To be both robust and efficient, these hardware and software features must be tightly integrated into the OS implementation.
All of the necessary functionality of OSV is provided by the OS, rather than by hardware or an extra layer of software. Usually this functionality is provided via features integrated into the core of the OS. In some cases, however, the features are provided by a different organization or community and integrated on-site, with varying levels of application compatibility and processing efficiency.
The shared kernel offers a privileged user the ability to observe all processes running in all VEs, which simplifies the process of performance analysis and troubleshooting. You can use one tool to analyze resource consumption of all processes running on the system, even though many are running in different VEs. After the problem is understood, you can use the same centralized environment to control the VEs and their processes.
This type of global control and observability is nothing new for consolidated systems, but it provides a distinct advantage over other virtualization models, which lack a centralized environment that can inspect the internals of the guests. Indeed, analyzing an OSV system is no more complicated than analyzing a consolidated one.
After the resource usage characteristics of a particular workload are known, resource management tools should be used to ensure that each VE has sufficient access to the resources it needs. Notably, centralized control offers the potential for centrally managed, fine-grained, dynamic resource management. Many operating systems already have sophisticated tools to control the consumption of resources in one or more of these ways:
Assigning CPU time, which is performed by a software scheduler. This control can be achieved through process prioritization or by capping the amount of CPU time that a process uses during an interval of real time.
Providing exclusive assignment of a group of processes to a group of processors.
Dedicating a portion of RAM to a group of processes, capping the amount of RAM that a group of processes can use, or guaranteeing that a VE will be able to use at least a specific amount of RAM.
Dedicating a portion of the network bandwidth of a physical network port to an IP address or a group of processes.
Because most operating systems already have the ability to control these resources with fine granularity, if these controls are extended to the VEs, their resource consumption can be managed with the same granularity.
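As one concrete example of such a control, the sketch below computes share-based CPU entitlements of the kind many fair-share schedulers implement. The VE names and share counts are invented for illustration.

    # A minimal fair-share sketch: each VE is entitled to CPU capacity in
    # proportion to its assigned shares; names and numbers are illustrative.
    def entitlements(shares, total_cpus):
        total = sum(shares.values())
        return {ve: total_cpus * s / total for ve, s in shares.items()}

    ve_shares = {"web": 40, "db": 50, "batch": 10}
    for ve, cpus in entitlements(ve_shares, total_cpus=8).items():
        print(f"{ve:6s} is entitled to {cpus:.1f} CPUs")

Unlike a hard cap, a share scheme typically lets a busy VE consume entitlement that other VEs are not currently using.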
Some operating systems include automated features that detect and handle errors. For example, system software might detect that a network service such as sshd has failed and attempt to restart that service. In the context of resource management, dynamic resource controls can be used to react to changing processing needs of different workloads by changing resource control values on the fly.
The basic model of OSV assumes that the VEs provide the same operating system interfaces to applications as a non-virtualized environment provided by the host operating system (e.g., system calls). If this similarity can be achieved, there is no need to modify applications.
Additionally, a particular implementation may mimic a different operating system if the two are sufficiently similar. In this case, the functionality of an OS kernel can be represented by its system calls. A thin layer of software can translate the system calls of the expected guest OS into the system calls of the hosting OS. This strategy can allow programs and libraries compiled for one OS to run—unmodified—in a VE that resides on a different OS, as long as they are all compiled for the same hardware architecture.
In such a multiple-OS configuration, the extra operations involved in translating one set of functionality to another will incur a certain amount of CPU overhead, decreasing system performance. Achieving identical functionality is usually challenging, but sufficient compatibility can be achieved to enable common applications to run well.
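Conceptually, the translation layer is a dispatch table from the foreign OS’s system calls to the host’s. The Python sketch below illustrates the idea only; real implementations perform this mapping at the kernel boundary, and the “foreign” call names here are invented.

    # A minimal sketch of system-call translation; the foreign call names
    # are hypothetical, and os.* calls stand in for host system calls.
    import os

    SYSCALL_MAP = {
        "foreign_getpid": os.getpid,                    # direct equivalent
        "foreign_osname": lambda: os.uname().sysname,   # slight reshaping needed
    }

    def translate(call_name, *args):
        """Dispatch a foreign system call to its host implementation."""
        return SYSCALL_MAP[call_name](*args)

    print(translate("foreign_getpid"))
    print(translate("foreign_osname"))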
Operating system virtualization features must allow isolated access to hardware so that each VE can make appropriate use of hardware but cannot observe or affect another VE’s hardware accesses. In such a system, each VE might be granted exclusive access to a hardware resource or, alternatively, such access might be shared. Existing implementations of OSV provide differing functionality for hardware access.
Figure 1.10 shows most VEs sharing most of the CPUs and a network port. In this system, one VE has exclusive access to another port and two dedicated CPUs.
As part of the OSV design, engineers choose the line of division between the base OS operations and the VEs. A VE may include only the application processes or, alternatively, each VE may include some of the services provided by the operating system, such as network services and naming services.
Because one control point exists for all of the hardware and the VEs, all resource management decisions can be made from one central location. For example, the single process scheduler can be modified to provide new features for the system administrator. These features can be used to ensure that the desired amount of compute power is applied to each VE to meet its business needs. Because the scheduler can monitor each process in each VE, it can make well-informed scheduling decisions; it does not need to give control to a separate scheduler per VE.
An alternative method of managing CPU capacity gives one VE exclusive access to a set of CPUs. This approach means that the VE’s processes have access to the entire capacity of those CPUs, and it reduces cache contention. RAM can be treated similarly—that is, either as an assignment of a physical address range or as a simple quantity of memory.
Other resource constraints can include limits or guaranteed minimum amounts or portions.
Operating system virtualization technologies tend to scale as well as the underlying operating system. From one perspective, all processes in all VEs can be seen as a set of processes managed by one kernel, including inter-VE isolation rules. If the kernel scales well to many processes, then the system should likewise scale well to many VEs of this type. At least one implementation of OS virtualization—Solaris Zones—has demonstrated excellent scalability, with more than 100 VEs running on one Solaris instance.
Operating system features must provide the ability to create, configure, manage, and destroy VEs. This capability can be extended to remote management.
Similar to other virtualization models, OSV has its particular strengths. Some of these benefits are specific goals of OSV implementations; others are side effects of the OSV model.
Many of the strengths of OSV implementations are derived from the tight integration between the OSV technology and the OS kernel. Most of these operating systems are mature and have well-developed facilities to install and maintain them and to manage multiple workloads. It is usually possible to extend those features to the environments created via OSV.
A significant strength of OSV is its efficient use of resources. This efficiency applies to the use of CPU time, RAM, and virtual memory.
When implemented correctly, OSV will not add any CPU overhead compared to a consolidated but non-virtualized system. The OS must still perform the same operations for that set of running applications. However, to perform well with more than three or four VEs, the OS must be scalable—that is, it must be able to switch the CPU(s) among the dozens or hundreds of processes in the VEs. It must also be able to efficiently manage the many gigabytes of RAM and swap space used by those processes.
Because OSV VEs do not have a separate OS instance, they do not consume hundreds of megabytes of RAM per VE for each OS kernel. Instead, the amount of RAM needed for multiple VEs typically is limited to the memory footprint of the underlying OS plus the amount of RAM used by each of the consolidated applications. In some implementations, operating systems that reuse a program’s text (program) pages can reduce the memory footprint of a VE even further by sharing those text pages across VEs.
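The savings compound quickly. The arithmetic below compares the two models under assumed footprints (a 512 MB kernel and 1 GB of application RAM per workload); the numbers are illustrative only.

    # Assumed footprints for 20 consolidated workloads.
    n_ves, kernel_mb, app_mb = 20, 512, 1024

    vm_total  = n_ves * (kernel_mb + app_mb)   # one kernel image per VM
    osv_total = kernel_mb + n_ves * app_mb     # one shared kernel for all VEs

    print(f"virtual machines:        {vm_total  / 1024:.1f} GB")   # 30.0 GB
    print(f"OS virtualization (OSV): {osv_total / 1024:.1f} GB")   # 20.5 GB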
Because of the single OS instance found in OSV, a centralized point exists for security controls. This arrangement also creates the possibility of per-VE configurable security and centralized auditing.
The primary goal for OSV implementations is to minimize the effort needed to maintain many operating systems in a data center environment. Put simply, fewer OS instances means less activity installing, configuring, and updating operating systems. Because the OS is already installed before a VE is created, provisioning VEs is usually very rapid, taking anywhere from a few seconds to a few minutes. The minimalist nature of OSV also reduces the time to boot a VE—if that step is even needed—to a few seconds.
Examples of OS virtualization include Oracle Solaris Zones (also called Solaris Containers from 2007 to 2010), HP-UX Secure Resource Partitions, Linux Containers, and AIX Workload Partitions. Each of these products follows the model described earlier, with their differences reflecting their specific use of network and storage I/O, security methods and granularity, and resource controls and their granularity.
Server consolidation improves data center operations by reducing the number of servers, which in turn reduces hardware acquisition costs, hardware and software support costs, power consumption, and cooling needs. Virtualization enables consolidation of workloads that might otherwise interfere with one another, and of workloads that should be isolated for other reasons, including business agility and enhanced security.
Three general virtualization models exist. Operating system virtualization creates virtual OS instances—that is, software environments in which applications run in isolation from one another but share one copy of an OS and one OS kernel. Virtual machines rely on a hypervisor to enable multiple operating system instances to share a computer. Hardware partitioning separates a computer into separate pools of hardware, each of which acts as its own computer, with its own operating system.
Each of these models has both strengths and weaknesses. Each has also been implemented on multiple hardware architectures, with each implementation being most appropriate in certain situations. The better you understand the models and implementations, the more benefit you can derive from virtualization.