9. Applications of Oracle Virtualization

Most of this book describes Oracle Solaris virtualization technologies. This chapter illustrates the application of these technologies in several strategic use cases. Rather than simply showing applications running in virtualized environments, we focus on cases where the unique capabilities of Oracle Solaris virtualization are leveraged. Examples include Oracle’s use of virtualization to build strategic products, and illustrations of actual but anonymous customer deployments.

9.1 Database Zones

Multiple database instances can be isolated in a variety of ways. Extreme isolation can be achieved by running them in different servers, or different data centers. Solaris Zones achieve a good balance of isolation and flexibility, providing an effective security boundary around each zone, and creating individual namespaces. This software and security partitioning does not use a hypervisor, so it does not reduce performance as software hypervisors do. Database zones are appropriate for consolidating multiple databases on a single Solaris system, SPARC or x86, and for Database as a Service (DBaaS) clouds.

The inherent characteristics of Solaris Zones, as well as their comprehensive features, are described in Chapter 3, “Oracle Solaris Zones.” The following subsections describe several of the features specific to Solaris and Solaris Zones that are particularly relevant to the use of Oracle Database software.

9.1.1 Identity and Naming Services

Each Solaris Zone has its own set of Solaris configuration files. These provide per-zone identity and naming, including separate host names, IP addresses, naming services such as LDAP, and user accounts.

A database zone cannot detect other database zones or interact with them, unless those zones use normal network methods to communicate with each other.

9.1.2 Security

Several security features lend themselves well to databases. A few of them are discussed here.

The configurable security boundary of Solaris Zones is implemented with Solaris privileges. Further, those privileges can be used by the system administrator to enable specific nonprivileged users to manage specific zones. For example, a user account within a database zone can be given all of the privileges available within the zone, and that user account can be assigned to a database administrator (DBA). The DBA would then effectively have superuser privileges within the zone. In addition, a user account in the global zone can be assigned to the DBA. The system administrator could then delegate administrative abilities to the DBA, enabling the DBA to boot that zone and perform some other administrative tasks.
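As a minimal sketch of such delegation, the admin resource of zonecfg can grant a DBA's user account the ability to log in to and manage one zone. The zone name dbzone01 and the user name dbadm are illustrative:

GZ# zonecfg -z dbzone01
zonecfg:dbzone01> add admin
zonecfg:dbzone01:admin> set user=dbadm
zonecfg:dbzone01:admin> set auths=login,manage
zonecfg:dbzone01:admin> end
zonecfg:dbzone01> commit
zonecfg:dbzone01> exit

The login authorization permits the dbadm user to access the zone with zlogin, and the manage authorization permits booting, rebooting, and halting that zone from the global zone.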

Oracle Transparent Data Encryption uses the Oracle Solaris 11 Cryptographic Framework, which works exactly the same way in a zone as on bare metal. Combined with the individual namespaces of Solaris Zones, this framework enables a zone to protect its own data without risk of disclosure by the global zone administrator or other DBAs—a security factor that is particularly important in a cloud environment.

9.1.3 Resource Management

Although most types of Solaris Zones share all resources, a comprehensive set of resource management features enables the platform administrator to control per-zone resource consumption. This makes the boundary around a zone an effective resource management boundary.

Properly managed consolidated environments prevent “resource hogs”—that is, workloads that consume so much of a resource that others “starve” and cannot work properly. Resource management controls are required in any consolidated environment, including cloud computing.

9.1.3.1 CPU

For database zones, you can use any of the CPU management methods available to zones. Two factors, however, are worth highlighting in conjunction with CPU management: computing efficiency and software license cost.

Unless you oversize the server or assign dedicated hardware resources to each database zone, database CPU performance may be inconsistent. For example, adding a new database may reduce the available computing capacity, in turn reducing the performance of all databases. To ensure consistent performance and optimal efficiency, one or more whole cores should be assigned to a database zone. This method reduces the opportunities for CPU contention.

By assigning whole cores with a dedicated-cpu setting, or by capping CPU consumption with a capped-cpu setting, you can also limit software license costs. Solaris Zones are one of the virtualization methods that can be used as “hard partitions” by Oracle for software license purposes.
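A minimal sketch of dedicating CPUs to a hypothetical zone named dbzone01 with the dedicated-cpu resource follows; the value 16 assumes a SPARC processor with eight hardware threads per core, so it corresponds to two cores:

GZ# zonecfg -z dbzone01
zonecfg:dbzone01> add dedicated-cpu
zonecfg:dbzone01:dedicated-cpu> set ncpus=16
zonecfg:dbzone01:dedicated-cpu> end
zonecfg:dbzone01> commit
zonecfg:dbzone01> exit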

9.1.3.2 Memory

One goal when using database zones is to prevent paging, thereby ensuring consistent, optimal performance. Any workload that is paged out to swap disk will experience greatly degraded performance when its pages are brought back in from swap disk. A second goal is to prevent one or more zones from using so much RAM that other zones starve, leading to paging or—even worse—an inability to allocate RAM.

You can configure a system’s zones to avoid paging in any of them, but you must understand their memory requirements. The goal is to limit the RAM usage of each zone so that the aggregate memory usage is less than the amount of RAM available to them. One method to accomplish this is to configure a virtual memory cap for each zone. Chapter 3, “Oracle Solaris Zones,” describes this feature, which is known as the swap property. Recall that the virtual memory cap refers to the sum of RAM plus the amount of swap disk. For each zone, the swap property should be set to the amount of virtual memory (RAM plus swap disk) that the workload uses when it runs in a non-virtualized environment. Be sure that this cap is large enough: If the zone attempts to allocate more memory than configured, software in the zone will fail. By constraining the use of RAM by each zone, you implicitly ensure that there is enough RAM for every other zone; see Figure 9.1. If you might deploy additional zones on the same server in the future, limit the number of zones and the amount of RAM assigned now, so that headroom remains.
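As an illustration, if a database workload uses about 100 GB of RAM plus 20 GB of swap on its original server, the zone’s virtual memory cap might be set as follows (the zone name and sizes are hypothetical):

GZ# zonecfg -z dbzone01
zonecfg:dbzone01> add capped-memory
zonecfg:dbzone01:capped-memory> set swap=120g
zonecfg:dbzone01:capped-memory> end
zonecfg:dbzone01> commit
zonecfg:dbzone01> exit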

Image

Figure 9.1 Capping Aggregate RAM Use

A slightly more complicated method allows some paging while controlling the amount of swap space that each zone can consume. Assigning values to both the physical and swap properties of the capped-memory resource limits both the amount of RAM and the amount of virtual memory used by each zone.

Kernel zones are a special case, because each has its own preallocated physical memory pages in RAM, along with a separate swap disk area. You can mimic the previously described configuration by setting the physical property of the capped-memory resource to the amount of virtual memory that the database processes will need.

Once you understand the memory needs of each database, you can determine how many database zones will fit on a particular server. For example, a server with 512 GB of RAM could be used to run four databases, each of which needs 100 GB. More than 100 GB of RAM would still be left for use by the global zone, and by another, similar database zone that you might add in the future.

9.1.3.3 Network

Virtual NICs (vNICs) add both flexibility and security. For example, one physical Ethernet port can be connected to your database management network, enabling you to access all of the database zones via their vNICs. This feature can also be used to reduce the number of database zone IP addresses that are visible on the data center network.

Multiple ports can be used to isolate different types of network access, such as client access and RAC interconnects. Databases that are not protected by RAC would not have access to the RAC interconnect networks.

Redundant vNICs allow multiple database instances to share a small number of physical network ports efficiently. Each database zone should use two vNICs, each connected to one of two physical network ports that are configured for redundancy.
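A hedged sketch of such a configuration, assuming the zone’s existing anet resource already provides the first vNIC over physical port net0 and that net1 is the second redundant port (link and zone names are illustrative); the two resulting datalinks would typically be grouped with IPMP inside the zone:

GZ# zonecfg -z dbzone01
zonecfg:dbzone01> add anet
zonecfg:dbzone01:anet> set linkname=net1
zonecfg:dbzone01:anet> set lower-link=net1
zonecfg:dbzone01:anet> end
zonecfg:dbzone01> commit
zonecfg:dbzone01> exit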

If desired, bandwidth consumption can be managed to prioritize some database instances over others.
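One hedged way to do this from the global zone is to create a vNIC explicitly, cap its bandwidth with the maxbw link property, and assign it to the zone as an existing datalink (the link names and the 2-Gbps cap are illustrative):

GZ# dladm create-vnic -l net0 dbz1_net0
GZ# dladm set-linkprop -p maxbw=2g dbz1_net0
GZ# zonecfg -z dbzone01
zonecfg:dbzone01> add net
zonecfg:dbzone01:net> set physical=dbz1_net0
zonecfg:dbzone01:net> end
zonecfg:dbzone01> commit
zonecfg:dbzone01> exit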

9.1.3.4 Storage

Internal storage may be used with database zones, or remote storage may be accessed via the usual methods: NFS, FC, and iSCSI. Storage bandwidth may be controlled using Solaris network features if the storage is connected to the system via Ethernet (NFS or iSCSI).
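For example, a hedged sketch that caps NFS traffic to a storage appliance at a hypothetical address 192.168.30.5, using a flow on an assumed storage link net1 and an illustrative 4-Gbps limit:

GZ# flowadm add-flow -l net1 -a remote_ip=192.168.30.5 nfsflow
GZ# flowadm set-flowprop -p maxbw=4G nfsflow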

Storage access from a zone is similar to storage access in a non-zoned environment. Redundant connections improve availability here, too.

Although ZFS works well for database storage, it does not provide all of the performance and data ecosystem features available with Oracle Automatic Storage Management (ASM). ZFS is a good choice for most development databases, as well as for databases with low to medium I/O rates.

9.1.3.5 Shared Memory

The Oracle Database System Global Area (SGA) uses System V shared memory. Each zone has its own System V IPC namespace, so shared memory segments created in one zone are not visible to processes in other zones. Shared memory consumption can also be limited per zone: the max-shm-memory property of zonecfg sets the zone-wide zone.max-shm-memory resource control, and the locked property of the capped-memory resource limits the amount of memory the zone can lock, which includes ISM segments used by the database.

9.1.4 Administrative Boundary

Booting a zone is faster than booting the entire computer, so a database instance in a zone can be returned to service more quickly when an operating system restart is needed. The result is less downtime for the database.

Several methods can be used to create many zones that are almost identical in configuration. In development environments, this kind of technique makes it easier to ensure that developers are using the same tool sets. It is even more useful in test environments, where maintaining a consistent environment may be crucial both to successful production deployments and to problem resolution.

Using zones reduces confusion regarding Oracle Homes, because each zone will have its own home. Other zones’ homes will not be visible to that zone, thereby eliminating the possibility of naming conflicts.

Both native zones and kernel zones may be reconfigured while they run, including the devices that have been assigned to them. This strategy further reduces the need for downtime, as storage devices may be added to a running zone.

The ability to migrate a zone improves the flexibility of consolidated environments, including databases. The method of migration, however, depends on the type of zone. Solaris 10 Zones and Solaris 11 native zones may be halted and then moved. Kernel zones may be moved while they are running, without any modification to application software. Nevertheless, live migration of huge-memory virtual environments (VEs) may take a very long time. Usually users will not notice a disruption of service, but in extreme situations the disruption may be noticeable, or may even cause other software to time out. When availability is a primary concern, Oracle RAC may be a better solution than live migration.
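As a point of reference, a running kernel zone can be live migrated with zoneadm. A minimal sketch, assuming a kernel zone named dbkz01, storage visible to both hosts, and a destination host named host2 (all names are illustrative):

GZ# zoneadm -z dbkz01 migrate ssh://root@host2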

9.1.5 Fault Isolation

Solaris Zones are not as effective at isolating faults as logical domains are, because multiple zones rely on one Solaris kernel. Even so, they do provide effective software fault isolation, including the naming services mentioned earlier. A failure of a database software component within one zone will not affect other zones.

Zones can be RAC nodes—an approach that protects against hardware failure as well as software failures.

9.1.6 Conclusion

Database consolidation, including a DBaaS cloud environment, requires sufficient isolation between databases to comply with corporate policy and government regulations. Oracle Solaris Zones provide a complete security and identity boundary, along with comprehensive resource controls, thereby satisfying the needs of almost all consolidated database environments. Further, these goals are achieved without negative performance impact or additional cost.

9.2 Virtualization with Engineered Systems and Oracle SuperCluster

Oracle Engineered Systems are designed to provide performance and scale for enterprise applications, and are typically optimized for a particular workload category. Examples include Oracle Exadata, which is designed for high-performance Oracle databases in OLTP and DSS environments; Oracle Exalogic, which provides complementary benefits for middleware applications; and Oracle Exalytics, which is designed for data analytics. Oracle Engineered Systems share common design patterns, and they leverage engineering between layers of the Oracle hardware, networking, virtualization, OS, and application software stack for optimal and predictable performance.

One member of the Oracle Engineered Systems family is Oracle SuperCluster. This product provides unique value for multi-tier enterprise services that require high performance, scale, and availability.

9.2.1 Oracle SuperCluster

Oracle SuperCluster shares many properties associated with other members of the engineered system family. It differs from Exadata and Exalogic, however, in that it supports high-performance database and application tiers; it also permits a heterogeneous mix of database and application versions, whereas Exadata and Exalogic tightly integrate and certify specific application versions. Oracle SuperCluster is especially distinct in that it leverages SPARC, Solaris, and the related virtualization technologies of Oracle VM Server for SPARC and Solaris Zones instead of the x86 servers and Oracle Linux used in most Oracle Engineered Systems.

Oracle SuperCluster has been evolving over SPARC and Solaris product generations with several consistent themes. SuperCluster systems are delivered in a fixed number of preplanned configurations containing SPARC “compute nodes” with logical domain and Solaris Zones virtualization, shared general-purpose storage and specialized Exadata storage cells, an InfiniBand communications backbone, and a common management infrastructure and toolset.

The first SuperCluster systems used the SPARC T4 system as the compute node building block, with successive generations being based on later SPARC platforms. At the time of this writing, the latest version is SuperCluster M7, which is based on the M7-8 server platform. This section will focus on the M7-8 version, though many of the principles described here also apply to its predecessors.

SuperCluster heavily leverages virtualization, and it uses SPARC physical domains, Oracle VM Server for SPARC, and Solaris Zones for isolation, multitenancy, and resiliency, and to permit concurrent operation of different Solaris OS versions.

9.2.2 Hardware Architecture

Oracle SuperCluster M7 contains the following components:

Image One or two SPARC M7 chassis, each with two physical domains; each domain contains one to four processors with 32 cores per processor, and a rack holds up to 8 TB of RAM. The M7 servers are powerful systems with software-in-silicon hardware features that accelerate database operations, support encryption, and protect data integrity against storage corruption and overlays. Each M7 server can be subdivided into two 4-socket “physical domains” (PDoms), as discussed in Chapter 5, “Physical Domains,” each of which can be treated as a stand-alone server that can independently fail, reboot, or be serviced. Up to 18 SuperCluster M7 systems can be connected together on the same InfiniBand network.

Image ZS3 (ZFS storage appliance) with 160 TB (raw) capacity for virtual machine and application use.

Image Exadata storage servers for Oracle Database data, with between 3 and 11 servers, each of which has 96 TB disk and 12.8 TB flash memory capacity. Besides raw capacity and speed, Exadata storage servers are “intelligent storage” that offload database functions such as database scans, filtering, joins, and row selection. Queries that might require hosts to read millions of records, and then discard rows that do not match the query, are offloaded and filtered in storage, with only relevant rows being returned. This technique reduces server CPU utilization and core requirements, minimizes the amount of data movement, and speeds up database operations beyond what can be accomplished with regular storage. Costs are further reduced by hybrid columnar compression performed on the storage device.

Image QDR 40 Gb/s InfiniBand network. This permits high-speed communication between SuperCluster components.

Scale is enhanced by adding incremental storage and by scaling to a multi-rack configuration, all composed as a single system. Additionally, SuperCluster can use external storage, typically based on FibreChannel arrays, that may already exist in an enterprise architecture.

9.2.3 Virtualization Architecture

Oracle SuperCluster uses layered virtualization based on physical domains, Oracle VM Server for SPARC, and Solaris Zones. Zones run within logical domains, which themselves run within physical domains. Each of these virtualization technologies runs with zero or near-zero overhead, making it possible to leverage virtualization without incurring a performance penalty.

General-purpose (non-engineered) systems also use these virtualization technologies, and can organize them into a variety of combinations depending on user requirements. This flexibility permits different levels of performance, availability, cloud capability, live migration, consolidation levels, workload mixes, resource sharing, and dynamic resource management.

A design goal of Oracle SuperCluster is to yield the highest possible performance, with predictable behavior for each workload. To meet that goal, it uses a small number of preplanned configurations with optimal performance and pretested, validated, deterministic behavior.

9.2.4 Physical Domains

Each SuperCluster M7 compute node is divided into two physical domains for maximum isolation and independence. Each physical domain acts as a separate computer system in the same M7 chassis, and has half the CPU, memory, and I/O resources of the M7 compute node.

A chassis can have two, four, or eight CMIOU (CPU, memory, I/O) boards, depending on whether the chassis is quarter, half, or fully populated.

9.2.5 Logical Domains

Each PDom uses Oracle VM Server for SPARC logical domains with specific preset domain configurations and roles. Compared to general-purpose deployments that are oriented toward flexibility and convenience, this design emphasizes static resource assignment in fixed configurations to provide maximum performance.

SuperCluster PDoms host a small number of high-performance logical domains. An Oracle SuperCluster M7-8 PDom can have as many logical domains as CMIOUs: A PDom with one CMIOU board (in a quarter-populated chassis) can have one logical domain, while a PDom with four CMIOUs can have up to four logical domains. By comparison, general-purpose SPARC systems can have as many as 128 logical domains on the same server or physical domain, with resource granularity down to the individual CPU thread.

SuperCluster logical domains use physical I/O and static domain allocation when optimal performance is needed. General-purpose Oracle VM Server for SPARC deployments provide very good performance while offering flexibility and the benefits of virtual I/O, such as live migration and dynamic reconfiguration; SuperCluster, in contrast, ensures native or near-native I/O performance under all circumstances.

SuperCluster optimizes performance by aligning domain CPU, memory, and I/O resource allocations to eliminate non-uniform memory access (NUMA) latency. NUMA effects can sharply reduce performance on vertically scaled, multi-socket servers. SuperCluster uses topology-aware rules to arrange assignment of CPUs, RAM, and PCIe buses to ensure local memory access. This can have a substantial effect on ensuring maximum performance and eliminating overhead and variability due to NUMA latency.

SuperCluster domains have specific application-oriented roles that are superimposed on the standard logical domain types.

9.2.5.1 Dedicated Domains

Dedicated domains are statically defined at SuperCluster installation time. They own PCIe root controllers, with direct physical access to 10 GbE NICs and to InfiniBand HCAs for storage and Exadata private networks. This setup ensures bare-metal performance for I/O, and it eliminates dependencies on other domains.

Dedicated domains run Oracle Solaris 11, and are further designated as being either database domains or application domains. The primary difference between these types is that database domains access Exadata storage cells over the private Exadata InfiniBand network for optimized Oracle Database 11g Release 2 or Oracle Database 12c performance. Application domains can run arbitrary applications (hence the name) as well as older versions of Oracle Database or other database software.

The number and configuration of dedicated domains is established at installation time, and can be changed on a post-installation basis by Oracle staff. Resource granularity is provided in increments of 1 CPU core and 16 GB of RAM.

9.2.5.2 Root Domains

Root domains run Oracle Solaris 11 and host SR-IOV physical functions (PFs) to provide virtual functions (VFs) for I/O domains, as described later in this chapter. SR-IOV devices may be Ethernet, InfiniBand, or FibreChannel.

Root domains are also Oracle VM Server for SPARC service domains, which provide I/O domains with virtual devices for iSCSI boot disks.

Applications are not run in SuperCluster root domains, so as to avoid compromising the performance or availability of the I/O domains that depend on them. This is consistent with best practices for service domains in general-purpose Oracle VM Server for SPARC environments.

Root domains are kept small so as to make resources available to the other domain types, yet must be large enough to drive the virtual I/O they provide. A root domain with one InfiniBand HCA and one 10 GbE NIC typically has two cores and 32 GB of RAM; those resources are doubled if it hosts two HCAs and two 10 GbE NICs. This sizing is consistent with the general-purpose Oracle VM Server for SPARC practice of balancing service domain performance, with the additional consideration of static allocation and heavy use of physical I/O.

Like dedicated domains, root domains are configured at installation time and can be subsequently reconfigured by Oracle staff.

9.2.5.3 I/O Domains

I/O domains run Oracle Solaris 11 and use SR-IOV virtual functions for applications. Corresponding SR-IOV physical functions are served by the root domains, as described previously. This scheme provides near-native I/O performance for Ethernet, InfiniBand, and FibreChannel devices. I/O domains use virtual I/O devices for iSCSI boot devices; these devices are provided by root domains using disk back-ends in the SuperCluster’s built-in ZFS storage appliance. The term “I/O domain” is used somewhat differently in SuperCluster than with standard logical domains: I/O domains in SuperCluster consume virtual device resources and SR-IOV virtual functions from root domains, whereas outside the context of SuperCluster, an I/O domain is one that has direct connections to physical I/O devices.
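Outside SuperCluster, where the I/O Domain Creation tool performs these steps, the underlying mechanism can be sketched with general-purpose ldm commands run in the control domain: list the physical functions, create a virtual function on one of them, and assign it to a domain. The PF path and the domain name iodom1 below are hypothetical:

primary# ldm list-io
primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF0 iodom1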

Unlike the other domain types, I/O domains can be dynamically created and then destroyed in the post-installation environment using the I/O Domain Creation tool, which lets data center administrators define the domain and associate CPU, memory, and I/O resources—in particular, assigning InfiniBand and 10 GbE virtual functions from designated root domains. Resources are drawn from “parked” (unassigned) resources in CPU or memory repositories, and returned to the repositories when domains are removed.

As with dedicated domains, I/O domains can be further characterized as either database domains or application domains. There are, however, physical distinctions between dedicated domains and I/O domains: dedicated domains own PCIe root buses for direct physical I/O, whereas I/O domains use SR-IOV and rely on static resource assignment to avoid NUMA effects.

9.2.5.4 Domain Considerations

Databases and other applications are hosted in dedicated domains or I/O domains. Dedicated domains are recommended for larger applications using more than one 32-core processor (sometimes referred to as a socket), while I/O domains are better suited for smaller applications needing up to one 32-core processor. That reflects the higher I/O capacity available to dedicated domains, and it maximizes the benefit from NUMA optimizations that permit scalable performance over multiple CPU chips. Further, dedicated domains own the physical resources for their I/O and are subject to neither variations in performance from competing use of a physical I/O resource nor availability dependencies on a root domain.

I/O domains provide native or near-native performance for a single application of moderate size. A chief benefit of I/O domains is that they can be dynamically created or destroyed as needed by using the I/O Domain Creation tool described earlier (CPU and memory allocations of existing domains are adjusted with the osc-setcoremem administrative tool). They depend on root domains for their I/O. While I/O performance is expected to be near-native, I/O domains that share a physical I/O resource can compete with one another for it.

In practice, when starting or stopping SuperCluster systems, root domains must be started up before I/O domains to make the necessary services and SR-IOV functions available. The reverse is true when shutting systems down: The I/O domains should be stopped first, and then the root domains on which they depend.

Root domains should be considered infrastructure domains whose purpose is to host the I/O resources needed by the I/O domains. Applications should not run in them, so as not to compromise the availability and performance of those I/O domains.

All domains on SuperCluster M7-8 run Solaris 11.3 or later. Branded zones, described in the next section, can be used in dedicated or I/O domains for applications certified with Solaris 10. Earlier versions of SuperCluster can run Solaris 10 and earlier versions of Solaris 11.

9.2.6 Oracle Solaris Zones

Dedicated domains and I/O domains can host Solaris Zones—an arrangement that permits increased virtualization density compared to the number of logical domains supported on a SuperCluster system. Solaris Zones have no overhead, so there is no performance penalty for adding zones to logical domains within physical domains. Layered virtualization with zones provides granular resource allocation, workload isolation, and security between workloads.

Zones are recommended when additional resource sharing and fine-grained allocation are needed. By default, zones are defined without dedicated CPU and memory resources, so resources not used by an idle zone are available to other zones without any administrative effort. CPU allocations to zones can be controlled with the Fair Share Scheduler (FSS), except for database zones; database zones also should not use rcapd for memory controls (these rules apply in non-SuperCluster environments, too). It is also possible to define zones with dedicated CPU and memory resources, as with general-purpose Solaris deployments. The same Solaris administrative tools for zones (zoneadm, zonecfg) are used as on general-purpose systems.

Both dedicated domains and I/O domains can use native Solaris 11 zones. In each of those categories, application domains can optionally use Solaris 10 branded zones. Solaris 10 branded zones provide the appearance of running an application in a Solaris 10 system for applications that have not been validated against Solaris 11, or have been imported from physical Solaris 10 instances elsewhere. Earlier SuperCluster editions support running Solaris 10 directly, and also permit Solaris 8 and Solaris 9 branded zones to support even older software stacks on more recent hardware.

9.2.7 Summary of Oracle SuperCluster Virtualization

Oracle SuperCluster uses layered virtualization based on physical domains, logical domains, and Solaris Zones. It provides zero or near-zero overhead virtualization to deliver maximum performance and scalability. SuperCluster takes advantage of virtualization where it is appropriate, but avoids it where performance demands that: notable examples are the use of physical I/O to avoid virtualization overhead, and static allocation to provide optimal assignment of CPU, memory, and I/O for the lowest latency. Compared to general-purpose systems that leverage logical domains, this kind of virtualization is static in nature, and it is designed for a relatively small number of large, high-performance OS instances rather than a large number of small OS instances. This design choice is consistent with SuperCluster's goal of being the optimal platform for large enterprise applications and databases.

9.3 Virtualization with Secure Enterprise Cloud Infrastructure

Virtualization concepts open the door to a bewildering array of choices. Oracle Optimized Solutions focus on a small set of workloads, or a specific type of deployment, delivering proven guidance that reduces the number of choices to a manageable set. The result is a balance of integration and flexibility. This section describes the application of virtualization to one of those solutions: Secure Enterprise Cloud Infrastructure.

9.3.1 Introduction

Oracle researches practical compute architectures, with the intention of reducing the amount of research, testing, and risk associated with “do it yourself” data center projects. Its findings are documented as Oracle Optimized Solutions, which include detailed descriptions of hardware and software configurations. One of these designs is the Secure Enterprise Cloud Infrastructure (SECI) solution. This section describes the role that virtualization plays in SECI, the technologies chosen, and the rationale for those choices.

Many corporations have benefited from public cloud computing environments. Advantages include reduced capital expenditure costs and increased business agility. However, public cloud computing is not appropriate for every workload. Government regulations, for example, require in-house storage of certain types of data. Some corporations also have concerns regarding information security; these concerns are fed by the occasional high-profile attack as well as aggregate data about attacks that are not publicized.

To benefit from the advantages of cloud computing, while avoiding the known and perceived weaknesses of this approach, many corporations are creating “private clouds”—that is, collections of compute, networking, and storage equipment that can be used as easily and flexibly as public clouds, while allowing the organization to maintain in-house control over the data and processes. Private clouds also deliver an interim solution for corporations that want to take a phased approach to public cloud adoption.

Private clouds are often used for development and test, and for smaller production workloads, especially when many people are performing these tasks and must use an isolated compute environment for a short period of time—on a scale of hours to days. Large production workloads are not so dynamic and require the best performance and scalability, which is not always compatible with the cloud computing model, either public or private.

As a private cloud solution, SECI is able to provide multiple cloud computing service models, as defined by the National Institute of Standards and Technology (NIST): Software as a Service (SaaS), Platform as a Service (PaaS) (including Database as a Service [DBaaS]), and Infrastructure as a Service (IaaS). The SECI Implementation Guide details the steps to create an IaaS cloud. Other documents describe other service models for SECI.

The SECI design enables typical IaaS capabilities, greatly simplifying the provisioning of virtual environments (VEs) and the operating system instances running in them. The features in Oracle Enterprise Manager Ops Center use a simple point-and-click interface that is aware of compute nodes, storage devices, and network configurations. It can deploy VEs, both Oracle Solaris Zones and Oracle VM Server for SPARC logical domains, and also deliver Solaris packages into those VEs.

With OEM Cloud Control, you can also automate provisioning of database environments into those VEs. The SECI DBaaS guide adds another layer of functionality. DBaaS is one type of PaaS, giving self-service users such as database administrators and database testers the ability to quickly instantiate a database environment. An element in a service catalog might be a simple database instance, or it might include data protection, business continuity, and archival features. In simpler configurations, users specify just the quantity of basic compute resources, such as CPU cores, RAM, and amount of disk storage.

Similar functionality exists for integrated development environments (another type of PaaS), as well as for Oracle Fusion Middleware, Oracle SOA Suite, and others.

Finally, Oracle Enterprise Manager delivers automated provisioning of SaaS software such as Oracle and ISV applications, which can also be deployed in an SECI environment.

9.3.2 SECI Components

Like most Oracle Optimized Solutions, SECI includes hardware and software specifications as well as architecture details so that you can build a configuration that has already been tested with specific combinations of software. This solution is built on SPARC computers that use the new SPARC M7 processors, the Oracle Solaris operating system, and Oracle storage systems.

Because a basic environment consists of servers in one data center rack, either a small number of larger systems or a large number of smaller systems can be configured. The compute nodes in an SECI environment are SPARC systems that use one, two, or four SPARC M7 processors, resulting in a maximum of 640 compute cores per base rack. Each rack also includes two ZFS storage appliances, storing petabytes (PBs) of data.

Two types of expansion racks exist. Compute racks hold up to 20 servers, with up to 1024 CPU cores for each rack. Storage racks can combine SSDs and HDDs. Eight racks may be connected together, creating one cloud with thousands of cores and thousands of terabytes of data.

At the time of this book’s writing, the smallest compute node in an SECI environment was a SPARC T7-1 system, which includes one SPARC M7 processor and 512 GB of RAM. The midsize node was a SPARC T7-2, and the largest was a SPARC T7-4, including two and four SPARC M7 CPUs, respectively.

A data center rack configured for SECI can hold up to 14 T7-1 computers, yielding 448 cores per rack. An expansion rack may be added that holds 20 of these computers. The other computer models lead to different quantities of systems per rack and cores per rack, with a maximum of 640 cores per rack using SPARC T7-4 nodes.

The SECI documents refer to these compute nodes as “virtualization servers.” Because these systems use the same hardware, operating system, and infrastructure software in an SECI private cloud as they do when set up as individual systems, the performance of workloads in virtualization servers should be the same in the two different configurations.

Each virtualization server runs the Oracle Solaris 10 or 11 operating system in logical domains using the Oracle VM Server for SPARC features described elsewhere in this book. Workloads that require the use of Solaris 8 or 9 can run in Solaris “branded zones” that maximize software compatibility. Branded zones are described in Chapter 3, “Oracle Solaris Zones.”

Oracle Enterprise Manager Ops Center, described in Chapter 7, “Automating Virtualization,” is the primary infrastructure management interface to SECI environments. Ops Center creates server pools, which are groups of logical domains or Solaris Zones configured to run a specific set of workloads. These VEs are housed on the storage systems, enabling them to boot on any compute node in their server pool. Figure 9.2 shows the high-level components of a server pool.

Image

Figure 9.2 A Server Pool

Ops Center also provides tools to manage the servers and VEs, including provisioning and updating software and firmware. Monitoring tools are also included, enabling you to observe utilization and investigate changes in performance over time.

9.3.3 Service Domains

If you choose to create logical domains that use only physical I/O, the number of domains that you can create is limited by the quantity of PCIe slots in the computer. To create more logical domains, you must use virtual I/O. Chapter 4, “Oracle VM Server for SPARC,” describes service domains—that is, logical domains that “own” PCIe slots and use them to provide virtual I/O services to guest domains. These services include virtual network devices and virtual disks.

SECI uses control domains and other logical domains to act as service domains. The use of guest domains that access storage and the network via service domains increases the workload density of one rack.

In the context of SECI, VEs in server pools are guest domains or zones in guest domains. In other words, these VEs rely on the virtual I/O provided by service domains. Achieving the performance and availability goals of server pools requires proper configuration of the service domains they will use.

Server pools that use logical domains require the use of one or two service domains that provide virtual I/O services to the guest domains that run user workloads. These service domains should be configured with sufficient CPU and RAM so that they can efficiently deliver their services. As a general rule, we recommend that service domains have at least two CPU cores. If a service domain uses more than one 10 GbE port or FC card, an additional core should be configured for each port in use, subject to measured CPU load and memory consumption.

Generally, service domains should be configured with 16 GB of RAM. In environments that plan for low workload density and do not use ZFS or NFS for virtual disk back-ends, 8 GB of RAM may be sufficient.
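On general-purpose systems, these values are set with the ldm command from the control domain. A hedged sketch for a service domain named secondary (the name is illustrative, and the sizes follow the guidelines above):

primary# ldm set-core 2 secondary
primary# ldm set-memory 16G secondary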

9.3.4 Server Pools

A core concept of Secure Enterprise Cloud Infrastructure is the “server pool.” This logical design element represents a set of VEs that are configured to run in a set of computers. These computers are called “virtualization servers” in this context. You use OEM Ops Center to configure each VE.

After a server pool has been created, self-service users can use OEM Cloud Control to specify the desired characteristics such as the quantity of cores. Based on the user-provided information, Cloud Control chooses an unused VE that satisfies the need at hand. The infrastructure software then performs any necessary configuration changes such as logical storage connections, and starts the VE and any requested additional workload. It also restarts VEs after software or hardware failure.

A server pool uses a variety of physical resources, including compute and storage assets. When a VE is not running, it uses only storage space. While the VE is running, its workload uses CPUs, RAM, and network devices associated with that VE. The cloud administrator controls the amount of those resources allocated to the VE, and determines whether a VE will share resources or use dedicated resources, in conjunction with Ops Center and Cloud Control.

All VEs are stored on the ZFS storage appliances. Storage for logical domains is typically accessed from the VE via NFS, but Solaris Zones use iSCSI or FC protocols. Storage for application data is not limited to the SECI hardware, but can be connected by standard methods and protocols.

In addition, Ops Center includes availability and load-balancing policies. These features reduce the amount of downtime usually associated with updates and upgrades of hardware and software, and minimize the amount of downtime caused by software or hardware failures. Load-balancing features ensure adequate performance of all workloads, or of the most important workloads, if resource consumption of one or more VEs changes over time.

The process of updating a computer’s firmware may require rebooting the computer. When performing this administrative operation, SECI uses existing virtualization features to minimize or eliminate downtime for workloads. Ops Center supports live migration features of the SPARC platforms. In an SECI environment, VEs are installed on shared storage, allowing the user to move a running VE from one computer in a pool to another in the same pool, without any disruption of the workload running in that VE. After all workloads have been migrated to other physical nodes, the firmware may be updated. As mentioned earlier, this ability has many other practical applications.

SPARC systems have very robust uptime characteristics. Sometimes, however, a VE may fail, or the virtualization server it was using may fail. In that case, Ops Center automatically restarts the VE, using a different server if necessary. If the failure of a virtualization server leads to restart of several VEs, Ops Center restarts them in order of their importance—a parameter that you choose when configuring VEs.

In addition, you can configure a server pool so that it periodically compares the resources being used by the VEs, resulting in automated live migrations of one or more VEs. Several parameters control the details of this load-balancing feature. You can also manually initiate a load-balancing operation.

9.3.5 Security

SECI benefits from encryption in VEs, with no additional performance impact compared to a non-virtualized environment. The encryption technologies used in SECI do not require any modification of software. The network and storage configuration can use encryption, in a manner that remains transparent to the application.

Oracle Solaris contains a diverse set of security features: Role-Based Access Control (RBAC), which offers fine-grained administrative permissions and improves administrative accountability; VLANs to isolate network traffic; and ZFS data encryption to protect data at rest. SECI is able to leverage all of these features.

SECI environments typically use separate networks for management, storage, and workload data to further isolate access. Separate networks make it easier to achieve the proper balance between security and functionality in each context.

9.3.6 Planning of Resources and Availability

You can consider an SECI environment to be one large server that has been partitioned—albeit without the high price point typically associated with large systems. When planning VEs and server pools as part of this environment, several factors should be considered.

Compute and memory resources will be configured for a VE, but not assigned until the VE is booting on a virtualization server. Conversely, a VE begins consuming persistent storage—hard disk drives or flash storage devices—when it is provisioned, and it continues to use that storage until the VE is destroyed. Each VE will be stored on an NFS share or a FC or iSCSI LUN. To meet the typical requirements, storage of the operating system should be mirrored, either within the share or LUN or by mirroring two shares or LUNs.

For Solaris Zones, we recommend a minimum of 1 vCPU (a hardware thread) and 2 GB of RAM for each zone. Each logical domain should be assigned at least 1 core and 16 GB of RAM. For each type of VE, the maximum quantities are limited only by the amount of physical resources in each server. Generally, the hardware resources needed in a VE are the same as those needed in a non-virtualized environment.

Planned hardware and software maintenance will require the migration of VEs to other servers in the server pool. Although it is not necessary to configure each physical node so that it can run all of the workloads, you should plan for at least one or two computers to be unavailable. For example, perhaps you will choose a group of eight nodes, sized so that six of them together are able to comfortably run all of the workloads. Then, if the environment experiences a hardware failure in one node while you are performing planned maintenance on another node, there will be no outage of the service provided to your customers.

Traditional high availability (HA) of services can be achieved with the optional Oracle Solaris Cluster software. Solaris Cluster can detect an application or VE failure very quickly; when it identifies such a state, it can perform an automated failover to another physical node.

Each physical node's I/O connectivity (storage and network) must be sized to handle the largest aggregate demand of the VEs that may run on that node. The I/O connections must also provide availability appropriate for the most demanding VE that will run on the node.

The back-end storage characteristics must be appropriate for all of the VEs. For example, if one or more workloads are databases that require a certain I/O worst-case transaction latency, the virtual storage, including the back-end, must deliver that latency or better.

The ability to react to changes in workload demand suggests that a good default assignment of VEs to computers should consist of one very important workload, plus a few other less important workloads. If one or more of the workloads unexpectedly increases, one or more of the less important workloads can be moved to other computers in the server pool. If the increase involves the critical workload, the other workloads can be moved and the resources available to the critical one can be increased. If the increase is in another workload, it can be moved to a computer with spare capacity, or other small workloads can be moved, freeing up resources that can be reassigned.

9.3.7 Conclusion

SECI uses virtualization to achieve flexible, highly automated consolidation. The ease of VE migration enables greater business agility, such that the system can more easily respond to changes in business volume. Some of those changes are predictable: A quarter-end workload, for example, might be expanded in one node after moving other workloads to other nodes. Other changes are unexpected. For example, a consumer product may become much more popular than expected, leading to a rapid increase in online ordering from a retailer. With SECI, you can quickly adapt to this shift in demand for compute resources.

9.4 Virtualization in Oracle Exalytics

Another example of Oracle Engineered Systems is the Oracle Exalytics In-Memory Machine (Exalytics, for short). Exalytics is designed for the problem domain of business analytics and decision support. It places special emphasis on in-memory processing, which is important for efficient handling of the ad hoc queries typical of business analytics and decision support.

Exalytics comes in two varieties: (1) an x86 version and (2) a SPARC version based on the T5-8 server. Virtualization is used in both products, with each using the version of Oracle VM for its chip architecture.

The SPARC version of Exalytics is based on the T5-8 server platform, with 128 CPU cores (1024 CPU threads) and 4 TB of RAM. I/O uses solid-state disks, InfiniBand, 10 GbE networking, and FibreChannel. Exalytics provides enterprise-wide analytic processing for an institution that needs vertical scale and high performance, as well as a consolidation platform to prevent server sprawl caused by departmental solutions in multiple business units.

As with SuperCluster, the T5-8 Exalytics system can be configured using one of a preset number of logical domain configurations, with Oracle Solaris Zones supporting higher-granularity virtualization in each domain. This configuration permits concurrent operation of production and testing on the same platform, or co-residence of multiple business units, with ensured isolation, security, and non-interference.

The virtual environments host the Oracle analytic software: Oracle Business Intelligence EE Accelerator, Oracle In-Memory Data Caching, Oracle BI Publisher Accelerator, Oracle Business Intelligence Foundation Suite including Essbase, and Oracle TimesTen In-Memory Database for Analytics.

The special advantage of this platform is the “in-memory acceleration” exploited by the application components. In-memory processing is essential for large business analytics and decision support applications, and the T5-8 platform provides the necessary memory and CPU capacity. Logical domains and zones permit multitenancy, yet add zero overhead to the solution stack. As with SuperCluster, the engineered system benefit is provided by a predefined set of configuration options, and by deployment with solid-state disks and InfiniBand interconnects. That configuration lets the applications efficiently leverage the large memory of the T5-8 platform.

9.5 Consolidating with Oracle Solaris Zones

Solaris Zones are excellent environments for consolidating many kinds of applications. For example, consolidating multiple Oracle Solaris web servers into zones on one system is a straightforward process, but you can also consolidate web servers from other operating systems. Consolidating into zones provides better isolation and manageability than simply collapsing the contents into one web server. Benefits of using zones in this situation include workload isolation, comprehensive resource controls, delegated administration, and simple mobility, among others.

This section demonstrates the simplicity of migrating Apache web server environments from multiple UNIX and Linux systems to one Oracle Solaris 11 system. This example is relatively simple because Apache’s configuration files use the same syntax, for any one version, on any UNIX or Linux system.

Nevertheless, slight differences in file system layout must be taken into account during this consolidation. On most Linux distributions, the default location for Apache’s configuration file is /etc/httpd/conf/httpd.conf, while on Solaris it is /etc/apache2/2.2/httpd.conf. Further, the default home of web pages is /var/www/html on most Linux systems, but is /var/apache2/2.2/htdocs on Solaris 11 systems.

To further simplify this example, we will assume that the web servers are delivering static web pages that exist in one NFS file system. Migrating scripts that deliver dynamic content may require more effort.

The web servers in our example have equal importance. We will use resource controls to ensure that each web server has sufficient and equal access to system resources. Each zone will be assigned 100 CPU shares and allowed to use 1 GB of RAM, 2 GB of virtual memory, and 100 MB of locked memory.

The commands shown in this section assume that the original web servers have been stopped. If you must minimize the service outage, you can choose different IP addresses for the zones and change the DNS maps after you have tested the zones. If the original Apache servers are part of a single load-balanced configuration, you can perform a rolling upgrade from the original systems to the new zones.

In the command examples shown here, the prompt GZ# indicates that you should enter the command from a shell running in the global zone. The prompt web01# indicates that you should enter the command from a shell running in the non-global zone named web01.

This example assumes that you are familiar with the content covered in Chapter 3, “Oracle Solaris Zones.”

9.5.1 Planning

The first step in the consolidation process is gathering the necessary information. For our example, there are five web servers, each with its home page in /webpages/index.html. These servers are named web01 through web05. The new system, running Oracle Solaris 11, will have five zones named web01 through web05. Each original system had one IP address: 10.1.1.101 through 10.1.1.105. Those IP addresses will move to their respective zones, as shown in Figure 9.3.

Image

Figure 9.3 Consolidating Web Servers

Each web server mounts a shared NFS file system at /webpages. This directory is used in the Apache configuration file, /etc/apache2/2.2/httpd.conf, with the following directive:

DocumentRoot "/webpages"

9.5.2 Configure CPU Utilization

It is possible to enable the Fair Share Scheduler as the default scheduler for the system, for each zone, or for any combination of those. However, to effectively assign CPU shares to a zone, you must also make FSS the default scheduler for the system and assign shares to the global zone. To make the system boot with FSS as the default scheduler on subsequent reboots, enter this command in the global zone:

GZ# dispadmin -d FSS

To change the default scheduler to FSS without rebooting, enter this command:

GZ# priocntl -s -c FSS -i all

To immediately make FSS become the default scheduler and be used for all future system boots, you must enter both commands.

The FSS scheduling algorithm treats processes in the global zone the same way that it treats processes in non-global zones. To ensure that you can run programs in the global zone, including the commands that manage zones, assign a sufficient quantity of shares to the global zone as well. The following command accomplishes that goal:

GZ# zonecfg -z global
zonecfg:global> set cpu-shares=100
zonecfg:global> exit
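The zonecfg setting persists across reboots. To inspect or adjust the live value immediately, prctl operates directly on the zone.cpu-shares resource control:

GZ# prctl -n zone.cpu-shares -i zone global
GZ# prctl -n zone.cpu-shares -v 100 -r -i zone global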

9.5.3 Create Zones

We will first create and customize one zone, then replicate it. After Oracle Solaris 11 is installed on the new system, the process of installing a zone begins with configuring it. Chapter 3, “Oracle Solaris Zones,” describes the individual commands used in the rest of this section.

GZ# zonecfg -z web01
web01: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:web01> create
zonecfg:web01> set cpu-shares=100
zonecfg:web01> add capped-memory
zonecfg:web01:capped-memory> set physical=1g
zonecfg:web01:capped-memory> set swap=2g
zonecfg:web01:capped-memory> set locked=100m
zonecfg:web01:capped-memory> end
zonecfg:web01> select anet linkname=net0
zonecfg:web01:anet> set allowed-address=10.1.1.101/24
zonecfg:web01:anet> end
zonecfg:web01> verify
zonecfg:web01> exit

Use sysconfig(1M) to create a system configuration profile for web01. Choose “None” for networking, because the network configuration is driven by the zone’s configuration.

GZ# sysconfig create-profile -o /tmp

SC profile successfully generated as:
/tmp/sc_profile.xml

Exiting System Configuration Tool. Log is available at:
/system/volatile/sysconfig/sysconfig.log.6445

Install web01 with the following system configuration profile:

GZ# zoneadm -z web01 install -c /tmp/sc_profile.xml
The following ZFS file system(s) have been created:
    rpool/VARSHARE/zones/web01
Progress being logged to /var/log/zones/zoneadm.20160516T143228Z.web01.install
...
Log saved in non-global zone as /system/zones/web01/root/var/log/zones/zoneadm.20160516T143228Z.web01.install
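Next, boot the new zone and log in to it; the zlogin session provides the web01# shell used in the following steps:

GZ# zoneadm -z web01 boot
GZ# zlogin web01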

Once the zone is running, you can configure it further as a web server. First, configure the zone to be an NFS client. The file /etc/vfstab will need an entry like this one:

nfsserver:/web   -   /webpages   nfs   -   yes   intr,bg

A new zone does not automatically mount NFS shares, but it is very easy to enable the NFS client service, along with any services it depends on:

web01# svcadm enable -r nfs/client

Although Oracle Solaris includes the Apache web server software, it is not installed by default. The following command will perform this installation, adding any other packages that are required:

web01# pkg install apache-22
           Packages to install:  7
           Mediators to change:  3
            Services to change:  2
       Create boot environment: No
Create backup boot environment: No
...
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Updating package cache                           1/1

You can configure the web server software after installation is complete. For this example, we simply tell the software the location of the static web pages by editing the DocumentRoot directive in /etc/apache2/2.2/httpd.conf, as shown earlier. Then enable the web server service. This starts the web server now and tells Solaris to start the web service automatically when the zone boots. Solaris will also automatically restart the web service if it fails.

web01# svcadm enable apache22

At this point, the zone web01 is a functioning web server. Now we can replicate that zone and its configuration and customization, with two exceptions, to complete the project.

First, two preparation steps must be completed. The first command stops the original zone so that it can be replicated; the second saves its zone configuration in a file that can be used as input for the configuration of the other zones:

GZ# zoneadm -z web01 shutdown
GZ# zonecfg -z web01 export -f /tmp/web.cfg

The next few steps must be performed once per additional web server. The commands that follow demonstrate the creation of one of these zones.

Using the zone configuration information from web01 as a starting point, configure a new zone. The only difference is the IP address, so you can simply edit the file that you created earlier, /tmp/web.cfg. Then configure the zone with this command:

GZ# zonecfg -z web02 -f /tmp/web.cfg

You must also modify the system configuration information that you saved in /tmp/sc_profile.xml. The only change needed is the node name, which must be changed from web01 to web02.
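
In the profile, the node name is typically carried by the svc:/system/identity:node service. The relevant property line looks something like the following (the exact XML layout may vary between Solaris releases); change its value from web01 to web02:

<propval type="astring" name="nodename" value="web01"/>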

With that, we are ready for the final step: cloning the original zone.

GZ# zoneadm -z web02 clone -c /tmp/sc_profile.xml web01
The following ZFS file system(s) have been created:
    rpool/VARSHARE/zones/web02
Progress being logged to /var/log/zones/zoneadm.20160516T154210Z.web02.clone
Log saved in non-global zone as /system/zones/web02/root/var/log/zones/zoneadm.20160516T154210Z.web02.clone

Using the clone subcommand replicates the configuration modifications that have been made to web01, including enabling the NFS client service and Apache web service, and their configuration files.

9.5.4 Testing

You should now be able to test the zones with a web browser, using their host names or IP addresses.
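
For a quick check without a browser, you can fetch each zone's home page from the global zone or from any other host on the network. This sketch assumes that curl is installed (pkg install web/curl) and that web02 was assigned the address 10.1.1.102:

GZ# curl -s http://10.1.1.101/ | head
GZ# curl -s http://10.1.1.102/ | head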

9.5.5 Summary

Solaris Zones make it easy to consolidate multiple workloads from separate systems into one system, as there is only one copy of the OS to manage. This section demonstrated a manual method of creating a few similar zones. If your situation calls for a large number of zones, use of the Solaris Automated Installer will simplify the entire process of zone creation by automating it.

9.6 Security Hardening with Oracle Solaris Zones

Previous sections in this chapter described the use of Oracle Solaris virtualization in Oracle Engineered Systems. Many simpler uses of virtualization exist. For example, server virtualization is commonly used to consolidate multiple workloads onto one computer.

Oracle Solaris Zones have subtle uses in addition to their application for general-purpose consolidation. These uses rely on a unique combination of features:

- Service Management Facility (SMF) services are configured separately for each zone, allowing you to turn off unneeded services in a zone, such as Telnet, FTP, and even SSH, yet still allow secure access from the platform administrator's environment, called the global zone.

- Zones have a strict security boundary that prevents direct inter-zone interaction.

- You can prevent unauthorized modification of Oracle Solaris programs with the file-mac-profile feature. This yields an "immutable" zone.

- Privileges granted to a zone are configurable, enabling a platform administrator to further restrict the abilities of a zone, or to selectively enhance the abilities of a zone.

- Resource management controls can be assigned to each zone, allowing the platform administrator to limit the amount of resources that the zone can consume.

How can this combination provide unique functionality?

Solaris Zones can be configured to be more secure than general-purpose operating systems in many ways. For example, even the root user of an immutable zone cannot modify the zone’s operating system programs. This limitation prevents Trojan horse attacks that attempt to replace those programs with malicious programs. Also, a process running in a zone cannot directly modify any kernel data, nor can it modify, add, or remove kernel modules such as device drivers. Zones lack the necessary privileges to modify the operating system and its kernel, and there is no mechanism to add privileges to a running zone from within the zone or anywhere else.

Even considering those measures, the ability to selectively remove privileges can be used to further tighten a zone's security boundary. Other features make it easy to disable all network access except for specific network services. This feat is difficult to accomplish in most operating systems without rendering the system unusable or unmanageable. After all, without SSH or Telnet service, how would you log in to such a system?

You can combine all of those limitations to achieve defense in depth—a strategy conceived by the U.S. National Security Agency to defend computers against unwanted intrusion. Disabling services limits the external attack surface of the zone. An attacker who tries to take advantage of a weakness in the service being provided, such as web server software, will find that the internal attack surface is also very small, because so little of the zone can be modified. If the zone is configured appropriately, an intruder who somehow gains entry cannot access other systems via the network, or can access only a specific list of systems and services provided by those systems.

The combination of the ability to enforce those limitations and the resource controls that are part of the functionality of Solaris Zones is very powerful. Collectively, they enable you to configure an application environment that can do little more than fulfill the role you choose for it.

This section describes a method that slightly expands a zone's abilities and then tightens the security boundary around the zone's intended application. It combines individual steps and knowledge from Chapter 3, "Oracle Solaris Zones." The example in this section uses the Network Time Protocol (NTP) service. Because this section is intended as a platform for a discussion of security, however, we do not provide a complete description of the configuration of NTP. You can visit http://www.ntp.org to obtain more information about the proper use of NTP. Note that many other services can be hardened by using this method.

The command examples in this section use the prompt GZ# to indicate a command that must be entered by the root user in the global zone. The prompt timelord# shows that a command will be entered as the root user of the zone named timelord.

9.6.1 Scenario

Imagine that you want to run an application on an Oracle Solaris system, but the workload running on this system must not be accessible from the Internet. Further, imagine that the application needs an accurate sense of time. Without zones, meeting both requirements typically means using multiple systems and a firewall. With Solaris Zones, you can accomplish those goals with just one system, and gain additional protection as well.

In this scenario, you will need two zones. One zone provides a traditional environment for the application, and will not be discussed further in this section. The second zone has the ability to change the system’s clock, but has been made extremely secure by ensuring that it meets the following requirements:

- The zone can make outbound network requests and accept responses to those requests.

- The zone does not allow any inbound network requests (even secure ones such as SSH requests).

- The zone can make outbound requests only to a specific set of IP addresses and port numbers associated with trusted NTP servers.

- The zone can set the system's clock, which is used by the global zone and most types of zones.

- The zone has minimal abilities besides the ones it needs to perform its task.

A Solaris zone can be configured to meet that list of needs.

Each zone has its own Service Management Facility. As with non-virtualized Solaris environments, network services provided by a zone do not accept remote connections by default, except SSH. The zone that will manage the clock can also be configured so that it does not respond to SSH requests. Likewise, configurable privileges enable you to remove unnecessary privileges from the zone. Typically, native zones share the global zone’s system clock, but may not modify it. A configuration setting can be used to permit a zone to modify the shared system clock.

Figure 9.4 shows the zone named timelord, the NIC it uses, and the system’s clock, which will be modified by the zone. It also shows a different internal network, plus the zone for the application. The application zone will share the bge0 NIC with the global zone.

Figure 9.4 A Secure Network Service

9.6.2 Basic Steps

The following outline shows the steps to accomplish the goals described earlier in this section. It can be generalized to harden any service, not just an NTP client.

1. Configure and install a zone.

2. Boot the zone.

3. Install additional software.

4. Remove unnecessary privileges.

5. Reboot the zone to verify correct operation without those privileges.

6. Permit the zone to modify the system clock.

7. Lock down the system configuration.

8. Test the zone.

In most cases, the order of those steps is not critical, but occasionally it is necessary to remove privileges after the system has been configured because the configuration steps may require those privileges.

9.6.3 Implementing Hardened Zones

Chapter 3, “Oracle Solaris Zones,” discussed the commands used to create and boot a zone. The commands in the current example assume that a separate network interface (net1) exists, and that the zone was configured with the following commands:

GZ# zonecfg -z timelord
zonecfg:timelord> create
zonecfg:timelord> select anet 0
zonecfg:timelord:anet> set lower-link=net1
zonecfg:timelord:anet> set allowed-address=192.168.5.5/24
zonecfg:timelord:anet> set defrouter=192.168.5.1
zonecfg:timelord:anet> end
zonecfg:timelord> exit

Additional configuration information can be provided with the sysconfig(1M) command:

GZ# sysconfig create-profile -o /tmp \
    -g naming_services,location,users,identity,support,keyboard

You will be prompted to provide some system configuration information, after which you can boot the zone.

GZ# zoneadm -z timelord install -c /tmp/sc_profile.xml
GZ# zoneadm -z timelord boot

After the zone is running, you can disable unneeded Oracle Solaris services. By default, Solaris 11 and newly created Solaris Zones offer only one network service: SSH. We will access the zone from the global zone with the zlogin command, so we can disable that service as well. The service rpc/bind does not allow connections, but would show up in a port scan, so we will disable it, too. Finally, we will need the software related to time synchronization, so we will install that.

GZ# zlogin timelord
timelord# svcadm disable rpc/bind
timelord# svcadm disable ssh
timelord# exit
GZ# zoneadm -z timelord shutdown

The next step is to modify the zone’s set of allowed privileges, removing unnecessary privileges and optionally adding privileges that are not part of the default set, but are needed so the zone can achieve its goals. It can be challenging to determine the complete set of privileges needed by a program unless you can exercise all of the code in the program. However, the ppriv(1) command will report attempts to perform operations that require privileges not currently held by the process. Zones have all the privileges that they need to use NTP, so we do not need to add any.
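
For example, you can list the privilege sets of a shell running inside the zone, and run a candidate operation under privilege debugging so that any failed privilege checks are reported; the date value below is purely illustrative:

timelord# ppriv $$                      # show the shell's effective, permitted, and limit sets
timelord# ppriv -e -D date 0519130016   # run one command with privilege debugging enabled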

We know that this zone will not be using NFS as a client or server, so we can remove the sys_nfs privilege. Also, we choose not to support system accounting, although we could use that service if we wished. With that choice, we can remove the privilege sys_acct. We can use zonecfg to remove those two privileges:

GZ# zonecfg -z timelord
zonecfg:timelord> set limitpriv=default,!sys_nfs,!sys_acct
zonecfg:timelord> exit

At this point, the zone is configured without unnecessary privileges.

For this example, we will use the Network Time Protocol service daemon, ntpd(1M), to automatically synchronize the system’s clock with time servers on the Internet on a periodic basis. To enable the zone to modify that clock, the zone’s global-time setting must be set to true. The following command sequence permits the zone to set the system clock:

GZ# zonecfg -z timelord
zonecfg:timelord> set global-time=true
zonecfg:timelord> exit

After removing unnecessary privileges, and granting the ability to perform necessary operations that are not permitted by default, we can complete the configuration of the one service that this zone will perform:

GZ# zoneadm -z timelord boot

Wait for the zone to finish booting. Then, install, configure, and enable the NTP service:

GZ# zlogin timelord
timelord# pkg install network/ntp
timelord# cd /etc/inet
timelord# cp ntp.client ntp.conf
(Customize /etc/inet/ntp.conf for your site.)
timelord# svcadm enable network/ntp
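
Before leaving the zone, a quick check confirms that the service is running; the STATE column should report online:

timelord# svcs network/ntp
timelord# exit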

The following command sequence prevents the zone from modifying its Solaris programs or configuration, beginning with the next boot:

GZ# zonecfg -z timelord
zonecfg:timelord> set file-mac-profile=fixed-configuration
zonecfg:timelord> exit

Reboot the zone and verify the immutability of the system configuration files:

GZ# zoneadm -z timelord reboot
GZ# zlogin timelord
timelord:# cd /etc/inet
timelord:/etc/inet# echo "# text" >> ntp.conf
-bash: ntp.conf: Read-only file system
timelord:/etc/inet# ls -l ntp.conf
-r--r--r--   1 root     root        4210 May 19 13:19 ntp.conf
timelord:/etc/inet# chmod 744 ntp.conf
chmod: WARNING: can't change ntp.conf

While isolating the zone, why not also limit the amount of resources that it can consume? If the zone is operating normally, resource management features are not strictly needed, but they are easy to configure and their use in this situation could be valuable. These limits could reduce or eliminate the effects of a hypothetical bug in ntpd that causes a memory leak or other unnecessary use of resources.

Further, limiting the amount of resources that can be consumed by the zone provides another layer of security in this environment. In particular, resource constraints can reduce or eliminate risks associated with a denial-of-service attack. Note that the use of these features is not strictly necessary. Instead, their use is shown here for completeness, to demonstrate the possibilities.

Chapter 3, “Oracle Solaris Zones,” described the resource controls available for Solaris Zones. Here is a brief explanation of our choices. Note, however, that there are other reasonable choices for this situation.

A few quick tests with rcapstat(1) on a zone like this one might show that it needs less than 100 MB of RAM to do its job. We could cap the amount of RAM near that value to prevent the zone from consuming an unnecessary amount of memory, but we also want to prevent it from causing excessive paging. We can prevent a zone from paging by setting the RAM and virtual memory (VM) caps to the same value. Because we do not want to set the VM cap below the amount that is really needed, we set a generous limit on both: 200 MB caps for RAM and VM, as shown in the zonecfg commands below. A cap on locked memory further limits the potential for the zone's processes to disrupt other legitimate activities, without causing a problem for NTP.

NTP is not a compute-intensive activity, so we limit the zone to half of one CPU. Also, capping the number of processes limits the zone's ability to exhaust a fixed resource: process table slots.

GZ# zonecfg -z timelord
zonecfg:timelord> add capped-memory
zonecfg:timelord:capped-memory> set physical=200m
zonecfg:timelord:capped-memory> set swap=200m
zonecfg:timelord:capped-memory> set locked=20m
zonecfg:timelord:capped-memory> end
zonecfg:timelord> add capped-cpu
zonecfg:timelord:capped-cpu> set ncpus=0.5
zonecfg:timelord:capped-cpu> end
zonecfg:timelord> set max-processes=100
zonecfg:timelord> exit
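
Once the zone is running again, you can observe these caps from the global zone with standard Solaris commands, for example:

GZ# rcapstat -z 5                                  # per-zone RAM cap and usage, sampled every 5 seconds
GZ# prctl -n zone.max-processes -i zone timelord   # confirm the process cap
GZ# zonestat 5                                     # per-zone CPU and memory utilization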

Additional restrictions can be placed on this zone, as mentioned later in this chapter.

9.6.4 Test

A simple way to test this zone in a lab environment is to stop the zone, change the system clock from the global zone with the date command, and then monitor the clock while booting the zone. The clock should be synchronized with the NTP servers within a few minutes.
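
A hedged sketch of such a test follows; the date value is illustrative, and ntpq(1M) is delivered by the network/ntp package:

GZ# zoneadm -z timelord shutdown
GZ# date 0519120016                  # deliberately skew the shared system clock
GZ# zoneadm -z timelord boot
GZ# zlogin timelord ntpq -p          # verify that NTP peers are reachable
GZ# date                             # repeat until the clock converges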

9.6.5 Security Analysis

Many attacks on computers try to take advantage of a security weakness in software that listens to the network. The ability to turn off all such services greatly decreases the security risk posed by this kind of weakness. You can also use many of the other features of Oracle Solaris to enhance the security of a zone. Table 9.1 reviews the security features and capabilities used in the example described in this section.

Table 9.1 Security Controls

Before using a security method like the one described in this section, you should validate its ability to handle the types of attacks you want to prevent. The method described in this section may or may not be suitable for your particular security needs.

9.6.6 Summary

Oracle Solaris Zones offer significant security capabilities not available with other virtualization technologies. An understanding of security issues and the features of Solaris Zones will help you to balance security and functionality needs.

9.6.7 Further Reading

This section focused on the process of fine-tuning the functionality of a Solaris Zone, by expanding on the default abilities and removing others. It was not intended to showcase all of the features that could have been used. Additional features that might be used in situations that require increased security are described in the Oracle Solaris documentation:

- Solaris Privileges, described individually in the privileges(5) man page

- Solaris IP Filter, to prevent any unwanted outbound network connections

- Encryption for network traffic and ZFS file systems

- Automated security compliance reporting tools

9.7 Customer Deployment 1

The preceding sections discussed Oracle architectures and products. In this section, we briefly illustrate some anonymous (and blended) customer deployments that offer interesting examples of Oracle Solaris virtualization in practice. Several common deployment styles and use cases have proved successful across multiple institutions in real-world applications.

One example involves a large financial institution that was scaling up core applications into logical domains on M-series servers. This company needed vertical scale for the applications, and it had stringent requirements for performance and availability. At the same time, it had applications that were certified for Solaris 10 and needed to continue to operate them.

To meet these requirements, the company deployed M6-32 systems using Oracle RAC across physical domains, which provides isolation and insulation from OS and processor faults, and across separate M6-32 systems, which provides additional protection. In addition to Oracle RAC, it deployed Oracle Solaris Cluster to maintain high-availability applications.

This financial institution used virtual I/O for many of its domains, but used root domains and SR-IOV for the domains with the most critical performance requirements. These choices maximized I/O performance by avoiding the overhead of going through a service domain. Logical domains support SR-IOV for Ethernet, InfiniBand, and Fibre Channel.

Another interesting choice was the use of logical domains dynamic resource management (DRM), which automatically adjusts the number of virtual CPUs in a domain based on its CPU consumption. This feature of Oracle VM Server for SPARC lets administrators set high and low thresholds for CPU utilization: When the domain’s utilization rate is low and it has more CPUs than it needs, CPUs are removed until the CPU utilization rate exceeds the lower limit, or until a minimum CPU count is reached. When a domain’s utilization rate is high, CPUs are added based on CPU threshold values until the maximum number of CPUs for the domain has been reached. This feature permits automatic CPU load-balancing for domains based on changing load conditions, without requiring further administrative intervention. This outcome was important for meeting the company’s goal of deploying several hundred domains without acquiring enough resources for the peak capacity needed by each domain. For even higher consolidation density, the organization deployed zones within the same domain containing an application and the database containing its data—a multitier application environment in a single domain.
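
As a hedged illustration, a DRM policy of the kind described here might be defined from the control domain along the following lines; the values are representative only, and the available options should be checked against the ldm(1M) man page for your Oracle VM Server for SPARC release:

primary# ldm add-policy util-lower=25 util-upper=75 vcpu-min=8 vcpu-max=64 \
    attack=8 decay=4 enable=yes name=drm-app ldom1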

This deployment illustrates the use of logical domains for large applications on vertically scaled servers, as well as server consolidation to reduce operational expense and complexity.

9.8 Customer Deployment 2

Another popular deployment pattern is creation of pooled server farms as private clouds. This pattern was introduced long before the term “cloud” came into popular usage, and has been seen at multiple institutions.

One company set up a pool of more than 100 T4 servers and used it as a consolidation target for older "legacy" SPARC servers. The firm migrated existing applications through a combination of reinstalling them in new domains and using the logical domains physical-to-virtual (P2V) tool to rehost existing physical Solaris instances in guest domains. It also used standard Solaris techniques to create archives of existing Solaris systems running on older hardware, which it could then install into domains using customized JumpStart methods. The company was ultimately able to reduce its server footprint by hundreds of physical servers while moving to a more modern platform that added performance and agility.

A similar implementation was done at another customer in the same business sector. In this case, the company established “built for purpose” virtual server farms for middleware applications and for databases, each configured and optimized for its specific application category.

9.9 Customer Deployment 3

As an additional example of a popular pattern, multiple customers have combined Oracle VM Server for SPARC and Oracle Solaris Zones, similar to the strategies used by the previously mentioned companies, but with additional elaborations.

One common pattern is to rehost old Solaris versions on new hardware; many customers host Solaris 10 even on new M7 and T7 servers by running it in guest VEs. Some of these Solaris 10 guests host legacy Solaris 8 and Solaris 9 environments in branded zones, so that even older applications can continue to operate. Other customers, who do not have this increasingly rare combination, host Solaris 11 domains with Solaris 10 branded zones, a scheme that permits them to run Solaris 10 certified applications while benefiting from the Solaris 11 kernel, which runs much more effectively on the newest hardware.

Another example of nested virtualization is logical consolidation of a single application whose tiers previously resided on separate physical servers. While customers frequently accomplish this type of consolidation with multiple domains, some have found that they can achieve better hosting density and improved performance by hosting related applications in zones residing in the same domain. This approach leverages the in-memory networking stack, which eliminates network I/O latency for multitier applications. Examples include middleware combined with the organization's database, and Oracle PeopleSoft and other layered applications combined with the database.

These examples illustrate how customers have leveraged Oracle virtualization technology to provide greater operational efficiency, increased performance, and reduced costs.

9.10 Summary

This chapter has reviewed use cases in which the unique capabilities of Oracle virtualization technologies are leveraged in Oracle product solutions, or exploited in customer deployments. The common theme is that these technologies can be combined to deliver virtualization with near-zero overhead, permitting high performance and high scale while still providing the primary benefits of virtualization.
