Chapter 9

Storage Virtualization

TOPICS COVERED IN THIS CHAPTER:

  • What storage virtualization is
  • Host-based storage virtualization
  • Network-based storage virtualization
  • Controller-based storage virtualization
  • In-band vs. out-of-band storage virtualization
  • Use cases for storage virtualization
  • How to configure controller-based virtualization
  • Software-defined storage

This chapter covers the popular forms of storage virtualization seen and implemented in the real world. We'll start out by explaining the many different layers of virtualization that exist in a typical storage stack, which have been there for years but are not often thought of as virtualization per se. We'll introduce the SNIA Shared Storage Model (SSM) and use this as a frame of reference for mapping different types of storage virtualization to where they fit in the SSM stack. We'll spend a fair amount of time talking about controller-based virtualization, which to date has been the most popular form of storage virtualization implemented by vendors and technology companies, as well as being deployed in the real world by customers. We'll review some potential use cases, look at how storage virtualization is commonly implemented, and consider some potential pitfalls. We'll finish the chapter by exploring the most recent form of storage virtualization to hit the market: software-defined storage. We'll examine some examples, pros, cons, and potential futures for software-defined storage, which promises to be a disruptive architecture.

What Storage Virtualization Is

Storage virtualization is one of those terms that means many things to many people. Ask five people what it means, and you could easily get five different answers. This isn't because storage virtualization is immature or still undefined as a technology. It's more likely because virtualization happens at just about every layer of the storage stack, and new forms of storage virtualization are coming along all the time.

At its highest level, virtualization is the abstraction of physical devices to logical, or virtual, devices. However, storage virtualization (as well as other forms of virtualization) often goes a step further by virtualizing devices that have already been virtualized. With this in mind, let's quickly look at a common storage stack with applications sitting at the top and physical storage media at the very bottom, and work our way down the stack via the route an application I/O would take. As we do this, I'll point out some of the many areas where virtualization takes place. Figure 9.1 helps make things a little clearer.

FIGURE 9.1 Levels of virtualization in traditional SAN storage


Applications usually speak to filesystems or databases, which sit on top of logical volumes. These logical volumes are often made by virtualizing multiple discrete LUNs into a single virtual/logical volume. The underlying LUNs might be accessed over a SAN, which itself is carved up into zones, which are basically virtual SCSI buses. On the storage array, the LUNs themselves are virtual devices defined in cache memory. Beneath these virtual LUNs are storage pools, which are virtual pools of storage that are sometimes created from multiple RAID groups. Underneath the RAID groups sit multiple physical drives, but even these physical drives are virtualized into a simple logical block address (LBA) space by their on-board controllers in order to hide the complexity of the inner workings. Every step described here can be considered virtualization.

As shown in Figure 9.1, lower layers in the stack are often virtualized as we move up the stack, resulting in layer upon layer of virtualization. The net result is that although your application, database or filesystem might think it is talking directly to a dedicated locally attached physical disk drive, it is absolutely not! It is talking to a virtual device that has been virtualized many times at many layers in the stack in order to deliver higher performance and higher availability.


Because of the multiple layers of virtualization involved in providing an application with persistent storage, no application, filesystem, volume manager, or even operating system has any clue where on disk its data is stored. An application may have an address in the local filesystem, but that maps to an address on a logical volume, which maps to an address on a LUN, which maps to pages in a pool, which map to blocks on a logical device on a RAID group, which map to LBA addresses on multiple physical drives, which finally map to sectors and tracks on drive platters. The point is that application, filesystem, and operating system tools have absolutely no idea which sectors of the underlying storage media their data lives on! Don't let this keep you awake at night. These layers of virtualization have been in use for many years and are well and truly field tested. And as you will find out, they bring massive benefits to the game.
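
To make this chain of mappings a little more concrete, the following toy Python sketch walks one filesystem block address down a hypothetical stack—filesystem to logical volume to LUN to pool page to drive LBA. Every geometry in it (block size, stripe size, page size, chunk size, drive count) is an invented illustrative value, not any particular vendor's layout.

```python
# A toy sketch (not any vendor's implementation) of how a single address is
# remapped on its way down a hypothetical virtualized storage stack.
# Every geometry below is an invented illustrative value.

FS_BLOCK = 4096                 # filesystem block size in bytes
STRIPE = 1 * 1024 * 1024        # logical-volume stripe size
LUNS = 4                        # LUNs striped into the logical volume
PAGE = 42 * 1024 * 1024         # thin-pool page size
CHUNK = 256 * 1024              # RAID chunk size
DRIVES = 8                      # drives in the RAID group
SECTOR = 512                    # drive sector size

def fs_to_volume(fs_block):
    """Filesystem block number -> byte offset on the logical volume."""
    return fs_block * FS_BLOCK

def volume_to_lun(vol_offset):
    """Striped logical volume -> (LUN index, byte offset within that LUN)."""
    stripe_no = vol_offset // STRIPE
    return stripe_no % LUNS, (stripe_no // LUNS) * STRIPE + vol_offset % STRIPE

def lun_to_pool(lun_offset):
    """Thin LUN offset -> (pool page number, offset within the page)."""
    return lun_offset // PAGE, lun_offset % PAGE

def pool_to_drive(page_no, page_offset):
    """Pool page -> (drive index, LBA) on one member of the RAID group."""
    byte_addr = page_no * PAGE + page_offset
    chunk_no = byte_addr // CHUNK
    drive = chunk_no % DRIVES
    lba = ((chunk_no // DRIVES) * CHUNK + byte_addr % CHUNK) // SECTOR
    return drive, lba

if __name__ == "__main__":
    vol_off = fs_to_volume(123456)
    lun, lun_off = volume_to_lun(vol_off)
    page, page_off = lun_to_pool(lun_off)
    drive, lba = pool_to_drive(page, page_off)
    print(f"fs block 123456 -> LUN {lun} offset {lun_off} -> pool page {page} "
          f"-> drive {drive}, LBA {lba}")
```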

The numerous levels of virtualization just described tend to be so well established and deeply embedded that we often forget they exist and rarely think of them as storage virtualization at all.

The SNIA Shared Storage Model

Before we dive into the various types of storage virtualization technology in the market, it's probably a good idea to have a quick look at the SNIA-defined Shared Storage Model (SSM).


As noted earlier in the book, SNIA is the Storage Networking Industry Association, a nonprofit trade association dedicated to advancing storage-related standards, technologies, and education. It is made up of member companies with an interest in advancing those standards and technologies.

The SNIA SSM is a lot like the better-known Open Systems Interconnection (OSI) seven-layer networking model referred to so often in the network world. Although the SNIA SSM is far less famous and far less referenced than the OSI model, it can be quite useful when describing storage concepts and architectures. In this chapter, we'll use the SSM as a reference point when describing different types of storage virtualization.

Figure 9.2 shows the SNIA Shared Storage Model.

FIGURE 9.2 SNIA Shared Storage Model


As you can see in Figure 9.2, the SNIA SSM is a pretty simple layered model. At the bottom are the lowest-level components such as disk drives, solid-state media, and tapes. At the top is where higher-level constructs such as filesystems and databases are implemented.

Technically speaking, the SSM is divided into three useful layers under the application layer:

  • File/Record layer
  • Block Aggregation layer
  • Storage Devices layer

In the past, most of the interesting storage virtualization action has occurred at the Block Aggregation layer. But as you can see, there are three discrete sublayers within that layer. Figure 9.3 takes a closer look at the SSM and breaks the layers down a little more.

FIGURE 9.3 Further breakdown of the SNIA Shared Storage Model



The SNIA uses Roman numerals when identifying the layers of the SSM stack. This is done so that they are never confused with the layers of the OSI network model.

Based on the SSM layered model, the SNIA defines three types of storage virtualization:

  • Host-based virtualization
  • Network-based virtualization
  • Storage-based virtualization

All of these occur at the Block Aggregation layer of the SSM.

Let's have a quick look at each of these.

Host-Based Virtualization

Host-based storage virtualization is usually implemented via a logical volume manager (LVM), often referred to as just a volume manager. Volume managers work with block storage—DAS or SAN—and virtualize multiple block I/O devices into logical volumes. These logical volumes are then made available to higher layers in the stack, where filesystems are usually written to them.

The logical volumes can be sliced, striped, concatenated, and sometimes even software-RAID protected. From the perspective of storage virtualization, they take devices from the layer below and create new virtual devices known as logical volumes. These logical volumes often have superior performance, improved availability, or both.
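
As a rough illustration of what a volume manager is doing under the covers, here is a minimal Python sketch that maps an offset on a logical volume to the underlying LUN (or LUNs) for two simple layouts: concatenation and mirroring. The classes and sizes are invented for illustration only.

```python
# A minimal sketch (illustrative only) of what a logical volume manager does:
# map an offset on a logical volume to an offset on one of the LUNs beneath it.
# Two simple layouts are shown: concatenation and mirroring (software RAID 1).

class ConcatenatedVolume:
    """Member LUNs are laid end to end; capacity is the sum of the members."""
    def __init__(self, lun_sizes):
        self.lun_sizes = lun_sizes

    def map(self, offset):
        for lun, size in enumerate(self.lun_sizes):
            if offset < size:
                return [(lun, offset)]            # one physical location
            offset -= size
        raise ValueError("offset beyond end of volume")

class MirroredVolume:
    """Every write lands on both LUNs; reads can be served from either."""
    def __init__(self, lun_size):
        self.lun_size = lun_size

    def map(self, offset):
        if offset >= self.lun_size:
            raise ValueError("offset beyond end of volume")
        return [(0, offset), (1, offset)]         # two physical locations

if __name__ == "__main__":
    concat = ConcatenatedVolume([100, 200, 300])  # sizes in GiB, for simplicity
    print(concat.map(250))                        # -> [(1, 150)]
    mirror = MirroredVolume(100)
    print(mirror.map(42))                         # -> [(0, 42), (1, 42)]
```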

Figure 9.4 shows a simple volume manager creating a single logical volume from four LUNs presented to it from a SAN-attached storage array.

FIGURE 9.4 Logical volume manager creating a logical volume


Although it is definitely possible, and sometimes useful, volume managers are not often used to create software-RAID-protected logical volumes. This is normally due to the potential performance overhead associated with parity-based software RAID. Logical volumes created by a host-based volume manager also tend to be limited in scalability because they are host-centric—they can't easily be shared between multiple hosts.

Figure 9.5 shows how host-based volume managers map to the SNIA SSM. Volume managers are host-based software that manipulate block storage devices and aggregate them as new block devices (for example, by taking two block devices and joining them together to form a single mirrored volume). These volumes are then utilized by databases and filesystems.

FIGURE 9.5 LVM mapping to the SNIA Shared Storage Model


In the real world, people generally don't consider LVM functions as storage virtualization. To be fair, this is probably because volume managers have been around and doing this kind of host-based virtualization a lot longer than we have been using the term storage virtualization.

Network-Based Virtualization

Network-based (or SAN-based) storage virtualization has been tried and has largely failed. There were a few solutions, the best known probably being EMC Invista, but most, if not all, have gone the way of the dinosaurs.

Conceptually speaking, storage virtualization at the network layer requires intelligent network switches, or SAN-based appliances, that perform functions such as these:

  • Aggregation and virtualization of storage arrays
  • Combining LUNs from heterogeneous arrays into a single LUN
  • Heterogeneous replication at the fabric level (replicating between different array technologies)

All of these were promoted and raved about by vendors and the SNIA. All were attempted, and most have failed. They looked good on paper but, for the most part, never proved popular with customers.

For the record, these devices tended to be out-of-band (asymmetric) solutions.

Storage-Based Virtualization

Most people refer to storage-based virtualization as controller-based virtualization. And controller-based virtualization is the predominant form of storage virtualization currently seen in the real world. Because of this, you will look in depth at controller-based virtualization later in the chapter. But in a nutshell, advanced storage arrays are able to attach to other downstream storage arrays, discover their volumes, and use them the same way that they utilize their own internal disks, effectively virtualizing the downstream arrays behind them.

In-Band and Out-of-Band Virtualization

The SNIA categorizes storage virtualization as either in-band or out-of-band.

In in-band virtualization, the technology performing the virtualization sits directly in the data path. This means that all I/O—user data and control data—passes through the technology performing the virtualization. A common real-world example is a storage controller that sits in front of and virtualizes other storage controllers. The controller performing the virtualization sits directly in the data path—between the host issuing the I/O and the storage array being virtualized. As far as the host is concerned, the controller performing the virtualization is its target and looks and feels exactly like a storage array (which it is, only it's not the ultimate destination for the data). From the perspective of the storage array being virtualized, on the other hand, the virtualization controller appears as a host. Figure 9.6 gives a high-level view of this type of configuration.

FIGURE 9.6 In-band storage virtualization

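
To make the in-band relationship concrete, here is a toy Python sketch (not any product's implementation) in which the virtualizing controller offers the same read/write interface a real target would, and simply forwards each I/O to the back-end array it is virtualizing—appearing as a target to the host and as a host to the back end. All names and values are invented.

```python
# A toy sketch of the in-band model: the virtualizing controller presents
# virtual LUNs to the host and forwards every I/O to a back-end array.

class BackEndArray:
    """Stands in for an array being virtualized; it just stores blocks."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}                     # (lun, lba) -> data

    def write(self, lun, lba, data):
        self.blocks[(lun, lba)] = data

    def read(self, lun, lba):
        return self.blocks.get((lun, lba))

class InBandVirtualizer:
    """Sits in the data path: same read/write interface a real target offers."""
    def __init__(self):
        self.vlun_map = {}                   # virtual LUN -> (array, back-end LUN)

    def virtualize(self, vlun, array, backend_lun):
        self.vlun_map[vlun] = (array, backend_lun)

    def write(self, vlun, lba, data):        # host-facing call
        array, blun = self.vlun_map[vlun]
        array.write(blun, lba, data)         # forwarded as if from a host

    def read(self, vlun, lba):
        array, blun = self.vlun_map[vlun]
        return array.read(blun, lba)

if __name__ == "__main__":
    old_array = BackEndArray("legacy-array")
    controller = InBandVirtualizer()
    controller.virtualize(vlun=0, array=old_array, backend_lun=7)
    controller.write(0, 1024, b"hello")      # host only ever talks to the controller
    print(controller.read(0, 1024))          # b'hello', actually stored on legacy-array
```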

There are several advantages to this in-band approach to virtualization. One major advantage is that no special drivers are required on the host, and in the real world this can be a massive bonus. Another advantage is that in-band virtualization can be used to bridge protocols, such as having the device that performs the virtualization connected to FC storage arrays on the backend but presenting iSCSI LUNs out of the front-end. Figure 9.7 shows a configuration where a device performing in-band virtualization is bridging between iSCSI and FC.

FIGURE 9.7 In-band storage virtualization device performing bridging


Out-of-band virtualization, sometimes referred to as asymmetric, typically has only the metadata pass through the virtualization device or appliance, while user data takes a separate path, and it usually requires special HBA drivers and agent software deployed on the host. It is far less popular and far less widely deployed than in-band solutions.

Why Storage Virtualization?

So what are the factors that drive the popularity of storage virtualization? Storage virtualization attempts to address the following challenges:

  • Management
  • Functionality
  • Performance
  • Availability
  • Technology refresh/migrations
  • Cost

On the management front, storage virtualization can help by virtualizing multiple assets—be they disk drives or storage arrays—behind a single point of virtualization. This single point of virtualization then becomes the single point of management for most day-to-day tasks. For example, virtualizing three storage arrays behind a Hitachi VSP virtualization controller allows all four arrays to be managed by using the Hitachi VSP management tools. And as with all good virtualization technologies, multiple technologies from multiple vendors can be virtualized and managed through a single management interface.

On the functionality front, it is possible to add functionality to the storage devices being virtualized. A common example is when virtualizing low-tier storage arrays behind a higher-tier array. The features and functionality, such as snapshots and replication, of the higher-tier array (the array performing the virtualization) are usually extended to the capacity provided by the lower-tier array.

On the performance front, storage virtualization can improve performance in several ways. One way is by virtualizing more disk drives behind a single virtual LUN. For example, if your LUN has 16 drives behind it and needs more performance, you can simply increase the number of drives behind the virtual LUN. Another common way storage virtualization helps with performance is by virtualizing smaller, lower-performance arrays behind a high-performance, enterprise-class array with a large cache, effectively extending the performance of the large cache to the arrays being virtualized.
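
As a back-of-the-envelope illustration of the first point, the snippet below uses a rule-of-thumb per-drive IOPS figure (an assumption, not a benchmark) to show how the random I/O capability behind a virtual LUN grows as more drives are placed behind it.

```python
# Rule-of-thumb illustration only: assumes roughly 150 random IOPS per 10K SAS drive.
iops_per_drive = 150

for drives in (16, 32, 64):
    print(f"{drives} drives behind the virtual LUN -> ~{drives * iops_per_drive} random IOPS")
```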

On the availability front, quite low in the stack, RAID groups are used to create RAID-protected LUNs that look and feel like physical block-access disk drives but are actually made up of multiple physical disk drives formed into a resilient RAID set. Array-based replication and snapshot technologies add further protection on top of this virtualization.

On the technology refresh front, controller-based virtualization is often used to smooth out storage migrations. It allows volumes presented to a particular host to be migrated from one storage array to another nondisruptively—without downtime to the host or interruption to I/O.

From a cost perspective, storage virtualization can assist by virtualizing cheaper, lower-end storage arrays behind higher-performance arrays. This brings the performance benefits of the higher-performance array to the capacity of the virtualized array. It also extends the features—such as replication, snapshots, deduplication, and so on—of the high-end array to the capacity of the lower-cost array.


Take all of these benefits of virtualization with a pinch of salt. They are all plausible and implemented in the real world, but they are also all subject to caveats. As a quick example, virtualizing low-end storage arrays behind a high-performance array can extend the performance of the high-performance array to the capacity provided by the low-end array, but only up to a point. If you try to push it too far, performance of the high-end array can start to suffer.


As always, all cost savings depend on how much you pay. Virtualization devices and licenses are seldom cheap or free.

Enough of the theory. Let's take a closer look at the most popular form of storage virtualization: controller-based virtualization.

Controller-Based Virtualization

First up, let's map this to the SSM. Controller-based virtualization occurs at the device layer within the Block Aggregation layer, and it is a form of block-based, in-band virtualization. This is shown in Figure 9.8.

FIGURE 9.8 Controller-based virtualization mapped to the SSM


At a high level, controller-based virtualization is the virtualization of one storage array behind another storage array, as shown in Figure 9.9.

FIGURE 9.9 Controller-based virtualization



In a controller virtualization setup, one array acts as the master and the other as the slave. We will refer to the master as the virtualization controller and the slave as the virtualized array or array being virtualized.

In a controller-based virtualization configuration, the virtualizing controller sits in front of the array being virtualized. As far as hosts issuing I/O are concerned, the virtualization controller is the target, the ultimate end point to which data is sent. The host needs no special driver or agent software and is blissfully unaware that the array it is talking to is effectively a middleman and that its data is in fact being sent on to another storage array for persistent storage.

The virtualization controller is where all of the intelligence and functionality resides. The array being virtualized just provides dumb RAID-protected capacity with a bit of cache in front of it. None of the other features and functions that might be natively available in the virtualized array are used, and they therefore don't need to be licensed.

In most controller-based virtualization setups, when the host issues a write I/O, it issues I/O to the virtualization controller, which caches the I/O and returns the ACK to the host. The write I/O is then lazily transferred to the array being virtualized, according to the cache destage algorithms of the virtualization controller. The virtualization controller usually imitates a host when talking to the arrays virtualized behind it and issues the same write data to the virtualized array, which in turn caches it and commits it to disk according to its own cache destage algorithm. This is shown in Figure 9.10.

Because the virtualizing array usually imitates a host when talking to the array virtualized behind it, no special software or options are required on the arrays being virtualized. As far as they are concerned, they are just connected to another host that issues read and write I/O to them. For example, the HDS VSP platform supports controller-based virtualization and imitates a Windows host when talking to the arrays virtualized behind it.

FIGURE 9.10 Write I/O to a virtualized storage array

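
The following simplified Python sketch models that write path: the controller acknowledges each host write once it is in cache and destages the dirty data to the virtualized array later. Class names, the destage batch size, and the data are all invented for illustration.

```python
# A simplified sketch of the write path described above: ACK from cache first,
# lazy destage to the virtualized array afterwards.

from collections import deque

class VirtualizedArray:
    def __init__(self):
        self.disk = {}

    def write(self, lba, data):              # the back-end array's own cache and
        self.disk[lba] = data                # destage behavior is hidden behind this call

class VirtualizationController:
    def __init__(self, backend):
        self.backend = backend
        self.write_cache = deque()           # dirty data waiting to be destaged

    def host_write(self, lba, data):
        self.write_cache.append((lba, data))
        return "ACK"                         # ACK returned before data reaches the back end

    def destage(self, max_ios=2):
        """Lazily push dirty cache entries to the virtualized array."""
        for _ in range(min(max_ios, len(self.write_cache))):
            lba, data = self.write_cache.popleft()
            self.backend.write(lba, data)

if __name__ == "__main__":
    backend = VirtualizedArray()
    ctrl = VirtualizationController(backend)
    print(ctrl.host_write(100, b"a"), ctrl.host_write(200, b"b"), ctrl.host_write(300, b"c"))
    print("dirty in cache:", len(ctrl.write_cache), "| on back end:", len(backend.disk))
    ctrl.destage()                           # background destage cycle
    print("dirty in cache:", len(ctrl.write_cache), "| on back end:", len(backend.disk))
```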

Vendors such as HDS and NetApp adopted the term storage hypervisor to describe their controller-based virtualization platforms. The description works, up to a certain point.

Next let's take a look at a typical controller-based storage virtualization configuration and walk through the high-level steps in configuring it.

Typical Controller Virtualization Configuration

In this section, you'll walk through the typical high-level steps required to configure controller-based virtualization. These steps are based around configuring controller-based virtualization on a Hitachi VSP array. Even though the steps are high level, they can solidify your understanding of how to configure a typical controller-based virtualization setup.

Configuring the Array Being Virtualized

Typically, the first thing to do is configure the array that you will be virtualizing. On this array, you create the RAID-protected volumes you would usually configure, and you present them as SCSI LUNs on the front-end ports. You need to configure LUN masking so that the WWPNs of the virtualization controller have access to the LUNs. These LUNs don't need any special configuration parameters and can usually use normal cache settings. It is important that you configure any host mode options for these LUNs and front-end ports according to the requirements of the virtualization controller. For example, if the virtualization controller imitates a Windows host, you need to make sure that the LUNs and ports are configured to expect a Windows host.
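
Conceptually, LUN masking is just an access list keyed on initiator WWPNs. The short sketch below (with made-up example WWPNs) shows the idea: the virtualization controller's ports are allowed to see the LUN, while an ordinary host's port is not.

```python
# A minimal sketch of the LUN-masking idea on the array being virtualized:
# only initiators (WWPNs) on the masking list can see a given LUN.
# The WWPNs below are made-up example values.

masking = {
    "lun_10": {"50:06:0e:80:12:34:56:01",   # virtualization controller port 1
               "50:06:0e:80:12:34:56:02"},  # virtualization controller port 2
}

def can_access(wwpn, lun):
    """Return True if this initiator WWPN is masked to the LUN."""
    return wwpn in masking.get(lun, set())

print(can_access("50:06:0e:80:12:34:56:01", "lun_10"))  # True  - virtualization controller
print(can_access("21:00:00:24:ff:aa:bb:cc", "lun_10"))  # False - an ordinary host, kept out
```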

For example, Figure 9.11 shows a virtualized array with 10 TB of storage configured as five 2 TB LUNs presented to four front-end ports. The front-end ports are configured to Windows host mode, and LUN masking is set to allow the WWPNs of the virtualization controller ports.

FIGURE 9.11 Configuring an array to be virtualized



It is vital that no other hosts access the LUNs that are presented from the virtualized array to the virtualization controller. If other hosts access these LUNs while the virtualization controller is accessing them, the data on those LUNs will be corrupted. The most common storage virtualization configurations have all the storage in the virtualized array presented to the virtualization controller. This way, there is no need for any hosts to connect directly to the virtualized array. It is, however, possible to configure some LUNs on the virtualized arrays for normal host access, with other LUNs used exclusively by the virtualization controller. The important point is that no single LUN should be accessed by both.

Configuring the Virtualization Controller

After configuring the array to be virtualized, next you configure the virtualization controller. Here you will configure two or more front-end ports—yes, front-end ports—into virtualization mode. This virtualization mode converts a port from its normal target mode to a special form of initiator mode that allows the ports to connect to the array being virtualized and discover and use its LUNs. In order to keep things simple, these ports usually emulate a standard Windows or Linux host so that the array being virtualized doesn't need to have any special configuration changes made. This keeps interoperability and support very simple.

Connecting the Virtualization Controller and Virtualized Array

Connectivity is usually FC and can be either direct attach or SAN attach. Due to the critical nature of these connections, many people opt for direct attach in order to keep the path between the two arrays as simple and clean as possible.

When the virtualization controller is connected to the array being virtualized, it performs a standard PLOGI (port login). Because its WWPN has been added to the LUN masking list on the array being virtualized, it will discover and claim the LUNs presented to it and use them the same way it would use capacity from locally installed drives. The common exception is that the virtualization controller usually does not apply RAID to the discovered LUNs; low-level functions such as RAID are still performed by the virtualized array.

From this point on, the LUNs discovered can be used by the virtualization controller in exactly the same way that it uses its own internal volumes. By that, we mean that the external volumes can be used to create pools and LUNs.

Figure 9.11 shows a virtualized array presenting five LUNs to the virtualization controller. The virtualization controller logs in to the virtualized array, discovers and claims the LUNs, and forms them into a pool. This pool is then used to provide capacity for four newly created volumes that are presented out of the front end of the virtualization controller to two hosts as SCSI LUNs. The two hosts have no idea that the storage behind their LUNs is virtualized behind the array they are talking to. In fact, each host thinks it is talking to physical disk drives installed in the server hardware it is running on!

The local storage in the virtualization controller is referred to as internal storage, whereas the storage in the array being virtualized is often referred to as external storage.
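
In capacity terms, what happens after discovery can be sketched as simple bookkeeping: the claimed external LUNs contribute capacity to a pool, and new virtual volumes are carved from that pool and presented to hosts. The Pool class and the sizes below (in GiB) are purely illustrative.

```python
# A sketch (capacity bookkeeping only) of pooling external LUNs and carving
# virtual volumes from the pool. Names and sizes are illustrative.

class Pool:
    def __init__(self, name):
        self.name = name
        self.capacity = 0          # GiB contributed by external (or internal) LUNs
        self.allocated = 0

    def add_external_lun(self, size_gib):
        self.capacity += size_gib

    def create_volume(self, size_gib):
        if self.allocated + size_gib > self.capacity:
            raise ValueError("pool exhausted")
        self.allocated += size_gib
        return {"pool": self.name, "size_gib": size_gib}

pool = Pool("external_tier")
for _ in range(5):                 # the five 2 TiB LUNs claimed from the virtualized array
    pool.add_external_lun(2048)

host_luns = [pool.create_volume(1024) for _ in range(4)]   # four volumes presented to hosts
print(f"pool capacity: {pool.capacity} GiB, allocated: {pool.allocated} GiB")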

Putting It All Together

Exercise 9.1 recaps the previous sections in a more succinct format.

EXERCISE 9.1

Configuring Storage Virtualization

These steps outline a fairly standard process for configuring storage virtualization. The exact procedure in your environment may vary, so be sure to consult your array documentation. If in doubt, consult with your array vendor or channel partner. We walk through a high-level example here to help you understand what is going on.

  1. On the array being virtualized, perform the following tasks:
    a. Carve the backend storage into RAID-protected volumes. Normally, large volumes, such as 2 TB, are created and presented out of the front as FC LUNs.
    b. Present these LUNs on the front-end ports and mask them to the WWPNs of the virtualization controller.
    c. Configure any standard cache settings for these LUNs.
    d. Configure any host mode settings required by the virtualization controller. For example, if your virtualization controller emulates a Windows host, make sure you configure the host mode settings appropriately.
  2. On the virtualization controller, configure a number of front-end ports as external ports (virtualization mode). You should configure a minimum of two for redundancy.
  3. Now that the basic tasks have been performed on the virtualization controller as well as the array being virtualized, the arrays need to be connected. These connections can be either direct attach or SAN attach. The important thing is that they can see each other and communicate.
  4. Once the arrays are connected and can see each other, use the virtualization controller's GUI or CLI to discover the LUNs presented to it from the virtualized array.

These newly discovered volumes can then be configured and used just like any other disk resources in the virtualization controller, such as using them to create pools or exported LUNs.

Now that we've described how controller-based virtualization works, let's look more closely at some of the potential benefits, including the following:

  • Prolonging the life of existing assets
  • Adding advanced functionality to existing assets
  • Augmenting functionalities such as auto-tiering
  • Simplifying migration

Prolonging the Life of an Existing Array

One of the touted benefits of virtualizing an existing storage array is to prolong its life. The idea is that instead of turning off an old array (because it's out-of-date and doesn't support the latest advanced features you require), you decide to give it a brain transplant by virtualizing it behind one of your newer arrays, giving it a new lease of life.

This idea works fine on paper, but the reality tends to be somewhat more complicated. Even in a difficult economy with diminishing IT budgets, arrays are rarely virtualized to prolong their life.

Some of the complications include the following:

  • Having to keep the virtualized array on contractual maintenance commensurate with the environment it will continue to serve. It is not the best idea in the world to have the old virtualized array on a Next Business Day (NBD) support contract if it's still serving your tier 1 mission-critical apps, even if it is virtualized behind another newer array.
  • Multi-vendor support can be messy and plagued with finger-pointing, where each vendor blames the other and you get nowhere fast! This is made all the more challenging if the array you are virtualizing is old, as the vendor is usually not keen for you to keep old products on long-term maintenance and they generally stop releasing firmware updates and so on.

Experience has shown that although virtualizing an array to prolong its life works on paper, reality tends to be somewhat more messy.

Adding Functionality to the Virtualized Array

It's a lot more common for customers to buy a new tier 1 enterprise-class array along with a new tier 2/3 midrange array and to virtualize the midrange array behind the enterprise-class array. This kind of configuration allows the advanced features and intelligence of the tier 1 array to be extended to the capacity provided by the virtualized tier 2/3 array. Because the array performing the virtualization treats the capacity of the virtualized array the same way it treats internal disk, volumes on the virtualized array can be thin provisioned, deduplicated, replicated, snapshotted, tiered, used with hypervisor offloads, you name it.

This kind of configuration is quite popular in the real world.

Storage Virtualization and Auto-tiering

All good storage arrays that support controller-based virtualization will allow the capacity in a virtualized array to be a tier of storage that can be used by its auto-tiering algorithms. If your array supports this, then using the capacity of your virtualized array as a low tier of storage can make sound financial and technical sense.

Because the cost of putting a physical disk in a tier 1 array can make your eyes water—even if it's a low-performance disk such as 4 TB NL-SAS—there is solid financial merit in reserving the internal drive slots of a tier 1 array for high-performance drives. Aside from the high cost of populating them, internal drive slots also provide lower latency than capacity accessed in a virtualized array. So it stacks up both financially and technically to keep the internal slots for high-performance drives and use virtualized capacity as a cheap-and-deep tier.

This leads nicely to a multitier configuration that feels natural, which is shown in Figure 9.12.

FIGURE 9.12 Storage virtualization and auto-tiering


There is no issue having the hot extents of a volume on internal flash storage, the warm extents on internal high-performance SAS, and the coldest and least frequently accessed extents down on 7.2 K or 5.4 K NL-SAS on an externally virtualized array. This kind of configuration is widely deployed in the real world.
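
A hypothetical auto-tiering pass can be sketched as nothing more than ranking extents by how hot they are and placing the hottest on internal flash, the next warmest on internal SAS, and the rest on the externally virtualized NL-SAS tier. The tier slot counts, extent IDs, and access counts below are invented; real tiering engines are far more sophisticated.

```python
# A simplified sketch of the placement decision an auto-tiering engine makes.
# All inputs are invented illustrative values.

def place_extents(extent_heat, flash_slots, sas_slots):
    """extent_heat: {extent_id: access_count}. Returns {extent_id: tier}."""
    ranked = sorted(extent_heat, key=extent_heat.get, reverse=True)
    placement = {}
    for i, extent in enumerate(ranked):
        if i < flash_slots:
            placement[extent] = "internal flash"
        elif i < flash_slots + sas_slots:
            placement[extent] = "internal SAS"
        else:
            placement[extent] = "external NL-SAS (virtualized array)"
    return placement

heat = {"e1": 900, "e2": 40, "e3": 5, "e4": 310, "e5": 2}
for extent, tier in place_extents(heat, flash_slots=1, sas_slots=2).items():
    print(extent, "->", tier)
```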

Of course, this all assumes that your virtualization license doesn't cost so much that it destroys any business case for storage virtualization.

Additional Virtualization Considerations

There is no getting away from it: controller-based virtualization adds a layer of complexity to your configuration. This is not something to run away from, but you should definitely be aware of it before diving in head first.

Real World Scenario

Complications That Can Arise Due to Storage Virtualization

On the topic of complexity, one company was happily using storage virtualization until a senior storage administrator made a simple mistake that caused a major incident. The storage administrator was deleting old unused external volumes—volumes that were being virtualized. However, the virtualization controller gave volumes hexadecimal IDs, whereas the virtualized array gave volumes decimal IDs. To cut a long story short, the administrator got his hex and decimal numbers mixed up and deleted a whole load of the wrong volumes. This caused a lot of systems to lose their volumes, resulting in a long night of hard work for a lot of people. This minor complexity of hex and decimal numbering would not have existed in the environment if storage virtualization was not in use.

Depending on how your array implements storage virtualization and depending on how you configure it, you may be taking your estate down a one-way street that is hard to back out of. Beware of locking yourself into a design that is complicated to change if you later decide that storage virtualization is not for you.

If not designed and sized properly, performance can be a problem. For example, if you don't have enough cache in your virtualization controller to deal with the workload and additional capacity provided by any virtualized arrays behind it, you may be setting yourself up for performance problems followed by costly upgrades. Also, if the array you are virtualizing doesn't have enough drives to be able to cope with the data being thrown at it by the virtualization controller, it can quickly cause performance problems.
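
Some rough arithmetic (with invented numbers) shows how quickly this goes wrong: if hosts write into the virtualization controller faster than the virtualized array can destage, the write cache fills at the difference between the two rates.

```python
# Rough arithmetic (invented numbers) showing why an undersized virtualized
# array hurts: dirty data accumulates at host-write rate minus destage rate.

cache_gib        = 256     # usable write cache in the virtualization controller
host_write_mib_s = 1500    # aggregate host write rate at peak
destage_mib_s    = 600     # what the undersized virtualized array can absorb

fill_rate = host_write_mib_s - destage_mib_s            # MiB/s of dirty data accumulating
seconds_to_full = cache_gib * 1024 / fill_rate

print(f"Cache fills in about {seconds_to_full / 60:.1f} minutes at peak load")
```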

Real World Scenario

The Importance of Planning Your Storage Virtualization Properly

I previously worked with a company that hadn't sized or configured their live production storage virtualization solution appropriately. They made two fundamental mistakes.

First, they didn't provide enough performance resources (on this occasion, not enough disk drives) in the array being virtualized. This meant that at peak times, the virtualized array was woefully incapable of receiving the large amounts of data the virtualization controller was sending it. This resulted in data backing up in the virtualization controller's cache and quickly brought the virtualization controller and all hosts connected to it to a grinding halt.

Their problem wasn't helped by the fact that they had configured the cache on the virtualization controller to act as a write-back cache for the virtualized volumes. This meant that as host writes destined for the virtualized array came into the virtualization controller, they were ACKed very quickly and the cache started filling up. However, as the cache got fuller, it became important to get the data out of cache and onto disk, but the disk in this instance was on the virtualized array. And as we mentioned earlier, the virtualized array was undersized, so the data couldn't be pumped out of the cache fast enough, resulting in a huge cache-write-pending situation that brought the virtualization controller to its knees.

Finally, the cost of a virtualization setup can be an issue if you don't know what you're getting yourself into. Virtualization licenses rarely come for free, and their cost must be factored into any business case you make for controller-based virtualization!

Software-Defined Storage

Software-defined storage (SDS) is another form of storage virtualization. However, to make things more complicated here, there is no formal definition for software-defined storage, and it has been said by many that “there is no term as badly defined as software defined.”

For a lot of vendors, it's little more than a marketing buzzword, and every vendor is desperately trying to prove that their products are software defined. However, despite the lack of an agreed-upon definition, there are massive design and architectural changes afoot that promise to redefine not only storage, but the entire data center! And there absolutely are vendors out there shipping real software-defined storage products!

We will talk about some next-generation storage virtualization technologies that, in my opinion, qualify as software-defined storage. Either way, they are forms of storage virtualization that differ from what we have already talked about. They're very cool!

SDS Explained

In order to better explain software-defined storage, it's useful to first take a look at traditional non-software-defined storage. Then we have something to compare to.

Traditional Non-Software-Defined Storage

In traditional non-SDS storage arrays, the intelligence is tightly coupled with physical hardware. It is tightly coupled with the hardware controllers, which in turn are usually tightly coupled with the physical drives and the rest of the backend. Put another way, the firmware, which provides the intelligence and all the data services, will run only on specific hardware controllers from the same vendor. You can't take the microcode from one vendor's array and run it on the controller hardware of another vendor's array. You can't even take it and run it on industry-standard server hardware platforms such as HP ProLiant and Cisco UCS.

In traditional non-software-defined storage, the software and the hardware are married in a strictly monogamous relationship!


When talking about intelligence, we're referring to everything—high availability, snapshots, replication, RAID, deduplication, thin provisioning, hypervisor offloads, you name it. In traditional non-software-defined storage architectures, it's all implemented in microcode that will run only on specific hardware. In software-defined storage, this intelligence can run on any industry-standard hardware or hypervisor.

A lot of traditional arrays provide high availability (HA) via dual-controller architectures. Take away the dual controllers—specific hardware with specific controller interconnects—and you can no longer implement HA.

As if the integration between intelligence and hardware wasn't already tight enough for traditional arrays, some take it a step further and implement some of their intelligence in the hardware, as is the case with custom silicon such as ASICs and FPGAs. This is hardware and software integration at its tightest.

Basically, with traditional storage arrays, the software is useless without the hardware, and the hardware is useless without the software. The two are welded together at the factory!

This tight integration makes it hard to take these architectures and rework them for more-flexible SDS solutions. But, hey, who says they need to? These tightly coupled architectures are still running businesses across the world.

The SDS Way

SDS is the polar opposite of traditional arrays and their tightly coupled intelligence and hardware. SDS is all about decoupling the hardware and the intelligence. It takes all of the value out of the hardware and shovels it into software. That makes the hardware an almost irrelevant, cheap commodity.

From an architectural perspective, software-defined storage sees storage in three layers—orchestration, data services, and hardware—as shown in Figure 9.13. None of these layers are tightly integrated with each other, meaning that you can mix and match all three, giving you more flexibility in choosing your hardware, your data services, and your orchestration technologies.

FIGURE 9.13 Three layers of software-defined storage



You may hear people refer to software-defined storage (SDS) as separating the control plane from the data plane. This terminology is more popular and better suited to the network world, where the concepts of control planes and data planes are more widely used. Also, in the networking world, there is more control-plane commonality, as devices from different vendors interact with each other far more often than in the storage world. However, the principle is the same; it refers to this decoupling of intelligence from hardware.

There are several key aspects to SDS. One of the most prominent is the use of virtual storage appliances (VSAs), which decouple the storage intelligence from dedicated hardware. Another is that SDS is driven by open APIs. SDS is also scale-out by nature and embraces the cloud. Let's look at each of these points a little more closely.

One popular example of ripping the intelligence out of the hardware—which is certainly a form of software-defined storage—is the virtual storage appliance (VSA). A VSA is an instance of a fully fledged intelligent storage controller implemented as pure software in the form of a virtual machine (VM) that runs on industry-standard hypervisors, which in turn run on industry-standard x64 hardware.

VSAs can utilize local disks in the physical servers they run on. They can also utilize capacity from DAS disk trays or even shared storage such as iSCSI and FC. Many of the VSA vendors recommend large, dumb JBOD (just a bunch of disks) storage attached to them for capacity.

Running storage controllers—the intelligence—as virtual machines opens up a whole new world of possibilities. For example, if you need to add resources to a VSA, just have the hypervisor assign more CPU (vCPU) or RAM (vRAM) to it. When you no longer need the additional resources assigned to the VSA, simply reduce the amount of vCPU and vRAM assigned. We could even go as far as to say that the entire stack (compute, storage, networking, applications, and so on) could be intelligent enough to ramp up resources to the VSA at times of high demand on the VSA and then to take them away again when the VSA no longer needs them. A simple example might be an overnight post-process deduplication task that is CPU intensive. When the job kicks in, either run an associated script to add more vRAM and vCPU or allow the system to dynamically assign the resources. Then when the dedupe task finishes, take the resources back. Now that's what I call dynamic!
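
A toy policy for this kind of elasticity might look like the sketch below. It only makes the sizing decision; the actual resize would be issued through whatever hypervisor or orchestration API is in use, which is deliberately left out here. All numbers and names are assumptions for illustration.

```python
# A toy policy sketch: decide how much vCPU/vRAM a VSA should have based on
# what it is doing. The real resize call (hypervisor-specific) is omitted.

BASELINE     = {"vcpu": 4, "vram_gib": 16}
DEDUPE_BOOST = {"vcpu": 8, "vram_gib": 32}

def desired_vsa_resources(dedupe_job_running, cpu_utilisation):
    """Return the resource allocation the VSA should be given right now."""
    if dedupe_job_running or cpu_utilisation > 0.85:
        return DEDUPE_BOOST          # ramp up for the post-process dedupe window
    return BASELINE                  # hand the resources back afterwards

print(desired_vsa_resources(dedupe_job_running=True,  cpu_utilisation=0.40))
print(desired_vsa_resources(dedupe_job_running=False, cpu_utilisation=0.30))
```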

These VSAs can run on the same hypervisor instance as your hosts and applications, bringing the storage closer to the application.

Established vendors, such as HP, as well as startup vendors including the likes of Nutanix, Nexenta, ScaleIO, SimpliVity, and others, have technologies that utilize the VSA software-defined storage model.

SDS is also API driven. In fact, more than that, it is open API driven, including RESTful APIs. RESTful APIs use the following simple verbs to perform functions: GET, POST, PUT, and DELETE. A major aim with SDS is for users to interact with the system via APIs, as well as the system itself using APIs to make its own internal calls. It is all about the APIs with software-defined storage. Contrast this with more-traditional non-SDS storage, where administration was handled via a CLI and usually very poor GUIs.
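
As a hedged illustration of the RESTful style—where the endpoint URL, resource names, payload fields, and token are entirely hypothetical, since every SDS product defines its own API—creating, listing, resizing, and deleting a volume might look something like this:

```python
# Hypothetical REST calls illustrating the four verbs; the endpoint and
# payloads are invented and will differ for every real SDS product.

import requests

BASE = "https://sds.example.local/api/v1"           # hypothetical management endpoint
HEADERS = {"Authorization": "Bearer <token>"}       # placeholder credential

# POST: create a new volume
requests.post(f"{BASE}/volumes", json={"name": "vol01", "size_gib": 100}, headers=HEADERS)

# GET: list existing volumes
volumes = requests.get(f"{BASE}/volumes", headers=HEADERS).json()

# PUT: change a volume's properties (here, expand it)
requests.put(f"{BASE}/volumes/vol01", json={"size_gib": 200}, headers=HEADERS)

# DELETE: remove the volume
requests.delete(f"{BASE}/volumes/vol01", headers=HEADERS)
```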

SDS is scale-out by nature. Take the simple VSA example again. If you need more storage controllers, just spin up more VSAs. These can be managed by the common industry standard APIs mentioned earlier, allowing for simpler, more centralized management. It is no longer necessary to log on to each array in order to configure it.

SDS embraces the cloud. As the storage layer is already abstracted and physically removed from the intelligence, it is simpler to implement cloud as a tier or target for storage. Traditional arrays tend to be designed with the storage medium close to the controllers, and the software intelligence on the controllers understands minute detail about the storage medium. SDS takes a totally different approach, where the storage hardware is abstracted, making cloud integration massively simpler and more fundamental. SDS solutions embrace cloud APIs such as REST and offer integrations such as Amazon S3 and OpenStack Swift.

Another benefit of the software-defined model is that the rate of software innovation is far higher than that of hardware innovation. Basically, it is easier, faster, and cheaper to innovate in software than it is to innovate in hardware. So why slow down your innovation cycles by pinning innovation to hardware upgrades and releases?

Will Software-Defined Storage Take Over the World?

While the concept and some of the implementations of SDS are really interesting and absolutely have their place, they most likely will exist alongside traditional storage arrays for a very long time.

After all, it was not that long ago that traditional, tightly coupled architectures were all the rage, and terms like appliance, turnkey, and prepackaged were in vogue. Now they are blasphemous if you listen to those who have planted their flag in the SDS world. But let's not forget that the traditional approach of fully testing, burning in, and slapping a “designed for XYZ” logo on the side of a traditional array has its advantages. For one thing, you know that the vendor has tested the solution to within an inch of its life, giving you the confidence to run your tier 1 business-critical applications on it. You also have a single vendor to deal with when things do go wrong, instead of having several vendors all blaming each other.


Interestingly enough, some vendors ship SDS solutions as software-only. Others sell SDS solutions as software and hardware packages that include both the controller intelligence and the hardware. The main reason for this is supportability. Although the software (intelligence) and hardware are sold as a single package here, they do not have to be. This is purely to make support simpler because the vendor will have extensively tested the software running on their hardware and will have it running in their labs to assist with support calls.

At the end of the day, established companies around the world aren't going to give up on something that has served them well for many years. They will likely start testing the water with SDS solutions and implement them in lab and development environments before trusting their mission-critical systems to them. Newer companies, which are more likely to be highly virtualized with zero investment in existing storage technologies, are far more likely to take a more adventurous approach and place their trust in SDS-based solutions.

As with all things, do your homework and try before you buy!

Chapter Essentials

Host-Based Storage Virtualization Host-based virtualization is usually in the form of a logical volume manager on a host that aggregates and abstracts volumes into virtual volumes called logical volumes. Volume managers also provide advanced storage features such as snapshots and replication, but they are limited in scalability, as they are tied to a single host. Most people don't consider host-based storage virtualization as a true form of storage virtualization.

Network-Based (SAN-Based) Storage Virtualization Network-based virtualization is more of a concept than a reality these days, as most of the products in this space didn't take off and have been retired.

Controller-Based Storage Virtualization This is by far the most common form of storage virtualization and consists of an intelligent storage array (or just storage controllers) that can virtualize the resources of a downstream array. Controller-based storage virtualization is a form of in-band, block-based virtualization.

Software-Defined Storage SDS is storage controller intelligence implemented solely in software, installed on industry-standard hypervisors running on industry-standard server hardware, that virtualizes and adds intelligence to underlying commodity storage.

Summary

In this chapter we covered the popular implementations of storage virtualization seen in the real world and concentrated on the most popular form—controller-based virtualization. We covered some of the major use cases for storage virtualization, as well as some of the potential problems you'll encounter if you don't get it right. We then finished the chapter with a discussion of the hot new topic of software-defined storage (SDS) and compared and contrasted it with the more traditional approach of tightly integrating storage intelligence with storage hardware.
