Chapter 13

Managing Data Storage in the Cloud

IN THIS CHAPTER

  • Getting acquainted with cloud storage
  • Implementing hybrid cloud storage
  • Preparing for change

Data storage is a prime concern for businesses and can be complex and expensive. There are many considerations when it comes to data storage. A small sampling of the typical storage questions includes

  • How often will the data be accessed?
  • When should the data be deleted?
  • Where geographically should the data be stored?
  • What is the data backup plan?

These concerns don’t go away within a cloud environment. In fact, in many ways the movement to the cloud can complicate data storage because there are many more choices in the cloud, the scale of data may be much greater, and managing data across on-premises systems and private, public, and hybrid clouds can be much more challenging.

While many businesses are investigating a multicloud approach to data storage, many questions need to be answered. When should you keep data in your private cloud or data center? When is it appropriate to store data in a public cloud environment? How are the storage options going to impact important issues such as latency, availability, security, and governance?

In this chapter, we explore the evolution of storage requirements leading to the hybrid cloud strategy approach. In addition, we consider how companies can streamline their organizations’ IT capabilities by leveraging both public and private cloud storage options.

Understanding Cloud Storage Fundamentals

The design of cloud storage is similar to that of other cloud services in terms of self-service, elasticity, and scalability. Cloud storage is based on abstracting physical storage behind a well-defined interface so it can be managed in a self-service manner. In addition, cloud storage needs to use an architecture that isolates each consumer’s cloud data from other consumers.

Of course, one of the most important characteristics of cloud storage is how it integrates with IaaS, PaaS, and SaaS clouds. In a traditional server, hard disks connect directly to the machine through an interface such as SATA, which replaced the older IDE standard, or SAS, the successor to parallel SCSI. However, in the cloud, scalable storage services are typically not connected directly to physical servers, but use interfaces that transfer data over Ethernet or other communication technologies. We explore how storage connects with virtual machines in the cloud in the upcoming “Elements of storage” sidebar.

The following sections address four key fundamentals of cloud storage: access protocols, usage scenarios, functions, and benefits.

Cloud storage access protocols

One important issue in cloud storage is the speed and ease of accessing data when it’s needed. In order for cloud storage to be a viable alternative to on-premises data storage, you need to be able to access your data at a competitive cost and at a speed that is appropriate for the situation. Today, there are four types of cloud storage access methods:

  • Web services application programming interfaces (APIs): These use RESTful APIs (following the principles of Representational State Transfer) to integrate with applications.
  • File-based protocols: These protocols are used to transfer files and provide integration independent of the application being connected. They also generally provide faster service than web service APIs. Common examples are
    • Network File System (NFS)
    • Common Internet File System (CIFS)
    • File Transfer Protocol (FTP)
  • Block-based APIs: These use iSCSI (Internet SCSI) to connect an application to storage middleware that supports services such as storage for databases, data replication, and data reduction.
  • Web-based Distributed Authoring and Versioning (WebDAV): This is based on Hypertext Transfer Protocol (HTTP).

The most common method for accessing cloud storage is through RESTful web service APIs. Cloud storage vendors implement this technology because it’s dynamic and simple to use in the cloud. In addition, because virtualization means that a request may be handled by any of many interchangeable servers, cloud environments favor a stateless access protocol, one in which each request carries all the information needed to process it and the server keeps no session state between requests. Web service APIs support this requirement for statelessness. This access method is used by Amazon Simple Storage Service (Amazon S3), Microsoft Azure (formerly Windows Azure), and others. However, web service APIs need to be integrated with a specific application when used for cloud storage, which can create some challenges. If you want to avoid the need to integrate with an application, file-based protocols and block-based APIs can be used as alternative access methods. Another connection protocol is WebDAV, an extension of HTTP that some providers offer as a storage interface.
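
To make the idea of a stateless web service request more concrete, here is a minimal Python sketch that builds (but does not send) an HTTP PUT for uploading one object. The endpoint, bucket name, and bearer token are hypothetical placeholders; real providers such as Amazon S3 define their own URL layouts and authentication schemes, so treat this as an illustration of the shape of a request rather than any vendor’s API.

```python
import urllib.request


def build_put_request(base_url: str, bucket: str, key: str,
                      data: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that uploads one object.

    Everything the server needs (target, credentials, payload) travels
    with the request itself, which is what makes the call stateless.
    """
    url = f"{base_url}/{bucket}/{key}"
    headers = {
        # Hypothetical bearer-token scheme, used only for illustration.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Hypothetical endpoint and names; nothing is sent over the network here.
req = build_put_request("https://storage.example.com", "reports",
                        "2024/q1.csv", b"a,b\n1,2\n", "HYPOTHETICAL-TOKEN")
print(req.get_method(), req.full_url)
```

Because no session is set up beforehand, any server behind the provider’s endpoint could handle this request, which is why the approach fits virtualized environments.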

Delivery options for cloud storage

How will your cloud provider deliver your storage capability? You can use an appliance within your data center, or connect to a public or remote storage service.

Although latency is a major issue for primary (tier 1) cloud storage, particularly for data used frequently, vendors are currently offering a different class of products called hybrid cloud storage solutions that may ultimately address primary storage. (Because we talk about hybrid clouds in general throughout this book, some of the terminology may be confusing, but bear with us.) The idea is to use local and cloud-based resources to address performance issues associated with storage in the cloud. Generally, these offerings consist of two things:

  • An appliance that is a physical or virtual server where the hardware and software are preconfigured so the user doesn’t have to understand the details
  • A connection to a remote storage service

The appliance intelligently handles the movement of data between the local storage and the cloud; to the end user, all the data seems to be in one place.

A cache is a block of memory for temporary storage on the appliance that provides a high-speed buffer between your client and the cloud service. The cache uses a host of algorithms to keep the most frequently used data on the local, expensive hardware. For read requests, these algorithms consider attributes such as the age of the data, time since last accessed, time since last updated, and so on. For write requests, the appliance may write the data locally on the machine and then burst it out to the cloud storage provider.
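
As a rough illustration of the read side of such a cache, the following Python sketch keeps recently read items locally and evicts the least recently used one when space runs out. Least-recently-used (LRU) eviction is just one simple policy that relies on time since last access; it is our choice for the example, not a description of what any particular appliance does. The class name and the `fetch_from_cloud` callback are hypothetical.

```python
from collections import OrderedDict


class ApplianceCache:
    """Toy read cache with least-recently-used (LRU) eviction."""

    def __init__(self, capacity, fetch_from_cloud):
        self.capacity = capacity
        self.fetch_from_cloud = fetch_from_cloud  # called only on a miss
        self._items = OrderedDict()               # oldest access first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._items:
            self._items.move_to_end(key)          # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self.fetch_from_cloud(key)        # slow path: go to the cloud
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)       # evict least recently used
        return value


# A dictionary stands in for the remote storage service.
cloud = {"a": b"alpha", "b": b"beta", "c": b"gamma"}
cache = ApplianceCache(capacity=2, fetch_from_cloud=cloud.__getitem__)
for key in ["a", "b", "a", "c", "b"]:
    cache.read(key)
print(cache.hits, cache.misses)
```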

The data is generally encrypted when it’s transported. Before data is sent to the provider, it’s typically deduplicated (duplicate copies of the same data are removed), which reduces the amount of data that has to be transferred and stored.
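
Deduplication can be sketched in a few lines. The Python example below splits data into fixed-size chunks, identifies each chunk by its SHA-256 hash, and stores a chunk only the first time it is seen. Fixed-size chunking and the function names are assumptions made for illustration; commercial products often use more sophisticated variable-size chunking.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int, store: dict) -> list:
    """Split data into chunks, keep each unique chunk once, return a recipe."""
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # stored only if not seen before
        recipe.append(digest)             # the order needed to rebuild the data
    return recipe


def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its recipe of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)


store = {}
payload = b"ABCD" * 3 + b"WXYZ"           # three identical chunks plus one more
recipe = deduplicate(payload, 4, store)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

Only the unique chunks need to cross the network; the recipe is enough to reassemble the original data later.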

Functions of cloud storage

The type of information you need to store and how quickly you need to access data both have an impact on the type of storage you will use. You can use policy-based replication to enable more granular control over how and where data is stored.

Cloud storage can serve multiple purposes:

  • General-purpose storage for day-to-day or periodic use
  • Data protection and continuity, which can include data replication and backup and restore functionality
  • Archive and records management, meaning recoverable long-term data retention to support compliance and regulatory requirements
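
To show what policy-based replication might look like in practice, here is a small Python sketch that maps each of the purposes above to a replication policy and works out where the copies should go. The policy fields, copy counts, and region names are all hypothetical; a real provider exposes its own policy settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    copies: int        # how many replicas to keep
    regions: tuple     # where replicas are allowed to live


# Hypothetical policies, one per storage purpose.
POLICIES = {
    "general": ReplicationPolicy(copies=2, regions=("us-east",)),
    "backup": ReplicationPolicy(copies=3, regions=("us-east", "eu-west")),
    "archive": ReplicationPolicy(copies=2, regions=("eu-west",)),
}


def placement(data_class: str) -> list:
    """Return the region for each replica, cycling through allowed regions."""
    policy = POLICIES[data_class]
    return [policy.regions[i % len(policy.regions)] for i in range(policy.copies)]


print(placement("backup"))
```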

Benefits of cloud storage

Some of the benefits of cloud storage include

  • Agility: The elastic nature of the cloud enables you to gain potentially unlimited storage in an on-demand model.
  • Fewer physical devices to purchase and maintain: When you’re storing data in a data center, you have to plan for the servers that will be part of this storage solution. This means you need to plan for purchasing the machines and maintaining them during their life cycle. Additionally, you must make sure that you have enough space and can meet power requirements. In the cloud, you don’t have to purchase physical devices or deal with environmental issues. The cloud provider should do this for you (but it pays to do your homework on the services that your provider offers).
  • Disaster recovery: The cloud can serve as a good replacement for tape or other backups and can minimize concerns about your own data center capacity to support your backups. Instead of continuing to expand your on-premises storage, your information can be backed up to the cloud. If your systems go down, you can retrieve your data from the cloud.
  • Cost: Although direct-attached storage (DAS) is relatively inexpensive, network-attached storage (NAS) and storage area network (SAN) devices require significant capital expenditures. The cloud storage model is based on usage, so, as with a metered telephone plan, you generally pay only for what you use.

Deploying Hybrid Cloud Storage

You might consider various scenarios for a storage architecture when you deploy a cloud. Remember that in a hybrid and multicloud model, some of your resources and assets might be on-premises while others will live in one or more clouds. Here are some possible scenarios:

  • Your applications and data are on-premises, and your tier 2 and 3 data is stored in a public cloud.
  • Some of your applications are in a public cloud, your data is on-premises, and your storage is in a public cloud.
  • You have a private cloud within your enterprise, and you’re managing a private cloud that’s hosted elsewhere.
  • Some of your applications are in a public cloud along with your data. Some of your applications and data are on-premises. Your storage is both in the cloud and on-premises.

You get the idea. In a hybrid world, there can be multiple permutations in terms of how you architect your applications, data, and storage. So, here’s what you need to be thinking about in terms of storage as you deploy a hybrid cloud.

Interfaces

To store and retrieve data, your applications need an API that connects your local system to the cloud-based storage system. Users should be able to send data to the cloud storage device and access data from it. You need to ensure that the APIs the cloud provider uses are interoperable with your own, because there are few standards for cloud storage. (See Chapter 5 for more on standards.) In other words, vendors like to use their own APIs.

According to experts, what users want is a standard for storage interfaces that is as ubiquitous as TCP/IP is for networking. However, this may be difficult because each vendor may define its own APIs based on SOAP and REST. So, for the near term, vendors’ APIs may be similar, but they won’t be completely interoperable.

Security

Security is always a concern. Make sure security measures are in place when data is transferred between cloud storage and on-premises locations, and that access-control measures protect the data once it’s stored. Files need to be secure while in storage, too.

Reliability

Data integrity is also a concern in the hybrid cloud environment. You need to make sure that your data gets from point A to point B intact, and that it stays intact while it’s in storage. Your cloud provider might also index your data; if those indexes are corrupted, you can lose your data. We talk about the why and how of security in Chapter 15.
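
One common way to check that data arrived unchanged is to compare checksums computed before and after the transfer. The Python sketch below uses SHA-256 from the standard library; the function names are ours, and how a given provider exposes or verifies checksums is something to confirm in its documentation.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected: str) -> bool:
    """True only if the data still matches the checksum recorded earlier."""
    return checksum(data) == expected


original = b"quarterly results"
recorded = checksum(original)            # computed before upload (point A)

print(verify(original, recorded))              # intact copy at point B
print(verify(b"quarterly resultz", recorded))  # a single changed byte
```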

Business continuity

Planned and even unplanned downtime can cause problems for your business. Your storage provider needs to include snapshots, mirroring, and backups, as well as rapid recovery so that if the provider’s system goes down, you’re covered. You also need to make sure that the right service level agreements are in place (see Chapter 19).

Reporting and chargeback

Because cloud storage is a pay-as-you-go model, you need to know what your bill will be at the end of the billing cycle. The bill will include storage costs as well as any transaction charges the provider applies.
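
A back-of-the-envelope estimate of such a bill might look like the following Python sketch. The prices are made-up placeholders, not any provider’s actual rates, and real bills may include other line items that this simple model leaves out.

```python
def estimate_monthly_bill(stored_gb: float, requests: int,
                          price_per_gb: float,
                          price_per_1000_requests: float) -> dict:
    """Estimate one month's bill from storage volume and request count."""
    storage = round(stored_gb * price_per_gb, 2)
    transactions = round(requests / 1000 * price_per_1000_requests, 2)
    return {
        "storage": storage,
        "transactions": transactions,
        "total": round(storage + transactions, 2),
    }


# Hypothetical usage and prices, for illustration only.
bill = estimate_monthly_bill(stored_gb=5000, requests=2_000_000,
                             price_per_gb=0.02, price_per_1000_requests=0.005)
print(bill)
```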

Management

In a hybrid cloud environment, if you choose to store some of your data on-premises and some in the cloud, you’ll need to be able to manage the environments together. How will service levels be monitored and managed across these environments? How will you know if there’s a problem with your storage provider? It would be nice to be able to manage all of this together, in one spot, in one single “pane of glass.” However, the industry is not there yet, because it’s continuing to evolve its offerings in this space. See Chapter 12 for more on managing a hybrid and multicloud environment.

Performance/latency

Once you put your data in the cloud, you are subject to latency issues (delays that occur as data is transferred and processed). The questions to ask here are these (which we explore more deeply in the next section):

  • How quickly will your applications need data?
  • What are the risks if data isn’t available in a reasonable time frame?
  • Will your applications time out and fail as a result?
  • Does the cloud storage provider’s network speed match or exceed your own?
  • Are there any bottlenecks?

Data and network speed

Once you start moving your data into the cloud, you may need to address latency concerns, depending on the amount of data you’re storing there and how often you need to access it. In a hybrid model, you’re not just using your LAN or WAN for data access; you’re now going across the Internet to reach your data. So, you really need to think about the kind of data you’re willing to store in the cloud based on how often you need to get to it and the network speed that you’re dealing with. Although storage may be unlimited (for a price) in the cloud, the network is not. Two issues you need to consider are amount of data and network speed.

Amount of data

Say that you want to store a large amount of tier 3 data with a cloud storage provider. It may not make sense to transport the data over the Internet. Remember, a truck full of storage media can have greater effective bandwidth than a network link. It might make sense to provide the data to the vendor in another way, such as shipping physical media. Calculate transfer times based on the amount of data you have and the speed of your connection, and then decide. That calculation leads to the next point: network speed.
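
The transfer-time calculation is simple enough to sketch in Python. The data volume and link speeds below are example figures we picked, and the model assumes the link is fully used the whole time, which real networks rarely manage, so treat the results as best-case numbers.

```python
TB = 10**12  # bytes in a (decimal) terabyte


def transfer_days(data_bytes: float, link_bits_per_second: float) -> float:
    """Best-case days needed to move the data over a fully used link."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds in a day


# Example figures: 100 TB over a 100 Mbps link and over a 1 Gbps link.
slow = transfer_days(100 * TB, 100e6)
fast = transfer_days(100 * TB, 1e9)
print(f"100 Mbps: {slow:.1f} days, 1 Gbps: {fast:.1f} days")
```

If the answer comes out in weeks or months, shipping the data physically starts to look attractive.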

Network speeds

Bandwidth is just one element that contributes to network speed. Latency is another one. Latency refers to a delay in processing data as it moves from one part of a network to another. For example, when a singer’s mouth moves on a video but the words don’t seem to match, that’s because of latency. Low latency is when there’s a short delay; high latency is when there’s a longer delay. So, although the speed of your network should be fixed according to the bandwidth of the network connection, it doesn’t always work that way because of latency. A number of factors contribute to network latency, including data collisions, contention for bandwidth, encryption, and delays in routers and computer hardware, to name a few.
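
To see why bandwidth alone doesn’t determine speed, consider this simplified Python model in which objects are fetched one after another and each fetch pays one round-trip delay plus its transfer time. The object count, object size, and delays are example values, and the one-request-at-a-time assumption is a deliberate simplification; real applications often overlap requests.

```python
def fetch_seconds(num_objects: int, object_bytes: int,
                  bandwidth_bps: float, round_trip_seconds: float) -> float:
    """Total time to fetch objects sequentially over one connection."""
    per_object = round_trip_seconds + object_bytes * 8 / bandwidth_bps
    return num_objects * per_object


# 10,000 objects of 100 KB each on a 1 Gbps link, with two example delays.
local = fetch_seconds(10_000, 100_000, 1e9, 0.001)   # 1 ms, LAN-like
remote = fetch_seconds(10_000, 100_000, 1e9, 0.050)  # 50 ms, Internet-like
print(f"local: {local:.0f} s, remote: {remote:.0f} s")
```

The bandwidth is identical in both cases, yet the higher-latency path takes many times longer because the delay is paid on every request.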

A good corporate LAN/WAN is a gigabit network, which means that your internal network might be faster than the Internet. So, after your information gets to the Internet, you may experience a bottleneck as the information moves to your provider. This bottleneck will affect how quickly you can get your data off your premises and, more importantly, back to your premises. If you have a petabyte of data in a provider’s cloud and want to analyze it on-premises, be aware that it’s going to take a while to get the data back. You need to consider this issue when planning your hybrid deployment. For instance, you may decide not to store tier 1 data in the cloud because network speeds may not match your requirements for use.

Planning for Cloud Growth and Change

Planning for cloud growth and change involves understanding your data, devising a strategy to deal with the growth, and choosing a provider. We discuss each step in this section.

Understanding your data

In a cloud environment, as with any environment, you need to understand your rate of data generation. Of course, data is being generated at faster and faster rates with the advent of newer technologies like the Internet of Things (IoT) and social data, as well as the technical ability to store all of this data.

In a multicloud world, this data is being generated in multiple cloud environments and on-premises.
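
Once you have a rough growth rate, projecting future storage needs is a compound-growth calculation. The starting volume and the 40 percent annual growth rate in this Python sketch are invented for illustration; substitute your own measurements.

```python
def project_storage(current_tb: float, annual_growth_rate: float,
                    years: int) -> list:
    """Projected storage (TB) for each year, starting with the current year."""
    return [round(current_tb * (1 + annual_growth_rate) ** year, 1)
            for year in range(years + 1)]


# Hypothetical inputs: 100 TB today, growing 40 percent per year.
projection = project_storage(100, 0.40, 3)
print(projection)
```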

Devising a growth strategy

The second step is to devise a strategy to determine how you’re going to deal with the growth of data and the move to the cloud. As part of this strategy, you need to understand how much storage growth you want to support internally and how much you can support outside your corporate walls. You need to do an analysis that compares your investment in corporate infrastructure to a potential cloud strategy. This analysis includes the following:

  • What kind of applications and data you’re willing to store in the cloud versus what you want to keep on-premises: This includes data issues associated with regulatory compliance and other risk factors. Although you may be thinking only about archive and backup applications, experts advise considering other applications that may not be mission-critical. However, make sure that your provider can adhere to any regulatory or compliance requirements your company has in place. You also need to make sure the provider is willing to adapt if requirements change in your industry.
  • A risk assessment: Every company has its own tolerance level when it comes to risk. Aside from technology risks, you may also want to consider how your processes might change in the cloud. For example, you need to determine whether there are any people, processes, or cultural issues to consider.
  • On-site data storage costs: Include all costs associated with on-site data storage: hardware, software, maintenance, environmental costs (such as electricity), and so on.
  • Cloud storage costs: Include all costs associated with cloud storage, including data migration costs and storage costs associated with these applications and data.
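
The cost side of this analysis can be sketched as a simple comparison over a planning period. All figures in the Python example below are placeholders, and the cost categories are limited to the ones listed above; your own analysis will likely have more line items.

```python
def on_site_cost(hardware: int, software: int, annual_maintenance: int,
                 annual_power: int, years: int) -> int:
    """Up-front purchases plus recurring costs over the planning period."""
    return hardware + software + years * (annual_maintenance + annual_power)


def cloud_cost(migration: int, monthly_storage: int, years: int) -> int:
    """One-time migration cost plus monthly storage charges."""
    return migration + monthly_storage * 12 * years


# Hypothetical three-year comparison.
on_site = on_site_cost(hardware=120_000, software=30_000,
                       annual_maintenance=15_000, annual_power=5_000, years=3)
cloud = cloud_cost(migration=10_000, monthly_storage=4_000, years=3)
print(f"on-site: {on_site:,}  cloud: {cloud:,}")
```

Running the comparison for several time horizons can show whether, and when, one option overtakes the other.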

Choosing a provider

When you’ve decided that you want some of your applications and data in the cloud, you need to pick your provider with due diligence. Read the fine print in terms of costs associated with the storage and what contract termination looks like. You also want to make sure that the provider puts recovery-time objectives in place, in case there’s a problem with its service. Also, make sure the vendor you select is viable. For example, what happens to your storage if your service provider goes out of business? Will you be able to recover your assets?

Experts also advise ensuring that your contract includes an escape clause, in case your provider doesn’t perform as advertised.

These concerns boil down to trust and doing your homework. Do you trust your vendor and have you put the right contracts in place to protect yourself? Have you done your homework? If you haven’t, you need to.

The hybrid cloud storage model offers many advantages to organizations that want to maintain the security of storing their highly confidential data within a private cloud and then selectively store data with fewer confidentiality requirements in the public cloud. Ultimately, the right mix between public and private environments is one that maximizes cost savings while maintaining security and geographic storage requirements.
