Chapter 4

Managing a Hybrid and Multicloud Environment

IN THIS CHAPTER

  • Deconstructing cloud concepts
  • Discovering resource pools/cloud models and services
  • Evaluating the role of the data center
  • Finding out how the public cloud fits and when the private cloud shines

Management of a hybrid and multicloud environment is a complex topic as it spans computing activities in on-premises data centers, private and hybrid clouds, and numerous public cloud environments. In the past, computing resources were physical and highly siloed. Therefore, managing individual systems combined with their workloads made sense.

But times have changed with the advent of cloud computing. In this distributed world, many applications are independent of their underlying infrastructure. Organizations no longer look at individual computing resources as stand-alone systems. Rather, the combination of the data center, private and public clouds, and Software as a Service (SaaS) applications now defines computing. The users of computing services within your business no longer distinguish between a workload running in a data center and a service running in a public cloud. Users simply want everything to work predictably. In this chapter, we discuss what it means to manage computing in the era of the hybrid cloud.

What Are You Managing?

So, what are you actually managing in a multicloud environment? What are the considerations? Not only do you have to ensure that a service is up and running, but you also have to address the diverse goals, roles, resources, and other issues that the environment must support. Take a step back and look at the types of services you need to manage in this world of hybrid computing. We have divided the capabilities that you’ll need to manage into five categories:

  • SaaS applications
  • External cloud resources
  • Internal cloud resources
  • Internal services
  • External services

Managing SaaS Applications

Increasingly, businesses are turning to SaaS applications that are owned and operated by third-party vendors. It’s not uncommon for a single business to support hundreds of SaaS applications. However, businesses are beginning to bring some level of control to their use of SaaS applications. IT management has to contend with several key problems in managing SaaS applications.

Anyone with a browser can sign up for a license and start using a SaaS application. For example, to exchange large files, it’s not uncommon for well-meaning employees to use a file-sharing application like Box or Dropbox to circumvent email attachment limits. With the growing popularity of SaaS applications, a business can quickly lose control.

Remember Not all SaaS applications are equal. Some well-designed SaaS applications provide value to the business along with governance and auditability for administrators. Other SaaS applications, however, are designed for consumers and don’t provide the visibility that businesses require. Additionally, an application may lack well-defined interfaces that allow it to connect easily to other corporate applications. Furthermore, applications that haven’t been vetted may contain vulnerabilities that put the business at risk.

Needless to say, these and other reasons are why a business needs oversight of the use of SaaS applications. Many corporate IT organizations have long battled with business units for control of computing resources. In a practice often called shadow IT, business units use a variety of SaaS applications and tools without the knowledge of IT. The cloud has accelerated this process.

Smart IT organizations have learned how to collaborate with business units so that business management can use the tools that are best suited for their task while protecting the business from risk. These organizations often will set up a working group consisting of IT leaders and business leaders to set parameters for what is acceptable. For example, everyone can agree on a set of SaaS applications that both meet their day-to-day needs and are fully vetted for security and reliability. One of the benefits is that IT or the procurement organization can negotiate for both price and support.

Tip But a better solution is for IT to proactively research and approve a full set of tools that the company will use and to create a library of those tools in a convenient place where they’re easy to find and use. This solution allows employees to self-serve only approved items from the SaaS application library. IT should go further still and encourage employees to engage with the team when they have a need that isn’t met by the standard tools. Rather than push back or delay, IT would be more successful by offering a bonus for finding unmet needs and then committing to solve the problem quickly. (We talk about service level agreements later in this chapter.)

Generally, SaaS applications are managed by the organizations that created them, so IT is probably not responsible for managing external SaaS applications. However, IT isn’t off the hook. The users of the SaaS application have little interest in excuses. Users simply want to know that an application is operational at all times. When IT operations explains that the SaaS vendor is responsible for the management of that application, the user is rarely satisfied. They won’t make a distinction between an application that resides in the data center or a private cloud and a third-party application managed in a public cloud.

For example, suppose a popular SaaS application is hosted in a public cloud, such as Amazon Web Services or Google Cloud. The public cloud experiences an outage that lasts for several hours. Once users begin calling IT to complain, IT contacts the SaaS vendor, who explains that the problem is with the cloud provider. The problem, of course, is that the SaaS application’s users will have no sympathy and will demand action.

Optimizing SaaS Management

Once a business has set the ground rules for using SaaS within the organization and educated users on the best practices of using public SaaS applications, it can take additional steps to improve costs, productivity, and security.

As use of SaaS applications expands through an enterprise, IT and security teams should review how those applications are used. Security should examine actual use to determine whether any practices risk loss of the business’s intellectual property (IP), open connections that hackers can exploit, or otherwise create opportunities for insecure activity.

Cloud Access Management (CAM) is a form of identity management that is specifically targeted at cloud services. Using CAM, users can be explicitly given rights to specific SaaS applications (and not to others), with governance rules specifying what information they can access. Security can use CAM to formalize which company personnel can access which SaaS applications and the rights they can exercise within those applications. For example, HR employees may have the right to update all employees’ job performance ratings, while people outside of HR may be granted the rights to see only their own information.
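To make the HR example concrete, here is a minimal Python sketch of per-application access rules, assuming a simple lookup table of (role, application) grants. The role names, the application name `performance-reviews`, and the `can` function are hypothetical; a real CAM product stores and enforces these rules itself, and this is not its API.

```python
from dataclasses import dataclass

# Hypothetical grants: (role, SaaS application) -> allowed actions.
# "_all" rights apply to every record; "_own" rights apply only to the user's own record.
GRANTS = {
    ("hr", "performance-reviews"): {"view_all", "update_all"},
    ("employee", "performance-reviews"): {"view_own"},
}


@dataclass
class User:
    name: str
    role: str


def can(user: User, app: str, action: str, subject: str) -> bool:
    """Return True if `user` may perform `action` on `subject`'s record in `app`."""
    rights = GRANTS.get((user.role, app), set())  # no grant means no access to the app
    if f"{action}_all" in rights:
        return True
    return f"{action}_own" in rights and subject == user.name
```

For example, `can(User("bo", "employee"), "performance-reviews", "view", "bo")` returns True, while the same employee asking to view someone else’s record gets False.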

IT will be interested in how many employees are using each SaaS application, and what they use the application for. As more people use a SaaS application, IT may be able to use that information to negotiate better terms for using the SaaS application from its vendor. For example, if multiple business units have purchased the same SaaS application, your business can receive a more favorable licensing agreement if you combine the management and purchasing of the application. From observing patterns of application use, IT may also see opportunities for purchasing other tools that can improve the operation of the business. Or, the need to integrate one SaaS application with others may become apparent.
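A small sketch of the consolidation analysis, assuming IT can export purchase records as (business unit, application, seats) tuples from procurement or a SaaS discovery tool. The function name and record format are illustrative assumptions, not any particular product’s export.

```python
from collections import defaultdict


def consolidation_candidates(purchases):
    """Flag SaaS applications bought separately by two or more business units.

    `purchases` is a list of (business_unit, application, seats) tuples.
    Returns {application: (sorted list of units, total seats)}.
    """
    units = defaultdict(set)
    seats = defaultdict(int)
    for unit, app, count in purchases:
        units[app].add(unit)
        seats[app] += count
    # Only applications purchased by more than one unit are consolidation candidates
    return {app: (sorted(units[app]), seats[app])
            for app in units if len(units[app]) > 1}
```

For example, purchases of `("sales", "file-share", 40)`, `("marketing", "file-share", 25)`, and `("hr", "payroll", 300)` flag only `file-share`, with 65 combined seats as a starting point for negotiating a single agreement.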

Managing External Cloud Resources

Businesses use many types of external or public cloud resources that require management. Resources may be virtual machines that developers use, storage for backups or disaster recovery, databases for big data activities, and the list goes on.

Cloud resources are the building blocks used to create applications. These infrastructure services are designed as a layer below SaaS applications and are therefore the responsibility of software developers.

Understanding who is using cloud services is important because that’s where management should be focused. Management of cloud services is typically practiced in IT and software development organizations.

Visibility and control of external resources

As with SaaS applications, simply grabbing a cloud resource to execute a task is often too easy. To be successful, you need to have the ability to apply controls so that you can gain visibility into the cloud applications and services.

The biggest management challenges of external cloud resources are identifying the most appropriate services to use, verifying their characteristics (performance, security, cost, and so on), and making sure that these services are used exclusively. The rationale for using these resources to the exclusion of other resources is that after a service has been selected, investments in training, testing, and building infrastructure software will occur to make the service work effectively. You should avoid selecting and using a different service that has the same functionality, as it can double the costs of using the functionality.

Because using resources and building applications are fundamentally technical activities, the development organization or IT (if software is being developed for internal company use) should drive their management. Development has knowledge of the functionality of the resources required for the product(s) they’re developing, how a service will integrate into the product framework, and the long-term technical goals of the business.

Tip The general cycle for approval, use, and eventual reuse of cloud services is:

  1. Identify and define the functional requirements of the software being developed.
  2. Research resources available from the cloud providers that are already used by the business to find a good match.

    Extend the search to other cloud providers if adequate solutions aren’t found or if additional cloud providers should be nurtured in the spirit of multicloud.

  3. Perform tests in a pilot project to verify the resource(s) found in Step 2 meet the functional requirements.
  4. If the testing is successful, form a business relationship with the service vendor and get access to the production version of the resource.

    Tip The vendor may be a cloud provider or a third party who makes its software available in a cloud platform. If the testing is not successful, go back to Step 2 or consider building the resource internally.

  5. Document the new service and its availability to the full development organization.
  6. On a regular basis, perhaps yearly, revisit existing resources (internally developed as well as external resources), the requirements they need to meet, and their operational history and issues.

    If requirements or the resource have changed, consider restarting this process at Step 2 to see whether a new resource may be a better fit.
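The cycle above can be modeled as a small state machine, which can help track where each candidate service stands. This is a hedged sketch: the `Stage` names and the single `passed` flag are assumptions that simplify the steps into transitions.

```python
from enum import Enum, auto


class Stage(Enum):
    DEFINE = auto()    # Step 1: define functional requirements
    RESEARCH = auto()  # Step 2: search current providers, then others
    PILOT = auto()     # Step 3: verify the resource in a pilot project
    CONTRACT = auto()  # Step 4: form the business relationship, get production access
    DOCUMENT = auto()  # Step 5: publish the service to the development organization
    REVIEW = auto()    # Step 6: periodic (for example, yearly) revisit


def next_stage(stage: Stage, passed: bool = True) -> Stage:
    """Return the next stage. `passed` is the pilot result at PILOT and
    "the resource still fits" at REVIEW; other stages ignore it."""
    if stage is Stage.PILOT and not passed:
        return Stage.RESEARCH  # pilot failed: search again (or build internally)
    if stage is Stage.REVIEW:
        return Stage.REVIEW if passed else Stage.RESEARCH  # restart at Step 2 if needs changed
    order = list(Stage)  # members iterate in definition order
    return order[order.index(stage) + 1]
```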

The importance of self-service

The idea of creating a catalog of approved computing resources that consumers may select from is critical to being able to manage in a consistent and predictable way. It’s a simple idea, but a very important one.

Cloud providers make it as easy as possible for consumers to find and use their services. At the low pay-as-you-go prices of most cloud resources, it’s a very compelling bargain. However, the company should limit employee choices of computing resources to approved resources.

Remember The challenge is to make it easier for employees to use the company’s catalog to select what they want rather than go to the cloud themselves. To do so, the company must get ahead of the curve to understand the requirements and needs of development organizations. If the work of looking for, testing, and approving resources can be completed before the needs of the development organization become critical, then the catalog can include what the development organization needs. That proactivity will allow the easiest choice to be the right choice.
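A minimal sketch of a self-service catalog lookup, assuming a hypothetical in-memory catalog. Approved items are granted immediately; anything else is routed to IT for review rather than refused, so unmet needs surface instead of sending employees to the cloud on their own.

```python
# Hypothetical catalog of preapproved resources; entries are illustrative only.
APPROVED_CATALOG = {
    "object-storage": {"provider": "cloud-a", "tier": "standard"},
    "managed-postgres": {"provider": "cloud-a", "tier": "general-purpose"},
}


def request_resource(name: str) -> dict:
    """Grant approved catalog items immediately; route anything else to IT for review."""
    if name in APPROVED_CATALOG:
        return {"status": "approved", "resource": name, **APPROVED_CATALOG[name]}
    return {"status": "needs_review", "resource": name}
```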

Service level agreements (SLAs)

Every cloud resource comes with a contractual agreement, known as a service level agreement (SLA), that outlines what the provider is delivering, along with the customer’s responsibilities. The agreement should outline characteristics like availability, accuracy, response time, throughput, and security. These important traits are critical for selecting resources that will meet the performance requirements of the services or applications that will use them.

Warning Many public cloud services are quite clear that the vendor accepts responsibility for a problem in only a limited set of situations. For example, if someone in your organization misconfigures a service and it’s out of commission, the vendor doesn’t accept responsibility. If the vendor is directly responsible for a security breach, the responsibility is clear. However, there are many gray areas. For example, what happens if a flood in the region knocks out service or destroys data? What if third-party networking services that link a customer and the cloud are down? The vendor didn’t create the flood or outage. Therefore, the disruption may not be covered by the SLA. And even when the problem is the vendor’s responsibility, what does that mean? Will the vendor be liable for your lost business when the service is down, or will it simply refund the money you were charged during the outage? In many cases, end-user businesses have complex insurance policies to cover these types of technology outages that lead to business disruptions.

Performance claims also constitute SLAs, which are formal commitments for performance of the resource in actual use. In many cases, cloud vendors battle each other and on-premises technologies in terms of uptime. One vendor may claim 99.99 percent uptime while another might claim 99.999 percent uptime. Although the difference may seem negligible, the extra nine is equivalent to approximately 47 more minutes of uptime a year. Forty-seven minutes may be a minor disruption or a catastrophe, depending on whether the workload is a test environment or a retailer’s transactional system. Table 4-1 shows the approximate amount of downtime per year based on system availability.

TABLE 4-1 Downtime Based on System Availability

Availability (percentage)        Downtime per year
90% (“one nine”)                 36.53 days
99% (“two nines”)                3.65 days
99.9% (“three nines”)            8.77 hours
99.99% (“four nines”)            52.60 minutes
99.999% (“five nines”)           5.26 minutes
99.9999% (“six nines”)           31.56 seconds
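The figures in Table 4-1 follow from simple arithmetic: downtime equals the unavailable fraction multiplied by the length of a year. This Python sketch assumes a 365.25-day year, which reproduces the table’s values, including the roughly 47-minute gap between four and five nines.

```python
def downtime_per_year(availability_pct: float) -> float:
    """Return expected downtime in seconds per year for a given availability percentage."""
    seconds_per_year = 365.25 * 24 * 60 * 60  # 31,557,600 seconds
    return (1 - availability_pct / 100) * seconds_per_year
```

For example, `downtime_per_year(99.99) / 60` is about 52.6 minutes, and `downtime_per_year(99.999) / 60` is about 5.26 minutes.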

Of course, as you get more nines, the cost of the service will increase. A cloud that claims five nines will be more expensive than a less predictable offering that has two nines. Claims are routinely verified by consumers and third-party auditors, either through test harnesses or by monitoring the completed application or service in operation. But SLAs go further: Failure of a computing resource to meet SLAs can and should be grounds for canceling contracts.
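Verifying a claim against observed behavior is straightforward once outages are recorded. This sketch assumes you already have a list of outage durations in minutes for a measurement period, such as a 30-day month (43,200 minutes); how the outages are detected is outside its scope.

```python
def meets_sla(outage_minutes: list[float], period_minutes: float,
              target_pct: float) -> tuple[float, bool]:
    """Return (measured availability %, whether it meets the target %)."""
    downtime = sum(outage_minutes)
    measured = 100 * (1 - downtime / period_minutes)
    return measured, measured >= target_pct
```

For example, two outages of 3 and 2 minutes in a 30-day month give about 99.9884 percent availability, which misses a four-nines commitment.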

Addressing Poor Cloud and Computing Behaviors

Cloud providers routinely seek to make their computing environments secure and robust, but users of those environments can still engage in risky behaviors. We’ve all heard about people choosing passwords that are easy to guess, possibly leading to theft of information or damage to software and systems. That’s just one example of a dangerous behavior that companies work hard to prevent.

Warning Many corporate users bring their own devices to the workplace or use them when they’re working remotely. The IT organization is responsible for making sure that the right software and the right best practices are in place to ensure safety and security. IT organizations can take steps to avoid these dangerous behaviors, starting with employee education about the dangers of poor security practices. Many companies implement governance strategies that explicitly control which systems employees can access and the data they’re qualified to read or modify. Often, governance is based on the specific job the employee performs, an approach known as role-based access control (RBAC).

One more example of a security risk that is particularly relevant to today’s employees is the use of social media. In previous generations, knowledge of company policies and information was understood to be proprietary to the company and employees helped each other learn and follow safe practices. However, a new generation has grown up with social media and uses it to share information, answer questions, and generally extend the user’s community outside the company’s boundaries. Many social media users do understand the risks, but some don’t and may not recognize the danger in sharing corporate information or business practices outside the company.

Managing Internal Cloud Resources

As the cloud becomes a ubiquitous way to deliver services to customers, the management of those resources can make the difference between success and failure. When a business serves cloud resources to its own employees and perhaps its business partners, it isn’t acting as a public cloud provider; instead, it delivers those resources via private or hybrid clouds. If it’s using a hybrid cloud, the business may well pass public cloud resources to its internal customers, making the business both a public cloud consumer and a private cloud provider.

Not surprisingly, many of the issues regarding the consumption of cloud services are still relevant if your business provides cloud resources to customers, employees, and/or partners. The biggest difference is that the consumers of public cloud providers are much more diverse than the consumers within a single business, so it’s normal for a private or hybrid cloud to offer only resources and services that are specific to that business.

Remember Regardless of the scope of offerings, issues like self-service, SLAs, and approved resources are just as critical, if not more so, to consumers of private and hybrid clouds.

Managing a hybrid cloud environment

Businesses are increasingly leveraging a combination of public and private clouds. The combination of these resources provides the benefits of scalability, flexibility, and performance to their internal computing consumers. Nowadays, with public clouds offering a high degree of security to match their broad catalogs of services and resources, companies that still use private or hybrid clouds may have even more stringent requirements than before. As cloud computing matures and becomes core to the business strategy, the organization will likely select well-vetted applications and resources with more thorough security. In a private or hybrid cloud context, the approval process will be more important than ever.

Remember Carefully curated cloud resources offered for use in private or hybrid clouds are perfect for self-service. Not only will the carefully selected resources have been preapproved by the business with licensing or purchase already completed, but the resources will be specifically what the company requires to enable and secure critical in-house applications. In a mature self-service hybrid cloud, all the available resources and applications will be well architected to be the building blocks of a productive and safe data center.

Understanding the role of internal SLAs

When a company is a private or hybrid cloud provider to its internal consumers, the company should define SLAs for the resources and services provided. Doing so will formalize the operational requirements for those services and increase the chances that consumers will be pleased with the internal services. SLAs provide objective targets for performance and other operational characteristics, such as mean time between failures (MTBF), and therefore are important for determining whether performance problems with applications are due to problems with the application or problems with the resources and services that the application uses.
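For internal SLAs, mean time between failures (MTBF) is often paired with mean time to repair (MTTR), and the two combine into the standard steady-state availability estimate MTBF / (MTBF + MTTR). A minimal sketch, assuming you have logged how long the service ran between failures and how long each repair took, in hours:

```python
def mtbf_mttr(uptimes_h: list[float], repairs_h: list[float]) -> tuple[float, float, float]:
    """Return (MTBF, MTTR, availability) from observed up intervals and repair times in hours."""
    mtbf = sum(uptimes_h) / len(uptimes_h)    # average run time between failures
    mttr = sum(repairs_h) / len(repairs_h)    # average time to restore service
    return mtbf, mttr, mtbf / (mtbf + mttr)   # fraction of time the service is up
```

For example, up intervals of 700, 500, and 600 hours with repairs of 1, 2, and 3 hours give an MTBF of 600 hours, an MTTR of 2 hours, and availability of 600 / 602, roughly 99.67 percent.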

In the public cloud, management of resources and services is the responsibility of the public cloud vendor or the third party who provides the software. A responsible organization should continually monitor the software operation to ensure that SLAs are being met. In the private cloud environment, it’s usually the responsibility of IT operations to monitor the software for deviations from SLA commitments.

While public cloud providers have the responsibility to watch SLAs, they have many customers and may not always respond immediately. On the other hand, a company running its own private cloud has one customer (itself) and is fully responsible for ensuring that all applications, services, and resources are always working effectively. In some organizations, executives will be watching whether SLAs are being met because problems can affect the company’s bottom line.

Managing Internal Services

As more businesses rely on cloud computing, they’re creating internal services that are delivered to consumers within a company via a private or public cloud or a hybrid of both. These consumers are likely to be spread across all divisions of a company and have differing levels of technical expertise. They may not be aware of differences between applications running in private, hybrid, or public cloud contexts. But regardless of where applications are served from, internal consumers of cloud services expect professional applications to be operating with reliability and security and be backed by professional support. This situation is not very different from applications provided from a precloud data center or running on a desktop workstation.

Support for internal applications may come from IT or from a call center — both managed within the company providing the private or hybrid cloud. It’s important that the company handling the support have first-hand access to the computing environment so that it can determine the causes of problems that consumers run into, or better, see and resolve problems before consumers call with a problem.

Supporting cloud customers

Providing call centers and support for internal consumers of private/hybrid cloud applications is an essential part of managing a private or hybrid cloud. When customers run into problems with an application, they expect quick help. For third-party applications, comprehensive support will usually come from those third parties, although internal support should be aware of the common issues consumers may encounter. But if applications are developed or maintained in-house by internal development teams or by IT acting as a development organization, then development has a role to play in support.

For their part, support organizations pride themselves on, and measure themselves by, delivering quality and timely support. In fact, support organizations increasingly see a reduction in support calls as a goal. Reducing the number of calls can be accomplished by making the software more robust and more intuitive, but support organizations have little control over the software features being developed. Hence, development and support need to work closely together to provide world-class support.

Monitoring internal and external systems

High-quality support requires knowledge of the systems being supported and their operational status. Customer problems may be due to misunderstandings or a lack of knowledge about how an application works or how to use it to get a specific result. In these cases, support needs to understand the application in depth so that it can guide the customer in the right direction. Ideally, the customer has been trained to use the application, but once the customer calls, support must address the issues.

Imagine that a customer calls with a problem regarding a specific feature of an application. When support receives the call, the first step will likely be to ask the customer what application they were using and what they were doing. This type of call probably happens every day, perhaps all day long. Remember that support is measured on how quickly it can help customers, so how can it streamline this process? One answer is for the application to track each user’s activities and save the information in a location that is accessible to support. Then, when a customer calls, support can look up the customer’s recent activities and quickly know exactly what they were doing before they called. When the process is fully automated, the phone system can recognize the caller, find the username, and present the support staff with the customer’s latest activities.
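A minimal sketch of this activity tracking, assuming an in-memory store and a hypothetical mapping from caller phone numbers to usernames in place of a real phone-system integration. The class and method names are illustrative.

```python
from collections import defaultdict, deque


class ActivityLog:
    """Keeps each user's most recent actions so support can see them when the user calls."""

    def __init__(self, max_events: int = 20):
        # One bounded queue per user; the oldest events drop off automatically.
        self.events = defaultdict(lambda: deque(maxlen=max_events))
        # Hypothetical stand-in for the phone system's caller-to-username lookup.
        self.phone_to_user: dict[str, str] = {}

    def record(self, user: str, app: str, action: str) -> None:
        self.events[user].append((app, action))

    def lookup_caller(self, phone: str, n: int = 3) -> list[tuple[str, str]]:
        """Return the caller's last `n` actions, or [] if the caller is unknown."""
        user = self.phone_to_user.get(phone)
        if user is None:
            return []  # unknown caller: support falls back to asking the usual questions
        return list(self.events[user])[-n:]
```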

Rather than asking the caller what application they were using, the support staff can answer the call with “I see you were using our CRM application and just got an error message when you tried to look up a business by its nickname. Is that what you’re calling about?” The time saved on each call, multiplied by all the calls, will be a big success for the support organization, the customer, and the company.

This example is just one of many that shows how monitoring applications improves the management of those applications. The following sections look at a few more examples.

Monitoring resources imported from the public cloud

In a hybrid cloud context, you rely on public cloud resources to augment the services used in the private cloud portion of the hybrid environment. To verify that those resources are operating with sufficient performance and meeting SLAs, you have a few choices:

  • Set up test software in the public cloud to sample performance of the resources. On the positive side, by testing the resource outside of your computing stack, you’ll avoid affecting your application’s performance. On the negative side, you may be testing resources that are so independent of your application that the testing results aren’t applicable to your application’s performance.
  • Set up that same test software in the hybrid cloud. The results are more likely to be consistent with what your application is experiencing, but the downside is that the testing will have a greater impact on the performance of your infrastructure because that’s where the testing is operating.
  • Look to see whether the resources you’re testing have a dashboard or other operational information available to you. If so, it may provide the performance information you’re looking for.
  • Instrument the application running in the hybrid environment to log the actual performance of the resources as delivered to the application. If done carefully, it will have little impact on the application’s performance. This approach, which is perhaps the best solution, is desirable as it will be the most accurate way to monitor the resources’ behavior, and the results can be integrated with all the other issues being monitored by the application.
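The last option, instrumenting the application, can be as light as a decorator that times each call to an external resource as the application actually experiences it. This is a hedged sketch using only the Python standard library; the resource name and the `samples` list are assumptions standing in for whatever metrics pipeline you use.

```python
import functools
import logging
import time

log = logging.getLogger("resource-metrics")


def instrumented(resource_name: str, samples: list):
    """Decorator that records (resource_name, seconds, ok) for every call, success or failure."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = fn(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs even when the call raises, so failures are measured too.
                elapsed = time.perf_counter() - start
                samples.append((resource_name, elapsed, ok))
                log.info("%s took %.3fs ok=%s", resource_name, elapsed, ok)
        return inner
    return wrap
```

Wrapping a function that calls a cloud SDK with `@instrumented("object-store", samples)` adds one sample per call, with negligible overhead compared with a network round trip.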

Remember Regardless of which path you follow, the testing and reporting of resource performance should be automated (so humans don’t have to keep watching), and a notification system should be used to send serious failure information to support teams.
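An automated check might compare recent samples with thresholds and notify support only on serious failures. In this sketch, `notify` is any callable (an email, chat, or paging integration is assumed, not shown), samples are (resource, seconds, ok) tuples, and the 95th-percentile latency is a simple nearest-rank estimate.

```python
import math
from typing import Callable


def check_and_notify(samples: list[tuple[str, float, bool]], latency_slo_s: float,
                     max_error_rate: float, notify: Callable[[str], None]) -> bool:
    """Send one summary alert if the error rate or p95 latency exceeds its threshold.

    Returns True if an alert was sent.
    """
    if not samples:
        return False
    errors = sum(1 for _, _, ok in samples if not ok)
    error_rate = errors / len(samples)
    latencies = sorted(seconds for _, seconds, _ in samples)
    p95 = latencies[max(0, math.ceil(0.95 * len(latencies)) - 1)]  # nearest-rank percentile
    problems = []
    if error_rate > max_error_rate:
        problems.append(f"error rate {error_rate:.1%}")
    if p95 > latency_slo_s:
        problems.append(f"p95 latency {p95:.3f}s")
    if problems:
        notify("; ".join(problems))
        return True
    return False
```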

Monitoring the cloud infrastructure

Public cloud vendors provide information about the operational status of major subsystems and services within their cloud environments. The same type of information should be maintained for private and hybrid clouds. Although this information is usually high level, ensuring that basic services are working properly should be the first thing that support people look to verify.

Monitoring applications and services

Applications can report on the actions of application users to help support personnel provide advice to users who call with troubles. That same information about what users are doing is also very useful for developers and IT. User experience experts can look at how users work with the application to see problems in the user interface, opportunities for streamlining the user’s experience, and bugs in the application. Product managers use the information to verify that features are being used as intended and to explore how new features can improve the application.

Applications routinely generate logs, or records of exactly what the application has done. Developers examine logs after applications crash to help figure out why the crash occurred. With the use of log analysis tools, logs can reveal operational patterns that may suggest ways to improve the application.
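As a small illustration of log analysis, this sketch counts ERROR lines per component and replaces numbers with a placeholder so that messages differing only in IDs or timings group together. It assumes a hypothetical log format of `<timestamp> <LEVEL> <component>: <message>`; real log formats vary, and dedicated log analysis tools do far more.

```python
import re
from collections import Counter

# Assumed line format: "<timestamp> <LEVEL> <component>: <message>"
LINE = re.compile(r"^\S+ (?P<level>[A-Z]+) (?P<component>[\w.-]+): (?P<message>.*)$")


def error_patterns(lines, top: int = 3):
    """Return the most common (component, normalized message) pairs among ERROR lines."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("level") == "ERROR":
            message = re.sub(r"\d+", "<n>", m.group("message"))  # collapse numbers
            counts[(m.group("component"), message)] += 1
    return counts.most_common(top)
```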

Applications can also be instrumented to provide other useful information to the business, including Key Performance Indicators (KPIs) that, when designed properly, can disclose whether applications are meeting the goals they were designed to achieve.

Tip Increasingly, artificial intelligence and machine learning are being applied to historical operational data to discover the patterns associated with problems. Current performance can then be quickly checked for recurring problems or their early symptoms, which other tools can evaluate further.
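Machine learning models are beyond the scope of a short example, but the underlying idea of comparing current behavior with history can be shown with a simple statistical baseline. This sketch flags a value that lies more than three standard deviations from the historical mean; it is a stand-in for the idea, not a machine learning method.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(history), stdev(history)  # stdev needs at least two data points
    if sigma == 0:
        return current != mu  # flat history: any change is unusual
    return abs(current - mu) / sigma > threshold
```

With a history of response times such as `[100, 102, 98, 101, 99, 100, 103, 97]` (mean 100, standard deviation 2), a reading of 104 is not flagged, but 110 is.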

Constructing dashboards

A wealth of data and information comes from a busy cloud environment, and many employees can use that information to solve their daily challenges. No one can process all that information in its raw state; there are just too many details.

Turning this information into a useful form requires at least two parts:

  • Analysis software that processes and reduces the data to a manageable form
  • A visualization technique that makes the information easy for people to recognize

Dashboards are often the best tool for presenting operational information to various audiences. With flexibility in how dashboards are constructed, different views with different focuses can be designed for different audiences. Support needs information to help users with problems, developers need to see performance data and information about what customers are doing, product managers need to see KPI and other usability information, and executives need to see high-level status and data on customer acceptance.
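One way to express audience-specific views is a mapping from each audience to the metrics its dashboard shows, applied to the full metrics feed. The metric names below are hypothetical, chosen to match the audiences the text describes.

```python
# Hypothetical mapping of audiences to the metrics their dashboards show.
VIEWS = {
    "support": ["open_incidents", "recent_user_errors"],
    "developers": ["p95_latency_ms", "error_rate"],
    "product": ["feature_adoption", "kpi_conversion"],
    "executives": ["availability_pct", "active_customers"],
}


def dashboard(audience: str, metrics: dict) -> dict:
    """Reduce the full metrics feed to the subset one audience needs."""
    return {name: metrics[name] for name in VIEWS[audience] if name in metrics}
```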

Managing External Services

One characteristic of a hybrid cloud environment is that an organization will have a variety of services that need to be managed. Some will be straightforward services, such as the ability to store data that is rarely used but must be kept. In other situations, businesses are developing their own applications that will operate in the public cloud. To be successful, you need visibility and control over both external and internal resources.

DevOps and deployment to public clouds

Nowadays, best practices for developing software for the cloud are based on DevOps practices. DevOps combines Development with Operations and enables continuous development and deployment of software. (See Chapter 11 for a more in-depth discussion of DevOps.) DevOps streamlines the management of application and service life cycles by eliminating handoffs between development and operations and makes the software more robust because developers are now paying more attention to operational issues.

After an application is deployed to the cloud, DevOps engineers continue to watch the software while it’s running. If operational issues come up, the engineers responsible for the software are immediately involved and can either solve problems tactically in the cloud or, if necessary, make changes to the application and redeploy it with fixes as quickly as the code can be changed and tested.

External system monitoring

With applications and services used by customers in other companies, it’s even more important than within a single company for system monitoring to capture how applications and services are used. After all, a company has many more opportunities to quiz its own employees about how well systems work than to quiz employees of other companies.

Similarly, pushing product and operational status to external customers becomes more important because the customers will have less awareness of those issues and less loyalty to the software than internal customers.

Application and service life cycles

In the public cloud, there is less tolerance for problems, outages, and frustrations. Applications and services in the public cloud must be as robust as possible. Consider these issues for making applications and services fit the “always on and always available” expectations of the public cloud:

  • DevOps’s goal for continuous development and deployment is to release new features as soon as they’re ready. But while it was once okay to take the application down while it was being upgraded, it isn’t acceptable in the cloud. Applications and services must be able to be upgraded without disturbing customers who are using the application.
  • Application and server failures are always bad, but they’re probably worse in the cloud (if only because there may be many more users). Designing in failover capabilities so that a failure causes only a momentary pause is much more desirable than having the application be completely unavailable, or perhaps worse, cause user data loss.
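A minimal failover sketch: the client tries each replica in turn, so a failed server costs a retry rather than an outage. The endpoints are plain callables standing in for real service clients, and treating only `ConnectionError` as retryable is an assumption; real systems also need health checks, timeouts, and replicated data.

```python
def call_with_failover(endpoints, request):
    """Return the first successful response from the replicas, tried in order."""
    last_error = None
    for endpoint in endpoints:
        try:
            return endpoint(request)
        except ConnectionError as exc:  # connection failure: try the next replica
            last_error = exc
    raise RuntimeError("all replicas failed") from last_error
```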

The Future of Multicloud Management

Many different services and deployment models are emerging as part of the hybrid cloud fabric. We’re at the stage where it’s becoming more and more important to bring together the management of all the internal and external cloud and data center services needed for a well-run operation.

Not only are businesses leveraging a public cloud, but they are often using several different public cloud services across departments. When you combine that with all the private and third-party services being deployed and operated, complexity escalates as you scale. You should begin asking some key questions:

  • What are all the services being used, and which ones do you anticipate adding?
  • Why is a service being used? Does it fulfill the business requirement?
  • Do the services provide the level of security and governance demanded by management?
  • Is the data that the application generates stored in the appropriate geography?
  • Is the latency of the overall environment acceptable to all service consumers?
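The geography question lends itself to an automated check once you keep an inventory of where each service stores its data. This sketch assumes a hypothetical inventory mapping service names to region labels and a set of approved regions.

```python
def residency_violations(inventory: dict[str, str], allowed_regions: set[str]) -> list[str]:
    """List services whose data is stored outside the approved geographies."""
    return sorted(service for service, region in inventory.items()
                  if region not in allowed_regions)
```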

You have to deal with a lot of issues as you begin to put together your multicloud management strategy. You can’t think of your services as separate islands of computing, applications, or storage. Rather, you need to put in place an infrastructure and approach that provide a seamless interface across all of your services so that you can deliver consistency and predictability.
