Chapter 4
IN THIS CHAPTER
Deconstructing cloud concepts
Discovering resource pools/cloud models and services
Evaluating the role of the data center
Finding out how the public cloud fits and when the private cloud shines
Management of a hybrid and multicloud environment is a complex topic as it spans computing activities in on-premises data centers, private and hybrid clouds, and numerous public cloud environments. In the past, computing resources were physical and highly siloed. Therefore, managing individual systems combined with their workloads made sense.
But times have changed with the advent of cloud computing. In this distributed world, many applications are independent of their underlying infrastructure. Organizations no longer look at individual computing resources as stand-alone systems. Rather, the combination of the data center, private and public cloud, and Software as a Service (SaaS) applications now defines computing. The users of computing services within your business no longer distinguish between a workload running in a data center and a service running in a public cloud. Users simply want everything to work predictably. In this chapter, we discuss what it means to manage computing in the era of the hybrid cloud.
So, what are you actually managing in a multicloud environment? What are the considerations? Not only do you have to ensure that a service is up and running, but you have to make sure that you address the diversity of goals, roles, resources, and other issues that must be supported and addressed. Take a step back and look at the type of services you need to manage in this world of hybrid computing. We have divided the capabilities that you’ll need to manage into five categories:
Increasingly, businesses are turning to SaaS applications that are owned and operated by third-party vendors. It’s not uncommon for a single business to support hundreds of SaaS applications. However, businesses are beginning to bring some level of control to the use of SaaS applications. IT management has to contend with several key problems in managing SaaS applications.
Anyone with access to a browser can access and sign up for a license and start using a SaaS application. For example, to exchange large files, it’s not uncommon for well-meaning employees to use a file-sharing application like Box or Dropbox to circumvent email attachment limits. With the growing popularity of SaaS applications, a business can quickly lose control.
For these and other reasons, a business needs oversight of its use of SaaS applications. Many corporate IT organizations have long battled with business units for control of computing resources. In a practice often called shadow IT, business units use a variety of SaaS applications and tools without the knowledge of IT. The cloud has accelerated this process.
Smart IT organizations have learned how to collaborate with business units so that business management can use the tools that are best suited for their task while protecting the business from risk. These organizations often will set up a working group consisting of IT leaders and business leaders to set parameters for what is acceptable. For example, everyone can agree on a set of SaaS applications that both meet their day-to-day needs and are fully vetted for security and reliability. One of the benefits is that IT or the procurement organization can negotiate for both price and support.
Generally, SaaS applications are managed by the organizations that created them, so IT is probably not responsible for managing external SaaS applications. However, IT isn’t off the hook. The users of the SaaS application have little interest in excuses. Users simply want to know that an application is operational at all times. When IT operations explains that the SaaS vendor is responsible for the management of that application, the user is rarely satisfied. They won’t make a distinction between an application that resides in the data center or a private cloud and a third-party application managed in a public cloud.
For example, suppose a popular SaaS application is hosted in a public cloud, such as Amazon or Google, and that public cloud experiences an outage that lasts for several hours. Once users begin calling IT to complain, IT contacts the SaaS vendor, who explains that the problem is with the cloud provider. The problem, of course, is that the SaaS application user will have no sympathy and will demand action.
Once a business has set the ground rules for using SaaS within the organization and educated users on the best practices of using public SaaS applications, it can take additional steps to improve costs, productivity, and security.
As use of SaaS applications expands through an enterprise, IT and security teams should review that use. Security should examine actual use to understand whether any practices risk the loss of the business’s intellectual property (IP), open connections that hackers can exploit, or allow other insecure activities.
Cloud Access Management (CAM) is a form of identity management that is specifically targeted to cloud services. Using CAM, users can be explicitly given rights to specific SaaS applications (and not to others), and governance can be specified for what information they can access. Security can use CAM to formalize which company personnel can access which SaaS applications and the rights they can exercise within those applications. For example, HR employees may have the right to update all employees’ job performance ratings, while people outside of HR may be granted the right to see only their own information.
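To make the idea of explicitly granted rights concrete, here is a minimal sketch of a policy lookup. It is not the interface of any particular CAM product; the role names, application name, and action names are all hypothetical and chosen to mirror the HR example above.

```python
# Hypothetical policy table: role -> application -> permitted actions.
# All role, application, and action names are invented for illustration.
ACCESS_POLICY = {
    "hr": {"performance_reviews": {"read_own", "read_all", "update_all"}},
    "employee": {"performance_reviews": {"read_own"}},
}


def is_allowed(role: str, application: str, action: str) -> bool:
    """Return True only if the role was explicitly granted the action."""
    return action in ACCESS_POLICY.get(role, {}).get(application, set())


# HR may update every rating; other employees may only read their own.
print(is_allowed("hr", "performance_reviews", "update_all"))
print(is_allowed("employee", "performance_reviews", "update_all"))
print(is_allowed("employee", "performance_reviews", "read_own"))
```

Note that the lookup denies by default: a role or application missing from the table gets no access, which matches the notion that rights are given to specific applications and not to others.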
IT will be interested in how many employees are using each SaaS application, and what they use the application for. As more people use a SaaS application, IT may be able to use that information to negotiate better terms for using the SaaS application from its vendor. For example, if multiple business units have purchased the same SaaS application, your business can receive a more favorable licensing agreement if you combine the management and purchasing of the application. From observing patterns of application use, IT may also see opportunities for purchasing other tools that can improve the operation of the business. Or, the need to integrate one SaaS application with others may become apparent.
Businesses use many types of external or public cloud resources that require management. Resources may be virtual machines that developers use, storage for backups or disaster recovery, databases for big data activities, and the list goes on.
Cloud resources are the building blocks used to create applications. These infrastructure services are designed as a layer below SaaS applications and therefore are the responsibility of software developers.
Understanding who is using cloud services is important because that’s where management should be focused. Management of cloud services is typically practiced in IT and software development organizations.
As with SaaS applications, simply grabbing a cloud resource to execute a task is often too easy. To be successful, you need to have the ability to apply controls so that you can gain visibility into the cloud applications and services.
The biggest management challenges of external cloud resources are identifying the most appropriate services to use, verifying their characteristics (performance, security, cost, and so on), and making sure that these services are used exclusively. The rationale for using these resources to the exclusion of other resources is that after a service has been selected, investments in training, testing, and building infrastructure software will occur to make the service work effectively. You should avoid selecting and using a different service that has the same functionality, as it can double the costs of using the functionality.
Because using resources and building applications are fundamentally technical activities, the development organization or IT (if software is being developed for internal company use) should drive their management. Development has knowledge of the functionality of the resources required for the product(s) they’re developing, how a service will integrate into the product framework, and the long-term technical goals of the business.
Research resources available from the cloud providers that are already used by the business to find a good match.
Extend the search to other cloud providers if adequate solutions aren’t found or if additional cloud providers should be nurtured in the spirit of multicloud.
If the testing is successful, form a business relationship with the service vendor and get access to the production version of the resource.
The vendor may be a cloud provider or a third party who makes its software available in a cloud platform. If the testing is not successful, go back to Step 2 or consider building the resource internally.
On a regular basis, perhaps yearly, revisit existing resources (internally developed as well as external resources), the requirements they need to meet, and their operational history and issues.
If requirements or the resource have changed, consider restarting this process at Step 2 to see whether a new resource may be a better fit.
The idea of creating a catalog of approved computing resources that consumers may select from is critical to being able to manage in a consistent and predictable way. It’s a simple idea, but a very important one.
Cloud providers make it as easy as possible for consumers to find and use their services. At the low pay-as-you-go prices of most cloud resources, it’s a very compelling bargain. However, the company should limit employee choices of computing resources to approved resources.
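One simple way to picture an approved catalog is as a set that every resource request is checked against. The sketch below assumes a catalog keyed by provider and service type; the provider and service names are placeholders, not real offerings.

```python
# Hypothetical catalog of approved (provider, service) pairs.
# Provider and service names are placeholders for illustration.
APPROVED_CATALOG = {
    ("provider-a", "object-storage"),
    ("provider-a", "virtual-machine"),
    ("provider-b", "managed-database"),
}


def review_requests(requests):
    """Split requested resources into approved and rejected lists."""
    approved, rejected = [], []
    for request in requests:
        if request in APPROVED_CATALOG:
            approved.append(request)
        else:
            rejected.append(request)
    return approved, rejected


approved, rejected = review_requests([
    ("provider-a", "object-storage"),
    ("provider-c", "object-storage"),  # same function, unapproved provider
])
print("approved:", approved)
print("rejected:", rejected)
```

In practice the rejected list is as useful as the approved one, because it shows where employees are reaching for services that duplicate something already in the catalog.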
Every cloud resource comes with a contractual agreement, known as a service level agreement (SLA), that outlines what the provider is delivering, along with the customer’s responsibilities. The agreement should outline characteristics like availability, accuracy, response time, throughput, and security. These important traits are critical for selecting resources that will meet the performance requirements of the services or applications that will use them.
Performance claims also constitute SLAs, which are formal commitments for performance of the resource in actual use. In many cases, cloud vendors battle each other and on-premises technologies in terms of uptime. One vendor may claim 99.99 percent uptime while another might claim 99.999 percent uptime. Although the difference may seem negligible, an extra nine is equivalent to approximately 47 more minutes of uptime a year. Forty-seven minutes may be a small disruption or a catastrophe, depending on the workload (for example, a test environment versus a retailer’s transactional system). Table 4-1 shows the approximate amount of downtime per year based on system availability.
TABLE 4-1 Downtime Based on System Availability
Availability as a percentage | Downtime per year
90% (“one nine”) | 36.53 days
99% (“two nines”) | 3.65 days
99.9% (“three nines”) | 8.77 hours
99.99% (“four nines”) | 52.60 minutes
99.999% (“five nines”) | 5.26 minutes
99.9999% (“six nines”) | 31.56 seconds
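Downtime figures like those in Table 4-1 follow from simple arithmetic: the unavailable fraction of a year multiplied by the length of the year. The sketch below assumes a year of 365.25 days, which is consistent with the 36.53-day figure for one nine; the function name is our own.

```python
# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the expected downtime per year, in seconds."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR


four_nines = downtime_seconds_per_year(99.99) / 60    # minutes
five_nines = downtime_seconds_per_year(99.999) / 60   # minutes
print(f"four nines: {four_nines:.2f} minutes per year")
print(f"five nines: {five_nines:.2f} minutes per year")
print(f"difference: {four_nines - five_nines:.1f} minutes per year")
```

The difference between four and five nines works out to roughly 47 minutes a year, which is the gap the text describes.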
Of course, as you get more nines, the cost of the service will increase. A cloud that claims five nines will be more expensive than a less predictable offering that has two nines. Claims are routinely verified by consumers and third-party auditors, either through test harnesses or within operation of the completed application or service. But SLAs go further: Failure of a computing resource to meet SLAs can and should be grounds for canceling contracts.
Cloud providers routinely seek to make their computing environments secure and robust, but users of those environments can still engage in risky behaviors. We’ve all heard about people choosing passwords that are easy to guess, possibly leading to unauthorized theft of information or damage to software and systems. That’s just one example of a dangerous behavior that companies work hard to prevent.
One more example of a security risk that is particularly relevant to today’s employees is the use of social media. In previous generations, knowledge of company policies and information was understood to be proprietary to the company and employees helped each other learn and follow safe practices. However, a new generation has grown up with social media and uses it to share information, answer questions, and generally extend the user’s community outside the company’s boundaries. Many social media users do understand the risks, but some don’t and may not recognize the danger in sharing corporate information or business practices outside the company.
As the cloud becomes ubiquitous as a way to deliver services to customers, the management of those resources can make the difference between success and failure. Because the business is serving cloud resources to its own employees and perhaps its business partners, it’s not a public cloud provider, but will deliver cloud resources via private or hybrid clouds. If it’s using a hybrid cloud, the business may well pass public cloud resources to its internal customers, making the business both a public cloud consumer and a private cloud provider.
Not surprisingly, many of the issues regarding the consumption of cloud services are still relevant if your business provides cloud resources to customers, employees, and/or partners. The biggest difference is that the consumers of public cloud providers are much more diverse than the consumers within a single business, so it’s normal for a private or hybrid cloud to offer only resources and services that are specific to that business.
Businesses are increasingly leveraging a combination of public and private clouds. The combination of these resources provides the benefits of scalability, flexibility, and performance to their internal computing consumers. Nowadays, with public clouds offering a high degree of security to match their broad catalogs of services and resources, companies that still use private or hybrid clouds may have even more stringent requirements than before. As cloud computing matures and becomes core to the business strategy, the organization will likely select well-vetted applications and resources, with more thorough security. In a private or hybrid cloud context, the approval process will be more important than ever.
When a company is a private or hybrid cloud provider to its internal consumers, the company should define SLAs for the resources and services provided. Doing so will formalize the operational requirements for those services and increase the chances that consumers will be pleased with the internal services. SLAs provide objective targets for performance and other operational characteristics, such as mean time between failures, and therefore are important for determining whether performance problems with applications are due to problems with the application or problems with the resources and services that the application uses.
In the public cloud, management of resources and services is the responsibility of the public cloud vendor or the third party who provides the software. A responsible organization should continually monitor the software operation to ensure that SLAs are being met. In the private cloud environment, it’s usually the responsibility of IT operations to monitor the software for deviations from SLA commitments.
While public cloud providers have the responsibility to watch SLAs, they have many customers and may not always respond immediately. On the other hand, a company running its own private cloud has one customer (itself) and is equally responsible for ensuring that all applications, services, and resources are always working effectively. In some organizations, executives will be watching whether SLAs are being met because problems can affect the company’s bottom line.
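Monitoring for deviations from an SLA commitment can start with a very small calculation: compare the availability actually measured over a period with the committed target. The following is a sketch under simple assumptions (outage minutes are already known, and availability is the only SLA term checked); the function names are hypothetical.

```python
def measured_availability(total_minutes: float, outage_minutes: float) -> float:
    """Return availability over the period as a percentage."""
    return 100 * (total_minutes - outage_minutes) / total_minutes


def meets_sla(total_minutes: float, outage_minutes: float,
              target_percent: float) -> bool:
    """Return True if measured availability reaches the SLA target."""
    return measured_availability(total_minutes, outage_minutes) >= target_percent


# Example: a 30-day month with a single 10-minute outage.
MINUTES_IN_30_DAYS = 30 * 24 * 60
print(meets_sla(MINUTES_IN_30_DAYS, 10, target_percent=99.9))
print(meets_sla(MINUTES_IN_30_DAYS, 10, target_percent=99.99))
```

The same 10-minute outage satisfies a three-nines commitment for the month but breaks a four-nines commitment, which is why the target needs to be stated before anyone argues about whether the service performed well.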
As more businesses rely on cloud computing, they’re creating internal services that are delivered to consumers within a company via a private or public cloud or a hybrid of both. These consumers are likely to be spread across all divisions of a company and have differing levels of technical expertise. They may not be aware of differences between applications running in private, hybrid, or public cloud contexts. But regardless of where applications are served from, internal consumers of cloud services expect professional applications to be operating with reliability and security and be backed by professional support. This situation is not very different from applications provided from a precloud data center or running on a desktop workstation.
Support for internal applications may come from IT or from a call center — both managed within the company providing the private or hybrid cloud. It’s important that the company handling the support have first-hand access to the computing environment so that it can determine the causes of problems that consumers run into, or better, see and resolve problems before consumers call with a problem.
Providing call centers and support for internal consumers of private/hybrid cloud applications is an essential part of managing a private or hybrid cloud. When customers run into problems with an application, they expect quick help. For third-party applications, comprehensive support will usually come from those third parties although internal support should be aware of the common issues consumers may encounter. But if applications are developed or maintained in-house by internal development teams or by IT acting as a development organization, then development has a role to play with support.
For their part, support organizations pride and measure themselves on delivering quality and timely support. In fact, support organizations increasingly see a reduction of support calls as a goal. Reducing the number of calls can be accomplished by making the software more robust and more intuitive, but support organizations have little control over the software features being developed. Hence, development and support need to work closely together to provide world-class support.
High quality support requires knowledge of the systems being supported and their operational status. Customer problems may be due to misunderstandings or lack of knowledge of how an application works or how to use it to get a specific result. In these cases, support needs to understand the application in depth so that it can guide the customer in the right direction. Of course, hopefully the customer has been trained to use the application, but once the customer calls, support must address the issues.
Imagine that a customer calls with a problem regarding a specific feature of an application. When support receives the call, the first step will likely be to ask the customer what application they were using and what they were doing. This type of call probably happens every day, perhaps all day long. Remember that support is being measured on how quickly they can help customers, so how can they streamline this process? One answer is that the application should be tracking each user’s activities and saving the information in a location that is accessible to support. Then, when a customer calls, support can look up the customer’s recent activities and quickly know exactly what they were doing when they called support. When the process is fully automated, the phone system can recognize the caller, find the username, and present the support staff with the customer’s latest activities.
Rather than asking the caller what application they were using, the support staff can answer the call with “I see you were using our CRM application and just got an error message when you tried to look up a business by its nickname. Is that what you’re calling about?” The time saved on each call, multiplied by all the calls, will be a big success for the support organization, the customer, and the company.
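The scenario above depends on the application recording what each user does in a place that support can query. Here is a minimal in-memory sketch of that idea; a real system would use a shared data store, and the class name, user name, and event fields are all hypothetical.

```python
from collections import defaultdict, deque


class ActivityLog:
    """Keeps the most recent actions for each user (newest last)."""

    def __init__(self, max_events_per_user: int = 5):
        # A bounded deque silently drops the oldest event when full.
        self._events = defaultdict(lambda: deque(maxlen=max_events_per_user))

    def record(self, username: str, application: str, action: str) -> None:
        """Called by the application each time a user does something."""
        self._events[username].append((application, action))

    def recent(self, username: str) -> list:
        """Called by support to see what the caller was just doing."""
        return list(self._events.get(username, []))


log = ActivityLog(max_events_per_user=2)
log.record("jsmith", "CRM", "opened customer list")
log.record("jsmith", "CRM", "searched business by nickname")
log.record("jsmith", "CRM", "received error message")
print(log.recent("jsmith"))
```

Bounding the history per user keeps the lookup fast and limits how much activity data is retained, which matters for both storage and privacy.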
This example is just one of many that show how monitoring applications improves the management of those applications. The following sections look at a few more examples.
In a hybrid cloud context, you rely on public cloud resources to augment the services used in the private cloud portion of the hybrid environment. To verify that those resources are operating with sufficient performance and meeting SLAs, you have a few choices:
Public cloud vendors provide information about the operational status of major subsystems and services within their cloud environments. The same type of information should be maintained for private and hybrid clouds. Although this information is usually high level, ensuring that basic services are working properly should be the first thing that support people look to verify.
Applications can report on the actions of application users to help support personnel provide advice to users who call with troubles. That same information about what users are doing is also very useful for developers and IT. User experience experts can look at how users work with the application to see problems in the user interface, opportunities for streamlining the user’s experience, and bugs in the application. Product managers use the information to verify that features are being used as intended and to explore how new features can improve the application.
Applications routinely generate logs, or records of exactly what the application has done. Developers examine logs after applications crash to help figure out why the crash occurred. With the use of log analysis tools, logs can reveal operational patterns that may suggest ways to improve the application.
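As a small illustration of log analysis, the sketch below counts log entries by severity level. It assumes a hypothetical line format of a timestamp, a level, and a message separated by spaces; real applications and log analysis tools use many different formats.

```python
from collections import Counter


def count_by_level(lines):
    """Count log entries per severity level.

    Assumes each line looks like '<timestamp> <LEVEL> <message>'.
    Lines that do not have at least two fields are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


# Invented sample entries for illustration only.
sample_log = [
    "2024-01-01T10:00:00 INFO user logged in",
    "2024-01-01T10:00:05 ERROR lookup failed",
    "2024-01-01T10:00:09 INFO user logged out",
    "2024-01-01T10:01:12 ERROR lookup failed",
]
for level, count in count_by_level(sample_log).most_common():
    print(level, count)
```

A rising count of one level, or of one repeated message, is the kind of operational pattern that can point to where the application needs improvement.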
Applications can also be instrumented to provide other useful information to the business, including Key Performance Indicators (KPIs) that, when designed properly, can disclose whether applications are meeting the goals they were designed to achieve.
A wealth of data and information comes from a busy cloud environment, and many employees can use that information to solve their daily challenges. No one can process all that information in its raw state; there are just too many details.
Harnessing this information into a form that is useful requires at least two parts:
Dashboards are often the best tool for presenting operational information to various audiences. With flexibility in how dashboards are constructed, different views with different focuses can be designed for different audiences. Support needs information to help users with problems, developers need to see performance data and information about what customers are doing, product managers need to see KPI and other usability information, and executives need to see high-level status and data on customer acceptance.
One characteristic of a hybrid cloud environment is that an organization will have a variety of services that need to be managed. Some will be straightforward services, such as the ability to store data that is rarely used but must be kept. In other situations, businesses are developing their own applications that will operate in the public cloud. To be successful, visibility into and control over external and internal resources are critical.
Nowadays, best practices for developing software for the cloud are based on DevOps practices. DevOps combines Development with Operations and enables continuous development and deployment of software. (See Chapter 11 for a more in-depth discussion of DevOps.) DevOps streamlines the management of application and service life cycles by eliminating handoffs between development and operations and makes the software more robust because developers are now paying more attention to operational issues.
After an application is deployed to the cloud, DevOps engineers continue to watch the software while it’s running. If operational issues come up, the engineers responsible for the software are immediately involved and can either solve problems tactically in the cloud or, if necessary, make changes to the application and redeploy it with fixes as quickly as the code can be changed and tested.
With applications and services used by customers in other companies, it’s even more important that system monitoring gather application and service usage data than it is within a single company. After all, companies have many more opportunities to quiz their own employees about how well systems work than to quiz employees of other companies.
Similarly, pushing product and operational status to external customers becomes more important because the customers will have less awareness of those issues and less loyalty to the software than internal customers.
In the public cloud, there is less tolerance for problems, outages, and frustrations. Applications and services in the public cloud must be as robust as possible. Consider these issues for making applications and services fit the “always on and always available” expectations of the public cloud:
Many different services and deployment models are emerging as part of the hybrid cloud fabric. We’re at the stage where it’s becoming more and more important to be able to bring together the management of all the internal and external cloud and data center services necessary to manage a well-run operation.
Not only are businesses leveraging a public cloud, they often will be using several different public cloud services across departments. When you combine that with all the private and third-party services being deployed and operated, there is escalating complexity as you scale. You should begin asking some key questions:
You have to deal with a lot of issues as you begin to put together your multicloud management strategy. You can’t think of your services as separate islands of computing, applications, or storage. Rather, you need to put in place an infrastructure and approach that provides a seamless interface, consistency, and predictability across all of your services.