Chapter 17. Performance and Stress Testing

If you know neither the enemy nor yourself, you will succumb in every battle.

—Sun Tzu

After peripherally mentioning performance and stress testing in previous chapters, we now turn our full attention to these tests. In this chapter, we discuss how these tests differ in purpose and output as well as how they impact scalability. Whether you use performance tests, stress tests, neither, or both, this chapter should give you some fresh perspectives on the purpose and viability of testing that you can use to either revamp or initiate a testing process in your organization.

As is the case with the quality of your product, scalability is something that must be designed early in the development life cycle. Testing, while a necessary evil, is an additional cost our organizations bear to uncover problems and oversights in our designs.

Performing Performance Testing

Performance testing covers a broad range of engineering evaluations, where the emphasis is on the final measurable performance characteristics instead of the actual material or product.1 With respect to computer science, performance testing focuses on determining the speed, throughput, or effectiveness of a device or piece of software. Performance testing is often called load testing; to us, these two terms are interchangeable. Some professionals will argue that performance testing and load testing have different goals but similar techniques. To avoid a pedantic argument, we will use a broader goal for defining performance testing so that it incorporates both.

1. Performance testing. Wikipedia. http://en.wikipedia.org/wiki/Performance_testing.

According to our definition, the goal of performance testing is to identify, document, and, where possible, eliminate bottlenecks in the system. This is done through a strictly controlled process of measurement and analysis. Load testing is utilized as a method within this process.

Establish Success Criteria

The first step in performance testing is to establish the success criteria. For SaaS systems, these criteria are often based on concurrent-usage and response-time metrics. For existing solutions, most companies use baselines established over time in a production environment and/or in previous tests within a performance or load testing environment. For new products, you should increase demand on the solution until either the product stops responding or it responds in an unpredictable or undesirable manner. This becomes the benchmark for the as-yet-unreleased product.
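As a concrete illustration, success criteria can be captured in a machine-readable form that automated tests can evaluate. The following is a minimal sketch in Python; the metric names and thresholds are illustrative assumptions, not recommendations:

```python
# A minimal sketch of machine-readable success criteria for a SaaS
# performance test. All names and thresholds here are illustrative
# assumptions, not recommendations.
from dataclasses import dataclass

@dataclass
class SuccessCriteria:
    concurrent_users: int        # load the system must sustain
    p95_response_time_ms: float  # 95th-percentile response time ceiling
    max_error_rate: float        # fraction of failed requests allowed

def passed(criteria: SuccessCriteria, observed_p95_ms: float,
           observed_error_rate: float) -> bool:
    """Return True if an observed test run meets the criteria."""
    return (observed_p95_ms <= criteria.p95_response_time_ms
            and observed_error_rate <= criteria.max_error_rate)

# Example: checkout must hold 500 concurrent users at p95 <= 800 ms.
checkout = SuccessCriteria(concurrent_users=500,
                           p95_response_time_ms=800.0,
                           max_error_rate=0.001)
print(passed(checkout, observed_p95_ms=640.0, observed_error_rate=0.0004))
```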

When replacing systems, the benchmarks of the old (to-be-replaced) system are often used as a starting point for the expectations of the new system. Usually such replacements are predicated on creating greater throughput, either to reduce costs at equivalent transaction volumes or to allow the company to grow more cost-effectively in the future.

Establish the Appropriate Environment

After establishing a benchmark, the next step is to establish an appropriate environment in which to perform testing. The environment encapsulates the network, servers, operating system, and third-party software used within the product. The performance testing environment ideally will be separate from other environments, including development, QA, staging, and the like. This separation is important because you need a stable, consistent environment in which to conduct tests repeatedly over an extended duration. Mixing the performance environment with other environments will mean more frequent changes to the environment and, as a result, lower confidence in the results. Furthermore, some of these tests need to run over extended time periods, such as 24 hours, to produce the load expected for batch routines. As such, the environment will be largely unavailable for other purposes for extended periods of time. To achieve the best results, the environment should mirror production as closely as possible, within the obvious financial constraints.

The performance testing environment should mimic the production environment to the greatest extent possible because environmental settings, configurations, different hardware, different firewall rules, and much more can all dramatically affect test results. Even different patch versions of the operating system, which might seem a trivial concern, can have dramatically different performance characteristics for applications. This does not mean that you need a full copy of your production environment; although that would be nice, few companies can afford such a luxury. Instead, make wise tradeoffs but stick to the same basic architecture and implementation as much as possible. For example, pools of servers that in production include 40 servers can be scaled down in a test environment to only two or three servers. Databases are often very difficult to scale down because the amount of data affects the query performance. In some cases, you can “trick” the database into believing it has the same amount of data as the production database to ensure the queries execute with the same query plans. Spend some time pondering the performance testing environment, and discuss the tradeoffs that you are making. If you can sufficiently balance the cost with the effectiveness, you will be able to make the best decisions in terms of what the environment should look like and how accurate the results will be.
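To make the database "trick" concrete: in PostgreSQL, one fragile, superuser-only approach that practitioners sometimes use is to overwrite the planner's row and page estimates so a scaled-down table is optimized as if it held production volumes. The sketch below assumes the psycopg2 library, a hypothetical orders table, and made-up production statistics; note that the next ANALYZE or autovacuum pass will overwrite the values:

```python
# A hedged sketch of the "trick the planner" approach in PostgreSQL:
# overwrite the row/page estimates the query planner reads so a small
# test table is optimized like its production counterpart. This is a
# fragile, superuser-only hack (ANALYZE or autovacuum will undo it),
# and the connection string, table name, and numbers are hypothetical.
import psycopg2

PROD_ROWS, PROD_PAGES = 50_000_000, 600_000  # taken from production stats

conn = psycopg2.connect("dbname=perftest user=postgres")
with conn, conn.cursor() as cur:
    # pg_class.reltuples and relpages feed the planner's cost estimates.
    cur.execute(
        "UPDATE pg_class SET reltuples = %s, relpages = %s "
        "WHERE relname = 'orders'",
        (PROD_ROWS, PROD_PAGES),
    )
```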

An additional reason to create a separate test environment may arise if you are practicing continuous delivery or have plans to do so. For your automated systems to easily schedule delivery of packages, they ideally should run those packages through stages of environments, where each stage focuses on some aspect of potential quality issues (such as a performance testing environment in this case). The more constrained or shared your environments, the more changes will back up waiting for automated environment reconfiguration before automated testing can run. Splitting out these environments helps ensure a fluid pipeline with the fastest possible delivery (absent major blocking issues) into the production environment.

Define the Tests

The third step in performance planning is to define the tests. As mentioned earlier, a multitude of tests can be performed on the various services and features. If you try to run all of them, you may never release any products. The key is to apply the Pareto principle, also known as the 80/20 rule: find the 20% of the tests that will provide you with 80% of the needed information. Product tests almost always follow a similar distribution when it comes to the amount or value of information provided. This situation arises because features are not all used equally, and some are more critical than others. For example, the feature handling user payments may be more important than the feature handling a user's search for friends, so it should be tested more rigorously.

When you define the tests, be sure to include tests of various types. Some types or categories of tests include endurance, load, most used, most visible, and component (app, network, database, cache, and storage). An endurance test is used to ensure that a standard load experienced over a prolonged period of time does not have adverse effects due to problems such as memory leaks, data storage growth, log file creation, or batch jobs. Here, a normal user load with traffic patterns and activities as realistic as possible is used. It is often difficult to come up with actual or close-to-actual user traffic. A minimum substitute for this input is a series of actions, such as a signup followed by a picture upload, a search for friends, and a logout, written into a script that can be executed over and over. A better approach is to gather actual user traffic from a network device or app server and replay it in the exact same order while varying the time period. That is, first you run the test over the same time period in which the users generated the traffic, and then you increase the replay speed and ensure the application performs as expected with the increased throughput.
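A minimal sketch of such a scripted journey might look like the following Python loop. The base URL, endpoint paths, payloads, and the presence of a local sample.jpg file are all hypothetical placeholders for your own application:

```python
# A minimal sketch of the scripted-journey substitute described above:
# one synthetic user signs up, uploads a picture, searches for friends,
# and logs out, in a loop. BASE, the endpoints, and sample.jpg are
# hypothetical; substitute your application's real URLs and payloads.
import time
import requests

BASE = "https://perf-test.example.com"

def one_journey(session: requests.Session, user_id: int) -> float:
    """Run one scripted user journey; return its wall-clock duration."""
    start = time.monotonic()
    session.post(f"{BASE}/signup", data={"user": f"load_user_{user_id}"})
    with open("sample.jpg", "rb") as img:
        session.post(f"{BASE}/photos", files={"photo": img})
    session.get(f"{BASE}/friends/search", params={"q": "smith"})
    session.post(f"{BASE}/logout")
    return time.monotonic() - start

if __name__ == "__main__":
    with requests.Session() as s:
        for i in range(1000):          # executed "over and over"
            elapsed = one_journey(s, i)
            print(f"journey {i}: {elapsed:.3f}s")
```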

Remember to be mindful of the test definition as it relates to both continuous integration and—more importantly—continuous delivery (refer back to Chapter 5, Management 101, for a definition or refresher on this topic). To be successful with continuous delivery, we need to ensure that the tests we define can be automated and that the success criteria for them can be evaluated by the automation.

The load test essentially puts a user load on the system up to the expected or required level to ensure the application is stable and responsive according to internal or external service level agreements. A most-used test scenario exercises the path that most users take through the application. In contrast, a most-visible test scenario exercises the parts of the application that users see most often, such as the home page or a new landing page. The component test category is a broad set of tests designed to exercise individual components in the system. One such test might be to exercise a particularly long-running query on the database to ensure it can handle the prescribed amount of traffic. Similarly, traffic requests through a load balancer or firewall are other component tests you might consider.

Execute the Tests

In the test execution step, you work through the test plan, methodically executing the tests in the environment established for this purpose and recording measurements such as transaction times, response times, outputs, and behaviors. Gather everything that you can; data is your friend in performance testing. It is important to keep this data from release to release because, as described in the next step, comparison across releases is critical to determining whether the data indicates normal operating ranges or the potential presence of a problem.
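Record keeping like this can be as simple as appending each run's summary statistics to a per-release results file that later analyses can compare. A minimal sketch, with a hypothetical file layout and field names:

```python
# A sketch of the record keeping this step calls for: append each test
# run's summary statistics to a per-release results file so later
# releases can be compared against earlier ones. The directory layout
# and field names are illustrative assumptions.
import csv
import statistics
from pathlib import Path

def record_run(release: str, test_name: str, response_times_s: list[float],
               results_dir: str = "perf_results") -> None:
    """Append one test run's summary statistics to the results log."""
    times = sorted(response_times_s)
    p95 = times[int(0.95 * (len(times) - 1))]  # crude p95 by rank
    path = Path(results_dir) / f"{release}.csv"
    path.parent.mkdir(exist_ok=True)
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["test", "samples", "mean_s", "p95_s"])
        writer.writerow([test_name, len(times),
                         f"{statistics.mean(times):.4f}", f"{p95:.4f}"])

record_run("release-4.2.0", "checkout_load", [0.21, 0.19, 0.25, 0.31, 0.22])
```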

In organizations practicing continuous delivery, there are a few ways to think about how to execute performance tests. The first is to have performance testing occur nearly continuously and outside the critical path to delivery into your production environment. This approach has the beneficial effect of not stalling releases while past submissions to the performance test environment complete. An unfortunate side effect is that should any release contain a significant performance problem that may cause availability issues, you will not find it prior to release. As a result, outages may increase and availability may decrease in exchange for the benefit of decreased time to market.

A second approach is to have releases move through the performance environment sequentially prior to release. While this protects you against potential performance-related outages, it can significantly decrease your delivery velocity. Imagine that you have several releases in the automated delivery queue. If you perform endurance tests that include overnight batch testing, each release may need to wait for its own cycle. The result is that an approach meant to be a faster, lower-risk introduction to the production environment starts to slow down relative to your old way of doing things.

A hybrid approach is likely to work best, with some level of testing (exercising the code for a short period of time) done in series prior to release and longer-endurance testing happening with batches of releases once a day. This approach allows you to mitigate much of the risk associated with outages, even as you continue to enjoy the time-to-market benefits of continuous delivery. When practicing the hybrid approach, you will likely need at least two performance testing environments: one for sequential testing (in-line and prior to release) and one for prolonged-endurance testing (post release).

Analyze the Data

Step 5 in the performance testing process is to analyze the data gathered. This analysis can be done in a variety of ways, depending on factors such as the expertise of the analyst, the expectations of thoroughness, the acceptable risk level, and the time allotted. Perhaps the simplest analysis is a comparison of the candidate release with past releases. A query that executes 25 times per second without increased response time in the current release may be a problem if it could execute 50 times per second with no noticeable degradation in performance in the last release. The fun begins in the next step: trying to figure out why this change has occurred. Although decreases in throughput capacity or increases in response time are clearly items that should be noted for further investigation, the opposite is true as well. A sudden dramatic increase in capacity might indicate that a particular code path has been dropped or that SQL conditional statements have been lost; such a change should be noted as a potential target of investigation, too. We hope that in these scenarios an engineer has refactored and improved the performance, but it is best to document the change and ask follow-up questions to confirm it.

A more detailed analysis involves graphing the data for visual reference. It is often much easier to recognize anomalies or differences when data is presented in line, bar, or pie charts. Although these anomalies may or may not be truly significant, such graphs are generally a quick way of making judgments about the release candidate.

A further detailed analysis involves performing statistical analysis on the data. Statistical tests such as control charts, t-tests, factor analysis, main effects plots, analysis of variance, and interaction plots can all be helpful. These tests help identify the factors causing the observed behavior and help to determine whether you should be concerned about their overall effect.
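As one small example of the statistical approach, a two-sample t-test can indicate whether response times shifted meaningfully between the prior release and the candidate. The samples below are illustrative stand-ins for your recorded test data:

```python
# A minimal analysis sketch using one of the statistical tests named
# above: a Welch two-sample t-test asking whether response times
# shifted between the prior release and the candidate. The samples
# are illustrative stand-ins for recorded test data.
from scipy import stats

prior = [0.210, 0.195, 0.224, 0.201, 0.219, 0.208, 0.199, 0.215]
candidate = [0.242, 0.251, 0.238, 0.247, 0.260, 0.244, 0.239, 0.255]

t_stat, p_value = stats.ttest_ind(prior, candidate, equal_var=False)
if p_value < 0.05:
    print(f"Significant shift in response time (p={p_value:.4f}); investigate.")
else:
    print(f"No significant shift detected (p={p_value:.4f}).")
```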

In the case of continuous delivery, failure of the automated performance test cases (e.g., on a percentage of capacity loss for a given attribute such as queries per second) should either stall the release for evaluation or deliver the release but open a work ticket for analysis. You might decide that you are willing to accept an automated release if the change in performance falls below one threshold (e.g., 2%) and that the release should be stalled for evaluation above that threshold.
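A sketch of such a gate follows; the 2% threshold and the returned pipeline actions are assumptions to be tuned to your own risk tolerance:

```python
# A sketch of the automated gate described above. The 2% threshold and
# the action names are assumptions; wire the return value to your own
# pipeline and ticketing hooks.
THRESHOLD = 0.02  # hypothetical: capacity loss tolerated per release

def gate(prior_qps: float, candidate_qps: float) -> str:
    """Decide what the delivery pipeline should do with this candidate."""
    loss = (prior_qps - candidate_qps) / prior_qps
    if loss <= THRESHOLD:
        # Within tolerance: release, but open a work ticket if any loss.
        return "release-and-ticket" if loss > 0 else "release"
    return "stall-for-evaluation"

print(gate(prior_qps=50.0, candidate_qps=49.5))  # 1% loss: release-and-ticket
print(gate(prior_qps=50.0, candidate_qps=45.0))  # 10% loss: stall
```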

Report to Engineers

If the individuals performing the tests are not part of the Agile team, then an additional step of communicating with the engineers who wrote the software must be undertaken. We would prefer to have the Agile team who wrote the software also perform the tests, but sometimes the team performing these tests is still functionally aligned.

The goal of sharing is to ensure that each item or anomaly from the report gets worked to closure. Closure can occur in at least two ways. The first is to identify the anomaly as an expected outcome of the changes. In this case, the engineer responsible for the explanation should be able to support why the performance deviation is not only expected but actually warranted (as in the case where an increase in revenue will offset the resulting increase in cost). The second is for a bug to be filed so that the engineering team can investigate the issue further and ideally fix it. It is entirely possible that more tests (with the help of engineering) may need to be run to make a solid business case for either taking no action or fixing a possible bug. In continuous delivery workflows, all reporting should be automated.

Repeat the Tests and Analysis

The last step in the performance process is to repeat the testing and reanalyze the data. This repetition may be needed either because a fix was provided for a bug logged in step 6, or simply because additional time is available and the code base is always changing due to functional bug fixes. If sufficient time and resources are available, these tests should definitely be repeated to ensure the results have not changed dramatically from one build to another for the candidate release and to continue probing for potential anomalies.

Performance testing covers a broad range of evaluations, all of which share a focus on the necessary characteristics of the system rather than on the individual materials, hardware, or code. Concentrating on ensuring the software meets or exceeds the specified requirements or service level agreements is what performance testing is all about.

Don’t Stress over Stress Testing

Stress testing is a process that is used to determine an application’s stability when subjected to above-normal loads. By comparison, in load testing, the load is only as much as specified or normal operations require. Stress testing goes well beyond these levels, often to the breaking point of the application, to observe the behaviors.

Although several different methods of stress testing exist, the two most commonly used are positive testing and negative testing. In positive testing, the load is progressively increased to overwhelm the system’s resources. In negative testing, resources such as memory, threads, or connections are taken away. Besides determining the exact point of demise or (in some instances) the degradation curve of the application, a major purpose of stress testing is to drive the application beyond its capacity to make sure that when it fails, it can recover gracefully. This approach tests the application’s recoverability.

An extreme example of negative stress testing is Netflix’s Chaos Monkey. Chaos Monkey is a service that runs on Amazon Web Services (AWS). It seeks out auto-scaling groups (ASGs) and terminates virtual machines (instances, in AWS terms) within each group. Netflix has taken it a step further with Chaos Gorilla, another service that shuts down entire Amazon Availability Zones to make sure healthy zones can successfully handle system load with no impact to customers. The company does this to understand how such a loss of resources will impact its solution. According to the Netflix blog, “Failures happen and they inevitably happen when least desired or expected.”2 The idea is that by scheduling the Chaos Monkey or Chaos Gorilla to “work” during the normal business day, the team can respond to, analyze, and react to issues that would otherwise catch them by surprise in the middle of the night.

2. Chaos Monkey released into the wild. Netflix Techblog, July 30, 2012. http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html.

Why do we call this example extreme? The folks at Netflix run it in their production environment! Considered one way, this is really a parallel evolution of continuous delivery practices. To be successful in emulating Netflix, a company first needs to have nailed down all of the incident and crisis management procedures identified earlier in this book. The good news is that, should you be interested, the kind folks at Netflix have released the Chaos Monkey into the wild as part of the Simian Army project on GitHub. Go check it out!

Identify the Objectives

The first step in stress testing is to identify what you want to achieve with the test. As with all projects, time and resources are limited for this sort of testing. By identifying goals up front, you can narrow the field of tests that you will perform and maximize your return on the time and resources invested.

Stress testing can help identify baselines, ease of recoverability, and system interactions, in addition to the results of negative testing. Broadly speaking, ease of recoverability and baselines are considered positive testing. Stress testing to establish a baseline helps to identify the peak utilization possible or degradation curve of a product. Recoverability testing helps to understand how a system fails and recovers from that failure. Testing systems’ interactions attempts to ensure that some given functionality continues to work when one or more other services are overloaded.

Identify the Key Services

Next we need to create an inventory of the services to be tested. As we won’t be able to test everything, we need a way to prioritize our testing. Some factors that you should consider are criticality to the overall system, service issues most likely to affect performance, and service problems identified through load testing as bottlenecks. Let’s talk about each one individually.

The first factor to use in determining which services should be selected for stress testing is the criticality of each service to the overall system performance. If there is a central service such as a database abstraction layer (DAL) or user authorization, it should be included as a candidate for stress testing because the stability of the entire application depends on this service. If you have architected your application into fault-tolerant “swim lanes” (discussed in depth in Chapter 21, Creating Fault-Isolative Architectural Structures), you are likely to still have core services that have been replicated across the lanes.

The second consideration for determining which services to stress test is the likelihood that a service will affect performance. This decision will be influenced by knowledgeable engineers but should also be somewhat scientific. You can rank services by their use of operations such as synchronous calls, I/O, caching, locking, and so on. The more of these higher-risk operations a service includes, the more likely it is to affect performance.

The third factor for selecting services to stress test is whether a service was identified as a bottleneck during load testing. With any luck, any such constraint will already have been fixed, but you should recheck it during stress testing.

Collectively, these three factors should provide you with strong guidelines for selecting the services on which you should focus your time and resources to ensure you get the most out of your stress testing.

Determine the Load

The third step in stress testing is to determine how much load is actually necessary. Determining the load is important for a variety of reasons. First, it is helpful to know at approximately what load the application will start exhibiting strange behaviors so that you don’t waste time on much lower loads. Second, you need to understand whether your test systems have enough capacity to generate the required load. The load that you decide to place upon a particular service should stress it sufficiently beyond the breaking point, thereby enabling you to observe the behavior and consequences of the stress. One way to accomplish this is to identify the load under which the service begins to exhibit poor behavior, and then to incrementally increase the load beyond this point.

The important thing is to be methodical, record as much data as possible, and create a significant failure of the service. Stress can be placed upon the service in a variety of manners, such as by increasing the number of requests, shortening any delays, or reducing the hardware capacity. An important factor to remember is that loads, whether identified in production or in load testing, should always be scaled to the appropriate level based on the differences in hardware between the environments.
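A hedged sketch of this incremental approach appears below. The endpoint, the 10% hardware scale factor, and the single-threaded probe are all simplifying assumptions; a real load driver would spread requests across many workers or machines:

```python
# A sketch of stepping the load to and beyond the breaking point. The
# URL, scale factor, thresholds, and single-client probe are all
# hypothetical simplifications of a real distributed load driver.
import time
import requests

URL = "https://perf-test.example.com/login"  # hypothetical service under test
HW_SCALE = 0.10  # test pool is ~10% of production; scale loads to match

def probe(rps: float, duration_s: float = 30.0) -> float:
    """Drive URL at roughly `rps` from one client; return the error rate."""
    sent = errors = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            if requests.get(URL, timeout=2).status_code >= 500:
                errors += 1
        except requests.RequestException:
            errors += 1
        sent += 1
        time.sleep(1.0 / rps)
    return errors / max(sent, 1)

def find_breaking_point(start_rps: float, step_rps: float,
                        error_ceiling: float = 0.05,
                        overshoot_steps: int = 3) -> float:
    """Increase load until the service misbehaves, then push past it."""
    rps = start_rps * HW_SCALE          # scale the production figure down
    while probe(rps) <= error_ceiling:
        rps += step_rps * HW_SCALE
    breaking_point = rps
    for _ in range(overshoot_steps):    # keep pushing beyond the break
        rps += step_rps * HW_SCALE      # to observe failure and recovery
        probe(rps)
    return breaking_point

print(find_breaking_point(start_rps=500.0, step_rps=100.0))
```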

Establish the Appropriate Environment

As with performance testing, establishing the appropriate environment is critical to effective stress testing. The environment must be stable, consistent, and as close to production as possible. This last criterion might be hard to meet unless you have an unlimited budget. If you are one of the less fortunate technology managers who is constrained by a budget like the rest of us, you will have to scale this expectation down. For example, large pools of servers in production can be scaled down to small pools of two or three servers, but the important consideration is that there are multiple servers load balanced using the same rules. The class of servers should be the same if at all possible, or else a scale factor must be introduced. A production environment with solid-state disks and a test environment with hybrid flash and 15,000-rpm disks, for example, will likely cause the product to exhibit different performance characteristics and different load capacities in the two environments.

It is important to spend some time pondering the appropriate stress testing environment, just as you did for the performance testing environment. Understand the tradeoffs that you are making with each difference between your production and testing environments. Balance the risks and rewards to make the best decisions in terms of what the environment should look like and how useful the tests will be. Unlike with performance testing, you need not be concerned about how the environment affects continuous delivery: stress testing is generally a point-in-time activity and need not be performed prior to each release.

Identify the Monitors

The fifth step in the stress testing process is to identify what needs to be monitored or which data needs to be collected. It is as important to identify what needs to be monitored and captured as it is to properly choose the service, load, and tests. You certainly do not want to go to the trouble of performing the tests, only to discover that you did not capture the data that you needed to perform a proper analysis.

Some items that might be important to consider as potential data points are the results or behavior of the service, response time, CPU load, memory usage, disk usage, thread deadlocks, SQL count, failed transactions, and so on. The results of the service matter in the event that the application returns erroneous output. Comparison of expected and actual results should be considered a very good measure of the behavior of the service under load.
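Host-level data points such as CPU, memory, and disk usage can be captured with a small monitor like the sketch below, which assumes the psutil library; application-level metrics such as SQL counts or failed transactions would come from your own instrumentation:

```python
# A sketch of a test-side monitor capturing a few of the data points
# listed above (CPU, memory, disk) once per second. psutil gathers
# host-level metrics only; application metrics come from your own
# instrumentation. File name and duration are illustrative.
import csv
import time
import psutil

def monitor(out_path: str = "stress_metrics.csv", seconds: int = 60) -> None:
    """Sample host metrics once per second and append them to a CSV."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch_s", "cpu_pct", "mem_pct", "disk_pct"])
        for _ in range(seconds):
            writer.writerow([
                int(time.time()),
                psutil.cpu_percent(interval=1),  # blocks ~1 s per sample
                psutil.virtual_memory().percent,
                psutil.disk_usage("/").percent,
            ])

monitor(seconds=10)
```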

Create the Load

The next step in the stress testing process is to create the simulated load. This sixth step is important because it often takes more work than running the actual tests. Creating sufficient load to stress the service may be very difficult if your services have been well architected to handle especially high loads. The best loads are those replicated from real user traffic. Sometimes it is possible to gather this traffic from application or load balancer logs. If logs are available and become the source of your load data, you will likely need to coordinate other parts of the system, such as the database, to ensure they match that data. For example, if you are testing a signup service and plan to replay actual user registrations from your production logs, you will need to not only extract the registration requests from the logs, but also reset the test database to a point before those registrations began. If a user is already registered in the database, a different code path will be executed than is normal for a registration; this difference will significantly skew your tests and yield inaccurate results. If you cannot get real user traffic to simulate the load, you can fall back to writing scripts that simulate a series of steps exercising the service as close to normal user traffic as possible.
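A minimal sketch of this replay approach follows. The snapshot-restore script, log format, and registration endpoint are hypothetical; the point is the ordering, with state reset before replay begins:

```python
# A sketch of the replay approach described above: restore the test
# database to a point before the captured registrations began, then
# replay each logged signup in its original order. The restore script,
# log format, and endpoint are hypothetical placeholders.
import json
import subprocess
import requests

BASE = "https://perf-test.example.com"

# 1. Reset state first; already-registered users would exercise the
#    "duplicate account" code path and skew the results.
subprocess.run(["./restore_db_snapshot.sh", "pre-registration"], check=True)

# 2. Replay the captured requests in their original order.
with open("captured_registrations.jsonl") as log:
    for line in log:
        entry = json.loads(line)  # one captured request per line
        requests.post(f"{BASE}/register", data=entry["form"], timeout=5)
```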

Execute the Tests

After you have finalized your test objectives, identified the key services to be tested, determined the load necessary, set up your environment, identified what needs to be monitored, and created the simulated load that will be used, you are ready for the seventh step—actually executing the tests. In this step, you methodically progress through your identified services performing the stress tests under the loads determined and meticulously record the data that you identified as being important to perform a proper analysis. As with performance testing, you should keep data from release to release. Comparing the results from various releases is a great way to quickly understand the changes that have taken place from one release to another.

Analyze the Data

The last step in stress testing is to perform the analysis on the data gathered during the tests. The analysis of stress test data is similar to that done for performance tests, in that a variety of methods can be employed depending on factors such as the amount of time allocated, the skills of the analyst, the acceptable amount of risk, and the level of detail expected.

The other significant determinant in how the data should be analyzed is the set of objectives or goals determined in step 1. If the objective is to establish a baseline, little analysis needs to be done; perhaps just enough to validate that the data accurately depicts the baseline, that it is statistically significant, and that it shows only common cause variation. If the objective is to identify the failure behavior, the analysis should focus on comparing the results below the breaking point with those above it. This comparison will help identify warning signs of an impending problem as well as the emergence of a problem or inappropriate behavior of the system at certain loads. If the objective is to test the behavior when a resource is removed completely from the system, the analysis should probably include a comparison of response times and other system metrics across the various resource-constrained scenarios and after the load is removed, to ensure that the system recovered as expected. For the interactivity objective, the data from many different services may have to be examined together; this type of examination might include multivariate analyses such as principal component analysis or factor analysis. The objective identified in the very first step will be the guidepost for this analysis.

A successful analysis will meet the objectives set forth for the tests. If a gap in the data or missing test scenario prevents you from completing the analysis, you should reexamine your steps and ensure you have accurately followed the eight-step process outlined earlier.

We need to take a break in our description and praise of the stress testing process to discuss its downside. Although we encourage the use of stress testing, it is admittedly one of the hardest types of testing to perform properly, and if you do not perform it properly, the effort is almost always wasted. As we discussed in step 4 about setting up the proper environment, switching classes of storage or processor speeds can completely destroy the validity of the test results. Even so, establishing the appropriate environment is a relatively easy step to get correct, especially when compared to the sixth step, creating the load. Load creation is by far the hardest task and the most likely place where you or your team will make a mistake and produce erroneous or inaccurate results. It is very, very difficult to accurately capture and replay real user behavior. As discussed earlier, doing so often necessitates synchronization of data within caches and stores, such as databases or files, because inconsistencies will exercise different code paths and render inaccurate results. Additionally, creating a very large load can itself be problematic from a capacity standpoint, especially when you are trying to test the interactivity of multiple services.

Given these challenges, we caution you about using stress testing as your only safety net. As we will discuss in the next chapter on go/no-go decisions and rollback, you must have multiple relief valves in the event that problems arise or disaster strikes. We will also cover this subject more fully in Part III, “Architecting Scalable Solutions,” when we discuss how to use swim lanes and other application-splitting methods to improve scalability and stability.

As we stated at the beginning of this section, the purpose of stress testing is to determine an application’s stability when subjected to above-normal loads. It is clearly differentiated from load testing, where the load is only as much as specified; in stress testing, we go well beyond this level to the breaking point and watch the failure and the recovery of the service or application. We recommend an eight-step stress testing process starting with defining objectives and ending with analyzing the data. Each step in this process is critical to ensuring a successful test yielding the results that you desire. As with our other processes, we recommend starting with this one intact and adding to it as necessary for your organization’s needs.

Performance and Stress Testing for Scalability

As we discussed in Chapter 11, Determining Headroom for Applications, it is critical to scalability that you know where you are in terms of capacity for a particular service within your system. Only then can you calculate how much time and growth you have left to scale. This knowledge is fundamental for planning headroom or infrastructure projects, splitting databases and applications, and making budgets. The way to ensure your calculations remain accurate is to conduct performance testing on all your releases, confirming that you are not introducing unexpected load increases. It is not uncommon for an organization to impose a maximum allowed load increase per release. As your capacity planning becomes more sophisticated, you will come to see the load added by new features and functionality as a cost that must be accounted for in the cost–benefit analysis.

Additionally, stress testing is necessary to ensure that the expected breakpoint or degradation curve is still at the same point as previously identified. It is possible to leave the normal usage load unchanged yet decrease the total load capacity through new code paths or changes in logic. For instance, a 90-millisecond increase in a data structure lookup would likely go unnoticed within the total response time of a single user request. If this service is tied synchronously to other services, however, then as the load builds, hundreds or thousands of 90-millisecond delays will add up and decrease the peak capacity the services can handle.
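A back-of-the-envelope calculation shows why. With a fixed pool of workers, throughput is roughly the worker count divided by the time each request holds a worker, so the added 90 milliseconds shrinks peak capacity even though no single user notices it. The worker count and baseline latency below are illustrative assumptions:

```python
# Back-of-the-envelope: with a fixed pool of workers, capacity is
# approximately workers / time-per-request (Little's law), so a 90 ms
# increase no single user notices still erodes peak throughput.
# The worker count and baseline latency are illustrative assumptions.
WORKERS = 200        # threads/connections available to the service
BASELINE_S = 0.300   # time each request holds a worker today
ADDED_S = 0.090      # the "unnoticed" lookup increase

before = WORKERS / BASELINE_S              # ~667 requests/second
after = WORKERS / (BASELINE_S + ADDED_S)   # ~513 requests/second
print(f"peak capacity: {before:.0f} -> {after:.0f} req/s "
      f"({(1 - after / before):.0%} loss)")
```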

When we talk about change management, as defined in Chapter 10, Controlling Change in Production Environments, we are really discussing more than the lightweight change identification process for small startup companies. That is, we are referring to the fuller-featured process by which a company attempts to actively manage the changes that occur in its production environment. We have previously defined change management as consisting of the following components: change proposal, change approval, change scheduling, change implementation and logging, change validation, and change efficacy review. Performance testing and stress testing augment this change management process by providing a practice implementation and—most importantly—a validation of the change. You would never expect to make a change without verifying that it actually affected the system the way that you think it should, such as by fixing a bug or providing a new piece of functionality. As part of performance and stress testing, we validate the expected results in a controlled environment prior to production. This additional step helps ensure that when a change is made in production, it will also work as it did during testing under varying loads.

The most significant factor that we should consider when relating performance testing and stress testing to scalability is the management of risk. As outlined in Chapter 16, Determining Risk, risk management is one of the most important processes when it comes to ensuring your systems will scale. The precursor to risk management is risk analysis, which attempts to calculate the amount of risk associated with various actions or components. Performance testing and stress testing are two methods that can significantly decrease the risk associated with a particular service change. For example, if we were using a failure mode and effects analysis tool and identified a failure mode of a particular feature as being an increase in query time, the mitigation recommended could be to test this feature under actual load conditions, as with a performance test, to determine the actual behavior. This could also be done with extreme load conditions, as with a stress test, to observe behavior above normal conditions. Both of these tests would provide much more information with regard to the actual performance of the feature and, therefore, would lower the amount of risk. Clearly, these two testing processes are powerful tools when it comes to reducing, and thereby managing, the amount of risk within the release or the overall system.

From these three areas—headroom, change control, and risk management—we can see the inherent relationship between successful scalability of a system and the adoption of the performance and stress testing processes. As we cautioned previously in the discussion of stress testing, the creation of the test load is not easy, and if done poorly can lead to erroneous data. However, this challenge does not mean that it is not worth pursuing the understanding, implementation, and (ultimately) mastery of these processes.

Conclusion

In this chapter, we discussed in detail the performance testing and stress testing processes, both of which have important implications for the scalability of a system. For performance testing, we defined a seven-step process. The key to completing it successfully is to be methodical and scientific about the testing.

For the stress testing process, we defined an eight-step process. These were the basic steps we felt necessary to have a successful process. You can add other steps as necessary to ensure a proper fit of this process within your organization.

We concluded this chapter with a discussion of how performance testing and stress testing fit with scalability. Through their relationship to three factors (headroom, change control, and risk management), these processes directly support the scalability of a system.

Key Points

• Performance testing covers a broad range of engineering evaluations where the emphasis is on the final measurable performance characteristic.

• The goal of performance testing is to identify, document, and (where possible) eliminate bottlenecks in the system.

• Load testing is a process used in performance testing.

• Load testing is the process of putting load or user demand on a system so as to measure its response and stability.

• The purpose of load testing is to verify that the application can meet a desired performance objective, often one specified in a service level agreement.

• Load and performance testing are not substitutes for proper architecture.

• The seven steps of performance testing are as follows:

1. Establish the criteria expected from the application.

2. Establish the proper testing environment.

3. Define the right tests to perform.

4. Execute the tests.

5. Analyze the data.

6. Report to the engineers, if they are not organized into Agile teams.

7. Repeat as necessary.

• Stress testing is a process that seeks to determine an application’s stability when subjected to above-normal loads.

• Stress testing, as opposed to load testing, goes well beyond the normal traffic—often to the breaking point of the application—and observes the behaviors that occur.

• The eight steps of stress testing are as follows:

1. Identify the objectives of the test.

2. Choose the key services for testing.

3. Determine how much load is required.

4. Establish the proper test environment.

5. Identify what must be monitored.

6. Create the actual test load.

7. Execute the tests.

8. Analyze the data.

• Performance testing and stress testing impact scalability through the areas of headroom, change control, and risk management.
