5 Quality as a condiment

This chapter covers

  • The testing pyramid
  • Continuous deployment versus continuous delivery
  • Restoring confidence in your test suite
  • How flaky tests contribute to flaky systems
  • Feature flags
  • Why the operations team should own the testing infrastructure

Imagine you’re at one of those quick-serve restaurants where you move from station to station adding ingredients to your final dish. Each part of your meal is a separate ingredient that comes together at the end of the line. Once you get to the end of the line, you might ask for a condiment to enhance the dish. But there is no condiment called “quality.” If there were, everyone would order their dish with extra quality at the end. Just drown it in quality! This is exactly what many organizations are doing with their quality strategy.

Lots of companies have a dedicated quality assurance (QA) team that’s responsible for ensuring that you’re producing a quality product. But the quality of the product can’t exist in itself. Overall quality comprises the quality of all the individual ingredients. If you’re not checking the quality of the ingredients, the quality of the final product can be only so good. Quality is not a condiment. Not in restaurants, and not in software development.

To consistently deliver a quality product, you have to build quality into every component and ingredient individually, and you have to verify that quality separately before it gets completely absorbed into the final product. Waiting until the end of the development life cycle and tacking on testing can be a recipe for disaster. When you’re testing all the ingredients of the product, chances are, you’re doing a lot more testing generally. This means the way you structure your testing efforts becomes important. The quality of those test results becomes equally important. You need to be able to trust the outputs of those tests, or people will question their need. But often in the quality as a condiment antipattern, you’re keeping your eyes on the wrong sorts of metrics around testing.

If you’re doing any automated testing, you probably keep a watchful eye on the number of test cases. In fact, you probably thump your chest occasionally, overly proud of just how much automation you have. And despite having 1,500 automated test cases and a full QA team that does regression testing on every release, you still end up releasing your software with bugs. Not only that, but you sometimes release with embarrassing bugs that highlight the fact that some areas are just not tested at all. You’re not alone.

Are there ways to keep from releasing bugs into production? My opinion is no. As long as you’re writing software, you’ll also be writing bugs that don’t get caught and make their way into production. This is your lot in life. This is what you signed up for. I’ll give you a minute to digest that.

Now that you’ve accepted that, I’ll tell you it’s not the end of the world. You can work on making certain types of bugs much less likely to occur and develop a process whereby when you identify a bug, you know exactly what to do to test for that scenario and ensure that it doesn’t happen again. With enough practice, you’ll be able to identify classes of errors, perhaps eliminating an entire variety of errors in one swoop. But the process is one of iteration and commitment.

The meat of this chapter focuses on the testing process and how it applies to DevOps. It’s not a complete philosophy on how testing should be done, but I will dive into the underpinnings of testing strategies in widespread use today. Once I establish the basics around these ideas, I’ll talk about the feedback loop that you get from your test suite and how it builds (or destroys) confidence in what’s being deployed. And finally, I’ll discuss why testing is important to the deployment process and ensuring that your source code repository is always in a deployable state.

5.1 The testing pyramid

Many of today’s test suites are centered around the idea of the testing pyramid. The testing pyramid is a metaphor for the types of testing that you should be doing for your application. It gives an opinionated view on where the most energy and effort should be expended in the testing life cycle.

The testing pyramid places a heavy emphasis on unit testing being the foundation of your test suite and the largest component of it. Unit tests are written to test one specific section of code (known as the unit) and ensure that it functions and behaves as intended. Integration tests are the next level in the pyramid and focus on combining the various units of the system and testing them in groups. These tests exercise the interactions between the units. Finally, end-to-end tests exercise the system from the perspective of an end user.

I’ll dive into each of these areas in more detail later in the chapter. Throughout this chapter, I will assume you’re doing automated testing in some fashion. This doesn’t mean that some manual regression testing isn’t also happening, but for the items in this list, I’m assuming a computer is performing these tasks.

DEFINITION The testing pyramid is a metaphor to highlight how you should group your tests, along with guidelines for the number of tests that should exist relative to other types of test groups. Most of your tests should be unit tests, followed by integration tests, and lastly, end-to-end tests.

Figure 5.1 shows an example of the testing pyramid. You’ll notice three distinct layers: unit testing, integration testing, and end-to-end testing. These groupings are intended to focus the goals of the test suite, with the idea of quick feedback at the base of the pyramid, and tests becoming slower but more extensive as you move up the pyramid stack.

Figure 5.1 The testing pyramid

You may be wondering, what in the name of good software development does the testing pyramid have to do with DevOps? There are two primary reasons it’s important.

First, as the DevOps movement pushes for more and more automation, teams like operations might be getting exposed to these concepts for the first time. It’s important that they understand the methods and practices currently employed by development teams. Using the same guidelines and practices will help align the teams and create synergies. (Yes, I used the word “synergies.” It really does feel like the best word here. Don’t judge me.)

The second reason is that as DevOps pushes for more automation, the automation will need to interact with other processes within the software development life cycle. These processes need to be able to emit a signal, which becomes a quantitative metric for a qualitative property. That sentence sounds like I’m going to try to sell you something, so let me give an example.

An automated deployment process would want to know that something is OK to deploy. It needs to know that the code is of good quality. But what does “quality” mean to a computer? It doesn’t inherently know the difference between a flaky test that’s known to fail regularly and a test whose failure means the code shouldn’t be deployed at all. As a developer, you have to design those sorts of qualitative assumptions into your test suites so that automation can make this determination for you. The results of the test suite are as much for machine consumption (automation) as for human consumption. This makes it a topic at least worthy of discussion in the DevOps circle.

The testing pyramid serves as a set of guide rails for structuring your test suite to provide maximum speed and quality. If you follow the overall structure proposed by the testing pyramid, you’ll be able to provide accurate feedback quickly. If your tests are structured so that the majority of them are integration tests, the suite will inherently run slower than one built mostly on lightweight unit tests.

As I’ll discuss later in the chapter, as you move up the pyramid, not only does the speed of tests tend to decrease, but also the number of external factors that might influence the tests begin to increase. By using the testing pyramid as a reminder for your test structure, you can ensure that your test suite will be optimized to provide accurate feedback faster than if you placed too much energy higher up the pyramid.

You may be wondering why a DevOps book has a chapter focused on testing. The first point is to highlight that automated testing plays a crucial role in the DevOps approach. The more automation you begin to build into your system, the more important it is to have automated verification that those changes were successful. The second reason is really to speak to the operations staff members who may not be as in tune with the testing process as their developer counterparts. The testing life cycle is so central to DevOps transformations that operations staff members need to understand it just as much as development staff members do. Operations staff will need to work on more and more automation, specifically around infrastructure creation and management. Having a foundation for standard approaches to automated testing will serve them well in their DevOps journey.

Hopefully, I’ve defended why the next 20 or so pages belong in this book, so let’s get to it.

5.2 Testing structure

The testing pyramid provides a framework for structuring your testing suite. The next section goes into a bit more detail on each level of the pyramid’s hierarchy. The testing pyramid has expanded over the years to encapsulate other methods of testing, such as API tests layered on top of integration tests. I’ve shied away from those particular additions and instead consider them part of the integration layer of testing.

5.2.1 Unit tests

Unit testing sits at the bottom of the testing pyramid. It’s the foundation for all the tests that come after it. If properly designed, any unit test failure should result in tests higher in the stack failing pretty reliably.

So, what is unit testing? In unit testing, an individual unit or component of software is tested. This component might be a method, a class, or a function, depending on what exactly is being tested. The key is to ensure that all possible code paths in the unit being tested are exercised and their success evaluated.

DEFINITION Unit tests are tests written for individual units, or components, of software. These components are at a granular level in the code, such as a method, function, or class, instead of an entire module of the application.

Unit tests should be written by the developer writing the component or unit under test. The reason is that unit tests should be run regularly by the developer during the creation and subsequent testing of the unit under development! In some circumstances, a unit test will be missed by a developer, in which case it’s OK for someone else to write the unit test to ensure thorough testing. But if your organization currently has someone other than the developer writing unit tests, I’m sorry, but you’re doing it wrong. Leaving the task to another group means a few things, including the following:

  • The developer has the most context for the test case. Because of this, they’ll have the best frame of reference for creating the test case.

  • The developer will have an automated process for verifying their code. If they ever need to refactor code, they’ll be aided by the automated tests they’ve created around the unit they’re working on.

  • By ensuring that the developers are writing tests, teams can take advantage of different development practices, like test-driven development, as part of their development workflow.

  • If the responsibility is on the developer, it becomes easier to enforce unit testing requirements through processes like code reviews prior to merging the code into the source code repository.

Ensuring that developers are the primary authors of unit tests is a must. If for some reason that doesn’t work for your organization, you should look long and hard at the trade-offs. At the very least, you should make sure you have solid answers to how you’re dealing with the preceding bullet points.

What is test-driven development?

In the previous section, I mentioned taking advantage of practices like test-driven development (TDD). For those who don’t know, TDD is the practice by which a developer turns their code requirements into test cases.

For example, if the function being written must take three numeric values, apply a formula to them, and return the results, the developer could write these requirements as tests before writing any code that implements those requirements. Only after the test cases have been developed does the developer write the underlying implementations for that code. The developer knows when they have functioning code because the test cases should now be passing, as long as they’ve met the developer’s expectation.

This approach encourages developers to keep the units relatively small, which leads to an easier time debugging applications. Also, with the tests being written ahead of the implementation, you can reduce the chance of developers writing tests that concern themselves with implementation details, rather than inputs and outputs.
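As a rough sketch of what that might look like in Python with pytest, the tests get written first and fail until the implementation catches up. The module, function name, and expected behaviors here are hypothetical stand-ins for your own requirements.

# test_payment.py -- written before payment.monthly_payment exists
import pytest

from payment import monthly_payment   # hypothetical module under test

def test_zero_interest_spreads_principal_evenly():
    # 1,200 borrowed at 0% over 12 months should cost 100 per month
    assert monthly_payment(principal=1200, annual_rate=0.0, months=12) == pytest.approx(100.0)

def test_interest_increases_the_monthly_payment():
    assert monthly_payment(principal=1200, annual_rate=0.05, months=12) > 100.0

def test_zero_month_term_is_rejected():
    with pytest.raises(ValueError):
        monthly_payment(principal=1200, annual_rate=0.05, months=0)

Running these tests immediately (and watching them fail) confirms they’re actually exercising the requirement; the implementation is done when they pass.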

 

Unit test structure

In terms of structure, unit tests should be isolated as much as possible from interactions with the rest of the system. Anything that isn’t part of the unit under test should be mocked or stubbed out. The goal is to ensure that the tests run fast and are not marred by possible failures from other systems.

As an example, suppose you’re writing a function that does some calculations to generate a rate of return on an investment. Your function would normally interact with an API to get the original purchase price of the investment. You want that unit test to be able to give you consistent feedback on how your code is functioning. But, if inside your unit test you call the actual API, you’re introducing new potential failure points that have nothing to do with your code. What if the API is down? What if the API is malfunctioning? This adds noise to your unit test because the failures are outside your control. Faking the API call allows you to keep the test focused on your code. This isn’t to say that testing the interaction with the API isn’t important; it’s just a concern for a different test, higher in the testing pyramid (specifically, the integration tests portion of the pyramid).

Another advantage to faking these types of interactions and keeping things focused is speed. Unit tests should be designed with speed in mind in order to reduce the amount of time a developer needs to wait to get feedback on the change. The faster the feedback, the more likely a developer is to run these tests locally. Keeping the test focused on local interactions also makes it much easier to debug, since you know with certainty it’s the function at fault.
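Here’s a minimal sketch of what that faking might look like with Python’s unittest.mock, using the rate-of-return example. The module and function names (portfolio.fetch_purchase_price, portfolio.rate_of_return) are hypothetical stand-ins for your own code.

from unittest.mock import patch

import portfolio   # hypothetical module containing the code under test

def test_rate_of_return_calculation():
    # Replace the real pricing API call with a canned response so the test
    # exercises only our calculation logic, not the network or the API.
    with patch("portfolio.fetch_purchase_price", return_value=100.0):
        # Purchased at 100, currently worth 150 -> a 50% rate of return
        assert portfolio.rate_of_return("ACME", current_price=150.0) == 0.5

The test now gives the same answer whether the real API is up, down, or misbehaving, and it runs in milliseconds.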

Determining what to unit test

The hardest part about unit testing is knowing what to write test cases for. To many people’s surprise, testing everything can be counterproductive. If you test everything, refactoring is more difficult, not less. This often comes up when developers attempt to write test cases for internal implementations.

If you change the internal implementation of how a problem is solved, all of the corresponding tests against that internal implementation begin to fail. This makes refactoring cumbersome, because you’re refactoring not only the code, but also all the tests that exercised that code. This is where it becomes important to understand how your code paths are called.

Let’s use the example of the rate calculation. You have a function called mortgage_calc that people will invoke to get the total cost of a mortgage with interest. The resulting value should be a dollar amount. The black box of how you go about calculating that may not need to be tested individually if those methods are called only within the mortgage_calc function; the detailed implementations of those methods are exercised as a result of testing mortgage_calc. This encapsulation allows you to refactor a bit easier. Maybe one day you decide to change the internal implementation of mortgage_calc. You can ensure that mortgage_calc still exhibits the expected behavior without needing to refactor all the tests for the internal implementation of that function, freeing you up to make code changes with confidence.

The hard part is, I don’t have a one-size-fits-all method for solving this problem. The best I can offer is to identify public code paths and private code paths and focus your unit testing around the public code paths, allowing you to change the private paths without spending a ton of time refactoring internal tests. Focus on those code paths that are called from multiple places. Testing internal implementations isn’t completely discouraged, but use it sparingly.
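As a sketch of that guidance using the mortgage example (the module and signature are hypothetical), the tests pin down mortgage_calc’s observable behavior and say nothing about its internal helpers.

from mortgage import mortgage_calc   # hypothetical public entry point

def test_zero_interest_mortgage_costs_exactly_the_principal():
    # With no interest, the total cost is just the amount borrowed.
    assert mortgage_calc(principal=120_000, annual_rate=0.0, years=10) == 120_000

def test_interest_increases_total_cost():
    total = mortgage_calc(principal=120_000, annual_rate=0.05, years=10)
    assert total > 120_000

If you later swap out the amortization math inside mortgage_calc, these tests keep passing as long as the public behavior holds, which is exactly the freedom you want.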

Having unit tests at the bottom of the test pyramid reflects that they should also be the lion’s share of your test cases. They’re usually the fastest, most reliable, and most granular types of tests you can create. The source of failure in a unit test should be pretty obvious, because the unit under test is tightly scoped. Unit tests feature prominently later in the chapter, when I start to discuss the automated execution of tests. This automation is typically done by a continuous integration server such as Jenkins, CircleCI, Harness, and many others.

NOTE Continuous integration servers have become extremely popular as a way to automatically execute a code base’s test suite. Continuous integration servers generally have hooks into common code repositories to detect when a change has been merged and to act based on that change. ThoughtWorks (www.thoughtworks.com/continuous-integration) is often credited with inventing continuous integration and has some excellent resources on its website.

5.2.2 Integration tests

Integration tests are the next level in the testing pyramid. The goal is to begin to test the connection points between systems and the way the application handles responses to those systems. Whereas in a unit test, you may have mocked out or faked a database connection, in the integration test phase, you’ll connect to an actual database server, write data to the database, and then read that data to validate that your operation worked successfully.

Integration tests are important because seldom do two things just seamlessly work together. The two items being integrated might have been built with very different use cases in mind, but when they’re brought together, they fail in spectacular ways.

As an example, I once worked for a company that was building a new headquarters. The building and the parking garage attached to the building were being designed separately. When they finally started construction, they realized the parking garage floors didn’t line up with the building floors, so additional stairways needed to be added to integrate the two. This is an example in the physical world of a failed integration test!

Integration tests will take longer, considering the interactions that need to happen between components of the test. In addition, each of those components will likely need to go through a setup and teardown process to get them in the correct state for testing. Database records might need to be populated or local files downloaded for processing, just to name a couple of examples. Because of this, integration tests are generally more expensive to run, but they still play a vital part in the testing strategy.

Integration tests should never run their integration points against production instances. This might sound rudimentary, but I feel like it’s best to state it clearly. If you need to test against another service, such as a database, you should try to launch that service locally in the test environment. Even read-only testing against production environments could place undue stress on the production servers, creating problems for real users.

Tests that write data are problematic for production as well; when your test environment begins spamming production with bogus test data, you’ll probably need to have a pretty stern conversation with your bosses. Testing in production is definitely possible, but it takes a large, orchestrated effort by all members of the technical team.

If launching a local copy of the dependency isn’t feasible, you should consider standing up a test environment that’s running these services for the test suite to test against. The problems that you run into here, though, tend to be related to data consistency. If you’re isolating your dependencies per test case, you can get away with tests that follow this order:

  1. Read the number of rows in the database.

  2. Perform an operation on the database, such as an insert.

  3. Verify that the number of rows has increased.

This is a common pattern you see in test cases. Not only is it wrong, but it’s exceptionally wrong in the case of a shared staging environment. How does your application verify that no one else is writing to that table? What if two tests execute at the same time, and now instead of having N + 1 records, you have N + 2 records (where N is the number of records in the database prior to your operation)?

In a shared testing infrastructure scenario, these tests must be much more explicit. Counting rows is no longer sufficient. You need to verify that the exact row you created exists. It’s not complicated, just a little more involved. You’ll run into a lot of scenarios like this if you opt to use a shared environment for integration testing. But if you can’t have isolated dependencies per test run, this might be your next best option.
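A minimal sketch of the more explicit version follows, using sqlite3-style SQL and a hypothetical orders table; the db connection is assumed to come from a test fixture.

import sqlite3
import uuid

def test_insert_creates_this_specific_order(db: sqlite3.Connection):
    # Tag the record with a unique reference so a parallel test run can't
    # produce a false positive or a false failure.
    order_ref = str(uuid.uuid4())
    db.execute("INSERT INTO orders (reference, total) VALUES (?, ?)", (order_ref, 42.50))
    db.commit()

    # Assert on the exact row we created, not on the table's row count.
    row = db.execute("SELECT total FROM orders WHERE reference = ?", (order_ref,)).fetchone()
    assert row is not None
    assert row[0] == 42.50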

Contract testing

Another popular form of testing that’s been cropping up is contract testing. The idea behind contract testing is to create a way to detect when your underlying assumptions about a stubbed-out service have changed.

If you’re going to use a mock or a stub service for your testing purposes, you have to be sure that the service is accepting inputs and producing outputs in the way you expect. If the real service were to change its behavior, but your tests don’t reflect that, you’ll end up releasing code that doesn’t interact with the service correctly. Enter the contract test.

The contract tests are a separate set of tests that run against a service to ensure that the inputs and outputs of the endpoints are still behaving in the way you expect them to. Contracts tend to change infrequently, so running contract tests on a slower cadence isn’t uncommon. (Daily will probably suffice.)

By having contract tests, you can detect when another service has changed its expectations and update your stubs and mocks appropriately. If you want more information, see the excellent chapter on contract tests in Testing Java Microservices by Alex Soto Bueno, Andy Gumbrecht, and Jason Porter (Manning, 2018).
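To make the idea concrete, a contract test can be as simple as calling the real service in a test environment and asserting that the response still has the shape your stubs assume. This sketch uses the requests library with a hypothetical pricing service URL and field names.

import requests

def test_pricing_service_contract():
    # Hit the real (non-production) service, not a mock of it.
    resp = requests.get("https://pricing.staging.example.com/api/v1/price/ACME", timeout=5)
    assert resp.status_code == 200

    body = resp.json()
    # These assertions encode the assumptions baked into our stubs and mocks.
    assert isinstance(body["symbol"], str)
    assert isinstance(body["purchase_price"], (int, float))
    assert "currency" in body

If this test starts failing, the fix usually isn’t in your application code; it’s in your stubs, which no longer reflect reality.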

5.2.3 End-to-end tests

End-to-end tests exercise the system from the perspective of the end user. Sometimes referred to as UI tests, they launch or simulate a browser or client-side application and drive change through the same mechanism that an end user would. End-to-end tests usually verify results the same way, by ensuring that data is properly displayed, response times are reasonable, and no pesky UI errors show up. Often end-to-end tests will be launched via a variety of browsers and browser versions to make sure no regression errors are triggered by any browser and version combinations.

End-to-end tests are at the top of the pyramid. They’re the most complete test, but also the most time-consuming and often the flakiest. They should be the smallest set of tests in your testing portfolio. If you’re overly reliant on end-to-end tests, you’ll probably find yourself with a test suite that can be brittle and easily fail between runs. These failures might be outside your actual test cases. When a test fails because an element in the web page has changed names or locations, that’s one thing. But when your test fails because the underlying web driver that powers the test has failed, tracking down and debugging the issue can be a frustrating exercise.

Heavy focus on end-to-end tests

Another common cause of heavy reliance on end-to-end tests is that the team responsible for doing the majority of testing is not the development team. Many (but certainly not all) QA teams that are doing programmatic testing lean on UI tests because it’s the way they’re accustomed to interacting with the application and the underlying data.

A lot of detailed knowledge is required to understand where a value on the page comes from. It might be straight from a field in the database. It might be a calculated value that comes from a database field, with additional context being brought in from application logic. Or it might be computed on the fly. The point, though, is that someone who isn’t intimately familiar with the code might not be able to answer the question of where the data comes from. But if you write a UI test, it doesn’t always matter. You’re checking for a known value to exist on a known record.

In my experience, I’ve seen testing teams become heavily reliant on production data as part of their regression testing, partly because production will have specific use cases that the testing team can rely on as part of the regression test. The difficulty comes in, however, when production has incorrect data, meaning the bug has already escaped into the wild. Then the end-to-end test ensures that the data matches what’s in production instead of what the actual calculated value should be.

As fewer unit tests are employed and more end-to-end tests are used in their place, the situation becomes exacerbated. What can end up happening is that your test suite moves from testing for correctness to testing for conformity, which isn’t always the same thing. Instead of testing that 2 + 2 = 4, you test that 2 + 2 = whatever production says. The bad news is, if production says 5, the laws of mathematics get tossed out the window. The good news is that your UI test passes. This just stresses how important it is to cover the key functions with unit tests rather than relying on problems being caught higher up in the testing pyramid.

If you’ve done a lot of work with end-to-end tests, you probably have recognized that they’re often brittle. By brittle, I mean that they’re easily broken, often needing to be handled with kid gloves and an enormous amount of care and feeding. Small, inconsequential changes to the code or the layout of a web page can break an end-to-end test suite.

A lot of that is rooted in the way these sorts of tests are designed. To find a value that you’re testing against, the test engineer needs to know and understand the layout of the page. Through that, they can build techniques to parse a page and interpret the value that you’re looking for to test against. This is great, until one day the page layout changes. Or the page is slow to load. Or the driver engine responsible for parsing the page runs out of memory. Or a third-party web plugin for ads doesn’t load. The list of things that could go wrong is long and distinguished.

The brittleness of testing, combined with the amount of time it takes to execute end-to-end tests, forces these sorts of tests to live at the very top of the hierarchy, and as a result, make up the smallest portion of your testing portfolio. But because of their extensive nature, you end up actually exercising quite a few code paths with a single end-to-end test because the scope of what’s under test is so much larger. Instead of testing a single unit or a single integration, you’re now testing an entire business concept. That business concept will probably exercise multiple smaller things on its path to success. As an example, say your end-to-end test was going to test the order-generation process. Your end-to-end test might look like the following:

  1. Log in to the website.

  2. Search the product catalog for unicorn dolls.

  3. Add a unicorn doll to your shopping cart.

  4. Execute the checkout process (payment).

  5. Verify that a confirmation email/receipt was sent.

These five steps are pretty basic from a user-interaction perspective, but from a system perspective, you’re testing a lot of functionality in one test. The following are just a handful of things that are being tested at a more granular level with this:

  • Database connectivity

  • Search functionality

  • Product catalog

  • Shopping cart functionality

  • Payment processing

  • Email notification

  • UI layout

  • Authentication functionality

And this list could arguably be a lot longer if you really wanted to dig into it. But suffice it to say that an end-to-end test exercises quite a bit of functionality in a single go. But it’s also longer and more prone to random failures that have nothing to do with the actual system under test. Again, real test failures are valuable feedback, and you need to understand if your system isn’t providing correct responses. But if your tests are failing because of hardware limitations, web driver crashes, or other things around the scaffolding of your test suite, then you must consider the value proposition of the test. You might end up playing whack-a-mole as you solve one problem in the testing infrastructure only to create or discover another.
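For reference, the order-generation test above might be sketched with Selenium roughly like the following. The URL, element IDs, page structure, and the choice to check a confirmation page rather than an actual email are all hypothetical simplifications.

from selenium import webdriver
from selenium.webdriver.common.by import By

def test_order_checkout_flow():
    driver = webdriver.Chrome()
    try:
        # 1. Log in to the website.
        driver.get("https://shop.staging.example.com/login")
        driver.find_element(By.ID, "username").send_keys("test-user")
        driver.find_element(By.ID, "password").send_keys("test-password")
        driver.find_element(By.ID, "login-button").click()

        # 2-3. Search the catalog and add a unicorn doll to the cart.
        driver.find_element(By.ID, "search-box").send_keys("unicorn doll")
        driver.find_element(By.ID, "search-button").click()
        driver.find_element(By.CSS_SELECTOR, ".product .add-to-cart").click()

        # 4. Execute the checkout process (payment).
        driver.get("https://shop.staging.example.com/checkout")
        driver.find_element(By.ID, "place-order").click()

        # 5. Verify the order confirmation (standing in for the email check).
        confirmation = driver.find_element(By.ID, "order-confirmation").text
        assert "thank you for your order" in confirmation.lower()
    finally:
        driver.quit()

Every line of this test depends on the page layout, the web driver, and the environment it runs in, which is exactly why these tests sit at the top of the pyramid and why you keep so few of them.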

I try to limit the number of end-to-end tests performed to core business functionality. What tasks must absolutely work in the application? In the case of my example, a simple e-commerce site, they might be as follows:

  • Order processing

  • Product catalog searching

  • Authentication

  • Shopping cart functionality

The example tested all of those functions with a single end-to-end test. When building end-to-end tests, it’s important to understand the key business drivers and to make sure those have good test coverage. But if you accumulate a lot of end-to-end tests, chances are you’ll increase your failure rate due to unrelated problems in your test suite. This will create a lack of confidence in your test suite, which generates a lot of other issues on the team. The goal is to make sure each of your end-to-end tests adds more value than the additional work and troubleshooting it creates. Your list of things to test may grow and shrink over time.

5.3 Confidence in your test suite

Imagine you’re in the cockpit of an airplane. The pilot is showing you his preflight checklist. He’s running through the routine when something in the routine fails. He turns to you and says, “Don’t worry about it; this thing just fails sometimes.” He then reruns the steps of the preflight checklist, and magically everything works this time. “See, I told you.” How good do you feel about that flight? Probably not very good.

Confidence in your test suite is an asset. When a test suite becomes unpredictable, its value as a confidence-building tool diminishes. And if a test suite doesn’t build confidence, what’s the point of it? Many organizations lose sight of what a test suite is supposed to provide. Instead, they cargo-cult the idea of automated testing based on nothing more than that’s what you’re supposed to do. If confidence in the test suite begins to wane, that’s something you need to address sooner rather than later.

Gauging confidence in the test suite is more of an art than a science. An easy way to gauge it is the way people react to the test suite failing. A failed test should trigger the engineer to begin investigating their own code, looking at what changed and how that change might have impacted the test suite. But when confidence in a test suite is low, the first thing an engineer will do is rerun the test suite. This is often accompanied with a lack of belief that the change they made could in any way affect the test that is failing.

The next thing that happens when confidence is low is that the build environment begins to be questioned: “Something must have changed with these build servers.” That kicks off a series of conversations with whoever supports the build servers, burning up their valuable time. My point isn’t that these things aren’t possible culprits in a failed test scenario. In many situations, they are. But in an environment with low test confidence, these are the first to be blamed, as opposed to the most likely thing that changed in the environment--the code!

You can gauge whether confidence in your test suite is low in various ways. The simplest way is to just ask the engineers. How confident are they in their changes if the test suite passes or fails? They will be able to give you a keen perspective on the quality of the test suite because they’re interacting with it daily. But just because confidence in your test suite is low doesn’t mean it has to stay there.

5.3.1 Restoring confidence in your test suite

Restoring confidence in your test suite isn’t a monumental task. It requires identifying the source of bad tests, correcting them, and increasing the speed at which problems are known. You can do that largely by following the testing pyramid to start.

Test suites should fail immediately after they encounter a failure

When you’re running tests, it can be tempting to have your test suite run from beginning to end, reporting all the failures at the end of the run. The problem is, you may end up spending a lot of compute time only to find out the test failed two minutes into the run. What value is gained by the subsequent failures? How many of the integration or end-to-end tests failed because a basic failure in the unit tests has cascaded up the chain?

In addition, the reliability of your test suite generally worsens as you move up the pyramid. If your unit tests are failing, you can be pretty confident that the cause is something wrong with the code. Why continue with the rest of the execution? The first thing I recommend you do to help bolster confidence in the test suite is to divide the suite’s execution into phases. Group your tests into levels of confidence, similar to the testing pyramid. You might have several subgroups within the integration tests layer, but if any unit tests fail, continuing on in the test suite carries little value.

The objective is to give developers fast feedback on the acceptability of their code. If you run a set of lower tests that fail, but then continue on with other tests, the feedback gets confusing. Every test failure becomes a murder mystery. As an example, say you have an end-to-end test that failed because the login routine didn’t work. You begin your investigation at the top of the failure event (the login page error) and work your way backward. The login page test failed because the renderer action failed, because some required data didn’t load, because the database connection failed, because the method for decrypting the password failed and gave a bogus response.

All of that investigation takes time. If you had just failed right at the decryption password method’s unit test, you’d have a much clearer understanding of what happened and where to start your investigation. The test suite should try to make it clear and obvious where things failed.
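One lightweight way to express those phases, assuming pytest, is to group tests with markers and run the cheapest group first with a fail-fast flag. The marker names and commands below are one possible convention, not a standard.

import pytest

def apply_discount(price, rate):
    # The unit under test; normally this would be imported from application code.
    return price * (1 - rate)

def test_discount_calculation():           # unit test: cheap, runs in the first phase
    assert apply_discount(100, 0.1) == 90

@pytest.mark.integration                   # register custom markers in your pytest config
def test_discount_persists_to_database():
    ...

@pytest.mark.end_to_end
def test_discount_shows_on_checkout_page():
    ...

# Each pipeline phase is a separate command; -x aborts on the first failure,
# so the slower phases never run against code that's already known to be broken:
#   pytest -x -m "not integration and not end_to_end"
#   pytest -x -m integration
#   pytest -x -m end_to_end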

The more time developers spend researching test failures, the lower the perceived usefulness of the test suite becomes. I put emphasis on the word “perceived” because confidence and perception are unfortunately intertwined in the human mind. If people waste time troubleshooting the test suite to figure out what went wrong, the idea that the test suite has little value will spread. Making it clear and obvious when something fails will go a long way in helping to combat that perception.

It’s also extremely important to have good test case hygiene. If a bug is discovered in production, a test to detect that issue going forward should be created. Letting the same bug slip through the testing process over and over again diminishes not only confidence in the test suite, but also your end users’ confidence in the product.

Don’t tolerate flaky tests

The next thing to do is to take stock of which tests are flaky. (Hint: it’s probably your end-to-end tests.) Keep a list of those tests and turn them into work items for your team to address. Reserve time in your work cycles to focus on improving the test suite. The improvements can be anything from changing your approach to the test, to finding out the reason for the test’s unreliability. Maybe a more efficient way of finding elements in the page would result in less memory consumption.

Understanding why a test is failing is an important part of the upkeep of your test suite. Don’t lose sight of that, even if you’re improving on just a single flaky test per week. This is valuable work that will pay dividends. But if, after all that work, you find the same test to be failing repeatedly for reasons that exist outside the actual test case, I’m going to suggest something drastic. Delete it. (Or archive it if you don’t like to live dangerously.)

If you can’t trust it, what value is it really bringing? The amount of time you spend rerunning the test is probably creating a larger cost on the team than the savings of having that test automated. Again, if a test isn’t raising confidence, it’s not really doing its job. Flaky tests usually stem from a handful of things, including the following:

  • The test case isn’t well understood. The expected result isn’t taking into account certain cases.

  • Data collisions are happening. Previous test data is creating a conflict with the expected results.

  • With end-to-end tests, variable loading times create “time-outs” when waiting for certain UI components to show in the browser.

There are certainly plenty of other reasons that your tests might randomly fail, but a large batch of them will probably fall under one of these issues or a variation of it. Think about any shared components with other tests that could create a data issue for your test environment. Here are a few questions you can ask yourself that might help prevent these collisions:

  • How can you isolate those tests?

  • Do they need to be run in parallel?

  • How do tests clean up data after they’ve run?

  • Do tests assume an empty database before they execute, or are they responsible for cleaning the database themselves? (I feel tests should ensure that the environment is configured as expected.)

Isolate your test suite

Test suite isolation becomes a real problem when your test suite relies heavily on integration and end-to-end tests. Isolating tests is pretty easy for unit tests, because the component under test uses in-memory integrations or entirely mocked-out integrations. But integration tests can be a little trickier, and the problem is commonly at the database integration layer. The easiest thing to do is to run separate database instances for every test. This can be a bit of a resource hog, though, depending on your database system of choice. You may not have the luxury of being able to separate everything completely.

If you can’t run separate instances completely, you may want to try running multiple databases on the same instance. Your test suite could create a randomly named database at the start of the test suite, populate the data for the necessary test cases, and then tear down the database after those test cases have completed. For testing purposes, you really don’t need a clean, proper name for the database; as long as it’s a legal name from the database engine’s perspective, you should be fine. The catch with this scheme, however, is ensuring that your test suites are cleaning up after themselves when they complete.
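Here’s a sketch of that pattern as a pytest fixture, assuming PostgreSQL and the psycopg2 driver. The connection details are placeholders, and a real version would likely hand back a connection to the new database rather than just its name.

import uuid
import psycopg2
import pytest

@pytest.fixture
def test_database():
    admin = psycopg2.connect(host="localhost", user="postgres", password="postgres")
    admin.autocommit = True   # CREATE/DROP DATABASE can't run inside a transaction
    db_name = f"test_{uuid.uuid4().hex}"   # legal, collision-proof, intentionally ugly
    with admin.cursor() as cur:
        cur.execute(f"CREATE DATABASE {db_name}")
    try:
        yield db_name   # the test connects to this database and loads its own data
    finally:
        with admin.cursor() as cur:
            cur.execute(f"DROP DATABASE {db_name}")   # clean up even when the test fails
        admin.close()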

You also need to understand what you do with the database on a failed test. The database might be an important part of the troubleshooting process, so keeping it around can add a ton of value. But if you remove the automated destruction of the database, you’ll absolutely need to contend with humans forgetting to delete the test database after their investigation has completed. You’ll need to evaluate the best course for your team based on your organization. Every approach has pros and cons, so you’ll need to figure out which set of pros and cons works best for your organization.

There are options for further isolation through dynamic environment creation if your organization is capable of implementing and managing automated environment creation. Spinning up new virtual machines for test cases is an attractive option, but even with automation, the time spent on bootstrapping can be too long to provide the quick feedback engineers crave. Spinning up enough virtual machines at the start of the day to ensure that every test can run in its own isolated machine can save on the spin-up costs, but probably will end up creating actual monetary costs to ensure you have resource capacity.

It also creates a scaling issue as you create a much more linear relationship between your test cases and your supporting infrastructure. Running your test cases in separate Docker containers is a way to reduce this cost, since containers are not only lightweight in terms of resources, but also are very fast to start, allowing you to scale quickly as resource demands rise. Configuring testing infrastructure is beyond the scope of this book, but it’s important to highlight those two paths as potential options for further test isolation.

Limit the number of end-to-end tests

End-to-end tests in my experience are the flakiest of the bunch. This is mainly due to the very nature of end-to-end tests.

The tests are tightly coupled with the user interface. Small changes in the user interface wreak havoc on automated tests that depend on a specific layout for the website. The end-to-end test typically must have some understanding of the way the UI is structured, and that knowledge is typically embedded inside the test case in some shape or form. Add to that the pains of executing tests on shared hardware, and you can run into performance issues that ultimately skew the results of the test suite.

I wish I had an easy answer to solve the woes of end-to-end testing. They’re typically unreliable while at the same time being incredibly necessary. The best advice I can give follows:

  • Limit the number of end-to-end tests. They should be limited to critical path actions for your application, such as sign-in and checkout processes for an e-commerce site.

  • Limit your end-to-end tests to testing functionality and don’t consider performance. Performance will vary greatly in a test environment based on any number of factors outside of just the application. Performance testing should be handled and approached separately.

  • Isolate end-to-end tests as much as possible. The performance impact of two end-to-end tests running on the same machine can cause what seems to be random issues. The cost of additional hardware is much less than the human cost of troubleshooting these issues (assuming you’ve done the preceding recommendations first).

I’ve spent a lot of time talking about the test suite and restoring confidence in it. This may seem off the DevOps path just a tad, but it’s important that you have a solid test strategy in place because DevOps’ strength comes from the ability to leverage automation. But automation is driven by triggers and signals throughout your environment. Automated testing is one important signal about the quality of code. Having confidence in that signal is a must. In the next section, you’ll take those signals coming from your test suite and apply them to the deployment pipeline.

5.3.2 Avoiding vanity metrics

When you start to talk about confidence in test suites, people always will reach for metrics to convey quality. This is a good reaction to have, but you need to be wary of the types of metrics you use. Specifically, you’ll want to steer clear of vanity metrics.

Vanity metrics are datapoints that you measure in your system, but they’re easily manipulated and don’t provide a clear picture of the information the user wants. For example, “Number of registered users” is a common vanity metric. It’s great that you have three million registered users, but if only five of them log in on a regular basis, the metric can be woefully misleading.

Vanity metrics can be prevalent in test suites. The metric commonly bandied about is test coverage. Test coverage is a measurement, usually a percentage, of the number of code paths that are being exercised inside a test suite. This number is usually easily accessed through tooling and can be a rallying call across developers, QA, and product teams. But in reality, test coverage is an example of a vanity metric. Coverage doesn’t necessarily speak to the quality of the test or what specifically is being tested.

If I start a car’s engine, I’m exercising a great number of components just by virtue of turning the key. But that doesn’t necessarily mean that all of the components in the car are operating to spec just because they didn’t immediately blow up when the car started. I call this out now specifically so that when you’re designing tests, you’re conscious of this vanity metric concept and don’t fall victim to its allure.

Test coverage is great, but not having 100% test coverage doesn’t mean your testing suite isn’t extremely robust. And having 100% test coverage doesn’t mean your testing suite is doing anything worthwhile. You must look beyond the numbers and focus on the quality of those tests.

5.4 Continuous deployment vs. continuous delivery

Most of you don’t need continuous deployment. (Somewhere a thought leader is cringing at the idea that I would mock the holy grail of DevOps, continuous deployment.) For those who don’t know, continuous deployment is the idea that every commit to your mainline branch (primary/trunk) will trigger a deployment process into production. This means that the latest changes are always rolled out to production. This process is completely automated and hands-off.

In contrast, continuous delivery is a practice aimed to ensure that the application is always in a deployable state. This means that you no longer have a broken primary or trunk branch during a release cycle, regardless of how often you do a release.

The two items often get conflated. The main difference is that with continuous deployment, every change that gets committed to the main branch is released through an automated process, with no manual intervention. With continuous delivery, the focus is more on ensuring the capability to deploy code when needed without having to wait for a large release train with a bunch of other changes. Every commit may not be released automatically, but it could be released at any time if needed.

For a simple example, imagine a product manager wants to release a small change to fix a problem that a single customer is having. In an environment without continuous delivery, the manager might need to wait until the next release of the system, which could be weeks away. In a continuous delivery environment, the deployment pipelines, infrastructure, and development processes are structured so that a release of an individual piece of the system is possible at any point. The product manager could have development make the change and release that bug fix on a separate timetable than that of the other development teams.

With those definitions laid out, it should be noted that continuous deployment can’t really happen without continuous delivery. Continuous delivery is a stop on the journey to continuous deployment. Despite the hype around continuous deployment, I don’t feel it’s a great goal for all companies. The act of continuous deployment is supposed to force certain behaviors onto the organization. You need rock-solid automated testing if you’re going to deploy every commit to production. You need a solid code review process to ensure that multiple sets of eyes have looked at code from whatever review criteria your teams have set up. Continuous deployment forces this behavior because, without it, disaster might ensue. I believe in all these things.

But I also believe that many teams and organizations have a lot of hurdles to overcome before they can realistically begin talking about continuous deployment. When tech teams take weeks to roll out patches for their systems, moving to continuous deployment seems like a pretty big leap. The act of continuous delivery is a better goal for most organizations. The goal of deploying every change as it gets committed is a noble one. But so many organizations are so far away from being able to safely and reliably do that.

Many of their internal processes are structured around the idea of a deliverable unit called a release. But when you’re doing continuous deployment, even the concept of what a release is has to change within the company. How many internal processes are based on software versions? Sprints, project plans, help documentation, training--all these things are often centered around this ritualistic concept of a release. Considering how far the journey is, targeting the ability to release more frequently will give tremendous benefits without having to create upheaval within the organization regarding how they view, manage, and market their software platform.

But whether you decide on continuous deployment or continuous delivery, the question might come up, how do you know that your code is deployable if you’re not deploying it on a regular basis? That brings us back to the testing suite that I started this chapter with. Your testing suite will be one of the signals your teams will use to evaluate whether a change being made breaks the ability to deploy your application. Continuous delivery focuses on this idea of a series of structured, automated steps that application code goes through to prove its viability for deployment. These steps are called deployment pipelines.

DEFINITION Deployment pipelines are a series of structured, automated steps that exercise a code change, such as unit tests and packaging of the application code. Pipelines are an automated way to prove the code’s viability for deployment.

The result of the pipeline is usually some sort of build artifact that contains all of the code necessary to run the application. This includes any third-party libraries, internal libraries, and the actual code itself. The type of artifact that gets produced depends heavily on the system that you’re building. In a Java ecosystem, it might be a JAR or a WAR file. In a Python ecosystem, it might be a source distribution or a wheel. The output file type will depend heavily on how that artifact is going to be consumed later in the process.

DEFINITION An artifact is a deployable version of your application or a component of the application. It’s the final output of your build process. The type of artifact that gets produced will vary depending on your deployment strategy and the language your applications are written in.

With the build artifact being the last portion of your pipeline, the total set of pipeline steps might look like the following:

  1. Check out the code.

  2. Run static analysis such as a linter or syntax checker on the code.

  3. Run unit tests.

  4. Run integration tests.

  5. Run end-to-end tests.

  6. Package the software into a deployable artifact (WAR file, RPM, or a zip file, for example).

The steps are performed on every change that needs to be merged into the primary/trunk branch. This pipeline serves as a signal of the quality and deployment readiness of the change and results in something that can be moved to production (the deployment artifact).

This is just an example list. Any number of steps can be added to your pipeline that make sense for your organization. Security scans are a logical addition to this list. Infrastructure testing and configuration management might be another set of items that you would want to add. The takeaway is that your requirements for releasing software can and should be codified as much as possible and placed into a pipeline of some sort.

5.5 Feature flags

Whether you’re using continuous deployment or continuous delivery, you can protect users from receiving new features and changes with every deployment. It can be unsettling as a user when a company is deploying new functionality multiple times per day. This is a distinct possibility when you move to continuous delivery or deployment. But you can employ several techniques to separate feature delivery from feature deployment--most notably feature flagging.

Feature flagging hides code functionality behind conditional logic tied to some sort of flag or semaphore. Code can be deployed to an environment without the new code path being available to all users, which separates feature delivery from feature deployment. The release of the feature from a marketing or product perspective becomes separate and distinct from the technical release.

DEFINITION Feature flags are a type of conditional logic that allows you to tie code paths to a flag or a semaphore, which activates the code path. If not activated, the code path is dormant. Feature flags allow you to separate a code path’s deployment from the code path’s release to the user audience.

The value of a feature flag is often stored in the database as a Boolean value: true meaning the flag is enabled, and false meaning disabled. In more advanced implementations, the feature flag can be enabled for specific customers or users but not for others.

To give a basic example, say you have a new algorithm to create recommendations for users on the order page. The algorithms for the recommendation engine are encapsulated into two classes. With the feature flag, you can separate when you’ll use the new code path from when it gets deployed. In the following listing, you can see the skeleton of such an implementation.

Listing 5.1 Feature flagging a recommendation engine

class RecommendationObject(object):
    # Defines the feature flag. In a real system, this value
    # would come from a database rather than being hardcoded.
    use_beta_algorithm = True

    def run(self, customer):
        # Checks the feature flag value and changes execution accordingly
        if self.use_beta_algorithm:
            return BetaAlgorithm().run(customer)
        else:
            return AlphaAlgorithm().run(customer)

class AlphaAlgorithm(object):
    def run(self, customer):
        pass  # current implementation details

class BetaAlgorithm(object):
    def run(self, customer):
        pass  # new implementation details

This example uses a variable to determine whether the beta version or the alpha version of the algorithm should run. In a real-world scenario, you would probably base this on a database query. You could have a table structured with a feature toggle name and a Boolean value of true or false. This allows you to change the behavior of the application without needing to deploy new code. Simply updating the database value would have your running applications augment their behavior to start serving the new or old algorithm.
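A sketch of that lookup, including optional per-customer targeting, might look like the following. The table and column names are hypothetical, the placeholders use sqlite3-style ?, and a real implementation would cache the result rather than querying on every call.

def feature_enabled(db, flag_name, customer_id=None):
    # Global switch for the flag.
    row = db.execute(
        "SELECT enabled FROM feature_flags WHERE name = ?", (flag_name,)
    ).fetchone()
    if row is not None and row[0]:
        return True

    # Not globally enabled; check for a customer-specific override.
    if customer_id is not None:
        override = db.execute(
            "SELECT 1 FROM feature_flag_overrides WHERE name = ? AND customer_id = ?",
            (flag_name, customer_id),
        ).fetchone()
        return override is not None
    return False

# Inside RecommendationObject.run, the hardcoded class attribute would become:
#   if feature_enabled(db, "beta_recommendation_algorithm", customer.id):
#       return BetaAlgorithm().run(customer)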

This also has an added benefit: if there’s a problem with the new algorithm, rolling back that change doesn’t require a deployment, but simply another database update. Over time, once you feel comfortable with the change, you can make it permanent by removing the feature-flagging logic and just making that the default behavior of the application. This prevents code from being littered with tons of conditional logic for feature flags that have surpassed their experimental or testing phase.

In some cases, you might always want feature toggles to exist in order to be able to recover gracefully. Imagine you have a feature flag around code that interacts with a third-party service. If the third-party service is down or having problems, enabling/disabling a feature flag could force your code to not interact with the third party and instead return a generic response. This allows your application to continue functioning but in a slightly degraded mode of performance--far better than being completely down.

Feature toggle libraries exist for just about every programming language imaginable. These libraries give you a more structured way to implement toggles. Many of them also offer added benefits, like providing caching, so that every request doesn’t add an additional database query to your system. SaaS solutions are also available for feature flag implementations. A few such examples are LaunchDarkly (https://launchdarkly.com), Split (www.split.io), and Optimizely (www.optimizely.com).

5.6 Executing pipelines

The field of pipeline execution is littered with options and poorly used terms. The marketing spin on these tools has led to the misuse of terms like continuous integration. This is a pet peeve of mine, and I’ll spare you the gory details. Because of this, I’m going to use the term pipeline executors, which are a class of tools that allow for the conditional execution of various steps within a workflow.

Figure 5.2 shows an example of a pipeline. Popular tools that fit in this category include Jenkins, CircleCI, Travis CI, and Azure Pipelines.

DEFINITION Pipeline executors are a class of tools that allow for conditional execution of various steps within a workflow. They typically have various built-in integrations with code repositories, artifact repositories, and other software build tools.

The dirty secret about pipeline executors is that most of them are the same. They pretty much have feature parity in terms of integrations, hooks, and so on. At the end of the day, though, these pipelines are still running code that your organization wrote. If your test scripts are flaky or inconsistent, it doesn’t really matter what build tool you use; it won’t make those test scripts suddenly failure-free.

Figure 5.2 The flow of an example build pipeline

If you’re going to set up a large committee to discuss and evaluate 10 options, I suggest you scuttle the committee and put that energy into making your build scripts more reliable. Assuming you don’t already have a pipeline executor, you should look at your tools through the lens of least organizational resistance. A pipeline executor isn’t going to make or break your organization. Having any of these tools is infinitely better than not having them, so if you can get Azure Pipelines into your organization with relative ease, then it’s the best choice for you. If Jenkins is an easier sell to your teams, that’s the best tool. Don’t waste a lot of time evaluating many tools unless you have clear, specific, and unique requirements.

Once you’ve chosen a tool, it’s important to break your pipeline into separate, distinct phases. This gives your developers rapid feedback on their pipeline execution and helps pinpoint where the process has broken down. Take, for example, a script that does the following:

  1. Runs unit tests

  2. Runs integration tests

  3. Creates a testing database

  4. Loads the database with data

  5. Runs end-to-end tests

In a single big-bang script, those steps are all muddled together in a log file that is probably too long, is impossible to read, and forces you to search your way through it, looking for the word “Error.” But if this same series of steps were defined as separate phases in the pipeline, you could quickly realize that the error was in loading the test database in step 4, just before the end-to-end tests were going to start.
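
Most pipeline executors let you declare these phases as separate stages in their own configuration syntax. To stay tool-neutral, here’s the same idea sketched in plain Python, with hypothetical commands standing in for your real test scripts; each phase runs on its own, and a failure reports exactly which phase broke.

import subprocess
import sys

# Hypothetical commands; substitute your organization's real scripts.
PHASES = [
    ("unit tests", ["pytest", "tests/unit"]),
    ("integration tests", ["pytest", "tests/integration"]),
    ("create test database", ["python", "scripts/create_test_db.py"]),
    ("load test data", ["python", "scripts/load_test_data.py"]),
    ("end-to-end tests", ["pytest", "tests/e2e"]),
]

for name, command in PHASES:
    print(f"=== Phase: {name} ===")
    result = subprocess.run(command)
    if result.returncode != 0:
        # Fail fast and name the broken phase instead of burying the
        # error in one giant log file.
        print(f"Pipeline failed during: {name}")
        sys.exit(result.returncode)

print("All phases passed")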

This is an easy thing to think about when you’re talking about test pipelines, because you often think in those terms anyway. But you should remember to make sure that all your pipelines exhibit this behavior; regardless of whether it’s a code build, an environment cleanup script, or a repository rebuild, the structure of pipelines remains the same.

Last, regarding pipelines, you should consider what your signal of success looks like for each build. The pipeline, especially in testing, should ensure or confirm that something meets a certain set of criteria or has been executed successfully. Sticking with an example of building code, you want your pipeline to be able to communicate the suitability of a build. A common way to do this is by producing a build artifact as the final step of a pipeline.

A build artifact is your software packaged into a single file, usually with a preloaded set of installation instructions inside the artifact or binary. With the build artifact produced as the last step of the build process, its mere existence signifies that the code has gone through all the steps necessary to become an artifact. That’s your signal that the code is suitable for deployment.
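
As a rough sketch of the artifact-as-final-step idea, the snippet below packages a build directory into a single archive named after the current commit. The directory layout and naming convention are assumptions, not a standard.

import shutil
import subprocess

# Identify the commit this artifact represents.
commit = subprocess.run(
    ["git", "rev-parse", "--short", "HEAD"], capture_output=True, text=True
).stdout.strip()

# Package the built application into one file. Its existence signals that
# every earlier pipeline step completed successfully.
artifact = shutil.make_archive(f"dist/myapp-{commit}", "gztar", root_dir="build")
print(f"Produced artifact: {artifact}")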

If your build doesn’t produce an artifact, there has to be a method for tying your version of code to a successful build. The easiest way is to ensure that your code merge process requires a successful build before merging. There are plenty of integrations with pipeline executors that can enforce this behavior. In this way, you can ensure that your primary/trunk branch can be considered a deployable build.

The preceding options are probably the strongest, but there are other ways to create signals if necessary. One option is a key/value pair that maps a commit hash to a successful build number. This makes it easy for other processes to integrate with this behavior, especially if manual processes must be performed. Another alternative is to integrate directly with the build server as part of the code validation step. Your deployment code (or anything that needs to validate that a build is OK for deployment) can query the pipeline executor to verify that a successful build was performed for a specific commit. This creates a tighter coupling to your pipeline executor, making it harder to migrate away from it in the future, but it might be worth the payoff if none of the other methods for build signals works for you.
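
As a rough illustration of the key/value approach, the sketch below assumes a reachable Redis instance (via the redis-py package) and a made-up key naming scheme: the pipeline writes the mapping as its final step, and the deployment tooling reads it back.

import redis   # assumes the redis-py package and a Redis instance at localhost

r = redis.Redis(host="localhost", port=6379)

def record_successful_build(commit_hash, build_number):
    # Written by the pipeline as its last step, after everything has passed.
    r.set(f"builds:{commit_hash}", build_number)

def is_deployable(commit_hash):
    # Read by deployment tooling before rolling out this commit.
    return r.get(f"builds:{commit_hash}") is not None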

Whatever method you choose, you need to be certain that you have a way to signal quality of code to other downstream processes, most notably deployment. It all starts with a series of structured, quality test cases that can be used as an inference for code quality. Moving those tests into a pipeline executor tool allows you to further automate those processes and potentially provide points of integration.

Continuous integration vs. automated testing

With the explosion in automated testing tools and continuous integration servers like Jenkins, the term continuous integration has been muddied. It’s important to understand how true continuous integration impacts the development process.

Before continuous integration became common practice, engineers would create long-lived branches to develop their specific features on. These long-lived branches plagued the software development world. When an engineer had a long-lived branch, the responsibility was on them to keep it up-to-date by regularly rebasing it onto (or merging in) the primary/trunk branch. If developers didn’t do that, massive merge conflicts would result that were cumbersome to resolve.

Not only that, but feature development and changes kept progressing in the primary branch while the feature branch being worked on stayed static. It wasn’t uncommon to find that an approach being taken had been undone or broken by a change that had already landed in the primary branch. Enter continuous integration.

In the continuous integration approach, engineers were required to merge their changes into the primary/trunk on a regular basis, the goal being at least daily. This not only eliminated the long-lived branch, but also forced engineers to think about how their code was written so that it could safely be merged into the primary/trunk branch.

Enter the continuous integration server (Jenkins, CircleCI, Bamboo, and others). These applications would run a series of automated tests that were required to pass before the code could be merged to the primary/trunk branch. This ensured that the changes being introduced were safe and that potentially unfinished code wouldn’t alter the behavior of the system.

Over time, however, it became clear that people were conflating the process of continuous integration with the practice of merely running automated tests in a continuous integration server. The distinction between the two is all but lost except in the most nuanced conversations. If you’d like to learn more about what continuous integration is really about, I point you to Continuous Integration by Paul M. Duvall (Addison-Wesley Professional, 2007) and Continuous Delivery by Jez Humble and David Farley (Addison-Wesley Professional, 2010).

5.7 Managing the testing infrastructure

The underpinnings of the testing environment can often be overlooked and orphaned in organizations. When you break down the testing environment, you’re really looking at a few key components that need to be managed:

  • The continuous integration server

  • Storage of the produced artifacts from the pipeline

  • Testing servers

  • Source code being tested

  • Test suites being executed

  • All the libraries and dependencies that are needed by the test suite

These areas tend to zigzag across lines of responsibility. Quite a bit of hardware is involved. If an artifact is built, it needs to be stored somewhere and accessible to other parts of the deployment pipeline--namely, the servers to which the artifact will be deployed. Separate servers for the test suite to run against are also needed. Finally, the environment will need a continuous integration server, which will need its rights managed as well as network access to the various integration points like the artifact storage location and the source code repositories. It will need its passwords managed and rotated.

Most of these tasks would normally fall inside the realm of the operations staff. But because all of these activities exist to support the development team, the ownership of the testing infrastructure falls into question in some organizations. I’m staking out a position and advocating that the operations team own the testing environments. Here’s why.

For starters, you need your testing servers to mimic production. Some components are governed by the actual source code being tested--library versions, for example. The test suite should be intelligent enough to install those dependencies as part of the test automation. But there are also more static dependencies that won’t change from build to build, such as the database version, which will be heavily influenced by the version that’s running in production.

The team that has the best insight into what’s running in production is, of course, the operations group. When a patch upgrade goes out to production, that patch should run through the entire testing pipeline. Since the operations team usually dictates when a patch is going to be applied (especially as it relates to security-related patches), it makes sense that they would then trigger the appropriate upgrades throughout the testing infrastructure.

In addition to versions, massive security concerns arise around testing infrastructure, specifically the continuous integration server. Because the CI server integrates with key parts of the infrastructure, it becomes a point of sensitivity when it comes to security. Many organizations also use their CI server as an essential component of the deployment process, so this server will have access to the source code, the build artifact storage location, and the production network. With this level of connectivity, it makes sense to have the operations team own this server so that it conforms to the security restrictions and processes that already exist. It’s too sensitive a server to fall outside the normal security management processes.

Unfortunately, this isn’t a slam dunk. The development team will also need some level of control over the environment. Their build tasks will need the necessary permissions to install new libraries within the scope of their tests. Operations surely doesn’t want to be involved every time a developer upgrades a dependent library. And because different tests use different libraries, the testing environment needs to be able to compartmentalize certain operations, restricting them to the current test run. (Many CI servers offer this sort of functionality by default.) Sometimes automated testing will fail, and troubleshooting or debugging might require access to the servers. (Sometimes the issue occurs only on the build servers and can’t be reproduced locally.)

All of this is to say that even though operations should own the testing infrastructure, a high level of cooperation and coordination between the two teams will remain necessary. Operations staff will need to listen to the needs of software engineers so that those engineers can do their jobs properly. At the same time, software engineers will have to recognize the outsized power the testing infrastructure wields and be understanding when not every request can be granted.

5.8 DevSecOps

I want to touch briefly on a newer paradigm that’s been emerging in the space: DevSecOps. This extension of the DevOps mindset adds security as one of the primary concerns. Many organizations have a security team whose goal is to ensure that the applications and infrastructure that teams use meet some form of minimum security standard. The quality as a condiment antipattern lends itself nicely to discussing DevSecOps because most organizations tack on security toward the end of a project.

During the life cycle of developing a tool, the word “security” may never be mentioned. Then, just prior to go-live, a piece of software that wasn’t designed with security in mind is thrown through a battery of checks, evaluations, and tests that it has no realistic chance of passing.

By including the security team in the DevOps life cycle, you afford that team the opportunity to be involved in design decisions early in the process and reinforce the idea that quality and security are essential ingredients of the solution. A DevSecOps approach requires embedding security scans and tests in the pipeline, similar to the way testing should operate. Imagine a suite of tests that runs with a specific security context against every build. Imagine a codified remediation process for when an application brings in unsafe dependencies. These are some of the successes that DevSecOps can bring, but the road isn’t always easy.

The topic is far too vast to cover in this book. If you want a more expansive discussion, I recommend DevOpsSec by Jim Bird (O’Reilly, 2016) and Securing DevOps by Julien Vehent (Manning, 2018) as great resources.

For starters, you’ll need to invest time and energy into basic security-monitoring tools that you can automate and integrate into your build and testing pipelines. This is extremely important because many organizations don’t have a formal security program, relying on individual engineers to stay apprised of what’s going on in the industry and in the security landscape.

And sure, when there’s a large vulnerability in OpenSSL, you’ll probably hear about it through your various professional networks. But the organization needs to know about that OpenSSL vulnerability, and it needs a process that is separate and independent from individual actors (and their professional networks) scouring the internet for news of risk.

Automated scanning utilities that are capable of not only scanning your code but also resolving the transitive dependencies in your codebase and scanning those libraries for vulnerabilities are a must. With the web of dependencies that get installed with a simple package, it’s unfair to think that any individual could keep track of all those versions, relationships, and vulnerabilities. This is where software shines.
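
As one rough example of what that looks like in a Python codebase, the sketch below shells out to pip-audit (one such scanner) from a pipeline step and fails the build if it reports anything. Flags and output vary by tool and version, so treat it as illustrative rather than prescriptive.

import subprocess
import sys

# Scan the project's declared dependencies (and their transitive dependencies)
# against known-vulnerability databases.
result = subprocess.run(["pip-audit", "-r", "requirements.txt"])

if result.returncode != 0:
    # A non-zero exit generally means vulnerabilities were reported; failing
    # the build makes remediation part of the pipeline, not an afterthought.
    print("Dependency scan reported vulnerabilities; failing the build.")
    sys.exit(result.returncode)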

You’ll also need to bring your security team into the design process. Most teams view the security team as a sort of “ban hammer.” They don’t allow you to do anything. But the problem is that most security teams are at the tail end of the relationship; by the time they’re engaged, people have already done the things they’re now asking for permission to do. In reality, the main goal of security is risk management. But you can’t evaluate risk without understanding the costs and potential gains. Involving your security team early in the design process allows them to work collaboratively on a solution that balances the needs of the organization both from an execution and a safety standpoint.

As I mentioned earlier, DevSecOps is a big topic. Start by talking to your security team, assuming you have one. Ask them how you can work more closely together and integrate their processes into your build, test, and deployment pipelines. Work toward a goal of high-level checks that can be automated and integrated into the various testing pipelines, along with clear steps for engaging the security team on future projects.

Summary

  • Use the testing pyramid as a guide to logically group tests.

  • Restore confidence in testing by focusing on reliability and quick developer feedback.

  • Use continuous delivery to ensure that your application is always in a releasable state.

  • Choose a pipeline executor tool that offers the least resistance from your organization.
