8
Leveraging your delivery pipeline for security

This chapter covers

  • Security-style unit tests
  • A security perspective on feature toggles
  • Writing automated security tests
  • Why availability tests are important
  • How misconfiguration causes security issues

Most developers agree that testing should be an integral part of the development process. This way, the perils of having a separate bug-fixing phase after development are avoided. Methodologies such as test-driven development (TDD) and behavior-driven development (BDD) have made it the de facto standard to execute thousands of tests each time a change is integrated. But for some reason, perhaps because security is an afterthought for many people, this only seems to apply to nonsecurity tests. In our opinion, this doesn’t make sense. Security tests are no different from regular tests and should be executed just as frequently. This doesn’t mean you need a penetration test at every commit.1  Instead, you need a different mindset, where security concerns are seamlessly integrated into the delivery pipeline and exercised every time a change is made, and that’s what this chapter is about.

Each section in this chapter is more or less independent, but a common theme is to teach you how to integrate different security tests into your delivery pipeline. This may require thinking explicitly about security for a change, but doing this in your daily work instantly gives you feedback on and an understanding of how secure your software is. Before we dive into the details, let’s have a quick refresher on what a delivery pipeline is.

8.1 Using a delivery pipeline

A delivery pipeline is an automated manifestation of the process for delivering software to production (or to some other environment).2  Although this sounds advanced and overly complex, it’s just the opposite. Suppose you have the following delivery process:

  1. Make sure all files have been checked into Git.
  2. Build the application from the master branch.
  3. Execute all unit tests and make sure they pass.
  4. Execute all application tests and make sure they pass.
  5. Execute all integration tests and make sure they pass.
  6. Execute all system tests and make sure they pass.
  7. Execute all availability tests and make sure they pass.
  8. Deploy to production (if all previous steps pass).

The first couple of steps ensure that all files have been included in the build and that the code compiles. Steps 3 to 7 exercise different quality aspects, and the last step allows deployment to production if all previous steps pass. Regardless of whether you choose to run the process manually or automatically, the main objective is to prevent bugs from slipping through to production. If you choose to use a build server, you end up with an automated manifestation of the process—a delivery pipeline as illustrated in figure 8.1.

figure08-01.eps

Figure 8.1 Example of a delivery pipeline

As illustrated, unit tests, application tests, and integration tests run in parallel, whereas the other steps run sequentially. A benefit of automating the process, although not a requirement, is that it becomes easy to rearrange the steps. It’s interesting to analyze how best to order them, but what’s far more important is the choice to make the process automated.

Using a delivery pipeline guarantees that the process is executed consistently—no one can choose to skip a step or cheat when delivering to production or some other environment. You can take advantage of this to ensure that security checks are done continuously during development. By including security tests in the pipeline, you gain immediate feedback and an understanding of how secure your software is. This makes a huge difference to quality, so let’s see how you can secure your design using unit tests.

8.2 Securing your design using unit tests

When securing a design using unit tests, you need to think a bit differently from what you may be used to. Using TDD helps you focus on what the code should do rather than what it shouldn’t do. This is a good strategy, but unfortunately, it only takes you halfway. Only focusing on what the code should do makes it easy to forget that security weaknesses often are unintended behavior.

For example, if you represent a phone number as a string, you probably expect phone numbers as input and nothing else. But the definition of a string is much broader than the definition of a phone number, and this makes you automatically accept any input that could be represented by a string—a weakness that opens up the possibility of injection attacks. This justifies the need for a different test strategy that includes both what the code should do and what it shouldn’t do.
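To make this concrete, here’s a minimal sketch of the “shouldn’t do” side of that strategy. The PhoneNumber class is hypothetical (not one of this chapter’s examples); the point is that input a raw String would happily accept must be rejected by the domain primitive.

import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

class PhoneNumberTest {

   @Test
   void should_reject_input_that_is_not_a_phone_number() {
      // A raw String accepts anything; a PhoneNumber domain primitive must not
      assertThrows(IllegalArgumentException.class,
            () -> new PhoneNumber("+46' OR '1'='1"));
   }
}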

When testing your objects, we suggest using four different test types, as described in table 8.1. That way, you’ll gain confidence that the code truly does what it claims to do and that unintended behavior is avoided.

Table 8.1 Test types and their objectives
  • Normal input testing: Verifies that the design accepts input that clearly passes the domain rules, ensuring that the code handles vanilla input in a correct way.
  • Boundary input testing: Verifies that only structurally correct input is accepted. Examples of boundary checks are length, size, and quantity, but they could also include complex invariants and domain rules.
  • Invalid input testing: Verifies that the design doesn’t break when invalid input is handled. Empty data structures, null, and strange characters are often considered invalid input.
  • Extreme input testing: Verifies that the design doesn’t break when extreme input is handled. For example, such input might include a string of 40 million characters.

To give you a feel for and understanding of how to use these tests, we’ll walk you through an example where sensitive patient information in a hospital is sent by email. Although designing and testing an email domain primitive might seem trivial, the methods and reasoning used are universal and can be applied to any object you create.

Picture a hospital with an advanced computerized medical system. The system includes everything from medical charts to drug prescriptions to x-ray results—a vital system with thousands of transactions per day. As part of the daily routine, doctors and nurses use the system when discussing sensitive patient information. This communication is email-based, and for patient integrity reasons, it’s critical that information is never sent to email addresses outside the hospital domain.

Configuring the email servers to only accept addresses in the hospital domain is the natural strategy of choice. But what if the configuration changes or is lost during an upgrade? Then you’d silently start to accept emails to invalid addresses—a security breach that could lead to catastrophic consequences. A better strategy is to combine email server configuration with the rejection of invalid addresses in the system. This way, security in depth is achieved, which makes the system harder to attack because it’s not enough to circumvent one protection mechanism.3  But to do this, you need to understand the rules for an email address in the hospital domain.

8.2.1 Understanding the domain rules

In chapter 1, you learned that talking to domain experts helps you gain a deeper understanding about the domain. This is also the case for the hospital domain. As it turns out, the rules for an email address in this context are quite different from what you might expect.

The email address specification, RFC 5322, is quite generous when it comes to what characters an accepted address can have.4  Unfortunately, you can’t use the same definition in the hospital domain because several legacy systems have character restrictions that need to be considered. Because of this, the domain experts have decided to allow only alphabetic characters, digits, and periods in a valid email address. The total length is restricted to 77 characters, and the domain must be hospital.com. Several other requirements include:

  • The format of an email address must be local-part@domain.
  • The local part can’t be longer than 64 characters.
  • Subdomains aren’t accepted.
  • The minimum length of an email address is 15 characters.
  • The maximum length of an email address is 77 characters.
  • The local part can only contain alphabetic characters (a-z), digits (0-9), and one period.
  • The local part can’t start or end with a period.

At first, it can be tempting to represent an email address as a String because of the generous definition in RFC 5322. But the requirements defined by the domain rules suggest that a better choice would be to represent it as a domain primitive, EmailAddress. One way to ensure it complies with the domain rules is to drive the design using unit tests, so let’s start by testing normal behavior.

8.2.2 Testing normal behavior

When testing normal behavior, you want to focus on input that clearly meets the domain rules. For EmailAddress that means input that fits within the length constraints (15 to 77 characters); has hospital.com as the domain; and has a local part containing only alphabetic characters (a-z), digits, and at most one period. This way, confidence is gained that the implementation works as expected when vanilla input is provided.

In listing 8.1, you see an example of how to capture the normal behavior of EmailAddress. The test is executed with JUnit 5, and the construction is quite clever in the sense that it uses a stream of input values (valid email addresses), which are mapped to a lazily executed test case—a dynamic test.5  Compared to an ordinary test case, a dynamic test case is different in that it’s not defined at compile time but rather at runtime. That way, it’s possible to dynamically create test cases based on parameter input, as is done in the listing. In addition, using a parameterized test construction is often preferable when confirming a theory because it lets you easily add or remove input values without affecting test logic.

Listing 8.1 Test capturing the normal behavior of EmailAddress

import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.DynamicTest.dynamicTest;
 
class EmailAddressTest {
  @TestFactory
  Stream<DynamicTest> should_be_a_valid_address() {
     return Stream.of(    ①  
           "[email protected]",    ②  
           "[email protected]",    ②  
           "[email protected]")    ②  
           .map(input ->
              dynamicTest("Accepted: " + input,    ③  
                () -> assertDoesNotThrow(  ④  
                  () -> new EmailAddress(input)))); 
  }
}

Having this test in place allows you to start designing the EmailAddress object. According to the domain rules, only alphabetic characters, digits, and one period are allowed in the local part. This adds some complexity, but the next listing shows a solution that addresses this using a regular expression (regexp). The domain is also restricted to hospital.com, which prevents any other domains from being accepted.

Listing 8.2 EmailAddress meeting the normal behavior criteria

import static org.apache.commons.lang3.Validate.matchesPattern;
 
public final class EmailAddress {
 
   public final String value;
 
   public EmailAddress(final String value) {
      matchesPattern(value.toLowerCase(),    ①  
        "^[a-z0-9]+\.?[a-z0-9]+@\bhospital.com$",    ②  
        "Illegal email address");
 
      this.value = value.toLowerCase();    ③  
   }
   ...
}

But testing normal behavior is only one step toward making EmailAddress secure. You also need to ensure that addresses close to the semantic boundary behave as expected. For example, how do you know that an email address longer than 77 characters is rejected, or that an address starting with a period isn’t accepted? This justifies adding a new set of tests where the boundary behavior is verified.

8.2.3 Testing boundary behavior

In chapter 3, we discussed the importance of understanding the semantic boundary of a context and how data could implicitly change meaning when crossing a boundary. For a domain object, it’s often a combination of simple structural rules (for example, length, size, or quantity) and complex domain rules that defines the semantic boundary. For example, consider a shopping cart on a web page that’s modeled as an entity. It’s fine to add items up to a certain limit and modify the cart as long as you haven’t gone through checkout. After that, the order is immutable, and updates are illegal. This state transition makes the order cross a semantic boundary because the meaning of an open order isn’t the same as that of a submitted order. This is important to test because many security problems tend to lie around these boundaries.
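As a rough sketch of the shopping cart example (the class and its fields are ours, not part of the hospital domain used in this chapter), the entity could enforce that boundary like this:

import java.util.ArrayList;
import java.util.List;

public final class Order {

   private final List<String> items = new ArrayList<>();
   private boolean submitted = false;

   public void addItem(final String item) {
      if (submitted) {
         // After checkout, the order has crossed the semantic boundary
         throw new IllegalStateException("Order is already submitted");
      }
      items.add(item);
   }

   public void checkout() {
      submitted = true;    // from here on, updates are illegal
   }
}

A boundary test for this object would call checkout() and then assert that addItem throws IllegalStateException, in the same spirit as the EmailAddress tests that follow.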

Returning to EmailAddress and the hospital domain, you need to ensure the design truly satisfies the boundary conditions defined by the domain rules. Fortunately, you can simplify the testing a little because the rules don’t impose any complex state transitions like those in the shopping cart example. Instead, you only have structural requirements, such as length restrictions and which symbols to allow, and they are quite easy to test. Table 8.2 summarizes the boundary conditions that need to be verified.

Table 8.2 Boundary conditions to verify
Accept:
  • Address that’s exactly 15 characters long
  • Address with a local part that’s 64 characters long
  • Address that’s exactly 77 characters long

Reject:
  • Address that’s 14 characters long
  • Address with a local part that’s 65 characters long
  • Address with a local part containing an invalid character
  • Address with multiple @ symbols
  • Address with a domain other than hospital.com
  • Address with a subdomain
  • Address with a local part that starts with a period
  • Address with a local part that ends with a period
  • Address with more than one period in the local part

Having this list in place allows you to start designing unit tests that verify boundary behavior for each particular case. In listing 8.3, you see an example of how to implement this with JUnit 5. The first test, should_be_accepted, verifies that an address is accepted if it’s part of the hospital.com domain and between 15 and 77 characters long. The second test, should_be_rejected, is a bit longer and focuses on rejecting input that’s outside the boundaries; for example, input that’s too short, too long, has invalid characters, or has an invalid domain.

Listing 8.3 Tests verifying that addresses meet boundary conditions

import static org.apache.commons.lang3.StringUtils.repeat;
import static org.junit.Assert.assertEquals;
import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.DynamicTest.dynamicTest;
 
class EmailAddressTest {
   @TestFactory
   Stream<DynamicTest> should_be_accepted() {
      return Stream.of(
            "[email protected]",    ①  
            repeat("X", 64) + "@hospital.com")    ②  
            .map(input -> dynamicTest("Accepted: " + input,
              () -> assertDoesNotThrow(() -> new EmailAddress(input))));
   }
 
   @TestFactory
   Stream<DynamicTest> should_be_rejected() {
      return Stream.of(
         "[email protected]",    ③  
         repeat("X", 64) + "@something.com",    ⑧  
         repeat("X", 65) + "@hospital.com",    ④  
         "[email protected]",    ⑤  
         "jane@[email protected]",    ⑥  
         "[email protected]",    ⑦  
         "[email protected]",    ⑧  
         "[email protected]",    ⑨  
         "[email protected]",    ⑩  
         "[email protected]")    ⑪  
         .map(input ->
             dynamicTest("Rejected: " + input,
                () -> assertThrows(    ⑫  
                         IllegalArgumentException.class,
                         () -> new EmailAddress(input))));
   }
}

Executing this test shows that the implementation of EmailAddress is too weak. The regular expression ^[a-z0-9]+\.?[a-z0-9]+@\bhospital\.com$ is a bit naive because it doesn’t limit the length of the local part or the total length of an address.

Listing 8.4 shows an updated version of EmailAddress where length is explicitly checked before applying the regexp. In chapter 4, you learned that a lexical scan should always be applied before processing the input. This can be achieved using a positive lookahead in the regular expression, but we’ve deliberately skipped it because the length check ensures the input is safe to parse regardless of which characters it contains.6  However, in more complex situations, you should protect the parser by doing a lexical scan first.
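For reference, here’s roughly what such a lookahead could look like; it’s a sketch, and listing 8.4 keeps the explicit length check instead. The (?=.{15,77}$) part asserts that the whole input is 15 to 77 characters long before the rest of the expression starts consuming characters, which protects the two greedy quantifiers from extreme input:

// Total length enforced inside the regular expression via a positive lookahead
private static final String EMAIL_PATTERN =
      "^(?=.{15,77}$)[a-z0-9]+\\.?[a-z0-9]+@\\bhospital\\.com$";
...
matchesPattern(value.toLowerCase(), EMAIL_PATTERN, "Illegal email address");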

Listing 8.4 EmailAddress with explicit length check

import static org.apache.commons.lang3.Validate.inclusiveBetween;
import static org.apache.commons.lang3.Validate.isTrue;
import static org.apache.commons.lang3.Validate.matchesPattern;
 
public final class EmailAddress {
 
   public final String value;
 
   public EmailAddress(final String value) {
      inclusiveBetween(15, 77, value.length(),    ①  
         "address length must be between 15 and 77 chars");
 
      isTrue(value.indexOf("@") < 65,    ②  
         "local part must be at most 64 chars");
 
      matchesPattern(value.toLowerCase(),
         "^[a-z0-9]+\.?[a-z0-9]+@\bhospital.com$",
         "Illegal email address");
 
      this.value = value.toLowerCase();
   }
   ...
}

Adding the explicit length check does indeed make the design appear solid. Unfortunately, this is where most developers stop their testing efforts, because the implementation appears to be good enough. But from a security perspective, you need to go further.

It’s also important to verify that harmful input can’t break the validation mechanism. For example, the design of EmailAddress relies heavily on how regular expressions are interpreted. This is fine, but what if there’s a weakness in the regexp engine that could make it crash when parsing a certain input, or if there’s input that takes an extremely long time to evaluate? Flushing out these types of problems is the objective of the last two test types: invalid input testing and extreme input testing. Let’s see how to apply invalid input testing on the EmailAddress object.

8.2.4 Testing with invalid input

Before you design tests with invalid input, you need to understand what invalid input is. As a general rule of thumb, any input that doesn’t satisfy the domain rules is considered invalid. But from a security perspective, we’re also interested in testing with invalid input that causes immediate or eventual harm, and for some reason, null, empty strings, and strange characters tend to have this effect on many systems.

Listing 8.5 illustrates how EmailAddress is tested with invalid input. The input is a mix of addresses containing strange characters, null values, and input resembling valid data. With this type of testing, you increase the probability that the design truly holds up against simple injection attacks that could exploit weaknesses in the validation logic.

Listing 8.5 Testing with invalid input

import static org.junit.Assert.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.DynamicTest.dynamicTest;
 
class EmailAddressTest {
   @TestFactory
   Stream<DynamicTest> should_reject_invalid_input() {
      return Stream.of(
            null,    ①  
            "null",    ①  
            "nil",    ①  
            "0",    ①  
            "",    ①  
            " ",    ①  
            "	",    ①  
            "
",    ①  
            "john.doe
@hospital.com",    ①  
            "   @hospital.com",    ①  
            "%[email protected]",    ①  
            "john.d%[email protected]",    ①  
            "[email protected]",    ①  
            "--",    ①  
            "e x a m p l e @ hospital . c o m",    ①  
            "=0@$*^%;<!->.:\()&#"",    ①  
            "©@£$∞§|[]≈±´•Ω鮆µüıœπ˙~ß∂¸√ç‹›‘’‚…")    ①  
            .map(input ->
               dynamicTest("Rejected: " + input,
                  () -> assertThrows(    ②  
                           RuntimeException.class,
                           () -> new EmailAddress(input))));
   }
}

After running the boundary tests, it appears that the design of EmailAddress was good enough. But testing with invalid input revealed that null causes the implementation to crash when invoking value.length(). The next listing is an updated version of EmailAddress where null is explicitly rejected by a notNull contract.

Listing 8.6 Updated version of EmailAddress that rejects null input

import static org.apache.commons.lang3.Validate.inclusiveBetween;
import static org.apache.commons.lang3.Validate.isTrue;
import static org.apache.commons.lang3.Validate.matchesPattern;
import static org.apache.commons.lang3.Validate.notNull;
 
public final class EmailAddress {
 
   public final String value;
 
   public EmailAddress(final String value) {
      notNull(value, "Input cannot be null");    ①  
 
      inclusiveBetween(15, 77, value.length(),
              "address length must be between 15 and 77 chars");
 
      isTrue(value.indexOf("@") < 65,
              "local part must be at most 64 chars");
 
      matchesPattern(value.toLowerCase(),
              "^[a-z0-9]+\.?[a-z0-9]+@\bhospital.com$",
              "Illegal email address");
 
      this.value = value.toLowerCase();
   }
   ...
}

Running the invalid input tests shows that the validation logic is sound. But to ensure it’s really secure, we also need to test the extreme.

8.2.5 Testing the extreme

Testing the extreme is about identifying weaknesses in the design that make the application break or behave strangely when handling extreme values. For example, injecting large inputs can yield poor performance, memory leaks, or other unwanted behaviors. Listing 8.7 shows how EmailAddress is tested using a Supplier lambda with inputs ranging from 10,000 to 40 million characters. This clearly doesn’t meet the domain rules, but the point isn’t to test them; it’s rather to see how the validation logic behaves when parsing the input. Ideally, it should reject it, but if a poor evaluation algorithm is used, then all sorts of craziness might happen.

Listing 8.7 Testing EmailAddress with extreme values

import static org.apache.commons.lang3.StringUtils.repeat;
import static org.junit.Assert.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.DynamicTest.dynamicTest;
 
class EmailAddressTest {
   @TestFactory
   Stream<DynamicTest> should_reject_extreme_input() {
      return Stream.<Supplier<String>>of(
            () -> repeat("X", 10000),    ①  
            () -> repeat("X", 100000),    ②  
            () -> repeat("X", 1000000),    ③  
            () -> repeat("X", 10000000),    ④  
            () -> repeat("X", 20000000),    ⑤  
            () -> repeat("X", 40000000))    ⑥  
            .map(input ->
               dynamicTest("Rejecting extreme input",
                  () -> assertThrows(    ⑦  
                           RuntimeException.class,
                           () -> new EmailAddress(input.get()))));
   }
}

As it turns out, running the extreme input test shows that the design of EmailAddress truly holds. The input is rejected in an efficient way, but this might not have been the case. In chapter 4, we talked about validation order and the importance of validating input length before parsing contents. Listing 8.7 is an example where it really matters.

The length check can seem redundant, but without it, the extreme input yields such terrible performance that the application more or less halts. This is because when the regexp engine fails to match an expression, it backtracks to the character next to the potential match and starts over again. For large input, this could lead to a catastrophic performance drop due to the vast number of backtracking operations.8 
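If you want to see the effect yourself, the following stand-alone sketch (ours, not one of the chapter’s listings) runs the same kind of pattern, without any length guard, against increasingly large input. Because the two greedy quantifiers are retried in roughly length-squared combinations before the match fails, you should see the time grow far faster than the input; at 40 million characters, the match attempt effectively never finishes.

import static org.apache.commons.lang3.StringUtils.repeat;

import java.util.regex.Pattern;

public class BacktrackingDemo {

   private static final Pattern NAIVE =
         Pattern.compile("^[a-z0-9]+\\.?[a-z0-9]+@\\bhospital\\.com$");

   public static void main(String[] args) {
      for (int length = 10_000; length <= 80_000; length *= 2) {
         String input = repeat("x", length);    // no '@', so the match must fail
         long start = System.nanoTime();
         NAIVE.matcher(input).matches();    // fails only after massive backtracking
         long elapsedMs = (System.nanoTime() - start) / 1_000_000;
         System.out.println(length + " chars: " + elapsedMs + " ms");
      }
   }
}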

This concludes the EmailAddress example and how to use a security mindset when designing unit tests. But this is only one step toward making software secure by design. Another way is to ensure you only have the features you want in production, and this brings us to the next topic: verifying feature toggles.

8.3 Verifying feature toggles

With continuous delivery and continuous deployment increasingly becoming best practices in software development, the use of feature toggles when developing systems has also found greater acceptance. Feature toggling is a practice that allows developers to rapidly develop and deploy features in a controlled and safe manner. Feature toggling is a useful tool, but if used excessively, it can quickly become complex and nontrivial. Depending on what functionality you’re toggling, a mistake made in the toggling mechanism can lead to not only incorrect business behavior, but also severe security complications (as you’ll soon see).

When using feature toggles, it’s important to understand that a toggle alters the behavior of your application. And like any other behavior, you should verify it using automated tests. This means you shouldn’t verify only the feature code in your application, but also the toggles themselves. Before we start looking at how to verify toggles, let’s take a look at an example of why it’s important you verify them.

8.3.1 The perils of slippery toggles

Here’s a story about a team of experienced developers and an unfortunate mishap with feature toggles—a mishap that led to exposure of sensitive data in a public API.9  This mishap could have been avoided if the developers had used automated tests to verify the toggles. If you’re not familiar with feature toggling, don’t worry, you’ll get a primer before we move on with the rest of the section.

The members of the team had been working together for some time, and it had become a tight group that was delivering working software at a high pace. The team applied many software development practices from continuous delivery, and they also used test-driven development when writing code. In addition to that, they’d built an extensive delivery pipeline that ensured only properly working features made it all the way to production.

The team was working on a set of new functionality. One of the first things they did, as they’d done many times before, was to add a feature toggle that allowed them to turn the new functionality on and off. This toggle was used when executing local tests on a developer’s computer and the CI server, or when running tests against a deployed instance in the test environment. The new functionality was to be exposed through a public API, and when finished, it would have proper authentication and authorization so that only certain users could call the new API endpoints. The authorization would be based on some new permission rules that hadn’t been developed yet and would be developed by a different team. But the new permission rules weren’t needed to verify the rest of the business behavior. This allowed the team to continue to work while the other team was finishing up on its side. The toggle for the unfinished functionality was configured to be off during production in order to prevent it from being exposed in the public API. It was to remain off until the new functionality was completely finished and had passed all acceptance tests.

At one point during development, the toggle accidentally got enabled in the production configuration. This happened because of a mistake made by a developer when merging some code changes in the configuration files. The number of toggles used in the application had built up over time, and the configuration for the toggles had become rather complex. Spotting a subtle mistake in the configuration wasn’t easy, and it was a mistake any one of the developers could have made. This mishap resulted in the new functionality being exposed in the public API—but without any form of authorization controls in place, because they hadn’t been implemented yet. This made it possible for almost anyone to access the new endpoints. Fortunately, the mistake was soon discovered by the team, and the error in the configuration was corrected before the exposed functionality was ever executed in production.

Had an ill-minded person discovered those publicly exposed endpoints, they could have caused significant damage to the company. Even though this particular story ended well, there’s still an interesting observation to make: none of the toggle configurations were verified to work as expected. If the team had employed automatic verification of the behavior of each toggle, it would have prevented the mishap from ever happening.

We wanted to share this story with you to show you a real example of how feature toggles can lead to quite serious problems if not implemented correctly. You’re now ready to start looking at how to verify feature toggles from a security perspective.

8.3.2 Feature toggling as a development tool

A full exploration of the topic of feature toggling is beyond the scope of this book. But in order to understand why and how you should test your feature toggles, we feel it’s fitting to begin with a brief introduction to the subject. If you’re already familiar with feature toggling, you can view this section as a quick refresher.

In essence, a feature toggle works much like an electric switch. It lets you turn on and off a certain feature in your software, like an electric switch turns a light bulb on and off (figure 8.2). Apart from turning features on and off, toggles can also be used to switch between two different features, letting you alternate between different behaviors.

figure08-02.eps

Figure 8.2 Feature toggles let you switch between features or turn them on and off.

When working on new functionality, you can use a toggle to turn on, or enable, that functionality when you need to run tests or deploy the application to a test environment. This gives you full access to the new functionality while you’re working on it. At the same time, the toggle lets you turn off, or disable, the functionality when the application is deployed to your staging or production environment. This ability to turn specific functionality on and off gives you full control over when the functionality is made available to end users.

Another aspect of using feature toggles is that it lets you perform development on the main branch of the version control system instead of a long-lived feature branch. This is something many consider to be a necessity in order to follow best practices from continuous integration and, as a consequence, continuous delivery. (This is yet another reason for why feature toggles are becoming more common among developers.)

There are various types of feature toggles. Some are used to toggle features still in development, others to enable or disable functionality in production, depending on runtime parameters like time or date or certain aspects of the current user. You can also implement toggles in different ways. The most basic implementation is by changing a piece of code to either include or exclude certain parts of the codebase, as seen in listing 8.8.

As you can see, the code toggles between old and new functionality. The old functionality is invoked via the callOldFunctionality method. When the old functionality is enabled, you disable the new functionality by commenting out the callNewFunctionality method. When you want to use the new functionality instead, you do the opposite: you comment out the callOldFunctionality method and invoke callNewFunctionality, as is done in the listing with usingNewImplementation.

Listing 8.8 Feature toggling by code in its most rudimentary form

void usingOldImplementation() {
 
   doSomething();
   callOldFunctionality();    ①  
   //callNewFunctionality();    ②  
   doSomethingElse();
 
}
 
void usingNewImplementation() {
 
   doSomething();
   //callOldFunctionality();    ③  
   callNewFunctionality();    ④  
   doSomethingElse();
 
}

A more elaborate toggle can, for example, be controlled via configuration provided at application startup. An example of this is shown in listing 8.9, where the functionality executed depends on the value of a system property called feature.enabled. If you want more dynamic toggles, you can make them controllable during runtime via some administrative mechanism.10 

Listing 8.9 Feature toggling by configuration—a simple example

void branchByConfigurationProperty() {
 
  final String isEnabled = System.getProperty("feature.enabled", "false");
    if (Boolean.valueOf(isEnabled)) {    ①  
       doSomething();
    }
    else {
       doSomethingElse();
    }
 
}
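A runtime-controllable toggle doesn’t need to be complicated. Here’s a minimal sketch (ours, with assumed names) where the flag can be flipped while the application is running, for example from a protected administrative endpoint:

import java.util.concurrent.atomic.AtomicBoolean;

public class RuntimeFeatureToggle {

   private final AtomicBoolean enabled = new AtomicBoolean(false);    // off by default

   public boolean isEnabled() {
      return enabled.get();
   }

   // Intended to be called from a protected administrative mechanism;
   // see section 8.3.5 on restricting and auditing access to it.
   public void setEnabled(final boolean value) {
      enabled.set(value);
   }
}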

Regardless of what type of toggles you use, or what mechanism you use to toggle them, it’s important to understand that a feature toggle alters the behavior of your application. When you’re flipping the toggle’s switch, you’re changing the behavior of your system. When you make use of feature toggles, you’re designing your system to allow for alternating behavior, and, like any other behavior in your application, you should verify it with as many automated tests as you can. Because feature toggles can lead to security implications, it’s important you get them right. Now that we’ve reviewed the basics of feature toggles, we can start looking at how you can verify them using automated tests.

8.3.3 Taming the toggles

Whenever you use feature toggles, you introduce complexity. The more toggles you add, the more complexity you end up with, especially if the toggles depend on each other. If you can, minimize the number of toggles you use at any given point. If that’s not possible, then you’ll have to learn how to deal with the complexity they add.

Complexity increases the likelihood of making mistakes, and when talking security, even a simple mistake can lead to severe problems. For example, exposing unfinished functionality in a public API can lead to a variety of security problems, ranging from direct economic loss to sensitive data being exposed.

If you create automatic tests that verify every toggle works as intended and you add those tests to your delivery pipeline, you get a safety net that ensures the toggles behave as expected. Because the tests are executed automatically, and for every build, they also work as regression tests for future changes, preventing you from accidentally messing things up. The scenario from the story at the beginning of this section, where a bad code merge led to API endpoints being exposed to the public, could’ve been prevented if there had been automatic tests in place that made sure the new functionality was never enabled in production.
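A test along the lines of the following sketch could have served as that safety net. The file name and property key are assumptions made for the example; the point is that the pipeline fails if the unfinished feature is ever switched on in the production configuration:

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.io.FileReader;
import java.util.Properties;
import org.junit.jupiter.api.Test;

class ProductionConfigurationTest {

   @Test
   void unfinished_api_feature_must_be_off_in_production() throws Exception {
      Properties config = new Properties();
      try (FileReader reader = new FileReader("config/production.properties")) {
         config.load(reader);    // the production configuration, as deployed
      }
      // Absent means off; anything other than "false" fails the pipeline
      assertEquals("false",
            config.getProperty("feature.new-api.enabled", "false"));
   }
}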

Always strive to test feature toggles automatically rather than manually. Automated tests are the most reliable and deterministic way to verify not only feature toggles but any behavior of your code. There are exceptions to the rule, and sometimes you’ll find it too costly to automate the verification. In those cases, it makes sense to resort to manual verification. When you need to perform manual testing, make sure you add that as a manual step in your delivery pipeline. By doing so, you avoid the risk of forgetting to perform the testing before a deliverable is marked as ready for production, because you can’t accidentally skip a step that’s in the pipeline.

Table 8.3 shows a few examples of how you can verify different types of feature toggles. These are basic suggestions, and often the verification will be more elaborate, but they’ll suffice to give you an idea of how to verify a toggle using an automated test. A code sketch of the first case follows the table.

Table 8.3 Examples of methods for verifying feature toggles
  • Remove functionality in public API: If removed successfully, the API should:
      • Return 404 in an HTTP API call
      • Discard/ignore sent messages
      • Refuse connections on a socket
  • Replace existing functionality: Try to perform a new action. New behavior shouldn’t be observed until finished (can be checked via resulting data or nonexisting UI elements, and so forth).
  • New authentication/authorization: Should be unable to log in/access system with new functionality/users/permissions. Only the old way should work.
  • Alternating behavior: When enabling feature A, then feature B shouldn’t be executed/accessible, and vice versa when enabling feature B.
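As a sketch of the first case in the table (the URL, port, and endpoint are assumptions about the application under test), a test using Java’s built-in HTTP client can assert that an endpoint behind a disabled toggle isn’t exposed:

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

class RemovedEndpointTest {

   @Test
   void toggled_off_endpoint_should_return_404() throws Exception {
      HttpResponse<String> response = HttpClient.newHttpClient().send(
            HttpRequest.newBuilder(
                  URI.create("http://localhost:8080/api/new-feature"))
                  .GET()
                  .build(),
            HttpResponse.BodyHandlers.ofString());

      assertEquals(404, response.statusCode());    // the feature must not be exposed
   }
}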

Listing 8.10 shows an example of a slightly more realistic OrderService that provides the ability to place an order. The OrderService has been extended with a new feature that sends data about the placed order to a business intelligence (BI) system. The new feature is toggled with the help of a ToggleService, which is a fictional library for managing feature toggles. Whenever the placeOrder method is executed, the OrderService checks to see whether the new or old order mode is enabled and acts accordingly.

Listing 8.10 OrderService with a new feature placed in a toggle

import static org.apache.commons.lang3.Validate.notNull;
 
public class OrderService {
 
   // ...
 
   public void placeOrder(final Order order) {
      notNull(order);
 
      if (OrderMode.OLD.equals(toggleService.orderMode())) {
         orderBackend.process(order);    ①  
      }
      else if (OrderMode.NEW.equals(toggleService.orderMode())) {
         orderBackend.process(order);
         biBackend.record(order);    ②  
      }
      else {
         throw new IllegalStateException("No supported order mode");
      }
   }
 
}
 
public class ToggleService {    ③  
 
   public enum OrderMode {
      OLD("old"),
      NEW("new");
 
      private final String key;
 
      OrderMode(final String key) {
         this.key = key;
      }
 
      public String key() {
         return key;
      }
   }
 
   private OrderMode orderMode = OrderMode.OLD;
 
   public OrderMode orderMode() {
      return orderMode;
   }
 
   public void setOrderMode(final OrderMode orderMode) {
      this.orderMode = notNull(orderMode);
   }
}

An example of how to write tests for this toggle is shown in listing 8.11. The tests don’t focus on the behavior of the underlying functionality of placing an order and sending data to a BI system. They’re only concerned with verifying that the correct behavior is triggered based on the setting of the toggle. If the order mode of the toggle is set to OLD, then the order should be sent for processing, but nothing should be sent to the BI system. If the order mode is set to NEW, then data about the order should be sent to the BI system in addition to the order being processed. The tests use mocks to verify interaction with the supporting services (the BI backend and the order backend). Don’t worry if you’re not familiar with using mocks in tests. In this example, it’s a way to verify whether any calls have been made to the supporting services.

Listing 8.11 Testing the toggle in OrderService

import org.junit.Test;
 
import static org.mockito.Matchers.any;
import static org.mockito.Mockito.*;
 
public class OrderServiceToggleTests {
 
   @Test
   public void should_process_order_if_old_order_mode_is_enabled() {
      givenOrderModeIs(OLD);
 
      whenPlacingAnOrder();
 
      thenOrderShouldBeProcessed();    ①  
   }
 
   @Test
   public void should_not_send_to_BI_if_old_order_mode_is_enabled() {
      givenOrderModeIs(OLD);
 
      whenPlacingAnOrder();
 
      thenOrderShouldNotBeSentToBI();    ②  
   }
 
   @Test
   public void should_process_order_if_new_order_mode_is_enabled() {
      givenOrderModeIs(NEW);
 
      whenPlacingAnOrder();
 
      thenOrderShouldBeProcessed();    ①  
   }
 
   @Test
   public void should_send_to_BI_if_new_order_mode_is_enabled() {
      givenOrderModeIs(NEW);
 
      whenPlacingAnOrder();
 
      thenOrderShouldBeSentToBI();    ③  
   }
 
   private ToggleService toggleService;
   private OrderBackend orderBackend;
   private BIBackend biBackend;
 
   private void givenOrderModeIs(final OrderMode orderMode) {
      toggleService = new ToggleService();
      toggleService.setOrderMode(orderMode);
   }
 
   private void whenPlacingAnOrder() {
      createOrderService().placeOrder(new Order());
   }
 
   private OrderService createOrderService() {
      orderBackend = mock(OrderBackend.class);    ④  
      biBackend = mock(BIBackend.class);    ④  
      return new OrderService(orderBackend,
                              biBackend,
                              toggleService);
   }
 
   private void thenOrderShouldBeProcessed() {
      verify(orderBackend).process(any(Order.class));    ⑤  
   }
 
   private void thenOrderShouldNotBeSentToBI() {
      verifyZeroInteractions(biBackend);    ⑥  
   }
 
   private void thenOrderShouldBeSentToBI() {
      verify(biBackend).record(any(Order.class));    ⑦  
   }
 
}

So far, you’ve learned why it’s important to test your toggles, and you’ve seen a few examples of how to test them. There are a few more things to discuss before we close the section on feature toggles: dealing with a large number of toggles and the fact that the process of toggling can be subject to auditing.

8.3.4 Dealing with combinatory complexity

If you’re using multiple toggles, you should strive to verify all combinations of them, especially if there are toggles that affect each other. Even if they aren’t directly related, you should test all the combinations, because there might be indirect coupling between them. Indirect coupling can occur at any time during development. As you might guess, it can quickly become a combinatory nightmare if you have a large number of feature toggles to verify. But the more toggles you have, the more likely it is you’ll get something wrong—and the more important it is you test them. This is one of the reasons why you should always try to keep the number of feature toggles as low as possible.

One could argue that it isn’t necessary to test all combinations if you first perform a risk analysis—evaluating how much more confidence or less risk you get by testing all combinations versus testing a few of them—and then only choose a selected set of combinations to test. This approach might appear reasonable, but it’s based on the assumption that you can assess security flaws you’re unaware of. If you’re aware of them, you most likely have already addressed them.11  Our recommendation is to verify all combinations of your toggles and mitigate the testing complexity by reducing the number of toggles in your codebase.
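One way to keep this manageable is to generate the combinations instead of writing each test by hand. The sketch below reuses the JUnit 5 dynamic test technique from listing 8.1 together with the ToggleService from listing 8.10; the second toggle, auditEnabled, is an assumption added purely for illustration, and the verification helper is left as a stub:

import static org.junit.jupiter.api.DynamicTest.dynamicTest;

import java.util.stream.Stream;
import org.junit.jupiter.api.DynamicTest;
import org.junit.jupiter.api.TestFactory;

class ToggleCombinationTest {

   @TestFactory
   Stream<DynamicTest> every_toggle_combination_behaves_as_expected() {
      // One dynamic test per combination of toggle states
      return Stream.of(ToggleService.OrderMode.values())
            .flatMap(orderMode -> Stream.of(true, false)
                  .map(auditEnabled -> dynamicTest(
                        "orderMode=" + orderMode + ", auditEnabled=" + auditEnabled,
                        () -> verifyExpectedBehavior(orderMode, auditEnabled))));
   }

   private void verifyExpectedBehavior(final ToggleService.OrderMode orderMode,
                                       final boolean auditEnabled) {
      // Configure the toggles for this combination and verify the expected
      // interactions, in the same given/when/then style as listing 8.11
   }
}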

8.3.5 Toggles are subject to auditing

One thing to keep in mind when using runtime toggles is the importance of exposing the toggle mechanism in a safe manner. Because these types of toggles are changing the behavior of the application in production, the mechanism you use to change the state of the toggles should be protected so that only authorized access is possible. You should also consider if any modifications to the state of a toggle should be logged for auditing purposes. It should always be possible to identify when and by whom a toggle was changed in production.
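As a sketch of what that could look like (the logger name and the changedBy parameter are assumptions, and how you obtain the current user depends on your application), the toggle mechanism itself can write an audit record every time someone flips a toggle. It builds on the ToggleService from listing 8.10:

import static org.apache.commons.lang3.Validate.notNull;

import java.time.Instant;
import java.util.logging.Logger;

public class AuditedToggleService {

   private static final Logger AUDIT_LOG = Logger.getLogger("toggle-audit");

   private volatile ToggleService.OrderMode orderMode = ToggleService.OrderMode.OLD;

   public ToggleService.OrderMode orderMode() {
      return orderMode;
   }

   public void setOrderMode(final ToggleService.OrderMode newMode,
                            final String changedBy) {
      // Record who changed what, and when, before the new state takes effect
      AUDIT_LOG.info(() -> String.format(
            "order mode changed from %s to %s by %s at %s",
            orderMode, newMode, changedBy, Instant.now()));
      this.orderMode = notNull(newMode);
   }
}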

The use of feature toggling is becoming more and more popular, and we predict many developers will come to see such toggles as a natural part of how software is developed. An effect of this is that it’ll be increasingly important to verify your toggles in an automatic way and to make that verification part of your delivery pipeline. We’re advocates of using feature toggles because they bring many benefits to software development. As long as you’re aware of the potential pitfalls and how to mitigate them, we believe the benefits far outweigh the drawbacks. In the next section, we’ll take a look at how to get started writing automated tests that explicitly verify security features and vulnerabilities.

8.4 Automated security tests

Most developers will agree that security testing is important and should be performed regularly. The reality, though, is that most software projects will never be subjected to a security audit or a penetration test, perhaps because the software has been deemed low risk or because security has been overlooked by the developers. Another common reason for why these tests are skipped, in our experience, is because penetration tests are often considered too time-consuming and costly.

Security testing tends to be time-consuming because much of the testing involved is hard to automate: it’s the experience and knowledge of the security expert that exposes possible flaws and weaknesses in an application.

In a way, the work (and value) of a penetration tester is not that different from that of a normal tester performing exploratory testing. Humans can perform tasks and logical reasoning in ways computers are still incapable of. Trying to replace a human tester with automated tests is not a realistic option, nor do we suggest it should be your goal. But some of the testing performed during a penetration test can be automated. In this section, you’ll learn how to write tests that can be used to perform a mini pen test as part of your delivery pipeline.

8.4.1 Security tests are only tests

One thing you should realize is that security tests are no different from any other tests (figure 8.3). The only difference is that as developers, for whatever reason, we choose to label the tests with the word security. If you know how to write regular automated tests to verify behavior and find bugs, you can apply the exact same principles to security testing.

  • Do you handle failed login attempts correctly? Write a test for it.
  • Does your online discussion forum have adequate protection against XSS? Write a test that tries to enter malicious data.

Once you understand that there’s nothing magical about security testing, you can start using automated tests to verify security features and to find security bugs.12 

figure08-03.eps

Figure 8.3 Security tests are no different from other tests.

Let’s take a closer look at what types of checks a security tester performs. Some are more or less mandatory in the sense that they will always be performed regardless of what the goal of the testing is. A lot of these checks can be considered hygiene-level, and an application should always pass them. As it turns out, many of them aren’t that hard to perform through automated tests. The checks that are easy to automate are usually also the ones where having a human perform them adds little value. Converting these into automated tests not only allows you to run them at will, but it also allows the testers to focus on more elaborate testing. Supplying malicious data to check for flaws in input validation, such as flaws that enable SQL injection or buffer overflow attacks, isn’t only a mundane task but also a good example of testing that can be automated.
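As an example of automating exactly that kind of check, the sketch below feeds a handful of classic malicious payloads to an endpoint and fails if any of them is accepted. The endpoint URL and the expected status code are assumptions about the application under test:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.DynamicTest.dynamicTest;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.stream.Stream;
import org.junit.jupiter.api.DynamicTest;
import org.junit.jupiter.api.TestFactory;

class MaliciousInputTest {

   private static final URI COMMENTS =
         URI.create("http://localhost:8080/api/comments");

   @TestFactory
   Stream<DynamicTest> should_reject_malicious_input() {
      return Stream.of(
            "<script>alert('xss')</script>",
            "<img src=x onerror=alert(1)>",
            "'; DROP TABLE comments; --")
            .map(payload -> dynamicTest("Rejected: " + payload, () -> {
               HttpResponse<String> response = HttpClient.newHttpClient().send(
                     HttpRequest.newBuilder(COMMENTS)
                           .header("Content-Type", "text/plain")
                           .POST(HttpRequest.BodyPublishers.ofString(payload))
                           .build(),
                     HttpResponse.BodyHandlers.ofString());
               assertEquals(400, response.statusCode());    // invalid input is rejected
            }));
   }
}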

8.4.2 Working with security tests

To help you understand what features to test and how to structure the work with test automation, we can divide security tests into two main categories: application and infrastructure (as seen in table 8.4). Apart from these two types of tests that explicitly focus on security, there are also tests with a domain focus. We covered domain testing in the first part of this chapter and, as you learned, those tests will also help secure your system. We’ll now take a look at the other two categories of tests.

Table 8.4 Types of security tests
  • Application focused: These tests verify the application in parts other than the domain. Examples include checking HTTP headers in a web application or testing input validation.
  • Infrastructure focused: These tests verify correct behavior from the infrastructure running the application. Examples include checking for open ports and looking at the privileges of the running process.

For tests focusing on the application and infrastructure, there are a number of tools available that might be worth exploring. Port scanning tools can, for example, be set up to run against the server you deploy your application on. Likewise, a web testing tool can scan your web application or run predefined use cases, while at the same time checking for vulnerabilities.13  You can also use tools to scan your code for vulnerable third-party dependencies.14  Any unexpected result from a test run should fail the build and stop the delivery pipeline.
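An infrastructure-focused check doesn’t have to involve a dedicated scanner to add value. As a small sketch (the host name and the list of ports are assumptions about your environment), plain sockets are enough to fail the pipeline if a port that should be closed suddenly accepts connections:

import static org.junit.jupiter.api.Assertions.fail;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import org.junit.jupiter.api.Test;

class OpenPortsTest {

   private static final String HOST = "app.test.example.com";
   private static final int[] PORTS_THAT_MUST_BE_CLOSED = {21, 23, 3306, 5432};

   @Test
   void unexpected_ports_should_be_closed() {
      for (int port : PORTS_THAT_MUST_BE_CLOSED) {
         try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(HOST, port), 1000);
            fail("Port " + port + " is unexpectedly open");
         } catch (IOException expected) {
            // Connection refused or timed out: the port is closed, as it should be
         }
      }
   }
}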

These types of tests can sometimes take a while to execute, so you might choose to run them less often than other tests in your pipeline. If you have other long-running tests such as performance tests executed nightly, a good approach can be to run the scanning tools before or after them.

8.4.3 Leveraging infrastructure as code

With the adoption of cloud computing, the idea of infrastructure as code (IaC) is becoming more common. The basic concept of IaC is that it allows you to declaratively define infrastructure. This can be anything from servers and network topologies to firewalls, routing, and more. This has multiple advantages, one of which is making the setup of your infrastructure deterministic, giving you the ability to recreate your entire infrastructure as many times as you want. It also becomes a breeze to use version control to track the history of every change to your infrastructure, no matter how small or big.

From a security perspective, this is exciting. Not only do you minimize the risk of human error, but you can also use this approach to automatically verify your infrastructure. Because you’re putting all changes in a version control system, you get traceability of any changes made, and the automated nature of IaC means you can verify the changes before pushing them into production.

For example, say you’re updating a firewall. Before applying the changes in production, you first apply them in a preproduction environment. The ideal way to do this is to completely recreate the entire infrastructure in a mirrored setup. Once you’ve created the preproduction environment, you can run automated security tests against it, verifying that no previous functionality has been unintentionally altered and that the changes made have the expected effects. You can then safely deploy the changes in production. If you are using IaC or are about to move in that direction, you should definitely look into the opportunities it provides in terms of securing your infrastructure.

8.4.4 Putting it into practice

By writing tests with an explicit security focus and adding them to your pipeline, you can pick a lot of low-hanging fruit. If you couple that with the execution of existing tools in an automated fashion, you get a mini pen test you can execute at will and as often as you want. This field is still developing, but we’ll be watching it with interest in the upcoming years because we’re hoping the tools will become more mature and accessible to both developers and QA.

You have now learned the basics of how to automate explicit security testing. In the next section, we’ll take a look at why availability is important and how it relates to secure software.

8.5 Testing for availability

It’s easy to think that the classical security concerns of confidentiality, integrity, and availability (CIA) only apply to information security, but they’re also important when designing secure software.15  For example, confidentiality is about protecting data from being read by unauthorized users, and integrity ensures data is changed in an authorized way. But what about availability? Many developers find it easy to understand but difficult to test because it concerns having data available when authorized users need it.

For example, suppose a fire breaks out and you call 911 (or 112 in Europe), but your call doesn’t get through, not because you dialed the wrong number but because the switchboard is flooded with prank calls. Not good! Another less serious example is when you’re trying to buy tickets to a popular concert online, and the website crashes or can’t be accessed. Often, this isn’t the result of malevolent behavior, but rather that everyone tries to buy tickets at the same time; people’s intentions are good, but the consequences are just as bad as those of an evil attack.

Testing availability is therefore something every application needs to do, but how do you do this in practice? One way is to simulate a denial of service (DoS) attack, which lets you understand what the behavior is before and after data becomes unavailable.16  To do this, you need to start by estimating the headroom.

8.5.1 Estimating the headroom

Estimating the headroom is about trying to understand how much load an application can handle before it fails to serve its clients in a satisfactory way. Typical things to look for are memory consumption, CPU utilization, response times, and so on. But it can also be a way to understand how the application behaves before it fails and where the weak spots are in the design.

Figure 8.4 shows an example of a distributed denial of service (DDoS) attack, where a massive number of parallel requests are made from different servers against an application. Regardless of how many requests are made or how much load they generate, the main objective is to limit the availability of the application’s services. When talking about DDoS attacks, it’s not uncommon to use the more generic term DoS attack. The main difference between the two is that DoS attacks are made from a single server instead of multiple ones. The objective is, however, the same, and from now on, we’ll use the terms DDoS and DoS interchangeably.

figure08-04.eps

Figure 8.4 Denial of service attack (DDoS, but more commonly referred to as DoS attack)

By simulating a DoS attack, you can easily get a feel for how well your application scales and how it behaves before it fails to meet its availability requirements. It’s important to note that regardless of how well a system is designed, an attack large enough will eventually break it. This makes it practically impossible to design a system that’s 100% resilient, but estimating the headroom is a good strategy to use when trying to understand where the weak spots are in your design.

Several commercial products and open source alternatives let you load test your application. One example is “Bees with Machine Guns,”17  which is a utility for creating EC2 server instances on the Amazon Web Services platform that attack an application with thousands of parallel requests.18  In listing 8.12, you see an example of how to configure eight EC2 instances that issue 100,000 requests, 500 at a time, against a website.

Listing 8.12 Simple example of configuring a test running a DDoS attack

bees up -s 8 -g public -k your_ssh_key    ①  
bees attack -n 100000 -c 500 -u website_url    ②  
bees down    ③  

Regardless of which product you choose, having tests in your delivery pipeline that place your system under a heavy load is an efficient way of flushing out weaknesses that could be exploitable by an attacker in production. But a DoS attack doesn’t require thousands of parallel requests to be successful. Availability could be affected in a more sophisticated way; for example, by exploiting domain rules that execute under the radar.

8.5.2 Exploiting domain rules

When exploiting domain rules, you’re actually creating a domain DoS attack in which rules are executed in a way that’s accepted by the business, but with malicious intent.19  To illustrate, let’s consider the example of a hotel that has a generous cancellation policy.

To provide great customer service, the hotel manager has decided to fully refund any reservation that’s canceled before 4 p.m. on the day of arrival. This allows for great flexibility, but what if someone makes a reservation without the intent of staying at the hotel? Won’t that prevent someone else from making a reservation, causing the hotel to lose business? It certainly will, and that’s how a domain DoS attack works. By exploiting the domain rules for cancellation, it’s possible to reserve all the rooms at the hotel and cancel them at the latest possible moment without being charged. This way, an attacker might be able to block out a certain room type or direct customers to a competitor’s hotel.

This type of attack might seem fictitious and unlikely, but there are several real-world examples where this has happened. One was in San Francisco, where the ride-sharing company, Lyft, accused rival Uber of booking and then canceling over 5,000 rides in an attempt to affect its business.20  Another was in India, where Uber sued its competitor Ola for booking over 400,000 false rides.21 

Simulating this in tests might seem pointless, but the fact is that by exercising domain rules in a malicious way, you gain deeper understanding of weaknesses in the domain model—knowledge that could be invaluable when designing alarms to trigger on thresholds and user behavior, for example, or when using machine learning to detect malicious activity. But testing availability is only one thing to consider when adding security to your delivery pipeline. Another is to understand how an application’s behavior changes with its configuration, especially the security aspects. And this brings us to the next topic: validating configuration.

8.6 Validating configuration

In contemporary software development, common features are often realized through configuration; you bring in an existing library or framework that allows you to enable, disable, and tweak functionality without having to implement it yourself. In this section, we’ll take a look at why it’s important to verify your configuration and how automation can be used to protect against security flaws caused by misconfiguration.

If you’re building a web application, you probably don’t want to spend time writing your own HTTPS implementation to serve web requests or implementing a homegrown ORM framework for database persistence—both of which can be hard to get right. Instead of implementing these generic features yourself, you can use an existing implementation in the form of a library or a framework. For most developers, this makes a lot of sense, because bringing in generic functionality via an external tool lets you focus on what’s unique about your business domain.

Even if you do decide to roll your own in-house implementation of generic functionality, you’re most likely going to distribute it as a library for development teams to reuse in their applications. Regardless of which approach you take, the result is that important features of an application are provided by code external to the current team, and those features are controlled through configuration. The features provided can be generic but can, nonetheless, play a central role in the security of your application. As a consequence, errors in the configuration can directly lead to security problems. Automated tests can effectively be used to mitigate these problems.

8.6.1 Causes for configuration-related security flaws

Security flaws resulting from faulty configuration generally stem from one of three causes: unintentional changes, intentional changes, or misunderstood configuration (figure 8.5).

figure08-05.eps

Figure 8.5 Underlying causes for security flaws induced by configuration

Let’s take a look at each one of these underlying causes to give you an understanding of how they can arise, and why it’s so important that you use automated tests to prevent these types of flaws from occurring.

Unintentional changes

Being able to control functionality through configuration makes the lives of developers a lot easier. Not only does it speed up development, but it can also make your application more secure. Using well-known, community-reviewed, battle-tested, open source implementations is most likely going to be more secure than writing your own libraries. Getting security features right in software is hard, even for the most seasoned security experts.

When features are controlled via configuration, it’s easy to alter the behavior of your application. Even substantial changes can be made by altering one line in a configuration. But although it’s easy to change the behavior to something you want, it’s equally easy to unintentionally change it to something you don’t want. Say you mistakenly alter a line in your configuration or you misspell a string parameter, and suddenly the behavior of your application is silently changed; there’s no exception or other error when you run the application. If you’re unlucky, the changed behavior makes your application vulnerable in one way or another, and if you’re really unlucky, you’re not going to notice until after it’s been deployed to production.

What you need is a safety net that can catch many of the problems caused by unintentional configuration changes. Creating automated tests that check features and behavior enabled via configuration is a relatively economical and easy way to implement such a safety net.
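
As a simple illustration, such a safety net could include a test asserting that a protected endpoint rejects unauthenticated requests. The sketch below uses Java's built-in HTTP client and a hypothetical URL, and it assumes the application is deployed somewhere the pipeline can reach; if a configuration change accidentally switches authentication off, the test fails before the change reaches production.

import org.junit.Test;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import static org.junit.Assert.assertEquals;

public class AuthenticationIsStillEnabledTest {

   // Hypothetical protected endpoint in a test environment
   private static final URI PROTECTED_RESOURCE =
         URI.create("https://test.example.com/api/accounts");

   @Test
   public void unauthenticated_requests_are_rejected() throws Exception {
      final HttpClient client = HttpClient.newHttpClient();
      final HttpRequest request = HttpRequest.newBuilder(PROTECTED_RESOURCE)
                                             .GET()
                                             .build();

      final HttpResponse<Void> response =
            client.send(request, HttpResponse.BodyHandlers.discarding());

      // If a configuration change accidentally turns authentication off,
      // this assertion fails and the pipeline stops the release
      assertEquals(401, response.statusCode());
   }
}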

Intentional changes

It’s not only unintentional changes that can make your application insecure through unwanted side effects. Sometimes an intentional change can have unwanted side effects too.

Say you’re implementing a new feature and, as part of that, you need to make a change in your application’s configuration. You verify the new behavior—ideally by adding a new automated test as you just learned—and then continue implementing the rest of the new feature. But what you didn’t notice when verifying the new behavior was that by making the change, you also altered the behavior in a different part of the application. Maybe the configuration you changed had been carefully placed there by another developer as the result of a previously conducted security audit or penetration test, or in order to prevent the exposure of a certain weakness. When altering the configuration, you also disabled those security features, leaving your application exposed.

Unknowingly changing the behavior in one part of a system while making changes in another isn't an uncommon scenario. It's similar to the scenario with unintentional changes, but it's worth calling out separately because, as a developer, you're not doing anything wrong here.

Unintentional changes are caused by someone making a mistake, so you might think you can protect yourself by being more careful or introducing more rigorous processes. But, in this case, you’re making the correct code changes with a deliberate intent. You might even add automated tests for the changes you’re currently making. The tests you add might protect you from unintentional changes of the feature you just implemented, but unless you have tests for the already existing features, your intentional change can break existing behavior. This is something to look out for when working in existing codebases where, historically, not many tests have been written.

Misunderstood configuration

The third main cause of misconfiguration is not understanding the configuration mechanism used. In essence, this occurs when you think you’re configuring a certain behavior, whereas in reality, you’re configuring something else. This can easily happen when the configuration API for the library you’re using hasn’t been designed to be unambiguous.

Integer values, magic strings, and negating statements are typical giveaways of an ambiguous configuration API. When you use such configuration, chances are you’re not getting what you think you are. Every time you configure a feature, make it a habit to add a test that verifies your configuration is doing what you intend.
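
A concrete example of such magic strings is the feature flags used to harden Java XML parsers: each flag is identified by a long URI string, and some must be set to true while others must be set to false to get the safe behavior, which is easy to get backwards. The following sketch therefore doesn't assert on the flag itself but verifies the resulting behavior, checking that a document containing a DOCTYPE declaration (the usual carrier of XXE payloads) is rejected by the JDK's built-in DOM parser. Adapt the setup to whichever parser and hardening settings you actually use.

import org.junit.Test;
import org.xml.sax.SAXParseException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import static org.junit.Assert.fail;

public class XmlParserBehaviorTest {

   private static final String XML_WITH_DOCTYPE =
         "<?xml version=\"1.0\"?>"
         + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
         + "<foo>&xxe;</foo>";

   @Test
   public void documents_with_doctype_declarations_are_rejected() throws Exception {
      final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      // Magic string: setting this feature to true makes the parser
      // refuse any document that contains a DOCTYPE declaration
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      final DocumentBuilder builder = factory.newDocumentBuilder();

      try {
         builder.parse(new ByteArrayInputStream(
               XML_WITH_DOCTYPE.getBytes(StandardCharsets.UTF_8)));
         fail("Expected the parser to reject the DOCTYPE declaration");
      } catch (SAXParseException expected) {
         // The behavior we configured: DOCTYPEs (and thus this XXE attempt) are refused
      }
   }
}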

8.6.2 Automated tests as your safety net

How can you protect yourself from accidentally introducing security vulnerabilities in other parts of the software than the one you’re currently working on? How can you ensure the intention and tribal knowledge behind important configuration don’t get lost as a codebase evolves? As we’ve already hinted, an efficient way is to write automated tests to verify the expected behavior and use those tests as a regression suite in your delivery pipeline.

If you’re new to this view on testing configuration from a security perspective, it helps to think in terms of configuration hot spots. A configuration hot spot is an area in your configuration where the type of behavior you’re controlling has a direct or indirect impact on how secure your system will be. To give you an idea of some typical configuration hot spots, table 8.5 lists examples of functionality it’s important to have automated tests for.

Table 8.5 Examples of configuration hot spots to test
Type of configuration | Examples of behavior controlled
Web containers
  • HTTP headers
  • CSRF tokens
  • Output encoding
Network communication
  • Transport Layer Security (HTTPS and so on)
Data parsing
  • Behavior of data parsers (such as XML and JSON parsers)
Authentication mechanisms
  • Authentication on/off
  • Integration settings (for example, for CAS and LDAP)

In our experience, functionality that's controlled via configuration and relevant from a security perspective is often fairly straightforward to write automated tests for. In a web application, for example, it isn't hard to write a test that checks for proper HTTP headers or verifies that a form uses CSRF tokens.22  These types of tests are best created as you develop the application, but because they tend to be straightforward to write, it's also fairly easy to add them to an existing codebase.
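
For instance, a header check can be as small as the following sketch, written with Java's built-in HTTP client. The URL is a hypothetical placeholder, and which headers you assert on depends on what your web container or framework is configured to add.

import org.junit.Test;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class SecurityHeadersAreConfiguredTest {

   // Hypothetical URL of the deployed application under test
   private static final URI START_PAGE = URI.create("https://test.example.com/");

   @Test
   public void responses_carry_the_expected_security_headers() throws Exception {
      final HttpClient client = HttpClient.newHttpClient();
      final HttpRequest request = HttpRequest.newBuilder(START_PAGE).GET().build();

      final HttpResponse<Void> response =
            client.send(request, HttpResponse.BodyHandlers.discarding());

      // Headers typically enabled through web container or framework configuration
      assertEquals("nosniff",
            response.headers().firstValue("X-Content-Type-Options").orElse("missing"));
      assertTrue("Expected a Strict-Transport-Security header",
            response.headers().firstValue("Strict-Transport-Security").isPresent());
   }
}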

When discussing test automation for functionality controlled by configuration, you sometimes hear arguments against the practice. One common argument is that testing your configuration is similar to testing a setter method that sets a simple value and therefore adds little value. Although this can be true for some types of configuration, it isn't true for the kind we're discussing here.

The type of configuration you should test is configuration that alters the behavior of your application. In the same way you write tests to verify the behavior you implement, it’s equally important to write tests for the behavior you configure. Once you realize that you aren’t testing the configuration itself but rather the resulting behavior, it becomes clearer why this is so important.

8.6.3 Knowing your defaults and verifying them

In addition to the behaviors you explicitly configure, it’s also important to verify the implicit behaviors you get when using a library or framework. An implicit behavior is one you get without adding any configuration. This is sometimes also referred to as a default behavior. Because there’s no configuration, the tricky part here is even knowing you have an important feature to verify. In order to gain that knowledge, you need to know the defaults of the tool you use.

As an example, most modern web frameworks make it easy to write HTTP APIs or RESTful web services. Numerous frameworks and libraries allow the developer to declaratively write code to define the HTTP endpoints. These types of frameworks can boost your productivity because they let you focus on your business logic instead of generic plumbing and boilerplate code. What enables the code you write to be clean and concise is usually the application of sensible default behavior by the framework. As long as you stick to the defaults, there’s little code to write. This is all good when writing code tutorials or small proof-of-concept applications, but for real business-critical projects, you must make sure you understand exactly what the defaults are. In many cases, the defaults will help make your application more secure, but in some cases, they might sacrifice some level of security for an increased ease of use. If you aren’t aware of those trade-offs, you might expose security vulnerabilities without knowing it.

Say you’re writing an HTTP service. It could be in the form of a RESTful API or some other API approach based on HTTP. In order to reduce the number of attack vectors, it’s a good security practice to only enable the HTTP methods required by the API. If an API endpoint is meant to serve data to clients accessing it using an HTTP GET, you should make sure it doesn’t return a normal response when accessed with any other HTTP method. Instead, it can respond with a status code 405 Method Not Allowed or 501 Not Implemented to let the client know the requested HTTP method is not supported. The more HTTP methods the endpoint responds to, the more security vulnerabilities it opens up. For example, TRACE is an HTTP method known to be used to perform cross-site tracing (XST) attacks, so you don’t want to enable TRACE unless you have to.23 

Listing 8.13 shows an example of how to write a test that verifies only specific HTTP methods are enabled for an endpoint. Note that the example is simplified and that a real implementation depends on how the API under test is designed and what the definition of an enabled endpoint is. Other aspects to consider are, for example, if custom HTTP methods are allowed and if authentication is enabled.

Listing 8.13 Testing enabled HTTP methods

import org.junit.Test;
 
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
 
import static java.util.Arrays.asList;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toSet;
import static org.apache.commons.lang3.Validate.notNull;
import static org.junit.Assert.assertEquals;
 
public class OnlyExpectedMethodsAreEnabledTest {
 
   enum HTTPMethod {
      GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE
   }
 
   URI uri;
   List<Result> results;
 
   @Test
   public void verify_only_expected_HTTP_methods_are_enabled() {
      givenEndpoint("http://example.com/endpoint");    // The endpoint under test
 
      whenTestingMethods(HTTPMethod.values());    // Try every standard HTTP method
 
      thenTheOnlyMethodsEnabledAre(HTTPMethod.GET,     // Assert that nothing beyond
                                   HTTPMethod.PUT,     // GET, PUT, and HEAD responds
                                   HTTPMethod.HEAD);   // as an enabled method
   }
 
   void givenEndpoint(final String uri) {
      this.uri = URI.create(uri);
   }
 
   void whenTestingMethods(final HTTPMethod... methods) {
      results = Arrays.stream(methods)
                      .distinct()
                      .map(method -> getStatus(method, uri))
                      .collect(toList());
   }
 
   void thenTheOnlyMethodsEnabledAre(final HTTPMethod... methods) {
      final Set<HTTPMethod> enabled = enabledHttpMethods();
      assertEquals(new HashSet<>(asList(methods)), enabled);
   }
 
   Set<HTTPMethod> enabledHttpMethods() {
      return results.stream()
                    .filter(r -> isEnabled(r.status))
                    .map(r -> r.method)
                    .collect(toSet());
   }
 
   static class Result {
 
      final int status;
      final HTTPMethod method;
 
      Result(final int status, final HTTPMethod method) {
         this.status = status;
         this.method = notNull(method);
      }
 
   }
 
   boolean isEnabled(final int statusCode) {
      // One possible definition: any status other than
      // 405 Method Not Allowed or 501 Not Implemented
      // counts as "enabled"
      return statusCode != 405 && statusCode != 501;
   }
 
   Result getStatus(final HTTPMethod method, final URI uri) {
      // Call the URI with the given HTTP method and return the status;
      // the implementation depends on the HTTP client you use
      throw new UnsupportedOperationException("Call the endpoint with your HTTP client");
   }
 
   // ...
 
}

Remember that the mindset to have here isn't to explicitly forbid HTTP methods, but to enable only those needed for the functionality you're implementing. Also, even if the default settings are exactly what you need right now, you should add tests that verify that behavior: a later release of the framework might change the defaults, and if you have tests in place, you'll catch such changes immediately.

In this chapter, you saw several ways to use your delivery pipeline to automatically verify security concerns. Some approaches we’ve discussed have involved a more explicit focus on security than other concepts in this book. If you’re already familiar with some of these approaches, we hope you’ve learned how to view them from a slightly different perspective. In the next chapter, you’ll learn how to securely handle exceptions and how you can use different design ideas to avoid many of the issues with traditional error handling.

Summary

  • By dividing tests into normal testing, boundary testing, invalid input testing, and extreme input testing, you can include security in your unit test suites.
  • Regular expressions can be susceptible to inefficient backtracking; therefore, you should check the length of input before sending it to the regular expression engine.
  • Feature toggles can cause security vulnerabilities, but you can mitigate those vulnerabilities by verifying the toggle mechanisms using automated tests.
  • A good rule of thumb is to create a test for every toggle you add, and you should test all possible combinations of them.
  • You should watch out for the combinatory complexity that large numbers of toggles can lead to. The best way to avoid this is by keeping the number of toggles as small as possible.
  • The toggle mechanism itself can be subject to auditing and record keeping.
  • Incorporating automated security tests into your build pipeline can give you the ability to run a mini penetration test as often as you like.
  • Availability is an important security aspect that needs to be considered in every system.
  • Simulating DoS attacks helps in understanding weaknesses in the overall design.
  • A domain DoS attack is extremely difficult to protect against because it’s only the intent that distinguishes it from regular usage.
  • Many security problems are caused by misconfiguration, and the cause of faulty configuration can be unintentional changes, intentional changes, or misunderstood configuration.
  • Configuration hot spots are good indicators for finding areas in your configuration where testing is most critical.
  • It’s important to know the default behavior of the tools you use and assert that behavior with tests.