12
Guidance in legacy code

This chapter covers

  • How to deal with ambiguous parameters
  • Security issues caused by logging
  • How DRY is about ideas, rather than text
  • Absence of negative tests as a warning sign
  • Introducing domain primitives in legacy code

Once you’ve grokked the fundamentals of the secure by design approach, you can start applying the concepts when writing code. This is usually easier when you’re doing greenfield development, but you’ll most likely spend a lot of time working on legacy codebases—codebases that weren’t created with a secure by design mindset. When working on such codebases, it can be difficult to know how to apply the concepts you’ve learned in this book and where to start.

In this chapter, you’ll learn how to identify some problems and pitfalls that we’ve found are common in legacy codebases. Some of them are easy to spot, and others are more subtle. We also provide you with tips and advice on how to rectify them. Some issues require more effort than others, so you need to choose your approach based on your situation. Hopefully this chapter will provide you with enough guidance to make that choice easier.

You’ll start by learning why ambiguous parameters in methods and constructors are a common source of security flaws. Then you’ll learn what to look out for when logging and why defensive code constructs are problematic. We then show you how seemingly well-designed code can have subtle issues that cause security issues. At the end of this chapter, you’ll learn about some mistakes that are commonly made when introducing domain primitives in legacy code and what to watch out for when you start implementing domain primitives.

12.1 Determining where to apply domain primitives in legacy code

In chapter 5, you learned about the concept of domain primitives and how they exist if and only if they are valid. This characteristic enabled you to apply domain primitives in several situations to either enable stronger security or completely remove the possibility of an attack—but there’s a catch to all of this. Most agree that using domain primitives makes sense and is an intuitive design pattern, but to satisfy the invariant of validity, you must also identify the context in which the domain primitive applies. Otherwise, you won’t be able to decide which domain rules to use, making it impossible to reject invalid data—this is a common pitfall when introducing domain primitives in legacy code.

To illustrate, picture the worst possible scenario: a homogeneous mass of code where strings, integers, and other generic data types are passed around. The only way to know what the developer’s intentions are is to look at variable names, method signatures, and class hierarchies, but sometimes not even that makes it clear. All in all, the codebase is a mess that would benefit from using domain primitives—but where do you start?

Because the concept of domain primitives is easy to grasp, many start by wrapping generic types such as integers and strings in explicit data types, but doing so often ends up being a mistake. As we’ll elaborate further in section 12.8, a domain primitive must encompass a conceptual whole, and wrapping generic data types only results in an explicit type system, not in the design that you want. Instead, what you should do is to start by identifying a bounded context and the semantic boundary, because this is where you’ll start creating your domain primitives.

In chapter 3, we talked about bounded contexts and how a domain model captures the semantics of the ubiquitous language within a context. As part of this, you also learned that the semantic boundary can be implicit and hard to see in code, but it can be identified by testing the semantics of the domain model. As soon as the semantics of a term or concept change, the model breaks, which means that a semantic boundary is found.

A good starting point when identifying a bounded context is therefore to group concepts that belong together. A common pattern is to use a package or a module for your classes and methods, but how you organize them doesn’t matter. What’s important is that all concepts that share the same semantic model should be grouped together. For example, if there’s a method with the signature

public void cancelReservation(final String reservationId)

then every class or method that uses a reservationId or operates on a Reservation object must share the same understanding of what they mean, or it could result in misunderstandings and bugs. But a discrepancy in semantics isn’t only a source of errors, it’s also an indicator that the concept should be moved out of that grouping, which sometimes is a painful experience because it requires heavy redesign.

Grouping concepts that belong together is only the first step toward creating a bounded context. The next step is to introduce domain primitives to express the semantic boundary. But this is also when things could go seriously wrong unless you’re careful, so let’s proceed by investigating how you deal with ambiguous parameter lists.

12.2 Ambiguous parameter lists

One easily recognizable design flaw in code is ambiguous parameter lists in methods and constructors. An example of such a method is shown in the following listing, where the shipItems method takes two integers and two addresses as input parameters.

Listing 12.1 Method with ambiguous parameter list

public void shipItems(int itemId, int quantity,    ①  
                     Address billing, Address to) {    ②  
  // ...
}
 
public class Address {
  private final String street;
  private final int number;
  private final String zipCode;
 
  public Address(final String street, final int number,
                 final String zipCode) {
    this.street = street;
    this.number = number;
    this.zipCode = zipCode;
  }
 
   // ...
}

In chapter 5, you learned that ambiguous parameters are common sources of security issues for a couple of reasons. One problem is that generic types lack the level of input validation that you should strive for to create robust code. Another is that it’s common to accidentally swap the parameters with each other. We’ve kept the shipItems method fairly short to make it concise as an example, but you might encounter parameter lists much longer than this. If you have a method with 20-plus parameters, all of type String, it’s hard not to mess up the order. When you see this type of code construct, you should see it as an opportunity to change your code to be more secure.

As you can see in listing 12.2, it’s easy to get the parameters mixed up when calling the shipItems method. Both itemId and quantity are of the primitive type int, and accidentally swapping them when calling the method is a mistake that can go unnoticed. The billing address and the shipping address are both complex types that look like domain objects. But they’re of the same type, and there’s nothing preventing you from making the mistake of mixing them up.

Listing 12.2 Accidentally swapping parameters

int idOfItemToBuy = 78549;
int quantity = 67;
Address billingAddress = new Address("Office St", 42, "94 102");
Address shippingAddress = new Address("Factory Rd", 2, "94 129");
 
service.shipItems(quantity, idOfItemToBuy,    ①  
                  shippingAddress, billingAddress);  ②  

Accidentally swapping any of these parameters can have severe side effects. For example, sending 78,549 packages of an item with ID 67 is different from sending 67 packages of an item with ID 78549. It’s also highly unlikely that a company’s finance department would appreciate having a handful of pallets of industrial grease dumped at its entrance, while the factory gets an invoice in its mailbox.

The solution for ambiguous parameter lists is to replace all of the parameters, or as many as you can, with domain primitives (see chapter 5) or secure entities (see chapters 5, 6, and 7). By doing so, you’ll not only make the parameters unambiguous but also make the code more secure, as you learned in chapters 5–7. The way you go about doing this refactoring depends on the codebase at hand and how much time is available.

In this section, you’ll learn three different approaches and when they’re a good fit and a less good fit. The approaches we’ll discuss, together with a summary of their pros and cons, are shown in table 12.1. It’s worth noting that these approaches are also applicable when trying to introduce domain primitives in general, not only when replacing ambiguous parameters; for example, after you’ve identified a context boundary using the tips in section 12.1.

Table 12.1 Approaches to dealing with ambiguous parameter lists
ApproachProsCons
Direct approach —Replace all ambiguous parameters at once
  • Solves everything right away
  • Works well with smaller codebases and few developers
  • Can be performed quickly if codebase is of reasonable size
  • Too much refactoring in large codebases
  • Not well suited if data quality is a big issue
  • Best performed by a single developer
Discovery approach—Find and fix the problems before changing the API
  • Works well if data quality is poor
  • Works with larger codebases and multiple teams
  • Takes a long time to complete
  • May not be able to keep up with a fast-changing codebase
New API approach—Create a new API and then gradually refactor away the old API
  • Allows for incremental refactoring
  • Works well with both small and large codebases
  • Works with multiple developers and teams
  • If data quality is an issue, combine it with the discovery approach

12.2.1 The direct approach

The direct approach is to first introduce domain primitives and secure entities to replace all ambiguous parameters. After that, you perform the change and alter the method signature, typically one parameter at a time.

Listing 12.3 shows how the shipItems method looks with the parameters replaced with domain primitives. As soon as you do the swap, you’ll most likely run into everything from compilation errors to test failures. You tackle this by solving one issue at a time. Once you have fixed all the problems and can build the application, you can deploy it to a test environment and let it process data. Because you’ve introduced domain primitives with more strict validation, bad data that used to flow through your system is now going to be rejected. Therefore, you need to monitor it for any errors that can pop up.

Listing 12.3 The direct approach: replace all at once

public void shipItems(final ItemId itemId,
                      final Quantity quantity,
                      final BillingAddress billing,
                      final ShippingAddress to) {    ①  
  // ...
}

An advantage of the direct approach is that it solves everything right away. When you’re finished with the refactoring, there’s no more work to be done. This approach is typically well suited for smaller codebases owned by a single team, where it tends to be the fastest approach. It’s also worth noting that you don’t have to refactor the code behind the method API immediately; instead, you can choose to gradually refactor it.

If the codebase is large, then this might not be a viable solution, because it can leave the code in a broken state for a long period of time before you manage to fix all the errors—it becomes a big-bang refactoring exercise. Even if you manage to fix all compilation errors and failing tests, it can take a while before you’ve ironed out possible runtime errors caused by poor data quality. Also, if the changes affect more than one team, it can be difficult or impossible to sync the work, and it needs to happen simultaneously.

You should also be aware that introducing domain types with strict invariants that reject bad data can lead to heated discussions about how domain primitives break the code instead of improving it. In those situations, it’s important to remember that domain primitives and secure entities didn’t create the problem. The bad data was already there, and what you should do is address the data quality at the source.

12.2.2 The discovery approach

Another approach is one we like to call the discovery approach. As the name implies, the tactic of this approach is to introduce domain primitives and secure entities behind the public API without changing the API. After you’ve done that, you’ll discover and fix possible problems before swapping and replacing the ambiguous parameters. This approach is especially good if data quality is a big issue.

An example of how to apply the discovery approach on the shipItems method is shown in listing 12.4. You keep the method signature as is, and for each ambiguous parameter, you try to create the corresponding domain primitive. For the integer value itemId, you try to create a domain primitive ItemId; for the generically typed billing address, you try to create a domain primitive called BillingAddress; and so on. For each domain primitive you instantiate, you catch and log any exceptions that can occur. When you’re done, you can start running your test suites and even deploy the code to production. Monitor the logs for errors and address the problems as they occur. Analyze each exception to find the root cause and fix the problem.

Listing 12.4 The discovery approach: fix the problems first

public void shipItems(final int itemId, final int quantity,
                      final Address billing, final Address to) {
  tryCreateItemId(itemId);    ①  
  tryCreateQuantity(quantity);    ①  
  tryCreateBillingAddress(billing);    ①  
  tryCreateShippingAddress(to);    ①  
 
   // ...
}
 
private void tryCreateItemId(final int itemId) {
  try {
      new ItemId(itemId);
  } catch (final Exception e) {
      logError("Error while creating ItemId", e);    ②  
  }
}
 
private void tryCreateQuantity(final int quantity) {
  try {
      new Quantity(quantity);
  } catch (final Exception e) {
      logError("Error while creating Quantity", e);    ②  
  }
}
 
// other tryCreate methods...
 
import static org.apache.commons.lang3.Validate.isTrue;
 
public class ItemId {    ③  
   private final int value;
 
   public ItemId(final int value) {
      isTrue(value > 0, "An item id must be greater than 0");
      this.value = value;
   }
 
   public int value() {
      return value;
   }
 
   // domain operations, equals(), hashcode(), toString(), and so on
}
 
// other domain primitives...

Remember that the purpose of the tryCreate methods isn’t to prevent bad data from entering, but to discover it. Common sources for errors include bad data at runtime and invalid stubs or mocks used in tests. Once you feel confident that you’ve fixed the majority of issues, you can start replacing the parameters the same way as with the direct approach.

The discovery approach works well if data quality is a major problem and you want to avoid disruption. If the domain rules you’re introducing with domain primitives and secure entities are so strict (albeit correct) that a lot of the data you’re processing fails to meet those rules, then a defensive approach like this is a good choice. It also works well on both small and large codebases.

The downside of the discovery approach is it’ll take longer to perform the transition to the new API. You first need to introduce the creation of secure domain types, then monitor logs for a period of time, and then address data quality problems. It isn’t until all these steps are done that you can make the API change. Another challenge is when active development is occurring while you’re performing the transition. New code can be written using the old API, adding to the technical debt on one side while you’re trying to clean up the data on the other side. If a fast-moving codebase is a challenge, then you might want to combine this approach with the approach you’ll learn about next—the new API approach.

12.2.3 The new API approach

The third and last approach we’ll discuss is called the new API approach. The core idea in this approach is to create a new API alongside the existing one, as shown in listing 12.5. This new API only uses domain primitives and secure entities but still provides the same functionality. You ensure that the functionality is the same, either by delegating to the old API or by first extracting the logic from the old API and then reusing the extracted logic in the new API. Once the new API is in place, you can gradually refactor calling code to use the new API instead. Eventually all references to the old API will be gone, and you can delete it.

Listing 12.5 The new API approach: incremental refactoring

import static org.apache.commons.lang3.Validate.notNull;
 
public void shipItems(final ItemId itemId,    ①  
                      final Quantity quantity,
                      final BillingAddress billing,
                      final ShippingAddress to) {
   notNull(itemId);
   notNull(quantity);
   notNull(billing);
   notNull(to);
 
   shipItems(itemId.value(), quantity.value(),    ②  
             billing.address(), to.address());
}
 
@Deprecated
public void shipItems(int itemId, int quantity,
                     Address billing, Address to) {    ③  
   // ...
}

This approach has several advantages. One is that it lets you do the transition to the new hardened API in small steps. Because you refactor one part of the codebase at a time, you get smaller commits and fewer data issues to deal with. Another is that because of the gradual refactoring, it also works well when dealing with large codebases or multiple teams. But if data quality is an issue, you’ll still need to use the ideas from the discovery approach you learned about earlier.

You’ve now learned that ambiguous parameter lists are a common source of security problems. You’ve also looked at three different approaches for dealing with the problem once you’ve found it. When replacing generic parameters with domain primitives and secure entities, it’s common that you’ll expose issues with poor data quality. If this is the case, you might need to fix those problems first before you can completely introduce the new strict types in your API.

12.3 Logging unchecked strings

In chapter 10, we talked about the importance of using a centralized logging service instead of writing to a file on disk. Changing the logging strategy in legacy code sounds like a big operation at first, but often it’s enough to replace the backend of the logging framework with one that sends data across the network instead of writing to a file on disk. This could, in fact, be done in a seamless fashion, but, unfortunately, that only takes you halfway from a secure logging perspective. What you still need to do is ensure that unchecked strings never get logged—which is addressed using domain primitives—because this can open up potential security vulnerabilities, such as data leakage and injection attacks.

When bringing this up, we often hear that most logging frameworks only accept strings, and avoiding strings isn’t an option. We certainly agree, but there’s a big difference between not logging strings and logging unchecked strings. Unchecked strings are, in fact, the root cause of many attack vectors and should be avoided at all cost. Unfortunately, not many realize this, and the mistake is often repeated over and over again regardless of experience level. Let’s dive into an example and learn what to look for in a codebase so you don’t make the same mistake.

12.3.1 Identifying logging of unchecked strings

The problem of logging unchecked strings isn’t restricted to only legacy code. It can appear in any codebase, regardless of quality. The problem is often hidden in plain sight, and what you need to look for is the logging of unverified String objects (for example, logging of user input or data from another system). In the following listing, you see an example where an unchecked string is logged. The fetch method is part of a table reservation service in a restaurant system that lets you fetch table reservations by ID.

Listing 12.6 Logging of unchecked input when fetching a table reservation

public TableReservation fetch(final String reservationId) {
   logger.info("Fetching table reservation: " +
                                  reservationId);    ①  
   // fetch table reservation logic
   return tableReservation;
}

It’s not important why the reservation ID is logged. What’s important is that the input argument is logged without being checked. Even though a reservation ID has strict domain rules (for example, it must start with a hash sign followed by seven digits), the developer decided to represent it as a String. This means that even though you expect it to be a valid reservation ID, an attacker could inject anything that matches the rules of a String object, which literally could be anything (for example, 100 million characters or a script). Another serious issue is the possibility of a second-order injection attack, as we discussed in chapter 9 when talking about handling bad data. If there’s a weakness in the log parsing tool, an attacker could exploit it by injecting a malicious string that pretends to be a reservation ID and have it written to the logs—not good!

The solution is to use a domain primitive instead of a String object, because it lets you control the input instead of the attacker having control. But, as you learned in section 12.2, how you introduce domain primitives in an API could cause a whole set of new problems, and choosing a strategy that doesn’t cause too much fuss is recommended.

12.3.2 Identifying implicit data leakage

Another issue related to logging unchecked strings is leakage of sensitive data in logs due to an evolving domain model. This problem is somewhat hard to recognize unless you know what to look for, so let’s turn back to our previous example, but with a slight modification, as seen in the following listing. What’s been added is the logic for fetching the corresponding table reservation from a repository and a log statement, where the internal data of the matching reservation object is serialized as a String before it’s written to the logs.

Listing 12.7 Serializing the table reservation object as a String

public TableReservation fetch(final String reservationId) {
   logger.info("Fetching table reservation: " + reservationId);
   final TableReservation tableReservation =
            repository.reservation(reservationId);    ①  
   logger.info("Received " + tableReservation);    ②  
   return tableReservation;
}

Logging a table reservation can seem harmless because it only contains data such as the table number, reservation ID, number of guests, and time slot. But what if the domain model evolves? For example, imagine that someone decides to include member information in a reservation to allow for better service. Then the TableReservation object suddenly contains sensitive data (for example, name, contact information, and membership number) that implicitly leaks when logging a reservation. But keeping track of how data is consumed in a large system is close to impossible, so how do you prevent this from happening?

As it turns out, preventing implicit data leakage is hard, but detecting it isn’t that difficult. In chapter 5, we discussed how to deal with sensitive values and the read-once object pattern. This pattern lets you decide how many times a value is allowed to be read, and if this limit is exceeded, a contract violation occurs. That way, you only need to identify sensitive data in your system and model it as such to prevent it from being read more often than you allow, as would be the case when doing an unknown log operation.

But there’s another subtle problem in the statement logger.info("Received " + tableReservation). Concatenating the string "Received " with the tableReservation object using the + operator creates an implicit call to the toString method of tableReservation, which allows it to be represented as a String object. This operation is more or less harmless if you haven’t overridden the toString method in java.lang.Object, because it only results in a string containing the class name and an unsigned hexadecimal representation of the hash code. But toString is frequently overridden because it’s an easy way to debug the contents of an object.

To avoid updating toString every time a field is added or modified, many choose to use reflection to dynamically extract all the data. This implies that sensitive data can implicitly leak through the toString implementation. You should, therefore, never rely on toString when logging. What you should do is use explicit accessor methods for data that you want to log, because then new fields never end up in logs by accident.

Logging unchecked strings is certainly problematic, but avoiding it isn’t that hard if you put some thought into it. Another issue that often appears in legacy code is defensive code constructs—so let’s proceed and see how you can deal with this in an efficient way.

12.4 Defensive code constructs

To make your system stable and robust is indeed an honorable thing, and chapter 9 was devoted to this quest. A well-designed system should never crash with a NullPointerException. But we often see code riddled with != null checks, even deep in the code where it wouldn’t be sensible that a null value should show up at all. Repeated checks of null, of formats, or of boundary conditions isn’t good design in itself. Rather, it’s an expression that one piece of the code dare not trust the design of the rest of the code, so it feels the need to recheck just in case. Instead of having clear contracts of what the parts can expect from each other, the code becomes riddled with defensive code constructs of unclear purpose.

The problem here is twofold. First, the code becomes convoluted, hard-to-read, and bloated because of unnecessary repetition—also making it hard to refactor. This is bad in itself, but it leads to a second problem. When such code is maintained over time, it risks becoming incoherent. And code with incoherent checks is a great opportunity for attackers to exploit in creative ways. One way to turn this around is to clarify the contracts between different parts of the code (as described in chapter 4) and introduce appropriate domain primitives (as described in chapter 5).

12.4.1 Code that doesn’t trust itself

Let’s start by looking more closely at some code that doesn’t trust itself. The code in listing 12.8 is taken from an online bookstore, where the ShoppingService adds books to an Order using the public method addToOrder. The public method has several private helper methods, such as putInCartAsNew and isAlreadyInCart. Note the prevailing checks of non-null and of formats, even deep inside the private methods. Take a particular look at the numerous checks inside isAlreadyInCart at the bottom of the Order class. Also note the irony of ShoppingService, which checks the format of isbn before sending it as an argument in order.addBook(isbn, qty). Although the format check is done, that check isn’t trusted by the code inside Order. Compare this with the use of domain primitives, where you can rely on the fact that the value wrapped in an ISBN domain primitive adheres to the format rules for ISBNs.

Listing 12.8 Code with repeated non-null and format checks

class ShoppingService {
 
   public void addToOrder(String orderId, String isbn, int qty) {
      Order order = orderservice.find(orderId);
      if (isbn.matches("[0-9]{9}[0-9X]")) {    ①  
         order.addBook(isbn, qty);
      }
   }
}
 
class Order {
   private Set<OrderLine> items;
 
   public void addBook(String isbn, int qty) {
      if (isbn != null && !isbn.isEmpty()    ②  
            && !isAlreadyInCart(isbn)) {
         putInCartAsNew(isbn, qty);
      } else {
         addToLine(isbn, qty);
      }
   }
 
   private void putInCartAsNew(String isbn, int quantity) {
      if (isbn != null && isbn.matches("[0-9]{9}[0-9X]")) {
         items.add(new OrderLine(isbn, quantity));
      }
   }
 
   private void addToLine(String isbn, int quantity) { ... }
 
   private boolean isAlreadyInCart(String isbn) {
      boolean result = false;
      for (OrderLine item : items) {
         String itemISBN = item.isbn();
         if (itemISBN != null
               && isbn != null    ③  
               && isbn.matches("[0-9]{9}[0-9X]")) {    ④  
            if (itemISBN.equals(isbn)) {
               result = true;
            }
         }
      }
      return result;
   }
   // other methods
}

At first glance, it might seem wise to ensure that isbn is not null before matching it against a regexp (regular expression). You don’t want a NullPointerException to be thrown arbitrarily, do you? The problem is that these kinds of checks don’t make the code much safer or more robust—they make it harder to understand.

The code seems to distrust itself; it’s not sure what rules are in play. Is it OK to send in an isbn that is null? If so, how should the code handle it? If not, has anyone stopped it before? It does make sense that the public method addBooks checks the input thoroughly, and it certainly ensures no null value or empty string enters the code. But it doesn’t check the format:

isbn.matches("[0-9]{9}[0-9X]")

That particular check is instead done in other places. The following two methods obviously don’t trust the null checks in addBook and repeat them, even deep in loops:

private void putInCartAsNew(String isbn, int quantity)
 
private boolean isAlreadyInCart(String isbn)

On the other hand, the same methods don’t repeat the check !isbn.isEmpty(). It’s fundamentally unclear what promises each part of the code can rely on from other parts of the code.

This is a condensed example, but it’s not uncommon to find if (value != null) checks deep down in the code, sometimes in the middle of methods with several hundred lines. In the defense of the programmers who wrote that extra null check, their ambition was probably to ensure the program wouldn’t crash. And, in many cases, those extra checks are put in place because the program has crashed with a NullPointerException at that specific point. But here (in listing 12.8), instead of clarifying the design to ensure that null wouldn’t show up, a simple if was patched as a band-aid over the spot.

Another problem with this code is the coping strategy. In this example, if the code finds an item.isbn() that doesn’t follow the required format, it reacts by doing nothing! The code doesn’t stop executing, throw an exception, or even report the fault through logging. It ignores the situation. Of course, in the long run, this leads to nothing ever getting corrected, and the data continues to be riddled with bad stuff. To make things worse, these kinds of sprinkled-out checks don’t just bloat the code, they also get in the way of doing refactorings that would let the code state its functionality in a more concise way.

12.4.2 Contracts and domain primitives to the rescue

Instead of plastering null checks all over the code to protect it from null data, you should ensure that data is not null. In the same way, to avoid format checks being repeated, you want to encapsulate that such a check has already been done. The tools for this are contracts (as described in section 4.2) and domain primitives (as described in chapter 5).

Going back to our example from listing 12.8, there are two domain primitive types, ISBN and Quantity, that become part of the interface for Order. We’re not going to describe the transformation in full because that was done in chapter 5, but will roughly sketch out the differences. When ShoppingService calls Order, it needs first to create the corresponding domain primitive objects to pass to the addToOrder method, as shown in the following listing. The code calling the Order object takes on the burden of validation by creating domain primitive objects.

Listing 12.9 Calling the Order object

    public void addToOrder(String orderId, String isbn, int qty) {
        Order order = orderservice.find(orderId);
        order.addBook(new ISBN(isbn), new Quantity(qty));
    }

When the Order.addBook receives ISBN and Quantity, the values are well wrapped, and you can rely on them having been properly checked. There’s no need to do the same check again. The Order ensures it isn’t sent null values, as seen in the next listing.

Listing 12.10 Order only upholds the contract notNull for the arguments

    public void addBook(ISBN isbn, Quantity qty) {
        notNull(isbn);
        notNull(qty);
        if (!isAlreadyInCart(isbn)) {
            putInCartAsNew(isbn, qty);
        } else{
            addToLine(isbn, qty);
        }
    }

With validated domain primitive objects coming in as arguments, there’s no longer any need for Order to revalidate the format of the ISBN, for example. Also, by checking for notNull in the public method addBook, you avoid the need to repeat null checks in the private methods, such as putInCartAsNew and isAlreadyInCart. Take a look at the latter after cleanup in the following code listing. You’ll see that the updated isAlreadyInCart method trusts that format checks are ensured by the design.

Listing 12.11 Updated isAlreadyInCart

    private boolean isAlreadyInCart(ISBN isbn) {
        boolean result = false;
        for (OrderLine item : items) {
            if (item.isbn().equals(isbn)) {
                result = true;
            }
        }
        return result;
    }

With all the confusing rechecks gone, it’s much more obvious what the code does. You also can see ways to make it simpler. The if statement inside the for-each can be replaced with

result = result && item.isbn().equals(isbn)

Now it’s clear that you sieve through the list to see if any of the items match the ISBN searched for, as seen in the following listing. If any existing item matches the searched-for ISBN, then it’s already in the cart. Obviously, this can be done more smoothly with the Stream API.

Listing 12.12 Searching for the ISBN

    private boolean isAlreadyInCart(final ISBN isbn) {
        return items.stream().anyMatch(item -> item.isbn().equals(isbn));
    }

The design flaw to watch out for is deep checks of null, format, or ranges. What it hides is that the code isn’t designed in such a way that it can trust itself. Also, the messy code can hamper clarifying refactorings. Introducing appropriate contracts and domain primitives is a way to get to a better state.

12.4.3 Overlenient use of Optional

There’s a modern variant of this design flaw when working with optional data types, such as Optional<T> in Java. Instead of having data lying around as null values, those values are transformed to Optional.EMPTY. If there sometimes are missing addresses, the address field of an order might be written as Optional<Address>, although an order must have an address in the long run. When it’s time to use the data, the flaw shows up by taking a detour using Optional.map.

Compare how anAddr and perhapsAddr are treated in listing 12.13. The code for perhapsAddr becomes much more convoluted. Instead of asking for the zip code using the method zip, it takes a detour using the method map, which is a higher-order function that accepts the function Address::zip as an argument and applies that function to each Optional<Address> in the stream, which is one in this case. This is a cumbersome detour to get one zip code out of one address just to take into account that the address might be missing.

Listing 12.13 Taking account for Optional.EMPTY gives convoluted code

Address anAddr;
Optional<Address> perhapsAddr;
 
Zip anZip = anAddr.zip();
Optional<Zip> perhapsZip = perhapsAddr.map(Address::zip);

Watch out for Optional<Address> in cases where the address is mandatory and the use of Optional carries no domain insight but is a cover-up for the code not trusting the rest of the design. Whether they’re old-school repetitions of null and format checks or new-school overuse of Optional, defensive code constructs are a clear signal that something isn’t right with the code. These parts of the code are a great place to start your quality and security work by applying contracts and domain primitives.

We’ve covered three problems that often show up in legacy code: ambiguous parameter lists, logging unchecked strings, and defensive code constructs. We’ll now turn to three other problems that are somewhat subtler. These are situations where good design principles have been applied, but in a way that is incomplete or slightly beside the point. At first sight, these applications might even look good, but upon closer inspection, you realize that there are problems nevertheless. Let’s start with how the design principle Don’t Repeat Yourself (DRY) can be misapplied.

12.5 DRY misapplied—not focusing on ideas, but on text

The design principle Don’t Repeat Yourself (DRY) was coined by Andy Hunt and Dave Thomas in their seminal book, The Pragmatic Programmer.1  It states that when we capture our knowledge into code, we shouldn’t repeat the same codification in several places.

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

Unfortunately, the principle of DRY has often been misunderstood as applying not to knowledge and ideas, but to the code as text. Programmers search their codebases for repeated text and try to get rid of the duplication. This misconception is enhanced by the helpful tools of modern IDEs that can automatically identify duplicated code and even suggest automatic transforms that remove the duplication. Those tools are an excellent help when applying DRY correctly, but at the same time a speedy road to dependency hell if misapplied.

It’s definitely true that duplicated text often is due to a duplication of ideas. And certainly, you can use “look for duplicated text” as a quick test to find DRY violations. (Note that we are using the word test in the sense of a medical test that checks for indicators of a disease, not a system development unit test or similar.) But “look for duplicated text” isn’t a fault-free test. There are both false positives and false negatives: false positives look like repetition but aren’t; false negatives don’t look like repetition but are. False positives might lead you to link together parts of the code that aren’t related, creating unnecessary dependencies. A false negative, on the other hand, risks inconsistent functionality evolving, raising the risk of security vulnerabilities.

12.5.1 A false positive that shouldn’t be DRY’d away

In listing 12.14, you see part of the constructor for a class representing a delivery of books that’s sent from an online bookstore to a customer, so naturally it contains both the order number and the zip code. A zip code is a five-digit string, and that happens to be the format for the order number as well.

Listing 12.14 No repetition of knowledge, although same regexp used twice

    BookDelivery(String ordernumber, String recipient,
                String streetaddress, String zipcode) {
 
        this.ordernumber = notNull(ordernumber);
        matchesPattern(ordernumber, "[0-9]{5}");    ①  
        this.zipcode = notNull(zipcode);
        matchesPattern(zipcode, "[0-9]{5}");    ②  
        ...
    }

It might look as if you’re repeating yourselves, because the same regexp occurs twice. But you aren’t repeating yourselves in the sense of DRY, because those two regexps encode different pieces of knowledge: the knowledge about what order numbers look like and the knowledge about what zip codes look like. The occurrence of the same regexp is coincidental.

12.5.2 The problem of collecting repeated pieces of code

The problem starts when someone thinks, “There are two regexps that look the same; this is repetition and a violation of DRY,” and then sets out to fix it by refactoring those two unrelated regexps to some common technical construct, such as a utility method. Those methods often end up as static methods in classes named Util and are then collected in packages named util, common, or misc. Needless to say, virtually every piece of code in the rest of the codebase will have a dependency on that common or util library. That might be acceptable briefly as a temporary state during cleanup, but it’s certainly not desirable as a long-term solution.

The problem escalates when you then start separating the codebase into different parts, perhaps into microservices. Microservices should be standalone. It’s reasonable to have a small amount of dependencies to some common concepts, but here there’s a massive dependency to a large util package from all the other parts. Now, whenever anything changes in any part of the humongous util package, everything needs to be recompiled and redeployed. You don’t have a system of independent microservices, you have a tightly coupled distributed monolith dependency hell.

12.5.3 The good DRY

Instead of packing things because the pieces of code look the same, you should focus on what knowledge is captured. In our example, you’ll note that we have two separate but primitive domain concepts: the concept of zip codes and the concept of order numbers. Those make two fine domain primitives: ZipCode and OrderNumber, respectively. Each of their constructors will have a regexp check that, code-wise, will look the same, but that’s OK. If they’re unrelated, then leave them apart. The obvious thought experiment is, “If we change the format of order numbers, will the format of zip codes then change accordingly?”

12.5.4 A false negative

A false negative shows up the other way around: when the code doesn’t look like a repetition, but really is. Look at listing 12.15 and note how there are two ways to check that a string is a zip code. One way uses a regexp to check the format; the other checks that the string has length five and parses to an integer.

Listing 12.15 Expressing the same idea in two different ways

    if((input.length() == 5)    ①  
        && Integer.parseInt(input) > 0) {
         ...
     }
 
    if(zipcodeSender.matches("[0-9]{5}")) {    ②  
        ...
    }

These two code snippets need not be in the same method, nor even in the same class. One of them might be a helper method for CustomerSettingsService and the other in the middle of a method 400 lines long; LogisticPlanningsService, for example. These pieces of code might be written at completely different times, and the programmer writing the first expression might even have quit before the programmer writing the second expression started on the project.

No text is repeated, and it won’t be found by a duplication tool. Nevertheless, these snippets are a violation of DRY because they encode the same knowledge—the understanding of what a zip code looks like. Remember, “Every piece of knowledge must have a single… representation.” Unfortunately, it’s hard to spot this kind of repetition.

The problem with having two different encodings of the same knowledge is when things change. You might remember to change these two places, but there might easily be 37 other places that also encode the same knowledge. Probably, you’ll remember (or find) 35 at most. The changes might also be small, but several are needed for consistency, so one or a few might easily be missed.

Official zip code formats don’t change often, but a format for an order number might. In an order number, you might first restrict what the first digit might be; then you allow a trailing letter; then you restrict those that start with an odd digit to have some special meaning; and so on. And in each of these updates, you miss a few of the code sites, but not the same few every time. After a while, you have a lot of inconsistency in your business rules, and inconsistency is what attackers love.

Fortunately, the remedy is easy. Create appropriate domain primitives that capture formatting. If you have no obvious domain primitive to start with, then start turning all these snippets into small helper methods that you collect into some utility class. In one of our projects, we managed to get over 200 methods into such a class, which was a gold mine for creating domain primitives. As mentioned earlier, these kinds of utility classes are nothing you want in the long run, but as a temporary step they can be valuable. Now let’s move on to another subtle issue; one where things look fine because we have domain types, but they lack sufficient validation.

12.6 Insufficient validation in domain types

Sometimes you’ll find yourself working on a codebase with a well-designed domain model that has important concepts represented as explicit types, maybe even as value objects and entities.2  The type of code we’re talking about is typically well structured and easy to read, but there might be some lurking design flaws that can lead to security vulnerabilities.

One such flaw you should learn to spot is when there’s insufficient domain validation of the data encapsulated by the domain types. This means that there’s either no validation at all, or the validation that’s performed is insufficient from a domain perspective. In chapters 4 to 7, you learned why strict domain validation is important for mitigating security flaws. When you spot domain types like this, you should seize the opportunity to improve them.

To illustrate this, listing 12.16 shows an example of a value object, Quantity, that’s missing relevant domain validation. The value object validates the input data for not being null, and it’s even immutable. But from a business domain perspective, a quantity is more precise in its definition than just being an integer. In our imaginary business domain, it’s defined as an integer between 1 and 500.

Listing 12.16 A domain type with insufficient validation

import static org.apache.commons.lang3.Validate.notNull;
 
public final class Quantity {
 
   private final Integer value;
 
   public Quantity(final Integer value) {
      this.value = notNull(value);
   }
 
   public int value() {
      return value;
   }
 
   // ...
 
}

If you apply what you learned about domain primitives in chapter 5 and enforce all the domain rules you know about a quantity in the constructor, the Quantity class ends up looking something like the following listing.3  You have now taken a well-defined value object and increased the overall security by turning it into a domain primitive.

Listing 12.17 Quantity turned into a domain primitive

import static org.apache.commons.lang3.Validate.inclusiveBetween;
 
public final class Quantity {
 
   private static final int MIN_VALUE = 1;
   private static final int MAX_VALUE = 500;
 
   private final int value;
 
   public Quantity(final int value) {
      inclusiveBetween(MIN_VALUE, MAX_VALUE, value);
      this.value = value;
   }
 
   public int value() {
      return value;
   }
 
   // ...
 
}

You can use the same approach when it comes to entities. Make it a habit to always look closer when you see domain types with insufficient validation. Most likely, you’ll find an opportunity to make your code more secure by design and, at the same time, deepen your domain knowledge through collaboration with the domain experts.

12.7 Only testing the good enough

When writing tests, developers tend to focus on two things: that the production code follows the intended behavior and that a safety net is established to prevent bugs from slipping through. Because of this, many use tests as documentation to learn how a system behaves when given a certain input and how the implementation supports different business requirements—but this isn’t the only thing that tests reveal. They also show where the weak spots are in the code, which could be exploited by an attacker.

In chapter 8, you learned about how to secure your code using tests. Many times, developers stop writing tests after creating normal and boundary tests because they capture intended behavior and establish a safety net that’s good enough. But as you know by now, to promote security in depth, you also need to test with invalid and extreme input, because it pushes your code to the limit.4  Unfortunately, not many developers think of this when writing tests. Studying how code is tested is therefore a clever way to detect potential security weaknesses.

To illustrate this, we’ll revisit the code example from section 12.3, but this time, we’ll shift focus and analyze potential security weaknesses that can be exploited by looking at how the code is tested.

In listing 12.18, you see the table reservation method that allows you to fetch table reservations by ID. As you’ve learned, the reservation ID should be implemented as a domain primitive, but we’ve kept it as a String because this is how code tends to look before applying the secure by design mindset.

Listing 12.18 A table reservation is fetched by ID

public TableReservation fetch(final String reservationId) {
   logger.info("Fetching table reservation: " + reservationId);
   final TableReservation tableReservation =
            repository.reservation(reservationId);    ①  
   logger.info("Received " + tableReservation);
   return tableReservation;
}

Let’s analyze this from a secure testing perspective. Because many weaknesses are flushed out using invalid and extreme input, you should look for tests that exercise this kind of data. If there aren’t any, then there’s a high probability that the code will break if provided such input. For example, the table reservation ID is represented as a String, which means that you could inject any input that satisfies the rules of a String, such as 100 million random characters. Obviously, this isn’t a valid reservation ID and might seem absurd to test, but that’s not the point. The point is, if the code can’t handle this type of input efficiently, then an attacker could craft an exploit that potentially affects the overall availability of the system. Consider what happens to the log implementation if it receives extreme input, or how input of 100 million characters would affect the search algorithm in the database. It probably would break things. The solution is to use a domain primitive—say, ReservationId—to reject any input that doesn’t satisfy the domain rules.

But using domain primitives can lull you into a false sense of security, unless you’re careful. The design of a domain primitive certainly takes care of invalid input, but there’s no guarantee that it protects against failures and unexpected behavior caused by a dependency. You should, therefore, also look for tests that exercise failure scenarios even if the input is valid; otherwise, you might end up with implicit weaknesses caused by your dependencies. For example, if the table reservation repository is backed up by a Mongo database, then a MongoExecutionTimeoutException could be thrown, regardless of whether the reservation ID is correct or not. This means that it’s necessary to provide proper exception handling (as you learned in chapter 9) to avoid sensitive data leaking out.

Up to this point, you’ve learned how to identify several potential security problems in legacy code (for example, ambiguous parameter lists, logging of unchecked strings, and defensive code constructs) that seem to benefit from using domain primitives. But, unless you’re careful, applying them incorrectly could create a whole set of new problems. Let’s see what happens if you end up creating partial domain primitives.

12.8 Partial domain primitives

Another pitfall when creating domain primitives is to not encompass a complete conceptual whole. When designing domain primitives, it’s a good idea to keep them as small as possible—perhaps only wrapping a value in a thin shell of domain terminology. But sometimes that wrap can be too small. When not enough information is captured in the domain primitive, there’s a possibility of it being used in the wrong context. This might constitute a security risk.

Let’s look at an example: using a monetary amount with the wrong currency. Imagine you have an online bookstore, and most of your book suppliers are situated in the United States and are paid in U.S. dollars. How do you represent the prices you pay? Most currencies have a unit and subunit: the U.S. dollar is divided into 100 cents, the Mexican peso into 100 centavos, and the Swedish krona into 100 öre. Monetary amounts look like real numbers, such as 12.56 or 5.04, but they’re not.

Whatever representation you use for amount (BigDecimal, two ints, a single long, or anything else), you realize that you shouldn’t have that representation floating around in the code. You need to form a domain primitive. What about Amount? It seems like a good idea, even if we’ll shortly show you that it’s too small.

You form a domain primitive Amount using some internal representation. On the outside, it has methods for adding (public Amount add(Amount other)), comparing (public boolean isGreaterThan(Amount other)), and perhaps splitting an amount into a number of piles (public Amount[] split(int piles))—the latter ensuring that no cent is lost if $10 is split three ways.

12.8.1 Implicit, contextual currency

What we’ve unfortunately not taken into account yet is the concept of currency. Remember that most of your suppliers trade in U.S. dollars? That’s most, not all. There has to be a currency somewhere. Right now, it’s obviously implicit—by looking at a supplier, you can figure out what currency to use. Each supplier will be paid in the currency of its respective country: U.S. suppliers in U.S. dollars, the few French suppliers in euros, and the odd Slovenian suppliers in Slovenian tolars. If you want to know the total sum of your trade with a specific supplier, you can sum it up as shown in the following listing.

Listing 12.19 Summing up the purchase orders for a specific supplier

        List<PurchaseOrder> purchases =
                procurementService.purchases(supplier);
        Amount total = purchases.stream().    ①  
                map(PurchaseOrder::price).    ②  
                reduce(Amount.ZERO, Amount::add);    ③  

The pitfall here is that the information about currency is contextual—you have to keep track of it outside the handling of monetary amounts. In this example, because a supplier is linked to a country that’s linked to a currency, the linking of implicit, contextual information worked out fine. But that’s not always the case.

12.8.2 A U.S. dollar is not a Slovenian tolar

Most suppliers are paid in U.S. dollars, but not all. And the code has to be aware of that fact. Imagine that this takes place in an organization where developers and businesspeople are strictly separated into different departments and never meet, and where all communication is mandated to go via business analysts and project managers. Yes, we know that sounds bizarre, but bear with us for the sake of the example.

Suppose a developer gets the task to sum up the value of all purchases for all suppliers. The developer might not know about the odd non-U.S. suppliers, and so might write code like that in the following listing. Notice how the method add is used in the same way as when summing purchase orders for a single supplier.

Listing 12.20 Summing up the purchase orders for all suppliers

        List<PurchaseOrder> purchases =
                procurementService.allPurchases();    ①  
        Amount sum = purchases.stream().
                map(PurchaseOrder::price).
                reduce(Amount.ZERO, Amount::add);    ②  

The design deficiency here is that the developer has summed up all the amounts. The result is something that’s almost correct, because a U.S. dollar is approximately as much as a euro, and the purchases in tolars are so few that they disappear in the mass. But it isn’t correct.

The problem is that it doesn’t make sense to add 10 U.S. dollars and 15 euros. (Well, both are monetary values, so it might make some sense, but it won’t add up to 25 of anything sensible.) The class Amount isn’t aware of the difference between 10 U.S. dollars and 10 euros because currency is a concept that’s implicitly understood, not explicitly stated. You should either convert one of those values to the other or say, “Can’t add different sorts of money.” But the class Amount isn’t designed to capture that domain subtlety.

This discrepancy between explicit and implicit usually shows up when the implicit concept is suddenly explicitly needed, often in the case of interacting with external representations. In the following listing, you see this happening when trying to initialize a payment at a bank, which suddenly requires a currency. The currency is then fetched from a central service that keeps track of which supplier is in which country and, therefore, uses appropriate currency.

Listing 12.21 Sending off a payment to a bank, fetching currency out of thin air

public void payPurchase(PurchaseOrder po) {
    Account account = po.supplier().account();
    Amount amount = po.price();    ①  
    Currency curr =
        supplierservice.currencyFor(po.supplier());    ②  
    bankservice.initializePayment(account, amount, curr);
}

This code works because a supplier is bound to a country, and a country is bound to a currency. But that connection is contextual—the currency information is transmitted out-of-band with respect to the amount to be paid.

Things get even worse when Slovenia decides to leave the tolar behind and start using euros instead. This happened in 2007, and the code in listing 12.21 is roughly what we found at one of our clients. Suddenly Slovenian suppliers didn’t have one well-defined currency, but two different ones, depending on the transaction date. A supplier with a purchase order for 100,000 tolars signed in late 2006 would probably get paid in early 2007, but at that point, the code

supplierservice.currencyFor(po.supplier())

would no longer return tolars but euros instead. The supplier would instead be paid 100,000 euros, a significantly larger sum. Fortunately, our client realized this in time and was able to take precautions, but it caused a significant amount of extra work.

12.8.3 Encompassing a conceptual whole

If Amount isn’t enough, what concept would be big enough to form a conceptual whole? If you wrap both the amount and the currency, you get a concept you might call Money. The PurchaseOrder wouldn’t have had an amount only and simply relied on implicit currency information. Rather, it’d have the currency explicitly embedded in the price in the form of a Money object.

The code for paying a supplier using Money would look like the following listing. Notice that po.price() no longer returns an Amount, but a Money object that contains the currency. The currency information is no longer obtained out-of-band but from the purchase order itself.

Listing 12.22 Fetching currency and amount from the same conceptual whole

public void payPurchase(PurchaseOrder po) {
    Account account = po.supplier().account();
    Amount amount = po.price().amount();    ①  
    Currency curr = po.price().currency();    ②  
    bankservice.initializePayment(account, amount, curr);
}

When the currency and the amount travel together in Money, they form a conceptual whole and make up a neat domain primitive. This also fixes another issue.

Recall the broken code for summing up all purchase orders for all suppliers in listing 12.20. The mistake was to add U.S. dollars, tolars, and euros without discrimination. Using Money as a domain primitive, as seen in the following listing, that wouldn’t be possible. The add method in Money checks that the two amounts to be added refer to the same currency.

Listing 12.23 Summing up the purchase orders for a specific supplier

class Money {
    public static final Money ZERO = ...
    private Amount amount;
    private Currency currency;
 
    public Money add(Money other) {
        isTrue(this.currency
                .equals(other.currency));    ①  
        ...
    }
}
 
        List<PurchaseOrder> purchases =
                procurementService.allPurchases();
        Money sum = purchases.stream().
                map(PurchaseOrder::price).
                reduce(Money.ZERO, Money::add);    ②  
}

In this chapter, we’ve covered some situations that you might encounter during your work with making designs more secure. We’ve looked at some sources for potential security weaknesses in the code—ambiguous parameter lists, logging of unchecked strings, and defensive code constructs—and what you can do about them. We’ve also looked at situations where things are better, and the code might look fine at first glance, but where there are hidden issues if you think more deeply about it: the principle of DRY misapplied, insufficient validation in domain types, and only positive tests. Finally, we’ve covered two pitfalls where things are hard to get right when starting out: where to apply domain primitives and the risk of creating domain primitives that are partial.

These tricks also apply well in traditional codebases and are a great way to get started. But what about a system with a microservice architecture? What can be done, and what should you think about? That’s the topic of the next chapter.

Summary

  • Introducing domain primitives should be done at the semantic boundary of your context.
  • Ambiguous parameters in APIs are a common source of security bugs.
  • You should be on the lookout for ambiguous parameters when reading code and address them using the direct approach, the discovery approach, or the new API approach.
  • Never log unchecked user input, because it opens up the risk of second-order injection attacks.
  • Limit the number of times a sensitive value can be accessed, because it allows you to detect unintentional access.
  • Use explicit accessor methods for data that you want to log. Otherwise, new fields can end up in logs by accident.
  • Because defensive code constructs can be harmful, clarify them using contracts and domain primitives.
  • The DRY (Don’t Repeat Yourself) principle is about repeated representation of knowledge, not about repeated text.
  • Trying to reduce repeated text that isn’t repeated knowledge can cause unnecessary dependencies.
  • Failing to reduce repeated knowledge because the text differs can cause vulnerable inconsistencies.
  • Tests reveal possible weaknesses in your code, and you should look for invalid and extreme input tests.
  • Ensure that your domain primitives encompass an entire conceptual whole.
  • Be on the lookout for domain types lacking proper validation and address them with domain primitives and secure entities.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.147.252