13
Guidance on microservices

This chapter covers

  • How to design secure APIs for microservices
  • Sensitive data in a microservice architecture
  • Integrity of log data
  • Traceability across services and systems
  • A domain-oriented logging API

In chapter 12, we looked at challenges in legacy code that often appear in monolithic architectures and how to apply secure by design fundamentals. In this chapter, we’ll focus on microservices, an architectural style that has grown in popularity in recent years. The topic is too large to cover fully in a single chapter, but we’ve selected an interesting set of challenges that are essential from a security standpoint. For example, you’ll learn how to deal with sensitive data across services, and why it’s important to design service APIs that enforce invariants. In addition, we’ll revisit logging one more time and explore challenges like traceability of transactions across services and systems, how to prevent tampering with log data, and how to ensure confidentiality using a domain-oriented logger API. But before we dive into the world of microservices, let’s establish what a microservice is.

13.1 What’s a microservice?

Microservice architecture is an architectural style of building systems that has become popular as an alternative to and a reaction against the monolithic style. Monolithic systems are built as a single logical unit. They can consist of various technical parts, such as an application, a server, and a database, but those parts depend logically on each other, both during development and at runtime. If any of them are down, the system doesn’t work. Similarly, any nontrivial change will likely affect several if not most of the parts and needs to be deployed simultaneously to work properly.

There’s no single, authoritative definition of microservice architecture. Still, there’s some common understanding about what the term means, as we can see by looking at the following quotes from Martin Fowler and Chris Richardson, respectively. Most people agree that microservice architecture describes a style of structuring the system around loosely coupled, relatively small, business-oriented services, each executing in its own runtime environment.

[T]he microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.

—Martin Fowler, https://www.martinfowler.com/articles/microservices.html

Microservices—also known as the microservice architecture—is an architectural style that structures an application as a collection of loosely coupled services, which implement business capabilities. The microservice architecture enables the continuous delivery/deployment of large, complex applications.

—Chris Richardson, https://microservices.io

A complete description of the microservice architecture style is beyond the scope of our security-focused discussion, but we certainly recommend reading up on it. These two websites are good places to start, together with Chris Richardson’s book, Microservices Patterns (Manning, 2018), and Sam Newman’s book, Building Microservices (O’Reilly, 2015). Executed well, it’s a style that can return huge benefits.

Let’s briefly sketch out three aspects of microservices that we think are important but that are, unfortunately, often overlooked: independent runtimes, independent update capability, and being designed for down.

13.1.1 Independent runtimes

A microservice should run in its own runtime, independent of the other services. A runtime in this sense could be a process, a container, a machine of its own, or some other way to separate the services from each other. That the runtimes are independent also means that there should be no dependencies of the type this one has to start before that one, and services shouldn’t make assumptions about the particulars of other services. For example, if you need to move one of your services from one machine to another, you should be able to do so without the other services malfunctioning or ceasing to work completely. Although there are several ways of achieving this goal, following the advice on cloud-native concepts and the twelve-factor app methodology that we covered in chapter 10 provides a good starting point (in particular, see sections 10.2, 10.3, and 10.6).

13.1.2 Independent updates

Having independent runtimes makes it possible to take down a service and restart it without restarting the rest of the system. This ability is a prerequisite for independent updates. But it’s not enough to be able to restart the service—you have to be able to do it with updated functionality.

A change in functionality should be isolated to a few services at most. The ideal case is that you only need to touch one single service for a functional update. But it makes sense that if you extend the functionality one service provides, then you’ll most probably want to change some of the calling code in another service to make that change usable and valuable—and that’s perfectly fine. What you want to avoid is a change that ripples from one service to the next, then over to a third, and so on. A huge help in this is orienting each service around a business domain. We’ve touched on this in earlier chapters (for example, in sections 3.3 and 3.4), and in this chapter, we’ll elaborate on it further. (See section 13.2.)

13.1.3 Designed for down

With independent runtimes and independent updates, it’s normal for one service to be up while another is down. To work well in this situation, a service needs to be designed so that it both behaves well when a service it depends on is down and recovers to normal operation when that service comes back up. The service isn’t only designed for the happy case when every service is up, but also designed for when services it depends on are down. The techniques that we covered in chapter 9, especially the use of bulkheading to contain a failure and circuit breakers to avoid domino failures, will take you a long way towards this goal.

A neat trick when developing is to start with implementing what the service should do in case a service it depends on is down. This is easier if each service is designed as a bounded context of a business domain. Even if the other service isn’t available, you can try to make the best of the situation in the context you’re in. Another powerful approach is to design your architecture as event-driven, where the services communicate by passing messages. In that case, the services pull incoming messages from a queue or topic at their own discretion, so the sender makes no assumption about whether the receiver is up or down.
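
To make this concrete, here is a minimal sketch of a read operation that degrades gracefully when a dependency is down. The client, cache, and type names are hypothetical, and a real implementation would likely sit behind a circuit breaker as described in chapter 9:

public Price currentPrice(final ProductId productId) {
    try {
        // Remote call to the pricing service, a separate microservice.
        final Price price = pricingClient.fetchPrice(productId);
        lastKnownPrices.put(productId, price);    // remember the latest good value
        return price;
    } catch (final ServiceUnavailableException e) {
        // The pricing service is down: fall back to a cached or default value
        // instead of failing the whole operation.
        return lastKnownPrices.getOrDefault(productId, Price.standardPrice());
    }
}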

Now that we’ve looked at some characteristics of microservices, let’s look at how to design such services. We’ll start with designing each service so that it captures a model in a bounded context.

13.2 Each service is a bounded context

A common challenge when designing microservices is figuring out how to split functionality between different services. Whether feature X should go in service Y or service Z isn’t always an easy question to answer, and it’s one you shouldn’t answer in haste. Slicing the feature set incorrectly can lead to multiple architectural challenges. One is that instead of independent services, you might end up with a distributed monolith, leading to the overhead of managing multiple services but without the benefit of them being independent. Cramming too much functionality into a single service is another challenge, because then you’re working with a monolith again. Good indicators that your microservices are inappropriately sliced are testing difficulties and tight dependencies between development teams that, in theory, should be independent.

You can design microservices in many ways, but we believe a good design principle is to think of each service as a bounded context.1  Doing so provides several benefits:

  • If you treat each service as a bounded context with an API that faces the rest of the world, you can use the various design principles and tools you’ve learned in this book to build more secure services.
  • It’ll help you decide where a certain feature belongs, because it’s easier to reason about the home of the feature when you’re thinking in terms of bounded contexts instead of technical services or APIs. This helps you with the challenge of slicing the feature set.

Our experience is that designing microservices this way leads to better defined APIs, less complex dependency graphs between services, and, most importantly, more secure services. When you think of each microservice as its own bounded context, it’ll become clear that each service has different concepts and semantics, even if some concepts might share the same names (such as customer or user). With that understanding, you most likely won’t miss that you need to perform explicit translations when you’re moving across services. Each time you translate between different bounded contexts, you’ll use techniques learned in part 2 of this book to improve security. For example, you can use domain primitives and secure entities to enforce invariants in the receiving service. You can also make sure to handle exceptions in a secure way so that bad data doesn’t lead to security issues. And it’s more likely that you’ll spot the (sometimes subtle) semantic differences between services that can lead to security problems. Let’s take a look at three cases that we’ve found are common and that we think you should pay some extra attention to when designing microservices: API design, splitting monoliths, and evolving services.

13.2.1 The importance of designing your API

Designing the public API of a microservice is probably one of the most important steps of building microservices, but unfortunately, it’s commonly overlooked. Each service should be treated as a bounded context, and the public API is its interface to the rest of the world. In chapter 5, we talked about how to use domain primitives to harden your APIs and the importance of not exposing your internal domain publicly. You should also apply those concepts when designing the API of your microservice to make it more secure.

Another important aspect of API design is that you should only expose domain operations. If the API only exposes domain operations, the service can enforce invariants and maintain valid states. This, as you learned in part 2, is an essential part of building secure systems. Don’t fall for the temptation of exposing inner details of your domain or the underlying technology you happen to be using—free for anyone to perform operations on. A service isn’t just a bunch of CRUD (create, read, update, and delete) operations; it provides important business functionality, and only that functionality should be exposed.

The following listing shows a customer management API designed in two different ways. The API is described as an interface because it’s a concise way of expressing an API. The implementation doesn’t matter, because the focus of this discussion is API design.

Listing 13.1 Designing the API: CRUD operations versus domain operations

public interface CustomerManagementApiV1 {    ①  
 
   void setCustomerActivated(CustomerId id, boolean activated);
 
   boolean isActivated(CustomerId id);
 
}
 
public interface CustomerManagementApiV2 {    ②  
 
   void addLegalAgreement(CustomerId id, AgreementId agreementId);
 
   void addConfirmedPayment(ConfirmedPaymentId confirmedPaymentId);
 
   boolean isActivated(CustomerId id);
 
}

The purpose of the customer management API is to provide the functionality of activating a customer. In this particular system, a customer is considered activated once a legal agreement has been signed and an initial payment has been confirmed. What’s interesting is how the two different versions, CustomerManagementApiV1 and CustomerManagementApiV2, handle how a customer becomes activated.

In the first version of the API, two methods, setCustomerActivated(CustomerId, boolean) and isActivated(CustomerId), are exposed. This might seem like a flexible solution, because anyone that wants to can activate a customer and check if a customer is activated. The problem with this design is that the service owns the concept of a customer and the definition of an activated customer, but the way the API is designed, it’s unable to uphold the invariants for it (having a signed legal agreement and a confirmed payment). There might also be other invariants for when a customer should be deactivated, which the service also is unable to enforce.

In the second, redesigned version, the API no longer exposes a method to directly mark a customer as activated. Instead, it exposes two other methods: addLegalAgreement(CustomerId, AgreementId) and addConfirmedPayment(ConfirmedPaymentId). Other services that handle legal agreements or payments can call these methods to notify the customer service when a legal agreement is signed or when a payment has been confirmed. The isActivated(CustomerId) method only returns true if both a legal agreement and a payment for the customer exist.

Only exposing domain operations in the API means the service is now in full control of maintaining a valid state and upholding all applicable invariants, which is a cornerstone for building secure systems. Because the service now owns all operations related to activating a customer, this design also makes it possible to add more prerequisites without changing any client code. The following listing shows a third version of the API, where a new requirement has been added for a customer to be activated: a customer service representative must have made a welcoming call to the new customer.

Listing 13.2 Introducing a new requirement in the API

public interface CustomerManagementApiV3 {
 
   void addLegalAgreement(CustomerId id, AgreementId agreementId);
 
   void addConfirmedPayment(ConfirmedPaymentId confirmedPaymentId);
 
   void welcomeCallPerformed(CustomerId id);    ①  
 
   boolean isActivated(CustomerId id);    ②  
 
}

To implement the new requirement, all you need to do is add a third method, welcomeCallPerformed(CustomerId), that notifies the customer service that the call has been made and makes sure the isActivated(CustomerId) method also checks for the new requirement before returning true. There’s no need to make changes to all other services calling the isActivated method, because the logic for determining whether a customer is activated or not is now owned by the customer service. This would have been impossible to do with an anemic CRUD API like the one you saw in listing 13.1.
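
To make the point concrete, a simplified, hypothetical implementation of isActivated inside the customer service could look like the following sketch, with every prerequisite checked by the service that owns the rules:

public boolean isActivated(final CustomerId id) {
    // The customer service owns the activation rules, so clients never need to
    // change when a new prerequisite is added. (Repository methods are illustrative.)
    return customerRepository.hasSignedLegalAgreement(id)
            && customerRepository.hasConfirmedPayment(id)
            && customerRepository.hasReceivedWelcomeCall(id);
}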

13.2.2 Splitting monoliths

Often, you’ll find yourself in a situation where you’re splitting a monolith into multiple smaller services. This might be because you’re refactoring an existing system toward a microservice architecture, or perhaps because you started with a microservice that has grown too big and needs to be split. The tricky part can be to figure out where to split the monolith. If you identify the semantic boundaries in your monolith (as you learned in section 12.1), you can then use those boundaries to split the monolith into smaller microservices, each with a well-designed API.

In terms of API design, one thing to watch out for when splitting a monolith is that you must also discover and enforce the translation between the different contexts—contexts that are now in different microservices. Because the context boundaries you discovered were most likely hidden in the monolith, there’s not going to be any existing translation in the code. When you’re creating your microservice, possibly by extracting existing code from the monolith, it’s easy to forget to add explicit translations between the contexts.

Always be wary when making calls across services and make it a habit to add explicit translation to and from the context you’re talking to. A good way of doing this is by thinking carefully about the semantics and using code constructs like domain primitives. As soon as you receive incoming data in your API, immediately validate it, interpret the semantics, and create domain primitives from it. Doing this will take you a good way toward creating APIs that are hardened by design. To give you an example, let’s go back to section 12.1 and this method:

public void cancelReservation(final String reservationId)

Say you’ve found that this method is part of a context boundary, and you want to split the monolith at this point to create a new microservice. A good first step before extracting the code to the new service is to introduce a domain primitive for the reservation ID. This way, you’ll enforce explicit translation to and from the bounded context. Once you have that in place, you can go ahead and extract the code to the new microservice.
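
A minimal sketch of such a domain primitive could look like the following. The exact format rule is an assumption for illustration; use whatever invariant defines a valid reservation ID in your domain.

import static org.apache.commons.lang3.Validate.matchesPattern;
import static org.apache.commons.lang3.Validate.notBlank;

public final class ReservationId {

    private final String value;

    public ReservationId(final String value) {
        notBlank(value);
        // Illustrative invariant: a reservation ID is a 10-character alphanumeric string.
        matchesPattern(value, "[A-Za-z0-9]{10}");
        this.value = value;
    }

    public String value() {
        return value;
    }
}

The boundary method can then become cancelReservation(final ReservationId reservationId), and validation happens exactly where data crosses into the new context.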

13.2.3 Semantics and evolving services

If you already have a microservice architecture, you should pay extra attention as services evolve, especially when there are semantic changes in the APIs. The reason for this is that subtle changes in semantics can lead to security issues if appropriate changes aren’t also made in the translation between the different bounded contexts; in other words, broken context mappings can cause security problems.

The story in chapter 11, where evolving APIs led to insurance policies being given away for free, is a perfect example of how evolving microservices can cause serious trouble. Context mapping, taking nearby microservices into account, and thinking carefully about how to evolve semantics in the APIs are some effective ways of handling evolving services in a safe way.

When you evolve microservices and either introduce new concepts, change existing ones, or in other ways change the semantics, always try to avoid redefining existing terminology. If you feel an existing term has changed in meaning, then replace it with a new term and remove the old one. Another approach is to leave the old term unchanged and instead introduce a new term that’ll let your domain model express the new meaning.

Changes in semantics are something that usually requires some degree of domain modeling and context mapping to get right.2  Sometimes the changes in semantics can lead to a change of context boundaries, and, because each service is a bounded context, the change of boundaries leads to a change in the microservices you have. New services get created or existing ones get merged as a result of evolving semantics. Remember that even if you’re using various secure code constructs in your APIs, you still need to invest time in the soft parts of API design in order to avoid security pitfalls like the one you saw in chapter 11.

Now you know that API design is an important aspect of microservice architectures (not only from a security perspective) and that putting time into it is an investment that’ll pay off in many ways—improved security being one of them. In the next section, you’ll learn about some pitfalls when sending data between different services.

13.3 Sensitive data across services

When thinking about security in any architecture, it’s important to ask yourself what data is sensitive. In a microservice architecture, it’s easier to make mistakes because the architectural style encourages developers to work by looking at one service at a time, and, when doing so, it becomes harder to keep track of cross-cutting concerns like security. In particular, to ensure that sensitive data is handled well, you need to see the entire picture. For a start, let’s elaborate a bit on the classic security attributes in the context of a microservice architecture.

13.3.1 CIA-T in a microservice architecture

Information security classically focuses on the security triad of CIA: confidentiality (keeping things secret), integrity (ensuring things don’t change in bad ways), and availability (keeping things…well, available when needed). Sometimes traceability (knowing who changed what) is added to this triad, creating the acronym CIA-T. In chapter 1, we elaborated a little on these concepts under the section on security features and security concerns.

The microservice architecture doesn’t help us when addressing cross-cutting concerns like security. In the same way that you can’t rely on a single service to provide fast response times (because the bottleneck might be elsewhere), you can’t look at a single service to ensure security (because the weakness might be elsewhere). On the contrary, most security concerns become harder to satisfy because a microservice architecture consists of more connected parts—more places and connections where things could go wrong.

Ensuring confidentiality gets trickier because a request for data might travel from component to component. In many microservice architectures, the identity of the original requester is lost by the time the request finally arrives at the end component. The situation doesn’t get easier when the request is done (in part) asynchronously, for example, via message queues. To keep track of this, you need some token to be passed with the request, and when a request reaches a service, the service needs to check whether the requester is authorized. Security frameworks like OAuth 2.0 can help because they are built to provide such tokens. For example, in OAuth 2.0, the first request is given a token (the JSON Web Token, or JWT) based on the caller. The JWT is carried along with the downstream requests, and each service that processes the request can consult the authorization server to see if it should be allowed.
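
As a rough sketch of the mechanics (using the standard java.net.http client; the URL, header usage, and method name are placeholders rather than a prescribed OAuth 2.0 setup), a service could forward the caller’s token when calling the next service downstream:

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: propagate the caller's access token downstream so the next
// service can make its own authorization decision for the original requester.
String fetchCustomer(final HttpClient client, final String incomingAccessToken)
        throws IOException, InterruptedException {
    final HttpRequest request = HttpRequest.newBuilder(
            URI.create("https://customer-service.internal/customers/42"))    // placeholder URL
            .header("Authorization", "Bearer " + incomingAccessToken)        // token from the original request
            .GET()
            .build();
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
}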

When guaranteeing integrity across multiple services, two things are important. The first is that every piece of information should have an authoritative source, typically a specific service where the data lives. Unfortunately, often data is copied from one place to the next, aggregated, and copied further. Instead of copying it, go directly to the source as often as possible. The second important thing is that the data hasn’t been tampered with. Here, classical cryptography can help by providing some sort of checksum or signature to ensure integrity.

For availability, a microservice architecture needs to ensure that a service is responding or that some sensible value can be used if the service is down; for example, a cached value from a previous call or a sensible default. In chapter 9, we discussed circuit breakers and other tools that are useful to design for availability.

Ensuring traceability also becomes more complicated in a microservice environment. As with confidentiality, you need to be able to track the original requester, but you also need to be able to correlate different calls to different services to see the bigger pattern of who accessed what. The term auditability is sometimes used as a synonym for traceability. Later in this chapter, we’ll elaborate on how this property can be achieved through well-structured logging. CIA-T is a great way to reason about security, but what do we mean by sensitive data?

13.3.2 Thinking “sensitive”

Often sensitive is confused with confidential or classified. Sometimes this is indeed the case; for example, personal data like health records is considered sensitive and should be kept confidential. But the term sensitive is broader than that.

Take the license plate number of your car. Is that sensitive? It’s surely not confidential, as it’s on public display. But take the plate number and combine it with a geolocation and a timestamp, and suddenly there’s information on where you were at a certain point in time—information that you might want to stay confidential. The challenge becomes even greater in a microservice architecture, where data travels from service to service and from context to context. (Remember the discussion about each service being a bounded context earlier in this chapter.)

Let’s look at another example. A hotel room number such as 4711 isn’t something that’s confidential in and of itself. But who is staying in room 4711 on a certain night certainly is. After the guest has checked out, there’s nothing confidential about the room number any longer, and the hotel goes into the regular housekeeping routine of cleaning, replenishing the minibar, and so on. This isn’t security-sensitive. But suppose during housekeeping, a coat is found and is marked “found in room 4711” and placed in the lost-and-found. When the guest shows up to claim the coat, you suddenly have a connection between that customer and the room again—something that should be confidential. You can see that when moving in and out of different contexts, the same data (the room number) might merit being confidential or not.

The requirement for confidentiality isn’t an absolute but something that depends on context. That’s why you reason about sensitive data—data that could have security concerns. In this case, we looked at confidentiality, but a similar line of reasoning could apply to data that has integrity or availability concerns. A similar situation arises when a service is an aggregator of data. Such services are sometimes underestimated because they create no new data and therefore, so the reasoning goes, can’t be more sensitive than their parts. This is a mistake. If you add together several different pieces of data, a complete picture might emerge that says a lot more than the parts do individually. This is basically the way any intelligence agency works, so you should pay attention to those seemingly harmless aggregating services in your architecture.

To us, sensitive is a marker that tells us to pay attention to security concerns and stops us from focusing on one service at a time. What is sensitive or not is something that needs to be understood by considering the entire system.

To identify sensitive data, you can ask yourself the following questions:

  • Should this data be confidential in another context?
  • Does the data require a high degree of integrity or availability in another context? How about traceability?
  • If combined with data from other services, could this data be sensitive? (Recall the example of the license plate number together with a time and geolocation.)

While thinking about this, you need to have the entire range of services in scope. Unfortunately, cross-cutting concerns like security can’t be addressed by myopically looking at one or a few services at a time, any more than issues of response time or capacity can be solved by a single service.

Now let’s move over to the tricky field of ensuring traceability in a microservice architecture—the call for structured logging from multiple sources.

13.4 Logging in microservices

We’ve brought up logging several times in previous chapters and analyzed how it impacts the overall security of an application. For example, in chapter 10, we discussed the importance of avoiding logging to a file on disk, and in chapter 12, we talked about the danger of logging unchecked strings. The key takeaway is that logging contains lots of hidden complexity, and things can go seriously wrong if done naively; for example, logging could open up the risk of second-order injection attacks or implicitly cause leakage of sensitive data due to an evolving domain model. Although this has been discussed extensively in this book, there’s one more aspect of logging we need to cover before closing the topic: the importance of traceability and how to ensure the integrity and confidentiality of log data.

13.4.1 Integrity of aggregated log data

In a monolithic architecture, you tend to get away with using remote login to access log data on a server when needed. In chapter 10, you learned that logging shouldn’t be done to a local disk, but rather to an external logging system. This might have seemed like overengineering at the time, but when using microservices, it definitely starts to make sense. If you run multiple instances of a service that scale dynamically and use the same logging strategy as with a monolith, you never know which instance contains the log data you need, because a transaction could span multiple instances, depending on load. This means that log data will be scattered throughout the system, and to get a complete picture of what has happened, you need to aggregate data—but fetching it manually quickly becomes a painful experience.

Aggregating log data from multiple services is therefore preferably done through an automatic process—but there’s a catch. To effectively aggregate data, you need to store it in a normalized, structured format (for example, as JSON), which means it needs to be transformed somewhere in the logging process. Consequently, it’s common to find solutions where each service logs data in natural language and later transforms it into a structured format using a common external normalization step before passing it to the logging system (as illustrated in figure 13.1).


Figure 13.1 Log data is transformed into JSON in an external normalization step.

The upside to this, ironically, is also its downside. By having a normalization step, you encourage a design with great flexibility in terms of logging, but it opens up logging of unchecked strings as well—and that’s a security concern! It’s also common that normalization is implemented using temporary state on the local disk, which is problematic because it complicates the repavement phase of the three R’s of enterprise security (which we talked about in chapter 10). The third issue is less obvious and involves integrity of data during the normalization step.

When normalizing data, you restructure it into a key-value format that, by definition, is a modification of its original form—but does that violate the integrity of the data? Not necessarily; you only need to ensure the data hasn’t changed in an unauthorized way. In theory, this should be simple, but in practice, it’s hard, because validating the transformation logic from natural language to a structured format isn’t trivial and is something you probably want to avoid. Another solution is therefore to structure data in each service before passing it to the logging system, as illustrated in figure 13.2. This way, you avoid using third-party normalization software.


Figure 13.2 Log data is structured into JSON before being passed to the logging system.

The downside to this approach is that every microservice needs to implement explicit normalization logic, which adds complexity, but avoiding third-party dependencies also reduces complexity, so it probably evens out in the long run. Two other aspects are also interesting from a security perspective. First, by explicitly normalizing log data in each service, it becomes possible to protect each payload with a keyed hash or digital signature (for example, an HMAC based on SHA-256) before passing it to the logging system. This implies that the integrity of log data can be verified explicitly in the logging system, and you know it hasn’t been tampered with. Second, normalization is often tightly coupled with categorization of data, which requires extensive domain knowledge (especially when you’re dealing with sensitive data). The natural place for this isn’t in a common normalization step but rather within each service. We’ll talk more about this later on in this chapter when analyzing how confidentiality is achieved using a domain-oriented logger API.
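
As a sketch of what that could look like (assuming a secret key shared between the service and the logging system; key management is out of scope here), the service could compute a tag over every normalized payload and send it along with the JSON:

import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: compute an HMAC-SHA256 tag over a normalized JSON log payload
// so the logging system can verify that the payload hasn't been tampered with in transit.
private String integrityTagFor(final String jsonPayload, final byte[] secretKey)
        throws GeneralSecurityException {
    final Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
    final byte[] tag = mac.doFinal(jsonPayload.getBytes(StandardCharsets.UTF_8));
    return Base64.getEncoder().encodeToString(tag);
}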

Choosing the right logging strategy is important from an integrity standpoint, regardless of whether you have a monolith or microservice—but that’s not all that matters when it comes to logging. The next topic to consider is traceability in log data.

13.4.2 Traceability in log data

When logging in a monolithic architecture, the presumption is that the source is always the same. Unfortunately, this simplification is no longer valid when using microservices because you might need to identify which particular service instance took part in a transaction. For example, consider two payment services, A and B, where A has version 1.0 and B has version 1.1. The services use semantic versioning, which means that service B contains some additional functionality compared to A but is fully backward compatible with version 1.0. The only problem is that service B contains a bug that causes a rounding error that doesn’t exist in service A, and consequently, several financial transactions fail in production. At this point, you want to be able to tell whether service A or service B was used in a transaction—but if the logs don’t contain enough traceability, it becomes a guessing game.

The solution to the payment service problem is to add traceability to your system, but there’s some hidden complexity to consider. For example

  • A service must be uniquely identifiable by its name, version number, and instance ID.
  • A transaction must be traceable across systems.

Let’s see why this is important.

Uniquely identifying a service

In a microservice architecture, you’ll often choose to follow the rules of semantic versioning for your service APIs. This means it should be safe to invoke any service within the same major version range because all versions are backward compatible. But when it comes to traceability, you can’t make this assumption, because even if a version is fully backward compatible, there might be differences (bugs or unintended behavior) that distinguish one service from another. It might even be the case that instances with the same version number behave differently because of installation issues or because they’ve been compromised. Being able to uniquely identify a service is therefore important from a security standpoint. A common way to achieve this is to add the service name, version number, and a unique instance ID to each log statement.
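
One way to do that (the field names and values here are illustrative) is to establish the identity once at startup and attach it to every structured log entry:

import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: fixed identity fields attached to every log statement.
final class ServiceIdentity {

    private final String name = "payment-service";                      // service name
    private final String version = "1.1.0";                             // semantic version
    private final String instanceId = UUID.randomUUID().toString();     // unique per running instance

    Map<String, String> asLogFields() {
        return Map.of("service", name,
                      "version", version,
                      "instanceId", instanceId);
    }
}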

Identifying transactions across systems

Uniquely identifying a service certainly allows you to achieve traceability on a micro level, but transactions seldom interact with one system alone. Instead, they span across multiple systems, and to fully support traceability, you also need to identify which services and external systems take part in a transaction. One way to do this is to use a tracing system, such as Dapper by Google3  or Magpie by Microsoft,4  but it might be overkill if you only need to identify which services and systems participated in a transaction. What you need to do is ensure that each transaction has a unique identifier and that it’s shared between services and external systems, as illustrated in figure 13.3.


Figure 13.3 The unique trace ID is shared between systems A, B, and C.

Every time system A initiates a transaction in system B, it needs to provide a unique trace ID that identifies the transaction in system A. System B appends this ID to a newly created, (probabilistically) unique, 64-bit integer and uses this as the trace ID. This lets you identify all services in B that took part in the transaction initiated by A. System B then passes the trace ID to system C, and a new ID is created in a similar fashion. This way, you can easily identify all services that participated in a transaction spanning several systems.
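
A small sketch of the receiving side could look like this (the separator and the header name mentioned in the comment are assumptions for illustration):

import java.security.SecureRandom;

// Hypothetical sketch: extend the incoming trace ID with a locally generated,
// (probabilistically) unique 64-bit integer before using it in logs and downstream calls.
private String extendTraceId(final String incomingTraceId) {
    final long localPart = new SecureRandom().nextLong();
    return incomingTraceId + "-" + Long.toHexString(localPart);
}

Every log statement and outgoing call in the system then carries the extended ID (for example, in an X-Trace-Id header), so a transaction can be correlated across systems A, B, and C.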

13.4.3 Confidentiality through a domain-oriented logger API

In chapter 10, we talked about confidentiality and how to ensure only authorized consumers get access to data. The solution proposed was to separate log data into different categories (for example, Audit, Behavior, and Error) and restrict access to each category, but doing this in practice requires some additional thought.

Logging data with different logging levels, like DEBUG, INFO, and FATAL (provided by a logging framework), is a common design pattern used in many systems. At first glance, it might seem as if this could solve the confidentiality problem in chapter 10, but unfortunately, logging levels tend to focus on response actions rather than confidentiality of data. For example, if you see a log statement marked as INFO, you tend not to worry, but if there’s a statement marked as FATAL, you’ll probably respond instantly—regardless of whether the data is confidential or not. Another more business-oriented example is that of a bank withdrawal, where sensitive information such as account number, amount, and timestamp needs to be logged. In a design using logging levels, this might be categorized as INFO because it’s nothing out of the ordinary, but that level is also used for nonsensitive data such as average processing time. This diversity implies that all log entries marked as INFO must have restricted access because they can contain sensitive information—a confidentiality problem you don’t want.

A better solution, based on our experience, is to treat logging as a separate view of the system that needs explicit design, similar to what you’d do for a user interface. How data is presented on the web, on mobile devices, or in some other consumer context must always be well designed, because otherwise you’ll get a bad user experience. The same applies to logging, but with the difference that the consumer isn’t your normal user. Instead, it’s an automated analysis tool, developer, or some other party interested in how the system behaves. This means that structure and categorization of data need to be considered, but so does sensitive information.

How to classify information as sensitive or not is therefore an important part of your system design. But how to do this in practice isn’t a trivial task, because the classification depends on context and is an overall business concern. This implies that classification of data requires extensive business domain knowledge and should be part of your service design, not something you delegate to a third-party application. To illustrate, we’ll use an example from the hospitality domain.

Consider a web-based hotel system that handles everything a hotel needs to keep its business running from day to day: from bookings to housekeeping to financial transactions. The system is designed on a microservice platform, where each service defines a bounded context—but what’s interesting from a logging perspective is how services address the confidentiality problem using a logger with a domain-oriented API. In listing 13.3, you see the cancellation logic of the booking service, where domain-oriented actions such as cancelBooking, bookingCanceled, and bookingCancellationFailed are expressed in the logger API. These actions are customized for this context only and are achieved by wrapping the raw logger implementation with a cancel booking interface.

Listing 13.3 Cancel booking logic using a logger with a domain-oriented API

import static org.apache.commons.lang3.Validate.notNull;
 
public Result cancel(final BookingId bookingId, final User user) {
    notNull(bookingId);
    notNull(user);
 
    logger.cancelBooking(bookingId, user);    ①  
 
    final Result result = bookingsRepository.cancel(bookingId);
 
    if (result.isBookingCanceled()) {
        logger.bookingCanceled(bookingId, user);    ②  
    } else {
        logger.bookingCancellationFailed(
                        bookingId, result, user);    ③  
    }
    return result;
}
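
For reference, the wrapping interface used in listing 13.3 could be as small as the following sketch (the method signatures follow the listings; the interface name and everything else is an assumption):

// Hypothetical sketch of the domain-oriented logger interface used in listing 13.3.
public interface CancelBookingLogger {

    void cancelBooking(BookingId bookingId, User user);

    void bookingCanceled(BookingId bookingId, User user);

    void bookingCancellationFailed(BookingId bookingId, Result result, User user);

}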

The main upside to the logger API is that it guides developers in what data they need in each step of the process. This certainly minimizes the risk of logging incorrect data, but it also separates data in terms of confidentiality—so let’s pop the hood of the bookingCancellationFailed method to see how it’s implemented.

The bookingCancellationFailed method in listing 13.4 interfaces directly with the raw logger API, where the log method only accepts String objects. This implies that the logger doesn’t care about what data it logs, just that it meets the requirements of a String. Categorizing data must therefore be done explicitly before invoking the log method, because the logger won’t make that distinction.

Listing 13.4 Logging categorized data to the logging system

import static org.apache.commons.lang3.Validate.notNull;
 
private final Logger logger ...    ①  
 
public void bookingCancellationFailed(final BookingId id,
                                      final Result result,
                                      final User user) {
    notNull(id);
    notNull(result);
    notNull(user);
 
    logger.log(auditData(id, result, user));    ②  
    logger.log(behaviorData(result));    ③  
 
    if (result.isError()) {
        logger.log(errorData(result));    ④  
    }
}

Only accepting strings in the logger API does indeed make sense because how you distinguish between audit, behavior, and error data is specific to your domain. In listing 13.5, you see the auditData method, which translates audit data into JSON, represented as a String. The map contains an explicit entry for the audit category. This shouldn’t be necessary because this is audit data by definition, but it allows the logging system to detect invalid data on an endpoint (such as audit data sent to the behavior log endpoint) or to separate data based on category if the same endpoint is used for all categories. The status field in the Result object indicates why the booking couldn’t be canceled (for example, because the guest has already checked out).

Listing 13.5 Method that extracts audit data to be logged as JSON

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;
import static java.lang.String.format;
 
private String auditData(final BookingId bookingId,
                         final Result result,
                         final User user) {
    final Map<String, String> data = new HashMap<>();
    data.put("category", "audit");    ①  
    data.put("message","Failed to cancel booking");    ②  
    data.put("bookingId", bookingId.value());    ③  
    data.put("username", user.name());    ④  
    data.put("status", result.status());    ⑤  
    return asJson(data,    ⑥  
            "Failure translating audit data into JSON");
}
 
private final ObjectMapper objectMapper ...    ⑦  
 
private String asJson(final Map<String, String> data,
                      final String errorMessage) {    ⑧  
    try {
      return objectMapper.writeValueAsString(data);  ⑨  
    } catch (JsonProcessingException e) {
      return format("{\"failure\":\"%s\"}",    ⑩  
                       errorMessage);
    }
}

Behavior and error data are extracted in a similar fashion, with one important exception: sensitive data. Only the audit category is allowed to contain confidential or sensitive information. This might require special attention in other categories when extracting data; for example, when dealing with error data from a stack trace. Some frameworks or libraries choose to include the input that caused the exception in the stack trace message. This means that you could end up with information like an account number or Social Security number in the error log if an exception is thrown during the parsing process. Consequently, you want to exclude the stack trace message from the error log; otherwise, you might need to place the error log under restricted access, similar to the audit log.
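
As an illustration (a hypothetical variant that receives the underlying exception; it’s not one of the methods in the listings above), error data could be limited to the exception type, deliberately leaving out the exception message:

private String errorData(final Throwable cause) {
    final Map<String, String> data = new HashMap<>();
    data.put("category", "error");
    data.put("message", "Failed to cancel booking");
    // Log the exception type only; the exception message may echo the sensitive
    // input that caused the failure, so it's deliberately left out.
    data.put("exceptionType", cause.getClass().getName());
    return asJson(data, "Failure translating error data into JSON");
}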

Categorizing data this way certainly enables you to meet the confidentiality requirements, but one question still remains: should you store everything in the same log and have different views (audit, behavior, and error) or have one log per category? Both strategies seem to facilitate confidentiality, but there’s a subtle difference in security to consider. Let’s explore both alternatives.

Assume you choose to store everything in the same master log and have different views with restricted access. Then the log would contain intermixed JSON entries, as illustrated in the following listing.

Listing 13.6 Master log with intermixed JSON entries

{
  "category":"audit",
  "message":"Failed to cancel booking",
  "bookingId":"#67qTMBqT96",
  "username":"[email protected]",
  "status":"Already checked out"
}
{
  "category":"behavior",
  "message":"Failed to cancel booking",
  "status":"Already checked out"
}

The upside to this is that the solution is fairly easy to reason about and to implement in code. You only need to categorize data in each service and store it in the master log. Aggregation of data into a view is then a product of your access rights; for example, audit data and error data could be shown in the same view if you’re granted access to both.

But this flexibility also results in a drawback that makes the solution unviable. By storing everything in the same master log and allowing categories to be intermixed, you open up the possibility of leaking sensitive data in a view, hence violating the confidentiality requirement. For example, audit data can accidentally become intermixed in a view with behavioral data. Even though this is highly unlikely, you need to ensure, every time a view is generated, that no such leakage has occurred, which adds significant complexity to the solution. As a side note, depending on what domain your system operates in, the logging system might need to comply with various data protection regulations (for example, the GDPR in the European Union), and violating confidentiality could then become a costly experience.

A better alternative is to categorize data in the same way, but store each category as a separate log stream. This has the benefit of keeping log data separated by design, which in turn reduces the risk of accidental leakage between categories in an aggregated view, but there’s another upside to this as well. By favoring a design that enables log separation, you also open up the possibility of storing audit logs in a separate system with strict access control and traceability. This certainly seems to increase the complexity when accessing data, but the cost is justifiable when seen from a security perspective. For example, because audit logs could carry sensitive information, you must ensure they never end up in the wrong hands. This implies that strict access control and traceability are needed; otherwise, you don’t know how sensitive data has been consumed.

The life cycle of audit data is also important from a security standpoint. When a system is decommissioned, you tend not to care about logs anymore, with the exception of audit logs: there might be legal requirements that demand audit data be persisted for a long time (for example, financial transactions). Treating audit logs with care and storing them in a separate system is therefore a good strategy, from both a security and an operational perspective.

You have now read about how to deal with sensitive data in a microservice architecture, how to design your APIs, and what complexity logging brings from a security standpoint. We’ve nearly reached the end of our journey toward making software secure by design. In the next and final chapter, we’ll talk about techniques that you should use in combination with what you’ve learned in this book. For example, we’ll discuss why it’s still important to run security penetration tests from time to time, and why explicitly thinking about security is necessary.

Summary

  • A good microservice should have an independent runtime, allow independent updates, and be designed for other services being down.
  • Treating each microservice as a bounded context helps you design more secure APIs.
  • Secure design principles such as domain primitives and context mapping are also applicable when designing microservice APIs.
  • In order to avoid common security pitfalls, only expose domain operations in APIs, use explicit context mapping between services, and pay extra attention to evolving APIs.
  • It’s important to analyze confidentiality, integrity, availability, and traceability (CIA-T) across all services.
  • Identify data that’s sensitive and possibly needs to be secured across services.
  • The integrity of log data is important from a security standpoint.
  • Normalization and categorization of log data requires extensive domain knowledge and should be part of the service design.
  • A service must be uniquely identifiable by its name, version number, and instance ID.
  • A transaction must be traceable across systems.
  • Using a logger with a domain-oriented API facilitates a design that considers the confidentiality of log data.
  • Don’t intermix sensitive and nonsensitive data in the same log, because that can lead to accidental information leakage.