Chapter 5. Application architecture and workflow

This chapter covers

  • Building CoreOS into your application architecture
  • Understanding the twelve-factor methodology
  • Harmonizing development, persistence, and presentation

At this point, you should have a basic, practical understanding of how CoreOS functions. This chapter is intended as a primer for someone with a role like software or systems architect. The assumption is that you’ll be building a new application for, or migrating an existing application to, CoreOS. As such, this chapter is less about technical practice and more about the planning you need to do before any technical implementation.

5.1. Your application and the twelve-factor methodology

Suppose you’ve been tasked with drafting the architecture for a new SaaS product, and you want to use CoreOS as your target platform. Where do you start? What are the best practices in this space? Although it isn’t explicitly meant for CoreOS, the twelve-factor methodology (http://12factor.net) is a set of guidelines for successfully architecting complex application stacks. This approach doesn’t prescribe any particular technologies or processes, but it’s useful in one of two ways, depending on your starting point:

  • If you’re building an application from scratch, it can guide your choices of technology and workflows.
  • If you’re migrating or figuring out how to scale an existing application, it can show you where and how those tasks will be difficult.

Briefly, the 12 factors are as follows:

  • Codebase— Your application’s code exists in source control, from which you do many deploys.
  • Dependencies— Supporting libraries should be explicit and isolated.
  • Config— The application configuration should be per environment.
  • Backing services— Data, persistence, and external services are all abstracted.
  • Build, release, run— The codebase is deployed through these strictly separated steps.
  • Processes— Your application process(es) should be stateless and share nothing.
  • Port binding— The application should be able to bind its own service.
  • Concurrency— Scale is achieved by adding processes (a.k.a. horizontal scaling).
  • Disposability— Processes should be disposable and have quick startup.
  • Development/production parity— Your development environment should be as similar to production as possible.
  • Logs— Logs should act as event streams and exist in the application as unbuffered writes to stdout.
  • Admin processes— Management tools should be task-oriented one-offs.

Throughout this chapter, I’ll refer to these factors as they come into play in architecting an application for CoreOS. Some are less relevant than others with respect to CoreOS, and it’s always up to you whether to incorporate this methodology into your organization’s technical design process. CoreOS’s design resolves many of these factors for you, so we’ll start by going over each of them and where CoreOS does (or doesn’t) help you.

5.1.1. CoreOS’s approach

You’re reading this book, so I’m sure it’s no surprise that abstractions are how we maintain sanity in complex systems. You’ve probably also experienced how hard it can be to find agreement on where those abstractions should sit, how they function, and how to use them. Even at the level most relevant to CoreOS, best practice is still an open question: virtualization and containerization overlap, and each has its own set of competing technologies. Obviously, with CoreOS, you’ve chosen containerization over virtualization to abstract your services, and you’ve chosen to rely on etcd and fleet to manage at least some of your configuration state and scheduling for scale. With CoreOS, you can also manage stateful data services at scale, and you have a networking abstraction system in flannel.

If these seem like opinionated systems, that’s because they are, by design. Orchestrated together, they’re designed to immediately solve some of the twelve-factor problems.

Codebase

CoreOS doesn’t provide much here. As long as your final product consists of a container and service units, the codebase and source control you use are inconsequential. This is, of course, by design: the fundamentals of containerization provide an explicitly generic platform so you aren’t tied to any one technology. You will, however, have to consider your CoreOS deployment in your codebase; section 5.2 goes into the details of what this means.

Dependencies

Nothing is explicitly gained by using CoreOS for this factor, other than the inherent dependency isolation you achieve by using containers. So, you’ll likely apply this factor implicitly.

Config

This factor ensures that your software’s configuration is relative to its environment. This means not baking your configuration parameters into a build, and making sure that what needs to be changed in the config is available via environment variables. CoreOS solves this problem at scale with etcd, which gives you a distributed store specifically designed for managing environment configuration.
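For example, with etcd v2’s etcdctl (the key paths and values here are hypothetical), a deploy script can set per-environment configuration once, and any unit in the cluster can read it back at startup:

  # Store per-environment configuration in etcd instead of baking it into a build.
  etcdctl set /config/webapp/DB_HOST 10.1.1.5
  etcdctl set /config/webapp/DB_PORT 5432

  # A startup wrapper (or an ExecStartPre step in a unit file) can then expose
  # the values to the process as environment variables.
  export DB_HOST=$(etcdctl get /config/webapp/DB_HOST)
  export DB_PORT=$(etcdctl get /config/webapp/DB_PORT)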

Backing services

This factor has more to do with ensuring that services that back your application (like a database) are interchangeable. CoreOS doesn’t enforce or solve this problem explicitly but does make it easier to solve by better defining the dynamic configuration, as per the third factor. And by using containers, you probably already have loose coupling between services.

Build, release, run

The build and release processes are out of the scope of what CoreOS can help with. But fleet and its version of systemd provide the standard for application runtime, and containerization implicitly provides some level of release context (such as Docker tags).
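For example (the image name and version are hypothetical), tagging the image at build time gives each release an explicit identity, so the run step in your unit files refers to a known artifact rather than whatever :latest happens to be:

  # Build step: produce an image from the current state of the codebase.
  docker build -t example/webapp:1.4.2 .

  # Release step: publish the tagged artifact to a registry for the cluster to pull.
  docker push example/webapp:1.4.2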

Processes

CoreOS resolves process isolation with containerization. It also enforces that isolation by requiring you to build your containers with the expectation that they could lose state.

Port binding

Port binding is well covered in CoreOS. Containerization and flannel give you the tools to abstract and control the port binding of your applications.
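As a minimal sketch (the image and port are hypothetical), the process inside the container binds its own port, and Docker publishes it; flannel’s overlay network then makes the container reachable from the rest of the cluster:

  # The application inside the container binds 0.0.0.0:3000 itself;
  # -p publishes that port so other services can reach it.
  docker run --rm -p 3000:3000 example/webapp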

Concurrency

With fleet, CoreOS gives you a number of tools to control concurrency among your service units. Flannel also helps you keep the port configuration consistent across multiple instances of the same process.
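For example, assuming you’ve already submitted a unit template named webapp@.service (a hypothetical name), scaling out is a matter of starting more instances:

  # Brace expansion is plain Bash; fleet schedules each instance somewhere in
  # the cluster. A Conflicts=webapp@*.service line in the template's [X-Fleet]
  # section keeps instances on separate machines.
  fleetctl start webapp@{1..3}.service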

Disposability

CoreOS strictly enforces disposability. You must rely on fleet and etcd as the central sources of truth for your architecture’s state.

Development/production parity

This is a goal achieved by containerization, but not by CoreOS specifically.

Logs

CoreOS expects all containers to output only to stdout and stderr. It controls this stream with systemd’s journal and provides access to it via fleet.
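For example, assuming a unit named webapp@1.service (hypothetical), you can follow its stdout/stderr stream from anywhere in the cluster:

  # fleet locates the unit and proxies its systemd journal, wherever the
  # scheduler placed it.
  fleetctl journal -f webapp@1.service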

Admin processes

CoreOS doesn’t build administrative tools for you, but fleet and etcd provide interfaces that make creating task-oriented, one-off tools easier.

As you design your application architecture, keep these 12 factors in mind, along with how CoreOS supports applying them. Remember, too, that these are just guidelines: especially if you’re migrating an existing application, any components that don’t fit the model can be difficult or impossible to transform into an optimal configuration.

5.1.2. The architecture checklist

To locate the holes in your architecture, learn how to begin writing your technical design, and determine how far you are from an optimal twelve-factor configuration, it’s useful to start with a checklist:

  • What infrastructure are you using for CoreOS?
  • Which services are stateful, and which are stateless?
  • Are dependencies between services clear and documented?
  • Is the configuration that will describe those dependencies well known, and can you apply that model in etcd?
  • What does your process model look like?
  • What services and configuration of your system do you need to expose outside of the cluster?

If you can answer all these questions in detail with information from this chapter and chapter 4, you’ll be well prepared for building out a complex system in CoreOS. Before you start applying your architecture, though, you need to address some requirements in your application code.

5.2. The software development cycle

You’ve gone through the process of mapping out the technical design for your latest project with the twelve-factor methodology in mind, including everything CoreOS brings to the table. What details need to be resolved in your various codebases to make this design fully functional?

Your codebase, dependency management, and build/release/run workflows are all part of a software development lifecycle that may or may not be well defined in your organization. Determining how you’ll build around or fit CoreOS into that cycle is, of course, critical to your success. We won’t go into how Docker solves some of these problems; for more detail on the benefits of containers, Docker in Action (Nickoloff, 2016, www.manning.com/books/docker-in-action) is a good resource. Specifically, though, we’ll cover where the CoreOS-related components live in your codebase, how that code resolves dependencies between services, and how to automate the deployment of your services. This will mostly be a high-level discussion: the actual implementation will be very specific to your application and your organization. Once you’ve mapped out all these components, you’ll be ready to create a development and test plan for getting your application live.

5.2.1. Codebase and dependencies

In this book, you’ve seen a lot of custom scripts and logic built to hook into CoreOS’s various features and systems. You absolutely should keep your unit files in source control; deciding where to keep them is where things get a bit tricky. Unless you’re deploying a single monolithic application with no outside dependencies, you’ll have services that are shared. Often these are persistence layers, which probably exist somewhat outside your development cycle. You may also be using a mix of containers: publicly available images (for example, official Docker Hub library containers), containers based on public images, and containers built entirely from scratch. Keeping the latter two types in the source control of their respective projects is easy, but containers you use straight from the public Docker library need their service unit files in source control as well.

The unit files for public images probably contain more logic than your custom ones, because custom applications are more likely to have environmental clustering logic built in than a base Docker image is. We’ll look more at what that means in the next subsection. If you’re using Git, my recommendation is to maintain a repository for your units with Git submodules in your custom applications. Taking a peek at chapter 6, the file tree looks something like the following.
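What follows is a sketch of the idea rather than chapter 6’s exact listing; the webapp and worker names anticipate that project, and couchbase stands in for a service run from a public Docker Hub image:

  deployment/                      ← top-level repository holding unit files
  ├── couchbase@.service           ← units for public images live here
  ├── couchbase-sidekick@.service
  ├── webapp/                      ← Git submodule: custom application
  │   ├── Dockerfile
  │   └── webapp@.service
  └── worker/                      ← Git submodule: custom application
      ├── Dockerfile
      └── worker@.service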

Note

It’s a good idea to begin getting familiar with the layout of the project you start in chapter 6. You’ll build on it throughout the rest of the book.

Keeping a layout like this serves a few purposes:

  • You can keep an eye on the big picture of the layout of your applications.
  • There’s a clear separation between custom code and publicly available services.
  • You can use this repository with its submodules as a template for big-picture, continuous integration.
  • Service dependencies become more obvious.

The last point is especially important: being able to understand at a glance how the different parts of your project depend on one another is a great benefit in an organization with many engineers. For example, if worker also depended on webapp, I probably would have made it a Git submodule of webapp. But wait! What if I create a new service that depends on both webapp and worker? The short answer is, don’t! Doing so would break the processes factor in the twelve-factor model as well as, arguably, dependencies. We’ll go into microservices a bit in the next section, but a service that depends on multiple other services should be a big red flag: you’re creating very tight coupling, which compounds complexity and adds corner cases of cascading application failures that may be difficult to predict. If you must do it and still want to maintain this kind of file tree, you can either duplicate the submodule or symlink one to the other.
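If you want to reproduce that kind of layout (the repository URLs here are hypothetical), wiring in the submodules looks like this:

  # From the root of the units repository, pull each custom application in as
  # a submodule so its units and Dockerfile travel with the overall layout.
  git submodule add https://example.com/your-org/webapp.git webapp
  git submodule add https://example.com/your-org/worker.git worker
  git commit -m "Track webapp and worker as submodules"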

This brings us to environment logic and microservice interactions, which will become important to your development cycle when you’re building services based on infrastructure as code.

5.2.2. Environment logic and microservices

CoreOS is a platform that relies on your ability to build some kind of logic around the way it expresses the state of the cluster via etcd. You’ve seen many examples in this book of sidekicks that react to or change this state in some way. This kind of logic can get a little complex in Bash, not to mention difficult to maintain over time. It’s sometimes useful to write things like sidekicks and functions in your applications that respond to this state or write to it. Usually, you can gather more context about your application from within its runtime, which opens up other opportunities for your app to communicate its status with (and use information from) etcd.
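For example, using etcd’s v2 REST interface directly (the key path is hypothetical; the port is etcd’s default client port), an application can publish its own status with a plain HTTP PUT:

  # Write a status key with a TTL so it expires on its own if the application
  # stops refreshing it.
  curl -s -X PUT "http://127.0.0.1:2379/v2/keys/services/webapp/status" \
    -d value="healthy" -d ttl=60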

There are libraries for etcd in many different programming languages; if your language of choice doesn’t have one, you can always fall back on the simple HTTP REST interface. Before we dive into using these APIs, let’s talk about the process model. Many projects and tools are designed to add a second layer of supervision to your processes; a good example is PM2 for Node.js applications (although PM2 can launch any kind of process). There are plenty of good reasons to use these kinds of systems: for example, they can provide additional monitoring and performance-reporting metrics. But let’s look at what this means in practice, in a process tree:
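(The following is a simplified, pstree-style sketch of the situation, not output captured from a real host; on a real machine you’d also see intermediaries such as docker-containerd.)

  systemd─┬─docker                ← client process the service unit started
          └─dockerd───pm2─┬─node
                          ├─node
                          └─node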

Although this isn’t explicitly stated in the twelve-factor model, it’s useful to try to think about your applications in the context of the scheduler they’re running under, and to understand them as dependencies with their own state. The node processes depend on pm2, and pm2 depends on docker, which loosely depends on dockerd. systemd is left not knowing the state of the node processes; essentially, you’re relying on a second process scheduler. It’s debatable whether the benefits of whatever this second scheduler does outweigh the context lost to the system scheduler, but it’s certainly less complex if only one scheduler is determining how things are run.

Why is this important? If you’re following a microservices model, this setup begins to erode the process isolation that gives you the benefits of loosely coupled systems. It also means you can’t easily derive state from the exit code of a node process in this example. If you have small services doing discrete things, it’s convenient to exit the program with a code that tells the scheduler whether the process should be restarted. For example, if only one of the node processes throws an exception and exits, should they all fail? Or will they fail one by one and be restarted by pm2, with systemd never aware of the context?

You’ll see how to use this pattern in the next chapter. In the web service application, logic watches etcd for a set operation on a key maintained by Couchbase (the database used in the next chapter). When it sees that operation, the process calls exit(0), which tells systemd to restart it, because the change means Couchbase has moved to a different machine. In a microservices architecture, where services are loosely coupled and process startup time is trivial, exiting is usually the best way to reestablish state. This pattern also fits well with treating the initial state as immutable, rather than as something that’s copied and mutated inside the service.
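In chapter 6, that logic lives inside the Node.js application itself; as a rough Bash equivalent of the same pattern (the key name and the Restart= setting are assumptions), a service wrapper could do the following:

  #!/usr/bin/env bash
  # Block until the key the Couchbase sidekick maintains changes; etcd v2's
  # etcdctl watch returns after it observes a single change.
  etcdctl watch /services/couchbase/location

  # Couchbase has moved: exit cleanly so systemd (with Restart=always in the
  # unit) relaunches the service against the new location.
  exit 0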

I could fill many books with discussions of process architectures and state immutability. Ultimately, the implementation is up to you. How strictly you want to follow these models may or may not be up to you as an implementer of services on CoreOS, but you should be aware of how those choices affect the complexity of the overall system.

5.2.3. The application edge

The last consideration for a successful deployment is something that falls a bit out of scope for this book: how to expose the edge of your application to the world. This will be specific to your application, your organization, and your chosen platform for your infrastructure.

The last item on the checklist should cover the “what” of the items you need to expose, and the construction of that component is probably coupled fairly tightly to what you choose for your edge. Load balancers, DNS, external logging and alerting systems, policies and reporting, and backup/recovery procedures are all part of the edge of your system as a whole. They may be completely separate systems, some of which you can deploy with CoreOS in their own right. Deciding how this hierarchy works is usually a larger organizational question (enterprise architecture), but you’ll want to be sure these top-level components have zones of failure and scale vectors separate from those of the stack you’re responsible for deploying.

5.3. Summary

  • Apply the twelve-factor model to your application stack, where possible.
  • Make a high-level checklist for your architecture, starting with the one included in section 5.1.2.
  • Have a clear mapping of dependencies between services.
  • The application edge is the ultimate goal. It’s often useful to work on the architecture design both from the internal application requirements and from the outside expectations of the product.