After discussing API design and implementation details, it is now time to discuss how APIs and microservice architecture fit together. This topic has been popular for quite some time and enterprises have started to move towards this design pattern. The content of this chapter is based on questions and discussions with customers. Martin Fowler’s article “Microservices” is available for anyone who wants to learn more about microservice architecture1 in general.
A microservice architecture refers to a design pattern that emphasizes the idea of having APIs be self-contained and serve one purpose only. Each API should be deployable through an automated process. An application may use multiple APIs that are grouped by business purpose. A microservice architecture should result in a system that is highly fault tolerant, scalable, and maintainable, that is easy to deploy, and that allows single APIs to be added and removed.
What Is the Difference Between APIs and Microservices?
At first glance, APIs and microservices (“services” for short) appear to be the same thing with different names. Both receive requests and produce expected responses. The external view of an API or a service gives no hint as to which one it is. The differences lie in their internals: implementation details, deployment model, dependencies, and the scope of features they serve. In this chapter, I will refer to APIs as the old way of doing things and to microservices (or services) as the new way of doing things.
APIs may be implemented on a server that hosts many other, non-related APIs, too. APIs receive requests and handle them, but they may also send requests to other APIs to complete their tasks. Unfortunately, when hosted on the same server, some APIs access other APIs' resources directly, for example, by connecting to another API's database. This type of intercommunication is a recipe for expensive maintenance in every possible way. The pattern is not uncommon, and it has caused many escalations and reduced software upgrades to rare events.
Microservices are built to serve one purpose only. Services that have different business purposes are not colocated on the same server. Services only communicate with other components via documented and provided interfaces.
In Figure 8-1 the application on the right side is based on a services architecture. The application leverages services, but they run on their own servers. An update of one service does not influence the other ones. In addition, one service is dedicated to communicating with the database, and the other services access the database through it. Each service can be scaled horizontally, independent of the others.
The application on the left side is based on an architecture where a single server provides practically all features. In addition, all APIs access the database directly. Updating or replacing one of those APIs or the database schema is difficult. The regression test effort may be huge, depending on the behavior of the APIs. This is a scenario where it may take weeks or even months before an upgrade can be deployed. This is not unusual; I have seen it in customer environments. Having this type of architecture prevents systems from being updated regularly, which means that new features and security updates cannot be made available when they should be.
I recently attended a public event in Vancouver, hosted by a social platform, and the message was, “Our website gets updated up to three times per day, our mobile app once a week. It is very unlikely that two attendees here have the same version of our app!” That was pretty impressive.
It is difficult to top such a dynamic environment, and matching it should not be the first goal when coming from a twice-per-year upgrade rhythm. Having multiple servers, multiple databases, each component communicating with others, everything tangled together, is a tough situation. Being able to update such a system at least once per month is probably a big step ahead already.
The question is, “How can we get from an API-based architecture to a services-based architecture with independent services everywhere?”
The first step is to find out what exists. Often, not even the current state is known. If developers are asked, “What does this API do?”, the answer may be, “Not sure, but it seems to work!” Knowing that these kinds of answers will be given, you should ask the different development teams to create dependency and entity diagrams to explain how their individual systems work. After collecting and tying together the different diagrams, you can get a larger picture, and the existing system starts to become transparent, which is one of the most crucial requirements for this task.
After the system has been documented, including the communication channels between the different entities, a small piece should be identified, ideally a piece of the system that serves one business purpose only. This should be the guinea pig for the transformation from a monolithic to a services-based application.
Developers should move this service onto its own server. For example, if it is a Java application, it could be deployed into an Apache Tomcat2 or JBoss server.3 As soon as the service is deployable and locally tested, it should be taken into a QA environment where test clients can verify its function. Once that is successful, clients who have been consuming the original service should switch to the new one. Step by step this service can be promoted to different environments. If this promotion is a manual task, this is the right time to start turning it into an automated process, even if it is only a bunch of scripts. It is important to get started!
Note
I sometimes hear people say that automation is not possible. This is usually not true. More often, it simply has not been done before, it is difficult, and it requires changes in processes. No matter what, enabling automation must be a focus in the development, testing, and deployment process!
With some effort, including the automation, developers should find themselves in a situation where a check-in into a version control system (VCS)4 is all it takes to get a new version of a service deployed, or at least built. Getting this done in a test and/or development environment is the first step. It will take some time to figure out the details of how to do (or not to do) things, but it is a good feeling when a test server suddenly hosts an updated version of the code with no manual effort. It also teaches everyone not to break service interfaces without notifying anyone else, because other developers whose services consume them will complain immediately!
Figure 8-2 indicates that a developer (or a group of developers) keeps her work within a development environment. She goes through all tasks that are needed to get the system up and running. Once she is done, she checks her code into the VCS. When this happens, a build server kicks off and executes automated tests, configures the services, and creates artifacts as needed. When this step successfully ends, the build server deploys the service into the target environment’s application server. This server instantiates the artifacts and the updated service becomes available.
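To make this more concrete, here is a minimal sketch of what such a build-server step could look like for a Java service deployed to Tomcat. The repository URL, the use of Maven, and the QA server name are assumptions for illustration only, not a prescription:

#!/bin/sh
# Minimal build-and-deploy sketch, triggered after a check-in into the VCS.
# Assumptions: a Maven-based Java service, a QA server reachable via SSH,
# and a Tomcat installation at /usr/local/tomcat on that server.
set -e
git clone https://vcs.example.com/acme/service.git
cd service
# Run the automated tests and create the artifact (a WAR file in this sketch)
mvn clean package
# Deploy the artifact into the target environment's application server
scp target/service.war qa-server:/usr/local/tomcat/webapps/ROOT.war

A real pipeline would add configuration, versioning of artifacts, and notifications, but the principle stays the same: no manual step between the check-in and the updated test server.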
Figure 8-2 is very simple, but in the end it is always the same process, sometimes with a few more steps than shown, but basically like that. Depending on the environment, the application server may host more than a single service or logical group of services. Due to resource limitations this may not be avoidable, but, regardless, services should not have implicit dependencies on each other.
The automated process enables teams to redeploy services often. A bug was found, it got fixed (and nothing else), tested, checked in, and deployed. In my experience, any manual task that can be eliminated is a step towards automated deployability. Updates do not need to be scheduled months in advance; they may not need to be scheduled at all! As long as interfaces do not change, clients will not need to be updated and can continue to work even with the latest version of the service.
Note
Automated tests have very little value if a failing test raises the question, “Was it the test or the implementation that caused the failure?” This question indicates missing trust in the test system and, with that, in the quality of the tested product itself!
The last few paragraphs overlapped a little with the next section. Nevertheless, once the process of extracting services out of monolithic applications has had its first small success stories, it becomes easier to follow the microservices pattern.
What to Know When Supporting a Microservice Infrastructure
Having the term “infrastructure” in this section's title should indicate that there is more to microservices than just modifying the implementation. As mentioned in the previous section, it should be possible to automate the deployment of services. This requires a CI/CD5 pipeline that avoids as many manual tasks as possible. This is necessary not only to enable automation but also because the team members who will deploy the software are not part of the group of software developers.
To support a well-working CI/CD pipeline, more groups than just developers are required. Network infrastructure experts, security experts, support, operations—all these groups are needed. Over the last two or three years the term DevOps6 was introduced and now refers to the whole process. DevOps emphasizes the fact that development and operations work hand in hand (specifically development, QA, and operations). Each group involved between development and deployment has its own tasks, but at the same time the needs of the other groups are respected. Here are a few examples of what environments look like when that is not the case:
- Your developers implement, test, document, and release software into production environments all by themselves.
- QA is testing software manually.
- Network administrators accompany developers to open up server rooms and provide access to servers so that these developers can manually deploy new versions of software straight into production.
- The database administrator is on stand-by during an upgrade to rescue failed attempts and suggest default values for database configurations.
- You do have operations teams who have received instructions for manual software installations. The instructions assume deep knowledge of the software, which does not exist. After 5.5 hours of following instructions, the process is rolled back due to some undocumented and missing parameters (the procedure to roll back is not documented, so operations must figure it out on the fly).
- QA has never tested the software in a production-like system (the development environment is the same as production anyways …).
- You had to postpone a release due to a sick developer whose knowledge is required during an upgrade.
- Systems have to be taken offline to run the upgrade. SLAs state very clearly how long this may take, and additional periods will result in costly penalties. To reduce the chance of having to pay those penalties, the number of releases is limited to two per year.
If all of the above, or at least a few of them, are true for your current environment, it is a strong indicator that some work lies ahead of you. The work refers not only to implementations but also to changing the mindsets of teams. Current processes have to change!
Developer: I have written all 15 steps you need to follow to install the upgrade. Have a good evening. See you on Monday!
Operations: Ok, I will follow them tonight during the maintenance window.
The instructions say, “Open the installation menu and provide the default username.” Guess what? Operations will already be stuck. They do not know how to open the installation menu, nor are they aware of the default username! This little example is not fake. I witnessed it (not saying who I was in that scenario)!
Several things went wrong:
1. The developer assumed that operations knew how to open the installation menu.
2. The developer assumed that operations knew the default username.
3. Operations did not go through the instructions while the developer was still around.
In larger scenarios there are almost endless possibilities for failure! For that reason, development and operations need to work closely together. For example, after the situation above, operations shared with the developer that they maintain more than 30 systems at the same time. It is impossible for them to be experts on all of those systems and to know the default username for each one of them.
Developer: I have written all 15 steps you need to take to install the upgrade. I also included a script that executes steps 1-6 and 9-11 if you prefer that. Usernames, passwords, locations for menus are all documented. I will be home later, but I have left my phone number for the worst-case scenario.
Operations: Let me just check the instructions … Ok, I got it. Looks good. I will do a dry run right now and give you a call if something is missing. I will use the scripts to reduce the chance of errors caused between the screen and the keyboard. Thanks!
Runbooks
The written instructions are also called runbooks. Runbooks should have straightforward instructions but also cover anything that may happen outside the happy-path deployment process (recovering from errors may even be the most important content). A good runbook is created by teamwork! Operations must be able to install new systems or upgrade existing systems just by following the runbook instructions.
The groups shown may vary, but Figure 8-3 should be more or less accurate for environments that own the complete process.
Developers implement and build software and create a runbook based on their current deployment experiences. This draft is reviewed by operations and used in production-like environments. Their review results in a list of updates and a set of questions and recommendations. Documentation reviews the instructions and applies the feedback. In between, QA verifies that no steps for validating the software’s function are missing. This iterative process ends with a runbook that enables operations to install or upgrade systems with confidence.
Note
The documentation team is not always mentioned in the context of creating a runbook. Nevertheless, technical writers are the ones who can help formulate instructions to be understood in the target language. Developers and QA members often work in environments that use languages other than their native ones. For example, our documentation team turns my German-English into English frequently.
An accepted runbook is the first step towards a working DevOps process. Having this runbook shows that the team understands and respects everyone’s needs. Once this has been established, the next step awaits.
Automating the Runbook!
Yes, automation is the overall goal for the process. Only automated processes permit frequent service deployments with a low risk of failures. Whereas the first runbook is good for deployments that happen once in a while, or for environments with just a few services, the automated runbook is a prerequisite for enterprise-level systems with hundreds of services. To me, this became very obvious when I had lunch with a former colleague who said, “Sascha, I develop the code, I write the unit test, I commit it. That’s it! After a few days, my code runs in production and I have no clue how it got there!” She did know that her code was tested in an automated QA pipeline and reviewed at some point. But the interesting part for me was that developers did not need to know the details of the deployment pipeline (the automated runbook).
Figure 8-4 lists the steps that are considered part of the CI/CD pipeline. It is an endless, ever-repeating circle. The left half contains the tasks and asks for operations (Ops), and the right half the tasks and asks for development (Dev), which also includes QA. The image also indicates the hand-off from Dev to Ops. Development has no role on the operations side, which emphasizes the need for a process that does not require a developer to be available when a system gets released!
Note
In Figure 8-4, monitor is a little special and needs attention. Monitoring any deployment is highly important; monitoring is the only way of knowing how the system performs. Operations needs to be able to collect metrics and analytics and to have a view into the current state. Comprehensive monitoring capabilities should be an acceptance criterion for any deployment!
To summarize this section, supporting a microservices infrastructure requires an automated CI/CD pipeline. It requires investment in tooling, education, and a change of mentality. It is just as important as a strong foundation when constructing a house.
How Does Docker Help?
The previous section discussed CI/CD and DevOps. It spoke about (automated) runbooks. In traditional environments, application servers run and never stop (ideally). Software installations or upgrades are executed on those servers. It is the same process for each supported environment. In addition, developers often need their own, local instance to speed up development without breaking tests or builds that others are running at the same time. It is a huge effort to keep all these servers up and running and to configure them all the same way or, at least, similarly to each other.
Docker7 is a technology that helps simplify this situation. Docker has the concept of containers, where a container serves one particular purpose and its content is referred to as a docker image. Like containers on ships, containers can be stacked and replaced and do not influence others. On the other hand, multiple containers may form one application. Imagine a construction site. Sometimes you’ll see containers stacked on top of and next to each other, and each container is different. Although each container serves a different purpose (restroom, office), together they represent a complete construction site management building. Please note that Docker was chosen because it is very popular and because I have personally used it. But it is not the only container solution out there!8
Having these pictures in mind helps explain why Docker is relevant in the CI/CD and DevOps realm. Figure 8-1 displayed how services run on their own servers. When that figure was discussed, the message was that each service runs on its own server. With Docker, this changes slightly. There is no running server into which a new service gets deployed. A service brings its own server! Furthermore, containers should be ephemeral, which means they appear and disappear without leaving a trace or persisting data. Here is an example.
Without Docker: A developer creates a runbook. One area of the runbook explains how to upgrade software within a running application server. Another area explains how to set up a new application server and how to deploy new software into it. The automated runbook may do this without requiring manual effort. However, the new application server and the new software most likely need some sort of configuration, too. To make the complete chain of tasks work, the runbook not only needs to discuss the actual pieces of software; in addition, prerequisites have to be specified to match the requirements of the application server and the software within it.
With Docker: The following is a very simple example, but launching docker containers generally looks similar. In this case, the application acme/app, version 1.0, will be deployed!
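A minimal sketch of what the launch could look like; the exact flags depend on the image, and running in detached mode (-d) is an assumption:

# Start a container from the acme/app image, version 1.0, in the background
docker run -d acme/app:1.0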
Regardless of the fact that this example is simple, the question is, “How does that one statement replace potentially many instructions in a runbook?” To be honest, they are not replaced! But they are executed at a different point in time and by the developers themselves. This is where the automation story becomes relevant again. Here is another example.
1. Download Apache Tomcat:
https://tomcat.apache.org/download-90.cgi
2. Install Tomcat at /usr/local/tomcat.
3. Remove the example web applications:
rm -rf /usr/local/tomcat/webapps/*
4. Copy my project into the web applications directory:
cp add-ons/web /usr/local/tomcat/webapps/ROOT
5. … many more …
This continues, line by line, until all my requirements have been addressed. If another instance has to be prepared and launched, the same steps have to be executed. It is hopefully obvious that this process is very error prone, especially if executed by a team that does not work with Tomcat in detail. And even if all those lines were moved into a script, the script could still fail!
With docker images, the situation improves:
- Runbooks for operations can be simplified.
- Runbooks reference docker images that are already tested.
- Operations do not need to have any knowledge about Tomcat itself.
Instead, the developers (or their build pipeline) go through these steps:
1. Create a new docker image.
2. Tag the new docker image (provide a useful name).
3. Push the new image to a repository.
4. Launch a new container using the new image.
It works like this:
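As a minimal sketch, assuming a Dockerfile that encodes the Tomcat setup steps from above, and using hypothetical image and registry names, the four steps could look like this:

# 1. Create a new docker image from the Dockerfile in the current directory
docker build -t acme/app .
# 2. Tag the new docker image with a useful name and version
docker tag acme/app registry.example.com/acme/app:1.0
# 3. Push the new image to a repository
docker push registry.example.com/acme/app:1.0
# 4. Launch a new container using the new image
docker run -d --name acme-app -p 8080:8080 registry.example.com/acme/app:1.0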
As of now, the previous runbook only requires the docker run command from above. The image has certainly been tested and deployed into staging environments beforehand to verify its functionality.
Although this sounds very good, and it is very good, there are a few differences in comparison to traditional runbook procedures. For me personally, this is the main differentiator:
Containers are ephemeral!
1. Modifications against running containers are lost when the container stops. Containers may even be immutable; in that case, modifications would not even be possible!
2. Modifications against containers are valid only as long as they are running.
3. Containers do not persist data by default (which includes configurations).
4. Launching multiple instances creates duplicates of each other. Some resources may be available only once (e.g., ports).
5. Each container instance requires the same resources (e.g., memory).
Especially the fact that even configurations are transient may raise the concern that a different image has to be built for each configuration. For example, a container in the development environment may access a local database, whereas the same container in production connects to a database hosted in a cloud environment. Fortunately, that is not necessary, as shown below.
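Here is a sketch of how the same image could be started in two environments by mounting an environment-specific configuration file. The image name, container names, and the file prod_server.xml are hypothetical; dev_server.xml and the Tomcat path come from the example:

# Development: mount a local configuration file over the image's default
docker run -d --name app-dev -v $(pwd)/dev_server.xml:/usr/local/tomcat/conf/server.xml acme/app:1.0
# Production: the same image, started with the production configuration
docker run -d --name app-prod -v $(pwd)/prod_server.xml:/usr/local/tomcat/conf/server.xml acme/app:1.0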
The local file dev_server.xml would overwrite the file /usr/local/tomcat/conf/server.xml of the Tomcat image.
After a few seconds, those two containers are available.
To sum up this section, leveraging Docker has many advantages. However, running software in Docker at an enterprise scale requires more than just creating the docker images themselves. The infrastructure for that has to be provided, knowledge has to be available, and success and error cases have to be managed just the same way. Platforms such as Red Hat OpenShift10 or Microsoft Azure for Docker11 should be evaluated as Docker management platforms.
Summary
Turning an existing monolithic-style application into a microservice architecture is a challenge. Meeting this challenge has great benefits, but it cannot be done without the commitment of all teams, including business owners. At the end of the transformation, new versions of software systems can be deployed frequently and with a reduced risk of failures.