
3. Anatomy of a Microservice

Akos Hochrein, Berlin, Germany

Now that we have a vague idea of what a microservice looks like from a bird’s-eye view, it is time for us to zoom in and take a closer look at the anatomy of various services and how they interact with each other internally. To make our architecture easy to understand, we are going to focus on three major categories, so that reasoning about the entire system becomes a bit easier.

First, we are going to look at the dimension of where in your architecture a specific service can be located. We are going to examine three types:
  • Frontend services

  • Mixed services

  • Backend services

Let’s start from what users don’t see. You will see why.

Backend Services

Every system has components that users don’t interact with directly, only through many layers of abstraction. You can, for example, imagine a state machine that calculates how the user is represented in the current system. Are they a former customer, or are they currently active? Perhaps they have never used a paid feature on your website before, yet for marketing purposes it is important to store their data.

Backend services are there to provide the backbone of your application. Most of them encapsulate the core business logic of the company. They often provide data or perform data transformations, and they are more likely to be providers of data to other services than consumers of it.

Designing pure backend applications sometimes feels like a trivial challenge, and we will go over a couple of examples from our pizza application to make sure that we understand what this means. In Figure 3-1 you can see the pizza microservice connected to a data store that contains the models we defined in the previous chapter under the pizza Django application.
Figure 3-1

The tizza application backend systems imagined

Well, at least some of them, as we will see in a couple of chapters. Another service that you can see in the diagram is the auth service, which we will use for all user-related information and authentication. Some teams also use a service like auth for authorization; depending on your taste, you can move that layer to a separate part of your architecture as well. Keep in mind, though, that data in the same or a similar business domain should often stay close together.

One thing that is definitely worth mentioning here is that the design of these services is driven both by the data that they host and by the domain that they work in. A very common mistake when building microservices is to create services that exist only to host a single specific data type. When different types of data lie in similar business domains, they should live close to each other physically and logically. This is, however, always a tough call to make. Here are a couple of cases that you can debate with your colleagues during lunch break:
  • Should pizzas and pizzerias be hosted on the same service in the same data store or not? What if we start storing toppings for pizzas? Would your opinion change in that case?

  • Where should we store permissions? Should it be bound to the pizzas or bound to the users of our system?

  • Where should we store likes?

There are multiple good answers to all of the questions above. My recommendation is to think of it this way: if the data is uncoupled or only loosely coupled, then you’re safe to break it up. If it’s tightly coupled, try to measure how tight the coupling is. You can, for example, measure how many times you query the different resources together from your data store; it might help you make the decision.

Keeping all the data in the same place can work for your company for some time. As a matter of fact, keeping data in the same place will speed up your operations significantly in the beginning. However, after a while, the issues and stories that we mentioned in Chapter 1 will come up again and again. It’s a real shame if a single, not particularly critical table in your database fills up your storage and causes an outage to your core business. On the other hand, moving everything apart will remove your ability to use joins and other fast operations that happen at the storage level; you will need to fetch data from different sources and connect it manually, sometimes writing inefficient queries.

Some of this talk might give you the idea of duplicating data amongst your various storages. Let’s do a quick detour around this topic.

A Note on Data Duplication

Now that we’ve talked a lot about how the data serving services work, I’d like to take a short detour to talk about data duplication between the different data stores that you’re going to be working with when you migrate to microservices.

Data duplication can come very naturally when working with microservices. After all, you _do_ need the email address in your service, right? Why not store it when the user is created, so you can be confident that this data is available to you at all times?

Thinking like this can be very deceptive. When you’re working with microservices (and really, with anything in software), the one thing that you always want to reduce is maintenance work. When you introduce a new database, table, or even just a field in your service, you’re creating ownership and maintenance work for yourself. In the email example mentioned above, you need to make sure that the email always stays up to date, meaning that if your user changes it in the auth service, you need to change it in your own service as well! When the customer exercises their right to be forgotten, you need to make sure that you remove or anonymize the email address in your data store, too. In the long run this can cause a lot of inconsistencies and headaches.

Keeping data consistent across many systems is a very difficult problem. Database engineers have been fighting the CAP theorem for decades, creating techniques like hinted handoff and sloppy quorums to achieve eventual consistency across database replicas. Is it worth implementing complex consistency algorithms like these in your application?

As you can tell, I am not a huge fan of data duplication. Naturally, there are situations where you cannot avoid it; however, I usually recommend the following alternatives:
  • Virtual objects: Why do you need to store the entire user object if you can store an identifier with which you can query that object from another system?

  • Client- and server-side caching: Think about the data you’re working with. How important is it for it to be up to date? The owner service of the data can easily implement a caching layer, and the same can happen on the client side as well!

Think about alternatives before you start copying data from other services. It might cost you dearly in the long run.
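To make the virtual-object idea concrete, here is a minimal sketch in Python. The auth service URL, the endpoint path, and the module-level cache are all illustrative assumptions; a production setup would use a proper cache backend and error handling.

import time

import requests

AUTH_SERVICE_URL = "http://auth.internal"  # hypothetical internal host
CACHE_TTL = 60  # seconds; tune to how fresh the data needs to be
_email_cache = {}  # user_id -> (email, fetched_at); stand-in for a real cache

def get_user_email(user_id):
    """Fetch the email from the owning service instead of duplicating it."""
    cached = _email_cache.get(user_id)
    if cached and time.time() - cached[1] < CACHE_TTL:
        return cached[0]
    response = requests.get(f"{AUTH_SERVICE_URL}/api/v1/users/{user_id}/")
    response.raise_for_status()
    email = response.json()["email"]
    _email_cache[user_id] = (email, time.time())
    return email

This way the auth service remains the single owner of the email address, and your service only ever stores the user identifier.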

Now that we have a good understanding of where and how our data will be stored, let’s take a look at service types that will consume them.

Frontend Services

Frontend services exist to encapsulate the front-end applications rendered on the users’ machines. At first glance their existence might not make a lot of sense; however, there are a couple of reasons why designing services that are (almost) fully frontend might make sense for you and your teams:
  • Separation of concerns - you might remember (or still work with) MVC models and the benefits of separating the various parts of your application. In a way, you can think of frontend services as the “view” layer of your MVC. Developers can specialize in working with these services, only using the interfaces of other services and interacting with data that they don’t own.

  • Separate tooling - if different teams work on front-end services, there will be different tooling around them as well, with more specialized people for this field. Not everyone who is familiar with Gradle is familiar with Webpack. However, this doesn’t necessarily mean that they cannot learn from each other!

Front-end services can consume data directly from backend services, or from systems that integrate the data provided by the backend services into a more digestible format defined by specific business logic. Let’s take a look at these mixed services.

Mixed Services

As per the philosophy of SoA, sometimes we need systems that do just one thing for our business and do it right. Engineers without a narrow frontend or backend specialization need to take care of these services, and it is entirely possible that these business components are not tied strictly to the engineering department. The main focus of this book will be on backend and mixed services.

When ownership, or a lack of people to maintain separate systems, dictates it, we can consider building what I like to call “mixed services”. In the wild, they are sometimes referred to as backend-for-frontend services, or BFFs.

Mixed services have frontend and backend components wired together to achieve a single business goal. Let’s take a look at an example before we jump into the code.

Let’s imagine a world where, in a distant future, we become the technical leads of one of the most important teams at tizza: the tizza-admin team. Our mission is to make sure that all pizza creators can easily manage their pizzas and can run marketing campaigns inside the application itself. They need a single-page app to make the experience smooth. After reading the specification, the following questions might arise:
  1. There’s a lot of data going on here; where should we get it from?

  2. Should we call all service endpoints separately from the frontend?

  3. What about mobile? Can it handle all the data?
All of these are valid questions that every full-stack (and non-full-stack) engineer should ask themselves when building single-page applications with multiple data sources. The first thing that we don’t want to do is connect to the existing databases directly (we will have more reasoning about this later in the chapter), so we will confine ourselves to calling APIs. Here we have the option of calling the endpoints of our data (in this case, for example, we need the list of pizzas, the permissions, campaign options, and payment details, amongst many other things) from separate data sources or from a single one. With the power of event loops and threads, we can easily implement the first option to fetch all the information in parallel; however, we are using up a lot of network bandwidth.

Why is this an important question? In 2017, 63% of all network traffic in the United States came from mobile devices, much of it over mobile networks. Mobile networks are fickle little beings. They are flaky and weak, the round-trip time is abysmal, and people take them to places where the Sun only rarely shines, which makes network bandwidth optimization one of the top priorities we need to consider as engineers.

Changing the currently existing endpoints to support partial response payloads might be a bit of a hassle, so here comes the idea of a service that aggregates the data for us and returns it in a compact response. The drawback? We have introduced an extra network call through the BFF.
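A minimal sketch of such an aggregating view is shown below. The service URLs, endpoint paths, and response fields are assumptions for illustration; a real BFF would make these calls in parallel and handle partial failures.

import requests
from django.http import JsonResponse

PIZZA_SERVICE_URL = "http://pizza.internal"  # hypothetical internal hosts
AUTH_SERVICE_URL = "http://auth.internal"

def admin_dashboard(request, pizzeria_id):
    """Aggregate several backend responses into one compact payload."""
    pizzas = requests.get(
        f"{PIZZA_SERVICE_URL}/api/v1/pizzerias/{pizzeria_id}/pizzas/"
    ).json()
    permissions = requests.get(
        f"{AUTH_SERVICE_URL}/api/v1/users/{request.user.id}/permissions/"
    ).json()
    # Return only what the single-page app actually needs
    return JsonResponse({
        "pizzas": [{"id": p["id"], "title": p["title"]} for p in pizzas],
        "can_edit": "pizzeria.edit" in permissions,
    })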

With the separate service comes another beautiful thing: ownership. BFFs are usually the parts of your system that hold the most business logic, making them the perfect candidates for ownership by product teams.

Now that we are familiar with the basic concepts of how we categorize microservices, we are going to dive into what the high-level architecture of a service is supposed to look like.

Design Principles

We are going to take a look at methodologies like the SOLID principles - which were originally used with monolithic codebases to manage code complexity - and how they provide a useful way of thinking about services. We are also going to take a look at a couple of common design patterns that emerge during service design.

Keep in mind that the examples we are about to look at in this part should be taken with a grain of salt. These are not patterns that will solve all difficulties when designing services. Keep an open mind during your implementation and focus on your business problems when integrating these principles into your systems.

SOLID Building Blocks

Some of you might have heard about the SOLID principles, formulated and popularized by legendary software engineers like Robert C. Martin and Sandi Metz. If you haven’t, this might be a very eye-opening little section.

The SOLID principles are essentially guidelines on how to design your code and code architecture so that you spend the least amount of time on feature development and maintenance in the future. We will briefly go through the five principles with some examples. If you’d like to read more about them, I highly recommend Clean Architecture by Robert C. Martin as reading material. These principles are not strictly related to microservice design, but I’ve found a great deal of inspiration in them while thinking about the systems that my team and I were building. Also, understanding and applying them (where needed) will objectively make you a better programmer.

1. Single responsibility principle - States that a member of your system (a class, a method, or even an entire microservice!) should only have a single reason to change. What does this mean? Think about a function that is responsible for fetching data from a data store and displaying it on a web UI. This component has two reasons to change. First, if the data or the data store it reads from changes, like adding a new column to a database table. Second, if the format in which it displays the data changes, like allowing `json` as well as `xml` responses. Ideally, you’d like to keep these layers separate, since they become easier to reason about that way.
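As a hedged sketch of that separation (the data-store interface here is hypothetical): one function owns the fetching, another owns the formatting, and each has exactly one reason to change.

import json

def fetch_pizzas(data_store):
    # Changes only when the data or the data store changes
    return data_store.query("SELECT id, title FROM pizzas")

def render_pizzas(pizzas, fmt="json"):
    # Changes only when the response format changes
    if fmt == "json":
        return json.dumps(pizzas)
    raise ValueError(f"unsupported format: {fmt}")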

2. Open-closed principle - States that parts of your system should be open to extension, but closed to modification. Now, this doesn’t mean that you should write code that is impossible to change and fix in the future; rather, it means that if you’d like to add new functionality to your software, you should not need to change the already existing code to do so.
def pizzas(request):
    if request.method != 'GET':
        # we are post (I guess)
        return update_pizzas(request)
    else:
        return get_pizzas(request)
Listing 3-1

Not conforming to the open-closed principle

Adding a new method type to the above code requires serious modifications:
def pizzas(request):
    if request.method != 'GET' and request.method != 'PUT':
        # still post! (I guess)
        return update_pizzas(request)
    elif request.method == 'PUT':
        return create_pizzas(request)
    else:
        return get_pizzas(request)
Listing 3-2

Still not conforming to the open-closed principle

Instead, consider the following (still not the best, but will suffice):
from django.http import HttpResponseNotAllowed

PIZZA_METHOD_ROUTER = {
    'GET': get_pizzas,
    'PUT': create_pizzas,
    'POST': update_pizzas,
}

def pizzas(request):
    # Adding a new method only requires a new entry in the dictionary
    handler = PIZZA_METHOD_ROUTER.get(request.method)
    if handler is None:
        return HttpResponseNotAllowed(PIZZA_METHOD_ROUTER.keys())
    return handler(request)
Listing 3-3

Conforming to the open-closed principle

3. Liskov substitution principle - States that if you have types in your program that have subtypes, instances of said types should be replaceable by their subtypes without breaking your program. One of the more object-oriented principles of the five, it essentially states that the abstractions in your code should be replaceable by their concrete implementations when required, ensuring the correctness of the system in the long run. I’ve found that the Liskov substitution principle is quite easy to follow if the engineer uses an IDE that tells them when they are breaking the rules of the superclass. One additional thing that makes it much easier to follow this principle is to minimize the use of metaprogramming, which we will get into later in the book.
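A small illustration with made-up classes: the subclass keeps the superclass’s signature and return type, so any code that holds a PizzaStore keeps working when handed a CachedPizzaStore.

class PizzaStore:
    def get_pizza(self, pizza_id: int) -> dict:
        return {"id": pizza_id, "title": "Margherita"}

class CachedPizzaStore(PizzaStore):
    def __init__(self):
        self._cache = {}

    def get_pizza(self, pizza_id: int) -> dict:
        # Same contract as the superclass, only faster on repeated calls
        if pizza_id not in self._cache:
            self._cache[pizza_id] = super().get_pizza(pizza_id)
        return self._cache[pizza_id]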

4. Interface segregation principle - States that many client-specific interfaces are better than a few big abstract ones with many functionalities. In other words, you don’t want your clients to depend on things that they don’t need. This is a very important principle in modern software engineering that is often ignored, and it is basically the idea behind the service-oriented architecture principles.

Imagine that you’re a backend developer. Your job is to write pristine, multi-purpose APIs that hundreds of internal and external clients use every single minute. Your interfaces have grown throughout the years into massive monsters; some of them have no limits on the amount of data they return about the customer. Everything from the first name to the number of restaurants they’ve visited, with the list of friends for each visit, is returned in the response every single time. Now, it’s possible that this is easy for you: the database is sitting right beneath you, and with clever queries on your MySQL cluster you were able to keep the APIs blazing fast. However, the mobile teams suddenly start complaining. They are saying that you cannot possibly expect customers to download hundreds of kilobytes of data each time they open the application! It is true; the massive APIs would definitely be better off sharded into smaller ones. This way the data that is queried is more specific, and the refactoring and extension of such backend services will be faster. When building APIs, always start from the client!
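In code terms, the idea looks something like this hypothetical sketch: instead of one fat customer interface, each client depends only on the slice it actually needs.

from abc import ABC, abstractmethod

class CustomerProfileReader(ABC):
    """The slim interface a mobile client depends on."""

    @abstractmethod
    def get_profile(self, customer_id: int) -> dict:
        """Return only the basics: name, avatar, and the like."""

class CustomerVisitHistoryReader(ABC):
    """The heavy interface, only for clients that really need it."""

    @abstractmethod
    def get_visits(self, customer_id: int) -> list:
        """Return the full visit history with friends per visit."""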

5. Dependency inversion principle - States that systems should depend upon abstractions, not concretions. Probably the most famous of the five, it basically states that you should be using clearly defined interfaces in your code and your components should depend on those. This way, you give yourself flexibility in the implementation layer.

Microservices are - or are supposed to be - all about the dependency inversion principle. In an ideal world, systems communicate using contracts, such as API definitions, to make sure that every service is on the same page about what sort of data is produced and consumed. Sadly, the real world is not always littered with such sunshine and happiness, but we are going to take a look at methodologies with which we can aspire to this.
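A minimal in-code sketch of the principle, with hypothetical names: the business logic depends on an abstract notifier, and the concrete email (or SMS, or push) implementation can be swapped without touching it.

from abc import ABC, abstractmethod

class Notifier(ABC):
    @abstractmethod
    def send(self, recipient: str, message: str) -> None: ...

class EmailNotifier(Notifier):
    def send(self, recipient: str, message: str) -> None:
        print(f"emailing {recipient}: {message}")  # stand-in for a provider call

def notify_pizzeria_owner(notifier: Notifier, owner: str) -> None:
    # Depends on the abstraction, not on any concrete provider
    notifier.send(owner, "Your pizzeria got a new like!")

notify_pizzeria_owner(EmailNotifier(), "owner@tizza.example")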

One thing that people often forget about microservice design is that it is not a license to write bad code and follow bad design patterns at the low level. Make sure that you’re proud of the system you design at both low and high levels of abstraction, and that each service is not just replaceable, but maintainable as well.

The 12 Factors

One of the more popular service design methodologies is following the rules of the 12-factor app. Originally authored by Adam Wiggins and later forked and maintained by Heroku, the 12-factor app is a collection of microservice design methodologies that gives us 12 points to follow in order to build a scalable and maintainable service. These methodologies cover a much wider spectrum than this book can cover in depth, so I recommend reading more at 12factor.net.

1. There should be one codebase tracked in the revision control system, deployed multiple times

I think nowadays there are not too many codebases that are not tracked with a revision control system, such as Git or Subversion. If you’re one of the people who have not adopted these technologies, I heavily recommend checking them out and integrating them into your workflow. An app should consist of one codebase with one or more deployments. In object-oriented terms, you can think of your codebase as a class and a deployment as an instance of that class, with various parameters that enable it to run in production, development, or test environments.

Your codebase can have different versions in different deployments. For example, your local development deployment can be running a different version of the codebase while you are building the application.

2. Dependencies should be isolated and explicitly declared

As we are going to learn in later parts of the book, dependency management is one of the biggest and most difficult questions in building microservices. The second of the 12 rules gives us a couple of rules of thumb that we can follow to get started.

In the Python world we usually use pip, combined with requirements or setup files, as a dependency manager. This rule dictates that all your dependencies should have pinned versions. What does this mean? Imagine the following situation: you’re using package A in your application with an unpinned version. Everything goes completely fine until a critical security vulnerability is found in the package and you never get notified of it. Furthermore, the only maintainer of the project disappeared 8 months ago, leaving all your user data stolen. Now, this might sound like an extreme situation, but if you’ve ever worked with dependency managers like npm and version indicators like ^ and ~, you know what I am talking about. Stay on the safe side and use == for your dependencies.
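For example, a requirements.txt with every version pinned might look like this (the packages and versions are illustrative):

Django==2.2.8
requests==2.22.0
psycopg2-binary==2.8.4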

3. Store configurations in the environment

In order to adhere to rule #1, we need to store deployment-dependent configurations separately from the deployment itself. Deployment-dependent configurations can be many things, and they are often essential for your application to run. We are referring to variables such as these (a small settings sketch follows the list):
  • Database and other external system’s URIs

  • Credentials

  • Settings for logging and monitoring
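In a Django settings module, this can be a handful of lines reading from the environment; a minimal sketch with illustrative variable names:

# settings.py - deployment-specific values come from the environment
import os

DATABASE_URI = os.environ["DATABASE_URI"]  # fail fast if it is missing
EMAIL_API_KEY = os.environ["EMAIL_API_KEY"]
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")  # sensible default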

4. Treat external services as resources

External services can vary from databases and caches to mail services, or even completely internal applications that provide some sort of service to your application. These systems need to be treated as resources, meaning that your application should support changing their origin on demand. The application should make no distinction between local and third-party services.

Imagine the following situation: there is a massive marketing campaign coming up and your third-party email provider just cannot take the load. Upgrading your plan might take some time, but spinning up a new (higher-throughput plan) application at the third party seems like a viable and quick solution. A 12-factor app should be able to handle the switch without much trouble, as it doesn’t care about the environment it’s sitting in, only the configuration it’s using. In this example, changing the application’s auth credentials saved the day.

5. The non-development deployment creation should support the build - release - run cycle

A 12-factor app separates deployment creation into three separate stages:
  • Build - When your code and dependencies are assembled into an executable.

  • Release - When your assembled executable gets combined with the environment configs, creating a release ready for execution in a given environment.

  • Run - The assembled executable and configs now run in the given environment.

Why is it so important that we split this process into stages? It’s a very good question, and the simplest answer I can give is being able to reason about the application. Imagine the following: there is a critical bug in production in your payment system. The team immediately starts looking at the application code in your version management system, checking the most recent commits while the revert is happening. Nothing indicated that the issue should have happened in the first place, yet the team made the decision not to re-release the broken version until the bug was found. Only days later did the team learn that an engineer had made changes directly to the production code of the payment system. This is the sort of situation that the 12-factor rules would like to avoid.

Now, the above problem is quite difficult to solve without proper security restrictions on your production systems; however, there are tools to discourage engineers from doing this in the first place. For instance, you can use a proper release management system where rollbacks of applications are simple, such as `helm` for Kubernetes. In addition, all of your releases should have a version and a timestamp attached, preferably stored in a changelog (we are going to dive deeper into these sorts of systems in later chapters).

6. 12 factor apps are stateless processes

12-factor applications assume that nothing will be stored long-term on the disk or in memory next to the main application. The reason for this is, again, being able to reason about the application and the bugs it might be associated with in the future. Naturally, this doesn’t mean that you cannot use the memory at all; it is recommended to think of it as a cache for a single request. If you store many things in memory and your process restarts for some reason (such as a new deployment), you will lose all that data, which might not be beneficial to your business.

Data that persists across requests, such as user sessions, should still be kept in some sort of data store, which in this case can be a cache. Later in the book we are going to explore some Python packages and frameworks, like asyncio and aiohttp, where it’s very easy to enter the danger zone of storing your request data in memory and losing it altogether during a process restart.
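As a hedged illustration using Django’s cache framework (the key layout is made up): state that must survive across requests goes to a shared backing store, never into process memory.

from django.core.cache import cache

# Anti-pattern: process-local state, wiped on every restart or deploy
recently_viewed = {}

def remember_view(user_id, pizza_id):
    # Stored in the configured cache backend (e.g., memcached or Redis),
    # so any process can read it and a restart does not lose it
    cache.set(f"viewed:{user_id}:{pizza_id}", True, timeout=3600)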

7. Export your services using port-binding

A little bit more web-development specific (but hey, most of this book is about that), this rule dictates that the application should be entirely self-contained: it should not depend on the runtime injection of a web server, but should export its interfaces by binding to a port and serving requests through it.

In our case, Django will take care of all of this.
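To see what port binding means without any framework magic, here is a self-contained WSGI sketch using only the standard library; in a Django deployment, the project’s wsgi.py plus a server like Gunicorn plays the same role.

from wsgiref.simple_server import make_server

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from a self-contained, port-bound process\n"]

if __name__ == "__main__":
    # The app exports its interface by binding a port itself
    make_server("0.0.0.0", 8000, application).serve_forever()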

8. Scale out using processes

The base unit of every application should be the process, which you can think of along the lines of Unix service daemons. Various types of processes should be designed to handle various types of workload. Longer-running tasks with heavier computation might require workers or other asynchronous processes, whereas HTTP requests might be handled by web processes.

This doesn’t mean that threads are discouraged inside your process; in the case of Python, your application can absolutely utilize the `threading` library or `asyncio`. On the other hand, your application needs to be able to scale out as multiple processes running on the same machine or on multiple physical and/or virtual machines.

Make sure not to overcomplicate things at the operating system level; just use the standard tools for managing your processes, like `systemd` or `supervisor`.

Point number 6 enables this.
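With Gunicorn, for instance, scaling out by processes is purely a configuration concern. A sketch of a gunicorn.conf.py follows; Gunicorn config files are plain Python, and the worker formula is a common rule of thumb rather than a law.

# gunicorn.conf.py
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # a common starting point
threads = 2  # threads inside each process are still allowed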

9. Processes should be easy to spin up and dispose of

Have no attachment to the processes of your 12-factor apps; they should be just as easy to get rid of as they are to create, at a moment’s notice. This comes with a couple of requirements, though:
  • Startup should be fast - processes should take just a few seconds to start up. This is needed for simplified scaling and a fast release process. Achieving this can be quite tricky. You should make sure that there are no expensive operations when your application is loading - such as remote calls to separate web servers. If you’re importing many modules, you might want to look into the lazy-loading facilities of importlib (such as importlib.util.LazyLoader).

  • Shutdown should be graceful - when your process receives a SIGINT (or a SIGTERM) from the operating system, it should make sure that everything shuts down in order, meaning that the running process/request finishes, and the network connections and file handlers are closed. In the case of Django, the selected WSGI server is going to take care of this for you.
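For a standalone worker process, a graceful shutdown sketch might look like the following; the work loop is a stand-in for whatever your process actually does.

import signal
import time

shutting_down = False

def request_shutdown(signum, frame):
    global shutting_down
    shutting_down = True  # finish the current item, then exit cleanly

signal.signal(signal.SIGTERM, request_shutdown)
signal.signal(signal.SIGINT, request_shutdown)

while not shutting_down:
    time.sleep(1)  # stand-in for processing one unit of work
print("connections closed, exiting cleanly")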

10. Keep development and production as close as possible

Make sure that the code running in production is as close as possible to the code running on the development machines. Just to avoid a misunderstanding: by closeness, we mean the difference between the versions of the application running. According to the 12-factor methodology, you will need to work on three “gaps” to achieve this:
  • Time: The actual time that it takes a developer to deliver a feature to production - whether this is currently days or weeks for you and your company, the goal is to reduce it to hours or even minutes.

  • Personnel: The time it takes for an ops engineer to deploy the code - developers of 12-factor apps should be able to be involved in the deployment process and monitor the application without needing an ops engineer.

  • Tools: The tools that are used during development versus the tools that are used in production (e.g., databases, remote systems, etc.) - keep the development and production tooling as close as possible.

You might think that most of these are easier said than done. A decade ago it was almost impossible to imagine services being deployed without ops people in place, in a matter of minutes. Most continuous integration and deployment systems were built by hand using various scripts collected from the ops people who had gotten bored of running `rsync`s every time someone changed something in the codebase. Today there are entire industries and technology branches devoted to making the deployment experience faster, simpler, and less error-prone. There are systems that can simply hook up to your Git repository and offer automated deployments to your clusters, such as AWS CodePipeline, CircleCI, or Jenkins.

Note

If you’re not familiar with continuous integration (CI) or continuous deployment (CD) pipelines, I recommend doing some reading on it. There are excellent resources found on devops.com.

Regarding the tooling: today, in the age of containerization, there are multiple tools that you and your developers can use to simplify this. Before we take a glance at them, let’s take a look at why this is important.

Imagine the following situation: a developer of yours is working on a very complicated query that your system’s ORM cannot handle, so they decide to use a raw query for the solution. The developer fires up their local system and starts building the query on their local SQLite database. After a couple of days, the multi-hundred-line query is complete, covered with automated and manual tests; everything works like a charm. The developer gets the approvals on their pull request, and after deployment your monitoring system alerts the team that the feature is not operational. After some debugging, the developer comes to the conclusion that there was a syntax difference between their local SQLite database and the Postgres running in production that they hadn’t known about.

In the past it made sense to run lightweight backing services in your local development deployment, since the resources on your machine were usually limited and expensive. Today, with the monster development machines that we use, this is no longer an issue. The other problem could be the availability of these types of backing services. Maintaining a Postgres instance on your local machine might seem tedious - and it is, if you don’t have the tooling support that is provided today through the power of virtualization and especially containerization.

Setting up a Postgres database on your local machine is as easy today as writing a Docker Compose file that looks something like this:
version: '3'
services:
  postgres:
    image: postgres:11.6  # pin a real tag; 11.6.1 does not exist
    environment:
      POSTGRES_PASSWORD: localdev  # recent postgres images require this
    ports:
      - "5432:5432"
Listing 3-4

Sample yaml to spin up a database with Docker Compose

There are no more excuses! Make sure to use a similar ecosystem in all your deploys, to reduce the type of errors detailed above.

11. Logs should be managed by something else

This point is quite simple. A 12-factor app should not concern itself with managing and writing various log files; it should treat all logs as an event stream that is written to `stdout`. This makes development quite easy, since on their local machine the developer can see exactly what events are happening in the application, speeding up the debugging process.

In staging and production environments, the streams are collected by the execution environment, and then shipped for viewing and/or archival. These destinations should not be configurable by the 12 factor application.

Nowadays there are dozens of great logging solutions at your disposal. If you are unsure where to start, I recommend checking out Logstash, Graylog or Flume.
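In Django, a minimal LOGGING configuration that sends everything to stdout and leaves collection to the environment could look like this sketch:

# settings.py - write logs to stdout, let the platform collect the stream
import sys

LOGGING = {
    "version": 1,
    "handlers": {
        "stdout": {"class": "logging.StreamHandler", "stream": sys.stdout},
    },
    "root": {"handlers": ["stdout"], "level": "INFO"},
}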

12. Run your administrative processes as one-off processes

Oftentimes developers need to run manual processes or scripts for maintenance purposes on a 12-factor app. Some examples include:
  • manage.py migrate for database migrations on your Django application

  • One-time scripts to patch user data in the database

  • manage.py shell for getting a Python shell to inspect the application state and the database

These processes should be run in an environment identical to the one where the long-running processes of the app are running. They require the same codebase and the same configuration, and the admin code must ship with the application code to the various environments.
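In Django, the natural home for such one-off admin code is a management command that ships with the application; a hypothetical sketch (the model and field names are made up):

# tizza/management/commands/patch_user_data.py
from django.core.management.base import BaseCommand

from tizza.models import UserProfile  # hypothetical model

class Command(BaseCommand):
    help = "One-off patch: fill in missing display names"

    def handle(self, *args, **options):
        # Runs with the same codebase and configuration as the web processes
        updated = UserProfile.objects.filter(
            display_name=""
        ).update(display_name="anonymous")
        self.stdout.write(f"patched {updated} rows")

You would run it with manage.py patch_user_data in the target environment, just like any other process of the app.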

Conclusion

Now that we have gone through the rules of the 12-factor application, we should have a fair idea of what a performant microservice looks like. Ideally, you have heard of most of these points and think of them as worthy additions to your arsenal for designing services. There will be some parts of this book where we are going to observe how these rules can be broken in ways that might be considered acceptable, due to the development or business constraints you have. Wherever we break the rules of the 12 factors, I will let you know, and you can evaluate for yourself whether it’s worth it or not.

We’ve embraced a couple of high-level design philosophies about what our services should look like from a bird’s-eye view. Now we are going to zoom in and learn how they should communicate with each other.
