Chapter 14: Understanding Distributed Tracing

In this chapter, we will learn how to use distributed tracing to better understand how our microservices cooperate, for example, in fulfilling a request sent to the external API. Being able to utilize distributed tracing is essential for managing a system landscape of cooperating microservices. As already described in Chapter 8, Introduction to Spring Cloud, Spring Cloud Sleuth will be used to collect trace information, and Zipkin will be used for the storage and visualization of that trace information.

In this chapter, we will learn about the following topics:

  • Introducing distributed tracing with Spring Cloud Sleuth and Zipkin.
  • How to add distributed tracing to the source code.
  • How to perform distributed tracing, visualizing both successful and unsuccessful API requests. We will see how both synchronous and asynchronous processing can be visualized.
  • How to use either RabbitMQ or Kafka to send trace events from our microservices to the Zipkin server.

Technical requirements

For instructions on how to install the tools used in this book and how to access the source code for this book, see:

  • Chapter 21 for macOS
  • Chapter 22 for Windows

The code examples in this chapter all come from the source code in $BOOK_HOME/Chapter14.

If you want to view the changes applied to the source code in this chapter, that is, see what it took to add distributed tracing using Spring Cloud Sleuth and Zipkin, you can compare it with the source code for Chapter 13, Improving Resilience Using Resilience4j. You can use your favorite diff tool and compare the two folders, $BOOK_HOME/Chapter13 and $BOOK_HOME/Chapter14.
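
For example, assuming both folders are in place under $BOOK_HOME, a plain recursive diff like the following will list the files that have been added or changed (any graphical diff tool works just as well):

diff -r $BOOK_HOME/Chapter13 $BOOK_HOME/Chapter14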

Introducing distributed tracing with Spring Cloud Sleuth and Zipkin

To recapitulate Chapter 8, Introduction to Spring Cloud, in reference to the Using Spring Cloud Sleuth and Zipkin for distributed tracing section, the tracing information from a complete workflow is called a trace or a trace tree, and sub-parts of the tree, for example, the basic units of work, are called spans. Spans can consist of sub-spans forming the trace tree. The Zipkin UI can visualize a trace tree and its spans as follows:

Figure 14.1: Example of a trace with its spans

Spring Cloud Sleuth can send trace information to Zipkin either synchronously over HTTP, or asynchronously using a message broker such as RabbitMQ or Kafka. To avoid creating runtime dependencies on the Zipkin server from the microservices, it is preferable to send trace information to Zipkin asynchronously using either RabbitMQ or Kafka. This is illustrated in the following diagram:

Figure 14.2: Sending trace information to Zipkin using a message broker
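
For comparison, if trace information were instead sent synchronously over HTTP, no message broker configuration would be needed. A minimal sketch of that variant, assuming the Zipkin server is reachable on the hostname zipkin and its default port 9411 (this variant is not used in this chapter), would look like this:

spring.zipkin.sender.type: web
spring.zipkin.base-url: http://zipkin:9411

In the rest of this chapter, we will stick to the asynchronous alternatives, rabbit and kafka.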

Zipkin comes with native support for storing trace information either in memory, or in a database such as Apache Cassandra, Elasticsearch, or MySQL. Added to this, a number of extensions are available. For details, refer to https://zipkin.io/pages/extensions_choices.html. In this chapter, we will store the trace information in memory.

With Zipkin introduced and placed in the system landscape, let's see what changes are required in the source code to enable distributed tracing.

Adding distributed tracing to the source code

In this section, we will learn how to update the source code to enable distributed tracing using Spring Cloud Sleuth and Zipkin. This can be done with the following steps:

  1. Add dependencies to the build files to bring in Spring Cloud Sleuth and the capability of sending trace information to Zipkin
  2. Add dependencies on RabbitMQ and Kafka for the projects that haven't used them before, that is, the Spring Cloud projects authorization-server, eureka-server, and gateway
  3. Configure the microservices to send trace information to Zipkin using either RabbitMQ or Kafka
  4. Add a Zipkin server to the Docker Compose files
  5. Add the kafka Spring profile in docker-compose-kafka.yml to the Spring Cloud projects authorization-server, eureka-server, and gateway

To run the Zipkin server as a Docker container, we will use a Docker image published by the Zipkin project. Refer to https://hub.docker.com/r/openzipkin/zipkin for details.
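
If you want to try out the Docker image on its own before it is wired into the system landscape, it can be started standalone with in-memory storage; this is just a sketch, not part of the chapter's setup:

docker run --rm -p 9411:9411 openzipkin/zipkin:2.23.2

Once started, the Zipkin UI is available at http://localhost:9411/zipkin/.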

Adding dependencies to build files

To be able to utilize Spring Cloud Sleuth and the ability to send trace information to Zipkin, we need to add a couple of dependencies to the Gradle project build files, build.gradle.

This is accomplished by adding the following two lines:

implementation 'org.springframework.cloud:spring-cloud-starter-sleuth'
implementation 'org.springframework.cloud:spring-cloud-sleuth-zipkin'

For the Gradle projects that haven't used RabbitMQ and Kafka before, that is, the Spring Cloud projects authorization-server, eureka-server, and gateway, the following dependencies have also been added:

implementation 'org.springframework.cloud:spring-cloud-starter-stream-rabbit'
implementation 'org.springframework.cloud:spring-cloud-starter-stream-kafka'
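
Putting these together, a sketch of the dependencies block in, for example, the gateway project's build.gradle could look as follows; it assumes, as in previous chapters, that the Spring Cloud BOM is imported elsewhere in the build file, so no explicit versions are needed:

dependencies {
    // Spring Cloud Sleuth: creates and propagates trace and span IDs
    implementation 'org.springframework.cloud:spring-cloud-starter-sleuth'
    // Reporter that forwards the collected trace information to Zipkin
    implementation 'org.springframework.cloud:spring-cloud-sleuth-zipkin'
    // Spring Cloud Stream binders, used to send trace information over a message broker
    implementation 'org.springframework.cloud:spring-cloud-starter-stream-rabbit'
    implementation 'org.springframework.cloud:spring-cloud-starter-stream-kafka'
}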

Adding configuration for Spring Cloud Sleuth and Zipkin

Configuration for using Spring Cloud Sleuth and Zipkin is added to the common configuration file, config-repo/application.yml. In the default profile, it is specified that trace information will be sent to Zipkin using RabbitMQ:

spring.zipkin.sender.type: rabbit

By default, Spring Cloud Sleuth only sends 10% of the traces to Zipkin. To ensure that all traces are sent to Zipkin, the following property is added in the default profile:

spring.sleuth.sampler.probability: 1.0

When sending traces to Zipkin using Kafka, the Spring profile kafka will be used. In the kafka profile, we override the setting in the default profile so that trace information is sent to Zipkin using Kafka:

--- 
spring.config.activate.on-profile: kafka

spring.zipkin.sender.type: kafka

Finally, the Gateway service needs a parameter in the configuration file config-repo/gateway.yml to enable Sleuth to track trace IDs correctly:

spring.sleuth.reactor.instrumentation-type: decorate-on-last

For details, see: https://docs.spring.io/spring-cloud-sleuth/docs/3.0.1/reference/html/integrations.html#sleuth-reactor-integration.

Adding Zipkin to the Docker Compose files

As we mentioned previously, the Zipkin server is added to the Docker Compose files using an already existing Docker image, openzipkin/zipkin, published by the Zipkin project. In docker-compose.yml and docker-compose-partitions.yml, where RabbitMQ is used, the definition of the Zipkin server appears as follows:

  zipkin:
    image: openzipkin/zipkin:2.23.2
    mem_limit: 1024m
    environment:
      - RABBIT_ADDRESSES=rabbitmq
      - STORAGE_TYPE=mem
    ports:
      - 9411:9411
    depends_on:
      rabbitmq:
        condition: service_healthy

Let's explain the preceding source code:

  • The version of the Docker image, openzipkin/zipkin, is specified to be version 2.23.2.
  • The RABBIT_ADDRESSES=rabbitmq environment variable is used to specify that Zipkin will receive trace information using RabbitMQ and that Zipkin will connect to RabbitMQ using the hostname rabbitmq.
  • The STORAGE_TYPE=mem environment variable is used to specify that Zipkin will keep all trace information in memory.
  • The memory limit for Zipkin is increased to 1,024 MB, compared to 512 MB for all other containers. The reason for this is that since Zipkin is configured to keep all trace information in memory, it will consume more memory than the other containers after a while.
  • Zipkin exposes the HTTP port 9411 for web browsers to access its web user interface.
  • Docker will wait to start up the Zipkin server until the RabbitMQ service reports being healthy to Docker.

While it is OK to store the trace information in Zipkin in memory for development and test activities, Zipkin should be configured to store trace information in a database such as Apache Cassandra, Elasticsearch, or MySQL in a production environment.
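
As an illustration of what such a setup could look like, the following sketch configures the Zipkin container to use Elasticsearch for storage; the elasticsearch hostname and the ES_HOSTS value are assumptions, and this definition is not part of the chapter's source code:

  zipkin:
    image: openzipkin/zipkin:2.23.2
    mem_limit: 1024m
    environment:
      - RABBIT_ADDRESSES=rabbitmq
      - STORAGE_TYPE=elasticsearch
      - ES_HOSTS=http://elasticsearch:9200
    ports:
      - 9411:9411

With a persistent storage backend, the trace information also survives restarts of the Zipkin container.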

In docker-compose-kafka.yml, where Kafka is used, the definition of the Zipkin server appears as follows:

  zipkin:
    image: openzipkin/zipkin:2.23.2
    mem_limit: 1024m
    environment:
      - STORAGE_TYPE=mem
      - KAFKA_BOOTSTRAP_SERVERS=kafka:9092
    ports:
      - 9411:9411
    depends_on:
      - kafka

Let's explain the preceding source code in detail:

  • The configuration for using Zipkin together with Kafka is similar to the configuration we just saw for using Zipkin with RabbitMQ.
  • The main difference is the use of the KAFKA_BOOTSTRAP_SERVERS=kafka:9092 environment variable, which is used to specify that Zipkin will use Kafka to receive trace information and that Zipkin will connect to Kafka using the hostname kafka and the port 9092.
  • Docker will wait to start up the Zipkin server until the Kafka service has been started.

In docker-compose-kafka.yml, the kafka Spring profile is added to the Spring Cloud services eureka, gateway, and auth-server:

    environment:
      - SPRING_PROFILES_ACTIVE=docker,kafka
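
To see this in context, here is a sketch of one of the affected service definitions, using the gateway as an example; the build path and port mapping follow the conventions used in earlier chapters and should be treated as assumptions, and other environment variables used by the service are omitted:

  gateway:
    build: spring-cloud/gateway
    mem_limit: 512m
    ports:
      - "8443:8443"
    environment:
      - SPRING_PROFILES_ACTIVE=docker,kafka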

That's what it takes to add distributed tracing using Spring Cloud Sleuth and Zipkin, so let's try it out in the next section!

Trying out distributed tracing

With the necessary changes to the source code in place, we can try out distributed tracing. We will do this by performing the following steps:

  1. Build, start, and verify the system landscape with RabbitMQ as the queue manager.
  2. Send a successful API request and see what trace information we can find in Zipkin related to this API request.
  3. Send an unsuccessful API request and see what error information we can find.
  4. Send a successful API request that triggers asynchronous processing and see how its trace information is represented.
  5. Investigate how we can monitor trace information that's passed to Zipkin in RabbitMQ.
  6. Switch the queue manager to Kafka and repeat the preceding steps.

We will discuss these steps in detail in the upcoming sections.

Starting up the system landscape with RabbitMQ as the queue manager

Let's start up the system landscape. Build the Docker images with the following commands:

cd $BOOK_HOME/Chapter14
./gradlew build && docker-compose build

Start the system landscape in Docker and run the usual tests with the following command:

./test-em-all.bash start

Before we can call the API, we need an access token. Run the following commands to acquire an access token:

unset ACCESS_TOKEN
ACCESS_TOKEN=$(curl -k https://writer:secret@localhost:8443/oauth2/token -d grant_type=client_credentials -s | jq -r .access_token)
echo $ACCESS_TOKEN

As noticed in previous chapters, an access token issued by the authorization server is valid for one hour. So, if you start to get 401 Unauthorized errors after a while, it is probably time to acquire a new access token.

Sending a successful API request

Now, we are ready to send a normal request to the API. Run the following command:

curl -H "Authorization: Bearer $ACCESS_TOKEN" -k https://localhost:8443/product-composite/1 -w "%{http_code}
" -o /dev/null -s

Expect the command to return the HTTP status code for success, 200.

We can now launch the Zipkin UI to look into what trace information has been sent to Zipkin:

  1. Open the following URL in your web browser: http://localhost:9411/zipkin/.
  2. To find the trace information for our request, we can search for traces that have passed through the gateway service. Perform the following steps:
    1. Click on the large plus sign (white + sign on red background) and select serviceName and then gateway.
    2. Click on the RUN QUERY button.
    3. Click on the Start Time header to see the results ordered by latest first (a down arrow should be visible to the left of the Start Time header).

    The response from finding traces should look like the following screenshot:

    Figure 14.3: Searching for distributed traces using Zipkin

  3. The trace information from our preceding API request is the first one in the list. Click on its SHOW button to see details pertaining to the trace:

    Figure 14.4: Sample distributed trace visualized in Zipkin

    In the detailed trace information view, we can observe the following:

    1. The request was received by the gateway service.
    2. The gateway service delegated the processing of the request to the product-composite service.
    3. The product-composite service, in turn, sent three parallel requests to the core services: product, recommendation, and review.
    4. Once the product-composite service received the response from all three core services, it created a composite response and sent it back to the caller through the gateway service.
    5. In the details view to the right, we can see the HTTP path of the actual request we sent: /product-composite/1.

Sending an unsuccessful API request

Let's see what the trace information looks like if we make an unsuccessful API request; for example, searching for a product that does not exist:

  1. Send an API request for product ID 12345 and verify that it returns the HTTP status code for Not Found, 404:
    curl -H "Authorization: Bearer $ACCESS_TOKEN" -k https://localhost:8443/product-composite/12345 -w "%{http_code}
    " -o /dev/null -s
    
  2. In the Zipkin UI, go back to the search page (use the back button in the web browser) and click on the RUN QUERY button again. To see the results ordered by latest first, click on the Start Time header. Expect a result similar to the following screenshot:

    Figure 14.5: Finding a failed request using Zipkin

  3. You should see the failed request at the top of the returned list. Note that its duration bar is red, indicating that an error has occurred. Click on its SHOW button to see details:

Figure 14.6: Viewing a trace of a failed request using Zipkin

Here, we can see the request path that caused the error, /product-composite/12345, as well as the error code: 404 (Not Found). The color coding in red indicates that it is the request to the product service that caused the error. This is very useful information when analyzing the root cause of a failure!

Sending an API request that triggers asynchronous processing

The third type of request that is interesting to see represented in the Zipkin UI is a request where parts of its processing are done asynchronously. Let's try a delete request, where the delete process in the core services is done asynchronously. The product-composite service sends a delete event to each of the three core services over the message broker and each core service picks up the delete event and processes it asynchronously. Thanks to Spring Cloud Sleuth, trace information is added to the events that are sent to the message broker, resulting in a coherent view of the total processing of the delete request.

Run the following command to delete the product with a product ID of 12345 and verify that it returns the HTTP status code for success, 200:

curl -X DELETE -H "Authorization: Bearer $ACCESS_TOKEN" -k https://localhost:8443/product-composite/12345 -w "%{http_code}\n" -o /dev/null -s

Remember that the delete operation is idempotent, that is, it will succeed even if the product doesn't exist!

In the Zipkin UI, go back to the search page (use the back button in the web browser) and click on the RUN QUERY button again. To see the results ordered by latest first, click on the Start Time header. Expect a result similar to the following screenshot:

Figure 14.7: Finding a delete request using Zipkin

You should see the delete request at the top of the returned list. Note that the root service name, gateway, is suffixed by the HTTP method used, delete. Click on its SHOW button to see details:

Figure 14.8: Viewing a delete request using Zipkin

Here, we can see the trace information for processing the delete request:

  1. The request was received by the gateway service.
  2. The gateway service delegated the processing of the request to the product-composite service.
  3. The product-composite service, in turn, published three events on the message broker (RabbitMQ, in this case).
  4. The product-composite service is now done and returns a success HTTP status code, 200, through the gateway service back to the caller.
  5. The core services (product, recommendation, and review) receive the delete events and start to process them asynchronously, that is, independently of one another.

To confirm the involvement of the message broker, click on the first product span:

Figure 14.9: Viewing information about the asynchronous processing of an event using Zipkin

The selected span has a rather unhelpful name, unknown. But in the Tags section of the selected span, shown to the right, we can see more interesting information. Here, we can see that the product service was triggered by a message delivered on its input channel, products. We can also see the name of the message broker, broker, in the Broker Address field.

The Zipkin UI contains much more functionality for finding traces of interest!

To get more accustomed to the Zipkin UI, try out the query functionality by clicking on the plus sign and selecting tagQuery. For example, to find traces that failed with a Not Found (404) error, set its value to error and http.status_code=404. Also, try setting limits for the lookback range (start and end time) and the maximum number of hits by clicking on the gear icon to the right of the RUN QUERY button.
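
The same kind of queries can also be made programmatically against Zipkin's HTTP API, which is handy in scripts. A minimal sketch, assuming the standard /api/v2/traces endpoint on port 9411 (the parameter values below are just examples):

curl -s "http://localhost:9411/api/v2/traces?serviceName=gateway&annotationQuery=http.status_code=404&limit=10" | jq .

The response is a JSON array of traces, where each trace is an array of its spans.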

Monitoring trace information passed to Zipkin in RabbitMQ

To monitor trace information sent to Zipkin over RabbitMQ, we can use the RabbitMQ management Web UI. Trace messages are sent to Zipkin using a queue named zipkin. To monitor messages sent through this queue, open the following URL in your web browser: http://localhost:15672/#/queues/%2F/zipkin. If required, log in using the username "guest" and the password "guest". Expect a web page that looks like the following:

Figure 14.10: Trace records sent through RabbitMQ

In the graph named Message rates, we can see that trace messages are sent to Zipkin, currently at an average rate of 1 message per second.
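
If you prefer the command line, similar numbers can be fetched from the RabbitMQ management HTTP API. The following sketch uses the default guest/guest credentials and jq; note that the exact field names in the response may vary between RabbitMQ versions:

curl -s -u guest:guest http://localhost:15672/api/queues/%2F/zipkin | jq '.message_stats.publish_details.rate'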

Wrap up the tests of distributed tracing using RabbitMQ by bringing down the system landscape. Run the following command:

docker-compose down

Using Kafka as a message broker

Let's also verify that we can send trace information to Zipkin using Kafka instead of RabbitMQ!

Start up the system landscape using the following commands:

export COMPOSE_FILE=docker-compose-kafka.yml
./test-em-all.bash start

Repeat the commands we performed in the previous sections, where we used RabbitMQ, and verify that you can see the same trace information in the Zipkin UI when using Kafka.

Kafka doesn't come with a management web UI like RabbitMQ. Therefore, we need to run a few Kafka commands to be able to verify that the trace events were passed to the Zipkin server using Kafka:

For a recap on how to run Kafka commands when running Kafka as a Docker container, refer to the Using Kafka with two partitions per topic section in Chapter 7, Developing Reactive Microservices.

  1. First, list the available topics in Kafka:
    docker-compose exec kafka /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper --list
    

    Expect to find a topic named zipkin:

    Figure 14.11: Finding the Zipkin topic in Kafka

  2. Next, ask for trace events that were sent to the zipkin topic:
    docker-compose exec kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic zipkin --from-beginning --timeout-ms 1000
    

    Expect a lot of events similar to the following:

    Figure 14.12: Viewing a lot of trace events in the Zipkin topic in Kafka

The details of a trace event are not important. The Zipkin server sorts that out for us and makes the information presentable in the Zipkin UI. The important point here is that we can see that the trace events actually were sent to the Zipkin server using Kafka.
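
If all you need is a rough count of how many trace messages have been produced, the consumer output can be piped to wc -l. Note that each message may contain more than one span, and that the consumer's own status output may add a line or two to the count:

docker-compose exec kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic zipkin --from-beginning --timeout-ms 1000 2>/dev/null | wc -l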

Now, bring down the system landscape and unset the COMPOSE_FILE environment variable:

docker-compose down
unset COMPOSE_FILE

That concludes this chapter on distributed tracing!

Summary

In this chapter, we have learned how to use distributed tracing to understand how our microservices cooperate. We have learned how to use Spring Cloud Sleuth to collect trace information, and Zipkin to store and visualize the trace information.

To promote the decoupling of runtime components, we have learned how to configure microservices to send trace information to the Zipkin server asynchronously, using RabbitMQ and Kafka as message brokers. We have seen that adding Spring Cloud Sleuth to the microservices only requires a couple of dependencies in the build files and a few configuration parameters. We have also seen how the Zipkin UI makes it very easy to identify which part of a complex workflow caused either an unexpectedly long response time or an error. Both synchronous and asynchronous workflows can be visualized with the Zipkin UI.

In the next chapter, we will learn about container orchestrators, specifically Kubernetes. We will learn how to use Kubernetes to deploy and manage microservices, while also improving important runtime characteristics such as scalability, high availability, and resilience.

Questions

  1. What configuration parameter is used to control how trace information is sent to Zipkin?
  2. What is the purpose of the spring.sleuth.sampler.probability configuration parameter?
  3. How can you identify the longest-running request after executing the test-em-all.bash test script?
  4. How can we find requests that have been interrupted by the timeout introduced in Chapter 13, Improving Resilience Using Resilience4j?
  5. What does the trace look like for an API request when the circuit breaker introduced in Chapter 13, Improving Resilience Using Resilience4j, is open?
  6. How can we locate APIs that failed on the caller not being authorized to perform the request?