Gaining insight into the service mesh

Visibility into the service mesh, backed by the right tools, is essential if you want to resolve issues quickly. Without such tools, finding the source of a problem becomes time-consuming and expensive. In Chapter 16, Exploring the Reliability Features of Linkerd, we used the Linkerd dashboard to debug a particular route that showed a success rate of less than 100%. Route-level information like this is of great help: it acts as a feedback loop for developers so that they can fix issues.

The Linkerd dashboard (GUI) and the Linkerd CLI (command line) are two essential tools for gaining insight into the service mesh. These tools show key indicators such as live traffic, success rate, routes, latencies, and an overview of traffic flow from individual sources to different targets. These indicators matter for the health and performance of any application that communicates over HTTP or gRPC. They help pinpoint issues much more quickly than going through the logs of different containers.
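To make the success rate indicator concrete, the following is a minimal Python sketch of how such a figure could be derived from request counts. It is an illustration only, not Linkerd's implementation: the `success_rate` function, the route names, and the counts are all hypothetical.

```python
def success_rate(success_count, failure_count):
    """Return the fraction of successful requests, or None if there was no traffic."""
    total = success_count + failure_count
    if total == 0:
        return None  # no requests observed, so the rate is undefined
    return success_count / total


# Hypothetical per-route counts: (successful requests, failed requests).
routes = {
    "GET /books": (98, 2),
    "POST /authors": (50, 0),
}

for route, (ok, failed) in routes.items():
    rate = success_rate(ok, failed)
    print(f"{route}: {rate:.2%}")
```

A route whose rate falls below 100% is a candidate for closer inspection, which is how the route in Chapter 16 was singled out.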

One of Linkerd's salient features is that it reports P50, P95, and P99 latencies, as we explained in Chapter 16, Exploring the Reliability Features of Linkerd. It can report these metrics because the aggregation is done at the proxy level.
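As a reminder of what these percentiles mean, here is a minimal Python sketch that computes P50, P95, and P99 from a list of latency samples using the nearest-rank method. This is a simplification made for illustration: the sample values and the `percentile` helper are hypothetical, and a proxy would typically aggregate observations rather than keep every raw sample.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with two slow outliers.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]

summary = {f"P{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

In this sample the median stays low while P95 and P99 are dominated by the outliers, which is why tail percentiles reveal problems that an average would hide.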

Linkerd also provides a pre-built Grafana dashboard for metrics that are scraped by Prometheus. This Prometheus instance stores data for up to 6 hours, which is enough to give us a quick insight into the service mesh. For long-term history collection, we have to store the data in an external Prometheus backend.

In the next section, we will look at the aforementioned methods in more detail in order to gain insight into the service mesh. Let's begin with the Linkerd command-line interface (CLI).
