A better dashboard for big screens

We explored how to create a dashboard with a graph and a single stat (semaphore). Both are based on similar queries, and the significant difference is in the way they display the results. We'll assume that the primary purpose of the dashboard we started building is to be available on a big screen, visible to many, and not as something we keep open on our laptops. At least, not continuously.

What should be the primary purpose of such a dashboard? Before I answer that question, we'll import a dashboard I created for this chapter.

Please click the + button from the left-hand menu and select Import. Type 9132 as the Grafana.com Dashboard and press the Load button. Select a Prometheus data source. Feel free to change any of the values to suit your needs. Nevertheless, you might want to postpone that until you get more familiar with the dashboard.

In any case, click the Import button once you're finished.

Figure 6-11: Grafana dashboard based on semaphores

You are likely to see one or more red semaphores. That's normal since some of the resources in our cluster are not configured properly. For example, Prometheus is likely to have less memory requested than it needs. That's OK because it allows us to see the dashboard in action. The definitions used in the Gists are not supposed to be production-ready, and you already know that you have to adjust their resources, and likely a few other things.

You'll notice that the dashboard we imported consists only of semaphores. At least, at first glance. Even though they might not be as appealing as graphs and other types of panels, they are much more effective as indicators of the health of our system. We do not need to look at that dashboard. It's enough if it's displayed on a big screen while we work on something else. If one of the boxes turns red, we'll notice that. It will be a call to action. Or, to be more precise, we'll need to do something if a box stays red for a while, which rules out the possibility that it's a false positive that would resolve itself after a few moments.
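To make the "stays red for a while" rule concrete, here is a minimal sketch in Python. It is not part of the dashboard or of Grafana. The function name `needs_action`, the list of observed states, and the default of three consecutive observations are all assumptions made up for this illustration.

```python
def needs_action(states, min_consecutive=3):
    """Return True if the latest `min_consecutive` observations are all red.

    `states` is a hypothetical list of semaphore colors, oldest first,
    as a person glancing at the big screen might record them.
    """
    # Too few observations (or a nonsensical window) means we cannot
    # yet tell a real problem from a short-lived false positive.
    if min_consecutive <= 0 or len(states) < min_consecutive:
        return False
    # Only a red streak that reaches the present counts.
    return all(state == "red" for state in states[-min_consecutive:])


if __name__ == "__main__":
    print(needs_action(["green", "red", "red", "red"]))
    print(needs_action(["red", "red", "green", "red"]))
```

The first call describes a box that turned red and stayed that way, so it asks for action. The second describes a box that recovered in between, so the latest red might still be a blip.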

You can think of this dashboard as a supplement to Prometheus alerts. It does not replace them, since there are some subtle, yet significant differences we'll discuss later.

I won't describe each of the panels since they are a reflection of the Prometheus alerts we created earlier. You should be familiar with them by now. If in doubt, please click on the i icon in the top-left corner of a panel. If the description is not enough, enter the panel's edit mode and check the query and the coloring options.

Please note that the dashboard might not be the perfect fit as-is. You might need to change some of the variable values or the coloring thresholds. For example, the threshold of the Nodes panel is set to 4,5. Judging by the colors, we can see that it'll turn orange (warning) if the number of nodes jumps to four, and red (panic) if it goes to five. Your values are likely to be different. Ideally, we should use variables instead of hard-coded thresholds, but that is currently not possible with Grafana. Variables are not supported everywhere. You, as a supporter of open source projects, should make a PR. Please let me know if you do.
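If the meaning of a threshold like 4,5 is not obvious, the following sketch mimics how the Nodes panel described above picks its color. It is an illustration only, not Grafana's own code. The function name `semaphore_color` and the color names are assumptions, and the comparison rule (at or above the first value is a warning, at or above the second is a panic) is taken from the behavior described in the previous paragraph.

```python
def semaphore_color(value, thresholds="4,5"):
    """Map a single stat value to a semaphore color.

    `thresholds` uses the same comma-separated form as the panel
    setting mentioned in the text: "warning,panic".
    """
    warning, panic = (float(part) for part in thresholds.split(","))
    if value >= panic:
        return "red"      # panic
    if value >= warning:
        return "orange"   # warning
    return "green"        # all good


if __name__ == "__main__":
    for nodes in (3, 4, 5):
        print(nodes, semaphore_color(nodes))
```

If your cluster normally runs with a different number of nodes, you would pass different thresholds (for example, `semaphore_color(7, "6,8")`), which is the same adjustment you would make by hand in the panel's options.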

Does all that mean that all our dashboards should be green and red boxes with a single number inside them? I do believe that semaphores should be the "default" display. When they are green, there's no need for anything else. If that's not the case, we should extend the number of semaphores, instead of cluttering our monitors with random graphs. However, that raises the question. What should we do when some of the boxes turn red or even orange?

Below the boxes, you'll find the Graphs row with additional panels. They are not visible by default for a reason.

There is no justification for seeing them under normal circumstances. But, if one of the semaphores does raise an alert, we can expand Graphs and see more details about the issue.

Figure 6-12: Grafana dashboard based on tables and graphs

The panels inside the Graphs row are a reflection of the panels (semaphores) in the Alerts row. Each graph shows more detailed data related to the single stat from the same location (but a different row). That way, we do not need to waste our time trying to figure out which graph corresponds to the "red box".

Instead, we can jump straight into the corresponding graph. If the semaphore in the second row on the right turns red, look at the graphs in the second row on the right. If multiple boxes turn red, we can take a quick look at related graphs and try to find the relation (if there is any). More often than not, we'll have to switch from Grafana to Prometheus and dig deeper into metrics.

Dashboards like the one in front of you should give us a quick head start towards the resolution of an issue. The semaphores on the top provide an alerting mechanism that should lead to the graphs below, which should give a quick indication of the possible causes of the problem. From there on, if the cause is not an obvious one, we can move to Prometheus and start debugging (if that's the right word).

Dashboards with semaphores should be displayed on big screens around the office. They should provide an indication of a problem. Corresponding graphs (and other panels) provide a first look at the issue. Prometheus serves as the debugging tool we use to dig into metrics until we find the culprit.

We explored a few things that provide similar functionality. Still, it might not be clear what the difference is between Prometheus alerts, semaphores, graph alerts, and Grafana notifications. Why didn't we create any Grafana notifications? We'll explore those and a few other questions next.
