IT Operational Analytics and Root Cause Analysis

Up to this point, we have explained in detail the value of detecting anomalies in metrics and logs separately. That is valuable in its own right. In some cases, however, knowing that a particular metric or log file has gone awry may not tell the whole story of what is going on. The anomaly may, for example, point at a symptom rather than the cause of the problem. To better understand the full scope of an emerging problem, it often helps to look holistically at many aspects of a system or situation, which means analyzing multiple kinds of related datasets together.

In this chapter, we will cover the following topics:

  • Understanding the roles and importance of key performance indicators (KPIs) and other supporting metrics
  • Learning methods to organize, filter, and enrich the data to supply context
  • Exploiting that contextual information via ML jobs that employ data splitting and statistical influencers
  • Combining the anomalies that are created by ML into one common view that enables better triage, collaboration, and resolution
