The Single metric job

The Single metric job allows the analysis of a single time series metric at a time. With Metricbeat running in the background, collecting data about resource utilization on my laptop, it should be easy to find a good single metric candidate for analysis. In fact, as discussed previously, the system.cpu.idle.pct field may be a great place to start.
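
Before building the job, it can be worth a quick sanity check that the field is actually populated in the metricbeat-* indices. The following is a minimal sketch using Python and the requests library; it assumes a local, unsecured Elasticsearch cluster at localhost:9200 (adjust the URL and authentication for your own setup) and simply runs a stats aggregation on system.cpu.idle.pct:

    import requests

    ES = "http://localhost:9200"  # assumed local, unsecured cluster

    # Quick sanity check: does metricbeat-* contain system.cpu.idle.pct,
    # and what do its min/max/avg look like?
    query = {
        "size": 0,
        "aggs": {"idle": {"stats": {"field": "system.cpu.idle.pct"}}}
    }
    resp = requests.get(f"{ES}/metricbeat-*/_search", json=query)
    resp.raise_for_status()
    print(resp.json()["aggregations"]["idle"])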

Let's go back and start the ML job creation again, this time clicking on the Single metric icon:

We are now presented with a screen that looks like the following:

Basically, the user has to make a few choices about how to configure the analysis:

  • Pick an Aggregation function
  • Pick the Field to operate that Aggregation function on
  • Pick the Bucket span or time resolution for the analysis
  • Pick the duration of history over which the ML job should run

Since we want to operate on the system.cpu.idle.pct field, we will pick that, and we will pick the Min aggregation function to spot possible anomalies on the low side of the data (since the lower the idle percentage, the busier the CPU actually is). Because the data is sampled once per minute, a Bucket span of 1m can be fine, but clicking on the Estimate bucket span button might yield a more conservative suggestion. Ultimately, choose a Bucket span that balances running the analysis frequently enough (to spot anomalies in the timeframes that you care about) against running it too frequently. Refer back to Chapter 1, Machine Learning for IT, for a review of the discussion on bucketization.

Lastly, since we probably haven't been running Metricbeat for too long, it would be useful to leverage all of the historical data for this metric as part of the ML job. Selecting the Use full metricbeat-* data button at the top right will modify Kibana's time/date picker to show the entire history of the metric that is stored in Elasticsearch. The resulting screen, after these modifications, could look something like the following:
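
For reference, everything the wizard configures here can also be expressed as an anomaly detection job through Elasticsearch's ML API. The following is a minimal sketch, not the wizard's exact output, assuming a local, unsecured cluster and a hypothetical job name of cpu_idle_min (on older 6.x releases the endpoint lived under _xpack/ml rather than _ml):

    import requests

    ES = "http://localhost:9200"  # assumed local, unsecured cluster

    # Roughly what the Single metric wizard builds: one detector applying
    # min() to system.cpu.idle.pct, with a 1m bucket span.
    job = {
        "description": "Min idle CPU from Metricbeat (single metric)",
        "analysis_config": {
            "bucket_span": "1m",
            "detectors": [
                {"function": "min", "field_name": "system.cpu.idle.pct"}
            ]
        },
        "data_description": {"time_field": "@timestamp"}
    }

    resp = requests.put(f"{ES}/_ml/anomaly_detectors/cpu_idle_min", json=job)
    resp.raise_for_status()
    print(resp.json())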

After naming the job and, optionally, giving it a plain text description and putting it into a job group (which could be useful later), clicking on the Create Job button will show a view of ML in action:
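
Behind the scenes, the wizard also creates a datafeed that pulls documents from metricbeat-* into the job, opens the job, and starts the feed. A rough API equivalent, continuing the hypothetical cpu_idle_min job from the earlier sketch, might look like the following (not starting the datafeed with an end time keeps it running in real time):

    import requests

    ES = "http://localhost:9200"   # assumed local, unsecured cluster
    JOB = "cpu_idle_min"           # hypothetical job name from the earlier sketch

    # The datafeed tells the job which index pattern to read from.
    datafeed = {
        "job_id": JOB,
        "indices": ["metricbeat-*"],
        "query": {"match_all": {}}
    }
    requests.put(f"{ES}/_ml/datafeeds/datafeed-{JOB}", json=datafeed).raise_for_status()

    # Open the job, then run the datafeed over the full history of the index.
    requests.post(f"{ES}/_ml/anomaly_detectors/{JOB}/_open").raise_for_status()
    requests.post(
        f"{ES}/_ml/datafeeds/datafeed-{JOB}/_start",
        params={"start": 0}  # epoch milliseconds 0 = from the beginning
    ).raise_for_status()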

Notice that you will start to see a blue area appearing in the chart, surrounding the actual data values: this is a visual representation of the ML model for this dataset. In this specific example, the shaded model area doesn't appear until the last part of the dataset, because there isn't really enough history on this dataset yet (we could have let Metricbeat collect more data first). Despite that, ML did properly identify (with a vertical yellow bar) the moment when the CPU idle percentage dropped.

Just as a side note for this chapter, after running the job, you will be asked if you want to continue to run the job in real time or create a watch (an alert) for the real-time job, as the following screen shows:

You can enable both options. The first option keeps our ML analysis running in the background in order to detect anomalies in our CPU utilization in real time. The second option relates to ML integration with alerting—a topic we will come back to later in this book.

Click on the View Results button to be brought to the Single Metric Viewer.

As you will see on the chart, the model is learning as it sees data. If there isn't much data to learn from yet, the model may only make a conservative, rough fit around the actual dataset:

This is typical. As more data is collected by Metricbeat, and more data is seen by ML, the model will get more mature and will more accurately identify the dynamic behavioral patterns in the data, as the following chart shows:

The ML model will automatically detect any periodicity in the data. A metric may behave differently during the day than at night, or differently during the week than it does over the weekend. Therefore, the more data you feed Elastic ML, the steadier and more accurate the model gets. Having a large corpus of historical data helps you quickly build a model that is ready for real-time data, without the long wait.

Even though your model isn't that mature yet, you may see anomalies start to appear in the analysis, in the form of dots overlaid on the chart, colored according to severity (a simple score-to-severity mapping is sketched after this list):

  • Warning (blue): Scores less than 25
  • Minor (yellow): Scores between 25 and 50
  • Major (orange): Scores between 50 and 75
  • Critical (red): Scores from 75 to 100
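
If you ever need to reproduce the UI's severity labels in your own tooling, the thresholds above translate into a trivial mapping; here is a small sketch:

    def severity(record_score: float) -> str:
        """Map a normalized anomaly score (0-100) to the severity label
        used by the Kibana ML UI, per the thresholds listed above."""
        if record_score >= 75:
            return "critical"  # red
        if record_score >= 50:
            return "major"     # orange
        if record_score >= 25:
            return "minor"     # yellow
        return "warning"       # blue


    print(severity(47))  # minor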

In my case, I have one minor anomaly, as the following screenshot shows:

This anomaly shows a higher-than-normal utilization of the CPU (because of the low idle), based on past observations of this metric. I can get more detail from the table located at the bottom of the Single Metric Viewer, as seen in the following screenshot:

Notice that this anomaly was given a score of 47. This is a normalized version of the probability assessment of the actual CPU measurement against the ML model of expected values at the time of occurrence, cast onto a scale from 0 to 100. The probability calculation itself was quite low (0.001235); the normalized anomaly score is inversely related to it (the less probable the observation, the higher the score) and here evaluates to 47, a minor anomaly. Keep in mind that anomaly scoring is dynamic and depends on the data and its history. In other words, there is no rule that says a probability calculation of a certain value must equal a certain anomaly score.
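
The same record-level detail shown in that table can also be pulled programmatically from the job's results. The following is a sketch against the hypothetical cpu_idle_min job on a local, unsecured cluster, filtering to records with a score of 25 or more (minor and above):

    import requests

    ES = "http://localhost:9200"  # assumed local, unsecured cluster
    JOB = "cpu_idle_min"          # hypothetical job name used earlier

    resp = requests.get(
        f"{ES}/_ml/anomaly_detectors/{JOB}/results/records",
        json={"record_score": 25, "sort": "record_score", "desc": True}
    )
    resp.raise_for_status()

    # Each record carries both the normalized score and the raw probability.
    for rec in resp.json().get("records", []):
        print(rec["timestamp"], rec["record_score"], rec["probability"], rec["actual"])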

Note that this analysis doesn't show any details of which process is using the CPU at this time; the way that we chose to aggregate all of the CPU measurements together into a single number has, by design, lost this level of detail. If we'd like to have this detail, we would need to separate or split the CPU measurements on a per-process basis. This is exactly the situation where a multi-metric job comes in handy.
