Anatomy of the default watch from the ML UI in Kibana

Once the watch has been created in the ML UI in Kibana, its definition will look something like the listing in the book's GitHub repository: https://github.com/PacktPublishing/Machine-Learning-with-the-Elastic-Stack/blob/master/Chapter06/default_ML_watch.json.

Since this watch is quite lengthy, let's break it down into sections. First, let's look at the trigger section:

{
  "trigger": {
    "schedule": {
      "interval": "109s"
    }
  },

Here, we can see that the interval at which the watch fires in real time is every 109s. This is a random value between 60 and 120 seconds so that, if a node restarts, the watches are not all synchronized; their execution times are spread out more evenly, reducing any potential load on the cluster. It is also important that this interval value be less than or equal to the bucket span of the job; making it larger than the bucket span may cause recently written anomaly records to be missed by the watch. With the interval less than (or even much less than) the bucket span of the job, you can also take advantage of the advance notification that is available when there are interim results, that is, anomalies that can already be determined even though not all of the data within a bucket span has been seen yet.

The input section starts with the query section:

          "query": {
"bool": {
"filter": [
{
"term": {
"job_id": "farequote"
}
},
{
"range": {
"timestamp": {
"gte": "now-30m"
}
}
},
{
"terms": {
"result_type": [
"bucket",
"record",
"influencer"
]
}
}
]
}
},

Here, we are asking Watcher to query (in the .ml-anomalies-* index pattern) for bucket, record, and influencer result documents for a job called farequote in the last 30 minutes (again, the default window is twice the bucket span of the ML job, which was 15 minutes in this example). Although all result types are requested, we will see later that only the bucket-level results are used to evaluate whether or not to create an alert.
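To get a feel for the documents this input returns, you can run a roughly equivalent search yourself against the results indices. The following is a minimal sketch (run from the Kibana Dev Tools console) that looks only at bucket results for the farequote job over the last 30 minutes and sorts them by anomaly score; the size value of 3 is an arbitrary choice for illustration:

GET .ml-anomalies-*/_search
{
  "size": 3,
  "sort": [
    { "anomaly_score": "desc" }
  ],
  "query": {
    "bool": {
      "filter": [
        { "term": { "job_id": "farequote" } },
        { "term": { "result_type": "bucket" } },
        { "range": { "timestamp": { "gte": "now-30m" } } }
      ]
    }
  }
}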

Next comes a series of three aggregations: bucket_results, influencer_results, and record_results. Let's look at each in turn.

The bucket_results aggregation first filters for buckets where the anomaly score is greater than or equal to 75:

          "aggs": {
"bucket_results": {
"filter": {
"range": {
"anomaly_score": {
"gte": 75
}
}
},

Then, a subaggregation asks for the top 1 bucket sorted by anomaly_score:

              "aggs": {
"top_bucket_hits": {
"top_hits": {
"sort": [
{
"anomaly_score": {
"order": "desc"
}
}
],
"_source": {
"includes": [
"job_id",
"result_type",
"timestamp",
"anomaly_score",
"is_interim"
]
},
"size": 1,

Next, still within the top_bucket_hits subaggregation, a series of script fields is defined:

                    "script_fields": {
"start": {
"script": {
"lang": "painless",
"inline": "LocalDateTime.ofEpochSecond((doc["timestamp"].date.getMillis()-((doc["bucket_span"].value * 1000) * params.padding)) / 1000, 0, ZoneOffset.UTC).toString()+":00.000Z"",
"params": {
"padding": 10
}
}
},
"end": {
"script": {
"lang": "painless",
"inline": "LocalDateTime.ofEpochSecond((doc["timestamp"].date.getMillis()+((doc["bucket_span"].value * 1000) * params.padding)) / 1000, 0, ZoneOffset.UTC).toString()+":00.000Z"",
"params": {
"padding": 10
}
}
},
"timestamp_epoch": {
"script": {
"lang": "painless",
"inline": "doc["timestamp"].date.getMillis()/1000"
}
},
"timestamp_iso8601": {
"script": {
"lang": "painless",
"inline": "doc["timestamp"].date"
}
},
"score": {
"script": {
"lang": "painless",
"inline": "Math.round(doc["anomaly_score"].value)"
}
}
}

These newly defined variables will be used by the watch to provide more functionality and context. Some of them merely reformat existing values (score is just a rounded version of anomaly_score), while start and end fill a functional role by defining a start and an end time equal to +/- 10 bucket spans from the time of the anomalous bucket. For example, with the 15-minute bucket span used here, 900 s x 10 = 9,000 s, so start and end fall 2.5 hours before and after the anomalous bucket. This range is later used by the UI to show an appropriate contextual time window before and after the anomaly so that the user can see things more clearly.

The influencer_results and record_results aggregations ask for the top three influencer scores and record scores, but only the output of the record_results aggregation is used in subsequent parts of the watch (and only in the default email text).
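The full definitions are in the GitHub listing, but to give a sense of their shape, the following is a sketch of what the record_results aggregation looks like. The structure mirrors bucket_results; the field names (such as record_score) and the included _source fields are inferred from the standard ML results schema and from the Mustache variables used in the email body shown later, so treat this as illustrative rather than a verbatim copy:

"record_results": {
  "filter": {
    "range": {
      "record_score": {
        "gte": 75
      }
    }
  },
  "aggs": {
    "top_record_hits": {
      "top_hits": {
        "sort": [
          {
            "record_score": {
              "order": "desc"
            }
          }
        ],
        "_source": {
          "includes": [
            "job_id",
            "result_type",
            "timestamp",
            "function",
            "field_name",
            "by_field_value",
            "over_field_value",
            "partition_field_value"
          ]
        },
        "size": 3
      }
    }
  }
}

The rounded score variable referenced in the email body ({{fields.score.0}}) comes from a script field defined inside top_record_hits, analogous to the score script field we saw in the bucket-level aggregation.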

The condition section of the watch is where the input is evaluated to determine whether the actions should be executed. In this case, the condition section is as follows:

  "condition": {
"compare": {
"ctx.payload.aggregations.bucket_results.doc_count": {
"gt": 0
}
}
},

We are using this to check whether the bucket_results aggregation returned any documents, that is, whether doc_count is greater than 0. If the bucket_results aggregation did return a non-zero count, it means there were documents where the anomaly_score was greater than or equal to 75, and the action section will be invoked.

The action section has two parts in our case: one action for logging information to a file and another for sending an email. If the condition section returns true, then both the log action and the email action are invoked:

  "actions": {
"log": {
"logging": {
"level": "info",
"text": "Alert for job [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0._source.job_id}}] at [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.timestamp_iso8601.0}}] score [{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.score.0}}]"
}
},
"send_email": {
"throttle_period_in_millis": 900000,
"email": {
"profile": "standard",
"to": [
"[email protected]"
],
"subject": "ML Watcher Alert",
"body": {
"html": "<html> <body> <strong>Elastic Stack Machine Learning Alert</strong> <br /> <br /> <strong>Job</strong>: {{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0._source.job_id}} <br /> <strong>Time</strong>: {{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.timestamp_iso8601.0}} <br /> <strong>Anomaly score</strong>: {{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.score.0}} <br /> <br /> <a href="http://localhost:5601/app/ml#/explorer/?_g=(ml:(jobIds:!('{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0._source.job_id}}')),refreshInterval:(display:Off,pause:!f,value:0),time:(from:'{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.start.0}}',mode:absolute,to:'{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.end.0}}'))&_a=(filters:!(),mlAnomaliesTable:(intervalValue:auto,thresholdValue:0),mlExplorerSwimlane:(selectedLane:Overall,selectedTime:{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.timestamp_epoch.0}},selectedType:overall),query:(query_string:(analyze_wildcard:!t,query:'*')))"> Click here to open in Anomaly Explorer</a>. <br /> <br /> <strong>Top records:</strong> <br /> {{#ctx.payload.aggregations.record_results.top_record_hits.hits.hits}} {{_source.function}}({{_source.field_name}}) {{_source.by_field_value}} {{_source.over_field_value}} {{_source.partition_field_value}} [{{fields.score.0}}] <br /> {{/ctx.payload.aggregations.record_results.top_record_hits.hits.hits}} </body> </html> "
}
}
}
}

The log section will print a message to an output file, which by default is the Elasticsearch log file. Notice that the syntax of the message uses the templating language called Mustache (so named because of its prolific use of curly braces). Simply put, variables contained in Mustache's double curly braces are substituted with their actual values. As a result, for an example job, the logging text written out to the file may look as follows:

Alert for job [farequote_alert] at [2017-02-12T00:00:00.000Z] score [91]

The email may look as follows:

Elastic Stack Machine Learning Alert 

Job: farequote_alert
Time: 2017-02-09T16:15:00.000Z
Anomaly score: 91
Click here to open in Anomaly Explorer.

Top records:
count() [91]

The format of the alert HTML is clearly oriented toward giving the user a summary of the information while enticing them to investigate further by clicking the link within the email. The URL of this link carries the context, and looking at the URL itself gives us those clues:

http://localhost:5601/app/ml#/explorer/?_g=(ml:(jobIds:!('farequote_alert')),refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2017-02-09T13:45:00.000Z',mode:absolute,to:'2017-02-09T18:45:00.000Z'))&_a=(filters:!(),mlAnomaliesTable:(intervalValue:auto,thresholdValue:0),mlExplorerSwimlane:(selectedLane:Overall,selectedTime:1486656900,selectedType:overall),query:(query_string:(analyze_wildcard:!t,query:'*')))

The job_id, the from and to timestamps, and the epoch timestamp selectedTime correspond to the variables that are filled in via Mustache in the watch definition, with the actual values coming from the scripted fields mentioned in the input section we looked at previously.

Also, it is notable that the top three records are reported in the text of the email response. In our example case, there is only one record (a count detector with a score of 91). This section of information came from the record_results aggregation we described previously in the input section of the watch.

The default watch created by ML is a good, usable alert that summarizes how unusual the dataset is over time, but it is important to understand the implications of using the watch created by the ML user interface in Kibana without modification:

  • The main condition for alerting is the bucket anomaly score exceeding a certain value. Therefore, it will not alert on individual anomalous records within a bucket if their scores do not lift the overall bucket score above the stated threshold.
  • By default, only a maximum of the top three record scores in the bucket are reported in the output, and only if the email action is chosen.
  • The watch would still exist, even if the ML job was deleted. You would need to remember to also delete this watch.
  • The watch's only actions are logging and email. Adding other actions (a Slack message, a webhook, and so on) requires manually editing the watch, as shown in the sketch after this list.
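For example, adding a webhook action alongside the existing log and send_email actions might look something like the following sketch. The host, port, and path shown here are placeholders for whatever endpoint you want to notify, and the body is an arbitrary illustration; only the Mustache variables are taken from the watch above:

"notify_webhook": {
  "webhook": {
    "scheme": "https",
    "host": "hooks.example.com",
    "port": 443,
    "method": "post",
    "path": "/my-alert-endpoint",
    "headers": {
      "Content-Type": "application/json"
    },
    "body": "{\"job\": \"{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0._source.job_id}}\", \"score\": \"{{ctx.payload.aggregations.bucket_results.top_bucket_hits.hits.hits.0.fields.score.0}}\"}"
  }
}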

Given these considerations, it may become necessary at some point to create a more full-featured, complex watch to fully customize its behavior and output. In the next section, we'll discuss some examples of creating a watch from scratch.
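Whichever route you take, you do not have to wait for the trigger schedule (or for a real anomaly) to check a watch's behavior: the Execute Watch API runs a watch on demand. The following is a minimal sketch; the watch ID ml-farequote is an assumption here, so substitute the ID of your own watch (visible in the Watcher UI in Kibana). The ignore_condition flag forces the actions to be evaluated even if no anomalous buckets are found, and the simulate action mode prevents the email from actually being sent:

POST _watcher/watch/ml-farequote/_execute
{
  "ignore_condition": true,
  "action_modes": {
    "_all": "simulate"
  }
}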
