Custom queries for ML jobs

Although it is somewhat hidden unless you configure an Advanced job (or create the job via the API), you do have complete control over the query made against the raw data index to feed the ML job. This is the Query parameter of the ML job config:

The default is {"match_all":{}} (return every record in the index), but just about any valid Elasticsearch DSL query is supported for filtering the data. Composing free-form Elasticsearch DSL in this text field is a little error-prone, however, so a more intuitive approach is to build the query in Kibana via a saved search.
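To make this concrete, the query ultimately lives in the job's datafeed configuration. The following is a hedged sketch using the datafeed API; the datafeed ID and job ID (datafeed-my_job, my_job) are hypothetical placeholders, and the term filter assumes beat.name is mapped as a keyword:

```json
PUT _ml/datafeeds/datafeed-my_job
{
  "job_id": "my_job",
  "indices": ["operational-analytics-metricbeat-*"],
  "query": {
    "term": { "beat.name": "site-search-es1" }
  }
}
```

Replacing the query object here has exactly the same effect as editing the Query parameter in the Advanced job UI.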

For example, let's say that we have an index pattern called operational-analytics-metricbeat-* and that the application we'd like to monitor and analyze runs on three servers: site-search-es1, site-search-es2, and site-search-es3. In Kibana's Discover view, we can build a Filter that selects data only for these hosts by filtering on beat.name being one of a sublist of all of the hosts available in the index:

After the filter is created, Kibana applies it to the original data (you can see that the filtered result set is smaller, containing 21 million records instead of the original 72 million):

Under the hood, this filter, when built with the Kibana UI, ultimately manifests itself as Elasticsearch DSL of the following form:

{ 
  "query": { 
    "bool": { 
      "should": [ 
        { 
          "match_phrase": { 
            "beat.name": "site-search-es-1" 
          } 
        }, 
        { 
          "match_phrase": { 
            "beat.name": "site-search-es-2" 
          } 
        }, 
        { 
          "match_phrase": { 
            "beat.name": "site-search-es-3" 
          } 
        } 
      ], 
      "minimum_should_match": 1 
    } 
  } 
} 
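As an aside, the bool/should construction with "minimum_should_match": 1 behaves like an OR across the three hosts. If beat.name is mapped as a keyword (which is an assumption about the index mapping), an equivalent and more compact hand-written alternative would be a single terms query:

```json
{
  "query": {
    "terms": {
      "beat.name": [
        "site-search-es1",
        "site-search-es2",
        "site-search-es3"
      ]
    }
  }
}
```

Either form returns the same set of documents; the Kibana-generated version is simply what the UI produces when you add hosts one at a time.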

Kibana has done this work for us. It also offers the ability to turn this into a Saved Search, meaning that we can name it and refer to it later. By clicking on the Save menu item, we can give our search the name only_es1_es2_es3:

Now that the search has been saved, when we create an ML job, we can choose this Saved Search as the basis of our job:

As such, our ML job will now only run for the hosts of interest for this specific application. We can also see that if we split the data along the beat.name field, only the three hosts of interest will have their data analyzed by the ML job, as indicated by the visual representation of the split on the right-hand side:
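If you want to confirm exactly which documents the ML job will receive before running it, the datafeed preview API returns a sample of the data after the query has been applied. The datafeed ID below is a hypothetical placeholder matching the earlier sketch:

```json
GET _ml/datafeeds/datafeed-my_job/_preview
```

Only documents matching the saved search's query (here, the three site-search hosts) should appear in the preview output.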

Thus, we have effectively limited and segmented the data analysis to just the hosts that we've defined as contributing to this application.
