Elasticsearch provides a large number of built-in processors, and the list grows with every release. In the preceding examples, we have seen the set and the replace ones. In this recipe, we will cover one of the most used for log analysis: the grok processor, which is well known to Logstash users.
You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
To execute curl via the command line, you need to install curl for your operating system.
To test a grok pattern against some log lines, we will perform the following steps:
curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -d '{ "pipeline": { "description": "Testing grok pattern", "processors": [ { "grok": { "field": "message", "patterns": [ "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ] } } ] }, "docs": [ { "_index": "index", "_type": "type", "_id": "1", "_source": { "message": "155.2.124.3 GET /index.html 15442 0.038" } } ] }'
{ "docs" : [ { "doc" : { "_index" : "index", "_id" : "1", "_type" : "type", "_source" : { "duration" : "0.038", "request" : "/index.html", "method" : "GET", "bytes" : "15442", "client" : "155.2.124.3", "message" : "155.2.124.3 GET /index.html 15442 0.038" }, "_ingest" : { "timestamp" : "2016-12-10T14:42:30.368+0000" } } } ] }
The grok processor allows you to extract structured fields out of a single text field in a document. A grok pattern is like a regular expression that supports aliased expressions that can be reused. It is best known from Logstash, another Elastic product, where it is used for its powerful log data extraction syntax.
Elasticsearch ships with about 120 built-in grok expressions (you can browse them at https://github.com/elastic/elasticsearch/tree/master/modules/ingest-common/src/main/resources/patterns).
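To make the idea of aliased, reusable expressions more concrete, the following Python sketch expands %{NAME:field} aliases into named capture groups of an ordinary regular expression. It is only an illustration of the concept, not how Elasticsearch implements grok: the PATTERNS table holds simplified stand-ins for the built-in definitions, and the helper names grok_to_regex and grok_match are hypothetical.

```python
import re

# Simplified stand-ins for a few built-in grok definitions (assumptions:
# the real Elasticsearch patterns are considerably stricter).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"/\S*",
    "NUMBER": r"\d+(?:\.\d+)?",
}

# Matches an alias of the form %{NAME:field}.
ALIAS = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(expression, definitions=PATTERNS):
    """Expand every %{NAME:field} alias into a named capture group."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return "(?P<%s>%s)" % (field, definitions[name])
    return re.compile(ALIAS.sub(expand, expression))


def grok_match(expression, text, definitions=PATTERNS):
    """Return the extracted fields, or None when the text does not match."""
    found = grok_to_regex(expression, definitions).search(text)
    return found.groupdict() if found else None


fields = grok_match(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
    "%{NUMBER:bytes} %{NUMBER:duration}",
    "155.2.124.3 GET /index.html 15442 0.038",
)
```

Passing a different definitions dictionary plays the role of pattern_definitions: an alias is just a name for a regular expression fragment, so new ones can be supplied alongside the pattern that uses them.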
Defining a grok expression is quite simple, as the syntax is human readable. If we want to extract colors from an expression (pattern) and check if their value is in a subset of RED, GREEN, and BLUE via pattern_definitions, we can define a processor similar to the following:
curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -d '{ "pipeline": { "description" : "custom grok pattern", "processors": [ { "grok": { "field": "message", "patterns": ["my favorite color is %{COLOR:color}"], "pattern_definitions" : { "COLOR" : "RED|GREEN|BLUE" } } } ] }, "docs":[ { "_source": { "message": "my favorite color is RED" } }, { "_source": { "message": "happy fail!!" } } ] }'
The result will be as follows:
{ "docs" : [ { "doc" : { "_index" : "_index", "_id" : "_id", "_type" : "_type", "_source" : { "message" : "my favorite color is RED", "color" : "RED" }, "_ingest" : { "timestamp" : "2016-12-10T15:06:21.823+0000" } } }, { "error" : { "root_cause" : [ { "type" : "exception", "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [happy fail!!]", "header" : { "processor_type" : "grok" } } ], "type" : "exception", "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [happy fail!!]", "caused_by" : { "type" : "illegal_argument_exception", "reason" : "java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [happy fail!!]", "caused_by" : { "type" : "illegal_argument_exception", "reason" : "Provided Grok expressions do not match field value: [happy fail!!]" } }, "header" : { "processor_type" : "grok" } } } ] }
In real applications, an exception thrown by a failing grok processor will prevent your document from being indexed. For this reason, when you design your grok pattern, be sure to test it on a large subset of your log lines.
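One way to test a pattern on many lines at once is to generate the _simulate request body from a list of sample log lines and then scan the response for entries that contain an error. The Python sketch below does only that bookkeeping; sending the request (for example, with curl as shown earlier) is left out. The helper names build_simulate_body and failed_lines are hypothetical, and the sketch assumes that the response lists its results in the same order as the submitted documents, as in the output shown above.

```python
import json


def build_simulate_body(pattern, lines):
    """Build a _simulate request body with one document per log line."""
    return {
        "pipeline": {
            "description": "Testing grok pattern",
            "processors": [
                {"grok": {"field": "message", "patterns": [pattern]}}
            ],
        },
        "docs": [{"_source": {"message": line}} for line in lines],
    }


def failed_lines(lines, response):
    """Pair each input line with its error reason when its simulation failed.

    Assumes response["docs"] is in the same order as the submitted lines.
    """
    failures = []
    for line, result in zip(lines, response["docs"]):
        if "error" in result:
            failures.append((line, result["error"]["reason"]))
    return failures


sample_lines = [
    "155.2.124.3 GET /index.html 15442 0.038",
    "happy fail!!",
]
body = build_simulate_body(
    "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
    "%{NUMBER:bytes} %{NUMBER:duration}",
    sample_lines,
)
# JSON text that can be saved to a file and passed to curl with -d @file.
payload = json.dumps(body, indent=2)
```

Once the response has been parsed with json.loads, failed_lines(sample_lines, response) returns the lines that the pattern did not match, which gives you a list of cases to fix before the pipeline goes into production.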