Grok processor

Elasticsearch provides a large number of built-in processors, and the list grows with every release. In the preceding examples, we have seen the set and replace processors. In this recipe, we will cover one of the processors most commonly used for log analysis: the grok processor, which is well known to Logstash users.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

To test a grok pattern against some log lines, we will perform the following steps:

  1. We execute a call to the _simulate endpoint, passing both the pipeline with our grok processor and a sample document to test the pipeline against:
            curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' \
              -H 'Content-Type: application/json' -d '{
              "pipeline": {
                "description": "Testing grok pattern",
                "processors": [
                  {
                    "grok": {
                      "field": "message",
                      "patterns": [
                        "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
                      ]
                    }
                  }
                ]
              },
              "docs": [
                {
                  "_index": "index",
                  "_type": "type",
                  "_id": "1",
                  "_source": {
                    "message": "155.2.124.3 GET /index.html 15442 0.038"
                  }
                }
              ]
            }'
    
  2. If everything is okay, Elasticsearch returns the list of documents as processed by the pipeline:
            {
              "docs" : [
                {
                  "doc" : {
                    "_index" : "index",
                    "_id" : "1",
                    "_type" : "type",
                    "_source" : {
                      "duration" : "0.038",
                      "request" : "/index.html",
                      "method" : "GET",
                      "bytes" : "15442",
                      "client" : "155.2.124.3",
                      "message" : "155.2.124.3 GET /index.html 15442 0.038"
                    },
                    "_ingest" : {
                      "timestamp" : "2016-12-10T14:42:30.368+0000"
                    }
                  }
                }
              ]
            }
    

How it works...

The grok processor allows you to extract structured fields out of a single text field in a document. A grok pattern is like a regular expression that supports reusable, aliased sub-expressions. It was used mainly in Logstash, another piece of Elastic software, thanks to its powerful syntax for extracting data from logs.
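To make the "regular expression with reusable aliases" idea concrete, here is a minimal Python sketch that expands a grok expression into an ordinary regular expression with named groups. It is a local emulation, not Elasticsearch code: the pattern bodies in PATTERNS are simplified stand-ins assumed for illustration (the real built-in definitions are more thorough; for example, the real IP pattern also accepts IPv6), and the helper names expand and grok_match are hypothetical.

```python
import re

# Simplified stand-ins for a few built-in grok patterns (assumptions for
# illustration; the real Elasticsearch definitions are more complete).
PATTERNS = {
    "NUMBER": r"[+-]?[0-9]+(?:\.[0-9]+)?",
    "WORD": r"\b\w+\b",
    "IPV4": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}",
    "IP": r"%{IPV4}",  # a pattern can be defined in terms of another pattern
    "URIPATHPARAM": r"/[^\s?]*(?:\?\S*)?",
}

# Matches a %{NAME} or %{NAME:field} reference inside a grok expression.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(grok, patterns):
    """Recursively replace grok references with plain regex fragments."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = expand(patterns[name], patterns)
        # %{NAME:field} becomes a named capture; %{NAME} a non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(substitute, grok)


def grok_match(grok, text, patterns=PATTERNS):
    """Return the named captures as strings, or None if nothing matches."""
    match = re.search(expand(grok, patterns), text)
    return match.groupdict() if match else None


line = "155.2.124.3 GET /index.html 15442 0.038"
pattern = ("%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
           "%{NUMBER:bytes} %{NUMBER:duration}")
print(expand("%{IP:client}", PATTERNS))
print(grok_match(pattern, line))
```

Run on the sample line from step 1, the sketch yields the same five fields as the Elasticsearch response, and, as in that response, every captured value is a string (bytes is "15442", not a number).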

Elasticsearch ships with about 120 built-in grok patterns (you can browse them at https://github.com/elastic/elasticsearch/tree/master/modules/ingest-common/src/main/resources/patterns).

Defining a grok expression is quite simple, as the syntax is human readable. If we want to extract colors from an expression (pattern) and check that their value is one of RED, GREEN, or BLUE, we can declare a custom COLOR pattern via pattern_definitions and use it in a processor like the following:

    curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' \
      -H 'Content-Type: application/json' -d '{
      "pipeline": {
        "description": "custom grok pattern",
        "processors": [
          {
            "grok": {
              "field": "message",
              "patterns": ["my favorite color is %{COLOR:color}"],
              "pattern_definitions": {
                "COLOR": "RED|GREEN|BLUE"
              }
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "message": "my favorite color is RED"
          }
        },
        {
          "_source": {
            "message": "happy fail!!"
          }
        }
      ]
    }'

The result will be as follows:

    {
      "docs" : [
        {
          "doc" : {
            "_index" : "_index",
            "_id" : "_id",
            "_type" : "_type",
            "_source" : {
              "message" : "my favorite color is RED",
              "color" : "RED"
            },
            "_ingest" : {
              "timestamp" : "2016-12-10T15:06:21.823+0000"
            }
          }
        },
        {
          "error" : {
            "root_cause" : [
              {
                "type" : "exception",
                "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [happy fail!!]",
                "header" : {
                  "processor_type" : "grok"
                }
              }
            ],
            "type" : "exception",
            "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [happy fail!!]",
            "caused_by" : {
              "type" : "illegal_argument_exception",
              "reason" : "java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [happy fail!!]",
              "caused_by" : {
                "type" : "illegal_argument_exception",
                "reason" : "Provided Grok expressions do not match field value: [happy fail!!]"
              }
            },
            "header" : {
              "processor_type" : "grok"
            }
          }
        }
      ]
    }
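The outcome above, where the first document gains a color field while the second produces an error, can be mimicked locally with the minimal Python sketch below. It again treats grok as a regex with named groups; unlike Elasticsearch, which adds pattern_definitions on top of its built-in patterns, this sketch resolves names only from the definitions passed in, and the helper grok_process is hypothetical. It tries the expressions in patterns in order, keeps the first one that matches, and otherwise raises an error whose text mirrors the message in the response.

```python
import re

# Matches a %{NAME} or %{NAME:field} reference inside a grok expression.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(grok, definitions):
    """Replace grok references with regex fragments taken from definitions."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = expand(definitions[name], definitions)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(substitute, grok)


def grok_process(source, field, patterns, pattern_definitions):
    """Return a copy of source enriched with the captures of the first
    matching expression; raise ValueError when no expression matches."""
    value = source[field]
    for grok in patterns:
        match = re.search(expand(grok, pattern_definitions), value)
        if match:
            return {**source, **match.groupdict()}
    raise ValueError(
        f"Provided Grok expressions do not match field value: [{value}]")


definitions = {"COLOR": "RED|GREEN|BLUE"}
patterns = ["my favorite color is %{COLOR:color}"]
for doc in [{"message": "my favorite color is RED"}, {"message": "happy fail!!"}]:
    try:
        print(grok_process(doc, "message", patterns, definitions))
    except ValueError as err:
        print("error:", err)
```

Because COLOR is a plain alternation, a value such as "my favorite color is YELLOW" fails in the same way as "happy fail!!".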

In real applications, a grok processor that fails to match raises an exception, and that exception prevents the document from being indexed. For this reason, when you design your grok pattern, be sure to test it on a large sample of your data.
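One way to follow this advice is to send a batch of real log lines through the same _simulate endpoint and list the lines that come back with an error entry instead of a doc entry. The Python sketch below uses only the standard library, assumes a node at http://127.0.0.1:9200 as in this recipe, and assumes, as in the responses above, one result per input document in the same order; the helper names build_simulate_body, simulate, and failing_lines are hypothetical. To stay runnable without a cluster, it evaluates an offline stand-in response shaped like the ones above.

```python
import json
import urllib.request

# Assumed local node, as used throughout this recipe.
SIMULATE_URL = "http://127.0.0.1:9200/_ingest/pipeline/_simulate"


def build_simulate_body(pipeline, lines, field="message"):
    """Wrap each raw log line in a _simulate document under `field`."""
    return {"pipeline": pipeline,
            "docs": [{"_source": {field: line}} for line in lines]}


def simulate(body, url=SIMULATE_URL):
    """POST the body to the _simulate endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def failing_lines(lines, response):
    """Pair each input line with its result by position; keep the errors."""
    return [line for line, result in zip(lines, response["docs"])
            if "error" in result]


pipeline = {
    "description": "Testing grok pattern",
    "processors": [{"grok": {
        "field": "message",
        "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
                     "%{NUMBER:bytes} %{NUMBER:duration}"],
    }}],
}
lines = ["155.2.124.3 GET /index.html 15442 0.038", "happy fail!!"]
body = build_simulate_body(pipeline, lines)

# Offline stand-in with the same shape as the responses above; against a
# running node you would use: response = simulate(body)
response = {"docs": [{"doc": {"_source": {"client": "155.2.124.3"}}},
                     {"error": {"type": "exception"}}]}
failures = failing_lines(lines, response)
print(f"{len(lines) - len(failures)}/{len(lines)} lines matched")
for line in failures:
    print("no match:", line)
```

For a real check, read lines from a representative log file and call simulate(body); very large samples may be easier to handle if sent in several smaller batches.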

See also
