Built-in processors

Elasticsearch provides a large set of ingest processors by default. Their number and functionality can change between versions as new processors are added to cover new scenarios.

In this recipe, we will see the most commonly used ones.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

To use several processors in an ingestion pipeline in Elasticsearch, we will perform the following steps:

  1. We execute a simulate pipeline API call that uses several processors, providing a sample document to test the pipeline against:
            curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -H 'Content-Type: application/json' -d '{
              "pipeline": {
                "description": "Testing some built-in processors",
                "processors": [
                  {
                    "dot_expander": {
                      "field": "extfield.innerfield"
                    }
                  },
                  {
                    "remove": {
                      "field": "unwanted"
                    }
                  },
                  {
                    "trim": {
                      "field": "message"
                    }
                  },
                  {
                    "set": {
                      "field": "tokens",
                      "value": "{{message}}"
                    }
                  },
                  {
                    "split": {
                      "field": "tokens",
                      "separator": "\\s+"
                    }
                  },
                  {
                    "sort": {
                      "field": "tokens",
                      "order": "desc"
                    }
                  },
                  {
                    "convert": {
                      "field": "mynumbertext",
                      "target_field": "mynumber",
                      "type": "integer"
                    }
                  }
                ]
              },
              "docs": [
                {
                  "_index": "index",
                  "_type": "type",
                  "_id": "1",
                  "_source": {
                    "extfield.innerfield": "booo",
                    "unwanted": 32243,
                    "message": "   155.2.124.3 GET /index.html 15442 0.038   ",
                    "mynumbertext": "3123"
                  }
                }
              ]
            }'
    
  2. The result will be as follows:
            {
              "docs" : [
                {
                  "doc" : {
                    "_index" : "index",
                    "_type" : "type",
                    "_id" : "1",
                    "_source" : {
                      "mynumbertext" : "3123",
                      "extfield" : {
                        "innerfield" : "booo"
                      },
                      "tokens" : [
                        "GET",
                        "155.2.124.3",
                        "15442",
                        "0.038",
                        "/index.html"
                      ],
                      "message" : "155.2.124.3 GET /index.html 15442  
                       0.038",
                      "mynumber" : 3123
                    },
                    "_ingest" : {
                      "timestamp" : "2016-12-10T16:49:40.875+0000"
                    }
                  }
                }
              ]
            }
    

How it works...

The preceding example shows how to build a complex pipeline to pre-process a document. There are a lot of built-in processors to cover the most common scenarios in log and text processing.
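
Once a pipeline behaves as expected in the simulator, it can be stored under a name via the put pipeline API and referenced at index time with the pipeline query parameter. The following sketch stores a reduced version of the preceding pipeline and uses it while indexing a document (the pipeline ID my-pipeline and the index mylogs are illustrative names):

    curl -XPUT 'http://127.0.0.1:9200/_ingest/pipeline/my-pipeline?pretty' -H 'Content-Type: application/json' -d '{
      "description": "Trim and tokenize the message field",
      "processors": [
        { "trim": { "field": "message" } },
        { "set": { "field": "tokens", "value": "{{message}}" } },
        { "split": { "field": "tokens", "separator": "\\s+" } }
      ]
    }'

    curl -XPUT 'http://127.0.0.1:9200/mylogs/mytype/1?pipeline=my-pipeline&pretty' -H 'Content-Type: application/json' -d '{
      "message": "   155.2.124.3 GET /index.html 15442 0.038   "
    }'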

More complex transformations can be implemented via scripting.
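
For example, a script processor written in the default Painless language can compute a new field from existing ones. The following simulate call is a minimal sketch; the total, price, and qty fields are made-up names for this example:

    curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -H 'Content-Type: application/json' -d '{
      "pipeline": {
        "description": "Scripting example",
        "processors": [
          {
            "script": {
              "lang": "painless",
              "inline": "ctx.total = ctx.price * ctx.qty"
            }
          }
        ]
      },
      "docs": [
        {
          "_index": "index",
          "_type": "type",
          "_id": "1",
          "_source": { "price": 2.5, "qty": 4 }
        }
      ]
    }'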

At the time of writing, Elasticsearch provides the following built-in processors:

Name             Description

Append           Appends one or more values to a field. If needed, it converts the field into an array.
Convert          Converts a field value to a different type.
Date             Parses a date and uses it as the timestamp of the document.
Date Index Name  Allows us to set the _index name based on a date field.
Fail             Raises a failure.
Foreach          Processes each element of an array with the provided processor.
Grok             Applies grok pattern extraction to a field.
Gsub             Executes a regular expression replace on a field.
Join             Joins an array of values using a separator.
JSON             Converts a JSON string into a JSON object.
Lowercase        Lowercases a field.
Remove           Removes a field.
Rename           Renames a field.
Script           Allows us to execute a script.
Set              Sets the value of a field.
Split            Splits a field into an array using a regular expression as the separator.
Sort             Sorts the values of an array field.
Trim             Trims whitespace from a field.
Uppercase        Uppercases a field.
Dot expander     Expands a field containing dots into an object hierarchy.
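
Among these, grok is probably the most used processor for log parsing, as it extracts named fields from unstructured text. The following simulate call is a minimal sketch; the pattern and the sample log line are illustrative:

    curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -H 'Content-Type: application/json' -d '{
      "pipeline": {
        "description": "Grok example",
        "processors": [
          {
            "grok": {
              "field": "message",
              "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"]
            }
          }
        ]
      },
      "docs": [
        {
          "_index": "index",
          "_type": "type",
          "_id": "1",
          "_source": { "message": "155.2.124.3 GET /index.html 15442 0.038" }
        }
      ]
    }'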

See also

  • In Chapter 17, Plugin Development, we will cover how to write a custom processor in Java to extend the capabilities of Elasticsearch