Built-in processors

Elasticsearch provides a large set of ingest processors by default. Their number and functionality can change between versions as new processors are added to cover new scenarios.

In this recipe, we will see the most commonly used ones.

Getting ready

You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

To execute curl via the command line, you need to install curl for your operating system.

How to do it...

To use several processors in an ingestion pipeline in Elasticsearch, we will perform the following steps:

  1. We execute a simulate pipeline API call that uses several processors, providing a sample document to test the pipeline against:
            curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -H 'Content-Type: application/json' -d '{
              "pipeline": {
                "description": "Testing some built-in processors",
                "processors": [
                  {
                    "dot_expander": {
                      "field": "extfield.innerfield"
                    }
                  },
                  {
                    "remove": {
                      "field": "unwanted"
                    }
                  },
                  {
                    "trim": {
                      "field": "message"
                    }
                  },
                  {
                    "set": {
                      "field": "tokens",
                      "value": "{{message}}"
                    }
                  },
                  {
                    "split": {
                      "field": "tokens",
                      "separator": "\\s+"
                    }
                  },
                  {
                    "sort": {
                      "field": "tokens",
                      "order": "desc"
                    }
                  },
                  {
                    "convert": {
                      "field": "mynumbertext",
                      "target_field": "mynumber",
                      "type": "integer"
                    }
                  }
                ]
              },
              "docs": [
                {
                  "_index": "index",
                  "_type": "type",
                  "_id": "1",
                  "_source": {
                    "extfield.innerfield": "booo",
                    "unwanted": 32243,
                    "message": "   155.2.124.3 GET /index.html 15442 0.038   ",
                    "mynumbertext": "3123"
                  }
                }
              ]
            }'
    
  2. The result will be as follows:
            {
              "docs" : [
                {
                  "doc" : {
                    "_index" : "index",
                    "_type" : "type",
                    "_id" : "1",
                    "_source" : {
                      "mynumbertext" : "3123",
                      "extfield" : {
                        "innerfield" : "booo"
                      },
                      "tokens" : [
                        "GET",
                        "155.2.124.3",
                        "15442",
                        "0.038",
                        "/index.html"
                      ],
                      "message" : "155.2.124.3 GET /index.html 15442  
                       0.038",
                      "mynumber" : 3123
                    },
                    "_ingest" : {
                      "timestamp" : "2016-12-10T16:49:40.875+0000"
                    }
                  }
                }
              ]
            }
    

How it works...

The preceding example shows how to build a complex pipeline to pre-process a document. There are a lot of built-in processors to cover the most common scenarios in log and text processing.
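
Once a pipeline behaves as expected in the simulator, it can be stored under a name via the put pipeline API and referenced at index time with the pipeline query parameter. The following sketch stores a reduced version of the preceding pipeline and uses it while indexing a document (the pipeline ID my-pipeline and the index mylogs are illustrative names):

    curl -XPUT 'http://127.0.0.1:9200/_ingest/pipeline/my-pipeline?pretty' -H 'Content-Type: application/json' -d '{
      "description": "Trim and tokenize the message field",
      "processors": [
        { "trim": { "field": "message" } },
        { "set": { "field": "tokens", "value": "{{message}}" } },
        { "split": { "field": "tokens", "separator": "\\s+" } }
      ]
    }'

    curl -XPUT 'http://127.0.0.1:9200/mylogs/mytype/1?pipeline=my-pipeline&pretty' -H 'Content-Type: application/json' -d '{
      "message": "   155.2.124.3 GET /index.html 15442 0.038   "
    }'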

More complex transformations can be implemented via scripting.
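
For example, a script processor written in the default Painless language can compute a new field from existing ones. The following simulate call is a minimal sketch; the total, price, and qty fields are made-up names for this example:

    curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -H 'Content-Type: application/json' -d '{
      "pipeline": {
        "description": "Scripting example",
        "processors": [
          {
            "script": {
              "lang": "painless",
              "inline": "ctx.total = ctx.price * ctx.qty"
            }
          }
        ]
      },
      "docs": [
        {
          "_index": "index",
          "_type": "type",
          "_id": "1",
          "_source": { "price": 2.5, "qty": 4 }
        }
      ]
    }'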

At the time of writing, Elasticsearch provides the following built-in processors:

Name             Description

Append           Appends one or more values to a field. If needed, it converts the field into an array.
Convert          Converts a field value to a different type.
Date             Parses a date and uses it as the timestamp of the document.
Date Index Name  Allows us to set the _index name based on a date field.
Fail             Raises a failure.
Foreach          Processes each element of an array with the provided processor.
Grok             Applies grok pattern extraction to a field.
Gsub             Executes a regular expression replace on a field.
Join             Joins an array of values using a separator.
JSON             Converts a JSON string into a JSON object.
Lowercase        Lowercases a field.
Remove           Removes a field.
Rename           Renames a field.
Script           Allows us to execute a script.
Set              Sets the value of a field.
Split            Splits a field into an array using a regular expression as the separator.
Sort             Sorts the values of an array field.
Trim             Trims whitespace from a field.
Uppercase        Uppercases a field.
Dot expander     Expands a field containing dots into an object hierarchy.
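
Among these, grok is probably the most used processor for log parsing, as it extracts named fields from unstructured text. The following simulate call is a minimal sketch; the pattern and the sample log line are illustrative:

    curl -XPOST 'http://127.0.0.1:9200/_ingest/pipeline/_simulate?pretty' -H 'Content-Type: application/json' -d '{
      "pipeline": {
        "description": "Grok example",
        "processors": [
          {
            "grok": {
              "field": "message",
              "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"]
            }
          }
        ]
      },
      "docs": [
        {
          "_index": "index",
          "_type": "type",
          "_id": "1",
          "_source": { "message": "155.2.124.3 GET /index.html 15442 0.038" }
        }
      ]
    }'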

See also

  • In Chapter 17, Plugin Development, we will cover how to write a custom processor in Java to extend the capabilities of Elasticsearch