Setting up an ingestion node

The main goals of Elasticsearch are indexing, searching, and analytics, but it's often necessary to modify or enhance documents before storing them in Elasticsearch.

The most common scenarios are:

  • Preprocessing log strings to extract meaningful data.
  • Enriching the content of textual fields with Natural Language Processing (NLP) tools.
  • Applying transformations at ingestion time, such as converting an IP address into a geolocation or building custom fields.

Getting ready

You need a working Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe, and a simple text editor to change configuration files.

How to do it...

To set up an ingest node, edit the config/elasticsearch.yml file and set the node.ingest property to true:

node.ingest: true

How it works...

By default, Elasticsearch configures every node as an ingest node (refer to Chapter 13, Ingest, for more information on ingestion pipelines).
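For reference, the default role settings can be written out explicitly. This is a sketch assuming the node.* role settings used in this recipe (Elasticsearch 5.x/6.x style; later releases replaced them with node.roles). A node with no role settings in elasticsearch.yml behaves as if it had:

```yaml
# Implicit defaults for a node with no role settings (node.* style settings)
node.master: true   # master-eligible
node.data: true     # holds shards and executes indexing/search on them
node.ingest: true   # executes ingest pipelines
```

Because all three roles are enabled by default, an out-of-the-box node mixes ingestion work with master and data duties.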

As with the client node, using the ingest node is a way to add functionality to Elasticsearch without compromising cluster safety.

Note

If you want to prevent a node from being used for ingestion, disable it with node.ingest: false. It's best practice to disable ingestion on master and data nodes, so that ingestion errors cannot affect them and the cluster stays protected. Client nodes are the best candidates to also act as ingest nodes.
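A minimal sketch of the practice described in the note, assuming the same node.* settings: a dedicated master node keeps its master role but explicitly turns ingestion off. The fragment goes into the elasticsearch.yml of that node.

```yaml
# elasticsearch.yml on a dedicated master node
node.master: true    # master-eligible
node.data: false     # stores no shards
node.ingest: false   # never executes ingest pipelines
```

A data node would use the same pattern with node.master: false, node.data: true, and node.ingest: false.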

If you are using NLP, attachment extraction (via the attachment ingest plugin), or log ingestion, the best practice is to have a pool of client nodes (no master, no data) with ingestion enabled.
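Each node of such a pool could be configured roughly as follows. This is a sketch under the same assumption about the node.* settings used in this recipe:

```yaml
# elasticsearch.yml on each node of the ingest pool:
# a client node (no master, no data) with ingestion enabled
node.master: false   # not master-eligible
node.data: false     # stores no shards
node.ingest: true    # executes ingest pipelines (NLP, attachments, log parsing)
```

Sending indexing requests that use a pipeline to these nodes keeps the CPU-heavy preprocessing away from the master and data nodes.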

In previous versions of Elasticsearch, the attachment and NLP plugins ran on standard data or master nodes. This caused a lot of problems for Elasticsearch due to:

  • High CPU usage by NLP algorithms, which saturated all the CPUs on data nodes and degraded indexing and search performance.
  • Instability caused by badly formatted attachments and/or bugs in Tika (the library used for document extraction).

Tip

The best practice is to have a pool of client nodes with ingestion enabled, which provides the best safety for the cluster and the ingestion pipeline.
