Understanding Elasticsearch analyzers

The main task of an analyzer is to take the value of a field and break it down into terms. In Chapter 2, Getting Started with Elasticsearch, we looked at the structure of an inverted index. The analyzer's job is to take each field of a document and extract terms from it. These terms make the index searchable; that is, they help us find out which documents contain a particular search term.

The analyzer performs this process of breaking up input character streams into terms. This happens twice: 

  • At the time of indexing
  • At the time of searching

The core task of the analyzer is to parse the document fields and build the actual index.
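By default, the same analyzer is applied at both points, although a field mapping can name a different search-time analyzer through the search_analyzer parameter. The following is a minimal sketch of such a mapping; the index name (my_index) and field name (title) are purely illustrative:

PUT my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "simple"
      }
    }
  }
}

Here, the title field is analyzed with the standard analyzer when documents are indexed, and with the simple analyzer when queries against it are analyzed.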

Every field of the text type needs to be analyzed before the document is indexed. This analysis is what makes documents searchable by the terms that are used at search time.
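You can observe this process directly with the _analyze API, which runs a piece of text through an analyzer and returns the terms it produces. A minimal sketch, using the built-in standard analyzer and an arbitrary sample sentence:

POST _analyze
{
  "analyzer": "standard",
  "text": "Learning Elasticsearch analyzers is FUN!"
}

The response lists the resulting terms, in this case learning, elasticsearch, analyzers, is, and fun, which is exactly what gets written into the inverted index for that text.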

Analyzers can be configured on a per-field basis; that is, two fields of the text type within the same document can each use a different analyzer, as shown in the following example.
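As a sketch, the following hypothetical mapping defines two text fields that use different analyzers; the index and field names (catalog, title, description) are illustrative, while standard and english are built-in analyzers:

PUT catalog
{
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "standard" },
      "description": { "type": "text", "analyzer": "english" }
    }
  }
}

With this mapping, terms for the title field are produced by the standard analyzer, whereas the description field goes through the english analyzer, which additionally applies stemming and English stopword removal.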

Elasticsearch uses analyzers to analyze text data. An analyzer has the following components:

  • Character filters: Zero or more
  • Tokenizer: Exactly one
  • Token filters: Zero or more

The following diagram depicts the components of an analyzer:

Figure 3.1: Anatomy of an analyzer
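To show how these components fit together, the following is a sketch of a custom analyzer that combines one character filter, one tokenizer, and two token filters; the index and analyzer names are hypothetical, while html_strip, standard, lowercase, and stop are components that ship with Elasticsearch:

PUT my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop" ]
        }
      }
    }
  }
}

Input text first passes through the html_strip character filter, is then split into tokens by the standard tokenizer, and the resulting tokens are finally lowercased and stripped of stopwords by the two token filters.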

Let's understand the role of each component one by one.
