Specifying a different analyzer

In the previous recipes, we saw how to map different fields and objects in Elasticsearch, and how easy it is to replace the standard analyzer using the analyzer and search_analyzer properties.

In this recipe, we will look at several analyzers and how to use them to improve indexing and search quality.

Getting ready

You need a running Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

How to do it...

Every core type field lets you specify a custom analyzer for indexing and another for searching, as field parameters.

For example, if we want the name field to use the standard analyzer for indexing and the simple analyzer for searching, the mapping will be as follows:

{
    "name": {
        "type": "string",
        "index_analyzer": "standard",
        "search_analyzer": "simple"
    }
}
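When a mapping like this is generated by a script rather than written by hand, it can be built as a plain dictionary and serialized to JSON. The following is a minimal sketch; the helper name build_field_mapping is hypothetical and not part of any Elasticsearch client, and the parameter names simply mirror the mapping shown above.

```python
import json


def build_field_mapping(field, index_analyzer, search_analyzer):
    """Return a mapping fragment for one string field.

    Hypothetical helper: it only assembles the same JSON structure
    as the mapping in this recipe and does not talk to Elasticsearch.
    """
    return {
        field: {
            "type": "string",
            "index_analyzer": index_analyzer,
            "search_analyzer": search_analyzer,
        }
    }


# Same field and analyzers as in the recipe's example.
mapping = build_field_mapping("name", "standard", "simple")
print(json.dumps(mapping, indent=4))
```

Keeping the analyzer names as arguments makes it easy to try a different pair of analyzers on the same field without editing the JSON by hand.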

How it works...

The concept of an analyzer comes from Lucene (the core of Elasticsearch). An analyzer is a Lucene element composed of a tokenizer, which splits text into tokens, and zero or more token filters, which manipulate those tokens through lowercasing, normalization, stop word removal, stemming, and so on.
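The tokenizer-plus-filters structure can be illustrated with a small pure-Python sketch. It is only a conceptual model, not Lucene's implementation: the function names (letter_tokenizer, lowercase_filter, make_stop_filter, analyze) and the tiny stop word list are assumptions made for this example.

```python
import re


def letter_tokenizer(text):
    # Tokenizer: split the text into runs of ASCII letters (a simplification).
    return re.findall(r"[A-Za-z]+", text)


def lowercase_filter(tokens):
    # Token filter: lowercase every token.
    return [token.lower() for token in tokens]


def make_stop_filter(stop_words):
    # Token filter factory: drop tokens that appear in the stop word set.
    stop = set(stop_words)

    def stop_filter(tokens):
        return [token for token in tokens if token not in stop]

    return stop_filter


def analyze(text, tokenizer, token_filters):
    # An "analyzer" here is one tokenizer followed by filters applied in order.
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return list(tokens)


tokens = analyze(
    "The Quick brown-fox",
    letter_tokenizer,
    [lowercase_filter, make_stop_filter({"the", "a", "an"})],
)
print(tokens)  # ['quick', 'brown', 'fox']
```

The order of the filters matters: the stop word filter runs after lowercasing, so "The" is removed even though the stop list only contains lowercase words.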

During the indexing phase, when Elasticsearch processes a field that must be indexed, it chooses an analyzer by checking first whether one is defined in the field's index_analyzer parameter, then in the document, and finally in the index.
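This lookup order behaves like a chain of fallbacks. The sketch below models it in plain Python under stated assumptions: the function resolve_index_analyzer and its arguments are hypothetical, and the final fallback to "standard" is assumed here because the standard analyzer is the default this recipe describes replacing.

```python
def resolve_index_analyzer(field_analyzer=None, document_analyzer=None,
                           index_analyzer=None):
    """Pick the first analyzer that is defined, in priority order.

    Hypothetical model of the lookup: field, then document, then index.
    """
    for candidate in (field_analyzer, document_analyzer, index_analyzer):
        if candidate:
            return candidate
    # Assumption: nothing is configured anywhere, so use the default.
    return "standard"


# The field-level setting wins over the others.
print(resolve_index_analyzer("simple", "whitespace", "keyword"))
# Without a field-level setting, the document-level one is used.
print(resolve_index_analyzer(None, "whitespace", "keyword"))
```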

Tip

Choosing the correct analyzer is essential for getting good results during the query phase.

Elasticsearch provides several analyzers in its standard installation. In the following table, the most common ones are described:

Name	Description
standard	Divides text using the standard tokenizer, then normalizes tokens, lowercases them, and removes unwanted tokens.
simple	Divides text at non-letter characters and converts the tokens to lowercase.
whitespace	Divides text at whitespace.
stop	Processes text like the simple analyzer, then removes stop words (the stop word list is configurable).
keyword	Treats the entire text as a single token.
pattern	Divides text using a regular expression.
snowball	Works like the standard analyzer, with stemming applied at the end of processing.
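To make the differences in the table concrete, the following sketch approximates four of these analyzers with Python's standard library and runs them on the same input. These functions are rough stand-ins, not the Elasticsearch implementations; in particular, the `\W+` pattern and the lowercasing in pattern_analyzer are assumptions about that analyzer's default configuration.

```python
import re


def whitespace_analyzer(text):
    # Split at whitespace only; case and punctuation are kept.
    return text.split()


def simple_analyzer(text):
    # Keep runs of letters (digits and punctuation act as separators),
    # then lowercase each token.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def keyword_analyzer(text):
    # The whole input becomes one token, unchanged.
    return [text]


def pattern_analyzer(text, pattern=r"\W+"):
    # Split at matches of a regular expression and lowercase the result.
    return [token.lower() for token in re.split(pattern, text) if token]


sample = "Wi-Fi 6 Router"
print(whitespace_analyzer(sample))  # ['Wi-Fi', '6', 'Router']
print(simple_analyzer(sample))      # ['wi', 'fi', 'router']
print(keyword_analyzer(sample))     # ['Wi-Fi 6 Router']
print(pattern_analyzer(sample))     # ['wi', 'fi', '6', 'router']
```

Note how the digit "6" survives the whitespace and pattern approximations but is dropped by the simple one, which is the kind of difference that affects whether a query for a model number can match.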

For language-specific needs, Elasticsearch supports a set of analyzers designed for text in particular languages, such as Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, CJK, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Italian, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.
