Creating an analyzer plugin

Elasticsearch provides, out-of-the-box, a large set of analyzers and tokenizers to cover general needs. Sometimes we need to extend the capabilities of Elasticsearch by adding new analyzers.

Typically you can create an analyzer plugin when you need:

To add standard Lucene analyzers/tokenizers not provided by Elasticsearch
To integrate third-party analyzers
To add custom analyzers

In this recipe we will add a new custom English analyzer similar to the one provided by Elasticsearch.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need Maven, or an IDE that supports Java programming, such as Eclipse or IntelliJ IDEA. The code for this recipe is available in the chapter17/analysis_plugin directory.

How to do it...

An analyzer plugin is generally composed of two classes:

A Plugin class, which implements the org.elasticsearch.plugins.AnalysisPlugin interface
An AnalyzerProvider class, which provides an analyzer

For creating an analyzer plugin, we will perform the following steps:

The plugin class is similar to previous recipes, plus a method that returns the analyzers:

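        import java.util.HashMap;
        import java.util.Map;

        import org.apache.lucene.analysis.Analyzer;
        import org.elasticsearch.index.analysis.AnalyzerProvider;
        import org.elasticsearch.index.analysis.CustomEnglishAnalyzerProvider;
        import org.elasticsearch.indices.analysis.AnalysisModule;
        import org.elasticsearch.plugins.Plugin;

        public class AnalysisPlugin extends Plugin
                implements org.elasticsearch.plugins.AnalysisPlugin {
            @Override
            public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>>
                    getAnalyzers() {
                // Maps each analyzer name to the factory that builds its provider.
                Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>>
                        analyzers = new HashMap<>();
                analyzers.put(CustomEnglishAnalyzerProvider.NAME,
                        CustomEnglishAnalyzerProvider::getCustomEnglishAnalyzerProvider);
                return analyzers;
            }
        }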

The AnalyzerProvider class provides the initialization of our analyzer, passing parameters provided by the settings:

        package org.elasticsearch.index.analysis;

        import org.apache.lucene.analysis.en.EnglishAnalyzer;
        import org.apache.lucene.analysis.util.CharArraySet;
        import org.elasticsearch.common.settings.Settings;
        import org.elasticsearch.env.Environment;
        import org.elasticsearch.index.IndexSettings;

        public class CustomEnglishAnalyzerProvider
                extends AbstractIndexAnalyzerProvider<EnglishAnalyzer> {
            // Name under which the analyzer is registered.
            public static final String NAME = "custom_english";

            private final EnglishAnalyzer analyzer;

            // Note: useSmart is accepted but not used by this analyzer.
            public CustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env,
                    String name, Settings settings, boolean useSmart) {
                super(indexSettings, name, settings);

                // Stop words and stem exclusions come from the settings, with defaults as fallback.
                analyzer = new EnglishAnalyzer(
                        Analysis.parseStopWords(env, settings,
                                EnglishAnalyzer.getDefaultStopSet(), true),
                        Analysis.parseStemExclusion(settings, CharArraySet.EMPTY_SET));
            }

            // Static factory, referenced from the plugin's getAnalyzers method.
            public static CustomEnglishAnalyzerProvider getCustomEnglishAnalyzerProvider(
                    IndexSettings indexSettings, Environment env, String name, Settings settings) {
                return new CustomEnglishAnalyzerProvider(indexSettings, env, name, settings, true);
            }

            @Override
            public EnglishAnalyzer get() {
                return this.analyzer;
            }
        }

After building the plugin and installing it on an Elasticsearch server, our analyzer is accessible like any native Elasticsearch analyzer.

How it works...

Creating an analyzer plugin is quite simple. The general workflow is:

Wrap the analyzer initialization in a provider
Register the analyzer provider in the plugin
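
The two steps above can be sketched in plain Java, without any Elasticsearch dependency. This is only an illustration of the shape of the pattern: ToyAnalyzer, ToyAnalyzerProvider, ToyLowercaseProvider, ToyPlugin, and the "trim" setting are hypothetical stand-ins invented for this sketch, not Elasticsearch or Lucene types.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class WorkflowSketch {
    // Looks up the registered factory by name, builds the provider, and uses its analyzer.
    static String run(String text) {
        ToyAnalyzerProvider provider = new ToyPlugin().getAnalyzers()
                .get(ToyLowercaseProvider.NAME)
                .apply(new HashMap<>());
        return provider.get().analyze(text);
    }

    public static void main(String[] args) {
        System.out.println(run("  Hello World  ")); // prints "hello world"
    }
}

// Hypothetical stand-ins (assumptions): they only mirror the roles of analyzer and provider.
interface ToyAnalyzer {
    String analyze(String text);
}

interface ToyAnalyzerProvider {
    ToyAnalyzer get();
}

// Step 1: wrap the analyzer initialization in a provider.
final class ToyLowercaseProvider implements ToyAnalyzerProvider {
    static final String NAME = "toy_lowercase";

    private final ToyAnalyzer analyzer;

    ToyLowercaseProvider(Map<String, String> settings) {
        // The analyzer is built once from the settings and reused on every get() call.
        boolean trim = Boolean.parseBoolean(settings.getOrDefault("trim", "true"));
        this.analyzer = text -> {
            String lowered = text.toLowerCase(Locale.ROOT);
            return trim ? lowered.trim() : lowered;
        };
    }

    @Override
    public ToyAnalyzer get() {
        return analyzer;
    }
}

// Step 2: register the provider factory under the analyzer name.
final class ToyPlugin {
    Map<String, Function<Map<String, String>, ToyAnalyzerProvider>> getAnalyzers() {
        Map<String, Function<Map<String, String>, ToyAnalyzerProvider>> analyzers = new HashMap<>();
        analyzers.put(ToyLowercaseProvider.NAME, ToyLowercaseProvider::new);
        return analyzers;
    }
}
```

The registry stores a factory rather than a ready-made analyzer, so the host can decide when to build the provider and which settings to pass to it.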

In the preceding example, we registered a CustomEnglishAnalyzerProvider class, which extends AbstractIndexAnalyzerProvider and wraps the Lucene EnglishAnalyzer class:

public class CustomEnglishAnalyzerProvider extends AbstractIndexAnalyzerProvider<EnglishAnalyzer>

We need to give the analyzer a name:

public static final String NAME = "custom_english";

We keep a private instance of the Lucene analyzer, which is returned on request by the get method:

    private final EnglishAnalyzer analyzer;

The CustomEnglishAnalyzerProvider constructor receives the index settings, the environment, the analyzer name, and the analyzer settings. These settings can carry values defined in the index settings or cluster defaults from elasticsearch.yml:

public CustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings, boolean useSmart) {

To make it work correctly, we need to set up the parent class by calling its constructor via super:

super(indexSettings, name, settings);

Now we can initialize the internal analyzer, which must be returned by the get method:

analyzer = new EnglishAnalyzer(
    Analysis.parseStopWords(env, settings, EnglishAnalyzer.getDefaultStopSet(), true),
    Analysis.parseStemExclusion(settings, CharArraySet.EMPTY_SET));

This analyzer accepts:

A list of stopwords, which can be loaded from the settings or fall back to the default English set
A list of words that must be excluded from the stemming step
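
To make the role of these two lists concrete, here is a toy sketch in plain Java. It is not the Lucene EnglishAnalyzer: the tokenization and the toyStem method (which only strips a trailing "s") are simplifying assumptions made for this illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopAndStemSketch {
    // Toy stemmer (assumption): strips a trailing "s". Real English stemming is far richer.
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Lowercases, splits on non-word characters, drops stop words,
    // and stems every remaining token unless it is in the exclusion set.
    static List<String> analyze(String text, Set<String> stopWords, Set<String> stemExclusions) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (raw.isEmpty() || stopWords.contains(raw)) {
                continue;
            }
            tokens.add(stemExclusions.contains(raw) ? raw : toyStem(raw));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and"));
        Set<String> stemExclusions = new HashSet<>(Arrays.asList("analyzers"));
        // prints [plugin, analyzers]
        System.out.println(analyze("The plugins and the analyzers", stopWords, stemExclusions));
    }
}
```

In the sketch, "plugins" is stemmed to "plugin", while "analyzers" survives unchanged because it appears in the exclusion set.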

To make the analyzer easy to wrap, we create a static method that builds the provider. We will use it in the plugin definition:

public static CustomEnglishAnalyzerProvider getCustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) { 
    return new CustomEnglishAnalyzerProvider(indexSettings, env, name, settings, true); 
}

Finally, we can register our analyzer in the plugin. To do so, our plugin must implement the AnalysisPlugin interface so that we can override the getAnalyzers method:

@Override 
public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> getAnalyzers() { 
    Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> analyzers = new HashMap<>();
    analyzers.put(CustomEnglishAnalyzerProvider.NAME, CustomEnglishAnalyzerProvider::getCustomEnglishAnalyzerProvider); 
    return analyzers; 
}

The :: operator of Java 8 creates a method reference: it lets us pass the static factory method as the function that will be used to construct our AnalyzerProvider.
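
As a standalone illustration of the :: operator, the sketch below passes a static factory method where a functional interface is expected. ProviderFactory and Provider are hypothetical stand-ins with plain String parameters, chosen only so the example compiles without Elasticsearch on the classpath.

```java
final class MethodRefSketch {
    // Hypothetical stand-in for a provider factory: four inputs, one product.
    @FunctionalInterface
    interface ProviderFactory<T> {
        T get(String indexSettings, String env, String name, String settings);
    }

    static final class Provider {
        final String description;

        Provider(String description) {
            this.description = description;
        }

        // Static factory whose parameter list matches ProviderFactory.get.
        static Provider create(String indexSettings, String env, String name, String settings) {
            return new Provider(name + "@" + indexSettings);
        }
    }

    public static void main(String[] args) {
        // The method reference and the lambda below are equivalent.
        ProviderFactory<Provider> byReference = Provider::create;
        ProviderFactory<Provider> byLambda = (i, e, n, s) -> Provider.create(i, e, n, s);

        // Both lines print "custom_english@idx".
        System.out.println(byReference.get("idx", "env", "custom_english", "{}").description);
        System.out.println(byLambda.get("idx", "env", "custom_english", "{}").description);
    }
}
```

The method reference works because the static method's parameters and return type line up with the single abstract method of the functional interface.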

There's more...

A plugin can extend several Elasticsearch functionalities. Each of them requires implementing the corresponding plugin interface. In Elasticsearch 5.x, the plugin interfaces are:

ActionPlugin: This is used for REST and cluster actions
AnalysisPlugin: This is used for extending the analysis components, such as analyzers, tokenizers, token filters, and char filters
ClusterPlugin: This is used to provide new deciders
DiscoveryPlugin: This is used to provide custom node name resolvers
IngestPlugin: This is used to provide new ingest processors
MapperPlugin: This is used to provide new mappers and metadata mappers
RepositoryPlugin: This allows the provision of new repositories to be used in backup/restore functionalities
ScriptPlugin: This allows the provision of new scripting languages, scripting contexts, or native scripts (Java-based ones)
SearchPlugin: This allows extending all the search functionalities: highlighters, aggregations, suggesters, and queries

If your plugin needs to extend more than a single functionality, it can implement several plugin interfaces at once.
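
A minimal sketch of a class that implements two extension interfaces at once is shown below. BasePlugin, ToyAnalysisPlugin, and ToyIngestPlugin are hypothetical stand-ins, and their default methods returning empty collections are an assumption of this sketch, used so that a plugin only overrides the extension points it actually provides.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class MultiInterfaceSketch {
    // Hypothetical stand-ins (assumptions), not the Elasticsearch plugin types.
    static class BasePlugin {
    }

    interface ToyAnalysisPlugin {
        default Map<String, String> getAnalyzers() {
            return Collections.emptyMap();
        }
    }

    interface ToyIngestPlugin {
        default List<String> getProcessors() {
            return Collections.emptyList();
        }
    }

    // One plugin class, two extension points.
    static final class MyPlugin extends BasePlugin implements ToyAnalysisPlugin, ToyIngestPlugin {
        @Override
        public Map<String, String> getAnalyzers() {
            return Collections.singletonMap("custom_english", "CustomEnglishAnalyzerProvider");
        }

        @Override
        public List<String> getProcessors() {
            return Collections.singletonList("toy_processor");
        }
    }

    public static void main(String[] args) {
        MyPlugin plugin = new MyPlugin();
        System.out.println(plugin.getAnalyzers().keySet()); // prints [custom_english]
        System.out.println(plugin.getProcessors());         // prints [toy_processor]
    }
}
```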
