Creating an analyzer plugin

Out of the box, Elasticsearch provides a large set of analyzers and tokenizers that cover general needs. Sometimes, however, we need to extend Elasticsearch by adding new analyzers.

Typically, you create an analyzer plugin when you need:

  • To add standard Lucene analyzers/tokenizers not provided by Elasticsearch
  • To integrate third-party analyzers
  • To add custom analyzers

In this recipe we will add a new custom English analyzer similar to the one provided by Elasticsearch.

Getting ready

You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.

You also need Maven, or an IDE that supports Java programming, such as Eclipse or IntelliJ IDEA. The code for this recipe is available in the chapter17/analysis_plugin directory.

How to do it...

An analyzer plugin is generally composed of two classes:

  • A Plugin class, which implements the org.elasticsearch.plugins.AnalysisPlugin interface
  • An AnalyzerProvider class, which provides an analyzer

To create an analyzer plugin, we will perform the following steps:

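  1. The plugin class is similar to the ones in previous recipes, with an extra method that returns the analyzers:
            import java.util.HashMap;
            import java.util.Map;

            import org.apache.lucene.analysis.Analyzer;
            import org.elasticsearch.index.analysis.AnalyzerProvider;
            import org.elasticsearch.index.analysis.CustomEnglishAnalyzerProvider;
            import org.elasticsearch.indices.analysis.AnalysisModule;
            import org.elasticsearch.plugins.Plugin;

            // This class is also named AnalysisPlugin, so the Elasticsearch
            // interface must be referenced by its fully qualified name.
            public class AnalysisPlugin extends Plugin implements
                    org.elasticsearch.plugins.AnalysisPlugin {

                @Override
                public Map<String, AnalysisModule.AnalysisProvider<
                        AnalyzerProvider<? extends Analyzer>>> getAnalyzers() {
                    Map<String, AnalysisModule.AnalysisProvider<
                            AnalyzerProvider<? extends Analyzer>>> analyzers = new HashMap<>();
                    // Register the provider factory under the analyzer name.
                    analyzers.put(CustomEnglishAnalyzerProvider.NAME,
                            CustomEnglishAnalyzerProvider::getCustomEnglishAnalyzerProvider);
                    return analyzers;
                }
            }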
    
  2. The CustomEnglishAnalyzerProvider class initializes our analyzer, passing it the parameters read from the settings:
            package org.elasticsearch.index.analysis;

            import org.apache.lucene.analysis.en.EnglishAnalyzer;
            import org.apache.lucene.analysis.util.CharArraySet;
            import org.elasticsearch.common.settings.Settings;
            import org.elasticsearch.env.Environment;
            import org.elasticsearch.index.IndexSettings;

            public class CustomEnglishAnalyzerProvider extends
                    AbstractIndexAnalyzerProvider<EnglishAnalyzer> {
                public static final String NAME = "custom_english";

                // Built once in the constructor and returned by get().
                private final EnglishAnalyzer analyzer;

                // The useSmart flag is not used by this analyzer.
                public CustomEnglishAnalyzerProvider(IndexSettings indexSettings,
                        Environment env, String name, Settings settings,
                        boolean useSmart) {
                    super(indexSettings, name, settings);

                    // Stopwords from the settings (English defaults if none are set)
                    // and words to protect from stemming (none by default).
                    analyzer = new EnglishAnalyzer(
                            Analysis.parseStopWords(env, settings,
                                    EnglishAnalyzer.getDefaultStopSet(), true),
                            Analysis.parseStemExclusion(settings,
                                    CharArraySet.EMPTY_SET));
                }

                // Four-argument factory, used when registering the analyzer in the plugin.
                public static CustomEnglishAnalyzerProvider getCustomEnglishAnalyzerProvider(
                        IndexSettings indexSettings, Environment env, String name,
                        Settings settings) {
                    return new CustomEnglishAnalyzerProvider(indexSettings, env, name,
                            settings, true);
                }

                @Override
                public EnglishAnalyzer get() {
                    return this.analyzer;
                }
            }
    

After building the plugin and installing it on an Elasticsearch server, our analyzer can be used like any native Elasticsearch analyzer.

How it works...

Creating an analyzer plugin is quite simple. The general workflow is:

  • Wrap the analyzer initialization in a provider
  • Register the analyzer provider in the plugin

In the preceding example, we registered a CustomEnglishAnalyzerProvider class, which extends AbstractIndexAnalyzerProvider and wraps a Lucene EnglishAnalyzer:

public class CustomEnglishAnalyzerProvider extends AbstractIndexAnalyzerProvider<EnglishAnalyzer> 

We need to give the analyzer a name:

public static final String NAME = "custom_english"; 

We declare a private Lucene analyzer, which is created once and returned on request by the get() method:

    private final EnglishAnalyzer analyzer; 

The CustomEnglishAnalyzerProvider constructor receives the index settings, the environment, the analyzer name, and the analyzer settings; these settings can supply defaults through the index settings or elasticsearch.yml.

public CustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings, boolean useSmart) { 

To make it work correctly, we need to call the parent constructor via super:

super(indexSettings, name, settings); 

Now we can initialize the internal analyzer, which is returned by the get() method:

analyzer = new EnglishAnalyzer( 
    Analysis.parseStopWords(env, settings, EnglishAnalyzer.getDefaultStopSet(), true), 
    Analysis.parseStemExclusion(settings, CharArraySet.EMPTY_SET)); 

This analyzer accepts:

  • A list of stopwords, loaded from the settings or, if none are configured, the default English stopwords
  • A list of words that must be excluded from the stemming step
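The effect of these two lists can be illustrated with a plain-Java simulation. It is not Lucene's `EnglishAnalyzer`: the stopword set below is a tiny illustrative subset, and the hypothetical `crudeStem` method (which only strips a trailing "s") stands in for the Porter stemmer that Lucene uses. In Lucene, stem exclusion works by marking the excluded words as keywords so the stemmer leaves them untouched; the sketch simply skips stemming for them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class EnglishPipelineSketch {

    // Illustrative subset only; NOT Lucene's full default English stopword set.
    static final Set<String> DEFAULT_STOPWORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "is"));

    // Hypothetical crude stemmer: strips a trailing "s" (Lucene uses the Porter stemmer).
    static String crudeStem(String token) {
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    static List<String> analyze(String text, Set<String> stopwords, Set<String> stemExclusion) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (stopwords.contains(token)) {
                continue; // stopwords are dropped entirely
            }
            // Excluded words are kept verbatim; everything else is stemmed.
            out.add(stemExclusion.contains(token) ? token : crudeStem(token));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The analyzers of Elasticsearch";
        // Default stopwords, nothing excluded from stemming.
        System.out.println(analyze(text, DEFAULT_STOPWORDS, Collections.emptySet()));
        // Same text, but "analyzers" is protected from stemming.
        System.out.println(analyze(text, DEFAULT_STOPWORDS,
                new HashSet<>(Collections.singletonList("analyzers"))));
    }
}
```

Running `main` prints `[analyzer, elasticsearch]` and then `[analyzers, elasticsearch]`: in both cases "The" and "of" are removed as stopwords, but only the second call keeps "analyzers" unstemmed.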

To make the analyzer easy to register, we create a static factory method that builds the provider; we will use it in the plugin definition:

public static CustomEnglishAnalyzerProvider getCustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) { 
    return new CustomEnglishAnalyzerProvider(indexSettings, env, name, settings, true); 
} 

Finally, we can register our analyzer in the plugin. To do so, our plugin must implement the AnalysisPlugin interface, so that we can override the getAnalyzers method:

@Override 
public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> getAnalyzers() { 
    Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> analyzers = new HashMap<>(); 
    analyzers.put(CustomEnglishAnalyzerProvider.NAME, CustomEnglishAnalyzerProvider::getCustomEnglishAnalyzerProvider); 
    return analyzers; 
} 

The Java 8 method reference operator (::) lets us pass a function that is used to construct our AnalyzerProvider.
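The static factory is needed because the constructor takes an extra `useSmart` argument, so it does not fit the four-argument shape expected at registration, and a constructor reference (`::new`) would not compile there. The self-contained sketch below shows that a method reference is simply a compact way to implement a functional interface, equivalent to a lambda or an anonymous class. `ProviderFactory` and `buildProvider` are hypothetical, with strings standing in for the Elasticsearch `IndexSettings` and `Environment` types.

```java
import java.util.HashMap;
import java.util.Map;

public class MethodReferenceSketch {

    // Hypothetical four-argument factory interface, shaped like the provider
    // factory the plugin registers (index settings, environment, name, settings).
    @FunctionalInterface
    interface ProviderFactory {
        String create(String indexSettings, String env, String name, Map<String, String> settings);
    }

    // Hypothetical static factory, playing the role of getCustomEnglishAnalyzerProvider.
    static String buildProvider(String indexSettings, String env, String name,
                                Map<String, String> settings) {
        return "provider:" + name;
    }

    public static void main(String[] args) {
        // 1. Method reference, as used in getAnalyzers().
        ProviderFactory byReference = MethodReferenceSketch::buildProvider;

        // 2. The equivalent lambda expression.
        ProviderFactory byLambda = (idx, env, name, settings) -> buildProvider(idx, env, name, settings);

        // 3. The pre-Java 8 anonymous class, for comparison.
        ProviderFactory byAnonymousClass = new ProviderFactory() {
            @Override
            public String create(String idx, String env, String name, Map<String, String> settings) {
                return buildProvider(idx, env, name, settings);
            }
        };

        Map<String, String> emptySettings = new HashMap<>();
        System.out.println(byReference.create("idx", "env", "custom_english", emptySettings));
        System.out.println(byLambda.create("idx", "env", "custom_english", emptySettings));
        System.out.println(byAnonymousClass.create("idx", "env", "custom_english", emptySettings));
    }
}
```

All three calls print `provider:custom_english`. In the plugin, the registered function is stored and only invoked later, when an analyzer provider is actually needed.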

There's more...

A plugin can extend several areas of Elasticsearch; each area is enabled by implementing the corresponding plugin interface. In Elasticsearch 5.x, the plugin interfaces are:

  • ActionPlugin: This is used for REST and cluster actions
  • AnalysisPlugin: This is used for extending all the analysis components, such as analyzers, tokenizers, tokenFilters, and charFilters
  • ClusterPlugin: This is used to provide new allocation deciders
  • DiscoveryPlugin: This is used to provide custom node name resolvers
  • IngestPlugin: This is used to provide new ingest processors
  • MapperPlugin: This is used to provide new mappers and metadata mappers
  • RepositoryPlugin: This allows the provision of new repositories to be used in backup/restore functionalities
  • ScriptPlugin: This allows the provision of new scripting languages, scripting contexts or native scripts (Java based ones)
  • SearchPlugin: This allows extending all the search functionalities: highlighters, aggregations, suggesters, and queries

If your plugin needs to extend more than one area, it can implement several plugin interfaces at once.
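As a sketch of a plugin that extends two areas at once, the example below uses two hypothetical, heavily simplified interfaces, `MiniAnalysisPlugin` and `MiniSearchPlugin`, not the real Elasticsearch ones. Their extension points are default methods that return empty maps, so a plugin overrides only the ones it needs; the 5.x plugin interfaces follow a similar pattern, which is why the recipe's plugin can override getAnalyzers alone.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class MultiInterfaceSketch {

    // Hypothetical, simplified plugin interfaces (NOT the real Elasticsearch ones).
    // Default methods let a plugin override only the extension points it uses.
    interface MiniAnalysisPlugin {
        default Map<String, Supplier<String>> getAnalyzers() {
            return Collections.emptyMap();
        }

        default Map<String, Supplier<String>> getTokenizers() {
            return Collections.emptyMap();
        }
    }

    interface MiniSearchPlugin {
        default Map<String, Supplier<String>> getQueries() {
            return Collections.emptyMap();
        }
    }

    // One plugin class extending two areas at once.
    static class MyPlugin implements MiniAnalysisPlugin, MiniSearchPlugin {
        @Override
        public Map<String, Supplier<String>> getAnalyzers() {
            Map<String, Supplier<String>> analyzers = new HashMap<>();
            analyzers.put("custom_english", () -> "custom English analyzer");
            return analyzers;
        }

        @Override
        public Map<String, Supplier<String>> getQueries() {
            Map<String, Supplier<String>> queries = new HashMap<>();
            queries.put("my_query", () -> "custom query");
            return queries;
        }
    }

    public static void main(String[] args) {
        MyPlugin plugin = new MyPlugin();
        System.out.println(plugin.getAnalyzers().keySet());   // [custom_english]
        System.out.println(plugin.getTokenizers().isEmpty()); // true: default kept
        System.out.println(plugin.getQueries().keySet());     // [my_query]
    }
}
```

Because each interface contributes different methods, implementing both raises no conflicts; if two interfaces declared the same default method, the class would have to override it explicitly.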
