Elasticsearch provides, out-of-the-box, a large set of analyzers and tokenizers to cover general needs. Sometimes we need to extend the capabilities of Elasticsearch by adding new analyzers.
Typically you can create an analyzer plugin when you need:
In this recipe we will add a new custom English analyzer similar to the one provided by Elasticsearch.
You need an up-and-running Elasticsearch installation as we described in the Downloading and installing Elasticsearch recipe in Chapter 2, Downloading and Setup.
You also need Maven, or an IDE that supports Java programming, such as Eclipse or IntelliJ IDEA. The code for this recipe is available in the chapter17/analysis_plugin directory.
An analyzer plugin is generally composed of two classes:
A Plugin class, which implements the org.elasticsearch.plugins.AnalysisPlugin interface
An AnalyzerProvider class, which provides an analyzer
To create an analyzer plugin, we will perform the following steps:
The plugin class is similar to those in the previous recipes, plus a method that returns the analyzers:
public class AnalysisPlugin extends Plugin implements org.elasticsearch.plugins.AnalysisPlugin {
    @Override
    public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> getAnalyzers() {
        // Map each analyzer name to the factory that builds its provider
        Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> analyzers = new HashMap<>();
        analyzers.put(CustomEnglishAnalyzerProvider.NAME, CustomEnglishAnalyzerProvider::getCustomEnglishAnalyzerProvider);
        return analyzers;
    }
}
The AnalyzerProvider class provides the initialization of our analyzer, passing parameters provided by the settings:
package org.elasticsearch.index.analysis;

import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;

public class CustomEnglishAnalyzerProvider extends AbstractIndexAnalyzerProvider<EnglishAnalyzer> {
    public static String NAME = "custom_english";

    private final EnglishAnalyzer analyzer;

    public CustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings, boolean useSmart) {
        super(indexSettings, name, settings);
        analyzer = new EnglishAnalyzer(
            Analysis.parseStopWords(env, settings, EnglishAnalyzer.getDefaultStopSet(), true),
            Analysis.parseStemExclusion(settings, CharArraySet.EMPTY_SET));
    }

    public static CustomEnglishAnalyzerProvider getCustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
        return new CustomEnglishAnalyzerProvider(indexSettings, env, name, settings, true);
    }

    @Override
    public EnglishAnalyzer get() {
        return this.analyzer;
    }
}
After building the plugin and installing it on an Elasticsearch server, our analyzer is accessible like any native Elasticsearch analyzer.
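As an illustration of how the installed analyzer might be referenced, the following sketch builds an index-creation request body that declares an analyzer of type custom_english. The analyzer name my_english, the index name in the comment, and the sample word lists are hypothetical; the stopwords and stem_exclusion keys are assumed to be the ones read by the Analysis.parseStopWords and Analysis.parseStemExclusion helpers used in the provider.

```java
// Sketch only: prints a JSON body that could be sent when creating an index.
class CustomEnglishSettingsSketch {

    // "my_english" is a hypothetical analyzer name chosen for this example;
    // "custom_english" is the NAME registered by the plugin.
    static String indexSettings() {
        return "{\n"
             + "  \"settings\": {\n"
             + "    \"analysis\": {\n"
             + "      \"analyzer\": {\n"
             + "        \"my_english\": {\n"
             + "          \"type\": \"custom_english\",\n"
             + "          \"stopwords\": [\"a\", \"the\"],\n"
             + "          \"stem_exclusion\": [\"news\"]\n"
             + "        }\n"
             + "      }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        // The body would be sent with, for example, PUT /my_index (hypothetical index name).
        System.out.println(indexSettings());
    }
}
```

Fields mapped with "analyzer": "my_english" would then be processed by the plugin's analyzer with these settings.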
Creating an analyzer plugin is quite simple. The general workflow is:
In the preceding example, we registered a CustomEnglishAnalyzerProvider class, which wraps the Lucene EnglishAnalyzer class by extending AbstractIndexAnalyzerProvider:
public class CustomEnglishAnalyzerProvider extends AbstractIndexAnalyzerProvider<EnglishAnalyzer>
We need to provide a name for the analyzer:
public static String NAME="custom_english";
We keep a private Lucene analyzer instance, to be provided on request by the get method:
private final EnglishAnalyzer analyzer;
The CustomEnglishAnalyzerProvider constructor receives the index settings, the environment, the analyzer name, and the analyzer settings. These settings can be used to provide cluster defaults, via index settings or elasticsearch.yml.
public CustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings, boolean useSmart) {
To make it work correctly, we need to set up the parent constructor via the super call:
super(indexSettings, name, settings);
Now we can initialize the internal analyzer, which must be returned by the get method:
analyzer = new EnglishAnalyzer(
    Analysis.parseStopWords(env, settings, EnglishAnalyzer.getDefaultStopSet(), true),
    Analysis.parseStemExclusion(settings, CharArraySet.EMPTY_SET));
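This analyzer accepts:
A set of English stopwords, loaded from the settings or falling back to EnglishAnalyzer.getDefaultStopSet()
A set of words to be excluded from stemming, loaded from the settings or falling back to an empty set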
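The two constructor arguments, a stopword set and a stem-exclusion set, influence the token stream in different ways. The following toy sketch shows the idea in plain Java. It is not the Lucene EnglishAnalyzer algorithm: the class StopAndStemSketch, the naiveStem helper, and the sample words are all hypothetical, and the "stemmer" only strips a trailing s.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: NOT the Lucene EnglishAnalyzer implementation.
class StopAndStemSketch {

    // Hypothetical, deliberately naive "stemmer": strips a trailing "s".
    static String naiveStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    static List<String> analyze(String text, Set<String> stopWords, Set<String> stemExclusion) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (raw.isEmpty() || stopWords.contains(raw)) {
                continue; // stopwords are dropped from the stream
            }
            // Excluded words bypass the stemmer and are kept verbatim
            tokens.add(stemExclusion.contains(raw) ? raw : naiveStem(raw));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "and"));
        Set<String> stemExclusion = new HashSet<>(Arrays.asList("news"));
        // prints [news, report]
        System.out.println(analyze("The news and the reports", stopWords, stemExclusion));
    }
}
```

Without the exclusion set, the naive stemmer would also reduce news to new, which is the kind of unwanted change that a stem-exclusion list is meant to prevent.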
To easily wrap the analyzer, we need to create a static method that can be called to create the analyzer; we'll use it in the plugin definition:
public static CustomEnglishAnalyzerProvider getCustomEnglishAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
    return new CustomEnglishAnalyzerProvider(indexSettings, env, name, settings, true);
}
Finally, we can register our analyzer in the plugin. To do so, our plugin must implement AnalysisPlugin so that we can override the getAnalyzers method:
@Override
public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> getAnalyzers() {
    Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> analyzers = new HashMap<>();
    analyzers.put(CustomEnglishAnalyzerProvider.NAME, CustomEnglishAnalyzerProvider::getCustomEnglishAnalyzerProvider);
    return analyzers;
}
The :: (method reference) operator of Java 8 allows us to pass the static method as a function that will be used for the construction of our AnalyzerProvider.
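The role of the method reference can be illustrated outside Elasticsearch. The following is a minimal, self-contained sketch: ProviderFactory, SimpleProvider, and MethodReferenceRegistry are hypothetical stand-ins for AnalysisModule.AnalysisProvider, CustomEnglishAnalyzerProvider, and the plugin class, with simplified signatures rather than the real Elasticsearch ones.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: mimics how getAnalyzers() maps names to factories.
class MethodReferenceRegistry {

    // Stand-in for AnalysisModule.AnalysisProvider: a single-method factory interface.
    @FunctionalInterface
    interface ProviderFactory {
        SimpleProvider get(String name, Map<String, String> settings);
    }

    // Stand-in for CustomEnglishAnalyzerProvider.
    static class SimpleProvider {
        static final String NAME = "custom_english";
        final String name;
        final Map<String, String> settings;

        SimpleProvider(String name, Map<String, String> settings, boolean useSmart) {
            this.name = name;
            this.settings = settings;
        }

        // Static factory whose signature matches ProviderFactory.get,
        // so it can be passed around as SimpleProvider::create.
        static SimpleProvider create(String name, Map<String, String> settings) {
            return new SimpleProvider(name, settings, true);
        }
    }

    // Registers the factory under the provider name; nothing is constructed yet.
    static Map<String, ProviderFactory> getProviders() {
        Map<String, ProviderFactory> providers = new HashMap<>();
        providers.put(SimpleProvider.NAME, SimpleProvider::create);
        return providers;
    }

    public static void main(String[] args) {
        ProviderFactory factory = getProviders().get("custom_english");
        // The provider is built only when the factory is invoked.
        SimpleProvider provider = factory.get("my_analyzer", new HashMap<>());
        System.out.println(provider.name); // prints my_analyzer
    }
}
```

The map stores a function rather than an instance, so a new provider can be built on demand with whatever name and settings are supplied at that moment.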
A plugin can extend several Elasticsearch functionalities. To do so, it must implement the correct plugin interface. In Elasticsearch 5.x, the plugin interfaces are:
ActionPlugin: This is used for REST and cluster actions
AnalysisPlugin: This is used for extending all the analysis components, such as analyzers, tokenizers, token filters, and char filters
ClusterPlugin: This is used to provide new deciders
DiscoveryPlugin: This is used to provide custom node name resolvers
IngestPlugin: This is used to provide new ingest processors
MapperPlugin: This is used to provide new mappers and metadata mappers
RepositoryPlugin: This allows the provision of new repositories to be used in backup/restore functionalities
ScriptPlugin: This allows the provision of new scripting languages, scripting contexts, or native scripts (Java-based ones)
SearchPlugin: This allows extending all the search functionalities: highlighters, aggregations, suggesters, and queries
If your plugin needs to extend more than a single functionality, it can implement several plugin interfaces at once.
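A plugin class that covers more than one extension point could be shaped as in the following sketch. The nested AnalysisPlugin and IngestPlugin interfaces here are simplified stand-ins declared only for this example; the real Elasticsearch interfaces have different method signatures, and MyPlugin, my_processor, and MyProcessorFactory are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;

class MultiInterfacePluginSketch {

    // Simplified stand-ins for the Elasticsearch interfaces (assumed shapes, not the real API).
    interface AnalysisPlugin {
        default Map<String, String> getAnalyzers() { return Collections.emptyMap(); }
    }

    interface IngestPlugin {
        default Map<String, String> getProcessors() { return Collections.emptyMap(); }
    }

    // One plugin class implementing two extension points at once.
    static class MyPlugin implements AnalysisPlugin, IngestPlugin {
        @Override
        public Map<String, String> getAnalyzers() {
            return Collections.singletonMap("custom_english", "CustomEnglishAnalyzerProvider");
        }

        @Override
        public Map<String, String> getProcessors() {
            return Collections.singletonMap("my_processor", "MyProcessorFactory");
        }
    }

    public static void main(String[] args) {
        MyPlugin plugin = new MyPlugin();
        System.out.println(plugin.getAnalyzers().keySet());  // prints [custom_english]
        System.out.println(plugin.getProcessors().keySet()); // prints [my_processor]
    }
}
```

Because each interface supplies default (empty) implementations in this sketch, the plugin overrides only the extension points it actually uses.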