Introducing the spellcheck component

When we refer to the spellchecker, we think about a component that is able to recognize small typos and errors in a term typed by a casual user. In short, the component works by computing a distance between the terms in the search and the terms in the index. When the distance is less than a certain value, a suggestion is returned. There is nothing conceptually complicated here, and the component generally works fairly well out of the box. Still, it's important to do some fine-tuning in order to have it work as expected; we always have to remember that if we use only the terms in our index, our precision can degrade very quickly due to the domain-specific nature of our application and the number of terms we have indexed.
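
To make the idea of a distance threshold concrete, here is a minimal sketch in Python. It is not how Solr implements spellchecking internally; it assumes a plain Levenshtein edit distance, and the names `edit_distance`, `suggest`, and `max_distance` are hypothetical, chosen only for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def suggest(term, indexed_terms, max_distance=2):
    """Return indexed terms within max_distance of term, closest first.
    max_distance is an illustrative threshold, not a Solr default."""
    scored = [(edit_distance(term, candidate), candidate)
              for candidate in indexed_terms]
    # Skip exact matches (distance 0): a correct term needs no suggestion.
    close = [(d, c) for d, c in scored if 0 < d <= max_distance]
    return [c for _, c in sorted(close)]


if __name__ == "__main__":
    print(suggest("serach", ["search", "solr", "seraph", "index"]))
```

With a small threshold, only terms that are a couple of edits away from the typed term are suggested. This also shows why a large or very domain-specific index can hurt precision: the more terms we have, the more of them fall within the threshold by accident.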

Spellchecking can be seen as somewhat similar to obtaining suggestions for similar topics. It is one of the most commonly used ways to implement in Solr a functionality similar to Google's well-known ''did you mean''.

The most important spellcheckers that can be used are as follows:

  • IndexBasedSpellChecker: This will check for the terms in the index.
  • WordBreakSolrSpellChecker: This performs the same function, but it's able to handle sequences of words by combining adjacent terms or breaking a term into several words. This can be seen as a sort of "phrase" spellchecker if you want.
  • DirectSolrSpellChecker: This introduces some adjustments over the IndexBasedSpellChecker, taking care of spaces, punctuation, and other things. It works directly on the main index, so it doesn't need a separate spellcheck index to be built.
  • FileBasedSpellChecker: This can be used to provide suggestions starting from a file of controlled words.
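
Whichever spellchecker is configured, the request side looks much the same. The following sketch only builds the query URL; it assumes a request handler with the spellcheck component enabled, and the handler path, the core name `mycore`, and the dictionary name `default` are hypothetical values that depend on our own solrconfig.xml. The `spellcheck.*` parameters are the standard ones of the SpellCheck component.

```python
from urllib.parse import urlencode


def build_spellcheck_url(base_url, user_query, dictionary="default", count=5):
    """Build a Solr query URL that asks for spelling suggestions.

    base_url and dictionary are assumptions: they must match a handler
    and a spellchecker name defined in our own configuration.
    """
    params = {
        "q": user_query,
        "spellcheck": "true",                 # enable the component for this request
        "spellcheck.dictionary": dictionary,  # which configured spellchecker to use
        "spellcheck.count": count,            # max suggestions per term
        "spellcheck.collate": "true",         # also ask for a rewritten full query
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local core and handler path, for illustration only.
    print(build_spellcheck_url(
        "http://localhost:8983/solr/mycore/spell", "serach engine"))
```

Asking for a collation is what gives us a ready-made corrected query that can be shown to the user as a ''did you mean'' link.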

You can find details about the SpellCheck component on the following official wiki page:

http://wiki.apache.org/solr/SpellCheckComponent

There is also a specific page with hints on how to use it for ''did you mean'' autosuggestions at http://wiki.apache.org/solr/Suggester.
