What is text analysis?

Text analysis is a Solr mechanism that takes place in two phases:

  • At index time, Solr processes the input text as it is fed in, generates a token stream, and builds the indexes
  • At query time, Solr processes the query text, generates a token stream, matches its terms against the terms generated at index time, and returns the results
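
The two phases can be illustrated with a toy sketch in Python. This is not Solr's implementation: the names analyze, build_index, and search are hypothetical, and the "analysis" here is assumed to be nothing more than lowercasing and splitting on whitespace. The point is only that the same analysis runs at index time and at query time, so the resulting terms can be compared.

```python
from collections import defaultdict


def analyze(text):
    """Toy analysis chain (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Index time: analyze each document and map every token to the ids
    of the documents that contain it (a simple inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Query time: analyze the query the same way, then return the ids
    of documents that contain every query token."""
    tokens = analyze(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample documents, for illustration only.
docs = {1: "Russia hosted the Soccer World Cup", 2: "World news today"}
index = build_index(docs)
print(search(index, "soccer WORLD cup"))  # {1}
```

Because both sides pass through analyze, the query soccer WORLD cup matches document 1 despite the difference in letter case.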

Let’s dive deeper and understand:

  • How exactly Solr works to build indexes
  • How to optimize the query terms to match with indexes
  • How we get accurate, efficient, and fast results

If someone is searching for the string "The Host Country of Soccer World Cup 2018" and someone else is searching for the string "The Host Nation of Football world cup 2018", the result should be Russia in both cases. We will learn later in this chapter how Solr matches a query containing Nation and Football to documents containing Country and Soccer.
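
As a preview of the idea only, one way two such queries can end up matching is by mapping synonyms to a shared term during analysis. The sketch below assumes a hand-written mapping (SYNONYMS) and a hypothetical helper (normalize_synonyms). It does not show how Solr is configured to do this, which is covered later in the chapter.

```python
# Hypothetical synonym table: each key is rewritten to its value.
SYNONYMS = {"nation": "country", "football": "soccer"}


def normalize_synonyms(text):
    """Lowercase, split on whitespace, and replace known synonyms."""
    return [SYNONYMS.get(token, token) for token in text.lower().split()]


q1 = normalize_synonyms("The Host Country of Soccer World Cup 2018")
q2 = normalize_synonyms("The Host Nation of Football world cup 2018")
print(q1 == q2)  # True
```

After this normalization, both queries produce the same list of terms, so they would retrieve the same documents.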

We can't predict what kind of search input end users will provide. For example:

  • Searching for Soccer and Football
  • Searching for United States of America and USA
  • Searching for South Africa and RSA
  • Searching for air-crew, aircrew, and air crew 

All of these are different input patterns that ideally carry the same meaning in natural language. But users may also provide input that is not plain natural language, like this:

  • Searching for Hundred GB and 100 gigabyte
  • Searching for Caffé and cafe
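
A few of the variants listed above can be reduced to a common form with simple normalization steps. The following is a minimal sketch under stated assumptions: the helper names (fold_accents, collapse_variants, normalize_terms) and the EQUIVALENTS table are hypothetical, and the code uses only Python's standard library rather than anything from Solr.

```python
import re
import unicodedata

# Hypothetical table of equivalent terms, for illustration only.
EQUIVALENTS = {"hundred": "100", "gb": "gigabyte"}


def fold_accents(text):
    """Lowercase the text and strip combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def collapse_variants(term):
    """Drop hyphens and whitespace so spelling variants of ONE term share a key."""
    return re.sub(r"[-\s]+", "", term.lower())


def normalize_terms(text):
    """Fold accents, split on whitespace, and map known equivalents."""
    return [EQUIVALENTS.get(tok, tok) for tok in fold_accents(text).split()]


print(normalize_terms("Hundred GB") == normalize_terms("100 gigabyte"))  # True
print(normalize_terms("Caffé"))  # ['caffe']
print({collapse_variants(v) for v in ["air-crew", "aircrew", "air crew"]})  # {'aircrew'}
```

Note that accent folding alone turns Caffé into caffe, which still differs from cafe. Matching those two would need a further step, such as an equivalence rule like the ones in the table. Likewise, collapse_variants is only meaningful for a single term, since applying it to a whole phrase would glue all of its words together.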

End users may enter other, more complex search patterns at query time as well. Given the full range of possible patterns, Solr has to be able to recognize them, analyze them, and return accurate results efficiently.

Here, however, we don't have to worry, because Solr is an intelligent search engine designed to handle a wide range of search input patterns. So now, without dwelling on Solr's searching capability, let's see how Solr actually works to meet our search requirements.
