Chapter 4. Analysis and Analyzers

In the previous chapter, we looked at the basic concepts and definitions of mapping. We talked about metadata fields and data types. Then, we discussed the relationship between mapping and relevant search results. Finally, we tried to get a good grasp of what schema-less means in Elasticsearch.

In this chapter, we will review the analysis process and analyzers. We will examine tokenizers and look closely at character filters and token filters. In addition, we will review how to add analyzers to an Elasticsearch configuration. By the end of this chapter, we will have covered the following topics:

  • What is the analysis process?
  • What are built-in analyzers?
  • What do tokenizers, character filters, and token filters do?
  • What is text normalization?
  • How do we create custom analyzers?

Introducing analysis

As mentioned in Chapter 1, Introduction to Efficient Indexing, a huge amount of data is produced at every moment in today's world of information technology, on platforms ranging from social media to medium and large-sized companies that provide services in communication, health, security, and other areas. Moreover, such data is initially unstructured.

From this point of view, big data involves three basic needs:

  • Recording data with high performance
  • Accessing data with high performance
  • Analyzing data

Big data solutions mostly address these three basic needs.

Data should be recorded with high performance so that it can also be accessed with high performance; however, that alone is not enough. To get at the real meaning of the data, it must be analyzed.

Well-established search engines such as Google and many social media platforms such as Facebook and Twitter use data analysis successfully.

Let's consider Google with the following screenshot.

Would you accept it if Google did not predict that you're looking for Barcelona when you search for barca, or if it did not offer a Did you mean suggestion when you make a spelling mistake?

To be honest, the answer is absolutely not.

If a search engine does not predict what we're looking for, we switch to another search engine that does.

We're talking about subtle analysis. To begin with, the exact value Barca is not the same as the exact value barca. Beyond that, we are talking about understanding what a search means. For example, TR relates to Turkey, and a search for Jeffrey Jacob Abrams also relates to J.J. Abrams.
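
The gap between an exact value and an understood value can be illustrated with a minimal Python sketch. This is not how Elasticsearch implements analysis; it only shows the idea. The SYNONYMS table and the normalize function are hypothetical names introduced for this illustration.

```python
# Hypothetical synonym table, for illustration only.
# It maps a lowercased term to an assumed canonical form.
SYNONYMS = {
    "tr": "turkey",
    "barca": "barcelona",
}


def normalize(term: str) -> str:
    """Lowercase a term and replace it with its canonical form if one is known."""
    lowered = term.strip().lower()
    return SYNONYMS.get(lowered, lowered)


# Exact values differ as soon as the letter case differs.
exact_match = "Barca" == "barca"

# After normalization, both spellings resolve to the same term.
normalized_match = normalize("Barca") == normalize("barca")

# An abbreviation can resolve to the same term as the full name.
abbreviation_match = normalize("TR") == normalize("Turkey")

print(exact_match, normalized_match, abbreviation_match)
```

Comparing raw strings treats Barca and barca as different values, while comparing normalized strings treats them as the same term.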

This is where data analysis becomes important, because this kind of understanding can only be achieved by analyzing the data.

We will discuss the analysis process in Elasticsearch in the next sections.
