Summary

Hooray! We have completed the first of this book's advanced topics: machine learning with Elasticsearch. We introduced the machine learning feature of the Elastic Stack and created a single-metric job that tracks the volume field to detect anomalies in the data of the cf_rfem_hist_price index. We also introduced the Python scikit-learn library and an unsupervised learning algorithm, k-means clustering, which is provided by the KMeans class in the sklearn.cluster module. We extracted data from the cf_rfem_hist_price index and used three fields, changeOverTime, changePercent, and volume, to construct multidimensional input data so that k-means clustering could find the anomalies. Finally, using the matplotlib.pyplot module, we plotted a graph that shows the anomalies alongside the regular data.
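As a quick reminder of the idea, here is a minimal sketch of k-means-based anomaly flagging with scikit-learn. It is not the chapter's exact code: the helper name flag_anomalies, the standardization step, the distance-to-centroid rule, and the 95th-percentile cutoff are all assumptions made for illustration, and the input is synthetic data standing in for the changeOverTime, changePercent, and volume columns.

```python
import numpy as np
from sklearn.cluster import KMeans


def flag_anomalies(features, n_clusters=2, percentile=95.0, seed=0):
    """Return a boolean mask marking rows that lie far from their centroid.

    Hypothetical helper: the cutoff rule and defaults are illustrative
    assumptions, not values taken from the chapter.
    """
    X = np.asarray(features, dtype=float)

    # Standardize each column so a large-valued field such as volume
    # does not dominate the Euclidean distance.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / std

    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Xs)

    # Distance from each row to the centroid of its assigned cluster.
    centroids = model.cluster_centers_[model.labels_]
    distances = np.linalg.norm(Xs - centroids, axis=1)

    # Rows beyond the chosen percentile of distances are flagged.
    return distances > np.percentile(distances, percentile)


# Synthetic stand-in for (changeOverTime, changePercent, volume) rows.
rng = np.random.default_rng(42)
sample = np.column_stack([
    rng.normal(0.0, 0.05, 200),   # changeOverTime
    rng.normal(0.0, 1.0, 200),    # changePercent
    rng.normal(1e5, 2e4, 200),    # volume
])
mask = flag_anomalies(sample)
print("flagged rows:", int(mask.sum()), "of", len(sample))
```

Flagging by distance to the assigned centroid is only one option. Treating the members of an unusually small cluster as anomalies is another, and which rule fits best depends on the data.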

In the next chapter, we will provide an overview of Elasticsearch for Apache Hadoop, known as ES-Hadoop, which lets businesses working with big data enhance their Hadoop workflows with the Elasticsearch search and analytics engine. We'll learn how to read data from an Elasticsearch index, perform some computations using Spark, and then write the results into another Elasticsearch index.
