Summary

Our recommendation system may not have taken the typical textbook approach, and it may not be the most accurate recommender possible, but it does represent a fully demonstrable and interesting approach to one of the most commonplace techniques in data science today. Further, with persistent data storage, a REST API, distributed shared memory caching, and a modern web 2.0-based user interface, it provides a reasonably complete and rounded candidate solution.

Of course, building a production-grade product out of this prototype would still require much effort and expertise. There are still improvements to be sought in the area of signal processing. For example, one could account for perceived loudness and reduce signal noise by using a loudness filter (see http://languagelog.ldc.upenn.edu/myl/StevensJASA1955.pdf), by extracting pitches and melodies, or, most importantly, by converting the stereo signal to mono.

Note

All these processes are part of an active area of research; interested readers can consult the following publications: http://www.justinsalamon.com/publications.html and http://www.mattmcvicar.com/publications/.
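
As an illustration of the stereo-to-mono step mentioned above, here is a minimal sketch in plain Scala. It assumes the audio has already been decoded into an interleaved buffer of samples (left, right, left, right, and so on) and simply averages each pair. The object name `StereoToMono` and the method `toMono` are hypothetical helpers introduced for this example, not part of the chapter's code base, and averaging is only one of several possible downmixing strategies.

```scala
object StereoToMono {

  /**
   * Downmix an interleaved stereo buffer (L0, R0, L1, R1, ...) to mono
   * by averaging the left and right sample of each frame.
   * Assumption: samples are already decoded to Double values.
   */
  def toMono(interleaved: Array[Double]): Array[Double] = {
    require(
      interleaved.length % 2 == 0,
      "An interleaved stereo buffer must contain an even number of samples"
    )
    interleaved
      .grouped(2)                                // one (left, right) frame at a time
      .map(frame => (frame(0) + frame(1)) / 2.0) // average the two channels
      .toArray
  }
}
```

For instance, `StereoToMono.toMono(Array(1.0, 3.0, -2.0, 2.0))` yields a two-sample mono signal, one value per stereo frame. Averaging halves the amount of data to process, although channels that are out of phase with each other can partially cancel out.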

In addition, we considered how simple, interactive user interfaces can improve data science demonstrations. As mentioned, the user interface is an often overlooked aspect of a project, yet it is a key part of any presentation. Even in the early stages of a project, it's worth investing some time in data visualization, as it can be especially useful when convincing business people of the viability of your product.

One final thought: in this aspirational chapter, we explored innovative ways to address data science use cases in a Spark environment. By balancing skills between mathematics and computer science, data scientists should feel free to explore, to be creative, to push back the frontier of what is feasible, to undertake what people say is not feasible, and, most importantly, to have fun with data. This is the main reason why being a data scientist is considered the sexiest job of the 21st century.

This chapter was a musical interlude. In the next chapter, we will look at classifying GDELT articles by bootstrapping a classification model using Twitter data, another ambitious task, to say the least.
