I dedicate this book to my wife, Neha; my son, Ziaan; and my parents. Without you guys, this book wouldn’t have been possible. You complete my world and are the source of my strength.
Before even starting to write this book, I asked myself a question: Is there a need for another book on Machine Learning? So many books have already been written on this subject that this one might end up as just another book on the shelf. To find the answer, I spent a lot of time thinking, and after a while, a few patterns started to emerge. The existing books on Machine Learning were too detailed and lacked a high-level overview. Most of them would start out easy, but after a couple of chapters, it felt overwhelming to continue as the content became too deep. As a result, readers would give up without getting enough out of the book. That’s why I wanted to write this book, which demonstrates the different ways of using Machine Learning without getting too deep, yet captures the complete methodology to build an ML model from scratch.
The next obvious question was this: Why Machine Learning using PySpark? The answer did not take long, since I am a practicing Data Scientist and well aware of the challenges faced by people dealing with data. Most packages or modules are limited because they process data on a single machine. Moving from a development to a production environment becomes a nightmare if ML models are not meant to handle Big Data, and the processing of data itself needs to be fast and scalable. For all these reasons, it made complete sense to write this book on Machine Learning using PySpark, to understand the process of using Machine Learning from a Big Data standpoint.
Now we come to the core of the book Machine Learning with PySpark. This book is divided into three sections. The first section gives an introduction to Machine Learning and Spark, the second section covers Machine Learning in detail using Big Data, and the third section showcases Recommender Systems and NLP using PySpark. This book might also be relevant for Data Analysts and Data Engineers, as it covers the steps of Big Data processing using PySpark as well. Readers who want to make a transition to Data Science and the Machine Learning field would also find this book an easier place to start and can gradually take up more complicated material later. The case studies and examples given in the book make it easy to follow along and understand the fundamental concepts. Moreover, there are very few books available on PySpark, and this book would certainly add some value to the knowledge of the readers. The strength of this book lies in explaining the Machine Learning algorithms in the simplest ways and in its practical approach toward building them using PySpark.
I have put my entire experience and learning into this book and feel it is precisely relevant to what businesses are seeking to solve real challenges. I hope you have some useful takeaways from this book.
This book wouldn’t have seen the light of day if a few people had not been there with me during this journey. I had heard the quote “Easier said than done” so many times in my life, but I had the privilege to experience it truly while writing this book. To be honest, I was extremely confident about writing this book initially, but as I progressed into writing it, things started becoming difficult. It’s quite ironic because when you think about the content, you are crystal clear in your mind, but when you go on to write it on a piece of paper, it suddenly starts becoming confusing. I struggled a lot, yet this period has been revolutionary for me personally. First, I must thank the most important person in my life, my beloved wife, Neha, who selflessly supported me throughout this time and sacrificed so much just to ensure that I completed this book.
I need to thank Suresh John Celestin, who believed in me and offered me this break to write this book. Aditee Mirashi is one of the best editors to start your work with. She was extremely supportive and always there to respond to all my queries. You can imagine the number of questions that a person writing his first book must have had. I would like to especially thank Matthew Moodie, who dedicated his time to reading every single chapter and giving so many useful suggestions. Thanks, Matthew; I really appreciate it. Another person that I want to thank is Leonardo De Marchi, who had the patience to review every single line of code and check the appropriateness of each example. Thank you, Leo, for your feedback and your encouragement. It really made a difference to me and the book as well. I also want to thank my mentors who have constantly forced me to chase my dreams. Thank you, Alan Wexler, Dr. Vijay Agneeswaran, Sreenivas Venkatraman, Shoaib Ahmed, and Abhishek Kumar for your time.
Finally, I am infinitely grateful to my son, Ziaan, and my parents for the endless love and support irrespective of circumstances. You guys remind me that life is beautiful.
is a Manager, Data Science at Publicis.Sapient and works as a Data Science track lead for a project with Mercedes-Benz. He has extensive hands-on experience in Machine Learning, Data Engineering, programming, and designing algorithms for various business requirements in domains such as retail, telecom, automobile, and consumer goods. He drives a lot of strategic initiatives that deal with Machine Learning and AI at Publicis.Sapient. He received his Bachelor’s degree in Electrical and Electronics Engineering from Mumbai University and an MBA (Operations & Finance) from Symbiosis International University, along with a Data Analytics Certification from IIM – Calcutta. He has spent more than eight years working on multiple Data projects. He has used Machine Learning and Deep Learning techniques in numerous client projects using R, Python, Spark, and TensorFlow. He has also been a regular speaker at major conferences and universities. He conducts Data Science meetups at Publicis.Sapient and regularly presents webinars on ML and AI. He lives in Bangalore with his wife and two-year-old son. In his spare time, he enjoys playing guitar, coding, reading, and watching football.
holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks such as Justgiving.
He now works as Lead Data Scientist at Badoo, the largest dating site with over 360 million users. He is also the lead instructor at ideai.io, a company specializing in Deep Learning and Machine Learning training, and is a contractor for the European Commission.