Introduction

This book is a concise and easy-to-understand tutorial for big data and Spark. It will help you learn how to use Spark for a variety of big data analytic tasks. It covers what you need to know to use Spark productively.

One of the benefits of purchasing this book is that it will help you learn Spark efficiently; it will save you a lot of time. The topics covered in this book can be found on the Internet. There are numerous blogs, presentations, and YouTube videos covering Spark. In fact, the amount of material on Spark can be overwhelming. You could spend months reading bits and pieces about Spark at different places on the Web. This book provides a better alternative with the content nicely organized and presented in an easy-to-understand format.

The content and the organization of the material in this book are based on the Spark workshops that I occasionally conduct at different big data–related conferences. The positive feedback given by the attendees for both the content and the flow motivated me to write this book.

One of the differences between a book and a workshop is that the latter is interactive. However, after conducting a number of Spark workshops, I know the kind of questions people generally have and I have addressed those in the book. Still, if you have questions as you read the book, I encourage you to contact me via LinkedIn or Twitter. Feel free to ask any question. There is no such thing as a stupid question.

Rather than cover every detail of Spark, the book covers important Spark-related topics that you need to know to effectively use Spark. My goal is to help you build a strong foundation. Once you have a strong foundation, it is easy to learn all the nuances of a new technology. In addition, I wanted to keep the book as simple as possible. If Spark looks simple after reading this book, I have succeeded in my goal.

No prior experience is assumed with any of the topics covered in this book. It introduces the key concepts step by step. Each section builds on the previous section. Similarly, each chapter serves as a stepping-stone for the next chapter. You can skip a later chapter covering one of the Spark libraries if you don’t have an immediate need for that library. However, I encourage you to read all the chapters. Even though a chapter may not seem relevant to your current project, it may give you new ideas.

You will learn a lot about Spark and related technologies from reading this book. However, to get the most out of this book, type the examples shown in the book. Experiment with the code samples. Things become clearer when you write and execute code. If you practice and experiment with the examples as you read the book, by the time you finish reading it, you will be a solid Spark developer.

One of the resources that I find useful when I am developing Spark applications is the official Spark API (application programming interface) documentation. It is available at http://spark.apache.org/docs/latest/api/scala. As a beginner, you may find it hard to understand, but once you have learned the basic concepts, you will find it very useful.

Another useful resource is the Spark mailing list. The Spark community is active and helpful. Not only do the Spark developers respond to questions, but experienced Spark users also volunteer their time helping new users. No matter what problem you run into, chances are that someone on the Spark mailing list has solved that problem.

You can also reach out to me. I would love to hear from you. Feedback, suggestions, and questions are welcome.

—Mohammed Guller
LinkedIn: www.linkedin.com/in/mohammedguller
Twitter: @MohammedGuller
