Foreword

Big data is getting bigger and bigger day by day. And I don't mean the tera, peta, exa, zetta, and yottabytes of data collected all over the world every day. I mean the complexity and number of components used in any decent and respectable big data ecosystem. Never mind the technical nitty-gritty: just keeping up with the terminology, new buzzwords, and hype popping up all the time can be a real challenge in itself. By the time you have mastered them all and put your hard-earned knowledge into practice, you will discover that half of them are old and inefficient, and nobody uses them anymore. Spark is not one of those "here today, gone tomorrow" fads. Spark is here to stay for the foreseeable future, and it is well worth getting your teeth into it in order to get some value out of your data NOW, rather than in some, errr, unforeseeable future. Spark and the technologies built on top of it are the next crucial step in the evolution of big data. They offer processing speeds up to 100x faster in memory, and 10x faster on disk, than traditional Hadoop jobs.

There's no better way of getting to know Spark than by reading this book, written by Mike Frampton, a colleague of mine whom I first met many, many years ago and have kept in touch with ever since. Mike's main professional interest has always been data, and in the pre-big-data days he worked on data warehousing, processing, and analysis projects for major corporations. He experienced first hand the inefficiencies, poor value, and frustrations that the traditional methods of crunching data offer. So understanding big data, what it offers, where it is coming from, and where it is heading is intuitive to him. Mike wholeheartedly embraced big data the moment it arrived, and has been devoted to it ever since. He practices what he preaches, and is not in it for the money. He is very active in the big data community, writes books, produces presentations on SlideShare and YouTube, and is always the first to test-drive new, emerging products.

Mike's passion for big data, as you will find out, is highly infectious, and he is always one step ahead, exploring new and innovative ways in which big data is used. It is no wonder that in this book he will teach you how to use Spark in conjunction with the very latest technologies, some of which, such as machine learning and neural networks, are still in the development stage. But fear not, Mike will carefully guide you step by step, ensuring that you gain direct, personal experience of the power and usefulness of these technologies and are able to put them into practice immediately.

Andrew Szymanski

Cloudera Certified Hadoop Administrator/Big Data Specialist
