Preface

If you have ever wanted to get into data mining, but didn't know where to start, I've written this book with you in mind.

Many data mining books are highly mathematical, which is great when you are coming from such a background, but I feel they often miss the forest for the trees: they focus so much on how the algorithms work that they lose sight of why we are using these algorithms.

My aim has been to create a book for those who can program and want to learn data mining. By the end, I hope you will have a good understanding of the basics, some best practices for solving problems with data mining, and some pointers on the next steps you can take.

Each chapter in this book introduces a new topic, algorithm, and dataset. For this reason, it can be a bit of a whirlwind tour, moving quickly from topic to topic. However, for each of the chapters, think about how you can improve upon the results presented in the chapter. Then, take a shot at implementing it!

One of my favorite quotes is from Shakespeare's Henry IV:

But will they come when you do call for them?

Before this line, another character claims to be able to call spirits. In response, Hotspur points out that anyone can call spirits; what matters is whether they actually come when called.

In much the same way, learning data mining is about performing experiments and getting results. Anyone can come up with an idea to create a new data mining algorithm or improve upon an experiment's results. However, what matters is: can you build it and does it work?

What this book covers

Chapter 1, Getting Started with Data Mining, introduces the technologies we will be using, along with implementing two basic algorithms to get started.

Chapter 2, Classifying with scikit-learn Estimators, covers classification, which is a key form of data mining. You'll also learn about some structures that make your data mining experiments easier to perform.

Chapter 3, Predicting Sports Winners with Decision Trees, introduces two new algorithms, Decision Trees and Random Forests, and uses them to predict sports winners by creating useful features.

Chapter 4, Recommending Movies Using Affinity Analysis, looks at the problem of recommending products based on past experience and introduces the Apriori algorithm.

Chapter 5, Extracting Features with Transformers, introduces different types of features you can create and how to work with different datasets.

Chapter 6, Social Media Insight Using Naive Bayes, uses the Naive Bayes algorithm to automatically parse text-based information from the social media website, Twitter.

Chapter 7, Discovering Accounts to Follow Using Graph Mining, applies cluster and network analysis to find good people to follow on social media.

Chapter 8, Beating CAPTCHAs with Neural Networks, looks at extracting information from images and then training neural networks to find words and letters in those images.

Chapter 9, Authorship Attribution, looks at determining who wrote a given document, by extracting text-based features and using support vector machines.

Chapter 10, Clustering News Articles, uses the k-means clustering algorithm to group together news articles based on their content.

Chapter 11, Classifying Objects in Images Using Deep Learning, determines what type of object is being shown in an image, by applying deep neural networks.

Chapter 12, Working with Big Data, looks at workflows for applying algorithms to big data and how to get insight from it.

Appendix, Next Steps…, goes through each chapter, giving hints on where to go next for a deeper understanding of the concepts introduced.
