Ashwin Pajankar and Aditya Joshi

Hands-on Machine Learning with Python

Implement Neural Network Solutions with Scikit-learn and PyTorch

Ashwin Pajankar

Nashik, Maharashtra, India

Aditya Joshi

Haldwani, Uttarakhand, India

ISBN 978-1-4842-7920-5    e-ISBN 978-1-4842-7921-2

https://doi.org/10.1007/978-1-4842-7921-2

Apress Standard

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Apress imprint is published by the registered company APress Media, LLC part of Springer Nature.

The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

This book is dedicated to the memory of our teacher, Prof. Govindarajulu Regeti (July 9, 1945–March 18, 2021)

Popularly known as RGR, Prof. Govindarajulu obtained his B.Tech. in Electrical and Electronics Engineering from JNTU Kakinada. He also earned his M.Tech. and Ph.D. from IIT Kanpur. Prof. Govindarajulu was an early faculty member of IIIT Hyderabad and played a significant role in making it the top-class institution it is today. He was by far the most loved and cheered-for faculty member of the institute. He was full of energy to teach and full of old-fashioned charm. There is no doubt he cared for every student as an individual, taking care to know and guide each of them. He taught, guided, and mentored many batches of students at IIIT Hyderabad (including one of the authors of this book, Ashwin Pajankar).

Introduction

We have long been planning to collaborate and write a book on machine learning. This field has grown and expanded immensely since we started learning these topics almost a decade ago. As lifelong learners ourselves, we realized that the first few steps in any field require a source that lays out a clear path. They also require crisp explanations and occasional ideas for extending the learning experience by reading, learning, and applying what you have learned. We have used Python for a long time in our academic lives and in our professional careers in software development, data science, and machine learning. Through this book, we have made a humble attempt to write a step-by-step guide to machine learning for absolute beginners. Every chapter of the book explains the concepts used and provides code examples, explanations of those examples, and screenshots of the outputs.

The first chapter covers the setup of the Python environment on different platforms. The second chapter covers NumPy and Ndarrays. The third chapter explores visualization with Matplotlib. The fourth chapter introduces the Pandas data science library. Together, these initial chapters build the programming and basic data-crunching foundations that are a prerequisite for learning machine learning.

The next section discusses traditional machine learning approaches. In Chapter 5, we start with a bird’s-eye view of the field of machine learning, followed by the installation of Scikit-learn and a short example of a machine learning solution with Scikit-learn. Chapter 6 elaborates on methods that help you understand and transform structural, textual, and image data into a format that machine learning libraries accept. In Chapter 7, we introduce supervised learning methods, starting with linear regression for regression problems and logistic regression and decision trees for classification problems. In each experiment, we also show how to visualize what the algorithm has learned using decision boundary plots. The eighth chapter turns to further fine-tuning of machine learning models. We explain some ideas for measuring the performance of models, the issues of overfitting and underfitting, and approaches for handling such issues and improving model performance. The ninth chapter continues the discussion of supervised learning methods, focusing on naive Bayes and Support Vector Machines. The tenth chapter explains ensemble learning methods, which combine multiple simpler models to produce better performance than the models might offer individually. In the eleventh chapter, we discuss unsupervised learning methods, specifically dimensionality reduction, clustering, and frequent pattern mining. Each part contains a complete example of implementing the discussed methods using Scikit-learn.

The last section begins by introducing the basic ideas of neural networks and deep learning in the twelfth chapter. We introduce PyTorch, a highly popular open source machine learning framework that is used in the examples in the subsequent chapters. The thirteenth chapter begins with an explanation of artificial neural networks and thoroughly discusses the theoretical foundations of feedforward and backpropagation, followed by a short discussion of loss functions and an example of a simple neural network. In the second half, we explain how to create a multilayer neural network that is capable of identifying handwritten digits. In the fourteenth chapter, we discuss convolutional neural networks and work through an example of image classification. The fifteenth chapter discusses recurrent neural networks and walks you through a sequence modeling problem. In the final, sixteenth chapter, we discuss strategies for planning, managing, and engineering machine learning and data science projects. We also discuss a short end-to-end example of sentiment analysis using deep learning.

If you are new to the subject, we highly encourage you to follow the chapters sequentially as the ideas build upon each other. Follow through all the code sections, and feel free to modify and tweak the code structure, datasets, and hyperparameters. If you already know some of the topics, feel free to skip to the topics of your interest and examine the relevant sections thoroughly. We wish you the best for your learning experience.

Acknowledgments

I would like to express my gratitude toward Aditya Joshi, my junior from IIIT Hyderabad and now an esteemed colleague who has written the major and the most important section of this book. I also wish to thank my mentors from Apress, Celestin, Aditee, James Markham, and the editorial team. I wish to thank the reviewers who helped me make this book better. I also thank Prof. Govindarajulu’s family – Srinivas (son) and Amy (daughter-in-law) – for allowing me to dedicate this book to his memory and sharing his biographical information and his photograph for publication.

—Ashwin Pajankar

My work on this book started with a lot of encouragement and support from my father, Ashok Kumar Joshi, who couldn’t live long enough to see it till completion. I am extremely grateful to friends and family – especially my mother, Bhavana Joshi, and many others, whose constant support was the catalyst to help me work on this project. I also want to extend my heartiest thanks to my wife, Neha Pandey, who was supportive and patient enough when I extended my work especially during weekends. I would like to thank Ashwin Pajankar, who’s been not just a coauthor but a guide throughout this journey. I’d also like to extend my gratitude to the Innomatics team, Kalpana Katiki Reddy, Vishwanath Nyathani, and Raghuram Aduri, for giving me opportunities to interact with hundreds of students who are learning data science and machine learning. I’d also like to thank Akshaj Verma for his support with code examples in one of the advanced chapters. I also thank the editorial team at Apress, especially Celestin Suresh John, Aditee Mirashi, James Markham, and everyone who was involved in the process.

—Aditya Joshi

Table of Contents

Section 1: Python for Machine Learning

Chapter 1: Getting Started with Python 3 and Jupyter Notebook

Python 3 Programming Language

History of Python Programming Language

Where Python Is Used

Installing Python

Python on Linux Distributions

Scientific Python Ecosystem

Python Implementations and Distributions

Anaconda

Summary

Chapter 2: Getting Started with NumPy

Getting Started with NumPy

Multidimensional Ndarrays

Indexing of Ndarrays

Ndarray Properties

NumPy Constants

Summary

Chapter 3: Introduction to Data Visualization

NumPy Routines for Ndarray Creation

Matplotlib Data Visualization

Summary

Chapter 4: Introduction to Pandas

Pandas Basics

Series in Pandas

Properties of Series

Pandas Dataframes

Visualizing the Data in Dataframes

Summary

Section 2: Machine Learning Approaches

Chapter 5: Introduction to Machine Learning with Scikit-learn

Learning from Data

Supervised Learning

Unsupervised Learning

Structure of a Machine Learning System

Problem Understanding

Data Collection

Data Annotation and Data Preparation

Data Wrangling

Model Development, Training, and Evaluation

Model Deployment

Scikit-Learn

Installing Scikit-Learn

Understanding the API

Your First Scikit-learn Experiment

Summary

Chapter 6: Preparing Data for Machine Learning

Types of Data Variables

Transforming Nominal Attributes

Transforming Ordinal Attributes

Five-Step NLP Pipeline

Preprocessing Images

Summary

Chapter 7: Supervised Learning Methods: Part 1

Linear Regression

Finding the Regression Line

Logistic Regression

Line vs. Curve for Expression Probability

Learning the Parameters

Logistic Regression Using Python

Visualizing the Decision Boundary

Decision Trees

Building a Decision Tree

Decision Tree in Python

Summary

Chapter 8: Tuning Supervised Learners

Training and Testing Processes

Measures of Performance

Confusion Matrix

Precision

Accuracy

F-Measure

Performance Metrics in Python

Cross Validation

Why Cross Validation?

Cross Validation in Python

ROC Curve

Overfitting and Regularization

Bias and Variance

Regularization

Hyperparameter Tuning

Effect of Hyperparameters

Summary

Chapter 9: Supervised Learning Methods: Part 2

Naive Bayes

Bayes Theorem

Conditional Probability

How Naive Bayes Works

Multinomial Naive Bayes

Naive Bayes in Python

Support Vector Machines

How SVM Works

Nonlinear Classification

Kernel Trick in SVM

Support Vector Machines in Python

Summary

Chapter 10: Ensemble Learning Methods

Bagging and Random Forest

Random Forest in Python

Boosting

Boosting in Python

Stacking Ensemble

Stacking in Python

Summary

Chapter 11: Unsupervised Learning Methods

Dimensionality Reduction

Understanding the Curse of Dimensionality

Principal Component Analysis

Principal Component Analysis in Python

Clustering

Clustering Using K-Means

K-Means in Python

Frequent Pattern Mining

Market Basket Analysis

Frequent Pattern Mining in Python

Summary

Section 3: Neural Networks and Deep Learning

Chapter 12: Neural Network and PyTorch Basics

Installing PyTorch

PyTorch Basics

Creating a Tensor

Tensor Operations

Perceptron

Perceptron in Python

Artificial Neural Networks

Summary

Chapter 13: Feedforward Neural Networks

Feedforward Neural Network

Training Neural Networks

Loss Functions

ANN for Regression

Activation Functions

ReLU Activation Function

Sigmoid Activation Function

Tanh Activation Function

Multilayer ANN

NN Class in PyTorch

Overfitting and Dropouts

Classifying Handwritten Digits

Summary

Chapter 14: Convolutional Neural Networks

Convolution Operation

Structure of a CNN

Padding and Stride

CNN in PyTorch

Image Classification Using CNN

What Did the Model Learn?

Deep Networks of CNN

Summary

Chapter 15: Recurrent Neural Networks

Recurrent Unit

Types of RNN

One to One

One to Many

Many to One

Many to Many

RNN in Python

Long Short-Term Memory

LSTM Cell

Time Series Prediction

Gated Recurrent Unit

Summary

Chapter 16: Bringing It All Together

Data Science Life Cycle

CRISP-DM Process

How ML Applications Are Served

Learning with an Example

Defining the Problem

Data

Preparing the Model

Serializing for Future Predictions

Hosting the Model

What’s Next

Index

About the Authors

Ashwin Pajankar is an author, an online instructor, a content creator, and a YouTuber. He has earned a Bachelor of Engineering from SGGSIE&T Nanded and an M.Tech. in Computer Science and Engineering from IIIT Hyderabad. He was introduced to the amazing world of electronics and computer programming at the age of seven. BASIC is the very first programming language he learned. He has a lot of experience in programming with Assembly Language, C, C++, Visual Basic, Java, Shell Scripting, Python, SQL, and JavaScript. He also loves to work with single-board computers and microcontrollers like Raspberry Pi, Banana Pro, Arduino, BBC Microbit, and ESP32.

He is currently focusing on developing his YouTube channel on computer programming, electronics, and microcontrollers.

Aditya Joshi is a machine learning engineer who has worked in data science and ML teams of early to mid-stage startups. He has earned a Bachelor of Engineering from Pune University and an M.S. in Computer Science and Engineering from IIIT Hyderabad. He became interested in machine learning during his master’s and got associated with the Search and Information Extraction Lab at IIIT Hyderabad. He loves to teach, and he has been involved in training workshops, meetups, and short courses.

About the Technical Reviewer

Joos Korstanje is a data scientist, with over five years of industry experience in developing machine learning tools, of which a large part is forecasting models. He currently works at Disneyland Paris where he develops machine learning for a variety of tools.
