Vaibhav Verdhan

Supervised Learning with Python

Concepts and Practical Implementation Using Python

1st ed.
Foreword by Dr. Eli Yechezkiel Kling (PhD)
Vaibhav Verdhan
Limerick, Ireland
ISBN 978-1-4842-6155-2
e-ISBN 978-1-4842-6156-9
© Vaibhav Verdhan 2020
Apress Standard
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Yashi, Pakhi and Rudra.

Foreword

How safe is home birthing? That is a good question. Pause a moment and let yourself contemplate it.

I am sure you can see how the answer to this question can affect personal decisions and policy choices. The answer could be given as a probability, a level classification, or an alternative cost. Another natural reaction is “it depends.” There are many factors that could affect the safety of home birthing.

I took you through this thought exercise to show you that you naturally think like a data scientist. You understood the importance of stipulating clearly the focus of the analysis and what could explain different outcomes. The reason you are embarking on a journey through this book is that you are not sure how to express these instinctive notions mathematically and instruct a computer to “find” the relationship between the “Features” and the “Target.”

When I started my career 30-odd years ago, this was the domain of statisticians who crafted a mathematical language to describe relationships and noise. The purpose of predictive modeling was in its essence to be a tool for separating a signal or a pattern out of seemingly chaotic information and reporting how well the partition was done.

Today, machine learning algorithms harnessing computing brute force add a new paradigm. This has created a new profession: the data scientist. The data scientist is a practitioner who can think in terms of statistical methodology, instruct a computer to carry out the required processing, and interpret the results and reports.

Becoming a good data scientist is a journey that starts with learning the basics and mechanics. Once you are done exploring this book you might also be able to better see where you will want to deepen your theoretical knowledge. I would like to suggest you might find it interesting to look into the theory of statistical modeling in general and the Bayesian paradigm specifically. Machine learning is computational statistics after all.

Dr. Eli Y. Kling (BSc. Eng., MSc., PhD), London, UK. June 2020.

Introduction

“It is tough to make predictions, especially about the future.”

—Yogi Berra

In 2019, MIT’s Katie Bouman processed five petabytes of data to develop the first-ever image of a black hole. Data science, machine learning, and artificial intelligence played a central role in this extraordinary discovery.

Data is the new electricity, and according to Harvard Business Review (HBR), data scientist is the “sexiest” job of the 21st century. Data is fueling business decisions and making its impact felt across all sectors and walks of life. It is allowing us to create intelligent products, improve marketing strategies, innovate business strategies, enhance safety mechanisms, arrest fraud, reduce environmental pollution, and create path-breaking medicines. Our everyday life is enriched and our social media interactions are more organized. Data is also allowing us to reduce costs, increase profits, and optimize operations. The field offers fantastic growth and a promising career path, but there is a dearth of talent in it.

This book attempts to educate the reader in a branch of machine learning called supervised learning. It covers a spectrum of supervised learning algorithms and their respective Python implementations. Throughout the book, we discuss the building blocks of the algorithms, their nuts and bolts, mathematical foundations, and background processes. The learning is complemented by developing actual Python code from scratch, with a step-by-step explanation of the code.

The book starts with an introduction to machine learning, covering core concepts, the differences between supervised, semi-supervised, and unsupervised learning approaches, and practical use cases. The next chapter examines regression algorithms such as linear regression, multinomial regression, decision tree, and random forest. It is followed by a chapter on classification algorithms: logistic regression, naïve Bayes, k-nearest neighbors (knn), decision tree, and random forest. The chapter after that studies the advanced concepts of gradient boosting machines (GBM), support vector machines (SVM), and neural networks. We work with structured data as well as text and image data, and pragmatic Python implementations complement the understanding. The final chapter covers end-to-end model development. The reader gets Python code, datasets, best practices, resolutions of common issues and pitfalls, and pragmatic first-hand knowledge of implementing algorithms. The reader will be able to run the code, extend it in innovative ways, and understand how to approach a supervised learning problem. Your prowess as a data science enthusiast is going to get a big boost, so get ready for these fruitful lessons!

The book is suitable for researchers and students who want to explore supervised learning concepts with Python implementation. It is recommended for working professionals who yearn to stay on the edge of technology, clarify advanced concepts, and get best practices and solutions to common challenges. It is intended for business leaders who wish to gain first-hand knowledge and develop confidence while they communicate with their teams and clientele. Above all, it is meant for a curious person who is trying to explore how supervised learning algorithms work and who would like to try Python.

Stay blessed, stay healthy!

—Vaibhav Verdhan

Limerick, Ireland. June 2020

Acknowledgments

I would like to thank Apress publications, Celestin John, Shrikant Vishwarkarma, and Irfan Elahi for the confidence shown and the support extended. Many thanks to Dr. Eli Kling for the fantastic foreword to the book. Special thanks to my family, Yashi, Pakhi, and Rudra; without their support it would have been impossible to complete this work.

About the Author
Vaibhav Verdhan has 12+ years of experience in data science, machine learning, and artificial intelligence. An MBA with an engineering background, he is a hands-on technical expert with the acumen to assimilate and analyze data. He has led multiple engagements in ML and AI across geographies and across the retail, telecom, manufacturing, energy, and utilities domains. He currently resides in Ireland with his family and works as a Principal Data Scientist.

 
About the Technical Reviewer
Irfan Elahi is a full stack customer-focused cloud analytics specialist bearing the unique and proven combination of diverse consulting and technical competencies (cloud, big data, and machine learning) with a growing portfolio of successful projects delivering substantial impact and value in multiple capacities across telecom, retail, energy, and health-care sectors. Additionally, he is an analytics evangelist as is evident from the published book, Udemy courses, blogposts, trainings, lectures, and presentations with global reach.

 