
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise, and use it to solve real-world machine learning problems

Key Features

  • Delve into machine learning with this comprehensive guide to scikit-learn and scientific Python
  • Master the art of data-driven problem-solving with hands-on examples
  • Foster your theoretical and practical knowledge of supervised and unsupervised machine learning algorithms

Book Description

Machine learning is applied everywhere, from business to research and academia, and scikit-learn is a versatile library popular among machine learning practitioners. This book serves as a practical guide for anyone looking to build hands-on machine learning solutions with scikit-learn and the wider Python toolkit.

The book begins with an explanation of machine learning concepts and fundamentals, and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms and shows you how to use them to solve real-life problems. You'll also learn about key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you'll gain a thorough understanding of its theory and learn when to apply it. As you advance, you'll learn how to deal with unlabeled data and when to use different clustering and anomaly detection algorithms.

By the end of this machine learning book, you'll have learned how to take a data-driven approach to provide end-to-end machine learning solutions. You'll also have discovered how to formulate the problem at hand, prepare required data, and evaluate and deploy models in production.
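
To give a flavor of the workflow the book walks through, here is a minimal scikit-learn sketch: load a dataset, split it, train an estimator, and evaluate it. The dataset and estimator choices below (the Iris dataset and a decision tree) echo the early chapters but are illustrative assumptions, not the book's own example code.

    # Minimal sketch of a typical scikit-learn workflow (illustrative,
    # not the book's own code): load data, split, fit, predict, evaluate.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    x, y = load_iris(return_X_y=True)
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=42
    )

    clf = DecisionTreeClassifier(max_depth=3, random_state=42)
    clf.fit(x_train, y_train)
    print("Accuracy:", accuracy_score(y_test, clf.predict(x_test)))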

What you will learn

  • Understand when to use supervised, unsupervised, or reinforcement learning algorithms
  • Find out how to collect and prepare your data for machine learning tasks
  • Tackle imbalanced data and optimize your algorithm for the bias-variance tradeoff
  • Apply supervised and unsupervised algorithms to overcome various machine learning challenges
  • Employ best practices for tuning your algorithm's hyperparameters (a minimal tuning sketch follows this list)
  • Discover how to use neural networks for classification and regression
  • Build, evaluate, and deploy your machine learning solutions to production
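
As a taste of the hyperparameter tuning mentioned above, here is a minimal sketch using scikit-learn's GridSearchCV; the parameter grid values are illustrative assumptions, not recommendations from the book.

    # Minimal sketch of hyperparameter tuning with GridSearchCV; the grid
    # values below are illustrative assumptions, not recommendations.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    x, y = load_iris(return_X_y=True)
    param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5, 10]}
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=42), param_grid, cv=5
    )
    search.fit(x, y)
    print(search.best_params_, search.best_score_)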

Who this book is for

This book is for data scientists, machine learning practitioners, and anyone who wants to learn how machine learning algorithms work and to build different machine learning models using the Python ecosystem. The book will help you take your knowledge of machine learning to the next level by grasping its ins and outs and tailoring it to your needs. Working knowledge of Python and a basic understanding of the underlying mathematical and statistical concepts are required.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  6. Section 1: Supervised Learning
  7. Introduction to Machine Learning
    1. Understanding machine learning
    2. Types of machine learning algorithms
    3. Supervised learning
    4. Classification versus regression
    5. Supervised learning evaluation
    6. Unsupervised learning
    7. Reinforcement learning
    8. The model development life cycle
    9. Understanding a problem
    10. Splitting our data
    11. Finding the best manner to split the data
    12. Making sure the training and the test datasets are separate
    13. Development set
    14. Evaluating our model
    15. Deploying in production and monitoring
    16. Iterating
    17. When to use machine learning
    18. Introduction to scikit-learn
    19. It plays well with the Python data ecosystem
    20. Practical level of abstraction
    21. When not to use scikit-learn
    22. Installing the packages you need
    23. Introduction to pandas
    24. Python's scientific computing ecosystem conventions
    25. Summary
    26. Further reading
  8. Making Decisions with Trees
    1. Understanding decision trees
    2. What are decision trees?
    3. Iris classification
    4. Loading the Iris dataset
    5. Splitting the data
    6. Training the model and using it for prediction
    7. Evaluating our predictions
    8. Which features were more important?
    9. Displaying the internal tree decisions 
    10. How do decision trees learn? 
    11. Splitting criteria
    12. Preventing overfitting
    13. Predictions
    14. Getting a more reliable score
    15. What to do now to get a more reliable score
    16. ShuffleSplit
    17. Tuning the hyperparameters for higher accuracy
    18. Splitting the data
    19. Trying different hyperparameter values
    20. Comparing the accuracy scores
    21. Visualizing the tree's decision boundaries
    22. Feature engineering
    23. Building decision tree regressors
    24. Predicting people's heights
    25. Regressor's evaluation  
    26. Setting sample weights
    27. Summary
  9. Making Decisions with Linear Equations
    1. Understanding linear models
    2. Linear equations
    3. Linear regression
    4. Estimating the amount paid to the taxi driver
    5. Predicting house prices in Boston
    6. Data exploration
    7. Splitting the data
    8. Calculating a baseline 
    9. Training the linear regressor
    10. Evaluating our model's accuracy
    11. Showing feature coefficients 
    12. Scaling for more meaningful coefficients
    13. Adding polynomial features
    14. Fitting the linear regressor with the derived features
    15. Regularizing the regressor
    16. Training the lasso regressor
    17. Finding the optimum regularization parameter
    18. Finding regression intervals
    19. Getting to know additional linear regressors
    20. Using logistic regression for classification
    21. Understanding the logistic function
    22. Plugging the logistic function into a linear model
    23. Objective function
    24. Regularization
    25. Solvers
    26. Configuring the logistic regression classifier
    27. Classifying the Iris dataset using logistic regression
    28. Understanding the classifier's decision boundaries
    29. Getting to know additional linear classifiers
    30. Summary
  10. Preparing Your Data
    1. Imputing missing values
    2. Setting missing values to 0
    3. Setting missing values to the mean
    4. Using informed estimations for missing values
    5. Encoding non-numerical columns
    6. One-hot encoding
    7. Ordinal encoding
    8. Target encoding
    9. Homogenizing the columns' scale
    10. The standard scaler
    11. The MinMax scaler
    12. RobustScaler
    13. Selecting the most useful features
    14. VarianceThreshold
    15. Filters
    16. f-regression and f-classif
    17. Mutual information
    18. Comparing and using the different filters
    19. Evaluating multiple features at a time
    20. Summary
  11. Image Processing with Nearest Neighbors
    1. Nearest neighbors
    2. Loading and displaying images
    3. Image classification
    4. Using a confusion matrix to understand the model's mistakes
    5. Picking a suitable metric
    6. Setting the correct K
    7. Hyperparameter tuning using GridSearchCV
    8. Using custom distances
    9. Using nearest neighbors for regression
    10. More neighborhood algorithms 
    11. Radius neighbors 
    12. Nearest centroid classifier
    13. Reducing the dimensions of our image data
    14. Principal component analysis
    15. Neighborhood component analysis
    16. Comparing PCA to NCA
    17. Picking the most informative components 
    18. Using the centroid classifier with PCA 
    19. Restoring the original image from its components 
    20. Finding the most informative pixels 
    21. Summary
  12. Classifying Text Using Naive Bayes
    1. Splitting sentences into tokens
    2. Tokenizing with string split
    3. Tokenizing using regular expressions
    4. Using placeholders before tokenizing
    5. Vectorizing text into matrices
    6. Vector space model
    7. Bag of words
    8. Different sentences, same representation
    9. N-grams
    10. Using characters instead of words
    11. Capturing important words with TF-IDF
    12. Representing meanings with word embedding
    13. Word2Vec
    14. Understanding Naive Bayes
    15. The Bayes rule 
    16. Calculating the likelihood naively 
    17. Naive Bayes implementations
    18. Additive smoothing
    19. Classifying text using a Naive Bayes classifier
    20. Downloading the data
    21. Preparing the data
    22. Precision, recall, and F1 score
    23. Pipelines
    24. Optimizing for different scores
    25. Creating a custom transformer
    26. Summary
  13. Section 2: Advanced Supervised Learning
  14. Neural Networks – Here Comes Deep Learning
    1. Getting to know MLP
    2. Understanding the algorithm's architecture 
    3. Training the neural network
    4. Configuring the solvers 
    5. Classifying items of clothing 
    6. Downloading the Fashion-MNIST dataset
    7. Preparing the data for classification
    8. Experiencing the effects of the hyperparameters 
    9. Learning not too quickly and not too slowly
    10. Picking a suitable batch size
    11. Checking whether more training samples are needed
    12. Checking whether more epochs are needed
    13. Choosing the optimum architecture and hyperparameters 
    14. Adding your own activation function
    15. Untangling the convolutions
    16. Extracting features by convolving
    17. Reducing the dimensionality of the data via max pooling
    18. Putting it all together
    19. MLP regressors
    20. Summary
  15. Ensembles – When One Model Is Not Enough
    1. Answering the question why ensembles? 
    2. Combining multiple estimators via averaging
    3. Boosting multiple biased estimators 
    4. Downloading the UCI Automobile dataset
    5. Dealing with missing values
    6. Differentiating between numerical features and categorical ones
    7. Splitting the data into training and test sets
    8. Imputing the missing values and encoding the categorical features
    9. Using random forest for regression
    10. Checking the effect of the number of trees
    11. Understanding the effect of each training feature
    12. Using random forest for classification
    13. The ROC curve
    14. Using bagging regressors
    15. Preparing a mixture of numerical and categorical features
    16. Combining KNN estimators using a bagging meta-estimator
    17. Using gradient boosting to predict automobile prices
    18. Plotting the learning deviance
    19. Comparing the learning rate settings
    20. Using different sample sizes
    21. Stopping earlier and adapting the learning rate
    22. Regression ranges 
    23. Using AdaBoost ensembles 
    24. Exploring more ensembles
    25. Voting ensembles 
    26. Stacking ensembles 
    27. Random tree embedding
    28. Summary
  16. The Y is as Important as the X
    1. Scaling your regression targets
    2. Estimating multiple regression targets 
    3. Building a multi-output regressor 
    4. Chaining multiple regressors 
    5. Dealing with compound classification targets
    6. Converting a multi-class problem into a set of binary classifiers
    7. Estimating multiple classification targets 
    8. Calibrating a classifier's probabilities 
    9. Calculating the precision at k
    10. Summary
  17. Imbalanced Learning – Not Even 1% Win the Lottery
    1. Getting the click prediction dataset 
    2. Installing the imbalanced-learn library
    3. Predicting the CTR
    4. Weighting the training samples differently
    5. The effect of the weighting on the ROC
    6. Sampling the training data
    7. Undersampling the majority class
    8. Oversampling the minority class
    9. Combining data sampling with ensembles 
    10. Equal opportunity score
    11. Summary
  18. Section 3: Unsupervised Learning and More
  19. Clustering – Making Sense of Unlabeled Data
    1. Understanding clustering
    2. K-means clustering
    3. Creating a blob-shaped dataset
    4. Visualizing our sample data
    5. Clustering with K-means
    6. The silhouette score
    7. Choosing the initial centroids
    8. Agglomerative clustering
    9. Tracing the agglomerative clustering's children
    10. The adjusted Rand index
    11. Choosing the cluster linkage 
    12. DBSCAN
    13. Summary
  20. Anomaly Detection – Finding Outliers in Data
    1. Unlabeled anomaly detection
    2. Generating sample data
    3. Detecting anomalies using basic statistics
    4. Using percentiles for multi-dimensional data
    5. Detecting outliers using EllipticEnvelope
    6. Outlier and novelty detection using LOF
    7. Novelty detection using LOF
    8. Detecting outliers using isolation forest
    9. Summary
  21. Recommender System – Getting to Know Their Taste
    1. The different recommendation paradigms
    2. Downloading surprise and the dataset 
    3. Downloading the KDD Cup 2012 dataset
    4. Processing and splitting the dataset
    5. Creating a random recommender
    6. Using KNN-inspired algorithms
    7. Using baseline algorithms
    8. Using singular value decomposition
    9. Extracting latent information via SVD  
    10. Comparing the similarity measures for the two matrices
    11. Click prediction using SVD
    12. Deploying machine learning models in production
    13. Summary
  22. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think