Master an array of machine learning techniques with real-world projects that interface R with TensorFlow, H2O, MXNet, and other frameworks

Key Features

  • Gain expertise in machine learning, deep learning, and predictive modeling techniques
  • Build intelligent end-to-end projects for finance, social media, and a variety of other domains
  • Implement multi-class classification, regression, and clustering in your models

Book Description

R is one of the most popular languages for exploring the mathematical side of machine learning and for performing computational statistics.

This Learning Path shows you how to leverage the R ecosystem to build efficient machine learning applications that carry out intelligent tasks within your organization. You’ll work through realistic projects, such as building powerful ensemble models to predict employee attrition. Next, you’ll explore different clustering techniques to segment customers using wholesale data, and you’ll even apply TensorFlow and Keras-R to perform advanced computations. Each chapter will help you implement advanced machine learning algorithms using real-world examples. You’ll also be introduced to reinforcement learning, along with its use cases and models. Finally, this Learning Path will give you a glimpse into how some of these black-box models can be diagnosed and understood.

By the end of this Learning Path, you’ll be equipped with the skills you need to deploy machine learning techniques in your own projects.

What you will learn

  • Develop a joke recommendation engine to show jokes that match users’ tastes
  • Build autoencoders for credit card fraud detection
  • Work with image recognition and convolutional neural networks
  • Make predictions for casino slot machines using reinforcement learning
  • Implement natural language processing (NLP) techniques for sentiment analysis and customer segmentation
  • Produce simple and effective data visualizations for improved insights
  • Use NLP to extract insights from text
  • Implement tree-based classifiers, including random forests and boosted trees

Who this book is for

If you're a data analyst, data scientist, or machine learning developer who wants to master machine learning techniques using R, this is an ideal Learning Path for you. Each project will help you test your skills in implementing machine learning algorithms and techniques. A basic understanding of machine learning and a working knowledge of R programming are necessary to get the most out of this Learning Path.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Advanced Machine Learning with R
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the authors
    2. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Conventions used
    4. Get in touch
      1. Reviews
  6. Preparing and Understanding Data
    1. Overview
    2. Reading the data
    3. Handling duplicate observations
      1. Descriptive statistics
      2. Exploring categorical variables
    4. Handling missing values
    5. Zero and near-zero variance features
    6. Treating the data
      1. Correlation and linearity
    7. Summary
  7. Linear Regression
    1. Univariate linear regression
      1. Building a univariate model
      2. Reviewing model assumptions
    2. Multivariate linear regression
      1. Loading and preparing the data
      2. Modeling and evaluation – stepwise regression
      3. Modeling and evaluation – MARS
      4. Reverse transformation of natural log predictions
    3. Summary
  8. Logistic Regression
    1. Classification methods and linear regression
    2. Logistic regression
    3. Model training and evaluation
      1. Training a logistic regression algorithm
        1. Weight of evidence and information value
        2. Feature selection
        3. Cross-validation and logistic regression
      2. Multivariate adaptive regression splines
      3. Model comparison
    4. Summary
  9. Advanced Feature Selection in Linear Models
    1. Regularization overview
      1. Ridge regression
      2. LASSO
      3. Elastic net
    2. Data creation
    3. Modeling and evaluation
      1. Ridge regression
      2. LASSO
      3. Elastic net
    4. Summary
  10. K-Nearest Neighbors and Support Vector Machines
    1. K-nearest neighbors
    2. Support vector machines
    3. Manipulating data
      1. Dataset creation
      2. Data preparation
    4. Modeling and evaluation
      1. KNN modeling
      2. Support vector machine
    5. Summary
  11. Tree-Based Classification
    1. An overview of the techniques
      1. Understanding a regression tree
      2. Classification trees
      3. Random forest
      4. Gradient boosting
    2. Datasets and modeling
      1. Classification tree
      2. Random forest
        1. Extreme gradient boosting – classification
      3. Feature selection with random forests
    3. Summary
  12. Neural Networks and Deep Learning
    1. Introduction to neural networks
    2. Deep learning – a not-so-deep overview
      1. Deep learning resources and advanced methods
    3. Creating a simple neural network
      1. Data understanding and preparation
      2. Modeling and evaluation
    4. An example of deep learning
      1. Keras and TensorFlow background
      2. Loading the data
      3. Creating the model function
      4. Model training
    5. Summary
  13. Creating Ensembles and Multiclass Methods
    1. Ensembles
    2. Data understanding
    3. Modeling and evaluation
      1. Random forest model
      2. Creating an ensemble
    4. Summary
  14. Cluster Analysis
    1. Hierarchical clustering
      1. Distance calculations
    2. K-means clustering
    3. Gower and PAM
      1. Gower
      2. PAM
    4. Random forest
    5. Dataset background
    6. Data understanding and preparation
    7. Modeling 
      1. Hierarchical clustering
      2. K-means clustering
      3. Gower and PAM
      4. Random forest and PAM
    8. Summary
  15. Principal Component Analysis
    1. An overview of the principal components
      1. Rotation
    2. Data
      1. Data loading and review
      2. Training and testing datasets
    3. PCA modeling
      1. Component extraction
      2. Orthogonal rotation and interpretation
      3. Creating scores from the components
      4. Regression with MARS
      5. Test data evaluation
    4. Summary
  16. Association Analysis
    1. An overview of association analysis
      1. Creating transactional data
    2. Data understanding
    3. Data preparation
    4. Modeling and evaluation
    5. Summary
  17. Time Series and Causality
    1. Univariate time series analysis
      1. Understanding Granger causality
    2. Time series data
      1. Data exploration
    3. Modeling and evaluation
      1. Univariate time series forecasting
      2. Examining the causality
        1. Linear regression
        2. Vector autoregression
    4. Summary
  18. Text Mining
    1. Text mining framework and methods
      1. Topic models
      2. Other quantitative analysis
    2. Data overview
      1. Data frame creation
    3. Word frequency
      1. Word frequency in all addresses
      2. Lincoln's word frequency
    4. Sentiment analysis
    5. N-grams
    6. Topic models
    7. Classifying text
      1. Data preparation
      2. LASSO model
    8. Additional quantitative analysis
    9. Summary
  19. Exploring the Machine Learning Landscape
    1. ML versus software engineering
    2. Types of ML methods
      1. Supervised learning
      2. Unsupervised learning
      3. Semi-supervised learning
      4. Reinforcement learning
      5. Transfer learning
    3. ML terminology – a quick review
      1. Deep learning
      2. Big data
      3. Natural language processing
      4. Computer vision
      5. Cost function
      6. Model accuracy
      7. Confusion matrix
      8. Predictor variables
      9. Response variable
      10. Dimensionality reduction
      11. Class imbalance problem
      12. Model bias and variance
      13. Underfitting and overfitting
      14. Data preprocessing
      15. Holdout sample
      16. Hyperparameter tuning
      17. Performance metrics
      18. Feature engineering
      19. Model interpretability
    4. ML project pipeline
      1. Business understanding
      2. Understanding and sourcing the data
      3. Preparing the data 
      4. Model building and evaluation
      5. Model deployment
    5. Learning paradigm
    6. Datasets
    7. Summary
  20. Predicting Employee Attrition Using Ensemble Models
    1. Philosophy behind ensembling 
    2. Getting started
    3. Understanding the attrition problem and the dataset 
    4. K-nearest neighbors model for benchmarking the performance
    5. Bagging
      1. Bagged classification and regression trees (treeBag) implementation
      2. Support vector machine bagging (SVMBag) implementation
      3. Naive Bayes (nbBag) bagging implementation
    6. Randomization with random forests
      1. Implementing an attrition prediction model with random forests
    7. Boosting 
      1. The GBM implementation
      2. Building attrition prediction model with XGBoost
    8. Stacking 
      1. Building attrition prediction model with stacking
    9. Summary
  21. Implementing a Jokes Recommendation Engine
    1. Fundamental aspects of recommendation engines
      1. Recommendation engine categories
        1. Content-based filtering
        2. Collaborative filtering
        3. Hybrid filtering
    2. Getting started
    3. Understanding the Jokes recommendation problem and the dataset
      1. Converting the DataFrame
      2. Dividing the DataFrame
    4. Building a recommendation system with an item-based collaborative filtering technique
    5. Building a recommendation system with a user-based collaborative filtering technique
    6. Building a recommendation system based on an association-rule mining technique
      1. The Apriori algorithm
    7. Content-based recommendation engine
      1. Differentiating between ITCF and content-based recommendations
    8. Building a hybrid recommendation system for Jokes recommendations
    9. Summary
    10. References
  22. Sentiment Analysis of Amazon Reviews with NLP
    1. The sentiment analysis problem
    2. Getting started
    3. Understanding the Amazon reviews dataset
    4. Building a text sentiment classifier with the BoW approach
      1. Pros and cons of the BoW approach
    5. Understanding word embedding
    6. Building a text sentiment classifier with pretrained word2vec word embedding based on Reuters news corpus
    7. Building a text sentiment classifier with GloVe word embedding
    8. Building a text sentiment classifier with fastText
    9. Summary
  23. Customer Segmentation Using Wholesale Data
    1. Understanding customer segmentation
    2. Understanding the wholesale customer dataset and the segmentation problem
      1. Categories of clustering algorithms
    3. Identifying the customer segments in wholesale customer data using k-means clustering
      1. Working mechanics of the k-means algorithm
    4. Identifying the customer segments in the wholesale customer data using DIANA
    5. Identifying the customer segments in the wholesale customers data using AGNES
    6. Summary
  24. Image Recognition Using Deep Neural Networks
    1. Technical requirements
    2. Understanding computer vision
    3. Achieving computer vision with deep learning
      1. Convolutional Neural Networks
        1. Layers of CNNs
    4. Introduction to the MXNet framework
    5. Understanding the MNIST dataset
    6. Implementing a deep learning network for handwritten digit recognition
      1. Implementing dropout to avoid overfitting
      2. Implementing the LeNet architecture with the MXNet library
    7. Implementing computer vision with pretrained models
    8. Summary
  25. Credit Card Fraud Detection Using Autoencoders
    1. Machine learning in credit card fraud detection
    2. Autoencoders explained
      1. Types of AEs based on hidden layers
      2. Types of AEs based on restrictions
      3. Applications of AEs
    3. The credit card fraud dataset
    4. Building AEs with the H2O library in R
      1. Autoencoder code implementation for credit card fraud detection
    5. Summary
  26. Automatic Prose Generation with Recurrent Neural Networks
    1. Understanding language models
    2. Exploring recurrent neural networks
      1. Comparison of feedforward neural networks and RNNs
    3. Backpropagation through time
    4. Problems and solutions to gradients in RNN
      1. Exploding gradients
      2. Vanishing gradients
    5. Building an automated prose generator with an RNN
      1. Implementing the project
    6. Summary
  27. Winning the Casino Slot Machines with Reinforcement Learning
    1. Understanding RL
      1. Comparison of RL with other ML algorithms
      2. Terminology of RL
      3. The multi-arm bandit problem
      4. Strategies for solving MABP
        1. The epsilon-greedy algorithm
        2. Boltzmann or softmax exploration
        3. Decayed epsilon greedy
        4. The upper confidence bound algorithm
        5. Thompson sampling
    2. Multi-arm bandit – real-world use cases
    3. Solving the MABP with UCB and Thompson sampling algorithms
    4. Summary
  28. Creating a Package
    1. Creating a new package
    2. Summary
  29. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think