
Book Description

Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to apply the reinforcement process that allows a machine to learn by itself.

Author Dr. Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices. You’ll explore the current state of RL, focusing on industrial applications, and learn numerous algorithms, frameworks, and environments. This is no cookbook—it doesn’t shy away from math and expects familiarity with ML.

  • Learn what RL is and how the algorithms help solve problems
  • Become grounded in RL fundamentals including Markov decision processes, dynamic programming, and temporal difference learning
  • Dive deep into value methods and policy gradient methods
  • Apply advanced RL implementations such as meta learning, hierarchical learning, evolutionary algorithms, and imitation learning
  • Understand cutting-edge deep RL algorithms including Rainbow, PPO, TD3, SAC, and more
  • Get practical examples through the accompanying Git repository

Table of Contents

  1. Why Reinforcement Learning?
    1. Why Now?
    2. Machine Learning
    3. Reinforcement Learning
      1. When Should You Use RL?
      2. RL Applications
    4. Taxonomy of RL Approaches
      1. Model-Free or Model-Based
      2. Online or Offline Updates to the Strategy
      3. Discrete or Continuous Actions
      4. Optimization
    5. Fundamental Concepts in Reinforcement Learning
      1. The First RL Algorithm
      2. Value Estimation
      3. Prediction Error
      4. Weight Update Rule
      5. Prediction and Control
      6. Motivation
    6. Rewards and Feedback
      1. Dopamine as a Reward Signal
      2. System 1 or System 2?
    7. Reinforcement Learning as a Discipline
    8. Summary
    9. Further Reading
  2. Markov Decision Processes, Dynamic Programming, and Monte Carlo Methods
    1. A/B Testing
      1. Reward Engineering
      2. Value Function
      3. Choosing the Best Action
      4. Simulating the Environment
      5. Running the Experiment
      6. Improving the ϵ-greedy Algorithm
    2. Markov Decision Processes
      1. Inventory Control
      2. Inventory Control Simulation
    3. Policies and Value Functions
      1. Discounted Rewards
      2. Predicting Rewards with the State-Value Function
      3. Predicting Rewards with the Action-Value Function
      4. Optimal Policies
    4. Monte Carlo Policy Generation
    5. Value Iteration with Dynamic Programming
      1. Implementing Value Iteration
      2. Results of Value Iteration
    6. Summary
    7. Further Reading
  3. Temporal-Difference Learning, Q-Learning, and n-Step Algorithms
    1. Formulation of TD Learning
    2. Q-Learning
    3. SARSA
    4. Q-Learning vs. SARSA
    5. Industrial Example: Real-Time Bidding in Advertising
      1. Defining the MDP
      2. Results of the Real-Time Bidding Environments
      3. Further Improvements
    6. Extensions to Q-Learning
      1. Multiple Learners
      2. Delaying Updates
      3. Comparing Standard, Double, and Delayed Q-Learning
      4. Opposition Learning
    7. n-Step Algorithms
      1. n-Step Implementation
    8. Eligibility Traces
    9. Extensions to Eligibility Traces
      1. Watkin’s Q(λ)
      2. Fuzzy Wipes in Watkin’s Q(λ)
      3. Speedy Q-Learning
      4. Accumulating vs. Replacing Eligibility Traces
    10. Conclusions
    11. Further Reading