
Get hands-on experience in creating state-of-the-art reinforcement learning agents using TensorFlow and RLlib to solve complex real-world business and industry problems with the help of expert tips and best practices

Key Features

  • Understand how large-scale state-of-the-art RL algorithms and approaches work
  • Apply RL to solve complex problems in marketing, robotics, supply chain, finance, cybersecurity, and more
  • Explore tips and best practices from experts that will enable you to overcome real-world RL challenges

Book Description

Reinforcement learning (RL) is a field of artificial intelligence (AI) used for creating self-learning autonomous agents. Building on a strong theoretical foundation, this book takes a practical approach and uses examples inspired by real-world industry problems to teach you about state-of-the-art RL.

Starting with bandit problems, Markov decision processes, and dynamic programming, the book provides an in-depth review of the classical RL techniques, such as Monte Carlo methods and temporal-difference learning. After that, you will learn about deep Q-learning, policy gradient algorithms, actor-critic methods, model-based methods, and multi-agent reinforcement learning. Then, you'll be introduced to some of the key approaches behind the most successful RL implementations, such as domain randomization and curiosity-driven learning.

As you advance, you'll explore many novel algorithms with advanced implementations using modern Python libraries such as TensorFlow and Ray's RLlib package. You'll also find out how to implement RL in areas such as robotics, supply chain management, marketing, finance, smart cities, and cybersecurity while assessing the trade-offs between different approaches and avoiding common pitfalls.

By the end of this book, you'll have mastered how to train and deploy your own RL agents for solving real-world problems.

What you will learn

  • Model and solve complex sequential decision-making problems using RL
  • Develop a solid understanding of how state-of-the-art RL methods work
  • Use Python and TensorFlow to code RL algorithms from scratch
  • Parallelize and scale up your RL implementations using Ray's RLlib package
  • Get in-depth knowledge of a wide variety of RL topics
  • Understand the trade-offs between different RL approaches
  • Discover and address the challenges of implementing RL in the real world

Who this book is for

This book is for expert machine learning practitioners and researchers looking to focus on hands-on reinforcement learning with Python by implementing advanced deep reinforcement learning concepts in real-world projects. Reinforcement learning experts who want to advance their knowledge to tackle large-scale and complex sequential decision-making problems will also find this book useful. Working knowledge of Python programming and deep learning along with prior experience in reinforcement learning is required.

Table of Contents

  1. Mastering Reinforcement Learning with Python
  2. Why subscribe?
  3. Contributors
  4. About the author
  5. About the reviewers
  6. Packt is searching for authors like you
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  8. Section 1: Reinforcement Learning Foundations
  9. Chapter 1: Introduction to Reinforcement Learning
    1. Why reinforcement learning?
    2. The three paradigms of ML
    3. Supervised learning
    4. Unsupervised learning
    5. Reinforcement learning
    6. RL application areas and success stories
    7. Games
    8. Robotics and autonomous systems
    9. Supply chain
    10. Manufacturing
    11. Personalization and recommender systems
    12. Smart cities
    13. Elements of an RL problem
    14. RL concepts
    15. Casting Tic-Tac-Toe as an RL problem
    16. Setting up your RL environment
    17. Hardware requirements
    18. Operating system
    19. Software toolbox
    20. Summary
    21. References
  10. Chapter 2: Multi-Armed Bandits
    1. Exploration-exploitation trade-off
    2. What is a MAB?
    3. Problem definition
    4. Experimenting with a simple MAB problem
    5. Case study: Online advertising
    6. A/B/n testing
    7. Notation
    8. Application to the online advertising scenario
    9. Advantages and disadvantages of A/B/n testing
    10. ε-greedy actions
    11. Application to the online advertising scenario
    12. Advantages and disadvantages of ε-greedy actions
    13. Action selection using upper confidence bounds
    14. Application to the online advertising scenario
    15. Advantages and disadvantages of using UCBs
    16. Thompson (posterior) sampling
    17. Application to the online advertising scenario
    18. Advantages and disadvantages of Thompson sampling
    19. Summary
    20. References
  11. Chapter 3: Contextual Bandits
    1. Why we need function approximations
    2. Using function approximation for context
    3. Case study: Contextual online advertising with synthetic user data
    4. Function approximation with regularized logistic regression
    5. Objective: Regret minimization
    6. Solving the online advertising problem
    7. Using function approximation for action
    8. Case study: Contextual online advertising with user data from the U.S. Census
    9. Function approximation using a neural network
    10. Calculating the regret
    11. Solving the online advertising problem
    12. Other applications of multi-armed and contextual bandits
    13. Recommender systems
    14. Webpage/app feature design
    15. Healthcare
    16. Dynamic pricing
    17. Finance
    18. Control systems tuning
    19. Summary
    20. References
  12. Chapter 4: Makings of a Markov Decision Process
    1. Starting with Markov chains
    2. Stochastic processes with Markov property
    3. Classification of states in a Markov chain
    4. Example: n-step behavior in the grid world
    5. Example: Sample path in an ergodic Markov chain
    6. Semi-Markov processes and continuous-time Markov chains
    7. Introducing the reward: Markov reward process
    8. Attaching rewards to the grid world example
    9. Relations between average rewards with different initializations
    10. Return, discount and state values
    11. Analytically calculating the state values
    12. Estimating the state values iteratively
    13. Bringing the action in: Markov decision process
    14. Definition
    15. Grid world as a Markov decision process
    16. State-value function
    17. Action-value function
    18. Optimal state-value and action-value functions
    19. Bellman optimality
    20. Partially observable Markov decision process
    21. Summary
    22. Exercises
    23. References
  13. Chapter 5: Solving the Reinforcement Learning Problem
    1. Exploring dynamic programming
    2. Example use case: Inventory replenishment of a food truck
    3. Policy evaluation
    4. Policy iteration
    5. Value iteration
    6. Drawbacks of dynamic programming
    7. Training your agent with Monte Carlo methods
    8. Monte Carlo prediction
    9. Monte Carlo control
    10. Temporal-difference learning
    11. One-step TD learning: TD(0)
    12. n-step TD learning
    13. Understanding the importance of the simulation in reinforcement learning
    14. Summary
    15. References
  14. Section 2: Deep Reinforcement Learning
  15. Chapter 6: Deep Q-Learning at Scale
    1. From tabular Q-learning to deep Q-learning
    2. Neural Fitted Q-iteration
    3. Online Q-learning
    4. Deep Q-networks
    5. Key concepts in deep Q-networks
    6. The DQN algorithm
    7. Extensions to DQN: Rainbow
    8. The extensions
    9. The performance of the integrated agent
    10. How to choose which extensions to use: Ablations to Rainbow
    11. Overcoming the deadly triad
    12. Distributed deep Q-learning
    13. Components of a distributed deep Q-learning architecture
    14. Gorila: General reinforcement learning architecture
    15. Ape-X: Distributed prioritized experience replay
    16. Implementing scalable deep Q-learning algorithms using Ray
    17. A primer on Ray
    18. Ray implementation of a DQN variant
    19. RLlib: Production-grade deep reinforcement learning
    20. Summary
    21. References
  16. Chapter 7: Policy-Based Methods
    1. Need for policy-based methods
    2. A more principled approach
    3. Ability to use with continuous action spaces
    4. Ability to learn truly random stochastic policies
    5. Vanilla policy gradient
    6. Objective in the policy gradient methods
    7. Figuring out the gradient
    8. REINFORCE
    9. The problem with REINFORCE and all policy gradient methods
    10. Vanilla policy gradient using RLlib
    11. Actor-critic methods
    12. Further reducing the variance in policy-based methods
    13. Advantage Actor-Critic: A2C
    14. Asynchronous Advantage Actor-Critic: A3C
    15. Generalized Advantage Estimators
    16. Trust-region methods
    17. Policy gradient as policy iteration
    18. TRPO: Trust Region Policy Optimization
    19. PPO: Proximal Policy Optimization
    20. Revisiting off-policy methods
    21. DDPG: Deep Deterministic Policy Gradient
    22. TD3: Twin Delayed Deep Deterministic Policy Gradient
    23. SAC: Soft actor-critic
    24. IMPALA: Importance Weighted Actor-Learner Architecture
    25. Comparison of the policy-based methods in Lunar Lander
    26. How to pick the right algorithm?
    27. Open source implementations of policy-gradient methods
    28. Summary
    29. References
  17. Chapter 8: Model-Based Methods
    1. Introducing model-based methods
    2. Planning through a model
    3. Defining the optimal control problem
    4. Random shooting
    5. Cross-entropy method
    6. Covariance matrix adaptation evolution strategy
    7. Monte Carlo tree search
    8. Learning a world model
    9. Understanding what model means
    10. Identifying when to learn a model
    11. Introducing a general procedure to learn a model
    12. Understanding and mitigating the impact of model uncertainty
    13. Learning a model from complex observations
    14. Unifying model-based and model-free approaches
    15. Refresher on Q-learning
    16. Dyna-style acceleration of model-free methods using world models
    17. Summary
    18. References
  18. Chapter 9: Multi-Agent Reinforcement Learning
    1. Introducing multi-agent reinforcement learning
    2. Collaboration and competition between MARL agents
    3. Exploring the challenges in multi-agent reinforcement learning
    4. Non-stationarity
    5. Scalability
    6. Unclear reinforcement learning objective
    7. Information sharing
    8. Training policies in multi-agent settings
    9. RLlib multi-agent environment
    10. Competitive self-play
    11. Training tic-tac-toe agents through self-play
    12. Designing the multi-agent tic-tac-toe environment
    13. Configuring the trainer
    14. Observing the results
    15. Summary
    16. References
  19. Section 3: Advanced Topics in RL
  20. Chapter 10: Introducing Machine Teaching
    1. Introduction to machine teaching
    2. Understanding the need for machine teaching
    3. Exploring the elements of machine teaching
    4. Engineering the reward function
    5. When to engineer the reward function
    6. Reward shaping
    7. Example: Reward shaping for mountain car
    8. Challenges with engineering the reward function
    9. Curriculum learning
    10. Warm starts with demonstrations
    11. Action masking
    12. Summary
    13. References
  21. Chapter 11: Achieving Generalization and Overcoming Partial Observability
    1. Focusing on generalization in reinforcement learning
    2. Generalization and overfitting in supervised learning
    3. Generalization and overfitting in reinforcement learning
    4. Connection between generalization and partial observability
    5. Achieving generalization with domain randomization
    6. Overcoming partial observability with memory
    7. Recipe for generalization
    8. Enriching agent experience via domain randomization
    9. Dimensions of randomization
    10. Curriculum learning for generalization
    11. Using memory to overcome partial observability
    12. Stacking observations
    13. Using RNNs
    14. Transformer architecture
    15. Quantifying generalization via CoinRun
    16. CoinRun environment
    17. Installing the CoinRun environment
    18. The effect of regularization and network architecture on the generalization of RL policies
    19. Network randomization and feature matching
    20. Sunblaze environment
    21. Summary
    22. References
  22. Chapter 12: Meta-Reinforcement Learning
    1. Introducing meta-reinforcement learning
    2. Learning to learn
    3. Defining meta-reinforcement learning
    4. Relation to animal learning
    5. Relation to partial observability and domain randomization
    6. Meta-reinforcement learning with recurrent policies
    7. Grid world example
    8. RLlib implementation
    9. Gradient-based meta-reinforcement learning
    10. RLlib implementation
    11. Meta-reinforcement learning as partially observed reinforcement learning
    12. Challenges in meta-reinforcement learning
    13. Conclusion
    14. References
  23. Chapter 13: Exploring Advanced Topics
    1. Diving deeper into distributed reinforcement learning
    2. Scalable, efficient deep reinforcement learning: SEED RL
    3. Recurrent experience replay in distributed reinforcement learning
    4. Experimenting with SEED RL and R2D2
    5. Exploring curiosity-driven reinforcement learning
    6. Curiosity-driven learning for hard-exploration problems
    7. Challenges in curiosity-driven reinforcement learning
    8. Never Give Up
    9. Agent57 improvements
    10. Offline reinforcement learning
    11. An overview of how offline reinforcement learning works
    12. Why we need special algorithms for offline learning
    13. Why offline reinforcement learning is crucial
    14. Advantage weighted actor-critic
    15. Offline reinforcement learning benchmarks
    16. Summary
    17. References
  24. Section 4: Applications of RL
  25. Chapter 14: Solving Robot Learning
    1. Introducing PyBullet
    2. Setting up PyBullet
    3. Getting familiar with the Kuka environment
    4. Grasping a rectangle block using a Kuka robot
    5. Kuka Gym environment
    6. Developing strategies to solve the Kuka environment
    7. Parametrizing the difficulty of the problem
    8. Using curriculum learning to train the Kuka robot
    9. Customizing the environment for curriculum learning
    10. Designing the lessons in the curriculum
    11. Training the agent using a manually designed curriculum
    12. Curriculum learning using absolute learning progress
    13. Comparing the experiment results
    14. Going beyond PyBullet into autonomous driving
    15. Summary
    16. References
  26. Chapter 15: Supply Chain Management
    1. Optimizing inventory procurement decisions
    2. The need for inventory and the trade-off in its management
    3. Components of an inventory optimization problem
    4. Single-step inventory optimization: The newsvendor problem
    5. Simulating multi-step inventory dynamics
    6. Developing a near-optimal benchmark policy
    7. Reinforcement learning solution to inventory management
    8. Modeling routing problems
    9. Pick-up and delivery of online meal orders
    10. Pointer networks for dynamic combinatorial optimization
    11. Summary
    12. References
  27. Chapter 16: Personalization, Marketing, and Finance
    1. Going beyond bandits for personalization
    2. Shortcomings of bandit models
    3. Deep reinforcement learning for news recommendation
    4. Developing effective marketing strategies using reinforcement learning
    5. Personalized marketing content
    6. Marketing resource allocation for customer acquisition
    7. Reducing customer churn rate
    8. Winning back lost customers
    9. Applying reinforcement learning in finance
    10. Challenges with using reinforcement learning in finance
    11. Introducing TensorTrade
    12. Developing equity trading strategies
    13. Summary
    14. References
  28. Chapter 17: Smart City and Cybersecurity
    1. Controlling traffic lights to optimize vehicle flow
    2. Introducing Flow
    3. Creating an experiment in Flow
    4. Modeling the traffic light control problem
    5. Solving the traffic control problem using RLlib
    6. Further reading
    7. Providing ancillary services to the power grid
    8. Power grid operations and ancillary services
    9. Describing the environment and the decision-making problem
    10. Reinforcement learning model
    11. Detecting cyberattacks in a smart grid
    12. The problem of early detection of cyberattacks in a power grid
    13. Partial observability of the grid state
    14. Summary
    15. References
  29. Chapter 18: Challenges and Future Directions in Reinforcement Learning
    1. What you have achieved with this book
    2. Challenges and future directions
    3. Sample efficiency
    4. Need for high-fidelity and fast simulation models
    5. High-dimensional action spaces
    6. Reward function fidelity
    7. Safety, behavior guarantees, and explainability
    8. Reproducibility and sensitivity to hyper-parameter choices
    9. Robustness and adversarial agents
    10. Suggestions for aspiring reinforcement learning experts
    11. Go deeper into the theory
    12. Follow good practitioners and research labs
    13. Learn from papers and from their good explanations
    14. Stay up to date with trends in other fields of deep learning
    15. Read open source repositories
    16. Practice!
    17. Final words
    18. References
  30. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think