Title Page
Copyright and Credits
  Hands-On Reinforcement Learning with Python
Dedication
Packt Upsell
  Why subscribe?
  PacktPub.com
Contributors
  About the author
  About the reviewers
  Packt is searching for authors like you
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
    Download the example code files
    Download the color images
    Conventions used
  Get in touch
    Reviews

Introduction to Reinforcement Learning
  What is RL?
  RL algorithm
  How RL differs from other ML paradigms
  Elements of RL
    Agent
    Policy function
    Value function
    Model
  Agent environment interface
  Types of RL environment
    Deterministic environment
    Stochastic environment
    Fully observable environment
    Partially observable environment
    Discrete environment
    Continuous environment
    Episodic and non-episodic environment
    Single and multi-agent environment
  RL platforms
    OpenAI Gym and Universe
    DeepMind Lab
    RL-Glue
    Project Malmo
    ViZDoom
  Applications of RL
    Education
    Medicine and healthcare
    Manufacturing
    Inventory management
    Finance
    Natural Language Processing and Computer Vision
  Summary
  Questions
  Further reading

Getting Started with OpenAI and TensorFlow
  Setting up your machine
    Installing Anaconda
    Installing Docker
    Installing OpenAI Gym and Universe
    Common error fixes
  OpenAI Gym
    Basic simulations
    Training a robot to walk
  OpenAI Universe
    Building a video game bot
  TensorFlow
    Variables, constants, and placeholders
    Variables
    Constants
    Placeholders
    Computation graph
    Sessions
    TensorBoard
    Adding scope
  Summary
  Questions
  Further reading

The Markov Decision Process and Dynamic Programming
  The Markov chain and Markov process
  Markov Decision Process
    Rewards and returns
    Episodic and continuous tasks
    Discount factor
    The policy function
    State value function
    State-action value function (Q function)
  The Bellman equation and optimality
    Deriving the Bellman equation for value and Q functions
  Solving the Bellman equation
    Dynamic programming
      Value iteration
      Policy iteration
  Solving the frozen lake problem
    Value iteration
    Policy iteration
  Summary
  Questions
  Further reading

Gaming with Monte Carlo Methods
  Monte Carlo methods
    Estimating the value of pi using Monte Carlo
  Monte Carlo prediction
    First visit Monte Carlo
    Every visit Monte Carlo
    Let's play Blackjack with Monte Carlo
  Monte Carlo control
    Monte Carlo exploration starts
    On-policy Monte Carlo control
    Off-policy Monte Carlo control
  Summary
  Questions
  Further reading

Temporal Difference Learning
  TD learning
  TD prediction
  TD control
    Q learning
      Solving the taxi problem using Q learning
    SARSA
      Solving the taxi problem using SARSA
  The difference between Q learning and SARSA
  Summary
  Questions
  Further reading

Multi-Armed Bandit Problem
  The MAB problem
    The epsilon-greedy policy
    The softmax exploration algorithm
    The upper confidence bound algorithm
    The Thompson sampling algorithm
  Applications of MAB
  Identifying the right advertisement banner using MAB
  Contextual bandits
  Summary
  Questions
  Further reading

Deep Learning Fundamentals
  Artificial neurons
  ANNs
    Input layer
    Hidden layer
    Output layer
    Activation functions
  Deep diving into ANN
    Gradient descent
  Neural networks in TensorFlow
  RNN
    Backpropagation through time
  Long Short-Term Memory RNN
    Generating song lyrics using LSTM RNN
  Convolutional neural networks
    Convolutional layer
    Pooling layer
    Fully connected layer
    CNN architecture
  Classifying fashion products using CNN
  Summary
  Questions
  Further reading

Atari Games with Deep Q Network
  What is a Deep Q Network?
  Architecture of DQN
    Convolutional network
    Experience replay
    Target network
    Clipping rewards
    Understanding the algorithm
  Building an agent to play Atari games
  Double DQN
  Prioritized experience replay
  Dueling network architecture
  Summary
  Questions
  Further reading

Playing Doom with a Deep Recurrent Q Network
  DRQN
    Architecture of DRQN
  Training an agent to play Doom
    Basic Doom game
    Doom with DRQN
  DARQN
    Architecture of DARQN
  Summary
  Questions
  Further reading

The Asynchronous Advantage Actor Critic Network
  The Asynchronous Advantage Actor Critic
    The three As
    The architecture of A3C
    How A3C works
  Driving up a mountain with A3C
    Visualization in TensorBoard
  Summary
  Questions
  Further reading

Policy Gradients and Optimization
  Policy gradient
    Lunar Lander using policy gradients
  Deep deterministic policy gradient
    Swinging a pendulum
  Trust Region Policy Optimization
  Proximal Policy Optimization
  Summary
  Questions
  Further reading

Capstone Project – Car Racing Using DQN
  Environment wrapper functions
  Dueling network
  Replay memory
  Training the network
  Car racing
  Summary
  Questions
  Further reading

Recent Advancements and Next Steps
  Imagination augmented agents
  Learning from human preference
  Deep Q learning from demonstrations
  Hindsight experience replay
  Hierarchical reinforcement learning
    MAXQ Value Function Decomposition
  Inverse reinforcement learning
  Summary
  Questions
  Further reading

Assessments
  Chapter 1
  Chapter 2
  Chapter 3
  Chapter 4
  Chapter 5
  Chapter 6
  Chapter 7
  Chapter 8
  Chapter 9
  Chapter 10
  Chapter 11
  Chapter 12
  Chapter 13

Other Books You May Enjoy
  Leave a review - let other readers know what you think