Abhilash Majumder

Deep Reinforcement Learning in Unity

With Unity ML Toolkit

1st ed.

Abhilash Majumder

Pune, Maharashtra, India

ISBN 978-1-4842-6502-4

e-ISBN 978-1-4842-6503-1

https://doi.org/10.1007/978-1-4842-6503-1

Standard Apress

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Distributed to the book trade worldwide by Springer Science+Business Media LLC, 1 New York Plaza, Suite 4600, New York, NY 10004. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

This book is dedicated to my parents, Abhijit and Sharbari Majumder, and my late grandfather, Shri Paresh Chandra Majumder.

Introduction

Machine learning (ML) has shaped the scope of technology since its inception and has played an important role in the development of autonomous vehicles and robotics. Deep reinforcement learning is the field of learning in which agents learn with the help of rewards, an idea derived from nature. Through this book, the author tries to present the diversity of reinforcement learning algorithms in game development as well as in scientific research. Unity, the cross-platform engine used for tasks ranging from visual effects and cinematography to machine learning and high-performance graphics, is the primary tool used in this book. With the power of the Unity ML Agents Toolkit, the deep reinforcement learning framework built by Unity, the author tries to show the vast possibilities of this learning paradigm.

The book starts with an introduction to state-based reinforcement learning, from Markov processes to Bellman equations and Q-learning, which sets the ground for the successive sections. A range of pathfinding algorithms, from Dijkstra to sophisticated variants of A*, is provided along with simulations in Unity. The book also covers how navigation meshes work for automated pathfinding in Unity. An introduction to the ML Agents Toolkit, from the standard installation process to training an AI agent with a deep reinforcement learning algorithm (proximal policy optimization [PPO]), is provided as a starter. Throughout the book, the TensorFlow framework is used extensively along with OpenAI Gym environments to visualize complex deep reinforcement learning algorithms in simulations, robotics, and autonomous agents. Successive sections of the book involve an in-depth study of a variety of on- and off-policy algorithms, ranging from discrete SARSA/Q-learning to actor critic variants, deep Q-network variants, and PPO, and their implementations on Gym using Keras with TensorFlow. These sections are instrumental in understanding how different simulations such as the famous Puppo (Unity Berlin), Tiny agents, and other ML Agents samples from Unity are created and built. Further sections describe in detail how to build simulations in Unity using the C# software development kit for ML Agents and how to train them using soft actor critic (SAC), PPO, or imitation learning algorithms such as behavioral cloning and GAIL.

The latter part of this book provides an insight into curriculum learning and adversarial networks, with an analysis of how AI agents are trained in games such as FIFA. Across these sections, the variants of neural networks (MLPs, convolutional networks, and recurrent networks, including long short-term memory and GRU) are described in detail along with their implementations and performance. This is especially helpful because these networks are used extensively when building the deep learning algorithms. The importance of convolutional networks for image sampling in Atari-based 2D games such as Pong is also discussed. Computer vision and deep reinforcement learning are combined to build autonomous vehicles and driverless cars, and an example of this is provided as a template (game) for readers to build upon.

Finally, this book contains an in-depth review of the Obstacle Tower Challenge, which was organized by Unity Technologies to challenge state-of-the-art deep reinforcement learning algorithms. Sections on certain evolutionary algorithms and on the Google Dopamine framework are provided to help readers understand the vast field of reinforcement learning. Through this book, the author hopes to infuse enthusiasm and foster research among the readers in the field of deep reinforcement learning.

Acknowledgments

The amount of dedication and support that I have received in the making of this book has left me amazed. First, I would like to thank my family, Mr. Abhijit Majumder and Mrs. Sharbari Majumder, who have been instrumental in supporting me all the way. I would also like to extend my heartfelt thanks to the entire Apress Team, without whom this would not have been possible. Special thanks to Mrs. Spandana Chatterjee, the Acquisition Editor, Mr. Shrikant Vishwakarma, the Coordinating Editor, and Laura Berendson, the Development Editor, for their constant support and thorough reviews. Ansh Shah, the Technical Reviewer of this book, has also played an important role and I extend my thanks to him.

I would also like to use this space to thank my mentor, Carl Domingo from Unity Technologies, who has guided me from the beginning of my journey with Unity. The Unity Machine Learning team deserves mention, as this book would not have been possible without their constant efforts to make the ML Agents platform amazing. I especially thank Dr. Danny Lange, whose sessions on machine learning have been instrumental in my understanding of the framework and the concepts.

I am grateful to everyone who helped in the entire process of making this book, which I hope will help readers understand the beauty of deep reinforcement learning.

Table of Contents

Chapter 1: Introduction to Reinforcement Learning

OpenAI Gym Environment: CartPole

Installation and Setup of Python for ML Agents and Deep Learning

Playing with the CartPole Environment for Deep Reinforcement Learning

Visualization with TensorBoard

Unity Game Engine

Markov Models and State-Based Learning

Concepts of States in Markov Models

Markov Models in Python

Downloading and Installing Unity

Markov Model with Puppo in Unity

Hidden Markov Models

Concepts of Hidden Markov Models

Hidden Markov Model with TensorFlow

Hidden Markov Model Agent in Unity

Bellman Equation

Bellman Agent Implementation in Unity

Creating a Multi-Armed Bandit Reinforcement Learning Agent in Unity

Strategies Involved in Multi-Armed Bandits

Multi-Armed Bandit Simulation in Unity with UCB Algorithm

Building Multi-Armed Bandit with Epsilon-Greedy and Gradient Bandit Algorithms

Value and Policy Iterations

Implementing Q-Learning Policy Using Taxi Gym Environment

Q-Learning in Unity

Summary

Chapter 2: Pathfinding and Navigation

Pathfinding Algorithms

Variants of the A* Algorithm

Other Variants of Pathfinding

Pathfinding in Unity

Dijkstra Algorithm in Unity

A* Algorithm Simulation in Unity

Navigation Meshes

Navigation Mesh and Puppo

Obstacle Meshes and Puppo

Off Mesh Links and Puppo

Creating Enemy AI

Summary

Chapter 3: Setting Up ML Agents Toolkit

Overview of the Unity ML Agents Toolkit

Installing Baselines and Training a Deep Q-Network

Installing Unity ML Agents Toolkit

Cloning the GitHub Unity ML Agents Repository

Exploring the Unity ML Agents Examples

Local Installation of Unity ML Agents

Installing ML Agents from Python Package Index

Installation in Virtual Environments

Advanced Local Installation for Modifying the ML Agents Environment

Configuring Components of ML Agents: Brain and Academy

Brain-Academy Architecture

Behavior and Decision Scripts in Unity ML Agents Version 1.0

Linking Unity ML Agents with TensorFlow

Barracuda: Unity Inference Engine

Training 3D Ball Environment with Unity ML Agents

Visualization with TensorBoard

Playing Around with Unity ML Agents Examples

Summary

Chapter 4: Understanding Brain Agents and Academy

Understanding the Architecture of the Brain

Sensors

Policies

Inference

Demonstrations

Communicators

Understanding the Academy and Agent

Training an ML Agent with a Single Brain

Attach Behavior Parameters Script

Writing Agent Script

Training Our Agent

Visualize with TensorBoard

Running in Inference Mode

Generic Hyperparameters

Summary

Chapter 5: Deep Reinforcement Learning

Fundamentals of Neural Networks

Perceptron Neural Network

Dense Neural Network

Convolution Neural Network

Deep Learning with Keras and TensorFlow

Dense Neural Network

Convolution Neural Networks

Building an Image Classifier Model with ResNet-50

Deep Reinforcement Learning Algorithms

On-Policy Algorithms

Off-Policy Algorithms

Model-Free RL: Imitation Learning–Behavioral Cloning

Building a Proximal Policy Optimization Agent for Puppo

Interfacing with Python API

Interfacing with Side Channels

Training ML Agents with Baselines

Understanding Deep RL Policies in Unity ML Agents and Memory-Based Networks

Model Overrider Script

Models Script: Python

PPO in Unity ML Agents

Long Short-Term Memory Networks in ML Agents

Simplified Hyperparameter Tuning for Optimization

Hyperparameters for PPO

Analyzing Hyperparameters through TensorBoard Training

Summary

Chapter 6: Competitive Networks for AI Agents

Curriculum Learning

Curriculum Learning in ML Agents

Extended Deep Reinforcement Learning Algorithms

Deep Deterministic Policy Gradient

Twin Delayed DDPG

Adversarial Self-Play and Cooperative Learning

Adversarial Self-Play

Cooperative Learning

Soccer Environment for Adversarial and Cooperative Learning

Training Soccer Environment

TensorBoard Visualization

Building an Autonomous AI Agent for a Mini Kart Game

Training Tiny Agent with PPO Policy

Visualization and Future Work

Summary

Chapter 7: Case Studies in ML Agents

Evolutionary Algorithms

Genetic Algorithm

Evolution Strategies

Case Study: Obstacle Tower Challenge

The Details of Obstacle Tower

Procedural Level Generation: Graph Grammar

Generalization

Challenge Winners and Proposed Algorithms

Installation and Resources

Interacting with Gym Wrapper

Case Study: Unity ML Agents Challenge I

Google Dopamine and ML Agents

Summary

Index

About the Author

Abhilash Majumder is a natural language processing research engineer for HSBC (UK/India) and a technical mentor for Udacity (ML). He has also been associated with Unity Technologies, was a speaker at Unite India-19, and has educated close to 1,000 students from EMEA and SEPAC (India) on Unity. He is an ML contributor and curator for Open Source Google Research, TensorFlow, and Unity ML Agents, and a creator of ML libraries on the Python Package Index (PyPI). He is a speaker on NLP and deep learning for PyData Los Angeles, an online educator for Udemy, and a deep learning mentor for Upgrad. He is a graduate of the National Institute of Technology, Durgapur (NIT-D), where he majored in NLP, Machine Learning, and Applied Mathematics. He can be reached via email at [email protected]

Abhilash is a former apprentice/student ambassador for Unity Technologies, where he educated corporate employees and students on using Unity for game development. He was a technical mentor (AI programming) for the Unity Ambassadors Community and Content Production. He has been associated with Unity Technologies for general education, with an emphasis on graphics and machine learning. He is a community moderator for machine learning (ML Agents) sessions organized by Unity Technologies (Unity Learn). He is one of the first content creators for Unity Technologies India (since 2017) and is responsible for the growth of the community in India under the guidance of Unity Technologies.

About the Technical Reviewer

Ansh Shah is pursuing an MSc in Physics and a BE in Mechanical Engineering at BITS Pilani University, India. By day, he is a student. By night, he is a robotics and machine learning enthusiast. He is a core member of BITS ROBOCON, a technical team at the college, and is currently working on quadcopters and quadrupeds.
