Chapter 1. Introducing Synthesis and Simulation

This book is all about synthesis and simulation, and leveraging the power of modern video game engines for machine learning. On the surface, combining machine learning with simulations and synthetic data sounds relatively straightforward, but in reality the idea of bringing video game technology into the serious business world of machine learning scares an unreasonable number of companies away from trying it.

We hope this book will steer you into this world, and alleviate your concerns. Three of the authors of this book are video game developers with a significant background in computer science, and one is a machine learning and data science specialist. Our combined perspectives and knowledge, built over many years in a variety of industries and approaches, are presented here for you.

A whole new world of ML

The rest of this chapter is split into four sections:

  • first, we discuss the domains of machine learning that the book explores: simulation and synthesis

  • second, we introduce the tools that we’ll be using—the Unity game engine, the Unity ML-Agents Toolkit, and TensorFlow—and how they fit together

  • third, we’ll look at the techniques we’ll be using for machine learning: proximal policy optimisation (PPO), soft actor-critic (SAC), behavioural cloning (BC), and generative adversarial imitation learning (GAIL)

  • fourth and finally, we’ll discuss the projects that we’ll be building through this book, and how they relate to the domains, and the tools

By the end of this chapter you’ll be ready to dive into the world of simulations and synthesis, you’ll know how a game engine works at a high level, and you’ll see why it’s a nearly perfect tool for machine learning.

The Domains

The twin pillars of this book are simulation and synthesis. In this section we’ll unpack exactly what we mean by each of these terms, and how this book will explore the concepts.

Simulation and synthesis are core parts of the future of artificial intelligence and machine learning.

Many applications immediately jump out at you: combine simulation with deep reinforcement learning to validate how a new robot will function before building a physical product, create the brain of your self-driving car without the car, build your warehouse and train your pick and place robots without the warehouse (or the robots).

Other uses are more subtle: use simulations to synthesise artificial data, instead of recording information from the real world, and then train traditional machine learning models on it; or take real user activity and, with behavioural cloning combined with simulations, use it to add a biological, human-seeming element to an otherwise perfect, machine-learned task.

A video game engine, such as Unity, can simulate enough of the real world, with enough fidelity, to be useful for simulation-based machine learning and artificial intelligence. Not only can a game engine allow you to simulate enough of a city, and a car, to test, train, and validate a self-driving car deep learning model, but it can also simulate the hardware down to the level of engine temperatures, power remaining, LIDAR, sonar, x-ray, and beyond. Want to incorporate a fancy, expensive new sensor in your robot? Try it out, and see if it might improve performance before you invest a single cent in new equipment. Save money, time, compute power, engineering resources, and get a better view of your problem space.

Is it impossible, or potentially unsafe, to acquire enough real-world data? Create a simulation and test your theories. Cheap, unlimited training data is only a simulation away.

Simulation

There’s not one specific thing that we refer to when we say simulation. Simulation, in this context, can mean practically any use of a game engine to develop a scene or environment where machine learning is then applied. In this book, we use simulation as a term to refer, broadly, to the following (there’s a short code sketch of what this can look like just after the list):

  • creating an environment, using a game engine, and designating certain components of it as the agent, or agents

  • giving the agent the ability to move, or otherwise interact or work with the environment and/or other agents

  • connecting the environment to a machine learning framework to train a model that can operate the agent(s) within the environment

  • using that trained model to operate the agent(s) within the environment in the future, or connecting the model to a similarly equipped agent elsewhere (for example, in the real world, with an actual robot)
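
As promised, here’s a minimal sketch of what the last two steps can look like from the Python side, using the low-level API from the mlagents_envs package that ships with the Unity ML-Agents Toolkit. This is a sketch only: it assumes you have a built Unity environment executable (given the hypothetical name “MyEnvironment” here), it takes random actions rather than consulting a trained model, and the exact API details vary between ML-Agents releases.

    # Connect to a Unity environment from Python and drive its agent(s).
    # (Sketch only; assumes the mlagents_envs package is installed, and that a
    # built environment executable called "MyEnvironment" exists.)
    from mlagents_envs.environment import UnityEnvironment

    # Pass file_name=None instead to connect to a scene playing in the Unity Editor.
    env = UnityEnvironment(file_name="MyEnvironment")
    env.reset()

    # Each agent type in the scene is exposed to Python as a named "behavior".
    behavior_name = list(env.behavior_specs)[0]
    spec = env.behavior_specs[behavior_name]

    for episode in range(3):
        env.reset()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        while len(terminal_steps) == 0:
            # decision_steps.obs holds the agents' observations; here we ignore
            # them and act randomly, rather than querying a trained model.
            action = spec.action_spec.random_action(len(decision_steps))
            env.set_actions(behavior_name, action)
            env.step()
            decision_steps, terminal_steps = env.get_steps(behavior_name)

    env.close()

In practice you’ll rarely write this loop yourself: the ML-Agents training tools drive it for you, and your work happens inside Unity, defining what the agent can see and do.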

Synthesis

Synthesis is a significantly easier thing to pin down: synthesis, in the context of this book, is the creation of ostensibly fake training data using a game engine. For example, if you were building some kind of image identification machine learning model for a supermarket, you might need to take photos of a box of a specific cereal brand from many, many different angles, with many different backgrounds and contexts.

Using a game engine, you could create and load a 3D model of a box of cereal, and then generate thousands of images of it (synthesising them) from different angles, against different backgrounds, and with different skews, and save them out to a standard image format (JPG or PNG, for example). Then, with your enormous trove of training data, you could use a perfectly standard machine learning framework and toolkit (e.g. TensorFlow, PyTorch, CreateML, Turi Create, or one of the many web services-based training systems) to train a model that can recognise your cereal box.
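
To give you a feel for the second half of that process, here’s a small, hedged sketch of training such a classifier with TensorFlow’s Keras API. It assumes a hypothetical synthesised_images/ folder containing one subfolder of images per class (for example, cereal_box/ and not_cereal_box/), and a reasonably recent TensorFlow 2 release.

    # Train a small image classifier on synthesised images. (Sketch only; the
    # "synthesised_images" directory and its two classes are hypothetical.)
    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "synthesised_images",
        validation_split=0.2,
        subset="training",
        seed=42,
        image_size=(128, 128),
        batch_size=32,
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        "synthesised_images",
        validation_split=0.2,
        subset="validation",
        seed=42,
        image_size=(128, 128),
        batch_size=32,
    )

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255),              # scale pixels to [0, 1]
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(2),                           # two classes: cereal box, or not
    ])

    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(train_ds, validation_data=val_ds, epochs=5)

Nothing in this snippet knows or cares that the images came from a game engine; that’s rather the point.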

This model could then be deployed to, again for example, some sort of on-trolley AI system that helps people shop, guides them to the items on their shopping list, or helps store staff fill the shelves correctly and estimate stock.

The synthesis is the creation of the training data, using the game engine, and the game engine often has nothing, or very little, to do with the training process itself.

The Tools

This chapter provides you with an introduction to the tools that we’ll be using in our journey. If you’re not a game developer, the primary new tool you’ll encounter is the Unity engine itself, along with the Unity ML-Agents Toolkit that connects it to the machine learning world.

Unity

First and foremost, Unity is a game and visual effects engine. Unity describes itself as a real-time 3D development platform. We’re not going to repeat the marketing material on the Unity website for you, but if you’re curious about how the company presents itself, you can check it out.

Tip

This book isn’t here to teach you the fundamentals of Unity. Some of the authors of this book have already written several books on that, from a game development perspective, and you can find those at O’Reilly Media, if you’re interested. You don’t need to learn Unity as a game developer to make use of it for simulation and synthesis with machine learning, and in this book we’ll teach you just enough Unity to be effective at exactly that.

The Unity user interface looks like just about every other professional software package that has 3D features. We’ve included an example screenshot in Figure 1-1. It has panes that can be manipulated, a 3D canvas for working with objects, and lots of settings. We’ll come back to the specifics of Unity’s user interface later. You can get a solid overview of the different elements of it in the Unity documentation.

Figure 1-1. The Unity user interface

The Unity engine comes with a robust set of tools that allow you to simulate gravity, forces, friction, movement, sensors of various kinds, and more. These are exactly the tools needed to build a modern video game, and it turns out they’re also exactly the tools needed to create simulations and to synthesise data for machine learning. But you probably already guessed that, given you’re reading our book.

TensorFlow and Unity ML-Agents

If you’re in the machine learning space, you’ve probably heard of the TensorFlow open source project. As one of the most popular platforms and ecosystems for machine learning in both academia and industry, it’s nearly ubiquitous. In the simulation and synthesis space, it’s no different: TensorFlow is the go-to framework.

In this book, the underlying machine learning that we explore will, mostly, be done via TensorFlow. We won’t necessarily be getting into the weeds of TensorFlow, because much of the work we’ll be doing with TensorFlow will be via the Unity ML-Agents Toolkit.

Tip

We’re going to spend the rest of this section discussing the Unity ML-Agents Toolkit, so if you need a refresher on TensorFlow itself, we highly recommend the TensorFlow website, or one of the many excellent books that O’Reilly Media has on the subject.

TensorFlow is a library that provides support for performing computations using data flow graphs. It supports both training and inference using CPUs and GPUs (and other specialised machine learning hardware), and it runs on a huge variety of platforms ranging from serious ML-optimised servers to mobile devices.
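
As a deliberately tiny illustration of what that means in practice: you describe a computation, and TensorFlow can run it and differentiate it for you, on whatever hardware is available. That differentiation is the machinery that sits underneath the training that ML-Agents will drive on your behalf.

    # A tiny TensorFlow example: define a computation, then ask TensorFlow for
    # its gradient.
    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x ** 2 + 2.0 * x        # y = x^2 + 2x

    # dy/dx = 2x + 2, which is 8.0 at x = 3.0
    print(tape.gradient(y, x).numpy())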

Note

Because most of the work you’ll be doing with TensorFlow in this book is abstracted away, we will rarely be talking in terms of TensorFlow itself. So, while it’s in the background of almost everything we’re going to explore, your primary interface to it will be via the Unity ML-Agents Toolkit, and other tools.

Unity ML-Agents

The Unity ML-Agents Toolkit (which, against Unity branding, we’ll abbreviate to UnityML much of the time) is the backbone of the work you’ll be doing with this book. UnityML was released a few years ago, and has slowly grown to encompass a range of features that enable a game engine to serve as a simulation environment for training and exploring intelligent agents, and for other machine learning applications. It’s an open source project that ships with many exciting and well-considered examples (as shown in Figure 1-2), and is freely available via its GitHub project.

Figure 1-2. The ‘hero image’ of the Unity ML-Agents Toolkit, showing some of their example characters

The Techniques

The ML-Agents toolkit supports training using a number of methods: reinforcement learning (RL), generative adversarial imitation learning (GAIL), and behavioural cloning (BC).

Reinforcement Learning

The ML-Agents framework ships with implementations of two different Reinforcement Learning (RL) algorithms: Proximal Policy Optimisation (PPO) and Soft Actor-Critic (SAC).

Warning

Take note of the acronyms for these domains and algorithms: RL, PPO and SAC. Memorise them. We’ll be using them often throughout the book.

PPO is a powerful, general-purpose reinforcement learning algorithm that’s repeatedly been shown to be highly effective and generally stable across a range of scenarios.

Tip

Proximal Policy Optimisation was created by the team at OpenAI and debuted in 2017. You can read the original paper on arXiv, if you’re interested in diving into the details.

For most of the book, we’ll be using PPO, and PPO is actually the default algorithm in ML-Agents. We’ll be exploring more about how PPO works as we build some environments and agents a little later on.
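
We’ll save the details for later, but if you’re curious now about where the “proximal” in the name comes from, the heart of PPO, as presented in the original paper, is a clipped surrogate objective:

    L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

Here r_t(θ) is the probability ratio between the new and old policies, Â_t is an estimate of how advantageous the chosen action was, and ε is a small clipping range (0.2 in the paper). The clipping discourages updates that move the new policy too far from the old one, which is a large part of why PPO is so stable: each update only ever moves the policy a “proximal” distance.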

SAC is an off-policy reinforcement learning algorithm. This means that it can draw from experiences collected at any time (including the past). As a result, it requires fewer samples to learn a task than PPO does, but requires more frequent updates to the model. This makes it a good choice for slower environments.

Tip

Soft Actor-Critic was created by the Berkeley Artificial Intelligence Research (BAIR) group, and debuted in December 2018. You can read the original release documentation for the details.

We’ll be using SAC once or twice in this book, and we’ll explore how it works in a little more detail when we get there.

Imitation Learning

Generative Adversarial Imitation Learning (GAIL) and Behavioural Cloning (BC) are two approaches to Imitation Learning (IL): where RL relies largely on trial and error to arrive at a desired behaviour, IL achieves a desired behaviour by training on demonstrations of that behaviour.

IL is often used in conjunction with RL. Training solely using IL can make it relatively trivial to achieve a specific behaviour, while training with IL and RL together can make it significantly faster to train an agent to solve a complex, multifaceted environment with sparse rewards.

Tip

Sparse-reward environments are those where the agent is rewarded rarely or infrequently. Sometimes, in such an environment, the path to receiving a reward signal may be so long (or non-existent) that some sort of intrinsic reward signal, such as curiosity, could help. We’ll discuss sparse rewards and curiosity a lot more, later in the book.

BC is an IL approach that trains an agent to precisely mimic the actions found in a demonstration. With the ML-Agents Toolkit, BC can be used with either PPO- or SAC-powered RL training loops.

GAIL is a generative adversarial approach applied to IL. In GAIL, a second model—a discriminator—is used to distinguish whether an action/observation comes from a demonstration or was produced by the agent being trained. The discriminator model provides a reward based on how close the action/observation is to the provided demonstrations. The agent will try to maximise its reward, and the discriminator will get better at distinguishing between demonstrations and the agent’s activities: the agent gets more sophisticated because it must work harder and harder to fool an increasingly strict discriminator.

Tip

Behavioural Cloning is often the best approach for scenarios and environments where it is possible to demonstrate all, or almost all, of the states that the agent can experience. GAIL is great for situations where there are no environmental or extrinsic rewards, or where there are a limited number of demonstrations available. BC can also be used in conjunction with GAIL.

We’ll explore IL, using both BC and GAIL, in more detail as we explore projects later in the book.

Summary of Techniques

This chapter is an introductory survey of concepts and techniques, and you’ll be exposed to, and use, each of the techniques we’ve looked at here over the course of this book. In doing so you’ll become more familiar with how each of them works in a practical sense.

The gist of it is:

  • the Unity ML-Agents Toolkit currently provides a selection of training methods across two approaches:

    • for reinforcement learning (RL): proximal policy optimisation (PPO) and soft actor-critic (SAC)

    • for imitation learning (IL): behavioural cloning (BC) and generative adversarial imitation learning (GAIL)

  • these methods can be used independently, or together:

    • RL can be either PPO or SAC alone, or one of PPO or SAC in conjunction with an IL method

    • BC can be used alone, or as a step on the path to an approach using GAIL or RL

  • IL techniques require some sort of provided demonstration

  • RL techniques learn by doing

Projects

This book is a practical, pragmatic work. We want you to get up and running using simulations and synthesis as quickly as possible, and we assume you’d prefer to focus on the implementation wherever possible.

So, while we do explore the behind the scenes often, the meat of the book is in the projects we’ll be building together.

The practical, project-based side of the book is split between the two domains we discussed earlier: simulation and synthesis.

Simulation Projects

NOTE: The projects aren’t fully known yet, so this section will be developed a bit later.

Our simulation projects will be varied: when you’re building a simulation environment in Unity there’s a wide range of ways in which the agent that exists in the environment can observe and sense its world.

Some simulation projects will use an agent that observes the world using vector observations: numbers. Whatever numbers you might want to send it; literally anything you like. Realistically, though, vector observations are more often than not things like the distance the agent is from something, or other positional information. But really, any number can be an observation.

Some simulation projects will use an agent that observes the world using visual observations: pictures! Because Unity is a game engine, and game engines, like film, have a concept of cameras, you can simply (virtually) mount cameras on your agent, and just have it exist in the game world. The view from these cameras can then be fed into your machine learning system, allowing the agent to learn about its world based on the camera input.
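
As a rough illustration, and continuing the hedged mlagents_envs sketch from earlier in the chapter, both kinds of observation arrive on the Python side as arrays; their shapes are what tell them apart. (Exact attribute names vary between ML-Agents releases.)

    # Inspect what an agent's behavior exposes as observations. (Sketch only;
    # assumes "env" and "behavior_name" from the earlier connection example.)
    spec = env.behavior_specs[behavior_name]

    for obs_spec in spec.observation_specs:
        # A vector observation has a flat shape, something like (8,);
        # a visual (camera) observation looks more like (84, 84, 3).
        print(obs_spec.shape)

    decision_steps, terminal_steps = env.get_steps(behavior_name)
    for obs in decision_steps.obs:
        # One array per observation, with a leading dimension for however many
        # agents are currently requesting a decision.
        print(obs.shape)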

Synthesis Projects

NOTE: The projects aren’t fully known yet, so this section will be developed a bit later.

TBD based on projects.

Summary and Next Steps

You’ve taken the first steps, and this chapter contained a bit of the required background material. From here onwards, we’ll be teaching you by doing. The book has the word practical in the title for a reason, and we want you to get a feel for simulation and synthesis by building projects of your own.

In the next chapter, we’ll look at how you can create your first simulation, implement an agent to do something in it, and use it to train a machine learning system with reinforcement learning.
