Chapter 3. The Benefits of Batching

In 3D graphics and games, batching is a very general term describing the process of grouping a large number of disparate pieces of data and processing them together as a single, large block of data. The goal of this process is to reduce computation time, often by exploiting parallel processing or by avoiding the overhead costs that would be incurred if each element of the batch were processed individually. In some cases, the act of batching centers around meshes: large sets of vertices, edges, UV coordinates, and so on, which are used to represent a 3D object. However, the term could just as easily refer to the act of batching audio files, sprites, texture files (a process also known as Atlasing), and other large datasets.
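The core arithmetic behind batching is simple overhead amortization. As an illustration only (the function names and cost figures below are invented for this sketch, not taken from any engine), compare paying a fixed per-item overhead against paying it once per batch:

```python
def per_item_cost(n, overhead, work):
    """Total cost when every item pays the fixed overhead individually."""
    return n * (overhead + work)

def batched_cost(n, overhead, work):
    """Total cost when the whole batch pays the fixed overhead once."""
    return overhead + n * work

# 1,000 items, each needing 1 unit of real work plus 5 units of fixed overhead:
print(per_item_cost(1000, 5, 1))  # 6000 -- overhead dominates the total
print(batched_cost(1000, 5, 1))   # 1005 -- overhead is paid only once
```

The larger the fixed overhead relative to the real work, the more dramatic the savings from batching, which is exactly the situation with Draw Calls discussed later in this chapter.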

So, just to clear up any confusion, when the topic of batching is mentioned in Unity, it usually refers to the two primary mechanisms the engine offers for batching mesh data: Static and Dynamic Batching. These methods are essentially a form of geometry instancing, where we use the same mesh data in memory to render the same object multiple times without needing to prepare the data more than once.

These batching features offer us opportunities to improve the performance of our application, but only as long as they are used wisely. They are fairly nuanced systems, and there has been a lot of confusion surrounding the conditions under which they are triggered and, just as importantly, the conditions under which we would even see a performance improvement. In some cases, batching can actually degrade performance if batches are asked to process datasets that don't fit a very particular mold.

The batching systems in Unity are mostly a black box; Unity Technologies has not revealed much detailed, technical information about their inner workings. However, based on their behavior, Profiler data, and the list of requirements needed to make them work, we can still infer a great deal. This chapter intends to dispel much of the misinformation floating around about these batching systems. We will observe, through explanation, exploration, and examples, just how these two batching methods operate. This will enable us to make informed decisions and get the most out of them to improve our application's performance.

Draw Calls

Before we discuss Static and Dynamic Batching independently, let's first understand the problem that they are both trying to solve within the graphics pipeline. We will try to keep things fairly light on the technicalities here; we will explore this topic in greater detail in Chapter 6, Dynamic Graphics.

The primary goal of these batching methods is to reduce the number of Draw Calls required to render all objects in the current view. In its most basic form, a Draw Call is a request sent from the CPU to the GPU, asking it to draw an object. But, before a Draw Call can be requested, several important criteria need to be met. Firstly, mesh and texture data must be pushed from CPU memory (RAM) into GPU memory (VRAM), which typically takes place during initialization of the Scene. Next, the CPU must prepare the GPU by configuring the options and rendering features that are needed to process the object that is the target of the Draw Call.

These communication tasks between the CPU and GPU take place through the underlying graphics API, which could be either DirectX or OpenGL depending on the platform we're targeting and certain graphics settings. These APIs feature many complex and interrelated settings, state variables, and datasets that can be configured, and the available features change enormously based on the hardware device we're operating on. The massive array of settings that can be configured before rendering a single object is often condensed into a single term known as the Render State. Until these Render State options are changed, the GPU will maintain the same Render State for all incoming objects and render them in a similar fashion.

Changing the Render State can be a time-consuming process. We won't go too deeply into the particulars of this, but essentially the Render State is a collection of global variables that affect the entire graphics pipeline. Changing a global variable within a parallel system is much easier said than done. A lot of work must happen on the GPU to synchronize the outcome of these state changes, which often involves waiting for the current batch to finish. In a massively parallel system such as a GPU, a lot of valuable time can be lost waiting for one batch to finish before beginning the next. Things that can trigger this synchronization may include pushing a new texture into the GPU, changing a Shader, changing lighting information, shadows, transparency, and changing almost any setting we can think of.

Once the Render State has been configured, the CPU must decide what mesh to draw, what Material it should use, and where to draw the object based on its position, rotation, and scale (all represented within a single transform matrix). In order to keep the communication between CPU and GPU very dynamic, new requests are pushed into a Command Buffer. This is a buffered list, which the CPU sends instructions to, and which the GPU pulls from whenever it finishes the previous command. The Command Buffer behaves like a First In First Out (FIFO) queue, and each time the GPU finishes one command, it pops the oldest command from the front of the queue, processes it, and repeats until the Command Buffer is empty.
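The FIFO behavior of the Command Buffer described above can be sketched with a minimal, hypothetical model. Note that the class, method, and command names here are invented for illustration; they are not Unity or graphics API names:

```python
from collections import deque

class CommandBuffer:
    """Toy model of the FIFO command queue between a CPU and a GPU."""
    def __init__(self):
        self._queue = deque()

    def push(self, command):
        # CPU side: append a new instruction to the back of the queue.
        self._queue.append(command)

    def pop(self):
        # GPU side: take the oldest instruction from the front, if any.
        return self._queue.popleft() if self._queue else None

buf = CommandBuffer()
buf.push("SetRenderState(shader=Diffuse)")
buf.push("Draw(mesh=Cube)")
buf.push("Draw(mesh=Sphere)")

processed = []
while (cmd := buf.pop()) is not None:
    processed.append(cmd)
# Commands are processed in exactly the order they were issued.
```

The key property this models is the decoupling: the CPU can keep issuing commands while the GPU works through earlier ones, which is also why a backlog in this queue is a symptom of being GPU-bound.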

Note that a new Draw Call does not necessarily mean that a new Render State must be configured. If two objects share the exact same Render State information, then the GPU can begin rendering the new object immediately, since the Render State was left unchanged after the previous object finished.
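One way to see why shared Render State matters is to count how many state reconfigurations a stream of draws requires. In this sketch we assume, purely for illustration, that each draw can be tagged with a single state key such as its Material (real Render State involves many more variables):

```python
def count_state_changes(draws):
    """Count how many times the render state must be reconfigured for an
    ordered stream of (state_key, mesh) draw requests."""
    changes = 0
    current_state = None
    for state_key, _mesh in draws:
        if state_key != current_state:
            changes += 1          # a new state key forces a reconfiguration
            current_state = state_key
    return changes

draws = [("MatA", "Cube"), ("MatB", "Sphere"),
         ("MatA", "Cube2"), ("MatB", "Sphere2")]

print(count_state_changes(draws))          # interleaved order: 4 state changes
print(count_state_changes(sorted(draws)))  # grouped by material: only 2
```

The same four objects are drawn either way; merely reordering them so that objects sharing a state are adjacent halves the number of expensive state changes, which is the intuition behind sorting and batching by Material.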

Because the rendering process requires two hardware components to work in tandem, it is very sensitive to bottlenecks, which could originate in one or both components. GPUs can render individual objects incredibly quickly, so if the CPU is spending too much time generating Draw Call commands (or simply generating too many of them), then the GPU will wait for instructions more often than it is working. In this case, our application's graphics would be CPU-bound. We're spending more time waiting on the CPU to decide what to draw, than the GPU spends drawing it. Conversely, being GPU-bound means the Command Buffer fills up with requests as the GPU cannot process requests from the CPU quickly enough.
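A crude back-of-envelope model can make the CPU-bound versus GPU-bound distinction concrete. The per-call costs below are invented numbers for illustration; the function is a hypothetical sketch, not a profiling tool:

```python
def frame_bottleneck(num_draw_calls, cpu_us_per_call, gpu_us_per_call):
    """Crude model: whichever processor needs more total time per frame is
    the bottleneck; the other spends the difference idle or waiting."""
    cpu_time = num_draw_calls * cpu_us_per_call  # time spent issuing commands
    gpu_time = num_draw_calls * gpu_us_per_call  # time spent drawing
    bound = "CPU-bound" if cpu_time > gpu_time else "GPU-bound"
    return max(cpu_time, gpu_time), bound

# 2,000 draw calls where issuing a call (10us) costs more than drawing it (2us):
print(frame_bottleneck(2000, 10, 2))  # (20000, 'CPU-bound') -- 20 ms per frame
```

In the CPU-bound case above, note that reducing the number of Draw Calls helps both processors, but the frame time is dictated entirely by the CPU's per-call cost, which is exactly what batching attacks.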

Note

You will learn more about what it means to have rendering bottlenecks in either the CPU or GPU, and how to solve both cases, in Chapter 6, Dynamic Graphics.

Another component that can impede the speed of graphics activity in this chain of events is the hardware driver. This component mediates commands coming through the graphics API, which can come from multiple sources, such as our application, other applications, and even the Operating System itself (for example, rendering the desktop). Because of this, using updated drivers can sometimes result in a fairly significant increase in performance!

Next-generation graphics APIs, such as Microsoft's DirectX 12, Apple's Metal, and the Khronos Group's Vulkan, all aim to reduce the overhead on the driver by simplifying and parallelizing certain tasks; particularly, how instructions are passed into the Command Buffer. Once these APIs become commonplace, we may be able to comfortably use significantly more Draw Calls within our application. But until these APIs mature, we must treat our Draw Call consumption with a good deal of concern, in order to avoid becoming CPU-bound.
