Finally, the most important step is to look at the data you’ve identified as
input and figure out how it can be transformed into the output data in an efficient
way. How does the input data need to be arranged? Normally you’ll want a large block of the same data type, but if two data types need to be processed at the same time, interleaving them might make more sense. Or maybe the transformation needs two separate passes over the same data type, where the second pass uses fields that went untouched in the first pass. In that case, the type might be a good candidate for splitting into two, with each kept sequentially in memory.
Once you have decided on the transformation, the only thing left is gathering the inputs from the rest of the system and filling the outputs. When you’re transitioning from a more traditional architecture, you might have to perform an explicit gathering step: query some functions or objects and collect the input data in the format you want. You’ll have to perform a similar operation with the output, feeding it into the rest of the system. Even though those extra steps represent a performance hit, the benefits gained usually offset any performance costs. As more systems start using the data-oriented approach, you’ll be able to feed the output data from one system directly into the input of another, and you’ll really be able to reap the benefits.
15.5 Real-World Situations
Homogeneous, Sequential Data
You are probably already applying some of the principles of data-oriented design in your games right now: the particle system. It’s intended to handle thousands and thousands of particles. The input data is carefully designed to be small, aligned, and to fit in cache lines, and the output data is also very well defined because it probably feeds directly into the GPU.
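To make that layout concrete, here is a minimal sketch of how such a particle system might arrange its data. Everything in it (the struct, the field names, the update) is invented for illustration; the point is that each field lives in its own contiguous block, so the update pass streams through exactly the data it touches and nothing else.

    #include <cstddef>
    #include <vector>

    // Structure-of-arrays layout: every field is a contiguous block.
    struct ParticleSystem
    {
        std::vector<float> posX, posY, posZ;
        std::vector<float> velX, velY, velZ;
        std::vector<float> life; // remaining lifetime, in seconds

        void Update(float dt)
        {
            const std::size_t count = posX.size();
            for (std::size_t i = 0; i < count; ++i)
            {
                posX[i] += velX[i] * dt;
                posY[i] += velY[i] * dt;
                posZ[i] += velZ[i] * dt;
                life[i] -= dt;
            }
        }
    };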
Unfortunately for us, we don’t have that many situations in game development where we can apply the same principle. It may be possible for some sound or image processing, but most other tasks seem to require much more varied data and lots of different code paths.
Heterogeneous Data
Game entities are the perfect example of why the straightforward particle approach doesn’t work in other game subsystems. You probably have dozens of different game entity types. Or maybe you have a single game entity type but dozens, or even hundreds, of components that, when grouped together, give entities their own behavior.
One simple step we can take when dealing with large groups of heterogeneous data like that is to group similar data types together. For example, we would lay out all the health components for all entities in the game one right after the other in the same memory block. Same thing with armor components, and every other type of component.
If you just rearranged them and still updated them one game entity at a time, you wouldn’t see any performance improvements. To gain a significant performance boost, you need to change the update from being entity-centric to being component-centric. You need to update all health components first, then all armor components, and so on through every component type. At that point, your memory access patterns will have improved significantly, and you should see much better performance.
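As a sketch, with invented component types, the rearrangement plus the component-centric update might look like this; each loop is one linear walk through a single homogeneous block.

    #include <vector>

    // Hypothetical component types, each stored in its own contiguous block.
    struct HealthComponent { float current, max, regenPerSecond; };
    struct ArmorComponent  { float durability, decayPerSecond; };

    std::vector<HealthComponent> g_healthComponents;
    std::vector<ArmorComponent>  g_armorComponents;

    void UpdateAllComponents(float dt)
    {
        // Component-centric: one sequential pass per component type,
        // instead of hopping between types one entity at a time.
        for (HealthComponent& h : g_healthComponents)
        {
            h.current += h.regenPerSecond * dt;
            if (h.current > h.max)
                h.current = h.max;
        }

        for (ArmorComponent& a : g_armorComponents)
            a.durability -= a.decayPerSecond * dt;
    }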
Break and Batch
It turns out that sometimes even updating a single game entity component seems
to need a lot of input data, and sometimes it’s even unpredictable what it’s going
to need. That’s a sign that we need to break the update into multiple steps, each
of them with smaller, more predictable input data.
For example, the navigation component casts several rays into the world. Since the ray casts happen as part of the update, all of the data they touch is considered input data. In this case, it means that potentially the full world and other entities’ collision data are part of the input data! Instead of collecting that data ahead of time and feeding it as an input into each component update, we can break the component update into two parts. The initial update figures out what ray casts are required and generates ray-cast queries as part of its output. The second update takes the results from the ray casts requested by the first update and finishes updating the component state.
The crucial step, once again, is the order of the updates. What we want to do is perform the first update on all navigation components and gather all the ray-cast queries. Then we take all those queries, cast all those rays, and save the results. Finally, we do another pass over the navigation components, calling the second update for each of them and feeding them the results of the ray queries.
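Putting the two passes and the batched ray casts together, a sketch might look like the following. The types and the CastRay() and FinishNavUpdate() functions are placeholders for whatever the engine actually provides.

    #include <cstddef>
    #include <vector>

    struct Vec3      { float x, y, z; };
    struct RayQuery  { Vec3 origin, direction; };
    struct RayResult { bool hit; Vec3 point; };
    struct NavComponent { Vec3 position, desiredDir; };

    RayResult CastRay(const RayQuery& query);                    // engine-provided
    void FinishNavUpdate(NavComponent& nav, const RayResult& r); // second update

    void UpdateNavigation(std::vector<NavComponent>& navs)
    {
        // Pass 1: every component emits the ray query it needs.
        std::vector<RayQuery> queries;
        queries.reserve(navs.size());
        for (const NavComponent& nav : navs)
            queries.push_back({nav.position, nav.desiredDir});

        // Batch: resolve all the rays in one step, so the collision data
        // is traversed once instead of once per component update.
        std::vector<RayResult> results;
        results.reserve(queries.size());
        for (const RayQuery& q : queries)
            results.push_back(CastRay(q));

        // Pass 2: another linear pass over the components, feeding each
        // one the result of the ray it requested.
        for (std::size_t i = 0; i < navs.size(); ++i)
            FinishNavUpdate(navs[i], results[i]);
    }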
Notice how, once again, we managed to take code with a tree-like memory access structure and turn it into something more linear that works over similar sets of data. The ray-casting step isn’t an ideal linear traversal, but at least it’s restricted to a single step, and the world collision data might fit in some cache level, so we won’t take too many misses all the way to main memory.
Once you have this implemented, if the ray-casting part is still too slow, you
could analyze the data and try to speed things up. For example, if you often have
lots of grouped ray casts, it might be beneficial to sort the ray casts spatially first, so that as they’re being resolved, they’re more likely to hit data that is already cached.
Conditional Execution
Another common situation is that not all data of the same type needs to be updated the same way. For example, the navigation component doesn’t always need to cast the same number of rays. Maybe it normally casts a few rays every half a second, but casts more rays when other entities are close by.
In that case, we can let the component decide whether it needs a second-pass
update by whether it creates a ray-cast query. Now we’re not going to have a
fixed number of ray queries per entity, so we’ll also need a way to make sure we
associate the ray cast with the entity it came from.
After all ray casts are performed, we iterate over the navigation components
and only update the ones that requested a ray query. That might save us a bit of
CPU time, but chances are that it won’t improve performance very much because
we’re going to be randomly skipping components and missing out on the benefits
of accessing memory linearly.
If the amount of data needed by the second update is fairly small, we could copy it into the output of the first update. That way, whenever we’re ready to perform the second update, we only need to access the data generated this way, which is laid out sequentially in memory.
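A sketch of that idea, with invented types: the first pass copies the few fields the second pass needs into the query record itself, so the second pass walks two dense, sequential arrays and only touches a component again for the final write-back.

    #include <cstddef>
    #include <vector>

    struct Vec3      { float x, y, z; };
    struct RayResult { bool hit; Vec3 point; };
    struct NavComponent { Vec3 position, desiredDir, target; bool needsRay; };

    // Output of the first pass: one record per component that actually
    // requested a ray, carrying the copied state the second pass needs.
    struct PendingRay
    {
        unsigned sourceIndex; // which component to write back to
        Vec3     position;    // small state copied from the component
        Vec3     direction;
    };

    void SecondPass(std::vector<NavComponent>& navs,
                    const std::vector<PendingRay>& pending,
                    const std::vector<RayResult>& results)
    {
        // pending[] and results[] are parallel arrays, walked front to
        // back; components that skipped this frame are never touched.
        for (std::size_t i = 0; i < pending.size(); ++i)
        {
            const PendingRay& p = pending[i];
            if (!results[i].hit) // path is clear; steer toward it
                navs[p.sourceIndex].target = { p.position.x + p.direction.x,
                                               p.position.y + p.direction.y,
                                               p.position.z + p.direction.z };
        }
    }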
If copying the data isn’t practical (there’s either too much data, or that data needs to be written back to the component itself), we could exploit temporal coherence, if there is any. If components either cast rays or don’t, and do so for several frames at a time, we could reorder the components in memory so that all navigation components that cast rays are found at the beginning of the memory block. Then the second update can proceed linearly through the block until the last component that requested a ray cast is updated. To be able to achieve this, we need to make sure that our data is easily relocatable.
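If the set of ray-casting components is stable across frames, the reordering itself can be as simple as a partition. A sketch, assuming a hypothetical needsRay flag and components that are safe to move (referenced by index or handle, never by raw pointer):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct NavComponent { bool needsRay; /* ...rest of the component... */ };

    // Moves every component that casts rays to the front of the block and
    // returns how many there are; the second update then runs linearly
    // over navs[0..count). Note that this relocates components, so
    // anything referring to them must use handles or be re-patched.
    std::size_t GroupRayCasters(std::vector<NavComponent>& navs)
    {
        auto firstIdle = std::partition(navs.begin(), navs.end(),
            [](const NavComponent& n) { return n.needsRay; });
        return static_cast<std::size_t>(firstIdle - navs.begin());
    }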
Polymorphism
Whenever we’re applying data-oriented design, we explicitly traverse sets of data of a known type. Unlike in an object-oriented approach, we would never traverse a set of heterogeneous data, calling polymorphic functions on each element.
Even so, while we’re transforming some well-known data, we might need to treat another set of data polymorphically. For example, even though we’re updating the bullet data (a well-known type), we might want to deliver damage to any entity it hits, independent of the type of that entity. Since using classes and inheritance is not usually a very data-friendly approach, we need to find a better alternative.
There are many different ways to go about this, depending on the kind of
game architecture you have. One possibility is to split the common functionality
of a game entity into a separate data type. This would probably be a very small
set of data: a handle, a type, and possibly some flags or indices to components. If
every entity in the world has one corresponding set of data of this type, we can
always count on it while dealing with other entities. In this case, the bullet data
update could check whether the entity has a damage-handling component, and if
so, access it and deliver the damage.
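A sketch of what that common record might contain; all of the field names and the index scheme are invented for illustration.

    #include <cstdint>

    using EntityHandle = std::uint32_t;

    constexpr std::uint16_t kNoComponent = 0xFFFF; // entity lacks this component

    // The small, always-present record every entity has, stored in one
    // contiguous block and indexed by the entity's slot.
    struct EntityCommon
    {
        EntityHandle  handle;          // stable identifier for the entity
        std::uint16_t type;            // game-specific entity type id
        std::uint16_t flags;
        std::uint16_t damageComponent; // index into the damage-component
                                       // block, or kNoComponent
    };

    // The bullet update can then deliver damage without a virtual call:
    //
    //     const EntityCommon& e = entities[hitEntityIndex];
    //     if (e.damageComponent != kNoComponent)
    //         damageComponents[e.damageComponent].pendingDamage += bullet.damage;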
If that last step left you a bit uncomfortable, congratulations, you’re starting to really get a feel for good data access patterns. If you analyze it, the access patterns are less than ideal: we’re updating all the current bullets in the world. That’s fine, because they’re all laid out sequentially in memory. Then, when one of them hits an entity, we need to access that entity’s data, and then potentially the damage-handling component. That’s two potentially random accesses into memory that are almost guaranteed to be cache misses.
We could improve on this a little bit by having the bullet update not access the entity directly and, instead, create a message packet with the damage it wants to deliver to that entity. After we’re done updating all of the bullets, we can make another pass over those messages and apply the damage. That might result in a marginal improvement (it’s doubtful that accessing the entity and its component is going to cause any cache misses on the following bullet data), but most importantly, it prevents us from having any meaningful interaction with the entity. Is the entity bulletproof? Maybe that kind of bullet doesn’t even hit the entity, and the bullet should go through unnoticed? In that case, we really want to access the entity data during the bullet update.
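For completeness, here is a sketch of the message-packet variant just described, again with invented names; whether its marginal gain is worth losing the immediate interaction with the entity is exactly the trade-off discussed above.

    #include <vector>

    // One small record per hit, appended during the bullet pass.
    struct DamageMessage
    {
        unsigned targetEntity; // index of the entity that was hit
        float    amount;
    };

    std::vector<DamageMessage> g_damageQueue;

    // During the bullet update, instead of touching the entity:
    //     g_damageQueue.push_back({hitEntityIndex, bullet.damage});

    // After all bullets are updated, apply the queued damage in one pass.
    void ApplyQueuedDamage(std::vector<float>& entityHealth)
    {
        for (const DamageMessage& msg : g_damageQueue)
            entityHealth[msg.targetEntity] -= msg.amount;
        g_damageQueue.clear();
    }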
In the end, it’s important to realize that not every data access is going to be ideal. As with all optimizations, the most important ones are the ones that happen most frequently. A bullet might travel for hundreds of frames, but it will hit something in at most one frame. It’s not going to make much of a difference if, during the frame when it hits, we have a few extra memory accesses.
15.6 Parallelization
Improving memory access patterns is only half of the performance benefit provided by data-oriented design. The other half is being able to take advantage of multiple cores very easily.
Normally, to split up tasks across different cores, we need to create a description of the job to perform and some of the inputs to the job. Unfortunately, where things fall down for procedural or object-oriented approaches is that a lot of tasks have implicit inputs: world data, collision data, and so on. Developers try to work around this by providing exclusive access to data through locking systems, but that’s very error-prone and can be a big hit on performance, depending on how frequently it happens.
The good news is that once you’ve architected your code such that you’re thinking about the data first and following the guidelines in earlier sections, you’re ready to parallelize it with very little effort. All of your inputs are clearly defined, and so are your outputs. You also know which tasks need to be performed before other tasks, based on which data they consume and produce.
The only part missing is a scheduler. Once you have all of that information
about your data and the transformations that need to happen to it, the scheduler
can hand off tasks to individual cores based on what work is available. Each core
gets the input data, the address where the output data should go, and what kind of
transformation to apply to it.
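A sketch of what such a job description might contain; the exact shape is entirely up to the engine, and these fields are only illustrative.

    #include <cstddef>

    // Everything a core needs to run one transformation: explicit input,
    // explicit output, and the function to apply.
    struct JobDesc
    {
        const void* input;      // start of the input data block
        std::size_t inputSize;  // how much the job reads
        void*       output;     // where the results go (often a fresh buffer)
        void (*transform)(const void* input, std::size_t inputSize, void* output);
    };

    // Run on whichever core the scheduler picked. Because the reads and
    // writes are spelled out in the JobDesc, the scheduler can also keep
    // jobs that touch the same memory from running at the same time.
    void RunJob(const JobDesc& job)
    {
        job.transform(job.input, job.inputSize, job.output);
    }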
Because all inputs are clearly defined, and outputs are usually new memory buffers, there is often no need to provide exclusive access to any data. Whenever the output data writes back into an area of memory that was used as an input (for example, entity states), the scheduler can make sure there are no jobs trying to read from that memory while the job that writes the output is executing. And because each job is very “shallow” (in the sense that it doesn’t perform cascading function calls), the data each one touches is very limited, making the scheduling relatively easy.
If the entire game has been architected this way, the scheduler can then create a complete directed acyclic graph of the data dependencies between jobs for each frame. It can use that information, along with the timings from each previous frame (assuming some temporal coherency), to estimate what the critical path of data transformation is going to be, giving priority to those jobs whenever possible.
One of the consequences of running a large system of data transformations this way is that there are often a lot of intermediate memory buffers. For platforms with little memory available, the scheduler can trade some speed for extra memory by giving priority to jobs that consume intermediate data over ones on the critical path.
This approach works well for any kind of parallel architecture, even one where the individual cores don’t have access to main memory. Since all of the input data is explicitly listed, it can easily be transferred to each core’s local memory before the transformation happens.