2
THE ROLE OF AUDIO IN INTERACTIVE AND IMMERSIVE ENVIRONMENTS

1. Inform, Entertain, Immerse

What is the purpose of audio in games? What makes a player turn up the volume in a game instead of streaming their favorite music playlist?

Games have come a long way since the days of the Atari 2600 and its embryonic soundtracks, the blips and noises still in our collective memory today. Newer, better technologies have come online, giving sound designers new tools and more resources with which to create the soundtracks of future games. Yet, even with the recent technological advances, crafting a compelling soundtrack remains a tricky affair at best, reminding us that technology isn’t everything, and that, at its core, the issues facing the modern sound designer have at least as much to do with the narrative we strive so hard to craft as with the tools at our disposal. So perhaps we should begin our investigation not so much by looking at the tools and techniques used by professionals but by understanding the aims and challenges gaming confronts us with, and how to best tackle them.

Understanding these challenges independently from the technology involved will allow us to ultimately get the best out of the tools available to us, whatever those may be, whether we are working on a AAA game for the latest generation of dedicated hardware or a much more humble mobile app.

If we had to sum up the purpose of sound in games and interactive media we could, perhaps, do it with these three words: inform, entertain, immerse. The role of the sound designer and audio engineer in interactive media is to pursue and attain these goals, establishing a dialogue between the player and the game and providing them with essential information that will help them navigate the game. Perhaps a simple way to think about how each event fits within the overall architecture of our soundtracks is through this simple equation:

It is easy to understand the entertain portion of our motto. The soundtrack (a term that refers to music, dialog and SFX) of a AAA game today should be able to compete with a high-end TV or film experience. We expect the sound design to be exciting, larger than life and original. That is a challenge in itself, of course. Additionally, however, in order to create a fully encompassing gaming experience, it is also important that we provide useful feedback to the player as to what is happening in the game, both in terms of mechanics and situational awareness. Using the soundtrack to provide gamers with information that will help them play better and establish a dialog with the game is a very powerful way to maximize the impact of the overall experience. Indeed, as we shall see, even a simple mobile arcade game can be significantly improved by a detailed and thoughtful soundtrack, and the user’s experience vastly heightened as a result. Effective aural communication will also greatly contribute to and enhance the sense of immersion that so many game developers aspire to achieve.

In a visually driven media world we tend to underestimate – or perhaps take for granted – how much information can be conveyed with sound. Yet in our daily lives we constantly analyze hundreds of aural stimuli that inform us about our surroundings and the movement of others, alert us to danger or the call of a loved one, and much more. In effect, we experience immersion on a daily basis; we simply call it reality. Although gaming is a fundamentally different experience, we can draw upon these cues from the real world to better understand how to provide the user with information and how to, hopefully, achieve immersion.

Let us take a closer look at all three of these concepts, inform, entertain and immerse, first in this chapter, then in more detail throughout the rest of this book as we examine strategies to develop and implement audio assets for a number of practical situations.

1. Inform: How, What

In a 3D or VR environment sound can and must play an important role in conveying information about the immediate surroundings of the user. Keeping in mind that the visual window available to the player usually covers 90 to 120 degrees out of 360 at any given time, sound quickly becomes indispensable when it comes to conveying information about the remaining portion of the environment. It should also be noted that, while the visual field of humans is about 120 degrees, most of that is actually peripheral vision; our actual field of focus is much narrower. The various cues that our brain uses to interpret these stimuli in terms of distance, direction and dimension will be examined in more detail in a future chapter, but already we can take a preliminary look at some of the most important elements we can extract from these aural stimuli and what they mean to the interactive and immersive content developer.

a. Geometry/Environment: Spatial Awareness

In a game engine, the term geometry refers to the main architectural elements of the level, such as the walls, stairs, large structures and so on. It shouldn’t be surprising that sound is a great way to convey information about a number of these elements. Often, in gaming environments, the role of the sound designer extends beyond that of creating, selecting and implementing sounds. Creating a convincing environment for sound to propagate in is often another side of the audio creation process, known as environmental modeling. A well-designed environment will not only reinforce the power of the visuals but is also a great way to inform the user about the game and provide a good backdrop for our sounds to live in.

Figure 2.1

Some of the more obvious aspects of how sound can translate into information are:

  • Is the environment indoors or outdoors?
    • If indoors, what is the approximate size of the room we find ourselves in?
    • If outdoors, are there any large structures, natural or man-made, around?

  • Do we have a clear line of sight with the sound we are hearing, or are we partially or fully cut off from the source of the sound? We can isolate three separate scenarios:
    1. We are fully cut off from the audio source. The sound is happening in an adjacent room or outside. This is known as occlusion. There is no path for the direct or reflected sound to get to the listener.
    2. The path between the audio source and the player is partially obstructed, as when a low wall or architectural feature (a column, for instance) blocks our line of sight. In this case the direct audio path is blocked, but the reflected audio path is clear: this is known as obstruction.
    3. The direct path is clear, but the reflected sound path isn’t, blocking the reverberated sound: this is known as exclusion.

Each of these situations can be addressed and simulated in a soundtrack and provide the user with not just an extremely immersive experience but also valuable information to help them navigate their environment and the game itself.
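As a first, hedged illustration of how these scenarios might be handled in Unity, the C# sketch below casts a line between an audio source and the listener and muffles the sound with a low pass filter when level geometry blocks the direct path. The component and function names are Unity’s own; the single-ray test and the cutoff values are illustrative assumptions only, not a production-ready system, which would also need to distinguish obstruction and exclusion, typically with several rays and separate control of the reverberant path.

using UnityEngine;

// Minimal occlusion sketch: if the straight line between this audio source
// and the listener is blocked by level geometry, darken the sound.
// Assumes an AudioLowPassFilter sits on the same GameObject.
[RequireComponent(typeof(AudioSource), typeof(AudioLowPassFilter))]
public class SimpleOcclusion : MonoBehaviour
{
    public Transform listener;              // usually the main camera / AudioListener
    public LayerMask geometryMask;          // layers considered walls and structures
    public float openCutoff = 22000f;       // no filtering when the path is clear
    public float occludedCutoff = 800f;     // heavy filtering when fully blocked

    private AudioLowPassFilter lowPass;

    void Start()
    {
        lowPass = GetComponent<AudioLowPassFilter>();
    }

    void Update()
    {
        if (listener == null) return;

        // Linecast returns true if any collider on the chosen layers sits
        // between the source and the listener.
        bool blocked = Physics.Linecast(transform.position, listener.position, geometryMask);

        // Ease the filter toward its target so the change is not heard as a click.
        float target = blocked ? occludedCutoff : openCutoff;
        lowPass.cutoffFrequency = Mathf.Lerp(lowPass.cutoffFrequency, target, Time.deltaTime * 8f);
    }
}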

b. Distance

It has long been understood that the perception of distance is based primarily on the ratio of dry to reflected sound that reaches our ears, and that reverberation therefore plays a very important role in the perception of distance.

Energy from reverberant signals decays more slowly over distance than dry signals, and the further away from the listener the sound is, the more reverb is heard.

Additionally, air absorption is another factor that aids us in perceiving distance. Air absorption depends on several factors, the most important of which are temperature, humidity and the distance travelled. The result is a noticeable loss of high frequency content over distance, an overall low pass filtering effect.

Most game engines, Unity being one of them, provide us with a great number of tools to work with and effectively simulate distance. It does seem, however, that, either due to a lack of knowledge or due to carelessness, a lot of game developers choose to ignore some of the tools at their disposal and rely solely on volume fades. The result is often disappointing and less-than-convincing, making it difficult for the user to rely on the audio cues alone to accurately gauge distance.
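By way of a hedged example, the snippet below configures a few of the distance-related settings Unity exposes on an AudioSource instead of relying on a hand-scripted volume fade. The property and enum names are Unity’s; the distances and the commented-out custom curve are illustrative assumptions.

using UnityEngine;

// Configure distance attenuation on an AudioSource at runtime rather than
// fading volume by hand. All values are placeholders for illustration.
[RequireComponent(typeof(AudioSource))]
public class DistanceSetup : MonoBehaviour
{
    void Start()
    {
        AudioSource src = GetComponent<AudioSource>();

        src.spatialBlend = 1f;                           // treat the source as fully 3D
        src.minDistance  = 2f;                           // full volume within two meters
        src.maxDistance  = 40f;                          // attenuation curve ends at forty meters
        src.rolloffMode  = AudioRolloffMode.Logarithmic; // natural-sounding falloff

        // A hand-authored attenuation curve can be supplied instead:
        // src.SetCustomCurve(AudioSourceCurveType.CustomRolloff,
        //                    AnimationCurve.Linear(0f, 1f, 1f, 0f));
        // src.rolloffMode = AudioRolloffMode.Custom;
    }
}

Air absorption can be approximated on top of this by driving a low pass filter’s cutoff down as the distance to the listener grows, along the lines of the occlusion sketch shown earlier.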

c. Location

The perception of the location of a sound in terms of direction in 360 degrees is a little more complex, as it relies on multiple mechanisms. The most important are:

  • Interaural time difference: the time difference it takes for sound to reach both the left and right ears.
  • Interaural intensity difference: the difference in amplitude between the signal picked up by the left and the right ear.
  • The precedence effect: in a closed space, the precedence effect can also help us determine the direction of the initial sound source. It was demonstrated by Dr Helmut Haas in 1949 that, under certain circumstances, humans will determine the location of a sound based on the first arriving wavefront.

As outlined with these principles, our ability to discern the direction a sound comes from is dependent on minute differences in time of arrival and relative intensities of signals to both ears. While some of these phenomena are more relevant with certain frequencies than others (we almost universally have an easier time locating sounds with high frequency content, for instance), it is almost impossible to determine the location of a continuous tone, such as a sine wave playing in a room (Cook ’99). A good game audio developer will be able to use these phenomena to their advantage.

The process currently used to recreate these cues on headphones relies on a technology called head-related transfer functions (HRTF), which we shall discuss in Chapter four.

Another somewhat complementary technology when it comes to spatial audio is ambisonic recording. While not used to actually recreate the main cues of human spatial hearing, it is a great way to complement these cues by recording a 360-degree image of the space itself. The Unity game engine supports this technology, which its website describes as an ‘audio skybox’. Ambisonics and its place in our sonic ecosystem will also be discussed further in upcoming chapters.
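A few of the per-source settings involved are shown in the hedged snippet below; selecting and enabling an HRTF-based spatializer plugin itself happens in Unity’s audio project settings, so only per-source configuration is sketched here, and the values are illustrative.

using UnityEngine;

// Per-source settings that hand localization over to Unity's 3D panner or,
// if one is selected in the audio project settings, to a binaural
// (HRTF-based) spatializer plugin. Values are illustrative.
[RequireComponent(typeof(AudioSource))]
public class SpatialSetup : MonoBehaviour
{
    void Start()
    {
        AudioSource src = GetComponent<AudioSource>();

        src.spatialBlend = 1f;    // 0 = 2D, 1 = fully positioned in 3D
        src.spatialize   = true;  // route through the selected spatializer plugin
        src.dopplerLevel = 0.5f;  // tame pitch shifts caused by fast relative motion
        src.spread       = 30f;   // widen the image slightly for nearby sources
    }
}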

d. User Feedback and Game Mechanics

This might be less obvious than some of the previous concepts discussed up until now, as in some ways, when successfully implemented, some of the features about to be discussed might not – and perhaps should not – be noticed by the casual player (much to the dismay of many a sound designer!).

On a basic level, audio-based user feedback is easily understood by anyone who has ever used a microwave oven, digital camera or any of the myriad consumer electronics goods that surround us in our daily lives. It is the Chime Vs. Buzzer Principle that has governed the sound design conventions of consumer electronics goods for decades – and TV quiz shows for that matter.

The simplest kind of feedback one can provide through sound is whether an action was completed successfully or not. The Chime Vs. Buzzer Principle is actually deceptively simple, as it contains in its root some of the most important rules of sound design as it relates to user feedback:

The chime almost universally symbolizes successful completion of an action, or positive feedback. It is a pleasant, musical sound that we associate with immediate action and positive sentiments. The buzzer, of course, is noisy, unpleasant to the ear and associated with negative feedback and negative sentiments. Both these sounds have the benefit of being easy to hear, even at moderate levels in a somewhat crowded or noisy environment, although the chime appears to achieve similar results while remaining pleasant to the listener.

These qualities, being easy to hear in a noisy environment and easy to understand when heard (also known as legibility), make them prime examples of the specific demands of user feedback sound design.

Sound can provide much more complex and subtle feedback as well. Adding a low tone to the mix when entering a room can induce an almost subliminal sense of unease in the player; a sound can inform us of the material that something is made of even though it might not be clear visually. There are many variations of the Chime Vs. Buzzer Principle in gaming. Contact sounds, such as the sound the game makes when you hit a target, are one great example, but there are far too many to list here. Coming up with creative ways to take advantage of our innate understanding of this principle provides the game developer with endless opportunities for great sound design.
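To make the principle concrete, here is a minimal, hedged sketch of audio-based feedback in Unity: gameplay code calls one method on success and another on failure, and a short, legible UI sound plays. The clip names, and the choice of a single 2D source, are assumptions for illustration.

using UnityEngine;

// Minimal chime vs. buzzer feedback: gameplay code calls PlaySuccess() or
// PlayFailure() and a short, easily legible UI sound is played.
[RequireComponent(typeof(AudioSource))]
public class FeedbackAudio : MonoBehaviour
{
    public AudioClip chime;    // pleasant, musical: the action succeeded
    public AudioClip buzzer;   // noisy, dissonant: the action failed

    private AudioSource source;

    void Awake()
    {
        source = GetComponent<AudioSource>();
        source.spatialBlend = 0f;   // UI feedback is usually 2D, not positioned in the world
    }

    public void PlaySuccess() { source.PlayOneShot(chime); }
    public void PlayFailure() { source.PlayOneShot(buzzer); }
}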

Additionally, the mix itself is an effective way to provide information to the player. By altering the mix – for instance the balance between music, dialog and FX – or even by changing the relative balance between sound effects, the game can attract the attention of the player and focus it on a specific element or, in turn, distract the attention of the player.
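One hedged way to script this in Unity is with AudioMixer snapshots: one snapshot for the normal balance and one that, for instance, lowers music and effects to pull attention toward dialog. The snapshot names here are hypothetical; the transition call is Unity’s.

using UnityEngine;
using UnityEngine.Audio;

// Shift the focus of the mix by interpolating between AudioMixer snapshots.
// "Default" and "DialogFocus" are hypothetical snapshots authored in an
// AudioMixer asset and assigned in the Inspector.
public class MixFocus : MonoBehaviour
{
    public AudioMixerSnapshot defaultMix;      // normal balance
    public AudioMixerSnapshot dialogFocusMix;  // music and effects pulled down

    // Call when an important line of dialog starts playing.
    public void FocusOnDialog()
    {
        dialogFocusMix.TransitionTo(0.5f);     // crossfade over half a second
    }

    // Call when the line has finished.
    public void ReleaseFocus()
    {
        defaultMix.TransitionTo(1.0f);
    }
}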

2. Entertain

The focus of this book being on sound design and not composition, we will think of music in relation to the sound design and overall narrative and emotional functions it supports.

a. Sound Design

We all know how much less scary or intense even the most action-packed shots look when watched with the sound off. If you haven’t tried it, do so. Find any scary scene from a game or movie, and watch it with the sound all the way down. Sound allows the storyteller to craft and complement a compelling environment that magnifies the emotional impact of the scene or game, increasing the amount of active participation of the gamer. An effective combination of music and sound design, where both work together, plays a critical role in the overall success of the project, film or game.

Sound design for film and games remains still today, to an extent, a bit of a nebulous black art – or is often perceived as such – and one that can truly be learned only through a long and arduous apprenticeship. It is true that there is no substitute for experience and taste, both acquired through practice, but the vast amount of resources available to the student today makes it a much more accessible craft to acquire. This book will certainly attempt to demystify the art of sound design and unveil to students some of the most important techniques used by top notch sound designers, but experimentation by the student is paramount.

As previously discussed, sound supports every aspect of a video game – or should anyway. If we think of sound as simply ‘added on’ to complete the world presented by the visuals, we could assume that the role of sound design is simply to resolve the cognitive dissonance that would arise when the visuals are not complemented by sound.

Of course, sound does also serve the basic function of completing the visuals and therefore, especially within VR environments, allows for immersion to begin to take hold, but it also supports every other aspect of a game, from narrative to texturing, animation to game mechanics. A seasoned sound designer will look for or create a sound that will not simply complete the visual elements but also serve these other functions in the most meaningful and appropriate manner.

b. Music and the Mix

While this book does not focus on music composition and production, it would be a mistake to consider sound design and music in isolation from each other. The soundtrack of any game (or movie) should be considered as a whole, made up of music, dialog, sound effects and sometimes narration. At any given time, one of these elements should be the predominant one in the mix, based on how the story unfolds. A dynamic mix is a great way to keep the player’s attention and create a truly entertaining experience. Certain scenes, such as action scenes, tend to be dominated by music, whose role is to heighten the visuals and underline the emotional aspect of the scene. A good composer’s work will therefore add to the overall excitement and success of the moment. Other scenes might be dominated by sound effects, focusing our attention on an object or an environment. Often, it is the dialog that dominates, since it conveys most of the story and narrative. An experienced mixer and director can change the focus of the mix several times in a scene to carefully craft a compelling experience. Please see the companion website for some examples of films and games that will illustrate these points further.

Music for games can easily command a book in itself, and there are many out there. Music in media is used to frame the emotional perspective of a given scene or level. It tells us how to feel and whom to feel for in the unfolding story. I was lucky enough to study with Morton Subotnick, the great composer and pioneer of electronic music. During one of his lectures, he played the opening scene to the movie The Shining by Stanley Kubrick. However, he kept changing the music playing with the scene. This was his way to illustrate some of the obvious or subtle ways in which music can influence our emotional perception of the scene. During that exercise it became obvious to us that music could not only influence the perceived narrative by being sad or upbeat or by changing styles from rock to classical but that, if we are not careful, music also has the power to obliterate the narrative altogether. Additionally, music has the power to direct our attention to one element or character in the frame. Inevitably, a solo instrument links us emotionally to one of the characters, while an orchestral approach tends to take the focus away from individuals and shifts it toward the overall narrative.

Although we were all trained musicians and graduate students, Subotnick was able to show us that music was even more powerful than we had thought previously.

The combination of music and sound can not only be an extremely powerful one, but it can play a crucial role in providing the gamer with useful feedback in a way that neither of these media can accomplish on their own, and therefore communication between the composer and sound design team is crucial to achieve the best results and create a result greater than the sum of its parts.

3. Defining Immersion

Entire books have been dedicated to the topic of immersion – or presence – as psychologists have referred to it for several decades. Our goal here is not an exhaustive study of the phenomenon but rather to gain an understanding of it in the context of game audio and virtual reality.

We can classify virtual reality and augmented reality systems into three categories:

  • Non-immersive systems: typically, simple Augmented Reality systems that affect one sensory input. Playing a 3D game on a laptop is a common example. This is the type of system most people are familiar with.
  • Semi-immersive systems: typically allows the users to experience a 3D world while remaining connected to the real world. A flight simulator game played on a multiscreen system with realistic hardware, such as a flight yoke, would be a good example of such a system.
  • Fully immersive systems: affect all or most sensory inputs and attempt to completely cut off the user from their surroundings through the use of head-mounted displays, headphones, and additional systems such as gaming treadmills, which allow the user to walk or even run through a virtual environment.

An early definition of presence, based on the work of Minsky (1980), would be:

The sense an individual experiences of being physically located in an environment different from their actual environment, while also not realizing the role technology is playing in making this happen

We in the gaming world tend to think of presence or immersion as a rather novel topic, one that came about with games and virtual reality. In truth, however, the concept has been part of conventional media such as literature for hundreds of years. Narrative immersion happens when a player or reader is so invested in the plot that they momentarily forget about their surroundings.

There is no doubt, however, that games and virtual reality have given us a new perceived dimension in the immersive experience, that is, the possibility to act in an environment, not simply having the sensation of being there. So, what are the elements that scientists have been able to identify as most likely to create immersion?

The research of psychologist Werner Wirth suggests that successful immersion requires three steps:

  1. Players begin to create a representation in their minds of the space or world the game is offering.
  2. Players begin to think of the media space or game world as their main reference (aka primary ego reference).
  3. Players are able to obtain useful information from the environment.

Characteristics that create immersion tend to fall in two categories:

  1. Characteristics that create a rich mental model of the game environment.
  2. Characteristics that create consistency amongst the various elements of the environment.

Clearly, sound can play a significant role in all these areas. We can establish a rich mental model of an environment not only by ‘scoring’ the visuals with sound but also by adding non-diegetic elements to our soundtrack. For instance, a pastoral outdoor scene can be made more immersive by adding the sounds of birds in various appropriate locations, such as trees or bushes, preferably randomized around the player. Some elements can be a lot more subtle, such as the sound of wood creaking layered in every once in a while with footsteps over a wooden surface. While the player may not be consciously cognizant of such an event, there is no doubt that these details will greatly enhance the mental model of the environment and therefore contribute to creating immersion.
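A hedged sketch of the bird example follows: the script picks a random chirp, a random position around the player and a random wait time, so the ambience never repeats exactly. The clip pool, distances and timing are illustrative assumptions.

using UnityEngine;
using System.Collections;

// Scatter randomized one-shot ambience (bird chirps, for example) around the
// player to enrich the mental model of an outdoor scene. Values are illustrative.
public class RandomAmbience : MonoBehaviour
{
    public AudioClip[] chirps;                   // pool of variations
    public Transform player;                     // sounds are placed around this point
    public float minDelay = 3f, maxDelay = 10f;  // seconds between chirps
    public float minRadius = 5f, maxRadius = 20f;

    void Start()
    {
        StartCoroutine(EmitLoop());
    }

    IEnumerator EmitLoop()
    {
        while (true)
        {
            yield return new WaitForSeconds(Random.Range(minDelay, maxDelay));
            if (chirps.Length == 0 || player == null) continue;

            // Pick a random point on a ring around the player, roughly at tree height.
            float angle = Random.Range(0f, Mathf.PI * 2f);
            Vector3 offset = new Vector3(Mathf.Cos(angle), 0f, Mathf.Sin(angle))
                             * Random.Range(minRadius, maxRadius);
            Vector3 pos = player.position + offset + Vector3.up * 3f;

            AudioClip clip = chirps[Random.Range(0, chirps.Length)];
            AudioSource.PlayClipAtPoint(clip, pos, Random.Range(0.6f, 1f));
        }
    }
}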

Consistency, this seemingly obvious concept, can be trickier to implement when it comes to creature sounds or interactive objects such as vehicles. The sound an enemy makes while it is being hurt in battle should be different than the sound that same creature might make when trying to intimidate its enemies, but it should still be consistent overall with the expectations of the player based on the visuals and, in this case, the anatomy of the creature and the animation or action. Consistency is also important when it comes to sound propagation in the virtual environment, and, as was seen earlier in this chapter, gaming extends the role of the sound designer to modeling sound propagation and the audio environment in which the sounds will live.

Inconsistencies in sound propagation will only contribute to confusing the player and cause them to eventually discard any audio cue and rely entirely on visual cues.

Indeed, when the human brain receives conflicting information between audio and visual channels, the brain will inevitably default to the visual channel. This is a phenomenon known as the Colavita visual dominance effect.

As sound designers, it is therefore crucial that we be consistent in our work. This is not only because we can as easily contribute and even enhance immersion as we can destroy it, but beyond immersion, if our work is confusing to the player, we take the risk of having the user discard audio cues altogether.

It is clear that sensory-rich environments are much better at achieving immersion. The richness of a given environment may be characterized by:

  • Multiple channels of sensory information.
  • Exhaustiveness of sensory information.
  • Cognitively challenging environments.
  • Possessing a strong narrative element.

Additionally, while immersion can be a rather tricky thing to achieve, it is rather easy to break. In order to maintain immersion, research suggests that these elements are crucial:

  • Lack of incongruous audio/visual cues.
  • Consistent behavior from objects in the game world.
  • Continuous presentation of the game world – avoiding interruptions such as commercials or a level reset after a loss.
  • The ability to interact with objects in the game world.

While some of these points may be relatively obvious, such as the lack of incongruous elements (in-game ads, bugs, the wrong sound being triggered), some may be less so. The third point in this list, ‘continuous presentation of the game world’, is well illustrated by the game Inside by Playdead studios. Inside is the follow-up to the acclaimed game Limbo, and Inside’s developers took a unique approach to the music mechanics in the game. The Playdead team wanted to prevent the music from restarting every time the player respawned after being killed. Something as seemingly unimportant as this turns out to have a major effect on the player. By not having the music restart with every respawn, the action in the game feels a lot smoother, and the developers have removed one more element that might remind the player they are in a game, therefore making the experience more immersive. Indeed, the game is extremely successful at creating a sense of immersion.
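One simple, hedged way to approximate this behavior in Unity is to keep the music player alive across respawns and scene reloads rather than recreating it, for example with a small singleton that survives scene loads. This is only one possible pattern, not Playdead’s actual implementation.

using UnityEngine;

// Keep a single music AudioSource alive across respawns and scene reloads so
// the score never restarts with the player. A sketch of one possible pattern.
[RequireComponent(typeof(AudioSource))]
public class PersistentMusic : MonoBehaviour
{
    private static PersistentMusic instance;

    void Awake()
    {
        if (instance != null)          // a music player already survives from earlier
        {
            Destroy(gameObject);       // discard the duplicate; the music keeps playing
            return;
        }
        instance = this;
        DontDestroyOnLoad(gameObject); // survive scene loads and respawn resets
    }
}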

It is important to note that the willingness to be emotionally involved is also an important, perhaps crucial, factor in achieving immersion. This is something developers have no control over and that presupposes the desire of the user to be immersed. This is sometimes referred to as the ‘Fan Gene’. As a result, two users may respond very differently to the same experience, based, partially, on their willingness to ‘be immersed’.

2. Challenges of Game Audio

In spite of the improvements that each new generation of hardware brings with every anticipated release, developers are forced to come to one ineluctable conclusion: no matter how new, exciting, even revolutionary, each new generation of tools is, we are almost always at some point contending with finite resources. It could be said that developers working on mobile gaming today face challenges similar to those their peers faced when developing games for the first generation of gaming consoles. In that regard, the range of technologies available today requires the modern developer to deal with a massive range of hardware and capabilities, demanding a level of expertise that is constantly evolving and increasing.

1. Implementation

It is impossible to overstate the importance and impact of implementation on the final outcome, although what implementation actually consists of – the process and its purpose – often remains a somewhat nebulous affair. In simple terms, implementation consists of making sure that the proper sounds are played at the right time, at the right level and distance, and that they are processed in the way the sound designer intended. Implementation can make or break a soundtrack and, if poorly realized, can ruin the efforts of even the best sound designers. On the other hand, clever use of resources and smart coding can work their magic, enhance the efforts of the sound designers and contribute to creating a greater sense of immersion.

Implementation can be a somewhat technical process, and although some tools are available that can partially remove the need for scripting, some programming knowledge is definitely a plus in any circumstance and required in most. One of the most successful third-party implementation tools is Audiokinetic’s Wwise, out of Montreal, Canada, which integrates seamlessly with most of the leading game engines, such as Unity, Unreal and Lumberyard. The Unreal engine also has a number of tools useful for audio implementation; its visual scripting language, Blueprint, developed by Epic, is a powerful all-purpose implementation tool with strong audio features. As a sound designer or audio developer, learning early on what the technical limitations of a game, system or environment are is a crucial part of the process.

Because the focus of this book is to work with Unity and with as little reliance on other software as possible, we will look at these concepts and implementation using C# only, although they should be easy to translate into other environments.

2. Repetition and Fatigue Avoidance

We have already seen in Chapter one that the first generations of gaming hardware did not rely on stored PCM data for audio playback, as is mostly the case today, but instead used on-board audio chips to synthesize sounds in real time. Their concerns when it came to sound therefore had more to do with the number of available voices than with trying to squeeze as many samples as possible onto a disc or download. Remember that the Atari 2600 had a polyphony of two voices.

The 1980s saw the rise and then dominance of PCM audio as the main building blocks of game soundtracks. Audio samples afforded a level of realism that was unheard of until then, even at the low resolutions early hardware could (barely) handle. Along with increased realism, however, came another host of issues, some of which we are still confronted with today.

Early video game systems had very limited available RAM, as a result of which games could ship with only a small number of samples. Often these samples were heavily compressed (both in terms of dynamic range and data reduction), which severely reduced their fidelity, making them hard to listen to, especially over time. In addition, since so few samples could be included, they were played frequently and had to be used for more than one purpose. To deal with listener fatigue, game developers developed techniques early on that are still relevant and in use today, the most common being randomization.

The use of random and semi-random techniques in sound and music, also known as stochastic techniques, was pioneered by avant-garde composers such as John Cage and Iannis Xenakis in the 1950s and 1960s. These techniques, directly or indirectly, have proved extremely helpful for game developers.

The use of random behaviors is a widespread practice in the gaming industry and can be applied to many aspects of sound.

Randomization can be applied to but is not limited to:

  1. Pitch
  2. Amplitude
  3. Sample Selection
  4. Sample concatenation – the playback of samples sequentially
  5. Interval between sample playback
  6. Location of sound source
  7. Synthesis parameters of procedurally generated assets

(Working examples of each of the techniques listed above, and more, are provided in the scripting portion of the book.)

The most common of these techniques is the randomization of pitch and amplitude, often built into game engines such as Unreal, where it has been a built-in feature for several iterations. Pitch and amplitude randomization might be a good start, but it is often no longer enough to combat listener fatigue. Nowadays developers rely on more sophisticated techniques, often combining the randomization of several parameters. These more advanced, combinatorial techniques are sometimes referred to as procedural, a term in this case used rather loosely. In this book, we will tend to favor the stricter definition of the term procedural, that is, the real-time creation of audio assets, as opposed to the real-time manipulation of existing audio assets. The difference between procedural asset creation and advanced stochastic techniques is sometimes blurry, however. These more advanced random or stochastic techniques are certainly very important, and their usefulness should not be underestimated.
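As a first taste of the working examples provided later in the book, the hedged sketch below combines the randomization of sample selection, pitch and amplitude on every trigger. The ranges are illustrative assumptions; production values would be tuned by ear.

using UnityEngine;

// Basic fatigue avoidance: on every trigger, pick a random variation and
// randomize pitch and amplitude slightly so no two playbacks are identical.
[RequireComponent(typeof(AudioSource))]
public class RandomizedOneShot : MonoBehaviour
{
    public AudioClip[] variations;       // e.g. several footstep recordings
    public float pitchRange  = 0.1f;     // plus or minus ten percent pitch
    public float volumeRange = 0.15f;    // up to fifteen percent quieter

    private AudioSource source;

    void Awake() { source = GetComponent<AudioSource>(); }

    public void Trigger()
    {
        if (variations.Length == 0) return;

        AudioClip clip = variations[Random.Range(0, variations.Length)];
        source.pitch = 1f + Random.Range(-pitchRange, pitchRange);
        source.PlayOneShot(clip, 1f - Random.Range(0f, volumeRange));
    }
}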

3. Interactive Elements and Prototyping

One of the challenges that even very accomplished sound designers coming from linear media tend to struggle with the most when first working in gaming is interactive elements, such as vehicles, machines, weapons and other devices the user may interact with. Interactivity makes it difficult to predict the behavior of a game object, which therefore cannot be approached in a traditional linear fashion. How can one design sounds for a vehicle without knowing in advance how the user will interact with it? Simple things such as acceleration, braking sounds and the sound of tires skidding when the vehicle moves at high speed are suddenly part of a new equation.

The answer when addressing these issues is often prototyping. Prototyping consists of building an interactive audio model of the object, often in a visual environment such as Cycling ’74’s Max/MSP, Native Instruments’ Reaktor or Miller Puckette’s Pure Data, to recreate the intended behavior of the object and test all possible scenarios in advance, making sure that our sound design is on point and, just as importantly, that the sounds behave appropriately. For instance, in order to recreate the sense of a vehicle accelerating, the engine loop currently playing back might get pitched up; conversely, when the user slams on the brakes, the sample will get pitched down, and eventually, in more complex simulations, another sample at lower RPM might get triggered if the speed drops below a certain point, and vice versa.
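A highly simplified, hedged version of such a prototype in Unity might map normalized speed to the playback pitch of a looping engine sample and only allow a skid above a threshold speed, as sketched below. The speed source, pitch range and threshold are assumptions for illustration.

using UnityEngine;

// Very simplified vehicle audio prototype: a looping engine sample is pitched
// up and down with speed, and a skid sound is only allowed above a threshold.
[RequireComponent(typeof(AudioSource))]
public class EngineAudioPrototype : MonoBehaviour
{
    public Rigidbody vehicle;            // assumed source of the vehicle's velocity
    public AudioClip skid;
    public float topSpeed  = 30f;        // meters per second considered full throttle
    public float minPitch  = 0.8f, maxPitch = 2.0f;
    public float skidSpeed = 15f;        // below this, a skid would sound awkward

    private AudioSource engineLoop;      // looping engine sample on this object

    void Awake()
    {
        engineLoop = GetComponent<AudioSource>();
        engineLoop.loop = true;
    }

    void Update()
    {
        float speed = vehicle != null ? vehicle.velocity.magnitude : 0f;
        float t = Mathf.Clamp01(speed / topSpeed);

        // Accelerating raises the pitch of the loop; braking lowers it.
        engineLoop.pitch = Mathf.Lerp(minPitch, maxPitch, t);
    }

    // Called by the vehicle controller when traction is lost.
    public void OnSkid(float speed)
    {
        if (speed >= skidSpeed)
            engineLoop.PlayOneShot(skid);
    }
}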

Working with interactive elements does imply that sounds must be ‘animated’ by being pitched up, down, looped and processed in accordance with the circumstances. This adds another layer of complexity to the work of the sound designer: they are not only responsible for the sound design but also for the proper processing and triggering of these sounds. The role of the sound designer therefore extends to determining the range of the proper parameters for these actions, as well as the circumstances or threshold for which certain sounds must be triggered. The sound of tires skidding would certainly sound awkward if triggered at very low speeds, for instance. Often, these more technical aspects are finely tuned in the final stages of the game, ideally with the programming or implementation team, to make sure their implementation is faithful to your prototype. In some cases, you might be expected to be fluent both as a sound designer and audio programmer, which is why having some scripting knowledge is a major advantage. Even in situations where you are not directly involved in the implementation, being able to interact with a programmer in a way they can clearly comprehend, with some knowledge of programming, is in itself a very valuable skill.

4. Physics

The introduction and development of increasingly complex physics engines in games brought a level of realism and immersion that was a small revolution for gamers. The ability to interact with game objects and have them behave like ‘real-world’ objects was a thrilling prospect. Trespasser: Jurassic Park, released in 1998 by Electronic Arts, is widely acknowledged as the first game to introduce ragdoll physics, crossing another threshold toward full immersion. The case could be made that subsequent games such as Half-Life 2, published in 2004 by Valve Corporation, by introducing the gravity gun and allowing players to pick up and move objects in the game, truly heralded the era of realistic physics in video games.

Of course, physics engines introduced a new set of challenges for sound designers and audio programmers. Objects could now behave in ways that were totally unpredictable. A simple barrel with physics turned on could now be tipped over, dragged, bounced or rolled across a range of velocities, each requiring its own sound, against any number of potential materials, such as concrete, metal, wood etc.

The introduction of physics in game engines perhaps demonstrated the limitations of the sample-based paradigm in video game soundtracks. It would be impossible to create, select and store enough samples to perfectly cover each possible situation in the barrel example. Some recent work we shall discuss in the procedural audio chapter shows some real promise for real-time generation of audio assets. Using physical modeling techniques we can model the behavior of the barrel and generate the appropriate sound, in real time, based on parameters passed to us by the game engine.

For the time being, however, that is, until more of these technologies are implemented in production environments and game engines, we rely on a combination of parameter randomization and sample selection based on data gathered from the game engine at the time of the event. Such data often include the velocity of the collision and the material against which the collision occurred. This permits satisfactory, even realistic simulation of most scenarios with a limited number of samples.
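In practice, that combination often looks something like the hedged Unity sketch below: the collision’s relative velocity selects a soft or hard sample pool and scales the volume, with a little pitch randomization on top. The thresholds, pools and scaling are assumptions for illustration.

using UnityEngine;

// Velocity-driven impact sounds for a physics object such as a barrel.
// The impact speed selects a sample pool and scales the playback volume.
[RequireComponent(typeof(AudioSource), typeof(Rigidbody))]
public class ImpactAudio : MonoBehaviour
{
    public AudioClip[] softImpacts;      // low-velocity variations
    public AudioClip[] hardImpacts;      // high-velocity variations
    public float minSpeed  = 0.5f;       // ignore contacts too gentle to matter
    public float hardSpeed = 4f;         // above this, use the hard pool
    public float maxSpeed  = 10f;        // speed mapped to full volume

    private AudioSource source;

    void Awake() { source = GetComponent<AudioSource>(); }

    void OnCollisionEnter(Collision collision)
    {
        float speed = collision.relativeVelocity.magnitude;
        if (speed < minSpeed) return;

        AudioClip[] pool = speed >= hardSpeed ? hardImpacts : softImpacts;
        if (pool.Length == 0) return;

        AudioClip clip = pool[Random.Range(0, pool.Length)];
        float volume   = Mathf.Clamp01(speed / maxSpeed);   // harder hit, louder sound
        source.pitch   = Random.Range(0.95f, 1.05f);        // small variation per hit
        source.PlayOneShot(clip, volume);
    }
}

Material selection – concrete, metal, wood and so on – would extend this by querying what was hit, for instance through a tag or material on the other collider, and choosing the sample pool accordingly.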

5. Environmental Sound Design and Modeling

In creating the soundtrack for a large 3D game or environment, one should consider the resulting output as a cohesive whole instead of a collection of sounds playing somewhat randomly on top of each other. This kind of foresight and holistic approach to sound design allows for much more engaging and believable environments and a much easier mix overall. The soundtrack of a game is a complex environment, composed of many layers playing on top of each other and changing based on complex parameters determined by the gameplay. In a classic first-person shooter game, the following groups or layers of sounds could be playing over each other at any given time:

  • Room tones: drones, hums.
  • Environmental sounds: street sounds, weather.
  • Dialog and chatter.
  • Foley: footsteps, movement sounds.
  • Non-player characters: AI, creatures, enemies.
  • Weapons: small arms fire, explosions.
  • Machinery: vehicles, complex interactive elements.
  • Music.

This list gives us a sense of the challenge involved in organizing, designing, prioritizing and playing back all these sounds together while keeping the mix from getting cluttered.

In essence, we are creating a soundscape. We shall define soundscape as a sound collage that is intended to recreate a place and an environment and provide the player with an overall sonic context.

In addition to the task of creating a cohesive, complex and responsive sonic environment, it is just as important that the environment itself, within which these sounds are going to be heard, be just as believable. This discipline is known as environmental modeling and relies on tools such as reverberation and filtering to model sound propagation. Environmental modeling was pioneered by sound designers and film editors such as Walter Murch and aims at recreating the sonic properties of an acoustical space – be it indoors or outdoors – providing our sounds with a believable space to live in. The human ear is keenly sensitive to the reverberant properties of most spaces, and even more so to the lack of reverberation. Often the addition of a subtle reverberation to simulate the acoustic properties of a place will go a long way toward creating a satisfying experience, but in itself it may not be enough. Environmental modeling is discussed in further detail later in this book.
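In Unity, a first hedged approximation of this is simply to place reverb zones that apply a suitable preset wherever the listener travels; the snippet below adds one from code. The preset and radii are illustrative assumptions, and a full environmental model would go well beyond a single zone.

using UnityEngine;

// Give sounds heard near this object the acoustic signature of a room by
// adding a basic reverb zone. Preset and radii are illustrative only.
public class RoomReverb : MonoBehaviour
{
    void Start()
    {
        AudioReverbZone zone = gameObject.AddComponent<AudioReverbZone>();
        zone.reverbPreset = AudioReverbPreset.Stoneroom;  // rough character of the space
        zone.minDistance  = 8f;                           // full reverb inside this radius
        zone.maxDistance  = 15f;                          // reverb fades out toward this radius
    }
}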

6. Mixing

The mix often remains the Achilles’ heel of many games. Mixing for linear media is a complex and difficult skill usually acquired with experience. Mixing for games and interactive media introduces the added complexity of unpredictability, as it isn’t always possible to anticipate what to expect sonically in an interactive environment where events may unfold in many potential ways. We must teach the engine to deal with all potential situations using a carefully thought-out routing architecture and a set of rules for the game to follow. In most situations the game has little or no awareness of its own audio output.

Our challenge is, as it is so often in game audio, twofold: ensure a clean, crisp and dynamic mix while making sure that, no matter what, critical audio such as dialog is heard clearly under any circumstances and is given priority. Discussing the various components of a good mix is beyond the scope of this chapter and shall be addressed in detail in Chapter twelve.
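Without anticipating Chapter twelve, one small hedged illustration of a rule the game can follow is scripted control of an exposed mixer parameter, for instance pulling the music bus down whenever dialog is playing, in addition to the snapshot approach shown earlier. The parameter name here is a hypothetical one exposed on the AudioMixer asset.

using UnityEngine;
using UnityEngine.Audio;

// Lower the music bus while dialog is playing so critical audio stays clear.
// "MusicVolume" is a hypothetical exposed parameter on the AudioMixer asset.
public class DialogPriority : MonoBehaviour
{
    public AudioMixer mixer;

    public void SetDialogActive(bool active)
    {
        // Exposed volume parameters are in decibels: 0 dB normal, -12 dB ducked.
        mixer.SetFloat("MusicVolume", active ? -12f : 0f);
    }
}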

7. Asset Management and Organization

A modern game or VR simulation requires a massive number of audio assets. These can easily number in the thousands, possibly tens of thousands for a AAA game. Managing these quickly becomes a challenge in itself. Game engines, and even third-party software such as Wwise, should be thought of as integration and creative tools rather than asset creation tools. The line between the two is not always an absolute one, but as a rule you should only import into the game engine polished assets that are ready to be plugged in as quickly and painlessly as possible. While you can fix some issues during the implementation process, such as amplitude or pitch adjustments, you should avoid consistently relying on adjusting assets in the game engine for matters that could have been taken care of sooner. This tends to cost time and create unnecessarily complex projects. It is much more time-efficient to make sure all assets are exported and processed correctly prior to importing them.

An asset delivery checklist, usually in the form of a spreadsheet, is a must. It should contain information about the following, but this list is not exhaustive:

  • Version control: you will often be dealing with multiple versions of a sound, level, game build etc. due to fixes or changes. Making sure you are working with and delivering the latest or correct file is obviously imperative.
  • Deadlines: often the work of the sound design team is split up into multiple deadlines for various assets types in order to layer and optimize the audio integration and implementation process. Keeping track of and managing multiple deadlines is a highly prized and useful organizational skill.
  • Consistency and format: making sure that all the files you deliver are in the proper format, at the right sample rate and channel count, and at consistent sound levels across variations, especially for related sounds (such as footsteps), quickly becomes challenging and is an area where it is easy to make mistakes.
  • Naming convention: dealing with a massive number of assets requires a naming convention that can easily be followed and understood by all the team members. The naming convention should be both descriptive and as short as possible:
    • Hero_Fstps_Walk_Wood_01.wav
    • Hero_Fstps_Walk_Metal_02.wav
    • Hero_Fstps_Run_Stone_09.wav

Deciding on a naming convention is something that should be carefully considered in the preproduction stages of the game, as it will be very inconvenient to change it halfway through and could cause disruptions in the production process. Keep in mind that audio files are usually linked to the engine by name.
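As a small, hedged illustration, a helper such as the one below can generate file names that follow the convention shown above, which reduces manual errors when exporting large batches of variations. The field order simply mirrors the footstep examples; adapt it to your own convention.

// Build asset file names that follow a convention such as
// Character_Category_Action_Surface_Index.wav. Purely illustrative.
public static class AssetNaming
{
    public static string FootstepName(string character, string action,
                                      string surface, int index)
    {
        // FootstepName("Hero", "Walk", "Wood", 1) -> "Hero_Fstps_Walk_Wood_01.wav"
        return string.Format("{0}_Fstps_{1}_{2}_{3:00}.wav",
                             character, action, surface, index);
    }
}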

Conclusion

The functions performed by the soundtrack of a video game are complex and wide ranging, from entertaining to providing user feedback. The goal of an audio developer and creator is to create a rich immersive environment while dealing with the challenges common to all audio media – such as sound design, mixing and supporting the narrative – but with the added complexities brought on by interactive media and specific demands of gaming. Identifying those challenges, establishing clear design goals and familiarity with the technology you are working with are all important aspects of being successful in your execution. Our work as sound designers is often used to support almost every aspect of the gameplay, and therefore the need for audio is felt throughout most stages of the game creation process.
