So far we've concerned ourselves with making scenes smooth and believable. We've looked at ways to even out ugly shot transitions and examined what to do when faced with many tracks from which to choose. Unfortunately, these are the very procedures that some editors point to when claiming that dialogue editing is boring and wanting of soul. “Sound effects editing is filled with opportunities for artistic expression, but not so dialogue!” Rubbish.
You have to do your basic scene balancing and clean-ups before you can get to the fun stuff. It's very difficult to manipulate subtle changes in the focus of a scene if you can't hear past the room tone explosions at each edit. In the same way, you can't paint your walls until you fix the falling plaster. Once your scenes run smoothly, however, and there are no serious problems with noises or off-mic sound, you're ready to begin adding some life to the dialogue.
In this chapter we'll begin to turn flat dialogue into something with depth, focus, and story.
Film dialogue is overwhelmingly mono, usually coming exclusively from the center speaker.1 As shockingly retro as this may seem, there are good reasons for it. Logic tells us that the apparent sound source of a line of dialogue should mimic its visual source. Given the ever-growing number of channels on a big-league film print or DCP file, you'd think we'd take advantage of some of those channels and move the dialogue around the screen. See it on the left; hear it on the left. Same goes for the right. Moving character? No problem, automate a pan from left to center to right and into the audience. Simple? Yes, but set up such a dialogue scene on your workstation and you'll right away sense the lunacy of your “logical” plan.
People interact differently with dialogue than with music, sound effects, backgrounds, or Foley. With dialogue we're more critical and more imaginative at the same time, and when it's panned, we're not the least bit forgiving. Rather than enhancing the film experience, panned dialogue (other than an occasional off-screen line or group loop) often takes the viewer out of the scene—the last thing you want. Problems crop up when you place dialogue anywhere but in the center loudspeaker behind the screen, as did Gaumont so long ago.
Here are some of the factors that go into dialogue placement within the sound image:
The bottom line on dialogue imaging is that you can, in fact, do a bit of panning but you must be terribly sober about it. Normally, however, the dialogue comes out of the center speaker and none other.
Just because film dialogue is mono doesn't mean it's flat. In a flat scene, the dialogue sticks to the screen. Everything is given equal weight, so there's no focus. To “manage” levels, the scene is compressed too much so it presents itself as a wall of sound. There's no air in the sound, and the scene is fatiguing to watch. In short, it lacks depth.
When a scene has depth, there's a feeling of space around the words. There's a focus, however subtle, that not only guides the viewer but adds commentary. And even though all of the dialogue is coming from one speaker, there's a feeling of layers, as though some sounds stay near the screen while others move progressively away from the speakers.
First of all, it certainly doesn't hurt to start with excellent recordings. When the dialogue is well recorded and on-axis, without too much ambient noise, you don't have to overcook the tracks in the premix, so you can hold on to the natural roundness of the voices. The reflections in the room, undamaged by aggressive noise reduction, contribute to a sense of space.
But depth control is mostly about well-planned, well-mixed tracks. When you receive an OMF/AAF, the narrative is there but there's no discipline in the tracks and it's impossible to find the personality of each region. As you reassemble the shots, rid them of noises, and smooth the transitions, you develop an understanding of how the regions work and how they fit together. You begin to understand them in a way that you couldn't when they were all jumbled together. Each scene—and in fact each shot—you discover, contains a “moment.” This moment is the focus of the action. Make sure that it isn't compromised by a footstep, door close, or other natural but ill-placed sound. Give the featured moment a bit of space in which to breathe, thus putting it on stage.
Split off the regions you want to separate from the rest of the scene. You'll manipulate these separated regions later, in the premix. A tiny bit more volume here, a little less there, wetter, sharper, duller.2 If you want to push a shot further back toward the screen, split it off so that the mixer can change the EQ to darken the sound, reduce the level a bit to make it less prominent, or create a little reverb.3 Even a tiny amount of trickery will create distance between foreground dialogue and a manipulated background element, and the process can be used to imply physical, emotional, or social separation.
Ultimately, most of the depth in a scene comes from the countless tiny fader moves that the mixer executes while predubbing. “Microdynamic control,” you might say. The finesse of the mixing process pulls one shot from the screen and pushes another back, so plan the tracks so that you and the mixer can take advantage of your editing fantasies. Prepare tracks that seduce the mixer into playing with depth.
If, on the other hand, you don't dissect your tracks enough to let them talk to you, or if you merely “process” your tracks in a workstation rather than giving them the time they need in a proper dialogue premix, you probably won't achieve the separation and depth you're seeking. If you limit and compress as a substitute for manual fader moves, you clog the air within the tracks and the light in the conversation will darken. There's a reason that dialogue is premixed on a dubbing stage.
You can, of course, do a perfectly decent dialogue premix within a workstation. What matters is that you do it in a room whose acoustics translate well to the mix room, and that you have a means of controlling the DAW with enough ease that you don't talk yourself into compromise just because doing it right is too much of a hassle. Still, by planning your tracks well and knowing what's important in each shot, you can develop rich, deep scenes that will survive any mix, regardless of the platform.
I remember during Edward Scissorhands being in close contact with dialogue because Johnny Depp's performance was so delicate and introspective. His hands, too, needed to sound delicate, but have times of agitation when he was nervous. Also, I needed to be mindful of the particular pitches that his hands included so they would not interfere with the timbre of his speech. Had we not paid close attention to Depp's delivery and emotional performances, the hands could have become annoying noises. I think the design of his hands worked nicely because we paid attention to the actor's intentions of the character and to the dialogue editor's choices.
Vanessa Ament-Gjenvick, Foley artist
Die Hard; Platoon; Batman Returns
Author of The Foley Grail: The Art of Performing Sound
for Film, Games, and Animation
Imagine a choir singing with our protagonist standing on the second riser. The scene begins with a wide shot, and we hear the whole choir singing. As the camera slowly dollies toward our heroine, we want to call attention to her voice. We may do this purely for “realism”—to sense more of her voice as we approach, providing another layer to the sound—or we may want to psychologically focus on her, to reflect on what we learned of her in the last scene or to telegraph something soon to happen. In the mix we might subtly lower the choir's tracks to lend her an unnatural aura of isolation.
Whatever your objectives, you'll need to create a clean sync track with nothing but the character's voice that you can mix with the track from the wide shot. Normally, this is done with postsync (ADR), where you prepare the actress's lines and then rerecord her singing, sync to picture. Rerecording singing is in many ways easier than rerecording spoken dialogue, since most music carries a predictable rhythm. The actress need but learn the “quirks” in her onscreen performance to make a good match. (Chapter 15, ADR, discusses how to spot the shots, prepare the paperwork, and record and edit the lines.)
A similar situation is a dolly shot that moves down a line of football players attempting to sing the national anthem. As we approach an athlete we hear his voice strengthen, hold, and fade away as we pass, only to be overtaken by the next one entering the frame. In this case, you'll record ADR for each of the featured characters, panning (maybe) and fading (certainly) as the camera moves down the line. If, when approaching individual players, you can hear their voices on the original recording, you'll need to replace the take with a less specific version. Otherwise, you run the risk of hearing someone singing with himself.
In a scene with two characters, the sound level of each character often remains the same regardless of who's on screen, as though we're sitting somewhere between the two, listening. However, there are questions of perspective that must be asked. Cut to a wide shot and we may or may not change the sound to match the change in picture perspective. On a very close shot, we may or may not accentuate this physical closeness with a bit more volume. Even if we cut to a long shot—a point of view that would in the real world certainly affect our sound perception—we may stick with a close-up sound. Or we may completely muffle the dialogue to reflect the frustration of trying to eavesdrop from across the yard. These are choices about sound perspective.
Simply put, perspective in sound reflects decisions we make concerning our relationship with the screen action as well as the relationships—physical and emotional—between the characters within the scene. In the conversation scene from the last paragraph, we kept the same perspective when cutting from one close-up to the other, and hence kept the same volume, EQ, and reverb for the two characters regardless of who was on screen.
This wasn't only because they were seated relatively close together but also because they were communicating with each other, carrying on a conversation, so there was some sort of emotional contact. There was no reason to honor the “fact” that during a cutaway to the nonspeaking character there would logically be a sense of sound separation. Respecting reality—pulling back the dialogue a bit as we cut to the listener's face—would emotionally separate the two. However, when we cut to a wider shot, we must decide what we're trying to accomplish.
Are we, the viewers, being shut out? If so, a strong perspective cut that reminds us of our outsider status would make a point. If, on the other hand, we keep the scene steady, ignoring the change in picture perspective, we keep the focus on the conversation rather than on the physical world. It's as though the conversation transcends changes in our viewpoint.
I once edited the dialogue on a film about a mother coming to terms with her adult son's impending death. The film deals with a mother and son struggling with the pains and mistakes of their past. Much of the film takes place in the son's bedroom, so there's not a lot of action. At the beginning of the film, the two estranged characters don't know how to connect, since they'd never learned how to talk to each other. During these early scenes, Mom aimlessly cleans Son's room as they talk at each other.
Since there's no real chemistry between the two at this point in the film, I forced a bit of perspective on all off-camera dialogue. As we watched Mom clean nervously while “listening” to her son's harpings, the son's dialogue was very slightly attenuated and made a tiny bit wetter, as though Mom was hearing but not listening. When Mom was talking and we lingered on a cutaway of Son, her voice, too, was made slightly distant.
The film progressed, and after the inevitable knock-down drag-out, there was some real, though painful, connection between the two. As they battled and reconciled, the perspective difference between them vanished. They weren't exactly getting along, but they were communicating. The rest of their conversations in the bedroom were mixed at equal volume, keeping the characters connected.
The most recognizable perspective cuts are about physical rather than emotional or psychological distance. For example, Mom and Dad are arguing in the kitchen. We cut to the other end of the house where we see a frightened child listening, along with us, to the muted shouts of the parents. This is a classic perspective cut that tells us something about the geography of the house, the parents' ability to keep their problems from the child, and about the way she perceives these arguments.
Another example of perspective to enhance a story is from Hamlet, Act III, Scene IV. Hamlet and Gertrude, his mother, are quarrelling in her chamber (the troubled prince is having a hard time accepting that Mom married the ambitious brother of her murdered husband, the king). Hiding behind a curtain, spying on the squabble, is Polonius. The argument itself might cut from one close-up to another and be sprinkled with medium and wide shots. More than likely we'd keep this conversation rather level, with no real sense of perspective (except perhaps in the case of extreme close-ups) to constrain the energy of their argument and to hold the focus on the fight. But when cutting to a behind-the-curtain shot to see all of this from Polonius's point of view, it's his breathing and body motion that come to the forefront whereas the fight between Hamlet and Gertrude sounds lower and a bit darker and wetter.
This perspective split not only accentuates the spatial separation between the two sides of the scene but also calls on a film language convention used to describe eavesdropping. By pushing the main part of the scene (for example, Hamlet and Gertrude's brutal argument) to the background, we make Polonius the focus of the shot. His fear, his anger, and his humanity are what counts. This sort of perspective cut is a common way to identify the outsider as well as to give some depth to a scene. If we keep the sound of the argument relatively consistent, even when we're behind the curtain, the focus rests on Hamlet and Gertrude, making Polonius emotionally and narratively less important. Either way, Hamlet kills him.
Sometimes there's no real physical separation, yet you use perspective to separate one member of a group from the rest. Imagine a circle of schoolgirls, giggling and gossiping, largely at the expense of one of them—let's call her New Girl. All dialogue between the girls in the circle is prepared and mixed in a normal way, so nothing initially seems wrong to an outsider. But the teasing intensifies, and New Girl becomes increasingly frustrated. To make a point, we force perspective on the sounds of the other girls when the camera cuts to New Girl. We begin to experience the badgering from her point of view.
The sounds of the provokers become wetter, lower in level, and maybe a bit darker, all to show the increasing separation between New Girl and the group. When we cut back to the other girls, the sound is normal again, further stressing the frustration of the victim. Not only is she subjected to the other girls' taunts but, worse, she's separated from the society she so wants to be a part of. By the end of the scene, we may have lost all sound when we share New Girl's viewpoint. She gives up and tunes out. Most of the emotional message of the scene is delivered through sound manipulation.
Perspective cuts to emphasize emotional separation need not be delivered with a hammer blow. The ones I used to divide Mom and dying Son aren't severe—just a 2 dB level drop, a little EQ, and a tiny bit of reverb. Few people will notice, but this subterfuge makes a difference. Other times, when you want to stress distance, separation, or fear, you can apply aggressive perspective.
Remember, leave the equalization, dynamics processing, and reverb for the mix. Plan the subtle storytelling, then organize your tracks so that it's easy to develop your ideas with the rerecording mixer.
Creating the illusion of distance or separation or depth involves several tools. By far the most common—and easiest—are volume level and EQ. To grossly oversimplify things: to make something seem farther away, make it quieter and darker. With distance, sound tends to lose high frequencies and almost certainly lose loudness. There are plenty of exceptions, but this is a good place to start.
In interior settings, depending on the size of the room and what it's made of, sounds will often get wetter with distance from the source. Creating distance through reverb is a tricky matter, as the space you create often has little to do with the space you're in. Pay attention to the color of the reverb tail; it's often too bright, so it sticks out. Play with the feel of the reverb to make it give the impression you want. And don't get too caught up in the physics, the reality, of the shot. The separation between shots that you are creating is about emotion, even if you're mimicking a real-world change of POV.
When separating points of view in an exterior scene, a bit of delay (slap) may do the trick. But delay is easy to overuse, so before you congratulate yourself on the super cool transitions you just made, take a breath and ask yourself if this transition best tells the story of the moment or if you're just showing off.
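The quieter-and-darker recipe can be sketched numerically. The toy Python function below is purely illustrative (the chapter's advice stands: leave the real processing for the premix); the function name, the 6 dB drop, and the 3 kHz cutoff are my own arbitrary assumptions, not recommendations from the text.

```python
import numpy as np

def push_back(x, sr, level_db=-6.0, cutoff_hz=3000.0):
    """Crude 'distance' cue: attenuate the signal and darken it with a
    one-pole lowpass, since high frequencies are the first to go."""
    a = np.exp(-2.0 * np.pi * cutoff_hz / sr)   # one-pole filter coefficient
    y = np.empty_like(x, dtype=float)
    state = 0.0
    for i, sample in enumerate(x):
        state = (1.0 - a) * sample + a * state  # simple lowpass
        y[i] = state
    return y * 10.0 ** (level_db / 20.0)        # overall level drop in dB

# One second of white noise stands in for a line of dialogue:
rng = np.random.default_rng(0)
dry = rng.standard_normal(48_000)
far = push_back(dry, sr=48_000)
# 'far' is quieter overall and poorer in high frequencies than 'dry'.
```

The point of the sketch is only that two cheap operations, a level drop and a gentle darkening, already read as "farther away"; the reverb and delay tricks described above are what turn that into a believable space.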
Often a mixer will “pull perspective” on an extreme close-up facial shot. Volume goes up slightly, and there may be a tiny emphasis on high frequencies. The result reflects reality (we are, after all, in this guy's face), but it may also help us get into his thoughts and feelings. Coupled with thoughtful Foley, we can touch his beard, smell his breath, and sense his fear. Overdone, this technique is distracting and it unnecessarily separates the characters—drawing the conversation to a halt.
The illusion of perspective is created with mixing magic. But this is all made possible through proper dialogue editing. In essence you must again split the tracks you already split. You can, of course, organize your tracks following the usual rules of dialogue editing shown in Chapter 11: split by shot or by dominant sound, ignoring picture edits within scenes (see Figure 12.1). You'd then control the many elements that make up a change in perspective (volume, EQ, reverb, and perhaps dynamics) with automation. You can pull this off, but the solution is plagued with several problems:
Controlling depth and perspective is not about equipment. You can do a perfectly good dialogue premix in a workstation. More films than you think are mixed this way. It's not about the control surface; it's about how you envision and prep the dialogue tracks BEFORE you mix them. If you're after depth, perspective, or focus, it's hard to find a good argument for ignoring picture-driven perspective cuts while you edit (see Figure 12.2).
If you stand beside someone who's talking on the phone, you hear only one side of the conversation. Of the person at the other end you hear, at most, occasional squawks and clicks. You won't be able to follow the exchange. Often this is what happens in a film. If the filmmaker doesn't want the viewer to know what's up on the other end of the line or wants a realistic feeling, this is the way to go.
However, another convention in film language allows us to hear both sides of the conversation, as though we are listening in. This trick, unrealistic as it is, can be useful as an efficient way to kill off exposition and other essential information, and it serves to bring together two characters, all the while keeping them physically separate. It's the sound equivalent of a split screen. Some filmmakers find this a useful tool; others look with disdain at what they call a cheap TV trick.
Like it or not, the telephone split is part of the language of film. Figure 12.3 shows how it's done.
− Character A “live.” We're with this character, so we hear his voice in a natural way. We hear this track only when seeing character A.
− Character A “phone.” This is the phone voice of character A as heard by character B. We hear this track only when character B is on the screen.
− Character B “live.” We're with this character, so we hear her voice in a natural way. We hear this track only when seeing character B.
− Character B “phone.” This is the phone voice of character B as heard by character A. We hear this track only when character A is on the screen.
During a crossfade, two sounds are of course playing at once. Ideally, there's neither a “bump” from excess energy nor a “hole” from a loss of it. Most of the time you don't pay much attention to this. If you set up your preferences correctly (and you're not sharing the machine with another editor who changes your preferences while you sleep), everything ought to work well. But what if you try a perspective cut and something goes terribly wrong—your edits bump. What happened?
Normally you're cutting or crossfading between two different sounds (see Figure 12.5). Even when you're crossfading between different parts of the same soundfile the material is usually not from the exact same moment, so
you aren't crossfading across phase-aligned material. One side fades out and the other fades in, and the 3 dB attenuation at the fade's midpoint is perfect to prevent a rise or drop in level during the transition (see Figure 12.6).
With any type of perspective cut, including a phone split, you have to pay attention to the type of fade linking you use (see Figure 12.7). To create a perspective cut, you first begin with one continuous region. You perform an edit, likely at a picture-motivated location, and then split the region onto two tracks.
Once split, the regions are overlapped and then crossfaded. The amount of overlap depends on the “rules” of the film you're working on and the transition softness or harshness you're trying to achieve. Now begins the problem. Use the same −3 dB crossfade that you normally use and you'll likely hear a rise during the crossfade.
Because the material being played during the crossfade is precisely the same on both tracks, it plays together twice as loudly as during a “normal” crossfade. Whereas 3 dB was enough attenuation to quell a rise in our first example (refer to Figure 12.5), during a perspective crossfade we need 6 dB of midpoint attenuation to achieve unity (see Figure 12.8).
Using the wrong amount of attenuation is a very common mistake, and it shamelessly foils attempts at smoothness. Listen to your crossfades. If something sounds strange, check the fade and make sure the fade linking is set correctly. And like any other bit of editing advice, trust your ears. If it sounds good, don't seek out make-believe flaws.
1. Such generalizations are always dangerous. There are, of course, stereo dialogue films and there are giant gee-whiz films that place dialogue all around the screen because, well, they can. And there are artistic films that play with stereo images for reasons beyond those of traditional narrative.
2. The English language is not rich when it comes to describing sound qualities. Often the most efficient way to discuss sound is through metaphor. Hence, descriptions such as “wet” (reverberant), “dry” (little or no reverb), “sharp” (rich in high frequency, when describing nonmusical sounds; higher than expected pitch when describing musical sounds), and “dull” (poor in high-frequency elements; the opposite of “bright”).
3. Under normal circumstances it's nuts to add reverb to a track while you're working in a cutting room. When I say “add reverb” I mean organize your tracks in such a way that the mixer can easily add reverb (or EQ or dynamics or delay) to the shot.