1

Sound Basics

Sound is essential to storytelling. Good sound propels us through a narrative, triggering certain emotions, reminding us of themes, and helping us recall information. Along with the visuals, sound creates a satisfying film experience in both fiction and nonfiction stories. When the sound isn’t right, we know it. The wrong music can ruin the mood for a scene. An interview that echoes becomes distracting if we can’t see a large space in the shot to justify the echo. A soundtrack with too prominent a drum track intrudes on important words in an interview. In each of these instances, the story is disrupted, rather than supported, by sound. By contrast, when sound “works,” we often don’t notice it. We laugh, we cry, we absorb important information—all without realizing the role that sound has played in making us respond. So, what is it about sound that makes us either jump further into the story or get pulled away from it? What are the components of sound that are essential to storytelling? In this chapter, we will demystify some of the physics of sound and explain some resulting principles and best practices that you can apply to your sound production, editing, and mix. Whether you are working as a solo storyteller or collaborating with a large production team, this basic knowledge of sound will help you improve your media making and ensure that audio is a strong part of your work.

Sound Waves

Let’s start off with the sound wave. Sound is created by vibration, which produces alternating areas of pressure (compression) and release (rarefaction) in the surrounding medium. These compressions and rarefactions can travel through air, liquid, plasma, or even solids, though for our purposes the medium is typically air. This repeating pattern of compression and rarefaction is what we call a sound wave. In the field, we record sound waves from two main sources: “sync” sound—those sounds that are synchronized with picture, including dialogue—and “wild” sounds—those audio elements we record independently of picture. We then take those sounds and convert them from the analog world in which we listen and live to the digital realm in which we produce our films. Though analog recordings can still be made, today digital recording is the norm. In post, we mix those gathered sound waves together, often weaving them with other already-recorded sounds, such as music and sound effects. During playback, the digital sound files are converted back from data to analog, which is the way the sound waves arrive at our ears. The result is a story you watch and hear, hopefully without noticing our handiwork (Figure 1.1).

What’s interesting is that we often don’t think of sound in physical terms. But these waves really do have a physical presence, just one we can’t see. For example, low frequencies have very long sound waves. At 50 Hz, the sound wave is over 22 feet long. Imagine four refrigerators stacked end to end. No wonder low frequencies can travel farther. And no wonder they build up in small rooms. But we’ll get to that in our production chapter. For now, just imagine all of the sound waves surrounding you and bouncing off of you in every space in which you move.

FIGURE 1.1   Graphic of compressions and rarefactions of a waveform.
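
To make the wavelength idea concrete, here is a minimal Python sketch of the relationship wavelength = speed of sound / frequency. It assumes a speed of sound of roughly 1,125 feet per second (air at about room temperature); the exact value shifts with temperature, so treat the results as approximations.

```python
# Approximate speed of sound in air near room temperature, in feet per second.
SPEED_OF_SOUND_FT_PER_S = 1125.0

def wavelength_feet(frequency_hz: float) -> float:
    """Return the approximate wavelength in feet for a given frequency in Hz."""
    return SPEED_OF_SOUND_FT_PER_S / frequency_hz

for freq in (50, 1000, 10000):
    print(f"{freq} Hz -> {wavelength_feet(freq):.1f} ft")
# 50 Hz   -> 22.5 ft  (the refrigerator-stack example above)
# 1 kHz   ->  1.1 ft
# 10 kHz  ->  0.1 ft
```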

Frequency or Pitch

Pitch is almost entirely determined by the frequency of a sound wave, essentially how many times it vibrates, or cycles, per second. In the International System of Units (SI), one cycle per second of frequency is called one Hertz, named after the person who discovered electromagnetic waves, Heinrich Rudolf Hertz. A sound that vibrates at 20 cycles per second is 20 Hz. One that vibrates at 4,000 cycles per second is often referred to as 4 kilohertz, or 4k for short. (Note that this has nothing to do with 4K footage, which has to do with pixels.) When you are making a recording, you need to know that any vibrating object creates sound. Wherever you are filming, there are many other frequencies flying around besides what you are focused on recording. They might complement the primary sound you are recording. Or they might compete with it. Understanding what kind of frequencies you are trying to isolate can help you with microphone placement. For example, since low frequencies travel farther than high ones, it is easier to hear a man with a low, booming voice than a small child with a higher voice (unless it’s your kid, in which case you could pick that sound out of a lineup anywhere; Figure 1.2).

Every sound also has overtones, undertones, and harmonics beyond the fundamental frequency. Overtones add depth and range to the sound being recorded. When you record, you want to minimize the amount of change to all of these frequencies. This is especially true with the human voice, with its many overtones and undertones. Because the human voice has such a wide frequency range, and is very complex, you can really negatively impact the voice by manipulating frequencies too much when you are recording in the field. We will address all this later in Chapter 4 on location sound.

FIGURE 1.2   Graphic of sound wave over time depicting pitch.

What this also means for the video storyteller or filmmaker is that there are sounds beyond your primary audio that can interfere with your recording. Be aware of the sound around you. To do this, we recommend always using professional headphones (not earbuds) when recording sound. You will hear sounds much more clearly through headphones. You’ll be able to place your microphone slightly differently for clearer primary sound, or understand before it’s too late just how distracting a particular sound in the background will be to your primary audio. It’s always better to rid your location of the noise—shut down that humming air conditioning unit, move the snuffling dog—than to try to fix it in post. Or avoid interfering frequencies altogether by using our sound scouting tips. But when you can’t avoid competing sound waves, you do have options (which we’ll address later) for minimizing or eliminating some disturbing frequencies in either production or post.

Amplitude

Amplitude is the measure of the energy of a sound wave. It corresponds to the height of the wave, the distance from the wave’s resting point to its crest (or trough). The more energy or amplitude, the louder the sound and the farther it travels. If you open the door to a night club on a Saturday night, chances are good that you will physically feel those powerful waves of sound washing over you. The pressure you feel and hear is measured as Sound Pressure Level, or SPL, expressed in decibels (dB SPL). Distance is also key in measuring SPL. For instance, a vacuum cleaner is loud at about 70 dB SPL from just a little over a yard away, while a jet plane measures about 140 dB SPL from 54.5 yards away. Last summer, I measured the SPL around our community pool with my handy analog SPL meter to determine how loud our swim meets were from various distances, including the house of a person who complained these gatherings violated a local noise ordinance. Let’s just say there was a lot of sound energy, but no laws were violated.
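
To make the distance point concrete, here is a small sketch of how sound pressure level falls off with distance from a source in open air, roughly 6 dB for every doubling of distance. This is a free-field approximation only; real rooms, with their reflections, behave differently.

```python
import math

def spl_at_distance(spl_ref_db: float, ref_distance: float, new_distance: float) -> float:
    """Estimate SPL at a new distance from a point source in open air.
    Level drops about 6 dB each time the distance doubles (inverse-square law)."""
    return spl_ref_db - 20 * math.log10(new_distance / ref_distance)

# A vacuum cleaner at roughly 70 dB SPL measured at 1 yard:
print(round(spl_at_distance(70, 1, 2)))   # ~64 dB SPL at 2 yards
print(round(spl_at_distance(70, 1, 8)))   # ~52 dB SPL at 8 yards
```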

Amplitude is measured in decibels, or “dBs” for short. Sound engineers like me often think of dBs as “clicks” or small levels of adjustment: “bring that music down two dBs” or “bring that sound effects track up a click (or one dB).” By the way, like the hertz, the decibel is another audio term that takes its name from a pioneer in the field. In this case, the “bel” refers to Alexander Graham Bell, who studied how sound travels, invented the telephone, and helped launch the audio engineering field with his pioneering work (Figure 1.3).

FIGURE 1.3   Sound wave showing amplitude, pointing out crest and trough.

Measuring Audio

If you want to be precise, there are many ways to actually measure dBs. One of the most common is dBFS, or decibels relative to Full Scale, used in digital audio systems. dBFS is sample based and, in digital audio, has a top level of zero. In analog, there is actually room beyond the top of the scale, or over zero. But we’ll stick with digital for now. Digital audio that hits zero will distort. When recording, it’s best to keep levels around –10 to –12 dBFS to leave plenty of headroom. Final delivery levels can vary from –10 to –6 dBFS.
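
As a quick sketch of what a dBFS number means, here is how one might compute the peak level of a block of digital samples relative to full scale. It assumes the samples are normalized floating-point values between –1.0 and 1.0, which is one common convention but not the only one.

```python
import math

def peak_dbfs(samples) -> float:
    """Peak sample level in dBFS, where a full-scale sample (|x| = 1.0) reads 0 dBFS."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

print(round(peak_dbfs([0.0, 0.3, -0.3, 0.25]), 1))  # about -10.5 dBFS: healthy headroom
print(round(peak_dbfs([0.0, 1.0, -1.0]), 1))         # 0.0 dBFS: at the ceiling, risk of distortion
```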

Another way to measure audio is Decibel True Peak (dBTP). dBTP refers to the inter-sample peaks that can be created during the conversion back from digital into analog and is a measurement used predominantly in broadcast media. However, we are seeing this specification now in online media delivery, too.

Yet another way to measure sound is LKFS, or Loudness, K-weighted, relative to Full Scale. Like SPL, it measures loudness, but as an average of volume over a period of time rather than at a single moment. Used heavily in broadcast to provide consistent volume from program to program and into commercials, this method is becoming the standard for mixing for all platforms. Broadcasters’ specifications vary, but a target of –24 LKFS and –2 dBTP is pretty normal. We’ll discuss this more in Chapter 7 on audio post-production.
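
If you want to check an integrated loudness figure yourself, the open-source Python libraries soundfile and pyloudnorm are one convenient option (an assumption on my part, not a tool this book requires; your DAW’s loudness meter does the same job). They implement the ITU-R BS.1770 K-weighted measurement that LKFS/LUFS targets are based on. The file name below is hypothetical.

```python
# Sketch of measuring integrated loudness, assuming `pip install soundfile pyloudnorm`
# and a finished mix file called "mix.wav" on disk (hypothetical path).
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("mix.wav")           # samples as floats, plus the sample rate
meter = pyln.Meter(rate)                  # BS.1770 K-weighted meter
loudness = meter.integrated_loudness(data)
print(f"Integrated loudness: {loudness:.1f} LUFS")  # compare against, e.g., a -24 LKFS broadcast target
```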

Sample Rate and Bit Depth

During a recent house- and office-moving project, I found some old cassette tapes in a box from my father. My teenage daughter had no idea what they were, or even that sound was recorded on analog tape before the advent of digital audio; to her, vinyl was just something cool DJs used. She has only seen me record and play back to and from a computer. This process actually converts the analog sound wave into a digital signal. Converting analog audio into digital is a process in which the ADC (analog-to-digital converter) gathers information by sampling, or taking snapshots of, the sound wave in equally spaced slices. This creates points carrying frequency and amplitude information, and the information between the points is drawn in to finish the wave. The higher the sample rate, the more data points there are to connect the sound wave, and the better the sound. Anytime sound is recorded digitally, there is an ADC at work. Converters can be standalone units or built into recording interfaces, mixers, and even microphones.
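
Here is a minimal sketch of the “snapshot” idea: sampling a 1 kHz sine wave at equally spaced points in time, the way an ADC does. It uses only the Python standard library and no audio hardware; it is an illustration of the math, not a real converter.

```python
import math

SAMPLE_RATE = 48000   # snapshots per second
FREQUENCY = 1000      # a 1 kHz test tone
AMPLITUDE = 0.5       # half of full scale

# Take one millisecond of equally spaced "snapshots" of the analog waveform.
num_samples = SAMPLE_RATE // 1000
samples = [
    AMPLITUDE * math.sin(2 * math.pi * FREQUENCY * (n / SAMPLE_RATE))
    for n in range(num_samples)
]

# Each value is one point carrying amplitude information at a known instant;
# on playback, the DAC reconstructs the smooth wave between these points.
print(len(samples), "samples span 1 ms:", [round(s, 2) for s in samples[:6]])
```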

Keep in mind that the sample rate and bit depth of digital audio have everything to do with sound waves, their frequencies, and their amplitudes. The number of snapshots taken each second when sampling is called the “sample rate,” measured in samples per second. To get an accurate sample of a sound, the sample rate needs to be at least twice the highest frequency you are trying to capture. The standard sample rate for CDs is 44.1K, and digital audio without video is based on this. The original developers of digital audio took the range of human hearing, 20 Hz to 20 kHz, into account when choosing a sample rate.

To capture 20 kHz, a minimum of 40 thousand (40K) samples per second is required. But we actually don’t use 40K. Instead, we use 44.1K as the standard. If you want a little back story on that, here it is. The content for the first CDs was actually delivered to the mastering and duplicating plant via Sony U-matic ¾-in tapes. Yes, tapes. First, 44.1K allowed digital audio to be delivered on tape machines in both PAL and NTSC formats. Second, and more importantly, the 44.1K standard allowed for over-sampling to buffer against digital aliasing—the degradation that can occur when frequencies above the audible 20K limit are recorded without enough margin. A little digital “headroom” is therefore provided for sampling. 44.1K still stands today as the standard for CD recordings, often called Red Book Audio, though it is never delivered via tape anymore. Meanwhile, 48K has become the standard for digital video, both because of how digital audio is stored on digital video tapes and because it gives even more sampling headroom to prevent the ugly sound of digital aliasing.

Another consideration is bit depth, which is sometimes referred to as word length. Bit depth tells us how many bits of data there are per sample. Digital bits are zeros and ones. Bit depth determines the dynamic range of digital audio, the range from soft to loud. 8-bit audio can produce 49.93 dB of dynamic range, 16-bit can produce 98.09 dB, and 24-bit can produce 122.17 dB. Remember that the human ear can hear roughly 120 dB of dynamic range. And here’s where you as a storyteller are affected by bit depth, whether you realize it or not. If you are recording something quiet with 8-bit recording, there’s only roughly 50 dB of dynamic range. This puts more of your information closer to, or in, what we consider to be the “noise floor.” As you might guess, being in the noise floor isn’t great for sound. 8-bit recordings tend to be “crunchy” because there is so little dynamic range to the sounds. The sound waves’ energy is literally getting stuffed into too small a container, with no room at the top or bottom. Even though it creates larger files than a lower bit depth would, when recording or creating audio, choose 24-bit if you can.
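
The dynamic-range figures above come from the fact that each added bit roughly doubles the number of amplitude steps, adding about 6 dB of range. Here is a hedged sketch of the standard rule of thumb (about 6.02 dB per bit plus roughly 1.76 dB for a full-scale sine wave). Note that at 24 bits the theoretical number exceeds what any real converter can deliver; the 122 dB figure quoted above is consistent with the practical limits of real hardware, not the theoretical maximum.

```python
def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Rule-of-thumb quantization dynamic range for linear PCM: ~6.02 dB per bit + 1.76 dB."""
    return 6.02 * bit_depth + 1.76

for bits in (8, 16, 24):
    print(f"{bits}-bit: ~{theoretical_dynamic_range_db(bits):.2f} dB")
# 8-bit:  ~49.92 dB
# 16-bit: ~98.08 dB
# 24-bit: ~146.24 dB in theory; real converters deliver closer to 120-125 dB
```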

Audio File Formats

There are several different types of digital file formats that capture audio. The ones we use most often in broadcast-quality production are WAV (Waveform Audio File Format) and AIFF (Audio Interchange File Format). There is a trend toward using mp3 files (an MPEG compression format), which are fine for voice auditions and transcriptions but should not be used in your final mix. The frequencies in the human voice are especially degraded by this file format. The reason is that mp3s are small, so they carry less frequency information. The algorithms used to create them keep the frequencies that seem most important and disregard the rest. This really destroys the sound quality of the voice. It affects music less, but generally we like to avoid mp3s and go with WAV or AIFF files, which carry all the audio information needed to make a good mix. Of course, providing a high-resolution mp3 at 320 kbps for radio is still better than sending a WAV file only to discover that the radio station has compressed it into a low-resolution mp3 for air. That tinny, thin sound you often hear on the radio is either a Skype call being recorded as part of an interview show or a low-resolution mp3 being played back. One of the keys to a successful audio workflow is knowing what will happen to your files once you deliver them.

Recording the Human Voice

Now that you understand something about sound waves, their amplitude and frequencies, and how we sample those elements to make a recording, let’s turn to one of the most important audio components of most nonfiction filmmaking: the human voice. The voice is made up of an extremely wide range of frequencies. You can hear vocal sounds as low as 50 Hz for very deep voices and as high as 10K for high ones. The frequencies taper off drastically above 8K but are still present and add what we often call “sparkle” to the voice.

As a filmmaker, there are many stages along the way from production through post in which your audio, especially recordings of the human voice, may be degraded. The trick is to anticipate common issues and know how to listen for such problems as distortion and over-modulated and over-compressed audio. It’s important to remember that as many tools as we have in the recording and mixing of sound, the best tools to catch ugly audio are our own ears.

One of the people you may rely on for recording the human voice is your voiceover talent. Most narrators now have their own recording equipment and home studios. This is fantastic for producers on tight deadlines or those who want to work with talent remotely. But it can be challenging if you don’t know the quality of the equipment being used, whether the sound booth is truly isolated from other sound frequencies indoors and out, and whether or not any compression is being added during the output process. Request that narrators record their audio files as 44.1K 16-bit mono WAV or AIFF, or 48K 24-bit mono WAV or AIFF, never mp3s. The word “record” is key here, because if a talent records at a lower sample rate, or as an mp3, and then exports to the specification of 44.1K 16-bit WAV, you are only getting the quality of the mp3, just with a different label. The recording sample rate is what matters most (Figure 1.4). Music and voice are often recorded at 96K or 192K to capture more frequencies in the sampling process, but don’t get caught up in that argument. What matters to you as a story creator is the quality of the ADCs, the microphones used, and the recording techniques.

FIGURE 1.4   Sound being sampled.
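
One practical way to verify what a narrator actually delivered, rather than what the file extension claims, is to inspect the file’s properties. Here is a sketch using the soundfile Python library (an assumption on my part; any audio tool that reports sample rate and bit depth will do), with a hypothetical file name.

```python
# Assumes `pip install soundfile` and a delivered file named "narration.wav" (hypothetical).
import soundfile as sf

info = sf.info("narration.wav")
print("Sample rate:", info.samplerate)   # expect 44100 or 48000
print("Channels:", info.channels)        # expect 1 (mono) for voiceover
print("Subtype:", info.subtype)          # expect PCM_16 or PCM_24, not a compressed format

# Caveat: this only reports the container's properties. A file upsampled from an mp3
# will still read as a 48 kHz WAV; the frequencies lost to compression don't come back.
```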

Understanding Sound Mix and Design Process

The mix process can hold a lot of mystery for people. A video editor once told me, “I don’t know what you do, but I send the show out to mix and it’s magic. It comes back a different show, all fixed and sounding great.” It’s gratifying to hear this kind of response to what we sound designers and mixers do every day. But it’s really not magic at all. Audio post-production is a step-by-step process in which we augment your project and open up its storytelling possibilities.

Sound finishing can be broken down into the following steps:

     Organizing tracks

     Watching the show

     Taking notes

     Listening to tracks individually

     Sound editing and noise reduction

     Sound design, including adding effects, augmenting, or adding tracks

     Watching the show with draft mix

     Taking notes

     Final mix

     Approval

     Adjusted alternate mixes for additional platforms

     Outputs to final delivery specs

Organization is the first step of the audio post process. Organizing by track and clip type is necessary for most standard deliverables and ease of global application of effects and sweetening. Once organization of files is complete, sound editing can begin. The first task is making the current edits in the timeline support the story, not distract from it. Often during the video-editing process, these edits are just roughed in, so that the focus can naturally be on the story arc and the characters. In the sound-editing process, we refine and smooth out these edits, often within fractions of a frame. This process can include taking out breaths or lip-smacking so that the listener can focus on what is being said, rather than how the speaker talks. We also spend considerable time adding in room tone so when voices are edited you don’t hear the background noise going on and off as those edits occur. We also “massage” soundbites so they are more fluid and more easily understood by the viewer/listener. That means we might add an “and” where two phrases are edited together or remove a series of “ums” so that it doesn’t feel like the speaker was stuttering. By the way, we sometimes jokingly call this process of editing interviews “Franken-biting.” Some documentary producers feel that this editing violates their sense of ethics, and there are many conversations at industry conferences and in academia about what constitutes unethical sound editing. But in a world where content is getting shorter and shorter, we have no choice but to make edits to accommodate delivery formats while doing our best to preserve the speaker’s style and purpose. We’ll touch more on our approach to this challenge in the chapters on preparing for mix and sound mixing (Chapters 6 and 7).

When we no longer hear the various edits we’ve placed into interviews or music, we start the sweetening process of noise reduction and equalization. Equalization is the manipulation of a sound’s frequencies to achieve a “sweeter” sound (Figure 1.5). Most often after sweetening, I’ll do another version of the mix with the music and voices in order to get everything sounding good before adding in the b-roll sync sound and any additional effects or layers of produced sounds, called Foley sound. Then, it’s on to the final mixing process, where there is often tweaking of all the previous steps as we mold the sound elements of the story.

FIGURE 1.5   EQ Roll-Off at 80 Hz, circle showing no frequencies in that area of the spectrum.
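
As one concrete example of the kind of equalization shown in Figure 1.5, here is a hedged sketch of an 80 Hz low-frequency roll-off (a high-pass filter) using NumPy and SciPy. This is an assumption for illustration only; in practice this move is made by ear with an EQ plug-in in a DAW, but the underlying operation looks roughly like this.

```python
# Assumes `pip install numpy scipy`. The test signal below is synthetic.
import numpy as np
from scipy.signal import butter, sosfilt

SAMPLE_RATE = 48000

# Example signal: 2 seconds of 60 Hz rumble mixed with a 1 kHz tone.
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
audio = 0.3 * np.sin(2 * np.pi * 60 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

# 4th-order Butterworth high-pass at 80 Hz: rolls off rumble and hum below the voice range.
sos = butter(4, 80, btype="highpass", fs=SAMPLE_RATE, output="sos")
cleaned = sosfilt(sos, audio)
```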

Thinking Ahead for Multi-Platform Delivery

A key element to today’s successful storytelling is understanding all those possible platforms for delivery and their respective limitations and opportunities when it comes to audio playback. There are so many playback environments that it makes audio-mixing decisions very challenging. Our lives are saturated with content, from smartphones to traditional TVs, and museums to live events. Heck, there’s even media content at my local gas pump. The fact is, as a storyteller you never really can control where your story will end up once you release it to the world. But you can strategize and plan for the best results over multiple platforms.

I’ll give you a relevant example from the visual side of filmmaking. One colorist I work with, Robbie Carman, calls this the “Grandma’s Pink TV Effect.” He came up with this name after receiving a frantic call from a client saying the color on her newly delivered show looked pink. It turns out the client was watching the program on an older television at her grandmother’s house, and the color menu had not been set up properly. Similarly, your audio could sound “pink” when coming out of poor speakers, or compressed at the wrong rate, or affected by any number of factors out of your control. The best you can do is deliver excellent audio with the playback medium in mind.

Generally, the best way to accommodate different delivery platforms is to plan an audio mix that will work across multiple speaker types and variations in loudness. Broadcast and Over the Top (OTT) providers will give you very detailed information about how they want audio delivered. For most producers, budget and time constraints mean creating only one mix. Because of this, it is essential to know which playback environment is the priority. For example, if you are delivering for multiple platforms, such as broadcast and web, the broadcast platform is typically the priority. In other cases, the priority might be a live environment presentation. I have a client who hosts a large annual conference of about 20,000 attendees. Though the media will also be posted online after the event ends, the top priority is the live experience at the event itself. So, in this case we work to ensure that the mix sounds great on large, big-room speakers at a high volume. We also play back the mix on smaller speakers at a softer volume to make sure nothing is lost in translation when the show goes online.

There are certain instances where you should not just release one mix and expect it to be successful, such as when independent feature documentaries are bought for OTT or broadcast. The technical specifications for each network vary so much that a revised mix is usually needed. The dynamic range of a film or feature mix can be much greater than the very limited dynamic range of a broadcast mix. Often, another running time is needed as well, which means adjusting all of the audio edits in the original mix. It is a good practice to start your production with the assumption that you may eventually need a film/theatrical mix, an OTT mix, or a broadcast mix. This doesn’t mean you need to compromise on the quality of your original delivery. We will address some ways to make the transition more efficient when we discuss the mix in Chapter 7.

A long time ago, I had a client who insisted that her 90-minute film would only be played online and in stereo. Being a self-funded documentary maker, she couldn’t afford anything else. We proceeded with a mix, and everyone was happy with the results. About a year later, I received a call from that client with the wonderful news that her project had been bought by a distributor, asking if I could turn around a mix to their specifications in a single day. You see, the audio for her project had failed the network’s rigorous QC (quality control) requirements for broadcast. This didn’t shock me. I took a deep breath, looked at the technical specifications and delivery requirements, and got to work. The first obstacle I noted was that a 5.1 surround mix was required. Surround is a multi-channel format in which discrete channels feed speakers around you, not just in front of you, as with stereo. In addition, the levels the broadcaster required were, naturally, more consistent with broadcast specs (–24 LKFS, plus or minus zero) than the –6 dBFS stereo online media mix we had previously created. After bringing up the project and working for more than an hour to unsuccessfully “up-mix” the original stereo to surround sound, I realized that it would take multiple days to do the job right. I suggested we instead address the issue of the audio levels and other delivery requirements, and get permission to deliver in stereo rather than surround sound. The client received permission, and we made the deadline. All this in under eight hours. Since then, I have always built any film—short or long—as a surround project, even if it ultimately wasn’t finished in surround. The moral of the story is this: work to “future proof” your project as much as possible by thinking about the highest possible delivery quality for your mix.

Communicate, Collaborate, and Create

Collaboration and communication are key to any project’s success. Fortunately, there are several tools that can help in both the communication and management of the creative process, whether your entire team is local or you are working remotely. Asana, Basecamp, and Trello are standouts in the team management arena. These tools offer the ability to manage deadlines and share ideas and assets, such as scripts or transcripts. Trello’s white-board approach works very well for managing the mix itself. Cards can be moved from phase to phase, keeping everyone in the loop about where the process is, where the holdups are, and what needs to be done, all at a glance. The ability to sync with calendars, such as Google and Outlook, keeps the project tracked and present on a calendar as well. Frame.io, Vimeo, and Wipster deliver amazing options for sharing ideas and give you the ability to post unique, frame-specific review notes on a rough or fine cut. With Wipster, each client can post his or her own notes on a frame, but the number of people able to post changes at the fine-cut stage can be limited. Wipster also integrates with Slack and Vimeo, and includes project status pages that work well for teams managing multiple content delivery streams. Frame.io was originally launched with some great features and is now moving toward the high-security features required by larger television networks. Source Connect and Streambox work well for reviewing your show and your audio mix live with a client. These tools allow for a live, high-quality review with very little latency. The trick is to find the tools that work for you and your team. Ease of use and price are key—why waste money and time learning a complicated system?

You should now have an understanding of some of the basics of sound—how sound is created; how it moves; and how we hear, measure, and record it. We’ve also discussed the elements of a sound mix and how to be prepared for your audio workflow, including apps to help organize your team and help you manage the review and feedback process. In the next chapter, we will dive into the fun stuff: the opportunities for storytelling that abound with audio.

Tips on Sound Basics

     Sound is physical, coming at us in waves. As you develop your sound sense, remember that these waves come in varying frequencies and energy levels, and can travel different distances, depending on their length and any obstacles in their path.

     While sound is naturally analog, it is converted to digital during recording and then converted back to analog in playback.

     Always work to minimize any compression or distortion of the sound you are recording in the field or in a studio. Once those frequencies or portions of sound are lost, they can’t be retrieved.

     Record audio at 48 kHz for the best-quality sound. Keep levels around –10 to –12 dBFS to leave plenty of “headroom” and capture the full range of frequencies, including undertones and overtones, without clipping.

     Final delivery levels can vary from –10 to –6 dBFS and –2 dBTP, so be sure you understand all the possible delivery specs for your project, including broadcast, web, display, and projection. Each one may affect how you acquire, edit, and deliver your sound.
