CHAPTER 8
Using Audio Codecs

This chapter is about the common attributes of audio codecs and the decisions to be made when using them. It follows the structure of the previous chapter on video codecs.

While video may take 80–98 percent of the compression time and bandwidth, audio still makes up at least half the experience: audiences are more likely to stick with bad video with decent audio than decent video with bad audio. So getting the audio right is just as important as getting the video to look good. The good news is that it’s also quite a bit easier; if targeting modern formats, there’s really never a reason for audio to be distractingly overcompressed.

Choosing Audio Codecs

General-Purpose Codecs vs. Speech Codecs

Audio codecs fall into two main camps: general-purpose codecs and speech codecs.

General-purpose codecs are designed to do well with music, sound effects, speech, and everything else people listen to.

Speech codecs do speech well, but generally can’t reproduce other kinds of content well. Commonly used speech codecs include AMR, CELP, and WMA 9 Voice. However, speech codecs can go to lower data rates than can general-purpose codecs, and generally provide better quality with speech content below 32 Kbps. They’re also low complexity for easy playback on phones. Speech codecs generally only support monophonic, single-channel sound. And, of course, any nonspeech content in the audio will either be distorted or removed. Most speech codecs also have a pretty low maximum bitrate—32 Kbps on the outside.

The current generation of general-purpose audio codecs using frequency synthesis, notably HE AAC and WMA 10 Pro, can be competitive with speech codecs at much lower bitrates than older codecs could manage, so it isn’t always necessary to use a speech codec even for low-bitrate speech. Coupled with the general increase in bandwidth, speech-only codecs are increasingly relegated to mobile devices.

Sample Rate

As discussed in Chapter 3, the audio sample rate controls the range of frequencies that can be compressed. 44.1 KHz is “CD-quality” and anything much less starts demonstrating audible reductions in quality. 32 KHz is generally about as low as music can go while being at least FM quality, and 22.05 KHz is about as low as music can go and still make people want to dance. Speech can go much lower: 8 KHz is telephone quality. It’s a bit low for high-pitched voices, like small kids, but fine for most adults.

As long as you’ve got at least 48 Kbps, you can do a decent 44.1 KHz with modern codecs and most content. Going higher than 44.1 doesn’t really add any quality for the human listener, although dogs and bats would be appreciative. There’s no reason not to use 48 KHz if that’s your source and you’ve got a high enough bitrate. More than that for content delivery is just wasted bits.
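The link between sample rate and frequency range is easy to check with a little arithmetic. The sketch below assumes the standard rule of thumb that a sample rate can represent frequencies up to half its value (the Nyquist limit); the function name is only illustrative.

```python
def max_frequency_khz(sample_rate_khz: float) -> float:
    """Highest frequency (in KHz) a sample rate can represent: half the sample rate."""
    return sample_rate_khz / 2


# Sample rates mentioned in this section.
for rate in (48, 44.1, 32, 22.05, 8):
    print(f"{rate} KHz sampling -> frequencies up to {max_frequency_khz(rate)} KHz")
```

By this rule, 8 KHz sampling tops out at 4 KHz of audio, which is why it works for most adult speech and not much else.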

Bit Depth

Thank goodness: modern audio codecs are all at least 16-bit. We had 8-bit compression in the paleolithic age of computer multimedia, and it was a lot of work to make it merely awful instead of unbearable. Once the 16-bit, 4:1 compressed IMA codec became common in 1995, 8-bit died off quickly.

Similarly to 44.1 KHz, 16-bit is a “good enough” level when correctly mastered; many fewer bits and it can sound terrible, but more are unlikely to make a perceptible improvement. That said, some codecs do support native input of 24-bit audio. Since the codec is really storing frequency data, it doesn’t need a precise bit depth. If you’ve got greater-than-16-bit source and your codec supports that for input, go ahead and use it.

There is one 12-bit codec: DV’s four-channel mode. 12-bit simply doesn’t have enough bits for high-quality encoding, and shouldn’t be used for professional work.
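To put rough numbers on why 8-bit and 12-bit audio fall short, here is a small sketch. It assumes linear PCM and the usual rule of thumb that each bit adds about 6 dB of dynamic range; neither assumption comes from this chapter, and DV’s 12-bit mode is not necessarily linear, so treat its figure as a loose guide.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 12, 16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.0f} dB")
```

Under these assumptions, 16-bit lands around 96 dB and 8-bit around 48 dB, which fits how much worse 8-bit audio sounded.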

Channels

Most web video source uses mono or stereo audio, while home theater codecs like AC-3 and DTS support multichannel output from DVDs. For users that have multichannel audio out of their computers, we have good multichannel codecs in WMA Professional and AAC, both of which support up to 7.1 audio. While office workers certainly don’t have 5.1 systems, they’re becoming increasingly common for gamers and home theater setups. Content targeting those audiences can benefit from encoding beyond stereo. All media players will automatically “fold down” multichannel to stereo or even mono if that’s all that’s available for output.

Unfortunately, both Flash (as of 10) and Silverlight (as of 3) always fold down multichannel to stereo 44.1 for output. I hope we’ll see this change in the future.
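A fold-down is just a weighted mix of channels. The sketch below is one plausible way to do it, not what any particular player does: the gain of 1/√2 (about −3 dB) for center and surrounds is a commonly used convention that I’m assuming here, the LFE channel is simply dropped, and no clipping protection is applied.

```python
import math

# Assumed gains: about -3 dB for the center and surround channels.
CENTER_GAIN = 1 / math.sqrt(2)
SURROUND_GAIN = 1 / math.sqrt(2)


def fold_down_5_1(left, right, center, lfe, left_surround, right_surround):
    """Fold one 5.1 sample frame down to (left, right); LFE is ignored in this sketch."""
    out_left = left + CENTER_GAIN * center + SURROUND_GAIN * left_surround
    out_right = right + CENTER_GAIN * center + SURROUND_GAIN * right_surround
    return out_left, out_right


def fold_down_stereo_to_mono(left, right):
    """Average the two channels so the mono result stays in range."""
    return (left + right) / 2


stereo = fold_down_5_1(0.2, 0.1, 0.5, 0.3, 0.0, 0.0)
print(stereo, fold_down_stereo_to_mono(*stereo))
```

Note that dialogue in the center channel ends up equally in both stereo outputs, so it still appears to come from the middle.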

Older codecs allocated equal amounts of bandwidth per channel, which meant stereo required twice the bandwidth of mono. Fortunately, modern codecs use the redundancy between channels to reduce the additional bandwidth required (as described in Chapter 3). Prior to frequency synthesis, going to stereo from mono required around 20 to 50 percent more bits to achieve the same sound quality. The more stereo separation there was, the more bits were needed; identical mono audio in both channels shouldn’t require any increase in bitrate.

HE AAC v2 (but not v1) and WMA 10 Pro LBR apply frequency synthesis to stereo separation as well, making stereo coding very efficient. WMA 10 Pro doesn’t even offer mono for this reason.
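One way to see the redundancy between channels is a mid/side transform: store the average of left and right plus their difference, instead of the two channels directly. This is a simplified illustration of the general idea, not the exact method of any codec named here, and the sample values are made up.

```python
def to_mid_side(left, right):
    """Convert left/right sample lists to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def energy(samples):
    """Sum of squares: a crude stand-in for how much there is to encode."""
    return sum(s * s for s in samples)


# Identical mono audio in both channels: the side signal is pure silence.
mono = [0.5, -0.25, 0.75, 0.0]
_, side_mono = to_mid_side(mono, mono)
print("side energy, mono in both channels:", energy(side_mono))

# Widely separated stereo: the side signal carries real information.
wide_left = [0.5, -0.25, 0.75, 0.0]
wide_right = [-0.5, 0.5, 0.0, 0.25]
_, side_wide = to_mid_side(wide_left, wide_right)
print("side energy, wide stereo:", energy(side_wide))
```

When both channels match, there is nothing in the side signal to spend bits on; the more the channels differ, the more the side signal costs.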

Data Rate

Originally audio codecs didn’t specify a data rate. Instead, their size was determined by number of channels and the sample rate. The math is simple: sample rate in KHz × channels × bits per sample gives kilobits per second for uncompressed audio. Divide that by the compression ratio for compressed codecs. So IMA 44.1 KHz stereo is 44.1 × 2 × 16, divided by 4 for IMA’s 4:1 compression ratio, and thus 353 Kbps.
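The arithmetic above fits in a couple of functions. This is just the chapter’s formula written out; the function names are mine.

```python
def uncompressed_kbps(sample_rate_khz, channels, bits):
    """Uncompressed data rate: sample rate (KHz) x channels x bits per sample."""
    return sample_rate_khz * channels * bits


def fixed_ratio_kbps(sample_rate_khz, channels, bits, compression_ratio):
    """Data rate of a fixed-ratio codec: the uncompressed rate divided by the ratio."""
    return uncompressed_kbps(sample_rate_khz, channels, bits) / compression_ratio


print(uncompressed_kbps(44.1, 2, 16))    # CD-quality stereo, uncompressed
print(fixed_ratio_kbps(44.1, 2, 16, 4))  # the same audio through a 4:1 codec like IMA
```

That works out to 1,411.2 Kbps uncompressed and 352.8 Kbps at 4:1, which rounds to the 353 Kbps quoted above.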

Modern codecs offer a range of bitrates. Generally the available sample rate/channel combinations depend on the bitrate.

CBR and VBR

Traditionally, audio codecs were CBR, with each timeslice of audio (around 1/48th of a second, called a “frame” but not aligned with video frames) around the same size. But even CBR codecs have a little flexibility; MP3 and others use a “bit reservoir” where unused bandwidth from previous blocks can be used by later blocks that are more difficult to encode. Like video, audio decoders use a buffer, and hence have a VBV, although that’s not an encode-time configurable parameter in most audio encoders.
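The bit reservoir idea can be shown with a toy model. This is not how MP3 or any real encoder implements it; the per-frame “demand” numbers, the budget, and the reservoir cap are all hypothetical, and bits saved beyond the cap are simply lost.

```python
def encode_with_reservoir(frame_demands, frame_budget, reservoir_cap):
    """Return the bits granted to each frame, given a fixed budget plus a reservoir.

    frame_demands: bits each frame would ideally use (hypothetical values).
    frame_budget: bits the constant bitrate provides per frame.
    reservoir_cap: the most unused bits that can be carried forward.
    """
    reservoir = 0
    granted = []
    for demand in frame_demands:
        available = frame_budget + reservoir
        used = min(demand, available)
        granted.append(used)
        # Whatever was not used is saved for later frames, up to the cap.
        reservoir = min(available - used, reservoir_cap)
    return granted


demands = [600, 600, 1400, 1000, 600]
print(encode_with_reservoir(demands, frame_budget=1000, reservoir_cap=500))
print(encode_with_reservoir(demands, frame_budget=1000, reservoir_cap=0))
```

With the reservoir, the difficult third frame gets its full 1,400 bits by drawing on what the two easy frames left over; with no reservoir it is held to 1,000.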

There are many kinds of VBR in audio codecs. There can even be big differences in implementations targeting the same bitstream.

Fixed quality

Like a quality-constrained video codec, a fixed-quality audio codec provides consistent quality with highly variable data rates. This form of codec is mainly useful for archiving content or music-only playback. While the size of any one file may not be predictable, a given type of content will generally average out to a particular compression ratio over a large number of tracks.

“Average” bitrate (ABR)

An average bitrate encode raises and lowers the data rate with quality, with a target and range. This is analogous to 1-pass, data rate-limited video encoding but with much laxer constraints. Options for maximum and minimum bitrates might also be provided. In practice, this yields a result halfway between fixed quality and true bitrate control. It’s great for building out a music library where users care more about quality and total library size than bitrate control. But it’s not a good fit for streaming or soundtracks in video where there are more precise requirements.

Constant bitrate (CBR)

Audio codecs can do real CBR as well, required for streaming. It’s a fixed bitrate with a VBV, and prone to having challenging sections sound worse than easy sections.

Most implementations are 1-pass only. The WMA family of codecs support 2-pass CBR as well, but it’s extremely rare to have an audible improvement from the second pass.

Some CBR modes will allow easy sections of the audio (like silence) to use a lower bitrate; others will insert padding bits if needed.

Variable bitrate (VBR)

There’s a fair amount of confusion in terminology around VBR audio. Many tools call ABR or even fixed quality “VBR”—and they are variable. In the interest of consistency, I’m using the term in the same sense we did for video: a specified average bitrate, lower than the peak bitrate, and hence variability within the file.

This is a relatively rare mode in audio, with the WMA family the main place it has been used. But I’m a big fan of it for any time I’d be using VBR for video: bitrate is varied to keep quality more constant. Plenty of songs can encode fine at lower rates until high-frequency percussion like a hi-hat comes in, and along with it fairy bells and other artifacts. With a VBR codec, bits would be shifted to those sections for a big improvement, while still preserving a predictable file size. For highly variable content like film soundtracks, this can cut average bitrate requirements in half.

Additive VBR

An additive VBR, common to MP3, lets you set a target data rate, but it will raise the data rate above that point for difficult audio. Therefore you know the minimum data rate, but not the maximum. Still, for an output file at a given final bitrate, the additive VBR should sound better than a CBR encode of the same source at that bitrate.

Additive isn’t as good as an average bit rate VBR for most uses, since neither the final data rate nor the quality can be controlled.

Subtractive VBR

In a subtractive VBR, the audio data rate can go down for easy sections, but never goes up higher for the difficult sections. Subtractive VBR can be a good mode for encoding web content, since you can assure some minimum quality, but can save bits on easy-to-encode sections.
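The difference between CBR, additive VBR, and subtractive VBR comes down to which direction the data rate is allowed to move from the target. Here’s a toy model under loud assumptions: each section of audio has a made-up bitrate it “wants,” and the encoder just clamps it. Real encoders decide this through psychoacoustic analysis and don’t expose anything this simple.

```python
def cbr(demands, target):
    """Every section gets the target rate, whether it needs it or not."""
    return [target for _ in demands]


def additive_vbr(demands, target):
    """Rate can rise above the target for hard sections, but never drops below it."""
    return [max(demand, target) for demand in demands]


def subtractive_vbr(demands, target):
    """Rate can drop below the target for easy sections, but never rises above it."""
    return [min(demand, target) for demand in demands]


def average(rates):
    return sum(rates) / len(rates)


# Hypothetical Kbps each section would need to sound good.
demands = [40, 64, 96, 128, 48]
target = 64

for name, mode in (("CBR", cbr), ("additive", additive_vbr), ("subtractive", subtractive_vbr)):
    rates = mode(demands, target)
    print(f"{name}: {rates} average {average(rates)} Kbps")
```

In this example the additive encode averages 83.2 Kbps, well over the 64 Kbps target, while the subtractive encode averages 56 Kbps and never exceeds it.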

Encoding Speed

When encoding audio as a soundtrack to video, the time spent encoding the audio is vanishingly small compared to the video. Even complex multichannel audio encoding runs many times faster than realtime on a modern machine.

The only time people really care about audio encoding time is for big batches of audio-only content, like a music library transcode. Even CD ripping is limited by the speed of the drive, not the codec these days. There are MP3 codecs that have multiple speed/quality modes, but lots of implementations have just one speed. If you have a speed control, I recommend tuning it to a slower, higher-quality option unless you’re in a rush.

Tradeoffs

Generally speaking, the only tradeoff with audio is to find out how high a bitrate you need to sound awesome, and use that. Broadband and sideloaded content should always sound great; it’s only streaming at low rates, particularly to phones, where there could be audible quality compromises.

In general, I’d say more compressionists err on the side of too little audio bitrate versus too high. For a 500 Kbps stream, going from 32 to 64 Kbps for audio can result in a listening experience that goes from weak to stellar, and then reducing the video bitrate from 468 to 436 may not even be noticeable. So, raise your audio bitrate until a higher rate isn’t appreciably better, or until the resulting reduction in video bandwidth and quality hurts more than the increase in audio helps.
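The budget math in that example is worth spelling out, since it shows how lopsided the trade is. This sketch assumes the total is split only between audio and video, ignoring any container overhead.

```python
def video_kbps(total_kbps, audio_kbps):
    """Video gets whatever the audio doesn't use (container overhead ignored)."""
    return total_kbps - audio_kbps


total = 500
for audio in (32, 64):
    video = video_kbps(total, audio)
    print(f"audio {audio} Kbps -> video {video} Kbps "
          f"(audio is {audio / total:.1%} of the stream)")

# How much of its bitrate the video gives up when audio goes from 32 to 64 Kbps.
drop = 1 - video_kbps(total, 64) / video_kbps(total, 32)
print(f"video gives up {drop:.1%} of its bits")
```

Doubling the audio bitrate costs the video a bit under 7 percent of its bits.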

Sample Rate

For broadband and CD-ROM uses, set your sample rate to at least 32 KHz. But keep in mind that a higher sample rate means more high-frequency content to encode, which may cause artifacts if the bitrate is too low to cover it. Lower sample rates are also faster to encode and play back.

Bit Depth

This is simple. If using a 16-bit codec, great. If you’ve got greater-than-16-bit source and a codec that can use that, go for it, but don’t go to a lot of extra work to get there.

Channels

For most listeners, stereo is less important than having a high sample rate and few artifacts. If using an older codec at low bitrates, go to stereo only if you’ve got enough bandwidth for the stereo to be 44.1 KHz and sound better than a mono version.

Stereo Encoding Mode

Most codecs don’t offer a choice of stereo encoding modes. Of those that do, joint stereo is the best option for compression efficiency, but normal stereo is best if you need to keep accurate phase relationships for Dolby Surround–encoded content, or are doing a high bitrate archive.

Data Rate

Higher data rates make audio sound better, but every bit spent on audio is a bit not available for video. Typically, the share of a file’s data rate allocated to audio is much less than that allocated to video. Note that some sample rate, bit depth, and channel combinations may require particular bitrates.

CBR vs. VBR

If you’re encoding a soundtrack for a VBR video file, make the audio VBR as well. Note that some codecs only offer VBR at some bitrates. For example, WMA 9.2 and 10 Pro offer VBR for their middle bitrates, but are CBR only at the high and low end.
