CHAPTER 13
Advanced Audio Coding (AAC) and M4A

M4A File Format

The M4A file format is simply an MPEG-4 file with a single track of AAC audio. It uses “.m4a” rather than “.mp4” simply to make it easy to tell that it’s an audio-only file. M4A was made famous by Apple with iTunes, and has become broadly supported in media players and consumer electronics devices.

The main advantage of M4A over MP3 is better compression efficiency. In its typical AAC-LC profile, it provides about the same quality at a third less bitrate.

An iTunes .m4p file is a FairPlay DRM-encrypted M4A and as such isn’t interoperable outside of the Apple ecosystem. It’s something only Apple can make, and only Apple products can play it back.

AAC Profiles

AAC Low Complexity

AAC Low Complexity (AAC LC) was originally the low-complexity implementation of AAC, a simplified version of the forgotten AAC Main. In real-world listening tests with optimized encoders, however, AAC LC quality at 128 Kbps was essentially as good as AAC Main, with lower encode and decode complexity. Thus it has become the default version of AAC, with enhancements building off LC instead of Main.

AAC-LC is the only version supported in QuickTime as of 7.6 and Silverlight as of 3, although the High Efficiency variants provide lower-fidelity backward compatibility to LC.

AAC-LC is a very flexible codec, going from 8 KHz mono to 7.1 96 KHz in common implementations. While many encoders are CBR-only, various VBR modes also exist.

Beyond its use for audio in .mp4 files, AAC-LC has also been used in billions of .m4a CD rips in iTunes, where it has long been the default codec.

AAC-LC was a good efficiency improvement over the previous music codec standard of MP3, offering around a 50 percent efficiency improvement (CBR AAC-LC at 128 Kbps sounds about as good as CBR MP3 at 192 Kbps). However, the broader ecosystem of MP3 players has kept MP3 in broad use; AAC delivers the same quality at a smaller size, but rarely better quality.
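
The “third less bitrate” and “50 percent efficiency improvement” figures describe the same pair of numbers from two directions. A quick check, using only the 128 and 192 Kbps figures quoted above (the function names are just labels for this sketch):

```python
# Bitrates quoted in the text for roughly equal quality (Kbps).
AAC_LC_KBPS = 128
MP3_KBPS = 192

def bitrate_saving(new_kbps, old_kbps):
    """Fraction of bitrate saved by moving from old_kbps to new_kbps."""
    return (old_kbps - new_kbps) / old_kbps

def efficiency_gain(new_kbps, old_kbps):
    """Extra bitrate the old codec needs, relative to the new one."""
    return (old_kbps - new_kbps) / new_kbps

saving = bitrate_saving(AAC_LC_KBPS, MP3_KBPS)   # 64 / 192, one third
gain = efficiency_gain(AAC_LC_KBPS, MP3_KBPS)    # 64 / 128, one half
print(f"AAC-LC needs {saving:.0%} less bitrate; MP3 needs {gain:.0%} more.")
```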

But if you’re in an environment where you can count on AAC-LC being there, use it.

High efficiency AAC v1

High Efficiency AAC (HE AAC) uses Spectral Band Replication (SBR) for improved efficiency. The key insight is that while high frequencies are necessary for audio to sound rich, most of what matters in the high frequencies we hear is an overtone of a lower frequency. Since an overtone is a multiple of a lower frequency, HE AAC reduces the frequency range that gets a full DCT encode (typically by half, so the core is coded at 22.05 KHz out of a 44.1 KHz sample rate), and hence the bitrate of the encoding. It then includes some “hint” data that shows how to synthesize the missing overtones based on the retained lower-frequency data. The net effect is that bandwidth requirements get cut nearly in half without a big loss in perceptual quality.
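
To make the overtone idea concrete, the toy sketch below lists the harmonics of a single note and splits them into those the half-rate core can carry and those that would be left for SBR to synthesize. This is only an illustration of the reasoning, not the SBR algorithm itself. The cutoff is an assumption derived from the “half” figure above: a core running at half the output sample rate can hold frequencies up to a quarter of that rate.

```python
def harmonics(fundamental_hz, limit_hz):
    """Return the overtone series (integer multiples) up to limit_hz."""
    result = []
    n = 1
    while n * fundamental_hz <= limit_hz:
        result.append(n * fundamental_hz)
        n += 1
    return result

def split_by_core_band(fundamental_hz, output_rate_hz=44100):
    """Split a note's harmonics into core-coded and SBR-synthesized sets.

    Assumes the core runs at half the output sample rate, so it can carry
    frequencies up to a quarter of the output rate (its Nyquist limit).
    """
    full_limit = output_rate_hz / 2   # highest frequency in the output
    core_limit = output_rate_hz / 4   # highest frequency the core keeps
    series = harmonics(fundamental_hz, full_limit)
    core = [f for f in series if f <= core_limit]
    sbr = [f for f in series if f > core_limit]
    return core, sbr

core, sbr = split_by_core_band(2000)
print("coded in the core:", core)
print("left to SBR hints:", sbr)
```

Every frequency in the second list is an integer multiple of one in the first, which is why a small amount of hint data can stand in for a full encode of the upper band.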

The base band in HE AAC is AAC-LC, which remains backward compatible. So while software that only decodes AAC-LC will have lower fidelity, listeners will still hear something, just at half the sample rate. This can sound pretty weak for music, but fine for intelligible speech.

HE AAC is sometimes called “AAC+” after the trademark Coding Technologies used for their implementation. Coding Technologies has since been bought by Dolby, which now calls its implementation “Dolby Pulse.” Note that this is an entirely different codec than “Dolby Plus,” which is the evolution of Dolby Digital (AC-3). Yes, this can get very confusing on conference calls!

While HE AAC supports multichannel, many decoders that can do HE AAC in stereo don’t handle HE AAC 5.1.

HE AAC v2

HE AAC v2 extends the SBR concept to stereo channels. Instead of encoding separate channels using L + R/L – R, it encodes all the frequencies together, with more “hint” data explaining which channel to steer each frequency to. That makes stereo encoding nearly free compared to mono.

Again, the core is AAC-LC, so AAC-LC decoders can still play back HE AAC v2, albeit in half sample rate mono.
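
The fallback behavior described for both HE AAC versions can be summarized in a few lines. This is a minimal sketch of the rules as stated in the text; the profile labels and the Playback type are invented for the example and do not come from any decoder API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Playback:
    sample_rate_hz: int
    channels: int

def lc_only_playback(profile, sample_rate_hz, channels):
    """What an AAC-LC-only decoder would output for a given stream.

    profile is one of "LC", "HEv1", "HEv2" (labels chosen for this sketch).
    """
    if profile == "LC":
        return Playback(sample_rate_hz, channels)
    if profile == "HEv1":
        # The SBR data is ignored, so only the half-rate core is heard.
        return Playback(sample_rate_hz // 2, channels)
    if profile == "HEv2":
        # Both the SBR data and the stereo steering data are ignored.
        return Playback(sample_rate_hz // 2, 1)
    raise ValueError(f"unknown profile: {profile}")

print(lc_only_playback("HEv1", 44100, 2))
print(lc_only_playback("HEv2", 44100, 2))
```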

While extending SBR to more than stereo makes sense, it’s not something enabled in encoders and decoders yet.

HE AAC v1 is supported by many mobile devices, and HE AAC v1 and v2 are supported in Flash since 9 Update 3. QuickTime X on Mac OS X 10.6 also can decode v1 and v2, but that hasn’t been enabled in QuickTime for Windows or older Mac OS versions.

AAC Encoders

Apple (QuickTime and iTunes)

QuickTime was the first major architecture to support AAC-LC (introduced with QuickTime 6.0 in 2002). Since it’s a free API that includes output, the QuickTime implementation has been broadly used in other tools. And of course, Apple’s AAC-LC encoder is famously used in iTunes, where it has ripped billions of .m4a files. Even though QuickTime X introduced HE AAC decode, it doesn’t include an HE AAC encoder.

MPEG-4 Export vs. QuickTime Export

One highly annoying aspect of QuickTime’s AAC-LC is that most of the cool advanced options aren’t available if you’re exporting to a .mp4/.m4a file (see Figure 13.1). You need to export as a QuickTime Movie for the full dialog to show up (Figure 13.2). It’s not a compatibility thing; it’s trivial to reopen the .mov, and then export to MPEG-4 with audio set to “Pass through” (Figure 13.3). Hopefully, this will change with QuickTime X.

Figure 13.1 QuickTime’s AAC dialog when exporting to MPEG-4.


Figure 13.2 QuickTime’s AAC dialog when exporting to a QuickTime file.


Figure 13.3 But you can load your advanced setting encoded .mov file and pass it back through to MPEG-4 by using “Pass through.”


The Mac does include a command-line encoder that will do this in a single step. You can find it by typing the following in Terminal: /usr/bin/afconvert

It has a variety of promising-sounding parameters that appear to be entirely undocumented, none of which are exposed in any GUI compression tools. There are also cases where afconvert produces seemingly valid output that isn’t QuickTime-compatible.
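
As a starting point for experimenting, here is a hedged sketch that builds an afconvert command line from Python. The flags shown (-f, -d, -b, -s) and the strategy numbers are assumptions based on the tool’s own terse help output, and may differ between Mac OS versions; check `afconvert -h` on your system before relying on them. The file names and the helper functions are placeholders for this example.

```python
import shutil
import subprocess

AFCONVERT = "/usr/bin/afconvert"

def afconvert_command(src, dst, bitrate_bps=128000, strategy=1):
    """Build an afconvert command line for an AAC .m4a encode.

    The flags are assumptions taken from `afconvert -h`; verify locally.
    strategy: 0 = CBR, 1 = ABR, 2 = constrained VBR, 3 = VBR.
    """
    return [
        AFCONVERT,
        "-f", "m4af",             # M4A container
        "-d", "aac",              # AAC audio data
        "-b", str(bitrate_bps),   # target bitrate in bits per second
        "-s", str(strategy),      # bitrate allocation strategy
        src,
        dst,
    ]

def encode(src, dst, **options):
    """Run the encode if afconvert is present (it only ships on the Mac)."""
    if shutil.which(AFCONVERT) is None:
        raise RuntimeError("afconvert not found; this requires Mac OS X")
    subprocess.run(afconvert_command(src, dst, **options), check=True)

cmd = afconvert_command("input.wav", "output.m4a")
print(" ".join(cmd))
```

Given the compatibility caveat above, it is worth confirming that the output opens in QuickTime before encoding a whole batch this way.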

Quality

The classic Quality versus Speed control, going Faster-Fast-Normal-Better-Best. This is more than just the codec; it also impacts sample rate and bitrate conversion; Best is definitely recommended if your source is 24-bit so that you get high-quality rounding/dithering to 16-bit. If you’re encoding audio-for-video, just stick it at Best and forget about it – the increase in total encoding time would be a rounding error.

Encoding strategies

Constant Bit Rate

The default Constant Bit Rate mode is a classic CBR encoder. It’s the least efficient overall, of course, and is mainly useful when doing real-time encoding, as it has the lowest and most predictable broadcast delay.

Average Bit Rate

Average Bit Rate is an ABR-style encode that sticks closely to the target bitrate; it hits the average over several seconds, so total file size will be within a percent or two for content of at least a minute. This should be your go-to mode for on-demand encoding. It’s fine for streaming as well, given the relatively small variability.

Variable Bit Rate

Variable Bit Rate asks for a target bit rate, but it’s otherwise pretty similar to other quality-based VBR audio codecs. The output size can easily be 20 percent higher or lower than the target. VBR mode is good for music collections where the actual bitrate will average out. Its unpredictability can be a problem for web distribution.

VBR has a smaller range of available bitrates than ABR/CBR, but as it can undershoot/overshoot its target, the practical range is essentially the same.

Variable Bit Rate constrained

VBR Constrained is a hybrid of ABR and VBR, keeping the final bitrate within a range and thus offering a lot more adaptability. It’s a good compromise for progressive download, as it offers quite a bit better efficiency than ABR.
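
To see what these tolerances mean for file size, the sketch below estimates the audio payload range for a given mode, target, and duration. The percentages are the ones quoted above (a percent or two for ABR, 20 percent for VBR); CBR is treated as exact, and constrained VBR is left out because the text gives no figure for it. It assumes 1 Kbps is 1000 bits per second and ignores container overhead.

```python
# Size tolerances taken from the text. CBR is treated as exact here.
TOLERANCE = {"CBR": 0.0, "ABR": 0.02, "VBR": 0.20}

def size_range_bytes(mode, target_kbps, duration_s):
    """Return (low, high) estimated audio payload size in bytes."""
    nominal = target_kbps * 1000 / 8 * duration_s
    tol = TOLERANCE[mode]
    return round(nominal * (1 - tol)), round(nominal * (1 + tol))

# A four-minute song at a 128 Kbps target in each mode.
for mode in ("CBR", "ABR", "VBR"):
    low, high = size_range_bytes(mode, 128, 240)
    print(f"{mode}: {low:,} to {high:,} bytes")
```

The spread in the VBR case is what makes it awkward for web distribution, where a bandwidth budget is usually fixed in advance.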

Table 13.1 shows the allowed data rates for the different modes in QuickTime. I’m keeping to reasonable sample rate/channel combos; even though it allows 8 KHz 7.1, you wouldn’t ever want to do that.

Table 13.1 Allowable Data Rates for Different QuickTime Modes.

Channels   Sample rate (KHz)   CBR/ABR bitrate range (Kbps)   VBR range 44.1/48 (Kbps)
Mono       8                   8–24                           8–24
Mono       11.025              8–32                           9–36
Mono       12                  12–32                          9–36
Mono       16                  12–48                          12–48
Mono       22.05/24            16–64                          18–72
Mono       32                  24–96                          24–96
Mono       44.1/48             32–256                         24–96
Stereo     22.05/24            40–128                         36–144
Stereo     32                  48–192                         48–192
Stereo     44.1/48             64–320                         48–192
4.0        44.1/48             128–640                        96–384
5.0/5.1    44.1/48             160–768                        120–480
6.0        44.1/48             192–960                        144–576
7.0/7.1    44.1/48             224–960                        168–672
8.0        44.1/48             256–1280                       192–768
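
If you are scripting encodes, a table like this is easy to turn into a pre-flight check. The sketch below copies a few rows of Table 13.1 into a dictionary and tests whether a requested bitrate falls inside the allowed range. The function name and key layout are choices made for this example, not part of any QuickTime API, and only four rows are included to keep it short.

```python
# A few rows of Table 13.1: (channels, sample rate in KHz) -> Kbps ranges.
RANGES = {
    ("Mono", "44.1/48"):    {"CBR/ABR": (32, 256), "VBR": (24, 96)},
    ("Stereo", "32"):       {"CBR/ABR": (48, 192), "VBR": (48, 192)},
    ("Stereo", "44.1/48"):  {"CBR/ABR": (64, 320), "VBR": (48, 192)},
    ("5.0/5.1", "44.1/48"): {"CBR/ABR": (160, 768), "VBR": (120, 480)},
}

def is_allowed(channels, sample_rate, mode, kbps):
    """Return True if kbps is inside the table's range for this setup.

    mode is "CBR", "ABR", or "VBR"; CBR and ABR share one column.
    Combinations missing from RANGES are reported as not allowed.
    """
    row = RANGES.get((channels, sample_rate))
    if row is None:
        return False
    column = "VBR" if mode == "VBR" else "CBR/ABR"
    low, high = row[column]
    return low <= kbps <= high

print(is_allowed("Stereo", "44.1/48", "ABR", 256))   # inside 64 to 320
print(is_allowed("Stereo", "44.1/48", "VBR", 256))   # above the VBR range
```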

Coding Technologies (Dolby)

Coding Technologies were the creators of HE AAC (called AAC+ in their implementation). They were then purchased by Dolby, with the implementation renamed Dolby Pulse. But it’s still the same thing, and lots of applications, such as Squeeze and Episode, are still using the old name (see Figures 13.4 and 13.5).

Figure 13.4 Squeeze calls out HE AAC by the Coding Technologies brand. It doesn’t offer a way to turn off v2; if you want v1, you have to use mono.


Figure 13.5 Episode gives explicit options for High Efficiency (v1) and Parametric stereo (v2). It’s one of the few tools that can encode multichannel in HE v1. However, it doesn’t constrain your choices: you can ask for 96 KHz 7.1 HE AAC v2 at 8 Kbps, and you won’t find out that it doesn’t work until the encode errors out.


These implementations offer good quality, and are generally CBR only.

Microsoft

The Microsoft AAC-LC implementation in Expression Encoder 3 and Windows 7 doesn’t have very complex options, but provides good quality within those (Figure 13.6). It is mono and stereo only with 96, 128, 160, and 192 Kbps CBR options. It does support both 16-bit and 24-bit audio input.
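
With so few options, it can be handy to snap a requested setting to the nearest one the encoder offers. The sketch below encodes the limits listed above as plain constants; the names are hypothetical and are not part of the Expression Encoder or Windows APIs.

```python
# Limits as described in the text for the Microsoft AAC-LC encoder.
MS_AAC_BITRATES_KBPS = (96, 128, 160, 192)
MS_AAC_CHANNELS = (1, 2)
MS_AAC_BIT_DEPTHS = (16, 24)

def nearest_supported_bitrate(requested_kbps):
    """Pick the closest available CBR bitrate (ties go to the lower one)."""
    return min(MS_AAC_BITRATES_KBPS, key=lambda b: abs(b - requested_kbps))

def source_is_supported(channels, bit_depth):
    """Check channel count and input bit depth against the listed limits."""
    return channels in MS_AAC_CHANNELS and bit_depth in MS_AAC_BIT_DEPTHS

print(nearest_supported_bitrate(140))
print(source_is_supported(6, 24))
```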

Figure 13.6 The Microsoft AAC-LC encoder from Expression Encoder 3. It’s not very flexible, but offers good quality in the available bitrates.

