CHAPTER 10
MP3

Although MP3 was originally designed for MPEG-1 (it’s short for MPEG-1, Layer III audio), it is almost never seen in MPEG-1 files. However, it’s massively popular as a standalone audio-only file format, and is also used as a codec in media files, particularly “DivX” AVI files with MPEG-4 part 2 video.

MP3 was largely created by Fraunhofer IIS. As part of an ISO standard, sample source code for MP3 encoding and decoding was released well before the licensing terms were announced. Many different software vendors created MP3 encoding and playback software while the technology’s legal status was murky, and later had to pay retroactive license fees. Although it wasn’t planned this way, it turned out to be an excellent way to launch the technology into ubiquity. However, this has left a certain legacy of confusion about when and where it is legal to use and distribute MP3 encoders and players, and whether or what fees need to be paid. Commercial “pay and download an MP3 file” sites definitely need to pay a fee for transactions. Commercial encoders and decoders need to do so as well.

MP3 was the first common, “good enough” format for encoding music—128 Kbps for 44.1 kHz stereo audio can be quite listenable for casual listeners, and even picky folks like me have been known to dance to a 160 Kbps MP3. Modern audio codecs like AAC and WMA Pro offer better compression efficiency, but the very broad ecosystem of MP3 players and the “good enough” compression ensure it a long life. In the end, audio bitrates are low enough that there just wasn’t enough value in “just as good, but smaller” versus ubiquitous playback for anything else to really displace it.

Like other MPEG standards, file format and decoder behavior are precisely specified, but implementation of encoding is up to the vendor. Thus, many vendors have created MP3 encoders. The two most important are Fraunhofer’s own encoder (often called FhG), and the open source LAME, both of which are used by many other applications.

MP3 Rate Control Modes

MP3 files are broken up into “frames,” each 26 ms long (about 38 a second). This frame rate isn’t related to that of any associated video. Each frame needs to be one of a limited number of data rates from 8 to 320 Kbps (the minimum and maximum for a whole file as well). If there just isn’t enough audio information in a given frame to use up its assigned number of bits, leftovers can go into a “bit reservoir,” for use by later frames that may need more than they’re assigned. There are a few different encoder modes commonly seen in MP3. All commonly used MP3 encoders are 1-pass only.
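To make the frame arithmetic concrete, here is a minimal sketch. It assumes the standard MPEG-1 Layer III figure of 1152 samples per frame, which is where the 26 ms comes from at 44.1 kHz; the function names are mine, chosen for illustration only.

```python
# Frame arithmetic for MPEG-1 Layer III (a sketch, not a parser).
# Assumption: 1152 samples per frame, the standard value for MPEG-1 Layer III.
SAMPLES_PER_FRAME = 1152


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Length of one frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frames_per_second(sample_rate_hz: int) -> float:
    """How many frames cover one second of audio."""
    return sample_rate_hz / SAMPLES_PER_FRAME


def frame_size_bytes(bitrate_kbps: int, sample_rate_hz: int, padding: bool = False) -> int:
    """Size of one frame in bytes at a given per-frame bitrate.

    1152 samples / 8 bits per byte = 144, hence the constant.
    A frame may carry one extra padding byte to keep the average rate exact.
    """
    return (144 * bitrate_kbps * 1000) // sample_rate_hz + (1 if padding else 0)


if __name__ == "__main__":
    print(round(frame_duration_ms(44100), 1), "ms per frame")
    print(round(frames_per_second(44100), 1), "frames per second")
    print(frame_size_bytes(128, 44100), "bytes per 128 Kbps frame")
```

At 44.1 kHz this works out to roughly 26.1 ms per frame and a little over 38 frames a second, matching the figures above.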

CBR

Originally, all MP3 files were CBR. The nice thing about CBR files is that file size is predictable based upon its duration, as is where you need to jump into the file to start playing it at a certain point. CBR is also universally compatible on playback.
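The predictability is simple arithmetic. The sketch below is an approximation under stated assumptions: it ignores any tags or headers unless you pass their size in, and a real player would still scan forward from the computed offset to the next frame boundary. The function names and the `header_bytes` parameter are hypothetical.

```python
def cbr_file_size_bytes(bitrate_kbps: int, duration_s: float) -> int:
    """Approximate size of the audio data in a CBR file."""
    return int(bitrate_kbps * 1000 * duration_s / 8)


def cbr_seek_offset_bytes(bitrate_kbps: int, seek_s: float, header_bytes: int = 0) -> int:
    """Approximate byte offset at which playback time `seek_s` begins.

    `header_bytes` is a hypothetical allowance for any tag data that
    precedes the first audio frame.
    """
    return header_bytes + int(bitrate_kbps * 1000 * seek_s / 8)


if __name__ == "__main__":
    # A four-minute song at 128 Kbps, and where the one-minute mark sits.
    print(cbr_file_size_bytes(128, 240))
    print(cbr_seek_offset_bytes(128, 60))
```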

CBR MP3 today is most commonly used for audio broadcasting, à la Shoutcast, in order to give consistent bitrate.

VBR

Each frame’s header specifies that frame’s size. The early Xing encoder exploited this by varying data rate frame by frame in order to deliver more consistent quality at lower file sizes. These files are called VBR, and what we’d call “Quality VBR” in the video world. The file size is entirely dependent on the complexity of the audio; all you know is that it won’t be more than 320 Kbps.

VBR files sound a lot better bit for bit at lower data rates. Some very old MP3 decoders didn’t correctly support VBR files, but I doubt any of those are still in use. For example, QuickTime 4.0 would crash playing back VBR MP3, but that was fixed in 4.1 back in 1999.

ABR

Average Bit Rate is a flavor of VBR that offers more predictable bitrates, but still varies the bitrate within and between files to ensure minimum quality. It’s the primary mode used with LAME for Audio CD ripping or other file-based playback.

Some encoders like Apple’s iTunes only use VBR to raise the size of a given frame, not reduce it, so long silent sections of a file can use up a lot of bandwidth anyway.

MP3 Modes

An MP3 file can be mono, joint stereo, or normal stereo.

Mono

Mono is, of course, a single channel of audio. As always, it’s generally better to maintain sample rate than number of channels, so you’re generally better off encoding any MP3 less than 64 Kbps as mono.

Mid/Side Encoding

Like other Mid/Sideband solutions, instead of recording L and R as two different channels, L + R and L – R are encoded. Because most audio has a lot of information in common between the channels, L – R can be compressed a lot more aggressively, reducing data rate. Most encoders can switch between normal stereo and M/S throughout the stream, using M/S when the two channels are similar, and normal stereo when not. However, changing high-frequency spatial information messes up phase relationships, and thus Dolby Pro Logic. While this isn’t usually a problem with music source, it can be with movie and TV soundtracks.

Mid/Side shouldn’t be chosen explicitly; Joint Stereo will figure out when it’s useful and when it’s not.
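The sum-and-difference idea can be sketched in a few lines. This is only an illustration of the transform, not how any particular encoder implements it; in particular, the divide-by-two scaling is my assumption to keep values in range, and real codecs may scale differently.

```python
def to_mid_side(left, right):
    """Convert L/R sample lists to mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover L/R from mid and side: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [0.5, 0.25, -0.75]
    right = [0.5, 0.125, -0.5]
    mid, side = to_mid_side(left, right)
    print(side)  # small values wherever the channels are similar
    print(from_mid_side(mid, side) == (left, right))
```

When the two channels are nearly identical, the side signal sits close to zero, which is why it can be squeezed so much harder than the mid signal.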

Joint Stereo

In Joint Stereo mode, the encoder can adaptively switch between Mid/Side and Normal Stereo for each frame, based on what’s most efficient. Normally, this helps compression efficiency and hence quality, and should be used for 128 Kbps and lower bitrates. However, it can cause audible degradation in audio with flanging/phasing effects. The most famous example is the guitar from “Mrs. Robinson.”

Normal Stereo

In Normal Stereo, the two audio channels are encoded as separate tracks, with no information shared between them. However, the encoder is able to use different frame sizes for each, if one is a lot more complex than the other. This is the optimum mode for high-bitrate archiving, or when you need to preserve Dolby Pro Logic information.
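The rules of thumb from this section can be collected into a small helper. Treat it as a sketch of the guidance above rather than a hard rule: the function name and return labels are hypothetical, and the choice of Normal Stereo for everything above 128 Kbps is my assumption, since the text only says it suits high-bitrate archiving.

```python
def suggest_channel_mode(bitrate_kbps: int, preserve_pro_logic: bool = False) -> str:
    """Pick a channel mode from the rules of thumb in this section.

    Returns one of "mono", "joint", or "stereo" (labels are illustrative).
    """
    if bitrate_kbps < 64:
        # Better to keep sample rate than a second channel at low bitrates.
        return "mono"
    if preserve_pro_logic:
        # Mid/Side switching can disturb the phase relationships
        # that Dolby Pro Logic depends on.
        return "stereo"
    if bitrate_kbps <= 128:
        return "joint"
    # Assumption: anything above 128 Kbps counts as "high bitrate".
    return "stereo"


if __name__ == "__main__":
    for rate in (48, 96, 128, 192):
        print(rate, suggest_channel_mode(rate))
```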

FhG

FhG is Fraunhofer’s licensable, professional MP3 encoder. For a long time it was by far the best MP3 encoder, although LAME’s continual improvements have brought it into the ballpark with FhG. Most FhG-based encoders offer very few controls, providing choices of bitrate, sample rate, channel mode, and maybe a speed-versus-quality option. That is enough to suit the needs of most projects.

LAME

LAME, originally the recursive acronym for “LAME Ain’t an MP3 Encoder,” started as an open source patch to the available reference encoder source code. This reference encoder worked, but wasn’t optimized for speed or quality, and so started out enormously worse than the FhG encoder. However, lots of hackers cared about MP3, and so have progressively been enhancing it for years, focusing on features most useful for CD archiving.

LAME, which is freely distributed, doesn’t license the MP3 patent. Commercial products that want to include LAME need to pay the MP3 license fee.

The official LAME is distributed as source code, which you can compile yourself for a command-line version, or a third party can incorporate it into their own tools. Whichever way it’s used, the standard parameters of LAME are available in most encoders.

While I’m generally a fan of knob-tweaking with encoders, the LAME presets are quite good and have seen a lot of tuning. If there’s one that matches your scenario, it’s unlikely you’d be able to come up with something dramatically better. A few of note:

--abr (Average Bit Rate)

This is a 1-pass VBR, where the encoder shoots for an average data rate while letting the size of each frame go up or down as much as needed. Actual final file size can vary a fair amount (perhaps +/− 5 percent), but not as much as with the VBR mode.

--cbr (Constant Bit Rate)

CBR is good old CBR. It’s appropriate for fixed-buffer streaming or when you need to encode to an exact file size, but otherwise ABR is higher quality.

-v (Variable Bit Rate)

This is the fixed-quality VBR. It’s useful for archiving and personal music libraries, but very unpredictable file size makes it suboptimal for distribution.

-q (Quality)

The quality mode is a speed-versus-quality control. It defaults to -q 4, but I normally use -q 2, which is a little higher quality (-h is the same as -q 2). And while it’s slower, it’ll still be many times faster than real time on modern hardware. The -q 0 and -q 1 settings are much slower yet and rarely produce a perceptible improvement.

Using -q with -v or --preset results in a smaller file, as it makes encoding more efficient: perhaps 10 percent smaller going from -q 9 to -q 0, with a 3–4x increase in encoding time.

--preset

LAME also includes a number of high-quality presets for particular tasks. These include tuned psychoacoustics models for the particular tasks, like telephony, “CD quality,” and voice. The voice preset is especially useful for encoding intelligible low-bitrate speech. Preset is mainly used for quality-limited encoding. In order of quality and file size, they are:

•  --preset medium

•  --preset standard

•  --preset extreme

•  --preset insane

Even medium sounds pretty good for soundtracks, and most people find standard high enough quality for headphone listening. I’ve never been disappointed by extreme. Insane seems aptly named—it’s really a 320 Kbps CBR.
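If you drive the command-line LAME from a script, the modes above map onto a handful of arguments. The helper below is a hypothetical sketch that only builds the argument list; it assumes a `lame` binary that accepts `-q`, `--preset`, `--abr`, `--cbr` with `-b`, and `-V`, so check `lame --help` on your build before relying on it.

```python
def lame_args(src, dst, *, preset=None, abr_kbps=None, cbr_kbps=None,
              vbr_quality=None, quality=2):
    """Build an argument list for one LAME encode (hypothetical helper).

    Exactly one of preset, abr_kbps, cbr_kbps, or vbr_quality must be given.
    `quality` is the speed-versus-quality setting passed as -q.
    """
    chosen = [x for x in (preset, abr_kbps, cbr_kbps, vbr_quality) if x is not None]
    if len(chosen) != 1:
        raise ValueError("pick exactly one rate control mode")

    args = ["lame", "-q", str(quality)]
    if preset is not None:
        args += ["--preset", preset]
    elif abr_kbps is not None:
        args += ["--abr", str(abr_kbps)]
    elif cbr_kbps is not None:
        args += ["--cbr", "-b", str(cbr_kbps)]
    else:
        args += ["-V", str(vbr_quality)]
    return args + [src, dst]


if __name__ == "__main__":
    # Print the commands rather than running them.
    print(" ".join(lame_args("in.wav", "out_standard.mp3", preset="standard")))
    print(" ".join(lame_args("in.wav", "out_abr.mp3", abr_kbps=128)))
```

The resulting list can be handed to `subprocess.run` once you have confirmed the flags against your installed version.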

MP3 Encoding Examples

When you encode with LAME, you get a histogram like the following one, indicating the relative size of the different frames. Note the data rates of frames used run from 32 Kbps to 320 Kbps.

Here’s an example of a vocal rock song encoded with LAME, using the “standard” preset meant to offer transparent compression to most listeners. The “-h” flag tells it to use the high-quality mode; I always use it.

    LAME>lame ForgiveMe.wav ForgiveMe_standard.mp3 --preset standard -h
    Using polyphase lowpass filter, transition band: 18671 Hz-19205 Hz
    Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=2) qval=2
    Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
    9718/9718 (100%)| 0:13/ 0:13| 0:13/ 0:13| 18.458x| 0:00
     32 [  89] %*
     40 [   1] *
     48 [   5] %
     56 [  15] *
     64 [  16] *
     80 [   4] *
     96 [  16] *
    112 [  23] %
    128 [  47] %
    160 [1148] %%%%**************
    192 [4368] %%%%%%%%%%%%%%%%%%%%%%%%%%*******************
    224 [3025] %%%%%%%%%%%%%%%%%%%%****************
    256 [ 880] %%%%*********
    320 [  81] %*
    ---------------------------------------------------------------
    kbps LR MS % long switch short %
    202.4 40.3 59.7 97.1 1.8 1.1

Even though our average here wound up at a potentially reasonable 202 Kbps, we got a lot of frames at higher bitrates, some up to the max of 320. Conversely, quite a few frames didn’t need the full average. So our audio quality is a lot better than it would have been otherwise. But we didn’t have any rate control. If we wanted to be smaller and still very good, --preset medium is a good compromise (it gave me 160 Kbps with the same song).

But if we needed to be around an Internet-friendly 128 Kbps but still get the quality benefits of VBR, we could have used this:

    LAME>lame ForgiveMe.wav ForgiveMe_abr-128.mp3 --abr 128 -h
    Using polyphase lowpass filter, transition band: 16538 Hz-17071 Hz
    Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (11x) average 128 kbps qval=2
    Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
    9718/9718 (100%)| 0:14/ 0:14| 0:14/ 0:14| 17.805x| 0:00
     32 [  80] %
     40 [   0]
     48 [   0]
     56 [   0]
     64 [   0]
     80 [  12] *
     96 [ 405] %****
    112 [5541] %%%%%%%%%%%**************************
    128 [3220] %%%%%%%%%%%%**************
    160 [ 378] %%***
    192 [  66] %
    224 [  12] %
    256 [   4] %
    320 [   0]
    --------------------------------------------------------
    kbps LR MS % long switch short %
    118.5 20.1 79.9 97.5 1.5 0.9

We get a pretty similar distribution to “standard,” but shifted to lower bitrates as we’d expect. Still, quality will be quite a bit better than with a true CBR encode, particularly wherever there are transients like percussion. And we actually came in almost 10 Kbps below our target.
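The average bitrate LAME reports is just the frame-weighted mean of the histogram, which is easy to check. The sketch below copies the frame counts from the two encodes above into dictionaries (the variable and function names are mine) and recomputes the averages.

```python
# Frame counts per bitrate (Kbps), copied from the two LAME histograms above.
STANDARD_HISTOGRAM = {
    32: 89, 40: 1, 48: 5, 56: 15, 64: 16, 80: 4, 96: 16,
    112: 23, 128: 47, 160: 1148, 192: 4368, 224: 3025, 256: 880, 320: 81,
}
ABR_128_HISTOGRAM = {
    32: 80, 40: 0, 48: 0, 56: 0, 64: 0, 80: 12, 96: 405,
    112: 5541, 128: 3220, 160: 378, 192: 66, 224: 12, 256: 4, 320: 0,
}


def average_kbps(histogram):
    """Frame-weighted mean bitrate; every frame has the same duration."""
    total_frames = sum(histogram.values())
    weighted = sum(kbps * frames for kbps, frames in histogram.items())
    return weighted / total_frames


if __name__ == "__main__":
    for name, hist in (("standard", STANDARD_HISTOGRAM), ("abr 128", ABR_128_HISTOGRAM)):
        print(name, sum(hist.values()), "frames,", round(average_kbps(hist), 1), "Kbps")
```

Both histograms add up to the 9718 frames shown in the progress line, and the computed means agree with the 202.4 and 118.5 Kbps figures LAME printed.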

mp3PRO

The mp3PRO codec was an enhanced version of MP3 with Spectral Band Replication (SBR). Like WMA 10 Pro and HE AAC, it combined a baseband (MP3, in mp3PRO’s case) at a lowered sample rate with frequency synthesis hints to recreate the higher frequencies. For example, a 64 Kbps stereo MP3 would normally encode at 22.05 kHz. When encoded with mp3PRO, perhaps 60 Kbps would be spent on conventional MP3 data and the remaining 4 Kbps on the SBR data. An mp3PRO decoder can use this extra information to synthesize the missing frequencies all the way up to 44.1 kHz, while a conventional MP3 decoder would ignore the SBR and play back at 22.05 kHz.

Since it offered poor quality on existing MP3 players, and compatibility with those players is the main reason to use MP3, users just stuck with classic MP3 or used the more advanced SBR codecs, and so mp3PRO decoders and encoders never became common.
