CHAPTER 12
MPEG-4 part 2 Video Codec

This chapter is about the original MPEG-4 part 2 video codec (also known as just “MPEG-4,” “DivX,” “Xvid,” or “ASP”). The MPEG-4 part 10 codec (better known as H.264) is covered in Chapter 14. As influential as MPEG-4 has been as a format, we have to admit that its first attempt at a video codec was something of a failure, something of a speed bump between the towering successes of MPEG-2 and H.264. So, what happened?

In the end, I’d suggest it was a mix of technical limitations and timing. On the technical side, part 2 simply wasn’t enough of an improvement over MPEG-2 to justify the switching costs. While software players on a PC are easy to update, replacing hundreds of millions of cable boxes and other media players with new models containing updated hardware decoders is an enormously expensive, challenging, and long process. And MPEG-4 part 2 wound up not being as efficient as hoped, while MPEG-2 wound up getting better than its creators imagined. Part of that was due to the huge market for MPEG-2 that made even small improvements in efficiency valuable. Part 2 never developed that kind of market. But there were also real limitations to part 2 that capped how good it could really get. In the end, I don’t think there ever was a moment where part 2 was more than 30 percent better than MPEG-2 for important scenarios, which kept it from being worth the trouble of switching for the big guys.

From a timing perspective, part 2 straddled three eras:

•  It launched as MPEG-2 was hitting its stride with DVD and digital broadcasting.

•  Part 2 implementations hit the market during the early web video days, but licensing ambiguities kept big content companies wary, while the proprietary formats and codecs offered simple licensing and then better real-world compression efficiency.

•  By the time the licensing issues were all worked out and the Streaming Wars were ending, H.264 was already on the horizon and clearly poised to offer big efficiency advantages over both MPEG-2 and part 2.

In the end, piracy and phones were the primary markets for part 2.

The DivX/Xvid Saga

The piracy tale is an interesting part of the bigger story. Microsoft had been an early participant in the MPEG-4 process, and had created an early reference encoder and decoder. And Windows Media actually shipped several generations of MPEG-4 codecs—first Microsoft MPEG-4 v1, v2, and v3 based on standards drafts, and then an ISO MPEG-4 codec based on the final standard. These were decent codecs of their era, but locked to the ASF file format. An enterprising hacker took the MS MPEG-4 v3 .dll and changed some bits in order to let it work as an AVI codec, which was a lot easier to work with in video authoring tools of the day. And then the hacker broadly distributed the hacked version, under the name “DivX ;-)”—annoying emoticon and all (and a reference to the failed DivX competitor to DVD). There are people at Microsoft still irritated by this. But it became a pretty popular way to create and distribute higher-quality video files via download and peer-to-peer than was possible with real-time streaming of the era.

Somehow this hacked codec led to financing, and DivX Networks was born to commercialize the technology. However, said technology consisted of the MPEG-4 part 2 video codec, the AVI file format, and, typically, MP3 audio. The company dabbled in open source for a while with OpenDivX, then reverted to a closed source model. Participants in OpenDivX forked the code from the last open version and created their own version, the palindromically named “Xvid.”

At that point DivX Networks tried to move the ecosystem to their new DivX Media Format (DMF), which was still based on AVI for backward compatibility. But most content targeting “DivX” was still just vanilla AVI, and increasingly encoded using Xvid.

DivX Inc’s (as it is now named) biggest moneymaker appears to have been certifying DVD players as also supporting DivX, so that users could burn their DivX files to disc and play them back easily. The market for that has been declining, however, as optical media is increasingly uncompetitive with hard drive and flash-based storage. And since there wasn’t any actual DivX Networks technology required to play the files back, many devices simply shipped compatibility for the files without paying for the logo.

Today, many media players can handle the AVI + MPEG-4 part 2 + MP3 combination, including VLC, Windows 7’s WMP, Xbox 360, and PlayStation 3.

DivX Networks itself is now focused on DivX 7, which uses H.264 and the open source Matroska container format (MKV).

Encoding for DivX-compatible devices is covered in Chapter 24.

Why MPEG-4 Part 2?

Honestly, there are fewer and fewer reasons to use part 2 these days; the main one remaining is compatibility with existing players. Every class of players that supported part 2 is rapidly adopting H.264. There are a few places where legacy part 2 only players might matter, but they’re shrinking.

Consumer Electronics

There was a burst of “DivX-certified” DVD players a few years ago, all capable of playing the DivX-style part 2 + MP3 in an AVI file. While specific specs varied, lots of them are capable of doing at least 720p. Xbox 360 and PS3 are also able to play back these files.

Mobile

Various flavors of part 2, particularly Short Header and Simple Profile, were used in phones before H.264 was common. I haven’t seen any recent breakdowns of which phones are capable of decoding what, but given the rapid rate of phone replacement, H.264 Baseline will soon be a safe choice if it isn’t already.

Low Power PC playback

Part 2 is a lot simpler than H.264 to decode. So older computers can play back higher resolutions using part 2.

The challenge there is that only Macs have out-of-the-box part 2 decode, and it’s Simple Profile .mp4 only. So this only works if you’re able to install a decoder along with your media, or give up B-frames and hence a bunch of compression efficiency.

Why Not Part 2?

There are some pretty big reasons not to use part 2 by default anymore.

H.264 or VC-1 Is Already There

Most of the phones, PCs, and consoles that do part 2 can also do H.264, which is a much more efficient and broadly supported codec. And Windows PCs already have WMV, which is more efficient and offers better performance on playback on most machines today.

So lack of good out-of-the-box support is a challenge. Even older Macs support at least H.264 Main Profile. Windows 7 is the first OS to ship with full ASP support out of the box, and it includes H.264 as well.

Lower Efficiency

Both VC-1 and H.264 offer better compression efficiency than part 2, making them better choices for anywhere that bandwidth is at a premium.

The efficiency gets even worse if QuickTime is being targeted for playback, due to the Simple Profile limitation.

What’s Unique About MPEG-4 Part 2

So what’s part 2 like as a codec? First off, it was based on the ITU H.263 videoconferencing codec, with all profiles in significant use being supersets of that. Unfortunately, it was based on the original H.263, not including many of the useful features of H.263+ (also released in 1998). Building on H.263 was not uncommon; it is more advanced than MPEG-1 but doesn’t have patent licensing fees associated with it.

MPEG-4 part 2 was perhaps the last major codec built around the 8 × 8 floating-point DCT. Very broadly, VC-1 can be thought of as the culmination of that course, keeping the core structure while addressing its weaknesses, while H.264 represents a decisive break from that past and adds fundamental changes.

Custom Quantization Tables

Part 2 supports custom quantization tables that can be tuned to content and bitrate (like MPEG-2). There are quite a few available for download with different tools, but it can require a lot of trial and error to figure out which ones are well-suited to different kinds of content. In general, the H.263 matrix is better for lower bitrates (errs on the side of softness) while the MPEG-4 matrix is better for higher bitrates with film/video source (retains more detail). Most part 2 encoders default to the H.263 matrix.

B-Frames

Part 2 has B-frames, but they can’t contain intra-coded blocks, and so BI-frames like VC-1 uses for flash frames aren’t possible. Still, B-frames are highly useful in compression, and are by far the most valuable tool in Advanced Simple Profile not in Simple Profile.

Quarter-Pixel Motion Compensation

Part 2 goes beyond MPEG-1’s full-pixel and MPEG-2’s half-pixel precision by supporting quarter-pixel precision. However, the implementation wasn’t as efficient as in later codecs, providing a slight improvement in efficiency at best, and is often not used by encoders. DivX’s own recent profiles don’t include it.

Global Motion Compensation

Global motion compensation (GMC) sounded like a great idea—do a global motion search and provide an overall description of the motion in the frame. And not just pans, but zooming! But it’s very expensive to calculate, provides a very slight improvement at best, and thus is very rarely used.

Interlaced Support

Part 2 has good MPEG-2 style interlaced support, but it is rarely used in practice; 720p is generally the highest resolution decoded.

Last Floating-Point DCT

MPEG-4 was the last significant interframe codec to use an MPEG-1/2 style floating-point DCT without an ironclad definition of the right answer. So different decoders can potentially drift over time from what the encoder assumed due to a succession of slightly different rounding errors, although in practice this problem is less severe than in MPEG-2.
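To make the rounding issue a little more concrete, here is a toy sketch in Python. It is not the MPEG-4 transform itself: it uses a one-dimensional 8-point DCT, a made-up quantizer step of 16, and a hypothetical second decoder whose cosine table is rounded to four decimal places, standing in for any implementation that computes the inverse transform slightly differently. All function and variable names are my own.

```python
import math
import random

N = 8  # samples per block (one row of an 8 x 8 block)

def basis(digits=None):
    """Orthonormal 8-point DCT-II basis. If digits is given, round each
    entry to mimic a decoder with a less precise cosine table."""
    table = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        row = [scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
               for n in range(N)]
        if digits is not None:
            row = [round(v, digits) for v in row]
        table.append(row)
    return table

EXACT = basis()            # double-precision table
COARSE = basis(digits=4)   # hypothetical lower-precision decoder

def forward(samples):
    """Forward DCT, as the encoder would compute it."""
    return [sum(EXACT[k][n] * samples[n] for n in range(N)) for k in range(N)]

def inverse(coeffs, table):
    """Inverse DCT with the given table, rounded to whole pixel values."""
    return [math.floor(sum(table[k][n] * coeffs[k] for k in range(N)) + 0.5)
            for n in range(N)]

def count_mismatches(blocks=1000, step=16, seed=1):
    """Decode the same quantized data with both tables and count the
    samples where the two decoders disagree."""
    rng = random.Random(seed)
    mismatches = 0
    worst = 0
    for _ in range(blocks):
        samples = [rng.randrange(256) for _ in range(N)]
        # Quantize and dequantize, so reconstructed values are no longer integers.
        coeffs = [step * round(c / step) for c in forward(samples)]
        for x, y in zip(inverse(coeffs, EXACT), inverse(coeffs, COARSE)):
            if x != y:
                mismatches += 1
                worst = max(worst, abs(x - y))
    return mismatches, worst

mismatches, worst = count_mismatches()
print("samples that differ:", mismatches, "largest difference:", worst)
```

The two decoders never disagree by more than one level here, and only on a small fraction of samples. In an interframe codec, though, those decoded pixels become the reference for the next frame, so small disagreements can pile up until the next keyframe resets everything.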

No In-Loop Deblocking Filter

There’s no in-loop deblocking in part 2; if you get a reference frame with high quantization, that’ll get propagated forward to the next frame.

MPEG-4 Part 2 Profiles

Short Header

MPEG-4 short header is simply canonical, original flavor H.263. It was commonly used for phone playback due to its low decoder requirements. However, its compression efficiency is similarly low. Modern phones with better ASICs use more efficient codecs to preserve bandwidth.

Simple Profile

Simple Profile was the most commonly used for web video, being the best mode QuickTime could export and play. The main difference from Short Header is that Simple Profile adds optional error resilience to reduce the impact of corrupt/missing data.

Advanced Simple Profile

Advanced Simple Profile (ASP) is the one everyone expected to dominate. It added four main features to Simple Profile:

•  B-frames

•  Global motion compensation

•  Quarter-pixel motion estimation

•  MPEG-4 and custom quant matrices (Simple just used H.263)

B-frames turned out to be by far the most useful feature of the bunch, and many ASP encoders default to doing SP + B-frames. And while there isn’t a formal profile for that, there are plenty of devices, including the Zune HD and many DivX certified hardware players, that support SP + B-frames as a de facto profile, while lacking support for the other ASP features.

Probably the biggest barrier to ASP’s broad adoption was that QuickTime has never supported it. Due to an architectural limitation prior to QuickTime 7, QuickTime codecs couldn’t natively support B-frames. The QuickTime 6 betas could play ASP content by dropping B-frames, but that was removed for the released version. Even though QT7 fixed the general B-frame issue in 2005, the part 2 decoder has never been updated to play ASP.

In general, all the other software players, such as VLC and ffmpeg, support all of ASP. But as QuickTime has long been the architecture most likely to try to play a double-clicked .mp4 file, ASP files will fail to play on many out-of-the-box systems.

Windows 7 includes full ASP support in Windows Media Player 12.

Studio Profile

The MPEG-4 part 2 Studio Profile was intended for high quality content acquisition and archiving. It goes far beyond 8-bit 4:2:0. Its only significant use so far is in Sony’s excellent HDCAM-SR tape format, which does both 10-bit 4:2:2 and the glorious 12-bit 4:4:4.

MPEG-4 Part 2 Levels

MPEG defines resolution with such terms as QCIF, CIF, 2CIF, and 4CIF, derived from the MPEG “Common Intermediate (meaning between PAL and NTSC) Format.” Q = Quarter, 2 = double height, and 4 = double width and double height. Each has a canonical resolution (Table 12.1). However, the actual limitation in the spec is the maximum number of 16 × 16 macroblocks allowed, so in theory you can redistribute the blocks into any shape you like. For QCIF, instead of 176 × 144, you could deliver 256 × 96 (for streaming Ben-Hur, perhaps). The one exception to this redistribution rule is Simple Level 0, which is limited to 176 × 144 actual pixels. In practice, device decoders may also have a fixed maximum height and width.

Table 12.1 CIF Family of Frame Sizes.

Name    Canonical resolution    16 × 16 blocks
QCIF    176 × 144               99
CIF     352 × 288               396
2CIF    352 × 576               792
4CIF    704 × 576               1620
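The macroblock arithmetic is easy to check for yourself. The sketch below is in Python; the function names are my own, and it assumes partial blocks at the right and bottom edges round up to whole macroblocks. The budgets are copied straight from Table 12.1.

```python
import math

# Macroblock budgets from Table 12.1 (16 x 16 blocks per frame).
CIF_FAMILY_BUDGET = {"QCIF": 99, "CIF": 396, "2CIF": 792, "4CIF": 1620}

def macroblocks(width, height, mb_size=16):
    """Number of macroblocks needed to cover a frame.
    Assumes partial blocks at the edges round up to whole ones."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)

def fits(width, height, size_name):
    """True if the frame needs no more macroblocks than the named budget."""
    return macroblocks(width, height) <= CIF_FAMILY_BUDGET[size_name]

print(macroblocks(176, 144))    # canonical QCIF: 11 * 9 blocks
print(macroblocks(256, 96))     # the wide "Ben-Hur" shape: 16 * 6 blocks
print(fits(256, 96, "QCIF"))    # does the wide shape stay inside the QCIF budget?
print(fits(352, 288, "QCIF"))   # CIF clearly should not
```

So 256 × 96 comes in at 96 blocks, just under the 99 that QCIF allows. By the same arithmetic, 704 × 576 needs 1584 blocks, while the 1620 listed in the table works out to 45 × 36 blocks, or 720 × 576, so the 4CIF budget appears to leave room for a full 720-wide PAL frame.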

Simple

The Simple Profile is mainly seen in QuickTime and phones. It may also be used for recording video with low-power devices. Level 0 also has a limit of 15 fps (Table 12.2).

Table 12.2 Simple Profile Levels.

Level   Max size                     Max data rate (Kbps)
0       QCIF (fixed to 176 × 144)    64
1       Equivalent to QCIF           64
2       Equivalent to CIF            128
3       Equivalent to CIF            384

Advanced Simple

Advanced Simple (Table 12.3) is a superset of Simple, so it can play all Simple content. This profile adds a number of enhancements to support better visual quality. Interlaced content is supported in levels 4–5. Advanced Simple Level 3b is used in ISMA Profile 1, and support for higher levels is planned for later ISMA Profiles.

Table 12.3 Advanced Simple Profile Levels.

Level   Max size   Max data rate (Kbps)   Interlace
0       QCIF       128                    No
1       QCIF       128                    No
2       CIF        384                    No
3       CIF        768                    No
3b      2CIF       3000                   No
4       2CIF       3000                   Yes
5       4CIF       8000                   Yes
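If you want to see which level a given encode would need, the tables above can be turned into a quick lookup. This is a rough Python sketch with a function name of my own invention. It checks only the three things the tables list (frame size as a macroblock count taken from Table 12.1, peak data rate, and interlace), so treat it as a first-pass sanity check and not a conformance test; real decoders and the spec itself may impose limits that aren't shown here.

```python
import math

# (level, max macroblocks per frame, max data rate in Kbps, interlace allowed)
# Sizes come from Table 12.3, converted to block counts using Table 12.1.
ASP_LEVELS = [
    ("0", 99, 128, False),
    ("1", 99, 128, False),
    ("2", 396, 384, False),
    ("3", 396, 768, False),
    ("3b", 792, 3000, False),
    ("4", 792, 3000, True),
    ("5", 1620, 8000, True),
]

def lowest_asp_level(width, height, kbps, interlaced=False):
    """Return the first Advanced Simple level whose listed limits cover
    the encode, or None if the encode exceeds every level in the table."""
    mbs = math.ceil(width / 16) * math.ceil(height / 16)
    for name, max_mbs, max_kbps, allows_interlace in ASP_LEVELS:
        if interlaced and not allows_interlace:
            continue
        if mbs <= max_mbs and kbps <= max_kbps:
            return name
    return None

print(lowest_asp_level(352, 288, 500))                    # CIF, too fast for level 2
print(lowest_asp_level(352, 288, 500, interlaced=True))   # interlace needs level 4 or 5
print(lowest_asp_level(640, 480, 1500))                   # 1200 blocks per frame
print(lowest_asp_level(1280, 720, 4000))                  # 3600 blocks per frame
```

Note that 1280 × 720 needs 3600 macroblocks, more than any level in Table 12.3 allows, so the last call returns None. HD part 2 playback in practice leans on vendor-defined targets such as the DivX Hi-Def profile used in the tutorial later in this chapter.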

MPEG-4 Part 2 Implementations

DivX

DivX, Inc., was really the popularizer of MPEG-4 part 2. The codec is often called “DivX” even by those who don’t use the company’s tools.

While DivX dabbled in commercial encoding tools for several years, their Dr. DivX product (Figure 12.1) is now free and open source. They also have a commercial DivX Pro implementation (although that’s much more focused on H.264 these days).

Figure 12.1 The Dr. DivX OSS interface. Basic and simple, aimed at consumers.


Xvid

Xvid (Figure 12.2) was created as a fork from the OpenDivX project. It has had years of sustained development from dedicated codec hackers who have pushed part 2 about as far as it’s likely to go.

Figure 12.2 The Xvid advanced configuration dialog in MeGUI. It’s rather imposing; fortunately, the presets are quite good for most content.


As an open source technology, Xvid is widely used in many free encoders, including ffmpeg, Handbrake, and MeGUI.

Sorenson Media

Sorenson Media is best known today for their Squeeze product and online services, but their heritage was in codecs, notably the H.263-derived Sorenson Video and the H.263 Spark implementation for FLV (Figure 12.3).

Figure 12.3 Squeeze’s “Sorenson MPEG Pro” dialog will be very familiar to Spark users. It supports .mp4 files, but not .avi or other containers.


Telestream

Telestream’s purchase of PopWire and its Compression Master product (now Episode, Figure 12.4) gained them encoding technology very focused on mobile encoding, which includes excellent part 2 support.

Figure 12.4 Episode’s part 2 dialog exposes advanced features aimed at the mobile market.


QuickTime

QuickTime had one of the most widely used part 2 implementations, which was a shame, as it was always one of the weakest, and hasn’t been updated in ages. Beyond being Simple Profile only, it also only has 1-pass CBR with mediocre efficiency at best (Figure 12.5). I can’t think of a reason to use it today.

Figure 12.5 Unfortunately the “Optimized for” option is only available for QuickTime H.264, not part 2. Apple’s part 2 is 1-pass CBR only.


A number of compression products encode part 2 via the QuickTime API, and they’ll do no better in quality.

MPEG-4 Part 2 Tutorial

So, what would we actually use MPEG-4 part 2 for?

Scenario

We and our friends have made a cool animated short, which we want to make available to the world. It’s in HD, and we figure people might want to watch it in home theaters as well as PCs.

Three Questions

What Is My Content?

Our source is 1920 × 1080p24 animation, currently living as a sequence of PNG files. Audio is a simple stereo mix.

Who Is My Audience?

People inclined to download amusing videos from the Internet and watch them on their PCs, consoles, and DVD players: they’re younger, tech-savvy, often but not always with the latest gear. And probably with time on their hands; they’d prefer a download for higher quality over streaming.

What Are My Communications Goals?

We want our content to be awesome; we’d rather it take longer to download than look bad. And we want it to be in HD, at least 720p. We’re in it for the glory and adulation of our peers; our grandmothers aren’t going to watch it, and that’s probably how we want it.

Technical Specs

We’re going to go for 1280 × 720 24p, at a data rate that’ll deliver great quality. We want the audio to sound good too.

For broadest compatibility, we’re going to do “DivX”-style part 2: MPEG-4 part 2 Simple Profile + B-frames with MP3 audio in an AVI wrapper.

We’re not going to sweat bitrate; we want the file to be as small as it can be and still be awesome.

Encoding in MeGUI

We’re going to encode with the open source MeGUI, which is free and a nice front-end to Xvid and other codec implementations (Figure 12.6).

Figure 12.6 MeGUI’s basic Xvid configuration option. This is where most real-world tuning will be done.


First off, we have to turn our PNG sequence and WAV files into a source. MeGUI uses AVISynth for video input, and can use AVS or other audio formats for audio. Here’s a simple script that will do all of the following:

•  Turn the PNG sequence into video

•  Add the audio

•  Merge them together

•  Resize the video to 1280 × 720

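•  Convert to the required YV12 (4:2:0) color space ready for the codec:

    # Load the PNG sequence as 24 fps video (frames 1 through 15691)
    a = ImageSource("D:\Dream\1080\%05d.png", 1, 15691, 24)
    # Load the stereo mix
    b = WAVSource("D:\Dream\CM-St-16bit.wav")
    # Merge video and audio into one clip
    AudioDub(a, b)
    # Scale down to 720p
    BilinearResize(1280, 720)
    # Convert to the 4:2:0 color space the codec expects
    ConvertToYV12()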

Then we need to configure our settings.

MeGUI comes with a number of good presets. We’ll start with “2pass HQ (no Qpel),” which will give us the broadest compatibility for 720p capable devices. For audio, we’ll start with “LAME MP3-128 ABR” (Figure 12.7). We’ll want to make a few modifications, though:

•  Raise Bitrate to 4000. We want our stuff to look good!

•  Change Profile to Hi-Def (720p), which adds a few further constraints to make sure it’s compatible with DivX-certified 720p players.

•  Reduce Max Keyframe Int from 250 to 96, keeping the max GOP length to 4 seconds. We figure people will want to scrub through the video looking for cool stuff, so we want random access to be smoother.

•  Raise the MP3 ABR from 128 to 192 for better audio quality.

Figure 12.7 LAME as exposed in MeGUI. Watch out; it has Normalize Peaks on by default.
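It’s worth a quick back-of-the-envelope check on what these settings imply. The Python sketch below uses helper names of my own, and it assumes the PNG sequence really is 15,691 frames at 24 fps (as in the AVISynth script), that Kbps means 1000 bits per second, that the two-pass encode hits its average bitrate target, and that AVI container overhead is small enough to ignore.

```python
FPS = 24                      # frame rate given to the source script
FRAME_COUNT = 15691           # frames 1 through 15691 of the PNG sequence
VIDEO_KBPS = 4000             # Xvid target bitrate
AUDIO_KBPS = 192              # MP3 ABR target
MAX_KEYFRAME_INTERVAL = 96    # frames between forced keyframes
DEFAULT_KEYFRAME_INTERVAL = 250

def gop_seconds(interval_frames, fps):
    """Longest gap between keyframes, in seconds."""
    return interval_frames / fps

def duration_seconds(frames, fps):
    """Running time of the clip in seconds."""
    return frames / fps

def estimated_megabytes(seconds, video_kbps, audio_kbps):
    """Rough payload size in MB (10^6 bytes), ignoring container overhead."""
    total_bits = (video_kbps + audio_kbps) * 1000 * seconds
    return total_bits / 8 / 1_000_000

runtime = duration_seconds(FRAME_COUNT, FPS)
print("max GOP at 96 frames: %.1f s" % gop_seconds(MAX_KEYFRAME_INTERVAL, FPS))
print("max GOP at 250 frames: %.1f s" % gop_seconds(DEFAULT_KEYFRAME_INTERVAL, FPS))
print("running time: %.1f min" % (runtime / 60))
print("estimated size: %.0f MB" % estimated_megabytes(runtime, VIDEO_KBPS, AUDIO_KBPS))
```

Under those assumptions the short runs a bit under 11 minutes and lands somewhere around 340 MB, and the default 250-frame keyframe interval would have allowed gaps of more than 10 seconds between seek points.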

And there we have it. A very nice-looking presentation of our masterpiece to share with the world.
