CHAPTER 9
MPEG 1 and 2

MPEG-1

MPEG-1, released in October 1992, was the Moving Picture Experts Group’s first standard, and it offered revolutionary quality for its time. Although it has been eclipsed by later technologies, it laid the foundation for them, as well as for digital video in consumer electronics.

MPEG started out following the success and technical approaches of JPEG for still images and H.261 for videoconferencing, and as such is a pretty typical 8 × 8 DCT codec with motion compensation.

The initial hype for MPEG-1 was its application as a compression format for Video CD and interactive movies on desktop computers. The interactive movie angle never really caught on—it required users to buy a $200 card to get full-motion video, and having to sit back and watch a blocky video clip between doing stuff never amounted to compelling game play. It quickly became clear that users were much more interested in fully interactive real-time 3D games than in interactive cinema. Most CD-ROM projects made use of QuickTime or AVI, both of which offered greater flexibility and lower decode requirements. By the time computers sported CPUs fast enough for MPEG-1 playback to be ubiquitous, other codecs were emerging that offered superior compression efficiency and flexibility.

Initially, Video CD also looked to be an abysmal failure. The quality it produced was barely that of VHS, let alone the Laserdiscs then beloved by videophiles. However, once hardware costs dropped enough, Video CD became enormously popular in Asia, due to the prevalence of very cheap pirated movies and ubiquitous, inexpensive players.

MPEG-2

As soon as MPEG-1 was designed, work began on MPEG-2, focused on delivering a digital replacement for analog video transmissions. It’s called H.262 by the ITU.

Technically, MPEG-2 was a pretty straightforward enhancement of MPEG-1. The primary new feature was interlaced video, and there were many under-the-hood optimizations that improved compression efficiency.

By almost any standard, MPEG-2 is the most ubiquitous and important video standard in the world today, and will remain in wide use for years to come. While web and device video certainly get a lot of attention, MPEG-2 still gets more eyeball-hours than any other codec, through DVD, digital satellite and cable, and digital broadcasting. We are in the midst of a broad transition to H.264 from MPEG-2 in many sectors, driven by H.264 High Profile being capable of everything MPEG-2 was, but with substantially improved efficiency.

One warning: there’s a huge array of stuff in the various MPEG specs that isn’t used in practice (D-frames!). This chapter is going to survey MPEG-1 and MPEG-2 as they’re used in practice, and doesn’t attempt to be a survey of the specifications themselves.

MPEG File Formats

Elementary Stream

An MPEG elementary stream is just the video or audio track all by itself. For MPEG-2, this is generally .m2v. These are intermediate files for importing into an authoring or muxing tool, and aren’t used by themselves, or even playable by many applications that can otherwise play a .mpg file.

Program Stream

A Program Stream is the MPEG file format, generally .mpg. A program stream contains multiplexed elementary streams, typically video and audio. Other kinds of data like captioning can be inserted into a program stream as “user data.”

This was originally called a System Stream in MPEG-1, and later renamed; it’s the same thing underneath.

Transport Stream

Transport streams are designed for real-time low-latency transmission through lossy environments, and so include a lot of error resiliency and recovery functions. They’re standard in broadcast. Their extension is .ts.

The design of Transport Streams has proved to be very useful, and they continue to be the standard for broadcast even when the MPEG-2 codec itself isn’t used. Transport streams offer lower broadcast delay than streaming-oriented formats; we’ve seen some interesting use of transport stream delivery to Silverlight. Apple’s new adaptive streaming technology also uses transport streams as the container file format.

Transport Streams are one of the most durable technologies to come out of MPEG, and will be used long after today’s codecs have been replaced.
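To make the container a little more concrete, here is a minimal Python sketch of reading a transport stream packet header. The details it relies on (188-byte packets, a 0x47 sync byte, a 13-bit PID, a 4-bit continuity counter) come from the MPEG-2 Systems specification rather than from this chapter, and the function name and the synthetic packet are purely illustrative.

```python
TS_PACKET_SIZE = 188  # bytes per transport stream packet (MPEG-2 Systems)
SYNC_BYTE = 0x47      # every packet starts with this value


def parse_ts_header(packet):
    """Decode a few fields from the 4-byte transport packet header."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport stream packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }


# Synthetic packet for illustration: PID 0x0100, continuity counter 5.
packet = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
header = parse_ts_header(packet)
print(header["pid"])                 # 256
print(header["continuity_counter"])  # 5
```

The fixed packet size, the sync byte, and the per-PID continuity counter are part of what lets a receiver resynchronize and notice dropped packets after an error.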

MPEG-1 Video

The MPEG-1 video codec took advantage of a lot of work already done in JPEG and H.261. Given the extreme limitations of the encoding and decoding horsepower available back in 1992, MPEG-1 was designed as a straight-up implementation of DCT without the advanced features of modern codecs. MPEG-1 is a 4:2:0 color space codec, with 8 × 8 blocks and 16 × 16 macroblocks. It doesn’t support partial macroblocks, so resolution must be divisible by 16 on both axes.
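The divisible-by-16 rule is easy to check before you start an encode. A minimal Python sketch follows; the function names are hypothetical and only illustrate the arithmetic.

```python
MACROBLOCK = 16  # MPEG-1 macroblocks cover 16 x 16 luma pixels


def is_macroblock_aligned(width, height):
    """Return True if both dimensions are whole multiples of 16."""
    return width % MACROBLOCK == 0 and height % MACROBLOCK == 0


def macroblock_count(width, height):
    """Number of 16 x 16 macroblocks in an aligned frame."""
    if not is_macroblock_aligned(width, height):
        raise ValueError("frame size must be divisible by 16 on both axes")
    return (width // MACROBLOCK) * (height // MACROBLOCK)


print(is_macroblock_aligned(352, 240))  # True
print(macroblock_count(352, 240))       # 330
print(is_macroblock_aligned(360, 240))  # False
```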

MPEG-1 is a very basic codec, designed to be playable in real time on the dedicated hardware available in the early 1990s. Thus it lacks a lot of features we’ve gotten used to in more modern codecs. Some of the biggies include:

•  Partitioned motion vectors. All motion estimation is per macroblock. That’s fine for big motion, but terrible when small things are going different directions inside a 16 × 16 region. Particle effects and rain are particularly problematic.

•  Subpixel motion estimation. All motion vectors are aligned with the pixel grid. Slow motion isn’t encoded efficiently.

•  Variable block sizes. MPEG-1 is always 8 × 8.

•  In-loop deblocking filter. If a reference frame gets quantized, all later frames carry that forward.

•  Floating-point DCT. MPEG-1/2 used a classic DCT with floating-point math. This means that different hardware can vary slightly in the results of a calculation, as there’s no official definition of the exact right answer with floating-point. Any single calculation will be quite close, but since the result of that calculation can be used as the input for the next, a cumulative DCT “drift” can result, making the video output less predictable over time.

The floating-point DCT has a big impact, since it limits how many consecutive calculations can be used safely. This is why MPEG-2 typically only has a few reference frames, as discussed shortly.

In general, MPEG-1 does very well at the data rates and resolutions it was designed for: 1.5 Mbps at 352×240, 29.97 fps (NTSC) or 352×288, 25 fps (PAL). This data rate was chosen to match the transfer rate of an audio CD, so video CD players could be built around the physical mechanism of audio CD players.
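The audio CD comparison can be checked with quick arithmetic. This Python sketch assumes the standard audio CD format (44.1 KHz, 16-bit, stereo) and ignores any container or error-correction overhead; the helper name is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100  # audio CD sampling rate
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Raw PCM payload rate of an audio CD, in bits per second.
cd_audio_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
print(cd_audio_bps)                  # 1411200
print(round(cd_audio_bps / 1e6, 2))  # 1.41


def average_bits_per_frame(bitrate_bps, fps):
    """Average bit budget per frame at a constant bitrate."""
    return bitrate_bps / fps


# At 1.5 Mbps and 29.97 fps each frame gets roughly 50,000 bits on average.
ntsc_budget = average_bits_per_frame(1_500_000, 29.97)
print(round(ntsc_budget))
```

So the raw audio CD rate lands in the same ballpark as the 1.5 Mbps target, which is the point of the comparison.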

Modern high-end MPEG-1 encoders with such enhancements as 2-pass encoding, better preprocessing, and more exhaustive motion search algorithms offer much higher efficiency than their predecessors available in MPEG-1’s heyday. However, the lack of an in-loop deblocking filter means quality drops off sharply when the bitrate is insufficient for the content. As you learned in Chapter 2, the human eye is much more attuned to sharp edges, so these blocky artifacts are quite distracting. While it’s quite possible to do postprocessing deblocking with MPEG-1, it’s not broadly implemented in players.

MPEG-2 Video

The MPEG-2 video encoder is an enhancement of MPEG-1, carrying on its features, often enhanced, and adding a few important new ones. The basic structure of macroblocks, GOPs, IBP frames, and so on, is the same. The important additions include:

•  Support for interlaced video.

•  Half-pel motion precision (MPEG-1 was full-pixel only; this is a big improvement).

•  Allowing 8–11-bit DCT precision during the initial transform before quantization (MPEG-1 was 8-bit only) for more accurate gradients and more efficient compression at higher bitrates.

•  Optional High Profile mode with 4:2:2 color.

•  Defining aspect ratio as display aspect ratio, instead of the pixel aspect ratio used in MPEG-1.

•  Restricted motion vector search range to simplify decoders for higher resolutions. Motion vectors can go up to 128 pixels vertically and 1024 pixels horizontally. This can be an issue in HD: with 1080p24 IBBP, 128 pixels is 1/8 the frame height in an eighth of a second.

•  16 × 8 motion vector partitions allowed (MPEG-1 is 16 × 16 only).

•  Some other under-the-hood tweaks that slightly improve compression; they’re always on, so you don’t need to worry about them.
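The motion vector limit in the list above is worth a quick sanity check. This sketch assumes an IBBP pattern, so reference frames sit three frames apart; the variable names are illustrative only.

```python
from fractions import Fraction

MAX_VERTICAL_MV = 128   # pixels, the vertical limit quoted above
FRAME_HEIGHT = 1080
FPS = 24
REFERENCE_SPACING = 3   # frames between reference frames in IBBP

# Time between one reference frame and the next.
time_between_refs = Fraction(REFERENCE_SPACING, FPS)
print(time_between_refs)  # 1/8

# Share of the frame height a vector can span: roughly 0.12, close to 1/8.
fraction_of_height = Fraction(MAX_VERTICAL_MV, FRAME_HEIGHT)
print(float(fraction_of_height))

# Fastest vertical motion a single vector can follow, in pixels per second.
max_vertical_speed = MAX_VERTICAL_MV / time_between_refs
print(max_vertical_speed)  # 1024
```

Anything moving vertically faster than that between reference frames can’t be described by a single motion vector, so the encoder has to spend bits coding it some other way.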

For typical standard-definition content, MPEG-2 with a good VBR encoder is well served by average data rates between 2.5 Mbps and 6 Mbps, depending on content. MPEG-2 targeted and was tuned for higher-bitrate scenarios than MPEG-1, and can substantially outperform MPEG-1 at higher bitrates at the same frame size.

Interlaced Video

The biggest addition to MPEG-2 is native support for interlaced video, which was designed very nicely, with later codecs following the same mechanism. MPEG-2 video can be progressive or interlaced at three different levels:

•  A stream can be progressive or interlaced; a progressive stream only has progressive frames.

•  An interlaced stream can include both interlaced and progressive frames; progressive frames have progressive macroblocks only.

•  An interlaced frame can have any macroblock be either progressive or interlaced.

This flexibility is extremely useful. First, progressive-only content is easily detected for devices with progressive-only output. Second, within interlaced content, progressive frames can still be encoded as progressive. Third, in a frame where part of the image is moving and other parts aren’t, the static sections can be encoded as progressive (for better efficiency), and the moving sections can be interlaced (which is required as the fields contain different information).

Because an interlaced macroblock actually covers 32 lines of source instead of 16, the zigzag scan pattern normally used in MPEG for progressive scan must be modified. The alternate scan pattern attempts to go up and down two pixels for every one it goes left and right (see Figure 9.1). This was sometimes referred to as the “Yeltsin Walk” pattern after the famously unsteady former President of the Russian Federation.

Figure 9.1 The alternate scan pattern was sometimes referred to as the “Yeltsin Walk” pattern after the famously unsteady Russian leader.

“24p” in an interlaced stream

If we lived in a good and just world, all CE devices would support 24p MPEG-2 streams, and we’d just encode film sources with that and go home early. Alas, our world is not good and just, and the DVD and some other devices/standards (like CableLabs VOD) only support interlaced bitstreams (at least at some resolutions).

What’s worse, those interlaced streams require 29.97 frame rate. So we can’t even encode 23.976 progressive fps in that interlaced bitstream (although we can encode 30p that way just fine).

We actually have to encode with 3:2 pulldown, thus yielding a repeating pattern of three progressive frames and two interlaced frames. This isn’t as bad as it sounds, as MPEG-2 has a field_repeat tag that can be used to mark a field as “don’t waste any time on me, just reuse the same field from the previous frame.” Since our 24 frames in a second get turned into 48 fields, but need to fill 60 fields, there are 12 repeated fields per second, and thus 12 field_repeat tags. This makes encoding “24p” almost as efficient as encoding real 24p.

Also, those 12 field_repeat tags come in a particular pattern that provides very good hints to a decoder for how to reassemble the original 24 frames out of the 48 unique fields, discarding the repeats. This is how “progressive” DVD players and software players on computers work, and it is why movie content looks so much better from those players than 30i: no deinterlace is required and the full 480 lines of detail are preserved.
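The field counting above can be sketched in a few lines of Python. This models only the cadence, not real MPEG-2 bitstream syntax: each field is a tuple, and the boolean stands in for the repeat tag described in the text. The function names are hypothetical.

```python
def pulldown_32(frames):
    """Expand progressive frames into fields using a 2, 3, 2, 3 cadence.

    Returns (frame, parity, is_repeat) tuples. Every other frame
    contributes a third field, which is flagged as a repeat.
    """
    fields = []
    parity = 0  # 0 = top field, 1 = bottom field; alternates every field
    for i, frame in enumerate(frames):
        n_fields = 3 if i % 2 else 2
        for k in range(n_fields):
            fields.append((frame, parity, k == 2))
            parity ^= 1
    return fields


def inverse_telecine(fields):
    """Drop repeated fields and recover the original frame order."""
    frames = []
    for frame, _parity, is_repeat in fields:
        if not is_repeat and (not frames or frames[-1] != frame):
            frames.append(frame)
    return frames


one_second = list(range(24))  # 24 source frames
fields = pulldown_32(one_second)
repeats = sum(is_repeat for _, _, is_repeat in fields)
print(len(fields))  # 60
print(repeats)      # 12

# Pair fields into 30 output frames; a pair drawn from two different
# source frames is one of the "interlaced" frames in the pattern.
pairs = [(fields[i][0], fields[i + 1][0]) for i in range(0, len(fields), 2)]
mixed = sum(a != b for a, b in pairs)
print(len(pairs) - mixed, mixed)  # 18 12

print(inverse_telecine(fields) == one_second)  # True
```

Every five output frames contain three clean frames and two mixed ones, and dropping the flagged repeats recovers the original 24 frames.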

Modern NLEs and compression tools generally do a good job of hiding this for you under the covers; if you drag a 24p source into a DVD authoring app, it’ll do the right thing just like it does for 4:3 and 16:9.

“24p” in 720p

The other annoying case is sticking 24p in a 720p bitstream, since ATSC and other formats don’t have a real 720p24 mode. It’s the same idea as the previous example, but using frame_repeat. So, the 24p source gets turned into a repeating pattern of two and three copies of each frame to fill out the 60. It’s just as efficient as a native 24p encode would be, but means that playback on the new 120 Hz displays isn’t smooth. Instead of getting each frame shown five times (5 × 24 = 120), we’d get a pattern of 4, 6, 4, 6 copies of each frame; classic NTSC judder again.
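The same bookkeeping applies to whole frames. This small sketch assumes the 120 Hz display simply shows every 60p frame twice; the function name is hypothetical.

```python
def frame_repeats(n_frames, pattern=(2, 3)):
    """How many times each source frame is shown, cycling through pattern."""
    return [pattern[i % len(pattern)] for i in range(n_frames)]


repeats_60 = frame_repeats(24)  # one second of 24p carried in 60p
print(sum(repeats_60))          # 60

# Assumption: a 120 Hz display doubles each 60p frame.
repeats_120 = [2 * r for r in repeats_60]
print(repeats_120[:4])   # [4, 6, 4, 6]
print(sum(repeats_120))  # 120
```

An even cadence would show every frame five times; the uneven 4, 6 pattern is what shows up as judder.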

What Happened to MPEG-3?

MPEG-2 was originally targeted at SD resolutions, although the spec itself allows for absurdly high resolutions (the MPEG-1 spec can do 4096 × 4096). It wasn’t obvious how well MPEG-2 could scale, so the initial plan was to produce a MPEG-3 spec for HD. However, MPEG-2 turned out to be just fine for HD without any obvious enhancements that would merit a new standard. Thus HD levels were just added to MPEG-2, and MPEG numbers skipped to MPEG-4.

Sometimes MP3 is called “MPEG-3,” but that’s a misnomer. It’s really short for “MPEG-1 Layer III” as described next.

Table 9.1 MPEG-2 Profiles.

Abbr. | Name | Chroma Subsampling | Comment
MP | Main Profile | 4:2:0 | Content delivery
HP | High Profile | 4:2:2 | Content authoring

Table 9.2 MPEG-2 Levels.

Abbr. | Name | Max Width | Max Height | Max Framerate (fps) | Max Bitrate (Mbps) | Comment
ML | Main Level | 720 | 576 | 30 | 15 | Standard def, all DVD
H-14 | High 1440 | 1440 | 1152 | 30 | 60 | HDV
HL | High Level | 1920 | 1152 | 720p=60, 1080=30 | 80 | ATSC, DVB, Blu-ray

MPEG-2 Profiles and Levels

These are the profiles and levels in common use for MPEG-2 (Tables 9.1 and 9.2). There are many others in the spec that aren’t used in practice. Also, almost any CE device is going to have further constraints on encoding beyond Profile and Level. In particular, the max bitrates allowed by CE devices are almost always much lower than the maximum of the specified level. For example, DVD is limited to a total bitrate of 9.8 Mbps, even though DVD’s Main Profile@Main Level MPEG-2 has a maximum 15 Mbps bitrate.
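The gap between a level limit and a device limit is easy to express as a check. The following sketch encodes only the width, height, and bitrate columns of Table 9.2 plus the DVD total quoted above; the function names are hypothetical, and a real compliance check covers far more than this.

```python
# Limits taken from Table 9.2 (frame rate limits omitted for brevity).
LEVELS = {
    "ML":   {"max_width": 720,  "max_height": 576,  "max_mbps": 15},
    "H-14": {"max_width": 1440, "max_height": 1152, "max_mbps": 60},
    "HL":   {"max_width": 1920, "max_height": 1152, "max_mbps": 80},
}
DVD_MAX_TOTAL_MBPS = 9.8  # total bitrate limit for DVD quoted in the text


def fits_level(level, width, height, mbps):
    """True if the frame size and bitrate are within the level's limits."""
    limits = LEVELS[level]
    return (width <= limits["max_width"]
            and height <= limits["max_height"]
            and mbps <= limits["max_mbps"])


def fits_dvd(width, height, video_mbps, audio_mbps):
    """True if the stream fits Main Level and the DVD total bitrate cap."""
    total = video_mbps + audio_mbps
    return (fits_level("ML", width, height, video_mbps)
            and total <= DVD_MAX_TOTAL_MBPS)


print(fits_level("ML", 720, 480, 12.0))  # True: legal at Main Level
print(fits_dvd(720, 480, 12.0, 0.448))   # False: too high for DVD
print(fits_dvd(720, 480, 8.0, 0.448))    # True
```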

Audio

MPEG-1 Audio

MPEG-1 introduced three audio codecs. All are based on similar technology with psychoacoustic modeling, with improving compression efficiency at higher levels. They’re also increasingly complex on decode.

Layer I

MPEG-1 Audio Layer I sounds pretty good at higher bitrates, but offers very low compression efficiency. In practice all decoders that can do Layer 1 also do Layer 2, so that’s used instead. Its main use was on the long-forgotten DAT competitor Digital Compact Cassette.

Layer II

MPEG-1 Audio Layer II eclipsed Layer 1 because it offers substantially better encoding efficiency and puts an insignificant hit on modern processors. Layer 2 audio is also supported in all MPEG-1 hardware playback devices, and is mandated as the audio codec for Video CD. The general rule of thumb is that Layer 2 needs about 50 percent higher bitrate to sound as good as MP3, so 192 Kbps Layer II sounds about as good as a 128 Kbps MP3. Video CD uses 224 Kbps Layer II audio, which is nearly CD audio quality.

Layer 2 is used broadly as one of the better patent-free audio codecs, as seen in HDV and DAB digital radio (although it is being replaced by HE AAC v2 in DAB+).

Layer 2 was also used in audio-only files before the MP3 explosion. MPEG-1, Layer 2 audio files use the .mp2 extension. MP2 was the high-quality format of choice on the pioneering Addicted to Noise music web site.

At the highest bitrates, Layer II can actually be more transparent than Layer III in some (albeit rare) cases.

Layer 2 data rates run from 32 to 384 Kbps, with 64 and below mono only.

Layer III

MPEG-1 Audio Layer III was never widely adopted in MPEG-1 authoring or playback tools due to patent licensing fees (otherwise not required by a MPEG-1 player). However, it became an enormously popular technology in its own right, albeit under its shortened name, MP3. Layer 3 data rates range from 32 to 320 Kbps.

You’ll always want to use Layer II as a soundtrack for MPEG files, as very few players support Layer III in a MPEG file. For more details on MP3, see .

MPEG-2 Audio

Oddly enough, most MPEG-2 applications don’t use MPEG audio codecs for audio. Instead, they use other audio bitstreams that provide better efficiency or compatibility with existing players.

All DVD audio codecs default to 48 KHz, slightly higher than the 44.1 KHz of audio CD.

MPEG-2 extensions to MPEG-1 Layer II

MPEG-2 included extensions to MPEG-1 Layer II to handle new scenarios on lower and higher ends:

•  Sub-32 KHz sampling for low bitrates

•  MPEG Multichannel for up to 7.1 channels

Neither sees wide use. MPEG-2 isn’t commonly used at bitrates low enough to take advantage of the low bitrate extensions. And while the PAL DVD spec did originally include MPEG Multichannel, the NTSC version didn’t, and Dolby Digital became the universally supported multichannel codec. MPEG Multichannel was dropped as a mandatory codec for PAL DVD in 1997.

AAC

Advanced Audio Coding (AAC) was first designed for MPEG-2, although it’s not used with MPEG-2 in practice. It became the default codec with MPEG-4, and is covered in Chapter 13.

PCM

Consumer-level DVD authoring systems historically supported just uncompressed stereo tracks, to avoid the cost of licensing Dolby Digital. That licensing has become easier over the years, so fewer authoring tools and hence discs are PCM-only. Using uncompressed audio wastes a whole lot of bits better spent on video.

Dolby Digital (AC-3)

Most real-world MPEG-2 content is coupled with Dolby Digital (AC-3). Dolby is a mandatory codec (meaning all decoders must support it) in several important delivery formats:

•  DVD

•  Blu-ray

•  ATSC

•  DVB

•  CableLabs

While AC-3 is quite an old codec at this point, and not particularly efficient (AAC-LC outperforms it at lower bitrates, and WMA 10 Pro and HE AAC blow it out of the water), it has a very valuable legacy base of installed decoders. Every 5.1 receiver manufactured includes an AC-3 decoder, making AC-3 the only bitstream guaranteed to work when connecting any CE devices to a receiver, be it through TOSLink, RF, or other mechanisms. Those older connection types don’t have enough bandwidth for multichannel PCM, either (that was introduced with HDMI). So any device that’s going to deliver multichannel audio to anything other than a system with dedicated multichannel output (like a PC with a 5.1 speaker system) will support AC-3, be it by reencoding from another codec, or just passing the bitstream on.

With sufficient bitrates, Dolby Digital provides good quality with most content, and even its maximum bitrates are quite small compared to typical MPEG-2.

AC-3 provides a full range of channel options from mono to 5.1 (see Table 9.3). With a maximum data rate on DVD and ATSC of 448 Kbps, quality is very good for soundtrack content, although transients (like high-pitched percussion) can come out flattened in concert materials. Blu-ray can do AC-3 up to 640 Kbps, the maximum rate supported by existing decoders. Stereo is typically encoded at 192 or 224 Kbps.

Table 9.3 AC-3 Supported Channels.

Label | Channels | LFE? | Minimum Data Rate
1/0 | Center | No LFE | 56 Kbps
2/0 | Left, Right | No LFE | 96 Kbps
3/0 | Left, Center, Right | LFE optional | 128 Kbps
2/1 | Left, Right, Surround | LFE optional | 128 Kbps
3/1 | Left, Center, Right, Surround | LFE optional | 192 Kbps
2/2 | Left, Right, Left Surround, Right Surround | LFE optional | 192 Kbps
3/2 | Left, Center, Right, Left Surround, Right Surround | LFE optional | 224 Kbps

Dolby Digital also supports Dialog Normalization, sometimes called DialNorm. This metadata should be set to the average dB of dialog, which allows the dynamic range of the audio to be compressed to allow intelligible speech on the one hand, while keeping explosions to a neighbor- or sleeping-baby-approved level on the other. DialNorm is an art in its own right, and is something that needs to be determined by an audio engineer based on the actual mix.

There are enhanced Dolby Digital modes called Dolby Digital Plus and Dolby TrueHD used in Blu-ray, and discussed in Chapter 24. They are rarely coupled with MPEG-2.

DTS (Digital Theater Systems)

DTS, from Digital Theater Systems, is often positioned as the high-end alternative to AC-3. It is also sometimes used for theater sound, but never for broadcast. DTS’s big advantage and big limitation in the marketplace is its relatively low compression. Originally designed as a theater standard with higher quality than Dolby Digital, it used CD-ROM as its data standard. The base version of DTS only does 5.1, but there’s a backward-compatible 6.1 variant called DTS-ES. In DVD, DTS can use a data rate of 1,536 Kbps or 768 Kbps. The higher rate offers incredible transparency, but isn’t often used on DVD due to space limitations. It’s not clear whether the 768 Kbps mode is significantly better in general than AC-3 at 448 Kbps.

DTS’s higher data rate requirements can mean DTS DVDs may have fewer features and cost more than AC-3 versions. However, aficionados say the superior audio makes this a small price to pay.

While most professional DVD authoring tools include AC-3, many do not include DTS. DTS is usually only used in high-end Hollywood productions.

Blu-ray makes DTS a mandatory codec, and so DTS-only titles are possible there.

MPEG Audio

Ironically, the actual MPEG-1/2 audio codecs are not commonly used with MPEG-2 in content distribution. Many ecosystems, including DVD, don’t have MP2 audio as a mandatory codec. The main place where MP2 is used is in self-contained files targeting PC playback, like .mpg.

MPEG-1 for Universal Playback

Since MPEG-1 hasn’t required patent licensing, it’s been broadly supported for many years, and has been included with Windows since Windows 95 and every Mac since 1998. And while many codecs aren’t available in Linux distributions due to patent licensing requirements, MPEG-1 is often still supported. Many devices will support it as well, including DVD players with “multiformat” options like DivX and MP3. So, if you want to have a single file that’ll play on pretty much every working PC in the world, no matter how slow the CPU or ancient the OS version, MPEG-1 may be a good choice.

The drawback, of course, is that much higher bitrates are required; decent quality can require three or more times the bitrate of something using a modern codec.

For a universally supported MPEG-1, my recommendations are:

•  Use square-pixel encoding. Many old media players, including Windows Media Player before version 9, always decode as square pixel so nonsquare pixels come out distorted.

•  Use a maximum frame size of 640 × 480. Some ancient players don’t handle anything above that.

•  Use 2-pass VBR in the slowest, highest-quality mode your tool has. MPEG-1 is so simple that it’ll still encode faster than real time on a laptop.

•  Layer II audio needs a lot of bits. Use 192 Kbps at a minimum for decent quality. Use Joint Stereo for better efficiency at lower rates, and Normal for better quality above 224 Kbps.

•  Audio should be 44.1 KHz, to improve compatibility with older players.

•  Use a Program/System stream (same thing, different names) as the file format, with a .mpg extension.

•  If you put it on a disc, I recommend the good old ISO 9660 format, which everything can read. That’s what most ROM burners default to (some default to the superior but more recent UDF).

MPEG-2 for Authoring

Unsurprisingly, MPEG-2 is also used as a video production format, in capture, for mezzanine files, and even for editing. The consumer-focused HDV uses the same Main Profile used for content delivery. But most MPEG-2 in authoring uses High Profile, which supports 4:2:2 sampling, supported in MXF, XDCAM, and IMX.

4:2:2 is typically run with only I-frames at 50 Mbps (for SD). It’s rather analogous to using Motion-JPEG at the same data rate, although it is somewhat more standardized.

For SD mezzanine files, MP@ML 15 Mbps interframe encoding can produce very nice quality, even with interlaced. And for stereo audio tracks, Layer II Normal stereo at 384 Kbps is very high quality.

One drawback to using MPEG-2 files for this is that most computers don’t ship with built-in MPEG-2 decoders, so they may need to be added later. Windows 7 is the first mainstream OS to ship with out-of-the-box MPEG-2 file playback. Most video editing and compression tools include MPEG-2 support in some fashion, but they can vary in what profiles are supported. Don’t assume that any random person can play back or reencode from an MPEG-2 file.

MPEG-2 for Broadcast

MPEG-2 has been the leading codec for digital broadcasting, although it is starting to be displaced by H.264. But the rest of the MPEG-2 technology stack, including MPEG-2 transport streams, looks to have a long future ahead of it.

The big reason for this was compression efficiency; MPEG-2 can squeeze about six channels of decent-quality SD video in the same amount of spectrum as a single analog channel. Thus it was a no-brainer for digital satellite and cable companies to adopt MPEG-2 technology, and continually push advancements in the technology, particularly for real-time encoding.

One of the key innovations here has been statistical multiplexing (statmux), which is so cool that I need to describe it briefly. Statmux is essentially an intra-band, inter-channel VBR. Using our example of six digital channels fitting into one analog channel, those six channels are combined into a single transport stream. While the total bandwidth of the transport stream is fixed, any given channel can vary in its own data rate. So a statmux encoder will compress all the channels on the same band simultaneously, redistributing bits between them on the fly to achieve the best average quality at any given moment.
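A toy version of the idea fits in a few lines of Python. This is only a sketch under stated assumptions, not how any real statmux works: it splits a fixed total in proportion to a made-up per-channel complexity score, with a guaranteed floor for each channel. The 18,000 Kbps total, the 500 Kbps floor, and the complexity numbers are all hypothetical.

```python
def statmux_allocate(complexities, total_kbps, floor_kbps=500.0):
    """Split a fixed total bitrate across channels by relative complexity.

    Each channel gets floor_kbps, and the remainder is shared in
    proportion to its complexity score for the current moment.
    """
    n = len(complexities)
    if floor_kbps * n > total_kbps:
        raise ValueError("floors alone exceed the total bandwidth")
    spare = total_kbps - floor_kbps * n
    total_complexity = sum(complexities)
    if total_complexity == 0:
        return [total_kbps / n] * n
    return [floor_kbps + spare * c / total_complexity for c in complexities]


# Five easy channels and one hard one (say, sports) sharing one band.
allocation = statmux_allocate([1, 1, 1, 1, 1, 7], total_kbps=18_000)
print(allocation)
print(sum(allocation))  # the total stays fixed
```

A real statmux reruns this kind of decision continuously as complexity changes, which is what lets a hard channel borrow bits from easy ones while the band total stays constant.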

The unfortunate fact is that the focus of most digital broadcasting is on increasing the number of channels, with quality a much lower concern. Lots of digital content, particularly on satellite, winds up looking worse than a good analog signal would have (although certainly better than a bad analog signal).

ATSC

The U.S. format for high-definition digital television is called ATSC, from the Advanced Television Systems Committee. Like many committees, they had a lot of trouble making a definitive decision, and produced 28 different interlaced/progressive scan, resolution, and frame rate options. That was our last, best chance to kill off interlaced, but the TV manufacturers saw interlaced as a barrier to entry for the computer industry and so fought to keep it. Ironically, of course, the TV manufacturers died off or now have a big business selling computers and monitors, none of which do interlaced anymore anyway. The saga is well told in Joel Brinkley’s Defining Vision (Harvest Books, 1998), probably the only high-drama nonfiction page-turner the compression industry shall ever have. I hope Amazon.com recommends it as a companion to this book.

Note that ATSC uses 704 × 480 instead of DVD’s 720 × 480 (see Table 9.4). This is still the same aspect ratio—720 has eight pixels on either side of the frame that aren’t included in the 704 broadcasts. So converting between 720 and 704 requires adding or subtracting eight pixels on the left/right.

Table 9.4 ATSC Options.

Resolution | Aspect Ratio | Interlaced fps (a.k.a. 30i) | Progressive fps (a.k.a. 24/30/60p)
640×480 | 4:3 | 29.97, 30 | 23.976, 24, 29.97, 30, 59.94, 60
704×480 | 4:3 or 16:9 | 29.97, 30 | 23.976, 24, 29.97, 30, 59.94, 60
1280×720 | 16:9 | - | 23.976, 24, 29.97, 30, 59.94, 60
1920×1080 | 16:9 | 29.97, 30 | 23.976, 24, 29.97, 30

You might think that you’ll never be asked to encode ATSC, but I hope that’s not true. Most broadcast content is authored well in advance, and it’s silly to send it through a real-time encoder in that case. We have great software HD MPEG-2 encoders that can be used to good effect here today. That said, you definitely need an ATSC-compliant tool for ATSC, as there’s lots of under the hood timing data and muxing parameters that must be followed religiously.

Personally, I still believe we only needed two HD formats: 720p60 and 1080p24. Unfortunately, all U.S. broadcasters doing 1080 are doing 1080i30.

DVB

DVB is the European/PAL equivalent of ATSC. Its biggest difference is its use of PAL frame rates, as described in Table 9.5.

Table 9.5 DVB/PAL ATSC Options.

Resolution | Aspect ratio | Interlaced fps (a.k.a. 25i) | Progressive fps (a.k.a. 25/50p)
352×240 | 4:3 | - | 25
352×576, 480×576, 544×576 | 4:3 or 16:9 | 25 | 25
704×576 | 4:3 or 16:9 | 25 | 25, 50
1280×720 | 16:9 | - | 25, 50
1920×1080 | 16:9 | 25 | 25

DVB has branched out from just MPEG-2; some countries only do DVB with H.264. And DVB-H targeting broadcast to mobile devices is covered in the encoding for mobile devices chapter.

CableLabs

The other broadcast MPEG-2 format you may see in practice is the CableLabs VOD spec, which is used for VOD by most cable operators. The CableLabs spec is a strict subset of ATSC. Fortunately, it only has three modes:

•  Three-quarter screen SD: 528×480i30 with 3180 Kbps video and 192 Kbps 48 KHz stereo AC-3

•  720p: 1280×720, up to 10 Mbps video

•  1080i: 1920×1080i30, up to 18.1 Mbps video

These are generally higher than used for broadcast at those frame sizes and can look quite good if encoded correctly. As they’re by definition on demand, software encoders are normally used.

MPEG Compression Tips and Tricks

352 from 704 from 720

MPEG assumes the actual image area of a frame is 704 (which is generally true). So when converting to and from 352-width MPEG-1 or MPEG-2, you should scale to/from 704 wide. If source or target is 720, add/subtract eight pixels left/right as needed. The same applies when converting between 720 and 704.
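The 720-to-704 step is just a symmetric crop or pad. Here is a minimal Python sketch that treats a frame as a list of pixel rows; the function names are hypothetical, and a real tool would operate on proper image buffers.

```python
SIDE_PIXELS = 8  # columns added or removed on each side


def crop_720_to_704(frame):
    """Drop eight pixels from the left and right of every row."""
    if any(len(row) != 720 for row in frame):
        raise ValueError("expected 720-pixel-wide rows")
    return [row[SIDE_PIXELS:-SIDE_PIXELS] for row in frame]


def pad_704_to_720(frame, fill=0):
    """Add eight fill pixels to the left and right of every row."""
    if any(len(row) != 704 for row in frame):
        raise ValueError("expected 704-pixel-wide rows")
    border = [fill] * SIDE_PIXELS
    return [border + list(row) + border for row in frame]


# Two-row dummy frame whose pixel values are their own column index.
frame_720 = [list(range(720)) for _ in range(2)]
frame_704 = crop_720_to_704(frame_720)
print(len(frame_704[0]))                  # 704
print(frame_704[0][0], frame_704[0][-1])  # 8 711
print(len(pad_704_to_720(frame_704)[0]))  # 720
```

Once the image is 704 wide, scaling to 352 is a clean 2:1 horizontal resize.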

Slow, High-Quality Modes

MPEG-1/2 have been around forever, and are quite simple codecs for encoding and decoding; there’s simply not as much work to do with them as with modern codecs. So a decent computer can encode SD MPEG-2 in faster than real time even with very complex settings. Since high QP is such a quality killer with MPEG, anything that can be done to improve efficiency can really help.

Use 2-Pass VBR

Given the risk of blocking from high QP, you should use 2-pass VBR whenever your peak bitrate can be higher than the average (often for DVD, never for broadcast). That lets the codec raise bitrate and reduce QP in order to prevent very blocky frames.

Mind Your Aspect Ratios

VCD and MPEG-2 SD are always nonsquare pixel, and HD often is as well (like 1440×1080 or 1280×1080 anamorphic compression). Make sure you’re flagging all the aspect ratios correctly. MPEG-1 only has a few fixed pixel aspect ratio modes, but MPEG-2 is much more flexible.

Get Field Order Straight

MPEG-2 is the only format many of us encode to in interlaced. But many of us don’t have interlaced monitors to test playback on anymore. Thus, it’s really important to make sure that you’re encoding with the same field order in source and output. If you don’t, field order will get reversed, so the display order can go from 1A 1B 2A 2B 3A 3B to 1B 1A 2B 2A 3B 3A. This results in displayed images alternating between skipping back 1/60th of a second and skipping forward 3/60th of a second! So you get a nausea-inducing strobe effect whenever there’s motion. This has been the cause of many panicked emails to many compression forums over the years.
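The back-and-forth jumps fall straight out of the arithmetic. This sketch gives each field its capture time in units of 1/60th of a second and lists the time steps between consecutive fields as displayed; the function names are hypothetical.

```python
def display_times(n_frames, reversed_order=False):
    """Capture time (in 1/60 s units) of each field, in display order."""
    times = []
    for frame in range(n_frames):
        a, b = 2 * frame, 2 * frame + 1  # field A is captured before B
        times.extend([b, a] if reversed_order else [a, b])
    return times


def steps(times):
    """Time jump between each displayed field and the next."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


print(steps(display_times(3)))                       # [1, 1, 1, 1, 1]
print(steps(display_times(3, reversed_order=True)))  # [-1, 3, -1, 3, -1]
```

With the correct order, time advances one field at a time; with the order reversed, it alternates between stepping back one field and forward three.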

One common cause of this is DV, which is bottom field first, while many DVD encoders default to top field first. This is particularly annoying in PAL lands where everything but DV is top field; we in the NTSC world were never lulled into that false sense of security.

DVD and other interlaced MPEG-2 technologies support arbitrary field order, so just set the encode to use the same as source. Some tools will automatically convert field order, but that’s a visually lossy process and not needed.

If you’re encoding interlaced, you should have a way to play that back as interlaced to a display with an interlaced input. Even if it’s a flat panel that bobs to 60p, it’ll show this horrible behavior. A software player capable of bob can as well.

I recommend you encode a high-motion interlaced source in both correct and incorrect field order, so you’re able to verify your ability to detect field order mismatch.

Progressive Best Effort

If the source is 24p/25p/30p, you can encode it in a seminative fashion. However, most MPEG-2 delivery formats are strictly PAL or NTSC only, so you may need to switch back and forth:

•  24/25p can get sped up/slowed down 4 percent for format conversion.

•  24p for 30i gets 3:2 pulldown with field_repeat tags, to make for easier progressive playback. 24p for 720p gets 3:2 frame_repeat.

•  25p to NTSC should first be slowed down to 24p (well, 23.976p) and then have 3:2 pulldown applied.

•  25p gets encoded as 25p.

•  30p gets encoded as 30p.

•  60p for non-60p formats like DVD needs to get interlaced to 30i, or progressively encoded as 30p. The choice comes down to whether temporal or spatial quality is more important for the content.
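The 3:2 pulldown cadence mentioned in the list above can be sketched in a few lines of Python. This is only an illustration of the field pattern, not how an encoder stores it: as noted, MPEG-2 keeps the progressive frames and flags the repeats rather than coding duplicate fields. The function name and the simple 2-then-3 alternation are assumptions for the example.

```python
def pulldown_32(frames):
    """Map progressive frames to interlaced frames using a 2-3 field cadence."""
    fields = []
    for i, frame in enumerate(frames):
        repeat = 2 if i % 2 == 0 else 3  # alternate: 2 fields, then 3 fields
        fields += [frame] * repeat
    # Pair consecutive fields into interlaced frames (first field, second field).
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


print(pulldown_32(["A", "B", "C", "D"]))
# [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
```

Four film frames become five video frames, so 24 frames fill 30. Two of every five output frames mix fields from different source frames, which is why progressive playback is easier when the decoder can get back to the original 24p.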

Minimize Reference Frames

Remember the issue about how different decoders can have slightly different results from DCT/iDCT? That results in a slow drift away from what the content should look like, and the drift will be different in degree and appearance on different software.

You may have noticed how few reference frames MPEG typically has. For an NTSC DVD, there’s a maximum of 18 frames per GOP, and with an IBBP pattern, you only have six reference frames in a GOP, one I-frame and five P-frames: IBBPBBPBBPBBPBBPBB. So at most, one of the last B-frames would only have six reference frames back to the start of the GOP. This minimizes the potential for drift.
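A quick way to see how the GOP pattern changes the length of the prediction chain is to count the I- and P-frames. The sketch below is a rough illustration in Python; the helper names are hypothetical, and it simply treats every I- or P-frame as one link in the chain that drift can accumulate along.

```python
def build_gop(length: int, b_frames: int) -> str:
    """Build a display-order GOP: an I-frame, then runs of B-frames each followed by a P-frame."""
    gop = "I"
    while len(gop) < length:
        gop += "B" * b_frames + "P"
    return gop[:length]  # trim to the GOP length limit


def count_references(gop: str) -> int:
    """Count the frames other frames can predict from (I and P)."""
    return sum(1 for frame in gop if frame in "IP")


for b in (2, 1, 0):
    gop = build_gop(18, b)
    print(gop, count_references(gop))
# IBBPBBPBBPBBPBBPBB 6
# IBPBPBPBPBPBPBPBPB 9
# IPPPPPPPPPPPPPPPPP 18
```

For the same 18-frame GOP, dropping from two B-frames to one raises the chain from six references (the I-frame plus five P-frames) to nine, and an IP-only pattern makes every frame a reference.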

So, beware going to IBP or IP patterns, or using longer GOPs for software playback. While it might increase compression efficiency, and look fine in a player with an IEEE-compliant 64-bit float DCT, it may start working quite a bit worse by the end of the GOP on a cheap hardware player with a 32-bit float engine.

Minimum Bitrate

For most codecs, we don’t ever think about minimum bitrate being an issue. We want that to be as low as possible, to save bits for the hard parts. But the lack of deblocking for MPEG means that a VBR encode that targets a fixed QP can leave the easy video looking really bad. So most MPEG encoders include a minimum bitrate control that will keep QPs lower in the easy parts. In most cases, this doesn’t do padding, so once a minimum QP is hit, bitrate can fall below the minimum bitrate.

Personally, this seems like laziness to me. While QP isn’t a bad shorthand estimate for visual quality, it’s far from perfect, and rate control algorithms should be sophisticated enough to make sure that the minimum visual quality doesn’t drop too low. Modern codecs do this well, but it just hasn’t happened much with MPEG-1/2.

Preprocess with a Light Hand

Since most MPEG-2 is going back out to a video device, we’re not going to do a lot of the preprocessing steps we would when targeting computer or device playback:

•  No deinterlacing, since we can encode interlaced.

•  No cropping out safe area, since it could be played on an old CRT that needs safe area.

So, for clean source, we generally don’t do anything. There are a few things we do want to watch for:

•  Crop 720×486 to 720×480; don’t scale it.

•  If there’s edge noise, blank it out so it’s not annoying on PCs.

•  Consider cropping 720 to 704 and encoding as 704 if there’s horizontal blanking.

•  Noise reduction can help a lot in reducing QP with MPEG-2. But if you use it with interlaced content, make sure it’s a denoise filter that processes fields separately.
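For the 486-to-480 crop in the list above, one detail worth checking is how many lines come off the top. The sketch below rests on an assumption the text doesn't spell out: removing an odd number of lines from the top moves every remaining line into the other field, which swaps field order. The function name and the default 4-top/2-bottom split are illustrative, not a rule from any particular tool.

```python
def crop_offsets(src_h: int = 486, dst_h: int = 480, top: int = 4):
    """Return (top, bottom) rows to crop, rejecting top crops that would swap field parity."""
    extra = src_h - dst_h
    if not 0 <= top <= extra:
        raise ValueError(f"top crop must be between 0 and {extra}")
    if top % 2:
        # An odd top crop shifts each remaining line into the opposite field.
        raise ValueError("odd top crop would reverse field order")
    return top, extra - top


print(crop_offsets())       # (4, 2)
print(crop_offsets(top=0))  # (0, 6)
```

Either split keeps the original lines untouched, which is the point of cropping rather than scaling interlaced video.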

MPEG-2 Encoding Tools

There are zillions of MPEG-2 encoding tools available these days, and I won’t try to catalog them all. And many DVD authoring packages build in MPEG-2 encoding facilities. I’m covering only MPEG-2 encoders here, and will talk about the authoring tools in Chapter 24.

Canopus ProCoder

ProCoder was long my go-to tool for MPEG-2 authoring. It uses the great Canopus MPEG-2 encoding engine, which does a very nice job of keeping QPs down and avoiding blocking in shadow areas.

ProCoder’s MPEG-2 has a very cool “Mastering Mode” for good-but-slow encoding. Ignore the popup warning about being “up to 20x slower”; it’s still faster-than-real-time on a modern laptop. The codec itself is single-threaded, but ProCoder will do a “grid” encode where it splits up different sections of the video to different cores on the machine for full performance. This is great for CBR, but there is some loss of efficiency with 2-pass VBR encoding using this mechanism, since it can’t reallocate bits between chunks on different cores.

Some other cool features in its MPEG-2 support:

•  Built-in AC-3 encoding

•  Full VOB generation to build a disc image from a single source file

•  Automatic 480/486 and 704/352 conversion

•  24/25p speedup/slowdown

•  “24p” in 30i and 60p

•  Fixed quality plus VBV, allowing encoding to “good enough” quality while preserving compliance with DVD and other platforms

Rhozet Carbon Coder

Carbon is basically the big brother of ProCoder. It’s the same UI and engine, but more frequently updated and with lots of high-end features. It does everything ProCoder does plus these higher-end functions:

•  Integrated 5.1 AC-3 encoder

•  Built-in templates for CableLabs and MPEG-2

•  Presets for the High Profile production formats like XDCAM and P2

•  Presets for video server formats for Grass Valley, Omneon, Quantel, MediaStream, and others

MainConcept

MainConcept is broadly licensed to many companies for MPEG-2 and other formats. Among others, it’s included in Adobe’s video products.

It’s generally a good, fast encoder for SD. I have had trouble getting it to deliver ATSC-compliant VBV with high-motion 1080i content, however.

Apple’s MPEG-2

Apple has one of the most commonly used MPEG-2 encoders out there, included in iDVD, DVD Studio Pro, and Compressor.

Early versions had rather infamous video quality issues, particularly when VBR encoding. The VBR issues were resolved several years ago, and the encoder is now reasonably fast and reasonably good, but not a standout.

Using Apple’s QMaster grid computing system, Compressor is able to farm out MPEG-2 jobs across multiple computers. However, like the Carbon implementation, each node can only redistribute bits within its own chunk of the video, so as the number of nodes goes up, efficiency can go down.

HC Encoder

HC Encoder (HcEnc) is the most popular open source/freeware MPEG-2 encoder. It’s mainly focused on DVD authoring. As you’d expect, it has good integration with AVISynth and DGDecode for reencoding. I think ProCoder/Carbon do a better job in the low luma range.

CinemaCraft

CinemaCraft’s MPEG-2 is best known for high-touch DVD encoding, including segment reencoding, custom quant matrices, and N-pass encoding (up to 99, although 5 is the most I’ve ever seen make a difference).

It also includes a good inverse telecine analysis pass that makes for good “24p” encoding from less-than-perfect telecine sources.

Tutorial: Universally Compatible MPEG-1

Most of the time, MPEG-2 encodes are used on optical discs, so tutorials for those will be in that chapter. But let’s do a hands-on with that “Universally Compatible” MPEG-1 I mentioned before.

Scenario

Thumb drives that include marketing materials for a local digital theater chain are being handed out at an independent film industry event. The chain would like to take its shot-on-DV marketing piece and stick it on the thumb drive.

The theater owners got a deal on remaindered drives with their logo a few years ago, so they’re tiny 512 MB models.

The Three Questions

What Is My Content?

The source is a bog-standard DV codec file in 16:9, with interlaced video. (We’ll use our Lady Washington source clip on the DVD as a placeholder.) Its duration is 5:48.

Who Is My Audience?

These are indie film folks, so highly variable in how technical they are, and prone to using older machines or less mainstream systems like Linux. So they need to be able to stick in the thumb drive and have a file they can double-click.

What Are My Communication Goals?

The file should be easily discovered, play on even oddball machines, and look reasonably good. Unfortunately, the thumb drives are small and will have a lot of other materials on them. We only have 120 MB left for our video file.

Tech Specs

So, what do we want to deliver? That’s not a lot of bits:

(120 MB × 8000 kilobits per MB) ÷ 348 seconds ≈ 2758 Kbps total

We want the audio to be decent, so will allocate 224 Kbps, leaving 2534 Kbps for video. Not terrible, but not great.
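The budget arithmetic can be sketched as follows. This is a small Python check, assuming decimal units (1 MB = 1000 KB, 1 Kbps = 1000 bits per second), which is what reproduces the 2534 Kbps figure; the function name is made up for the example.

```python
def total_kbps(size_mb: float, duration_s: int) -> float:
    """Average bitrate in Kbps that fills size_mb megabytes over duration_s seconds."""
    kilobits = size_mb * 1000 * 8  # MB -> KB -> kilobits (decimal units)
    return kilobits / duration_s


duration_s = 5 * 60 + 48          # 5:48 is 348 seconds
total = total_kbps(120, duration_s)
video = total - 224               # subtract the audio allocation

print(int(total), int(video))     # 2758 2534
```

Using binary megabytes instead would give a slightly higher number, so the decimal version is the more conservative budget.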

So, our tech specs will look like this:

•  MPEG-1 file

•  640 × 352. 16:9 would be 640 × 360, but macroblock alignment will give us an efficiency boost. It’s only a 2 percent distortion and so won’t be noticeable.

•  29.97 fps progressive.

•  224 Kbps 44.1 kHz stereo audio.

•  2500 Kbps video (rounding down a little from 2534 to give us some just-in-case headroom).

•  Peak bitrate of 4000 Kbps. Those old USB keys can be a little slow sometimes. 4000 Kbps is well under the USB 1.0 12 Mbps spec.
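Two of the numbers in this spec are easy to sanity-check: the macroblock-aligned height and the resulting file size. The Python sketch below assumes 16 × 16 macroblocks, decimal megabytes, and no container overhead, so treat the size as a lower bound; the helper names are hypothetical.

```python
def aligned_height(width: int, aspect_w: int = 16, aspect_h: int = 9, block: int = 16):
    """Return (exact height, height rounded down to a multiple of the macroblock size)."""
    exact = width * aspect_h / aspect_w
    aligned = int(exact) // block * block
    return exact, aligned


def file_size_mb(video_kbps: float, audio_kbps: float, duration_s: int) -> float:
    """Estimated file size in MB, ignoring mux overhead."""
    return (video_kbps + audio_kbps) * duration_s / 8 / 1000


exact, aligned = aligned_height(640)
print(exact, aligned)                       # 360.0 352
print(f"{1 - aligned / exact:.1%}")         # 2.2%
print(round(file_size_mb(2500, 224, 348), 1))  # 118.5
```

At 2500 Kbps video plus 224 Kbps audio, the 5:48 clip comes in around 118.5 MB, which leaves a little room under the 120 MB limit for the system stream overhead.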

Encoding in Canopus Procoder/Rhozet Carbon Coder

I’ve been a happy user of the Canopus MPEG-1/2 encoder for many years. It’s not super-configurable, but does a good job of avoiding blockies at lower bitrates compared to the other encoders I’ve used.

Rhozet Carbon Coder (which uses the Procoder code) uses implicit preprocessing, so I don’t need to do much with the source other than add it. It’ll even know not to letterbox my source when going to 640 × 352, since that distortion falls beneath the default 5 percent aspect ratio distortion threshold.

I can just add an MPEG target and set the following parameters (also shown in Figure 9.2):

•  Stream Type = MPEG-1 System Stream

•  Width = 640

•  Height = 352

•  Aspect Ratio Code = VGA (this just means “square pixel”)

•  Quality/Speed = Mastering Quality (better than “Highest Quality,” albeit logically questionable)

•  Bitrate Type = VBR

•  Number of Passes = 2 passes (much better rate allocation)

•  Video Bitrate = 2500

•  Max Bitrate = 4000

•  Min Bitrate = 500 (make sure the easy parts don’t get too blocky)

•  Sample Rate = 44.1

Figure 9.2 Parameters for universally compatible MPEG-1 in Carbon Coder.
