CHAPTER 7
Using Video Codecs

You’ve probably noticed that there are roughly six bazillion video codecs out there. While the rest of this book offers specifics on how to use each of those individually, there are many aspects of video codecs and compression we can investigate in general terms. Be aware that many of these technologies use different terms to describe the same feature, and sometimes call different features by the same name.

For this chapter, I’m speaking of a codec as a particular encoder implementation of a particular bitstream (dubbed an “encodec” in a memorable typo on Doom9). So, Main Concept H.264 and x264 are different codecs that both make H.264 bitstreams. So even when you know the bitstream you need to make, there are still plenty of choices between implementations of the major codec standards.

Bitstream

The most basic codec setting is which codec gets used in the first place. Some formats and platforms, such as MPEG-1, offer but one codec. QuickTime, at the other extreme, is a container that can hold any of literally dozens of codecs, most of which are wildly inappropriate for any given project.

Picking the right codec is of critical importance. You’ll find details on each codec’s strengths and weaknesses in the chapters devoted to each format. When evaluating new codecs, compare their features to those described in the following to get a sense of where they fit into the codec ecology.

Profiles and Levels

Profile and Level define the technical constraints on the encode and the features needed in the decoder. They’re generally described together as profile@level, for example, “H.264 High Profile @ Level 2.1.”

Profile

A profile defines the basic set of tools that can be used in the bitstream. This is to constrain decoder requirements; the standardized codecs have a variety of profiles tuned for different complexities of devices and decoder scenarios. For example, H.264 has a Baseline Profile for playback on low-power devices, which leaves out features that would take a lot of battery life, a lot of RAM, or a lot of silicon to implement. Conversely, PCs and consumer devices with a plug instead of a battery typically use High Profile, which requires more watts/pixel but fewer bits/pixel. And MPEG-2 has a Main Profile used for content delivery with 4:2:0 color and a High Profile used for content authoring that does 4:2:2.

Profiles define the maximum set of tools that can be used on encode, and the minimum that need to be supported on decode. For example, a Main Profile H.264 encoder adds CABAC entropy coding and B-frames (described in the chapter on H.264). QuickTime’s H.264 exporter doesn’t use CABAC or multiple reference frames, but can use B-frames. Since it doesn’t use any tools unavailable in Main Profile (most notably the 8 × 8 block size, a High Profile feature), its output is still Main Profile legal. Conversely, any Main Profile decoder must support all of the tools allowed in Main (and QuickTime can play back Main and High Profile content using features QuickTime doesn’t export).

Particular devices often have additional constraints beyond the published Profiles. For example, many MPEG-4 pt. 2 encoders and decoders use “Simple Profile + B-Frames,” which would normally be Advanced Simple Profile (ASP), because the other tools in ASP are of questionable real-world utility but add a lot of decoder complexity.

Level

Level defines particular constraints within the profile, like frame size, frame rate, data rate, and buffer size. These are often defined somewhat esoterically, like in bytes of video buffering verifier (VBV) and total macroblocks per second. I provide tables of more applicable definitions for each level in the chapters covering the various codecs.
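
To make those units a bit more concrete, here’s a rough sketch of the macroblocks-per-second arithmetic for a 16 × 16 macroblock codec (H.264/MPEG-2 style); the Level 3.1 figure in the comment is cited only for illustration, and the real limits are in the per-codec chapters.

# Rough sketch: how "macroblocks per second" relates to frame size and frame rate.
def macroblocks_per_second(width, height, fps):
    mbs_per_frame = (width // 16) * (height // 16)  # macroblocks in one frame
    return mbs_per_frame * fps

# 1280 x 720 at 30 fps = 3,600 macroblocks/frame x 30 = 108,000 MB/s,
# which happens to match the throughput limit of H.264 Level 3.1.
print(macroblocks_per_second(1280, 720, 30))  # 108000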

Like Profile, Level is a maximum constraint on encoder and minimum constraint on decoder. Typically the maximum bitrate allowed by a Level is much higher than you’d use in practice, and many devices have further constraints that must be kept to.

Encoders targeting the PC market, and decoders in it, tend not to focus on Level. With software decode, even lower-resolution clips can be hard to decode if the bitrate is very high; thus a Level by itself is generally insufficient detail when specifying for software playback.

Data Rates

Data rate is typically the single most critical parameter when compressing, or at least when compression is hard.

Modern formats measure bitrate in kilobits per second (Kbps) or megabits per second (Mbps). I think we’ve purged all the tools that used to think that K = 1024 (see sidebar), so this should be reasonably universal now.

For lots of content, bitrate is the fundamental constraint, particularly for any kind of network-based delivery. You want to use the fewest bits that can deliver the experience you need; spending bits past the point at which they improve the experience is just increasing costs and connectivity requirements.

Within a codec’s sweet spot, quality improves with increased data rate. Beyond a certain point (at the same resolution, frame rate, and so on), increasing the data rate doesn’t appreciably improve quality. The codec or its profile@level may also have a fixed maximum bitrate. Below a certain data rate (again, with all parameters fixed), the codec won’t be able to encode all the frames at the requested bitrate; either frames are dropped (better) or the file just comes out bigger than requested (worse). And some codecs also reach a point where slight drops in data rate lead to catastrophic drops in quality. MPEG-1 is especially prone to this kind of quality floor, as it lacks a deblocking filter.

Compression Efficiency

Compression efficiency is the key competitive feature between codecs and bitstreams. When people talk about a “best codec,” they’re really talking about the “most efficient” one. When a codec is described as “20 percent better than X,” this means it can deliver equivalent quality at a 20 percent lower data rate, even if it’s actually used to improve quality at the original data rate.

Compression efficiency determines how few bits are needed to achieve “good enough” quality in different circumstances. Codecs vary wildly in typical compression efficiency, with modern codecs and bitstreams being able to achieve the same quality at a fraction of the data rates required by older codecs and bitstreams (today’s best video and audio codecs are easily 10x as efficient as the first). And different codecs have different advantages and disadvantages in compression efficiency with different kinds of sources and at different data rates. The biggest differences are in the range of “good enough” quality—the higher the target quality, the smaller the differences in compression efficiency. H.264 may only need one-third the bitrate of MPEG-2 to look good at low bitrates, but it’ll have less of a relative advantage at Blu-ray rates.

Figure 7.1 16 years of compression efficiency on display right here. Both are 800 Kbps. (A) The ancient Apple Video codec. 5-bit per channel RGB and no rate control! Could only get down to 800 Kbps at 160 × 120. (B) H.264 High Profile via a quality-optimized 3-pass x264 encode. Nearly transparent compression at 640 × 480 in 800 Kbps.


Some authoring and special-use codecs don’t offer a data rate control at all, either because the data rate is fixed (as in DV) or because it is solely determined by image complexity (like PNG, Lagarith, and Cineform).

A Plea for Common Terminology

The computer industry got into a bad habit a few decades ago. Back in 1875, the metric system was fully codified, and it defined the common kilo-, mega-, and giga- prefixes, each 1000x greater than the one before. These are power-of-ten numbers, and thus can be written in scientific notation. So, 2 gigabytes (2 GB) would be 2 × 10^9, the 9 indicating that there are nine zeros.

However, computer technology is based on binary calculation and hence uses power-of-two numbers. Ten binary digits (1 × 2^10) is 1024, very close to three decimal digits (1 × 10^3 = 1000). And so computer folks started calling 1024 “kilo.” And then extended that to mega, giga, tera, and on to peta and so on.

But that slight difference may not be so slight, particularly when it’s the difference between “just fits on the disc” and “doesn’t fit on the disc.” So there is now a new nomenclature for the power-of-two values, sticking a “bi” into each prefix (kibi, mebi, gibi, abbreviated Ki, Mi, Gi). Thus:

•  Computer industry numbers

•  “K” = Ki = 2^10 = 1024

•  “M” = Mi = 2^20 = 1,048,576

•  “G” = Gi = 2^30 = 1,073,741,824

•  Correct numbers

•  K = 10^3 = 1000

•  M = 10^6 = 1,000,000

•  G = 10^9 = 1,000,000,000

•  Difference between values

•  K v. Ki = 2.4 percent

•  M v. Mi = 4.8 percent

•  G v. Gi = 7.37 percent
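
A quick sketch of the arithmetic behind those percentages:

# Decimal (SI) vs. binary prefixes, and the growing gap between them.
for name, power in (("K", 1), ("M", 2), ("G", 3)):
    decimal = 1000 ** power   # what the metric system means
    binary = 1024 ** power    # what the computer industry often means
    gap = (binary / decimal - 1) * 100
    print(f"{name}: {decimal:>13,} vs {binary:>13,}  ({gap:.2f} percent)")
# Prints roughly: K 2.40 percent, M 4.86 percent, G 7.37 percent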

Rate Control

All that 1-pass, 2-pass, CBR, VBR terminology describes data rate modes. These represent one of the key areas of terminology confusion, with the same feature appearing under different names, and different features being given identical names. We can classify data rate control modes in a few general categories.

VBR and CBR

The variable bitrate (VBR) versus constant bitrate (CBR) distinction can be confusing. All interframe compressed video codecs are variable in the sense that not every frame uses the same number of bits as every other frame. This is a good thing—if every frame were the same size, keyframes would be terrible compared to delta frames. Even codecs labeled “CBR” can vary data rate quite a bit throughout the file. The only true CBR video codecs are some fixed frame-size authoring codecs like DV. But even there, the trend for solid-state capture is to use variable frame sizes to make sure hard frames get enough bits.

So, what’s the difference here? From a high level, a CBR codec will vary quality in order to maintain bitrate, and a VBR codec will vary bitrate in order to maintain quality.

VBV: The fundamental constraint

Another way to think about CBR and VBR is how the decoder does, instead of how the encoder does. What the decoder really cares about is getting new video data fast enough that it’s able to have a frame ready by the time that frame needs to be displayed, but doesn’t get so much video data that it can’t store all the frames it hasn’t decoded yet. So the two things that can go wrong for the decoder are a buffer underrun, when it runs out of bits in its buffer and so doesn’t have a frame it can decode by the time the next frame is supposed to display, and a buffer overrun, when it has received so many bits that it can’t store them all before they’re needed for display.

Thus, every codec standard defines a video buffering verifier (VBV) that specifies what a decoder has to be able to handle for a particular profile@level, and hence how much variability the encoder is allowed to produce.

So, really, the core issue is the VBV; an encoder can vary allocation of bits between frames all it wants to as long as it doesn’t violate the VBV. The VBV defines how many bytes can be in the pipeline. The VBV is thus the same as the bitrate (or peak bitrate) times the buffer duration. So, a 4-second buffer at 1000 Kbps would be 4000 Kbits, or 500,000 bytes.
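
That calculation is simple enough to sketch in a couple of lines (assuming, per the sidebar, that Kbps means 1,000 bits per second):

def vbv_bytes(peak_kbps, buffer_seconds):
    # VBV size = peak bitrate x buffer duration, converted to bytes
    bits = peak_kbps * 1000 * buffer_seconds
    return bits / 8

print(vbv_bytes(1000, 4))  # 500000.0 bytes for a 4-second buffer at 1000 Kbps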

And, from that understanding, we can think of a CBR as encoded to maximize use of the VBV, while a VBR encode is one where the average bitrate can be lower. The highest bitrate a VBR encode can go up to is the same as a CBR encode, and looks very much like a CBR encode for that section. With a five-second buffer, the data rate for any five seconds of the video must not be higher than the requested average. This isn’t a question of multiple, discrete five-second blocks, but that any arbitrary five seconds plucked at random from the file must be at or under the target data rate.
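
One way to picture the “any arbitrary five seconds” rule is as a sliding-window check over per-frame sizes. This is just an illustrative sketch at frame granularity, not how any real encoder implements its rate control:

def violates_buffer(frame_sizes_bits, fps, window_seconds, max_kbps):
    """Return True if any window_seconds-long span exceeds its bit budget."""
    window_frames = int(window_seconds * fps)
    budget = max_kbps * 1000 * window_seconds  # bits allowed in any window
    for start in range(0, len(frame_sizes_bits) - window_frames + 1):
        if sum(frame_sizes_bits[start:start + window_frames]) > budget:
            return True
    return False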

CBR for streaming

The name “streaming” is a good one—a stream is a constant flow of water. But even though the bitrate is nominally constant, it isn’t always possible for the encoder to use every last bit. If the video fades to black for a few seconds, there simply isn’t enough detail to spend the bits on. For web streaming applications, fewer bits get sent; no biggie. Fixed-bandwidth broadcast like ATSC actually needs to use the full bandwidth, and will introduce “padding bits” of null data.

That said, adaptive streaming techniques can support much more variability, since it’s the size per chunk that matters, not size per stream. A number of products support the new “VBR VC-1” Smooth Streaming SDK from Microsoft, where the right “stream” for the client’s bitrate gets assembled out of the available chunks from different encoded bitrates. So even though the encoded files are VBR, the client is receiving data at an essentially constant rate.

VBR for download

We generally use VBR for video that’s going to be downloaded or stored on disc. The fundamental definition of VBR is that the average bitrate is less than the peak bitrate, while the two are identical with CBR. Thus, for a VBR encode, we specify the average bitrate as well as the buffer (either via the VBV size or by defining peak bitrate and buffer duration).

The goal of a VBR encode is to provide better overall compression efficiency by reducing bitrate and “surplus” quality in the easier portions of the video and spending those bits in the more complex portions.

Depending on format, it’s perfectly possible to use VBR files for streaming. This is sometimes used as a cost-saving measure in order to reduce total bandwidth consumption. The peak bandwidth requirements would be the same as a CBR file with an identical peak and buffer. For example, encoding with a 1500 Kbps peak and a 1000 Kbps average would still require 1500 Kbps for playback, but would save a third on per-GB delivery costs.
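
The delivery savings are straightforward to work out, since total bytes delivered depend on the average bitrate rather than the peak; a quick sketch:

def delivered_gigabytes(avg_kbps, duration_minutes):
    # Total bytes moved depend on the average bitrate, not the peak.
    return avg_kbps * 1000 / 8 * duration_minutes * 60 / 1e9

cbr = delivered_gigabytes(1500, 60)   # 1500 Kbps CBR, one hour: 0.675 GB
vbr = delivered_gigabytes(1000, 60)   # 1000 Kbps average VBR, same hour: 0.45 GB
print(cbr, vbr, 1 - vbr / cbr)        # VBR delivers a third fewer bytes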

The average and peak rates are really independent axes. Average bitrate gets down to how big a file you want, and the peak is based on how much CPU you need to play it back. Take a DVD with video at a pretty typical 5 Mbps average and 9 Mbps peak. If a bunch of content gets added, meaning more minutes need to be stuffed into the same space, the average bitrate may need to be dropped to 4 Mbps, but the 9 Mbps maximum peak will stay the same. Conversely, if the project shifts from replication to DVD-R without any change in content, the peak bitrate may be dropped to 6.5 Mbps for better compatibility with old DVD players, but the average wouldn’t need to change.

Quality-limited VBR

A quality-limited VBR is a special case of VBR where data rate isn’t controlled at all; each frame gets as many bits as it needs in order to hit the quality target. In some cases, this can be a fixed-QP encode. Other modes can be a little more sophisticated in targeting a constant perceptual quality, for example by allowing B-frames to have a higher QP. In those cases, you can think of a bitrate-limited VBR as basically a mechanism to find the QP that’ll give the desired file size.

Some codecs, more common in audio than video, offer a sort of hybrid quality/bitrate VBR mode with an overall target bitrate and target quality, and the actual bitrate per file will go up and down within a band to hit that quality. In video codecs, this is supported in x264’s CRF mode, and in ProCoder/Carbon’s quality-with-VBR MPEG-2 mode.

Figure 7.2 These graphs all aim to show the relationship between two values over time: data rate and the quantization parameter (QP). As you may remember from Chapter 3, a higher QP drives a higher scale factor, and hence more compression and lower quality. So a higher QP means lower quality and a lower QP means higher quality. (A) Three CBR encodes with a 4-second buffer. As we’d expect from a constant bitrate encode, the general pattern of data rate and QP is similar across all three encodes, with the lower bitrate having a higher QP. There’s some variability in data rate, of course, although it follows the VBV constraint accurately. There’s lots more variation in QP, since harder scenes can’t spend more bits and thus look a lot worse. (B) Three CBR encodes with the same bitrate, with a 1-, 4-, and 8-second buffer. The larger buffer results in a little more variability in bitrate (higher spikes) and a slight reduction in variability of QP (fewer high-QP frames, since bits get redistributed by raising QP on easy frames). (C) Three 500 Kbps encodes with different peaks: one CBR (so 500 Kbps peak), one VBR at 750 Kbps peak, and the last VBR with a 1500 Kbps peak. As bitrate variability increases, we see bigger changes in bitrate, but much smaller changes in QP. The highest QPs are a lot lower with even a 750 Kbps peak, and get almost flat at 1500 Kbps. (D) Three encodes with the same peak bitrate of 750 Kbps, but different averages of 250, 500, and 750. As you’d expect, the CBR encode has a flat bitrate and a pretty variable QP, always lower than the streams with a lower average. But the 500 and 750 Kbps streams match closely in the hardest parts of the video, where they both use the full 750 Kbps peak and hit the same QP. The CBR is able to make the easier frames better. The 250 Kbps stream has a pretty consistent QP and a more variable data rate; the big ratio between peak and average lets it act almost like a fixed-quality encode. (E) Varying the duration of the VBR buffer has a less dramatic effect than increasing the peak, but it’s still very helpful. The larger buffers result in peak QPs getting a lot lower.


1-Pass versus 2-Pass (and 3-Pass?)

1-Pass

By default, most codecs function in 1-pass mode, which means compression occurs as the file is being read (although generally with a buffer the same size as the VBV). Obviously, any live encoding will require 1-pass encoding.

The limitation of traditional 1-pass codecs is that they have no foreknowledge of how complex future content is and thus how to optimally set frame types and distribute bits. This is generally less challenging for CBR encodes, since there is limited ability to redistribute bits anyway, but is a bigger deal for VBR. Although a lot of work has gone into finding optimal data rate allocation algorithms for 1-pass codecs, there will always be files on which different implementations will guess wrong, either over-allocating or under-allocating bits. An overallocation can be especially bad, because when the data rates go up above the average they will have to eventually go below the average by the same amount. If this happens when the video is getting still more complex, quality can get dramatically awful.

1-Pass with lookahead

Some codecs are able to use “lookahead” techniques to provide some of the quality advantages of 2-pass encoding in a 1-pass encode. For example, most VC-1 implementations are able to buffer up to 16 frames to analyze for scene changes, fades to/from black, and flash frames, and then set each frame to the optimum mode. Some also support lookahead rate control, where bitrate itself is tuned based on future frames in the buffer. Telestream’s Episode is actually doing 1-pass with lookahead in its “two-pass” modes, with the final encode pass following up to 500 frames behind the first. This is an excellent use of highly multicore systems, since different cores can work on the lookahead and the primary encode at the same time.

2-Pass

2-pass codecs first do an analysis pass, where they essentially do a scratch encode to figure out how hard each frame is. Once the entire file has been analyzed, the codec calculates an optimal bit budget to achieve the highest possible average quality over the file. In essence, this lets the codec see into the future, so it always knows when and how much to vary the data rate. 2-pass compression can yield substantial improvements in compression efficiency for VBR encodes, easily 50 percent with highly variable content. For example, it knows how many bits it can save from the end credits, and can apply them to the opening title sequence.

2-pass is less helpful for CBR in general, but can vary quite a bit by codec type. Quite a few codecs don’t offer a 2-pass CBR at all, reserving 2-pass for VBR modes. Going to 2-pass doesn’t always mean that compression time will be doubled. Many codecs use a much quicker analysis mode on the first pass.

3-Pass and beyond

A few encoders, including x264 and CinemaCraft, offer a third or even more passes. Much as the first pass is a temp encode that gets refined, the third pass is able to take the output of the previous pass and refine it further. Most of the value of multipass is achieved in the second pass, but for highly variable content or a big average/peak ratio, a third pass sometimes helps further.

Segment re-encoding

Segment re-encoding is when the encoder is able to re-encode just specific sections of the video, leaving the others alone. This is sometimes called a third pass, but there’s no reason why there couldn’t be one or three passes first, or why the tweaking would take only one additional pass. A few encoders can do this automatically, like QuickTime’s H.264 encoder.

More commonly, segment re-encoding is a manual process, with a compressionist picking particular shots that aren’t quite perfect and adjusting encoding settings for them. This is the domain of high-end, high-touch compression products typically targeting Hollywood-grade optical discs, like CineVision PSE and CinemaCraft. The main product targeting streaming and other formats with segment re-encoding is Inlet’s Fathom. And it is glorious.

It can also be a real rathole of time, since there’s no clear definition of “done.” The highest-paid compressionists do this kind of work for Hollywood DVD and Blu-ray titles.

Frame Size

While not typically listed as a codec setting, resolution is one of the most important parameters to consider when choosing a codec. With a particular source and frame size, a given codec needs a certain number of bits to provide adequate quality, so there is an important balancing act between resolution and bitrate.

Some codecs can use any resolution. Others may require each axis be divisible by two, by four, or by 16 (and some decoders offer better performance with Mod16 even if it’s not strictly required). Others may have maximum resolutions, or support only one or a few resolutions.

Note the relationship between frame size and bitrate isn’t linear. < Math Warning! > There’s an old rule of thumb called the “Power of 0.75” that says data rate needs to be changed by the power of 0.75 of the relative change in frame size. By example, assume a video where 640 × 360 looks good at 1000 Kbps, and a 1280 × 720 version is needed. There are four times as many pixels in the new frame: (1280 × 720)/(640 × 360) = 4. And 4^0.75 = 2.828. Times the 1000 Kbps, that suggests 2828 Kbps would be about right for the same quality. This is a rule of thumb, and will vary in practice depending on how much detail is in the source.
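
Here’s that rule of thumb as a small function, using the 640 × 360 at 1000 Kbps example above as the reference point:

def scaled_bitrate(ref_kbps, ref_w, ref_h, new_w, new_h, exponent=0.75):
    # "Power of 0.75" rule of thumb: bitrate scales with the pixel-count
    # ratio raised to the 0.75 power, not linearly.
    pixel_ratio = (new_w * new_h) / (ref_w * ref_h)
    return ref_kbps * pixel_ratio ** exponent

print(round(scaled_bitrate(1000, 640, 360, 1280, 720)))  # ~2828 Kbps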

That’s a handy rule when figuring out the frame sizes for the different bitrates when doing multiple bitrate encoding. It’s also why my most used apps in compression are probably Excel and Calc.exe.

Aspect Ratio/Pixel Shape

Traditionally, web video always used square pixels, and hence the aspect ratio is solely determined by the frame size. But that’s not a hard-and-fast rule, and there’s a place for doing anamorphic encoding. Here are the times I consider using anamorphic:

•  When the output formats require it. There’s no square-pixel mode allowed for DVD.

•  When I’m limited by the source resolution. If I’ve got enough bitrate for 1080p, but my source is an anamorphic 1440 × 1080p24, I’ll encode matching that frame size. There’s no point in encoding more pixels than are in the source!

•  When motion is very strongly along one axis. For example, when we did the NCAA March Madness event with CBS Sports, we found that the motion during the games was very strongly biased along the horizontal. By squeezing the video to 75 percent of the original width (like 480 × 360 instead of 640 × 360) we were able to get better overall compression efficiency.

Square-pixel is slightly more efficient to encode, so it’s the right default when there’s not an obvious reason to do something else.

Bit Depth and Color Space

Modern codecs in common use for content delivery are all 8-bit per channel. While H.264 has potentially interesting 10- and 12-bit modes, neither is supported by existing CE devices and standards. 10-bit would mainly be used in making an archival or mezzanine file for later processing, using a codec like Cineform, DNxHD, or ProRes.

Most delivery codecs are 4:2:0 as well, with the potential future exceptions being the same H.264 High Profile variants mentioned above. Similarly, 4:2:2 would mainly be used in an intermediate/archive/mezzanine file, particularly with interlaced source.

Frame Rate

Modern formats let you specify duration per frame, changing frame rate on the fly. Some older formats like VideoCD only offer a specific frame rate, and MPEG-1 and MPEG-2 have a limited number of options (the native frame rates for PAL, NTSC, and film).

More modern formats like MPEG-4 allow variable frame rates as well, where the timebase itself changes between parts of the file. However, most tools don’t support that directly.

In general, you want to use the same frame rate as the source. For anything other than motion graphics or screen recordings, less than 24 fps starts looking less like a moving image and more like a really fast slideshow. Most web and device video thus winds up in the 24–30 fps range.

However, for PC playback, we’ve now got PCs fast enough to do 50/60p playback well, which can deliver a sense of immediacy and vibrancy well beyond what 30p can do. So when you’ve got high-rate progressive content and a reasonable amount of bandwidth, consider using the full frame rate. And with interlaced sources, it’s possible to use a bob technique (described in the previous chapter) to encode one frame out of each field, so taking 25i to 50p and 30i to 60p. That’s a lot more effort in preprocessing, but can be worth it for sports and other content with fast motion.

Interestingly, the bigger the frame size, the more sensitive we are to frame rate. So even if an iPod could play back 60p, it wouldn’t have that much of an impact on the experience. But stick 60p on an HD display, and it can be breathtaking.

Frame rate has a less linear impact on bitrate than you might imagine, and less even than frame size. When the encoded frames are closer in time, there’s less time for the image to change between frames, and thus interframe encoding can be a lot more efficient. Also, the less time each frame is on the screen, the less time for any artifact or noise to be seen; they average out more, letting the underlying video shine through better.

Thus, it’s only at really low bitrates where I’d even consider dropping frame rate below 24 fps; south of 300 Kbps for modern codecs. It’s almost always better bang for the bit to reduce the frame size first, down to 320 × 240 or less, before reducing frame rate below 24 fps.

When reducing frame rate, the central rule is to only divide it by a whole number, as discussed in preprocessing. But it’s such an important point (on the list of things for which I will mock you in public, it’s second only to encoding interlaced as progressive and getting the aspect ratio wrong) that here’s that table again.

Table 7.1 Acceptable Frame Rates.

Frame Rate    Full      Half      Third     Quarter   Fifth
23.976        23.976    11.988    7.992     5.994     4.795
24.000        24.000    12.000    8.000     6.000     4.800
25.000        25.000    12.500    8.333     6.250     5.000
29.970        29.970    14.985    9.990     7.493     5.994
30.000        30.000    15.000    10.000    7.500     6.000
50.000        50.000    25.000    16.667    12.500    10.000
59.940        59.940    29.970    19.980    14.985    11.988
60.000        60.000    30.000    20.000    15.000    12.000
120.000       120.000   60.000    40.000    30.000    24.000

Even the simple difference between 23.976 and 24 matters; getting the frame rate off by that 1000/1001 difference means that one of a thousand frames will be dropped or duplicated. That can absolutely be noticed if there’s motion at that moment, and that’ll happen about every 41 seconds.
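
The arithmetic behind that 41 seconds:

# Playing 23.976 (24000/1001) fps content on a 24.000 fps timeline means
# one frame in roughly a thousand must be dropped or duplicated.
mismatch_interval_frames = 1 / (1 - (24000 / 1001) / 24)   # ~1001 frames
mismatch_interval_seconds = mismatch_interval_frames / 24  # ~41.7 seconds
print(mismatch_interval_seconds)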

Keyframe Rate/GOP Length

Virtually all codecs let you specify the frequency of keyframes, also called the GOP length. In most cases, the parameter is really “keyframe at least every” instead of “keyframe every.” The codec will also insert natural keyframes at cuts and this resets the “counter” for keyframe every. For example, having a keyframe rate of “every 100” normally doesn’t mean you’ll get a keyframe at frames 1, 101, 201, 301, and so on. In this case, if you had scene changes triggering natural keyframes at 30 and 140, you’d get keyframes at 1, 30, 130, 140, 240, and so on.
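
Here’s a tiny sketch of that “keyframe at least every N, reset on natural keyframes” behavior; it just reproduces the bookkeeping described above, not any particular encoder’s logic:

def keyframe_positions(total_frames, max_interval, scene_changes):
    # Scene changes force a natural keyframe and reset the "keyframe every" counter.
    scene_changes = set(scene_changes)
    keyframes, last_key = [], None
    for frame in range(1, total_frames + 1):
        if last_key is None or frame in scene_changes or frame - last_key >= max_interval:
            keyframes.append(frame)
            last_key = frame
    return keyframes

print(keyframe_positions(300, 100, [30, 140]))  # [1, 30, 130, 140, 240]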

This is the optimal behavior. The main use of keyframes is to ensure the bitstream has sufficient keyframes to support random access, and to keep playback from being disrupted for too long after a dropped frame or other hiccup.

The drawback of keyframes is that they don’t compress as efficiently as delta frames. By sticking them at scene changes which need to be intra coded anyway, we wind up not wasting any extra bits.

One concern with too-frequent fixed keyframes is “keyframe strobing,” where the visual quality of a keyframe is different from that of the surrounding delta frames. With a short GOP, a regular pulsing can be quite visible and annoying. Modern codecs have made great strides in reducing strobing, particularly when doing 2-pass, lookahead, or VBR encoding. Open GOP also reduces keyframe strobing.

Typical keyframe values vary wildly between formats and uses. In intraframe-only authoring codecs, every frame is a keyframe. MPEG-1 and MPEG-2 typically use a keyframe every half second. This is partly because those formats are typically used in environments very sensitive to latency (set-top boxes and disc-based media), and because they have a little bit of potential drift every frame due to variability in the precision of the inverse DCT, which means you don’t want too long a chain of reference frames.

For the web, the GOP lengths are typically 1–10 seconds. Beyond that, the cost to random access is high, but the compression efficiency gains aren’t significant. However, if keyframe flashing is a significant problem, reducing the number of keyframes will make the flashes less common. Generally the GOP length goes down as the bitrate goes up, since the efficiency hit on I-frames is lower, but the cost of decoding a frame for random access is higher. Note that the random access hit is proportional to the number of reference frames per GOP, so B-frames don’t count as part of that.

Inserted Keyframes

A few tools let you manually specify particular frames to be keyframes. Like natural keyframes, inserted keyframes typically reset the GOP length target.

This was critical when Cinepak was the dominant codec, as its automatic keyframe detection was so lousy. Modern codecs are a lot better at inserting keyframes automatically, so you should only do manual keyframing if there’s a particular problem you’re trying to address. For example, a slow dissolve might not have any single frame that’s different enough from the previous one to trigger a keyframe, so you can make the first full frame after the dissolve a keyframe. More commonly, manual keyframes are used to ensure easy random access to a particular frame; many DVD authoring tools set chapter-marked frames as I-frames as well. The markers set in tools like Premiere and Final Cut Pro are used for that.

B-Frames

As discussed back in the compression fundamentals chapter, a B-frame is a bidirectional frame that can reference the previous and next I or P frame. These are available in the modern codecs, although off by default in some tools, particularly in older Windows Media encoders.

B-Frames normally improve quality for a few reasons:

•  Bidirectional references are more efficient, reducing bits needed on average substantially.

•  Since B-frames aren’t reference frames, they can be encoded with only enough bits to look good enough, without worrying about being a good reference for later frames.

•  The net effect is that bits saved on B-frames can be spent on I and P frames, raising the quality of the reference frames that the B-frames are based on in the first place.

Beyond compression efficiency, B-frames have some performance advantages as well.

•  Because no frames are based on them, B-frames can be dropped on playback under CPU stress without messing up future frames.

•  Because B-frames reduce the number of P-frames between I-frames, random access is faster as the number of reference frames needed to be decoded to skip to a particular frame goes down.

Open/Closed GOP

Closed GOP is one of those terms of art that seems so minor, but keeps coming up again and again.

The core idea is simple. In an Open GOP, the first frame of the GOP can actually be a B-frame, with the I-frame following. That B-frame can reference the last P-frame in the previous GOP, breaking the whole “each GOP is an independent set of frames” concept. But Open GOP improves compression efficiency a little, and can be a big help in reducing keyframe strobing, since that B-frame can be a mixture of the old and new GOP, and thus smoothes out the transition.

Normally Open GOP is fine, and is the default in many codecs and tools.

Figure 7.3 Closed and Open GOPs. An Open GOP starts with a B-frame that can reference the last P-frame of the previous GOP.


Minimum Frame Quality

Web-oriented codecs often expose a “Quality” control that is actually a spatial quality threshold. This threshold sets a maximum quantization (and thus minimum quality) allowed for any frame. However, since a bunch of big frames could overflow the VBV, the codec eventually will have to drop frames in order to maintain the minimum quality. The net effect of spatial quality threshold functions is that when the video gets more complex, the quality of each frame remains high, but the frame rate could drop, sometimes precipitously.

Most users find irregular frame rates more irritating than a lower, but steady frame rate. For that reason, I normally set spatial quality thresholds to their absolute minimum. If I find the spatial quality isn’t high enough, I reduce the frame size, or in extremis, drop the frame rate in half. I’d rather watch a solid 15 fps than video bouncing between 30 fps and 7.5 fps.

Encoder Complexity

Many codecs offer options to tune for higher quality (at slower speed) or higher speed (at lower quality). The primary technique to speed up encoding is to reduce the scope and precision of the motion search. Depending on the content, the quality difference may be imperceptible or substantial; content with more motion yields more noticeable differences. The speed difference with a particular codec can also vary widely: 10x or even 100x between the absolute slowest and absolute fastest modes. However, complexity only impacts the codec’s portion of the compression process, not the source decode and preprocessing. So, if the bulk of the compression time is spent decoding AVCHD and doing a high-quality scale down to a portable screen size, the net impact of different codec modes on the final 320 × 176 encode can be small.

My philosophy is that I can buy more and faster computers, but I can’t buy users more bandwidth. Thus, I err on the side of slower, higher-quality modes. I certainly do comps in faster modes, to verify that the preprocessing is correct and the overall compression settings seem correct, and to test integration. But if I encode at a setting that might be leaving some pixels on the floor, I often wind up redoing it in the more complex mode “just to see if it would make a difference.” That said, it’s easy to get past the point of diminishing returns. Making the encode 3x slower than the default may yield an extra 15 percent efficiency. But it might take a further 5x slowdown to get that last 5 percent more efficiency. It’s only worth taking more time as long as it is yielding a visible improvement in quality.

When used for real-time encoding, codecs may also make trade-offs to internally optimize themselves for speed. Speed is much more critical in real-time compression, because if the encoder isn’t able to keep up, frames will be dropped on capture.

Different encoders can have wildly varying and large numbers of options, which I’ll talk about in their particular chapters.

Achieving Balanced Mediocrity with Video Compression

I introduced the concept of “balanced mediocrity” in Chapter 4. Balanced mediocrity is supposed to be an amusing name for a very serious concept: getting all the compression settings in the right balance. You’ll know when you’ve achieved the right balance when changing any parameter, in any direction, reduces the quality of the user’s experience. “Quality” in this case equates with “fitness for use.” A quality video compression is one that’s most fit for its intended purpose—“quality” exists only in context.

For example, take a file compressed at 30 fps. If 30 fps is the optimal value, raising it to 60 fps would mean the fewer bits per frame hurt the file more than having smoother motion helped it. Conversely, reducing the frame rate to 15 would hurt the perception of motion more than any improvement in per-frame image quality would.

Choosing a Codec

Picking a codec is so basic and so important, it can be difficult to conceptualize the trade-offs. Sometimes it doesn’t matter at all; only one is available for a device, or there is a clearly superior one. Other times there are several available bitstream choices or encoders for the right bitstream. There are a number of things you’re looking for in a codec, but they all revolve around solving the problem posed by our three questions:

1.  What is your content?

2.  What are your communication goals?

3.  Who’s your audience?

You want the codec that can do the best job at delivering your content to your audience, while meeting your communication goals.

The three big issues you face in picking a codec are compression efficiency, playback performance, and availability.

Compression efficiency

Codecs vary radically in compression efficiency. New ones are much, much better than old ones. For example, H.264 High Profile can achieve quality at 100 Kbps that Cinepak required more than 1,000 Kbps for. Some codecs work well across a wide range of data rates; others have a minimum floor below which results aren’t acceptable.

Playback performance

Playback performance determines how fast the playback computer needs to be to decode the video. The slower the decoder, the faster a computer is needed for playback, or the lower the resolution and frame rate that are acceptable. Playback performance is typically proportional to pixels per second: height × width × fps. Data rate can also have a significant effect. So, the higher the target resolution, frame rate, and data rate, the faster the playback computer will need to be. If it’s a cut scene in a game that requires a Core 2 Duo processor, 1080p VC-1 is easily handled. But if the job is playing back six different video streams at once in Silverlight on a 1.8 GHz single-core P4, they had better be pretty easy streams! Note that slower CPUs may allow less postprocessing in some players; hence a file may display at lower quality, but at full frame rate. B-frames may also be dropped on playback on slower machines. The worst case would be that only keyframes would be played.
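
As a rough sketch of that proportionality (the numbers are illustrative only; real decode cost also depends heavily on the codec, profile, and data rate):

def pixel_rate(width, height, fps):
    # Decode cost is roughly proportional to pixels per second.
    return width * height * fps

# 1080p24 pushes roughly 29x the pixels per second of a 320 x 176 at 30 fps portable encode.
print(pixel_rate(1920, 1080, 24) / pixel_rate(320, 176, 30))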

Availability

Great-looking content that can’t be seen doesn’t count. When you think of availability, the questions you should ask yourself are: What percentage of the audience already can play that format and codec? How many of the rest are willing to download it? Are a lot of potential viewers on managed corporate desktops where they can’t install anything?
