CHAPTER 6
Preprocessing

Preprocessing encompasses everything done to the original source between the time it’s handed to you and the time you hand it off to a codec in its final color space. Typical preprocessing operations include cropping, scaling, deinterlacing, aspect ratio correction, noise reduction, image level adjustment, and audio normalization. It’s during preprocessing that whatever the source is becomes optimized for its delivery target.

Some projects don’t need preprocessing at all; if the source is already 640 × 480 4:2:0, and that’s what it’ll be encoded to, there’s nothing that needs to be done. Other times the preprocessing is implicit in the process; converting from DV to DVD requires a conversion from 4:1:1 to 4:2:0, but that’s done silently in most workflows. And in still other instances (generally for challenging sources needed for critical projects), preprocessing is a big operation, involving dedicated tools and maybe even some plug-ins.

The more experience I get as a compressionist, the more of my keyboard-and-mouse time goes to preprocessing. In the end, codecs are a lot more straightforward and scientific; once you know the technical constraints for a particular project and the optimum settings to get there, there’s only so much room for further tuning.

But preprocessing is as much art as science; one can always spend longer to make things ever so slightly better. I might be able to get 80 percent of the way to perfect in 15 seconds with a few checkboxes, and then spend another week getting another 18 percent, with that last gap from 98 percent to 100 percent just out of reach.

And for those hard, important projects, preprocessing can actually take a whole lot more CPU time than the final encode will. Give me some noisy 1080i HDV source footage that’ll be seen by a million people, and my basement office will be kept nice and toasty all weekend with many CPU cores going full blast.

For a quick sample of how preprocessing matters, check out Color Figure C.18. You’ll see good preprocessing at 800 Kbps can soundly beat 1000 Kbps without preprocessing.

General Principles of Preprocessing

Sweat Interlaced Sources

If you’re not compressing for optical disc or broadcast, you’ll probably need to compress as progressive. If you have interlaced source, you’ve signed up for a lot of potential pain in massaging that into a high-quality progressive output. So when you see fields in your source, make sure to budget for that in your time and quality expectations.

Use Every Pixel

Every pixel that’ll be seen should carry real video content you want to deliver. Make sure anything in your source that’s not image (letterboxing, edge noise, tearing, Line 21) doesn’t make it into the final output frame.

Only Scale Down

Whenever you increase height or width beyond that of the source, you’re making up visual data that doesn’t actually exist, so there’s mathematically no point. Avoid it if possible.

Mind Your Aspect Ratio

Squares shouldn’t be rectangles and circles shouldn’t be ovals after encoding. Make sure you’re getting the same shapes on the output screen as in the source.

Divisible by 16

For the most efficient encoding with most codecs, the final encoded frame’s height and width should be divisible by 16 (also called “Mod16”). The cropping, scaling, and aspect ratio may need to be nudged to get there. Some codecs like MPEG-1 and often MPEG-2 require Mod16.

Err on the Side of Softness

Lots of preprocessing filters can add extra erroneous details, like scaling artifacts and edges from deinterlacing. That’s basically adding incorrect high frequency details that’ll suck bits away from your image. At every step, you want to avoid adding wrong detail, and if the choice is between too sharp and too soft, be soft and make things easier for the codec.

Make It Look Good Before It Hits the Codec

Hopefully you’re using a tool that gives a way to preview preprocessing so you can verify that the output looks good before it hits the codec. Preview before you hit encode. It’s frustrating to get problems in the output and not know whether they’re coming from source, preprocessing, or compression.

Think About Those First and Last Frames

Lots of players show the first frame for a bit while they queue up the rest of the video, and then end with the last frame paused on the screen. Having something interesting in the first and last frames is a valuable chance to communicate, or at least not bore.

Decoding

Some older codecs have more than one decoder mode. Perhaps the most common of these is Apple’s DV codec in QuickTime. Each QuickTime video track has a “High Quality” flag that can be on or off. That parameter is off by default in most files, and ignored by most codecs other than DV. But if the DV codec is left with HQ off, it makes for a lousy-looking single field half-size decode (so 360 × 240 for NTSC and 360 × 288 for PAL). Most compression tools will automatically force the HQ flag on for all QuickTime sources, but if you are seeing poor quality from DV source, you need to go in and set it manually. This can be done via QuickTime Pro.

Figure 6.1 Here’s the magic checkbox (6.1A) that you need to hit to make QuickTime’s DV codec look good (only available in QuickTime Pro). The quality difference is dramatic; 6.1B shows the before, and 6.1C shows the after.

image

In most cases, we want sources to be uncompressed or in a production grade codec. However, those files can be far too big for easy transport via FTP, and so more highly compressed files are often provided. These are called mezzanine files, and should be encoded at high enough bitrates that they don’t have visible artifacts. But that’s not always the case.

If your source uses lossy compression, there’s the potential of artifacts. Blocking and ringing in the source are false details that’ll suck away bits from the real detail. When source and encode are macroblock-aligned (e.g., if no cropping or scaling was done, so the 8 × 8 block edges are on the exact same pixels in source and output) the artifacts are not as bad, since the detail is between blocks where it won’t mess up any particular DCT matrix. But when resizing loses macroblock alignment, those ugly sharp (high-frequency) edges land right in the middle of blocks in the new codec, really hurting compression. Worse, those artifacts have a fixed position, ignoring the actual motion in the source. So they also confuse motion estimation, due to a mismatch of static and moving pixels inside each block. Nightmare. Generally, any source that has visible compression artifacts is going to be significantly harder to encode than a cleaner version of the same source, and will wind up with even more compression artifacts after encoding.

The first thing to do is try to fix whatever is causing artifacts in the source. Way too many video editors don’t know how to make a good mezzanine file, and just use some inappropriate web bitrate preset from their editing tool.

But if the only source you have has artifacts, you’ve got to make the best of it.

While there are some decent (if slow) filters that can be applied to video after decoding to clean up DCT blocks, it’s much more effective and much faster to apply deblocking in the decoder. Since the decoder knows how compressed the video is and exactly where block boundaries are, it can make a much quicker and more accurate determination of what’s an artifact and what’s not.

MPEG-2

MPEG-2 is by far the most common source format I get with source artifacts. There’s so much MPEG-2 content out there that’s blocky, whether it’s coming from HDV, lower bitrate XDCAM, DVD, ATSC, or DVB. And MPEG-2 doesn’t have any kind of in-loop deblocking filter to soften edges.

There’s a bunch of MPEG-2 decoders out there, of course. Unfortunately, the consumer-focused decoders tend to be the ones with integrated postprocessing. Those bundled into compression tools never seem to, darn it. Hopefully we’ll see compression products take advantage of the built-in decoders.

When I get problematic MPEG-2, I tend to go to AVISynth using DGMPGDec, which includes a decoder .dll (DGDecode) and an indexing app (DGIndex) that makes the index file DGDecode requires from the source. It’s complex, but it works. If I have source with visible blocking, I go ahead and set cpu = 4, which applies horizontal and vertical deblocking, and if it’s bad enough to have obvious ringing, I’ll use cpu = 6, which adds deringing to luma and chroma. It can be a little slow if trying to play full HD, but is well worth it for decoding. Using idct = 5 specifies IEEE 64-bit floating-point mode for the inverse DCT, minimizing the chance of decode drift at the sacrifice of some speed. Here’s an example script that will deblock an HDV file:

# Created by AVSEdit
# Ben Waggoner 6/8/2009
LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\DGDecode.dll")
MPEG2Source("Hawi_1920x1080i30_HDV.d2v", cpu=4, idct=5)

Figure 6.2 High-motion HDV source with deblocking decoding turned on (6.2A) and off (6.2B). The deblocked version will be much easier to encode.

image

DGDecode also includes a filter called BlindPP that will deblock and dering already-decoded frames. It’s not as accurate, but can be quite useful in a pinch.
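Usage looks something like the following sketch; the plugin path and filename are placeholders, and the cpu parameter is assumed to follow the same scale described above for MPEG2Source:

LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\DGDecode.dll")
AVISource("capture_that_was_once_MPEG-2.avi") # hypothetical file that has already been through a lossy 8x8 DCT codec
ConvertToYV12() # DGDecode filters expect YV12
BlindPP(cpu=6) # deblock plus luma/chroma deringing, guessing at the original block positions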

VC-1

Windows Media’s VC-1 is a more efficient codec than MPEG-2 and has an in-loop deblocking filter available that helps somewhat (but used to be automatically turned off for encodes beyond 512 × 384 or so in size). When VC-1 is used as a mezzanine file, it’s typically at a 1-pass VBR quality level high enough to be clean.

But if not, Microsoft’s built-in WMV/VC-1 decoders will automatically apply deblocking and deringing during playback as needed for lower bitrate content, when sufficient extra CPU power is available (Figure 6.3). Whether this gets turned on or not when doing file-to-file transcoding can be unpredictable. Fortunately, there is a registry key that turns on full deblocking and deringing for all WMV/VC-1 decoding. Since it’s an adaptive filter, it won’t soften the image when frames are high quality, so it’s safe to leave on for all transcoding. The only drawback is that it will make playback somewhat more CPU-intensive.

Figure 6.3 WMV source files encoded in recent years shouldn’t have this kind of quality issue, but if you get some, postprocessing with strong deblocking and deringing can be a big help.

image

Figure 6.4 Alex Zambelli’s invaluable WMV PowerToy, set to maximum VC-1 postprocessing.

image

The simplest way to set the key is to use WMV 9 PowerToy (Figure 6.4). I recommend at least strong deblocking be left on anytime you’re trying to transcode from WMV or VC-1 files, with deringing used as needed.

H.264

The good/bad news is that H.264’s strong in-loop deblocking filter generally means it’s had about as much filtering as it can handle. Generally, H.264 gets soft as it gets more compressed. Thus there’s not much that can be done on playback to improve H.264. It’ll take general noise reduction and other image processing steps described later in this chapter.

Color Space Conversion

601/709

Ideally, making sure that your file is flagged correctly with the same color space as the source will take care of things. However, many devices ignore the color space flag in the file and always decode as one or the other, or else base it on frame size.

If there’s a difference in 601/709 between your source and what you need in the output, you should make sure that conversion is applied if color is critical to the project. Some compression tools like ProCoder/Carbon include a specific conversion filter for this, as do all the video editing tools. And if the source is RGB, you want to make sure it’s converted to the correct color space as part of RGB → Y′CbCr conversion.

Refer to Color Figure C.16 for the risks of poor 601/709 conversion.
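If you’re doing this step in AVISynth, the third-party ColorMatrix plugin is one way to make the conversion explicit. This is a minimal sketch under my own assumptions: the plugin path, source filename, and exact mode string are from my setup, so check the plugin’s documentation for your build:

LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\ColorMatrix.dll")
AVISource("sd_601_master.avi") # hypothetical 601 SD source headed for a 709 HD target
ColorMatrix(mode="Rec.601->Rec.709") # remaps the Y'CbCr values; frame size and levels are untouched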

Chroma Subsampling

When converting from higher subsampling like 4:2:2 to 4:2:0, a simple averaging is applied almost universally and almost universally works well. It’s when you need to interpolate on one axis that potential quality issues can intrude. The classic case of this is coming from NTSC DV 4:1:1 to 4:2:0, particularly for DVD.

There’s no way around the fact that 4:1:1 doesn’t provide enough detail for the sharp edge of a saturated color. But when the 4-pixel wide chroma sample’s value is simply copied twice into two 2-pixel wide samples, you get aliasing as well, with a sharp edge. This is essentially using a nearest-neighbor scale on the chroma channel. The simplest thing to do is use a DV decoder that integrates a decent interpolation filter, easing the transition between adjacent samples. It doesn’t get back the missing color detail, but at least it isn’t adding bad high-frequency data on top of that.

Dithering

When converting from 10-bit or higher sources down to 8-bit, you’ve got a lot more luma resolution to use to determine what’s signal and what’s noise. But in the end, you’re going to take your 64–940 (10-bit Y′CbCr) or 0–255 (RGB) range and get a 16–235 range. And you want a few particular characteristics in that output:

•  The histogram should match the pattern of the source; no spikes or valleys due to different amounts of input values mapping to particular output values.

•  There shouldn’t be any banding in output that’s not present in the source.

The key process here is called dithering or random error diffusion. The core concept is that there’s not a perfect way to translate the luma values. If one simply truncates the two least-significant bits of a 10-bit value, you’ll see Y′ = 64 turn into Y′ = 16, but Y′ = 65, Y′ = 66, and Y′ = 67 become Y′ = 16 too. And thus, what was smooth in the source can wind up banded with a sharp boundary between Y′ = 16 and Y′ = 17 in the output.

This truncation is probably the worst approach. Y′ = 65 should really come out as Y′ = 16.25, but we have to end up with a whole number. Instead we could randomly assign 75 percent of Y′ = 65 pixels to become Y′ = 16 and 25 percent to become Y′ = 17. And instead of a big region of Y′ = 16 with an obvious seam between it and a big region of Y′ = 17, we get a slowly changing mix of Y′ = 16 and Y′ = 17 across the boundary. Big improvement! There are lots of more advanced ways to dither than this, but that’s the basic idea. Though adding a little randomness may seem like it would hurt compression, it’s very slight compared to even moderate film grain.
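Put as a formula (my notation, nothing standardized), the random approach to reducing a 10-bit luma value to 8 bits amounts to

\[
Y'_8 = \left\lfloor \frac{Y'_{10}}{4} \right\rfloor + \begin{cases} 1 & \text{with probability } (Y'_{10} \bmod 4)/4 \\ 0 & \text{otherwise} \end{cases}
\]

so Y′ = 65 lands on 16 three times out of four and on 17 the remaining quarter of the time, just as described above.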

Unfortunately, very few compression tools include dithering at all. This is generally the province of high-end Blu-ray/DVD encoder tools; Inlet’s high-end Fathom and Armada products are the only ones I know of that can use good dithering with web formats. And for once AVISynth is of no help to us; it’s built from the ground up as an 8-bit-per-channel technology. So, for my readers who work for compression tool vendors, get on it!

And if you do, may I suggest Floyd-Steinberg dithering with a randomized pattern? That seems to do the best job I’ve seen of adding as little noise as necessary while providing a uniform texture across the frame, and is the secret behind many of the best-looking HD DVD and Blu-ray titles.

See Color Figure C.19 for a demonstration of what dithering can accomplish.

Deinterlacing and Inverse Telecine

As you learned in Chapter 5, there’s way too much interlaced video in the world. This means that the even and the odd lines of video were captured 1/59.94th of a second (or 1/50th in PAL) apart. If you’re going from interlaced source to an interlaced delivery format without changing the frame size (DVD, broadcast, or any other natively interlaced target), you should leave the video interlaced. However, most other delivery formats either require or support progressive content.

Depending on the nature of your source, you will need to do one of four things to preprocess your footage for compression: deinterlace, bob, inverse telecine, or nothing. Failure to deinterlace an interlaced frame that is then encoded as progressive results in severe artifacts for all moving objects, where a doubled image is made of alternating lines. This looks really bad on its own merits, and it then hurts compression in two ways. First, the high detail in the interlaced area will use a whole bunch of your bits, starving the actual content. Second, moving objects split into two and then merge back into one as they start and stop moving, wreaking havoc with motion estimation. Failure to deinterlace when needed is among the worst things you can do for your video quality.

Deinterlacing

You can tell if content was originally shot on video, because every frame with motion in it will show interlacing in the moving parts. Samples of the various deinterlacing modes can be found in Color Figure C.20.

Field elimination

The most basic form of deinterlacing simply throws out one of the two fields. This is very fast, and is often used by default on low-end products. The drawback is that half of the resolution is simply thrown away. For 720 × 480 source, this means you’re really starting with just a 720 × 240 frame. Throw in a little cropping, and you’ll actually have to scale up vertically to make a 320 × 240 image, even though you’re still scaling down horizontally. This produces a stair-stepping effect on sharp edges, especially diagonals, when going to higher resolutions. Still, it’s a whole lot better than not deinterlacing at all.

Spatial adaptive

A spatial adaptive deinterlacer compares a given pixel to the pixels around it to see if there’s a better match between the neighboring area in the other field (the lines above and below) versus its own field (the lines two above and two below). If the neighboring area is at least as good a match, that part of the image probably isn’t moving, and can be left alone; where it isn’t, that part is probably moving and gets deinterlaced. The net effect is that static parts of the image are left alone and moving parts are deinterlaced. So where you can easily see detail, it’s there, and when the jagged edges come in, it’s where motion would mask it at least somewhat.

Spatial adaptive used to be significantly slower than a basic deinterlace, but it’s not a significant hit on modern PCs. It’s really the minimum needed for a half-way decent compression tool.

Motion adaptive

A motion adaptive deinterlacer compares the current frame with a past and future frame to get a better sense of where motion is taking place, and is essentially an improved spatial adaptive. It can be slower to preview, as multiple frames need to be decoded to show any particular one. But it’s not much worse for encoding when the input and output frame rates are the same, since all frames need to be decoded anyway.

Motion search adaptive

A motion search adaptive deinterlacer tracks the motion between frames. This enables it to reassemble the even and odd lines of moving objects, much improving their edges.

Unfortunately, motion search adaptive deinterlacing is rarely seen in mainstream compression tools. A big part of this is that it’s really slow, particularly with HD. It can take longer to deinterlace than to actually encode the video. It’s possible with several tools:

•  Compressor, using the Advanced Image Processing pane.

•  Episode, via Create new Fields By: Motion Compensation.

•  AVISynth, with a variety of plug-ins. I’m a fan of the EEDI2 and TIVTC combination; a rough sketch follows this list.
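Here’s roughly what the AVISynth route can look like. This is a sketch under my assumptions: the plugin paths and source name are placeholders, and TDeint (tritical’s deinterlacer) is motion adaptive rather than a full motion search, though it can hand interpolation off to a smarter filter such as EEDI2 through its options:

LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\DGDecode.dll")
LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\TDeint.dll")
MPEG2Source("interlaced_30i_source.d2v", cpu=4)
TDeint() # motion-adaptive deinterlace: static areas are woven, moving areas are interpolated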

Blend (a.k.a. “I hate my video, and want to punish it”)

One technique I normally avoid is a “blend” deinterlace. These simply blur the two fields together. This blending eliminates the fine horizontal lines of deinterlacing, but leaves different parts of the image still overlapping wherever there’s any motion. And it softens the image even where there isn’t motion. Though better than no deinterlacing at all, this blending makes motion estimation very difficult for the codec, substantially reducing compression efficiency. Plus, it looks dumb. Please don’t do it.

Figure 6.5 Bob deinterlacing turns each field from an interlaced frame into individual progressive frames.

image

Bob

In normal deinterlacing, we take an interlaced frame in and get a progressive frame out, eliminating any of the “off” field with motion that doesn’t match. With a bob deinterlace, we generate one progressive frame out of each field of the source; so 25i to 50p or 30i to 60p. It’s called “bob” because the up-and-down adjustment applied to where the active field lands in each output frame looks like something bobbing up and down (and isn’t named after some legendary compression engineer named Robert).

When bitrate isn’t of paramount concern, this is the best way to preserve high-action content, particularly sports. And doubling frame rate doesn’t require doubling bitrate: the closer the frames are in time, the less has changed between them, and thus fewer bits are needed to encode the difference. About 40–50 percent higher bitrate is typically sufficient to keep the same perceptual quality when doubling frame rate.

Bobbing is one way to extract some more value out of lower-quality interlaced sources. Since interlaced source carries full temporal resolution, the extra frames give the encode a more realistic feel, and at the higher frame rate any given frame’s random noise is visible for only half as long, distracting less from the content.

Bob requires excellent deinterlacing quality, particularly with horizontal or shallow diagonal details. Since the serif of a capital “H” may only have been in one of the two fields in the first place, the deinterlacer will need to be able to reconstruct that accurately. Otherwise, those small details will flicker in-and-out every frame, which can be extremely distracting. This requires a motion adaptive deinterlacer, and ideally a motion search-adaptive one. Once again, Compressor, Episode, and AVISynth are good products for this, due to having good motion search adaptive deinterlacing, and all support bob.
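In AVISynth terms, a bob is typically just the same deinterlacer run in double-rate mode; a minimal sketch, assuming the TDeint plugin from the earlier example and a hypothetical capture file:

LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\TDeint.dll")
AVISource("interlaced_2997i_capture.avi") # hypothetical 29.97i source
ConvertToYV12(interlaced=true)
TDeint(mode=1) # double-rate bob: one progressive frame per field, 29.97i to 59.94p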

Telecined Video—Inverse Telecine

Content originally produced on film and transferred to NTSC video via the telecine process should be run through an inverse telecine process. As you learned in Chapter 4, telecine takes the progressive frames of film and converts them into fields, with the first frame becoming three fields, the next frame two fields, then three, then two, and repeating. See Color Figure C.21 for an illustration of how they are assembled.

The great thing about telecined content is that reversing it restores 24p-sourced content to its native resolution and speed, eliminating the need for deinterlacing and making motion accurate. It’s basically as good as having 24p source in the first place. You can recognize telecined content by stepping through the source one frame at a time. You’ll see a repeating pattern of three progressive frames followed by two interlaced frames. This 3:2 pattern is a good mnemonic to associate with 3:2 pulldown.

Bear in mind that the film and its corresponding audio are slowed down by 0.1 percent, just as 29.97 is slowed down from 30 fps. This means the 3:2 pulldown pattern can be continued indefinitely, without any adjustments for the 29.97. However, it also means the inverse telecine process leaves the frame rate at 23.976, not 24. And note some compression tools require the encoded frame rate to be manually set to 23.976 when doing inverse telecine.

One “gotcha” to watch out for is with video that was originally telecined, but was then edited and composited in a postproduction environment set up for interlaced video. The two big problems are cadence breaks and graphics mismatches. In a cadence break, a video edit doesn’t respect the 3:2 pulldown order, and might wind up with only a single field remaining from a particular film frame. This breaks the stride of the 3:2 pattern. You’ll need to be using an inverse telecine algorithm that can find and correct for cadence breaks.
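In AVISynth, the TIVTC package is my usual answer here. A minimal sketch (plugin paths and the source name are placeholders): TFM does adaptive field matching, which also rides out cadence breaks, and TDecimate then drops the duplicate frame to get back to 23.976p:

LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\DGDecode.dll")
LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\TIVTC.dll")
MPEG2Source("telecined_film.d2v", cpu=4, idct=5)
TFM() # field matching: reassembles the original progressive film frames
TDecimate() # removes one duplicate in every five frames, 29.97 to 23.976 fps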

In a graphics mismatch, the graphic overlay doesn’t match the movement of the film source underneath. This could be due to the video having been field-rendered, so you could have a title that moves every field, while the underlying film source only changes every two or three fields. The result of this is an odd mish-mash of both types. There isn’t really any straightforward way to deal with this kind of source beyond having an inverse telecine algorithm smart enough to reassemble the source based on the underlying film motion, and then automatically deinterlacing the motion graphics part of the image.

Delightfully, film transferred to PAL doesn’t have to run through any of that 3:2 rigmarole. Instead, the 24 fps source is sped up 4 percent, and turned into 25 fps progressive. Yes, it’s progressive! And sports no decimals. The only warning is that if someone does compositing or other processing to that source with fields turned on, it’ll ruin the progressive scan effect. Generally, adaptive deinterlace filters do an excellent job with this kind of footage, restoring almost all the original 25p goodness. This is only one of the many delights of PAL that leave me fantasizing about moving to Europe.

Mixed Sources

What about content where different sections use different field patterns? This is pretty common with film documentaries that mix 24p film clips with 30i interview footage.

If the video content is predominant, your most likely choice is to treat all of your source as interlaced video. If the film content is predominant, you could try to use an inverse telecine filter. Also, the judder of converting to a different frame rate is much more apparent with fast motion than a talking head. Ideally, a compression tool would be able to vary the output frame rate between sections when going to a format such as QuickTime that supports arbitrary frame timing. Expression Encoder will do this if the source has variable frame rate, but isn’t able to dynamically figure out where to apply 3:2 pulldown by section.

Again, it comes down to some specialized AVISynth filters to reliably switch frame rates by section. Unfortunately, AVISynth itself requires a fixed frame rate per project, so to get real 24/30/60 adaptability, AVISynth needs to run at 120 fps (5 × 24, 4 × 30, 2 × 60), and then you have to hope that the encoder won’t encode duplicate frames.

Progressive Source—Perfection Incarnate

The best option, of course, is to get your video in progressive scan in the first place. Remember what I said about never shooting interlaced?

When working with progressive source, make sure that any compositing and rendering is done in progressive scan as well. All it takes is one field-rendered moving title to throw the whole thing off. Generally, most tools will do the right thing when working with compressed bitstreams that include metadata indicating picture structure. But when you capture uncompressed video to .mov or .avi, that data isn’t always indicated, so a given tool might assume interlaced is progressive or vice versa. It’s always good to double-check.

Cropping

Cropping specifies a rectangle of the source video that is used in the final compression. While we can think of it as “cutting out” stuff we don’t want, it’s really just telling the rest of the pipeline to ignore everything outside the cropping box. There are a few basic things we always want to crop out for computer playback:

•  Letterboxing/pillarboxing. Drawing black rectangles is easy; we don’t need to waste bits and CPU power baking them into our source and then decoding them on playback.

•  Any “blanking” or other noise around the edge of the video.

Having unwanted gunk in the frame occurs much more frequently than you might think. Content originally produced for analog television typically doesn’t put image all the way to the edge, while what we create on a computer or modern digital camera typically does.

Edge Blanking

Broadcast video signals are overscanned—that is, they’re larger than the viewable area on a TV screen. This is done to hide irregularities in the edges of aging or poorly adjusted picture tubes on old analog TVs, and to account for the distortion caused by the curvature at the edges of every CRT. Bear in mind that the flat panel era is less than a decade old and follows 50 years of the CRT; there’s a huge library of content shot for CRT, and plenty new stuff that still needs to work well on CRT. The area of the video frame that carries the picture is called the active picture. The areas outside the active picture area are called blanking areas.

When describing video frame sizes, for example 720 × 480 or 720 × 486, vertical height (the 480 and 486 in these example sizes) is expressed as active lines of resolution. The number of pixels sampled across the active line length in our two examples is 720. Active lines and line length equate one-for-one to pixels in the digital domain.

So what’s all that mean in practical terms? Examine a full broadcast video frame and you’ll see the blanking quite clearly (Figure 6.6). It appears as black lines bordering both the left and right edges, as well as the top and bottom of the frame. Sometimes you’ll also see black and white dashes at the top of the screen: that’s Line 21, which contains the closed captions in a sort of Morse Code-like pattern.

Figure 6.6 VHS can have a lot of extreme edge noise, including Line 21, horizontal blanking, and tearing at the bottom. For DVD output that should be blanked out, but cropping should be used for formats that don’t have a fixed frame size.

image

On the other hand, computer monitors and similar digital devices show every pixel. CRT monitors are naturally analog, but instead of cutting off the edges of the video, the whole image is shown with a black border around the edge of the screen. So, when displayed on a computer monitor or in a media player’s screen, blanking from the analog source is extraneous and distracting, and you should crop it out. The active image width of 720-pixel wide source is typically 704 pixels, so you can almost always crop eight pixels off the left and right sides of the video. Different cameras and switching equipment can do different things around those edges, so you may see the width of the blanking bars vary from shot to shot; you’ll need to crop enough to get inside all of them.
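In AVISynth, that trim is a one-liner; a sketch assuming a typical 720-wide capture with 8 pixels of horizontal blanking on each side (in Crop, negative values count in from the right and bottom):

AVISource("captured_ntsc.avi") # hypothetical 720 × 480 capture
Crop(8, 0, -8, 0) # drop 8 pixels of blanking from the left and right edges, leaving 704 × 480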

If you’re unfortunate enough to have VHS source, you’ll encounter tearing. This is a dramatic distortion in the lower portion of the frame. Such distortion usually occurs outside the viewable area of a television screen, but is ugly and quite visible when played back on a computer monitor. Crop it out as well.

You don’t have to crop the same amount from every side of the video. If there’s a thick black bar on the left and none on the right, and there’s VHS tearing at the bottom but the top looks good, crop only those problem areas. One important special case for cropping is if you’re authoring content that could go back out to an analog display, like DVD. If you have some tearing or whatever in a 720 × 480 source, you’re generally better off blanking out the edges instead of cropping, since cropping would then force the video to be scaled back up, as DVD has a fixed frame size. What you want is the video to be framed so that it looks good if someone’s watching on an overscan display, but there’s no distracting junk (especially Line 21!) if they’re watching the full raster (Figure 6.7).

Figure 6.7 Cropped (6.7A) and blanked (6.7B) versions of Figure 6.6.

image

Letterboxing

For their first several decades, feature films were mainly produced in the 4:3 (1.33:1) aspect ratio that broadcast television then adopted. But once TV caught on, film studios decided they needed to differentiate themselves, and so adopted a number of widescreen formats (including the storied Cinemascope, Vista-Vision, and Panavision) for film production, with the most popular aspect ratios being 1.85:1 and 2.4:1. Modern HDTVs are now 16:9 (1.78:1), which is a little squarer than 1.85:1. Outside of televisions, the most common computer monitor aspect ratio today is 16:10. Lots of portable devices are 4:3 or 16:9, but some—like the iPhone—are 3:2 (1.5:1).

Thus, we have a bunch of potential mismatches between content and display. We could have source at anything from 4:3 to 2.4:1 (almost double the width), with displays (outside of theaters) between 4:3 and 16:9.

So, how do you display video of a different shape than the screen? There are three basic strategies: pan-and-scan, letterboxing, and stretching. Pan-and-scan entails chopping enough off the edges of the film frame for it to fit the 4:3 window. On the high end, operators spend a lot of time trying to make this process look as good as possible, panning across the frame to capture motion, and scanning back and forth to catch the action. If you receive pan-and-scan source, you can treat it as any other 4:3 content in terms of cropping.

But, especially when dealing with source beyond 1.85:1, film purists do not find pan-and-scan acceptable. The initial solution was letterboxing, which displays the full width of the film frame on a 4:3 display and fills the areas left blank above and below the widescreen image with black bars. This preserves the full image but makes the image smaller by not filling the available screen real estate. Starting with Woody Allen’s Manhattan, some directors began insisting that their films be made available letterboxed. Letterboxing has since been embraced by much of the public, especially when DVD’s higher resolution made the loss of image area less of a problem. DVDs can also store anamorphic content at 16:9, which reduces the size of the black bars needed for letterboxing, and hence increases the number of lines of source that can be used.

When compressing letterboxed source for computer delivery, you’ll generally want to crop out the black bars and deliver the content at its native aspect ratio; you can’t know in advance what the display aspect ratio is going to be and so have to trust the player to do the right thing.

The last approach used is stretching: simply distorting the image to the desired shape. Distortions of less than 5 percent or so aren’t very noticeable. But going 16:9 to 4:3 is a very visible 33 percent stretch! Yes, some people do watch TV like that at home, but many other viewers (including myself) get very annoyed if that is done at the authoring stage. Some users will always mess up their presentations, but you want to make sure that viewers who care to optimize their experience can have a good one.

If a client requests you deliver the video at 4:3 or some other specific frame aspect ratio, you may have no choice but to either pan-and-scan the video yourself or deliver the video with the black bars intact. The pan-and-scan process is inevitably labor-intensive if you want to do it well. I’ve found that clients who insist all content be 4:3 tend to quickly change their minds when they see a realistic pan-and-scan bid (or, horror of horrors, just insist on a 4:3 center cut, and woe betide any action happening at the edges of the frame).

If you need to deliver in a set window size for browser-embedded video, ideally you’ll encode the video at the correct frame width and reduce height with wider aspect ratios. By scripting the web plug-in, you can usually specify that the video always appears vertically centered in the frame. This will keep the frame size constant, while still preserving the full aspect ratio.

Flash and Silverlight make it easy for the player to automatically resize to match the content with the right player tweaking.

When you have to add letterboxing

If you need to leave the bars in for a format like DVD or Blu-ray with a fixed frame size, make sure the bars are a single, mathematically uniform Y′ = 16 for more efficient encoding.

If you have to letterbox, try to get the boundary between the black and the image to fall along the 8 × 8 grid most DCT codecs use for luma blocks. And for a slight improvement beyond that, align along the 16 × 16 grid of macroblocks. This will allow complete blocks of flat color, which are encoded with a single DC coefficient. While most standard resolutions are already Mod16, there’s one glaring exception: 1920 × 1080. 1080 is only Mod8. When encoding as 1920 × 1080, the codec is actually internally coding at 1920 × 1088 with eight blank pixels beyond the edge of the screen. So, when letterboxing for 1080i/p, you’ll want the height of your top matte to be Mod16, but the height of your bottom matte to be Mod16 ± 8. And yes, it won’t be quite symmetric; I prefer to shift it up slightly, to leave more room for subtitles to fit outside of the image at the bottom. But don’t worry, no one will notice.
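If you’re building the letterboxed frame yourself in AVISynth, it’s just a resize plus AddBorders; a sketch using the 1920 × 1080 / 2.39:1 numbers from Table 6.1, with a placeholder filename and my own resizer choice:

AVISource("scope_master.avi") # hypothetical source already cropped to 2.39:1
ConvertToYV12()
LanczosResize(1920, 808) # or your resizer of choice
AddBorders(0, 128, 0, 144) # Mod16-friendly top and bottom mattes; check that the default border black lands at Y' = 16 in your build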

Table 6.1 shows my recommended crop and matte sizes for typical combinations of frame sizes, source aspect ratios, and content aspect ratios.

Table 6.1 Recommended Matte Sizes.

Encoded | Width | Height | Screen | Content | Top Crop or Matte | Bottom Crop or Matte | Left & Right Crop or Matte | Active Image
720 × 480 | 720 | 480 | 4 × 3 | 4 × 3 | 0 | 0 | 0 | 720 × 480
720 × 480 | 720 | 480 | 4 × 3 | 16 × 9 | 64 | 64 | 0 | 720 × 352
720 × 480 | 720 | 480 | 4 × 3 | 1.85:1 | 64 | 64 | 0 | 720 × 352
720 × 480 | 720 | 480 | 4 × 3 | 2.39:1 | 112 | 112 | 0 | 720 × 256
720 × 480 | 720 | 480 | 16 × 9 | 4 × 3 | 0 | 0 | 96 | 528 × 480
720 × 480 | 720 | 480 | 16 × 9 | 16 × 9 | 0 | 0 | 0 | 720 × 480
720 × 480 | 720 | 480 | 16 × 9 | 1.85:1 | 16 | 16 | 0 | 720 × 448
720 × 480 | 720 | 480 | 16 × 9 | 2.39:1 | 64 | 64 | 0 | 720 × 352
720 × 576 | 720 | 576 | 4 × 3 | 4 × 3 | 0 | 0 | 0 | 720 × 576
720 × 576 | 720 | 576 | 4 × 3 | 16 × 9 | 80 | 80 | 0 | 720 × 416
720 × 576 | 720 | 576 | 4 × 3 | 1.85:1 | 80 | 80 | 0 | 720 × 416
720 × 576 | 720 | 576 | 4 × 3 | 2.39:1 | 128 | 128 | 0 | 720 × 320
720 × 576 | 720 | 576 | 16 × 9 | 4 × 3 | 0 | 0 | 96 | 528 × 576
720 × 576 | 720 | 576 | 16 × 9 | 16 × 9 | 0 | 0 | 0 | 720 × 576
720 × 576 | 720 | 576 | 16 × 9 | 1.85:1 | 16 | 16 | 0 | 720 × 544
720 × 576 | 720 | 576 | 16 × 9 | 2.39:1 | 80 | 80 | 0 | 720 × 416
1280 × 720 | 1280 | 720 | 16 × 9 | 4 × 3 | 0 | 0 | 160 | 960 × 720
1280 × 720 | 1280 | 720 | 16 × 9 | 16 × 9 | 0 | 0 | 0 | 1280 × 720
1280 × 720 | 1280 | 720 | 16 × 9 | 1.85:1 | 16 | 16 | 0 | 1280 × 688
1280 × 720 | 1280 | 720 | 16 × 9 | 2.39:1 | 96 | 96 | 0 | 1280 × 528
1920 × 1080 | 1920 | 1080 | 16 × 9 | 4 × 3 | 0 | 0 | 240 | 1440 × 1080
1920 × 1080 | 1920 | 1080 | 16 × 9 | 16 × 9 | 0 | 0 | 0 | 1920 × 1080
1920 × 1080 | 1920 | 1080 | 16 × 9 | 1.85:1 | 16 | 24 | 0 | 1920 × 1040
1920 × 1080 | 1920 | 1080 | 16 × 9 | 2.39:1 | 128 | 144 | 0 | 1920 × 808

Safe Areas

While blanking is used to accommodate the foibles of television picture tubes, video safe areas were designed to further protect important information from the varying ravages of analog televisions. CRTs vary in how much of the active picture they show, both between models and over the life of a set. Also, portions of the image that appear near the edge of the screen become distorted due to curvature of the tube, making it difficult to see text or other fine details. Because of this, video engineers defined “action-safe” and “title-safe” areas where visually important content would be retained across the CRTs in the wild. The action-safe area is inset 5 percent from the left, right, top, and bottom edges of the frame. Assume anything falling outside it, in that outer 5 percent band, may not be visible on all TV sets.

Figure 6.8 Aspect ratios in common use.

image

Figure 6.9 The classic safe areas for 4:3 video. No important motion will be outside of motion safe, and nothing needing to be visible without distortion will be outside of title safe.

image

Title-safe comes in 10 percent from the active image area. Important text is not placed outside the title-safe area, as it may not be readable by all viewers. This is where the phrase “lower thirds” comes from—when the bottom of the text or graphics needs to be 10 percent up from the bottom of the screen, the top of the text/graphics naturally starts about 1/3 of the way from the bottom.

Any television should be able to accurately display motion in the action-safe area, and display text clearly in the smaller title-safe area. As you can see, these areas limit the overall video-safe area by a significant amount.

Whether you crop out any of the safe area depends on your targeted output resolution. As discussed in potentially mind-numbing detail in the forthcoming section on scaling, we don’t want to scale the frame size up in either height or width. Thus, when you’re targeting higher resolution video, you want to grab as much of the source video as possible—if the output is 640 × 480, you’ll want to grab all of the 720 × 480 you can. However, if you’re targeting a lower resolution than the source, a more aggressive crop will make the elements in the middle of the screen larger. This will make them easier to see at low resolutions, particularly on phones and media players. If your footage was originally shot for video, the camera operators made sure any important information was within the safe areas, so a 5 percent crop around the edge to action-safe will be generally fine. Title-safe is riskier, but possible if you scrub through the video first.

When working with letterboxed source, you’ll want to apply any safe-area cropping before cropping out the letterbox, of course; content up to the edge of the matte is important to preserve. So with a 4:3 frame at 720 × 480 with 16:9 matted in (thus a 720 × 352 image area), you could crop 18 pixels left/right and then crop out the letterboxing, leaving 684 × 352. That’d make a nice 608 × 352 converted to square pixel, and thus would nicely fill an iPhone 3G display with around a 480 × 288 rectangle. Had we started with the full 720 × 352 area, we’d have gotten 480 × 272.
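Here’s that example as an AVISynth sketch (the filename is a placeholder, the resizer choice is mine, and negative Crop values count in from the right and bottom):

AVISource("letterboxed_ntsc.avi") # hypothetical 720 × 480 4:3 frame with 64-line mattes top and bottom
Crop(18, 64, -18, -64) # 18 pixels off left/right plus the mattes: 684 × 352 remains
LanczosResize(608, 352) # back to square pixels, per the numbers above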

Scaling

Cropping defines the source rectangle, and scaling defines the size and the shape of the output rectangle and how to get there. It sounds simple enough, but there are a number of goals here:

•  Have a frame size that’ll offer good quality with the content, codec, and bitrate.

•  Get displayed at the correct aspect ratio: circles aren’t ovals, and squares aren’t rectangles.

•  Be the optimal frame size for compression, typically with height and width divisible by 16.

•  Use an algorithm that provides a high-quality, clean image without introducing scaling artifacts.

Aspect Ratios

First, a few definitions of art terms to help keep things straight:

•  Pixel Aspect Ratio (PAR): The shape of each pixel. Square pixel is 1:1.

•  Display Aspect Ratio (DAR): The shape of the frame overall. Like 16:9 or 4:3.

•  The Nameless Ratio (no acronym): The actual width/height in pixels of nonsquare pixel video. So 720 × 480 would be 3:2.

That 3:2 is a particularly useless and misleading number, as there’s nothing 3:2 about the entire frame. If you plug that into Excel trying to figure out your frame size, you’re likely to wind up doing something crazy (and very common) like encoding at 360 × 240. The Zen trick is to tune out the algebra and just make sure the output aspect ratio matches the input aspect ratio. When encoding for a square pixel output (which will be true for most PC and device playback), just make sure the width-to-height ratio of the output frame size matches the aspect ratio of the source image (after cropping out any letterboxing). Divide the output resolution of each axis by the ratio for that axis, and make sure the numbers are identical or nearly so. So, with 4:3 going to 320 × 240, divide 320 by 4 and 240 by 3. In this case, both values are exactly 80. Perfect. See Figure 6.10 for examples of correct and incorrect scaling, as well as the settings in Expression Encoder and Compressor.

Figure 6.10 Depending on your content and playback environment, either pan-and-scan or letterboxing can make sense to fit 16:9 content into a 4:3 screen (6.10A). But don’t just stretch it! The correct settings are shown for Expression Encoder (6.10B) and Compressor (6.10C).

image

When the output frame size is anamorphic (nonsquare pixel), it’s generally because you’re keeping the same frame size and aspect ratio as the source.

Downscaling, Not Upscaling

Anyone who has tried to take a low-resolution still photograph (like a DV still) and blow it up knows scaling can hurt image quality. Even a 2x zoom can look painfully bad with detailed content. There’s really almost never a good reason to do any upscaling in compression. Doing it well is really a creative process, and thus should be handled upstream in post.

The good news is that scaling down, when done correctly, looks great.

Scaling Algorithms

Different scaling algorithms provide different scaling quality. And unfortunately, many tools use the same name for what are different implementations of the algorithms with different implications. And although it sounds crazy, the choice of a scaling algorithm can have a bigger impact on your video quality than all the codec tuning in the world.

There are a number of tradeoffs in scaling algorithms. A big one is speed; the bad ones are used because they’re fast. The ones that yield better quality, particularly when coming from HD source, can be painfully slow. Quality is next; you don’t want to see aliasing or ringing, even with big scaling ratios (see Figure 6.11). Going from 1920 × 1080 to 320 × 176 is a pretty typical encode these days. Algorithms also vary in how much they err on the side of softness versus sharpness as they scale down. Sharper can “pop” but also has more high-frequency data and hence is harder to encode.

Here are some of the common scaling types you’ll see.

Nearest-neighbor

Also called a box filter. Each output pixel is a copy of a particular input pixel. So if there’s a gradient over 5 pixels in the source, but the video is scaled down 5x, only one of those pixels gets picked for the output. This leads to crazy aliasing and some horrible-looking video. Shun it.

Bilinear

Bilinear is the first semidecent scaling algorithm, made somewhat confusing by highly varying implementations. A basic bilinear is generally good as long as the output is within 50–200 percent of the size of the original on each axis. However, it can start showing bad aliasing beyond that.

Some bilinear implementations, like that in AVISynth, include filtering that prevents aliasing, making it arguably the best downsampling filter, as it adds the least sharpness and hence has less high-frequency data to throw off the codec.

Bicubic

Bicubic is the “good” choice in a lot of tools, including Adobe’s. It’s slower than bilinear, but offers good quality well below 50 percent of the original size. Different implementations vary a lot in sharpness; the VirtualDub version is a lot sharper than the AVISynth default.

Figure 6.11 Different scaling algorithms can result in very different images. And poor-quality algorithms can yield artifacts that are exaggerated by compression.

image

Lanczos

Lanczos is a high-quality filter, but definitely errs on the side of sharpness; you’ll need more bits per pixel to avoid artifacts with it. But if you’ve got bits to burn and are looking to pop off that phone screen, it does the job very well.
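In AVISynth, the tradeoff is just a question of which resizer you call; a quick sketch going from a hypothetical 1920 × 1080 source down to 320 × 176 (pick one resizer, not all three):

AVISource("hd_master.avi") # hypothetical 1920 × 1080 source
BilinearResize(320, 176) # softest of the three, and the least high-frequency data for the codec
# BicubicResize(320, 176) # the middle ground; sharpness depends on its b/c parameters
# LanczosResize(320, 176) # sharpest; pops more, but costs more bits to encode cleanly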

Super sampling

Super sampling is the best scaling algorithm in Expression Encoder, and perhaps my favorite. It’s very neutral between sharpening and softening, without significant ringing. It’s not the fastest, but the output is enough better that it can be worth using the slower scaling mode and a lower complexity on the codec; better source leaves the codec with less to do.

Scaling Interlaced

The biggest problem we’re likely to face with scaling is when taking interlaced source to progressive output. As discussed earlier, when deinterlacing with a nonadaptive system, or on a clip with high motion where adaptive deinterlacing isn’t able to help, in NTSC you’re really starting with 240 lines of source at best. This is the big reason progressive scan and film sources are so well suited to computer delivery. 16:9 is also good for this reason, as you spend more of your pixels horizontally than vertically. If you’ve only got 240 real lines, 432 × 240 from 16:9 can be a lot more impressive than the 320 × 240 you get from 4:3.

Mod16

All modern codecs use 16 × 16 macroblocks, and so really work in 16 × 16 increments. When you encode to a resolution that’s not divisible by 16, like 320 × 180 (16:9), it rounds up to the next multiple of 16 internally, and then doesn’t show those pixels. So, 320 × 180 is really processed as 320 × 192, with the bottom 12 lines filled with dummy pixels that aren’t shown. It’s not a disaster, but it makes those bottom 4 lines less efficient to encode and decode. It would be better to use 320 × 176 in that case. As mentioned previously, this padding technique is used in 1080 video, where it’s really encoded as 1088 with 8 dummy lines.

Rounding out to Mod16 can throw your aspect ratio off by a bit. If it’s only a bit, don’t worry about it. A distortion to aspect ratio doesn’t become really noticeable until around 5 percent, and even the fussy won’t notice below 2 percent. This can also be addressed somewhat by tweaking safe area cropping to more closely match the final shape.

Noise Reduction

Noise reduction refers to a broad set of filters and techniques that are used to remove “bad stuff” from video frames. These errors encompass all the ways video goes wrong—composite noise artifacts, film grain and video grain from shooting in low light, film scratches, artifacts from prior compression, and so on. While noise reduction can help, sometimes a lot, it’s never as good as if it were produced cleanly in the first place. But when all the video editor hands you is lemons, you’ve got to make lemonade, even if there never is enough sugar.

There isn’t always a clear line between noise and style. Film grain, for example, can be hard to encode at lower bitrates, but it really is part of the “film look” and the character of the source, and should be preserved when possible. Removing it may make for a cleaner encode, but may also be less true to the original. This is part of what makes preprocessing an art; you need to know what in the image is important to preserve.

The noise reduction features in most compression-oriented tools tend to be quite basic—they’re generally simple low-pass filters and blurs. Some codecs, like Microsoft’s VC-1 implementations, also include their own noise reduction methods, although they’re typically not that advanced.

The drawback in all noise reduction filters is that they make the video softer. Of course, softer video is easier to compress, but you want to keep the right details while getting rid of the bad ones.

Sharpening

I don’t recommend applying sharpening to video. Sharpening filters do make the image sharper, but sharper images are more difficult to compress. And sharpening filters aren’t selective in what they sharpen—noise gets sharpened as much as everything else. And at low-to-moderate bitrates, all that high-frequency data can make the compression a lot harder, pushing up the QP, and thus yielding blocks or softening of the new detail you tried to add anyway.

For the usual strange historic reasons, Unsharp Mask is another name for sharpening, typically seen in applications whose roots are in the world of graphic arts.

Blurring

The most basic and worst noise reduction filter is blur. There are many kinds, the most common being the Gaussian blur, which produces a very natural out-of-focus look. The only time a straight blur really makes sense is if the source is flat-out horrible, like it had previously been scaled up nearest-neighbor, or is interlaced video that was scaled as progressive.

Some blur modes, like that in Adobe Media Encoder, allow separate values for horizontal and vertical. This is nice when dealing with deinterlacing artifacts, as those only need vertical blurring.

Low-Pass Filtering

A low-pass filter lets low frequency information pass through (hence “low-pass”) and cuts out high frequencies. For video, frequency is a measure of how fast the video changes from one pixel to the next. Thus, a low-pass filter softens sharp edges (including film grain), but not smoother parts of the image.

Still, low-pass is a pretty blunt instrument.

Spatial Noise Reduction

A spatial noise reduction filter uses a more complex algorithm than blur/low-pass, attempting to find where on a frame noise is and isn’t, and only trying to remove the noise. In general, spatial noise reduction will try to preserve sharp edges, and remove or reduce noise from flatter areas with a lot of seemingly random fine detail, like grain/gain. The challenge is they can’t always discriminate between static textures, like cloth, and actual noise. This problem generally errs on the side of false negatives (noise not being removed rather than detail being lost), but the image can get weirdly soft in patches if values get set high enough.

You also need to watch out for a “falloff” effect when there’s texture on an object at an angle to the camera. You can get odd-looking cases where the texture in some parts of the image is within the noise reduction threshold and is outside of it in others. This can look very weird with motion.

Temporal Noise Reduction

For real grain/gain removal, you need a temporal filter, which compares a pixel with its spatial neighbors in the current frame and its temporal neighbors in adjacent frames. Since grain/gain is random on every frame, the filter can pick out flecks of detail that change from frame to frame (noise) and those that don’t (detail), and thus be much more aggressive about eliminating the former. And since that random noise is the biggest compression killer (fine detail without any motion vectors to take advantage of), this can pay off very nicely.

AVISynth

I keep talking about AVISynth as having high-quality algorithms, and this is truer in noise reduction than anywhere else; it has an embarrassment of riches, including filters designed to clean up bad VHS sources. There’s a huge and ever-evolving variety of AVS noise reduction filters, but here are a few of my favorites:

FFT3DFilter

FFT3DFilter is, for those with the right kind of decoder ring, a Fast Fourier Transform 3D noise reduction Filter. Specifically, it turns the entire frame into frequency data (like DCT, but without 8 × 8 blocks) and then applies further processing based on those frequencies. Like most AVS filters, it has a huge number of different modes. Myself, I like this for cleaning up typical film grain a little while preserving some of the texture:

FFT3DFilter(sigma=1.5, plane=4, bt=5, interlaced=false)

The sigma value controls the strength of the denoise. I’ll use up to 2.5 for somewhat grainier but still Hollywood-caliber content. Higher is only useful for rawer sources.

One nice thing about FFT3DFilter is that it can also operate on interlaced frames. Since the gain noise in interlaced video is localized to each field, this is generally better. For very noisy sources, I’ve even applied it twice: once with interlaced = true before deinterlacing, and again with interlaced = false after deinterlacing.
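That two-pass approach looks something like the following sketch; the plugin paths and filename are placeholders, the deinterlacer is just my usual pick, and the sigma values are in the range discussed above:

LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\fft3dfilter.dll")
LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\TDeint.dll")
AVISource("noisy_interlaced_capture.avi") # hypothetical interlaced source
ConvertToYV12(interlaced=true)
FFT3DFilter(sigma=2.5, interlaced=true) # first pass: denoise within each field
TDeint() # or whatever deinterlacer you prefer
FFT3DFilter(sigma=1.5, interlaced=false) # lighter second pass on the progressive frames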

The big caveat to FFT3DFilter is that it can be incredibly slow. However, nothing really beats it for cleaning up noisy video for cleaner compression.

VagueDenoiser

VagueDenoiser applies a wavelet transform to the video, and then will filter out small details based on that. It’s a spatial filter, and offers a more low-pass-like effect than FFT3DFilter. I only use it with highly troubled sources, like HDV.

BlindPP

BlindPP is another filter included in DGDecode (mentioned earlier in this chapter) that removes blocking and ringing artifacts from content that’s been through a visually lossy 8 × 8 DCT encode (including MPEG-1/2, MPEG-4 part 2, VC-1, and JPEG). It requires that the content hasn’t been scaled or cropped in order to preserve the original macroblock structure. Still, it’s not as good as using postprocessing in a decoder, since it doesn’t know what the source was, but it’s a lot better than nothing in a pinch.

Deblock

Deblock is also included with DGDecode. It’s an MPEG-2 specific filter, and wants the source not to have been scaled/cropped since decode, in order to accurately line up with the original 8 × 8 block pattern. If your content fits that description, it’ll work better than BlindPP. It’s great for fixing content sourced from HDV or DVD and then converted to another codec.
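Usage is a one-liner once the source is loaded; a sketch, assuming the file is still at its original frame size, with a placeholder filename and a quant value that’s just a starting guess to tune against your source:

LoadPlugin("C:\Program Files\AviSynth 2.5\plugins\DGDecode.dll")
AVISource("hdv_transcoded_to_another_codec.avi") # hypothetical file, never scaled or cropped since the MPEG-2 decode
ConvertToYV12()
Deblock(quant=25) # higher quant means stronger deblocking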

Luma Adjustment

As its name implies, luma adjustment is the processing of the luma channel signal. This is often called color correction, though it’s often not color (chroma) that we’re worried about. Most tools that offer any image level controls offer some luma-only filters like brightness, contrast, and gamma.

One big change in QuickTime since version 6 is that it’s adopted industry standard gamma, so you no longer have to worry about encoding with different gamma for Mac and Windows.

Another great change is the predominance of all-digital workflows. Analog video almost always requires at least some luma level tweaking to get blacks normalized. But most all-digital projects don’t have any noise in the blacks of titles and credits, and so are likely to need no image processing at all.

Normalizing Black

An important aspect of the remapping is getting parts of the screen that are black down to Y′ = 16 or RGB = 0. Noise that wouldn’t have been noticed on an old CRT TV can be painfully obvious on a flat-panel TV or even a CRT computer monitor. And LCDs have a pretty bright black, so we want our blacks to be the lowest value we can get to the screen. The net effect is that a slightly noisy black region composed of a jumble of Y′ = 16, 17, 18, and 19 may look black on a calibrated professional CRT display, but on a consumer set it can look like someone is boiling rice in oil. If we get that all down to Y′ = 16, all that erroneous motion goes away. Also, a large field of the same number is extremely easy to compress compared to a jumble of 16, 17, 18, and 19, which looks “black” but not black. So in addition to looking better, it saves bits, letting other parts of the frame look better.

Lastly, getting blacks to nominal black makes it possible to embed the media player inside of a black frame, without an obvious seam between video and frame when a black frame is up. It also avoids the annoying appearance of PC-drawn letterboxing bars not matching black frames.

Getting black levels to Y′ = 16 can be difficult with noisy source, as many pixels that should be black have values well above 16. The solution is to get those brighter pixels down to Y′ = 16 as well. First, noise reduction helps by reducing the variance between the average brightness and the brightest pixels of noise. Beyond that, a combination of luma mapping, decreased brightness, and increased contrast will get you where you need to go. However, care must be taken that parts of the image that should truly be dark gray, not black, don’t get sucked down as well.
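
One way to express the “pull the noisy near-blacks down and clamp” idea as an AviSynth sketch is a small luma offset plus legal-range clamping; the −6 offset here is purely an assumption you would read off your own histogram.

ColorYUV(off_y = -6, opt = "coring")   # shift luma down 6 codes, then clamp to the legal 16-235 range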

Brightness

Brightness filters adjust the overall intensity of the light by raising or lowering the value of each pixel by a fixed amount. So, a brightness setting of −5 would typically reduce Y′ = 205 to Y′ = 200 and Y′ = 25 to Y′ = 20.

If brightness is used in compression, it is normally used to make the video darker overall, instead of lighter, since we don’t want to do anything to shift blacks higher than Y′ = 16.

Contrast

Contrast increases or decreases the values of each pixel by an amount proportional to how far away it is from the middle of the range. So, for an 8-bit system, 127 (the middle of the range) stays the same, but the farther away from the center value a pixel is, the larger the change from contrast; it makes black blacker and white whiter.

One classic use of contrast is to expand or contract the luma range to fix an RGB 0–255 / Y′CbCr 16–235 range mismatch; for example, video that comes in with blacks at Y′ = 32 and whites at Y′ = 215. Adding contrast so the values land at the right spots helps a lot.
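
For the 32/215 example above, the stretch can be written directly as an AviSynth Levels() sketch; the input values are taken from that example, so read your own off a histogram.

Levels(32, 1.0, 215, 16, 235, coring = false)   # stretch blacks at 32 down to 16 and whites at 215 up to 235
# For a straight full-range-to-TV-range mismatch, ColorYUV(levels = "PC->TV") does the same job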

Figure 6.12 Before (6.12A) and after (6.12B) luma correction and black restoration, and the histogram that made it possible (6.12C).

With noisy source, it can often help to use a somewhat higher contrast value to lower black levels. This technique is often mixed with a corresponding decrease in brightness to keep the whites from getting blown out, and to get a combined effect in getting the dark grays down to black without hurting the rest of the image too much.

Gamma Adjustment

Gamma adjustment was the bugaboo of compression from CD-ROM until QuickTime 6. Back then, QuickTime and its codecs didn’t do any gamma correction, and Macs and Windows machines used a different gamma value (1.8 on Mac, and video-standard 2.2 on Windows). Thus if you encoded video so it looked good on a Mac, the midtones were too dark on Windows, and Windows-tuned video was too bright on Macs. However, all codecs introduced since QuickTime 6 (so MPEG-4 pt 2 and H.264) do automatic gamma correction. There’s still some older Sorenson Video 3 content around the web here and there that has this property, but you won’t face it in any content you create today. Mac OS X 10.6 finally brought standard 2.2 gamma to the Mac for everything, not just video.

Gamma adjustment can be a useful tool in its own right. If the blacks and whites look just right, but the midtones seem a little washed out or dark, adjusting gamma can often improve quality significantly. And if you want to change the subjective “brightness” of a clip, gamma is a much more useful tool than the brightness itself, as gamma gives you brightness where it’s most visible without messing up your black and white levels.

Processing Order

I want to make a little comment about luma processing order. It should always go in the order of Brightness, Contrast, Gamma. This is so blacks and whites are normalized before they hit a gamma filter, so that you don’t need to readjust gamma every time you change blacks/whites.
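
As an AviSynth-flavored illustration of that order (all values here are placeholders, not recommendations):

Tweak(bright = -4)                              # 1. brightness: pull the overall level down slightly
Tweak(cont = 1.05)                              # 2. contrast: push blacks and whites toward their targets
Levels(16, 1.1, 235, 16, 235, coring = false)   # 3. gamma only: lift midtones, black/white points unchanged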

Adobe’s Levels: Luma Control Done Right

Sure, we can have separate sliders for luma control. But why are we still bothering? Photoshop nailed the perfect UI for this nearly 20 years ago (see Figure 6.12c). It has an integrated histogram to show the distribution of luma values in the source, and handy little triangles to set the black and white points. So easy!

Chroma Adjustment

Chroma adjustment, as you’ve already guessed, refers to tweaks made to color. The good news is that with most projects, you won’t need to make chroma adjustments. And when you are required to do so, it likely won’t be with the complexity that luma adjustments often require.

Saturation

Saturation controls adjust the intensity of color, analogous to what brightness does for luma. Saturation changes affect both color channels equally, but have no impact on luma.

It’s pretty unusual to tweak saturation at the compression stage, particularly with a digital workflow. Analog video may wind up a little undersaturated, in which case raising saturation can help make things a little punchier. And video with a lot of chroma noise can sometimes benefit from some desaturation to reduce that noise.
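
In AviSynth, either direction is a one-line Tweak(); the values are just illustrative starting points.

Tweak(sat = 1.1)     # mild saturation boost for flat-looking analog source
# Tweak(sat = 0.85)  # ...or a mild desaturation to knock down chroma noise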

Hue

Hue is determined by the ratio between Cb and Cr, and determines the mix of wavelengths of light emitted by the pixel. Incorrect hue can be due to white balance not being set correctly when the video was shot, from capturing through badly calibrated composite video systems, or messed-up 601 / 709 conversions. These should be fixed in postproduction, but sometimes are only caught at the compression stage.

Typically the problem shows up when flesh tones and pure white don’t look correct. The fix is to adjust the hue until whites and skin look correct. Ideally, you’ll be able to use a full-featured color-correction tool when such problems abound in your source. The hue of flesh tones falls along a very narrow range. A vectorscope will show you the distribution of colors in a frame, with a line indicating the ideal skin tone (see Color Figure C.22).
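
If a full color corrector isn’t handy, a small hue rotation in AviSynth can do in a pinch; the number of degrees here is an assumption, tuned while watching a vectorscope.

Tweak(hue = 5)   # rotate hue a few degrees until skin tones sit on the vectorscope's flesh line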

Life’s Mysteries Solved by Compression #2

And here’s another college debate topic solved by image science. Question: “Aren’t we all really the same color in the end?” In fact, yes, we’re all the same hue. All ethnicities have skin of the same hue, even though there’s lots of variation in luma and saturation. This is why getting the hue of skin tones even a little off can look very weird.

Frame Rate

While frame rate is typically specified as part of the encoding process, conceptually it really belongs with preprocessing. The critical frame rate rule is that the output frame rate must be an integer division of the source frame rate. Acceptable output frame rates are shown in Table 6.2.

Table 6.2 Acceptable Frame Rates.

Frame Rate    Full       Half      Third     Quarter    Fifth
23.976        23.976     11.988     7.992     5.994      4.795
24.000        24.000     12.000     8.000     6.000      4.800
25.000        25.000     12.500     8.333     6.250      5.000
29.970        29.970     14.985     9.990     7.493      5.994
30.000        30.000     15.000    10.000     7.500      6.000
50.000        50.000     25.000    16.667    12.500     10.000
59.940        59.940     29.970    19.980    14.985     11.988
60.000        60.000     30.000    20.000    15.000     12.000
120.000       120.000    60.000    40.000    30.000     24.000

And that’s it! After so many subjective concepts, I’m happy to finally present a hard and fast rule.

We need to use these frame rates to preserve smooth motion. Imagine a source that has an object moving an equal amount on each frame. At the source frame rate, everything looks fine. If you drop every other frame, the amount of movement in each frame doubles, so it remains consistent. The same goes for dropping two frames out of every three, or three out of every four. But if you use any other frame rate, you’ll be dropping an uneven number of frames. Take, for example, going from 24 to 18 fps. That drops one frame out of every four, so you’ll have two frames that show equal motion, then one with double motion, yielding a “judder” of jerky motion.
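
In AviSynth, integer-division decimation is a sketch like the one below (a 29.97 fps progressive source is assumed). Note that the Select filters drop frames without restating the frame rate, so AssumeFPS keeps the clip running in real time.

SelectEven()             # keep every other frame of a 29.97 fps source
AssumeFPS(15000, 1001)   # restate the rate as 14.985 so the duration stays correct
# SelectEvery(3, 0)      # ...or keep one frame in three instead (then restate the rate accordingly)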

Of course, there are some complexities. For one, how do you treat source that is mixed, like a promotional piece for a movie in which video interviews are intermixed with telecined film scenes? If you’re outputting QuickTime, Windows Media, or MPEG-4, which all allow variable frame rates, the frame rate could go up or down as appropriate (if the tool supports that). But MPEG-1/2 and AVI require a fixed frame rate (although frames can be dropped), meaning a given frame can have its duration doubled, tripled, and so on, filling the space where several frames would have normally gone.

I include 120 fps on the chart to show how it can hit all the NTSC frame rates with subdivisions: 60, 30, and 24. So if an authoring tool can process at 120 fps internally, it should be able to handle any kind of variable frame rate you throw at it.

As for conversion between the NTSC 1.001 rates, that difference is imperceptible. If you’re using a compression tool like ProCoder/Carbon that supports audio time resampling preprocessing, you can easily convert between 29.97/30 and 23.976/24 by speeding up/slowing down that 0.1%.

PAL/NTSC conversion is even feasible with this mechanism. It involves a 4 percent frame rate change between 24 and 25 fps, but for most content that looks fine (and people in PAL countries are very used to it; they’ve seen the 4 percent speedup as much as NTSC consumers have seen 3:2 pulldown).
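The same speed-change trick can be sketched in AviSynth: AssumeFPS with sync_audio retimes the audio along with the video, and a resample brings it back to a standard rate (the 48 kHz target is an assumption).

AssumeFPS(25, 1, sync_audio = true)   # retime 24 fps film to 25 fps, a 4 percent speedup
ResampleAudio(48000)                  # resample the now-pitched-up audio back to 48 kHz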

Audio Preprocessing

If all goes well, you won’t need to do much audio preprocessing, which is fortunate, because problematic audio can be a nightmare. Well-produced audio aimed for home stereo or television delivery will already be normalized and clean, and shouldn’t need anything further for successful compression except potentially normalization. Raw captured audio might need quite a bit of sweetening, but again that’s normally a post-production task.

Normalization

Normalization adjusts the level of an audio clip by finding the single loudest sound in the entire file, and then raising or lowering the volume of the whole clip so the loudest sound matches whatever level has been specified. This doesn’t change the relative volume at all, just the absolute overall volume.
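
AviSynth’s built-in Normalize() works this way: with no arguments it scans the clip for the loudest sample and scales the whole track so that peak hits full scale. The 0.95 below is an assumed peak target, just shy of clipping.

Normalize(0.95)   # scale the entire track so the loudest sample peaks at 95 percent of full scale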

Normalization is of vital importance to the overall listening experience. I’m sure everyone has tried to listen to some lousy, overly quiet web audio, cranking up the volume to hear it clearly. And then, in the middle of the clip the next “You’ve got mail” arrives as a thunderous, ear-splitting BOOM! Not pleasant. Normalizing audio tracks eliminates this problem, and is an important courtesy to your audience. As a rule, the loudest sound in the audio should be around as loud as a system beep.

Dynamic Range Compression

In a source of perennial confusion, there is an audio filter called a compressor. In audio, extreme transients (very loud, very short noises), such as explosions or cymbal crashes, can cause distortion in audio playback and recording devices. Likewise, extremely quiet passages can get lost in the noise floor (the always-present hums and hisses). Audio compressors act on dynamic range, smoothing out the peaks and valleys of a signal to make it more consistent and thus “louder.” When used properly, audio compression can tame wild level changes.

Audio compression was critically important in the 8-bit audio era, when already lousy CD-ROM audio played through tiny internal speakers. Today, it’s thankfully much less of an issue. If you’re dealing with source mixed for television delivery, it will likely have audio compression on it already. Soundtracks mixed for home theater may also have too broad a dynamic range, with the average volume level quite low to leave headroom for big explosions.

Audio Noise Reduction

Noise is no better for audio than video. And audio can have quite a lot of it: tape hiss, air conditioners in the background, hum from power lines, wind in the mic, and so on. A number of professional audio tools are available to reduce this noise, and the results can be pretty miraculous.

The scope and use of such audio noise reduction systems is broad, well beyond our purview here. However, if you have bad-sounding audio, there is often (but not always) something a good audio engineer can do to help the quality. The results are always better when the source is clean from the get-go.

Few compression tools have integrated audio noise reduction.
