CHAPTER 5
Production, Acquisition, and Post Production

Introduction

From 1995–1999 I was the Chief Technologist and partner in a small (maxed out at eight full-timers) company that offered what we called “digital media production, post and compression services.” This was a good time, and Portland, Oregon was a good market for that stuff. We had a lot of technology companies using digital media really early, like Intel, Tektronix, and Mentor Graphics, plus the double-footed shoe titan of Nike and Adidas Americas. During that period, most of our content was delivered via kiosks or CD-ROM, with the web getting big near the end.

And compression was hard back then, and very content-specific. There was simply no way to encode a frenetic music video that would look or sound good played back on the technology of the era. So our business model integrated the whole workflow: we'd script and storyboard knowing what the final compression would require, and make the creative choices up front that would yield great-looking video on the final display. And it delivered. We used and even pioneered a lot of techniques I talk about here, particularly fixed cameras, avoiding cross-dissolves, and keeping motion graphics in 2D.

The good news is that we’ve had big gains in decoding horsepower, codecs, and bandwidth. So it’s less necessary to sweat these things than it used to be; it’s rare to have content that becomes actually incomprehensible after compression. But it can still matter, particularly when targeting lower bitrates. Sure, H.264 degrades into being soft more than blocky, but the fine detail can still go missing. Knowing that will happen in advance lets you pick techniques to retain fine detail, or avoid having important fine detail to lose.

And some of this is just general advice on best practices for production and post that applies to any content, no matter how it will be delivered.

I also want to warn about, and arm you against, the pernicious assumption that “it’s just for the web, so why worry about quality.” In fact, the opposite is true. In any video destined for viewing in some compressed form, be it online or on low-bitrate wireless devices or elsewhere, all the tenets of traditional production must be adhered to while addressing the additional constraints imposed by compression. Well-lit, well-shot, well-edited content will compress better and show fewer artifacts than a poorly done source. And there’s more than one axis of bad in this world. There’s nothing about a small screen that makes bad white balance look better or bad audio sound better.

Broadcast Standards

Globally, there are two major broadcast standards that encompass every aspect of video production, distribution, and delivery: NTSC (National Television Systems Committee), used primarily in the Americas and Japan, and PAL (Phase Alternating Line), used throughout Eurasia and Africa, plus Brazil and Argentina. Signals produced in one will be incompatible for delivery on the other, because NTSC and PAL have different native frame rates and numbers of lines. Luckily, a lot of video tools—especially postproduction software—can handle both formats. Alas, it's not as common for tape decks and monitors to be format-agnostic. Most video production facilities are either PAL- or NTSC-centric. When called upon to deliver content in a non-native format, they do a format conversion. This conversion is never perfect: going from PAL to NTSC loses resolution, and going from NTSC to PAL loses frames. In both cases, motion is made less smooth.

NTSC

The first National Television Systems Committee consisted of 168 committee and panel members, and was responsible for setting standards that made black-and-white analog television transmission a practical reality in the United States. The standards the NTSC approved in March 1941 were used for broadcast in the United States until 2009 and remain in use in other countries. Those standards serve as the framework on which other broadcast television formats are based. Even though analog broadcast has ended, lots of our digital video technology echoes and emulates analog standards.

Some aspects of the NTSC standard—the 4:3 aspect ratio and frame rate—had been recommended by the Radio Manufacturers Association Television Standards Committee back in 1936. One of the most important things the NTSC did in 1941 was ignore the arguments of manufacturers who wanted 441 scan lines and mandate 525 lines—then considered "High Definition."

The original frame rate of NTSC—30 interlaced frames per second—yielded 60 fields per second, and was chosen because U.S. electrical power runs at 60 Hz. Electrical systems could interfere quite severely with the cathode ray tubes of the day. By synchronizing the display with the most likely source of interference, a simple, nonmoving band of interference would appear on the screen. This was preferable to the rapidly moving interference lines that would be produced by interference out of sync with the picture tube’s scan rate.

When color was later added to NTSC, it was critical that the new color broadcasts be backwards compatible with existing black-and-white sets. The ensuing political battle took more than a decade to resolve in the marketplace (from 1948, when incompatible color TV was developed, through 1953 when a backward-compatible technique was devised, until 1964 when color TV finally caught on with the public); but that's a tale for another book. Politics aside, backward compatibility was a very challenging design goal. The NTSC worked around it by leaving the luminance portion of the signal the same, and adding the color information on a subcarrier squeezed into unused gaps in the luma signal's spectrum. Since there was limited room for that information, much less bandwidth was allocated to chroma than to luma. This is the source of the poor color resolution of NTSC, often derided as Never Twice the Same Color. The frame rate was also lowered by 0.1 percent, commonly but imprecisely truncated to 29.97 (59.94 fields per second).
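For the arithmetically inclined, that 0.1 percent reduction is exactly a factor of 1000/1001, which is where the familiar 29.97 and 59.94 numbers come from. A quick sketch in Python (any calculator would do):

# Back-of-the-envelope check of the NTSC color frame and field rates.
# The 0.1 percent reduction is exactly a factor of 1000/1001.
mono_fps = 30.0                     # original black-and-white NTSC frame rate
color_fps = mono_fps * 1000 / 1001  # ~29.97003 frames per second
fields_per_sec = color_fps * 2      # ~59.94006 fields per second
print(f"NTSC color frame rate: {color_fps:.5f} fps")
print(f"NTSC color field rate: {fields_per_sec:.5f} fields/s")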

The canonical number of lines in NTSC is 525. However, many of those lines are part of the vertical blanking interval (VBI), which is information between frames of video that is never seen. Most NTSC capture systems grab either 480 or 486 lines of video. Systems capturing 486 lines are actually grabbing six more lines out of the source than 480 line systems.

PAL

PAL is the color-coding system used in Europe. By reversing the phase of the reference color burst on alternate scan lines, PAL corrects for shifts in color caused by phase errors in transmission signals. For folks who care about such things, PAL is often referred to as Perfection At Last. Unfortunately, that perfection comes at a price in circuit complexity, so critics called it Pay A Lot.

PAL was developed by the United Kingdom and Germany and came into use in 1967. European electrical power runs at 50 Hz, so PAL video runs at 50 fields per second (25 frames per second) for the same interference-related reasons that NTSC ran at 60 fields per second. Nearly the same bandwidth is allocated to each channel in PAL as in NTSC, and each line takes almost the same amount of time to draw. This means more lines are drawn per frame in PAL than in NTSC; PAL boasts 625 lines to NTSC's 525. Of those 625 lines, 576 are active picture and get captured. Thus, both NTSC and PAL draw about the same number of lines per second, and hence carry equal amounts of information (576 × 25 and 480 × 30 both equal exactly 14,400 active lines per second).

The largest practical difference between NTSC and PAL is probably how they deal with film content. NTSC uses convoluted telecine and inverse telecine techniques. To broadcast film source in PAL, they simply speed up the 24 fps film by 4 percent so it runs at 25 fps, and capture it as progressive scan single frames. This makes it much easier to deal with film source in PAL. This is also why movies on PAL DVD are 4 percent shorter than the NTSC version.
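The runtime math is easy to check; here's a quick sketch using a hypothetical two-hour feature:

# Rough runtime math for a 24 fps film sped up for 25 fps PAL playback.
film_minutes = 120.0                   # a hypothetical two-hour feature
total_frames = film_minutes * 60 * 24  # frames in the original film
pal_minutes = total_frames / 25 / 60   # the same frames played at 25 fps
shrinkage = (1 - pal_minutes / film_minutes) * 100
print(f"PAL runtime: {pal_minutes:.1f} minutes ({shrinkage:.0f}% shorter)")
# -> PAL runtime: 115.2 minutes (4% shorter)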

SECAM

SECAM (Système Électronique pour Couleur Avec Mémoire) is the French broadcast standard. It is very similar to PAL, offering better color fidelity, yet poorer color resolution. SECAM today is only a broadcast standard—prior to broadcast, all SECAM is made with PAL equipment. There's no such thing as a SECAM tape. SECAM is sometimes jokingly referred to as Something Essentially Contrary to the American Method.

ATSC

The Advanced Television Systems Committee created the U.S. standard for digital broadcasting. That Digital Television (DTV) standard comes in a somewhat overwhelming number of resolutions and frame rates. As a classic committee-driven format, it can provide incredible results, but its complexity can make it extremely expensive to implement. And even though Moore's Law continues to bring chip prices down, the expense necessary to build display devices (be they projection, LCD, or CRT) that can take full advantage of high-definition quality remains stubbornly high.

Resolutions at 720 × 480 and below are called Standard Definition (SDTV). Resolutions above 720 × 480 are called High Definition (HDTV).

With the transition to HD digital broadcasting underway around the world, production companies have increasingly been shooting HD to future-proof their content.

DVB

Digital Video Broadcasting is basically the PAL countries’ version of ATSC.

Preproduction

As with any production, proper planning is critical to shooting compression-friendly video. Beyond the usual issues of location, equipment, cast, crew, script, and so on, with compressed video you need to be aware of where your content will be viewed. Is it going to DVD? CD-ROM? Broadband web video? 56 kbps dial-up modem connection? Wireless handheld devices? All of the above? Is it going to be evergreen content that needs to work on future devices not yet conceived of? Knowing your delivery platform helps you understand what you can and can’t get away with in terms of aspect ratios, backgrounds, framing, and the like.

It’s always better to plan ahead, to know in advance what you’re trying to do and how you’re going to do it. Even if you’re making a documentary in which you’ll find the story as you edit your footage, knowing your target delivery platform will dictate how you’ll light and shoot your interview subjects, capture your audio, select archival footage, and so on.

Things that should be decided and communicated to your crew in advance of your shoot include what aspect ratio you'll be shooting, what frame rate the camera is running at, your shooting style (locked-down cameras or handheld), and whether you'll use only tight close-ups for talking heads or opt for a mix of close-ups and wide-angle shots.

Production

Production is generally the most expensive phase of any project, and hence the most costly to get wrong. The standard rules of producing good video apply to producing video that compresses well: good lighting, quality cameras, and professional microphones are always required.

Professional video production is difficult, and involves a lot of people doing their own difficult jobs in coordination. And a lot of web content is shot with Handycams by novice users who don’t know enough to know they don’t know what they’re doing. I don’t mean to sound elitist by that; the democratization of production has been one of the best things to happen during my career. But even though a $2,000 camera today can outperform a $100,000 camera of a decade ago, video shot by hobbyists still looks like video shot by hobbyists. In general, it takes several years of experience working in a particular production discipline before you can call yourself a pro—it’s not something you pick up over a weekend after reading a book, even one as fabulous as this. The explosion of great, affordable equipment gives many more people the chance to learn professional skills, but it’s not a substitute for them.

Last, whenever anyone on the set says "We'll just fix it in post," be afraid. As someone who's been on both ends of that conversation, I can tell you that phrase contains one of the least accurate uses of the word "just." As said previously, "just fix it in post" is more often black humor than a good plan. Repairing production problems in post is invariably more expensive than getting things right on the set, and the resulting quality is never as good as if the shot had been done right the first time. Post is the place to do the things that couldn't be done on the set, not to fix the things that should have been.

Production Tips

There are a number of simple and not-so-simple things you can do to improve the quality of your video. Optimizing it based on a single target delivery medium is relatively simple. However, we live in a world in which “write once, deliver everywhere” has become a business necessity. Shooting for multiple delivery platforms forces you to make choices, but there are things that shouldn’t be optional, no matter what your target delivery platform.

Interlaced is dead. Shoot progressive

Interlaced video was a clever hack that had a great half-century run, but it has no place in the flat-panel era. No one makes interlaced displays anymore, and tons of devices have no ability to play back interlaced content at all. Anything you shoot interlaced is going to get converted to progressive at some point, either before compression or in the display. And the earlier in the chain you do it, the better the final results will be. I'll talk in great detail in Chapter 6 about how to do deinterlacing, but we'll both be a lot happier if you don't give yourself a reason to read it. Even if someone says you need to shoot interlaced because the content is going to go out as 480i/576i broadcast or otherwise has to work with legacy displays, you've got two fine options:

•  Shoot 50/60p. If you shoot at a frame rate equal to your target field rate (so, double the interlaced frame rate), you can convert to interlaced after post for interlaced deliverables. Going from 60p to 30i and 50p to 25i delivers all the motion that shooting 30i/25i originally could have, and you've got a higher-quality progressive version as well. Interlacing is a lot easier than deinterlacing!

•  Shoot 24p. For film/entertainment content, 24p serves as a great universal master. You can leave it as native 24p for most deliverables, then apply 3:2 pulldown for NTSC and speed it up 4 percent to 25p for PAL (see the sketch after this list).
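To make the 3:2 pulldown cadence concrete, here's a minimal sketch (illustrative only, with letters standing in for whole frames) of how four 24p frames become ten fields, and thus five interlaced frames:

# Minimal sketch of 3:2 pulldown: four 24p film frames (A, B, C, D)
# become ten interlaced fields, i.e. five 30i frames.
def pulldown_32(frames):
    # Expand progressive frames into a 3, 2, 3, 2, ... field cadence.
    fields = []
    for i, frame in enumerate(frames):
        repeat = 3 if i % 2 == 0 else 2   # alternate 3 fields, then 2
        fields.extend([frame] * repeat)
    return fields

fields = pulldown_32(["A", "B", "C", "D"])
# Pair consecutive fields into interlaced frames: AA, AB, BC, CC, DD
print([fields[i] + fields[i + 1] for i in range(0, len(fields), 2)])

Note that two of the resulting frames (AB and BC) mix fields from different film frames; detecting and undoing exactly that is what inverse telecine has to do.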

If there’s a lot of motion, 720p60 is better than 1080p24 or 1080p30. So: only progressive. No interlaced, ever.

Use pro gear

If you want professional quality, use gear of professional caliber, operated by someone who knows how to use it. The good news is that pro-caliber gear is now dirt-cheap compared to only a few years ago. And when we talk about pro gear, don't focus mainly on format support and imaging specs. When a camera's marketing materials spend more time on the lens than on the recording format, that's a good sign. Good glass beats lots of megapixels. You want to be able to control focus, zoom, depth of field, and other parameters with fine precision.

And don’t forget the importance of quality microphones. Avoid using the built-in microphone on video cameras—use a high-quality external mic. A shotgun or boom is great in the hands of a skilled operator. A lavaliere (clip-on) microphone is more trouble-free, and will record voice well. Remember, even if a playback device has a tiny screen, it’s also going to have a headphone jack, and that sound can be as big as the world. Since audio has the best chance of making it through the compression process with uncompromised quality, getting the sound right will really pay off.

Produce in the right aspect ratio

16:9 versus 4:3? These days, I'd generally default to 16:9 (and of course all HD cameras are 16:9). I only consider 4:3 if I know 4:3 devices are going to be targeted for playback, mainly phones (although the iPhone uses a 3:2 aspect ratio, which lies between 4:3 and 16:9). While letterboxing works well for larger displays, being able to use the full 320 × 240 on that size of screen gives a lot more picture than 320 × 176. Shooting hybrid is a great way to hedge: keep the critical action constrained to the 4:3 center, but frame the full 16:9 width and check it for boom mics and so on. Then the video can be delivered as-is for 16:9, and cropped ("center-cut") for 4:3. This requires careful production to make sure that both framings offer good visuals. See Figure 5.1.

Figure 5.1 Guides for shooting hybrid for 16:9 and 4:3. The innermost vertical lines show where action should be kept to for easy cropping to 4:3.

image
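If you want to draw those guides yourself, the crop math is simple: a 4:3 center-cut keeps the middle three-quarters of a 16:9 frame's width. A small sketch (the frame sizes are just examples):

# Where the 4:3 "center-cut" region falls inside a 16:9 frame: the kept
# width is height * 4/3, which works out to 3/4 of the full 16:9 width.
def center_cut_4x3(width, height):
    crop_width = height * 4 // 3          # width of the 4:3 region
    x_offset = (width - crop_width) // 2  # left edge of the crop
    return crop_width, x_offset

for w, h in [(1920, 1080), (1280, 720)]:
    cw, x = center_cut_4x3(w, h)
    print(f"{w}x{h}: keep critical action between x={x} and x={x + cw}")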

Bright, soft lighting

Lighting plays an enormous role in determining how easily your video will compress. Soft, diffuse, even lighting is easier to compress than lighting that produces deep shadows and hard edges. A three-light kit and some diffusers will usually be enough for a simple on-location shoot. For outdoors, bounce cards can be used to reduce shadows.

Even more important than softness, there should be enough light that extra noise isn't introduced by using higher gain in the CCD, or by using a faster film stock. Grain and gain noise create codec-killing small details that change every frame. They can be removed in post, but it's hard and slow to do without hurting detail.

Bear in mind that modern flat-panel displays don’t have perceptually uniform gamma like most codecs assume, which often makes compression issues in shadow detail very visible and can lead to distracting blocky artifacts in the not-quite-blacks that wouldn’t have been visible on a calibrated CRT display. This can be particularly hard with mobile device screens, far too many of which use 16-bit processing without any dithering. So beware of dark scenes if you’re targeting handheld device playback. They can look great on set, but can wind up like a mosaic on the final display.

Record DV audio in 48 kHz 16-bit

DV supports two audio recording modes: 4-channel 32 kHz 12-bit and 2-channel 48 kHz 16-bit. Always use the 48 kHz mode. 32 kHz isn’t great, but 12-bit audio can be a quality killer.

Control depth of field

Compared to film, most video cameras use a much deeper depth of field. So a scene shot by a film camera (or a digital camera that optically emulates film) could show foreground and background objects as blurry when the action in the middle is in focus, while the video camera keeps everything in focus over the same range of distances. That may sound more like a bug than a feature, but depth of field control has long been a big part of the film vocabulary, and is required for a convincing "film look" if that's what's desired. Also, using a narrow depth of field forces the codec to spend bits on just the elements you've focused on. If you're producing in an environment where the background can't be controlled, such as outdoors or on a convention floor, keeping your talent in focus and the irrelevant details out of focus means the subject will get most of the bits. Plus, background details that won't matter won't distract the viewer.

Controlling what’s in focus and what’s not may require some preplanning in terms of camera location, as you’ll typically need to be farther away than you might expect.
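If you want to estimate how far back that is, the standard thin-lens depth-of-field approximation (a general photography formula, not anything specific to this chapter) is a reasonable planning tool. A sketch, with an assumed 0.02 mm circle of confusion:

# Textbook thin-lens depth-of-field approximation: hyperfocal distance
# H = f^2 / (N * c) + f, with near/far limits around the subject distance.
# All distances are in millimeters.
def depth_of_field(f_mm, aperture_n, subject_mm, coc_mm=0.02):
    h = f_mm ** 2 / (aperture_n * coc_mm) + f_mm   # hyperfocal distance
    near = h * subject_mm / (h + (subject_mm - f_mm))
    far = (h * subject_mm / (h - (subject_mm - f_mm))
           if subject_mm < h else float("inf"))
    return near, far

# Example: a 50 mm lens at f/2.8 focused on a subject 3 m away.
near, far = depth_of_field(50, 2.8, 3000)
print(f"In focus from roughly {near / 1000:.2f} m to {far / 1000:.2f} m")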

Watch out for foliage!

One of the most difficult things to compress is foliage blowing in the wind, which makes sense mathematically: it's green, and thus lands mainly in the nonsubsampled Y′ channel; it's highly detailed; and it has lots of subtle random motion. You'd be amazed by how many more artifacts the talent's face will get when in front of a hedge in a gentle breeze. I've had cases where I had to quadruple the bitrate for talking-head video based just on background foliage.

Use a slow shutter speed

If you’re shooting slower motion content, stick to a shutter speed of 1/60 (NTSC), 1/50 (PAL), or 1/48 (24p) of a second. These are the defaults for most cameras. Slow shutter speed produces very natural-looking motion blur, meaning that when the image is detailed, it’s not moving much, and when it’s moving a lot, it’s not very detailed (see Figure 5.2). Thus, the codec has to handle only one hard thing at a time. The very slow shutter of film is one reason film content encodes so nicely. Slow shutter speeds also let in more light per frame, which reduces the amount of light needed to avoid grain and gain.

Figure 5.2 With motion blur, moving parts of the image blur along the axis of motion (5.2A), while static parts of the image remain sharp (5.2B).

image
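Those default shutter speeds are just the classic 180-degree shutter rule: the exposure time is the shutter angle divided by 360, divided by the frame rate. A quick sketch:

# Relationship between frame rate, shutter angle, and exposure time:
# exposure = (angle / 360) / fps. A 180-degree shutter exposes for half
# of each frame interval, giving the familiar motion blur.
def exposure_time(fps, shutter_angle_deg=180):
    return (shutter_angle_deg / 360.0) / fps

for fps in (24, 25, 30):
    t = exposure_time(fps)
    print(f"{fps}p with a 180-degree shutter -> 1/{round(1 / t)} s exposure")
# -> 1/48, 1/50, and 1/60 of a second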

If you want viewers to see it, make it big

A long panoramic shot may look great in the viewfinder, but the talent in the middle of the screen may vanish into insignificance on a phone. An initial establishing shot may be effective, but the bulk of your shots should be close up on what you want the viewer to focus their attention on.

Limit camera motion

Handheld camera work can be extremely difficult to compress, since the whole frame is moving at once, and likely moving in a different direction than the action. Most codecs model motion as 2D translation, so if the camera is panning smoothly with the action, it compresses great. But if the camera is rotating (a "dutch"), zooming, or moving toward the action, it's a lot harder for the codec to make good matches from frame to frame.

The worst case here is when the camera is mounted on something that is moving. Perhaps the hardest single thing to compress is the helmet-mounted camera giving a point of view (POV) of someone mountain biking. The motion is fast and furious, the camera is whipping around with constant angular changes, plus it’s rapidly changing position. And remember that thing about foliage? You’ll need to have radically more bits per pixel for that kind of content.

The tripod and dolly are classics that still work great. If handheld camera work is absolutely required, try to use a Steadicam or similar stabilization system, and remember that a skilled camera operator will keep the camera steadier than a novice.

Mr. Rogers Good, MTV Bad

As you learned in Chapter 4, the two things that use up the most bits in compressed video are detail and motion. The more complex your image and the more it moves, the more difficult it is for a codec to reproduce that image well. At lower data rates, this difficulty manifests as increasingly poor image quality.

As a rule of thumb, classic, sedate shooting and editing compress well. Jump cuts, wild motion graphics, and handheld (shaky-cam) camera work are difficult or impossible to compress well at lower data rates. “Mr. Rogers good, MTV bad” says it all.

When designing for low-data-rate delivery, avoid using elements that aren’t essential to what you’re trying to communicate. If a static logo works, don’t use a moving one. The lower the target data rate, the simpler the video needs to be. If you want to do live streaming to handsets, Spartan simplicity is your goal. But if you’re going for Blu-ray, you can get away with almost anything.

Picking a Production Format

Even though we've seen a great winnowing down of delivery codecs and formats, it seems you can't sneeze without some new production codec or format popping up; there are far more now than ever.

Types of Production Formats

Interframe vs. intraframe

The biggest difference between production formats is whether they encode each frame independently (intraframe) or use motion vectors and encode a number of frames together in a GOP (interframe).

Both can work, but interframe codecs trade substantially increased pain in postproduction for bitrate savings when recording. Bear in mind that the postproduction pain can be high enough that the interframe codecs may just get re-encoded to an intraframe codec on import, causing a generation loss and throwing away any efficiency savings.

Interframe encoding is mainly seen in the consumer HDV and AVCHD formats. They're great from a cost-of-hardware and cost-of-capture-media perspective, but can be pretty painful when trying to ingest a lot of content. Interframe encoding makes lossless editing much harder, since all edits need to fall on I-frames. Any edit requiring finer than roughly half-second precision forces re-encoding of the frames around the edit point.
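As a rough illustration of the editing problem, assume a hypothetical 15-frame GOP (half a second at 30 fps): any cut has to snap back to the I-frame that starts its GOP, and everything between that I-frame and the cut would need to be re-encoded.

# Why interframe formats complicate editing: without re-encoding, cuts can
# only fall on I-frames. Assumes a hypothetical fixed 15-frame GOP.
def snap_to_i_frame(cut_frame, gop_length=15):
    return (cut_frame // gop_length) * gop_length

desired_cut = 247                  # hypothetical edit point, in frames
i_frame = snap_to_i_frame(desired_cut)
print(f"Cut requested at frame {desired_cut}; nearest earlier I-frame is "
      f"{i_frame}, so {desired_cut - i_frame} frames must be re-encoded")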

DCT vs. wavelet

Classic video delivery codecs all used DCT as discussed in Chapter 3. But we’re seeing a whole lot of wavelet codecs used in various production formats. Wavelet has some key advantages:

•  It’s more efficient than simple DCT for intraframe encoding (although worse for interframe).

•  It degrades into softness, not blockiness.

•  It’s easily extended to different bit depths, number of channels, etc.

•  It supports decoding just a sub-band, so the 960 × 540 band of a 1920 × 1080 frame can be decoded on its own to display a 960 × 540 preview. Wavelet codecs typically have bands at 2x ratios of width and height, so even thumbnails are easy to get (see the sketch below).
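Here's a minimal sketch of that sub-band idea using a simple Haar-style averaging step (real production wavelets use better filters, but the structure is the same): one level of decomposition leaves a low-pass band that is a half-width, half-height version of the frame.

import numpy as np

# One level of a Haar-style decomposition: average each 2x2 block to get
# the low-pass (LL) sub-band at half resolution in each dimension.
def haar_lowpass(frame):
    return (frame[0::2, 0::2] + frame[1::2, 0::2] +
            frame[0::2, 1::2] + frame[1::2, 1::2]) / 4.0

frame = np.random.rand(1080, 1920)   # stand-in for a 1920 x 1080 luma plane
preview = haar_lowpass(frame)
print(preview.shape)                 # (540, 960) -- the 960 x 540 band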

Subsampled vs. full raster

The higher-end formats will encode a pixel for every pixel of the video format, so a full 1920 × 1080 or 1280 × 720. Others use subsampling, like encoding just 1440 × 1080 or 960 × 720. You’ll want to capture as many pixels wide as you’ll want to display.

4:2:2 vs. 4:4:4 vs. 4:2:0 vs. 4:1:1 vs. “RAW”

Most pro video formats use 4:2:2 chroma subsampling. This is a fine choice; it’s a superset of 4:2:0 and 4:1:1, and means that any interlaced production (not that you would produce one!) doesn’t have to share chroma samples between fields.
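The practical difference between the schemes is simply how many chroma samples survive. A rough count for a 1920 × 1080 frame:

# Chroma samples per frame for a 1920x1080 image under the common
# subsampling schemes, expressed as (horizontal, vertical) chroma factors.
schemes = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2), "4:1:1": (4, 1)}
width, height = 1920, 1080
for name, (h_sub, v_sub) in schemes.items():
    samples = (width // h_sub) * (height // v_sub)   # per chroma channel
    print(f"{name}: {samples:,} samples per chroma channel")

Note that 4:2:0 and 4:1:1 end up with the same sample count; they just distribute it differently, which is why 4:2:2 is a clean superset of both.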

One drawback to some tape formats is that it’s hard to get access to the native bitstream. So even though you have a nice compact version on the tape, the deck converts it back to uncompressed, making capture more expensive and difficult, and potentially requiring recompression. I don’t like this approach.

One recent emerging choice is “Video RAW.” Like a still camera RAW mode, this actually captures the sensor image data directly. With a CCD, that’s a single color per sensor element, with more precision and in a nonlinear gamma. This winds up replicating some of the classic film workflow. The RAW video, like a film negative, needs to be processed and color-timed in order to make something that’s editable. But the RAW approach preserves all the detail the camera is capable of; there’s no baked-in contrast, color correction, and so on. So it preserves the maximum flexibility for creative color correction in post.

Tape vs. solid state vs. cable

This last subsection is about what medium the content is recorded to.

Back in the day, it was always tape. Sure, in the end it was just rust glued to Mylar, but it worked and was generally cheap and reliable. But it’s also linear; capturing an hour of tape takes an hour of transfer time. And if you want to find just that one scene? You’ve got to spin the spindles to the right part of the tape. With device control this is automatable, but it’s a slowdown in the workflow.

Now we have an increasing number of solid-state capture mechanisms using flash memory. These write a file to a flash card, and so can be popped straight into a computer when done. No fast-forward/rewind; it's like accessing any other file. Initially there was a price premium for the convenience, but flash prices are dropping at a Moore's Law–like pace, while tape prices remain pretty static, so it won't be long until solid state is cheaper than tape. Flash memory is also much more reusable than tape; it handles thousands of writes, unlike the dozen or so a typical videotape can take before reliability starts to drop.

The last option is to just capture the bits live to a hard drive, either connected to the camera or to a PC. The workflow is great here as well, although it requires a very fast computer and a good backup mechanism. Most cameras that support PC capture can do it in parallel with tape or card, for a real belt-and-suspenders experience.

If you're not familiar with the professional video industry, you might find this list of options perplexing. It's all just glass, CCD, and encoded bitstream, so why so many different products with such massive variability in price points? And why are the "consumer" formats missing critical features that would be trivial to add? Well, the companies that historically have dominated this industry have worked hard to make sure that equipment from the consumer divisions doesn't get so good that it eats into sales of the professional products. This segmentation has led to some long-standing frustrations in the industry.

Fortunately, the competition between the consumer divisions of different companies causes some upward feature creep every year. And the good news is that as new companies without legacy businesses, notably RED, enter the market, they don't have to worry about segmentation and are shaking things up.

Production codecs

DV

DV was the first affordable digital video format and remains in wide use. It is a simple 8-bit DCT intraframe codec running at 25 Mbps (thus the nickname DV25). SD only, it’s always 720 × 480 in NTSC and 720 × 576 in PAL. Depending on the camera, it can support interlaced or progressive, 4:3 or 16:9, and even 24p. 24p on DV is really 24PsF (Progressive segmented Frame)—it encodes 24 unique frames per second and 6 duplicate frames in order to maintain the 30i frame rate. This wastes some bits, but has no impact on quality and is automatically corrected for inside NLEs, so all you’ll normally see is the 24p.

The biggest drawback is that the NTSC version uses 4:1:1 chroma subsampling, which makes for a lot of blocking with saturated colors. The PAL version is a more sensible 4:2:0 (except for the largely abandoned PAL 4:1:1 DVCPRO variant).

DVCPRO and DVCAM are Panasonic's and Sony's brands for their pro DV versions, respectively. They're both the standard DV25 bitstream, but offer larger, higher-capacity cassettes that won't fit in consumer DV cameras.

DVCPRO50

DVCPRO50 is Panasonic's pro variant of DV. Essentially two DV codecs in parallel, it uses 4:2:2 instead of 4:1:1 and doubles the bitrate to 50 Mbps. It looks great and offers the easy postproduction of DV. DVCPRO50 uses normal DV tapes running at double speed, so you only get half the run time that DV25 has.

DVCPRO-HD

DVCPRO-HD is sometimes called DV100, as it’s essentially four DV codecs ganged together. Thus, it’s intraframe 4:2:2 with up to 100 Mbps (although it can go down to 40 Mbps depending on the frame size and frame rate). It’s available in both tape and flash (P2) versions. The tape runs at 4x the speed and so has 1/4 the duration as standard DV.

The biggest drawback to DVCPRO-HD is that it uses aggressive subsampling. But the high and reliable per-pixel quality can be well worth it.

DVCPRO-HD has several modes:

•  960 × 720p50/60

•  1280 × 1080i30

•  1440 × 1080i25

DVCPRO-HD is used in VariCam, which allows highly variable frame rate recordings for special effects.

HDV

HDV is an interframe MPEG-2 HD format. Designed to be a straightforward upgrade from DV, it uses DV tapes and the same 25 Mbps bitrate.

When HDV works, it’s fine. While earlier NLEs had a hard time playing it back, modern machines are easily fast enough. But since it’s an interframe MPEG-2 codec, once motion or detail starts getting too high, it can get blocky. So production technique matters; HDV is not a good choice for our helmet-cam mountain biking example. The quality of the MPEG-2 encoder chip varies between cameras (though across the board they’re better today than when the format launched), so trial and error is required to figure out what’s possible to get away with. Even if it is recorded with the blockies, there are ways to reduce that on decode (as we’ll discuss later), but it slows down the process. Compression artifacts are particularly problematic in the 1080i mode; see Color Figure C.15 for an example. Shooting progressive—particularly 24p—improves compression efficiency and thus reduces compression artifacts.

There are two main modes of HDV:

•  HDV 720p is a full 1280 × 720 at 24, 25, 50, or 60 fps at 20 Mbps. This has only been seen in JVC cameras to date. 720p24 is easy to do at that bitrate, and can yield high-quality results.

•  HDV 1080i is 1440 × 1080 at 25 Mbps. Although HDV was originally meant to be a progressive format, Sony added 1080i and made it the default mode in their products.

This has led to a frustratingly large variety of attempts to get 24p back out again.

•  Cineframe was just a not very good deinterlacer applied after the native-interlaced signal processing. It’s horrible; don’t use it, or a camera that has it.

•  "24" uses 3:2 pulldown, so two out of five frames are coded interlaced. It can be a pain in post, but the results can be good.

•  24A uses PsF, where there are six repeated frames every second. This works fine in practice; bits aren’t spent on the repeated ones.

True progressive models with a native 24/25/30 bitstream are now available. However, some still use interlaced sensors; make sure that any camera you get is progressive from lens to bitstream. Since HDV is a 4:2:0 format, shooting interlaced can result in some color challenges.

AVCHD

AVCHD is an H.264 bitstream in an MPEG-2 transport stream recorded to solid-state memory. It's conceptually similar to HDV—use a commodity consumer codec at the highest practical bitrate. The big difference is that it has followed the photography market in using flash memory instead of tape.

H.264 is a much more efficient codec than MPEG-2, and also degrades more gracefully, so quality is generally better. The bigger challenge with AVCHD is it takes a lot of horsepower to decode. These bitrates can be challenging to decode in software on their own, and it can take GPU acceleration to be able to scrub or edit crisply. Some editing products, including Final Cut Pro 6, automatically recompress from AVCHD to another simpler format right away.

AVCHD is definitely the best consumer video format at this point, but have a postproduction plan before you use it.

IMX

IMX was an early Sony attempt at a DVCPRO50 competitor, using intraframe MPEG-2 from 30–50 Mbps. It’s still supported in many cameras, and is a fine choice for SD-only productions.

XDCAM

XDCAM is Sony's pro post-tape format. It originally supported the Professional Disc optical format, with support for SxS and SDHC cards added later. There are a number of variants and modes, but all are interframe MPEG-2. Since they record to disc or solid-state media rather than tape, they can use VBR, raising the bitrate for challenging scenes; this makes the risk of HDV-like blockiness much lower. There are 19 and 25 Mbps modes, but you should just use XDCAM at the 35 Mbps HQ top bitrate; storage is cheap these days, so there's no reason to run the risk of artifacts.

HDCAM

HDCAM was Sony's first pro HD offering. It's a nice intraframe DCT codec, but with very odd subsampling. For luma, it's a pretty typical 1440 × 1080. But for chroma, it's a unique 3:1:1 chroma subsampling. Yep. It has 1440/3 = 480 chroma samples horizontally. This is only half the chroma detail of a full 1920 × 1080 4:2:2 encode (1920/2 = 960). Not bad for acquisition, but pretty lossy for post, particularly for animation and motion graphics. Being Sony, it started out as 1080i only, but has since gained 24/25p modes (used in Sony's high-end CineAlta cameras).

RED

The RED cameras are one of those things like Moby Dick and the Arctic Monkeys: everyone talks about how religiously awesome they are, to the point where it seems reality has to disappoint, but they wind up being pretty awesome anyway. RED has been the focus of incredible buzz at the NAB show for many years, and is being used with great real-world results. Beyond some great and promising products, it's also exciting to have a new entrant into the camera business without a vested interest in the way things have always been done, and willing to explore new price points and configurations.

Their main product now is the RED ONE, which uses a 4096 × 2304 imager (4K). That's the actual count of individual red, green, and blue sensor elements, however, not full pixels. But it's fully capable of making great 2K and higher images, with frame rates up to 120 Hz. A feature-film-grade camera can be had for under $30K, without lab fees—far less than a good pro SD digital video camera cost in the 1990s.

"REDCODE RAW" is the native RED bitstream, a mathematically lossy wavelet codec recording the actual values from the 4K receptors. This offers the most flexibility in post, but requires a whole lot of processing to turn into editable video as well.

At this point, RED is really best thought of as a 35 mm film replacement technology. It can radically outperform existing video cameras in capture detail, but it also requires a much more intensive ingestion process.

Post-only formats

The post-only formats won’t ever be used to create content, but you may receive content on those formats. As much as I love that the HD future has become the HD present, there’s a huge legacy of content out there that you may need to access.

VHS

VHS (Video Home System) was a horrible video standard. Back in the day we talked about "VHS" quality, but that was mainly an excuse for our grainy, blocky 320 × 240 Cinepak clips. VHS has poor luma detail, much worse chroma detail, and is extremely noisy. People just watching a VHS movie for fun may not have noticed how nasty VHS looked, but when the content is captured and displayed on a computer screen, the results are wretched.

Sometimes content is only available on VHS, and if that’s all you’ve got, that’s all you’ve got. But any kind of professionally produced content was originally mastered on another format. It’s almost always worth trying to track down a copy of the original master tape, be it on Beta SP or another format. There’s some aggressive preprocessing that can be done to make VHS less terrible, but that’s lurching towards mediocrity at best.

VHS supports several speed modes, the standard SP, the rare LP, and the horrible (and horribly common) EP. Because EP runs at one-third the speed of SP, you can get three times as much video on a single tape, with triple the amount of noise. Tapes recorded in EP mode are more likely to be unplayable on VCRs other than the one they were recorded with, and most professional decks can read only SP tapes. If you have a tape that won’t work in your good deck, try putting it into a consumer VHS deck and see if it’s in EP mode.

VHS can have linear and HiFi audio tracks. The HiFi ones are generally good, even approaching CD quality. Linear tracks may be encoded with good old analog Dolby—if so recorded, playing back in Dolby mode can help quality and fidelity.

S-VHS

S-VHS (the “S” stands for “super”) handles saturated colors better than VHS through improved signal processing. S-VHS was never a popular consumer standard, but was used for corporate- and wedding-grade production. You’ll need a real S-VHS deck to capture at full quality, though; some VHS decks with a “pseudo S” mode can play S-VHS, but only at VHS quality.

D-VHS

D-VHS was an early attempt at doing digital HD for the home, led by VHS creator JVC. The technology was a straightforward extension of ATSC and VHS—a MPEG-2 transport stream on a VHS tape. The quality was fine when encoded well (it handled up to 25 Mbps). And it was DRM-free with IEEE 1394/FireWire support, so the MPEG-2 bitstream could be directly captured! D-VHS also supports recording encrypted cable content, since it’s just a bit pump. A CableCARD or similar decryption device is required upstream.

Prerecorded D-VHS was called D-Theater, and it saw quite a few releases from 2002–2004, ending as HD DVD and Blu-ray were emerging. No one is releasing new content for it anymore, but it was a popular way to easily access HD source content.

8 mm

8 mm was a popular camcorder format before DV. It’s better than VHS, although not by a whole lot.

Hi8

Hi8 is an enhanced 8 mm using similar signal processing as S-VHS to improve quality. However, the physical tape format itself wasn't originally designed to carry information as dense as Hi8 calls for. Thus, the tape is very sensitive to damage, often resulting in hits—long horizontal lines of noise lasting for a single frame. Every time the tape is played, the wear can cause additional damage. If you get a Hi8 tape, first dub it to another format, and then capture from the fresh copy. Do not spend a lot of time shuttling back and forth to set in- and out-points on a Hi8 tape. Many Hi8 tapes are going to be too damaged to be useful this many years later.

3/4 Umatic

3/4 (named after the width of its tape) was the default production standard pre-Betacam. Quality was decent, it didn't suffer much from generation loss, and the tapes themselves were very durable. There's lots of archival content on 3/4. The original U-matic was followed by U-matic SP, with improved quality.

Betacam

Betacam, based on the Betamax tape format, provided much higher quality by running several times faster, increasing the amount of tape assigned to each frame of video. The small tape size of Betacam made it a hit with ENG (Electronic News Gathering) crews.

There are two sizes of Betacam tapes, one a little smaller (though thicker) than a standard VHS tape meant for cameras, the other about twice as large meant for editing.

Betacam SP was a later addition that quickly took over, and was the last great analog tape format. SP offered low noise, tape durability, and native component color space. It actually had different Y′, Cb, and Cr tracks on the tape. The first wide use of component analog was with Betacam SP.

If the tape is in good shape, you can get pretty close to DVD quality out of SP. It’s the analog format you hope for.

Digital Betacam

Digital Betacam (often referred to as D-Beta or Digi-Beta) was a lightweight SD digital codec, and the first wide use of 10-bit luma. It’s captured via SDI (described below).

Betacam SX

This weird hybrid format uses interframe MPEG-2. It was used by some diehard Sony shops, but was eclipsed by DV in the broader market.

D1

D1 is an uncompressed 4:2:2 8-bit per channel SD format. D1 was extremely expensive in its day, and offers great SD quality.

D2

D2 is an anomaly—it’s a digital composite format. D2 just digitally sampled the waveform of the composite signal itself. This meant that the signal had to be converted to composite first, throwing a lot of quality out. There are still some of these tapes floating around in the world, although it can be a challenge to find a working deck.

D5

D5 was Panasonic’s attempt at reaching the D-Beta market with a 10-bit, 4:2:2 format. The same tape format was used in the much more popular D5 HD (see following section).

D5 HD

D5 HD was the leading Hollywood post format for several years, as the first broadly used tape format with native 24p, 8- or 10-bit 4:2:2, and no horizontal subsampling. It's great, although it's being displaced by HDCAM-SR. D5 uses 6:1 compression in 8-bit and 8:1 in 10-bit. Given the intraframe DCT codec used, I've seen it introduce some blocking in highly detailed chroma, like a frame full of green text.

HDCAM-SR

HDCAM-SR almost makes up for all of Sony’s sins in pushing interlaced and offering way too many segmented formats over the years. HDCAM-SR supports a full 10-bit 4:2:2 1920 × 1080 without subsampling and without artifacts. It’s the only major use of the MPEG-4 Studio Profile. HDCAM-SR also has a high-end 12-bit 4:4:4 mode suitable for mastering film or animation prior to color correction.

D9 (Digital-S)

Originally named Digital-S, JVC's D9 was touted as an upgrade path for S-VHS users and was the first use of the DV50 bitstream.

Acquisition

Acquisition, also called “capture,” is how the source content gets from its current form to something you can use in your compression tool. For live video, acquisition is grabbing the video frames as they come down the wire. Of course, the goal is for your source to look and sound exactly the same after capture and digitization as it did in its original form. Capture is probably the most conceptually simple aspect of the entire compression process, but it can be the most costly to get right, especially if you’re starting with analog sources.

Digitization, as you’ve probably guessed, is the process of converting analog source to digital form—our old friends sampling and quantization. That conversion may happen during capture or it may occur earlier in the production process, say during a video shoot with DV format cameras or during a film-to-video transfer. When you capture video off a DV or HDV tape with a 1394 connector, or import an AVCHD or RED file off a memory card, you aren’t digitizing, because you aren’t changing the underlying data. The same is true if you’re capturing uncompressed video off SDI or HD-SDI.

Let’s look at some of the technologies and concepts that come into play with capture and digitization.

Video Connections

While analog video is intrinsically luma plus color difference (and hence maps naturally to Y′CbCr when digitized), the manner in which the video signal is encoded can have dramatic effects on the quality and range of colors being reproduced. Both the type of connection and the quality of the cabling are important. Things are simpler in the digital world, where connection types are more about workflow and cost. The Holy Grail is a high-quality digital interconnect that allows direct transfer of the bitstream as initially captured.

Coaxial

Anyone with cable television has dealt with coaxial (coax) connections. Coax carries the full bandwidth and signal of broadcast television on a single copper conductor surrounded by shielding. Coax doesn't just carry composite video; the audio is mixed in with the video signal for broadcast television as well. This can cause a lot of interference—bright video can leak into the audio channel, and loud audio can cause changes in the video.

There are plenty of consumer capture boards that include tuners and can capture analog over coax. The good news is that analog coax is going away. Coax itself lives on, but it mainly carries digital cable or ATSC signals.

Composite

The composite format was the first better-than-coax way to connect a VCR and TV. At least it doesn’t mix audio with video or multiple channels into a single signal. But in composite video, luma and chroma are still combined into a single signal, just like for broadcast. And the way they are combined makes it impossible to separate luma and chroma back perfectly, so information can erroneously leak from one domain to the other. This is the source of the colored shimmer on Paul’s coat in Perry Mason reruns. You may also see strange flickering, or a checkerboard pattern, in areas of highly saturated color.

How well equipment separates the signals plays a big part in how good video acquired via composite can look, with higher-end gear including 3D comb filters as your best bet. But it’s all gradations of bad; no video encoding or compression system should use composite signals. Throw out all your yellow-tipped cables.

Professional systems use a BNC connector without color-coding that has a wire protruding from the center of the male end and a twist top that locks on. But that doesn’t help quality at all; it doesn’t matter how good the cable is if it’s still composite.

S-Video

S-Video (also called Y/C) is a leap in quality over composite. With S-Video, luma and chroma information are carried on separate pairs of wires. This eliminates the Perry Mason effect when dealing with a clean source, as chroma doesn’t interfere with luma. The chroma signals can interfere with each other, but that’s much less of a problem.

Consumer and prosumer analog video tapes (VHS, S-VHS, Hi8) recorded luma and chroma at separate frequencies. When going to a composite output, luma and chroma are mixed back together. So, even with a low-quality standard like VHS, you can see substantial improvements by capturing via S-Video, avoiding another generation of mixing and unmixing luma and chroma.

Component Y′CbCr

Component video is the only true professional analog video format. In component, Y′, Cb, and Cr are stored and transmitted on different cables, preserving separation. Betacam SP was the main tape format that was natively component analog.

Component analog was also the first HD signal, and supports all the classic HD resolutions and frame rates, up to 1080i30. More recent component analog gear supports 1080p60.

When color-coding cables for Y′CbCr, Y = Green, Cb = Blue, and Cr = Red. This is based on the major color contributor to each signal, as described in Chapter 3.

Component RGB

As its name implies, Component RGB carries the red, green, and blue channels as separate signals. It isn't used as a storage format. It is a transmission format for CRT computer monitors, game consoles, and some high-end consumer video gear. Unless you're trying to capture from one of those sources, there is generally no need to go through Component RGB.

VGA

VGA carries Component RGB. VGA cables can be available in very high quality shielded models, and can support up to 2048 × 1536p60 with some high-end CRT displays. VGA is sensitive to interference, so unshielded cables, noisy environments, low quality switchers, or long runs can cause ghosting or other visual problems. Higher resolutions and frame rates increase the risk of interference, so if you capture from VGA, use the lowest resolution and frame rate that works for your content. Better yet—capture DVI.

DVI

DVI was the first consumer digital video format, and has largely (and thankfully) replaced VGA for modern computers and displays. It is 8-bit per channel RGB and can go up to 1920 × 1200p60. DVI cables can also carry a VGA signal over different pins.

There’s also a dual-link version (still using a single cable) that goes up to 2560 × 1600p60, which is a glorious thing to have in a monitor, particularly when authoring HD content.

There are a variety of capture cards that can grab single-link DVI, but I don't know of any that can do dual-link yet.

There are four types of DVI connectors seen in the wild:

•  DVI-I: Supports both DVI-A and DVI-D, so can be used with analog and digital displays. Make sure that digital-to-digital connections are using the digital output.

•  DVI-D: Carries only the digital signal.

•  DVI-A: Carries only the analog signal. Most often seen on one end of a DVI to VGA cable.

•  DVI-DL: Dual Link connector for resolutions beyond 1920 × 1200. Works fine as a DVI-D cable.

HDMI

HDMI is a superset of single-link DVI with embedded audio, a worse connector (no screws; it can fall out a lot), and a guarantee of HDCP (see sidebar). As it's electrically identical to the video portion of DVI, simple passive adaptors can be used to convert from one to the other. However, HDMI supports Y′CbCr color as well as RGB, including rarely used "high-color" modes like 16-bit per channel 4:4:4. HDMI equipment normally doesn't support all the "PC" resolutions like 1024 × 768 and 1280 × 960; HD is typically limited to 1280 × 720 and 1920 × 1080.

HDMI has become the standard for home theater, as it’s cheap and of high quality. Its only real drawback is that darn connector.

HDMI without HDCP is easy to capture as video using products like the BlackMagic Intensity.

DisplayPort

Oh, so very many ports! DisplayPort is aimed at replacing both DVI and HDMI, combining support for greater-than-1080p resolutions with HDCP. It’s designed for easy expandability to new display types like 3D. It’s used in just a few products, and there isn’t a way to capture it yet.

HDCP

DVI devices can but are not required to use HDCP content protection, while HDMI devices always support HDCP. HDCP is a content protection technology that prevents playback if later devices in the signal chain could allow an unauthorized capture to take place.

This is a big limitation with DVI to HDMI converters, since often trial and error is required to find out if a particular DVI device supports HDCP.

In some markets, boxes that strip HDCP data from DVI and HDMI are available, but they’re illegal to sell in the U.S.

In general, Hollywood content available in digital HD, be it Blu-ray or other future format, will require HDCP if it is going to be displayed digitally. Most other content won’t use it; for example, game output from the PlayStation 3 and Xbox 360 never uses HDCP, but playback of premium HD video on those devices does.

FireWire

FireWire (also called IEEE 1394, just 1394, or i.Link by Sony) is a high-speed serial bus protocol developed by Apple. The first version supported up to 400 Mbps. The newer 1394b supports 800 Mbps, with the spec going up to 3200, but uses a different connector.

FireWire was a big part of what made DV so revolutionary, as it supports bidirectional digital transmission of audio, video, and device control information over a single cable. The plug-and-play simplicity was a huge improvement over older formats. A single Beta SP deck uses more than 10 cables to provide the same functions, each of which costs more than that single FireWire cable.

The original 1394 connectors come in two flavors: 4-pin and 6-pin. Most DV cameras require 4-pin connectors, while computers and other 1394-equipped devices use the 6-pin variety. Camera-to-computer connections require a 4-pin-to-6-pin cable. Note the extra two pins on 6-pin cables are used to carry power to devices chained together via 1394. The 1394b 800 Mbps version came with a third connector type called "beta" that also carries power; however, it hasn't been adopted widely apart from the Mac, and even there mainly for storage.

Lots of video formats use FireWire as the interconnect, including DV25, DV50, DV100, HDV, and D-VHS.

SDI

Serial Digital Interface (SDI) connections transmit uncompressed 720 × 486 (NTSC) or 720 × 576 (PAL) video at either 8-bit or 10-bit per channel resolution at up to 270 Mbps. SDI can handle both component and composite video, as well as four groups of four channels of embedded digital audio. Thus, SDI transfers give you perfect reproduction of the content on the tape. Due to the wonders of Moore’s Law, you can now buy an SDI capture board for less than a good component analog card.
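That 270 Mbps figure falls straight out of Rec. 601 sampling: 13.5 MHz for luma plus 6.75 MHz for each chroma channel, carried as 10-bit words. A back-of-the-envelope check:

# Back-of-the-envelope for the 270 Mbps SD-SDI rate: Rec. 601 samples luma
# at 13.5 MHz and each chroma channel at 6.75 MHz, carried as 10-bit words.
luma_rate = 13_500_000          # Y' samples per second
chroma_rate = 6_750_000         # Cb (and Cr) samples per second
bits_per_sample = 10
total_bps = (luma_rate + 2 * chroma_rate) * bits_per_sample
print(f"{total_bps / 1e6:.0f} Mbps")   # -> 270 Mbps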

HD-SDI

HD-SDI is the primary high-end interconnect these days. It uses the same cabling as SDI, but provides uncompressed 10-bit 4:2:2 video at up to 1080i30. Thankfully, all flavors support 24p natively.

There’s an even higher-end variant called dual-link HD-SDI, which uses two connectors to deliver up to 1080p60, and up to 12-bit 4:4:4 Y′CbCr and RGB. 3G SDI is intended to replace that with a single connection capable of the same.

Unfortunately, consumer electronics are not allowed to include SDI or HD-SDI interfaces as a piracy prevention measure.

Audio Connections

Like most things in the video world, audio often gets short shrift in the capture department. However, it is critically important to pay attention to audio when targeting compressed video. Audio is sometimes the only element you have a hope of reproducing accurately on small screens or in limited bandwidth. Plus, extraneous noise can dramatically hurt compression, so it's even more important to keep audio captures noise-free.

Unbalanced

Unbalanced analog audio is the format of home audio equipment, where RCA plugs and jacks are the connectors of choice. In audio gear designed for home and project studios, unbalanced signals are often transmitted over cables fitted with 1/4-inch phone plugs. The audio inputs on computer sound cards are unbalanced mini jacks, like those used for portable headphones. Note that mini connectors come in two sizes; it's nearly impossible to see the slight difference in size, but they aren't compatible.

Unbalanced audio cables come in both mono and stereo flavors. Mono mini and 1/4-inch plugs use tip (hot) and sleeve (ground) wiring. Stereo mini and 1/4-inch plugs use the same configuration, but add a ring for the second channel.

You’ve no doubt noticed that stereo RCA cables use separate wires for left and right channels. Their connectors are generally color-coded red for right and black or white for left.

So what's an unbalanced audio signal? Unbalanced cables run a hot (+) signal and a ground down a single, shielded wire. Such cables have the unfortunate ability to act as antennas, picking up all sorts of noise including 60 Hz hum (50 Hz if you're in a country running 50 Hz power) and RF (radio frequency interference). Shielding is supposed to cut down on the noise, but the longer the unbalanced cable, the more it's likely to behave like an antenna. Keep unbalanced cables as short as possible. And when confronted with the choice, buy shielded cables for audio connections between devices.

Jacks and Plugs

It’s easy to confuse these two terms, and they’re often incorrectly used interchangeably. So for the record: plugs are male connectors. Jacks are female connectors that plugs are “plugged” into.

Balanced

Balanced, like unbalanced, sends a single signal over a single cable; however, that cable houses three wires, not two. The signal flows positive on one wire, negative on another, and to ground on the third. I won’t burden you with the details, but this design enables balanced wiring to reject noise and interference much, much better than unbalanced.

Balanced connections are highly desirable in electromagnetically noisy environments, such as live production environments or video editing studios. They’re also helpful where long cable runs are required.

Balanced audio connections are most often thought of as using XLR connectors, but quarter-inch phone connectors can also be wired to carry balanced signals. XLRs are also called Cannon connectors after the company that invented them—ITT-Cannon. The company made a small X series connector to which they added a latch. Then a resilient rubber compound was added to the female end, and the XLR connector was born.

All professional video cameras feature balanced audio connectors.

Levels

It’s worth noting that matching levels plays an important role in getting the best audio quality. Professional audio equipment typically runs at +4 dBm (sometimes referred to as +4 dBu) and consumer/prosumer gear runs at –10 dBV. Without getting immersed in the voodoo of decibels, suffice it to say that these two levels (though measured against different reference points) represent a difference in voltage that can lead to signals that are either too quiet (and thus noisy) or too loud (and very distorted) if gear running at different levels is connected without adjustment.
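To put numbers on that mismatch, here’s a small sketch of the decibel math (dBu is referenced to 0.775 V RMS, dBV to 1.0 V RMS):

```python
import math

# Nominal operating levels as RMS voltages.
pro_level = 0.775 * 10 ** (4 / 20)        # +4 dBu, referenced to 0.775 V
consumer_level = 1.0 * 10 ** (-10 / 20)   # -10 dBV, referenced to 1.0 V

difference_db = 20 * math.log10(pro_level / consumer_level)

print(f"+4 dBu  = {pro_level:.3f} V RMS")       # ~1.228 V
print(f"-10 dBV = {consumer_level:.3f} V RMS")  # ~0.316 V
print(f"difference = {difference_db:.1f} dB")   # ~11.8 dB, roughly a 4x voltage gap
```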

S/PDIF

S/PDIF (Sony/Philips Digital InterFace) is a consumer digital audio format. It’s used for transmitting compressed or uncompressed audio. Because it’s digital, there’s no quality reason to avoid it in favor of the professional standard AES/EBU, although that format offers other useful features for audio professionals that don’t relate to capture.

S/PDIF cables are coax, terminated in either RCA (phono) or BNC connectors. For most purposes, S/PDIF and AES/EBU cables are interchangeable. If, however, you’re connecting 20- or 24-bit devices, you’ll want to match each cable’s impedance rating to that of the format: 75 ohm for S/PDIF; 110 ohm for AES/EBU.

AES/EBU

AES/EBU, from the Audio Engineering Society and the European Broadcast Union, is the professional digital audio format. Of course, its actual content is no different than that of S/PDIF. The advantage of AES/EBU comes from being able to carry metadata information, such as the timecode, along with the signal. If you’re using SDI for video, you’re probably using AES/EBU for audio.

AES/EBU 110-ohm impedance is usually terminated with XLR connectors.

Optical

Optical audio connections, under the name TOSLink, are proof of P. T. Barnum’s theory that a sucker is born every minute. The marketing behind optical proclaimed that because it transmitted the actual optical information off a CD, it would avoid the distortion caused by D/A conversions. Whatever slight advantage this may have once offered, audio formats like S/PDIF offer the same perfect quality from the digital source at a much lower cost.

That said, lots of consumer equipment offers only TOSLink for digital audio output.

Frame Sizes and Rates

A compressed bitstream comes in at a particular frame size and rate, of course. But when digitizing from analog or uncompressed to digital, the frame size can be selected.

Capturing Analog SD

For analog, the resolution of the source is explicit vertically, since each line is a discrete unit in the original, and only so many contain visual information. But resolution is undefined horizontally, since each line itself is an analog waveform. So, 640 × 480 and 720 × 480 capture the same number of lines. But 720 × 486 captures six more lines than 720 × 480—each line of the 480 is included in the 486, and six total must be added to the top or bottom to convert from 480 to 486. Capturing 720 pixels per line doesn’t mean you’re capturing more of each line than in a 640 pixels per line capture—you are just capturing more samples per analog line, and thus have anamorphic pixels. In most analog sources, the 8 pixels left and right are often blank; it’s almost always fine to capture as 704 wide instead (which is a smaller area than 720). It really gets down to how big your biggest output frame is going to be.
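To put a number on those anamorphic pixels, here’s a sketch of the pixel aspect ratio arithmetic for 4:3 SD (assuming the common 704-wide clean aperture):

```python
from fractions import Fraction

# The display aspect ratio is fixed by the format; the pixel aspect ratio (PAR)
# is whatever makes the stored grid fill that display shape.
def pixel_aspect_ratio(width, height, display_aspect=Fraction(4, 3)):
    return display_aspect / Fraction(width, height)

print(pixel_aspect_ratio(640, 480))   # 1     -> square pixels
print(pixel_aspect_ratio(704, 480))   # 10/11 -> narrow NTSC pixels
print(pixel_aspect_ratio(704, 576))   # 12/11 -> wide PAL pixels
```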

The most important thing in capturing analog SD is to use component analog or at least S-Video. As mentioned previously, composite cables are death to quality, and coax is worse yet.

Analog SD other than component is always an interlaced signal, even if some of those frames happen to have aligned fields so that it looks progressive.

Plenty of capture cards offer analog preprocessing, like noise reduction, deinterlacing, and cropping. These are great for live encoding. When capturing for later preprocessing, you have to weigh the ease of capturing optimized source with the loss of flexibility in being able to apply more advanced software preprocessing. If you know you’re never going to do any per-file tweaking and don’t ever want to go back out to 480i/576i, you can preprocess while capturing.

If you don’t know what you’re going to do, capturing the full 720-width interlaced gives you the maximum flexibility to tweak things later.

Capturing Component Analog

Capturing component analog is generally a lot better, since you don’t have the composite quality hit, and the content is often natively progressive. And, of course, it can have more pixels, which are always welcome. While we normally think of component analog in terms of being HD, it also supports SD, including progressive SD resolutions.

Table 5.1 Component Analog Modes.

Common Name | Frame Size | Aspect Ratio | Frames Per Second | Frame Mode | Images Per Second | Bitrate for Uncompressed 10-bit 4:2:2
480i | Typically 640, 704, or 720 wide; height 480 or 486 | 4:3 or 16:9 | 29.97 (30/1.001) | Interlaced | 59.94 | 187 Mbps
576i | 720, 704, or 640 wide; height 576 | 4:3 or 16:9 | 25 | Interlaced | 50 | 187 Mbps
480p | 640, 704, or 720 wide; height 480 | 4:3 or 16:9 | 59.94 or 60 | Progressive | Same as fps | 373 Mbps
576p | 720, 704, or 640 wide; height 576 | 4:3 or 16:9 | 50 | Progressive | Same as fps | 373 Mbps
720p | 1280 × 720 | 16:9 | 50, 59.94, or 60 | Progressive | Same as fps | 50p: 829 Mbps; 60p: 995 Mbps
1080i | 1920 × 1080 | 16:9 | 25 or 29.97 | Interlaced | 50 or 59.94 | 25i: 933 Mbps; 30i: 1120 Mbps
1080p | 1920 × 1080 | 16:9 | 24, 25, 50, or 60 | Progressive | Same as fps | 24p: 896 Mbps; 25p: 933 Mbps; 50p: 1866 Mbps; 60p: 2239 Mbps

Component analog can be either RGB or Y′CbCr depending on the source device. If the content is originally video, Y′CbCr output may skip a color correction in the player and be more accurate. If it’s a game console, RGB is going to be the native color space. Note game consoles often have multiple options for output luma level.

Capturing Digital

A digital capture is more straightforward, since the source frame size, frame rate, and other qualities are known. So an uncompressed capture simply needs to copy the pixel values straight from SDI to disc.

Capturing from Screen

Capturing screen activity from a computer is important in a lot of markets. Screen captures get used often for game demos, PowerPoint presentations, application tutorials, and so on.

Capturing via VGA

There are a variety of ways to capture via VGA. The simplest is to use a VGA to component fan out (an adaptor that goes from the VGA connector’s pins to separate component cables), and capture with a component analog device capable of handling component RGB. The biggest limitation to this approach is it only supports the component progressive frame sizes, typically 1920 × 1080 or 1280 × 720. Since few laptops have those native resolutions, this normally requires the demo machine to be reconfigured, which can be impractical in some demo environments.

There are dedicated VGA capture systems like Winnov’s CBox 3 and Digital Foundry’s TrueHD that are capable of handling arbitrary VGA frame sizes. For historical reasons, VGA may allow full-resolution HD output for copy-protected content in cases where that is now blocked over analog or non-HDCP DVI.

Capturing via DVI or HDMI

As DVI can be easily adapted to HDMI, HDMI capture boards like Blackmagic’s Intensity can be used to capture computer output, albeit again restricted to standard video sizes. And systems with native HDMI output are easier yet. Capturing digitally obviously produces higher-quality results than analog, and should be done when possible. TrueHD is the only available system I’m aware of that can handle DVI capture using all the PC resolution options.

Capturing via screen scraping

Capturing what’s on the display from within the computer has long been a popular technique, and is certainly simpler than having to mess with a video device. This has been done with applications like TechSmith’s Camtasia on Windows and Ambrosia Software’s Snapz Pro X on the Mac. This technique is often called “screen scraping,” as the apps basically take a screenshot periodically, then save the frames with an interframe lossless codec tuned to screen captures.

This continues to work well for lower-motion, lower-resolution content, particularly if there’s a simple background and (in Windows) the Classic theme. But once you need full-motion or high frame sizes, or to include natural images or (worse) video on the screen, the RLE-like codecs just can’t keep up. And Aero Glass isn’t supported at all (Figure 5.3 and Color Figure C.15).

Figure 5.3 (also Color Figure C.15) The same screen rendered in the Aero Glass (5.3A) and Classic (5.3B) themes. Aero Glass looks much nicer, while Classic has lots of flat areas for easy RLE-like compression.

image
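To see why flat areas are so friendly to these codecs, here’s a toy run-length encoder—just an illustration of the principle, not any shipping screen codec:

```python
# Toy run-length encoder: each run of identical pixel values becomes (count, value).
def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return runs

flat_row = [200] * 100                                   # flat Classic-theme background
textured_row = [200 + (i * 37) % 5 for i in range(100)]  # subtly textured/dithered row

print(len(rle_encode(flat_row)))      # 1 run: compresses to almost nothing
print(len(rle_encode(textured_row)))  # 100 runs: barely compresses at all
```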

If you’re using a classic screen scraper, make sure to use the smallest capture region that works for your image. Ideally, you can capture just an application instead of the whole display. The fewer pixels per second you need, the less of a CPU hit you’ll get from the capture process, and the better performance you’ll get from what you’re capturing.

For that reason, a multicore system is very helpful—the capture process doesn’t have to compete as much with the rest of the system for CPU resources.

Capturing via Windows’ Desktop Window Manager

Windows Vista introduced a new display architecture—the Desktop Window Manager (DWM). The DWM is used with the Aero Glass theme, enabling the transparencies and other effects. DWM works by using the GPU’s 3D engine to handle compositing the image; it essentially turns the user interface into a 3D game. Contrary to rumors, this “eye candy” actually improves performance, by offloading the math required to draw the screen from CPU to GPU.

Because it’s using the 3D card, the final screen is resident as a “3D surface” on the GPU. We now have modern capture technologies that can grab that surface directly without interfering with the normal operation of the display, allowing much higher frame rates and frame sizes.

FRAPS was the first product to take advantage of this. It was originally created for game captures under Windows XP, and so has a lightweight (but space-hungry) codec that handles very high motion with aplomb. I’ve captured up to 2560 × 1600p60 with FRAPS (albeit on an 8-core system).

Microsoft’s Expression Encoder 3 includes a new Expression Encoder Screen Capture (EESC) application, including DWM capture. This uses a new codec that is extremely efficient for screen activity, with a controllable space/quality level. EESC can simultaneously capture screen activity and a live video input, to enable synchronized recording of a presenter and presentation.

Capturing via Mac OS X’s Quartz

Another approach to capturing screen activity is to actually capture the commands being sent to the display manager, and then be able to play those back. This is the technique used by Telestream’s ScreenFlow for Mac OS X, which captures the input of the Mac’s Quartz rendering engine. While it’s not exactly a codec, it basically makes a movie out of all the commands sent to the screen, and can convert that to an editable or compressible file after the fact. It’s cool stuff.

Capture Codecs

Uncompressed

Uncompressed is the simplest to capture, although the most expensive in terms of space. But even uncompressed video might not always be exactly the same as the source. For example, 8-bit 4:2:2 can be captured from 10-bit 4:2:2. Some hardware, like AJA’s capture boards, has a nice 10-bit-to-8-bit dither, so it can make sense to do that conversion on capture if no image processing is planned and there isn’t a good dither mode later in the pipeline.
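As a rough illustration of what a 10-bit-to-8-bit dither buys you—a generic sketch, not AJA’s actual algorithm—compare straight truncation with adding a little noise before rounding:

```python
import random

# 10-bit samples span 0-1023; 8-bit spans 0-255, so the conversion is a 4:1 scale.
def truncate_to_8bit(sample10):
    return sample10 >> 2                   # drop the two low bits

def dither_to_8bit(sample10):
    # Add up to one 8-bit step of noise before rounding, so a shallow 10-bit
    # gradient averages out to the right 8-bit values instead of snapping to
    # the same step for four codes in a row (which shows up as banding).
    return min(255, int((sample10 + random.uniform(0, 4)) // 4))

gradient = list(range(512, 524))           # a very shallow 10-bit ramp
print([truncate_to_8bit(s) for s in gradient])  # four 128s, four 129s, four 130s
print([dither_to_8bit(s) for s in gradient])    # the steps get broken up by noise
```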

DV

I’m not a fan of digitizing to DV25. 4:1:1 color subsampling is just too painful. There are lots of cheap composite/S-Video-to-DV converter boxes, but I don’t consider them capable of production-quality video.

Motion JPEG

Motion JPEG and its many variants (like the Media 100 and Avid AVR codecs) were popular for intraframe encoding in NLEs for many years, and are still supported in many cards. They give decent lightweight compression; the main drawback is that they’re limited to 8-bit.

DNxHD

Avid’s DNxHD is another DCT intraframe 4:2:2 codec offering good quality and speed, which is well-suited to real-time capture. I prefer to use the higher 220 Mbps mode, which can provide 10-bit and doesn’t use horizontal subsampling. DNxHD is being standardized as VC-3. Free decoders are available for Mac and Windows.

ProRes

Similar to DNxHD, Apple’s ProRes codec is also a 4:2:2 intraframe DCT codec. Its video modes have a maximum bitrate of 220 Mbps with 10-bit support, with a new 12-bit 4:4:4 mode added as of Final Cut 7. Encoding for ProRes is available only for the Mac, although there’s a (slow) decoder available for Windows.

CineForm

CineForm is a vendor of digital video workflow products, centered around their wavelet-based CineForm codec. The codec itself is available in a variety of flavors, and can handle RGB/Y′CbCr, up to 4K, 3D, up to 16-bit, mathematically or visually lossless modes, and a variety of other options. There is a free cross-platform decoder, and the spec for the bitstream has been published.

Their Prospect plug-ins for Adobe Premiere heavily leverage wavelet sub-banding for fast effects and previews. The codec has long been popular as an intermediate for conversion from HDV and AVCHD sources.

Huffyuv

Huffyuv is a lightweight, mathematically lossless codec for Windows, using Huffman encoding. It’s easily fast enough to use in real-time capture. It offers 8-bit in RGB and 4:2:2.

Lagarith

Lagarith is an updated version of Huffyuv including multithreading and arithmetic coding. It typically offers about 30 percent lower bitrates than Huffyuv, but requires somewhat more CPU power to encode and decode.

FFV1

FFV1 is a new lossless open-source codec somewhere between Huffyuv and Lagarith in compression efficiency and decode performance. It’s included in ffmpeg, and so available for many more platforms than the Windows-only Lagarith.

MPEG-2

There’s a wide variety of capture technologies based around MPEG-2. These can be High Profile (4:2:2) or Main (4:2:0), interframe- or intraframe-coded, and can use a variety of different wrappers. Most MXF files are compressed with MPEG-2.

An intraframe coded MPEG-2 is quite similar to all the other DCT codecs.

AVC-Intra

AVC-Intra is an attempt to let H.264 provide similar capture and post functionality to I-Frame MPEG-2, with more efficient encoding. It supports 10-bit luma, and has two standard modes:

•  50 Mbps with 75 percent anamorphic scaling and 4:2:0 coding

•  100 Mbps without anamorphic scaling and 4:2:2 coding

AVC-Intra can produce high quality, certainly, although it’s somewhat slower to decode than older codecs. The 100 Mbps variant uses the lighter-weight CAVLC entropy encoding.

Data Rates for Capture

One critical issue with all capture is the data rate. With a fixed codec like DV25, it’s easy: DV is locked in at 25 Mbps. And capturing uncompressed is easy: bits per pixel × height × width × frame rate. But other codecs can have a variety of configurable options, controlling bitrate directly or by a quality level (typically mapping to QP). Fixed bitrates made sense when storage speed was a big limitation of performance, but that’s much less of an issue these days of fast, cheap RAID. Even with a fixed bitrate, the actual output data rate may be less if some frames are very easy to compress, like black frames. This is a good thing, as it’s just free extra capacity.
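Here’s that arithmetic as a tiny calculator (a sketch; published figures vary a bit depending on whether blanking lines and binary versus decimal megabits are counted):

```python
# Uncompressed capture data rate: bits per pixel x width x height x frame rate.
# For 4:2:2, chroma is halved horizontally, so 8-bit 4:2:2 is 16 bits/pixel
# and 10-bit 4:2:2 is 20 bits/pixel.
def capture_mbps(width, height, fps, bits_per_pixel=20):
    return width * height * fps * bits_per_pixel / 1e6

print(f"{capture_mbps(720, 486, 30000/1001):.0f} Mbps")    # NTSC SD: ~210 Mbps
print(f"{capture_mbps(1280, 720, 60):.0f} Mbps")           # 720p60: ~1106 Mbps
print(f"{capture_mbps(1920, 1080, 30000/1001):.0f} Mbps")  # 1080 at 29.97: ~1243 Mbps
```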

The simplest thing to do if you don’t care about file size is to just use the highest available data rate/quality combination and call it a day. If you’re trying to find a balance between good enough quality and efficient storage, capture a few of your favorite, challenging clips at a variety of bitrates, and compare them on a good monitor. Pick the lowest bitrate/quality combination where you can’t see any difference between that and the source.

The three limiting factors in data rate are drive space, drive speed, and capture codec. Drive space is self-explanatory, and since we discuss capture codecs at length elsewhere, we don’t need to look at them here. But let’s look at drive speed.

Drive Speed

If access to the hard drive isn’t fast enough to capture all the video, frames may be dropped. This can be a serious, jarring quality problem. Always make sure your capture program is set to alert you when frames have been dropped.

If you have decent drives in a RAID configuration, even uncompressed 1080i is pretty easy these days. A few rules of thumb:

•  Video is written in long sequential runs, so rotational speed doesn’t matter much. Don’t pay extra for low-capacity 15,000 rpm drives; 7200 rpm is cheaper and offers better capacity.

•  Solid state drives are great for netbooks, but they’re not set up for the sustained writes of video capture today. They might work for DV25, but won’t give anything close to the write throughput of a good HD RAID.

•  Don’t let your drives get too full (under 80 percent full is a good rule), and defragment them occasionally.

•  If RAID seems expensive or complex, know that my first RAID cost me $6,000, was 4 GB, and couldn’t do more than about 50 Mbps sustained.
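Tying the data rate math back to drive speed, here’s a quick sanity check you can run before a capture session. The drive throughput numbers below are hypothetical placeholders—measure your own array’s sustained write speed.

```python
# Can a given drive or RAID sustain a given capture data rate?
def required_mb_per_sec(capture_mbps):
    return capture_mbps / 8                # megabits -> megabytes

def can_sustain(capture_mbps, drive_mb_per_sec, headroom=0.7):
    # Leave ~30% headroom so OS activity and fragmentation don't cause dropped frames.
    return required_mb_per_sec(capture_mbps) <= drive_mb_per_sec * headroom

print(required_mb_per_sec(1243))   # uncompressed 1080 at 29.97 needs ~155 MB/s
print(can_sustain(1243, 120))      # a lone 7200 rpm drive (~120 MB/s): False
print(can_sustain(1243, 400))      # a modest 4-drive RAID (~400 MB/s): True
```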

Postproduction

Postproduction (or just “post”) is where the raw footage gets turned into finished video, through editing, the addition of effects and motion graphics, voice-overs, audio sweetening, and so on. Compared to production, postproduction is marvelously flexible—with enough time and big enough budget, a project can be tweaked endlessly. This can include the luxury of test encodes to locate and adjust trouble spots.

Again, classic editorial styles and values prove to be good for compression as well. And as a general matter of craft, it’s good to match the production techniques and parameters in post where possible.

Postproduction Tips

Use antialiasing

Antialiasing is the process of softening the edges of rendered elements, like text, so they don’t appear jagged. By eliminating those jagged edges, antialiased elements look better when compressed. Most apps these days either have antialiasing turned on by default or make it an option, often as part of a high-quality mode. If you’re importing graphics made in another app, such as Photoshop, make sure those files are antialiased as well. See Figure 5.4.

Figure 5.4 The same text encoded to a 30 K JPEG, one without antialiasing (5.4A, left), the second with it (5.4B, right).

image

Use motion blur

Earlier, you learned how codecs spend bandwidth on complex images and motion, and hence the most difficult content to compress is rapidly moving complex images. Shooting on film or video with a slower shutter speed gives us motion blur, a great natural way to reduce this problem. With motion blur, static elements are sharp and moving elements become soft or blurred. Thus, there are never sharp and moving images at the same time. If you’re importing graphics or animation from a tool such as After Effects, make sure those images use motion blur. If you’re compositing graphics over video, the graphics’ motion blur value should match the shutter speed used in the camera. Beyond compressing well, it also looks much more natural; lots of otherwise good animation or compositing has been ruined by a mismatch in motion blur between foreground and background.

Some motion blur effects spread one frame’s motion out over several frames or simply blend several frames together. Avoid these—you want natural, subtle motion blur without obvious banding. The NLE may have a configurable number of samples to use when rendering motion blur; higher values are slower to render, but reduce banding/mirroring. See Figure 5.5 for a low-banding example.

Figure 5.5 Starting with the image in 5.5A, rendered motion blur (5.5B, right) should look just like that from a camera, with blurring along the axis of motion (seen here as the dotted line).

image
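When you need to match a comp’s motion blur to camera footage, the usual conversion between shutter speed and shutter angle is simple arithmetic (a sketch; check how your particular app expresses its motion blur setting):

```python
# Shutter angle <-> shutter speed: the common way motion blur is matched
# between a camera and a compositing or 3D package.
def shutter_angle(shutter_speed_sec, fps):
    return 360.0 * shutter_speed_sec * fps

def shutter_speed(angle_deg, fps):
    return angle_deg / (360.0 * fps)

print(shutter_angle(1 / 48, 24))     # 180 degrees: the classic film look
print(shutter_angle(1 / 60, 29.97))  # ~180 degrees for NTSC-rate video
print(shutter_speed(180, 24))        # 1/48 s, about 0.0208 s
```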

Render in the highest-quality mode

By default, you should always render at highest quality, but it’s doubly important to render any effects and motion graphics in the highest-quality mode when you’re going to deliver compressed footage. Depending on the application, the highest-quality rendering mode activates a number of important features, including motion blur and antialiasing. This removes erroneous high-frequency data, softening the image where it should be soft and leaving more bits to be spent on the real image.

If you’re creating comps in a faster mode, don’t make any decisions based on test compressions made with those comps—the quality can be substantially worse than you’ll get from the final, full-quality render.

Avoid rapid cutting

At each edit, the change in the video content requires a new I-frame to account for the big change in the content, starting a new GOP. A cut every few seconds is generally okay at moderate or higher bitrates, but a Michael Bay–style, hyperkinetic montage with an edit every few frames can be a codec killer, and turn into a blurry or blocky mess.
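Here’s a back-of-the-envelope way to see why: assume an I-frame costs roughly five times as much as an average P- or B-frame (a made-up but plausible ratio), and that every cut forces a new one.

```python
# Rough share of the bitrate budget eaten by I-frames when every cut starts a new GOP.
def i_frame_share(frames_between_cuts, i_to_p_cost=5.0):
    total_cost = i_to_p_cost + (frames_between_cuts - 1)   # one I-frame plus the rest as P/B
    return i_to_p_cost / total_cost

print(f"{i_frame_share(300):.0%}")  # a cut every 10 s at 30 fps: ~2% of bits on I-frames
print(f"{i_frame_share(90):.0%}")   # a cut every 3 s: ~5%
print(f"{i_frame_share(6):.0%}")    # an edit every 6 frames: 50% of the bits go to I-frames
```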

Avoid cross-dissolves

A cross-fade is one of the most difficult kinds of content to compress, as it’s a blend of two different frames, each with their own motion, and with the intensity of each changing frame-to-frame. This keeps motion estimation from being as helpful as normal; some codecs can wind up using many I-frames during a cross-dissolve. It’s much more efficient to use hard cuts. Fades to/from black are only slightly easier to compress than a cross-fade, and should be avoided if possible. Wipes and pushes work quite well technically, but aren’t appropriate stylistically for many projects. A straight cut is often your best bet.

Use codec-appropriate fades

A fade to/from black or white may seem like the simplest thing in the world to encode, but the changing intensity is very hard on the motion estimation in traditional DCT codecs. Older codecs like MPEG-2 or MPEG-4 part 2 can require a bitrate spike or yield lower quality during these fades. VC-1 and H.264 have specific fade compensation modes that make this much easier, although implementations vary quite a bit in how well they handle fades.

No complex motion graphics

Complex motion graphics, where multiple onscreen elements change simultaneously, are extremely difficult to encode. Try to make each element as visually simple as possible, and move only things that you have to. Flat colors and smooth gradients compress more easily than textures. Two-dimensional motion is very easy for motion estimation; if you want to have complex moving patterns, take complex shapes and move them around on the x and y planes without touching the z axis. And always remember motion blur.

Don’t rotate

Codecs don’t rotate as part of motion estimation, so that animated radial blur can be rather problematic, as would be that tumbling 3D logo.

Use simple backgrounds

Keeping the background elements simple helps with compression. The two things to avoid in backgrounds are detail and motion. A static, smooth background won’t be a problem. A cool particle effects background will be. Motion blur can be your friend if you want to have background motion without eating a lot of bits. See Figure 5.6.

Figure 5.6 A complex background (5.6A) sucks bits away from the foreground, reducing quality of the important parts of the image compared to a simple background (5.6B).

image

No fine print

The sharp edges and details of small text are difficult to compress, especially if the video will be scaled down to smaller resolutions. Antialiasing can help preserve the legibility of text, but it can’t work miracles if the text is scaled down too much. For best results, use a nice thick sans serif font. See Figure 5.7.

Figure 5.7 A thick, simple font (5.7A) survives a lot better than a complex, thin one (5.7B) when encoded to the same data rate.

image

Watch that frame rate

Animators often work at whole-number frame rates, like 24 and 30. But NTSC-derived video formats really run at 24/1.001 (23.976) and 30/1.001 (29.97). Make sure that everyone’s using the exact same frame rate for assets, so no unfortunate mismatches happen.

PAL rates are the nice round numbers of 25 and 50, and so should not have a 1.001 correction applied.
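Here’s how quickly a “tiny” 24 versus 23.976 mismatch adds up—it’s simple arithmetic, but it catches people constantly:

```python
# Drift between true 24 fps and NTSC 24/1.001 fps over a program's running time.
def drift_seconds(duration_sec, nominal_fps=24.0):
    ntsc_fps = nominal_fps / 1.001
    frames = duration_sec * nominal_fps        # frames authored at the round-number rate
    return frames / ntsc_fps - duration_sec    # extra running time at the NTSC rate

for minutes in (1, 10, 90):
    print(f"{minutes:3d} min -> {drift_seconds(minutes * 60):.2f} s of drift")
# 1 min -> 0.06 s, 10 min -> 0.60 s, 90 min -> 5.40 s: plenty to knock audio out of sync
```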

Know and stick to your color space

This is last not because it’s least important, but because I want to make sure that anyone skimming finds it.

As I mentioned back when discussing sampling and quantization, there are a few variant formulas for deriving Cb and Cr when converting between RGB and Y′CbCr, codified in Recommendations 601 and 709 from the International Telecommunications Union (ITU). Specifically, there are three:

•  SMPTE C (NTSC [less Japan] Rec. 601)

•  EBU (Japan and PAL Rec. 601)

•  Rec. 709 (HD)

Mathematically, 709 is right between SMPTE C and EBU, making a conversion between them straightforward. And the luma calculations are the same either way. But you can get a slight but perceptible color shift if 601 is decoded as 709 or vice versa. A single application of that shift often isn’t very noticeable, but it’s enough to cause a visible seam in compositing.
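To see what that shift looks like numerically, here’s a sketch that encodes a color with the Rec. 601 matrix coefficients and decodes it with the Rec. 709 coefficients (normalized 0–1 values; this looks only at the matrix, leaving the SMPTE C/EBU primaries question aside):

```python
# Encode R'G'B' to Y'CbCr with one set of matrix coefficients, decode with the
# other, and see how far the color lands from where it started.
REC601 = (0.299, 0.114)     # (Kr, Kb)
REC709 = (0.2126, 0.0722)

def rgb_to_ycbcr(r, g, b, coeffs):
    kr, kb = coeffs
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr, coeffs):
    kr, kb = coeffs
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b

# A saturated green, encoded as 601 but (incorrectly) decoded as 709:
encoded = rgb_to_ycbcr(0.1, 0.8, 0.1, REC601)
print(ycbcr_to_rgb(*encoded, REC709))  # ~(0.05, 0.69, 0.08) instead of (0.1, 0.8, 0.1)
```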

More problematic is multigenerational processing that winds up applying the wrong color correction over and over. For example, a tool might export to 709 correctly but assume its source files are 601, so each round of processing introduces another 601-to-709 color shift. See Color Figure 16 for the effects of incorrect color space conversion.

Getting this right is normally the simple matter of making sure that each file is tagged correctly on import and that the export color mode is specified.

If a workflow makes it hard to track what particular assets are, appending a “_709” or a “_601” to the filename is a handy shorthand.
