CHAPTER 27
Media on Windows

Introduction

This chapter is about media playback on Windows, inclusive of but not limited to Windows Media.

I’m sure renaming it “Windows Media” from NetShow made sense back in the day, but it does make for some challenging textual gymnastics. For example, I can’t say “Windows Media APIs”—I have to say “Windows APIs for Media.” I offer my sincere apologies for any unpoetic cadences deriving thereof.

A History of Media Features in Windows

DOS

It all started with Microsoft DOS for the IBM PC in 1981. Command-line, character-based display, and no audio that deserves the name (more of a muffled beep at best).

But man, did it run business apps, and a lot of them, and it rapidly found a place on desks around the world.

Windows 1–2

Windows 1.0 launched back in 1985, without much market impact. The early versions of Windows ran on top of DOS, and were quite basic and not widely used due to high system requirements and not much software that took good advantage of the GUI. Things perked up a bit with Windows 2 as the first versions of the familiar Microsoft Office apps began to appear, but most applications were still DOS-only.

This didn’t slow the runaway success of DOS during this period, as other companies figured out how to make compatible hardware, driving rapid competition on features and price in the PC market, and rapid growth.

While Macs dominated publishing and other media-related tasks, DOS dominated business applications.

Windows 3.0/3.1

The year 1990 was when Windows 3.0 finally went mainstream, as a critical mass of native Windows apps started to appear, and as faster machines made Windows a great way to work with multiple DOS apps at once.

The first real media functionality arrived in 1991 in the Windows 3.0 MultiMedia Extensions version, preinstalled on “multimedia” PCs, which were required to have sound cards, CD-ROM drives, and at least 640 KB of RAM! This included the first version of Video for Windows, the AVI file format, and Media Player. The extensions were incorporated into Windows 3.1 in 1992, which was the real breakout version.

Windows 95/98/Me

The Windows 95, 98, and Me (Millennium Edition) versions straddled the DOS-based Windows 3.1 and the 32-bit NT-based worlds (more on NT in a moment), offering backward compatibility with old DOS applications and lower system requirements.

Windows 95 (from 1995, natch) is probably the oldest Windows version that would still seem reasonably like a computer to people born since its release. Some core features it introduced:

•  Filenames longer than eight characters

•  The Start menu

•  32-bit applications, raising the memory a program could easily use from 640 KB to 2 GB

OEM Service Release 1 at the end of 1996 added MPEG-1 playback, making it the highest quality format supported on both Mac and Windows out of the box, and thus a popular CD-ROM codec. Win95 also came with Internet Explorer 2, the first web browser included with an operating system. Windows 95 was the first version really targeted as a multimedia playback platform.

Windows 98 was fundamentally a refined version of the big changes in 95. It introduced DirectShow, which remains the primary API for media authoring and playback for most Windows apps. DirectShow’s improvements made media playback a lot smoother.

Windows 98 is probably the oldest version still seen on running machines. And it’s the oldest capable of anything like a modern media experience, being the oldest version of Windows that can run WMP 9. Windows machines of this age should all have decent sound, full-color video, and at least a CD-ROM drive. DVD-ROM support was added in Win98 Second Edition a year later, including a DVD player app (which required a separate MPEG-2 decoder). Windows Me (Millennium Edition) was the last non-NT-based version of Windows. While never as popular as 98 or XP, it introduced important media technologies, including WMP 7 and Movie Maker. It suffered from stability and other issues, and most users updated to Windows XP.

And thus ended the era of DOS.

NetShow

NetShow was the start of Microsoft’s effort to go beyond playback of local AVI files to streaming. Version 1.0 launched in 1996 as an audio-only technology, with video coming in 2.0 the next year. Things got really interesting after the acquisition of VXtreme, with core personnel and technology driving NetShow 3.0, which included the ASF file format and early parts of what became the MPEG-4 part 2 video codec.

While NetShow was technically interesting, RealNetworks still had the dominant market share for web streaming.

Two big differences in Microsoft’s approach were making media server technology much less expensive than RealNetworks’, and a big focus on getting software developers to incorporate NetShow into a wide array of products.

Windows NT

Windows NT was started back in 1988 to build a real 32-bit OS from the ground up without the DOS underpinnings of 3.0 through Me, and all recent versions of Windows from XP on are evolutions of NT.

NT introduced core modern features like:

•  Protected memory (one bad program couldn’t easily crash the whole computer)

•  Multiprocessor support

•  The NTFS file system and file names beyond eight characters

This made Windows NT a much more stable and powerful platform for workstations compared to Windows 95/98 and the Mac OS of the era. Windows NT 3.1 launched back in 1993, starting nearly a decade of NT- and DOS-based Windows coexistence. Windows 95 and NT shared the Win32 API, though, so most applications could run fine on either.

It was followed by NT 3.5 in 1994, notable for big speed improvements. Version 3.51 in 1995 was a bigger update than its name implies, particularly for media. This was the first version of NT to see much use for content creation, particularly 3D animation and some video editing products.

NT4 in 1996 was the breakout version, and was widely used in corporations as personal desktops. NT4 adopted the Windows 95 interface and bundled apps, and therefore offered the power and stability of NT with the usability and familiarity of the DOS-based Windows. It had some very important new media features:

•  Video drivers moved into the kernel for much better performance

•  The first version of DirectX (although without 3D)

A number of formerly Mac-only companies like Avid and Media 100 released high-end video editors for NT, leveraging the greater stability of the platform.

The Windows NT brand ended with NT 4, which only went as far as WMP 6.4.

Windows Media Launches

NetShow was renamed Windows Media in 1999, before the Windows 2000 and Me launches. A key feature was unification of the local-file Media Player with the NetShow player into the single “Windows Media Player,” an architecture and name that remain today.

Windows 2000

Windows 2000 (in 1999) was NT5 with a new brand, and is about the oldest NT still seeing significant use (mainly behind corporate firewalls on managed desktops, although my kids still have a few in their elementary school). The summer 2009 Forrester data shows Windows 2000 as only 1.2 percent of enterprise desktops.

Windows 2000 launched with WMP 6.4, but can install WMP 9 and Silverlight; it’s the oldest OS with Silverlight support.

Windows XP

XP was based on the business-focused 2000, but incorporated many consumer features from 98/Me to finally unify Microsoft’s product lineup and say goodbye to DOS. XP is the oldest version of Windows in broad use today; most projects can target it as a baseline.

Windows XP introduced critical innovations for HD media playback, including DXVA for hardware-accelerated video decode, GPU compositing for video playback, and 5.1 audio mixing. Windows had gone from being years behind the Mac in media technology to establishing a substantial lead: QuickTime wouldn’t get 5.1 audio until 2005 or hardware decode until 2009.

Windows XP originally shipped with “Windows Media Player XP” (really WMP 8). However, the essentially universal Service Pack 2 includes WMP 9, so that can be considered the baseline today. WMP 10 was only supported on XP, and XP is the oldest version that can support WMP 11. Everything older is limited to WMP 9 or earlier.

Windows XP got a big upgrade to Movie Maker with good DV camcorder support, which was widely used for consumer and corporate video production.

Windows Media Center was originally implemented as a different edition of XP in 2002, offering a “lean-back” remote control-based interface for watching TV. By the 2005 edition, this expanded to include live analog and digital TV tuners and playing content out to CE device extenders so that the PC doesn’t need to be near the TV.

Windows XP was the first iteration of Windows to have a 64-bit version, although little media software outside of 3D animation took advantage of it.

Windows Media 9 Series

The original streaming wars largely ended in 2003 with the release of Windows Media 9 Series. It had a broad swath of deep features, low operating costs, great audio/video quality, and was preinstalled on the dominant OS. RealNetworks soon transformed from a media technology platform company to a content licensing and distribution company.

Beyond Windows itself, device support was ramping up quickly, not only on Windows CE and Mobile, but in a variety of media players coming to compete with the iPod.

But while the experience was great on Windows and embedded in Internet Explorer, there wasn’t a clear plan for alternate browser or Mac support.

Ben Waggoner Joins Microsoft

Well, I like to think it was an important day in the company’s media history! More importantly, this is where this history turns from the observation of an outsider to an insider.

I was hired as a program manager on the HD DVD team focusing on the awesome new Windows Media Encoder Professional Edition.

Alas, HD DVD died and Professional Edition never shipped (although much of its technology made it into CineVision PSE, the VC-1 encoder SDK, and Expression Encoder).

Windows Vista

Windows Vista had a challenging development process and launch, but it contained a lot for content creators and consumers. Vista shipped with WMP 11, so all the advanced WMV and WMA codecs are fully supported. The Home Premium and Ultimate versions included DVD playback for both WMP and Media Center.

Vista introduced the Media Foundation API for media playback, although it was mainly used for protected media playback, with DirectShow remaining the default for other content. The Windows Imaging Component added much deeper still image support, and the HD Photo codec (being standardized as JPEG XR).

The Desktop Window Manager (DWM) in Vista adds system-wide GPU compositing. This saves CPU power, and also makes the user interface much more responsive under heavy load (like compressing video in the background). This is part of broader DirectX enhancements, which include pixel shader support; GPU processing using Direct3D requires Vista or higher.

Vista also added DXVA 2.0, enabling hardware accelerated decode of H.264 and MPEG-2, making software Blu-ray players possible. But it didn’t include out-of-the-box decoders for either, other than an MPEG-2 decoder available only for DVD playback in WMP.

On the encoding side, Vista introduced tuning for NUMA architectures; Opteron and Nehalem dual-socket systems run quite a bit faster in Vista than XP.

Vista has been inaccurately maligned for poor performance. Particularly for media authoring and playback, on good hardware it’s a definite upgrade from XP.

Windows Server 2008 is built on top of Vista, so it shares Vista’s media features. WMP and the other media libraries are included in the “Desktop Experience” feature, which must be installed for media playback or authoring. 2008 R2 is based on Windows 7.

Windows 7

Windows 7 was just finalized as this chapter was written, so check the book and Microsoft web sites for any new details that have popped up since release.

Windows 7 has lots of media enhancements, with a big expansion in media format support. It also has important improvements in hardware support to enable low-power machines with great media playback.

Win 7 includes Windows Media Player 12, with much more pervasive Media Foundation support. WMP 12 is only available as part of Win 7; WMP 11 remains for XP and Vista.

The biggest deal in Windows 7 may be a broad expansion in media format support beyond Windows Media and AVI for both playback and transcoding. Out of the box, it’ll handle most of the files in media libraries that formerly required third-party components (it’s a good sign when MKV—the Matroska Video format—is the most common request left). The new formats and codecs include the following:

•  MPEG-4

   ◦  MPEG-4, QuickTime, 3GPP, including AVCHD

   ◦  MPEG-4 part 2 (SP and ASP)

   ◦  H.264 (up to High, all levels)

   ◦  AAC, including LC and HE v1 and v2, and multichannel

•  “DivX/xvid” AVI

   ◦  MPEG-4 part 2 ASP

   ◦  MP3 and MS ADPCM

•  MPEG-2

   ◦  Program and Transport Streams, including HDV

   ◦  MPEG-2 up to 1080i/p

   ◦  H.264 up to High

   ◦  Dolby Digital, PCM, and MP2

These new formats are available in the mainstream Home Premium, Professional, Enterprise, and Ultimate editions. However, WMV remains the baseline in the developing-country targeted Starter and Home Basic editions.

Windows APIs for Media

Given how long Windows has been around, and the strong emphasis that Microsoft places on backward compatibility, it’s not surprising that there are a lot of different generations of the technology.

Video for Windows

Video for Windows (VfW) was the first popular API for playback and encoding in Windows, introduced way back in the 16-bit era for Windows 3.1. It was basic, but did the job. Its last significant update was for Windows NT.

Microsoft deprecated it back in 1996 in favor of DirectShow, but it’s still supported and used by a few apps, most notably VirtualDub.

DirectShow and Media Foundation codecs need to explicitly have VfW support turned on, which is why some codecs like Microsoft’s DV implementation don’t show up in VirtualDub or other VfW apps (if any others are left).

DirectShow

DirectShow has been the mainstream playback and non-WMV video encoding API in Windows since the mid-1990s. It was originally named ActiveMovie, but was added to the DirectX family of technologies and thus renamed DirectShow.

DirectShow was a lot more powerful than VfW, with support for new technologies like multichannel audio, and is also much more extensible. Plug-ins are implemented as DirectShow “filters” or as DMOs (DirectX Media Objects). These can be codecs, demuxers, video and audio effects, etc. These get combined into a “Filter Graph,” which is the flow of the content from demuxing to decoding and display or writing back to a file. The free GraphEdit utility (Figure 27.1) included with the Windows SDK visualizes this, and is handy for troubleshooting or designing a custom workflow.

Figure 27.1 GraphEdit, showing the filters used to play a Smooth Streaming .ismv using the demuxer installed by Expression Encoder 3.


The flexibility of DirectShow enabled it to handle pretty much all pre-Vista media playback, including DVD playback and Media Center.
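As a rough mental model of a Filter Graph, here is a small conceptual sketch in Python. It is not the DirectShow API: the `demux`, `decode`, and `run_graph` names and the tuple-based "packets" are all hypothetical, chosen only to show content flowing from one stage to the next.

```python
from typing import Callable, Iterable, List

# In this toy model a "filter" is just a function that transforms a stream.
Filter = Callable[[Iterable], Iterable]

def demux(packets):
    # Hypothetical demuxer: keep only the video payloads from a muxed stream.
    return (payload for kind, payload in packets if kind == "video")

def decode(samples):
    # Hypothetical decoder: stands in for a real codec by tagging each sample.
    return (f"decoded({s})" for s in samples)

def run_graph(source, filters: List[Filter]) -> list:
    # Connect the filters in order, then pull everything through to the end.
    stream = source
    for f in filters:
        stream = f(stream)
    return list(stream)

muxed = [("video", "v0"), ("audio", "a0"), ("video", "v1")]
frames = run_graph(muxed, [demux, decode])
```

Swapping one stage for another (say, a different decoder) without touching the rest of the chain is the kind of extensibility that filters provide.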

On the encoding side, DirectShow is mainly used for source decode and AVI writing. Creation of other files is normally handled by other libraries. DirectShow is also heavily used by video apps on Windows like Premiere Pro.

There’s a very big ecosystem of DirectShow filters to extend playback features in Windows. There are commercial ones for professional formats like MXF, and free decoders like ffdshow.

DXVA

DXVA is the DirectX Video Acceleration API, introduced with WMP 10 to enable hardware acceleration of video decoding. The initial DXVA implementation was MPEG-2 and WMV only.

DXVA 2.0, in Vista forward, made hardware-accelerated H.264 support possible. However, that was only exposed to specific apps that could pass it the demuxed H.264 bitstream, without WMP or other players getting access to full .mp4 playback.

Whether DXVA is used depends on the driver and the video card. Very old or cheap GPUs may have no or limited DXVA. And the amount of speedup can vary by model of card as well, with newer cards able to offload more decode from the CPU.

Tip: Avoiding .dll Hell

DLL hell is when multiple dynamic link libraries (.dll files) on Windows all claim to do the same thing, so Windows isn’t sure which to use.

This can happen with codecs, and it’s not uncommon for someone who downloads a lot of content to have multiple H.264 and MPEG-2 decoders installed from different places. But there’s no mechanism in DirectShow to make sure the optimum .dll for the bitstream is picked, so sometimes a buggy or incompatible decoder is used, which leads to some difficult-to-diagnose issues.

The first step to avoiding this is only installing decoders that you know and need. The worst problems are generally caused by “codec packs,” which are packages of many decoders, often pirated, to enable playback of pirated content. There are some well-tested codec packs like the Combined Community Codec Pack (CCCP), but most others are buggy to the point of malware, and in some cases are malware masquerading as a codec pack.

Windows 7 addresses this problem by always defaulting to any Microsoft or hardware-vendor decoder where available. So an app that wants to use a different software H.264 decoder would need to explicitly add it to its Filter Graph. WMP and MCE on Windows 7 always pick the internal or hardware decoders, barring registry key fiddling.
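The preference rule can be sketched as follows. This is only an illustration of the policy as described above, not how Windows implements it: the `Decoder` record, the source labels, and `pick_decoder` are hypothetical, and the sketch assumes Microsoft and hardware-vendor decoders are equally preferred, since the relative order between them isn't stated here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Decoder:
    name: str
    codec: str
    source: str  # illustrative labels: "microsoft", "hardware", "third_party"

# Assumption: both preferred sources rank the same; install order breaks ties.
PREFERRED_SOURCES = ("microsoft", "hardware")

def pick_decoder(installed: List[Decoder], codec: str) -> Optional[Decoder]:
    candidates = [d for d in installed if d.codec == codec]
    for d in candidates:
        if d.source in PREFERRED_SOURCES:
            return d  # a preferred decoder wins regardless of install order
    # No preferred decoder exists, so fall back to whatever else is installed.
    return candidates[0] if candidates else None

installed = [
    Decoder("PackDecoder", "h264", "third_party"),
    Decoder("InBoxDecoder", "h264", "microsoft"),
]
chosen = pick_decoder(installed, "h264")
```

Under this rule the third-party decoder is only reached when nothing preferred handles the codec, which is why an app that wants it has to ask for it explicitly.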

ffdshow

One popular filter for DirectShow is ffdshow, an implementation of the very flexible ffmpeg open-source player library as a DirectShow filter.

ffdshow enables WMP and DirectShow compatible compression tools to open a wide variety of other formats like Ogg, Snow, Dirac, H.264, other MPEG-4 variants, and others. Versions are installed with Sorenson Squeeze and Rhozet Carbon.

ffdshow is highly configurable through a highly frightening dialog—see Figure 27.2.

Figure 27.2 A small taste of one of dozens of panes of one of the three ffdshow configuration dialogs.


AVISynth

AVISynth, described in more detail in Chapter 6, is itself a DirectShow filter. So other DirectShow apps see an .avs file as a media file and can scrub, transcode, and so on, even though the file itself is just a text file of instructions for how to synthesize the video.

AVISynth is incredibly powerful, and broadly used in the preproduction and compression industries. That it’s Windows only is another reason why complex and high-quality compression workflows are mainly on Windows.

Haali Media Splitter

Haali Media Splitter is a free demuxer that can extract the elementary streams from a wide variety of source formats not supported on older versions of Windows like MPEG-2 transport and program streams, MPEG-4, and Ogg. It also includes an excellent AVI demuxer that sometimes works on very large AVI files that the built-in XP/Vista demuxer has trouble with.

It’s also the best-known demuxer for the Matroska format (MKV), often used to deliver rich subtitles. MKV isn’t natively supported in any version of Windows. There are other demuxers for MKV, however, including ffdshow.

CoreAVC

CoreAVC is a popular DirectShow H.264 decoder from CoreCodec often used to add H.264 support to older versions of Windows (and is widely available on other platforms). It supports Baseline-High 4:2:0, but not 4:2:2 or 4:4:4.

It was a famously fast software-only decoder in its 1.0 version, and has tons of decoding configuration options. It was one of the first decoders to take advantage of GPU pixel shaders for video playback, using Nvidia’s CUDA technology. Since all CUDA-compatible GPUs also include DXVA 2.0 support, on Windows it is probably most useful for XP users.

Media Foundation

Media Foundation (MF) is the new Windows API for media playback and authoring. Just like DirectShow didn’t kill VfW, MF doesn’t replace DirectShow as much as supplement it. They can share codecs and other technologies, with MF able to use DirectShow DMOs. The native MF equivalent is the Media Foundation Transform (MFT).

MF was introduced in Vista, but was mainly used for protected media playback; DirectShow continued to be used for most WMP playback. MF also contains the functionality of the Windows Media Format SDK for playback.

Windows 7 greatly expands the use of MF, with native implementations of many codecs and formats.

One important addition in Media Foundation is much lower latency and controllable latency for streaming. Traditionally, WMP and other players using the Format SDK had a default five-second buffer with no easy way to configure a lower value. In MF, this is programmatic, all the way down to 0. Silverlight is based on Media Foundation, and this is how it’s able to deliver lower latency than WMP.

Another great feature of Media Foundation in Win 7 is an automatic bob deinterlacer for all interlaced codecs. So you get 60p out of 30i files.
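To show why bob deinterlacing doubles the frame count, here is a deliberately simplified sketch. It assumes top-field-first content and uses plain line doubling; a real deinterlacer would typically interpolate the missing lines and account for the one-line vertical offset between fields. The function name and the list-of-rows frame representation are mine, not Media Foundation’s.

```python
def bob_deinterlace(frame):
    """Turn one interlaced frame (a list of rows) into two progressive frames."""
    top = frame[0::2]     # rows 0, 2, 4, ... make up the top field
    bottom = frame[1::2]  # rows 1, 3, 5, ... make up the bottom field

    def line_double(field):
        # Rebuild full height by repeating every field row once.
        out = []
        for row in field:
            out.append(row)
            out.append(row)
        return out

    # Assumption: top field first, so the top field is the earlier picture.
    return [line_double(top), line_double(bottom)]

interlaced = ["t0", "b0", "t1", "b1"]
progressive = bob_deinterlace(interlaced)
```

Each input frame yields two output frames, one per field, so 30 interlaced frames per second become 60 progressive ones.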

Protected Media Path

The Protected Media Path (PMP) is an MF feature added in Vista that protects specifically flagged DRM content from screen captures or unencrypted (non-HDCP) output. This is a contractual requirement from Hollywood for top-tier HD content, with new output protection requirements phasing in every year for the next few years.

There was quite a kerfuffle when Vista was launched about how this feature was going to slow system performance or do other horrible things. However, PMP is only used when playing back DRM-protected media flagged to require it, and has very little overhead (generally any GPU with HDMI output has DXVA 2.0). This urban myth remains oft-quoted but little-tested on sites like Slashdot. PMP will never be applied to content you create unless DRM is applied to that content, which generally means that content won’t be available for any platform without output protection.

Hardware-accelerated transcoding

Windows 7’s MF implementation adds a new API for hardware acceleration of video decoding and encoding. This is unofficially called SHED, for Secure Hardware Encoding Decoders (“Secure” refers to its ability to transcode and display while preventing capture of the uncompressed intermediate frames). This goes beyond the DXVA model of having decoding circuitry on just the GPU and lets it be implemented as a dedicated chip or part of another system component.

By combining hardware decode, preprocessing, and encode in a single function, transcoding doesn’t have to stress the main CPU or memory. This makes converting content for devices much faster. However, the quality of the initial SHED implementations isn’t what a hand-crafted encode on an 8-core monster workstation could do. But it’s a huge improvement in “good enough, fast enough” particularly on netbooks and other less powerful systems.

SHED products have been announced by Broadcom, ViXs, Toshiba Semiconductor, and Quartics. SHED could also be delivered as an add-on PCI Express card. Exposed as an MFT, this hardware transcoding can be easily called by any Media Foundation application.

Windows Media Format SDK

The Windows Media Format SDK is used to both encode and play back Windows Media content, and is covered in the Windows Media chapter. The most notable recent change is the quality and performance improvements in the Windows 7 encoder over FSDK 11 for XP and Vista.

Most professional WMV encoding products use the VC-1 Encoder SDK, which is a static library added directly to an application, and doesn’t have any dependency on the Windows version.

Major Media Players on Windows

Windows Media Player

Windows Media Player is implemented on top of the available media APIs, using them for all presentation. It’s more than just a player, of course, with library management features for video and audio content.

WMP can also share content to compatible devices like the Xbox 360 and other copies of WMP.

The most commonly seen versions are 9 (XPSP2 default), 11 (Vista, and recommended update for XP), and 12 (Windows 7).

Generally, anyone who was able to upgrade to 10 has long since updated to 11.

Zune Media Player

The Zune desktop software was originally a slightly modified version of WMP11, but it became a distinct app in its 2.0 version. Like WMP, it offers media library management. But it’s got some additional features of note.

Only the Zune client can sync to Zune devices, and it can be used as a media player app without owning a Zune. The Zune client provides H.264 and MPEG-4 part 2 decoding for XP and Vista, and can share MPEG-4 files to the Xbox 360, which WMP11 can’t. So it can be very useful for XP/Vista users.

Zune is also a good podcast client (one reason for that MPEG-4 playback).

VLC

VLC is the VideoLan Client, an open source media player originally created by French college students. It’s available for pretty much every other platform you could imagine as well as Windows, possibly because all the decoders are self-contained.

Thus VLC can’t suffer from dll hell. It’s a great thing to have on a production machine where you don’t want to install a lot of extra DMOs or MFTs.

It does have a nasty habit of taking over a whole lot of file type associations when installed; be careful to uncheck any you want to watch in another player.

Silverlight (Is Not a Media Player)

While Silverlight is certainly used to play media on Windows, it’s not a media player itself. You can’t double-click or drag a media file to “Silverlight” to play because it’s not an application itself, but a runtime that hosts applications.

So, installing Silverlight has no impact on how media gets played outside of Silverlight applications, be they in-browser or out-of-browser. One can certainly build an application that uses Silverlight for media playback, but the desktop icon is of that application, not Silverlight itself.

Windows Media Center

Windows Media Center (generally abbreviated “MCE” from its original name of “Windows XP Media Center Edition”) is another media player app, designed for a “10-foot” experience, used with a remote from a couch. It supports the same media playback functions as WMP, but with a UI appropriate for remote control navigation.

MCE also can record live video using available analog or digital capture hardware. This includes MPEG-2 bitstreams from ATSC, DVB, and unencrypted cable (“ClearQAM”—you don’t even want to know what QAM stands for). Special CableCard-approved PCs can also view and record (but not transcode, due to contractual restrictions) encrypted premium content via a CableCard.

Media Center Extenders

A Media Center Extender is a remote device that can play content from MCE. This includes the MCE interface; an extender plugged into a TV will offer largely the same experience and navigation as if the MCE system was plugged into the display directly. This enables the MCE machine (hopefully with a lot of storage) to be the hub for recording and storage, with multiple cheaper devices accessing the same library.

The Xbox 360 works as an extender, as do specific devices from other vendors. They’re described in more detail in Chapter 25.

Media Formats on Windows

AVI

AVI (Audio Video Interleave) was the original media format from Windows 3.0, and continues to be widely used long after Microsoft tried to deprecate it in favor of ASF and the largely forgotten Advanced Authoring Format—AAF.

The strength and weakness of AVI is its simplicity. Compared to modern formats like MPEG-4, AVI doesn’t do a whole lot more than store a video and an audio track. It doesn’t even support variable frame rates; each AVI file has an explicit frame rate, and the only way to include content running at a lower frame rate is just to have “dropped frames,” leaving each encoded frame with a duration that is a whole multiple of the file’s frame duration.
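The dropped-frame arithmetic is easy to get wrong, so here is a small worked sketch. The `frame_durations` helper is hypothetical and only models the timing, not the AVI file structure; it assumes the content rate divides evenly into the file rate.

```python
from fractions import Fraction

def frame_durations(file_fps: int, content_fps: int, n_frames: int):
    """Return (durations, dropped_per_frame) for content stored in a fixed-rate file."""
    if file_fps % content_fps != 0:
        # With a fixed rate and whole dropped frames, only even divisors fit.
        raise ValueError("content rate must divide the file's frame rate")
    slots = file_fps // content_fps      # file frame slots used per real frame
    slot_duration = Fraction(1, file_fps)
    durations = [slots * slot_duration for _ in range(n_frames)]
    return durations, slots - 1          # every real frame is followed by drops

durations, dropped = frame_durations(file_fps=30, content_fps=10, n_frames=3)
```

So 10 fps content in a 30 fps AVI gets two dropped frames after each encoded frame, and something like 24 fps content in a 30 fps file has no clean representation this way.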

But that simplicity has meant that third parties could easily develop reasonably complete support, far more easily than a similarly complete parser for MPEG-4 or ASF. This was the genesis of DivX/Xvid, and why AVI files are broadly supported across platforms and players. However, those different implementations can vary in the details; stuff like VBR audio doesn’t always work reliably.

But once you want to do interesting things like multiple audio tracks or captioning, AVI can draw up short.

AVI Versions

The original implementation of AVI was limited to 1 GB in size, which seemed laughably high back when it was created: CD-ROM was limited to 650 MB and external hard drives were smaller yet.

That quickly became a big limitation for content authoring and capture applications, so a group of companies called OpenDML, led by Matrox, created an extension to allow arbitrarily sized AVI files. These are generally called “OpenDML,” “AVI 2.0,” or “New AVI.”

It’s been around since 1995, and so there’s no reason to use the old AVI type anymore.

In-Box AVI Video Codecs of Note

As AVI has been deprecated for years, there aren’t many built-in encoders in Win 7, and they’re largely ancient ones.

Generally, AVI should be used for content authoring, not content delivery. “DivX/Xvid” content with MPEG-4 part 2 in AVI is the main remaining use of AVI for content these days.

Uncompressed

There are quite a few different uncompressed video “codecs” in AVI and the media pipelines in Windows. They’re obviously not good for delivery, but are quite handy for authoring. Each has its own four character code (4CC), and often there are different 4CCs indicating the same sampling but different arrangements of samples. That’s largely transparent, handled under the hood. In general, codecs internally use the “planar” variants, where the image is stored as separate series of Y′, Cb, and Cr samples, instead of the “packed” formats where Y′, Cb, and Cr samples for each pixel are stored together. It rarely matters for compatibility, but you can get a tiny performance improvement by using the planar versions as source for encoding.
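A four character code is just four ASCII bytes that are commonly handled as a single 32-bit value. The sketch below assumes the usual little-endian convention, where the first character lands in the lowest byte; the helper names are my own.

```python
import struct

def fourcc_to_int(code: str) -> int:
    data = code.encode("ascii")
    if len(data) != 4:
        raise ValueError("a 4CC must be exactly four ASCII characters")
    # "<I" reads the four bytes as one little-endian unsigned 32-bit integer.
    return struct.unpack("<I", data)[0]

def int_to_fourcc(value: int) -> str:
    # Reverse the packing to recover the readable code.
    return struct.pack("<I", value).decode("ascii")

yv12 = fourcc_to_int("YV12")
```

Seen in a hex dump or debugger, the integer form reads backwards relative to the text, which is worth knowing when tracking down which format a tool actually negotiated.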

But if you’re using a tool like VirtualDub, it’s good to make sure that you’re making the color space conversions you want, for optimum speed and quality, and not forcing an unneeded trip through RGB.

RGB

This is good old 8-bit per channel RGB. There are also 15-bit (5-bit per channel) and 16-bit (6-bit G and 5-bit R and B) modes, but those aren’t used by any modern codec.
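For the curious, this is roughly how 8-bit channels get squeezed into the 15-bit and 16-bit modes. The bit order used here (red in the high bits, blue in the low bits) is the common layout but is an assumption on my part, and the function names are illustrative.

```python
def pack_rgb565(r: int, g: int, b: int) -> int:
    # Keep the top 5 bits of red and blue and the top 6 bits of green.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def pack_rgb555(r: int, g: int, b: int) -> int:
    # Keep the top 5 bits of each channel; the 16th bit is left unused.
    return ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)

white_565 = pack_rgb565(255, 255, 255)
white_555 = pack_rgb555(255, 255, 255)
```

The low bits of every channel are simply thrown away, which is why these modes band so badly on gradients.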

RGBA

This is 8-bit RGB with an 8-bit alpha channel (and so 32-bit per pixel). It’s only needed if you actually have an alpha channel.

YV12/NV12/IYUV (4:2:0)

Most delivery codecs are 4:2:0, so that’s what you want to deliver to. YV12 is the planar version.

Note that the “Intel IYUV” implementation that ships with Windows is limited to standard-def frame sizes, and fails at HD frame sizes. In the case of VirtualDub, it’s better to use “Uncompressed” and specify YV12. Most AVISynth scripts end with either ConvertToYV12 for progressive or ConvertToYUY2 for interlaced.
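To make the planar layout concrete, here is a sketch of where each plane of a YV12 frame sits in memory. It assumes tightly packed rows with no stride padding, which real buffers don’t always guarantee, and the `yv12_layout` helper is hypothetical.

```python
def yv12_layout(width: int, height: int) -> dict:
    """Return (offset, size) in bytes for each plane of an unpadded YV12 frame."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 needs even dimensions")
    y_size = width * height                 # one luma sample per pixel
    c_size = (width // 2) * (height // 2)   # chroma is halved in both directions
    return {
        "Y": (0, y_size),
        "V": (y_size, c_size),              # in YV12 the V (Cr) plane comes first
        "U": (y_size + c_size, c_size),
        "total": y_size + 2 * c_size,
    }

hd = yv12_layout(1920, 1080)
```

A 1080p frame comes to 3,110,400 bytes, half the size of the same frame in 24-bit RGB, which is part of why an unneeded trip through RGB costs speed.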

UYVY/YUY2/YV16/HDYC (4:2:2)

These are the various 4:2:2 codecs. YUY2 is the packed version most used by codecs.

For historical reasons, a lot of video hardware wants to get video in 4:2:2 even though most codecs are 4:2:0. Of course, 4:2:2 is very important when doing interlaced content, so that chroma samples don’t get subsampled across fields.

V210 (4:2:2 10-bit)

V210 is a 10-bit 4:2:2 codec. Mainly used for authoring, of course. Authoring codecs like Cineform use V210 with 10-bit 4:2:2 sources.

Microsoft Video 1

Microsoft Video 1 is the AVI equivalent to the QuickTime “Video” codec; an ancient, horrible, 16-bit pre-Cinepak codec without any rate control. I mention it only so you don’t use it. I’m kind of amazed that Windows 7 still includes the encoder for it.

Microsoft RLE

This one’s an ancient Run-Length Encoded codec, not useful for anything.

Cinepak

Cinepak was the first decent CD-ROM codec, but there’s been no reason to use it since the 1990s. Still, it’s a lot better than Video 1. See Figure 27.3.

Figure 27.3 The Cinepak configuration dialog has to be the oldest dialog box in Windows 7; Radius hasn’t existed for over a decade. I actually used the Black & White mode to good effect on several projects.


DV25

Windows has included a DV25 decoder since DirectShow was introduced. It supports .dv and .mov wrappers as well as .avi for easy interoperability.

In-Box Audio Codecs of Note

PCM

Uncompressed is obviously your go-to codec for authoring purposes.

VfW was stereo only and only went up to 48 KHz 16-bit. DirectShow and Media Foundation apps can author multichannel 96 KHz up to 32-bit float and so on. Just don’t try to play them in VirtualDub.
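The jump in data rate between those two worlds is bigger than it sounds. Here is the arithmetic as a tiny sketch; the six-channel layout is just my example of "multichannel," not something the APIs require.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


vfw_ceiling = pcm_bytes_per_second(48000, 16, 2)      # stereo, 48 kHz, 16-bit
big_master = pcm_bytes_per_second(96000, 32, 6)       # assumed 5.1, 96 kHz, 32-bit float
print(vfw_ceiling, big_master, big_master / vfw_ceiling)
```

Under those assumptions the multichannel master needs twelve times the bandwidth of the best VfW could do.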

Transporting multichannel audio in an audio-only AVI file can be more compatible than in WAV.

MP3

Windows has included an MP3 decoder for AVI since well before Windows 2000. It’s the best in-box audio codec for compression efficiency.

The MP3 encoder in older versions of Windows was quite limited, and only went up to 56 Kbps (not nearly enough for transparency). The decoder has always been full-featured, though, and so third-party encoders like those from DivX used MP3 as the default audio codec.

a-Law/u-Law

These are ancient 8 KHz telephony codecs. Don’t use.

IMA/Microsoft ADPCM

These are old 4:1 compressed CD-ROM audio codecs. Revolutionary in the day (we could do 16-bit 44.1 KHz on CD-ROM!), but a lot more lossy than even MP3. Don’t use.

Third-party AVI Codecs of Note

Since Microsoft turned its attention to Windows Media, AVI codec development has largely been done by third parties.

Cineform

Cineform is a great, visually lossless wavelet-based authoring codec. I use it heavily, as it offers a master as good as lossless for recompression. It also has great integration with Adobe Premiere Pro, leveraging wavelet subbands for high performance scrubbing and effects previews.

You’ll have to pay a fee to use the encoders, but there are free decoders for download. As a bonus, the Mac decoder can read AVI, so a Cineform AVI file is a great mastering format for cross-platform workflows.

The encoders are fast enough for real-time HD capture on a fast computer.

Huffyuv

Huffyuv is an open-source AVI encoder/decoder. It’s lossless RGB or 4:2:2, using Huffman coding. Unlike many other lossless codecs, it’s quite fast to encode and decode, with real-time capture quite possible.
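If Huffman coding is new to you, the idea is that common values get short codes and rare values get long ones. The following is a generic Python sketch of building Huffman code lengths. It is not Huffyuv's actual bitstream or tables, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data):
    """Return {byte value: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (total count, unique tiebreaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encoded_bits(data):
    """Total payload bits if data were coded with those lengths (tables excluded)."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[value] for value in data)


sample = b"aaaabbc"
print(huffman_code_lengths(sample), encoded_bits(sample), len(sample) * 8)
```

Nothing is thrown away, so decoding gets back the exact source bytes; the savings come entirely from how skewed the value distribution is.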

Its biggest limitation is the lack of a proper 4:2:0 mode…

Lagarith

…which was nicely addressed by the Huffyuv-based Lagarith. Lagarith added arithmetic coding for better compression efficiency, and a native 4:2:0 mode.

While arithmetic coding is a lot slower than Huffman encoding, Lagarith is multithreaded for encode and decode, so real-world performance can still be pretty good, although real-time capture can be a stretch.

When I need really mathematically lossless 4:2:0, Lagarith has been my codec of choice for years.

DivX/Xvid

As described in detail in Chapter 12, DivX was born of a hack to enable the MS MPEG-4 ASF codec to work in AVI files. This led to the whole DivX ecosystem, and the Xvid open-source implementation thereof.

MPEG-4 part 2 isn’t supported in Windows out of the box before Win 7, but decoders are readily available.

Hardware-specific codecs

Vendors like Blackmagic Design and AJA provide their own AVI encoders/decoders for their capture cards. These are generally free downloads for easy interoperability.

WAV

The .WAV file format is the audio-only equivalent to AVI. Audio-only AVI files are certainly possible, and are more reliably supported for multichannel audio.

WAV uses the same set of codecs available for AVI on the system. In practice, that means WAV is mainly PCM, since the only good in-box audio codec for WAV is MP3, in which case you might as well make a .mp3.

Multichannel audio support in WAV can be fragile, with no support in VfW products or pre-XP versions of Windows.
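Plain stereo PCM is the safe case, and it's simple enough that Python's standard `wave` module can write it. This is a minimal sketch for making a test file in memory; the function name and the tone parameters are hypothetical, and it deliberately sticks to 16-bit stereo rather than anything multichannel.

```python
import io
import math
import struct
import wave


def make_test_wav(n_frames=480, sample_rate=48000, channels=2, freq_hz=440.0):
    """Return the bytes of a 16-bit PCM WAV holding a short sine tone."""
    frames = bytearray()
    for i in range(n_frames):
        sample = int(round(16383 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        # Same sample on every channel, little-endian signed 16-bit.
        frames += struct.pack("<h", sample) * channels
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


data = make_test_wav()
with wave.open(io.BytesIO(data), "rb") as reader:
    print(reader.getnchannels(), reader.getframerate(), reader.getnframes())
```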

Windows Media

The primary media format for Windows has long been, of course, Windows Media. Because at least WMP 9 is installed on nearly all Windows machines, that makes WMV an easy choice for content that will offer good quality and efficiency with high compatibility across Windows computers. That won’t change until Windows 7 is the baseline version, as XP is today. Windows Media is covered in depth in Chapter 16.

Windows 7 finally has the integrated bob deinterlacer on for interlaced VC-1 files, making it possible for viewers to watch interlaced content without having to tell them to set a registry key.
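A bob deinterlacer turns each field into its own full-height frame, so 30i becomes 60p. Here is a toy version in Python operating on a frame stored as a list of rows. It assumes the top field lives on the even lines and fills missing lines by averaging their neighbors; it says nothing about how the Windows 7 implementation actually works.

```python
def bob(frame):
    """Return (frame from even lines, frame from odd lines) for one interlaced frame."""
    height = len(frame)
    if height < 2:
        raise ValueError("need at least two lines")

    def from_field(parity):
        out = []
        for y in range(height):
            if y % 2 == parity:
                out.append(frame[y][:])          # line belongs to this field: keep it
            else:
                up = y - 1 if y - 1 >= 0 else y + 1
                down = y + 1 if y + 1 < height else y - 1
                # Missing line: rounded average of the nearest lines of this field.
                out.append([(a + b + 1) // 2 for a, b in zip(frame[up], frame[down])])
        return out

    return from_field(0), from_field(1)


top, bottom = bob([[10], [20], [30], [40]])
print(top, bottom)
```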

DVR-MS

DVR-MS is the internal recording format of Media Center on XP and Vista. It’s mainly used to store the native MPEG-2 and AC-3 bitstreams of an ATSC or other unencrypted digital broadcast. WMP and the Zune client can automatically transcode from DVR-MS to portable devices.

It wasn’t easy to use outside of Media Center before Vista, but it’s supported in DirectShow and Media Foundation apps on Vista and Windows 7.

MPEG-1

MPEG-1 playback was added way back in a service release for Windows 95. It’s a fine implementation, and is the easiest-to-author format still compatible with ancient machines, as the Windows Media encoders from that era aren’t available in recent Windows versions. The pre-WMP 9 versions didn’t handle aspect ratio correction for non-square pixels, so encode as square pixels to make sure the video displays the same on all platforms.

MPEG-2

Windows didn’t include its own general purpose MPEG-2 decoder before Windows 7. The Home Premium and Ultimate editions of Vista included an MPEG-2 decoder for DVD playback, but it wasn’t available to other apps, or even for WMP to play back MPEG-2 files.

PCs sold with DVD drives from Win98 on would include a DirectShow MPEG-2 decoder. There have been many vendors and versions of these, but most allowed the decoder to be used in any app, and supported file-based playback as well.

Windows 7 has built-in MPEG-2 program and transport stream demuxers, including HDV, and a good MPEG-2 decoder. The Win 7 bob deinterlacer is probably most useful here, given all the 30i MPEG-2 in the wild.

MPEG-4

While Windows Media Player 7 included an ISO MPEG-4 Part 2 compliant Simple Profile video decoder, it didn’t support the MPEG-4 file format or AAC audio.

Windows 7 is the first version with out-of-box support for .mp4. Making up for lost time, it’s extremely complete, including a wide range of features:

•  Part 2 Simple and Advanced Profiles

•  H.264 through High Profile, including interlaced

•  No Level restriction (although it’ll be limited by what the decoder can do; most Windows 7 systems will have DXVA decoding)

•  AAC including multichannel and HE v1 and v2

•  AVCHD

•  In transport stream, .mp4, .mov, 3GP, and AVI files

Tutorial: Preprocessed AVI Intermediate

All of our WMV-related tutorials apply to Windows, and for Windows 7, most of the other formats do as well. So I’m going to show a workflow demo of using VirtualDub to make a lossless preprocessed intermediate.

Scenario

We are writing a book about video compression, and need to make a variety of tutorials for mobile device playback. We have a nice rights-cleared 720 × 480i30 4:3 anamorphic source file we want to turn into a preprocessed 640 × 480 square-pixel progressive AVI file for use in those tutorials.

The source doesn’t have a soundtrack, so we need to add some audio as well.

Three Questions

What Is My Content?

The source file is an uncompressed 4:2:2 AVI without audio. It’s made up of a variety of different source clips, which have different horizontal blanking.

So we’re going to insert a rights-cleared audio file as well.

Who Is My Audience?

Me!

What Are My Communication Goals?

I want an interesting-to-encode 640 × 480 square pixel progressive source file that’s easy to use in any of my Windows-based compression tools. It should be lossless YV12 so that all encoders will get the exact same source samples, hopefully eliminating any color space conversions that would keep tests from being apples-to-apples.

And it should also have some interesting-to-encode audio.

Tech Specs

Pretty straightforward. I need to take my source, crop out any blanking around the edges, deinterlace it nicely, and scale it to 640 × 480. We need to add that audio source next. We then will save it to the Lagarith codec, in its lossless YV12, with the audio.
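The 640 × 480 target falls straight out of the aspect ratio math. Here it is as a sketch, assuming the full 720-pixel width is treated as the 4:3 picture (some workflows use the 704-pixel active width instead, which gives a slightly different pixel shape). The function names are hypothetical.

```python
from fractions import Fraction


def pixel_aspect_ratio(width, height, display_aspect):
    """Width-to-height ratio of one pixel, given frame size and display aspect."""
    return display_aspect / Fraction(width, height)


def square_pixel_width(height, display_aspect):
    """Width needed to show the same picture with square pixels at this height."""
    return int(height * display_aspect)


dar = Fraction(4, 3)
print(pixel_aspect_ratio(720, 480, dar), square_pixel_width(480, dar))
```

So the source pixels are narrower than they are tall, and squashing 720 columns down to 640 is what makes them square.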

VirtualDub

Because we’re writing an overdue compression book at 4:43 AM, this will need to be quick and hopefully not dirty. This is a job for VirtualDub.

We have the current version installed, with the old and trusty Smart Deinterlace 2.8 beta 1 deinterlacer.

Opening and scrubbing through our source, we see it’s quite interlaced. We open that up in Smart Deinterlacer with default settings, and scrub through the clip. There are definitely some places where motion is leaking through without being deinterlaced. This can be a little tricky, as the scenes vary so much, but we tweak knobs until we find something that works reasonably well everywhere. Thank goodness for Smart Deinterlace’s “show motion areas” which only shows the parts of the image flagged as motion. Then you can tweak settings until it’s not letting interlaced stuff through, but leaving the static parts of the frame alone. See Figure 27.4.
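The general idea of only touching the moving parts can be sketched in a few lines. To be clear, this is not Smart Deinterlace's algorithm; it's a toy comb detector of my own construction, with a hypothetical threshold parameter, that flags a pixel when it disagrees with the lines above and below in the same direction, and interpolates only those pixels.

```python
def deinterlace_adaptive(frame, threshold=20):
    """Return (deinterlaced frame, motion mask) for a frame stored as a list of rows."""
    out = [row[:] for row in frame]
    motion = [[False] * len(row) for row in frame]
    for y in range(1, len(frame) - 1, 2):        # visit the odd lines
        above, current, below = frame[y - 1], frame[y], frame[y + 1]
        for x, value in enumerate(current):
            d_up = value - above[x]
            d_down = value - below[x]
            # Combing: the pixel sticks out from both neighbors the same way.
            if d_up * d_down > threshold * threshold:
                out[y][x] = (above[x] + below[x] + 1) // 2
                motion[y][x] = True
    return out, motion


combed = [[100, 100], [200, 200], [100, 100], [200, 200], [100, 100]]
fixed, mask = deinterlace_adaptive(combed)
print(fixed, mask)
```

The mask plays the same role as a "show motion areas" view: raise the threshold and less gets flagged, lower it and static detail starts getting softened for no reason.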

Figure 27.4 (A) Our final deinterlacing settings are doing a good job of discriminating the moving ping-pong elf-man from the background. (B) Quite a decent job, even with this really high-motion frame. The final settings.


Next up is cropping and scaling. The easiest way to do it in VDub is to set the output frame size first, and then apply the Crop settings to that. Scaling is easy: 640 × 480. We’ll go with Lanczos; detail can be a good thing for codec testing.

As it’s a mix of sources, we have lots of sections without any horizontal blanking at all, but one sequence with very wide blanking on the left side. Cropping out that much doesn’t seem to obviously distort any other scenes, so we’ll call that good. See Figure 27.5.
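If you want a number instead of an eyeball test, the stretch from cropping and then scaling back to a fixed output width is simple to estimate. In this sketch the crop values are hypothetical stand-ins, not the ones used here, and the optional 704-pixel reference width is an assumption about where the true 4:3 picture sits.

```python
def crop_stretch(source_width, crop_left, crop_right, reference_width=None):
    """Horizontal stretch factor after cropping and scaling to a fixed width.

    1.0 means shapes are unchanged relative to the reference width.
    """
    if reference_width is None:
        reference_width = source_width
    cropped = source_width - crop_left - crop_right
    if cropped <= 0:
        raise ValueError("crop removes the whole picture")
    return reference_width / cropped


print(crop_stretch(720, 8, 8))     # standard 8/8 crop
print(crop_stretch(720, 22, 8))    # hypothetical wider left crop
```

A few percent of horizontal stretch is hard to spot on natural images, which matches the judgment call of letting one aggressive crop apply to every scene.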

Figure 27.5 (A) While we can crop from the Resize dialog, the mode from the Filter dialog is easier. (B) Our ping-pong shot has a little subtle blanking on the edges, but it’d be easily masked with a standard 8/8 left/right crop. (C) Our cheerleaders are another matter entirely, and require almost 3x the standard crop. (D) I really quite like this cropping dialog. I get my scrub control, and can switch between visual and numeric entry modes.


Now encoding settings (Figure 27.6). Lagarith at YV12, easy enough. We’re in a hurry, so multithreaded, of course. Null Frames are just frames dropped because they’re exact repeats. We’ll leave it on, but there are probably no frames like that here.

Figure 27.6 Our Lagarith settings.


The next step is easy to forget: make sure that your output color space is set to what you want; it defaults to RGB (Figure 27.7). We want YV12. That gives us a 4:2:2 to 4:2:0 conversion, so later encodes won’t need any color space conversion for faster processing and higher fidelity.
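Going from 4:2:2 to 4:2:0 only affects the chroma planes: each keeps its width and loses half its height. The simplest possible version just averages each vertical pair of chroma rows, as sketched below. I'm assuming progressive content (which is what we have after deinterlacing), and real converters, VirtualDub's included, may use a better filter and pay attention to chroma siting.

```python
def chroma_422_to_420(plane):
    """Halve the height of one chroma plane by averaging vertical pairs of rows."""
    if len(plane) % 2:
        raise ValueError("need an even number of chroma rows")
    return [
        [(a + b + 1) // 2 for a, b in zip(top, bottom)]   # rounded average
        for top, bottom in zip(plane[0::2], plane[1::2])
    ]


cb_422 = [[100, 200], [102, 204], [50, 60], [70, 80]]
print(chroma_422_to_420(cb_422))
```

Doing this once, up front, is the whole point: every encoder downstream sees identical 4:2:0 samples instead of each running its own conversion.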

Adding audio is trivial: Audio > Audio from a rights-cleared music file. Our audio is longer than video; VDub just cuts it off after the last frame of video. Not an elegant fade-out, but that doesn’t matter for codec testing.

And that’s it. It’s just about a real-time export on a decent machine.

Figure 27.7 The #1 VirtualDub mistake is forgetting to set your output color space correctly. It always defaults to RGB.

