CHAPTER 28
QuickTime and Mac OS

Introduction to Mac

Although I’ve worked for Microsoft for nearly four years now, I spent the first couple decades of my computing life as primarily a Mac user and fan.

Apple nailed a GUI usable for creative apps long before anyone else, first winning the print market with PageMaker (on top of QuickDraw) and then video with Avid, Media 100, and Premiere (on top of QuickTime). So many of us who grew up in the early days of the industry grew up with the Mac, since it was the first platform to pioneer so many features. Apple and the Mac are in a resurgence today despite, or perhaps because of, the fact that the differences now are much smaller. Macs and Windows machines were long different ecosystems, with different processor families, accessories, display connectors, hard drive formats, media architectures, and so on, and really seemed like different worlds. But now that Macs and Windows run on the same underlying Intel-architecture PC platforms, software has papered over the gaps.

History of the Mac as a Media Platform

Birth of the Mac

The first Mac launched in 1984. There was no video possible with its 512 × 342 1-bit display (each pixel was either black or white). And the specs would make a toaster laugh today: 8 MHz Motorola 68000 processor (first of the 68 K series) with 128 KB of RAM. But its GUI revolutionized graphics with MacPaint and PageMaker, using the very fast and capable QuickDraw API.

Another less-noted Mac innovation appeared in that first model: built-in sound. It was only 8-bit mono at 22.25 KHz (Macs used oddball sample rates for many years). This was nearly a decade before all new PCs shipped with sound at least as good.

Macintosh II

The Macintosh II in 1987 broke the Mac out of the sealed box, switching to a much faster processor, adding NuBus expansion slots, and support for color. And that made it possible to deliver true 24-bit color graphics for around $10,000, a great deal compared to the proprietary UNIX workstations then otherwise required.

The Mac II was the first true PC workstation, established the basic architecture of the Mac for the next decade, and launched desktop content creation.

And as prices dropped, a color display became an assumed feature of a Mac well before that was true of Windows, leading to much early use of Macs as kiosks.

Formation of Avid, Digidesign, and Radius

With the Mac II and its successors, the Mac supported hardware cards for display, capture, and storage. The OS was quite lightweight, so media apps could directly control hardware, while still leveraging the Mac GUI. The add-ons typically cost many times the cost of the Mac itself.

Thus, we saw early digital video and audio companies targeting the Mac chassis in the second half of the 1980s, with products that defined the Non-linear editor and Digital Audio Workstation.

Avid and Digidesign are still well known, but Radius and SuperMac (which Radius acquired) had arguably the most innovative legacy of all. Founded by key veterans of the Mac development team, they pioneered some critical technologies:

•  Multiple-monitor displays (Radius Full Page Display)

•  CPU add-on cards (Radius Accelerator)

•  CD-ROM video (Cinepak)

•  DSP-assisted media processing

•  Blade server (Radius Rocket—a NuBus card that was a complete Mac. Worked better in theory than in practice)

•  First video capture card (SuperMac’s Video Spigot)

•  First interlaced video capture card (VideoVision Studio)

•  First component video capture card (Telecast)

I’ve got a fond place in my heart for Radius. My first capture card was the VideoVision, and my first good capture card was the Telecast (later VideoVision SP) with component analog video. My future wife was capturing video on a Mac even before that, and remains the only known person to make money with a Video Spigot.

Macromind Director

Adobe’s Director started as Macromind VideoWorks in 1985, and became Director 1.0 in 1989. It quickly became a leading authoring environment for kiosk and CD-ROM titles.

Director was Mac-only until version 4 (1994), although 3 (1992) added the ability to make players for Windows.

System 7

Good OS color integration didn’t come until System 7 in 1991. This was a big upgrade, adding better multitasking and making media apps possible. Mac OS was based on the System 7 architecture until Mac OS X shipped a decade later.

QuickTime 1.0

QuickTime launched as an upgrade to System 7 in 1991. While primitive today, it was for many of us the first time we saw video on a computer screen (Video for Windows was a year away).

QuickTime 1.5 in 1992 introduced the Apple Compact Video codec (later renamed Cinepak, and actually from Radius). CD-ROM video was finally possible.

Products like Director quickly adopted QuickTime for their video playback.

The Multimedia Mac

By the early 1990s, desktop Macs evolved into being good multimedia machines out of the box, far ahead of the average business-configured Windows PC. They all included at least 8-bit 22 KHz stereo audio and color displays.

The modern multimedia machine was really established with the Quadra/Centris AV line in 1993, which had 16-bit stereo up to 48 KHz standard, enough VRAM to run 1024 × 768 at least to 16-bit color, and a 2x CD-ROM drive.

QuickTime 2

QuickTime 2 came out in 1994 for Mac and Windows (though playback-only on Windows). 2.1 in 1995 added IMA Audio, and made the CD-ROM world safe for 16-bit audio.

PowerPC Switch

In the early 1990s the Motorola 68 K architecture that all Macs had used to date was falling behind Intel’s 486 and Pentium processors.

Without a clear direction to enhance 68 K, Apple, IBM, and Motorola formed the AIM alliance with ambitious plans for a next generation processor, platform, and operating system to compete against Windows and Intel. After tremendous effort, only the PowerPC processor came to market, which was adopted for all new Macs.

It did deliver on its performance goals for a time, being much faster than 68 K and often faster than the best x86 processor in any given year.

The Birth and Death of Mac Clones

As part of PowerPC adoption plans from 1995–1997 Apple licensed the Mac OS to other companies to build Mac models, notably including PowerPC manufacturer Motorola, graphics vendor Radius, and Power Computing, which invented the build-to-order model. However, these machines tended to be better than Apple’s on the high end or cheaper than Apple’s on the low end, so Apple’s sales plummeted. My company went from buying all Apple to buying mostly clones during that period.

When Steve Jobs returned to Apple, he cancelled all licensing, and purchased Power Computing.

QuickTime 2.5 and QuickTime Media Layer

QuickTime 2.5 was a Mac-only release in 1996 that introduced the QuickTime Media Layer, letting QuickTime be a runtime for richer interactive experiences. It also included a MPEG-1 decoder for PowerMacs.

QuickTime v3

QuickTime 3 was announced and won the best of show at NAB 1997 (the National Association of Broadcasters show), finally promising full authoring as well as playback on Windows. However, that proved much harder than anticipated due to Mac OS dependencies, and a development reboot involved porting much of the Mac OS API inside of QuickTime for the Windows version.

It finally shipped in 1998 (again winning best of show at NAB). QT3 also introduced the Sorenson Video and QDesign Music codecs, making QuickTime an excellent platform for progressive download.

The streaming wars between Windows Media, RealMedia, and QuickTime started heating up in the late 1990s.

QuickTime Enters the Streaming Wars

CD-ROMs were platform-specific, so the playback architecture wasn’t seen as that critical. But web video launched during the explosive growth of the web industry, and so was considered highly strategic by Apple and Microsoft. Both entered a heated three-way competition with RealNetworks to become the dominant technology.

In this period, Apple introduced web video and then streaming support, and the innovative QuickTime Media Layer. In the end, Apple lost its focus on web video amongst its attempts to survive, and the QuickTime team was refocused on building a platform for Apple’s consumer and professional media apps to run on.

It’s only in the last few years that Flash and Silverlight have been able to deliver on what QuickTime Interactive was doing more than a decade earlier.

Mac OS X Begins and Steve Jobs Returns

The System 7 lineage had obviously run out of steam by the early 1990s. It didn’t have good support for multitasking or multithreading; a bug in any app could freeze the entire machine. Apple had spent the better part of a decade trying to find their own Windows NT–like way to a modern OS compatible with existing hardware and software.

When in-house efforts foundered, they went shopping for alternatives, and were enamored with NeXT’s OpenStep, a UNIX-based object-oriented OS popular in high-end finance and industrial-grade web sites. It was designed with good hardware abstraction, supporting four different CPU types. And NeXT’s founder and CEO was Apple’s own co-founder and Mac leader Steve Jobs.

And Apple bought NeXT, in order to use OpenStep. A common joke is that Apple thought they were buying NeXT, but NeXT really bought Apple. I don’t think any stockholder would complain, though.

The G3 Era and the PC Convergence

The G3 processor was originally designed as a low-power consumer CPU, but it wound up being faster than “high-end” PowerPC and x86 processors, even with dual-processors, so Apple used it across their line.

Macs had long been very different architecturally from Wintel machines, using a different expansion bus (NuBus instead of PCI and AGP), drive interface (SCSI instead of ATA), networking port (AAUI instead of 10Base-T), and even VGA plug. These differences meant that Mac peripherals were often much more expensive than PC equivalents built for a much larger market. And Macs were equally infamous for a user-hostile case design that turned simple RAM or drive upgrades into an hour of cursing and bleeding knuckles.

The canonical “Blue and White” G3 model was the first model fully designed after Jobs’s return to Apple, and it adopted PC-standard technologies like PCI, ATA, 100Base-T, and USB. It also had a very easy-to-service case design that has evolved into the Mac Pro of today.

QuickTime 4: Streaming and The Phantom Menace

In 1999, the web release of the Star Wars: Phantom Menace trailer got 6.4 million downloads (2.2 million QuickTime 4 beta downloads), an incredible number, given the limitations of the early web. It also showed that web-distributed video that requires no apologies was possible; it truly was better-than-VHS quality.

QuickTime 4 itself added real streaming support with the first RTSP implementation, as well as MP3 audio playback.

In an early Apple foray into open source, the source code for the QuickTime Streaming Server was released as the Darwin Streaming Server.

This probably reflected Apple deciding to abdicate the streaming wars and focus QuickTime engineering on providing a better base for Apple’s pro and consumer media apps.

Final Cut Pro

Final Cut began life as a Macromedia project led by Randy Ubillos, creator of Premiere. Its architecture was designed to address the then-limitations of Premiere for extensibility and professional applications. While nominally cross-platform, due to Apple’s problems in the era, the focus was on Windows development. After Macromedia went into hard times during the CD-ROM to web transition, they sold it to Apple.

Media 100 was on the verge of buying Final Cut for their Windows editor when Apple swept in; it would be a different content creation world if that had happened!

Final Cut started as mainly a DV editor, but grew over the next few versions into a capable HD professional tool. It was also joined by a variety of other video tools like Motion, Compressor, and DVD Studio into the Final Cut Studio suite.

Final Cut was critical to Apple’s sale of high-end, high-profit Mac desktops and laptops to video editors.

QuickTime 5

QuickTime 5 came out in 2001, and was one of the shortest-lived major versions. It introduced codec downloading (later removed from QT7), improved QTVR, and added MPEG-1 for Windows, and in 5.0.2 the Sorenson Video 3 codec, which remained the best in QuickTime until H.264.

For the content creator, the DV decoder in QT5 was finally good enough in quality and performance that there was no need for third-party replacements anymore.

The G4 Era

Motorola’s G4 processor introduced AltiVec, a SIMD architecture à la SSE that offered much better performance for media processing when software was tuned for it; most media apps were, and saw big gains (4–8x faster wasn’t uncommon for particular filters).

A “Graphite” G4 in 2001 was the first PC with a built-in DVD-R, boosting the early popularity of DVD Studio Pro.

However, with Windows XP in 2001, the Mac lost any advantage in multimedia playback, as XP introduced multichannel audio and GPU accelerated decode.

The G4 rapidly hit a wall, and was stuck at 500 MHz from 1999–2001, while Intel went from the Pentium 3 600 MHz to the Pentium 4 at 1500 MHz.

From that point on, Windows PCs were always a lot faster than PowerPC Macs for general computing, with an even bigger price/performance lead. AltiVec-optimized apps gave the Mac a good niche until Intel’s SSE3 P4s developed an unassailable lead even for media processing in 2004. Long-time Mac apps like Photoshop and After Effects were clearly faster on Windows.

QuickTime 6 and MPEG-4

After many years of primarily using proprietary codecs from Sorenson Media and QDesign, QuickTime 6 in 2002 embraced MPEG-4 in a big way, reading and writing .mp4 files, the MPEG-4 part 2 video codec, and AAC-LC audio.

This became central to both development of QuickTime and Apple’s device strategy, as the iPod and now the iPhone use only standard video and audio codecs. 3GP phone support was also added and substantially updated in the various point releases.

QuickTime 6.3 was the last version available for Mac OS 9. QT6 was also notable for skipping Windows with 6.1 and 6.2, with 6.3 returning as cross platform.

Mac OS X, Finally for Real

Mac OS X took years after the NeXT acquisition to actually ship. And it never truly solved the compatibility problems that had bedeviled a new Mac OS for ages. It delivered hardware compatibility by waiting until it could declare every Mac designed before NeXT unsupported, and handled old software by emulating an entire old Mac in software. The underlying OS was much smoother and more stable, but until apps were recompiled as “native” the experience was quite clunky. OS X nominally shipped as 10.0 in 2001, but it wasn’t useful for much. 10.1 in 2001 was usable for office tasks, but didn’t even support DVD playback. A native version of Final Cut Pro 3 was released a few months after 10.1, but it took some time for other products to arrive. 10.2 from 2002 was the first with enough native apps to make it worth running for content creation, with DVD playback, and offered new fundamental features, particularly Quartz Extreme. Quartz Extreme offloaded GUI compositing to the GPU, saving CPU power and memory bandwidth, which were growing increasingly scarce on the G4.

The G5 Era

Apple introduced the G5 processor from IBM in 2003, which was a big leap in media performance. Beyond having much higher clock speeds, the G5 fixed the increasingly limited G4 memory bus, which hurt G4 performance at least as much as clock speed. The G5 was initially available as dual-processor, and then went to quad.

The G5 made HD in software on the Mac possible, and introduced the aluminum design Mac Pros use to this day.

People eagerly awaited the PowerBook equivalents.

The Device Revolution

Apple was transformed by the iPod in 2001 and again by the iPhone in 2007. While the Mac has regained market share in recent years, this pales in comparison with (and may be driven by) Apple’s incredible success with devices. The iPod was an industry game changer, dominating music players nearly to the degree Windows dominates personal computers. The iPhone is a relatively smaller fish in a much bigger sea, but is also very successful. In 2009, Apple has been selling about six iPods and two iPhones for every Mac.

The strong video features of those devices have played a big role in shaping our industry, and Apple’s device ecosystem has driven uniform licensing rules for media sales, DRM-free major-label music, and the very name of podcasting.

QuickTime 7 and H.264

QuickTime 7 introduced H.264 (one of the first technologies using it), and was the biggest architectural upgrade since QT3. It took much more advantage of OS X features, particularly leveraging the Core Graphics API for GPU scaling and compositing of video.

Apple’s H.264 implementation, while not the best on the market, was the first competitive encoder that Apple has created in-house since QuickTime 1.0. From Cinepak in QT 1.5 to Sorenson Video 3 in QT6, the best video codec had been a proprietary technology licensed from a third party.

Intel Switch

After a decade of backing the PowerPC, the final straw was the lack of a reasonable upgrade for the PowerBook, which was lagging further behind every year in performance. And the G5 itself wasn’t ramping up as promised. Steve Jobs infamously promised a 3 GHz G5 “within 12 months” after the G5 launch, but in three years it never made it past 2.7 GHz.

Intel had been patiently offering Apple help to switch to their processors for decades, including a couple relatively complete ports of Mac OS before OS X. And of course Mac OS X was derived from the x86 compatible OpenStep, and Apple had maintained secret x86 versions of Mac OS X. And with Intel’s Core 2 in development with impressive performance and performance/watt, Apple pulled the trigger.

It went quite well. The one downside is that Classic didn’t make the transition, and so the new Macs were only able to run the OS X apps from the last few years; the oldest Mac apps that ran were now from 2001, not 1984.

In the cross-platform content industry, many of us long struggled with either lugging around two laptops to every industry event, or using slow software emulation like Virtual PC. The advent of Intel Macs with Boot Camp and virtualization meant that any Mac can be as good a Windows machine as any. The biggest hole in the Windows-on-Macs story is that Apple’s EULA (end-user license agreement) doesn’t allow virtualizing Mac OS from Windows—only the other way around.

This Intel switch culminated in Apple’s transition from an innovator in computer hardware to an innovator in computer design. While the MacBook and MacBook Pro are fine professional computers, they’re assembled from the same commodity parts as any Dell or HP, and aren’t available in nearly the same variety of configurations.

Reduced Focus on the Mac and Professional Content Creation

One downside to Apple’s device success is a lessened focus on the Mac, which is no longer the core of the company’s business (after decades as Apple Computer, in 2006 the company became just Apple).

This has been keenly felt in professional video and audio, where Final Cut Studio (FCS) got the relatively minor FCS 3 release more than two years after FCS 2, after nearly a decade of substantial annual releases. DVD Studio Pro has been on version 4 since 2005 (still with its then-innovative HD DVD support long after that format’s demise). One-time high-end effects tool Shake was cancelled after several years of slow development.

This may be a reflection of success as much as anything, however. With the return of Adobe and the recommitment of Avid to the Mac, Apple has less need for first-party products to drive high-end margin sales. The Mac will be a fixture in editing suites for a long time to come.

The Future: Snow Leopard and QuickTime X

Mac OS 10.6 (Snow Leopard) is described by Apple as a smaller update in terms of features than past releases, with the focus on optimization, which isn’t a bad thing. As we enter the era of quad-core laptops, 16-core workstations, and GPUs with 1000+ stream processors, an OS has dramatically different needs and opportunities for optimizations.

Apple’s also pulling the plug on PowerPC with 10.6. It will only be supported on Intel Macs, and so all those PowerBooks and G5s (every Mac before and many from 2006) are stuck at 10.5.

Of course, Apple being Apple, many touted 10.6 features are well-branded implementations of things that Windows and Linux have been doing for years. For example, OpenCL is another pixel-shader language similar to NVidia’s CUDA and Microsoft’s DirectX Compute APIs.

But from a media perspective, anything that helps make content creation, compression, or playback work better is a big deal for us. And now that we’ve wrung out much of the potential performance from SSE optimizations it’s multicore and GPU processing that are the low-hanging fruit.

There are significant enhancements in the core technologies that could make for improved media performance in Snow Leopard:

•  Expanded 64-bit support.

•  OpenCL pixel shaders for programmable GPU acceleration for recent NVidia and ATI graphics cards. While created by Apple and initially shipping in 10.6, it is in progress as a multivendor standard.

•  Grand Central Dispatch API for easier creation and tuning of multicore applications

QuickTime jumped from 7 to X in 10.6. Hopefully Apple will follow past practice and make it available for 10.5 and Windows, but it hasn’t given any indication that it will. Given the scope of architectural changes, I can imagine that much of the work is 10.6-specific. Even if we see a QuickTime X for 10.5 and Windows, there could be significant differences, particularly for PowerPC Macs. Some highlighted features include:

•  General optimizations leading to 2.8x faster QuickTime Player launch times (I hope we see that for Windows, where QuickTime Player can take 20–30 seconds to load).

•  HTTP Live Streaming, as described shortly.

•  A new QuickTime Player tuned for consumers, but less useful for content authors (more on that shortly as well).

•  GPU decode of H.264 (finally!), but only announced for the lowish-end NVidia 9400M, which is in the default configs of all new-in-2009 consumer Mac models. But it’s not in any older Macs, or in Mac Pros or build-to-order MacBook Pros or iMacs with upgraded graphics. Hopefully Apple really requires a PureVideo HD VP3–compatible GPU (G98 core or later). The G98 adds only a few features on top of existing cards for accelerated decode; it seems a shame to have everything else fall back to software.

•  GPU scaling and compositing (which QT7 already did; it’s not clear what’s different/improved here).

•  Integrated ColorSync for accurate color space conversions. This is a big deal, since incorrect color space conversions have plagued QuickTime since Y′CbCr decode/encode modes were added to some but not all codecs in QT6. At least the Mac finally uses video-standard 2.2 gamma, abandoning their idiosyncratic 1.8 gamma after 22 years.

A personal history of my Macs

Say what you will, but I clearly remember every Apple computer I’ve owned personally or used as my primary work machine.

It’s interesting to note that my nearly two-year-old phone has a faster CPU (400 MHz), more RAM (128 MB) and storage (2 GB) than the first half of my Macs:

•  Apple //c: 1 MHz, 128 KB. I really wanted a Fat Mac, but there weren’t good programming tools at the time. I hacked a lot of AppleSoft Basic code at home through high school.

•  Mac Plus: 8 MHz, 1 MB RAM. For a summer job doing book layout in PageMaker 1.0.

•  Mac SE: 8 MHz, 1 MB, 20 MB HD. I got my first real Mac for college. I had a copy of Tetris, which I was pleased to find girls would come over to my dorm room to play.

•  Mac IIci: 25 MHz, 4 MB RAM. It was a launch unit, for use in a computer music class I was taking; it was actually faster than the college’s MicroVAX at the time. I ran Premiere and Director 1.0 on it, created my first animation and digital video, and wrote innumerable screenplays. It remained my main machine for five years.

•  PowerMac 8100, 80 MHz, 64 MB RAM. 20″ CRT monitor, Radius VideoVision capture card, and SledgeHammer 4 GB RAID. Our first actual compression workstation, and cost about $20 K with displays and capture gear. The defective BART4 chip in the first-generation PowerMacs left it incapable of keeping audio sync at reasonable bitrates.

•  PowerMac 7100, 66 MHz, 48 MB RAM. My home machine for many years. It didn’t do much media work, but it sure wrote a lot of articles and email about media, and played a lot of early Bungie games.

•  PowerMac 8500, 150 MHz. A compression workstation much faster than the 8100. I was out of the editing room by that point.

•  PowerBook 1400. 33 MHz, 16 MB RAM. My first laptop! Poor video bus couldn’t play video worth a darn; anything more than 320 × 240 turned into a slideshow.

•  PowerBook 3400. 240 MHz, 32 MB RAM. I talked our COO into trading for my 1400 to do better customer demos. It was much faster for almost everything except video playback.

•  PowerMac 8600, 2 × 250 MHz, 192 MB RAM. An 8600 upgraded with dual processors, my first dual-proc compression workstation. Easily encoded 4x more content a day than the system it replaced, my first lesson that skimping on hardware can be very expensive.

•  PowerMac 8500 with G3 upgrade card, 266 MHz, 128 MB. My old compression system was dumped as too slow for Photoshop, and came home to replace the 7100. Terran Interactive bought me a processor upgrade as thanks for writing a white paper.

•  PowerBook G3: My first Terran Interactive laptop, and my first one that could actually play full-screen video.

•  PowerMac G3: 300 MHz, And my first Bondi Blue compression station. It was single processor, but the G3 was so much better it was still faster than the dual 8600. With a Media 100 xs board.

•  PowerBook G3 (Bronze): 400 MHz. First with DVD decoding hardware and USB ports. Terran purchased this so I could do DVD playback demos. I won Unreal playing with just the trackpad. Still, my first PowerBook upgrade that wasn’t notably faster.

•  PowerMac G4 DP 2 × 450 MHz. A monster machine replacing the G3 as my main capture/edit/compression box. Dual processors, and the G4 added AltiVec SIMD that really helped optimized media apps.

•  PowerBook G4: 500 MHz. The first-generation Titanium enclosure. Beautiful, but with poor Wi-Fi reception and fragile; the screen snapped off three different times. The G4 was a good boost for media apps. First widescreen display (1152 × 768). So loud under load I’d have to hold a pillow over my head in the hotel for overnight trade show encodes.

•  PowerBook G4 800 MHz: Not that much faster in CPU, and a few more pixels (1280 × 854), but a much better graphics card, and most importantly, much quieter.

•  PowerMac G5 DP: The only Mac workstation I bought between Terran and Microsoft. Apple’s performance lagged a lot in the years between the G5 launch and the Intel switch. The 23″ monitor I bought with it remains my secondary monitor.

•  PowerBook G4 17″ 1.333 GHz, 1 GB RAM. With the aluminum body popular for many years. It took some of the screenshots in this book. The 1440 × 900 display seemed huge at the time.

•  iMac G4 1.333 GHz. With the 20″ display on an arm. Remains the kitchen computer to this day. I can tell when a kid has left Safari open on a page with a Flash banner ad, as the fan goes crazy until I close it.

•  MacBook Pro 17″ 2 × 2.6 GHz: Microsoft buys me a Mac. The performance boost from the G4 1.33 GHz to this was probably as big as from the first G3 PowerBook to the last G4. Has spent most of its life running Vista in Boot Camp.

Introduction to QuickTime

Apple’s QuickTime is the granddaddy of all the media architectures. The first public release of QuickTime 0.9 was back in 1991, when a top-of-the-line computer ran at 25 MHz. QuickTime’s impact and legacy are hard to overstate. It laid the foundation for the entire desktop video authoring industry, as well as the practical playback of video on desktop computers. And it’s done this with a remarkable degree of backward compatibility. All the files on the QuickTime v0.9 developer’s CD-ROM that ran on Mac OS 7 still play back today.

The legacy of QuickTime is increasingly found in MPEG-4 these days; the MPEG-4 file format is closely based on QuickTime. Apple itself has made MPEG-4 its primary file type, particularly on its devices like the iPhone and iPod. Part of this presumably was to offer a clean break from the past and not have to support the tremendous variety of codecs and features a .mov could contain.

Thus, while we’ll talk some about QuickTime as a delivery format, this chapter will mainly focus on it as a content creation and playback architecture, and as the primary media architecture on the Mac. Most content created for the Mac can and should be .mp4; there’s little reason to use .mov for content delivery anymore.

The QuickTime Format

The QuickTime file format was the basis of the MPEG-4 file format, and so will seem extremely familiar to anyone who’s used MPEG-4 much. The biggest difference is nomenclature; what’s called a “box” in MPEG-4 is called an “atom” in QuickTime. But they’re both the same concept as a unit of the file that can itself contain other units.
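To make the atom/box idea concrete, here is a minimal Python sketch that walks the nested units of a .mov or .mp4 byte string. It relies only on the shared header layout (a 32-bit big-endian size followed by a four-character type, with a size of 1 signalling a 64-bit extended size and a size of 0 meaning "to the end of the enclosing span"). The names walk_atoms, make_atom, and CONTAINER_TYPES are my own for illustration, and the list of container types is a deliberately short assumption, not a complete one.

```python
import struct

# Atom types this sketch descends into. This short list is an assumption
# for illustration; a real parser would know many more container types.
CONTAINER_TYPES = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def walk_atoms(data, offset=0, end=None, depth=0):
    """Yield (depth, type, size) for each atom (box) in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Every atom starts with a 32-bit big-endian size and a 4-char type.
        size, kind = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit extended size follows the type.
            if offset + 16 > end:
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the span.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated data; stop rather than guess
        yield depth, kind.decode("latin-1"), size
        if kind in CONTAINER_TYPES:
            # A container's payload is itself a sequence of atoms.
            yield from walk_atoms(data, offset + header, offset + size, depth + 1)
        offset += size


def make_atom(kind, payload=b""):
    """Build one atom with a plain 32-bit size (helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


if __name__ == "__main__":
    sample = make_atom(b"ftyp", b"qt  ") + make_atom(
        b"moov", make_atom(b"trak", make_atom(b"tkhd", b"\x00" * 4))
    )
    for depth, kind, size in walk_atoms(sample):
        print("  " * depth + f"{kind} ({size} bytes)")
```

The same walker works on either format because only the nomenclature differs; pointing it at the bytes of a real file would list the top-level atoms and whatever containers the sketch knows about.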

Beyond that, we have the same audio and video tracks, and even hint tracks for streaming, all derived from QuickTime for MPEG-4. However, most MPEG-4 players support only a subset of the theoretical features of the QuickTime or MPEG-4 formats. Since most of those features were defined by their implementation in QuickTime, it’s unsurprising that QuickTime provides the fullest implementation, with the notable exception of fragmented MPEG-4, which QuickTime doesn’t yet support.

One very useful format feature is reference movie. In a reference movie, all or parts of the media tracks can just be references to actual media tracks in other files. This way you can have multiple versions of the same content, all referencing the same source files, without having to duplicate that content. For example, you can trim the head and tail off a file, and instead of having to export a whole new file, you can save a reference movie that simply points to the original, unmodified file in a few KB. This is what Final Cut Pro exports when “Self-Contained” is not checked.

QuickTime Tracks

The basis of many of QuickTime’s unique features is the track. A track is a piece of media with a start time and end time. If it’s a visual track, it appears on the screen. Those visual tracks can be smaller than the movie, and can overlap each other including with transparency. Most track attributes can be changed by tween (for “in-between”) tracks, which interpolate values like size and shape over time.
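As a rough illustration of what a tween does, the Python sketch below linearly interpolates a visual track's size between two points in time. It is only a conceptual model: Keyframe and tween_size are hypothetical names, and QuickTime's real tween tracks support more attributes and interpolation types than this shows.

```python
from dataclasses import dataclass


@dataclass
class Keyframe:
    """A track size at a given movie time (hypothetical structure)."""
    time: float    # seconds
    width: float   # pixels
    height: float  # pixels


def tween_size(start, end, t):
    """Linearly interpolate (width, height) between two keyframes at time t."""
    if end.time <= start.time:
        raise ValueError("end keyframe must come after start keyframe")
    # Fraction of the way from start to end, clamped to the 0..1 range so
    # times outside the tween hold the nearest keyframe's value.
    frac = (t - start.time) / (end.time - start.time)
    frac = min(1.0, max(0.0, frac))
    width = start.width + frac * (end.width - start.width)
    height = start.height + frac * (end.height - start.height)
    return width, height


if __name__ == "__main__":
    small = Keyframe(time=0.0, width=160, height=120)
    large = Keyframe(time=2.0, width=320, height=240)
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, tween_size(small, large, t))
```

The point is that the movie stores only the two endpoints; the in-between sizes are computed at playback time rather than encoded into the video.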

Video

A video track is anything implemented as a “native” QuickTime codec. This includes everything you can select in an “Export to QuickTime Movie.”

Audio

Audio tracks are the same as video (Figure 28.1).

Figure 28.1 (A) and (B) The video and audio track structures are identical for .mov (28.1A) and .mp4 (28.1B).

Hint

A hint track contains instructions for the server on how to break up the video and audio bitstreams into packets (Figure 28.2). Their use is described later in this chapter. Note that they’re pretty big, and ignored when not streaming, so you shouldn’t have them for any file that will be delivered via progressive download.

Figure 28.2 Track structure of a hinted movie.

You can hint any track type. However, optimal results require a native packetizer for the particular codec, which allows the codec to guess at missing content; native packetizers are supported for all modern delivery codecs in QuickTime.
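To give a feel for the kind of instructions a hint track carries, here is a simplified Python sketch that plans how to split media samples into packets no larger than a given payload size. This is not the actual hint track layout and not a codec-aware native packetizer; packetize and the 1450-byte default payload are assumptions chosen purely for illustration.

```python
def packetize(sample_sizes, max_payload=1450):
    """Plan packets for a list of sample sizes (in bytes).

    Returns a list of (sample_index, byte_offset, length) tuples, one per
    packet. A generic splitter like this cuts samples at arbitrary byte
    positions, which is exactly what a codec-aware packetizer avoids.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    plan = []
    for index, size in enumerate(sample_sizes):
        offset = 0
        while offset < size:
            # Take as much of the sample as fits in one packet.
            length = min(max_payload, size - offset)
            plan.append((index, offset, length))
            offset += length
    return plan


if __name__ == "__main__":
    # One large video frame followed by a smaller one.
    for packet in packetize([3000, 1000]):
        print(packet)
```

Because the plan is computed once at authoring time and stored in the file, the server can send packets without understanding the codec, which is also why the hint data adds size that progressive download never uses.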

MPEG-1

When QuickTime imports simple file formats, like AVI, the content shows up as standard QuickTime audio and video tracks. MPEG-1 is a different case, as QuickTime didn’t natively support B-frames until QT7. So, instead of teaching QuickTime how to think like MPEG-1, MPEG-1 support was implemented as a file handler (sometimes called an import component). While QT7 added B-frame support, older codecs and components have largely not been updated to support it.

It wasn’t until QT7.6 that you could even export the audio from an MPEG-1 movie. QuickTime doesn’t have built-in MPEG-2 support (even though Macs have long included DVD playback). You can purchase a reasonably capable MPEG-2 playback component from Apple for $19.95. Note that it’s Main Profile only, and so can’t decode 4:2:2. Support for 4:2:2 MPEG-2 production formats is installed with Final Cut Pro. See Figure 28.3.

Figure 28.3 MPEG-1 is its own track type, with video and audio muxed together.

image

Text

A text track is simply a series of lines of text with a font, color, and location. Each line of text has a start and a stop time. Because many codecs don’t compress text legibly, text tracks provide a very high-quality, very low-bandwidth way to provide perfect text on the screen. Providing subtitles as text instead of including them as part of the video can provide a much better experience. And, of course, you can have different text tracks in different languages, without having to re-encode the video for each audience.

To author a text track, you can use a tool that supports making them, or write text files in a special format that QuickTime knows how to import. QuickTime Player doesn’t include authoring or editing text tracks, but can play them back with fonts, styling, and specified positioning. Here’s an example:

{QTtext}{font:Geneva}{plain}{size:12}{textColor: 65535, 65535, 65535}{backColor: 0, 0, 0}{justify:center}{timeScale:600}{width:160}{height:48}{timeStamps:absolute}{language:0}{textEncoding:0}
[00:00:00.000]Here is my first line of text
[00:00:03.000]Make sure to include the timestamp before the text you want to use
[00:00:04.000]Typing this into any old text editor isn’t that bad, but I’d rather use Magpie
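If you would rather generate such a file than type it, a short script is enough. The following is a minimal sketch, not an Apple tool: the function names `format_timestamp` and `build_qttext` are hypothetical, the header fields are copied from the sample above, and the fractional part of each timestamp is written in timeScale units, which is my understanding of the format rather than something the sample itself demonstrates (all of its timestamps fall on whole seconds).

```python
def format_timestamp(seconds, time_scale=600):
    """Format a time as [HH:MM:SS.fff].

    Assumption: the field after the dot counts timeScale units
    (0 to time_scale - 1), not milliseconds.
    """
    total_units = int(round(seconds * time_scale))
    whole_seconds, units = divmod(total_units, time_scale)
    hours, remainder = divmod(whole_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return "[%02d:%02d:%02d.%03d]" % (hours, minutes, secs, units)


def build_qttext(cues, font="Geneva", size=12, width=160, height=48,
                 time_scale=600):
    """Build a QTtext document from (start_seconds, text) pairs."""
    # Header descriptors mirror the hand-written sample in the text.
    header = (
        "{QTtext}{font:%s}{plain}{size:%d}"
        "{textColor: 65535, 65535, 65535}{backColor: 0, 0, 0}"
        "{justify:center}{timeScale:%d}{width:%d}{height:%d}"
        "{timeStamps:absolute}{language:0}{textEncoding:0}"
    ) % (font, size, time_scale, width, height)
    lines = [header]
    # Cues are written in time order, one timestamp per line of text.
    for start, text in sorted(cues):
        lines.append(format_timestamp(start, time_scale) + text)
    return "\n".join(lines) + "\n"
```

Calling `build_qttext([(0, "Here is my first line of text"), (3, "Second line")])` returns a string you can save to a text file and import.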

Chapters in QuickTime are implemented as a special kind of text track. Most podcasting tools can set these correctly. See Figure 28.4.

Figure 28.4 You can click on the QuickTime chapter indicator and select a chapter to jump to.

image

Music

Music tracks use QuickTime as a MIDI synthesizer, telling it when and how to play a series of notes. QuickTime is pretty decent as a software synth, and typically requires less than 1 Kbps for a full score of music. You can import existing MIDI files into QuickTime to make a music track. QuickTime has pretty full support for modern MIDI, including being able to import sound fonts.

Music tracks haven’t seen significant use in recent years, but they were all too often embedded as a hidden autoplay movie on the early web, without any good way to turn them off.

QuickTime VR

QuickTime took part in the virtual reality craze of the mid-1990s. QuickTime VR (QTVR) debuted in QuickTime v2.5 as a component of the QuickTime Media Layer (along with the less-fortunate QuickDraw 3D). A QuickTime VR track enabled the viewer to navigate within and between panoramas.

QuickTime 5 introduced the long-awaited cubic panorama. Where the original cylindrical panoramas couldn’t allow the user to look too far up or down (no top to the cylinder), cubic panoramas model the scene by mapping the stitched images onto the six sides of a cube.

This cubic approach allows panoramas in which the viewer can actually look straight up and straight down.

There are a variety of tools from different vendors that allow you to “stitch” together a series of photographs into a panorama—a topic beyond the scope of this book.

Sprites

Sprite tracks had an inauspicious beginning in QuickTime v2.5. They were simply a means to make icon graphics move around the screen. With QuickTime Media Layer came Wired Sprites, which were programmable. They formed the basis for interactivity in QuickTime.

Unfortunately, Apple pulled back from interactivity in QuickTime around 2000, and no one has released any updates to tools for it for some years (Totally Hip’s LiveStage Pro led this market). It seems that each new release of QuickTime introduces features that limit the utility of what used to work, particularly with scripting and interactivity as opposed to simple animation.

Flash

Believe it or not, it was possible to include a Flash .swf as a QuickTime track. QuickTime 4 added support for Flash 3, with Flash 4 in QuickTime 5 and Flash 5 in QuickTime 6. However, by the time QuickTime 7 rolled around, Flash was a media playback platform in its own right, competing with QuickTime. The Flash component hasn’t been updated, and was turned off by default for security reasons as of QuickTime 7.3.1.

Skins

A Skin track replaces the standard UI of QuickTime Player with a new, custom one. This is useful for creating branded experiences in the player. They haven’t seen significant use in ages; Apple hasn’t even updated their web samples in more than seven years.

Delivering Files in QuickTime

QuickTime not only predates the web, it actually predates the CD-ROM as a standard feature in personal computers! However, the excellent work done on QuickTime’s original design has enabled it to evolve into the world of the web with relative ease.

QuickTime for CD-ROM

QuickTime was the dominant file format of the golden age of the CD-ROM (with AVI a close second). Macs dominated multimedia production back then, and Macromedia Director, the leading CD-ROM authoring tool, always had better native support for QuickTime than any other format. Back in the day, whenever a client allowed it, we’d include the QuickTime for Windows installer on our cross-platform CD-ROM titles, since QT was more reliable than VfW. If we couldn’t do that, we’d use AVI files that QuickTime on Mac could play, like Cinepak + IMA.

CD-ROM doesn’t get much use anymore. MPEG-1 is probably the best cross-platform media format available on every Mac and Windows PC. Director 11.5 added MPEG-4 H.264 decode as a built-in feature, so no OS decoders are needed anymore.

QuickTime for Progressive Download

QuickTime pioneered the progressive download model for web video, and it was used for most QuickTime web video even after v4 finally introduced RTSP. The lack of a bitrate switching MBR kept RTSP from being appropriate for long-form content; progressive almost always offered a better experience.

Every QuickTime movie has a movie header, an index to the locations of media in the file which the player requires to start playback. Because that structure can’t be known until the file is completely encoded, the movie header used to be appended to the end of the file. As CD-ROMs offer random access, this wasn’t a problem before the web. However, when doing progressive download via FTP or HTTP, the file is read front to back (this is pre-byterange requests). Because none of the file can be shown until the movie header is available, progressive download wouldn’t work. So, in QuickTime v2.1, Apple introduced the Fast Start movie, which is a fancy name for a movie with the header at the start of the file.
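Whether a given file is Fast Start can be checked by walking its top-level atoms and seeing which comes first: the movie header (`moov`) or the media data (`mdat`). The sketch below assumes the standard atom layout (a 4-byte big-endian size followed by a 4-byte type, with a size of 1 meaning a 64-bit size follows and a size of 0 meaning the atom runs to the end of the file). The function names are hypothetical, and this is an illustration rather than a validator.

```python
import struct


def top_level_atoms(f):
    """Yield (type, offset, size) for each top-level atom in a seekable file."""
    f.seek(0, 2)
    end = f.tell()
    offset = 0
    while offset + 8 <= end:
        f.seek(offset)
        size, kind = struct.unpack(">I4s", f.read(8))
        header_len = 8
        if size == 1:
            # A 64-bit extended size follows the type field.
            extended = f.read(8)
            if len(extended) < 8:
                break
            size = struct.unpack(">Q", extended)[0]
            header_len = 16
        elif size == 0:
            # The atom extends to the end of the file.
            size = end - offset
        if size < header_len:
            break  # malformed atom; stop rather than loop forever
        yield kind.decode("latin-1"), offset, size
        offset += size


def is_fast_start(f):
    """Return True if the movie header precedes the media data."""
    for kind, _offset, _size in top_level_atoms(f):
        if kind == "moov":
            return True
        if kind == "mdat":
            return False
    return False
```

Open the movie with `open(path, "rb")` and pass the file object to `is_fast_start`. Only atom headers are read, so the check stays quick even on large files.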

Fast Start is the default selection on “Export to QuickTime.” However, if you do any editing in QuickTime Player, even simple things like changing annotations, on Save the header gets moved to the back of the file again, and Fast Start no longer works. When this happens, you need to re-flatten the movie, either by doing a Save As, or by doing a batch flatten with an Apple-supplied AppleScript or other tool.

QuickTime 3 introduced the compressed movie header. Because there was a lot of redundancy in the header, Apple used traditional lossless compression to make it perhaps 80 percent smaller. While this didn’t reduce the size of the total file more than a few percent, it significantly shortened startup delay.

MPEG-4 in QuickTime

QuickTime v6 added integration of MPEG-4 within QuickTime. The MPEG-4 file format is based on QuickTime, which enabled Apple to do more than just add an MPEG-4 file handler. Instead, QuickTime treats an MPEG-4 file as a QuickTime movie. Thus, you can open an MPEG-4 file in QuickTime, and it will show up as a movie with MPEG-4 codecs in the media tracks. MPEG-4 codecs also show up as options inside the standard QuickTime export dialogs.

However, there are some differences between the QuickTime and MPEG-4 file formats, so a QuickTime file with the right codecs still isn’t quite a legal MPEG-4 file. Making it one requires a remux; QuickTime Player Pro can do this if you select MPEG-4 in the Export dialog and set video and audio to pass-through.

QuickTime before v7 didn’t natively support B-frames, which are part of Advanced Simple, thus QuickTime only supports Part 2 Simple Profile, not the more common ASP (regularly used in DivX/Xvid content).

The sprite and other interactive movie features of QuickTime are radically different than the BIFS system of MPEG-4. Native MPEG-4 interactivity never caught on anyway, so this hasn’t been a significant issue of interoperability. The main reason people still use .mov instead of .mp4 is to access those features.

QuickTime Streaming Server and Darwin Streaming Server support native MPEG-4 streaming, as does QuickTime as a client.

Lastly, Apple’s QuickTime Broadcaster is a live broadcasting application that supports MPEG-4 as well as QuickTime streams.

QuickTime for RTSP

QuickTime 4 introduced RTSP and the hint track, which gives the server the information it needs to stream the file. QuickTime can hint anything that lives in a movie. However, codecs designed to be used in RTSP have native packetizers, which gives the server much better information for how to stream the file, and makes them more robust on lossy networks (see Table 28.1).

Table 28.1

Track type    Native packetizers
Video         Sorenson Video v2, Sorenson Video v3, H.261, H.263, MPEG-4, H.264
Audio         QDesign Music v2, Qualcomm PureVoice, AAC, CELP, AMR
Other         MPEG-1 files

You’ll almost always be using H.264 with AAC for desktop playback. The other options are mainly useful when targeting phones.

Apple provides the branded QuickTime Streaming Server for Mac OS X Server only. However, you can also download binary and source-code versions of Darwin Streaming Server for many operating systems. Darwin is the same server without the Mac GUI.

QuickTime for Live Broadcasting

QuickTime has facilities for live broadcasting similar to those of other formats.

Apple provides QuickTime Broadcaster for free—a live compression tool for Mac OS X. It can broadcast via both MPEG-4 and QuickTime. It is simple and functional, although without much depth.

Since QuickTime can consume standard MPEG-4 RTSP, most live encoding targeting QuickTime just uses live MPEG-4 products, which can offer much better quality and workflow than QT Broadcaster.

HTTP Live Streaming

QuickTime X and the iPhone recently added Apple’s adaptive streaming. It’s already implemented in the iPhone 3.0 software and QuickTime X, but Apple’s said nothing about Windows or AppleTV support.

Apple has made an informational submission to the Internet Engineering Task Force (IETF) documenting it in more detail. Contrary to some reports, that submission is not a proposal for standardization, but an informative document describing the implementation for interoperability, which isn’t to say that Apple might not go through a standards process down the road.

Apple’s documentation calls it “HTTP Live Streaming,” so I’ll call it AHLS for short. The name is a bit misleading, as it clearly supports on-demand playback as well.

The basic architecture of AHLS is like other adaptive streaming technologies, with video delivered in a series of small files. It uses the MPEG-2 transport stream (M2TS) as the file format, which like fMP4 can be authored with byte ranges easily sliced into independent fragments. M2TS is also widely supported in existing live broadcast encoders, which could make for easy interoperability.

Apple’s documentation assumes that a separate stream segmenter will be responsible for converting the live transport or UDP stream into individual .ts chunk files. The index (what’s called a manifest in Smooth Streaming) is an “.m3u8” file, a modification of the standard .m3u MP3 playlist format. For live video, the index needs to get updated every chunk, which seems like a lot of overhead. Here’s a VOD sample they give:

#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10,
http://media.example.com/segment1.ts
#EXTINF:10,
http://media.example.com/segment2.ts
#EXTINF:10,
http://media.example.com/segment3.ts
#EXT-X-ENDLIST

A live stream won’t have the ENDLIST, and the last two entries are placeholders that can’t be sought into until the index is updated and the second-to-last chunk becomes third-to-last and thus seekable.
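To make the index structure concrete, here is a minimal sketch of a parser for the handful of tags used in the VOD sample. It is an illustration, not a full implementation of Apple's draft: it handles only `#EXT-X-TARGETDURATION`, `#EXTINF`, and `#EXT-X-ENDLIST`, and the function name and returned dictionary layout are my own.

```python
def parse_media_playlist(text):
    """Parse a simple .m3u8 index into target duration, segments, and end flag."""
    target_duration = None
    ended = False
    segments = []          # list of (duration_seconds, uri)
    pending_duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target_duration = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # Form is "#EXTINF:<duration>,<optional title>".
            value = line.split(":", 1)[1]
            pending_duration = float(value.split(",", 1)[0])
        elif line.startswith("#EXT-X-ENDLIST"):
            ended = True
        elif not line.startswith("#"):
            # A non-tag line is the URI for the preceding #EXTINF.
            segments.append((pending_duration, line))
            pending_duration = None
    return {
        "target_duration": target_duration,
        "ended": ended,          # False suggests a live stream still growing
        "segments": segments,
    }
```

Summing the durations in `segments` gives the length of the window the index currently covers, which is the range a player can seek within.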

Only the range in the current index is seekable, so pause and random access are limited by the number of items in the list. As chunks get shorter, the index gets bigger as it contains more entries to cover the same time period. Apple recommends 10 seconds as a good chunk duration, quite a bit longer than other adaptive streaming solutions. Their examples all give fixed cadence, but there’s no apparent reason why chunk duration couldn’t vary.

10-second chunks with suggested settings yield about a 30-second broadcast delay, quite a bit higher than other streaming technologies, adaptive and otherwise.

Multibitrate encoding is supported, of course. Apple says the current implementation has been tested for 100–1600 Kbps, appropriate to the iPhone. But many desktop users would be able to get much higher than 1600 Kbps. Apple hasn’t cracked the MBR audio problem, either, specifying that the same audio bitrate be used for all bitrate alternates.

Here’s a sample multibitrate playlist:

#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1280000
http://example.com/low.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2560000
http://example.com/mid.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=7680000
http://example.com/hi.m3u8
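A client has to read the `BANDWIDTH` attribute of each entry and pick one. The sketch below parses a multibitrate playlist of this shape and applies a simple rule: take the highest-bandwidth stream that fits the measured throughput, and fall back to the lowest if none fits. That rule is an assumption for illustration only. Apple's actual client heuristics are not described here, and the function names are hypothetical.

```python
import re


def parse_variant_playlist(text):
    """Return a sorted list of (bandwidth_bps, uri) from a multibitrate playlist."""
    variants = []
    bandwidth = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            # Tolerate optional spaces around "=".
            match = re.search(r"BANDWIDTH\s*=\s*(\d+)", line)
            bandwidth = int(match.group(1)) if match else None
        elif line and not line.startswith("#") and bandwidth is not None:
            variants.append((bandwidth, line))
            bandwidth = None
    return sorted(variants)


def pick_variant(variants, measured_bps):
    """Assumed policy: highest bandwidth that fits, else the lowest available."""
    fitting = [v for v in variants if v[0] <= measured_bps]
    return max(fitting) if fitting else min(variants)
```

With the sample playlist, a measured throughput of 3 Mbps selects `http://example.com/mid.m3u8`.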

The current supported codecs are as follows:

•  Video: H.264 Baseline Level 3.0

•  Audio:

•  HE-AAC or AAC-LC up to 48 kHz, stereo audio

•  MP3 (MPEG-1 Audio Layer 3) up to 48 kHz, stereo audio

Suggested settings for a three-stream encode are as follows:

•  H.264 Baseline 3.0 video

•  HE-AAC (version 1) stereo audio at 44.1 kHz

•  Low—96 Kbps video, 64 Kbps audio

•  Medium—256 Kbps video, 64 Kbps audio

•  High—800 Kbps video, 64 Kbps audio

Those make sense for the iPhone, basically mapping to 3G, and Wi-Fi.

And a few other things I’ve noted:

•  Audio can be delivered as a sequence of segmented .mp3 files.

•  The live encode can easily be archived and converted to VOD.

•  Encryption is supported, with references to the encryption key files in the index file. However, it’s not clear how they could prevent a network sniffer from grabbing the key on the local machine.

•  They have proxy caching in mind, explicitly warning against using https for the chunks as it isn’t cachable.

•  Inlet (Spinnaker 7000) and Envivio (4Caster C4) have both announced support for AHLS.

The Standard QuickTime Compression Dialog

The most basic way to encode video with QuickTime is through the standard Export dialog, supported in QuickTime Player Pro and many other applications. While this isn’t a professional encoder, there is a lot of stuff it can be used for, including exporting a file for later processing in another application. Other applications like Compressor, Squeeze, and Carbon access the same parameters through the QuickTime API.

The QuickTime dialog lets you specify your audio and video settings, size, and Internet streaming options. See Figure 28.5. For progressive and other non-streaming video, you’ll want Fast Start – Compressed Header. This makes the file slightly smaller, and a lot faster in startup time. For RTSP streaming, you’ll choose hint tracks. This automatically generates the hint tracks along with the audio and video files.

Figure 28.5 The QuickTime export dialog file type selector.

image

In the video option, you pick your codec and other settings. The Quality slider maps to QP, and doesn’t do anything in most codecs if data rate has been set. If you don’t specify a data rate, or are using a codec like JPEG that doesn’t have a data rate control, the quality slider controls the quality of the video, and hence the file size.

There’s also a secret Temporal Quality slider in some interframe codecs, including H.264. You reach that by holding down Option (Mac) or Control and Alt (Win) in the dialog. Temporal Quality nominally controls the quality of non-keyframes. However, I don’t have any idea how well-tuned it is anymore; I may be the last person who remembers it. Setting it doesn’t appear to do anything for H.264 in current versions.

Frames Per Second should generally be Current, so the frame rate of your source is used.

Keyframe Every X sets the maximum distance between keyframes: at least one keyframe is inserted every X frames. A natural keyframe, which is normally inserted whenever there is a video cut, resets this counter, so the Keyframe Every value might not matter for video with rapid cuts.
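The interaction between the Keyframe Every counter and natural keyframes can be sketched in a few lines. This is a toy model, not QuickTime's encoder logic: the cut positions are supplied by hand where a real encoder would detect them, and the function name is hypothetical.

```python
def keyframe_positions(frame_count, keyframe_every, cuts=()):
    """Return the frame indexes that become keyframes.

    A keyframe is placed on the first frame, on every cut (a natural
    keyframe), and whenever keyframe_every frames have passed since the
    last keyframe of either kind.
    """
    cut_frames = set(cuts)
    keyframes = []
    for frame in range(frame_count):
        # Distance is measured from the most recent keyframe, so a natural
        # keyframe resets the counter.
        forced = bool(keyframes) and frame - keyframes[-1] >= keyframe_every
        if frame == 0 or frame in cut_frames or forced:
            keyframes.append(frame)
    return keyframes
```

With `keyframe_every=4` and a cut at frame 3, ten frames yield keyframes at 0, 3, and 7. With cuts every two frames, the forced keyframes never fire at all.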

The options button opens up the special features of the codec, if any. Recent codecs like H.264 offer everything in a single dialog (Figure 28.6).

Figure 28.6 QuickTime’s H.264 settings.

image

The Sound settings are a lot simpler. They let you choose your codec, sample rate, and channels. The Quality control determines the quality (and speed) of the resampling filter used when changing sample rates, channels and bit depths. See Figure 28.7.

Figure 28.7 QuickTime’s AAC – LC settings.

image

Detailed descriptions of the H.264 and AAC dialogs are found in Chapters 13 and 14.

QuickTime Alternate Movies

QuickTime has a Multiple Bit Rate system, called Alternates. Unlike MBR in Windows Media and RealMedia, QuickTime doesn’t bundle multiple versions of the data in a single file. Instead, authors create a number of different files, only one of which is played for a given user.

The traditional implementation didn’t support any kind of bitrate switching, only an initial selection as the file starts. So it’s not well suited to longer content where available bandwidth may vary during playback.

Master Movie

The key to alternates is the Master Movie. A master movie stores a list of the available files, the paths to them, and properties on which to base the decision of which to use.

Alternates Parameters

Each movie linked to the reference movie has a number of properties used to determine which file is played. Apple’s free MakeRefMovie utility (Figure 28.8) is the most common way to set these. There’s an older version for Windows, but only the Mac one is up to date. Back in QuickTime’s heyday, Cleaner was the dominant QuickTime compression tool, and had by far the most complete implementation of alternates.

Figure 28.8 MakeRefMovie, with the default output from “Export for web.” The GUI was actually better 10 years ago.

image

QuickTime selects the alternate with the highest “quality” value and bitrate that’s compatible with the player and at or below the current bitrate:

•  If a clip is at the target data rate, that clip is used.

•  If none are at the correct bandwidth, those at the next lowest are used.

•  If none are lower, those at the lowest available bandwidth are used.

•  Of the remaining candidates, QuickTime plays the file with the highest quality setting matching the specified parameters.
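The selection rules above can be expressed as a short function. This is a sketch of my reading of those rules, not Apple's implementation: it considers only data rate and quality, ignores the language, CPU speed, and mobility properties, and the `Alternate` record and function name are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one entry in a master movie.
Alternate = namedtuple("Alternate", "url data_rate quality")


def choose_alternate(alternates, connection_rate):
    """Pick the alternate to play for a given connection speed."""
    if not alternates:
        return None
    at_or_below = [a for a in alternates if a.data_rate <= connection_rate]
    if at_or_below:
        # The target rate if present, otherwise the next lowest.
        best_rate = max(a.data_rate for a in at_or_below)
    else:
        # Nothing fits, so fall back to the lowest available bandwidth.
        best_rate = min(a.data_rate for a in alternates)
    candidates = [a for a in alternates if a.data_rate == best_rate]
    # Among files at the same rate, the highest quality value wins.
    return max(candidates, key=lambda a: a.quality)
```

Because the choice is made once, before playback starts, a function like this runs a single time per viewing, which matches the lack of mid-stream switching described above.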

Connection

Connection speed required manual configuration by the user until QuickTime 7, limiting its utility dramatically. Most users never even knew it was there to be set. QuickTime 7 added a quick bandwidth measurement test, so the optimal connection speed can be picked based on current download speeds. It doesn’t have any ability to switch bitrates once playback starts, however, reducing its utility, particularly for real-time streaming, as content gets longer.

Language

This allows you to specify the language of the file. It should be left blank unless you’re doing multilingual video, but is very handy if you are.

Priority

This specifies the rank of priority for movie playback. If multiple files meet the requirements for playback, the highest priority clip will be played back.

CPU speed

This is somewhat poorly defined. The values seem to roughly correlate to the MHz clock speed of 604e processors, with a “5” around 500 MHz (and thus slower than an iPhone, let alone any Mac of the last decade). Just leave unspecified.

Mobility

This specifies a file as being iPhone-only, otherwise unspecified. Useful to make sure a file of too high a bitrate for an iPhone to decode doesn’t get sent to it regardless of bandwidth.

Authoring Alternates

Alternates are a pain to create compared to other formats. Each alternate is a unique encode, so targeting four bitrates means the file needs to be compressed four separate times.

QuickTime Player Pro does have one Export preset that makes a set of alternates. It’s very simple, with only three checkbox options and no configurations:

•  iPhone: H.264 1 Mbps

•  iPhone (Cellular): H.264 80 Kbps

•  Computer: H.264 up to 5 Mbps

See Figure 28.9.

Figure 28.9 The files that go with the MakeRefMovie demo.

image

I keep expecting Apple to do something with Alternates, but other than the surprise addition of “Export for web” and bandwidth measurement in QT7, this hasn’t been changed since the 1990s.

QuickTime Delivery Codecs

QuickTime has always had a broad selection of codecs for all kinds of tasks. I won’t try to give a complete list, instead focusing on the ones you’re most likely to encounter or want to use.

H.264

QuickTime supports decode of progressive scan H.264 in the Baseline, Main, and High 8-bit 4:2:0 profiles. H.264 is the best delivery video codec in QuickTime by a wide margin. Before QuickTime X, playback was software-only, so many systems may struggle in Mac OS X to play an H.264 file that the same machine booted into Windows would handle without breaking a sweat via DXVA hardware acceleration. QuickTime X added accelerated decode to some recent Mac models, although those were probably fast enough to decode in software anyway.

QuickTime’s H.264 encoding is a lot more limited than its decoder, and tuned to optimize decode performance more than compression efficiency. It is covered in full detail in the H.264 chapter.

The upside of the relatively simple encoder is that even relatively modest machines can play back QuickTime H.264 pretty well; bear in mind that Apple was still selling PowerPC G4 laptops when QuickTime 7 shipped. But achieving the same quality can take quite a bit higher bitrate than a well-tuned High Profile encoder. Of course, as QuickTime is High Profile–compatible, content targeting QuickTime can use other, more efficient encoders. Just make sure to test their output on the minimum target platform to make sure performance is adequate. CABAC seems particularly slow in QuickTime; I’ve seen it double decode CPU load. CAVLC should be used for higher bitrate content targeting QuickTime.

If QuickTime-specific features are needed, QuickTime can open an H.264 .mp4 file and losslessly Save As to .mov.

Legacy Video Delivery Codecs

MPEG-4

Apple’s “MPEG-4” codec is MPEG-4 part 2 Simple Profile. It was introduced with QuickTime 6, before Apple fixed the architectural limitation preventing B-frames in codecs, and thus Apple only supported encoding and decoding Simple Profile. Even though QuickTime 7 fixed this issue (required for H.264), Apple hasn’t added fuller part 2 support. Thus QuickTime isn’t natively compatible with most DivX/Xvid content. Moreover, even though QuickTime can demux AVI, and decode MP3 and at least MPEG-4 SP, it won’t play an AVI file containing MPEG-4 SP and MP3, excluding most Xvid/DivX content it theoretically would be compatible with.

Figure 28.10 The error you get trying to open an .mp4 with MPEG-4 part 2 Advanced Simple Profile lacks Apple’s legendary user-friendliness.

image

There are third-party plug-ins (Perian is a popular choice) that do decode part 2 ASP. However, users have to manually install those in advance. Otherwise they’re greeted with the singularly unhelpful “Error -2010: the movie contains some invalid data.”

With QuickTime 6, it sometimes made sense to use MPEG-4 SP for QuickTime-targeted cross-platform content, as older codecs (including the otherwise superior Sorenson Video 3) needed to be encoded with different gamma to look the same on Macs and Windows. But even then, it was preferable to use a higher-quality SP implementation like Squeeze or Compression Master’s (now Episode). Apple’s encoder is a very limited 1-pass-only implementation with much lower efficiency than the others.

H.263

H.263 is a videoconferencing codec, and the basis of MPEG-4 part 2 (where generic H.263 is the “short header” profile). It’s only available by default in the “Movie to 3G” export mode.

Sorenson Video 3

Long-hyped and long-delayed, Sorenson Video 3 (SV3) was the most-used QuickTime codec in the QuickTime 5 and 6 eras, only dethroned by QuickTime 7 and H.264. It was the last widely used proprietary codec in QuickTime.

QuickTime built in a basic implementation of the encoder, but professional content used the higher quality, faster, and vastly more flexible Pro version. Among other things, it could do 2-pass VBR via Squeeze or Cleaner.

Many pages of the first edition of this book lovingly covered the many fine features of SV3 Pro and its alpha channels, clever hack to enable B-frames, streaming robustness features, and deblocking filter.

But I can’t imagine why anyone would make new content with it. If you have a penchant to do so, it’s still included with Sorenson Squeeze.

Sorenson Video 1/2

The original Sorenson codec, which used the same bitstream in both Sorenson Video 1 (SV1) and Sorenson Video 2 (SV2), was the leading codec for QuickTime 3 and 4.

While decent for a Clinton administration–era codec, it seems laughably primitive today. It’s notable as the last codec using the YUV-9 color space, where there is only one color sample for each 4 × 4 pixel block. This meant edges in colorful content looked hideously blocky.

Video

The Apple Video codec was in the original QuickTime 0.9 back in 1991. I include it here as a warning; picking “Video” to encode Video was an all-too-common mistake.

Video’s code name was Road Pizza (4CC is “rpza”). I imagine this was because it left the video a flattened, crushed mess vaguely reminiscent of the original.

It was designed for 25 MHz computers, and thus is a weird duck compared to modern codecs. It was quality-limited, not data rate–limited, and encoded in 5-bit-per-channel RGB (15-bit color). It had an extremely fast encoder and decoder, so it was used for doing comps and that sort of thing. But even DV and JPEG were fast enough a decade ago for real-time comps on a modern computer.

QuickTime Authoring Codecs

Unlike the other major web formats, QuickTime is as much a content creation platform as a delivery platform, and includes a lot of codecs designed for capturing and editing.

An authoring codec is one whose features make it useful for the acquisition, editing, and storage of full-quality content. An authoring codec’s output can be recompressed with minimal loss. Some codecs, like MPEG-2 and Apple Animation, can be used for both authoring and delivery when different settings are applied. A number of authoring codecs are installed with Final Cut Pro, and so indicated below. Sometimes Mac editors forget which codecs came with FCP, and send files using those to users without FCP and hence without the ability to decode them.

ProRes

ProRes is Apple’s big new authoring codec, introduced with Final Cut Studio 2 and upgraded with FCS3. It’s fast (at least on Mac), flexible, and high quality. In order of compression ratio, these are the ProRes modes:

•  ProRes 4444: Up to 4:4:4 12-bit RGB or Y′CbCr, with alpha. Useful for capturing and storing animation, film, and dual-link SDI content, but unnecessary overkill for 4:2:2 sources. Introduced with FCS 3.

•  ProRes 422 (HQ): The top quality of 4:2:2 mode. Fine to use for capture, multigenerational editing, archiving, and intermediates.

•  ProRes 422: Still visually lossless, but less mathematically accurate than HQ. Fine for an intermediate, but I’d prefer HQ as source for 10-bit image processing.

•  ProRes 422 (LT): A lighter-weight version meant for transcoding from 8-bit interframe sources like AVCHD. Shouldn’t be used in 10-bit workflows.

•  ProRes 422 (Proxy): Not mastering-quality, but allows CPU and storage efficient editing in full resolution for a later conform with the higher-quality sources.

Apple has a downloadable ProRes decoder for Windows, but no encoder, and so far lacks support for the newer 4444 mode.

DV/DVCPRO

QuickTime supports DV25 as a QuickTime codec, and can also open raw DV files and AVI files containing DV.

By default, DV is shown in QuickTime in a preview-quality mode, only displaying a single field. If you want to see the file as it actually is, you’ll need to set the High Quality flag for the video track, as described in Chapter 6. Most compression tools will do that automatically for QuickTime sources.

DV25 is a fine acquisition format, but as we discussed in Chapter 5, its use of 4:1:1 makes it a poor choice as a production format, so you should never compress to it.

DVCPRO50 (via Final Cut)

When Final Cut Studio is installed, DV50 is available as a codec. It uses twice the bitrate of DV25 and 4:2:2, so it’s a fine intermediate codec.

DVCPROHD (via Final Cut)

The DVCPROHD codecs (also called DV100) are HD variants of DV, and popular with Mac users due to their long heritage of excellent Final Cut support.

HDV (via Final Cut)

Support for HDV is also installed with Final Cut, letting QuickTime decode the standard HDV formats.

MPEG IMX (Final Cut)

IMX is an older Sony MPEG-2 I-frame-based format targeting tape.

XDCAM EX (Final Cut)

XDCAM EX is Sony’s current VBR long GOP capture format meant for flash memory.

Motion-JPEG

Motion-JPEG is the granddaddy authoring codec in QuickTime. Essentially, it takes JPEG, makes it 4:2:2 instead of 4:2:0, and offers both interlaced and progressive modes.

The M-JPEG A and B codecs were invented by Apple to provide interoperability between the many varieties of digital video capture hardware of the mid-1990s. Two main chipsets were used in all these systems, and Apple got the vendors to provide the information necessary to make universal file formats. The A and B flavors of M-JPEG are based around the chipsets used by different cards, but can be losslessly converted between.

Because JPEG is a long-time free standard, many non-Apple decoders can handle M-JPEG in .mov. The video mode of still cameras is often progressive M-JPEG.

Animation

Using Run-Length Encoding (RLE), Animation’s compression depends on long horizontal lines of identical pixels, and as many as possible identical lines between frames. Animation is lossless at the default 100 Quality; reducing the Quality slider for higher compression largely works by flattening out these lines, which can be pretty devastating to quality.
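To see why run-length encoding rewards flat horizontal runs and punishes noise, consider a toy encoder for a single row of pixels. This is a generic RLE sketch for illustration only. It is not the Animation codec's actual bitstream, which also exploits identical lines between frames, and the function names are hypothetical.

```python
def rle_encode_row(pixels):
    """Collapse a row of pixel values into (run_length, value) pairs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][1] == value:
            # Same value as the previous pixel: extend the current run.
            runs[-1][0] += 1
        else:
            # Value changed: start a new run.
            runs.append([1, value])
    return [(count, value) for count, value in runs]


def rle_decode_row(runs):
    """Expand (run_length, value) pairs back into the original row."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels
```

A row of eight identical pixels collapses to a single pair, while a row of eight different values needs eight pairs and so costs more than storing it raw. That is why natural video, with its noise and gradients, compresses so poorly with this approach.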

Animation does provide interframe compression, which some QuickTime transcoding products don’t work well with.

I find PNG to be a superior codec to store RGB content in, even though it’s I-frame only.

PNG

The PNG (pronounced “ping”) codec is just the Portable Network Graphics lossless RGB format as a QuickTime codec, with each frame a PNG file. PNG is always lossless, and offers much better compression efficiency than Animation for stuff not amenable to RLE.

PNG is ideal for intermediate files of RGB content like screen shots (almost all the graphics in this book were PNG files at one point in their life), and for interoperation with RGB-based applications like After Effects. It has alpha channel (“Millions+”) and grayscale (8-bit luma only) modes and a variety of less useful indexed color options.

In the Options dialog, you have different choices for compression efficiency versus speed. Although they’re not so labeled, they’re roughly in order of quality versus speed. I normally just park it on Best, which selects the most effective option for each pixel. It encodes quite a bit slower, but saves quite a bit of disk space.
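As a rough illustration of why these choices affect file size, the PNG format runs a prediction filter over the image data before handing it to DEFLATE. The sketch below assumes the dialog's options correspond to those standard PNG filters; it applies the "Sub" filter to a synthetic grayscale ramp and compares zlib output sizes. It skips the per-row filter-type byte a real PNG carries, and `sub_filter`, `WIDTH`, and `HEIGHT` are names chosen for this example.

```python
import zlib

WIDTH, HEIGHT = 256, 64

def sub_filter(row):
    """PNG 'Sub' filter for 1 byte per pixel: each byte minus its left neighbor, mod 256."""
    return bytes((row[i] - (row[i - 1] if i else 0)) % 256 for i in range(len(row)))

# An 8-bit grayscale horizontal ramp, identical on every row
rows = [bytes(range(WIDTH)) for _ in range(HEIGHT)]

unfiltered = zlib.compress(b"".join(rows), 9)
filtered = zlib.compress(b"".join(sub_filter(r) for r in rows), 9)

print(len(unfiltered), len(filtered))
```

After filtering, the ramp becomes a long run of identical small values, which DEFLATE handles far better than 256 distinct bytes. Trying several filters and keeping the smallest result costs encode time, which fits the speed tradeoff described above.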

None

The None codec is just uncompressed RGB, and thus very fast but very wasteful in bits.
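To put a number on "wasteful," here is a back-of-the-envelope data rate calculation. The 1920 × 1080, 24-bit, 24 fps figures are assumptions picked to match the tutorial at the end of this chapter, and `uncompressed_rate_mbps` is a hypothetical helper.

```python
def uncompressed_rate_mbps(width, height, bits_per_pixel, fps):
    """Data rate of uncompressed video in megabits per second (1 Mbps = 1,000,000 bits/s)."""
    return width * height * bits_per_pixel * fps / 1_000_000

rate = uncompressed_rate_mbps(1920, 1080, 24, 24)
print(f"{rate:.1f} Mbps, or about {rate / 8:.1f} MB per second")
```

That works out to well over a gigabit per second before any audio, which is why None is best kept for short clips and interchange.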

QuickTime Audio Codecs

QuickTime, and the Mac OS before it, has had audio codecs since the late 1980s, well before other PCs even had sound by default.

AAC

QuickTime 6 introduced AAC-LC support as a standard audio codec. AAC-LC is easily the strongest audio codec in QuickTime 6 and 7. Its myriad rate control options are documented in Chapter 13.

QuickTime X adds encode and decode of HE AAC v1 and Low Delay modes.

AMR Narrowband

AMR is Apple’s implementation of the Adaptive Multi-Rate codec widely used in 3GPP. It’s the best low-bitrate speech codec in QuickTime, targeting 4.75 to 12.2 Kbps in 8 KHz mono. It includes a nice VBR silence detection mode where it turns the bitrate down further when things are quiet.

Still, 8 KHz is low even for speech-only content; it’s not good for much outside of audioconferencing unless you need to get a lot of audio in a very small file.
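For a sense of just how small "very small" is, this sketch estimates the size of an hour of audio at the two ends of AMR's range. It assumes constant bitrate and ignores container overhead and any savings from silence detection; `audio_size_mb` is a name invented for the example.

```python
def audio_size_mb(kbps, seconds):
    """Approximate audio payload size in megabytes (1 Kbps = 1000 bits/s)."""
    return kbps * 1000 * seconds / 8 / 1_000_000

ONE_HOUR = 3600
high = audio_size_mb(12.2, ONE_HOUR)
low = audio_size_mb(4.75, ONE_HOUR)
print(f"{high:.2f} MB at 12.2 Kbps, {low:.2f} MB at 4.75 Kbps")
```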

Apple Lossless

Apple Lossless is, yep, a lossless audio codec. It’s fine for storing content, but won’t offer any predictable data rate. AAC-LC at 320 Kbps is going to sound just as good for any practical purpose, and most people can’t hear a difference between lossless and 192 Kbps AAC-LC.

iLBC

iLBC is the most recent QuickTime codec, an implementation of the royalty-free “Internet Low Bitrate Codec.” It’s used in other products like Google Talk and Yahoo! Messenger. It offers 8 KHz mono at 15.2 and 13.3 Kbps.

Legacy Audio Codecs

Like video, QuickTime’s long heritage includes a huge range of audio codecs, most of which shouldn’t be used anymore.

QDesign Music 2

QDesign Music 2 was an enhancement to the original QDesign Music codec that added RTSP support and improved quality. Originally introduced in 1999, it was the best streaming-compatible codec before QT6 introduced AAC. Quality was unprecedented for music at low bitrates, but it never did voice well (including singing), and had a pretty low quality ceiling irrespective of bitrate; it was much better than MP3 at 32 Kbps, and much worse at 128 Kbps.

There is a basic version of the QDesign encoder bundled with QuickTime, but it only goes up to 48 Kbps with awful quality. QDesign itself went out of business years ago, although they released a Mac OS X–compatible version of Pro on their way out.

QDesign Music

QDesign Music was the original implementation of QDesign, and offered lower quality yet and a much more complex Pro encoder version.

Qualcomm PureVoice

Qualcomm PureVoice is a QuickTime implementation of an early cell phone codec. As would be expected, it produces audio that sounds like a cell phone—speech is intelligible, anything else sounds lousy.

PureVoice is always 8 KHz mono and offers 13 and 7 Kbps rates.

MP3

QuickTime supports MP3 as an audio codec for playback, although it hasn’t ever included an MP3 compressor directly.

MP3 soundtracks were popular for progressive download before QuickTime 6 and AAC, but AAC-LC outperforms it handily.

IMA

The IMA audio codec (created by the long-defunct Interactive Multimedia Association) was the dominant CD-ROM codec of the Cinepak era. It offered a straight 4:1 compression over uncompressed, so where a 44.1 kHz mono track would be 705.6 Kbps, an IMA version of that would be 176.4 Kbps. Audio quality was quite good at 44.1 kHz, but quickly degraded at lower data rates.
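The arithmetic behind those figures is easy to check. This sketch assumes 16-bit source samples, which is what the 705.6 Kbps figure implies; `pcm_kbps` is a hypothetical helper.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in Kbps (1 Kbps = 1000 bits/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

mono_pcm = pcm_kbps(44_100, 16, 1)
mono_ima = mono_pcm / 4            # IMA's fixed 4:1 ratio
stereo_ima = pcm_kbps(44_100, 16, 2) / 4

print(mono_pcm, mono_ima, stereo_ima)
```

Because the ratio is fixed, the only way to lower the data rate is to lower the sample rate or drop to mono.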

None

As with the other formats, QuickTime’s None mode is uncompressed PCM. As such, it’s mainly used for authoring. Ancient QuickTime files that predate IMA (before QuickTime v2.1) will use the None codec, either in 16-bit or (heaven forbid) 8-bit.

MACE

MACE (Macintosh Audio Compression and Expansion) 3:1 and 6:1 are 1980s-era speech codecs from Mac OS that predate even QuickTime. They’re natively 8-bit, and provide strikingly bad quality. Of course, they were optimized for real-time compression and playback on 8 MHz computers. Never use MACE for new content.

μ-Law

μ-Law (that’s the Greek letter, pronounced “mew” or “moo” depending on whom you ask) is an old-school telephony codec, offering 2:1 compression, and hence 16-bit quality over 8-bit connections. It sounds okay for speech, but the poor compression efficiency makes it an unusual choice for any modern applications. It is also called “U-law” (for those who can’t easily type the “μ” character) and G.711.
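The trick behind squeezing more dynamic range into 8 bits is logarithmic companding. The sketch below uses the continuous textbook μ-law curve with μ = 255; real G.711 uses a segmented approximation of it, so this is not bit-exact, and the function names and the simple `round(y * 127)` quantizer are assumptions for illustration.

```python
import math

MU = 255  # the value used for 8-bit mu-law

def mu_law_compress(x):
    """Map a sample in [-1.0, 1.0] onto a logarithmic curve, still in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def to_8bit(y):
    """Quantize a companded value to a signed 8-bit-style code."""
    return round(y * 127)

for sample in (0.001, 0.01, 0.1, 1.0):
    code = to_8bit(mu_law_compress(sample))
    restored = mu_law_expand(code / 127)
    print(f"{sample:>6} -> code {code:>4} -> {restored:.4f}")
```

Quiet samples get spread over many more codes than linear 8-bit quantization would give them, at the expense of precision on loud ones, which suits speech reasonably well.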

A-Law

A-Law is quite similar to μ-Law, being a 2:1, 16-bit telephony codec. It is less commonly used, and offers no significant advantages.

QuickTime Import/Export Components

QuickTime has broad format extensibility via component structures that allow other formats to be read, played, and exported. All QuickTime apps then have access to installed components for reading and writing (as long as they expose it). There’s a wide variety of components out there, but there are a few I wanted to highlight.

Flip4Mac

As discussed in more detail in the Windows Media chapter, Flip4Mac is a component that supports (depending on version) WMV playback, importing, and encoding from QuickTime for Mac.

The free version is playback only. But this fills an ancient need for PowerPoint users: a good video format that can be embedded in PowerPoint and play on Mac and Windows. Until now, MPEG-1 was the best (and weak) choice. But as long as Mac users install Flip4Mac and Mac Office 2008, all is good.

Perian

Perian is a Mac-only component porting much of ffmpeg into QuickTime. It includes support for the wide variety of ffmpeg-supported codecs, formats, and features like the following:

•  MKV

•  FLV

•  MPEG-4 ASP (with B-frames!)

•  Huffyuv

•  FRAPS

•  Ogg Vorbis

•  DTS

•  Dolby Digital

•  Subtitles

XiphQT

XiphQT is a Mac and Windows component supporting playback and export of the various Ogg codecs like Vorbis, FLAC, and Theora, described in Chapter 19. Perian can often play those as well, but can’t export. See Figure 28.11.

Figure 28.11 XiphQT’s Theora output settings.

image

Flash Encoding

For Flash 6–9, Adobe provided a QuickTime export component for FLV encoding, including basic VP6 support. Since it was installed for free along with Flash (including the trial version…), it was widely used by those who didn’t want to buy a full commercial tool or license.

On2 also sells their high-end exporter as a QuickTime component. Both are covered in Chapter 15.

QuickTime Authoring Tools

QuickTime has had a powerful, open API for a decade now, and unsurprisingly a very healthy third-party industry built up around using QuickTime as both an authoring and delivery technology.

However, Apple’s broader embrace of MPEG-4 has left many just using .mp4 for QuickTime and Mac playback instead of supporting .mov and any of its unique features.

These are all covered in much broader detail in Compression Tools.* (visit the companion website to view this bonus content).

QuickTime Player Pro

QuickTime Player Pro is the essential Swiss Army knife of QuickTime authoring, and cheap at $28.99. If you use QuickTime, the time you save with Pro will more than cover its cost.

From a QuickTime perspective, it’s at least as valuable for lossless tasks, like trimming and merging files, generating reference movies, importing image sequences, swapping out audio tracks, and remixing to other formats and with/without hint tracks. Note that the QuickTime Player in QuickTime X doesn’t provide the same features; QuickTime Pro 7 can be installed side-by-side, and is still needed for most authoring tasks.

Compressor

Apple’s Compressor is the most full-featured QuickTime encoder on the market. It sits on top of the same codecs as any other QuickTime product, but offers more workflow management and much more advanced preprocessing.

Episode

Telestream’s Episode couples its own codecs with both .mp4 and .mov formats. For H.264 and part 2, this offers better quality with the same compatibility. Note that it doesn’t enforce any constraints on settings to make them QuickTime-compatible, so it’ll allow part 2 to be set to ASP and H.264 to turn on interlacing even when .mov is the output format.

Episode Pro also includes encoding to .mov with IMX and XDCAM payloads for FCP interoperability.

Sorenson Squeeze

Squeeze started out as a QuickTime-only encoder to deliver on the high-end functions of the Sorenson Video 3 Pro codec.

It still implements full support for the QuickTime API and its many codecs, and bundles Sorenson Video 3, but they’re no longer the focus of the tool.

Although it doesn’t expose Sorenson’s superior MPEG-4 encoder when making .mov, it does allow MP3 as the audio track of .mp4 files.

ProCoder/Carbon

Carbon doesn’t have any special .mov support, but has a full implementation of the QuickTime API, including export components. It uses the FLV Export components for its FLV and 3GP support.

QuickTime Player X

QuickTime X (“Ten”) is scheduled to be released as part of Mac OS X 10.6. A core goal of QT X is to bring the media pipeline improvements enabled by 10.6 to media playback. Update plans, if any, for Windows, older versions of Mac OS, and PowerPC Macs (10.6 is Intel-only) are unknown.

The new QuickTime Player has some, but not all, Pro features. There are indications that QuickTime Player 7 Pro may still be required for more advanced editing and exporting features.

Public details are scant at this point, but it is known that QT X includes:

•  New clean UI for QuickTime Player

•  GPU-accelerated video decode, finally, on NVIDIA 8400M GPUs

•  A new “HTTP live streaming” technology, presumably Apple’s adaptive streaming technology

•  Screen recording

•  Easy transcoding for devices and the web, including YouTube

•  Limited editing functionality like trimming (previously in QuickTime Pro only)

•  Accurate color using ColorSync (hopefully fixing the color space transcoding bugs)

Tutorial: Reference Movie from Image Sequence

The MPEG-4 related tutorials all make QuickTime-playable content, so I want to hit QuickTime as a workflow technology here.

Scenario

We’re at a high-end 3D animation studio, working on a short film we’re hoping will be up for an Oscar in a couple of years.

This being high-end animation, each frame can take the better part of an hour to render, and they are rendered out as individual DPX frames at 2048 × 1024 using 32-bit floating point. Animation is hard, so we want our master to be inarguably perfect.

However, these files are massive and hard to browse. We have a script that tracks the current version of each frame, makes a 1920 × 1080 PNG out of it, and stores them in a directory with the frame number in the file name.

Our marketing and investor relations folks are constantly bugging us about making one-off demo files of where we are now to show to VIPs. It’s burning a bunch of animator time, so we want to make it easy for marketing’s video guy (just some kid intern who got the job because his uncle knows somebody, and with minimal chops) to do quick encodes to whatever he wants, be it DVD, MPEG-4, Blu-ray, Smooth Streaming, whatever. He’s got different tools on Mac and Windows, even. We don’t care—we just want him to go away so we can finish this movie!

Three Questions

What Is My Content?

An image sequence of PNG files in a directory. 14,316 of them, today.

Plus a .wav file of the current temp soundtrack.
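Before handing a numbered sequence like this to any import tool, it is worth confirming that no frame is missing, since a gap would leave the video shorter than the soundtrack. Here is a minimal sketch; the file naming pattern (a frame number right before `.png`) and the names `missing_frames` and `check_directory` are assumptions, not our studio script.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: the frame number sits right before ".png"
FRAME_NUMBER = re.compile(r"(\d+)\.png$", re.IGNORECASE)

def missing_frames(filenames):
    """Return frame numbers absent between the lowest and highest numbered PNG."""
    found = {int(m.group(1)) for name in filenames if (m := FRAME_NUMBER.search(name))}
    if not found:
        return []
    return sorted(set(range(min(found), max(found) + 1)) - found)

def check_directory(directory):
    """Run the same check against the PNG files in a directory on disk."""
    return missing_frames(p.name for p in Path(directory).glob("*.png"))

print(missing_frames(["shot_0001.png", "shot_0002.png", "shot_0004.png"]))
```

An empty list means the sequence is contiguous and safe to import.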

Who Is My Audience?

This kid in marketing who probably thinks DPX is some kind of mountain bike.

What Are My Communication Goals?

We want to make it easy for him to get sources so he goes away. I’m sure he’ll take all the credit, though. Figures.

Tech Specs

We’re going to teach the kid how to make a QuickTime reference movie that points to our PNG files and audio.

Settings in QuickTime Player Pro

This is going to take QuickTime Player Pro. We’re going to make a reference movie combining the PNG images and .wav file so that it works like a single movie.

First, we select “Import Image Sequence” and select the first file in our PNG sequence. And hit okay. This is 24p exactly; we’re filmmakers—none of the crazy 23.976 video stuff for us (Figure 28.12).

Figure 28.12 If you want your audio to be in sync, get this right!

image

14K images will take a while; this is a fine time to make the kid buy me coffee and listen to some Ed Catmull stories.

And behold! QuickTime shows us our PNG files as a movie, the correct duration and everything.
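It is easy to sanity-check what "correct duration" should be, and to see how much sync would drift if the wrong frame rate were picked at import. This sketch uses today's count of 14,316 frames; `duration_seconds` is a hypothetical helper, and exact fractions avoid rounding noise.

```python
from fractions import Fraction

def duration_seconds(frame_count, fps):
    """Exact running time of a frame sequence at a given frame rate."""
    return Fraction(frame_count) / Fraction(fps)

film = duration_seconds(14_316, 24)                       # exactly 24p
video = duration_seconds(14_316, Fraction(24_000, 1001))  # the 23.976 video rate

print(f"24p: {float(film)} s")
print(f"23.976: {float(video):.4f} s, drift {float(video - film):.4f} s")
```

The 24p duration should match the length of the .wav file; more than half a second of drift by the end is plenty to make dialogue visibly out of sync.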

We now open the .wav file too, so both the PNG movie and WAV are open at once. The next part needs to be done precisely:

•  With the WAV track selected:

•  Select All.

•  Copy All.

•  Switch to the movie window.

•  Hit the leftmost button in the transport controls to make sure we’re on the first frame.

•  Select “Add to Movie.” Not Paste! See Figure 28.13.

Figure 28.13 Add to Movie. Accept no substitutes—particularly not Paste.

image

And that’s it. “Add to Movie” pastes what’s in the clipboard as a new track at the playhead. Since the audio started on the first frame, we should now have perfect sync. See Figure 28.14.

Figure 28.14 The track structure of a reference movie. The only indication it’s not a normal movie is the Resource data indicating where the data really lives.

image

We can now save our movie in a few ways:

•  Self Contained will copy all the data into the file we write, but not recompress anything. It’s fast and big.

•  Reference Movie won’t even do that. It’ll just have links to the video and audio. Since we’re using network storage, it’ll need to resolve a relative path to the referenced files. The easiest way to ensure that is to stick the reference movie in the same directory as the other files, and it will be openable by any Mac or Windows tool that uses the QuickTime API. See Figure 28.15.

•  We could also Export as QuickTime Movie to something easier to play back, like ProRes 422.

Figure 28.15 There’s a big difference between 5.6 MB and 28.91 GB; reference movies can save a lot of time and space.

image

And that’s it!

And I guess the kid isn’t so bad after all.

Figure C.1 The visual spectrum, from red at the longest wavelength to violet at the shortest.

image

Figure C.2 Relative sensitivity of the eye’s receptors to the different primary colors. The gray line is the rods; the other lines are the three cones.

image

Figure C.3 Ishihara chart for testing color vision. People with red-green color blindness aren’t able to see the number in this image.

image

Figures C.4 and C.5 The 1931 and 1976 CIE color chromaticity diagrams. The visible spectrum is the outside edge, with mixtures of the primaries as they go towards the white center. (Courtesy of Joe Kane Productions.)

image

Figure C.6 The same image in full range, just luma, and just chroma.

image

Figure C.7 In this image, the insides of the different colored squares appear to vary with the surrounding color. But if you cover the surrounding shape, it’s clearly just white.

image

Figure C.8 Paris Street: A Rainy Day by Gustave Caillebotte. This painting uses both shading and converging lines to convey perspective.

image

Figure C.9 The same image with progressively coarser sampling. Even in the finest sampling the text on the cable car disappears, but by the final image it’s not clear it’s a cable car at all. C.9A Original image (1535 × 1464). C.9B Sampled at 256 × 192. C.9C Sampled at 128 × 96. C.9D Sampled at 64 × 48.

image

Figure C.10 Sampling and quantization can result in colors not seen in the original, due to averaging. On the left, we see how purple emerges from samples that were red or blue. This is actually the most accurate and visually pleasing result.

image

Figure C.11 The same colorful image converted to a few different color spaces (at relatively low resolution to show the details). C.11A The source. C.11B Dithered to a 1-bit image. C.11C Dithered to the classic 8-bit “web safe” palette. C.11D Dithered to a custom 8-bit palette for the image. C.11E 15-bit (5 bits per channel).

image

Figure C.12 Two images compared in source, high quality, low quality, and at the quality that yields the same file size. The more complex image takes more bits at the same quality, and has lower quality with the same bits.

image

Figure C.13 In the first image, motion vectors exist for the moving object. In the second, there are motion vectors for the whole frame as the whole frame is in motion.

image

Figure C.14 Differential quantization lowers compression in more visually important blocks, while increasing compression in parts of the image where texture detail can better hide the artifacts.

image

Figure C.15 The same screen rendered in the Aero Glass (15A) and Classic (15B) themes. Aero Glass has smoother edges better for DCT style compression, while Classic has lots of flat areas for easy RLE-like compression.

image

Figure C.16 A high-motion HDV frame showing bad blocking (from an interlaced source, deinterlaced for clarity).

image

Figure C.17 Getting 601 and 709 right matters, particularly with skin tones and white details. C.17A is correctly converted to 709. C.17B shows when 601 to 709 correction is applied instead of 709 to 601. C.17C shows the cumulative effect of a double conversion.

image

Figure C.18 The quality advantage gets even bigger after compression even though the preprocessed video is only 800 Kbps compared to the unprocessed’s 1000 Kbps.

image

Figure C.19 A good dither (C.19A) can do a dramatic job of eliminating banding in this kind of subtle gradient. (Example courtesy of Stacey Spears.)

image

Figure C.20 The same frame as interlaced (C.20A), with a field elimination deinterlace (C.20B), a blend (boo!) deinterlace (C.20C), and finally with full reconstruction via inverse telecine (C.20D).

image

Figure C.21 An illustration of how 24p goes into 30i and back again. Also, an illustration of why I’m a compressionist and not an illustrator.

image

Figure C.22 Typical interfaces for color correction. Note the line around 10:30 on the color wheel, indicating the normal hue of skin tones.

image

Figure C.23A Footage shot in front of a blue screen. It’s a great first step, but no compression tool is going to do a good job with it. C.23B The visual part of a well-keyed frame. No noise, just the foreground visible. C.23C And the critical part, the alpha channel itself, showing which pixels are image and which aren’t. That’s what you need to have in the file to key.

image

* Visit the companion website at elsevierdirect.com/companions/9780240812137 to view this bonus chapter.
