CHAPTER 24
Phones and Devices

Introduction

This chapter is about making content that plays back on non-PC, non–optical disc devices. That may sound more defined by what it’s not than by what it is, but the categories are pretty well-defined in practice.

While there are a lot of terms of art, the world of devices breaks down into basically two families: stuff that mainly runs off batteries, and stuff that mainly runs off AC power.

Phones and Portable Media Players

Your portable battery-powered devices include media-capable phones, iPods, Zunes, and similar small-screen devices suitable for use on the go. They’re designed for long battery life and hence low-watt operation. They’re generally going to have smaller screens than SD, although we’re seeing a few 640 × 480 and beyond displays creep into high-end phones. But given normal viewing distances, most viewers aren’t going to get value out of incredible pixel counts on the display.

The main codec used in portable devices is H.264 Baseline profile, with VC-1 Main Profile and MPEG-4 part 2 also somewhat common. We’re starting to see some devices support the more efficient H.264 Main and High profiles. There’s also a de facto “Baseline + B-frame” profile equivalent supported by some devices and chipsets; B-frames don’t have much of a decoder complexity impact and can be very helpful for compression. Unfortunately, documentation is often poor on what devices support what kinds of files, and some devices explicitly block playback of files not encoded as part of their ecosystems.

We’re on the cusp of a revolution in portable devices, as GPU-derived System-on-a-Chip (SOC) devices like NVidia’s Tegra and AMD’s Imageon make HD decode and HDMI output possible with small devices. The Zune HD is the first entrant of this new generation, with high-end phones with similar features to follow.

There are of course some audio-only devices left, typically very small, low-priced units like the iPod Shuffle.

The average mobile phone isn’t as capable as most portable media players (PMPs) for media playback, but the market is shifting in that direction, led by Apple’s iPhone. Some of this is simply a logical outgrowth of the development of smartphones; once you have a good-sized color screen and fast processor for handling the web and email, media support is a relatively trivial addition. And PMP chipsets are starting to migrate into phones, providing the same functionality.

Optimal Viewing Distances for Device Screens

We’re seeing the resolution of portable media devices go up faster than the screen size. This can pay off in crisper text, but beyond a certain point it won’t provide a better video experience given the distance at which most folks view the players.

Table 24.1 shows the screen sizes and optimal viewing distances for the most popular PMPs. The industry standard guideline of an optimal pixel size relative to the viewer is 1/60 of a degree, which is the point where individual pixels blend together for someone of average vision. Get any closer, and the pixel edges themselves can be seen. I also include the maximum distance the screen can be away and still fill as much of the visual field as a movie screen would from the back row of a THX-certified theater (which simply isn’t feasible for PMPs). Note that there’s no downside to having a higher resolution on the device; it’s just that beyond the optimal distance, dpi matters less than screen size.

Table 24.1 Typical PMP Screen Sizes and Optimal Viewing Distances, Compared to 60” Flat Panel.

Device                Size (in)  Width  Height  Aspect ratio  Optimum distance  THX maximum
iPod nano             2          320    240     4:3           17"               3.5"
iPod classic          2.5        320    240     4:3           22"               4.3"
iPod touch/iPhone     3.5        480    320     3:2           21"               6.4"
Zune 4/8/16           1.8        320    240     4:3           16"               3.1"
Zune 80/120           3.2        320    240     4:3           28"               5.5"
Zune HD               3.3        480    272     16:9          21"               6.2"
PlayStation Portable  4.3        480    272     16:9          27"               8.1"
Archos 5              4.8        800    480     5:3           18.2"             8.8"
37" SD CRT display    35         720    480     4:3           150" (13’)        61" (5’)
60" 1080p LCD         60         1920   1080    16:9          225" (19’)        113" (9’)

As you can see, our portable devices, whatever their resolution, aren’t going to provide the same kind of detail as a normal HD display.
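The 1/60-degree guideline can be turned into a quick calculation. The sketch below is a minimal Python illustration, not the method used to build Table 24.1. It assumes square pixels, measures pixel pitch along the screen width, and assumes a 26-degree horizontal viewing angle for the THX back-row figure (that angle is not stated in this chapter). The function names are made up for the example. Under those assumptions it reproduces the iPod nano row; other rows may differ.

```python
import math

PIXEL_ANGLE_DEG = 1.0 / 60.0   # guideline from the text: one pixel spans 1/60 degree
THX_BACK_ROW_DEG = 26.0        # ASSUMPTION: horizontal viewing angle from the back row


def screen_width_in(diagonal_in, aspect_w, aspect_h):
    """Physical screen width in inches from the diagonal and aspect ratio."""
    return diagonal_in * aspect_w / math.hypot(aspect_w, aspect_h)


def optimal_distance_in(diagonal_in, aspect_w, aspect_h, pixels_wide):
    """Distance at which one (square) pixel subtends 1/60 of a degree."""
    pitch = screen_width_in(diagonal_in, aspect_w, aspect_h) / pixels_wide
    return pitch / math.tan(math.radians(PIXEL_ANGLE_DEG))


def thx_max_distance_in(diagonal_in, aspect_w, aspect_h):
    """Farthest distance at which the screen still fills the assumed THX angle."""
    width = screen_width_in(diagonal_in, aspect_w, aspect_h)
    return width / (2 * math.tan(math.radians(THX_BACK_ROW_DEG / 2)))


if __name__ == "__main__":
    # iPod nano: 2-inch 4:3 screen, 320 pixels wide
    print(optimal_distance_in(2.0, 4, 3, 320))   # about 17.2 inches
    print(thx_max_distance_in(2.0, 4, 3))        # about 3.5 inches
```

The gap between the two numbers is the point of the table: a pocket-sized screen would have to be held a few inches from the eye to fill a theater-like field of view, far closer than the distance at which its pixels stop being individually visible.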

Consumer Electronics

There’s not a great term of art for “video playback devices you plug in that aren’t full-fledged computers.” They’re sometimes called set-top boxes (STBs), appliances, and extenders. For this chapter I’m going with CE (consumer electronics) devices.

Modern CE devices have much more capable ASIC decoders than portable devices, typically including H.264 High Profile, VC-1 Advanced Profile, and MPEG-2. CE devices are also much more likely to support interlaced encoding than a portable device; all STBs do, but a few devices are progressive only, or support interlaced SD but only progressive HD.

For outputs, everything new supports HD now. For backward compatibility, most have composite, S-video, and component, and current devices have HDMI as well.

5.1 audio support is also common, and is an area in flux, reflecting the changing state of consumer interconnects. Classic analog stereo is a given. Beyond stereo, there are three approaches to multichannel audio output; any given device may do one or multiple.

The simplest is bitstream pass-through, where the device doesn’t have its own encoder, but just passes PCM stereo, Dolby Digital, or DTS audio bitstreams out to a receiver. This is cheap and simple, but prevents the device from doing any kind of audio mixing, like button feedback. Bitstream pass-through is most commonly provided via TOSLink, but some devices support it via digital RCA or HDMI.

The rarest in practice is multichannel analog—six analog outputs (often combined into three stereo pairs for space reasons). This is how most Windows PCs deliver multichannel output. It’s not seen too often in audio gear outside of the high end, though it was the only multichannel output for DVD-A and SuperAudio CD. Given all the cabling required, receivers tend not to have more than one set of multichannel analog inputs, if even that. And since that’s generally the best way to hook up PC surround audio, any user that uses the same setup for gaming and media isn’t likely to have anywhere to plug in another device.

Why Portable Devices?

Portable media devices have been a hallmark of consumer media technology since the transistor made the pocket-sized radio possible in 1957. From that through Sony’s Walkman, the battery-powered CD player, MP3 players, and on to today’s iPods, Zunes, and media-capable phones, we’ve had a good half-century of being able to carry our music with us in some form.

In the mid-2000s, some early video-on-the-go devices started emerging; I still have a Creative Zen Portable Media Center device that worked surprisingly well in 2004, but definitely wasn’t pocket-sized. But there are a whole lot more bits to create, store, and decode with video than audio, so it took a few iterations of Moore’s Law to get the cost of authoring, storage, and watts for decoding down to the point where video could be a low-incremental-cost bonus feature to a music player. Originally players that supported video all used tiny hard drives, but the capacity of flash memory keeps going up, and it’s possible to fit many hours of video on an all-solid-state device (although at a higher price per GB than hard-drive-based models).

The bigger problem for portable video playback is screen size. With good headphones, even an iPod nano can make a sound as big as the world. But a small device can’t have a big screen, so there’s an unavoidable tension between portability and visual experience. And historically, portable devices have had pretty weak LCD displays using 15- or 16-bit color (so 5–6 bits per channel), and thus a lot of banding or dithering. This is poised to change with the OLED displays used in a few phones and the new Zune HD.
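To see why 5 or 6 bits per channel leads to banding, it helps to count how many distinct shades survive. The following is a small illustrative sketch, assuming simple truncation of an 8-bit channel to fewer bits (real display pipelines may dither or round differently); the function names are hypothetical.

```python
def quantize_channel(value8, bits):
    """Truncate an 8-bit channel value to `bits` bits, then scale back to 0-255."""
    max_level = (1 << bits) - 1
    level = value8 >> (8 - bits)          # drop the low-order bits
    return round(level * 255 / max_level)


def distinct_levels(bits):
    """Number of distinct output shades across a full 0-255 gradient."""
    return len({quantize_channel(v, bits) for v in range(256)})


if __name__ == "__main__":
    for bits in (5, 6, 8):
        print(bits, "bits per channel:", distinct_levels(bits), "shades")
```

A smooth 256-step ramp collapses to 32 shades at 5 bits and 64 at 6 bits, so each visible band covers several of the original steps.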

But for those who fly, commute on mass transit, or treadmill at the gym, the risk of eyestrain, neck cramps, and tired arms can be worth watching what you want when you want it. And heck, maybe someday video projection inside sunglasses will finally go mass-market, like they’ve been promising for years.

The dedicated portable media players typically include analog video output. Originally those were all composite and all terrible, although some have analog component now. However, even that output is generally pretty terrible with lots of noise. Also, none do 60p yet, nor interlaced; 24p content gets output with very ugly judder where every sixth frame gets repeated, instead of the smoother 3:2 pulldown.

HDMI output should be much better; digital is a lot easier to deliver than analog in a small device.

Why CE Devices?

The outboard CE device is nearly as venerable as the portable music player, with its roots deep in the analog era. The first cable set-top boxes were invented in the 1960s.

Soon followed an explosion of ways to watch media at home, starting with some amazing failures like DiscoVision through the revolutionary VHS and high-end Laserdisc to the industry-changing DVD. But the set-top box didn’t vanish in the face of home playback; it kept getting enhanced to support dozens, then hundreds of channels; went from analog to digital to HD; and was adapted for satellite and IPTV use as well, competing against the cable industry where the technology was born.

And for all we talk about DVD as a driver of the entertainment industry, the classic instant-on, watch-what’s-on experience has long commanded many more eyeball-hours. But those lines are blurring. As we see more home recording and pay-per-view functionality in the set-top boxes, it’s not clear if there’s anything sustainably unique about the traditional, linear viewing experience offered by cable and satellite over what streaming to a CE device could deliver.

As TVs and computer monitors increasingly become the same thing, many of the same experiences can be delivered from a PC running Windows Media Center, XBMC, or similar software. Until recently content providers’ DRM rules haven’t allowed live cable/satellite video broadcasts to play directly on PCs (although many enterprising viewers found ways around that). But even that line in the sand is being erased with native CableCard support for Windows 7 Media Center finally announced.

But we’ve been seeing more and more PC-centric experiences working on CE devices, like Netflix’s on-demand streaming via the Roku digital video player, and a broad array of Media Center Extender and Digital Living Network Alliance (DLNA) devices that allow a user’s Windows Media Player or (non-DRMed) iTunes library to be browsed from a PlayStation 3, among many other clients.

How Device Video Is Unique

The biggest difference between device video and general PC video is that the format specifications are much tighter, with files generally working perfectly or not playing at all. Some sync software will (better) automatically transcode an out-of-spec file to one that meets the spec. However, this still burns time, could introduce artifacts, and often results in a bigger file than one optimally encoded for the device.

Unfortunately, these specifications aren’t always clearly documented, and there are lots of things that should work in theory but don’t in practice, or shouldn’t work in theory but do (sometimes, in some places) in practice.

CE devices and their much more powerful ASICs can handle a much broader range of content than portable devices, and often can play a wide array of existing files.

Getting Content to Devices

Devices have a panoply of ways to load, stream, or connect to content. Because they’re not meant to be a primary device, they assume users have a different primary device they use to manage and store content.

Attached Storage via USB

A number of devices, particularly CE, support playback of content via USB storage (like hard drives and flash drives). There are some caveats, though.

The first is that only FAT32-formatted drives are broadly compatible. Most CE devices won’t support NTFS- (Windows) or HFS+ (Mac)-formatted drives. However, files on FAT32 are limited to a maximum size of 4 GiB. That’s normally enough for a two-hour movie in 720p (around 4.7 Mbps average bitrate), but generally not enough for 1080p.
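The 4.7 Mbps figure follows directly from the file size limit and the running time. Here is a rough sketch of that arithmetic; it treats the bitrate as the total for the file (video, audio, and container overhead together), and the helper names are made up for this example.

```python
FAT32_MAX_FILE_BYTES = 4 * 2**30 - 1   # FAT32 per-file limit: 4 GiB minus one byte


def max_average_mbps(duration_s, max_bytes=FAT32_MAX_FILE_BYTES):
    """Highest total average bitrate (Mbps) that keeps a file under the limit."""
    return max_bytes * 8 / duration_s / 1_000_000


def fits(total_avg_mbps, duration_s, max_bytes=FAT32_MAX_FILE_BYTES):
    """True if a file at this total average bitrate and duration fits."""
    return total_avg_mbps * 1_000_000 * duration_s / 8 <= max_bytes


if __name__ == "__main__":
    two_hours = 2 * 60 * 60
    print(max_average_mbps(two_hours))   # a little under 4.8 Mbps
    print(fits(4.7, two_hours))          # fits under the limit
    print(fits(10.0, two_hours))         # too big for a single FAT32 file
```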

Older versions of Windows can only format FAT32 partitions with a maximum size of 32 GB; laughably small for today’s hard drives, but this is fixed in Windows 7.

The second issue is the speed of flash drives. While they’re getting faster, models even a couple of years old can have quite slow read speeds. If your file is stuttering while playing back off a USB flash drive, the drive may not be able to sustain the peak bitrate, and the problem might not be with the decoder or content.

Sideloaded Content

Sideloaded content is content on a mobile device transferred from a computer. This is the classic sync model used with most devices to date, as they’re able to take advantage of the faster downloads and larger storage capacity of a PC.

All Zunes and the iPod Touch/iPhone now support finding and downloading music straight to the device via WiFi, but most users still do their music browsing and downloading on a PC.

Progressive Download to Devices

Progressive download can be challenging on phones that don’t have much internal storage or RAM, as the entire file can’t be cached. This removes the big “rapid random access” advantage of progressive download. A PC may be able to do progressive download of a 1 GB file, but that same file could offer terrible performance on a phone.

Standard Streaming to Devices

Many phones support streaming protocols, most commonly MPEG-4 RTSP for 3GPP files and the Windows Media streaming protocol. The classic streaming protocols were designed for the high-latency, lossy Internet of the past, which isn’t that far from the mobile world today.

Traditional streaming protocols aren’t used in non-phone devices; Wi-Fi sideloading is as far as they get today. The biggest technical barrier is the power requirements of keeping the radio on for sustained periods of time while also running the screen and decoder chips; this can run down a battery quickly.

Adaptive Streaming to Devices

Adaptive streaming for devices is in its infancy, but clearly of growing interest. Smooth Streaming is coming to the Xbox 360, and is being actively implemented by several STB vendors. Apple’s adaptive streaming technology launched in June 2009 on the iPhone and QuickTime X, and an Apple TV implementation seems inevitable, although still unannounced.

It’s not clear if an all-HTTP delivery mechanism is ideal for wireless networks, however. I’d love to see what would be possible with MBR switching of an adaptive streaming chunked format coupled with UDP and multicast.

There are interesting possibilities for adaptive streaming in aggressively prefetching data as quickly as bandwidth allows, and then turning off the power-hungry radio receiver hardware for several minutes until it’s time for the next bunch of chunks to download.
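A back-of-the-envelope model shows how much radio-off time prefetching could buy. This is only a sketch under simplifying assumptions: a constant content bitrate, a constant link speed, and no cost for waking the radio. The function name and parameters are illustrative, not part of any streaming API.

```python
def burst_schedule(content_mbps, link_mbps, cycle_seconds):
    """Return (radio_on_seconds, radio_off_seconds) per playback cycle.

    In each cycle the player consumes `cycle_seconds` of media, which the
    radio can fetch in a single burst at the link speed.
    """
    radio_on = min(cycle_seconds, cycle_seconds * content_mbps / link_mbps)
    return radio_on, cycle_seconds - radio_on


if __name__ == "__main__":
    # 1 Mbps video over a 5 Mbps link, fetched in five-minute batches
    on, off = burst_schedule(1.0, 5.0, 300)
    print(on, off)   # radio on for 60 s, off for 240 s
```

In this model the radio duty cycle is simply the ratio of content bitrate to link speed, so the savings disappear as the two approach each other.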

Sharing to Devices

Devices are often on a LAN with a PC or other device storing content. So it’s possible to use that reliable, low-latency, high-speed local bandwidth to “stream” almost like a local file. This allows a big content library to exist on the PC’s cheap storage, with multiple client devices able to consume that as needed.

This works the best with wired Ethernet connections, of which the slowest old home router these days provides at least 100 Mbps, easily enough for multiple HD streams. Wi-Fi can be more of a challenge, as obstructions and interference inside a house can reduce bitrates (it’s not fun when microwaving the popcorn kills movie playback). And older protocols like 802.11b are a lot slower than newer variants like 802.11g and 802.11n.

Universal Plug and Play

Universal Plug and Play (UPnP) is a set of networking protocols from the eponymous industry group, pushing easy interoperability between different devices and platforms. One of their many efforts is the UPnP AV standards.

Many devices can connect to a UPnP AV “Media Server” and its library of content. The most common of these is Windows Media Connect, built into WMP11 and later, making it very easy to share content libraries from Windows to compatible devices and players. There’s no default UPnP Media Server on Macs, but a number of third-party ones are available that can share non-DRM content from the iTunes library.

On the client side, many CE devices support UPnP AV, including Xbox 360 and PS3, and some recent Sony and Samsung displays. On the software side, WMP 11+ (so XP with an optional install, and Vista+ out of the box) is also a client, as are many common players like VLC.

DLNA

The Digital Living Network Alliance is an industry consortium defining interoperable media sharing across different platforms and devices. It’s notable for supporting Windows Media DRM sharing to non-Microsoft devices like the PlayStation 3.

DLNA builds on top of the UPnP and other specifications.

(Disclosure: I created much of the DLNA test library of WMV files.)

Windows Media Player/Center and Zune

Windows Media Player/Center and the Zune desktop software offer somewhat deeper interoperability, including DRM support (for content with proper rights) when using other Windows computers or the Xbox 360 as a client. This can include richer navigation of libraries.

Media Center Extenders are specific devices that interoperate with Windows Media Center, and can provide the full GUI of Media Center as well as media playback. This enables library management, recording setup, et cetera to be done from the couch, not just the computer room.

Windows 7 adds live transcoding and streaming to devices. You can browse the media library on your home PC from anywhere in the world, and then it’ll transcode and broadcast it to your device in real-time in a format the device can handle over the available bandwidth. If you’ve got a TV tuner in your PC, this would allow watching a live local channel on your phone anywhere in the world.

iTunes

Apple’s iTunes is capable of media sharing to the Apple ecosystem of products, like other copies of iTunes, iPhone/iPod Touch, and AppleTV.

DRM-protected iTunes content is playable only on Apple devices or via iTunes, however. Only Apple is able to publish iTunes content. The Apple walled garden can be a seamless experience if Apple provides what you want as a publisher and device vendor, but with a significant sacrifice in flexibility.

The Walled Garden

There are also lots of devices, particularly classic STBs, that only play content from the specific service that provided them. This is called a “walled garden”—it may be beautiful, but nothing comes in or gets out.

As a compressionist, you may be asked to provide content to a walled garden spec. The most common of these are the CableLabs specs, but many operators have their own particular ones. Be wary of these projects if you don’t have a tool that explicitly supports the precise format in question; even if your MPEG-2 encoder looks like it supports all the needed parameters, there are often little things that cause validation errors.

Devices of Note

There’s a ton of devices out there, but there’s a few major brands that are most targeted with encoded content, because they’re available in high enough volume.

iPod Classic/Nano/Touch and iPhone

Apple’s iPod line has three main tiers: the screenless iPod shuffle, the 320 × 240 iPod classic and iPod nano, and the iPhone-derived iPod touch, which sits alongside the iPhone itself (Figure 24.1).

Figure 24.1 The Apple iPhone 3GS (which looks exactly like the Apple iPhone 3G, and features the same screen size as the iPod touch).

While screen sizes vary (the iPod nano and iPod classic are 320 × 240 while the iPhone and iPod touch are 480 × 320), the same content will play back in any current iPod with a screen. All the iPhone/iPod devices released since the iPod 5G in late 2005 support the same basic parameters:

•  .mp4 or .mov wrapper (self-contained only)

•  Up to 640 × 480p30

•  H.264 Baseline

•  1.5 Mbps max with Level 3.0

•  2.5 Mbps max with 1 reference frame

•  Apple calls this “Baseline Low-Complexity”; not a term from H.264 spec

•  MPEG-4 part 2 Simple Profile up to 2.5 Mbps

•  AAC-LC audio up to 48 KHz stereo
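The baseline list above can be expressed as a simple pre-flight check before syncing. This is a hedged sketch, not Apple’s validation logic: the `Encode` class and its field names are invented for the example, and it reads the two bitrate bullets as “1.5 Mbps normally, 2.5 Mbps when only one reference frame is used,” which is an interpretation of the list rather than something the spec text here confirms.

```python
from dataclasses import dataclass


@dataclass
class Encode:
    """Hypothetical description of an encoded file; field names are illustrative."""
    container: str        # "mp4" or "mov"
    profile: str          # "baseline", "main", or "high"
    width: int
    height: int
    fps: float
    video_mbps: float     # maximum video bitrate
    ref_frames: int
    audio_codec: str      # e.g. "aac-lc"
    audio_khz: float
    audio_channels: int


def ipod_h264_problems(e):
    """Return a list of reasons the encode falls outside the baseline iPod limits."""
    problems = []
    if e.container not in ("mp4", "mov"):
        problems.append("container must be .mp4 or .mov")
    if e.profile != "baseline":
        problems.append("H.264 profile must be Baseline")
    if e.width > 640 or e.height > 480:
        problems.append("frame size exceeds 640 x 480")
    if e.fps > 30:
        problems.append("frame rate exceeds 30 fps")
    # ASSUMPTION: the 2.5 Mbps ceiling applies only with a single reference frame
    limit = 2.5 if e.ref_frames <= 1 else 1.5
    if e.video_mbps > limit:
        problems.append(f"video bitrate exceeds {limit} Mbps")
    if e.audio_codec != "aac-lc" or e.audio_khz > 48 or e.audio_channels > 2:
        problems.append("audio must be AAC-LC, up to 48 kHz stereo")
    return problems


if __name__ == "__main__":
    candidate = Encode("mp4", "main", 1280, 720, 29.97, 4.0, 3, "aac-lc", 48, 2)
    for problem in ipod_h264_problems(candidate):
        print(problem)
```

Returning a list of problems rather than a single yes/no makes it easier to see which encoder setting needs to change.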

Apple didn’t make authoring that easy, though. Originally any iPod video file larger than 320 × 240 required a special string to be added to the file indicating it was compatible. Things are more flexible with devices since the late 2007 launches of the iPod Classic/Touch and iPhone. They increase H.264 support to:

•  720 × 480p30 and 720 × 576p25

•  5 Mbps max Level 3

•  10 Mbps peak

•  10 Mb VBV

The iPhone 3GS has hardware capable of decoding 720p in both H.264 High Profile and VC-1 Advanced Profile, but Apple hasn’t provided any way to sync HD content.

Apple TV

Apple TV is a relatively minor product compared to the massive success of the iPod and iPhone. It’s a simple playback device with limited HD support, mainly compatible with iTunes Store–style content.

It’s quite limited in what it can play back, particularly compared to other CE devices:

•  .mp4 or .mov wrapper (self-contained only)

•  H.264 Main Profile (constrained to Baseline + B-frames)

•  Up to 5 Mbps peak

•  Progressive only

•  CAVLC only—no CABAC

•  Max of 1280 × 720p24 or 960 × 540p30

•  MPEG-4 part 2 Simple Profile

•  Up to 3 Mbps peak

•  Max 720 × 432p30

•  AAC-LC audio up to 160 Kbps

•  AC-3 audio for 5.1 audio

Given these constraints, most HD content targeting the Apple TV needs to be explicitly encoded for it. Main Profile without CABAC is quite a bit less efficient than the standard High Profile, so most existing HD files will use features that are incompatible with the Apple TV.
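The Apple TV limit is unusual in that frame size and frame rate trade off against each other, so a check has to test the allowed combinations rather than each value separately. A minimal sketch of the H.264 limits listed above follows; the function and parameter names are hypothetical.

```python
# Allowed (max width, max height, max fps) combinations from the list above
APPLE_TV_H264_MODES = [(1280, 720, 24), (960, 540, 30)]


def apple_tv_h264_problems(width, height, fps, peak_mbps, entropy, interlaced):
    """Return reasons an H.264 encode falls outside the listed Apple TV limits."""
    problems = []
    if not any(width <= w and height <= h and fps <= f
               for w, h, f in APPLE_TV_H264_MODES):
        problems.append("frame size and rate exceed 1280x720p24 and 960x540p30")
    if peak_mbps > 5:
        problems.append("peak bitrate exceeds 5 Mbps")
    if entropy != "cavlc":
        problems.append("entropy coding must be CAVLC")
    if interlaced:
        problems.append("content must be progressive")
    return problems


if __name__ == "__main__":
    # 720p at 30 fps with CABAC: fails the size/rate and entropy checks
    print(apple_tv_h264_problems(1280, 720, 30, 4.5, "cabac", False))
```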

Zune

The original Zune (the boxy Zune 30) used software decode, and was limited to 320 × 240p30 WMV (VC-1 Main Profile) with a peak bitrate of 1.5 Mbps. The decoder can drop frames with very high motion at the full 320 × 240p30, but 320 × 176p30 and 320 × 240p24 generally play back reliably.

The second-generation Zune 2 line introduced in 2007 (the Zune 4, 8, 16, 80, and 120) also has a 320 × 240 screen, but a much more powerful decoder. It handles the full range of NTSC and PAL SD sizes, including non-square-pixel support. Supported are:

•  Windows Media

•  VC-1 Main Profile Main Level

•  Up to 720 × 480p30 or 720 × 576p25 3 Mbps peak

•  WMA Standard or Pro up to 192 Kbps

•  MPEG-4 (.mp4)

•  H.264 Baseline

•  – Up to 720 × 480p30 or 720 × 576p25 2.5 Mbps peak

•  MPEG-4 part 2

•  Simple Profile

•  Up to 720 × 480p30 or 720 × 576p25 4 Mbps peak

•  AAC-LC audio up to 192 Kbps 48 KHz

Zune HD

The Zune HD (Figure 24.2) is a quite different device than the first generations of Zune, so I’ll give it its own entry.

Figure 24.2 The Zune HD is a very different device than earlier Zune models.

It incorporates the NVidia Tegra chip, the first in a new generation of video SOC chipsets with big improvements in HD file playback for battery-powered devices. It also has a nice OLED 24-bit display, eliminating the banding of older Zunes and most other portable media players. And its dock provides an HDMI port, making it a good portable HD source. It supports:

•  Windows Media

•  VC-1 up to Advanced Profile Level 2

•  Up to 720 × 480p30 or 720 × 576p25 10 Mbps peak

•  Up to 1280 × 720p30 14 Mbps peak

•  WMA Standard up to 192 Kbps

•  WMA Pro up to 384 Kbps 48 KHz 5.1

•  MPEG-4 (.mp4)

•  Max 4 GB file size

•  H.264 Baseline + B-frames (superset of AppleTV spec)

•  Up to 720 × 480p30/720 × 576p25 10 Mbps peak

•  Up to 1280 × 720p30 up to 14 Mbps peak

•  MPEG-4 Part 2

•  Advanced Simple Profile Level 5

•  Up to 720 × 480p30 or 720 × 576p25 4 Mbps peak

•  AAC-LC audio up to 256 Kbps 48 KHz

•  AVI/Divx (.avi/.divx)

•  Max 4 GB file size

•  Divx 4/5 Home Theater profile

•  Up to 720 × 480p30 or 720 × 576p25 4 Mbps peak

•  MP3 audio up to 192 Kbps 44.1 KHz

Xbox 360

The Xbox 360 has a three-core CPU, and uses software decoding, not an ASIC. Like many software players, it can often play content well beyond the listed specs. The specs are what’s guaranteed to work; it will attempt to play back higher rates, potentially with dropped frames. Real-world H.264 content with 15 Mbps peaks often seems to work fine.

Around the time this book hits the shelves, Microsoft will be launching a new adaptive streaming service for the Xbox 360, delivering up to 1080p.

The Xbox 360 can play back content from a FAT32-formatted drive connected via a USB port, shared content (including DRM) via the Windows Media Player and Zune media sharing features, and via other products using the same protocol. For example, Connect360 is a third-party app that allows access to (non-DRMed) content shared from iTunes. Note that FAT32 playback is limited to 4 GB files, but file sharing can use files of any size. Official specs are:

•  Windows Media

•  WMV 7, 8, 9, and VC-1 Simple, Main and Advanced profiles

•  Up to 1280 × 720p60 or 1920 × 1080p30 30 Mbps peak

•  WMA Standard, Pro, and Lossless up to 48 KHz 5.1

•  MPEG-4 (.mp4 and .mov wrappers)

•  H.264

•  – Baseline, Main, and High up to Level 4.1

•  – Up to 1280 × 720p60 or 1920 × 1080p30 10 Mbps peak

•  – AAC-LC stereo

•  MPEG-4 Part 2

•  – Simple and Advanced Simple profiles

•  – Up to 1280 × 720p30 5 Mbps

•  AAC-LC stereo

•  AVI/Divx (.avi wrapper)

•  Simple and Advanced Simple profiles

•  Up to 1280 × 720p30 5 Mbps

•  AAC-LC stereo

•  Dolby Digital up to 5.1

PlayStation Portable

The PlayStation Portable (PSP) is mainly a handheld game machine, but it’s also quite popular for media playback. The disc-based models have a larger screen than the other PMPs we’re discussing here: a full 4.3 inches. The new PSP Go has a still-large 3.8-inch screen.

Sony being, well, Sony, launched the PSP with a proprietary disc format called UMD—Universal Media Disc—both for game playback and for playback of a proprietary movie format. UMD is a red-laser 60 mm disc in a cartridge and can store up to 1.8 GB. UMD enabled much bigger games than competing handhelds using cartridges had at the time.

However, after an initial burst of sales, the UMD disc format quickly lost consumer interest. And powering the motor and laser for an optical disc is a significant power drain in a battery-powered handheld. In 2009, Sony introduced the PSP Go, which replaces the UMD with 16 GB of internal flash memory to download games. They continue to sell the current UMD-based PSP 3000, but clearly the future of media playback on the PSP is via downloads, network sharing, or flash storage.

The PSP uses Sony’s proprietary Memory Stick flash memory format, which can be quite a bit more expensive than more common flash storage types. Still, given the PSP’s high-quality screen, users were quickly drawn to the PSP for playback. Original firmware blocked playback of video files more than 320 × 240 (presumably to avoid competition with the UMD experience), but that “feature” was quickly hacked and eventually formally dropped.

The PSP is still a very fussy player, and pretty much plays only content specifically designed for it. Uniquely, it requires CABAC entropy coding, and so isn’t compatible with CAVLC or Baseline at all. Thus iPod and PSP files are wholly incompatible; the Zune HD is the only other portable media player capable of playing PSP H.264 files. Supported are:

•  MPEG-4

•  H.264 Main Profile (CABAC only!)

•  – But not using pyramid B-frames, which are incompatible

•  720 × 480, 352 × 480, and 480 × 272 frame sizes

•  MPEG-4 part 2 Simple Profile

•  AAC-LC

•  AVI (really just for still camera video)

•  Motion JPEG

•  PCM or μ-Law
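The entropy-coding requirement is what splits the PSP from Apple’s devices. The sketch below records only what this chapter states (CABAC-only for the PSP, CAVLC-only for Apple TV) plus the fact that H.264 Baseline, which the iPod requires, does not include CABAC. The dictionary and function names are made up for illustration, and the table is deliberately incomplete.

```python
# Entropy coders each device accepts for H.264, per the device notes in this chapter
H264_ENTROPY_SUPPORT = {
    "iPod/iPhone": {"cavlc"},           # Baseline profile has no CABAC
    "Apple TV": {"cavlc"},              # "CAVLC only, no CABAC"
    "PlayStation Portable": {"cabac"},  # "CABAC only!"
}


def players_for(entropy):
    """Devices from the table above that accept the given entropy coder."""
    return sorted(name for name, modes in H264_ENTROPY_SUPPORT.items()
                  if entropy in modes)


def shared_modes(*devices):
    """Entropy coders accepted by every listed device (empty if none)."""
    return set.intersection(*(H264_ENTROPY_SUPPORT[d] for d in devices))


if __name__ == "__main__":
    print(players_for("cavlc"))
    print(players_for("cabac"))
    print(shared_modes("iPod/iPhone", "PlayStation Portable"))  # empty set
```

Because the intersection is empty, no single H.264 file can target both an iPod and a PSP; each needs its own encode.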

PlayStation 3

Sony’s PS3 aimed to extend the PS2’s dominance of living room gaming to living room media consumption, with Blu-ray support a key feature. This hasn’t paid off so far; the cost and complexity involved in being a best-of-breed game machine and Blu-ray player have left the PS3 as the third-place console, a far cry from the overwhelming first place the PS2 held. While the innovative Cell processor in the PS3 hasn’t really lived up to hopes as a uniquely powerful CPU for console games, it has proven to be a video decoding monster. Thus the PS3 isn’t really limited by decoding horsepower, but by the formats and codecs that have been implemented.

It can even do H.264 at 1080p60, probably the first CE device to do so, although as of Firmware 2.80, that forces 1080i output due to a bug of some sort. PS3-supported media types include:

•  MPEG-4

•  H.264 up to 1080p60 4:2:2 (!)

•  AAC-LC

•  MPEG-1

•  MPEG-2 Program and Transport stream

•  H.264

•  MPEG-2

•  AAC-LC, AC-3, or PCM

•  PCM

•  Camera AVI

•  Motion JPEG

•  PCM or μ-Law

•  DivX/Xvid/AVI

•  MPEG-4 part 2

•  MP3 audio

•  WMV

•  VC-1/WMV 9

•  WMA Standard

•  DRM-free only

Formats for Devices

MPEG-4

The MPEG-4 file format is the most broadly supported for devices, although that’s split up in H.264, Part 2, and 3GPP variants.

H.264

H.264 has become the most common codec for devices, given its adoption in the broadcast industry for CE devices and the Apple iPod/iTunes ecosystem. All current PMPs support it, as do new phones with media playback.

H.264 is normally supported in the .mp4 wrapper.

Note that content purchased from the iTunes Store and protected with Apple’s FairPlay DRM is only playable via Apple devices and the iTunes software, plus a few specific Motorola phones. These files will have the .m4v extension, although other files with that extension may not include DRM.

MPEG-4 part 2

MPEG-4 Part 2 Simple Profile was the original video codec used for the iPod. And since the H.264 decoder ASICs generally include part 2, devices often contain support for part 2.

However, it’s rarely the best choice. Most devices that do it also do H.264, and there are more H.264-only than part 2-only devices out there. And the Apple ecosystem is Simple Profile only, setting a quite low bar for compression efficiency.

It’s common for devices that support part 2 to support lower maximum bitrates and frame sizes than for H.264 and VC-1—for example, Apple TV and the Zune HD only do Part 2 in standard definition. What’s worse is that many devices that claim part 2 support don’t publish detailed specs, so it can be trial and error to figure out what works. For example, SP + B-frames is pretty common in non-Apple decoders (B-frames are by far the most useful and simplest to implement feature from ASP).

3GPP

The 3GPP formats are a variant of MPEG-4, and use H.264 or MPEG-4 part 2 video.

The biggest difference is on the audio side, where 3GPP often includes HE AAC and voice-only codecs like AMR and CELP. 3GPP and its myriad codec choices are covered in Chapter 12.

Unfortunately, device support varies widely, although we’re seeing convergence around H.264 Baseline and HE AAC v1 for new phones as chipsets get better. Hopefully, in a few years a single file and publishing service can address all mobile devices, but we’re not there yet. Too often, video delivery to phones must be handled by the carrier, specifically encoding for the devices they support.

Windows Media and VC-1

Windows Media remains a popular choice for device media. One big reason is that it’s the only widely supported licensable DRM. Beyond Windows itself, Windows Media DRM protected files can play on Zunes, Windows Mobile and CE devices, many other devices using the licensable Windows Media and Windows Media DRM porting kits, and Silverlight. That’s led to Windows Media becoming the underlying technology for services that offer device compatibility like Netflix’s streaming service and Blockbuster and Amazon’s download services.

For historical chipset reasons, portable device VC-1 is mainly Main Profile. This will presumably change as Silverlight and Smooth Streaming come to devices (although H.264 certainly works with Smooth Streaming and PlayReady DRM). CE devices generally support Advanced Profile and hence interlaced.

Most devices support WMA Standard and Pro, although some non–Windows Mobile/CE devices don’t support decoding the Pro LBR modes yet.

AVI/DivX/Xvid

DivX Networks was an early leader in standardizing device media playback with their DivX-certified licensing program, which was particularly popular in off-brand DVD players.

AVI files made for devices generally follow the profiles specified by DivX Networks, although many devices (particularly portable ones) don’t carry the explicit DivX certification.

Normally these AVI files use MP3, but some also contain AC-3 for multichannel audio.

The lower compression efficiency of MPEG-4 part 2 + MP3 generally means this should only be used when specific devices are targeted for which this is the best option. It’s quickly becoming a legacy technology. The new DivX profiles are based on H.264 in an MKV wrapper.

Audio-Only Files for Devices

For all this talk of video, music listening accounts for a lot more hours of use than video watching for a typical device. Audio “just works” for the most part, so there just isn’t as much to cover.

Any device is going to support MP3, of course, and most support .m4a (AAC-LC only MPEG-4 files). WMA support is quite common: Standard (WMA 9.2) is the most widely supported, with WMA Pro and Lossless available in many devices, particularly those based on Windows Mobile. Apple’s devices (but not many others) also support the Apple Lossless codec.

While open source advocates have been singing the praises of Ogg Vorbis for many years—and it is a more efficient codec than MP3—it’s not supported by many mainstream devices. The FLAC lossless codec is sporadically supported as well.

All that said, most portable audio devices are used to play MP3 files, even though it’s the least efficient of the broadly used lossy codecs. But guaranteed interoperability with every device makes MP3 the safe choice, and storage capacity is cheap enough today to make the compression efficiency difference immaterial for most music collections.

Generally portable players only play back stereo audio, although some can fold down multichannel to stereo for playback.

CE devices rarely store audio, but many can play back the same formats via streaming and sharing.

Encoding for Devices

For portable devices, the core tension is between encoding for the screen or for the output. If encoding for personal use on a single device, or a class of devices that only does 320 × 240, encoding more pixels than that is a waste of time and bits. If encoding for a broad class of devices that vary in screen size, encoding for the size of the biggest screen can be a good compromise. So, a 16:9 file targeting iPod and Zune playback could use H.264 Baseline 1.5 Mbps at 480 × 272. That will play 1:1 on the Zune HD and iPod touch/iPhone, and would get scaled down to 320 × 180 on the iPod classic/nano and Zune 4/8/80/120.

However, not all devices do a good job of scaling the video, either in performance or quality. In particular, phones not of the current cutting-edge generation often don’t use good hardware scalers, so ideally encode a size where no scaling on playback is required (like 320 × 176 on a 320 × 240 display). Sometimes a 2x scale mode is faster than a fractional ratio.

It’s also possible, of course, to make a generally compatible file that’ll play on most devices and most media players (as in the H.264 tutorial). These generally follow the iPod 5G specs, which are the most restrictive in use today. Still, they can do a decent job at standard definition, and so offer some future proofing. 2.5 Mbps is generally sufficient for most content with H.264 Baseline with one ref, the maximum the iPod 5G supports. For when faster downloads or more efficient use of precious space is appropriate, 3 references and 1.5 Mbps peak can be better.

Unfortunately there’s no H.264 setting that’ll play in both the iPod and PSP; the iPod can’t use CABAC, and PSP requires it. So, if you want to make a single file that can play on every PMP, it’ll have to be MPEG-4 part 2 Simple Profile.
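The iPod/PSP conflict can be shown as a simple set intersection. This is a minimal sketch under a deliberately narrow assumption: it models only the entropy-coding constraint described above, and the `DEVICE_RULES` table and function name are hypothetical illustrations rather than anything published by Apple or Sony. CAVLC is the non-CABAC entropy coder in H.264.

```python
# Hypothetical, deliberately minimal model: only the entropy-coding
# constraint named in the text is represented. Real devices have many
# more limits (profile, level, frame size, bitrate, container).
DEVICE_RULES = {
    "iPod": {"CAVLC"},  # the iPod can't use CABAC
    "PSP": {"CABAC"},   # the PSP requires CABAC
}


def common_entropy_modes(devices):
    """Return the entropy-coding modes that every listed device accepts."""
    modes = {"CAVLC", "CABAC"}
    for name in devices:
        modes &= DEVICE_RULES[name]
    return modes


print(common_entropy_modes(["iPod"]))         # {'CAVLC'}
print(common_entropy_modes(["iPod", "PSP"]))  # set()
```

An empty result for the pair is the reason a single H.264 file can’t serve both players.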

For CE devices, things are easier. The major ones all support H.264, MPEG-4 part 2, and VC-1 Windows Media in their ASIC or software decoders, although support for particular file formats and audio codecs can vary. In general, a High Profile H.264 file with stereo AAC-LC audio will work everywhere, but some devices like the Xbox 360 don’t handle 5.1 AAC-LC. Conversely, the PS3 doesn’t do WMA Pro, which the Xbox 360 can use for 5.1.

Devices Tutorial

We covered making generic iPod-compatible H.264 files back in the H.264 chapter. So for this chapter, we’ll look at how to encode screen-optimized clips for the standard-sized Zune and iPod models.

Scenario

We work for a skateboarding company’s marketing department. We’ve got a great new line of decks coming out in the fall, and a viral video campaign to back it up. To that end, we’ve got a good 10-minute highlights video that we want our customers to install on their mobile devices to show off to their friends.

Three Questions

What Is My Content?

The source was shot on HDV at 1080i30, edited in Premiere Pro, and saved out as a 16:9 1440 × 1080i30 anamorphic Cineform Prospect HD file. It veers from static close-ups over ethereal flutes to crazy motion, wild effects, and a pounding beat.

Who Is My Audience?

The audience is boys (and some girls) and their toys. They’ve got allegiance to their devices, so we want to offer content tuned to offer a perfect experience on all iPod, Zune, and PSP models. Since that audience is always looking for eye candy to justify their purchases, we think being a great demo clip will up viewership and our brand.

However, these guys go through a lot of content; we don’t want to waste so many bits that they delete the file. So, we want to be as small as we can be while still retaining excellent quality.

What Are My Communication Goals?

Our brand is all about technical brilliance while being extreme and awesome. So the stuff has to look and sound just great, while keeping the file size reasonable for quick downloads and long life on the device.

Tech Specs

Preprocessing is pretty easy. We’re taking that big 1080i source down to 320 × 176 or 480 × 272 for our Mod16 output resolutions. Deinterlacing quality doesn’t matter so much.
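The Mod16 rounding behind those two frame sizes is simple arithmetic. The sketch below assumes a 16:9 source and rounds the exact height to the nearest multiple of 16; the function name is made up for illustration.

```python
def mod16_frame(width, aspect_w=16, aspect_h=9):
    """Return (width, rounded height, exact height) for a given aspect ratio.

    The height is rounded to the nearest multiple of 16 (Mod16).
    """
    exact_height = width * aspect_h / aspect_w
    # Round half up explicitly rather than relying on round(),
    # which rounds halves to the nearest even number.
    height = int(exact_height / 16 + 0.5) * 16
    return width, height, exact_height


for screen_width in (320, 480):
    w, h, exact = mod16_frame(screen_width)
    print(f"{w} x {h} (exact 16:9 height {exact:g})")
# 320 x 176 (exact 16:9 height 180)
# 480 x 272 (exact 16:9 height 270)
```

The rounded heights differ from true 16:9 by four lines at 320 wide and two lines at 480 wide.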

We want to make an optimal file for each flavor of the primary mobile devices in our audience. So, we’re targeting the following devices:

•  iPod 320 × 240

•  iPhone/Touch 480 × 320

•  Zune 320 × 240

•  Zune HD 480 × 272

•  PSP 480 × 272

While in theory we could get away with doing 480 × 272 and 320 × 176 MPEG-4 part 2 Simple Profile to cover all of these, the quality of that codec may not be up to our goals on all our devices. So, we’re going to suck it up and make variants customized to each device.

•  iPod classic/nano: These have smaller screens, and in the case of the nano, less storage than the iPhone/touch. So we’re going to use three ref frames.

•  320 × 176 (16:9 is 320 × 180, so Mod16 rounded)

•  H.264 Baseline 2-pass VBR 1 Mbps ABR 1.5 Mbps PBR (750 KB buffer)

•  Three reference frames

•  AAC-LC 192 Kbps

•  iPhone/iPod touch: The bigger screen is going to want more bits, and hopefully most people have one with decent storage. We’re going down to one ref frame in order to bump up peak bitrates and still have a file that’ll work on the 5G iPod. Since Apple doesn’t put version numbers on the iPods, users often don’t know which model they have.

•  480 × 272 (16:9)

•  H.264 Baseline 2-pass VBR 1.5 Mbps ABR 2.5 Mbps PBR (1250 KB buffer)

•  One reference frame

•  AAC-LC 192 Kbps

•  PSP: Ah, H.264 Main Profile. We can do quite a bit more:

•  480 × 272 (16:9)

•  H.264 Main 2-pass VBR 1.5 Mbps ABR 2.5 Mbps PBR (1250 KB buffer)

•  CABAC on, pyramid B off (as required by PSP)

•  Three reference frames, max three B-frames

•  AAC-LC 192 Kbps

•  Zune 30 and 4/8/16/80/120: This is going to need to work on the original Zune 30 as well as the more recent models, so we can just stick with those constraints.

•  320 × 176

•  VC-1 Main Profile 1.0 Mbps ABR 1.5 Mbps PBR (750 KB buffer)

•  WMA 9.2 192 Kbps

•  Zune HD: This could play the iPhone file with no problem, but we can push the video quality up higher, since we have access to higher peak bitrates. Either H.264 or VC-1 would be fine, but we’ll go with WMV as Zune owners are more likely to be WMV partisans.

•  480 × 272

•  VC-1 Main Profile 1.5 Mbps ABR 4 Mbps PBR (2000 KB buffer)

•  WMA Pro 192 Kbps

Simple enough. Beyond those, we’re going to go for awesome video quality and extreme slowness on our encodes as well. Low resolutions always encode pretty quickly anyway.
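The buffer sizes in the specs above appear to follow one rule: each holds four seconds of video at the peak bitrate. The sketch below checks that and estimates the size of each 10-minute file. It assumes 1 KB = 1000 bytes and 1 Mbps = 1000 Kbps, and it ignores container overhead, so the sizes are rough lower bounds. The table layout and function names are illustrative only.

```python
DURATION_S = 10 * 60  # the 10-minute highlights video

# (variant, video average Kbps, video peak Kbps, buffer KB, audio Kbps)
VARIANTS = [
    ("iPod classic/nano", 1000, 1500, 750, 192),
    ("iPhone/iPod touch", 1500, 2500, 1250, 192),
    ("PSP", 1500, 2500, 1250, 192),
    ("Zune 30 and 4/8/16/80/120", 1000, 1500, 750, 192),
    ("Zune HD", 1500, 4000, 2000, 192),
]


def buffer_seconds(buffer_kb, peak_kbps):
    """Seconds of peak-rate video the buffer holds (1 KB = 8 Kbit)."""
    return buffer_kb * 8 / peak_kbps


def estimated_size_mb(avg_kbps, audio_kbps, duration_s=DURATION_S):
    """Approximate file size in MB from average video plus audio bitrate."""
    return (avg_kbps + audio_kbps) * duration_s / 8 / 1000


for name, avg, peak, buf, audio in VARIANTS:
    print(f"{name}: {buffer_seconds(buf, peak):.1f} s buffer, "
          f"about {estimated_size_mb(avg, audio):.1f} MB")
```

Under these assumptions every variant has a 4.0-second buffer, the 1 Mbps files come to about 89 MB, and the 1.5 Mbps files come to about 127 MB.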

Expression Encoder

Expression Encoder 3 has become a pretty darn capable H.264 encoder for devices; the lack of High Profile or HE AAC hasn’t been a limitation.

We’ve got pretty good presets to start with for most of these:

•  H.264 iPod/iPhone touch

•  A modified version of this for the PSP

•  H.264 iPod classic/nano

•  VC-1 Zune 1

•  VC-1 Zune HD

From there, we can make some extreme performance choices:

•  Complexity 5

•  Threads Used: 1

•  RD Optimization and 4 × 4 ME Partition Level on for H.264

•  True Full Chroma and Adaptive Motion Match for VC-1

Those will increase encoding time at least 20x over the defaults, with perhaps a 10 percent improvement in efficiency. But hey, we’re extreme. It’s what we do. Don’t try those settings with HD video, of course!

Here’s a more realistic “slower better better” set of choices:

•  Complexity 4

•  Threads used: 2 for 320 × 176, 4 for 480 × 272 (following our “At least 64 lines per slice” rule)

•  Adaptive True Chroma and Adaptive Motion Match for VC-1

Those would look nearly, and probably indistinguishably, as good as the extreme settings. They would be five to ten times faster than the insane settings, but within 1–2 percent of their quality. Awesome, certainly, but arguably not extreme. See Figures 24.3 through 24.7.
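The thread counts in the more realistic settings fall straight out of the “at least 64 lines per slice” rule. This sketch assumes each encoding thread works on one horizontal slice, which is what the rule implies; the function name is hypothetical.

```python
MIN_LINES_PER_SLICE = 64  # the "at least 64 lines per slice" rule


def max_threads(frame_height, min_lines=MIN_LINES_PER_SLICE):
    """Largest thread count that keeps every slice at min_lines or taller.

    Assumes one slice per thread; never returns fewer than one thread.
    """
    return max(1, frame_height // min_lines)


for height in (176, 272):
    print(f"{height} lines: up to {max_threads(height)} threads")
# 176 lines: up to 2 threads
# 272 lines: up to 4 threads
```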

Carbon

Carbon has full-featured implementations of both the VC-1 Encoder SDK and MainConcept H.264 SDK, and so has a lot more knobs and dials for mobile encoding than other tools.

Unfortunately, the Carbon presets aren’t very well suited to what we want. We’ll just start with the System H.264 and VC-1 presets and modify from there. Note that MainConcept can do adaptive B-frame placement, so we enter 3 here:

H.264

•  Stream Type: MPEG-4 System

•  Device Compatibility: Sony PlayStation Portable for PSP

•  Video Bitrate Mode: VBR 2-pass

•  VBV Buffer Size: 0 (determined by Level)

•  Profile: Baseline for iPod/iPhone, Main for PSP

•  Level: 3.0

•  Size of Coded Video Sequence: 120 (4-sec GOP @ 29.97 fps)

•  B-frames (PSP only)

•  Number of B-pictures: 3

•  Use Adaptive B-frame placement: On

•  Reference B-pictures: On

•  Allow pyramid B-frame coding: Off (not supported by PSP)

•  Slices: 1

•  All “Use fast” options: Off

•  Use rate distortion optimization: On

•  Adaptive Quantization Mode: Complexity (keep the backgrounds smooth)

•  Adaptive Quantization Strength: –50

•  Audio Bitrate: 192
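The 120-frame value for Size of Coded Video Sequence is just the GOP duration multiplied by the frame rate. A minimal sketch, assuming 29.97 fps means the exact NTSC rate of 30000/1001; the function names are illustrative.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)  # the exact rate behind "29.97 fps"


def gop_frames(seconds, fps=NTSC_FPS):
    """Number of frames in a GOP of the given duration, rounded to an integer."""
    return round(seconds * fps)


def gop_seconds(frames, fps=NTSC_FPS):
    """Duration in seconds of a GOP with the given frame count."""
    return float(frames / fps)


print(gop_frames(4))     # 120
print(gop_seconds(120))  # 4.004
```

So a 120-frame GOP lasts a hair over four seconds at 29.97 fps.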

VC-1

•  Complexity: 5

•  Profile: Main

•  Number of threads: 1

•  GOP size: 120

•  VBV Buffer as above

•  Closed GOP: Off

•  Overlap Filter: On

•  Audio Codec: WMA 9.2 for Zune 320 × 240 and WMA Pro for Zune HD

•  Channels: 2

•  Sample Rate: 48 kHz

•  Bits/Sample: 16

•  Audio Bitrate: 192 Kbps

Carbon’s able to encode this quite a bit faster with its Insane settings than EEv3’s, both because it can encode all four output streams in parallel (big time saver when doing single-threaded encoding) and because MainConcept doesn’t have any modes as insanely slow as EEv3’s “RD Optimization.” See Figures 24.8 through 24.12.

Figure 24.3 H.264 settings in Expression Encoder for the iPhone and iPod touch.

image

Figure 24.4 H.264 settings in Expression Encoder for the PSP.

image

Figure 24.5 H.264 settings in Expression Encoder for the iPod classic/nano.

image

Figure 24.6 VC-1 settings in Expression Encoder for the Zune 1.

image

Figure 24.7 VC-1 settings in Expression Encoder for the Zune HD.

image

Figure 24.8 H.264 settings in Carbon Coder for the iPhone and iPod touch.

image

Figure 24.9 H.264 settings in Carbon Coder for the PSP.

image

Figure 24.10 H.264 settings in Carbon Coder for the iPod classic/nano.

image

Figure 24.11 VC-1 settings in Carbon Coder for the Zune 1.

image

Figure 24.12 VC-1 settings in Carbon Coder for the Zune HD.

image
