CHAPTER 22
Web Video

For many (most?) people, video compression equates with putting video on the web. Back during the CD-ROM epoch, worthwhile web video was “a year away” for quite a few years. But in 1998, my video compression business went from being 80 percent CD-ROM in January to 80 percent web in December. This was frustrating in some ways—we’d finally gotten CD-ROM to approach broadcast quality, so optimizing video for tiny windows and narrow bandwidths was like starting over with QuickTime 1.0 (thus starting the third iteration of the Postage Stamp Cycle – every five or so years, compression starts over at a postage stamp size. HD, CD-ROM, web, and most recently mobile. Video watches should hit around 2012). Today, web video is at least as good on average as CD-ROM video of a decade ago, and at its best, the web today delivers an HD experience impossible with the discs of old.

That people were willing to put up with postage stamp–sized video meant that web video had to offer something compelling. And it did. The web promised a searchable, immediately available universe of content. No hunting around the video store for the disc you want. No limit to how much video could be put in a given project.

We’re seeing a lot of that promise come true today, and with real business models behind them in services like Hulu and Netflix. The story of the next few years is going to be how quickly the web catches up and (perhaps hard to imagine, but we’ll be there in a few years) eventually surpasses Blu-ray in quality.

Connection Speeds on the Web

The critical factor in putting video on the web, and the limiting factor for the experience, is connection speed. Fortunately, broadband has gone from the ideal to the baseline; it’s hard to remember that we actually streamed video to modems. This is good, because modems really stink for video. Fortunately, anyone with a modem has long since given up being able to watch video on the web, and the web video industry has given up on them.

But what broadband means varies a lot by country, and within a country. Akamai released a “State of the Internet” report at the end of 2008; the portion of the Internet audience that could receive particular bitrates in a few select countries is shown in Table 22.1.

Table 22.1 Average Internet Connection Speed by Country.

                 Average Speed   Below 256 Kbps   Above 2 Mbps   Above 5 Mbps
United States    3.9 Mbps        4.8%             63%            25%
South Korea      15.2 Mbps       0.2%             94%            69%
UK               3.5 Mbps        1.6%             81%            8%
India            0.7 Mbps        26%              3.7%           0.6%
Global           1.5 Mbps        4.9%             57%            19%

The United States is pretty close to the global average, with most people able to receive at least 256 Kbps, quite a few still under 2 Mbps, and a quarter above 5 Mbps. Global broadband leader South Korea blows everyone else away with an average of 15 Mbps and 69 percent able to do 5 Mbps. The UK is heavily clustered in the 2–5 Mbps band, with few above or below that range. And India, the big country with the slowest connection, has a quarter of Internet users below 256 Kbps, and very few above 2 Mbps.

Compressionists need to understand and adapt to the global and local markets for Internet content. In South Korea, sweating 800 Kbps video would be as atavistic as tweaking your logo for the Netscape web-safe palette. But 800 Kbps may be the high-end stream in India.

And just because someone is accessing web video through a T3 line at work doesn’t mean they’re able to access video at T3 speeds (45 Mbps). Office bandwidth may be shared among dozens or hundreds of users, not to mention other functions like web and email servers that are given higher network priority than external video.

And as Internet use grows in the home, multiple people can be using the same pipe at the same time, and each can be doing multiple things with that. Even if a household is provisioned at a healthy 10 Mbps, if a couple of kids are downloading music, mom is installing system updates, and dad is installing a new World of Warcraft patch while trying to watch some video, maybe only 1 Mbps of that 10 is reliably available over the duration of the stream.

Wireless is another world as well. Some devices can get 2 Mbps on the go already, and are getting faster. But that same device might only get 40 Kbps out in the boonies or on the subway, and might get nothing at all in an elevator.

So, a decade into the web video era, we have incredible bandwidth, but it’s unevenly distributed, not just by place, but by time. Much of the challenge and craft of web video is in how we can take advantage of bandwidth when it’s there, but still provide a best-effort experience when it’s not.

Kinds of Web Video

Web video is a single phrase that can mean a lot of different things. First, we’ll break down web video into its major categories, and then talk in more detail about how it works.

Downloadable File

A downloadable file is the simplest form of video on the web. The audience uses a file transfer mechanism such as FTP or BitTorrent to download the file. No attempt is made to play it in real time.

This method is used for commercial download movie services and game trailers, but probably accounts more for pirated content than anything else. The advantage of downloadable files is that there is absolutely no expectation of real-time performance. The limit to data rate is how big a file a user is willing to download; if they’re willing to wait a few days for the best possible HD experience, that’s up to them. In an extreme example, I once spent two months downloading 110 GB of (legally provided) Nine Inch Nails concert footage via BitTorrent.

The downside is that potentially long wait, which runs rather counter to the web expectation of “Awesome! Now!” Even the services that formerly offered downloads only are adding immediate playback modes. For example, Xbox Live Marketplace was download-only originally, then added progressive download to allow the downloaded part of a video to be watched, and is adding 1080p adaptive streaming that should be available by the time you’re reading this.

Progressive Download

Progressive download delivers an experience between downloadable files and classic real-time streaming. Like downloadable files, progressive download content is served from standard web and FTP servers. The most important characteristic of progressive download is that transmission may not be real-time; it uses lossless file transfer protocols based on the Internet standard TCP (Transmission Control Protocol), typically the web standard HTTP (Hypertext Transfer Protocol) or (more rarely) the older FTP (File Transfer Protocol).

All content transferred over the web is broken up into many small packets of data. Each individual packet can take a different path from the server to the client. TCP is designed to always deliver every packet of the file (which is why you don’t ever find the middle of an email missing, even with network problems). If a packet is dropped, it is automatically retransmitted until it arrives, or until the whole file transfer process is canceled. Because of this, it is impossible to know in advance when a given packet is going to arrive, though you know it will arrive unless transfer is aborted entirely. This means immediate playback can’t be guaranteed, but video and audio quality can be.

A progressive download file can start playing while it’s partially transmitted. This means less waiting to see some video, and gives the user the ability to get a taste of the content, and the option to terminate the download, should they decide they don’t want to see the whole thing. This ability to play the first part of the video while the rest is being transmitted is the core advantage of progressive download over downloadable files.

At the start of the progressive download, the user initially sees the first video frame, movie transport controls, and a progress bar indicating how much of the file has been downloaded. As the file is downloading the user can hit play at any time, but only that portion of the file that’s been transmitted can be viewed. YouTube is probably the best-known example of a service that uses progressive download.

Most progressive download players are set to start playing automatically when enough of the file has been received that downloading will be complete before the file finishes playing. In essence, the file starts playing assuming the transmission will catch up by the end (Figure 22.1). This works well in most cases, but when a file has a radically changing data rate, a data rate spike can cause the playhead to catch up with the download, so the next frame to be played hasn’t downloaded yet. Thus playback pauses for a time to catch up. The major web video players indicate how much of the video has been downloaded by tinting the movie controller differently for the downloaded portion of the movie. As long as the playhead doesn’t catch up with the end of the gray bar, you won’t have playback problems.

Figure 22.1 A progressive download player, showing the playhead position and how much content has been buffered so far.

image

Most systems start playing a file when the remaining amount of data to be transmitted, at the current transmission rate, is less than the total duration of the file. From this, we can derive the following formula for the buffering time, that is, how long the user will have to stare at the first frame before the video starts playing. Note that connection speed is the actual download speed of the media, which could be just the 1 Mbps available out of the 10 Mbps family connection in the earlier example.

buffer time = duration × (data rate ÷ connection speed − 1)

This formula has some interesting implications:

1.  If the transmission speed is greater than the bitrate, then the buffer time is nil—a progressive download clip can start playing almost immediately in that case.

2.  As the ratio between the clip’s data rate and the connection speed changes, start delay can change radically. A 5-minute 3000 Kbps clip will:

•  Play immediately at 3000 Kbps or higher

•  Buffer 2.5 minutes at 2000 Kbps

•  Buffer 5 minutes at 1500 Kbps

•  Buffer 45 minutes at 300 Kbps
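
The buffering rule above reduces to a one-line function. Here’s a minimal sketch in Python that reproduces the numbers from the example (the function name is mine, not from any real player API):

```python
def buffer_time(duration_min, clip_kbps, connection_kbps):
    """Minutes the viewer waits before playback starts.

    Playback begins once the remaining download time equals the clip
    duration; the wait is zero when the connection is at least as fast
    as the clip's data rate.
    """
    return max(0.0, duration_min * (clip_kbps / connection_kbps - 1))

# The 5-minute, 3000 Kbps examples from the text:
print(buffer_time(5, 3000, 3000))  # 0.0  -> plays immediately
print(buffer_time(5, 3000, 2000))  # 2.5  minutes
print(buffer_time(5, 3000, 1500))  # 5.0  minutes
print(buffer_time(5, 3000, 300))   # 45.0 minutes
```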

Longer duration has a linear impact on start delay: a two-minute movie trailer with a six-minute wait is much more palatable than a two-hour movie with a six-hour wait. Which is exactly what a 4 Mbps movie would get over a 1 Mbps transfer.

Classically, you could do rapid random access within the downloaded section, but couldn’t move the playhead beyond it. In the last several years, many web servers have enabled HTTP byte-range access, and a compatible player can then start playback from any point in the file, buffering onward from there. Implementations vary; some players flush the buffer of what’s been downloaded when doing that kind of random access, so the first part of the file would need to be downloaded again. But where byte-range access is available, it addresses what was the biggest single limitation of progressive download.
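
Byte-range access is just a standard HTTP Range header. As a sketch, here’s how a player could build such a request with Python’s standard library (the URL is hypothetical, and no real player is implemented this way specifically):

```python
from urllib.request import Request

def range_request(url, start_byte, end_byte=None):
    """Build an HTTP request for part of a file, via the Range header."""
    req = Request(url)
    rng = (f"bytes={start_byte}-" if end_byte is None
           else f"bytes={start_byte}-{end_byte}")
    req.add_header("Range", rng)
    return req

# e.g. resume playback from the 10 MB mark of a (hypothetical) movie file:
req = range_request("http://example.com/movie.mp4", 10_000_000)
print(req.get_header("Range"))  # bytes=10000000-
```

A server that supports byte serving answers with a 206 Partial Content response containing only the requested bytes, which is what lets the player start decoding mid-file.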

In both modes, if the whole clip has been watched, the whole clip should have been buffered, so a second playback shouldn’t require any network access. This is a win for the user (patience can be traded for a high-quality experience) and the content provider (if someone wants to watch that dancing monkeys clip 49 times in one sitting, you only had to pay to send them the bits once).

The flip side of this historically was that the movie would be sitting right there in the browser’s cache, making it very easy for users to save it. Most of the time that’s probably a feature, not a bug; by default Windows Media Player and QuickTime Player Pro have offered a “Save As” option for progressive download content. But for content owners who want to maintain control over who watches their video and where, it’s been a concern, although different platforms have offered different ways to keep content out of the browser cache. In the end, meaningful content protection has always required real DRM, be it progressive download or streaming.

Real-Time Streaming

The other side of classic web video is real-time streaming. Its defining characteristic is, of course, that it plays in real time. No matter the duration of the file, it’ll start playing in a few seconds, and assuming stable and sufficient bandwidth, it will play all the way through whatever its duration. Each time the playhead is moved, the video will rebuffer, but once playback starts, if all goes well it shouldn’t stop until the stream is over.

My years in this industry leave me with a hollow feeling of terror at the words “if all goes well….”

Real-time streaming requires specific streaming video server software, either vendor- and protocol-specific—as with Windows Media, Flash, and RealVideo—or interoperable standards-based options like RTSP for MPEG-4 and QuickTime. Such servers are required to support the protocols used: RTMP/RTMPe for Flash, with the others based on RTSP.

The classic Real Time Streaming Protocol (RTSP) can use UDP (User Datagram Protocol) packets, not just the TCP used in progressive download and web content. The salient difference between UDP and TCP is that UDP doesn’t automatically retransmit a dropped packet. Instead, if a packet of data goes missing, it’s up to the player to notice and potentially request it again from the server. While that may sound like a bug, it can actually be a feature.

When TCP automatically retransmits dropped packets, it keeps trying until either the packets arrive or TCP gives up on the whole transfer. And due to the structure of TCP, packets from later in the file than the missing packet aren’t available to the receiving application until the missing packet arrives.

So, when using TCP over a slow and lossy network, you can get into cases where some packets get dropped, and their retransmission uses additional bandwidth, crowding out new packets and further increasing the number of dropped packets. And nothing after the dropped packets is playable. And when the playhead gets to the point of the first dropped packet? Wham, no more video, and perhaps the dreaded “…buffering…buffering…” messages of yore.

With the classic UDP streaming model, it’s up to the player to figure out when and if to request dropped packets be retransmitted. And when a packet is associated with a frame that needs to play in 100 ms and the average transmission time is 200 ms, the player knows to just not re-request, and trust the decoder will be able to display something even without what was in that packet (for an example of what that looks like, see Color Figure 22.2). Coupled with buffers in the server and player to give time for retransmission requests to be made, and to let bitrates average out a bit, UDP promised to tame the wild packets of the consumer Internet into a decent video experience.
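
That re-request decision comes down to a deadline check. This is a deliberately simplified sketch of the logic described above (real players weigh jitter, buffer depth, and more):

```python
def should_rerequest(ms_until_playback, round_trip_ms):
    """A UDP player's call: re-request a lost packet only if the
    replacement can plausibly arrive before its frame must display."""
    return round_trip_ms < ms_until_playback

# The example from the text: a frame due in 100 ms with a 200 ms
# average transmission time isn't worth re-requesting.
print(should_rerequest(100, 200))  # False -> let the decoder conceal the loss
print(should_rerequest(500, 200))  # True  -> worth asking again
```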

Figure 22.2 A very simple comparison of unicast to multicast. Note that it just takes one nonmulticast router to eliminate much of the value of multicast.

image

That was the vision. And it worked pretty well. But lossy, high-latency connections like those of 56 Kbps modems are the exception, not the rule these days. And UDP actually making it from server to player could never be counted on outside of organizations’ internal networks.

Real-time streaming solutions needed to fall back to TCP or even HTTP transmission when passing through firewalls or routers that don’t support UDP. And once a streaming technology has to work over HTTP (and it will for a big chunk of users), the practical difference between real-time streaming and progressive download starts to blur. Originally, progressive download always transmitted the entire file from start to finish, while streaming servers offered random access and bandwidth negotiation, even over HTTP. But web servers that support byte serving can provide random access as well now.

The key differentiated features between progressive download and streaming today are MBR, live, and multicast.

Multiple bitrates for real-time streaming

One streaming feature of Windows Media and RealMedia is Multiple BitRate encoding (MBR). The content is encoded at multiple bitrates, and the player dynamically switches to the highest bitrate playable at the available transmission speed. For example, with an MBR file encoded with 200, 400, and 800 Kbps streams, a user connecting at 700 Kbps would get the 400 Kbps stream, and one at 350 Kbps would get the 200 Kbps stream.
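
The selection rule is simply “highest encoded bitrate that fits.” A minimal sketch (the function is illustrative, not any vendor’s actual API):

```python
def pick_stream(encoded_kbps, measured_kbps):
    """Choose the highest encoded bitrate that fits the measured
    bandwidth; fall back to the lowest stream if nothing fits."""
    playable = [b for b in sorted(encoded_kbps) if b <= measured_kbps]
    return playable[-1] if playable else min(encoded_kbps)

streams = [200, 400, 800]
print(pick_stream(streams, 700))  # 400
print(pick_stream(streams, 350))  # 200
```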

There are two big challenges with the classic MBR model.

First off, a real-time protocol had no way to know how much extra bandwidth was available in order to increase bitrate. Knowing when to throttle bitrate down is easy; when only 300 Kbps of data is arriving out of a 400 Kbps stream, the actual bandwidth is 300 Kbps. But if data is sent continuously, there’s no way to measure how much extra bandwidth could be used when the stream is going well. In practice, MBR tends to switch down if necessary but may never switch back up again. That might be okay for short content, but it’s not a good experience if a two-hour movie keeps looking worse the longer it plays.

Second, there was always a pause at stream switching, since the new bitstream acted just like a random access seek. For on-demand content, this meant a delay in playback; for live, it meant several seconds of video were simply never seen by the user.

That said, MBR was still a lot better than being stuck with a lowest-common-denominator stream in which every user is limited to the minimum supported bitrate. It was a great step forward, but never really paid off in its classic implementation.

The most successful streaming MBR implementation today is Flash Media Server’s Dynamic Streaming, discussed later in this chapter and in Chapter 26.

Webcasting

Because progressive download can’t control time, just quality, it’s not capable of live broadcasting (also called webcasting). Webcasting started as a difficult, finicky, and low-quality experience. But, in the inevitable demonstration of Moore’s Law, improved hardware and software brought webcasting to maturity in rapid fashion. Faster processors capable of running more efficient codecs, plus the general increase in available bandwidth, have made a huge difference. Today live streaming is capable of matching the quality of good SD broadcast TV around 1500–2500 Kbps.

Multicasting

With progressive download and non-UDP streaming, each user is sent their own copy (a “unicast”) of the content. If 1,000 viewers are watching a stream, 1,000 streams need to be served; with 1,000,000 viewers, 1,000,000 streams, with 1,000 times more bandwidth to provision and pay for. Thus unicast can prevent streaming from developing the economies of scale of broadcast. And if more users try to access the content than there is server and network capacity, the experience can get bad quickly. Being too popular was never a problem in television, but it can be, and has been, with streaming.

Enter multicast stage right, coming to save the day (Figure 22.2). With the IP multicast protocol, only one copy of any given packet needs to be sent to any given router, no matter how many users on the other side of the router are requesting it. That router then sends a single copy of each packet to each additional router that is subscribed to the IP multicast. This should dramatically reduce the bandwidth requirements of live streaming as the number of users scales up.

However, for multicasting to work as advertised, every router between server and client needs to be multicast-enabled. Even though modern routers support multicasting, many older routers don’t, or if they do, their operators don’t have multicasting turned on. In particular, many ISPs and corporate networks don’t allow multicast traffic into their networks. And it only takes one nonmulticast router between server and player to block multicasting for every client connecting through that router.

In most countries, multicast isn’t reliably available for delivering to consumers (although the UK is making great progress to universal multicast). It’s really only used in enterprise LANs and WANs. But it really shines there. Without multicast, it’s impossible to deliver corporate events to tens of thousands of desktops; the unicast bitrates overwhelm the internal routers. Even if a building has 2 Gbps of bandwidth, 1,000 people trying to watch 2500 Kbps live video isn’t possible (2.5 Gbps!), let alone all the other data that might be flowing on that pipe. But with multicast, only a single instance of the stream hits the building, and 5,000 people could watch it with little impact.
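
The arithmetic behind that example is worth making explicit; a quick sketch:

```python
def unicast_gbps(viewers, stream_kbps):
    """Aggregate bandwidth if every viewer gets their own unicast copy."""
    return viewers * stream_kbps / 1_000_000  # Kbps -> Gbps

# 1,000 viewers of a 2500 Kbps stream need 2.5 Gbps of unicast,
# more than a 2 Gbps building pipe. Multicast needs one stream's worth.
print(unicast_gbps(1000, 2500))  # 2.5
print(unicast_gbps(1, 2500))     # 0.0025 -- the multicast load
```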

The biggest drawback of multicasting is that it doesn’t support on-demand; multicasting requires users to be receiving the same content at the same time.

Even simple things like pausing don’t work in most multicast implementations. Other features might not be available, like retransmission of dropped packets.

Peer-to-Peer

Peer-to-peer (P2P) is sometimes presented as the savior of streaming. The core idea is obvious enough: since users all have upstream and downstream bandwidth, why not have each user send bits on to another user instead of requiring everyone to pull unicast from the source?

Nice idea in theory. But the simple version of this doesn’t work well in practice; lots of people are behind firewalls or NAT (network address translation) routers that prevent computers from easily talking to each other. Fully peer-to-peer systems like BitTorrent may work well for files, since users without good access just get slow speeds. But once you’ve got a real-time playback requirement, a greater percentage of bits has to come from dedicated servers instead of from peers.

So, P2P can pay off in improved costs and quality by adding some new places for content to come from. But it’s not a revolutionary improvement for real-time media delivery; in real-world applications, perhaps only 50 percent of bits will come from peers, which may not even offset the costs coming from the greater complexity of P2P systems, and getting users to install P2P clients.

Adaptive Streaming

The new game in the internet video town is adaptive streaming. From a high level it can sound like a complex mishmash of progressive download, streaming, and general oddness. But as strange as it may sound in theory, it’s proving incredibly powerful in practice.

My Microsoft colleague Alex Zambelli describes adaptive streaming as “streaming adaptive for the web instead of trying to adapt the web to work with streaming.”

Adaptive streaming can be defined in a number of ways, but for me it has three salient characteristics:

•  Content is delivered via HTTP as a series of small files.

•  Seamless bitrate switching is enabled via switching between different series of those small files at different bitrates.

•  The files are small enough to be cached by the existing proxy cache ecosystem for scalability.

The first company to put these features together was the pioneering Move Networks. In their original model, they encoded the same content in multiple bitrates (over a dozen for HD) in three-second Closed GOP chunks. So there’s a new file every three seconds for every bitrate.

On the client side, they have a browser plug-in that decides which bitrate to pull the next chunk from. And since these are being read as small files, not as a continuous stream, the problem of measuring available bandwidth gets solved by just measuring the download time per chunk. If a 300 Kbps three-second chunk downloads in one second, then there’s about 900 Kbps of bandwidth that moment. By always keeping the next few chunks in the buffer, a drop in bandwidth can be detected soon enough to start requesting at a lower bitrate.
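
Because each chunk is a discrete download, bandwidth estimation is just arithmetic on the transfer time. A minimal sketch of that measurement:

```python
def measured_kbps(chunk_kbps, chunk_seconds, download_seconds):
    """Estimate available bandwidth from how fast the last chunk arrived."""
    chunk_kbits = chunk_kbps * chunk_seconds  # total bits in the chunk
    return chunk_kbits / download_seconds

# The example from the text: a 300 Kbps, three-second chunk that
# downloads in one second implies about 900 Kbps of bandwidth.
print(measured_kbps(300, 3, 1))  # 900.0
```

A real client smooths this estimate over several chunks before switching bitrates, since a single measurement can be noisy.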

The short GOPs of fixed duration are the next innovation. Since they’re the same duration at each data rate, every bitrate starts a new Closed GOP at the same time. And as they’re Closed GOPs, each is independently decodable without reference to any other chunks.

Thus, switching between bitrates is just a matter of feeding the video bitstreams continuously to the decoder, appending those demuxed from each chunk. As long as the decoder can handle resolution switches on the fly, there’s not a moment’s hesitation when switching between bandwidths.

This can make startup and random access time a lot faster as well, since a lower bitrate can be requested to start quickly, and then ramped up to the full bitrate.

So, the net gain is that adaptive streaming offers users a welcome hybrid of the best of progressive download and real-time streaming:

•  HTTP means it doesn’t get stopped by firewalls.

•  MBR encoding and stream-switching with no pauses.

•  Accurate measurement of available bandwidth for optimum bitrate selection.

•  Nearly immediate startup and random access time.

•  The “just works” of progressive download with the “no wait” of streaming.

There’s one last facet to adaptive streaming that neither progressive download nor real-time streaming had: proxy caching. It’s not something non-sysadmins may think much about, but tons of web content isn’t delivered from the origin web server each time. Instead, popular content gets cached at various intermediate places throughout the network. The goal is to save bandwidth; if an ISP or office has 1,000 people all reading CNN at the same time, it’s wasteful to keep pulling in that same PNG of the CNN logo over and over again. So a special kind of server called a proxy server or proxy cache is placed between the users and inbound traffic. It tracks and temporarily stores files requested by every computer on the network. If there’s a second request for the same file, it gets served from the proxy cache. And everyone wins:

•  The user gets the file faster from the cache than if it had come from the original server.

•  The ISP doesn’t have to pay for the inbound bandwidth.

•  The original web site doesn’t have to pay for as much outbound bandwidth.

There’s a maximum size for files to be cached, however, and it’s almost always smaller than video files. But with adaptive streaming, files are only a few seconds each, and so easily fit inside the caches.
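
To see why chunking makes the difference, compare the sizes involved (the cache limit shown is illustrative; actual limits vary by proxy configuration):

```python
def chunk_megabytes(bitrate_kbps, seconds):
    """Size of a media object in MB (using decimal units: 8000 Kbits/MB)."""
    return bitrate_kbps * seconds / 8 / 1000

# A three-second chunk of a 3000 Kbps stream is about 1.1 MB and fits
# comfortably under typical proxy-cache object limits; the same stream
# as a single two-hour file is around 2700 MB and never gets cached.
print(chunk_megabytes(3000, 3))     # 1.125
print(chunk_megabytes(3000, 7200))  # 2700.0
```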

Thus, lots of people watching the same video at the same time, or even within a few hours of each other, can be watching from the proxy cache. This provides some of the scalability of multicast, but without having to be live, and without needing special router configuration. It’s just part of the web infrastructure already. Pretty much every ISP, business, school, government office, and other organization is going to have a proxy cache already installed and running.

And those caches aren’t just at the edge. The content delivery networks (CDNs) all use proxy caches inside their own network to speed delivery. So adaptive streaming works very well for them, particularly compared with having to install and maintain specific servers for each format at potentially thousands of locations. (For more about CDNs, see the “Hosting” section later in this chapter.)

The general experience with proxy caching is that the more popular content gets, the fewer bits per viewer are pulled from the origin server or even the CDN. This restores some of the broadcast model of unexpectedly popular content being a cause for celebration rather than crisis. Since fewer bytes per user need to leave the CDN in the first place, bigger audiences should lower per-user bandwidth costs.

So, is adaptive streaming the holy grail? Not quite. It’s a leap forward for many scenarios, but it has some limitations:

•  Companies don’t typically have internal proxy caches, so WAN broadcast still requires multicast.

•  An intelligent client is required. There’s a lot of heuristics involved in figuring out the right chunk to grab next. Adaptive streaming hasn’t been built into off-the-shelf devices and operating systems. It either requires a custom app, as on the iPhone, or a platform that allows custom apps, like Silverlight.

•  There can be a huge number of files involved. A decent-sized movie library in multiple bitrates can get to a quarter-billion files pretty quickly. This becomes extremely challenging to maintain, validate, and transport.

The future of adaptive streaming

There’s a lot of innovation in adaptive streaming these days, with new implementations coming from Microsoft in Silverlight and Apple for the iPhone and QuickTime X. One trend we’re seeing is single files per bitrate on the server, which are divided into chunks only as they’re requested. So, that hour-long video could only be eight files instead of 24,000. Since most content in a large library isn’t being viewed at any given time, this avoids the management cost of incredible numbers of small files, while still offering all the scalability.

Is Flash Dynamic Streaming Adaptive Streaming?

Flash’s Dynamic Streaming is a bit of an odd duck in our taxonomy here. It doesn’t support UDP, just TCP and HTTP, so it can’t do multicast. It has multiple bitrates, but they’re continuous streams, so it doesn’t leverage proxy caches. It is capable of gapless or nearly gapless bitrate switching.

Overall, I’d say RTMP may be best thought of as the culmination of the classic proprietary-protocol MBR stream-switching architecture, working better than anything had before—but without the increased scalability.

What About Videoconferencing?

Streaming formats—even when doing live broadcasting—introduce latency in the broadcast. This means that web broadcasters are simply not viable for videoconferencing because there will typically be a delay of several seconds. Even the lowest latency web protocols like Adobe’s RTMP still have an uncomfortable amount of end-to-end delay for easy conversation; it’s more like a satellite phone conversation on CNN. The web formats use this latency to optimize quality, which is one reason why they look better than videoconferencing at the same data rate.

Videoconferencing requires videoconferencing products, which are generally based around the H.323 standard.

Hosting

Web video, by definition, needs to live on a server somewhere. What that server is and where it should live vary a lot depending on the task at hand.

Server type is simple to define. Progressive download lives on a normal web server, and real-time streaming uses an appropriate server that supports the format, such as Windows Media Services running on Windows Server for Windows Media, and Flash Media Server for Flash. Adaptive streaming that uses discrete chunks may just need a web server, or a specific server may be required to handle chunk requests if large single files are used on the server. For QuickTime and MPEG-4, there are a variety of servers to choose among, although for both formats many of them are based on Apple’s open-source Darwin Streaming Server.

The biggest question is where to put your video: on an in-house server, a hosting service, or on your own server in a co-location service? In general, you want your files to be as close to the viewer as possible.

In-House Hosting

Having your media server inside your facility makes most sense for content that is mainly going to be viewed on an internal network. In-house hosting only makes sense for providing content outside your network on extremely small or large scales. Because most businesses are provisioned with fixed bandwidth, your simultaneous number of users is limited to how much bandwidth you have. If you want to handle a high peak number of users, you’d need to buy far more bandwidth than you’d use at nonpeak times. If you’re Google or Microsoft, this might make sense.

But otherwise, external-facing web content should never be on an organization’s normal Internet connection.

Hosting Services

With a hosting service, you rent space on their servers instead of having to provide or configure servers yourself. Hosting services are the easiest and cheapest way to start providing media to the public Internet. Hosting services are also much easier to manage, and can provide scalability as your bandwidth usage goes up or down. You are typically billed for how much bandwidth you use in a month, so huge peaks and valleys are much less of a problem.

High-end hosting services like Akamai, Level 3, and Limelight describe themselves as Content Delivery Networks – CDNs. A CDN isn’t just a single server, but a network of servers. The connections between the servers are high-speed, high-quality, and multicast-enabled. The idea is to reduce the distance packets need to travel on the public Internet by putting edge servers as near to where customers are (a.k.a. all around the “edge” of the Internet) as possible. For multicasting, this means that multiple streams only need to be distributed from each local server. For static content, files can be cached at the local server, so they would only need to be transmitted once from the center. CDNs will deliver both the best total cost and the best user experience for the vast majority of projects and businesses publishing outside their own networks.

Co-location

A co-location facility is a hybrid between in-house and hosting services. You provide your own server or servers, but put them in a dedicated facility with enormous bandwidth.

Co-location makes sense when you have enough volume that it’s worth the cost and complexity of managing your own servers. Because you provide more management, co-location is typically somewhat cheaper per bit transmitted than a hosting service. The biggest drawback to co-location is that you need to physically add more machines to gain scalability.

Tutorial: Flexible MPEG-4 For Progressive Download

Scenario

Our company has a large library of training videos for our extensive line of home gardening products. We’d like to get these published to the web. These are sold globally by a variety of affiliates and independent distributors, so we want to be able to provide files to our partners that they can host easily and use in a variety of media players.

The Three Questions

What Is My Content?

The content is mainly marketing and training materials for a variety of products. There’s a lot of older content produced on BetaSP and DV 480i 4:3, with a smattering of content produced by a European division in 576i25. Newer content is HD 16:9, produced in a mix of 24p, 30p, and 30i (whatever the camera operator felt like that day, apparently). The clips are pretty short, ranging from 2–10 minutes.

Who Is My Audience?

We really have two audiences. The initial audience is the many local companies who are going to use these files. Some of them have standardized on different media players, including Flash, Silverlight, and QuickTime. The final audience is the people around the world who will be watching the clips.

What Are My Communication Goals?

We’re making these clips available to help our affiliates sell more products, and to improve customer satisfaction and reduce the cost of telephone support by providing visual demonstration of product use. We’ll also save some money by having to buy and ship less physical media.

We want these files to “just work” in a wide variety of scenarios and players, over a reasonable variety of bandwidths. The video and audio quality needs to make the content clearly understandable, and to generally be attractive and clean, reflecting our brand.

We want random access to be sprightly so users can easily scrub around the tutorial clips while reviewing a how-to procedure.

These files could be used for years to come, so we want something that should be compatible with commonly available players in the future. We also want to define a relatively bulletproof process that won’t require a lot of per-clip fiddling. We’ve got a whole lot of files to publish!

Tech Specs

Given that these are short clips and we don’t have much knowledge or control over our affiliates’ hosting, progressive download is an obvious choice. That way we can guarantee quality of playback, if not buffering time.

MPEG-4 files with H.264 video and AAC-LC audio are compatible with Flash, Silverlight, QuickTime, and Windows Media Player in Windows 7, and should be compatible going forward.

Our content can be pretty complex. By definition, it’s full of foliage, and a handheld shot of an electric pruner going crazy on a bush can take a lot of bits to look decent. We’re going to need more bits/pixel than average. For easy progressive download playback, we’re going to target 1000 Kbps total, allocating 96 Kbps to audio at 44.1 kHz stereo. While we might have used 128 Kbps for content with more music, our soundtracks aren’t central to the experience; we mainly need the voiceover to be clear and any short stings of music to not be distractingly bad. That leaves 904 Kbps for video. We’ll use a total peak bitrate of 1500 Kbps to keep progressive playback smooth and decode complexity constrained, yielding a 1404 Kbps peak video bitrate, assuming CBR audio.
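The bitrate budget is simple subtraction; here it is as a quick sketch in plain Python, using the numbers from this tutorial:

```python
# Bitrate budget for progressive download (all rates in Kbps).
total_average = 1000   # target for easy progressive playback
total_peak = 1500      # keeps decode complexity constrained
audio = 96             # CBR AAC-LC at 44.1 kHz stereo

# Because the audio is CBR, it subtracts out of both averages and peaks.
video_average = total_average - audio   # 904 Kbps
video_peak = total_peak - audio         # 1404 Kbps

print(video_average, video_peak)  # 904 1404
```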

For easy embedding, we’ll stick with a maximum 360-line height, so 16:9 will come out as 640 × 360 and 4:3 as 480 × 360. It always bugs me when 4:3 gets 640 × 480 and 16:9 gets only 640 × 360; the 16:9 source is generally of higher quality, and so can handle more pixels per bit, not less. And yes, we could have done Mod16 (640 × 352 and 464 × 352), but these values are at least Mod8, and let us have perfect 16:9 and 4:3 aspect ratios.
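A quick way to sanity-check frame sizes against these alignment rules (a minimal sketch in plain Python; the sizes are the ones discussed above):

```python
def mod_level(width, height):
    """Return the largest of 16, 8, or 4 that divides both dimensions, else 2."""
    for m in (16, 8, 4):
        if width % m == 0 and height % m == 0:
            return m
    return 2

# Our chosen sizes are Mod8 with perfect aspect ratios...
assert mod_level(640, 360) == 8   # 16:9 exactly
assert mod_level(480, 360) == 8   # 4:3 exactly
# ...while the Mod16 alternatives give up a little aspect-ratio accuracy.
assert mod_level(640, 352) == 16
assert mod_level(464, 352) == 16
```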

For preprocessing, we’ll crop eight pixels left/right in the 4:3 setting; those sources generally include at least some content originally from BetaSP with horizontal blanking we need to get rid of. Eight left/right is still well outside of the motion-safe area, so we won’t lose any critical information. The HD sources are all digital and full-raster, so no deinterlacing is needed there.

We’ll have frame rate match the source up to 30p; 24p, 25i, 30i, and 30p stay the same.

For good random access, we’re going to use four-second GOPs with two B-frames. With 30p source, that’ll give us 120 frames per GOP, of which 80 are B-frames, so the worst-case number of reference frames would be 40 in any GOP.
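That GOP arithmetic can be double-checked quickly (plain Python, numbers from the paragraph above):

```python
fps = 30          # worst-case frame rate in our mix of sources
gop_seconds = 4
b_frames = 2      # B-frames between each pair of reference frames

gop_frames = fps * gop_seconds   # 120 frames per GOP

# With 2 B-frames per reference frame, 2 of every 3 frames are B-frames.
gop_b = gop_frames * b_frames // (b_frames + 1)   # 80 B-frames
gop_refs = gop_frames - gop_b                     # 40 I/P (reference) frames
```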

For H.264, our choices are Baseline or High Profile. High Profile is compatible with all the web players, but not with most devices. With careful encoding, we could try to make iPod-spec .mp4 files so they could be played there as well. However, there’s a significant efficiency hit in that. In order to keep quality up at our bitrate we’ll use High Profile. If a user wants to watch on a device, these files are short and will transcode quickly as-is.

We’ve got quite a lot of content to encode, so we’re going to tune for a good balance between quality and speed.

Because our source varies in aspect ratio, frame rate, and interlaced/progressive, we want settings that adapt to the source without making us burn a bunch of time on per-file tweaking. Episode, Expression Encoder, and Compressor all have adequate adaptive preprocessing to let us have one preset for 4:3 480i and another for all the 16:9 sources.

Settings in Episode

We’ll start with the H.264_480_360 and H.264_Widescreen_640_360 presets, with the following changes (Figure 22.3):

•  Output format to .mp4 from .mov (.mov would have worked in our software players, but .mp4 is a little more broadly supported)

•  H.264 General

•  VBR using Peak Rate

•  Peak Rate: 1404

•  Average rate: 904

•  Natural and Force Keyframes with 120 keyframe distance (4 sec @ 30p)

•  Number of reference frames: 4 (a little faster, and just as good with video source)

•  Number of B-frames: 2

•  H.264 Profile and Quality

•  Encoding Profile: High

•  Entropy Coding: CABAC (with 1.5 Mbps peak, decode complexity should be okay)

•  Display aspect ratio: 1:1 (square-pixel)

•  2-pass interval: 500 frames (best variability)

•  Encoding speed versus quality: 50 (much faster and nearly as good)

•  Frame Rate

•  Upper limit of 30

•  Deinterlace

•  Create New Fields By: Edge Detecting Interpolation Heavy (best quality outside of the very slow Motion Compensation mode)

•  Automatic mode for the 16:9 preset as well, since it’ll turn off automatically with progressive source

•  Resize Advanced

•  (16:9) Preprocessing: Lowpass for large downscales (the 4:3 sources are all SD, so it doesn’t matter there)

•  Audio

•  Bitrate to 96 Kbps

•  Volume: Normalize
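Episode is a GUI tool, but if you need to reproduce these settings elsewhere, they map roughly onto x264-style ffmpeg flags. The mapping below is my own approximation, not part of the tutorial workflow, and the exact flag names assume a reasonably recent ffmpeg build with libx264; building the command in Python keeps the mapping explicit:

```python
# Hypothetical mapping of the Episode settings above onto ffmpeg/x264 flags.
def episode_like_args(src, dst, width, height):
    """Build an ffmpeg command list approximating the Episode preset."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-profile:v", "high",     # Encoding Profile: High (CABAC is the default)
        "-b:v", "904k",           # average video bitrate
        "-maxrate", "1404k",      # peak video bitrate
        "-bufsize", "2808k",      # assumed buffer: about 2 seconds at peak
        "-g", "120",              # 4-second GOP at 30p
        "-bf", "2",               # 2 B-frames
        "-refs", "4",             # 4 reference frames
        "-vf", f"scale={width}:{height}",
        "-c:a", "aac", "-b:a", "96k", "-ar", "44100",
        dst,
    ]

cmd = episode_like_args("pruner_demo.mov", "pruner_demo.mp4", 640, 360)
```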

Settings in Expression Encoder

We can start with the H.264 Broadband VBR preset. EEv3 is Main Profile only, but supports CABAC and multiple reference frames at least.

•  Key Frame Interval to 4

•  Average bitrate to 904

•  Video

•  Peak bitrate to 1404

•  4:3

•  – Width 480, Height 360

•  – Video Aspect: 4:3

•  16:9

•  – Width 640, Height 360

•  – Video Aspect 16:9

•  H.264 Settings

•  Audio

•  Bitrate: 96 Kbps

•  Enhance

•  (4:3) Crop Left 8, Width 704, Top 0, Height 480

•  Volume Leveling: On (Normalization in Expression Encoder)

Settings in Compressor

Unfortunately, Compressor doesn’t provide full access to H.264 features when going to an MP4 file. So we’ll have to create a .mov and use a batch utility to remux to .mp4 afterward. It’s a relatively simple AppleScript for those so inclined.
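For those not so inclined, the batch remux can also be sketched in Python. This assumes a stream-copy capable tool such as ffmpeg is on the path (my assumption, not something Compressor provides); the function only builds the commands, so you can inspect them before running anything:

```python
import pathlib
import subprocess

def remux_commands(folder):
    """Build one stream-copy remux command per .mov file (no re-encode)."""
    cmds = []
    for mov in sorted(pathlib.Path(folder).glob("*.mov")):
        mp4 = mov.with_suffix(".mp4")
        # -c copy rewraps the existing H.264/AAC streams into an .mp4 container.
        cmds.append(["ffmpeg", "-i", str(mov), "-c", "copy", str(mp4)])
    return cmds

# To actually run the batch:
# for cmd in remux_commands("compressor_output"):
#     subprocess.run(cmd, check=True)
```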

Compressor can’t do High Profile, so this will be Main Profile only. Also, we can’t set multiple reference frames or the B-frames value. However, QuickTime doesn’t do CABAC either, so decoding will be easier, hopefully compensating for the extra reference frames.

We can start with the H.264 LAN preset, with these modifications (Figure 22.5):

•  Video

•  Key Frames every 120 frames

•  Data Rate: Restrict to 904

•  We don’t have any explicit peak buffer control; we’ll leave it on Download and hope it comes out with a reasonable peak.

•  We need multipass on to get this option

•  By using Frame Reordering we’ll get 1 B-frame

•  Audio

•  Rate: 44.1 kHz

•  Quality: Best (it’s really not meaningfully slower)

•  Target Bit Rate: 96 Kbps

•  Stick with average bitrate mode as this is progressive download

•  Filters

•  Audio: Dynamic Range On (This is Compressor’s Normalize filter. The defaults are nice, and allow greater tuning. The noise floor is particularly useful with noisy source, but needs to be calibrated per-source.)

•  Geometry 4:3

•  Crop Left and Right of 8

•  Frame Size 480 × 360 (Custom 4:3)

•  Geometry 16:9

•  Frame Size 640 × 360 (Custom 16:9)

Figure 22.3 A, B, C, and D Our settings in Episode. Watch out for peak bitrate being on top of average.

image

Figure 22.4 A, B, and C Our settings in Expression Encoder. Make sure that you still have Top at 0 and Height at 480 after setting the 4:3 crop.

image

Figure 22.5 A, B, C, D, and E Our settings in Compressor. I sure wish I could get to the advanced features targeting .mp4.

image
