CHAPTER 19
Ogg

The Xiph.Org Foundation’s Ogg format and codecs are an attempt to make competitive public-domain media technologies that are unencumbered by patents and therefore broadly usable in open source products. The project started with audio and has added video in recent years.

The Ogg codecs haven’t become popular enough to be included in that many mainstream products so far, but the lack of any licensing requirements means free software updates are broadly available to add Ogg support. That said, Xiph’s assertions as to the patent-free nature of their codecs haven’t been proved, nor is that kind of proof really possible. They’ve had no trouble so far.

Why Ogg?

Avoid Licensing Costs

The essential point of the Ogg formats is that they are free for everyone to implement and play.

Preference for a “Free” Format

There are markets with a preference for public-domain formats like Ogg. The most notable is Wikimedia, owner of Wikipedia, which has supported Ogg formats for some time and has announced that they will be dramatically expanding that use.

Native Embedding in Firefox and Chrome

Firefox, Opera, and Google’s Chrome have added Ogg video and audio support, with native support of the proposed <video> tag in current HTML5 drafts.

Why Not Ogg?

Lower Compression Efficiency

The Ogg codecs currently require at least twice as many bits/second as the best video and audio codecs available. This can increase bandwidth costs and reduce reach when higher resolutions, quality, and durations are used.
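To put the bandwidth point in concrete terms, here is a rough sketch of the arithmetic. The bitrates, clip length, and view count are invented for illustration; only the "twice as many bits" ratio comes from the text above.

```python
def monthly_transfer_gb(bitrate_kbps, duration_s, views):
    """Total data served, in decimal gigabytes, for a clip at a given bitrate."""
    total_bits = bitrate_kbps * 1000 * duration_s * views
    return total_bits / 8 / 1e9

# Hypothetical numbers: a 225-second clip viewed 100,000 times a month.
efficient_gb = monthly_transfer_gb(500, 225, 100_000)   # a more efficient codec
doubled_gb = monthly_transfer_gb(1000, 225, 100_000)    # same quality at twice the bits

print(f"{efficient_gb:.2f} GB vs {doubled_gb:.2f} GB")
```

Whatever the price per gigabyte is, doubling the bitrate doubles the bill for the same audience.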

Not Broadly Supported

While Ogg is designed to be broadly supported, so far the MPEG-LA licensed codecs dominate video playback, with similarly royalty-bearing audio codecs also broadly used. The Ogg formats haven’t crossed the chicken-and-egg threshold of broad use driving broad support, particularly in devices. While some devices support Ogg Vorbis, none support Ogg Theora yet. It will be hard to get Theora support in devices unless support is added to ASICs.

This may or may not change on the desktop; despite Firefox and Google’s forthcoming support, neither Apple nor Microsoft has indicated any plans for support, and their browsers dominate the Mac and Windows systems, respectively. Of course, it’s easy to download software players for PCs.

HTML5 itself doesn’t currently mandate any particular codecs, and is likely some years away from being finalized.

Ogg File Format

The Ogg format was created along with the Ogg Vorbis format, competing with the MP3 file format. It’s rather basic, but it works. One notable lack is a formal metadata mechanism; metadata instead needs to be included in the codec bitstreams themselves.

OGV

Ogg was originally designed as an audio format, so it required some tweaks to the specification to add video support. The resulting files use the .ogv extension.

OGM

Early efforts to put video in Ogg by non-Xiph programmers resulted in another format called OGM. This has been deprecated, although some of those older files are still rolling around.

MKV

The Matroska format (named after the Russian nesting dolls) is also able to contain the Ogg codecs. It offers much deeper metadata support than native Ogg, particularly for multiple language audio and captioning.

Ogg Vorbis

Ogg Vorbis was created as a patent-free, royalty-free, open-source audio codec as an alternative to MP3. The project was initiated in 1993, the bitstream frozen in 2000, and Version 1.0 was released in 2002. Overall, Vorbis is more efficient than MP3 and in the general ballpark of WMA 9.2 and AAC-LC in compression efficiency, and thus competitive for music recording. One factor complicating comparisons is that Vorbis is normally used in VBR mode, while other codecs default to CBR. While Vorbis is capable of CBR and ABR modes, their use is not recommended. So a VBR encode at the same average data rate is probably a better comparison point.

The mainline Vorbis version has improved in recent years, but for most of that time the highest-quality encoder implementation has been the current release of AoTuV (short for Aoyumi’s Tuned Vorbis). Xiph has merged AoTuV improvements back into Vorbis in the past, and is expected to do so again.

Ogg Speex

Ogg Speex is a low-bitrate speech codec using the same Code-Excited Linear Prediction (CELP) model seen elsewhere. Speex is more flexible than many other voice codecs, supporting ABR and VBR encoding, bitrates up to 44 Kbps, and sample rates up to 32 KHz. It’s most commonly used in videoconferencing, and is less seen in files. It is included in Flash 10, including in flv files.

Ogg FLAC

FLAC is the Free Lossless Audio Codec, certainly the most straightforward name for anything in Ogg. It was released in 2001, and officially joined the Ogg family in 2003.

It is what it sounds like: a free, lossless audio codec, comparable in function to Apple Lossless from QuickTime or WMA Lossless from Windows Media.

Lossless codecs are pretty much all the same, being lossless and all. The only real difference is in speed and compression efficiency, and compression efficiency doesn’t vary by more than 10 percent or so between implementations for typical content; about 50 percent compression is typical for music, and up to 75 percent for soundtracks. This is quite a bit better than just zipping up a .wav file.
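As a quick worked example of those ratios, the sketch below computes the uncompressed PCM data rate and applies the typical savings quoted above. The 5.1 soundtrack parameters (48 kHz, 24-bit, six channels) are an assumption chosen for illustration.

```python
def pcm_bitrate_bps(sample_rate_hz, bit_depth, channels):
    """Data rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels

def lossless_bitrate_bps(pcm_bps, compression):
    """Data rate after lossless coding; compression is the fraction removed."""
    return pcm_bps * (1 - compression)

# CD-quality stereo music at the typical 50 percent compression.
cd_pcm = pcm_bitrate_bps(44_100, 16, 2)
cd_lossless = lossless_bitrate_bps(cd_pcm, 0.50)

# Assumed 5.1 soundtrack at the best-case 75 percent compression.
track_pcm = pcm_bitrate_bps(48_000, 24, 6)
track_lossless = lossless_bitrate_bps(track_pcm, 0.75)

print(cd_pcm, cd_lossless, track_pcm, track_lossless)
```

Even at 50 percent, CD-quality music still needs roughly 700 Kbps, several times what a lossy codec would use for the same track.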

FLAC is extremely flexible, and can handle nearly any integer bit depth and up to 7.1 channels. It is also quite fast to encode and decode.

Being lossless, the only real thing to configure with FLAC is how much CPU time to spend on improving compression efficiency. The difference in size is small and the difference in encode time is big, but if the file is likely to be broadly distributed, you can use the --best flag for the slowest, highest-compression setting that’s realistic to use, or even add -e, a really slow exhaustive mode that could improve compression by possibly another half percent past --best.
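A minimal sketch of how those two choices might be assembled for the flac command-line encoder follows. It only builds and prints the argument list; the file name master.wav is hypothetical, and actually running the command assumes flac is installed.

```python
import shlex

def flac_command(source, exhaustive=False):
    """Build a flac invocation using --best, optionally adding the slow -e search."""
    args = ["flac", "--best"]
    if exhaustive:
        args.append("-e")  # exhaustive model search: much slower, slightly smaller
    args.append(source)
    return args

# Hypothetical input file; pass the list to subprocess.run() to actually encode.
print(shlex.join(flac_command("master.wav")))
print(shlex.join(flac_command("master.wav", exhaustive=True)))
```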

I’m not a fan of lossless compression for distribution; once the content is perceptually lossless, more bits are wasted bits. But lossless is great as a production format to reduce transmission time. I definitely will use a lossless codec like FLAC for storing anything I may want to transcode later.

That said, there are certainly those who very much enjoy the psychological effect of having lossless audio content, primarily for music content.

Ogg Theora

Ogg’s first effort at a video codec was called Tarkin (announced in 2000). It was a wavelet-based distribution codec, adding to the list of promising-sounding wavelet video codecs that have yet to become useful.

Tarkin didn’t make much progress before On2 decided to open-source their VP3 video codec (a much earlier version of the VP6 codec from FLV files), waiving all patent claims for it. Ogg decided to put Tarkin on hold and work from the VP3 source. On2 and the Xiph.Org Foundation announced Theora in 2002, with the bitstream format locked down in 2004, but the 1.0 release didn’t happen until 2008. Since then, lots of progress has been made in improving the compression efficiency of the codec. There were big limitations in the On2 encoder source code which kept Theora from delivering competitive quality, particularly in detail preservation. Xiph’s Monty has had a very interesting series of blog posts detailing how they’ve evolved their implementation to improve quality, which is a three-scoop sundae of codec nerditry. And there are plenty of basic optimizations that have yet to be implemented in Theora. As of 1.1, it still lacks real rate-distortion optimization, lookahead rate control, modern motion search capabilities, and multithreading.

However, it’s clear that the core architecture of the Theora bitstream is going to keep it from ever being competitive with H.264 or VC-1. These things are hard to predict, but based on the tools Theora supports (it doesn’t even have B-frames!), at best it might wind up as a codec on par with MPEG-4 part 2. While there are many ways in which Theora is different from normal codecs, like encoding from bottom to top instead of top to bottom, these are presumably more about avoiding existing patents than anything that could provide a big improvement in compression quality.

That may be more than enough for simple embedding of short videos in a web page, and there’s been interest from Wikipedia and other organizations in using Theora for that. But the bandwidth savings of more advanced codecs are going to be worth a whole lot more than the actual costs of MPEG-LA license fees for companies doing high volumes or high bitrate content.

Ogg Dirac

The lure of wavelets is strong. They’re so good for still images, we think, that it doesn’t matter if motion estimation isn’t quite as good as with DCT block-based codecs. How hard can it be? we think.

Pretty hard, it always turns out.

Dirac is a BBC-led effort to create a license-free wavelet-based video codec. They’ve had a lot of buzz, and are now working with Xiph to have Dirac under the Ogg umbrella as well.

There have been some interesting demonstrations of Dirac as a production format. I’d expect it to work well there, since intra-only coding works fine with wavelets—SMPTE is working on I-frame-only Dirac as VC-2. But for a distribution format, we need much better compression efficiency, and that’s where Dirac has yet to deliver. There is a research project called Schroedinger working on building a Dirac encoder that’s competitive with H.264/VC-1 at typical web bitrates. But it hasn’t come close to producing anything competitive yet, nor is there a clear strategy for how that could be possible.

At this point, Dirac implementations aren’t any better than Theora, and are a lot slower on encode and decode and less broadly supported. Dirac won’t be of interest unless it can be significantly better than Theora for something.

Encoding OGV

At this point, the primary tool for OGV encoding is ffmpeg2theora, a command-line tool based on ffmpeg and available as source code and compiled for major platforms. A variety of GUI front-ends are available for it, but so far all just expose the modes listed next.

It only has a few options that are specific for the video and audio codecs themselves. Key options as of Theora 1.1 include:

•  -v/--videoquality sets a Quality VBR level from 0 to 10

•  -V/--videobitrate specifies bitrate in Kbps

•  --speedlevel specifies quality versus speed 0–2 with 0 the slowest and highest quality

•  --soft-target relaxes rate control requirements, making it more like an audio ABR encode. Fine for progressive download, but not for streaming.

•  --optimize is the same as --speedlevel 0, giving the slowest, highest-quality encode

•  --two-pass enables two-pass encoding; really required for decent encoding, particularly with CBR buffers

•  -F/--framerate specifies frame rate in frames per second

•  --keyint specifies keyframe interval in frames

•  --buf-delay specifies the buffer in frames to make a CBR encode. Defaults to the GOP length for 1-pass encodes and the whole file for 2-pass encodes (making them unconstrained VBR). Unusually, there’s no way to specify peak bitrate.

•  -a/--audioquality specifies VBR audio quality from -2 to 10; the default of 1 is generally decent

•  -A/--audiobitrate specifies an audio bitrate in Kbps, ranging from 32–500 Kbps

•  -C/--channels specifies the number of output channels

•  -H/--samplerate specifies sample rate in Hz
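As a small illustration of the quality-based options, the sketch below checks values against the ranges listed above before building an argument list. The function name and the chosen values are hypothetical; only the flags and their ranges come from the list.

```python
def quality_mode_args(video_quality, audio_quality=1, speedlevel=0):
    """Return ffmpeg2theora arguments for a Quality VBR encode, validating ranges."""
    if not 0 <= video_quality <= 10:
        raise ValueError("video quality must be between 0 and 10")
    if not -2 <= audio_quality <= 10:
        raise ValueError("audio quality must be between -2 and 10")
    if speedlevel not in (0, 1, 2):
        raise ValueError("speedlevel must be 0, 1, or 2")
    return [
        "-v", str(video_quality),
        "-a", str(audio_quality),
        "--speedlevel", str(speedlevel),
    ]

# Example values only; these would be placed between "ffmpeg2theora" and the input file.
print(quality_mode_args(7, audio_quality=3))
```

Catching an out-of-range value before launching a long encode is cheaper than discovering it afterwards.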

Ogg Tutorial

This tutorial shows how to create an OGV file for embedding in a web page.

Scenario

We’re making an OGV with Theora video and Vorbis audio for embedding in a web page collecting public domain video content. We’re limited to the following maximum specs:

•  640 × 480 frame size

•  30 fps frame rate

•  30 MB per file

Three Questions

What Is My Content?

The source is a U.S. government video of Bill Clinton giving a speech, with a montage of Air Force planes. It is 1280 × 720p60, 3:45 in duration, and varies a lot in detail and motion.

Who Is My Audience?

The audience is composed of viewers of this public resource web site. We can’t make any predictions as to decoder power. It’s up to the web site itself to make sure that users have the proper decoders.

What Are My Communication Goals?

We want as good quality as we can get within the specs of the web site, and reasonable buffering time.

Tech Specs

The web site has a maximum frame size of 640 × 480 and a maximum frame rate of 30 fps. With 3:45 and 30 MB, we can have a total bitrate of 1066 Kbps.
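The bitrate budget follows from the scenario's 30 MB per-file limit and the 3:45 duration. The sketch below assumes decimal units (1 MB = 1,000,000 bytes, 1 Kbps = 1,000 bits per second), which is what reproduces the 1066 Kbps figure.

```python
def total_bitrate_kbps(max_megabytes, duration_s):
    """Highest average bitrate (video plus audio) that fits in the size limit."""
    total_bits = max_megabytes * 1_000_000 * 8
    return total_bits / duration_s / 1000

duration_s = 3 * 60 + 45            # 3:45 is 225 seconds
budget_kbps = total_bitrate_kbps(30, duration_s)

print(f"Budget: {budget_kbps:.1f} Kbps")
```

Any split of video and audio bitrates that sums to less than this budget will fit, leaving a little room for container overhead.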

Since the source is 16:9, the output frame size should be 640 × 360. And we’ll divide the source frame rate of 59.94 to get to 29.97.
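Both numbers can be derived rather than guessed. The sketch below scales the source to fit inside the maximum frame while keeping its aspect ratio, and halves the frame rate. Treating 59.94 as the exact NTSC rate 60000/1001, and rounding dimensions down to even values, are assumptions made for the example.

```python
from fractions import Fraction

def fit_frame(src_w, src_h, max_w, max_h):
    """Largest size with the source aspect ratio that fits the limits (never upscales)."""
    scale = min(Fraction(max_w, src_w), Fraction(max_h, src_h), 1)
    # Round down to even dimensions, a common convention for video encoders.
    width = int(src_w * scale) // 2 * 2
    height = int(src_h * scale) // 2 * 2
    return width, height

width, height = fit_frame(1280, 720, 640, 480)

source_fps = Fraction(60000, 1001)   # assumed exact value behind "59.94"
output_fps = source_fps / 2          # keep every other frame

print(width, height, round(float(output_fps), 2))
```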

Since we’re not streaming we don’t need a tight buffer. But the first part of the clip is a lot more complex than the second, which can make for a poor progressive download experience. So we’ll use 2-pass mode with a relatively long buffer window. In other codecs, we might have just set a peak bitrate of 1500 Kbps, but Theora 1.1 doesn’t support that mode.
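To see why a front-loaded clip hurts progressive download, consider this simplified model: the viewer downloads at a constant rate, and playback can start only once enough data is buffered that it will never stall. The per-second bitrates and the 1100 Kbps connection are invented for illustration and are not measurements of the tutorial clip.

```python
def startup_delay_s(kbits_per_second, bandwidth_kbps):
    """Smallest pre-buffer time (seconds) that avoids any playback stall.

    kbits_per_second[t] is the size of the t-th second of video in kilobits.
    Second t starts playing at time delay + t, and by then all bits up to and
    including that second must have arrived at bandwidth_kbps.
    """
    delay = 0.0
    needed_kbits = 0.0
    for t, kbits in enumerate(kbits_per_second):
        needed_kbits += kbits
        delay = max(delay, needed_kbits / bandwidth_kbps - t)
    return delay

# Two hypothetical 225-second encodes, both averaging 1000 Kbps.
front_loaded = [1500] * 100 + [600] * 125   # complex material first
back_loaded = [600] * 125 + [1500] * 100    # complex material last

front_wait = startup_delay_s(front_loaded, 1100)
back_wait = startup_delay_s(back_loaded, 1100)
print(f"front-loaded: {front_wait:.1f} s, back-loaded: {back_wait:.1f} s")
```

In this model the front-loaded encode needs roughly 37 seconds of buffering before playback is safe, while the back-loaded one needs less than a second, even though both files are the same size. Limiting the buffer window caps how far the early section can run above the average.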

Settings

ffmpeg2theora.exe -o ArmFrcs_2p_950-100.ogv -V 950 -A 100 --optimize --two-pass --buf-delay 600 -x 640 -y 360 -F 29.97 -K 150 -H 44100 ArmFrcs_720p60.avi

Specifying:

•  -V 950: Video bitrate 950 Kbps

•  -A 100: Audio bitrate 100 Kbps (1050 total; a little headroom on our 1066)

•  --optimize: Slowest, highest-quality encode (still real-time on my workstation)

•  -x 640 -y 360: 640 × 360 frame size

•  --two-pass: Use two-pass encoding, making it a bitrate-limited encode whose peaks are unconstrained unless a buffer is specified

•  --buf-delay 600: A 20 second buffer at 30 fps; balance of allocation flexibility and flatter peaks.

•  -F 29.97: 29.97 fps frame rate

•  -K 150: Keyframe every 150 frames (5 seconds)

•  -H 44100: Resample audio to 44.1 KHz

And there we are! The output file is within 1% of our target size, and looks more than good enough for our use; although there is some edge ringing and a lot of texture detail loss, we don’t have much obvious blocking. Theora has come a long way in the last few years.
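For repeat encodes, the settings listed above can be generated from the underlying decisions (a 5-second keyframe interval, a 20-second buffer) instead of being typed as raw frame counts. This is a sketch only: it builds and prints the argument list without running it, uses the nominal 30 fps from the bullet list to convert seconds to frames, and assumes the executable is on the path as ffmpeg2theora.

```python
import shlex

NOMINAL_FPS = 30  # used only to convert seconds into frame counts

def theora_command(source, output, video_kbps, audio_kbps, width, height,
                   frame_rate, keyframe_s=5, buffer_s=20, sample_rate_hz=44100):
    """Build the two-pass ffmpeg2theora command line used in this tutorial."""
    return [
        "ffmpeg2theora",
        "-o", output,
        "-V", str(video_kbps),
        "-A", str(audio_kbps),
        "--optimize", "--two-pass",
        "--buf-delay", str(buffer_s * NOMINAL_FPS),
        "-x", str(width), "-y", str(height),
        "-F", str(frame_rate),
        "-K", str(keyframe_s * NOMINAL_FPS),
        "-H", str(sample_rate_hz),
        source,
    ]

cmd = theora_command("ArmFrcs_720p60.avi", "ArmFrcs_2p_950-100.ogv",
                     video_kbps=950, audio_kbps=100,
                     width=640, height=360, frame_rate="29.97")
print(shlex.join(cmd))
```

Passing the list to subprocess.run() would start the encode; keeping it as a list avoids quoting problems with file names that contain spaces.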
