MPEG-2 Compression

MPEG-2 (from the Moving Picture Experts Group) refers to a family of standards for encoding, compressing, storing, and transmitting audio and video in digital form. MPEG-2 is the protocol of choice for digital video of the HDTV Grand Alliance, and it supports both progressive and interlaced displays.

MPEG-2 is capable of handling standard-definition digital television (SDTV), HDTV, and motion picture film. MPEG-2 supports data transmission, which is used for sending control information to digital set-top boxes and can be used for transmitting user data, such as Web pages. It is backward-compatible with MPEG-1, which means that MPEG-2 decoders can display MPEG-1-encoded content, such as video stored on CD-ROM. This compression technology also has full functionality for video on demand (VoD). MPEG-2 chips are on the market for real-time encoding, and there is a specification for MPEG-2 adaptation over ATM AAL5.

MPEG-2 is rapidly assuming a central role in broadband networking, and it is easy to see why. Without digital compression, RBB would not be possible. Uncompressed video consumes too much bandwidth (see Table 2-8).

Table 2-8. Uncompressed Video Bandwidth Requirements
Format Pixels per Line Lines per Frame Pixels per Frame Frames per Second Millions of Pixels per Second Bits per Pixel Megabits per Second
SVGA 800 600 480,000 72 34.6 8 276.5
NTSC 640 480 307,200 30 9.2 24 221.2
PAL 580 575 333,500 50 16.7 24 400.2
SECAM 580 575 333,500 50 16.7 24 400.2
HDTV 1920 1080 2,073,600 30 62.2 24 1492.8
Film, various depending on content 2000 1700 3,400,000 24 81.6 32 2611.2

With numbers such as those in Table 2-8, compression is a must for storage and transmission. For digital broadcast video and film, MPEG-2 is the standard among broadcasters and cable operators.
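The figures in Table 2-8 are simple arithmetic: pixels per line, times lines per frame, times frames per second, times bits per pixel. A quick sketch reproduces them:

```python
def uncompressed_mbps(pixels_per_line, lines_per_frame, fps, bits_per_pixel):
    """Raw (uncompressed) bit rate in megabits per second."""
    return pixels_per_line * lines_per_frame * fps * bits_per_pixel / 1e6

# Parameters from Table 2-8 (HDTV differs from the table's 1492.8 only by
# intermediate rounding in the table's pixels-per-second column).
for name, p in {"SVGA": (800, 600, 72, 8),
                "NTSC": (640, 480, 30, 24),
                "HDTV": (1920, 1080, 30, 24)}.items():
    print(f"{name}: {uncompressed_mbps(*p):.1f} Mbps")
```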

This discussion of MPEG focuses on two technical characteristics: video compression and systems operation.

MPEG-2 Video Compression

The formal specification for MPEG-2 video is written in "Generic Coding of Moving Pictures and Associated Audio: Video," Recommendation H.262, ISO/IEC 13818-2, April 1995.

MPEG-2 achieves compression in two modes: spatial compression and temporal compression.

Spatial Compression

Spatial compression refers to bit reduction achieved within a single frame. This type of compression uses a combination of tiling the image into 8×8 blocks, the discrete cosine transform, and run-length and Huffman encoding to achieve bit reduction. Spatial encoding is lossy—that is, some information is lost in the compression—but knobs (control parameters) exist to trade off image loss against compression and processing. Some of these knobs are user-controlled; others are adjusted automatically in the encoding process.
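The transform stage can be sketched directly. The block below is an illustrative, unoptimized 8×8 DCT-II in plain Python (a real encoder follows it with quantization and entropy coding, omitted here); it demonstrates the energy compaction that makes spatial compression work: a flat block collapses to a single DC coefficient, and the zero AC coefficients are what run-length coding then exploits.

```python
import math

def dct_8x8(block):
    """2-D DCT-II of an 8x8 block (JPEG-style normalization)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# A flat block compresses perfectly: all energy lands in the DC term.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
```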

Temporal Compression

Of greater interest for network analysis is temporal compression, which achieves the majority of the bit reduction. Temporal compression removes bits from successive frames by exploiting the fact that relatively few pixels change position in 1/30th of a second, the time between adjacent frames. A picture can be coded using knowledge of a prior picture and motion prediction, so that just the motion vectors of blocks or macroblocks are sent instead of a complete picture. This achieves better compression than, say, motion JPEG, which uses the same spatial compression as MPEG-2 but has no provision for temporal compression.

In addition to compression, the quality of video presentation is affected by other factors:

  • Viewing distance

  • Bit rate

  • Picture content

  • Size of display monitor

These factors must be considered when setting knobs for temporal compression. With respect to bit rate, focus-group testing [Cermak] has shown that 3-Mbps MPEG-2 video is comparable to normal cable TV and VHS over a wide range of content. Increasing the bit rate beyond 8.3 Mbps does not improve viewing quality significantly: 8.3-Mbps MPEG was viewed as comparable to the original uncompressed signal.

Frame Types

Frame types are central to temporal compression. To achieve temporal compression, three different kinds of frames are defined:

  • I-frames, or intra-frames, which are complete (spatially compressed) frames

  • P-frames, or predicted frames, which are predicted from I-frames or other P-frames using motion prediction

  • B-frames, or bidirectional frames, which are interpolated between I and P frames

P-frames achieve a bit reduction on the order of 50 percent relative to their corresponding I-frame; B-frames achieve a reduction on the order of 75 percent. These estimates are for 4-Mbps MPEG-2 video at standard TV definition without too much motion. Actual bit reduction varies with picture content, the mix of I-, P-, and B-frames in the stream, and the knob settings for spatial compression.
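Using the reductions quoted above (a P-frame at about 50 percent of an I-frame, a B-frame at about 25 percent), a back-of-the-envelope estimate of stream size is easy to sketch. The 100,000-bit I-frame size is an arbitrary illustrative figure, not one from the text:

```python
def gop_bits(i_frame_bits, pattern):
    """Estimate total bits for a group of pictures from its frame pattern."""
    cost = {"I": 1.0, "P": 0.5, "B": 0.25}   # size relative to an I-frame
    return sum(cost[f] * i_frame_bits for f in pattern)

# The six-frame group of pictures of Figure 2-15 (display order I B B P B B),
# compared against the same six frames sent entirely as I-frames:
mixed = gop_bits(100_000, "IBBPBB")
plain = gop_bits(100_000, "IIIIII")
```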

Frame Ordering and Buffering

Figure 2-15 shows a sequence of MPEG-2 frames in display order. This order is different from the order in which frames are decoded at the consumer set top.

The decoder receives an I-frame (Frame 1). The next frame received is Frame 4, a P-frame. Actually, "P-frame" is a bit of a misnomer: a P-frame is not a picture sent by the encoder. Rather, it is derived by the decoder from information (motion vectors) sent by the encoder. Frames 1 and 4 are buffered in the decoder and are used to compute B-frames 2 and 3 by interpolation.

Figure 2-15. MPEG Temporal Compression in Display Order


The use of B-frames means that the order in which frames are displayed by the receiver is not the order in which frames are decoded at the receiver. In Figure 2-15, the display order for the first four frames at the receiver is (I, B, B, P). But the order of decoding is (I, P, B, B). This is because the P-frame is computed first and is used with the I-frame to compute the two intervening B-frames.

An I-frame (Frame 7) is then transmitted. The original I-frame (Frame 1) is discarded from the decoder's buffer memory, and the first P-frame (Frame 4) is used with Frame 7 to interpolate B-frames 5 and 6. With the receipt of the new I-frame, the process repeats. More frequent I-frames tend to improve picture quality, but they also increase bandwidth use. Still, frequent I-frames are necessary when motion compensation fails to produce good pictures.

The use of B-frames is a controversial issue. They impose buffering requirements on the order of 1 to 2 MB and add latency at the receiver. Some vendors, such as General Instrument (a pioneer in digital TV and an original Grand Alliance participant), have resisted the use of B-frames, believing that the cost to the decoder does not justify the bandwidth reduction on the network. Attitudes toward B-frames are becoming friendlier as memory costs drop, but important tradeoffs remain among buffer size, latency, bandwidth, set-top memory cost, and picture quality.

Using Figure 2-15 again, some control parameters can be defined. First, the term group of pictures refers to the set of frames between I-frames, including the first I-frame. The group of pictures is shown in the shaded portion of Figure 2-15. The number of pictures in the group of pictures is called the I-frame distance; in this example, the I-frame distance is six. There also is a P-frame distance, which is the number of frames between a P-frame and the subsequent P- or I-frame, including the first P-frame. In Figure 2-15, the P-frame distance is three. I-frame distance and P-frame distance are important knobs. For television, which presents 30 frames per second, I-frame distance is on the order of 15, which means that an I-frame is sent every half-second. P-frame distance is on the order of 0 to 3.
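The display-to-decode reordering described above can be sketched as a small function: each B-frame is held back until the anchor frame (I or P) it depends on has been emitted. This is an illustrative model, not a full MPEG reorder buffer:

```python
def decode_order(display):
    """Reorder frames from display order into decode (transmission) order.

    Each B-frame is interpolated from the anchor frames (I or P) on either
    side of it, so the following anchor must be decoded first.
    """
    out, pending_b = [], []
    for frame in display:
        if frame == "B":
            pending_b.append(frame)   # hold until the next anchor arrives
        else:                         # I or P: emit before the held B-frames
            out.append(frame)
            out.extend(pending_b)
            pending_b.clear()
    out.extend(pending_b)
    return "".join(out)
```

For the first frames of Figure 2-15, `decode_order("IBBPBBI")` yields the (I, P, B, B, I, B, B) ordering the text describes.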

Networking Implications

Frame architecture has implications for networking. I-frames anchor picture quality because P- and B-frames are derived from them. Therefore, it is important that I-frames be transmitted with high reliability, higher than P- or B-frames. Thus, when transmitting MPEG frames over ATM or Frame Relay, it is advisable that I-frames be given priority.

MPEG-2 creates streams of compressed video with timestamps. This timestamp is provided inside an MPEG-2 packet so that the timing and ordering of the I-, B-, and P-frames is correct. Note that because the MPEG-2 layer itself has a timestamp (Program Clock Reference, or PCR), it does not depend on the network layer to provide a timestamp for it. That is one reason why the ATM Forum decided that AAL5 would be used for carrying MPEG-2 streams rather than AAL1, which has a synchronous residual timestamp (SRTS).

Encoder Operation

The job of the encoder is to compute compressed target frames from the perfect pixel-by-pixel image received from the camera, called the original image. Before any spatial compression is performed, the encoder caches the original image. As the encoder calculates the motion-compensated frame, it compares work in process with the original frame. If the encoder decides that the original frame and the target frame are hopelessly out of sync (say, due to a scene change), it can abort the process and immediately calculate a fresh I-frame. In addition to scene changes, motion compensation breaks down when background objects are covered or uncovered by foreground objects, or during extreme changes in luminosity.

MPEG-2 does not specify how an encoder works; it specifies only how the decoder works. The encoder's only explicit responsibility is to produce an MPEG-2-compliant bit stream; how it does so is up to the vendor.

The rocket science of encoders lies in determining as early as possible when the original and target frames are out of sync and taking proper corrective action within bandwidth constraints. Encoders vary widely in how they determine when motion compensation is no longer working and in which knobs they turn besides simply recomputing I-frames.

Real-time encoding is made possible by the speed of modern processors. If one assumes a processor capable of 200 million instructions per second and a display rate of 30 frames per second, a little more than 6 million instructions are available per frame. It is expected that software encoders will be sufficient for most images, although for demanding scenes and environments, hardware assistance will be required.

MPEG-2 Systems Operation

Compressed video and audio streams are just the first step in defining a compression system. MPEG also defines an end-to-end software architecture for linking bit streams to programming and control information. This is defined in the MPEG-2 Systems specification, "Generic Coding of Moving Pictures and Associated Audio: Systems," Recommendation H.222.0, ISO/IEC 13818-1, April 1995. The following sections overview the important components of MPEG-2 systems operation.

MPEG-2 Transport Stream Packet Format

The MPEG-2 Systems standard defines two data stream formats: the Transport Stream (TS), which can multiplex several programs and is optimized for use in networks where data loss is likely, and the Program Stream, which is optimized for multimedia applications that perform systems processing in software. The TS is used for video over fiber, satellite, cable, ISDN, ATM, and other networks, as well as for storage on digital videotape and other devices. For purposes of RBB, attention has centered on the MPEG-2 TS, so we focus on it here.

The basic difference between the Transport Stream and the Program Stream is that the TS uses fixed-length packets and the Program Stream uses variable-length packets. A TS packet has a fixed length of 188 bytes, with a variable-length header of at least 32 bits. Table 2-9 shows the field identifiers, field lengths, and functions of the TS packet.

Table 2-9. Transport Stream Packet Format
Field Length in Bits Function
Synchronization 8 Always 0x47
Transport error indicator 1 Unrecoverable bit error exists
Payload unit start indicator 1 Control information
Transport priority 1 Gives priority to this packet over other packets with the same PID
Program Identifier (PID) 13 Identifies content (such as control table) or audio, video, closed-captioned streams
Scrambling 2 00 means no scrambling; the three other values are user-defined
Adaptation control 2 00 = reserved; 01 = no Adaptation field (payload only); 10 = Adaptation field only, no payload; 11 = Adaptation field followed by payload
Adaptation field 0–256 See the following fields
—Length 8 Length of Adaptation field
—Program Clock Reference (PCR) 42 Timing information
—Lots of other flags Variable
The Adaptation field can be 0 to 256 bits long. If it is greater than 0, bits are taken from the data portion of the packet (the payload) to keep the total packet length at 188 bytes.

The fields with network implications are the PID and transport priority. Certain PIDs and packets with high transport priority should be given preference by the network. All other fields are used primarily for image decoding.
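The fixed 32-bit header in Table 2-9 can be unpacked with ordinary bit operations. The sketch below follows the standard field layout; it also extracts the 4-bit continuity counter that shares the fourth header byte with the scrambling and adaptation control bits:

```python
def parse_ts_header(packet):
    """Parse the fixed 4-byte header of a 188-byte Transport Stream packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid TS packet")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "transport_priority": bool(packet[1] & 0x20),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],    # 13-bit PID
        "scrambling": (packet[3] >> 6) & 0x03,
        "adaptation_control": (packet[3] >> 4) & 0x03,   # 01 = payload only
        "continuity_counter": packet[3] & 0x0F,
    }

# A minimal PAT packet: sync byte, payload-unit-start set, PID 0, payload only.
pat_packet = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
header = parse_ts_header(pat_packet)
```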

Program Identifiers

As noted in Table 2-9, 13 bits of the header constitute the Program Identifier (PID). The PID identifies the contents of a Transport Stream packet. Most Transport Stream packets will carry digital audio, video, or closed-captioned text data for a TV program. The PID identifies which program the data is for. In addition, four PIDs are used for control purposes, specifically 0, 1, n, and m. Table 2-10 summarizes these and other PIDs.

Table 2-10. PIDs and Their Functions
PID Payload
0 Program Association Table (PAT), which points to the Program Map Table (PMT).
1 Conditional Access Table, which contains authentication information.
n PMT, pointed to by PAT, which binds PIDs to particular programs.
m Network Information Table (NIT), an optional field that identifies physical transport, such as cable channel frequency, satellite transponder, modulation scheme, and interleaving depth. Its use is private and is subject to the whims of the carrier.
0x0010 through 0x1FFE User data and program content. These PIDs carry audio, video, and closed-caption data.
0x1FFF Null packets, used for rate adaptation.

PIDs are of particular significance for channel selection. In the analog world, channel selection is performed when the receiver tunes to a single subband of a frequency multiplex broadband channel. In the digital world, channel selection is performed by joining a broadcast stream of digitally encoded packets. In the case of MPEG-2, tuning is PID selection.

A television program or movie consists of at least three streams: a video stream, an audio stream, and a text stream containing closed-captioned text. Other streams may be associated with the program, such as foreign-language audio. The PID identifies the streams.

Each program, therefore, consists of at least three PIDs. Programs are mapped to PIDs through the PMT. When a consumer wishes to view a channel, she consults an electronic program guide that displays what's available, makes a selection, and the tuner then selects the corresponding PIDs for decoding.

Figure 2-16 illustrates the interactions of the Program Association Table, Program Map Table, and the Transport Stream.

Figure 2-16. PID Usage


The figure shows a Transport Stream consisting of a multiplex of three TV channels: BBC, CNN, and MTV. The Program Association Table (PAT, PID=0) has an inventory of all channels available on the TS and a PID pointer to the PMT entry, which identifies the PIDs of the data streams for each channel. In this case, Transport Packets with PID=50 will have information about BBC, namely which other PIDs will carry audio, video, and caption data for BBC. Likewise, any TS packet with PID=41 will carry PMT information about MTV. Note that PID=0 updates will occur to refresh the PAT. As an example of standardization of control parameters, the Grand Alliance and the DVB Project have specifications on how frequently the PAT and PMTs are updated, in both cases at subsecond intervals.

Five transport stream packets are shown in Figure 2-16 and are described as follows:

Packet PID Value MPEG TS Payload
1 50 BBC PMT update. For example, it may be necessary to change the BBC video PID.
2 20 BBC audio. The payload contains MPEG-encoded audio for the BBC program; a set-top decoder tuned to BBC intercepts this packet to render the audio feed.
3 78 MTV video. The payload contains MPEG-encoded video for the MTV program; a set-top decoder tuned to MTV intercepts this packet and renders it for viewing.
4 0 PAT update. Periodically, the PAT must be updated so that the set of PIDs remains accurate.
5 92 MTV caption. The payload contains text information displayed for closed captioning.

When a viewer wants to watch BBC, he won't know that PIDs 20, 30, and 40 carry BBC packets. But the decoder knows from the PAT and PMTs what PIDs to select.
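Tuning-as-PID-selection can be modeled with two small lookup tables mirroring Figure 2-16. The MTV audio PID (81) is a made-up value for illustration; the other numbers come from the figure and text:

```python
# The PAT (carried in PID 0 packets) maps each channel to its PMT PID;
# each PMT maps the channel's streams to their PIDs.
PAT = {"BBC": 50, "MTV": 41}
PMT = {
    50: {"video": 30, "audio": 20, "caption": 40},   # BBC
    41: {"video": 78, "audio": 81, "caption": 92},   # MTV (audio PID is hypothetical)
}

def tune(channel):
    """'Tuning' in MPEG-2 is PID selection: look up the channel's stream PIDs."""
    return set(PMT[PAT[channel]].values())

def select(packets, pids):
    """Keep only the packets belonging to the tuned channel."""
    return [p for p in packets if p["pid"] in pids]

bbc_pids = tune("BBC")   # the viewer never sees these numbers
```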

Another use of PMT entries is to identify pure data streams, such as Internet service. For example, the PAT can point to a PMT that identifies, say, stock quote push mode data, which can be distributed inside IP packets inside MPEG frames. Users who desire stock quotes can select the corresponding PIDs, just as they would select a TV program.

The example of Figure 2-16 also demonstrates that the stream of PIDs on a channel represents a multiplex of, in this case, two TV programs: MTV and BBC. Recall that a single 6 MHz channel can carry 27 Mbps. With MPEG-2, a program can use as little as 3 to 5 Mbps. It is through the use of PMTs and PIDs that programs can be multiplexed on a single channel.

Another important notion is timing. Frames for a particular program must be displayed at the proper time, and the associated audio must be rendered concurrently. For example, in Figure 2-16, packets 3 and 5 (MTV video and captioning) must be presented concurrently. Because the audio, video, and closed-captioned packets occur at various places in the multiplex, presentation timing must be coordinated. Packets 3 and 5 carry a presentation time stamp (PTS) that tells the decoder when to display their contents.

The presentation time stamp is an offset of the Program Clock Reference (PCR) in Table 2-9. Formally, an MPEG television program is a set of packets having a common PCR. For example, in the multiplex of packets in Figure 2-16, MTV will have a different PCR from BBC. That is, all MTV packets will be relative to the MTV PCR, and all BBC packets will be relative to the BBC PCR.
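As a sketch of the arithmetic: the 42-bit PCR in Table 2-9 is a 33-bit base counting a 90 kHz clock plus a 9-bit extension counting a 27 MHz clock, and the PTS uses the 90 kHz base alone:

```python
def pcr_to_seconds(base, extension=0):
    """Convert a 42-bit PCR (33-bit base at 90 kHz, 9-bit extension at 27 MHz)."""
    return (base * 300 + extension) / 27_000_000

def pts_to_seconds(pts):
    """Convert a 33-bit presentation time stamp (90 kHz units) to seconds."""
    return pts / 90_000

# An I-frame distance of 15 at 30 frames/s means an I-frame every half-second:
half_second = pts_to_seconds(45_000)
```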

Electronic Program Guides

A key application and illustrative use of MPEG systems tables is the electronic program guide (EPG).

The EPG is the key piece of video real estate seen by the consumer; its importance is roughly equivalent to the front page of a Web portal. Its most common form is the onscreen program guide, which also permits single-key VCR advance recording, intellectual property patented (and aggressively enforced) by Gemstar (Pasadena, California; Nasdaq: GMST). In addition to broadcast fare, EPGs contain information on pay-per-view events, such as pricing, parental guidance ratings, and language.

Data for EPGs is stored in a hierarchical set of tables collectively known as Service Information (SI), which are downloaded inside MPEG TS packets to the consumer set-top box. In the U.S. standard (ATSC), well-known PIDs signal the set-top box to decode and store the SI tables. In the European DVB standard, the SI PIDs are not statically assigned but are pointed to by another table, allowing an additional level of indirection and, hence, flexibility should PID definitions change over time. The view in the United States is that static PIDs offer faster program-acquisition time.

The protocol used to create the EPG is the service information (SI) protocol. The SI provides information on available services, where the services are located, and how the services are categorized. For example, the SI indicates that the UCLA-USC football game is on channel xx, which is on transponder number tt on satellite zz, where xx, tt, and zz are numbers stored in the SI tables. Furthermore, the game may be aggregated with other similar athletic events or other PPV events into groups of programs. In Europe, groups of programs are called bouquets; in the United States, they are called virtual channels.

The organization of the tables is discussed using the DVB specification because it is a little more general than the American table organization. Otherwise, the American specification is nearly identical. There are four SI tables:

  • Network Information Table (NIT)

  • Bouquet Association Table (BAT)

  • Service Description Table (SDT)

  • Event Information Table (EIT)

The NIT (PID 16) contains information on the Access Network carrying the program. It indicates the medium (cable, satellite, MMDS, terrestrial DTV) and, given that context, identifies static information.

For example, for every satellite the NIT identifies satellite name (such as Galaxy V), polarization, position, and transponder number. For cable systems, it identifies the frequency used, modulation scheme, FEC, and interleave depth.

The BAT identifies groups of programs. Any program can be a member of multiple bouquets. For example, a UCLA football game can be a member of a UCLA bouquet, a football bouquet, or both. In addition to the bouquet name and a set of programs, the BAT has conditional access (security) information, language, a bouquet descriptor, and country-availability information. The latter feature is important in Europe, where national boundaries are close, which means that licensing must be enforced on a smaller geographic basis. The bouquet descriptor is used by the set-top box to key into a table of icons for display on the monitor.

The SDT identifies characteristics of the MPEG TS, such as which bouquet it belongs to, what events it contains, country availability, running status (for VoD or nVoD events), and a service descriptor, again used to key into a table of icons. A feature of the SDT is that it permits a matrix of information on multiple services to be displayed on the TV monitor; the cells of the matrix can be still pictures, MPEG clips, or text.

The EIT contains information specific to an event: start time, duration, parental guidance, language, event name, content descriptor, running status, security information, and more. It also carries information on one or more events following the current event, or preview information on programs on other transport streams.

SI tables must be stored, refreshed frequently, and displayed from the consumer set-top box. This means that the set-top box must have a sophisticated file management and operating system, like any other computer. Naturally, Microsoft thinks it owns this space, and acquisitions such as WebTV indicate its interest in operating systems for consumer entertainment environments. But there are other contenders as well, including PowerTV, used in Scientific Atlanta (NYSE: SFA) digital set tops, and OS-9 from Microware. The operating-system environment for set tops promises to be one of the most contentious commercial battles in the residential broadband business.

Note

The DVB specification predates ATSC. When ATSC analyzed the DVB work, it apparently decided that improvements could be made to speed channel-acquisition time. DVB generally uses more redundant information in tables, whereas ATSC uses more static information and is judged to be more complex. It remains to be seen how much of a performance advantage, if any, the ATSC specification has.


MPEG-2 Challenges

While MPEG-2 compression has succeeded in producing very high compression ratios (upwards of 50:1), some caveats remain.

Picture Quality

Both spatial and temporal compression produce signal loss. How well two different scenes will compress is not always obvious.

For example, a cable operator testing digital video control parameters used an auto race and a fishing scene as viewing samples. The auto race showed cars on a racetrack speeding by billboards. The fishing scene showed sunlight glistening off the water and leaves flapping in a breeze. Intuition suggests that the fast-motion racing scene would be more difficult to compress than the slow-moving fishing scene.

The fishing scene was very pretty in analog, but in MPEG-2 the motion compensation broke down. MPEG-2 had problems with the leaves and the water because luminosity changed rapidly in many places in the image. The result was a poor picture, even at relatively high bit rates. In the same test, the auto racing looked quite good at a lower bit rate. MPEG-2 temporal compression accounts well for the movement of objects from frame to frame through the use of motion vectors, but it has no explicit temporal compression for quick and numerous luminosity changes from frame to frame.

Scenarios that represent video compression problems for MPEG-2 are listed here:

  • Quick changes in luminosity at multiple places in the image, as demonstrated in the fishing scene; flashbulbs also present problems

  • Circular motion, because motion compensation assumes that objects move in a straight line

  • Sharp, high-contrast edges, as for fonts and graphics

  • Multiple motions, where a single image splits into two or more, which confuses motion compensation

  • Alternating wavy lines, a variation on problems posed by circular motion

Decoder Costs

B-frames impose a memory and processing requirement on the decoder. Most new MPEG decoders with B-frames require more than 2 MB of memory. As higher-resolution video becomes popular, bandwidth restrictions may become even more important, which will tend to increase the use of B-frames and thus the cost of decoders.

Datacasting

Data services such as Web access and text annotation of broadcasts will be accommodated within MPEG streams. The adaptation of IP within MPEG is under investigation by the ATSC, as discussed in Chapter 1, "Market Drivers," in the "Datacasting" section.

Low Latency Modes

B-frames also impose latency. Latency is not a big issue for broadcast, VoD, or nVoD, but it is a problem for real-time video, such as videoconferencing. Low-latency forms of MPEG, which reduce or eliminate B-frames, are needed. MPEG has a low-latency mode in which buffers are allowed to underflow for extended periods; when this occurs, the decoder may lose timing information until the underflow is corrected. Further work on low-latency MPEG is in progress.

Tighter Integration with Networking

A mechanism is needed by which the networking layer and MPEG transport can help each other. A relatively straightforward case would be one in which the network gives priority to I-frames and significant PIDs. The network should be made aware of the existence of I-frames and should give them preferential queuing. Preferential treatment should also be given to PAT and PMT updates, packets with PCRs, and frames explicitly marked for high priority.
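The preferential queuing suggested here can be sketched as a simple classifier over header fields the network can already see. The two-class split is an illustrative policy, not part of any standard:

```python
# Control tables and priority-marked packets get preferential queuing;
# everything else is best-effort.
CONTROL_PIDS = {0x0000, 0x0001}   # PAT and Conditional Access Table

def priority_class(header, pmt_pids=frozenset()):
    """Classify a parsed TS header for queuing; pmt_pids lists per-program PMT PIDs."""
    if header["pid"] in CONTROL_PIDS or header["pid"] in pmt_pids:
        return "high"             # PAT/CAT/PMT updates must not be lost
    if header["transport_priority"]:
        return "high"             # e.g., set by the encoder on I-frame packets
    return "low"
```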

MPEG-2 is a major architectural element of digital TV. It specifies audio and video compression, but, more importantly, it defines a framework or system specification to organize the data elements.
