6

MPEG bitstreams

After compression, audio and video signals are simply data and only differ from generic data in that they relate to real-time signals. MPEG bitstreams are essentially a means of transporting data whilst allowing the time axis of the original signal to be re-created at the decoder.

6.1 Introduction

There are two basic applications of MPEG bitstreams – recording and transmission – and these have quite different requirements. In the multichannel recording application the encoders and decoders can all share the same clock. The read data rate of the storage device can be adjusted so that the decoded frame rate or audio sampling rate can be slaved to any required reference signal. In the case of multichannel transmission, such as a digital television service, the program sources are not necessarily synchronous. The source of timing is the encoder and the decoder must synchronize or genlock to that. Thus the difference between a program stream and a transport stream is that the latter must contain additional synchronizing information to lock the encoder and decoder clocks together independently in each program.

Transmission techniques are steadily becoming more complex as new applications are found. In the earliest systems a dedicated signal path with constant bandwidth was specified and one compression factor would be employed. The use of statistical multiplexing allowed variable compression factors to be used subject to the overall bit rate of the multiplex remaining the same.

In network applications different users have access to different bit rates according to geographic and economic constraints. Further factors, such as network congestion or file server capacity, may mean that the available bit rate is less than the capacity of the terminal equipment. One solution to this is the use of multi-rate compression, in which the same program material is compressed to a number of different bit rates. A given connection will be provided with a data rate that does not exceed the steady capacity of the terminal equipment or the instantaneous capacity of the network. In practice this may require the decoder to switch between different bit-rate versions of the same material. Whilst MPEG-2 can do this, there is little flexibility in the times at which such a switch can be made. A distinct advantage of AVC is that it is designed to allow multi-rate operation.

Figure 6.1 A comparison of MPEG-2 and AVC shows that the latter has an additional layer.

Figure 6.1 shows the basic difference between MPEG-2 and AVC. In MPEG-2 there is a video coding layer, which outputs an elementary stream to a transport layer. In AVC there is an additional layer, known as the Network Abstraction Layer (NAL), between the coding layer and the transport layer. This allows more flexible packaging of the elementary stream for network applications.

Clearly in any bit-serial recording or transmission application there must be some means to parse the bitstream by locating symbol boundaries. This is the function of synchronizing patterns. Not all channels are true bitstreams. In many systems, especially where Reed–Solomon error correction is used, the system operates one byte at a time and the parsing of the bitstream into bytes is already performed as a necessary step in error correction and demultiplexing. ATM works in this way, as do DVB and DVD. Thus the output of such a channel would be a bytestream in which word boundaries are already known and the synchronization problem is then to locate a specific byte.

In a packetized multiplexed transmission of several programs, the packets are small so that each program receives frequent updates, reducing buffering requirements. The drawback is that each packet requires a synchronization mechanism and the smaller the packets the more bit rate is lost to synchronizing. Thus for recording it is advantageous to have a different approach in which larger packets are used to cut down synchronizing overhead.

6.2 Packets and time stamps

The video elementary stream is an endless bitstream representing pictures which are not necessarily in the correct order and which take a variable length of time to transmit. Storage and transmission systems prefer discrete blocks of data. In MPEG-2 elementary streams are packetized to form a PES (packetized elementary stream). Audio elementary streams are also packetized. A packet is shown in Figure 6.2. It begins with a header containing a unique packet start code and a code which identifies the type of data stream. Optionally the packet header may also contain one or more time stamps which are used for synchronizing the video decoder to real time and for obtaining lip-sync.

Figure 6.3 shows that a time stamp is a sample of the state of a counter which is driven by a 90 kHz clock. This is obtained by dividing down the master 27 MHz clock of MPEG-2. There are two types of time stamp: PTS and DTS. These are abbreviations for presentation time stamp and decode time stamp. A presentation time stamp determines when the associated picture should be displayed on the screen, whereas a decode time stamp determines when it should be decoded. In bidirectional coding these times can be quite different.
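
The relationship between the two clocks can be illustrated with a minimal sketch. The function and constant names are illustrative only and are not taken from the standard; the count is treated as a plain integer with no wrap-around.

```python
MASTER_CLOCK_HZ = 27_000_000   # master clock of MPEG-2
TIME_STAMP_HZ = 90_000         # clock driving the time stamp counter

# Dividing the master clock by this factor yields the 90 kHz clock.
DIVIDER = MASTER_CLOCK_HZ // TIME_STAMP_HZ


def time_stamp(master_count: int) -> int:
    """Sample the 90 kHz counter, given the count of 27 MHz cycles so far."""
    return master_count // DIVIDER


def ticks_to_seconds(ticks: int) -> float:
    """Convert a time stamp difference in 90 kHz ticks to seconds."""
    return ticks / TIME_STAMP_HZ


if __name__ == "__main__":
    print(DIVIDER)                      # division ratio between the clocks
    print(time_stamp(MASTER_CLOCK_HZ))  # ticks after one second
    print(ticks_to_seconds(3600))       # duration of 3600 ticks
```

With these figures one picture period at 25 Hz corresponds to 3600 ticks of the time stamp counter.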

Figure 6.2 A PES packet structure is used to break up the continuous elementary stream.

Figure 6.3 Time stamps are the result of sampling a counter driven by the encoder clock.

Figure 6.4 An example of using PTS/DTS to synchronize bidirectional decoding.

Audio packets only have presentation time stamps. Clearly if lip-sync is to be obtained, the audio and the video streams of a given program must have been locked to the same master 27 MHz clock and the time stamps must have come from the same counter driven by that clock.

With reference to Figure 6.4, the GOP begins with an I picture, and then P1 is sent out of sequence prior to the B pictures. P1 has to be decoded before B1 and B2 can be decoded. As only one picture can be decoded at a time, the I picture is decoded at time N, but not displayed until time N+1. As the I picture is being displayed, P1 is being decoded at N+1. P1 will be stored in RAM. At time N+2, B1 is decoded and displayed immediately. For this reason B pictures need only PTS. At N+3, B2 is decoded and displayed. At N+4, P1 is displayed, hence the large difference between PTS and DTS in P1. Simultaneously P2 is decoded and stored ready for the decoding of B3 and so on.
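
The sequence just described can be reproduced with a short sketch. It assumes, as in the example above, that one picture is decoded per picture period, that B pictures are displayed as they are decoded, and that an I or P picture is displayed when the next I or P picture is being decoded. The function name and the treatment of the final anchor picture are assumptions made for illustration.

```python
def assign_time_stamps(transmission_order, start=0):
    """Return (name, dts, pts) tuples in units of one picture period.

    Pictures whose name begins with "B" are bidirectional; all others
    are treated as anchor (I or P) pictures.
    """
    stamps = []
    for index, name in enumerate(transmission_order):
        dts = start + index              # one picture decoded per period
        if name.startswith("B"):
            pts = dts                    # decoded and displayed at once
        else:
            # An anchor picture waits in memory until the next anchor
            # is being decoded.
            later_anchors = [
                j for j in range(index + 1, len(transmission_order))
                if not transmission_order[j].startswith("B")
            ]
            # Assumption: the last anchor is shown after everything else.
            next_slot = later_anchors[0] if later_anchors else len(transmission_order)
            pts = start + next_slot
        stamps.append((name, dts, pts))
    return stamps


if __name__ == "__main__":
    sent = ["I", "P1", "B1", "B2", "P2", "B3", "B4"]
    for name, dts, pts in assign_time_stamps(sent):
        print(f"{name}: decode at N+{dts}, display at N+{pts}")
```

Sorting the result by presentation time recovers the display order, in which each P picture follows the B pictures that depend on it.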

In practice the time between input pictures is constant and so there is a certain amount of redundancy in the time stamps. Consequently PTS/DTS need not appear in every PES packet. Time stamps can be up to 700 ms apart in program streams and up to 100 ms apart in transport streams. As each picture type (I, P or B) is flagged in the bitstream, the decoder can infer the PTS/DTS for every picture from the ones actually transmitted.

Figure 6.5 shows that one or more PES packets can be assembled in a pack whose header contains a sync pattern and a system clock reference code which allows the decoder to re-create the encoder clock. Clock reference operation is described in section 6.4.

Figure 6.5 A pack is a set of PES packets. The pack header contains a clock reference code.

In AVC the rigid structure of MPEG-2 is relaxed somewhat. Temporal coding relies on identification of reference pictures and MPEG-2 does this with the combination of PTS/DTS and the known group structure. AVC allows the reference order of coded pictures to be decoupled from the display order. Thus encoded pictures can be sent in any order within buffering constraints. The encoder can select the order that minimizes overall bit rate. This requires a different picture identification mechanism. AVC uses frame number and Picture Order Count (POC). The frame number is a count which increments in the order of decoding whereas POC increments in the order of display. Note that these are picture counts, not time counts. These counts can be converted to time stamps by multiplication by the period of each frame, which is a constant for a given video format.
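
The conversion from a picture count to a time stamp can be sketched as follows. This is a simplified illustration which assumes the count advances by one per frame, as described above, and that the result is expressed in the 90 kHz units used for PTS/DTS; the function names are hypothetical.

```python
from fractions import Fraction

TIME_STAMP_HZ = 90_000  # time stamp clock


def frame_period_ticks(frame_rate) -> int:
    """Return the frame period in 90 kHz ticks for an int or Fraction rate."""
    period = Fraction(TIME_STAMP_HZ) / Fraction(frame_rate)
    if period.denominator != 1:
        raise ValueError("frame period is not a whole number of ticks")
    return int(period)


def count_to_time_stamp(count: int, frame_rate, origin: int = 0) -> int:
    """Convert a picture count into a time stamp relative to an origin."""
    return origin + count * frame_period_ticks(frame_rate)


if __name__ == "__main__":
    print(frame_period_ticks(25))
    print(frame_period_ticks(Fraction(30000, 1001)))
    print(count_to_time_stamp(3, 25, origin=1000))
```

Using an exact fraction for rates such as 30000/1001 Hz avoids the rounding drift that a floating-point frame rate would introduce.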

POC may be sent explicitly for each picture in slice headers, or it may be compressed. Where a repeating group structure is used, this will be sent in the sequence parameters and the decoder can compute the POC from the picture type and its transmitted position in the group. AVC encoders may change the group structure temporarily by sending delta POCs to change the decoder POCs from the predicted value to the actual value. In the event that the display order is the same as the decoding order the POC can be obtained from the frame number.

6.3 Transport streams

The MPEG-2 transport stream is intended to be a multiplex of many TV programs with their associated sound and data channels, although a single program transport stream (SPTS) is possible. The transport stream is based upon packets of constant size so that adding error-correction codes and interleaving [1] in a higher layer is eased. Figure 6.6 shows that these are always 188 bytes long. Transport stream packets should not be confused with PES packets which are larger and vary in size.

Figure 6.6 Transport stream packets are always 188 bytes long to facilitate multiplexing and error correction.

Transport stream packets always begin with a header. The remainder of the packet carries data known as the payload. For efficiency, the normal header is relatively small, but for special purposes the header may be extended. In this case the payload gets smaller so that the overall size of the packet is unchanged.

The header begins with a sync byte which is a unique pattern detected by a demultiplexer. A transport stream may contain many different elementary streams and these are identified by giving each a unique 13-bit Packet Identification Code or PID which is included in the header. A demultiplexer seeking a particular elementary stream simply checks the PID of every packet and accepts those that match, rejecting the rest.

In a multiplex there may be many packets from other programs in between packets of a given PID. To help the demultiplexer, the packet header contains a continuity count. This is a four-bit value which increments at each new packet having a given PID.
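
A minimal sketch of PID selection and continuity checking is given below. It relies on the layout of the normal four-byte header in MPEG-2 Systems (sync byte 0x47, PID in the low five bits of the second byte and all of the third, continuity count in the low four bits of the fourth), which is not spelled out in the text above. The helper names are illustrative, and the continuity check is simplified: it ignores the cases in which the standard permits a repeated or unchanged count.

```python
SYNC_BYTE = 0x47
PACKET_SIZE = 188


def parse_header(packet: bytes) -> dict:
    """Extract the PID and continuity count from one transport packet."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("transport stream packets are always 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("sync byte not found")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]   # 13-bit PID
    continuity = packet[3] & 0x0F                 # four-bit count
    return {"pid": pid, "continuity": continuity}


def select_pid(packets, wanted_pid):
    """Keep packets with the wanted PID and count continuity breaks."""
    selected, gaps, expected = [], 0, None
    for packet in packets:
        header = parse_header(packet)
        if header["pid"] != wanted_pid:
            continue                              # reject other streams
        if expected is not None and header["continuity"] != expected:
            gaps += 1                             # a packet was lost
        expected = (header["continuity"] + 1) % 16
        selected.append(packet)
    return selected, gaps


def make_packet(pid: int, continuity: int) -> bytes:
    """Build a dummy payload-only packet for experiments (hypothetical helper)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF,
                    0x10 | (continuity & 0x0F)])
    return header + bytes(PACKET_SIZE - len(header))


if __name__ == "__main__":
    stream = [make_packet(256, 0), make_packet(257, 0),
              make_packet(256, 1), make_packet(256, 3)]
    chosen, gaps = select_pid(stream, 256)
    print(len(chosen), "packets selected,", gaps, "continuity break(s)")
```

Because the count is kept separately for each PID, packets from other programs lying between those of the wanted stream do not disturb the check.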

This approach allows statistical multiplexing as it does not matter how many or how few packets have a given PID; the demux will still find them. Statistical multiplexing has the problem that it is virtually impossible to make the sum of the input bit rates constant. Instead the multiplexer aims to make the average data bit rate slightly less than the maximum and the overall bit rate is kept constant by adding stuffing or null packets. These packets have no meaning, but simply keep the bit rate constant. Null packets always have a PID of 8191 (all ones) and the demultiplexer discards them.
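
The stuffing process can be sketched as below. The null packet is built with the header layout of MPEG-2 Systems; the payload filler value of 0xFF is a common choice rather than a requirement, and the function names are hypothetical.

```python
NULL_PID = 0x1FFF        # 8191, all thirteen PID bits set
PACKET_SIZE = 188


def null_packet() -> bytes:
    """Return a stuffing packet: sync byte, null PID, payload only."""
    return bytes([0x47, 0x1F, 0xFF, 0x10]) + b"\xff" * (PACKET_SIZE - 4)


def pid_of(packet: bytes) -> int:
    """Read the 13-bit PID from a transport packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def pad_to_constant_rate(packets, slots: int):
    """Fill the unused packet slots of an interval with null packets."""
    if len(packets) > slots:
        raise ValueError("more packets than the interval can carry")
    return list(packets) + [null_packet()] * (slots - len(packets))


def drop_nulls(packets):
    """Discard stuffing, as a demultiplexer would."""
    return [p for p in packets if pid_of(p) != NULL_PID]


if __name__ == "__main__":
    data = [bytes([0x47, 0x01, 0x00, 0x10]) + bytes(184)] * 3
    sent = pad_to_constant_rate(data, slots=5)
    print(len(sent), "packets sent,", len(drop_nulls(sent)), "carry data")
```

In a real multiplexer the null packets would be interleaved with the data rather than appended at the end of the interval; appending keeps the sketch short.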

6.4 Clock references

A transport stream is a multiplex of several TV programs and these may have originated from widely different locations. It is impractical to expect all the programs in a transport stream to be synchronous and so the stream is designed from the outset to allow asynchronous programs. A decoder running from a transport stream has to genlock to the encoder and the transport stream has to have a mechanism to allow this to be done independently for each program. The synchronizing mechanism is called Program Clock Reference (PCR).

In program streams all the programs must be synchronous so that only one clock is required at the decoder. In this case the synchronizing mechanism is called System Clock Reference (SCR).

Figure 6.7 shows how the PCR/SCR system works. The goal is to re-create at the decoder a 27 MHz clock which is synchronous with that at the encoder. The encoder clock drives a forty-eight-bit counter which continuously counts up to the maximum value before overflowing and beginning again.

A transport stream multiplexer will periodically sample the counter and place the state of the count in an extended packet header as a PCR (see Figure 6.6). The demultiplexer selects only the PIDs of the required program, and it will extract the PCRs from the packets in which they were inserted. In a program stream the count is placed in a pack header as an SCR which the decoder can identify.

The PCR/SCR codes are used to control a numerically locked loop (NLL). This is similar to a phase-locked loop, except that the two phases concerned are represented by the state of a binary number. The NLL contains a 27 MHz VCXO (voltage controlled crystal oscillator), a variable-frequency oscillator based on a crystal which has a relatively small frequency range.

Figure 6.7 Program or System Clock Reference codes regenerate a clock at the decoder. See text for details.

The VCXO drives a forty-eight-bit counter in the same way as in the encoder. The state of the counter is compared with the contents of the PCR/SCR and the difference is used to modify the VCXO frequency. When the loop reaches lock, the decoder counter would arrive at the same value as is contained in the PCR/SCR and no change in the VCXO would then occur. In practice the transport stream packets will suffer from transmission jitter and this will create phase noise in the loop. This is removed by the loop filter so that the VCXO effectively averages a large number of phase errors.

A heavily damped loop will reject jitter well, but will take a long time to lock. Lock-up time can be reduced when switching to a new program if the decoder counter is jammed to the value of the first PCR received in the new program. The loop filter may also have its time constants shortened during lock-up. Once a synchronous 27 MHz clock is available at the decoder, this can be divided down to provide the 90 kHz clock which drives the time stamp mechanism. The entire timebase stability of the decoder is no better than the stability of the clock derived from PCR/SCR. MPEG-2 sets standards for the maximum amount of jitter which can be present in PCRs in a real transport stream.
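
The behaviour of the loop can be explored with a simple numerical model. The sketch below is not a description of any real decoder: it assumes jitter-free PCRs arriving every 40 ms, a VCXO whose free-running frequency is 30 ppm high, a proportional-plus-integral loop filter with arbitrarily chosen gains, and counters held as floating-point numbers with no wrap-around.

```python
ENCODER_HZ = 27_000_000.0


def simulate_nll(vcxo_offset_ppm=30.0, pcr_interval=0.04, steps=500,
                 kp=5.0, ki=0.5):
    """Return (final VCXO frequency, final count error) after `steps` PCRs.

    kp and ki are the proportional and integral gains in Hz per count
    of error; all values are illustrative assumptions.
    """
    free_run_hz = ENCODER_HZ * (1.0 + vcxo_offset_ppm * 1e-6)
    encoder_count = 0.0
    decoder_count = encoder_count    # jam the counter to the first PCR
    integral = 0.0
    vcxo_hz = free_run_hz
    for _ in range(steps):
        # Compare the received PCR with the state of the local counter.
        error = encoder_count - decoder_count
        integral += error
        # The loop filter output steers the VCXO.
        vcxo_hz = free_run_hz + kp * error + ki * integral
        # Both counters run until the next PCR arrives.
        encoder_count += ENCODER_HZ * pcr_interval
        decoder_count += vcxo_hz * pcr_interval
    return vcxo_hz, encoder_count - decoder_count


if __name__ == "__main__":
    frequency, error = simulate_nll()
    print(f"VCXO at {frequency:.3f} Hz, count error {error:.6f}")
```

Reducing both gains makes the model slower to lock but less responsive to any disturbance added to the PCR values, which is the trade-off described above.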

6.5 Program Specific Information (PSI)

In a real transport stream, each elementary stream has a different PID, but the demultiplexer has to be told what these PIDs are and what audio belongs with what video before it can operate. This is the function of PSI. Figure 6.8 shows the structure of PSI. When a decoder powers up, it knows nothing about the incoming transport stream except that it must search for all packets with a PID of zero. PID zero is reserved for the Program Association Table (PAT). The PAT is transmitted at regular intervals and contains a list of all the programs in this transport stream. Each program is further described by its own Program Map Table (PMT) and the PIDs of the PMTs are contained in the PAT.

Figure 6.8 also shows that the PMTs fully describe each program. The PID of the video elementary stream is defined, along with the PID(s) of the associated audio and data streams. Consequently when the viewer selects a particular program, the demultiplexer looks up the program number in the PAT, finds the right PMT and reads the audio, video and data PIDs. It then selects elementary streams having these PIDs from the transport stream and routes them to the decoders.
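
The look-up sequence can be sketched with ordinary dictionaries standing in for the decoded tables. All program numbers and PID values below are invented for illustration; only PID 0 for the PAT comes from the text.

```python
PAT_PID = 0  # the only PID a decoder knows at power-up

# Hypothetical PAT contents: program number -> PID of that program's PMT.
pat = {1: 0x100, 2: 0x200}

# Hypothetical PMT contents, keyed by the PID on which each PMT arrives.
pmts = {
    0x100: {"video": 0x101, "audio": [0x102, 0x103], "data": []},
    0x200: {"video": 0x201, "audio": [0x202], "data": [0x203]},
}


def pids_for_program(program_number, pat, pmts):
    """Return the set of elementary stream PIDs for the chosen program."""
    pmt_pid = pat[program_number]      # step 1: find the PMT via the PAT
    pmt = pmts[pmt_pid]                # step 2: read that PMT
    return {pmt["video"], *pmt["audio"], *pmt["data"]}


if __name__ == "__main__":
    wanted = pids_for_program(1, pat, pmts)
    print(sorted(hex(pid) for pid in wanted))
```

The demultiplexer would then pass only packets whose PID is in the returned set to the decoders.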

Program 0 of the PAT contains the PID of the Network Information Table (NIT). This contains information about what other transport streams are available. For example, in the case of a satellite broadcast, the NIT would detail the orbital position, the polarization, carrier frequency and modulation scheme. Using the NIT a set-top box could automatically switch between transport streams.

Figure 6.8 MPEG-2 Program Specific Information (PSI) is used to tell a demultiplexer what the transport stream contains.

Apart from 0 and 8191, a PID of 1 is also reserved for the Conditional Access Table (CAT). This is part of the access control mechanism needed to support pay per view or subscription viewing.

6.6 Multiplexing

A transport stream multiplexer is a complex device because of the number of functions it must perform. A fixed multiplexer will be considered first. In a fixed multiplexer, the bit rate of each of the programs must be specified so that the sum does not exceed the payload bit rate of the transport stream. The payload bit rate is the overall bit rate less the packet headers and PSI rate.

In practice the programs will not be synchronous to one another, but the transport stream must produce a constant packet rate given by the bit rate divided by 188 bytes, the packet length. Figure 6.9 shows how this is handled. Each elementary stream entering the multiplexer passes through a buffer which is divided into payload-sized areas. Note that MPEG-2 decoders also have a quantity of buffer memory. The challenge to the multiplexer is to take packets from each program in such a way that neither its own buffers nor the buffers in any decoder either overflow or underflow. This requirement is met by sending packets from all programs as evenly as possible rather than bunching together a lot of packets from one program. When the bit rates of the programs are different, the only way this can be handled is to use the buffer contents indicators. The fuller a buffer is, the more likely it should be that a packet will be read from it. Thus a buffer content arbitrator can decide which program should have a packet allocated next.
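
The packet rate calculation and the arbitration rule can be sketched as follows. The sketch assumes a payload of 184 bytes, that is, a packet with only the normal four-byte header, and measures fullness as a fraction of each buffer's capacity; both the names and the figures are illustrative.

```python
PACKET_BYTES = 188
PAYLOAD_BYTES = 184   # assumes the normal four-byte header


def packet_rate(bit_rate: float) -> float:
    """Packets per second for a transport stream of the given bit rate."""
    return bit_rate / (PACKET_BYTES * 8)


def next_program(buffers):
    """Choose the program whose buffer is relatively fullest.

    `buffers` maps a program name to (bytes_held, capacity). Programs
    holding less than one payload are skipped; None means that no
    program can fill a packet.
    """
    ready = {
        name: held / capacity
        for name, (held, capacity) in buffers.items()
        if held >= PAYLOAD_BYTES
    }
    if not ready:
        return None
    return max(ready, key=ready.get)


if __name__ == "__main__":
    print(packet_rate(15_040_000), "packets per second")
    print(next_program({"A": (400, 1000), "B": (900, 1000)}))
```

A real arbitrator would also have to take account of the decoder buffers, which this sketch ignores.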

Figure 6.9 A transport stream multiplexer can handle several programs which are asynchronous to one another and to the transport stream clock. See text for details. Periodically the payload area is made smaller because of the requirement to insert PCR.

If the sum of the input bit rates is correct, the buffers should all slowly empty because the overall input bit rate has to be less than the payload bit rate. This allows for the insertion of Program Specific Information. While PATs and PMTs are being transmitted, the program buffers will fill up again. The multiplexer can also fill the buffers by sending more PCRs as this reduces the payload of each packet. In the event that the multiplexer has sent enough of everything but still cannot fill a packet then it will send a null packet with a PID of 8191. Decoders will discard null packets and as they convey no useful data, the multiplexer buffers will all fill while null packets are being transmitted.

In a statistical multiplexer or statmux, the bit rate allocated to each program can vary dynamically. Figure 6.10 shows that there must be a tight connection between the statmux and the associated compressors. Each compressor has a buffer memory which is emptied by a demand clock from the statmux. In a normal fixed-bit-rate coder, the buffer contents feed back and control the requantizer. In statmuxing this process is less severe and only takes place if the buffer is very close to full, because the buffer content is also fed to the statmux.

The statmux contains an arbitrator which allocates packets to the program with the fullest buffer. Thus if a particular program encounters difficult material it will produce large prediction errors and begin to fill its output buffer. This will cause the statmux to allocate more packets to that program. In order to fill more packets, the statmux clocks more data out of that buffer, causing the level to fall again. Of course, this is only possible if the other programs in the transport stream are handling typical video.
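
The effect of the arbitrator on bit-rate allocation can be seen in a toy simulation. Each program is modelled simply as a source producing a fixed fraction of a packet payload per packet slot; the program names and rates are invented, and requantizing is not modelled.

```python
def statmux(input_rates, slots):
    """Simulate packet allocation to the fullest buffer.

    `input_rates` maps a program name to the payloads it produces per
    packet slot. Returns (packets sent per program, final buffer levels).
    """
    levels = dict.fromkeys(input_rates, 0.0)
    sent = dict.fromkeys(input_rates, 0)
    for _ in range(slots):
        # Each compressor adds its output to its own buffer.
        for name, rate in input_rates.items():
            levels[name] += rate
        # The arbitrator serves the fullest buffer, if it holds a payload.
        fullest = max(levels, key=levels.get)
        if levels[fullest] >= 1.0:
            levels[fullest] -= 1.0
            sent[fullest] += 1
        # Otherwise the slot would carry PSI or a null packet.
    return sent, levels


if __name__ == "__main__":
    rates = {"easy": 0.2, "typical": 0.25, "difficult": 0.45}
    sent, levels = statmux(rates, slots=1000)
    print(sent)
```

The program producing the most data receives the most packets without any bit rate having been assigned to it in advance, provided the total demand stays below the capacity of the multiplex.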

Figure 6.10 A statistical multiplexer contains an arbitrator which allocates bit rate to each program as a function of program difficulty.

In the event that several programs encounter difficult material at once, clearly the buffer contents will rise and the requantizing mechanism will have to operate.

6.7 Remultiplexing

In real life a program creator may produce a transport stream which carries all its programs simultaneously. A service provider may take in several such streams and create its own transport stream by selecting different programs from different sources. In an MPEG-2 environment this requires a remultiplexer, also known as a transmultiplexer. Figure 6.11 shows what a remultiplexer does.

Remultiplexing is easier when all the incoming programs have the same bit rate. If a suitable combination of programs is selected it is obvious that the output transport stream will always have sufficient bit rate. Where statistical multiplexing has been used, there is a possibility that the sum of the bit rates of the selected programs will exceed the bit rate of the output transport stream. To avoid this, the remultiplexer will have to employ recompression.

Recompression requires a partial decode of the bitstream to identify the DCT coefficients. These will then be requantized to reduce the bit rate until it is low enough to fit the output transport stream.

Figure 6.11 A remultiplexer creates a new transport stream from selected programs in other transport streams.

Remultiplexers have to edit the Program Specific Information (PSI) such that the Program Association Table (PAT) and the Program Map Tables (PMT) correctly reflect the new transport stream content. It may also be necessary to change the packet identification codes (PIDs) since the incoming transport streams could have inadvertently used the same values.

When Program Clock Reference (PCR) data are included in an extended packet header, they represent a real-time clock count and if the associated packet is moved in time the PCR value will be wrong. Remultiplexers have to create a new multiplex from a number of other multiplexes and it is inevitable that this process will result in packets being placed in different locations in the output transport stream than they had in the input. In this case the remultiplexer must edit the PCR values so that they reflect the value the clock counter would have had at the location at which the packet now resides.
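
The correction amounts to adding the number of clock periods by which the packet has been displaced. The sketch below assumes the displacement is known in seconds and leaves the point at which the counter wraps as a parameter, since that depends on how the count is represented; the function name is hypothetical.

```python
CLOCK_HZ = 27_000_000  # the PCR counts periods of the 27 MHz clock


def restamp_pcr(pcr: int, delay_seconds: float, modulus: int) -> int:
    """Return the PCR value appropriate to the packet's new position.

    A positive delay means the packet leaves later than it arrived.
    `modulus` is the count at which the counter overflows to zero.
    """
    shift = round(delay_seconds * CLOCK_HZ)
    return (pcr + shift) % modulus


if __name__ == "__main__":
    # Small modulus chosen only to make the wrap-around visible.
    print(restamp_pcr(1000, 0.001, modulus=2**20))
    print(restamp_pcr(2**20 - 10, 0.000001, modulus=2**20))
```

A packet delayed by one millisecond therefore needs 27 000 added to its PCR, with the result wrapped in the same way as the counter itself.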

Reference

1. Watkinson, J.R., The Art of Digital Video, second edition, Ch. 6. Oxford: Focal Press (1994)