MPEG applications
7.1 Introduction
It should be borne in mind that however exciting it may be, MPEG is only a technology. Technology per se is not useful and only becomes useful when it is incorporated into an appropriate and affordable product or service. If the use is not appropriate or affordable, there is no point in blaming the technology.
In this chapter a number of useful applications of MPEG are outlined. In most cases designers have followed the rules of high technology to design systems which maximize the potential of the technology whilst protecting against its weaknesses.
MPEG is an information technology. It cannot deliver anything physical: you get the movie but no popcorn. The strength of MPEG is that it delivers audio-visual material with economy of bandwidth. Its weaknesses are sensitivity to bit errors and the requirement for near-real-time transmission. The variable-length coding of MPEG can lose synchronization in the presence of a single bit error. Decoders can take a considerable time to recover from a buffer underflow or overflow. Wherever bandwidth or storage capacity is in short supply, MPEG is useful. Where bandwidth is plentiful, as in optical fibre networks, MPEG is pointless. As the capacity of storage devices continues to increase and their economics improve, MPEG becomes less important.
MPEG will remain important in any area where there will be a permanent shortage of bandwidth; this includes any mobile equipment or service which must use radio communication.
Figure 7.1 shows that well-engineered applications of MPEG will always include an error-correction system appropriate for the characteristics of the channel and suitable mechanisms to ensure that decoder buffers neither overflow nor underflow. Constant bit-rate channels having constant delay are easy to implement because the buffering mechanisms of the standard MPEG encoders and decoders are able to operate with no further assistance. Digital TV transmitters provide an essentially constant bit rate because this is related to the standardized (and regulated) channel bandwidth allowed.
In packet networks the delivery delay is not constant because other types of messages are multiplexed with the audio-visual message of interest. Variable delay channels cause MPEG decoders difficulty. Not only is there a greater danger of buffer overflow or underflow, but the program clock reference information is no longer correct if a packet is received with a time shift. In this case additional buffering is needed at both ends of the channel so that the variable delay characteristic of the channel is invisible to the MPEG layer.
There are now many members of the MPEG family and it can be difficult to decide which type of MPEG coding to use. Figure 7.2 shows some general guidelines. As a decoder becomes more complex, the bit rate which can be used for a given quality will fall. However, this will be accompanied by an increase in power consumption at the encoder and decoder as well as an increased delay. The designer will have to balance the cost of the bandwidth with the cost of the encoding and decoding equipment and the power consumed.
7.2 Telephones
The telephone is a classic example of limited bandwidth. If there is a market for video phones, then compression is the only way forward. The facial animation coding developed for MPEG-4 is an obvious contender as it allows a reasonable reproduction of the caller’s face with a data rate of only a few kilobits per second. This is considerably less than the data rate needed for the speech. Video telephony is relatively easy between desk-based equipment, where the camera can be mounted on top of the display, but it is not clear how this can be implemented in a cellular telephone which is held against the user’s ear. At the time of writing the cellular telephone appears to be degenerating into a cross between a fashion statement and a feature contest as manufacturers pack in more and more gadgets to lure customers. Cellular telephones incorporating CCD cameras that can transmit and receive images rely heavily on compression technology, although pixel counts are low and the quality is poor.
Another important issue for cellular telephones is the amount of signal processing needed to decode and display image related data. Whilst LSI chips can easily perform these tasks in very small physical space, this is only practicable in a portable product if the current consumption is within the range of available battery technology. Battery life is marginal in some of the more complex products.
7.3 Digital television broadcasting
Digital television broadcasting relies on the combination of a number of fundamental technologies. These are: MPEG-2 compression to reduce the bit rate, multiplexing to combine picture and sound data into a common bitstream, digital modulation schemes to reduce the RF bandwidth needed by a given bit rate and error correction to reduce the error statistics of the channel down to a value acceptable to MPEG data. MPEG is a compression and multiplexing standard and does not specify how error correction should be performed. Consequently a transmission standard must define a system which has to correct essentially all errors such that the delivery mechanism is transparent.
DVB (digital video broadcasting) [1] is a standard which incorporates MPEG-2 picture and sound coding but which also specifies all the additional steps needed to deliver an MPEG transport stream from one place to another. This transport stream will consist of a number of elementary streams of video and audio, where the audio may be coded according to the MPEG audio standard or AC-3. In a system working within its capabilities, the picture and sound quality will be determined only by the performance of the compression system and not by the RF transmission channel. This is the fundamental difference between analog and digital broadcasting. In analog television broadcasting, the picture quality may be limited by composite video encoding artifacts as well as transmission artifacts such as noise and ghosting. In digital television broadcasting the picture quality is determined instead by the compression artifacts and interlace artifacts if interlace has been retained.
If the received error rate increases for any reason, once the correcting power is used up, the system will degrade rapidly as uncorrected errors enter the MPEG decoder. In practice decoders will be programmed to recognize the condition and blank or mute to avoid outputting garbage. As a result digital receivers tend either to work well or not at all.
It is important to realize that the signal strength in a digital system does not translate directly to picture quality. A poor signal will increase the number of bit errors. Provided that this is within the capability of the error-correction system, there is no visible loss of quality. In contrast, a very powerful signal may be unusable because of similarly powerful reflections due to multipath propagation.
Whilst in one sense an MPEG transport stream is only data, it differs from generic data in that it must be presented to the viewer at a particular rate. Generic data are usually asynchronous, whereas baseband video and audio are synchronous. However, after compression and multiplexing audio and video are no longer precisely synchronous and so the term isochronous is used. This means a signal which was at one time synchronous and will be displayed synchronously, but which uses buffering at transmitter and receiver to accommodate moderate timing errors in the transmission.
Clearly another mechanism is needed so that the time axis of the original signal can be re-created on reception. The time stamp and program clock reference system of MPEG does this.
Figure 7.3 shows that the concepts involved in digital television broadcasting exist at various levels which have an independence not found in analog technology. In a given configuration a transmitter can radiate a given payload data bit rate. This represents the useful bit rate and does not include the necessary overheads needed by error correction, multiplexing or synchronizing. It is fundamental that the transmission system does not care what this payload bit rate is used for. The entire capacity may be used up by one high-definition channel, or a large number of heavily compressed channels may be carried. The details of this data usage are the domain of the transport stream. The multiplexing of transport streams is defined by the MPEG standards, but these do not define any error correction or transmission technique.
At the lowest level in Figure 7.3 is the source coding scheme, in this case MPEG-2 compression, which results in one or more elementary streams, each of which carries a video or audio channel. Figure 7.4 shows that elementary streams are multiplexed into a transport stream. The viewer then selects the desired elementary stream from the transport stream. Metadata in the transport stream ensure that when a video elementary stream is chosen, the appropriate audio elementary stream will automatically be selected.
A key difference between analog and digital transmission is that the transmitter output is switched between a number of discrete states rather than continuously varying. The process is called channel coding, which is the digital equivalent of modulation [2]. A good code minimizes the channel bandwidth needed for a given bit rate. This quality of the code is measured in bits/s/Hz.
Where the SNR is poor, as in satellite broadcasting, the amplitude of the signal will be unstable, and phase modulation is used. Figure 7.5 shows that phase-shift keying (PSK) can use two or more phases. When four phases in quadrature are used, the result is quadrature phase-shift keying or QPSK. Each period of the transmitted waveform can have one of four phases and therefore conveys the value of two data bits. 8-PSK uses eight phases and can carry three bits per symbol where the SNR is adequate. PSK is generally encoded in such a way that a knowledge of absolute phase is not needed at the receiver. Instead of encoding the signal phase directly, the data determine the magnitude of the phase shift between symbols.
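The differential encoding described above can be sketched as follows. This is a minimal illustration, not the scheme of any particular standard: the mapping from bit pairs to phase shifts and the function names are assumptions, and phases are counted in quarter turns.

```python
# Assumed mapping from a pair of data bits to a phase shift in quarter turns.
SHIFT_FOR_DIBIT = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
DIBIT_FOR_SHIFT = {shift: dibit for dibit, shift in SHIFT_FOR_DIBIT.items()}


def dqpsk_encode(dibits, reference_phase=0):
    """Return symbol phases (0..3 quarter turns), led by a reference symbol."""
    phases = [reference_phase % 4]
    for dibit in dibits:
        # The data select the phase shift, not the absolute phase.
        phases.append((phases[-1] + SHIFT_FOR_DIBIT[dibit]) % 4)
    return phases


def dqpsk_decode(phases):
    """Recover bit pairs from the phase difference between adjacent symbols."""
    return [
        DIBIT_FOR_SHIFT[(current - previous) % 4]
        for previous, current in zip(phases, phases[1:])
    ]
```

Because only differences are decoded, a receiver whose carrier reference is rotated by any whole number of quadrants recovers the same data.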
In terrestrial transmission more power is available than, for example, from a satellite and so a stronger signal can be delivered to the receiver. Where a better SNR exists, an increase in data rate can be had using multi-level signalling instead of binary. Figure 7.6 shows that the ATSC system uses an eight-level signal (8-VSB) allowing three bits to be sent per symbol. Four of the levels exist with normal carrier phase and four exist with inverted phase so that a phase-sensitive rectifier is needed in the receiver. Clearly the data separator must have a three-bit ADC which can resolve the eight signal levels. The gain and offset of the signal must be precisely set so that the quantizing levels register precisely with the centres of the eyes. The transmitted signal contains sync pulses which are encoded using specified code levels so that the data separator can set its gain and offset.
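The role of gain and offset in the data separator can be illustrated with a small slicer sketch. The nominal levels of -7 to +7 in steps of 2, and the names used, are assumptions made for illustration; a real receiver would derive gain and offset from the sync pulses.

```python
# Assumed nominal eight-level constellation: four negative, four positive.
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]


def slice_symbol(sample, gain=1.0, offset=0.0):
    """Return the three-bit code (0..7) of the level nearest the sample.

    The offset is removed and the gain applied first, so that the decision
    thresholds sit midway between the nominal levels.
    """
    corrected = (sample - offset) * gain
    return min(range(len(LEVELS)), key=lambda code: abs(corrected - LEVELS[code]))
```

If the gain or offset is wrong, samples drift towards the decision thresholds and the noise margin is reduced, which is why the sync pulses use specified levels.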
Multi-level signalling systems have the characteristic that the bits in the symbol have different error probability. Figure 7.7 shows that a small noise level will corrupt the low-order bit, whereas twice as much noise will be needed to corrupt the middle bit and four times as much will be needed to corrupt the high-order bit. In ATSC the solution is that the lower two bits are encoded together in an inner error-correcting scheme so that they represent only one bit with similar reliability to the top bit. As a result the 8-VSB system actually delivers two data bits per symbol even though eight-level signalling is used.
Multi-level signalling can be combined with PSK to obtain multi-level quadrature amplitude modulation (QUAM). Figure 7.8 shows the example of 64-QUAM. Incoming six-bit data words are split into two three-bit words and each is used to amplitude modulate one of a pair of sinusoidal carriers which are generated in quadrature. The modulators are four-quadrant devices such that 2³ or eight amplitudes are available, four of which are in phase with the carrier and four of which are antiphase. The two AM carriers are linearly added and the result is a signal which has 2⁶ or 64 combinations of amplitude and phase. There is a great deal of similarity between QUAM and the colour subcarrier used in analog television in which the two colour difference signals are encoded into one amplitude- and phase-modulated waveform. On reception, the waveform is sampled twice per cycle in phase with the two original carriers and the result is a pair of eight-level signals. 16-QUAM is also possible, delivering only four bits per symbol but requiring a lower SNR.
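The splitting of a six-bit word into two three-bit amplitudes can be sketched as below. The straight binary mapping of codes to amplitudes and the function name are assumptions for illustration; practical systems choose the mapping so that adjacent points differ in as few bits as possible.

```python
def map_64quam(word):
    """Map a six-bit word to one of 64 points, returned as complex(I, Q).

    The upper three bits set the in-phase amplitude and the lower three bits
    the quadrature amplitude. Each three-bit code selects one of eight signed
    amplitudes: four positive (in phase) and four negative (antiphase).
    """
    if not 0 <= word < 64:
        raise ValueError("word must fit in six bits")
    i_code = word >> 3
    q_code = word & 0b111
    return complex(2 * i_code - 7, 2 * q_code - 7)
```

Every one of the 64 input words lands on a distinct combination of amplitude and phase.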
The data bit patterns to be transmitted can have any combinations whatsoever, and if nothing were done, the transmitted spectrum would be non-uniform. This is undesirable because peaks cause interference with other services, whereas energy troughs allow external interference in. A randomizing technique known as energy dispersal is used to overcome the problem. The signal energy is spread uniformly throughout the allowable channel bandwidth so that it has less energy at a given frequency.
A pseudo-random sequence (PRS) generator is used to generate the randomizing sequence. Figure 7.9 shows the randomizer used in DVB. This fifteen-bit device has a maximum sequence length of 32 767 bits, and is preset to a standard value at the beginning of each set of eight transport stream packets. The serialized data are XORed with the sequence, which randomizes the output that then goes to the modulator. The spectrum of the transmission is now determined by the spectrum of the PRS.
On reception, the de-randomizer must contain the identical ring counter which must also be set to the starting condition to bit accuracy. Its output is then added to the data stream from the demodulator. The randomizing will effectively then have been added twice to the data in modulo-2, and as a result is cancelled out leaving the original serial data.
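The cancellation of the randomizing on reception can be demonstrated with a short shift-register sketch. The register length, feedback taps and preset value below are illustrative assumptions rather than values taken from the text, and the function names are hypothetical.

```python
def prs_bits(count, seed="100101010000000", taps=(14, 15)):
    """Yield `count` bits from a feedback shift register.

    `seed` lists the preset contents of stages 1..N; `taps` are the stages
    whose XOR forms both the output bit and the feedback into stage 1.
    """
    state = [int(bit) for bit in seed]  # state[0] is stage 1
    for _ in range(count):
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        yield feedback
        state = [feedback] + state[:-1]  # shift one place along


def randomize(data, **prs_options):
    """XOR the data with the sequence, most significant bit first."""
    bits = prs_bits(8 * len(data), **prs_options)
    out = bytearray()
    for byte in data:
        mask = 0
        for _ in range(8):
            mask = (mask << 1) | next(bits)
        out.append(byte ^ mask)
    return bytes(out)
```

Since XOR is its own inverse, calling `randomize` a second time with the same preset restores the original data, which is exactly what the de-randomizer relies on.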
The way that radio signals interact with obstacles is a function of the relative magnitude of the wavelength and the size of the object. AM sound radio transmissions with a wavelength of several hundred metres can easily diffract around large objects. The shorter the wavelength of a transmission, the larger objects in the environment appear to it and these objects can then become reflectors. Reflecting objects produce a delayed signal at the receiver in addition to the direct signal. In analog television transmissions this causes the familiar ghosting. In digital transmissions, the symbol rate may be so high that the reflected signal may be one or more symbols behind the direct signal, causing intersymbol interference. As the reflection may be continuous, the result may be that almost every symbol is corrupted. No error-correction system can handle this. Raising the transmitter power is no help at all as it simply raises the power of the reflection in proportion.
The only solution is to change the characteristics of the RF channel in some way either to prevent the multipath reception or to prevent it being a problem. The RF channel includes the modulator, transmitter, antennae, receiver and demodulator.
As with analog UHF TV transmissions, a directional antenna is useful with digital transmission as it can reject reflections. However, directional antennae tend to be large and they require a skilled permanent installation. Mobile use on a vehicle or vessel is simply impractical.
Another possibility is to incorporate a ghost canceller in the receiver. The transmitter periodically sends a standardized known waveform known as a training sequence. The receiver knows what this waveform looks like and compares it with the received signal. In theory it is possible for the receiver to compute the delay and relative level of a reflection and so insert an opposing one. In practice if the reflection is strong it may prevent the receiver finding the training sequence.
The most elegant approach is to use a system in which multipath reception conditions cause only a small increase in error rate which the error-correction system can manage. This approach is used in DVB. Figure 7.10(a) shows that when using one carrier with a high bit rate, reflections can easily be delayed by one or more bit periods, causing interference between the bits. Figure 7.10(b) shows that instead, OFDM sends many carriers each having a low bit rate. When a low bit rate is used, the energy in the reflection will arrive during the same bit period as the direct signal. Not only is the system immune to multipath reflections, but the energy in the reflections can actually be used. This characteristic can be enhanced by using guard intervals shown in Figure 7.10(c). These reduce multipath bit overlap even more.
Note that OFDM is not a modulation scheme, and each of the carriers used in an OFDM system still needs to be modulated using any of the digital coding schemes described above. What OFDM does is to provide an efficient way of packing many carriers close together without mutual interference.
A serial data waveform basically contains a train of rectangular pulses. The transform of a rectangle is the function sin x/x and so the baseband pulse train has a sin x/x spectrum. When this waveform is used to modulate a carrier the result is a symmetrical sin x/x spectrum centred on the carrier frequency. Figure 7.11(a) shows that nulls in the spectrum appear spaced at multiples of the bit rate away from the carrier.
Further carriers can be placed at spacings such that each is centred at the nulls of the others as is shown in Figure 7.11(b). The distance between the carriers is equal to 90° or one quadrant of sin x. Owing to the quadrant spacing, these carriers are mutually orthogonal, hence the term orthogonal frequency division. A large number of such carriers (in practice several thousand) will be interleaved to produce an overall spectrum which is almost rectangular and which fills the available transmission channel.
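The orthogonality of carriers spaced in this way can be checked numerically. The sketch below is an illustration under the assumption that each carrier completes a whole number of cycles in one symbol period; the sample count and function name are arbitrary choices.

```python
import cmath


def carrier_correlation(k, m, samples=64):
    """Correlate two carriers over one symbol period.

    Carrier k completes k whole cycles in the period and carrier m completes
    m whole cycles, so their frequencies differ by a multiple of the symbol
    rate. The result is normalized so a carrier correlates with itself as 1.
    """
    total = sum(
        cmath.exp(2j * cmath.pi * k * n / samples)
        * cmath.exp(-2j * cmath.pi * m * n / samples)
        for n in range(samples)
    )
    return total / samples
```

A carrier correlated with itself gives unity, whereas any two different carriers on this grid give essentially zero, so a receiver can separate them even though their spectra overlap.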
When guard intervals are used, the carrier returns to an unmodulated state between bits for a period which is greater than the period of the reflections. Then the reflections from one transmitted bit decay during the guard interval before the next bit is transmitted. The use of guard intervals reduces the bit rate of the carrier because for some of the time it is radiating carrier, not data. A typical reduction is to around 80 per cent of the capacity without guard intervals.
This capacity reduction does, however, improve the error statistics dramatically, such that much less redundancy is required in the error-correction system. Thus the effective transmission rate is improved. The use of guard intervals also moves more energy from the sidebands back to the carrier. The frequency spectrum of a set of carriers is no longer perfectly flat but contains a small peak at the centre of each carrier.
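The capacity cost of a guard interval is simple arithmetic, sketched below. The guard interval is expressed as a fraction of the useful symbol period, which is an assumption about how it is specified; the function name is hypothetical.

```python
from fractions import Fraction


def capacity_fraction(guard):
    """Fraction of raw capacity left when each symbol is followed by a guard.

    `guard` is the guard interval as a fraction of the useful symbol period,
    so each symbol occupies (1 + guard) periods but carries one period of data.
    """
    return 1 / (1 + Fraction(guard))
```

A guard interval of one quarter of the symbol period leaves four fifths of the capacity, in line with the typical figure of around 80 per cent; shorter guard intervals cost less but tolerate only shorter reflections.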
The ability to work in the presence of multipath cancellation is one of the great strengths of OFDM. In DVB, more than 2000 carriers are used in single transmitter systems. Provided there is exact synchronism, several transmitters can radiate exactly the same signal so that a single-frequency network can be created throughout a whole country. SFNs require a variation on OFDM which uses over 8000 carriers.
With OFDM, directional antennae are not needed and, given sufficient field strength, mobile reception is perfectly feasible. Of course, directional antennae may still be used to boost the received signal outside of normal service areas or to enable the use of low-powered transmitters.
An OFDM receiver must perform fast Fourier transforms (FFTs) on the whole band at the symbol rate of one of the carriers. The amplitude and/or phase of the carrier at a given frequency effectively reflects the state of the transmitted symbol at that time slot and so the FFT partially demodulates as well.
In order to assist with tuning in, the OFDM spectrum contains pilot signals. These are individual carriers which are transmitted with slightly more power than the remainder. The pilot carriers are spaced apart through the whole channel at agreed frequencies which form part of the transmission standard. Practical reception conditions, including multipath reception, will cause a significant variation in the received spectrum and some equalization will be needed. Figure 7.12 shows what the possible spectrum looks like in the presence of a powerful reflection. The signal has almost been cancelled at certain frequencies. However, the FFT performed in the receiver is effectively a spectral analysis of the signal and so the receiver computes for free the received spectrum. As in a flat spectrum the peak magnitude of all the coefficients would be the same (apart from the pilots), equalization is easily performed by multiplying the coefficients by suitable constants until this characteristic is obtained.
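The equalization step can be sketched as follows. This is a simplified illustration that ignores the pilots and assumes enough symbols have been observed for every carrier to have reached its peak magnitude at least once; the function names are hypothetical.

```python
def equalizer_gains(symbols):
    """Derive one gain per carrier from a run of FFT outputs.

    `symbols` is a list of FFT results, each a list of complex coefficients
    (one per carrier). The peak magnitude seen on each carrier is compared
    with the largest peak, and the ratio becomes that carrier's gain.
    """
    carriers = range(len(symbols[0]))
    peaks = [max(abs(symbol[k]) for symbol in symbols) for k in carriers]
    target = max(peaks)
    return [target / peak if peak > 0 else 0.0 for peak in peaks]


def equalize(symbol, gains):
    """Multiply each coefficient by the constant chosen for its carrier."""
    return [coefficient * gain for coefficient, gain in zip(symbol, gains)]
```

A carrier that has been completely cancelled cannot be restored by any gain, so the sketch gives it zero gain and leaves the loss to the error-correction system.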
Although the use of transform-based receivers appears complex, when it is considered that such an approach simultaneously allows effective equalization the complexity is not significantly higher than that of a conventional receiver which needs a separate spectral analysis system just for equalization purposes. The only drawback of OFDM is that the transmitter must be highly linear to prevent intermodulation between the carriers. This is readily achieved in terrestrial transmitters by derating the transmitter so that it runs at a lower power than it would in analog service. This is not practicable in satellite transmitters which are optimized for efficiency, so OFDM is not really suitable for satellite use.
Broadcast data suffer from both random and burst errors and the error correction strategies of digital television broadcasting have to reflect that. Figure 7.13 shows a typical system in which inner and outer codes are employed. Reed–Solomon codes are universally used as burst-correcting outer codes, along with an interleave which will be convolutional rather than the block-based interleave used in recording media. The inner codes will not be R–S, as more suitable codes exist for the statistical conditions prevalent in broadcasting. DVB uses a parity-based variable-rate system in which the amount of redundancy can be adjusted according to reception conditions.
Figure 7.14 shows a block diagram of a DVB-T (terrestrial) transmitter. Incoming transport stream packets of 188 bytes each are first subject to R–S outer coding. This adds 16 bytes of redundancy to each packet, resulting in 204 bytes. Outer coding is followed by interleaving. The interleave mechanism is shown in Figure 7.15. Outer code blocks are commutated on a byte basis into twelve parallel channels. Each channel contains a different amount of delay, typically achieved by a ring-buffer RAM. The delays are integer multiples of 17 bytes, designed to skew the data by one outer block (12×17=204). Following the delays, a commutator reassembles interleaved outer blocks. These have 204 bytes as before, but the effect of the interleave is that adjacent bytes in the input are 17 bytes apart in the output. Each output block contains data from twelve input blocks making the data resistant to burst errors.
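The commutated-delay structure described above can be sketched with one FIFO per channel. The twelve channels and the 17-byte delay unit come from the text; the class name, the zero-filled initial state and the complementary delays used for de-interleaving are assumptions of this sketch.

```python
from collections import deque

BRANCHES = 12  # parallel channels fed by the commutator
UNIT = 17      # delay unit in bytes; 12 x 17 = 204, one outer block


class DelayCommutator:
    """Commutate bytes across FIFOs holding different amounts of delay."""

    def __init__(self, delays):
        # A FIFO of `cells` zeros delays its channel by `cells` visits.
        self.fifos = [deque([0] * cells) for cells in delays]
        self.branch = 0

    def push(self, byte):
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)
        fifo.append(byte)
        return fifo.popleft()


def make_interleaver():
    return DelayCommutator([j * UNIT for j in range(BRANCHES)])


def make_deinterleaver():
    # Complementary delays so every channel ends up with the same total.
    return DelayCommutator([(BRANCHES - 1 - j) * UNIT for j in range(BRANCHES)])


# Every byte crosses 11 x 17 cells in total, each visited once per 12 bytes.
END_TO_END_DELAY = BRANCHES * (BRANCHES - 1) * UNIT
```

Passing a stream through `make_interleaver()` and then `make_deinterleaver()` returns the original bytes delayed by `END_TO_END_DELAY`, while a burst of channel errors between the two is spread over many outer blocks.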
Following the interleave, the energy-dispersal process takes place. The pseudo-random sequence runs over eight outer blocks and is synchronized by inverting the transport stream packet sync symbol in every eighth block. The packet sync symbols are not randomized.
The inner coding process of DVB is convolutional. The output bit rate of the inner coder is twice the input bit rate. Under the worst reception conditions, this 100 per cent redundancy offers the most powerful correction with the penalty that a low data rate is delivered. However, a variety of inner redundancy factors can be used from 1/2 down to 1/8 of the transmitted bit rate.
The DVB standard allows the use of QPSK, 16-QUAM or 64-QUAM coding in an OFDM system. There are five possible inner code rates, and four different guard intervals which can be used with each modulation scheme. Thus for each modulation scheme there are twenty possible transport stream bit rates in the standard DVB channel, each of which requires a different receiver SNR. The broadcaster can select any suitable balance between transport stream bit rate and coverage area. For a given transmitter location and power, reception over a larger area may require a channel code with a smaller number of bits/s/Hz and this reduces the bit rate which can be delivered in a standard channel.
Alternatively a higher amount of inner redundancy means that the proportion of the transmitted bit rate which is data goes down. Thus for wider coverage the broadcaster will have to send fewer programs in the multiplex or use higher compression factors.
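The trade-off between bit rate and ruggedness can be tabulated with a short calculation. The carrier count, useful symbol period and the specific code rates and guard intervals below are assumptions describing one commonly quoted configuration (an 8K system in an 8 MHz channel), not figures given in the text.

```python
from fractions import Fraction
from itertools import product

# Assumed inner code rates and guard intervals (fractions of symbol period).
CODE_RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4),
              Fraction(5, 6), Fraction(7, 8)]
GUARDS = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 32)]

DATA_CARRIERS = 6048                        # assumed payload carriers
USEFUL_SYMBOL_S = Fraction(896, 1_000_000)  # assumed useful symbol period
OUTER_RATE = Fraction(188, 204)             # R-S outer code from the text


def transport_bit_rate(bits_per_carrier, code_rate, guard):
    """Transport stream bit rate in bits/s for one configuration."""
    symbol_period = USEFUL_SYMBOL_S * (1 + guard)
    raw_bits = DATA_CARRIERS * bits_per_carrier
    return float(raw_bits * code_rate * OUTER_RATE / symbol_period)


def rate_table(bits_per_carrier):
    """All combinations of inner code rate and guard interval."""
    return {
        (rate, guard): transport_bit_rate(bits_per_carrier, rate, guard)
        for rate, guard in product(CODE_RATES, GUARDS)
    }
```

Calling `rate_table(2)`, `rate_table(4)` or `rate_table(6)` gives the twenty options for QPSK, 16-QUAM and 64-QUAM respectively under these assumptions.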
7.4 The DVB receiver
Figure 7.16 shows a block diagram of a DVB receiver. The off-air RF signal is fed to a mixer driven by the local oscillator. The IF output of the mixer is bandpass filtered and supplied to the ADC which outputs a digital IF signal for the FFT stage. The FFT is analysed initially to find the higher-level pilot signals. If these are not in the correct channels the local oscillator frequency is incorrect and it will be changed until the pilots emerge from the FFT in the right channels. The data in the pilots will be decoded in order to tell the receiver how many carriers, what inner redundancy rate, guard interval and modulation scheme are in use in the remaining carriers. The FFT magnitude information is also a measure of the equalization required.
The FFT outputs are demodulated into 2K or 8K bitstreams and these are multiplexed to produce a serial signal. This is subject to inner error correction which corrects random errors. The data are then de-interleaved to break up burst errors and then the outer R–S error correction operates. The output of the R–S correction will then be de-randomized to become an MPEG transport stream once more. The de-randomizing is synchronized by the transmission of inverted sync patterns.
The receiver must select a PID of 0 and wait until a Program Association Table (PAT) is transmitted. This will describe the available programs by listing the PIDs of the Program Map Tables (PMT). By looking for these packets the receiver can determine what PIDs to select to receive any video and audio elementary streams.
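Selecting packets by PID starts with reading the four-byte packet header. The sketch below assumes the usual transport packet layout (a sync byte of 0x47 followed by a 13-bit PID spread over the next two bytes); the function name and the returned dictionary are illustrative choices.

```python
def parse_ts_header(packet):
    """Extract the fields needed to route a 188-byte transport packet."""
    if len(packet) != 188:
        raise ValueError("transport stream packets are 188 bytes long")
    if packet[0] != 0x47:
        raise ValueError("sync byte not found")
    return {
        # Set when a new table section or PES packet begins in this packet.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier: 5 bits from byte 1, 8 bits from byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }
```

A receiver would first keep only packets whose `pid` is 0 to obtain the PAT, then repeat the filtering with the PMT PIDs listed there.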
When an elementary stream is selected, some of the packets will have extended headers containing Program Clock Reference (PCR). These codes are used to synchronize the 27MHz clock in the receiver to the one in the MPEG encoder of the desired program. The 27MHz clock is divided down to drive the time stamp counter so that audio and video emerge from the decoder at the correct rate and with lip-sync.
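The relationship between the 27MHz clock and the time stamp counter can be sketched as below. The division ratio of 300 (giving a 90 kHz time stamp count) and the split of the PCR into a base and an extension are assumptions drawn from common MPEG-2 systems practice rather than from the text; the function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000
DIVIDER = 300  # assumed ratio between the system clock and time stamp count
TIME_STAMP_HZ = SYSTEM_CLOCK_HZ // DIVIDER


def pcr_ticks(base, extension):
    """Combine the PCR base (time stamp units) and extension (0..299)."""
    if not 0 <= extension < DIVIDER:
        raise ValueError("extension counts 27 MHz cycles within one base tick")
    return base * DIVIDER + extension


def pcr_seconds(base, extension=0):
    """PCR value expressed in seconds of the encoder's clock."""
    return pcr_ticks(base, extension) / SYSTEM_CLOCK_HZ
```

The receiver compares successive PCR values with its own counter and steers its 27MHz oscillator until the two advance at the same rate.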
It should be appreciated that time stamps are relative, not absolute. The time stamp count advances by a fixed amount each picture, but the exact count is meaningless. Thus the decoder can only establish the frame rate of the video from time stamps, but not the precise timing. In practice the receiver has finite buffering memory between the demultiplexer and the MPEG decoder. If the displayed video timing is too late, the buffer will tend to overflow whereas if the displayed video timing is too early the decoding may not be completed. The receiver can advance or retard the time stamp counter during lock-up so that it places the output timing mid-way between these extremes.
7.5 ATSC
The ATSC system is an alternative way of delivering a transport stream, but it is considerably less sophisticated than DVB, and supports only one transport stream bit rate of 19.28 Mbits/s. If any change in the service area is needed, this will require a change in transmitter power.
Figure 7.17 shows a block diagram of an ATSC transmitter. Incoming transport stream packets are randomized, except for the sync pattern, for energy dispersal. Figure 7.18 shows the randomizer.
The outer correction code includes the whole packet except for the sync byte. Thus there are 187 bytes of data in each codeword and 20 bytes of R–S redundancy are added to make a 207-byte codeword. After outer coding, a convolutional interleaver shown in Figure 7.19 is used. This reorders data over a time span of about 4 ms. Interleave simply exchanges content between packets, but without changing the packet structure.
Figure 7.20 shows that the result of outer coding and interleave is a data frame which is divided into two fields of 313 segments each. The frame is transmitted by scanning it horizontally a segment at a time. There is some similarity with a traditional analog video signal here, because there is a sync pulse at the beginning of each segment and a field sync which occupies two segments of the frame. Data segment sync repeats every 77.3μs, a segment rate of 12 933Hz, whereas a frame has a period of 48.4ms. The field sync segments contain a training sequence to drive the adaptive equalizer in the receiver.
The data content of the frame is subject to trellis coding which converts each pair of data bits into three channel bits inside an inner interleave. The trellis coder is shown in Figure 7.21 and the interleave in Figure 7.22. Figure 7.21 also shows how the three channel bits map to the eight signal levels in the 8-VSB modulator.
Figure 7.23 shows the data segment after eight-level coding. The sync pattern of the transport stream packet, which was not included in the error-correction code, has been replaced by a segment sync waveform. This acts as a timing reference to allow deserializing of the segment, but as the two levels of the sync pulse are standardized, it also acts as an amplitude reference for the eight-level slicer in the receiver.
The eight-level signal is subject to a DC offset so that some transmitter energy appears at the carrier frequency to act as a pilot. Each eight-level symbol carries two data bits and so there are 832 symbols in each segment. As the segment rate is 12 933Hz, the symbol rate is 10.76MHz and so this will require 5.38MHz of bandwidth in a single sideband.
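The figures quoted for ATSC can be cross-checked with a little arithmetic. The sketch below assumes that the segment sync occupies the four-symbol slot left by the removed sync byte and that one segment in each field is the field sync; the variable names are illustrative.

```python
CODEWORD_BYTES = 207           # 187 data bytes + 20 R-S bytes
PAYLOAD_BYTES = 187
DATA_BITS_PER_SYMBOL = 2       # after trellis coding
SEGMENT_RATE_HZ = 12_933
SEGMENTS_PER_FIELD = 313       # assumed: 312 data segments + 1 field sync

# The sync byte slot (8 bits) is replaced by a segment sync of equal length.
sync_symbols = 8 // DATA_BITS_PER_SYMBOL
data_symbols = CODEWORD_BYTES * 8 // DATA_BITS_PER_SYMBOL
symbols_per_segment = data_symbols + sync_symbols

symbol_rate_hz = symbols_per_segment * SEGMENT_RATE_HZ
minimum_bandwidth_hz = symbol_rate_hz / 2
frame_period_s = 2 * SEGMENTS_PER_FIELD / SEGMENT_RATE_HZ

# Payload carried by the data segments, averaged over the whole field.
payload_bits_per_field = (SEGMENTS_PER_FIELD - 1) * PAYLOAD_BYTES * 8
payload_bit_rate = payload_bits_per_field * SEGMENT_RATE_HZ / SEGMENTS_PER_FIELD
```

Under these assumptions the segment length, symbol rate, frame period and payload bit rate all agree with the values quoted in this section to within rounding.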
Figure 7.24 shows the transmitter spectrum. The lower sideband is vestigial and an overall channel width of 6MHz results.
Figure 7.25 shows an ATSC receiver. The first stages of the receiver are designed to lock to the pilot in the transmitted signal. This then allows the eight-level signal to be sampled at the right times. This process will allow location of the segment sync and then the field sync signals. Once the receiver is synchronized, the symbols in each segment can be decoded. The inner or trellis decoder corrects for random errors, then following de-interleave the R–S decoder corrects burst errors. After de-randomizing, standard transport stream sync patterns are added to the output data.
In practice ATSC transmissions will experience co-channel interference from NTSC transmitters and the ATSC scheme allows the use of an NTSC rejection filter. Figure 7.26 shows that most of the energy of NTSC is at the carrier, subcarrier and sound carrier frequencies. A comb filter with a suitable delay can produce nulls or notches at these frequencies. However, the delay-and-add process in the comb filter also causes another effect. When two eight-level signals are added together, the result is a sixteen-level signal. This will be corrupted by noise of half the level that would corrupt an eight-level signal. However, the sixteen-level signal contains redundancy because it corresponds to the combinations of four bits whereas only two bits are being transmitted. This allows a form of error correction to be used.
The ATSC inner precoder results in a known relationship existing between symbols independent of the data. The time delays in the inner interleave are designed to be compatible with the delay in the NTSC rejection comb filter. This limits the number of paths the received waveform can take through a time/voltage graph called a trellis. Where a signal is in error it takes a path sufficiently near to the correct one that the correct one can be implied.
ATSC uses a training sequence sent once every data field, but is otherwise helpless against multipath reception as tests have shown. In urban areas, ATSC must have a correctly oriented directional antenna to reject reflections. Unfortunately the American viewer has been brought up to believe that television reception is possible with a pair of ‘rabbit’s ears’ on top of the TV set and ATSC will not work like this. Mobile reception is not practicable. As a result the majority of the world’s broadcasters appear to be favouring an OFDM-based system.
7.6 CD-Video and DVD
CD-Video and DVD are both digital optical disks which store video and audio program material. CD-Video is based on the Compact Disc, with which it shares the same track dimensions and bit rate. MPEG-1 coding is used on video signals adhering to the common intermediate format (CIF – see Chapter 1). The input subsampling needed to obtain the CIF imaging and the high compression factor needed to comply with the low bit rate of a digital audio disk mean that the quality of CD-Video is marginal.
DVD is based on the same concepts as Compact Disc, but with a number of enhancements. The laser in the pickup has a shorter wavelength. In conjunction with a larger lens aperture this allows the readout spot to be reduced significantly in size. This in turn allows the symbols on the tracks to be shorter as well as allowing the track spacing to be reduced. The EFM code of CD is modified to EFMPlus which improves the raw error rate.
The greatly increased capacity of DVD means that moderate compression factors can be employed. MPEG-2 coding is used, so that progressively scanned or interlaced material can be handled. As most DVDs are mastered from movie film, the use of progressive scan is common.
The perceived quality of DVDs can be very high indeed. This is partly due to the fact that a variable bit rate is supported. When difficult frames are encountered, the bit rate can rise above the average. Another advantage of DVD is that it can be mastered in non-real time. This allows techniques such as aligning cuts with I-pictures to be used. Also the quality obtained can be assessed and if artifacts are observed the process can be modified.
Figure 7.27 shows the block diagram of a typical DVD player, and illustrates the essential components. A CD-Video player is similar. The most natural division within the block diagram is into the control/servo system and the data path. The control system provides the interface between the user and the servo mechanisms, and performs the logical interlocking required for safety and the correct sequence of operation.
The servo systems include any power-operated loading drawer and chucking mechanism, the spindle-drive servo, and the focus and tracking servos already described.
Power loading is usually implemented on players where the disk is placed in a drawer. Once the drawer has been pulled into the machine, the disk is lowered onto the drive spindle, and clamped at the centre, a process known as chucking. In the simpler top-loading machines, the disk is placed on the spindle by hand, and the clamp is attached to the lid so that it operates as the lid is closed. The lid or drawer mechanisms have a safety switch which prevents the laser operating if the machine is open. This is to ensure that there can be no conceivable hazard to the user. In actuality there is very little hazard in a DVD pickup. This is because the beam is focused a few millimetres away from the objective lens, and beyond the focal point the beam diverges and the intensity falls rapidly. It is almost impossible to position the eye at the focal point when the pickup is mounted in the player, but it would be foolhardy to attempt to disprove this.
The data path consists of the data separator, the de-interleaving and error-correction process followed by a RAM buffer which supplies the MPEG decoder.
The data separator converts the EFMPlus readout waveform into data. Following data separation the error-correction and de-interleave processes take place. Because of the interleave system, there are two opportunities for correction: first, using the inner code prior to de-interleaving, and second, using the outer code after de-interleaving.
Interleaving is designed to spread the effects of burst errors among many different codewords, so that the errors in each are reduced. However, the process can be impaired if a small random error, due perhaps to an imperfection in manufacture, occurs close to a burst error caused by surface contamination. The function of the inner redundancy is to correct single-symbol errors, so that the power of interleaving to handle bursts is undiminished, and to generate error flags for the outer system when a gross error is encountered.
The EFMPlus coding is a group code, which means that a small defect which changes one channel pattern into another could have corrupted up to eight data bits. In the worst case, if the small defect is on the boundary between two channel patterns, two successive bytes could be corrupted. However, the final odd/even interleave on encoding ensures that the two bytes damaged will be in different inner codewords; thus a random error can never corrupt two bytes in one inner codeword, and random errors are therefore always correctable.
The de-interleave process is achieved by writing sequentially into a memory and reading out using a sequencer. The outer decoder will then correct any burst errors in the data. As MPEG data are very sensitive to error the error-correction performance has to be extremely good. Following the de-interleave and outer error-correction process an MPEG program stream (see Chapter 6) emerges. Some of the program stream data will be video, some will be audio and this will be routed to the appropriate decoder. It is a fundamental concept of DVD that the bit rate of this program stream is not fixed, but can vary with the difficulty of the program material in order to maintain consistent image quality. However, the disk uses constant linear velocity rather than constant angular velocity. It is not possible to obtain a particular bit rate with a fixed spindle speed.
The solution is to use a RAM buffer between the transport and the MPEG decoders. The RAM is addressed by counters which are arranged to overflow, giving the memory a ring structure. Writing into the memory is done using clocks derived from the disk whose frequency rises and falls with runout, whereas reading is done by the decoder which, for each picture, will take as much data as is required from the buffer.
The buffer will only function properly if the two addresses are kept apart. This implies that the amount of data read from the disk over the long term must equal the amount of data used by the MPEG decoders. This is done by analysing the address relationship of the buffer. If the bit rate from the disk is too high, the write address will move towards the read address; if it is too low, the write address moves away from the read address. Subtraction of the two addresses produces an error signal which can be fed to the drive.
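The address relationship described above can be sketched as follows. This is a minimal model, not a description of any particular player: the class and method names are illustrative, the buffer size is arbitrary, and the choice of "half full" as the target is an assumption made for the example.

```python
class RingBuffer:
    """Ring memory addressed by two counters that wrap round (illustrative)."""

    def __init__(self, size):
        self.size = size
        self.write_addr = 0   # advanced by clocks derived from the disk
        self.read_addr = 0    # advanced as the decoder takes data

    def occupancy(self):
        # Subtracting the addresses modulo the memory size gives the
        # amount of data waiting to be read.
        return (self.write_addr - self.read_addr) % self.size

    def write(self, n):
        # One location is kept free so that "full" and "empty" differ.
        if self.occupancy() + n >= self.size:
            raise OverflowError("write address has caught up with read address")
        self.write_addr = (self.write_addr + n) % self.size

    def read(self, n):
        if n > self.occupancy():
            raise ValueError("underflow: decoder wants more data than is stored")
        self.read_addr = (self.read_addr + n) % self.size

    def error_signal(self):
        # Positive: disk is delivering too fast. Negative: too slowly.
        # Half full is an assumed target for this sketch.
        return self.occupancy() - self.size // 2


buf = RingBuffer(1024)
buf.write(600)
buf.read(100)
error_slow = buf.error_signal()   # occupancy 500, just below half full
buf.write(300)
error_fast = buf.error_signal()   # occupancy 800, well above half full
```

A drive controller could use the sign of the error signal to decide whether to change the disk speed or repeat a track; the sketch deliberately leaves that decision out.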
In some DVD drives, the disk spins at a relatively high speed, resulting in an excessively high continuous data rate. However, this is reduced by jumping the pickup back. Repeating a previous track does not produce any new data and so the average rate falls. As all the disk blocks are labelled it is a simple matter to reassemble the discontinuous bitstream in memory.
The exact speed of the motor is unimportant. The important factor is that the data rate needed by the decoder is correct, and the system will skip the pickup as often as necessary so that the buffer neither underflows nor overflows.
The MPEG-2 decoder will convert the compressed elementary streams into PCM video and audio and place the pictures and audio blocks into RAM. These will be read out of RAM whenever the time stamps recorded with each picture or audio block match the state of a time stamp counter. If bidirectional coding is used, the RAM readout sequence will convert the recorded picture sequence back to the real-time sequence. The time stamp counter is derived from a crystal oscillator in the player which is divided down to provide the 90 kHz time stamp clock. As a result the frame rate at which the disk was mastered will be replicated as the pictures are read from RAM. Once a picture buffer is read out, this will trigger the decoder to decode another picture. It will read data from the buffer until this has been completed and thus indirectly influence the disk bit rate.
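The role of the time stamps can be illustrated with a short sketch. It assumes 25 Hz material and a small bidirectionally coded group; the picture names and the helper functions are hypothetical, and only the 90 kHz clock comes from the text.

```python
from fractions import Fraction

TIMESTAMP_CLOCK_HZ = 90_000  # time stamp clock rate given in the text


def ticks_per_picture(frame_rate):
    """Number of 90 kHz ticks between successive pictures."""
    return Fraction(TIMESTAMP_CLOCK_HZ) / Fraction(frame_rate)


def readout_order(decoded):
    """Sort (name, time_stamp) pairs from decode order into display order."""
    return [name for name, stamp in sorted(decoded, key=lambda item: item[1])]


step = int(ticks_per_picture(25))  # assumed 25 Hz frame rate

# Pictures in the order they are decoded: the P picture arrives before
# the B pictures that depend on it, but carries a later time stamp.
decoded = [("I0", 0 * step), ("P3", 3 * step), ("B1", 1 * step), ("B2", 2 * step)]
display = readout_order(decoded)
print(display)
```

Because readout is driven by the time stamps alone, the reordering falls out of the comparison with the counter and needs no separate mechanism.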
Owing to the use of constant linear velocity, the disk speed will be wrong if the pickup is suddenly made to jump to a different radius using manual search controls. This may force the data separator out of lock, or cause a buffer overflow, and the decoder may freeze briefly until this has been remedied.
The control system of a DVD player is inevitably microprocessor-based, and as such does not differ greatly in hardware terms from any other microprocessor-controlled device. Operator controls will simply interface to processor input ports and the various servo systems will be enabled or overridden by output ports. Software, or more correctly firmware, connects the two. The necessary controls are Play and Eject, with the addition in most players of at least Pause and some buttons which allow rapid skipping through the program material.
Although machines vary in detail, the flowchart of Figure 7.28 shows the logic flow of a simple player, from start being pressed to pictures and sound emerging. At the beginning, the emphasis is on bringing the various servos into operation. Towards the end, the disk subcode is read in order to locate the beginning of the first section of the program material.
When track following, the tracking-error feedback loop is closed, but for track crossing, in order to locate a piece of music, the loop is opened, and a microprocessor signal forces the laser head to move. The tracking error becomes an approximate sinusoid as tracks are crossed. The cycles of tracking error can be counted as feedback to determine when the correct number of tracks has been crossed. The ‘mirror’ signal obtained when the readout spot is half a track away from target is used to brake pickup motion and re-enable the track following feedback.
7.7 Personal video recorders
The development of the consumer VCR was a small step to end the linearity of commercial television. Increasing numbers of viewers use VCRs not just as time shifters but also as a means of fast-forwarding through the commercial breaks. The non-linear storage of video was until recently restricted by economics to professional applications. However, with the falling cost of hard disk drives and the availability of MPEG compression, non-linear video storage is now a consumer product.
The personal video recorder (PVR) is based on the random access storage of a hard disk drive. As disk drives can move their heads from track to track across the disk surface, they can access data anywhere very quickly. Unlike tape, which can only record or play back but not both at the same time, a PVR can do both simultaneously at arbitrary points on a time line.
Figure 7.29 shows that the disk drive can transfer data much faster than the required bit rate, and so it transfers data in bursts which are smoothed out by RAM buffers. It is straightforward to interleave read and write functions so that it gives the impression of reading and writing simultaneously. The read and write processes can take place from anywhere on the disk.
Although the PVR can be used as an ordinary video recorder, it can do some other tricks. Figure 7.30 shows the most far-reaching trick. The disk drive starts recording an off-air commercial TV station. A few minutes later the viewer starts playing back the recording. When the commercial break is transmitted, the disk drive may record it, but the viewer can skip over it using the random access of the hard drive. With suitable software the hard drive could skip over the commercial break automatically by simply not recording it.
When used with digital television systems, the PVR can simply record the transmitted transport stream data and replay it into an MPEG decoder. In this way the PVR has no quality loss whatsoever. The picture quality will be the same as off-air. Optionally PVRs may also have MPEG encoders so that they can record analog video inputs.
Real-time playback in a PVR is trivial as it is just a question of retrieving recorded data as the decoder buffer demands it. However, the consumer is accustomed to the ability of the VCR to produce a picture over a range of speeds as well as a freeze frame. PVRs incorporate additional processing which modifies the recorded MPEG bitstream on playback so that a standard MPEG decoder will reproduce it with a modified timebase. Figure 7.31 shows that a typical off-air MPEG elementary stream uses bidirectional coding with a moderately long GOP structure. An increase in apparent playback speed can be obtained by discarding some of the pictures in the group. This has to be done with caution as the pictures are heavily interdependent. However, if the group is terminated after a P picture, all pictures up to that point are decodable.
The resultant motion portrayal will not be smooth because pictures are being omitted, but the goal is simply to aid the viewer in finding the right place in the recording so this is of no consequence. In practice the results are good, not least because the noise bar of the analog VCR is eliminated.
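The rule that a group may safely be cut after a P picture can be sketched as below. The group is represented simply as a list of picture types in display order, which is an assumption made for illustration; a real PVR would operate on the coded bitstream.

```python
def truncate_gop(display_order, max_pictures):
    """Return the longest prefix of at most max_pictures ending on I or P.

    Every B picture needs an anchor picture on each side, so a prefix
    that ends on an I or P picture contains nothing undecodable.
    """
    prefix = display_order[:max_pictures]
    while prefix and prefix[-1] == "B":
        prefix = prefix[:-1]
    return prefix


gop = list("IBBPBBPBBPBB")        # hypothetical 12-picture group
shortened = truncate_gop(gop, 8)  # keep at most 8 pictures
print("".join(shortened))
```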
High forward speeds can be obtained by selecting only the I pictures from the recorded bitstream. These cannot be transmitted to the decoder directly, because the I pictures contain a lot of data and the decoder buffer might overflow. Consequently the data rate is diluted by sending null P pictures. These are P pictures in which all of the vectors are zero and the residual data are also zero. Null P pictures require very few data but have the effect of converting an I picture into another identical picture. A similar technique can be used to decode in reverse. With a normal MPEG bidirectionally coded bitstream, reverse decoding is impossible, but by taking I pictures in reverse order and padding them with null-P pictures a reverse replay is obtained.
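A sketch of the I-picture technique follows. Pictures are represented by name only, and the number of null P pictures inserted after each I picture is a free parameter here; in practice it would be chosen to suit the decoder buffer, which this example does not model.

```python
NULL_P = "nullP"  # stands for a P picture with zero vectors and zero residual


def fast_forward(coded_pictures, repeats):
    """Keep only I pictures, following each with `repeats` null P pictures."""
    out = []
    for picture in coded_pictures:
        if picture.startswith("I"):
            out.append(picture)
            out.extend([NULL_P] * repeats)
    return out


def fast_reverse(coded_pictures, repeats):
    """As fast_forward, but taking the I pictures in reverse order."""
    return fast_forward(list(reversed(coded_pictures)), repeats)


recorded = ["I0", "B1", "B2", "P3", "I4", "B5", "P6", "I8"]
forward = fast_forward(recorded, 2)
reverse = fast_reverse(recorded, 1)
print(forward)
print(reverse)
```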
7.8 Networks
A network is basically a communication resource which is shared for economic reasons. Like any shared resource, decisions have to be made somewhere and somehow about how the resource is to be used. In the absence of such decisions the resultant chaos will be such that the resource might as well not exist.
In communications networks the resource is the ability to convey data from any node or port to any other. On a particular cable, clearly only one transaction of this kind can take place at any one instant even though in practice many nodes will simultaneously be wanting to transmit data. Arbitration is needed to determine which node is allowed to transmit.
Data networks originated to serve the requirements of computers and it is a simple fact that most computer processes don’t need to be performed in real time or indeed at a particular time at all. Networks tend to reflect that background as many of them, particularly the older ones, are asynchronous. Asynchronous means that the time taken to deliver a given quantity of data is unknown. A TDM system may chop the data into several different transfers and each transfer may experience delay according to what other transfers the system is engaged in. Ethernet and most storage system buses are asynchronous. For broadcasting purposes an asynchronous delivery system is no use at all, but for copying an MPEG data file between two storage devices an asynchronous system is perfectly adequate.
The opposite extreme is the synchronous system in which the network can guarantee a constant delivery rate and a fixed and minor delay. An AES/EBU digital audio router or an SDI digital video router is a synchronous network. In between asynchronous and synchronous networks reside the isochronous approaches which cause a fixed moderate delay. These can be thought of as sloppy synchronous networks or more rigidly controlled asynchronous networks.
These three different approaches are needed for economic reasons. Asynchronous systems are very efficient because as soon as one transfer completes another one can begin. This can only be achieved by making every device wait with its data in a buffer so that transfer can start immediately. Asynchronous systems also make it possible for low bit rate devices to share a network with high bit rate devices. The low bit rate device will only need a small buffer and will send few cells, whereas the high bit rate device will send more cells.
Isochronous systems try to give the best of both worlds, generally by sacrificing some flexibility in block size. Modern networks are tending to be part isochronous and part asynchronous so that the advantages of both are available.
There are a number of different arbitration protocols and these have evolved to support the needs of different types of network. In small networks, such as LANs, a single point failure which halts the entire network may be acceptable, whereas in a public transport network owned by a telecommunications company, the network will be redundant so that if a particular link fails data may be sent via an alternative route.
A link which has reached its maximum capacity may also be supplemented by transmission over alternative routes.
In physically small networks, arbitration may be carried out in a single location. This is fast and efficient, but if the arbitrator fails it leaves the system completely crippled. The processor buses in computers work in this way.
In centrally arbitrated systems the arbitrator needs to know the structure of the system and the status of all the nodes. Following a configuration change, due perhaps to the installation of new equipment, the arbitrator needs to be told what the new configuration is, or have a mechanism which allows it to explore the network and learn the configuration. Central arbitration is only suitable for small networks that change their configuration infrequently.
In other networks the arbitration is distributed so that some decision-making ability exists in every node. This is less efficient but it does allow at least some of the network to continue operating after a component failure.
Distributed arbitration also means that each node is self-sufficient and so no changes need to be made if the network is reconfigured by adding or deleting a node. This is the only possible approach in wide area networks where the structure may be very complex and changes dynamically in the event of failures or overload.
Ethernet uses distributed arbitration. FireWire is capable of using both types of arbitration. A small amount of decision-making ability is built into every node so that distributed arbitration is possible. However, if one of the nodes happens to be a computer, it can run a centralized arbitration algorithm.
The physical structure of a network is subject to some variation as Figure 7.32 shows. In radial networks, (a), each port has a unique cable connection to a device called a hub. The hub must have one connection for every port and this limits the number of ports. However, a cable failure will only result in the loss of one port. In a ring system (b) the nodes are connected like a daisy chain with each node acting as a feedthrough. In this case the arbitration requirement must be distributed. With some protocols, a single cable break doesn’t stop the network operating. Depending on the protocol, simultaneous transactions may be possible provided they don’t require the same cable. For example, in a storage network a disk drive may be outputting data to an editor whilst another drive is backing up data to a tape streamer. For the lowest cost, all nodes are physically connected in parallel to the same cable. Figure 7.32(c) shows that a cable break would divide the network into two halves, but it is possible that the impedance mismatch at the break could stop both halves working.
One of the concepts involved in arbitration is priority, which is fundamental to providing an appropriate quality of service. If two processes both want to use a network, the one with the higher priority would normally go first.
Attributing priority must be done carefully because some of the results are non-intuitive. For example, it may be beneficial to give a high priority to a humble device which has a low data rate for the simple reason that if it is given use of the network it won’t need it for long. In a television environment transactions concerned with on-air processes would have priority over file transfers concerning production and editing.
When a device gains access to the network to perform a transaction, generally no other transaction can take place until it has finished. Consequently it is important to limit the amount of time that a given port can stay on the bus. In this way when the time limit expires, a further arbitration must take place. The result is that the network resource rotates between transactions rather than one transfer hogging the resource and shutting everyone else out. It follows from the presence of a time (or data quantity) limit that ports must have the means to break large files up into frames or cells and reassemble them on reception. This process is sometimes called adaptation. If the data to be sent originally exist at a fixed bit rate, some buffering will be needed so that the data can be time-compressed into the available frames. Each frame must be contiguously numbered and the system must transmit a file size or word count so that the receiving node knows when it has received every frame in the file.
The error-detection system interacts with this process because if any frame is in error on reception, the receiving node can ask for a retransmission of the frame. This is more efficient than retransmitting the whole file. Figure 7.33 shows the mechanism of retransmission where one packet has been lost or is in error. The retransmitted packet is inserted at the correct place in the bitstream by the receiver.
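The adaptation and retransmission process can be sketched as follows. This is an illustrative model only: the frame layout (sequence number, payload, check word) and the function names are assumptions, and a CRC-32 from the Python standard library stands in for whatever error-detection code a real network uses.

```python
import zlib


def fragment(data, frame_size):
    """Break data into numbered frames of (sequence, payload, check word)."""
    frames = []
    for seq, offset in enumerate(range(0, len(data), frame_size)):
        payload = data[offset:offset + frame_size]
        frames.append((seq, payload, zlib.crc32(payload)))
    return frames


def receive(frames, store):
    """Store frames whose check word matches; discard the rest."""
    for seq, payload, check in frames:
        if zlib.crc32(payload) == check:
            store[seq] = payload


def missing(store, frame_count):
    """Sequence numbers the receiver must ask to have retransmitted."""
    return [seq for seq in range(frame_count) if seq not in store]


data = bytes(range(10))
frames = fragment(data, 4)

# Simulate frame 1 being corrupted in transit.
damaged = list(frames)
damaged[1] = (1, b"\x00\x00\x00\x00", frames[1][2])

store = {}
receive(damaged, store)
wanted = missing(store, len(frames))

# Only the frames in error are sent again, not the whole file.
receive([frames[seq] for seq in wanted], store)
rebuilt = b"".join(store[seq] for seq in range(len(frames)))
```

Keying the store by sequence number is what lets the receiver insert a retransmitted frame at the correct place regardless of when it arrives.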
Breaking files into frames helps to keep down the delay experienced by each process using the network. Figure 7.34 shows a frame-multiplexed transmission. Each frame may be stored ready for transmission in a silo memory. It is possible to make the priority a function of the number of frames in the silo, as this is a direct measure of how long a process has been kept waiting. Isochronous systems must do this in order to meet maximum delay specifications.
When delivering an MPEG bitstream over an isochronous network, the network transmitter and receiver will require buffering over and above that provided in the MPEG encoder and decoder. This will enable the receiver to re-create a bitstream at the decoder which has essentially the same temporal characteristics as that leaving the encoder, except for a fixed delay. The greater the maximum delay of the network, the greater that fixed delay will have to be. Providing buffering is expensive and delays cause difficulty in applications such as videoconferencing. In this case it is better to use a network protocol which can limit delay by guaranteeing bandwidth.
A central arbitrator is relatively simple to implement because when all decisions are taken centrally there can be no timing difficulty (assuming a well-engineered system). In a distributed system, there is an extra difficulty due to the finite time taken for signals to travel down the data paths between nodes.
Figure 7.35 shows the structure of Ethernet which uses a protocol called CSMA/CD (carrier sense multiple access with collision detect) developed by DEC and Xerox. This is a distributed arbitration network where each node follows some simple rules. The first of these is not to transmit if an existing bus signal is detected. The second is not to transmit more than a certain quantity of data before releasing the bus. Devices wanting to use the bus will see bus signals and so will wait until the present bus transaction finishes. This must happen at some point because of the frame size limit. When the frame is completed, signalling on the bus should cease. The first device to sense the bus becoming free and to assert its own signal will prevent any other nodes transmitting according to the first rule. Where numerous devices are present it is possible to give them a priority structure by providing a delay between sensing the bus coming free and beginning a transaction. High-priority devices will have a short delay so they get in first. Lower-priority devices will only be able to start a transaction if the high-priority devices don’t need to transfer.
It might be thought that these rules would be enough and everything would be fine. Unfortunately the finite signal speed means that there is a flaw in the system. Figure 7.35 shows why. Device A is transmitting and devices B and C both want to transmit and have equal priority. At the end of A’s transaction, devices B and C see the bus become free at the same instant and start a transaction. With two devices driving the bus, the resultant waveform is meaningless. This is known as a collision and all nodes must have means to recover from it. First, each node will read the bus signal at all times. When a node drives the bus, it will also read back the bus signal and compare it with what was sent. Clearly if the two are the same all is well, but if there is a difference, this must be because a collision has occurred and two devices are trying to determine the bus voltage at once.
If a collision is detected, both colliding devices will sense the disparity between the transmitted and readback signals, and both will release the bus to terminate the collision. However, there is no point in adhering to the simple protocol to reconnect because this will simply result in another collision. Instead each device has a built-in delay which must expire before another attempt is made to transmit. This delay is not fixed, but is controlled by a random number generator and so changes from transaction to transaction.
The probability of two devices arriving at the same delay is very small. Consequently if a collision does occur, both devices will drop the bus, and they will start their back-off timers. When the first timer expires, that device will transmit and the other will see the transmission and remain silent. In this way the collision is not only handled, but is prevented from happening again.
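The random back-off can be modelled in a few lines. This is a simplified sketch of the idea described above: the range of delays is an arbitrary assumption, and real Ethernet uses a more elaborate rule for choosing the delay which is not modelled here.

```python
import random


def back_off(nodes, rng, max_delay=1024):
    """Return (winner, rounds) after colliding nodes draw random delays.

    The node whose timer expires first transmits; the others see the bus
    in use and stay silent. If the shortest delay is shared, there is
    another collision and every node draws again.
    """
    rounds = 0
    while True:
        rounds += 1
        delays = {node: rng.randrange(max_delay) for node in nodes}
        shortest = min(delays.values())
        first = [node for node, delay in delays.items() if delay == shortest]
        if len(first) == 1:
            return first[0], rounds


rng = random.Random(1)  # seeded only so that the example is repeatable
winner, rounds = back_off(["B", "C"], rng)
print(winner, rounds)
```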
The performance of Ethernet is usually specified in terms of the bit rate at which the cabling runs. However, this rate is academic because it is not available all of the time. In a real network bit rate is lost by the need to send headers and error-correction codes and by the loss of time due to interframe spaces and collision handling. As the demand goes up, the number of collisions increases and throughput goes down. Collision-based arbitrators do not handle congestion well.
An alternative method of arbitration developed by IBM is shown in Figure 7.36. This is known as a token ring system. All the nodes have an input and an output and are connected in a ring which must be complete for the system to work. Data circulate in one direction only. If data are not addressed to a node which receives them, the data will be passed on. When the data arrive at the addressed node, that node will capture the data as well as passing them on with an acknowledge added. Thus the data packet travels right around the ring back to the sending node. When the sending node receives the acknowledge, it will transmit a token packet. This token packet passes to the next node, which will pass it on if it does not wish to transmit. If no device wishes to transmit, the token will circulate endlessly. However, if a device has data to send, it simply waits until the token arrives again and captures it. This node can now transmit data in the knowledge that there cannot be a collision because no other node has the token.
In simple token ring systems, the transmitting node transmits idle characters after the data packet has been sent in order to maintain synchronization. The idle character transmission will continue until the acknowledge arrives. In the case of long packets the acknowledge will arrive before the packet has all been sent and no idle characters are necessary. However, with short packets idle characters will be generated. These idle characters use up ring bandwidth.
Later token ring systems use early token release (ETR). After the packet has been transmitted, the sending node sends a token straight away. Another node wishing to transmit can do so as soon as the current packet has passed. It might be thought that the nodes on the ring would transmit in their physical order, but this is not the case because a priority system exists. Each node can have a different priority if necessary. If a high-priority node wishes to transmit, as a packet from elsewhere passes through that node, the node will set reservation bits with its own priority level. When the sending node finishes and transmits a token, it will copy that priority level into the token.
In this way nodes with a lower priority level will pass the token on instead of capturing it. The token will ultimately arrive at the high-priority node.
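The effect of the reservation bits can be sketched as below. The ring is modelled as a list of dictionaries, and the field names, priority values and functions are all hypothetical; the sketch shows only which node captures the token, not the packet traffic itself.

```python
def reserve(ring):
    """Highest priority among waiting nodes, as left in the reservation bits."""
    waiting = [node["priority"] for node in ring if node["pending"]]
    return max(waiting, default=0)


def next_sender(ring, token_at, token_priority):
    """Pass the token once round the ring from position token_at.

    Returns the index of the first waiting node whose priority is at
    least that of the token, or None if no node captures it.
    """
    count = len(ring)
    for step in range(count):
        index = (token_at + step) % count
        node = ring[index]
        if node["pending"] and node["priority"] >= token_priority:
            return index
    return None


ring = [
    {"name": "A", "priority": 0, "pending": False},  # has just transmitted
    {"name": "B", "priority": 1, "pending": True},
    {"name": "C", "priority": 3, "pending": True},
    {"name": "D", "priority": 0, "pending": True},
]

# Without reservation the token is captured in physical order.
plain = ring[next_sender(ring, 1, 0)]["name"]

# With the reserved priority copied into the token, B must pass it on.
prioritised = ring[next_sender(ring, 1, reserve(ring))]["name"]
print(plain, prioritised)
```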
The token ring system has the advantage that it does not waste throughput with collisions and so the full capacity is always available. However, if the ring is broken the entire network fails.
In Ethernet the performance is degraded by the number of transactions, not the number of nodes, whereas in token ring the performance is degraded by the number of nodes.
7.9 FireWire
FireWire [3] is actually an Apple Computer Inc. trade name for the interface which is formally known as IEEE 1394-1995. It was originally intended as a digital audio network, but grew out of recognition. FireWire is more than just an interface as it can be used to form networks and if used with a computer effectively extends the computer’s data bus. Figure 7.37 shows that devices are simply connected together as any combination of daisy-chain or star network.
Any pair of devices can communicate in either direction, and arbitration ensures that only one device transmits at once. Intermediate devices simply pass on transmissions. This can continue even if the intermediate device is powered down as the FireWire carries power to keep repeater functions active.
Communications are divided into cycles which have a period of 125μs. During a cycle, there are 64 time slots. During each time slot, any one node can communicate with any other, but in the next slot, a different pair of nodes may communicate. Thus FireWire is best described as a time division multiplexed (TDM) system. There will be a new arbitration between the nodes for each cycle.
FireWire is eminently suitable for video/computer convergent applications because it can simultaneously support asynchronous transfers of non-real-time computer data and isochronous transfers of real-time audio/video data. It can do this because the arbitration process allocates a fixed proportion of slots for isochronous data (about 80 per cent) and these have a higher priority in the arbitration than the asynchronous data. The higher the data rate a given node needs, the more time slots it will be allocated. Thus a given bit rate can be guaranteed throughout a transaction; a prerequisite of real-time A/V data transfer.
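Taking the slot model of the text at face value, the allocation of isochronous capacity can be sketched as follows. The cycle period, slot count and 80 per cent share come from the text; the allocation policy (first come, first served), the function names and the example bit rates are assumptions, and packet overheads are ignored.

```python
CYCLE_PERIOD_US = 125        # from the text
SLOTS_PER_CYCLE = 64         # from the text
ISOCHRONOUS_SHARE = 0.8      # "about 80 per cent", from the text

CYCLES_PER_SECOND = 1_000_000 // CYCLE_PERIOD_US


def slots_needed(stream_rate, bus_rate):
    """Slots per cycle needed to carry stream_rate on a bus of bus_rate (bits/s)."""
    # Ceiling division: each slot carries bus_rate / SLOTS_PER_CYCLE bits/s.
    return (stream_rate * SLOTS_PER_CYCLE + bus_rate - 1) // bus_rate


def allocate(requests, bus_rate):
    """Grant slots in request order until the isochronous share is used up."""
    budget = int(SLOTS_PER_CYCLE * ISOCHRONOUS_SHARE)
    granted = {}
    for name, rate in requests:
        need = slots_needed(rate, bus_rate)
        if need <= budget:
            granted[name] = need
            budget -= need
    return granted, budget


granted, spare = allocate(
    [("camera", 25_000_000), ("mpeg", 6_000_000)],  # hypothetical streams
    400_000_000,                                     # 400 megabit/s bus
)
print(CYCLES_PER_SECOND, granted, spare)
```

Once slots have been granted they recur in every cycle, which is what allows the bit rate to be guaranteed for the duration of the transaction.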
It is the sophistication of the arbitration system which makes FireWire remarkable. Some of the arbitration is in hardware at each node, but some is in software which only needs to be at one node. The full functionality requires a computer somewhere in the system which runs the isochronous bus management arbitration. Without this only asynchronous transfers are possible. It is possible to add or remove devices whilst the system is working. When a device is added the system will recognize it through a periodic learning process. Essentially every node on the system transmits in turn so that the structure becomes clear.
The electrical interface of FireWire is shown in Figure 7.38. It consists of two twisted pairs for signalling and a pair of power conductors. The twisted pairs carry differential signals of about 220mV swinging around a common mode voltage of about 1.9V with an impedance of 112Ω. Figure 7.39 shows how the data are transmitted. The host data are simply serialized and used to modulate twisted pair A. The other twisted pair (B) carries a signal called strobe, which is the exclusive-OR of the data and the clock. Thus whenever a run of identical bits results in no transitions in the data, the strobe signal will carry transitions. At the receiver another exclusive-OR gate adds data and strobe to re-create the clock.
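The data/strobe relationship can be illustrated with a minimal sketch. It assumes the clock is modelled as a level which alternates once per bit period; the function names are hypothetical and no electrical behaviour is represented.

```python
def ds_encode(bits):
    """Return the (data, strobe) line levels for a sequence of bits.

    The clock is modelled as a level that alternates every bit period,
    and strobe is the exclusive-OR of data and clock.
    """
    data, strobe = [], []
    for i, bit in enumerate(bits):
        clock = i % 2
        data.append(bit)
        strobe.append(bit ^ clock)
    return data, strobe

def ds_recover_clock(data, strobe):
    """Re-create the clock at the receiver as data XOR strobe."""
    return [d ^ s for d, s in zip(data, strobe)]

bits = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
data, strobe = ds_encode(bits)
clock = ds_recover_clock(data, strobe)
print(data)
print(strobe)
print(clock)
```

In this model exactly one of the two lines changes state at every bit boundary, which is why a run of identical data bits still leaves the receiver with transitions to lock to.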
This signalling technique is subject to skew between the two twisted pairs and this limits cable lengths to about 10 metres between nodes. Thus FireWire is not a long-distance interface technique; instead it is very useful for interconnecting a large number of devices in close proximity. Using a copper interconnect, FireWire can run at 100, 200 or 400 megabits/s, depending on the specific hardware. It is proposed to create an optical fibre version which would run at gigabit speeds.
7.10 Broadband networks and ATM
Broadband ISDN (B-ISDN) is the successor to N-ISDN and in addition to providing more bandwidth, offers practical solutions to the delivery of any conceivable type of data. The flexibility with which ATM operates means that intermittent or one-off data transactions which only require asynchronous delivery can take place alongside isochronous MPEG video delivery. This is known as application independence whereby the sophistication of isochronous delivery does not raise the cost of asynchronous data. In this way, generic data, video, speech and combinations of the above can co-exist.
ATM is multiplexed, but it is not time-division multiplexed. TDM is inefficient because if a transaction does not fill its allotted bandwidth, the capacity is wasted. ATM does not offer fixed blocks of bandwidth, but allows infinitely variable bandwidth to each transaction. This is done by converting all host data into small fixed size cells at the adaptation layer. The greater the bandwidth needed by a transaction, the more cells per second are allocated to that transaction. This approach is superior to the fixed bandwidth approach, because if the bit rate of a particular transaction falls, the cells released can be used for other transactions so that the full bandwidth is always available.
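The relationship between bit rate and cell rate is simple arithmetic, sketched below. It assumes the 48-byte cell payload described in Section 7.11 and ignores any adaptation layer overhead within that payload; the function name and the example bit rates are hypothetical.

```python
CELL_PAYLOAD_BYTES = 48   # payload carried by each ATM cell

def cells_per_second(bit_rate_bps, payload_bytes=CELL_PAYLOAD_BYTES):
    """Cells per second needed to carry a given bit rate, rounded up."""
    bits_per_cell = payload_bytes * 8
    return -(-bit_rate_bps // bits_per_cell)

for rate in (64_000, 4_000_000, 6_000_000):
    print(rate, cells_per_second(rate))
```

If a transaction drops from 6 megabits/s to 4 megabits/s, the difference in cell rate becomes available to other transactions without any fixed allocation having to be changed.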
As all cells are identical in size, a multiplexer can assemble cells from many transactions in an arbitrary order. The exact order is determined by the quality of service required, where the time positioning of isochronous data would be determined first, with asynchronous data filling the gaps.
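The ordering principle can be sketched with a toy multiplexer. This is only an illustration under simplifying assumptions: time is divided into a short run of cell slots, each isochronous source wants one cell at a fixed interval, and asynchronous cells fill whatever remains. The function name, the source names and the slip-to-next-free-slot rule are all hypothetical.

```python
def multiplex(num_slots, isochronous, asynchronous):
    """Assign cells to slots: isochronous sources first, asynchronous fill.

    isochronous: dict mapping source name -> interval in slots.
    asynchronous: list of cell labels waiting to be sent.
    """
    slots = [None] * num_slots
    # Position the time-critical cells first.
    for name, interval in isochronous.items():
        for start in range(0, num_slots, interval):
            pos = start
            # If the ideal slot is taken, slip to the next free one.
            while pos < num_slots and slots[pos] is not None:
                pos += 1
            if pos < num_slots:
                slots[pos] = name
    # Asynchronous cells fill the gaps in arrival order.
    pending = list(asynchronous)
    for i in range(num_slots):
        if slots[i] is None and pending:
            slots[i] = pending.pop(0)
    return slots

print(multiplex(8, {"video": 2, "audio": 4}, ["d1", "d2", "d3"]))
```

In this example the third asynchronous cell is not sent because no free slot remains; it would simply wait for a later gap.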
Figure 7.40 shows how a broadband system might be implemented. The transport network would typically be optical fibre based, using SONET (synchronous optical network) or SDH (synchronous digital hierarchy). These standards differ in minor respects. Figure 7.41 shows the bit rates available in each. Lower bit rates will be used in the access networks which will use different technology such as xDSL.
SONET and SDH assemble ATM cells into a structure known as a container in the interests of efficiency. Containers are passed intact between exchanges in the transport network. The cells in a container need not belong to the same transaction; they simply need to be going the same way for at least one transport network leg.
The cell-routing mechanism of ATM is unusual and deserves explanation. In conventional networks, a packet must carry the complete destination address so that at every exchange it can be routed closer to its destination. The exact route by which the packet travels cannot be anticipated and successive packets in the same transaction may take different routes. This is known as a connectionless protocol.
In contrast, ATM is a connection oriented protocol. Before data can be transferred, the network must set up an end-to-end route. Once this is done, the ATM cells do not need to carry a complete destination address. Instead they only need to carry enough addressing so that an exchange or switch can distinguish between all of the expected transactions.
The end-to-end route is known as a virtual channel which consists of a series of virtual links between switches. The term virtual channel is used because the system acts like a dedicated channel even though physically it is not. When the transaction is completed the route can be dismantled so that the bandwidth is freed for other users. In some cases, such as delivery of a TV station’s output to a transmitter, or as a replacement for analog cable TV the route can be set up continuously to form what is known as a permanent virtual channel.
The addressing in the cells ensures that all cells with the same address take the same path, but owing to the multiplexed nature of ATM, at other times and with other cells a completely different routing scheme may exist. Thus the routing structure for a particular transaction always passes cells by the same route, but the next cell may belong to another transaction and will have a different address causing it to be routed in another way.
The addressing structure is hierarchical. Figure 7.42(a) shows the ATM cell and its header. The cell address is divided into two fields, the virtual channel identifier and the virtual path identifier. Virtual paths are logical groups of virtual channels which happen to be going the same way. An example would be the output of a video-on-demand server travelling to the first switch. The virtual path concept is useful because all cells in the same virtual path can share the same container in a transport network. A virtual path switch shown in Figure 7.42(b) can operate at the container level whereas a virtual channel switch (c) would need to dismantle and reassemble containers.
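The two address fields can be extracted from a cell header as in the following sketch. The field widths are not given in the text; the sketch assumes the standard 5-byte user-network interface header (4-bit generic flow control, 8-bit VPI, 16-bit VCI, 3-bit payload type, 1-bit cell loss priority and 8-bit header error control). The function names are hypothetical and the header error control byte is passed through rather than calculated.

```python
def build_uni_header(vpi, vci, pt=0, clp=0, gfc=0, hec=0):
    """Pack the fields into a 5-byte header (HEC is not computed here)."""
    b0 = (gfc << 4) | (vpi >> 4)
    b1 = ((vpi & 0x0F) << 4) | (vci >> 12)
    b2 = (vci >> 4) & 0xFF
    b3 = ((vci & 0x0F) << 4) | (pt << 1) | clp
    return bytes([b0, b1, b2, b3, hec])

def parse_uni_header(header):
    """Split a 5-byte header into its fields."""
    if len(header) != 5:
        raise ValueError("an ATM cell header is 5 bytes long")
    b0, b1, b2, b3, b4 = header
    return {
        "gfc": b0 >> 4,
        "vpi": ((b0 & 0x0F) << 4) | (b1 >> 4),
        "vci": ((b1 & 0x0F) << 12) | (b2 << 4) | (b3 >> 4),
        "pt": (b3 >> 1) & 0x07,
        "clp": b3 & 0x01,
        "hec": b4,
    }

header = build_uni_header(vpi=5, vci=0x1234)
print(parse_uni_header(header))
```

A virtual path switch need only examine the VPI field, whereas a virtual channel switch must examine both fields.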
When a route is set up, at each switch a table is created. When a cell is received at a switch the VPI and/or VCI code is looked up in the table and used for two purposes. First, the configuration of the switch is obtained, so that this switch will correctly route the cell. Second, the VPI and/or VCI codes may be updated so that they correctly control the next switch. This process repeats until the cell arrives at its destination.
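The table look-up can be sketched as below. The table layout, the inclusion of the input port in the key and all of the numeric values are assumptions made for illustration only.

```python
def add_route(table, in_port, vpi, vci, out_port, new_vpi, new_vci):
    """Create one table entry when a route is set up through this switch."""
    table[(in_port, vpi, vci)] = (out_port, new_vpi, new_vci)

def forward(table, in_port, vpi, vci):
    """Look up an arriving cell.

    Returns (output port, updated VPI, updated VCI), or None if no route
    has been set up for this address.
    """
    return table.get((in_port, vpi, vci))

# Two switches in series; the codes written by the first control the second.
switch_a, switch_b = {}, {}
add_route(switch_a, in_port=1, vpi=5, vci=100, out_port=3, new_vpi=7, new_vci=42)
add_route(switch_b, in_port=0, vpi=7, vci=42, out_port=2, new_vpi=1, new_vci=9)

out_port, vpi, vci = forward(switch_a, 1, 5, 100)
print(out_port, vpi, vci)
print(forward(switch_b, 0, vpi, vci))
```

The first element of the result configures the switch and the remaining two replace the address in the cell, which corresponds to the two purposes described above.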
In order to set up a path, the initiating device will initially send cells containing an ATM destination address, the bandwidth and quality of service required. The first switch will reply with a message containing the VPI/VCI codes which are to be used for this channel. The message from the initiator will propagate to the destination, creating look-up tables in each switch. At each switch the logic will add the requested bandwidth to the existing bandwidth in use to check that the requested quality of service can be met. If this succeeds for the whole channel, the destination will reply with a connect message which propagates back to the initiating device as confirmation that the channel has been set up. The connect message contains a unique call reference value which identifies this transaction. This is necessary because an initiator such as a file server may be initiating many channels and the connect messages will not necessarily return in the same order as the set-up messages were sent. The last switch will confirm receipt of the connect message to the destination and the initiating device will confirm receipt of the connect message to the first switch.
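The bandwidth check made at each switch can be sketched as follows. This is a simplification: the real set-up message propagates from switch to switch, whereas the sketch checks every switch on the route and then reserves the bandwidth in a second pass. The dictionary keys, the function name and the capacities are hypothetical, and quality of service is reduced to a single bandwidth figure.

```python
def set_up_channel(path, requested_bps):
    """Try to reserve bandwidth on every switch along the route.

    path: list of dicts with 'capacity_bps' and 'in_use_bps'.
    Returns True and reserves the bandwidth if every switch can accept
    the request; otherwise returns False and reserves nothing.
    """
    for switch in path:
        if switch["in_use_bps"] + requested_bps > switch["capacity_bps"]:
            return False          # set-up refused at this switch
    for switch in path:
        switch["in_use_bps"] += requested_bps
    return True                   # corresponds to the connect message

path = [
    {"capacity_bps": 155_000_000, "in_use_bps": 100_000_000},
    {"capacity_bps": 155_000_000, "in_use_bps": 150_000_000},
]
print(set_up_channel(path, 6_000_000))
print(set_up_channel(path, 4_000_000))
```

When the transaction is completed, the reserved figure would be subtracted again at each switch so that the bandwidth is freed for other users.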
7.11 ATM AALs
ATM works by dividing all real data messages into cell payloads of 48 bytes each. At the receiving end, the original message must be re-created. This can take many forms and Figure 7.43 shows some possibilities. The message may be a generic data file having no implied timing structure, a serial bitstream with a fixed clock frequency, known as UDT (unstructured data transfer), or a burst of data bytes from a TDM system.
The adaptation layer in ATM has two sublayers, shown in Figure 7.44. The first is the segmentation and reassembly (SAR) sublayer which must divide the message into cells and rebuild it to get the binary data right. The second is the convergence sublayer (CS) which recovers the timing structure of the original message. It is this feature which makes ATM so appropriate for delivery of audio/visual material. Conventional networks such as the Internet do not have this ability.
In order to deliver a particular quality of service, the adaptation layer and the ATM layer work together. Effectively the adaptation layer will place constraints on the ATM layer, such as cell delay, and the ATM layer will meet those constraints without needing to know why. Provided the constraints are met, the adaptation layer can rebuild the message. The variety of message types and timing constraints leads to the adaptation layer having a variety of forms. The adaptation layers which are most relevant to MPEG applications are AAL-1 and AAL-5. AAL-1 is suitable for transmitting MPEG-2 multi-program transport streams at constant bit rate and is standardized for this purpose in ETS 300814 for DVB application. AAL-1 has an integral forward error-correction (FEC) scheme. AAL-5 is optimized for single-program transport streams (SPTS) at a variable bit rate and has no FEC.
AAL-1 takes as an input the 188-byte transport stream packets which are created by a standard MPEG-2 multiplexer. The transport stream bit rate must be constant but it does not matter if statistical multiplexing has been used within the transport stream.
The Reed-Solomon FEC of AAL-1 uses 128-byte codewords, each consisting of 124 bytes of data and 4 bytes of redundancy. Thirty-one 188-byte TS packets (5828 bytes) are restructured into 47 such codewords. The 128-byte codewords are then subject to a block interleave. Figure 7.45 shows that 47 such codewords are assembled in rows in RAM and then columns are read out. These columns are 47 bytes long and, with the addition of an AAL header byte, make up a 48-byte ATM packet payload. In this way the interleave block is transmitted in 128 ATM cells.
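The dimensions of the interleave block can be checked with a short sketch. Thirty-one TS packets hold 31 × 188 = 5828 bytes, which is exactly 47 × 124, so they fill the data part of 47 codewords of 128 bytes. The Reed-Solomon calculation itself is not implemented here: four zero bytes stand in for the redundancy of each codeword, and the function name is hypothetical.

```python
TS_PACKET_BYTES = 188
TS_PACKETS_PER_BLOCK = 31
DATA_BYTES = 124          # data bytes per codeword
PARITY_BYTES = 4          # redundancy bytes per codeword
CODEWORD_BYTES = DATA_BYTES + PARITY_BYTES   # 128
ROWS = 47                 # codewords per interleave block

def interleave_block(ts_packets):
    """Return the 128 columns (47 bytes each) of one interleave block.

    PLACEHOLDER: zero bytes are used in place of real Reed-Solomon parity.
    """
    if len(ts_packets) != TS_PACKETS_PER_BLOCK:
        raise ValueError("expected 31 TS packets")
    if any(len(p) != TS_PACKET_BYTES for p in ts_packets):
        raise ValueError("each TS packet must be 188 bytes")
    payload = b"".join(ts_packets)
    # Write the codewords as rows.
    rows = []
    for r in range(ROWS):
        data = payload[r * DATA_BYTES:(r + 1) * DATA_BYTES]
        rows.append(data + bytes(PARITY_BYTES))
    # Read the block out as columns.
    return [bytes(rows[r][c] for r in range(ROWS)) for c in range(CODEWORD_BYTES)]

packets = [bytes([i]) * TS_PACKET_BYTES for i in range(TS_PACKETS_PER_BLOCK)]
columns = interleave_block(packets)
print(len(columns), len(columns[0]))
```

Each column would then be prefixed with the AAL header byte to form the 48-byte payload of one ATM cell.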
The result of the FEC and interleave is that the loss of up to four cells in 128 can be corrected, or a random error of up to two bytes can be corrected in each cell. This FEC system allows most errors in the ATM layer to be corrected so that no retransmissions are needed. This is important for isochronous operation.
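The correction limits follow from the way the interleave spreads each cell across the block: a cell is one column, so a lost cell removes exactly one byte from each of the 47 codewords. The sketch below assumes the usual Reed-Solomon bound, in which a codeword with four redundancy bytes can be corrected provided that the number of erasures plus twice the number of errors at unknown positions does not exceed four. The function name is hypothetical.

```python
PARITY_BYTES = 4   # redundancy bytes per codeword

def correctable(lost_cells=(), errors_per_codeword=0):
    """Decide whether every codeword in the block can be corrected.

    lost_cells: indices (0 to 127) of cells flagged as missing; each one
    costs every codeword a single erasure.
    errors_per_codeword: worst-case number of corrupt bytes at unknown
    positions in any one codeword.
    """
    erasures = len(set(lost_cells))
    return erasures + 2 * errors_per_codeword <= PARITY_BYTES

print(correctable(lost_cells=[3, 9, 50, 127]))
print(correctable(lost_cells=[3, 9, 50, 127, 128 - 2]))
print(correctable(errors_per_codeword=2))
```

An error at an unknown position costs twice as much redundancy as an erasure because the decoder must find the position as well as the correct value.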
The AAL header has a number of functions. One of these is to identify the first ATM cell in the interleave block of 128 cells. Another function is to run a modulo-8 cell counter to detect missing or out-of-sequence ATM cells. If a cell simply fails to arrive, the sequence jump can be detected and used to flag the FEC system so that it can correct the missing cell by erasure. In a manner similar to the use of program clock reference (PCR) in MPEG, AAL-1 embeds a timing code in ATM cell headers. This is called the synchronous residual time stamp (SRTS) and in conjunction with the ATM network clock allows the receiving AAL device to reconstruct the original data bit rate. This is important because in MPEG applications it prevents the PCR jitter specification being exceeded.
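The use of the modulo-8 counter to detect missing cells can be sketched as follows. The function names are hypothetical, and the sketch assumes that cells arrive in order, so that any jump in the count is interpreted as loss.

```python
def missing_cells(prev_seq, seq):
    """Number of cells lost between two consecutively received counts (mod 8)."""
    return (seq - prev_seq - 1) % 8

def find_gaps(received):
    """Return (index, number missing) for each jump in the received counts.

    The index is the position in the received list of the first cell
    after the gap.
    """
    gaps = []
    for i in range(1, len(received)):
        lost = missing_cells(received[i - 1], received[i])
        if lost:
            gaps.append((i, lost))
    return gaps

print(find_gaps([5, 6, 7, 0, 1, 3, 4]))
```

Because the counter wraps after eight cells, a burst loss of exactly eight consecutive cells would pass unnoticed by this check alone.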
In AAL-5 there is no error correction and the adaptation layer simply reformats MPEG TS blocks into ATM cells. Figure 7.46 shows one way in which this can be done. Two TS blocks of 188 bytes are associated with an 8-byte trailer known as CPCS (common part convergence sublayer). The presence of the trailer makes a total of 384 bytes which can be carried in eight ATM cells. AAL-5 does not offer constant delay and external buffering will be required, controlled by reading the MPEG PCRs in order to reconstruct the original time axis.
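The packing arithmetic can be sketched as below: 2 × 188 + 8 = 384 bytes, which is exactly eight 48-byte payloads. The contents of the trailer are not modelled; eight zero bytes stand in for it, and the function name is hypothetical.

```python
TS_PACKET_BYTES = 188
TRAILER_BYTES = 8
CELL_PAYLOAD_BYTES = 48

def aal5_pack(ts_a, ts_b):
    """Split two TS packets plus a trailer into eight 48-byte cell payloads.

    PLACEHOLDER: the trailer is filled with zero bytes rather than the
    real CPCS fields.
    """
    if len(ts_a) != TS_PACKET_BYTES or len(ts_b) != TS_PACKET_BYTES:
        raise ValueError("each TS packet must be 188 bytes")
    pdu = ts_a + ts_b + bytes(TRAILER_BYTES)
    return [pdu[i:i + CELL_PAYLOAD_BYTES]
            for i in range(0, len(pdu), CELL_PAYLOAD_BYTES)]

cells = aal5_pack(b"\x47" + bytes(187), b"\x47" + bytes(187))
print(len(cells), [len(c) for c in cells])
```

Since 384 divides exactly by 48, no padding is needed when two TS packets are grouped in this way.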
References
1. Digital Video Broadcasting (DVB); Implementation guidelines for the use of MPEG-2 systems, video and audio in satellite and cable broadcasting applications. ETSI Tech. Report ETR 154 (1996)
2. Watkinson, J.R., The Art of Digital Video, third edition, Ch. 8. Oxford: Focal Press (2000)
3. Wickelgren, I.J., The facts about FireWire. IEEE Spectrum, 19–25 (1997)