3
Digital transmission consists of converting data into a waveform suitable for the path along which they are to be sent allied to the adherence to some protocol or data structure that allows the receiving device correctly to interpret the data.
In this chapter the fundamentals of digital transmission are introduced along with descriptions of the coding and error-correction techniques used in practical applications.
3.1 Introduction
The generic term for the path down which information is sent is the channel. In a transmission application, the channel may be a point-to-point cable, a network stage or a radio link.
In digital circuitry there is a great deal of noise immunity because the signal has only two states, which are widely separated compared with the amplitude of noise. In transmission this is not always the case. In real channels, the signal may originate with discrete states which change at discrete times, but the channel will treat it as an analog waveform and so it will not be received in the same form. Various frequency-dependent loss mechanisms will reduce the amplitude of the signal. Noise will be picked up as a result of stray electric fields or magnetic induction and the receiving circuitry will contribute some noise. As a result, the received voltage will have an infinitely varying state along with a degree of uncertainty due to the noise. Different frequencies can propagate at different speeds in the channel; this is the phenomenon of group delay. An alternative way of considering group delay is that there will be frequency-dependent phase shifts in the signal and these will result in uncertainty in the timing of pulses.
In digital circuitry, the signals are generally accompanied by a separate clock signal that reclocks the data to remove jitter as was shown in Chapter 1. In contrast, it is generally not feasible to provide a separate clock in transmission applications. A separate clock line would not only raise cost, but would be impractical because at high frequency it is virtually impossible to ensure that the clock cable propagates signals at the same speed as the data cable except over short distances. Such timing differences between parallel channels are known as skew.
The solution is to use a self-clocking waveform and the generation of this is a further essential function of the coding process. Clearly, if data bits are simply clocked serially from a shift register in so-called direct transmission this characteristic will not be obtained. If all the data bits are the same, for example all zeros, a common code in audio and video, there is no clock when they are serialized.
It is not the channel which is digital; instead the term describes the way in which the received signals are interpreted. When the receiver makes discrete decisions from the input waveform it attempts to reject the uncertainties in voltage and time. The technique of channel coding is one where transmitted waveforms are restricted to those that still allow the receiver to make discrete decisions despite the degradations caused by the analog nature of the channel.
3.2 Types of Transmission Channel
Transmission can be by electrical conductors, radio or optical fibre. Although these appear to be completely different, they are in fact just different examples of electromagnetic energy travelling from one place to another. If the energy is made time-variant, information can be carried.
Electromagnetic energy propagates in a manner that is a function of frequency, and our incomplete understanding requires it to be considered as electrons, waves or photons so that we can predict its behaviour in given circumstances.
At DC and at the low frequencies used for power distribution, electromagnetic energy is called electricity and needs to be transported completely inside conductors. It has to have a complete circuit to flow in, and the resistance to current flow is determined by the cross-sectional area of the conductor. The insulation around the conductor and the spacing between the conductors has no effect on the ability of the conductor to pass current. At DC an inductor appears to be a short circuit, and a capacitor appears to be an open circuit.
As frequency rises, resistance is exchanged for impedance. Inductors display increasing impedance with frequency, capacitors show falling impedance. Electromagnetic energy increasingly tends to leave the conductor. The first symptom is the skin effect: the current flows only in the outside layer of the conductor effectively causing the resistance to rise.
As the energy is starting to leave the conductors, the characteristics of the space between them become important. This determines the impedance. A change of impedance causes reflections in the energy flow and some of it heads back towards the source. Constant impedance cables with fixed conductor spacing are necessary, and these must be suitably terminated to prevent reflections. The most important characteristic of the insulation is its thickness as this determines the spacing between the conductors.
As frequency rises still further, the energy travels less in the conductors and more in the insulation between them, and their composition becomes important and they begin to be called dielectrics. A poor dielectric like PVC absorbs high-frequency energy and attenuates the signal. So-called low-loss dielectrics such as PTFE are used, and one way of achieving low loss is to incorporate as much air into the dielectric as possible by making it in the form of foam or extruding it with voids.
High-frequency signals can also be propagated without a medium, and are called radio. As frequency rises further the electromagnetic energy is termed ‘light’ which can also travel without a medium, but can also be guided through a suitable medium. Figure 3.1(a) shows an early type of optical fibre in which total internal reflection is used to guide the light. It will be seen that the length of the optical path is a function of the angle at which the light is launched. Thus at the end of a long fibre sharp transitions would be smeared by this effect. Later optical fibres are made with a radius-dependent refractive index such that light diverging from the axis is automatically refracted back into the fibre. Figure 3.1(b) shows that in single-mode fibre light can only travel down one path and so the smearing of transitions is minimized.
3.3 Transmission Lines
Frequency-dependent behaviour is the most important factor in deciding how best to harness electromagnetic energy flow for information transmission. It is obvious that the higher the frequency, the greater the possible information rate, but in general, losses increase with frequency, and flat frequency response is elusive. The best that can be managed is that over a narrow band of frequencies, the response can be made reasonably constant with the help of equalization. Unfortunately raw data when serialized have an unconstrained spectrum. Runs of identical bits can produce frequencies much lower than the bit rate would suggest. One of the essential steps in a transmission system is to modify the spectrum of the data into something more suitable.
At moderate bit rates, say a few megabits per second, and with moderate cable lengths, say a few metres, the dominant effect will be the capacitance of the cable due to the geometry of the space between the conductors and the dielectric between. The capacitance behaves under these conditions as if it were a single capacitor connected across the signal. The effect of the series source resistance and the parallel capacitance is that signal edges or transitions are turned into exponential curves. This happens because the capacitance is effectively being charged and discharged through the source impedance.
As cable length increases, the capacitance can no longer be lumped as if it were a single unit; it has to be regarded as being distributed along the cable. With rising frequency, the cable inductance also becomes significant, and it too is distributed.
The cable is now a transmission line and pulses travel down it as current loops that roll along as shown in Figure 3.2. If the pulse is positive, as it is launched along the line, it will charge the dielectric locally as at (a). As the pulse moves along, it will continue to charge the local dielectric as at (b). When the driver finishes the pulse, the trailing edge of the pulse follows the leading edge along the line. The voltage of the dielectric charged by the leading edge of the pulse is now higher than the voltage on the line, and so the dielectric discharges into the line as at (c). The current flows forward as it is in fact the same current that is flowing into the dielectric at the leading edge. There is thus a loop of current rolling down the line flowing forward in the ‘hot’ wire and backwards in the return.
The constant to-ing and fro-ing of charge in the dielectric results in dielectric loss of signal energy. Dielectric loss increases with frequency and so a long transmission line acts as a filter. Thus the term ‘low-loss’ cable refers primarily to the kind of dielectric used. For serial digital high definition the dielectric loss is an important factor in the length of cable that can be used.
Transmission lines that transport energy in this way have a characteristic impedance due to interplay of the inductance along the conductors with the parallel capacitance. One consequence of that transmission mode is that correct termination or matching is required between the line and both the driver and the receiver. When a line is correctly matched, the rolling energy rolls straight out of the line into the load and the maximum energy is available. If the impedance presented by the load is incorrect, there will be reflections from the mismatch. An open circuit will reflect all the energy back in the same polarity as the original, whereas a short circuit will reflect all the energy back in the opposite polarity. Thus impedances above or below the correct value will have a tendency towards reflections whose magnitude depends upon the degree of mismatch and whose polarity depends upon whether the load is too high or too low. In practice it is the need to avoid reflections that is the most important reason to terminate correctly. Devices are specified by the return loss they exhibit and each interface standard will contain a return loss limit.
A perfectly square pulse contains an indefinite series of harmonics, but the higher ones suffer progressively more loss. A square pulse at the driver becomes less and less square with distance as Figure 3.3 shows. The harmonics are progressively lost until in the extreme case all that is left is the fundamental. A transmitted square wave is received as a sine wave. Fortunately data can still be recovered from the fundamental signal component.
Once all the harmonics have been lost, further losses cause the amplitude of the fundamental to fall. The effect worsens with distance and it is necessary to ensure that data recovery is still possible from a signal of unpredictable level.
3.4 Equalization and Data Separation
The characteristics of most channels are that signal loss occurs which increases with frequency. This has the effect of slowing down rise times and thereby sloping off edges. If a signal with sloping edges is sliced, the time at which the waveform crosses the slicing level will be changed, and this causes jitter. Figure 3.4 shows slicing a sloping waveform in the presence of baseline wander causes more jitter.
On a long cable, high-frequency roll-off can cause sufficient jitter to move a transition into an adjacent bit period. This is called intersymbol interference and the effect becomes worse in signals having greater asymmetry, i.e. short pulses alternating with long ones. The effect can be reduced by the application of equalization, which is typically a high-frequency boost, and by choosing a channel code which has restricted asymmetry.
The important step of information recovery at the receiver is known as data separation. The data separator is rather like an analog-to-digital convertor because the two processes of sampling and quantizing are both present. In the time domain, the sampling clock is derived from the clock content of the channel waveform. In the voltage domain, the process of slicing converts the analog waveform from the channel back into a binary representation. The slicer is thus a form of quantizer that typically only has one-bit resolution, although multi-level signalling will also be found. The slicing process makes a discrete decision about the voltage of the incoming signal in order to reject noise. The sampler makes discrete decisions along the time axis in order to reject jitter. These two processes will be described in detail.
3.5 Slicing and Jitter Rejection
The slicer is implemented with a comparator that has analog inputs but a binary output. In a cable receiver, this may follow the equalizer. The signal voltage is compared with the midway voltage, known as the threshold, baseline or slicing level by the comparator. If the signal voltage is above the threshold, the comparator outputs a high level, if below, a low level results. Figure 3.5 shows some waveforms associated with a slicer. At (a) the transmitted waveform has an uneven duty cycle. The DC component, or average level, of the signal is received with high amplitude, but the pulse amplitude falls as the pulse gets shorter. Eventually the waveform cannot be sliced.
At (b) the opposite duty cycle is shown. The signal level drifts to the opposite polarity and once more slicing is impossible. The phenomenon is called baseline wander and will be observed with any signal whose average voltage is not the same as the slicing level.
At (c) it will be seen that if the transmitted waveform has a relatively constant average voltage, slicing remains possible up to high frequencies even in the presence of serious amplitude loss, because the received waveform remains symmetrical about the baseline.
It is clearly not possible simply to serialize data in a shift register for so-called direct transmission, because successful slicing can only be obtained if the number of ones is equal to the number of zeros; there is little chance of this happening consistently with real data. Instead, a modulation code or channel code is necessary. This converts the data into a waveform that is DC-free or nearly so for the purpose of transmission.
The slicing threshold level is naturally zero in a bipolar system such as a cable. When the amplitude falls it does so symmetrically and slicing continues. The same is not true of optical fibres in which the receiver responds to intensity and therefore produces a unipolar output. If the received waveform is sliced directly, the threshold cannot be zero, but must be some level approximately half the amplitude of the signal as shown in Figure 3.6(a).
Unfortunately when the signal level falls due to losses, it falls towards zero and not towards the slicing level. The threshold will no longer be appropriate for the signal as can be seen at (b). This can be overcome by using a DC-free coded waveform. If a series capacitor is connected to the unipolar signal from an optical receiver, the waveform is rendered bipolar because the capacitor blocks any DC component in the signal. The DC-free channel waveform passes through unaltered. If an amplitude loss is suffered, (c) shows that the resultant bipolar signal now reduces in amplitude about the slicing level and slicing can continue.
The binary waveform at the output of the slicer will be a replica of the transmitted waveform, except for the addition of jitter or time uncertainty in the position of the edges due to noise, baseline wander, intersymbol interference and imperfect equalization.
Binary circuits reject noise by using discrete voltage levels which are spaced further apart than the uncertainty due to noise. In a similar manner, digital coding combats time uncertainty by making the time axis discrete using events, known as transitions, spaced apart at integer multiples of some basic time period, called a detent, which is larger than the typical time uncertainty. Figure 3.7 shows how this jitter-rejection mechanism works. All that matters is to identify the detent in which the transition occurred. Exactly where it occurred within the detent is of no consequence.
As ideal transitions occur at multiples of a basic period, an oscilloscope, which is repeatedly triggered on a channel-coded signal carrying random data, will show an eye pattern if connected to the output of the equalizer. Study of the eye pattern reveals how well the coding used suits the channel. In the case of transmission, with a short cable, the losses will be small, and the eye opening will be virtually square except for some edge-sloping due to cable capacitance. As cable length increases, the harmonics are lost and the remaining fundamental gives the eyes a diamond shape.
Noise closes the eyes in a vertical direction, and jitter closes the eyes in a horizontal direction, as in Figure 3.8. If the eyes remain sensibly open, data separation will be possible. Clearly, more jitter can be tolerated if there is less noise, and vice versa. If the equalizer is adjustable, the optimum setting will be where the greatest eye opening is obtained.
In the centre of the eyes, the receiver must make binary decisions at the channel bit rate about the state of the signal, high or low, using the slicer output. As stated, the receiver is sampling the output of the slicer, and it needs to have a sampling clock in order to do that. In order to give the best rejection of noise and jitter, the clock edges that operate the sampler must be in the centre of the eyes.
As has been stated, a separate clock is not practicable in recording or transmission. A fixed-frequency clock at the receiver is of no use. Even if it were sufficiently stable, it would not know the phase at which to run.
The only way in which the sampling clock can be obtained is to use a phase-locked loop to regenerate it from the clock content of the self-clocking channel-coded waveform. In phase-locked loops, the voltage-controlled oscillator is driven by a phase error measured between the output and some reference, such that the output eventually has the same frequency as the reference. If a divider is placed between the VCO and the phase comparator, the VCO frequency can be made to be a multiple of the reference. This also has the effect of making the loop more heavily damped. If a channel-coded waveform is used as a reference to a PLL, the loop will be able to make a phase comparison whenever a transition arrives and will run at the channel bit rate. When there are several detents between transitions, the loop will flywheel at the last known frequency and phase until it can rephase at a subsequent transition. Thus a continuous clock is recreated from the clock content of the channel waveform. In a recorder, if the speed of the medium should change, the PLL will change frequency to follow. Once the loop is locked, clock edges will be phased with the average phase of the jittering edges of the input waveform. If, for example, rising edges of the clock are phased to input transitions, then falling edges will be in the centre of the eyes. If these edges are used to clock the sampling process, the maximum jitter and noise can be rejected. The output of the slicer when sampled by the PLL edge at the centre of an eye is the value of a channel bit. Figure 3.9 shows the complete clocking system of a channel code from encoder to data separator.
Clearly, data cannot be separated if the PLL is not locked, but it cannot be locked until it has seen transitions for a reasonable period. In interfaces, transmission can be continuous and there is no difficulty remaining in lock indefinitely. There will simply be a short delay on first applying the signal before the receiver locks to it.
One potential problem area that is frequently overlooked is to ensure that the VCO in the receiving PLL is correctly centred. If it is not, it will be running with a static phase error and will not sample the received waveform at the centre of the eyes. The sampled bits will be more prone to noise and jitter errors. VCO centring is considered in Chapter 8. Many interface receivers have such an adjustment, although recent receiving chip sets may incorporate self-adjustment.
It is not practicable simply to serialize raw data in a shift register for transmission, except over relatively short distances. Practical systems require the use of a modulation scheme, known as a channel code, which expresses the data as waveforms which are self-clocking in order to reject jitter, to separate the received bits and to avoid skew on separate clock lines. The coded waveforms should ideally be DC-free or nearly so to enable slicing in the presence of losses and have a narrower spectrum than the raw data both for economy and to make equalization easier.
Jitter causes uncertainty about the time at which a particular event occurred. The frequency response of the channel then places an overall limit on the spacing of events in the channel. Particular emphasis must be placed on the interplay of bandwidth, jitter and noise, which will be shown here to be the key to the design of a successful channel code.
Figure 3.10 shows that a channel coder is necessary at the transmitter, and that a decoder, known as a data separator, is necessary at the receiver. The output of the channel coder is generally a logic-level signal that contains a ‘high’ state when a transition is to be generated. The waveform generator produces the transitions in a signal whose level and impedance are suitable for driving the channel. The signal may be bipolar or unipolar as appropriate.
One of the fundamental parameters of a transmission code is the efficiency. This can be thought of as the ratio between the Nyquist rate of the data (one-half the bit rate) and the channel bandwidth needed. Efficiency is measured in bits/sec/Hz. Another way of considering the efficiency is that it is the worst-case ratio of the number of data bits transmitted to the number of transitions in the channel.
Some codes eliminate DC entirely. Some codes can reduce the channel bandwidth by lowering the upper spectral limit. This permits greater efficiency (measured in bits/sec/Hz), usually at the expense of noise and jitter rejection. Other codes narrow the spectrum, by raising the lower limit. A code with a narrow spectrum has a number of advantages. The reduction in asymmetry will reduce peak shift and data separators can lock more readily because the range of frequencies in the code is smaller. In theory the narrower the spectrum, the less noise will be suffered, but this is only achieved if filtering is employed. Filters can easily cause phase errors that will nullify any gain.
A convenient definition of a channel code (for there are certainly others) is ‘A method of modulating real data such that they can be reliably received despite the shortcomings of a real channel, whilst making maximum economic use of the channel capacity.’ The basic time periods of a channel-coded waveform are called positions or detents, in which the transmitted voltage will be reversed or stay the same. The symbol used for the units of channel time is Td.
As jitter is such an important issue in digital transmission, a parameter has been introduced to quantify the ability of a channel code to reject time instability. This parameter, the jitter margin, also known as the window margin or phase margin (Tw), is defined as the permitted range of time over which a transition can still be received correctly, divided by the data bit-cell period (T).
Since equalization is often difficult in practice, a code having a large jitter margin will sometimes be used because it resists the effects of intersymbol interference well. Such a code may achieve a better performance in practice than a code with a higher efficiency but poor jitter performance.
A more realistic comparison of code performance will be obtained by taking into account both efficiency and jitter margin.
The FM code, also known as Manchester code or bi-phase mark code, shown in Figure 3.11(a) was the first practical self-clocking binary code and it is suitable for both transmission and recording. It is DC-free and very easy to encode and decode. It is the code specified for the AES/EBU digital audio interconnect standard.
In FM there is always a transition at the bit-cell boundary acting as a clock. For a data one, there is an additional transition at the bit-cell centre. Figure 3.11(a) shows each data bit to be represented by two channel bits. For a data zero, they will be 10, and for a data one they will be 11. Since the first bit is always one, it conveys no information, and is responsible for the efficiency of only one-half. Since there can be two transitions for each data bit, the jitter margin can only be half a bit. The high clock content of FM does, however, mean that data recovery is possible over a wide range of bit rates; hence the use in AES/EBU. The lowest frequency in FM is due to a stream of zeros and is equal to half the bit rate. The highest frequency is due to a stream of ones, and is equal to the bit rate. Thus the fundamentals of FM are within a band of one octave. Effective equalization is generally possible over such a band. FM is not polarityconscious and can be inverted without changing the data.
Figure 3.11(b) shows how an FM coder works. Data words are loaded into the input shift register and clocked out at the data bit rate. Each data bit is converted to two channel bits in the codebook or look-up table. These channel bits are loaded into the output register. The output register is clocked twice as fast as the input register because there are twice as many channel bits as data bits. The ratio of the two clocks is called the code rate, in this case it is a rate one-half code. Ones in the serial channel bit output represent transitions whereas zeros represent no change. The channel bits are fed to the waveform generator which is a one-bit delay, clocked at the channel bit rate, and an exclusive-OR gate. This changes state when a channel bit one is input. The result is a coded FM waveform where there is always a transition at the beginning of the data bit period, and a second optional transition whose presence indicates a one.
3.8 Group Codes
Further improvements in coding rely on converting patterns of real data to patterns of channel bits with more desirable characteristics using a conversion table known as a codebook. If a data symbol of m bits is considered, it can have 2m different combinations. As it is intended to discard undesirable patterns to improve the code, it follows that the number of channel bits n must be greater than m. The number of patterns that can be discarded is:
One name for the principle is group coding, and an important parameter is the code rate, defined as:
It will be evident that the jitter margin Tw is numerically equal to the code rate, and so a code rate near to unity is desirable. The patterns that are used in the codebook will be those giving the desired balance between clock content, bandwidth and DC content.
Figure 3.12 shows that the upper spectral limit can be made to be some fraction of the channel bit rate according to the minimum distance between ones in the channel bits. This is known as Tmin, also referred to as the minimum transition parameter M and in both cases is measured in data bits T. It can be obtained by multiplying the number of channel detent periods between transitions by the code rate.
Figure 3.12 also shows the lower spectral limit to be influenced by the maximum distance between transitions Tmax. This is also obtained by multiplying the maximum number of detent periods between transitions by the code rate. The length of time between channel transitions is known as the run length. Another name for this class is the run-length-limited (RLL) codes1.
The DVB-ASI interface uses the electrical interface of SDI, but the statistics of MPEG compressed data are not suitable for the standard SDI coding scheme. Instead DVB-ASI uses an 8–10 group code in which eight data bits are converted to selected patterns of ten channel bits for transmission.
In the MADI (multi-channel audio interface) standard2, a four-fifths rate code is used where groups of four data bits are represented by groups of five channel bits.
Four bits have 16 combinations whereas five bits have 32 combinations. Clearly only 16 out of these 32 are necessary to convey all the possible data. Figure 3.13 shows that the 16 channel bit patterns chosen are those having the smallest DC component combined with a high clock content. Adjacent ones are permitted in the channel bits, so there can be no violation of Tmin at the boundary of two symbols. Tmax is determined by the worst-case run of zeros at a symbol boundary and as k=3, Tmax is 16/5=3.2T. The code is thus described as 0,3,4,5,1 and Lc=4T.
The jitter resistance of a group code is equal to the code rate. For example, in 4/5 transitions cannot be closer than 0.8 of a data bit apart and so this represents the peak-to-peak jitter that can be rejected. The density ratio is also 0.8 so the FoM is 0.64; an improvement over FM.
A further advantage of group coding is that it is possible to have codes that have no data meaning. In MADI further channel bit patterns are used for packing and synchronizing. Packing, or stuffing, is the name given to dummy data sent when the real data rate is low in order to keep the channel frequencies constant. This is necessary so that fixed equalization can be used. The packing pattern does not decode to data and so it can easily be discarded at the receiver. Further details of MADI can be found in Chapter 4.
3.9 Randomizing and Encryption
Randomizing is not a channel code, but a technique that can be used in conjunction with almost any channel code. It is widely used in digital audio and video broadcasting and in a number of transmission formats. The randomizing system is arranged outside the channel coder. Figure 3.14 shows that, at the encoder, a pseudo-random sequence is added modulo-2 to the serial data. This process makes the signal spectrum in the channel more uniform, drastically reduces Tmax and reduces DC content. At the receiver the transitions are converted back to a serial bitstream to which the same pseudo-random sequence is again added modulo-2. As a result, the random signal cancels itself out to leave only the serial data, provided that the two pseudo-random sequences are synchronized to bit accuracy.
Many channel codes, especially group codes, display pattern sensitivity because some waveforms are more sensitive to peak shift distortion than others. Pattern sensitivity is only a problem if a sustained series of sensitive symbols needs to be recorded. Randomizing ensures that this cannot happen because it breaks up any regularity or repetition in the data. The data randomizing is performed by using the exclusive-OR function of the data and a pseudo-random sequence as the input to the channel coder. On replay the same sequence is generated, synchronized to bit accuracy, and the exclusive-OR of the replay bitstream and the sequence is the original data.
Clearly, the sync pattern cannot be randomized, since this causes a situation where it is not possible to synchronize the sequence for replay until the sync pattern is read, but it is not possible to read the sync pattern until the sequence is synchronized!
In networks, the randomizing may be block based, since this matches the block structure of the transmission protocol. Where there is no obvious block structure, convolutional or endless randomizing can be used. In convolutional randomizing, the signal sent down the channel is the serial data waveform convolved with the impulse response of a digital filter. On reception the signal is deconvolved to restore the original data.
Convolutional randomizing is used in the serial digital interface (SDI) that carries 4:2:2 sampled video. Figure 3.15(a) shows that the filter is an infinite impulse response (IIR) filter having recursive paths from the output back to the input. As it is a one-bit filter its output cannot decay, and once excited, it runs indefinitely. Following the filter is a transition generator consisting of a one-bit delay and an exclusive-OR gate. An input 1 results in an output transition on the next clock edge. An input 0 results in no transition.
A result of the infinite impulse response of the filter is that frequent transitions are generated in the channel resulting in sufficient clock content for the phase-locked loop in the receiver.
Transitions are converted back to 1s by a differentiator in the receiver. This consists of a one-bit delay with an exclusive-OR gate comparing the input and the output. When a transition passes through the delay, the input and the output will be different and the gate outputs a 1 that enters the deconvolution circuit.
Figure 3.15(b) shows that in the deconvolution circuit a data bit is simply the exclusive-OR of a number of channel bits at a fixed spacing. The deconvolution is implemented with a shift register having the exclusive-OR gates connected in a reverse pattern to that in the encoder. The same effect as block randomizing is obtained, in that long runs are broken up and the DC content is reduced, but it has the advantage over block randomizing that no synchronizing is required to remove the randomizing, although it will still be necessary for deserialization. Clearly, the system will take a few clock periods to produce valid data after commencement of transmission, but this is no problem on a permanent wired connection where the transmission is continuous.
Where randomizing is used instead of a channel code, as in the serial digital interface, reliable operation depends on the statistics of the data. If the data to be transmitted have a certain combination of bits, the randomizing is defeated and the clock content is dramatically reduced. Such bit combinations are known as pathological sequences and may deliberately be generated for testing purposes. In video data, the probability of natural occurrence of pathological sequences is very small indeed, whereas in generic data it is not. This is why DVB-ASI cannot use this coding scheme.
In a randomized transmission, if the receiver is not able to re-create the pseudo-random sequence, the data cannot be decoded. This can be used as the basis for encryption in which only authorized users can decode transmitted data. In an encryption system, the goal is security whereas in a channel-coding system the goal is simplicity. Channel coders use pseudo-random sequences because these are economical to create using feedback shift registers. However, there are a limited number of pseudorandom sequences and it would be too easy to try them all until the correct one was found. Encryption systems use the same processes, but the key sequence that is added to the data at the encoder is truly random. This makes it much harder for unauthorized parties to access the data. Only a receiver in possession of the correct sequence can decode the channel signal. If the sequence is made long enough, the probability of stumbling across the sequence by trial and error can be made sufficiently small.
3.10 Synchronizing
Once the PLL in the data separator has locked to the clock content of the transmission, a serial channel bitstream and a channel bit clock will emerge from the sampler. In a group code, it is essential to know where a group of channel bits begins in order to assemble groups for decoding to data bit groups. In a randomizing system it is equally vital to know at what point in the serial data stream the words or samples commence. In serial transmission, channel bit groups or randomized data words are sent one after the other, one bit at a time, with no spaces in between, so that although the designer knows that a data packet contains, say, 128 bytes, the receiver simply finds 1024 bits in a row. If the exact position of the first bit is not known, then it is not possible to put all the bits in the right places in the right bytes; a process known as deserializing. The effect of sync slippage is devastating, because a one-bit disparity between the bit count and the bitstream will corrupt every symbol in the block.
The synchronization of the data separator and the synchronization to the packet format are two distinct problems, although they may be solved by the same sync pattern. Deserializing requires a shift register fed with serial data and read out once per word. The sync detector is simply a set of logic gates arranged to recognize a specific pattern in the register. The sync pattern is either identical for every packet or has a restricted number of versions and it will be recognized by the receiver circuitry and used to reset the bit count through the packet. Then by counting channel bits and dividing by the group size, groups can be deserialized and decoded to data groups. In a randomized system, the pseudo-random sequence generator is also reset. Then counting derandomized bits from the sync pattern and dividing by the word length enables the receiver to deserialize the data words.
Even if a specific code were excluded from the transmitted data so that it could be used for synchronizing, this cannot ensure that the same pattern cannot be falsely created at the junction between two allowable data words. Figure 3.16 shows how false synchronizing can occur due to concatenation.It is thus not practical to search for a bit pattern that is a data code value. The problem is overcome in some synchronous systems by using the fact that sync patterns occur exactly once per packet and therefore contain redundancy. If the pattern is recognized at packet rate, a genuine sync condition exists. Sync patterns seen at other times must be false. Such systems take a few milliseconds before sync is achieved, but once achieved it should not be lost unless the transmission is seriously impaired.
In run-length-limited codes false syncs are not a problem. The sync pattern is no longer a data bit pattern but is a specific waveform. If the sync waveform contains run lengths that violate the normal coding limits, such run lengths cannot occur in encoded data, nor can they be interpreted as data. They can, however, be readily detected by the receiver.
In a group code there are many more combinations of channel bits than there are combinations of data bits. Thus after all data bit patterns have been allocated group patterns, there are still many unused group patterns which cannot occur in the data. With care, group patterns can be found which cannot occur due to the concatenation of any pair of groups representing data. These are then unique and can be used for synchronizing.
3.11 Basic Error Correction
There are many different types of transmission channel and consequently there will be many different mechanisms that may result in errors. Bit errors in audio cause unpleasant crackles. In video they cause ‘sparkles’ in the picture whose effect depends upon the significance of the affected bit. Errors in compressed data are more serious as they may cause the decoder to lose sync.
Errors may be caused by outside interference, or by Gaussian thermal noise in the channel and receiver. When group codes are used, a single defect in a group changes the group symbol and may cause errors up to the size of the group. Single-bit errors are therefore less common in group-coded channels. Inside equipment, data are conveyed on short wires and the noise environment is under the designer’s control. With suitable design techniques, errors can be made effectively negligible whereas in communication systems, there is considerably less control of the electromagnetic environment. In networks, packets may be lost.
Irrespective of the cause, all these mechanisms cause one of two effects. There are large isolated corruptions, called error bursts, where numerous bits are corrupted all together in an otherwise error-free area, and there are random errors affecting single bits or symbols. Whatever the mechanism, the result will be that the received data will not be exactly the same as those sent. In binary the discrete bits will be each either right or wrong. If a binary digit is known to be wrong, it is only necessary to invert its state and then it must be right. Thus error correction itself is trivial; the hard part is working out which bits need correcting.
There are a number of terms having idiomatic meanings in error correction. The raw BER (bit error rate) is the error rate of the medium, whereas the residual or uncorrected BER is the rate at which the errorcorrection system fails to detect or miscorrects errors. In practical digital systems, the residual BER is negligibly small. If the error correction is turned off, the two figures become the same.
Error correction works by adding some bits to the data that are calculated from the data. This creates an entity called a codeword spanning a greater length of time than one bit alone. The statistics of noise means that whilst one bit may be lost in a codeword, the loss of the rest of the codeword because of noise is highly improbable. As will be described later in this chapter, codewords are designed to be able to correct totally a finite number of corrupted bits. The greater the time span over which the coding is performed, the greater will be the reliability achieved, although this does mean that an encoding delay will be experienced, along with a similar or greater decoding delay.
Shannon3 disclosed that a message could be sent to any desired degree of accuracy provided that it is spread over a sufficient time span. Engineers have to compromise, because an infinite coding delay in the recovery of an error-free signal is not acceptable.
In some applications, such as AES/EBU and SDI, the requirements of production are such that no delay is acceptable. Such interfaces cannot use error correction and so are designed to be so robust that errors are negligibly infrequent in normal use. Instead of error correction, error detection is employed. If errors are detected, this implies that some maintenance is required.
If error correction is necessary as a practical matter, it is then only a small step to put it to maximum use. All error correction depends on adding bits to the original message, and this, of course, increases the number of bits to be sent, although it does not increase the information. It might be imagined that error correction is going to reduce efficiency, because bandwidth has to be found for all the extra bits. Nothing could be further from the truth. Once an error-correction system is used, the signal-to-noise ratio of the channel can be reduced, because the raised BER of the channel will be overcome by the error-correction system. Consequently the power of a digital transmitter can be reduced if error correction is used.
Figure 3.17 shows the broad subdivisions of error handling. The first stage might be called error avoidance and includes such measures as network rerouting to bypass faulty links. Properly terminating network cabling is also in this category. The data pass through the channel, which causes whatever corruptions it will. On receipt of the data the occurrence of errors is first detected, and this process must be extremely reliable, as it does not matter how effective the correction or how good the concealment algorithm, if it is not known that they are necessary. The detection of an error then results in a course of action being decided.
In the case of a file transfer, real-time operation is not required. A packet in error in a network may result in a retransmission. In many cases of digital video or audio replay a retransmission is not possible because the data are required in real time. In this case the solution is to encode the message using a system sufficiently powerful to correct the errors in real time. These are called forward error-correcting schemes (FEC). The term ‘forward’ implies that the transmitter does not need to take any action in the case of an error: the receiver will perform the correction.
3.12 Concealment by Interpolation
There are some practical differences between data representing audio or video and generic data. Video and audio must be reproduced in real time, but have the advantage that there is a certain amount of redundancy in the information conveyed. Thus if an error cannot be corrected, it can be concealed. If a sample is lost, it is possible to obtain an approximation to it by interpolating between samples in the vicinity of the missing one. Clearly, concealment of any kind cannot be used with computer instructions or compressed data, although concealment can be applied after compressed signals have been decoded.
If there is too much corruption for concealment, the only course in video is repeat the previous field or frame in a freeze as it is unlikely that the corrupt picture is viewable. In audio the equivalent is muting.
In general, if use is to be made of concealment on receipt, the data must generally be reordered or shuffled prior to transmission. To take a simple example, odd-numbered samples are sent with a delay with respect to even-numbered samples. If a gross error occurs, depending on its timing, the result will be either corrupted odd samples or corrupted even samples, but it is most unlikely that both will be lost. Interpolation is then possible if the power of the correction system is exceeded. NICAM 728 uses such a system.
It should be stressed that corrected data are indistinguishable from the original and thus there can be no visible or audible artefacts. In contrast, concealment is only an approximation to the original information and could be detectable. In practical equipment, concealment occurs infrequently unless there is a defect requiring attention, and its presence is difficult to detect.
3.13 Parity
The error-detection and error-correction processes are closely related and will be dealt with together here. The actual correction of an error is simplified tremendously by the adoption of binary. As there are only two symbols, 0 and 1, it is enough to know that a symbol is wrong, and the correct value is obvious. Figure 3.18 shows a minimal circuit required for correction once the bit in error has been identified. The XOR (exclusive-OR) gate shows up extensively in error correction and the figure also shows the truth table. One way of remembering the characteristics of this useful device is that there will be an output when the inputs are different. Inspection of the truth table will show that there is an even number of ones in each row (zero is an even number) and so the device could also be called an even parity gate. The XOR gate is also an adder in modulo-2.
Parity is a fundamental concept in error detection. In Figure 3.19, the example is given of a four-bit data word to be protected. If an extra bit is added to the word calculated in such a way that the total number of ones in the five-bit word is even, this property can be tested on receipt. The generation of the parity bit can be performed by a number of the ubiquitous XOR gates configured into what is known as a parity tree. In the figure, if a bit is corrupted, the received message will be seen no longer to have an even number of ones. If two bits are corrupted, the failure will be undetected. This example can be used to introduce much of the terminology of error correction. The extra bit added to the message carries no information of its own, since it is calculated from the other bits. It is therefore called a redundant bit.
The addition of the redundant bit gives the message a special property, i.e. the number of ones is even. A message having some special property irrespective of the actual data content is called a codeword. All error correction relies on adding redundancy to real data to form codewords for transmission. If any corruption occurs, the intention is that the received message will not have the special property; in other words, if the received message is not a codeword there has definitely been an error. The receiver can check for the special property without any prior knowledge of the data content. Thus the same check can be made on all received data. If the received message is a codeword, there probably has not been an error. The word ‘probably’ must be used because the figure shows that two bits in error will cause the received message to be a codeword, which cannot be discerned from an error-free message.
If it is known that generally the only failure mechanism in the channel in question is loss of a single bit, it is assumed that receipt of a codeword means that there has been no error. If there is a probability of two error bits, which becomes very nearly the probability of failing to detect an error, since all odd numbers of errors will be detected, and a four-bit error is much less likely. It is paramount in all error-correction systems that the protection used should be appropriate for the probability of errors to be encountered. An inadequate error-correction system is actually worse than not having any correction. Error correction works by trading probabilities. Error-free performance with a certain error rate is achieved at the expense of performance at higher error rates. Figure 3.20 shows the effect of an error-correction system on the residual BER for a given raw BER. It will be seen that there is a characteristic knee in the graph. If the expected raw BER has been misjudged, the consequences can be disastrous. Another result demonstrated by the example is that we can only guarantee to detect the same number of bits in error as there are redundant bits.
3.14 Block and Convolutional Codes
Figure 3.21(a) shows a strategy known as a crossword code, or product code. The data are formed into a two-dimensional array, in which each location can be a single-bit or a multi-bit symbol. Parity is then generated on both rows and columns. If a single bit or symbol fails, one row parity check and one column parity check will fail, and the failure can be located at the intersection of the two failing checks. Although two symbols in error confuse this simple scheme, using more complex coding in a two-dimensional structure is very powerful, and further examples will be given throughout this chapter.
The example of Figure 3.21(a) assembles the data to be coded into a block of finite size and then each codeword is calculated by taking a different set of symbols. This should be contrasted with the operation of the circuit of (b). Here the data are not in a block, but form an endless stream. A shift register allows four symbols to be available simultaneously to the encoder. The action of the encoder depends upon the delays. When symbol 3 emerges from the first delay, it will be added (modulo-2) to symbol 6. When this sum emerges from the second delay, it will be added to symbol 9 and so on. The codeword produced is shown in (c) to be bent such that it has a vertical section and a diagonal section. Four symbols later the next codeword will be created one column further over in the data.
This is a convolutional code because the coder always takes parity on the same pattern of symbols convolved with the data stream on an endless basis. Figure 3.21(c) also shows that if an error occurs, it can be located because it will cause parity errors in two codewords. The error will be on the diagonal part of one codeword and on the vertical part of the other so that it can be located uniquely at the intersection and corrected by parity.
Comparison with the block code of Figure 3.21(a) will show that the convolutional code needs less redundancy for the same single-symbol location and correction performance as only a single redundant symbol is required for every four data symbols. Convolutional codes are computed on an endless basis which makes them inconvenient in networks where packet multiplexing is anticipated. Here the block code is more appropriate as it allows switching gaps to be created between codes.
Convolutional codes work best in channels suffering from Gaussian noise and will be found in broadcasting. They can easily be taken beyond their correcting power if used with a bursty channel.
3.15 Cyclic Codes
In digital audio and video, relatively large quantities of data are involved and it is desirable to use relatively large data blocks to reduce the amount of bandwidth devoted to preambles, addressing and synchronizing. The principle of codewords having a special characteristic will still be employed, but they will be generated and checked algorithmically by equations. The syndrome will then be converted to the bit(s) in error by solving equations.
Where data can be accessed serially, simple circuitry can be used because the same gate will be used for many XOR operations. The circuit of Figure 3.22 is a kind of shift register, but with a particular feedback arrangement which leads it to be known as a twisted-ring counter. If seven message bits A–G are applied serially to this circuit, and each one of them is clocked, the outcome can be followed in the diagram. As bit A is presented and the system is clocked, bit A will enter the left-hand latch. When bits B and C are presented, A moves across to the right. Both XOR gates will have A on the upper input from the right-hand latch, the left one has D on the lower input and the right one has B on the lower input. When clocked, the left latch will thus be loaded with the XOR of A and D, and the right one with the XOR of A and B. The remainder of the sequence can be followed, bearing in mind that when the same term appears on both inputs of an XOR gate, it goes out, as the exclusive-OR of something with itself is nothing. At the end of the process, the latches contain three different expressions. Essentially, the circuit makes three parity checks through the message, leaving the result of each in the three stages of the register. In the figure, these expressions have been used to draw up a check matrix. The significance of these steps can now be explained.
The bits A B C and D are four data bits, and the bits E F and G are redundancy. When the redundancy is calculated, bit E is chosen so that there are an even number of ones in bits A B C and E; bit F is chosen such that the same applies to bits B C D and F, and similarly for bit G. Thus the four data bits and the three check bits form a seven-bit codeword. If there is no error in the codeword, when it is fed into the circuit shown, the result of each of the three parity checks will be zero and every stage of the shift register will be cleared. As the register has eight possible states, and one of them is the error-free condition, then there are seven remaining states, hence the seven-bit codeword. If a bit in the codeword is corrupted, there will be a non-zero result. For example, if bit D fails, the check on bits A B D and G will fail, and a one will appear in the left-hand latch. The check on bits B C D F will also fail, and the centre latch will set. The check on bits A B C E will not fail, because D is not involved in it, making the right-hand bit zero. There will be a syndrome of 110 in the register, and this will be seen from the check matrix to correspond to an error in bit D. Whichever bit fails, there will be a different three-bit syndrome that uniquely identifies the failed bit. As there are only three latches, there can be eight different syndromes. One of these is zero, which is the error-free condition, and so there are seven remaining error syndromes. The length of the codeword cannot exceed seven bits, or there would not be enough syndromes to correct all the bits. This can also be made to tie in with the generation of the check matrix. If 14 bits, A to N, were fed into the circuit shown, the result would be that the check matrix repeated twice, and if a syndrome of 101 were to result, it could not be determined whether bit D or bit K failed. Because the check repeats every seven bits, the code is said to be a cyclic redundancy check (CRC) code.
It has been seen that the circuit shown makes a matrix check on a received word to determine if there has been an error, but the same circuit can also be used to generate the check bits. To visualize how this is done, examine what happens if only the data bits A B C and D are known, and the check bits E F and G are set to zero. If this message, ABCD000, is fed into the circuit, the left-hand latch will afterwards contain the XOR of A B C and zero, which is, of course, what E should be. The centre latch will contain the XOR of B C D and zero, which is what F should be and so on. This process is not quite ideal, however, because it is necessary to wait for three clock periods after entering the data before the check bits are available. Where the data are simultaneously being recorded and fed into the encoder, the delay would prevent the check bits being easily added to the end of the data stream. This problem can be overcome by slightly modifying the encoder circuit as shown in Figure 3.23. By moving the position of the input to the right, the operation of the circuit is advanced so that the check bits are ready after only four clocks. The process can be followed in the diagram for the four data bits A B C and D. On the first clock, bit A enters the left two latches, whereas on the second clock, bit B will appear on the upper input of the left XOR gate, with bit A on the lower input, causing the centre latch to load the XOR of A and B and so on.
The way in which the cyclic codes work has been described in engineering terms, but it can be described mathematically if analysis is contemplated.
Just as the position of a decimal digit in a number determines the power of ten (whether that digit means one, ten or 100), the position of a binary digit determines the power of two (whether it means one, two or four). It is possible to rewrite a binary number so that it is expressed as a list of powers of two. For example, the binary number 1101 means 8+4+1, and can be written:
In fact, much of the theory of error correction applies to symbols in number bases other than 2, so that the number can also be written more generally as
and which also looks much more impressive. This expression, containing as it does various powers, is, of course, a polynomial. The circuit of Figure 3.22, which has been seen to construct a parity-check matrix on a codeword, can also be described as calculating the remainder due to dividing the input by a polynomial using modulo-2 arithmetic. In modulo-2 there are no borrows or carries, and addition and subtraction are replaced by the XOR function, which makes hardware implementation very easy. In Figure 3.24 it will be seen that the circuit of Figure 3.22 actually divides the codeword by the polynomial
This can be deduced from the fact that the right-hand bit is fed into two lower-order stages of the register at once. Once all the bits of the message have been clocked in, the circuit contains the remainder. In mathematical terms, the special property of a codeword is that it is a polynomial yielding a remainder of zero when divided by the generating polynomial. The receiver will make this division, and the result should be zero in the errorfree case. Thus the codeword itself disappears from the division. If an error has occurred it is considered that this is due to an error polynomial added to the codeword polynomial. If a codeword divided by the check polynomial is zero, a non-zero syndrome must represent the error polynomial divided by the check polynomial. Thus if the syndrome is multiplied by the check polynomial, the latter will be cancelled out and the result will be the error polynomial. If this is added modulo-2 to the received word, it will cancel out the error and leave the corrected data.
Some examples of modulo-2 division are given in Figure 3.24, which can be compared with the parallel computation of parity checks according to the matrix of Figure 3.22.
The process of generating the codeword from the original data can also be described mathematically. If a codeword has to give zero remainder when divided, it follows that the data can be converted to a codeword by adding the remainder when the data are divided. Generally speaking, the remainder would have to be subtracted, but in modulo-2 there is no distinction. This process is also illustrated in Figure 3.24. The four data bits have three zeros placed on the right-hand end, to make the word length equal to that of a codeword, and this word is then divided by the polynomial to calculate the remainder. The remainder is added to the zeroextended data to form a codeword. The modified circuit of Figure 3.23 can be described as premultiplying the data by x3 before dividing.
CRC codes are of primary importance for detecting errors, and several have been standardized for use in digital communications. The most common of these are:
The 16-bit cyclic codes have codewords of length 216–1 or 65 535 bits long. This may be too long for the application. Another problem with very long codes is that with a given raw BER, the longer the code, the more errors will occur in it. There may be enough errors to exceed the power of the code. The solution in both cases is to shorten or puncture the code. Figure 3.25 shows that in a punctured code, only the end of the codeword is used, and the data and redundancy are preceded by a string of zeros. It is not necessary to transmit these zeros, and, of course, errors cannot occur in them. Implementing a punctured code is easy. If a CRC generator starts with the register cleared and is fed with serial zeros, it will not change its state. Thus it is not necessary to provide the zeros, encoding can begin with the first data bit. In the same way, the leading zeros need not be provided during reception. The only precaution needed is that if a syndrome calculates the location of an error, this will be from the beginning of the codeword not from the beginning of the data. Where codes are used for detection only, this is of no consequence.
CRCs are in common use in digital interfaces. The channel status data of the AES/EBU interface, video frames in SD EDH and active lines in HD-SDI are all checked with CRCs.
Figure 3.26 shows a simple circuit consisting of three D-type latches which are clocked simultaneously. They are connected in series to form a shift register. At (a) a feedback connection has been taken from the output to the input and the result is a ring counter where the bits contained will recirculate endlessly. At (b) one XOR gate is added so that the output is fed back to more than one stage. The result is known as a twisted-ring counter and it has some interesting properties. Whenever the circuit is clocked, the left-hand bit moves to the right-hand latch, the centre bit moves to the left-hand latch and the centre latch becomes the XOR of the two outer latches. The figure shows that whatever the starting condition of the three bits in the latches, the same state will always be reached again after seven clocks, except if zero is used. The states of the latches form an endless ring of non-sequential numbers called a Galois field after the French mathematical prodigy Evariste Galois who discovered them. The states of the circuit form a maximum length sequence because there are as many states as are permitted by the word length. As the states of the sequence have many of the characteristics of random numbers, yet are repeatable, the result can also be called a pseudo-random sequence (prs). As the all-zeros case is disallowed, the length of a maximum length sequence generated by a register of m bits cannot exceed (2m-1) states. The Galois field, however, includes the zero term. It is useful to explore the bizarre mathematics of Galois fields which use modulo-2 arithmetic. Familiarity with such manipulations is helpful when studying the error correction, particularly the Reed–Solomon codes used in recorders. They will also be found in processes which require pseudo-random numbers such as digital dither and randomized channel codes used in, for example, DVB.
The circuit of Figure 3.26 can be considered as a counter and the four points shown will then be representing different powers of 2 from the MSB on the left to the LSB on the right. The feedback connection from the MSB to the other stages means that whenever the MSB becomes 1, two other powers are also forced to one so that the code of 1011 is generated.
Each state of the circuit can be described by combinations of powers of x, such as
This fact that three bits have the same state because they are connected together is represented by the modulo-2 equation:
Let x = a, which is a primitive element. Now
In modulo-2
In this way it can be seen that the complete set of elements of the Galois field can be expressed by successive powers of the primitive element. Note that the twisted-ring circuit of Figure 3.26 simply raises a to higher and higher powers as it is clocked. Thus the seemingly complex multi-bit changes caused by a single clock of the register become simple to calculate using the correct primitive and the appropriate power.
The numbers produced by the twisted-ring counter are not random; they are completely predictable if the equation is known. However, the sequences produced are sufficiently similar to random numbers that in many cases they will be useful. They are thus referred to as pseudorandom sequences. The feedback connection is chosen such that the expression it implements will not factorize. Otherwise a maximumlength sequence could not be generated because the circuit might sequence around one or other of the factors depending on the initial condition. A useful analogy is to compare the operation of a pair of meshed gears. If the gears have a number of teeth which is relatively prime, many revolutions are necessary to make the same pair of teeth touch again. If the number of teeth have a common multiple, far fewer turns are needed.
3.17 Introduction to the Reed–Solomon Codes
The Reed–Solomon codes (Irving Reed and Gustave Solomon) are inherently burst correcting4 because they work on multi-bit symbols rather than individual bits. The R–S codes are also extremely flexible in use. One code may be used both to detect and correct errors. The number of bursts to be corrected can be chosen at the design stage by the amount of redundancy. A further advantage of the R–S codes is that they can be used in conjunction with a separate error-detection mechanism in which case they perform the correction only by erasure. R–S codes operate at the theoretical limit of correcting efficiency. In other words, no more efficient code can be found.
In the simple CRC system described in section 3.15, the effect of the error is detected by ensuring that the codeword can be divided by a polynomial. The CRC codeword was created by addition of a redundant symbol to the data. In the Reed–Solomon codes, several errors can be isolated if the codeword is divisible by a number of polynomials. Clearly, if the codeword must divide by, say, two polynomials, it must have two redundant symbols. This is the minimum case of an R–S code. On receiving an R–S-coded message there will be two syndromes following the division. In the errorfree case, these will both be zero. If both are not zero, there is an error.
It has been stated that the effect of an error is to add an error polynomial to the message polynomial. The number of terms in the error polynomial is the same as the number of errors in the codeword. The codeword divides to zero and the syndromes are a function of the error only. There are two syndromes and two equations. By solving these simultaneous equations it is possible to obtain two unknowns. One of these is the position of the error, known as the locator, and the other is the error bit pattern, known as the corrector. As the locator is the same size as the code symbol, the length of the codeword is determined by the size of the symbol. A symbol size of eight bits is commonly used because it fits in conveniently with both 16-bit audio samples and byte-oriented computers. An eight-bit syndrome results in a locator of the same word length. Eight bits have 28 combinations, but one of these is the error-free condition, and so the locator can specify one of only 255 symbols. As each symbol contains eight bits, the codeword will be 255×8=2040 bits long.
As further examples, five-bit symbols could be used to form a codeword 31 symbols long, and three-bit symbols would form a codeword seven symbols long. This latter size is small enough to permit some worked examples, and will be used further here. Figure 3.27 shows that in the seven-symbol codeword, five symbols of three bits each, A–E, are the data, and P and Q are the two redundant symbols. This simple example will locate and correct a single symbol in error. It does not matter, however, how many bits in the symbol are in error. The two check symbols are solutions to the following equations:
where a is a constant. The original data A–E followed by the redundancy P and Q pass through the channel.
The receiver makes two checks on the message to see if it is a codeword. This is done by calculating syndromes using the following expressions, where the (′) implies the received symbol which is not necessarily correct:
(This is in fact a simple parity check.)
If two syndromes of all zeros are not obtained, there has been an error. The information carried in the syndromes will be used to correct the error. For the purpose of illustration, let it be considered that D′ has been corrupted before moving to the general case. D can be considered to be the result of adding an error of value E to the original value D such that D′=;D⊕E.
Thus the value of the corrector is known immediately because it is the same as the parity syndrome S0. The corrected data symbol is obtained simply by adding S0 to the incorrect symbol.
At this stage, however, the corrupted symbol has not yet been identified, but this is equally straightforward:
Thus the syndrome S1 is the error bit pattern E, but it has been raised to a power of a that is a function of the position of the error symbol in the block. If the position of the error is in symbol k, then k is the locator value and:
Hence:
The value of k can be found by multiplying S0 by various powers of a until the product is the same as S1. Then the power of a necessary is equal to k. The use of the descending powers of a in the codeword calculation is now clear because the error is then multiplied by a different power of a dependent upon its position, known as the locator, because it gives the position of the error. The process of finding the error position by experiment is known as a Chien search.
Whilst the expressions above show that the values of P and Q are such that the two syndrome expressions sum to zero, it is not yet clear how P and Q are calculated from the data. Expressions for P and Q can be found by solving the two R–S equations simultaneously. This has been done in Appendix 3.1. The following expressions must be used to calculate P and Q from the data in order to satisfy the codeword equations. These are:
In both the calculation of the redundancy shown here and the calculation of the corrector and the locator it is necessary to perform numerous multiplications and raising to powers. This appears to present a formidable calculation problem at both the encoder and the decoder. This would be the case if the calculations involved were conventionally executed. However, the calculations can be simplified by using logarithms. Instead of multiplying two numbers, their logarithms are added. In order to find the cube of a number, its logarithm is added three times. Division is performed by subtraction of the logarithms. Thus all the manipulations necessary can be achieved with addition or subtraction, which is straightforward in logic circuits.
The success of this approach depends upon simple implementation of log tables. Raising a constant, a, known as the primitive element, to successively higher powers in modulo-2 gives rise to a Galois field. Each element of the field represents a different power n of a. It is a fundamental of the R–S codes that all the symbols used for data, redundancy and syndromes are considered to be elements of a Galois field. The number of bits in the symbol determines the size of the Galois field, and hence the number of symbols in the codeword.
In Figure 3.28, the binary values of the elements are shown alongside the power of a they represent. In the R–S codes, symbols are no longer considered simply as binary numbers, but also as equivalent powers of a. In Reed–Solomon coding and decoding, each symbol will be multiplied by some power of a. Thus if the symbol is also known as a power of a it is only necessary to add the two powers. For example, if it is necessary to multiply the data symbol 100 by a3, the calculation proceeds as follows, referring to Figure 3.28:
Note that the results of a Galois multiplication are quite different from binary multiplication. Because all products must be elements of the field, sums of powers that exceed seven wrap around by having seven subtracted. For example:
Figure 3.29 gives an example of the Reed–Solomon encoding process.
The Galois field shown in Figure 3.28 has been used, having the primitive element a =; 010. At the beginning of the calculation of P, the symbol A is multiplied by a6. This is done by converting A to a power of a. According to Figure 3.28, 101 = a6 and so the product will be a(6+6) = a12 = a5 = 111. In the same way, B is multiplied by a, and so on, and the products are added modulo-2. A similar process is used to calculate Q.
Figure 3.29 also demonstrates that the codeword satisfies the checking equations. The modulo-2 sum of the seven symbols, S0, is 000 because each column has an even number of ones. The calculation of S1 requires multiplication by descending powers of a. The modulo-2 sum of the products is again zero. These calculations confirm that the redundancy calculation was properly carried out.
Figure 3.30 gives three examples of error correction based on this codeword. The erroneous symbol is marked with a dash. As there has been an error, the syndromes S0 and S1 will not be zero.
In the examples of Figure 3.30, two redundant symbols P and Q have been used to locate and correct one error symbol. If the positions of errors are known by some separate mechanism (see product codes, section 3.19) the locator need not be calculated. The simultaneous equations may instead be solved for two correctors. In this case the number of symbols that can be corrected is equal to the number of redundant symbols. In Figure 3.31(a) two errors have taken place, and it is known that they are in symbols C and D. Since S0 is a simple parity check, it will reflect the modulo-2 sum of the two errors. Hence .
The two errors will have been multiplied by different powers in S1, such that:
These two equations can be solved, as shown in the figure, to find EC and ED, and the correct value of the symbols will be obtained by adding these correctors to the erroneous values. It is, however, easier to set the values of the symbols in error to zero. In this way the nature of the error is rendered irrelevant and it does not enter the calculation. This setting of symbols to zero gives rise to the term erasure. In this case,
Erasing the symbols in error makes the errors equal to the correct symbol values and these are found more simply as shown in Figure 3.30 (b).
Practical systems will be designed to correct more symbols in error than in the simple examples given here. If it is proposed to correct by erasure an arbitrary number of symbols in error given by t, the codeword must be divisible by t different polynomials. Alternatively, if the errors must be located and corrected, 2t polynomials will be needed. These will be of the form (x + an) where n takes all values up to t or 2t, where a is the primitive element.
Where four symbols are to be corrected by erasure, or two symbols are to be located and corrected, four redundant symbols are necessary, and the codeword polynomial must then be divisible by
Upon receipt of the message, four syndromes must be calculated, and the four correctors or the two error patterns and their positions are determined by solving four simultaneous equations. This generally requires an iterative procedure, and a number of algorithms have been developed for the purpose.5–7 Modern FEC systems, such as ATM and DVB, use eight-bit R–S codes and erasure extensively. The primitive polynomial commonly used with GF(256) is:
The codeword will be 255 bytes long but will often be shortened by puncturing. The larger Galois fields require less redundancy, but the computational problem increases. LSI chips have been developed specifically for R–S decoding in many high-volume formats.
3.19 Interleaving
The concept of bit interleaving was introduced in connection with a singlebit correcting code to allow it to correct small bursts. With burst-correcting codes such as Reed–Solomon, bit interleave is unnecessary. In some channels, the burst size may be many bytes rather than bits, and to rely on a code alone to correct such errors would require a lot of redundancy. The solution in this case is to employ symbol interleaving, as shown in Figure 3.32. Several codewords are encoded from input data, but these are not recorded in the order they were input, but are physically reordered in the channel, so that a real burst error is split into smaller bursts in several codewords. The size of the burst seen by each codeword is now determined primarily by the parameters of the interleave, and Figure 3.33 shows that the probability of occurrence of bursts with respect to the burst length in a given codeword is modified. The number of bits in the interleave word can be made equal to the burstcorrecting ability of the code in the knowledge that it will be exceeded only very infrequently.
There are a number of different ways in which interleaving can be performed. Figure 3.34 shows that, in block interleaving, words are reordered within blocks that are correctly ordered. The block interleave is achieved by writing samples into a memory in sequential address locations from a counter, and reading the memory with non-sequential addresses from a sequencer. The effect is to convert a one-dimensional sequence of samples into a two-dimensional structure having rows and columns.
The alternative to block interleaving is convolutional interleaving where the interleave process is endless. In Figure 3.35 symbols are assembled into short blocks and then delayed by an amount proportional to the position in the block. It will be seen from the figure that the delays have the effect of shearing the symbols so that columns on the left side of the diagram become diagonals on the right. When the columns on the right are read, the convolutional interleave will be obtained. Convolutional interleave works well in transmission applications such as DVB where the signal is continuous. Convolutional interleave has the advantage of requiring less memory to implement than a block code. This is because a block code requires the entire block to be written into the memory before it can be read, whereas a convolutional code requires only enough memory to cause the required delays.
3.20 Product Codes
In the presence of burst errors alone, the system of interleaving works very well, but it is known that in most practical channels there are also uncorrelated errors of a few bits due to noise. Figure 3.36 shows an interleaving system where a dropout-induced burst error has occurred which is at the maximum correctable size. All three codewords involved are working at their limit of one symbol. A random error due to noise in the vicinity of a burst error will cause the correction power of the code to be exceeded. Thus a random error of a single bit causes a further entire symbol to fail. This is a weakness of an interleaving system designed solely to handle dropout-induced bursts. Practical high-density equipment must address the problem of noise-induced or random errors and burst errors occurring at the same time. This is done by forming codewords both before and after the interleave process. In block interleaving, this results in a product code, whereas in the case of convolutional interleave the result is called cross-interleaving
Figure 3.37 shows that in a product code the redundancy calculated first and checked last is called the outer code, and the redundancy calculated second and checked first is called the inner code. The inner code is formed along tracks on the medium. Random errors due to noise are corrected by the inner code and do not impair the burst-correcting power of the outer code. Burst errors are declared uncorrectable by the inner code which flags the bad samples on the way into the deinterleave memory. The outer code reads the error flags in order to correct the flagged symbols by erasure. The error flags are also known as erasure flags. As it does not have to compute the error locations, the outer code needs half as much redundancy for the same correction power. Thus the inner code redundancy does not raise the code overhead. The combination of codewords with interleaving in several dimensions yields a truly synergistic error-protection strategy, in that the end result is more powerful than the sum of the parts. This technique is used in the FEC strategy for delivering audio over ATM.
3.21 Networks
A network is basically a communication resource shared for economic reasons. Like any shared resource, decisions have to be made, somewhere and somehow, how the resource is to be used. In the absence of such decisions the resultant chaos will be such that the resource might as well not exist. In communications networks the resource is the ability to convey data from any node or port to any other. On a particular cable, clearly only one transaction of this kind can take place at any one instant even though in practice many nodes will simultaneously be wanting to transmit data. Arbitration is needed to determine which node is allowed to transmit.
There are a number of different arbitration protocols and these have evolved to support the needs of different types of network. In small networks, such as LANs, a single point failure halting the entire network may be acceptable, whereas in a public transport network owned by a telecommunications company, the network will be redundant so that if a particular link fails data may be sent via an alternative route. A link that has reached its maximum capacity may also be supplanted by transmission over alternative routes.
In physically small networks, arbitration may be carried out in a single location. This is fast and efficient, but if the arbitrator fails it leaves the system completely crippled. The processor buses in computers work in this way. In centrally arbitrated systems the arbitrator needs to know the structure of the system and the status of all the nodes. Following a configuration change, due perhaps to the installation of new equipment, the arbitrator needs to be told what the new configuration is, or have a mechanism which allows it to explore the network and learn the configuration. Central arbitration is only suitable for small networks that change their configuration infrequently.
In other networks the arbitration is distributed so that some decisionmaking ability exists in every node. This is less efficient but it does allow at least some of the network to continue operating after a component failure. Distributed arbitration also means that each node is self-sufficient and so no changes need to be made if the network is reconfigured by adding or deleting a node. This is the only possible approach in wide area networks where the structure may be very complex and change dynamically in the event of failures or overload.
Ethernet uses distributed arbitration. FireWire is capable of using both types of arbitration. A small amount of decision-making ability is built into every node so that distributed arbitration is possible. However, if one of the nodes happens to be a computer, it can run a centralized arbitration algorithm.
The physical structure of a network is subject to some variation as Figure 3.38 shows. In radial networks (a) each port has a unique cable connection to a device called a hub. The hub must have one connection for every port and this limits the number of ports. However, a cable failure will only result in the loss of one port. In a ring system (b) the nodes are connected like a daisy chain with each node acting as a feedthrough. In this case the arbitration requirement must be distributed. With some protocols, a single cable break doesn’t stop the network operating. Depending on the protocol, simultaneous transactions may be possible provided they don’t require the same cable. For example, in a storage network, a disk drive may be outputting data to an editor whilst another drive is backing up data to a tape streamer. For the lowest cost, all nodes are physically connected in parallel to the same cable. Figure 3.38(c) shows that a cable break would divide the network into two halves, but it is possible that the impedance mismatch at the break could stop both halves working.
One of the concepts involved in arbitration is priority, which is fundamental to providing an appropriate quality of service. If two processes both want to use a network, the one with the highest priority would normally go first. Attributing priority must be done carefully because some of the results are non-intuitive. For example, it may be beneficial to give a high priority to a humble device that has a low data rate for the simple reason that if it is given use of the network it won’t need it for long. In a television environment transactions concerned with on-air processes would have priority over file transfers concerning production and editing.
When a device gains access to the network to perform a transaction, generally no other transaction can take place until it has finished. Consequently it is important to limit the amount of time that a given port can stay on the bus. In this way when the time limit expires, a further arbitration must take place. The result is that the network resource rotates between transactions rather than one transfer hogging the resource and shutting everyone else out.
It follows from the presence of a time (or data quantity) limit that ports must have the means to break large files up into frames or cells and reassemble them on reception. This process is sometimes called adaptation. If the data to be sent originally exist at a fixed bit rate, some buffering will be needed so that the data can be time-compressed into the available frames. Each frame must be contiguously numbered and the system must transmit a file size or word count so that the receiving node knows when it has received every frame in the file.
The error-detection system interacts with this process because if any frame is in error on reception, the receiving node can ask for a retransmission of the frame. This is more efficient than retransmitting the whole file. Figure 3.39 shows the flow chart for a receiving node.
Breaking files into frames helps to keep down the delay experienced by each process using the network. Figure 3.40 shows that each frame may be stored ready for transmission in a silo memory. It is possible to make the priority a function of the number of frames in the silo, as this is a direct measure of how long a process has been kept waiting. Isochronous systems must do this in order to meet maximum delay specifications. In Figure 3.40 once frame transmission has completed, the arbitrator will determine which process sends a frame next by examining the depth of all the frame buffers. MPEG transport stream multiplexers and networks delivering MPEG data must work in this way because the transfer is isochronous and the amount of buffering in a decoder is limited for economic reasons.
A central arbitrator is relatively simple to implement because when all decisions are taken centrally there can be no timing difficulty (assuming a well-engineered system). In a distributed system, there is an extra difficulty due to the finite time taken for signals to travel down the data paths between nodes.
Ethernet uses a protocol called CSMA/CD (carrier sense multiple access with collision detect) developed by DEC and Xerox. This is a distributed arbitration network where each node follows some simple rules. The first of these is not to transmit if an existing bus signal is detected. The second is not to transmit more than a certain quantity of data before releasing the bus. Devices wanting to use the bus will see bus signals and so will wait until the present bus transaction finishes. This must happen at some point because of the frame size limit. When the frame is completed, signalling on the bus should cease. The first device to sense the bus becoming free and to assert its own signal will prevent any other nodes transmitting according to the first rule. Where numerous devices are present it is possible to give them a priority structure by providing a delay between sensing the bus coming free and beginning a transaction. High-priority devices will have a short delay so they get in first. Lowerpriority devices will only be able to start a transaction if the high-priority devices don’t need to transfer.
It might be thought that these rules would be enough and everything would be fine. Unfortunately the finite signal speed means that there is a flaw in the system. Figure 3.41 shows why. Device A is transmitting and devices B and C both want to transmit and have equal priority. At the end of A’s transaction, devices B and C see the bus become free at the same instant and start a transaction. With two devices driving the bus, the resultant waveform is meaningless. This is known as a collision and all nodes must have means to recover from it. First, each node will read the bus signal at all times. When a node drives the bus, it will also read back the bus signal and compare it with what was sent. Clearly if the two are the same all is well, but if there is a difference, this must be because a collision has occurred and two devices are trying to determine the bus voltage at once.
If a collision is detected, both colliding devices will sense the disparity between the transmitted and readback signals, and both will release the bus to terminate the collision. However, there is no point is adhering to the simple protocol to reconnect because this will simply result in another collision. Instead each device has a built-in delay that must expire before another attempt is made to transmit. This delay is not fixed, but is controlled by a random number generator and so changes from transaction to transaction.
The probability of two node devices arriving at the same delay is infinitesimally small. Consequently if a collision does occur, both devices will drop the bus, and they will start their back-off timers. When the first timer expires, that device will transmit and the other will see the transmission and remain silent. In this way the collision is not only handled, but also prevented from happening again.
The performance of Ethernet is usually specified in terms of the bit rate at which the cabling runs. However, this rate is academic because it is not available all the time. In a real network bit rate is lost by the need to send headers and error-correction codes and by the loss of time due to interframe spaces and collision handling. As the demand goes up, the number of collisions increases and throughput goes down. Collision-based arbitrators do not handle congestion well.
An alternative method of arbitration developed by IBM is shown in Figure 3.42. This is known as a token ring system. All the nodes have an input and an output and are connected in a ring that must be intact for the system to work. Data circulate in one direction only. If data are not addressed to a node that receives them, the data will be passed on. When the data arrive at the addressed node, that node will capture the data as well as passing them on with an acknowledge added. Thus the data packet travels right around the ring back to the sending node. When the sending node receives the acknowledge, it will transmit a token packet. This token packet passes to the next node, which will pass it on if it does not wish to transmit. If no device wishes to transmit, the token will circulate endlessly. However, if a device has data to send, it simply waits until the token arrives again and captures it. This node can now transmit data in the knowledge that there cannot be a collision because no other node has the token.
In simple token ring systems, the transmitting node transmits idle characters after the data packet has been sent in order to maintain synchronization. The idle character transmission will continue until the acknowledge arrives. In the case of long packets the acknowledge will arrive before the packet has all been sent and no idle characters are necessary. However, with short packets idle characters will be generated. These idle characters use up ring bandwidth.
Later token ring systems use early token release (ETR). After the packet has been transmitted, the sending node sends a token straight away. Another node wishing to transmit can do so as soon as the current packet has passed.
It might be thought that the nodes on the ring would transmit in their physical order, but this is not the case because a priority system exists. Each node can have a different priority if necessary. If a high-priority node wishes to transmit, as a packet from elsewhere passes through that node, the node will set reservation bits with its own priority level. When the sending node finishes and transmits a token, it will copy that priority level into the token. In this way nodes with a lower priority level will pass the token on instead of capturing it. The token will ultimately arrive at the high-priority node.
The token ring system has the advantage that it does not waste throughput with collisions and so the full capacity is always available. However, if the ring is broken the entire network fails.
In Ethernet the performance is degraded by number of transactions, not by the number of nodes, whereas in token ring, performance is degraded by the number of nodes.
3.22 MPEG Packets and Time Stamps
The video elementary stream is an endless bitstream representing pictures that take a variable length of time to transmit. Bi-direction coding means that pictures are not necessarily in the correct order. Storage and transmission systems prefer discrete blocks of data and so elementary streams are packetized to form a PES (packetized elementary stream). Audio elementary streams are also packetized. A packet is shown in Figure 3.43. It begins with a header containing a unique packet start code and a code which identifies the type of data stream. Optionally the packet header also may contain one or more time stamps used for synchronizing the video decoder to real time and for obtaining lip sync.
Figure 3.44 shows that a time stamp is a sample of the state of a counter driven by a 90 kHz clock. This is obtained by dividing down the master 27 MHz clock of MPEG-2. This 27 MHz clock must be locked to the video frame rate and the audio sampling rate of the program concerned. There are two types of time stamp: PTS and DTS. These are abbreviations for presentation time stamp and decode time stamp. A presentation time stamp determines when the associated picture should be displayed on the screen, whereas a decode time stamp determines when it should be decoded. In bi-directional coding these times can be quite different.
Audio packets have only presentation time stamps. Clearly if lip sync is to be obtained, the audio sampling rate of a given program must have been locked to the same master 27 MHz clock as the video and the time stamps must have come from the same counter driven by that clock.
In practice, the time between input pictures is constant and so there is a certain amount of redundancy in the time stamps. Consequently PTS/DTS need not appear in every PES packet. Time stamps can be up to 100 ms apart in transport streams. As each picture type (I, P or B) is flagged in the bitstream, the decoder can infer the PTS/DTS for every picture from the ones actually transmitted.
The MPEG-2 transport stream is intended to be a multiplex of many TV programs with their associated sound and data channels, although a single program transport stream (SPTS) is possible. The transport stream is based upon packets of constant size so that multiplexing, adding errorcorrection codes and interleaving in a higher layer is eased. Figure 3.45 shows these to be always 188 bytes long.
Transport stream packets always begin with a header. The remainder of the packet carries data known as the payload. For efficiency, the normal header is relatively small, but for special purposes the header may be extended. In this case the payload gets smaller so that the overall size of the packet is unchanged. Transport stream packets should not be confused with PES packets that are larger and vary in size. PES packets are broken up to form the payload of the transport stream packets.
The header begins with a sync byte which is a unique pattern detected by a demultiplexer. A transport stream may contain many different elementary streams and these are identified by giving each a unique 13-bit packet identification code or PID which is included in the header. A multiplexer seeking a particular elementary stream simply checks the PID of every packet and accepts only those that are matching.
In a multiplex there may be many packets from other programs in between packets of a given PID. To help the demultiplexer, the packet header contains a continuity count. This is a four-bit value which increments at each new packet having a given PID.
This approach allows statistical multiplexing as it does matter how many or how few packets have a given PID; the demux will still find them. Statistical multiplexing has the problem that it is virtually impossible to make the sum of the input bit rates constant. Instead the multiplexer aims to make the average data bit rate slightly less than the maximum and the overall bit rate is kept constant by adding ‘stuffing’ or null packets. These packets have no meaning, but simply keep the bit rate constant. Null packets always have a PID of 8191 (all ones) and the demultiplexer discards them.
A transport stream is a multiplex of several TV programs and these may have originated from widely different locations. It is impractical to expect all the programs in a transport stream to be genlocked and so the stream is designed from the outset to allow unlocked programs. A decoder running from a transport stream has to genlock to the encoder and the transport stream has to have a mechanism to allow this to be done independently for each program. The synchronizing mechanism is called program clock reference (PCR).
Figure 3.46 shows how the PCR system works. The goal is to re-create at the decoder a 27 MHz clock synchronous with that at the encoder. The encoder clock drives a 48-bit counter that continuously counts up to the maximum value before overflowing and beginning again.
A transport stream multiplexer will periodically sample the counter and place the state of the count in an extended packet header as a PCR (see Figure 3.45). The demultiplexer selects only the PIDs of the required program, and it will extract the PCRs from the packets in which they were inserted.
The PCR codes are used to control a numerically locked loop (NLL). The NLL contains a 27 MHz VCXO (voltage controlled crystal oscillator), a variable-frequency oscillator based on a crystal having a relatively small frequency range.
The VCXO drives a 48-bit counter in the same way as in the encoder. The state of the counter is compared with the contents of the PCR and the difference is used to modify the VCXO frequency. When the loop reaches lock, the decoder counter would arrive at the same value as is contained in the PCR and no change in the VCXO would then occur. In practice the transport stream packets will suffer from transmission jitter and this will create phase noise in the loop. This is removed by loop filter so that the VCXO effectively averages a large number of phase errors.
A heavily damped loop will reject jitter well, but will take a long time to lock. Lockup time can be reduced when switching to a new program if the decoder counter is jammed to the value of the first PCR received in the new program. The loop filter may also have its time constants shortened during lockup. Once a synchronous 27 MHz clock is available at the decoder, this can be divided to provide the 90 kHz clock frequency that drives the time stamp mechanism.
The entire timebase stability of the decoder is no better than the stability of the clock derived from PCR. MPEG-2 sets standards for the maximum amount of jitter present in PCRs in a real transport stream.
Clearly, if the 27 MHz clock in the receiver is locked to one encoder it can only receive elementary streams encoded with that clock. If it is attempted to decode, for example, an audio stream generated from a different clock, the result will be periodic buffer overflows or underflows in the decoder. Thus MPEG defines a program in a manner that relates to timing. A program is a set of elementary streams encoded with the same master clock.
3.24 Transport Stream Multiplexing
A transport stream multiplexer is a complex device because of the number of functions it must perform. A fixed multiplexer will be considered first. In a fixed multiplexer, the bit rate of each of the programs must be specified so that the sum does not exceed the payload bit rate of the transport stream. The payload bit rate is the overall bit rate less the packet headers and program specific information (PSI) rate.
In practice, the programs will not be synchronous to one another, but the transport stream must produce a constant packet rate given by the bit rate divided by 188 bytes, the packet length. Figure 3.47 shows how this is handled. Each elementary stream entering the multiplexer passes through a buffer divided into payload-sized areas. Note that periodically the payload area is made smaller because of the requirement to insert PCR.
MPEG-2 decoders also have a quantity of buffer memory. The challenge to the multiplexer is to take packets from each program in such a way that neither its own buffers nor the buffers in any decoder overflow or underflow. This requirement is met by sending packets from all programs as evenly as possible rather than bunching together a lot of packets from one program. When the bit rates of the programs are different, the only way this can be handled is to use the buffer contents indicators. The more full a buffer is, the more likely it should be that a packet will be read from it. Thus a buffer content arbitrator can decide which program should have a packet allocated next.
If the sum of the input bit rates is correct, the buffers should all slowly empty because the overall input bit rate has to be less than the payload bit rate. This allows for the insertion of PSI. Whilst PATs and PMTs are being transmitted, the program buffers will fill up again. The multiplexer can also fill the buffers by sending more PCRs as this reduces the payload of each packet. In the event that the multiplexer has sent enough of everything but still can’t fill a packet then it will send a null packet with a PID of 8191. Decoders will discard null packets and as they convey no useful data, the multiplexer buffers will all fill whilst null packets are being transmitted.
The use of null packets means that the bit rates of the elementary streams do not need to be synchronous with one another or with the transport stream bit rate. As each elementary stream can have its own PCR, it is not necessary for the different programs in a transport stream to be genlocked to one another; in fact they don’t even need to have the same frame rate.
This approach allows the transport stream bit rate to be accurately defined and independent of the timing of the data carried. This is important because the transport stream bit rate determines the spectrum of the transmitter and this must not vary.
In a statistical multiplexer or statmux, the bit rate allocated to each program can vary dynamically. Figure 3.48 shows there must be a tight connection between the statmux and the associated compressors. Each compressor has a buffer memory which is emptied by a demand clock from the statmux. In a normal, fixed bit rate, coder the buffer content feeds back and controls the requantizer. In statmuxing this process is less severe and only takes place if the buffer is very close to full, because the degree of coding difficulty is also fed to the statmux.
The statmux contains an arbitrator that allocates more packets to the program with the greatest coding difficulty. Thus if a particular program encounters difficult material it will produce large prediction errors and begin to fill its output buffer. As the statmux has allocated more packets to that program, more data will be read out of that buffer, preventing overflow. Of course this is only possible if the other programs in the transport stream are handling typical video.
In the event that several programs encounter difficult material at once, clearly the buffer contents will rise and the coders will have to increase their compression factors.
Appendix 3.1 Calculation of Reed–Solomon Generator Polynomials
For a Reed–Solomon codeword over GF(23), there will be seven three-bit symbols. For location and correction of one symbol, there must be two redundant symbols P and Q, leaving A–E for data.
The following expressions must be true, where a is the primitive element of and is XOR throughout:
Dividing equation (2) by a:
Cancelling Q, and collecting terms:
Using section 3.16 to calculate (an + 1), e.g. a6+ 1=101 + 001=100=a2:
Multiplying equation (1) by a2 and equating to equation (2):
Cancelling terms a2P and collecting terms (remember ):
Adding powers according to section 3.16, e.g.
References
1. Tang, D.T., Run-length-limited codes. IEEE International Symposium on Information Theory (1969)
2. AES Recommended practice for Digital Audio Engineering – Serial Multichannel Audio Digital Interface (MADI). J. Audio Eng. Soc., 39, No. 5, 371–377 (1991)
3. Shannon, C.E., A mathematical theory of communication. Bell System Tech. J., 27, 379 (1948)
4. Reed, I.S. and Solomon, G., Polynomial codes over certain finite fields. J. Soc. Indust. Appl. Math., 8, 300–304 (1960)
5. Berlekamp, E.R., Algebraic Coding Theory. New York: McGraw-Hill (1967). Reprint edition: Laguna Hills, CA: Aegean Park Press (1983)
6. Sugiyama, Y. et al., An erasures and errors decoding algorithm for Goppa codes. IEEE Trans. Inf. Theory, IT–22 (1976)
7. Peterson, W.W. and Weldon, E.J., Error Correcting Codes, 2nd edn, Cambridge MA: MIT Press (1972)
98.82.120.188