The distances involved in transmission vary from a short cable between adjacent units to communication anywhere on earth via data networks or radio. This chapter must consider a correspondingly wide range of possibilities. The importance of direct digital interconnection between audio devices was realized early, and this was standardized by the AES/EBU digital audio interface for professional equipment and the SPDIF interface for consumer equipment. These standards were extended to produce the MADI standard for multi-channel interconnects. All of these carry uncompressed PCM audio.
As digital audio and computers continue to converge, computer networks are increasingly used for audio purposes. Audio may be transmitted on networks such as Ethernet, ISDN, ATM and the Internet. Here compression may or may not be used, and non-real-time transmission may also be found according to economic pressures.
Digital audio is now being broadcast in its own right as DAB, alongside traditional analog television as NICAM digital audio and as the sound channels of digital television broadcasts. All of these applications use compression.
Whatever the transmission medium, one universal requirement is a reliable synchronization system. In PCM systems, synchronization of the sampling rate between sources is necessary for mixing. In packet-based networks, synchronization allows the original sampling rate to be established at the receiver despite intermittent packet delivery. In digital television systems, synchronization between vision and sound is a further requirement.
The AES/EBU digital audio interface, originally published in 1985,1 was proposed to embrace all the functions of existing formats in one standard. Alongside the professional format, Sony and Philips developed a similar format now known as SPDIF (Sony Philips Digital Interface) intended for consumer use. This offers different facilities to suit the application, yet retains sufficient compatibility with the professional interface so that, for many purposes, consumer and professional machines can be connected together.2,3
During the standardization process it was considered desirable to be able to use existing analog audio cabling for digital transmission. Existing professional analog signals use nominally 600 Ω balanced, screened lines, with one cable per audio channel, or in some cases one twisted pair per channel with a common screen. The 600 Ω standard came from telephony, where distances are long in comparison with electrical audio wavelengths. The distances likely to be found within a studio complex are short compared with audio electrical wavelengths, so at audio frequencies the characteristic impedance of the cable is high and the 600 Ω figure is simply that of the source and termination. Such a cable has a different characteristic impedance at the frequencies used for digital audio: around 110 Ω.
If a single serial channel is to be used, the interconnect has to be self-clocking and self-synchronizing, i.e. the single signal must carry enough information to allow the boundaries between individual bits, words and blocks to be detected reliably. To fulfil these requirements, the AES/EBU and SPDIF interfaces use FM channel code (see Chapter 6) which is DC-free, strongly self-clocking and capable of working with a changing sampling rate. Synchronization of deserialization is achieved by violating the usual encoding rules.
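As a rough illustration of the FM coding rule, the following sketch (names illustrative) encodes each data bit as two half-cell levels: a transition at every cell boundary supplies the clocking content, and a further mid-cell transition encodes a one. Chapter 6 gives the full treatment.

```python
def fm_encode(bits, level=0):
    """Biphase-mark (FM) encoding sketch: a transition at the start of
    every bit cell, plus a mid-cell transition for each data one.
    Returns two half-cell levels per data bit."""
    out = []
    for b in bits:
        level ^= 1              # transition at every cell boundary (clock)
        first_half = level
        if b:
            level ^= 1          # extra mid-cell transition encodes a one
        out.append((first_half, level))
    return out
```

Because at least one of the two half-cells always toggles, the waveform is DC-free and the receiver can extract a clock from the guaranteed boundary transitions.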
The use of FM means that the channel frequency is the same as the bit rate when sending data ones. Tests showed that in typical analog audio cabling installations, sufficient bandwidth was available to convey two digital audio channels in one twisted pair. The standard driver and receiver chips for RS-422A4 data communication (or the equivalent CCITT-V.11) are employed for professional use, but work by the BBC5 suggested that equalization and transformer coupling were desirable for longer cable runs, particularly if several twisted pairs occupy a common shield. Successful transmission up to 350 m has been achieved with these techniques.6 Figure 7.1 shows the standard configuration. The output impedance of the drivers will be about 110Ω, and the impedance of the cable used should be similar at the frequencies of interest. The driver was specified in AES-3–1985 to produce between 3 and 10 V peak-to-peak into such an impedance but this was changed to between 2 and 7 V in AES-3–1992 to better reflect the characteristics of actual RS-422 driver chips.
The original receiver impedance was set at a high 250Ω with the intention that up to four receivers could be driven from one source. This was found to be inadvisable because of reflections caused by impedance mismatches and AES-3–1992 is now a point-to-point interface with source, cable and load impedance all set at 110Ω. Whilst analog audio cabling was adequate for digital signalling, cable manufacturers have subsequently developed cables which are more appropriate for new digital installations, having lower loss factors allowing greater transmission distances.
In Figure 7.2, the specification of the receiver is shown in terms of the minimum eye pattern (see Chapter 6) which can be detected without error. It will be noted that the voltage of 200 mV specifies the height of the eye opening at a width of half a channel bit period. The actual signal amplitude will need to be larger than this, and even larger if the signal contains noise. Figure 7.3 shows the recommended equalization characteristic which can be applied to signals received over long lines.
As an adequate connector in the shape of the XLR is already in wide service, the connector made to IEC 268 Part 12 has been adopted for digital audio use. Effectively, existing analog audio cables having XLR connectors can be used without alteration for digital connections. The AES/EBU standard does, however, require that suitable labelling should be used so that it is clear that the connections on a particular unit are digital.
The need to drive long cables does not generally arise in the domestic environment, and so a low-impedance balanced signal was not considered necessary. The electrical interface of the consumer format uses a 0.5 V peak single-ended signal, which can be conveyed down conventional audio-grade coaxial cable connected with RCA ‘phono’ plugs. Figure 7.4 shows the resulting consumer interface as specified by IEC 958.
There is the alternative7 of a professional interface using coaxial cable and BNC connectors for distances of around 1000 m. This is simply the AES/EBU protocol but with a 75Ω coaxial cable carrying a one-volt signal so that it can be handled by analog video distribution amplifiers. Impedance converting transformers are commercially available allowing balanced 110Ω to unbalanced 75Ω matching.
In Figure 7.5 the basic structure of the professional and consumer formats can be seen. One subframe consists of 32 bit-cells, of which four will be used by a synchronizing pattern. Subframes from the two audio channels, A and B, alternate on a time-division basis. Up to 24-bit sample wordlength can be used, which should cater for all conceivable future developments, but normally 20-bit maximum length samples will be available with four auxiliary data bits, which can be used for a voice-grade channel in a professional application. In a consumer DAT machine, subcode can be transmitted in bits 4–11, and the sixteen-bit audio in bits 12–27.
The format specifies that audio data must be in two’s complement coding. Whilst pure binary could accept various alignments of different wordlengths with only a level change, this is not true of two’s complement. If different wordlengths are used, the MSBs must always be in the same bit position otherwise the polarity will be misinterpreted. Thus the MSB has to be in bit 27 irrespective of wordlength. Shorter words are leading zero filled up to the twenty-bit capacity.
Four status bits accompany each subframe. The validity flag will be reset (to zero) if the associated sample is reliable. Whilst there have been many aspirations regarding what the V bit could be used for, in practice a single bit cannot specify much, and if combined with other V bits to make a word, the time resolution is lost. AES-3–1992 described the V bit as indicating that the information in the associated subframe is ‘suitable for conversion to an analog signal’. Thus it might be set if the interface was being used for non-audio data, as is done, for example, in CD-I players.
The parity bit produces even parity over the subframe, such that the total number of ones in the subframe is even. This allows for simple detection of an odd number of bits in error, but its main purpose is that it makes successive sync patterns have the same polarity, which can be used to improve the probability of detection of sync. The user and channel-status bits are discussed later.
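The following sketch shows how the 28 data slots of one subframe might be assembled, following the slot numbering of Figure 7.5. It is illustrative only; the four-slot preamble is omitted because it is generated at the channel-coding stage as a deliberate rule violation.

```python
def build_subframe(sample, bits=20, aux=0, v=0, u=0, c=0):
    """Assemble slots 4-31 of an AES/EBU subframe (slots 0-3 are the
    sync preamble, added later). Audio is two's complement, sent LSB
    first, with the MSB always in slot 27."""
    assert -(1 << (bits - 1)) <= sample < (1 << (bits - 1))
    word = (sample & ((1 << bits) - 1)) << (20 - bits)  # MSB-justify
    slots = [0] * 32
    for i in range(4):                      # auxiliary data, slots 4-7
        slots[4 + i] = (aux >> i) & 1
    for i in range(20):                     # audio, LSB first, slots 8-27
        slots[8 + i] = (word >> i) & 1
    slots[28], slots[29], slots[30] = v, u, c
    slots[31] = sum(slots[4:31]) & 1        # even parity over the subframe
    return slots
```

Note how a sixteen-bit sample is shifted so that its MSB still lands in slot 27, leaving slots 8–11 zero-filled, exactly as the wordlength rule above requires.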
Two of the subframes described above make one frame, which repeats at the sampling rate in use. The first subframe will contain the sample from channel A, or from the left channel in stereo working. The second subframe will contain the sample from channel B, or the right channel in stereo. At 48 kHz, the bit rate will be 3.072 Mbits/s, but as the sampling rate can vary, the clock rate will vary in proportion.
In order to separate the audio channels on receipt the synchronizing patterns for the two subframes are different as Figure 7.6 shows. These sync patterns begin with a run length of 1.5 bits which violates the FM channel coding rules and so cannot occur due to any data combination. The type of sync pattern is denoted by the position of the second transition which can be 0.5, 1.0 or 1.5 bits away from the first. The third transition is designed to make the sync patterns DC-free.
The channel status and user bits in each subframe form serial data streams with one bit of each per audio channel per frame. The channel status bits are given a block structure and synchronized every 192 frames, which at 48 kHz gives a block rate of 250 Hz, corresponding to a period of four milliseconds. In order to synchronize the channel-status blocks, the channel A sync pattern is replaced for one frame only by a third sync pattern which is also shown in Figure 7.6. The AES standard refers to these as X, Y and Z whereas IEC 958 calls them M, W and B. As stated, there is a parity bit in each subframe, which means that the binary level at the end of a subframe will always be the same as at the beginning. Since the sync patterns have the same characteristic, the effect is that sync patterns always have the same polarity and the receiver can use that information to reject noise. The polarity of transmission is not specified, and indeed an accidental inversion in a twisted pair is of no consequence, since it is only the transition that is of importance, not the direction.
When 24-bit resolution is not required, the four auxiliary bits can be used to provide talkback. This was proposed by broadcasters8 to allow voice coordination between studios as well as program exchange on the same cables.
Twelve-bit samples of the talkback signal are taken at one third the main sampling rate. Each twelve-bit sample is then split into three four-bit nibbles which can be sent in the auxiliary data slot of three successive samples in the same audio channel. As there are 192 nibbles per channel status block period, there will be exactly 64 talkback samples in that period. The reassembly of the nibbles can be synchronized by the channel status sync pattern. Channel status byte 2 reflects the use of auxiliary data in this way.
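A sketch of the nibble arithmetic follows; the LSB-first nibble order is an assumption made here for illustration, and the normative ordering is defined in AES-3–1992.

```python
def split_talkback(sample12):
    """Split one 12-bit talkback sample into three 4-bit nibbles for the
    aux slots of three successive subframes of the same channel."""
    return [(sample12 >> shift) & 0xF for shift in (0, 4, 8)]

def join_talkback(nibbles):
    """Reassemble at the receiver; alignment comes from the channel
    status sync pattern. 192 nibbles per block / 3 = 64 samples."""
    n0, n1, n2 = nibbles
    return n0 | (n1 << 4) | (n2 << 8)
```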
In both the professional and consumer formats, the sequence of channel-status bits over 192 subframes builds up a 24-byte channel-status block. However, the contents of the channel status data are completely different between the two applications. The professional channel status structure is shown in Figure 7.7. Byte 0 determines the use of emphasis and the sampling rate, with details in Figure 7.8. Byte 1 determines the channel usage mode, i.e. whether the data transmitted are a stereo pair, two unrelated mono signals or a single mono signal, and details the user bit handling. Figure 7.9 gives details. Byte 2 determines wordlength as in Figure 7.10. This was made more comprehensive in AES-3–1992. Byte 3 is applicable only to multichannel applications. Byte 4 indicates the suitability of the signal as a sampling rate reference and will be discussed in more detail later in this chapter.
There are two slots of four bytes each which are used for alphanumeric source and destination codes. These can be used for routing. The bytes contain seven-bit ASCII characters (printable characters only) sent LSB first with the eighth bit set to zero according to AES-3–1992. The destination code can be used to operate an automatic router, and the source code will allow the origin of the audio and other remarks to be displayed at the destination.
Bytes 14–17 convey a 32-bit sample address which increments every channel status frame. It effectively numbers the samples in a relative manner from an arbitrary starting point. Bytes 18–21 convey a similar number, but this is a time-of-day count, which starts from zero at midnight. As many digital audio devices do not have real-time clocks built in, this cannot be relied upon.
AES-3–1992 specified that the time-of-day bytes should convey the real time at which a recording was made, making it rather like timecode. There are enough combinations in 32 bits to allow a sample count over 24 hours at 48 kHz. The sample count has the advantage that it is universal and independent of local supply frequency. In theory if the sampling rate is known, conventional hours, minutes, seconds, frames timecode can be calculated from the sample count, but in practice it is a lengthy computation and users have proposed alternative formats in which the data from EBU or SMPTE timecode are transmitted directly in these bytes. Some of these proposals are in service as de-facto standards.
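The conversion itself is simple in principle, as this sketch shows for the fixed-rate case (48 kHz and 25 fps assumed; 24 h × 3600 s × 48 000 = 4 147 200 000 samples, which just fits in 32 bits):

```python
def sample_count_to_timecode(count, fs=48000, fps=25):
    """Convert a time-of-day sample count into hours, minutes, seconds
    and frames. Only valid when the sampling rate is known and fixed."""
    seconds, samples_left = divmod(count, fs)
    hours, seconds = divmod(seconds, 3600)
    minutes, seconds = divmod(seconds, 60)
    frames = samples_left * fps // fs       # frame number within the second
    return hours, minutes, seconds, frames
```

The practical difficulty arises when the rate is varispeeded, or with frame rates such as NTSC where a frame does not contain an integer number of samples.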
The penultimate byte contains four flags which indicate that certain sections of the channel-status information are unreliable (see Figure 7.11). This allows the transmission of an incomplete channel-status block where the entire structure is not needed or where the information is not available. For example, setting bit 5 to a logical one would mean that no origin or destination data would be interpreted by the receiver, and so it need not be sent.
The final byte in the message is a CRCC which converts the entire channel-status block into a codeword (see Chapter 6). The channel status message takes 4 ms at 48 kHz and in this time a router could have switched to another signal source. This would damage the transmission, but would also result in a CRCC failure so the corrupt block is not used. Error correction is not necessary, as the channel status data are either stationary, i.e. they stay the same, or change at a predictable rate, e.g. timecode. Stationary data will only change at the receiver if a good CRCC is obtained.
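A sketch of the check calculation follows. The generator polynomial x^8 + x^4 + x^3 + x^2 + 1 with an all-ones preset is the one commonly cited for AES/EBU channel status, but it, and the bit ordering, should be taken from AES-3 itself.

```python
def channel_status_crc(block):
    """CRC over channel-status bytes 0-22, returning the value carried
    in byte 23. Polynomial and preset are the assumptions noted above."""
    crc = 0xFF                              # all-ones preset
    for byte in block[:23]:
        crc ^= byte
        for _ in range(8):                  # 0x1D encodes x^4+x^3+x^2+1
            crc = ((crc << 1) ^ 0x1D) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc
```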
The user channel consists of one bit per audio channel per sample period. Unlike channel status, which only has a 192-bit frame structure, the user channel can have a flexible frame length. Figure 7.9 showed how byte 1 of the channel status frame describes the state of the user channel. Many professional devices do not use the user channel at all and would set the all-zeros code. If the user channel frame has the same length as the channel status frame then code 0001 can be set. One user channel format which is standardized is the data packet scheme of AES18–1992.9,10 This was developed from proposals to employ the user channel for labelling in an asynchronous format.11 A computer industry standard protocol known as HDLC (High-level Data Link Control)12 is employed in order to take advantage of readily available integrated circuits.
The frame length of the user channel can be conveniently made equal to the frame period of an associated device. For example, it may be locked to film, TV or DAT frames. The frame length may vary in NTSC as there is not an integer number of audio samples in a TV frame.
Whilst the AES/EBU digital interface excels for the interconnection of stereo equipment, it is at a disadvantage when a large number of channels is required. MADI13 was designed specifically to address the requirement for digital connection between multitrack recorders and mixing consoles by a working group set up jointly by Sony, Mitsubishi, Neve and SSL.
The standard provides for 56 simultaneous digital audio channels which are conveyed point-to-point on a single 75Ω coaxial cable fitted with BNC connectors (as used for analog video) along with a separate synchronization signal. A distance of at least 50 m can be achieved.
Essentially MADI takes the subframe structure of the AES/EBU interface and multiplexes 56 of these into one sample period rather than the original two. Clearly this will result in a considerable bit rate, and the FM channel code of the AES/EBU standard would require excessive bandwidth. A more efficient code is used for MADI. In the AES/EBU interface the data rate is proportional to the sampling rate in use. Losses will be greater at the higher bit rate of MADI, and the use of a variable bit rate in the channel would make the task of achieving optimum equalization difficult. Instead the data bit rate is made a constant 100 megabits per second, irrespective of sampling rate. At lower sampling rates, the audio data are padded out to maintain the channel rate.
The MADI standard is effectively a superset of the AES/EBU interface in that the subframe data content is identical. This means that a number of separate AES/ EBU signals can be fed into a MADI channel and recovered in their entirety on reception. The only caution required with such an application is that all channels must have the same synchronized sampling rate. The primary application of MADI is to multitrack recorders, and in these machines the sampling rates of all tracks are intrinsically synchronous. When the replay speed of such machines is varied, the sampling rate of all channels will change by the same amount, so they will remain synchronous.
At one extreme, MADI will accept a 32 kHz recorder playing 12½ per cent slow, and at the other extreme a 48 kHz recorder playing 12½ per cent fast. This is almost a factor of 2:1. Figure 7.12 shows some typical MADI configurations.
The data transmission of MADI is made using a group code, where groups of four data bits are represented by groups of five channel bits. Four bits have sixteen combinations, whereas five bits have 32 combinations. Clearly only 16 out of these 32 are necessary to convey all possible data. It is then possible to use some of the remaining patterns when it is required to pad out the data rate. The padding symbols will not correspond to a valid data symbol and so they can be recognized and thrown away on reception. A further use of this coding technique is that the 16 patterns of 5 bits which represent real data are chosen to be those which will have the best balance between high and low states, so that DC offsets at the receiver can be minimized. The 4/5 code adopted is the same one used for a computer transmission format known as FDDI, so that existing hardware can be used.
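Because the 4/5 code is the FDDI group code, the data symbol table is well established; the sketch below shows the mapping (nibble order within a byte is illustrative).

```python
# The sixteen FDDI 4B/5B data symbols: each nibble maps to a 5-bit
# channel symbol chosen for a good balance of ones and zeros.
FDDI_4B5B = {
    0x0: 0b11110, 0x1: 0b01001, 0x2: 0b10100, 0x3: 0b10101,
    0x4: 0b01010, 0x5: 0b01011, 0x6: 0b01110, 0x7: 0b01111,
    0x8: 0b10010, 0x9: 0b10011, 0xA: 0b10110, 0xB: 0b10111,
    0xC: 0b11010, 0xD: 0b11011, 0xE: 0b11100, 0xF: 0b11101,
}

def encode_4b5b(data):
    """Encode a byte sequence as 5-bit channel symbols, high nibble
    first. Padding and sync use symbols outside this table, so a
    receiver can discard them unambiguously."""
    for byte in data:
        yield FDDI_4B5B[byte >> 4]
        yield FDDI_4B5B[byte & 0xF]
```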
Figure 7.13(a) shows the frame structure of MADI. In one sample period, 56 time slots are available, and these each contain eight 4/5 symbols, corresponding to 32 data bits or 40 channel bits. Depending on the sampling rate in use, more or less padding symbols will need to be inserted in the frame to maintain a constant channel bit rate. Since the receiver does not interpret the padding symbols as data, it is effectively blind to them, and so there is considerable latitude in the allowable positions of the padding. Figure 7.13(b) shows some possibilities. The padding must not be inserted within a channel, only between channels, but the channels need not necessarily be separated by padding. At one extreme, all channels can be butted together, followed by a large padding area, or the channels can be evenly spaced throughout the frame. Although this sounds rather vague, it is intended to allow freedom in the design of associated hardware. Multitrack recorders generally have some form of internal multiplexed data bus, and these have various architectures and protocols. The timing flexibility allows an existing bus timing structure to be connected to a MADI link with the minimum of hardware. Since the channels can be inserted at a variety of places within the frame, the necessity of a separate synchronizing link between transmitter and receiver becomes clear.
Figure 7.14 shows the MADI channel format, which should be compared with the AES/EBU subframe shown in Figure 7.5. The last 28 bits are identical, and differences are only apparent in the synchronizing area. In order to remain transparent to an AES/EBU signal, which can contain two audio channels, MADI must tell the receiver whether a particular channel contains the A leg or B leg, and when the AES/EBU channel status block sync occurs. Bits 2 and 3 perform these functions. As the 56 channels of MADI follow one another in numerical order, it is necessary to identify channel zero so that the channels are not mixed up. This is the function of bit 0, which is set in channel zero and reset in all other channels. Finally bit 1 is set to indicate an active channel, for the case when fewer than 56 channels are being fed down the link. Active channels have bit 1 set, and must be consecutive starting at channel zero. Inactive channels have all bits set to zero, and must follow the active channels.
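A sketch of the mode-bit logic follows. The assignment of the A/B and block-sync functions to bits 2 and 3 is taken from common descriptions of the standard and should be checked against AES-10 itself.

```python
def madi_mode_bits(chan_index, active=True, leg='A', block_start=False):
    """Generate the four MADI mode bits for one channel slot."""
    bit0 = 1 if chan_index == 0 else 0      # set only in channel zero
    bit1 = 1 if active else 0               # active-channel flag
    bit2 = 1 if leg == 'B' else 0           # AES/EBU A or B leg (assumed)
    bit3 = 1 if block_start else 0          # channel-status block sync (assumed)
    return [bit0, bit1, bit2, bit3]
```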
Whereas a parallel bus is ideal for a distributed multichannel system, for a point-to-point connection, the use of fibre optics is feasible, particularly as distance increases. An optical fibre is simply a glass filament which is encased in such a way that light is constrained to travel along it. Transmission is achieved by modulating the power of an LED or small laser coupled to the fibre. A phototransistor converts the received light back to an electrical signal.
Optical fibres have numerous advantages over electrical cabling. The bandwidth available is staggering. Optical fibres neither generate, nor are prone to, electromagnetic interference and, as they are insulators, ground loops cannot occur.14 The disadvantage of optical fibres is that the terminations of the fibre where transmitters and receivers are attached suffer optical losses, and while these can be compensated in point-to-point links, the use of a bus structure is not really feasible. Fibre-optic links are already in service in digital audio mixing consoles.15 The fibre implementation by Toshiba and known as the TOSLink is popular in consumer products, and the protocol is identical to the consumer electrical format.
When digital audio signals are to be assembled from a variety of sources, either for mixing down or for transmission through a TDM (time-division multiplexing) system, the samples from each source must be synchronized to one another in both frequency and phase. The source of samples must be fed with a reference sampling rate from some central generator, and will return samples at that rate. The same will be true if digital audio is being used in conjunction with VTRs. As the scanner speed and hence the audio block rate is locked to video, it follows that the audio sampling rate must be locked to video. Such a technique has been used since the earliest days of television in order to allow vision mixing, but now that audio is conveyed in discrete samples, these too must be genlocked to a reference for most production purposes.
AES11–199116 documented standards for digital audio synchronization and requires professional equipment to be able to genlock either to a separate reference input or to the sampling rate of an AES/EBU input.
As the interface uses serial transmission, a shift register is required in order to return the samples to parallel format within equipment. The shift register is generally buffered with a parallel loading latch which allows some freedom in the exact time at which the latch is read with respect to the serial input timing. Accordingly the standard defines synchronism as an identical sampling rate, but with no requirement for a precise phase relationship. Figure 7.15 shows the timing tolerances allowed. The beginning of a frame (the frame edge) is defined as the leading edge of the X preamble. A device which is genlocked must correctly decode an input whose frame edges are within ± 25 per cent of the sample period. This is quite a generous margin, and corresponds to the timing shift due to putting about a kilometre of cable in series with a signal. In order to prevent tolerance build-up when passing through several devices in series, the output timing must be held within ± 5 per cent of the sample period.
The reference signal may be an AES/EBU signal carrying program material, or it may carry muted audio samples; the so-called digital audio silence signal. Alternatively it may just contain the sync patterns. The accuracy of the reference is specified in bits 0 and 1 of byte 4 of channel status (see Figure 7.7). 00 indicates the signal is not reference grade (but some equipment may still be able to lock to it), 01 indicates a Grade 1 reference signal which is ± 1 ppm accurate, and 10 indicates a Grade 2 reference signal which is ± 10 ppm accurate. Clearly devices which are intended to lock to one of these references must have an appropriate phase-locked-loop capture range.
In addition to the AES/EBU synchronization approach, some older equipment carries a word clock input which accepts a TTL level square wave at the sampling frequency. This is the reference clock of the old Sony SDIF-2 interface.
Modern digital audio devices may also have a video input for synchronizing purposes. Video syncs (with or without picture) may be input, and a phase-locked loop will multiply the video frequency by an appropriate factor to produce a synchronous audio sampling clock.
In practical situations, genlocking is not always possible. In a satellite transmission, it is not really practicable to genlock a studio complex half-way around the world to another. Outside broadcasts may be required to generate their own master timing for the same reason. When genlock is not achieved, there will be a slow slippage of sample phase between source and destination due to such factors as drift in timing generators. This phase slippage will be corrected by a synchronizer, which is intended to work with frequencies that are nominally the same. It should be contrasted with the sampling-rate convertor, which can work with arbitrary, and generally much larger, frequency differences. Although a sampling-rate convertor can act as a synchronizer, it is a very expensive way of doing the job. A synchronizer can be thought of as a lower-cost version of a sampling-rate convertor which is constrained in the rate difference it can accept.
In one implementation of a digital audio synchronizer,17 memory is used as a timebase corrector. Samples are written into the memory with the frequency and phase of the source and, when the memory is half-full, samples are read out with the frequency and phase of the destination. Clearly if there is a net rate difference, the memory will either fill up or empty over a period of time, and in order to recentre the address relationship, it will be necessary to jump the read address. This will cause samples to be omitted or repeated, depending on the relationship of source rate to destination rate, and would be audible on program material. The solution is to detect pauses or low-level passages and permit jumping only at such times. The process is illustrated in Figure 7.16. An alternative to address jumping is to undertake sampling-rate conversion for a short period (Figure 7.17) in order to slip the input/output relationship by one sample.18 If this is done when the signal level is low, short wordlength logic can be used. However, now that sampling rate convertors are available as a low-cost single chip, these solutions are found less often in hardware, although they may be used in software-controlled processes.
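A minimal sketch of the timebase-corrector approach, with address jumping gated by signal level (the buffer size and 'quiet' threshold are arbitrary choices here):

```python
class Synchronizer:
    """Memory timebase corrector: write at the source rate, read at the
    destination rate, and recentre the addresses by repeating or
    skipping one sample only during low-level passages."""
    SIZE = 1024
    QUIET = 16                              # level below which jumps are masked

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.wr = 0
        self.rd = self.SIZE // 2            # start half-full

    def write(self, sample):                # called at the source rate
        self.mem[self.wr] = sample
        self.wr = (self.wr + 1) % self.SIZE

    def read(self):                         # called at the destination rate
        sample = self.mem[self.rd]
        fill = (self.wr - self.rd) % self.SIZE
        if abs(sample) < self.QUIET:        # only jump when it is inaudible
            if fill < self.SIZE // 4:
                self.rd = (self.rd - 1) % self.SIZE   # repeat a sample
            elif fill > 3 * self.SIZE // 4:
                self.rd = (self.rd + 1) % self.SIZE   # skip a sample
        self.rd = (self.rd + 1) % self.SIZE
        return sample
```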
The difficulty of synchronizing unlocked sources is eased when the frequency difference is small. This is one reason behind the clock accuracy standards for AES/EBU timing generators.19
Routing is the process of directing audio and video signals between a large number of devices so that any one can be connected to any other. The principle of a router is not dissimilar to that of a telephone exchange. In analog routers, there is the potential for quality loss due to the switching element. Digital routers handling AES/EBU audio signals are attractive because they need introduce no loss whatsoever. In addition, the switching is performed on a binary signal and therefore the cost can be lower. Routers can be either cross-point or time-division multiplexed.
In a TDM system, channel reassignment is easy. If the audio channels are transmitted in address sequence, it is only necessary to change the addresses which the receiving channels recognize, and a given input channel will emerge from a different output channel. The only constraint in the use of TDM systems is that all channels must have synchronized sampling rates. Given that the MADI interface uses TDM, it is also possible to perform routing functions using MADI-based hardware.
For asynchronous systems, or where several sampling rates are found simultaneously, a cross-point type of channel-assignment matrix will be necessary, using AES/EBU signals. In such a device, the switching can be performed by logic gates at low cost, and in the digital domain there is, of course, no quality degradation.
Routers are the traditional approach to audio switching. However, given that digital audio is just another kind of data, it is to be expected that computer-based network technology can provide such functions. It is fundamental in a network that any port can communicate with any other port. Figure 7.18 shows a primitive three-port network. Clearly each port must select one or other of the remaining ports in a trivial switching system. However, if it were attempted to redraw Figure 7.18 with one hundred ports, each one would need a 99-way switch and the number of wires needed would be phenomenal. Another approach is needed.
Figure 7.19 shows that the common solution is to have an exchange, also known as a router, hub or switch, which is connected to every port by a single cable. In this case when a port wishes to communicate with another, it instructs the switch to make the connection. The complexity of the switch varies with its performance. The minimal case may be to install a single input selector and a single output selector. This allows any port to communicate with any other, but only one at a time. If more simultaneous communications are needed, further switching is needed. The extreme case is where every possible pair of ports can communicate simultaneously.
The amount of switching logic needed to implement the extreme case is phenomenal and in practice it is unlikely to be needed. One fundamental property of networks is that they are seldom implemented with the extreme case supported. There will be an economic decision made balancing the number of simultaneous communications with the equipment cost. Most of the time the user will be unaware that this limit exists, until there is a statistically abnormal condition which causes more than the usual number of nodes to attempt communication.
The phrase ‘the switchboard was jammed’ has passed into the language and stayed there despite the fact that manual switchboards are only seen in museums. This is a characteristic of networks. They generally only work up to a certain throughput and then there are problems. This doesn’t mean that networks aren’t useful, far from it. What it means is that with care, networks can be very useful, but without care they can be a nightmare.
There are two key factors to get right in a network. The first is that it must have enough throughput, bandwidth or connectivity to handle the anticipated usage and the second is that a priority system or algorithm is chosen which has appropriate behaviour during overload. These two characteristics are quite different, but often come as a pair in a network corresponding to a particular standard.
Where each device is individually cabled, the result is a radial network shown in Figure 7.20(a). It is not necessary to have one cable per device and several devices can co-exist on a single cable if some form of multiplexing is used. This might be time-division multiplexing (TDM) or frequency division multiplexing (FDM). In TDM, shown in Figure 7.20(b), the time axis is divided into steps which may or may not be equal in length. In Ethernet, for example, these are called frames. During each time step or frame a pair of nodes have exclusive use of the cable. At the end of the time step another pair of nodes can communicate. Rapidly switching between steps gives the illusion of simultaneous transfer between several pairs of nodes. In FDM, simultaneous transfer is possible because each message occupies a different band of frequencies in the cable. Each node has to ‘tune’ to the correct signal. In practice it is possible to combine FDM and TDM. Each frequency band can be time multiplexed in some applications.
Data networks originated to serve the requirements of computers and it is a simple fact that most computer processes don’t need to be performed in real time or indeed at a particular time at all. Networks tend to reflect that background as many of them, particularly the older ones, are asynchronous.
Asynchronous means that the time taken to deliver a given quantity of data is unknown. A TDM system may chop the data into several different transfers and each transfer may experience delay according to what other transfers the system is engaged in. Ethernet and most storage system buses are asynchronous. For broadcasting purposes an asynchronous delivery system is no use at all, but for copying an audio data file between two storage devices an asynchronous system is perfectly adequate.
The opposite extreme is the synchronous system in which the network can guarantee a constant delivery rate and a fixed and minor delay. An AES/EBU router is a synchronous network.
In between asynchronous and synchronous networks reside the isochronous approaches. These can be thought of as sloppy synchronous networks or more rigidly controlled asynchronous networks. Both descriptions are valid. In the isochronous network there will be a maximum delivery time which is not normally exceeded. The data transmission rate may vary, but if the rate has been low for any reason, it will accelerate to prevent the maximum delay being reached. Isochronous networks can deliver near-real-time performance. If a data buffer is provided at both ends, synchronous data such as AES/EBU audio can be fed through an isochronous network. The magnitude of the maximum delay determines the size of the buffer and the length of the fixed overall delay through the system. This delay is responsible for the term ‘near-real time’. ATM is an isochronous network.
These three different approaches are needed for economic reasons. Asynchronous systems are very efficient because as soon as one transfer completes, another can begin. This can only be achieved by making every device wait with its data in a buffer so that transfer can start immediately. Asynchronous systems also make it possible for low bit rate devices to share a network with high bit rate devices. The low bit rate device will only need a small buffer and will therefore send short data blocks, whereas the high bit rate device will send long blocks. Asynchronous systems have no difficulty in handling blocks of varying size, whereas in a synchronous system this is very difficult.
Isochronous systems try to give the best of both worlds, generally by sacrificing some flexibility in block size. FireWire is an example of a network which is part isochronous and part asynchronous so that the advantages of both are available.
A network is basically a communication resource which is shared for economic reasons. Like any shared resource, decisions have to be made somewhere and somehow about how the resource is to be used. In the absence of such decisions the resultant chaos will be such that the resource might as well not exist.
In communications networks the resource is the ability to convey data from any node or port to any other. On a particular cable, clearly only one transaction of this kind can take place at any one instant even though in practice many nodes will simultaneously be wanting to transmit data. Arbitration is needed to determine which node is allowed to transmit.
There are a number of different arbitration protocols and these have evolved to support the needs of different types of network. In small networks, such as LANs, a single point failure which halts the entire network may be acceptable, whereas in a public transport network owned by a telecommunications company, the network will be redundant so that if a particular link fails data may be sent via an alternative route. A link which has reached its maximum capacity may also be supplanted by transmission over alternative routes.
In physically small networks, arbitration may be carried out in a single location. This is fast and efficient, but if the arbitrator fails it leaves the system completely crippled. The processor buses in computers work in this way. In centrally arbitrated systems the arbitrator needs to know the structure of the system and the status of all the nodes. Following a configuration change, due perhaps to the installation of new equipment, the arbitrator needs to be told what the new configuration is, or have a mechanism which allows it to explore the network and learn the configuration. Central arbitration is only suitable for small networks which change their configuration infrequently.
In other networks the arbitration is distributed so that some decision-making ability exists in every node. This is less efficient but it does allow at least some of the network to continue operating after a component failure. Distributed arbitration also means that each node is self-sufficient and so no changes need to be made if the network is reconfigured by adding or deleting a node. This is the only possible approach in wide area networks where the structure may be very complex and change dynamically in the event of failures or overload.
Ethernet uses distributed arbitration. FireWire is capable of using both types of arbitration. A small amount of decision-making ability is built into every node so that distributed arbitration is possible. However, if one of the nodes happens to be a computer, it can run a centralized arbitration algorithm.
The physical structure of a network is subject to some variation as Figure 7.21 shows. In radial networks, (a), each port has a unique cable connection to a device called a hub. The hub must have one connection for every port and this limits the number of ports. However, a cable failure will only result in the loss of one port. In a ring system (b) the nodes are connected like a daisy chain with each node acting as a feedthrough. In this case the arbitration requirement must be distributed. With some protocols, a single cable break doesn’t stop the network operating. Depending on the protocol, simultaneous transactions may be possible provided they don’t require the same cable. For example, in a storage network a disk drive may be outputting data to an editor while another drive is backing up data to a tape streamer. For the lowest cost, all nodes are physically connected in parallel to the same cable. Figure 7.21(c) shows that a cable break would divide the network into two halves, but it is possible that the impedance mismatch at the break could stop both halves working.
One of the concepts involved in arbitration is priority, which is fundamental to providing an appropriate quality of service. If two processes both want to use a network, the one with the highest priority would normally go first. Attributing priority must be done carefully because some of the results are non-intuitive. For example, it may be beneficial to give a high priority to a humble device which has a low data rate for the simple reason that if it is given use of the network it won’t need it for long. In a broadcast environment, transactions concerned with on-air processes would have priority over file transfers concerning production and editing.
When a device gains access to the network to perform a transaction, generally no other transaction can take place until it has finished. Consequently it is important to limit the amount of time that a given port can stay on the bus. In this way when the time limit expires, a further arbitration must take place. The result is that the network resource rotates between transactions rather than one transfer hogging the resource and shutting everyone else out.
It follows from the presence of a time (or data quantity) limit that ports must have the means to break large files up into frames or cells and reassemble them on reception. This process is sometimes called adaptation. If the data to be sent originally exist at a fixed bit rate, some buffering will be needed so that the data can be time-compressed into the available frames. Each frame must be contiguously numbered and the system must transmit a file size or word count so that the receiving node knows when it has received every frame in the file.
The error-detection system interacts with this process because if any frame is in error on reception, the receiving node can ask for a retransmission of the frame. This is more efficient than retransmitting the whole file. Figure 7.22 shows the flow chart for a receiving node. Breaking files into frames helps to keep down the delay experienced by each process using the network. Figure 7.23 shows that each frame may be stored ready for transmission in a silo memory. It is possible to make the priority a function of the number of frames in the silo, as this is a direct measure of how long a process has been kept waiting. Isochronous systems must do this in order to meet maximum delay specifications. In Figure 7.23 once frame transmission has completed, the arbitrator will determine which process sends a frame next by examining the depth of all the frame buffers. MPEG transport stream multiplexers and networks delivering MPEG data must work in this way because the transfer is isochronous and the amount of buffering in a decoder is limited for economic reasons.
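A sketch of depth-based priority: the arbitrator simply services the fullest silo, because queue depth is a direct measure of how long a process has been kept waiting.

```python
def next_sender(silos):
    """Pick the port that transmits the next frame. 'silos' maps each
    port to its list of queued frames; the names are illustrative."""
    waiting = {port: frames for port, frames in silos.items() if frames}
    if not waiting:
        return None
    return max(waiting, key=lambda port: len(waiting[port]))
```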
A central arbitrator is relatively simple to implement because when all decisions are taken centrally there can be no timing difficulty (assuming a well-engineered system). In a distributed system, there is an extra difficulty due to the finite time taken for signals to travel down the data paths between nodes.
Figure 7.24 shows the structure of Ethernet which uses a protocol called CSMA/CD (carrier sense multiple access with collision detect) developed by DEC and Xerox. This is a distributed arbitration network where each node follows some simple rules. The first of these is not to transmit if an existing bus signal is detected. The second is not to transmit more than a certain quantity of data before releasing the bus. Devices wanting to use the bus will see bus signals and so will wait until the present bus transaction finishes. This must happen at some point because of the frame size limit. When the frame is completed, signalling on the bus should cease. The first device to sense the bus becoming free and to assert its own signal will prevent any other nodes transmitting according to the first rule. Where numerous devices are present it is possible to give them a priority structure by providing a delay between sensing the bus coming free and beginning a transaction. High-priority devices will have a short delay so they get in first. Lower-priority devices will only be able to start a transaction if the high-priority devices don’t need to transfer.
It might be thought that these rules would be enough and everything would be fine. Unfortunately the finite signal speed means that there is a flaw in the system. Figure 7.24 shows why. Device A is transmitting and devices B and C both want to transmit and have equal priority. At the end of A’s transaction, devices B and C see the bus become free at the same instant and start a transaction. With two devices driving the bus, the resultant waveform is meaningless. This is known as a collision and all nodes must have means to recover from it. First, each node will read the bus signal at all times. When a node drives the bus, it will also read back the bus signal and compare it with what was sent. Clearly if the two are the same all is well, but if there is a difference, this must be because a collision has occurred and two devices are trying to drive the bus at once.
If a collision is detected, both colliding devices will sense the disparity between the transmitted and readback signals, and both will release the bus to terminate the collision. However, there is no point in adhering to the simple protocol to reconnect because this will simply result in another collision. Instead each device has a built-in delay which must expire before another attempt is made to transmit. This delay is not fixed, but is controlled by a random number generator and so changes from transaction to transaction.
The probability of two nodes arriving at the same delay is infinitesimally small. Consequently if a collision does occur, both devices will drop the bus, and they will start their back-off timers. When the first timer expires, that device will transmit and the other will see the transmission and remain silent. In this way the collision is not only handled, but prevented from happening again.
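In practice Ethernet implements this delay as truncated binary exponential backoff; a sketch follows, with the classic 51.2 µs slot time of 10 Mbits/s Ethernet assumed.

```python
import random

def backoff_delay(collisions, slot_time=51.2e-6):
    """After the nth successive collision, wait a random number of slot
    times drawn from [0, 2**min(n, 10) - 1]. Real controllers abandon
    the frame after 16 attempts."""
    slots = random.randint(0, 2 ** min(collisions, 10) - 1)
    return slots * slot_time
```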
The performance of Ethernet is usually specified in terms of the bit rate at which the cabling runs. However, this rate is academic because it is not available all the time. In a real network bit rate is lost by the need to send headers and error-correction codes and by the loss of time due to interframe spaces and collision handling. As the demand goes up, the number of collisions increases and throughput goes down. Collision-based arbitrators do not handle congestion well.
FireWire20 is actually an Apple Computer Inc. trade name for the interface which is formally known as IEEE 1394–1995. It was originally intended as a digital audio network, but has grown out of all recognition. FireWire is more than just an interface as it can be used to form networks and if used with a computer effectively extends the computer’s data bus. Devices are simply connected together as any combination of daisy-chain or star network.
Any pair of devices can communicate in either direction, and arbitration ensures that only one device transmits at once. Intermediate devices simply pass on transmissions. This can continue even if the intermediate device is powered down as the FireWire carries power to keep repeater functions active.
Communications are divided into cycles which have a period of 125 µs. During a cycle, there are 64 time slots. During each time slot, any one node can communicate with any other, but in the next slot, a different pair of nodes may communicate. Thus FireWire is best described as a time-division multiplexed (TDM) system. There will be a new arbitration between the nodes for each cycle.
FireWire is eminently suitable for audio/computer convergent applications because it can simultaneously support asynchronous transfers of non-real-time computer data and isochronous transfers of real-time audio/video data. It can do this because the arbitration process allocates a fixed proportion of slots for isochronous data (about 80 per cent) and these have a higher priority in the arbitration than the asynchronous data. The higher the data rate a given node needs, the more time slots it will be allocated. Thus a given bit rate can be guaranteed throughout a transaction; a prerequisite of real-time A/V data transfer.
It is the sophistication of the arbitration system which makes FireWire remarkable. Some of the arbitration is in hardware at each node, but some is in software which only needs to be at one node. The full functionality requires a computer somewhere in the system which runs the isochronous bus management arbitration. Without this only asynchronous transfers are possible. It is possible to add or remove devices whilst the system is working. When a device is added the system will recognize it through a periodic learning process. Essentially every node on the system transmits in turn so that the structure becomes clear.
The electrical interface of FireWire is shown in Figure 7.25. It consists of two twisted pairs for signalling and a pair of power conductors. The twisted pairs carry differential signals of about 220 mV swinging around a common mode voltage of about 1.9 V with an impedance of 112 Ω. Figure 7.26 shows how the data are transmitted. The host data are simply serialized and used to modulate twisted pair A. The other twisted pair (B) carries a signal called strobe, which is the exclusive-OR of the data and the clock. Thus whenever a run of identical bits results in no transitions in the data, the strobe signal will carry transitions. At the receiver another exclusive-OR gate adds data and strobe to recreate the clock.
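The scheme is easily modelled: exactly one of the two pairs changes state in every bit period, so an exclusive-OR of the received pairs toggles once per bit and recreates the clock. A minimal sketch:

```python
def data_strobe_encode(bits):
    """Pair A carries the data; pair B carries strobe, which toggles
    whenever the data does not, so data XOR strobe is the clock."""
    data, strobe = [], []
    s, prev = 0, None
    for b in bits:
        if prev is not None and b == prev:
            s ^= 1                  # no data transition, so strobe toggles
        data.append(b)
        strobe.append(s)
        prev = b
    return data, strobe

def recover_clock(data, strobe):
    """At the receiver, data XOR strobe toggles every bit period."""
    return [d ^ s for d, s in zip(data, strobe)]
```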
This signalling technique is subject to skew between the two twisted pairs and this limits cable lengths to about 10 metres between nodes. Thus FireWire is not a long-distance interface technique, instead it is very useful for interconnecting a large number of devices in close proximity. Using a copper interconnect, FireWire can run at 100, 200 or 400 Mbits/s, depending on the specific hardware. It is proposed to create an optical fibre version which would run at gigabit speeds.
Broadband ISDN (B-ISDN) is the successor to N-ISDN and in addition to offering more bandwidth, offers practical solutions to the delivery of any conceivable type of data. The flexibility with which ATM (asynchronous transfer mode), the cell-based transfer mechanism of B-ISDN, operates means that intermittent or one-off data transactions which only require asynchronous delivery can take place alongside isochronous MPEG video delivery. This is known as application independence whereby the sophistication of isochronous delivery does not raise the cost of asynchronous data. In this way, generic data, video, speech and combinations of the above can co-exist.
ATM is multiplexed, but it is not time-division multiplexed. TDM is inefficient because if a transaction does not fill its allotted bandwidth, the capacity is wasted. ATM does not offer fixed blocks of bandwidth, but allows infinitely variable bandwidth to each transaction. This is done by converting all host data into small fixed-size cells at the adaptation layer. The greater the bandwidth needed by a transaction, the more cells per second are allocated to that transaction. This approach is superior to the fixed bandwidth approach, because if the bit rate of a particular transaction falls, the cells released can be used for other transactions so that the full bandwidth is always available.
As all cells are identical in size, a multiplexer can assemble cells from many transactions in an arbitrary order. The exact order is determined by the quality of service required, where the time positioning of isochronous data would be determined first, with asynchronous data filling the gaps.
Figure 7.27 shows how a broadband system might be implemented. The transport network would typically be optical fibre based, using SONET (synchronous optical network) or SDH (synchronous digital hierarchy). These standards differ in minor respects. Figure 7.28 shows the bit rates available in each. Lower bit rates will be used in the access networks which will use different technology such as xDSL.
SONET and SDH assemble ATM cells into a structure known as a container in the interests of efficiency. Containers are passed intact between exchanges in the transport network. The cells in a container need not belong to the same transaction; they simply need to be going the same way for at least one transport network leg.
The cell-routing mechanism of ATM is unusual and deserves explanation. In conventional networks, a packet must carry the complete destination address so that at every exchange it can be routed closer to its destination. The exact route by which the packet travels cannot be anticipated and successive packets in the same transaction may take different routes. This is known as a connectionless protocol.
In contrast, ATM is a connection-oriented protocol. Before data can be transferred, the network must set up an end-to-end route. Once this is done, the ATM cells do not need to carry a complete destination address. Instead they only need to carry enough addressing so that an exchange or switch can distinguish between all the expected transactions.
The end-to-end route is known as a virtual channel which consists of a series of virtual links between switches. The term ‘virtual channel’ is used because the system acts like a dedicated channel even though physically it is not. When the transaction is completed the route can be dismantled so that the bandwidth is freed for other users. In some cases, such as delivery of a broadcaster’s output to a transmitter, the route can be set up continuously to form what is known as a permanent virtual channel.
The addressing in the cells ensures that all cells with the same address take the same path, but owing to the multiplexed nature of ATM, at other times and with other cells a completely different routing scheme may exist. Thus the routing structure for a particular transaction always passes cells by the same route, but the next cell may belong to another transaction and will have a different address causing it to be routed in another way.
The addressing structure is hierarchical. Figure 7.29(a) shows the ATM cell and its header. The cell address is divided into two fields, the virtual channel identifier and the virtual path identifier. Virtual paths are logical groups of virtual channels which happen to be going the same way. An example would be the output of a video-on-demand server travelling to the first switch. The virtual path concept is useful because all cells in the same virtual path can share the same container in a transport network. A virtual path switch shown in Figure 7.29(b) can operate at the container level whereas a virtual channel switch (c) would need to dismantle and reassemble containers.
When a route is set up, a table is created at each switch. When a cell is received at a switch, the VPI and/or VCI code is looked up in the table and used for two purposes. First, the configuration of the switch is obtained, so that this switch will correctly route the cell; second, the VPI and/or VCI codes may be updated so that they correctly control the next switch. This process repeats until the cell arrives at its destination.
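A sketch of the per-switch label-swapping step (the table contents are hypothetical and would be written during connection set-up):

```python
# Key: (input port, VPI, VCI) of an arriving cell.
# Value: (output port, new VPI, new VCI) for the onward leg.
routing_table = {
    (1, 5, 32): (7, 9, 44),        # one hypothetical transaction
}

def switch_cell(in_port, vpi, vci, payload):
    """Look up the arriving cell's labels, rewrite them for the next
    switch, and return the port on which to forward the cell."""
    out_port, new_vpi, new_vci = routing_table[(in_port, vpi, vci)]
    return out_port, new_vpi, new_vci, payload
```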
In order to set up a path, the initiating device will initially send cells containing an ATM destination address, the bandwidth and quality of service required. The first switch will reply with a message containing the VPI/VCI codes which are to be used for this channel. The message from the initiator will propagate to the destination, creating look-up tables in each switch. At each switch the logic will add the requested bandwidth to the existing bandwidth in use to check that the requested quality of service can be met. If this succeeds for the whole channel, the destination will reply with a connect message which propagates back to the initiating device as confirmation that the channel has been set up. The connect message contains a unique call reference value which identifies this transaction. This is necessary because an initiator such as a file server may be initiating many channels and the connect messages will not necessarily return in the same order as the set-up messages were sent.
The last switch will confirm receipt of the connect message to the destination and the initiating device will confirm receipt of the connect message to the first switch.
ATM works by dividing all real data messages into cells, each carrying a 48-byte payload. At the receiving end, the original message must be recreated. This can take many forms. Figure 7.30 shows some possibilities. The message may be a generic data file having no implied timing structure. The message may be a serial bitstream with a fixed clock frequency, known as UDT (unstructured data transfer). It may be a burst of data bytes from a TDM system.
The adaptation layer in ATM has two sub-layers, shown in Figure 7.31. The first is the segmentation and reassembly (SAR) sublayer which must divide the message into cells and rebuild it to get the binary data right. The second is the convergence sublayer (CS) which recovers the timing structure of the original message. It is this feature which makes ATM so appropriate for delivery of audio/visual material. Conventional networks such as the Internet do not have this ability.
In order to deliver a particular quality of service, the adaptation layer and the ATM layer work together. Effectively the adaptation layer will place constraints on the ATM layer, such as cell delay, and the ATM layer will meet those constraints without needing to know why. Provided the constraints are met, the adaptation layer can rebuild the message. The variety of message types and timing constraints leads to the adaptation layer having a variety of forms.
The adaptation layers which are most relevant to MPEG applications are AAL-1 and AAL-5. AAL-1 is suitable for transmitting MPEG-2 multi-program transport streams at constant bit rate and is standardized for this purpose in ETS 300814 for DVB application. AAL-1 has an integral forward error correction (FEC) scheme. AAL-5 is optimized for single-program transport streams (SPTS) at a variable bit rate and has no FEC.
AAL-1 takes as an input the 188-byte transport stream packets which are created by a standard MPEG-2 multiplexer. The transport stream bit rate must be constant but it does not matter if statistical multiplexing has been used within the transport stream.
The Reed–Solomon FEC of AAL-1 uses a codeword of size 128 so that the codewords consist of 124 bytes of data and 4 bytes of redundancy, making 128 bytes in all. Thirty-one 188-byte TS packets are restructured into this format. The 128-byte codewords are then subject to a block interleave. Figure 7.32 shows that 47 such codewords are assembled in rows in RAM and then columns are read out. These columns are 47 bytes long and, with the addition of an AAL header byte, make up a 48-byte ATM packet payload. In this way the interleave block is transmitted in 128 ATM cells.
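The restructuring arithmetic can be verified with a short sketch. In the Python below only the interleave geometry is modelled: the Reed–Solomon encoding is replaced by placeholder redundancy bytes, and a stand-in value is used for the AAL header byte.

```python
ROWS, COLS = 47, 128                    # 47 codewords of 128 bytes

data = bytes(31 * 188)                  # 31 TS packets = 5828 bytes
assert len(data) == ROWS * 124          # 47 codewords x 124 data bytes

# Placeholder 'FEC': append 4 redundancy bytes to each 124-byte row.
codewords = [data[i*124:(i+1)*124] + bytes(4) for i in range(ROWS)]

# Read out columns: cell n carries byte n of every codeword, plus one
# AAL header byte (a stand-in value here), giving a 48-byte payload.
cells = []
for col in range(COLS):
    payload = bytes(codewords[row][col] for row in range(ROWS))
    cells.append(bytes([col % 8]) + payload)

assert len(cells) == 128 and all(len(c) == 48 for c in cells)
```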
The result of the FEC and interleave is that the loss of up to four cells in 128 can be corrected, or a random error of up to two bytes can be corrected in each codeword. This FEC system allows most errors in the ATM layer to be corrected so that no retransmissions are needed. This is important for isochronous operation. The AAL header has a number of functions. One of these is to identify the first ATM cell in the interleave block of 128 cells. Another function is to run a modulo-8 cell counter to detect missing or out-of-sequence ATM cells. If a cell simply fails to arrive, the sequence jump can be detected and used to flag the FEC system so that it can correct the missing cell by erasure (see section 6.21). In a manner similar to the use of program clock reference (PCR) in MPEG, AAL-1 embeds a timing code in ATM cell headers. This is called the synchronous residual time stamp (SRTS) and, in conjunction with the ATM network clock, allows the receiving AAL device to reconstruct the original data bit rate. This is important because in MPEG applications it prevents the PCR jitter specification being exceeded.
In AAL-5 there is no error correction and the adaptation layer simply reformats MPEG TS blocks into ATM cells. Figure 7.33 shows one way in which this can be done. Two TS blocks of 188 bytes are associated with an 8-byte trailer known as CPCS (common part convergence sublayer). The presence of the trailer makes a total of 384 bytes which can be carried in eight ATM cells. AAL-5 does not offer constant delay and external buffering will be required, controlled by reading the MPEG PCRs in order to reconstruct the original time axis.
The NICAM 728 system was developed by the BBC to allow two additional high-quality digital sound channels to be carried on terrestrial television broadcasts. Performance was such that the system was adopted as the UK standard, and was recommended by the EBU to be adopted by its members, many of whom put it into service.21
The introduction of stereo sound with television cannot be at the expense of incompatibility with the existing monophonic analog sound channel. In NICAM 728 an additional low-power subcarrier is positioned just above the analog sound carrier, which is retained. The relationship is shown in Figure 7.34. The power of the digital subcarrier is about one hundredth that of the main vision carrier, and so existing monophonic receivers will reject it.
Since the digital carrier is effectively shoe-horned into the gap between TV channels, it is necessary to ensure that the spectral width of the intruder is minimized to prevent interference. As a further measure, the power of the existing audio carrier is halved when the digital carrier is present.
Figure 7.35 shows the stages through which the audio must pass. The audio sampling rate used is 32 kHz, which offers similar bandwidth to that of an FM stereo radio broadcast. Samples are originally quantized to fourteen-bit resolution in two's complement code. From an analog source this causes no problem, but a professional digital source having longer wordlength and higher sampling rate would need to pass through a rate convertor, a digital equalizer to provide pre-emphasis, an optional digital compressor in the case of wide dynamic range signals and then a truncation circuit incorporating digital dither.
The fourteen-bit samples are block companded to reduce the data rate. During each one-millisecond block, 32 samples are input from each audio channel. The magnitude of the largest sample in each channel is independently assessed, and used to determine the gain range or scale factor to be used. Every sample in each channel in a given block is then scaled by the same amount and truncated to ten bits. An eleventh bit on each sample carries parity for error detection; the scale factor of each channel is conveyed by selectively modifying these parity bits. The encoding process is described as Near Instantaneously Companded Audio Multiplex, NICAM for short. The resultant data now consist of 2 × 32 × 11 = 704 bits per block. Bit interleaving is employed to reduce the effect of burst errors.
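A simplified sketch of the companding step may help. The Python below assumes the scale factor is a plain binary shift and omits the parity and signalling details entirely; the real NICAM coding ranges are specified more elaborately.

```python
def compand_block(samples14):
    """32 two's complement samples in -8192..8191 -> shift + 10-bit samples."""
    peak = max(abs(s) for s in samples14)
    shift = 0
    while peak >> shift > 511 and shift < 4:   # fit the peak into -512..511
        shift += 1
    return shift, [s >> shift for s in samples14]

def expand_block(shift, samples10):
    """Receiver side: restore the original magnitude range."""
    return [s << shift for s in samples10]

shift, tens = compand_block([500 * i for i in range(-16, 16)])
print(shift, tens[0], expand_block(shift, tens)[0])   # 4 -500 -8000
```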
At the beginning of each block a synchronizing byte, known as the Frame Alignment Word, is followed by five control bits and eleven additional data bits, making a total of 728 bits per frame, hence the number in the system name. As there are 1000 frames per second, the bit rate is 728 kbits/s. In the UK this is multiplied by nine to obtain the digital carrier frequency of 6.552 MHz, but some other countries use a different subcarrier spacing.
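For clarity, the frame arithmetic is spelled out below; the figures are exactly those given in the text.

```python
FAW, CONTROL, AD, AUDIO = 8, 5, 11, 2 * 32 * 11   # bits per frame
bits_per_frame = FAW + CONTROL + AD + AUDIO       # 8 + 5 + 11 + 704 = 728
bit_rate = bits_per_frame * 1000                  # 1000 frames/s -> 728 kbit/s
print(bit_rate, bit_rate * 9)                     # 728000, 6552000 (UK carrier, Hz)
```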
The digital carrier is phase modulated. It has four states which are 90° apart. Information is carried in the magnitude of a phase change which takes place every 18 cycles, or 2.74 μs. As there are four possible phase changes, two bits are conveyed in every change. The absolute phase has no meaning, only the changes are interpreted by the receiver. This type of modulation is known as differentially encoded quadrature phase shift keying (DQPSK), sometimes called four-phase DPSK. In order to provide consistent timing and to spread the carrier energy throughout the band irrespective of audio content, randomizing is used, except during the frame alignment word. On reception, the FAW is detected and used to synchronize the pseudo-random generator to restore the original data.
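The differential principle is easily demonstrated. In the sketch below the mapping of bit pairs to phase changes is an illustrative assumption rather than the exact NICAM assignment; the point is that the decoder interprets only the change from one symbol to the next, so no absolute phase reference is needed.

```python
# Bit pairs select one of four phase changes (degrees); the mapping
# here is illustrative, not the standardized NICAM assignment.
PHASE_CHANGE = {(0, 0): 0, (0, 1): 90, (1, 1): 180, (1, 0): 270}

def dqpsk_encode(bits, phase=0):
    out = []
    for i in range(0, len(bits), 2):
        phase = (phase + PHASE_CHANGE[(bits[i], bits[i + 1])]) % 360
        out.append(phase)
    return out

def dqpsk_decode(phases, prev=0):
    inverse = {v: k for k, v in PHASE_CHANGE.items()}
    bits = []
    for p in phases:
        bits.extend(inverse[(p - prev) % 360])   # only the *change* matters
        prev = p
    return bits

data = [1, 0, 0, 1, 1, 1, 0, 0]
assert dqpsk_decode(dqpsk_encode(data)) == data
```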
Figure 7.36 shows the general structure of a frame. Following the sync pattern or FAW is the application control field. The application control bits determine the significance of following data, which can be stereo audio, two independent mono signals, mono audio and data or data only. Control bits C1, C2 and C3 have eight combinations, of which only four are currently standardized. Receivers are designed to mute audio if C3 becomes 1.
The frame flag bit C0 spends eight frames high then eight frames low in an endless sixteen-frame sequence which is used to synchronize changes in channel usage. In the last sixteen-frame sequence of the old application, the application control bits change to herald the new application, whereas the actual data change to the new application on the next sixteen-frame sequence.
The reserve sound switching flag, C4, is set to 1 if the analog sound being broadcast is derived from the digital stereo. This fact can be stored by the receiver and used to initiate automatic switching to analog sound in the case of loss of the digital channels.
The additional data bits AD0 to AD10 are as yet undefined, and reserved for future applications.
The remaining 704 bits in each frame may be either audio samples or data. The two channels of stereo audio are multiplexed into each frame, but multiplexing does not occur in any other case. If two mono audio channels are sent, they occupy alternate frames. Figure 7.36(a) shows a stereo frame, where the A channel is carried in odd-numbered samples, whereas Figure 7.36(b) shows a mono frame, where the M1 channel is carried in odd-numbered frames.
Digital television broadcasting relies on the combination of a number of fundamental technologies. These are: compression to reduce the bit rate, multiplexing to combine picture and sound data into a common bitstream, digital modulation schemes to reduce the RF bandwidth needed by a given bit rate and error correction to reduce the error statistics of the channel down to a value acceptable to MPEG data.
Compressed video and audio are both highly sensitive to bit errors, primarily because they confuse the recognition of variable-length codes so that the decoder loses synchronization. However, MPEG is a compression and multiplexing standard and does not specify how error correction should be performed. Consequently a transmission standard must define a system which has to correct essentially all errors such that the delivery mechanism is transparent.
Essentially a transmission standard specifies all the additional steps needed to deliver an MPEG transport stream from one place to another. This transport stream will consist of a number of elementary streams of video and audio, where the audio may be coded according to the MPEG audio standard or AC-3. In a system working within its capabilities, the picture and sound quality will be determined only by the performance of the compression system and not by the RF transmission channel. This is the fundamental difference between analog and digital broadcasting.
Whilst in one sense an MPEG transport stream is only data, it differs from generic data in that it must be presented to the viewer at a particular rate. Generic data are usually asynchronous, whereas baseband video and audio are synchronous. However, after compression and multiplexing, audio and video are no longer precisely synchronous and so the term isochronous is used. This means a signal which was at one time synchronous and will be displayed synchronously, but which uses buffering at transmitter and receiver to accommodate moderate timing errors in the transmission.
Clearly another mechanism is needed so that the time axis of the original signal can be recreated on reception. The time stamp and program clock reference system of MPEG does this.
Figure 7.37 shows that the concepts involved in digital television broadcasting exist at various levels which have an independence not found in analog technology. In a given configuration a transmitter can radiate a given payload data bit rate. This represents the useful bit rate and does not include the necessary overheads needed by error correction, multiplexing or synchronizing. It is fundamental that the transmission system does not care what this payload bit rate is used for. The entire capacity may be used up by one high-definition channel, or a large number of heavily compressed channels may be carried. The details of this data usage are the domain of the transport stream. The multiplexing of transport streams is defined by the MPEG standards, but these do not define any error correction or transmission technique.
At the lowest level in Figure 7.38 the source coding scheme, in this case MPEG compression, results in one or more elementary streams, each of which carries a video or audio channel. Elementary streams are multiplexed into a transport stream. The viewer then selects the desired elementary stream from the transport stream. Metadata in the transport stream ensures that when a video elementary stream is chosen, the appropriate audio elementary stream will automatically be selected.
The video elementary stream is an endless bitstream representing pictures which take a variable length of time to transmit. Bidirectional coding means that pictures are not necessarily in the correct order. Storage and transmission systems prefer discrete blocks of data and so elementary streams are packetized to form a PES (packetized elementary stream). Audio elementary streams are also packetized. A packet is shown in Figure 7.39. It begins with a header containing an unique packet start code and a code which identifies the type of data stream. Optionally the packet header also may contain one or more time stamps which are used for synchronizing the video decoder to real time and for obtaining lip-sync.
Figure 7.40 shows that a time stamp is a sample of the state of a counter which is driven by a 90 kHz clock. This is obtained by dividing down the master 27 MHz clock of MPEG-2. This 27 MHz clock must be locked to the video frame rate and the audio sampling rate of the program concerned. There are two types of time stamp: PTS and DTS. These are abbreviations for presentation time stamp and decode time stamp. A presentation time stamp determines when the associated picture should be displayed on the screen, whereas a decode time stamp determines when it should be decoded. In bidirectional coding these times can be quite different.
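The relationship between the two clocks is simple enough to state in a few lines; the time-stamp values below are illustrative.

```python
MASTER_HZ = 27_000_000
STAMP_HZ = MASTER_HZ // 300        # time stamps count at 90 kHz

def stamp_to_seconds(stamp):
    return stamp / STAMP_HZ

# Two pictures 40 ms apart (25 Hz video) differ by 3600 counts.
pts_a, pts_b = 90_000, 93_600
print(stamp_to_seconds(pts_b - pts_a))   # 0.04
```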
Audio packets only have presentation time stamps. Clearly if lip-sync is to be obtained, the audio sampling rate of a given program must have been locked to the same master 27 MHz clock as the video and the time stamps must have come from the same counter driven by that clock.
In practice the time between input pictures is constant and so there is a certain amount of redundancy in the time stamps. Consequently PTS/DTS need not appear in every PES packet. Time stamps can be up to 100 ms apart in transport streams. As each picture type (I, P or B) is flagged in the bitstream, the decoder can infer the PTS/DTS for every picture from the ones actually transmitted.
The MPEG-2 transport stream is intended to be a multiplex of many TV programs with their associated sound and data channels, although a single program transport stream (SPTS) is possible. The transport stream is based upon packets of constant size so that multiplexing, adding error-correction codes and interleaving in a higher layer is eased. Figure 7.41 shows that these are always 188 bytes long.
Transport stream packets always begin with a header. The remainder of the packet carries data known as the payload. For efficiency, the normal header is relatively small, but for special purposes the header may be extended. In this case the payload gets smaller so that the overall size of the packet is unchanged. Transport stream packets should not be confused with PES packets which are larger and which vary in size. PES packets are broken up to form the payload of the transport stream packets.
The header begins with a sync byte which is an unique pattern detected by a demultiplexer. A transport stream may contain many different elementary streams and these are identified by giving each an unique thirteen-bit packet identification code or PID which is included in the header. A multiplexer seeking a particular elementary stream simply checks the PID of every packet and accepts only those which match.
In a multiplex there may be many packets from other programs in between packets of a given PID. To help the demultiplexer, the packet header contains a continuity count. This is a four-bit value which increments at each new packet having a given PID.
This approach allows statistical multiplexing as it does not matter how many or how few packets have a given PID; the demux will still find them. Statistical multiplexing has the problem that it is virtually impossible to make the sum of the input bit rates constant. Instead the multiplexer aims to make the average data bit rate slightly less than the maximum and the overall bit rate is kept constant by adding ‘stuffing’ or null packets. These packets have no meaning, but simply keep the bit rate constant. Null packets always have a PID of 8191 (all ones) and the demultiplexer discards them.
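The demultiplexer behaviour described above can be sketched as follows. The packet representation is reduced to a dictionary for clarity; a real demultiplexer would parse the four-byte header itself.

```python
NULL_PID = 8191                            # stuffing packets

def demux(packets, wanted_pid):
    """Yield payloads for one PID, flagging continuity-count jumps."""
    expected = None
    for pkt in packets:                    # pkt: {"pid", "cc", "payload"}
        if pkt["pid"] != wanted_pid:       # other programs and null packets
            continue
        if expected is not None and pkt["cc"] != expected:
            print("continuity error: packet(s) lost")
        expected = (pkt["cc"] + 1) % 16    # the four-bit counter wraps
        yield pkt["payload"]

stream = [
    {"pid": 0x101, "cc": 0, "payload": b"a"},
    {"pid": NULL_PID, "cc": 0, "payload": b""},
    {"pid": 0x101, "cc": 2, "payload": b"b"},   # cc 1 missing
]
print(list(demux(stream, 0x101)))          # reports the error, then [b'a', b'b']
```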
A transport stream is a multiplex of several TV programs and these may have originated from widely different locations. It is impractical to expect all the programs in a transport stream to be genlocked and so the stream is designed from the outset to allow unlocked programs. A decoder running from a transport stream has to genlock to the encoder and the transport stream has to have a mechanism to allow this to be done independently for each program. The synchronizing mechanism is called program clock reference (PCR).
Figure 7.42 shows how the PCR system works. The goal is to recreate at the decoder a 27 MHz clock which is synchronous with that at the encoder. The encoder clock drives a 48-bit counter which continuously counts up to the maximum value before overflowing and beginning again.
A transport stream multiplexer will periodically sample the counter and place the state of the count in an extended packet header as a PCR. The demultiplexer selects only the PIDs of the required program, and it will extract the PCRs from the packets in which they were inserted.
The PCR codes are used to control a numerically locked loop (NLL) described in section 3.17. The NLL contains a 27 MHz VCXO (voltage-controlled crystal oscillator). This is a variable-frequency oscillator based on a crystal which has a relatively small frequency range.
The VCXO drives a 48-bit counter in the same way as in the encoder. The state of the counter is compared with the contents of the PCR and the difference is used to modify the VCXO frequency. When the loop reaches lock, the decoder counter would arrive at the same value as is contained in the PCR and no change in the VCXO would then occur. In practice the transport stream packets will suffer from transmission jitter and this will create phase noise in the loop. This is removed by the loop filter so that the VCXO effectively averages a large number of phase errors.
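A behavioural sketch of such a loop is shown below. The loop-filter and trimming gains are invented for illustration, and a real NLL operates on hardware counters rather than floating-point arithmetic.

```python
class PcrLoop:
    """Toy numerically locked loop; gains are illustrative only."""
    def __init__(self):
        self.freq = 27_000_000.0    # VCXO frequency estimate, Hz
        self.count = 0              # local 48-bit counter
        self.error = 0.0            # filtered phase error, in clock ticks

    def on_pcr(self, pcr, dt):
        """Called for each received PCR; dt is the time since the last one."""
        self.count = (self.count + int(self.freq * dt)) % 2**48
        raw = pcr - self.count                     # instantaneous phase error
        self.error += 0.1 * (raw - self.error)     # loop filter averages jitter
        self.freq += 0.01 * self.error             # trim the VCXO slightly
```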
A heavily damped loop will reject jitter well, but will take a long time to lock. Lock-up time can be reduced when switching to a new program if the decoder counter is jammed to the value of the first PCR received in the new program. The loop filter may also have its time constants shortened during lock-up.
Once a synchronous 27 MHz clock is available at the decoder, this can be divided down to provide the 90 kHz clock which drives the time stamp mechanism.
The entire timebase stability of the decoder is no better than the stability of the clock derived from PCR. MPEG-2 sets standards for the maximum amount of jitter which can be present in PCRs in a real transport stream.
Clearly if the 27 MHz clock in the receiver is locked to one encoder it can only receive elementary streams encoded with that clock. If an attempt is made to decode, for example, an audio stream generated from a different clock, the result will be periodic buffer overflows or underflows in the decoder. Thus MPEG defines a program in a manner which relates to timing. A program is a set of elementary streams which have been encoded with the same master clock.
In a real transport stream, each elementary stream has a different PID, but the demultiplexer has to be told what these PIDs are and what audio belongs with what video before it can operate. This is the function of program specific information (PSI), which is a form of metadata. Figure 7.43 shows the structure of PSI. When a decoder powers up, it knows nothing about the incoming transport stream except that it must search for all packets with a PID of zero. PID zero is reserved for the Program Association Table (PAT). The PAT is transmitted at regular intervals and contains a list of all the programs in this transport stream. Each program is further described by its own Program Map Table (PMT) and the PIDs of the PMTs are contained in the PAT.
Figure 7.43 also shows that the PMTs fully describe each program. The PID of the video elementary stream is defined, along with the PID(s) of the associated audio and data streams. Consequently when the viewer selects a particular program, the demultiplexer looks up the program number in the PAT, finds the right PMT and reads the audio, video and data PIDs. It then selects elementary streams having these PIDs from the transport stream and routes them to the decoders.
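The two-stage look-up can be sketched as follows; the tables are hand-built stand-ins for parsed PAT and PMT sections.

```python
pat = {1: 0x0100, 2: 0x0200}                 # program number -> PMT PID
pmts = {                                     # PMT PID -> elementary PIDs
    0x0100: {"video": 0x0101, "audio": [0x0102]},
    0x0200: {"video": 0x0201, "audio": [0x0202, 0x0203]},
}

def select_program(number):
    pmt = pmts[pat[number]]                  # PAT first, then the PMT
    return [pmt["video"], *pmt["audio"]]     # PIDs to route to the decoders

print([hex(pid) for pid in select_program(2)])   # ['0x201', '0x202', '0x203']
```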
Program 0 of the PAT contains the PID of the Network Information Table (NIT). This contains information about what other transport streams are available. For example, in the case of a satellite broadcast, the NIT would detail the orbital position, the polarization, carrier frequency and modulation scheme. Using the NIT a set-top box could automatically switch between transport streams.
Apart from 0 and 8191, a PID of 1 is also reserved for the Conditional Access Table (CAT). This is part of the access control mechanism needed to support pay per view or subscription viewing.
Until the advent of NICAM and MAC, all sound broadcasting had been analog. The AM system is now very old indeed, and is not high fidelity by any standards, having a restricted bandwidth and suffering from noise, particularly at night. In theory, the FM system allows high quality and stereo, but in practice things are not so good. Most FM broadcast networks were planned when a radio set was a sizeable unit which was fixed. Signal strengths were based on the assumption that a fixed FM antenna in an elevated position would be used. If such an antenna is used, reception quality is generally excellent. The forward gain of a directional antenna raises the signal above the front-end noise of the receiver and noise-free stereo is obtained. Such an antenna also rejects unwanted signal reflections.
Reception on car radios is at a disadvantage as directional antennae cannot be used. This makes reception prone to multipath problems. Figure 7.44 shows that when the direct and reflected signals are received with equal strength, nulling occurs at any frequency where the path difference results in a 180° phase shift. Effectively a comb filter is placed in series with the signal. In a moving vehicle, the path lengths change, and the comb response slides up and down the band. When a null passes through the station tuned in, a burst of noise is created. Reflections from aircraft can cause the same problem in fixed receivers.
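The null positions follow from elementary geometry: cancellation occurs wherever the path difference corresponds to an odd number of half-wavelengths. The short calculation below assumes a 50 m path difference by way of example.

```python
C = 3e8                                # propagation speed, m/s

def null_frequencies(path_diff_m, fmin, fmax):
    """Frequencies where the reflection arrives 180 degrees out of phase."""
    nulls, k = [], 0
    while True:
        f = (2 * k + 1) * C / (2 * path_diff_m)
        if f > fmax:
            return nulls
        if f >= fmin:
            nulls.append(f)
        k += 1

# A 50 m path difference puts nulls every 6 MHz; three fall in the FM band.
print([f / 1e6 for f in null_frequencies(50, 88e6, 108e6)])  # [93.0, 99.0, 105.0]
```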
Digital audio broadcasting (DAB), also known as digital radio, is designed to overcome the problems which beset FM radio, particularly in vehicles. Not only does it do that, it does so using less bandwidth. With increasing pressure for spectrum allocation from other services, a system using less bandwidth to give better quality is likely to be favourably received.
DAB relies on a number of fundamental technologies which are combined into an elegant system. Compression is employed to cut the required bandwidth. Transmission of digital data is inherently robust as the receiver has only to decide between a small number of possible states.
Sophisticated modulation techniques help to eliminate multipath reception problems whilst further economizing on bandwidth. Error correction and concealment allow residual data corruption to be handled before conversion to analog at the receiver.
The system can only be realized with extremely complex logic in both transmitter and receiver, but with modern VLSI technology this can be inexpensive and reliable. In DAB, the concept of one-carrier-one-program is not used. Several programs share the same band of frequencies. Receivers will be easier to use since conventional tuning will be unnecessary. ‘Tuning’ consists of controlling the decoding process to select the desired program. Mobile receivers will automatically switch between transmitters as a journey proceeds.
Figure 7.45 shows the block diagram of a DAB transmitter. Incoming digital audio at 32 kHz is passed into the compression unit which uses the techniques described in Chapter 5 to cut the data rate to some fraction of the original. The compression unit could be at the studio end of the line to cut the cost of the link. The data for each channel are then protected against errors by the addition of redundancy. Convolutional codes described in Chapter 6 are attractive in the broadcast environment. Several such data-reduced sources are interleaved together and fed to the modulator, which may employ techniques such as randomizing which were also introduced in Chapter 6.
Figure 7.46 shows how the multiple carriers in a DAB band are allocated to different program channels on an interleaved basis. Using this technique, it will be evident that when a notch in the received spectrum occurs due to multipath cancellation this will damage a small proportion of all programs rather than a large part of one program. This is the spectral equivalent of physical interleaving on a recording medium. The result is the same in that error bursts are broken up according to the interleave structure into more manageable sizes which can be corrected with less redundancy.
A serial digital waveform has a sinx/x spectrum and when this waveform is used to phase modulate a carrier the result is a symmetrical sinx/x spectrum centred on the carrier frequency. Nulls in the spectrum appear at multiples of the phase switching rate away from the carrier. This distance is equal to 90° or one quadrant of sinx. Further carriers can be placed at spacings such that each is centred at the nulls of the others. Owing to the quadrant spacing, these carriers are mutually orthogonal, hence the term orthogonal frequency division.22,23 A number of such carriers will interleave to produce an overall spectrum which is almost rectangular as shown in Figure 7.47(a). The mathematics describing this process is exactly the same as that of the reconstruction of samples in a low-pass filter. Effectively sampling theory has been transformed into the frequency domain.
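The orthogonality claim is easily verified numerically. The toy calculation below is not a DAB modulator; it simply shows that carriers spaced at the reciprocal of the symbol period have zero inner product over one symbol, so each can be detected without interference from its neighbours.

```python
import numpy as np

N = 1024                        # samples across one symbol period T = 1
t = np.arange(N) / N

def carrier(k):                 # k-th carrier: k whole cycles per symbol
    return np.exp(2j * np.pi * k * t)

# Inner products over one period: unity for a carrier with itself,
# essentially zero against its neighbour.
print(abs(np.vdot(carrier(5), carrier(5))) / N)   # 1.0
print(abs(np.vdot(carrier(5), carrier(6))) / N)   # ~0.0
```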
In practice, perfect spectral interleaving does not give sufficient immunity from multipath reception. In the time domain, a typical reflective environment turns a transmitted pulse into a pulse train extending over several microseconds.24 If the bit rate is too high, the reflections from a given bit coincide with later bits, destroying the orthogonality between carriers. Reflections are opposed by the use of guard intervals in which the phase of the carrier returns to an unmodulated state for a period which is greater than the period of the reflections. Then the reflections from one transmitted phase decay during the guard interval before the next phase is transmitted.25 The principle is not dissimilar to the technique of spacing transitions in a recording further apart than the expected jitter. As expected, the use of guard intervals reduces the bit rate of the carrier because for some of the time it is radiating carrier, not data. A typical reduction is to around 80 per cent of the capacity without guard intervals. This capacity reduction does, however, improve the error statistics dramatically, such that much less redundancy is required in the error correction system. Thus the effective transmission rate is improved. The use of guard intervals also moves more energy from the sidebands back to the carrier. The frequency spectrum of a set of carriers is no longer perfectly flat but contains a small peak at the centre of each carrier as shown in Figure 7.47(b).
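The capacity figure quoted above follows directly from the ratio of useful time to total symbol time; the durations below are purely illustrative.

```python
def efficiency(useful_us, guard_us):
    """Fraction of the symbol that carries data rather than guard time."""
    return useful_us / (useful_us + guard_us)

print(efficiency(1000, 250))   # 0.8: a quarter-length guard interval
```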
A DAB receiver must receive the set of carriers corresponding to the required program channel. Owing to the close spacing of carriers, it can only do this by performing fast Fourier transforms (FFTs) on the DAB band. If the carriers of a given program are evenly spaced, a partial FFT can be used which only detects energy at spaced frequencies and requires much less computation. This is the DAB equivalent of tuning. The selected carriers are then demodulated and combined into a single bitstream. The error-correction codes will then be deinterleaved so that correction is possible. Corrected data then pass through the expansion part of the data reduction coder, resulting in conventional PCM audio which drives DACs.
It should be noted that in the European DVB standard, COFDM transmission is also used. In some respects, DAB is simply a form of DVB without the picture data.
1. Audio Engineering Society, AES recommended practice for digital audio engineering – serial transmission format for linearly represented digital audio data. J. Audio Eng. Soc., 33, 975–984 (1985)
2. EIAJ CP-340, A Digital Audio Interface. Tokyo: EIAJ (1987)
3. EIAJ CP-1201, Digital Audio Interface (revised). Tokyo: EIAJ (1992)
4. EIA RS-422A. Electronic Industries Association, 2001 Eye St NW, Washington, DC 20006, USA
5. Smart, D.L., Transmission performance of digital audio serial interface on audio tie lines. BBC Designs Dept Technical Memorandum, 3.296/84
6. European Broadcasting Union, Specification of the digital audio interface. EBU Doc. Tech., 3250
7. Rorden, B. and Graham, M., A proposal for integrating digital audio distribution into TV production. J. SMPTE, 606–608 (Sept. 1992)
8. Gilchrist, N., Co-ordination signals in the professional digital audio interface. In Proc. AES/EBU Interface Conf., 13–15. Burnham: Audio Engineering Society (1989)
9. AES18–1992, Format for the user data channel of the AES digital audio interface. J. Audio Eng. Soc., 40, 167–183 (1992)
10. Nunn, J.P., Ancillary data in the AES/EBU digital audio interface. In Proc. 1st NAB Radio Montreux Symp., 29–41 (1992)
11. Komly, A. and Viallevieille, A., Programme labelling in the user channel. In Proc. AES/EBU Interface Conf., 28–51. Burnham: Audio Engineering Society (1989)
12. ISO 3309, Information processing systems – data communications – high-level data link frame structure (1984)
13. AES10–1991, Serial multi-channel audio digital interface (MADI). J. Audio Eng. Soc., 39, 369–377 (1991)
14. Ajemian, R.G. and Grundy, A.B., Fiber-optics – the new medium for audio: a tutorial. J. Audio Eng. Soc., 38, 160–175 (1990)
15. Lidbetter, P.S. and Douglas, S., A fibre-optic multichannel communication link developed for remote interconnection in a digital audio console. Presented at the 80th Audio Engineering Society Convention (Montreux, 1986), Preprint 2330
16. Dunn, J., Considerations for interfacing digital audio equipment to the standards AES3, AES5 and AES11. In Proc. AES 10th International Conf., 122. New York: Audio Engineering Society (1991)
17. Gilchrist, N.H.C., Digital sound: sampling-rate synchronization by variable delay. BBC Research Dept Report, 1979/17
18. Lagadec, R., A new approach to sampling rate synchronisation. Presented at the 76th Audio Engineering Society Convention (New York, 1984), Preprint 2168
19. Shelton, W.T., Progress towards a system of synchronization in a digital studio. Presented at the 82nd Audio Engineering Society Convention (London, 1986), Preprint 2484(K7)
20. Wickelgren, I.J., The facts about FireWire. IEEE Spectrum, 19–25 (1997)
21. Anon., NICAM 728: specification for two additional digital sound channels with System I television. BBC Engineering Information Dept (London, 1988)
22. Cimini, L.J., Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing. IEEE Trans. Commun., COM-33, No. 7 (1985)
23. Pommier, D. and Wu, Y., Interleaving or spectrum spreading in digital radio intended for vehicles. EBU Tech. Review, No. 217, 128–142 (1986)
24. Cox, D.C., Multipath delay spread and path loss correlation for 910 MHz urban mobile radio propagation. IEEE Trans. Vehic. Tech., VT-26 (1977)
25. Alard, M. and Lasalle, R., Principles of modulation and channel coding for digital broadcasting for mobile receivers. EBU Tech. Review, No. 224, 168–190 (1987)