7
In this chapter the various standardized interfaces for component and composite video will be detailed along with the necessary troubleshooting techniques.
7.1 Introduction
Of all the advantages of digital video, the most important for production work is the ability to pass through multiple generations without quality loss. Digital interconnection between such production equipment is highly desirable to avoid the degradation due to repeated conversions.
Video convertors universally use parallel connection, where all bits of the pixel value are applied simultaneously to separate pins. Disk drives lay data serially on the track, but within the circuitry, parallel presentation is more common because it allows slower, and hence cheaper, memory chips to be used for timebase correction. Reed–Solomon error correction depends upon symbols assembled from typically eight bits. Digital effects machines and switchers typically operate upon pixel values in parallel.
The first digital video interfaces were based on parallel transmission. All that is necessary is a set of suitable driver chips, running at an appropriate sampling rate, to send video data down cables having separate conductors for each bit of the sample, along with a clock to tell the receiver when to sample the bit values. The complexity is trivial, and for short distances this approach represented the optimum solution at the time.
Parallel connection has drawbacks too; these come into play when longer distances are contemplated. A multicore cable is expensive, and the connectors are physically large. It is difficult to provide good screening of a multicore cable without it becoming inflexible. More seriously, there are electronic problems with multicore cables. The propagation speeds of pulses down all of the cores in the cable will not be exactly the same, and so, at the end of a long cable, some bits may still be in transition when the clock arrives, whilst others may have begun to change to the value in the next pixel.
Where it is proposed to interconnect a large number of units with a router, that device will be extremely complex because of the number of parallel signals to be handled. In short, parallel technology could not and did not replace the central analog router of a conventional television station.
The answer to these problems is the serial connection. All of the digital samples are multiplexed into a serial bitstream, and this is encoded to form a self-clocking channel code which can be sent down a single channel. Skew caused by differences in propagation speed cannot then occur. The bit rate necessary is in excess of 200 Mbits/s for standard definition and almost 1.5 Gbits/s for HD, but this is well within the capabilities of coaxial cable. The cabling savings implicit in serial systems are obvious, but the electronic complexity of a serial interconnect is naturally greater, as high speed multiplexers or shift registers are necessary at the transmitting end, and a phase-locked loop, data separator and deserializer, as outlined in Chapter 3, are needed at the receiver to regenerate the parallel signal needed within the equipment. The availability of specialized serial chips from a variety of manufacturers meant that serial digital video would render parallel interfaces obsolete very quickly.
A distinct advantage of serial transmission is that a matrix distribution unit or router is more easily realized. Where numerous pieces of video equipment need to be interconnected in various ways for different purposes, a crosspoint matrix is an obvious solution. With serial signals, only one switching element per signal is needed. A serial system has a potential disadvantage that the time distribution of bits within the block has to be closely defined, and, once standardized, it is extremely difficult to increase the word length if this is found to be necessary. The serial digital interface (SDI) was designed from the outset for 10-bit working but incorporates two low-order bits that may be transmitted as zero in eight-bit applications. In a parallel interconnect, the word extension can be achieved by adding extra conductors alongside the existing bits, which is much easier.
The third interconnect to be considered uses fibre optics. The advantages of this technology are numerous: the bandwidth of an optical fibre is staggering, and is practically limited only by the response speed of the light source and sensor, and for this reason it has been adopted for digital HDTV interfacing. The optical transmission is immune to electromagnetic interference from other sources, nor does it contribute any. This is advantageous for connections between cameras and control units, where a long cable run may be required in outside broadcast applications. The cable can be made completely from insulating materials, so that ground loops cannot occur, although many practical fibre-optic cables include electrical conductors for power and steel strands for mechanical strength.
Drawbacks of fibre optics are few. They do not like too many connectors in a given channel, as the losses at a connection are much greater than with an electrical plug and socket. It is preferable for the only breaks in the fibre to be at the transmitting and receiving points. For similar reasons, fibre optics are less suitable for distribution, where one source feeds many destinations. The bi-directional open-collector or tri-state buses of electronic systems cannot be implemented with fibre optics, nor is it easy to build a crosspoint matrix.
The high frequencies involved in digital video mean that accurate signal termination of electrical cables is mandatory. Cable losses cause the signal amplitude to fall with distance. As a result the familiar passive loop-through connection of analog video is just not possible. Whilst much digital equipment appears to have loop-through connections, close examination will reveal an amplifier symbol joining the input and output. Digital equipment must use active loop-through and if a unit loses power, the loop-through output will fail.
7.2 Areas of Standardization
For some time digital interface standards have existed for 525/59.94 and 625/50 4:2:2 component and 4FSC composite. More recently digital interface standards for a variety of HD scanning standards have been set.
Digital interfaces require to be standardized in the following areas: connectors, to ensure plugs mate with sockets; pinouts; electrical signal specification, to ensure that the correct voltages and timing are transferred; and protocol, to ensure that the meaning of the data words conveyed is the same to both devices. As digital video of any type is only data, it follows that the same physical and electrical standards can be used for a variety of protocols. It also follows that the same protocols can be conveyed down a variety of physical channels. Figure 7.1 shows that serial, parallel and optical fibre interfaces may carry exactly the same data.
Parallel interfaces are obsolete, but they are worthy of study because the parallel interface standard actually contains a comprehensive definition of how a television signal is described in the digital domain as a series of binary numbers. This definition will include such factors as how the image is described as a pixel array, the sampling rate needed to do that, how the colour is subsampled, the colorimetry and gamma assumed, the connection between analog signal voltage and the value of binary codes and so on. The actual parallel interface usually consists of little more than a number of ECL line drivers and receivers, one for each bit along with a clock. A serial interface will require some form of shift register to convert the word format data into a bitstream along with a channel coder to turn the bitstream into a self-clocking waveform compatible with the channel, which may be electrical or optical. The SD and HD serial interfaces are quite similar except, of course, for the bit rate.
7.3 Digitizing Component Video
It is not necessary to digitize analog sync pulses in component systems, since the only useful video data is that sampled during the active line. As the sampling rate is derived from sync, it is only necessary to standardize the size and position of a digital active line and all other parts of the video waveform can be recreated at a later time. The position is specified as a given number of sampling clock periods from the leading edge of sync, and the length is simply a standard number of samples. The digital active line is somewhat longer than the analog active line to allow for some drift in the line position of the analog input and to place edge effects in digital filters outside the screen area. Some of the first and last samples of the digital active line will represent blanking level, thereby avoiding abrupt signal transitions caused by a change from blanking level to active signal. When converting analog signals to digital it is important that the analog unblanked picture should be correctly positioned within the line. In this way the analog line will be symmetrically disposed within the digital active line. If this is the case, when converting the data back to the analog domain, no additional blanking will be necessary, as the blanking at the ends of the original analog line will be recreated from the data. The DAC can pass the whole of the digital active line for conversion and the result will be a correctly timed analog line with blanking edges in the right position.
However, if the original analog timing was incorrect, the unblanked analog line may be too long or off-centre in the digital active line. In this case a DAC may apply digital blanking to the line data prior to conversion. Some equipment gives the user the choice of using blanking in the data or locally applied blanking prior to conversion.
In addition to specifying the location of the samples, it is also necessary to standardize the relationship between the absolute analog voltage of the waveform and the digital code value used to express it so that all machines will interpret the numerical data in the same way. These relationships are in the voltage domain and are independent of the scanning standard used. Thus the same relationships will be found in both SD and HD component formats. As a digital interface is just an alternative way of sending a television picture, the information it contains about that picture will be the same. Thus digital interfaces assume the same standards for gamma and colour primaries as the original analog system.
Figure 7.2 shows how the luminance signal fits into the quantizing range of a digital system. Numbering for 10-bit systems is shown with figures for eight bits in brackets. Black is at a level of 64₁₀ (16₁₀) and peak white is at 940₁₀ (235₁₀) so that there is some tolerance of imperfect analog signals and overshoots caused by filter ringing. The sync pulse will clearly go outside the quantizing range, but this is of no consequence as conventional syncs are not transmitted. The visible voltage range fills the quantizing range and this gives the best possible resolution.
The colour difference signals use offset binary, where 512₁₀ (128₁₀) is the equivalent of blanking voltage. The peak analog limits are reached at 64₁₀ (16₁₀) and 960₁₀ (240₁₀) respectively, allowing once more some latitude for maladjusted analog inputs and filter ringing.
Note that the code values corresponding to all ones or all zeros (i.e. the two extreme ends of the quantizing range) are not allowed to occur in the active line as they are reserved for synchronizing. ADCs must be followed by circuitry that detects these values and forces the code to the nearest legal value if out-of-range analog inputs are applied. Processing circuits that can generate these values must employ digital clamp circuits to remove the values from the signal. Fortunately this is a trivial operation.
The peak-to-peak amplitude of Y is 876 (219) quantizing intervals, the difference between the peak white and black codes, whereas for the colour difference signals it is 896 (224) intervals. There is thus a small gain difference between the signals. This will be cancelled out by the opposing gain difference at any future DAC, but must be borne in mind when digitally converting to other standards.
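The level mapping described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, inputs are assumed to be normalized (luma 0 to 1, colour difference -0.5 to +0.5), and the clamp follows the simple rule in the text that only the all-zeros and all-ones codes are excluded.

```python
import math

# Code levels quoted in the text, keyed by word length in bits.
LUMA_BLACK = {8: 16, 10: 64}
LUMA_WHITE = {8: 235, 10: 940}
CHROMA_ZERO = {8: 128, 10: 512}
CHROMA_MIN = {8: 16, 10: 64}
CHROMA_MAX = {8: 240, 10: 960}


def clamp_legal(code, bits=10):
    """Force a code to the nearest value that is not all zeros or all ones."""
    lowest = 1                    # assumption: only code 0 is excluded at the bottom
    highest = (1 << bits) - 2     # assumption: only the all-ones code is excluded at the top
    return max(lowest, min(highest, code))


def quantize_luma(y, bits=10):
    """Map normalized luma (0.0 = black, 1.0 = peak white) to a code value."""
    span = LUMA_WHITE[bits] - LUMA_BLACK[bits]
    return clamp_legal(math.floor(LUMA_BLACK[bits] + y * span + 0.5), bits)


def quantize_chroma(c, bits=10):
    """Map a normalized colour difference value (-0.5 to +0.5) to offset binary."""
    span = CHROMA_MAX[bits] - CHROMA_MIN[bits]
    return clamp_legal(math.floor(CHROMA_ZERO[bits] + c * span + 0.5), bits)


if __name__ == "__main__":
    print(quantize_luma(0.0), quantize_luma(1.0))        # black and peak white
    print(quantize_chroma(-0.5), quantize_chroma(0.5))   # colour difference limits
    print(quantize_luma(2.0), quantize_luma(-1.0))       # out-of-range inputs are clamped
```

Because the two spans are computed from different end codes, the sketch also makes the small gain difference between luma and colour difference visible.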
The sampling rate used in SD was easily obtained as only two scanning standards had to be accommodated. It will be seen that in HD there are further constraints. In principle, the sampling rate of a system need only satisfy the requirements of sampling theory and filter design. Any rate that does so can be used to convey a video signal from one place to another. In practice, however, there are a number of factors that limit the choice of sampling rate considerably.
It should be borne in mind that a video signal represents a series of two-dimensional images. If a video signal is sampled at an arbitrary frequency, samples in successive lines and pictures could be in different places. If, however, the video signal is sampled at a rate which is a multiple of line rate the result will be that samples on successive lines will be in the same place and the picture will be converted to a neat array having vertical columns of samples that are in the same place in all pictures. This allows for the spatial and temporal processing needed in, for example, standards convertors and MPEG coders. A line-locked sampling rate can conveniently be obtained by multiplication of the H-sync frequency in a phase-locked loop. The position of samples along the line is then determined by the leading edge of sync.
Considering SD sampling rates first, whilst the bandwidth required by 525/59.94 is less than that required by 625/50, and a lower sampling rate might have been used, practicality suggested a common sampling rate. The benefit of a standard H-locked sampling rate for component video is that the design of standards convertors is simplified and DVTRs have a constant data rate independent of standard. This was the goal of CCIR (now ITU) Recommendation 601¹, which combined the 625/50 input of EBU Doc. Tech. 3246 and 3247 with the 525/59.94 input of SMPTE RP 125.
CCIR 601 recommends the use of certain sampling rates which are based on integer multiples of the carefully chosen fundamental frequency of 3.375 MHz. This frequency is normalized to 1 in the document.
In order to sample 625/50 luminance signals without quality loss, the lowest multiple possible is 4 which represents a sampling rate of 13.5 MHz. This frequency line-locks to give 858 sample periods per line in 525/59.94 and 864 sample periods per line in 625/50. The spectra of such sampled luminance are shown in Figure 7.3.
In the component analog domain, the colour difference signals typically have one-half the bandwidth of the luminance signal. Thus a sampling rate multiple of 2 is used and results in 6.75 MHz. This sampling rate allows respectively 429 and 432 sample periods per line.
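The sample counts per line follow directly from dividing the sampling rate by the line rate. The sketch below checks them with exact fractions. The line rates are not stated in the text and are assumptions here: 15 625 Hz for 625/50 and 4.5 MHz/286 (about 15 734.27 Hz) for 525/59.94. The function name is hypothetical.

```python
from fractions import Fraction

FUNDAMENTAL_HZ = Fraction(3_375_000)   # the frequency normalized to 1 in CCIR 601

# Assumed line rates (not given in the text).
LINE_RATE_HZ = {
    "625/50": Fraction(15_625),              # 625 lines x 25 frames per second
    "525/59.94": Fraction(4_500_000, 286),   # approximately 15 734.27 Hz
}


def sample_periods_per_line(multiple, standard):
    """Sample periods per line for a rate of multiple x 3.375 MHz."""
    return multiple * FUNDAMENTAL_HZ / LINE_RATE_HZ[standard]


if __name__ == "__main__":
    for standard in LINE_RATE_HZ:
        luma = sample_periods_per_line(4, standard)     # 13.5 MHz
        chroma = sample_periods_per_line(2, standard)   # 6.75 MHz
        print(standard, luma, chroma)
```

Both results are whole numbers in both standards, which is what makes 13.5 MHz usable as a common line-locked rate.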
Component video sampled in this way has a 4:2:2 format. Whilst other combinations are possible, 4:2:2 is the format for which the majority of production equipment is constructed and is the only SD component format for which parallel and serial interface standards exist. Figure 7.4 shows the spatial arrangement given by 4:2:2 sampling. Luminance samples appear at half the spacing of colour difference samples, and every other luminance sample is co-sited with a pair of colour difference samples. Co-siting is important because it allows all attributes of one picture point to be conveyed with a three-sample vector quantity. Modification of the three samples allows such techniques as colour correction to be performed. This would be difficult without co-sited information. Co-siting is achieved by clocking the three ADCs simultaneously. In some equipment one ADC is multiplexed between the two colour difference signals. In order to obtain co-sited data it will then be necessary to have an analog delay in one of the signals.
For full bandwidth RGB working, 4:4:4 can be used with a possible 4:4:4:4 used if including a key signal. For lower bandwidths, multiples of 1 and 3 can also be used for colour difference and luminance respectively. 4:1:1 delivers colour bandwidth in excess of that required by the composite formats. 4:1:1 is used in the 525 line version of the DVC quarter-inch digital video format. 3:1:1 meets 525 line bandwidth requirements. The factors of 3 and 1 do not, however, offer a columnar structure and are inappropriate for quality post-production.
In 4:2:2 the colour difference signals are sampled horizontally at half the luminance sampling rate, yet the vertical colour difference sampling rates are the same as for luminance. Where bandwidth is important, it is possible to halve the vertical sampling rate of the colour difference signals as well. Figure 7.5 shows that in 4:2:0 sampling, the colour difference samples only exist on alternate lines so that the same vertical and horizontal resolution is obtained. 4:2:0 is used in the 625 line version of the DVC format and in the MPEG ‘Main Level Main Profile’ format for multimedia communications and, in particular, DVB.
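The relative data rates of these sampling structures can be compared with a short calculation. The sketch below is an illustration only: the table of horizontal and vertical divisors is an interpretation of the formats named in the text, and the function name is hypothetical.

```python
from fractions import Fraction

# (horizontal divisor, vertical divisor) applied to each colour difference
# signal relative to luminance. These values are an interpretation of the
# formats named in the text.
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
    "4:2:0": (2, 2),
}


def samples_per_pixel(fmt):
    """Average number of samples conveyed per luminance pixel."""
    horizontal, vertical = CHROMA_SUBSAMPLING[fmt]
    # One luminance sample plus two colour difference signals, each thinned out.
    return 1 + 2 * Fraction(1, horizontal * vertical)


if __name__ == "__main__":
    for fmt in CHROMA_SUBSAMPLING:
        print(fmt, samples_per_pixel(fmt))
```

On this reckoning 4:1:1 and 4:2:0 carry the same amount of data; they differ only in whether the colour difference resolution is given up horizontally or shared between the horizontal and vertical axes.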
Figure 7.6 shows that in 4:2:2 there is one luminance signal sampled at 13.5 MHz and two colour difference signals sampled at 6.75 MHz. Three separate signals with different clock rates are inconvenient and so multiplexing can be used. If the colour difference signals are multiplexed into one channel, then two 13.5 MHz channels will be required. Such an approach is commonly found within digital component processing equipment where the colour difference processing can take place in a single multiplexed channel.
If the colour difference and luminance channels are multiplexed into one, a 27 MHz clock will be required. The word order is standardized to be:
Cb, Y, Cr, Y, etc.
In order unambiguously to demultiplex the samples, the first sample in the line is defined as Cb and a unique sync pattern is required to identify the beginning of the multiplex sequence. HD adopts the same principle but the frequencies are higher.
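The multiplex order can be illustrated with a pair of small functions. This is a sketch under the assumption that one line of co-sited samples is already available as three lists; the function names are hypothetical and no sync pattern is modelled.

```python
def multiplex_422(y, cb, cr):
    """Interleave one line of 4:2:2 samples in the order Cb, Y, Cr, Y."""
    if len(y) != 2 * len(cb) or len(cb) != len(cr):
        raise ValueError("expected two luminance samples per Cb/Cr pair")
    words = []
    for i in range(len(cb)):
        # Cb and Cr are co-sited with the first luminance sample of each pair.
        words += [cb[i], y[2 * i], cr[i], y[2 * i + 1]]
    return words


def demultiplex_422(words):
    """Split a multiplexed line back into (y, cb, cr), assuming it starts on Cb."""
    if len(words) % 4:
        raise ValueError("line must contain a whole number of Cb, Y, Cr, Y groups")
    return words[1::2], words[0::4], words[2::4]


if __name__ == "__main__":
    line = multiplex_422([10, 11, 12, 13], [20, 21], [30, 31])
    print(line)
    print(demultiplex_422(line))
```

The demultiplexer only works if it knows which word is the first Cb, which is why a unique sync pattern is needed at the start of the sequence.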
There are two ways of handling 16:9 aspect ratio video in SD. In the anamorphic approach both the camera and display scan wider but there is no change to the sampling rates employed and therefore the same 27 MHz data stream can be employed unchanged. Compared with 4:3, the horizontal spacing of the pixels in 16:9 must be greater as the same number are stretched across a wider picture. This results in a reduction of horizontal resolution, but standard 4:3 production equipment can be used subject to some modifications to the shape of pattern wipes in vision mixers. When viewed on a 4:3 monitor anamorphic signals appear squeezed horizontally.
In the second approach, the pixel spacing is kept the same as in 4:3 and the number of samples per active line must then be increased by 16:12. This requires the data rate to rise to 36 MHz. Thus the luminance sampling rate becomes 18 MHz and the colour difference sampling rate becomes 9 MHz. Strictly speaking the format no longer adheres to CCIR-601 because the sampling rates are no longer integer multiples of 3.375 MHz. If, however, 18 MHz is considered to be covered by Rec. 601, then it must be described as 5.333…:2.666…:2.666….
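The figures for the second approach can be verified with exact arithmetic. In this sketch the 720-sample active line is taken from the 4:3 structure described later in the chapter, and the helper name is hypothetical.

```python
from fractions import Fraction

FUNDAMENTAL_MHZ = Fraction(27, 8)      # 3.375 MHz
WIDESCREEN_FACTOR = Fraction(16, 12)   # 16:9 relative to 4:3 (12:9)


def scale_for_widescreen(value):
    """Scale a 4:3 quantity so that the pixel spacing is unchanged in 16:9."""
    return value * WIDESCREEN_FACTOR


luma_rate_mhz = scale_for_widescreen(Fraction(27, 2))     # from 13.5 MHz
chroma_rate_mhz = scale_for_widescreen(Fraction(27, 4))   # from 6.75 MHz
active_samples = scale_for_widescreen(720)                # from 720 luma samples

luma_multiple = luma_rate_mhz / FUNDAMENTAL_MHZ
chroma_multiple = chroma_rate_mhz / FUNDAMENTAL_MHZ

if __name__ == "__main__":
    print(luma_rate_mhz, chroma_rate_mhz, active_samples)
    print(float(luma_multiple), float(chroma_multiple))
```

The multiples come out as 16/3 and 8/3, which is the origin of the awkward 5.333…:2.666…:2.666… description.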
If the sampling rate is chosen to be a common multiple of the US and European line rates, the spacing between the pixels that results will have to be accepted. In computer graphics, pixels are always square, which means the horizontal and vertical spacing is the same. In 601 sampling, the pixels are not square and their aspect ratio differs between the US and European standards. This is because the horizontal sampling rate is the same but the number of lines in the picture is different.
When CCIR 601 was being formulated, the computer and television industries were still substantially separate and the lack of square pixels was not seen as an issue. In 1990 CCIR 709 recommended that HD formats should be based on 1920 pixels per active line, and use sampling rates based on 2.25 MHz (6.75/3): an unnecessarily inflexible approach again making it unlikely that square pixels would result at all frame rates.
Subsequently, the convergence of computer, film and television technology has led to square pixels being adopted in HD formats at all frame rates, a common sampling rate having been abandoned. Another subtle change is in the way of counting lines. In traditional analog video formats, the number of lines was the total number, including blanking, whereas in computers the number of lines has always been the number visible on the screen, i.e. the height of the pixel array. HD standards adopted the same approach. Thus in the 625 line standard there are 625 line periods per frame, whereas in the 1080 line HD standard there are actually 1125 line periods per frame.
It is slowly being understood that improved picture quality comes not from putting more pixels into the image but from eliminating interlace and increasing the frame rate². Thus 1280×720 progressively scanned frames are described in SMPTE 296M³. Unfortunately there are still those who believe that data describing digital television images somehow differs from computer data. The bizarre adherence to the obsolete principle of interlacing seems increasingly to be based on maintaining an artificial difference between computers and television for marketing purposes rather than on any physics or psycho-optics. The failure of the ATSC and FCC to understand these principles has led to a damaging proliferation of HD television standards, in which the simple and effective approach of digital standard definition has been lost.
7.4 Structure of SD Component Digital
The sampling rate for luma is H-synchronous 13.5 MHz. This is divided by two to obtain the colour difference sampling rate. Figure 7.7 shows that in 625 line systems the control system⁴ waits for 132 luma sample periods after an analog sync edge before commencing sampling the line. Then 720 luma samples and 360 of each type of colour difference sample are taken; 1440 samples in all. A further 12 sample periods will elapse before the next sync edge, making 132+720+12=864 sample periods. In 525 line systems⁵, the analog active line is in a slightly different place and so the controller waits 122 sample periods before taking the same digital active line samples as before. There will then be 16 sample periods before the next sync edge, making 122+720+16=858 sample periods.
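The two line structures can be written down as data and checked against the totals quoted above. The class and field names below are hypothetical; only the numbers come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineLayout:
    """One line measured in luma sample periods from the analog sync edge."""
    wait_after_sync: int   # periods between the sync edge and the first sample
    active_luma: int       # luma samples in the digital active line
    tail: int              # periods between the last sample and the next sync edge

    @property
    def total_periods(self):
        return self.wait_after_sync + self.active_luma + self.tail

    @property
    def multiplexed_active_words(self):
        # Each pair of luma samples is accompanied by one Cb and one Cr sample.
        return self.active_luma + 2 * (self.active_luma // 2)


LAYOUT_625 = LineLayout(wait_after_sync=132, active_luma=720, tail=12)
LAYOUT_525 = LineLayout(wait_after_sync=122, active_luma=720, tail=16)

if __name__ == "__main__":
    for name, layout in (("625", LAYOUT_625), ("525", LAYOUT_525)):
        print(name, layout.total_periods, layout.multiplexed_active_words)
```

Only the waiting periods differ between the two standards; the digital active line itself is identical.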
For 16:9 aspect ratio working, the line and field rate remain the same, but the luminance sampling rate may be raised to 18 MHz and the colour difference sampling rates are raised to 9 MHz. This results in the sampling structure shown for 625 lines in Figure 7.8(a) and for 525 lines in (b). There are now 960 luminance pixels and 2×480 colour difference pixels per active line.
7.5 Structure of HD Component Digital
Given the large number of HD scanning standards, it is only possible to outline the common principles here. Specific standards will differ in line and sample counts. Those who are accustomed to analog SD will note that in HD the analog sync pulses are different. In HD, the picture quality is more sensitive to horizontal scanning jitter and so the signal-to-noise ratio of the analog sync edge is improved by doubling the amplitude. Thus the sync edge starts at the most negative part of the waveform, but continues rising until it is as far above blanking as it was below. As a result 50% of sync, the level at which slicing of the sync pulse is defined to take place, is actually at blanking level. All other voltages and gamuts remain the same as for SD.
The treatment of SD formats introduced the concept of the digital active line being longer than the analog line. Some HD formats have formalized this by describing the total active pixel array as the production aperture, and the slightly smaller area within that, corresponding to the unblanked area of the analog format, as the clean aperture. The quantizing standards of HD are the same as for SD, except that the option of 12-bit resolution is added.
SMPTE 274M⁶ describes 1125 lines per frame 16:9 aspect ratio HD standards having a production aperture of 1920×1080 pixels and a clean aperture of 1888×1062 pixels. The standard uses square pixels, thus 1080×16=1920×9. Both interlaced and progressive scanning are supported, at a wide variety of frame rates: basically 24, 25, 30, 50 and 60 Hz with the option of incorporating the reduction in frequency of 0.1% for synchronization to the traditional NTSC timing.
As with SD, the sampling clock is line locked. However, there are some significant differences between the SD and HD approaches. In SD, a common sampling rate is used for both line standards. This allows both a common interface data rate and an interface that works in real time, but results in pixels that are not square. In HD, the pixels are square and this causes the video sampling rate to change with the frame rate. In order to keep the interface bit rate constant, variable amounts of packing are placed between the active lines but the result is that the interface no longer works in real time at all frame rates and requires buffering at source and destination. The interface symbol rate has been chosen to be a common multiple of 24, 25 and 30 times 1125 Hz so that there can always be an integer number of interface symbol periods in a line period.
For example, if used at 30 Hz frame rate interlaced, there would be 1125×30=33750 lines per second. Figure 7.9(a) shows that the luma sampling rate is 74.25 MHz and there are 2200 cycles of this clock in one line period. From these, 1920 cycles correspond to the active line and 280 remain for blanking and TRS. The colour difference sampling rate is one half that of luma at 37.125 MHz and 960 cycles correspond to the active line. As there are two colour difference signals, when multiplexed together the symbol rate will be 74.25+37.125+37.125=148.5 MHz. The standard erroneously calls this the interface sampling rate, which is not a sampling rate at all, but a word rate or symbol rate.
Thus the parallel interface has a clock rate of 148.5 MHz. When ten-bit symbols are serialized, the bit rate becomes 1.485 Gbits/s, the bit rate of serial HD. If the option of adhering to the picture rate reduction of 0.1% is taken, all of the above frequencies fall by that amount.
If the frame rate is reduced to 25 Hz, as in Figure 7.9(b), the line rate falls to 1125×25=28125 Hz and the luma sampling rate falls to 2200×28125=61.875 MHz. The interface symbol rate does not change, but remains at 148.5 MHz. In order to carry 50 Hz pictures, time compression is used. At 28125 lines per second, there will be 2640 cycles of 74.25 MHz, the luma interface rate, per line, rather than the 2200 cycles obtained at 60 Hz. Thus the line still contains 1920 active luma samples, but for transmission, the number of blanking/TRS cycles has been increased to 720.
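The relationship between word rate and serial bit rate is a simple multiplication, sketched below. The 27 MHz SD word rate is the multiplex clock mentioned earlier in the chapter. Treating the 0.1% reduction as the exact factor 1000/1001 is an assumption based on conventional NTSC-related timing rather than something the text states.

```python
from fractions import Fraction

BITS_PER_SYMBOL = 10

SD_WORD_RATE_HZ = 27_000_000                      # Cb, Y, Cr, Y multiplex clock
HD_WORD_RATE_HZ = 74_250_000 + 2 * 37_125_000     # luma plus two colour difference signals


def serial_bit_rate(word_rate_hz, bits_per_symbol=BITS_PER_SYMBOL):
    """Bit rate obtained when each parallel symbol is sent one bit at a time."""
    return word_rate_hz * bits_per_symbol


# Assumption: the "0.1%" reduction is the factor 1000/1001.
NTSC_FACTOR = Fraction(1000, 1001)
hd_reduced_bit_rate = serial_bit_rate(HD_WORD_RATE_HZ) * NTSC_FACTOR

if __name__ == "__main__":
    print(serial_bit_rate(SD_WORD_RATE_HZ))   # bits per second for SD
    print(serial_bit_rate(HD_WORD_RATE_HZ))   # bits per second for HD
    print(float(hd_reduced_bit_rate))         # HD with the reduced picture rate
```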
Although the luma is sampled at 61.875 MHz, for transmission luma samples are placed in a buffer and read out at 74.25 MHz. This means that the active line is sent in less than an active line period.
Figure 7.9(c) shows that a similar approach is taken with 24 Hz material in which the number of blanking cycles is further increased.
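The line structure at each frame rate can be derived from the constant luma interface rate. The sketch below follows the chapter's description, in which the picture is sampled with 2200 clock periods per line and then read out of a buffer at 74.25 MHz. The function name and dictionary keys are hypothetical, and the 24 Hz figures are simply what this arithmetic gives rather than values quoted in the text.

```python
LUMA_INTERFACE_HZ = 74_250_000
LINES_PER_FRAME = 1125
ACTIVE_LUMA_SAMPLES = 1920
SAMPLING_PERIODS_PER_LINE = 2200   # as described for the real-time 30 Hz case


def line_structure(frame_rate_hz):
    """Line rate, interface cycles per line and blanking for one frame rate."""
    line_rate = LINES_PER_FRAME * frame_rate_hz
    cycles, remainder = divmod(LUMA_INTERFACE_HZ, line_rate)
    if remainder:
        raise ValueError("interface rate is not a multiple of this line rate")
    return {
        "line_rate_hz": line_rate,
        "cycles_per_line": cycles,
        "blanking_cycles": cycles - ACTIVE_LUMA_SAMPLES,
        "luma_sampling_hz": SAMPLING_PERIODS_PER_LINE * line_rate,
    }


if __name__ == "__main__":
    for rate in (30, 25, 24):
        print(rate, line_structure(rate))
```

The integer check in `line_structure` reflects the requirement that a whole number of interface symbol periods must fit in every line period.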
The 1.485 Gbits/s rate is adequate for interlaced video and for progressively scanned film, in which the frame rate is only 24 or 25 Hz. For progressively scanned video the frame rate may be as high as 60 Hz, and this would require the bit rate to be doubled. However, it is becoming increasingly known that because progressive scan eliminates interlace artefacts, it does not need twice the data rate of interlaced systems to give the same perceived quality. Resolution falls dramatically in the presence of even quite slow motion in interlaced video, whereas in progressively scanned video it does not². Thus on real moving pictures, progressively scanned systems with relatively modest static resolution give better performance because that resolution is maintained. Consequently the 50 and 60 Hz 1920×1080 progressive standards are quite unnecessary for television purposes.
SMPTE 296M describes the 720P standard³ that gives the best results of all of the television industry standards, although not as good as the progressive standard developed by the US military which has a higher frame rate.
720P uses frames containing 750 lines of which 30 correspond to the vertical interval. Note that as interlace is not used, the number of lines per frame does not need to be odd. 720P has square pixels and so must have 720×16/9=1280 pixels per line. The production aperture is thus 1280×720 pixels. A clean aperture is not defined.
The 1280×720 frame can be repeated at 60, 50, 30, 25 and 24 Hz. The same interface symbol rate as 274M is used, so clearly this must also be a common multiple of 24, 25, 30, 50 and 60 times 750 Hz.
Figure 7.10(a) shows that 720/60 has a line rate of 45 kHz and has 1650 sample periods per line, corresponding to a luma sampling rate of 74.25 MHz. The colour difference sampling rate is half of that, but as there are two colour difference signals, the overall symbol rate becomes 148.5 MHz and so this format transmits in real time. As the frame rate goes down, the number of interface symbol periods per line will rise, but the number of pixels remains constant at 1280. As a result the number of blanked symbols rises, as does the degree of time compression of the transmission of each line. This is shown in the remainder of Figure 7.10.
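The same calculation can be applied to the 720P frame rates. In this sketch the function name is hypothetical, and the values for rates other than 60 Hz are the result of the arithmetic rather than figures quoted in the text.

```python
LUMA_INTERFACE_HZ = 74_250_000
TOTAL_LINES = 750
ACTIVE_PIXELS = 1280


def periods_per_line(frame_rate_hz):
    """Luma sample periods available in one line period at the interface rate."""
    line_rate = TOTAL_LINES * frame_rate_hz
    periods, remainder = divmod(LUMA_INTERFACE_HZ, line_rate)
    if remainder:
        raise ValueError("interface rate is not a multiple of this line rate")
    return periods


if __name__ == "__main__":
    for rate in (60, 50, 30, 25, 24):
        total = periods_per_line(rate)
        print(rate, total, total - ACTIVE_PIXELS)   # total and blanked periods
```

Every listed frame rate divides exactly, and the blanked portion of the line grows as the frame rate falls.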
7.6 Synchronizing
The component interface carries a multiplex of luminance and colour difference samples and it is necessary to synchronize the demultiplexing process at the receiver so that the components are not inadvertently transposed. As conventional analog syncs are discarded, horizontal and vertical synchronizing must also be provided. In the case of serial transmission it is also necessary to identify the position of word boundaries so that correct deserialization can take place. These functions are performed by special bit patterns known as timing reference and identification signals (TRS-ID) sent with each line. TRS-ID differs only slightly between formats. Figure 7.11 shows the location of TRS-ID. Immediately before the digital active line location is the SAV (start of active video) TRS-ID pattern, and immediately after is the EAV (end of active video) TRS-ID pattern. These unique patterns occur on every line and continue throughout the vertical interval.
Each TRS-ID pattern consists of four symbols: the same length as the component multiplex repeating structure. In this way the presence of a TRS-ID does not alter the phase of the multiplex. Three of the symbols form a sync pattern for deserializing and demultiplexing (TRS) and one is an identification symbol (ID) that replaces the analog sync signals. The first symbol contains all ones and the next two contain all zeros. This bit sequence cannot occur in active video, even due to concatenation of successive pixel values, so its detection is reliable. As the transition from a string of ones to a string of zeros occurs at a symbol boundary, it is sufficient to enable unambiguous deserialization, location of the ID symbol and demultiplexing of the components. Whatever the word length of the system, all bits should be either ones or zeros during TRS.
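Once the data have been deserialized into words, locating TRS-ID amounts to searching for the three-symbol pattern. The function below is a minimal sketch operating on a list of integer words; its name is hypothetical, and the word-boundary search that a real deserializer must perform at bit level is not modelled.

```python
def find_trs(words, bits=10):
    """Return (index, id_symbol) for every TRS pattern in a list of words.

    A TRS is one all-ones symbol followed by two all-zeros symbols; the
    symbol after those three is the identification symbol.
    """
    all_ones = (1 << bits) - 1
    found = []
    for i in range(len(words) - 3):
        if words[i] == all_ones and words[i + 1] == 0 and words[i + 2] == 0:
            found.append((i, words[i + 3]))
    return found


if __name__ == "__main__":
    stream = [0x200, 0x040, 0x3FF, 0x000, 0x000, 0x200, 0x200, 0x040]
    print(find_trs(stream))
```

Because the pattern is four symbols long, the word following the identification symbol is again the first Cb of the multiplex, so the component phase is preserved.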
The fourth symbol, the ID, contains three data bits, H, F and V. These bits are protected by four redundancy bits, so that together they form a seven-bit Hamming codeword.
Figure 7.12(a) shows how the Hamming code is generated. Single bit errors can be corrected and double bit errors can be detected according to the decoding table in (b).
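The protection scheme can be sketched as follows. The bit layout and parity equations used here are those of ITU-R BT.656 for an eight-bit symbol and are assumed to correspond to Figure 7.12, which is not reproduced in this text. The function names are hypothetical. Decoding is done by a nearest-codeword search, which corrects single-bit errors and rejects double-bit errors.

```python
def make_id_symbol(f, v, h):
    """Build the eight-bit identification symbol 1 F V H P3 P2 P1 P0."""
    p3 = v ^ h
    p2 = f ^ h
    p1 = f ^ v
    p0 = f ^ v ^ h
    return (1 << 7) | (f << 6) | (v << 5) | (h << 4) | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0


# The eight legal symbols, one for each combination of F, V and H.
VALID_SYMBOLS = [make_id_symbol(f, v, h)
                 for f in (0, 1) for v in (0, 1) for h in (0, 1)]


def decode_id_symbol(symbol):
    """Return (f, v, h), correcting one bit error, or None if uncorrectable."""
    for codeword in VALID_SYMBOLS:
        # Count the bit positions in which the received symbol differs.
        if bin(symbol ^ codeword).count("1") <= 1:
            return (codeword >> 6) & 1, (codeword >> 5) & 1, (codeword >> 4) & 1
    return None


if __name__ == "__main__":
    print([hex(s) for s in VALID_SYMBOLS])
    print(decode_id_symbol(0x9D ^ 0x04))   # one bit corrupted
    print(decode_id_symbol(0x80 ^ 0x03))   # two bits corrupted
```

In ten-bit systems the same eight bits occupy the most significant positions of the symbol, with the two extra low-order bits at zero.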
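Figure 7.13(a) shows the structure of the TRS-ID. The data bits have the following meanings: H distinguishes SAV, where it is 0, from EAV, where it is 1; F identifies the field in interlaced formats, being 0 in the first field and 1 in the second; V is 1 during vertical blanking and 0 in the active part of the field.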
Figure 7.13(b) (top) shows the relationship between the sync pattern bits and 625 line analog timing, whilst below is the relationship for 525 lines.
Figure 7.14 shows a decode table for SD TRS which is useful when interpreting logic analyser displays.
The same TRS-ID structure is used in SMPTE 274M and 296M HD. It differs in that the HD formats can support progressive scan in which the F bit is always set to zero.
7.7 Component Ancillary Data
In component standards, only the active line is transmitted and this leaves a good deal of spare capacity. The two line standards differ on how this capacity is used. In 625 lines, only the active line period may be used on lines 20 to 22 and 333 to 335⁵. Lines 20 and 333 are reserved for equipment self-testing.
In 525 lines there is considerably more freedom and ancillary data may be inserted wherever there is no active video, either during horizontal blanking where it is known as HANC, vertical blanking where it is known as VANC, or both⁴. The spare capacity allows many channels of digital audio and considerably simplifies switching.
The all zeros and all ones codes are reserved for synchronizing, and cannot be allowed to appear in ancillary data. In practice only seven bits of the eight-bit word can be used as data; the eighth bit is redundant and gives the byte odd parity. As all ones and all zeros are even parity, the sync pattern cannot then be generated accidentally.
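The parity argument can be checked with a minimal sketch. It assumes the parity bit occupies the most significant position of the eight-bit word, which the text does not state, and the function name is hypothetical.

```python
def with_odd_parity(data7):
    """Attach a parity bit (assumed to be the MSB) so the word has an odd number of ones."""
    if not 0 <= data7 < 128:
        raise ValueError("expected a seven-bit value")
    ones = bin(data7).count("1")
    parity = 0 if ones % 2 else 1  # make the total count of ones odd
    return (parity << 7) | data7


# Every possible seven-bit payload, protected by odd parity.
words = [with_odd_parity(d) for d in range(128)]
```

Since 0x00 and 0xFF both contain an even number of ones, neither value can appear in `words`, whatever the payload.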
Ancillary data is always prefaced by a different four-symbol TRS which is the inverse of the video TRS in that it starts with all zeros and then has two symbols of all ones followed by the information symbol. See section 8.7 for treatment of embedded audio in SDI.
7.8 The SD Parallel Interface
Composite digital signals use the same electrical and mechanical interface as used for 4:2:2 component working⁴,⁷. This means that it is possible erroneously to plug a component signal into a composite machine. Whilst this cannot possibly work, no harm will be done because the signal levels and pinouts are the same.
A 25 pin D-type connector to ISO 2110-1989 is specified. Equipment always has female connectors, cables always have male connectors. Metal or metallized backshells are recommended with screened cables for optimum shielding. Equipment has been seen using ribbon cables and IDC (insulation displacement connectors), but it is not clear whether such cables would meet the newer more stringent EMC (electromagnetic compatibility) regulations. It should be borne in mind that the ninth and eighteenth harmonics of 13.5 MHz are both emergency frequencies for aircraft radio.
Whilst equipment may produce or accept only eight-bit data, cables must contain conductors for all ten bits. Connector latching is by a pair of 4-40 (an American thread) screws, with suitable posts provided on the female connector. It is important that the screws are used as the multicore cable is quite stiff and can eventually unseat the plug if it is not secured. Some early equipment had slidelocks instead of screw pillars, but these proved to be too flimsy. During the changeover from slidelocks to 4-40 screws, some equipment was made with metric screw pillars and these will need to be changed to attach modern cables.
When unscrewing the locking screws from a D-connector it is advisable to check that the lock screw is actually unscrewing from the pillar. It is not unknown for the pillar to rotate instead. If this is not noticed, the pillar fixings may become detached inside the equipment, which will then have to be dismantled.
Each signal in the interface is carried by a balanced pair using ECL (emitter coupled logic) drive levels. The cable has a nominal impedance of 110 ohm and must be correctly terminated. ECL runs from a power supply of nominally −5.2 V and the logic states are −0.8 V for a ‘high’ and −1.85 V for a ‘low’. ECL is primarily a current driven system, and the signal amplitude is quite low compared with other logic families as well as being negative valued.
Figure 7.15 shows the pinouts used. Although it is not obvious from the figure, the numbering of the D-connector is such that signal pairs are on physically opposite pins. Originally most equipment used eight-bit data and ten-bit working was viewed as an option; this was reflected in the wording of the first standards. However, in order to reflect the increasing quantity of ten-bit equipment now in use, the wording of later standards has subtly changed to describe a ten-bit system in which only eight bits may be used.
In the old specification shown in Figure 7.15(a), there are eight signal pairs and two optional pairs, so that extension to a ten-bit word can be accommodated. The optional signals were used to add bits at the least significant end of the word. Adding bits in this way extends resolution rather than increasing the magnitude. It will be seen from the figure that the optional bits are called Data −1 and Data −2 where the −1 and −2 refer to the powers of two represented, i.e. 2⁻¹ and 2⁻². The eight-bit word describes 256 levels and ends in a radix point. The extra bits below the radix point represent the half and quarter quantizing intervals. In this way a degree of compatibility exists between ten- and eight-bit systems, as the correct magnitude will always be obtained when changing word length, and all that is lost is a degree of resolution in shortening the word length when the bits below the radix point are lost. The same numbering scheme can be used for both word lengths; the longer word length simply has a radix point and an extra digit in any number base. Converting to the eight-bit equivalent can then be simply a matter of deleting the extra digit and retaining the integer part.
However, the later specification shown in (b) renumbers the bits from 0 to 9 and assumes a system with 1024 levels. Thus all standard levels defined in the old 8+2-bit documents have to be multiplied by four to convert them to the levels in the new ten-bit documents. Thus a level of 16 decimal or 10 hex in the eight-bit system becomes 64 decimal or 40 hex in the ten-bit system. Figure 7.16 shows that 8+2 schemes may use hexadecimal numbering in the XYZ or XYLo format where two hex digits X and Y represent the most significant eight bits and the remaining two bits are represented by the Z or Lo symbol. Ten-bit schemes are numbered with three hex digits PQR where P only has two meaningful bits.
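The relationship between the two numbering schemes amounts to a shift by two bit positions. The following is a minimal sketch with hypothetical function names; the `quarter_steps` argument stands for the two bits below the radix point in the 8+2 scheme.

```python
def to_ten_bit(level8, quarter_steps=0):
    """Convert an 8+2 value (integer part plus 0..3 quarter intervals) to a ten-bit level."""
    if not 0 <= level8 < 256 or not 0 <= quarter_steps < 4:
        raise ValueError("value out of range")
    return (level8 << 2) | quarter_steps


def to_eight_bit(level10):
    """Drop the two least significant bits, keeping the integer part of the 8+2 value."""
    if not 0 <= level10 < 1024:
        raise ValueError("expected a ten-bit value")
    return level10 >> 2


black = to_ten_bit(0x10)       # level 16 in the eight-bit documents
black_pqr = f"{black:03X}"     # three hex digit PQR form
```

Shortening the word in this way truncates rather than rounds; a real convertor might round or dither instead.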
A separate clock signal pair and a number of grounding and shielding pins complete the connection. Figure 7.15 also shows the relationship between the clock and the data. A positive-going clock edge is used to sample the signal lines after the level has settled between transitions. In component, the clock will be line-locked 27 MHz or 36 MHz irrespective of the line standard, whereas in composite digital the clock will be four times the frequency of the PAL or NTSC subcarrier.
The parallel interface is suitable for distances of up to 50 metres (27 MHz) or 40 metres (36 MHz). Beyond this distance equalization is likely to be necessary and skew or differential delay between signals may become a problem. Equalization of such a large number of signals is not economically viable.
The parallel interface sends active line blocks sandwiched between SAV and EAV TRS codes. The remainder of the blanking periods will contain words whose value alternates between the luma and colour difference blanking codes, unless ancillary data is being sent.
7.9 The HD Parallel Interface
This obsolescent interface is suitable for RGB or colour difference working. In RGB, the three components are carried on three separate sets of conductors, whereas in colour difference working the luma is carried on one set and the two colour difference signals are multiplexed onto a second set. The third set is then unused, although an auxiliary signal such as a key channel may optionally be sent on it. The general concept is identical to the SD parallel interface, with one differential pair of wires per bit and a single differential clock, all at 110 ohm ECL levels. Each pair has its own screen and so three connector pins are required for each bit. Given that three sets of ten-bit data plus a clock are needed, there are 31 screened pairs and the connector needs a massive 93 pins. At a clock rate of 74.25 MHz, this interface is restricted to a length of 20 metres.
As this interface is essentially two, or three for RGB, channels in parallel, each channel has its own synchronizing means. Thus the luma data have their own TRS-ID and the colour difference data also have their own TRS-ID. The TRS codes in each channel should be co-timed.
7.10 The Composite Digital Parallel Interface
When composite video is to be digitized, the input will be a single waveform having spectrally interleaved luminance and chroma. Any sampling rate allowing sufficient bandwidth would convey composite video from one point to another. However, if processing in the digital domain is contemplated, there will be less choice.
In a composite digital colour processor it will be necessary to decode the composite signal, which will require some kind of digital filter. Whilst it is possible to construct filters with any desired response, it is a fact that a digital filter whose response is simply related to the sampling rate will be much less complex to implement. This is the reasoning that led to the near universal use of four times subcarrier sampling rate. Figure 7.17 shows the spectra of PAL and NTSC sampled at 4×Fsc. It will be evident that there is a considerable space between the edge of the baseband and the lower sideband. This allows the anti-aliasing and reconstruction filters to have a more gradual cut-off, so that ripple in the passband can be reduced. This is particularly important for composite digital recorders, since they are digital devices in an analog environment, and signals may have been converted to and from the digital domain many times in the course of production. A subcarrier multiple sampling clock is easily obtained by gating burst to a phase-locked loop. In NTSC there is no burst swing, whereas at 4×Fsc, the burst swing of PAL moves burst crossings by exactly one sample period and so the phase relationship between burst crossings and the 4×Fsc clock is unaffected by burst swing.
In NTSC, siting of samples along the line is affected by ScH phase. In PAL, the presence of the 25 Hz component of subcarrier means that samples are not in exactly the same place from one line to the next. The columns lean over slightly such that at the bottom of a field there is a displacement of two samples with respect to the top.
Composite digital samples at four times subcarrier frequency, and so there will be major differences between the PAL and NTSC standards. It is not possible to transmit digitized SECAM. Whilst the component interface transmits only active lines and special sync patterns, the composite interfaces carry the entire composite waveform – syncs, burst and active line. Although ancillary data may be placed in sync tip, the rising and falling sync edges must be present. In the absence of ancillary data, the data on the parallel interface is essentially the continuous stream of samples from a convertor which is digitizing a normal analog composite signal. Virtually all that is necessary to return to the analog domain is to strip out ancillary data and substitute sync tip values prior to driving a DAC and a filter. One of the reasons for this different approach is that the sampling clock in composite video is subcarrier locked. The sample values during sync can change with ScH phase in NTSC and PAL and change with the position in the frame in PAL due to the 25 Hz component. It is simpler to convey sync sample values on the interface than to go to the trouble of recreating them later.
The instantaneous voltage of composite video can go below blanking on dark saturated colours, and above peak white on bright colours. As a result the quantizing ranges need to be stretched in comparison with component in order to accommodate all possible voltage excursions. Sync tip can be accommodated at the low end and peak white is some way below the end of the scale. It is not so easy to determine when overload clipping will take place in composite as the sample sites are locked to subcarrier. The degree of clipping depends on the chroma phase. When samples are taken either side of a chroma peak, clipping will be less likely to occur than when the sample is taken at the peak. Advantage is taken of this phenomenon in PAL as the peak analog voltage of a 100% yellow bar goes outside the quantizing range. The sampling phase is such that samples are sited either side of the chroma peak and remain within the range.
The PAL and NTSC versions of the composite digital interface will be described separately. The electrical interface is the same as for digital component.
7.10.1 PAL Interface
The quantizing range of digital PAL is shown in Figure 7.18⁸. Blanking level is at 256₁₀ (64₁₀) and sync tip is the lowest allowable code of 4₁₀ (1₁₀) as 0 is reserved for digital synchronizing. Peak white is 844₁₀ (211₁₀). In each case the value in parentheses is the eight-bit equivalent.
In PAL, the composite digital interface samples at 4×Fsc, with sample phase aligned with burst phase. PAL burst swing results in burst phases of ±135 degrees, and samples are taken at these phases and at ±45 degrees, precisely half-way between the U and V axes. This sampling phase is easy to generate from burst and avoids premature clipping of chroma. It is most important that samples are taken exactly at the points specified, since any residual phase error in the sampling clock will cause the equivalent of a chroma phase error when samples from one source are added to samples from a different source in a switcher. A digital switcher can only add together pairs of samples from different inputs, but if these samples were not taken at the same instants with respect to their subcarriers, the samples represent different vectors and cannot be added.
Figure 7.19 shows how the sampling clock may be derived. The incoming sync is used to derive a burst gate, during which the samples of burst are analysed. If the clock is correctly phased, the sampled burst will give values of 380₁₀ (95₁₀), 256₁₀ (64₁₀), 128₁₀ (32₁₀), 256₁₀ (64₁₀) repeated, whereas if a phase error exists, the values at the burst crossings will be above or below 256₁₀ (64₁₀). The difference between the sample values and blanking level can be used to drive a DAC that controls the sampling VCO. In this way any phase errors in the ADC are eliminated, because the sampling clock will automatically servo its phase to be identical to digital burst. Burst swing causes the burst peak and burst crossing samples to change places, so a phase comparison is always possible during burst. DC level shifts can be removed by using both positive and negative burst crossings and averaging the results. This also has the effect of reducing the effect of noise.
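The principle of the burst phase comparison can be sketched numerically. This is a simplified model rather than the circuit of the figure: it treats burst as an ideal sinusoid about blanking, ignores burst swing and quantizing, and takes a burst amplitude of 124 levels from the difference between the 380 and 256 values quoted above. All names are hypothetical.

```python
import math

BLANKING = 256  # ten-bit blanking level quoted in the text


def sample_burst(phase_error_deg, amplitude=124.0, dc_shift=0.0):
    """Four consecutive 4xFsc samples of burst: peak, crossing, trough, crossing.

    phase_error_deg is the error of the sampling clock relative to burst.
    """
    phi = math.radians(phase_error_deg)
    return [
        BLANKING + dc_shift + amplitude * math.cos(math.pi / 2 * n + phi)
        for n in range(4)
    ]


def phase_error_signal(samples):
    """Half the difference between the two crossing samples.

    Using both crossings cancels any DC shift common to the samples.
    """
    falling_crossing, rising_crossing = samples[1], samples[3]
    return (rising_crossing - falling_crossing) / 2
```

With a correctly phased clock both crossing samples sit at blanking and the error signal is zero; a phase error of either sign produces an output of the corresponding sign, which is what a VCO control loop needs.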
In PAL, the subcarrier frequency contains a 25 Hz offset, and so 4×Fsc will contain a 100 Hz offset. The sampling rate is not h-coherent, and the sampling structure is not quite orthogonal. As subcarrier is given by:
$$F_{sc} = F_h \left( \frac{1135}{4} + \frac{1}{625} \right)$$
the sampling rate will be given by:
$$4 \times F_{sc} = F_h \left( 1135 + \frac{4}{625} \right)$$
This results in 709 379 samples per frame, and there will not be a whole number of samples in a line. In practice, 1135 sample periods, numbered 0 to 1134, are defined as one digital line, with an additional 2 sample periods per field which are included by having 1137 samples, numbered 0 to 1136, in lines 313 and 625. Figure 7.20(a) shows the sample numbering scheme for an entire line. Note that the sample numbering begins at 0 at the start of the digital active line so that the horizontal blanking area is near the end of the digital line and the sample numbers will be large. The digital active line is 948 samples long and is longer than the analog active line. This allows the digital active line to move with 25 Hz whilst ensuring the entire analog active line is still conveyed.
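The sample counts can be verified with exact arithmetic. The sketch below assumes the usual PAL relationship Fsc = (1135/4 + 1/625) × Fh with a line rate of 15 625 Hz; the variable names are illustrative only.

```python
from fractions import Fraction

F_H = Fraction(15625)                                 # 625/50 line rate in Hz
F_SC = F_H * (Fraction(1135, 4) + Fraction(1, 625))   # PAL subcarrier in Hz (assumed relationship)
F_SAMPLE = 4 * F_SC                                   # 4xFsc sampling rate in Hz

samples_per_frame = F_SAMPLE / 25                     # 25 frames per second
samples_per_line = F_SAMPLE / F_H                     # not a whole number

# 623 lines of 1135 samples plus lines 313 and 625 with 1137 samples each.
layout_total = 623 * 1135 + 2 * 1137
```

The line layout accounts for exactly the same number of samples as the sampling rate produces in one frame, so the numbering scheme neither gains nor loses samples over time.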
Since sampling is not h-coherent, the position of sync pulses will change relative to the sampling points from line to line. The relationship can also be changed by the ScH phase of the analog input. Zero ScH is defined as coincidence between sync and zero degrees of subcarrier phase at line 1 of field 1. Since composite digital samples on burst phase, not on subcarrier phase, the definition of zero ScH will be as shown in Figure 7.21, where it will be seen that two samples occur at exactly equal distances either side of the 50% sync point. If the input is not zero ScH, the samples conveying sync will have different values. Measurement of these values will allow ScH phase to be computed.
7.10.2 NTSC Interface
Although they have some similarities, PAL and NTSC are quite different when analysed at the digital sample level. Figure 7.22 shows how the NTSC waveform fits into the quantizing structure⁹. Blanking is at 240₁₀ (60₁₀) and peak white is at 800₁₀ (200₁₀), so that 1 IRE unit is the equivalent of 1.4Q which could perhaps be called the DIRE. These different values are due to the different vision/sync ratio of NTSC. PAL is 7:3 whereas NTSC is 10:4.
Subcarrier in NTSC has an exact half-line offset, so there will be an integer number of cycles of subcarrier in two lines. Fsc is simply 227.5×Fh, and as sampling is at 4×Fsc, there will be 227.5×4=910 samples per line period, and the sampling will be orthogonal. Figure 7.20(b) shows that the digital active line consists of 768 samples numbered 0 to 767. Horizontal blanking follows the digital active line in sample numbers 768 to 909.
The sampling phase is chosen to facilitate encoding and decoding in the digital domain. In NTSC there is a phase shift of 123 degrees between subcarrier and the I axis. As burst is an inverted piece of the subcarrier waveform, there is a phase shift of 57 degrees between burst and the I axis. Composite digital NTSC does not sample in phase with burst, but on the I and Q axes at 57, 147, 237 and 327 degrees with respect to burst.
Figure 7.23 shows how this approach works in relation to sync and burst. Zero ScH is defined as zero degrees of subcarrier at the 50% point on sync, but the 57 degree sampling phase means that the sync edge is actually sampled 25.6 ns ahead of, and 44.2 ns after, the 50% point. Similarly, when the burst is reached, the phase shift means that burst sample values will be 46₁₀, 83₁₀, 74₁₀ and 37₁₀ repeating. The phase-locked loop that produces the sampling clock will digitally compare the samples of burst with the values given here. As the burst is not sampled at a zero crossing, the slope will be slightly less. The gain of the phase error detector will also be less, making it more prone to burst noise than in the PAL process. The phase error will normally be averaged over several burst samples to overcome this problem.
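The sync edge timings quoted above follow from the sampling phase. The sketch below assumes an NTSC subcarrier of 315/88 MHz and that, with samples every 90 degrees, the sample preceding the 50% point lies 90 − 57 = 33 degrees before it; the names are hypothetical.

```python
from fractions import Fraction

F_SC = Fraction(315_000_000, 88)              # NTSC subcarrier in Hz (assumed value)
SAMPLES_PER_LINE = Fraction(455, 2) * 4       # 227.5 cycles per line, four samples per cycle
SUBCARRIER_PERIOD_NS = float(Fraction(10**9) / F_SC)


def offset_ns(degrees):
    """Time taken by the given number of degrees of subcarrier, in nanoseconds."""
    return degrees / 360 * SUBCARRIER_PERIOD_NS


before_ns = offset_ns(90 - 57)   # sample ahead of the 50% sync point
after_ns = offset_ns(57)         # sample after the 50% sync point
sample_period_ns = SUBCARRIER_PERIOD_NS / 4
```

The two offsets add up to one sample period of about 69.8 ns, as they must for adjacent samples.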
7.11 Serial Digital Video Interfaces
The serial interfaces described here have a great deal of commonality. Any differences will be noted subsequently. All of them allow up to ten-bit samples to be communicated serially¹⁰. If there are only eight bits in the input samples, the missing bits are forced to zero for transmission except for the all-ones condition during TRS which will be forced to ten ones. The interfaces are transparent to ancillary data in the parallel domain, including conveyance of AES/EBU digital audio channels.
Serial transmission uses concepts that were introduced in Chapter 3. At the high bit rates of digital video, the cable is a true transmission line in which a significant number of bits are actually in the cable at any one time, having been sent but not yet received. Under these conditions cable loss is significant. These interfaces operate with cable losses up to 30 dB. The losses increase with frequency and so the bit rate in use and the grade of cable employed both affect the maximum distance the signal will safely travel. Figure 7.24 gives some examples of cable lengths that can be used in SD. In HD there is only one bit rate. Using Belden 1694A or equivalent, a distance of 140 m can be achieved.
Serial transmission uses a waveform that is symmetrical about ground and has an initial amplitude of 800 mV pk–pk across a 75 ohm load. This signal can be fed down 75 ohm coaxial cable having BNC connectors. Serial interfaces are restricted to point-to-point links. Unlike analog video practice, serial digital receivers contain correct termination that is permanently present and passive loop-through is not possible. In permanent installations, no attempt should be made to drive more than one load using T-pieces as this will result in signal reflections that seriously compromise the data integrity. On the test bench with very short cables, however, systems with all manner of compromises may still function.
The range of waveforms that can be received without gross distortion is quite small and raw data produce waveforms outside this range. The solution is the use of scrambling, or pseudo-random coding. The serial interfaces use convolutional scrambling as was described in Chapter 3. This is simpler to implement in a cable installation because no separate synchronizing of the randomizing is needed. The scrambling process at the transmitter spreads the signal spectrum and makes that spectrum reasonably constant and independent of the picture content. It is possible to assess the degree of equalization necessary by comparing the energy in a low-frequency band with that in higher frequencies. The greater the disparity, the more equalization is needed. Thus fully automatic cable equalization at the receiver is easily achieved.
The essential parts of a serial link are shown in Figure 7.25. Parallel data having a word length of up to ten bits forms the input. These are fed to a ten-bit shift register which is clocked at ten times the input word rate: 1.485 GHz, 360 MHz, 270 MHz or 40×Fsc. The serial data emerge from the shift register LSB first and are then passed through the scrambler, in which a given bit is converted to the exclusive-OR of itself and two bits that are five and nine clocks ahead. This is followed by another stage, which converts channel ones into transitions. The transition encoder ensures that the signal is polarity independent. The resulting logic level signal is converted to a 75 ohm source impedance signal at the cable driver.
The receiver must regenerate a bit clock at 1.485 GHz, 360 MHz, 270 MHz or 40×Fsc from the input signal, and this clock drives the input sampler and slicer which converts the cable waveform back to serial binary. The local bit clock also drives a circuit that simply reverses the scrambling at the transmitter. The first stage returns transitions to ones. The second stage is a mirror image of the encoder and reverses the exclusive-OR calculation to output the original data. Such descrambling results in error extension, but this is not a practical problem since link error rates are practically zero.
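The scrambling and transition coding described above can be modelled in a few lines. This is a behavioural sketch, not a description of any particular chip: the tap delays of five and nine clocks are taken from the text, the registers are assumed to start cleared, and all function names are hypothetical.

```python
def scramble(bits, taps=(5, 9)):
    """Self-synchronizing scrambler: each output is the input XORed with earlier outputs."""
    state = [0] * max(taps)          # previous scrambled bits, most recent first
    out = []
    for b in bits:
        s = b
        for t in taps:
            s ^= state[t - 1]
        out.append(s)
        state = [s] + state[:-1]
    return out


def descramble(bits, taps=(5, 9)):
    """Mirror image of the scrambler, fed only from the received bits."""
    state = [0] * max(taps)          # previous received bits, most recent first
    out = []
    for s in bits:
        b = s
        for t in taps:
            b ^= state[t - 1]
        out.append(b)
        state = [s] + state[:-1]
    return out


def to_transitions(bits, level=0):
    """Convert each one into a change of line level and each zero into no change."""
    out = []
    for b in bits:
        level ^= b
        out.append(level)
    return out


def from_transitions(levels, previous=0):
    """Recover ones wherever the line level changed."""
    out = []
    for lvl in levels:
        out.append(lvl ^ previous)
        previous = lvl
    return out
```

Because the descrambler is driven only by received bits, it needs no separate synchronizing, and a signal received with inverted polarity decodes correctly once the single disturbed bit has passed through the longest tap.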
As transmission is serial, it is necessary to obtain word synchronization, so that correct deserialization can take place. The TRS patterns are used for this purpose. The all-ones and all-zeros bit patterns form a unique 30-bit sequence which is detected in the receiver’s shift register. The transition from all ones to all zeros is on a word boundary and from that point on the deserializer simply divides by ten to find the word boundaries in the transmission.
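Word synchronization can be sketched as a search for ten ones followed by twenty zeros in the descrambled bit stream. The sketch assumes LSB-first serialization as described earlier and works on lists of bits rather than a hardware shift register; the names are illustrative.

```python
def serialize(words):
    """Flatten ten-bit words into a list of bits, LSB first."""
    return [(w >> i) & 1 for w in words for i in range(10)]


TRS_BITS = serialize([0x3FF, 0x000, 0x000])   # the unique 30-bit sequence


def find_word_boundary(bits):
    """Return the index of the first bit of the all-ones word, or -1 if no TRS is present."""
    n = len(TRS_BITS)
    for i in range(len(bits) - n + 1):
        if bits[i:i + n] == TRS_BITS:
            return i
    return -1


def deserialize(bits, start):
    """Divide by ten from the word boundary to rebuild complete ten-bit words."""
    words = []
    for i in range(start, len(bits) - 9, 10):
        chunk = bits[i:i + 10]
        words.append(sum(b << k for k, b in enumerate(chunk)))
    return words
```

Stray ones ahead of the TRS do not mislead the search, because a match requires the transition from ones to zeros to fall exactly ten bits after the candidate boundary.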
7.11.1 Standard Definition Serial Digital Interface (SDI)
This interface supports 525/59.94 2:1 and 625/50 2:1 scanning standards in component and composite. The component interfaces use a common bit rate of 270 Mbits/s for 4:3 pictures with an option of 360 Mbits/s for 16:9. In component, the TRS codes are already present in the parallel domain and SDI does no more than serialize the parallel signal protocol unchanged.
Composite digital samples at four times the subcarrier frequency and so the bit rate is different between the PAL and NTSC variants. The composite parallel interface signal is not a multiplex and also carries digitized analog syncs. Consequently there is no need for TRS codes. For serial transmission it is necessary to insert TRS at the serializer and subsequently to strip it out at the serial-to-parallel convertor. The TRS-ID is inserted during blanking, and the serial receiver can detect the patterns it contains. Composite TRS-ID is different to the one used in component signals and consists of five words inserted just after the leading edge of analog video sync. Figure 7.26(a) shows the location of TRS-ID at samples 967–971 in PAL and (b) shows the location at samples 790–794 in NTSC.
Out of the five words in TRS-ID, the first four are for synchronizing, and consist of a single word of all ones, followed by three words of all zeros. Note that the composite TRS contains an extra word of zeros compared with the component TRS and this could be used for signal identification in multi-standard devices. The fifth word is for identification, and carries the line and field numbering information shown in Figure 7.27. The field numbering is colour-framing information useful for editing. In PAL the field numbering will go from zero to seven, whereas in NTSC it will only reach three.
On detection of the synchronizing symbols, a divide-by-ten circuit is reset, and the output of this will clock words out of the shift register at the correct times. This circuit will also provide the output word clock.
7.11.2 SDTI
SDI is closely specified and is only suitable for transmitting 2:1 interlaced 4:2:2 digital video in 525/60 or 625/50 systems. Since the development of SDI, it has become possible economically to compress digital video and the SDI standard cannot handle this. SDTI (serial data transport interface) is designed to overcome that problem by converting SDI into an interface that can carry a variety of data types whilst retaining compatibility with existing SDI router infrastructures.
SDTI sources produce a signal which is electrically identical to an SDI signal and which has the same timing structure. However, the digital active line of SDI becomes a data packet or item in SDTI. Figure 7.28 shows how SDTI fits into the existing SDI timing. Between EAV and SAV (horizontal blanking in SDI) an ancillary data block is incorporated. The structure of this meets the SDI standard, and the data within describes the contents of the following digital active line.
The data capacity of SDTI is about 200 Mbits/s because some of the 270 Mbits/s are lost due to the retention of the SDI timing structure. Each digital active line finishes with a CRCC (cyclic redundancy check character) to check for correct transmission.
SDTI raises a number of opportunities, including the transmission of compressed data at faster than real time. If a video signal is compressed at 4:1, then one quarter as much data would result. If sent in real time the bandwidth required would be one quarter of that needed by uncompressed video. However, if the same bandwidth is available, the compressed data could be sent in one quarter of the usual time. This is particularly advantageous for data transfer between compressed camcorders and non-linear editing workstations. Alternatively, four different 50 Mbit/s signals could be conveyed simultaneously.
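The trade-off between transfer speed and the number of simultaneous signals is simple arithmetic. The sketch below uses the approximate 200 Mbits/s payload figure quoted for SDTI and ignores packet overheads; the function names are hypothetical.

```python
def realtime_speedup(link_payload_mbps, stream_mbps):
    """How many times faster than real time one compressed stream can be transferred."""
    return link_payload_mbps / stream_mbps


def streams_that_fit(link_payload_mbps, stream_mbps):
    """How many real-time streams of the given rate fit in the payload."""
    return int(link_payload_mbps // stream_mbps)


speedup = realtime_speedup(200, 50)     # one 50 Mbit/s stream on its own
parallel = streams_that_fit(200, 50)    # or several such streams at once
```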
Thus an SDTI transmitter takes the form of a multiplexer which assembles packets for transmission from input buffers. The transmitted data can be encoded according to MPEG, MotionJPEG, Digital Betacam or DVC formats and all that is necessary is that compatible devices exist at each end of the interface. In this case the data are transferred with bit accuracy and so there is no generation loss associated with the transfer. If the source and destination are different, that is, having different formats or, in MPEG, different group structures, then a conversion process with attendant generation loss would be needed.
7.11.3 ASI
The asynchronous serial interface is designed to allow MPEG transport streams to be transmitted over standard SDI cabling and routers. ASI offers higher performance than SDTI because it does not adhere to the SDI timing structure. Transport stream data do not have the same statistics as PCM video and so the scrambling technique of SDI cannot be used. Instead ASI uses an 8/10 group code (see section 3.8) to eliminate DC components and ensure adequate clock content.
SDI equipment is designed to run at a closely defined bit rate of 270 Mbits/s and has phase-locked loops in receiving and repeating devices which are intended to remove jitter. These will lose lock if the channel bit rate changes. Transport streams are fundamentally variable in bit rate and to retain compatibility with SDI routing equipment ASI uses stuffing bits to keep the transmitted bit rate constant.
The use of an 8/10 code means that although the channel bit rate is 270 Mbits/s, the data bit rate is only 80% of that, that is, 216 Mbits/s. A small amount of this is lost to overheads.
7.11.4 High Definition Serial Digital Interface (HD-SDI)
The SD serial interface runs at a variety of bit rates according to the television standard being sent. In contrast the HD serial interface¹¹ runs at only one bit rate, 1.485 Gbits/s, although it is possible to reduce this by 0.1% so that it can lock to traditional 59.94 Hz equipment. At this high bit rate, variable speed causes too many difficulties and it is easier to accommodate a reduced data rate by sending more blanking or ancillary data so that the transmitted bit rate stays the same. A receiver can work out which format is being sent by counting the number of blanking periods between the active lines.
Apart from the bit rate, the HD serial interface has as much in common with the SDI standard as possible. Although the impedance, signal level and channel coding are the same, the HD serial interface has a number of detail differences in the protocol.
The parallel HD interface above has two channels, one for luma and one for multiplexed colour difference data. Each of these has a symbol rate of 74.25 MHz and has its own TRS-ID structure. Essentially the HD serial interface is transparent to this data as it simply multiplexes between the two channels at symbol rate. As far as the active line is concerned, the result is the same as for SD: a sequence of Cb, Y, Cr, Y, etc. However, in HD the TRS-IDs of the two channels are also multiplexed. A further difference is that the HD interface has a line number and a CRC for each active line inserted immediately after EAV. Figure 7.29(a) shows the EAV and SAV structure of each channel, with the line count and CRC, whereas (b) shows the resultant multiplex.
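The word-by-word multiplexing of the two channels can be sketched as follows. The order, colour difference first, is inferred from the Cb, Y, Cr, Y sequence given above; the labels in the usage example are placeholders rather than real sample values.

```python
def multiplex(chroma, luma):
    """Interleave the colour difference and luma channels, chroma word first."""
    if len(chroma) != len(luma):
        raise ValueError("channels must be the same length")
    out = []
    for c, y in zip(chroma, luma):
        out.extend((c, y))
    return out


def demultiplex(stream):
    """Split the serial word sequence back into its chroma and luma channels."""
    return stream[0::2], stream[1::2]


# Each channel carries its own TRS-ID followed by active line data.
chroma = ["3FF(C)", "000(C)", "000(C)", "XYZ(C)", "Cb0", "Cr0"]
luma = ["3FF(Y)", "000(Y)", "000(Y)", "XYZ(Y)", "Y0", "Y1"]
combined = multiplex(chroma, luma)
```

The combined sequence therefore begins with each TRS word sent twice, once per channel, before settling into the familiar Cb, Y, Cr, Y pattern.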
7.12 Digital Video Interfacing Chipsets
Implementation of digital video systems is much easier now that specialized chips are available. The introduction of HD-SDI has required significant increase in chip performance to support the additional bit rate. HD chips are thus more expensive than their SD equivalent. One useful move is that HD and SD chips are being made with the same pinouts. Thus a single circuit board can be made into an SD or an HD device just by installing chips of the appropriate speed.
Figure 7.30 shows a hypothetical 4:2:2 component system starting with analog signals and ending with the same to illustrate the processes which are necessary. The syncs on Y are separated and multiplied in a phase-locked loop to produce a 27 MHz master clock. This is divided by 2 and by 4 to produce the sampling clocks for the convertors. This results in three data streams, which can be multiplexed to form a parallel interface signal using a parallel encoder chip such as the Sony CXD8068G. This parallel signal may be output using a set of ECL line drivers. If it is required to convert the parallel signal to SDI, a serial encoder will be required. The Sony SBX1601A and the Gennum GS9002 contain all parallel-to-serial functions, but output logic level signals which require a CXA1389AQ or a GS9007 cable driver to produce the 1.6 volt pk–pk SDI signal which will fall to the standard 0.8 volts after passing through the source terminating resistors.
At the receiving end of the cable the signal requires equalization, clock regeneration and deserializing. The Sony SBX1602A provides all of these functions in one device whereas the Gennum solution is to combine equalization and reclocking in the GS9005 and to perform deserialization in the GS9000. In both cases the output is parallel single-ended data which can be returned to the parallel interface specification using ECL drivers. Alternatively the parallel data may be sent directly to a parallel interface decoder such as the Sony CXD8069G which demultiplexes the 27 MHz data to provide separate outputs for driving three DACs.
Figure 7.31 shows a block diagram of the CXD8068G parallel interface encoder. This accepts the parallel input from three component ADCs and multiplexes them to the 27 MHz parallel standard. The rounding process allows ten-bit inputs to be rounded to shorter word lengths. The limiter prevents out-of-range analog signals from producing all-ones or all-zeros codes which are reserved for synchronizing. In addition to a 27 MHz clock derived from horizontal sync, the chip requires horizontal and frame drives to operate the timing counters which address the TRS generator. The final multiplexer selects TRS patterns, video data or ancillary data for the ten-bit parallel output.
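The rounding and limiting stages can be modelled as follows. This is a sketch rather than a description of the CXD8068G internals: it assumes that the reserved codes are those whose eight most significant bits are all zeros or all ones, and that rounding is done by adding half of the discarded range before truncating.

```python
def round_to_bits(sample10, out_bits):
    """Round a ten-bit sample to a shorter word length (assumed round-half-up)."""
    shift = 10 - out_bits
    if shift <= 0:
        return sample10
    return (sample10 + (1 << (shift - 1))) >> shift


def limit(sample, bits):
    """Clamp a sample so it never lands on a reserved synchronizing code.

    Assumption: reserved codes have their top eight bits all 0 or all 1,
    i.e. 000-003 and 3FC-3FF hex in ten bits, 00 and FF hex in eight bits.
    """
    lowest = 1 << (bits - 8)
    highest = (0xFF << (bits - 8)) - 1
    return max(lowest, min(highest, sample))


# An overdriven ADC output of all ones is pulled back to a legal code.
print(hex(limit(0x3FF, 10)))
# Rounding 3FF hex to eight bits overflows to 100 hex, so the limiter is
# needed after the rounder as well.
print(hex(limit(round_to_bits(0x3FF, 8), 8)))
```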
Figure 7.32(a) shows the SBX1601A serial encoder and (b) shows the GS9002 serial encoder. Of necessity these chips contain virtually identical processing. Parallel input data are clocked into the input latch by the parallel word clock which is multiplied in frequency by a factor of ten in a phase-locked loop to provide a serial bit clock. There is provision for selecting several centre frequencies for composite or component applications. The data latch output is examined by logic that detects input sync patterns and extends eight-bit sync values to ten bits. The parallel data are then serialized in a shift register prior to passing through the scrambler and the transition generator.
Figure 7.33 shows an SDI cable driver chip. The device shown has quadruple outputs and is useful in applications such as distribution amplifiers. Note that each differential amplifier produces a pair of separate SDI outputs. The fact that these are mutually inverted is irrelevant as the SDI signal is not polarity conscious. Note the resistor networks that provide correct cable source termination.
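The scrambler, the transition generator and the polarity independence of the signal can be demonstrated with a short model. The sketch assumes the generator x^9 + x^4 + 1 for the scrambler and x + 1 for the transition stage, with taps taken at delays of four and nine bits and words serialized least significant bit first. The exact tap convention and bit order should be checked against the relevant standard; the round trip and the polarity result hold for any consistent choice.

```python
def words_to_bits(words, width=10):
    """Serialize words least significant bit first (assumed bit order)."""
    return [(w >> i) & 1 for w in words for i in range(width)]


def scramble(bits, taps=(4, 9)):
    """Self-synchronizing scrambler: output fed back through a shift register."""
    history = [0] * max(taps)          # history[0] is the most recent output
    out = []
    for b in bits:
        s = b
        for t in taps:
            s ^= history[t - 1]
        out.append(s)
        history = [s] + history[:-1]
    return out


def descramble(bits, taps=(4, 9)):
    """Inverse of scramble(): feed-forward from the received bits only."""
    history = [0] * max(taps)          # history[0] is the most recent input
    out = []
    for s in bits:
        d = s
        for t in taps:
            d ^= history[t - 1]
        out.append(d)
        history = [s] + history[:-1]
    return out


def nrzi_encode(bits, start_level=0):
    """Transition generator: a one toggles the line level, a zero holds it."""
    level, out = start_level, []
    for b in bits:
        level ^= b
        out.append(level)
    return out


def nrzi_decode(levels, start_level=0):
    """Recover ones wherever the line level changed."""
    prev, out = start_level, []
    for level in levels:
        out.append(level ^ prev)
        prev = level
    return out


words = [0x3FF, 0x000, 0x000, 0x200, 0x155, 0x2AA]
bits = words_to_bits(words)
line = nrzi_encode(scramble(bits))
recovered = descramble(nrzi_decode(line))

# Invert the cable signal, as the second output of a differential driver would.
inverted = [level ^ 1 for level in line]
recovered_inverted = descramble(nrzi_decode(inverted))

print(recovered == bits)
# Only the start-up bits can differ; the descrambler then resynchronizes.
print(recovered_inverted[10:] == bits[10:])
```

Because the data are carried by transitions rather than levels, inversion disturbs at most the first decoded bit, and the feed-forward descrambler spreads that single error over no more than its nine-bit register length.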
Figure 7.34(a) shows the Gennum GS9005 reclocking receiver. This contains an automatic cable equalizer and a phase-locked loop clock recovery circuit that drives a slicer/sampler to recover the channel waveform to a logic level signal for subsequent descrambling in a separate device. The equalizer operates by estimating the cable length from the input amplitude and driving a voltage-controlled filter from the signal strength. A buffered eye pattern test point is provided. The equalizer output is DC restored prior to slicing to ensure that the slicing takes place around the waveform centre line. The slicer output will contain timing jitter and so a phase-locked loop is used having a loop filter to reject the jitter. The jitter-free clock is used to drive the data latch which samples the slicer output between transitions. The VCO centre frequency can be selected from four values and provision is made for an adjusting potentiometer for each frequency.
Figure 7.34(b) shows the GS9000 serial decoder which complements the GS9005. This contains a descrambler and a serial-to-parallel convertor synchronized by the detection of TRS in the shift register. The chip also contains an automatic standard detector that outputs a two-bit standard code for external indication and to select the centre frequency of the GS9005. The single-ended parallel output can be converted to the differential parallel output standard using a multiple ECL driver such as a VS621.
Figure 7.35 shows the Sony SBX1602A, which contains all of the serial receiving functions in one device. Its operation should be self-evident from the description of the Gennum devices above.
Parallel data can be demultiplexed for conversion to analog by the CXD8069G device shown in Figure 7.36 that also extracts ancillary data. The TRS detector identifies sync patterns and uses them to direct the ID word to the Hamming code error-correction stage. This outputs corrected timing signals that are decoded to produce analog video timing drives. A FIFO (First in First out) buffer acts as a small timebase corrector to allow the DACs to be driven with a stable clock. Ten-bit video data may be rounded to shorter word lengths if required, prior to demultiplexing into separate component outputs.
As the HD protocol is based heavily on the SD protocol, HD chipsets differ primarily in the bit rate they can handle. Detail differences include the generation of the line count parameter and CRC following EAV and the need for a different sync recognition system owing to the interleaving of two TRS codes in the serial bitstream. Figure 7.37 shows a typical HD serial system.
7.13 Embedded Audio in SDI
In component SDI, there is provision for ancillary data packets to be sent during blanking [10,12]. The high clock rate of component means that there is capacity for up to 16 audio channels sent in four groups. Composite SDI has to convey the digitized analog sync edges and bursts and only sync tip is available for ancillary data. As a result of this and the lower clock rate, composite has much less capacity for ancillary data than component although it is still possible to transmit one audio data packet carrying four audio channels in one group. Figure 7.38(a) shows where the ancillary data may be located for PAL and (b) shows the locations for NTSC.
As was shown in Chapter 4, the data content of the AES/EBU digital audio subframe consists of validity (V), user (U) and channel (C) status bits, a 20-bit sample and four auxiliary bits which optionally may be appended to the main sample to produce a 24-bit sample. The AES recommends sampling rates of 48, 44.1 and 32 kHz, but the interface permits variable sampling rates. SDI has various levels of support for the wide range of audio possibilities and these levels are defined in Figure 7.39. The default or minimum level is Level A which operates only with a video-synchronous 48 kHz sampling rate and transmits V, U, C and the main 20-bit sample only. As Level A is a default it need not be signalled to a receiver as the presence of IDs in the ancillary data is enough to ensure correct decoding. However, all other levels require an audio control packet to be transmitted to teach the receiver how to handle the embedded audio data. The audio control packet is transmitted once per field in the second horizontal ancillary space after the video switching point before any associated audio sample data. One audio control packet is required per group of audio channels.
If it is required to send 24-bit samples, the additional four bits of each sample are placed in extended data packets that must directly follow the associated group of audio samples in the same ancillary data space.
There are thus three kinds of packet used in embedded audio: the audio data packet which carries up to four channels of digital audio, the extended data packet and the audio control packet.
In component systems, ancillary data begins with a reversed TRS or sync pattern. Normal video receivers will not detect this pattern and so ancillary data cannot be mistaken for video samples. The ancillary data TRS consists of all zeros followed by all ones twice. There is no separate TRS for ancillary data in composite. Immediately following the usual TRS, there will be an ancillary data flag whose value must be 3FC hex. Following the ancillary TRS or data flag is a data ID word containing one of a number of standardized codes which tell the receiver how to interpret the ancillary packet. Figure 7.40 shows a list of ID codes for various types of packets. Next come the data block number and the data count parameters. The data block number increments by 1 on each instance of a block with a given ID number. On reaching 255 it overflows and recommences counting. Next, a data count parameter specifies how many symbols of data are being sent in this block. Typical values for the data count are 36 (decimal) for a small packet and 48 (decimal) for a large packet. These parameters help an audio extractor to assemble contiguous data relating to a given set of audio channels.
Figure 7.41 shows the structure of the audio data packing. In order to prevent accidental generation of reserved synchronizing patterns, bit 9 is the inverse of bit 8 so the effective system word length is nine bits. Three nine-bit symbols are used to convey all of the AES/EBU subframe data except for the four auxiliary bits. Since four audio channels can be conveyed, there are two ‘Ch’ or channel number bits specifying the audio channel number to which the subframe belongs. A further bit, Z, specifies the beginning of the 192-sample channel status message. V, U and C have the same significance as in the normal AES/EBU standard, but the P bit reflects parity on the three nine-bit symbols rather than the AES/EBU definition. The three-word sets representing an audio sample will then be repeated for the remaining three channels in the packet but with different combinations of the Ch bits.
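The packing of one subframe into three symbols can be sketched as below. The positions of the individual bits within the symbols are an assumption made for illustration (Figure 7.41 and the standard are the authority), as is the use of even parity over the 26 payload bits; the function names are hypothetical. What the sketch does follow from the text is the three-symbol structure, the Z and Ch bits, and bit 9 being the inverse of bit 8.

```python
def ones(value):
    return bin(value).count("1")


def add_bit9(nine_bits):
    """Make bit 9 the inverse of bit 8 so reserved codes cannot occur."""
    b8 = (nine_bits >> 8) & 1
    return nine_bits | ((b8 ^ 1) << 9)


def pack_audio_sample(sample20, ch, z, v, u, c):
    """Pack a 20-bit sample plus Z, Ch, V, U, C and P into three symbols.

    Assumed layout: symbol 0 = Z, Ch (2 bits), audio bits 0-5;
    symbol 1 = audio bits 6-14; symbol 2 = audio bits 15-19, V, U, C, P.
    """
    if not 0 <= sample20 < (1 << 20) or not 0 <= ch < 4:
        raise ValueError("sample must fit 20 bits and ch must be 0..3")
    w0 = (z & 1) | (ch << 1) | ((sample20 & 0x3F) << 3)
    w1 = (sample20 >> 6) & 0x1FF
    w2 = ((sample20 >> 15) & 0x1F) | ((v & 1) << 5) | ((u & 1) << 6) | ((c & 1) << 7)
    parity = (ones(w0) + ones(w1) + ones(w2)) & 1   # assumed even parity
    w2 |= parity << 8
    return [add_bit9(w) for w in (w0, w1, w2)]


def unpack_audio_sample(symbols):
    """Reverse pack_audio_sample() and check the parity bit."""
    w0, w1, w2 = (s & 0x1FF for s in symbols)
    return {
        "sample": (w0 >> 3) | (w1 << 6) | ((w2 & 0x1F) << 15),
        "ch": (w0 >> 1) & 0x3,
        "z": w0 & 1,
        "v": (w2 >> 5) & 1,
        "u": (w2 >> 6) & 1,
        "c": (w2 >> 7) & 1,
        "parity_ok": (ones(w0) + ones(w1) + ones(w2)) % 2 == 0,
    }


symbols = pack_audio_sample(0xABCDE, ch=2, z=1, v=0, u=1, c=0)
print([hex(s) for s in symbols])
print(unpack_audio_sample(symbols))
```

With bit 9 forced to the inverse of bit 8, every symbol lies between 100 and 2FF hex, well clear of the reserved codes at either end of the range.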
One audio sample in each of the four channels of a group requires 12 video sample periods and so packets will contain multiples of 12 samples. At the end of each packet a checksum is calculated on the entire packet contents.
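A component ancillary packet can be assembled along the lines described above. The sketch makes several assumptions that are not stated in the text: that the ID, block number and data count are eight-bit values protected by an even parity bit in bit 8 with bit 9 as its inverse, and that the checksum is the nine-bit sum of every word from the ID to the last data word, again with bit 9 the inverse of bit 8. The ID value used in the example is a placeholder.

```python
ANC_TRS = [0x000, 0x3FF, 0x3FF]   # all zeros followed by all ones twice


def protect8(value):
    """Eight-bit value to ten-bit word: bit 8 even parity, bit 9 its inverse."""
    b8 = bin(value & 0xFF).count("1") & 1
    return (value & 0xFF) | (b8 << 8) | ((b8 ^ 1) << 9)


def checksum(words):
    """Nine-bit sum of the words, with bit 9 the inverse of bit 8 (assumed)."""
    total = sum(w & 0x1FF for w in words) & 0x1FF
    b8 = (total >> 8) & 1
    return total | ((b8 ^ 1) << 9)


def build_packet(data_id, block_number, data_words):
    """Header, data ID, block number, data count, data and checksum."""
    body = [protect8(data_id), protect8(block_number), protect8(len(data_words))]
    body += list(data_words)
    return ANC_TRS + body + [checksum(body)]


def check_packet(packet):
    """True if the header is present and the checksum matches the contents."""
    return packet[:3] == ANC_TRS and packet[-1] == checksum(packet[3:-1])


# Four samples in each of four channels, three symbols per sample.
large = build_packet(0xFF, 1, [0x200] * 48)   # 0xFF is a placeholder ID
small = build_packet(0xFF, 2, [0x200] * 36)
print(len(large), len(small))
print(check_packet(large))
```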
If 24-bit samples are required, extended data packets must be employed in which the additional four bits of each audio sample in an AES/EBU frame are assembled in pairs according to Figure 7.42. Thus for every 12 symbols conveying the four 20-bit audio samples of one group in an audio data packet two extra symbols will be required in an extended data packet.
The audio control packet structure is shown in Figure 7.43. Following the usual header are symbols representing the audio frame number, the sampling rate, the active channels, the processing delay and some reserved symbols. The sampling rate parameter allows the two AES/EBU channel pairs in a group to have different sampling rates if required. The active channel parameter simply describes which channels in a group carry meaningful audio data. The processing delay parameter denotes the delay the audio has experienced measured in audio sample periods. The parameter is a 26-bit two’s complement number requiring three symbols for each channel. Since the four audio channels in a group are generally channel pairs, only two delay parameters are needed. However, if four independent channels are used, one parameter each will be required. The e bit denotes whether four individual channels or two pairs are being transmitted.
The frame number parameter comes about in 525 line systems because the frame rate is 29.97 Hz not 30 Hz. The resultant frame period does not contain a whole number of audio samples. An integer ratio is only obtained over the multiple frame sequence shown in Figure 7.44. The frame number conveys the position in the frame sequence. At 48 kHz odd frames hold 1602 samples and even frames hold 1601 samples in a five-frame sequence. At 44.1 and 32 kHz the relationship is not so simple and to obtain the correct number of samples in the sequence certain frames (exceptions) have the number of samples altered. At 44.1 kHz the frame sequence is 100 frames long whereas at 32 kHz it is 15 frames long.
As the two channel pairs in a group can have different sampling rates, two frame parameters are required per group. In 50 Hz systems all three sampling rates allow an integer number of samples per frame and so the frame number is irrelevant.
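The sequence lengths quoted above can be checked with exact rational arithmetic. The sketch uses 30000/1001 Hz as the exact value of the nominal 29.97 Hz frame rate; the function name is hypothetical.

```python
from fractions import Fraction

FRAME_RATE_525 = Fraction(30000, 1001)   # exact value behind "29.97 Hz"
FRAME_RATE_625 = Fraction(25)


def audio_frame_sequence(sampling_rate_hz, frame_rate):
    """Return (frames in sequence, total audio samples in that sequence)."""
    per_frame = Fraction(sampling_rate_hz) / frame_rate
    frames = per_frame.denominator       # smallest count giving whole samples
    return frames, int(per_frame * frames)


for fs in (48000, 44100, 32000):
    print(fs, audio_frame_sequence(fs, FRAME_RATE_525),
          audio_frame_sequence(fs, FRAME_RATE_625))

# Three frames of 1602 samples and two of 1601 make up the 48 kHz sequence.
print(3 * 1602 + 2 * 1601)
```

In the 625 line results the sequence length is always one frame, which is why no frame number is needed there.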
As the ancillary data transfer is in bursts, it is necessary to provide a little RAM buffering at both ends of the link to allow real-time audio samples to be time compressed up to the video bit rate at the input and expanded back again at the receiver. Figure 7.45 shows a typical audio insertion unit in which the FIFO buffers can be seen. In such a system all that matters is that the average audio data rate is correct. Instantaneously there can be timing errors within the range of the buffers. Audio data cannot be embedded at the video switch point or in the areas reserved for EDH packets, but provided that data are evenly spread throughout the frame 20-bit audio can be embedded and retrieved with about 48 audio samples of buffering. If the additional four bits per sample are sent this requirement rises to 64 audio samples. The buffering stages cause the audio to be delayed with respect to the video by a few milliseconds at each insertion. Whilst this is not serious, Level I allows a delay-tracking mode which allows the embedding logic to transmit the encoding delay so a subsequent receiver can compute the overall delay. If the range of the buffering is exceeded for any reason, such as a non-synchronous audio sampling rate fed to a Level A encoder, audio samples are periodically skipped or repeated in order to bring the delay under control.
It is permitted for receivers that can only handle 20-bit audio to discard the four-bit sample extension data. However, the presence of the extension data requires more buffering in the receiver. A device having a buffer of only 48 samples for Level A working could experience an overflow due to the presence of the extension data.
In 48 kHz working, the average number of audio samples per channel is just over three per video line. In order to maintain the correct average audio sampling rate, the number of samples sent per line is variable and not specified in the standard. In practice a transmitter generally switches between packets containing three samples and packets containing four samples per channel per line as required to keep the buffers from overflowing. At lower sampling rates either smaller packets can be sent or packets can be omitted from certain lines.
As a result of the switching, ancillary data packets in component video occur mostly in two sizes. The larger packet is 55 words in length of which 48 words are data. The smaller packet contains 43 words of which 36 are data. There is space for two large packets or three small packets in the horizontal blanking between EAV and SAV.
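The switching between three-sample and four-sample packets can be modelled with a fractional accumulator. This is a simplified sketch: it assumes a 625 line system with a line rate of 15 625 Hz, ignores the lines on which embedding is not permitted, and takes the seven words of packet overhead from the packet sizes quoted above. A real transmitter would base the decision on buffer occupancy.

```python
from fractions import Fraction
from math import floor

SYMBOLS_PER_SAMPLE = 3
CHANNELS_PER_GROUP = 4
OVERHEAD_WORDS = 7          # 55 - 48, equally 43 - 36


def samples_per_line(sampling_rate_hz, line_rate_hz, lines):
    """Whole samples per channel to send on each line, keeping the average exact."""
    per_line = Fraction(sampling_rate_hz, line_rate_hz)
    pending = Fraction(0)
    schedule = []
    for _ in range(lines):
        pending += per_line
        n = floor(pending)          # send every complete sample now
        schedule.append(n)
        pending -= n                # carry the fraction to the next line
    return schedule


def packet_words(samples):
    """Total packet length for a given number of samples per channel."""
    return OVERHEAD_WORDS + samples * CHANNELS_PER_GROUP * SYMBOLS_PER_SAMPLE


schedule = samples_per_line(48000, 15625, 625)   # one 625 line frame
print(sum(schedule), schedule.count(3), schedule.count(4))
print(packet_words(3), packet_words(4))
```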
A typical embedded audio extractor is shown in Figure 7.46. The extractor recognizes the ancillary data TRS or flag and then decodes the ID to determine the content of the packet. The group and channel addresses are then used to direct extracted symbols to the appropriate audio channel. A FIFO memory is used to timebase expand the symbols to the correct audio sampling rate.
7.14 EDH – Error Detection and Handling
Surprisingly, the original SD-SDI standard had no provisions for data integrity checking. EDH is an option for SD-SDI which rectifies the omission [13,14]. Figure 7.47 shows an EDH-equipped SDI (serial digital interface) transmission system. At the first transmitter, the data from one field is transmitted and simultaneously fed to a cyclic redundancy check (CRC) generator. The CRC calculation is a mathematical division by a polynomial and the result is the remainder. The remainder is transmitted in a special ancillary data packet sent early during the vertical interval, before any switching takes place in a router [14]. The first receiver has an identical CRC generator that performs a calculation on the received field. The ancillary data extractor identifies the EDH packet and demultiplexes it from the main data stream. The remainder from the ancillary packet is then compared with the locally calculated remainder. If the transmission is error free, the two values will be identical. In this case no further action results. However, if as little as one bit is in error in the data, the remainders will not match. The remainder is a 16-bit word and guarantees to detect up to 16 bits in error anywhere in the field. Greater numbers of errors are not guaranteed to be detected, but this is of little consequence as enough fields in error will be detected to indicate that there is a problem.
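The division process can be illustrated with a bit-serial CRC over ten-bit words. The generator polynomial x^16 + x^12 + x^5 + 1, the zero starting value and the most-significant-bit-first order used here are assumptions made for the sketch and should be confirmed against RP165 before use; the behaviour shown, that a single corrupted bit changes the remainder, holds for any such generator.

```python
def crc16(words, bits_per_word=10, poly=0x1021, init=0):
    """Remainder of the word stream divided by the generator polynomial.

    poly=0x1021 represents x^16 + x^12 + x^5 + 1 (assumed generator).
    """
    crc = init
    for word in words:
        for i in reversed(range(bits_per_word)):   # assumed MSB-first order
            bit = (word >> i) & 1
            top = (crc >> 15) & 1
            crc = (crc << 1) & 0xFFFF
            if top ^ bit:
                crc ^= poly
    return crc


# A stand-in for one field of video words (real fields are far longer).
field = [(37 * n + 5) & 0x3FF for n in range(2000)]
sent_remainder = crc16(field)

# Error-free reception: the locally calculated remainder matches.
print(crc16(list(field)) == sent_remainder)

# One bit in error anywhere in the field: the remainders differ.
damaged = list(field)
damaged[1234] ^= 0x010
print(crc16(damaged) == sent_remainder)
```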
Should a CRC mismatch indicate an error in this way, two things happen. First, an optically isolated output connector on the receiving equipment will present a low impedance for a period of 1 to 2 milliseconds. This will result in a pulse in an externally powered circuit to indicate that a field contained an error. An external error-monitoring system wired to this connector can note the occurrence in a log or sound an alarm or whatever it is programmed to do. As the data have been incorrectly received, the fact must also be conveyed to subsequent equipment. It is not permissible to pass on a mismatched remainder. The centre unit in Figure 7.47 must pass on the data as received, complete with errors, but it must calculate a new CRC that matches the erroneous data. When received by the third unit in Figure 7.47, there will then only be a CRC mismatch if the transmission between the second and third devices is in error. This is correct as the job of the CRC is only to locate faulty hardware and clearly if the second link is not faulty the CRC comparison should not fail. However, the third device still needs to know that there is a problem with the data, and this is the job of the error flags that also reside in the EDH packet. One of these flags is called edh (error detected here) and will be asserted by the centre device in Figure 7.47.
The last device in Figure 7.47 will receive edh and transmit eda (error detected already). There are also flags to handle hardware failures (e.g. over-temperature or diagnostic failure). The idh (internal error detected here) and ida (internal error detected already) handle this function. Locally detected hardware errors drive the error output socket to a low impedance state constantly to distinguish from the pulsing of a CRC mismatch.
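The handling of the edh and eda flags along a chain of equipment can be sketched as follows. The class and function names are hypothetical, and the rule that eda stays asserted once set is an assumption; the text states only that a device receiving edh transmits eda. Each device also recalculates the CRC on the data as received, which is why a later clean link reports no new error.

```python
from dataclasses import dataclass


@dataclass
class EdhFlags:
    edh: bool = False   # error detected here
    eda: bool = False   # error detected already


def receive_and_forward(incoming, crc_matches):
    """Flags a device sends onward after checking one received field."""
    return EdhFlags(
        edh=not crc_matches,                  # mismatch found on this link
        eda=incoming.edh or incoming.eda,     # upstream error, passed along
    )


# Three-unit system: the first link corrupts the field, the second is clean.
from_first_unit = EdhFlags()
from_centre_unit = receive_and_forward(from_first_unit, crc_matches=False)
from_last_unit = receive_and_forward(from_centre_unit, crc_matches=True)

print(from_centre_unit)
print(from_last_unit)
```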
A slight extra complexity is that error checking can be performed in two separate ways. One CRC is calculated for the active picture only, and another is calculated for the full field. Both are included in the EDH packet shown in Figure 7.48. The advantage of this arrangement is that whilst regular programme material is being passed in active picture, test patterns can be sent in vertical blanking which can be monitored separately. Thus if active picture is received without error but full field gives an error, the error must be outside the picture. It is then possible to send, for example, pathological test patterns during the vertical interval to stress the transmission system more than regular data to check the performance margin of the system. This can be done alongside the picture information without causing any problems.
In a large system, if every SDI link is equipped with EDH, it is possible for automatic error location to be performed. Each EDH-equipped receiver is connected to a monitoring system that can graphically display on a map of the system the location of any transmission errors. If a suitable logging system is used, it is not necessary for the display to be in the same place as the equipment. In the event of an error condition, the logging system can communicate with the display by dialup modem or dedicated line over any distance. Logging allows infrequent errors to be counted. Any increase in error rate indicates a potential failure that can be rectified before it becomes serious.
An increasing amount of new equipment is available with EDH circuitry. However, older equipment can still be incorporated into EDH systems by connecting it in series with proprietary EDH insertion and checking modules.
References
1. CCIR Recommendation 601-1, Encoding Parameters for Digital Television for Studios
2. Watkinson, J., Convergence in Broadcast and Communications Media, Chapter 7. Oxford: Focal Press ISBN 0 240 51509 9 (2001)
3. SMPTE 296M 1280×720 Progressive Image Sample Structure – Analog and Digital Representation and Analog Interface (2001)
4. SMPTE 125M, Television – Bit Parallel Digital Interface – Component Video Signal 4:2:2
5. EBU Doc. Tech. 3246
6. SMPTE 274M Proposed Standard – 1920×1080 Image Sample Structure Digital Representation and Digital Timing Reference Sequences for Multiple Picture Rates
7. CCIR Recommendation 656
8. SMPTE Proposed Standard – Bit Parallel Digital Interface for 625/50 System PAL Composite Digital Video Signal
9. SMPTE 244M, Television – System M/NTSC Composite Video Signals – Bit-Parallel Digital Interface
10. SMPTE 259M – 10-bit 4:2:2 Component and 4Fsc NTSC Composite Digital Signals – Serial Digital Interface
11. SMPTE 292M – Bit-Serial Digital Interface for High Definition Television Systems
12. Wilkinson, J.H., Digital audio in the digital video studio – a disappearing act? Presented at 9th International AES Conference, Detroit, Audio Engineering Society (1991)
13. Elkind, R. and Fibush, D., Proposal for error detection and handling in studio equipment. Presented at 25th SMPTE Television Conference, Detroit (1991)
14. SMPTE RP165 – Error Detection Checkwords and Status Flags for use in Bit-Serial Digital Interfaces for Television