Sampling is a process of periodic measurement which can take place in space or time and in several dimensions at once. Figure 4.1(a) shows that in temporal sampling the frequency of the signal to be sampled and the sampling rate Fs are measured in Hertz (Hz), the standard unit of temporal frequency. In still images such as photographs there is no temporal change and Figure 4.1(b) shows that the sampling is spatial. The sampling rate is now a spatial frequency. The absolute unit of spatial frequency is cycles-per-metre, although for imaging purposes cycles-per-millimetre is more practical.
If the human viewer is considered, none of these units is useful because they don’t take into account the viewing distance. The acuity of the eye is measured in cycles per degree. As Figure 4.1(c) shows, a large distant screen subtends the same angle as a small nearby screen. Figure 4.1(c) also shows that the nearby screen, possibly a computer monitor, needs to be able to display a higher spatial frequency than a distant cinema screen to give the same sharpness perceived at the eye. If the viewing distance is proportional to size, both screens could have the same number of pixels, leading to the use of a relative unit, shown in (d), which is cycles-per-picture-height (cph) in the vertical axis and cycles-per-picture-width (cpw) in the horizontal axis.
The computer screen has more cycles-per-millimetre than the cinema screen, but in this example has the same number of cycles-per-picture-height.
Spatial and temporal frequencies are related by the process of scanning as given by:
Temporal frequency = Spatial frequency × scanning velocity
Figure 4.2 shows that if the 1024 pixels along one line of an SVGA monitor were scanned in one tenth of a millisecond, the sampling clock frequency would be 10.24 MHz.
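This relationship lends itself to a quick numerical check. The short sketch below (illustrative Python written for this discussion, not taken from the original) reproduces the Figure 4.2 example:

```python
pixels_per_line = 1024      # spatial samples along one SVGA line
line_time = 0.1e-3          # time taken to scan the line, in seconds

# temporal frequency = spatial frequency x scanning velocity
sampling_clock = pixels_per_line / line_time
print(sampling_clock / 1e6, "MHz")   # -> 10.24 MHz
```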
Sampling theory does not require regular sample spacing, but regular spacing is the most efficient arrangement. As a practical matter, if regular sampling is employed, the process of timebase correction can be used to eliminate any jitter due to recording or transmission.
The sampling process originates with a pulse train which is shown in Figure 4.3(a) to be of constant amplitude and period. This pulse train can be temporal or spatial. The information to be sampled amplitude-modulates the pulse train in much the same way as the carrier is modulated in an AM radio transmitter. One must be careful to avoid overmodulating the pulse train as shown in (b) and this is achieved by suitably biasing the information waveform as at (c).
In the same way that AM radio produces sidebands or identical images above and below the carrier, sampling also produces sidebands although the carrier is now a pulse train and has an infinite series of harmonics as shown in Figure 4.4(a). The sidebands repeat above and below each harmonic of the sampling rate as shown in (b). The consequence of this is that sampling does not alter the spectrum of the baseband signal at all. The spectrum is simply repeated. Consequently sampling need not lose any information.
The sampled signal can be returned to the continuous domain simply by passing it into a low-pass filter. This filter has a frequency response which prevents the images from passing, and only the baseband signal emerges, completely unchanged. If considered in the frequency domain, this filter can be called an anti-image filter; if considered in the time domain it can be called a reconstruction filter. It can also be considered as a spatial filter if a sampled still image is being returned to a continuous image. Such a filter will be two-dimensional.
If an input is supplied having an excessive bandwidth for the sampling rate in use, the sidebands will overlap (Figure 4.4(c)) and the result is aliasing, where certain output frequencies are not the same as their input frequencies but instead become difference frequencies (d). It will be seen from Figure 4.4 that aliasing does not occur when the input bandwidth is equal to or less than half the sampling rate, and this derives the most fundamental rule of sampling, which is that the sampling rate must be at least twice the input bandwidth.
Nyquist [1] is generally credited with being the first to point out the need for sampling at twice the highest frequency in the signal in 1928, although the mathematical proofs were given independently by Shannon [2,3] and Kotelnikov. It subsequently transpired that Whittaker [4] beat them all to it, although his work was not widely known at the time. One half of the sampling frequency is often called the Nyquist frequency.
Whilst aliasing has been described above in the frequency domain, it can be described equally well in the time domain. In Figure 4.5(a) the sampling rate is obviously adequate to describe the waveform, but at (b) it is inadequate and aliasing has occurred. In some cases there is no control over the spectrum of input signals and in this case it becomes necessary to have a low-pass filter at the input to prevent aliasing. This anti-aliasing filter prevents frequencies of more than half the sampling rate from reaching the sampling stage.
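The difference-frequency behaviour is easy to demonstrate numerically. In the following sketch (illustrative Python; the 10 kHz rate and 7 kHz input are arbitrary choices, not figures from the text), a sine above the Nyquist frequency yields exactly the same samples as one at the difference frequency:

```python
import numpy as np

Fs = 10_000.0                       # sampling rate, Hz
t = np.arange(32) / Fs              # sample instants

f_in = 7_000.0                      # input above the Nyquist frequency (5 kHz)
f_alias = Fs - f_in                 # the difference frequency, 3 kHz

x_in = np.cos(2 * np.pi * f_in * t)
x_alias = np.cos(2 * np.pi * f_alias * t)

# the two sets of samples are indistinguishable, so a reconstruction
# filter can only output the 3 kHz alias
print(np.allclose(x_in, x_alias))   # True
```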
Figure 4.6 shows that all practical sampling systems consist of a pair of filters, the anti-aliasing filter before the sampling process and the reconstruction filter after it. It should be clear that the results obtained will be strongly affected by the quality of these filters which may be spatial or temporal according to the application.
Perfect reconstruction was theoretically demonstrated by Shannon as shown in Figure 4.7. The input must be band limited by an ideal linear-phase low-pass filter with a rectangular frequency response and a bandwidth of one-half the sampling frequency. The samples must be taken at an instant with no averaging of the waveform. These instantaneous samples can then be passed through a second, identical filter which will perfectly reconstruct that part of the input waveform which was within the passband.
There are some practical difficulties in implementing Figure 4.7 exactly, but well-engineered systems can approach it and so it forms a useful performance target. It was shown in Chapter 3 that the impulse response of a linear-phase ideal low-pass filter is a sinx/x waveform, and this is repeated in Figure 4.8(a). Such a waveform passes through zero volts periodically. If the cut-off frequency of the filter is one-half of the sampling rate, the impulse passes through zero at the sites of all other samples. It can be seen from Figure 4.8(b) that at the output of such a filter, the voltage at the centre of a sample is due to that sample alone, since the value of all other samples is zero at that instant. In other words the continuous output waveform must pass through the tops of the input samples. In between the sample instants, the output of the filter is the sum of the contributions from many impulses (theoretically an infinite number), causing the waveform to pass smoothly from sample to sample.
It is a consequence of the band-limiting of the original anti-aliasing filter that the filtered analog waveform could only take one path between the samples. As the reconstruction filter has the same frequency response, the reconstructed output waveform must be identical to the original band-limited waveform prior to sampling. A rigorous mathematical proof of reconstruction can be found in Porat [5] or Betts [6].
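The summation of sinx/x impulses can be tried directly. The sketch below (illustrative Python; the sampling rate, signal frequency and sample count are arbitrary) interpolates between the samples of a band-limited sine and compares the result with the original waveform; the residual error comes only from truncating the theoretically infinite sum:

```python
import numpy as np

Fs = 100.0                                   # sampling rate, Hz
n = np.arange(-200, 201)                     # sample sites, 4 s worth
x_n = np.sin(2 * np.pi * 13.0 * n / Fs)      # band-limited input (13 Hz < Fs/2)

t = np.linspace(-0.5, 0.5, 1001)             # instants between the samples
# np.sinc(u) = sin(pi*u)/(pi*u): the impulse response of the ideal filter.
# Each sample scales one impulse centred on its own site; the impulses
# pass through zero at every other site, and their sum is the output.
y = sum(x * np.sinc(Fs * t - k) for k, x in zip(n, x_n))

print(np.max(np.abs(y - np.sin(2 * np.pi * 13.0 * t))))   # small residual
```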
Perfect reconstruction with a Nyquist sampling rate is a limiting condition which cannot be exceeded and can only be reached under ideal and impractical conditions. Thus in practice Nyquist-rate sampling can only be approached. Zero-duration pulses are impossible and the ideal linear-phase filter with a vertical ‘brick-wall’ cut-off slope is impossible to implement. In the case of temporal sampling, as the slope tends to vertical, the delay caused by the filter goes to infinity. In the case of spatial sampling, sharp-cut optical filters are impossible to build. Figure 4.9 shows that the spatial impulse response of an ideal lens is a symmetrical intensity function. Note that the function is positive only, as the expression for intensity contains a squaring process. The negative excursions of the sinx/x curve can be handled in an analog or digital filter by negative voltages or numbers, but in optics there is no negative light. The restriction to a positive-only impulse response limits the sharpness of optical filters.
In practice real filters with finite slopes can still be used as shown in Figure 4.10. The cut-off slope begins at the edge of the required pass band, and because the slope is not vertical, aliasing will always occur. However it can be seen from Figure 4.10 that the sampling rate can be raised to drive aliasing products to an arbitrarily low level. The perfect reconstruction process still works, but the system is a little less efficient in information terms because the sampling rate has to be raised. There is no absolute factor by which the sampling rate must be raised. A figure of 10 per cent is typical in temporal sampling, although it depends upon the filters which are available and the level of aliasing products that are acceptable.
A further difficulty is that the requirement for linear phase means that the impulse response of the filter must be symmetrical. In the time domain, such filters cannot be causal because the output would have to begin before the input occurs. A filter with a finite slope has a finite window, and so a linear-phase characteristic can be obtained by incorporating a delay of one-half the window period so that the filter can be causal. This concept was described in Chapter 3.
In practical sampling systems the sample impulse cannot be infinitely small in time or space. Figure 4.11 shows that real equipment may produce impulses whose possible shapes include rectangular and Gaussian. The result is an aperture effect where the frequency response of the sampling system is modified. The new response is the Fourier transform of the aperture function.
In the case where the pulses are rectangular, the proportion of the sample period occupied by the pulse is defined as the aperture ratio which is normally expressed as a percentage.
The case where the pulses have been extended in width to become equal to the sample period is known as a zero-order-hold (ZOH) system and has a 100 per cent aperture ratio as shown in Figure 4.12(a). This produces a waveform which is more like a staircase than a pulse train. To see how the use of ZOH compares with ideal Shannon reconstruction, it must be recalled that pulses of negligible width have a uniform spectrum and so the frequency response of the sampler and reconstructor is flat within the passband. In contrast, pulses of 100 per cent aperture ratio have a sinx/x spectrum which falls to a null at the sampling rate, and as a result is about 4 dB down at the Nyquist frequency as shown in Figure 4.12(b).
Figure 4.13(a) shows how ZOH is normally represented in texts with the pulses extending to the right of the sample. This representation is incorrect because it does not have linear phase as can be seen in (b). Figure 4.13(c) shows the correct representation where the pulses are extended symmetrically about the sample to achieve linear phase (d). This is conceptually easy if the pulse generator is considered to cause a half-sample-period delay relative to the original waveform. If the pulse width is stable, the reduction of high frequencies is constant and predictable, and an appropriate filter response shown in (e) can render the overall response flat once more. Note that the equalization filter in (e) is conceptually a low-pass reconstruction filter in series with an inverse sinx/x response.
An alternative in the time domain is to use resampling which is shown in Figure 4.14. Resampling passes the zero-order-hold waveform through a further synchronous sampling stage which consists of an analog switch which closes briefly in the centre of each sample period. The output of the switch will be pulses which are narrower than the original. If, for example, the aperture ratio is reduced to 50 per cent of the sample period, the first frequency response null is now at twice the sampling rate, and the loss at the edge of the pass band is reduced. As the figure shows, the frequency response becomes flatter as the aperture ratio falls. The process should not be carried too far, as with very small aperture ratios there is little energy in the pulses and noise can be a problem. A practical limit is around 12.5 per cent where the frequency response is virtually ideal.
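The figures quoted above follow directly from the sinx/x spectrum of a rectangular pulse. A minimal sketch (illustrative Python) of the loss at the edge of the pass band for the three aperture ratios discussed:

```python
import numpy as np

# A rectangular sampling pulse occupying a fraction a of the sample period
# has a sin(x)/x spectrum with its first null at Fs/a, so the loss at the
# Nyquist frequency (Fs/2) is sinc(a/2).
def nyquist_loss_db(aperture_ratio):
    # np.sinc(u) = sin(pi*u)/(pi*u)
    return 20 * np.log10(np.sinc(aperture_ratio / 2))

for ratio in (1.0, 0.5, 0.125):     # ZOH, resampled to 50%, to 12.5%
    print(f"aperture {ratio:5.3f} -> {nyquist_loss_db(ratio):6.2f} dB")
# 1.000 -> -3.92 dB (the 'about 4 dB down' ZOH figure)
# 0.500 -> -0.91 dB
# 0.125 -> -0.06 dB (virtually ideal)
```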
It should be stressed that in real systems there will often be more than one aperture effect. The result is that the frequency responses of the various aperture effects multiply, which is the same as saying that their impulse responses convolve. Whatever fine words are used, the result is an increasing loss of high frequencies where a series of acceptable devices when cascaded produce an unacceptable result. This topic will be considered in Chapter 7 where high-resolution imaging systems are discussed.
In many systems, for reasons of economy or ignorance, reconstruction is simply not used and the system output is an unfiltered ZOH waveform. Figure 4.15 shows some examples of this kind of thing which are associated with the ‘digital look’. It is important to appreciate that in well-engineered systems containing proper filters there is no such thing as the digital look.
It is always instructive to consider other industries to see how familiar technologies are used for different purposes. Figure 4.16(a) shows the lines of a racing yacht. In order to describe the three-dimensional hull shape to the builders, the designer puts the hull shape through a giant conceptual bread slicer and supplies a drawing of each slice. Essentially the hull has been spatially sampled. The designer would place the samples or stations close enough together to ensure that the surface detail was fully conveyed. This is the equivalent of using a high enough sampling rate.
Imagine the designer’s reaction on being presented with an unfiltered ZOH hull shown in Figure 4.16(b) and being told that it was what he had asked for in the plans. This hull isn’t going to win any races. Instead shipbuilders use reconstruction to interpolate the shape of the hull between the stations. In some cases this was done by bending a lead bar called a spline so that it made a fair transition between the frames. To this day computer algorithms which interpolate in this way are called splines. In shipbuilding the use of perfect reconstruction from samples preceded Whittaker, let alone Shannon.
Using unfiltered or ZOH output in audio or imaging systems is poor practice and won’t win any races either. It is a pity that so many textbooks give the impression that this is how conversions should be performed.
The points at which samples are taken and re-created in time or space must be evenly spaced, otherwise unwanted signals can be added. In scanning systems, the scan must proceed with precisely uniform speed. Figure 4.17(a) shows that the effect of noise on the vertical scan waveform of a CRT is spatial jitter of the line positions. Figure 4.17(b) shows that another source of jitter is crosstalk or interference on the clock signal of a temporal sampling system, although a balanced clock line will be more immune to such crosstalk. The unwanted additional signal changes the time at which the sloping clock signal appears to cross the threshold voltage of the clock receiver.
Figure 4.18(a) shows the effect of sampling jitter on a sloping waveform. Samples are taken at the wrong times. When these samples have passed through a system, the timebase correction stage prior to the DAC will remove the jitter, and the result is shown at (b). The magnitude of the unwanted signal is proportional to the slope of the waveform and so the amount of jitter which can be tolerated falls at 6 dB per octave. As the resolution of the system is increased by the use of longer sample wordlength, tolerance to jitter is further reduced. The nature of the unwanted signal depends on the spectrum of the jitter. If the jitter is random, the effect is noise-like and relatively benign unless the amplitude is excessive.
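The 6 dB per octave relationship can be confirmed with a short simulation. In the sketch below (illustrative Python; the 48 kHz rate and 1 ns rms jitter are arbitrary choices), the error made by sampling a sine at slightly wrong instants is measured at three frequencies:

```python
import numpy as np

rng = np.random.default_rng(0)
Fs = 48_000.0
jitter_rms = 1e-9                                 # rms timing error, seconds
t = np.arange(1 << 16) / Fs

for f in (1_000.0, 2_000.0, 4_000.0):
    ideal = np.sin(2 * np.pi * f * t)
    jittered = np.sin(2 * np.pi * f * (t + rng.normal(0, jitter_rms, t.shape)))
    snr = 20 * np.log10(np.std(ideal) / np.std(jittered - ideal))
    print(f"{f:6.0f} Hz: jitter-limited SNR = {snr:6.1f} dB")
# the error is proportional to the slope of the waveform, so each octave
# of signal frequency costs close to 6 dB
```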
Quantizing is the process of expressing some infinitely variable quantity by discrete or stepped values and turns up in a remarkable number of everyday guises. Figure 4.19 shows that an inclined ramp enables infinitely variable height to be achieved, whereas a step-ladder allows only discrete heights to be had. A step-ladder quantizes height. When accountants round off sums of money to the nearest pound or dollar they are quantizing. Time passes continuously, but the display on a digital clock changes suddenly every minute because the clock is quantizing time.
In audiovisual systems the values to be quantized are infinitely variable samples which can represent a voltage waveform, the brightness of a pixel and so on. Strict quantizing is a process which operates in a domain which is orthogonal to space or time and so it works in exactly the same way whether the samples have been taken spatially or temporally.
Figure 4.20(a) shows that the process of quantizing divides the voltage range up into quantizing intervals Q, also referred to as steps S. In applications such as telephony and video these may be of differing size, but for digital audio the quantizing intervals are made as identical as possible. If this is done, the binary numbers which result are truly proportional to the original analog voltage, and the digital equivalents of mixing and gain changing can be performed by adding and multiplying sample values. If the quantizing intervals are unequal this cannot be done accurately. When all quantizing intervals are the same, the term ‘uniform quantizing’ is used. The erroneous term ‘linear quantizing’ will also be found.
The term LSB (least significant bit) will also be found in place of quantizing interval in some treatments, but this is a poor term because quantizing works in the voltage domain. A bit is not a unit of voltage and can have only two values. In studying quantizing, voltages within a quantizing interval will be discussed, but there is no such thing as a fraction of a bit.
Whatever the exact voltage of the input signal, the quantizer will locate the quantizing interval in which it lies. In what may be considered a separate step, the quantizing interval is then allocated a code value which is typically some form of binary number. The information sent is the number of the quantizing interval in which the input voltage lay. Whereabouts that voltage lay within the interval is not conveyed, and this mechanism puts a limit on the accuracy of the information which a real quantizer may approach but not exceed. When the number of the quantizing interval is converted back to the analog domain, it will result in a voltage at the centre of the quantizing interval as this minimizes the magnitude of the error between input and output. The number range is limited by the wordlength of the binary numbers used. In an eight-bit system, 256 different quantizing intervals exist, whereas in a sixteen-bit system there are 65 536. To be strictly correct, the quantizing intervals at the end of the range are infinite because all voltages outside the working range will be expressed as one or other of the limits.
It is possible to draw a transfer function for such an ideal quantizer followed by an ideal DAC, and this is also shown in Figure 4.20. A transfer function is simply a graph of the output with respect to the input. In signal processing, when the term ‘linearity’ is used, this generally means the overall straightness of the transfer function. Linearity is a particular goal in audio, yet it will be seen that an ideal quantizer is anything but linear.
Figure 4.20(b) shows the transfer function is somewhat like a staircase, and blanking level is half-way up a quantizing interval, or on the centre of a tread. This is the so-called mid-tread quantizer which is universally used in video and audio. Figure 4.20(c) shows the alternative mid-riser transfer function which causes difficulty because it does not have a code value corresponding to black or silence and as a result the numerical code value is not proportional to the analog signal voltage.
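A minimal behavioural sketch of such a quantizer and ideal DAC pair may make the mechanism concrete. The function names and the v_range parameter (spanning roughly ±0.5 V here) are hypothetical choices for illustration (Python):

```python
import numpy as np

def quantize(v, n_bits=8, v_range=1.0):
    """Return the number of the quantizing interval the voltage lies in
    (mid-tread: code 0 spans the interval straddling zero volts)."""
    q = v_range / (1 << n_bits)                    # quantizing interval Q
    code = np.round(np.asarray(v) / q)             # locate the interval
    return np.clip(code,                           # end intervals are infinite:
                   -(1 << (n_bits - 1)),           # out-of-range voltages clip
                   (1 << (n_bits - 1)) - 1).astype(int)

def reconstruct(code, n_bits=8, v_range=1.0):
    """Ideal DAC: output the voltage at the centre of the interval."""
    q = v_range / (1 << n_bits)
    return np.asarray(code) * q

v = np.array([0.0, 0.0019, 0.0021, -0.3])
print(quantize(v))                  # interval numbers
print(reconstruct(quantize(v)))     # interval centres
```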
Quantizing causes an error in the magnitude of the sample which is given by the difference between the actual staircase transfer function and the ideal straight line. This is shown in Figure 4.20(d) to be a sawtooth-like function which is periodic in Q. The amplitude cannot exceed ±1/2Q (i.e. Q peak-to-peak) unless the input is so large that clipping occurs.
When considering quantizing error it is important to avoid confusion with additional errors such as aperture effects. This treatment of quantizing error avoids that confusion by assuming Shannon point samples which have only magnitude in the domain they are measuring but no duration or size in the time or space domains. It is then correct to compare the quantized samples with the original samples to obtain what is effectively a sampled quantizing error waveform. To obtain the continuous quantizing error waveform the sample errors must be reconstructed. Any other approach gives misleading results.
This has been done in Figure 4.21. The curve is the input waveform and by definition the original samples lie on the curve. The horizontal lines in the drawing are the boundaries between the quantizing intervals, and by definition the quantized samples always reach the centre of a quantizing interval. The quantizing error is the difference between the two samples which is shown shaded. These quantizing errors are shown in (b) and can be thought of as samples of an unwanted signal which the quantizing process adds to the perfect original. The resulting continuous waveform due to this quantizing error is also shown in (b).
Quantizing error has some non-intuitive characteristics and it is dangerous to make assumptions about it. For example, Figure 4.22 shows that if a very small amplitude input signal remains within one quantizing interval, the quantizing error is the signal and the quantized signal is unmodulated.
As the transfer function is non-linear, ideal quantizing can cause distortion. As a result practical equipment deliberately uses non-ideal quantizers to achieve linearity. The quantizing error of an ideal quantizer is a complex function, and it has been researched in great depth [7–10]. It is not intended to go into such depth here. The characteristics of an ideal quantizer will only be pursued far enough to convince the reader that such a device cannot be used in quality audiovisual applications.
As the magnitude of the quantizing error is limited, its effect can be minimized by making the signal larger. This will require more quantizing intervals and more bits to express them. The number of quantizing intervals multiplied by their size gives the quantizing range of the convertor. A signal outside the range will be clipped. Provided that clipping is avoided, the larger the signal, the less will be the effect of the quantizing error.
Where the input signal exercises the whole quantizing range and has a complex waveform (such as from a contrasty, detailed image or a complex piece of music), successive samples will have widely varying numerical values and the quantizing error on a given sample will be independent of that on others. In this case the size of the quantizing error will be distributed with equal probability between the limits. Figure 4.23(a) shows the resultant uniform probability density. In this case the unwanted signal added by quantizing is an additive broadband noise uncorrelated with the signal, and it is appropriate in this case to call it quantizing noise. This is not quite the same as thermal noise which has a Gaussian probability shown in Figure 4.23(b) (see section 1.7 for a treatment of statistics). The difference is of no consequence as in the large signal case the noise is masked by the signal. Under these conditions, a meaningful signal-to-noise ratio can be calculated as follows.
In a system using n-bit words, there will be 2^n quantizing intervals. The largest sinusoid which can fit without clipping will have this peak-to-peak amplitude. The peak amplitude will be half as great, i.e. 2^(n−1)Q, and the rms amplitude will be this value divided by √2.
The quantizing error has an amplitude of 1/2Q peak, which is the equivalent of Q/√12 rms. The signal-to-noise ratio for the large signal case is then given by:

SNR = 20 log10 [(2^(n−1)Q/√2) ÷ (Q/√12)] = 20 log10 (2^(n−1)√6) = 6.02n + 1.76 dB    (4.1)
By way of example, an eight-bit system will offer very nearly 50 dB SNR.
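Expression (4.1) is easy to confirm numerically. The sketch below (illustrative Python) quantizes a full-scale sine with random phases, so that the error is uniformly distributed as described above, and compares the measured ratio with the formula:

```python
import numpy as np

def measured_snr_db(n_bits, N=200_000):
    rng = np.random.default_rng(1)
    q = 2.0 / (1 << n_bits)                   # Q for a full range of 2 units
    v = np.sin(2 * np.pi * rng.random(N))     # full-scale sine, random phase
    err = np.round(v / q) * q - v             # quantizing error
    return 20 * np.log10(np.std(v) / np.std(err))

for n in (8, 10, 16):
    print(f"{n:2d} bits: measured {measured_snr_db(n):5.1f} dB,"
          f" formula {6.02 * n + 1.76:5.1f} dB")
```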
Whilst the above result is true for a large complex input waveform, treatments which then assume that quantizing error is always noise give results which are at variance with reality. The expression above is only valid if the probability density of the quantizing error is uniform. Unfortunately at low depths of modulation in audio and with flat fields or simple pictures in image portrayal this is not the case, as is already known from Figure 4.21.
At low modulation depth, quantizing error ceases to be random, and becomes a function of the input waveform and the quantizing structure as Figure 4.21 showed. Once an unwanted signal becomes a deterministic function of the wanted signal, it has to be classed as a distortion rather than a noise. Distortion can also be predicted from the non-linearity, or staircase nature, of the transfer function. With a large signal, there are so many steps involved that we must stand well back, and a staircase with enough steps appears to be a slope. With a small signal there are few steps and they can no longer be ignored.
Distortion precludes the use of an ideal quantizer for high-quality work. There is little point in studying the adverse effects further as they should be and can be eliminated completely in practical equipment by the use of dither. The importance of correctly dithering a quantizer cannot be emphasized enough, since failure to dither irrevocably distorts the converted signal: there can be no process which will subsequently remove that distortion. The signal-to-noise ratio derived above has no relevance to practical applications as it will be modified by the dither.
At high signal levels, quantizing error is effectively noise. As the depth of modulation falls, the quantizing error of an ideal quantizer becomes more strongly correlated with the signal and the result is distortion, visible as contouring. If the quantizing error can be decorrelated from the input in some way, the system can remain linear but noisy. Dither performs the job of decorrelation by making the action of the quantizer unpredictable and gives the system a noise floor like an analog system [11,12].
In one approach, pseudo-random noise (see Chapter 2) with rectangular probability and a peak-to-peak amplitude of Q was added to the input signal prior to quantizing, but was subtracted after reconversion to analog. This is known as subtractive dither and was investigated by Schuchman [13] and much later by Sherwood [14]. Subtractive dither has the advantages that the dither amplitude is non-critical, the noise has full statistical independence from the signal [15] and has the same level as the quantizing error in the large signal undithered case [16]. Unfortunately, it suffers from practical drawbacks, since the original noise waveform must accompany the samples or must be synchronously re-created at the DAC. This is virtually impossible in a system where the signal may have been edited or where its level has been changed by processing, as the noise needs to remain synchronous and be processed in the same way. All practical digital video systems use non-subtractive dither where the dither signal is added prior to quantization and no attempt is made to remove it at the DAC [17]. The introduction of dither prior to a conventional quantizer inevitably causes a slight reduction in the signal-to-noise ratio attainable, but this reduction is a small price to pay for the elimination of non-linearities.
The ideal (noiseless) quantizer of Figure 4.21 has fixed quantizing intervals and must always produce the same quantizing error from the same signal. In Figure 4.24 it can be seen that an ideal quantizer can be dithered by linearly adding a controlled level of noise either to the input signal or to the reference voltage which is used to derive the quantizing intervals. There are several ways of considering how dither works, all of which are equally valid.
The addition of dither means that successive samples effectively find the quantizing intervals in different places on the voltage scale. The quantizing error becomes a function of the dither, rather than a predictable function of the input signal. The quantizing error is not eliminated, but the subjectively unacceptable distortion is converted into a broadband noise which is more benign to the ear.
Some alternative ways of looking at dither are shown in Figure 4.25. Consider the situation where a low-level input signal is changing slowly within a quantizing interval. Without dither, the same numerical code is output for a number of samples and the variations within the interval are lost. Dither has the effect of forcing the quantizer to switch between two or more states. The higher the voltage of the input signal within a given interval, the more probable it becomes that the output code will take on the next higher value. The lower the input voltage within the interval, the more probable it is that the output code will take the next lower value. The dither has resulted in a form of duty cycle modulation, and the resolution of the system has been extended indefinitely instead of being limited by the size of the steps.
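This duty-cycle modulation can be demonstrated in a few lines. In the sketch below (illustrative Python), a constant input one quarter of the way up a quantizing interval is quantized many times with rectangular dither of Q peak-to-peak; the average of the output codes recovers the sub-interval voltage that an undithered quantizer loses entirely:

```python
import numpy as np

rng = np.random.default_rng(2)
Q = 1.0
v = 0.25 * Q                              # input sitting inside one interval

undithered = np.round(v / Q)              # always code 0: the detail is lost
dither = rng.uniform(-Q / 2, Q / 2, 100_000)
dithered = np.round((v + dither) / Q)     # now switches between codes 0 and 1

# the output takes code 1 a quarter of the time, so the mean of many
# samples converges on the true voltage
print(undithered, dithered.mean())        # 0.0 and approximately 0.25
```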
Dither can also be understood by considering what it does to the transfer function of the quantizer. This is normally a perfect staircase, but in the presence of dither it is smeared horizontally until with a certain amplitude the average transfer function becomes straight.
Recent ADC technology allows the wordlength of audio and video samples to be raised far above the capability of early devices. The situation then arises that an existing recorder or channel needs to be connected to the output of an ADC with greater wordlength. The words need to be shortened in some way.
In signal processing, when a sample value is attenuated, the extra low-order bits which come into existence below the radix point preserve the resolution of the signal and the dither in the least significant bit(s) which linearizes the system. The same word extension will occur in any process involving multiplication, such as digital filtering. It will subsequently be necessary to shorten the wordlength. Low-order bits must be removed to reduce the resolution whilst keeping the signal magnitude the same. Even if the original conversion was correctly dithered, the random element in the low-order bits will now be some way below the end of the intended word. If the word is simply truncated by discarding the unwanted low-order bits or rounded to the nearest integer the linearizing effect of the original dither will be lost.
Shortening the wordlength of a sample reduces the number of quantizing intervals available without changing the signal amplitude. As Figure 4.26 shows, the quantizing intervals become larger and the original signal is requantized with the new interval structure. This will introduce requantizing distortion having the same characteristics as quantizing distortion in an ADC. It is then obvious that when shortening the wordlength of a ten-bit convertor to eight bits, the two low-order bits must be removed in a way that displays the same overall quantizing structure as if the original convertor had been only of eight-bit wordlength. It will be seen from Figure 4.26 that truncation cannot be used because it does not meet the above requirement but results in signal-dependent offsets because it always rounds in the same direction. Proper numerical rounding is essential because it accurately simulates analog quantizing to the new interval size. Unfortunately the ten-bit convertor will have a dither amplitude appropriate to quantizing intervals one quarter the size of an eight-bit unit and the result will be highly non-linear.
In practice, in addition to rounding, there must be a mechanism whereby the requantizing error is converted to noise rather than distortion. One technique which meets this requirement is to use digital dithering [18] prior to rounding. This is directly equivalent to the analog dithering in an ADC.
Digital dither is a pseudo-random sequence of numbers. If it is required to simulate the analog dither signal of Figures 4.24 and 4.25, then it is obvious that the noise must be bipolar so that it can have an average voltage of zero. Two’s complement coding can be used for the dither values.
Figure 4.27 shows a simple digital dithering system (i.e. one without noise shaping) for shortening sample wordlength. The output of a two’s complement pseudo-random sequence generator (see Chapter 2) of appropriate wordlength is added to input samples prior to rounding. The most significant of the bits to be discarded is examined in order to determine whether the bits to be removed sum to more or less than half a quantizing interval. The dithered sample is either rounded down, i.e. the unwanted bits are simply discarded, or rounded up, i.e. the unwanted bits are discarded but one is added to the value of the new short word. The rounding process is no longer deterministic because of the added dither which provides a linearizing random component.
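A minimal sketch of such a wordlength-shortening stage is given below (illustrative Python; np.random stands in for the two's complement pseudo-random sequence generator, and triangular-pdf dither of 2Q peak-to-peak, the optimum identified later in this section, is used):

```python
import numpy as np

rng = np.random.default_rng(3)

def shorten(samples, drop_bits=2):
    """Shorten wordlength by drop_bits using digital dither then rounding."""
    step = 1 << drop_bits                 # new interval = 4 old intervals
    # bipolar triangular-pdf dither, 2 new intervals peak-to-peak
    dither = (rng.uniform(-0.5, 0.5, samples.shape)
              + rng.uniform(-0.5, 0.5, samples.shape)) * step
    return np.round((samples + dither) / step).astype(int)

x = np.full(50_000, 137)                  # a constant ten-bit value
y = shorten(x)                            # eight-bit codes, no longer constant
print(y.mean() * 4)                       # close to 137.0: resolution retained
```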
If this process is compared with that of Figure 4.24 it will be seen that the principles of analog and digital dither are identical; the processes simply take place in different domains using two’s complement numbers which are rounded or voltages which are quantized as appropriate. In fact quantization of an analog dithered waveform is identical to the hypothetical case of rounding after bipolar digital dither where the number of bits to be removed is infinite, and remains identical for practical purposes when as few as eight bits are to be removed. Analog dither may actually be generated from bipolar digital dither (which is no more than random numbers with certain properties) using a DAC.
The intention here is to treat the processes of analog and digital dither as identical except where differences need to be noted. The characteristics of the noise used are rather important for optimal performance, although many sub-optimal but nevertheless effective systems are in use. The main parameters of interest are the peak-to-peak amplitude, the amplitude probability distribution function (pdf) and the spectral content.
The most comprehensive study of non-subtractive dither is due to Vanderkooy and Lipshitz [17–19] and the treatment here is based largely upon their work.
Chapter 2 showed that the simplest form of dither (and therefore the easiest to generate) is a single sequence of random numbers which have uniform or rectangular probability. The amplitude of the dither is critical.
Figure 4.28(a) shows the time-averaged transfer function of one quantizing interval in the presence of various amplitudes of rectangular dither. The linearity is perfect at an amplitude of 1Q peak-to-peak and then deteriorates for larger or smaller amplitudes. The same will be true of all levels which are an integer multiple of Q. Thus there is no freedom in the choice of amplitude.
With the use of such dither, the quantizing noise is not constant. Figure 4.28(b) shows that when the analog input is exactly centred in a quantizing interval (such that there is no quantizing error) the dither has no effect and the output code is steady. There is no switching between codes and thus no noise. On the other hand when the analog input is exactly at a riser or boundary between intervals, there is the greatest switching between codes and the greatest noise is produced. Mathematically speaking, the first moment, or mean error is zero but the second moment, which in this case is equal to the variance, is not constant. From an engineering standpoint, the system is linear but suffers noise modulation: the noise floor rises and falls with the signal content and this is audible in the presence of low-frequency signals.
The dither adds an average noise amplitude of Q/√12 rms to the quantizing noise of the same level. In order to find the resultant noise level it is necessary to add the powers as the signals are uncorrelated. The total power is given by:

Q²/12 + Q²/12 = Q²/6

and the rms voltage is Q/√6. Another way of looking at the situation is to consider that the noise power doubles and so the rms noise voltage has increased by 3 dB in comparison with the undithered case. Thus for an n-bit wordlength, using the same derivation as expression (4.1) above, the signal-to-noise ratio for Q peak-to-peak rectangular dither will be given by:

SNR = 6.02n − 1.24 dB    (4.2)
Unlike the undithered case, this is a true signal-to-noise ratio and linearity is maintained at all signal levels. By way of example, for a ten-bit system nearly 59 dB signal-to-noise ratio is achieved. The 3 dB loss compared to the undithered case is a small price to pay for linearity.
The noise modulation due to the use of rectangular-probability dither is undesirable. It comes about because the process is too simple. The undithered quantizing error is signal dependent and the dither represents a single uniform-probability random process. This is only capable of decorrelating the quantizing error to the extent that its mean value is zero, rendering the system linear. The signal dependence is not eliminated, but is displaced to the next statistical moment. This is the variance and the result is noise modulation. If a further uniform-probability random process is introduced into the system, the signal dependence is displaced to the next moment and the second moment or variance becomes constant.
Adding together two statistically independent rectangular probability functions produces a triangular probability function. A signal having this characteristic can be used as the dither source.
Figure 4.28(c) shows the averaged transfer function for a number of dither amplitudes. Linearity is reached with a peak-to-peak amplitude of 2Q and at this level there is no noise modulation. The lack of noise modulation is another way of stating that the noise is constant. The triangular pdf of the dither matches the triangular shape of the quantizing error function.
The dither adds two noise signals, each with an amplitude of Q/√12 rms, to the quantizing noise of the same level. In order to find the resultant noise level it is necessary to add the powers as the signals are uncorrelated. The total power is given by:

Q²/12 + Q²/12 + Q²/12 = Q²/4

and the rms voltage is Q/2. Another way of looking at the situation is to consider that the noise power is increased by 50 per cent in comparison to the rectangular-dithered case and so the rms noise voltage has increased by 1.76 dB. Thus for an n-bit wordlength, using the same derivation as expressions (4.1) and (4.2) above, the signal-to-noise ratio for 2Q peak-to-peak triangular dither will be given by:

SNR = 6.02n − 3.01 dB    (4.3)
Continuing the use of a ten-bit example, a signal-to-noise ratio of 57.2 dB is available which is 4.8 dB worse than the SNR of an undithered quantizer in the large signal case. It is a small price to pay for perfect linearity and an unchanging noise floor.
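Expressions (4.2) and (4.3) can be checked in the same way as (4.1). The sketch below (illustrative Python) applies rectangular and triangular dither of the correct amplitudes to a full-scale ten-bit sine; the dither and the quantizing error are uncorrelated, so their powers add as described:

```python
import numpy as np

rng = np.random.default_rng(4)
n_bits, N = 10, 500_000
q = 2.0 / (1 << n_bits)
v = np.sin(2 * np.pi * rng.random(N))     # full-scale sine, random phase

rect = rng.uniform(-q / 2, q / 2, N)                                 # 1Q pk-pk
tri = rng.uniform(-q / 2, q / 2, N) + rng.uniform(-q / 2, q / 2, N)  # 2Q pk-pk

for name, d, formula in (("rectangular", rect, 6.02 * n_bits - 1.24),
                         ("triangular ", tri, 6.02 * n_bits - 3.01)):
    err = np.round((v + d) / q) * q - v   # dither plus requantizing error
    snr = 20 * np.log10(np.std(v) / np.std(err))
    print(f"{name}: measured {snr:5.2f} dB, formula {formula:5.2f} dB")
```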
Adding more uniform-probability sources to the dither makes the overall probability function progressively more like the Gaussian distribution of analog noise. Figure 4.28(d) shows the averaged transfer function of a quantizer with various levels of Gaussian dither applied. Linearity is reached with 1/2Q rms and at this level noise modulation is negligible. The total noise power is given by:

Q²/4 + Q²/12 = Q²/3

and so the noise level will be Q/√3 rms. The noise level of an undithered quantizer in the large signal case is Q/√12 and so the noise is higher by a factor of:

(Q/√3) ÷ (Q/√12) = 2, i.e. 6.02 dB

Thus the signal-to-noise ratio is given by:

SNR = 6.02n + 1.76 − 6.02 = 6.02n − 4.26 dB    (4.4)
A ten-bit system with correct Gaussian dither has a signal-to-noise ratio of 56 dB.
This is inferior to the figure in expression (4.3) by 1.25 dB. In digital dither applications, triangular-probability dither of 2Q peak-to-peak is optimum because it gives the best possible combination of nil distortion, freedom from noise modulation and signal-to-noise ratio. Using dither with more than two rectangular processes added is detrimental. Whilst this result is also true for analog dither, it is not practicable to apply it to a real ADC as all real analog signals contain thermal noise which is Gaussian. If triangular dither is used on a signal containing Gaussian noise, the results derived above are not obtained. ADCs should therefore use Gaussian dither of Q/2 rms and performance will be given by expression (4.4).
Digital-to-analog conversion will be discussed first, since ADCs often use embedded DACs in feedback loops.
The purpose of a digital-to-analog convertor is to take numerical values and reproduce the continuous electrical waveform that they represent. Figure 4.29 shows the major elements of a conventional conversion subsystem, i.e. one in which oversampling is not employed. The jitter in the clock needs to be removed with a VCO or VCXO. Sample values are buffered in a latch and fed to the convertor element which operates on each cycle of the clean clock. The output is then a voltage proportional to the number for at least a part of the sample period. A resampling stage may be found next, in order to remove switching transients, reduce the aperture ratio or allow the use of a convertor which takes a substantial part of the sample period to operate. The resampled waveform is then presented to a reconstruction filter which rejects frequencies above the audio band.
This section is primarily concerned with the implementation of the convertor element. The most common way of achieving this conversion is to control binary-weighted currents and sum them in a virtual earth. Figure 4.30 shows the classical R–2R DAC structure. This is relatively simple to construct, but the resistors have to be extremely accurate. To see why this is so, consider the example of Figure 4.31. At (a) the binary code is about to have a major overflow, and all the low-order currents are flowing. At (b), the binary input has increased by one, and only the most significant current flows. This current must equal the sum of all the others plus one. The accuracy must be such that the step size is within the required limits. In this eight-bit example, if the step size needs to be a rather casual 10 per cent accurate, the necessary accuracy is only one part in 2560, but for a ten-bit system it would become one part in 10 240. This degree of accuracy is difficult to achieve and maintain in the presence of ageing and temperature change.
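The accuracy arithmetic in this example is worth making explicit: the tolerance of one tenth of a step must be held across the full range of 2^n steps. A two-line check (illustrative Python):

```python
for n_bits in (8, 10):
    steps = 1 << n_bits                  # 256 or 1024 quantizing intervals
    print(f"{n_bits}-bit: one part in {int(steps / 0.1):,}")
# 8-bit: one part in 2,560   10-bit: one part in 10,240
```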
The input to an ADC is a continuous-time, continuous-voltage waveform, and this is converted into a discrete-time, discrete-voltage format by a combination of sampling and quantizing. As these two processes are orthogonal they are totally independent and can be performed in either order. Figure 4.32(a) shows an analog sampler preceding a quantizer, whereas (b) shows an asynchronous quantizer preceding a digital sampler. Ideally, both will give the same results; in practice each has different advantages and suffers from different deficiencies. Both approaches will be found in real equipment.
The general principle of a quantizer is that different quantized voltages are compared with the unknown analog input until the closest quantized voltage is found. The code corresponding to this becomes the output. The comparisons can be made in turn with the minimal amount of hardware, or simultaneously with more hardware.
The flash convertor is probably the simplest technique available for PCM video conversion. The principle is shown in Figure 4.33. The threshold voltage of every quantizing interval is provided by a resistor chain which is fed by a reference voltage.
This reference voltage can be varied to determine the sensitivity of the input. There is one voltage comparator connected to every reference voltage, and the other input of all the comparators is connected to the analog input. A comparator can be considered to be a one-bit ADC. The input voltage determines how many of the comparators will have a true output. As one comparator is necessary for each quantizing interval, then, for example, in an eight-bit system there will be 255 binary comparator outputs, and it is necessary to use a priority encoder to convert these to a binary code.
Note that the quantizing stage is asynchronous; comparators change state as and when the variations in the input waveform result in a reference voltage being crossed. Sampling takes place when the comparator outputs are clocked into a subsequent latch. This is an example of quantizing before sampling as was illustrated in Figure 4.32. Although the device is simple in principle, it contains a lot of circuitry and can only be practicably implemented on a chip. The analog signal has to drive many inputs which results in a significant parallel capacitance, and a low-impedance driver is essential to avoid restricting the slewing rate of the input. The extreme speed of a flash convertor is a distinct advantage in oversampling. Because computation of all bits is performed simultaneously, no track/hold circuit is required, and droop is eliminated. Figure 4.33(c) shows a flash convertor chip. Note the resistor ladder and the comparators followed by the priority encoder. The MSB can be selectively inverted so that the device can be used either in offset binary or two’s complement mode.
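A behavioural sketch of the comparator chain and priority encoder is given below (illustrative Python; because the comparator outputs form a 'thermometer' code, counting the true outputs performs the priority encoding):

```python
import numpy as np

def flash_adc(v_in, n_bits=8, v_ref=1.0):
    n_comp = (1 << n_bits) - 1                 # 255 comparators for 8 bits
    # resistor chain: one threshold voltage per quantizing interval
    thresholds = np.arange(1, n_comp + 1) * v_ref / (n_comp + 1)
    comparators = v_in >= thresholds           # asynchronous comparisons
    return int(np.count_nonzero(comparators))  # priority-encoded output code

print(flash_adc(0.0), flash_adc(0.5), flash_adc(0.999))   # 0 128 255
```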
The flash convertor is ubiquitous in digital video because of the high speed necessary. For audio purposes, many more conversion techniques are available and these are considered in Chapter 5.
References

1. Nyquist, H., Certain topics in telegraph transmission theory. AIEE Trans., 617–644 (1928)
2. Shannon, C.E., A mathematical theory of communication. Bell Syst. Tech. J., 27, 379 (1948)
3. Jerri, A.J., The Shannon sampling theorem – its various extensions and applications: a tutorial review. Proc. IEEE, 65, 1565–1596 (1977)
4. Whittaker, E.T., On the functions which are represented by the expansions of the interpolation theory. Proc. R. Soc. Edinburgh, 181–194 (1915)
5. Porat, B., A Course in Digital Signal Processing, New York: John Wiley (1996)
6. Betts, J.A., Signal Processing Modulation and Noise, Chapter 6, Sevenoaks: Hodder and Stoughton (1970)
7. Bennett, W.R., Spectra of quantized signals. Bell Syst. Tech. J., 27, 446–472 (1948)
8. Widrow, B., Statistical analysis of amplitude quantized sampled-data systems. Trans. AIEE, Part II, 79, 555–568 (1961)
9. Lipshitz, S.P., Wannamaker, R.A. and Vanderkooy, J., Quantization and dither: a theoretical survey. J. Audio Eng. Soc., 40, 355–375 (1992)
10. Maher, R.C., On the nature of granulation noise in uniform quantization systems. J. Audio Eng. Soc., 40, 12–20 (1992)
11. Goodall, W.M., Television by pulse code modulation. Bell Syst. Tech. J., 30, 33–49 (1951)
12. Roberts, L.G., Picture coding using pseudo-random noise. IRE Trans. Inform. Theory, IT-8, 145–154 (1962)
13. Schuchman, L., Dither signals and their effect on quantizing noise. Trans. Commun. Technol., COM-12, 162–165 (1964)
14. Sherwood, D.T., Some theorems on quantization and an example using dither. In Conf. Rec., 19th Asilomar Conference on Circuits, Systems and Computers, Pacific Grove, CA (1985)
15. Lipshitz, S.P., Wannamaker, R.A. and Vanderkooy, J., Quantization and dither: a theoretical survey. J. Audio Eng. Soc., 40, 355–375 (1992)
16. Gerzon, M. and Craven, P.G., Optimal noise shaping and dither of digital signals. Presented at 87th Audio Eng. Soc. Conv., New York (1989), Preprint No. 2822 (J-1)
17. Vanderkooy, J. and Lipshitz, S.P., Resolution below the least significant bit in digital systems with dither. J. Audio Eng. Soc., 32, 106–113 (1984)
18. Vanderkooy, J. and Lipshitz, S.P., Digital dither. Presented at 81st Audio Eng. Soc. Conv., Los Angeles (1986), Preprint 2412 (C-8)
19. Vanderkooy, J. and Lipshitz, S.P., Digital dither. In Audio in Digital Times, New York: AES (1989)