Sound reproduction can only achieve the highest quality if every step is carried out to an adequate standard. This requires a knowledge of what the standards are and the means to test whether they are being met, as well as some understanding of psychology and a degree of tolerance!
In principle, quality can be lost almost anywhere in the audio chain whether by poor interconnections, the correct use of poorly designed equipment or incorrect use of good equipment. The best results will only be obtained when good equipment is used and connected correctly and frequently tested or monitored.
Figure 13.1 shows a representative digital audio system which contains all the components needed to go from original sound to reproduced sound. Like any chain, the system is only as good as its weakest link. It should be clear that a significant number of analog stages remain, especially in the area of the transducers. There is nothing fundamentally wrong with analog equipment, provided that it is engineered to the appropriate quality level. It is regrettable that the enormous advance in signal performance which digital techniques have brought to recording, processing and delivery has largely not been paralleled by advances in subjective or objective testing.
It ought to be possible, even straightforward, to monitor every stage of Figure 13.1 to see if its performance is adequate. If adequate stages are retained and inadequate stages are improved, the entire system can be improved. In practice this is not as easy as it seems. As will be seen in this chapter, the audio industry has not established standards by which the entire system of Figure 13.1 can be objectively measured in a way which relates to what it will sound like. The measurement techniques which exist are incomplete and the criteria to be met are not established.
Figure 13.2 shows that the human hearing system operates in three domains: time, frequency and space. In real life we know where a sound source is, when it made a sound and the timbral character of the sound. If a high-quality reproduction is to be achieved, it is necessary to test the equipment in all these domains to ensure that it meets or exceeds the accuracy of the human ear. The frequency domain is reasonably well served, but the time and space domains are still suffering serious neglect.
The ultimate criterion for sound reproduction is that the human ear is fooled by the overall system into thinking it has heard the real thing. It follows that even to approach that ideal, the properties of human hearing have to form the basis for design judgements of every part of the audio chain. It is generally assumed that the more accurate some aspect of a reproduction system is, the more realistic it will sound, but this doesn’t follow. In practice this assumption is only true if the accuracy is less than the accuracy of the hearing system. Once some aspect of an audio system exceeds the accuracy of the hearing system, further improvement is not only unnecessary, but it diverts effort away from other areas where it could be more useful.
Figure 13.3(a) shows some criteria by which audio accuracy can be assessed. In each area, it should be possible to measure the requirements of the listener using whatever units are appropriate in order to create a multi-criterion threshold. Figure 13.3(b) shows that these criteria must be met equally despite the sum of all degradations in every stage of the audio chain from microphone to speaker. Figure 13.3(c) shows that a balance must be reached in two dimensions so that each criterion is given equal attention in every stage through which the signal passes. This gives the best value for money whatever standard is achieved. Figure 13.3(d) shows a more common approach, in which the excess quality of some parts of the system is wasted because other weak parts dominate the impression of the listener. The most commonly found error is that the electronic aspects of audio systems are overspecified whilst the transducers are underspecified.
The descriptions of time, frequency and space carried in an audio system represent information, and an analysis of audio system quality falls within the scope of information theory. In the digital domain, the information rate is fixed by the wordlength and the sampling rate. The wordlength determines how many different conditions can be described by each sample. For example a sixteen-bit sample can have 65 536 different values.
In order to reach this performance, every item in the chain needs to have the same information capacity. Recent ADCs and DACs approach this performance, but transducers in general and loudspeakers in particular don’t.
Any audio device, analog or digital, can be modelled as an information channel of finite capacity whose equivalent bit rate can be measured or calculated. This can also be done with the loudspeaker. This equivalent bit rate relates to the realism which the speaker can achieve.
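The comparison between a digital channel and an analog device can be sketched numerically. The text does not say how the equivalent bit rate is to be calculated; the sketch below assumes the Shannon-Hartley formula as one plausible model, and the function names and the 20 kHz, 60 dB device are hypothetical figures chosen only for illustration.

```python
import math


def pcm_bit_rate(wordlength_bits, sampling_rate_hz):
    """Information rate of one linear PCM channel, in bits per second."""
    return wordlength_bits * sampling_rate_hz


def equivalent_bit_rate(bandwidth_hz, snr_db):
    """Assumed model: Shannon-Hartley capacity C = B * log2(1 + S/N),
    with S/N expressed as a power ratio."""
    snr_power_ratio = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_power_ratio)


# A sixteen-bit channel sampled at 44.1 kHz
print(2 ** 16, "values per sample")
print(pcm_bit_rate(16, 44_100), "bit/s per channel")

# Hypothetical device: 20 kHz bandwidth, 60 dB signal-to-noise ratio
print(round(equivalent_bit_rate(20_000, 60)), "bit/s equivalent")
```

Under these assumptions, a device whose equivalent bit rate falls well below that of the digital channel feeding it would be the restriction in the chain.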
When the speaker information capacity is limited, the presence of an earlier restriction in the signal being monitored may go unheard and it may erroneously be assumed that the signal is ideal when in fact it is not.
The use of poor loudspeakers simply enables other poor audio devices to enter service. When most loudspeakers have such poor information capacity, how can they be used to assess this capacity in earlier components in the audio chain? ADCs, DACs, pre- and power amplifiers and compression codecs can only meaningfully be assessed on speakers of adequate information capacity. It also follows that the definition of a high-quality speaker is one which readily reveals compression artifacts.
Non-ideal loudspeakers act like compressors in that the distortions, delayed resonances and delayed reradiation they create conceal or mask information in the original audio signal. If a real compressor is tested with non-ideal loudspeakers certain deficiencies of the compressor will not be heard. Others, notably the late Michael Gerzon, have correctly suggested that compression artifacts which are inaudible in mono may be audible in stereo. The spatial compression of non-ideal stereo loudspeakers conceals real spatial compression artifacts.
The ear is a lossy device because it exhibits masking. Not all the presented sound is sensed. If a lossy loudspeaker is designed to a high standard, the losses may be contained to areas which are masked by the ear and then that loudspeaker would be judged transparent. Douglas Self has introduced the term ‘blameless’ for a device whose imperfections are undetectable; an approach which commands respect. However, the majority of legacy loudspeakers are not in this category. Audible defects are introduced into the reproduced sound in frequency, time and spatial domains, giving the loudspeaker a kind of character which is best described as a signature or footprint.
An audio waveform is simply a voltage changing with time within a limited spectrum, and a digital audio waveform is simply a number changing its value at the sampling rate. As a result any error introduced by a given device can in principle be extracted and analysed. If this approach is used during the design process the performance of any unit can be refined until the error is inaudible.
Naturally the determination of what is meant by inaudible has to be done carefully using valid psychoacoustic experiments. Using such subjective results it is possible to design objective tests which will measure different aspects of the signal to determine if it is acceptable. With care and precision loudspeakers it is equally valid to use listening tests where the reproduced sound is compared with that of other audio systems or of live performances. In fact the combination of careful listening tests with objective technical measurements is the only way to achieve outstanding results. The reason is that the ear can be thought of as making all tests simultaneously. If any aspect of the system has been overlooked the ear will detect it. Then objective tests can be made to find the deficiency the ear detected and remedy it.
If two pieces of equipment consistently measure the same but sound different, the measurement technique must be inadequate. In general, the audio industry survives on inadequate measurement in the misguided belief that the ear has some mysterious power to detect things that can never be measured.
A further difficulty in practice is that the ideal combination of subjective and objective testing is not achieved as often as might be thought. Unfortunately the audio industry represents one of the few remaining opportunities to find employment without qualifications. Given the combination of advanced technologies and marginal technical knowledge, it should be no surprise that the audio industry periodically produces theories which are at variance with scientific knowledge.
People tend to be divided into two camps where audio quality is concerned.
Subjectivists are those who listen at length to both live and recorded music and can detect quite small defects in audio systems. Unfortunately most subjectivists have little technical knowledge and are quite unable to convert their detection of a problem into a solution. Although their perception of a problem may be genuine, their hypothesis or proposed solution may require the laws of physics to be altered. A particular problem with subjective testing is the avoidance of bias. Technical knowledge is essential to understand the importance of experimental design. Good experimental design is important to ensure that only the parameter to be investigated changes so that any difference in the result can only be due to the change. If something else is unwittingly changed the experiment is void. It is also important to avoid bias. Statistical analysis is essential to determine the significance of the results, i.e. the degree of confidence that the results are not due to chance alone. As most subjectivists lack such technical knowledge it is hardly surprising that a lot of deeply held convictions about audio are simply due to unwitting bias where repeatable results are simply not obtained.
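As an illustration of what determining significance involves, consider a forced-choice listening trial in which the listener must identify which of two devices is playing. The trial format, the function name and the figures are assumptions made for this sketch and are not taken from any particular test standard. If the listener is only guessing, the number of correct answers follows a binomial distribution, so the probability of doing at least as well by chance can be computed directly.

```python
from math import comb


def chance_probability(correct, trials, p_guess=0.5):
    """Probability of scoring at least `correct` out of `trials`
    when every answer is a pure guess with success rate `p_guess`."""
    return sum(
        comb(trials, k) * p_guess ** k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical result: 12 correct identifications in 16 trials
p = chance_probability(12, 16)
print(f"probability of this score by chance: {p:.4f}")
```

A small probability suggests that the listener really heard a difference; a large one means the result is indistinguishable from guessing, however strongly the listener feels about it.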
A classic example of bias is the understandable tendency of the enthusiast who has just spent a lot of money on a device which has no audible effect whatsoever to ‘hear’ an improvement.
Another problem with subjectivism is caused by those who don’t regularly listen to live music. They become imprinted on the equipment they normally use and subconsciously regard it as ‘correct’. Any other equipment to which they are exposed will automatically be judged incorrect, even if it is technically superior. On the introduction of FM radio, with 15 kHz bandwidth, broadcasters received complaints that the audio was too shrill or bright. In comparison with the 7 kHz of AM radio it was!
The author loaned an experimental loudspeaker with a ruler-flat frequency response to an experienced sound engineer. It was returned with the complaint that the response had a peak. The engineer even estimated the frequency of the peak. It was precisely at the crossover frequency of the speakers he normally uses, which have a notorious dip in power response.
It is extremely difficult to make progress when experienced people acting in good faith make statements which are completely incorrect, but exposure to the audio industry will show that this is surprisingly common.
Objectivists are those who make a series of measurements and then pronounce a system to have no audible defect. They frequently have little experience of live performance or critical listening. One of the most frequent mistakes made by objectivists is to assume that because a piece of equipment passes a given test under certain conditions then it is ideal. This is simply untrue for several reasons. The criteria by which the equipment is considered to pass or fail may be incorrect or inappropriate. The equipment might fail other tests or the same test under other conditions. In some cases the tests which equipment might fail have yet to be designed.
Not surprisingly, the same piece of equipment can be received quite differently by the two camps. The introduction of transistor audio amplifiers caused an unfortunate and audible step backwards in audio quality. The problem was that vacuum-tube amplifiers were generally operated in Class A and had low distortion except when delivering maximum power. Consequently valve amplifiers were tested for distortion at maximum power and in a triumph of tradition over reason transistor amplifiers were initially tested in the same way. However, transistor amplifiers generally work in Class B and produce more distortion at very low levels due to crossover between output devices. Naturally this distortion was not detectable on a high-power distortion test. Early transistor amplifiers sounded dreadful at low level and it is astonishing that this was not detected before they reached the market when the subjectivists rightly gave them a hard time.
Whilst the objectivists looked foolish over the crossover distortion issue, subjectivists have no reason to crow with their fetishes for gold-plated AC power plugs, special feet for equipment, exotic cables and mysterious substances to be applied to Compact Discs.
The only solution to the subjectivist/objectivist schism is to arrange them in pairs and bang their heads together.
Objective testing consists of making measurements which indicate the accuracy to which audio signals are being reproduced. When the measurements are standardized and repeatable they do form a basis for comparison even if per se they do not give a complete picture of what a device-under-test (DUT) or system will sound like.
There is only one symptom of quality loss: the reproduced waveform differs from the original. For convenience the differences are often categorized.
Any error in an audio waveform can be considered as an unwanted signal which has been linearly added to the wanted signal. Figure 13.4 shows that there are only two classes of waveform error. The first is where the error is not a function of the audio signal, but results from some uncorrelated process. This is the definition of noise. The second is where the error is a direct function of the audio signal which is the definition of distortion.
Noise can be broken into categories according to its characteristics. Noise due to thermal and electronic effects in components or analog tape hiss has essentially stationary statistics and forms a constant background which is subjectively benign in comparison with most other errors. Noise can be periodic or impulsive. Power frequency-related hum is often rich in harmonics. Interference due to electrical switching or lightning is generally impulsive. Crosstalk from other signals is also noise in that it does not correlate with the wanted signal. An exception is crosstalk between the signals in stereo or surround systems.
Distortion is a signal-dependent error and has two main categories. Non-linear distortion arises because the transfer function is not straight. The result in analog parts of audio systems is harmonic distortion where new frequencies which are integer multiples of the signal frequency are added, changing the spectrum of the signal. In digital parts of systems non-linearity can also result in anharmonic distortion because harmonics above half the sampling rate will alias. Non-linear distortions are subjectively the least acceptable as the resulting harmonics are frequently not masked, especially on pure tones.
Linear distortion is a signal-dependent error in which different frequencies propagate at different speeds due to lack of phase linearity. In complex signals this has the effect of altering the waveform but without changing the spectrum. As no harmonics are produced, this form of distortion is more benign than non-linear distortion. Whilst a completely linear phase system is ideal, the finite phase accuracy of the ear means that in practice a minimum phase system is probably good enough. Minimum phase implies that phase error changes smoothly and continuously over the audio band without any sudden discontinuities. Loudspeakers often fail to achieve minimum phase, with legacy techniques such as reflex tuning and passive crossovers being particularly unsatisfactory.
Figure 13.5(a) shows that the signal-to-noise ratio (SNR) is the ratio in dB between the largest amplitude undistorted signal the DUT can pass and the amplitude of the output with no input whatsoever, which is presumed to be due to noise. The spectrum of the noise is as important as the level. When audio signals are present, auditory masking occurs which reduces the audibility of the noise. Consequently the noise floor is most significant during extremely quiet passages or in pauses. Under these conditions the threshold of hearing is extremely dependent on frequency.
A measurement which more closely reflects the effect of noise on the listener is obtained by placing an A-weighting filter before the noise level measurement stage. The result is then expressed in dB(A).
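The shape of the A-weighting curve is not given here. As a sketch, the function below uses the analytic expression commonly quoted for the standardized A-weighting response, normalized to 0 dB at 1 kHz; the function name is illustrative and the constants should be checked against the governing standard before being relied upon.

```python
import math


def a_weighting_db(freq_hz):
    """Relative response of the A-weighting curve in dB at a frequency above 0 Hz.
    Constants are those commonly quoted for the standard curve (assumed here)."""
    f2 = freq_hz ** 2
    numerator = 12194.0 ** 2 * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz
    return 20 * math.log10(numerator / denominator) + 2.00


for f in (50, 100, 1_000, 10_000):
    print(f"{f:>6} Hz: {a_weighting_db(f):+.1f} dB")
```

The heavy attenuation at low frequencies reflects the reduced sensitivity of the ear there at low levels, which is why hum and rumble count for less in a dB(A) figure than in an unweighted one.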
Just because a DUT measures a given number of dB of SNR does not guarantee that SNR will be obtained in use. The measured SNR is only obtained when the DUT is used with signals of the correct level. Figure 13.5(b) shows that if the DUT is installed in a system where the input level is too low, the SNR of the output will be impaired. Consequently in any system where this is likely to happen the SNR of the equipment must exceed the required output SNR by the amount by which the input level is too low. This is the reason why quality mixing consoles offer apparently phenomenal SNRs.
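The arithmetic is simple enough to sketch. The function names and the example figures below are hypothetical; the only relationships used are the definition of SNR as a ratio in dB and the one-for-one loss of SNR with input level shortfall described above.

```python
import math


def snr_db(max_undistorted_rms, noise_rms):
    """SNR as the ratio in dB between the largest undistorted signal
    and the output measured with no input."""
    return 20 * math.log10(max_undistorted_rms / noise_rms)


def snr_in_use_db(rated_snr_db, level_shortfall_db):
    """SNR actually obtained when the input is below the correct level."""
    return rated_snr_db - level_shortfall_db


def required_rated_snr_db(target_output_snr_db, level_shortfall_db):
    """Rated SNR the equipment needs if the target is still to be met."""
    return target_output_snr_db + level_shortfall_db


# Hypothetical DUT: 1 V maximum signal, 0.1 mV of output noise
print(f"rated SNR: {snr_db(1.0, 0.0001):.1f} dB")
# The same DUT driven 20 dB too low
print(f"SNR in use: {snr_in_use_db(80, 20)} dB")
# Rated SNR needed to keep 90 dB at the output with a 20 dB shortfall
print(f"required rating: {required_rated_snr_db(90, 20)} dB")
```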
Often the operating level is deliberately set low to provide headroom so that occasional transients are undistorted. The art of quality audio production lies in setting the level to the best compromise between elevating the noise floor and increasing the occurrences of clipping.
Another area in which conventional SNR measurements are meaningless is where gain ranging or floating point coding is used. With no signal the system switches to a different gain range and an apparently high SNR is measured which does not correspond to the subjective result.
The frequency response of audio equipment is measured by the system shown in Figure 13.6(a). The same level is input at a range of frequencies and the output level is measured. The end of the frequency range is considered to have been reached where the output level has fallen by 3 dB with respect to the maximum level. The correct way of expressing this measurement is:
Frequency response: –3 dB, 20 Hz – 20 kHz
or similar. If the level limit is omitted, as it often is, the figures are meaningless.
There is a seemingly endless debate about how much bandwidth is necessary in analog audio and what sampling rate is needed in digital audio. There is no one right answer as will be seen. In analog systems, there is generation loss. Figure 13.6(b) shows that two identical DUTs in series will cause a loss of 6 dB at the frequency limits. Depending on the shape of the roll-off, the –3 dB limit will be reached over a narrower frequency range. Conversely if the original bandwidth is to be maintained, then the –3 dB range of each DUT must be wider.
In analog production systems, the number of different devices an audio signal must pass through is quite large. Figure 13.6(c) shows the signal chain of a multi-track-produced vinyl disk. The number of stages involved means that if each stage has a seemingly respectable –3 dB, 20 Hz – 20 kHz response, the overall result will be dreadful with a phenomenal rate of roll-off at the band edge. The only solution is that each item in the chain has to have wider bandwidth making it considerably overspecified in a single-generation application.
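The scale of the problem can be estimated with a simple model. The sketch below assumes that every stage behaves as an identical first-order low-pass filter at the upper band edge, which is a simplification: real stages have other roll-off shapes, and the low-frequency limit behaves in a corresponding way. The function names are illustrative.

```python
import math


def cascade_loss_db(freq_hz, corner_hz, stages):
    """Total loss in dB at freq_hz for identical first-order stages,
    each 3 dB down at corner_hz."""
    per_stage = 10 * math.log10(1 + (freq_hz / corner_hz) ** 2)
    return stages * per_stage


def cascade_corner_hz(corner_hz, stages):
    """Frequency at which the whole cascade is 3 dB down."""
    return corner_hz * math.sqrt(2 ** (1 / stages) - 1)


def stage_corner_needed_hz(target_corner_hz, stages):
    """Corner each stage needs so the cascade is 3 dB down at the target."""
    return target_corner_hz / math.sqrt(2 ** (1 / stages) - 1)


for n in (1, 2, 8):
    print(
        f"{n} stage(s): {cascade_loss_db(20_000, 20_000, n):.1f} dB down at 20 kHz, "
        f"overall -3 dB point {cascade_corner_hz(20_000, n) / 1000:.1f} kHz, "
        f"each stage needs {stage_corner_needed_hz(20_000, n) / 1000:.1f} kHz"
    )
```

Under this assumption, eight stages each rated to 20 kHz give an overall –3 dB point of only about 6 kHz, and each stage would need to extend to roughly 66 kHz for the chain as a whole to reach 20 kHz.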
Another factor is phase response. At the –3 dB point of a DUT, if the response is limited by a first-order filtering effect, the phase will have shifted by 45°. Clearly in a multi-stage system these phase shifts will add. An eight-stage system, not at all unlikely, will give a complete phase rotation as the band-edge is approached. The phase error begins a long way before the –3 dB point, preventing the system from displaying even a minimum phase characteristic.
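The accumulation of phase shift follows directly from the first-order case described above, in which each stage lags by the arctangent of the ratio of signal frequency to corner frequency. The sketch below assumes identical first-order stages; the function name is illustrative.

```python
import math


def cascade_phase_lag_deg(freq_hz, corner_hz, stages):
    """Total phase lag in degrees of identical first-order stages,
    each with its -3 dB point at corner_hz."""
    return stages * math.degrees(math.atan(freq_hz / corner_hz))


corner = 20_000
for f in (2_000, 5_000, 10_000, 20_000):
    lag = cascade_phase_lag_deg(f, corner, 8)
    print(f"{f / 1000:>4.0f} kHz: {lag:.0f} degrees through eight stages")
```

Even a decade below the band edge, eight such stages together have already shifted the phase by about 45°, which is the sense in which the phase error begins a long way before the –3 dB point.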
Consequently in complex analog audio systems each stage must be enormously overspecified in order to give good results after generation loss. It is not unknown for mixing consoles to respond down to 6 Hz in order to prevent loss of minimum phase in the audible band. Obviously such an extended frequency response on its own is quite inaudible, but when cascaded with other stages, the overall result will be audible.
In the digital domain there is no generation loss if the numerical values of the samples are not altered. Consequently digital data can be copied from one tape to another, or to a Compact Disc without any quality loss whatsoever. Simple digital manipulations, such as level control, do not impair the frequency or phase response and, if well engineered, the only loss will be a slight increase in the noise floor. Consequently digital systems do not need overspecified bandwidth. The bandwidth needs only to be sufficient for the application because there is nothing to impair it. However, those brought up on the analog tradition of overspecified bandwidth find this hard to believe.
In a digital system the bandwidth and phase response are defined at the anti-aliasing filter in the ADC. Early anti-aliasing filters had such dreadful phase response that the aliasing might have been preferable, but this has been overcome in modern oversampled convertors which can be highly phase-linear.
One simple but good way of checking frequency response and phase linearity is squarewave testing. A squarewave contains an indefinite number of harmonics of a known amplitude and phase relationship. If a squarewave is input to an audio DUT, the characteristics can be assessed almost instantly. Figure 13.7 shows some of the defects which can be isolated. (a) shows inadequate low-frequency response causing the horizontal sections of the waveform to droop. (b) shows poor high-frequency response in conjunction with poor phase linearity which turns the edges into exponential curves. (c) shows a phase-linear system of finite bandwidth, e.g. a good anti-aliasing filter. Note that the transitions are symmetrical with equal pre- and post-ringing. This is one definition of phase linearity. (d) shows a system with wide bandwidth but poor HF phase response. Note the asymmetrical ringing.
One still hears from time to time that squarewave testing is illogical because squarewaves never occur in real life. The explanation is simple. Few would argue that any sine wave should come out of a DUT with the same amplitude and no phase shift. A linear audio system ought to be able to pass any number of superimposed signals simultaneously. A squarewave is simply one combination of such superimposed sine waves. Consequently if an audio system cannot pass a squarewave as shown in Figure 13.7(c) then it will cause a problem with real audio.
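The point that a squarewave is only a particular sum of sine waves can be demonstrated by building one. The sketch below uses the standard Fourier series of a squarewave, in which only odd harmonics are present and the amplitude of each is inversely proportional to its order; the function name and the 1 kHz fundamental are illustrative choices.

```python
import math


def square_partial_sum(t, fundamental_hz, highest_harmonic):
    """Sum of the odd harmonics of a unit squarewave up to highest_harmonic.
    Harmonic k has relative amplitude 1/k and starts in phase with the rest."""
    total = 0.0
    for k in range(1, highest_harmonic + 1, 2):
        total += math.sin(2 * math.pi * k * fundamental_hz * t) / k
    return 4 / math.pi * total


f0 = 1_000.0            # fundamental frequency in Hz
mid_top = 0.25 / f0     # centre of the positive half cycle

# Band-limited to 20 kHz (harmonics 1 to 19) versus a much wider bandwidth
print(f"19 harmonics:  {square_partial_sum(mid_top, f0, 19):.3f}")
print(f"999 harmonics: {square_partial_sum(mid_top, f0, 999):.3f}")
```

As more harmonics are admitted the flat top approaches the ideal value of 1. Because every harmonic starts in phase, the band-limited sum is symmetrical about each transition, which corresponds to the phase-linear case; a DUT which changes the relative amplitude or phase of any harmonic changes the shape, which is exactly what the test reveals.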
As linearity is extremely important in audio, relevant objective linearity testing is vital. Real sound consists of many different contributions from different sources which all superimpose in the sound waveform reaching the ear. If an audio system is not capable of carrying an indefinite number of superimposed sounds without interaction then it will cause an audible impairment. Interaction between a single waveform and a non-linear system causes distortion. Interaction between waveforms in a non-linear system is called intermodulation distortion (IMD) whose origin is shown in Figure 13.8(a). As the transfer function is not straight, the low-frequency signal has the effect of moving the high-frequency signal to parts of the transfer function where the slope differs. This results in the high frequency being amplitude modulated by the low. The amplitude modulation will also produce sidebands. Clearly a system which is perfectly linear will be free of both types of distortion.
Figure 13.8(b) shows a simple harmonic distortion test. A low-distortion oscillator is used to inject a clean sine wave into the DUT. The output passes through a switchable sharp ‘notch’ filter which rejects the fundamental frequency. With the filter bypassed, an AC voltmeter is calibrated to 100 per cent. With the filter in circuit, any remaining output must be harmonic distortion or noise and the measured voltage is expressed as a percentage of the calibration voltage. The correct way of expressing the result is as follows:
THD + N at 1 kHz = 0.1%
or similar. A stringent test would repeat the test at a range of frequencies. There is not much point in conducting THD + N tests at high frequencies as the harmonics will be beyond the audible range.
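A digital equivalent of this measurement can be sketched as follows. Instead of an analog notch filter, the fundamental is estimated by correlating the captured samples with a sine and a cosine at the test frequency and is then subtracted, which is a substitution made for this sketch. It assumes that the capture spans a whole number of cycles; the names and the test signal are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square value, the job done by the AC voltmeter."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def thd_plus_n_percent(samples, fundamental_hz, sample_rate_hz):
    """Everything left after removing the fundamental, as a percentage
    of the complete signal (the 100 per cent calibration reading)."""
    n = len(samples)
    w = 2 * math.pi * fundamental_hz / sample_rate_hz
    sin_ref = [math.sin(w * i) for i in range(n)]
    cos_ref = [math.cos(w * i) for i in range(n)]
    # Amplitudes of the in-phase and quadrature parts of the fundamental
    a = 2 / n * sum(s * r for s, r in zip(samples, sin_ref))
    b = 2 / n * sum(s * r for s, r in zip(samples, cos_ref))
    residual = [s - a * sr - b * cr for s, sr, cr in zip(samples, sin_ref, cos_ref)]
    return 100 * rms(residual) / rms(samples)


# Hypothetical DUT output: 1 kHz sine with a third harmonic at 0.1 per cent
fs, n = 48_000, 4_800
w = 2 * math.pi * 1_000 / fs
capture = [math.sin(w * i) + 0.001 * math.sin(3 * w * i) for i in range(n)]
print(f"THD + N at 1 kHz = {thd_plus_n_percent(capture, 1_000, fs):.2f}%")
```

Like the analog version, this yields a single figure and says nothing about which harmonics are present.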
The THD + N measurement is not particularly useful in high-quality audio because it only measures the amount of distortion and tells nothing about its distribution. A vacuum-tube amplifier displaying 0.1 per cent distortion may sound very good indeed, whereas a transistor amplifier or a DAC with the same distortion figure will sound awful. This is because the vacuum-tube amplifier produces primarily low-order harmonics, which some listeners even find pleasing, whereas transistor amplifiers and digital devices can produce higher-order harmonics, which are unpleasant.
Very high-quality audio equipment has a characteristic whereby the equipment itself seems to recede, leaving only the sound. This will only happen when the entire reproduction chain is sufficiently free of any characteristic footprint which it impresses on the sound. The term resolution is used to describe this ability. Audio equipment which offers high resolution appears to be free of distortion products which are simply not measured by THD + N tests. Figure 13.9(a) shows the spectrum of a sine wave emerging from an ideal audio system. Figure 13.9(b) shows the spectrum of a low-resolution signal. Note the presence of sidebands around the original signal. Analog tape displays this characteristic, which is known as modulation noise. The Compact Cassette is notorious for a high level of modulation noise and poor resolution.
Analog circuitry can have the same characteristic. Figure 13.9(c) shows that signal or power amplifiers using negative feedback to linearize the output do so by pushing the non-linearities into the power rails. If the circuit board layout is poor, the power rail distortion can enter the signal path through common impedances.
In Chapter 4 the subject of sampling clock jitter was introduced. The effect of sampling clock jitter is to produce sidebands of the kind shown in Figure 13.9(b). This will be considered further in section 13.7.
Few people realize that loudspeakers can also display the effect of Figure 13.9(b). Traditional loudspeakers use ferrite magnets for economy. However, ferrite is an insulator and so there is nothing to stop the magnetic field moving within the magnet due to the Newtonian reaction to the coil drive force. Figure 13.10(a) shows that when the coil is quiescent, the lines of flux are symmetrically disposed about the coil turns, but when coil current flows, as in (b), the flux must be distorted in order to create a thrust. In magnetic materials the magnetic field can only move by the motion of domain walls and this is a non-linear process. The result in a non-conductive magnet such as ferrite is flux modulation and Barkhausen noise. The flux modulation and noise make the transfer function of the transducer non-linear and result in intermodulation.
The author did not initially believe the results of estimates of the magnitude of the problem, which showed that ferrite magnets cannot reach the sixteen-bit resolution of CD. Consequently two designs of tweeter were built, identical except for the magnet. The one with the conductive neodymium magnet has audibly higher resolution, approaching that of an electrostatic transducer, which, of course, has no magnet at all.
Given the damaging effect on realism caused by sidebands, a more meaningful approach than THD + N testing is to test for intermodulation distortion. There are a number of ways of conducting intermodulation tests, and, of course, anyone who understands the process can design a test from first principles. An early method standardized by the SMPTE was to exploit the amplitude modulation effect in two widely spaced frequencies, typically 70 Hz with 7 kHz added at one tenth the amplitude. 70 Hz is chosen because it is above the hum due to 50 or 60 Hz power. Figure 13.8(c) shows that the measurement is made by passing the 7 kHz region through a bandpass filter and recovering the depth of amplitude modulation with a demodulator.
A more stringent test for the creation of sidebands is where two high frequencies with a small frequency difference are linearly added and used as the input to the DUT. For example, 19 and 20 kHz will produce a 1 kHz difference or beat frequency if there is non-linearity as shown in Figure 13.8(d). With such a large difference between the input and the beat frequency it is easy to produce a 1 kHz bandpass filter which rejects the inputs. The filter output is a measure of IMD. With a suitable generator, the input frequencies can be swept or stepped with a constant 1 kHz spacing.
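Both two-tone tests can be simulated to show what they detect. In the sketch below the DUT is a hypothetical transfer function with a small second-order curvature, and the filters and demodulator are replaced by a direct measurement of individual frequency components, which is a substitution made for simplicity. The sample rate, capture length and all names are assumptions.

```python
import math


def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component, found by correlating the
    samples with a cosine and a sine at that frequency."""
    n = len(samples)
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


def hypothetical_dut(x, second_order=0.05):
    """Assumed non-linearity: a slightly curved transfer function."""
    return x + second_order * x * x


def two_tone(f1_hz, a1, f2_hz, a2, sample_rate_hz, n):
    """Linear sum of two sine waves."""
    return [
        a1 * math.sin(2 * math.pi * f1_hz * i / sample_rate_hz)
        + a2 * math.sin(2 * math.pi * f2_hz * i / sample_rate_hz)
        for i in range(n)
    ]


FS = 96_000  # assumed sample rate, high enough that no product aliases
N = 9_600    # 0.1 s, so every tone used here completes whole cycles


def modulation_depth(dut=hypothetical_dut):
    """70 Hz plus 7 kHz at one tenth the amplitude: depth of amplitude
    modulation of the 7 kHz tone, read from its two sidebands."""
    out = [dut(x) for x in two_tone(70, 1.0, 7_000, 0.1, FS, N)]
    carrier = tone_amplitude(out, 7_000, FS)
    sidebands = tone_amplitude(out, 6_930, FS) + tone_amplitude(out, 7_070, FS)
    return sidebands / carrier


def difference_tone_level(dut=hypothetical_dut):
    """19 kHz plus 20 kHz at equal level: amplitude of the 1 kHz product."""
    out = [dut(x) for x in two_tone(19_000, 0.5, 20_000, 0.5, FS, N)]
    return tone_amplitude(out, 1_000, FS)


print(f"modulation depth of 7 kHz tone: {modulation_depth():.3f}")
print(f"level of 1 kHz product:         {difference_tone_level():.4f}")
```

Replacing the DUT with a perfectly straight transfer function, for example `difference_tone_level(lambda x: x)`, makes both products vanish, which is the basis of the test.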
More advanced tests exploit not just the beats between the fundamentals, but also those involving the harmonics. In one proposal, shown in Figure 13.8(e), input tones of 8 and 11.95 kHz are used. The fundamentals produce a beat of 3.95 kHz, but the second harmonic of 8 kHz is 16 kHz, which beats with 11.95 kHz to produce 4.05 kHz. This will intermodulate with 3.95 kHz to produce a 100 Hz component which can be measured. Clearly this test will only work in 60 Hz power regions, because 100 Hz coincides with the second harmonic of 50 Hz hum, and the exact frequencies would need modifying in 50 Hz regions.
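The frequency arithmetic of this proposal, and the reason it depends on the local power frequency, can be checked in a few lines. The function names are illustrative.

```python
def beat_products_hz(f1_hz, f2_hz):
    """Beat between the fundamentals, beat between the second harmonic
    of f1 and f2, and the product formed between those two beats."""
    first_beat = f2_hz - f1_hz
    second_beat = 2 * f1_hz - f2_hz
    measured = abs(second_beat - first_beat)
    return first_beat, second_beat, measured


def clashes_with_hum(freq_hz, mains_hz):
    """True if freq_hz falls on the power frequency or one of its harmonics."""
    return freq_hz % mains_hz == 0


print(beat_products_hz(8_000, 11_950))  # (3950, 4050, 100)
print(clashes_with_hum(100, 50))        # True: hum would corrupt the reading
print(clashes_with_hum(100, 60))        # False
```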
If a precise spectrum analyser is available, all the above tests can be performed simultaneously. Figure 13.8(f) shows the results of a spectrum analysis of a DUT supplied with two test tones. Clearly it is possible to test with three simultaneous tones or more. Some audio devices, particularly power amplifiers, ADCs and DACs, are relatively benign under steady-state testing with simple signals, but reveal their true colours with a more complex input.
Subjective testing can only be carried out by placing the device under test in series with an existing sound-reproduction system. Unless the DUT is itself a loudspeaker, the testing will only be as stringent as the loudspeakers in the system allow. Unfortunately the great majority of loudspeakers do not reach the standard required for meaningful subjective testing of units placed in series and consequently the majority of such tests are of questionable value.
If useful subjective testing is to be carried out, it is necessary to use the most accurate loudspeakers available and to test the loudspeakers themselves before using them as any kind of reference. Whilst simple tests such as on-axis frequency response give an idea of the performance of a loudspeaker, the majority produce so much distortion and modulation noise that the figures are not even published.
Digital audio systems potentially have high signal resolution, but subjective testing of high-performance convertors is very difficult because of loudspeaker limitations. Consequently it is important to find listening tests which will meaningfully assess loudspeakers, especially for linearity and resolution, whilst eliminating other variables as much as possible. Linearity and resolution are essential to allow superimposition of an indefinite number of sounds in a stereo image.
Non-linearity in stereo has the effect of creating intermodulated sound objects which are in a different place in the image from the genuine sounds. Consequently the requirements for stereo are more stringent than for mono. This can be used for speaker testing. One stringent test is to listen to a high-quality stereo recording in which multi-tracking has been used to superimpose a number of takes of a musician or vocalist playing/singing the same material. The use of multi-tracking reduces the effect of intermodulation at the microphone and ADC as these handle only one source at a time. The use of a panpot eliminates any effects due to inadequate directivity in a stereo microphone.
It should be possible to hear how many simultaneous sources are present, i.e. whether the recording is double, triple or quadruple tracked, and it should be possible to concentrate on each source to the exclusion of the others.
It should also be possible to pan each version of a multi-tracked recording to a slightly different place in the stereo image and individually identify each source even when the spacing is very small. Poor loudspeakers smear the width of an image because of diffraction and fail this test.
In another intermodulation test it is necessary to find a good-quality recording in which a vocalist sings solo at some point, and at another point is accompanied by a powerful low-frequency instrument such as a pipe organ or a bass guitar. There should be no change in the imaging or timbre of the vocal whether or not the LF is present.
Another stringent test of linearity is to listen to a recording made on a coincident stereo microphone of a spatially complex source such as a choir. It should be possible to identify the location of each chorister and to concentrate on the voice of each. The music of Tallis is highly suitable. Massed strings are another useful test, with the end of Barber’s Adagio for Strings being particularly revealing. Coincident stereo recordings should also be able to reproduce depth. It should be possible to resolve two instruments or vocalists one directly behind the other at different distances from the microphone. Loudspeakers which cannot pass these tests are not suitable for subjective quality testing.
When a pair of reference-grade loudspeakers has been found which will demonstrate all the above effects, it will be possible to make meaningful comparisons between devices such as microphones, consoles, analog recorders, ADCs and DACs. Quality variations between the analog outputs of different CD players or DAT machines will be readily apparent. Those which pay the most attention to convertor clock jitter are generally found to be preferable. Very expensive high-end CD players are often disappointing because these units concentrate on one aspect of performance and neglect others.
One myth which has taken a long time to be revealed is the belief that a low-grade loudspeaker should be used in the production process so that an indication of how the mix will sound on mediocre consumer equipment will be obtained. If an average loudspeaker could be obtained this would be possible. Unfortunately the main defect of a poor loudspeaker is that it stamps its own characteristic footprint on the audio. These footprints vary so much that there is no such thing as an average poor loudspeaker and people who make decisions on cheap loudspeakers are taking serious risks. It is a simple fact that an audio production can never be better than the monitor loudspeakers used and the author’s extensive collection of defective CDs indicates that good monitoring is rare.
In theory the quality of a digital audio system comprising an ideal ADC followed by an ideal DAC is determined at the ADC. This will be true if the digital signal path is sufficiently well engineered that no numerical errors occur, which is the case with most reasonably maintained equipment. The ADC parameters such as the sampling rate, the wordlength and any noise shaping used put limits on the quality which can be achieved. Conversely, the DAC itself may be transparent, because it only converts data whose quality is already determined back to the analog domain. In other words, the ideal ADC determines the system quality and the ideal DAC does not make things any worse.
In practice both ADCs and DACs can fall short of the ideal, but with modern convertor components and attention to detail the theoretical limits can be approached very closely and at reasonable cost. Shortcomings may be the result of an inadequacy in an individual component such as a convertor chip, or due to incorporating a high-quality component into a poorly thought-out system. Poor system design or implementation can destroy the performance of a convertor. Whilst oversampling is a powerful technique for realizing high-quality convertors, its use depends on digital interpolators and decimators whose quality affects the overall conversion quality.1 Interpolators and decimators with erroneous arithmetic or inadequate filtering performance have been known.
ADCs and DACs have the same transfer function, since they are only distinguished by the direction of operation, and therefore the same terminology can be used to classify the possible shortcomings of both. Figure 13.11 shows the transfer functions resulting from the main types of convertor error.
Figure 13.11(a) shows offset error. A constant appears to have been added to the digital signal. This has no effect on sound quality, unless the offset is gross, when the symptom would be premature clipping. DAC offset is of little consequence, but ADC offset is undesirable since it can cause an audible thump if an edit is made between two signals having different offsets. Offset error is sometimes cancelled by digitally averaging the convertor output and feeding it back to the analog input as a small control voltage. Alternatively, a digital high-pass filter can be used.
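The digital high-pass approach can be sketched as follows. This is only an illustration: the first-order filter form, the coefficient `r`, the 0.1 offset and the 1 kHz test tone are all assumptions chosen for the example, not values taken from any particular convertor.

```python
import math

def dc_block(samples, r=0.995):
    """First-order digital high-pass: y[n] = x[n] - x[n-1] + r * y[n-1].

    r close to 1 places the cut-off well below the audio band.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out

FS = 48000      # assumed sampling rate
OFFSET = 0.1    # hypothetical ADC offset, as a fraction of full scale

# One second of a 1 kHz tone riding on a constant offset.
signal = [OFFSET + 0.5 * math.sin(2 * math.pi * 1000 * n / FS) for n in range(FS)]
filtered = dc_block(signal)

# Average over the last 100 ms (a whole number of cycles) to expose any remaining DC.
tail = filtered[-4800:]
residual_offset = sum(tail) / len(tail)
peak = max(tail)
```

After the filter has settled, `residual_offset` is negligible while the tone passes at essentially its original amplitude, so an edit between two such filtered signals would have no step to produce a thump.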
Figure 13.11(b) shows gain error. The slope of the transfer function is incorrect. Since convertors are referred to one end of the range, gain error causes an offset error. The gain stability is probably the least important factor in a digital audio convertor, since ears, meters and gain controls are logarithmic.
Figure 13.11(c) shows integral linearity. This is the deviation of the dithered transfer function from a straight line. It has exactly the same significance and consequences as linearity in analog circuits, since if it is inadequate, distortion will be caused.
Differential non-linearity is the amount by which adjacent quantizing intervals differ in size. This is usually expressed as a fraction of a quantizing interval. In audio applications the differential non-linearity requirement is quite stringent. This is because with properly employed dither, an ideal system can remain linear under low-level signal conditions. When low levels are present, only a few quantizing intervals are in use. If these change in size, clearly waveform distortion will take place despite the dither. Enhancing the subjective quality of convertors using noise shaping will only serve to reveal such shortcomings.
Figure 13.12 shows that monotonicity is a special case of differential nonlinearity. Non-monotonicity means that the output does not increase for an increase in input. Figure 13.12(a) shows that in a DAC with a convertor input code of 01111111 (127 decimal), the seven low-order current sources of the convertor will be on. The next code is 10000000 (128 decimal), shown in Figure 13.12(b), where only the eighth current source is operating. If the current it supplies is in error on the low side, the analog output for 128 may be less than that for 127 as shown in Figure 13.12(c). In an ADC non-monotonicity can result in missing codes. This means that certain binary combinations within the range cannot be generated by any analog voltage. If a device has better than 1/2Q linearity it must be monotonic. It is difficult for a one-bit convertor to be non-monotonic.
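The 127-to-128 transition described above can be modelled in a few lines. The sketch assumes an eight-bit binary-weighted DAC and an arbitrary error of 1.5 units on the eighth current source; both figures are hypothetical and serve only to show the mechanism.

```python
def dac_output(code, weights):
    """Sum the currents of the sources whose bit is set in `code`."""
    return sum(w for bit, w in enumerate(weights) if (code >> bit) & 1)

# Ideal binary-weighted sources: 1, 2, 4 ... 128 units of current.
ideal = [2 ** bit for bit in range(8)]

# Hypothetical fault: the eighth (most significant) source is 1.5 units low.
flawed = ideal[:7] + [128 - 1.5]

def non_monotonic_codes(weights):
    """Codes whose output fails to exceed that of the code immediately below."""
    return [c for c in range(1, 256)
            if dac_output(c, weights) <= dac_output(c - 1, weights)]

ideal_faults = non_monotonic_codes(ideal)
flawed_faults = non_monotonic_codes(flawed)
```

With the ideal weights no code is flagged. With the flawed weights only code 128 is flagged, because that is the one transition where the seven low-order sources switch off and the erroneous source switches on.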
Absolute accuracy is the difference between actual and ideal output for a given input. For audio it is rather less important than linearity. For example, if all the current sources in a convertor have good thermal tracking, linearity will be maintained, even though the absolute accuracy drifts.
Clocks which are free of jitter are a critical requirement in convertors as was shown in Chapter 4. The effects of clock jitter are proportional to the slewing rate of the audio signal rather than depending on the sampling rate, and as a result oversampling convertors are no more prone to jitter than conventional convertors.2 Clock jitter is a form of frequency modulation with a small modulation index. Sinusoidal jitter produces sidebands which may be audible. Random jitter raises the noise floor which is more benign but still undesirable. As clock jitter produces artifacts proportional to the audio slew rate, it is quite easy to detect. A spectrum analyser is connected to the convertor output and a low audio frequency signal is input. The test is then repeated with a high audio frequency. If the noise floor changes, there is clock jitter. If the noise floor rises but remains substantially flat, the jitter is random. If there are discrete frequencies in the spectrum, the jitter is periodic. The spacing of the discrete frequencies from the input frequency will reveal the frequencies in the jitter.
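The dependence on slew rate can be illustrated numerically. The sketch below samples a sine wave at instants disturbed by random jitter and measures the resulting error; the 1 ns RMS jitter, the 48 kHz rate and the two test frequencies are assumptions made for the example.

```python
import math
import random

def jitter_error_rms(freq_hz, jitter_rms_s, fs=48000, n=20000, seed=1):
    """RMS error caused by sampling a unit sine wave at jittered instants."""
    rng = random.Random(seed)  # fixed seed so both runs see the same jitter
    total = 0.0
    for k in range(n):
        t = k / fs
        ideal = math.sin(2 * math.pi * freq_hz * t)
        actual = math.sin(2 * math.pi * freq_hz * (t + rng.gauss(0.0, jitter_rms_s)))
        total += (actual - ideal) ** 2
    return math.sqrt(total / n)

low = jitter_error_rms(1000, 1e-9)     # low audio frequency
high = jitter_error_rms(10000, 1e-9)   # high audio frequency, same clock
ratio = high / low
```

For the same clock, raising the input frequency tenfold raises the error roughly tenfold, which is why comparing the noise floor at a low and a high input frequency exposes jitter.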
Aliasing of audio frequencies is not generally a problem, especially if oversampling is used. However, the nature of aliasing is such that it works in the frequency domain only and translates frequencies to new values without changing amplitudes. Aliasing can occur for any frequency above one half the sampling rate. The frequency to which it aliases will be the difference frequency between the input and the nearest sampling rate multiple. Thus in a non-oversampling convertor, all frequencies above half the sampling rate alias into the audio band. This includes radio frequencies which have entered via audio or power wiring or directly. RF can leap-frog an analog anti-aliasing filter capacitively. Thus good RF screening is necessary around ADCs, and the manner of entry of cables to equipment must be such that RF energy on them is directed to earth. Recent legislation regarding the sensitivity of equipment to electromagnetic interference can only be beneficial in this respect.
Oversampling convertors respond to RF on the input in a different manner. Although all frequencies above half the sampling rate are folded into the baseband, only those which fold into the audio band will be audible. Thus an unscreened oversampling convertor will be sensitive to RF energy on the input at frequencies within ±20 kHz of integer multiples of the sampling rate. Fortunately interference from the digital circuitry at exactly the sampling rate will alias to DC and be inaudible.
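The folding rule given above (the difference between the input and the nearest multiple of the sampling rate) is easily expressed in code. In this sketch the function names, the 20 kHz audio band limit and the 64-times oversampling factor are illustrative assumptions.

```python
def alias_frequency(f_hz, fs_hz):
    """Baseband frequency at which f_hz appears after sampling at fs_hz."""
    nearest_multiple = round(f_hz / fs_hz) * fs_hz
    return abs(f_hz - nearest_multiple)

def is_audible_alias(f_hz, fs_hz, audio_band_hz=20000):
    """True if f_hz folds to a frequency inside the audio band."""
    return alias_frequency(f_hz, fs_hz) <= audio_band_hz

FS = 48000
OVERSAMPLED_FS = 64 * FS  # hypothetical oversampling rate

conventional = alias_frequency(1_000_000, FS)          # 1 MHz interference
near_multiple = is_audible_alias(3_080_000, OVERSAMPLED_FS)
far_from_multiple = is_audible_alias(1_000_000, OVERSAMPLED_FS)
at_clock = alias_frequency(OVERSAMPLED_FS, OVERSAMPLED_FS)
```

In the conventional convertor the 1 MHz interference lands at 8 kHz. In the oversampling convertor the same interference stays outside the audio band, whereas a signal 8 kHz away from the sampling rate folds into it, and a signal at exactly the sampling rate falls at DC.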
Convertors are also sensitive to unwanted signals superimposed on the references. In fact the multiplicative nature of a convertor means that reference noise amplitude modulates the audio to create sidebands. Power supply ripple on the reference due to inadequate regulation or decoupling causes sidebands 50, 60, 100 or 120 Hz away from the audio frequencies, yet does not raise the noise floor when the input is quiescent. The multiplicative effect reveals how to test for it. Once more a spectrum analyser is connected to the convertor output. An audio frequency tone is input, and the level is changed. If the noise floor changes with the input signal level, there is reference noise. RF interference on a convertor reference is more insidious, particularly in the case of noise-shaped devices. Noise-shaped convertors operate with signals which must contain a great deal of high-frequency noise just beyond the audio band. RF on the reference amplitude modulates this noise and the sidebands can enter the audio band, raising the noise floor or causing discrete tones depending on the nature of the pickup.
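The multiplicative effect can be written out for the simple case of a single audio tone and sinusoidal ripple on the reference. The symbols are introduced here for illustration only: $A$ is the audio amplitude, $\omega_a$ the audio frequency, $\omega_r$ the ripple frequency and $m$ the fractional ripple depth.

```latex
\begin{aligned}
v_{\mathrm{out}}(t) &= A\cos(\omega_a t)\,\bigl[1 + m\cos(\omega_r t)\bigr] \\
                    &= A\cos(\omega_a t)
                       + \tfrac{mA}{2}\cos\bigl((\omega_a + \omega_r)t\bigr)
                       + \tfrac{mA}{2}\cos\bigl((\omega_a - \omega_r)t\bigr)
\end{aligned}
```

The sidebands lie $\omega_r$ either side of the audio tone and their amplitude $mA/2$ scales with the signal level, vanishing when $A = 0$. This is consistent with a noise floor that moves with input level but is unaffected when the input is quiescent.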
Noise-shaped convertors are particularly sensitive to a signal of half the sampling rate on the reference. When a small DC offset is present on the input, the bit density at the quantizer must change slightly from 50 per cent. This results in idle patterns whose spectrum may contain discrete frequencies. Ordinarily these are designed to occur near half the sampling rate so that they are beyond the audio band. In the presence of half-sampling-rate interference on the reference, these tones may be demodulated into the audio band.
Although the faithful reproduction of the audio band is the goal, the nature of sampling is such that convertor design must respect EMC and RF engineering principles if quality is not to be lost. Clean references, analog inputs, outputs and clocks are all required, despite the potential radiation from digital circuitry within the equipment and uncontrolled electromagnetic interference outside.
Unwanted signals may be induced directly by ground currents, or indirectly by capacitive or magnetic coupling. It is essential practice to separate grounds for analog and digital circuitry, connecting them in one place only. Capacitive coupling uses stray capacitance between the signal source and point where the interference is picked up. Increasing the distance or conductive screening helps. Coupling is proportional to frequency and the impedance of the receiving point. Lowering the impedance at the interfering frequency will reduce the pickup. If this is done with capacitors to ground, it need not reduce the impedance at the frequency of wanted signals.
Magnetic or inductive coupling relies upon a magnetic field due to the source current flow inducing voltages in a loop. Reduction in inductive coupling requires the size of any loops to be minimized. Digital circuitry should always have ground planes in which return currents for the logic signals can flow. At high frequency, return currents flow in the ground plane directly below the signal tracks and this minimizes the area of the transmitting loop. Similarly, ground planes in the analog circuitry minimize the receiving loop whilst having no effect on baseband audio. A further weapon against inductive coupling is to use ground fill between all traces on the circuit board. Ground fill will act like a shorted turn to alternating magnetic fields. Ferrous screening material will also reduce inductive coupling as well as capacitive coupling.
The reference of a convertor should be decoupled to ground as near to the integrated circuit as possible. This does not prevent inductive coupling to the lead frame and the wire to the chip itself. In the future convertors with on-chip references may be developed to overcome this problem.
In summary, spectral analysis of convertors gives a useful insight into design weaknesses. If the noise floor is affected by the signal level, reference noise is a possibility. If the noise floor is affected by signal frequency, clock jitter is likely. Should the noise floor be unaffected by both, the noise may be inherent in the signal or in analog circuit stages.
One interesting technique which has been developed recently for ADC testing is a statistical analysis of the frequency of occurrence of the various code values in data. If, for example, a full-scale sine wave is input to an ADC having a frequency which is asynchronous to the sampling rate, the probability of a particular code occurring in the output of an ideal convertor is a function only of the slew rate of the signal. At the peaks of the sine wave the slew rate is small and the codes there are more frequent. Near the zero crossing the slew rate is high and the probability is lower. Near the zero crossing, the probability of codes being created is nearly equal. However, if one quantizing interval is slightly larger than its neighbours, the signal will take longer to cross it and the probability of that code appearing will rise. Conversely, if the interval is smaller the probability will fall. By collecting a large quantity of data from a test and displaying the statistics it is possible to measure differential non-linearity to phenomenal accuracy.
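A minimal simulation of the code-probability test is sketched below. It assumes a six-bit ADC in which one quantizing interval (code 32, just above the zero crossing) has been made 25 per cent too wide at the expense of its neighbour; these figures, the function names and the golden-ratio phase sequence standing in for an asynchronous sine wave are all choices made for illustration.

```python
import bisect
import math

def thresholds(bits, wide_code=None, extra=0.0):
    """Code transition levels for an ADC spanning -1..+1.

    If wide_code is given, its upper edge is raised by `extra` quantizing
    intervals, widening that code and narrowing the one above it.
    """
    n = 2 ** bits
    q = 2.0 / n
    t = [-1.0 + q * k for k in range(1, n)]
    if wide_code is not None:
        t[wide_code] += extra * q
    return t

def code_histogram(t, n_samples=200_000):
    """Count code occurrences for a full-scale sine asynchronous to the sampling."""
    counts = [0] * (len(t) + 1)
    golden = (math.sqrt(5) - 1) / 2  # irrational phase step, never repeats
    for i in range(n_samples):
        x = math.sin(2 * math.pi * ((i * golden) % 1.0))
        counts[bisect.bisect_right(t, x)] += 1
    return counts

def estimate_dnl(counts, bits):
    """Compare measured counts with those expected from an ideal convertor."""
    n = 2 ** bits
    q = 2.0 / n
    total = sum(counts)
    dnl = []
    for k in range(n):
        lo, hi = -1.0 + q * k, -1.0 + q * (k + 1)
        expected = total * (math.asin(hi) - math.asin(lo)) / math.pi
        dnl.append(counts[k] / expected - 1.0)
    return dnl

BITS = 6
dnl = estimate_dnl(code_histogram(thresholds(BITS, wide_code=32, extra=0.25)), BITS)
```

The estimate for code 32 comes out close to +0.25 of a quantizing interval and that for code 33 close to -0.25, while undisturbed codes stay near zero. The expected counts follow the arcsine distribution of a sine wave, which accounts for codes near the peaks being more frequent.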
This technique has been used to show that oversampled noise-shaped convertors are virtually free of differential non-linearity because of the averaging in the decimation process.
In practice signals used are not restricted to high-level sine waves. A low-level sine wave will only exercise a small number of codes near the audiologically sensitive centre of the quantizing range. However, it may be better to use a combination of three sine waves which exercises the whole range. As the test method reveals differences in probability of occurrence of individual codes, it can be used with program material. In this case the exact distribution of code probabilities is not important. Instead it is important that the probability distribution should be smooth. As Figure 13.13 shows, spikes in the distribution indicate an unusually high or low probability for certain codes.
In an analysis of code probability on a number of commercially available CDs, a disturbing number of those tested had surprising characteristics such as missing codes, particularly in older recordings. Single missing codes can be due to an imperfect ADC, but in some cases there were a large number of missing codes spaced evenly apart. This could only be due to primitive gain controls applied in the digital domain without proper redithering. This may have been required with under-modulated master tapes which would be digitally amplified prior to the cutting process in order to play back at a reasonable level.
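The evenly spaced gaps are easy to reproduce. The sketch below applies a crude digital gain of 3/2 with truncation and no redithering; the gain value and the function name are assumptions chosen to make the pattern obvious.

```python
def missing_codes_after_gain(num, den, n_in):
    """Output codes never produced when inputs 0..n_in-1 are scaled by num/den.

    The scaling uses integer truncation with no redithering, as a primitive
    digital gain control might.
    """
    produced = {(x * num) // den for x in range(n_in)}
    top = max(produced)
    return [c for c in range(top + 1) if c not in produced]

gaps = missing_codes_after_gain(3, 2, 20)
# Every third output code is absent: [2, 5, 8, 11, 14, 17, 20, 23, 26]
```

Because the output range has more codes than the input range, some codes can never be generated, and the regular arithmetic places them at a constant spacing. Unity gain leaves no gaps.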
Statistical code analysis is quite useful to the professional audio engineer as it can be applied using actual program material at any point in the production and mastering process. Following an ADC it will reveal convertor non-linearities, but used later, it will reveal DSP shortcomings. It is highly likely that a correlation will be found between subjectively perceived resolution and the results of tests of this kind.
From time to time there have been proposals to raise the sampling rates used in digital audio to, for example, 96 kHz and even 192 kHz. These are invariably backed with the results of experiments and demonstrations ‘proving’ that the sampling rate makes a difference. The reality is different because careful study of these experiments shows them to be flawed.
The most famous bandwidth myth rests on the fact that it is possible to hear the difference between a 10 kHz sine wave and a 10 kHz squarewave, even though the difference between the two starts with the third harmonic at 30 kHz. If we could only hear 20 kHz it wouldn’t be audible, but it is. The reason is non-linearity in practical equipment. Even if the signal system, speakers and air were perfectly linear, so we could inject a 10 kHz acoustic squarewave into the ear, we would still hear the difference because the ear itself isn’t linear. The ossicles in the ear are a mechanical lever system and have limitations. Consequently hearing a difference between a 10 kHz sine wave and a squarewave doesn’t prove anything about the bandwidth of human hearing.
Another classic myth is the experiment shown in Figure 13.14. This takes a 96 kHz source and allows monitoring of the source directly or through a decimation to 48 kHz followed by an interpolation back to 96 kHz. This is supposed to test whether the difference between 48 kHz and 96 kHz is audible. Actually all it proves is that the more stages a signal goes through, the worse it gets. The decimation and interpolation processes will cause degradation of the signal within the 20 kHz band, so it’s no wonder that the subjects prefer the 96 kHz path.
What the experiment should have done was to replicate the degradation of the decimate/interpolate path. In other words the elevated noise floor due to two arithmetic roundoffs in series and the ripple and phase response of the filters should also have been present in the 96 kHz path. Tests should have been made to ensure that both paths were identical in all respects up to 20 kHz. Unfortunately they weren’t and the conclusions are meaningless because the experiment was not properly designed so that the only difference between the two stimuli was the bandwidth.
When a properly designed experiment is performed, in which 96 kHz source material is or is not bandwidth limited to 20 kHz by a psycho-acoustically adequate low-pass filter, it is impossible to hear any difference.3
Some ADC manufacturers have demonstrated better sound quality from convertors running at 96 kHz. However, this does not prove that 96 kHz is necessary. Figure 13.15 shows that if an oversampling convertor has suboptimal decimating filters it will suffer from a modulation noise floor which damages resolution. If the sampling rate is doubled, the noise will be spread over twice the bandwidth so the level will be reduced. This is why the high sampling rate convertor sounds better. However, the same sound quality could be obtained by improving the design of the 48 kHz convertor.
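The size of the improvement can be estimated under a simplifying assumption which the text does not state: that the modulation noise has constant total power $N$ and is spread uniformly from DC to half the sampling rate $f_s$. The portion falling within an audio bandwidth $B$ is then

```latex
N_B = N\,\frac{B}{f_s/2},
\qquad
\frac{N_B(f_s)}{N_B(2f_s)} = 2
\;\Rightarrow\;
10\log_{10} 2 \approx 3\ \mathrm{dB}
```

On this assumption, doubling the sampling rate lowers the in-band noise by about 3 dB. The same or a greater reduction is available at 48 kHz if the decimating filters are improved so that the noise is not generated in the first place.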
There are three parameters of interest when conveying audio down a digital interface such as AES/EBU or SPDIF, and these have quite different importance depending on the application. The parameters are:
(a) The jitter tolerance of the serial FM data separator.
(b) The jitter tolerance of the audio samples at the point of conversion back to analog.
(c) The timing accuracy of the serial signal with respect to other signals.
A digital interface is designed to convey discrete numerical values from one place to another. If those samples are correctly received with no numerical change, the interface is perfect. The serial interface carries clocking information, in the form of the transitions of the FM channel code and the sync patterns and this information is designed to enable the data separator to determine the correct data values in the presence of jitter. It was shown in Chapter 8 that the jitter window of the FM code is half a data bit period in the absence of noise. This becomes a quarter of a data bit when the eye opening has reached the minimum allowable in the professional specification as can be seen from Figure 8.2. If jitter is within this limit, which corresponds to about 80 nanoseconds pk–pk, the serial digital interface perfectly reproduces the sample data, irrespective of the intended use of the data. The data separator of an AES/EBU receiver requires a phase-locked loop in order to decode the serial message. This phase-locked loop will have jitter of its own, particularly if it is a digital phase-locked loop where the phase steps are of finite size. Digital phase-locked loops are easier to implement along with other logic in integrated circuits. There is no point in making the jitter of the phase-locked loop vanishingly small as the jitter tolerance of the channel code will absorb it. In fact the digital phase-locked loop is simpler to implement and locks up quicker if it has larger phase steps and therefore more jitter.
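The figure of about 80 nanoseconds can be checked with simple arithmetic. The sketch assumes a 48 kHz sampling rate, which the passage does not state explicitly, and uses the AES/EBU frame structure of two subframes of 32 bits each.

```python
def aes_bit_period_ns(fs_hz, bits_per_frame=64):
    """Duration of one data bit on the serial interface, in nanoseconds.

    One frame of two 32-bit subframes is sent per sample period.
    """
    return 1e9 / (fs_hz * bits_per_frame)

bit_ns = aes_bit_period_ns(48000)   # assumed sampling rate
ideal_window_ns = bit_ns / 2        # jitter window with no noise
worst_case_window_ns = bit_ns / 4   # window at minimum eye opening
```

At 48 kHz a data bit lasts about 325 ns, so a quarter of a bit is a little over 81 ns, in agreement with the figure quoted. At other sampling rates the window scales in inverse proportion.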
This has no effect on the ability of the interface to convey discrete values, and if the data transfer is simply an input to a digital recorder no other parameter is of consequence as the data values will be faithfully recorded. However, it is a further requirement in some applications that a sampling clock for a convertor is derived from a serial interface signal.
It was shown in Chapter 4 that the jitter tolerance of convertor clocks is measured in picoseconds. Thus a phase-locked loop in the FM data separator of a serial receiver chip is quite unable to drive a convertor directly as the jitter it contains will be as much as a thousand times too great. Nevertheless this is exactly how a great many consumer outboard DACs are built, regardless of price. The consequence of this poor engineering is that the serial interface is no longer truly digital. Analog variations in the interface waveform cause variations in the convertor clock jitter and thus variations in the reproduced sound quality. Different types of digital cable ‘sound’ different and journalists claim that digital optical interfaces are ‘sonically superior’ to electrical interfaces. The digital outputs of some CD players ‘sound’ better than others and so on. In fact source and cable substitution is an excellent test of outboard convertor quality. A properly engineered outboard convertor will sound the same despite changes in CD player, cable type and length and despite changing from electrical to optical input because it accepts only data from the serial signal and regenerates its own clock. Audible differences simply mean the convertor is of poor design and should be rejected.
Figure 13.16 shows how a convertor should be configured. The serial data separator has its own phase-locked loop which is less jittery than the serial waveform and so recovers the audio data. The serial data are presented to a shift register which is read in parallel to a latch when an entire sample is present by a clock edge from the data separator. The data separator has done its job of correctly returning a sample value to parallel format. A quite separate phase-locked loop with extremely high damping and low jitter is used to regenerate the sampling clock. This may use a crystal oscillator or it may be a number of loops in series to increase the order of the jitter filtering. In the professional channel status, bit 5 of byte 0 indicates whether the source is locked or unlocked. This bit can be used to change the damping factor of the phase-locked loop or to switch from a crystal to a varicap oscillator. When the source is unlocked, perhaps because a recorder is in varispeed, the capture range of the phase-locked loop can be widened and the increased jitter is accepted. When the source is locked, the capture range is reduced and the jitter is rejected.
The third timing criterion is only relevant when more than one signal is involved as it affects the ability of, for example, a mixer to combine two inputs.
In order to decide which criterion is most important, the following may be helpful. A single signal which is involved in a data transfer to a recording medium is concerned only with eye pattern jitter as this affects the data reliability.
A signal which is to be converted to analog is concerned primarily with the jitter at the convertor clock. Signals which are to be mixed are concerned with the eye pattern jitter and the relative timing. If the mix is to be monitored, all three parameters become important.
A better way of ensuring low jitter conversion to analog in digital audio reproducers is to generate a master clock from a crystal adjacent to the convertor, and then to slave the transport to produce data at the same rate. This approach is shown in Figure 13.17. Memory buffering between transport and convertor then ensures that the transport jitter is eliminated. Whilst this can also be done with a remote convertor, it does then require a reference clock to be sent to the transport as in Figure 13.18 so that data can be sent at the correct rate. Unfortunately most consumer CD and DAT players have no reference input and this approach cannot be used. Consumer remote DACs then must regenerate a clock from the player and seldom do it accurately enough. In fact it is a myth that outboard convertors are necessary for high quality. For the same production cost, a properly engineered inboard convertor adhering to the quality criteria of Chapter 4 can sound better than a two-box system. The real benefit of an outboard convertor is that in theory it allows several digital sources to be replayed for the cost of one convertor. In practice few consumer devices are available with only a digital output, and the convertors are duplicated in each device.
The human hearing mechanism has an ability to concentrate on one of many simultaneous sound sources based on direction. The brain appears to be able to insert a controllable time delay in the nerve signals from one ear with respect to the other so that when sound arrives from a given direction the nerve signals from both ears are coherent causing the binaural threshold of hearing to be 3–6 dB better than monaural at around 4 kHz. Sounds arriving from other directions are incoherent and are heard less well. This is known as attentional selectivity.
Human hearing can also locate a number of different sound sources simultaneously presented by constantly comparing excitation patterns from the two ears with different delays. Strong correlation will be found where the delay corresponds to the interaural delay for a given source. This delay-varying mechanism will take time and the ear is slow to react to changes in source direction. Oscillating sources can only be tracked up to 2–3 Hz and the ability to locate bursts of noise improves with burst duration up to about 700 ms. Location accuracy is finite.
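The comparison of the two ear signals at different delays resembles a cross-correlation, which can be sketched as follows. This is an engineering analogy and not a model of the nerve processing itself; the noise source, the 48 kHz rate, the 20-sample delay and the search range of 34 samples (about 700 μs) are assumptions made for the example.

```python
import random

def best_lag(left, right, max_lag):
    """Lag in samples by which `right` trails `left`, taken at the correlation peak."""
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        # Stay clear of both ends so that i + lag is always a valid index.
        for i in range(max_lag, n - max_lag):
            score += left[i] * right[i + lag]
        if score > best_score:
            best, best_score = lag, score
    return best

rng = random.Random(0)
TRUE_LAG = 20  # hypothetical interaural delay in samples (about 417 us at 48 kHz)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

left = source
right = [0.0] * TRUE_LAG + source[:-TRUE_LAG]  # same sound, arriving later

estimated = best_lag(left, right, max_lag=34)
```

The correlation is strong only at the lag which matches the delay between the two signals, so the peak identifies the direction. Stepping through the candidate delays takes time, which parallels the slow reaction to changes in source direction described above.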
Stereophonic and surround systems should allow attentional selectivity to function such that the listener can concentrate on specific sound sources in a reproduced image with the same facility as in the original sound.
We live in a reverberant world which is filled with sound reflections. If we could separately distinguish every different reflection in a reverberant room we would hear a confusing cacophony. In practice we hear very well in reverberant surroundings, far better than microphones can, because of the transform nature of the ear and the way in which the brain processes nerve signals. Because the ear has finite frequency discrimination ability in the form of critical bands, it must also have finite temporal discrimination.
This is good news for the loudspeaker designer because the ear has finite accuracy in frequency, time and spatial domains. This means that a blameless loudspeaker is not just a concept, it could be made real by the application of sufficient rigour.
When two or more versions of a sound arrive at the ear, provided they fall within a time span of about 30 ms, they will not be treated as separate sounds, but will be fused into one sound. Only when the time separation reaches 50–60 ms do the delayed sounds appear as echoes from different directions. As we have evolved to function in reverberant surroundings, most reflections do not impair our ability to locate the source of a sound.
Clearly the first version of a transient sound to reach the ears must be the one which has travelled by the shortest path and this must be the direct sound rather than a reflection. Consequently the ear has evolved to attribute source direction from the time of arrival difference at the two ears of the first version of a transient.
Versions which arrive from elsewhere simply add to the perceived loudness but do not change the perceived location of the source, unless they arrive within the inter-aural delay of about 700 μs. In that case the precedence effect breaks down and the perceived direction can be pulled away from that of the first arriving source by an increase in level. This area is known as the time-intensity trading region. Once the maximum inter-aural delay is exceeded, the hearing mechanism knows that the time difference must be due to reverberation and the trading ceases to change with level.
Unfortunately reflections with delays of the order of 700 μs are exactly what are provided by the legacy rectangular loudspeaker with sharp corners. These reflections are due to acoustic impedance changes and if we could see sound we would double up with mirth at how ineptly the sound is being radiated. Effectively the spatial information in the audio signals is being convolved with the spatial footprint of the speaker. This has the effect of defocusing the image. Now the effect can be measured.
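The link between this delay and the size of a loudspeaker cabinet follows from the speed of sound. Taking $c \approx 343$ m/s for air at room temperature (a value assumed here, not given in the text), the extra path length $d$ which produces a delay $\tau$ is

```latex
d = c\,\tau \approx 343\ \mathrm{m/s} \times 700\ \mu\mathrm{s} \approx 0.24\ \mathrm{m}
```

Extra path lengths of this order and below are comparable with the dimensions of a typical enclosure, so re-radiation from its edges tends to fall inside the trading region and not outside it.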
Intensity stereo, the type obtained with coincident mikes or panpots, works purely by amplitude differences at the two loudspeakers. The two signals should be exactly in phase. As both ears hear both speakers the result is that the space between the speakers and the ears turns the intensity differences into time of arrival differences. These give the illusion of virtual sound sources.
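The conversion of amplitude differences into time-of-arrival differences can be illustrated with a phasor sum at each ear. The sketch is deliberately crude: it assumes a single 500 Hz tone, ignores head shadowing, and uses a hypothetical extra delay of 250 μs for the sound from each speaker to reach the further ear.

```python
import cmath
import math

def interaural_time_difference(left_gain, right_gain,
                               freq_hz=500.0, cross_delay_s=250e-6):
    """Apparent time lead of the left ear over the right ear, in seconds.

    Each ear hears its nearer speaker directly and the further speaker
    after cross_delay_s. Both speakers are fed in phase.
    """
    w = 2 * math.pi * freq_hz
    far = cmath.exp(-1j * w * cross_delay_s)   # phase lag of the far speaker
    left_ear = left_gain + right_gain * far
    right_ear = right_gain + left_gain * far
    return (cmath.phase(left_ear) - cmath.phase(right_ear)) / w

centre = interaural_time_difference(1.0, 1.0)      # equal levels
part_left = interaural_time_difference(1.0, 0.5)   # panned towards the left
hard_left = interaural_time_difference(1.0, 0.0)   # left speaker only
```

With equal levels the two ear signals are in phase and the image is central. As the level is shifted towards the left speaker the left ear signal leads by a growing amount, reaching the full cross delay when only one speaker is driven. The virtual source position therefore follows the panpot setting.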
A virtual sound source from a panpot has zero width and on diffraction-free speakers would appear as a virtual point source. Figure 13.19(a) shows how a panpotted dry mix should appear spatially on ideal speakers whereas (b) shows what happens when stereo reverb is added. In fact (b) is also what is obtained with real sources using a coincident pair of mikes. In this case the sources are the real sources and the sound between is reverb/ambience.
Figure 13.19(c) is the result obtained with traditional square box speakers. Note that the point sources have spread so that there are almost no gaps between them, effectively masking the ambience. This represents a lack of spatial fidelity, so we can say that rectangular box loudspeakers cannot accurately reproduce a stereo image, nor can they be used for assessing the amount of reverberation added to a ‘dry’ recording. Such speakers cannot meaningfully be used to assess compression codecs.
A compressor works by raising the level of ‘noise’ in parts of the signal where it is believed to be masked. If this belief is correct, the compression will be inaudible. However, if the codec is tested using a signal path in which there is another masking effect taking place, the results of the test are meaningless. Theoretical analysis and practical measurement show that legacy loudspeakers have exactly such a masking process, both temporally and spatially.4
If a stereophonic system comprising a variable bit rate codec in series with a pair of speakers is considered to be a communication channel, then it will have a finite information rate in frequency, temporal and spatial domains. If this information rate exceeds the capacity of the human hearing mechanism, it will be deemed transparent. However, in the system mentioned, either the codec or the speakers could be the limiting factor and ordinarily there would be no way to separate the effects.
If a variable bit-rate codec is available, some conclusions can be drawn. Figure 13.20(a) shows what happens as the bit rate is increased with an ideal speaker. The sound quality increases up to the point where the capacity of the ear is reached, after which raising the bit rate appears to have no effect. However, if suboptimal speakers are used, the situation of Figure 13.20(b) arises. Now, as the bit rate is increased, the quality levels off prematurely where the information capacity of the loudspeaker has been reached. As a result, simply by varying the bit rate of a coder, it becomes possible to measure the effective bit rate of a pair of loudspeakers.
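The levelling-off behaviour can be illustrated with a toy model, offered only as a sketch of the reasoning and not as a measurement procedure from the text. It assumes that codec, loudspeaker and ear can each be described by a single information capacity in the same units, and that perceived quality is set by the smallest of the three. The function names and all the numbers, including the placeholder ear capacity, are hypothetical.

```python
def perceived_quality(bit_rate, speaker_capacity, ear_capacity=384.0):
    """Toy model: quality is limited by the weakest link in the chain.

    All three quantities are in the same arbitrary units (think kbit/s).
    The default ear_capacity is a placeholder, not a measured value.
    """
    return min(bit_rate, speaker_capacity, ear_capacity)


def plateau_bit_rate(bit_rates, scores, tolerance=1e-9):
    """Return the lowest tested bit rate beyond which the score stops rising.

    bit_rates must be in ascending order. Returns None if no plateau is seen.
    """
    for i in range(len(bit_rates) - 1):
        if scores[i + 1] - scores[i] <= tolerance:
            return bit_rates[i]
    return None


if __name__ == "__main__":
    rates = [64, 128, 192, 256, 320, 384, 448]

    # Ideal speaker: capacity far above that of the ear.
    ideal = [perceived_quality(r, speaker_capacity=float("inf")) for r in rates]
    # Suboptimal speaker: hypothetical capacity of 192 units.
    poor = [perceived_quality(r, speaker_capacity=192.0) for r in rates]

    print("ideal speaker plateau:", plateau_bit_rate(rates, ideal))
    print("poor speaker plateau: ", plateau_bit_rate(rates, poor))
```

With the ideal speaker the plateau sits at the assumed capacity of the ear, whereas with the poorer speaker it appears earlier, at the capacity of the speaker itself. The plateau can only be located as finely as the spacing of the bit rates tested.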
In subjective compression tests, the configuration of Figure 13.20(c) is used. The listener switches between the uncompressed and compressed versions to see if a difference can be detected. If the speaker of Figure 13.20(b) is used, the experimenter is misled, because it would appear that there is no difference between direct and compressed listening at an artificially low bit rate, whereas in fact the limiting factor is the speaker.
At the point shown in Figure 13.20(b) the masking due to the speaker is equal to the level of artifacts from the coder. At any lower bit rate, compression artifacts will become audible over the footprint of the speaker. The lower the information capacity of the speaker, the lower the bit rate at which the artifacts are audible.
Non-ideal loudspeakers act like bit-rate compressors in that they conceal or mask information in the audio signal. If a real compressor is tested with non-ideal loudspeakers, certain deficiencies of the compressor will not be heard and it may erroneously be assumed that the compressor is transparent when in fact it is not. Compression artifacts which are inaudible in mono may be audible in stereo, and the spatial compression of non-ideal stereo loudspeakers conceals real spatial compression artifacts.
Precision monitor speakers should be free of reflections in the sub-700 μs trading region so that the imaging actually reveals what is going on spatially. When such speakers are used to assess audio compressors, even at high bit rates corresponding to the smallest amount of compression, it is obvious that there is a difference between the original and the compressed result. Figure 13.21 shows graphically what is found. The dominant sound sources are reproduced fairly accurately, but the ambience and reverb between are virtually absent, making the decoded sound much drier than the original.
The effect will be apparent to the same extent with, for example, both MPEG Layer II and Dolby AC-3 coders even though their internal workings are quite different. This is not surprising because both are probably based on the same psychoacoustic masking model. MPEG Layer III fares even worse because the bit rate is lower. Transient material has a peculiar effect whereby the ambience will come and go according to the entropy of the dominant source. A percussive note will narrow the sound stage and appear dry but afterwards the reverb level will come back up. All these effects largely disappear when the signals to the speakers are added to make mono, removing the ear’s ability to discriminate spatially.
These effects are not subtle and do not require golden ears. The author has successfully demonstrated them to various audiences up to 60 in number in a variety of untreated rooms. Whilst compression may be adequate to deliver post-produced audio to a consumer with mediocre loudspeakers, these results underline that it has no place in a quality production environment. When assessing codecs, loudspeakers having poor diffraction design will conceal artifacts. When mixing for a compressed delivery system, it will be necessary to include the codec in the monitor feeds so that the results can be compensated. Where high-quality stereo is required, either full bit rate PCM or lossless (packing) techniques must be used.
1. Lipshitz, S.P. and Vanderkooy, J., Are D/A convertors getting worse? Presented at the 84th Audio Engineering Society Convention (Paris, 1988), Preprint 2586 (D-6)
2. Harris, S., The effects of sampling clock jitter on Nyquist sampling analog to digital convertors and on oversampling delta-sigma ADCs. J. Audio Eng. Soc., 38, 537–542 (1990)
3. Katz, B., The ultimate listening test. Audio Media (April 2000), 116–118
4. Watkinson, J.R., Putting the science back into loudspeakers. Presented at the Multichannel Audio Summit (London, May 2000)