
Some audio principles

2.1 The physics of sound

Sound is simply an airborne version of vibration, which is why the two topics are inextricably linked. The air which carries sound is a mixture of gases, mostly nitrogen, some oxygen, a little carbon dioxide and so on. Gases are the highest energy state of matter; for example, the application of energy to ice produces water and the application of more energy produces steam. The reason that a gas takes up so much more room than a liquid is that the molecules contain so much energy that they break free from their neighbours and rush around at high speed.

As Figure 2.1(a) shows, the innumerable elastic collisions of these high-speed molecules produce pressure on the walls of any gas container. In fact the distance a molecule can go without a collision, the mean free path, is quite short at atmospheric pressure. Consequently gas molecules also collide with each other elastically, so that if left undisturbed, in a container at a constant temperature, every molecule would end up with essentially the same energy and the pressure throughout would be constant and uniform.

Sound disturbs this simple picture. Figure 2.1(b) shows that a solid object which moves against gas pressure increases the velocity of the rebounding molecules, whereas in (c) one moving with gas pressure reduces that velocity. The average velocity and the displacement of all the molecules in a layer of air near to a moving body are the same as the velocity and displacement of the body. Movement of the body results in a local increase or decrease in pressure of some kind. Thus sound is both a pressure and a velocity disturbance. Integration of the velocity disturbance gives the displacement.

Figure 2.1    (a) The pressure exerted by a gas is due to countless elastic collisions between gas molecules and the walls of the container. (b) If the wall moves against the gas pressure, the rebound velocity increases. (c) Motion with the gas pressure reduces the particle velocity.

Despite the fact that a gas contains endlessly colliding molecules, a small mass or particle of gas can have stable characteristics because the molecules leaving are replaced by new ones with identical statistics. As a result acoustics seldom needs to consider the molecular structure of air and the constant motion can be neglected. Thus when particle velocity and displacement are considered in acoustics, this refers to the average values of a large number of molecules. In an undisturbed container of gas the particle velocity and displacement will both be zero everywhere.

When the volume of a fixed mass of gas is reduced, the pressure rises. The gas acts like a spring; it is compliant. However, a gas also has mass. Sound travels through air by an interaction between the mass and the compliance. Imagine pushing a mass via a spring. It would not move immediately because the spring would have to be compressed in order to transmit a force. If a second mass is connected to the first by another spring, it would start to move even later. Thus the speed of a disturbance in a mass/spring system depends on the mass and the stiffness.

After the disturbance had propagated the masses would return to their rest position. The mass–spring analogy is helpful for a basic understanding, but is too simple to account for commonly encountered acoustic phenomena such as spherically expanding waves. It must be remembered that the mass and stiffness are distributed throughout the gas in the same way that inductance and capacitance are distributed in a transmission line. Sound travels through air without a net movement of the air.
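The delayed response of a chain of masses and springs can be sketched numerically. The following is a minimal illustration only, not a model of air: the function name, the number of masses, the stiffness, the mass values and the movement threshold are all assumptions chosen for convenience. The first spring is pushed by a fixed amount and the code records when each mass first moves appreciably.

```python
def arrival_times(n_masses=5, stiffness=100.0, mass=0.01,
                  push=0.01, threshold=0.001, dt=1e-4, max_steps=50000):
    """Time at which each mass in the chain first moves more than `threshold`.

    All parameter values are illustrative assumptions (SI units).
    """
    x = [0.0] * n_masses          # displacements
    v = [0.0] * n_masses          # velocities
    arrival = [None] * n_masses
    for step in range(1, max_steps + 1):
        # Net spring force on every mass, computed from current positions.
        forces = []
        for i in range(n_masses):
            left = push if i == 0 else x[i - 1]
            force = stiffness * (left - x[i])
            if i < n_masses - 1:
                force += stiffness * (x[i + 1] - x[i])
            forces.append(force)
        # Semi-implicit Euler update: velocity first, then position.
        for i in range(n_masses):
            v[i] += forces[i] / mass * dt
            x[i] += v[i] * dt
            if arrival[i] is None and x[i] > threshold:
                arrival[i] = step * dt
        if all(t is not None for t in arrival):
            break
    return arrival


times = arrival_times()
heavier = arrival_times(mass=0.04)
print("arrival times (s):", times)
print("with four times the mass:", heavier)
```

Each successive mass responds later than the one before it, and increasing the mass (or reducing the stiffness) delays the disturbance further, which is the behaviour the analogy is intended to convey.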

2.2 The speed of sound

Unlike solids, the elasticity of gas is a complicated process. If a fixed mass of gas is compressed, work has to be done on it. This will generate heat in the gas. If the heat is allowed to escape and the compression does not change the temperature, the process is said to be isothermal. However, if the heat cannot escape the temperature will rise and give a disproportionate increase in pressure. This process is said to be adiabatic and the Diesel engine depends upon it. In most audio cases there is insufficient time for much heat transfer and so air is considered to act adiabatically. Figure 2.2 shows how the speed of sound c in air can be derived by calculating its elasticity under adiabatic conditions.

Figure 2.2    Calculating the speed of sound from the elasticity of air.

If the volume allocated to a given mass of gas is reduced isothermally, the pressure and the density will rise by the same amount so that c does not change. If the temperature is raised at constant pressure, the density goes down and so the speed of sound goes up. Gases with lower density than air have a higher speed of sound. Divers who breathe a mixture of oxygen and helium to prevent ‘the bends’ must accept that the pitch of their voices rises remarkably.
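The dependence of c on pressure and density can be illustrated with the standard relation c = sqrt(γP/ρ), where γ is the ratio of specific heats (γ = 1 corresponds to the isothermal case). This is a sketch under stated assumptions: the pressure, densities and γ values below are approximate handbook figures for 20°C, not values taken from Figure 2.2, and pure helium is used only as a limiting case of a light breathing mixture.

```python
import math

ATMOSPHERIC_PRESSURE = 101325.0   # Pa, standard atmosphere (assumed)
AIR_DENSITY = 1.204               # kg/m^3 at about 20 deg C (assumed)
HELIUM_DENSITY = 0.1664           # kg/m^3 at about 20 deg C (assumed)


def speed_of_sound(pressure, density, gamma):
    """c = sqrt(gamma * P / rho); gamma = 1.0 gives the isothermal result."""
    return math.sqrt(gamma * pressure / density)


air_adiabatic = speed_of_sound(ATMOSPHERIC_PRESSURE, AIR_DENSITY, 1.4)
air_isothermal = speed_of_sound(ATMOSPHERIC_PRESSURE, AIR_DENSITY, 1.0)
helium = speed_of_sound(ATMOSPHERIC_PRESSURE, HELIUM_DENSITY, 5.0 / 3.0)

# Doubling pressure and density together leaves c unchanged.
compressed = speed_of_sound(2 * ATMOSPHERIC_PRESSURE, 2 * AIR_DENSITY, 1.4)

print(f"air, adiabatic:  {air_adiabatic:.1f} m/s")
print(f"air, isothermal: {air_isothermal:.1f} m/s")
print(f"helium:          {helium:.1f} m/s")
print(f"air, compressed: {compressed:.1f} m/s")
```

The adiabatic figure agrees with the measured speed of sound, whereas the isothermal assumption gives a value that is too low.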

The speed of sound is proportional to the square root of the absolute temperature. On earth, temperature changes with respect to absolute zero (–273°C) amount to around one per cent except in extremely inhospitable places. The speed of sound experienced by most of us is about 344 metres per second, or in round figures a little over 1000 feet per second. Temperature falls with altitude in the atmosphere and with it the speed of sound. The local speed of sound is defined as Mach 1. Consequently supersonic aircraft are fitted with Mach meters.
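The square-root dependence on absolute temperature can be sketched as follows. The reference speed of 331.3 m/s at 0°C is an assumed handbook value for dry air, and the function names are hypothetical.

```python
import math

ABSOLUTE_ZERO_C = -273.15   # deg C
C_AT_0C = 331.3             # m/s in dry air at 0 deg C (assumed reference)


def speed_at(temp_c):
    """Speed of sound scaled by the square root of absolute temperature."""
    kelvin = temp_c - ABSOLUTE_ZERO_C
    kelvin_ref = 0.0 - ABSOLUTE_ZERO_C
    return C_AT_0C * math.sqrt(kelvin / kelvin_ref)


def mach_number(airspeed, temp_c):
    """Airspeed expressed as a multiple of the local speed of sound."""
    return airspeed / speed_at(temp_c)


print(f"20 deg C:  {speed_at(20.0):.1f} m/s")
print(f"-50 deg C: {speed_at(-50.0):.1f} m/s")
print(f"300 m/s at -50 deg C is Mach {mach_number(300.0, -50.0):.2f}")
```

The same airspeed therefore corresponds to a higher Mach number in the cold air found at altitude, which is why a Mach meter rather than an airspeed indicator is needed.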

As air acts adiabatically, a propagating sound wave causes cyclic temperature changes. The speed of sound is a function of temperature, yet sound causes a temperature variation. One might expect some effects because of this. Fortunately, sounds which are below the threshold of pain have such a small pressure variation compared with atmospheric pressure that the effect is negligible and air can be assumed to be linear. However, on any occasion where the pressures are higher, this is not a valid assumption. In such cases the positive half cycle significantly increases local temperature and the speed of sound, whereas the negative half cycle reduces temperature and velocity. Figure 2.3 shows that this results in significant distortion of a sine wave, ultimately causing a shock wave which can travel faster than the speed of sound until the pressure has dissipated with distance. This effect is responsible for the sharp sound of a handclap.

Figure 2.3    At high level, sound distorts itself by increasing the speed of propagation on positive half-cycles. The result is a shock wave.

This behaviour means that the speed of sound changes slightly with frequency. High frequencies travel slightly faster than low because there is less time for heat conduction to take place. Figure 2.4 shows that a complex sound source produces harmonics whose phase relationship with the fundamental advances with the distance the sound propagates. This allows one mechanism (there are others) by which one can judge the distance from a known sound source. Clearly for realistic sound reproduction nothing in the audio chain must distort the phase relationship between frequencies. A system which accurately preserves such relationships is said to be phase linear.

Figure 2.4    In a complex waveform, high frequencies travel slightly faster producing a relative phase change with distance.

2.3 Wavelength

Sound can be due to a one-off event known as percussion, or a periodic event such as the sinusoidal vibration of a tuning fork. The sound due to percussion is called transient whereas a periodic stimulus produces steady-state sound having a frequency f.

Because sound travels at a finite speed, the fixed observer at some distance from the source will experience the disturbance at some later time. In the case of a transient, the observer will detect a single replica of the original as it passes at the speed of sound. In the case of the tuning fork, a periodic sound source, the pressure peaks and dips follow one another away from the source at the speed of sound. For a given rate of vibration of the source, a given peak will have propagated a constant distance before the next peak occurs. This distance is called the wavelength λ. Figure 2.5 shows that wavelength is defined as the distance between any two identical points on the whole cycle. If the source vibrates faster, successive peaks get closer together and the wavelength gets shorter. Figure 2.5 also shows that the wavelength is inversely proportional to the frequency. It is easy to remember that the wavelength of 1000 Hz is roughly a foot (a little over 30 cm).
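The inverse relationship between wavelength and frequency is simply wavelength = c / f. A minimal sketch, assuming the 344 metres per second figure quoted in section 2.2:

```python
SPEED_OF_SOUND = 344.0   # m/s, the everyday figure from section 2.2


def wavelength(frequency_hz):
    """Distance travelled by a pressure peak during one cycle, in metres."""
    return SPEED_OF_SOUND / frequency_hz


for f in (20.0, 100.0, 1000.0, 20000.0):
    print(f"{f:8.0f} Hz  ->  {wavelength(f):8.4f} m")
```

Across the audio band the wavelength ranges from many metres down to less than two centimetres, which has considerable consequences for transducer design.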

Figure 2.5    Wavelength is defined as the distance between two points at the same place on adjacent cycles. Wavelength is inversely proportional to frequency.

2.4 Periodic and aperiodic signals

Sounds can be divided into these two categories and analysed either in the time domain, in which the waveform is considered, or in the frequency domain, in which the spectrum is considered. The time and frequency domains are linked by transforms of which the best known is the Fourier transform. Transforms will be considered further in Chapter 3.

Figure 2.6(a) shows that a periodic signal is one which repeats after some constant time has elapsed and goes on indefinitely in the time domain. In the frequency domain such a signal will be described as having a fundamental frequency and a series of harmonics or partials which are at integer multiples of the fundamental. The timbre of an instrument is determined by the harmonic structure. Where there are no harmonics at all, the simplest possible signal results which has only a single frequency in the spectrum. In the time domain this will be an endless sine wave.
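The harmonic structure of a periodic signal can be demonstrated with a direct discrete Fourier transform. This is a sketch for illustration only: the 100 Hz fundamental, the harmonic amplitudes, the sampling rate and the analysis length are all assumed values, chosen so that each component falls exactly on an analysis frequency.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Amplitude of each frequency bin up to half the sampling rate."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(abs(acc) * 2 / n)
    return mags


SAMPLE_RATE = 1600   # samples per second (assumed)
N = 160              # 0.1 s of signal, so bins are 10 Hz apart

# Fundamental at 100 Hz plus second and third harmonics.
signal = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
          + 0.25 * math.sin(2 * math.pi * 300 * i / SAMPLE_RATE)
          for i in range(N)]

mags = dft_magnitudes(signal)
peaks = [k * SAMPLE_RATE / N for k, m in enumerate(mags) if m > 0.1]
print("spectral lines at (Hz):", peaks)
```

The only significant energy is found at integer multiples of the fundamental. Removing the two harmonic terms leaves a single spectral line, the pure sine wave described above.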

Figure 2.6    (a) Periodic signal repeats after a fixed time and has a simple spectrum consisting of fundamental plus harmonics. (b) Aperiodic signal such as noise does not repeat and has a continuous spectrum. (c) Transient contains an anharmonic spectrum.

Figure 2.6(b) shows an aperiodic signal known as white noise. The spectrum shows that there is equal level at all frequencies, hence the term ‘white’ which is analogous to the white light containing all wavelengths. Transients or impulses may also be aperiodic. A spectral analysis of a transient (c) will contain a range of frequencies, but these are not harmonics because they are not integer multiples of the lowest frequency. Generally the narrower an event in the time domain, the broader it will be in the frequency domain and vice versa.

2.5 Sound and the ear

Experiments can tell us that the ear only responds to a certain range of frequencies within a certain range of levels. If sound is defined to fall within those ranges, then its reproduction is easier because it is only necessary to reproduce those levels and frequencies which the ear can detect.

Psychoacoustics can describe how our hearing has finite resolution in both time and frequency domains such that what we perceive is an inexact impression. Some aspects of the original disturbance are inaudible to us and are said to be masked. If our goal is the highest quality, we can design our imperfect equipment so that the shortcomings are masked. Conversely if our goal is economy we can use compression and hope that masking will disguise the inaccuracies it causes.

A study of the finite resolution of the ear shows how some combinations of tones sound pleasurable whereas others are irritating. Music has evolved empirically to emphasize primarily the former. Nevertheless we are still struggling to explain why we enjoy music and why certain sounds can make us happy and others can reduce us to tears. These characteristics must still be present in digitally reproduced sound.

Whatever the audio technology we deal with, there is a common goal of delivering a satisfying experience to the listener. However, some aspects of audio are emotive, some are technical. If we attempt to take an emotive view of a technical problem or vice versa our conclusions will be questionable.

The frequency range of human hearing is extremely wide, covering some ten octaves (an octave is a doubling of pitch or frequency) without interruption. There is hardly any other engineering discipline in which such a wide range is found. For example, in radio different wavebands are used so that the octave span of each is quite small. Whilst video signals have a wide octave span, the signal-to-noise and distortion criteria for video are extremely modest in comparison. Consequently audio is one of the most challenging subjects in engineering. Whilst the octave span required by audio can easily be met in analog or digital electronic equipment, the design of mechanical transducers such as microphones and loudspeakers will always be difficult.
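The octave span of a band is the base-2 logarithm of the ratio of its limits. In the sketch below the 20 Hz to 20 kHz audio limits and the 88 to 108 MHz broadcast band are assumed, commonly quoted figures used purely for comparison.

```python
import math


def octave_span(f_low, f_high):
    """Number of octaves (doublings of frequency) between two limits."""
    return math.log2(f_high / f_low)


audio = octave_span(20.0, 20000.0)
fm_band = octave_span(88e6, 108e6)   # illustrative radio waveband
print(f"audio band:   {audio:.2f} octaves")
print(f"FM broadcast: {fm_band:.2f} octaves")
```

The radio band, despite being 20 MHz wide, spans only a fraction of an octave, whereas the audio band spans nearly ten.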

2.6 Hearing

By definition, the sound quality of an audio system can only be assessed by human hearing. Many items of audio equipment can only be designed well with a good knowledge of the human hearing mechanism. The acuity of the human ear is finite but astonishing. It can detect tiny amounts of distortion, and will accept an enormous dynamic range over a wide range of octaves. If the ear detects a different degree of impairment between two audio systems in properly conducted tests, we can say that one of them is superior. Thus quality is completely subjective and can only be checked by listening tests. However, any characteristic of a signal which can be heard can in principle also be measured by a suitable instrument although in general the availability of such instruments lags the requirement. The subjective tests will tell us how sensitive the instrument should be. Then the objective readings from the instrument give an indication of how acceptable a signal is in respect of that characteristic.

The sense we call hearing results from acoustic, mechanical, hydraulic, nervous and mental processes in the ear/brain combination, leading to the term psychoacoustics. It is only possible briefly to introduce the subject here. The interested reader is referred to Moore1 for an excellent treatment.

Figure 2.7 shows that the structure of the ear is traditionally divided into the outer, middle and inner ears. The outer ear works at low impedance, the inner ear works at high impedance, and the middle ear is an impedance matching device. The visible part of the outer ear is called the pinna which plays a subtle role in determining the direction of arrival of sound at high frequencies. It is too small to have any effect at low frequencies. Incident sound enters the auditory canal or meatus. The pipe-like meatus causes a small resonance at around 4 kHz. Sound vibrates the eardrum or tympanic membrane which seals the outer ear from the middle ear. The inner ear or cochlea works by sound travelling through a fluid. Sound enters the cochlea via a membrane called the oval window.

If airborne sound were to be incident on the oval window directly, the serious impedance mismatch would cause most of the sound to be reflected. The middle ear remedies that mismatch by providing a mechanical advantage. The tympanic membrane is linked to the oval window by three bones known as ossicles which act as a lever system such that a large displacement of the tympanic membrane results in a smaller displacement of the oval window but with greater force. Figure 2.8 shows that the malleus applies a tension to the tympanic membrane rendering it conical in shape. The malleus and the incus are firmly joined together to form a lever. The incus acts upon the stapes through a spherical joint. As the area of the tympanic membrane is greater than that of the oval window, there is a further multiplication of the available force. Consequently small pressures over the large area of the tympanic membrane are converted to high pressures over the small area of the oval window.
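The combined effect of the lever and the area ratio can be estimated as a pressure gain. The figures used below (an effective area ratio of about 17 and a lever ratio of about 1.3) are assumed, typical textbook values and do not come from this chapter; the function name is hypothetical.

```python
import math


def pressure_gain_db(area_ratio, lever_ratio):
    """Pressure gain of an ideal lossless lever-plus-piston transformer.

    Force is multiplied by the lever ratio and then concentrated onto a
    smaller area, so the two ratios multiply.
    """
    return 20.0 * math.log10(area_ratio * lever_ratio)


AREA_RATIO = 17.0    # tympanic membrane : oval window (assumed)
LEVER_RATIO = 1.3    # ossicular lever advantage (assumed)

gain = pressure_gain_db(AREA_RATIO, LEVER_RATIO)
print(f"idealized middle-ear pressure gain: {gain:.1f} dB")
```

With these assumptions most of the gain comes from the area ratio rather than from the lever.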

Figure 2.7    The structure of the human ear. See text for details.

Figure 2.8    The malleus tensions the tympanic membrane into a conical shape. The ossicles provide an impedance-transforming lever system between the tympanic membrane and the oval window.

The middle ear is normally sealed, but ambient pressure changes will cause static pressure on the tympanic membrane which is painful. The pressure is relieved by the Eustachian tube which opens involuntarily while swallowing. The Eustachian tubes open into the cavities of the head and must normally be closed to avoid one’s own speech appearing deafeningly loud.

The ossicles are located by minute muscles which are normally relaxed. However, the middle ear reflex is an involuntary tightening of the tensor tympani and stapedius muscles which heavily damp the ability of the tympanic membrane and the stapes to transmit sound by about 12 dB at frequencies below 1 kHz. The main function of this reflex is to reduce the audibility of one’s own speech. However, loud sounds will also trigger this reflex which takes some 60–120 ms to occur, too late to protect against transients such as gunfire.

2.7 The cochlea

The cochlea, shown in Figure 2.9(a), is a tapering spiral cavity within bony walls which is filled with fluid. The widest part, near the oval window, is called the base and the distant end is the apex. Figure 2.9(b) shows that the cochlea is divided lengthwise into three volumes by Reissner’s membrane and the basilar membrane. The scala vestibuli and the scala tympani are connected by a small aperture at the apex of the cochlea known as the helicotrema. Vibrations from the stapes are transferred to the oval window and become fluid pressure variations which are relieved by the flexing of the round window. Effectively the basilar membrane is in series with the fluid motion and is driven by it except at very low frequencies where the fluid flows through the helicotrema, bypassing the basilar membrane.

Figure 2.9    (a) The cochlea is a tapering spiral cavity. (b) The cross-section of the cavity is divided by Reissner’s membrane and the basilar membrane. (c) The basilar membrane tapers so its resonant frequency changes along its length.

The vibration of the basilar membrane is sensed by the organ of Corti which runs along the centre of the cochlea. The organ of Corti is active in that it contains elements which can generate vibration as well as sense it. These are connected in a regenerative fashion so that the Q factor, or frequency selectivity of the ear, is higher than it would otherwise be. The deflection of hair cells in the organ of Corti triggers nerve firings and these signals are conducted to the brain by the auditory nerve. Some of these signals reflect the time domain, particularly during the transients with which most real sounds begin and also at low frequencies. During continuous sounds, the basilar membrane is also capable of performing frequency analysis.

Figure 2.9(c) shows that the basilar membrane is not uniform, but tapers in width and varies in thickness in the opposite sense to the taper of the cochlea. The part of the basilar membrane which resonates as a result of an applied sound is a function of the frequency. High frequencies cause resonance near to the oval window, whereas low frequencies cause resonances further away. More precisely the distance from the apex where the maximum resonance occurs is a logarithmic function of the frequency. Consequently tones spaced apart in octave steps will excite evenly spaced resonances in the basilar membrane. The prediction of resonance at a particular location on the membrane is called place theory. Essentially the basilar membrane is a mechanical frequency analyser. A knowledge of the way it operates is essential to an understanding of musical phenomena such as pitch discrimination, timbre, consonance and dissonance and to auditory phenomena such as critical bands, masking and the precedence effect.
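The consequence of a logarithmic place map can be sketched with an idealized model. The mapping below is an assumption made for illustration: it treats position as exactly proportional to the logarithm of frequency between assumed limits of 20 Hz and 20 kHz, which real cochleas follow only approximately.

```python
import math


def place_fraction(freq, f_low=20.0, f_high=20000.0):
    """Idealized position of maximum resonance: 0 at the apex, 1 at the base."""
    return math.log(freq / f_low) / math.log(f_high / f_low)


octave_tones = [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0]
places = [place_fraction(f) for f in octave_tones]
steps = [b - a for a, b in zip(places, places[1:])]

for f, p in zip(octave_tones, places):
    print(f"{f:6.0f} Hz -> {p:.3f} of the way from apex to base")
print("spacing between octaves:", [round(s, 3) for s in steps])
```

Each doubling of frequency moves the resonance by the same distance along the membrane, with high frequencies lying nearer the base.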

Nerve firings are not a perfect analog of the basilar membrane motion. On continuous tones a nerve firing appears to occur at a constant phase relationship to the basilar vibration, a phenomenon called phase locking, but firings do not necessarily occur on every cycle. At higher frequencies firings are intermittent, yet each is in the same phase relationship.

The resonant behaviour of the basilar membrane is not observed at the lowest audible frequencies below 50 Hz. The pattern of vibration does not appear to change with frequency and it is possible that the frequency is low enough to be measured directly from the rate of nerve firings.

2.8 Mental processes

The nerve impulses are processed in specific areas of the brain which appear to have evolved at different times to provide different types of information. The time domain response works quickly, primarily aiding the direction-sensing mechanism and is older in evolutionary terms. The frequency domain response works more slowly, aiding the determination of pitch and timbre and evolved later, presumably after speech evolved.

The earliest use of hearing was as a survival mechanism to augment vision. The most important aspect of the hearing mechanism was the ability to determine the location of the sound source. Figure 2.10 shows that the brain can examine several possible differences between the signals reaching the two ears. At (a) a phase shift will be apparent. At (b) the distant ear is shaded by the head resulting in a different frequency response compared to the nearer ear. At (c) a transient sound arrives later at the more distant ear. The inter-aural phase, delay and level mechanisms vary in their effectiveness depending on the nature of the sound to be located. At some point a fuzzy logic decision has to be made as to how the information from these different mechanisms will be weighted.

There will be considerable variation with frequency in the phase shift between the ears. At a low frequency such as 30 Hz, the wavelength is around 11.5 metres and so this mechanism must be quite weak at low frequencies. At high frequencies the ear spacing is many wavelengths producing a confusing and complex phase relationship. This suggests a frequency limit of around 1500 Hz which has been confirmed by experiment.
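The frequency dependence of the inter-aural phase shift can be sketched by comparing the ear spacing with the wavelength. The effective ear spacing of 0.2 metres is an assumption, and the sound is taken to arrive from directly to one side, which gives the largest possible shift.

```python
SPEED_OF_SOUND = 344.0   # m/s
EAR_SPACING = 0.2        # m, assumed effective path difference


def max_phase_difference_deg(freq):
    """Largest inter-aural phase shift, for sound arriving from one side."""
    wavelength = SPEED_OF_SOUND / freq
    return 360.0 * EAR_SPACING / wavelength


for f in (30.0, 300.0, 1500.0, 5000.0):
    print(f"{f:6.0f} Hz: wavelength {SPEED_OF_SOUND / f:7.3f} m, "
          f"phase shift up to {max_phase_difference_deg(f):7.1f} degrees")
```

At 30 Hz the shift is only a few degrees, whereas well above 1500 Hz it exceeds a full cycle and the phase becomes ambiguous.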

Figure 2.10    Having two spaced ears is cool. (a) Off-centre sounds result in phase difference. (b) Distant ear is shaded by head producing loss of high frequencies. (c) Distant ear detects transient later.

At low and middle frequencies sound will diffract round the head sufficiently well that there will be no significant difference between the level at the two ears. Only at high frequencies does sound become directional enough for the head to shade the distant ear causing what is called an inter-aural intensity difference (IID).

Phase differences are only useful at low frequencies and shading only works at high frequencies. Fortunately real-world sounds are timbral or broadband and often contain transients. Timbral, broadband and transient sounds differ from tones in that they contain many different frequencies.

A transient has a unique aperiodic waveform which, as Figure 2.10(c) shows, suffers no ambiguity in the assessment of inter-aural delay (IAD) between two versions. Note that a one-degree change in sound location causes an IAD of around 10 microseconds. The smallest detectable IAD is a remarkable 6 microseconds. This should be the criterion for spatial reproduction accuracy.
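The figure of about 10 microseconds per degree can be checked with a simple path-difference model in which the delay is d sin(θ) / c. This is only a sketch: the 0.2 metre ear spacing is an assumption and diffraction around the head is ignored.

```python
import math


def interaural_delay_us(angle_deg, ear_spacing=0.2, c=344.0):
    """Delay in microseconds for a source `angle_deg` off the centre line."""
    return ear_spacing * math.sin(math.radians(angle_deg)) / c * 1e6


def angle_for_delay_deg(delay_us, ear_spacing=0.2, c=344.0):
    """Source angle in degrees that produces a given inter-aural delay."""
    return math.degrees(math.asin(delay_us * 1e-6 * c / ear_spacing))


print(f"1 degree off centre: {interaural_delay_us(1.0):.1f} microseconds")
print(f"6 microseconds corresponds to {angle_for_delay_deg(6.0):.2f} degrees")
```

On this model the smallest detectable IAD corresponds to a shift in source position of well under one degree.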

A timbral waveform is periodic at the fundamental frequency but the presence of harmonics means that a greater number of nerve firings can be compared between the two ears. As the statistical deviation of nerve firings with respect to the incoming waveform is about 100 microseconds the only way in which an IAD of 6 microseconds can be resolved is if the timing of many nerve firings is correlated in some way in the brain.
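The scale of the correlation needed can be estimated under one simple assumption, which is not made in the text: that the timing errors of individual firings are independent, so that the uncertainty of their average falls as the square root of the number of firings.

```python
import math


def firings_needed(single_deviation_us, target_resolution_us):
    """Firings to average so the standard error reaches the target.

    Assumes independent timing errors, so the error of the mean is
    single_deviation / sqrt(N).
    """
    return math.ceil((single_deviation_us / target_resolution_us) ** 2)


n = firings_needed(100.0, 6.0)
print(f"about {n} firings must be combined")
```

If the assumption holds, a few hundred firings are required, which is consistent with the need for the brain to compare many firings rather than individual ones.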

Transient noises produce a one-off pressure step whose source is accurately and instinctively located. Figure 2.11 shows an idealized transient pressure waveform following an acoustic event. Only the initial transient pressure change is required for location. The time of arrival of the transient at the two ears will be different and will locate the source laterally within a processing delay of around a millisecond.

Following the event which generated the transient, the air pressure equalizes. The time taken for this equalization varies and allows the listener to establish the likely size of the sound source. The larger the source, the longer the pressure-equalization time. Only after this does the frequency analysis mechanism tell anything about the pitch and timbre of the sound.

Figure 2.11    Real acoustic event produces a pressure step. Initial step is used for spatial location, equalization time signifies size of source. (Courtesy Manger Schallwandlerbau.)

The above results suggest that anything in a sound reproduction system which impairs the reproduction of a transient pressure change will damage localization and the assessment of the pressure-equalization time. Clearly in an audio system which claims to offer any degree of precision, every component must be able to reproduce transients accurately and must have at least a minimum phase characteristic if it cannot be phase linear. In this respect digital audio represents a distinct technical performance advantage although much of this is lost in poor transducer design, especially in loudspeakers.

2.9 Level and loudness

At its best, the ear can detect a sound pressure variation of only 2 × 10⁻⁵ Pascals r.m.s. and so this figure is used as the reference against which sound pressure level (SPL) is measured. The sensation of loudness is a logarithmic function of SPL and consequently a logarithmic unit, the deciBel, was adopted for audio measurement. The deciBel is explained in detail in section 2.19.

The dynamic range of the ear exceeds 130 dB, but at the extremes of this range, the ear is either straining to hear or is in pain. Neither of these cases can be described as pleasurable or entertaining, and it is hardly necessary to produce audio of this dynamic range since, among other things, the consumer is unlikely to have anywhere sufficiently quiet to listen to it. On the other hand, extended listening to music whose dynamic range has been excessively compressed is fatiguing.
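The relationship between sound pressure and SPL can be sketched using the usual definition of 20 log10(p / p_ref), with the reference pressure given above. The function names are hypothetical and the atmospheric pressure figure is an assumed standard value.

```python
import math

P_REF = 2e-5                      # Pa r.m.s., the threshold of hearing
ATMOSPHERIC_PRESSURE = 101325.0   # Pa (assumed standard atmosphere)


def spl_db(pressure_pa):
    """Sound pressure level in dB relative to the threshold of hearing."""
    return 20.0 * math.log10(pressure_pa / P_REF)


def pressure_from_spl(level_db):
    """R.m.s. sound pressure in Pascals for a given SPL."""
    return P_REF * 10.0 ** (level_db / 20.0)


p_130 = pressure_from_spl(130.0)
print(f"1 Pa r.m.s. is {spl_db(1.0):.1f} dB SPL")
print(f"130 dB SPL is {p_130:.1f} Pa r.m.s.")
print(f"that is {100.0 * p_130 / ATMOSPHERIC_PRESSURE:.3f}% of atmospheric pressure")
```

Even at the upper extreme of the ear's dynamic range the pressure variation is a very small fraction of atmospheric pressure.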

The frequency response of the ear is not at all uniform and it also changes with SPL. The subjective response to level is called loudness and is measured in phons. The phon scale is defined to coincide with the SPL scale at 1 kHz, but at other frequencies the phon scale deviates because it displays the actual SPLs judged by a human subject to be equally loud as a given level at 1 kHz. Figure 2.12 shows the so-called equal loudness contours which were originally measured by Fletcher and Munson and subsequently by Robinson and Dadson. Note the irregularities caused by resonances in the meatus at about 4 kHz and 13 kHz.

Usually, people’s ears are at their most sensitive between about 2 kHz and 5 kHz, and although some people can detect 20 kHz at high level, there is much evidence to suggest that most listeners cannot tell if the upper frequency limit of sound is 20 kHz or 16 kHz.2,3 For a long time it was thought that frequencies below about 40 Hz were unimportant, but it is now clear that reproduction of frequencies down to 20 Hz improves reality and ambience.4 The generally accepted frequency range for high-quality audio is 20 Hz to 20 000 Hz, although for broadcasting an upper limit of 15 000 Hz is often applied.

Figure 2.12    Contours of equal loudness showing that the frequency response of the ear is highly level dependent (solid line, age 20; dashed line, age 60).

The most dramatic effect of the curves of Figure 2.12 is that the bass content of reproduced sound is disproportionately reduced as the level is turned down. This would suggest that if a sufficiently powerful yet high-quality reproduction system is available the correct tonal balance when playing a good recording can be obtained simply by setting the volume control to the correct level. This is indeed the case. A further consideration is that many musical instruments as well as the human voice change timbre with level and there is only one level which sounds correct for the timbre.

Audio systems with a more modest specification would have to resort to the use of tone controls to achieve a better tonal balance at lower SPL. A loudness control is one where the tone controls are automatically invoked as the volume is reduced. Although well meant, loudness controls seldom compensate accurately because they must know the original level at which the material was meant to be reproduced as well as the actual level in use. The equalization applied would have to be the difference between the equal loudness curves at the two levels.

There is no standard linking the signal level on a recording with the SPL at the microphone. The SPL resulting from a given signal level leaving a loudness control depends upon the sensitivity of the power amplifier and the loudspeakers and the acoustics of the listening room. Consequently unless set up for a particular installation, loudness controls are doomed to be inaccurate and are eschewed on high-quality equipment.

A further consequence of level-dependent hearing response is that recordings which are mixed at an excessively high level will appear bass light when played back at a normal level. Such recordings are more a product of self-indulgence than professionalism.

Loudness is a subjective reaction and is almost impossible to measure. In addition to the level-dependent frequency response problem, the listener uses the sound not for its own sake but to draw some conclusion about the source. For example, most people hearing a distant motorcycle will describe it as being loud. Clearly at the source, it is loud, but the listener has compensated for the distance.

The best that can be done is to make some compensation for the level-dependent response using weighting curves. Ideally there should be many, but in practice the A, B and C weightings were chosen where the A curve is based on the 40-phon response. The measured level after such a filter is in units of dBA. The A curve is almost always used because it most nearly relates to the annoyance factor of distant noise sources.
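The shape of the A curve can be sketched from the analytic expression commonly used in sound level meter standards. The pole frequencies (20.6, 107.7, 737.9 and 12194 Hz) and the 2.00 dB normalisation are the standard published constants rather than values from this chapter, and the function name is hypothetical.

```python
import math


def a_weighting_db(freq):
    """Gain of the A-weighting curve in dB, normalised to 0 dB at 1 kHz."""
    f2 = freq * freq
    numerator = 12194.0 ** 2 * f2 * f2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(numerator / denominator) + 2.00


for f in (31.5, 100.0, 1000.0, 4000.0, 16000.0):
    print(f"{f:8.1f} Hz: {a_weighting_db(f):6.1f} dB")
```

The curve attenuates low frequencies heavily, in keeping with the reduced sensitivity of the ear to bass at modest levels.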

2.10 Frequency discrimination

Figure 2.13 shows an uncoiled basilar membrane with the apex on the left so that the usual logarithmic frequency scale can be applied. The envelope of displacement of the basilar membrane is shown for a single frequency at (a). The vibration of the membrane in sympathy with a single frequency cannot be localized to an infinitely small area, and nearby areas are forced to vibrate at the same frequency with an amplitude that decreases with distance. Note that the envelope is asymmetrical because the membrane is tapering and because of frequency-dependent losses in the propagation of vibrational energy down the cochlea. If the frequency is changed, as in (b), the position of maximum displacement will also change. As the basilar membrane is continuous, the position of maximum displacement is infinitely variable allowing extremely good pitch discrimination of about one twelfth of a semitone which is determined by the spacing of hair cells.
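A discrimination of one twelfth of a semitone can be converted into a frequency difference. The sketch assumes equal-tempered semitones, each a frequency ratio of the twelfth root of two; the function name is hypothetical.

```python
def smallest_pitch_step_hz(freq, fraction_of_semitone=1.0 / 12.0):
    """Frequency change corresponding to a fraction of a semitone."""
    ratio = 2.0 ** (fraction_of_semitone / 12.0)
    return freq * (ratio - 1.0)


for f in (100.0, 1000.0, 5000.0):
    print(f"at {f:6.0f} Hz: about {smallest_pitch_step_hz(f):.2f} Hz")
```

Because the step is a fixed ratio, the just-detectable difference in hertz grows in proportion to the frequency.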

Figure 2.13    The basilar membrane symbolically uncoiled. (a) Single frequency causes the vibration envelope shown. (b) Changing the frequency moves the peak of the envelope.

In the presence of a complex spectrum, the finite width of the vibration envelope means that the ear fails to register energy in some bands when there is more energy in a nearby band. Within those areas, other frequencies are mechanically excluded because their amplitude is insufficient to dominate the local vibration of the membrane. Thus the Q factor of the membrane is responsible for the degree of auditory masking, defined as the decreased audibility of one sound in the presence of another.

2.11 Critical bands

The term used in psychoacoustics to describe the finite width of the vibration envelope is critical bandwidth. Critical bands were first described by Fletcher.5 The envelope of basilar vibration is a complicated function. It is clear from the mechanism that the area of the membrane involved will increase as the sound level rises. Figure 2.14 shows the bandwidth as a function of level.

As will be seen in Chapter 3, transform theory teaches that the higher the frequency resolution of a transform, the worse the time accuracy. As the basilar membrane has finite frequency resolution measured in the width of a critical band, it follows that it must have finite time resolution. This also follows from the fact that the membrane is resonant, taking time to start and stop vibrating in response to a stimulus. There are many examples of this. Figure 2.15 shows the impulse response. Figure 2.16 shows that the perceived loudness of a tone burst increases with duration up to about 200 ms due to the finite response time.

Figure 2.14    The critical bandwidth changes with SPL.

Figure 2.15    Impulse response of the ear showing slow attack and decay due to resonant behaviour.

Figure 2.16    Perceived level of tone burst rises with duration as resonance builds up.

The ear has evolved to offer intelligibility in reverberant environments which it does by averaging all received energy over a period of about 30 ms. Reflected sound which arrives within this time is integrated to produce a louder sensation, whereas reflected sound which arrives after that time can be temporally discriminated and is perceived as an echo. Microphones have no such ability, which is why acoustic treatment is often needed in areas where microphones are used.

A further example of the finite time discrimination of the ear is the fact that short interruptions to a continuous tone are difficult to detect. Finite time resolution means that masking can take place even when the masking tone begins after and ceases before the masked sound. This is referred to as forward and backward masking.6

Figure 2.17    Effective rectangular bandwidth of critical band is much wider than the resolution of the pitch discrimination mechanism.

As the vibration envelope is such a complicated shape, Moore and Glasberg have proposed the concept of equivalent rectangular bandwidth to simplify matters. The ERB is the bandwidth of a rectangular filter which passes the same power as a critical band. Figure 2.17(a) shows the expression they have derived linking the ERB with frequency. This is plotted in (b) where it will be seen that one third of an octave is a good approximation. This is about thirty times broader than the pitch discrimination also shown in (b).

Some treatments of human hearing liken the basilar membrane to a bank of fixed filters each of which is the width of a critical band. The frequency response of such a filter can be deduced from the envelope of basilar displacement as has been done in Figure 2.18. The fact that no agreement has been reached on the number of such filters should alert the suspicions of the reader. A third octave filter bank model cannot explain pitch discrimination some thirty times better. The response of the basilar membrane is centred upon the input frequency and no fixed filter can do this. However, the most worrying aspect of the fixed filter model is that according to Figure 2.18(b) a single tone would cause a response in several bands which would be interpreted as several tones. This is at variance with reality. Far from masking higher frequencies, we appear to be creating them!

Figure 2.18    (a) If the ear behaved like a fixed filter bank the filter response could be derived as shown here. (b) This theory does not hold because a single tone would cause response in several bands.

This author prefers to keep in mind how the basilar membrane is actually vibrating in response to an input spectrum. If a mathematical model of the ear is required, then it has to be described as performing a finite resolution continuous frequency transform.

2.12 Beats

Figure 2.19 shows an electrical signal (a) in which two equal sine waves of nearly the same frequency have been linearly added together. Note that the envelope of the signal varies as the two waves move in and out of phase. Clearly the frequency transform calculated to infinite accuracy is that shown at (b). The two amplitudes are constant and there is no evidence of the envelope modulation. However, such a measurement requires an infinite time. When a shorter time is available, the frequency discrimination of the transform falls and the bands in which energy is detected become broader.

Figure 2.19    (a) Result of adding two sine waves of similar frequency. (b) Spectrum of (a) to infinite accuracy. (c) With finite accuracy only a single frequency is distinguished whose amplitude changes with the envelope of (a) giving rise to beats.

When the frequency discrimination is too wide to distinguish the two tones as in (c), the result is that they are registered as a single tone. The amplitude of the single tone will change from one measurement to the next because the envelope is being measured. The rate at which the envelope amplitude changes is called a beat frequency which is not actually present in the input signal. Beats are an artifact of finite frequency resolution transforms. The fact that human hearing produces beats from pairs of tones proves that it has finite resolution.
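The envelope behaviour described above follows from the standard identity sin A + sin B = 2 cos((A - B)/2) sin((A + B)/2). The following is a minimal sketch only: the 440 Hz and 444 Hz tones and the function names are assumptions chosen for illustration and do not come from Figure 2.19.

```python
import math

def two_tone(f1, f2, t):
    """Linear sum of two equal-amplitude sine waves at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def envelope(f1, f2, t):
    """Magnitude of the slowly varying envelope of the summed signal."""
    return abs(2 * math.cos(math.pi * (f1 - f2) * t))

def beat_frequency(f1, f2):
    """Rate at which the envelope magnitude repeats, in Hz."""
    return abs(f1 - f2)

F1, F2 = 440.0, 444.0  # hypothetical example tones, not from the text

print("beat frequency (Hz):", beat_frequency(F1, F2))
print("envelope with tones in phase:", envelope(F1, F2, 0.0))
print("envelope with tones in anti-phase:", envelope(F1, F2, 0.125))
```

Neither input tone contains a component at the beat frequency itself. The beat rate only emerges when the measurement cannot separate the two tones and so follows the envelope instead.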

Measurement of when beats occur allows measurement of critical bandwidth. Figure 2.20 shows the results of human perception of a two-tone signal as the frequency difference dF changes. When dF is zero, described musically as unison, only a single note is heard. As dF increases, beats are heard, yet only a single note is perceived. The limited frequency resolution of the basilar membrane has fused the two tones together. As dF increases further, the sensation of beats ceases at 12–15 Hz and is replaced by a sensation of roughness or dissonance. The roughness is due to parts of the basilar membrane being unable to decide the frequency at which to vibrate. The regenerative effect may well become confused under such conditions. The roughness persists until dF has reached the critical bandwidth, beyond which two separate tones will be heard because there are now two discrete basilar resonances. In fact this is the definition of critical bandwidth.

Figure 2.20    Perception of two-tone signal as frequency difference changes.

2.13 Music and the ear

The characteristics of the ear, especially critical bandwidth, are responsible for the way music has evolved. Beats are used extensively in music. When tuning a pair of instruments together, a small tuning error will result in beats when both play the same nominal note. In certain pipe organs, pairs of pipes are sounded together with a carefully adjusted pitch error which results in a pleasing tremolo effect.

With certain exceptions, music is intended to be pleasing and so dissonance is avoided. Two notes which sound together in a pleasing manner are described as harmonious or consonant. Two sine waves appear consonant if they are separated by a critical bandwidth because the roughness of Figure 2.20 is avoided, but real musical instruments produce a series of harmonics in addition to the fundamental.

Figure 2.21 shows the spectrum of a harmonically rich instrument. The fundamental and the first few harmonics are separated by more than a critical band, but from the seventh harmonic more than one harmonic will be in one band and it is possible for dissonance to occur. Musical instruments have evolved to avoid the production of seventh and higher harmonics. Violins and pianos are played or designed to excite the strings at a node of the seventh harmonic to suppress this dissonance.

Harmonic distortion in audio equipment is easily detected even in minute quantities because the first few harmonics fall in non-overlapping critical bands. The sensitivity of the ear to third harmonic distortion probably deserves more attention in audio equipment than the fidelity of the dynamic range or frequency response. The ear is even more sensitive to anharmonic distortion which can be generated in poor-quality ADCs. This topic will be considered in Chapter 4.

Figure 2.21    Spectrum of a real instrument with respect to critical bandwidth. High harmonics can fall in the same critical band and cause dissonance.

When two harmonically rich notes are sounded together, the harmonics will fall within the same critical band and cause dissonance unless the fundamentals have one of a limited number of simple relationships which makes the harmonics fuse. Clearly an octave relationship is perfect.

Figure 2.22 shows some examples. In (a) two notes with the ratio (interval) 3:2 are considered. The harmonics are either widely separated or fused and the combined result is highly consonant. The interval of 3:2 is known to musicians as a perfect fifth. In (b) the ratio is 4:3. All harmonics are either at least a third of an octave apart or are fused. This relationship is known as a perfect fourth. The degree of dissonance over the range from 1:1 to 2:1 (unison to octave) was investigated by Helmholtz and is shown in (c). Note that the dissonance rises at both ends where the fundamentals are within a critical bandwidth of one another. Dissonances in the centre of the scale are where some harmonics lie within a critical bandwidth of one another. Troughs in the curve indicate areas of consonance. Many of the troughs are not very deep, indicating that the consonance is not perfect. This is because of the effect shown in Figure 2.21 in which high harmonics get closer together with respect to critical bandwidth. When the fundamentals are closer together, the harmonics will become dissonant at a lower frequency, reducing the consonance. Figure 2.22 also shows the musical terms used to describe the consonant intervals.
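The fusing of harmonics for simple intervals can be checked with a little arithmetic. The sketch below is illustrative only: the 100 Hz base frequency, the number of harmonics considered and the function names are assumptions, and integer frequencies are used so that coincident harmonics compare exactly.

```python
def harmonics(fundamental_hz, count):
    """First `count` harmonics of a note, including the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def fused(f_low_hz, f_high_hz, count):
    """Harmonics which the two notes have in common."""
    common = set(harmonics(f_low_hz, count)) & set(harmonics(f_high_hz, count))
    return sorted(common)

BASE = 100  # hypothetical common divisor in Hz

fifth = fused(2 * BASE, 3 * BASE, 6)    # 3:2 interval, notes at 200 and 300 Hz
fourth = fused(3 * BASE, 4 * BASE, 8)   # 4:3 interval, notes at 300 and 400 Hz

print("perfect fifth, fused harmonics (Hz):", fifth)
print("perfect fourth, fused harmonics (Hz):", fourth)
```

Every third harmonic of the lower note of the fifth coincides with every second harmonic of the upper note. For the fourth, the coincidence is between every fourth and every third harmonic.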

It is clear from Figure 2.22(c) that the notes of the musical scale have empirically been established to allow the maximum consonance with pairs of notes and chords. Early instruments were tuned to the just diatonic scale in exactly this way. Unfortunately the just diatonic scale does not allow changes of key because the notes are not evenly spaced. A key change is where the frequency of every note in a piece of music is multiplied by a constant, often to bring the accompaniment within the range of a singer. In continuously tuned instruments such as the violin and the trombone this is easy, but with fretted or keyboard instruments such as a piano there is a problem.

Figure 2.22    (a) Perfect fifth with a frequency ratio of 3:2 is consonant because harmonics are either in different critical bands or are fused. (b) Perfect fourth achieves the same result with 4:3 frequency ratio. (c) Degree of dissonance over range from 1:1 to 2:1.

The equal-tempered scale is a compromise between consonance and key changing. The octave is divided into twelve equal intervals called tempered semitones. On a keyboard, seven of the keys are white and produce notes very close to those of the just diatonic scale, and five of the keys are black. Music can be transposed in semitone steps by using the black keys.
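The size of the compromise can be estimated numerically. In the sketch below the tempered semitone is taken as a frequency ratio of the twelfth root of two, which follows from dividing the octave (2:1) into twelve equal intervals. The cent (one hundredth of a tempered semitone) is a standard unit not otherwise used in this chapter, and the function names are illustrative assumptions.

```python
import math

def tempered_ratio(semitones):
    """Frequency ratio spanned by a number of tempered semitones."""
    return 2 ** (semitones / 12)

def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

# Tempered fifth (7 semitones) against the just fifth of 3:2.
fifth_error = cents(tempered_ratio(7)) - cents(3 / 2)
# Tempered fourth (5 semitones) against the just fourth of 4:3.
fourth_error = cents(tempered_ratio(5)) - cents(4 / 3)

print("tempered fifth ratio:", tempered_ratio(7))
print("fifth error (cents):", fifth_error)
print("fourth error (cents):", fourth_error)
```

The tempered fifth is about two cents flat of 3:2 and the tempered fourth about two cents sharp of 4:3, which suggests why these intervals remain acceptably consonant after tempering.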

Figure 2.23 shows an example of transposition where a scale is played in several keys.

2.14 The sensation of pitch

Frequency is an objective measure whereas pitch is the subjective near equivalent. Clearly frequency and level are independent, whereas pitch and level are not. Figure 2.24 shows the relationship between pitch and level. Place theory indicates that the hearing mechanism can sense a single frequency quite accurately as a function of the place or position of maximum basilar vibration. However, most periodic sounds and real musical instruments produce a series of harmonics in addition to the fundamental. When a harmonically rich sound is present the basilar membrane is excited at spaced locations. Figure 2.25 (a) shows all harmonics, (b) shows even harmonics predominating and (c) shows odd harmonics predominating. It would appear that our hearing is accustomed to hearing harmonics in various amounts and the consequent regular pattern of excitation. It is the overall pattern which contributes to the sensation of pitch even if individual partials vary enormously in relative level.

Figure 2.23    With a suitably tempered octave, scales can be played in different keys.

Figure 2.24    Pitch sensation is a function of level.

Figure 2.25    (a) Harmonic structure of rich sound. (b) Even harmonic predominance. (c) Odd harmonic predominance. Pitch perception appears independent of harmonic structure.

Experimental signals in which the fundamental has been removed leaving only the harmonics result in unchanged pitch perception. The pattern in the remaining harmonics is enough uniquely to establish the missing fundamental. Imagine the fundamental in (b) to be absent. Neither the second harmonic nor the third can be mistaken for the fundamental because if they were fundamentals a different pattern of harmonics would result. A similar argument can be put forward in the time domain, where the timing of phase-locked nerve firings responding to a harmonic will periodically coincide with the nerve firings of the fundamental. The ear is used to such time patterns and will use them in conjunction with the place patterns to determine the right pitch. At very low frequencies the place of maximum vibration does not move with frequency yet the pitch sensation is still present because the nerve firing frequency is used.
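The place-pattern argument can be reduced to a simple piece of arithmetic: the only fundamental consistent with a set of exact harmonics is their greatest common divisor. The sketch below shows that arithmetic only and is not offered as a model of the hearing mechanism. The partial frequencies and the function name are assumptions chosen for illustration.

```python
from functools import reduce
from math import gcd

def implied_fundamental(partials_hz):
    """Largest frequency of which every partial is an integer multiple."""
    return reduce(gcd, partials_hz)

# Harmonics 2, 3 and 4 of a 200 Hz note with the fundamental removed.
print(implied_fundamental([400, 600, 800]))
# The second harmonic alone with its own harmonics would imply 400 Hz.
print(implied_fundamental([400, 800, 1200]))
```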

Figure 2.26    Pitch discrimination fails as frequency rises. The graph shows the number of cycles needed to distinguish pitch as a function of frequency.

As the fundamental frequency rises it is difficult to obtain a full pattern of harmonics as most of them fall outside the range of hearing. The pitch discrimination ability is impaired and needs longer to operate. Figure 2.26 shows the number of cycles of excitation needed to discriminate pitch as a function of frequency. Clearly at around 5 kHz performance is failing because there are hardly any audible harmonics left. Phase locking also fails at about the same frequency. Musical instruments have evolved accordingly, with the highest notes of virtually all instruments found below 5 kHz.

2.15 Frequency response and linearity

It is a goal in high-quality sound reproduction that the timbre of the original sound shall not be changed by the reproduction process. There are two ways in which timbre can inadvertently be changed, as Figure 2.27 shows. In (a) the spectrum of the original shows a particular relationship between harmonics. This signal is passed through a system (b) which has an unequal response at different frequencies. The result is that the harmonic structure (c) has changed, and with it the timbre. Clearly a fundamental requirement for quality sound reproduction is that the response to all frequencies should be equal.

Frequency response is easily tested using sine waves of constant amplitude at various frequencies as an input and noting the output level for each frequency.

Figure 2.28 shows that another way in which timbre can be changed is by non-linearity. All audio equipment has a transfer function between the input and the output which form the two axes of a graph. Unless the transfer function is exactly straight or linear, the output waveform will differ from the input. A non-linear transfer function will cause distortion which changes the distribution of harmonics and changes timbre.

Figure 2.27    Why frequency response matters. Original spectrum at (a) determines timbre of sound. If original signal is passed through a system with deficient frequency response (b), the timbre will be changed (c).

At a real microphone placed before an orchestra a multiplicity of sounds may arrive simultaneously. The microphone diaphragm can only be in one place at a time, so the output waveform must be the sum of all the sounds. An ideal microphone connected by ideal amplification to an ideal loudspeaker will reproduce all of the sounds simultaneously by linear superimposition. However, should there be a lack of linearity anywhere in the system, the sounds will no longer have an independent existence, but will interfere with one another, changing one another’s timbre and even creating new sounds which did not previously exist. This is known as intermodulation. Figure 2.29 shows that a linear system will pass two sine waves without interference. If there is any nonlinearity, the two sine waves will intermodulate to produce sum and difference frequencies which are easily observed in the otherwise pure spectrum.
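Intermodulation is readily demonstrated numerically. The following sketch passes two sine waves through a straight transfer function and through one with a small square-law term, then measures the amplitude at the difference and sum frequencies. The tone frequencies, sampling rate, the 0.2 coefficient and all names are assumptions made for the example.

```python
import math

FS = 8000             # sampling rate in Hz (assumed)
N = 8000              # one second of samples, so integer frequencies fit exactly
F1, F2 = 1000, 1300   # hypothetical test tones in Hz

def linear(x):
    """Perfectly straight transfer function."""
    return x

def nonlinear(x):
    """Transfer function with a small square-law curvature (assumed shape)."""
    return x + 0.2 * x * x

def output(transfer):
    """Pass the sum of the two tones through the given transfer function."""
    return [
        transfer(math.sin(2 * math.pi * F1 * n / FS)
                 + math.sin(2 * math.pi * F2 * n / FS))
        for n in range(N)
    ]

def amplitude(signal, freq_hz):
    """Amplitude of one frequency component, found by correlation."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / FS)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / FS)
             for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

clean = output(linear)
dirty = output(nonlinear)

for name, sig in (("linear", clean), ("non-linear", dirty)):
    print(name,
          "difference:", round(amplitude(sig, F2 - F1), 6),
          "sum:", round(amplitude(sig, F1 + F2), 6))
```

With this particular curvature the square-law term also produces second harmonics of each tone, so harmonic distortion and intermodulation appear together.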

Figure 2.28    Non-linearity of the transfer function creates harmonics by distorting the waveform. Linearity is extremely important in audio equipment.

Figure 2.29    (a) A perfectly linear system will pass a number of superimposed waveforms without interference so that the output spectrum does not change. (b) A non-linear system causes inter-modulation where the output spectrum contains sum and difference frequencies in addition to the originals.

Figure 2.30    A sine wave is one component of a rotation. When a rotation is viewed from two places at right angles, one will see a sine wave and the other will see a cosine wave. The constant phase shift between sine and cosine is 90° and should not be confused with the time variant phase angle due to the rotation.

2.16 The sine wave

As the sine wave is so useful it will be treated here in detail. Figure 2.30 shows a constant speed rotation viewed along the axis so that the motion is circular. Imagine, however, the view from one side in the plane of the rotation. From a distance only a vertical oscillation will be observed and if the position is plotted against time the resultant waveform will be a sine wave. Geometrically it is possible to calculate the height or displacement because it is the radius multiplied by the sine of the phase angle.

The phase angle is obtained by multiplying the angular velocity ω by the time t. Note that the angular velocity is measured in radians per second whereas frequency f is measured in rotations per second or Hertz (Hz). As a radian is unit distance at unit radius (about 57°) then there are 2π radians in one rotation. Thus the phase angle at a time t is given by ωt or 2πft, and the displacement is proportional to sin ωt or sin 2πft.

Imagine a second viewer who is at right angles to the first viewer. He will observe the same waveform, but at a different time. The displacement will be given by the radius multiplied by the cosine of the phase angle. When plotted on the same graph, the two waveforms are phase-shifted with respect to one another. In this case the phase-shift is 90° and the two waveforms are said to be in quadrature. Incidentally the motions on each side of a steam locomotive are in quadrature so that it can always get started (the term used is quartering). Note that the phase angle of a signal is constantly changing with time whereas the phase-shift between two signals can be constant. It is important that these two are not confused.

The velocity of a moving component is often more important in audio than the displacement. The vertical component of velocity is obtained by differentiating the displacement. As the displacement is a sine wave, the velocity will be a cosine wave whose amplitude is proportional to frequency. In other words the displacement and velocity are in quadrature with the velocity leading. This is consistent with the velocity passing through zero as the displacement reaches a maximum and vice versa. Figure 2.31 shows the displacement, velocity and acceleration waveforms of a body executing SHM. Note that the acceleration and the displacement are always anti-phase.
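These relationships can be confirmed by differentiating numerically. In the sketch below the displacement is taken as r sin ωt, so that the velocity is rω cos ωt and the acceleration is the displacement multiplied by minus ω squared. The radius, frequency and function names are assumptions chosen for the example.

```python
import math

def displacement(radius, freq_hz, t):
    """Displacement of a body in SHM: r sin(wt)."""
    return radius * math.sin(2 * math.pi * freq_hz * t)

def velocity(radius, freq_hz, t):
    """First derivative of displacement: r w cos(wt)."""
    omega = 2 * math.pi * freq_hz
    return radius * omega * math.cos(omega * t)

def acceleration(radius, freq_hz, t):
    """Second derivative of displacement: -r w^2 sin(wt)."""
    omega = 2 * math.pi * freq_hz
    return -radius * omega ** 2 * math.sin(omega * t)

R, F, T = 1.0, 100.0, 0.0012   # hypothetical radius, frequency (Hz), time (s)
H = 1e-7                       # small time step for the numerical derivative

numeric_velocity = (displacement(R, F, T + H) - displacement(R, F, T - H)) / (2 * H)
print("analytic velocity:", velocity(R, F, T))
print("numerical velocity:", numeric_velocity)
print("peak velocity at 100 Hz and 200 Hz:", velocity(R, 100.0, 0.0), velocity(R, 200.0, 0.0))
```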

Figure 2.31    The displacement, velocity and acceleration of a body executing simple harmonic motion (SHM).

2.17 Root mean square measurements

Figure 2.32(a) shows that according to Ohm’s law, the power dissipated in a resistance is proportional to the square of the applied voltage. This causes no difficulty with direct current (DC), but with alternating signals such as audio it is harder to calculate the power. Consequently a unit of voltage for alternating signals was devised. Figure 2.32(b) shows that the average power delivered during a cycle must be proportional to the mean of the square of the applied voltage. Since power is proportional to the square of applied voltage, the same power would be dissipated by a DC voltage whose value was equal to the square root of the mean of the square of the AC voltage. Thus the Volt rms (root mean square) was specified. An AC signal of a given number of Volts rms will dissipate exactly the same amount of power in a given resistor as the same number of Volts DC.

Figure 2.33(a) shows that for a sine wave the rms voltage is obtained by dividing the peak voltage Vpk by the square root of two. However, for a square wave (b) the rms voltage and the peak voltage are the same. Most moving coil AC voltmeters only read correctly on sine waves, whereas many electronic meters incorporate a true rms calculation.
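A true rms calculation is no more than the name suggests: square each sample, take the mean and then the square root. The following is a minimal sketch over one cycle of each waveform. The number of samples per cycle and the names are assumptions made for illustration.

```python
import math

def rms(samples):
    """Square root of the mean of the squares of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

N = 1000     # samples per cycle (assumed)
PEAK = 1.0   # peak voltage in volts (assumed)

sine = [PEAK * math.sin(2 * math.pi * n / N) for n in range(N)]
square = [PEAK if n < N // 2 else -PEAK for n in range(N)]

print("sine rms:", rms(sine))       # close to PEAK divided by the square root of two
print("square rms:", rms(square))   # equal to PEAK
```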

Figure 2.32    (a) Ohm’s law: the power developed in a resistor is proportional to the square of the voltage. Consequently, 1 mW in 600 Ω requires 0.775 V. With a sinusoidal alternating input (b), the power is a sine squared function which can be averaged over one cycle. A DC voltage which would deliver the same power has a value which is the square root of the mean of the square of the sinusoidal input.

Figure 2.33    (a) For a sine wave the conversion factor from peak to rms is 1/√2. (b) For a square wave the peak and rms voltage is the same.

On an oscilloscope it is often easier to measure the peak-to-peak voltage which is twice the peak voltage. The rms voltage cannot be measured directly on an oscilloscope since it depends on the waveform although the calculation is simple in the case of a sine wave.

2.18 The deciBel

The first audio signals to be transmitted were on telephone lines. Where the wiring is long compared to the electrical wavelength (not to be confused with the acoustic wavelength) of the signal, a transmission line exists in which the distributed series inductance and the parallel capacitance interact to give the line a characteristic impedance. In telephones this turned out to be about 600 Ω. In transmission lines the best power delivery occurs when the source and the load impedance are the same; this is the process of matching.

It was often required to measure the power in a telephone system, and one milliwatt was chosen as a suitable unit. Thus the reference against which signals could be compared was the dissipation of one milliwatt in 600 Ω. Figure 2.32(a) shows that the dissipation of 1 mW in 600 Ω will be due to an applied voltage of 0.775 V rms. This voltage became the reference against which all audio levels are compared.
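The reference voltage follows directly from the power relationship P = V²/R rearranged as V = √(PR). A minimal sketch, in which the function name is an illustrative assumption:

```python
import math

def voltage_for_power(power_w, resistance_ohm):
    """RMS voltage needed to dissipate a given power in a resistance."""
    return math.sqrt(power_w * resistance_ohm)

REFERENCE_V = voltage_for_power(1e-3, 600.0)   # 1 mW in 600 ohms
print("reference voltage (V rms):", round(REFERENCE_V, 4))
```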

The deciBel is a logarithmic measuring system and has its origins in telephony7 where the loss in a cable is a logarithmic function of the length. Human hearing also has a logarithmic response with respect to sound pressure level (SPL). In order to relate to the subjective response audio signal level measurements have also to be logarithmic and so the deciBel was adopted for audio.

Figure 2.34    (a) The logarithm of a number is the power to which the base (in this case 10) must be raised to obtain the number. (b) Multiplication is obtained by adding logs, division by subtracting. (c) The slide rule has two logarithmic scales whose length can easily be added or subtracted.

Figure 2.34 shows the principle of the logarithm. To give an example, if it is clear that 10² is 100 and 10³ is 1000, then there must be a power between 2 and 3 to which 10 can be raised to give any value between 100 and 1000. That power is the logarithm to base 10 of the value, e.g. log₁₀ 300 = 2.5 approx. Note that 10⁰ is 1.

Logarithms were developed by mathematicians before the availability of calculators or computers to ease calculations such as multiplication, squaring, division and extracting roots. The advantage is that, armed with a set of log tables, multiplication can be performed by adding and division by subtracting. Figure 2.34 shows some examples. It will be clear that squaring a number is performed by adding two identical logs and the same result will be obtained by multiplying the log by 2.

The slide rule is an early calculator which consists of two logarithmically engraved scales in which the length along the scale is proportional to the log of the engraved number. By sliding the moving scale two lengths can easily be added or subtracted and as a result multiplication and division are readily obtained.

Figure 2.35    (a) The Bel is the log of the ratio between two powers, that to be measured and the reference. The Bel is too large so the deciBel is used in practice. (b) As the dB is defined as a power ratio, voltage ratios have to be squared. This is conveniently done by doubling the logs so the ratio is now multiplied by 20.

The logarithmic unit of measurement in telephones was called the Bel after Alexander Graham Bell, the inventor. Figure 2.35(a) shows that the Bel was defined as the log of the power ratio between the power to be measured and some reference power. Clearly the reference power must have a level of 0 Bels since log₁₀ 1 is 0.

The Bel was found to be an excessively large unit for practical purposes and so it was divided into 10 deciBels, abbreviated dB with a small d and a large B and pronounced deebee. Consequently the number of dB is ten times the log of the power ratio. A device such as an amplifier can have a fixed power gain which is independent of signal level and this can be measured in dB. However, when measuring the power of a signal, it must be appreciated that the dB is a ratio and to quote the number of dBs without stating the reference is about as senseless as describing the height of a mountain as 2000 without specifying whether this is feet or metres. To show that the reference is one milliwatt into 600 Ω, the units will be dB(m). In radio engineering, the dB(W) will be found which is power relative to one watt.

Although the dB(m) is defined as a power ratio, level measurements in audio are often done by measuring the signal voltage using 0.775 V as a reference in a circuit whose impedance is not necessarily 600 Ω. Figure 2.35(b) shows that as the power is proportional to the square of the voltage, the power ratio will be obtained by squaring the voltage ratio. As squaring in logs is performed by doubling, the squared term of the voltages can be replaced by multiplying the log by a factor of two. To give a result in deciBels, the log of the voltage ratio now has to be multiplied by 20.
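The two forms of the deciBel calculation can be sketched as follows. The function names are illustrative assumptions, and the references are those given in the text: one milliwatt for power and 0.775 V for voltage.

```python
import math

def db_power(power, reference_power):
    """Level of a power relative to a reference: 10 log10 of the ratio."""
    return 10 * math.log10(power / reference_power)

def db_voltage(voltage, reference_voltage):
    """Level of a voltage relative to a reference: 20 log10 of the ratio."""
    return 20 * math.log10(voltage / reference_voltage)

def dbm(power_w):
    """Power level relative to one milliwatt."""
    return db_power(power_w, 1e-3)

def db_re_0775(voltage_rms):
    """Voltage level relative to 0.775 V rms."""
    return db_voltage(voltage_rms, 0.775)

print("doubling the power:", round(db_power(2.0, 1.0), 2), "dB")
print("doubling the voltage:", round(db_voltage(2.0, 1.0), 2), "dB")
print("1 W relative to 1 mW:", round(dbm(1.0), 2), "dB(m)")
print("1.55 V relative to 0.775 V:", round(db_re_0775(1.55), 2), "dB")
```

Doubling the voltage across a fixed resistance quadruples the power, which is why the voltage form gives about 6 dB where doubling the power gives about 3 dB.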

Whilst 600 Ω matched impedance working is essential for the long distances encountered with telephones, it is quite inappropriate for analog audio wiring in a studio. The wavelength of audio in wires at 20 kHz is 15 km. Studios are built on a smaller scale than this and clearly analog audio cables are not transmission lines and their characteristic impedance is not relevant.

In professional analog audio systems impedance matching is not only unnecessary, it is also undesirable. Figure 2.36(a) shows that when impedance matching is required the output impedance of a signal source must be artificially raised so that a potential divider is formed with the load. The actual drive voltage must be twice that needed on the cable as the potential divider effect wastes 6 dB of signal level and requires unnecessarily high power supply rail voltages in equipment. A further problem is that cable capacitance can cause an undesirable HF roll-off in conjunction with the high source impedance.

In modern professional analog audio equipment, shown in Figure 2.36(b), the source has the lowest output impedance practicable. This means that any ambient interference is attempting to drive what amounts to a short circuit and can only develop very small voltages. Furthermore shunt capacitance in the cable has very little effect. The destination has a somewhat higher impedance (generally a few kΩ) to avoid excessive currents flowing and to allow several loads to be placed across one driver.
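The loss in the potential divider formed by source and load can be estimated with a short calculation. In this sketch the matched case uses 600 Ω for both impedances, whereas the 50 Ω source and 10 kΩ load of the second case are purely hypothetical values standing in for "lowest practicable" and "a few kΩ". Cable resistance and capacitance are ignored.

```python
import math

def divider_loss_db(source_ohm, load_ohm):
    """Level change caused by the source/load potential divider, in dB."""
    ratio = load_ohm / (source_ohm + load_ohm)
    return 20 * math.log10(ratio)

matched = divider_loss_db(600.0, 600.0)    # traditional matched working
modern = divider_loss_db(50.0, 10_000.0)   # hypothetical low-Z source, high-Z load

print("matched:", round(matched, 2), "dB")
print("low source impedance:", round(modern, 3), "dB")
```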

Figure 2.36    (a) Traditional impedance matched source wastes half the signal voltage in the potential divider due to the source impedance and the cable. (b) Modern practice is to use low-output impedance sources with high-impedance loads.

In the absence of a fixed impedance it is meaningless to consider power. Consequently only signal voltages are measured. The reference remains at 0.775 V, but power and impedance are irrelevant. Voltages measured in this way are expressed in dB(u), the most common unit of level in modern analog systems. Most installations boost the signals on interface cables by 4 dB. As the gain of receiving devices is reduced by 4 dB, the result is a useful noise advantage without risking distortion due to the drivers having to produce high voltages.

In order to make the difference between dB(m) and dB(u) clear, consider the lossless matching transformer shown in Figure 2.37. The turns ratio is 2:1 therefore the impedance matching ratio is 4:1. As there is no loss in the transformer, the input power is the same as the output power so that the transformer shows a gain of 0 dB(m). However, the turns ratio of 2:1 provides a voltage gain of 6 dB(u). The doubled output voltage will deliver the same power to the quadrupled load impedance.
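The transformer example can be worked through numerically. The sketch below assumes, purely for illustration, an input of 0.775 V across 600 Ω. The output voltage is doubled and the load impedance quadrupled as described above.

```python
import math

def power(voltage_rms, resistance_ohm):
    """Power dissipated in a resistance by an rms voltage."""
    return voltage_rms * voltage_rms / resistance_ohm

V_IN, R_IN = 0.775, 600.0              # assumed input conditions
V_OUT, R_OUT = 2 * V_IN, 4 * R_IN      # doubled voltage, quadrupled impedance

power_gain_db = 10 * math.log10(power(V_OUT, R_OUT) / power(V_IN, R_IN))
voltage_gain_db = 20 * math.log10(V_OUT / V_IN)

print("power gain:", round(power_gain_db, 2), "dB")
print("voltage gain:", round(voltage_gain_db, 2), "dB")
```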

Figure 2.37    A lossless transformer has no power gain so the level in dB(m) on input and output is the same. However, there is a voltage gain when measurements are made in dB(u).

In a complex system signals may pass through a large number of processes, each of which may have a different gain. If one stays in the linear domain and measures the input level in volts rms, the output level will be obtained by multiplying by the gains of all the stages involved. This is a complex calculation.

The difference between the signal level with and without the presence of a device in a chain is called the insertion loss measured in dB. However, if the input is measured in dB(u), the output level of the first stage can be obtained by adding the insertion loss in dB. This is shown in Figure 2.38. The output level of the second stage can be obtained by further adding the loss of the second stage in dB and so on. The final result is obtained by adding together all the insertion losses in dB and adding them to the input level in dB(u) to give the output level in dB(u). As the dB is a pure ratio it can multiply anything (by addition of logs) without changing the units. Thus dB(u) of level added to dB of gain are still dB(u).
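The convenience of working in dB can be seen by carrying out the same calculation both ways. The stage gains below are hypothetical figures chosen for the example and are not taken from Figure 2.38.

```python
import math

def db_to_ratio(gain_db):
    """Convert a voltage gain in dB to a linear voltage ratio."""
    return 10 ** (gain_db / 20)

def ratio_to_db(ratio):
    """Convert a linear voltage ratio to dB."""
    return 20 * math.log10(ratio)

STAGE_GAINS_DB = [-3.0, 10.0, -1.5]   # hypothetical gains (negative values are losses)
INPUT_LEVEL_DB = 0.0                  # input level relative to 0.775 V
REFERENCE_V = 0.775

# Logarithmic domain: simply add the gains to the input level.
output_level_db = INPUT_LEVEL_DB + sum(STAGE_GAINS_DB)

# Linear domain: multiply the input voltage by every stage gain in turn.
voltage = REFERENCE_V * db_to_ratio(INPUT_LEVEL_DB)
for gain_db in STAGE_GAINS_DB:
    voltage *= db_to_ratio(gain_db)
output_level_db_linear = ratio_to_db(voltage / REFERENCE_V)

print("by addition:", output_level_db)
print("by multiplication:", round(output_level_db_linear, 6))
```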

Figure 2.38    In complex systems each stage may have voltage gain measured in dB. By adding all of these gains together and adding to the input level in dB(u), the output level in dB(u) can be obtained.

In acoustic measurements, the sound pressure level (SPL) is measured in deciBels relative to a reference pressure of 2 × 10⁻⁵ Pascals (Pa) rms. In order to make the reference clear the units are dB(SPL). In measurements which are intended to convey an impression of subjective loudness, a weighting filter is used prior to the level measurement which reproduces the frequency response of human hearing which is most sensitive in the midrange. The most common standard frequency response is the so-called A-weighting filter, hence the term dB(A) used when a weighted level is being measured. At high or low frequencies, a lower reading will be obtained in dB(A) than in dB(SPL).
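An unweighted SPL figure can be obtained from an rms pressure as sketched below. Sound pressure is treated like voltage, so the factor is 20 rather than 10, since acoustic power is proportional to the square of the pressure. The function name is an illustrative assumption and no weighting filter is modelled.

```python
import math

P_REF = 2e-5   # reference pressure in pascals rms

def db_spl(pressure_pa_rms):
    """Unweighted sound pressure level relative to 2 x 10^-5 Pa rms."""
    return 20 * math.log10(pressure_pa_rms / P_REF)

print("reference pressure:", db_spl(2e-5), "dB(SPL)")
print("1 Pa rms:", round(db_spl(1.0), 1), "dB(SPL)")
```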

2.19 Audio level metering

There are two main reasons for having level meters in audio equipment: to line up or adjust the gain of equipment, and to assess the amplitude of the program material.

Line-up is often done using a 1 kHz sine wave generated at an agreed level such as 0 dB(u). If a receiving device does not display the same level, then its input sensitivity must be adjusted. Tape recorders and other devices which pass signals through are usually lined up so that their input and output levels are identical, i.e. their insertion loss is 0 dB. Lineup is important in large systems because it ensures that inadvertent level changes do not occur.

In measuring the level of a sine wave for the purposes of line-up, the dynamics of the meter are of no consequence, whereas on program material the dynamics matter a great deal. The simplest (and cheapest) level meter is essentially an AC voltmeter with a logarithmic response. As the ear is logarithmic, the deflection of the meter is roughly proportional to the perceived volume, hence the term volume unit (VU) meter.

In audio recording and broadcasting, the worst sin is to overmodulate the tape or the transmitter by allowing a signal of excessive amplitude to pass. Real audio signals are rich in short transients which pass before the sluggish VU meter responds. Consequently the VU meter is also called the virtually useless meter in professional circles.

Broadcasters developed the peak program meter (PPM) which is also logarithmic, but which is designed to respond to peaks as quickly as the ear responds to distortion. Consequently the attack time of the PPM is carefully specified. If a peak is so short that the PPM fails to indicate its true level, the resulting overload will also be so brief that the ear will not hear it. A further feature of the PPM is that the decay time of the meter is very slow, so that any peaks are visible for much longer and the meter is easier to read because the meter movement is less violent.
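The difference in dynamics can be sketched with a simple envelope follower having separate attack and decay coefficients. The coefficients below are illustrative assumptions only and are not the standardized PPM or VU ballistics; the point is merely that a fast attack catches a short transient which a sluggish meter largely misses, and that a slow decay holds the reading.

```python
def meter_envelope(samples, attack, decay):
    """Follow the rectified signal with a one-pole smoother.

    attack and decay are per-sample coefficients in (0, 1];
    a larger value means a faster response.
    """
    level = 0.0
    readings = []
    for sample in samples:
        rectified = abs(sample)
        coeff = attack if rectified > level else decay
        level += coeff * (rectified - level)
        readings.append(level)
    return readings


# A five-sample full-scale transient surrounded by silence.
signal = [0.0] * 20 + [1.0] * 5 + [0.0] * 200

# Hypothetical coefficients: slow both ways versus fast attack, slow decay.
sluggish = meter_envelope(signal, attack=0.02, decay=0.02)
peak_reading = meter_envelope(signal, attack=0.6, decay=0.005)

print(max(sluggish))      # the slow meter barely registers the transient
print(max(peak_reading))  # the fast-attack meter reads close to full scale
print(peak_reading[74])   # fifty samples later the peak is still displayed
```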

Figure 2.39    Some of the scales used in conjunction with the PPM dynamics. (After Francis Rumsey, with permission.)

The original PPM as developed by the BBC was sparsely calibrated, but other users have adopted the same dynamics and added dB scales. Figure 2.39 shows some of the scales in use.

In broadcasting, the use of level metering and line-up procedures ensures that the level experienced by the listener does not change significantly from program to program. Consequently in a transmission suite, the goal would be to broadcast recordings at a level identical to that which was obtained during production. However, when making a recording prior to any production process, the goal would be to modulate the recording as fully as possible without clipping as this would then give the best signal-to-noise ratio. The level would then be reduced if necessary in the production process.

2.20 Vectors

Often several signals of the same frequency but with differing phases need to be added. When the two phases are identical, the amplitudes are simply added. When the two phases are 180° apart the amplitudes are subtracted. When there is an arbitrary phase relationship, vector addition is needed. A vector is simply an arrow whose length represents the amplitude and whose direction represents the phase-shift. Figure 2.40 shows a vector diagram showing the phase relationship in the common three-phase electricity supply. The length of each vector represents the phase-to-neutral voltage which in many countries is about 230 V rms. As each phase is at 120° from the next, what will the phase-to-phase voltage be? Figure 2.40 shows that the answer can be found geometrically to be about 400 V rms, which is √3 times the phase-to-neutral voltage. Consequently whilst a phase-to-neutral shock is not recommended, getting a phase-to-phase shock is recommended even less!

Figure 2.40    Three-phase electricity uses three signals mutually at 120°. Thus the phase-to-phase voltage has to be calculated vectorially from the phase-to-neutral voltage as shown.
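The geometric construction can be checked numerically by treating each vector as a complex number. This is only a sketch; the helper name `phasor` is an assumption of this example, and the 230 V rms phase-to-neutral figure is the one used in the text.

```python
import cmath
import math


def phasor(amplitude, phase_deg):
    """Represent a sinusoid as a complex number (length = amplitude, angle = phase)."""
    return cmath.rect(amplitude, math.radians(phase_deg))


V_PHASE_TO_NEUTRAL = 230.0  # V rms

a = phasor(V_PHASE_TO_NEUTRAL, 0)
b = phasor(V_PHASE_TO_NEUTRAL, -120)
c = phasor(V_PHASE_TO_NEUTRAL, 120)

# The phase-to-phase voltage is the vector difference between two phases.
v_phase_to_phase = abs(a - b)
print(v_phase_to_phase)  # 230 x sqrt(3), a little under 400 V rms

# The special cases: identical phases add, opposite phases cancel.
in_phase = abs(phasor(1.0, 0) + phasor(1.0, 0))
anti_phase = abs(phasor(1.0, 0) + phasor(1.0, 180))
print(in_phase, anti_phase)
```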

Figure 2.41    The possibility of a phase-to-phase shock is reduced in suburban housing by rotating phases between adjacent buildings. This also balances the loading on the three phases.

The three-phase electricity supply has the characteristic that although each phase passes through zero power twice a cycle, the total power is constant. This results in less vibration at the generator and in large motors. When a three-phase system is balanced (i.e. there is an equal load on each phase) there is no neutral current. Figure 2.41 shows that in most suburban power installations each house only has a single-phase supply for safety. The houses are connected in rotation to balance the load. Business premises such as recording studios and broadcasters will take a three-phase supply which should be reasonably balanced by connecting equal loading to each phase.
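Both properties of a balanced system can be verified with a short numerical sketch. Unit peak voltages and identical unit resistive loads are assumed, so that each phase current equals its voltage and each phase power is the voltage squared.

```python
import math

PHASE_SHIFTS_DEG = (0, 120, 240)


def phase_voltage(cycle_fraction, shift_deg):
    """Instantaneous voltage of one phase (unit peak) at a fraction of a cycle."""
    return math.sin(2 * math.pi * cycle_fraction + math.radians(shift_deg))


def total_power(cycle_fraction):
    """Sum of the instantaneous powers in three equal unit resistive loads."""
    return sum(phase_voltage(cycle_fraction, s) ** 2 for s in PHASE_SHIFTS_DEG)


def neutral_current(cycle_fraction):
    """Sum of the three phase currents, which returns via the neutral."""
    return sum(phase_voltage(cycle_fraction, s) for s in PHASE_SHIFTS_DEG)


points = [k / 100 for k in range(100)]
single_phase = [phase_voltage(t, 0) ** 2 for t in points]
totals = [total_power(t) for t in points]
neutrals = [neutral_current(t) for t in points]

print(min(single_phase), max(single_phase))  # one phase swings between 0 and 1
print(min(totals), max(totals))              # the total stays at 1.5
print(max(abs(n) for n in neutrals))         # the neutral current is zero
```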

2.21 Phase angle and power factor

The power is only obtained by multiplying the voltage by the current when the load is resistive. Only with a resistive load will the voltage and the current be in the same phase. In both electrical and audio power distribution systems, the load may be reactive which means that the current and voltage waveforms have a relative phase-shift. Mathematicians would describe the load as complex.

Figure 2.42    Ideal capacitors conduct current with a quadrature phase lead whereas inductors have a quadrature phase lag. In both cases the quadrature makes the product of current and voltage zero so no power is dissipated.

In a reactive load, the power in Watts W is given by multiplying the rms voltage, the rms current and the cosine of the relative phase angle φ. Clearly if the voltage and current are in quadrature there can be no power dissipated because cos φ is zero. Cos φ is called the power factor. Figure 2.42 shows that this happens with perfect capacitors and perfect inductors connected to an AC supply. With a perfect capacitor, the current leads the voltage by 90°, whereas with a perfect inductor the current lags the voltage by 90°.
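The relationship can be expressed as a one-line function. The voltage and current values below are arbitrary figures chosen for illustration.

```python
import math


def real_power(v_rms, i_rms, phase_deg):
    """Power in Watts: rms voltage x rms current x cosine of the phase angle."""
    return v_rms * i_rms * math.cos(math.radians(phase_deg))


print(real_power(230.0, 2.0, 0))   # resistive load: the full 460 W
print(real_power(230.0, 2.0, 60))  # power factor 0.5: half the power
print(real_power(230.0, 2.0, 90))  # quadrature: effectively zero
```

The sign of the phase angle does not matter here because the cosine is the same for a lead or a lag.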

A power factor significantly less than one is undesirable because it means that larger currents are flowing than are necessary to deliver the power. The resistive losses in distribution are proportional to the square of the current and so a reactive load is an inefficient load. Lightly loaded transformers and induction motors act as inductive loads with a poor power factor. In some industrial installations it is economic to install power factor correction units which are usually capacitor banks that balance the lagging inductive load with a capacitive lead.
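The penalty of a poor power factor can be put into figures with a small sketch. The load power, supply voltage and line resistance are hypothetical values, and the distribution loss is modelled simply as the square of the current multiplied by the line resistance.

```python
def line_current(power_w, v_rms, power_factor):
    """rms current needed to deliver a given real power at a given power factor."""
    return power_w / (v_rms * power_factor)


def line_loss(current_a, line_resistance_ohm):
    """Power dissipated in the distribution wiring (I squared R)."""
    return current_a ** 2 * line_resistance_ohm


LOAD_W = 2300.0        # hypothetical real power demanded by the load
SUPPLY_V = 230.0       # V rms
LINE_RESISTANCE = 0.5  # hypothetical wiring resistance in ohms

for pf in (1.0, 0.5):
    current = line_current(LOAD_W, SUPPLY_V, pf)
    print(pf, current, line_loss(current, LINE_RESISTANCE))
```

Halving the power factor doubles the current for the same delivered power and quadruples the loss in the wiring.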

As the power factor of a load cannot be anticipated, AC distribution equipment is often rated in Volt-Amps (VA) instead of Watts. With a resistive load, the two are identical, but with a reactive load the power which can be delivered falls. As loudspeakers are almost always reactive, audio amplifiers should be rated in VA. Instead amplifiers are rated in Watts leaving the unfortunate user to find out for himself what reactive load can be driven.

2.22 Audio cabling

Balanced line working was developed for professional analog audio as a means to reject noise. This is particularly important for microphone signals because of the low levels, but is also important for both line level analog and digital signals where interference may be encountered from electrical and radio installations. Figure 2.43 shows how balanced audio should be connected. The receiver subtracts one input from the other which rejects any common mode noise or hum picked up on the wiring. Twisting the wires tightly together ensures that both pick up the same amount of interference.

Figure 2.43    Balanced analog audio interface. Note that the braid plays no part in transmitting the audio signal, but bonds the two units together and acts as a screen. Loop currents flowing in the screen are harmless.

Figure 2.44    In star-quad cabling each leg of the balanced signal is connected to two conductors which are on opposite sides of a four-phase helix. The pickup of interference on the two legs is then as equal as possible so that the differential receiver can reject it.

The star-quad technique is possibly the ultimate interference rejecting cable construction. Figure 2.44 shows that in star-quad cable four conductors are twisted together. Diametrically opposite pairs are connected together at both ends of the cable and used as the two legs of a differential system. The interference pickup on the two legs is rendered as identical as possible by the construction so that it can be perfectly rejected at a well-engineered differential receiver.
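The action of the differential receiver can be sketched numerically. The example assumes a symmetrically driven line in which one leg carries the signal and the other carries its inverse; the signal and interference waveforms are arbitrary. It also shows why the construction of the cable matters: if the two legs pick up slightly different amounts of interference, some of it survives the subtraction.

```python
import math

N = 48
signal = [0.1 * math.sin(2 * math.pi * k / N) for k in range(N)]
hum = [0.5 * math.sin(2 * math.pi * 3 * k / N + 1.0) for k in range(N)]

# Ideal case: both legs pick up exactly the same interference.
live = [s + h for s, h in zip(signal, hum)]
ret = [-s + h for s, h in zip(signal, hum)]
received = [(a - b) / 2 for a, b in zip(live, ret)]
error_matched = max(abs(r - s) for r, s in zip(received, signal))

# Mismatched case: one leg picks up 10 per cent less interference.
ret_mismatched = [-s + 0.9 * h for s, h in zip(signal, hum)]
received_mismatched = [(a - b) / 2 for a, b in zip(live, ret_mismatched)]
error_mismatched = max(abs(r - s) for r, s in zip(received_mismatched, signal))

print(error_matched)     # essentially zero: the hum is rejected
print(error_mismatched)  # a residue of the hum remains
```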

The standard connector which has been used for professional audio for many years is the XLR which has three pins. It is easy to remember that pins 1, 2 and 3 connect to eXternal, Live and Return respectively. EXternal is the cable screen, Live is the in-phase leg of the balanced signal and Return is self-explanatory. The metal body shell of the XLR connector should be connected to both the cable screen and pin 1 although cheaper connectors do not provide a tag for the user to make this connection and rely on contact with the chassis socket to ground the shell. Oddly, the male connector (the one with pins) is used for equipment signal outputs, whereas the female (the one with receptacles) is used with signal inputs. This is so that when phantom power is used, the live parts are insulated.

Figure 2.45    (a) Unbalanced consumer equipment cannot be protected from hum loops because the signal return and the screen are the same conductor. (b) With a floating signal source there will be no current in the screen. Source must be double insulated for safety.

When making balanced cables it is important to ensure that the twisted pair is connected identically at both ends. If the two wires are inadvertently interchanged, the cable will still work, but a phase reversal will result, causing problems in analog stereo installations. Digital cables are unaffected by a phase reversal.
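One symptom of such a reversal is easily demonstrated. In the sketch below a centrally placed sound, identical in both channels, is summed to mono; the sample values are hypothetical. With one channel inverted the sum cancels completely.

```python
left = [0.5, -0.25, 0.75, 0.0]  # hypothetical sample values

right_correct = list(left)             # central image: identical channels
right_reversed = [-x for x in left]    # same channel after a wiring reversal

mono_correct = [(l + r) / 2 for l, r in zip(left, right_correct)]
mono_reversed = [(l + r) / 2 for l, r in zip(left, right_reversed)]

print(mono_correct)   # the signal survives the mono sum
print(mono_reversed)  # the signal cancels
```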

In consumer equipment differential working is considered too expensive. Instead single-ended analog signals using coax cable are found using phono, DIN and single-pole jack connectors. Whilst these are acceptable in permanent installations, they will not stand repeated connection and disconnection and become unreliable.

Effective unbalanced transmission of analog or digital signals over long distances is very difficult. When the signal return, the chassis ground and the safety ground are one and the same as in Figure 2.45(a), ground loop currents cannot be rejected. The only solution is to use equipment which is double insulated so that no safety ground is needed. Then each item can be grounded by the coax screen. As Figure 2.45(b) shows, there can then be no ground current as there is no loop. However, unbalanced working also uses higher impedances and lower signal levels and is more prone to interference. For these reasons some better-quality consumer equipment will be found using balanced signals.

2.23 EMC

EMC stands for electromagnetic compatibility which is a way of making electronic equipment more reliable by limiting both the amount of spurious energy radiated and the sensitivity to extraneous radiation. As electronic equipment becomes more common and more of our daily life depends upon its correct operation it becomes important to contain the unwanted effects of interference.

In audio equipment external interference can cause unwanted signals to be superimposed on the wanted audio signal. This is most likely to happen in sensitive stages handling small signals; e.g. microphone preamplifiers and tape replay stages. Interference can enter through any cables, including the power cable, or by radiation. Such stages must be designed from the outset with the idea that radio frequency energy may be present which must be rejected. Whilst one could argue that RF energy from an arcing switch should be suppressed at source one cannot argue that cellular telephones should be banned as they can only operate using RF radiation. When designing from the outset, RF rejection is not too difficult. Putting it in afterwards is often impossible without an uneconomic redesign.

There have been some complaints from the high-end Hi-Fi community that the necessary RF suppression components will impair the sound quality of audio systems, but this is nonsense. In fact good EMC design actually improves sound quality because, by eliminating common impedances which pick up interference, distortion is also reduced.

In balanced signalling the screen does not carry the audio, but serves to extend the screened cabinets of the two pieces of equipment with a metallic tunnel. For this to be effective against RF interference it has to be connected at both ends. This is also essential for electrical safety so that no dangerous potential difference can build up between the units. Figure 2.43 showed that connecting the screen at both ends causes an earth loop with the building ground wiring. Loop currents will circulate as shown but this is not a problem because by shunting loop currents into the screen, they are kept out of the audio wiring.

Some poorly designed equipment routes the X-pin of the XLR via the pcb instead of direct to the equipment frame. As Figure 2.46 shows, this effectively routes loop currents through the circuitry and is prone to interference. This approach does not comply with recent EMC regulations but there is a lot of old equipment still in service which could be put right. A simple track cut and a new chassis-bonded XLR socket is often all that is necessary. Another false economy is the use of plastic XLR shells which cannot provide continuity of screening.

Figure 2.46    Poorly designed product in which screen currents pass to chassis via circuit board. Currents flowing in ground lead will raise voltages which interfere with the audio signal.

Differential working with twisted pairs is designed to reject hum and noise, but it only works properly if both signal legs have identical frequency/impedance characteristics at both ends. The easiest way of achieving this is to use transformers which give much better RF rejection than electronic balancing. Whilst an effective electronic differential receiver can be designed with care, a floating balanced electronic driver cannot compete with a transformer. An advantage of digital audio is that these transformers can be very small and inexpensive, whereas a high-quality analog transformer is very expensive indeed.

Analog audio equipment works at moderate frequencies and seldom has a radiation problem. However, any equipment controlled by a microprocessor or containing digital processing is a potential source of interference and once more steps must be taken in the design stage to ensure that radiation is minimized. It should be borne in mind that poor layout may result in radiation from the digital circuitry actually impairing the performance of analog circuits in the same device. This is a critical issue in convertor design. It is consequently most unlikely that a convertor which could not meet the EMC regulations would meet its audio quality specification.

All AC-powered audio devices contain some kind of power supply which rectifies the AC line to provide DC power. This rectification process is non-linear and can produce harmonics which leave the equipment via the power cable and cause interference elsewhere. Suitable power cord filtering must be provided to limit the harmonics which escape in this way.

2.24 Electrical safety

Under fault conditions an excess of current can flow and the resulting heat can cause fire. Practical equipment must be fused so that excessive current causes the fuse element to melt, cutting off the supply. In many electronic devices the initial current exceeds the steady current because capacitors need to charge. Safe fusing requires the use of slow-blow fuses which have increased thermal mass. The switch-on surge will not blow the fuse, whereas a steady current of the same value will. Slow blow fuses can be identified by the (T) after the rating, e.g. 3.15 A(T). Blown fuses should only be replaced with items of the same type and rating. Fuses do occasionally blow from old age, but any blown fuse should be regarded as indicating a potential problem. Replacing a fuse with one of a higher rating is the height of folly as no protection against fire is available. When dual-voltage 115/230 equipment is set to a different range a different fuse will often be necessary. In some small power supplies the power taken is small and the use of a fuse is not practicable. Instead a thermal switch is built into the transformer. In the case of overheating this will melt. Generally these switches are designed to work once only after which the transformer must be replaced.

Figure 2.47    For electrical safety the metallic housing of equipment is connected to ground, preventing a dangerous potential existing in the case of a fault.

Except for low-voltage battery-powered devices, electrically powered equipment has to be considered a shock hazard to the user and steps must be taken to minimize the hazard. There are two main ways in which this is achieved. Figure 2.47 shows the first, in which the equipment is made with a conductive case which is connected to earth via a third conductor. In the event that a fault causes live wiring to contact the case, current is conducted to ground which will blow the fuse. Clearly disconnecting the earth for any purpose could allow the case to become live. The alternative is to construct the equipment in such a way that live wiring physically cannot cause the body to become live. In a double-insulated product all the live components are encased in plastic so that even if a part such as a transformer or motor becomes live it is still insulated from the outside. Double-insulated devices need no safety earth.

Where there is a risk of cables being damaged or cut, earthing and double insulation are of no help because the live conductor in the cable may be exposed. Safety can be enhanced by the use of a residual current breaker (RCB) which detects any imbalance in live and neutral current. An imbalance means that current is flowing somewhere it shouldn’t and this results in the breaker cutting off the power.
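The principle of the breaker reduces to a simple comparison, sketched below. The function name and the 30 mA default threshold are assumptions made for this example; the text does not specify a trip current.

```python
def rcb_should_trip(live_current_a, neutral_current_a, trip_threshold_a=0.03):
    """Return True if the live/neutral imbalance exceeds the trip threshold.

    trip_threshold_a is an assumed figure used only for illustration.
    """
    imbalance = abs(live_current_a - neutral_current_a)
    return imbalance > trip_threshold_a


print(rcb_should_trip(10.0, 10.0))  # balanced: all current returns via neutral
print(rcb_should_trip(10.0, 9.9))   # 100 mA is going elsewhere: trip
```

Note that the load current itself is irrelevant; only the difference between the two conductors matters.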

References

1 Moore, B.C.J., An Introduction to the Psychology of Hearing, London: Academic Press (1989)
2 Muraoka, T., Iwahara, M. and Yamada, Y., Examination of audio bandwidth requirements for optimum sound signal transmission. J. Audio Eng. Soc., 29, 2–9 (1982)
3 Muraoka, T., Yamada, Y. and Yamazaki, M., Sampling frequency considerations in digital audio. J. Audio Eng. Soc., 26, 252–256 (1978)
4 Fincham, L.R., The subjective importance of uniform group delay at low frequencies. Presented at the 74th Audio Engineering Society Convention (New York, 1983), Preprint 2056(H-1)
5 Fletcher, H., Auditory patterns. Rev. Modern Physics, 12, 47–65 (1940)
6 Carterette, E.C. and Friedman, M.P., Handbook of Perception, 305–319. New York: Academic Press (1978)
7 Martin, W.H., Decibel – the new name for the transmission unit. Bell System Tech. J., (January 1929)