Chapter 16

Two-channel stereo

This chapter covers the principles and practice of two-channel stereophonic recording and reproduction. Two-channel stereophonic reproduction (in international standard terms ‘2-0 stereo’, meaning two front channels and no surround channels) is often called simply ‘stereo’ as it is the most common way of conveying some spatial content in sound recording and reproduction. In fact ‘stereophony’ refers to any sound system that conveys three-dimensional sound images, so it is used more generically in this book and includes surround sound. In international standards describing stereo loudspeaker configurations the nomenclature for the configuration is often in the form ‘n–m stereo’, where n is the number of front channels and m is the number of rear or side channels (the latter only being encountered in surround systems). This distinction can be helpful as it reinforces the slightly different role of the surround channels as explained in the next chapter. (Readers familiar with earlier editions of this book should note that the broadcasting nomenclature of ‘A’ and ‘B’, referring to the left and right signals of a stereo pair, has been replaced in this edition by ‘L’ and ‘R’. This is in order to avoid any possible confusion with the American tradition of referring to spaced microphone pairs as ‘AB pairs’, as contrasted with ‘XY’ for coincident microphones.)

It might reasonably be supposed that the best stereo sound system would be that which reproduced the sound signal to the ears as faithfully as possible, with all the original spatial cues intact (see Chapter 2). Possibly that should be the aim, and indeed it is the aim of the so-called ‘binaural’ techniques discussed later in the chapter, but there are many stereo techniques that rely on loudspeakers for reproduction which only manage to provide some of the spatial cues to the ears. Such techniques are compromises that have varying degrees of success, and they are necessary for the simple reason that they are reasonably straightforward from a recording point of view and result in subjectively high sound quality. The results can be reproduced in anyone’s living room and are demonstrably better than mono (single-channel reproduction). Theoretical correctness is one thing, pragmatism and getting a ‘commercial sound’ is another. The history of stereo could be characterised as being something of a compromise between the two.

Stereo techniques cannot be considered from a purely theoretical point of view, nor can the theory be ignored; the key lies in a proper synthesis of theory and subjective assessment. Some techniques which have been judged subjectively to be good do not always stand up to rigorous theoretical analysis, and those which are held up as theoretically ‘correct’ are sometimes judged subjectively to be poorer than others. Part of the problem is that the mechanisms of spatial perception are not yet entirely understood. Probably more importantly, most commercial stereo reproduction uses only two loudspeakers so the listening situation already represents a serious departure from natural spatial hearing. (Real sonic experience involves sound arriving from all around the head.) The differences between two-channel stereo reproduction and natural listening may lead listeners to prefer ‘distorted’ sound fields because of other pleasing artefacts such as ‘spaciousness’. Most of the stereo techniques used today combine aspects of imaging accuracy with an attempt to give the impression of spaciousness in the sound field, and to some theorists these two are almost mutually exclusive.

It would be reasonable to surmise that in most practical circumstances, for mainstream consumer applications, one is dealing with the business of creating believable illusions. Sound recording is as much an art as a science. In other words, one needs to create the impression of natural spaces, source positions, depth, size and so on, without necessarily being able to replicate the exact sound pressure and velocity vectors that would be needed at each listening position to recreate a sound field accurately. One must remember that listeners rarely sit in the optimum listening position, and often like to move around while listening. While it may be possible to achieve greater spatial accuracy using headphone reproduction, headphones are not always a practical or desirable form of monitoring. Truly accurate soundfield reconstruction covering a wide listening area can only be achieved by using very large numbers of loudspeakers (many thousands) and this is likely to be impractical for most current purposes.

In the following chapters stereo pickup and reproduction is considered from both a theoretical and a practical point of view, recognising that theoretical rules may have to be bent or broken for operational and subjective reasons. Since the subject is far too large even to be summarised in the short space available, a list of recommended further reading is given at the end of the chapter to allow the reader greater scope for personal study.

Principles of loudspeaker stereo

Historical development

We have become used to stereo sound as a two-channel format, although a review of developments during the last century shows that two channels really only became the norm through economic and domestic necessity, and through the practical considerations of encoding directional sound easily for gramophone records and radio. A two-loudspeaker arrangement is practical in the domestic environment, is reasonably cheap to implement, and provides good phantom images for a central listening position.

Early work on directional reproduction undertaken at Bell Labs in the 1930s involved attempts to recreate the ‘sound wavefront’ which would result from an infinite number of microphone/loudspeaker channels by using a smaller number of channels, as shown in Figure 16.1(a) and (b). In all cases, spaced pressure response (omnidirectional) microphones were used, each connected via a single amplifier to the appropriate loudspeaker in the listening room. Steinberg and Snow found that when reducing the number of channels from three to two, central sources appeared to recede towards the rear of the sound stage and that the width of the reproduced sound stage appeared to be increased. They attempted to make some calculated rather than measured deductions about the way that loudness differences between the channels affected directional perception, apparently choosing to ignore the effects of time or phase difference between channels.

Images

Figure 16.1   Steinberg and Snow’s attempt to reduce the number of channels needed to convey a source wavefront to a reproduction environment with appropriate spatial features intact. (a) ‘Ideal’ arrangement involving a large number of transducers. (b) Compromise arrangement involving only three channels, relying more on the precedence effect

Some twenty years later Snow made comment on those early results, reconsidering the effects of time difference in a system with a small number of channels, since, as he pointed out, there was in fact a marked difference between the multiple-point-source configuration and the small-number-of-channels configuration. It was suggested that in fact the ‘ideal’ multi-source system re-created the original wavefront very accurately, allowing the ears to use exactly the same binaural perception mechanisms as used in the real-life sound field. The ‘wall’ of multiple loudspeakers acted as a source of spherical wavelets, re-creating a new plane wave with its virtual source in the same relative place as the original source, thus resulting in a time-of-arrival difference between the listener’s ears in the range 0–600 µs, depending on source and listener position. (This is the basis of more recent developments in ‘wave field synthesis’, developed at the University of Delft in the Netherlands, that also relies on large numbers of closely spaced channels to reconstruct sound fields accurately.)

In the two- or three-channel system, far from this simply being a sparse approximation to the ‘wavefront’ system, the ears are subjected to two or three discrete arrivals of sound, the delays between which are likely to be in excess of those normally experienced in binaural listening. In this case, the effect of directionality relies much more on the precedence effect and on the relative levels of the channels. Snow therefore begs us to remember the fundamental difference between ‘binaural’ situations and what he calls ‘stereophonic’ situations (see Fact File 16.1).

Fact file 16.1   Binaural versus ‘stereophonic’ localisation

There is a distinct difference between the spatial perception that arises when two ears detect a single wavefront (i.e. from a single source) and that which arises when two arrivals of a similar sound come from different directions and are detected by both ears. The former, shown at (a), gives rise to spatial perceptions based primarily on what is known as the ‘binaural delay’ (essentially the time-of-arrival difference that arises between the ears for the particular angle of incidence). The latter, shown at (b), gives rise to spatial perceptions based primarily on various forms of ‘precedence effect’ (or ‘law of the first wavefront’). In terms of sound reproduction, the former may be encountered in the headphone presentation context where sound source positions may be implied by using delays between the ear signals within the interaural delay of about 0.65 ms. Headphones enable the two ears to be stimulated independently of each other.

In loudspeaker listening the precedence effect is more relevant, as a rule. The precedence effect is primarily a feature of transient sounds rather than continuous sounds. In this case there are usually at least two sound sources in different places, emitting different versions of the same sound, perhaps with a time or amplitude offset to provide directional information. This is what Snow termed the ‘stereophonic’ situation. Both ears hear both loudspeakers and the brain tends to localise based on the interaural delay arising from the earliest arriving wavefront, the source appearing to come from a direction towards that of the earliest arriving signal. This effect operates over delays between the sources that are somewhat greater than the interaural delay, of the order of a few milliseconds. Similar sounds arriving within up to 50 ms of each other tend to be perceptually fused together, such that one is not perceived as an echo of the other. The time delay over which this fusing effect obtains depends on the source, with clicks tending to separate before complex sounds like music or speech. The timbre and spatial qualities of this ‘fused sound’, though, may be affected.

Images

This difference was also recognised by Alan Blumlein, whose now-famous patent specification of 1931 (accepted 1933) allows for the conversion of signals from a binaural format suitable for spaced pressure microphones to a format suitable for reproduction on loudspeakers. His patent also covers other formats of pickup which result in an approximation of the original time and phase differences at the ears when reproduced on loudspeakers. This will be discussed in more detail later on, but it is interesting historically to note how much writing on stereo reproduction even in the early 1950s appears unaware of Blumlein’s most valuable work, which appears to have been ignored for some time.

A British paper presented by Clark, Dutton and Vanderlyn (of EMI) in 1957 revives the Blumlein theories, and shows in more rigorous mathematical detail how a two-loudspeaker system may be used to create an accurate relationship between the original location of a sound source and its perceived location on reproduction. This is achieved by controlling only the relative signal amplitudes of the two loudspeakers (derived in this case from a pair of coincident figure-eight microphones). The authors discuss the three-channel system of Bell Labs, and suggest that although it produces convincing results in many listening situations it is uneconomical for domestic use. They also conclude that the two-channel simplification (using microphones spaced about ten feet apart) has a tendency to result in a ‘hole-in-the-middle’ effect (with which many modern users of spaced microphones may be familiar – sources appearing to bunch towards the left or the right leaving a hole in the centre). They concede that the Blumlein method adapted by them does not take advantage of all the mechanisms of binaural hearing, especially the precedence effect, but that they have endeavoured to take advantage of, and re-create, a few of the directional cues which exist in the real-life situation.

There is therefore a historical basis for both the spaced microphone arrangement which makes use of the time-difference precedence effect (with only moderate level differences between channels), as well as the coincident microphone technique (or any other technique which results in only level differences between channels). There is also some evidence to show that the spaced technique is more effective with three channels than with only two. Later, we shall see that spaced techniques have a fundamental theoretical flaw from the point of view of ‘correct’ imaging of continuous sounds, which has not always been appreciated, although such techniques may result in subjectively acceptable sounds. Interestingly, three front channels are the norm in cinema sound reproduction, since the central channel has the effect of stabilising the important central image for off-centre listeners, having been used ever since the Disney film Fantasia in 1940. (People have often misunderstood the intentions of Bell Labs in the 1930s, since it is not generally realised that they were working on a system suitable for auditorium reproduction with wide-screen pictures, as opposed to a domestic system.)

Creating phantom images

Based on a variety of formal research and practical experience, it has become almost universally accepted that the optimum configuration for two-loudspeaker stereo is an equilateral triangle with the listener located just to the rear of the point of the triangle (the loudspeakers forming the baseline). Wider than this, phantom images (the apparent locations of sound sources in-between the loudspeakers) become less stable, and the system is more susceptible to the effects of head rotation. This configuration gives rise to an angle subtended by the loudspeakers of ± 30° at the listening position, as shown in Figure 16.2. In most cases stereo reproduction from two loudspeakers can only hope to achieve a modest illusion of three-dimensional spatiality, since reproduction is from the front quadrant only.

Images

Figure 16.2   Optimum arrangement of two loudspeakers and listener for stereo listening

The so-called ‘summing localisation’ model of stereo reproduction suggests that the best illusion of phantom sources between the loudspeakers will be created when the sound signals present at the two ears are as similar as possible to those perceived in natural listening, or at least that a number of natural localisation cues that are non-contradictory are available. It is possible to create this illusion for sources in the angle between the loudspeakers using only amplitude differences between the loudspeakers, where the time difference between the signals is very small (≪1 ms). To reiterate an earlier point, in loudspeaker reproduction both ears receive the signals from both speakers, whereas in headphone listening each ear only receives one signal channel. The result of this is that the loudspeaker listener seated in a centre seat (see Figure 16.3) receives at his left ear the signal from the left speaker first followed by that from the right speaker, and at his right ear the signal from the right speaker first followed by that from the left speaker. The time δt is the time taken for the sound to travel the extra distance from the more distant speaker.
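The interaural delay for a given angle of incidence can be estimated with the simplified spherical-head (‘Woodworth’) model, ITD ≈ a(θ + sin θ)/c. The sketch below is illustrative only and is not taken from the chapter; the head radius of 8.75 cm is a commonly assumed value, and the function name is our own.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a distant
    source at the given azimuth, using the spherical-head model."""
    theta = math.radians(azimuth_deg)
    return head_radius_m * (theta + math.sin(theta)) / c

# A loudspeaker at 30 degrees off-centre:
itd = woodworth_itd(30.0)
print(f"{itd * 1e6:.0f} microseconds")  # roughly 260 us
```

For a source fully to one side (90°) the same model gives about 0.65 ms, consistent with the maximum interaural delay quoted earlier in the chapter.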

Images

Figure 16.3   An approximation to the situation that arises when listening to sound from two loudspeakers. Both ears hear sound from both loudspeakers, the signal from the right loudspeaker being delayed by δt at the left ear compared with the time it arrives at the right ear (and reversed for the other ear)

The basis on which ‘level-difference’ or ‘Blumlein’ stereo works is to use level differences between two loudspeakers to generate low-frequency phase differences between the ears, based on the summation of the loudspeaker signals at the two ears, as described in Fact File 16.2. Depending on which author one believes, an amplitude difference of between 15 and 18 dB between the channels is needed for a source to be panned either fully left or fully right. A useful summary of experimental data on this issue has been drawn by Hugonnet and Walder and is shown in Figure 16.4. A coincident arrangement of velocity (figure-eight) microphones at ninety degrees to one another produces outputs which differ in amplitude with varying angle over the frontal quadrant by an amount which gives a very close correlation between the true angle of offset of the original source from the centre line and the apparent angle on reproduction, assuming loudspeakers which subtend an angle of 120° to the listening position. This angle of loudspeakers is not found to be very satisfactory for practical purposes for reasons such as the tendency to give rise to a ‘hole’ in the middle of the image. At smaller loudspeaker angles the change in apparent angle is roughly proportionate as a fraction of total loudspeaker spacing, maintaining a correctly proportioned ‘sound stage’, so the sound stage with loudspeakers at the more typical 60° angle will tend to be narrower than the original sound stage but still in proportion.

If a time difference also exists between the channels, then transient sounds will be ‘pulled’ towards the advanced speaker because of the precedence effect, the perceived position depending to some extent on the time delay. If the left speaker is advanced in time relative to the right speaker (or more correctly, the right speaker is delayed!) then the sound appears to come more from the left speaker, although this can be corrected by increasing the level to the right speaker. A delay somewhere between 0.5 and 1.5 ms is needed for a signal to appear fully left or fully right at ±30°, depending on the nature of the signal (see Figure 16.5, after Hugonnet and Walder). With time-difference stereo, continuous sounds may give rise to contradictory phantom image positions when compared with the position implied by transients, owing to the phase differences that are created between the channels. Cancellations may also arise at certain frequencies if the channels are summed to mono.
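As a rough illustration of pure time-difference stereo (a sketch under our own assumptions, not a technique specified in the chapter), the following builds a stereo pair from a mono signal using interchannel delay alone; the function name and the 48 kHz sample rate are illustrative.

```python
import numpy as np

def time_difference_pan(mono, delay_ms, fs=48000):
    """Create a stereo pair from a mono signal using interchannel time
    difference only. Positive delay_ms delays the right channel, pulling
    the phantom image towards the (advanced) left loudspeaker."""
    n = int(round(abs(delay_ms) * 1e-3 * fs))
    delayed = np.concatenate([np.zeros(n), mono])
    direct = np.concatenate([mono, np.zeros(n)])
    if delay_ms >= 0:
        return np.stack([direct, delayed])   # left channel leads
    return np.stack([delayed, direct])       # right channel leads

# 1 ms interchannel delay: image pulled well towards the left speaker
stereo = time_difference_pan(np.random.randn(4800), 1.0)
```

Note that, as the chapter warns, such a signal may produce contradictory cues for continuous sounds and comb-filter cancellations if summed to mono.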

Fact file 16.2   Stereo vector summation

If the outputs of the two speakers differ only in amplitude and not in phase (time) then it can be shown (at least for low frequencies up to around 700 Hz) that the vector summation of the signals from the two speakers at each ear results in two signals that, for a given frequency, differ in phase angle proportional to the relative amplitudes of the two signals (the level difference between the ears being negligible at LF). For a given level difference between the speakers, the phase angle changes approximately linearly with frequency, which is the case when listening to a real point source. At higher frequencies the phase difference cue becomes largely irrelevant but the shadowing effect of the head results in level differences between the ears. If the amplitudes of the two channels are correctly controlled it is possible to produce resultant phase and amplitude differences for continuous sounds that are very close to those experienced with natural sources, thus giving the impression of virtual or ‘phantom’ images anywhere between the left and right loudspeakers. This is the basis of Blumlein’s (1931) stereophonic system ‘invention’ although the mathematics is quoted by Clark, Dutton and Vanderlyn (1957) and further analysed by others. The result of the mathematical phasor analysis is a simple formula which can be used to determine, for any angle subtended by the loudspeakers at the listener, what the apparent angle of the virtual image will be for a given difference between left and right levels.

Images

Firstly, referring to the diagram, it can be shown that:

sin α = (L − R)/(L + R) sin θ0

where α is the apparent angle of offset from the centre of the virtual image, and θ0 is the angle subtended by the speaker at the listener. Secondly, it can be shown that:

(L − R)/(L + R) = tan θt

where θt is the true angle of offset of a real source from the centre-front of a coincident pair of figure-eight velocity microphones. (L − R) and (L + R) are the well-known difference (S) and sum (M) signals of a stereo pair, defined below.

This is a useful result since it shows that it is possible to use positioning techniques such as ‘pan-potting’ which rely on the splitting of a mono signal source into two components, with adjustment of the relative proportion fed to the left and right channels without affecting their relative timing. It also makes possible the combining of the two channels into mono without cancellations due to phase difference.
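The first formula above can be evaluated directly. The sketch below computes the apparent image angle for a given interchannel level difference in decibels, assuming loudspeakers at the typical ±30° (θ0 = 30°); the function name is our own, and the result is the theoretical prediction rather than measured data such as Figure 16.4.

```python
import math

def apparent_angle(level_diff_db, theta0_deg=30.0):
    """Apparent phantom image offset (degrees) from the centre line for a
    given interchannel level difference (dB, positive = left louder),
    using sin(alpha) = ((L - R)/(L + R)) * sin(theta0)."""
    ratio = 10 ** (level_diff_db / 20.0)    # L/R amplitude ratio
    s = (ratio - 1.0) / (ratio + 1.0)       # (L - R)/(L + R), taking R = 1
    return math.degrees(math.asin(s * math.sin(math.radians(theta0_deg))))

print(apparent_angle(0.0))   # central image
print(apparent_angle(6.0))   # roughly a third of the way towards the left speaker
```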

Images

Figure 16.4   A summary of experimental data relating to amplitude differences (here labelled intensity) required between two loudspeaker signals for a particular phantom image location (data compiled by Hugonnet and Walder, 1995). (Courtesy of Christian Hugonnet)

Combinations of time and level difference can also be used to create phantom images, as described in Fact File 16.3.

Principles of binaural or headphone stereo

Binaural recording has fascinated researchers for years but it has received very little commercial attention until recently. Part of the problem has been that it is actually very difficult to get it to work properly for a wide range of listeners over a wide range of different headphone types, and part is related to the limited compatibility between headphone and loudspeaker listening. Conventional loudspeaker stereo is acceptable on headphones to the majority of people, although it creates a strongly ‘in-the-head’ effect, but binaural recordings do not sound particularly good on loudspeakers unless some signal processing is used, and the stereo image is dubious.

Images

Figure 16.5   A summary of experimental data relating to time differences required between two loudspeaker signals for a particular phantom image location (Hugonnet and Walder, 1995). (Courtesy of Christian Hugonnet)

Fact file 16.3   The ‘Williams curves’

Stereo microphone technique relies on either interchannel level or time difference or a combination of the two. A trade-off is possible between them, although the exact relationship between time and level differences needed to place a source in a certain position is disputed by different authors and seems to depend to some extent on the source characteristics. Michael Williams has based an analysis of microphone arrays on some curves of such trade-offs that have generally become known as the ‘Williams curves’, shown below. These curves represent the time and level difference combinations that may be used between two loudspeakers at ±30° in a typical listening room to obtain certain phantom source positions. The data points marked with circles were determined by a Danish researcher, Simonsen, using speech and maracas for signals.

Images

Recent technical developments have made the signal processing needed to synthesise binaural signals and deal with the conversion between headphone and loudspeaker listening more widely available at reasonable cost. It is now possible to create 3D directional sound cues and to synthesise the acoustics of virtual environments quite accurately using digital signal processors (DSP), and it is this area of virtual environment simulation for computer applications that is receiving the most commercial attention for binaural technology today. Flight simulators, computer games, virtual reality applications and architectural auralisation are all areas that are benefiting from these developments.

Basic binaural principles

Binaural approaches to spatial sound representation are based on the premise that the most accurate reproduction of natural spatial listening cues will be achieved if the ears of the listener can be provided with the same signals that they would have experienced in the source environment or during natural listening. In a sense, all stereo reproduction is binaural, but the term is normally taken to mean an approach involving source signals that represent individual ear signals and independent-ear reproduction (such as can be achieved using headphones). Most of the approaches described so far in this chapter have related to loudspeaker reproduction of signals that contain some of the necessary information for the brain to localise phantom images and perceive a sense of spaciousness and depth. Much reproduced sound using loudspeakers relies on a combination of accurate spatial cues and believable illusion. In its purest form, binaural reproduction aims to reproduce all the cues that are needed for accurate spatial perception, but in practice this is something of a tall order and various problems arise.

An obvious and somewhat crude approach to binaural audio is to place two microphones, one at the position of each ear in the source environment, and to reproduce these signals through headphones to the ears of a listener, as shown in Figure 16.6. For binaural reproduction to work well, the head-related transfer functions (HRTFs) of sound sources from the source (or synthesised) environment must be accurately re-created at the listener’s ears upon reproduction. This means capturing the time and frequency spectrum differences between the two ears accurately. Since each source position results in a unique HRTF, rather like a fingerprint, one might assume that all that is needed is to ensure the listener hears this correctly on reproduction.
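In the synthesised case, placing a mono source at a virtual position amounts to convolving it with the left- and right-ear head-related impulse responses (HRIRs) for that direction. The sketch below shows this; the HRIR arrays are crude hypothetical stand-ins (a pure delay and attenuation), not measured data.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Place a mono source at a virtual position by convolving it with
    the HRIRs measured (or synthesised) for that direction."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Crude stand-in HRIRs: right-ear signal slightly delayed and attenuated,
# as for a source somewhat to the listener's left.
fs = 48000
hrir_l = np.zeros(64); hrir_l[0] = 1.0
hrir_r = np.zeros(64); hrir_r[12] = 0.6   # ~250 us interaural delay at 48 kHz
out = binaural_render(np.random.randn(fs), hrir_l, hrir_r)
```

A real system would use full measured HRIRs, which also encode the spectral (pinna and head-shadow) cues discussed in the problems listed below.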

Tackling the problems of binaural systems

The primary problems in achieving an accurate reconstruction of spatial cues can be summarised as follows:

•  People’s heads and ears are different (to varying degrees), although there are some common features, making it difficult to generalise about the HRTFs that should be used for commercial systems that have to serve lots of people (see above).

•  Head movements that help to resolve directional confusion in natural listening are difficult to incorporate in reproduction situations.

•  Visual cues are often missing during binaural reproduction and these normally have a strong effect on perception.

•  Headphones differ in their equalisation and method of mounting, leading to distortions in the perceived HRTFs on reproduction.

•  Distortions such as phase and frequency response errors in the signal chain can affect the subtle cues required.

Images

Figure 16.6   Basic binaural recording and reproduction

It has been possible to identify the HRTF features that seem to occur in the majority of people and to then create generalised HRTFs that work reasonably well for a wide range of listeners. It has also been found that some people are better at localising sounds than others, and that the HRTFs of so-called ‘good localisers’ can be used in preference to those of ‘poor localisers’. To summarise, it can be said that although a person’s own HRTFs provide them with the most stable and reliable directional cues, generalised functions can be used at the expense of absolute accuracy of reproduction for everyone.

The problem of head movements can be addressed in advanced systems by using head tracking to follow the listener’s actions and adapt the signals fed to the ears accordingly. This is generally only possible when using synthesised binaural signals that can be modified in real time. The issue of the lack of visual cues commonly encountered during reproduction can only be resolved in full ‘virtual reality’ systems that incorporate 3D visual information in addition to sound information. In the absence of visual cues, the listener must rely entirely on the sound cues to resolve things like front–back confusions and elevation/distance estimations.

The issue of headphone equalisation is a thorny one as it depends on the design goal for the headphones. Different equalisation is required depending on the method of recording, unless the equalisation of both ends of the chain is standardised. For a variety of reasons, a diffuse field form of equalisation for headphones, dummy heads and synthesised environments has generally been found preferable to free-field equalisation. This means that the system is equalised to have a flat response to signals arriving from all angles around the head when averaged in a diffuse sound field. Headphones equalised in this way have been found to be quite suitable for both binaural and loudspeaker stereo signals, provided that the binaural signals are equalised in the same way.

Distortions in the signal chain that can affect the timing and spectral information in binaural signals have been markedly reduced since the introduction of digital audio systems. In the days of analogue signal chains and media such as compact cassette and LP records, numerous opportunities existed for interchannel phase and frequency response errors to arise, making it difficult to transfer binaural signals with sufficient integrity for success.

Loudspeaker stereo over headphones and vice versa

Bauer showed that if stereo signals designed for reproduction on loudspeakers were fed to headphones there would be too great a level difference between the ears compared with the real-life situation, and that the correct interaural delays would not exist. This results in an unnatural stereo image that does not have the expected sense of space and appears to be inside the head. He therefore proposed a network which introduced a measure of delayed crosstalk between the channels to simulate the correct interaural level differences at different frequencies, as well as simulating the interaural time delays which would result from the loudspeaker signals incident at 45° to the listener. He based the characteristics on research done by Wiener which produced graphs for the effects of diffraction around the human head for different angles of incidence. The characteristics of Bauer’s circuit are shown in Figure 16.7 (with Wiener’s results shown dotted). It may be seen that Bauer chooses to reduce the delay at HF, partially because the circuit design would have been too complicated, and partially because localisation relies more on amplitude difference at HF anyway.
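A crude digital analogue of such a crossfeed network might look like the following. The delay, cutoff and gain values are illustrative guesses rather than Bauer's published circuit constants, and a one-pole low-pass stands in for the head-shadowing response.

```python
import numpy as np

def crossfeed(left, right, fs=48000, delay_us=300, cutoff_hz=700, gain=0.5):
    """Feed each channel into the opposite one, delayed, attenuated and
    low-pass filtered, to approximate the interaural differences a pair
    of loudspeakers would produce. All parameter values are illustrative."""
    n = int(round(delay_us * 1e-6 * fs))
    a = np.exp(-2 * np.pi * cutoff_hz / fs)   # one-pole low-pass coefficient

    def lp(x):
        y = np.empty_like(x)
        acc = 0.0
        for i, v in enumerate(x):
            acc = (1 - a) * v + a * acc
            y[i] = acc
        return y

    xl = np.concatenate([np.zeros(n), gain * lp(left)])[:len(left)]
    xr = np.concatenate([np.zeros(n), gain * lp(right)])[:len(right)]
    return left + xr, right + xl

out_l, out_r = crossfeed(np.random.randn(4800), np.random.randn(4800))
```

Like Bauer's circuit, this applies the crossfeed delay only to the low-passed component, so HF localisation remains dominated by the amplitude differences.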

Bauer also suggests the reverse process (turning binaural signals into stereo signals for loudspeakers). He points out that crosstalk must be removed between binaural channels for correct loudspeaker reproduction, since the crossfeed between the channels will otherwise occur twice (once between the pair of binaurally spaced microphones, and again at the ears of the listener), resulting in poor separation and a narrow image. He suggests that this may be achieved using the subtraction of an anti-phase component of each channel from the other channel signal, although he does not discuss how the time difference between the binaural channels may be removed. Such processes are the basis of ‘transaural stereo’ (see Fact File 16.4).

Images

Figure 16.7   Bauer’s filter for processing loudspeaker signals so that they could be reproduced on headphones. The upper graph shows the delay introduced into the crossfeed between channels. The lower graph shows the left and right channel gains needed to imitate the shadowing effect of the head

The idea that unprocessed binaural signals are unsuitable for loudspeaker reproduction has been challenged by Theile. He claims that the brain is capable of associating ‘head-related’ differences between loudspeakers with appropriate spatial cues for stereo reproduction, provided the timbral quality of head-related signals is equalised for a natural-sounding spectrum (e.g. diffuse field equalisation, as described above). This theory has led to a variety of companies and recording engineers experimenting with the use of dummy heads such as the Neumann KU100 for generating loudspeaker signals, and created the idea for the Schoeps ‘Sphere’ microphone described below.

Fact file 16.4   Transaural stereo

When binaural signals are replayed on loudspeakers there is crosstalk between the signals at the two ears of the listener that does not occur with headphone reproduction. The right ear hears the left channel signal a fraction of a millisecond after it is received by the left ear, with an HRTF corresponding to the location of the left loudspeaker, and vice versa for the other ear. This prevents the correct binaural cues from being established at the listener’s ears and eliminates the possibility of full 3D sound reproduction. Binaural stereo tends to sound excessively narrow at low frequencies when replayed on loudspeakers as there is very little difference between the channels that has any effect at a listener’s ears. Furthermore the spectral characteristics of binaural recordings can create timbral inaccuracies when reproduced over loudspeakers unless some form of compromise equalisation is used.

If the full 3D cues of the original binaural recording are to be conveyed over loudspeakers, some additional processing is required. If the left ear is to be presented only with the left channel signal and the right ear with the right channel signal then some means of removing the interaural crosstalk is required. This is often referred to as crosstalk cancelling or ‘transaural’ processing. Put crudely, transaural crosstalk-cancelling systems perform this task by feeding an anti-phase version of the left channel’s signal into the right channel and vice versa, filtered and delayed according to the HRTF characteristic representing the crosstalk path, as shown above.
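A minimal time-domain sketch of this idea follows, in the style of a recursive crosstalk canceller. The acoustic crosstalk path is crudely modelled here as a plain gain and delay rather than a full HRTF filter, which is an assumption made purely for clarity:

```python
def crosstalk_cancel(bin_l, bin_r, g=0.7, d=3):
    """Recursive crosstalk-canceller sketch: each loudspeaker emits its
    channel minus an anti-phase, delayed copy of the other loudspeaker's
    output, so the acoustic crosstalk is cancelled at the ears.
    The crosstalk path is modelled as gain g and delay d samples."""
    n = len(bin_l)
    spk_l, spk_r = [0.0] * n, [0.0] * n
    for i in range(n):
        fb_r = g * spk_r[i - d] if i >= d else 0.0
        fb_l = g * spk_l[i - d] if i >= d else 0.0
        spk_l[i] = bin_l[i] - fb_r
        spk_r[i] = bin_r[i] - fb_l
    return spk_l, spk_r

def ears(spk_l, spk_r, g=0.7, d=3):
    """Simulate listening: each ear hears its own speaker plus the
    delayed, attenuated crosstalk from the other speaker."""
    n = len(spk_l)
    ear_l = [spk_l[i] + (g * spk_r[i - d] if i >= d else 0.0) for i in range(n)]
    ear_r = [spk_r[i] + (g * spk_l[i - d] if i >= d else 0.0) for i in range(n)]
    return ear_l, ear_r
```

In this idealised model the cancellation is exact: a binaural signal intended only for the left ear arrives only at the left ear. With measured HRTFs the crossfeed paths become frequency-dependent filters and the inversion is correspondingly more involved.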

The effect of this technique can be quite striking, and in the best implementations enables fully three-dimensional virtual sources to be perceived, including behind the listener (from only two loudspeakers located at the front). Crosstalk-cancelling filters are usually only valid for a very narrow range of listening positions. Beyond a few tens of centimetres away from the ‘hot spot’ the effect often disappears almost completely. The effect is sometimes perceived as unnatural, and some listeners find it fatiguing to listen to for extended periods.

Images

‘Spatial equalisation’ has been proposed by Griesinger to make binaural recordings more suitable for loudspeaker reproduction. He suggested low-frequency difference channel (L–R) boost of about 15 dB at 40 Hz (to increase the LF width of the reproduction) coupled with overall equalisation for a flat frequency response in the total energy of the recording to preserve timbral quality. This results in reasonably successful stereo reproduction in front of the listener, but the height and front–back cues are not preserved.
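A rough sketch of this kind of spatial equalisation is shown below, assuming a simple first-order low shelf on the difference channel; the exact filter shapes and the flat-energy compensation Griesinger described are not reproduced here:

```python
import math

def lf_shelf_boost(x, fs, fc, boost_db):
    """First-order low shelf: adds a low-passed copy scaled so that well
    below fc the gain approaches boost_db (illustrative shape only)."""
    a = math.exp(-2.0 * math.pi * fc / fs)
    k = 10.0 ** (boost_db / 20.0) - 1.0
    y, out = 0.0, []
    for s in x:
        y = (1.0 - a) * s + a * y
        out.append(s + k * y)
    return out

def spatial_eq(left, right, fs=48000):
    """Widen the low frequencies by boosting the difference (S) channel
    by ~15 dB in the lowest octaves, then matrix back to L/R.
    The 80 Hz corner is an assumption chosen so the boost is near full
    strength around 40 Hz."""
    m = [(l + r) * 0.5 for l, r in zip(left, right)]
    s = [(l - r) * 0.5 for l, r in zip(left, right)]
    s = lf_shelf_boost(s, fs, fc=80.0, boost_db=15.0)
    return ([mi + si for mi, si in zip(m, s)],
            [mi - si for mi, si in zip(m, s)])
```

Because only the S component is boosted, a mono (M) listener is unaffected, while the inter-channel difference, and hence the apparent LF width, is increased.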

Two-channel signal formats

The two channels of a ‘stereo pair’ represent the left (L) and the right (R) loudspeaker signals. It is conventional in broadcasting terminology to refer to the left channel of a stereo pair as the ‘A’ signal and the right channel as the ‘B’ signal, although this may cause confusion to some who use the term ‘AB pair’ to refer specifically to a spaced microphone pair. In the case of some stereo microphones or systems the left and right channels are called respectively the ‘X’ and the ‘Y’ signals, although some people reserve this convention specifically for coincident microphone pairs. Here we will stick to using L and R for simplicity. In colour coding terms (for meters, cables, etc.), particularly in broadcasting, the L signal is coloured red and the R signal is coloured green. This may be confusing when compared with some domestic hi-fi wiring conventions that use red for the right channel, but it is the same as the convention used for port and starboard on ships. Furthermore there is a German DIN convention which uses yellow for L and red for R.

It is sometimes convenient to work with stereo signals in the so-called ‘sum and difference’ format, since it allows for the control of image width and ambient signal balance. The sum or main signal is denoted ‘M’ and is based on the addition of L and R signals. The difference or side signal is denoted ‘S’ and is based on the subtraction of R from L to obtain a signal which represents the difference between the two channels (see below). The M signal is that which would be heard by someone listening to a stereo programme in mono, and thus it is important in situations where the mono listener must be considered, such as in broadcasting. Colour-coding convention in broadcasting holds that M is coloured white, whilst S is coloured yellow, but it is sometimes difficult to distinguish between these two colours on certain meter types leading to the increasing use of orange for S.

Two-channel stereo signals may be derived by many means. Most simply, they may be derived from a pair of coincident directional microphones orientated at a fixed angle to each other. Alternatively they may be derived from a pair of spaced microphones, either directional or non-directional, with an optional third microphone bridged between the left and right channels. Finally stereo signals may be derived by splitting one or more mono signals into two by means of a ‘pan-pot’. A pan-pot is simply a dual-ganged variable resistor that controls the relative proportion of the mono signal being fed to the two legs of the stereo pair, such that as the level to the left side is increased that to the right side is decreased.
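The pan-pot principle can be sketched as follows. This version assumes a constant-power (sine/cosine) law with the centre position 3 dB down on each channel, which is one common choice of law rather than a universal standard:

```python
import math

def pan(mono, position):
    """Constant-power pan-pot sketch: position runs from -1 (fully left)
    to +1 (fully right). As the level fed to one side increases, the
    other decreases; at centre each channel is 3 dB down so the total
    power stays roughly constant."""
    theta = (position + 1.0) * math.pi / 4.0   # maps -1..+1 onto 0..pi/2
    gl, gr = math.cos(theta), math.sin(theta)
    return [gl * s for s in mono], [gr * s for s in mono]
```

At `position = 0.0` both channels carry the signal at 0.707 of full level (−3 dB); at `position = -1.0` only the left channel carries it.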

MS or ‘sum and difference’ format signals may be derived by conversion from the LR format using a suitable matrix (see Fact File 3.6 in Chapter 3) or by direct pickup in that format. For every stereo pair of signals it is possible to derive an MS equivalent, since M is the sum of L and R, whilst S is the difference between them. Likewise, signals may be converted from MS to LR formats using the reverse process. Misalignment of signals in either format leads to audible effects as described in Fact File 16.5. In order to convert an LR signal into MS format it is necessary to follow some simple rules. Firstly, the M signal is not usually a simple sum of L and R, as this will result in over-modulation of the M channel in the case where a maximum level signal exists on both L and R (representing a central image). A correction factor is normally applied, ranging between -3 dB and -6 dB (equivalent to a division of the voltage by between √2 and 2 respectively):

e.g. M = (L + R) - 3 dB or (L + R) - 6 dB

The correction factor will depend on the nature of the two signals to be combined. If identical signals exist on the L and R channels (representing ‘double mono’ in effect), then the level of the uncorrected sum channel (M) will be two times (6 dB) higher than the levels of either L or R. This requires a correction of −6 dB in the M channel in order for the maximum level of the M signal to be reduced to a satisfactory level. If the L and R signals are non-coherent (random phase relationship), then only a 3 dB rise in the level of M will result when L and R are summed, requiring the −3 dB correction factor to be applied. This is more likely with stereo music signals. As most stereo material has a degree of coherence between the channels, the actual rise in level of M compared with L and R is likely to be somewhere between the two limits for real programme material.
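The two limiting cases can be verified numerically. The sketch below sums identical (‘double mono’) and independent noise signals and measures the rise in level of the sum relative to one channel:

```python
import math
import random

def rms(x):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def db(ratio):
    """Express a voltage ratio in decibels."""
    return 20.0 * math.log10(ratio)

random.seed(1)
n = 100000
l = [random.gauss(0.0, 1.0) for _ in range(n)]
r = [random.gauss(0.0, 1.0) for _ in range(n)]

# identical signals sum coherently: a 6 dB rise
coherent_rise = db(rms([a + b for a, b in zip(l, l)]) / rms(l))
# independent signals add on a power basis: roughly a 3 dB rise
incoherent_rise = db(rms([a + b for a, b in zip(l, r)]) / rms(l))
```

Real stereo programme, being partially coherent, falls between these two figures, which is why the practical correction factor lies between −3 dB and −6 dB.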

Fact file 16.5   Stereo misalignment effects

Differences in level, frequency response and phase may arise between signals of a stereo pair, perhaps due to losses in cables, misalignment, and performance limitations of equipment. It is important that these are kept to a minimum for stereo work, as inter-channel anomalies result in various audible side-effects. Differences will also result in poor mono compatibility. These differences and their effects are discussed below.

Frequency response and level

A difference in level or frequency response between L and R channels will result in a stereo image biased towards the channel with the higher overall level or that with the better HF response. Also, an L channel with excessive HF response compared with that of the R channel will result in the apparent movement of sibilant sounds towards the L loudspeaker. Level and response misalignment on MS signals results in increased crosstalk between the equivalent L and R channels, such that if the S level is too low at any frequency the LR signal will become more monophonic (width narrower), and if it is too high the apparent stereo width will be increased.

Phase

Inter-channel phase anomalies will affect one’s perception of the positioning of sound sources, and will also affect mono compatibility. Phase differences between L and R channels will result in ‘comb-filtering’ effects in the derived M signal due to cancellation and addition of the two signals at certain frequencies where the signals are either out of phase or in phase.

Crosstalk

It was stated earlier that an inter-channel level difference of only 18 dB was required to give the impression of a signal being either fully left or fully right. Crosstalk between L and R signals is not therefore usually a major problem, since the performance of most audio equipment is far in excess of these requirements. Excessive crosstalk between L and R signals will result in a narrower stereo image, whilst excessive crosstalk between M and S signals will result in a stereo image increasingly biased towards one side.

The S signal results from the subtraction of R from L, and is subject to the same correction factor:

e.g. S = (L - R) - 3 dB or (L - R) - 6 dB

S can be used to reconstruct L and R when matrixed in the correct way with the M signal (see below), since (M + S) = 2L and (M - S) = 2R. It may therefore be appreciated that it is possible at any time to convert a stereo signal from one format to the other and back again.
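These relationships are easy to check in code. The sketch below uses the −6 dB (divide-by-two) correction on both M and S, in which case M + S and M − S recover L and R directly without further scaling:

```python
def lr_to_ms(l, r):
    """LR to MS with the -6 dB (divide-by-two) correction factor applied."""
    m = (l + r) / 2.0
    s = (l - r) / 2.0
    return m, s

def ms_to_lr(m, s):
    """MS back to LR: with the divide-by-two convention,
    (M + S) recovers L and (M - S) recovers R exactly."""
    return m + s, m - s
```

The round trip is lossless, which is what allows a stereo signal to be converted from one format to the other and back again at any point in the chain.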

Two-channel microphone techniques

This section contains a review of basic two-channel microphone techniques, upon which many spatial recording techniques are based. Panned spot microphones are often mixed into the basic stereo image created by such techniques.

Coincident-pair principles

A coincident pair incorporates two directional capsules that may be angled over a range of settings to allow for different configurations and operational requirements. The pair can be operated in either the LR (sometimes known as ‘XY’) or MS modes (see above), and a matrixing unit is sometimes supplied with microphones which are intended to operate in the MS mode in order to convert the signal to LR format for recording. The directional patterns (polar diagrams) of the two microphones need not necessarily be figure-eight, although if the microphone is used in the MS mode the S capsule must be figure-eight (see below). Directional information is encoded solely in the level differences between the capsule outputs, since the two capsules are mounted physically as close as possible. There are no phase differences between the outputs except at the highest frequencies where inter-capsule spacing may become appreciable in relation to the wavelength of sound.

Coincident pairs are normally mounted vertically in relation to the sound source, so that the two capsules are angled to point symmetrically left and right of the centre of the source stage (see Figure 16.8). The choice of angle depends on the polar response of the capsules used. A coincident pair of figure-eight microphones at 90° provides good correspondence between the actual angle of the source and the apparent position of the virtual image when reproduced on loudspeakers, but there are also operational disadvantages to the figure-eight pattern in some cases, such as the amount of reverberation pickup.

Images

Figure 16.8   A coincident pair’s capsules are oriented so as to point left and right of the centre of the sound stage

Figure 16.9 shows the polar pattern of a coincident pair using figure-eight mics. Firstly, it may be seen that the fully-left position corresponds to the null point of the right capsule’s pickup. This is the point at which there will be maximum level difference between the two capsules. The fully-left position also corresponds to the maximum pickup of the left capsule, although this is not always the case with other stereo pairs. As a sound moves across the sound stage from left to right it will result in a gradually decreasing output from the left mic, and an increasing output from the right mic. Since the microphones have cosine responses, the output at 45° off axis is 1/√2 times the maximum output, or 3 dB down in level, thus the takeover between left and right microphones is smooth for music signals. Fact File 16.6 goes into greater detail concerning the relationship between capsule angle and stereo width.
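These figures follow directly from the cosine law, as the short sketch below confirms for a 90° crossed pair of figure-eights:

```python
import math

def fig8(theta):
    """Figure-eight (cosine) polar response; theta in radians off-axis."""
    return math.cos(theta)

def level_db(gain):
    """Gain expressed in decibels relative to the on-axis output."""
    return 20.0 * math.log10(abs(gain))

# a centre source is 45 degrees off-axis from each capsule of the pair:
centre = level_db(fig8(math.radians(45)))   # about -3 dB
# fully left (45 degrees off centre) lies in the right capsule's null:
null = fig8(math.radians(90))               # effectively zero output
```

The smooth cosine crossfade between the two outputs is what gives the even left-to-right takeover described above.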

Images

Figure 16.9   Polar pattern of a coincident pair using figure-eight microphones

Fact file 16.6   Stereo width issues

With any coincident pair, fully left or fully right corresponds to the null point of pickup of the opposite channel’s microphone, although psychoacoustically this point may be reached before the maximum level difference is arrived at. This also corresponds to the point where the M signal equals the S signal (where the sum of the channels is the same as the difference between them). As the angle between the capsules is made larger, the angle between the null points will become smaller, as shown below. Operationally, if one wishes to widen the reproduced sound stage one will widen the angle between the microphones which is intuitively the right thing to do. This results in a narrowing of the angle between fully left and fully right, so sources which had been, say, half left in the original image will now be further towards the left. A narrow angle between fully left and fully right results in a very wide sound stage, since sources have only to move a small distance to result in large changes in reproduced position. This corresponds to a wide angle between the capsules.

Further coincident pairs are possible using any polar pattern between figure-eight and omni, although the closer that one gets to omni, the greater the required angle to achieve adequate separation between the channels. The hypercardioid pattern is often chosen for its smaller rear lobes than the figure-eight, allowing a more distant placement from the source for a given direct-to-reverberant ratio (although in practice hypercardioid pairs tend to be used closer to make the image width similar to that of a figure-eight pair). Since the hypercardioid pattern lies between figure-eight and cardioid, the angle required between the capsules is correspondingly intermediate, at around 110°.

Psychoacoustic requirements suggest the need for an electrical narrowing of the image at high frequencies in order to preserve the correct angular relationships between low- and high-frequency signals, although this is rarely implemented in practice with coincident pair recording. A further consideration to do with the theoretical versus the practical is that although microphones tend to be referred to as having a particular polar pattern, this pattern is unlikely to be consistent across the frequency range and this will have an effect on the stereo image. Cardioid crossed pairs should theoretically exhibit no out-of-phase region (there should be no negative rear lobes), but in practice most cardioid capsules become more omni at LF and narrower at HF. As a result some out-of-phase components may be noticed in the HF range while the width may appear too narrow at LF. Attempts have been made to compensate for this in some stereo microphone designs.

Images

The second point to consider with this pair is that the rear quadrant of pickup suffers a left–right reversal, since the rear lobes of each capsule point in the opposite direction to the front. This is important when considering the use of such a microphone in situations where confusion may arise between sounds picked up on the rear and in front of the mic, such as in television sound where the viewer can also see the positions of sources. The third point is that pickup in both side quadrants results in out-of-phase signals between the channels, since a source further round than ‘fully left’ results in pickup by both the negative lobe of the right capsule and the positive lobe of the left capsule. There is thus a large region around a crossed pair of figure-eights that results in out-of-phase information, this information often being reflected or reverberant sound. Any sound picked up in this region will suffer cancellation if the channels are summed to mono, with maximum cancellation occurring at 90° and 270°, assuming 0° as the centre-front.

The operational advantages of the figure-eight pair are the crisp and accurate phantom imaging of sources, together with a natural blend of ambient sound from the rear. Some cancellation of ambience may occur, especially in mono, if there is a lot of reverberant sound picked up by the side quadrants. Disadvantages lie in the large out-of-phase region, and in the size of the rear pickup which is not desirable in all cases and is left–right reversed. Stereo pairs made up of capsules having less rear pickup may be preferred in cases where a ‘drier’ or less reverberant balance is required, and where frontal sources are to be favoured over rear sources. In such cases the capsule responses may be changed to be nearer the cardioid pattern, and this requires an increased angle between the capsules to maintain good correlation between actual and perceived angle of sources.

The cardioid crossed pair shown in Figure 16.10 is angled at approximately 131°, although angles of between 90° and 180° may be used to good effect depending on the width of the sound stage to be covered. At an angle of 131° a centre source is 65.5° off-axis from each capsule, resulting in a 3 dB drop in level compared with the maximum on-axis output (the cardioid mic response is equivalent to 0.5(1 + cos ϑ), where ϑ is the angle off-axis of the source, and thus the output at 65.5° is 1/√2 times that at 0°). A departure from the theoretically correct angle is often necessary in practical situations, and it must be remembered that the listener will not necessarily be aware of the ‘correct’ location of each source, nor may it matter that the true and perceived positions are different. A pair of ‘back-to-back’ cardioids has often been used to good effect (see Figure 16.11), since it has a simple MS equivalent of an omni and a figure-eight, and has no out-of-phase region. Although the maximum level difference between the channels is at 90° off-centre there will in fact be a satisfactory level difference for a phantom image to appear fully left or right at a substantially smaller angle than this.
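The 131° figure can be derived by solving 0.5(1 + cos ϑ) = 1/√2 for ϑ, as sketched below:

```python
import math

def cardioid(theta):
    """Cardioid polar response 0.5 * (1 + cos theta); theta in radians."""
    return 0.5 * (1.0 + math.cos(theta))

# angle off-axis at which a cardioid is 3 dB down (response = 1/sqrt(2)):
half_angle = math.degrees(math.acos(math.sqrt(2.0) - 1.0))   # about 65.5 deg
pair_angle = 2.0 * half_angle                                 # about 131 deg
drop_db = 20.0 * math.log10(cardioid(math.radians(half_angle)))
```

A centre source midway between two capsules angled at `pair_angle` is then exactly 3 dB down on each, matching the behaviour of the crossed figure-eights.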

Images

Figure 16.10   A coincident pair of cardioid microphones should theoretically be angled at 131°, but deviations either side of this may be acceptable in practice

XY or LR coincident pairs in general have the possible disadvantage that central sounds are off-axis to both mics, perhaps considerably so in the case of crossed cardioids. This may result in a central signal with a poor frequency response and possibly an unstable image if the polar response is erratic. Whether or not this is important depends on the importance of the central image in relation to that of offset images, and will be most important in cases where the main source is central (such as in television, with dialogue). In such cases the MS technique described in the next section is likely to be more appropriate, since central sources will be on-axis to the M microphone. For music recording it would be hard to say whether central sounds are any more important than offset sources, so either technique may be acceptable.

Images

Figure 16.11   Back-to-back cardioids have been found to work well in practice and should have no out-of-phase region

Using MS processing on coincident pairs

Although some stereo microphones are built specifically to operate in the MS mode, it is possible to take any coincident pair capable of at least one capsule being switched to figure-eight, and orientate it so that it will produce suitable signals. The S component (being the difference between left and right signals) is always a sideways-facing figure-eight with its positive lobe facing left. The M (middle) component may be any polar pattern facing to the centre-front, although the choice of M pattern depends on the desired equivalent pair, and will be the signal that a mono listener would hear. True MS mics usually come equipped with a control box that matrixes the MS signals to LR format if required. A control for varying S gain is often provided as a means of varying the effective acceptance angle between the equivalent LR pair.

MS signals are not suitable for direct stereo monitoring: they are sum and difference components and must be converted to a conventional loudspeaker format at a convenient point in the production chain. The advantages of keeping a signal in the MS format until it needs to be converted will be discussed below, but the major advantage of pickup in the MS format is that central signals will be on-axis to the M capsule, resulting in the best frequency response. Furthermore, it is possible to operate an MS mic in a similar way to a mono mic which may be useful in television operations where the MS mic is replacing a mono mic on a pole or in a boom.

To see how MS and LR pairs relate to each other, and to draw some useful conclusions about stereo width control, it is informative to consider a coincident pair of figure-eight mics again. For each MS pair there is an LR equivalent. The polar pattern of the LR equivalent to any MS pair may be derived by plotting the level of (M + S)/2 and (M - S)/2 for every angle around the pair. Taking the MS pair of figure-eight mics shown in Figure 16.12, it may be seen that the LR equivalent is simply another pair of figure-eights, but rotated through 45°. Thus the correct MS arrangement to give an equivalent LR signal where both ‘capsules’ are oriented at 45° to the centre-front (the normal arrangement) is for the M capsule to face forwards and the S capsule to face sideways.

A number of interesting points arise from a study of the LR/MS equivalence of these two pairs, and these points apply to all equivalent pairs. Firstly, fully left or right in the resulting stereo image occurs at the point where S = M (in this case at 45° off-centre). This is easy to explain, since the fully left point is the point at which the output from the right capsule is zero. Therefore M = L + 0, and S = L - 0, both of which equal L. Secondly, at angles of incidence greater than 45° off-centre in either direction the two channels become out-of-phase, as was seen above, and this corresponds to the region in which S is greater than M. Thirdly, in the rear quadrant where the signals are in phase again, but left–right reversed, the M signal is greater than S again. The relationship between S and M levels, therefore, is an excellent guide to the phase relationship between the equivalent LR signals. If S is lower than M, then the LR signals will be in phase. If S = M, then the source is either fully left or right, and if S is greater than M, then the LR signals will be out-of-phase.

Images

Figure 16.12   Every coincident pair has an MS equivalent. The conventional left–right arrangement is shown in (a), and the MS equivalent in (b)

To show that this applies in all cases, and not just that of the figure-eight pair, look at the MS pair in Figure 16.13 together with its LR equivalent. This MS pair is made up of a forward-facing cardioid and a sideways-facing figure-eight (a popular arrangement). Its equivalent is a crossed pair of hypercardioids, and again the extremes of the image (corresponding to the null points of the LR hypercardioids) are the points at which S equals M. Similarly, the signals go out-of-phase in the region where S is greater than M, and come back in phase again for a tiny angle round the back, due to the rear lobes of the resulting hypercardioids. Thus the angle of acceptance (between fully left and fully right) is really the frontal angle between the two points on the MS diagram where M equals S.

Now, consider what would happen if the gain of the S signal was raised (imagine expanding the lobes of the S figure-eight). The result of this would be that the points where S equalled M would move inwards, making the acceptance angle smaller. As explained earlier, this results in a wider stereo image, since off-centre sounds will become closer to the extremes of the image, and is equivalent to increasing the angle between the equivalent LR capsules. Conversely, if the S gain is reduced, the points at which S equals M will move further out from the centre, resulting in a narrower stereo image, equivalent to decreasing the angle between the equivalent LR capsules. This helps to explain why Blumlein-style shufflers work by processing the MS equivalent signals of stereo pairs, as one can change the effective stereo width of pairs of signals, and this can be made frequency dependent if required.
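For the figure-eight MS pair this width behaviour can be quantified: with M(ϑ) = cos ϑ and S(ϑ) = g sin ϑ, the points where |S| = |M| lie at arctan(1/g) off centre, so raising the S gain g pulls the acceptance angle inwards. This is a simplified model that ignores real capsule imperfections:

```python
import math

def acceptance_half_angle(s_gain):
    """Half the acceptance angle for M = cos(theta) and
    S = s_gain * sin(theta): fully left/right occurs where |S| = |M|,
    i.e. at theta = atan(1 / s_gain) off centre."""
    return math.degrees(math.atan(1.0 / s_gain))

# raising S gain narrows the acceptance angle, hence widens the image:
for g_db in (-6.0, 0.0, 6.0):
    g = 10.0 ** (g_db / 20.0)
    half = acceptance_half_angle(g)   # roughly 63, 45 and 27 degrees
```

At unity S gain the pair behaves like the 90° crossed figure-eights (fully left/right at 45° off centre); at +6 dB the same points move in to about 27°, widening the reproduced image.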

Images

Figure 16.13   The MS equivalent of a forward facing cardioid and sideways figure-eight, as shown in (a), is a pair of hypercardioids whose effective angle depends on S gain, as shown in (b)

This is neatly exemplified in a commercial example, the Neumann RSM 191i, which is an MS mic in which the M capsule is a forward-facing short shotgun mic with a polar pattern rather like a hypercardioid. The polar pattern of the M and S capsules and the equivalent LR pair is shown in Figure 16.14 for three possible gains of the S signal with relation to M (-6 dB, 0 dB and +6 dB). It will be seen that the acceptance angle changes from being large (narrow image) at -6 dB, to small (wide image) at +6 dB. Changing the S gain also affects the size of the rear lobes of the LR equivalent. The higher the S gain, the larger the rear lobes. Not only does S gain change stereo width, it also affects rear pickup, and thus the ratio of direct to reverberant sound.

Images

Figure 16.14   Polar patterns on the Neumann RSM191i microphone. (a) M capsule, (b) S capsule

Any stereo pair may be operated in the MS configuration, simply by orientating the capsules in the appropriate directions and switching them to an appropriate polar pattern, but certain microphones are dedicated to MS operation simply by the physical layout of the capsules (see Fact File 16.7).

Operational considerations with coincident pairs

The control of S gain is an important tool in determining the degree of width of a stereo sound stage, and for this reason the MS output from a microphone might be brought (unmatrixed) into a mixing console, so that the engineer has control over the width. This in itself can be a good reason for keeping a signal in MS form during the recording process, although M and S can easily be derived at any stage using a conversion matrix.

Images

Figure 16.14    (c) LR equivalent with −6 dB S gain, (d) 0 dB S gain, (e) +6 dB S gain

Fact file 16.7   End-fire and side-fire configurations

There are two principal ways of mounting the capsules in a coincident stereo microphone, be it MS or LR format: either in the ‘end-fire’ configuration where the capsules ‘look out’ of the end of the microphone, such that the microphone may be pointed at the source (see the diagram), or in the ‘side-fire’ configuration where the capsules look out of the sides of the microphone housing. It is less easy to see the direction in which the capsules are pointing in a side-fire microphone, but such a microphone makes it possible to align the capsules vertically above each other so as to be time-coincident in the horizontal plane, as well as allowing for the rotation of one capsule with relation to the other. An end-fire configuration is more suitable for the MS capsule arrangement (see diagram below), since the S capsule may be mounted sideways behind the M capsule, and no rotation of the capsules is required. There is a commercial example of an LR end-fire microphone for television ENG (electronic news gathering) use which houses two fixed cardioids side-by-side in an enlarged head.

Images

Although some mixers have MS matrixing facilities on board, the diagram in Figure 16.15 shows how it is possible to derive an LR mix with variable width from an MS microphone using three channels on a mixer without using an external MS matrix. M and S outputs from the microphone are fed in phase through two mixer channels and faders, and a post-fader feed of S is taken to a third channel line input, being phase-reversed on this channel. The M signal is routed to both left and right mix buses (panned centrally), whilst the S signal is routed to the left mix bus (M + S = 2L) and the −S signal (the phase-reversed version) is routed to the right mix bus (M − S = 2R). It is important that the gain of the −S channel is matched very closely with that of the S channel. (A means of deriving M and S from an LR format input is to mix L and phase-reversed R together to get S, and without the phase reverse to get M.)

Images

Figure 16.15   An LR mix with variable width can be derived from an MS microphone connected to three channels of a mixer as shown. The S faders should be ganged together and used as a width control

Outdoors, coincident pairs will be susceptible to wind noise and rumble, as they incorporate velocity-sensitive capsules which always give more problems in this respect than omnis. Most of the interference will reside in the S channel, since this always has a figure-eight pattern, and thus would not be a problem to the mono listener. Similarly, physical handling of the stereo microphone, or vibration picked up through a stand, will be much more noticeable than with pressure microphones. Coincident pairs should not generally be used close to people speaking, as small movements of their heads can cause large changes in the angle of incidence, leading to considerable movement in their apparent position in the sound stage.

Near-coincident microphone configurations

‘Near-coincident’ pairs of directional microphones introduce small additional timing differences between the channels which may help in the localisation of transient sounds and increase the spaciousness of a recording, while at the same time remaining nominally coincident at low frequencies and giving rise to suitable amplitude differences between the channels. Headphone compatibility is also quite good owing to the microphone spacing being similar to ear spacing. The family of near-coincident (or closely spaced) techniques relies on a combination of time and level differences between the channels that can be traded off for certain widths of sound stage and microphone pattern.

Subjective evaluations often seem to show good results for such techniques. One comprehensive subjective assessment of stereo microphone arrangements, performed at the University of Iowa, consistently resulted in the near-coincident pairs scoring among the top few performers for their sense of ‘space’ and realism. Critics have attributed these effects to ‘phasiness’ at high frequencies (which some people may like, nonetheless), and argued that truly coincident pairs were preferable.

Table 16.1 Some near-coincident pairs based on the ‘Williams curves’

Images

A number of examples of near-coincident pairs exist as ‘named’ arrangements, although there is a whole family of possible near-coincident arrangements using combinations of spacing and angle. Some near-coincident pairs of different types, based on the ‘Williams curves’ (see Fact File 16.3) are given in Table 16.1. The so-called ‘ORTF pair’ is an arrangement of two cardioid mics, deriving its name from the organisation which first adopted it (the Office de Radiodiffusion-Télévision Française). The two mics are spaced apart by 170 mm, and angled at 110°. The ‘NOS’ pair (Nederlandse Omroep Stichting, the Dutch Broadcasting Company), uses cardioid mics spaced apart by 300 mm and angled at 90°. Figure 16.16 illustrates these two pairs, along with a third pair of figure-eight microphones spaced apart by 200 mm, which has been called a ‘Faulkner’ pair, after the British recording engineer who first adopted it (this is not strictly based on the Williams curves). This latter pair has been found to offer good image focus on a small-to-moderate-sized central ensemble with the mics placed further back than would normally be expected.


Figure 16.16   Near-coincident pairs (a) ORTF, (b) NOS, (c) Faulkner

Pseudo-binaural techniques

Binaural techniques could be classed as another form of near-coincident technique. The spacing between the omni microphones in a dummy head is not great enough to fit any of the Williams models described above for near-coincident pairs, but the shadowing effect of the head makes the arrangement more directional at high frequencies. Low-frequency width is likely to need increasing to make the approach more loudspeaker-compatible, as described earlier, unless one adheres to Theile’s association theory of stereo in which case little further processing is required except for equalisation.

The Schoeps KFM6U microphone, pictured in Figure 16.17, was designed as a head-sized sphere with pressure microphones mounted on the surface of the sphere, equalised for a flat response to frontal incidence sound and suitable for generating signals that could be reproduced on loudspeakers. This is in effect a sort of dummy head without ears. Dummy heads also exist that have been equalised for a reasonably natural timbral quality on loudspeakers, such as the Neumann KU100. The use of unprocessed dummy head techniques for stereo recording intended for loudspeakers has found favour with some recording engineers because they claim to like the spatial impression created, although others find the stereo image somewhat unfocused or vague.


Figure 16.17   The Schoeps KFM6U microphone consists of two pressure microphones mounted on the surface of a sphere. (Courtesy of Schalltechnik Dr.-Ing. Schoeps GmbH)

Spaced microphone configurations

Spaced arrays have a historical precedent for their usage, since they were the first to be documented (in the work of Clement Ader at the Paris Exhibition in 1881), were the basis of the Bell Labs stereo systems in the 1930s, and have been widely used since then. They are possibly less ‘correct’ theoretically, from a standpoint of soundfield representation, but they can provide a number of useful spatial cues that give rise to believable illusions of natural spaces. Many recording engineers prefer spaced arrays because the omni microphones often used in such arrays tend to have a flatter and more extended frequency response than their directional counterparts, although it should be noted that spaced arrays do not have to be made up of omni mics (see below).

Spaced arrays rely principally on the precedence effect. The delays that result between the channels tend to be of the order of a number of milliseconds. With spaced arrays the level and time difference resulting from a source at a particular left–right position on the sound stage will depend on how far the source is from the microphones (see Figure 16.18), with a more distant source resulting in a much smaller delay and level difference. In order to calculate the time and level differences that will result from a particular spacing it is possible to use the following two formulae:

Δt = (d1 − d2)/c     ΔL = 20 log10 (d1/d2)

where Δt is the time difference and ΔL the pressure level difference which results from a source whose distance is d1 and d2 respectively from the two microphones, and c is the speed of sound (340 m/s).
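The two formulae above can be sketched as follows. The spacing and source positions used here are invented purely for illustration, not taken from the text:

```python
# Time and level differences for a spaced omni pair, using
# delta_t = (d1 - d2)/c and delta_L = 20*log10(d1/d2).
import math

C = 340.0  # speed of sound in m/s, as used in the text

def spaced_pair_differences(d1, d2):
    """Return (delta_t in ms, delta_L in dB) for source distances d1, d2."""
    delta_t = (d1 - d2) / C * 1000.0        # time difference, ms
    delta_L = 20.0 * math.log10(d1 / d2)    # level difference, dB
    return delta_t, delta_L

# Hypothetical geometry: mics 0.6 m apart, source 1.0 m off-centre.
# Source X: 4 m in front of the pair
d1 = math.hypot(4.0, 1.3)   # distance to the far mic
d2 = math.hypot(4.0, 0.7)   # distance to the near mic
dt_near, dl_near = spaced_pair_differences(d1, d2)

# Source Y: same left-right position but 12 m away
dt_far, _ = spaced_pair_differences(math.hypot(12.0, 1.3),
                                    math.hypot(12.0, 0.7))

# The more distant source gives a much smaller time difference,
# as Figure 16.18 describes.
print(f"near: {dt_near:.2f} ms, {dl_near:.2f} dB; far: {dt_far:.2f} ms")
```

Running this shows the near source producing a time difference of roughly 0.4 ms against under 0.15 ms for the distant one, consistent with the behaviour described around Figure 16.18.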

When a source is very close to a spaced pair there may be a considerable level difference between the microphones, but this will become small once the source is more than a few metres distant. The positioning of spaced microphones in relation to a source is thus a matter of achieving a compromise between closeness (to achieve satisfactory level and time differences between channels), and distance (to achieve adequate reverberant information relative to direct sound). When the source is large and deep, such as a large orchestra, it will be difficult to place the microphones so as to suit all sources. It may therefore be found necessary to raise the microphones somewhat so as to reduce the differences in path length between sources at the front and rear of the orchestra.


Figure 16.18   With spaced omnis a source at position X results in path lengths d1 and d2 to each microphone respectively, whilst for a source in the same LR position but at a greater distance (source Y) the path length difference is smaller, resulting in smaller time difference than for X

Spaced microphone arrays do not stand up well to theoretical analysis when considering the imaging of continuous sounds, the precedence effect being related principally to impulsive or transient sounds. Because of the phase differences between signals at the two loudspeakers created by the microphone spacing, interference effects at the ears at low frequencies may in fact result in a contradiction between level and time cues at the ears. It is possible in fact that the ear on the side of the earlier signal may not experience the higher level, thus producing a confusing difference between the cues provided by impulsive sounds and those provided by continuous sounds. The lack of phase coherence in spaced-array stereo is further exemplified by phase inverting one of the channels on reproduction, an action which does not always appear to affect the image particularly, as it would with coincident stereo, showing just how uncorrelated the signals are. (This is most noticeable with widely spaced microphones.)

Accuracy of phantom image positioning is therefore lower with spaced arrays, although many convincing recordings have resulted from their use. It has been suggested that the impression of spaciousness that results from the use of spaced arrays is in fact simply the result of phasiness and comb-filtering effects. Others suggest that there is a place for the spaciousness that results from spaced techniques, since the highly decorrelated signals which result from spaced techniques are also a feature of concert hall acoustics.

Griesinger has often claimed informally that spacing the mics apart by at least the reverberation radius (critical distance) of a recording space gives rise to adequate decorrelation between the microphones to obtain good spaciousness, and that this might be a suitable technique for ambient sound in surround recording. Mono compatibility of spaced pairs is variable, although not always as poor in practice as might be expected.
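Griesinger's rule of thumb can be put into numbers using a common approximation for the reverberation radius, d_c ≈ 0.057·√(V/RT60) (V in cubic metres, RT60 in seconds). The hall figures below are illustrative assumptions, not values from the text:

```python
# Rough estimate of the reverberation radius (critical distance),
# to gauge a minimum mic spacing for well-decorrelated ambience
# pickup, per Griesinger's informal suggestion.
import math

def critical_distance(volume_m3, rt60_s):
    """Approximate critical distance in metres for an omni source."""
    return 0.057 * math.sqrt(volume_m3 / rt60_s)

# e.g. a mid-sized concert hall of 12 000 m^3 with RT60 of 2 s
d_c = critical_distance(12000.0, 2.0)
print(f"critical distance ~ {d_c:.1f} m")  # mics at least this far apart
```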

The so-called ‘Decca Tree’ is a popular arrangement of three spaced omnidirectional mics. The name derives from the traditional usage of this technique by the Decca Record Company, although even that company did not adhere rigidly to this arrangement. A similar arrangement is described by Grignon (1949). Three omnis are configured according to the diagram in Figure 16.19, with the centre microphone spaced so as to be slightly forward of the two outer mics, although it is possible to vary the spacing to some extent depending on the size of the source stage to be covered. The reason for the centre microphone and its spacing is to stabilise the central image which tends otherwise to be rather imprecise, although the existence of the centre mic will also complicate the phase relationships between the channels, thus exacerbating the comb-filtering effects that may arise with spaced pairs. The advance in time experienced by the forward mic will tend to solidify the central image, due to the precedence effect, avoiding the hole-in-the-middle often resulting from spaced pairs. The outer mics are angled outwards slightly, so that the axes of best HF response favour sources towards the edges of the stage whilst central sounds are on-axis to the central mic.


Figure 16.19   The classic ‘Decca Tree’ involved three omnis, with the centre microphone spaced slightly forward of the outer mics

A pair of omni outriggers are often used in addition to the tree, towards the edges of wide sources such as orchestras and choirs, in order to support the extremes of the sound stage that are some distance from the tree or main pair (see Figure 16.20). This is hard to justify on the basis of any conventional imaging theory, and is beginning to move toward the realms of multi-microphone pickup, but can be used to produce a commercially acceptable sound. Once more than around three microphones are used to cover a sound stage one has to consider a combination of theories, possibly suggesting conflicting information between the outputs of the different microphones. In such cases the sound balance will be optimised on a mixing console, subject to the creative control of the recording engineer.

Spaced microphones with either omnidirectional or cardioid patterns may be used in configurations other than the Decca Tree described above, although the ‘tree’ has certainly proved to be the more successful arrangement in practice. The precedence effect begins to break down for delays greater than around 40 ms, because the brain begins to perceive the two arrivals of sound as being discrete rather than integrated. It is therefore reasonable to assume that spacings between microphones which give rise to greater delays than this between channels should be avoided. This maximum delay, though, corresponds to a mic spacing of well over ten metres. Such extremes have not proved to work well in practice due to the great distance of central sources from either microphone compared with the closeness of sources at the extremes, resulting in a considerable level drop for central sounds and thus a hole in the middle.


Figure 16.20   Omni outriggers may be used in addition to a coincident pair or Decca Tree, for wide sources


Figure 16.21   Dooley and Streicher’s proposal for omni spacing

Dooley and Streicher have shown that good results may be achieved using spacings of between one-third and one-half of the width of the total sound stage to be covered (see Figure 16.21), although closer spacings have also been used to good effect. Brüel & Kjær manufacture matched stereo pairs of omni microphones together with a bar which allows variable spacing, as shown in Figure 16.22, and suggest that the spacing used is smaller than one-third of the stage width (they suggest between 5 cm and 60 cm, depending on stage width). Their principal rule is that the distance between the microphones should be small compared with the distance from the microphones to the source.


Figure 16.22   B&K omni microphones mounted on a stereo bar that allows variable spacing

Binaural recording and ‘dummy head’ techniques

While it is possible to use a real human head for binaural recording (generally attached to a live person), it can be difficult to mount high-quality microphones in the ears and the head movements and noises of the owner can be obtrusive. Sometimes heads are approximated by the use of a sphere or a disc separating a pair of microphones, and this simulates the shadowing effect of the head but it does not give rise to the other spectral filtering effects of the outer ear. Recordings made using such approaches have been found to have reasonable loudspeaker compatibility as they do not have the unusual equalisation that results from pinna filtering. (Unequalised true binaural recordings replayed on loudspeakers will typically suffer two stages of pinna filtering – once on recording and then again on reproduction – giving rise to distorted timbral characteristics.)

Dummy heads are models of human heads with pressure microphones in the ears that can be used for originating binaural signals suitable for measurement or reproduction. A number of commercial products exist, some of which also include either shoulders or a complete torso. A complete head-and-torso simulator is often referred to as a ‘HATS’, and an example is shown in Figure 16.23. The shoulders and torso are considered by some to be important owing to the reflections that result from them in natural listening, which can contribute to the HRTF. This has been found to be a factor that differs quite considerably between individuals and can therefore be a confusing cue if not well matched to the listener’s own torso reflections.

Some dummy heads or ear inserts are designed specifically for recording purposes whereas others are designed for measurement. As a rule, those designed for recording tend to have microphones at the entrances of the ear canals, whereas those designed for measurement have the mics at the ends of the ear canals, where the ear drum should be. (Some measurement systems also include simulators for the transmission characteristics of the inner parts of the ear.) The latter types will therefore include the ear canal resonance in the HRTF, which would have to be equalised out for recording/reproduction purposes in which headphones were located outside the ear canal. The ears of dummy heads are often interchangeable in order to vary the type of ear to be simulated, and these ears are modelled on ‘average’ or ‘typical’ physical properties of human ears, giving rise to the same problems of HRTF standardisation as mentioned above.


Figure 16.23   Head and torso simulator (HATS) from B&K

The equalisation of dummy heads for recording has received much attention over the years, mainly to attempt better headphone/loudspeaker compatibility. Equalisation can be used to modify the absolute HRTFs of the dummy head in such a way that the overall spatial effect is not lost, partly because the differences between the ears are maintained. Just as Theile has suggested using diffuse field equalisation for headphones as a good means of standardising their response, he and others have also suggested diffuse field equalisation of dummy heads so that recordings made on such heads replay convincingly on such headphones and sound reasonably natural on loudspeakers. This essentially means equalising the dummy head microphone so that it has a near-flat response when measured in one-third octave bands in a diffuse sound field. The Neumann KU100, pictured in Figure 16.24, is a dummy head that is designed to have good compatibility between loudspeaker and headphone reproduction, and uses equalisation that is close to Theile’s proposed diffuse field response.

Binaural cues do not have to be derived from dummy heads. Provided the HRTFs are known, or can be approximated for the required angle of sound incidence, signals can be synthesised with the appropriate time delays and spectral characteristics. Such techniques are increasingly used in digital signal processing applications that aim to simulate natural spatial cues, such as flight simulators and virtual reality. Accurate sets of HRTF data for all angles of incidence and elevation have been hard to come by until recently, and they are often quite closely guarded intellectual property as they can take a long time and a lot of trouble to measure. The question also arises as to how fine an angular resolution is required in the data set. For this reason a number of systems base their HRTF implementation on relatively coarse-resolution data and interpolate the points in between.
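A minimal sketch of the interpolation idea follows, assuming a coarse measured grid of head-related impulse responses (HRIRs). The grid and the impulse-response values are invented placeholders; a real system would use measured data and usually more sophisticated interpolation:

```python
def interpolate_hrir(hrirs, azimuth):
    """Linearly interpolate an HRIR between the two measured azimuths
    that bracket the requested one.

    hrirs: dict mapping measured azimuth (degrees) to a list of samples.
    """
    angles = sorted(hrirs)
    azimuth %= 360
    # Walk the measured grid, wrapping the last interval around 360 deg
    for lo, hi in zip(angles, angles[1:] + [angles[0] + 360]):
        if lo <= azimuth <= hi:
            frac = (azimuth - lo) / (hi - lo)
            a, b = hrirs[lo], hrirs[hi % 360]
            # Sample-by-sample crossfade between the two measurements
            return [(1 - frac) * x + frac * y for x, y in zip(a, b)]

# Placeholder 'measurements' on a coarse grid (two-sample HRIRs)
grid = {0: [1.0, 0.0], 30: [0.0, 1.0], 330: [0.5, 0.5]}
print(interpolate_hrir(grid, 15))   # halfway between 0 and 30 degrees
```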


Figure 16.24   Neumann KU100 dummy head. (Courtesy of Georg Neumann GmbH, Berlin)

Spot microphones and two-channel panning laws

We have so far considered the use of a small number of microphones to cover the complete sound stage. It is also possible to make use of a large number of mono microphones or other mono sources, each covering a small area of the sound stage and intended to be as independent of the others as possible. This is the normal basis of most studio pop music recording, with the sources often being recorded at separate times using overdubbing techniques. In the ideal world, each mic in such an arrangement would pick up sound only from the desired sources, but in reality there is usually considerable spill from one to another. It is not the intention in this chapter to provide a full resumé of studio microphone technique, and thus discussion will be limited to an overview of the principles of multi-mic pickup as distinct from the more simple techniques described above.

In multi-mic recording each source feeds a separate channel of a mixing console, where levels are individually controlled and the mic signal is ‘panned’ to a virtual position somewhere between left and right in the sound stage. The pan control takes the monophonic signal and splits it two ways, controlling the proportion of the signal fed to each of the left and right mix buses. Typical pan control laws follow a curve which gives rise to a 3 dB drop in the level sent to each channel at the centre, resulting in no perceived change in level as a source is moved from left to right (see Fact File 16.2). This has often been claimed to be due to the way signals from left and right loudspeakers sum acoustically at the listening position, which includes a diffuse field component of the room. The −3 dB pan-pot law is not correct if the stereo signal is combined electrically to mono, since the summation of two equal signal voltages would result in a 6 dB rise in level for signals panned centrally. A −6 dB law is more appropriate for mixers whose outputs will be summed to mono (e.g. radio and TV operations) as well as stereo, although this will then result in a drop in level at the centre for stereo signals. A compromise law of −4.5 dB is sometimes adopted by manufacturers for this reason.
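These laws can be sketched with a single exponent applied to sine/cosine gain curves. The function name and the exponent mapping are an illustration of mine, not a standard implementation; the −4.5 dB case comes out as approximately −4.5 dB at the centre rather than exactly:

```python
# Two-channel pan laws: -3 dB (constant power), -6 dB (constant
# voltage, mono-compatible) and the -4.5 dB compromise.
import math

def pan_gains(pan, centre_drop_db=3.0):
    """Return (left, right) gains for pan in 0.0 (full left)..1.0 (full right).

    centre_drop_db selects the law: 3.0 gives constant acoustic power
    (L^2 + R^2 = 1), 6.0 gives a constant voltage sum (L + R = 1,
    so a centre-panned source does not rise when summed to mono),
    and 4.5 approximates the broadcast compromise law.
    """
    theta = pan * math.pi / 2           # map 0..1 onto 0..90 degrees
    n = centre_drop_db / 3.0            # exponent: 1 -> -3 dB, 2 -> -6 dB
    return math.cos(theta) ** n, math.sin(theta) ** n

left, right = pan_gains(0.5)            # centre, -3 dB law
print(f"centre gain = {20 * math.log10(left):.2f} dB per channel")
```

With the −6 dB law the left and right gains always sum to exactly 1.0 (since cos² + sin² = 1), which is why it is preferred where mono summation matters.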

Panned mono balances rely on channel level differences, separately controlled for each source, to create phantom images on a synthesised sound stage, with relative level between sources used to adjust the prominence of a source in a mix. Time delay is hardly ever used as a panning technique, for reasons of poor mono compatibility and technical complexity. Artificial reverberation may be added to restore a sense of space to a multi-mic balance. Source distance can be simulated by the addition of reflections and reverberation, as well as by changes in source spectrum and overall level (e.g. HF roll-off can simulate greater distance).

It is common in classical music recording to use close mics in addition to a coincident pair or spaced pair in order to reinforce sources that appear to be weak in the main pickup. These close mics are panned to match the true position of the source. The results of this are variable and can have the effect of flattening the perspective, removing any depth which the image might have had, and thus the use of close mics must be handled with subtlety. David Griesinger has suggested that the use of stereo pairs of mics as spots can help enormously in removing this flattening effect, because the spill that results between spots is now in stereo rather than in mono and is perceived as reflections separated spatially from the main signal.

The recent development of cheaper digital signal processing (DSP) has made possible the use of delay lines, sometimes as an integral feature of digital mixer channels, to adjust the relative timing of spot mics in relation to the main pair. This can help to prevent the distortion of distance, and to equalise the arrival times of distant mics so that they do not exert a precedence ‘pull’ over the output of the main pair. It is also possible to process the outputs of multiple mono sources to simulate binaural delays and head-related effects in order to create the effect of sounds at any position around the head when the result is monitored on headphones or on loudspeakers using crosstalk cancelling, as described earlier.
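The delay needed to time-align a spot mic can be sketched as follows. The distances and the small ‘precedence pad’ (a couple of milliseconds often added so the main pair still arrives first) are illustrative assumptions, not standard figures:

```python
# Delay to apply to a spot mic so that its signal does not arrive
# ahead of the main pair: the spot mic is much closer to the source,
# so its output must be delayed by the extra path length to the main
# pair divided by the speed of sound.
C = 340.0  # speed of sound in m/s

def spot_delay_ms(source_to_main_m, source_to_spot_m, precedence_pad_ms=2.0):
    """Delay (ms) for the spot channel so the main pair leads slightly."""
    path_difference = source_to_main_m - source_to_spot_m
    return path_difference / C * 1000.0 + precedence_pad_ms

# e.g. source 12 m from the main pair, spot mic 0.5 m from the source
print(f"{spot_delay_ms(12.0, 0.5):.1f} ms")  # ~35.8 ms
```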

Recommended further reading

Alexander, R. C. (2000) The Inventor of Stereo: The Life and Works of Alan Dower Blumlein. Focal Press

Bartlett, B. (1991) Stereo Microphone Techniques. Focal Press

Eargle, J. (ed.) (1986) Stereophonic Techniques – An Anthology. Audio Engineering Society

Eargle, J. (2005) The Microphone Book. Focal Press

Hugonnet, C. and Walder, P. (1998) Stereophonic Sound Recording – Theory and Practice. John Wiley

Rumsey, F. (2001) Spatial Audio. Focal Press
