3.4 Spatial hearing in rooms

3.4.1 Source localization in the presence of reflections: the precedence effect

Usually, the direct sound of a source reaches the ears earlier than the reflections of the same sound because the indirect path associated with a reflection is longer than the direct path from the source to the ears. The precedence effect describes a number of phenomena related to the auditory system's ability to resolve the direction of a source in the presence of one or more reflections by favoring the ‘first wavefront’ over subsequently arriving reflections. That is, the directional perception of reflections arriving within a few milliseconds after the direct sound is suppressed, and the direct sound and these reflections are ‘fused’ into a single auditory object at the direction of the direct sound. Extensive reviews of the precedence effect have been given by Zurek [291], Blauert [26], and Litovsky et al. [183].

A typical precedence effect experiment is illustrated in Figure 3.11. The signals fed to the two loudspeakers are shown in Figure 3.11(a). The signal x1 contains pulses repeating at regular intervals τp. The signal x2 contains the same pulse train, delayed by τe. Typical values for τp and τe are 400 and 5 ms, respectively. When listening to these signals over a standard stereo setup, a listener perceives only one auditory object, located at the loudspeaker that emits x1.
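The stimulus of Figure 3.11(a) can be sketched in a few lines of Python. The sampling rate, the signal duration, and the helper name pulse_train are assumptions made for illustration only; the values of τp and τe are the typical ones quoted above.

```python
def pulse_train(fs, duration_s, period_s, delay_s=0.0):
    """Return a list of samples with unit pulses every period_s seconds,
    the first pulse being placed at delay_s seconds."""
    n = int(round(duration_s * fs))
    x = [0.0] * n
    step = int(round(period_s * fs))
    k = int(round(delay_s * fs))
    while k < n:
        x[k] = 1.0
        k += step
    return x

fs = 48000        # sampling rate in Hz (assumed)
tau_p = 0.400     # pulse repetition interval: 400 ms
tau_e = 0.005     # delay of the lagging signal: 5 ms

x1 = pulse_train(fs, 2.0, tau_p)                  # leading loudspeaker signal
x2 = pulse_train(fs, 2.0, tau_p, delay_s=tau_e)   # lagging loudspeaker signal

# Sample positions of the pulses in both signals.
lead_idx = [i for i, v in enumerate(x1) if v]
lag_idx = [i for i, v in enumerate(x2) if v]
```

Each pulse in x2 trails the corresponding pulse in x1 by τe, i.e. by 240 samples at the assumed sampling rate.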

Figure 3.12 illustrates the three phases of the precedence effect. (I) The directional perception of a pair of stimuli with an interstimulus delay shorter than 1 ms is called summing localization (Section 3.3.4). The weight of the lagging stimulus decreases with increasing delay up to approximately 1 ms. (II) For longer delays, the leading sound dominates the localization judgment. (III) The echo threshold is the delay at which the fusion breaks apart and the lagging sound is heard as a separate echo. Depending on stimulus properties and individual listeners, thresholds of 2–50 ms have been reported in the literature [183]. The previously mentioned auditory model [87] attempts to explain not only the localization of concurrent sources, but also the localization of sources in the presence of reflections, i.e. the precedence effect.

Figure 3.11 A typical precedence effect experiment: (a) signals given to the loudspeakers; (b) an auditory object is perceived at the position of the leading-signal loudspeaker.

Figure 3.12 The three phases of the precedence effect: (I) summing localization; (II) precedence effect; (III) localization of primary auditory object and echo.

Figure 3.13(a) shows the echo threshold for speech signals as investigated by Damaske [65]. The echo thresholds for noise pulses of different lengths are shown in Figure 3.13(b) [194]. The pulse lengths considered are 10, 30, and 100 ms.

3.4.2 Spatial impression

So far the discussion has mostly focused on the perceived direction or lateralization of auditory objects. One exception was the discussion of the role that IC and ICC play in determining the width of the auditory object. In the following, other attributes related to auditory objects and the auditory spatial image are briefly discussed. These attributes mostly depend on the properties of reflections relative to the direct sound.

Coloration

The first early reflections up to about 20 ms later than the direct sound can cause timbral coloration due to a ‘comb-filter’ effect which attenuates and amplifies frequency components in a frequency-periodic pattern.

An example of the effect of early reflections is illustrated in Figure 3.14. An impulse response h1 with a direct sound and a single reflection after 7 ms is shown in the top left panel of Figure 3.14. The corresponding magnitude spectrum, shown in the top right panel, has the shape of a ‘comb-filter’, i.e. frequencies at regular intervals are attenuated or removed from a signal when it is filtered with h1. The bottom two panels of Figure 3.14 illustrate another example with several reflections of different strengths.
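The comb-filter shape can be checked numerically. The sketch below assumes an idealized impulse response h(t) = δ(t) + g δ(t − τ) with τ = 7 ms, so that |H(f)| = |1 + g exp(−j2πfτ)|; the reflection gain g and the function name comb_magnitude are assumptions for illustration and are not taken from Figure 3.14.

```python
import cmath
import math

def comb_magnitude(f_hz, delay_s, gain):
    """Magnitude response of h(t) = delta(t) + gain * delta(t - delay_s)."""
    return abs(1.0 + gain * cmath.exp(-2j * math.pi * f_hz * delay_s))

delay_s = 0.007                      # reflection delay of 7 ms
notch_spacing = 1.0 / delay_s        # spacing between notches, about 142.9 Hz
first_notch = 1.0 / (2.0 * delay_s)  # lowest notch frequency, about 71.4 Hz

# Equal-strength reflection (gain = 1, assumed): full cancellation at notches.
full_notch = comb_magnitude(first_notch, delay_s, 1.0)
full_peak = comb_magnitude(notch_spacing, delay_s, 1.0)

# Weaker reflection (gain = 0.5, assumed): the notches are only partial.
partial_notch = comb_magnitude(first_notch, delay_s, 0.5)
```

Under this model the notches lie at odd multiples of 1/(2τ) and the peaks at multiples of 1/τ. A reflection weaker than the direct sound attenuates the notch frequencies without removing them entirely.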

Figure 3.13 Echo thresholds for: (a) speech; (b) noise pulses with different durations.

Figure 3.14 Two impulse responses h1(t) and h2(t) and their respective magnitude spectra H1(ω) and H2(ω).

Distance of auditory object

In free field, the following attributes of the two ear entrance signals change as a function of source distance: the power of the signal reaching the ears and its high-frequency content (air absorption). Additionally, for sources close to the head, a change in source distance causes a change in ILD across all frequencies [52]. There is evidence that the overall level of sound reaching the ears provides potent distance information. For a source whose likely emitted level is known to the listener, such as speech, the overall sound level at the ear entrances provides an absolute distance cue [192, 193]. However, when a listener does not expect a source to have a certain emitted level, the overall sound level at the ear entrances cannot be used for judging absolute distance. In such a situation, overall level provides only a relative cue [62].

In a reverberant environment, on the other hand, more information is available to the auditory system. The reverberation time and the timing of the first reflections contain information about the size of a space and the distance to the surfaces, thus giving an indication of the expected range of source distances. It is therefore not surprising that many researchers have argued that for relatively distant sources the ratio of the power of direct to reflected sound is a reliable distance cue, see e.g. [50, 192, 193].
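The following sketch illustrates why the direct-to-reflected power ratio can carry distance information. It rests on simplifying assumptions that are not taken from the text: the direct sound amplitude falls off as 1/r (point source in free field), the reflected energy does not depend on source distance (idealized diffuse field), and everything within 2.5 ms of the onset counts as direct sound. The toy impulse response and all function names are hypothetical.

```python
import math

def direct_to_reverberant_db(h, fs, direct_window_s=0.0025):
    """Ratio of direct to reflected energy in dB for a noise-free response h.
    The direct part is the onset sample plus an assumed 2.5 ms window."""
    onset = next(i for i, v in enumerate(h) if v != 0.0)
    split = onset + int(round(direct_window_s * fs)) + 1
    e_direct = sum(v * v for v in h[onset:split])
    e_reverb = sum(v * v for v in h[split:])
    return 10.0 * math.log10(e_direct / e_reverb)

def toy_room_response(distance_m, fs=8000):
    """Hypothetical response: direct sound scaled by 1/r plus two reflections
    whose amplitudes are assumed not to change with source distance."""
    h = [0.0] * (fs // 2)
    h[0] = 1.0 / distance_m             # direct sound
    h[int(round(0.010 * fs))] = 0.5     # reflection at 10 ms
    h[int(round(0.050 * fs))] = 0.5     # reflection at 50 ms
    return h

fs = 8000
drr_1m = direct_to_reverberant_db(toy_room_response(1.0, fs), fs)
drr_2m = direct_to_reverberant_db(toy_room_response(2.0, fs), fs)
```

Under these assumptions, doubling the source distance lowers the ratio by about 6 dB. Unlike the overall level, this ratio does not depend on the level emitted by the source.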

Distance cues and their importance for generating artificial auditory spatial images have been discussed by Shinn-Cunningham [238]. It is argued that for real-world listening conditions and headphone playback, it is most important to consider level and reverberation cues.

Width of auditory objects

Barron and Marshall [6] found that lateral reflections from ±90° cause the greatest spatial impression. The closer the direction of the reflections is to the median plane, the weaker the resulting spatial impression. The spatial impression caused by a pair of early reflections is approximately constant for reflection delays between about 5 and 80 ms. Based on this, Barron and Marshall proposed a physical measure called lateral fraction (LF) for measuring spatial impression. The lateral fraction is the ratio of the lateral sound energy to the total sound energy arriving within the first 80 ms after the arrival of the direct sound:

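\[
\mathrm{LF} = \frac{\int_{0}^{80\,\mathrm{ms}} h^2(t) \cos^2 \alpha(t)\, dt}{\int_{0}^{80\,\mathrm{ms}} h^2(t)\, dt}
\]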

where h(t) is the impulse response and α(t) is the angle of arrival of the reflection at time t relative to the side as illustrated in Figure 3.15. A dipole microphone pointing towards the side can be used to measure h(t)cos α(t). The response of such a dipole microphone is indicated in Figure 3.15.
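As a rough numerical illustration, the lateral fraction can be evaluated for an impulse response given as a list of discrete arrivals. The sketch assumes LF is the energy of h(t)cos α(t) divided by the energy of h(t), both accumulated over the first 80 ms after the direct sound; the example arrivals and the function name lateral_fraction are hypothetical.

```python
import math

def lateral_fraction(arrivals, t_max_s=0.080):
    """arrivals: (time_s, amplitude, alpha_deg) tuples, with time measured from
    the direct sound and alpha measured relative to the side (0 deg = lateral)."""
    lateral = 0.0
    total = 0.0
    for t_s, amplitude, alpha_deg in arrivals:
        if 0.0 <= t_s <= t_max_s:
            energy = amplitude ** 2
            total += energy
            # A sideways-pointing dipole picks up amplitude * cos(alpha).
            lateral += energy * math.cos(math.radians(alpha_deg)) ** 2
    return lateral / total

example = [
    (0.000, 1.0, 90.0),   # direct sound from the front: no lateral energy
    (0.020, 0.5, 0.0),    # reflection arriving exactly from the side
    (0.050, 0.5, 60.0),   # reflection arriving 60 degrees away from the side
    (0.120, 0.4, 0.0),    # later than 80 ms: not counted
]
lf = lateral_fraction(example)
```

For these example values the lateral energy is 0.25 + 0.25 · 0.25 = 0.3125 and the total energy is 1.5, giving LF ≈ 0.21.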

The lateral fraction measure is mostly associated with the width of the auditory object. More recent studies found that lateral reflections from ±90° are not optimal for creating the greatest spatial impression at all frequencies [3, 203, 241], i.e. at certain frequencies reflections arriving from directions other than the side create the most spatial impression.

Figure 3.15 Definition of reflection direction for the computation of lateral fraction and late lateral fraction.

Figure 3.16 Early reflections emitted from the side loudspeakers have the effect of widening the auditory object. The shaded area indicates the perceived auditory object.

An experimental setup for emulating early lateral reflections is illustrated in Figure 3.16. The direct sound is emitted from the center loudspeaker while independent early reflections are emitted from the left and right loudspeakers. The width of the auditory object increases as the relative strength of the early lateral reflections is increased.

Listener envelopment

More than 80 ms after the arrival of the direct sound, reflections tend to contribute more to the perception of the environment than to the auditory object itself. This is manifested in a sense of ‘envelopment’ or ‘spaciousness of the environment’, frequently denoted listener envelopment [201]. Such a situation occurs for example in a concert hall, where late reverberation arrives at the listener's ears from all directions.

Figure 3.17 Late reflections emitted from the side loudspeakers relate more to the environment than the auditory object itself. This is denoted listener envelopment. The shaded areas indicate the perceived auditory objects.

Bradley and Soulodre [39] extended the research of Barron and Marshall by adding more early reflections and a (late) ‘reverberation tail’. From a number of experiments they concluded that a measure similar to the lateral fraction for early reflections may also be applicable to reverberation. This measure relates more to listener envelopment than to the width of auditory objects. They termed this measure late lateral fraction:

images

Late lateral reflections can be emulated with a setup as shown in Figure 3.17. The direct sound is emitted from the center loudspeaker while independent late reflections are emitted from the left and right loudspeakers. The sense of listener envelopment increases as the relative strength of the late lateral reflections is increased, while the width of the auditory object is expected to be hardly affected.

Interaural cross-correlation coefficient

The previously described measures, lateral fraction and late lateral fraction, relate properties of rooms (early and late reflections) to the perceptual phenomena of width of auditory objects and listener envelopment. Another class of physical measures relates properties of the signals at the ear entrances to such attributes. In the following, a few such measures are reviewed.

Figure 3.18 (a) Room impulse response (RIR); (b) left and right head-related transfer function (HRTF); (c) left and right binaural room impulse response (BRIR).

In order to represent properties of the ear entrance signals, binaural room impulse responses (BRIRs) are considered. Recall that a room impulse response (RIR) models the path from a source to an observation point in a room, and a head-related transfer function (HRTF) models the path from a source to an ear entrance in free field. A BRIR is a linear filter modeling the multiple paths along which sound travels in a room before reaching an ear entrance as direct sound and reflections. Usually HRTFs and BRIRs are considered in pairs, one each for the left and right ear entrances. Figure 3.18 illustrates how an RIR, a pair of HRTFs, and a pair of BRIRs are defined.

As implied by the results presented in Sections 3.3.2 and 3.3.4, IC and ICC are related to the width of auditory objects. An artificial experience related to listener envelopment can be evoked by emitting independent noise signals with the same level from loudspeakers distributed all around a listener, as illustrated in Figure 3.19. When the ICC between source signal pairs is increased, the width of the auditory object surrounding the listener decreases [64]. IC and ICC are in many cases directly related, i.e. a lower ICC between a loudspeaker pair results in a lower IC between the ear entrance signals [173]. Thus both IC and ICC seem to be related to auditory object width and listener envelopment. Just as room properties were related to auditory object width and envelopment, IC can be related to these two attributes by computing it over the early and late parts of the BRIRs. These two measures are often denoted early and late interaural cross-correlation coefficient, IACC(E) and IACC(L), respectively [38, 203]. The IACC is defined as

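\[
\mathrm{IACC} = \max_{\tau} \left| \Phi(\tau) \right|
\]

where

\[
\Phi(\tau) = \frac{\int_{T_1}^{T_2} h_L(t)\, h_R(t+\tau)\, dt}{\sqrt{\int_{T_1}^{T_2} h_L^2(t)\, dt \int_{T_1}^{T_2} h_R^2(t)\, dt}}
\]

and τ ∈ [−1, 1] ms. Here h_L(t) and h_R(t) denote the left and right BRIRs. The early, late, and total IACC use different limits of the integration, T1 and T2, in (3.6) as shown in Table 3.1.

Figure 3.19 For multi-loudspeaker playback the auditory object surrounding the listener increases in width as the ICC between the signals decreases.

Table 3.1 Definition of interaural cross-correlation coefficient measures.

Measure    BRIR segment (ms)
IACC(E)    0–80
IACC(L)    80–1000
IACC(T)    0–1000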
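A minimal sketch of how the three measures of Table 3.1 might be computed from a sampled BRIR pair is given below. It assumes the commonly used definition in which the IACC is the maximum magnitude of the normalized cross-correlation of the left and right BRIRs, integrated from T1 to T2 and searched over lags of ±1 ms. The synthetic BRIRs, the sampling rate, and the function name iacc are hypothetical.

```python
import math

def iacc(h_left, h_right, fs, t1_s, t2_s, max_lag_s=0.001):
    """Maximum magnitude of the normalized interaural cross-correlation of two
    BRIRs over the segment [t1_s, t2_s], searched over lags of +/- max_lag_s."""
    n1 = int(round(t1_s * fs))
    n2 = min(int(round(t2_s * fs)), len(h_left), len(h_right))
    e_left = sum(v * v for v in h_left[n1:n2])
    e_right = sum(v * v for v in h_right[n1:n2])
    norm = math.sqrt(e_left * e_right)
    if norm == 0.0:
        return 0.0   # convention chosen here for an empty or silent segment
    max_lag = int(round(max_lag_s * fs))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(n1, n2):
            m = n + lag
            if 0 <= m < len(h_right):
                acc += h_left[n] * h_right[m]
        best = max(best, abs(acc) / norm)
    return best

fs = 8000                                    # sampling rate in Hz (assumed)
h_l = [0.0] * fs                             # 1 s synthetic left BRIR
h_r = [0.0] * fs                             # 1 s synthetic right BRIR
h_l[10], h_l[100] = 1.0, 0.5                 # early part, left ear
h_r[14], h_r[104] = 1.0, 0.5                 # same early part, 0.5 ms later
h_l[2000] = 0.3                              # late parts that do not overlap
h_r[3000] = 0.3                              # within the +/- 1 ms lag range

segments_s = {"E": (0.0, 0.080), "L": (0.080, 1.0), "T": (0.0, 1.0)}
result = {name: iacc(h_l, h_r, fs, t1, t2) for name, (t1, t2) in segments_s.items()}
```

In this constructed example the early parts of the two BRIRs are identical up to a 0.5 ms interaural delay, so IACC(E) equals 1. The late parts share no energy within the lag range, so IACC(L) is 0, and IACC(T) lies in between.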

Despite the fact that IACC and measures like lateral fraction and late lateral fraction seem very different, they are often similarly influenced by lateral reflections [174]. A time-based division into early and late reflections (e.g. reflections up to 80 ms and later reflections) for measuring auditory object width and envelopment is not always suitable, since both influence each other to a certain degree.
