3.5 Limitations of the human auditory system

3.5.1 Just-noticeable differences in interaural cues

Although the human auditory system is capable of estimating interaural cues with significant accuracy, it also has several known limitations. For instance, interaural cues have to change by a certain amount in order to be detectable. This minimum required change is often referred to as the just-noticeable difference (JND) or threshold.

The just-noticeable change in ILD, for example, amounts to approximately 0.5–1 dB and is roughly constant over frequency and stimulus level [101, 123, 191, 282]. ILD thresholds increase with the reference ILD: for a reference ILD of 9 dB, the ILD threshold is about 1.2 dB, and for a reference ILD of 15 dB, it is between 1.5 and 2 dB [196, 225, 285].
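The threshold values quoted above can be collected into a small lookup. The sketch below uses hypothetical helpers (`ild_jnd`, `ild_change_detectable`) that linearly interpolate between the quoted data points. It assumes that the 0.5–1 dB figure applies to a reference ILD of 0 dB, uses the midpoint wherever a range is given, and treats positive and negative ILDs symmetrically; the interpolation itself is an illustrative assumption, not a published threshold model.

```python
import numpy as np

# Reference ILDs (dB) and ILD thresholds (dB) taken from the text.
# Assumptions: the 0.5-1 dB value applies at a 0 dB reference, and the
# midpoint of each quoted range is used (0.75 dB and 1.75 dB).
REF_ILD_DB = np.array([0.0, 9.0, 15.0])
ILD_JND_DB = np.array([0.75, 1.2, 1.75])


def ild_jnd(reference_ild_db: float) -> float:
    """Rough ILD threshold (dB) for a given reference ILD (hypothetical helper).

    Linear interpolation between the quoted points; references beyond 15 dB
    are clamped to the last point. The sign of the ILD is ignored (assumed
    symmetric).
    """
    return float(np.interp(abs(reference_ild_db), REF_ILD_DB, ILD_JND_DB))


def ild_change_detectable(reference_ild_db: float, change_db: float) -> bool:
    """True if the magnitude of the ILD change reaches the interpolated threshold."""
    return abs(change_db) >= ild_jnd(reference_ild_db)


for ref in (0.0, 9.0, 12.0, 15.0):
    print(f"reference ILD {ref:4.1f} dB -> JND ~ {ild_jnd(ref):.3f} dB")
```

Linear interpolation was chosen only because the text gives three sparse data points; any smooth curve through them would serve equally well as a rough guide.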

The sensitivity to changes in ITDs strongly depends on frequency. For frequencies below 1000 Hz, this sensitivity can be described as a constant interaural phase difference (IPD) sensitivity of about 0.05 rad [153, 165, 191, 286]. The reference ITD has some effect on the ITD thresholds: large ITDs in the reference condition tend to decrease sensitivity to changes in the ITDs [123, 283]. There is almost no effect of stimulus level on ITD sensitivity [295]. At higher frequencies, the binaural auditory system is not able to detect time differences in the fine-structure waveforms. However, time differences in the envelopes can be detected quite accurately [22, 259]. Despite this high-frequency sensitivity, ITD-based sound source localization is dominated by low-frequency cues [24, 25].
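A constant IPD threshold implies an ITD threshold that shrinks with frequency, because for a pure tone of frequency f a phase difference φ corresponds to a time difference φ/(2πf). The short sketch below performs this conversion for the 0.05 rad value quoted above; the function name is hypothetical, and the conversion is only meaningful in the range below about 1000 Hz where the constant-IPD description applies.

```python
import math

IPD_JND_RAD = 0.05  # approximate IPD threshold below ~1000 Hz (from the text)


def itd_jnd_seconds(frequency_hz: float, ipd_jnd_rad: float = IPD_JND_RAD) -> float:
    """Convert a constant IPD threshold into the equivalent ITD threshold.

    For a tone of frequency f, a phase difference phi corresponds to a time
    difference phi / (2 * pi * f).
    """
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return ipd_jnd_rad / (2.0 * math.pi * frequency_hz)


for f in (250, 500, 1000):
    print(f"{f:5d} Hz: ITD JND ~ {itd_jnd_seconds(f) * 1e6:.1f} us")
```

For example, the 0.05 rad threshold corresponds to about 32 µs at 250 Hz, 16 µs at 500 Hz and 8 µs at 1000 Hz.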

The sensitivity to changes in the interaural coherence (IC) strongly depends on the reference coherence. For a reference coherence of +1, changes of about 0.002 can be perceived, while for a reference coherence around 0, the change in coherence must be about 100 times larger to be perceptible [63, 89, 178, 222]. The sensitivity to changes in interaural coherence is practically independent of stimulus level, as long as the stimulus is sufficiently above the absolute threshold [109]. At high frequencies, the envelope coherence seems to be the relevant descriptor of the spatial diffuseness [19, 20].
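Interaural coherence is characterized in Section 3.5.2 as the maximum of the normalized cross-correlation function of the ear signals. A minimal sketch of that definition follows. The ±1 ms lag range, the helper names and the test signals (two Gaussian noises mixed to a target correlation) are assumptions for illustration, and the estimate is computed on broadband signals rather than within auditory frequency bands.

```python
import numpy as np


def interaural_coherence(left, right, fs, max_lag_s=1e-3):
    """Maximum of the normalized cross-correlation between the ear signals
    over lags of +/- max_lag_s (the 1 ms default is an assumption)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("left and right must have the same length")
    norm = np.sqrt(np.dot(left, left) * np.dot(right, right))
    if norm == 0.0:
        return 0.0
    max_lag = int(round(max_lag_s * fs))
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[n + lag] with right[n] over the overlapping samples.
        if lag >= 0:
            c = np.dot(left[lag:], right[:len(right) - lag])
        else:
            c = np.dot(left[:lag], right[-lag:])
        best = max(best, c / norm)
    return float(best)


def noise_pair(coherence, n, rng):
    """Hypothetical test signal: two Gaussian noises whose expected
    zero-lag correlation equals `coherence`."""
    common = rng.standard_normal(n)
    independent = rng.standard_normal(n)
    right = coherence * common + np.sqrt(1.0 - coherence ** 2) * independent
    return common, right


fs = 48_000
rng = np.random.default_rng(0)
for target in (1.0, 0.9, 0.5, 0.0):
    left, right = noise_pair(target, fs, rng)  # one second of noise
    print(f"target {target:.1f} -> estimated {interaural_coherence(left, right, fs):.3f}")
```

Because the maximum is taken over lags, a pure interaural delay within the lag range leaves the estimated coherence close to 1, while independent noises give values near 0 (slightly biased upward by the maximum over lags).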

The threshold values described above are typical for spatial properties that persist for a prolonged time (i.e., 300–400 ms). For shorter durations, thresholds generally increase. For example, if the duration of the (change in) ILD and ITD in a stimulus is decreased from 310 to 17 ms, the thresholds may increase by up to a factor of 4 [21]. Interaural coherence sensitivity also strongly depends on duration [277, 278, 294]. It is often assumed that the increased sensitivity for longer durations results from temporal integration properties of the auditory system.

There is, however, one important exception in which the auditory system does not seem to integrate spatial information across time. In reverberant rooms, the perceived location of a sound source is dominated by the first 2 milliseconds of the onset of the sound source, while the remaining signal is largely discarded in terms of spatial cues. This phenomenon is referred to as ‘the law of the first wavefront’ or ‘precedence effect’ [183, 239, 268, 289], which is discussed in Section 3.4.1.

3.5.2 Spectro-temporal decomposition

Extensive psychophysical research (cf. [125, 168, 263]) and efforts to model the binaural auditory system (cf. [46, 61, 90, 182, 248]) have suggested that the human auditory system extracts spatial cues as a function of time and frequency. To be more specific, there is considerable evidence that the binaural auditory system renders its binaural cues in a set of frequency bands, without being able to access these properties at a finer frequency resolution. This spectral resolution of the binaural auditory system can be described by a filter bank with filter bandwidths that follow the ERB (equivalent rectangular bandwidth) scale [98, 108, 166].
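The ERB scale can be sketched with the widely used approximations of Glasberg and Moore (1990), ERB(f) = 24.7(4.37f/1000 + 1) Hz and ERB-rate(f) = 21.4 log10(4.37f/1000 + 1). Whether the cited works use exactly these constants is an assumption here, as are the helper names and the choice of one band per ERB between 50 Hz and 16 kHz. The sketch only computes band centre frequencies and bandwidths; it does not implement the filters themselves.

```python
import numpy as np

# Glasberg & Moore (1990) approximations (assumed to match the ERB scale cited).


def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter at f_hz."""
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """ERB-rate: number of ERBs below f_hz."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb_rate, dtype=float) / 21.4) - 1.0) * 1000.0 / 4.37


def erb_center_frequencies(f_low_hz, f_high_hz, bands_per_erb=1.0):
    """Hypothetical helper: centre frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low_hz), hz_to_erb_rate(f_high_hz)
    n = int(np.floor((hi - lo) * bands_per_erb)) + 1
    return erb_rate_to_hz(lo + np.arange(n) / bands_per_erb)


centres = erb_center_frequencies(50.0, 16000.0)
print(len(centres), "bands")
print(np.round(erb_bandwidth_hz(centres[:5]), 1), "Hz (lowest bands)")
```

Uniform spacing on the ERB-rate scale gives narrow bands at low frequencies and progressively wider bands at high frequencies; at 1 kHz, for instance, the ERB is about 133 Hz.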

The limited temporal resolution at which the auditory system can track binaural localization cues is often referred to as ‘binaural sluggishness’; the associated time constants are between 30 and 100 ms [125, 167]. Although the auditory system is not able to follow ILDs and ITDs that vary quickly over time, this does not mean that listeners cannot detect the presence of quickly varying cues. Slowly varying ILDs and/or ITDs result in a movement of the perceived sound source location, while fast changes in binaural cues lead to a percept of ‘spatial diffuseness’, or a reduced ‘compactness’ [26]. In other words, there is a transition from distinct localization cues (if ITDs or ILDs remain constant within the temporal analysis window of the auditory system) to a decrease in interaural coherence (if ITDs or ILDs vary considerably within the temporal analysis window). Although the perceived ‘quality’ of the presented stimulus depends on the speed at which the binaural cues vary, it has been shown that the detectability of ILDs and ITDs is practically independent of the variation speed [45]. This sensitivity of human listeners to time-varying binaural cues can also be described as sensitivity to changes in the maximum of the normalized cross-correlation function (i.e., the coherence) of the incoming waveforms [20, 60, 152, 247].

In fact, there is considerable evidence that sensitivity to changes in any of the binaural cues is the basis of the phenomenon of binaural masking level difference (BMLD) [71, 102, 103, 154, 191]. The BMLD refers to a change in the detection threshold of a signal (the maskee) in the presence of another, masking signal (the masker) when the spatial properties of the masker differ from those of the maskee. For example, when a masking tone is presented in phase to both ears and a pure-tone maskee is simultaneously presented out of phase to the two ears (a so-called NoSπ condition), the threshold level for detecting the signal is generally lower than when both the masker and the maskee are presented in phase (NoSo condition) [102, 281]. In the NoSπ condition, adding the out-of-phase maskee creates a constant ITD or ILD, depending on the phase angle between masker and maskee, which serves as a cue for detection. In the NoSo condition this cue is absent, and hence the detection thresholds for the signal are in many cases higher (by up to 25 dB).
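The dependence of the NoSπ cue on the phase angle can be made concrete with phasors. Assuming a steady-state masker tone of amplitude M presented diotically and a signal tone of the same frequency with amplitude S, added with relative phase φ at the left ear and with the opposite sign at the right ear, the ear signals are M + S·e^{jφ} and M − S·e^{jφ}. The hypothetical helper below computes the resulting ILD and IPD under these idealized assumptions (no noise, no peripheral filtering).

```python
import cmath
import math


def nospi_interaural_cues(masker_amp: float, signal_amp: float, phase_rad: float):
    """ILD (dB) and IPD (rad), left relative to right, for a diotic masker tone
    plus an antiphasic signal tone of the same frequency (idealized phasors)."""
    s = signal_amp * cmath.exp(1j * phase_rad)
    left = masker_amp + s   # signal added in phase at the left ear
    right = masker_amp - s  # and with inverted sign at the right ear
    if abs(left) == 0.0 or abs(right) == 0.0:
        raise ValueError("signal cancels the masker in one ear")
    ild_db = 20.0 * math.log10(abs(left) / abs(right))
    ipd_rad = cmath.phase(left / right)
    return ild_db, ipd_rad


for phase in (0.0, math.pi / 4, math.pi / 2):
    ild, ipd = nospi_interaural_cues(1.0, 0.1, phase)
    print(f"phase {phase:.3f} rad: ILD {ild:+.2f} dB, IPD {ipd:+.3f} rad")
```

With M = 1 and S = 0.1, for example, φ = 0 yields an ILD of about 1.74 dB with zero IPD, while φ = π/2 yields zero ILD and an IPD of 2·arctan(0.1) ≈ 0.199 rad (an ITD of IPD/(2πf) for a tone of frequency f). Intermediate phase angles produce a mixture of both cues.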

If the masker consists of noise, the interaural cues due to the addition of the out-of-phase signal will randomly fluctuate across time, depending on the bandwidth of the masking noise [45, 290]. The sensitivity to a decrease in coherence due to the signal results in lower detection thresholds than for an in-phase signal [44, 124, 260, 292].
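For a noise masker, the effect of the antiphasic signal can be summarized by the drop in interaural correlation it causes. Assuming an idealized broadband NoSπ stimulus in which the left ear receives N + S and the right ear N − S, with masker N and signal S independent and zero-mean, the expected zero-lag normalized correlation is (P_N − P_S)/(P_N + P_S) = (1 − r)/(1 + r), where r = P_S/P_N is the signal-to-masker power ratio. The sketch below (function name hypothetical) evaluates this expression and checks it against a Monte Carlo simulation using Gaussian noise for both masker and signal; it ignores peripheral filtering and the finite integration time of the binaural system.

```python
import numpy as np


def nospi_coherence(snr_db):
    """Expected zero-lag normalized correlation for left = N + S, right = N - S,
    with independent zero-mean N and S and power ratio r = P_S / P_N."""
    r = 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0)
    return (1.0 - r) / (1.0 + r)


# Monte Carlo check; Gaussian noise for masker and signal is an assumption.
rng = np.random.default_rng(1)
n = 200_000
snr_db = -20.0
noise = rng.standard_normal(n)
signal = rng.standard_normal(n) * 10.0 ** (snr_db / 20.0)
left, right = noise + signal, noise - signal
empirical = np.dot(left, right) / np.sqrt(np.dot(left, left) * np.dot(right, right))
print(f"predicted {float(nospi_coherence(snr_db)):.4f}, simulated {empirical:.4f}")
```

Under these idealized assumptions, a signal-to-masker ratio of −20 dB lowers the correlation only to about 0.980, and −30 dB to about 0.998. The latter change of roughly 0.002 is of the same order as the coherence JND quoted in Section 3.5.1 for a reference coherence of +1, which illustrates why an antiphasic signal well below the masker level can still be detectable; this is an order-of-magnitude illustration, not a threshold prediction.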

Recently, it has been demonstrated that the concept of ‘spatial diffuseness’ depends mostly on the interaural coherence value itself and is relatively unaffected by the temporal fine-structure details of the waveforms within the temporal integration time of the binaural auditory system. For example, van de Par et al. [261] measured the detectability and discriminability of interaurally out-of-phase test signals presented in an interaurally in-phase masker. The subjects were perfectly able to detect the presence of the out-of-phase test signal, but they had great difficulty in discriminating between different test signal types (i.e., noise vs. harmonic tone complexes). In a second series of experiments by van de Par et al. [262], a harmonic tone complex and a noise signal were presented simultaneously to both ears. In one ear the harmonic tone complex was lagging, while in the other ear the noise was lagging. In other words, the harmonic tone complex (when presented in isolation) would have been lateralized to one ear, while the noise would have been lateralized to the other ear. Surprisingly, subjects had great difficulty in detecting a left/right swap of the stimulus as long as no distinct temporal structure was present in the harmonic tone complex. On the other hand, subjects could easily discriminate between the lateralized stimulus and a condition without any time lags (i.e., with both the noise and the harmonic tone complex presented diotically).

These observations suggest that detection in a BMLD condition is indeed based on a change in binaural cues, and that the underlying signals that cause this change cannot be isolated from the masking signal on a time–frequency grid finer than the time–frequency resolution of the binaural auditory system. In other words, detecting a change in binaural cues does not imply identifying the signals that cause the change.

3.5.3 Localization accuracy of single sources

The combined effect of spectral and interaural cues enables human listeners to discriminate between different positions in the horizontal plane with an accuracy of 1–10° [161, 195, 221]. Absolute localization tasks usually result in a lower accuracy, between 2 and 30° [53, 187, 221, 273]. In the vertical direction, localization accuracy amounts to about 4–20° [187, 208, 221, 273]. It has also been shown that changes in the localization cues increase our ability to localize sound sources [208, 275], as long as the movement of the sound source is relatively slow [209].

3.5.4 Localization accuracy of concurrent sources

Localization accuracy in the presence of concurrent sound from different directions has been investigated by several authors. A detailed review is given by Blauert [26]. The effect of independent distracters on the localization of a target sound has recently been studied by Good and Gilkey [99], Good et al. [100], Lorenzi et al. [185], Hawley et al. [116], Drullman and Bronkhorst [70], Langendijk et al. [176], Braasch and Hartung [36], and Braasch [35]. The results of these studies generally imply that the localization of the target is either not affected or only slightly degraded by introducing one or two simultaneous distracters at the same overall level as the target. When the number of distracters is increased or the target-to-distracter ratio (T/D) is reduced, localization performance begins to degrade. However, for most configurations of a target and a single distracter in the frontal horizontal plane, the accuracy remains very good down to a target level only a few dB above the threshold of detection [99, 100, 185]. An exception to these results is the outcome of the experiment by Braasch [35], in which two incoherent noises with exactly the same envelope could, most of the time, not be localized individually.

3.5.5 Localization accuracy when reflections are present

Localization accuracy within rooms has been studied by Hartmann [112], Rakerd and Hartmann [219, 220], and Hartmann and Rakerd [114] (see also a review by Hartmann [113]). Overall, in these experiments the localization performance was slightly degraded by the presence of reflections. Interestingly, using slow-onset sinusoidal tones and a single reflecting surface, Rakerd and Hartmann [219] found that the precedence effect sometimes failed completely. In a follow-up study, the relative contribution of the direct sound and the steady-state interaural cues to the localization judgment was found to depend on the onset rate of the tones [220]. Nevertheless, the absence of an attack transient did not prevent the correct localization of a broadband noise stimulus [112]. Giguère and Abel [96] reported similar findings for noise with the bandwidth reduced to one-third octave. Rise/decay time had little effect on localization performance except at the lowest center frequency (500 Hz), while increasing the reverberation time decreased the localization accuracy. Braasch et al. [37] investigated the bandwidth dependence further, finding that the precedence effect started to fail when the bandwidth of noise centered at 500 Hz was reduced to 100 Hz.
