5.2 Concepts of Spatial Hearing

As already mentioned, using only hearing, humans can localize sound sources, and they are also able to perceive some properties of the space they are in. This section considers both physical and perceptual issues which are related to such perception of spatial sound.

5.2.1 Head-related Transfer Functions

As a sound signal travels from a sound source to the ear canals of the listener, the signals in both ear canals will be different from the original sound signal and from each other. The transfer functions from a sound source to the ear canals are called the head-related transfer functions (HRTF) [Bla97]. They depend on the direction of the sound source relative to the listener, and they yield temporal and spectral differences between the left and right ear canals. Because the ears are located on different sides of the skull, the arrival times of a sound signal vary with direction. Also, the skull casts an acoustic shadow that attenuates the contralateral ear signal. The shadowing is most prominent at frequencies above about 2 kHz, and does not exist when the frequency is below about 800 Hz. The pinna and other parts of the body may also change the sound signal. In some cases, it is advantageous to think about these filtering effects in the time domain, thus considering them head-related impulse responses (HRIR).

Several authors have measured HRTFs by means of manikins or human subjects. A popular collection of measurements was taken by Gardner and Martin using a KEMAR dummy head, and made freely available [GM94, Gar97a]. A large set of HRTFs measured from humans has also been made available [ADDA01]. The HRTFs also depend on distance when the source is close to the listener [BR99]. If the distance is more than about 1 m, the dependence can be neglected. This chapter always assumes that the sources are in the far field.
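In the time domain, the filtering described above amounts to convolving the source signal with one HRIR per ear. The following is a minimal sketch under stated assumptions: the two impulse responses are toy placeholders (a pure delay and gain per ear), not measured data such as the KEMAR set, and the names `convolve` and `render_binaural` are illustrative only.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one source signal with a pair of HRIRs, one per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs for a source on the listener's right (assumed values, not measured):
# the right (ipsilateral) ear receives the sound first and at full level,
# the left (contralateral) ear receives it 3 samples later and attenuated.
hrir_right = [1.0, 0.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
```

With measured HRIRs the same two convolutions would be applied; only the filter coefficients change with source direction.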

5.2.2 Perception of Direction

Humans decode the differences of sound between the ear canals and use them to localize sound sources. These differences are called binaural directional cues. The temporal difference is called the interaural time difference (ITD) and the spectral difference is called the interaural level difference (ILD) [Bla97]. Humans are sensitive to ILD at all frequencies, and to ITD mainly at frequencies lower than about 1.5 kHz. At higher frequencies, humans are also slightly sensitive to ITDs between signal envelopes, and not at all to ITD between the carriers of the signals. In typical HRTFs there is a region near 2 kHz where ILD is not monotonic with azimuth angle, and listeners easily mislocalize sources there if the ITD between signal envelopes does not provide information on the sound source direction [MHR10].
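To give a feel for the size of the ITD, the sketch below uses Woodworth's rigid-sphere approximation, which is not part of this chapter and is brought in here only as an illustration. The head radius, the speed of sound, and the function name `woodworth_itd` are assumptions.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a far-field source in the horizontal plane.

    Uses Woodworth's rigid-sphere formula ITD = (a / c) * (theta + sin(theta)),
    valid for azimuths between -90 and 90 degrees. Positive values mean the
    sound reaches the right ear first.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))


# ITD in microseconds for a few azimuths to the right of the listener.
itd_us = {az: woodworth_itd(az) * 1e6 for az in (0, 30, 60, 90)}
```

Under these assumptions the ITD grows from zero in front of the listener to roughly two thirds of a millisecond for a source directly to the side.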

ITD and ILD provide information on where a sound source is in the left–right dimension. The angle between the sound source direction and the median plane can thus be decoded by the listener. The median plane is the vertical plane which divides the space around the listener into left and right parts. The angle between the median plane and the sound source defines the cone of confusion, which is a set of points that all satisfy the following condition: the difference in distance from both ears to any point on the cone is constant, as shown in Figure 5.1. The figure also shows the angular coordinate system used in this chapter: the azimuth angle θ increases clockwise and is zero in front of the listener, and the elevation angle ϕ is the angle between the horizontal plane and the sound source direction, positive above the horizontal plane.

Figure 5.1 The azimuth-elevation coordinate system and cone of confusion.
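The defining property of the cone of confusion can be checked numerically. The sketch below is a simplified illustration that treats the ears as two points on the left–right axis and ignores the head itself; the ear offset, the axis orientation (x front, y right, z up), and the function names are assumptions.

```python
import math

EAR_OFFSET_M = 0.0875  # assumed distance from head centre to each ear


def source_position(azimuth_deg, elevation_deg, distance_m):
    """Cartesian position (x front, y right, z up) of a source.

    Azimuth is clockwise seen from above and zero in front; elevation is
    positive above the horizontal plane, as in the chapter's convention.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        distance_m * math.cos(el) * math.cos(az),
        distance_m * math.cos(el) * math.sin(az),
        distance_m * math.sin(el),
    )


def path_difference(azimuth_deg, elevation_deg, distance_m=2.0):
    """Distance to the left ear minus distance to the right ear, in metres."""
    source = source_position(azimuth_deg, elevation_deg, distance_m)
    left_ear = (0.0, -EAR_OFFSET_M, 0.0)
    right_ear = (0.0, EAR_OFFSET_M, 0.0)
    return math.dist(source, left_ear) - math.dist(source, right_ear)


# Three directions sharing the same angle to the median plane:
# front right, back right, and up to the right.
front = path_difference(30.0, 0.0)
back = path_difference(150.0, 0.0)
above = path_difference(90.0, 60.0)
```

All three values agree to within rounding error, which is why ITD and ILD alone cannot separate these directions.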


The information on the cone of confusion provided by the ITD and ILD is only an intermediate phase in the localization process. Two mechanisms are known to refine the perceived direction. One is related to monaural spectral cues, and the other is the monitoring of the effect of head rotation on binaural cues.

The monaural spectral cues are caused by the pinna of the listener, which filters the sound depending on the direction of arrival. For example, the concha, which is the cavity just around the ear canal opening, is known to have a direction-dependent resonance around 5–6 kHz [Bla97]. This effect, together with other direction-dependent filtering by the pinna and torso of the listener, introduces spectral changes into the sound signal entering the ear canal at frequencies above 1–2 kHz. This provides information on the direction within the cone of confusion obtained from the ITD and ILD cues. Note that this mechanism is thus dependent on the spectrum of the signal, and a sufficiently broad and locally smooth spectrum is needed to decode the direction from the monaural spectrum. If the signal has too narrow a bandwidth, monaural spectral cues cannot be decoded. For example, some birds have a narrow bandwidth in their calls, and localizing them by hearing alone is relatively hard.

The effect of head movements on binaural cues, and how humans use this information in sound source localization [Bla97, GA97], are now discussed. For example, when a source is in front of the listener, and the listener rotates his head to the right, the left ear becomes closer to the source, and the ITD and ILD cues change favoring the left ear. If the source is behind the listener, the cues would change favoring the right ear. This dynamic cue gives information on the source direction. Humans seem to use the information in a relatively coarse manner, such as whether the source is in front of, behind, or above the listener. However, it is a very strong cue. A simple and very effective spatial effect can be composed by switching the ear canal signals of the listener in dynamic conditions either with tubes or with microphones and loudspeakers. In this device, the sound signal captured on one side of the head of the listener is delivered to the ear on the other side. When wearing such a device, a striking directional effect is obtained, where the voice of a visible speaker in front is perceived at the back of the listener.
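The front–back example above can be reproduced with a few lines of trigonometry. This is a simplified sketch for sources in the horizontal plane only; it uses the sine of the head-relative azimuth as a stand-in for the sign of the ITD and ILD, and the function name `lateral_component` is an assumption.

```python
import math


def lateral_component(source_azimuth_deg, head_rotation_deg):
    """Sine of the source azimuth relative to the rotated head.

    Azimuths and rotations are clockwise seen from above (positive = right).
    A positive result means the source is on the listener's right, so the
    ITD and ILD favor the right ear; a negative result favors the left ear.
    """
    relative = math.radians(source_azimuth_deg - head_rotation_deg)
    return math.sin(relative)


rotation = 20.0  # the listener turns the head 20 degrees to the right
front_cue = lateral_component(0.0, rotation)   # source straight ahead
back_cue = lateral_component(180.0, rotation)  # source straight behind
```

Before the rotation both sources give a lateral component of zero, so they are indistinguishable by binaural cues; after it, the front source favors the left ear and the back source the right ear.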

5.2.3 Perception of the Spatial Extent of the Sound Source

It is also possible to perceive the extent of a sound source in some cases. For example, the sea shore and a grand piano can be perceived to have a substantial width using audition only. Unfortunately, the knowledge of the corresponding perceptual phenomena and mechanisms is relatively sparse. A basic result is that point-like broadband sound sources are perceived to be point-like, and when incoherent broadband sound arrives evenly from multiple directions, it is perceived to surround the listener [Bla97]. In these cases, the perception corresponds well to the physical situation. When the frequency content is narrower, or the duration of the stimulus is short, the sources are perceived to be narrower than in reality [PB82, CT03, Hir07, HP08]. When the frequency bands of a broadband sound signal are presented using loudspeakers in different directions, the listener perceives the source to be wide, though not as wide as the loudspeaker ensemble [Hir07].

5.2.4 Room Effect

So far we have discussed only the direct sound coming from the source to the listener. In real rooms and in many outdoor spaces there exist reflections and reverberation, which do not carry information on the direction of the sound. A mechanism has evolved which helps to localize sources in such environments. The precedence effect [Bla97, Zur87, LCYG99] is a suppression of early delayed versions of the direct sound in source direction perception. It has been studied extensively in classical experiments, where a direct sound and a delayed sound are presented to a listener in anechoic conditions with two loudspeakers. When the delay is about 0–3 ms, no echo is perceived, and the perceived direction depends on the amplitude relationship and on the delay between the loudspeakers. The perceived direction may also depend on the frequency content of the sound. When the delay is about 5–30 ms, the presence of the lagging sound may be perceived, but it is not localized correctly. With larger delays, the delayed loudspeaker starts to be localizable. The effect depends on the signal: in principle, the more transient-like the signal, the more salient the precedence effect. The precedence effect manifests itself in the Franssen effect, where the rapid onset of a sinusoid with a slow fade-out in one loudspeaker is interleaved with a slow fade-in of the same sinusoid in another loudspeaker. The listener does not perceive that the second loudspeaker is emitting sound, but erroneously perceives that the first loudspeaker is still active [Bla97].
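A Franssen-type stimulus of the kind described above can be generated as two complementary envelopes applied to the same sinusoid. This is a minimal sketch; the frequency, sample rate, durations, linear ramp shape, and the function name `franssen_stimulus` are all assumptions chosen for illustration, not values taken from the cited studies.

```python
import math


def franssen_stimulus(freq_hz=500.0, sample_rate=8000, duration_s=1.0,
                      crossfade_s=0.5):
    """Return (speaker_1, speaker_2) sample lists for a Franssen-type stimulus.

    Speaker 1 starts abruptly at full level and fades out linearly;
    speaker 2 fades in with the complementary ramp, so the two envelopes
    always sum to one.
    """
    n_total = int(duration_s * sample_rate)
    n_fade = int(crossfade_s * sample_rate)
    speaker_1, speaker_2 = [], []
    for n in range(n_total):
        fade_in = min(n / n_fade, 1.0)  # rises from 0 to 1 over the crossfade
        carrier = math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        speaker_1.append((1.0 - fade_in) * carrier)
        speaker_2.append(fade_in * carrier)
    return speaker_1, speaker_2


first, second = franssen_stimulus()
```

After the crossfade only the second loudspeaker signal is non-zero, which is the condition under which listeners are reported to still hear the first loudspeaker.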

Humans can also perceive the effect of the room in some manner. Indeed, in real life, a free-field condition very seldom occurs and sound always contains some reverberation, composed of reflections from surfaces. Humans can estimate the size of a room and even surface materials by listening to sounds. The perception relies on the density of the reflections and the length of the reverberation. Consider a concert hall and a bathroom, both of which can have a reverberation time of 2 to 3 seconds, i.e., the sound is audible 2 to 3 seconds after the source has stopped emitting sound. The density of reflections, as well as their frequency characteristics, modifies the sound color, based on which humans can tell the size of the space, even though the reverberation time in both cases is the same. The shape of the space can also be perceived to some extent, at least if it is a long narrow corridor or a big concert hall.

5.2.5 Perception of Distance

Humans also perceive the distance of sound sources to some extent [Bla97]. There are some main cues used for this. The perceived loudness created by a sound source has been shown to affect the perceived distance: the softer the auditory event, the farther away it is perceived. However, the signal has to be somewhat known by the listener, and the cue is effective only for sources at distances of about 1 m–10 m.

Listeners use the acoustical room effect caused by the source in the perception of distance: the more the room effect is present in the ear canals of the listener, the farther away the source is perceived. This is quantified with the direct-to-reverberant ratio (DRR) of sound energies expressed in decibels. Besides the DRR, the room also has another well-known effect on perceived distance. If the impulse response of the room contains no strong early reflections, the source is perceived to be relatively near. This is utilized in studio reverberators with a predelay parameter, which controls the delay between the direct sound and the reverb tail. If the value of the predelay is long enough, the source is perceived at the distance of the loudspeakers, and if it is very short, then the source is perceived to be farther away.
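The DRR can be computed from a room impulse response by splitting it into a direct part and a reverberant part and comparing their energies. The sketch below assumes a simple sample-count boundary between the two parts; the parameter `direct_samples`, the function name, and the toy impulse responses are hypothetical.

```python
import math


def direct_to_reverberant_ratio_db(impulse_response, direct_samples):
    """DRR in decibels: direct-sound energy over the energy of everything after.

    `direct_samples` is the number of leading samples counted as direct sound
    (a hypothetical parameter; where to place this boundary is a modelling choice).
    """
    direct = sum(h * h for h in impulse_response[:direct_samples])
    reverberant = sum(h * h for h in impulse_response[direct_samples:])
    if direct <= 0.0 or reverberant <= 0.0:
        raise ValueError("both parts must contain energy")
    return 10.0 * math.log10(direct / reverberant)


# Toy impulse responses (assumed values): same reflections, different direct level.
near_source = [1.0, 0.0, 0.5, 0.5]
far_source = [0.25, 0.0, 0.5, 0.5]

drr_near = direct_to_reverberant_ratio_db(near_source, 1)  # about 3.0 dB
drr_far = direct_to_reverberant_ratio_db(far_source, 1)    # about -9.0 dB
```

The lower DRR of the second response corresponds to the case in which the room effect dominates and the source would be perceived as farther away.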

When the source is very near to the listener, there are also some binaural effects which are used in distance perception [DM98]. With close sources the magnitude of the ILD is higher and appears at lower frequencies than for a far source in the same direction, which creates the perception of a nearby source.
