In conventional binaural rendering systems, a sound source $i$ with associated time-domain signal $x_i(t)$ is rendered at a certain position by convolving the signal with a pair of head-related impulse responses $h_{l,i}(t)$, $h_{r,i}(t)$ for the left and right ears, respectively, resulting in binaural signals $y_{l,i}(t)$, $y_{r,i}(t)$:
$$y_{v,i}(t) = (h_{v,i} \ast x_i)(t),$$
with $v \in \{l, r\}$. This process is visualized in the left panel of Figure 8.1.
It is often convenient to express the convolution in the frequency domain using a frequency-domain representation $X_i(f)$ of a short segment of $x_i(t)$:
$$Y_{v,i}(f) = H_{v,i}(f)\,X_i(f),$$
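with $H_{l,i}(f)$, $H_{r,i}(f)$ the frequency-domain representations (head-related transfer functions, HRTFs) of $h_{l,i}(t)$, $h_{r,i}(t)$, respectively. The power $p_{y_v,i}$ at the eardrum resulting from signal $y_{v,i}$ in frequency band $b$ is given by:
$$p_{y_v,i}(b) = \sum_{f=f(b)}^{f(b+1)-1} Y_{v,i}(f)\,Y_{v,i}^{*}(f),$$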
with $(^{*})$ the complex conjugation operator, and $f(b)$ the lower edge frequency of frequency band $b$. For clarity and readability, the subscript $b$ will not be given in the following equations; all described processing should nevertheless be performed in each parameter band individually. If the HRTF magnitude spectra $|H_{v,i}(f)|$ are locally stationary (i.e. constant within the frequency band $b$), this can be simplified to:
$$p_{y_v,i} = p_{h_v,i}\,p_{x_i},$$
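with $p_{h_v,i}$ the power within frequency band $b$ of HRTF $H_{v,i}$:
$$p_{h_v,i} = \frac{1}{f(b+1)-f(b)} \sum_{f=f(b)}^{f(b+1)-1} H_{v,i}(f)\,H_{v,i}^{*}(f),$$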
and $p_{x_i}$ the power of the source signal $X_i(f)$ in frequency band $b$:
$$p_{x_i} = \sum_{f=f(b)}^{f(b+1)-1} X_i(f)\,X_i^{*}(f).$$
Thus, given the local stationarity constraint, the power at the level of the eardrums follows from a simple multiplication of the power of the sound source and the power of the HRTF in corresponding frequency bands. In other words, statistical properties of binaural signals can be deduced from statistical properties of the source signal and from the HRTFs. This parameter-based approach is visualized in the right panel of Figure 8.1.
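The multiplication property can be checked numerically for a single parameter band. The following is a minimal sketch, not part of the original text: it assumes that the HRTF band power is taken as the mean of $|H|^2$ over the bins of the band and the source band power as the sum of $|X|^2$, and the bin count, seed, and helper names are hypothetical choices made only for illustration.

```python
import cmath
import random


def band_power(spectrum):
    """Sum of |S(f)|^2 over the bins of one parameter band."""
    return sum(abs(s) ** 2 for s in spectrum)


def mean_band_power(spectrum):
    """Mean of |S(f)|^2 over the bins of one parameter band (assumed HRTF power)."""
    return band_power(spectrum) / len(spectrum)


random.seed(0)
n_bins = 16  # hypothetical number of frequency bins in band b

# Source spectrum X_i(f): arbitrary complex values.
X = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_bins)]

# HRTF H_{v,i}(f): constant magnitude 0.5 within the band, arbitrary phase.
H = [cmath.rect(0.5, random.uniform(-cmath.pi, cmath.pi)) for _ in range(n_bins)]

# Exact eardrum power: sum over the band of Y(f) Y*(f), with Y = H X.
p_y_exact = band_power([h * x for h, x in zip(H, X)])

# Parametric estimate: HRTF band power times source band power.
p_y_param = mean_band_power(H) * band_power(X)
```

With a constant HRTF magnitude the two values agree up to rounding error; if the magnitude varies within the band, the product is only an approximation.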
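The inter-aural phase difference (IPD) $\phi$ in parameter band $b$ is given by the phase difference $\phi_{y_{l,i} y_{r,i}}$ between the signals $y_{l,i}$ and $y_{r,i}$ in parameter band $b$:
$$\phi = \phi_{y_{l,i} y_{r,i}} = \angle\left(\sum_{f=f(b)}^{f(b+1)-1} Y_{l,i}(f)\,Y_{r,i}^{*}(f)\right).$$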
Under the assumption of local stationarity of inter-aural HRTF phase spectra, the IPD can be derived directly from the HRTF spectra themselves, without involvement of the sound source signal:
$$\phi = \phi_{h_{l,i} h_{r,i}},$$
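with $\phi_{h_{l,i} h_{r,i}}$ the average phase angle of the HRTF pair corresponding to position $i$ and frequency band $b$:
$$\phi_{h_{l,i} h_{r,i}} = \angle\left(\sum_{f=f(b)}^{f(b+1)-1} H_{l,i}(f)\,H_{r,i}^{*}(f)\right).$$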
The equations above assume local stationarity of HRTF magnitude and inter-aural phase spectra to estimate the resulting binaural parameters. However, strong deviations from stationarity within analysis bands may result in a decrease in the inter-aural coherence (IC) for certain frequency bands, which can be perceived as a change in the spatial ‘compactness’ of a virtual sound source. To capture this property, the IC is estimated for each frequency band $b$. In the current context, the coherence $c$ is defined as the absolute value of the average normalized cross-spectrum:
$$c = \frac{\left|\sum_{f=f(b)}^{f(b+1)-1} Y_{l,i}(f)\,Y_{r,i}^{*}(f)\right|}{\sqrt{p_{y_l,i}\,p_{y_r,i}}}.$$
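The IC parameter depends on the source signal $x_i$. For broadband signals, however, its expected value depends only on the HRTFs:
$$\mathrm{E}\{c\} = c_{h_{l,i} h_{r,i}},$$
with
$$c_{h_{l,i} h_{r,i}} = \frac{\left|\sum_{f=f(b)}^{f(b+1)-1} H_{l,i}(f)\,H_{r,i}^{*}(f)\right|}{\left(f(b+1)-f(b)\right)\sqrt{p_{h_l,i}\,p_{h_r,i}}}.$$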
In summary, under the local stationarity constraint, the binaural parameters $p_{y_l}$, $p_{y_r}$, IPD and IC resulting from a single sound source can be estimated from the sound source parameter $p_{x_i}$ and the HRTF parameters $p_{h_l,i}$, $p_{h_r,i}$, $\phi_{h_{l,i} h_{r,i}}$ and $c_{h_{l,i} h_{r,i}}$.
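The single-source estimation summarized above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the HRTF band power is assumed to be the mean of $|H|^2$ over the band, and the coherence is assumed to be the magnitude of the summed cross-spectrum normalized by the bin count and the two band powers.

```python
import cmath
import math


def hrtf_parameters(H_l, H_r):
    """Return (p_hl, p_hr, phi_h, c_h) for one HRTF pair in one band."""
    n = len(H_l)
    p_hl = sum(abs(h) ** 2 for h in H_l) / n   # mean power, left ear
    p_hr = sum(abs(h) ** 2 for h in H_r) / n   # mean power, right ear
    cross = sum(a * b.conjugate() for a, b in zip(H_l, H_r))
    phi_h = cmath.phase(cross)                 # average inter-aural phase
    c_h = abs(cross) / (n * math.sqrt(p_hl * p_hr))
    return p_hl, p_hr, phi_h, c_h


def binaural_parameters(p_x, H_l, H_r):
    """Estimate (p_yl, p_yr, IPD, IC) from source band power and an HRTF pair."""
    p_hl, p_hr, phi_h, c_h = hrtf_parameters(H_l, H_r)
    return p_hl * p_x, p_hr * p_x, phi_h, c_h


# Hypothetical HRTF pair: the right ear is 6 dB lower and lags by pi/4
# in every bin, so level and phase differences are stationary in the band.
H_l = [cmath.rect(1.0, 0.1 * k) for k in range(8)]
H_r = [0.5 * h * cmath.exp(-1j * math.pi / 4) for h in H_l]

p_yl, p_yr, ipd, ic = binaural_parameters(2.0, H_l, H_r)
```

Because the inter-aural differences are constant across the band in this example, the estimated coherence is 1 and the IPD equals the imposed phase offset; a phase difference that varies within the band would lower the coherence.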
For multiple simultaneous sound sources, conventional methods convolve each individual source signal $x_i(t)$ with an HRTF pair corresponding to the desired position, followed by summation:
$$y_v(t) = \sum_i (h_{v,i} \ast x_i)(t).$$
Under the constraint of independent sound source signals $x_i(t)$, the power at the eardrums in frequency band $b$ is given by the sum of the powers of each individual virtual sound source:
$$p_{y_v} = \sum_i p_{y_v,i},$$
which can be written for stationary HRTF properties as:
$$p_{y_v} = \sum_i p_{h_v,i}\,p_{x_i}.$$
The net IPD $\phi$ resulting from the simultaneous virtual sound sources $i$ is given by:
$$\phi = \angle\left(\sum_i \sum_{f=f(b)}^{f(b+1)-1} Y_{l,i}(f)\,Y_{r,i}^{*}(f)\right).$$
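This formulation can also be written in terms of parameters:
$$\phi = \angle\left(\sum_i e^{j\phi_{h_{l,i} h_{r,i}}}\, c_{h_{l,i} h_{r,i}}\, \sqrt{p_{h_l,i}\,p_{h_r,i}}\; p_{x_i}\right).$$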
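The IC can be estimated similarly:
$$c = \frac{\left|\sum_i e^{j\phi_{h_{l,i} h_{r,i}}}\, c_{h_{l,i} h_{r,i}}\, \sqrt{p_{h_l,i}\,p_{h_r,i}}\; p_{x_i}\right|}{\sqrt{p_{y_l}\,p_{y_r}}}.$$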
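For independent sources, the parametric combination can be sketched as follows. The sketch assumes that each source contributes a complex cross-spectrum term $e^{j\phi_h} c_h \sqrt{p_{h_l} p_{h_r}}\, p_x$, that the net IPD is the angle of the summed terms, and that the net IC is their magnitude normalized by the summed ear powers; the function name and the example values are hypothetical.

```python
import cmath
import math


def combine_independent_sources(sources):
    """Combine per-source parameters into net (p_yl, p_yr, IPD, IC).

    Each source is a tuple (p_x, p_hl, p_hr, phi_h, c_h) for one band.
    Sources are assumed mutually independent, so powers simply add.
    """
    p_yl = sum(p_x * p_hl for p_x, p_hl, _, _, _ in sources)
    p_yr = sum(p_x * p_hr for p_x, _, p_hr, _, _ in sources)
    # Sum of per-source cross-spectrum terms (assumed parametric form).
    chi = sum(
        cmath.exp(1j * phi_h) * c_h * math.sqrt(p_hl * p_hr) * p_x
        for p_x, p_hl, p_hr, phi_h, c_h in sources
    )
    return p_yl, p_yr, cmath.phase(chi), abs(chi) / math.sqrt(p_yl * p_yr)


# Two hypothetical sources placed symmetrically left and right of the listener.
left_source = (1.0, 1.0, 0.25, 0.5, 1.0)
right_source = (1.0, 0.25, 1.0, -0.5, 1.0)

p_yl, p_yr, ipd, ic = combine_independent_sources([left_source, right_source])
```

In this symmetric example the net IPD is zero and the IC drops below 1 even though each HRTF pair is fully coherent on its own, which reflects the loss of coherence caused by independent sources at different positions.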
The assumption of independent signals across the various objects may hold for many applications, especially if each signal is associated with an independent sound source. For some applications, however, the signals may share common components. For example, if a virtual multi-channel audio setup is simulated, the signals radiated by the virtual loudspeakers may exhibit significant mutual correlation. These correlations then have to be taken into account in the binaural parameter estimation process. The inter-channel coherence (ICC) for band $b$ between sound sources $i_1$ and $i_2$ is denoted by $c_{x_{i_1},x_{i_2}}$. In that case, the binaural parameters are estimated according to:
with
In a similar way, the IPD and IC are given by:
with
with
In these equations, the IPD $\phi$ of each sound source is assumed to be distributed symmetrically across the two binaural signals (i.e. $\phi/2$ is the phase offset that is applied to the left-ear signal, and $-\phi/2$ is the phase offset of the right-ear signal). As can be observed, these equations are equivalent to those given in Section 8.3.2 for $c_{x_{i_1},x_{i_2}} = 0$ if $i_1 \neq i_2$.
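The effect of mutually correlated sources on the ear power can be illustrated with a small sketch. It is a derivation under explicit assumptions and not necessarily the exact form used in the text: the source coherences are taken as real-valued, the HRTF coherence is ignored, and each source reaches the ear with phase offset $\phi/2$, so that a pair of sources adds a cross term $c_{x_{i_1},x_{i_2}} \sqrt{p_{h,i_1} p_{x_{i_1}} p_{h,i_2} p_{x_{i_2}}} \cos\left((\phi_{i_1} - \phi_{i_2})/2\right)$. The function name and example values are hypothetical.

```python
import math


def ear_power(sources, icc):
    """Power at one ear for possibly correlated sources in one band.

    sources: list of (p_x, p_h, phi) with source power, HRTF power for
             this ear, and the per-source IPD (applied as phi / 2).
    icc:     matrix with icc[i1][i2] the coherence between source signals.
    """
    total = sum(p_x * p_h for p_x, p_h, _ in sources)
    for i1, (px1, ph1, phi1) in enumerate(sources):
        for i2, (px2, ph2, phi2) in enumerate(sources):
            if i1 == i2:
                continue
            amplitude = math.sqrt(px1 * ph1 * px2 * ph2)
            total += icc[i1][i2] * amplitude * math.cos((phi1 - phi2) / 2.0)
    return total


# Two hypothetical sources with equal IPD, so their contributions are in phase.
sources = [(1.0, 1.0, 0.6), (4.0, 0.25, 0.6)]
independent = [[1.0, 0.0], [0.0, 1.0]]
identical = [[1.0, 1.0], [1.0, 1.0]]

p_independent = ear_power(sources, independent)
p_identical = ear_power(sources, identical)
```

With zero mutual coherence the result reduces to the plain sum of the per-source powers, matching the independent-source case; with fully coherent, in-phase sources the amplitudes add instead, doubling the power in this example.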
If the decrease in coherence due to HRTF convolution using different impulse responses for both ears is ignored (i.e. $c_{h_l h_r} = 1$), and hence it is assumed that the coherence of the binaural signal pair $Y_l$, $Y_r$ is dominated by the fact that (partially) incoherent sources have different spatial positions, the estimation process simplifies to: