10.3 Spatial decomposition of stereo signals


Stereo signals are recorded or mixed such that, for each source, the signal goes coherently into the left and right channels with specific directional cues (level difference, time difference), while reflected or reverberated independent signals go into the channels and determine the auditory object width and listener envelopment cues. This motivates modeling a single-source stereo signal as illustrated in Figure 10.1, where the signal s mimics the direct sound from a direction determined by the factor a. The independent signals n₁ and n₂ correspond to the lateral reflections. These signals are assumed to be related to the stereo signal pair x₁, x₂ by

\[
x_1 = s + n_1, \qquad x_2 = a\,s + n_2 .
\]
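The mixing model can be tried out on synthetic data. The following is a minimal sketch, assuming the additive model described above (s enters the second channel scaled by a) and using Gaussian noise as a stand-in for both the direct and the lateral sound. The names `mix_stereo` and `mean_power`, the seed, and the signal levels are hypothetical choices for illustration only.

```python
import math
import random


def mix_stereo(s, n1, n2, a):
    """Single-source stereo model: x1 = s + n1, x2 = a * s + n2."""
    if not (len(s) == len(n1) == len(n2)):
        raise ValueError("s, n1 and n2 must have the same length")
    x1 = [s_k + n_k for s_k, n_k in zip(s, n1)]
    x2 = [a * s_k + n_k for s_k, n_k in zip(s, n2)]
    return x1, x2


def mean_power(x):
    """Average power of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)


# Hypothetical test signals: unit-power direct sound, weaker lateral sound.
rng = random.Random(0)          # fixed seed so the run is repeatable
num_samples = 20000
a = 2.0                         # a > 1 makes the direct sound louder in x2
s = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]   # p_s is about 1
n1 = [rng.gauss(0.0, 0.5) for _ in range(num_samples)]  # p_n is about 0.25
n2 = [rng.gauss(0.0, 0.5) for _ in range(num_samples)]

x1, x2 = mix_stereo(s, n1, n2, a)
level_difference_db = 10.0 * math.log10(mean_power(x2) / mean_power(x1))
print(f"p_x1 = {mean_power(x1):.3f}, p_x2 = {mean_power(x2):.3f}, "
      f"level difference = {level_difference_db:.2f} dB")
```

Because the three components are generated independently, the measured channel powers should approach p_s + p_n = 1.25 and a² p_s + p_n = 4.25 as the number of samples grows.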

To obtain a decomposition that is effective not only for a single auditory object but also in nonstationary scenarios with multiple concurrently active sources, the decomposition is carried out independently in a number of frequency bands and adaptively in time:

\[
x_{1,m}(k) = s_m(k) + n_{1,m}(k), \qquad x_{2,m}(k) = A_b\, s_m(k) + n_{2,m}(k),
\]

where m is the sub-band index, k is the time index, and A_b is the amplitude factor for signal s_m in parameter band b, which may comprise one or more sub-bands (see also Chapter 5 for more details on sub-bands and parameter bands). The decomposition into separate time–frequency tiles is illustrated in Figure 10.2: in each time–frequency tile with indices m and k, the signals s_m, n_1,m, n_2,m, and the factor A_b are estimated independently. For brevity of notation, the sub-band and time indices are often omitted in the following. As in BCC or MPEG Surround, a perceptually motivated sub-band decomposition is used. This decomposition may be based on the fast Fourier transform (FFT), a quadrature mirror filterbank (QMF), or another filterbank. For each parameter band, the signals s_m, n_1,m, n_2,m, and the factor A_b are estimated from segments with a length of approximately 20 ms.

Figure 10.1 Mixing a stereo signal mimicking direct sound s and lateral reflections n₁ and n₂. The factor a determines the direction at which the auditory object appears.

Figure 10.2 Each left and right time–frequency tile of the stereo signal, x₁ and x₂, is decomposed into three signals, s, n₁, and n₂, and a factor A.

Given the stereo sub-band signal pair x_1,m and x_2,m, the goal is to estimate s_m, A_b, n_1,m, and n_2,m in each parameter band. This is done by analyzing the powers and the cross-correlation of the stereo signal pair. A short-time estimate of the power of x_1,m in parameter band b is denoted p_x1,b and is obtained as outlined in Chapter 6, Section 6.3.4. The powers of n_1,m and n_2,m in each parameter band are assumed to be the same, i.e. the amount of lateral independent sound is assumed to be the same for the left and right signals:

\[
p_{n_1,b} = p_{n_2,b} = p_{n,b} .
\]
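The short-time statistics that drive the decomposition can be sketched as follows. This is only an illustration: the estimator of Section 6.3.4 is not reproduced here, so the sketch assumes a plain average over the samples of one frame in one parameter band and takes the real part of the complex cross-term as the normalized cross-correlation. The function name `band_statistics` is hypothetical.

```python
import math


def band_statistics(x1, x2):
    """Return (p_x1, p_x2, rho) for one frame of one parameter band.

    x1, x2: equal-length sequences of real or complex sub-band samples.
    Assumption: plain frame averages stand in for the short-time estimates.
    """
    if len(x1) != len(x2) or len(x1) == 0:
        raise ValueError("x1 and x2 must be non-empty and of equal length")
    count = len(x1)
    p_x1 = sum(abs(v) ** 2 for v in x1) / count
    p_x2 = sum(abs(v) ** 2 for v in x2) / count
    # Cross-term with conjugation, so complex sub-band samples are handled.
    cross = sum(u * v.conjugate() for u, v in zip(x1, x2)) / count
    if p_x1 <= 0.0 or p_x2 <= 0.0:
        return p_x1, p_x2, 0.0  # assumption: report zero correlation for silence
    rho = cross.real / math.sqrt(p_x1 * p_x2)
    return p_x1, p_x2, rho


# Identical waveforms at different levels are fully correlated.
print(band_statistics([1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]))
```

In the example call the second channel is a scaled copy of the first, so the powers are 1 and 4 and the normalized cross-correlation is 1.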

10.3.1 Estimating p_s,b, A_b and p_n,b

Given the sub-band representation of the stereo signal, the powers p_x1,b and p_x2,b and the normalized cross-correlation ρ_x1x2,b are computed for each parameter band b (see also Section 6.3.4). A_b, p_s,b, and p_n,b are then estimated as functions of p_x1,b, p_x2,b, and ρ_x1x2,b. With s_m, n_1,m, and n_2,m mutually uncorrelated, three equations relate the known and unknown variables:

\[
p_{x_1,b} = p_{s,b} + p_{n,b}, \qquad
p_{x_2,b} = A_b^2\, p_{s,b} + p_{n,b}, \qquad
\rho_{x_1x_2,b} = \frac{A_b\, p_{s,b}}{\sqrt{p_{x_1,b}\, p_{x_2,b}}} .
\]

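These equations, solved for A_b, p_s,b, and p_n,b, yield

\[
A_b = \frac{B_b}{2 C_b}, \qquad
p_{s,b} = \frac{2 C_b^2}{B_b}, \qquad
p_{n,b} = p_{x_1,b} - \frac{2 C_b^2}{B_b},
\]

with

\[
B_b = p_{x_2,b} - p_{x_1,b} + \sqrt{\left(p_{x_1,b} - p_{x_2,b}\right)^2 + 4\, p_{x_1,b}\, p_{x_2,b}\, \rho_{x_1x_2,b}^2}, \qquad
C_b = \rho_{x_1x_2,b} \sqrt{p_{x_1,b}\, p_{x_2,b}} .
\]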
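As a hedged illustration of this estimation step, the sketch below assumes the model x1 = s + n1, x2 = A s + n2 with mutually uncorrelated components and equal lateral power in both channels. Under that assumption p_x1 = p_s + p_n, p_x2 = A² p_s + p_n and ρ √(p_x1 p_x2) = A p_s, which leads to a quadratic in A whose root is evaluated here. The function name `estimate_band_parameters` and the handling of the degenerate case ρ = 0 are assumptions, not part of the text.

```python
import math


def estimate_band_parameters(p_x1, p_x2, rho):
    """Return (A, p_s, p_n) for one parameter band.

    Closed form: A = B / (2C), p_s = 2C^2 / B, p_n = p_x1 - 2C^2 / B, with
    B = p_x2 - p_x1 + sqrt((p_x1 - p_x2)^2 + 4 p_x1 p_x2 rho^2) and
    C = rho * sqrt(p_x1 * p_x2).
    """
    if p_x1 <= 0.0 or p_x2 <= 0.0:
        raise ValueError("channel powers must be positive")
    c = rho * math.sqrt(p_x1 * p_x2)
    if c == 0.0:
        # Assumption: without any correlated part no direction can be
        # assigned, so this sketch refuses to return an estimate.
        raise ValueError("rho must be non-zero")
    b = p_x2 - p_x1 + math.sqrt((p_x1 - p_x2) ** 2 + 4.0 * c * c)
    a = b / (2.0 * c)
    p_s = 2.0 * c * c / b
    p_n = p_x1 - p_s
    return a, p_s, p_n


# Round trip: build the statistics from known parameters, then recover them.
a_true, p_s_true, p_n_true = 2.0, 1.0, 0.25
p_x1 = p_s_true + p_n_true
p_x2 = a_true ** 2 * p_s_true + p_n_true
rho = a_true * p_s_true / math.sqrt(p_x1 * p_x2)
print(estimate_band_parameters(p_x1, p_x2, rho))
```

The round trip recovers A = 2, p_s = 1 and p_n = 0.25 up to floating-point rounding, which is a quick consistency check of the closed form against the three relations it was derived from.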

10.3.2 Least-squares estimation of s_m, n_1,m and n_2,m

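Next, the least-squares estimates of s_m, n_1,m, and n_2,m are computed as functions of A_b, p_s,b, and p_n,b. For each parameter band b and each independent signal frame, the signal s_m is estimated as

\[
\hat{s}_m = w_{1,b}\, x_{1,m} + w_{2,b}\, x_{2,m},
\]

where w_1,b and w_2,b are real-valued weights. The estimation error is

\[
e_m = s_m - \hat{s}_m = \left(1 - w_{1,b} - w_{2,b} A_b\right) s_m - w_{1,b}\, n_{1,m} - w_{2,b}\, n_{2,m} .
\]

The weights w_1,b and w_2,b are optimal in a least mean-square sense when the error signal e_m is orthogonal to x_1,m and x_2,m in parameter band b [117], i.e.

\[
\mathrm{E}\{e_m\, x_{1,m}^{*}\} = 0, \qquad \mathrm{E}\{e_m\, x_{2,m}^{*}\} = 0,
\]

where E{·} denotes the short-time expectation and * complex conjugation. This yields two equations:

\[
\left(1 - w_{1,b} - w_{2,b} A_b\right) p_{s,b} - w_{1,b}\, p_{n,b} = 0, \qquad
A_b \left(1 - w_{1,b} - w_{2,b} A_b\right) p_{s,b} - w_{2,b}\, p_{n,b} = 0,
\]

from which the weights are computed:

\[
w_{1,b} = \frac{p_{s,b}\, p_{n,b}}{\left(A_b^2 + 1\right) p_{s,b}\, p_{n,b} + p_{n,b}^2}, \qquad
w_{2,b} = \frac{A_b\, p_{s,b}\, p_{n,b}}{\left(A_b^2 + 1\right) p_{s,b}\, p_{n,b} + p_{n,b}^2} .
\]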

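The signals n_1,m and n_2,m are estimated in the same way. The estimate of n_1,m is

\[
\hat{n}_{1,m} = w_{3,b}\, x_{1,m} + w_{4,b}\, x_{2,m} .
\]

The estimation error is

\[
e_m = n_{1,m} - \hat{n}_{1,m} = \left(-w_{3,b} - w_{4,b} A_b\right) s_m + \left(1 - w_{3,b}\right) n_{1,m} - w_{4,b}\, n_{2,m} .
\]

Again, the weights are computed such that the estimation error is orthogonal to x_1,m and x_2,m, resulting in

\[
w_{3,b} = \frac{A_b^2\, p_{s,b}\, p_{n,b} + p_{n,b}^2}{\left(A_b^2 + 1\right) p_{s,b}\, p_{n,b} + p_{n,b}^2}, \qquad
w_{4,b} = \frac{-A_b\, p_{s,b}\, p_{n,b}}{\left(A_b^2 + 1\right) p_{s,b}\, p_{n,b} + p_{n,b}^2} .
\]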

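The weights for computing the least-squares estimate of n_2,m,

\[
\hat{n}_{2,m} = w_{5,b}\, x_{1,m} + w_{6,b}\, x_{2,m},
\]

are

\[
w_{5,b} = \frac{-A_b\, p_{s,b}\, p_{n,b}}{\left(A_b^2 + 1\right) p_{s,b}\, p_{n,b} + p_{n,b}^2}, \qquad
w_{6,b} = \frac{p_{s,b}\, p_{n,b} + p_{n,b}^2}{\left(A_b^2 + 1\right) p_{s,b}\, p_{n,b} + p_{n,b}^2} .
\]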
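The six weights can be computed together. The sketch below is an illustration under the model assumptions of this section (uncorrelated s, n1, n2 and equal lateral power p_n in both channels). It obtains each weight pair from the orthogonality conditions and cancels the common factor p_n from numerator and denominator, so the expressions stay finite when p_n = 0. The function name `least_squares_weights` is hypothetical.

```python
def least_squares_weights(a, p_s, p_n):
    """Return ((w1, w2), (w3, w4), (w5, w6)) for one parameter band.

    (w1, w2) estimate s, (w3, w4) estimate n1 and (w5, w6) estimate n2,
    each as a weighted sum w_odd * x1 + w_even * x2.
    """
    d = (a * a + 1.0) * p_s + p_n    # shared denominator after cancelling p_n
    if d <= 0.0:
        raise ValueError("p_s and p_n must not both be zero")
    w1 = p_s / d
    w2 = a * p_s / d
    w3 = (a * a * p_s + p_n) / d
    w4 = -a * p_s / d
    w5 = -a * p_s / d
    w6 = (p_s + p_n) / d
    return (w1, w2), (w3, w4), (w5, w6)


a, p_s, p_n = 2.0, 1.0, 0.25
(w1, w2), (w3, w4), (w5, w6) = least_squares_weights(a, p_s, p_n)

# Orthogonality of the error of the s estimate to x1 and to x2.
residual_x1 = (1.0 - w1 - w2 * a) * p_s - w1 * p_n
residual_x2 = a * (1.0 - w1 - w2 * a) * p_s - w2 * p_n
print(residual_x1, residual_x2)
```

Both residuals vanish up to rounding. A side effect of the closed forms is that w1 + w3 = 1, w2 + w4 = 0, A w1 + w5 = 0 and A w2 + w6 = 1, so before any post-scaling the estimates add back up to the input: ŝ + n̂₁ = x₁ and A ŝ + n̂₂ = x₂.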

10.3.3 Post-scaling

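Given the initial least-squares estimates ŝ_m, n̂_1,m, and n̂_2,m, post-scaling is applied so that the powers of the estimates in each parameter band equal p_s,b and p_n,b, respectively. The power of ŝ_m in parameter band b is

\[
p_{\hat{s},b} = \left(w_{1,b} + A_b\, w_{2,b}\right)^2 p_{s,b} + \left(w_{1,b}^2 + w_{2,b}^2\right) p_{n,b} .
\]

Thus, to obtain an estimate of s_m with power p_s,b, ŝ_m is scaled:

\[
\hat{s}'_m = \frac{\sqrt{p_{s,b}}}{\sqrt{\left(w_{1,b} + A_b\, w_{2,b}\right)^2 p_{s,b} + \left(w_{1,b}^2 + w_{2,b}^2\right) p_{n,b}}}\; \hat{s}_m . \tag{10.18}
\]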

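By similar reasoning, n̂_1,m and n̂_2,m are scaled, i.e.

\[
\hat{n}'_{1,m} = \frac{\sqrt{p_{n,b}}}{\sqrt{\left(w_{3,b} + A_b\, w_{4,b}\right)^2 p_{s,b} + \left(w_{3,b}^2 + w_{4,b}^2\right) p_{n,b}}}\; \hat{n}_{1,m}, \qquad
\hat{n}'_{2,m} = \frac{\sqrt{p_{n,b}}}{\sqrt{\left(w_{5,b} + A_b\, w_{6,b}\right)^2 p_{s,b} + \left(w_{5,b}^2 + w_{6,b}^2\right) p_{n,b}}}\; \hat{n}_{2,m} . \tag{10.19}
\]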
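The post-scaling gains follow directly from the power of a weighted sum w_a x1 + w_b x2 under the signal model, which is (w_a + A w_b)² p_s + (w_a² + w_b²) p_n. The sketch below is a hedged illustration: the function name `post_scaling_gains`, the zero-gain fallback for silent estimates, and the example numbers are assumptions. The weights in the example are written in a form with the common factor p_n cancelled.

```python
import math


def post_scaling_gains(a, p_s, p_n, w_s, w_n1, w_n2):
    """Return gains (g_s, g_n1, g_n2) that restore the target powers.

    w_s, w_n1 and w_n2 are the weight pairs (w1, w2), (w3, w4), (w5, w6).
    """
    def estimate_power(w_a, w_b):
        # Power of w_a * x1 + w_b * x2 for x1 = s + n1, x2 = a * s + n2.
        return (w_a + a * w_b) ** 2 * p_s + (w_a ** 2 + w_b ** 2) * p_n

    def gain(target_power, pair):
        power = estimate_power(*pair)
        # Assumption: a silent estimate is left silent instead of dividing by 0.
        return math.sqrt(target_power / power) if power > 0.0 else 0.0

    return gain(p_s, w_s), gain(p_n, w_n1), gain(p_n, w_n2)


# Example values (hypothetical): A = 2, p_s = 1, p_n = 0.25.
a, p_s, p_n = 2.0, 1.0, 0.25
d = (a * a + 1.0) * p_s + p_n
w_s = (p_s / d, a * p_s / d)
w_n1 = ((a * a * p_s + p_n) / d, -a * p_s / d)
w_n2 = (-a * p_s / d, (p_s + p_n) / d)

g_s, g_n1, g_n2 = post_scaling_gains(a, p_s, p_n, w_s, w_n1, w_n2)
print(g_s, g_n1, g_n2)
```

All three gains come out larger than one in this example, reflecting that a least-squares estimate carries less power than the signal it approximates. The weaker the component in the mix, the larger the gain.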

10.3.4 Numerical examples

The factor A_b (top panel), the ratio p_s,b/p_x1,b (middle panel), and the ratio A_b^2 p_s,b/p_x2,b (lower panel), all expressed in dB, are shown in Figure 10.3 as a function of the stereo level difference p_x2,b/p_x1,b (in dB) and the cross-correlation ρ_x1x2,b.

The weights w_1,b and w_2,b for computing the least-squares estimate of s_m are shown in the top two panels of Figure 10.4 as a function of the stereo signal level difference and ρ_x1x2,b. The post-scaling factor for ŝ_m (10.18) is shown in the bottom panel.

The weights w_3,b and w_4,b for computing the least-squares estimate of n_1,m and the corresponding post-scaling factor (10.19) are shown in Figure 10.5 as a function of the stereo signal level difference and ρ_x1x2,b.

The weights w_5,b and w_6,b for computing the least-squares estimate of n_2,m and the corresponding post-scaling factor (10.19) are shown in Figure 10.6 as a function of the stereo signal level difference and ρ_x1x2,b.

Figure 10.3 The factor A_b (top panel), the ratio p_s,b/p_x1,b (middle panel), and the ratio A_b^2 p_s,b/p_x2,b (lower panel), expressed in dB, as a function of the stereo level difference p_x2,b/p_x1,b (in dB) and the cross-correlation ρ_x1x2,b.

Figure 10.4 The least-squares estimate weights w_1,b and w_2,b and the post-scaling factor for computation of the estimate of s_m.

Figure 10.5 The least-squares estimate weights w_3,b and w_4,b and the post-scaling factor for computation of the estimate of n_1,m.

Figure 10.6 The least-squares estimate weights w_5,b and w_6,b and the post-scaling factor for computation of the estimate of n_2,m.

Figure 10.7 Estimates ŝ, A_b, n̂₁, and n̂₂ as a function of time for a short audio clip. The factor A_b is shown for various parameter bands.

An example of the spatial decomposition of a stereo rock music clip with a singer in the center is shown in Figure 10.7. The estimates of s, A, n₁, and n₂ are shown. The signals are shown in the time domain (i.e. after processing each parameter band independently and then transforming the signals back to the time domain), and A_b is shown for every time–frequency tile. The estimated direct sound s is relatively strong compared to the independent lateral sound n₁ and n₂, since the singer in the center is dominant.
