9.2 Motivation and details

As mentioned above, the scheme for joint-coding of audio source signals, shown in Figure 9.2, is based on transmission of the sum of the audio source signals,

$$s(n) = \sum_{i=1}^{M} s_i(n), \qquad (9.1)$$

where M is the number of source signals and s_i(n) are the individual source signals.

Similar to spatial audio coding techniques, this method relies on the assumption that the perceived auditory spatial image is largely determined by the inter-channel time difference (ICTD), inter-channel level difference (ICLD), and inter-channel coherence (ICC) between the rendered audio channels. Therefore, as opposed to requiring ‘clean’ source signals s_i(n) as mixer input in Figure 9.1, only signals ŝ_i(n) are required that result in similar ICTD, ICLD, and ICC at the mixer output as for the case of supplying the real source signals s_i(n) to the mixer. There are three goals for the generation of ŝ_i(n):

If ŝ_i(n) are supplied to a mixer, the mixer output channels will have approximately the same spatial cues (ICLD, ICTD, ICC) as if s_i(n) were supplied to the mixer.
ŝ_i(n) are to be generated with as little information as possible about the original source signals s_i(n) (because the goal is to have low-bitrate side information).
ŝ_i(n) are generated from the transmitted sum signal s(n) such that a minimum amount of signal distortion is introduced.

Figure 9.3 A mixer for generating stereo signals given a number of source signals.

Without loss of generality for deriving the scheme for an arbitrary number of rendered output channels, a stereo mixer is considered. A further simplification over the general case is that only amplitude and delay modifications are applied for mixing. (Equalization of objects is also provided by frequency-dependent amplitude changes.) If the discrete source signals were available at the decoder, a stereo signal would be mixed, as shown in Figure 9.3, i.e.

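$$x_1(n) = \sum_{i=1}^{M} a_i\, s_i(n - c_i), \qquad x_2(n) = \sum_{i=1}^{M} b_i\, s_i(n - d_i) \qquad (9.2)$$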

where a_i and b_i represent level mixing parameters, while c_i and d_i represent time delay mixing parameters.
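The mixing step can be sketched in code. This is a minimal illustration, assuming that each output channel is a gain-weighted sum of delayed sources, that the delays c_i and d_i are non-negative integers in samples, and that samples before the start of a signal are zero. The function name `mix_stereo` is hypothetical.

```python
def mix_stereo(sources, a, b, c, d):
    """Mix M source signals to stereo with per-source gains and delays.

    sources: list of M equal-length lists of samples s_i(n)
    a, b:    gain factors for channel 1 and channel 2
    c, d:    delays in samples (>= 0) for channel 1 and channel 2
    Samples before the start of a signal are treated as zero (assumption).
    """
    n_samples = len(sources[0])
    x1 = [0.0] * n_samples
    x2 = [0.0] * n_samples
    for s, a_i, b_i, c_i, d_i in zip(sources, a, b, c, d):
        for n in range(n_samples):
            if n - c_i >= 0:
                x1[n] += a_i * s[n - c_i]  # contribution to channel 1
            if n - d_i >= 0:
                x2[n] += b_i * s[n - d_i]  # contribution to channel 2
    return x1, x2
```

For example, `mix_stereo([s1, s2], [1.0, 0.5], [0.5, 1.0], [0, 1], [1, 0])` places the first source louder and earlier in channel 1 and the second source louder and earlier in channel 2.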

Given, for each source with index i, the gain G_i (in dB), pan pot position ΔL_i (expressed as level difference in dB), and the delay pan pot position τ_i in samples, the mixing parameters (9.2) can be computed:

images
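One plausible mapping from G_i, ΔL_i, and τ_i to the mixing parameters is sketched below. It is an assumption, not necessarily the exact form of Equation (9.3): the gain ratio b_i/a_i is set to the pan level difference, the total power a_i² + b_i² is set to the source gain, and the delay is applied to one channel only so that d_i − c_i = τ_i. The function name `mixing_parameters` is hypothetical.

```python
import math

def mixing_parameters(gain_db, pan_db, delay_samples):
    """Return (a_i, b_i, c_i, d_i) for one source (illustrative assumption)."""
    g = 10.0 ** (gain_db / 20.0)          # linear source gain
    ratio = 10.0 ** (pan_db / 20.0)       # assumed b_i / a_i
    a = g / math.sqrt(1.0 + ratio ** 2)   # so that a^2 + b^2 = g^2
    b = ratio * a
    c = max(-delay_samples, 0)            # delay channel 1 when tau_i < 0
    d = max(delay_samples, 0)             # delay channel 2 when tau_i > 0
    return a, b, c, d
```

With this choice a source alone in a sub-band produces a level difference of ΔL_i and a time difference of τ_i at the mixer output, which matches the single-source behaviour described for Figure 9.4.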

In the following, ICTD, ICLD, and ICC of the stereo mixer output x₁(n), x₂(n) are computed as a function of (statistical properties of) the input source signals s_i(n) and the mixing parameters a_i, b_i, c_i, and d_i. The expressions obtained will give an indication of which source signal properties determine the mixer output ICTD, ICLD, and ICC (together with the mixing parameters).

9.2.1 ICTD, ICLD and ICC of the mixer output

The ICTD, ICLD and ICC parameters of the mixer output are estimated in sub-bands (critical bands) and as a function of time. It is assumed that the source signals s_i(n) are zero mean and mutually independent. A pair of sub-band signals of the mixer output (Equation 9.2) is denoted x̃₁(n) and x̃₂(n). Note that for simplicity of notation we are using the same time index n for time-domain and sub-band-domain signals. Also, no sub-band index is used, and the analysis/processing described is applied to each sub-band independently. The sub-band powers of the two mixer output signals are:

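$$E\{\tilde{x}_1^2(n)\} = \sum_{i=1}^{M} a_i^2\, E\{\tilde{s}_i^2(n)\}, \qquad E\{\tilde{x}_2^2(n)\} = \sum_{i=1}^{M} b_i^2\, E\{\tilde{s}_i^2(n)\} \qquad (9.4)$$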

where s̃_i(n) is one sub-band signal of source s_i(n) and E{.} denotes short-time mean, e.g.

images

where K determines the length of the moving average. (The solutions provided are based on an assumption of incoherent source signals. For partially coherent signals, the reader is referred to Chapter 8, Section 8.3.3.) Note that the sub-band power values E{s̃_i²(n)} represent for each source signal the spectral envelope as a function of time. The time span considered for the averaging in Equation (9.5) determines the time resolution at which the inter-channel cues are considered.
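A short-time power estimate of this kind can be sketched as a moving average of the squared samples. The window placement is an assumption (K samples roughly centred on n, truncated at the signal edges) and may differ from the exact limits of Equation (9.5). The function name `short_time_power` is hypothetical.

```python
def short_time_power(x, K):
    """Moving average of x(n)^2 over about K samples around each n."""
    half = K // 2
    out = []
    for n in range(len(x)):
        lo = max(0, n - half)              # window start, clipped at 0
        hi = min(len(x), n - half + K)     # window end (exclusive), clipped
        window = x[lo:hi]
        out.append(sum(v * v for v in window) / len(window))
    return out
```

A larger K gives a smoother power estimate but a coarser time resolution for the cues derived from it.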

The ICLD, ΔL(n), between signals x̃₁(n) and x̃₂(n) is given by:

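$$\Delta L(n) = 10 \log_{10} \frac{\sum_{i=1}^{M} b_i^2\, E\{\tilde{s}_i^2(n)\}}{\sum_{i=1}^{M} a_i^2\, E\{\tilde{s}_i^2(n)\}}. \qquad (9.6)$$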
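The level difference can be evaluated directly from the gain factors and the source sub-band powers. The sketch below assumes independent sources, so that the channel powers are gain-weighted sums of the source powers, and it assumes the level difference is taken as channel 2 relative to channel 1. The function name `icld_db` and the example gain factors are hypothetical.

```python
import math

def icld_db(a, b, powers):
    """Level difference (dB) of channel 2 relative to channel 1."""
    p1 = sum(a_i ** 2 * p for a_i, p in zip(a, powers))  # channel 1 power
    p2 = sum(b_i ** 2 * p for b_i, p in zip(b, powers))  # channel 2 power
    return 10.0 * math.log10(p2 / p1)

# Two sources panned to +14 dB and -14 dB, each with unit total gain
# (illustrative gain factors with b_i / a_i = 10^(pan/20)).
r = 10.0 ** (14.0 / 20.0)
a = [1.0 / math.sqrt(1.0 + r * r), r / math.sqrt(1.0 + r * r)]
b = [r * a[0], a[1] / r]

# Sweep the share of the sub-band power that belongs to source 1.
for share in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(share, round(icld_db(a, b, [share, 1.0 - share]), 2))
```

The printed values move gradually from the pan setting of the first source to that of the second as the power share shifts between them.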

For estimating ICTD and ICC the normalized cross-correlation function [181]

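$$\Phi(n, d) = \frac{E\{\tilde{x}_1(n)\, \tilde{x}_2(n + d)\}}{\sqrt{E\{\tilde{x}_1^2(n)\}\, E\{\tilde{x}_2^2(n + d)\}}} \qquad (9.7)$$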

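is computed. The ICC, c(n), is obtained from the normalized cross-correlation function Φ(n, d) according to

$$c(n) = \max_{d} \Phi(n, d). \qquad (9.8)$$

For the computation of the ICTD, τ(n), the location of the highest peak of the cross-correlation function along the delay axis is computed,

$$\tau(n) = \arg\max_{d} \Phi(n, d). \qquad (9.9)$$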
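The peak search can be sketched as follows. This illustration assumes the cross-correlation is evaluated over one block of samples for integer lags between -max_lag and +max_lag, with each lag normalized by the powers of the overlapping segments. The function names are hypothetical, and a positive lag means that channel 2 lags channel 1.

```python
import math

def normalized_cross_correlation(x1, x2, lag):
    """Correlation of x1(n) with x2(n + lag), normalized to [-1, 1]."""
    num = p1 = p2 = 0.0
    for n in range(len(x1)):
        m = n + lag
        if 0 <= m < len(x2):               # use overlapping samples only
            num += x1[n] * x2[m]
            p1 += x1[n] ** 2
            p2 += x2[m] ** 2
    if p1 == 0.0 or p2 == 0.0:
        return 0.0                          # no energy in the overlap
    return num / math.sqrt(p1 * p2)

def icc_and_ictd(x1, x2, max_lag):
    """Return (peak value, lag of the peak) of the cross-correlation."""
    lags = range(-max_lag, max_lag + 1)
    values = [normalized_cross_correlation(x1, x2, d) for d in lags]
    peak = max(values)
    return peak, lags[values.index(peak)]
```

If channel 2 is a scaled copy of channel 1 delayed by two samples, the peak value is one and the peak lag is two, provided `max_lag` is at least two.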

The normalized cross-correlation function can be computed as a function of the mixing parameters. Together with Equation (9.2), Equation (9.7) can be written as

images

which is equivalent to

images

where the normalized autocorrelation function Φ_i(n, e) is

images

and τ_i = d_i − c_i. Note that for computing Equation (9.11), given Equation (9.10), it has been assumed that the signals are wide sense stationary within the range of delays considered, i.e.

images
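The chain of steps described above can be sketched as a short derivation. It is a reconstruction under stated assumptions and not necessarily the exact form of Equations (9.10) to (9.13): the sub-band mixer outputs are taken to be gain-weighted sums of delayed sources, the cross-correlation is taken as E{x̃₁(n) x̃₂(n + d)} normalized by the output powers, and the sources are zero mean, mutually independent, and wide-sense stationary over the delays considered.

```latex
\begin{align*}
% Cross terms between different sources vanish (zero mean, independent):
E\{\tilde{x}_1(n)\,\tilde{x}_2(n+d)\}
  &= \sum_{i=1}^{M} a_i b_i\, E\{\tilde{s}_i(n-c_i)\,\tilde{s}_i(n-d_i+d)\} \\
% Stationarity: shift the time origin by c_i; the lag becomes d-(d_i-c_i)=d-\tau_i:
  &= \sum_{i=1}^{M} a_i b_i\, E\{\tilde{s}_i^2(n)\}\,\Phi_i(n, d-\tau_i),
  \qquad
  \Phi_i(n,e) = \frac{E\{\tilde{s}_i(n)\,\tilde{s}_i(n+e)\}}{E\{\tilde{s}_i^2(n)\}} \\
% Normalizing by the two output powers:
\Phi(n,d)
  &= \frac{\sum_{i=1}^{M} a_i b_i\, E\{\tilde{s}_i^2(n)\}\,\Phi_i(n, d-\tau_i)}
          {\sqrt{\left(\sum_{i=1}^{M} a_i^2\, E\{\tilde{s}_i^2(n)\}\right)
                 \left(\sum_{i=1}^{M} b_i^2\, E\{\tilde{s}_i^2(n)\}\right)}}
\end{align*}
```

When only one source has power in the sub-band (and a_i, b_i are positive), the last expression reduces to Φ_i(n, d − τ_i), which peaks at d = τ_i with a value of one.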

A numerical example for two source signals, illustrating the dependence between ICTD, ICLD, and ICC and the source sub-band power, is shown in Figure 9.4. The top, middle, and bottom panels of Figure 9.4 show ΔL(n), τ(n), and c(n), respectively, as a function of the ratio of the sub-band power of the two sources, a = E{s̃₁²(n)}/(E{s̃₁²(n)} + E{s̃₂²(n)}), for different mixing parameters (Equation 9.3) ΔL₁, ΔL₂, τ₁, and τ₂ (with G_i = 0 dB).

The top panel of Figure 9.4 indicates that when only one source has power in the sub-band (a = 0 or a = 1), then the mixer ICLD, ΔL(n) (9.6), is equal to the amplitude panning parameter ΔL_i (Equation 9.3) of the dominant source. When the power in the sub-bands fades from one source to the other, i.e. when a changes from zero to one, the mixer output level difference fades from the amplitude panning parameter of one source to the amplitude panning parameter of the other source.

The middle panel of Figure 9.4 indicates that when only one source has power in the sub-band (a = 0 or a = 1), then the mixer ICTD, τ(n) (Equation 9.9), is equal to the delay panning parameter τ_i (Equation 9.3) of the dominant source. As opposed to the mixer output level difference, the mixer output time difference is determined by the delay panning parameter of the source which has more power in the sub-band, as indicated by the hard switch of τ(n) at a = 0.5.

The bottom panel of Figure 9.4 indicates that when only one source has power in the sub-band (a = 0 or a = 1), then the mixer output coherence, c(n) (9.8), is equal to one. Mixer output coherence decreases when more than one source has power in the sub-band.

Figure 9.4 ΔL(n) (top), τ(n) (middle), and c(n) (bottom) for a critical band as a function of a = E{s̃₁²(n)}/(E{s̃₁²(n)} + E{s̃₂²(n)}). The mixer parameters (Equation 9.3) are: ΔL₁ = 14 dB, ΔL₂ = −14 dB, τ₁ = −400 μs, τ₂ = 400 μs (solid); ΔL₁ = 18 dB, ΔL₂ = 0 dB, τ₁ = −600 μs, τ₂ = 0 μs (dashed); ΔL₁ = −10 dB, ΔL₂ = 10 dB, τ₁ = 200 μs, τ₂ = −200 μs (dotted). The source gain has always been chosen to be G_i = 0 dB.
