9.4 Using spatial audio decoders as mixers

Mixing is preferably applied to the transmitted sum signal (Equation 9.1) without explicitly computing ŝi(n). In this approach, the sum signal s(n) is transformed directly into a stereo or multi-channel output signal with the correct spatial attributes. A BCC synthesis stage or an MPEG Surround decoder can be used for this purpose. The following considers stereo reproduction, but all the principles described apply equally to the generation of multi-channel audio signals.

A stereo BCC synthesis scheme, applied to the sum signal (Equation 9.1), is shown in Figure 9.6. The scheme comprises a filterbank (FB), gains (g1, g2), delays (D1, D2), a decorrelation stage and inverse filterbanks (IFB). Its purpose is to generate a signal that is perceived as similar to the output signal of a mixer as shown in Figure 9.3. This requires that the ICTD, ICLD, ICC and output levels of the BCC output match those obtained when the original source signals are fed directly into the mixer (see Equation 9.2).

Figure 9.6 A mixer for generating stereo signals directly given the sum of a number of source signals without explicit computation of the source signals. Gain factors, delays, and decorrelation are applied independently in sub-bands.

The same side information is used as for the more general scheme described previously, giving the decoder access to the short-time sub-band power values $E\{\tilde{s}_i^2(n)\}$ of the sources. Given $E\{\tilde{s}_i^2(n)\}$, the gain factors g1 and g2 in Figure 9.6 are computed as

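$$g_1 = \sqrt{\frac{\sum_{i=1}^{M} a_i^2 \, E\{\tilde{s}_i^2(n)\}}{\sum_{i=1}^{M} E\{\tilde{s}_i^2(n)\}}}, \qquad g_2 = \sqrt{\frac{\sum_{i=1}^{M} b_i^2 \, E\{\tilde{s}_i^2(n)\}}{\sum_{i=1}^{M} E\{\tilde{s}_i^2(n)\}}}$$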

such that the output sub-band power and ICLD (Equation 9.6) are the same as for the mixer in Figure 9.3.
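As a rough illustration, the gain computation can be sketched in Python for a single sub-band and time index. The sketch assumes mutually independent sources, so that their sub-band powers add in the sum signal, and mixer gains a_i and b_i for the first and second output channel. The function name `bcc_mixer_gains` and the example values are hypothetical and not taken from the text.

```python
import math

def bcc_mixer_gains(powers, a, b):
    """Return (g1, g2) for one sub-band and one time index.

    powers: short-time sub-band powers of the M sources
    a, b:   mixer gains a_i, b_i of the M sources (assumed notation)
    Assumes mutually independent sources, so that powers add.
    """
    total = sum(powers)  # sub-band power of the sum signal
    if total <= 0.0:
        return 0.0, 0.0  # silent band: nothing to scale
    ch1 = sum(ai * ai * p for ai, p in zip(a, powers))
    ch2 = sum(bi * bi * p for bi, p in zip(b, powers))
    return math.sqrt(ch1 / total), math.sqrt(ch2 / total)

# Hypothetical example: two sources, each routed to one channel only.
powers = [1.0, 3.0]
g1, g2 = bcc_mixer_gains(powers, a=[1.0, 0.0], b=[0.0, 1.0])

# Scaling the sum signal by g1 and g2 reproduces the mixer's channel powers.
ch1_power = g1 ** 2 * sum(powers)
ch2_power = g2 ** 2 * sum(powers)
```

Here the first channel ends up with the power of the first source and the second channel with that of the second, which is what a mixer with these gains would produce.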

The ICTD τ(n) is computed according to Equation 9.9 and determines the delays D1 and D2 in Figure 9.6:

$$D_1(n) = \max\{-\tau(n), 0\}, \qquad D_2(n) = \max\{\tau(n), 0\}.$$
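The mapping from an ICTD to two channel delays can be sketched as follows. This assumes the convention commonly used in BCC synthesis, in which one channel is left undelayed and the other is delayed by |τ|, so that D2 − D1 = τ. The sign convention and the function name `bcc_delays` are assumptions, since the definition of τ(n) in Equation 9.9 is not repeated here.

```python
def bcc_delays(tau):
    """Split an ICTD tau (in samples) into two non-negative delays (D1, D2).

    Assumed convention: a positive tau delays channel 2, a negative tau
    delays channel 1, and the undelayed channel gets zero delay.
    """
    return max(-tau, 0), max(tau, 0)

# Hypothetical values, in samples.
d_pos = bcc_delays(3)    # channel 2 lags by 3 samples
d_neg = bcc_delays(-2)   # channel 1 lags by 2 samples
```

Keeping both delays non-negative means the scheme stays causal for either sign of τ.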

The ICC c(n) is computed according to Equation 9.8 and determines the decorrelation processing in Figure 9.6. Decorrelation processing (ICC synthesis) is described in [79, 82, 83, 86, 235] and in Chapters 4, 5 and 6. Applying decorrelation processing to the mixer output channels, rather than using it to generate independent ŝi(n), has the following advantages:

  1. Usually the number of source signals M is larger than the number of audio output channels N. Thus, fewer independent audio channels need to be generated when decorrelating the N output channels than when decorrelating the M source signals.
  2. Often the N audio output channels are correlated (ICC > 0), so less decorrelation processing needs to be applied than would be needed to generate M or N independent channels.

Because less decorrelation processing is required, better audio quality is expected.

Best audio quality is obtained when the mixer parameters are constrained such that $a_i^2 + b_i^2 = 1$, i.e. Gi = 0 dB. In this case, the sub-band power of each source in the transmitted sum signal (Equation 9.1) is the same as the power of that source in the mixed decoder output signal. Interestingly, for Gi = 0 dB, the decoder output signal (Figure 9.6) is the same as if the mixer output signal (Figure 9.3) had been encoded and decoded by a BCC encoder/decoder. Thus, similar audio quality can also be expected.
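The power-preserving property of this constraint can be checked numerically. The sketch below assumes independent sources and uses a constant-power pan law (a_i = cos θ_i, b_i = sin θ_i) as one convenient way of satisfying a_i² + b_i² = 1. The pan law, the helper name `pan_gains` and all numeric values are illustrative assumptions rather than part of the scheme described here.

```python
import math

def pan_gains(angle):
    """Constant-power pan: a = cos(angle), b = sin(angle), so a^2 + b^2 = 1."""
    return math.cos(angle), math.sin(angle)

powers = [0.5, 2.0, 1.25]   # hypothetical sub-band powers of three sources
angles = [0.1, 0.8, 1.3]    # hypothetical pan angles in radians
gains = [pan_gains(t) for t in angles]

sum_power = sum(powers)     # power of the sum signal (independent sources)
ch1_power = sum(a * a * p for (a, _), p in zip(gains, powers))
ch2_power = sum(b * b * p for (_, b), p in zip(gains, powers))

# Squared decoder gains for this sub-band.
g1_sq = ch1_power / sum_power
g2_sq = ch2_power / sum_power

# With a_i^2 + b_i^2 = 1 for every source, the two squared gains sum to one,
# so the total output power equals the power of the transmitted sum signal.
total_out = (g1_sq + g2_sq) * sum_power
```

Under these assumptions the decoder only redistributes the power of the sum signal between the two channels; it never has to boost or attenuate it overall.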

Figure 9.7 Proposed transcoder scheme to convert object (SAOC) parameters to an MPEG Surround (MPS) bit stream based on mixing parameters.

The decoder can also vary the gain of each source, not only determine the direction at which it is to appear. The gain is increased by choosing $a_i^2 + b_i^2 > 1$ (Gi > 0 dB) and decreased by choosing $a_i^2 + b_i^2 < 1$ (Gi < 0 dB) in Equation 9.16. If values of Gi other than 0 dB are used, it is advisable to constrain Gi to the range between −12 and +12 dB.
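One way to enforce such a limit is sketched below. It assumes that the source gain is defined as Gi = 10 log10(a_i² + b_i²), which is consistent with a_i² + b_i² = 1 corresponding to 0 dB. Both mixer gains are rescaled by the same factor so that their ratio, and hence the intended direction, is unchanged. The function names and example values are hypothetical.

```python
import math

def source_gain_db(a, b):
    """Source gain in dB, assuming G = 10*log10(a^2 + b^2)."""
    return 10.0 * math.log10(a * a + b * b)

def limit_source_gain(a, b, max_db=12.0):
    """Rescale (a, b) so that the source gain lies within +/- max_db.

    The ratio b/a is preserved, so only the level changes.
    A fully muted source (a = b = 0) is returned unchanged.
    """
    if a == 0.0 and b == 0.0:
        return a, b
    gain = source_gain_db(a, b)
    clamped = min(max(gain, -max_db), max_db)
    scale = 10.0 ** ((clamped - gain) / 20.0)  # amplitude correction factor
    return a * scale, b * scale

# Hypothetical example: a^2 + b^2 = 25, i.e. about 14 dB, above the limit.
a_lim, b_lim = limit_source_gain(4.0, 3.0)
```

Gains that already lie inside the range pass through with a scale factor of one.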
