5.1 Introduction

Parametric stereo (PS) is the first application of spatial audio coding technology in international standards and commercially available audio codecs. Within MPEG-4, PS is supported in aacPlus v2 (also known as the high-efficiency AAC version 2 (HE-AAC v2) profile), a codec based on (mono) AAC, spectral band replication (SBR) and PS. This codec is currently regarded as the most efficient (stereo) audio coder available, delivering ‘good’ quality at bitrates as low as 24–32 kbps, and ‘excellent’ quality around 48 kbps.

5.1.1 Development and standardization

AacPlus v2 was developed roughly between 2001 and 2004, shortly after the finalization of MPEG-4 Audio versions 1 and 2 in 1999–2000. At that point in time, one of the most efficient audio codecs was MPEG-4 AAC, a codec based on transform-domain (MDCT) coefficient quantization, enhanced with perceptual noise substitution (PNS) and long-term prediction (LTP) tools. The performance (or perceptual quality) of MPEG-4 AAC for stereo signals is excellent for bitrates of 96 kbps and higher, but drops rapidly for lower bitrates.

In 2001, MPEG started work on new technology to enable further bitrate reduction. Two areas of improved audio coding technology were identified:

  • Improved compression efficiency of audio signals or speech signals by means of bandwidth extension, that is forward and backward compatible with existing MPEG-4 technology;
  • Improved compression efficiency of high-quality audio signals by means of parametric coding.

Three new tools resulted from these work items:

  • SSC (sinusoidal coding) [66, 233, 234], a parametric audio coder based on decomposition of an audio signal into three objects: sinusoids, transients and noise, which can all be described very efficiently using parametric techniques;
  • SBR (spectral band replication) [78, 105, 145], a method to efficiently describe the upper bandwidth of an audio signal using a parametric approach;
  • PS (parametric stereo) [47], a variant of spatial audio coding that was initially developed in the context of SSC [135, 235].

SSC (in combination with PS) reached the final stage of MPEG-4 amendment 2 [135, 143] in mid-2003. However, since PS and SBR could be combined in a very cost-effective manner [214, 217, 235] resulting in an additional boost in coding efficiency for SBR-enabled codecs (i.e., aacPlus [77, 279]), the specific combination of MPEG-4 AAC, SBR and PS was standardized as enhanced aacPlus or aacPlus v2 [217]. Other (application-oriented) standardization bodies such as 3GPP [1] and DVB-H subsequently adopted this technology as well.

5.1.2 AacPlus v2

The general structure of an aacPlus v2 encoder is shown in the top panel of Figure 5.1. The stereo input signal is first processed by a parametric stereo encoder, resulting in a mono audio output signal and (stereo) spatial parameters. The mono signal is subsequently processed by the aacPlus v1 encoder. Finally, a multiplexer combines the spatial parameters and the mono bitstream into a single output bitstream. The spatial parameters are stored in the so-called ancillary data part of the core coder bitstream. This part of the bitstream is ignored by legacy (aacPlus v1) decoders, which ensures forward compatibility with existing decoders. In fact, aacPlus v2 bitstreams are also compatible with MPEG-4 AAC, since both the SBR and the PS parameters are stored in the ancillary data part of the AAC bitstream. In other words, any AAC, aacPlus v1 or aacPlus v2 decoder can in principle play back any bitstream generated by any of these encoders.

The structure of the aacPlus v2 decoder is shown in the lower panel of Figure 5.1. The decoder performs the reverse process of the encoder. The incoming bitstream is first split into a core (aacPlus v1) coder bitstream and a (stereo) spatial parameter bitstream. The core-coder bitstream is subsequently decoded by the aacPlus v1 decoder. Finally, the parametric stereo decoder generates stereo output signals based on the mono input and the transmitted parameters.
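
The signal flow of Figure 5.1 can be illustrated with a small sketch. This is a toy model under strong assumptions, not the standardized algorithm: the core coder is replaced by a pass-through, the spatial parameters are reduced to a single broadband level difference, and all names (ps_encode, mux, decode_v2, decode_legacy, level_diff_db) are hypothetical. It only shows how a mono downmix plus side information travels through the chain, and how a legacy decoder that skips the ancillary data still produces mono output.

```python
import math


def ps_encode(left, right):
    """Toy PS encoder: mono downmix plus one broadband level difference (assumption)."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    # Small offset avoids division by zero and log of zero for silent channels.
    energy_l = sum(x * x for x in left) + 1e-12
    energy_r = sum(x * x for x in right) + 1e-12
    level_diff_db = 10.0 * math.log10(energy_l / energy_r)
    return mono, {"level_diff_db": level_diff_db}


def core_encode(mono):
    """Placeholder for the aacPlus v1 encoder (pass-through in this sketch)."""
    return list(mono)


def core_decode(payload):
    """Placeholder for the aacPlus v1 decoder (pass-through in this sketch)."""
    return list(payload)


def mux(core_payload, spatial_params):
    """Combine the core bitstream with spatial parameters in an ancillary field."""
    return {"core": core_payload, "ancillary": spatial_params}


def decode_legacy(bitstream):
    """Legacy decoder: ignores the ancillary data and returns mono."""
    return core_decode(bitstream["core"])


def decode_v2(bitstream):
    """Toy PS-aware decoder: rescale the mono signal into left and right channels."""
    mono = core_decode(bitstream["core"])
    params = bitstream.get("ancillary")
    if not params:
        return mono, mono
    # Amplitude ratio left/right implied by the transmitted level difference.
    ratio = 10.0 ** (params["level_diff_db"] / 20.0)
    # Gains chosen so that gain_l / gain_r == ratio and (gain_l + gain_r) / 2 == 1,
    # i.e. downmixing the output again yields the transmitted mono signal.
    gain_r = 2.0 / (1.0 + ratio)
    gain_l = ratio * gain_r
    return [gain_l * m for m in mono], [gain_r * m for m in mono]


if __name__ == "__main__":
    left = [1.0, -0.5, 0.25, 0.8]
    right = [0.5 * x for x in left]  # in-phase copy at half amplitude
    mono, params = ps_encode(left, right)
    stream = mux(core_encode(mono), params)
    print(decode_legacy(stream))
    print(decode_v2(stream))
```

With a single level parameter, this sketch only reconstructs the input exactly when the right channel is an in-phase scaled copy of the left channel. Signals with phase or correlation differences between the channels need additional parameters, and a practical system would estimate them per frequency band and time segment rather than once for the whole signal.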

In this chapter, the parametric stereo encoder and decoder that are included in aacPlus v2 will be described in more detail. In international standards such as MPEG, encoders are in many cases not fully specified. In MPEG terminology, it is said that the encoder is informative. A simple encoder embodiment is provided as an example, while building a high-quality encoder is up to the developers themselves. On the other hand, the bitstream format and the decoder are fully specified (normative). Therefore, the encoder will be described mostly in general terms and concepts, leaving the implementer or developer a lot of freedom for optimization, while the decoder will be described in more detail. In particular, the specific combination of aacPlus (v1) as core coder and parametric stereo has some interesting complexity advantages when combined in a single system. This will be outlined in more detail in Section 5.5.6.

Figure 5.1 Structure of the aacPlus v2 encoder (top panel) and decoder (lower panel).
