Parametric stereo (PS) is the first application of spatial audio coding technology in international standards and commercially available audio codecs. Within MPEG-4, PS is supported in aacPlus v2 (also known as the high-efficiency AAC (HE-AAC) v2 profile), a codec based on (mono) AAC, spectral band replication (SBR) and PS. This codec is currently regarded as the most efficient (stereo) audio coder available, delivering ‘good’ quality at bitrates as low as 24–32 kbps, and ‘excellent’ quality around 48 kbps.
AacPlus v2 was developed roughly between 2001 and 2004, shortly after the finalization of MPEG-4 Audio versions 1 and 2 in 1999–2000. At that point in time, one of the most efficient audio codecs was MPEG-4 AAC, a codec based on transform-domain (MDCT) coefficient quantization, enhanced with perceptual noise substitution (PNS) and long-term prediction (LTP) tools. The performance (or perceptual quality) of MPEG-4 AAC for stereo signals is excellent for bitrates of 96 kbps and higher, but drops rapidly for lower bitrates.
In 2001, MPEG started work on new technology to enable further bitrate reduction. Two areas of improved audio coding technology were identified:
Three new tools resulted from these work items:
SSC (in combination with PS) reached the final stage of MPEG-4 amendment 2 [135, 143] in mid-2003. However, since PS and SBR could be combined in a very cost-effective manner [214, 217, 235] resulting in an additional boost in coding efficiency for SBR-enabled codecs (i.e., aacPlus [77, 279]), the specific combination of MPEG-4 AAC, SBR and PS was standardized as enhanced aacPlus or aacPlus v2 [217]. Other (application-oriented) standardization bodies such as 3GPP [1] and DVB-H subsequently adopted this technology as well.
The general structure of an aacPlus v2 encoder is shown in the top panel of Figure 5.1. The stereo input signal is first processed by a parametric stereo encoder, resulting in a mono audio output signal and (stereo) spatial parameters. The mono signal is subsequently processed by the aacPlus v1 encoder. Finally, a multiplexer combines the spatial parameters and the mono bitstream into a single output bitstream. The spatial parameters are stored in the so-called ancillary data part of the core coder bitstream. This part of the bitstream is ignored by legacy (aacPlus v1) decoders, which ensures forward compatibility. In fact, aacPlus v2 bitstreams are also compatible with MPEG-4 AAC, since both the SBR and the PS parameters are stored in the ancillary data part of the AAC bitstream. In other words, any AAC, aacPlus v1 or aacPlus v2 decoder can in principle play back a bitstream generated by any of these encoders.
The structure of the aacPlus v2 decoder is shown in the lower panel of Figure 5.1. The decoder basically performs the reverse process of the encoder. The incoming bitstream is first split into a core (aacPlus v1) coder bitstream and a (stereo) spatial parameter bitstream. The core-coder bitstream is subsequently decoded by the aacPlus v1 decoder. Finally, the parametric stereo decoder generates stereo output signals based on the mono input and the transmitted parameters.
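The encoder and decoder signal flow described above can be illustrated with a small sketch. It is a toy model under loudly stated assumptions, not the standardized algorithm: the core coder is replaced by an identity pass-through, the spatial parameters are reduced to a single hypothetical broadband level ratio per frame, and all names (`Frame`, `ps_encode`, `encode_v2`, `decode_legacy`, `decode_v2`) are invented for illustration. Its only purpose is to show how a mono core stream plus parameters in an ancillary field lets a legacy decoder play mono while a PS-aware decoder reconstructs stereo.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical container: core payload plus an ancillary data part."""
    core_payload: list                              # stand-in for the mono core bitstream
    ancillary: dict = field(default_factory=dict)   # ignored by legacy decoders


def core_encode(mono):
    # Assumption: identity placeholder for the aacPlus v1 encoder.
    return list(mono)


def core_decode(payload):
    # Assumption: identity placeholder for the aacPlus v1 decoder.
    return list(payload)


def ps_encode(left, right, eps=1e-12):
    """Toy PS encoder: mono downmix and one level-ratio parameter (assumed)."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_left = sum(l * l for l in left)
    e_right = sum(r * r for r in right)
    ratio = math.sqrt((e_left + eps) / (e_right + eps))
    return mono, {"level_ratio": ratio}


def encode_v2(left, right):
    """PS encoder, then core encoder, then multiplexing into one frame."""
    mono, params = ps_encode(left, right)
    return Frame(core_payload=core_encode(mono), ancillary={"ps": params})


def decode_legacy(frame):
    """Legacy decoder: decodes the core stream and ignores the ancillary part."""
    return core_decode(frame.core_payload)


def decode_v2(frame):
    """Demultiplex, decode the core stream, then apply the toy PS upmix."""
    mono = core_decode(frame.core_payload)
    params = frame.ancillary.get("ps")
    if params is None:              # no PS data: fall back to dual mono
        return list(mono), list(mono)
    c = params["level_ratio"]
    left = [m * 2.0 * c / (1.0 + c) for m in mono]
    right = [m * 2.0 / (1.0 + c) for m in mono]
    return left, right


if __name__ == "__main__":
    right_in = [0.1, -0.2, 0.3]
    left_in = [2.0 * x for x in right_in]   # fully correlated, left is louder
    frame = encode_v2(left_in, right_in)
    print("legacy (mono):", decode_legacy(frame))
    print("v2 (stereo):  ", decode_v2(frame))
```

In this toy model the upmix preserves the downmix (left plus right equals twice the mono signal) and restores the level ratio, so it is exact only for fully correlated channels that differ by a positive gain. Real stereo material would need parameters that vary over time and frequency, which this sketch deliberately leaves out.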
In this chapter, the parametric stereo encoder and decoder that are included in aacPlus v2 will be described in more detail. In international standards such as MPEG, encoders are in many cases not fully specified. In MPEG terminology, it is said that the encoder is informative. A simple encoder embodiment is provided as an example, while building a high-quality encoder is up to the developers themselves. On the other hand, the bitstream format and the decoder are fully specified (normative). Therefore, the encoder will be described mostly in general terms and concepts, leaving the implementer considerable freedom for optimization, while the decoder will be described in more detail. In particular, the specific combination of aacPlus (v1) as core coder and parametric stereo has some interesting complexity advantages when combined in a single system. This will be outlined in more detail in Section 5.5.6.