6.5 Subjective evaluation

During the development of MPEG Surround, its progress and performance were documented in detail in several publications [49, 121, 122, 265] and in a formal verification test report [144]. The results published in those papers focused primarily on bitrate scalability, different channel configurations, support for external down-mixes, and binaural decoding (see also Chapter 8).

The purpose of the listening tests described in this section is to demonstrate that existing stereo services can be upgraded to high-quality multi-channel audio in a fully backward compatible fashion, at transmission bit rates that are currently used for stereo. The first test demonstrates MPEG Surround performance with two different core coders (AAC and MP3) and compares it against alternative systems for upgrading a stereo transmission chain to multi-channel audio. The second test assesses performance in the operation mode without transmission of spatial parameters (i.e. the enhanced matrix mode).

6.5.1 Test 1: operation using spatial parameters

Stimuli and method

The codecs employed in the test are listed in Table 6.2. The total bit rate (160 kbps) was set to a value that is commonly used for high-quality stereo transmission.

Configuration (1) represents stereo AAC at 128 kbps in combination with 32 kbps of MPEG Surround (MPS) parametric data. Configuration (2) is based on a different core coder (MP3 in combination with MPEG Surround) and uses a slightly lower parametric bit rate, and consequently a slightly higher bit rate for the core coder; informal listening indicated that this split resulted in a higher overall quality. Configuration (3) is termed ‘MP3 Surround’ [120], a proprietary extension to the MPEG-1 Layer 3 (MP3) codec. This extension also employs parametric side information to retrieve multi-channel audio from a stereo down-mix, but is not compatible with MPEG Surround. Configuration (4) employs the Dolby Prologic II matrixed surround system (DPLII) for encoding and decoding in combination with stereo AAC at a bit rate of 160 kbps. Configuration (5) is AAC in multi-channel mode, representing state-of-the-art discrete channel coding.
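The bit-rate split behind configuration (1) is simple arithmetic: whatever the spatial parameters consume is taken from the fixed 160 kbps budget. The minimal sketch below makes that explicit. The helper names `core_rate_kbps` and `bits_per_frame` are hypothetical, and the per-frame figure assumes a 1024-sample AAC frame at 44.1 kHz (an MP3 core would use 1152-sample frames instead).

```python
# Total bit rate used in Test 1 (from the text).
TOTAL_KBPS = 160.0

def core_rate_kbps(total_kbps: float, param_kbps: float) -> float:
    """Bit rate left for the stereo core coder once spatial parameters are paid for."""
    if not 0.0 <= param_kbps <= total_kbps:
        raise ValueError("parameter rate must lie within the total budget")
    return total_kbps - param_kbps

def bits_per_frame(rate_kbps: float, sample_rate_hz: float = 44100.0,
                   frame_len: int = 1024) -> float:
    """Average bits available per core-coder frame of frame_len samples (assumption: AAC frame length)."""
    frames_per_second = sample_rate_hz / frame_len
    return rate_kbps * 1000.0 / frames_per_second

# Configuration (1): AAC core plus 32 kbps of MPEG Surround parameters.
core = core_rate_kbps(TOTAL_KBPS, 32.0)
print(core)                             # 128.0
print(round(bits_per_frame(core), 1))   # 2972.2 bits per 1024-sample frame
```

With no parameters (configuration (4)), `core_rate_kbps(160.0, 0.0)` leaves the full 160 kbps to the stereo core; configuration (5) instead spreads the same 160 kbps over all discrete channels.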

Table 6.2 Codecs under test.

For configurations (1), (4) and (5), state-of-the-art AAC encoders were used. For configurations (2) and (3), the encoder and decoder available from www.mp3surround.com (April 2006 version) were used. Dolby Prologic II encoding and decoding were performed using the Dolby-certified ‘Minnetonka Surcode for Dolby Prologic II’ package (version 2.0.3) with its default settings.

Eight listeners participated in this experiment. All listeners had significant experience in evaluating audio codecs and were specifically instructed to evaluate the overall quality, comprising the spatial audio quality as well as any other noticeable artifacts. In a double-blind MUSHRA test [148], the listeners rated the perceived quality of several processed excerpts against the original (i.e. unprocessed) excerpts on a 100-point scale with five verbal anchors labeled ‘bad’, ‘poor’, ‘fair’, ‘good’ and ‘excellent’. A hidden reference and a low-pass filtered anchor (cut-off frequency of 3.5 kHz) were also included in the test. The subjects could listen to each excerpt as often as they liked and could switch in real time between all versions of each excerpt.

The experiment was controlled from a PC with an RME Digi 96/24 sound card using its ADAT digital output. Digital-to-analog conversion was provided by an RME ADI-8 DS 8-channel D-to-A converter. Discrete pre-amplifiers (Array Obsydian A-1) and power amplifiers (Array Quartz M-1) were used to feed a 5.1 loudspeaker setup employing B&W Nautilus 800 speakers in a dedicated listening room according to ITU recommendations [147].
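The low-pass anchor is the reference excerpt band-limited to 3.5 kHz. The text does not say which filter was used, so the sketch below assumes a 255-tap Hamming-windowed sinc FIR at the 44.1 kHz sampling rate of the excerpts; `lowpass_fir`, `apply_fir` and `gain_at` are hypothetical helper names, not part of any test tool.

```python
import math

def lowpass_fir(cutoff_hz: float, fs_hz: float, num_taps: int = 255) -> list[float]:
    """Hamming-windowed sinc low-pass (assumed design); odd length gives linear phase."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / fs_hz                  # normalized cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc = sum(taps)
    return [t / dc for t in taps]           # normalize to unity gain at DC

def apply_fir(x: list[float], taps: list[float]) -> list[float]:
    """Causal direct-form convolution; output has the same length as the input."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j >= 0:
                acc += h * x[i - j]
        y.append(acc)
    return y

def gain_at(taps: list[float], freq_hz: float, fs_hz: float) -> float:
    """Magnitude response |H(f)| of the FIR at a single frequency."""
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)

taps = lowpass_fir(3500.0, 44100.0)
for f in (1000.0, 3500.0, 7000.0):
    print(f, round(gain_at(taps, f, 44100.0), 4))
```

The windowed-sinc design is chosen only because it is short and dependency-free; any filter with a 3.5 kHz cut-off and adequate stop-band attenuation would serve the same purpose as an anchor.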

A total of 11 critical excerpts were used as listed in Table 6.3. The excerpts are the same as used in the MPEG Call for Proposals (CfP) on Spatial Audio Coding [142], and range from pathological signals (designed to be critical for the technology at hand) to movie sound and multi-channel productions. All input and output excerpts were sampled at 44.1 kHz.

Results

The subjective results of each codec and excerpt are shown in Figure 6.19. The horizontal axis denotes the excerpt under test, the vertical axis the mean MUSHRA score averaged across listeners, and different symbols indicate different codecs. The error bars denote the 95% confidence intervals of the means.
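The text does not state how the confidence intervals were computed. A common choice for a small listening panel, and the assumption made in this sketch, is a Student-t interval around the mean of the per-listener scores; with eight listeners there are 7 degrees of freedom. The helper names and the example ratings are hypothetical, and `overall_mean` shows the grand mean over listeners and excerpts used for the overall scores.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, t_{0.975, df}, for small panels.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def mean_ci95(scores: list[float]) -> tuple[float, float]:
    """Mean MUSHRA score and half-width of its 95% confidence interval."""
    df = len(scores) - 1
    if df not in T_975:
        raise ValueError("table covers panels of 2 to 11 listeners only")
    mean = statistics.mean(scores)
    half_width = T_975[df] * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width

def overall_mean(scores: list[list[float]]) -> float:
    """Grand mean over all listeners and excerpts; scores[listener][excerpt]."""
    return statistics.mean(s for row in scores for s in row)

# Hypothetical ratings of eight listeners for one codec on one excerpt.
ratings = [60, 62, 58, 61, 59, 63, 57, 60]
m, h = mean_ci95(ratings)
print(f"{m:.1f} +/- {h:.1f}")  # 60.0 +/- 1.7
```

When every listener rates every excerpt, `overall_mean` equals the mean of the per-excerpt means, so either averaging order gives the same overall score.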

For all excerpts, the hidden reference (square symbols) has scores virtually equal to 100 with a very small confidence interval. The low-pass anchor (circles), on the other hand, consistently has the lowest scores, around 10–20. The scores for AAC multi-channel (rightward triangles) lie between 20 and 60 for the individual excerpts, and its average score is approximately 40. Stereo AAC in combination with Dolby Prologic II (leftward triangles) scores only slightly higher on average. For 10 of the 11 excerpts, the combination of stereo AAC and MPEG Surround (diamonds) has the highest scores.

Table 6.3 Test excerpts.

Figure 6.19 Mean subjective results averaged across listeners for each codec and excerpt. Error bars denote 95% confidence intervals. Reproduced by permission of the Audio Engineering Society, Inc, New York, USA.

The overall scores (averaged across subjects and excerpts) are given in Figure 6.20. AAC with MPEG Surround scores approximately 5 points higher than MP3 with MPEG Surround. MP3 Surround scores approximately 15 points lower than MP3 with MPEG Surround.

Figure 6.20 Overall mean subjective results for each codec. Reproduced by permission of the Audio Engineering Society, Inc, New York, USA.

Discussion

The results indicate the added value of parametric side information with a stereo transmission channel (configurations (1), (2) and (3) vs configurations (4) and (5)). The increase in quality for MPEG Surround compared with discrete multi-channel coding or matrixed surround methods amounts to more than 40 MUSHRA points (using AAC as core coder), which is a considerable improvement. All three parameter-enhanced codecs demonstrated such a clear benefit, enabling high-quality multi-channel audio transmission at bit rates that are currently used for high-quality stereo transmission. The choice of core coder appears to have only a limited effect, since the difference between AAC with MPEG Surround and MP3 with MPEG Surround is small (approximately 5 points). On the other hand, given the large difference in quality between configurations (2) and (3), which are based on the same core coder at virtually the same total bit rate, the two parametric enhancements (MPEG Surround and MP3 Surround, respectively) appear to differ significantly in terms of quality and compression efficiency: MPEG Surround delivers significantly higher quality while using only 69% of the parameter bit rate of MP3 Surround.

6.5.2 Test 2: operation using enhanced matrix mode

Stimuli and method

The configurations employed in the test are listed in Table 6.4.

All configurations employed stereo AAC at a bitrate of 160 kbps as the core coder. Configuration (1) serves as a high-quality anchor, employing MPEG Surround with 32 kbps of spatial parameter data. For configuration (2), no parameters were transmitted; the MPEG Surround encoder generated a matrixed surround compatible stereo down-mix (using the MTX conversion block), while the MPEG Surround decoder operated in enhanced matrix mode (EMM) as outlined in Section 6.4.3. Configuration (3) employed a Dolby Prologic II (DPLII) encoder to convert the multi-channel signal into matrixed surround compatible stereo, and a Dolby Prologic II decoder to resynthesize the multi-channel signals. The same AAC and Prologic encoders and decoders were employed as in the previous test, and the test procedure and reproduction setup were identical to those of the previous test. The 10 excerpts used are listed in Table 6.5; these items were also used in the MPEG Surround verification test, whose results are reported in [144].

Results

The MUSHRA scores averaged across excerpts and subjects are given in Figure 6.21. The square symbols denote the reference, which has a score of 100. Configuration (1), employing MPEG Surround with 32 kbps of parametric overhead, results in a very high subjective quality, with a score of over 90 (diamonds). If no parametric data are transmitted, as in configuration (2) (downward triangles), the average score drops to around 70. Dolby Prologic II (upward triangles), on the other hand, has a score of around 55. The low-pass filtered anchor (leftward triangles) has a score of around 23.

Table 6.4 Codecs under test.

Table 6.5 Test excerpts.

Figure 6.21 Overall mean subjective results for each codec.

Discussion

As in the previous test, the results indicate a clear benefit of transmitting spatial parameters. The additional transmission of 32 kbps of parametric data results in a significant increase in perceptual quality compared with the two systems under test that do not transmit any additional information.

Configurations (2) and (3) both employ a matrixed surround compatible down-mix to facilitate multi-channel reconstruction at the decoder side. Interestingly, although both configurations use very similar methods to convey and signal surround activity, the MPEG Surround system reconstructs a multi-channel signal that is significantly closer to the original than the Dolby Prologic II system does (around 70 vs around 55 MUSHRA points). Hence, even if the transmission of additional data is undesirable or impossible, MPEG Surround provides a very competitive multi-channel experience.
