© Himanshu Bhalla, Oren Haggai 2021
H. Bhalla, O. Haggai, Unraveling Bluetooth LE Audio, https://doi.org/10.1007/978-1-4842-6658-8_6

6. LC3 Codec

Himanshu Bhalla (1) and Oren Haggai (2)
(1) Bengaluru, India
(2) Kefar Sava, Israel

In this chapter, we look into the details of LC3 (Low Complexity Communication Codec), the mandatory Codec in LE Audio. LC3 is the main Codec used in LE Audio, although other optional or vendor-specific codecs may also be used. We review LC3 from several perspectives: an algorithmic overview is provided, and Codec features such as compression quality, latency and complexity, overall system delay, and bitrate are outlined.

Why a New Codec?

We already mentioned the need for audio codecs in general, and for Bluetooth audio in particular. Codecs allow audio applications to compress digital audio content to lower bitrates for transmission over the air. Compression is essential in Bluetooth audio because only low user bitrates can be sent over the radio. In LE Audio, compression is even more important in order to free more spectrum and enable low energy operation as well as new use cases. As we discussed in the previous chapters, LE Audio enables multistream use cases, and lower bitrates make it possible to send several audio streams in a multidevice topology.

The mandatory codecs used by Classic Bluetooth provided a level of compression and quality which was considered good at the time. The main Codec is SBC for music use cases, with its mSBC variant for voice use cases. SBC later came to be regarded as providing only medium music listening quality, and optional and vendor-specific codecs began to emerge in order to provide better listening quality. Optional codecs were added to Classic Audio music profiles, while vendor-specific codecs were used by proprietary applications. These optional and vendor-specific codecs required higher bitrates.

In recent years, a lot of progress was made in acoustic and compression technologies. The result was codecs which use better time-to-frequency transformations. These codecs had the potential to produce lower bitrates and better quality compared to SBC. The new codecs, however, were more complex in terms of the amount of memory and CPU or DSP cycles required for encoding and decoding. For example, in the cellular industry, there are advanced codecs which reside in cellular base station towers and in handset cellular devices (phones). The cellular codecs use state-of-the-art compression and decompression technology to provide excellent sound quality to mobile phone users.

In many cases, the codecs in cellular technology are proprietary and require royalty payments. These codecs are also highly complex and require strong processing power and large amounts of memory. In the cellular world, phones today are effectively small PCs and do have the processing power to run such codecs. The same goes for the base station towers, which may employ high-performance processing units for compression and decompression logic. This holds for the cellular radio link between base stations and smartphones.

In Bluetooth, however, there is another category of small devices such as hearing aids, small speakers, microphones, and earbuds. These small Bluetooth peripherals require the Codec to be less complex, so that the peripheral devices consume less power and run longer on a single battery charge.

For the preceding reasons, the Bluetooth SIG aimed to develop a new Codec which would provide state-of-the-art compression and decompression, better listening quality, and low latency, and yet have lower complexity than other advanced codecs in the same category. The Codec which was developed for this purpose is LC3, which indeed stands for Low Complexity Communication Codec.

Figure 6-1 shows the high-level flow of the LC3 Codec when used over the LE Audio stream.
Figure 6-1. LC3 Codec operation

As shown in Figure 6-1, the audio source generates audio samples as a bit stream which is fed into the LC3 encoder. The source audio samples are uncompressed PCM samples. For example, a 48 KHz sampling rate generates 480 samples every 10 ms. Each sample may be a 16-bit signed number, which represents the signal amplitude as it changes in time. The encoder outputs a compressed bit stream at a lower bitrate, which is sent over the air to the remote device. The compression ratio of LC3 is about eight, which is better than the legacy SBC compression. The remote device is the sink of the audio and is used for playback. The receiving device receives the compressed bit stream from the LE Audio radio and passes it into the decoder. The decoder decodes the incoming bit stream to recover audio samples which are as close as possible to those originally sampled at the source device. At this point, the audio samples may be played on a speaker. In music use cases, one device acts as a source, and another device acts as a sink. In bidirectional communication such as voice or hybrid use cases, both devices act as a source and a sink and employ both the LC3 encoder and decoder.
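The arithmetic behind these figures can be checked with a short sketch. The 96 kbps coded bitrate used below is an assumption taken from the 48 KHz entry of Table 6-3 later in this chapter, and the function names are illustrative only.

```python
def pcm_samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of PCM samples captured in one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int = 16) -> int:
    """Bitrate of one uncompressed PCM channel, in bits per second."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(pcm_bps: float, coded_bps: float) -> float:
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_bps / coded_bps


if __name__ == "__main__":
    print(pcm_samples_per_frame(48_000, 10))    # samples in a 10 ms frame
    print(pcm_samples_per_frame(48_000, 7.5))   # samples in a 7.5 ms frame
    # Assumed coded bitrate: 96 kbps (48 KHz row of Table 6-3).
    print(compression_ratio(pcm_bitrate_bps(48_000), 96_000))
```

With 16-bit samples at 48 KHz, a single PCM channel runs at 768 kbps, so a 96 kbps coded stream corresponds to the ratio of about eight quoted above.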

LC3 Compared with SBC/mSBC

LC3 provides better listening quality than SBC at a lower bitrate. Since LC3 uses the latest audio Codec technology, it is more complex than SBC; however, LC3 complexity is lower than that of other state-of-the-art codecs (e.g., OPUS) which provide the same audio quality. Complexity is measured by the number of operations required to compress or decompress the bit stream. Another aspect of LC3 is that it can operate with low latency compared with other codecs in the same category. Low latency operation in Bluetooth is essential since Bluetooth is the last link in the audio chain. Audio may have an end-to-end latency from the source to the destination, which may begin at cellular base stations or at an Internet audio streaming service. In particular, for voice applications such as cellular calls, the margin for Bluetooth audio latency is in the order of 20 ms, since it adds to the other latency components of the call. In hearing aid use cases, it is essential to keep latency low since the hearing aid device is often amplifying ambient sound from a nearby wireless microphone. LC3 meets the requirement to provide voice coding with latency in the order of 20 ms, and similar or slightly higher latency for music use cases when required. Table 6-1 summarizes the comparison between LC3 and SBC/mSBC.
Table 6-1. LC3 to SBC comparison

| Property | SBC/mSBC | LC3 |
| --- | --- | --- |
| Applications | Music and voice | Music and voice |
| Quality | Medium | High |
| Latency | Low | Low |
| Complexity | Low | Medium |
| Bitrate | High | Low |

In the next few sections, we will review each category from Table 6-1 in more detail.

Listening Quality

As part of the LC3 Codec development, the Bluetooth SIG conducted subjective listening tests to assess the quality of the LC3 Codec, as well as objective measurements such as PEAQ (Perceptual Evaluation of Audio Quality) for music and POLQA (Perceptual Objective Listening Quality Analysis) for speech.

The results showed that, for voice, LC3 delivers the same quality as mSBC at half the bitrate. For music, the LC3 quality is always better than SBC, at about half the bitrate of the SBC high-quality settings.

Figure 6-2 shows music quality comparison between the SBC high-quality setting and the LC3 music setting. As shown, the quality of LC3 is close to the original reference audio file when no compression is done. In the quality comparison, LC3 uses half the bitrate of SBC.
Figure 6-2. LC3 music quality

In subjective listening tests, a group of expert listeners listen to various audio samples and score the experience on a scale of 1 to 5. The listeners do not know whether the audio is encoded or which Codec is used, if any. An uncompressed reference file is hidden among the various files. The listeners score the reference file as well as the compressed files based on subjective listening only. In various listening tests done by expert listeners, LC3 consistently scored close to 5, while SBC scored closer to 4 at almost double the bitrate of LC3. A score close to 5 is defined as imperceptible, which means that it is almost impossible to tell the coded/decoded samples from the original soundtrack, while a score of 4 is defined as perceptible, in which a difference in quality is perceived. A score close to 3 is defined as slightly annoying, with annoyance increasing when the score is 2 or 1. This hidden-reference approach is shared with the procedure known as MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor).

Latency and Complexity

A Codec adds latency to the audio chain. In general, audio streaming or voice calls already contain latency elements regardless of Bluetooth audio. For example, listening to audio streaming over the Internet includes network delay. Another example is a voice call, in which the cellular network and codecs introduce delay. As mentioned earlier, the gateway source of audio may also be using codecs. In the case of Bluetooth, there is a transcoding operation in which audio is recovered from the network or cellular coding and then compressed again for transmission over the Bluetooth connection.

As Bluetooth is the last link to the speaker and microphone, it is essential to keep its latency as low as possible. The latency requirements may be relaxed in the case of music playback and tighter in the case of voice or low latency media such as gaming or movie playback.

For the Bluetooth latency, we define the overall Wireless System Delay (WSD). The WSD consists of various latency components. It begins with the time it takes to capture an audio frame for encoding. It continues with the time it takes to encode the audio frame and then the time it takes to transport the audio frame over the air from the Bluetooth source device to the Bluetooth sink device. At the sink device, the LC3 decoding delay is added.

The encoding delay and the decoding delay are directly impacted by the Codec complexity for the given sampling rate and bitrate. These delays are impacted by the number of required operations. The encoding and decoding delays also depend on the implementation clock rate. Implementations with a faster clock rate will complete the encoding or decoding in shorter delays. Since in Bluetooth the LC3 Codec also resides in small devices with a slower clock rate, it is essential to optimize LC3 complexity accordingly, so it would allow reasonable delays also in small devices.

There is an additional component which is added to WSD, called the presentation delay. The presentation delay is always placed at the peripheral device. If the peripheral device is a wireless speaker, the presentation delay is added after the decoding and includes the decoding time along with other components such as DAC (Digital to Analog Conversion) delays and jitter buffers. If the peripheral device is a wireless microphone, the presentation delay is added before the encoding and includes the encoding time along with other components such as ADC (Analog to Digital Conversion) delays. The purpose of the presentation delay is to allow additional audio processing at the peripheral side. In a multidevice topology, the presentation delay is set to the same value on all peripheral devices, chosen as the lowest delay that every peripheral can support. The agreed value is communicated to all peripheral devices by BAP. This allows all peripherals to synchronize the audio, such that audio is rendered at the same time on multiple speaker devices or acquired at the same time from multiple microphone devices. The presentation delay for microphones is also known as the acquisition delay, and the presentation delay for speakers is also known as the rendering delay. The additional presentation delay on top of the encoder or decoder depends on the application. As an example, a presentation delay may be as low as 1 ms or as high as 60 ms. Each peripheral reports a supported range as minimum and maximum values, and BAP mandates that 40 ms is included in the range to assure minimum interoperability.
As an example in a two-peripheral topology, if Peripheral A reports min=4 ms and max=50 ms and Peripheral B reports min=2 ms and max=45 ms, then the presentation delay for both Peripherals A and B will be configured to 4 ms, since this is the minimum presentation delay which is supported by both peripherals. Although Peripheral B supports a minimum of 2 ms, this delay is not supported by Peripheral A, so the BAP client selects 4 ms, which is the maximum of the two minimums.
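The selection rule in this example can be sketched in a few lines. This is a minimal illustration that assumes the client always picks the lowest commonly supported value, as in the example above; the function name and the tuple representation of the reported ranges are hypothetical and are not BAP API names.

```python
def select_presentation_delay_ms(ranges):
    """Pick one presentation delay for all peripherals.

    `ranges` is a list of (min_ms, max_ms) tuples, one per peripheral.
    The lowest value supported by every peripheral is the largest of the
    reported minimums, provided it does not exceed any reported maximum.
    """
    lowest_common = max(min_ms for min_ms, _ in ranges)
    highest_common = min(max_ms for _, max_ms in ranges)
    if lowest_common > highest_common:
        raise ValueError("no presentation delay is supported by all peripherals")
    return lowest_common


if __name__ == "__main__":
    peripheral_a = (4, 50)
    peripheral_b = (2, 45)
    print(select_presentation_delay_ms([peripheral_a, peripheral_b]))
```

Any value between the largest minimum and the smallest maximum would be supported by both peripherals; the sketch returns the lowest one because that matches the low latency choice made in the example.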

The LC3 Codec works on units of frames. The frame duration is a configuration value of LC3. There are two frame duration modes possible in LC3 for LE Audio: 7.5 ms and 10 ms. In each mode, a full frame is processed and encoded after capturing. In addition to a full frame delay, LC3 also generates an inherent look ahead algorithmic delay. The encoder works on two frames in order to compress the next frame. A portion of the previous frame audio content is encoded in the current frame. And the audio content tail of the current capture is saved and encoded in the next frame. This property of LC3 is called LD-MDCT (Low Delay Modified Discrete Cosine Transform). The cosine transform serves as the filter when transferring time domain samples to frequency domain samples. And the width of the cosine filter is over two frame durations. As a result of LD-MDCT, a look ahead delay of 4 ms is added to the 7.5 ms frame, and a look ahead delay of 2.5 ms is added to the 10 ms frame. The look ahead delay is algorithmic only and represents a delay in audio content, and not actual processing time. This delay is added to WSD since it adds to the end-to-end latency of the audio content.
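The two-frame overlap described above can be illustrated with a generic MDCT. The sketch below is not the LC3 transform: it assumes a plain symmetric sine window rather than the LC3 low delay window, so it does not reproduce the 4 ms and 2.5 ms look ahead values. It only shows that each transform reads two frames, emits one frame of coefficients, and that the decoder needs the overlapping tail of the neighboring block to recover the samples. All function names are illustrative.

```python
import math


def sine_window(n):
    """Symmetric sine window spanning two frames (2 * n samples)."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]


def mdct(block, window):
    """Transform 2 * n windowed time samples into n frequency coefficients."""
    n = len(block) // 2
    return [
        sum(window[i] * block[i]
            * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs, window):
    """Transform n coefficients back into 2 * n windowed, aliased samples."""
    n = len(coeffs)
    return [
        window[i] * (2.0 / n)
        * sum(coeffs[k]
              * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
              for k in range(n))
        for i in range(2 * n)
    ]


def encode_decode(samples, n):
    """Run overlapping blocks through MDCT and inverse MDCT with overlap-add."""
    window = sine_window(n)
    out = [0.0] * len(samples)
    for start in range(0, len(samples) - 2 * n + 1, n):
        # Each block covers two frames but advances by only one frame.
        coeffs = mdct(samples[start:start + 2 * n], window)
        for i, value in enumerate(imdct(coeffs, window)):
            out[start + i] += value  # overlap-add cancels the aliasing
    return out


if __name__ == "__main__":
    n = 8  # frame length in samples (kept small for readability)
    signal = [math.sin(0.3 * i) + 0.1 * i for i in range(5 * n)]
    decoded = encode_decode(signal, n)
    interior_error = max(abs(a - b) for a, b in
                         zip(signal[n:-n], decoded[n:-n]))
    print(interior_error)
```

Only the frames covered by two overlapping blocks are recovered; the first and last frame of the test signal have a single block each and remain aliased, which is why the comparison is limited to the interior samples.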

The various components of LC3 latency are shown in Table 6-2 for the two LC3 modes, 7.5 ms and 10 ms, with estimated example values for the main components. An actual implementation may have lower or higher values depending on the processing clock. Not shown in the table are elements for jitter, buffering, DAC, and ADC, which also depend on the actual implementation. The actual values for the encoder and the decoder will differ for different sampling rates. The encoding and decoding values in the table provide a ballpark figure for the LC3 Codec latency. The LE transport delay in this example assumes minimum transport latency, within a single transport interval, which is common in voice use cases; music use cases may use longer transport latency to accommodate more retry opportunities for higher reliability.
Table 6-2. Example for LC3 WSD latency components

| Delay component breakdown | LC3 7.5 [ms] (total latency: 16 ms + presentation delay) | LC3 10 [ms] (total latency: 18.4 ms + presentation delay) |
| --- | --- | --- |
| Look ahead (end to end) | 4 | 2.5 |
| Capturing | 7.5 | 10 |
| Encoding (plus acquisition for peripheral) | 2.2 | 2.8 |
| LE transport | 1.1 | 1.3 |
| Decoding (plus rendering for peripheral) | 1.4 | 1.8 |
| Additional presentation delay | 1 to 60 | 1 to 60 |
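The totals quoted in Table 6-2 can be reproduced by summing the example components. The sketch below uses only the values from the table; the dictionary layout and function name are illustrative. Note that the 7.5 ms column sums to 16.2 ms, which the table heading appears to round to 16 ms.

```python
# Example WSD components from Table 6-2, in milliseconds.
WSD_COMPONENTS_MS = {
    "LC3 7.5": {
        "look_ahead": 4.0,
        "capturing": 7.5,
        "encoding": 2.2,
        "le_transport": 1.1,
        "decoding": 1.4,
    },
    "LC3 10": {
        "look_ahead": 2.5,
        "capturing": 10.0,
        "encoding": 2.8,
        "le_transport": 1.3,
        "decoding": 1.8,
    },
}


def total_wsd_ms(mode: str, presentation_delay_ms: float = 0.0) -> float:
    """Sum the example delay components and add the presentation delay."""
    return sum(WSD_COMPONENTS_MS[mode].values()) + presentation_delay_ms


if __name__ == "__main__":
    for mode in WSD_COMPONENTS_MS:
        print(mode, round(total_wsd_ms(mode), 1))
```

Passing a presentation delay, for example `total_wsd_ms("LC3 10", 40)`, gives the end-to-end figure for a configuration that uses the 40 ms value which BAP requires every peripheral to support.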

Figure 6-3 shows the contribution of each delay component over a timeline. The LD-MDCT transformation filter is shown at the top. The LD-MDCT transformation is done at the encoder, and it uses samples from the previously captured frame (N-1). Overall, the encoder works on a window which is double the capturing window. The inverse LD-MDCT is shown at the bottom of the figure. The inverse LD-MDCT is used by the decoder, and it uses decoded information from the previous frame when decoding the current frame. Some of the decoded information (the tail) is saved for the next decoded frame (N+1). Because the encoder uses LD-MDCT and the decoder uses inverse LD-MDCT, both draw on samples from previous frames, which improves the recovery of the original audio content. This is the LC3 property which provides close-to-transparent audio recovery. The algorithm introduces the look ahead delay, so audio content is delayed by a partial frame, in addition to the capturing frame delay. Acquisition and rendering are delays which depend on the peripheral configuration for either a microphone or a speaker. In many cases, only one of these delay components is valid.
Figure 6-3. LC3 wireless system delay components

The example in Figure 6-3 shows a low latency application where audio is rendered as soon as it is available. In this case, the LE transport is configured to deliver packets within the current frame only. The figure shows how the total latency of frame N is roughly two times the LC3 frame size (capturing window). There are applications which may allow longer latency for rendering. For example, music playback may allow longer latency in order to achieve higher reliability. In this case, the LE transport delay may be longer, since packets may need retransmissions over a longer period of time. We will get back to this aspect when discussing the link layer transport in Chapter 7.

Bitrate

LC3 spans a full range of sampling rates for voice and audio.

Among the various configuration options, there are two main differentiating vectors. The first vector is voice versus music. The second vector is hearing aid quality versus high quality.

The main configurations for hearing aid applications are sampling rates of 16 KHz for voice and 24 KHz for music. In the first chapter, we reviewed the hearing aid requirements for people with hearing loss and mentioned that tones higher than 11 KHz are not audible to hearing aid users. This is why the music sampling rate stops at 24 KHz (the sampling rate is twice the highest tone content). Additional audio processing in the hearing aid takes care to replace higher tones with tones under 11 KHz, which are covered by the 24 KHz sampling rate setting. The voice quality is wideband speech (WB) and may be used by hearing aid users as well as people with no hearing loss.

For high-quality voice application, an additional configuration is added: 32 KHz which adds super wideband speech (SWB). With SWB, the voice quality adds an in-room experience, where during a voice call, the user may hear the call as if the user is within the same room as the counterpart on the other end of the call. This configuration has already become popular in cellular phones, and Bluetooth may now provide the same level of user experience.

For high-quality music or hybrid voice and music, 48 KHz full band is available. 48 KHz provides the full range of the human ear audibility which is sensitive to tones of up to 20 KHz, as we saw in Chapter 1. LC3 is designed as a mono Codec, so that left and right stereo may be encoded separately. This allows sending left and right streams to two separate speakers or earbuds.
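The relationship between sampling rate and audible tone content used in this section (the sampling rate is twice the highest tone) can be written out as a small check. The configuration names follow the applications discussed above; the helper names are illustrative.

```python
# Sampling rates of the main LC3 configurations, in KHz.
SAMPLING_RATE_KHZ = {
    "Hearing aid voice": 16,
    "Hearing aid music": 24,
    "High-quality voice": 32,
    "High-quality music": 48,
}


def max_tone_khz(sampling_rate_khz: float) -> float:
    """Highest tone which a given sampling rate can represent."""
    return sampling_rate_khz / 2


def covers_tone(sampling_rate_khz: float, tone_khz: float) -> bool:
    """True if a tone at `tone_khz` fits within the sampled bandwidth."""
    return tone_khz <= max_tone_khz(sampling_rate_khz)


if __name__ == "__main__":
    for name, rate in SAMPLING_RATE_KHZ.items():
        print(f"{name}: {rate} KHz sampling covers tones up to "
              f"{max_tone_khz(rate)} KHz")
```

The 24 KHz setting covers tones up to 12 KHz, which is enough for the 11 KHz limit of hearing aid users, and the 48 KHz setting covers the 20 KHz range of the human ear.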

Table 6-3 summarizes the bitrate and payload size which are required for each LC3 frame duration and application configuration. The table covers the most common configurations. Other sampling rates, such as 8 KHz for voice and 44.1 KHz for music, are supported by LC3 but are less commonly used. There are also different bitrates which are available in LC3. For example, 48 KHz may be used with a bitrate of 124 kbps instead of 96 kbps for a marginal increase in quality. Table 6-3 shows the configuration for a single channel of audio, such as mono, left side, or right side. Stereo requires twice the bitrate.
Table 6-3. LC3 bitrates and payload sizes

| Application | Sampling rate [KHz] | LC3 7.5 bitrate/payload [Kb/s]/[bytes] | LC3 10 bitrate/payload [Kb/s]/[bytes] |
| --- | --- | --- | --- |
| Hearing aid voice | 16 | 32/30 | 32/40 |
| Hearing aid music | 24 | 48/45 | 48/60 |
| High-quality voice | 32 | 64/60 | 64/80 |
| High-quality music | 48 | 96/90 | 96/120 |
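The payload sizes in Table 6-3 follow directly from the bitrate and the frame duration. The sketch below recomputes them; the function name is illustrative, and the 124 kbps case is the alternative 48 KHz bitrate mentioned in the text, with its payload size derived here by the same formula rather than taken from the table.

```python
def payload_bytes(bitrate_kbps: float, frame_ms: float) -> float:
    """Bytes needed to carry one coded frame of a single audio channel."""
    bits_per_frame = bitrate_kbps * frame_ms  # kbit/s times ms gives bits
    return bits_per_frame / 8


if __name__ == "__main__":
    for bitrate in (32, 48, 64, 96):
        print(bitrate, payload_bytes(bitrate, 7.5), payload_bytes(bitrate, 10))
    # Alternative 48 KHz bitrate mentioned in the text.
    print(124, payload_bytes(124, 10))
    # Stereo is carried as two mono channels, so the bitrate doubles.
    print("stereo kbps at 96 kbps per channel:", 2 * 96)
```

Because the bitrate is fixed per configuration, the shorter 7.5 ms frame simply carries three quarters of the bytes of the 10 ms frame.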

Summary

In this chapter, we looked into the LC3 Codec, which is the mandatory Codec in LE Audio. LC3 provides a full range of sampling rates at low bitrate, low latency, and low complexity and achieves excellent scores in listening tests.
