In this chapter, we will look into the details of the mandatory LE Audio Codec – namely, LC3 (Low Complexity Communication Codec). The LC3 Codec is the main Codec used in LE Audio. Other optional or vendor-specific codecs may also be used. In this chapter, the LC3 Codec is reviewed from various perspectives. An algorithmic overview of the Codec is provided, and various aspects of the Codec features are outlined, such as compression quality, latency and complexity, the overall system delay, and bitrate.
Why a New Codec?
We already mentioned the need for audio codecs in general, and for Bluetooth audio in particular. A codec allows audio applications to compress digital audio content to a lower bitrate for transmission over the air. Compression is essential in Bluetooth audio because the radio can carry only a limited user bitrate. In LE Audio, compression is even more important, since it frees spectrum, enables low energy operation, and makes new use cases possible. As we discussed in the previous chapters, LE Audio enables multistream use cases, and lower bitrates make it possible to send several audio streams in a multidevice topology.
The mandatory codecs used by Classic Bluetooth provide a level of compression and quality that was considered good when they were introduced. The main Codec is SBC for music use cases, with its mSBC variant for voice use cases. Over time, SBC came to be regarded as providing only medium music listening quality, and optional and vendor-specific codecs began to emerge in order to provide better listening quality. Optional codecs were added to the Classic Audio music profiles, while vendor-specific codecs were used by proprietary applications. These optional and vendor-specific codecs required higher bitrates.
In recent years, considerable progress has been made in acoustic and compression technologies. The result is codecs that use better time-to-frequency transformations. These codecs have the potential to deliver lower bitrates and better quality than SBC. However, they are more complex in terms of the memory and the CPU or DSP cycles required for encoding and decoding. For example, in the cellular industry, there are advanced codecs which reside in cellular base station towers and in handset cellular devices (phones). The cellular codecs use state-of-the-art compression and decompression technology to provide excellent sound quality to mobile phone users.
In many cases, the codecs in the cellular technology use proprietary technology and require royalty payment. These codecs are also highly complex and require strong processing power and large amounts of memory. In the cellular world, phones today are actually a small PC and do have the processing power to run such codecs. The same goes for the base station towers which may employ high processing units for compression and decompression logic. This is true for cellular radios between base stations and smartphones.
In Bluetooth, however, there is another category of small devices such as hearing aids, small speakers, microphones, and earbuds. These small Bluetooth peripherals require the Codec to be less complex, so that the peripheral devices consume less power and run longer on a single battery charge.
For these reasons, the Bluetooth SIG set out to develop a new Codec that provides state-of-the-art compression and decompression, better listening quality, and low latency, yet has lower complexity than other advanced codecs in the same category. The Codec developed for this purpose is LC3, which stands for Low Complexity Communication Codec.
As shown in Figure 6-1, the audio source generates audio samples as a bit stream that is fed into the LC3 encoder. The source audio samples are uncompressed PCM samples. For example, a 48 kHz sampling rate generates 480 samples every 10 ms. Each sample may be a 16-bit signed number that represents the signal amplitude as it changes over time. The encoder outputs a compressed, lower-rate bit stream that is sent over the air to the remote device. The compression ratio of LC3 is about eight, which is better than that of the legacy SBC. The remote device is the sink of the audio and is used for playback. The receiving device receives the compressed bit stream from the LE Audio radio and passes it to the decoder. The decoder decodes the incoming bit stream to recover audio samples that are as close as possible to those originally sampled at the source device. At this point, the audio samples may be played on a speaker. In music use cases, one device acts as a source, and another device acts as a sink. In bidirectional communication such as voice or hybrid use cases, both devices act as a source and a sink and employ both the LC3 encoder and decoder.
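The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, 16-bit mono PCM is assumed, and the 96 kb/s coded rate used in the usage note is the 48 kHz setting that appears in the bitrate table later in this chapter.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of PCM samples captured in one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int = 16) -> float:
    """Bitrate of uncompressed mono PCM, in kb/s (16-bit samples assumed)."""
    return sample_rate_hz * bits_per_sample / 1000


def compression_ratio(sample_rate_hz: int, coded_kbps: float,
                      bits_per_sample: int = 16) -> float:
    """Ratio between the uncompressed PCM bitrate and the coded bitrate."""
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample) / coded_kbps
```

With these assumptions, `samples_per_frame(48000, 10)` gives the 480 samples mentioned above, the uncompressed stream is 768 kb/s, and coding it at 96 kb/s yields a compression ratio of eight.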
LC3 Compared with SBC/mSBC
LC3 to SBC comparison
Property | SBC/mSBC | LC3 |
---|---|---|
Applications | Music and voice | Music and voice |
Quality | Medium | High |
Latency | Low | Low |
Complexity | Low | Medium |
Bitrate | High | Low |
In the next few sections, we will review each category from Table 6-1 in more detail.
Listening Quality
As part of the LC3 Codec development, the Bluetooth SIG conducted subjective listening tests to assess the quality of the LC3 Codec, as well as objective measurements such as PEAQ (Perceptual Evaluation of Audio Quality) for music and POLQA (Perceptual Objective Listening Quality Analysis) for speech.
The results showed that, for voice, LC3 matches the quality of mSBC at half the bitrate. For music, the LC3 quality is always better than SBC, at about half the bitrate of the SBC high-quality settings.
In subjective listening tests, a group of expert listeners listens to various audio samples and scores the experience on a scale of 1 to 5. The listeners do not know whether the audio is encoded or which Codec is used, if any. An uncompressed reference file is hidden among the various files. The listeners score the reference file as well as the compressed files based on subjective listening only. In various listening tests done by expert listeners, LC3 consistently scored close to 5, while SBC scored closer to 4 at almost double the bitrate of LC3. A score close to 5 is defined as imperceptible, which means that it is almost impossible to tell the coded/decoded samples from the original soundtrack, while a score of 4 is defined as perceptible but not annoying, in which a difference in quality is perceived. A score close to 3 is defined as slightly annoying, with annoyance increasing when the score is 2 or 1. This five-grade impairment scale follows ITU-R BS.1116. A related hidden-reference procedure is MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor), which scores on a scale of 0 to 100.
Latency and Complexity
A Codec adds latency to the audio chain. In general, audio streaming or voice calls already contain latency elements regardless of Bluetooth audio. For example, listening to audio streaming over the Internet includes network delay. Another example is a voice call, in which the cellular network and codecs introduce delay. As mentioned earlier, the gateway source of audio may also be using codecs. In the case of Bluetooth, there is a transcoding operation in which audio is recovered from the network or cellular coding and then compressed again for transmission over the Bluetooth connection.
As Bluetooth is the last link to the speaker and microphone, it is essential to keep its latency as low as possible. The latency requirements may be relaxed in the case of music playback and are tighter in the case of voice or low latency media such as gaming or movie playback.
For the Bluetooth latency, we define the overall Wireless System Delay (WSD). The WSD consists of various latency components. It begins with the time it takes to capture an audio frame for encoding. It continues with the time it takes to encode the audio frame and then the time it takes to transport the audio frame over the air from the Bluetooth source device to the Bluetooth sink device. At the sink device, the LC3 decoding delay is added.
The encoding delay and the decoding delay are directly impacted by the Codec complexity for the given sampling rate and bitrate, that is, by the number of required operations. The encoding and decoding delays also depend on the implementation clock rate. Implementations with a faster clock rate complete the encoding or decoding in less time. Since in Bluetooth the LC3 Codec also resides in small devices with a slower clock rate, it is essential to optimize the LC3 complexity accordingly, so that delays remain reasonable in small devices as well.
An additional component is added to the WSD: the presentation delay. The presentation delay is always applied at the peripheral device. If the peripheral device is a wireless speaker, the presentation delay is added after the decoding and includes the decoding time along with other components such as DAC (Digital to Analog Conversion) delays and jitter buffers. If the peripheral device is a wireless microphone, the presentation delay is added before the encoding and includes the encoding time along with other components such as ADC (Analog to Digital Conversion) delays. The purpose of the presentation delay is to allow additional audio processing at the peripheral side. In a multidevice topology, the presentation delay is set to the same value for all peripheral devices: the lowest delay that every peripheral device can support. The agreed value is communicated to all peripheral devices by BAP. This allows all peripherals to synchronize the audio, so that audio is rendered at the same time on multiple speaker devices or acquired at the same time from multiple microphone devices. The presentation delay for microphones is also known as the acquisition delay, and the presentation delay for speakers is also known as the rendering delay. The additional presentation delay on top of the encoding or decoding time depends on the application. As an example, a presentation delay may be as low as 1 ms or as high as 60 ms. Each peripheral reports a supported range as minimum and maximum values, and BAP mandates that 40 ms be included in the range to ensure minimum interoperability.
As an example in a two-peripheral topology, if Peripheral A reports min=4 ms and max=50 ms and Peripheral B reports min=2 ms and max=45 ms, then the presentation delay for both Peripherals A and B will be configured to 4 ms, since this is the lowest presentation delay supported by both peripherals. Although Peripheral B supports a minimum of 2 ms, this delay is not supported by Peripheral A, so the BAP client selects 4 ms, which is the larger of the two minimums.
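The selection rule in this example can be sketched as follows. The function name and the `(min, max)` tuple layout are hypothetical and are not taken from BAP; the sketch simply picks the largest reported minimum and checks that it does not exceed the smallest reported maximum. A real client may choose any value inside the common range.

```python
from typing import Iterable, Tuple


def select_presentation_delay_ms(ranges: Iterable[Tuple[float, float]]) -> float:
    """Pick the lowest presentation delay that every peripheral supports.

    `ranges` holds one (min_ms, max_ms) pair per peripheral (assumed layout).
    """
    ranges = list(ranges)
    if not ranges:
        raise ValueError("at least one peripheral range is required")

    lowest_common = max(lo for lo, _ in ranges)    # largest of the minimums
    highest_common = min(hi for _, hi in ranges)   # smallest of the maximums

    if lowest_common > highest_common:
        raise ValueError("the peripherals have no presentation delay in common")
    return lowest_common
```

For the two peripherals above, `select_presentation_delay_ms([(4, 50), (2, 45)])` returns 4. The overlap check matters only for ranges that do not intersect, which the mandatory 40 ms value is meant to prevent.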
The LC3 Codec works on units of frames. The frame duration is a configuration value of LC3. There are two frame duration modes possible in LC3 for LE Audio: 7.5 ms and 10 ms. In each mode, a full frame is processed and encoded after capturing. In addition to the full frame delay, LC3 also has an inherent look ahead algorithmic delay. The encoder works on two frames at a time: a portion of the previous frame audio content is encoded in the current frame, and the tail of the current capture is saved and encoded in the next frame. This behavior comes from the LD-MDCT (Low Delay Modified Discrete Cosine Transform) used by LC3. The cosine transform serves as the filter that converts time domain samples to frequency domain samples, and the width of the cosine filter spans two frame durations. As a result of LD-MDCT, a look ahead delay of 4 ms is added to the 7.5 ms frame, and a look ahead delay of 2.5 ms is added to the 10 ms frame. The look ahead delay is algorithmic only and represents a delay in audio content, not actual processing time. This delay is added to the WSD since it adds to the end-to-end latency of the audio content.
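To put numbers on the look ahead, the short sketch below adds it to the frame duration and converts it to samples. The names are hypothetical, and the only inputs are the two look ahead values quoted above.

```python
# Look ahead per frame duration, in ms, as quoted in the text above.
LOOK_AHEAD_MS = {7.5: 4.0, 10.0: 2.5}


def algorithmic_delay_ms(frame_ms: float) -> float:
    """Frame capture time plus look ahead: the delay in audio content only."""
    return frame_ms + LOOK_AHEAD_MS[frame_ms]


def look_ahead_samples(sample_rate_hz: int, frame_ms: float) -> int:
    """Look ahead expressed as a number of samples at the given sampling rate."""
    return round(sample_rate_hz * LOOK_AHEAD_MS[frame_ms] / 1000)
```

Under these figures, the content delay is 11.5 ms for the 7.5 ms frame and 12.5 ms for the 10 ms frame, before any encoding, transport, or decoding time is counted. At 48 kHz, the look ahead corresponds to 192 and 120 samples, respectively.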
Example for LC3 WSD latency components
Delay Component Breakdown | LC3 7.5 [Total Latency: 16.2 ms + presentation delay] [ms] | LC3 10 [Total Latency: 18.4 ms + presentation delay] [ms] |
---|---|---|
Look ahead (end to end) | 4 | 2.5 |
Capturing | 7.5 | 10 |
Encoding (Plus acquisition for peripheral) | 2.2 | 2.8 |
LE transport | 1.1 | 1.3 |
Decoding (Plus rendering for peripheral) | 1.4 | 1.8 |
Additional presentation delay | 1 to 60 | 1 to 60 |
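The totals for this example can be reproduced by summing the rows of the table. The sketch below uses hypothetical names and the example values from the table only; they are not normative figures, and real encoding, transport, and decoding times depend on the implementation.

```python
# Example WSD components in ms, copied from the table above, keyed by frame duration.
WSD_COMPONENTS_MS = {
    7.5: {"look_ahead": 4.0, "capturing": 7.5, "encoding": 2.2,
          "le_transport": 1.1, "decoding": 1.4},
    10.0: {"look_ahead": 2.5, "capturing": 10.0, "encoding": 2.8,
           "le_transport": 1.3, "decoding": 1.8},
}


def wsd_ms(frame_ms: float, presentation_delay_ms: float = 0.0) -> float:
    """Sum the example delay components and add the extra presentation delay."""
    total = sum(WSD_COMPONENTS_MS[frame_ms].values()) + presentation_delay_ms
    return round(total, 1)  # the table values carry one decimal place
```

Summing the rows gives 16.2 ms for the 7.5 ms frame and 18.4 ms for the 10 ms frame. Adding an additional presentation delay of 40 ms, for instance, brings the 10 ms case to 58.4 ms.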
The example in Figure 6-3 shows a low latency application where audio is rendered as soon as it is available. In this case, the LE transport is configured to deliver packets within the current frame only. The figure shows that the total latency of frame N is roughly 2 times the LC3 frame size (capturing window). Some applications may allow longer latency for rendering. For example, music playback may allow longer latency in order to achieve higher reliability. In this case, the LE transport delay may be longer, since packets may be retransmitted over a longer period of time. We will get back to this aspect when discussing the link layer transport in Chapter 7.
Bitrate
LC3 spans a full range of sampling rates for voice and audio.
The configuration options differ along two main axes. The first is the content type: voice or music. The second is the quality tier: hearing aid quality or high quality.
The main configurations for hearing aid applications are sampling rates of 16 kHz for voice and 24 kHz for music. In the first chapter, we reviewed the hearing aid requirements for people with hearing loss and mentioned that tones above 11 kHz are not audible to hearing aid users. This is why the music sampling rate stops at 24 kHz (the sampling rate must be at least twice the highest tone frequency). Additional audio processing in the hearing aid replaces higher tones with tones under 11 kHz, which are covered by the 24 kHz sampling rate setting. The 16 kHz voice configuration provides wideband (WB) speech and may be used by hearing aid users as well as people with no hearing loss.
For high-quality voice applications, an additional configuration is available: 32 kHz, which provides super wideband (SWB) speech. With SWB, the voice quality gives an in-room experience, where during a voice call, the user hears the call as if the user were in the same room as the counterpart on the other end of the call. This configuration has already become popular in cellular phones, and Bluetooth may now provide the same level of user experience.
For high-quality music or hybrid voice and music, 48 kHz full band is available. 48 kHz covers the full range of human hearing, which is sensitive to tones of up to 20 kHz, as we saw in Chapter 1. LC3 is designed as a mono Codec, so the left and right stereo channels may be encoded separately. This allows sending the left and right streams to two separate speakers or earbuds.
LC3 sampling rates and bitrates
Application | Sampling Rate [kHz] | LC3 7.5 Bitrate/Payload [kb/s]/[bytes] | LC3 10 Bitrate/Payload [kb/s]/[bytes] |
---|---|---|---|
Hearing aid voice | 16 | 32/30 | 32/40 |
Hearing aid music | 24 | 48/45 | 48/60 |
High-quality voice | 32 | 64/60 | 64/80 |
High-quality music | 48 | 96/90 | 96/120 |
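The payload sizes in the table follow directly from the bitrate and the frame duration: the bits produced during one frame, divided by eight. A minimal sketch, with a hypothetical function name:

```python
def payload_bytes(bitrate_kbps: float, frame_ms: float) -> int:
    """Bytes of coded audio per frame for a given bitrate and frame duration."""
    bits_per_frame = bitrate_kbps * frame_ms  # kb/s multiplied by ms gives bits
    return round(bits_per_frame / 8)
```

For example, `payload_bytes(96, 10)` gives 120 bytes and `payload_bytes(32, 7.5)` gives 30 bytes, matching the table. Since LC3 is a mono Codec, a stereo stream carries one such payload per channel per frame.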
Summary
In this chapter, we looked into the LC3 Codec, which is the mandatory Codec in LE Audio. LC3 provides a full range of sampling rates at low bitrate, low latency, and low complexity and achieves excellent scores in listening tests.