Audio compression

Audio compression deals with both speech and music signals, and it does so in a perceptual manner.

A perceptual audio coder does not attempt to reconstruct the input signal accurately after encoding and decoding. Rather, its goal is to ensure that the output signal sounds the same to the listener.

The primary psychoacoustic effect that a perceptual audio coder exploits is auditory masking: parts of a signal that the human ear cannot discern can be removed without any audible change.

Psychoacoustics

Psychoacoustics plays a key role by identifying the parts of the signal that are imperceptible to the human ear, which the coder can then remove.

The human ear hears frequencies between 20 Hz and 20 kHz, and the human voice typically produces sounds in the range of 400 Hz–4 kHz, with the ear most sensitive in the range of 2–4 kHz.

Vowels and bass occupy the low frequencies, while consonants occupy the high frequencies. The dynamic range of the human ear (i.e. the span from the quietest to the loudest sounds) is about 96 dB (a power ratio of around four thousand million to one), with the threshold of pain at around 115 dBA and permanent damage at around 130 dBA. The dBA (A-weighted decibel) is a unit of sound pressure level that is weighted to reflect the sensitivity of the ear at different frequencies.
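The "four thousand million to one" figure follows from the definition of the decibel as ten times the base-10 logarithm of a power ratio. The short sketch below checks that arithmetic; the function names are illustrative only and are not taken from the text.

```python
def db_to_power_ratio(level_db: float) -> float:
    """Convert a level difference in dB to a power (intensity) ratio."""
    return 10 ** (level_db / 10)


def db_to_amplitude_ratio(level_db: float) -> float:
    """Convert a level difference in dB to an amplitude (pressure) ratio."""
    return 10 ** (level_db / 20)


if __name__ == "__main__":
    # 96 dB is the dynamic range quoted in the text for the human ear.
    print(f"power ratio:     {db_to_power_ratio(96):.3g}")      # roughly 4 x 10^9
    print(f"amplitude ratio: {db_to_amplitude_ratio(96):.3g}")  # roughly 6.3 x 10^4
```

Expressed as a ratio of sound pressures rather than powers, the same 96 dB corresponds to roughly sixty-three thousand to one.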

It could be argued that audio does not need to be compressed, as it uses relatively little bandwidth compared to video.

CD audio, for instance, is not compressed, and has 44 100 samples per second (44.1 kHz sampling), with 16 bits per sample and 2 channels (stereo), which gives a data rate of 1.4 Mbps.
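The 1.4 Mbps figure is simply the product of the three parameters quoted above. A minimal sketch of the calculation, with a hypothetical helper name:

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    cd_rate = pcm_bit_rate(44_100, 16, 2)
    print(cd_rate)                      # 1411200 bits per second
    print(f"{cd_rate / 1e6:.2f} Mbps")  # 1.41 Mbps
```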

It can also be argued (and audiophiles argue it hotly!) that audio suffers when it is compressed. However, it is both desirable and possible to compress the audio signal, with no discernible detrimental effect, in order to reduce the overall data rate of the programme signal.

There are several techniques for the real-time digitization and compression of audio signals. Some have been defined as international standards, while others remain proprietary systems (e.g. Dolby AC-3).

Audio sampling and masking

The processes used in digital audio compression are in principle the same as in digital video compression.

In the psychoacoustic model, the human ear appears to be more discerning than the human eye, which only requires a relatively limited dynamic range from dark to light. Because the greater sensitivity of the human ear demands a higher resolution, audio is typically sampled with 16 bits per sample. The typical sampling frequencies for audio are 32, 44.1 or 48 kHz.
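One way to see why 16 bits is a natural choice: the ratio between the largest and smallest amplitude steps of an N-bit linear quantizer is 2^N, which is about 6 dB per bit. The sketch below computes this idealized figure, ignoring dither and noise shaping; the function name is an assumption made for illustration.

```python
import math


def linear_pcm_dynamic_range_db(bits_per_sample: int) -> float:
    """Idealized dynamic range of N-bit linear PCM: 20 * log10(2 ** N)."""
    return 20 * math.log10(2 ** bits_per_sample)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits:2d} bits -> {linear_pcm_dynamic_range_db(bits):.1f} dB")
```

For 16 bits this gives a little over 96 dB, in line with the dynamic range quoted earlier for the human ear.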

After quantization, the coded bit stream typically has a rate of between 32 and 384 kbps.
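To put these rates in context, they can be compared with the uncompressed CD rate quoted earlier. The sketch below assumes a stereo, 16-bit, 44.1 kHz source as the reference; a low-rate stream might in practice carry mono or lower-bandwidth audio, so the ratios are indicative only.

```python
# Reference: uncompressed CD audio (44.1 kHz, 16 bits, 2 channels).
CD_BIT_RATE = 44_100 * 16 * 2  # bits per second


def compression_ratio(source_bps: float, coded_bps: float) -> float:
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return source_bps / coded_bps


if __name__ == "__main__":
    for coded_kbps in (384, 128, 32):
        ratio = compression_ratio(CD_BIT_RATE, coded_kbps * 1000)
        print(f"{coded_kbps:3d} kbps -> about {ratio:.1f}:1")
```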

In order to achieve higher compression ratios and transmit a wider audio bandwidth, MPEG audio compression relies in particular on a technique known as ‘masking’.

Imagine a musical ensemble comprising several different instruments, all playing at the same time. The human ear is not capable of hearing all of the components of the sound, because some of the quieter sounds are hidden, or masked, by the louder sounds.

If a recording were made of the music and the parts that we could not hear were removed, we would still hear the same sound, but we would have recorded much less data. This is exactly how audio compression works: it removes the parts of the sound that we could not hear in any case.
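The idea can be illustrated with a deliberately simplified toy. It is not the psychoacoustic model used by MPEG audio: it treats the sound as a list of pure tones and drops any tone that sits close in frequency to a much louder one. The function name, the 200 Hz neighbourhood and the 20 dB margin are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Tone = Tuple[float, float]  # (frequency in Hz, level in dB)


def remove_masked(tones: List[Tone],
                  neighbourhood_hz: float = 200.0,   # illustrative value
                  margin_db: float = 20.0) -> List[Tone]:  # illustrative value
    """Keep only the tones that are not masked by a louder neighbour.

    A tone counts as masked if another tone lies within `neighbourhood_hz`
    of it and is at least `margin_db` louder. Real coders use a far more
    detailed, frequency-dependent masking threshold.
    """
    kept = []
    for freq, level in tones:
        masked = any(
            abs(freq - other_freq) <= neighbourhood_hz
            and other_level - level >= margin_db
            for other_freq, other_level in tones
        )
        if not masked:
            kept.append((freq, level))
    return kept


if __name__ == "__main__":
    tones = [(1000.0, 80.0), (1100.0, 50.0), (5000.0, 50.0)]
    # The 1100 Hz tone is 30 dB below its 1000 Hz neighbour, so it is dropped.
    # The 5000 Hz tone is just as quiet but has no loud neighbour, so it stays.
    print(remove_masked(tones))
```

Note that the quiet tone at 5000 Hz survives while the equally quiet tone at 1100 Hz does not: what matters is not how quiet a component is in absolute terms, but whether something louder nearby hides it.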
