Chapter 8
Dynamic Range Control

U. Zölzer and E. Gerat

The dynamic range of a signal is defined as the logarithmic ratio of maximum to minimum signal amplitude and is given in decibels. The dynamic range of an audio signal lies between 40 and 120 dB. Dynamic range control of audio signals is used in many applications to match the dynamic behavior of the audio signal to different requirements. While recording, dynamic range control protects the AD converter from overload or it is used in the signal path to optimally use the full amplitude range of a recording system. For suppressing low‐level noise, so‐called noise gates are used so that the audio signal is passed through only from a certain level onward. While reproducing music and speech in a car, a shopping center, restaurant, or inside a disco, the dynamics have to match the special noise characteristic of the environment. Therefore the signal level is measured from the audio signal and a control signal is derived which then changes the signal level to control the loudness of the audio signal. This loudness control is adaptive to the input level. The combination of level measurement and adaptive signal level adjustment is called dynamic range control.

8.1 Basics

Figure 8.1 shows a block diagram of a system for dynamic range control. After measuring the input level $X_{\mathrm{dB}}(n)$, the output level $Y_{\mathrm{dB}}(n)$ is affected by multiplying the delayed input signal $x(n)$ by a factor $g(n)$ according to

$$y(n) = g(n) \cdot x(n - D). \tag{8.1}$$

The delay of the signal $x(n)$ compared with the control signal $g(n)$ allows predictive control of the output signal level. This multiplicative weighting is carried out with corresponding attack and release time. Multiplication leads, in terms of a logarithmic level representation of the corresponding signals, to the addition of the weighting level $G_{\mathrm{dB}}(n)$ to the input level $X_{\mathrm{dB}}(n)$ giving the output level

Figure 8.1 System for dynamic range control.

$$Y_{\mathrm{dB}}(n) = X_{\mathrm{dB}}(n) + G_{\mathrm{dB}}(n). \tag{8.2}$$

8.2 Static Curve

The relationship between input level and weighting level is defined by a static level curve $G_{\mathrm{dB}}(n) = f(X_{\mathrm{dB}}(n))$. An example of such a static curve is given in Fig. 8.2. Here, the output level and the weighting level are given as functions of the input level.

Figure 8.2 Static curve with the parameters: LT = limiter threshold; CT = compressor threshold; ET = expander threshold; and NT = noise gate threshold.

With the help of a limiter, the output level is limited when the input level exceeds the limiter threshold (LT). All input levels above this threshold lead to a constant output level. The compressor maps a change of input level on a certain smaller change of output level. In contrast to a limiter, the compressor increases the loudness of the audio signal. The expander increases changes in the input level to larger changes in the output level. With this, an increase of the dynamics for low levels is achieved. The noise gate is used to suppress low‐level signals, for noise reduction, and is also used for sound effects like truncating the decay of room reverberation. Every threshold used in particular parts of the static curve is defined as the lower limit for the limiter and compressor and upper limit for the expander and noise gate.

In the logarithmic representation of the static curve, the compression factor $R$ (ratio) is defined by the ratio of the input level change $\Delta P_I$ to the output level change $\Delta P_O$, as given by

$$R = \frac{\Delta P_I}{\Delta P_O}. \tag{8.3}$$

Figure 8.3 Compressor curve (compressor ratio CR/compressor slope CS).

With the help of Fig. 8.3, the straight line equation $Y_{\mathrm{dB}}(n) = \mathrm{CT} + \frac{1}{R}\left(X_{\mathrm{dB}}(n) - \mathrm{CT}\right)$ and the compression factor

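$$R = \frac{X_{\mathrm{dB}}(n) - \mathrm{CT}}{Y_{\mathrm{dB}}(n) - \mathrm{CT}} = \tan\beta \tag{8.4}$$

are obtained, where the angle $\beta$ is defined as shown in Fig. 8.2. The relationship between the ratio $R$ and the slope $S$ can also be derived from Fig. 8.3 and is expressed as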

$$S = 1 - \frac{1}{R}, \tag{8.5}$$

$$R = \frac{1}{1 - S}. \tag{8.6}$$

Typical compression factors are

$$\begin{array}{ll}
R = \infty & \text{limiter};\\
R > 1 & \text{compressor (CR: compressor ratio)};\\
0 < R < 1 & \text{expander (ER: expander ratio)};\\
R = 0 & \text{noise gate}.
\end{array} \tag{8.7}$$

The transition from logarithmic to linear representation leads, from Eq. (8.4), to

$$R = \frac{\log_{10}\left(\hat{x}(n)/c_T\right)}{\log_{10}\left(\hat{y}(n)/c_T\right)}, \tag{8.8}$$

where $\hat{x}(n)$ and $\hat{y}(n)$ are the linear levels and $c_T$ denotes the linear compressor threshold. Rewriting Eq. (8.8) gives the linear output level

$$\begin{aligned}
\frac{\hat{y}(n)}{c_T} &= 10^{\frac{1}{R}\log_{10}\left(\frac{\hat{x}(n)}{c_T}\right)} = \left(\frac{\hat{x}(n)}{c_T}\right)^{\frac{1}{R}}, \\
\hat{y}(n) &= c_T^{\,1-\frac{1}{R}} \cdot \hat{x}^{\frac{1}{R}}(n),
\end{aligned} \tag{8.9}$$

as a function of input level. The control factor $g(n)$ can be calculated by the quotient

$$\begin{aligned}
g(n) &= \frac{\hat{y}(n)}{\hat{x}(n)} \\
&= \left(\frac{\hat{x}(n)}{c_T}\right)^{\frac{1}{R}-1}.
\end{aligned} \tag{8.10}$$
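As a quick numerical check of Eq. (8.10), the following sketch computes the control factor once directly in the linear domain and once through the straight line equation in dB. The function names and the example values are illustrative assumptions, not part of the original text, and the formula is only meant for input levels above the compressor threshold.

```python
import math


def control_factor(x_lin, c_t, ratio):
    """Linear control factor g = (x / c_T)**(1/R - 1), Eq. (8.10)."""
    return (x_lin / c_t) ** (1.0 / ratio - 1.0)


def control_factor_via_db(x_lin, c_t, ratio):
    """Same factor, obtained from the straight line equation in dB."""
    x_db = 20.0 * math.log10(x_lin)
    ct_db = 20.0 * math.log10(c_t)
    y_db = ct_db + (x_db - ct_db) / ratio   # Y_dB = CT + (X_dB - CT) / R
    return 10.0 ** ((y_db - x_db) / 20.0)   # G_dB = Y_dB - X_dB, then antilog


if __name__ == "__main__":
    # Assumed example: full-scale input, threshold at 0.1 (-20 dB), R = 4.
    print(control_factor(1.0, 0.1, 4.0))
    print(control_factor_via_db(1.0, 0.1, 4.0))
```

Both routes give the same factor up to rounding, which is why a table-based linear implementation and a log/antilog implementation are interchangeable in principle.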

With the help of tables and interpolation methods, it is possible to determine the control factor without taking logarithms and antilogarithms. The implementation described as follows, however, makes use of the logarithm of the input level and calculates the control level $G_{\mathrm{dB}}(n)$ with the help of the straight line equation. The antilogarithm leads to the value $f(n)$, which gives the control factor $g(n)$ with corresponding attack and release times (see Fig. 8.1).

To further smooth the transition between the compressed (or expanded) and uncompressed portions of the static curve at the threshold level, the width $W$ of the knee (angle at the threshold) can be adjusted. A width of zero gives a hard knee, while a larger width gives what is called a soft knee. The knee consists of a range centered on the threshold level with a variable width, as shown in Fig. 8.4. This allows a more transparent dynamic range control operation. The knee portion of the static curve is replaced by a piecewise continuous function

$$Y_{\mathrm{dB}} = \begin{cases}
X_{\mathrm{dB}}, & 2\,(X_{\mathrm{dB}} - \mathrm{CT}) < -W, \\[4pt]
X_{\mathrm{dB}} + \dfrac{1 - R}{2WR}\left(X_{\mathrm{dB}} - \mathrm{CT} + \dfrac{W}{2}\right)^{2}, & 2\,|X_{\mathrm{dB}} - \mathrm{CT}| \le W, \\[4pt]
\mathrm{CT} + R^{-1}\,(X_{\mathrm{dB}} - \mathrm{CT}), & 2\,(X_{\mathrm{dB}} - \mathrm{CT}) > W.
\end{cases} \tag{8.11}$$
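The piecewise curve of Eq. (8.11) can be sketched in a few lines. The function names, default parameter values, and the handling of `width = 0` (hard knee, where the quadratic branch is skipped to avoid a division by zero) are assumptions made for illustration only.

```python
def compressor_static_curve(x_db, ct=-20.0, ratio=4.0, width=10.0):
    """Output level Y_dB for an input level X_dB, following Eq. (8.11)."""
    d = x_db - ct
    if 2.0 * d < -width:
        return x_db                                   # below the knee: unchanged
    if width > 0.0 and 2.0 * abs(d) <= width:
        # inside the knee: quadratic transition between the two straight lines
        coeff = (1.0 - ratio) / (2.0 * width * ratio)
        return x_db + coeff * (d + width / 2.0) ** 2
    return ct + d / ratio                             # above the knee: slope 1/R


def gain_db(x_db, makeup_db=0.0, **curve_params):
    """Weighting level G_dB = Y_dB - X_dB plus an optional make-up gain."""
    return compressor_static_curve(x_db, **curve_params) - x_db + makeup_db


if __name__ == "__main__":
    for level in (-40.0, -25.0, -20.0, -15.0, 0.0):
        print(level, compressor_static_curve(level), gain_db(level, makeup_db=5.0))
```

At both knee edges the quadratic branch meets the neighboring straight lines, so the curve stays continuous for any positive width.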

The make‐up gain is an additional gain used to compensate the level loss produced by the compression operation. It allows the generation of a sound perceived to be louder without the distortions induced by the clipping of the audio signal.

Figure 8.4 Static curve and gain mapping curve of a compressor with soft and hard knees and make‐up gain.

8.3 Dynamic Behavior

In addition to the static curve of dynamic range control, the dynamic behavior in terms of attack and release times plays a significant role in sound quality. The rapidity of dynamic range control depends also on the measurement of PEAK and RMS values [McN84, Sti86].

8.3.1 Level Measurement

Level measurements [McN84] can be performed with the systems shown in Figs. 8.5 and 8.9. For PEAK measurement, the absolute value of the input is compared with the peak value $x_{\mathrm{PEAK}}(n)$. If the absolute value is greater than the peak value, the difference is weighted with the coefficient AT (attack time) and added to $(1 - \mathrm{AT}) \cdot x_{\mathrm{PEAK}}(n-1)$. For this attack case $|x(n)| > x_{\mathrm{PEAK}}(n-1)$, we get the difference equation (see Fig. 8.5)

$$x_{\mathrm{PEAK}}(n) = (1 - \mathrm{AT}) \cdot x_{\mathrm{PEAK}}(n-1) + \mathrm{AT} \cdot |x(n)| \tag{8.12}$$

with the transfer function

$$H(z) = \frac{\mathrm{AT}}{1 - (1 - \mathrm{AT})\,z^{-1}}. \tag{8.13}$$

Figure 8.5 PEAK measurement.

If the absolute value of the input is smaller than the peak value, $|x(n)| \le x_{\mathrm{PEAK}}(n-1)$ (the release case), the new peak value is given by

$$x_{\mathrm{PEAK}}(n) = (1 - \mathrm{RT}) \cdot x_{\mathrm{PEAK}}(n-1) \tag{8.14}$$

with the release time coefficient RT. The difference signal of the input will be muted by the nonlinearity such that the difference equation for the peak value is given according to Eq. (8.14). For the release case, the transfer function

$$H(z) = \frac{1}{1 - (1 - \mathrm{RT})\,z^{-1}} \tag{8.15}$$

is valid. For the attack case, the transfer function (8.13) with coefficient AT is used, and for the release case, the transfer function (8.15) with the coefficient RT is used. The coefficients (see Section 8.3.3) are given by

$$\mathrm{AT} = 1 - \exp\left(\frac{-2.2\,T_S}{t_a/1000}\right), \tag{8.16}$$

$$\mathrm{RT} = 1 - \exp\left(\frac{-2.2\,T_S}{t_r/1000}\right), \tag{8.17}$$

where the attack time $t_a$ and the release time $t_r$ are given in msec ($T_S$ is the sampling interval). With this switching between filter structures, one achieves fast attack responses for increasing input signal amplitudes and slow decay responses for decreasing input signal amplitudes.

A first variation for peak detection is the branched peak detector, which has two different branches of operation, as shown in Fig. 8.6. For the attack case AC, when $|x(n)| > x_{\mathrm{PEAK}}(n-1)$, and the release case RC, when $|x(n)| \le x_{\mathrm{PEAK}}(n-1)$, the operations are given by

$$x_{\mathrm{PEAK}}(n) = \begin{cases}
(1 - \mathrm{AT}) \cdot x_{\mathrm{PEAK}}(n-1) + \mathrm{AT} \cdot |x(n)|, & \text{for AC}, \\
(1 - \mathrm{RT}) \cdot x_{\mathrm{PEAK}}(n-1), & \text{for RC}.
\end{cases} \tag{8.18}$$

Figure 8.6 Block diagrams of the attack and release parts of the branched peak level detector.
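A minimal sketch of the branched peak detector of Eq. (8.18) follows. The function name, the zero initial state, and the example coefficients are assumptions for illustration; in practice AT and RT would be derived from the attack and release times as discussed in Section 8.3.3.

```python
def branched_peak(x, at, rt):
    """Branched peak level detector, Eq. (8.18).

    x  : iterable of input samples
    at : attack coefficient AT  (0 < AT <= 1)
    rt : release coefficient RT (0 < RT <= 1)
    """
    peak = 0.0                      # assumed initial state x_PEAK(-1) = 0
    out = []
    for sample in x:
        a = abs(sample)
        if a > peak:                # attack case (AC)
            peak = (1.0 - at) * peak + at * a
        else:                       # release case (RC)
            peak = (1.0 - rt) * peak
        out.append(peak)
    return out


if __name__ == "__main__":
    burst = [1.0] * 3 + [0.0] * 2
    print(branched_peak(burst, at=0.5, rt=0.1))
```

With a large AT and a small RT, the detector follows rising amplitudes quickly and decays slowly once the input falls below the stored peak.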

One further variation of the peak detector is the so‐called decoupled peak level detector, which is presented in Fig. 8.7. It computes an auxiliary signal $x_1(n)$ based on the release time and then feeds this signal to an attack filter, as given by

$$x_1(n) = \max\bigl(|x(n)|,\ (1 - \mathrm{RT}) \cdot x_1(n-1) + \mathrm{RT} \cdot |x(n)|\bigr), \tag{8.19}$$

$$x_{\mathrm{PEAK}}(n) = (1 - \mathrm{AT}) \cdot x_{\mathrm{PEAK}}(n-1) + \mathrm{AT} \cdot x_1(n). \tag{8.20}$$

This detector adds a small onset to the release time, as shown in Fig. 8.8. The resulting release time is approximately $t_{\mathrm{AT}} + t_{\mathrm{RT}}$ and is therefore always longer than the attack time. This can be useful to avoid artifacts arising from release times that are too short [Gia12].
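The two-stage structure of Eqs. (8.19) and (8.20) can be sketched as follows. The function name, the zero initial states, and the example coefficients are illustrative assumptions.

```python
def decoupled_peak(x, at, rt):
    """Decoupled peak level detector, Eqs. (8.19) and (8.20)."""
    x1 = 0.0        # auxiliary release signal x_1(n), assumed to start at 0
    peak = 0.0      # detector output x_PEAK(n), assumed to start at 0
    out = []
    for sample in x:
        a = abs(sample)
        # Eq. (8.19): instantaneous rise, smoothed decay with coefficient RT
        x1 = max(a, (1.0 - rt) * x1 + rt * a)
        # Eq. (8.20): attack filter applied to the auxiliary signal
        peak = (1.0 - at) * peak + at * x1
        out.append(peak)
    return out


if __name__ == "__main__":
    print(decoupled_peak([1.0, 0.0, 0.0, 0.0], at=0.5, rt=0.5))
```

Because the attack filter keeps running during the release phase, the output decays more slowly than the auxiliary signal alone, which matches the longer effective release time described above.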

Figure 8.7 Block diagram of the decoupled peak level detector.

Figure 8.8 Attack and release behavior of branching and decoupled peak level detectors.

The computation of the RMS value

$$x_{\mathrm{RMS}}(n) = \sqrt{\frac{1}{N} \sum_{i=0}^{N-1} x^{2}(n-i)} \tag{8.21}$$

over $N$ input samples can be achieved by a recursive formulation. The RMS measurement shown in Fig. 8.9 uses the square of the input and performs averaging with a first‐order lowpass filter. The averaging coefficient

$$\mathrm{TAV} = 1 - \exp\left(\frac{-2.2\,T_S}{t_M/1000}\right) \tag{8.22}$$

is determined according to the time constant calculation discussed in Section 8.3.3, where $t_M$ is the averaging time in msec. The difference equation is given by

$$x_{\mathrm{RMS}}^{2}(n) = (1 - \mathrm{TAV}) \cdot x_{\mathrm{RMS}}^{2}(n-1) + \mathrm{TAV} \cdot x^{2}(n) \tag{8.23}$$

with the transfer function

$$H(z) = \frac{\mathrm{TAV}}{1 - (1 - \mathrm{TAV})\,z^{-1}}. \tag{8.24}$$

Figure 8.9 RMS measurement (TAV = averaging coefficient).
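The recursive RMS measurement of Eq. (8.23) reduces to a one-pole average of the squared input. The sketch below returns the square root of that running mean square for readability; the function name, the zero initial state, and the example coefficient are assumptions.

```python
import math


def rms_level(x, tav):
    """Recursive RMS measurement based on Eq. (8.23).

    The filter state holds the squared RMS value; the square root is taken
    only for the returned sequence.
    """
    mean_square = 0.0               # assumed initial state
    out = []
    for sample in x:
        mean_square = (1.0 - tav) * mean_square + tav * sample * sample
        out.append(math.sqrt(mean_square))
    return out


if __name__ == "__main__":
    levels = rms_level([0.5] * 200, tav=0.1)
    print(levels[0], levels[-1])
```

A downstream log-domain stage can also work directly on the squared value and halve the logarithm, which avoids the square root altogether.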

8.3.2 Gain Factor Smoothing

Attack and release times can be implemented by the system shown in Fig. 8.10 [McN84]. The attack coefficient AT or release coefficient RT is obtained by comparing the input control factor with the previous one. A small hysteresis curve determines whether the control factor is in the attack or release status and hence gives the coefficient AT or RT. The system also serves to smooth the control signal. The difference equation is given by

$$g(n) = (1 - k) \cdot g(n-1) + k \cdot f(n), \tag{8.25}$$

with $k = \mathrm{AT}$ or $k = \mathrm{RT}$, and the corresponding transfer function leads to

$$H(z) = \frac{k}{1 - (1 - k)\,z^{-1}}. \tag{8.26}$$

Figure 8.10 Implementing attack and release times or gain factor smoothing.
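A simplified sketch of the smoothing filter of Eq. (8.25) is given below. It assumes that a falling control factor means attack and a rising one means release, and it omits the small hysteresis mentioned in the text; the function name, the initial value `g0`, and the example coefficients are likewise assumptions.

```python
def smooth_gain(f, at, rt, g0=1.0):
    """Smooth the raw control factor f(n) into g(n), Eq. (8.25).

    Assumption: gain reduction (f below the previous g) selects k = AT,
    anything else selects k = RT. No hysteresis is applied here.
    """
    g = g0
    out = []
    for target in f:
        k = at if target < g else rt
        g = (1.0 - k) * g + k * target
        out.append(g)
    return out


if __name__ == "__main__":
    print(smooth_gain([0.5, 0.5, 1.0, 1.0], at=0.5, rt=0.1))
```

A hysteresis band around the previous value would keep the filter from toggling between AT and RT when `f(n)` fluctuates slightly around `g(n-1)`.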

8.3.3 Time Constants

If the step response of a continuous‐time system is

$$g(t) = 1 - e^{-t/\tau}, \qquad \tau = \text{time constant}, \tag{8.27}$$

then sampling (step‐invariant transform) the step response gives the discrete‐time step response

$$g(nT_S) = \varepsilon(nT_S) - e^{-nT_S/\tau} = 1 - z_\infty^{\,n} \quad \text{with} \quad z_\infty = e^{-T_S/\tau}. \tag{8.28}$$

The Z‐transform leads to

$$\begin{aligned}
G(z) &= \frac{z}{z - 1} - \frac{1}{1 - z_\infty z^{-1}} \\
&= \frac{1 - z_\infty}{(z - 1)(1 - z_\infty z^{-1})}.
\end{aligned} \tag{8.29}$$

With the definition of attack time $t_a = t_{90} - t_{10}$, we derive

$$0.1 = 1 - e^{-t_{10}/\tau} \;\leftarrow\; t_{10} = 0.1\,\tau, \tag{8.30}$$

$$0.9 = 1 - e^{-t_{90}/\tau} \;\leftarrow\; t_{90} = 2.3\,\tau. \tag{8.31}$$

The relationship between attack time $t_a$ and the time constant $\tau$ of the step response is obtained as follows:

$$\begin{aligned}
0.9/0.1 &= e^{(t_{90} - t_{10})/\tau} \\
\ln(0.9/0.1) &= (t_{90} - t_{10})/\tau \\
t_a &= t_{90} - t_{10} = 2.2\,\tau.
\end{aligned} \tag{8.32}$$

Hence, the pole is calculated as

$$z_\infty = e^{-2.2\,T_S/t_a}. \tag{8.33}$$

A system for implementing the given step response is obtained by the relationship between the Z‐transform of the impulse response and the Z‐transform of the step response:

$$H(z) = \frac{z - 1}{z}\,G(z). \tag{8.34}$$

The transfer function can now be written as

$$H(z) = \frac{(1 - z_\infty)\,z^{-1}}{1 - z_\infty z^{-1}} \tag{8.35}$$

with the pole $z_\infty = e^{-2.2\,T_S/t_a}$ adjusting the attack, release, or averaging time. For the coefficients of the corresponding time constant filters, the attack case is given by Eq. (8.16), the release case by Eq. (8.17), and the averaging case by Eq. (8.22). Figure 8.11 shows an example where the dotted lines mark the $t_{10}$ and $t_{90}$ times.

Figure 8.11 Attack and release behavior for time constant filters.
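The relation between a time given in msec and the filter coefficient can be checked numerically. The sketch below assumes the coefficient is $1 - z_\infty$ with the pole of Eq. (8.33), simulates the step response of the first-order smoothing filter, and measures the 10 % to 90 % rise time. The function names and the example values (10 ms at 48 kHz) are illustrative assumptions.

```python
import math


def time_coefficient(t_ms, fs):
    """Coefficient 1 - z_inf with z_inf = exp(-2.2 * T_S / t), t given in ms."""
    t_s = 1.0 / fs                                  # sampling interval T_S
    return 1.0 - math.exp(-2.2 * t_s / (t_ms / 1000.0))


def rise_time_ms(t_ms, fs, n_samples):
    """Measure t90 - t10 of the unit step response g(n) = (1-k) g(n-1) + k."""
    k = time_coefficient(t_ms, fs)
    g = 0.0
    n10 = None
    n90 = None
    for n in range(n_samples):
        g = (1.0 - k) * g + k * 1.0
        if n10 is None and g >= 0.1:
            n10 = n
        if g >= 0.9:
            n90 = n
            break
    return (n90 - n10) * 1000.0 / fs


if __name__ == "__main__":
    print(time_coefficient(10.0, 48000))
    print(rise_time_ms(10.0, 48000, 48000))
```

The measured rise time comes out close to the requested attack time, which is the purpose of the factor 2.2 in the exponent.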

8.4 Implementation

The programming of a system for dynamic range control is described in the following sections.

8.4.1 Limiter

The block diagram of a limiter is presented in Fig. 8.12. The signal $x_{\mathrm{PEAK}}(n)$ is determined from the input with variable attack and release times. The logarithm to the base 2 of this peak signal is taken and compared with the limiter threshold. If the signal is above the threshold, the difference is multiplied by the negative slope of the limiter LS. After this, the antilogarithm of the result is taken. The obtained control factor $f(n)$ is then smoothed with a first‐order lowpass filter (SMOOTH). If the signal $x_{\mathrm{PEAK}}(n)$ lies below the limiter threshold, the signal $f(n)$ is set to $f(n) = 1$. The delayed input $x(n - D_1)$ is multiplied by the smoothed control factor $g(n)$ to give the output $y(n)$.

Figure 8.12 Limiter.
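The processing chain described for the limiter can be sketched as one loop. This is a simplified reading of the block diagram, not a reproduction of it: the branched peak detector, the single smoothing coefficient, the linear threshold parameter `lt`, and all default values are assumptions chosen for illustration.

```python
import math


def limiter(x, lt=0.5, ls=1.0, at=0.3, rt=0.01, smooth=0.2, delay=2):
    """Sketch of a limiter: PEAK -> ld -> threshold/slope -> 2**x -> SMOOTH.

    lt     : linear limiter threshold (assumed parameterization)
    ls     : limiter slope LS (1.0 corresponds to R = infinity)
    delay  : delay D_1 of the direct path in samples
    """
    ld_lt = math.log2(lt)
    peak = 0.0
    g = 1.0
    delay_line = [0.0] * delay
    y = []
    for sample in x:
        a = abs(sample)
        if a > peak:                                   # attack
            peak = (1.0 - at) * peak + at * a
        else:                                          # release
            peak = (1.0 - rt) * peak
        if peak > lt:
            f = 2.0 ** (-ls * (math.log2(peak) - ld_lt))
        else:
            f = 1.0                                    # below threshold: no gain change
        g = (1.0 - smooth) * g + smooth * f            # SMOOTH filter
        delay_line.append(sample)
        y.append(g * delay_line.pop(0))                # y(n) = g(n) * x(n - D_1)
    return y


if __name__ == "__main__":
    out = limiter([1.0] * 2000, lt=0.5)
    print(out[-1])
```

For a steady full-scale input, the output settles near the threshold, while inputs below the threshold pass with unity gain after the direct-path delay.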

8.4.2 Compressor

Feedback implementation

A DRC system can be implemented in a feedback form, as shown in Fig. 8.13, where level sensing is performed on the output signal $y(n)$. A delay has to be introduced in the feedback side‐chain path. The gain is then applied to the input signal $x(n)$.

Figure 8.13 Feedback DRC System.

Ducking

Ducking consists of using a second signal $x_D(n)$ as input to the side‐chain of the DRC system, as shown in Fig. 8.14. This has various applications, for example, when an announcement has to be made over background music. The level of the music can be automatically reduced by sensing the level of the speaker and thus applying a negative gain to the music (compression). This effect is also widely used in modern music production to give a stronger feeling of energy to the kick drum. In this case, the kick drum is used as a side‐chain signal to periodically attenuate the rest of the instruments.

Figure 8.14 Ducking DRC system.

Lookahead

The lookahead consists of introducing a delay in the direct path so that the level is sensed before the gain is applied, as shown in Fig. 8.15. It is useful when applied to transients or fast‐changing signals, because it anticipates the variations and reduces (or avoids) the time needed to react to changes in the signal dynamics. When realized offline, the delay introduced can be compensated, but in real time, the delay is equal to the lookahead time. Given a lookahead time $t_{LH}$ in ms, the number of samples is given by $N = \dfrac{t_{LH} \cdot f_S}{1000}$.

Figure 8.15 DRC system with a lookahead of $N$ samples.
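The lookahead length and the delayed multiplication can be sketched as follows. Rounding the sample count to the nearest integer, filling the start of the delayed signal with zeros, and both function names are assumptions for illustration.

```python
def lookahead_samples(t_lh_ms, fs):
    """Number of lookahead samples N = t_LH * f_S / 1000, rounded to an integer."""
    return int(round(t_lh_ms * fs / 1000.0))


def apply_with_lookahead(x, gains, n):
    """y(k) = g(k) * x(k - N); samples before the start are taken as zero."""
    x = list(x)
    delayed = ([0.0] * n + x[:max(len(x) - n, 0)])[:len(x)]
    return [g * s for g, s in zip(gains, delayed)]


if __name__ == "__main__":
    n = lookahead_samples(5.0, 48000)          # assumed 5 ms at 48 kHz
    print(n)
    print(apply_with_lookahead([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 0.5, 0.5], 2))
```

Since the gain sequence is computed from the undelayed signal, a gain reduction is already in place when the delayed transient reaches the multiplier.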

8.4.3 Compressor, Expander, Noise Gate

The block diagram of a compressor/expander/noise gate is shown in Fig. 8.16. The basic structure is similar to the limiter. In contrast to the limiter, the logarithm of the signal $x_{\mathrm{RMS}}(n)$ is taken and multiplied by 0.5. The obtained value is compared with three thresholds to determine the operating range of the static curve. If one of the three thresholds is crossed, the resulting difference is multiplied by the corresponding slope (CS, ES, NS) and the antilogarithm of the result is taken. A following first‐order lowpass filter provides the attack and release times.

Figure 8.16 Compressor/expander/noise gate.
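The threshold comparison can be sketched as a small gain computer. For readability this sketch works in dB instead of the base-2 logarithm, which differs only by a constant scale factor; it also assumes the slope definition $S = 1 - 1/R$, so that each segment contributes $-S \cdot (X - T)$, and it adds a constant term in the noise gate branch to keep the curve continuous. All names and default values are illustrative assumptions.

```python
import math


def gain_from_rms_square(x_rms_sq, ct=-20.0, et=-50.0, nt=-70.0,
                         cs=0.5, es=-1.0, ns=-10.0):
    """Raw control factor f(n) from the squared RMS value.

    ct, et, nt : compressor, expander, noise gate thresholds in dB
    cs, es, ns : corresponding slopes (S = 1 - 1/R)
    """
    if x_rms_sq <= 0.0:
        return 0.0                                    # silence: gate fully closed
    x_db = 0.5 * 20.0 * math.log10(x_rms_sq)          # factor 0.5 undoes the square
    if x_db > ct:
        g_db = -cs * (x_db - ct)                      # compressor range
    elif x_db >= et:
        g_db = 0.0                                    # linear range
    elif x_db >= nt:
        g_db = -es * (x_db - et)                      # expander range
    else:
        g_db = -es * (nt - et) - ns * (x_db - nt)     # noise gate range
    return 10.0 ** (g_db / 20.0)                      # antilogarithm


if __name__ == "__main__":
    for level_db in (0.0, -40.0, -60.0, -80.0):
        sq = 10.0 ** (level_db / 10.0)
        print(level_db, gain_from_rms_square(sq))
```

The returned value corresponds to the unsmoothed factor; attack and release behavior would be added by a following first-order lowpass filter.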

8.4.4 Combination System

A combination of a limiter that uses PEAK measurement and a compressor/expander/noise gate that is based on RMS measurement is presented in Fig. 8.17. The PEAK and RMS values are measured simultaneously. If the linear threshold of the limiter is crossed, the logarithm of the peak signal $x_{\mathrm{PEAK}}(n)$ is taken and the upper path of the limiter is used to calculate the characteristic curve. If the limiter threshold is not crossed, the logarithm of the RMS value is taken and one of the three lower paths is used. The additive terms in the limiter and noise gate paths result from the static curve. After going through the range detector, the antilogarithm is taken. The sequence $f(n)$ is smoothed with a SMOOTH filter in the limiter case, or weighted with corresponding attack and release times of the relevant operating range (compressor, expander, or noise gate). By limiting the maximum level, the dynamic range is reduced. As a consequence, the overall static curve can be shifted up by a gain factor. Figure 8.18 demonstrates this with a gain factor equal to 10 dB. This static parameter value is directly included in the control factor $g(n)$.

Figure 8.17 Limiter/compressor/expander/noise gate.

Figure 8.18 Shifting the static curve by a gain factor.

As an example, Fig. 8.19 illustrates the input $x(n)$, the output $y(n)$, and the control factor $g(n)$ of a compressor/expander system. It is observed that signals with high amplitude are compressed and those with low amplitude are expanded. An additional gain of 12 dB shows the maximum value of 4 for the control factor $g(n)$. The compressor/expander system operates in the linear region of the static curve if the control factor is equal to 4. If the control factor is between 1 and 4, the system operates as a compressor. For control factors lower than 1, the system works as an expander ($3500 < n < 4500$ and $6800 < n < 7900$). The compressor is responsible for increasing the loudness of the signal whereas the expander increases the dynamic range for signals of small amplitude.

Figure 8.19 Signals $x(n)$, $y(n)$, and $g(n)$ for dynamic range control.

8.5 Realization Aspects

8.5.1 Sampling Rate Reduction

To reduce the computational complexity, downsampling can be carried out after calculating the PEAK/RMS value (see Fig. 8.20). As the signals $x_{\mathrm{PEAK}}(n)$ and $x_{\mathrm{RMS}}(n)$ are already band limited, they can be directly downsampled by taking every second or fourth value of the sequence. This downsampled signal is then processed by taking its logarithm, calculating the static curve, taking the antilogarithm, and filtering with corresponding attack and release time with reduced sampling rate. A following upsampling by a factor of 4 is achieved by repeating the output value four times. This procedure is equivalent to upsampling by a factor 4 followed by a sample‐and‐hold transfer function.

Figure 8.20 Dynamic system with sampling rate reduction.
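The decimation and sample-and-hold steps are simple to sketch on plain sequences. The function names and the optional `length` argument for trimming the held sequence back to the input length are assumptions.

```python
def downsample(seq, factor=4):
    """Keep every factor-th value of an already band-limited level signal."""
    return list(seq)[::factor]


def sample_and_hold(seq, factor=4, length=None):
    """Upsample by repeating each value factor times."""
    out = [value for value in seq for _ in range(factor)]
    return out if length is None else out[:length]


if __name__ == "__main__":
    level = list(range(10))                 # stand-in for x_PEAK(n) or x_RMS(n)
    low_rate = downsample(level, 4)
    print(low_rate)
    print(sample_and_hold(low_rate, 4, length=len(level)))
```

Between the two calls, the logarithm, static curve, antilogarithm, and smoothing would run at one quarter of the input rate, with the time constants recomputed for the reduced sampling rate.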

The nesting and spreading of partial program modules over four sampling periods is shown in Fig. 8.21. The modules PEAK/RMS (i.e. PEAK/RMS calculation) and MULT (delay of input and multiplication with $g(n)$) are performed every input sampling period. The number of processor cycles for PEAK/RMS and MULT are denoted by Z1 and Z3, respectively. The modules LD(x), CURVE, $2^{x}$, and SMO each have a maximum number of processor cycles of Z2 and are processed consecutively in the given order, one module per sampling period. This procedure is repeated every four sampling periods. The total number of processor cycles per sampling period for the complete dynamics algorithm is therefore the sum of the three cycle counts, Z1 + Z2 + Z3.

Figure 8.21 Nesting technique.

8.5.2 Curve Approximation

In addition to taking the logarithm and antilogarithm, other simple operations like comparisons and addition/multiplication occur in calculating the static curve. The logarithm of the PEAK/RMS value is taken as follows:

(8.36)x equals upper M dot 2 Superscript upper E Baseline comma
(8.37)ld left-parenthesis x right-parenthesis equals ld left-parenthesis upper M right-parenthesis plus upper E period

First, the mantissa is normalized and the exponent is determined. The function ld left-parenthesis upper M right-parenthesis is then calculated by a series expansion. The exponent is simply added to the result.

The logarithmic weighting factor $G$ and the antilogarithm $2^{G}$ are given by

(8.38) $G = -E - M,$
(8.39) $2^{G} = 2^{-E} \cdot 2^{-M}.$

Here, $E$ is a natural number and $M$ is a fractional number. The antilogarithm $2^{G}$ is calculated by expanding the function $2^{-M}$ in a series and multiplying by $2^{-E}$. The computational complexity can be reduced by directly using tables for log and antilog.
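
Equations (8.38) and (8.39) can be sketched in the same way. The function name and the choice of the exponential power series for $2^{-M} = e^{-M \ln 2}$ are assumptions made for illustration; the multiplication by $2^{-E}$ is carried out with `math.ldexp`, which corresponds to a shift on fixed‐point hardware.

```python
import math


def antilog2(g, terms=10):
    """Compute 2**g for g <= 0 using the split g = -E - M."""
    e = int(-g)                  # E: integer part (natural number)
    m = -g - e                   # M: fractional part, 0 <= M < 1
    # 2**(-M) = exp(-M * ln 2), expanded as a truncated power series
    t = -m * math.log(2.0)
    term, series = 1.0, 1.0
    for k in range(1, terms):
        term *= t / k
        series += term
    return math.ldexp(series, -e)   # multiplication by 2**(-E)
```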

Figure 8.22 Stereo dynamic system.

8.5.3 Stereo Processing

For stereo processing, a common control factor $g(n)$ is needed. If different control factors are used for the two channels, limiting or compressing one of the two stereo signals causes a displacement of the stereo balance. Figure 8.22 shows a stereo dynamic system in which the sum of the two signals is used to calculate a common control factor $g(n)$. The subsequent processing steps of measuring the PEAK/RMS value, downsampling, taking the logarithm, calculating the static curve, taking the antilogarithm, smoothing with attack and release times, and upsampling with a sample‐and‐hold function remain the same. The delay (DEL) in the direct path must be the same for both channels.
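
The effect of a common control factor can be illustrated with a strongly simplified Python sketch. It omits downsampling, the logarithmic static curve, and attack/release smoothing; the function name, the scaling of the sum by 0.5, the simple peak detector, and the limiter‐type curve are all assumptions. What it is meant to show is only that one gain derived from the sum signal, applied with identical delay to both channels, leaves the level ratio between the channels unchanged.

```python
def stereo_drc(left, right, threshold=0.5, delay=2, decay=0.9):
    """Apply one gain g(n), derived from the channel sum, to both channels."""
    peak = 0.0
    buf_l = [0.0] * delay
    buf_r = [0.0] * delay
    out_l, out_r = [], []
    for xl, xr in zip(left, right):
        # Level measurement on the sum signal (the 0.5 scaling is an assumption)
        s = 0.5 * abs(xl + xr)
        peak = max(s, decay * peak)
        # Simplified static curve: limit the measured level to the threshold
        g = 1.0 if peak <= threshold else threshold / peak
        # Identical delay (DEL) and identical gain in both direct paths
        buf_l.append(xl)
        buf_r.append(xr)
        out_l.append(g * buf_l.pop(0))
        out_r.append(g * buf_r.pop(0))
    return out_l, out_r
```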

8.6 Multiband DRC

A multiband DRC system consists of several DRC devices applied to different portions of the frequency range. The signal is split into bands (usually 3 to 5) with a complementary filter bank and each band is treated separately with its own DRC device (usually a compressor), as shown in Fig. 8.23. Multiband compressors are useful when only a specific frequency region of the signal has to be treated, especially at the mastering stage where the mix is already done and each instrument can no longer be treated on its own. Band‐wise processing also prevents common DRC artifacts that occur when gain changes are applied over a wider frequency range. The level is measured in each band and is used for the gain computation of the corresponding DRC device [Dut11]. A problem of multiband compressors is that the signal is split into bands even when the device is not active. This can cause filtering problems, such as phase cancellation or phase shifting, making the effect not fully transparent even when deactivated. Note also that such systems are relatively expensive to compute and can cause CPU overload when used heavily on systems with limited processing power.

Figure 8.23 Multiband DRC system.

8.7 Dynamic Equalizers

Dynamic equalizers are very similar to multiband DRC systems, but are in their own way more transparent. The filters of a dynamic equalizer are placed in series as in a regular equalizer, but the sensing can be performed in parallel, as shown in Fig. 8.24. The levels of several band‐limited signals are sensed and used to control the gain of a peak (or shelving) filter. The control can be positive and work as an expander (the gain of the filter is proportional to the amount by which the level exceeds the threshold), or negative and work as a compressor (the filter gain is inversely proportional to the amount by which the level exceeds the threshold) [Wis09, Väl16]. A corresponding frequency response at a specific time instant $n$ is shown in Fig. 8.25.

Figure 8.24 Dynamic EQ system.

Figure 8.25 Frequency response of a dynamic equalizer at sample n.
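
The positive and negative control of a dynamic equalizer band can be sketched as a small gain rule. This is a minimal sketch under assumptions: the function and parameter names are hypothetical, and the control law is taken to be linear in dB with a signed factor, which is only one possible reading of the description above and not necessarily the rule used in [Wis09].

```python
def dynamic_eq_gain_db(level_db, threshold_db, control):
    """Gain in dB for the peak or shelving filter of one dynamic EQ band.

    control > 0: boost above the threshold (expander-like behavior)
    control < 0: cut above the threshold (compressor-like behavior)
    """
    # Amount by which the sensed band level exceeds the threshold
    excess_db = max(level_db - threshold_db, 0.0)
    return control * excess_db
```

The returned value would be recomputed for every band and used as the gain parameter of the corresponding filter, so the filter response changes with the sensed level over time.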

Figure 8.26 Block diagram of a low‐shelving dynamic filter.

A further approach, shown in Figs. 8.26 and 8.27, implements dynamic shelving and peak filters based on an allpass filter decomposition for parametric equalization [Zöl08]. This implementation takes advantage of the lowpass and bandpass signals generated at the first stage to compute the level used later at the DRC stage.

Figure 8.27 Block diagram of a peak dynamic filter.

8.8 Source‐filter DRC

8.8.1 Introduction

Source‐filter separation and processing [Arf11] has been used extensively to extract the so‐called spectral envelope (filter coefficients) and the source signal from an audio signal. When the spectral envelope is applied to the source signal again, the filtering operation perfectly reconstructs the original. In the case of a speech signal, the spectral envelope has the advantage of providing a good representation of its formants. The idea of using the extracted formants to re‐synthesize voice was expressed as early as [Sla62] and is nowadays used in applications such as vocoders [Nag09] or artificial speech models [Kad07].

The system proposed here will apply some dynamic processing on the error signal (source signal) to affect the re‐synthesized result. However, too drastic modifications of the source signal can generate artifacts at the re‐synthesis stage. One of the applications presented here aims to reduce background noise and is especially effective when applied to speech recordings.

Source‐filter separation [Arf11], as shown in Fig. 8.28, uses linear predictive coding and decoding (LPC) by generating a prediction $\hat{x}(n)$ of the input signal $x(n)$ with an adaptive FIR filter $P(z)$ of order $p$. This filter estimates a spectral envelope of the input $x(n)$. The difference between the input and the predicted signal is called the prediction error $e(n) = x(n) - \hat{x}(n)$ and represents the source signal. The prediction error $e(n)$ can be seen as a whitened version of $x(n)$. For speech signals, the prediction error (source signal) represents an excitation signal similar to the sound emitted by the vocal cords, which will be filtered by the vocal tract to give the spectral shape of the speech signal. Further dynamic range processing of the prediction error will lead to a processed source/error $\tilde{e}(n)$, which is then fed to the LPC decoding as shown in Fig. 8.28 to reconstruct the processed output $y(n)$.

Figure 8.28 Source‐filter separation and processing using linear predictive coding (analysis) and decoding (synthesis).

The prediction filter $P(z)$ of order $p$ used in the analysis and synthesis is updated at every new incoming sample using a buffer of the $p$ past samples of $x(n)$. The filter coefficients $a_k$ are updated according to the least‐mean‐square (LMS) method given by

(8.40) $a_k(n+1) = a_k(n) + \mu\, e(n)\, x(n-k);$

for $k = 1, \ldots, p$, where $k$ is the filter coefficient index and $\mu$ corresponds to the step size of the adaption. The predicted signal $\hat{x}(n)$ is calculated using

(8.41) $\hat{x}(n) = \sum_{k=1}^{p} a_k\, x(n-k),$

which represents FIR filtering of the input $x(n)$ with the previously estimated filter coefficients. The transfer function for coding (analysis) is given by

(8.42) $H_C(z) = \frac{E(z)}{X(z)} = 1 - P(z).$

The decoder for the re‐synthesis uses the inverse transfer function of the coder given by

(8.43) $H_D(z) = \frac{Y(z)}{\tilde{E}(z)} = \frac{1}{1 - P(z)}.$

This filter uses the coefficients calculated by the prediction filter $P(z)$ and acts as an all‐pole filter. From an implementation point of view, this simply amounts to using the filter $P(z)$ in a feedback loop, as depicted in Fig. 8.28.
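
Equations (8.40) to (8.43) can be combined into a short sample‐by‐sample sketch. The function name, the default values for $p$ and $\mu$, the test signal, and the `process` hook (a placeholder for the dynamic range processing of the error signal) are assumptions made for illustration; this is not the implementation behind Fig. 8.28.

```python
import math


def lpc_lms_roundtrip(x, p=8, mu=0.01, process=lambda e: e):
    """Sample-by-sample LMS source-filter analysis and re-synthesis."""
    a = [0.0] * p         # coefficients a_1 ... a_p
    x_hist = [0.0] * p    # x(n-1), ..., x(n-p)
    y_hist = [0.0] * p    # y(n-1), ..., y(n-p)
    e_out, y_out = [], []
    for xn in x:
        # Analysis, Eq. (8.41): prediction from the p past input samples
        x_hat = sum(ak * xk for ak, xk in zip(a, x_hist))
        e = xn - x_hat
        # Optional processing of the error (source) signal
        e_tilde = process(e)
        # Synthesis, Eq. (8.43): same coefficients used in a feedback loop
        yn = e_tilde + sum(ak * yk for ak, yk in zip(a, y_hist))
        # LMS update, Eq. (8.40)
        a = [ak + mu * e * xk for ak, xk in zip(a, x_hist)]
        x_hist = [xn] + x_hist[:-1]
        y_hist = [yn] + y_hist[:-1]
        e_out.append(e)
        y_out.append(yn)
    return e_out, y_out


# Usage: without processing, the output should match the input up to rounding
x = [math.sin(0.2 * n) + 0.5 * math.sin(1.1 * n) for n in range(200)]
e, y = lpc_lms_roundtrip(x)
max_err = max(abs(yn - xn) for yn, xn in zip(y, x))
```

With the identity as `process`, the synthesis loop undoes the analysis filter exactly, apart from rounding. Any deviation of the output from the input therefore stems from the processing applied to $e(n)$.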

8.8.2 Combination with DRC

In this particular system, the error signal is processed by a DRC system to influence the re‐synthesized signal $y(n)$, as depicted in Fig. 8.29. DRC generally produces a fairly transparent effect. The processed error signal is therefore not affected too drastically, so the reconstructed signal is less prone to artifacts arising from imperfect reconstruction.

Figure 8.29 Block diagram of the combined systems LPC and DRC.

8.8.3 Applications

Denoising

Dynamic range control was one of the first tools used to denoise audio signals. As an example, a noise gate or expander attenuates the input signal when its level falls below a defined threshold. By setting this threshold just above the background noise level, the parts where no signal of interest is present have their level reduced and are thus denoised. This denoising method is, by definition, effective only when the energy of the input signal is low, which in the case of a speech signal is when the speaker is silent or pausing between words.

Figure 8.30 Error signals $e(t)$ and $e_{\mathrm{exp}}(t)$ before and after dynamic range expansion, respectively.

Figure 8.31 Spectrograms of a female singing voice with white noise (30 dB SNR; top left, input; top right, expanded input; bottom, denoised output).

In this denoising application, an expander is used as the DRC block, because it tends to be less prone to artifacts than a noise gate. The expander chosen in this application uses a peak level detector [Gia12]. Because the error signal carries most of the background noise, the expander threshold can be set a few dB above the noise level (see Fig. 8.30). The pulse train remains, whereas the noise in between pulses is slightly reduced. Moreover, the background noise in the silent parts is reduced as well, because its level lies below the expander threshold. For this application, the time constants of the expander have been set to a very fast attack $\mathrm{AT} = 0.1$ ms to leave the transients and pulses untouched and a slower release $\mathrm{RT} = 1$ ms to avoid artifacts. A ratio $\mathrm{ER} = 40$ has been chosen to drastically reduce the quieter sounds, and thus the background noise. In this particular case, the threshold has been set manually using prior knowledge of the noise level. However, an adaptive threshold may be applied for non‐stationary noise situations. It is worth mentioning that for high noise levels (that cover the speech), the performance of the presented algorithm degrades drastically because its fundamental principle is to increase the level differences between parts of the signal. As mentioned in [Arf11], LPC can be parameterized so that the error signal resembles a whispering effect. This means that the resonant part of the voice is removed, resulting in a combination of plosives, noise, and pulse sequences. Removing the background noise from such a signal results in denoising the re‐synthesized signal. To obtain an excitation signal with such properties, filter orders in the range $p \in [80, 150]$ are most suitable (with $f_S = 44.1$ kHz).

The top left plot in Fig. 8.31 shows the spectrogram of a female singing voice with white noise (30 dB SNR). The top right spectrogram in Fig. 8.31 shows the effect of an expander applied as a denoiser. If stationary low‐colored background noise is present in a signal, it will be contained in the error signal $e(n)$ as well, as the separation by the LPC whitens the signal. By denoising the error signal with an expander, the re‐synthesized signal $y(n)$ will also be denoised in parts where voicing occurs. This effect is visible in the spectrogram shown in the lower plot of Fig. 8.31, where the background noise is attenuated without altering the original signal.

Transient Control

The error signal obtained by the source–filter separation contains the transients of the original sound. Because transients are sudden bursts of energy, the prediction filter is not fast enough to predict them. This arises from the weighted update of the filter coefficients (step size $\mu$). Moreover, transients are generally broadband. This makes them unlikely to be modeled by the filter, so they remain in the error signal.

Figure 8.32 Block diagram of the DRC block for transient control.

Figure 8.33 Input signal $x(t)$ and the corresponding extracted transients.

Once transients are extracted, it is possible to choose the amount to re‐inject into the error signal before the re‐synthesis. For this purpose, a gain $\mathrm{G}_{\mathrm{tr}}$ controlling the amount of transients to add or cancel (by phase inversion) is inserted, as shown in Fig. 8.32. One of the problems occurring frequently when recording speech is the presence of overly strong transients or clipping. This can arise from the air flow hitting the microphone membrane when a plosive is pronounced. Reducing transients may be used to reduce this effect (pop filtering). Traditionally, compressors are used to overcome such problems. Nevertheless, the attack portion of a sound is also considered to convey its brightness, so increasing the transients may give a brighter character to the re‐synthesized signal. Depending on the desired effect, being able to control the amount of transients in a sound is useful. Because it contains the transients, the error signal is a good starting point for transient extraction. Processing it with a compressor can achieve a complete transient extraction.

An example of extracted transients is shown in Fig. 8.33 and the corresponding spectrograms are shown in Fig. 8.34. Selecting a slower attack time allows the transient portion of the sound to pass through before the compressor starts clamping. However, if the attack time is too slow, too large a portion of the transient may pass through. A noise gate is cascaded after the expander to fully remove the non‐transient parts (see Fig. 8.32).

Figure 8.34 Spectrogram of a female singing voice (top left, input; top right, extracted transients; bottom, output with attenuated transients).

8.9 JS Applet – Dynamic Range Control

The applet shown in Fig. 8.35 demonstrates dynamic range control. It is designed for a first insight into the perceptual effects of dynamic range control of an audio signal. You can adjust the characteristic curve with two control points. You can choose between two predefined audio files from our web server (audio1.wav or audio2.wav) or your own local WAV file to be processed [Gui05].

Figure 8.35 JS applet – dynamic range control.

8.10 Exercises

1. Lowpass Filtering for Envelope Detection

Generally, envelope computation is performed by lowpass filtering the input signal's absolute value or its square.

  1. Sketch the block diagram of a recursive first‐order lowpass $H(z) = \frac{\lambda}{1 - (1 - \lambda) z^{-1}}$.
  2. Sketch its step response. What characteristic measure of the envelope detector can be derived from the step response and how?
  3. Typically, the lowpass filter is modified to use a non‐constant filter coefficient $\lambda$. How does $\lambda$ depend on the signal? Sketch the response to a rect‐signal of the lowpass filter thus modified.

2. Discrete‐time Specialties of Envelope Detection

Taking the absolute value or squaring are nonlinear operations. Hence care must be taken when using them in discrete‐time systems, as they introduce harmonics whose frequencies may violate the Nyquist bound. This can lead to unexpected results, as a simple example shall illustrate. Consider the input signal $x(n) = \sin\left(\frac{\pi}{2} n + \varphi\right)$, $\varphi \in [0, 2\pi]$.

  1. Sketch $x(n)$, $|x(n)|$, and $x^2(n)$ for different values of $\varphi$.
  2. Determine the value of the envelope after perfect lowpass filtering, i.e. averaging, $|x(n)|$. Note: As the input signal is periodic, it is sufficient to consider one period, e.g.
    $\bar{x} = \frac{1}{4} \sum_{n=0}^{3} |x(n)|.$
  3. Similarly, determine the value of the envelope after averaging $x^2(n)$.

3. Dynamic Range Processors

Sketch the characteristic curves mapping input level to output level and input level to gain for, and describe briefly the application of:

  1. limiter;
  2. compressor;
  3. expander; and
  4. noise gate.

References

  1. [Arf11] D. Arfib, F. Keiler, U. Zölzer, and V. Verfaille: Source‐Filter Processing, chapter 8, pages 279–320. John Wiley and Sons, Ltd, 2011.
  2. [Dut11] P. Dutilleux, K. Dempwolf, M. Holters, and U. Zölzer: Nonlinear Processing, chapter 4, pages 101–138. John Wiley and Sons, Ltd, 2011.
  3. [Gia12] D. Giannoulis, M. Massberg, and J.D. Reiss: Digital dynamic range compressor design ‐ a tutorial and analysis. Journal of the Audio Engineering Society, 60(6):399–408, 2012.
  4. [Gui05] M. Guillemard, C. Ruwwe, U. Zölzer: J‐DAFx ‐ Digital Audio Effects in Java, Proc. 8th Int. Conference on Digital Audio Effects (DAFx‐05), Madrid, Spain, pp.161–166, 2005.
  5. [Kad07] M.D. Kadaba: Artificial speech synthesis using LPC. In Audio Engineering Society Convention 122, May 2007.
  6. [McN84] G.W. McNally: Dynamic Range Control of Digital Audio Signals, J. Audio Eng. Soc., Vol. 32, No. 5, pp. 316–327, 1984.
  7. [Nag09] F. Nagel, S. Disch, and N. Rettelbach: A phase vocoder driven bandwidth extension method with novel transient handling for audio codecs. In Audio Engineering Society Convention 126, May 2009.
  8. [Sla62] F.H. Slaymaker and R.A. Houde: Speech compression by analysis‐synthesis. J. Audio Eng. Soc., 10(2):144–148, 1962.
  9. [Sti86] E. Stikvoort: Digital Dynamic Range Compressor for Audio, J. Audio Eng. Soc., Vol. 34, No. 1/2, pp. 3–9, 1986.
  10. [Väl16] V. Välimäki and J.D. Reiss: All about audio equalization: Solutions and frontiers. Applied Sciences, 6(5), 2016.
  11. [Wis09] D.K. Wise: Concept, design, and implementation of a general dynamic parametric equalizer. Journal of the Audio Engineering Society, 57(1/2):1–28, January 2009.
  12. [Zöl08] U. Zölzer: Digital Audio Signal Processing. John Wiley and Sons, 2008.