Chapter 4

Recent Advances on LIP Nonlinear Filters and Their Applications

Efficient Solutions and Significance-Aware Filtering

Christian Hofmann⁎; Walter Kellermann⁎
⁎Friedrich-Alexander-Universität Erlangen-Nürnberg, Lehrstuhl für Multimediakommunikation und Signalverarbeitung, Erlangen, Germany

Abstract

Linear-In-the-Parameters (LIP) nonlinear filters are categorized as Cascade Models (CMs) (generalizing Hammerstein models), Cascade Group Models (CGMs) (generalizing Hammerstein Group models (HGMs) and including, e.g., Volterra filters) and bilinear cascade models, where the filter output is a bilinear function of the model parameters. Time-domain and partitioned-block frequency-domain adaptation of CGMs and CMs is described and the methods for adapting bilinear cascade models are summarized as variants of the filtered-X adaptation. These models and algorithms are employed to review the Significance-Aware (SA) filtering concept, decomposing the model for the unknown system and the adaptation mechanism into synergetic subsystems to achieve high computational efficiency. In particular, the Serial SA (SSA) and Parallel SA (PSA) decomposition lead to SSA-CMs, PSA-CGMs and a novel PSA filtered-X algorithm. The main concepts described in this chapter are exemplarily compared for the challenging application of nonlinear acoustic echo cancellation. Furthermore, model structure estimation for LIP nonlinear filters based on convex filter combinations is briefly outlined and compared with SA filtering.

Keywords

Cascade models; Cascade group models; Hammerstein models; Hammerstein group models; Significance-aware filtering; SA filtering; Filtered-X

Chapter Points

• LIP nonlinear filters with simple linear dependence on the parameters can be identified in a unified way by classical partitioned-block frequency-domain normalized least mean square algorithms.
• Algorithms to adapt LIP nonlinear filters with bilinear dependence on parameter subsets can be treated in a unified way as filtered-X algorithms.
• Significance-aware filtering can be applied in both cases for the efficient and effective identification of LIP nonlinear filters.

Acknowledgements

This book chapter could not have been written without the dedicated work of many former students and collaborators in the authors' research group, especially Alexander Stenger, Fabian Küch, Marcus Zeller and Luis A. Azpicueta-Ruiz.

4.1 Introduction

Linear-In-the-Parameters (LIP) nonlinear filters constitute a broad class of nonlinear systems and are used for modeling in a broad range of scientific areas, such as in biology (e.g., [1,2]), control engineering (e.g., [3]), communications (e.g., [4–6]) and in acoustic signal processing, where they are, for instance, employed for loudspeaker linearization (e.g., [7–9]) and Acoustic Echo Cancellation (AEC) (e.g., [10–16]). A nonlinear system with input $x (n)$ and output $y (n)$ , where n is the discrete-time sample index, will be described by the input/output relation

$y (n) = f_{I/O} {x (n)},$

(4.1)

and will be called a memoryless nonlinearity if and only if $y (n) δ (n) = f_{I/O} {x (n) δ (n)}$ for any n and $x (n)$ , where $δ (n)$ is the unit impulse. Otherwise, $f_{I/O} {x (n)}$ will be called a nonlinearity with memory. In both cases, a nonlinearity may either be considered as time-invariant and known, or as parametrized by a set of unknown parameters. The following treatment will focus on LIP nonlinear filters, where the output depends linearly on the parameters, e.g., on a vector a, or bilinearly on two parameter subsets, e.g., on parameter vectors $a_{1}$ and $a_{2}$ .

The adaptive identification of a parametric model for an unknown nonlinear system $f_{I/O} {\cdot}$ is illustrated in Fig. 4.1. Therein, an adaptive nonlinear filter (the parametric model) produces an estimate $\hat{y} (n)$ for an observation $y (n)$ and the filter's parameters are iteratively refined (adapted) to minimize a cost function derived from the error signal $e (n) = y (n) - \hat{y} (n)$ , based on noisy observations $y (n) = f_{I/O} {x (n)} + s (n)$ . When obtaining the observations by measuring a physical signal, $s (n)$ may be the superposition of sensor self-noise and other processes coupling into the sensors as well.

Figure 4.1 The generic task of adaptive system identification.

As will be explained in the following, AEC is a particularly challenging example for system identification. For AEC, $y (n)$ in Fig. 4.1 corresponds to a microphone signal composed of local sound sources $s (n)$ (e.g., the local speaker in a telephone communication) and acoustic echoes $f_{I/O} {x (n)}$ from a played-back loudspeaker signal $x (n)$ (e.g., the far-end speaker in a telephone communication). The AEC system aims at removing the acoustic echoes $f_{I/O} {x (n)}$ from the microphone signal $y (n) = f_{I/O} {x (n)} + s (n)$ , using an echo signal estimate $\hat{y} (n)$ . In periods of local sound source activity, the error signal $e (n)$ constitutes an estimate for the local sound sources $s (n)$ . In single-talk periods where only the echo signal $f_{I/O} {x (n)}$ contributes to the microphone signal, the error signal can be employed to refine the adaptive filter. The detection of local sound sources (double-talk detection) is out of the scope of this chapter and solutions for this problem exist (e.g., [17,18]). Therefore, we will assume that no local sources are active when considering AEC, which means $s (n) = 0$ .

AEC is particularly challenging because the complex acoustic echo paths to be modeled typically require nonlinear adaptive filters with large memory. Furthermore, the adaptive filters have to identify a system excited by speech or audio signals (super-Gaussian, non-white and highly nonstationary signals) instead of signals tailored to system identification. As an especially challenging application, AEC will therefore be used as a reference for evaluating the methods in this chapter. However, the models described and introduced in this chapter are not limited to AEC. Other potential applications will be highlighted at the beginning of Section 4.5, once the required terminology on nonlinear systems and their adaptation has been established.

The organization of this chapter will be summarized in the following. After introducing the fundamental notations and terminology in the remainder of this section, state-of-the-art LIP nonlinear filters will be categorized in Section 4.2 and fundamental algorithms for parameter estimation for such filters will be presented in Section 4.3. Resulting from a unified treatment of time-domain Normalized Least Mean Square (NLMS), Partitioned-Block Frequency-domain Normalized Least Mean Square (PB-FNLMS), as well as Filtered-X (FX) algorithms for LIP nonlinear filters, a novel and efficient FX PB-FNLMS algorithm will be introduced. In Section 4.4, the recently proposed, computationally very efficient concept of Significance-Aware (SA) filtering will be summarized and a novel SA FX PB-FNLMS algorithm with even further increased efficiency will be proposed. The adaptive filtering techniques described in this chapter will be evaluated and compared in Section 4.5. Finally, similarities and differences of SA filters and nonlinear adaptive filtering techniques based on convex filter combinations for model structure estimation will be reviewed in Section 4.6, before summarizing this chapter in Section 4.7.

Notations and Fundamentals

As is common in the system identification literature, the terms filter and system are used interchangeably. The input/output relationship of a linear, causal Finite Impulse Response (FIR) system of length L with input signal $x (n)$ and output $y (n)$ will be written as

$y (n) = h (n) ⁎ x (n) = \sum_{k = 0}^{L - 1} h (k) \cdot x (n - k) = 〈 h, {\overset{ˇ}{x}}_{n} 〉,$

(4.2)

where $h (k)$ denotes the $k^{th}$ tap of the system's Impulse Response (IR), ⁎ denotes convolution, and $〈 h, {\overset{ˇ}{x}}_{n} 〉$ represents the scalar product between the vectors h and ${\overset{ˇ}{x}}_{n}$ , where

$h = {[h (0), h (1), \dots, h (L - 1)]}^{T},$

(4.3)

${\overset{ˇ}{x}}_{n} = {[x (n), x (n - 1), \dots, x (n - L + 1)]}^{T}$

(4.4)

denote the IR vector and the time-reversed input signal vector, respectively, and the time reversal is indicated by the caron accent. Furthermore, $0_{M}$ and $I_{M}$ will denote the $M \times M$ matrix of zeros and the $M \times M$ identity matrix, respectively, and are used as components of the windowing matrices

$W_{01} = [\begin{matrix} 0_{M} & 0_{M} \\ 0_{M} & I_{M} \end{matrix}] \quad \text{and} \quad W_{10} = [\begin{matrix} I_{M} & 0_{M} \\ 0_{M} & 0_{M} \end{matrix}],$

(4.5)

which set the first and second half of a vector of length $N = 2 M$ to zero, respectively. In addition, F will refer to the $N \times N$ unitary Discrete Fourier Transform (DFT) matrix and $a_{1} ⊙ a_{2}$ and $a_{1} ⊘ a_{2}$ will denote element-wise multiplication and division of the vectors $a_{1}$ and $a_{2}$ , respectively. The element-wise magnitude square computation for the entries of a vector a will be written as ${| a |}^{2}$ .
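The notation above can be made concrete with a short NumPy sketch. It is only an illustration under stated assumptions: the helper names `fir_output`, `windowing_matrices` and `unitary_dft` are hypothetical, and input samples before $n = 0$ are assumed to be zero.

```python
import numpy as np

def fir_output(h, x, n):
    """y(n) = <h, x_check_n> as in Eq. (4.2); x(n) = 0 is assumed for n < 0."""
    L = len(h)
    # time-reversed input vector [x(n), x(n-1), ..., x(n-L+1)], cf. Eq. (4.4)
    x_rev = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(L)])
    return float(np.dot(h, x_rev))

def windowing_matrices(M):
    """W_01 zeroes the first half, W_10 the second half of a length-2M vector."""
    Z, I = np.zeros((M, M)), np.eye(M)
    W01 = np.block([[Z, Z], [Z, I]])
    W10 = np.block([[I, Z], [Z, Z]])
    return W01, W10

def unitary_dft(N):
    """N x N unitary DFT matrix F, i.e., F^H F = I."""
    return np.fft.fft(np.eye(N)) / np.sqrt(N)

h = np.array([1.0, 0.5, -0.25])
x = np.arange(1.0, 7.0)
v = np.arange(1.0, 5.0)
W01, W10 = windowing_matrices(2)
F = unitary_dft(4)
print(fir_output(h, x, 3), W01 @ v, W10 @ v)
```

The scalar-product form and a conventional convolution routine give the same output samples, which is all that Eq. (4.2) states.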

4.2 A Concise Categorization of State-of-the-Art LIP Nonlinear Filters

In this section, generic state-of-the-art LIP nonlinear filter structures are introduced as building blocks for the recently developed and newly proposed adaptive filtering methods in Sections 4.3 and 4.4.

4.2.1 Hammerstein Models and Cascade Models

A very simple class of LIP nonlinear filters can be described with the cascade structure depicted in Fig. 4.2, where a nonlinearity $f {\cdot}$ is followed by a linear system with IR vector h. The output depends linearly on the parameters h of the linear system and can be expressed in the notation of Eq. (4.2) as

$y (n) = f {x (n)} ⁎ h (n) = x_{NL} (n) ⁎ h (n) .$

(4.6)

If $f {\cdot}$ in Fig. 4.2 is a memoryless nonlinearity, the structure is widely known as HM [19], whereas the more general case, where $f {\cdot}$ is a nonlinearity with memory, will simply be referred to as CM in the following (the attribute nonlinear will be implied). HMs can, for instance, be employed for loudspeaker linearization (e.g., [20,21]) and AEC (e.g., [22–32]). HMs are simple and powerful models for acoustic echo paths if the major source of nonlinearity is the playback equipment (amplifiers and loudspeakers operating at the limits of their dynamic range) and can be mapped to $f {\cdot}$ , while the acoustic IR (describing sound propagation through the acoustic environment) can be described by h [22].

Figure 4.2 A CM, which is commonly referred to as HM for the special case of a memoryless nonlinearity f{⋅}.

4.2.2 Hammerstein Group Models and Cascade Group Models

The term Cascade Group Model (CGM) will refer to models with the structure depicted in Fig. 4.3. Such CGMs are composed of B parallel branches, each of which is a nonlinear CM (see Section 4.2.1) with a branch nonlinearity $f_{b} {\cdot}$ and IR $h_{(b)}$ . For memoryless branch nonlinearities $f_{b} {\cdot}$ , each branch becomes an HM and the entire CGM will be referred to as HGM in the following. HGMs have successfully been employed to describe both acoustic systems, e.g., in [33–37], and nonacoustic systems, e.g., [38,39]. In their treatment in the literature, HGMs have been addressed by various names, such as the Uryson model [3,39], or are simply treated as an approximation of a particular CGM. The most prominent examples of such HGM/CGM pairs are the so-called power filters [37,40,41] and Volterra filters [19]. Power filters are HGMs with monomials as branch nonlinearities $f_{b} {\cdot}$ , e.g.,

$f_{1} {\cdot} = (\cdot), f_{2} {\cdot} = {(\cdot)}^{2}, f_{3} {\cdot} = {(\cdot)}^{3}, \dots, f_{B} {\cdot} = {(\cdot)}^{B} .$

Figure 4.3 The general structure of a CGM, which is also referred to as HGM for the special case of memoryless nonlinearities f_b{⋅},b ∈ {1,…,B}.

Volterra filters result from power filters by augmenting the latter with additional branches where the branch nonlinearities $f_{b} {\cdot}$ perform time-lagged products of powers of the input (time-lagged products of the power filter branch nonlinearities), e.g., the set

$\begin{matrix} f_{1} {x (n)} = x (n), f_{2} {x (n)} = x^{2} (n), f_{3} {x (n)} = x (n) \cdot x (n - 1), \\ \dots, f_{2 + R} {x (n)} = x (n) \cdot x (n - R), f_{3 + R} {x (n)} = x^{3} (n), \dots \end{matrix}$

leads to a Volterra filter in Diagonal Coordinate Representation (DCR) [6]. The set of IRs $h_{(b)}$ of branches where $f_{b} {\cdot}$ performs (potentially time-lagged) products of $o$ input signal values is called the $o$th-order Volterra kernel and the branches within a kernel are also termed diagonals of the respective kernel. In the aforementioned example, the second-order kernel is modeled by $R + 1$ diagonals ( $b \in {2, \dots, 2 + R}$ ). Volterra filters are universal approximators for a large class of nonlinear systems and have therefore been applied in biology (e.g., [1]), control engineering (e.g., [3]) and communications (e.g., [4–6]), as well as in acoustic signal processing (e.g., [7–16,42,43]). Recently, HGMs and CGMs have also been realized with other sets of branch nonlinearities, such as Fourier basis functions [44], Legendre polynomials [45,46] or Chebyshev polynomials [47]. Also, Functional-Link Adaptive Filters (FLAFs) [34,35,48–50] can be viewed as HGM or CGM structures. All these models can be described by the input/output relationship

$y (n) = \sum_{b = 1}^{B} \sum_{k = 0}^{L - 1} \underbrace{f_{b} {x (n - k)}}_{x_{b} (n - k)} h_{(b)} (k) = \sum_{b = 1}^{B} x_{b} (n) ⁎ h_{(b)} (n),$

(4.7)

where $h_{(b)} (k)$ denotes the kth tap of the linear subsystem in branch b and $x_{b} (n)$ is the bth branch signal. Thus, algorithms developed for one particular realization (i.e., set of basis functions $f_{b} {\cdot}$ ) of CGMs can often immediately be applied to other realizations. This holds for the adaptation algorithms for coefficient estimation described in Sections 4.3 and 4.4, as well as for the structure estimation methods outlined in Section 4.6.
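As a minimal sketch of Eq. (4.7), the following NumPy code evaluates a CGM as a sum of branch convolutions. The function names are hypothetical; the power-filter branches and the lagged product $x (n) \cdot x (n - r)$ follow the examples of Section 4.2.2, and $x (n) = 0$ is assumed for $n < 0$.

```python
import numpy as np

def power_filter_branches(x, B):
    """Memoryless monomial branch signals x_b(n) = x(n)**b, b = 1..B."""
    return [x ** b for b in range(1, B + 1)]

def lagged_product(x, r):
    """Second-order DCR branch signal x(n) * x(n - r), with x(n) = 0 for n < 0."""
    x_delayed = np.concatenate([np.zeros(r), x[:len(x) - r]])
    return x * x_delayed

def cgm_output(branch_signals, irs):
    """y(n) = sum_b x_b(n) * h_(b)(n), truncated to the input length (Eq. 4.7)."""
    n = len(branch_signals[0])
    return sum(np.convolve(xb, hb)[:n] for xb, hb in zip(branch_signals, irs))

x = np.array([1.0, 2.0, 3.0])
irs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # illustrative IRs only
y = cgm_output(power_filter_branches(x, 2), irs)
x_diag = lagged_product(x, 1)
print(y, x_diag)
```

Because only the branch signals change between realizations, swapping `power_filter_branches` for another set of basis functions leaves `cgm_output` untouched, which mirrors the remark that algorithms for one CGM realization often carry over to others.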

4.2.3 Bilinear Cascade Models

For applications where CGMs with a large number of branches are computationally too demanding, e.g., for AEC in mobile devices with very limited processing or battery power, CM-structured adaptive filters are particularly attractive. Yet, the nonlinearity itself is usually unknown. Therefore, the nonlinearity $f_{} {\cdot}$ of a CM is typically modeled by a parametric preprocessor according to

$f_{} {x (n)} = f_{pp} {x (n)} = \sum_{b = 1}^{B} w_{b} f_{b} {x (n)} = x_{pp} (n),$

(4.8)

where B nonlinearities $f_{b} {\cdot}$ are combined with weights $w_{b}$ , which are parameters to be identified in addition to the linear subsystem. The subscript “pp” of the nonlinear preprocessor and its output signal is used to emphasize the parametric structure of the preprocessor. The general structure of a CM with a preprocessor according to Eq. (4.8) is depicted in Fig. 4.4A. Most prominently, the $f_{b} {\cdot}$ are chosen as monomials [20,21,24,51–54], e.g., $f_{b} {\cdot} = {(\cdot)}^{b}$ , such that the nonlinearity $f_{pp} {\cdot}$ can be seen as a truncated Taylor series expansion. Fourier series- [55] and Legendre series-based approximations [28,30,31,56] have also recently been employed in AEC. Expanding the preprocessor of Fig. 4.4A in the block diagram results in Fig. 4.4B. Obviously, such a system has the input/output relationship

$y (n) = x_{pp} (n) ⁎ h (n)$

(4.9)

$= h (n) ⁎ \sum_{b = 1}^{B} \overbrace{w_{b} \underbrace{f_{b} {x (n)}}_{x_{b} (n)}}^{y_{b} (n)},$

(4.10)

and the paths from $x_{b} (n)$ to $y (n)$ consist not only of a single linear system, but of cascades of $w_{b}$ (a single-tap linear system) with the linear system h. Therefore, the output $y (n)$ depends linearly on both the parameter vector $w = {[w_{1}, w_{2}, \dots, w_{B}]}^{T}$ and the parameter vector h, which is why such systems will be referred to as bilinear¹ CMs in the following. Bilinearity holds regardless of $f_{b} {\cdot}$ being memoryless or with memory and also for general linear filters with IRs $h_{(b)}$ replacing the single-tap filters $w_{b}$ , resulting in CGM preprocessors.² Thus, in the following, all considerations on simple preprocessor systems according to Fig. 4.4A translate to bilinear CMs with CGM preprocessors and vice versa, unless noted otherwise.
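The bilinear dependence can be checked numerically with a small sketch of Eqs. (4.8) to (4.10). The function name `bilinear_cm_output` is hypothetical and monomial branch nonlinearities $f_{b} {\cdot} = {(\cdot)}^{b}$ are assumed.

```python
import numpy as np

def bilinear_cm_output(x, w, h):
    """y(n) = h(n) * sum_b w_b f_b{x(n)} with monomial f_b (Eqs. 4.8-4.10)."""
    x_pp = sum(w_b * x ** (b + 1) for b, w_b in enumerate(w))  # Eq. (4.8)
    return np.convolve(x_pp, h)[:len(x)]                       # Eq. (4.9)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 32)
w1, w2 = np.array([1.0, 0.2]), np.array([0.5, -0.3])
h1, h2 = rng.normal(size=4), rng.normal(size=4)

# linear in w for fixed h, and linear in h for fixed w
lin_w = bilinear_cm_output(x, w1 + w2, h1)
lin_h = bilinear_cm_output(x, w1, h1 + h2)
# a common gain can be moved between w and h without changing the output
scaled = bilinear_cm_output(x, 2.0 * w1, 0.5 * h1)
```

The last line illustrates a general property of such bilinear parametrizations: the pair (w, h) is only determined up to a scalar gain, so an adaptive algorithm can at best recover the product of the two.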

Figure 4.4 Alternative representations of a CM with parametric preprocessor. (A) Compact representation of a CM with parametric preprocessor as nonlinear–linear cascade. To emphasize the parametric structure of the preprocessor, the nonlinearly distorted input signal is denoted $x_{pp} (n)$ instead of $x_{NL} (n)$ . (B) Expansion of the memoryless nonlinearity in (A), revealing a bilinear dependency of the model output $y (n)$ on the parameter vectors $w = {[w_{1}, w_{2}, \dots, w_{B}]}^{T}$ and h.

4.3 Fundamental Methods for Coefficient Adaptation

The most common methods for parameter estimation for linear and LIP nonlinear filters are stochastic gradient-type algorithms, such as the Least Mean Square (LMS) algorithm or the NLMS algorithm [57]. These can be derived as an approximation of a gradient descent w.r.t. the mean squared error cost function $J = E {{| e (n) |}^{2}}$ , where $E {\cdot}$ denotes mathematical expectation and $e (n) = y (n) - \hat{y} (n)$ is the error signal, computed as difference between the unknown system's output $y (n)$ and its estimate $\hat{y} (n)$ , obtained with the adaptive filter (cf. Fig. 4.1). For a comprehensive treatment of different types of LMS-type algorithms, we refer to [57].

In the following, classical system identification will be employed in Section 4.3.1 to adapt $h_{(b)}$ of CGMs from Section 4.2.2 (cf. Fig. 4.3). In Section 4.3.2, the filtered-X method is employed to convert the adaptation of a preprocessor system according to Section 4.2.3 (cf. Fig. 4.4) into two ordinary system identification tasks, one for $h_{(b)}$ and one for $w_{b}$ , by reordering $w_{b}$ and $h_{(b)}$ .

4.3.1 NLMS for Cascade Group Models

In this section, the adaptation of CGMs according to Section 4.2.2 by a time-domain and a partitioned-block frequency-domain NLMS will be described in Sections 4.3.1.1 and 4.3.1.2, respectively. Note that the branch nonlinearities will be considered as time-invariant and known, such that only the linear subsystems $h_{(b)}$ of the CGMs from Section 4.2.2 are adaptive.

4.3.1.1 Time-Domain NLMS

To implement and adapt a CGM, the input signal $x (n)$ has to be mapped nonlinearly to a set of branch signals

$x_{b} (n) = f_{b} {x (n)}, b \in {1, \dots, B},$

(4.11)

as inputs for the linear $B \times 1$ Multiple-Input/Single-Output (MISO) subsystem depicted in the right half of Fig. 4.3. This allows one to compute output samples

$\hat{y} (n) = \sum_{b = 1}^{B} 〈 {\hat{h}}_{(b), n - 1}, {\overset{ˇ}{x}}_{(b), n} 〉,$

(4.12)

where ${\hat{h}}_{(b), n - 1}$ is the length-L IR estimate in branch b and has been obtained at time index $n - 1$ and where ${\overset{ˇ}{x}}_{(b), n}$ is the time-reversed branch signal vector, structured analogously to Eq. (4.4). This allows one to compute the error signal

$e (n) = y (n) - \hat{y} (n)$

(4.13)

before performing for each branch b a filter update according to

${\hat{h}}_{(b), n} = {\hat{h}}_{(b), n - 1} + μ_{b} \cdot \frac{e (n)}{E_{x_{b}} (n) + δ} {\overset{ˇ}{x}}_{(b), n},$

(4.14)

with the branch-specific adaptation step sizes $μ_{b} \in (0, 2)$ , a nonnegative regularization constant δ for numerical stability and the branch signal energies

$E_{x_{b}} (n) = 〈 {\overset{ˇ}{x}}_{(b), n}, {\overset{ˇ}{x}}_{(b), n} 〉 .$

(4.15)

The computational effort of such a time-domain NLMS-adaptive CGM grows linearly with the number of branches B. Note that the CGM adaptation includes the CM adaptation for $B = 1$ and the adaptation of linear systems for $B = 1$ and $f_{1} {x (n)} = x (n)$ .
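The update of Eqs. (4.11) to (4.15) can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function name `nlms_cgm`, the test system `h_true`, the uniformly distributed excitation and the conservative step size of 0.25 per branch are all assumptions made for the example.

```python
import numpy as np

def nlms_cgm(x, y, branch_fns, L, mu=0.25, delta=1e-8):
    """Time-domain NLMS for a CGM with known branch nonlinearities."""
    B = len(branch_fns)
    mu = np.broadcast_to(np.asarray(mu, dtype=float), (B,))  # branch step sizes
    xb = np.stack([f(x) for f in branch_fns])                # Eq. (4.11)
    h_hat = np.zeros((B, L))
    buf = np.zeros((B, L))        # time-reversed branch signal vectors
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = xb[:, n]
        e[n] = y[n] - np.sum(h_hat * buf)                    # Eqs. (4.12), (4.13)
        energy = np.sum(buf ** 2, axis=1)                    # Eq. (4.15)
        h_hat += (mu * e[n] / (energy + delta))[:, None] * buf   # Eq. (4.14)
    return h_hat, e

# noise-free identification of a hypothetical two-branch HGM
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)
fns = [lambda v: v, lambda v: v ** 2]
h_true = np.array([[1.0, -0.5, 0.25, 0.1], [0.3, 0.2, -0.1, 0.05]])
y = sum(np.convolve(f(x), h)[:len(x)] for f, h in zip(fns, h_true))
h_hat, e = nlms_cgm(x, y, fns, L=4)
```

Each loop iteration touches all B rows of `h_hat` and `buf`, which is where the linear growth of the effort with B comes from. Passing a single branch function reduces the sketch to the CM case, and the identity as branch function gives the linear NLMS.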

4.3.1.2 Partitioned-Block Frequency-Domain NLMS

To efficiently adapt LIP nonlinear filters with a low input/output delay, Partitioned Block (PB) frequency-domain algorithms can be applied. For this purpose, a PB-FNLMS will be summarized here for CGMs.

Partitioned-Block Convolution

For partitioned-block convolution, the summation in Eq. (4.2) is split into partial sums and each partial sum (the convolution of an IR partition with an appropriately delayed segment of the input signal) is computed for M consecutive samples via fast DFT-domain convolution, as in [58–62]. This leads to block processing of the input signal $x (n)$ with a frameshift of M samples and a frame size of $N = 2 M$ . In this notation, let the input signal vector $x_{κ}$ at frame κ, the IR partition vector $h^{(p)}$ (containing the pth partition of h) and the output signal vector $y_{κ}$ at frame κ be defined as

$x_{κ} = {[x (κ M - M), \dots, x (κ M + M - 1)]}^{T},$

(4.16)

$h^{(p)} = {[h (p M), \dots, h (p M + M - 1), 0, \dots, 0]}^{T} \quad \text{and}$

(4.17)

$y_{κ} = {[0, \dots, 0, y (κ M), \dots, y (κ M + M - 1)]}^{T},$

(4.18)

respectively. Employing the length-N DFT matrix F, DFT-domain representations

${\underline{x}}_{κ} = F x_{κ} \quad \text{and}$

(4.19)

${\underline{h}}^{(p)} = F h^{(p)}$

(4.20)

can be obtained, and the output signal vector can be computed as

$y_{κ} = W_{01} \overbrace{F^{H} \underbrace{\sum_{p = 0}^{P - 1} {\underline{x}}_{κ - p} ⊙ {\underline{h}}^{(p)}}_{{\underline{y}}_{κ}^{\circ}}}^{y_{κ}^{\circ}},$

(4.21)

where ${\underline{y}}_{κ}^{\circ}$ results from the accumulation over all P partitions' DFT-domain products and contains cyclic convolution artifacts in its time-domain representation $y_{κ}^{\circ}$ . Thus, the time-domain output signal vector $y_{κ}$ emerges from $y_{κ}^{\circ}$ by premultiplication with the windowing matrix $W_{01}$ , defined in Eq. (4.5).
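The partitioned-block convolution of Eqs. (4.16) to (4.21) can be sketched as follows. The function name `pb_convolve` is hypothetical, and the sketch uses NumPy's unnormalized FFT/IFFT pair in place of the unitary matrix F, an assumption made so that the DFT-domain product corresponds to cyclic convolution without an extra scaling factor. Input samples before $n = 0$ are taken as zero.

```python
import numpy as np

def pb_convolve(x, h, M):
    """Overlap-save partitioned-block convolution, cf. Eqs. (4.16)-(4.21)."""
    N = 2 * M
    P = int(np.ceil(len(h) / M))
    h_pad = np.concatenate([h, np.zeros(P * M - len(h))])
    # DFTs of the zero-padded IR partitions, Eqs. (4.17) and (4.20)
    H = np.stack([np.fft.fft(h_pad[p * M:(p + 1) * M], N) for p in range(P)])
    K = len(x) // M                                   # number of complete frames
    x_ext = np.concatenate([np.zeros(M), x[:K * M]])  # x(n) = 0 for n < 0
    X_hist = np.zeros((P, N), dtype=complex)          # row p holds frame kappa - p
    y = np.zeros(K * M)
    for kappa in range(K):
        frame = x_ext[kappa * M:kappa * M + N]        # Eq. (4.16)
        X_hist = np.roll(X_hist, 1, axis=0)
        X_hist[0] = np.fft.fft(frame)                 # Eq. (4.19)
        y_circ = np.fft.ifft(np.sum(X_hist * H, axis=0)).real
        y[kappa * M:(kappa + 1) * M] = y_circ[M:]     # W_01 keeps the second half
    return y

rng = np.random.default_rng(1)
x = rng.normal(size=64)
h = rng.normal(size=10)
y_pb = pb_convolve(x, h, M=4)
```

Only one forward DFT per frame is needed for the input, because the DFTs of the delayed frames are reused from `X_hist`. The first half of `y_circ` carries the cyclic convolution artifacts and is discarded.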

Application of Partitioned-Block Convolution for Adaptive Filtering

As in the previous paragraph's partitioned convolution scheme, an adaptive CGM estimate for the linear subsystem in branch b at frame κ can be split into partitions ${\hat{h}}_{(b), κ}^{(p)}$ . Their DFT-domain representations ${\underline{\hat{h}}}_{(b), κ}^{(p)}$ can be adapted efficiently on a frame-wise basis by an NLMS-type algorithm. To this end, branch signal vectors

$x_{(b), κ} = [\begin{matrix} [0_{M}, I_{M}] x_{(b), κ - 1} \\ {[f_{b} {x (κ M)}, \dots, f_{b} {x (κ M + M - 1)}]}^{T} \end{matrix}]$

(4.22)

are determined, which contain M samples of the previous frame (upper half) and M newly computed branch signal samples (lower half), and they are transformed into the DFT domain according to

${\underline{x}}_{(b), κ} = F x_{(b), κ} .$

(4.23)

Applying the fast partitioned convolution scheme of Eq. (4.21) in all B branches and summing up all branches in the DFT domain yields the output estimate with cyclic convolution artifacts

${\hat{\underline{y}}}_{κ}^{\circ} = \sum_{b = 1}^{B} \sum_{p = 0}^{P - 1} {\underline{x}}_{(b), κ - p} ⊙ {\underline{\hat{h}}}_{(b), κ - 1}^{(p)} .$

(4.24)

This result allows one to compute the DFT-domain error signal vector

${\underline{e}}_{κ} = F (y_{κ} - \underbrace{W_{01} F^{H} {\hat{\underline{y}}}_{κ}^{\circ}}_{{\hat{y}}_{κ}}),$

(4.25)

where ${\hat{y}}_{κ}$ represents the output signal vector estimate, obtained from ${\hat{\underline{y}}}_{κ}^{\circ}$ after an Inverse DFT (IDFT) and windowing. For NLMS adaptation, the DFT-domain signal energy can be estimated using a recursive averaging

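$s_{x_{b}, κ} = γ s_{x_{b}, κ - 1} + (1 - γ) {| {\underline{x}}_{(b), κ} |}^{2},$

(4.26)

where $γ \in (0, 1)$ is a smoothing factor (a fixed default value will be assumed) and $s_{x_{b}, κ}$ will be used to compute normalized branch signals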

${\underline{x}}_{norm, (b), κ} = μ_{b} {({\underline{x}}_{(b), κ})}^{⁎} ⊘ (s_{x_{b}, κ} + δ),$

(4.27)

where $μ_{b} \in (0, 2)$ is the adaptation step size and δ is a small positive constant for numerical stability, which is added to each element of $s_{x_{b}, κ}$ . Based on ${\underline{x}}_{norm, (b), κ}$ , the update of the filter partitions can be expressed as

${\hat{\underline{h}}}_{(b), κ}^{\circ (p)} = {\underline{\hat{h}}}_{(b), κ - 1}^{(p)} + {\underline{x}}_{norm, (b), κ - p} ⊙ {\underline{e}}_{κ},$

(4.28)

which is also known as unconstrained update in the literature [58] and where cyclic convolution artifacts lead to nonzero filter taps in the second half of ${\hat{h}}_{(b), κ}^{\circ (p)} = F^{H} {\hat{\underline{h}}}_{(b), κ}^{\circ (p)}$ . Limiting the temporal support of the partitions to M samples is possible by

${\underline{\hat{h}}}_{(b), κ}^{(p)} = F \underbrace{W_{10} F^{H} ({\hat{\underline{h}}}_{(b), κ}^{\circ (p)})}_{{\hat{h}}_{(b), κ}^{(p)}}$

(4.29)

and corresponds to the so-called constrained update in the literature, which is typically formulated by introducing the constraint on the update instead of the partitions (see [58]). Alternatively, a soft-partitioned update is also possible [62,63], which shapes the temporal support of ${\hat{h}}_{(b), κ}^{(p)}$ by convolution in the DFT domain with a very short DFT-domain sequence in order to save the DFTs in Eq. (4.29).

Corresponding to the time-domain adaptation in Section 4.3.1.1, this section's PB-FNLMS CGM adaptation includes the CM adaptation for $B = 1$ and the adaptation of linear systems for $B = 1$ and $f_{1} {x (n)} = x (n)$ . Furthermore, a nonpartitioned Frequency-domain Normalized Least Mean Square (FNLMS) algorithm results from $P = 1$ . Taking the number of DFTs as measure of computational complexity, a CGM with $B = 5$ branches and $P = 4$ partitions is about 4.3 times as complex as a linear model with $P = 4$ partitions ( $B = 1$ ).
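The PB-FNLMS adaptation of a CGM, Eqs. (4.22) to (4.29), may be sketched as below. Several choices are assumptions for illustration: the function name `pb_fnlms_cgm`, the use of NumPy's unnormalized FFT instead of the unitary F, a first-order recursive power estimate with an assumed smoothing factor `gamma = 0.9`, a deliberately conservative step size `mu = 0.05`, and the noise-free test system.

```python
import numpy as np

def pb_fnlms_cgm(x, y, branch_fns, M, P, mu=0.05, gamma=0.9, delta=1e-6):
    """Constrained PB-FNLMS for a CGM with known branch nonlinearities."""
    N, B = 2 * M, len(branch_fns)
    K = len(x) // M
    # branch signals with M leading zeros (x(n) = 0 for n < 0)
    xb = np.stack([np.concatenate([np.zeros(M), f(x[:K * M])])
                   for f in branch_fns])
    H = np.zeros((B, P, N), dtype=complex)    # DFT-domain filter partitions
    X = np.zeros((B, P, N), dtype=complex)    # X[b, p] is frame kappa - p
    Xn = np.zeros((B, P, N), dtype=complex)   # normalized branch signals
    s = np.zeros((B, N))                      # recursive power estimates
    e_out = np.zeros(K * M)
    for k in range(K):
        X = np.roll(X, 1, axis=1)
        X[:, 0] = np.fft.fft(xb[:, k * M:k * M + N], axis=1)     # (4.22), (4.23)
        y_circ = np.fft.ifft(np.sum(X * H, axis=(0, 1))).real     # (4.24)
        e = np.zeros(N)
        e[M:] = y[k * M:(k + 1) * M] - y_circ[M:]                 # windowed error
        E = np.fft.fft(e)                                         # (4.25)
        s = gamma * s + (1.0 - gamma) * np.abs(X[:, 0]) ** 2      # assumed averaging
        Xn = np.roll(Xn, 1, axis=1)
        Xn[:, 0] = mu * np.conj(X[:, 0]) / (s + delta)            # (4.27)
        h_time = np.fft.ifft(H + Xn * E, axis=2).real             # (4.28)
        h_time[:, :, M:] = 0.0                                    # W_10 constraint
        H = np.fft.fft(h_time, axis=2)                            # (4.29)
        e_out[k * M:(k + 1) * M] = e[M:]
    h_hat = np.fft.ifft(H, axis=2).real[:, :, :M].reshape(B, P * M)
    return h_hat, e_out

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 24000)
fns = [lambda v: v, lambda v: v ** 2]
h_true = rng.normal(size=(2, 16)) * np.exp(-0.2 * np.arange(16))
y = sum(np.convolve(f(x), h)[:len(x)] for f, h in zip(fns, h_true))
h_hat, e = pb_fnlms_cgm(x, y, fns, M=8, P=2)
```

Dropping the two lines that implement the constraint yields the unconstrained update of Eq. (4.28) and saves two DFTs per partition and branch, at the price of cyclic convolution artifacts in the partitions.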

4.3.2 Filtered-X Adaptation for Bilinear Cascade Models

In this section, the FX algorithm will first be introduced and discussed on a conceptual level for bilinear CMs, before specifying it in detail and proposing an efficient realization for CGM preprocessors with single-tap branch filters (i.e., parametric preprocessors according to Eq. (4.8)).

The Generic Filtered-X Structure

FX algorithms are frequently employed in Active Noise Control (ANC) [64] for prefilter adaptation. In doing so, the FX algorithm exploits the fact that the ordering of linear time-invariant systems in a cascade can be reversed without altering the cascade's output: for the preprocessor-based CM in Fig. 4.4B, joint linearity and time invariance allow incorporating h into each of the branches, as depicted in Fig. 4.5. Therein, the prefiltered inputs to the former prefilters $w_{b}$ are often termed FX signals. The representation of the CM in Fig. 4.5 allows one to directly apply an NLMS-type adaptation algorithm for linear MISO systems from Section 4.3.1 to adapt $w_{b}$ , as their inputs are known and their combined output is directly matched to the target signal (in system identification: an unknown system's output). Mathematically, an FX LMS algorithm can also be derived directly, like an LMS algorithm, as a stochastic gradient descent algorithm based on the partial derivative of the cost function w.r.t. the prefilters. The exchange of the filtering orders then results simply from an advantageous summation order in the gradient calculation. Note that adaptive filters for $w_{b}$ and h actually violate the time invariance required for exchanging the filter order. However, the time variance appears to be uncritical in practice, where FX algorithms have been employed for ANC for more than three decades [64,65].
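For fixed (time-invariant) parameters, the equivalence of the two signal flows can be verified with a few lines of NumPy. The function names and the chosen branch nonlinearities are hypothetical and serve only as an illustration.

```python
import numpy as np

def cm_output(x, w, h, fns):
    """Weight the branch signals first, then filter with h (Fig. 4.4B order)."""
    x_pp = sum(w_b * f(x) for w_b, f in zip(w, fns))
    return np.convolve(x_pp, h)[:len(x)]

def fx_output(x, w, h, fns):
    """Filter every branch signal with h first, then weight (FX order)."""
    x_fx = [np.convolve(f(x), h)[:len(x)] for f in fns]   # filtered-X signals
    return sum(w_b * s for w_b, s in zip(w, x_fx)), x_fx

rng = np.random.default_rng(3)
x = rng.normal(size=50)
w = np.array([1.0, 0.3, -0.1])
h = rng.normal(size=6)
fns = [lambda v: v, lambda v: v ** 2, lambda v: v ** 3]
y_cm = cm_output(x, w, h, fns)
y_fx, x_fx = fx_output(x, w, h, fns)
```

In the FX ordering, the signals in `x_fx` act as known inputs of a linear MISO system with the single-tap filters `w`, which is what makes the standard NLMS machinery applicable to the weights.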

Figure 4.5 A classical FX structure for an HM with parametric preprocessor according to Eq. (4.8) as a result of exchanging the filtering order of h and w_b in Fig. 4.4B (exploiting linearity and assuming time invariance), which forms the common core of a class of adaptive algorithms for nonlinear system identification [24].

Review of Known Algorithms for Filtered-X Adaptation of Bilinear Cascade Models

Many FX-like algorithms have been derived for the adaptation of bilinear LIP nonlinear systems but do not establish the link to the filtered-X algorithms and the exchange of filtering orders. The following description of these methods will highlight the fact that these algorithms can be viewed and implemented as FX algorithms.

In [24], the FX preprocessor adaptation was derived and applied to the adaptation of a polynomial preprocessor with $f_{b} {\cdot} = {(\cdot)}^{b}$ —a nonadaptive offline version of this mechanism was proposed in [23], which resembles the iterative version of [51] for AutoRegressive Moving Average (ARMA) models as linear submodels. More recent work on such cascade models considered a generalization of [24] by allowing longer IRs than a single tap [37,66] or considered piece-wise linear functions for $f_{b} {\cdot}$ in conjunction with a modified joint normalization of the linear and nonlinear filter coefficients [27]. In particular, [37] employs a power filter as preprocessor to the linear subsystem. The resulting algorithm corresponds to a time-domain FX algorithm. In [66], a particular CGM preprocessor is discussed, which corresponds to a Volterra filter in triangular representation (see, e.g., [19] for the triangular representation) with memory lengths of $N_{1}$ and $N_{2}$ for the kernels of orders 1 and 2, respectively, and no memory at all for higher-order Volterra kernels (becoming a simple memoryless preprocessor³ for these orders). Structures similar to this preprocessor also result from the so-called EVOLutionary Volterra Estimation (EVOLVE), which will be outlined later on in Section 4.6. While evaluating their algorithm for a memoryless preprocessor ( $N_{1} = N_{2} = 0$ ), [66] also evaluates an extension of [24] by adapting the linear subsystem with a frequency-domain NLMS with unconstrained update. The preprocessor is adapted, as in [24], by a Recursive Least Squares (RLS)-like algorithm based on the FX signals. Still, the two RLS descriptions differ as [66] employs the direct inversion of the involved correlation matrix, whereas [24] makes use of the alternative variant with a recursively computed inverse (see [57] for RLS-adaptive filters). 
Another extension of [24] is treated in [25] (in German), where the signal flow and benefit of a subband-AEC variant of [24] is discussed on a theoretical level.

Note that none of these approaches describes the preprocessor adaptation explicitly as an FX algorithm.⁴ As opposed to this, the remainder of this section introduces an FX PB-FNLMS algorithm for CMs with CGM preprocessors and tailors this algorithm to the special case of memoryless preprocessors according to Eq. (4.8).

Filtered-X Adaptation of Preprocessors of Partitioned-Block CMs

In the following, consider the structure of Fig. 4.4B as a single-tap CGM, for which DFT-domain branch signal vectors can be computed according to Eq. (4.23). Based on that, preprocessed DFT-domain input signal vectors can be determined by

${\underline{x}}_{pp, κ} = \sum_{b = 1}^{B} {\hat{w}}_{b} (κ - 1) {\underline{x}}_{(b), κ}$

(4.30)

and the DFT-domain FX signal vectors ${\underline{x}}_{FX, (b), κ}$ (see Fig. 4.5 illustrating the FX signals) can be computed by partitioned convolution in a two-step procedure as

${\underline{x}}_{FX, (b), κ}^{\circ} = \sum_{p = 0}^{P - 1} {\underline{x}}_{(b), κ - p} ⊙ {\underline{\hat{h}}}_{κ - 1}^{(p)}$

(4.31)

${\underline{x}}_{FX, (b), κ} = F \underbrace{[\begin{matrix} [0_{M}, I_{M}] x_{FX, (b), κ - 1} \\ [0_{M}, I_{M}] F^{H} {\underline{x}}_{FX, (b), κ}^{\circ} \end{matrix}]}_{x_{FX, (b), κ}} .$

(4.32)

With the introduced signals, the adaptive filter's output can alternatively be determined by one of the two equations

${\hat{y}}_{κ} = W_{01} F^{H} \sum_{p = 0}^{P - 1} {\underline{x}}_{pp, κ - p} ⊙ {\underline{\hat{h}}}_{κ - 1}^{(p)}$

(4.33)

$\approx W_{01} F^{H} \underbrace{\sum_{b = 1}^{B} {\hat{w}}_{b} (κ - 1) {\underline{x}}_{FX, (b), κ}}_{{\underline{y}}_{κ}^{\circ}},$

(4.34)

where Eq. (4.33) corresponds to block processing with the signal flow in Fig. 4.4B and Eq. (4.34) corresponds to the signal flow in Fig. 4.5. For time-invariant systems, Eqs. (4.33) and (4.34) are identical. This allows one to compute the error signal vector $e_{κ} = y_{κ} - {\hat{y}}_{κ}$ and its DFT

${\underline{e}}_{κ} = F e_{κ},$

(4.35)

both of which will be used for NLMS-type adaptation of the parameters. Analogously to Section 4.3.1.2, a PB-FNLMS adaptation of ${\underline{\hat{h}}}_{κ}^{(p)}$ can be expressed as

${\hat{\underline{h}}}_{κ}^{\circ (p)} = {\underline{\hat{h}}}_{κ - 1}^{(p)} + {\underline{x}}_{norm, pp, κ - p} ⊙ {\underline{e}}_{κ},$

(4.36)

${\underline{\hat{h}}}_{κ}^{(p)} = F \underbrace{W_{10} F^{H} ({\hat{\underline{h}}}_{κ}^{\circ (p)})}_{{\hat{h}}_{κ}^{(p)}},$

(4.37)

where ${\underline{x}}_{norm, pp, κ}$ is the normalized preprocessed input signal vector computed analogously to Eqs. (4.26) and (4.27) (containing the adaptation step size) and where Eq. (4.36) and Eq. (4.37) correspond to the unconstrained and the constrained update, respectively.

Similar to [24], the weights ${\hat{w}}_{b} (κ)$ are estimated by an NLMS algorithm employing the FX signals as

${\hat{w}}_{b} (κ) = {\hat{w}}_{b} (κ - 1) + \frac{μ_{FX}}{E_{b} + δ} 〈 x_{FX, (b), κ}, e_{κ} 〉$

(4.38)

with the adaptation step size $μ_{FX}$ and

$E_{b} = 〈 x_{FX, (b), κ}, x_{FX, (b), κ} 〉 .$

(4.39)
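The weight update of Eqs. (4.38) and (4.39) can be sketched in a few lines. This is a minimal illustration, assuming real-valued time-domain FX frames stored as plain Python lists; the function name `fx_weight_update` and the default value of the regularization `delta` are hypothetical and not taken from the chapter.

```python
def fx_weight_update(w_prev, x_fx, e, mu_fx=0.1, delta=1e-6):
    """One NLMS step for all preprocessor weights, cf. Eqs. (4.38) and (4.39).

    w_prev : list of B weights w_b(kappa - 1)
    x_fx   : list of B filtered-X branch signal frames (lists of floats)
    e      : error signal frame (same length as each x_fx[b])
    """
    w_new = []
    for w_b, x_b in zip(w_prev, x_fx):
        energy = sum(v * v for v in x_b)                       # E_b, Eq. (4.39)
        correlation = sum(v * err for v, err in zip(x_b, e))   # <x_FX,(b), e>
        w_new.append(w_b + mu_fx / (energy + delta) * correlation)
    return w_new


# Toy frames (illustrative values only): two branches, four samples each.
w = fx_weight_update(
    [1.0, 0.0],
    [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]],
    [0.5, 1.0, 0.0, 0.0],
    mu_fx=1.0,
    delta=0.0,
)
```

Each weight is normalized by the energy of its own FX branch signal, so branches with very different signal levels adapt at comparable speeds.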

For the more general case where a partitioned-block CGM with $P_{pp}$ partitions ${\underline{\hat{h}}}_{(b), κ}^{(p)}$ replaces the preprocessor with weights ${\hat{w}}_{b} (κ)$ , all linear combinations with the weights, like in Eq. (4.30), have to be replaced by actual partitioned convolutions and PB-FNLMS updates can be computed for the preprocessor according to

${\hat{\underline{h}}}_{(b), κ}^{\circ (p)} = {\underline{\hat{h}}}_{(b), κ - 1}^{(p)} + {\underline{x}}_{FX, norm, (b), κ - p} ⊙ {\underline{e}}_{κ}$

(4.40)

and

${\underline{\hat{h}}}_{(b), κ}^{(p)} = F \underbrace{W_{10} F^{H} ({\hat{\underline{h}}}_{(b), κ}^{\circ (p)})}_{{\hat{h}}_{(b), κ}^{(p)}}$

(4.41)

with the normalized DFT-domain FX branch signal vectors ${\underline{x}}_{FX, norm, (b), κ - p}$ determined from the FX branch signals $x_{FX, (b), κ}$ analogously to Eqs. (4.26) and (4.27). This corresponds to a generalization of [37] from time-domain power filter preprocessor adaptation to PB-FNLMS adaptation of general CGM preprocessors.

Thereby, the adaptation of bilinear CMs has been decomposed into two ordinary PB-FNLMS system identification tasks, plus the additional prefiltering operations generating the FX branch signals. All these components can be implemented efficiently using partitioned-block convolution techniques.

An Adaptation Tailored to Single-Tap CGM Preprocessors

In the following, the implications of a parametric preprocessor according to Eq. (4.30) (a single-tap CGM preprocessor, cf. Fig. 4.4B) will be considered in detail. Interestingly, Eq. (4.34) does not introduce cyclic convolution artifacts into ${\underline{y}}_{κ}^{\circ}$ for single-tap filters ${\hat{w}}_{b} (κ)$ . Consequently, the cyclic convolution artifacts present in $x_{FX, (b), κ}^{\circ}$ do not spread by filtering with the weights, which allows one to compute the output estimate

${\hat{y}}_{κ} = W_{01} F^{H} \sum_{b = 1}^{B} {\hat{w}}_{b} (κ - 1) \cdot {\underline{x}}_{FX, (b), κ}^{\circ}$

(4.42)

without Eq. (4.32). Furthermore, the zeros in the vector $e_{κ}$ and the unitarity of the DFT imply

$〈 x_{FX, (b), κ}, e_{κ} 〉 = 〈 x_{FX, (b), κ}^{\circ}, e_{κ} 〉$

(4.43)

$= 〈 {\underline{x}}_{FX, (b), κ}^{\circ}, {\underline{e}}_{κ} 〉 .$

(4.44)

Further identifying

$E_{b} \approx E_{b}^{\circ} = 〈 {\underline{x}}_{FX, (b), κ}^{\circ}, {\underline{x}}_{FX, (b), κ}^{\circ} 〉$

(4.45)

as a reasonable approximation of the branch signal energy within a frame allows one to compute the weights according to

${\hat{w}}_{b} (κ) = {\hat{w}}_{b} (κ - 1) + \frac{μ_{FX}}{E_{b}^{\circ} + δ} 〈 {\underline{x}}_{FX, (b), κ}^{\circ}, {\underline{e}}_{κ} 〉$

(4.46)

without Eq. (4.32) as well. Thus, Eq. (4.32) does not need to be evaluated at all, which allows saving 2B DFTs of length N.
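The step from Eq. (4.43) to Eq. (4.44) relies only on the DFT preserving inner products. The following is a minimal numerical check, assuming a unitary DFT (scaled by $1/\sqrt{N}$) and the inner-product convention of conjugating the first argument; the helper names and the toy vectors are illustrative and not taken from the chapter.

```python
import cmath
import math


def unitary_dft(x):
    """Naive O(N^2) DFT with 1/sqrt(N) scaling, so that the transform is unitary."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inner(a, b):
    """Inner product sum_k conj(a_k) * b_k (convention assumed for this sketch)."""
    return sum(complex(u).conjugate() * v for u, v in zip(a, b))


# Toy frame of length N = 8; the error frame is zero in its first half.
x_fx = [0.3, -1.2, 0.7, 0.1, 0.0, 2.0, -0.4, 0.9]
e = [0.0, 0.0, 0.0, 0.0, 0.5, -0.2, 0.1, 0.3]

time_domain = inner(x_fx, e)
dft_domain = inner(unitary_dft(x_fx), unitary_dft(e))
```

If the DFT is left unscaled instead, both the numerator and $E_{b}^{\circ}$ in Eq. (4.46) pick up the same factor N, so the update changes only through the relative size of δ.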

The remaining DFTs and IDFTs are $B + 2$ transforms for computing DFT-domain branch signals prior to Eq. (4.31), the output estimate in Eq. (4.42), and the DFT-domain error signal in Eq. (4.35), as well as 2P DFTs for a constrained update of the partitioned linear subsystem in Eq. (4.37). The computational effort for non-DFT operations has a component growing proportional to $B \cdot P$ due to Eq. (4.31). However, as the effort for a length-N DFT is typically much higher than the effort for a vector multiplication in Eq. (4.31), the number of DFTs may serve as a first indicator of the computational complexity. Thus, the relative computational complexity $R_{CGM}^{FX-CM}$ of an FX CM and a full-length CGM can be approximated by

${\tilde{R}}_{CGM}^{FX-CM} = \frac{B + 2 + 2 P}{B + 2 + 2 B P} + ε,$

(4.47)

which is the ratio of the numbers of involved DFTs plus an overhead ε representing the number of operations which are not DFTs and grow with $B \cdot P$ in both the FX-CM and the CGM (filtering and filter coefficient updates). To give an example, $P = 4$ and $B = 5$ leads to ${\tilde{R}}_{CGM}^{FX-CM} \approx 0.32 + ε$ and suggests that the CM can be identified with about one-third of the effort of a CGM of the same length.
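As a worked check of the DFT count behind this example, inserting $B = 5$ and $P = 4$ into Eq. (4.47) and disregarding ε gives

$\frac{B + 2 + 2 P}{B + 2 + 2 B P} = \frac{5 + 2 + 2 \cdot 4}{5 + 2 + 2 \cdot 5 \cdot 4} = \frac{15}{47} \approx 0.32 ,$

that is, 15 transforms per frame for the FX CM compared with 47 for the full-length CGM.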

4.3.3 Summary of Algorithms

The main algorithms presented in Section 4.3 are summarized in tabular form to support an implementation in practice. The most basic algorithm is the time-domain NLMS algorithm summarized in Table 4.1. The operations listed therein have to be evaluated at each sample index n. For processing an entire block of M subsequent samples, as in block- and frequency-domain processing, all equations in Table 4.1 have to be evaluated sample by sample for each of the block's samples—M times in total. For PB-FNLMS adaptation, the operations for processing a block of M samples are summarized in Table 4.2. The operations listed therein have to be executed for each signal frame κ (once every M samples). In the same way, the operations required for FX adaptation of a bilinear CM with a single-tap CGM as nonlinear preprocessor are listed in Table 4.3.

Table 4.1

Summary of operations of a time-domain NLMS algorithm for CGMs as executed for each sampling instant n

computed quantity	computation formula		equation no.
branch signals	$x_{b} (n) = f_{b} {x (n)}$	∀b	Eq. (4.11)
time-reversed branch signal vectors	${\overset{ˇ}{x}}_{(b), n} = {[x_{b} (n), \dots, x_{b} (n - L + 1)]}^{T}$	∀b	–
output signal	$\hat{y} (n) = \sum_{b = 1}^{B} 〈 {\hat{h}}_{(b), n - 1}, {\overset{ˇ}{x}}_{(b), n} 〉$		Eq. (4.12)
error signal	$e (n) = y (n) - \hat{y} (n)$		Eq. (4.13)
frame energy	$E_{x_{b}} (n) = 〈 {\overset{ˇ}{x}}_{(b), n}, {\overset{ˇ}{x}}_{(b), n} 〉$	∀b	Eq. (4.15)
filter coefficients	${\hat{h}}_{(b), n} = {\hat{h}}_{(b), n - 1} + μ_{b} \cdot \frac{e (n)}{E_{x_{b}} (n) + δ} {\overset{ˇ}{x}}_{(b), n}$	∀b	Eq. (4.14)

Table 4.2

Summary of operations of a PB-FNLMS algorithm for CGMs as executed for each frame κ

computed quantity	computation formula		equation no.
branch signals	$x_{(b), κ} = [\begin{matrix} [0_{M}, I_{M}] x_{(b), κ - 1} \\ {[f_{b} {x (κ M)}, \dots, f_{b} {x (κ M + M - 1)}]}^{T} \end{matrix}]$	∀b	Eq. (4.22)
∼ in DFT domain	${\underline{x}}_{(b), κ} = F x_{(b), κ}$	∀b	Eq. (4.23)
DFT-domain output estimate with cyclic convolution artifacts	${\hat{\underline{y}}}_{κ}^{\circ} = \sum_{b = 1}^{B} \sum_{p = 0}^{P - 1} {\underline{x}}_{(b), κ - p} ⊙ {\underline{\hat{h}}}_{(b), κ - 1}^{(p)}$		Eq. (4.24)
DFT-domain error signal	${\underline{e}}_{κ} = F (y_{κ} - W_{01} F^{H} {\hat{\underline{y}}}_{κ}^{\circ})$		Eq. (4.25)
branch PSDs		∀b	Eq. (4.26)
normalized branch signals	${\underline{x}}_{norm, (b), κ} = μ_{b} {({\underline{x}}_{(b), κ})}^{⁎} ⊘ (s_{x_{b}, κ} + δ)$	∀b	Eq. (4.27)
updated filters	${\hat{\underline{h}}}_{(b), κ}^{\circ (p)} = {\underline{\hat{h}}}_{(b), κ - 1}^{(p)} + {\underline{x}}_{norm, (b), κ - p} ⊙ {\underline{e}}_{κ}$	∀b,p	Eq. (4.28)
constrained update	${\underline{\hat{h}}}_{(b), κ}^{(p)} = F W_{10} F^{H} ({\hat{\underline{h}}}_{(b), κ}^{\circ (p)})$	∀b,p	Eq. (4.29)

Table 4.3

Summary of operations of an FX algorithm tailored to the adaptation of bilinear CMs with single-tap CGM preprocessors as executed for each frame κ

computed quantity	computation formula		equation no.
branch signals	$x_{(b), κ} = [\begin{matrix} [0_{M}, I_{M}] x_{(b), κ - 1} \\ {[f_{b} {x (κ M)}, \dots, f_{b} {x (κ M + M - 1)}]}^{T} \end{matrix}]$	∀b	Eq. (4.22)
∼ to DFT domain	${\underline{x}}_{(b), κ} = F x_{(b), κ}$	∀b	Eq. (4.23)
DFT-domain preprocessed signal	${\underline{x}}_{pp, κ} = \sum_{b = 1}^{B} {\hat{w}}_{b} (κ - 1) {\underline{x}}_{(b), κ}$		Eq. (4.30)
DFT-domain FX branch signals with cyclic convolution artifacts	${\underline{x}}_{FX, (b), κ}^{\circ} = \sum_{p = 0}^{P - 1} {\underline{x}}_{(b), κ - p} ⊙ {\underline{\hat{h}}}_{κ - 1}^{(p)}$	∀b,p	Eq. (4.31)
output signal	${\hat{y}}_{κ} = W_{01} F^{H} \sum_{b = 1}^{B} {\hat{w}}_{b} (κ - 1) \cdot {\underline{x}}_{FX, (b), κ}^{\circ}$		Eq. (4.42)
error signal	$e_{κ} = y_{κ} - {\hat{y}}_{κ}$		–
∼ to DFT domain	${\underline{e}}_{κ} = F e_{κ}$		Eq. (4.35)
preprocessed signal PSD			–
normalized input	${\underline{x}}_{norm, pp, κ} = {({\underline{x}}_{pp, κ})}^{⁎} ⊙ (μ_{b} ⊘ (s_{x_{pp}, κ} + δ))$		–
updated filters	${\hat{\underline{h}}}_{κ}^{\circ (p)} = {\underline{\hat{h}}}_{κ - 1}^{(p)} + {\underline{x}}_{norm, pp, κ - p} ⊙ {\underline{e}}_{κ}$	∀p	Eq. (4.36)
constrained update	${\underline{\hat{h}}}_{κ}^{(p)} = F \underbrace{W_{10} F^{H} ({\hat{\underline{h}}}_{κ}^{\circ (p)})}_{{\hat{h}}_{κ}^{(p)}}$	∀p	Eq. (4.37)
FX energies	$E_{b} \approx E_{b}^{\circ} = 〈 {\underline{x}}_{FX, (b), κ}^{\circ}, {\underline{x}}_{FX, (b), κ}^{\circ} 〉$	∀b > 1	Eq. (4.45)
weight update	${\hat{w}}_{b} (κ) = {\hat{w}}_{b} (κ - 1) + \frac{μ_{FX}}{E_{b}^{\circ} + δ} 〈 {\underline{x}}_{FX, (b), κ}^{\circ}, {\underline{e}}_{κ} 〉$	∀b > 1	Eq. (4.46)

These algorithms will be employed as building blocks of the computationally very efficient SA filters in the following section, Section 4.4. Therein, the aforementioned basic algorithms will only be referenced without repeating all equations.

4.4 Significance-Aware Filtering

In [28], the concept of SA filtering has been introduced, in order to estimate the parameters of nonlinear CMs by decomposing the adaptation process in a divide-and-conquer manner into synergetic adaptive subsystems [28,30–32] which are LIP. Thereby, the estimation of the nonlinearity is separated from the estimation of a possibly long linear subsystem, as it would be characteristic for acoustic echo paths. As a consequence, the nonlinearity can be estimated computationally efficiently by nonlinear models with a very low number of parameters.

In the following, two different SA decompositions will be introduced in Section 4.4.1 and employed for the Serial SA CMs (SSA-CMs) in Section 4.4.2 (a generalization of the Equalization-based SA (ESA)-HM from [32]), the Parallel SA CGMs (PSA-CGMs) in Section 4.4.3 (a generalization of the SA-HGM from [32]) and a novel Parallel SA Filtered-X (PSAFX) algorithm in Section 4.4.4.

4.4.1 Significance-Aware Decompositions of LIP Nonlinear Systems

In [32], two different SA decompositions have been employed, which will be denoted Serial SA (SSA) decomposition (employed for ESA filtering in [32]) and Parallel SA (PSA) decomposition (employed for so-called “classical” SA filtering in [32]). Both decompositions allow for the estimation of the nonlinearity with a low-complexity nonlinear adaptive filter, as will be explained in the following.

Serial Significance-Aware Decomposition

The SSA decomposition is depicted schematically in Fig. 4.6, where the unknown system is cascaded with an equalizer filter ${\hat{h}}_{eq}$ . The overall response of this serial connection will ideally be a (delayed) version of the nonlinear preprocessor $f {\cdot}$ . Thus, an estimate for the physically inaccessible intermediate signal $x_{NL} (n)$ is obtained, which allows one to estimate the nonlinearity directly from this decomposition efficiently by a nonlinear model with a short temporal support. An adaptive system using this decomposition will be specified in Section 4.4.2.

Figure 4.6 Serial decomposition of the unknown system exploited for SSA filtering. Cascading the unknown system with a linear equalizer filter allows one to approximate the cascade as the nonlinear component of the unknown system (disregarding the processing delay due to causal filtering).

Parallel Significance-Aware Decomposition

The PSA decomposition is illustrated in Fig. 4.7. The LIP property of a nonlinear CM is employed to decompose its IR vector into a dominant region $h_{d}$ (light gray), describing, e.g., direct-path propagation, and a complementary region $h_{c}$ (dark gray). Augmenting the unknown system with a parallel system modeling the complementary-region output signal component $y_{c} (n)$ allows one to compute the dominant-region output component $y_{d} (n) = y (n) - y_{c} (n)$ . As $f_{} {\cdot}$ , $x_{NL} (n)$ , $h_{c}$ , and thereby $y_{c} (n)$ are not known during system identification, they have to be replaced by estimates $f_{pp} {\cdot}$ , $x_{pp} (n)$ , ${\hat{h}}_{c}$ and ${\hat{y}}_{c} (n)$ in practice, as depicted in the upper part of Fig. 4.7. Still, the output of the parallel arrangement yields an estimate $e_{\bar{c}} (n) = y (n) - {\hat{y}}_{c} (n) \approx y_{d} (n)$ of the dominant-region component of the unknown system (bottom of Fig. 4.7). As the dominant-region component of the system describes the transmission of a significant amount of energy, a nonlinear dominant-region model can be estimated at a high Signal to Noise Ratio (SNR) and will suffer much less from gradient noise than a reverberation tail model. Adaptive systems using this decomposition will be specified in Sections 4.4.3 and 4.4.4.

Figure 4.7 Parallel decomposition of the unknown system exploited for PSA filtering. The temporal support modeled by IR vectors is illustrated below the respective vectors' symbols as follows. The entire vector h is represented by a flat white box with a gray frame. Nonzero entries are marked by light gray boxes if they correspond to the dominant region and by dark gray boxes if they describe the complementary region.

Estimating a Preprocessor System From a CGM

Another key component for the SA models described in Sections 4.4.2 and 4.4.3 below is the possibility to extract a CM's preprocessor from an identified CGM-structured estimate of the CM system. To this end, assume that the CM's nonlinearity can be expressed as a weighted sum of the CGM's branch nonlinearities $f_{b} {\cdot}$ , resulting in an expression identical to Eq. (4.8). The CM's linear subsystem will be denoted h. Then, the CM can be expressed as a CGM with linear subsystems $h_{(b)} = w_{b} h$ . In practice, an identified CGM can only provide noisy estimates

${\hat{h}}_{(b)} = w_{b} h + ϵ_{b},$

(4.48)

where $ϵ_{b}$ is the coefficient error vector. As shown in [28], a least squares estimate for $w_{b}$ can be obtained by

${\hat{w}}_{b}^{(LS)} = \frac{〈 {\hat{h}}_{(b)}, {\hat{h}}_{(b_{ref})} 〉}{〈 {\hat{h}}_{(b_{ref})}, {\hat{h}}_{(b_{ref})} 〉}$

(4.49)

w.r.t. a reference branch $h_{(b_{ref})}$ , for which ${\hat{w}}_{b_{ref}}^{(LS)} = 1$ . Note that ${\hat{w}}_{b_{ref}}^{(LS)} = 1$ is not a model restriction—it simply removes the scaling ambiguity inherent to the cascade model by shifting the actual $w_{b_{ref}}$ as gain factor into the estimate $\hat{h}$ to be identified after the preprocessor. Without loss of generality, $b_{ref} = 1$ and $f_{1} {\cdot} = (\cdot)$ will be assumed from now on. A preprocessor with $w_{b} = 0 \forall b > 1$ will be referred to as linearly configured preprocessor.
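The least squares extraction of Eq. (4.49), applied to identified branch IRs, can be sketched as follows. The function name and the toy IR are hypothetical; the zero-based index `b_ref=0` corresponds to the reference branch $b_{ref} = 1$ of the text.

```python
def ls_preprocessor_weights(h_branches, b_ref=0):
    """Least squares weights relative to a reference branch, cf. Eq. (4.49).

    h_branches : list of B identified branch IR vectors (lists of floats)
    b_ref      : zero-based index of the reference branch
    """
    h_ref = h_branches[b_ref]
    ref_energy = sum(v * v for v in h_ref)   # must be nonzero
    return [
        sum(a * r for a, r in zip(h_b, h_ref)) / ref_energy
        for h_b in h_branches
    ]


# Toy CGM whose branches are exact scaled copies of one IR (no estimation noise).
h = [1.0, 0.5, -0.25]
branches = [[1.0 * v for v in h], [0.3 * v for v in h], [-0.1 * v for v in h]]
weights = ls_preprocessor_weights(branches)
```

By construction the reference branch receives the weight 1, which mirrors how the scaling ambiguity of the cascade is resolved in the text.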

4.4.2 Serial Significance-Aware Cascade Models

SSA-CMs employ the SSA decomposition from Fig. 4.6 and generalize the Equalization-based SA HMs (ESA-HMs) from [32] by allowing branch nonlinearities with memory. The structure of an SSA-CM is depicted in Fig. 4.8. Initially, the preprocessor in Block A in Fig. 4.8, providing the signal $x_{pp} (n)$ , is configured to be linear. After this initialization, all subsystems in Fig. 4.8 are adapted in parallel. In Block C in Fig. 4.8, an adaptive linear equalizer implements the SSA decomposition. The equalizer filters the unknown system's output $y (n)$ to produce an estimate ${\hat{x}}_{NL} (n - N_{eq})$ of the delayed nonlinearly distorted input signal. For adaptation, ${\hat{x}}_{NL} (n - N_{eq})$ is matched to $x_{pp} (n - N_{eq})$ .⁵ Ideally, $x (n - N_{eq})$ and ${\hat{x}}_{NL} (n - N_{eq})$ are time-aligned as well and related by $f {\cdot}$ (cf. Fig. 4.6). Thus, the nonlinear relationship between $x (n - N_{eq})$ and ${\hat{x}}_{NL} (n - N_{eq})$ can be estimated by a very short adaptive CGM of $L = L_{SA}$ taps in Block B in Fig. 4.8. Although $L_{SA} = 1$ would be sufficient in the case of a perfect equalization of h, a very small $L_{SA}$ of about $L_{SA} = 3$ is reasonable in practice due to the adaptive, imperfect equalizer. Between Blocks B and A in Fig. 4.8, the identified CGM and Eq. (4.49) are employed to estimate the coefficients of a preprocessor which combines the CGM's branch signals according to Eq. (4.8). Analogously to [32], the preprocessor coefficients ${\hat{w}}_{b}$ are estimated on a frame-wise basis as

${\hat{w}}_{b} (κ) = γ_{w} {\hat{w}}_{b} (κ - 1) + (1 - γ_{w}) {\tilde{w}}_{b} (κ),$

(4.50)

where $γ_{w}, 0 \leq γ_{w} < 1$ , is a recursive-smoothing constant ( $γ_{w} = 0.95$ will be used by default for the simulations in Section 4.5) and ${\tilde{w}}_{b} (κ)$ is computed according to Eq. (4.49) from the instantaneous estimates of the branch IR vectors of the CGM of Block B. The preprocessor with weights ${\hat{w}}_{b} (κ)$ is used in a CM with an adaptive linear subsystem to model the entire unknown system (Block A in Fig. 4.8).
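The frame-wise recursive smoothing of Eq. (4.50) amounts to a first-order recursion per weight. A minimal sketch, with a hypothetical function name and toy values; only the default $γ_{w} = 0.95$ is taken from the text.

```python
def smooth_weights(w_prev, w_inst, gamma_w=0.95):
    """Recursive smoothing of the preprocessor weights, cf. Eq. (4.50)."""
    if not 0.0 <= gamma_w < 1.0:
        raise ValueError("gamma_w must satisfy 0 <= gamma_w < 1")
    return [
        gamma_w * w_p + (1.0 - gamma_w) * w_i
        for w_p, w_i in zip(w_prev, w_inst)
    ]


# Linearly configured start, constant instantaneous estimate (toy values).
w_first = smooth_weights([1.0, 0.0], [1.0, 0.3])
w_late = [1.0, 0.0]
for _ in range(200):
    w_late = smooth_weights(w_late, [1.0, 0.3])
```

With $γ_{w} = 0.95$, the distance to a constant instantaneous estimate shrinks by 5% per frame, so the smoothed weight settles after a few tens of frames while single noisy estimates have little influence.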

Figure 4.8 Proposed novel SSA filtering: an estimate ${\hat{x}}_{NL} (n)$ of the nonlinearly distorted input signal is obtained in Block C by equalizing the system output signal $y (n)$ with a linear equalizer, which allows in Block B to assess the nonlinear distortions by a very short CGM, which in turn is employed for estimating the parameters of the nonlinear preprocessor $f_{pp} {\cdot}$ of Block A.

The SSA-CM structure according to Fig. 4.8 is efficient, because it requires only two adaptive linear filters with long temporal support (in Blocks A and C), both of which will be assumed to have length L, and a CGM with a very low number of taps $L_{SA}$ . A number of $L_{SA} = 3$ taps will be assumed by default in the following. The long adaptive filters can be realized efficiently using a PB-FNLMS according to Section 4.3.1.2 in a block processing scheme and the CGM can be implemented due to its short length at very low computational effort by a time-domain NLMS according to Section 4.3.1.1. Hence, the computational effort for an SSA-CM is roughly twice as high as that for a linear filter. The relative complexity compared with a CGM, $R_{CGM}^{SSA-CM}$ , is about

${\tilde{R}}_{CGM}^{SSA-CM} = \frac{2 \cdot P}{B \cdot P} = \frac{2}{B},$

where the contribution of the length- $L_{SA}$ CGM has been disregarded. For $B = 5$ branches and $P = 4$ partitions, this leads to ${\tilde{R}}_{CGM}^{SSA-CM} = 0.40$ .

4.4.3 Parallel Significance-Aware Cascade Group Models

PSA-CGMs generalize the SA-HGMs from [28,31] by allowing branch nonlinearities with memory and are structurally identical otherwise. The structure of such a PSA-CGM is depicted in Fig. 4.9. For an explanation of the general concept, assume that the temporal support of the unknown system's dominant region (cf. Fig. 4.7) is known. In this case, all adaptive systems in Fig. 4.9 can be adapted simultaneously.

Figure 4.9 Basic structure of a PSA-CGM: a B-branch CGM models only the dominant-part propagation (Block C), whereas the long IR is covered by a simple preprocessor system (Block A).

Block A of Fig. 4.9 contains a CM with adaptive linear subsystem (minimizing the error signal $e_{CM} (n)$ ). The nonlinear parametric preprocessor $f_{pp} {\cdot}$ is initially configured as linear function and is refined separately from the IR. The CM's IR estimate is partitioned into a dominant region ${\hat{h}}_{d}$ (upper branch in Block A) and a complementary region ${\hat{h}}_{c}$ (lower branch in Block A). This allows one to apply the PSA decomposition (cf. Fig. 4.7) to the unknown system in Block B in Fig. 4.9: subtracting the complementary-region estimate ${\hat{y}}_{c} (n)$ from the unknown system's output $y (n)$ results in the dominant-region dominated error signal $e_{\bar{c}} (n)$ . This enables in Block C of Fig. 4.9 the identification of the unknown system's dominant region alone by a CGM with short temporal support (as short as ${\hat{h}}_{d}$ ). The CGM's IRs ${\hat{h}}_{(b)}$ allow one to extract the preprocessor coefficients for the CM in Block A of Fig. 4.9 analogously to Section 4.4.2. It is worth noting that a PSA-CGM offers two different error signals: $e_{CM} (n)$ , resulting from a CM output signal estimate ${\hat{y}}_{CM} (n)$ , and $e_{SA} (n)$ , resulting from ${\hat{y}}_{SA} (n) = {\hat{y}}_{c} (n) + {\hat{y}}_{CGM} (n)$ (not shown in Fig. 4.9). The estimate ${\hat{y}}_{SA} (n)$ is obtained with a CGM as dominant-path model and represents a more general model than the CM in Block A of Fig. 4.9. If the unknown system has a CM structure, ${\hat{y}}_{CM} (n)$ may be slightly more accurate than ${\hat{y}}_{SA} (n)$ , because the CGM has more degrees of freedom than necessary and is more affected by gradient noise. If the unknown system is more complex than a CM, the additional degrees of freedom of the CGM may render ${\hat{y}}_{SA} (n)$ more accurate. For this reason, the PSA-CGM will appear in the evaluation in Section 4.5 twice.

For a highly efficient low-delay realization of the PSA-CGM, a PB-FNLMS implementation has been proposed in [31]. This method identifies partitions ${\hat{h}}_{κ}^{(p)}$ of the linear subsystem of the CM in Block A of Fig. 4.9 by a PB-FNLMS (Section 4.3.1.2, $B = 1$ branches and $P > 1$ partitions). Employing this partitioning, the dominant region is modeled by a single partition ${\hat{h}}_{κ}^{(p_{d})}$ with index $p_{d}$ and the complementary region is composed of all other partitions, where $p \neq p_{d}$ . As a consequence, the nonlinear dominant-region model of Block C in Fig. 4.9 is a single-partition CGM and adapted according to Section 4.3.1.2 ( $B > 1$ branches and $P = 1$ partition).

If the temporal support of the dominant region is unknown, only ${\hat{h}}_{κ}^{(p)}$ are adapted in an initial convergence phase. Afterwards, the fixed-length partitions ${\hat{h}}_{κ}^{(p)}$ allow one to detect a dominant partition ${\hat{h}}_{κ}^{(p_{d})}$ as the partition with maximum energy. Compared to a full-length CGM with P partitions in each branch, the relative computational complexity $R_{CGM}^{PSA-CGM}$ of a PSA-CGM is approximately

${\tilde{R}}_{CGM}^{PSA-CGM} = \frac{B + P}{B \cdot P} .$

Obviously, the PSA-CGM is very efficient for large P and B. Without block partitioning (P=1), however, the described PSA-CGM does not provide any advantage. To give an example, $B = 5$ and $P = 4$ results in ${\tilde{R}}_{CGM}^{PSA-CGM} = 0.45$ .
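The energy-based detection of the dominant partition mentioned above can be sketched as follows. The function name and the toy IR are illustrative; the partitions may hold real time-domain or complex DFT-domain coefficients, and ties are resolved in favor of the lowest index, which is a choice of this sketch and not prescribed by the chapter.

```python
def dominant_partition(partitions):
    """Return the index p_d of the maximum-energy partition and all energies.

    partitions : list of P coefficient vectors (real or complex entries)
    """
    energies = [sum(abs(c) ** 2 for c in part) for part in partitions]
    p_d = max(range(len(energies)), key=energies.__getitem__)
    return p_d, energies


# Toy IR with P = 4 partitions of length 4; the direct path lies in partition 1.
toy_partitions = [
    [0.0, 0.0, 0.01, 0.02],
    [0.9, -0.4, 0.2, 0.1],
    [0.05, -0.03, 0.02, 0.01],
    [0.01, 0.0, -0.01, 0.0],
]
p_d, energies = dominant_partition(toy_partitions)
```

All partitions with an index other than `p_d` then form the complementary region.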

4.4.4 Parallel Significance-Aware Filtered-X Adaptation

In this section, a novel SA filtering concept will be introduced. This concept exploits the PSA decomposition for the direct FX adaptation of bilinear CMs from Section 4.2.3 (see Fig. 4.7) and will be denoted Parallel SA Filtered-X CM (PSAFX-CM). The block diagram of the PSAFX-CM matches the block diagram of the PSA-CGM in Fig. 4.9, except for Block C. Like for the PSA-CGM, a CM with an adaptive linear subsystem (Block A in Fig. 4.9) enables a PSA decomposition (Block B in Fig. 4.9); in an initialization phase, the CM preprocessor is configured as linear function and only the IR is adapted. Afterwards, the CM's IR is split into a dominant and a complementary region. The complementary-region submodel is used (as in Block B in Fig. 4.9) to compute an error signal $e_{\bar{c}} (n) = y (n) - {\hat{y}}_{c} (n) \approx y_{d} (n)$ . This error signal mainly contains components $y_{d} (n)$ , which are caused by the unknown system's dominant region (see also Fig. 4.7).

As an alternative to the PSA-CGM, the novel PSAFX-CM implements the adaptive nonlinear model for the dominant region (Block C) according to Fig. 4.10. Therein, the PSAFX-CM employs a dominant-region CM with ${\hat{h}}_{d}$ from Block A in Fig. 4.9 and FX-adapted preprocessor weights ${\hat{w}}_{b}$ . As it is characteristic for SA filtering, the preprocessor determined for the dominant region is then applied in the CM with the full-length adaptive linear subsystem (Block A in Fig. 4.9). Note that the complementary-region CM from Block A in Fig. 4.9 and the dominant-region CM from Fig. 4.10 seamlessly combine to the overall CM of Block A in Fig. 4.9 again. This implies $e_{FX} (n) = e_{CM} (n)$ . Thus, the only difference from an ordinary FX implementation is that the FX branch signals are generated with the most significant fragment of $\hat{h}$ .

Figure 4.10 Substitution for Block C in Fig. 4.9 to form the novel PSAFX-CM: The preprocessor coefficients ${\hat{w}}_{b}$ of a dominant-region CM are adapted using an FX algorithm with ${\hat{h}}_{d}$ from Block A in Fig. 4.9 as linear subsystem of the CM and with $e_{\bar{c}} (n) \approx y_{d} (n)$ from the PSA decomposition in Block B in Fig. 4.9 as target signal.

Pursuing a PB-FNLMS implementation as in Section 4.3.2, the adaptation procedures are identical up to Eq. (4.31), and Eq. (4.31) simplifies to the single partition

${\underline{x}}_{FX, (b), κ}^{\circ} = {\underline{x}}_{(b), κ - p_{d}} ⊙ {\underline{\hat{h}}}_{κ - 1}^{(p_{d})} = : {\underline{x}}_{SAFX, (b), κ}^{\circ}$

(4.51)

and is therefore independent of the length of the actually modeled unknown system. Unlike for the FX algorithm of Section 4.3.2, the filtering effort for generating the FX signals is not proportional to $B \cdot P$ anymore, but only to B. As a consequence, the relative complexity $R_{CGM}^{PSAFX-CM}$ of a PSAFX-CM w.r.t. a full-length CGM can be estimated by the number of DFTs and yields

${\tilde{R}}_{CGM}^{PSAFX-CM} = \frac{B + 2 + 2 P}{B + 2 + 2 B P} < {\tilde{R}}_{CGM}^{FX-CM} .$

(4.52)

The overhead ε from Eq. (4.47), caused by non-DFT operations with a complexity proportional to $B \cdot P$ , does not appear in Eq. (4.52) as the actual filtering effort is reduced to a complexity proportional to B only. To give an example, $P = 4$ and $B = 5$ leads to ${\tilde{R}}_{CGM}^{PSAFX-CM} \approx 0.32$ and thereby indicates a computationally very inexpensive nonlinear model, which is even less complex than the other SA filters.
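The approximate relative complexities stated in Sections 4.3.2 and 4.4.2 to 4.4.4 can be evaluated side by side. The sketch below only restates those closed-form ratios; the function name and dictionary keys are hypothetical, and the overhead ε of the FX-CM is left out because the text does not quantify it.

```python
def relative_complexities(B, P):
    """Approximate complexity relative to a full-length CGM (B branches, P partitions)."""
    full_cgm_dfts = B + 2 + 2 * B * P
    return {
        "FX-CM (without epsilon)": (B + 2 + 2 * P) / full_cgm_dfts,  # Eq. (4.47)
        "SSA-CM": 2 / B,                                             # Section 4.4.2
        "PSA-CGM": (B + P) / (B * P),                                # Section 4.4.3
        "PSAFX-CM": (B + 2 + 2 * P) / full_cgm_dfts,                 # Eq. (4.52)
    }


ratios = relative_complexities(B=5, P=4)
unpartitioned = relative_complexities(B=5, P=1)
```

For `P=1` the PSA-CGM ratio exceeds one, which matches the remark in Section 4.4.3 that the described PSA-CGM offers no advantage without block partitioning.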

As for the other SA models in Sections 4.4.2 and 4.4.3, the nonlinear system identification has been split into synergetic subsystems, where the first models the possibly long memory of the linear subsystem (Block A in Fig. 4.9) and the second estimates the nonlinearities without modeling the possibly long memory of the system (Fig. 4.10).

4.5 Experiments and Evaluation

In this section, the adaptive LIP nonlinear filters from Sections 4.3 and 4.4 will be evaluated exemplarily for the challenging application of AEC (cf. Section 4.1). To this end, the evaluation method will be introduced in Section 4.5.1 and the experiments will be presented and discussed in Section 4.5.2.

Note that, in principle, the presented methods can also be employed for many applications aside from AEC. In particular, the identification of the path between the discrete-time loudspeaker signal and a recording of the radiated sound can also be the first step towards a loudspeaker linearization [7–9] or nonlinear active noise control [68,53]. Furthermore, the presented algorithms can enable the identification of nonlinear analog signal processors (e.g., guitar amplifier and effect processor devices [69,70]) to emulate the identified processors in digital hardware afterwards. Moreover, the joints of robots can be modeled as HMs as well [71] and may be identified with the methods of Section 4.3. Lifting the latter example to a broad industrial scope, the adaptive algorithms presented in this chapter may be employed for digital twin modeling (also termed cybertwin) of nonlinear mechanical, electrical, acoustical or chemical systems in digitalized production plants for Industry 4.0 [72]—as long as the underlying nonlinear systems can be described well by CMs. However, the longer the linear subsystem of an HM or CM is, the more computational benefit can be expected by SA modeling. In case of short linear subsystems, block partitioning and frequency-domain processing may be unnecessary and time-domain adaptation of all involved adaptive linear filters should be pursued (see [28] for a time-domain SA filter).

4.5.1 Evaluation Metrics

For system identification, normalized misalignment [24] and projection misalignment measures [73] evaluate how well particular components of a parametrized system are identified. However, these measures require the exact knowledge of the true system parameters—knowledge which is typically not available for actual physical systems to be identified. In this case, signal-based measures have to be adopted. In particular, the modeling accuracy can be measured by the ratio of the variance of the unknown system's output $y (n)$ and of the residual error $e (n) = y (n) - \hat{y} (n)$ .

For the special system identification case of AEC, where the primary objective is the removal of the loudspeaker signal components (echoes) from the microphone signal, this ratio of variances is widely known as Echo-Return Loss Enhancement (ERLE) and is usually considered in the logarithmic domain according to

$ERLE = 10 \log_{10} (\frac{E {{| y (n) |}^{2}}}{E {{| e (n) |}^{2}}}) dB,$

(4.53)

assuming noise-free “single talk” situations (no speaker or noise on the near end, only an echo signal). A higher ERLE measure corresponds to a better performance of the AEC system [74]. When estimating the ERLE in short time frames employing the instantaneous energies of $y (n)$ and $e (n)$ and continuously adapting the model, even severe overadaptation to the particular signal may remain unnoticed and be misinterpreted as accurate system identification because of the high ERLE values. To minimize this undesired effect, filtering a long interval of speech with frozen filter coefficients after convergence is possible. The resulting ERLE can be used as system identification performance measure. An ERLE computed in this way also evaluates the primary use case of an AEC system, as it captures the amount of echo reduction in the presence of near-end speech (double talk), when the adaptive filters cannot be adapted and the recording cannot be muted (no half-duplex but full-duplex communication) [74]. The ERLE measure for frozen filter coefficients will be employed in the following to evaluate and compare the adaptive filtering algorithms described in Sections 4.3 and 4.4.
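An ERLE estimate according to Eq. (4.53) can be sketched as follows, assuming that the expectations are replaced by averages over one long evaluation interval processed with frozen filter coefficients. The function name and the toy signals are illustrative.

```python
import math


def erle_db(y, e):
    """ERLE in dB, cf. Eq. (4.53), with expectations replaced by signal energies.

    y : microphone (echo) signal samples of the evaluation interval
    e : residual error samples of the same interval (frozen filters)
    """
    echo_energy = sum(v * v for v in y)
    residual_energy = sum(v * v for v in e)   # must be nonzero
    return 10.0 * math.log10(echo_energy / residual_energy)


# Toy check: a residual with one tenth of the echo amplitude gives about 20 dB.
y_toy = [math.sin(0.1 * n) for n in range(1000)]
e_toy = [0.1 * v for v in y_toy]
erle = erle_db(y_toy, e_toy)
```

Both sums run over the same number of samples, so the interval length cancels in the ratio.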

4.5.2 Experiments

In this section, the ERLE performance of the adaptive algorithms from Sections 4.3 and 4.4 is compared in AEC experiments. To this end, the nonlinear echo paths of three different devices are considered. Device 1 is a smartphone in hands-free mode, Device 2 is a low-quality electrodynamic loudspeaker and Device 3 is a cellphone manufactured in the year 2000. About $80\,\mathrm{s}$ of speech (both male and female) was played back at a relatively high volume of about $80\,\mathrm{dBA}$ at $1\,\mathrm{m}$ distance and recorded with the playback device itself for Device 1 and with a high-quality measurement microphone at $1\,\mathrm{m}$ distance for Devices 2 and 3. All AEC experiments were conducted at a sampling rate of $f_{S} = 16\,\mathrm{kHz}$ and in three acoustic environments, denoted Scenarios 1, 2 and 3. The actual recordings were performed under low-reverberation conditions, which corresponds to Scenario 1. The data for Scenarios 2 and 3 are synthesized from the low-reverberation recordings by convolving the recordings with measured acoustic IRs from real lab environments with reverberation times of $T_{60} \approx 250\,\mathrm{ms}$ and $T_{60} \approx 400\,\mathrm{ms}$, respectively. The adaptive filters employed for the identification of the acoustic echo path are realized, as specified earlier in this chapter, as partitioned-block adaptive filters in the frequency domain. For all scenarios and devices, a linear filter (Section 4.3.1.2 with $B = 1$ branch and $f_{1} \{\cdot\} = (\cdot)$), a CM with FX adaptation tailored to memoryless preprocessors (Section 4.3.2), a CM with PSAFX adaptation (Section 4.4.4), a PSA-CGM (Section 4.4.3), an SSA-CM (Section 4.4.2) and a CGM with full temporal support (Section 4.3.1.2) are compared. All adaptive filters are identically parametrized by step sizes of $μ_{b} = μ_{FX} = 0.1$, block processing is done at a frame shift of $M = 512$ samples and a frame size of $N = 2 M$, and the block-partitioned filters consist of $P = 4$ partitions of length $M$.
Furthermore, the equalizer delay for the SSA-CM is chosen as $N_{eq} = 2 M$ samples. All nonlinear models have $B = 5$ branches and preprocessor coefficients, respectively, and employ odd-order Legendre polynomials of orders 1 to 9 as branch nonlinearities $f_{b} \{\cdot\}$. Consequently, all CGMs are HGMs and all CMs are HMs. The computational complexity of the individual models relative to the complexity of a CGM is listed in Table 4.4. Therein, the first row (“approximate”) corresponds to the relative complexity estimates introduced with the algorithms in Sections 4.3 and 4.4. The second row (“FLOPS”) contains the relative numbers of FLoating Point Operations (FLOPS) of the particular method in comparison to a CGM. To this end, the FLOPS have been determined analogously to [32]. Operations counted as FLOPS are real-valued additions, multiplications and divisions. The complex-valued DFT-domain operations are accounted for by mapping complex-valued additions, multiplications and divisions to multiple real-valued operations. In particular, a complex-valued addition is counted as 2 real-valued FLOPS, a complex-valued multiplication is counted as 6 real-valued FLOPS and the division of a complex number by a real-valued number is counted as 2 real-valued FLOPS. Accumulating the FLOPS of all equations necessary for processing a signal frame of $M$ samples by a particular filter structure yields the algorithm's number of FLOPS. Normalizing these FLOPS to the FLOPS of a CGM leads to the relative FLOPS in the second row of Table 4.4.
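The counting convention described above can be sketched in a few lines of Python. This is only an illustration: the operation names and the per-frame operation counts in the example are hypothetical placeholders and are not taken from [32] or Table 4.4. Only the weights 2, 6 and 2 for the complex-valued operations come from the text, and each real-valued operation is assumed to count as one FLOP.

```python
# Real-valued FLOPS charged per operation, following the convention in the text.
# Assumption: every real-valued addition, multiplication or division counts as 1.
REAL_FLOPS = {
    "real_add": 1,
    "real_mul": 1,
    "real_div": 1,
    "complex_add": 2,          # two real-valued additions
    "complex_mul": 6,          # four real multiplications plus two real additions
    "complex_div_by_real": 2,  # two real-valued divisions
}


def count_flops(op_counts):
    """Sum the real-valued FLOPS for a dict mapping operation name -> count."""
    return sum(REAL_FLOPS[op] * n for op, n in op_counts.items())


def relative_flops(op_counts, reference_op_counts):
    """Normalize the FLOPS of one structure to those of a reference structure."""
    return count_flops(op_counts) / count_flops(reference_op_counts)


# Hypothetical per-frame operation counts (made up for illustration only).
example_filter = {"complex_mul": 100, "complex_add": 50, "real_add": 10}
reference_filter = {"complex_mul": 400, "complex_add": 200, "real_add": 40}

ratio = relative_flops(example_filter, reference_filter)
```

In an actual evaluation, the dictionaries would be filled by walking through every equation needed to process one frame of $M$ samples, and the reference would be the CGM.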

Table 4.4

Computational complexity of linear and nonlinear models relative to a CGM with B = 5 branches and P = 4 partitions. The approximate values summarize the numbers given in the individual algorithms' sections, whereas the numbers of FLOPS are computed analogously to [32] for a frame shift of M = 512.

	linear	PSAFX-CM	FX-CM	SSA-CM	PSA-CGM	CGM
approximate	0.23	0.32	0.32 + ε	0.40	0.45	1
FLOPS	0.23	0.32	0.37	0.50	0.58	1

Both complexity estimation methods rank the algorithms in Table 4.4 in the same order: the further right an algorithm is in Table 4.4, the more computationally demanding it is.

The ERLE achievable with the different adaptive filter structures with frozen parameters (as during double talk) is shown in Figs. 4.11A to 4.11C for Devices 1 to 3, respectively. The linear AEC (A in Fig. 4.11) suffers considerably from the severe nonlinearities, leading to an ERLE below $10\,\mathrm{dB}$ under all reverberation conditions and to particularly poor performance under the low-reverberation conditions of Scenario 1. All nonlinear models outperform the linear model for all devices and scenarios. The CGM (G in Fig. 4.11), being the most complex model, shows consistently high performance. However, the PSA-CGM in particular (E and F in Fig. 4.11) approaches or even outperforms the more complex CGM. Especially the PSA-CGM's CM output (E in Fig. 4.11) performs very well. This confirms that the actual acoustic echo paths can be approximated well by CMs. However, the FX adaptation of a CM (C in Fig. 4.11) closely approaches the performance of the CGM only for Device 1. Still, introducing further efficiency into the FX adaptation by a PSA decomposition (B in Fig. 4.11) does not lead to any noticeable disadvantage. The SSA-CM (D in Fig. 4.11) performs similarly to the other nonlinear approaches for Device 1, but performs worse for the other devices (while still outperforming the linear filter).

Figure 4.11 ERLE performance: the efficient SA-filtering approaches allow for nonlinear AEC and may even reach the performance of the filters without the SA extension (SAFX-CM/FX-CM and SA-CGM/CGM). (A) Smartphone. (B) Electrodynamic Loudspeaker. (C) Cellphone.

To sum up, all SA filters successfully alleviate the computational burden for nonlinear AEC. The performance of a general CGM can be reached in the experiments by an efficient PSA-CGM. An even less complex (but also less effective) direct CM adaptation by an FX algorithm can benefit in terms of computational complexity from PSA filtering without losing ERLE performance.

4.6 Outlook on Model Structure Estimation

All methods described in this chapter identify the parameters of nonlinear models adaptively. Yet, the algorithms do not choose an optimum model structure, such as a particular base function set, the order of a polynomial nonlinearity, or the memory length of linear subsystems. However, knowledge about the model structure is highly desirable for two reasons. First, minimizing the number of parameters leads to computational efficiency and second, unnecessarily estimated and redundant parameters introduce gradient noise (estimation errors) and consequently reduce the accuracy of the model.

Both disadvantages can be tackled by employing convex filter combinations [75], as described in Chapter 11. To briefly illustrate this concept for the scope of this chapter, consider two competing models A and B with the respective outputs ${\hat{y}}^{(A)} (n)$ and ${\hat{y}}^{(B)} (n)$, modeling the same system with output $y (n)$. Although modeling the same system, the models A and B are assumed to have a different structure (e.g., memory length of linear subsystems) and consequently different numbers and values of parameters (e.g., filter coefficients of linear subsystems). Convex filter combinations superimpose the outputs ${\hat{y}}^{(A)} (n)$ and ${\hat{y}}^{(B)} (n)$ according to

$\hat{y} (n) = (1 - η) \cdot {\hat{y}}^{(A)} (n) + η \cdot {\hat{y}}^{(B)} (n),$

where the mixing weight $η$, $0 \leq η \leq 1$, is determined adaptively. In practice, this can perform like an adaptive switch indicating the more suitable of the two competing models and thereby the more suitable model structure. Convex combination-based methods for structure estimation can be applied to all kinds of CGMs (e.g., Volterra, Legendre, Chebyshev and Fourier nonlinear filters). As the structure estimation methods are described for the example of Volterra filters in the original publications, the methods will be explained here first for Volterra filters. To this end, consider Volterra filters, which consist of kernels of orders $o \in \{1, \dots, O\}$ with maximum order $O$. Each kernel consists of diagonals with maximum relative time lags $R_{o}$ in the branch nonlinearities of the $o$th kernel ($R_{o}$ is also known as diagonal radius) and has linear subsystems of length $L_{o}$ in kernel $o$. As $O$, $R_{o}$ and $L_{o}$ determine the model structure, they will be referred to as the structure parameters of the Volterra model. These structure parameters determine which of the filter coefficients of a Volterra filter are modeled in practice.
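The convex combination above can be illustrated with a minimal Python sketch. The text only states that $η$ is determined adaptively; the particular rule used here is an assumption: $η$ is parametrized as a sigmoid of an auxiliary variable `a`, which is updated by a stochastic gradient step on the squared combined error, a choice commonly found in the convex-combination literature. The function name `convex_combine`, the step size `mu_a` and the clipping bound `a_max` are hypothetical.

```python
import math


def sigmoid(a):
    """Map the auxiliary variable a to a mixing weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def convex_combine(y, y_hat_a, y_hat_b, mu_a=1.0, a_max=4.0):
    """Combine two model outputs sample by sample.

    Assumed update rule (not specified in the text): eta = sigmoid(a), and a
    follows a gradient descent step on the squared error of the combined output.
    """
    a = 0.0  # eta starts at 0.5, i.e., no preference for either model
    y_hat, etas = [], []
    for d, ya, yb in zip(y, y_hat_a, y_hat_b):
        eta = sigmoid(a)
        out = (1.0 - eta) * ya + eta * yb   # convex combination of both outputs
        e = d - out                         # error of the combined output
        # Gradient step: d(e^2)/da = -2 e (yb - ya) eta (1 - eta)
        a += mu_a * e * (yb - ya) * eta * (1.0 - eta)
        a = max(-a_max, min(a_max, a))      # keep eta away from exactly 0 or 1
        y_hat.append(out)
        etas.append(eta)
    return y_hat, etas


# Toy example: model B matches the system exactly, model A outputs zero.
y = [math.sin(0.1 * n) for n in range(2000)]
y_hat, etas = convex_combine(y, [0.0] * len(y), y)
```

In this toy setting the mixing weight drifts toward 1, so the combination behaves like a switch that selects model B, which mirrors the adaptive-switch interpretation given above.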

As proposed in [43], an identified model can automatically be trimmed by an additional adaptive structure, where a Volterra kernel is clustered into disjoint groups of diagonals and each group's output signal $y_{g} (n)$, where $g$ is the group index, is replaced by a convex combination of the group's output $y_{g} (n)$ with 0 (the result of filtering with an all-zero kernel region). Thereby, irrelevant diagonals, which are dominated by noise and would increase the modeling error, can be trimmed for the final computation of the model output. In this way, the modeling accuracy is increased at the expense of a slight increase in computational complexity. In this case, the mixing variables of the convex combinations take the role of additional structure parameters. In a similar way, the FLAFs in [48] disable nonlinear model parts as well. Another related approach is the so-called EVOLVE concept [76], which approaches the estimation of a Volterra filter structure in a bottom-up way by starting with a small filter and successively growing the model until the optimal size is reached. Filters which are evolutionary w.r.t. a structure parameter dimension $S \in \{O, R_{o}, L_{o}\}$ perform a convex combination of a filter with a small model size (a low structure parameter value $S_{1}$) and of a filter with a larger model size (a larger structure parameter value $S_{2} > S_{1}$). If one of the two models is clearly preferred, both models are grown or shrunk in the preferred direction. Note that, for reasons of efficiency, the larger model can also be “virtualized” by simply augmenting the small model with additional filter coefficients [76,77]. As described in [76], an evolution in $R_{o}$ and $L_{o}$ and bypassing kernel orders is possible by cascading multiple convex combinations, each of which decides on the activation of another region of filter taps of the Volterra filter.
This concept can, of course, be applied to CGMs with other branch nonlinearities as well: Legendre-based CGMs (cf., [45,46] or Section 4.5.2), where $i$th powers in the Volterra filters' branch nonlinearities are replaced by $i$th-order Legendre polynomials, have the same set of structure parameters $\{O, R_{o}, L_{o}\}$ and can be grown the same way as Volterra filters in an EVOLVE fashion. Also, the model trimming from [43] can directly be applied to such Legendre-based CGMs. As opposed to [43], EVOLVE [76] starts with a low-complexity model and successively grows the model if required. Thereby, not only is gradient noise in irrelevant filter coefficients prevented, but the computational complexity is also only as high as the complexity of the optimum filter structure found by the algorithm, plus some overhead for the virtualized models of increased size.
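To make the replacement of powers by Legendre polynomials concrete, the following sketch evaluates memoryless Legendre branch nonlinearities for a block of input samples, using the odd orders 1 to 9 from Section 4.5.2 as default. The polynomials are computed with Bonnet's recursion $(n+1) P_{n+1}(x) = (2n+1)\, x\, P_{n}(x) - n\, P_{n-1}(x)$. The function names `legendre` and `branch_signals` are hypothetical, and the input is assumed to be scaled to the interval $[-1, 1]$.

```python
def legendre(order, x):
    """Evaluate the Legendre polynomial P_order(x) via Bonnet's recursion."""
    if order == 0:
        return 1.0
    p_prev, p = 1.0, x  # P_0(x) and P_1(x)
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def branch_signals(x_block, orders=(1, 3, 5, 7, 9)):
    """Return one memoryless branch signal per polynomial order.

    Assumption: samples in x_block are scaled to [-1, 1].
    """
    return [[legendre(o, x) for x in x_block] for o in orders]


# Example: five branch signals for a short input block.
branches = branch_signals([-0.5, 0.0, 0.5, 1.0])
```

Each of the resulting branch signals would then feed its own linear subsystem, and growing or trimming the model amounts to enabling or disabling branches and filter taps.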

As in SA filtering, efficient EVOLVE implementations can employ a block partitioning suitable for PB-FNLMS adaptation. This allows one to grow and shrink the memory lengths of the diagonals in steps of $M$ taps by adding or removing entire length-$M$ partitions. However, the efficiency and accuracy are achieved differently from SA filtering. SA filtering estimates the nonlinear characteristics of the unknown system after decomposing the system into a nonlinear system with short nonzero temporal support, and the estimated nonlinear characteristics are then extrapolated efficiently using CMs with parametric preprocessors. In EVOLVE, such a model decomposition and extrapolation does not take place. Thus, SA filtering locally (at particular filter coefficients) observes the nonlinearity and extrapolates it to a global system representation, assuming a fixed model structure (CM structure). As opposed to this, EVOLVE locally grows the model (at particular diagonals in a particular Volterra kernel), aiming for the best-performing global model structure and parametrization as the final result. Yet, the SA and the EVOLVE approaches do not contradict each other but can be implemented jointly with the same block partitioning scheme. The resulting evolutionary SA filters may, in the future, dynamically grow their structure (e.g., the number of partitions $P$ and the number of branches $B$) while extrapolating nonlinearities with a preprocessor system whenever the computational complexity restricts additional partitions or a direct estimation of the particular CGM partitions is disadvantageous due to adaptation noise.

4.7 Summary

In this chapter, recent advances in LIP nonlinear filters have been presented with the focus on the concept of Significance-Aware (SA) filtering and nonlinear AEC as a challenging application in the background. After a unified description of nonlinear models as variants of CGMs and LIP CMs, where the output is a bilinear function of the model parameters, the parameter estimation methods for both structures have been revisited, showing that some of the recent developments can be described as FX algorithms. This led to the proposal of a novel, highly efficient FX algorithm for the adaptation of CMs with parametric nonlinear preprocessors. Afterwards, the SA-filtering concept has been described in two variants: the SSA-CM and the PSA-CGM, which gain efficiency by equalizing parts of or partly compensating the unknown system before estimating the system's nonlinearity. Additionally, a novel PSA FX algorithm has been introduced. The algorithms have been compared in AEC experiments with recordings from three different physical devices, which have shown the high efficacy of SA filtering and its potential to reduce computational complexity without sacrificing modeling accuracy. Finally, model structure estimation based on convex filter combinations has been summarized and potential synergies with SA filters in a joint partitioned-block signal processing framework have been pointed out.
