3.12    Best Approximations and Orthogonal Projections

The following result plays an important role in many signal processing applications involving estimation and optimization tasks such as solving an overdetermined system of equations in the least-squares (LS) sense [10, 11].

Theorem 3.12.1. (Orthogonal Projection Theorem [6]) Let V be an inner-product vector space and let S be a subspace of it. For every vector v ∈ V there may exist a best approximation s ∈ S such that

\|v - s\| \le \|v - s'\|, \quad \forall s' \in S.

(3.100)

This condition holds if and only if

(v - s) \perp s', \quad \forall s' \in S.

(3.101)

Moreover, if this best approximation exists, then it is unique. Now let an orthogonal basis for S be denoted by {s1, s2,…, sm}. It can be shown that the best approximation of v by vectors in S can be computed as

s = \sum_{k=1}^{m} \frac{\langle v | s_k \rangle}{\|s_k\|^2}\, s_k.

(3.102)

In other words, the best approximation of v by vectors in S is given by the orthogonal projection s of vector v onto the subspace S. Notice further that in (3.102), the kth term in the sum is the projection of v on the direction of vector sk. Consider now that for any vector v ∈ V, there exists an orthogonal projection onto subspace S. Thus, a mapping P : V → S can be defined that associates to each v ∈ V its orthogonal projection onto S. Mapping P will be referred to as the orthogonal projection of V onto subspace S, and some of its properties will be considered in Exercise 3.23.19.
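
As a concrete illustration, the projection formula (3.102) is easy to check numerically. The following minimal sketch (Python/NumPy, with an arbitrarily chosen matrix whose columns span S; all names are illustrative) builds an orthonormal basis for S via QR and verifies that the residual v − s is orthogonal to the subspace, as stated in (3.101).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 2))      # columns span the subspace S of R^5
v = rng.standard_normal(5)           # vector to be approximated

# Orthonormal basis {s_1, s_2} for S obtained via reduced QR
Q, _ = np.linalg.qr(M)

# Best approximation (3.102): sum of the projections onto the basis vectors
s = sum((v @ Q[:, k]) * Q[:, k] for k in range(Q.shape[1]))

# The error v - s is orthogonal to every basis vector of S, cf. (3.101)
print(np.allclose(Q.T @ (v - s), 0))   # True
```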

3.13    Least Squares Approximations

Suppose T : V → W is a linear transformation such that for some v ∈ V and w ∈ W, the equation

T(v) = w

(3.103)

does not admit an exact solution; i.e., w does not lie in the range space of T. In practice, such equations are often encountered and solutions that satisfy (3.103) in an approximate sense must be determined. One of the most common approaches adopted for solving (3.103) is to find a vector v such that the least-squares (LS) criterion:

e = \|T(v) - w\|^2,

is minimized. This problem reduces to finding the vector w′ ∈ R(T) such that

T(v) = w',

(3.104)

and ‖w′ - w‖ is minimized. The corresponding solution v of (3.104) is called the least squares solution. The following result characterizes the LS solution of (3.103) (see e.g., [6, 8]).

Theorem 3.13.1. Vector v ∈ V is the minimizer of ‖T(v) − w‖ if and only if

T^*T(v) = T^*(w).

(3.105)

If T*T is invertible, then the LS solution is unique and given by:

v = (T^*T)^{-1}T^*(w),

(3.106)

and

w' = T(v) = T(T^*T)^{-1}T^*(w).

(3.107)

A justification of this result is delegated to Exercise 3.23.20.

Now consider the inconsistent overdetermined m × n (m > n) system of linear equations Av = w. If w does not lie in the column space R(A), there is no exact solution. Using the LS method, a unique vector ŵ ∈ R(A) that is closest to w can be found; or, equivalently, we can choose v̂ to minimize the norm of the error ‖Av̂ − w‖. Finding the least-squares solution v̂ is equivalent to finding

\hat{w} = A\hat{v},

(3.108)

that is closer to w than any other vector in the column space R(A). Any solution to the new system of equations:

Av = \hat{w}

(3.109)

is referred to as a least-squares solution for Av = w. Finally, note that the least-squares solution to Av = w is any vector v that minimizes ‖Av − w‖. The following result characterizes the LS solution of minimum norm for an overdetermined system of linear equations [3, 6, 8].

Theorem 3.13.2. If A is an m × n matrix (m ≥ n) and A*A is invertible, then among all the least-squares solutions of the system of equations Av = w, there is a unique LS solution of minimum norm given by:

v = (A^*A)^{-1}A^*w.

(3.110)

The matrix (A*A)−1A* is called the Moore-Penrose pseudoinverse of A. The approximation ŵ of w by the vectors in R(A) is given by:

\hat{w} = Av = A(A^*A)^{-1}A^*w,

(3.111)

and it is obtained by projecting w onto R(A). As a corollary, the error term w − ŵ is orthogonal to R(A). Reference [3] presents several extensions of these results and a characterization of all the LS solutions.
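
To make the matrix case concrete, the short sketch below (NumPy; the matrix and data are arbitrary examples) solves an overdetermined system in the LS sense and checks that the normal-equations solution (3.110), NumPy's lstsq routine, and the pseudoinverse all agree, and that the residual is orthogonal to R(A).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))        # m > n, generically full column rank
w = rng.standard_normal(8)

v_normal = np.linalg.solve(A.T @ A, A.T @ w)      # (A^T A)^{-1} A^T w
v_lstsq, *_ = np.linalg.lstsq(A, w, rcond=None)   # library LS solver
v_pinv = np.linalg.pinv(A) @ w                    # Moore-Penrose pseudoinverse

print(np.allclose(v_normal, v_lstsq), np.allclose(v_normal, v_pinv))
# The residual w - A v is orthogonal to the column space R(A)
print(np.allclose(A.T @ (w - A @ v_normal), 0))
```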

3.14    Angles Between Subspaces

3.14.1    Principal Angles Between Subspaces

Recall that given two non-zero vectors v, w ∈ ℝn, the angle between them is computed as

\theta = \cos^{-1}\frac{\langle v | w \rangle}{\|v\|\,\|w\|}, \qquad \left( 0 \le \theta \le \frac{\pi}{2} \right).

(3.112)

Now suppose that we are given two subspaces represented in terms of the range spaces of two matrices A and B. A set of angles between them can be defined recursively. These angles will be referred to as principal (canonical) angles. Matrices A and B are assumed real-valued with the same number n of rows, dim R(A) = p, and dim R(B) = q (q ≥ p). The principal angles θ1, θ2, …, θp ∈ [0, π/2] between the two column spaces R(A) and R(B) are recursively defined via:

\cos(\theta_k) = \max_{v \in \mathcal{R}(A)} \max_{w \in \mathcal{R}(B)} v^T w = v_k^T w_k, \qquad k = 1, \ldots, p,

subject to the conditions ‖v‖ = ‖w‖ = 1, vTvi = 0, wTwi = 0, i = 1,…, k − 1. Notice that the notation ‖·‖ stands for the standard Euclidean norm of a vector, and the resulting vectors vi and wi are called the principal vectors. Notice also that during step k, the algorithm determines θk and the associated principal vectors vk and wk by searching for vectors in the two subspaces that are orthogonal to vi and wi, i = 1,…, k − 1, respectively. Therefore, by searching for the pair of vectors with the smallest angle between them (largest cosine) that are orthogonal to the already found principal vectors, the complete sets of principal angles and principal vectors are obtained.
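
In practice, the recursion above is usually evaluated via orthonormal bases and an SVD (the Björck-Golub approach): if Q_A and Q_B are orthonormal bases for R(A) and R(B), the cosines of the principal angles are the singular values of Q_A^T Q_B. A minimal NumPy sketch with arbitrary test matrices:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)                  # orthonormal basis for R(A)
    Qb, _ = np.linalg.qr(B)                  # orthonormal basis for R(B)
    sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))   # angles in ascending order

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))
B = rng.standard_normal((10, 4))
print(principal_angles(A, B))                # min(3, 4) = 3 angles in [0, pi/2]
```

SciPy also ships a scipy.linalg.subspace_angles routine based on a numerically more careful variant of this computation.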

The concept of principal angles between two vector spaces has found applications in assessing the degree of correlation between two sets of random variables and has been used in canonical analysis [13, 14]. Applications of principal angles also include random processes [15, 16], stochastic realization of random processes and system identification [17, 18, 19], pattern recognition and machine learning (see e.g., [20, 21]), and signal processing and communications (see e.g., the signal estimation and detection applications presented in [22]).

3.15    Eigenvalues and Eigenvectors

Let T be a linear operator on a vector space V, T : V → V. An eigenvalue of T is a scalar λ such that

T(v) = \lambda v,

(3.113)

for a non-zero vector v ∈ V. The vector v is called an eigenvector of T. Similarly, for an n × n matrix A that is the matrix representation of the linear transformation T, an eigenvalue is a scalar λ that makes the matrix A − λI singular. The eigenvector of matrix A corresponding to eigenvalue λ is any non-zero vector v that satisfies the equation Av = λv. The polynomial in λ expressed as:

\chi_A(\lambda) = \det(A - \lambda I),

(3.114)

is called the characteristic polynomial of A and the equation

\det(A - \lambda I) = 0,

(3.115)

is referred to as the characteristic equation of A. The set of roots of the characteristic equation provides the set of eigenvalues, and therefore it is also referred to as the spectrum of A and denoted by σ(A). Thus, every eigenvalue λ is a root of the characteristic polynomial. The eigenspace corresponding to an eigenvalue λ of A is defined as the subspace

E_\lambda = \{ v \in V \mid Av = \lambda v \}.

(3.116)

If A has distinct eigenvalues, the eigenspaces are all one-dimensional. The following general results, whose proofs are deferred to Exercises 3.23.21 and 3.23.22, hold.

Theorem 3.15.1. Let A be an n × n matrix with distinct eigenvalues. Then, the eigenvectors of A form a linearly independent set.

Theorem 3.15.2. The eigenvalues of a Hermitian matrix are real and the eigenvectors corresponding to distinct eigenvalues are orthogonal.

The concepts of eigenvalue and eigenvector play an important role in building high-resolution spectral estimation techniques, designing efficient algorithms for localization and tracking in antenna array processing [12], checking stability of digital filters, system identification, designing efficient signaling schemes in multiantenna (multi-input multi-output (MIMO)) based communication systems [23], etc.

3.15.1    Diagonalization

A square matrix A is diagonalizable if it can be factored as follows:

A = S\Lambda S^{-1},

(3.117)

for some non-singular matrix S and diagonal matrix Λ. Notice that Λ has the same eigenvalues as A. In general, two matrices A and Λ that are related by means of an equation of the form (3.117) are referred to as similar matrices, and they have the same set of eigenvalues. The following result provides an alternative characterization for diagonalizable matrices.

Theorem 3.15.3. If A is an n × n matrix, then it is diagonalizable if it admits n linearly independent eigenvectors or n distinct eigenvalues.

A justification of this result is left to Exercise 3.23.23.

Using the characteristic polynomial, it can be checked that the eigenvalues of a diagonal matrix coincide with its diagonal entries. Also, it is not difficult to show that the ith column of S is an eigenvector of A corresponding to the ith diagonal entry of Λ (the ith eigenvalue of A), for any i = 1,…, n. The following result shows a simple way to calculate the higher-order powers of a matrix in terms of the higher-order powers of its eigenvalues.

Theorem 3.15.4. Let A be an n × n matrix with eigenvalues λ1,…, λn. Then the eigenvalues of A^k (k ∈ ℤ; for negative k, A is assumed non-singular) are λ_1^k,…, λ_n^k. Moreover, if A is diagonalizable as in (3.117), it follows that

A^k = (S\Lambda S^{-1})^k = S\Lambda^k S^{-1}.

(3.118)
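
A quick numerical check of the diagonalization (3.117) and of the power formula (3.118), using NumPy on an arbitrary matrix whose eigenvalues are (generically) distinct:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

lam, S = np.linalg.eig(A)          # eigenvalues and eigenvectors (columns of S)
Lam = np.diag(lam)

print(np.allclose(S @ Lam @ np.linalg.inv(S), A))          # A = S Lambda S^{-1}
print(np.allclose(np.linalg.matrix_power(A, 5),
                  S @ np.diag(lam**5) @ np.linalg.inv(S)))  # A^5 = S Lambda^5 S^{-1}
```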

3.16    Schur Factorization and Spectral Theorem

Theorem 3.16.1. Let A be an n × n matrix. There exists a unitary matrix Q such that the matrix

T = Q^*AQ

(3.119)

is upper triangular, or equivalently A can be factorized as

A = QTQ^*,

(3.120)

where T is an upper triangular matrix.

The decomposition (3.120) is referred to as Schur’s factorization. Notice that the diagonal elements of T consist of the eigenvalues of A. The proof of this theorem can be done using the induction principle with respect to n. Alternative justifications for Schur factorization can be found in [3, 5, 24].

Theorem 3.16.2. (Spectral Theorem) For any Hermitian matrix A, there is a unitary diagonalizing matrix Q.

Stated differently, if A is a Hermitian matrix, the matrix T in Schur's factorization must be diagonal. Indeed, assume there exists a unitary matrix Q such that (3.119) holds, where T is an upper triangular matrix. Since A is Hermitian, A* = A. Hence, T* = Q*A*Q = Q*AQ = T; thus, T is also Hermitian. Being also upper triangular, T must be diagonal. As an immediate corollary, if A is a real-valued symmetric matrix, then it can be diagonalized via an orthogonal matrix Q, i.e., the equality Q^TAQ = T holds. Since Q diagonalizes A to T, the diagonal elements of T are the eigenvalues of A, and the columns of Q consist of eigenvectors of A, i.e., A = QTQ^T represents the spectral decomposition of a symmetric matrix.
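
Both factorizations can be checked numerically. The sketch below (SciPy/NumPy, arbitrary test matrices) computes a complex Schur form A = QTQ* and, for a symmetric matrix, the spectral decomposition via an eigendecomposition:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))

# Schur factorization: A = Q T Q^*, with T upper triangular (complex form)
T, Q = schur(A, output='complex')
print(np.allclose(Q @ T @ Q.conj().T, A))    # reconstruction of A
print(np.allclose(np.tril(T, -1), 0))        # T is indeed upper triangular

# Spectral theorem for a symmetric matrix: A_s = Q diag(lam) Q^T
A_s = A + A.T
lam, Qs = np.linalg.eigh(A_s)
print(np.allclose(Qs @ np.diag(lam) @ Qs.T, A_s))
```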

3.17    Singular Value Decomposition (SVD)

Let A be an m × n real-valued matrix. Then the Singular Value Decomposition (SVD) of matrix A is given by the factorization [24]:

A = U\Sigma V^T,

(3.121)

where Σ is an m × n diagonal matrix, and U and V are two orthogonal matrices of dimensions m × m and n × n, respectively. In general, Σ can be expressed as

\Sigma = \begin{bmatrix} \tilde{\Sigma} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix},

(3.122)

where

\tilde{\Sigma} = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix} \in \mathbb{R}^{r \times r}, \qquad \sigma_i > 0, \quad 1 \le i \le r,

(3.123)

and r stands for the rank of matrix A. The entries σi, i = 1,…, r, on the main diagonal of Σ are positive and are referred to as the (non-zero) singular values of A. Notice that using (3.121), it follows that AA^T = UΣΣ^TU^T and A^TA = VΣ^TΣV^T. Because matrices U and V are orthogonal, and ΣΣ^T and Σ^TΣ are diagonal, it follows that the singular values of matrix A are equal to the square roots of the non-zero eigenvalues of AA^T and A^TA. Furthermore, the columns of U and V are the eigenvectors of AA^T and A^TA, respectively. The SVD holds also for complex-valued matrices. If A is a complex-valued matrix, then its SVD assumes the factorization A = UΣV^*, where U and V are unitary matrices and Σ is a diagonal matrix with non-negative entries. Although in the SVD factorization σ1,…, σr are unique, U and V are not.

The SVD (3.121) can be re-expressed in these two alternative ways:

AV = U\Sigma,

(3.124)

and

A^T U = V\Sigma^T.

(3.125)

Equating the ith columns in the left-hand-side (LHS) and right-hand side (RHS) of (3.124) and (3.125), it follows that:

A v_i = \sigma_i u_i, \qquad i = 1, 2, \ldots, r,

(3.126)

and

A^T u_i = \begin{cases} \sigma_i v_i, & 1 \le i \le r \\ 0, & r+1 \le i \le m, \end{cases}

(3.127)

where vectors ui and vi stand for the ith columns of matrices U and V, respectively. The vectors vi and ui are called right singular vectors and left singular vectors of A, respectively. It follows also that {v1, v2,…, vr} and {u1, u2,…, ur} represent orthonormal bases for R(A^T) and R(A), respectively. Likewise, {v_{r+1}, v_{r+2},…, v_n} and {u_{r+1}, u_{r+2},…, u_m} are orthonormal bases for Null(A) and Null(A^T), respectively. Thus, the SVD provides bases for all four fundamental subspaces of a matrix: R(A), Null(A), R(A^T), and Null(A^T). Notice also that the rank r of A is obtained immediately from its SVD, and it is equal to the number of non-zero entries located on the main diagonal of Σ. The SVD also enables the construction of the orthogonal projections onto the four fundamental subspaces of a matrix. For example, if U_{1:r} and V_{1:r} stand for the sub-matrices formed by columns 1 through r of U and V, respectively, then U_{1:r}U_{1:r}^T and V_{1:r}V_{1:r}^T represent the orthogonal projection matrices onto R(A) and R(A^T), respectively [3, 24].
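
The following NumPy sketch (arbitrary rank-deficient test matrix) illustrates how the SVD exposes the rank and yields the orthogonal projector onto R(A):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))   # 6x5, rank <= 4

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10 * s[0])                 # numerical rank from the singular values
print(r, np.linalg.matrix_rank(A))

U1 = U[:, :r]                                # orthonormal basis for R(A)
P_range = U1 @ U1.T                          # orthogonal projector onto R(A)
x = rng.standard_normal(5)
print(np.allclose(P_range @ (A @ x), A @ x)) # A x already lies in R(A)
```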

Reference [24] presents a number of applications of the SVD in calculating low-rank matrix approximations, principal angles between subspaces, generalized inverses, solving systems of equations in the standard or total least-squares sense, orthogonal projections, etc. During the past decades, the SVD has proved to be a powerful tool in solving numerous problems in statistics, signal processing, wireless communications, information theory, pattern recognition, machine learning, and other fields. Also, the SVD played an important role in enabling the calculation of the capacity of a MIMO channel by reducing the MIMO channel to a number of independent single-input single-output (SISO) channels [23], developing efficient high-resolution spectral estimation techniques [12], and designing optimal channel estimation, signaling, and equalization techniques.

3.18    Rayleigh Quotient

Suppose A is an n × n Hermitian matrix, and denote by λ1 ≥ λ2 ≥ … ≥ λn and u1, u2,…, un its eigenvalues and corresponding orthonormal eigenvectors, respectively. Notice also that the Spectral Theorem 3.16.2 guarantees the existence of a factorization of the form A = UΛU^H, with U unitary and Λ a diagonal matrix containing on its main diagonal the eigenvalues of A. One can observe that the orthonormal vectors u1, u2,…, un, representing the columns of U, stand also for the eigenvectors of A corresponding to eigenvalues λ1,…, λn. If x is a non-zero vector in ℂ^n, then the Rayleigh quotient of x with respect to the Hermitian matrix A is defined as:

R(x) = \frac{\langle Ax | x \rangle}{\langle x | x \rangle} = \frac{x^*Ax}{x^*x}.

(3.128)

It can be shown that the minimum and the maximum values of the Rayleigh quotient are λn and λ1, respectively, i.e.,

\lambda_n \le R(x) \le \lambda_1.

(3.129)

A quick justification of (3.129) can be obtained by expressing the arbitrary vector x ∈ ℂ^n as a linear combination of the orthonormal eigenvectors ui, i = 1,…, n, of A (or, equivalently, in terms of the singular vectors of A obtained via the SVD),

x = \sum_{i=1}^{n} a_i u_i.

(3.130)

Plugging (3.130) into (3.128), it follows that:

R(x) = \frac{\sum_{i=1}^{n} \lambda_i |a_i|^2}{\sum_{i=1}^{n} |a_i|^2} = \frac{\sum_{i=1}^{n} \lambda_i |a_i|^2}{\|a\|_2^2},

(3.131)

where a = [a1, a2,…,an]T. Because of the ordering λ1 ≥ λ2 ≥…≥ λn, it follows that the numerator in (3.131) can be lower- and upper-bounded, respectively, by:

\lambda_n \|a\|_2^2 \le \sum_{i=1}^{n} \lambda_i |a_i|^2 \le \lambda_1 \|a\|_2^2,

from which (3.129) follows immediately. The Rayleigh quotient has found applications in signal processing and wireless communications, e.g., in the design of filters and equalization schemes with improved performance.
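
A small numerical illustration of the bounds (3.129), with an arbitrary Hermitian (here real symmetric) test matrix and random test vectors:

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))
A = B + B.T                                   # real symmetric (Hermitian) test matrix
lam = np.linalg.eigvalsh(A)                   # eigenvalues in ascending order

def rayleigh(A, x):
    return (x.conj() @ A @ x) / (x.conj() @ x)

R = [rayleigh(A, rng.standard_normal(5)) for _ in range(1000)]
print(lam[0] <= min(R) and max(R) <= lam[-1])  # lambda_min <= R(x) <= lambda_max
```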

3.19    Application of SVD and Rayleigh Quotient: Principal Component Analysis

Suppose we have an m × n (m ≥ n) matrix X of rank r whose column vectors are possibly correlated (i.e., not orthogonal). Principal component analysis (PCA) is a method that transforms a set of correlated variables (in this case, the column vectors of matrix X) into a possibly smaller set of uncorrelated variables {y1,…, yr}, called principal component vectors, that present the largest possible norm (variance, power, or variability) and span the same vector space as the column space of matrix X. In this application, the "uncorrelatedness" condition is achieved by imposing that the vectors yi, i = 1, 2,…, r, be orthogonal.

Consider now the SVD (or spectral) decomposition of the correlation matrix C = X^TX/(n − 1) under the form C = UΛU^T, where Λ is a diagonal matrix with the diagonal entries λ1 ≥ λ2 ≥ … ≥ λr > 0, and U = [u1, u2,…, un] is an orthogonal matrix. The facts that the yi must lie in the range space of X and must be uncorrelated imply that at most r principal component vectors can be defined. Next we will determine the first principal component y1 as the vector of largest variance or norm that lies in the range space of matrix X. This reduces to finding the vector y1 = Xv1, where v1 is an arbitrary vector of unit norm, such that

\mathrm{Var}(y_1) = \frac{y_1^T y_1}{n-1} = \frac{(Xv_1)^T X v_1}{n-1} = v_1^T C v_1,

(3.132)

is maximized. The Rayleigh quotient implies immediately that v1 should coincide with the eigenvector u1 corresponding to the largest eigenvalue λ1 of the correlation matrix C. Now, using the induction principle, we will construct the remaining principal directions. Assuming that the first k − 1 principal component vectors y1,…, y_{k−1} are obtained from the eigenvectors u1,…, u_{k−1} corresponding to the k − 1 largest eigenvalues (i.e., yl = Xul), we want to prove that the kth principal component vector is given by yk = Xuk, where uk is the eigenvector corresponding to the kth largest eigenvalue λk. Consider that yk = Xvk, where vk is an arbitrary vector of unit norm. Because yk should be orthogonal to y1,…, y_{k−1}, it follows that

y_k^T y_l = 0, \qquad l = 1, \ldots, k-1.

(3.133)

Equation (3.133) leads further to

(v_k^T X^T)(X u_l) = (n-1)\, v_k^T C u_l = (n-1)\, \lambda_l\, v_k^T u_l = 0, \qquad l = 1, \ldots, k-1.

(3.134)

Thus, vk must necessarily lie in the span of the remaining orthogonal eigenvectors uk,…, un. Consequently, there exist scalars αl, l = k,…, n, such that:

v_k = \sum_{l=k}^{n} \alpha_l u_l,

(3.135)

and the constraint \sum_{l=k}^{n} \alpha_l^2 = 1 is imposed to enforce the unit norm of vk. Notice that the maximization of the kth principal component norm or variance is equivalent to the maximization of

y_k^T y_k = \Big( \sum_{j=k}^{n} \alpha_j u_j \Big)^T X^T X \Big( \sum_{l=k}^{n} \alpha_l u_l \Big) = (n-1) \sum_{l=k}^{n} \alpha_l^2 \lambda_l,

(3.136)

with respect to the scalars αl, l = k,…, n, under the constraint \sum_{l=k}^{n} \alpha_l^2 = 1. Obviously, the maximum of (3.136) is achieved when αl = 0 for l > k and |αk| = 1. This concludes the proof. In other words, the eigenvectors of the correlation matrix C yield the principal component directions. Depending on the application in which the PCA method is employed, PCA goes by different names, such as the Karhunen-Loève transform, SVD, or Hotelling transform. For more details on PCA, reference [25] presents a comprehensive overview of PCA and its applications.
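
A compact sketch of the construction just described (NumPy; the data matrix is an arbitrary example, the columns are mean-centered, and the normalization constant is taken as the usual "number of observations minus one", which plays the same role as the chapter's 1/(n − 1) factor and does not affect the principal directions):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))   # 100 x 5 data matrix
Xc = X - X.mean(axis=0)                                           # center the columns

n_obs = Xc.shape[0]
C = Xc.T @ Xc / (n_obs - 1)                   # sample covariance of the columns
lam, U = np.linalg.eigh(C)                    # eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]                # reorder: lambda_1 >= lambda_2 >= ...

Y = Xc @ U                                    # principal component vectors y_k = X u_k
# The y_k are mutually orthogonal and their variances equal the eigenvalues of C
print(np.allclose(Y.T @ Y / (n_obs - 1), np.diag(lam)))
```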

3.20    Special Matrices

In this section several special matrices such as block, circulant, Toeplitz, Vandermonde, stochastic, generalized inverse and positive definite matrices will be introduced.

3.20.1    Block Matrices

A matrix A is referred to as a block matrix if it can be represented in terms of sub-matrices Ai,j of appropriate sizes as follows:

A = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ \vdots & \vdots & & \vdots \\ A_{m,1} & A_{m,2} & \cdots & A_{m,n} \end{bmatrix}.

(3.137)

A block matrix with non-zero blocks on its main (block) diagonal and zero blocks elsewhere is called a block diagonal matrix, and assumes the expression:

A = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_n \end{bmatrix}.

Some properties pertaining to the multiplication of two block matrices A and B of appropriate sizes are listed below.

•  If A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, then AB = \begin{bmatrix} A_1 B \\ A_2 B \end{bmatrix}.

•  If B = [B_1 \; B_2], then AB = A[B_1 \; B_2] = [AB_1 \; AB_2].

•  If A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} and B = [B_1 \; B_2], then AB = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}[B_1 \; B_2] = \begin{bmatrix} A_1 B_1 & A_1 B_2 \\ A_2 B_1 & A_2 B_2 \end{bmatrix}.

•  If A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix} and B = \begin{bmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{bmatrix}, then AB = \begin{bmatrix} A_{1,1}B_{1,1} + A_{1,2}B_{2,1} & A_{1,1}B_{1,2} + A_{1,2}B_{2,2} \\ A_{2,1}B_{1,1} + A_{2,2}B_{2,1} & A_{2,1}B_{1,2} + A_{2,2}B_{2,2} \end{bmatrix}.

The proof of the last property is delegated to Exercise 3.23.25. Reference [7] presents additional properties of block matrices.

3.20.2    Circulant Matrices

An n × n matrix A is called circulant if each row of A is obtained by right-shifting the previous row by one position, with the element in the last column wrapping around to the first column. In other words, a circulant matrix is completely defined by the elements located on its first row, as depicted below:

A = \mathrm{circ}(a_1, \ldots, a_n) = \begin{bmatrix} a_1 & a_2 & \cdots & a_{n-1} & a_n \\ a_n & a_1 & \cdots & a_{n-2} & a_{n-1} \\ \vdots & \vdots & & \vdots & \vdots \\ a_2 & a_3 & \cdots & a_n & a_1 \end{bmatrix}.

(3.138)

The circulant structure of matrix A = [ak,l] can be described alternatively through the condition:

a_{k,l} = \begin{cases} a_{k-1,\,l-1}, & k = 2, \ldots, n, \quad l = 2, \ldots, n \\ a_{k-1,\,n}, & k = 2, \ldots, n, \quad l = 1. \end{cases}

Consider now two circulant matrices A and B of the same size and arbitrary scalars α and β. Then the following statements hold:

•  AT is circulant.

•  αA + βB is circulant.

•  Ak is circulant, for any positive integer k.

•  If A is nonsingular, then A−1 is circulant.

•  AB is circulant.

The proofs of the last two properties are delegated to Exercise 3.23.27.

An important feature of circulant matrices is that they can be diagonalized using the Discrete Fourier Transform (DFT). In fact, for an arbitrary n × n circulant matrix A the following spectral decomposition holds:

A = F_n^* \Lambda_n F_n,

(3.139)

where Fn is the DFT matrix with entries

[F_n]_{k,l} = \frac{1}{\sqrt{n}}\, e^{-2\pi \sqrt{-1}\, kl/n}, \qquad 0 \le k, l \le n-1,

and Λn is a diagonal matrix with its diagonal entries equal to the eigenvalues of matrix A. The diagonal entries of Λn, i.e., the eigenvalues of the circulant matrix A, can be calculated in O(n log n) flops (floating-point operations) by taking the fast Fourier transform (FFT) of the first column of A. Indeed, using (3.139) and denoting by 1 the vector of all ones and e1 = (1, 0,…, 0)^T, the following equations hold:

\Lambda_n F_n = F_n A, \qquad \Lambda_n F_n e_1 = F_n A e_1, \qquad \frac{1}{\sqrt{n}}\, \Lambda_n \mathbf{1} = F_n A e_1,

(3.140)

where we used the fact that F_n e_1 = (1/√n)1. Equating the kth entries in the LHS and RHS of (3.140), it follows that the diagonal entries λk of Λn are given by

\lambda_k = \sum_{j=0}^{n-1} a_j\, e^{-2\pi i j k/n}, \qquad k = 0, \ldots, n-1, \quad i = \sqrt{-1},

where a_0, a_1, \ldots, a_{n-1} here denote the entries of the first column of A (i.e., of the vector Ae_1).

Exercise 3.23.28 describes how the eigenvectors of a circulant matrix A can be efficiently determined. Circulant matrices are important in signal processing applications because they arise in the matrix representation of the circular convolution between two discrete-time sequences (see e.g., [26]). Since the inverse DFT of the product of two DFTs is equal to the circular convolution of the two sequences whose DFTs are multiplied, circulant matrices also play a fundamental role in performing the linear convolution between two sequences [26]. Reference [27] presents a nice introduction to circulant matrices. Additional properties of circulant matrices can be found in [7].
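
The FFT-based eigenvalue computation is easy to verify numerically. The sketch below assumes SciPy's circulant constructor, which builds the matrix from its first column, so the eigenvalues are simply the DFT of that column; the k-th DFT vector is the corresponding eigenvector:

```python
import numpy as np
from scipy.linalg import circulant

c = np.array([4.0, 1.0, 2.0, 3.0])        # first column of the circulant matrix
A = circulant(c)                           # SciPy convention: c is the first column
n = len(c)

lam = np.fft.fft(c)                        # eigenvalues via an O(n log n) FFT

# The k-th DFT vector is the corresponding eigenvector: A v_k = lam[k] v_k
for k in range(n):
    v = np.exp(2j * np.pi * k * np.arange(n) / n)
    assert np.allclose(A @ v, lam[k] * v)
print(lam)
```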

3.20.3    Toeplitz Matrices

An n × n matrix A = [ak,l] whose entries satisfy the property ak,l = a_{k−l} is called a Toeplitz matrix. In other words, the (k, l) entry of a Toeplitz matrix depends only on the difference between its row and column indices. Due to this fact, the following well-known characterization of Toeplitz matrices is used in the literature: all matrix entries located on diagonals parallel to the main diagonal are equal. As a corollary, a Toeplitz matrix is completely determined by its first column and first row. Notice also that circulant matrices are a special class of Toeplitz matrices. Toeplitz matrices appear in the matrix representation of discrete-time linear convolutions between two sequences and in the autocorrelation (auto-covariance) matrices of stationary stochastic processes. An interesting and quite useful property of Toeplitz matrices is that Toeplitz systems of equations of the form Ax = b can be solved efficiently in O(n^2) flops using the Levinson-Durbin algorithm. This result plays an important role in designing parametric autoregressive moving-average (ARMA) or AR power spectral density estimators (solving the Yule-Walker system of equations) [12], and in the calculation of Wiener (minimum mean-square error (MMSE)) equalizers and predictors. For additional information, reference [27] represents an excellent elementary introduction to Toeplitz and circulant matrices.
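
For instance, SciPy exposes a Levinson-recursion-based Toeplitz solver; the sketch below (arbitrary Toeplitz data) compares it against a generic dense solver:

```python
import numpy as np
from scipy.linalg import toeplitz, solve_toeplitz

c = np.array([4.0, 1.0, 0.5, 0.2])        # first column
r = np.array([4.0, 0.8, 0.3, 0.1])        # first row (r[0] must equal c[0])
T = toeplitz(c, r)
b = np.array([1.0, 2.0, 3.0, 4.0])

x_lev = solve_toeplitz((c, r), b)          # Levinson-type O(n^2) solver
x_ref = np.linalg.solve(T, b)              # generic O(n^3) solver
print(np.allclose(x_lev, x_ref))
```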

3.20.4    Hankel Matrices

A matrix A = [ak,l] whose entries satisfy the condition ak,l = a_{k+l} is referred to as a Hankel matrix. Hankel matrices enjoy the property that the entries located on anti-diagonals are equal. Therefore, the entries of a Hankel matrix are completely described by its first row and last column. Hankel matrices have found application in the design of state-space based system identification and channel estimation algorithms [28]. Hankel matrices enjoy properties similar to those of Toeplitz matrices in terms of computing their inverses. This can be understood by noticing the following result: if J stands for the exchange matrix, i.e., the matrix with all the entries on the main counter-diagonal equal to 1 and with zeros elsewhere, then JA is Toeplitz for a square Hankel matrix A [7].

3.20.5    Vandermonde Matrices

A Vandermonde matrix A admits the following structure:

A = \begin{bmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^{m-1} \\ 1 & a_2 & a_2^2 & \cdots & a_2^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_m & a_m^2 & \cdots & a_m^{m-1} \end{bmatrix}

(3.141)

= \begin{bmatrix} 1 & 1 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_m \\ a_1^2 & a_2^2 & \cdots & a_m^2 \\ \vdots & \vdots & & \vdots \\ a_1^{m-1} & a_2^{m-1} & \cdots & a_m^{m-1} \end{bmatrix}^T.

(3.142)

In other words, each row of a Vandermonde matrix consists of the terms of a geometric progression. A square Vandermonde matrix A has the property that its determinant admits the closed-form expression:

\det(A) = \prod_{1 \le i < j \le m} (a_j - a_i).

(3.143)

Vandermonde matrices arise in applications involving evaluations of polynomials at a specific set of values, polynomial least-squares fitting and polynomial interpolation designs.
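
Polynomial least-squares fitting is a typical use: evaluating a degree-(k − 1) polynomial at the points a_i is just a Vandermonde matrix-vector product. A NumPy sketch with synthetic, illustrative data:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20)              # sample points a_1, ..., a_m
y = 1.0 - 2.0 * x + 0.5 * x**2             # values of a quadratic at those points

V = np.vander(x, 3, increasing=True)       # columns: 1, x, x^2 (Vandermonde rows)
coef, *_ = np.linalg.lstsq(V, y, rcond=None)
print(coef)                                # approximately [1.0, -2.0, 0.5]
```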

3.20.6    Normal Matrices

A complex square matrix A is said to be normal if

A^*A = AA^*,

where we recall that the superscript * stands for the conjugate transposition (Hermitian transposition) operation. A real-valued normal matrix satisfies the condition:

A^TA = AA^T.

One of the most important features of normal matrices is the fact that they are unitarily diagonalizable, i.e., there exists a unitary matrix U such that U^HAU = diag(λ1,…, λn) [5]. In particular, notice that a Hermitian (or skew-Hermitian) matrix is a normal matrix.

3.20.7    Stochastic Matrices

An n × n matrix A = [ak,l] is called a stochastic (probability or transition) matrix if all its entries are non-negative (ak,l ≥ 0) and the entries in each row add up to one, \sum_{l=1}^{n} a_{k,l} = 1. A stochastic matrix can be interpreted as the transition probability matrix of a Markov chain, and its use arises in many fields such as probability, statistics, computer science, and optimization (majorization theory). A matrix with non-negative entries and with each column summing up to one is also referred to as a stochastic matrix. A doubly stochastic matrix is a matrix with non-negative entries in which each row and each column add up to one.

3.20.8    Positive and Negative Definite Matrices

Assume x is an n × 1 vector in ℝn and n × n matrix A is symmetric. The quadratic form f (x) = xTAx is said to be positive (negative) definite if f (x) > 0 (f (x) < 0) for any non-zero vector x. By definition, the real symmetric matrix A is called:

•  Positive Definite iff xTAx > 0 , ∀x ∈ ℝn − {0}.

•  Positive Semi-Definite iff xTAx ≥ 0 , ∀x ∈ ℝn − {0}.

•  Negative Definite iff xTAx < 0 , ∀x ∈ ℝn − {0}.

•  Negative Semi-Definite iff xT Ax ≤ 0 , ∀x ∈ ℝn − {0}.

The following result provides an alternative way to assess when a matrix is positive definite.

Theorem 3.20.1. A real symmetric matrix A is positive definite if and only if all of its eigenvalues are positive.

The proof of this result is left to Exercise 3.23.30. An alternative method to check the positive definiteness of a matrix is to verify that the determinants of all leading principal submatrices are positive (Sylvester's criterion; see e.g., [8]). Additional information about the properties and applications of positive definite matrices can be found in the book [29].
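
Numerically, positive definiteness of a symmetric matrix is usually tested via its eigenvalues or, more cheaply, by attempting a Cholesky factorization; a short sketch with an arbitrary test matrix:

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((4, 4))
A = B @ B.T + 1e-3 * np.eye(4)             # symmetric positive definite by construction

print(np.all(np.linalg.eigvalsh(A) > 0))   # Theorem 3.20.1: all eigenvalues positive

def is_pos_def(M):
    """Cheap test: Cholesky succeeds iff the symmetric matrix is positive definite."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_pos_def(A), is_pos_def(-A))       # True False
```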

3.20.9    Matrix Condition Number

Consider the system of equations Ax = b. Matrix A is said to be ill-conditioned if a relatively small change in b (Δb) causes relatively large changes in the solution vector x (Δx). In contrast, matrix A is called well-conditioned if a relatively small change in b causes relatively small changes in the solution x. Assume A is a square nonsingular matrix and

Ax' = A(x + \Delta x) = b + \Delta b,

(3.144)

where x′ = x + Δx represents the perturbed solution caused by an error Δb in b. Considering Ax = b, it follows that AΔx = Δb and hence:

\Delta x = A^{-1} \Delta b.

(3.145)

Therefore,

\|\Delta x\| \le \|A^{-1}\| \, \|\Delta b\|.

(3.146)

On the other hand,

\|\Delta b\| = \|A \Delta x\| \le \|A\| \, \|\Delta x\|.

(3.147)

Therefore,

\frac{\|\Delta b\|}{\|A\|} \le \|\Delta x\| \le \|A^{-1}\| \, \|\Delta b\|.

(3.148)

Considering x = A−1 b, and following the same reasoning it is concluded that:

\frac{\|b\|}{\|A\|} \le \|x\| \le \|A^{-1}\| \, \|b\|.

(3.149)

From (3.148) and (3.149), it follows that:

\frac{1}{\|A\|\,\|A^{-1}\|} \frac{\|\Delta b\|}{\|b\|} \le \frac{\|\Delta x\|}{\|x\|} \le \|A\|\,\|A^{-1}\| \frac{\|\Delta b\|}{\|b\|}.

(3.150)

The number ‖A‖ ‖A−1‖ in (3.150) is referred to as the condition number of matrix A and is denoted by cond(A). Therefore, (3.150) can be expressed as:

\frac{1}{\mathrm{cond}(A)} \frac{\|\Delta b\|}{\|b\|} \le \frac{\|\Delta x\|}{\|x\|} \le \mathrm{cond}(A)\, \frac{\|\Delta b\|}{\|b\|}.

(3.151)

This equation relates the relative solution error (‖Δx‖/‖x‖) to the relative error in b (‖Δb‖/‖b‖). If cond(A) is close to 1, then the relative errors in x and b are comparable. However, if cond(A) is a large number, then the relative error in x can be significantly larger than the relative error in b.
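
The effect of cond(A) is easy to observe numerically; the sketch below perturbs b slightly and compares the relative-error amplification for a well-conditioned matrix and an ill-conditioned (Hilbert) matrix (arbitrary test data):

```python
import numpy as np

def error_amplification(A, rng):
    x = np.ones(A.shape[1])
    b = A @ x
    db = 1e-8 * rng.standard_normal(b.shape) * np.linalg.norm(b)
    x_pert = np.linalg.solve(A, b + db)
    rel_x = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
    rel_b = np.linalg.norm(db) / np.linalg.norm(b)
    return rel_x / rel_b                        # observed ||dx||/||x|| over ||db||/||b||

rng = np.random.default_rng(9)
n = 8
well = np.eye(n) + 0.1 * rng.standard_normal((n, n))                   # cond close to 1
hilb = 1.0 / (np.arange(1, n + 1)[:, None] + np.arange(n)[None, :])    # Hilbert matrix

for A in (well, hilb):
    print(np.linalg.cond(A), error_amplification(A, rng))
```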

3.20.10  Sherman-Morrison-Woodbury Identity

Consider that the system of linear equations Ax = b admits the solution vector x0. Consider also a slightly changed version of this system of equations of the form Ãx = b, whose solution we want to determine under the assumption that Ã represents a perturbed version of A. The problem that we would like to address is finding a computationally efficient solution for the perturbed system of equations assuming knowledge of x0 and that Ã represents a rank-one perturbation of A, i.e.,

\tilde{A} = A - uv^T,

(3.152)

in which u and v are two arbitrary vectors. Because of (3.152), the inverse of Ã can be expressed in terms of the inverse of the original matrix A as follows:

(\tilde{A})^{-1} = A^{-1} + \frac{(A^{-1}u)(v^T A^{-1})}{1 - v^T A^{-1} u}.

(3.153)

Identity (3.153) is referred to as the Sherman-Morrison identity. Right-multiplying (3.153) with b, the solution of the perturbed system x̃ = (Ã)−1b can be expressed as:

\tilde{x} = x_0 + \Delta x = x_0 + \frac{(A^{-1}u)(v^T x_0)}{1 - v^T A^{-1} u},

(3.154)

where we took into account that A−1b = x0. The Sherman-Morrison formula can be extended to situations where matrix A is subject to a higher-rank perturbation of the form:

\tilde{A} = A - UV^T,

where U and V are n × m matrices and n > m. In this case, the inverse of the perturbed matrix Ã is calculated by means of a generalization of the Sherman-Morrison formula, referred to as the Woodbury identity:

(\tilde{A})^{-1} = A^{-1} + A^{-1}U(I_m - V^T A^{-1} U)^{-1} V^T A^{-1}.

(3.155)

Equation (3.155) is often referred to as the Sherman-Morrison-Woodbury identity. Right-multiplying (3.155) with vector b, it turns out that the solution of the perturbed system can be expressed in terms of x0 as follows:

\tilde{x} = x_0 + \Delta x

(3.156)

= x_0 + A^{-1}U(I_m - V^T A^{-1} U)^{-1} V^T x_0.

(3.157)

The Sherman-Morrison-Woodbury identities (3.153) and (3.155) play an important role in the development of recursive (iterative or online) estimation and filtering techniques. The RLS algorithm and Kalman filter are two important examples in this regard [10, 11, 30, 31].
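
A direct numerical check of the Sherman-Morrison identity (3.153) and of the rank-one solution update (3.154), with arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably invertible
u, v, b = rng.standard_normal((3, n))

A_inv = np.linalg.inv(A)
x0 = A_inv @ b

A_tilde = A - np.outer(u, v)                      # rank-one perturbation (3.152)
A_tilde_inv_sm = A_inv + np.outer(A_inv @ u, v @ A_inv) / (1.0 - v @ A_inv @ u)
print(np.allclose(A_tilde_inv_sm, np.linalg.inv(A_tilde)))     # identity (3.153)

x_tilde = x0 + (A_inv @ u) * (v @ x0) / (1.0 - v @ A_inv @ u)  # update (3.154)
print(np.allclose(A_tilde @ x_tilde, b))
```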

3.20.11  Schur Complement

Many practical applications require the calculation of the inverse of a block-partitioned matrix. Assume that the matrix E is block-partitioned as:

E = \begin{bmatrix} A & B \\ C & D \end{bmatrix},

(3.158)

where submatrix A is invertible. The expression F = D − CA−1B is called the Schur complement of submatrix A in E and represents an important ingredient in computing the inverse of E in terms of the inverse of A. It turns out that the inverse of the partitioned matrix E in (3.158) can be expressed in terms of the inverses of submatrix A and its Schur complement F as follows:

E^{-1} = \begin{bmatrix} A^{-1} + A^{-1}BF^{-1}CA^{-1} & -A^{-1}BF^{-1} \\ -F^{-1}CA^{-1} & F^{-1} \end{bmatrix}.

(3.159)

Furthermore, the determinant of matrix E can be calculated using determinants of submatrix A and its Schur complement:

|E| = |A||F|.

(3.160)

The proof of (3.160) is deferred to Exercise 3.23.32. Additional properties and applications of Schur complement can be found in [7, 32].

3.20.12  Generalized Inverses

Often it is necessary to extend the concept of matrix invertibility to the wider class of rank-deficient or rectangular matrices by means of concepts such as the generalized inverse or pseudoinverse of a matrix. One of the major applications of the concept of generalized inverse is in solving systems of equations of the form Ax = b, where matrix A is either not square or singular. Such a system of equations may not admit any solution or may admit an infinite number of solutions.

Basic Definition of Generalized Inverse

The generalized inverse of a matrix is not unique. Any matrix that satisfies the following equation:

A A^{-} A = A,

(3.161)

is referred to as a generalized inverse of matrix A and is denoted by A^−. In general, the generalized inverse of an m × n matrix is an n × m matrix. Some of the main properties of generalized inverses are enumerated below.

1.  If A is a square and nonsingular matrix, then A^− = A^{−1}, i.e., the generalized inverse coincides with the regular inverse.

2.  Transposition: (A^T)^− = (A^−)^T.

3.  Assume that matrix A is of size m × n, then

(a)  rank(A^−A) = rank(A).

(b)  rank(I − A^−A) = n − rank(A).

4.  Consider m × n matrices A and B and their n × m generalized inverses, A^− and B^−, respectively.

(a)  A(I + A)^− = (I + A)^−A.

(b)  A^− + B^− = A^−(A + B)B^−.

(c)  A − A(A + B)^−A = B − B(A + B)^−B.

(d)  (I + AB)^− = I − A(I + BA)^−B.

5.  Generalized inverse of block matrices. Consider the block matrix

E = \begin{bmatrix} A & B \\ C & D \end{bmatrix},

where A, B, C, and D are submatrices of E, and A^−, B^−, C^−, and D^− stand for their generalized inverses, respectively. The following formula enables the calculation of a generalized inverse of E:

E^{-} = \begin{bmatrix} A^{-} + A^{-}BF^{-}CA^{-} & -A^{-}BF^{-} \\ -F^{-}CA^{-} & F^{-} \end{bmatrix},

where F = D − CA^−B.

An excellent reference dealing with the properties and applications of generalized inverses is [33]. More recently, [7] provides a good list of the major properties exhibited by generalized (weak) inverses.

Generalized Inverses to Solve Consistent Systems of Equations

Assume Ax = b is a consistent under-determined system of equations, i.e., the system of equations admits more than one solution. The generalized inverse A^− of matrix A is assumed to satisfy (3.161). Right-multiplying (3.161) by x, it follows that:

A A^{-} A x = A x.

(3.162)

Since Ax = b, (3.162) simplifies further to:

A(A^{-} b) = b.

(3.163)

Therefore, x = A^−b can be interpreted as a solution of the original system of equations Ax = b. Notice that, in general, any vector of the form x = A^−b + (I − A^−A)y, where y is any n × 1 vector, is a solution of the original system of equations (and in fact all the solutions are of this type [3]). Indeed, one can check directly by performing the required calculations that

Ax = A\big(A^{-}b + (I - A^{-}A)y\big) = AA^{-}b + (A - AA^{-}A)y = b + 0 = b.

The reverse implication can be proved by using the fact that any solution of the inhomogeneous system of equations Ax = b can be expressed as the sum of a particular solution (A^−b) of the original system Ax = b and a solution of the homogeneous system of equations Ax = 0, whose solution set coincides with the null space of A. It is not difficult to check that I − A^−A is a projector onto the null space of A [3].

Moore-Penrose Inverse

As mentioned before, the definition of a generalized inverse does not yield a unique generalized inverse, unless it is restricted to satisfy some additional requirements. By imposing several additional constraints on the definition (3.161), a unique generalized inverse can be obtained for any arbitrary matrix A. The resulting unique generalized inverse is referred to as the Moore-Penrose inverse of A and is denoted by A+. The additional constraints satisfied by the Moore-Penrose inverse are listed below:

A^+ A A^+ = A^+,

(3.164)

(A^+ A)^T = A^+ A,

(3.165)

(A A^+)^T = A A^+.

(3.166)

The Moore-Penrose inverse A+ exists for any arbitrary matrix A and is unique. There are several methods to calculate the Moore-Penrose inverse of a matrix. One of the most common methods is based on the singular value decomposition of the matrix, explained in Section 3.17. Let the SVD of matrix A be given by A = UΣV^T, where Σ is the same as in (3.122) and matrices U, V are orthogonal. The pseudoinverse of Σ, denoted Σ+, is the n × m diagonal matrix with 1/σi, i = 1,…, r, on its main diagonal and 0 elsewhere. The Moore-Penrose inverse of A is given by (see e.g., [3, 7, 24]):

A^+ = V \Sigma^+ U^T.

(3.167)
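
The SVD construction (3.167) can be compared against NumPy's built-in pseudoinverse, and the defining Moore-Penrose conditions checked directly (arbitrary rank-deficient test matrix):

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # 6x4, rank 3

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10 * s[0])
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))                 # n x m
Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])
A_plus = Vt.T @ Sigma_plus @ U.T                 # A^+ = V Sigma^+ U^T, cf. (3.167)

print(np.allclose(A_plus, np.linalg.pinv(A)))
# Defining conditions (3.161) and (3.164)-(3.166)
print(np.allclose(A @ A_plus @ A, A), np.allclose(A_plus @ A @ A_plus, A_plus))
print(np.allclose((A @ A_plus).T, A @ A_plus), np.allclose((A_plus @ A).T, A_plus @ A))
```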

3.21    Matrix Operations

Several useful matrix operations such as Kronecker product, Hadamard product, dot product, direct sum, and matrix differentiation will be introduced in this section.

3.21.1    Kronecker Product

The Kronecker product of the m × n matrix A = [aij] and the p × q matrix B = [bij] is denoted by ⊗, and it is given by the mp × nq matrix C:

C = A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}.

(3.168)

This Kronecker product is also referred to as direct multiplication.

Suppose that A, B, C, and D are matrices with appropriate sizes. Consider also that a and b are vectors of appropriate sizes, and α and β are scalars. The following properties hold for the Kronecker product (see e.g., [7] for additional properties):

•  (A + B) ⊗ C = A ⊗ C + B ⊗ C.

•  A ⊗ (B + C) = A ⊗ B + A ⊗ C.

•  (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).

•  (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

•  (A ⊗ B)^T = A^T ⊗ B^T.

•  a ⊗ b^T = b^T ⊗ a = ab^T.

•  (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1}.

•  rank(A ⊗ B) = rank(A) rank(B).
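
A quick numerical confirmation of the mixed-product and inverse properties listed above, using NumPy's kron on arbitrary small matrices:

```python
import numpy as np

rng = np.random.default_rng(12)
A, C = rng.standard_normal((2, 3, 3))          # two 3x3 matrices
B, D = rng.standard_normal((2, 2, 2))          # two 2x2 matrices

# Mixed-product property: (A (x) B)(C (x) D) = (AC) (x) (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))

# Inverse of a Kronecker product: (A (x) B)^{-1} = A^{-1} (x) B^{-1}
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))
```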

3.21.2    Hadamard Product

The Hadamard product, also called the element-wise or array product, is represented in terms of notation ⊙, and it is defined for two m × n matrices A = [ak,l] and B = [bk,l] as follows:

A \odot B = \begin{bmatrix} a_{1,1}b_{1,1} & a_{1,2}b_{1,2} & \cdots & a_{1,n}b_{1,n} \\ a_{2,1}b_{2,1} & a_{2,2}b_{2,2} & \cdots & a_{2,n}b_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{m,1}b_{m,1} & a_{m,2}b_{m,2} & \cdots & a_{m,n}b_{m,n} \end{bmatrix}.

(3.169)

For arbitrary matrices A, B, and C, assumed of appropriate sizes, the Hadamard product exhibits the following properties:

•  A ⊙ B = B ⊙ A

•  (A ⊙ B) ⊙ C = A ⊙ (B ⊙ C)

•  (A + B) ⊙ C = A ⊙ C + B ⊙ C

•  (A ⊙ B)^T = A^T ⊙ B^T

•  C ⊙ (A ⊙ B) = (C ⊙ A) ⊙ B = A ⊙ (C ⊙ B)

3.21.3    Dot Product

The dot product, also called the inner product, of two m × n matrices A = [ai] and B = [bi] is defined in terms of their column vectors ai and bi, respectively, as follows:

\langle A | B \rangle = \sum_{i=1}^{n} a_i^T b_i.

(3.170)

The dot product satisfies the following properties:

•  〈A|B〉 = tr (ATB)

•  〈A|B〉 = 〈AT|BT〉

3.21.4    Direct Sum

The direct sum of k square matrices A1, A2, …, Ak with sizes n1 × n1, n2 × n2, …, nk × nk, respectively, is a block diagonal square matrix of size (n1 + n2 + … + nk) × (n1 + n2 + … + nk) defined by

A_1 \oplus A_2 \oplus \cdots \oplus A_k = \mathrm{diag}(A_1, A_2, \ldots, A_k) = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{bmatrix}.

(3.171)

The following properties of direct sum hold:

•  tr(A1 ⊕ A2 ⊕ … ⊕ Ak) = tr(A1) + tr(A2) + … + tr(Ak)

•  |A1 ⊕ A2 ⊕ … ⊕ Ak| = |A1| |A2| … |Ak|

•  rank(A1 ⊕ A2 ⊕ … ⊕ Ak) = rank(A1) + rank(A2) + … + rank(Ak)

3.21.5    Differentiation of Matrix and Vector Functions

This subsection focuses on presenting some basic conventions and results regarding the calculation of derivatives for matrix- and vector-valued functions with respect to a scalar, vector, or matrix-valued independent variable.

Differentiation with Respect to a Scalar

When the differentiation variable is a scalar, the derivative has the same structure as the operand and its elements are simply derivatives of the elements of the operand. Assuming that the m × 1 vector x(t) = (x1(t),…,xm(t))T and m × n matrix Y(t) = [yi,j (t)] are functions of the independent scalar variable t, then their derivatives with respect to t are given by:

\frac{\partial x(t)}{\partial t} = \left( \frac{\partial x_1(t)}{\partial t}, \ldots, \frac{\partial x_m(t)}{\partial t} \right)^T,

(3.172)

\frac{\partial Y(t)}{\partial t} = \begin{bmatrix} \frac{\partial y_{1,1}(t)}{\partial t} & \cdots & \frac{\partial y_{1,n}(t)}{\partial t} \\ \vdots & & \vdots \\ \frac{\partial y_{m,1}(t)}{\partial t} & \cdots & \frac{\partial y_{m,n}(t)}{\partial t} \end{bmatrix} = \left[ \frac{\partial y_{i,j}(t)}{\partial t} \right].

(3.173)

Similarly, the higher-order derivative of a vector (matrix) with respect to (wrt) a scalar is a vector (matrix) of the same size whose entries consist of the higher-order derivatives of the vector (matrix) entries.

Differentiation with Respect to a Vector

The derivative of a scalar function x(t) with respect to an n × 1 vector t = (t1, …,tn)T is also referred to as the gradient, and it is defined as the vector:

\frac{\partial x(t)}{\partial t^T} = \left( \frac{\partial x(t)}{\partial t_1}, \ldots, \frac{\partial x(t)}{\partial t_n} \right).

(3.174)

An alternative notation for the gradient of x is:

\nabla x = \frac{\partial x(t)}{\partial t^T}.

(3.175)

The derivative of any operand with respect to a vector yields a vector for each element of the operand. Therefore, the derivative of a vector with respect to a vector is a matrix. Moreover, the derivative of a matrix-valued function with respect to a vector is a three-dimensional array.

Assume the m × 1 vector x(t) = (x1(t), …,xm(t))T and p × q matrix Y(t) = [yi,j(t)]. Then the derivatives of x(t) and Y(t) with respect to the n × 1 vector t = (t1, …,tn)T are given by:

\frac{\partial x(t)}{\partial t^T} = \begin{bmatrix} \frac{\partial x_1(t)}{\partial t_1} & \frac{\partial x_1(t)}{\partial t_2} & \cdots & \frac{\partial x_1(t)}{\partial t_n} \\ \frac{\partial x_2(t)}{\partial t_1} & \frac{\partial x_2(t)}{\partial t_2} & \cdots & \frac{\partial x_2(t)}{\partial t_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial x_m(t)}{\partial t_1} & \frac{\partial x_m(t)}{\partial t_2} & \cdots & \frac{\partial x_m(t)}{\partial t_n} \end{bmatrix},

(3.176)

\frac{\partial Y}{\partial t} = [y_{i,j,k}], \qquad y_{i,j,k} = \frac{\partial y_{i,j}}{\partial t_k}, \qquad i = 1, \ldots, p, \quad j = 1, \ldots, q, \quad k = 1, \ldots, n.

(3.177)

The m × n matrix ∂x(t)/∂t^T is referred to as the Jacobian of x(t) and is usually represented in terms of the notation:

J_x \triangleq \frac{\partial x}{\partial t^T}.

(3.178)

Higher-order derivatives can be calculated by applying first-order derivatives successively several times; e.g., the second-order derivative of a scalar x(t) with respect to the n × 1 vector t = (t1,…, tn)^T is the first-order derivative of the row vector ∂x(t)/∂t^T = (∂x(t)/∂t1,…, ∂x(t)/∂tn) with respect to the vector t, and it is equal to the matrix:

\frac{\partial^2 x(t)}{\partial t\,\partial t^T} = \frac{\partial}{\partial t}\left( \frac{\partial x(t)}{\partial t^T} \right) = \frac{\partial}{\partial t}\left( \frac{\partial x(t)}{\partial t_1}, \ldots, \frac{\partial x(t)}{\partial t_n} \right) = \begin{bmatrix} \frac{\partial^2 x}{\partial t_1^2} & \frac{\partial^2 x}{\partial t_1 \partial t_2} & \cdots & \frac{\partial^2 x}{\partial t_1 \partial t_n} \\ \frac{\partial^2 x}{\partial t_2 \partial t_1} & \frac{\partial^2 x}{\partial t_2^2} & \cdots & \frac{\partial^2 x}{\partial t_2 \partial t_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 x}{\partial t_n \partial t_1} & \frac{\partial^2 x}{\partial t_n \partial t_2} & \cdots & \frac{\partial^2 x}{\partial t_n^2} \end{bmatrix}.

(3.179)

The matrix in (3.179) is called the Hessian of x(t) and is represented in general in terms of one of these notations:

\nabla \nabla x = \nabla^2 x = \frac{\partial^2 x}{\partial t^2}.

(3.180)

Derivatives of some commonly used expressions with respect to the n × 1 vector x are shown in Table 3.1. In this table, A stands for an n × n matrix, and x and a are n × 1 vectors. Generally, the following conventions are adopted: ∂f(x)/∂x = (∂f(x)/∂x^T)^T and ∂f(x)/∂x^T = (∂f(x)/∂x)^T.

Table 3.1:  Derivatives of commonly used expressions with respect to vectors.

f(x)          ∂f(x)/∂x          ∂f(x)/∂x^T
x^T a         a                 a^T
a^T Ax        A^T a             a^T A
Ax            A^T               A
x^T x         2x                2x^T
x^T Ax        Ax + A^T x        x^T A^T + x^T A

Table 3.2:  Derivatives of commonly used expressions with respect to matrices.

f(X)          ∂f(X)/∂X
a^T Xb        ab^T
a^T X^T b     ba^T
tr(AX)        A^T
tr(XX^T)      2X
tr(AXB)       A^T B^T
|X|           |X| (X^{-1})^T

Differentiation with Respect to a Matrix

The derivative of each element of an object with respect to a matrix is the matrix of derivatives of that element with respect to the matrix entries. Therefore, the derivatives of a scalar, vector, or matrix with respect to a matrix are a matrix, a three-dimensional array, and a four-dimensional array, respectively. Derivatives of some commonly used expressions with respect to an n × n matrix X are depicted in Table 3.2. In this table, A, B, and X are n × n matrices, and a and b are n × 1 vectors. Finally, we would like to mention that reference [7] presents a much more comprehensive description of the results pertaining to the computation of derivatives for vector- and matrix-valued functions.
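
The table entries can be spot-checked with finite differences; for example, for f(x) = x^T Ax the gradient listed in Table 3.1 is Ax + A^Tx. A small NumPy sketch with arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(13)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda z: z @ A @ z                     # f(x) = x^T A x
grad_table = A @ x + A.T @ x                # gradient entry from Table 3.1

eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])    # central finite differences
print(np.allclose(grad_table, grad_fd, atol=1e-5))
```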

3.22    References and Further Studies

This chapter presented a short summary of the main concepts and results from linear algebra that are used most frequently in the fields of signal processing, wireless communications, and networking. For more detailed and comprehensive treatment of the basic concepts in linear algebra and matrix theory, the readers are directed to the excellent references [3, 5, 8, 34]. For numerical stable implementations of various matrix factorizations and linear algebra algorithms, [24] represents still one of the state-of-the-art references in this area. The handbook [7] presents an encyclopedic coverage of matrix theory from the viewpoint of a statistician. Another excellent reference that provides an in-depth treatment of basic properties of matrices is [5]. Finally, we end this chapter with the remark that linear algebra represents a very important tool for designing efficient algorithms for signal processing, communications, and networking applications (see e.g., [10, 12, 30, 35]).

Acknowledgment: This work was supported by NSF Grant 0915444.

3.23    Exercises

Exercise 3.23.1. Show that the subset S ⊆ V (S ≠ {0}), where V is a vector space over field F, is a subspace if and only if

\forall s_1, s_2 \in S, \ \forall r_1, r_2 \in F: \quad r_1 s_1 + r_2 s_2 \in S.

Exercise 3.23.2. Let B = {b1, b2,…, bn} be an ordered basis for vector space V.

•  Show that every inner-product on V can be computed using the values

a_{i,j} = \langle b_j | b_i \rangle, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, n.

Hint: Take x, y ∈ V and write them as linear combinations of the bi (i = 1,…, n). Form the inner product 〈x|y〉 and the coordinate matrices of x and y in the ordered basis B, [x]B and [y]B. Show that

\langle x | y \rangle = [y]_B^* A\, [x]_B,

where n × n matrix A = [ai,j] consists of the inner-products of the vectors in ordered basis B.

•  Show that A is a Hermitian positive definite matrix.

Exercise 3.23.3. This problem establishes the Cauchy-Schwarz inequality for the induced norm. Let v be a non-zero vector and define:

v' = w - \frac{\langle w | v \rangle}{\|v\|^2}\, v.

Show that 〈v′|v 〉 = 0 and from the fact 0 ≤ ‖v′‖2 conclude that

|\langle v | w \rangle|^2 \le \|v\|^2 \|w\|^2.

Using this result, prove the triangle inequality for the induced norm.

Exercise 3.23.4. Show that every orthogonal subset B (with non-zero elements) of a vector space V is linearly independent.

Exercise 3.23.5. Prove that in the Gram-Schmidt orthogonalization process introduced in Theorem 3.1.5, the following relationship holds:

\langle w_{k+1} | w_i \rangle = 0, \qquad i = 1, \ldots, k.

Thus, {w1, w2,…, wk+1} is an orthogonal set. Does the set {w1, w2,…, wk+1} represent a basis for span(v1, v2,…, vk+1)? Justify.

Exercise 3.23.6. Using the fact that for an m × n matrix A, elementary row (column) operations do not affect the column (row) rank, prove that

row rank(A) = column rank(A).

Exercise 3.23.7. If A is an m × n matrix over field F, show that the corresponding linear transformation TA : F^n → F^m is injective if and only if rank(A) = n and is surjective if and only if rank(A) = m.

Exercise 3.23.8. Show that for a projection operator T : V → V,

T(v) = v, \qquad \forall v \in \mathcal{R}(T).

Use this statement to prove Theorem 3.2.3.

Exercise 3.23.9. By computing 〈Pu|(I − P)v〉 show that a Hermitian idempotent matrix P represents an orthogonal projection. Use this fact to prove Theorem 3.2.4.

Exercise 3.23.10. Let B = {v1, v2,…, vn} be a basis for vector space V.

•  Show that there is a unique linear functional vi* on V, such that

v_i^*(v_j) = \delta_{ij}.

•  Show that the set of n distinct linear functionals on V obtained from B (as mentioned above) is linearly independent.

•  From the dimension of V* what can we conclude about the dual basis of B?

•  Let f be a linear functional on V. Write f as a linear combination of vi*’s (i = 1, 2,… ,n). Moreover, let v be a vector in V. So we can write it as a linear combination of vi’s. Write the expression for vi*(v) and find the unique expression for v as a linear combination of vi’s (i = 1,…, n).

Exercise 3.23.11. Let T : V → W be a linear transformation between two finite-dimensional inner-product vector spaces V and W.

•  Let w ∈ R(T) and n ∈ Null(T*). By computing 〈w|n〉 show that

\mathrm{Null}(T^*) \subseteq \mathcal{R}(T)^{\perp}.

•  Now, let w ∈ R(T)^⊥. Using the definition of the adjoint of T, show that

w \in \mathrm{Null}(T^*),

and

\mathcal{R}(T)^{\perp} \subseteq \mathrm{Null}(T^*).

•  Using the previous results, show that

\mathcal{R}(T)^{\perp} = \mathrm{Null}(T^*).

•  Show that R(T*)^⊥ = Null(T) using a similar argument.

Exercise 3.23.12. Using the properties of the four fundamental subspaces of linear transformation T, show that T is surjective if and only if T* is injective, and T is injective if and only if T* is surjective.

Exercise 3.23.13. Show the following equalities:

\mathrm{Null}(T^*T) = \mathrm{Null}(T), \quad \mathrm{Null}(TT^*) = \mathrm{Null}(T^*), \quad \mathcal{R}(T^*T) = \mathcal{R}(T^*), \quad \mathcal{R}(TT^*) = \mathcal{R}(T).

Hint: First, show that Null (T) ⊂ Null (T*T). Then using the definition of the adjoint, show that if T*T(v) = 0, then T(v) = 0; and hence, Null (T*T) ⊂ Null (T).

Exercise 3.23.14. Show that the two definitions of the operator norm are equivalent, i.e.,

\|T\| = \sup_{v \in V \setminus \{0\}} \frac{\|T(v)\|}{\|v\|} = \sup_{v \in V, \|v\| = 1} \|T(v)\|.

Exercise 3.23.15. This problem derives several matrix norms.

•  Using the definition of ‖x‖1, show that:

\|A\|_1 = \max_{\|x\|_1 = 1} \|Ax\|_1 = \max_j \sum_i |a_{ij}|.

•  Using the definition of ‖x‖∞, show that

\|A\|_\infty = \max_{\|x\|_\infty = 1} \|Ax\|_\infty = \max_i \sum_j |a_{ij}|.

•  In order to find ‖A‖2, use Lagrange's multiplier technique to maximize

f(x) = x^*A^*Ax - \lambda(x^*x - 1).

Hint: Take the gradient of f(x) with respect to [x λ] and equate it to zero. Then exploit the eigenvalues and eigenvectors of the matrix A*A.

•  Show that

\|A\|_F^2 = \mathrm{tr}(A^*A) = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2.

Exercise 3.23.16. Let A be an m × n matrix and U be an m × m unitary matrix. Show that

\|UA\|_F = \|A\|_F.

Exercise 3.23.17. Let A and B be matrices of dimension m × n and n × m, respectively. Show that

\det(I_m + AB) = \det(I_n + BA).

Exercise 3.23.18. Let S be a subspace of an inner-product vector space V, and let v ∈ V. Follow the steps below to prove Theorem 3.12.1.

•  Suppose s ∈ S and v − s is orthogonal to all s* ∈ S; i.e.,

\langle v - s | s^* \rangle = 0, \qquad \forall s^* \in S.

Let s′ ∈ S and s′ ≠ s. By rewriting the difference v − s′ as

v - s' = (v - s) + (s - s'),

show that

\|v - s'\|^2 \ge \|v - s\|^2.

•  Conversely, now let ‖v − s‖² ≤ ‖v − s′‖², ∀s′ ∈ S. Show that

-2\,\mathrm{Re}\langle v - s | s^* \rangle + \|s^*\|^2 \ge 0, \qquad \forall s^* \in S.

Then, by a suitable choice of s*, show that 〈v − s|s*〉 = 0 and conclude that v − s is orthogonal to every vector in S.

•  Show that the best approximation is unique.

•  Let s = \sum_{k=1}^{m} \frac{\langle v | s_k \rangle}{\|s_k\|^2} s_k, where {s1,…, sm} is an orthogonal basis for S. Show that v − s is orthogonal to sj, j = 1,…, m, and hence to every vector in S. Therefore, s is the best approximation of v by vectors in S.

Exercise 3.23.19. Let S be a finite-dimensional subspace of inner-product vector space V. Suppose P is an orthogonal projection of V onto S.

•  Prove that the mapping defined by the matrix I − P, which maps any vector v ∈ V onto v − Pv, is actually the orthogonal projection of V onto S^⊥.

•  Show that P is an idempotent linear mapping from V to S.

•  Show that

Ps = 0, \qquad \forall s \in S^{\perp}.

•  Show that

V = S \oplus S^{\perp}.

Exercise 3.23.20. This problem pertains to the proof of Theorem 3.13.1.

•  Using the best approximation theorem and the four fundamental subspaces related to T, show that the minimum error (T(v) − w) should be in the nullspace of T*.

•  Conclude that

T*T(v) = T*(w).

Exercise 3.23.21. Show that if an n × n matrix A has distinct eigenvalues, the corresponding eigenvectors should be linearly independent.

Exercise 3.23.22. Let vi and vj be two eigenvectors of an n × n Hermitian matrix A, corresponding to different eigenvalues λi and λj, respectively. Compute vi*Avj and show that the eigenvectors are orthogonal. Moreover, by computing vi*Avi, show that A's eigenvalues are real. Use these results to prove the claims of Theorem 3.15.2.

Exercise 3.23.23. This problem is related to diagonalizing a matrix (Theorem 3.15.3). Suppose v1, v2,…, vn are n linearly independent eigenvectors of an n × n matrix A, corresponding to eigenvalues λ1, λ2,… ,λn, respectively.

•  Show that

A[v_1, v_2, \ldots, v_n] = [v_1, v_2, \ldots, v_n] \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}.

Define S = [v1, v2,…, vn] and Λ = diag (λ1,…, λn), where diag (·) is a diagonal matrix. Note that S is invertible. Conclude that A is diagonalizable.

•  Conversely, let S be an invertible matrix such that

S^{-1}AS = \Lambda,

where Λ is diagonal. Show that the ith column of S is an eigenvector of A, corresponding to the ith diagonal entry of Λ.

Exercise 3.23.24. Using the SVD of matrix A, show that

\|A\|_F = \left( \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 \right)^{1/2},

where n is the rank of A and σi (i = 1,…, n) are the singular values of A.

Exercise 3.23.25. Show that the product of the block matrices A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} and B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} can be expressed as:

AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.

Exercise 3.23.26. Let the permutation matrix Πm be defined as follows:

\Pi_m = \mathrm{circ}(0, 1, 0, \ldots, 0) = [e_m, e_1, e_2, \ldots, e_{m-1}],

with

e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^T, with the single 1 located in the ith position.

Prove that the square matrix A is circulant if and only if A = \Pi_m^T A \Pi_m.

Exercise 3.23.27. Using the result proved in Exercise 3.23.26, justify the following properties.

•  If A and B are circulant matrices, then AB is a circulant matrix.

•  If A is circulant, then A−1 is a circulant matrix.

Exercise 3.23.28. Show that:

•  The eigenvalues of a circulant matrix circ(c0, c1,…, c_{n−1}) are

\lambda_k = \sum_{i=0}^{n-1} c_i\, e^{-j 2\pi i k/n}.

•  The corresponding eigenvectors are

x_k = \frac{1}{\sqrt{n}} \left[ 1, \ e^{-j 2\pi k/n}, \ e^{-j 2\pi (2k)/n}, \ \ldots, \ e^{-j 2\pi (n-1)k/n} \right]^T.

Exercise 3.23.29. Using the expression for the determinant of a Vandermonde matrix, what conditions must the scalars a1, a2,…, am satisfy in order for the Vandermonde matrix A to be invertible?

A = \begin{bmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^{m-1} \\ 1 & a_2 & a_2^2 & \cdots & a_2^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_m & a_m^2 & \cdots & a_m^{m-1} \end{bmatrix}.

Exercise 3.23.30. Consider the square matrix A. Prove that:

1.  If A is positive definite, then all its eigenvalues are positive.

2.  If all eigenvalues of A are positive, A is positive definite.

Exercise 3.23.31. Prove that the following factorization is valid for the block-partitioned matrix M:

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A & 0 \\ C & E \end{bmatrix} \begin{bmatrix} I & A^{-1}B \\ 0 & I \end{bmatrix},

where E is the Schur complement of submatrix A in matrix M.

Exercise 3.23.32. Consider the block matrix

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.

Prove that the determinant of matrix M is equal to the product of determinants of A and its Schur complement.

Exercise 3.23.33. Show that the generalized inverse of the block matrix

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}

is given by:

M^{-} = \begin{bmatrix} A^{-} + A^{-}BE^{-}CA^{-} & -A^{-}BE^{-} \\ -E^{-}CA^{-} & E^{-} \end{bmatrix},

where E = D − CA^−B.

Exercise 3.23.34. Prove the following properties of the Kronecker product. Matrices A, B, C and D are assumed of appropriate sizes.

1.  (A + B) ⊗ C = A ⊗ C + B ⊗ C.

2.  (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).

3.  (αA) ⊗ (βB) = αβ (A ⊗ B).

4.  (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).

Exercise 3.23.35. Prove the vector and matrix derivative results presented in Tables 3.1 and 3.2, respectively.

References

[1]  J. E. Gentle, Matrix Algebra: Theory, Computations, and Applications in Statistics. NY: Springer, 2010.

[2]  S. Leon, Linear Algebra with Applications. 9th ed., NY: Prentice Hall, 2009.

[3]  A. J. Laub, Matrix Analysis for Scientists and Engineers, Philadephia: SIAM, 2004.

[4]  P. Lax, Linear Algebra and Its Applications. 2nd ed., NY: Wiley Interscience, 2007.

[5]  C. Meyer, Matrix Analysis and Applied Linear Algebra Book. Philadephia: SIAM, 2001.

[6]  S. Roman, Advanced Linear Algebra. 3rd ed., NY: Springer, 2010.

[7]  G. A. F. Seber, A Matrix Handbook for Statisticians. NY: Wiley, 2008.

[8]  G. Strang, Linear Algebra and Its Applications. Cengage Learning, Florence, KY: Brooks Cole, 2005.

[9]  R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge: Cambridge University Press, 1985.

[10]  S. Kay, Fundamentals of Statistical Signal Processing, vol. I: Estimation Theory, vol. II: Detection Theory. 1st ed., NY: Prentice Hall, 1993.

[11]  A. H. Sayed, Adaptive Filters. 1st ed., NY: Wiley-IEEE Press, 2008.

[12]  P. Stoica and R. L. Moses, Spectral Analysis of Signals. NY: Prentice Hall, 2005.

[13]  J. Dauxois and G. M. Nkiet, “Canonical analysis of two Euclidean subspaces and its applications,” Linear Algebra Appl., vol. 264, no. 10, pp. 355–388, October 1997.

[14]  R. Gittins, Canonical Analysis: A Review with Applications in Ecology, Biomathematic, Berlin: Springer-Verlag, 1985.

[15]  T. Kailath, “A view of three decades of linear filtering theory,” IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 146–181, February 1974.

[16]  I. M. Gelfand and A. M. Yaglom, “Calculation of the amount of information about a random function contained in another such function,” American Mathematical Society Translations, series vol. 12, pp. 199–246, 1959.

[17]  H. Akaike, “Stochastic theory of minimal realization,” IEEE Transactions on Automatic Control, pp. 667–674, 1974.

[18]  P. E. Caines, Linear Stochastic Systems. NY: Wiley, 1988.

[19]  P. Van Overschee and B. De Moor, Subspace Identification for Linear Systems: Theory, Implementation and Applications. Boston: Kluwer Academic Publishers, 1996.

[20]  T. Kim, O. Arandjelovic, and R. Cipolla, “Boosted manifold principal angles for image set-based recognition,” Pattern Recognition, vol. 40, issue 9, pp. 2475–2484, September 2007.

[21]  L. Wolf and A. Shashua, “Learning over Sets using Kernel Principal Angles,” Journal of Machine Learning Research, vol. 4, pp. 913–931, 2003.

[22]  R. T. Behrens and L. Scharf, “Signal Processing Applications of Oblique Projection Operators,” IEEE Transactions on Signal Processing, vol. 42, no. 6, pp. 1413–1414, June 1994.

[23]  I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, November 1999.

[24]  G. Golub and C. F. van Loan, Matrix Computations. 3rd. ed., Baltimore: John Hopkins University Press, 1996.

[25]  I. Jolliffe, Principal Component Analysis. Berlin: Springer, 2002.

[26]  S. K. Mitra, Digital Signal Processing. NY: McGraw-Hill, 2005.

[27]  R. M. Gray, Toeplitz and Circulant Matrices: A Review. Boston-Delft: Now Publishers Inc., 2005.

[28]  M. Verhaegen, "Identification of the Deterministic Part of MIMO State Space Models given in Innovations Form from Input-Output Data," Automatica, vol. 30, no. 1, pp. 61–74, January 1994.

[29]  R. Bhatia, Positive Definite Matrices. Princeton: Princeton University Press, 2006.

[30]  S. Haykin, Adaptive Filter Theory. NY: Prentice Hall, 2002.

[31]  T. Soderstrom and P. Stoica, System Identification. NY: Prentice Hall, 1994.

[32]  F. Zhang, The Schur Complement and Its Applications. Berlin: Springer, 2010.

[33]  C. R. Rao and S. K. Mitra, Generalized Inverse of Matrices and Its Applications. NY: Wiley, 1972.

[34]  T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing. Upper Saddle River, NJ: Prentice-Hall, 2000.

[35]  S. B. Wicker, Error Control Systems for Digital Communication and Storage. Upper Saddle River, NJ: Prentice-Hall, 1995.
