Chapter Ten
Gaussian Random Vectors
10.1 Introduction/Purpose of the Chapter
Gaussian random variables and Gaussian random vectors (vectors whose components are jointly Gaussian, as defined in this chapter) play a central role in modeling real-life processes. Part of the reason for this is that noise-like quantities encountered in many practical applications are reasonably modeled as Gaussian. Another reason is that Gaussian random variables and vectors turn out to be remarkably easy to work with (after an initial period of learning their features). Jointly Gaussian random variables are completely described by their means and covariances, which is part of the simplicity of working with them. Estimating such joint Gaussians therefore amounts to estimating only their means and covariances.
A third reason why Gaussian random variables and vectors are so important is that, in many cases, the performance measures we obtain for estimation and detection problems in the Gaussian case bound the performance achievable for other random variables with the same means and covariances. For example, the minimum mean square estimator for Gaussian problems is the same as the linear least squares estimator for other problems with the same mean and covariance and, furthermore, has the same mean square performance. We will also find that this estimator has a simple expression as a linear function of the observations. Finally, we will find that the minimum mean square estimator for non-Gaussian problems performs at least as well as that for Gaussian problems with the same mean and covariance, but that the estimator is typically much more complex. The point of this example is that non-Gaussian problems are often more easily and more deeply understood if we first understand the corresponding Gaussian problem.
In this chapter, we develop the most important properties of Gaussian random variables and vectors, namely the moment-generating function, the moments, the joint densities, and the conditional probability densities.
10.2 Vignette/Historical Notes
The Gaussian distribution is named after Johann Carl Friedrich Gauss (30 April 1777–23 February 1855), one of the most famous mathematicians and physical scientists of the 18th and 19th centuries. Besides probability and statistics, Gauss contributed significantly to many other fields, including number theory, analysis, differential geometry, geodesy, geophysics, electrostatics, astronomy, and optics.
Gauss's work on a theory of the motion of planetoids disturbed by large planets, eventually published in 1809 as Theoria motus corporum coelestium in sectionibus conicis solem ambientum (Theory of motion of the celestial bodies moving in conic sections around the sun), contained an influential treatment of the method of least squares, a procedure used today in all sciences to minimize the impact of measurement error. Gauss justified the method under the assumption of normally distributed errors, a methodology that today is the first step in the analysis of errors produced by very complex processes.
With the rigorous formulation of Brownian motion (the Wiener process) by Norbert Wiener in the 1920s, the Gaussian process became mainstream in the theory of stochastic processes. The chaos theory developed in the 1970s owes much to Gaussian processes, which made it possible to produce useful formulas and bounds. Much of machine learning theory assumes that errors behave like Gaussian processes. For all these reasons, learning about Gaussian random variables and vectors is a crucial skill for any aspiring student of probability.
10.3 Theory and Applications
10.3.1 The Basics
Let us first recall that a random variable X follows the normal law N(μ, σ2) with σ > 0 if its density is given by

f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),  x ∈ ℝ.
Before introducing the concept of Gaussian vector, let us list some properties of Gaussian random variables, which are already proven in the previous chapters of this book.
- The parameters μ and σ completely characterize the law of X.
- X ∼ N(μ, σ2) if and only if (X − μ)/σ ∼ N(0, 1).
- If X ∼ N(μ, σ2) and we denote by μp = E(X − EX)^p = E(X − μ)^p the p-central moment, then μp = 0 for odd p and μp = σ^p(p − 1)!! for even p.
- If X ∼ N(μ, σ2), then
- The sum of two independent normal random variables is normal; that is, if X ∼ N(μ1, σ1²), Y ∼ N(μ2, σ2²), and X, Y are independent, then X + Y ∼ N(μ1 + μ2, σ1² + σ2²).
- The characteristic function of X ∼ N(μ, σ2) is φX(t) = exp(iμt − σ²t²/2), t ∈ ℝ.
- The moment-generating function of X ∼ N(0, 1) is M(t) = E(e^{tX}) = e^{t²/2}, t ∈ ℝ (see Proposition 9.16).
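These properties can be checked numerically. The following sketch (an illustration, not part of the text; all variable names are ours) simulates N(μ, σ²) samples with NumPy and compares empirical moments and the moment-generating function against the formulas above.

```python
import numpy as np

# Illustrative check of the listed properties of N(mu, sigma^2);
# all names here are ours, not from the text.
rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
x = rng.normal(mu, sigma, size=1_000_000)

emp_mean = x.mean()                      # should be close to mu = 2
emp_var = x.var()                        # should be close to sigma^2 = 9

# Central moments: odd ones vanish; mu_4 = 3 * sigma^4 for a normal law.
mu3 = ((x - mu) ** 3).mean()
mu4 = ((x - mu) ** 4).mean()

# Standardization: Z = (X - mu)/sigma ~ N(0, 1), whose m.g.f. is e^{t^2/2}.
z = (x - mu) / sigma
t = 0.5
emp_mgf = np.exp(t * z).mean()
theo_mgf = np.exp(t ** 2 / 2)
```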
10.3.2 Equivalent Definitions of a Gaussian Vector
Let us now define the Gaussian vector.
The following two facts are immediate.
Proof: 1. Indeed, this follows from the definition by choosing αi = 1 and αj = 0 for every j ≠ i, j = 1, …, d.
2. It is a consequence of the definition of the Gaussian vector and of Proposition 7.29. We mention that the assumption that the Xi are independent is crucial. In this chapter, we will see examples of random vectors whose components are each Gaussian but which are not Gaussian vectors.
As we mentioned above, the mean and the variance characterize the law of a Gaussian random variable (one-dimensional). In the multidimensional case, the mean (which is a vector) and the covariance matrix will completely determine the law of a Gaussian vector.
Recall that if X = (X1, …, Xd) is a random vector, then
and the covariance matrix of X, denoted by ΛX = (ΛX(i, j))i,j=1,…,d, is defined by
for every i, j = 1, …, d.
Let us first remark that the mean and the covariance matrix of a Gaussian vector entirely characterize the first two moments (and thus the distribution) of every linear combination of the components of the vector. We recall the notation of a scalar product of two vectors:
⟨x, y⟩ = x1y1 + ⋯ + xdyd = xTy if x = (x1, …, xd), y = (y1, …, yd) ∈ ℝd, where xT denotes the transpose of the d × 1 matrix x.
Proof: It follows from Definition 10.1 that Y is a Gaussian r.v. (a linear combination of Gaussian random variables). It remains to compute its expectation and its variance. First
and
by noticing that the components of the vector are
Using Proposition 10.4, it is possible to obtain the characteristic function of a Gaussian vector.
Proof: By definition, we have
Using Proposition 10.4, we have that, by Eq. (10.1),
where Y = ⟨X, u⟩. It suffices to note that
for every .
Proof: Let and define
For every we have
Suppose that X is a Gaussian vector. Since ΛX is symmetric and positive definite, there exists a matrix A such that ΛX = AAT.
Let N1 ∼ N(0, Id). We apply Theorem 10.5 and the beginning of this proof. It follows that X and m + AN1 have the same characteristic function and therefore the same law.
The converse of Theorem 10.7 is also true.
Proof: Let
and consider the linear combination
where (AN)i is the component i of the vector , i = 1, …, d. Then, due to Proposition 10.4, we have
So every linear combination of the components of X is a Gaussian random variable, which means that X is a Gaussian vector.
By putting together the results in Theorems 10.5, 10.7, and 10.8, we obtain three alternative characterizations of a Gaussian vector.
Proof: The implication 1 ⇒ 2 follows from Theorem 10.5. The implication 2 ⇒ 3 is a consequence of the proof of Theorem 10.7. The implication 3 ⇒ 1 was shown in Theorem 10.8.
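The third characterization (X has the same law as m + AN with N ∼ N(0, Id)) directly suggests a way to simulate a Gaussian vector with prescribed mean and covariance. A minimal sketch, assuming NumPy and taking A as a Cholesky factor of the covariance matrix (one possible choice of A with AAT = ΛX):

```python
import numpy as np

# Sample X = m + A N with N ~ N(0, I_2); A is a Cholesky factor of the
# target covariance, so that A A^T = Lam.  Illustrative values.
rng = np.random.default_rng(1)
m = np.array([1.0, -2.0])
Lam = np.array([[2.0, 0.8],
                [0.8, 1.0]])

A = np.linalg.cholesky(Lam)                  # A A^T = Lam
N = rng.standard_normal((2, 500_000))        # 500,000 draws of N(0, I_2)
X = m[:, None] + A @ N                       # columns are samples of N(m, Lam)

emp_mean = X.mean(axis=1)                    # should approximate m
emp_cov = np.cov(X)                          # should approximate Lam
```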
10.3.3 Uncorrelated Components and Independence
One of the most important properties of a Gaussian vector is the fact that its components are independent if and only if they are uncorrelated. One direction is always true for any random variables: If two random variables are independent, then they are uncorrelated. On the other hand, the converse is strongly related to the structure of the Gaussian vector, and it does not hold in general for other random variables. In Example 10.5 we show that there exist Gaussian random variables (without the Gaussian vector structure) which are uncorrelated but are not independent.
Proof: If Xi is independent of Xj, then clearly Cov(Xi, Xj) = 0 for i ≠ j.
Suppose that Cov(Xi, Xj) = 0 for i ≠ j. Denote by μ = EX and ΛX = (λi,j)i,j=1,…,d the mean vector and covariance matrix of the vector X. Since the off-diagonal covariances are zero, this matrix is diagonal and we have
for every . We used the fact that for every j = 1, …, d, one has
Since the characteristic function of the vector is the product of individual characteristic functions, the components of the vector X are independent.
Solution: Indeed, it is easy to see that (U, V) is a Gaussian vector (every linear combination of its components is a linear combination of X and Y and (X, Y) is a Gaussian vector). Moreover,
and the independence is obtained from Theorem 10.12.
Solution: Let us first show that Y is a Gaussian vector. We note that
is a Gaussian vector with zero mean and covariance matrix I4. Take . Then
and this is a Gaussian r.v. since X is a Gaussian vector. Moreover,
In the same way, we have
so Y1, Y2, Y3 are independent random variables. In fact, the vector Y is a Gaussian vector with EY = 0 and covariance matrix
Solution: Since X − αY and Y are Gaussian random variables, it suffices to impose the condition
this implies
where we denoted with
the correlation coefficient between X and Y.
Solution: Let us show first that
Indeed, by computing the cumulative distribution function of Y, we get for every
by using the fact that −X ∼ N(0, 1). So X, Y have the same law N(0, 1). Let us show that X, Y are uncorrelated. Indeed,
since Eε = 0. But it is easy to see that X and Y are not independent.
This example shows that it is possible to find two Gaussian uncorrelated r.v.'s which are not independent. The reason why they are not independent is that the vector (X, Y) is not a Gaussian vector (see Exercise 7.8).
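The construction of this example is easy to reproduce numerically. In the sketch below (illustrative names; we assume Y = εX with ε = ±1 a fair random sign independent of X, as described above), the empirical correlation of X and Y is near zero, while X² and Y² are perfectly dependent:

```python
import numpy as np

# Y = eps * X with eps = +/-1 a fair random sign independent of X
# (our reconstruction of the construction described in the example).
rng = np.random.default_rng(2)
n = 1_000_000
X = rng.standard_normal(n)
eps = rng.choice([-1.0, 1.0], size=n)

Y = eps * X                                  # Y ~ N(0, 1) as well

corr = np.corrcoef(X, Y)[0, 1]               # close to 0: uncorrelated
cov_sq = np.cov(X ** 2, Y ** 2)[0, 1]        # = Var(X^2) = 2: dependent!
```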
A more general result can be stated as follows. The proof follows the arguments in the proof of Theorem 10.12.
10.3.4 The Density of a Gaussian Vector
Let X = (X1, …, Xd) be a Gaussian vector with independent components. Assume that for every i = 1, …, d we have
In this case it is easy to write the density of the vector X. It is, from Corollary 7.23,
In the case of a standard normal vector X ∼ N(0, Id), we have
When the components of X are not independent, we have the following.
Proof: From Theorem 10.7 we can write
where
We apply the change of variable formula (Theorem 7.24) to the function :
We have
We obtain
where fN denotes the density of the vector N ∼ N(0, Id). Since
and
we obtain the conclusion of the theorem.
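The density formula in the theorem can be implemented directly. A small sketch (our own helper names, not from the text) that also sanity-checks it against the independent case, where the joint density must factor into one-dimensional normal densities:

```python
import numpy as np

def gaussian_density(x, m, Lam):
    """Density of N(m, Lam) at x, Lam invertible (our helper, not the text's)."""
    d = len(m)
    diff = x - m
    quad = diff @ np.linalg.inv(Lam) @ diff
    const = (2 * np.pi) ** (-d / 2) * np.linalg.det(Lam) ** (-0.5)
    return const * np.exp(-quad / 2)

def normal_1d(t, mu, var):
    """One-dimensional N(mu, var) density."""
    return np.exp(-(t - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Diagonal covariance: the joint density must equal the product of the
# one-dimensional densities, as in the independent case above.
m = np.array([0.0, 1.0])
Lam = np.diag([1.0, 4.0])
x = np.array([0.5, -0.5])

joint = gaussian_density(x, m, Lam)
product = normal_1d(0.5, 0.0, 1.0) * normal_1d(-0.5, 1.0, 4.0)
```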
However, in the case of degenerate Gaussian vectors, we have the property stated in the next proposition. We need to recall the notion of the rank of a matrix. The rank of a matrix is the dimension of the vector space generated by its columns (or, equivalently, by its rows). It is also the largest order of a square minor with nonzero determinant, where a minor is obtained by deleting any number of rows and columns. Obviously, if a d × d matrix is invertible, then its rank is d.
In the case of a Gaussian vector of dimension 2 (a Gaussian couple), we have the following:
Assume ρ2 ≠ 1. Then the density of the vector X is
Proof: This follows from Theorem 10.4 since
10.3.5 Cochran's Theorem
Recall that if X ∼ N(0, 1), then X² ∼ Γ(1/2, 1/2), the gamma distribution with parameters 1/2 and 1/2. This law is called the chi square distribution and is usually denoted by χ2(1), with 1 denoting one degree of freedom. More generally, if X1, …, Xd are independent standard normal random variables, then
X1² + X2² + ⋯ + Xd²
follows the law Γ(d/2, 1/2), and this is called the chi square distribution with d degrees of freedom, denoted by χ2(d).
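A quick simulation illustrates the definition: summing the squares of d independent standard normals should give mean d and variance 2d, the moments of χ2(d). (Illustrative NumPy sketch, not from the text.)

```python
import numpy as np

# chi^2(d) as a sum of d squared independent standard normals.
rng = np.random.default_rng(3)
d = 5
Z = rng.standard_normal((d, 1_000_000))
chi2 = (Z ** 2).sum(axis=0)

emp_mean = chi2.mean()       # theory: E[chi^2(d)] = d
emp_var = chi2.var()         # theory: Var[chi^2(d)] = 2d
```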
This situation can be extended to nonstandard normal random variables.
Recall that the modulus (or the Euclidean norm) of a d-dimensional vector is
We can now state the Cochran theorem.
Proof: Let
be an orthonormal basis of Ej. Then
The random vectors are independent with distribution , so the random vectors are independent. To finish, it suffices to remark that
for every j = 1, …, d.
Let us give an important application of Cochran's theorem to Gaussian random vectors.
Proof: We set for every i = 1, …, n
Then Yi are independent identically distributed N(0, 1). We also set
(by Vect(e) we mean the vector space generated by the vector e). Then
The projections of Y = (Y1, …, Yn) on E, E⊥ are independent and given by
and
We therefore have
and
which gives the conclusion.
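The conclusion of this application, namely that the sample mean and the sample variance of i.i.d. normal observations are independent, can be observed empirically. The following sketch (illustrative sizes and names, assuming standard normal samples) estimates their correlation over many repetitions:

```python
import numpy as np

# Empirical independence of the sample mean and sample variance
# for i.i.d. N(0, 1) observations (illustrative sizes).
rng = np.random.default_rng(4)
n, reps = 10, 200_000
Y = rng.standard_normal((reps, n))

ybar = Y.mean(axis=1)                 # sample mean of each row
s2 = Y.var(axis=1, ddof=1)            # unbiased sample variance of each row

corr = np.corrcoef(ybar, s2)[0, 1]    # near 0, as Cochran's theorem predicts
emp_mean_s2 = ((n - 1) * s2).mean()   # (n-1) s^2 ~ chi^2(n-1), mean n-1 = 9
```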
10.3.6 Matrix Diagonalization and Gaussian Vectors
We start with some notions concerning eigenvalues and eigenvectors of a matrix.
Proof: Let λ be an eigenvalue of A. If u ∈ Eλ is a corresponding eigenvector, then for every α ∈ ℝ, α ≠ 0, we have
so
which shows that Eλ is closed under multiplication with scalars.
If v is another vector in Eλ, then
so
Relations (10.6) and (10.7) show that Eλ is a vector space.
Proof: If λ is an eigenvalue for A, there exists u ≠ 0, such that Au = λu; therefore
This implies that the matrix A − λIn is not invertible and thus det (A − λIn) = 0 .
Conversely, if det (A − λIn) = 0, then A − λIn is not invertible, so there exists a nonzero u such that (A − λIn)u = 0.
Finally, if λ is an eigenvalue of A, then the set of associated eigenvectors are the vectors satisfying (A − λIn)u = 0, which in fact is the definition of Ker(A − λIn).
In the case when A is diagonalizable, every column of the matrix P represents an eigenvector for A and the diagonal matrix D contains on its diagonal the eigenvalues of A. Each column i is an eigenvector for the eigenvalue i on the diagonal of D.
The following results apply to any random vector, not only Gaussian random vectors. We shall review the requirements on a covariance matrix.
Please note that uTAu is a number, so its sign is well-defined. Further, we always consider vectors in ℝd as matrices having dimension d × 1.
Proof: To prove this proposition, let us first remark that the covariance matrix is a square matrix. Next, the element on row i column j is
thus the matrix must be symmetric.
Regarding positive definiteness: for any u ∈ ℝd, we can construct the one-dimensional random variable uTX. Since this is a valid random variable, its variance must be non-negative. So let us calculate this variance:
Since the quantity being squared is one-dimensional, and a one-dimensional quantity is equal to its own transpose, we may write
Thus, the condition that variance is non-negative translates into the condition that the covariance matrix is non-negative definite.
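This computation is easy to mirror numerically: for any fixed u, the variance of uTX equals uTΛXu, which is therefore non-negative. A short sketch (illustrative values of ΛX and u, not from the text):

```python
import numpy as np

# Var(u^T X) = u^T Lam u >= 0; illustrative Lam and u (not from the text).
rng = np.random.default_rng(7)
Lam = np.array([[3.0, -1.0],
                [-1.0, 2.0]])
u = np.array([1.0, 2.0])

quad = u @ Lam @ u                       # u^T Lam u = 7 here

# Empirical check on samples of X ~ N(0, Lam):
X = np.linalg.cholesky(Lam) @ rng.standard_normal((2, 500_000))
emp_var = (u @ X).var()                  # variance of u^T X
```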
Checking that a square matrix is positive definite can be complicated. However, there is an easy way to check involving the eigenvalues of the matrix.
The following result is important because it can be applied to the covariance matrices.
The above result says that every symmetric positive definite matrix is diagonalizable in an orthonormal basis. That is, it can be transformed into a diagonal matrix by an orthogonal change of basis.
Proof: The proof follows by using Theorem 10.3.36. Since ΛX is a symmetric matrix, it can be diagonalized in an orthonormal basis. That is, there exists an orthogonal matrix B such that BΛXBT is diagonal. The covariance matrix of BX is BΛXBT, so the components of BX are independent Gaussian random variables.
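The proof is constructive: diagonalizing ΛX in an orthonormal basis gives the matrix B explicitly. A sketch of this decorrelation step (illustrative covariance matrix; numpy.linalg.eigh returns an orthonormal eigenbasis for a symmetric matrix):

```python
import numpy as np

# Decorrelating a Gaussian vector: diagonalize Lam in an orthonormal basis
# and apply the orthogonal matrix B to X.  Illustrative covariance matrix.
rng = np.random.default_rng(5)
Lam = np.array([[2.0, 1.0],
                [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Lam)   # orthonormal eigenbasis (columns)
B = eigvecs.T                            # B Lam B^T is diagonal
D = B @ Lam @ B.T

# Sample X ~ N(0, Lam); the components of B X should be uncorrelated.
X = np.linalg.cholesky(Lam) @ rng.standard_normal((2, 500_000))
Z = B @ X
emp_corr = np.corrcoef(Z)[0, 1]
```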
Exercises
Problems with Solution
10.1 Suppose
and define
Solution: Y is clearly a Gaussian vector since Y = AX with
where each component of Y is a linear combination of the components of the original Gaussian vector with independent components. To have the components of Y independent, we need to impose the condition
But
Therefore if a + b = 0, the components Y1, Y2 are independent. Since
we will have in this case (a + b = 0)
10.2 Let X = (X1, X2, X3) be a random vector in with density
Solution: Note that
where
To see this decomposition, think about how the polynomial terms arise. Note that the diagonal elements give the squared terms in a unique way, so they are easy to recognize. For the off-diagonal elements, note that twice the element gives the cross-term coefficient (because the matrix is symmetric).
Once we write it in this form, we recognize the density of a Gaussian vector X with zero mean and covariance matrix
Consequently,
To solve part (b), we need to impose that
and
To impose these conditions, we need to calculate the covariance matrix:
Thus we either need to invert the matrix by hand or use a software program to do so. Using R gives
The command “solve(M)” is the R command to find the inverse of the matrix M. We can now read the covariances between the original vector components X1, X2, X3.
Now, using the matrix above and the formulas for the vector, the conditions are
V(X1) is already 1. Using a = b and c = d from the first two equations, the latter equations become
The first two equations are incompatible with a = 0, so we must have c = −e and either a = 1 or a = −1. Using this in the last equation gives e = 1. Thus the problem has more than one solution. Either of these vectors
will have the desired properties.
Finally, for part (c): since either one of these vectors is Gaussian, requiring that the covariance matrix be the identity yields components which are mutually independent.
10.3 Let X, Y, Z be independent standard normal random variables. Denote
and
(10.8)
Show that U and V are independent.
Solution: Define
Clearly, A is a Gaussian vector with zero mean and covariance matrix
the identity matrix. It follows that the vector
is also a Gaussian vector (every linear combination of its components is a linear combination of X, Y, Z). We will show that the first component is independent of the other three. Since the vector is Gaussian, it suffices to show that the first component is uncorrelated with each of the other three. We have
and in a similar way
Therefore the r.v. X + Y + Z is independent of X − Y, X − Z, and Y − Z, respectively. By the associativity property of independence, we have that X + Y + Z is independent of (X − Y, X − Z, Y − Z) and, thus, independent of V.
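The correlation computations in this solution can be confirmed by simulation. In the sketch below (illustrative names), the empirical correlations between X + Y + Z and each pairwise difference are near zero:

```python
import numpy as np

# X + Y + Z versus the pairwise differences: all correlations vanish.
rng = np.random.default_rng(6)
X, Y, Z = rng.standard_normal((3, 500_000))

U = X + Y + Z
diffs = [X - Y, X - Z, Y - Z]
corrs = [np.corrcoef(U, d)[0, 1] for d in diffs]   # all near 0
```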
10.4 Let X1, …, Xn be independent N(0, 1) distributed random variables. Let a ∈ ℝn. Give a necessary and sufficient condition on the vector a in order to have X − ⟨a, X⟩a and ⟨a, X⟩ independent.
Solution: The vector
is an (n + 1)-dimensional Gaussian vector and for every i = 1, …, n we obtain
Therefore if we impose the condition
then all covariances between ⟨a, X⟩ and the other components of the vector will be zero. This will accomplish what is needed in the problem.
10.5 Let X = (X1, X2, …, Xn) denote an n-dimensional random vector with independent components such that for every i = 1, …, n. Define
Solution: As a sum of independent normal random variables, is a normal random variable. Its parameters can be easily calculated as mean and variance . So
Note that the vector
is a Gaussian random vector. Indeed, every linear combination of its components is a linear combination of the components of X, so it is a Gaussian random variable. Therefore, and are independent if and only if they are uncorrelated; and after calculating the covariance, this is equivalent to
Hint for part (c): Wn is invariant under translation: Wn(X) = Wn(X + a), where X + a = (X1 + a, …, Xn + a) for a ∈ ℝ. Consider also Proposition 10.24.
10.6 Let (X, Y) be a Gaussian vector with mean 0 and covariance matrix
with ρ ∈ [−1, 1]. What can be said about the random variables
Solution: Clearly, (X, Z) is a Gaussian vector as a linear transformation of a Gaussian vector. Since
we note that the r.v.'s X and Z are independent.
10.7 Suppose X ∼ N(0, 1). Prove that for every x > 0 the following inequalities hold:
Solution: Consider the following functions defined on (0, ∞):
and
We need to show that
for every x > 0, where F is the c.d.f. of an N(0, 1) distributed random variable.
Since the normal c.d.f. does not have a closed form, we look at the derivatives of these functions. We need to check that for x > 0:
where φ denotes the standard normal density. Therefore, integrating the respective positive functions on (0, ∞), we obtain
and
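The resulting tail bounds can be verified numerically. The sketch below uses one common form of these inequalities, (1/x − 1/x³)φ(x) < 1 − F(x) < φ(x)/x for x > 0 (the exact functions f and g used in the text may differ slightly), with the tail computed via the complementary error function:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def upper_tail(x):
    """1 - F(x) for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# One classical form of the bounds (possibly not the exact f, g of the text):
#   (1/x - 1/x^3) * phi(x) < 1 - F(x) < (1/x) * phi(x)   for x > 0.
checks = []
for x in [0.5, 1.0, 2.0, 3.0, 5.0]:
    lower = (1 / x - 1 / x ** 3) * phi(x)
    upper = phi(x) / x
    checks.append(lower < upper_tail(x) < upper)
```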
Problems without Solution
10.8 Suppose
Show that XY has the same law as
Hint: Use the polarization formula
10.9 Prove the expression of the density function of the noncentral chi square distribution (10.5).
10.10 Let (X, Y) be a two-dimensional Gaussian vector with zero expectation and I2 covariance matrix. Compute
10.11 Let X, Y be two independent N(0, 1) distributed random variables. Define
10.12 Let X1, …, Xn be i.i.d. N(0, 1) random variables. Define
Compare and calculate EU and EV.
10.13 Let with
10.14 Suppose X ∼ N(0, Λ) where
Let
10.15 Let X = (X1, X2, X3) be a Gaussian vector with law N3(m, C) with density
where
10.16 Let (X, Y) be a normal random vector such that X and Y are standard normal N(0, 1). Suppose that Cov(X, Y) = ρ. Let and put
10.17 Let (X, Y, Z) be a Gaussian vector with mean (1, 2, 3) and covariance matrix
Set
10.18 If X is a standard normal random variable N(0, 1), let ϕ denote its characteristic function and F its c.d.f. For every integer p ≥ 1, denote the pth moment by
Let (Yn , n ≥ 1) be a sequence of independent r.v. with identical distribution N(0, 1). For every k ≥ 1 and n ≥ 1 integers, let
10.19 Let X1, X2, and X3 be three i.i.d. random variables where their distribution has zero mean and variance σ2 > 0. Denote
10.20 If X ∼ N(0, 1) and Y ∼ χ2(n) and X, Y are independent, show that
has a Student t distribution (tn) with n degrees of freedom. Specifically, show that the probability density function of Z is given by
10.21 Assume X and Y are two independent N(0, 1) random variables. Find the law of
Hint: We know (how to prove) that X + Y and X − Y are independent and each has N(0, 2) distribution. Then, once we show that
is χ2(1) distributed, we will obtain
which follows a Student distribution with one degree of freedom (see Exercise 10.20).
10.22 Consider the matrix
10.23 Consider the matrix
10.24 Let the matrix Σ be defined as
Check that the matrix is symmetric and positive definite. Find its eigenvalues and the associated eigenvectors. Now, let X be a Gaussian random vector with covariance matrix Σ and zero mean. Find a linear transformation that transforms X into a Gaussian vector with independent components.