
The set of n × n positive matrices is a differentiable manifold with a natural Riemannian structure. The geometry of this manifold is intimately connected with some matrix inequalities. In this chapter we explore this connection. Among other things, this leads to a deeper understanding of the geometric mean of positive matrices.

6.1 THE RIEMANNIAN METRIC

The space M_n of n × n complex matrices is a Hilbert space with the inner product ⟨A, B⟩ = tr A^∗B and the associated norm ||A||₂ = (tr A^∗A)^1/2. The set of Hermitian matrices constitutes a real vector space H_n in M_n. The subset P_n consisting of strictly positive matrices is an open subset in H_n. Hence it is a differentiable manifold. The tangent space to P_n at any of its points A is the space T_A P_n = {A} × H_n, identified for simplicity, with H_n. The inner product on H_n leads to a Riemannian metric on the manifold P_n. At the point A this metric is given by the differential

\[ ds = \|A^{-1/2}\, dA\, A^{-1/2}\|_2 = \left[ \operatorname{tr}\, (A^{-1}\, dA)^2 \right]^{1/2}. \tag{6.1} \]

This is a mnemonic for computing the length of a (piecewise) differentiable path in P_n. If γ : [a, b] → P_n is such a path, we define its length as

\[ L(\gamma) = \int_a^b \|\gamma^{-1/2}(t)\, \gamma'(t)\, \gamma^{-1/2}(t)\|_2 \, dt. \tag{6.2} \]

For each X ∈ GL(n) the congruence transformation Γ_X(A) = X^∗AX is a bijection of P_n onto itself. The composition Γ_X ◦ γ is another differentiable path in P_n.

6.1.1 Lemma

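For each X ∈ GL(n) and for each differentiable path γ

\[ L(\Gamma_X \circ \gamma) = L(\gamma). \tag{6.3} \]

Proof. Using the definition of the norm || · ||₂ and the fact that tr XY = tr Y X for all X and Y we have for each t

\[ \begin{aligned} \|(X^*\gamma X)^{-1/2} (X^*\gamma X)' (X^*\gamma X)^{-1/2}\|_2 &= \left[ \operatorname{tr}\, (X^*\gamma X)^{-1} (X^*\gamma' X) (X^*\gamma X)^{-1} (X^*\gamma' X) \right]^{1/2} \\ &= \left[ \operatorname{tr}\, X^{-1} \gamma^{-1} \gamma' \gamma^{-1} \gamma' X \right]^{1/2} \\ &= \left[ \operatorname{tr}\, \gamma^{-1} \gamma' \gamma^{-1} \gamma' \right]^{1/2} = \|\gamma^{-1/2} \gamma' \gamma^{-1/2}\|_2, \end{aligned} \]

where γ and γ′ are evaluated at t. Integrating over t we get (6.3). ■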

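For any two points A and B in P_n let

\[ \delta_2(A, B) = \inf \{ L(\gamma) : \gamma \text{ is a path from } A \text{ to } B \}. \tag{6.4} \]

This gives a metric on P_n. The triangle inequality

\[ \delta_2(A, B) \le \delta_2(A, C) + \delta_2(C, B) \]

is a consequence of the fact that a path γ₁ from A to C can be adjoined to a path γ₂ from C to B to obtain a path from A to B. The length of this latter path is L(γ₁) + L(γ₂).

According to Lemma 6.1.1 each Γ_X is an isometry for the length L. Hence it is also an isometry for the metric δ₂; i.e.,

\[ \delta_2(\Gamma_X(A), \Gamma_X(B)) = \delta_2(A, B) \tag{6.5} \]

for all A, B in P_n and X in GL(n).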

This observation helps us to prove several properties of δ₂. We will see that the infimum in (6.4) is attained at a unique path joining A and B. This path is called the geodesic from A to B. We will soon obtain an explicit formula for this geodesic and for its length. The following inequality, called the infinitesimal exponential metric increasing property (IEMI), plays an important role. Following the notation introduced in Exercise 2.7.15 we write De^H for the derivative of the exponential map at a point H of H_n. This is a linear map on H_n whose action is given as

\[ De^H(K) = \lim_{t \to 0} \frac{e^{H + tK} - e^H}{t}. \]

6.1.2 Proposition (IEMI)

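For all H and K in H_n we have

\[ \|e^{-H/2}\, De^H(K)\, e^{-H/2}\|_2 \ge \|K\|_2. \tag{6.6} \]

Proof. Choose an orthonormal basis in which H = diag (λ₁, . . . , λ_n). By the formula (2.40)

\[ De^H(K) = \left[ \frac{e^{\lambda_i} - e^{\lambda_j}}{\lambda_i - \lambda_j}\, k_{ij} \right]. \]

Therefore, the i, j entry of the matrix e^{−H/2} De^H(K) e^{−H/2} is

\[ \frac{\sinh\, (\lambda_i - \lambda_j)/2}{(\lambda_i - \lambda_j)/2}\, k_{ij}. \]

Since (sinh x)/x ≥ 1 for all real x, the inequality (6.6) follows. ■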
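The entrywise description used in this proof is easy to test numerically. The following is a minimal sketch, assuming NumPy and real symmetric matrices; the function names `dexp` and `iemi_sides` are illustrative and do not come from the text. It builds De^H(K) from the divided differences of the exponential in an eigenbasis of H and compares the two sides of (6.6).

```python
import numpy as np

def dexp(H, K):
    """De^H(K) for real symmetric H, K via first divided differences of exp."""
    lam, U = np.linalg.eigh(H)
    Kt = U.T @ K @ U                          # K written in an eigenbasis of H
    diff = lam[:, None] - lam[None, :]
    mean = (lam[:, None] + lam[None, :]) / 2
    close = np.isclose(diff, 0.0)
    # (e^a - e^b)/(a - b) = e^{(a+b)/2} * sinh(x)/x with x = (a - b)/2
    x = np.where(close, 1.0, diff / 2)
    ratio = np.where(close, 1.0, np.sinh(x) / x)
    return U @ (np.exp(mean) * ratio * Kt) @ U.T

def iemi_sides(H, K):
    """Return (||e^{-H/2} De^H(K) e^{-H/2}||_2, ||K||_2)."""
    lam, U = np.linalg.eigh(H)
    E = (U * np.exp(-lam / 2)) @ U.T          # e^{-H/2}
    lhs = np.linalg.norm(E @ dexp(H, K) @ E)  # Frobenius norm by default
    return lhs, np.linalg.norm(K)

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((2, 4, 4))
H, K = (X + X.T) / 2, (Y + Y.T) / 2
lhs, rhs = iemi_sides(H, K)
print(lhs >= rhs)
```

When K commutes with H (for instance K = H) only the diagonal entries contribute, the factor (sinh x)/x equals 1 there, and the two sides agree.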

6.1.3 Corollary

Let H(t), a ≤ t ≤ b be any path in H_n and let γ(t) = e^{H(t)}. Then

\[ L(\gamma) \ge \int_a^b \|H'(t)\|_2 \, dt. \tag{6.7} \]

Proof. By the chain rule γ′(t) = De^{H(t)}(H′(t)). So the inequality (6.7) follows from the definition of L(γ) given by (6.2) and the IEMI (6.6). ■

If γ(t) is any path joining A and B in P_n, then H(t) = log γ(t) is a path joining log A and log B in H_n. The right-hand side of (6.7) is the length of this path in the Euclidean space H_n. This is bounded below by the length of the straight line segment joining log A and log B. Thus L(γ) ≥ || log A − log B||₂, and we have the following important corollary called the exponential metric increasing property (EMI).

6.1.4 Theorem (EMI)

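For each pair of points A, B in P_n we have

\[ \delta_2(A, B) \ge \|\log A - \log B\|_2. \tag{6.8} \]

In other words for any two matrices H and K in H_n

\[ \delta_2(e^H, e^K) \ge \|H - K\|_2. \tag{6.9} \]

So the map

\[ (H_n, \|\cdot\|_2) \xrightarrow{\ \exp\ } (P_n, \delta_2) \tag{6.10} \]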

increases distances, or is metric increasing.

Our next proposition says that when A and B commute there is equality in (6.8). Further the exponential map carries the line segment joining log A and log B in H_n to the geodesic joining A and B in P_n. A bit of notation will be helpful here. We write [H, K] for the line segment

\[ (1 - t) H + t K, \quad 0 \le t \le 1, \]

joining two points H and K in H_n. If A and B are two points in P_n we write [A, B] for the geodesic from A to B. The existence of such a path is yet to be established. This is done first in the special case of commuting matrices.

6.1.5 Proposition

Let A and B be commuting matrices in P_n. Then the exponential function maps the line segment [log A, log B] in H_n to the geodesic [A, B] in P_n. In this case

\[ \delta_2(A, B) = \|\log A - \log B\|_2. \]

Proof. We have to verify that the path

\[ \gamma(t) = \exp\big( (1 - t) \log A + t \log B \big), \quad 0 \le t \le 1, \]

is the unique path of shortest length joining A and B in the space (P_n, δ₂). Since A and B commute, γ(t) = A^{1−t}B^t and γ′(t) = (log B − log A) γ(t). The formula (6.2) gives in this case

\[ L(\gamma) = \int_0^1 \|\log A - \log B\|_2 \, dt = \|\log A - \log B\|_2. \]

The EMI (6.8) says that no path can be shorter than this. So the path γ under consideration is one of shortest possible length.

Suppose γ̃ is another path that joins A and B and has the same length as that of γ. Then H̃(t) = log γ̃(t) is a path that joins log A and log B in H_n, and by Corollary 6.1.3 this path has length at most || log A − log B||₂, and hence exactly that. But in a Euclidean space the straight line segment is the unique shortest path between two points. So H̃(t) is a reparametrization of the line segment [log A, log B]. ■

Applying the reasoning of this proof to any subinterval [0, a] of [0, 1] we see that the parametrization

\[ H(t) = (1 - t) \log A + t \log B \]

of the line segment [log A, log B] is the one that is mapped isometrically onto [A, B] along the whole interval. In other words the natural parametrisation of the geodesic [A, B] when A and B commute is given by

\[ \gamma(t) = A^{1-t} B^t, \quad 0 \le t \le 1, \]

in the sense that δ₂(A, γ(t)) = tδ₂(A, B) for each t. The general case is obtained from this with the help of the isometries Γ_X.

6.1.6 Theorem

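Let A and B be any two elements of P_n. Then there exists a unique geodesic [A, B] joining A and B. This geodesic has a parametrization

\[ \gamma(t) = A^{1/2} \left( A^{-1/2} B A^{-1/2} \right)^t A^{1/2}, \quad 0 \le t \le 1, \tag{6.11} \]

which is natural in the sense that

\[ \delta_2(A, \gamma(t)) = t\, \delta_2(A, B) \tag{6.12} \]

for each t. Further, we have

\[ \delta_2(A, B) = \|\log A^{-1/2} B A^{-1/2}\|_2. \tag{6.13} \]

Proof. The matrices I and A^{−1/2}BA^{−1/2} commute. So the geodesic [I, A^{−1/2}BA^{−1/2}] is naturally parametrized as

\[ \gamma_0(t) = \left( A^{-1/2} B A^{-1/2} \right)^t. \]

Applying the isometry Γ_{A^{1/2}} we obtain the path

\[ \gamma(t) = \Gamma_{A^{1/2}}(\gamma_0(t)) = A^{1/2} \left( A^{-1/2} B A^{-1/2} \right)^t A^{1/2} \]

joining the points Γ_{A^{1/2}}(I) = A and Γ_{A^{1/2}}(A^{−1/2}BA^{−1/2}) = B. Since Γ_{A^{1/2}} is an isometry this path is the geodesic [A, B]. The equality (6.12) follows from the similar property for γ₀(t) noted earlier. Using Proposition 6.1.5 again we see that

\[ \delta_2(A, B) = \delta_2(I, A^{-1/2} B A^{-1/2}) = \|\log I - \log (A^{-1/2} B A^{-1/2})\|_2 = \|\log A^{-1/2} B A^{-1/2}\|_2. \]

Formula (6.13) gives an explicit representation for the metric δ₂ that we defined via (6.4). This is the Riemannian metric on the manifold P_n. From the definition of the norm || · ||₂ we see that

\[ \delta_2(A, B) = \left( \sum_{i=1}^n \log^2 \lambda_i \right)^{1/2}, \tag{6.14} \]

where λ_i are the eigenvalues of the matrix A⁻¹B.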
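Formulas (6.11) and (6.14) translate directly into a few lines of code. The sketch below assumes NumPy and real symmetric positive definite matrices; the helper names (`sym_fun`, `delta2`, `geodesic`, `random_spd`) are illustrative choices, not notation from the text. It checks the natural parametrization (6.12), the congruence invariance (6.5), and the EMI (6.8) on random data.

```python
import numpy as np

def sym_fun(A, f):
    """Apply the scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def delta2(A, B):
    """delta_2(A, B) from (6.14): root-sum-square of log eigenvalues of A^{-1}B."""
    R = sym_fun(A, lambda w: w ** -0.5)       # A^{-1/2}
    lam = np.linalg.eigvalsh(R @ B @ R)       # same spectrum as A^{-1}B
    return np.sqrt(np.sum(np.log(lam) ** 2))

def geodesic(A, B, t):
    """gamma(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}, formula (6.11)."""
    S, R = sym_fun(A, np.sqrt), sym_fun(A, lambda w: w ** -0.5)
    return S @ sym_fun(R @ B @ R, lambda w: w ** t) @ S

def random_spd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)

rng = np.random.default_rng(1)
A, B = random_spd(3, rng), random_spd(3, rng)
# An upper triangular matrix with nonzero diagonal, hence an element of GL(3)
X = np.triu(rng.standard_normal((3, 3)), 1) + np.diag([1.0, 2.0, 0.5])
d = delta2(A, B)
print(np.isclose(delta2(A, geodesic(A, B, 0.3)), 0.3 * d))            # (6.12)
print(np.isclose(delta2(X.T @ A @ X, X.T @ B @ X), d))                # (6.5)
print(d >= np.linalg.norm(sym_fun(A, np.log) - sym_fun(B, np.log)))   # (6.8)
```

The eigenvalues of A⁻¹B are computed from the symmetric matrix A^{−1/2}BA^{−1/2}, which has the same spectrum and lets the sketch rely on a symmetric eigensolver throughout.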

6.1.7 The geometric mean again

The expression (4.10) defining the geometric mean A#B now appears in a new light. It is the midpoint of the geodesic γ joining A and B in the space P_n. This is evident from (6.11) and (6.12). The symmetry of A#B in the two arguments A and B that we deduced by indirect arguments in Section 4.1 is now revealed clearly: the midpoint of the geodesic [A, B] is the same as the midpoint of [B, A].

The next proposition supplements the information given by the EMI.

6.1.8 Proposition

If for some A, B ∈ P_n, the identity matrix I lies on the geodesic [A, B], then A and B commute, [A, B] is the isometric image under the exponential map of a line segment through O in H_n, and

\[ \log B = -\frac{1 - \xi}{\xi}\, \log A, \tag{6.15} \]

where ξ = δ₂(A, I)/δ₂(A, B).

Proof. From Theorem 6.1.6 we know that

\[ I = A^{1/2} \left( A^{-1/2} B A^{-1/2} \right)^{\xi} A^{1/2}, \]

where ξ = δ₂(A, I)/δ₂(A, B). Thus

\[ B = A^{1/2} A^{-1/\xi} A^{1/2} = A^{-(1-\xi)/\xi}. \]

So A and B commute and (6.15) holds. Now Proposition 6.1.5 tells us that the exponential map sends the line segment [log A, log B] isometrically onto the geodesic [A, B]. The line segment contains the point O = log I. ■

While the EMI says that the exponential map (6.10) is metric nondecreasing in general, Proposition 6.1.8 says that this map is isometric on line segments through O. This essentially captures the fact that P_n is a Riemannian manifold of nonpositive curvature. See the discussion in Section 6.5.

Another essential feature of this geometry is the semiparallelogram law for the metric δ₂. To understand this recall the parallelogram law in a Hilbert space ℋ. Let a and b be any two points in ℋ and let m = (a + b)/2 be their midpoint. Given any other point c consider the parallelogram one of whose diagonals is [a, b] and the other [c, d]. The two diagonals intersect at m, and the parallelogram law is the equality

\[ \|a - b\|^2 + \|c - d\|^2 = 2 \left( \|a - c\|^2 + \|b - c\|^2 \right). \]

Upon rearrangement this can be written as

\[ \|c - m\|^2 = \frac{\|a - c\|^2 + \|b - c\|^2}{2} - \frac{\|a - b\|^2}{4}. \]

In the semiparallelogram law this last equality is replaced by an inequality.

6.1.9 Theorem (The Semiparallelogram Law)

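Let A and B be any two points of P_n and let M = A#B be the midpoint of the geodesic [A, B]. Then for any C in P_n we have

\[ \delta_2^2(M, C) \le \frac{\delta_2^2(A, C) + \delta_2^2(B, C)}{2} - \frac{\delta_2^2(A, B)}{4}. \tag{6.16} \]

Proof. Applying the isometry Γ_{M^{−1/2}} to all matrices involved, we may assume that M = I. Now I is the midpoint of [A, B] and so by Proposition 6.1.8 we have log B = − log A and

\[ \delta_2(A, B) = \|\log A - \log B\|_2. \]

The same proposition applied to [M, C] = [I, C] shows that

\[ \delta_2(M, C) = \|\log M - \log C\|_2. \]

The parallelogram law in the Hilbert space (H_n, || · ||₂) tells us

\[ \|\log M - \log C\|_2^2 = \frac{\|\log A - \log C\|_2^2 + \|\log B - \log C\|_2^2}{2} - \frac{\|\log A - \log B\|_2^2}{4}. \]

The left-hand side of this equation is equal to δ₂²(M, C) and the subtracted term on the right-hand side is equal to δ₂²(A, B)/4. So the EMI (6.8) leads to the inequality (6.16). ■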
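The inequality (6.16) can be probed numerically. The sketch below assumes NumPy and real symmetric positive definite matrices; `gmean` implements the formula for A#B as the midpoint of (6.11), and all function names are illustrative. It evaluates the gap between the two sides of (6.16) on random triples, and on a commuting (diagonal) triple where the law becomes the Euclidean parallelogram identity in the logarithms.

```python
import numpy as np

def sym_fun(A, f):
    """Apply the scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def delta2(A, B):
    """delta_2(A, B) computed from the eigenvalues of A^{-1/2} B A^{-1/2}."""
    R = sym_fun(A, lambda w: w ** -0.5)
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh(R @ B @ R)) ** 2))

def gmean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    S, R = sym_fun(A, np.sqrt), sym_fun(A, lambda w: w ** -0.5)
    return S @ sym_fun(R @ B @ R, np.sqrt) @ S

def semiparallelogram_gap(A, B, C):
    """Right-hand side minus left-hand side of (6.16)."""
    M = gmean(A, B)
    rhs = (delta2(A, C) ** 2 + delta2(B, C) ** 2) / 2 - delta2(A, B) ** 2 / 4
    return rhs - delta2(M, C) ** 2

rng = np.random.default_rng(2)

def random_spd(n):
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + n * np.eye(n)

gaps = [semiparallelogram_gap(random_spd(3), random_spd(3), random_spd(3))
        for _ in range(20)]
print(min(gaps) >= 0)

# Commuting case: the gap vanishes up to rounding
D1, D2, D3 = np.diag([1.0, 2.0]), np.diag([3.0, 0.5]), np.diag([2.0, 7.0])
print(abs(semiparallelogram_gap(D1, D2, D3)) < 1e-10)
```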

In a Euclidean space the distance between the midpoints of two sides of a triangle is equal to half the length of the third side. In a space whose metric satisfies the semiparallelogram law this is replaced by an inequality.

6.1.10 Proposition

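Let A, B, and C be any three points in P_n. Then

\[ \delta_2(A \# B, A \# C) \le \frac{\delta_2(B, C)}{2}. \tag{6.17} \]

Proof. Consider the triangle with vertices A, B and C (and sides the geodesic segments joining the vertices). Let M₁ = A#B. This is the midpoint of the side [A, B] opposite the vertex C of the triangle {A, B, C}. Hence, by (6.16)

\[ \delta_2^2(M_1, C) \le \frac{\delta_2^2(A, C) + \delta_2^2(B, C)}{2} - \frac{\delta_2^2(A, B)}{4}. \]

Let M₂ = A#C. In the triangle {A, M₁, C} the point M₂ is the midpoint of the side [A, C] opposite the vertex M₁. Again (6.16) tells us

\[ \delta_2^2(M_1, M_2) \le \frac{\delta_2^2(M_1, A) + \delta_2^2(M_1, C)}{2} - \frac{\delta_2^2(A, C)}{4}. \]

Substituting the first inequality into the second we obtain

\[ \delta_2^2(M_1, M_2) \le \frac{1}{2} \delta_2^2(M_1, A) + \frac{1}{4} \left[ \delta_2^2(A, C) + \delta_2^2(B, C) \right] - \frac{1}{8} \delta_2^2(A, B) - \frac{1}{4} \delta_2^2(A, C). \]

Since δ₂(M₁, A) = δ₂(A, B)/2, the right-hand side of this inequality reduces to δ₂²(B, C)/4. This proves (6.17). ■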

The inequality (6.17) can be used to prove a more general version of itself. For 0 ≤ t ≤ 1 let

\[ A \#_t B = A^{1/2} \left( A^{-1/2} B A^{-1/2} \right)^t A^{1/2}. \tag{6.18} \]

This is another notation for the geodesic curve γ(t) in (6.11). When t = 1/2 this is the geometric mean A#B. The more general version is in the following.

6.1.11 Corollary

Given four points B, C, B′, and C′ in P_n let

\[ f(t) = \delta_2(B' \#_t B, C' \#_t C). \]

Then f is convex on [0, 1]; i.e.,

\[ \delta_2(B' \#_t B, C' \#_t C) \le (1 - t)\, \delta_2(B', C') + t\, \delta_2(B, C). \tag{6.19} \]

Proof. Since f is continuous it is sufficient to prove that it is midpoint-convex. Let M₁ = B′#B, M₂ = C′#C, and M = B′#C. By Proposition 6.1.10 we have δ₂(M₁, M) ≤ δ₂(B, C)/2 and δ₂(M, M₂) ≤ δ₂(B′, C′)/2. Hence

\[ \delta_2(M_1, M_2) \le \delta_2(M_1, M) + \delta_2(M, M_2) \le \frac{1}{2} \left[ \delta_2(B, C) + \delta_2(B', C') \right]. \]

This shows that f is midpoint-convex. ■

Choosing B′ = C′ = A in (6.19) gives the following theorem called the convexity of the metric δ₂.

6.1.12 Theorem

Let A, B and C be any three points in P_n. Then for all t in [0, 1] we have

\[ \delta_2(A \#_t B, A \#_t C) \le t\, \delta_2(B, C). \tag{6.20} \]
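A small experiment illustrates (6.17) and (6.20). This is a sketch only, assuming NumPy and real symmetric positive definite matrices; the helper names are illustrative. It follows two geodesics issuing from the same point A and compares the distance between them at time t with t δ₂(B, C).

```python
import numpy as np

def sym_fun(A, f):
    """Apply the scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def delta2(A, B):
    """delta_2(A, B) computed from the eigenvalues of A^{-1/2} B A^{-1/2}."""
    R = sym_fun(A, lambda w: w ** -0.5)
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh(R @ B @ R)) ** 2))

def geodesic(A, B, t):
    """A #_t B as in (6.18)."""
    S, R = sym_fun(A, np.sqrt), sym_fun(A, lambda w: w ** -0.5)
    return S @ sym_fun(R @ B @ R, lambda w: w ** t) @ S

rng = np.random.default_rng(3)

def random_spd(n):
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + n * np.eye(n)

A, B, C = (random_spd(3) for _ in range(3))

# (6.17): midpoints of two sides are at most half the third side apart
half_ok = delta2(geodesic(A, B, 0.5), geodesic(A, C, 0.5)) <= delta2(B, C) / 2

# (6.20): the same comparison along the whole geodesics, with a rounding slack
conv_ok = all(
    delta2(geodesic(A, B, t), geodesic(A, C, t)) <= t * delta2(B, C) + 1e-10
    for t in np.linspace(0, 1, 11)
)
print(half_ok, conv_ok)
```

The small slack in the second check only absorbs rounding at the endpoints t = 0 and t = 1, where (6.20) holds with equality.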

6.1.13 Exercise

For a fixed A in P_n let f be the function f(X) = δ₂²(A, X). Show that if X₁ ≠ X₂, then for 0 < t < 1

\[ f(X_1 \#_t X_2) < (1 - t) f(X_1) + t f(X_2). \tag{6.21} \]

This is expressed by saying that the function f is strictly convex on P_n. [Hint: Show this for t = 1/2 first.]

6.2 THE METRIC SPACE P_n

In this section we briefly study some properties of the metric space (P_n, δ₂) with special emphasis on convex sets.

6.2.1 Lemma

The exponential is a continuous map from the space (H_n, || · ||₂) onto the space (P_n, δ₂).

Proof. Let H_m be a sequence in H_n converging to H. Then e^{−H_m}e^H converges to I in the metric induced by || · ||₂. So all the eigenvalues λ_i(e^{−H_m}e^H), 1 ≤ i ≤ n, converge to 1. The relation (6.14) then shows that δ₂(e^{H_m}, e^H) goes to zero as m goes to ∞. ■

6.2.2 Proposition

The metric space (P_n, δ₂) is complete.

Proof. Let {A_m} be a Cauchy sequence in (P_n, δ₂) and let H_m = log A_m. By the EMI (6.8) {H_m} is a Cauchy sequence in (H_n, || · ||₂), and hence it converges to some H in H_n. By Lemma 6.2.1 the sequence {A_m} converges to A = e^H in the space (P_n, δ₂). ■

Note that P_n is not a complete subspace of (H_n, || · ||₂). There it has a boundary consisting of singular positive matrices. In terms of the metric δ₂ these are “points at infinity.” The next proposition shows that we may approach these points along geodesics. We use A#_tB for the matrix defined by (6.18) for every real t. When A and B commute, this reduces to A^{1−t}B^t.

6.2.3 Proposition

Let S be a singular positive matrix. Then there exist commuting elements A and B in P_n such that

\[ \|A \#_t B - S\|_2 \to 0 \quad \text{and} \quad \delta_2(A \#_t B, A) \to \infty \]

as t → ∞.

Proof. Apply a unitary conjugation and assume S = diag (λ₁, . . . , λ_n) where λ_k are nonnegative for 1 ≤ k ≤ n, and λ_k = 0 for some k. If λ_k > 0, then put α_k = β_k = λ_k, and if λ_k = 0, then put α_k = 1 and β_k = 1/2. Let A = diag (α₁, . . . , α_n) and B = diag (β₁, . . . , β_n). Then

\[ \lim_{t \to \infty} \|A \#_t B - S\|_2 = \lim_{t \to \infty} \|A^{1-t} B^t - S\|_2 = 0. \]

For the metric δ₂ we have

\[ \delta_2(A, A \#_t B) = \|\log A - \log (A^{1-t} B^t)\|_2 = t\, \|\log A - \log B\|_2, \]

and this goes to ∞ as t → ∞. ■

The point of the proposition is that the curve A#_tB starts at A when t = 0, and “goes away to infinity” in the metric space (P_n, δ₂) while converging to S in the space (H_n, || · ||₂).
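The construction in this proof can be followed on a concrete 2 × 2 example. The sketch below assumes NumPy; the particular matrix S = diag(2, 0) and the helper names are illustrative choices. With α and β chosen as in the proof, the curve is A#_tB = diag(2, 2^{−t}), so its Euclidean distance to S is 2^{−t} while its δ₂-distance to A is t log 2.

```python
import numpy as np

S = np.diag([2.0, 0.0])                  # a singular positive matrix
A = np.diag([2.0, 1.0])                  # alpha_k chosen as in the proof
B = np.diag([2.0, 0.5])                  # beta_k chosen as in the proof

def curve(t):
    """A #_t B = A^{1-t} B^t for commuting (here diagonal) A and B."""
    a, b = np.diag(A), np.diag(B)
    return np.diag(a ** (1 - t) * b ** t)

def delta2_diag(P, Q):
    """delta_2 for commuting diagonal matrices: ||log P - log Q||_2."""
    return np.linalg.norm(np.log(np.diag(P)) - np.log(np.diag(Q)))

for t in [1, 10, 40]:
    X = curve(t)
    # Euclidean distance to S shrinks, Riemannian distance to A grows linearly
    print(t, np.linalg.norm(X - S), delta2_diag(A, X))
```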

It is conventional to extend some matrix operations from strictly positive matrices to singular positive matrices by taking limits. For example, the geometric mean A#B is defined by (4.10) for strictly positive matrices A and B, and then defined for singular positive matrices A and B as

\[ A \# B = \lim_{\varepsilon \downarrow 0}\, (A + \varepsilon I) \# (B + \varepsilon I). \]

The next exercise points to the need for some caution when using this idea.

6.2.4 Exercise

The geometric mean A#B is continuous on pairs of strictly positive matrices, but is not so when extended to positive semidefinite matrices. (See Exercise 4.1.6.)

We have seen that any two points A and B in P_n can be joined by a geodesic segment [A, B] lying in P_n. We say a subset 𝒦 of P_n is convex if for each pair of points A and B in 𝒦 the segment [A, B] lies entirely in 𝒦. If S is any subset of P_n, then the convex hull of S is the smallest convex set containing S. This set, denoted as conv (S), is the intersection of all convex sets that contain S. Clearly, the convex hull of any two point set {A, B} is [A, B].

6.2.5 Exercise

Let S be any set in P_n. Define inductively the sets S_m as S₀ = S and

\[ S_{m+1} = \bigcup \{ [A, B] : A, B \in S_m \}. \]

Show that

\[ \operatorname{conv}(S) = \bigcup_{m=0}^{\infty} S_m. \]

The next theorem says that if 𝒦 is a closed convex set in P_n, then a metric projection onto 𝒦 exists just as it does in a Hilbert space.

6.2.6 Theorem

Let 𝒦 be a closed convex set in P_n. Then for each A ∈ P_n there exists a point C ∈ 𝒦 such that δ₂(A, C) < δ₂(A, K) for every K in 𝒦, K ≠ C. (In other words C is the unique best approximant to A from the set 𝒦.)

Proof. Let µ = inf {δ₂(A, K) : K ∈ 𝒦}. Then there exists a sequence {C_n} in 𝒦 such that δ₂(A, C_n) → µ. Given n and m, let M be the midpoint of the geodesic segment [C_n, C_m]; i.e., M = C_n#C_m. By the convexity of 𝒦 the point M is in 𝒦. Using the semiparallelogram law (6.16) we get

\[ \delta_2^2(M, A) \le \frac{\delta_2^2(C_n, A) + \delta_2^2(C_m, A)}{2} - \frac{\delta_2^2(C_n, C_m)}{4}, \]

and hence, since δ₂(M, A) ≥ µ,

\[ \delta_2^2(C_n, C_m) \le 2 \left[ \delta_2^2(C_n, A) + \delta_2^2(C_m, A) \right] - 4 \mu^2. \tag{6.22} \]

As n and m go to ∞, the right-hand side of (6.22) goes to zero. Hence {C_n} is a Cauchy sequence, and by Proposition 6.2.2 it converges to a limit C in P_n. Since 𝒦 is closed, C is in 𝒦. Further δ₂(A, C) = lim δ₂(A, C_n) = µ. If K is any other element of 𝒦 such that δ₂(A, K) = µ, then putting C_n = C and C_m = K in (6.22) we see that δ₂(C, K) = 0; i.e., C = K. ■

The map π(A) = C given by Theorem 6.2.6 may be called the metric projection onto 𝒦.

6.2.7 Theorem

Let π be the metric projection onto a closed convex set 𝒦 of P_n. If A is any point of P_n and π(A) = C, then for any D in 𝒦

\[ \delta_2^2(A, D) \ge \delta_2^2(A, C) + \delta_2^2(D, C). \tag{6.23} \]

Proof. Let {M_n} be the sequence defined inductively as M₀ = D, and M_n₊₁ = M_n#C. Then δ₂(C, M_n) = 2⁻ⁿδ₂(C, D), and M_n converges to C = M_∞. By the semiparallelogram law (6.16)

Hence,

Summing these inequalities we have

It is easy to see that the two series are absolutely convergent.

Let d_n = δ₂²(A, M_n) − δ₂²(A, C). Then the last inequality can be written as

The same argument applied to M_n in place of D shows

Thus

Since 𝒦 is convex, each M_n ∈ 𝒦, and hence d_n ≥ 0. Thus we have

This proves the inequality (6.23). ■

6.2.8 The geometric mean once again

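If E is a Euclidean space with metric d, and a, b are any two points of E, then the function

\[ f(x) = d^2(a, x) + d^2(b, x) \]

attains its minimum on E at the unique point x₀ = (a + b)/2. In the metric space (P_n, δ₂) this role is played by the geometric mean.

Proposition. Let A and B be any two points of P_n, and let

\[ f(X) = \delta_2^2(A, X) + \delta_2^2(B, X). \]

Then the function f is strictly convex on P_n, and has a unique minimum at the point X₀ = A#B.

Proof. The strict convexity is a consequence of Exercise 6.1.13. The semiparallelogram law implies that for every X we have

\[ \delta_2^2(X_0, X) \le \frac{\delta_2^2(A, X) + \delta_2^2(B, X)}{2} - \frac{\delta_2^2(A, B)}{4} = \frac{1}{2} f(X) - \frac{1}{2} f(X_0). \]

Hence

\[ f(X_0) \le f(X) - 2\, \delta_2^2(X_0, X). \]

This shows that f has a unique minimum at the point X₀ = A#B. ■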
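The minimizing property of A#B can be checked on random data. The following sketch assumes NumPy and real symmetric positive definite matrices, with illustrative helper names. It verifies the bound f(X) ≥ f(X₀) + 2δ₂²(X₀, X), which is what the semiparallelogram law gives once f(X₀) = δ₂²(A, B)/2 is used, and it also confirms the symmetry A#B = B#A.

```python
import numpy as np

def sym_fun(A, f):
    """Apply the scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def delta2(A, B):
    """delta_2(A, B) computed from the eigenvalues of A^{-1/2} B A^{-1/2}."""
    R = sym_fun(A, lambda w: w ** -0.5)
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh(R @ B @ R)) ** 2))

def gmean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    S, R = sym_fun(A, np.sqrt), sym_fun(A, lambda w: w ** -0.5)
    return S @ sym_fun(R @ B @ R, np.sqrt) @ S

rng = np.random.default_rng(5)

def random_spd(n):
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + n * np.eye(n)

A, B = random_spd(3), random_spd(3)
X0 = gmean(A, B)

def f(X):
    """f(X) = delta_2^2(A, X) + delta_2^2(B, X)."""
    return delta2(A, X) ** 2 + delta2(B, X) ** 2

trials = [random_spd(3) for _ in range(20)]
bound_ok = all(f(X) >= f(X0) + 2 * delta2(X0, X) ** 2 - 1e-10 for X in trials)
print(bound_ok, np.allclose(gmean(A, B), gmean(B, A)))
```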

6.3 CENTER OF MASS AND GEOMETRIC MEAN

In Chapter 4 we discussed, and resolved, the problems associated with defining a good geometric mean of two positive matrices. In this section we consider the question of a suitable definition of a geometric mean of more than two matrices. Our discussion will show that while the case of two matrices is very special, ideas that work for three matrices do work for more than three as well.

Given three positive matrices A₁, A₂, and A₃, their geometric mean G(A₁, A₂, A₃) should be a positive matrix with the following properties. If A₁, A₂, and A₃ commute with each other, then G(A₁, A₂, A₃) = (A₁A₂A₃)^1/3. As a function of its three variables, G should satisfy the conditions:

(i) G(A₁, A₂, A₃) = G(A_π₍₁₎, A_π₍₂₎, A_π₍₃₎) for every permutation π of {1, 2, 3}.

(ii) G(A₁, A₂, A₃) ≤ G(A^′₁, A₂, A₃) whenever A₁ ≤ A^′₁.

(iii) G(X^∗A₁X, X^∗A₂X, X^∗A₃X) = X^∗G(A₁, A₂, A₃)X for all X ∈ GL(n).

(iv) G is continuous.

The first three conditions may be called symmetry, monotonicity, and congruence invariance, respectively.

None of the procedures that we used in Chapter 4 to define the geometric mean of two positive matrices extends readily to three. While two positive matrices can be diagonalized simultaneously by a congruence, in general three cannot be. The formula (4.10) has no obvious analogue for three matrices; nor does the extremal characterization (4.15). It is here that the connections with geometry made in Sections 6.1.7 and 6.2.8 suggest a way out: the geometric mean of three matrices should be the “center” of the triangle that has the three matrices as its vertices.

As motivation, consider the arithmetic mean of three points x₁, x₂, and x₃ in a Euclidean space E. The point x̄ = (x₁ + x₂ + x₃)/3 is characterized by several properties; three of them follow:

(i) x̄ is the unique point of intersection of the three medians of the triangle Δ(x₁, x₂, x₃). (This point is called the centroid of Δ.)

(ii) x̄ is the unique point in E at which the function

\[ \|x - x_1\|^2 + \|x - x_2\|^2 + \|x - x_3\|^2 \]

attains its minimum. (This point is the center of mass of the triple {x₁, x₂, x₃} if each of them has equal mass.)

(iii) x̄ is the unique point of intersection of the nested sequence of triangles {Δ_n} in which Δ₁ = Δ(x₁, x₂, x₃) and Δ_{j+1} is the triangle whose vertices are the midpoints of the three sides of Δ_j.

We may try to mimic these constructions in the space P_n. As we will see, this has to be done with some circumspection.

The first difficulty is with the identification of a triangle in this space. In Section 6.2 we defined convex hulls and observed that the convex hull of two points A₁, A₂ in P_n is the geodesic segment [A₁, A₂]. It is harder to describe the convex hull of three points A₁, A₂, A₃. (This seems to be a difficult problem in Riemannian geometry.) In the notation of Exercise 6.2.5, if S = {A₁, A₂, A₃}, then S₁ = [A₁, A₂] ∪ [A₂, A₃] ∪ [A₃, A₁] is the union of the three “edges.” However, S₂ is not in general a “surface,” but a “fatter” object. Thus the three “medians” [A₁, A₂#A₃], [A₂, A₁#A₃], and [A₃, A₁#A₂] need not intersect at all; in most cases they do not. So, we have to abandon this as a possible definition of the centroid of the triangle Δ(A₁, A₂, A₃).

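Next we ask whether for every triple of points A₁, A₂, A₃ in P_n there exists a (unique) point X₀ at which the function

\[ f(X) = \sum_{j=1}^{3} \delta_2^2(A_j, X) \]

attains its minimum value on P_n. A simple argument using the semiparallelogram law shows that such a point exists. This goes as follows.

Let m = inf f(X) and let {X_r} be a sequence in P_n such that f(X_r) → m. By the semiparallelogram law we have for j = 1, 2, 3, and for all r and s

\[ \delta_2^2(X_r \# X_s, A_j) \le \frac{\delta_2^2(X_r, A_j) + \delta_2^2(X_s, A_j)}{2} - \frac{\delta_2^2(X_r, X_s)}{4}. \]

Summing up these three inequalities over j, we obtain

\[ f(X_r \# X_s) \le \frac{f(X_r) + f(X_s)}{2} - \frac{3}{4}\, \delta_2^2(X_r, X_s). \]

This shows that

\[ \frac{3}{4}\, \delta_2^2(X_r, X_s) \le \frac{f(X_r) + f(X_s)}{2} - f(X_r \# X_s) \le \frac{f(X_r) + f(X_s)}{2} - m. \]

It follows that {X_r} is a Cauchy sequence, and hence it converges to a limit X₀. Clearly f attains its minimum at X₀. By Exercise 6.1.13 the function f is strictly convex and its minimum is attained at a unique point.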

We define the “center of mass” of {A₁, A₂, A₃} as the point

\[ G(A_1, A_2, A_3) = \operatorname*{arg\,min}_{X \in P_n}\ \sum_{j=1}^{3} \delta_2^2(A_j, X), \tag{6.24} \]

where the notation arg min f(X) stands for the point X₀ at which the function f(X) attains its minimum value. It is clear from the definition that G(A₁, A₂, A₃) is a symmetric and continuous function of the three variables. Since each congruence transformation Γ_X is an isometry of (P_n, δ₂) it is easy to see that G is congruence invariant; i.e.,

\[ G(X^* A_1 X, X^* A_2 X, X^* A_3 X) = X^*\, G(A_1, A_2, A_3)\, X. \]

Thus G has three of the four desirable properties listed for a good geometric mean at the beginning of this section. We do not know whether G is monotone. Some more properties of G are derived below.

6.3.1 Lemma

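Let φ₁, φ₂ be continuously differentiable real-valued functions on the interval (0, ∞) and let

\[ h(X) = \langle \varphi_1(X), \varphi_2(X) \rangle \]

for all X ∈ P_n. Then the derivative of h is given by the formula

\[ Dh(X)(Y) = \langle \varphi_1'(X) \varphi_2(X) + \varphi_1(X) \varphi_2'(X),\ Y \rangle. \]

Proof. By the product rule for differentiation (see MA, p. 312) we have

\[ Dh(X)(Y) = \langle D\varphi_1(X)(Y), \varphi_2(X) \rangle + \langle \varphi_1(X), D\varphi_2(X)(Y) \rangle. \]

Choose an orthonormal basis in which X = diag (λ₁, . . . , λ_n). Then by (2.40)

\[ D\varphi_1(X)(Y) = \left[ \frac{\varphi_1(\lambda_i) - \varphi_1(\lambda_j)}{\lambda_i - \lambda_j}\, y_{ij} \right]. \]

Hence,

\[ \langle D\varphi_1(X)(Y), \varphi_2(X) \rangle = \sum_i \varphi_1'(\lambda_i)\, y_{ii}\, \varphi_2(\lambda_i) = \langle \varphi_1'(X) \varphi_2(X), Y \rangle. \]

Similarly,

\[ \langle \varphi_1(X), D\varphi_2(X)(Y) \rangle = \langle \varphi_1(X) \varphi_2'(X), Y \rangle. \]

This proves the lemma. ■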

6.3.2 Corollary

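Let h(X) = || log X||₂², X ∈ P_n. Then

\[ Dh(X)(Y) = 2\, \langle X^{-1} \log X,\ Y \rangle \quad \text{for all } Y \in H_n. \]

We need a slight modification of this result. If

\[ h(X) = \|\log (A^{-1/2} X A^{-1/2})\|_2^2, \]

then

\[ Dh(X)(Y) = 2\, \langle (A^{-1/2} X A^{-1/2})^{-1} \log (A^{-1/2} X A^{-1/2}),\ A^{-1/2} Y A^{-1/2} \rangle \tag{6.25} \]

for all X ∈ P_n, Y ∈ H_n.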

6.3.3 Theorem

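Let A₁, A₂, A₃ be any three elements of P_n, and let

\[ f(X) = \sum_{i=1}^{3} \delta_2^2(A_i, X). \tag{6.26} \]

Then the derivative of f at X is given by

\[ Df(X)(Y) = 2 \sum_{i=1}^{3} \operatorname{tr} \left( X^{-1} \log (X A_i^{-1})\, Y \right) \tag{6.27} \]

for all Y ∈ H_n.

Proof. Using the relation (6.13) we have

\[ f(X) = \sum_{i=1}^{3} \|\log (A_i^{-1/2} X A_i^{-1/2})\|_2^2. \]

Using (6.25) we see that Df(X)(Y ) is a sum of three terms of the form

\[ \begin{aligned} 2 \operatorname{tr} \left[ A^{1/2} X^{-1} A^{1/2} \log (A^{-1/2} X A^{-1/2})\, A^{-1/2} Y A^{-1/2} \right] &= 2 \operatorname{tr} \left[ X^{-1} A^{1/2} \log (A^{-1/2} X A^{-1/2})\, A^{-1/2}\, Y \right] \\ &= 2 \operatorname{tr} \left[ X^{-1} \log (X A^{-1})\, Y \right]. \end{aligned} \]

Here we have used the similarity invariance of trace at the first step, and then the relation

\[ S\, \log (T)\, S^{-1} = \log (S T S^{-1}) \]

at the second step. The latter is valid for all matrices T with no eigenvalues on the half-line (−∞, 0] and for all invertible matrices S, and follows from the usual functional calculus. This proves the theorem. ■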

6.3.4 Theorem

Let A₁, A₂, A₃ be three positive matrices and let X₀ = G(A₁, A₂, A₃) be the point defined by (6.24). Then X₀ is the unique positive solution of the equation

\[ \sum_{i=1}^{3} X^{-1} \log (X A_i^{-1}) = O. \tag{6.28} \]

Proof. The point X₀ is the unique minimum of the function (6.26), and hence, is characterised by the vanishing of the derivative (6.27) for all Y ∈ H_n. But any matrix orthogonal to all Hermitian matrices is zero. Hence

\[ \sum_{i=1}^{3} X_0^{-1} \log (X_0 A_i^{-1}) = O. \tag{6.29} \]

In other words X₀ satisfies the equation (6.28). ■
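Equation (6.28) suggests a way to approximate the center of mass numerically. The sketch below is one possible scheme and is not taken from the text: it assumes NumPy, real symmetric positive definite inputs, and a fixed-point (Riemannian gradient type) iteration X ← X^{1/2} exp(T) X^{1/2}, where T is the average of log(X^{−1/2}A_jX^{−1/2}). By (6.30), the sum of these logarithms vanishes exactly when (6.28) holds. The function names and the step parameter are hypothetical, and convergence is only expected for reasonably well-conditioned inputs.

```python
import numpy as np

def sym(M):
    return (M + M.T) / 2

def sym_fun(A, f):
    """Apply the scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def log_sum(X, mats):
    """Sum over j of log(X^{-1/2} A_j X^{-1/2}); zero exactly when (6.28) holds."""
    R = sym_fun(X, lambda w: w ** -0.5)
    return sum(sym_fun(sym(R @ A @ R), np.log) for A in mats)

def center_of_mass(mats, step=1.0, tol=1e-13, max_iter=500):
    """Assumed iteration: X <- X^{1/2} exp(step * mean log) X^{1/2}."""
    X = sum(mats) / len(mats)                 # start from the arithmetic mean
    for _ in range(max_iter):
        T = log_sum(X, mats) / len(mats)
        if np.linalg.norm(T) < tol:
            break
        S = sym_fun(X, np.sqrt)
        X = sym(S @ sym_fun(step * T, np.exp) @ S)
    return X

rng = np.random.default_rng(4)

def random_spd(n):
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + n * np.eye(n)

mats = [random_spd(3) for _ in range(3)]
G = center_of_mass(mats)
print(np.linalg.norm(log_sum(G, mats)))       # residual of (6.28), should be tiny
```

For a commuting triple the iteration reaches (A₁A₂A₃)^1/3 after a single step from any commuting starting point, in line with Exercise 6.3.5.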

6.3.5 Exercise

Let A₁, A₂, A₃ be pairwise commuting positive matrices. Show that G(A₁, A₂, A₃) = (A₁A₂A₃)^1/3.

6.3.6 Exercise

Let X and A be positive matrices. Show that

\[ X^{-1} \log (X A^{-1}) = X^{-1/2} \log (X^{1/2} A^{-1} X^{1/2})\, X^{-1/2}. \tag{6.30} \]

(This shows that the matrices occurring in (6.29) are Hermitian.)

6.3.7 Exercise

Let w = (w₁, w₂, w₃), where w_j ≥ 0 and w₁ + w₂ + w₃ = 1. We say that w is a set of weights. Let

\[ f_w(X) = \sum_{j=1}^{3} w_j\, \delta_2^2(A_j, X). \]

Show that f_w is strictly convex, and attains a minimum at a unique point.

Let G_w(A₁, A₂, A₃) be the point where f_w attains its minimum. The special choice w = (1/3, 1/3, 1/3) leads to G(A₁, A₂, A₃).

6.3.8 Proposition

Each of the points G_w(A₁, A₂, A₃) lies in the closure of the convex hull conv ({A₁, A₂, A₃}).

Proof. Let 𝒦 be the closure of conv ({A₁, A₂, A₃}) and let π be the metric projection onto 𝒦. Then by Theorem 6.2.7, δ₂²(A_j, X) ≥ δ₂²(A_j, π(X)) for every X ∈ P_n. Hence f_w(X) ≥ f_w(π(X)) for all X. Thus the minimum value of f_w(X) cannot be attained at a point outside 𝒦. ■

Now we turn to another possible definition of the geometric mean of three matrices inspired by the characterisation of the centre of a triangle as the intersection of a sequence of nested triangles.

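Given A₁, A₂, A₃ in P_n inductively construct a sequence of triples {A^{(m)}} as follows. Set A^{(0)} = (A₁, A₂, A₃), and let

\[ A^{(m+1)} = \left( A_1^{(m)} \# A_2^{(m)},\ A_1^{(m)} \# A_3^{(m)},\ A_2^{(m)} \# A_3^{(m)} \right). \tag{6.31} \]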

6.3.9 Theorem

Let A₁, A₂, A₃ be any three points in P_n, and let {A^{(m)}} be the sequence defined by (6.31). Then for any choice of X_m in conv (A^{(m)}) the sequence {X_m} converges to a point X ∈ conv ({A₁, A₂, A₃}). The point X does not depend on the choice of X_m.

Proof. The diameter of a set S in P_n is defined as

\[ \operatorname{diam} S = \sup \{ \delta_2(X, Y) : X, Y \in S \}. \]

It is easy to see, using convexity of the metric δ₂, that if diam S = M, then diam (conv (S)) = M.

Let 𝒦_m = conv (A^{(m)}). By (6.17), and what we said above, diam 𝒦_m ≤ 2^{−m}M₀, where M₀ = diam {A₁, A₂, A₃}. The sequence {𝒦_m} is a decreasing sequence. Hence {X_m} is Cauchy and converges to a limit X. Since X_m is in 𝒦₀ for all m, the limit X is in the closure of 𝒦₀. The limit is unique as any two such sequences can be interlaced. ■

6.3.10 A geometric mean of three matrices

Let G^#(A₁, A₂, A₃) be the limit point X whose existence has been proved in Theorem 6.3.9. This may be thought of as a geometric mean of A₁, A₂, A₃. From its construction it is clear that G^# is a symmetric continuous function of A₁, A₂, A₃. Since the geometric mean A#B of two matrices is monotone in A and B and is invariant under congruence transformations, these properties are inherited by G^#(A₁, A₂, A₃) as its construction involves successive two-variable means and limits.

Exercise Show that for a commuting triple A₁, A₂, A₃ of positive matrices G^#(A₁, A₂, A₃) = (A₁A₂A₃)^1/3.

One may wonder whether G^#(A₁, A₂, A₃) is equal to the centre of mass G(A₁, A₂, A₃). It turns out that this is not always the case. Thus we have here two different candidates for a geometric mean of three matrices. While G^# has all properties that we seek, it is not known whether G is monotone in its arguments. It does have all other desired properties.
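The nested-triangle construction (6.31) is straightforward to iterate. The sketch below assumes NumPy and real symmetric positive definite matrices; `g_sharp` and the fixed iteration count are illustrative choices. Since by (6.17) the diameter of the triple at least halves at each step, sixty steps bring the three entries together to within rounding error, and the first entry is returned as the approximate limit.

```python
import numpy as np

def sym_fun(A, f):
    """Apply the scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def gmean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    S, R = sym_fun(A, np.sqrt), sym_fun(A, lambda w: w ** -0.5)
    M = S @ sym_fun(R @ B @ R, np.sqrt) @ S
    return (M + M.T) / 2                      # remove rounding asymmetry

def g_sharp(A1, A2, A3, iters=60):
    """Iterate (6.31) and return the (approximate) common limit of the triple."""
    for _ in range(iters):
        A1, A2, A3 = gmean(A1, A2), gmean(A1, A3), gmean(A2, A3)
    return A1

rng = np.random.default_rng(6)

def random_spd(n):
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + n * np.eye(n)

A, B, C = random_spd(3), random_spd(3), random_spd(3)
print(np.allclose(g_sharp(A, B, C), g_sharp(C, A, B)))        # symmetry

# Commuting triple: the limit is the cube root of the product
D1, D2, D3 = np.diag([1.0, 2.0]), np.diag([4.0, 3.0]), np.diag([2.0, 5.0])
print(np.allclose(g_sharp(D1, D2, D3), np.diag([2.0, 30.0 ** (1 / 3)])))
```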

6.4 RELATED INEQUALITIES

Some of the inequalities proved in Section 6.1 can be generalized from the special ||·||₂ norm to all Schatten ||·||_p norms and to the larger class of unitarily invariant norms. These inequalities are very closely related to others proved in very different contexts like quantum statistical mechanics. This section is a brief indication of these connections.

Two results from earlier chapters provide the basis for our generalizations. In Exercise 2.7.12 we saw that for a positive matrix A

for every X and every unitarily invariant norm. In Section 5.2.9 we showed that for every choice of n positive numbers λ₁, . . . , λ_n, the matrix

is positive. Using these we can easily prove the following generalized version of Proposition 6.1.2.

6.4.1 Proposition (Generalized IEMI)

For all H and K in H_n we have

\[ |||e^{-H/2}\, De^H(K)\, e^{-H/2}||| \ge |||K||| \tag{6.32} \]

for every unitarily invariant norm.

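In the definition (6.2) replace || · ||₂ by any unitarily invariant norm ||| · ||| and call the resulting length L_|||·|||; i.e.,

\[ L_{|||\cdot|||}(\gamma) = \int_a^b |||\gamma^{-1/2}(t)\, \gamma'(t)\, \gamma^{-1/2}(t)||| \, dt. \tag{6.33} \]

Since |||X||| is a (symmetric gauge) function of the singular values of X, Lemma 6.1.1 carries over to L_|||·|||. The analogue of (6.4),

\[ \delta_{|||\cdot|||}(A, B) = \inf \{ L_{|||\cdot|||}(\gamma) : \gamma \text{ is a path from } A \text{ to } B \}, \tag{6.34} \]

is a metric on P_n invariant under congruence transformations. The generalized IEMI leads to a generalized EMI. For all A, B in P_n we have

\[ \delta_{|||\cdot|||}(A, B) \ge |||\log A - \log B|||, \tag{6.35} \]

or, in other words, for all H, K in H_n

\[ \delta_{|||\cdot|||}(e^H, e^K) \ge |||H - K|||. \tag{6.36} \]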

Some care is needed while formulating statements about uniqueness of geodesics. Many unitarily invariant norms have the property that, in the metric they induce on H_n, the straight line segment is the unique geodesic joining any two given points. If a norm ||| · ||| has this property, then the metric δ_|||·||| on P_n inherits it. The Schatten p-norms have this property for 1 < p < ∞, but not for p = 1 or ∞. With this proviso, statements made in Sections 6.1.5 and 6.1.6 can be proved in the more general setting. In particular, we have

\[ \delta_{|||\cdot|||}(A, B) = |||\log A^{-1/2} B A^{-1/2}|||. \tag{6.37} \]

The geometric mean A#B defined by (4.10) is equidistant from A and B in each of the metrics δ_|||·|||. For certain metrics, such as the ones corresponding to Schatten p-norms for 1 < p < ∞, this is the unique “metric midpoint” between A and B.

The parallelogram law and the semiparallelogram law, however, characterize a Hilbert space norm and the associated Riemannian metric. These are not valid for other metrics.

Now we can see the connection between these inequalities arising from geometry and others related to physics. Some facts about majorization and unitarily invariant norms are needed in the ensuing discussion. Let H, K be Hermitian matrices. From (6.36) and (6.37) we have

\[ |||H + K||| \le |||\log (e^{H/2} e^K e^{H/2})|||. \tag{6.38} \]

The exponential function is convex and monotonically increasing on ℝ. Such functions preserve weak majorization (Corollary II.3.4 in MA). Using this property we obtain from the inequality (6.38)

\[ |||e^{H+K}||| \le |||e^{H/2} e^K e^{H/2}|||. \tag{6.39} \]

Two special cases of this are well-known inequalities in physics. The special cases of the || · ||₁ and the || · || norms in (6.39) say

\[ \operatorname{tr}\, e^{H+K} \le \operatorname{tr}\, e^H e^K \tag{6.40} \]

and

\[ \lambda_1(e^{H+K}) \le \lambda_1(e^H e^K), \tag{6.41} \]

where λ₁(X) is the largest eigenvalue of a matrix with real eigenvalues. The first of these is called the Golden-Thompson inequality and the second is called Segal’s inequality.

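The inequality (6.41) can be easily derived from the operator monotonicity of the logarithm function (Exercise 4.2.5 and Section 5.3.7).

Let

\[ \alpha = \lambda_1(e^H e^K) = \lambda_1(e^{K/2} e^H e^{K/2}). \]

Then

\[ e^{K/2} e^H e^{K/2} \le \alpha I, \]

and hence

\[ e^H \le \alpha\, e^{-K}. \]

Since log is an operator monotone function on (0, ∞), it follows that

\[ H \le (\log \alpha) I - K. \]

Hence

\[ H + K \le (\log \alpha) I \]

and therefore

\[ e^{H+K} \le \alpha I. \]

This leads to (6.41).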
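The Golden-Thompson and Segal inequalities are easy to observe numerically. The sketch below assumes NumPy and real symmetric H and K; the helper and variable names are illustrative. It uses the fact that e^H e^K has the same eigenvalues, and hence the same trace, as the positive matrix e^{H/2} e^K e^{H/2}, and it also checks the || · ||₂ case of the norm inequality for the exponentials of H + K and of log(e^{H/2} e^K e^{H/2}).

```python
import numpy as np

def sym_fun(A, f):
    """Apply the scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

rng = np.random.default_rng(7)
X, Y = rng.standard_normal((2, 4, 4))
H, K = (X + X.T) / 2, (Y + Y.T) / 2

exp_sum = sym_fun(H + K, np.exp)              # e^{H+K}
half = sym_fun(H / 2, np.exp)                 # e^{H/2}
M = half @ sym_fun(K, np.exp) @ half          # e^{H/2} e^K e^{H/2}
M = (M + M.T) / 2

golden_thompson = np.trace(exp_sum) <= np.trace(M)                    # tr e^{H+K} <= tr e^H e^K
segal = np.linalg.eigvalsh(exp_sum)[-1] <= np.linalg.eigvalsh(M)[-1]  # largest eigenvalues
frobenius = np.linalg.norm(exp_sum) <= np.linalg.norm(M)              # ||.||_2 norm case
print(golden_thompson, segal, frobenius)
```

For commuting H and K all three comparisons become equalities, since then e^{H+K} = e^H e^K.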

More interrelations between various inequalities are given in the next section and in the notes at the end of the chapter.

6.5 SUPPLEMENTARY RESULTS AND EXERCISES

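The crucial inequality (6.6) has a short alternate proof based on the inequality between the geometric and the logarithmic means. This relies on the following interesting formula for the derivative of the exponential map:

\[ De^H(K) = \int_0^1 e^{tH} K e^{(1-t)H} \, dt. \tag{6.42} \]

This formula, attributed variously to Duhamel, Dyson, Feynman, and Schwinger, has an easy proof. Since

\[ \frac{d}{dt} \left( e^{tX} e^{(1-t)Y} \right) = e^{tX} (X - Y)\, e^{(1-t)Y}, \]

we have

\[ e^X - e^Y = \int_0^1 e^{tX} (X - Y)\, e^{(1-t)Y} \, dt. \]

Hence

\[ \lim_{t \to 0} \frac{e^{H + tK} - e^H}{t} = \int_0^1 e^{sH} K e^{(1-s)H} \, ds. \]

This is exactly the statement (6.42).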

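Now let H and K be Hermitian matrices. Using the identity

\[ K = e^{H/2} \left( e^{-H/2} K e^{-H/2} \right) e^{H/2} \]

and the first inequality in (5.34) we obtain

\[ \|K\|_2 \le \left\| \int_0^1 e^{tH} \left( e^{-H/2} K e^{-H/2} \right) e^{(1-t)H} \, dt \right\|_2 = \left\| e^{-H/2} \left( \int_0^1 e^{tH} K e^{(1-t)H} \, dt \right) e^{-H/2} \right\|_2. \]

The last integral is equal to De^H(K). Hence,

\[ \|K\|_2 \le \|e^{-H/2}\, De^H(K)\, e^{-H/2}\|_2. \]

This is the IEMI (6.6).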

The inequality (5.35) generalizes (5.34) to all unitarily invariant norms. So, exactly the same argument as above leads to a proof of (6.32) as well.

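From the expression (6.14) it is clear that

\[ \delta_2(A^{-1}, B^{-1}) = \delta_2(A, B) \tag{6.43} \]

for all A, B ∈ P_n. Similarly, from (6.37) we see that

\[ \delta_{|||\cdot|||}(A^{-1}, B^{-1}) = \delta_{|||\cdot|||}(A, B). \tag{6.44} \]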

An important notion in geometry is that of a Riemannian symmetric space. By definition, this is a connected Riemannian manifold M for each point p of which there is an isometry σ_p of M with two properties:

(i) σ_p(p) = p, and

(ii) the derivative of σ_p at p is multiplication by −1.

The space P_n is a Riemannian symmetric space. We show this using the notation and some basic facts on matrix differential calculus from Section X.4 of MA. For each A ∈ P_n let σ_A be the map defined on P_n by

\[ \sigma_A(X) = A X^{-1} A. \]

Clearly σ_A(A) = A. Let τ(X) = X⁻¹ be the inversion map. Then σ_A is the composite Γ_A ◦ τ. The derivative of τ is given by Dτ(X)(Y ) = −X⁻¹Y X⁻¹, while Γ_A being a linear map is equal to its own derivative. So, by the chain rule

\[ D\sigma_A(A)(Y) = \Gamma_A \left( D\tau(A)(Y) \right) = -A \left( A^{-1} Y A^{-1} \right) A = -Y. \]

Thus Dσ_A(A) is multiplication by −1.

The Riemannian manifold P_n has nonpositive curvature. The EMI captures the essence of this fact. We explain this briefly.

Consider a triangle Δ(O, H, K) with vertices O, H, and K in H_n. The image of this set under the exponential map is a “triangle” Δ(I, e^H, e^K) in P_n. By Proposition 6.1.5 the δ₂-lengths of the sides [I, e^H] and [I, e^K] are equal to the || · ||₂-lengths of the sides [O, H] and [O, K], respectively. By the EMI (6.8) the third side [e^H, e^K] is longer than [H, K]. Keep the vertex O as a fixed pivot and move the sides [O, H] and [O, K] apart to get a triangle Δ(O, H′, K′) in H_n whose three sides now have the same lengths as the δ₂-lengths of the sides of Δ(I, e^H, e^K) in P_n. Such a triangle is called a comparison triangle for Δ(I, e^H, e^K) and it is unique up to an isometry of H_n. The fact that the comparison triangle in the Euclidean space H_n is “fatter” than the triangle Δ(I, e^H, e^K) is a characterization of a space of nonpositive curvature.

It may be instructive here to compare the situation with the space U(n) consisting of unitary matrices. This is a compact manifold of nonnegative curvature. In this case the real vector space iH_n consisting of skew-Hermitian matrices is mapped by the exponential onto U(n). The map is not injective in this case; it is a local diffeomorphism.

6.5.1 Exercise

Let H and K be any two skew-Hermitian matrices. Show that

\[ \|De^H(K)\|_2 \le \|K\|_2. \tag{6.45} \]

[Hint: Follow the steps in the proof of Proposition 6.1.2. Now the λ_i are imaginary. So the hyperbolic function sinh occurring in the proof of Proposition 6.1.2 is replaced by the circular function sin. Alternately prove this using the formula (6.42). Observe that e^tH is unitary.]

As a consequence we have the opposite of the inequality (6.8) in this case: if A and B are sufficiently close in U(n), then

\[ \delta_2(A, B) \le \|\log A - \log B\|_2. \]

Thus the exponential map decreases distance locally. This fact captures the nonnegative curvature of U(n).

Of late there has been interest in general metric spaces of nonpositive curvature (not necessarily Riemannian manifolds). An important consequence of the generalised EMI proved in Section 6.4 is that for every unitarily invariant norm the space (P_n, δ_|||·|||) is a metric space of nonpositive curvature. These are examples of Finsler manifolds, where the metric arises from a non-Euclidean metric on the tangent space.

A metric space (X, d) is said to satisfy the semiparallelogram law if for any two points a, b ∈ X, there exists a point m such that

d²(m, c) ≤ [d²(a, c) + d²(b, c)]/2 − d²(a, b)/4 (6.46)

for all c ∈ X.
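As a sanity check on the definition, here is a minimal sketch for the Euclidean plane, taking the law in its standard form d²(m, c) ≤ [d²(a, c) + d²(b, c)]/2 − d²(a, b)/4. With m the ordinary midpoint the two sides agree exactly (this is the parallelogram law), so the gap computed below should vanish up to rounding. The function names are illustrative only.

```python
import math

def dist(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def semiparallelogram_gap(a, b, c):
    """Right side minus left side of the law, with m the Euclidean midpoint of a and b."""
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    lhs = dist(m, c) ** 2
    rhs = (dist(a, c) ** 2 + dist(b, c) ** 2) / 2 - dist(a, b) ** 2 / 4
    return rhs - lhs

# In a Euclidean space the gap is zero for every choice of c.
gap = semiparallelogram_gap((0.0, 0.0), (4.0, 0.0), (1.0, 3.0))
print(abs(gap) < 1e-12)
```

In a space of nonpositive curvature the gap is nonnegative rather than zero, which is the content of the inequality.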

6.5.2 Exercise

Let (X, d) be a metric space with the semiparallelogram law. Show that the point m arising in the definition is unique and is the metric midpoint of a and b; i.e., m is the point at which d(a, m) = d(b, m) = ½ d(a, b).

A complete metric space satisfying the semiparallelogram law is called a Bruhat-Tits space. We have shown that ℙ_n is such a space. Those of our proofs that involved only completeness and the semiparallelogram law are valid for all Bruhat-Tits spaces. See, for example, Theorems 6.2.6 and 6.2.7.

In the next two exercises we point out more connections between classical matrix inequalities and geometric facts of this chapter. We use the notation of majorization and facts about unitarily invariant norms from MA, Chapters II and IV. The reader unfamiliar with these may skip this part.

6.5.3 Exercise

An inequality due to Gel’fand, Naimark, and Lidskii gives relations between eigenvalues of two positive matrices A and B and their product AB. This says

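log λ^↓(A) + log λ^↑(B) ≺ log λ(AB) ≺ log λ^↓(A) + log λ^↓(B). (6.47)

See MA p. 73. Let A, B, and C be three positive matrices. Then

λ(A^{−1}C) = λ(B^{1/2}A^{−1}CB^{−1/2}) = λ((B^{1/2}A^{−1}B^{1/2})(B^{−1/2}CB^{−1/2})).

So, by the second part of (6.47)

log λ(A^{−1}C) ≺ log λ^↓(B^{1/2}A^{−1}B^{1/2}) + log λ^↓(B^{−1/2}CB^{−1/2}) = log λ^↓(A^{−1}B) + log λ^↓(B^{−1}C).

Use this to show directly that δ_|||·||| defined by (6.36) is a metric on ℙ_n.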

6.5.4 Exercise

Let A and B be positive. Then for 0 ≤ t ≤ 1 and 1 ≤ k ≤ n we have

(6.48)

See MA p. 258. Take logarithms of both sides and use results on majorization to show that

This may be rewritten as

Show that this implies that the metric δ_|||·||| is convex.

In Section 4.5 we outlined a general procedure for constructing matrix means from scalar means. Two such means are germane to our present discussion. The function f in (4.69) corresponding to the logarithmic mean is

f(x) = (x − 1)/log x = ∫₀¹ x^t dt.

So the logarithmic mean of two positive matrices A and B given by the formula (4.71) is

L(A, B) = A^{1/2} [∫₀¹ (A^{−1/2}BA^{−1/2})^t dt] A^{1/2}.

In other words

L(A, B) = ∫₀¹ γ(t) dt, (6.49)

where γ(t) is the geodesic segment joining A and B.
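For scalars (1 × 1 matrices) the geodesic joining a and b is γ(t) = a^{1−t} b^t, and the integral of γ over [0, 1] should reduce to the classical logarithmic mean (a − b)/(log a − log b). The following sketch checks this numerically with a midpoint rule; the function names and the step count are choices made here for illustration.

```python
import math

def log_mean(a, b):
    """Classical logarithmic mean (a - b) / (log a - log b), with L(a, a) = a."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))

def geodesic_integral(a, b, steps=20000):
    """Midpoint-rule approximation of the integral of a^(1-t) * b^t over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += a ** (1 - t) * b ** t
    return h * total

a, b = 2.0, 5.0
print(abs(log_mean(a, b) - geodesic_integral(a, b)) < 1e-8)
```

The matrix case replaces a^{1−t} b^t by the geodesic in the space of positive matrices; the scalar check only illustrates why the integral of the curve deserves the name logarithmic mean.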

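Likewise, for 0 ≤ t ≤ 1 the Heinz mean

H_t(a, b) = (a^t b^{1−t} + a^{1−t} b^t)/2 (6.50)

leads to the function

f_t(x) = (x^t + x^{1−t})/2

and then to the matrix Heinz mean

H_t(A, B) = [γ(t) + γ(1 − t)]/2. (6.51)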

The following theorem shows that the geodesic γ(t) is intimately connected with the order relation on ℙ_n.

6.5.5 Theorem

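For every α in [0, 1/2] we have

A#B ≤ (1/2α) ∫_{1/2−α}^{1/2+α} γ(t) dt ≤ L(A, B) ≤ (1/2α) [∫₀^α γ(t) dt + ∫_{1−α}^1 γ(t) dt] ≤ (A + B)/2.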

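Proof. It is enough to prove the scalar versions of these inequalities as they are preserved in the transition to matrices by our construction. For fixed a and b, H_t(a, b) is a convex function of t on [0, 1]. It is symmetric about the point t = 1/2 at which it attains its minimum. Hence the quantity

(1/2α) ∫_{1/2−α}^{1/2+α} H_t(a, b) dt

is an increasing function of α for 0 ≤ α ≤ 1/2. Similarly,

(1/2α) [∫₀^α H_t(a, b) dt + ∫_{1−α}^1 H_t(a, b) dt]

is a decreasing function of α. These considerations show

H_{1/2}(a, b) ≤ (1/2α) ∫_{1/2−α}^{1/2+α} H_t(a, b) dt ≤ ∫₀¹ H_t(a, b) dt ≤ (1/2α) [∫₀^α H_t(a, b) dt + ∫_{1−α}^1 H_t(a, b) dt] ≤ H_0(a, b).

The theorem follows from this. ■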
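The scalar argument can be checked numerically. The sketch below assumes the scalar Heinz mean H_t(a, b) = (a^t b^{1−t} + a^{1−t} b^t)/2 and compares five quantities: the geometric mean H_{1/2}, the average of H_t over the central interval [1/2 − α, 1/2 + α], the average over all of [0, 1] (the logarithmic mean), the average over the two end intervals [0, α] and [1 − α, 1], and the arithmetic mean H_0. Convexity and symmetry of t ↦ H_t suggest these are in increasing order. The helper names and the midpoint rule are choices made for this illustration.

```python
import math

def heinz(a, b, t):
    """Scalar Heinz mean H_t(a, b) = (a^t b^(1-t) + a^(1-t) b^t) / 2."""
    return (a ** t * b ** (1 - t) + a ** (1 - t) * b ** t) / 2

def average(f, lo, hi, steps=4000):
    """Midpoint-rule average of f over the interval [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * h) for k in range(steps)) / steps

def chain(a, b, alpha):
    """Return the five quantities compared in the scalar inequalities, in order."""
    f = lambda t: heinz(a, b, t)
    geometric = math.sqrt(a * b)
    center = average(f, 0.5 - alpha, 0.5 + alpha)
    logarithmic = average(f, 0.0, 1.0)
    ends = (average(f, 0.0, alpha) + average(f, 1.0 - alpha, 1.0)) / 2
    arithmetic = (a + b) / 2
    return [geometric, center, logarithmic, ends, arithmetic]

values = chain(2.0, 5.0, 0.25)
print(all(x <= y for x, y in zip(values, values[1:])))
```

Varying alpha between 0 and 1/2 moves the second and fourth entries monotonically toward the logarithmic mean, which is the mechanism the proof relies on.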

6.5.6 Exercise

Show that for 0 ≤ t ≤ 1

γ(t) ≤ (1 − t)A + tB. (6.52)

[Hint: Show that for each λ > 0 we have λ^t ≤ (1 − t) + tλ.]
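The scalar inequality in the hint is the weighted arithmetic-geometric mean inequality applied to the pair (1, λ). A quick numerical sketch over a grid of values follows; it is not a proof, and the grid and function name are arbitrary choices made here.

```python
def weighted_am_gm_gap(lam, t):
    """Return (1 - t) + t*lam - lam**t, which the hint claims is nonnegative."""
    return (1 - t) + t * lam - lam ** t

# lam ranges over 0.1, 0.2, ..., 10.0 and t over 0, 0.05, ..., 1.
grid = [(lam / 10, t / 20) for lam in range(1, 101) for t in range(0, 21)]
print(min(weighted_am_gm_gap(lam, t) for lam, t in grid) >= -1e-12)
```

The small negative tolerance only absorbs floating-point rounding at the points where equality holds (t = 0, t = 1, or λ = 1).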

6.5.7 Exercise

Let Φ be any positive linear map on 𝕄_n. Then for all positive matrices A and B

[Hint: Use Theorem 4.1.5 (ii).]

6.5.8 Exercise

The aim of this exercise is to give a simple proof of the convergence argument needed to establish the existence of G^#(A₁, A₂, A₃) defined in Section 6.3.10.

(i) Assume that A₁ ≤ A₂ ≤ A₃. Then the sequences defined in (6.31) satisfy

The sequence is increasing and is decreasing. Hence the limits

exist. Show that L = U. Thus

Call this limit G^#(A₁, A₂, A₃).

(ii) Now let A₁, A₂, A₃ be any three positive matrices. Choose positive numbers λ and µ such that

Let (B₁, B₂, B₃) = (A₁, λA₂, µA₃). Apply the special case (i) to get the limit G^#(B₁, B₂, B₃). The same recursion applied to the triple of numbers (a₁, a₂, a₃) = (1, λ, µ) gives

Since

it follows that the sequences , j = 1, 2, 3, converge to the limit G^#(B₁, B₂, B₃)/(λµ)^1/3.
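The scalar recursion in part (ii) is easy to simulate. The sketch below assumes that the recursion (6.31) replaces a triple by its three pairwise geometric means; the labelling used here, which keeps an ordered triple ordered, is a choice made for this illustration and may differ from the one in (6.31). Each step preserves the product a₁a₂a₃, so the common limit should be the cube root of that product, which is (λµ)^{1/3} for the starting triple (1, λ, µ).

```python
import math

def symmetric_step(a1, a2, a3):
    """Replace the triple by its three pairwise geometric means (labelling assumed)."""
    return math.sqrt(a1 * a2), math.sqrt(a1 * a3), math.sqrt(a2 * a3)

def iterate(a1, a2, a3, steps=60):
    """Apply the step repeatedly and return the final triple."""
    triple = (a1, a2, a3)
    for _ in range(steps):
        triple = symmetric_step(*triple)
    return triple

lam, mu = 3.0, 7.0
limit = (lam * mu) ** (1.0 / 3.0)
print(all(abs(x - limit) < 1e-9 for x in iterate(1.0, lam, mu)))
```

With 1 ≤ λ ≤ µ the first entry increases and the third decreases at every step, mirroring the monotonicity used in part (i).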

6.5.9 Exercise

Show that the center of mass defined by (6.24) has the property

for all positive matrices A₁, A₂, A₃. Show that G^# also satisfies this relation.

6.6 NOTES AND REFERENCES

Much of the material in Sections 6.1 and 6.2 consists of standard topics in Riemannian geometry. The arrangement of topics, the emphasis, and some proofs are perhaps eccentric. Our view is directed toward applications in matrix analysis, and the treatment may provide a quick introduction to some of the concepts. The entire chapter is based on R. Bhatia and J. A. R. Holbrook, Riemannian geometry and matrix geometric means, Linear Algebra Appl., 413 (2006) 594–618.

Two books on Riemannian geometry that we recommend are M. Berger, A Panoramic View of Riemannian Geometry, Springer, 2003, and S. Lang, Fundamentals of Differential Geometry, Springer, 1999. Closely related to our discussion is M. Bridson and A. Haefliger, Metric Spaces of Non-positive Curvature, Springer, 1999. Most of the texts on geometry emphasize group structures and seem to downplay the role of the matrices that constitute these groups. Lang’s text is exceptional in this respect. The book A. Terras, Harmonic Analysis on Symmetric Spaces and Applications II, Springer, 1988, devotes a long chapter to the space ℙ_n.

The proof of Proposition 6.1.2 is close to the treatment in Lang’s book. (Lang says he follows “Mostow’s very elegant exposition of Cartan’s work.”) The linear algebra in our proof looks neater because a part of the work has been done earlier in proving the Daleckii-Krein formula (2.40) for the derivative. The second proof given at the beginning of Section 6.5 is shorter and more elementary. This is taken from R. Bhatia, On the exponential metric increasing property, Linear Algebra Appl., 375 (2003) 211–220.

Explicit formulas like (6.11) describing geodesics are generally not emphasized in geometry texts. This expression has been used often in connection with means. With the notation A#_tB this is called the t-power mean. See the comprehensive survey F. Hiai, Log-majorizations and norm inequalities for exponential operators, Banach Center Publications Vol. 38, pp. 119–181.

The role of the semiparallelogram law is highlighted in Chapter XI of Lang’s book. A historical note on page 313 of this book places it in context. To a reader oriented towards analysis in general, and inequalities in particular, this is especially attractive. The expository article by J. D. Lawson and Y. Lim, The geometric mean, matrices, metrics and more, Am. Math. Monthly, 108 (2001) 797–812, draws special attention to the geometry behind the geometric mean.

Problems related to convexity in differentiable manifolds are generally difficult. According to Note 6.1.3.1 on page 231 of Berger’s book the problem of identifying the convex hull of three points in a Riemannian manifold of dimension 3 or more is still unsolved. It is not even known whether this set is closed. This problem is reflected in some of our difficulties in Section 6.3.

Berger attributes to E. Cartan, Groupes simples clos et ouverts et géométrie riemannienne, J. Math. Pures Appl., 8 (1929) 1–33, the introduction of the idea of center of mass in Riemannian geometry. Cartan showed that in a complete manifold of nonpositive curvature (such as ℙ_n) every compact set has a unique center of mass. He used this to prove his fundamental theorem that any two maximal compact subgroups of a semisimple Lie group are conjugate.

The idea of using the center of mass to define a geometric mean of three positive matrices occurs in the paper of Bhatia and Holbrook cited earlier and in M. Moakher, A differential geometric approach to the geometric mean of symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl., 26 (2005) 735–747. This paper contains many interesting ideas. In particular, Theorem 6.3.4 occurs here. Applications to problems of elasticity are discussed in M. Moakher, On the averaging of symmetric positive-definite tensors, preprint (2005).

The manifold ℙ_n is the most studied example of a manifold of nonpositive curvature. However, one of its basic features, order, seems not to have received any attention. Our discussion of the center of mass and Theorem 6.5.5 show that order properties and geometric properties are strongly interlinked. A study of these properties should lead to a better understanding of this manifold.

The mean G^#(A₁, A₂, A₃) was introduced in T. Ando, C.-K. Li, and R. Mathias, Geometric Means, Linear Algebra Appl., 385 (2004) 305–334. Many of its properties are derived in this paper, which also contains a detailed survey of related matters. The connection with Riemannian geometry was made in the Bhatia-Holbrook paper cited earlier. That G^# and the center of mass may be different is a conclusion made on the basis of computer-assisted numerical calculations reported in Bhatia-Holbrook. A better theoretical understanding is yet to be found.

As explained in Section 6.5 the EMI reflects the fact that ℙ_n has nonpositive curvature. Inequalities of this type are called CAT(0) inequalities; the initials C, A, T are in honour of E. Cartan, A. D. Alexandrov, and A. Toponogov, respectively. These ideas have been given prominence in the work of M. Gromov. See the book W. Ballmann, M. Gromov, and V. Schroeder, Manifolds of Nonpositive Curvature, Birkhäuser, 1985, and the book by Bridson and Haefliger cited earlier. A concept of curvature for metric spaces (not necessarily Riemannian manifolds) is defined and studied in the latter. The generalised EMI proved in Section 6.4 shows that the space ℙ_n with the metric δ_|||·||| is a metric space (a Finsler manifold) of nonpositive curvature.

Segal’s inequality was proved in I. Segal, Notes towards the construction of nonlinear relativistic quantum fields III, Bull. Am. Math. Soc., 75 (1969) 1390–1395. The simple proof given in Section 6.4 is borrowed from B. Simon, Trace Ideals and Their Applications, Second Edition, American Math. Society, 2005. The Golden-Thompson inequality is due to S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127–1128, and C. J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812–1813. Stronger versions and generalizations to other settings (like Lie groups) have been proved. Complementary inequalities have been proved by F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153–185, and by T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, ibid., 197/198 (1994) 113–131. These papers are especially interesting in our context as they involve the means A#_tB in the formulation and the proofs of several results. The connection between means, geodesics, and inequalities has been explored in several interesting papers by G. Corach and coauthors. Illustrative of this work and especially close to our discussion are the two papers by G. Corach, H. Porta and L. Recht, Geodesics and operator means in the space of positive operators, Int. J. Math., 4 (1993) 193–202, and Convexity of the geodesic distance on spaces of positive operators, Illinois J. Math., 38 (1994) 87–94.

The logarithmic mean L(A, B) has not been studied before. The definition (6.49) raises interesting questions both for matrix theory and for geometry. In differential geometry it is common to integrate (real) functions along curves. Here we have the integral of the curve itself. Theorem 6.5.5 relates this object to other means, and includes the operator analogue of the inequality between the geometric, logarithmic, and arithmetic means. The norm version of this inequality appears as Proposition 3.2 in F. Hiai and H. Kosaki, Means for matrices and comparison of their norms, Indiana Univ. Math. J., 48 (1999) 899–936. Exercise 6.5.8 is based on the paper D. Petz and R. Temesi, Means of positive numbers and matrices, SIAM J. Matrix Anal. Appl., 27 (2005) 712–720.
