Chapter Six



Geometry of Positive Matrices

The set of n × n positive matrices is a differentiable manifold with a natural Riemannian structure. The geometry of this manifold is intimately connected with some matrix inequalities. In this chapter we explore this connection. Among other things, this leads to a deeper understanding of the geometric mean of positive matrices.


6.1  THE RIEMANNIAN METRIC

The space $\mathbb{M}_n$ of $n \times n$ complex matrices is a Hilbert space with the inner product $\langle A, B \rangle = \operatorname{tr} A^*B$ and the associated norm $\|A\|_2 = (\operatorname{tr} A^*A)^{1/2}$. The set of Hermitian matrices constitutes a real vector space $\mathbb{H}_n$ in $\mathbb{M}_n$. The subset $\mathbb{P}_n$ consisting of strictly positive matrices is an open subset of $\mathbb{H}_n$. Hence it is a differentiable manifold. The tangent space to $\mathbb{P}_n$ at any of its points $A$ is the space $T_A\mathbb{P}_n = \{A\} \times \mathbb{H}_n$, identified, for simplicity, with $\mathbb{H}_n$. The inner product on $\mathbb{H}_n$ leads to a Riemannian metric on the manifold $\mathbb{P}_n$. At the point $A$ this metric is given by the differential

$$ds = \|A^{-1/2}\, dA\, A^{-1/2}\|_2 = \left[\operatorname{tr}\,(A^{-1}\, dA)^2\right]^{1/2}. \tag{6.1}$$

This is a mnemonic for computing the length of a (piecewise) differentiable path in $\mathbb{P}_n$. If $\gamma : [a, b] \to \mathbb{P}_n$ is such a path, we define its length as

$$L(\gamma) = \int_a^b \left\|\gamma^{-1/2}(t)\, \gamma'(t)\, \gamma^{-1/2}(t)\right\|_2\, dt. \tag{6.2}$$

For each $X \in GL(n)$ the congruence transformation $\Gamma_X(A) = X^*AX$ is a bijection of $\mathbb{P}_n$ onto itself. The composition $\Gamma_X \circ \gamma$ is another differentiable path in $\mathbb{P}_n$.


6.1.1  Lemma

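For each $X \in GL(n)$ and for each differentiable path $\gamma$

$$L(\Gamma_X \circ \gamma) = L(\gamma). \tag{6.3}$$

Proof.   Using the definition of the norm $\|\cdot\|_2$ and the fact that $\operatorname{tr} XY = \operatorname{tr} YX$ for all $X$ and $Y$ we have for each $t$ (writing $\gamma$ for $\gamma(t)$ and $\gamma'$ for $\gamma'(t)$)

$$\begin{aligned}
\left\|(X^*\gamma X)^{-1/2}\, (X^*\gamma X)'\, (X^*\gamma X)^{-1/2}\right\|_2
&= \left[\operatorname{tr}\, (X^*\gamma X)^{-1} (X^*\gamma' X) (X^*\gamma X)^{-1} (X^*\gamma' X)\right]^{1/2} \\
&= \left[\operatorname{tr}\, X^{-1} \gamma^{-1} \gamma' \gamma^{-1} \gamma' X\right]^{1/2} \\
&= \left[\operatorname{tr}\, \gamma^{-1} \gamma' \gamma^{-1} \gamma'\right]^{1/2} \\
&= \left\|\gamma^{-1/2}\, \gamma'\, \gamma^{-1/2}\right\|_2.
\end{aligned}$$

Integrating over $t$ we get (6.3).     ■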
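The invariance in Lemma 6.1.1 can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes real symmetric matrices, takes the straight-line path $\gamma(t) = (1-t)A + tB$ as a test path, and approximates the integral (6.2) with a trapezoidal rule. The helper names `speed` and `length` are illustrative only.

```python
import numpy as np

def speed(G, dG):
    """||G^{-1/2} dG G^{-1/2}||_2, computed as sqrt(tr (G^{-1} dG)^2)."""
    M = np.linalg.solve(G, dG)
    return np.sqrt(np.trace(M @ M).real)

def length(path, dpath, a=0.0, b=1.0, steps=2000):
    """Trapezoidal approximation of the length L(gamma) in (6.2)."""
    ts = np.linspace(a, b, steps + 1)
    v = np.array([speed(path(t), dpath(t)) for t in ts])
    h = (b - a) / steps
    return h * (v.sum() - 0.5 * (v[0] + v[-1]))

rng = np.random.default_rng(0)
n = 3
G1 = rng.standard_normal((n, n))
G2 = rng.standard_normal((n, n))
A = G1 @ G1.T + n * np.eye(n)          # positive definite test matrices
B = G2 @ G2.T + n * np.eye(n)
# Upper triangular with positive diagonal, hence invertible
X = np.triu(rng.standard_normal((n, n)), 1) + np.diag(1.0 + rng.random(n))

gamma = lambda t: (1 - t) * A + t * B   # a path in P_n (not a geodesic)
dgamma = lambda t: B - A
gamma_X = lambda t: X.T @ gamma(t) @ X  # the path Gamma_X composed with gamma
dgamma_X = lambda t: X.T @ dgamma(t) @ X

L_gamma = length(gamma, dgamma)
L_gamma_X = length(gamma_X, dgamma_X)
print(L_gamma, L_gamma_X)               # the two lengths should agree
```

Because the integrands agree pointwise, the two approximations agree up to rounding error whatever the quadrature rule.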

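For any two points $A$ and $B$ in $\mathbb{P}_n$ let

$$\delta_2(A, B) = \inf\, \{L(\gamma) : \gamma \text{ is a path from } A \text{ to } B\}. \tag{6.4}$$

This gives a metric on $\mathbb{P}_n$. The triangle inequality

$$\delta_2(A, B) \le \delta_2(A, C) + \delta_2(C, B)$$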

is a consequence of the fact that a path γ1 from A to C can be adjoined to a path γ2 from C to B to obtain a path from A to B. The length of this latter path is L(γ1) + L(γ2).

According to Lemma 6.1.1 each $\Gamma_X$ is an isometry for the length $L$. Hence it is also an isometry for the metric $\delta_2$; i.e.,

$$\delta_2(\Gamma_X(A), \Gamma_X(B)) = \delta_2(A, B) \tag{6.5}$$

for all $A$, $B$ in $\mathbb{P}_n$ and $X$ in $GL(n)$.

This observation helps us to prove several properties of $\delta_2$. We will see that the infimum in (6.4) is attained at a unique path joining $A$ and $B$. This path is called the geodesic from $A$ to $B$. We will soon obtain an explicit formula for this geodesic and for its length. The following inequality, called the infinitesimal exponential metric increasing property (IEMI), plays an important role. Following the notation introduced in Exercise 2.7.15 we write $De^H$ for the derivative of the exponential map at a point $H$ of $\mathbb{H}_n$. This is a linear map on $\mathbb{H}_n$ whose action is given as

$$De^H(K) = \lim_{t \to 0} \frac{e^{H + tK} - e^H}{t}.$$

6.1.2  Proposition (IEMI)

For all $H$ and $K$ in $\mathbb{H}_n$ we have

$$\left\|e^{-H/2}\, De^H(K)\, e^{-H/2}\right\|_2 \ge \|K\|_2. \tag{6.6}$$

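Proof.   Choose an orthonormal basis in which $H = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. By the formula (2.40)

$$De^H(K) = \left[\frac{e^{\lambda_i} - e^{\lambda_j}}{\lambda_i - \lambda_j}\, k_{ij}\right].$$

Therefore, the $i, j$ entry of the matrix $e^{-H/2}\, De^H(K)\, e^{-H/2}$ is

$$\frac{\sinh\, [(\lambda_i - \lambda_j)/2]}{(\lambda_i - \lambda_j)/2}\; k_{ij}.$$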

Since (sinh x)/x ≥ 1 for all real x, the inequality (6.6) follows.     ■
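The computation in this proof can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the author's code: it takes $H$ diagonal and $K$ real symmetric, builds $De^H(K)$ from first divided differences of the exponential (my reading of the cited formula (2.40), with $e^{\lambda_i}$ on the diagonal), cross-checks it against a central finite difference, and then compares the two sides of (6.6).

```python
import numpy as np

def expm_sym(S):
    """Exponential of a real symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(1)
n = 4
lam = rng.standard_normal(n)                  # H = diag(lam)
K = rng.standard_normal((n, n))
K = (K + K.T) / 2                             # real symmetric K

diff = lam[:, None] - lam[None, :]
off = diff != 0
safe = np.where(off, diff, 1.0)
e = np.exp(lam)

# First divided differences of exp; the diagonal entries are e^{lam_i}
dd = np.where(off, (e[:, None] - e[None, :]) / safe, e[:, None])
DexpK = dd * K                                # De^H(K) in the eigenbasis of H

# Central finite difference as an independent check of the derivative
h = 1e-6
fd = (expm_sym(np.diag(lam) + h * K) - expm_sym(np.diag(lam) - h * K)) / (2 * h)

s = np.exp(-lam / 2)
M = s[:, None] * DexpK * s[None, :]           # e^{-H/2} De^H(K) e^{-H/2}

x = diff / 2
ratio = np.where(off, np.sinh(x) / np.where(off, x, 1.0), 1.0)
M_sinh = ratio * K                            # entries (sinh x / x) k_ij

lhs = np.linalg.norm(M, "fro")
rhs = np.linalg.norm(K, "fro")
print(lhs, rhs)                               # lhs should not be smaller than rhs
```

Since every entry of $K$ is multiplied by a factor $(\sinh x)/x \ge 1$, the Frobenius norm can only grow, which is exactly the content of the IEMI.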


6.1.3  Corollary

Let $H(t)$, $a \le t \le b$, be any path in $\mathbb{H}_n$ and let $\gamma(t) = e^{H(t)}$. Then

$$L(\gamma) \ge \int_a^b \|H'(t)\|_2\, dt. \tag{6.7}$$

Proof.   By the chain rule $\gamma'(t) = De^{H(t)}(H'(t))$. So the inequality (6.7) follows from the definition of $L(\gamma)$ given by (6.2) and the IEMI (6.6).     ■


If $\gamma(t)$ is any path joining $A$ and $B$ in $\mathbb{P}_n$, then $H(t) = \log \gamma(t)$ is a path joining $\log A$ and $\log B$ in $\mathbb{H}_n$. The right-hand side of (6.7) is the length of this path in the Euclidean space $\mathbb{H}_n$. This is bounded below by the length of the straight line segment joining $\log A$ and $\log B$. Thus $L(\gamma) \ge \|\log A - \log B\|_2$, and we have the following important corollary called the exponential metric increasing property (EMI).


6.1.4  Theorem (EMI)

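For each pair of points $A$, $B$ in $\mathbb{P}_n$ we have

$$\delta_2(A, B) \ge \|\log A - \log B\|_2. \tag{6.8}$$

In other words for any two matrices $H$ and $K$ in $\mathbb{H}_n$

$$\delta_2(e^H, e^K) \ge \|H - K\|_2. \tag{6.9}$$

So the map

$$\exp : (\mathbb{H}_n, \|\cdot\|_2) \longrightarrow (\mathbb{P}_n, \delta_2) \tag{6.10}$$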

increases distances, or is metric increasing.

Our next proposition says that when $A$ and $B$ commute there is equality in (6.8). Further the exponential map carries the line segment joining $\log A$ and $\log B$ in $\mathbb{H}_n$ to the geodesic joining $A$ and $B$ in $\mathbb{P}_n$. A bit of notation will be helpful here. We write $[H, K]$ for the line segment

$$H(t) = (1 - t)H + tK, \quad 0 \le t \le 1,$$

joining two points $H$ and $K$ in $\mathbb{H}_n$. If $A$ and $B$ are two points in $\mathbb{P}_n$ we write $[A, B]$ for the geodesic from $A$ to $B$. The existence of such a path is yet to be established. This is done first in the special case of commuting matrices.


6.1.5  Proposition

Let $A$ and $B$ be commuting matrices in $\mathbb{P}_n$. Then the exponential function maps the line segment $[\log A, \log B]$ in $\mathbb{H}_n$ to the geodesic $[A, B]$ in $\mathbb{P}_n$. In this case

$$\delta_2(A, B) = \|\log A - \log B\|_2.$$

Proof.   We have to verify that the path

$$\gamma(t) = \exp\left((1 - t) \log A + t \log B\right), \quad 0 \le t \le 1,$$

is the unique path of shortest length joining $A$ and $B$ in the space $(\mathbb{P}_n, \delta_2)$. Since $A$ and $B$ commute, $\gamma(t) = A^{1-t}B^t$ and $\gamma'(t) = (\log B - \log A)\, \gamma(t)$. The formula (6.2) gives in this case

$$L(\gamma) = \int_0^1 \|\log A - \log B\|_2\, dt = \|\log A - \log B\|_2.$$

The EMI (6.7) says that no path can be shorter than this. So the path γ under consideration is one of shortest possible length.

Suppose $\tilde{\gamma}$ is another path that joins $A$ and $B$ and has the same length as that of $\gamma$. Then $\tilde{H}(t) = \log \tilde{\gamma}(t)$ is a path that joins $\log A$ and $\log B$ in $\mathbb{H}_n$, and by Corollary 6.1.3 this path has length $\|\log A - \log B\|_2$. But in a Euclidean space the straight line segment is the unique shortest path between two points. So $\tilde{H}(t)$ is a reparametrization of the line segment $[\log A, \log B]$.     ■
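The EMI (6.8) and the equality case of Proposition 6.1.5 are easy to observe numerically. The sketch below is only an illustration: it assumes real symmetric positive definite matrices and evaluates $\delta_2$ through the closed formula $\|\log A^{-1/2}BA^{-1/2}\|_2$ that the text establishes as (6.13). The helper names are hypothetical.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix via eigh."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def delta2(A, B):
    """delta_2(A, B) = ||log(A^{-1/2} B A^{-1/2})||_2."""
    s = herm_fun(A, lambda w: w ** -0.5)
    M = s @ B @ s
    w = np.linalg.eigvalsh((M + M.conj().T) / 2)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(2)
n = 4
A, B = random_pd(n, rng), random_pd(n, rng)

# Generic (non-commuting) pair: delta_2 dominates the Euclidean distance of the logs
euclid = np.linalg.norm(herm_fun(A, np.log) - herm_fun(B, np.log))
gap = delta2(A, B) - euclid

# Commuting pair (both diagonal): the two distances coincide
a, b = rng.random(n) + 0.5, rng.random(n) + 0.5
gap_commuting = delta2(np.diag(a), np.diag(b)) - np.linalg.norm(np.log(a) - np.log(b))

print(gap, gap_commuting)
```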

Applying the reasoning of this proof to any subinterval $[0, a]$ of $[0, 1]$ we see that the parametrization

$$H(t) = (1 - t) \log A + t \log B$$

of the line segment $[\log A, \log B]$ is the one that is mapped isometrically onto $[A, B]$ along the whole interval. In other words the natural parametrisation of the geodesic $[A, B]$ when $A$ and $B$ commute is given by

$$\gamma(t) = A^{1-t}B^t, \quad 0 \le t \le 1,$$

in the sense that $\delta_2(A, \gamma(t)) = t\, \delta_2(A, B)$ for each $t$. The general case is obtained from this with the help of the isometries $\Gamma_X$.


6.1.6  Theorem

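Let $A$ and $B$ be any two elements of $\mathbb{P}_n$. Then there exists a unique geodesic $[A, B]$ joining $A$ and $B$. This geodesic has a parametrization

$$\gamma(t) = A^{1/2} \left(A^{-1/2} B A^{-1/2}\right)^t A^{1/2}, \quad 0 \le t \le 1, \tag{6.11}$$

which is natural in the sense that

$$\delta_2(A, \gamma(t)) = t\, \delta_2(A, B) \tag{6.12}$$

for each $t$. Further, we have

$$\delta_2(A, B) = \left\|\log A^{-1/2} B A^{-1/2}\right\|_2. \tag{6.13}$$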

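Proof.   The matrices $I$ and $A^{-1/2}BA^{-1/2}$ commute. So the geodesic $[I, A^{-1/2}BA^{-1/2}]$ is naturally parametrized as

$$\gamma_0(t) = \left(A^{-1/2} B A^{-1/2}\right)^t.$$

Applying the isometry $\Gamma_{A^{1/2}}$ we obtain the path

$$\gamma(t) = \Gamma_{A^{1/2}}(\gamma_0(t)) = A^{1/2} \left(A^{-1/2} B A^{-1/2}\right)^t A^{1/2}$$

joining the points $\Gamma_{A^{1/2}}(I) = A$ and $\Gamma_{A^{1/2}}(A^{-1/2}BA^{-1/2}) = B$. Since $\Gamma_{A^{1/2}}$ is an isometry this path is the geodesic $[A, B]$. The equality (6.12) follows from the similar property for $\gamma_0(t)$ noted earlier. Using Proposition 6.1.5 again we see that

$$\delta_2(A, B) = \delta_2(I, A^{-1/2}BA^{-1/2}) = \left\|\log I - \log A^{-1/2}BA^{-1/2}\right\|_2 = \left\|\log A^{-1/2}BA^{-1/2}\right\|_2.$$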
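The parametrization of the geodesic in Theorem 6.1.6 translates directly into a few lines of code. This is a minimal sketch assuming real symmetric positive definite inputs; `geodesic` and `delta2` are illustrative names, and matrix powers and logarithms are computed through an eigendecomposition rather than any particular library routine.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix via eigh."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def sym(A):
    """Symmetrize to remove rounding asymmetry."""
    return (A + A.conj().T) / 2

def delta2(A, B):
    """delta_2(A, B) = ||log(A^{-1/2} B A^{-1/2})||_2."""
    s = herm_fun(A, lambda w: w ** -0.5)
    w = np.linalg.eigvalsh(sym(s @ B @ s))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def geodesic(A, B, t):
    """gamma(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    r = herm_fun(A, np.sqrt)
    s = herm_fun(A, lambda w: w ** -0.5)
    return sym(r @ herm_fun(sym(s @ B @ s), lambda w: w ** t) @ r)

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(3)
A, B = random_pd(4, rng), random_pd(4, rng)
d = delta2(A, B)

ts = np.linspace(0.0, 1.0, 6)
# delta_2(A, gamma(t)) - t * delta_2(A, B) should vanish for every t
errors = [abs(delta2(A, geodesic(A, B, t)) - t * d) for t in ts]
print(max(errors))
```

The check at $t = 0$ and $t = 1$ confirms that the path starts at $A$ and ends at $B$; the intermediate values confirm that the parameter $t$ measures the fraction of the distance travelled.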

Formula (6.13) gives an explicit representation for the metric $\delta_2$ that we defined via (6.4). This is the Riemannian metric on the manifold $\mathbb{P}_n$. From the definition of the norm $\|\cdot\|_2$ we see that

$$\delta_2(A, B) = \left(\sum_{i=1}^n \log^2 \lambda_i(A^{-1}B)\right)^{1/2}, \tag{6.14}$$

where $\lambda_i(A^{-1}B)$ are the eigenvalues of the matrix $A^{-1}B$.


6.1.7  The geometric mean again

The expression (4.10) defining the geometric mean $A \# B$ now appears in a new light. It is the midpoint of the geodesic $\gamma$ joining $A$ and $B$ in the space $\mathbb{P}_n$. This is evident from (6.11) and (6.12). The symmetry of $A \# B$ in the two arguments $A$ and $B$ that we deduced by indirect arguments in Section 4.1 is now revealed clearly: the midpoint of the geodesic $[A, B]$ is the same as the midpoint of $[B, A]$.
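The midpoint description of $A \# B$ can be verified on random examples. The sketch below assumes real symmetric positive definite matrices and uses the formula $A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}$ for the mean; the function names are illustrative and not taken from the text.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix via eigh."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def sym(A):
    return (A + A.conj().T) / 2

def delta2(A, B):
    s = herm_fun(A, lambda w: w ** -0.5)
    w = np.linalg.eigvalsh(sym(s @ B @ s))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    r = herm_fun(A, np.sqrt)
    s = herm_fun(A, lambda w: w ** -0.5)
    return sym(r @ herm_fun(sym(s @ B @ s), np.sqrt) @ r)

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(4)
A, B = random_pd(4, rng), random_pd(4, rng)

M_ab = geometric_mean(A, B)
M_ba = geometric_mean(B, A)
d = delta2(A, B)

print(np.linalg.norm(M_ab - M_ba))          # symmetry: A # B = B # A
print(delta2(A, M_ab), delta2(M_ab, B), d / 2)  # the mean is equidistant from A and B
```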

The next proposition supplements the information given by the EMI.


6.1.8  Proposition

If for some $A, B \in \mathbb{P}_n$, the identity matrix $I$ lies on the geodesic $[A, B]$, then $A$ and $B$ commute, $[A, B]$ is the isometric image under the exponential map of a line segment through $O$ in $\mathbb{H}_n$, and

$$\log B = -\frac{1 - \xi}{\xi}\, \log A, \tag{6.15}$$

where $\xi = \delta_2(A, I)/\delta_2(A, B)$.

Proof.   From Theorem 6.1.6 we know that

$$I = A^{1/2} \left(A^{-1/2} B A^{-1/2}\right)^{\xi} A^{1/2},$$

where $\xi = \delta_2(A, I)/\delta_2(A, B)$. Thus

$$B = A^{1/2} A^{-1/\xi} A^{1/2} = A^{-(1 - \xi)/\xi}.$$

So $A$ and $B$ commute and (6.15) holds. Now Proposition 6.1.5 tells us that the exponential map sends the line segment $[\log A, \log B]$ isometrically onto the geodesic $[A, B]$. The line segment contains the point $O = \log I$.     ■


While the EMI says that the exponential map (6.10) is metric nondecreasing in general, Proposition 6.1.8 says that this map is isometric on line segments through $O$. This essentially captures the fact that $\mathbb{P}_n$ is a Riemannian manifold of nonpositive curvature. See the discussion in Section 6.5.

Another essential feature of this geometry is the semiparallelogram law for the metric $\delta_2$. To understand this recall the parallelogram law in a Hilbert space $\mathcal{H}$. Let $a$ and $b$ be any two points in $\mathcal{H}$ and let $m = (a + b)/2$ be their midpoint. Given any other point $c$ consider the parallelogram one of whose diagonals is $[a, b]$ and the other $[c, d]$. The two diagonals intersect at $m$

images/img-217-1.png

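and the parallelogram law is the equality

$$\|a - b\|^2 + \|c - d\|^2 = 2\left(\|a - c\|^2 + \|b - c\|^2\right).$$

Upon rearrangement this can be written as

$$\|c - m\|^2 = \frac{\|a - c\|^2 + \|b - c\|^2}{2} - \frac{\|a - b\|^2}{4}.$$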

In the semiparallelogram law this last equality is replaced by an inequality.


6.1.9  Theorem (The Semiparallelogram Law)

Let $A$ and $B$ be any two points of $\mathbb{P}_n$ and let $M = A \# B$ be the midpoint of the geodesic $[A, B]$. Then for any $C$ in $\mathbb{P}_n$ we have

$$\delta_2^2(M, C) \le \frac{\delta_2^2(A, C) + \delta_2^2(B, C)}{2} - \frac{\delta_2^2(A, B)}{4}. \tag{6.16}$$

Proof.   Applying the isometry $\Gamma_{M^{-1/2}}$ to all matrices involved, we may assume that $M = I$. Now $I$ is the midpoint of $[A, B]$ and so by Proposition 6.1.8 we have $\log B = -\log A$ and

$$\delta_2(A, B) = \|\log A - \log B\|_2.$$

The same proposition applied to $[M, C] = [I, C]$ shows that

$$\delta_2(M, C) = \|\log M - \log C\|_2.$$

The parallelogram law in the Hilbert space $(\mathbb{H}_n, \|\cdot\|_2)$ tells us

$$\|\log M - \log C\|_2^2 = \frac{\|\log A - \log C\|_2^2 + \|\log B - \log C\|_2^2}{2} - \frac{\|\log A - \log B\|_2^2}{4}.$$

The left-hand side of this equation is equal to $\delta_2^2(M, C)$ and the subtracted term on the right-hand side is equal to $\delta_2^2(A, B)/4$. So the EMI (6.8) leads to the inequality (6.16).     ■
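A quick numerical experiment makes the semiparallelogram law concrete. The sketch below assumes real symmetric positive definite matrices and hypothetical helper names; it measures the slack in (6.16) for random triples and for a commuting (diagonal) triple, where the geometry is flat and the law holds with equality.

```python
import numpy as np

def herm_fun(A, f):
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def sym(A):
    return (A + A.conj().T) / 2

def delta2(A, B):
    s = herm_fun(A, lambda w: w ** -0.5)
    w = np.linalg.eigvalsh(sym(s @ B @ s))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def geometric_mean(A, B):
    r = herm_fun(A, np.sqrt)
    s = herm_fun(A, lambda w: w ** -0.5)
    return sym(r @ herm_fun(sym(s @ B @ s), np.sqrt) @ r)

def slack(A, B, C):
    """Right-hand side minus left-hand side of (6.16); should be >= 0."""
    M = geometric_mean(A, B)
    rhs = (delta2(A, C) ** 2 + delta2(B, C) ** 2) / 2 - delta2(A, B) ** 2 / 4
    return rhs - delta2(M, C) ** 2

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(5)
slacks = [slack(random_pd(3, rng), random_pd(3, rng), random_pd(3, rng))
          for _ in range(20)]

# Commuting triple: all three matrices are diagonal
D = [np.diag(rng.random(3) + 0.5) for _ in range(3)]
flat_slack = slack(*D)

print(min(slacks), flat_slack)
```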


In a Euclidean space the distance between the midpoints of two sides of a triangle is equal to half the length of the third side. In a space whose metric satisfies the semiparallelogram law this is replaced by an inequality.


6.1.10  Proposition

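Let $A$, $B$, and $C$ be any three points in $\mathbb{P}_n$. Then

$$\delta_2(A \# B, A \# C) \le \frac{\delta_2(B, C)}{2}. \tag{6.17}$$

Proof.   Consider the triangle with vertices $A$, $B$ and $C$ (and sides the geodesic segments joining the vertices). Let $M_1 = A \# B$. This is the midpoint of the side $[A, B]$ opposite the vertex $C$ of the triangle $\{A, B, C\}$. Hence, by (6.16)

$$\delta_2^2(M_1, C) \le \frac{\delta_2^2(A, C) + \delta_2^2(B, C)}{2} - \frac{\delta_2^2(A, B)}{4}.$$

Let $M_2 = A \# C$. In the triangle $\{A, M_1, C\}$ the point $M_2$ is the midpoint of the side $[A, C]$ opposite the vertex $M_1$. Again (6.16) tells us

$$\delta_2^2(M_1, M_2) \le \frac{\delta_2^2(M_1, A) + \delta_2^2(M_1, C)}{2} - \frac{\delta_2^2(A, C)}{4}.$$

Substituting the first inequality into the second we obtain

$$\delta_2^2(M_1, M_2) \le \frac{1}{2}\, \delta_2^2(M_1, A) - \frac{1}{8}\, \delta_2^2(A, B) + \frac{1}{4}\, \delta_2^2(B, C).$$

Since $\delta_2(M_1, A) = \delta_2(A, B)/2$, the right-hand side of this inequality reduces to $\delta_2^2(B, C)/4$. This proves (6.17).     ■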

The inequality (6.17) can be used to prove a more general version of itself. For $0 \le t \le 1$ let

$$A \#_t B = A^{1/2} \left(A^{-1/2} B A^{-1/2}\right)^t A^{1/2}. \tag{6.18}$$

This is another notation for the geodesic curve $\gamma(t)$ in (6.11). When $t = 1/2$ this is the geometric mean $A \# B$. The more general version is given in the following corollary.


6.1.11  Corollary

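Given four points $B$, $C$, $B'$, and $C'$ in $\mathbb{P}_n$ let

$$f(t) = \delta_2(B' \#_t B,\; C' \#_t C).$$

Then $f$ is convex on $[0, 1]$; i.e.,

$$\delta_2(B' \#_t B,\; C' \#_t C) \le (1 - t)\, \delta_2(B', C') + t\, \delta_2(B, C). \tag{6.19}$$

Proof.   Since $f$ is continuous it is sufficient to prove that it is midpoint-convex. Let $M_1 = B' \# B$, $M_2 = C' \# C$, and $M = B' \# C$. By Proposition 6.1.10 we have $\delta_2(M_1, M) \le \delta_2(B, C)/2$ and $\delta_2(M, M_2) \le \delta_2(B', C')/2$. Hence

$$\delta_2(M_1, M_2) \le \delta_2(M_1, M) + \delta_2(M, M_2) \le \frac{1}{2}\left[\delta_2(B', C') + \delta_2(B, C)\right].$$

This shows that $f$ is midpoint-convex.     ■

Choosing $B' = C' = A$ in (6.19) gives the following theorem called the convexity of the metric $\delta_2$.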


6.1.12  Theorem

Let $A$, $B$ and $C$ be any three points in $\mathbb{P}_n$. Then for all $t$ in $[0, 1]$ we have

$$\delta_2(A \#_t B,\; A \#_t C) \le t\, \delta_2(B, C). \tag{6.20}$$
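The convexity of the metric can be sampled numerically along two geodesics leaving the same point. The following sketch assumes real symmetric positive definite matrices, uses illustrative helper names, and evaluates the difference between the two sides of the inequality $\delta_2(A \#_t B, A \#_t C) \le t\, \delta_2(B, C)$ on a grid of values of $t$; the case $t = 1/2$ is the midpoint inequality of Proposition 6.1.10.

```python
import numpy as np

def herm_fun(A, f):
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def sym(A):
    return (A + A.conj().T) / 2

def delta2(A, B):
    s = herm_fun(A, lambda w: w ** -0.5)
    w = np.linalg.eigvalsh(sym(s @ B @ s))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def geodesic(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    r = herm_fun(A, np.sqrt)
    s = herm_fun(A, lambda w: w ** -0.5)
    return sym(r @ herm_fun(sym(s @ B @ s), lambda w: w ** t) @ r)

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(6)
A, B, C = (random_pd(3, rng) for _ in range(3))
d_bc = delta2(B, C)

ts = np.linspace(0.0, 1.0, 11)
# t * delta_2(B, C) - delta_2(A #_t B, A #_t C), expected to be nonnegative
slacks = [t * d_bc - delta2(geodesic(A, B, t), geodesic(A, C, t)) for t in ts]
print(min(slacks))
```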

6.1.13  Exercise

For a fixed $A$ in $\mathbb{P}_n$ let $f$ be the function $f(X) = \delta_2^2(A, X)$. Show that if $X_1 \ne X_2$, then for $0 < t < 1$

$$f(X_1 \#_t X_2) < (1 - t)\, f(X_1) + t\, f(X_2). \tag{6.21}$$

This is expressed by saying that the function $f$ is strictly convex on $\mathbb{P}_n$. [Hint: Show this for $t = 1/2$ first.]


6.2   THE METRIC SPACE $\mathbb{P}_n$

In this section we briefly study some properties of the metric space $(\mathbb{P}_n, \delta_2)$ with special emphasis on convex sets.


6.2.1  Lemma

The exponential is a continuous map from the space $(\mathbb{H}_n, \|\cdot\|_2)$ onto the space $(\mathbb{P}_n, \delta_2)$.

Proof.   Let $H_m$ be a sequence in $\mathbb{H}_n$ converging to $H$. Then $e^{-H_m}e^{H}$ converges to $I$ in the metric induced by $\|\cdot\|_2$. So all the eigenvalues $\lambda_i(e^{-H_m}e^{H})$, $1 \le i \le n$, converge to 1. The relation (6.14) then shows that $\delta_2(e^{H_m}, e^{H})$ goes to zero as $m$ goes to $\infty$.     ■


6.2.2  Proposition

The metric space $(\mathbb{P}_n, \delta_2)$ is complete.

Proof.   Let $\{A_m\}$ be a Cauchy sequence in $(\mathbb{P}_n, \delta_2)$ and let $H_m = \log A_m$. By the EMI (6.8) $\{H_m\}$ is a Cauchy sequence in $(\mathbb{H}_n, \|\cdot\|_2)$, and hence it converges to some $H$ in $\mathbb{H}_n$. By Lemma 6.2.1 the sequence $\{A_m\}$ converges to $A = e^H$ in the space $(\mathbb{P}_n, \delta_2)$.     ■

Note that $\mathbb{P}_n$ is not a complete subspace of $(\mathbb{H}_n, \|\cdot\|_2)$. There it has a boundary consisting of singular positive matrices. In terms of the metric $\delta_2$ these are “points at infinity.” The next proposition shows that we may approach these points along geodesics. We use $A \#_t B$ for the matrix defined by (6.18) for every real $t$. When $A$ and $B$ commute, this reduces to $A^{1-t}B^t$.


6.2.3  Proposition

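Let $S$ be a singular positive matrix. Then there exist commuting elements $A$ and $B$ in $\mathbb{P}_n$ such that

$$\|A \#_t B - S\|_2 \to 0 \quad \text{and} \quad \delta_2(A \#_t B, A) \to \infty$$

as $t \to \infty$.

Proof.   Apply a unitary conjugation and assume $S = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ where $\lambda_k$ are nonnegative for $1 \le k \le n$, and $\lambda_k = 0$ for some $k$. If $\lambda_k > 0$, then put $\alpha_k = \beta_k = \lambda_k$, and if $\lambda_k = 0$, then put $\alpha_k = 1$ and $\beta_k = 1/2$. Let $A = \operatorname{diag}(\alpha_1, \ldots, \alpha_n)$ and $B = \operatorname{diag}(\beta_1, \ldots, \beta_n)$. Then

$$\lim_{t \to \infty} \|A \#_t B - S\|_2 = \lim_{t \to \infty} \|A^{1-t}B^t - S\|_2 = 0.$$

For the metric $\delta_2$ we have

$$\delta_2(A, A \#_t B) = \|\log A - \log (A^{1-t}B^t)\|_2 = t\, \|\log A - \log B\|_2 \ge t \log 2,$$

and this goes to $\infty$ as $t \to \infty$.     ■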
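The construction in this proof is explicit enough to tabulate. The sketch below works with the concrete, hypothetical choice $S = \operatorname{diag}(2, 0)$, so that $A = \operatorname{diag}(2, 1)$ and $B = \operatorname{diag}(2, 1/2)$; since everything is diagonal, only the diagonal entries are stored.

```python
import numpy as np

lam = np.array([2.0, 0.0])                 # S = diag(2, 0) is singular
alpha = np.where(lam > 0, lam, 1.0)        # diagonal of A
beta = np.where(lam > 0, lam, 0.5)         # diagonal of B

def curve(t):
    """Diagonal of A #_t B = A^{1-t} B^t (A and B commute)."""
    return alpha ** (1 - t) * beta ** t

def frob_to_S(t):
    """Euclidean (Frobenius) distance from A #_t B to S."""
    return np.linalg.norm(curve(t) - lam)

def delta2_from_A(t):
    """delta_2(A, A #_t B) for commuting matrices: distance between the logs."""
    return np.sqrt(np.sum((np.log(curve(t)) - np.log(alpha)) ** 2))

for t in [0, 1, 10, 40]:
    print(t, frob_to_S(t), delta2_from_A(t))
```

In this example the Euclidean distance to $S$ is $2^{-t}$ while the $\delta_2$ distance from $A$ is $t \log 2$, so the curve converges to $S$ in one space and escapes to infinity in the other.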


The point of the proposition is that the curve $A \#_t B$ starts at $A$ when $t = 0$, and “goes away to infinity” in the metric space $(\mathbb{P}_n, \delta_2)$ while converging to $S$ in the space $(\mathbb{H}_n, \|\cdot\|_2)$.

It is conventional to extend some matrix operations from strictly positive matrices to singular positive matrices by taking limits. For example, the geometric mean $A \# B$ is defined by (4.10) for strictly positive matrices $A$ and $B$, and then defined for singular positive matrices $A$ and $B$ as

$$A \# B = \lim_{\varepsilon \downarrow 0}\, (A + \varepsilon I) \# (B + \varepsilon I).$$

The next exercise points to the need for some caution when using this idea.


6.2.4  Exercise

The geometric mean A#B is continuous on pairs of strictly positive matrices, but is not so when extended to positive semidefinite matrices. (See Exercise 4.1.6.)


We have seen that any two points $A$ and $B$ in $\mathbb{P}_n$ can be joined by a geodesic segment $[A, B]$ lying in $\mathbb{P}_n$. We say a subset $\mathcal{K}$ of $\mathbb{P}_n$ is convex if for each pair of points $A$ and $B$ in $\mathcal{K}$ the segment $[A, B]$ lies entirely in $\mathcal{K}$. If $\mathcal{S}$ is any subset of $\mathbb{P}_n$, then the convex hull of $\mathcal{S}$ is the smallest convex set containing $\mathcal{S}$. This set, denoted as conv$(\mathcal{S})$, is the intersection of all convex sets that contain $\mathcal{S}$. Clearly, the convex hull of any two-point set $\{A, B\}$ is $[A, B]$.


6.2.5  Exercise

Let $\mathcal{S}$ be any set in $\mathbb{P}_n$. Define inductively the sets $\mathcal{S}_m$ as $\mathcal{S}_0 = \mathcal{S}$ and

$$\mathcal{S}_{m+1} = \bigcup\, \{[A, B] : A, B \in \mathcal{S}_m\}.$$

Show that

$$\operatorname{conv}(\mathcal{S}) = \bigcup_{m=0}^{\infty} \mathcal{S}_m.$$

The next theorem says that if $\mathcal{K}$ is a closed convex set in $(\mathbb{P}_n, \delta_2)$, then a metric projection onto $\mathcal{K}$ exists just as it does in a Hilbert space.


6.2.6  Theorem

Let $\mathcal{K}$ be a closed convex set in $(\mathbb{P}_n, \delta_2)$. Then for each $A \in \mathbb{P}_n$ there exists a point $C \in \mathcal{K}$ such that $\delta_2(A, C) < \delta_2(A, K)$ for every $K$ in $\mathcal{K}$, $K \ne C$. (In other words $C$ is the unique best approximant to $A$ from the set $\mathcal{K}$.)

Proof.   Let $\mu = \inf\, \{\delta_2(A, K) : K \in \mathcal{K}\}$. Then there exists a sequence $\{C_n\}$ in $\mathcal{K}$ such that $\delta_2(A, C_n) \to \mu$. Given $n$ and $m$, let $M$ be the midpoint of the geodesic segment $[C_n, C_m]$; i.e., $M = C_n \# C_m$. By the convexity of $\mathcal{K}$ the point $M$ is in $\mathcal{K}$. Using the semiparallelogram law (6.16) we get

$$\delta_2^2(M, A) \le \frac{\delta_2^2(C_n, A) + \delta_2^2(C_m, A)}{2} - \frac{\delta_2^2(C_n, C_m)}{4},$$

and hence, since $\delta_2(M, A) \ge \mu$,

$$\delta_2^2(C_n, C_m) \le 2\left[\delta_2^2(C_n, A) - \mu^2\right] + 2\left[\delta_2^2(C_m, A) - \mu^2\right]. \tag{6.22}$$

As $n$ and $m$ go to $\infty$, the right-hand side of (6.22) goes to zero. Hence $\{C_n\}$ is a Cauchy sequence, and by Proposition 6.2.2 it converges to a limit $C$ in $\mathbb{P}_n$. Since $\mathcal{K}$ is closed, $C$ is in $\mathcal{K}$. Further $\delta_2(A, C) = \lim \delta_2(A, C_n) = \mu$. If $K$ is any other element of $\mathcal{K}$ such that $\delta_2(A, K) = \mu$, then putting $C_n = C$ and $C_m = K$ in (6.22) we see that $\delta_2(C, K) = 0$; i.e., $C = K$.     ■

The map $\pi(A) = C$ given by Theorem 6.2.6 may be called the metric projection onto $\mathcal{K}$.


6.2.7 Theorem

Let $\pi$ be the metric projection onto a closed convex set $\mathcal{K}$ of $\mathbb{P}_n$. If $A$ is any point of $\mathbb{P}_n$ and $\pi(A) = C$, then for any $D$ in $\mathcal{K}$

$$\delta_2^2(A, D) \ge \delta_2^2(A, C) + \delta_2^2(D, C). \tag{6.23}$$

Proof.   Let $\{M_n\}$ be the sequence defined inductively as $M_0 = D$, and $M_{n+1} = M_n \# C$. Then $\delta_2(C, M_n) = 2^{-n}\, \delta_2(C, D)$, and $M_n$ converges to $C = M_\infty$. By the semiparallelogram law (6.16)

images/eq-223-3.png

Hence,

images/eq-223-4.png

Summing these inequalities we have

images/eq-223-5.png

It is easy to see that the two series are absolutely convergent.

Let images/nec-223-9.png. Then the last inequality can be written as

images/eq-223-6.png

The same argument applied to Mn in place of D shows

images/eq-224-1.png

Thus

images/eq-224-2.png

Since $\mathcal{K}$ is convex, each $M_n \in \mathcal{K}$, and hence $d_n \ge 0$. Thus we have

images/eq-224-3.png

This proves the inequality (6.23).     ■


6.2.8  The geometric mean once again

If $E$ is a Euclidean space with metric $d$, and $a$, $b$ are any two points of $E$, then the function

$$f(x) = d^2(a, x) + d^2(b, x)$$

attains its minimum on $E$ at the unique point $x_0 = \frac{1}{2}(a + b)$. In the metric space $(\mathbb{P}_n, \delta_2)$ this role is played by the geometric mean.

Proposition. Let $A$ and $B$ be any two points of $\mathbb{P}_n$, and let

$$f(X) = \delta_2^2(A, X) + \delta_2^2(B, X).$$

Then the function $f$ is strictly convex on $\mathbb{P}_n$, and has a unique minimum at the point $X_0 = A \# B$.

Proof.   The strict convexity is a consequence of Exercise 6.1.13. The semiparallelogram law implies that for every $X$ we have

$$\delta_2^2(X_0, X) \le \frac{1}{2}\, f(X) - \frac{1}{4}\, \delta_2^2(A, B) = \frac{1}{2}\left(f(X) - f(X_0)\right).$$

Hence

$$f(X_0) \le f(X) - 2\, \delta_2^2(X_0, X).$$

This shows that $f$ has a unique minimum at the point $X_0 = A \# B$.     ■
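The variational characterization of $A \# B$ can be probed with random trial points. The sketch below assumes real symmetric positive definite matrices and illustrative helper names; it evaluates the quantity $f(X) - 2\delta_2^2(X_0, X) - f(X_0)$, which the semiparallelogram law forces to be nonnegative for every $X$.

```python
import numpy as np

def herm_fun(A, f):
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def sym(A):
    return (A + A.conj().T) / 2

def delta2(A, B):
    s = herm_fun(A, lambda w: w ** -0.5)
    w = np.linalg.eigvalsh(sym(s @ B @ s))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def geometric_mean(A, B):
    r = herm_fun(A, np.sqrt)
    s = herm_fun(A, lambda w: w ** -0.5)
    return sym(r @ herm_fun(sym(s @ B @ s), np.sqrt) @ r)

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(7)
A, B = random_pd(3, rng), random_pd(3, rng)
X0 = geometric_mean(A, B)

def f(X):
    """f(X) = delta_2^2(A, X) + delta_2^2(B, X)."""
    return delta2(A, X) ** 2 + delta2(B, X) ** 2

# f(X) - 2 delta_2^2(X0, X) - f(X0) for random trial points X
margins = [f(X) - 2 * delta2(X0, X) ** 2 - f(X0)
           for X in (random_pd(3, rng) for _ in range(20))]
print(f(X0), min(margins))
```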


6.3  CENTER OF MASS AND GEOMETRIC MEAN

In Chapter 4 we discussed, and resolved, the problems associated with defining a good geometric mean of two positive matrices. In this section we consider the question of a suitable definition of a geometric mean of more than two matrices. Our discussion will show that while the case of two matrices is very special, ideas that work for three matrices do work for more than three as well.

Given three positive matrices $A_1$, $A_2$, and $A_3$, their geometric mean $G(A_1, A_2, A_3)$ should be a positive matrix with the following properties. If $A_1$, $A_2$, and $A_3$ commute with each other, then $G(A_1, A_2, A_3) = (A_1A_2A_3)^{1/3}$. As a function of its three variables, $G$ should satisfy the conditions:

(i)   $G(A_1, A_2, A_3) = G(A_{\pi(1)}, A_{\pi(2)}, A_{\pi(3)})$ for every permutation $\pi$ of $\{1, 2, 3\}$.

(ii)   $G(A_1, A_2, A_3) \le G(A_1', A_2, A_3)$ whenever $A_1 \le A_1'$.

(iii)  $G(X^*A_1X, X^*A_2X, X^*A_3X) = X^*G(A_1, A_2, A_3)X$ for all $X \in GL(n)$.

(iv)  G is continuous.

The first three conditions may be called symmetry, monotonicity, and congruence invariance, respectively.

None of the procedures that we used in Chapter 4 to define the geometric mean of two positive matrices extends readily to three. While two positive matrices can be diagonalized simultaneously by a congruence, in general three cannot be. The formula (4.10) has no obvious analogue for three matrices; nor does the extremal characterization (4.15). It is here that the connections with geometry made in Sections 6.1.7 and 6.2.8 suggest a way out: the geometric mean of three matrices should be the “center” of the triangle that has the three matrices as its vertices.

As motivation, consider the arithmetic mean of three points $x_1$, $x_2$, and $x_3$ in a Euclidean space $E$. The point $\bar{x} = \frac{1}{3}(x_1 + x_2 + x_3)$ is characterized by several properties; three of them follow:

(i)  $\bar{x}$ is the unique point of intersection of the three medians of the triangle $\Delta(x_1, x_2, x_3)$. (This point is called the centroid of $\Delta$.)

(ii)  $\bar{x}$ is the unique point in $E$ at which the function

$$d^2(x, x_1) + d^2(x, x_2) + d^2(x, x_3)$$

attains its minimum. (This point is the center of mass of the triple $\{x_1, x_2, x_3\}$ if each of them has equal mass.)

(iii) $\bar{x}$ is the unique point of intersection of the nested sequence of triangles $\{\Delta_n\}$ in which $\Delta_1 = \Delta(x_1, x_2, x_3)$ and $\Delta_{j+1}$ is the triangle whose vertices are the midpoints of the three sides of $\Delta_j$.

We may try to mimic these constructions in the space $\mathbb{P}_n$. As we will see, this has to be done with some circumspection.

The first difficulty is with the identification of a triangle in this space. In Section 6.2 we defined convex hulls and observed that the convex hull of two points $A_1$, $A_2$ in $\mathbb{P}_n$ is the geodesic segment $[A_1, A_2]$. It is harder to describe the convex hull of three points $A_1$, $A_2$, $A_3$. (This seems to be a difficult problem in Riemannian geometry.) In the notation of Exercise 6.2.5, if $\mathcal{S} = \{A_1, A_2, A_3\}$, then $\mathcal{S}_1 = [A_1, A_2] \cup [A_2, A_3] \cup [A_3, A_1]$ is the union of the three “edges.” However, $\mathcal{S}_2$ is not in general a “surface,” but a “fatter” object. Thus the three “medians” $[A_1, A_2 \# A_3]$, $[A_2, A_1 \# A_3]$, and $[A_3, A_1 \# A_2]$ need not intersect; in most cases they do not intersect at all. So, we have to abandon this as a possible definition of the centroid of the triangle $\Delta(A_1, A_2, A_3)$.

Next we ask whether for every triple of points $A_1$, $A_2$, $A_3$ in $\mathbb{P}_n$ there exists a (unique) point $X_0$ at which the function

$$f(X) = \delta_2^2(A_1, X) + \delta_2^2(A_2, X) + \delta_2^2(A_3, X)$$

attains its minimum value on $\mathbb{P}_n$. A simple argument using the semiparallelogram law shows that such a point exists. This goes as follows.

Let $m = \inf f(X)$ and let $\{X_r\}$ be a sequence in $\mathbb{P}_n$ such that $f(X_r) \to m$. By the semiparallelogram law we have for $j = 1, 2, 3$, and for all $r$ and $s$

$$\delta_2^2(X_r \# X_s, A_j) \le \frac{\delta_2^2(X_r, A_j) + \delta_2^2(X_s, A_j)}{2} - \frac{\delta_2^2(X_r, X_s)}{4}.$$

Summing up these three inequalities over $j$, we obtain

$$f(X_r \# X_s) \le \frac{1}{2}\left(f(X_r) + f(X_s)\right) - \frac{3}{4}\, \delta_2^2(X_r, X_s).$$

This shows that

$$\frac{3}{4}\, \delta_2^2(X_r, X_s) \le \frac{1}{2}\left(f(X_r) + f(X_s)\right) - f(X_r \# X_s) \le \frac{1}{2}\left[(f(X_r) - m) + (f(X_s) - m)\right].$$

It follows that $\{X_r\}$ is a Cauchy sequence, and hence it converges to a limit $X_0$. Clearly $f$ attains its minimum at $X_0$. By Exercise 6.1.13 the function $f$ is strictly convex and its minimum is attained at a unique point.

We define the “center of mass” of $\{A_1, A_2, A_3\}$ as the point

$$G(A_1, A_2, A_3) = \operatorname{argmin} f(X), \tag{6.24}$$

where the notation $\operatorname{argmin} f(X)$ stands for the point $X_0$ at which the function $f(X)$ attains its minimum value. It is clear from the definition that $G(A_1, A_2, A_3)$ is a symmetric and continuous function of the three variables. Since each congruence transformation $\Gamma_X$ is an isometry of $(\mathbb{P}_n, \delta_2)$ it is easy to see that $G$ is congruence invariant; i.e.,

$$G(X^*A_1X, X^*A_2X, X^*A_3X) = X^*G(A_1, A_2, A_3)X.$$

Thus G has three of the four desirable properties listed for a good geometric mean at the beginning of this section. We do not know whether G is monotone. Some more properties of G are derived below.


6.3.1  Lemma

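Let $\varphi_1$, $\varphi_2$ be continuously differentiable real-valued functions on the interval $(0, \infty)$ and let

$$h(X) = \langle \varphi_1(X), \varphi_2(X) \rangle = \operatorname{tr} \varphi_1(X)\varphi_2(X)$$

for all $X \in \mathbb{P}_n$. Then the derivative of $h$ is given by the formula

$$Dh(X)(Y) = \langle \varphi_1'(X)\varphi_2(X) + \varphi_1(X)\varphi_2'(X),\; Y \rangle.$$

Proof.   By the product rule for differentiation (see MA, p. 312) we have

$$Dh(X)(Y) = \langle D\varphi_1(X)(Y), \varphi_2(X) \rangle + \langle \varphi_1(X), D\varphi_2(X)(Y) \rangle.$$

Choose an orthonormal basis in which $X = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Then by (2.40)

$$D\varphi_1(X)(Y) = \left[\frac{\varphi_1(\lambda_i) - \varphi_1(\lambda_j)}{\lambda_i - \lambda_j}\, y_{ij}\right],$$

where the quotient is read as $\varphi_1'(\lambda_i)$ when $\lambda_i = \lambda_j$. Hence,

$$\langle D\varphi_1(X)(Y), \varphi_2(X) \rangle = \sum_i \varphi_1'(\lambda_i)\, y_{ii}\, \varphi_2(\lambda_i) = \langle \varphi_1'(X)\varphi_2(X), Y \rangle.$$

Similarly,

$$\langle \varphi_1(X), D\varphi_2(X)(Y) \rangle = \langle \varphi_1(X)\varphi_2'(X), Y \rangle.$$

This proves the lemma.     ■


6.3.2  Corollary

Let $h(X) = \|\log X\|_2^2$, $X \in \mathbb{P}_n$. Then

$$Dh(X)(Y) = 2\, \langle X^{-1} \log X,\; Y \rangle.$$

We need a slight modification of this result. If

$$h(X) = \left\|\log (A^{-1/2} X A^{-1/2})\right\|_2^2,$$

then

$$Dh(X)(Y) = 2\, \left\langle (A^{-1/2} X A^{-1/2})^{-1} \log (A^{-1/2} X A^{-1/2}),\; A^{-1/2} Y A^{-1/2} \right\rangle \tag{6.25}$$

for all $Y \in \mathbb{H}_n$.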


6.3.3  Theorem

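Let $A_1$, $A_2$, $A_3$ be any three elements of $\mathbb{P}_n$, and let

$$f(X) = \sum_{j=1}^{3} \delta_2^2(A_j, X). \tag{6.26}$$

Then the derivative of $f$ at $X$ is given by

$$Df(X)(Y) = 2 \sum_{j=1}^{3} \operatorname{tr}\left[X^{-1} \log (X A_j^{-1})\, Y\right] \tag{6.27}$$

for all $Y \in \mathbb{H}_n$.

Proof.   Using the relation (6.13) we have

$$f(X) = \sum_{j=1}^{3} \left\|\log (A_j^{-1/2} X A_j^{-1/2})\right\|_2^2.$$

Using (6.25) we see that $Df(X)(Y)$ is a sum of three terms of the form

$$2 \operatorname{tr}\left[A^{1/2} X^{-1} A^{1/2} \log (A^{-1/2} X A^{-1/2})\, A^{-1/2} Y A^{-1/2}\right] = 2 \operatorname{tr}\left[X^{-1} A^{1/2} \log (A^{-1/2} X A^{-1/2})\, A^{-1/2}\, Y\right] = 2 \operatorname{tr}\left[X^{-1} \log (X A^{-1})\, Y\right].$$

Here we have used the similarity invariance of trace at the first step, and then the relation

$$S (\log T)\, S^{-1} = \log (S T S^{-1})$$

at the second step. The latter is valid for all matrices $T$ with no eigenvalues on the half-line $(-\infty, 0]$ and for all invertible matrices $S$, and follows from the usual functional calculus. This proves the theorem.     ■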


6.3.4  Theorem

Let $A_1$, $A_2$, $A_3$ be three positive matrices and let $X_0 = G(A_1, A_2, A_3)$ be the point defined by (6.24). Then $X_0$ is the unique positive solution of the equation

$$\sum_{j=1}^{3} X^{-1} \log (X A_j^{-1}) = O. \tag{6.28}$$

Proof.   The point $X_0$ is the unique minimum of the function (6.26), and hence is characterised by the vanishing of the derivative (6.27) for all $Y \in \mathbb{H}_n$. But any matrix orthogonal to all Hermitian matrices is zero. Hence

$$\sum_{j=1}^{3} X_0^{-1} \log (X_0 A_j^{-1}) = O. \tag{6.29}$$

In other words $X_0$ satisfies the equation (6.28).     ■
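Theorem 6.3.4 characterizes the center of mass by an equation but does not say how to solve it. One common way to approximate the solution is a fixed-point iteration that moves $X$ along the average of the logarithms $\log(X^{-1/2}A_jX^{-1/2})$. The sketch below uses that iteration as an assumption of this example (it is not a construction from the text, and its convergence is not guaranteed for arbitrary data); the names `karcher_mean`, `herm_fun` and the starting point are illustrative choices. By the identity of Exercise 6.3.6, the equation to be satisfied is equivalent to $\sum_j \log(X^{1/2}A_j^{-1}X^{1/2}) = O$, which is the residual checked at the end.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix via eigh."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def sym(A):
    return (A + A.conj().T) / 2

def karcher_mean(mats, iters=200, tol=1e-12):
    """Fixed-point iteration for the minimizer of sum_j delta_2^2(A_j, X)."""
    X = sum(mats) / len(mats)            # arithmetic mean as a starting guess
    for _ in range(iters):
        r = herm_fun(X, np.sqrt)
        s = herm_fun(X, lambda w: w ** -0.5)
        # Average of log(X^{-1/2} A_j X^{-1/2}); zero exactly at the center of mass
        T = sum(herm_fun(sym(s @ A @ s), np.log) for A in mats) / len(mats)
        X = sym(r @ herm_fun(T, np.exp) @ r)
        if np.linalg.norm(T) < tol:
            break
    return X

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(8)
mats = [random_pd(3, rng) for _ in range(3)]
G = karcher_mean(mats)

# Residual of the characterizing equation, in the Hermitian form of Exercise 6.3.6
r = herm_fun(G, np.sqrt)
residual = sum(herm_fun(sym(r @ np.linalg.inv(A) @ r), np.log) for A in mats)
print(np.linalg.norm(residual))

# Commuting (diagonal) triple: the answer should be the cube root of the product
d = [rng.random(3) + 0.5 for _ in range(3)]
G_comm = karcher_mean([np.diag(x) for x in d])
expected = np.diag((d[0] * d[1] * d[2]) ** (1.0 / 3.0))
```

For well-conditioned inputs such as these the iteration settles quickly; for badly conditioned data a damped step would be a safer choice.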


6.3.5  Exercise

Let $A_1$, $A_2$, $A_3$ be pairwise commuting positive matrices. Show that $G(A_1, A_2, A_3) = (A_1A_2A_3)^{1/3}$.


6.3.6  Exercise

Let $X$ and $A$ be positive matrices. Show that

$$X^{-1} \log (X A^{-1}) = X^{-1/2} \log (X^{1/2} A^{-1} X^{1/2})\, X^{-1/2}. \tag{6.30}$$

(This shows that the matrices occurring in (6.29) are Hermitian.)


6.3.7  Exercise

Let $w = (w_1, w_2, w_3)$, where $w_j \ge 0$ and $\sum_j w_j = 1$. We say that $w$ is a set of weights. Let

$$f_w(X) = \sum_{j=1}^{3} w_j\, \delta_2^2(A_j, X).$$

Show that $f_w$ is strictly convex, and attains a minimum at a unique point.

Let $G_w(A_1, A_2, A_3)$ be the point where $f_w$ attains its minimum. The special choice $w = (1/3, 1/3, 1/3)$ leads to $G(A_1, A_2, A_3)$.


6.3.8  Proposition

Each of the points Gw(A1, A2, A3) lies in the closure of the convex hull conv ({A1, A2, A3}).

Proof.   Let $\mathcal{K}$ be the closure of conv $(\{A_1, A_2, A_3\})$ and let $\pi$ be the metric projection onto $\mathcal{K}$. Then by Theorem 6.2.7, $\delta_2^2(A_j, X) \ge \delta_2^2(A_j, \pi(X))$ for every $X \in \mathbb{P}_n$. Hence $f_w(X) \ge f_w(\pi(X))$ for all $X$. Thus the minimum value of $f_w(X)$ cannot be attained at a point outside $\mathcal{K}$.     ■


Now we turn to another possible definition of the geometric mean of three matrices inspired by the characterisation of the centre of a triangle as the intersection of a sequence of nested triangles.

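Given $A_1$, $A_2$, $A_3$ in $\mathbb{P}_n$ inductively construct a sequence of triples $\{A_1^{(m)}, A_2^{(m)}, A_3^{(m)}\}$ as follows. Set $A_j^{(0)} = A_j$ for $j = 1, 2, 3$, and let

$$A_1^{(m+1)} = A_1^{(m)} \# A_2^{(m)}, \quad A_2^{(m+1)} = A_1^{(m)} \# A_3^{(m)}, \quad A_3^{(m+1)} = A_2^{(m)} \# A_3^{(m)}. \tag{6.31}$$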

6.3.9  Theorem

Let $A_1$, $A_2$, $A_3$ be any three points in $\mathbb{P}_n$, and let $\{A_1^{(m)}, A_2^{(m)}, A_3^{(m)}\}$ be the sequence defined by (6.31). Then for any choice of $X_m$ in conv $(\{A_1^{(m)}, A_2^{(m)}, A_3^{(m)}\})$ the sequence $\{X_m\}$ converges to a point $X \in$ conv $(\{A_1, A_2, A_3\})$. The point $X$ does not depend on the choice of $X_m$.

Proof.   The diameter of a set $\mathcal{S}$ in $\mathbb{P}_n$ is defined as

$$\operatorname{diam} \mathcal{S} = \sup\, \{\delta_2(X, Y) : X, Y \in \mathcal{S}\}.$$

It is easy to see, using convexity of the metric $\delta_2$, that if diam $\mathcal{S} = M$, then diam (conv $(\mathcal{S})) = M$.

Let $\mathcal{K}_m = $ conv $(\{A_1^{(m)}, A_2^{(m)}, A_3^{(m)}\})$. By (6.17), and what we said above, diam $\mathcal{K}_m \le 2^{-m} M_0$, where $M_0 = $ diam $\{A_1, A_2, A_3\}$. The sequence $\{\mathcal{K}_m\}$ is a decreasing sequence. Hence $\{X_m\}$ is Cauchy and converges to a limit $X$. Since $X_m$ is in $\mathcal{K}_0$ for all $m$, the limit $X$ is in the closure of $\mathcal{K}_0$. The limit is unique as any two such sequences can be interlaced.     ■


6.3.10  A geometric mean of three matrices

Let G#(A1, A2, A3) be the limit point X whose existence has been proved in Theorem 6.3.9. This may be thought of as a geometric mean of A1, A2, A3. From its construction it is clear that G# is a symmetric continuous function of A1, A2, A3. Since the geometric mean A#B of two matrices is monotone in A and B and is invariant under congruence transformations, these properties are inherited by G#(A1, A2, A3) as its construction involves successive two-variable means and limits.
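The nested-triangle construction is straightforward to run. The sketch below assumes real symmetric positive definite matrices and assumes that each step replaces the triple by the three pairwise geometric means (the exact ordering within the triple is an assumption and does not affect the limit). It records the diameter of the triple at each step, which by (6.17) should at least halve, and checks symmetry in the arguments.

```python
import numpy as np

def herm_fun(A, f):
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def sym(A):
    return (A + A.conj().T) / 2

def delta2(A, B):
    s = herm_fun(A, lambda w: w ** -0.5)
    w = np.linalg.eigvalsh(sym(s @ B @ s))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def gm(A, B):
    """Two-variable geometric mean A # B."""
    r = herm_fun(A, np.sqrt)
    s = herm_fun(A, lambda w: w ** -0.5)
    return sym(r @ herm_fun(sym(s @ B @ s), np.sqrt) @ r)

def diameter(A1, A2, A3):
    return max(delta2(A1, A2), delta2(A1, A3), delta2(A2, A3))

def nested_mean(A1, A2, A3, iters=60):
    """Iterate pairwise geometric means; return the limit and the diameters."""
    diams = [diameter(A1, A2, A3)]
    for _ in range(iters):
        A1, A2, A3 = gm(A1, A2), gm(A1, A3), gm(A2, A3)
        diams.append(diameter(A1, A2, A3))
    return A1, diams

def random_pd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

rng = np.random.default_rng(9)
A1, A2, A3 = (random_pd(3, rng) for _ in range(3))

G_sharp, diams = nested_mean(A1, A2, A3)
G_perm, _ = nested_mean(A3, A1, A2)          # same triple, different order

# Commuting (diagonal) triple: compare with the cube root of the product
d = [rng.random(3) + 0.5 for _ in range(3)]
G_comm, _ = nested_mean(*(np.diag(x) for x in d))
expected = np.diag((d[0] * d[1] * d[2]) ** (1.0 / 3.0))

print(diams[:5])
```

Sixty iterations are far more than needed in double precision, since the diameter shrinks by a factor of at least two per step.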


Exercise. Show that for a commuting triple $A_1$, $A_2$, $A_3$ of positive matrices $G^{\#}(A_1, A_2, A_3) = (A_1A_2A_3)^{1/3}$.


One may wonder whether G#(A1, A2, A3) is equal to the centre of mass G(A1, A2, A3). It turns out that this is not always the case. Thus we have here two different candidates for a geometric mean of three matrices. While G# has all properties that we seek, it is not known whether G is monotone in its arguments. It does have all other desired properties.


6.4  RELATED INEQUALITIES

Some of the inequalities proved in Section 6.1 can be generalized from the special ||·||2 norm to all Schatten ||·||p norms and to the larger class of unitarily invariant norms. These inequalities are very closely related to others proved in very different contexts like quantum statistical mechanics. This section is a brief indication of these connections.

Two results from earlier chapters provide the basis for our generalizations. In Exercise 2.7.12 we saw that for a positive matrix A

images/eq-232-1.png

for every X and every unitarily invariant norm. In Section 5.2.9 we showed that for every choice of n positive numbers λ1, . . . , λn, the matrix

images/eq-232-2.png

is positive. Using these we can easily prove the following generalized version of Proposition 6.1.2.


6.4.1  Proposition (Generalized IEMI)

For all H and K in Hn we have

images/eq-233-1.png(6.32)

for every unitarily invariant norm.

In the definition (6.2) replace || · ||2 by any unitarily invariant norm ||| · ||| and call the resulting length L|||·|||; i.e.,

images/eq-233-2.png(6.33)

Since |||X||| is a (symmetric gauge) function of the singular values of X, Lemma 6.1.1 carries over to L|||·|||. The analogue of (6.4),

images/eq-233-3.png(6.34)

is a metric on images/nec-233-1.png invariant under congruence transformations. The generalized IEMI leads to a generalized EMI. For all A, B in images/nec-233-2.png we have

images/eq-233-4.png(6.35)

or, in other words, for all H, K in images/nec-233-3.png

images/eq-233-5.png(6.36)

Some care is needed while formulating statements about uniqueness of geodesics. Many unitarily invariant norms have the property that, in the metric they induce on images/nec-233-4.png, the straight line segment is the unique geodesic joining any two given points. If a norm ||| · ||| has this property, then the metric δ|||·||| on images/nec-233-5.png inherits it. The Schatten p-norms have this property for 1 < p <, but not for p = 1 or ∞. With this proviso, statements made in Sections 6.1.5 and 6.1.6 can be proved in the more general setting. In particular, we have

images/eq-233-6.png(6.37)

The geometric mean A#B defined by (4.10) is equidistant from A and B in each of the metrics δ|||·|||. For certain metrics, such as the ones corresponding to Schatten p-norms for 1 < p < ∞, this is the unique “metric midpoint” between A and B.
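The equidistance claim is easy to test numerically. The following is a minimal sketch in Python with NumPy. It assumes that (6.37) is the closed form δ|||·|||(A, B) = |||log(A^{-1/2} B A^{-1/2})|||, and that the geometric mean is given by the usual formula A#B = A^{1/2}(A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}. The helper names (herm_fun, inv_sqrt, geometric_mean, delta, random_positive) are ours and purely illustrative.

```python
import numpy as np

def herm_fun(M, f):
    """Apply the real function f to a Hermitian matrix M via its spectral decomposition."""
    M = (M + M.conj().T) / 2              # remove rounding asymmetry
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T        # V diag(f(w)) V*

def inv_sqrt(A):
    return herm_fun(A, lambda w: 1.0 / np.sqrt(w))

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    R, Ri = herm_fun(A, np.sqrt), inv_sqrt(A)
    return R @ herm_fun(Ri @ B @ Ri, np.sqrt) @ R

def delta(A, B, norm):
    """|||log(A^{-1/2} B A^{-1/2})||| for norm in {'fro', 2, 'nuc'} (assumed form of (6.37))."""
    Ri = inv_sqrt(A)
    return np.linalg.norm(herm_fun(Ri @ B @ Ri, np.log), norm)

def random_positive(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + np.eye(n)     # strictly positive by construction

rng = np.random.default_rng(0)
A, B = random_positive(4, rng), random_positive(4, rng)
G = geometric_mean(A, B)
for norm in ("fro", 2, "nuc"):            # Schatten 2-, infinity- and 1-norms
    print(norm, delta(A, G, norm), delta(B, G, norm), delta(A, B, norm) / 2)
```

For each norm the three printed numbers should agree up to rounding. The check says nothing about uniqueness of the midpoint, which fails for p = 1 and p = ∞.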

The parallelogram law and the semiparallelogram law, however, characterize a Hilbert space norm and the associated Riemannian metric. These are not valid for other metrics.

Now we can see the connection between these inequalities arising from geometry to others related to physics. Some facts about majorization and unitarily invariant norms are needed in the ensuing discussion. Let H, K be Hermitian matrices. From (6.36) and (6.37) we have

images/eq-234-1.png(6.38)

The exponential function is convex and monotonically increasing on images/nec-234-1.png. Such functions preserve weak majorization (Corollary II.3.4 in MA). Using this property we obtain from the inequality (6.38)

images/eq-234-2.png(6.39)

Two special cases of this are well-known inequalities in physics. The special cases of the || · ||1 and the || · ||∞ norms in (6.39) say

images/eq-234-3.png(6.40)

and

images/eq-234-4.png(6.41)

where λ1(X) is the largest eigenvalue of a matrix with real eigenvalues. The first of these is called the Golden-Thompson inequality and the second is called Segal’s inequality.
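Both inequalities can be probed on random Hermitian matrices. The sketch below assumes that (6.40) and (6.41) are the standard statements tr e^{H+K} ≤ tr e^H e^K and λ1(e^{H+K}) ≤ λ1(e^H e^K); the displayed formulas are not reproduced here, so this reading is an assumption. The function names are illustrative. Since e^H e^K is similar to the positive matrix e^{H/2} e^K e^{H/2}, its largest eigenvalue is computed from the latter.

```python
import numpy as np

def herm_exp(M):
    """e^M for a Hermitian matrix M."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.conj().T

def random_hermitian(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def golden_thompson_gap(H, K):
    """tr(e^H e^K) - tr(e^{H+K}); expected to be nonnegative."""
    return (np.trace(herm_exp(H) @ herm_exp(K)) - np.trace(herm_exp(H + K))).real

def segal_gap(H, K):
    """lambda_1(e^H e^K) - lambda_1(e^{H+K}); expected to be nonnegative."""
    half = herm_exp(H / 2)
    P = half @ herm_exp(K) @ half                    # similar to e^H e^K
    top_product = np.linalg.eigvalsh((P + P.conj().T) / 2)[-1]
    top_sum = np.exp(np.linalg.eigvalsh(H + K)[-1])  # largest eigenvalue of e^{H+K}
    return top_product - top_sum

rng = np.random.default_rng(2)
H, K = random_hermitian(4, rng), random_hermitian(4, rng)
gt, sg = golden_thompson_gap(H, K), segal_gap(H, K)
print(gt, sg)
```

When H and K commute, for instance when K = H, both gaps vanish up to rounding.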

The inequality (6.41) can be easily derived from the operator monotonicity of the logarithm function (Exercise 4.2.5 and Section 5.3.7).

Let

images/eq-234-5.png

Then

images/eq-234-6.png

and hence

images/eq-234-7.png

Since log is an operator monotone function on (0, ∞), it follows that

images/eq-234-8.png

Hence

images/eq-234-9.png

and therefore

images/eq-235-1.png

This leads to (6.41).

More interrelations between various inequalities are given in the next section and in the notes at the end of the chapter.


6.5  SUPPLEMENTARY RESULTS AND EXERCISES

The crucial inequality (6.6) has a short alternate proof based on the inequality between the geometric and the logarithmic means. This relies on the following interesting formula for the derivative of the exponential map:

images/eq-235-2.png(6.42)

This formula, attributed variously to Duhamel, Dyson, Feynman, and Schwinger, has an easy proof. Since

images/eq-235-3.png

we have

images/eq-235-4.png

Hence

images/eq-235-5.png

This is exactly the statement (6.42).
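A numerical sanity check of the derivative formula may be helpful. The sketch below assumes that (6.42) reads De^H(K) = ∫_0^1 e^{tH} K e^{(1−t)H} dt (the substitution t → 1 − t gives the same value with the two exponentials interchanged). It compares a Gauss-Legendre approximation of this integral with a central difference quotient of the exponential map. The function names, the number of nodes, and the step size are our own choices.

```python
import numpy as np

def herm_exp(M, t=1.0):
    """e^{tM} for a Hermitian matrix M."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(t * w)) @ V.conj().T

def dexp_integral(H, K, nodes=24):
    """Integral of e^{tH} K e^{(1-t)H} over [0, 1] by Gauss-Legendre quadrature."""
    x, wts = np.polynomial.legendre.leggauss(nodes)   # nodes and weights on [-1, 1]
    ts, wts = (x + 1) / 2, wts / 2                    # rescaled to [0, 1]
    return sum(w * herm_exp(H, t) @ K @ herm_exp(H, 1 - t) for t, w in zip(ts, wts))

def dexp_difference(H, K, eps=1e-5):
    """Central difference quotient (e^{H + eps K} - e^{H - eps K}) / (2 eps)."""
    return (herm_exp(H + eps * K) - herm_exp(H - eps * K)) / (2 * eps)

def random_hermitian(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

rng = np.random.default_rng(3)
H, K = random_hermitian(4, rng), random_hermitian(4, rng)
D_int, D_fd = dexp_integral(H, K), dexp_difference(H, K)
relative_error = np.linalg.norm(D_int - D_fd) / np.linalg.norm(D_fd)
print(relative_error)
```

The relative error is limited by the difference quotient, not by the quadrature, since the integrand is an entire function of t.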

Now let H and K be Hermitian matrices. Using the identity

images/eq-235-6.png

and the first inequality in (5.34) we obtain

images/eq-235-7.png

The last integral is equal to De^H(K). Hence,

images/eq-236-1.png

This is the IEMI (6.6).

The inequality (5.35) generalizes (5.34) to all unitarily invariant norms. So, exactly the same argument as above leads to a proof of (6.32) as well.
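The IEMI and its generalization can also be checked numerically. The sketch below does not follow the integral argument above. Instead it uses the Daleckii-Krein formula for the derivative: in an eigenbasis of H with eigenvalues λ1, . . . , λn, the (i, j) entry of e^{-H/2} De^H(K) e^{-H/2} is the (i, j) entry of K multiplied by sinh(d)/d with d = (λi − λj)/2. We assume that (6.32) reads |||K||| ≤ |||e^{-H/2} De^H(K) e^{-H/2}|||. The function names are illustrative.

```python
import numpy as np

def random_hermitian(n, rng):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def conjugated_derivative(H, K):
    """e^{-H/2} De^H(K) e^{-H/2}, computed in an eigenbasis of H."""
    lam, V = np.linalg.eigh(H)
    d = (lam[:, None] - lam[None, :]) / 2
    S = np.ones_like(d)                    # sinh(d)/d equals 1 where d = 0
    off = np.abs(d) > 1e-12
    S[off] = np.sinh(d[off]) / d[off]
    K_tilde = V.conj().T @ K @ V           # K in the eigenbasis of H
    return V @ (K_tilde * S) @ V.conj().T  # entrywise (Schur) product with S

rng = np.random.default_rng(4)
H, K = random_hermitian(4, rng), random_hermitian(4, rng)
M = conjugated_derivative(H, K)
sides = {norm: (np.linalg.norm(K, norm), np.linalg.norm(M, norm))
         for norm in ("fro", 2, "nuc")}
for norm, (lhs, rhs) in sides.items():
    print(norm, lhs, "<=", rhs)
```

The three norms used are the Schatten 2-, ∞- and 1-norms; the first pair of numbers corresponds to (6.6).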

From the expression (6.14) it is clear that

images/eq-236-2.png(6.43)

for all images/nec-236-1.png. Similarly, from (6.37) we see that

images/eq-236-3.png(6.44)

An important notion in geometry is that of a Riemannian symmetric space. By definition, this is a connected Riemannian manifold M for each point p of which there is an isometry σp of M with two properties:

(i)  σp(p) = p, and

(ii)  the derivative of σp at p is multiplication by −1.

The space images/nec-236-2.png is a Riemannian symmetric space. We show this using the notation and some basic facts on matrix differential calculus from Section X.4 of MA. For each images/nec-236-3.png let σA be the map defined on images/nec-236-4.png by

images/eq-236-4.png

Clearly σA(A) = A. Let images/nec-236-5.png be the inversion map. Then σA is the composite images/nec-236-6.png. The derivative of images/nec-236-7.png is given by images/nec-236-8.pngX−1Y X−1, while ΓA being a linear map is equal to its own derivative. So, by the chain rule

images/eq-236-5.png

Thus DσA(A) is multiplication by −1.
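The two defining properties can be confirmed numerically. In the sketch below we take σA(X) = A X^{-1} A, which is what the composite of ΓA with the inversion map gives for a positive A; the names sigma and directional_derivative, and the step size, are our own. Real symmetric matrices are used for brevity.

```python
import numpy as np

def sigma(A, X):
    """The candidate symmetry at A: X -> A X^{-1} A."""
    return A @ np.linalg.inv(X) @ A

def directional_derivative(A, Y, eps=1e-5):
    """Central difference approximation to the derivative of sigma_A at A in the direction Y."""
    return (sigma(A, A + eps * Y) - sigma(A, A - eps * Y)) / (2 * eps)

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3))
A = X @ X.T + np.eye(3)            # a positive definite base point
Y = rng.standard_normal((3, 3))
Y = (Y + Y.T) / 2                  # a tangent vector, i.e., a symmetric matrix

fixed_point_error = np.linalg.norm(sigma(A, A) - A)
derivative_error = np.linalg.norm(directional_derivative(A, Y) + Y)
print(fixed_point_error, derivative_error)
```

Both numbers should be close to zero: the first reflects σA(A) = A, and the second reflects that the derivative at A sends Y to −Y.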

The Riemannian manifold images/nec-236-9.png has nonpositive curvature. The EMI captures the essence of this fact. We explain this briefly.

Consider a triangle Δ(O, H, K) with vertices O, H, and K in images/nec-236-10.png. The image of this set under the exponential map is a “triangle” Δ(I, e^H, e^K) in images/nec-237-1.png. By Proposition 6.1.5 the δ2-lengths of the sides [I, e^H] and [I, e^K] are equal to the || · ||2-lengths of the sides [O, H] and [O, K], respectively. By the EMI (6.8) the third side [e^H, e^K] is longer than [H, K]. Keep the vertex O as a fixed pivot and move the sides [O, H] and [O, K] apart to get a triangle Δ(O, H′, K′) in images/nec-237-2.png whose three sides now have the same lengths as the δ2-lengths of the sides of Δ(I, e^H, e^K) in images/nec-237-3.png. Such a triangle is called a comparison triangle for Δ(I, e^H, e^K) and it is unique up to an isometry of images/nec-237-4.png. The fact that the comparison triangle in the Euclidean space images/nec-237-5.png is “fatter” than the triangle Δ(I, e^H, e^K) is a characterization of a space of nonpositive curvature.

It may be instructive here to compare the situation with the space images/nec-237-6.png consisting of unitary matrices. This is a compact manifold of nonnegative curvature. In this case the real vector space images/nec-237-7.png consisting of skew-Hermitian matrices is mapped by the exponential onto images/nec-237-8.png. The map is not injective in this case; it is a local diffeomorphism.


6.5.1  Exercise

Let H and K be any two skew-Hermitian matrices. Show that

images/eq-237-1.png(6.45)

[Hint: Follow the steps in the proof of Proposition 6.1.2. Now the λi are imaginary. So the hyperbolic function sinh occurring in the proof of Proposition 6.1.2 is replaced by the circular function sin. Alternatively, prove this using the formula (6.42). Observe that e^{tH} is unitary.]

As a consequence we have the opposite of the inequality (6.8) in this case: if A and B are sufficiently close in images/nec-237-9.png, then

images/eq-237-2.png

Thus the exponential map decreases distance locally. This fact captures the nonnegative curvature of images/nec-237-10.png.

Of late there has been interest in general metric spaces of nonpositive curvature (not necessarily Riemannian manifolds). An important consequence of the generalised EMI proved in Section 6.4 is that for every unitarily invariant norm the space images/nec-237-11.png is a metric space of nonpositive curvature. These are examples of Finsler manifolds, where the metric arises from a non-Euclidean metric on the tangent space.

A metric space (X, d) is said to satisfy the semiparallelogram law if for any two points a, b ∈ X, there exists a point m such that

images/eq-238-1.png(6.46)

for all c ∈ X.
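As an illustration, the law can be tested in the space of positive matrices with the metric δ2, taking m to be the geometric mean. The sketch below assumes that (6.46) has the usual form d(a, b)^2 + 4 d(m, c)^2 ≤ 2 d(a, c)^2 + 2 d(b, c)^2, and that δ2(A, B) = ||log(A^{-1/2} B A^{-1/2})||2. All function names are ours.

```python
import numpy as np

def herm_fun(M, f):
    """Apply the real function f to a Hermitian matrix M via its spectral decomposition."""
    M = (M + M.conj().T) / 2
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    R = herm_fun(A, np.sqrt)
    Ri = herm_fun(A, lambda w: 1.0 / np.sqrt(w))
    return R @ herm_fun(Ri @ B @ Ri, np.sqrt) @ R

def delta2(A, B):
    """Frobenius norm of log(A^{-1/2} B A^{-1/2})."""
    Ri = herm_fun(A, lambda w: 1.0 / np.sqrt(w))
    return np.linalg.norm(herm_fun(Ri @ B @ Ri, np.log), "fro")

def semiparallelogram_gap(A, B, C):
    """2 d(A,C)^2 + 2 d(B,C)^2 - d(A,B)^2 - 4 d(M,C)^2 with M = A # B; expected >= 0."""
    M = geometric_mean(A, B)
    return (2 * delta2(A, C) ** 2 + 2 * delta2(B, C) ** 2
            - delta2(A, B) ** 2 - 4 * delta2(M, C) ** 2)

def random_positive(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T + np.eye(n)

rng = np.random.default_rng(5)
gaps = [semiparallelogram_gap(*(random_positive(3, rng) for _ in range(3)))
        for _ in range(5)]
# For commuting matrices the space is flat and the gap vanishes.
flat_gap = semiparallelogram_gap(np.diag([1.0, 2.0]), np.diag([3.0, 0.5]), np.diag([4.0, 7.0]))
print(gaps, flat_gap)
```

The vanishing gap in the commuting case is the ordinary parallelogram law applied to the logarithms of the diagonal entries.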


6.5.2  Exercise

Let (X, d) be a metric space with the semiparallelogram law. Show that the point m arising in the definition is unique and is the metric midpoint of a and b; i.e., m is the point at which d(a, m) = d(b, m) = images/nec-238-1.png.


A complete metric space satisfying the semiparallelogram law is called a Bruhat-Tits space. We have shown that images/nec-238-2.png is such a space. Those of our proofs that involved only completeness and the semiparallelogram law are valid for all Bruhat-Tits spaces. See, for example, Theorems 6.2.6 and 6.2.7.

In the next two exercises we point out more connections between classical matrix inequalities and geometric facts of this chapter. We use the notation of majorization and facts about unitarily invariant norms from MA, Chapters II and IV. The reader unfamiliar with these may skip this part.


6.5.3  Exercise

An inequality due to Gel’fand, Naimark, and Lidskii gives relations between eigenvalues of two positive matrices A and B and their product AB. This says

log λ↓(A) + log λ↑(B) ≺ log λ(AB) ≺ log λ↓(A) + log λ↓(B).   (6.47)

See MA p. 73. Let A, B, and C be three positive matrices. Then

images/eq-238-2.png

So, by the second part of (6.47)

images/eq-238-3.png

Use this to show directly that δ|||·||| defined by (6.36) is a metric on images/nec-238-3.png.


6.5.4  Exercise

Let A and B be positive. Then for 0 ≤ t ≤ 1 and 1 ≤ k ≤ n we have

images/eq-239-1.png(6.48)

See MA p. 258. Take logarithms of both sides and use results on majorization to show that

images/eq-239-2.png

This may be rewritten as

images/eq-239-3.png

Show that this implies that the metric δ|||·||| is convex.

In Section 4.5 we outlined a general procedure for constructing matrix means from scalar means. Two such means are germane to our present discussion. The function f in (4.69) corresponding to the logarithmic mean is

images/eq-239-4.png

So the logarithmic mean of two positive matrices A and B given by the formula (4.71) is

images/eq-239-5.png

In other words

images/eq-239-6.png(6.49)

where γ(t) is the geodesic segment joining A and B.

Likewise, for 0 ≤ t ≤ 1 the Heinz mean

images/eq-239-7.png(6.50)

leads to the function

images/eq-239-8.png

and then to the matrix Heinz mean

images/eq-240-1.png(6.51)

The following theorem shows that the geodesic γ(t) has very intimate connections with the order relation on Pn.


6.5.5  Theorem

For every α in [0, 1/2] we have

images/eq-240-2.png

Proof.   It is enough to prove the scalar versions of these inequalities as they are preserved in the transition to matrices by our construction. For fixed a and b, Ht(a, b) is a convex function of t on [0, 1]. It is symmetric about the point t = 1/2 at which it attains its minimum. Hence the quantity

images/eq-240-3.png

is an increasing function of α for 0 ≤ α ≤ 1/2. Similarly,

images/eq-240-4.png

is a decreasing function of α. These considerations show

images/eq-240-5.png

The theorem follows from this.     ■
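The scalar facts used in the proof can be checked directly. The sketch below uses only the Python standard library. It assumes the standard definitions H_t(a, b) = (a^t b^{1−t} + a^{1−t} b^t)/2 for the Heinz mean and (a − b)/(log a − log b) for the logarithmic mean, and it takes the two quantities in the proof to be the average of H_t over the window [1/2 − α, 1/2 + α] and the average over the two end intervals [0, α] and [1 − α, 1]. The last reading is an assumption, since the displayed formulas are not reproduced here. The function names are ours.

```python
import math

def heinz(a, b, t):
    """Heinz mean H_t(a, b) = (a^t b^(1-t) + a^(1-t) b^t) / 2."""
    return (a ** t * b ** (1 - t) + a ** (1 - t) * b ** t) / 2

def log_mean(a, b):
    """Logarithmic mean (a - b) / (log a - log b), with value a when a == b."""
    return a if a == b else (a - b) / (math.log(a) - math.log(b))

def simpson(f, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def window_average(a, b, alpha):
    """Average of H_t over [1/2 - alpha, 1/2 + alpha]."""
    return simpson(lambda t: heinz(a, b, t), 0.5 - alpha, 0.5 + alpha) / (2 * alpha)

def tail_average(a, b, alpha):
    """Average of H_t over [0, alpha] together with [1 - alpha, 1]."""
    f = lambda t: heinz(a, b, t)
    return (simpson(f, 0.0, alpha) + simpson(f, 1.0 - alpha, 1.0)) / (2 * alpha)

a, b = 2.0, 8.0
alphas = [0.1, 0.2, 0.3, 0.4, 0.5]
inner = [window_average(a, b, al) for al in alphas]   # expected to increase with alpha
outer = [tail_average(a, b, al) for al in alphas]     # expected to decrease with alpha
print(math.sqrt(a * b), inner, log_mean(a, b), outer, (a + b) / 2)
```

At α = 1/2 both averages reduce to the integral of H_t over [0, 1], which is the logarithmic mean; the printed values should therefore interpolate between the geometric and the arithmetic means.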


6.5.6  Exercise

Show that for 0 ≤ t ≤ 1

images/eq-241-1.png(6.52)

[Hint: Show that for each λ > 0 we have λ^t ≤ (1 − t) + tλ.]


6.5.7  Exercise

Let Φ be any positive linear map on images/nec-241-1.png. Then for all positive matrices A and B

images/eq-241-2.png

[Hint: Use Theorem 4.1.5 (ii).]


6.5.8  Exercise

The aim of this exercise is to give a simple proof of the convergence argument needed to establish the existence of G#(A1, A2, A3) defined in Section 6.3.10.

(i)  Assume that A1 ≤ A2 ≤ A3. Then the sequences defined in (6.31) satisfy

images/eq-241-3.png

The sequence images/nec-241-2.png is increasing and images/nec-241-3.png is decreasing. Hence the limits

images/eq-241-4.png

exist. Show that L = U. Thus

images/eq-241-5.png

Call this limit G#(A1, A2, A3).

(ii)  Now let A1, A2, A3 be any three positive matrices. Choose positive numbers λ and µ such that

images/eq-241-6.png

Let (B1, B2, B3) = (A1, λA2, µA3). Apply the special case (i) to get the limit G#(B1, B2, B3). The same recursion applied to the triple of numbers (a1, a2, a3) = (1, λ, µ) gives

images/eq-242-1.png

Since

images/eq-242-2.png

it follows that the sequences images/nec-242-1.png, j = 1, 2, 3, converge to the limit G#(B1, B2, B3)/(λµ)^{1/3}.
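The behaviour of the scalar recursion is easy to observe. The sketch below assumes that the recursion (6.31) replaces each entry of a triple by the geometric mean of the other two; for positive numbers this is the square root of their product. The function names and the number of iterations are our own choices.

```python
import math

def symmetrization_step(a1, a2, a3):
    """Replace each entry by the geometric mean of the other two."""
    return (math.sqrt(a2 * a3), math.sqrt(a1 * a3), math.sqrt(a1 * a2))

def iterate(triple, steps=60):
    for _ in range(steps):
        triple = symmetrization_step(*triple)
    return triple

lam, mu = 3.0, 7.0
limit = iterate((1.0, lam, mu))
target = (lam * mu) ** (1.0 / 3.0)
print(limit, target)
```

Each step preserves the product of the three entries and halves the differences between their logarithms, so the three sequences converge to the common value (λµ)^{1/3}.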


6.5.9  Exercise

Show that the center of mass defined by (6.24) has the property

images/eq-242-3.png

for all positive matrices A1, A2, A3. Show that G# also satisfies this relation.


6.6  NOTES AND REFERENCES

Much of the material in Sections 6.1 and 6.2 consists of standard topics in Riemannian geometry. The arrangement of topics, the emphasis, and some proofs are perhaps eccentric. Our view is directed toward applications in matrix analysis, and the treatment may provide a quick introduction to some of the concepts. The entire chapter is based on R. Bhatia and J. A. R. Holbrook, Riemannian geometry and matrix geometric means, Linear Algebra Appl., 413 (2006) 594–618.

Two books on Riemannian geometry that we recommend are M. Berger, A Panoramic View of Riemannian Geometry, Springer, 2003, and S. Lang, Fundamentals of Differential Geometry, Springer, 1999. Closely related to our discussion is M. Bridson and A. Haefliger, Metric Spaces of Non-positive Curvature, Springer, 1999. Most of the texts on geometry emphasize group structures and seem to downplay the role of the matrices that constitute these groups. Lang’s text is exceptional in this respect. The book A. Terras, Harmonic Analysis on Symmetric Spaces and Applications II, Springer, 1988, devotes a long chapter to the space images/nec-242-2.png.

The proof of Proposition 6.1.2 is close to the treatment in Lang’s book. (Lang says he follows “Mostow’s very elegant exposition of Cartan’s work.”) The linear algebra in our proof looks neater because a part of the work has been done earlier in proving the Daleckii-Krein formula (2.40) for the derivative. The second proof given at the beginning of Section 6.5 is shorter and more elementary. This is taken from R. Bhatia, On the exponential metric increasing property, Linear Algebra Appl., 375 (2003) 211–220.

Explicit formulas like (6.11) describing geodesics are generally not emphasized in geometry texts. This expression has been used often in connection with means. With the notation A#tB this is called the t-power mean. See the comprehensive survey F. Hiai, Log-majorizations and norm inequalities for exponential operators, Banach Center Publications Vol. 38, pp. 119–181.

The role of the semiparallelogram law is highlighted in Chapter XI of Lang’s book. A historical note on page 313 of this book places it in context. To a reader oriented towards analysis in general, and inequalities in particular, this is especially attractive. The expository article by J. D. Lawson and Y. Lim, The geometric mean, matrices, metrics and more, Am. Math. Monthly, 108 (2001) 797–812, draws special attention to the geometry behind the geometric mean.

Problems related to convexity in differentiable manifolds are generally difficult. According to Note 6.1.3.1 on page 231 of Berger’s book the problem of identifying the convex hull of three points in a Riemannian manifold of dimension 3 or more is still unsolved. It is not even known whether this set is closed. This problem is reflected in some of our difficulties in Section 6.3.

Berger attributes to E. Cartan, Groupes simples clos et ouverts et géometrie Riemannienne, J. Math. Pures Appl., 8 (1929) 1–33, the introduction of the idea of center of mass in Riemannian geometry. Cartan showed that in a complete manifold of nonpositive curvature (such as images/nec-243-1.png) every compact set has a unique center of mass. He used this to prove his fundamental theorem that says any two compact maximal subgroups of a semisimple Lie group are always conjugate.

The idea of using the center of mass to define a geometric mean of three positive matrices occurs in the paper of Bhatia and Holbrook cited earlier and in M. Moakher, A differential geometric approach to the geometric mean of symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl., 26 (2005) 735–747. This paper contains many interesting ideas. In particular, Theorem 6.3.4 occurs here. Applications to problems of elasticity are discussed in M. Moakher, On the averaging of symmetric positive-definite tensors, preprint (2005).

The manifold images/nec-244-1.png is the most studied example of a manifold of nonpositive curvature. However, one of its basic features—order—seems not to have received any attention. Our discussion of the center of mass and Theorem 6.5.5 show that order properties and geometric properties are strongly interlinked. A study of these properties should lead to a better understanding of this manifold.

The mean G#(A1, A2, A3) was introduced in T. Ando, C.-K. Li, and R. Mathias, Geometric Means, Linear Algebra Appl., 385 (2004) 305–334. Many of its properties are derived in this paper which also contains a detailed survey of related matters. The connection with Riemannian geometry was made in the Bhatia-Holbrook paper cited earlier. That G# and the center of mass may be different, is a conclusion made on the basis of computer-assisted numerical calculations reported in Bhatia-Holbrook. A better theoretical understanding is yet to be found.

As explained in Section 6.5 the EMI reflects the fact that images/nec-244-2.png has nonpositive curvature. Inequalities of this type are called CAT(0) inequalities; the initials C, A, T are in honour of E. Cartan, A. D. Alexandrov, and A. Toponogov, respectively. These ideas have been given prominence in the work of M. Gromov. See the book W. Ballmann, M. Gromov, and V. Schroeder, Manifolds of Nonpositive Curvature, Birkhäuser, 1985, and the book by Bridson and Haefliger cited earlier. A concept of curvature for metric spaces (not necessarily Riemannian manifolds) is defined and studied in the latter. The generalised EMI proved in Section 6.4 shows that the space images/nec-244-3.png with the metric δ|||·||| is a metric space (a Finsler manifold) of nonpositive curvature.

Segal’s inequality was proved in I. Segal, Notes towards the construction of nonlinear relativistic quantum fields III, Bull. Am. Math. Soc., 75 (1969) 1390–1395. The simple proof given in Section 6.4 is borrowed from B. Simon, Trace Ideals and Their Applications, Second Edition, American Math. Society, 2005. The Golden-Thompson inequality is due to S. Golden, Lower bounds for the Helmholtz function, Phys. Rev. B, 137 (1965) 1127–1128, and C. J. Thompson, Inequality with applications in statistical mechanics, J. Math. Phys., 6 (1965) 1812–1813. Stronger versions and generalizations to other settings (like Lie groups) have been proved. Complementary inequalities have been proved by F. Hiai and D. Petz, The Golden-Thompson trace inequality is complemented, Linear Algebra Appl., 181 (1993) 153–185, and by T. Ando and F. Hiai, Log majorization and complementary Golden-Thompson type inequalities, ibid., 197/198 (1994) 113–131. These papers are especially interesting in our context as they involve the means A#tB in the formulation and the proofs of several results. The connection between means, geodesics, and inequalities has been explored in several interesting papers by G. Corach and coauthors. Illustrative of this work and especially close to our discussion are the two papers by G. Corach, H. Porta and L. Recht, Geodesics and operator means in the space of positive operators, Int. J. Math., 4 (1993) 193–202, and Convexity of the geodesic distance on spaces of positive operators, Illinois J. Math., 38 (1994) 87–94.

The logarithmic mean L(A, B) has not been studied before. The definition (6.49) raises interesting questions both for matrix theory and for geometry. In differential geometry it is common to integrate (real) functions along curves. Here we have the integral of the curve itself. Theorem 6.5.5 relates this object to other means, and includes the operator analogue of the inequality between the geometric, logarithmic, and arithmetic means. The norm version of this inequality appears as Proposition 3.2 in F. Hiai and H. Kosaki, Means for matrices and comparison of their norms, Indiana Univ. Math. J., 48 (1999) 899–936. Exercise 6.5.8 is based on the paper D. Petz and R. Temesi, Means of positive numbers and matrices, SIAM J. Matrix Anal. Appl., 27 (2005) 712–720.
