7
Procrustes analysis

7.1 Introduction

This chapter outlines various Procrustes methods, which are very practical tools for analysing landmark data. Procrustes methods were seen earlier to be useful for assessing distances between shapes in Chapter 3. In this chapter we provide a more comprehensive treatment of Procrustes methods suitable for two and higher dimensional shape analysis.

Procrustes methods are useful for estimating an average shape and for exploring the structure of shape variability in a dataset. The techniques described in this chapter are generally of a descriptive nature and more explicit emphasis on shape models and inference will be considered in Chapters 9 and 10.

Procrustes analysis involves matching configurations with similarity transformations to be as close as possible according to Euclidean distance, using least squares techniques. Procrustes analysis using orthogonal (rotation/reflection) matrices was developed initially for applications in psychology, and early papers on the topic appeared in the journal Psychometrika. The technique can be traced back to Boas (1905) and Mosier (1939) and later principal references include Green (1952); Cliff (1966); Schönemann (1966, 1968); Gruvaeus (1970); Schönemann and Carroll (1970); Gower (1971, 1975); Ten Berge (1977); Sibson (1978, 1979); Langron and Collins (1985); and Goodall (1991). In addition, Sneath (1967) considered a similar least squares matching procedure, with applications to biological shape comparison in mind. Gower (1975); Kendall (1984); Goodall (1991); Ziezold (1994); and Le (1995) also discuss the pure rotation case that is of interest in shape analysis. McLachlan (1972) derived the rigid body transformation case for comparing proteins. Ziezold (1977) considered a similar procedure for matching configurations using translation and rotation, in a mathematically rigorous manner. Some other books that include introductions to Procrustes analysis for comparing matrices (e.g. in MDS) are Mardia et al. (1979, p. 416); Cox and Cox (1994); Krzanowski and Marriott (1994); Borg and Groenen (1997); Everitt and Rabe-Hesketh (1997); and Koch (2014).

We begin by describing ordinary Procrustes analysis (OPA) which is used for matching two configurations in Section 7.2. When at least two configurations are available we can use the technique of generalized Procrustes analysis (GPA) to obtain an average shape, as in Section 7.3. The implementation of ordinary and generalized Procrustes analyses is particularly simple in m = 2 dimensions, when complex arithmetic allows Euclidean similarity matching to be expressed as a complex linear regression problem, as seen later in Chapter 8. Generalized Procrustes analysis leads to an explicit eigenvector solution for planar data (see Result 8.2), but a numerical algorithm is required for higher dimensions. We also describe some variants of Procrustes analysis.

After an estimate of mean shape has been obtained we often wish to explore the structure of shape variability in a dataset. Principal component analysis of the tangent shape coordinates using the Procrustes mean as the pole sometimes provides a suitable method. Various graphical procedures for displaying the principal components are presented. If groups are available then canonical variate analysis and linear or quadratic discriminant analysis could be used in the tangent space.

7.2 Ordinary Procrustes analysis

7.2.1 Full OPA

Let us first consider the case where two configuration matrices X1 and X2 are available (both k × m matrices of coordinates from k points in m dimensions) and we wish to match the configurations as closely as possible, up to similarity transformations. In this chapter we assume without loss of generality that the configuration matrices X1 and X2 have been centred using Equation (3.8).

Definition 7.1 The method of full OPA involves the least squares matching of two configurations using the similarity transformations. Estimation of the similarity parameters γ, Γ and β is carried out by minimizing the squared Euclidean distance

$$D^2_{\mathrm{OPA}}(X_1, X_2) = \| X_2 - \beta X_1 \Gamma - 1_k \gamma^T \|^2, \tag{7.1}$$

where $\|X\| = \{\mathrm{trace}(X^T X)\}^{1/2}$ is the Euclidean norm, $1_k$ is the (k × 1) vector of ones, Γ is an (m × m) rotation matrix (Γ ∈ SO(m)), β > 0 is a scale parameter and γ is an (m × 1) location vector. The minimum of Equation (7.1) is written as OSS(X1, X2), which stands for ordinary (Procrustes) sum of squares.

In Section 4.1.1, when calculating distances, we were interested in the minimum value of an expression similar to Equation (7.1) except X1 and X2 were of unit size.

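Result 7.1 The full ordinary Procrustes solution to the minimization of (7.1) is given by $(\hat{\gamma}, \hat{\beta}, \hat{\Gamma})$, where

$$\hat{\gamma} = 0, \tag{7.2}$$

$$\hat{\Gamma} = U V^T, \tag{7.3}$$

where

$$X_2^T X_1 = \|X_1\| \, \|X_2\| \, V \Lambda U^T, \quad U, V \in SO(m), \tag{7.4}$$

with Λ a diagonal m × m matrix of positive elements except possibly the last element, and so the singular values are optimally signed, see Equation (4.4). The solution is unique if X2 is non-degenerate in the sense of condition (4.7). Also, recall that X1 and X2 are centred already, and so the centring provides the optimal translation. Furthermore,

$$\hat{\beta} = \frac{\mathrm{trace}(X_2^T X_1 \hat{\Gamma})}{\mathrm{trace}(X_1^T X_1)}, \tag{7.5}$$

and

$$\mathrm{OSS}(X_1, X_2) = \|X_2\|^2 \sin^2 \rho(X_1, X_2), \tag{7.6}$$

where ρ(X1, X2) is the Riemannian shape distance of Equation (4.12).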

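Proof: We wish to minimize

$$D^2_{\mathrm{OPA}} = \| X_2 - \beta X_1 \Gamma - 1_k \gamma^T \|^2 = \|X_2\|^2 + \beta^2 \|X_1\|^2 - 2\beta \, \mathrm{trace}(X_2^T X_1 \Gamma) + k \gamma^T \gamma,$$

where X1 and X2 are centred. It is simple to see that we must take $\hat{\gamma} = 0$. If

$$Z_i = X_i / \|X_i\|, \quad i = 1, 2,$$

are the pre-shapes of Xi, then we need to minimize

$$\|X_2\|^2 + \beta^2 \|X_1\|^2 - 2\beta \|X_1\| \, \|X_2\| \, \mathrm{trace}(Z_2^T Z_1 \Gamma)$$

and we find the minimizing Γ from Lemma 4.2, Equation (4.6). Differentiating with respect to β we obtain:

$$\frac{\partial D^2_{\mathrm{OPA}}}{\partial \beta} = 2\beta \, \mathrm{trace}(X_1^T X_1) - 2 \, \mathrm{trace}(X_2^T X_1 \hat{\Gamma}).$$

Hence,

$$\hat{\beta} = \frac{\mathrm{trace}(X_2^T X_1 \hat{\Gamma})}{\mathrm{trace}(X_1^T X_1)} = \frac{\|X_2\|}{\|X_1\|} \cos \rho(X_1, X_2). \tag{7.7}$$

Substituting $\hat{\gamma}$, $\hat{\Gamma}$ and $\hat{\beta}$ into Equation (7.1) leads to:

$$\mathrm{OSS}(X_1, X_2) = \|X_2\|^2 + \hat{\beta}^2 \|X_1\|^2 - 2\hat{\beta} \|X_1\| \, \|X_2\| \cos \rho(X_1, X_2) = \|X_2\|^2 \{1 - \cos^2 \rho(X_1, X_2)\},$$

and so the result of Equation (7.6) follows.    □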

Note that λm will be negative in the cases where an orthogonal transformation (reflection and rotation) would produce a smaller sum of squares than just a rotation. In practice, for fairly close shapes λm will usually be positive – in which case the solution is the same as minimizing over the orthogonal matrices O(m) instead of SO(m).

Definition 7.2 The full Procrustes fit (or full Procrustes coordinates) of X1 onto X2 is:

$$X_1^P = \hat{\beta} X_1 \hat{\Gamma} + 1_k \hat{\gamma}^T,$$

where we use the superscript ‘P’ to denote the Procrustes registration. The residual matrix after Procrustes matching is defined as:

$$R = X_2 - X_1^P.$$

Sometimes examining the residual matrix can tell us directly about the difference in shape, for example if one residual is larger than others or if the large residuals are limited to one region of the object or other patterns are observed. In other situations it is helpful to use further diagnostics for shape difference such as the partial warps from thin-plate spline transformations, discussed later in Section 12.3.4.

In order to find the decomposition of Equation (7.4) in practice, one carries out the usual singular value decomposition VΛUT and if either one of U or V has determinant − 1, then its mth column is multiplied by − 1 and λm is negated.
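As a rough illustration of this recipe, the following base R sketch carries out a full OPA match directly from the singular value decomposition. The function name opa_full and the simulated data are assumptions made for illustration and are not part of the shapes library; the rotation acts by post-multiplication, as in Equation (7.1).

```r
# Minimal sketch of full OPA of X1 onto X2 (hypothetical helper, base R only).
opa_full <- function(X1, X2) {
  # Centre both configurations, so that the optimal translation is zero
  X1 <- sweep(X1, 2, colMeans(X1))
  X2 <- sweep(X2, 2, colMeans(X2))
  m <- ncol(X1)
  s <- svd(t(X2) %*% X1)      # t(X2) %*% X1 = V diag(d) t(U)
  V <- s$u; U <- s$v; d <- s$d
  # Move U and V into SO(m): flip the m-th column and negate the m-th value
  if (det(U) < 0) { U[, m] <- -U[, m]; d[m] <- -d[m] }
  if (det(V) < 0) { V[, m] <- -V[, m]; d[m] <- -d[m] }
  Gamma <- U %*% t(V)                 # optimal rotation
  beta <- sum(d) / sum(X1^2)          # optimal scale
  fit <- beta * X1 %*% Gamma          # full Procrustes fit of X1 onto X2
  list(Gamma = Gamma, beta = beta, fit = fit, OSS = sum((X2 - fit)^2))
}

# Simulated check: X2 is a rotated, rescaled and shifted copy of X1
set.seed(1)
X1 <- matrix(rnorm(12), 6, 2)
theta <- pi / 6
R <- matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), 2, 2)
X2 <- sweep(2 * X1 %*% R, 2, c(3, -1), "+")
res <- opa_full(X1, X2)
res$beta    # close to 2
res$OSS     # close to 0
```

Because X2 was built from X1 by a similarity transformation, the sketch should recover the scale and rotation used in the simulation and leave essentially no residual.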

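In general if the rôles of X1 and X2 are reversed, then the ordinary Procrustes registration will be different. Writing the estimates for the reverse order case as $(\hat{\gamma}^R, \hat{\beta}^R, \hat{\Gamma}^R)$ we see that $\hat{\Gamma}^R = \hat{\Gamma}^T$, but $\hat{\beta}^R \neq 1/\hat{\beta}$ in general. In particular

$$\mathrm{OSS}(X_2, X_1) \neq \mathrm{OSS}(X_1, X_2)$$

unless the figures are both of the same size, and so one cannot use √OSS(X1, X2) as a distance. If the figures are normalized to unit size, then we see that

$$\mathrm{OSS}(X_1/\|X_1\|, X_2/\|X_2\|) = \mathrm{OSS}(X_2/\|X_2\|, X_1/\|X_1\|) = \sin^2 \rho(X_1, X_2)$$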

and in this case √OSS(X1/||X1||, X2/||X2||) = sin ρ(X1, X2) is a suitable choice of shape distance, and was denoted as dF(X1, X2) in Equation (4.10) – the full Procrustes distance. In Example 7.1 we see the ordinary Procrustes registration of a juvenile and an adult sooty mangabey in two dimensions.
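The lack of symmetry can be seen numerically. The sketch below uses simulated configurations and hypothetical helper names (centre, oss, unit_size), none of which come from the shapes library. It computes the full OPA sum of squares in closed form as the squared size of the target minus the squared trace term divided by the squared size of the matched figure.

```r
# Hypothetical helpers (base R only) illustrating the asymmetry of OSS.
centre <- function(X) sweep(X, 2, colMeans(X))
oss <- function(X1, X2) {
  # Full OPA sum of squares of X1 onto X2
  X1 <- centre(X1); X2 <- centre(X2)
  s <- svd(t(X2) %*% X1)
  d <- s$d
  # Rotation-only matching: negate the smallest singular value if required
  if (det(s$u) * det(s$v) < 0) d[length(d)] <- -d[length(d)]
  sum(X2^2) - sum(d)^2 / sum(X1^2)
}
unit_size <- function(X) { X <- centre(X); X / sqrt(sum(X^2)) }

set.seed(2)
X1 <- matrix(rnorm(12), 6, 2)
X2 <- 3 * X1 + matrix(rnorm(12, sd = 0.2), 6, 2)  # larger, slightly different shape

a <- oss(X1, X2)                         # matching X1 onto X2
b <- oss(X2, X1)                         # reverse order gives a different value
u1 <- oss(unit_size(X1), unit_size(X2))  # unit size: the two orders agree
u2 <- oss(unit_size(X2), unit_size(X1))
c(a = a, b = b, u1 = u1, u2 = u2)
```

In this sketch the ratio a/b equals the ratio of the squared centroid sizes of the two figures, which is why the two orders agree only after normalizing to unit size.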

Since each of the figures can be rescaled, translated and rotated (the full set of similarity transformations) we call the method full Procrustes analysis. There are many other variants of Procrustes matching and these are discussed in Section 7.6.

The term ordinary Procrustes refers to Procrustes matching of one observation onto another. Where at least two observations are to be matched to a common unknown mean the term GPA is used, which is discussed in Section 7.3.

Full Procrustes shape analysis for 2D data is particularly straightforward using complex arithmetic and details are given in Chapter 8.

Example 7.1 Consider a juvenile and an adult from the sooty mangabey data of Section 1.4.12. The unregistered outlines are shown in Figure 7.1. In Figure 7.2 we see the full Procrustes fit of the adult onto the juvenile (Figure 7.2a) and the full Procrustes fit of the juvenile onto the adult (Figure 7.2b). In matching the juvenile onto the adult $\hat{\beta} = 1.1309$ and $\hat{\theta} = 45.53^\circ$. We see that the estimate of scale in matching the adult onto the juvenile is $\hat{\beta}^R = 0.8745$ and the rotation is $\hat{\theta}^R = -45.53^\circ$. Note that $\hat{\beta}^R \neq 1/\hat{\beta}$ because the adult and juvenile are not the same size (the matching is not symmetric). Computing the measure of full Procrustes shape distance we see that dF = 0.1049.


Figure 7.1 Unregistered sooty mangabeys: juvenile (—–); and adult (- - -).


Figure 7.2 The Procrustes fit of (a) the adult sooty mangabey (- - -) onto the juvenile (—–) and (b) the juvenile onto the adult.

7.2.2 OPA in R

In the shapes library in R the function procOPA is used for OPA. The command for ordinary Procrustes registration of B onto A is procOPA(A,B). To match the juvenile sooty mangabey onto the adult male sooty mangabey we have:

library(shapes)
data(sooty)
juvenile <- sooty[,,1]
adult <- sooty[,,2]
ans <- procOPA(adult, juvenile)
ans$Bhat
          [,1] [,2]
 [1,] -879.30998 -1396.74936
 [2,] -998.77789 -1276.29016
 [3,] -1143.71649 -647.43917
 [4,] -1287.85042 67.78971
 [5,] -119.56578 2079.45536
 [6,] 1465.48601 2253.98504
 [7,] 1740.93692 565.09765
 [8,] 1060.70877 180.56805
 [9,] 662.17427 36.14918
[10,] 102.18926 -379.26311
[11,] -63.53411 -533.53575
[12,] -538.74055 -949.76743
ans$OSS
[1] 298926.7
ans$rmsd
[1] 157.8308
ans$R
        [,1] [,2]
[1,] 0.7005709 0.7135828
[2,] -0.7135828 0.7005709
print(atan2(ans$R[1,2],ans$R[1,1])*180/pi) #rotation angle in degrees
[1] 45.52717
ans$s
[1] 1.130936

Hence, we see that the value of the OSS is 298926.7 and the RMSD is $\sqrt{\mathrm{OSS}/k} = 157.83$, where k = 12 is the number of landmarks. The rotation angle is 45.53° and the scaling is 1.1309, as seen in Example 7.1.
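The summary values can be checked by hand from the printed output. The short base R sketch below copies the numbers from the listing above (nothing is recomputed from the data) and assumes that the reported RMSD is the square root of OSS divided by the number of landmarks, which the printed figures support.

```r
# Values copied from the procOPA output quoted above
OSS <- 298926.7
k <- 12                                        # number of landmarks
rmsd <- sqrt(OSS / k)                          # about 157.83
R <- matrix(c(0.7005709, -0.7135828,
              0.7135828,  0.7005709), 2, 2)    # filled column by column
angle <- atan2(R[1, 2], R[1, 1]) * 180 / pi    # about 45.53 degrees
c(rmsd = rmsd, angle = angle, det = det(R))
```

The determinant is printed as a sanity check that the matrix is a rotation rather than a reflection.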

7.2.3 Ordinary partial Procrustes

One may be interested in size-and-shape (form), in which case it is not of interest to consider scaling. The objects must be measured on the same scale.

Definition 7.3 Partial OPA involves registration over translation and rotation only to match two configurations, and scaling is not required. Minimization is required of the expression (Boas, 1905)

$$D^2_{\mathrm{pOPA}}(X_1, X_2) = \| X_2 - X_1 \Gamma - 1_k \gamma^T \|^2.$$

The same solution for the location vector and the rotation matrix as in the full Procrustes case [Equation (7.2) and Equation (7.3)] gives the minimum

$$\mathrm{OSS}_p(X_1, X_2) = \|X_1\|^2 + \|X_2\|^2 - 2 \|X_1\| \, \|X_2\| \cos \rho(X_1, X_2), \tag{7.8}$$

which is the square of the Riemannian size-and-shape distance of Equation (5.5).

Partial Procrustes analysis on the original centred configurations is particularly appropriate when studying joint size-and-shape, as considered in Chapter 5. If the two configurations are of unit size, then Equation (7.8) is equal to d2P(X1, X2), the square of the partial Procrustes distance.
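A small base R sketch of partial OPA follows, using simulated data in m = 3 dimensions. The helper names centre and opa_partial are hypothetical. The sketch compares the residual sum of squares after rotation with the closed-form expression: squared size of X1 plus squared size of X2, minus twice the product of the sizes times cos ρ.

```r
# Hypothetical base R sketch of partial OPA (translation and rotation only).
centre <- function(X) sweep(X, 2, colMeans(X))
opa_partial <- function(X1, X2) {
  X1 <- centre(X1); X2 <- centre(X2)
  s <- svd(t(X2) %*% X1)
  V <- s$u; U <- s$v; d <- s$d; m <- ncol(X1)
  # Keep the match a pure rotation: flip one column if U V^T is a reflection
  if (det(U %*% t(V)) < 0) { U[, m] <- -U[, m]; d[m] <- -d[m] }
  Gamma <- U %*% t(V)
  cos_rho <- sum(d) / sqrt(sum(X1^2) * sum(X2^2))  # cosine of the shape distance
  list(Gamma = Gamma, cos_rho = cos_rho,
       OSSp = sum((X2 - X1 %*% Gamma)^2))
}

set.seed(3)
X1 <- matrix(rnorm(15), 5, 3)                    # k = 5 landmarks, m = 3
X2 <- 1.5 * X1 + matrix(rnorm(15, sd = 0.1), 5, 3)
res <- opa_partial(X1, X2)
n1 <- sqrt(sum(centre(X1)^2)); n2 <- sqrt(sum(centre(X2)^2))
closed_form <- n1^2 + n2^2 - 2 * n1 * n2 * res$cos_rho
c(OSSp = res$OSSp, closed_form = closed_form)    # the two values agree
```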

Ordinary partial Procrustes matching can be carried out in R using procOPA with the option scale=FALSE. For the partial Procrustes matching of the juvenile sooty mangabey onto the adult we have OSSp = 659375 and RMSD = 234.4097, and the rotation is again 45.52717°, as expected. The commands are:

ans <- procOPA(adult, juvenile , scale=FALSE)
 
ans$OSS
[1] 659375
 
ans$rmsd
[1] 234.4097
 
ans$R
[,1] [,2]
[1,] 0.7005709 0.7135828
[2,] -0.7135828 0.7005709
 
ans$s
[1] 1

7.2.4 Reflection Procrustes

A further type of ordinary Procrustes matching is where reflections are also allowed. The use of Procrustes methods in shape analysis is generally different from their more traditional use in multivariate analysis, where rotation and reflection are used instead of pure rotation (e.g. see Mardia et al. 1979, p. 416).

If reflections are not important then we can also include reflection invariance by using orthogonal matrices Γ ∈ O(m), where det(Γ) = ±1 in the matching, instead of special orthogonal matrices Γ ∈ SO(m), where det(Γ) = +1. We call this reflection Procrustes analysis. For datasets with small variability (and full rank configurations) there will usually be no difference between the two approaches.

Details of the minimization with Γ ∈ O(m) (an orthogonal matrix) follow in a similar manner to that of Γ ∈ SO(m) (Gower 1975; Sibson 1978, 1979; Goodall 1991; Cox and Cox 1994; Krzanowski and Marriott 1994). The method when allowing for reflections is almost identical to the rotation only case, except that a singular value decomposition with all positive diagonal elements is used for the reflection/rotation case, whereas the smallest singular value can be negative in the rotation case.

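We can use calculus to carry out the minimization over the orthogonal matrices. We need to minimize

$$\| X_2 - X_1 \Gamma \|^2 = \|X_1\|^2 + \|X_2\|^2 - 2 \, \mathrm{trace}(X_2^T X_1 \Gamma) \tag{7.9}$$

with respect to Γ ∈ O(m) (cf. Mardia et al. 1979, p. 416). Consider using the decomposition of $X_2^T X_1$ in Equation (7.4), but now with U, V ∈ O(m) and all diagonal elements of Λ non-negative. Hence, we must maximize $\mathrm{trace}(V \Lambda U^T \Gamma)$ with respect to Γ, subject to the constraints that $\Gamma \Gamma^T = I_m$ [which gives m(m + 1)/2 constraints, because $\Gamma \Gamma^T$ is symmetric]. Let $\frac{1}{2}L$ be a symmetric m × m matrix of Lagrange multipliers for these constraints. The aim is to maximize

$$\mathrm{trace}\{ V \Lambda U^T \Gamma - \tfrac{1}{2} L (\Gamma \Gamma^T - I_m) \}. \tag{7.10}$$

By direct differentiation it can be shown that

$$\frac{\partial}{\partial \Gamma} \mathrm{trace}(V \Lambda U^T \Gamma) = U \Lambda V^T, \qquad \frac{\partial}{\partial \Gamma} \mathrm{trace}(L \Gamma \Gamma^T) = 2 L \Gamma.$$

Hence on differentiating Equation (7.10) and setting derivatives equal to zero we find that

$$U \Lambda V^T = L \Gamma. \tag{7.11}$$

Note that

$$L^2 = (L\Gamma)(L\Gamma)^T = U \Lambda V^T V \Lambda U^T = U \Lambda^2 U^T$$

and so we can take $L = U \Lambda U^T$. Substituting L into Equation (7.11) we see that

$$\hat{\Gamma} = U V^T.$$

Note that $\hat{\Gamma}$ is orthogonal, and so we have found the minimizing rotation/reflection. In practice, for fairly close shapes the solutions for minimizing over the orthogonal matrices O(m) instead of SO(m) are the same.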
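The difference between matching over O(m) and over SO(m) can be illustrated with a mirror image. The base R sketch below uses simulated data and a hypothetical helper match_ss, which returns the minimized sum of squares without scaling. With reflect = TRUE the usual non-negative singular values are summed, and otherwise the smallest one is negated when the unconstrained solution would be a reflection.

```r
# Hypothetical helper: minimum of ||X2 - X1 Gamma||^2 over SO(m) or O(m).
centre <- function(X) sweep(X, 2, colMeans(X))
match_ss <- function(X1, X2, reflect = FALSE) {
  X1 <- centre(X1); X2 <- centre(X2)
  s <- svd(t(X2) %*% X1)
  d <- s$d                                  # non-negative singular values
  if (!reflect && det(s$u) * det(s$v) < 0) d[length(d)] <- -d[length(d)]
  sum(X1^2) + sum(X2^2) - 2 * sum(d)
}

set.seed(4)
X1 <- matrix(rnorm(10), 5, 2)
X2 <- X1 %*% diag(c(-1, 1))                       # mirror image of X1
X3 <- X1 + matrix(rnorm(10, sd = 0.05), 5, 2)     # small perturbation of X1

mirror_reflect <- match_ss(X1, X2, reflect = TRUE)   # essentially zero
mirror_rotate  <- match_ss(X1, X2, reflect = FALSE)  # strictly positive
close_reflect  <- match_ss(X1, X3, reflect = TRUE)   # equal to the next value
close_rotate   <- match_ss(X1, X3, reflect = FALSE)
```

For the mirror image only the reflection match is perfect, whereas for the small perturbation both approaches give the same sum of squares.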

In R we can carry out reflection shape analysis using the option reflect=TRUE, and for the sooty mangabey data there is no difference between the matching whether using reflections or not.

ans <- procOPA(adult, juvenile , reflect=TRUE)
 
ans$OSS
[1] 298926.7

7.3 Generalized Procrustes analysis

7.3.1 Introduction

Consider now the general case where n ≥ 2 configuration matrices are available X1, …, Xn. For example, the configurations could be a random sample from a population with mean shape [μ], and we wish to estimate the shape of the population mean with an ‘average’ shape from the sample.

For shape analysis our objects need not be commensurate in scale. However, for size-and-shape analysis our objects do need to be commensurate in scale.

A least squares approach to finding an estimate of [μ] is that of generalized Procrustes analysis (GPA), a direct generalization of OPA. We shall see that GPA provides a practical method of computing the sample full Procrustes mean, defined in Section 6.2.

Definition 7.4 The method of full GPA involves translating, rescaling and rotating the configurations relative to each other so as to minimize a total sum of squares

$$G(X_1, \ldots, X_n) = \sum_{i=1}^{n} \| \beta_i X_i \Gamma_i + 1_k \gamma_i^T - \mu \|^2 \tag{7.12}$$

with respect to βi, Γi, γi, i = 1, …, n and μ, subject to an overall size constraint. The constraint on the sizes can be chosen in a variety of ways. For example, we could choose

$$\sum_{i=1}^{n} S^2(\beta_i X_i \Gamma_i + 1_k \gamma_i^T) = \sum_{i=1}^{n} S^2(X_i), \tag{7.13}$$

where S(X) denotes the centroid size of X.

Full generalized Procrustes matching involves the registration of all configurations to optimal positions by translating, rotating and rescaling each figure so as to minimize the sum of squared Euclidean distances between each transformed figure and the estimated mean. The constraint of Equation (7.13) prevents the $\hat{\beta}_i$ from all becoming close to 0.

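Definition 7.5 The full Procrustes fit (or full Procrustes coordinates) of each of the Xi is given by:

$$X_i^P = \hat{\beta}_i X_i \hat{\Gamma}_i + 1_k \hat{\gamma}_i^T, \quad i = 1, \ldots, n, \tag{7.14}$$

where $\hat{\Gamma}_i \in SO(m)$ (rotation matrix), $\hat{\beta}_i > 0$ (scale parameter), $\hat{\gamma}_i$ (location parameters), i = 1, …, n, are the minimizing parameters.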

An algorithm to estimate the transformation parameters (γi, βi, Γi) is described in Section 7.4. The parameters (Γi, βi, γi) have been termed ‘nuisance parameters’ by Goodall (1991) because they are not the parameters of primary interest in shape analysis.

Result 7.2 The point in shape space corresponding to the arithmetic mean of the Procrustes fits,

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i^P, \tag{7.15}$$

has the same shape as the full Procrustes mean, which was defined in Equation (6.11).

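Proof: The result follows because we are minimizing sums of Euclidean square distances in GPA. The minimum of

$$\sum_{i=1}^{n} \| \beta_i X_i \Gamma_i + 1_k \gamma_i^T - \mu \|^2$$

over μ is given by $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} (\beta_i X_i \Gamma_i + 1_k \gamma_i^T)$.

Hence, estimating the transformation parameters by GPA is equivalent to minimizing

$$\sum_{i=1}^{n} \Big\| (\beta_i X_i \Gamma_i + 1_k \gamma_i^T) - \frac{1}{n} \sum_{j=1}^{n} (\beta_j X_j \Gamma_j + 1_k \gamma_j^T) \Big\|^2 \tag{7.16}$$

subject to the constraint (7.13).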

After a collection of objects has been matched into optimal full Procrustes position with respect to each other, calculation of the full Procrustes mean shape is simple; it is computed by taking the arithmetic means of each coordinate. Full Procrustes matching has also been called ‘Procrustes-with-scaling’ by Dryden (1991) and Mardia and Dryden (1994). We see that full GPA is analogous to minimizing sums of squared distances in the shape space d2F defined in Equation (4.10).

The full Procrustes mean shape has to be found iteratively for m = 3 and higher dimensional data, and an explicit eigenvector solution is available for 2D data, which will be seen in Result 8.2.

Example 7.2 In Figure 7.3 we see the registered male and female macaques from the dataset described in Section 1.4.3, using full Procrustes registration. There are k = 7 landmarks in m = 3 dimensions. The R commands are:

> outm<-procGPA(macm.dat)
> outf<-procGPA(macf.dat)
> wire<-c((1:7),1,6,4,1,4,3,7,3,5)
> shapes3d(outm$rotated,joinline=wire)
> shapes3d(outf$rotated,joinline=wire)
> Bhat<-procOPA(outm$mshape,outf$mshape)$Bhat
> shapes3d(abind(outm$mshape,Bhat),joinline=wire,
            col=c(rep(2,times=7),rep(4,times=7)))
> sin(riemdist(outm$mshape,Bhat))
[1] 0.05351208
> outm$rmsd
[1] 0.07872565
> outf$rmsd
[1] 0.05811237

In two of the males the highest landmark in the ‘y’ direction (bregma) is somewhat further away (in the ‘x’ direction) than in the rest of the specimens. This landmark is highly variable in primates. The full Procrustes means (normalized to unit size) are displayed in Figure 7.4 and the female mean has been registered onto the male mean by OPA. The full Procrustes estimated mean shapes for the males and females are full Procrustes distance dF = 0.0535 apart, and the root mean square of dF to the estimated mean shape within each group is 0.0787 for the males and 0.0581 for the females. The males are a little more variable in shape than the females, although the non-isotropic nature of the variation also needs to be considered. Formal tests for mean shape difference are considered in Chapter 9.


Figure 7.3 The male (a) and female (b) macaque skulls registered by full GPA.


Figure 7.4 The female (blue) mean shape registered onto the male (red) mean shape of the macaque skulls by OPA.

7.4 Generalized Procrustes algorithms for shape analysis

For practical implementation one can use the GPA algorithm of Gower (1975), modified by Ten Berge (1977). The idea of GPA was originally proposed by Kristof and Wingersky (1971) and adapted to this situation by Gower (1975). Langron and Collins (1985) gave some useful distributional results based on perturbation theory. Goodall and Bose (1987) and Goodall (1991) adapted the method explicitly for shape analysis.

7.4.1 Algorithm: GPA-Shape-1

  1. Translations. Centre the configurations to remove location. Initially let
    $$X_i^P = C X_i, \quad i = 1, \ldots, n,$$
    where C is the centring matrix of Equation (2.3).
  2. Rotations. For the ith configuration let
    $$\bar{X}_{(i)} = \frac{1}{n-1} \sum_{j \neq i} X_j^P,$$
    then the new $X_i^P$ is taken to be the ordinary Procrustes registration, involving only rotation, of the old $X_i^P$ onto $\bar{X}_{(i)}$. The n figures are rotated in turn. This process is repeated until the Procrustes sum of squares of Equation (7.12) cannot be reduced further (i.e. the difference is less than a tolerance parameter tol1). Hence, the matrices $\bar{X}_{(i)}^T X_i^P$ are symmetric positive semi-definite.
  3. Scaling. Let Φ be the n × n correlation matrix of the $\mathrm{vec}(X_i^P)$ (with the usual rôles of variable and observation labels reversed) with eigenvector ϕ = (ϕ1, …, ϕn)T corresponding to the largest eigenvalue. Then from Ten Berge (1977) take
    $$\hat{\beta}_i = \left( \frac{\sum_{j=1}^{n} \| X_j^P \|^2}{\| X_i^P \|^2} \right)^{1/2} \phi_i,$$
    which is repeated for all i.
  4. Repeat steps 2 and 3 until the Procrustes sum of squares of Equation (7.12) cannot be reduced further (i.e. the difference in sum of squares is less than another tolerance parameter tol2).

The algorithm usually converges very quickly. The two tolerance parameters for deciding when a step has converged are clearly data dependent, and specific choices are discussed in Section 7.4.3. Note that the resulting registration satisfies the constraint (7.13). From Goodall (1991) typical implementations will include a total of between 3 × n × 2 and 5 × n × 3 OPA steps. Groisser (2005) discussed several variants of GPA, proved that GPA converges, and provided error estimates and convergence times.
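The steps of GPA-Shape-1 can be sketched in base R as follows. This is only an illustrative reading of the algorithm and not the code used by procGPA. The function names, the absolute (unnormalized) tolerances and the simulated data are all assumptions, and the scaling step uses the leading eigenvector of the correlation matrix of the vectorized configurations, rescaled so that the total squared size is preserved.

```r
# Hypothetical base R sketch of GPA-Shape-1 (not the shapes implementation).
centre <- function(X) sweep(X, 2, colMeans(X))
rot_onto <- function(X, M) {
  # Rotation-only OPA of X onto M, with the rotation kept in SO(m)
  s <- svd(t(M) %*% X); U <- s$v; V <- s$u; m <- ncol(X)
  if (det(U %*% t(V)) < 0) U[, m] <- -U[, m]
  X %*% U %*% t(V)
}
mean_config <- function(A) apply(A, c(1, 2), mean)
gss <- function(A) sum(sweep(A, c(1, 2), mean_config(A))^2)  # SS about the mean

rotate_all <- function(A, tol1, maxit) {
  n <- dim(A)[3]
  for (it in seq_len(maxit)) {
    old <- gss(A)
    for (i in seq_len(n)) {   # rotate each figure onto the mean of the others
      A[, , i] <- rot_onto(A[, , i], mean_config(A[, , -i, drop = FALSE]))
    }
    if (old - gss(A) < tol1) break
  }
  A
}

gpa_shape1 <- function(A, tol1 = 1e-12, tol2 = 1e-12, maxit = 200) {
  n <- dim(A)[3]
  for (i in seq_len(n)) A[, , i] <- centre(A[, , i])  # step 1: translations
  A <- rotate_all(A, tol1, maxit)                      # step 2: rotations
  for (it in seq_len(maxit)) {
    old <- gss(A)
    Z <- apply(A, 3, as.vector)                # columns are vec(X_i^P)
    phi <- eigen(cor(Z), symmetric = TRUE)$vectors[, 1]
    phi <- phi * sign(sum(phi))                # fix the arbitrary sign
    size2 <- colSums(Z^2)
    beta <- sqrt(sum(size2) / size2) * phi     # step 3: Ten Berge scaling
    for (i in seq_len(n)) A[, , i] <- beta[i] * A[, , i]
    A <- rotate_all(A, tol1, maxit)            # back to step 2
    if (abs(old - gss(A)) < tol2) break        # step 4: stop when converged
  }
  list(rotated = A, mshape = mean_config(A), GSS = gss(A))
}

# Simulated check: n = 4 rotated, rescaled and shifted copies of one shape
set.seed(5)
B <- matrix(rnorm(10), 5, 2)
A <- array(0, c(5, 2, 4))
for (i in 1:4) {
  th <- runif(1, -0.5, 0.5)
  R <- matrix(c(cos(th), -sin(th), sin(th), cos(th)), 2, 2)
  A[, , i] <- sweep(runif(1, 0.5, 2) * B %*% R, 2, rnorm(2), "+")
}
out <- gpa_shape1(A)
out$GSS      # near zero, because all configurations share one shape
```

Since every simulated configuration has the same shape, the fitted sum of squares should be negligible relative to the total squared size, and the total squared size of the fits should equal that of the centred data.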

This algorithm, as described by Gower (1975) and Ten Berge (1977), included reflection invariances, although mention was made of the adaptation to the rotation only case in the papers. For shape analysis there must be modification to ensure Γi ∈ SO(m) rather than O(m). For many datasets with small variability in shape, the algorithms will give the same solution, whether modified to exclude reflections or not.

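Note that the rotation step of this algorithm is based on minimizing the sum of squares with respect to rotations given that $\hat{\mu}$ is given by Equation (7.15). In particular, writing $\bar{X}$ for the mean of all n fits and $\bar{X}_{(i)}$ for the mean of the n − 1 fits other than $X_i^P$,

$$\sum_{j=1}^{n} \| X_j^P - \bar{X} \|^2 = \sum_{j \neq i} \| X_j^P - \bar{X}_{(i)} \|^2 + \frac{n-1}{n} \| X_i^P - \bar{X}_{(i)} \|^2,$$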

and hence as each rotation matrix is updated the sum of squares is reduced until a minimum is reached.
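One way to see why each rotation update cannot increase the objective is a leave-one-out decomposition. The total sum of squares about the overall mean equals the sum of squares of the other n − 1 figures about their own mean, plus (n − 1)/n times the squared distance of the ith figure from that leave-one-out mean. Only the last term depends on the rotation of the ith figure. The base R sketch below checks this identity on arbitrary simulated arrays; the helper name ss_about is an assumption.

```r
# Numerical check of the leave-one-out decomposition (simulated data).
set.seed(6)
n <- 5; k <- 4; m <- 2
A <- array(rnorm(k * m * n), c(k, m, n))
ss_about <- function(A, M) sum(sweep(A, c(1, 2), M)^2)  # SS of slices about M

i <- 3
Xbar   <- apply(A, c(1, 2), mean)            # mean of all n configurations
Xbar_i <- apply(A[, , -i], c(1, 2), mean)    # mean leaving out the i-th
lhs <- ss_about(A, Xbar)
rhs <- ss_about(A[, , -i], Xbar_i) + (n - 1) / n * sum((A[, , i] - Xbar_i)^2)
c(lhs = lhs, rhs = rhs)                      # the two sides agree
```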

A numerical procedure based on a Newton algorithm for Procrustes matching with translation and rotation was given by Fright and Linney (1993). Other simple implementations also often work well, such as using the first configuration as the initial estimated mean or combining steps 2 and 3 to carry out a full Procrustes match to the current estimated mean. This latter approach leads to a second algorithm.

7.4.2 Algorithm: GPA-Shape-2

  1. Translations. Centre the configurations to remove location. Initially let
    $$X_i^P = C X_i, \quad i = 1, \ldots, n,$$
    where C is the centring matrix of (2.3).
  2. Initialize $\hat{\mu}$, for example at $\hat{\mu} = X_1^P$.
  3. Rotations and scale. For the ith configuration (i = 1, …, n) carry out an ordinary Procrustes match by rotating and scaling to $\hat{\mu}$,
    $$X_i^P \leftarrow \hat{\beta}_i X_i^P \hat{\Gamma}_i.$$
  4. Update $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i^P$.
  5. Repeat steps 3 and 4 until the Procrustes sum of squares of Equation (7.12) cannot be reduced further (i.e. the difference in sum of squares is less than tolerance parameter tol2).

Both GPA algorithms should converge to the same mean shape, although the final value will usually differ up to a rotation and scale (which is irrelevant).
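A compact base R sketch of the second algorithm is given below, under some stated assumptions. The function names are hypothetical, the start point is the first configuration, and the current mean is rescaled to unit size after every update. The rescaling is an addition made here to stop the fits shrinking from one iteration to the next, and it does not affect the shape of the mean.

```r
# Hypothetical base R sketch of GPA-Shape-2 (not the shapes implementation).
centre <- function(X) sweep(X, 2, colMeans(X))
opa_fit <- function(X, M) {
  # Full OPA fit (rotation and scale) of centred X onto centred M
  s <- svd(t(M) %*% X); U <- s$v; V <- s$u; d <- s$d; m <- ncol(X)
  if (det(U %*% t(V)) < 0) { U[, m] <- -U[, m]; d[m] <- -d[m] }
  (sum(d) / sum(X^2)) * X %*% U %*% t(V)
}
gpa_shape2 <- function(A, tol2 = 1e-12, maxit = 200) {
  n <- dim(A)[3]
  for (i in seq_len(n)) A[, , i] <- centre(A[, , i])   # step 1
  mu <- A[, , 1] / sqrt(sum(A[, , 1]^2))                # step 2: start point
  fits <- A; old <- Inf
  for (it in seq_len(maxit)) {
    # step 3: each match uses only the current mean, so could run in parallel
    for (i in seq_len(n)) fits[, , i] <- opa_fit(A[, , i], mu)
    ss <- sum(sweep(fits, c(1, 2), mu)^2)
    mu <- apply(fits, c(1, 2), mean)                    # step 4
    mu <- mu / sqrt(sum(mu^2))   # assumption: rescale the mean to unit size
    if (abs(old - ss) < tol2) break                     # step 5
    old <- ss
  }
  list(mshape = mu, rotated = fits, ss = ss)
}

# Simulated check: rotated, rescaled and shifted copies of one shape
set.seed(7)
B <- matrix(rnorm(10), 5, 2)
A <- array(0, c(5, 2, 4))
for (i in 1:4) {
  th <- runif(1, -pi, pi)
  R <- matrix(c(cos(th), -sin(th), sin(th), cos(th)), 2, 2)
  A[, , i] <- sweep(runif(1, 0.5, 2) * B %*% R, 2, rnorm(2), "+")
}
out <- gpa_shape2(A)
out$ss       # near zero for configurations with identical shape
```

Each pass only needs the current mean, which is the feature that makes this variant convenient when n is large.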

Note that in practice the two algorithms behave very similarly. There are some practical advantages in the second algorithm in some cases as it is easily parallelizable and there is some flexibility in the choice of start point for the algorithm. If n is very large, as for example in molecular dynamics analysis (Sherer et al. 1999; Harris et al. 2001), there is a significant advantage in being able to use parallel rotation updates on multiple CPUs, and only requiring the updated mean to be kept in memory after each iteration (rather than all the current fitted values).

When n = 2 objects are available we can consider OPA or GPA to match them. The advantage of using GPA is that the matching procedure is symmetrical in the ordering of the objects, that is GPA of X1 and X2 is the same as GPA of X2 and X1. As we have seen in Section 7.2.1, OPA is not symmetrical in general, unless the objects have the same size.

7.4.3 GPA in R

To carry out GPA in the shapes package in R one can use the command procGPA. This function is perhaps the most useful of all the commands in the shapes library, and carries out Procrustes registration using the GPA-Shape-1 algorithm of Section 7.4.1, computes the Procrustes mean, and various summary statistics. The default setting of the procGPA function carries out rotation and scaling. In order to remove the scaling step one uses the option scale=FALSE and examples are given in Section 7.5.3.

To carry out GPA on the macaque data, with scaling included:

ansm <- procGPA(macm.dat)
ansf <- procGPA(macf.dat)

To compute the Riemannian distance between the full Procrustes mean shape estimates, and the r.m.s.d. of the full Procrustes shape distances to the mean we have:

riemdist(ansm$mshape,ansf$mshape)
[1] 0.05353765
> ansm$rmsd
[1] 0.07872565
> ansf$rmsd
[1] 0.05811237

Note that the function procGPA provides many calculations, including the centroid size of each observation ($size), the Riemannian distance to the mean ($rho), the Procrustes rotated figures ($rotated), the sample mean shape ($mshape), and the tangent coordinates ($tan). The particular types of means and tangent coordinates depend on the option choices, but the default is that the full Procrustes mean shape is given with the Procrustes residuals as approximate tangent coordinates.

The R function procGPA contains two tolerances, as detailed in Algorithm GPA-Shape-1 (Section 7.4.1), which are set to ensure the accuracy of the algorithm. The option tol1 is the tolerance for the optimal rotations (step 2 of GPA-Shape-1): the rotation iterations stop when the change in the mean sum of squares (divided by the squared size of the mean) between successive iterations is less than tol1. The option tol2 is the corresponding tolerance for the scale/rotation steps of the iterative algorithm, again applied to the change in the normalized mean sum of squares between successive iterations. We illustrate the differences in the choices of tolerances for the male macaque data, using the extended output option proc.output=TRUE.

 ans1<-procGPA(macm.dat,tol1=1e-05,tol2=1e-05,proc.output=TRUE)
 Step | Objective function | change
---------------------------------------------------
Initial objective fn 0.01223246 -
---------------------------------------------------
  Rotation iteration 1 0.008385595 0.003846862
  Rotation iteration 2 0.008278599 0.0001069951
  Rotation iteration 3 0.00827771 8.893455e-07
---------------------------------------------------
Rotation step 0 0.00827771 0.003954746
---------------------------------------------------
  Scaling updated
  Rotation iteration 1 0.006236379 1.663161e-08
---------------------------------------------------
Scale/rotate step 1 0.006236379 0.002041331
---------------------------------------------------
  Scaling updated
  Rotation iteration 1 0.006236379 1.535184e-10
---------------------------------------------------
Scale/rotate step 2 0.006236379 1.535344e-10
---------------------------------------------------
Shape distances and sizes calculation ...
PCA calculation ...
Finished.

Each rotation step involves n OPA matchings, where n is the sample size. In this calculation there are three rotation steps, a scaling step, a rotation step, a scaling step and a rotation step. So, here there are 5 × 9 OPA matchings. For a second choice of tolerances we have:

 ans2<-procGPA(macm.dat,tol1=1e-08,tol2=1e-08,proc.output=TRUE)
 Step | Objective function | change
---------------------------------------------------
Initial objective fn 0.01223246 -
---------------------------------------------------
  Rotation iteration 1 0.008385595 0.003846862
  Rotation iteration 2 0.008278599 0.0001069951
  Rotation iteration 3 0.00827771 8.893455e-07
  Rotation iteration 4 0.008277695 1.500593e-08
  Rotation iteration 5 0.008277695 1.662341e-10
---------------------------------------------------
Rotation step 0 0.008277695 0.003954761
---------------------------------------------------
  Scaling updated
  Rotation iteration 1 0.006236379 1.905269e-09
---------------------------------------------------
Scale/rotate step 1 0.006236379 0.002041316
---------------------------------------------------
  Scaling updated
  Rotation iteration 1 0.006236379 2.268505e-11
---------------------------------------------------
Scale/rotate step 2 0.006236379 2.268794e-11
---------------------------------------------------
Shape distances and sizes calculation ...
PCA calculation ...
Finished.
 
riemdist(ans1$mshape,ans2$mshape)
[1] 4.942156e-08

Here there are 7 × 9 OPA matchings, and the resulting mean shapes are almost identical. Using a higher tolerance we have:

ans3<-procGPA(macm.dat,tol1=1e-02,tol2=1e-02,proc.output=TRUE)
 Step | Objective function | change
---------------------------------------------------
Initial objective fn 0.01223246 -
---------------------------------------------------
  Rotation iteration 1 0.008385595 0.003846862
---------------------------------------------------
Rotation step 0 0.008385595 0.003846862
---------------------------------------------------
  Scaling updated
---------------------------------------------------
Scale/rotate step 1 0.006343774 0.00204182
---------------------------------------------------
Shape distances and sizes calculation ...
PCA calculation ...
Finished.
 
riemdist(ans2$mshape,ans3$mshape)
[1] 0.0002719985

Here there are just n = 9 OPA matchings, and there is a non-trivial difference between the mean shape estimators. The default tolerances for GPA in procGPA are both $10^{-5}$, a choice which has worked well in a very large variety of scenarios.

7.5 Generalized Procrustes algorithms for size-and-shape analysis

Partial Procrustes analysis involves just registration by translation and rotation (not scaling) as opposed to full Procrustes analysis which involves the full set of similarity transformations. The terminology was introduced by Kent (1992). The objects must be recorded to the same scale, and so partial Procrustes analysis is appropriate for size-and-shape analysis. We minimize

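$$G_p(X_1, \ldots, X_n) = \sum_{i=1}^{n} \| X_i \Gamma_i + 1_k \gamma_i^T - \hat{\mu} \|^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \| (X_i \Gamma_i + 1_k \gamma_i^T) - (X_j \Gamma_j + 1_k \gamma_j^T) \|^2, \tag{7.17}$$

where $\hat{\mu}$ is the arithmetic mean of the transformed configurations $Y_i = X_i \Gamma_i + 1_k \gamma_i^T$, which follows since

$$\sum_{i=1}^{n} \| Y_i - \bar{Y} \|^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \| Y_i - Y_j \|^2.$$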

For size-and-shape analysis there is no need to include an overall size constraint, as was required for the pure shape case in Equation (7.13).

The GPA algorithms to carry out the minimization are simply adapted to the size-and-shape case by not including the scaling steps.

7.5.1 Algorithm: GPA-Size-and-Shape-1

  1. Translations. Centre the configurations to remove location.
  2. Rotations. For the ith configuration let
    \[ \bar{X}_{(i)} = \frac{1}{n-1} \sum_{j \neq i} X_j^P , \]
    then the new $X_i^P$ is taken to be the ordinary Procrustes registration, involving only rotation, of the old $X_i^P$ onto $\bar{X}_{(i)}$. The n figures are rotated in turn. This process is repeated until the Procrustes sum of squares of Equation (7.12) cannot be reduced further (i.e. the difference is less than a tolerance parameter tol1). Hence, at convergence, the matrices $\bar{X}_{(i)}^T X_i^P$ are symmetric positive semidefinite.

7.5.2 Algorithm: GPA-Size-and-Shape-2

  1. Translations. Centre the configurations to remove location.
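  2. Initialize the mean estimate $\hat{\mu}$ at a chosen starting configuration.
  3. Rotations. For the ith configuration (i = 1, …, n) carry out an ordinary Procrustes match by rotating $X_i$ to $\hat{\mu}$,
    \[ X_i^P = X_i \hat{\Gamma}_i . \]
  4. Update $\hat{\mu} = \frac{1}{n} \sum_{i=1}^n X_i^P$.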
  5. Repeat steps 3 and 4 until the Procrustes sum of squares of Equation (7.12) cannot be reduced further (i.e. the difference in sum of squares is less than tolerance parameter tol2).
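
As a rough illustration of the steps above, the following base R sketch carries out a rotation-only GPA by matching every configuration to a running mean and then updating the mean. The function names `opa_rotate` and `partial_gpa`, the starting value (the first centred configuration), the default tolerance and the toy data are assumptions made for this sketch; it is not the code used inside procGPA.

```r
# Rotation-only ordinary Procrustes match of X onto a target Y (both k x m, centred).
# Returns X %*% Gamma, where Gamma is the rotation minimising ||Y - X Gamma||^2.
opa_rotate <- function(X, Y) {
  s <- svd(t(X) %*% Y)                   # X^T Y = U D V^T
  d <- sign(det(s$u %*% t(s$v)))         # +1 for a rotation, -1 for a reflection
  D <- diag(c(rep(1, ncol(X) - 1), d))   # flip the last axis to exclude reflections
  X %*% (s$u %*% D %*% t(s$v))
}

# A is a k x m x n array of configurations (hypothetical input format).
partial_gpa <- function(A, tol = 1e-5, max_iter = 100) {
  n <- dim(A)[3]
  # Step 1: centre each configuration to remove location
  for (i in seq_len(n)) A[, , i] <- sweep(A[, , i], 2, colMeans(A[, , i]))
  mu <- A[, , 1]                         # Step 2: assumed starting value
  ss <- function(A, mu) sum(apply(A, 3, function(X) sum((X - mu)^2)))
  old <- Inf
  for (iter in seq_len(max_iter)) {
    for (i in seq_len(n)) A[, , i] <- opa_rotate(A[, , i], mu)   # Step 3
    mu <- apply(A, c(1, 2), mean)                                # Step 4
    cur <- ss(A, mu)
    if (old - cur < tol) break                                   # Step 5
    old <- cur
  }
  list(rotated = A, mshape = mu, ss = cur)
}

# Toy data: one quadrilateral, rotated and translated three times (no noise, no scaling).
base <- matrix(c(0, 0,  2, 0,  2, 1,  0, 3), ncol = 2, byrow = TRUE)
rot <- function(th) matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2, 2)
angles <- c(0, 0.7, -1.2)
A <- array(0, dim = c(4, 2, 3))
for (i in 1:3) A[, , i] <- base %*% rot(angles[i]) + 10 * i
fit <- partial_gpa(A)
```

Because the toy configurations differ only by rotation and translation, the residual sum of squares `fit$ss` should be numerically zero, and the centroid sizes of the registered figures are unchanged since no scaling step is applied.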

7.5.3 Partial GPA in R

Algorithm GPA-Size-and-Shape-1 can be carried out with the R command procGPA(data , scale=FALSE), and so for the Digit 3 data we have:

ans1 <- procGPA(digit3.dat, scale=FALSE)
 
ans1$GSS
[1] 5204.127

Here partial GPA is carried out and the resulting Procrustes sum of squares Gp = 5204. The resulting estimate of Fréchet mean size-and-shape (see Definition 6.12) is given in ans1$mshape. Note that this estimate is different from the unit size partial Procrustes mean shape in Section 7.6.2, which requires all configurations to be of unit size in the matching.

The usual full GPA is now carried out for comparison:

ans2 <- procGPA(digit3.dat, scale=TRUE)
 
ans2$GSS
[1] 3851.577
 
riemdist(ans1$mshape,ans2$mshape)
[1] 0.01165189

The full Procrustes sum of squares G = 3851, and so the scaling has reduced the sum of squares as expected. The full Procrustes mean shape (up to an arbitrary scaling) is given in ans2$mshape. The Riemannian shape distance between the shapes of the two estimates is 0.01165.

7.5.4 Reflection GPA in R

If the matching uses orthogonal matrices instead of rotation matrices, the method is called reflection Procrustes analysis. This can be carried out using procGPA with option reflect=TRUE. For the digit 3 data we consider reflection Procrustes without or with scaling, respectively:

ans3 <- procGPA(digit3.dat, scale=FALSE, reflect=TRUE)
 
ans3$GSS
[1] 5204.127
 
ans4 <- procGPA(digit3.dat, scale=TRUE, reflect=TRUE)
 
ans4$GSS
[1] 3851.577

Note that reflection GPA in the digit 3 data is the same as GPA without reflection invariance, as all the individuals are very far from their reflected versions.

7.6 Variants of generalized Procrustes analysis

7.6.1 Summary

There are many variants to GPA, and so to help with the terminology we provide a summary in Table 7.1.

Table 7.1 Nomenclature for the different types of ordinary Procrustes analysis (OPA) registrations with n = 2 objects, and generalized Procrustes analysis (GPA) registrations with n ≥ 2 objects.

Name             Transformations                Section
Full OPA         Translation, rotation, scale   7.2.1
Partial OPA      Translation, rotation          7.2.3
Reflection OPA   Additional reflection          7.2.4
Full GPA         Translation, rotation, scale   7.3
Partial GPA      Translation, rotation          7.5.3
Reflection GPA   Additional reflection          7.5.4

7.6.2 Unit size partial Procrustes

If all the configurations have unit size, then it can be seen that the estimator of [μ] is obtained by minimizing sums of squared chordal distances on the pre-shape sphere, and the resulting estimator is the partial Procrustes mean (see Section 6.3). In this case

(7.18) numbered Display Equation

This is the approach used by Ziezold (1994) and Le (1995) and described by Kent (1992). There is some evidence (Ziezold 1989; Stoyan and Frenz 1993) that this approach leads to non-unique solutions, although Le (1995) gave conditions when there is a unique solution. The method has also been called ‘Procrustes-without-scaling’ by Dryden (1991) and Mardia and Dryden (1994).

In some applications measurement error may be present and Du et al. (2015) considered size and shape analysis for error prone landmark data, and conditional score methods were used to provide asymptotically consistent estimators of rotation and/or scale.

7.6.3 Weighted Procrustes analysis

Standard Procrustes methods weight each landmark equally, and effectively treat the landmarks as uncorrelated. In weighted Procrustes analysis the Procrustes methods are adapted by replacing the squared Euclidean norm $\|X\|^2 = \mathrm{trace}(X^T X)$ with the squared Mahalanobis norm

\[ \|X\|^2_{\Sigma} = \mathrm{vec}(X)^T \Sigma^{-1} \mathrm{vec}(X) \]

in Section 7.2.1 for OPA and Section 7.3 for GPA. We write OPA(Σ) and GPA(Σ) for these general approaches, sometimes called weighted Procrustes methods, which are a form of weighted least squares. Some problems with estimating covariance matrices using Procrustes analysis are highlighted by Lele (1993). Lele (1993) recommends estimation of the covariance structure based on inter-landmark distances, and this approach is briefly described later in Section 15.3.
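
A minimal base R sketch of the squared Mahalanobis norm follows, assuming that vec(X) stacks the columns of the k × m configuration and that Σ is a km × km symmetric positive definite matrix. The function name `maha_norm2` and the example matrices are hypothetical.

```r
# Squared Mahalanobis norm vec(X)^T Sigma^{-1} vec(X) of a k x m configuration X.
maha_norm2 <- function(X, Sigma) {
  v <- as.vector(X)                      # vec(X): stack the columns of X
  drop(crossprod(v, solve(Sigma, v)))    # solve() avoids forming Sigma^{-1} explicitly
}

X <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)       # k = 3 landmarks, m = 2 dimensions
iso <- maha_norm2(X, diag(6))                    # Sigma = I
euclid <- sum(diag(crossprod(X)))                # trace(X^T X)
wtd <- maha_norm2(X, diag(c(1, 1, 1, 4, 4, 4)))  # second coordinate down-weighted
```

With Σ = I the weighted norm reduces to the usual squared Euclidean norm, so `iso` and `euclid` agree; inflating the variance of the second coordinate reduces its contribution to `wtd`.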

Although estimation of covariance matrices is problematic, working with known covariance matrices is fairly straightforward. In practice, of course, Σ is unknown and has to be estimated. Goodall (1991) gives estimates based on maximum likelihood considerations. If $X_i^P$ are the registered figures by either unweighted or weighted Procrustes and $\hat{\mu}$ is the resulting Procrustes mean, then

(7.19) \[ \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n \mathrm{vec}(X_i^P - \hat{\mu}) \, \mathrm{vec}(X_i^P - \hat{\mu})^T \]

is an estimate of Σ for unrestricted covariances. This estimate is singular and so generalized inverses could be used in any applications. Principal component analysis of Equation (7.19) provides perhaps the best practical way forward in the case of small variations. This approach is equivalent to carrying out the orthogonal decomposition in a tangent plane to shape space and the approach is discussed in detail in Section 7.7.

Goodall (1991) emphasizes the use of factored covariances of the form (cf. Mardia 1984):

(7.20) \[ \Sigma = \Sigma_m \otimes \Sigma_k \]

where Σk measures the covariances between landmarks and Σm models the variation identical at each landmark. The choice of factored models leads to a fairly straightforward adaptation of the OPA and GPA algorithms. However, factored covariances can be criticized because it is often unrealistic to assume the same structure of variability at each landmark, and their use can be problematic, as pointed out by Lele (1993) and Glasbey et al. (1995). A refinement was proposed by Goodall (1995) using restricted maximum likelihood estimation. Dimension reduction using eigendecomposition of factored covariance matrices of the form (7.20) has been investigated by Dryden et al. (2009a), with application to face identification using the MLE algorithm of Dutilleul (1999). Other patterned covariance structures include self-similar deflation (Bookstein, 2015a) based on bending energy (see Chapter 12).

In the isotropic case inference is more straightforward albeit unrealistic in practical applications. If $\Sigma = \sigma^2 I_{km}$, then Goodall (1991) takes an estimate of variability as:

(7.21) numbered Display Equation

Note that

numbered Display Equation

approximately (for small σ) from Equation (9.7), where q is the dimension of the shape space. Hence, from the approximate chi-squared distribution

numbered Display Equation

so $\hat{\sigma}^2$ can be quite biased under the isotropic normal model, and a less biased estimator of $\sigma^2$ is:

numbered Display Equation

Alternative methods for estimating the covariance structure include the offset normal maximum likelihood approach of Dryden and Mardia (1991b) which is described in Section 11.2. Theobald and Wuttke (2006, 2008) have developed an approximate maximum likelihood procedure which works well in many examples using factored covariance structure for size-and-shape or shape analysis, and is available in Douglas Theobald’s Theseus program.

So, one has several choices for GPA with general covariance structure. One possibility is to specify a known Σ and then use GPA(Σ). Alternatively, one could use GPA(Σ = I) and then obtain the estimate $\hat{\Sigma}$ using a suitable technique. A third alternative is to use the following iterative procedure:

  1. Use GPA(Σ = I).
  2. Obtain the estimate $\hat{\Sigma}$ using a suitable estimator.
  3. Carry out GPA($\hat{\Sigma}$).
  4. Iteratively cycle between steps 2 and 3 until convergence.

This procedure is not guaranteed to converge.

The development of Procrustes techniques to deal with non-isotropic covariance structures is still a topic of current research. Brignell et al. (2005, 2015) and Brignell (2007) have considered covariance weighted Procrustes analysis. Some procedures are available in R in procWOPA and procWGPA. Also see Bennani Dosse et al. (2011) for anisotropic GPA.

Brignell et al. (2005, 2015) provide an explicit solution for partial covariance weighted OPA. The method of partial covariance weighted OPA involves the least squares matching of one configuration to another using rigid-body transformations. Estimation of the translation and rotation parameters, γ and Γ, is carried out by minimizing the Mahalanobis norm,

where Σ (km × km) is a symmetric positive definite matrix, γ is an m × 1 location vector and Γ is an m × m special orthogonal rotation matrix, and

numbered Display Equation

The translation which minimizes Equation (7.22) is given by Result 7.3. In general, the minimizing rotation is solved numerically, however when m = 2 there is only one rotation angle and a solution is given by Result 7.4.

Result 7.3 Given two configuration matrices, X and μ, and a symmetric positive definite matrix, Σ, the translation, as a function of rotation, which minimizes the Mahalanobis norm $D^2_{pCWP}(X, \mu; \Sigma)$ is:

(7.23) numbered Display Equation

Result 7.4 Consider m = 2, let $A = [(I_m \otimes 1_k)^T \Sigma^{-1} (I_m \otimes 1_k)]^{-1} (I_m \otimes 1_k)^T \Sigma^{-1}$, and denote the partitioned submatrices as:

numbered Display Equation

where the $A_{ij}$ have dimension (1 × k) and $X_i$, $\mu_i$ have dimension (k × 1) for i, j = 1, 2, then given two configuration matrices, X and μ, and a symmetric positive definite matrix Σ, the rotation which minimizes the Mahalanobis norm $D^2_{pCWP}(X, \mu; \Sigma)$ is given by:

(7.24) numbered Display Equation

where

(7.25) numbered Display Equation

and λ is the real root less than of the quartic equation:

Note that a unique solution of Equation (7.26) that satisfies the constraint may not exist and it may be necessary to evaluate $D^2_{pCWP}(X, \mu; \Sigma)$ for several choices of λ or use numerical methods. Brignell et al. (2015) discuss further properties and extensions to general covariance weighted Procrustes with multiple observations.

Other forms of weighted Procrustes methods are when the observations X1, …, Xn themselves have different weights, or a non-isotropic covariance structure. This situation is quite straightforward to deal with, involving weighted individual terms in the Procrustes sum of squares. The GPA algorithm is simply adapted by replacing the mean at each iteration by a weighted mean. This method is used by Zhou et al. (2016) with applications in medical imaging.
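
The weighted-mean update mentioned above can be sketched in base R as follows. The function name `weighted_mean_config`, the k × m × n array layout and the weights are assumptions for illustration; the sketch shows only the mean update, not a full weighted GPA.

```r
# Weighted mean of n registered configurations stored in a k x m x n array A.
# The weights w (one per observation) are normalised to sum to one.
weighted_mean_config <- function(A, w) {
  w <- w / sum(w)
  apply(A, c(1, 2), function(z) sum(w * z))
}

# Three hypothetical registered triangles (k = 3, m = 2), each a multiple of the first.
A <- array(c(0, 1, 0,  0, 0, 1,
             0, 2, 0,  0, 0, 2,
             0, 3, 0,  0, 0, 3), dim = c(3, 2, 3))
mu_equal <- weighted_mean_config(A, c(1, 1, 1))   # ordinary mean
mu_w     <- weighted_mean_config(A, c(1, 1, 2))   # third observation counts double
```

With equal weights the result coincides with the ordinary mean used in unweighted GPA, so the only change to the algorithm is the line that updates the mean.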

7.7 Shape variability: principal component analysis

As well as estimation of mean shape or mean size-and-shape, it is also of great interest to describe the structure of shape or size-and-shape variability. One such measure is the population covariance matrix of the coordinates in a suitable tangent space to shape space or size-and-shape space, where the pole of the projection μ is a population mean, and the pole is the point where the tangent space touches the manifold. The population tangent covariance matrix is:

numbered Display Equation

and $v \in T_\mu(M)$, where M is the manifold. The choice of tangent coordinates and inverse projection could be any suitable candidates from Section 4.4, for example inverse exponential and the exponential map; or partial tangent coordinates and inverse projection.

Given a random sample of data we can estimate the shape or size-and-shape of μ with a suitable estimator $\hat{\mu}$, and then project the data into the tangent space at $\hat{\mu}$. The sample covariance matrix of the tangent coordinates then provides an estimate of the shape or size-and-shape variability.

Note that we first describe PCA with Procrustes coordinates which is mathematically simple. Shape PCA requires a choice of base metric, called the Procrustes metric here, which is an arbitrary choice (Bookstein, 2015a). PCA with respect to other metrics is considered in Section 12.3.5, including relative warps.

7.7.1 Shape PCA

Cootes et al. (1992) and Kent (1994) developed PCA in a tangent space to shape space. In particular Kent (1994) proposed PCA of the partial Procrustes tangent coordinates defined in Equation (4.28), whereas Cootes et al. (1992) used PCA of the Procrustes residuals, which are approximate tangent coordinates (see Section 4.4.6).

The general method of sample PCA is the following:

  • Choose a pole for the tangent coordinates, $\hat{\mu}$.
  • Calculate the tangent coordinates $v_i$, i = 1, …, n, where $v_i \in T_{\hat{\mu}}(M)$.
  • Compute
    \[ S_v = \frac{1}{n} \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^T , \]
    where $\bar{v} = \frac{1}{n} \sum_{i=1}^n v_i$.
  • Calculate the eigenvalues $\lambda_j$, j = 1, …, p and corresponding eigenvectors $\gamma_j$ of $S_v$.
  • Calculate
    \[ v(c, j) = \bar{v} + c \lambda_j^{1/2} \gamma_j \]
    for a range of values of c.
  • Project back from v(c, j) to a suitable icon $X_I(c)$ to examine the structure of the jth principal component (PC).
  • The jth PC scores are given by:
    \[ s_{ij} = \gamma_j^T (v_i - \bar{v}) \]
    for i = 1, …, n. The number of PCs with non-zero eigenvalues is p = min (n − 1, q) for shape and p = min (n − 1, q + 1) for size-and-shape, where q = km − m − m(m − 1)/2 − 1.
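
The general recipe above can be sketched in base R on an arbitrary matrix of tangent coordinates. The simulated matrix `V` stands in for real tangent coordinates, and the divisor n in the sample covariance, the form v̄ + c λ_j^{1/2} γ_j for points along a PC, and all object names are assumptions of this sketch rather than a description of what procGPA does internally.

```r
set.seed(1)
n <- 20; d <- 4
# Hypothetical tangent coordinates: one row per observation
V <- matrix(rnorm(n * d), n, d) %*% diag(c(3, 2, 1, 0.5))

vbar <- colMeans(V)
Vc <- sweep(V, 2, vbar)              # centred tangent coordinates v_i - vbar
Sv <- crossprod(Vc) / n              # sample covariance (divisor n is an assumption)

e <- eigen(Sv, symmetric = TRUE)
lambda <- e$values                   # eigenvalues, in decreasing order
gam <- e$vectors                     # columns are the PC loadings

scores <- Vc %*% gam                                 # PC scores
std_scores <- sweep(scores, 2, sqrt(lambda), "/")    # standardized scores
percent <- 100 * lambda / sum(lambda)                # percentage of variability per PC

# Tangent-space point c standard deviations along the jth PC
v_cj <- function(cval, j) vbar + cval * sqrt(lambda[j]) * gam[, j]
v_plus3 <- v_cj(3, 1)
```

Each column of `scores` has mean square equal to the corresponding eigenvalue, which is why dividing by `sqrt(lambda)` gives standardized scores. Projecting `v_plus3` back to an icon requires the inverse tangent projection, which is not attempted here.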

The structure of the jth PC can be seen by plotting icons XI(c, j) for a range of values of c. In particular we examine

for a range of values of the standardized PC score c and then project back into configuration space using Equation (4.34) and Equation (3.11). The linear transformation to an icon in the configuration space

is often a good approximation to the inverse projection from the tangent space to an icon, near the pole.

So, to evaluate the structure of the jth PC for a range of values of c, calculate v from Equation (7.27), project back using the inverse transformation to the pre-shape sphere and then evaluate an icon using, say, Equation (4.34) and Equation (3.11) or the linear approximation (7.28) to give the centred pre-shape.

There are several ways to visualize the structure of each PC:

  1. Evaluate and plot an icon for a few values of c ∈ [−3, 3], where c = 0 corresponds to the full Procrustes mean shape. The plots could either be separate or registered.
  2. Draw vectors from the mean shape to the shape at c = +3 and/or c = −3 say to understand the structure of shape variability. The plots should clearly label which directions correspond to positive and negative c if both values are used.
  3. Superimpose a square grid on the mean shape and deform the grid to icons in either direction along each PC. The methods of Chapter 12 will be useful for drawing the grids, and for example the thin-plate spline deformation could be used.
  4. Animate a sequence of icons backwards and forwards along the range c ∈ [−3, 3]. This dynamic method is perhaps the most effective for displaying each PC.

In datasets where the shape variability is small it is often beneficial to magnify the range of c in order to easily visualize the structure of each PC.

In some datasets only a few PCs may be required to explain a high percentage of shape variability. Some PCs may correspond to interpretable aspects of variability (e.g. thickness, bending, shear) although interpretation is difficult due to the choice of Procrustes metric here (Bookstein, 2016). This can be improved using relative warps (see Chapter 12).

7.7.2 Kent’s shape PCA

Following Kent (1992), consider n pre-shapes Z1, …, Zn with tangent space shape coordinates given by v1, …, vn, with a pre-shape corresponding to the full Procrustes mean shape as the pole, so

(7.29) numbered Display Equation

where each vi is a real vector of length (k − 1)m, obtained from Equation (4.33). Alternatively we could use the full Procrustes residuals ri of Equation (8.13), or the inverse exponential map Procrustes tangent coordinates vEi of Equation (4.35).

Note that $\sum_{i=1}^n r_i = 0$, $\sum_{i=1}^n v_i \approx 0$ and $\sum_{i=1}^n v_i^E \approx 0$.

The PC loadings γj are the orthonormal eigenvectors of the sample covariance of the tangent coordinates Sv, corresponding to eigenvalues λj, j = 1, …, p = min (n − 1, q) (where q is the dimension of the shape space).
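
The count of PCs with non-zero variance can be checked with a small helper. This is a sketch using the dimension formula q = km − m − m(m − 1)/2 − 1; the function names `shape_dim` and `n_pcs` are hypothetical.

```r
# Dimension of the shape space for k landmarks in m dimensions
shape_dim <- function(k, m) k * m - m - m * (m - 1) / 2 - 1

# Number of PCs with non-zero eigenvalues: min(n - 1, q), or min(n - 1, q + 1)
# for size-and-shape
n_pcs <- function(n, k, m, size_and_shape = FALSE) {
  min(n - 1, shape_dim(k, m) + size_and_shape)
}

q_2d <- shape_dim(6, 2)        # 8, i.e. 2k - 4 for k = 6 planar landmarks
p_2d <- n_pcs(23, 6, 2)        # 8: limited by q rather than by n - 1 = 22
p_ss <- n_pcs(23, 6, 2, TRUE)  # 9 for size-and-shape
```

For planar data the formula reduces to 2k − 4, so a sample of 23 configurations of six landmarks can have at most eight PCs with non-zero variance.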

By carrying out PCA in the tangent space we are decomposing variability (the total sum of full Procrustes distances) into orthogonal components, with each PC successively explaining the highest variability in the data, subject to being orthogonal to the higher PCs. If the structure of shape variability is that the points are approximately independent and isotropic with equal variances, then the eigenvalues λj of the covariance matrix in tangent space will be approximately equal for 2D data (this property is proved in Section 11.1.6). If there are strong dependencies between landmarks, then only a few PCs may capture a large percentage of the variability.

An alternative decomposition, which weights points close together differently from those far apart, is given by relative warps (Bookstein 1991), described in Section 12.3.6. Further types of decomposition are independent components analysis (see Section 7.11) and the non-linear principal nested spheres (see Section 13.4.5).

7.7.3 Shape PCA in R

Carrying out tangent-based PCA is straightforward in the shapes library in R. The calculations are carried out in the routine procGPA and then plots are given using shapepca. To carry out PCA on Kent’s partial tangent coordinates for the T2 small data (with k = 6 landmarks) we have:

ans1 <- procGPA(qset2.dat, tangentcoords="partial")
ans1$percent
[1] 6.871271e+01 9.721554e+00 7.779814e+00 6.553396e+00 2.657204e+00
[6] 2.362407e+00 1.542515e+00 6.703960e-01 5.842414e-08 8.280442e-30
ans1$pcasd
[1] 5.425114e-02 2.040601e-02 1.825471e-02 1.675419e-02 1.066848e-02
[6] 1.005930e-02 8.128399e-03 5.358657e-03 1.581927e-06 1.883289e-17
shapepca(ans1,joinline=c(1,6,2,3,4,5,1),type="r")
shapepca(ans1,joinline=c(1,6,2,3,4,5,1),type="v")
shapepca(ans1,joinline=c(1,6,2,3,4,5,1),type="s")
shapepca(ans1,joinline=c(1,6,2,3,4,5,1),type="g")
shapepca(ans1,joinline=c(1,6,2,3,4,5,1),type="m")

We see that the percentages of variability explained by the first three PCs are 68.7, 9.7 and 7.8%, respectively, and there are 2k − 4 = 8 PCs with non-zero variance (up to machine error). The PC scores are available in ans1$scores and the standard deviations for each PC are given in ans1$pcasd, as seen above, and these are the square roots of the eigenvalues of the sample covariance matrix. The standardized scores (with sd=1) are in ans1$stdscores. In Figure 7.5 we see the first three PCs displayed in row format (using option type="r"). The other displays (not shown here) are type equal to "v", "s", "g" and "m" for vector, superposition, grid and movie representations, respectively.

Figure 7.5 Plots of the first three PCs. In the jth row: mean − 3sd PCj, mean, mean + 3sd PCj (where j = 1, 2, 3).

If instead we use the Procrustes residuals we have:

ans2 <- procGPA(qset2.dat, tangentcoords="residual")
ans2$percent
[1] 6.843087e+01 9.701803e+00 7.832971e+00 6.554448e+00 2.671126e+00
[6] 2.366390e+00 1.586899e+00 6.752792e-01 1.802141e-01 5.786210e-08
[11] 1.188710e-28 8.399878e-29
ans2$pcasd
[1] 9.415925e+00 3.545383e+00 3.185666e+00 2.914104e+00 1.860305e+00
[6] 1.750976e+00 1.433876e+00 9.353597e-01 4.832050e-01 2.738004e-04
[11] 1.241009e-14 1.043214e-14

and we see that there are now nine non-zero variances, and the extra dimension of variability arises because the Procrustes residuals do not lie exactly in a tangent space. Note that the Procrustes residuals are on the overall scale of the original data – the figures have not been rescaled to the pre-shape sphere as in the partial Procrustes tangent space approach. However, the percentages of variability and structure of the PCs are very similar, apart from the arbitrary overall scaling in the residual PCA.

For a 3D example we consider the male macaque data:

ans3 <- procGPA(macm.dat, tangentcoords="partial")
shapepca(ans3)
ans3$percent
[1] 4.740113e+01 2.083676e+01 1.286306e+01 8.373567e+00 5.866786e+00
[6] 2.650214e+00 1.292852e+00 7.156279e-01 5.549779e-31
shapepca(ans3)
shapepca(ans3,pcno=1,type="g",zslice=-1)

The default plot for the 3D PCA is a plot of the mean shape, with vectors drawn to the mean + 3sd PCj, j = 1, 2, 3, as seen in Figure 7.6. Figure 7.6 also shows a deformed grid which gives an indication of the main shape variability in PC1. The first PC explains 47.4% of the variability here.

Figure 7.6 (a) Plots of the mean (red spheres) with vectors to figures along the first three PCs: (black) mean + 3sd PC1; (red) mean + 3sd PC2; and (green) mean + 3sd PC3. (b) The mean (red) and a figure at mean + 3sd PC1 (blue) with a deformed grid on the blue figure at z = −1, which was deformed from being a square grid on the red figure at z = −1.

It can be seen that the main variability in the landmarks is in the ‘top most’ landmark in the xy plane (bregma), which is difficult to locate in primates. Referring back to Figure 7.3 it can be seen that there are two males with unusual ‘top-most’ landmarks (bregma) and this appears to give rise to the extra variability in these landmarks as seen in the first PC.

Example 7.3 A random sample of 23 T2 mouse vertebral outlines was taken from the Small group of mice introduced in Section 1.4.1. Six mathematical landmarks are located on each outline, and in between each pair of landmarks 9 equally spaced pseudo-landmarks were placed (as in Figure 1.4), giving a total of k = 60 landmarks in m = 2 dimensions. In Figure 7.7 we see the Procrustes registered outlines.

The sample covariance matrix in the tangent space [using the partial Procrustes coordinates of Equation (4.28)] is evaluated and in Figure 7.8 we see sequences of shapes evaluated along the first two PCs. Alternative representations are given in Figure 7.9, Figure 7.10 and Figure 7.11. The R code for producing the plots is:

data(mice)
t2<-mice$outlines[,,mice$group=="s"]
ans<-procGPA(t2,tangentcoords="partial")
x<-ans$rotated
plotshapes(ans$rotated,joinline=c(1:60,1))
shapepca(ans,type="r",mag=2,joinline=c(1:60,1),pcno=c(1:2))
shapepca(ans,type="v",mag=2,joinline=c(1:60,1),pcno=c(1:2))
shapepca(ans,type="s",mag=2,joinline=c(1:60,1),pcno=c(1:2))
shapepca(ans,type="g",mag=2,joinline=c(1:60,1),pcno=c(1:2))
pairs(cbind(ans$size,ans$rho,ans$scores[,1],ans$scores[,2],
     ans$scores[,3]),label=c("size","rho","pc1","pc2","pc3"))

Shapes are evaluated in the tangent space and then projected back using the approximate linear inverse transformation of Equation (7.28) for visualization. The percentages of variability captured by the first two PCs are 64.6 and 8.8%, so the first PC is a very strong component here.

The first PC appears to highlight the length of the spinous process (the protrusion on the ‘top’ of the bone) in contrast to the relative width of the bone. The angle between lines joining landmarks 1 to 5 and 2 to 3 decreases as the height of landmark 4 increases, whereas there is little change in the angles from the lines joining 1 to 6 and 2 to 6. The second PC highlights the pattern of asymmetry in the end of the spinous process and asymmetry in the rest of the bone.

Pairwise plots of the elements of the vector (si, ρi, ci1, ci2, ci3)T,  i = 1, …, n, are given in Figure 7.12, where si are the centroid sizes, ρi are the Riemannian distances to the mean, and ci1, ci2 and ci3 are the first three standardized PC scores.

There appears to be one bone that is much smaller than the rest and it also appears that there is some correlation between the first PC score and the centroid size of the bones.

An overall measure of shape variability is the root mean square of full Procrustes distance RMS(dF), which here is 0.07, and the shape variability in the data is quite small.
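
The root mean square summary is a one-line calculation once the distances to the mean are available. In this sketch the helper name `rms` and the distance values are hypothetical; with the shapes library the distances would come from the fitted object rather than being typed in.

```r
# Root mean square of a vector of distances to the mean shape
rms <- function(d) sqrt(mean(d^2))

d_example <- c(0.05, 0.08, 0.06, 0.09)   # hypothetical Procrustes distances
rms_example <- rms(d_example)
```

Because the distances are squared before averaging, a single atypical configuration can inflate the summary noticeably, which is worth bearing in mind when comparing variability between datasets.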

Figure 7.7 Procrustes rotated outlines of T2 Small mouse vertebrae.

Figure 7.8 Two rows of series of T2 vertebral shapes evaluated along the first two PCs – the ith row shows the shapes at c ∈ {−6, 0, 6} standard deviations along the ith PC. Note that in each row the middle plot (c = 0) is the full Procrustes mean shape. By magnifying the usual range of c by 2 the structure of each PC is more clearly illustrated.

Figure 7.9 The first (a) and second (b) PCs for the T2 Small vertebral data. The plot shows the icons overlaid on the same picture. Each plot shows the shapes at c ∈ {−6, −4, −2} (---*---), the mean shape at c = 0 (circled +) and the shapes at c ∈ {+6, +4, +2} (…+…) standard deviations along each PC.

Figure 7.10 The first (a) and second (b) PCs for the T2 Small vertebral outline data. Each plot shows the full Procrustes mean shape with vectors drawn from the mean (+) to an icon which is c = +6 standard deviations along each PC from the mean shape.

Figure 7.11 The first (a) and second (b) PCs for the T2 Small vertebral outline data. A square grid is drawn on the mean shape and deformed using a pair of thin-plate splines (see Chapter 12) to an icon c = 6 standard deviations along each PC (indicated by a vector from the mean to the icon). The plots just show the deformed grid at c = 6 for each PC and not the starting grids on the mean.

Figure 7.12 Pairwise plots of (si, ρi, ci1, ci2, ci3)T, i = 1, …, n, centroid size, Riemannian distance to the mean shape and the first three standardized PC scores, for the T2 Small vertebral outline data.

Figure 7.13 The full Procrustes coordinates for all 30 handwritten digits.

Figure 7.14 Pairwise plots of (si, ρi, ci1, ci2, ci3)T, i = 1, …, n, the centroid size, Riemannian distance ρ to the mean shape and the first three PC scores, for the digit 3 data. There appears to be an outlier with a particularly large value of ρ. Closer inspection indicates that the first digit may have poorly identified landmarks.

Figure 7.15 Principal component analysis of the digit number 3s. The ith row represents the ith PC, with configurations evaluated at −3, 0, 3 standard deviations along each PC from the Procrustes mean. The central figure on each row (c = 0) is the Procrustes mean shape.

Figure 7.16 The first (a) and second (b) PCs for the digit 3 data. Each plot shows the shapes at c ∈ {−3, −2, −1} (---*---), the mean shape at c = 0 (circled +) and the shapes at c ∈ {+3, +2, +1} (…+…) standard deviations along each PC.

Example 7.4 A random sample was taken of 30 handwritten digit number 3s in the dataset of Section 1.4.2. Thirteen landmarks were located by hand on images of each of the digits, and here k = 13 and m = 2. The full Procrustes rotated figures are displayed in Figure 7.13. The pairwise plots of (si, ρi, ci1, ci2, ci3)T,  i = 1, …, n, the centroid size, the Riemannian distance ρi to the mean and the first three PC scores are given in Figure 7.14. The first five PCs explain 50.4, 15.4, 12.8, 7.5 and 4.3% of the variability. There is quite a large amount of shape variability in these data – the root mean square of Riemannian distance RMS(ρ) is 0.274.

The PCs are displayed in Figure 7.15 and Figure 7.16. The first PC can be interpreted as partly capturing the amount that the central part (middle prong of the number 3) protrudes in contrast to the degree of curl in the end of the bottom loop and the length of the top loop. The second PC includes measurement of tall thin digits versus short fat digits (vertical/horizontal shear).

In Figure 7.14 there is a positive relationship between the absolute value of the score of PC1 (ci1) and the Riemannian distance ρi to the mean. This is not surprising and should be expected in most datasets, as shapes near to the mean will have small scores along the PCs.

The R code for producing the Digit 3 PCA plots is:

data(digit3.dat)
ans<-procGPA(digit3.dat,tangentcoords="partial")
shapepca(ans,type="r",mag=1,joinline=c(1:13),pcno=c(1:2))
shapepca(ans,type="s",mag=1,joinline=c(1:13),pcno=c(1:2))
pairs(cbind(ans$size,ans$rho,ans$scores[,1],ans$scores[,2],
     ans$scores[,3]),label=c("size","rho","pc1","pc2","pc3"))

There does appear to be one outlier in the dataset with large ρ. On closer inspection, the first digit in the dataset has its landmarks placed very poorly (landmark 7 is placed on the bottom loop where landmark 4 should be, and the other nearby points correspond poorly). The analysis was repeated without the first digit: the root mean square of Procrustes distance RMS(ρ) is then 0.26, and the percentages of variability explained by the first five PCs are 43.6, 18.4, 13.2, 9.0 and 4.9%. The first five PCs have a similar structure to those from the analysis with all the data, and the drop in the relative contribution of the first PC is expected, as the outlier digit has a particularly large score on the first PC.

There have been numerous applications of PCA in shape analysis, and it can provide a useful way of exploring low dimensional structures in shape data. Some further applications include face identification (Mallet et al. 2010; Morecroft et al. 2010; Evison et al. 2010), computer vision (Dryden 2003), modelling plant roots (Hodgman et al. 2006), and examining brain shape in epilepsy (Free et al. 2001). An in-depth discussion of shape PCA is given by Bookstein (2015b).

7.7.4 Point distribution models

Principal component analysis with the full Procrustes coordinates of Equation (8.13) has a particularly simple formulation. Cootes et al. (1992, 1994) use PCA to develop the point distribution model (PDM), which is a PC model for shape and uses Procrustes residuals rather than tangent coordinates. Given n independent configuration matrices X1, …, Xn the figures are registered to $X_1^P, …, X_n^P$ by full GPA. The estimate of mean shape is taken to be the full Procrustes mean, which has the same shape as $\frac{1}{n} \sum_{i=1}^n X_i^P$. The sample covariance matrix is:

numbered Display Equation

and the PCs are the eigenvectors of this matrix, γj, j = 1, …, min (n − 1, q), with corresponding decreasing eigenvalues λj and q = (k − 1)m − m(m − 1)/2 − 1 is the dimension of the shape space. Note that PCA using this formulation is the same (up to an overall scaling) as using the full Procrustes tangent coordinates of Equation (8.13), with the tangent coordinates pre-multiplied by HT, the transpose of the Helmert submatrix.

Visualization of the PCs is carried out as in the previous section. In particular, the structure in the jth PC can be viewed through plots of an icon for mean shape with displacement vectors

numbered Display Equation

for the shapes corresponding to c ∈ [−3, 3].

The approaches using partial or full Procrustes tangent coordinates or the PDMs are almost identical in practice, for datasets with small variability. The PDM can be thought of as a structural model in the tangent space.

Cootes et al. (1994) give several early examples of PDMs, and develop flexible models for describing shape variability in various datasets of images of objects, including hands, resistors and heart ventricles.

Figure 7.17 Varying hands: the first three PCs with values of c ∈ {−2, −1, 0, 1, 2} here. (Reproduced by permission of Carfax Publishing Ltd.) Source: Cootes et al. 1994.

Example 7.5 An example of the PDM approach is given in Figure 7.17 taken from Cootes et al. (1994), who describe a flexible model for describing shape variability in hands. In Figure 7.17 the first PC highlights the spreading of the fingers, the second PC highlights movement of the thumb relative to the fingers, and the third PC highlights movement of the middle finger. The shape variability here is complicated because there are multiple patterns. As well as the biological shape variability of hands, the relative positions of the fingers contribute greatly to shape variability here.

7.7.4.1 PDM in two dimensions

We can formulate the PDM for a 2D configuration matrix X (2k × 1) of k landmarks in ℝ² as (Mardia 1997):

$$X = \mu + \sum_{j=1}^{p} y_j \gamma_j + \varepsilon \tag{7.30}$$

where $y_j \sim N(0, \lambda_j)$, $\varepsilon \sim N_{2k}(0, \sigma^2 I)$, independently and the vectors γi satisfy

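$$\gamma_i^{\mathrm{T}}\gamma_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}, \qquad \gamma_i^{\mathrm{T}}\mu = 0$$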

and λ1 ≥ λ2 ≥ ⋅⋅⋅ ≥ λp. In addition, for invariance under rotation and for translation, the vectors γi satisfy, respectively

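$$\gamma_i^{\mathrm{T}}\nu = 0, \qquad \gamma_i^{\mathrm{T}}(1_k^{\mathrm{T}}, 0_k^{\mathrm{T}})^{\mathrm{T}} = 0, \quad \gamma_i^{\mathrm{T}}(0_k^{\mathrm{T}}, 1_k^{\mathrm{T}})^{\mathrm{T}} = 0$$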

where ν = ( − β1, …, −βk, α1, …, αk)T with μ = (α1, …, αk, β1, …, βk)T. Here p ≤ min (n − 1, 2k − 4) and p is preferably taken to be quite small, for a parsimonious model.
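To make the 2D formulation concrete, here is a minimal base R sketch that builds vectors γj satisfying the invariance constraints and simulates one configuration from the model. It assumes the constraints amount to orthogonality of each γj to ν (rotation), to the two translation directions (1k, 0k) and (0k, 1k), and to μ itself (an assumed scale constraint, consistent with p ≤ 2k − 4). The landmark values, λj and σ are invented for illustration.

```r
set.seed(1)
k <- 5; p <- 2

# Illustrative centred mean configuration: mu = (alpha, beta).
alpha <- c(0, 1, 1.5, 1, 0);     alpha <- alpha - mean(alpha)
beta  <- c(0, 0, 0.8, 1.6, 1.6); beta  <- beta - mean(beta)
mu <- c(alpha, beta)
nu <- c(-beta, alpha)            # direction of an infinitesimal rotation of mu

# Directions to be excluded: x-translation, y-translation, rotation, scale.
C <- cbind(c(rep(1, k), rep(0, k)), c(rep(0, k), rep(1, k)), nu, mu)

# Orthonormal basis of R^{2k}; columns 5, ..., 2k are orthogonal to C.
Q <- qr.Q(qr(C), complete = TRUE)
gamma <- Q[, 5:(4 + p), drop = FALSE]

# Simulate X = mu + sum_j y_j gamma_j + eps.
lambda <- c(0.04, 0.01); sigma <- 0.005
y <- rnorm(p, sd = sqrt(lambda))
x <- as.vector(mu + gamma %*% y + rnorm(2 * k, sd = sigma))
X <- matrix(x, ncol = 2)         # k x 2 configuration (x and y columns)
X
```

Any p orthonormal columns from the complement would serve; in a fitted model they would instead be the leading eigenvectors estimated from data.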

7.7.5 PCA in shape analysis and multivariate analysis

We shall highlight some similarities between PCA in shape analysis and PCA in multivariate analysis. In the standard implementation PCA is used to summarize succinctly the main modes of variation in a dataset, and the principle is the same in shape analysis – we are looking for orthogonal axes of shape variation which summarize large percentages of the shape variability.

In conventional PCA the coefficients with largest absolute value in each PC are interpreted as patterns of interest in the dataset. For example, consider the PCA of a set of length measurements taken from a biological specimen, such as an ape skull. The first PC may give approximately equal weight to each variable; thus, it could be interpreted as a measure of overall size. The second PC might have positive loadings for face measurements and negative loadings for braincase measurements, and so it measures the relative magnitudes of these two regions.

However, in shape PCA it is difficult to interpret the PCs in terms of effects on the original landmarks. Bookstein (2016) argues that coefficients from a PCA from Procrustes registered data cannot be interpreted as effects that have any sense in biological applications, as the Procrustes geometry is entirely a function of the particular selection of landmarks. In Section 12.3.6 we encounter an important variant of PCA – relative warps – which provides a different orthogonal basis and typically decomposes shape variability at a variety of scales.

Principal component analysis is also used in multivariate analysis as a dimension reducing technique, and this is also one of the goals of PCA in shape analysis.

7.8 Principal component analysis for size-and-shape

As for shape analysis we can carry out PCA in the tangent space to size-and-shape space. Because size variability is often a major aspect of the overall variability, PC1 will often involve size. For example we consider a size-and-shape PCA of the combined male and female gorilla data of Section 1.4.8, and we plot PC1 versus centroid size in Figure 7.18. There is clearly an extremely strong correlation between the size-and-shape PC1 scores and centroid size (correlation = 0.9975), and the size-and-shape PC2 scores are strongly correlated with the shape PC1 (correlation 0.6429).

Figure 7.18 Plot of size-and-shape PC1 scores versus centroid size, for the gorilla data. Note the females all have centroid size less than 247 and the males are all greater than 261.
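A size-and-shape PCA of this kind might be carried out along the following lines. This is a sketch rather than the exact commands behind Figure 7.18: it assumes the `shapes` package is installed and that its `gorf.dat` and `gorm.dat` arrays hold the female and male gorilla landmarks, and it computes centroid size directly from the raw coordinates.

```r
library(shapes)   # assumed installed; provides gorf.dat, gorm.dat and procGPA

# Stack female and male gorilla landmarks into one k x m x n array.
nf <- dim(gorf.dat)[3]; nm <- dim(gorm.dat)[3]
gor <- array(c(gorf.dat, gorm.dat), dim = c(dim(gorf.dat)[1:2], nf + nm))

# Size-and-shape PCA: registration without scaling (scale = FALSE).
out <- procGPA(gor, scale = FALSE)

# Centroid size computed directly from the raw landmarks.
csize <- apply(gor, 3, function(x) sqrt(sum(scale(x, scale = FALSE)^2)))

# The sign of a PC is arbitrary, so compare the absolute correlation.
abs(cor(out$scores[, 1], csize))
plot(csize, out$scores[, 1], xlab = "centroid size", ylab = "size-and-shape PC1")
```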

7.9 Canonical variate analysis

As the tangent space is a Euclidean vector space we can carry out the usual techniques from multivariate analysis in this space. For example, if we have at least two groups we can consider canonical variate analysis (Mardia et al. 1979, Section 12.5). The canonical variates are linear combinations of the variables which best separate the groups, and the same components are used in Fisher’s linear discriminant analysis. For the mouse vertebral data canonical variate analysis can be carried out in R using:

library(shapes)
data(mice)
# shape CVA on the landmark array mice$x with group labels mice$group
shapes.cva(mice$x,mice$group)

The method involves full Procrustes analysis and uses the Procrustes residuals for the shape canonical variate analysis. A plot of the first two canonical variates is given in Figure 7.19. We see that there is good separation in the three groups using the shape canonical variate analysis.

Figure 7.19 Plots of the first two canonical variates for the T2 mouse vertebral data. The three groups are Large (l), Small (s) and Control (c).
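The computation behind canonical variates can be sketched in base R as an eigen-decomposition of the between-group variation relative to the within-group variation. The helper `cva_sketch` is hypothetical and is not the implementation used by `shapes.cva`; the built-in `iris` measurements stand in for shape coordinates. With real Procrustes residuals the within-group matrix is singular, so one would typically work with a reduced set of PC scores first.

```r
# Hypothetical helper: canonical variates from an n x d data matrix and groups.
cva_sketch <- function(x, g, ncv = 2) {
  g <- as.factor(g)
  xc <- sweep(x, 2, colMeans(x))                        # centre on the grand mean
  means <- apply(xc, 2, function(col) tapply(col, g, mean))  # groups x d
  fitted <- means[as.integer(g), , drop = FALSE]        # group mean for each case
  W <- crossprod(xc - fitted)                           # within-group SSP matrix
  B <- crossprod(fitted)                                # between-group SSP matrix
  Ui <- solve(chol(W))                                  # W = t(U) %*% U
  e <- eigen(t(Ui) %*% B %*% Ui, symmetric = TRUE)      # same eigenvalues as W^{-1} B
  A <- Ui %*% e$vectors[, seq_len(ncv), drop = FALSE]   # canonical variate loadings
  list(values = e$values[seq_len(ncv)], scores = xc %*% A)
}

# Stand-in data (not shape data): four measurements on three groups.
cv <- cva_sketch(as.matrix(iris[, 1:4]), iris$Species)
plot(cv$scores, col = as.integer(iris$Species), xlab = "CV1", ylab = "CV2")
```

The symmetric form via the Cholesky factor is used only to keep the eigenvalues real in floating point arithmetic.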

To carry out shape canonical variate analysis on the great apes dataset we have:

data(apes)
shapes.cva(apes$x,apes$group)

In Figure 7.20 we see there is complete separation of the three species of great ape. The female and male gorillas are very different, and also the orangutans are quite well separated by sex. However, there is a lot of overlap in the male and female groups for the chimpanzees, indicating less shape difference between the sexes in this group. Note that the underlying assumption of equal covariance matrices may not hold, as in this example, but nevertheless it is a useful representation.

Figure 7.20 Plots of the first two canonical variates for the great ape data. The six groups are male gorillas (gorm), female gorillas (gorf), male chimpanzees (panm), female chimpanzees (panf), male orang utans (pongom) and female orang utans (pongof).

An alternative to canonical correlation analysis for investigating associations and dimension reduction in groups of shapes was introduced by Iaci et al. (2008). The method was based on maximizing Kullback–Leibler information and was applied to a dataset of fly wings and mouse mandible data.

7.10 Discriminant analysis

Fisher’s linear discriminant or quadratic discriminant analysis can be carried out on Procrustes tangent coordinates when there are two groups for classification. Mardia et al. (2013a) gave an example of quadratic discriminant analysis used in a court case for a prisoner on Death Row, introduced in Section 1.4.6. Quadratic discriminant analysis is often more appropriate as group covariance matrices are usually different. It was of importance to determine whether the prisoner had suffered brain damage through FASD or not. Using a previous study of controls versus FASD patients there was a difference in the average corpus callosum shape around a particular part on the upper right isthmus region (thin section before the bulge on the right end in Figure 1.16). The FASD patients had a thinner region on average. Using a model based on Procrustes tangent coordinates, the likelihood ratio was very strongly in favour of the prisoner being classified in the FASD group. The resulting testimony in the courtroom, in addition to other expert observations, led to the prisoner being spared the death penalty in favour of life in prison.

Some other methods for discriminant analysis are also available, for example, one based on neural networks (Southworth et al. 2000). A model-based approach for triangular shape is given in Kent and Mardia (2013).
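The likelihood ratio idea behind quadratic discriminant analysis can be sketched in base R by fitting a separate normal model to each group and comparing log-densities for a new case. The two-dimensional synthetic scores, the group labels a and b, and the helper `log_dens` are assumptions for illustration; they are unrelated to the court case data, where a reduced set of tangent-space scores would take their place.

```r
# Gaussian log-density with group-specific mean and covariance.
log_dens <- function(x, mu, S) {
  d <- length(mu)
  logdet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  -0.5 * (d * log(2 * pi) + logdet + mahalanobis(x, mu, S))
}

set.seed(2)
# Synthetic two-dimensional scores for two groups (illustrative only).
a <- matrix(rnorm(60), 30, 2)
b <- cbind(rnorm(30, mean = 3, sd = 2), rnorm(30, mean = 1, sd = 0.5))

new_case <- c(3, 1)
log_lr <- log_dens(new_case, colMeans(b), cov(b)) -
          log_dens(new_case, colMeans(a), cov(a))
log_lr   # positive values favour group b under the fitted normal models
```

Because each group has its own covariance matrix, the resulting decision boundary is quadratic rather than linear.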

7.11 Independent component analysis

As a further indication of the type of analysis that can be carried out we consider independent component analysis (ICA) (Hyvärinen et al. 2001), which seeks the most non-Gaussian directions of variability. There are many types of ICA and we use fastICA, which is available in an R library (Marchini et al. 2013). In practice one normally carries out ICA on a reduced set of PC scores if the dimension is high. The commands to carry out ICA on the mouse vertebral outline data, after first reducing to 3 PC scores from partial Procrustes analysis are:

library(shapes)    # procGPA and the mice data
library(fastICA)
data(mice)
nic<-3             # number of independent components
# partial Procrustes analysis (no scaling), then ICA on the PC scores
proc<-procGPA(mice$outlines,scale=FALSE)
ans<-fastICA(proc$scores,nic)
ICscores<-t(ans$S)
par(mfrow=c(2,3))
colgrp<-as.integer(mice$group)+1
# top row: pairwise plots of the IC scores, labelled by group
plot(ICscores[1,],ICscores[2,],type="n",xlab="ic1",ylab="ic2")
text(ICscores[1,],ICscores[2,],mice$group,col=colgrp)
plot(ICscores[1,],ICscores[3,],type="n",xlab="ic1",ylab="ic3")
text(ICscores[1,],ICscores[3,],mice$group,col=colgrp)
plot(ICscores[2,],ICscores[3,],type="n",xlab="ic2",ylab="ic3")
text(ICscores[2,],ICscores[3,],mice$group,col=colgrp)
# bottom row: the same plots for the first three PC scores
PCscores<-t(proc$scores)
plot(PCscores[1,],PCscores[2,],type="n",xlab="pc1",ylab="pc2")
text(PCscores[1,],PCscores[2,],mice$group,col=colgrp)
plot(PCscores[1,],PCscores[3,],type="n",xlab="pc1",ylab="pc3")
text(PCscores[1,],PCscores[3,],mice$group,col=colgrp)
plot(PCscores[2,],PCscores[3,],type="n",xlab="pc2",ylab="pc3")
text(PCscores[2,],PCscores[3,],mice$group,col=colgrp)

In Figure 7.21 we see plots of the independent component (IC) scores and PC scores for the mouse vertebral outline data. We see that IC1 versus IC3 give quite good separation of the three groups, as indeed do PC1 versus PC2.

Figure 7.21 Plot of the first three independent components scores (top row), and the first three PC scores (bottom row) for the mouse vertebral outline data. The observations are labelled by group: Control (c); Large (l); and Small (s). The ordering of the ICs is arbitrary.

Brignell et al. (2010) gave a further example of using ICA in shape analysis, with an application to brain surface asymmetry in schizophrenia.

7.12 Bilateral symmetry

A particularly important concept in biology is bilateral symmetry, which is symmetry about a plane (as in a mirror image). Kent and Mardia (2001) study the topic in theoretical detail, and in particular provide detailed algebraic decompositions that can be used for a practical investigation of bilateral symmetry. Other investigations include Mardia et al. (2000) who provide a statistical assessment of bilateral symmetry of shapes and Bock and Bowman (2006) who consider the measurement and analysis of asymmetry with applications to face modelling, including measuring symmetry after cleft lip surgery in children. Brignell et al. (2010) study the application described in Section 1.4.15 where the shape of the cortical surface is investigated. A special type of symmetry was observed called brain torque, where the schizophrenia patients had slightly more symmetric brains on average compared with the control group. Theobald et al. (2004) decomposed shape variability into symmetric and asymmetric components, which was also used by Brignell et al. (2010). The method is very simple, in that a dataset is supplemented by relabelled mirror images (relabelled reflections) before carrying out PCA.

Example 7.6 Consider the T2 Small data of Section 1.4.1. The percentages explained by the first four PCs after full Procrustes analysis and PCA are 68.4, 9.7, 7.8, and 6.5%, respectively. In Figure 7.22 we see the raw data and the reflected relabelled data, where the y coordinates are made negative and points {1, 2, 3, 4, 5, 6} are relabelled as {2, 1, 5, 4, 3, 6}. Full Procrustes analysis is carried out and the first four PC vectors are displayed in Figure 7.23. The R code for the analysis is:

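library(shapes)
# reflected, relabelled copy of the T2 Small data
refx<-qset2.dat
refx[,1,]<- qset2.dat[c(2,1,5,4,3,6),1,]
refx[,2,]<- -qset2.dat[c(2,1,5,4,3,6),2,]
plotshapes(qset2.dat,refx,col=1:6)
# augment the data with the reflections, then full GPA and PCA
xdat<-abind(qset2.dat,refx)
out<-procGPA(xdat)
shapepca(out,type="v",mag=3,pcno=1:4)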

Note that in this analysis the Procrustes mean shape has bilateral symmetry. We see that PC1 and PC4 are symmetric PCs (with 60.0 and 5.9% variability, respectively) whereas PC2 and PC3 are asymmetric (with 18.1 and 7.2% variability, respectively). The advantage of the symmetry/asymmetry decomposition is that the PCs can be more interpretable. Here the symmetric PC scores are projections of each case into a subspace. Further types of projections (affine components and partial warps) are considered in Chapter 12.

Figure 7.22 The T2 Small landmarks (a) and the reflected relabelled landmarks (b).

Figure 7.23 The symmetric and asymmetric PCs from the T2 Small vertebrae, with the bilateral symmetric mean and PC vectors magnified three times.
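The reflect-and-relabel step for a single configuration can be sketched in base R as follows. The helper `reflect_relabel` and the six illustrative landmarks are hypothetical, and the simple averaging shown here ignores Procrustes registration, which the full analysis above handles through GPA on the augmented dataset.

```r
# Hypothetical helper: reflect in the x-axis and relabel paired landmarks.
reflect_relabel <- function(x, perm) {
  y <- x[perm, , drop = FALSE]   # swap left/right landmark labels
  y[, 2] <- -y[, 2]              # reflect: y coordinates made negative
  y
}

perm <- c(2, 1, 5, 4, 3, 6)      # relabelling used for the T2 vertebra landmarks
# Illustrative configuration (not real data).
x <- cbind(c(0.1, 0.0, 1.0, 2.0, 1.1, 1.0),
           c(1.0, -1.1, -0.5, 0.0, 0.6, 0.1))

x_ref <- reflect_relabel(x, perm)
sym  <- (x + x_ref) / 2          # symmetric part (before any registration)
asym <- (x - x_ref) / 2          # asymmetric part

# The symmetric part is unchanged by reflect-and-relabel.
all.equal(reflect_relabel(sym, perm), sym)
```

The relabelling is its own inverse, so applying `reflect_relabel` twice returns the original configuration, and `sym + asym` recovers `x`.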

Fluctuating asymmetry is a type of deviation from bilateral symmetry, where the deviations are randomly distributed about a bilateral symmetric mean value. Low levels of fluctuating asymmetry often demonstrate only small environmental perturbations of genetic development programmes. Klingenberg and McIntyre (1998) and Klingenberg (2011) discuss appropriate methodology, which is implemented in the MorphoJ software (Klingenberg 2011), for investigating fluctuating asymmetry. There are numerous other specialist applications in morphometrics, for example Cardini (2014) discusses the use of 2D images as proxies for 3D shapes, again using MorphoJ. Other software for analysing specialized biological applications includes tpsRelw (Rohlf 2010), geomorph (Adams and Otárola-Castillo 2013), PAST (Hammer et al. 2001) and the Evan toolbox (Phillips et al. 2010).
