In this chapter we outline some simple approaches to inference based on tangent space approximations, which are valid in datasets with small variability in shape. We discuss one and two sample Hotelling’s T2 tests for mean shape based on normality assumptions, and then consider non-parametric alternatives. As a special case we consider isotropic covariance structure, although the model is too simple for most applications. We discuss other multivariate inference techniques which work directly on shape coordinates and conclude with the topic of allometry: the relationship between shape and size. Note that any inference is a function of the landmark selection that the investigator designed at the outset.
Statistical models and parametric inference in the pre-shape, shape and size-and-shape space are discussed later in Chapter 10.
The horizontal tangent space to the pre-shape sphere was defined in Section 4.4. A practical approach to analysis is to use the Procrustes tangent space coordinates if the data are concentrated and then perform standard multivariate analysis in this linear space, where the pole is chosen from the data using a consistent estimator of an overall population mean, which is then treated as fixed. For example the pole could be the full Procrustes mean of Equation (6.11).
After the choice of tangent space has been fixed we carry out inference using any convenient linear statistical methods, and we can make use of all the functions and packages in R, or any other statistical package of choice.
Consider carrying out a test on the mean shape of a single population, asking whether or not the mean shape is a particular shape [μ0], that is, test between

H0 : [μ] = [μ0] and H1 : [μ] ≠ [μ0].
Let X1, …, Xn be a random sample of configurations with partial Procrustes tangent coordinates (with pole from the full Procrustes mean with unit size) given from Equation (4.33) by v1, …, vn where
Let the tangent coordinates of μ0 be γ0 where
and μP0 is the Procrustes fit of μ0 onto the pole (the full Procrustes mean). Since the dimension of the tangent space is q = km − m − m(m − 1)/2 − 1 and the length of each vector vi is (k − 1)m > q, we have a singular covariance matrix and so we could use generalized inverses.
Definition 9.1 A generalized inverse of a symmetric square matrix A is denoted by A− and satisfies

AA−A = A.
The Moore–Penrose generalized inverse of A is:

A− = ∑pj=1 (1/λj) γjγjT,

where γj are the eigenvectors of A corresponding to the p non-zero eigenvalues λj, j = 1, …, p.
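As a small illustration, a sketch of how this Moore–Penrose inverse can be computed in R from the eigendecomposition (the tolerance tol for declaring an eigenvalue non-zero is an arbitrary choice of this sketch):

# Sketch: Moore-Penrose generalized inverse of a symmetric matrix A
mpinv <- function(A, tol = 1e-8) {
  eig <- eigen(A, symmetric = TRUE)
  keep <- abs(eig$values) > tol              # the p non-zero eigenvalues
  V <- eig$vectors[, keep, drop = FALSE]     # corresponding eigenvectors
  V %*% diag(1 / eig$values[keep], sum(keep)) %*% t(V)
}

X <- matrix(rnorm(20), 4, 5)   # n = 4 cases, so cov(X) has rank <= 3
S <- cov(X)
all.equal(S %*% mpinv(S) %*% S, S)   # generalized inverse property A A- A = A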
To obtain a one sample test a standard multivariate analysis approach is carried out on the vi, where a multivariate normal model is assumed,

vi ∼ N(ξ, Σ),

independently for i = 1, …, n. The one sample Hotelling’s T2 test can be used (e.g. Mardia et al. 1979, p. 125). We write v̄ for the sample mean and Sv for the sample covariance matrix (with divisor n). The squared Mahalanobis distance between v̄ and γ0 is:

D2 = (v̄ − γ0)T S−v (v̄ − γ0), (9.2)
where S−v is the Moore–Penrose generalized inverse of Sv. The rank of Sv is min (q, n − 1) and we assume that the rank of our sample covariance matrices is q in this chapter, although the methods can be extended to n ≤ q with appropriate regularization (see Section 9.1.5).
Important point: The test statistic is taken as:

F = ((n − q)/q) D2 = ((n − q)/q) ∑qj=1 sj2/λj,

where sj is the jth PC score of v̄ − γ0 and λj is the jth eigenvalue of Sv. The test statistic F has an Fq, n−q distribution under H0. Hence, we reject H0 for large values of F.
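As a hedged illustration, a minimal R sketch of this one sample test, using mpinv() from above and assuming the tangent coordinates are held in an n × (k − 1)m matrix V (one row per vi) — an assumed data layout, not a routine from the shapes package:

# Sketch: one sample Hotelling's T^2 test in the tangent space
hotelling1 <- function(V, gamma0, q) {
  n <- nrow(V)
  vbar <- colMeans(V)                  # sample mean of tangent coordinates
  Sv <- cov(V) * (n - 1) / n           # covariance with divisor n, as in the text
  D2 <- as.numeric(t(vbar - gamma0) %*% mpinv(Sv) %*% (vbar - gamma0))
  Fstat <- (n - q) / q * D2            # the F statistic above
  list(F = Fstat, p.value = 1 - pf(Fstat, q, n - q))
}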
A 100(1 − α)% confidence region for the mean shape is given by the set of shapes with tangent coordinates γ such that:

((n − q)/q)(v̄ − γ)T S−v (v̄ − γ) ≤ Fq, n−q, 1−α,
where Fq, n − q, 1 − α is the (1 − α)-quantile of the F distribution with degrees of freedom (q, n − q). In practice such a confidence region might have limited practical use due to extra uncertainty in the landmark selection or the choice of Procrustes metric.
Example 9.1 Consider the digit 3 data, described in Section 1.4.2, with k = 13 points in m = 2 dimensions on n = 30 objects. We might wish to examine whether the population mean shape could be an idealized template, such as that displayed in Figure 9.1, with equal sized loops and 12 of the landmarks lying equally spaced on two regular octagons (apart from landmark 7 in the middle). The template is taken as μ0 and the data are projected into the tangent plane with the pole at the full Procrustes mean. The q = 22 PC scores are retained and the squared Mahalanobis distance between v̄ and γ0 in the tangent space is ∑qj=1 sj2/λj = 47.727, and hence F = 17.356. Since P(F22, 8 > 17.356) ≈ 0.0002 we have very strong evidence that the population mean shape does not have the shape of this template.
Of more practical interest is a two sample test.
Consider two independent random samples X1, …, Xn1 and Y1, …, Yn2 from independent populations with mean shapes [μ1] and [μ2]. To test between

H0 : [μ1] = [μ2] versus H1 : [μ1] ≠ [μ2]

we could carry out a Hotelling’s T2 two sample test in the Procrustes tangent space, where the pole corresponds to the overall pooled full Procrustes mean shape (i.e. the full Procrustes mean shape calculated by GPA on all n1 + n2 individuals). Let v1, …, vn1 and w1, …, wn2 be the partial Procrustes tangent coordinates of the two samples (with the pooled mean as pole). The multivariate normal model proposed in the tangent space is

vi ∼ N(ξ1, Σ), i = 1, …, n1,  wj ∼ N(ξ2, Σ), j = 1, …, n2,

where the vi and wj are all mutually independent, and a common covariance matrix Σ is assumed. We write v̄, w̄ and Sv, Sw for the sample means and sample covariance matrices (with divisors n1 and n2) in each group. The squared Mahalanobis distance between v̄ and w̄ is:

D2 = (v̄ − w̄)T S−u (v̄ − w̄), (9.3)
where Su = (n1Sv + n2Sw)/(n1 + n2 − 2), and S−u is the Moore–Penrose generalized inverse of Su (see Definition 9.1). Under H0 we have ξ1 = ξ2, and we use the test statistic

F = (n1n2(n1 + n2 − q − 1))/((n1 + n2)(n1 + n2 − 2)q) D2. (9.4)

The test statistic has an Fq, n1+n2−q−1 distribution under H0. Hence, we reject H0 for large values of F.
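A matching sketch for the two sample statistic, under the same assumptions (V and W hold the tangent coordinates of the two groups, one row per individual, and mpinv() is as above):

# Sketch: two sample Hotelling's T^2 test in the tangent space
hotelling2 <- function(V, W, q) {
  n1 <- nrow(V); n2 <- nrow(W)
  d <- colMeans(V) - colMeans(W)
  # pooled Su = (n1 Sv + n2 Sw)/(n1 + n2 - 2), with Sv, Sw using divisor n_i
  Su <- (cov(V) * (n1 - 1) + cov(W) * (n2 - 1)) / (n1 + n2 - 2)
  D2 <- as.numeric(t(d) %*% mpinv(Su) %*% d)
  Fstat <- n1 * n2 * (n1 + n2 - q - 1) /
    ((n1 + n2) * (n1 + n2 - 2) * q) * D2
  list(F = Fstat, p.value = 1 - pf(Fstat, q, n1 + n2 - q - 1))
}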
Example 9.2 Consider the gorilla skull data described in Section 1.4.8. There are n1 = 30 female gorilla skulls and n2 = 29 male gorilla skulls, with k = 8 landmarks in two dimensions, and so there are q = 2k − 4 = 12 shape dimensions. The first three PCs explain 34.8, 22.9, 11.2% (females) and 42.2, 18.0, 12.4% (males) of the variability in each group.
> procGPA(gorf.dat)$percent[1:3]
[1] 34.79297 22.90900 11.25934
> procGPA(gorm.dat)$percent[1:3]
[1] 42.20271 17.96493 12.37879
A plot of the first PC for each group is given in Figure 9.2. The test requires equal covariance structures, an assumption that is unreasonable here, as shape variation invariably differs between the sexes in biology. However, we continue in order to illustrate the use of this two sample Hotelling test, before considering more reasonable nonparametric tests in Section 9.1.3.
The percentages of variability explained by the first three within group PCs are 37.3, 16.0, 14.7% and the first three PCs are included in Figure 9.3. In addition, we have no reason to doubt multivariate normality from the pairwise scatters of the standardized PC scores of the data (some of the PC scores are shown in Figure 9.4).
The observed test statistic (9.4) is F = 26.470 and since P(F12, 46 > 26.47) < 0.0001 we have very strong evidence that the mean shapes are different. So our conclusion would be that there is a significant difference in mean shape between the female and male gorilla skulls in the midline. The test can be carried out in R:
> resampletest(gorm.dat,gorf.dat,replace=TRUE)
Resampling...No of resamples = 200
Bootstrap - sampling with replacement
$H
[1] 26.47042
$H.pvalue
[1] 0.004975124
$H.table.pvalue
[1] 1.110223e-16
and here $H.table.pvalue gives the p-value of the test based on the F distribution, and $H.pvalue corresponds to a bootstrap test (see Section 9.1.3).
We can express the squared Mahalanobis distance of Equation (9.3) as:

D2 = ∑qj=1 sj2/λj,

where sj are the scores of the observed group difference v̄ − w̄ in the directions of the PCs (the eigenvectors of Su, with eigenvalues λj) (Kent 1997). Large values of sj2/λj indicate which directions of shape variability are associated with the difference between the groups. In our example the values of 1.0084 sj2/λj are:
Of the first few PCs (which explain most of the shape variability in the dataset) PC2 and, to a lesser extent, PC1 have high scores in the direction of the shape difference, namely 2.57 and 1.53. The later PCs are effectively just arbitrary choices of directions in shape space so no particular meaning or interpretation should be assigned to these.
Figure 9.4 displays some of the PC scores for the data and PC1 and PC2 give a good separation of the two groups. The higher PCs are effectively just arbitrary directions in shape space which explain very little variability. They have been displayed for mathematical curiosity but their utility in any practical analysis is negligible. From Figure 9.4 it is clear that the groups differ substantially in centroid size. The correlations of the PC scores with centroid size are:
and so the PCs that have a high contribution in the direction of the shape difference have a high correlation with centroid size, namely PC1, PC2 and PC9 with correlations − 0.48, −0.68 and 0.88, respectively. So there is clear allometry here – shape differences are associated with size differences (see Sections 5.7 and 9.5).
As seen above, there is an extremely large contribution from PC9 in terms of s2j/λj. We plot the first 3 PCs and PCs 9, 11 and 12 in Figure 9.3. There is nothing special about the number 9 here – the lower PCs are measuring very little shape variability and are essentially random directions in shape space which are orthogonal to the higher PCs. It just happens that PC9 correlates well with the observed shape difference between the two means. Another random sample might easily result in a different PC or several PCs having high correlation with the observed shape difference. The two groups are very different in size and in shape, and PC9 is also the most highly correlated with size.
Now that a significant mean shape difference has been found a biologist would be further interested in how size and shape are related. For example, if using size as a covariate how do the shapes of the gorillas differ after removing the effect of the covariate of size? Note that PC3 is not so highly correlated with size and this PC includes mainly braincase variability, so we might wish to investigate whether the non-size-related shape difference is mainly in the braincase. We would also wish to investigate other non-size related variation in all the PCs. Further methods for describing differences between mean shapes and exploring shape variability using thin-plate splines are given in Chapter 12.
The multivariate normal model in the tangent space may be unreasonable in some applications or the assumption of equal covariance matrices may be doubted, as in Example 9.2. Alternative non-parametric methods are a permutation test (Dryden and Mardia 1993; Good 1994; Bookstein 1997) or bootstrap test (Amaral et al. 2007), with the null hypothesis H0 that the groups have equal mean shapes. The permutation test requires the assumption that the groups of data are exchangeable under H0, so for example the covariance matrices must be equal in both groups, but the bootstrap test is less restrictive.
For a two sample permutation test the data are permuted into two groups of the same sizes as the groups in the data, and the test statistic is evaluated for all P possible permutations, giving values T1, …, TP. The rank r of the observed test statistic Tobs (in descending order among the Ti) is then used to give the p-value of the permutation test:

p = r/P.
Instead of evaluating all Ti, we can consider a number B (e.g. 200) of random permutations, and the procedure is then called a Monte Carlo test. The rank r of the observed test statistic among the B random permutations (counting Tobs itself) gives a p-value of:

p = r/(B + 1).
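A minimal sketch of this Monte Carlo procedure for a generic two sample statistic; here Tfun is a hypothetical user-supplied function of two tangent coordinate matrices (for instance the Hotelling statistic sketched in Section 9.1.1):

# Sketch: Monte Carlo permutation test with p-value r/(B + 1)
perm_test <- function(V, W, Tfun, B = 200) {
  n1 <- nrow(V)
  pooled <- rbind(V, W)
  Tobs <- Tfun(V, W)
  Tperm <- replicate(B, {
    idx <- sample(nrow(pooled), n1)    # random relabelling of the data
    Tfun(pooled[idx, , drop = FALSE], pooled[-idx, , drop = FALSE])
  })
  (sum(Tperm >= Tobs) + 1) / (B + 1)   # rank of T_obs over B + 1
}

For example, perm_test(V, W, function(V1, V2) hotelling2(V1, V2, q = 12)$F) would approximate the permutation p-value of the Hotelling statistic for 2D data with k = 8 landmarks.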
An alternative non-parametric test is a bootstrap test. In this procedure the data are sampled from the two groups with replacement, although it is important that the sampling is carried out under H0, that the mean shapes are equal. To ensure that sampling is carried out under H0 each group is translated to have a common mean before resampling. The test statistic is evaluated for B Monte Carlo replications T1, …, TB. One potential advantage of the bootstrap test is that it does not require exchangeability, and so the two groups could have different covariance matrices, for example.
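A companion sketch of the bootstrap version, in which each group of tangent coordinates is centred on its own sample mean before resampling, so that sampling takes place under H0; this tangent space centring is a simplified stand-in for the parallel translation used by Amaral et al. (2007):

# Sketch: bootstrap test under H0 (groups centred to a common mean)
boot_test <- function(V, W, Tfun, B = 200) {
  Vc <- scale(V, center = TRUE, scale = FALSE)   # centre group 1
  Wc <- scale(W, center = TRUE, scale = FALSE)   # centre group 2
  Tobs <- Tfun(V, W)
  Tboot <- replicate(B, {
    Tfun(Vc[sample(nrow(Vc), replace = TRUE), , drop = FALSE],
         Wc[sample(nrow(Wc), replace = TRUE), , drop = FALSE])
  })
  (sum(Tboot >= Tobs) + 1) / (B + 1)
}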
Permutation tests and bootstrap tests have been discussed by Amaral et al. (2007) in some detail for 2D shape analysis. The methods are implemented in R using resampletest, and can also be appropriate for large variations provided the sample sizes are large (we return to this issue in Chapter 13).
Example 9.3 Consider the 2D chimpanzee data of Section 1.4.8. There are n1 = 28 males and n2 = 26 females, each with k = 8 landmarks in m = 2 dimensions. We wish to test whether the mean population shapes for the two sexes are equal. After performing full GPA on the pooled dataset we transform to the tangent space coordinates of Equation (9.1). The dimension of the shape space is q = 12 [8 × 2 (coordinates) − 2 (location) − 1 (rotation) − 1 (size)]. Proceeding with the Hotelling’s T2 test we have F = 1.53 and P(F12, 41 > 1.53) = 0.153, and so there is no significant difference in mean shape.
A permutation test is also carried out. The data are randomly split into two groups of sizes 28 and 26. Out of 200 such permutations the observed F statistic of Equation (9.4) had rank 24, giving a p-value of 0.119, and so again there is no evidence for a difference in mean shape. However, for the above two tests the equal covariance assumption may not be reasonable.
Finally a bootstrap test, which does not require equal covariance matrices, is also carried out. The two full Procrustes means for each group are parallel translated to the common full Procrustes mean, and then samples of sizes 28 and 26 are taken with replacement. Details of the parallel translation are given by Amaral et al. (2007) and in Section 4.1.6. Out of 200 such replications the observed F statistic of Equation (9.4) had rank 35, giving a p-value of 0.174, and so again there is no evidence for a difference in mean shape.
The sample sizes are fairly small here and we might expect the Hotelling’s T2 test not to be very powerful.
The above example can be carried out in R as follows (with relevant output given):
ans <- resampletest(panm.dat,panf.dat,replace=FALSE)
Resampling...No of resamples = 200
Permutations - sampling without replacement
ans$H
[1] 1.530009
ans$H.pvalue
[1] 0.119403
ans$H.table.pvalue
[1] 0.1526978
ans<-resampletest(panm.dat,panf.dat,replace=TRUE)
Resampling...No of resamples = 200
Bootstrap - sampling with replacement
ans$H
[1] 1.530009
ans$H.pvalue
[1] 0.1741294
ans$H.table.pvalue
[1] 0.1526978
One practical issue with the resampletest implementation is that the tangent space and Procrustes registration are computed on each replication, and so instead we describe some fast approximate two sample tests, based on Czogiel (2010), which can easily be carried out using the command testmeanshapes in R. The fast permutation test carries out an initial pooled GPA and then the permutations are carried out without replacement on the Procrustes residuals. The fast bootstrap test carries out an initial pooled GPA and then the bootstrap replications are carried out with replacement on the Procrustes residuals calculated using each group mean. Both methods are fast algorithms as the Procrustes registration is carried out only once.
Example 9.4 Consider the 2D chimpanzee data from Example 9.3. The fast permutation and bootstrap test commands are:
> testmeanshapes(panm.dat,panf.dat,replace=FALSE)
Permutations - sampling without replacement: No of permutations = 1000
$H
[1] 1.53001
$H.pvalue
[1] 0.1468531
$H.table.pvalue
[1] 0.1526974
> testmeanshapes(panm.dat,panf.dat,replace=TRUE)
Bootstrap - sampling with replacement within each group under H0:
No of resamples = 1000
$H
[1] 1.53001
$H.pvalue
[1] 0.2177822
$H.table.pvalue
[1] 0.1526974
and hence in this example the Hotelling T2 statistic and tabular p-value are almost identical to those of Example 9.3; the permutation p-value is 0.147 and the bootstrap p-value is 0.218, which are similar to the values in Example 9.3. Hence, again we conclude that there is no evidence for a difference in mean shape between the male and female chimpanzees.
Example 9.5 We consider the macaques data of Section 1.4.3 as an example in three dimensions.
out <- testmeanshapes(macm.dat,macf.dat,replace=FALSE)
out$H
[1] 1.651495
out$H.pvalue
[1] 0.3786214
out$H.table.pvalue
[1] 0.3777721
Note that some regularization has been carried out by adding a small constant to the pooled sample covariance matrix, because the dimension is large compared with the small sample sizes. We see that the Hotelling T2 statistic is 1.65 with tabular p-value (from the F-distribution) of 0.378 and the permutation p-value (from 1000 permutations) of 0.379. Also, rather than a permutation test we can consider a bootstrap test, where the sampling is carried out with replacement and each group is moved to have a common mean before resampling by computing the tangent residuals from each sample mean. Hence sampling is carried out under H0 that the means are equal. The command in this case is:
testmeanshapes(macm.dat,macf.dat,replace=TRUE)
and the bootstrap p-value is 0.673. So, for all these tests the conclusion is the same that there is no evidence for a difference in the mean shapes for the male and female macaques. At this point the statistician should consult with the scientist further to enumerate those features that the literature has indicated might differentiate the groups, e.g. biological function, such as diet.
Some further studies of morphological variation in primates, humans and hominid fossils include: O’Higgins and Jones (1998); Bookstein et al. (1999); O’Higgins (2000); and Mitteroecker et al. (2004).
Further inference, such as testing the equality of the mean shapes in several groups, proceeds in a similar manner. An overall pooled full Procrustes mean is taken as the pole and multivariate analysis of variance (MANOVA) (e.g. see Mardia et al. 1979, p. 333) is carried out on the Procrustes tangent coordinates. General linear models could be proposed in the tangent space and the full armoury of multivariate data analysis can be used to analyse shape data, provided variations are small.
In some datasets there are few observations and possibly many landmarks on each individual. Although inference can be carried out in a suitable tangent space there is often a problem with the space being over-dimensioned. For example, a Hotelling’s T2 test may not be very powerful unless there are a large number of observations available, or it may not be possible to invert the covariance matrices in the procedures. A solution is to carry out some form of regularization, and there are many possible choices. For example one could perform a PCA on the pooled datasets and retain the first few PC scores, although there are obvious dangers, particularly if a true group difference is orthogonal to the first few PCs. An alternative approach, which is used in the testmeanshapes procedure in R, is to add a small multiple of the identity matrix (0.000001 in fact) before taking the inverse. The precise choice of constant will of course make a practical difference. Other types of regularization might include assuming the inverse covariance matrix is a sparse matrix, and methods such as the graphical LASSO would be appropriate in this case (Meinshausen and Bühlmann 2006; Friedman et al. 2008). Further discussion of choices of regularization for permutation and bootstrap tests was given by Dryden et al. (2014), particularly in the context of MDS estimators (which are discussed in Section 15.3).
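For instance, a minimal sketch of this ridge-type adjustment (the constant 1e-6 is the one quoted above; exactly where testmeanshapes applies it is not reproduced here):

# Sketch: regularized inverse of a possibly singular covariance matrix
reg_inv <- function(S, eps = 1e-6) {
  solve(S + eps * diag(nrow(S)))   # add eps * I before inverting
}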
Another simple approach to statistical inference is to work with statistics based on squared Procrustes distances, which is equivalent to assuming that the covariance matrix is proportional to the identity matrix. Goodall (1991) considered such an approach using approximate chi-squared distributions, following the work of Sibson (1978, 1979) and Langron and Collins (1985). The underlying model is that configurations are isotropic normal perturbations from mean configurations, and the squared Procrustes distances then have approximate chi-squared distributions. The procedures require a much more restrictive isotropic model than the methods of the previous section, and the assumption often does not hold in practice (e.g. Bookstein 2014). Non-parametric versions of Goodall’s procedures can be particularly effective, and do not require such strict assumptions. Important point: For a preliminary analysis we will assume isotropy.
We consider first the case when a random sample of n observations X1, …, Xn (each a k × m matrix) is taken from an isotropic normal model with mean μ and transformed by an additional location, rotation and scale, that is

Xi = βi(μ + Ei)Γi + 1kγiT, vec(Ei) ∼ N(0, σ2Ikm), (9.5)

where βi > 0 (scale), Γi ∈ SO(m) (rotation) and γi ∈ ℝm (translation), and σ is small.
The following approximate analysis of variance (ANOVA) identity holds for μ0 close to μ̂ and small σ:

∑ni=1 dF2(Xi, μ0) ≈ ∑ni=1 dF2(Xi, μ̂) + n dF2(μ̂, μ0),

where μ̂ is the full Procrustes mean and dF is the full Procrustes distance of Equation (4.10). The proof can be seen using Taylor series expansions. Note the similarities with ANOVA in classical regression analysis – the left-hand side of the equation is like a total sum of squares and the right-hand side is like the residual sum of squares plus the explained (regression) sum of squares.
Consider testing between H0: [μ] = [μ0] and H1: [μ] ≠ [μ0]. Under the null model it can be shown (to second-order terms in Ei) that approximately

dF2(Xi, μ0) ∼ τ02 χ2q,

independently for i = 1, …, n, where q = km − m − m(m − 1)/2 − 1 is the dimension of the shape space, τ0 = σ/δ0, and δ0 = S(μ0) = ||Cμ0|| is the centroid size of μ0. The proof can be obtained by Taylor series expansions, after Sibson (1978), and the proof for the m = 2 dimensional case is seen from Equation (10.14), when discussing the complex Watson distribution.

From the additive property of independent chi-squared distributions,

∑ni=1 dF2(Xi, μ0) ∼ τ02 χ2nq.

In addition, since q parameters are estimated in μ̂ we have:

∑ni=1 dF2(Xi, μ̂) ∼ τ02 χ2(n−1)q,

and this is approximately independent of

n dF2(μ̂, μ0) ∼ τ02 χ2q,

again using the additive property of independent chi-squared distributions. So, under H0 we have the approximate result

F = n(n − 1) dF2(μ̂, μ0) / ∑ni=1 dF2(Xi, μ̂) ∼ Fq, (n−1)q. (9.8)

This is valid for small σ and μ0 close to μ̂, and so we reject H0 for large values of this test statistic. We call the test the one sample Goodall’s test, after Goodall (1991).
If τ0 is small, Ei is isotropic (but not necessarily normal) and nq is large, then approximately

F ∼ Fq, (n−1)q,

by applying the central limit theorem.
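A hedged sketch of Goodall’s one sample test using the shapes package, with the full Procrustes distance dF computed as the sine of the Riemannian distance from riemdist (X is a k × m × n array, mu0 a k × m matrix and q the shape dimension):

# Sketch: one sample Goodall test of Equation (9.8)
library(shapes)
goodall1 <- function(X, mu0, q) {
  n <- dim(X)[3]
  muhat <- procGPA(X)$mshape   # full Procrustes mean
  rss <- sum(apply(X, 3, function(x) sin(riemdist(x, muhat))^2))
  Fstat <- n * (n - 1) * sin(riemdist(muhat, mu0))^2 / rss
  list(F = Fstat, p.value = 1 - pf(Fstat, q, (n - 1) * q))
}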
The test based on the isotropic model can be seen as a special case of the Hotelling’s T2 procedure of Section 9.1.1. If we replace Sv with s2vI2k−2, where s2v is the unbiased estimate of variance, then the Mahalanobis distance of Equation (9.2) becomes

D2 = ||v̄ − γ0||2/s2v

from Equation (4.30). Now ||v̄ − γ0||2 ≈ dF2(μ̂, μ0) and s2v is proportional to ∑ni=1 dF2(Xi, μ̂) for small variations, and hence the one sample Hotelling’s T2 test statistic is proportional to:

dF2(μ̂, μ0) / ∑ni=1 dF2(Xi, μ̂),

and the one sample Hotelling’s T2 test under the isotropic model is identical to using the F statistic of Equation (9.8).
Consider independent random samples X1, …, Xn1 from a population modelled by Equation (9.5) with mean μ1, and Y1, …, Yn2 from Equation (9.5) with mean μ2. Both populations are assumed to have a common variance σ2 for each coordinate. We wish to test H0: [μ1] = [μ2]( = [μ0]), say, against H1: [μ1] ≠ [μ2]. Let [μ̂1] and [μ̂2] be the full Procrustes mean shapes of each sample, with icons μ̂1 and μ̂2. Under H0, with σ small, the squared Procrustes distances are approximately distributed as:

∑n1i=1 dF2(Xi, μ̂1) ∼ τ02 χ2(n1−1)q,  ∑n2j=1 dF2(Yj, μ̂2) ∼ τ02 χ2(n2−1)q,  dF2(μ̂1, μ̂2) ∼ τ02 (1/n1 + 1/n2) χ2q,

where τ0 = σ/δ0 and δ0 = S(μ0). Again, proofs of the results can be obtained using Taylor series expansions. In addition these statistics are approximately mutually independent (exactly in the case of the first two expressions). Hence, under H0 we have the approximate distribution

F = ((n1 + n2 − 2) n1n2/(n1 + n2)) dF2(μ̂1, μ̂2) / (∑n1i=1 dF2(Xi, μ̂1) + ∑n2j=1 dF2(Yj, μ̂2)) ∼ Fq, (n1+n2−2)q, (9.9)

and again this result is valid for small σ. We reject H0 for large values of this test statistic. We call the test the two independent sample Goodall’s test after Goodall (1991).
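Similarly, a sketch of the two sample version using the shapes package (X1 and X2 are k × m × ni arrays and q is the shape dimension; again dF is taken as the sine of the Riemannian distance):

# Sketch: two sample Goodall test of Equation (9.9)
library(shapes)
goodall2 <- function(X1, X2, q) {
  n1 <- dim(X1)[3]; n2 <- dim(X2)[3]
  m1 <- procGPA(X1)$mshape; m2 <- procGPA(X2)$mshape
  ss1 <- sum(apply(X1, 3, function(x) sin(riemdist(x, m1))^2))
  ss2 <- sum(apply(X2, 3, function(x) sin(riemdist(x, m2))^2))
  Fstat <- (n1 + n2 - 2) * (n1 * n2 / (n1 + n2)) *
    sin(riemdist(m1, m2))^2 / (ss1 + ss2)
  list(F = Fstat, p.value = 1 - pf(Fstat, q, (n1 + n2 - 2) * q))
}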
Example 9.6 Consider the 2D chimpanzee data from Example 9.3. The fast permutation and bootstrap test command testmeanshapes in R gives the Goodall statistic with p-values (only the relevant output is given):
> testmeanshapes(panm.dat,panf.dat,replace=FALSE)
Permutations - sampling without replacement: No of permutations = 1000
$G
[1] 2.591273
$G.pvalue
[1] 0.02197802
$G.table.pvalue
[1] 0.002276534
> testmeanshapes(panm.dat,panf.dat,replace=TRUE)
Bootstrap - sampling with replacement within each group under H0:
No of resamples = 1000
$G
[1] 2.591273
$G.pvalue
[1] 0.01698302
$G.table.pvalue
[1] 0.002276534
From the output using the Goodall test the tabular p-value is P(F12, 624 > 2.591) ≈ 0.002, the permutation p-value is 0.022 and the bootstrap p-value is 0.017. The isotropy assumption is unlikely to be reasonable here, so the nonparametric tests should carry more weight. The Goodall test is more powerful than the Hotelling T2 test, under either the parametric or nonparametric approach, as there are many fewer parameters to estimate in the covariance matrix of the tangent coordinates.
Example 9.7 Consider the schizophrenia data described in Section 1.4.5. We wish to test whether the mean shapes of brain landmarks are different in the two groups of control subjects and schizophrenia patients. There are k = 13 landmarks in m = 2 dimensions. The Procrustes rotated data for the groups are displayed in Figure 9.5.
The percentages of variability explained by the first three PCs are 31.6, 21.4, 13.2% for the controls and 27.1, 21.7, 4.8% for the schizophrenia patients. Box’s M test was carried out and there is some evidence against equal covariance matrices. The root mean square of dF in each group is 0.068 in the controls and 0.073 in the schizophrenia group. It is unreasonable to assume isotropy for any practical biological landmark data, and so the nonparametric versions of the tests will be preferred.
The mean configurations are displayed in Figure 9.6. The full Procrustes distance dF between the mean shapes is 0.038. The sum of squared full Procrustes distances from each configuration to its mean shape is 0.140 and so the F statistic is 1.89. Since P(F22, 572 ≥ 1.89) ≈ 0.01 we have evidence for a significant difference in shape. However, the assumptions of this test are unreasonable and so we next consider the nonparametric versions.
Following Bookstein (1997) we also consider a Monte Carlo test, as described in Section 9.1.3, based on 999 random permutations. The configurations are randomly assigned into each of the two groups, the F statistic is calculated and the proportion of times that the resulting F statistic exceeds the observed value of 1.89 gives the p-value for the test. From 999 random permutations we obtained a p-value of 0.04. Hence, we have some evidence that the mean configurations are different in shape, but with a larger p-value than for the isotropy based tests.
If we carry out a Hotelling’s T2 test in the tangent space we have F = 0.834 which is near the centre of the null distribution [P(F22, 5 > 0.834) = 0.66]. So, the Hotelling’s T2 provides no evidence for a shape difference. We should be aware that the Hotelling T2 test is expected to be less powerful when the isotropic model holds: power is lost because many degrees of freedom are used in estimating the covariance matrix in the Hotelling’s T2 test.
Consider the permutation test using the command resampletest. For the schizophrenia data the command for the permutation test is:
resampletest(schizophrenia.dat[,,1:14],
schizophrenia.dat[,,15:28],replace=FALSE)
Random samples of size n1 = 14 and n2 = 14 are taken without replacement from the pooled dataset. The new groups are then registered using Procrustes analysis and the group mean shapes are estimated. The observed squared full Procrustes distance is then compared with the simulated distribution under H0, and the p-value is calculated. The bootstrap test is also carried out, as described in Section 9.1.3, and both of these tests use the R command resampletest. In addition the fast approximations to the procedures are also carried out using testmeanshapes.
For the schizophrenia data the permutation p-values for the Goodall statistics are 0.049 (resampletest) and 0.055 (testmeanshapes). For the bootstrap-based test the p-values are 0.079 (resampletest) and 0.067 (testmeanshapes). Hence, this is an interesting example where the evidence is somewhat on the boundary of significance. Later analysis by Bookstein (2000) using the method of creases found a localized difference in a region corresponding to a known drug effect, with p-value approximately 0.001.
Note that the test based on the isotropic model can be seen as a special case of the two sample Hotelling’s T2 procedure of Section 9.1.2. If we replace Su with s2uI2k−2, where s2u is the unbiased estimate of variance, then the Mahalanobis distance of Equation (9.3) becomes:

D2 = ||v̄ − w̄||2/s2u

from Equation (4.31). Now ||v̄ − w̄||2 ≈ dF2(μ̂1, μ̂2) and s2u is proportional to ∑n1i=1 dF2(Xi, μ̂1) + ∑n2j=1 dF2(Yj, μ̂2) for small variations, and so the two sample Hotelling’s T2 test statistic is proportional to:

dF2(μ̂1, μ̂2) / (∑n1i=1 dF2(Xi, μ̂1) + ∑n2j=1 dF2(Yj, μ̂2)).

Hence the Hotelling’s T2 test under the isotropic model becomes identical to using the F statistic of Equation (9.9).
A variety of other choices of test statistic are available in the R function resampletest, including the James statistic (James 1954; Seber 1984, p. 115) given by

TJ = (v̄ − w̄)T(Sv/(n1 − 1) + Sw/(n2 − 1))−(v̄ − w̄),
and the asymptotically pivotal statistic λmin given by Amaral et al. (2007). The λmin statistic is only appropriate for 2D shapes, but the James statistic is appropriate for 3D data using the fast routine testmeanshapes. The tests have performed well in a range of simulation studies, with the bootstrap test having an advantage over the permutation test when covariance matrices are unequal (Amaral et al. 2007).
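A sketch of the James statistic in the tangent space, using mpinv() from Definition 9.1; the covariance divisors below are an assumption of this sketch and may differ from those used inside resampletest:

# Sketch: James (1954) statistic with separate group covariances
james <- function(V, W) {
  n1 <- nrow(V); n2 <- nrow(W)
  d <- colMeans(V) - colMeans(W)
  G <- cov(V) / n1 + cov(W) / n2   # no pooling: covariances may differ
  as.numeric(t(d) %*% mpinv(G) %*% d)
}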
We consider the 2D chimpanzee data of Example 9.3. The relevant results from the resampletest command are:
> resampletest(panm.dat,panf.dat,replace=FALSE)
Resampling...No of resamples = 200
Permutations - sampling without replacement
$lambda
[1] 30.577
$lambda.pvalue
[1] 0.0199005
$lambda.table.pvalue
[1] 0.002284522
$J
[1] 23.45578
$J.pvalue
[1] 0.1243781
$J.table.pvalue
[1] 0.112
resampletest(panm.dat,panf.dat,replace=TRUE)
Resampling...No of resamples = 200
Bootstrap - sampling with replacement
$lambda
[1] 30.577
$lambda.pvalue
[1] 0.0199005
$lambda.table.pvalue
[1] 0.002284522
$J
[1] 23.45578
$J.pvalue
[1] 0.1691542
$J.table.pvalue
[1] 0.112
The James test is also available as a fast permutation and bootstrap routine in testmeanshapes:
> testmeanshapes(panm.dat,panf.dat,replace=FALSE)
Permutations - sampling without replacement: No of permutations = 1000
100 200 300 400 500 600 700 800 900 1000
$J
[1] 23.45601
$J.pvalue
[1] 0.1448551
$J.table.pvalue
[1] 0.112
> testmeanshapes(panm.dat,panf.dat,replace=TRUE)
Bootstrap - sampling with replacement within each group under H0:
No of resamples = 1000
100 200 300 400 500 600 700 800 900 1000
$J
[1] 23.45601
$J.pvalue
[1] 0.2127872
$J.table.pvalue
[1] 0.112
We see that the results of the James test are similar to those of the Hotelling T2 test of Example 9.3 and Example 9.4 with no significant difference between the mean shapes. However, the λmin test result is similar to the Goodall test where the conclusion is that there is a significant difference in mean shape. The Goodall test is more powerful than the James test if the assumptions are reasonable as there are many fewer parameters to estimate in the covariance matrix of the tangent coordinates.
Consider a balanced MANOVA with independent random samples (Xi1, ..., Xin)T, i = 1, ..., nG, from nG groups, each of size n. Let μ̂i be the group full Procrustes means and μ̂ the overall pooled full Procrustes mean shape. A suitable test statistic (Goodall 1991) is:

F = (∑nGi=1 n dF2(μ̂i, μ̂)/(nG − 1)) / (∑nGi=1 ∑nj=1 dF2(Xij, μ̂i)/(nG(n − 1))).

Under the null hypothesis of equal means the approximate distribution of F is F(nG−1)q, nG(n−1)q, where q = (k − 1)m − m(m − 1)/2 − 1, and the null hypothesis is rejected for large values of the statistic. Since the statistic reduces approximately to that of Equation (9.9) when nG = 2 and n1 = n2 = n, the two sample test of the previous section (with n1 = n2) is a special case.
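A hedged sketch of this balanced MANOVA statistic (not the testmeanshapes implementation), assuming a k × m × (n nG) array X, a vector g of group labels with n observations per group, and the shape dimension q:

# Sketch: Goodall-type shape MANOVA for nG balanced groups
library(shapes)
goodall_manova <- function(X, g, q) {
  labs <- unique(g); nG <- length(labs); n <- sum(g == labs[1])
  muhat <- procGPA(X)$mshape               # pooled full Procrustes mean
  ssb <- ssw <- 0
  for (j in labs) {
    Xj <- X[, , g == j, drop = FALSE]
    mj <- procGPA(Xj)$mshape               # group full Procrustes mean
    ssb <- ssb + n * sin(riemdist(mj, muhat))^2
    ssw <- ssw + sum(apply(Xj, 3, function(x) sin(riemdist(x, mj))^2))
  }
  Fstat <- (ssb / (nG - 1)) / (ssw / (nG * (n - 1)))
  list(F = Fstat, p.value = 1 - pf(Fstat, (nG - 1) * q, nG * (n - 1) * q))
}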
Tangent space inference for size-and-shape can proceed in a very similar manner as for shape space. Writing v for the size-and-shape tangent coordinates the test statistics are exactly the same as in the pure shape case for the Hotelling T2, Goodall and James statistics of Section 9.1 except q = (k − 1)m − m(m − 1)/2 is one larger due to the higher dimension of the size-and-shape space.
We can carry out the tests using the option scale=FALSE in testmeanshapes. For the 2D chimpanzee data of Example 9.3, testing for mean size-and-shape differences between males and females, we have:
> testmeanshapes(panf.dat,panm.dat,scale=FALSE,replace=FALSE)
Permutations - sampling without replacement: No of permutations = 1000
$H
[1] 2.227968
$H.pvalue
[1] 0.02297702
$H.table.pvalue
[1] 0.02836717
$G
[1] 5.158083
$G.pvalue
[1] 0.001998002
$G.table.pvalue
[1] 2.802277e-08
$J
[1] 34.3913
$J.pvalue
[1] 0.01898102
$J.table.pvalue
[1] 0.014
> testmeanshapes(panf.dat,panm.dat,scale=FALSE,replace=TRUE)
Bootstrap - sampling with replacement within each group under H0:
No of resamples = 1000
$H
[1] 2.227968
$H.pvalue
[1] 0.05194805
$H.table.pvalue
[1] 0.02836717
$G
[1] 5.158083
$G.pvalue
[1] 0.001998002
$G.table.pvalue
[1] 2.802277e-08
$J
[1] 34.3913
$J.pvalue
[1] 0.04995005
$J.table.pvalue
[1] 0.014
Hence for all tests there is evidence for a difference between the mean size-and-shapes of the female and male chimpanzees. If we look specifically at centroid size we see that there is a significant difference in mean centroid size:
> t.test(centroid.size(panf.dat),centroid.size(panm.dat))
Welch Two Sample t-test
data: centroid.size(panf.dat) and centroid.size(panm.dat)
t = -3.8176, df = 50.704, p-value = 0.0003682
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.887324 -3.382205
sample estimates:
mean of x mean of y
196.0349 203.1697
with the females being smaller on average than the males. The next stage of the analysis would be to study whether the sexual dimorphism of shape aligns with any suggested allometry.
Also, for analysis of variance for size-and-shape we can use the statistic:

F = (∑nGi=1 n dS2(μ̂i, μ̂)/(nG − 1)) / (∑nGi=1 ∑nj=1 dS2(Xij, μ̂i)/(nG(n − 1))) ∼ F(nG−1)q, nG(n−1)q, (9.12)

where dS is the size-and-shape distance, μ̂i and μ̂ are the respective group and overall size-and-shape estimators, and q = (k − 1)m − m(m − 1)/2. We provide an example of using this test in Section 14.
Here we describe a case study to illustrate the size-and-shape tests. This application was described originally by Mardia (2013b) and illustrates the use of a method called TorusDBN (Boomsma et al. 2008) in mutation studies. In particular the application is from a study by Airoldi et al. (2010), who investigated some aspects of evolution by conducting a mutation experiment on two test species of flowering plants. A basic step in the mutation process is the changing of the amino acids in a sequence through the insertion or deletion of an amino acid, the positions of which are referred to as gaps (Durbin et al. 1998, p. 130). Airoldi et al. (2010) obtained in their experiment five protein sequences: two original sequences, to be called the AG and FAR sequences, and three mutant sequences AG+Q, AG+R and FAR-Q with insertion or deletion of one amino acid. The study involves five fragments, displayed below, one from each of these five protein sequences, that are 12 or 13 amino acids in length (the hyphen in each sequence denotes a gap).
AG fragment sequences:
DYMQKR-EVDLHN (AG) Original DYMQKREVDLHN
DYMQKRQEVDLHN (AG+Q) Insertion of Q
DYMQKRREVDLHN (AG+R) Insertion of R.
FAR fragment sequences:
EYMQKRQEIDLHH (FAR) Original
EYMQKR-EIDLHH (FAR-Q) Deletion of Q.
For brevity, we will denote these five sequences by 1, …, 5, respectively. The inference based on the sequences by Airoldi et al. (2010) is that the sequences 1 and 5 have the same functionality, and so do each of the pairs 2 and 3; 2 and 4; 3 and 4. So the key functional pairs are:

(1, 5), (2, 3), (2, 4) and (3, 4).
The conclusions of Airoldi et al. (2010) were based only on the sequences as there is no knowledge of their true structures. There are many sequences of amino acids where the structures are still unknown. In fact, the site UniParc http://www.uniprot.org/uniparc/ has about 32 million protein sequences, but on the other hand there are only about 85 000 known structures in the Protein Data Bank (PDB). However, the function of a protein is usually determined by its structure rather than its sequence (e.g. see Tramontano 2006, p. 1) and we use TorusDBN to explore inference based on the structure. Mardia (2013b) simulated 100 structures (3D coordinates) from TorusDBN (Boomsma et al. 2008) given each of the five fragments, and these 500 simulated configurations are available in the Supplementary Material of Mardia (2013b). Note that these fragments carry local functional information as they are selected around the gap region in the protein sequences which gave rise to physical changes in the plants (Airoldi et al. 2010).
The sequences 1 and 5 have 12 amino acids but the sequences 2, 3 and 4 have 13 amino acids. One plausible way to carry out inference on functionality through these structures is to treat each point of a configuration as labelled by its position (termed a landmark, so there is correspondence across the configurations). Also, not all fragments have the same length; therefore, for a meaningful comparison we compare the N and C terminal portions on both sides of the point mutation, that is, we use two sets of six landmarks: Set 1 the first six, Set 2 the last six. (The N-terminus refers to the ‘start’ of a protein and the C-terminus to the ‘end’ of a protein.)
So, there are 5 groups of n = 100 configurations, j = 1, …, 5 for each of two sets of data (Set 1: N-terminus; Set 2: C-terminus). The observations each have k = 6 landmarks in m = 3 dimensions. It is of interest to compare the size-and-shape (form) of the molecules, and examine which pairs are closer.
First, we assess their similarities using the RMSD as follows. We calculate the Procrustes mean size-and-shape of each group by GPA for size-and-shape, as in Section 7.5.1. Then we obtain the RMSD between each pair of means, leading to a 5 × 5 form distance matrix; Table 9.1 and Table 9.2 give these distance matrices for the two sets and Figure 9.7 gives, for each set, a corresponding dendrogram (using single linkage clustering). For Set 1 the dendrogram shows no clusters, whereas for Set 2 the dendrogram highlights the relative clustering of the pairs (2,3), (2,4), (3,4) and (1,5) under discussion.
Table 9.1 The RMSD between Procrustes mean forms in Set 1.
  | 1 | 2     | 3      | 4      | 5
1 | − | 0.004 | 0.0076 | 0.0139 | 0.0112
2 |   | −     | 0.0044 | 0.0166 | 0.0157
3 |   |       | −      | 0.0291 | 0.0221
4 |   |       |        | −      | 0.0047
5 |   |       |        |        | −
Table 9.2 The RMSD between Procrustes mean forms in Set 2.
  | 1 | 2      | 3      | 4      | 5
1 | − | 0.0905 | 0.0901 | 0.1595 | 0.0181
2 |   | −      | 0.0046 | 0.0095 | 0.1062
3 |   |        | −      | 0.0200 | 0.1114
4 |   |        |        | −      | 0.2067
5 |   |        |        |        | −
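As a rough sketch of this computation (not the code of Mardia 2013b), assuming the five groups of simulated configurations are held in a hypothetical list frags of 6 × 3 × 100 arrays; ssriemdist in the shapes package gives the size-and-shape distance between the mean forms, and the RMSD is taken here as that distance divided by the square root of k (an assumption of this sketch):

# Sketch: RMSD matrix between mean forms and single linkage dendrogram
library(shapes)
k <- 6
means <- lapply(frags, function(x) procGPA(x, scale = FALSE)$mshape)
D <- matrix(0, 5, 5)
for (i in 1:5) for (j in 1:5)
  D[i, j] <- ssriemdist(means[[i]], means[[j]]) / sqrt(k)
plot(hclust(as.dist(D), method = "single"))   # dendrogram as in Figure 9.7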
Next, for each set, we formulate a plausible perturbation model, and we can proceed one step further by defining and testing the null hypothesis as follows. The plausible model for this ANOVA situation is:

Xij = (μj + Eij)Γij + 1kγijT, i = 1, …, n, j = 1, …, 5,

where Xij are the configuration matrices (k × m), μ is an overall mean configuration, μj is the jth mean configuration, Eij are independent Gaussian with zero means, Γij are rotations and γij are translations, and 1k is the vector (1, 1, …, 1)T; Γij, γij are the nuisance parameters. We test the null hypothesis that the size-and-shapes of the means are equal:

H0 : [μ1]S = ⋯ = [μ5]S ( = [μ]S),

where [ · ]S denotes size-and-shape.
MANOVA can then be applied to these preprocessed configurations, where each configuration matrix is stacked as a vector having 3k − 6 elements. Note that 3k − 6 is the number of variables left after removing the three degrees of freedom for translation and three for rotation. Furthermore, we use the Goodall statistic for size-and-shape (9.12); that is, we have the F statistic of Equation (9.12) with nG = 5 groups each of size n = 100 and q = 3k − 6 = 12.
It is found that the null hypothesis of equal means has almost zero p-value for both sets, but Set 2 has a relatively larger value of F, so this again indicates that these fragments have larger mean differences.
We now examine the Hotelling T2 statistic in the Procrustes tangent space for all the pairs, and Table 9.3 and Table 9.4 give the p-values from the bootstrap for Set 1 and Set 2, respectively, under the null hypothesis. The p-values from the bootstrap are computed in R; 1000 bootstrap samples were used. For Set 2, only the same four key pairs (1,5), (2,3), (2,4) and (3,4) have large p-values in testing the same forms, but the rest of the combinations have almost zero p-values. Again, there is no clear pattern in Set 1. So we arrive at the same evidence as through using the RMSD. Note that the RMSD in Table 9.1 implicitly assumes isotropic errors, whereas the T2 statistic allows for correlated errors so the tests are more stringent.
Table 9.3 Set 1: Pairwise F values corresponding to Hotelling’s T2 statistics for the five structures with the bootstrap p-values.
Pair | F | p-value | Pair | F | p-value |
1,2 | 1.09 | 0.40 | 2,4 | 1.57 | 0.14 |
1,3 | 1.37 | 0.22 | 2,5 | 2.11 | 0.03 |
1,4 | 1.99 | 0.04 | 3,4 | 2.37 | 0.02 |
1,5 | 1.30 | 0.25 | 3,5 | 1.69 | 0.12 |
2,3 | 0.92 | 0.56 | 4,5 | 1.76 | 0.09 |
Table 9.4 Set 2: Pairwise F values corresponding to Hotelling’s T2 statistics for the five structures with the bootstrap p-values.
Pair | F | p-value | Pair | F | p-value |
1,2 | 4.98 | 0.00 | 2,4 | 0.78 | 0.65 |
1,3 | 4.38 | 0.00 | 2,5 | 8.00 | 0.00 |
1,4 | 8.38 | 0.00 | 3,4 | 1.83 | 0.06 |
1,5 | 1.32 | 0.20 | 3,5 | 5.95 | 0.00 |
2,3 | 0.49 | 0.91 | 4,5 | 12.96 | 0.00 |
In passing, we note that from TorusDBN we find that the predicted secondary structures of the fragment sequences are α-helices, which is consistent with prediction from established deterministic software such as PHYRE2 (Kelley et al. 2015).
However, the analysis given here through the structures themselves reveals more about their shape differences. To conclude, all the analysis indicates that there is no pattern in Set 1 (N-terminus) but Set 2 (C-terminus) shows a strong effect. Also for Set 2, the dendrogram and the results from the Hotelling’s T2 tests are in accordance with the conclusions drawn by Airoldi et al. (2010). The mutation affects primarily the C-terminus and, as these results are similar to sequence analysis, the structural effects are local rather than global.
An alternative method to Procrustes registration is to work directly with the edge registration coordinates such as Bookstein or Kendall or QR coordinates, and then use standard multivariate techniques on the vectors of coordinates. Kent (1994) showed that there exists an approximate linear transformation between Bookstein shape coordinates and the partial Procrustes tangent coordinates, and hence multivariate normal-based inference will be similar using either method for small variability.
However, edge registration in Bookstein coordinates induces correlations between landmarks, as we will see in Corollary 11.3. Hence it is not advisable to use Bookstein coordinates for exploring the structure of shape variability through PCA, as this can be misleading (Kent 1994). For example, if the original landmark coordinates are isotropic then the Bookstein shape variables may be far from isotropic. However, the Procrustes tangent coordinates do not suffer from this problem for 2D data.
Examples of using edge based shape coordinates for inference include Bookstein (1991, p. 282) and Bookstein and Sampson (1990), who describe Hotelling’s T2 tests for shape difference using Bookstein coordinates (see Section 9.4), and they also consider testing for affine or uniform shape changes (see Section 13.5.1). Also, O’Higgins and Dryden (1993) use Hotelling’s T2 tests for investigating sexual dimorphism in great apes. One advantage of the approach is that localized shape differences relative to a baseline can be explored, for example O’Higgins and Dryden (1993) investigate the differences between the face and braincase regions in chimpanzees, gorillas and orangutans, and take the baseline as the line between nasion and basion in the data described in Section 1.4.8.
Allometry can also be investigated in our geometrical framework, using regression of shape on size, and an initial discussion was given in Section 5.7. In biology allometry is a far more detailed topic than we can study here. Allometry often involves regressions of shape on size and other predictors and their interactions, such as sex.
Example 9.8 Consider the T2 Small mouse vertebra data of Section 1.4.1 as a simplified, illustrative example. In Figure 9.8 we see plots of centroid size versus the first three PC scores. There appears to be a positive correlation between the first PC score and size, and the correlation is 0.74. When examining the plot of Figure 8.5 it can be seen that the first PC partly measures the length of the spinous process (top protrusion). Therefore a shape measure capturing this effect is associated with centroid size in the dataset. From a biological point of view the association between size of the spinous process and the overall size of the bone is reasonable, because the spinous process bears important muscles.
In the following R code we carry out Procrustes analysis, provide pairwise scatterplots, and fit a linear regression model of centroid size on the PC scores:
> data<-qset2.dat
> n<-dim(data)[3]
> k<-dim(data)[1]
> mice<-procGPA(data)
> pairs(cbind(mice$size,mice$scores[,1:3]),
labels=c("size","PC1","PC2","PC3"))
> Y<-mice$size
> X<-mice$scores[,1:(2*k-4)]
> ans0<-lm(Y~X)
> summary(ans0)
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-4.4085 -2.0525 -0.3534 1.8196 7.7677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 173.82121 0.81607 212.997 < 2e-16 ***
XPC1 0.46823 0.08862 5.284 0.000116 ***
XPC2 -0.34271 0.23535 -1.456 0.167412
XPC3 0.09082 0.26193 0.347 0.733937
XPC4 -0.47672 0.28634 -1.665 0.118143
XPC5 0.11087 0.44854 0.247 0.808357
XPC6 0.64310 0.47654 1.350 0.198591
XPC7 -0.48876 0.58193 -0.840 0.415081
XPC8 -1.07219 0.89208 -1.202 0.249338
—
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.914 on 14 degrees of freedom
Multiple R-squared: 0.7253, Adjusted R-squared: 0.5683
F-statistic: 4.62 on 8 and 14 DF, p-value: 0.00624
We see that the only PC that is significantly related to centroid size is PC1. We can also fit other types of regression models, exactly as we would do for Euclidean data. For example, we can fit least squares, ridge regression (Hoerl and Kennard 1970), LASSO (Tibshirani 1996) and Elastic Net (Zou and Hastie 2005) models of centroid size versus PC scores, and calculate the within sample mean square prediction error. We use the glmnet library in R as follows:
> library(glmnet)
> Xtest<-X ; Ytest<-Y # within sample: predict at the training data
> #LS
> ans<-as.double(coef(glmnet(X,Y,alpha=0,lambda=0)))
> print(ans)
[1] 173.82120674 0.46823144 -0.34270710 0.09082315 -0.47672340
[6] 0.11086931 0.64310197 -0.48875932 -1.07219378
> out.ls<-glmnet(X,Y,alpha=0)
> ls.pred<-predict(out.ls,s=0,newx=Xtest)
> mean((ls.pred-Ytest)^2)
[1] 9.440591
>
> #Ridge
> out.ridge<-glmnet(X,Y,alpha=0)
> out<-cv.glmnet(X,Y,alpha=0,grouped=FALSE)
> ans<-as.double(coef(glmnet(X,Y,alpha=0,lambda=out$lambda.min)))
> print(ans)
[1] 173.82120674 0.36051172 -0.26386508 0.06992869 -0.36705005
[6] 0.08536310 0.49515214 -0.37631703 -0.82552856
>
> ridge.pred<-predict(out.ridge,s=out$lambda.min,newx=Xtest)
> mean((ridge.pred-Ytest)^2)
[1] 10.62653
>
> #Lasso
> out.lasso<-glmnet(X,Y,alpha=1)
> plot(out.lasso)
> out<-cv.glmnet(X,Y,alpha=1,grouped=FALSE)
> plot(out)
> ans<-as.double(coef(glmnet(X,Y,alpha=1,lambda=out$lambda.min)))
> print(ans)
[1] 173.8212067 0.3149068 0.0000000 0.0000000 0.0000000 0.0000000
[7] 0.0000000 0.0000000 0.0000000
>
> lasso.pred<-predict(out.lasso,s=out$lambda.min,newx=Xtest)
> mean((lasso.pred-Ytest)^2)
[1] 17.34093
>
> #Elastic Net
> out.en<-glmnet(X,Y,alpha=0.5)
> out<-cv.glmnet(X,Y,alpha=0.5,grouped=FALSE)
> ans<-as.double(coef(glmnet(X,Y,alpha=0.5,lambda=out$lambda.min)))
>
> print(ans)
[1] 173.82120674 0.31065524 -0.05315727 0.00000000 -0.11589235
[6] 0.00000000 0.06409357 0.00000000 -0.00714971
> enet.pred<-predict(out.en,s=out$lambda.min,newx=Xtest)
> mean((enet.pred-Ytest)^2)
[1] 12.51252
We use cross-validation to choose the values of the regularization parameters. The full regularization path for the LASSO is given in Figure 9.9.
We see that least squares has the lowest within sample mean square prediction error, as expected, and the coefficient of PC1 is the largest in the two sparse methods (LASSO and Elastic Net). The sparse models again point towards PC1 having the strongest association with centroid size.
Finally we consider out of sample prediction by taking 1000 random subsets of 17 of the n = 23 vertebrae as training data and the remaining 6 observations as test data, for example using:
train<-sample(1:n,trunc(3*n/4))
Y<-mice$size[train]
X<-mice$scores[train,1:(2*k-4)]
Ytest<-mice$size[-train]
Xtest<-mice$scores[-train,1:(2*k-4)]
#Lasso
out.lasso<-glmnet(X,Y,alpha=1)
out<-cv.glmnet(X,Y,alpha=1,grouped=FALSE)
plot(out)
lasso.pred<-predict(out.lasso,s=out$lambda.min,newx=Xtest)
mean((lasso.pred-Ytest)^2)
The test mean square prediction errors are: least squares 30.57; ridge 29.48; LASSO 26.00; and Elastic Net 27.28. Hence here LASSO regression has the best out of sample mean square prediction error. Given that the only non-zero coefficient fitted for the full dataset is for PC1, this again highlights the importance of the allometric relationship between size and PC1.