Rather than conditioning we could consider the marginal distribution of shape after integrating out the similarity transformations. These distributions will be called offset shape distributions. We consider multivariate normal configurations of k-points in , that is the model for the landmarks is:
where Ω is symmetric positive definite. We shall concentrate initially on the important m = 2 dimensional case in this chapter. Kendall (1984) introduced the case where μ consists of coincident points and Ω was proportional to the identity matrix. Mardia and Dryden (1989a,b) and Dryden and Mardia (1991b) further developed the shape distributions for general μ and Ω, in two dimensions. These distributions have been called ‘Mardia–Dryden’ distributions in the literature (Bookstein 1991, 2014; Lele and Richtsmeier 1991; Kendall 1991b; Stuart and Ord 1994).
Result 11.1 Consider the k ≥ 3 landmarks in to be i.i.d. as bivariate normal with equal means and covariance matrix proportional to the identity, that is
The resulting shape distribution is uniform in Kent’s polar shape coordinates.
Proof: If the original landmarks have the normal distribution of Equation (11.2), then the Helmertized landmarks X* = vec(HX) have a zero mean 2k − 2 dimensional multivariate normal with covariance matrix σ2I2k − 2. Hence the density of X* is:
We transform from X* to (R, s1, …, sk − 2, θ1, …, θk − 1), where R = ||X*|| and are Kent’s polar pre-shape coordinates of Section 4.3.3. The Jacobian of the inverse transformation is R2k − 322 − k and so the joint density of (R, s1, …, sk − 2, θ1, …, θk − 1) is:
We now transform from R to A = R2/(2σ2) and the Jacobian of the inverse transformation is σ2/R. Hence the joint density of (A, s1, …, sk − 2, θ1, …, θk − 1) is:
Integrating out the rotation θk − 1 (which is independently uniform) and the scale A by using:
we have the density of Kent’s polar shape coordinates as:
which is the uniform density.
The shape distribution will also be uniform for non-normal i.i.d. symmetric distributions (subject to certain regularity conditions), that is where the landmarks are independent and coordinates (Xj, Yj) have joint p.d.f. given by:
where r2j = xj2 + y2j for j = 1, …, k.
Although not as simple as Kent’s polar shape coordinates we shall also use the Kendall coordinates of Equation (2.11) and for convenience for the rest of this chapter we write U instead of UK. Under the i.i.d. normal model of Equation (11.2) the p.d.f. of Kendall coordinates U = (U3, …, Uk, V3, …, Vk)T, with respect to the Lebesgue measure, is given by Equation (10.2), and repeated here:
Corollary 11.1 In the triangle case (k = 3) the shape distribution under the i.i.d. normal model of Equation (11.2) is uniform on the shape sphere . Using Kendall’s spherical coordinates (θ, ϕ) [given by Equation (2.13)] the density, with respect to the Lebesgue measure, is:
and 0 ≤ θ ≤ π, 0 ≤ ϕ < 2π.
Proof: Transforming to Kendall’s spherical coordinates using Equation (2.15) we have:
The Jacobian of the inverse transformation is given by:
and so the result follows.
Mardia (1980), describing joint work with Robert Edwards, gave the offset normal distribution for the reflection shape of triangles using internal angles α1 and α2. The joint density of α1 and α2 is given by:
and α3 = π − α1 − α2 ≥ 0, where S = ∑3j = 1sin 2αj and C = ∑3j = 1cos 2αj.
The connection with the uniform distribution in terms of Bookstein coordinates (UB, VB) is as in the following. The uniform density using Bookstein coordinates for the triangle case is:
The transformation between (UB, VB) and (α1, α2), VB > 0, 0 ≤ α1, α2 ≤ π, is obtained from
and the Jacobian is:
Using the fact that
and
the uniform density corresponding to VB ≥ 0 is:
There is an equal contribution to the density from VB ≤ 0 and so the uniform shape density is given by 2f1 as required.
Result 11.2 When the points are distributed according to the i.i.d. normal model of Equation (11.2) then the marginal distribution of the Riemannian distance ρ between the observed shape u and any fixed shape, has density
Proof: Assume the original landmarks zo are i.i.d. normal with zero mean and unit variance. Transforming to the Helmertized coordinates zH = (zH1, …, zH(k − 1))T = Hzo it follows that |zHj|2 ∼ χ22 independently. Without loss of generality we consider the Riemannian distance ρ from zo to the special rank 1 configuration μ = (1, 0, …, 0)T. Using the formula for the Riemannian distance of Equation (4.21) we have:
which is the ratio of independent chi-squared random variables with 2k − 4 and 2 degrees of freedom, respectively. So,
a scaled F distribution with 2k − 4 and 2 degrees of freedom. Hence, using the connection between the F and beta distributions (e.g. see Mardia et al. 1979, Appendix B),
with density (k − 2)sk − 3. Transforming to ρ with Jacobian
leads to the required density of Equation (11.6).
In Figure 11.1 we see how the shape of the density changes as the number of points k increases. The mode increases as the number of points increases.
This density was first communicated to us by David Kendall (1989, personal communication); see also Goodall and Mardia (1991) and Le (1991b).
If one uses Goodall and Mardia polar shape coordinates described at the end of Section 5.6.2, then the uniform shape distribution is given by the product of marginal density of ρ from Equation (11.6) and the uniform density of the spherical coordinates ϕ.
If the covariance matrix in Equation (11.2) is replaced by σ2diag(s, 1/s), s > 0, the p.d.f. of shape (Kendall 1984), with respect to the uniform measure in the shape space, is:
where
with z+ = (1, u3 + iv3, …, uk + ivk)T and
is the Legendre polynomial of degree n (e.g. see Rainville 1960).
The maximum value of λ is (s + s− 1)/2, which has been called the ‘Broadbent factor’ by Kendall (1984). The model was motivated by the archaeological example of the standing stones of Land’s End described in Section 1.4.18, which was first studied by Broadbent (1980).
An i.i.d. model (with the same mean for each point) is often not appropriate. For example, in many applications one would be interested in a model with different means at each landmark. The first extension that we consider is when the landmarks (Xj, Yj)T, j = 1, …, k, are independent isotropic bivariate normal, with different means but the same variance at each landmark. The isotropic normal model is:
where μ = (μ1, …, μk, ν1, …, νk)T. This model is particularly appropriate when the major source of variability is measurement error at each landmark. For example, consider Figure 11.2 where the points of a triangle are independently perturbed by isotropic normal errors. We wish to find the resulting perturbed shape distribution using say Bookstein or Kendall coordinates after translating, rotating and rescaling two points to a fixed baseline.
Result 11.3 The offset normal shape density under the isotropic normal model of Equation (11.10), with respect to the uniform measure in the shape space, is:
where
is a concentration parameter, S(μ) is the population centroid size, ρ is the Riemannian distance between the observed shape [X] and the population shape [μ], and the confluent hypergeometric function 1F1( · ) is defined in Equation (10.11).
Proof: Transform first to the Helmertized landmarks,
where L = I2⊗H is a (2k − 2) × 2k block diagonal matrix with LLT = I2k − 2, μH = Lμ, and H is the Helmert submatrix of Equation (2.10). We partition into scale/rotation and shape by transforming to (W1 = XH2, W2 = YH2, UT)T, where U = (U3, …, Uk, V3, …, Vk)T are Kendall coordinates and W1 and W2 contain both rotation and scale information. Note that
j = 3, …, k and under the model P(W12 + W22 = 0) = 0. Defining the (2k − 2) vectors u+ and the orthogonal vector v+ by:
we can write the transformation from XH to (W1, W2, U) as:
and the Jacobian of the inverse transformation is:
The p.d.f. of (W1, W2, U) is therefore
where
In order to obtain the marginal shape distribution we integrate out W1 and W2, that is
Keeping U fixed for the moment let us suppose that (W1, W2)|U has a bivariate normal distribution with mean
and covariance matrix σ2{(u+)Tu+}I2. We write the density of this distribution as f*(w1, w2). It can be seen that
where
Now, from standard results of the normal distribution, if Wi, i = 1, 2, are independently normal with means bi and common variance ψ2, then
where χ22(λ) is the non-central chi-squared distribution with two degrees of freedom and non-centrality parameter λ. Hence,
Important point: Note that the integral reduces to calculation of the (k − 2)th moment of the non-central chi-squared distribution. The rth moment for the non-central χ22(λ) distribution is:
Simple algebra shows that
and so
Using simple trigonometric identities we have:
and hence the result is proved by recognizing the uniform measure of Equation (10.1).
This distribution was introduced by Mardia and Dryden (1989a,b). Kendall (1991b) and Le (1991b) verified the result using stochastic calculus and made connections with Brownian motion (see Section 14.4.3). The density is also described in Stuart and Ord (1994, p. 179), for the triangle case.
The distribution has 2k − 3 parameters. There are 2k − 4 parameters in the mean shape [μ] and a concentration parameter κ.
The confluent hypergeometric function is a finite series in the density since
where is the Laguerre polynomial of degree r; see Abramowitz and Stegun (1970, pp. 773–802). Using Kummer’s relation,
we can also rewrite the density, with respect to the uniform measure, as:
A convenient parameterization for the mean shape uses Kendall coordinates Θ = (θ3, …, θk, ϕ3, …, ϕk)T obtained from μ and a coefficient of variation τ2 = σ2/||μ2 − μ1||2. There are 2k − 4 shape parameters Θ and one variation parameter τ. The density can be written, with respect to the uniform measure, as:
where
with Θ+ = (1, θ3, …, θk, 0, ϕ3, …, ϕk)T, and ρ is the Riemannian distance between the shapes u and Θ given earlier in Equation (4.21).
The triangle case (k = 3) is particularly simple. The offset normal shape density is given by:
and transforming to Kendall’s shape sphere using Equation (2.13) and Equation (2.14), the probability element of l = (lx, ly, lz)T, is:
where ν is the direction on the shape sphere corresponding to the mean shape and dS is the probability element of . This result can be seen by using the same argument that led to Equation (10.8). Mardia (1989b) introduced this result and explored various properties of the distribution, including an approximation based on the Fisher–von Mises distribution.
Mardia (1980) gave the result for the reflection shape for triangles where the mean is equilateral, using two internal angles α1 and α2 to measure the reflection shape.
As τ → ∞ in the shape density of Equation (11.16) (or equivalently as the concentration κ → 0) we clearly see that the shape distribution tends to the uniform distribution, since 1F1(2 − k; 1; x) → 1 as x → 0.
Result 11.4 As τ → 0 it can be seen that
where means ‘converges in distribution to’ and Φ = ( − ϕ3, …, −ϕk, θ3, …, θk)T is orthogonal to the mean shape Θ = (θ3, …, θk, ϕ3, …, ϕk)T.
Proof: The result can be proved by using a Taylor series expansion of U about Θ. Specifically if we write XH = μ + ε where, without loss of generality, we have rescaled X by δ12 = {(μ2 − μ1)2 + (ν2 − ν1)2}1/2 and
where τ = σ/δ12. The Taylor series expansion is:
where Z = (ε2, …, εk − 1, ν2, …, νk − 1)T. Hence for small variations we consider up to first-order terms and so the result follows.
This result was first given by Bookstein (1984, 1986) using an approximation argument independent of the exact distribution. Bookstein suggested using normal models for shape variables for approximate inference, as described in Section 9.4.
Note that the approximate covariance in Equation (11.18) is not proportional to the identity matrix except for the k = 3 triangle case. Hence, correlations have been induced in general into the shape variables by the edge registration procedure.
If using Bookstein coordinates the approximate distribution, for small τ, is:
where D = I2⊗(HT1H1) and
where H1 is the lower right (k − 2) × (k − 2) partition matrix of the Helmert submatrix (see Section 2.5), and ΘB = (θB3, …, θkB, ϕB3, …, ϕBk)T are the Bookstein coordinates of the mean μ, with ΦB = ( − ϕB3, …, −ϕBk, θB3, …, θBk)T.
Corollary 11.2 For k = 3 the approximate normal distribution of Bookstein coordinates is isotropic, and in particular
where
and τ = σ/{(μ2 − μ1)2 + (ν2 − ν1)2}1/2.
Also, the form of the covariance leads to large variability if the mean location of the corresponding landmark is far away from the mid-point of the baseline.
Corollary 11.3 For k = 4 we have the following approximate covariances between the shape variables:
Important point: We see that even though the original landmarks are independent, the shape variables become correlated when registering on a common edge. Hence, care should be taken with interpretation of PCs obtained from Bookstein coordinates, as discussed by Kent (1994).
It is immediately clear that the above covariances are invariant under similarity transformations of the mean configuration, that is transformations of the form
as the terms in a and b cancel out in Equation (11.21).
Although we have given the covariance structure for k = 4, for larger k the covariances between any pair of landmarks j and l will be exactly the same as in Corollary NaN but with a suitable change of subscripts. So, given a mean configuration μj + iνj, j = 1, …, k, we can write down the approximate covariances for Bookstein’s shape variables for any particular choice of baseline using the above corollaries.
Result 11.5 If using Kendall or Bookstein coordinates, then only the first moment exists. In particular, using Bookstein coordinates (Mardia and Dryden 1994)
See Mardia and Dryden (1994) for a proof of the result.
We see from Equation (11.22) that the expected value of UB is ‘pulled in’ towards the origin by the multiplicative factor 1 − exp { − 1/(4τ2)}.
Computing the exact expectation of the shape variables involves finding the expectation of a linear term in a random vector X over a quadratic form in X, that is aTX/XTBX. From Jones (1987) we have E[(lTX/XTBX)n] < ∞ if n < rank(B). Here rank(B) = 2 and hence the first moment is the only finite positive integer moment.
We now show that a general isotropic distribution for landmarks about a mean shape in two dimensions gives rise to an isotropic distribution in the tangent space to the mean so that standard PCA is valid in the tangent space, following Mardia (1995). This result is of great significance and it underlies the development of the Procrustes tangent space by Kent (1994), discussed in Section 7.7.
Result 11.6 If the original points have an isotropic covariance structure, then the partial tangent coordinates v are also isotropic.
Proof: Consider the original landmarks zo to be isotropic and take the Helmertized landmarks z = Hzo which will also be isotropic. The Helmertized landmarks z can be modelled by , where δ is isotropic and E(δ) = 0, so that the p.d.f. of δ is given by:
The partial Procrustes tangent projection is:
Without any loss of generality, let us take ν = (1, 0, …, 0)T so that,
where δ = (δ1, ΔT)T and λ = ||Δ||. Now, Δ|{δ1 = c} is isotropic since from Equation (11.23)
which is a function of λ2 only. Also, since a new function of Δ depending on λ2 (and λ is the radial component of Δ) is also isotropic given δ1, it follows from Equation (11.24) that
conditional on δ1 is also isotropic and (vT|δ1) = (0, e− iθuT|δ1), θ = Arg(1 + δ1) and E(uT|δ1) = 0. Hence,
Furthermore, we can write
Hence
so that Equation (11.25) and Equation (11.26) yield:
The same argument applies for the PCA in real coordinates in the tangent space.
Shape distributions for the general multivariate normal model of Equation (11.1) with a general covariance structure in m = 2 dimensions have been studied by Dryden and Mardia (1991b). We write X ∼ N2k(μ, Ω) in this general case, where μ = (μ1, …, μk, ν1, …, νk)T and Ω is a 2k × 2k symmetric positive definite matrix. The offset normal shape density in the most general case involves finite sums of generalized Laguerre polynomials and is very complicated. The general shape distribution depends on 2k − 4 mean shape parameters Θ (the shape of μ) and a 2k − 2 × 2k − 2 matrix of covariance parameters Σ = LΩLT, where L = I2⊗H and H is the Helmert submatrix. We write the p.d.f. of the Kendall shape variables in the general case as f(u; Θ, Σ), and the form of the density is given by Dryden and Mardia (1991b). Simplification of the density does occur when there is complex normal structure in the covariance matrix, which we now discuss.
Let x = vec(X) have a complex multivariate normal distribution with covariance matrix
where Ω1 is a k × k positive definite symmetric matrix and Ω2 is a k × k skew symmetric matrix [see Equation (10.3)]. The covariance matrix of the Helmertized landmarks vec(HX) is:
say, where ΣT2 = −Σ2, μH = Lμ, and L = I2⊗H.
Result 11.7 In the complex normal case the shape density is given by:
where r is defined as a generalized Procrustes distance given by:
and κ = μTHΣ− 1μH/4.
The proof of this result was given by Dryden and Mardia (1991b). Le (1991b) has verified the complex normal shape distribution using stochastic calculus. The complex normal shape density is very similar in structure to the isotropic normal case, except that the Euclidean norm is replaced by a Mahalanobis norm. There are many useful models that have a complex normal covariance structure; such as the isotropic model of Section 11.1.2 and the cyclic Markov model (Dryden and Mardia 1991b).
Let us assume that not all the landmark means are equal, and consider the case where variations are small compared with the mean length of the baseline. We write the population shape as Θ = (θ3, …, θk, ϕ3, …, ϕk)T (i.e. the Kendall coordinates of {(μj, vj)T, j = 1, …, k}), and without loss of generality we translate, rotate and rescale the population so that the vectorized Helmertized coordinates XH = LX have mean μH = (1, θ3, …, θk, 0, ϕ3, …, ϕk)T, and general positive definite covariance matrix Σ = 2τ2LΩLT, where τ = σ/[(μ2 − μ1)2 + (ν2 − ν1)2]1/2.
Result 11.8 As τ → 0 +,
where
and Ij, 0j are the j × j identity and zero matrices.
Proof: After transforming to the Helmertized coordinates as in Equation (11.12), we write XH = μH + ε+ where
and ε and η are (k − 2) × 1 column vectors. Note that, if τ is small, then each element of ε+ will be small. Transforming to Kendall coordinates, as in Equation (11.13), we have
Thus,
Thus, using Equation (11.31) and considering up to first-order terms [since we assume τ and hence (ε+)j are small], we have the result.
The covariance matrix of the normal approximation of Equation (11.30) depends on the mean Θ. For the isotropic case, with Ω = I2k, the covariance matrix of the normal approximation is given by:
as seen in Equation (11.18). In the complex normal case the normal approximation is also of complex normal form.
Consider using the general offset normal shape distribution to model population shapes. We could have a random sample of n independent observation shapes u1, …, un from the most general offset normal shape distribution for k ≥ 3, with population shape parameters Θ, and covariance parameters Σ. We have 2k2 − k − 3 parameters to estimate in total, although we expect only 2k2 − 5k + 2 to be identifiable, as there are 2k − 4 mean parameters and (2k − 4)(2k − 3)/2 parameters in the covariance matrix of a general 2k − 4 dimensional multivariate normal distribution. If we write the exact shape density as f(u; Θ, Σ), then the likelihood of the data is:
and the log-likelihood is given by:
The MLE for shape and covariance is given by:
We could proceed to maximize the likelihood with a numerical routine. For example, we could choose Newton–Raphson, Powell’s method, simulated annealing or a large variety of other techniques. Estimation can be problematic for completely general covariance structures (because singular matrices can lead to non-degenerate shape distributions), and so we proceed by considering simple covariance structures with a few parameters. For many simple practical cases there are no estimation problems. We now deal with such an application, where the covariance structure is assumed to be isotropic.
Consider a random sample of n independent observations u1, …, un from the isotropic offset normal shape distribution of Equation (11.11). We work with parameters (ΘT, τ)T, where Θ is the mean shape and τ is the coefficient of variation. From Equation (11.16), the exact log-likelihood of (ΘT, τ)T, ignoring constant terms, is:
where ρi = ρ(ui, Θ) is the Riemannian distance between ui and the population mean shape Θ, given in Equation (11.17). Using the result (e.g. Abramowitz and Stegun 1970) that
we can find the likelihood equations and solve using a numerical procedure. The root of the likelihood equations that provides a global maximum of Equation (11.33) is denoted by , the exact MLEs of (ΘT, τ)T, with τ > 0.
We can now write down likelihood ratio tests for various problems, for example testing for differences in shape between two independent populations or for changes in variation parameter. Large sample standard likelihood ratio tests can be carried out in the usual way, using Wilks’ theorem (Wilks 1962).
Consider, for example, testing whether , versus , where , with dim and dim. Let
then according to Wilks’ theorem − 2log Λ ≈ χ2q − p under H0, for large samples (under certain regularity conditions).
If we had chosen different shape coordinates for the mean shape, then inference would be identical as there is a one to one linear correspondence between Kendall and Bookstein coordinates as seen in Equation (2.12). Also, if a different baseline was chosen, then there is a one to one correspondence between Kendall’s coordinates and the alternative shape parameters. Therefore, any inference will not be dependent upon baseline or coordinate choice with this exact likelihood approach.
Example 11.1 Consider the schizophrenia data described in Section 1.4.5. The isotropic model may not be reasonable here but we will proceed with an illustrative example of the technique. So, we consider the shape variables in group g to be random samples from the isotropic offset normal distribution, with likelihood given by Equation (11.33) with 2k − 3 = 23 parameters in each group Θg, τg, where g = 1 (control group) or g = 2 (schizophrenia group). A hypothesis test for isotropy was considered in Example 9.7.
The isotropic offset normal MLEs were obtained using the R routine frechet
which uses nlm
for the maximization (see Section 11.3.3). Plots of the mean shapes in Bookstein coordinates using baseline genu (landmark 2) and splenium (landmark 1) are given in Figure 11.3. The exact isotropic MLE of shape is very close to the full Procrustes mean shape; in particular the Riemannian distance ρ from the exact isotropic MLE to the full Procrustes mean shapes in both groups are both extremely small. The MLE of the variation parameters are and .
Testing for equality in mean shape in each group we have − 2log Λ, distributed as χ222 under H0, as 43.269. Since the < ?TeXp? >-value for the test is P(χ222 ≥ 43.269) = 0.005 we have strong evidence that the groups are different in mean shape, as is also seen in Example 9.7. Hence, there is evidence for a significant difference in mean shape between the controls and the schizophrenia patients. Note the large shape change in the triangle formed by the splenium and the two landmarks just below it. If we restrict the groups to have the same τ as well as the same mean shape in H0, then − 2log Λ = 44.9 and the < ?TeXp? >-value for the test is P(χ223 ≥ 44.9) = 0.004.
Of course, we may have some concern that our sample sizes may not be large enough for the theory to hold and alternative F and Monte Carlo permutation tests (with similar findings) are given in Example 9.7.
The R code for the above example is as follows:
> anss<-frechet(schizophrenia$x[,,schizophrenia $group==”scz”],
mean=”mle”)
> ansc<-frechet(schizophrenia$x[,,schizophrenia$group==”con”],
mean=”mle”)
> anss$loglike
[1] 775.8128
> ansc$loglike
[1] 798.1256
> anss$tau
[1] 0.036485
> ansc$tau
[1] 0.03441964
> anss$kappa
[1] 1022.851
> ansc$kappa
[1] 1183.186
#pooled groups
> anspool<-frechet(schizophrenia$x,mean=”mle”)
> anspool$loglike
[1] 1551.484
# Wilks’ statistic
> 2*(ansc$loglike+anss$loglike-anspool$loglike)
[1] 44.90941
Hence, we have verified the estimated variance parameters, and the final likelihood ratio statistic.
Kume and Welling (2010) consider a different approach for maximum likelihood estimation for the offset-normal shape distribution using an EM algorithm. The approach involves working with the original distribution of the landmark coordinates but treating the rotation and scale as missing/hidden variables. The approach is also useful for size-and-shape estimation. The methodology has been demonstrated to be effective in a variety of examples and it has also been extended to 3D shape and size-and-shape analysis (Kume et al. 2015).
A further extension of the method has been carried out by Huang et al. (2015) who use a mixture of offset-normal distributions for clustering with mixing proportions from a regression model, with application to corpus callosum shapes. Burl and Perona (1996) used the offset normal distribution for recognizing planar object classes, in particular for detecting faces from clutter via likelihood ratios.
In order to carry out practical inference one needs to decide upon a suitable procedure. The offset normal shape models are over-parameterized and rather awkward to work with. The isotropic offset normal model is reasonably straightforward to use, although the complex Watson or complex Bingham distribution are much easier to use for practical inference because they are members of the exponential family. So, if the isotropic model is appropriate, then in practice one would use the complex Watson distribution. However, the equal independent isotropic covariance structure is too simplistic for most datasets and so a next stage is to consider the complex covariance structure. The complex Bingham distribution is much easier to use than the offset complex normal distribution and so would be preferred.
There is a close analogy here with inference in directional data analysis. The offset normal shape distribution is analogous to the offset normal distribution in directional statistics, yet in practice one uses the von Mises–Fisher distribution for practical analysis. As a next stage in modelling one considers the real Bingham or Fisher–Bingham family of distributions, rather than more complicated offset normal distributions.
Important point: Perhaps the most straightforward and preferred way to proceed with inference for planar shape is to use the complex Bingham distribution, which gives the full Procrustes mean as the MLE of mean shape. In order to assess the structure of variability it is best to proceed with a multivariate analysis approach in the real tangent space, as seen in Chapter 9 and Section 12.4.
There are various extensions of offset normal shape distributions that can be considered, for example Alshabani et al. (2007a) consider partial shape distributions, where the landmark data are in three dimensions but the rotation invariance is only in one plane. These distributions are closely related to the planar shape distributions, and can be considered the joint shape distribution with marks at each landmark. The development of this work was motivated by the Bayesian analysis of human movement data (Alshabani et al. 2007b) from Section 1.4.13.
Under the general normal model of Equation (11.1) we wish to investigate the joint distribution of size-and-shape – the offset normal size-and-shape distribution. These distributions have been developed by Dryden and Mardia (1992), and in general the results are complicated. Bookstein (2013b) states that in many applications the data really do live in Euclidean space but many models are in non-Euclidean shape spaces. One could argue that models should also be constructed in Euclidean spaces, where covariances between atoms for example make sense, and scale is not removed.
In the isotropic case we have Ω = σ2I2k and so Σ = σ2I2k − 2. It will be convenient to consider the centroid size S to measure size here. This model was used for first-order size-and-shape analysis by Bookstein (1986).
Corollary 11.4 The size-and-shape density of (S, U), with respect to the Lebesgue measure dSdU, for the isotropic case is:
where ζ = ||μH|| is the population centroid size and ρ is the Riemannian distance between the population and observed shapes given by:
and f∞(u) is the uniform shape density in Kendall coordinates, from Equation (10.2).
A proof was given by Dryden and Mardia (1992) which involves a transformation and integrating out a rotation variable. The result was also obtained by Goodall and Mardia (1991). Let us write Θ for the Kendall coordinates obtained from μH. There is a total of 2k − 2 parameters here, namely the population size ζ, the population shape Θ (2k − 4 parameters) and the variation σ2 at landmarks.
If all the means (μj, vj)T, j = 1, …, k, are equal under the isotropic model, that is the population configuration is coincident (ζ = 0), then the density of (S, U), with respect to the Lebesgue measure, is given by:
Proof: Let ζ → 0 in the density of Equation (11.34).
Note that size (a scaled χ2k − 2 random variable) and shape are independent in this limiting case, as ζ → 0.
Bookstein (1986) has shown that the centroid size and shape variables have approximately zero covariance for small variations under the isotropic normal model. We can also derive this result directly from the p.d.f. Let
be the shape of the population configuration,
and define
with
Result 11.9 If (σ/ζ) is small, then the approximate isotropic size-and-shape distribution is:
Proof: Transform to T = (S − ζ)/σ and W = ζ(U − Θ)/σ, and write γ = σ/ζ, where ζ > 0. The density of (T, W) is:
Using the facts that, for large x,
and that, for small γ,
we see that
Thus, since |B(Θ)|1/2 = (1 + ΘTΘ), we deduce that, as γ → 0 +,
The result follows.
This result could also be obtained using Taylor series expansions for S and U, and Bookstein (1986) used such a procedure without reference to the offset normal size-and-shape distribution.
Using the standard result for the distribution of S under the isotropic model (e.g. Miller 1964, p. 28), it follows that the conditional density of U given S = s is:
where f∞(u) is the uniform shape density of Equation (10.2). The conditional density of S given U involves the ratio of a modified Bessel function and a confluent hypergeometric function. Using Weber’s first exponential integral (Watson 1944, p. 393), it can be seen that
If γ = σ/ζ is small, then, since for large x > 0 (Abramowitz and Stegun 1970, p. 504),
and since cos 2ρ = 1 − O(γ2), it follows that
This verifies the result of Bookstein (1986) that S2 and U are approximately uncorrelated for small γ, which can also be seen from Equation (11.35).
The size-and-shape distribution can be used for likelihood-based inference. Inference for models with general covariance structures will be very complicated. Placing some restrictions on the covariance matrix is necessary for the same identifiability reasons as in the shape case, discussed in Section 11.3.
Let us assume that we have a random sample (si, ui), i = 1, …, n, from the isotropic size-and-shape distribution of Equation (11.34). We have the 2k − 2 parameters to estimate: population mean size ζ, population mean shape Θ (a 2k − 4-vector), and population variance σ2. The log likelihood is given, up to a constant with respect to the parameters, by:
where ρi is the Riemannian distance between the shapes ui and Θ given by Equation (4.21). We denote the MLEs by .
The likelihood equations must be solved numerically, although there is the following relationship between the MLEs of σ2 and ζ:
The numerical maximization of Equation (11.36) requires the accurate evaluation of log I0(κ). This can be performed using a polynomial approximation such as mentioned by Abramowitz and Stegun (1970, p. 378).
Consider, for example, testing for symmetry in a sample of triangles. One aspect of the shape that could be investigated is asymmetry about the axis from the midpoint of the baseline to landmark 3.
Example 11.2 Consider a random sample of 30 T1 mouse vertebrae with k = 3 landmarks on each bone. Each triangle is formed by landmarks 1, 2 and 3 as in Figure 11.4.
Let us propose that the data form a random sample from the isotropic size-and-shape distribution. As a model check, since variations are small we would expect the Kendall coordinates and centroid size (U, V, S) to be approximately normal and uncorrelated from Equation (11.35). There is no reason to doubt normality using Shapiro–Wilk W tests, and the sample correlations for (U, V), (U, S) and (V, S) are only − 0.07, −0.06 and − 0.07, respectively. In addition, the standard deviations of U and V should be approximately equal and here the sample standard deviations of U and V are 0.023 and 0.034, respectively. Thus, we conclude that the model is fairly reasonable and there appears to be no linear relationship between size and shape in this particular dataset.
The MLEs are found by maximizing Equation (11.36). The values of the MLEs are and .
It is suggested that the population mean triangle could be isosceles with axis of symmetry θ = 0. We can investigate such asymmetry by testing H0: Θ = (0, ϕ)T versus H1: Θ = (θ, ϕ)T, Θ unrestricted, where ϕ, ζ > 0 and σ2 > 0 are not restricted under either hypothesis. Standard likelihood ratio tests can be performed in the usual way using Wilks’ theorem for large samples. Let
Then according to Wilks’ theorem − 2log Λ ≃ χ21 under H0, for large samples. This test could of course be carried out using shape information alone, ignoring size. For this T1 mouse vertebrae example the statistic is − 2log Λ = 5.52 with approximate p-value 0.02 calculated from the χ21 approximation. Thus we have evidence to reject symmetry about θ = 0.
For practical data analysis in three and higher dimensions we recommend the use of Procrustes analysis and tangent space inference (Chapters 7 and 9). The study of suitable models for higher dimensional shape is much more difficult than in the planar case. We outline some distributional results for normally distributed configurations, although the work is complicated.
Goodall and Mardia (1991, 1992, 1993) have considered higher dimensional offset normal size-and-shape and shape distributions by using the QR decomposition. In particular, the decomposition of Bartlett (1933) described below leads to the main results.
Although we have concentrated on k ≥ m + 1 so far, in this section on higher dimensional distributions we also allow k < m + 1. Invariance with respect to rotation if k < m + 1 is the same as invariance with respect to the set of k − 1 orthonormal frames in m dimensions, which is the Stiefel manifold Vk − 1, m (e.g. see Jupp and Mardia 1989). The Stiefel manifold Vn, m of orthonormal n frames in m dimensions can be regarded as the set of (n × m) matrices A satisfying AAT = In. When k > m then Vn, m = Vm, m = O(m).
First of all we consider the general QR decomposition described in Section 5.6.2 in more detail, for general values of k and m. Consider the QR decomposition of the Helmertized coordinates Y = HX, namely
where n = min (k − 1, m), Vn, m is the Stiefel manifold and T((k − 1) × n) is lower triangular with non-negative diagonal elements Tii ≥ 0, i = 1, …, n. When k > m we have Γ O(m) and det(Γ) = ±1, and hence the QR decomposition removes orientation and reflection in this case.
Before in Section 5.6.2 we just considered k > m but restricted Γ so that reflections were not allowed. In that case we removed orientation only for k > m and so we had Γ SO(m), det(Γ) = +1 and T unrestricted. However, when k ≤ m there is no distinction to make between allowing reflections or not – the shape is invariant under reflection in this case.
For k > m, T, {Tij : 1 ≤ i ≤ j ≤m} is the size-and-shape if Γ SO(m) and the reflection size-and-shape if Γ O(m). If k ≤ m, then the distinction is irrelevant and we just call T the size-and-shape.
To remove scale we divide by the Euclidean norm of T,
For k > m, W is the shape of our configuration if Γ SO(m) and reflection shape if Γ O(m). If k ≤ m the distinction is again irrelevant and we call W the shape. If ||T|| = 0, then the shape is not defined.
If X has the multivariate isotropic normal model
then YYT = HXXTHT = TΓΓTTT = TTT has a central Wishart distribution if rank(μ) = 0, where μ = E[Y] = Hvec− 1m(μ*), and m > k − 1. Bartlett’s decomposition gives the distribution of the lower triangular matrix T, where YYT = TTT. Hence T is the reflection size-and-shape in the QR decomposition of Y in Equation (11.37).
Result 11.10 (Bartlett decomposition). Under the normal model of Equation (11.38) with μ = 0 the distribution of (T)ij is:
for i > j, i = 1, …, k − 1, j = 1, …, N, and all these variables are mutually independent {N = min (k − 1, m)}.
The proof involves random rotation matrices. For example, for k − 1 < m see Kshirsagar (1963). Goodall and Mardia (1991) have shown that this result holds also for m < k (even when the Wishart matrix is singular) and thus Equation (11.39) is the reflection size-and-shape distribution for the coincident means (μ = 0) case. If the size-and-shape distribution without reflections is required, then the result of Equation (11.39) holds, except that Tmm ∼ N(0, σ2) for k > m.
Goodall and Mardia (1991) derive further results for planar and higher rank means. The densities are complicated in general. One simple case is that of triangles in higher dimensions.
If k = 3 and m ≥ 3, then we can consider Kendall’s spherical coordinates on the hemisphere . In this case we have W11 = (1 + sin θcos ϕ)1/2/√2, W22 = cos θ(1 + sin θcos ϕ)− 1/2/√2 and J(u) = sin θ(1 + sin θcos ϕ)− 1/2/(2√2). Hence the shape density on is given by,
The uniform measure on the hemisphere is sin θ/(2π)dθdϕ and θ/2 is the great circle distance to the north pole (the equilateral triangle shape). This result was derived by Kendall (1988) and indicates that the higher the dimension m, the greater the strength of the mode at the equilateral triangle.
If m = 2 then we divide Equation (11.40) by 2 to obtain the shape density without reflection on the sphere S2(1/2),
which is the uniform density on .
Diffusions on planar shape spaces were studied by Ball et al. (2008) and the explicit expressions for Brownian motion were given. Also Ornstein–Uhlenbeck processes in planar shape space were defined using Goodall–Mardia coordinates. A drift term was added in the Ornstein–Uhlenbeck process along the geodesic towards the mean shape. Inference for these models was given by Ball et al. (2006) who discussed exact retrospective sampling of Ornstein–Uhlenbeck processes, with a particular application to modelling cell shape.
Other types of diffusions include Radon shape diffusions, where the shape of random projections of shapes into subspaces of lower dimension is considered (Panaretos 2006, 2008). In related work Panaretos (2009) investigates the recovery of a density function from its Radon transform at random and unknown projection angles, and a practical application is a random tomography technique for obtaining sparse reconstructions of proteins from noisy random projections using the LASSO (Panaretos and Konis 2011).
An alternative way to proceed is to consider an approach using results from multivariate analysis. The QR decomposition is again considered Y = TΓ, but one transforms to the joint distribution of (T, Γ) and then integrates out Γ over the Stiefel manifold to obtain the size-and-shape distribution. Integrating out size then gives the marginal shape distribution.
The key result for integrating out the rotation (and reflections with k ≥ m + 1) is that, from James (1964):
where [ΓdΓT] is the invariant unnormalized measure on O(m) and 0F1( · ; ·) is a hypergeometric function with matrix argument, which can be written as a sum of zonal polynomials. If rank(μ) < m, then it follows that
and so the density without reflection invariance is fairly straightforward to find in this case. If rank(μ) = m, for m ≥ 3 then the density without reflection invariance cannot be computed in this way.
Goodall and Mardia (1992) discuss in detail the multivariate approach for computing shape densities from normal distributions. The case where the covariance matrix is Im⊗Σ is dealt with and size-and-shape and shape distributions are given for all cases except rank(μ) = m, for m ≥ 3.
In conclusion, marginal shape distributions from normal models are very complicated in higher than two dimensions, but for low rank means (rank ≤ 2) the distributions are tractable.
When variations are small Goodall and Mardia (1993) have given some normal approximations to the size-and-shape and shape distributions.
Result 11.11 Let TP be the Procrustes size-and-shape coordinates after rotating the size-and-shape T to the mean μ (diagonal and of rank m) given in Equation (7.14). For small σ, it follows that (TP)ij follow independent normal distributions with means μij and variances σ2, for i = j or i > m, and σ2/(1 + μ2ii/μjj2) for j < i ≤ m.
Important point: The normal approximation is much simpler to use for inference than the offset normal distribution, and Goodall and Mardia (1993) consider an m = 3 dimensional application of size-and-shape analysis in the manufacture of chairs. A decomposition of the approximate distribution for size and shape is also given, and it can be shown for general m that size and shape are asymptotically independent (as σ → 0) under isotropy. The 2D (m = 2) case was seen earlier in Result 11.9. Thus the result gives a method for small σ for shape analysis. Inference will be similar to using the tangent approximation inference of Section 9.1.
3.145.37.86