Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

11
Offset normal shape distributions

11.1 Introduction

Rather than conditioning we could consider the marginal distribution of shape after integrating out the similarity transformations. These distributions will be called offset shape distributions. We consider multivariate normal configurations of k-points in , that is the model for the landmarks is:

(11.1)

where Ω is symmetric positive definite. We shall concentrate initially on the important m = 2 dimensional case in this chapter. Kendall (1984) introduced the case where μ consists of coincident points and Ω was proportional to the identity matrix. Mardia and Dryden (1989a,b) and Dryden and Mardia (1991b) further developed the shape distributions for general μ and Ω, in two dimensions. These distributions have been called ‘Mardia–Dryden’ distributions in the literature (Bookstein 1991, 2014; Lele and Richtsmeier 1991; Kendall 1991b; Stuart and Ord 1994).

11.1.1 Equal mean case in two dimensions

11.1.1.1 Isotropic case

Result 11.1 Consider the k ≥ 3 landmarks in to be i.i.d. as bivariate normal with equal means and covariance matrix proportional to the identity, that is

(11.2)

The resulting shape distribution is uniform in Kent’s polar shape coordinates.

Proof: If the original landmarks have the normal distribution of Equation (11.2), then the Helmertized landmarks X* = vec(HX) have a zero mean 2k − 2 dimensional multivariate normal with covariance matrix σ²I_{2k − 2}. Hence the density of X* is:

We transform from X* to (R, s₁, …, s_{k − 2}, θ₁, …, θ_{k − 1}), where R = ||X*|| and are Kent’s polar pre-shape coordinates of Section 4.3.3. The Jacobian of the inverse transformation is R^{2k − 3}2^{2 − k} and so the joint density of (R, s₁, …, s_{k − 2}, θ₁, …, θ_{k − 1}) is:

We now transform from R to A = R²/(2σ²) and the Jacobian of the inverse transformation is σ²/R. Hence the joint density of (A, s₁, …, s_{k − 2}, θ₁, …, θ_{k − 1}) is:

Integrating out the rotation θ_{k − 1} (which is independently uniform) and the scale A by using:

we have the density of Kent’s polar shape coordinates as:

which is the uniform density.

The shape distribution will also be uniform for non-normal i.i.d. symmetric distributions (subject to certain regularity conditions), that is where the landmarks are independent and coordinates (X_j, Y_j) have joint p.d.f. given by:

where r²_j = x_j² + y²_j for j = 1, …, k.

Although not as simple as Kent’s polar shape coordinates we shall also use the Kendall coordinates of Equation (2.11) and for convenience for the rest of this chapter we write U instead of U^K. Under the i.i.d. normal model of Equation (11.2) the p.d.f. of Kendall coordinates U = (U₃, …, U_k, V₃, …, V_k)^T, with respect to the Lebesgue measure, is given by Equation (10.2), and repeated here:

Corollary 11.1 In the triangle case (k = 3) the shape distribution under the i.i.d. normal model of Equation (11.2) is uniform on the shape sphere . Using Kendall’s spherical coordinates (θ, ϕ) [given by Equation (2.13)] the density, with respect to the Lebesgue measure, is:

and 0 ≤ θ ≤ π, 0 ≤ ϕ < 2π.

Proof: Transforming to Kendall’s spherical coordinates using Equation (2.15) we have:

(11.3)

The Jacobian of the inverse transformation is given by:

(11.4)

and so the result follows.

Mardia (1980), describing joint work with Robert Edwards, gave the offset normal distribution for the reflection shape of triangles using internal angles α₁ and α₂. The joint density of α₁ and α₂ is given by:

(11.5)

and α₃ = π − α₁ − α₂ ≥ 0, where S = ∑³_{j = 1}sin 2α_j and C = ∑³_{j = 1}cos 2α_j.

The connection with the uniform distribution in terms of Bookstein coordinates (U^B, V^B) is as in the following. The uniform density using Bookstein coordinates for the triangle case is:

The transformation between (U^B, V^B) and (α₁, α₂), V^B > 0, 0 ≤ α₁, α₂ ≤ π, is obtained from

and the Jacobian is:

Using the fact that

and

the uniform density corresponding to V^B ≥ 0 is:

There is an equal contribution to the density from V^B ≤ 0 and so the uniform shape density is given by 2f₁ as required.

11.1.1.2 Distribution of the Riemannian distance

Result 11.2 When the points are distributed according to the i.i.d. normal model of Equation (11.2) then the marginal distribution of the Riemannian distance ρ between the observed shape u and any fixed shape, has density

(11.6)

Proof: Assume the original landmarks z^o are i.i.d. normal with zero mean and unit variance. Transforming to the Helmertized coordinates z_H = (z_H1, …, z_{H(k − 1)})^T = Hz^o it follows that |z_Hj|² ∼ χ²₂ independently. Without loss of generality we consider the Riemannian distance ρ from z^o to the special rank 1 configuration μ = (1, 0, …, 0)^T. Using the formula for the Riemannian distance of Equation (4.21) we have:

which is the ratio of independent chi-squared random variables with 2k − 4 and 2 degrees of freedom, respectively. So,

a scaled F distribution with 2k − 4 and 2 degrees of freedom. Hence, using the connection between the F and beta distributions (e.g. see Mardia et al. 1979, Appendix B),

with density (k − 2)s^{k − 3}. Transforming to ρ with Jacobian

leads to the required density of Equation (11.6).

In Figure 11.1 we see how the shape of the density changes as the number of points k increases. The mode increases as the number of points increases.

Image described by surrounding text and caption. — **Figure 11.1** The density (f) of the Riemannian distance ρ from any fixed shape for different numbers of points k, in the uniform case. The plot shows the densities for values of k = 3, 4, 6, 10, and the mode of the distribution increases as k increases.

This density was first communicated to us by David Kendall (1989, personal communication); see also Goodall and Mardia (1991) and Le (1991b).

If one uses Goodall and Mardia polar shape coordinates described at the end of Section 5.6.2, then the uniform shape distribution is given by the product of marginal density of ρ from Equation (11.6) and the uniform density of the spherical coordinates ϕ.

11.1.1.3 Anisotropic case

If the covariance matrix in Equation (11.2) is replaced by σ²diag(s, 1/s), s > 0, the p.d.f. of shape (Kendall 1984), with respect to the uniform measure in the shape space, is:

(11.7)

where

(11.8)

with z⁺ = (1, u₃ + iv₃, …, u_k + iv_k)^T and

(11.9)

is the Legendre polynomial of degree n (e.g. see Rainville 1960).

The maximum value of λ is (s + s^{− 1})/2, which has been called the ‘Broadbent factor’ by Kendall (1984). The model was motivated by the archaeological example of the standing stones of Land’s End described in Section 1.4.18, which was first studied by Broadbent (1980).

11.1.2 The isotropic case in two dimensions

An i.i.d. model (with the same mean for each point) is often not appropriate. For example, in many applications one would be interested in a model with different means at each landmark. The first extension that we consider is when the landmarks (X_j, Y_j)^T, j = 1, …, k, are independent isotropic bivariate normal, with different means but the same variance at each landmark. The isotropic normal model is:

(11.10)

where μ = (μ₁, …, μ_k, ν₁, …, ν_k)^T. This model is particularly appropriate when the major source of variability is measurement error at each landmark. For example, consider Figure 11.2 where the points of a triangle are independently perturbed by isotropic normal errors. We wish to find the resulting perturbed shape distribution using say Bookstein or Kendall coordinates after translating, rotating and rescaling two points to a fixed baseline.

Result 11.3 The offset normal shape density under the isotropic normal model of Equation (11.10), with respect to the uniform measure in the shape space, is:

(11.11)

where

is a concentration parameter, S(μ) is the population centroid size, ρ is the Riemannian distance between the observed shape [X] and the population shape [μ], and the confluent hypergeometric function ₁F₁( · ) is defined in Equation (10.11).

Proof: Transform first to the Helmertized landmarks,

(11.12)

where L = I₂⊗H is a (2k − 2) × 2k block diagonal matrix with LL^T = I_{2k − 2}, μ_H = Lμ, and H is the Helmert submatrix of Equation (2.10). We partition into scale/rotation and shape by transforming to (W₁ = X_H2, W₂ = Y_H2, U^T)^T, where U = (U₃, …, U_k, V₃, …, V_k)^T are Kendall coordinates and W₁ and W₂ contain both rotation and scale information. Note that

(11.13)

j = 3, …, k and under the model P(W₁² + W₂² = 0) = 0. Defining the (2k − 2) vectors u⁺ and the orthogonal vector v⁺ by:

(11.14)

we can write the transformation from X_H to (W₁, W₂, U) as:

and the Jacobian of the inverse transformation is:

The p.d.f. of (W₁, W₂, U) is therefore

where

In order to obtain the marginal shape distribution we integrate out W₁ and W₂, that is

Keeping U fixed for the moment let us suppose that (W₁, W₂)|U has a bivariate normal distribution with mean

and covariance matrix σ²{(u⁺)^Tu⁺}I₂. We write the density of this distribution as f*(w₁, w₂). It can be seen that

where

Now, from standard results of the normal distribution, if W_i, i = 1, 2, are independently normal with means b_i and common variance ψ², then

where χ²₂(λ) is the non-central chi-squared distribution with two degrees of freedom and non-centrality parameter λ. Hence,

Important point: Note that the integral reduces to calculation of the (k − 2)th moment of the non-central chi-squared distribution. The rth moment for the non-central χ²₂(λ) distribution is:

Simple algebra shows that

and so

Using simple trigonometric identities we have:

and hence the result is proved by recognizing the uniform measure of Equation (10.1).

This distribution was introduced by Mardia and Dryden (1989a,b). Kendall (1991b) and Le (1991b) verified the result using stochastic calculus and made connections with Brownian motion (see Section 14.4.3). The density is also described in Stuart and Ord (1994, p. 179), for the triangle case.

The distribution has 2k − 3 parameters. There are 2k − 4 parameters in the mean shape [μ] and a concentration parameter κ.

The confluent hypergeometric function is a finite series in the density since

(11.15)

where is the Laguerre polynomial of degree r; see Abramowitz and Stegun (1970, pp. 773–802). Using Kummer’s relation,

we can also rewrite the density, with respect to the uniform measure, as:

A convenient parameterization for the mean shape uses Kendall coordinates Θ = (θ₃, …, θ_k, ϕ₃, …, ϕ_k)^T obtained from μ and a coefficient of variation τ² = σ²/||μ₂ − μ₁||². There are 2k − 4 shape parameters Θ and one variation parameter τ. The density can be written, with respect to the uniform measure, as:

(11.16)

where

(11.17)

with Θ⁺ = (1, θ₃, …, θ_k, 0, ϕ₃, …, ϕ_k)^T, and ρ is the Riemannian distance between the shapes u and Θ given earlier in Equation (4.21).

11.1.3 The triangle case

The triangle case (k = 3) is particularly simple. The offset normal shape density is given by:

and transforming to Kendall’s shape sphere using Equation (2.13) and Equation (2.14), the probability element of l = (l_x, l_y, l_z)^T, is:

where ν is the direction on the shape sphere corresponding to the mean shape and dS is the probability element of . This result can be seen by using the same argument that led to Equation (10.8). Mardia (1989b) introduced this result and explored various properties of the distribution, including an approximation based on the Fisher–von Mises distribution.

Mardia (1980) gave the result for the reflection shape for triangles where the mean is equilateral, using two internal angles α₁ and α₂ to measure the reflection shape.

11.1.4 Approximations: Large and small variations

11.1.4.1 Large variations

As τ → ∞ in the shape density of Equation (11.16) (or equivalently as the concentration κ → 0) we clearly see that the shape distribution tends to the uniform distribution, since ₁F₁(2 − k; 1; x) → 1 as x → 0.

11.1.4.2 Small variations

Result 11.4 As τ → 0 it can be seen that

(11.18)

where means ‘converges in distribution to’ and Φ = ( − ϕ₃, …, −ϕ_k, θ₃, …, θ_k)^T is orthogonal to the mean shape Θ = (θ₃, …, θ_k, ϕ₃, …, ϕ_k)^T.

Proof: The result can be proved by using a Taylor series expansion of U about Θ. Specifically if we write X_H = μ + ε where, without loss of generality, we have rescaled X by δ₁₂ = {(μ₂ − μ₁)² + (ν₂ − ν₁)²}^1/2 and

where τ = σ/δ₁₂. The Taylor series expansion is:

where Z = (ε₂, …, ε_{k − 1}, ν₂, …, ν_{k − 1})^T. Hence for small variations we consider up to first-order terms and so the result follows.

This result was first given by Bookstein (1984, 1986) using an approximation argument independent of the exact distribution. Bookstein suggested using normal models for shape variables for approximate inference, as described in Section 9.4.

Note that the approximate covariance in Equation (11.18) is not proportional to the identity matrix except for the k = 3 triangle case. Hence, correlations have been induced in general into the shape variables by the edge registration procedure.

If using Bookstein coordinates the approximate distribution, for small τ, is:

(11.19)

where D = I₂⊗(H^T₁H₁) and

where H₁ is the lower right (k − 2) × (k − 2) partition matrix of the Helmert submatrix (see Section 2.5), and Θ^B = (θ^B₃, …, θ_k^B, ϕ^B₃, …, ϕ^B_k)^T are the Bookstein coordinates of the mean μ, with Φ^B = ( − ϕ^B₃, …, −ϕ^B_k, θ^B₃, …, θ^B_k)^T.

Corollary 11.2 For k = 3 the approximate normal distribution of Bookstein coordinates is isotropic, and in particular

(11.20)

where

(11.21)

and τ = σ/{(μ₂ − μ₁)² + (ν₂ − ν₁)²}^1/2.

Also, the form of the covariance leads to large variability if the mean location of the corresponding landmark is far away from the mid-point of the baseline.

Corollary 11.3 For k = 4 we have the following approximate covariances between the shape variables:

Important point: We see that even though the original landmarks are independent, the shape variables become correlated when registering on a common edge. Hence, care should be taken with interpretation of PCs obtained from Bookstein coordinates, as discussed by Kent (1994).

It is immediately clear that the above covariances are invariant under similarity transformations of the mean configuration, that is transformations of the form

as the terms in a and b cancel out in Equation (11.21).

Although we have given the covariance structure for k = 4, for larger k the covariances between any pair of landmarks j and l will be exactly the same as in Corollary NaN but with a suitable change of subscripts. So, given a mean configuration μ_j + iν_j, j = 1, …, k, we can write down the approximate covariances for Bookstein’s shape variables for any particular choice of baseline using the above corollaries.

11.1.5 Exact moments

Result 11.5 If using Kendall or Bookstein coordinates, then only the first moment exists. In particular, using Bookstein coordinates (Mardia and Dryden 1994)

(11.22)

See Mardia and Dryden (1994) for a proof of the result.

We see from Equation (11.22) that the expected value of U^B is ‘pulled in’ towards the origin by the multiplicative factor 1 − exp { − 1/(4τ²)}.

Computing the exact expectation of the shape variables involves finding the expectation of a linear term in a random vector X over a quadratic form in X, that is a^TX/X^TBX. From Jones (1987) we have E[(l^TX/X^TBX)ⁿ] < ∞ if n < rank(B). Here rank(B) = 2 and hence the first moment is the only finite positive integer moment.

11.1.6 Isotropy

We now show that a general isotropic distribution for landmarks about a mean shape in two dimensions gives rise to an isotropic distribution in the tangent space to the mean so that standard PCA is valid in the tangent space, following Mardia (1995). This result is of great significance and it underlies the development of the Procrustes tangent space by Kent (1994), discussed in Section 7.7.

Result 11.6 If the original points have an isotropic covariance structure, then the partial tangent coordinates v are also isotropic.

Proof: Consider the original landmarks z^o to be isotropic and take the Helmertized landmarks z = Hz^o which will also be isotropic. The Helmertized landmarks z can be modelled by , where δ is isotropic and E(δ) = 0, so that the p.d.f. of δ is given by:

(11.23)

The partial Procrustes tangent projection is:

Without any loss of generality, let us take ν = (1, 0, …, 0)^T so that,

where δ = (δ₁, Δ^T)^T and λ = ||Δ||. Now, Δ|{δ₁ = c} is isotropic since from Equation (11.23)

(11.24)

which is a function of λ² only. Also, since a new function of Δ depending on λ² (and λ is the radial component of Δ) is also isotropic given δ₁, it follows from Equation (11.24) that

conditional on δ₁ is also isotropic and (v^T|δ₁) = (0, e^{− iθ}u^T|δ₁), θ = Arg(1 + δ₁) and E(u^T|δ₁) = 0. Hence,

(11.25)

Furthermore, we can write

(11.26)

Hence

so that Equation (11.25) and Equation (11.26) yield:

The same argument applies for the PCA in real coordinates in the tangent space.

11.2 Offset normal shape distributions with general covariances

Shape distributions for the general multivariate normal model of Equation (11.1) with a general covariance structure in m = 2 dimensions have been studied by Dryden and Mardia (1991b). We write X ∼ N_2k(μ, Ω) in this general case, where μ = (μ₁, …, μ_k, ν₁, …, ν_k)^T and Ω is a 2k × 2k symmetric positive definite matrix. The offset normal shape density in the most general case involves finite sums of generalized Laguerre polynomials and is very complicated. The general shape distribution depends on 2k − 4 mean shape parameters Θ (the shape of μ) and a 2k − 2 × 2k − 2 matrix of covariance parameters Σ = LΩL^T, where L = I₂⊗H and H is the Helmert submatrix. We write the p.d.f. of the Kendall shape variables in the general case as f(u; Θ, Σ), and the form of the density is given by Dryden and Mardia (1991b). Simplification of the density does occur when there is complex normal structure in the covariance matrix, which we now discuss.

11.2.1 The complex normal case

Let x = vec(X) have a complex multivariate normal distribution with covariance matrix

where Ω₁ is a k × k positive definite symmetric matrix and Ω₂ is a k × k skew symmetric matrix [see Equation (10.3)]. The covariance matrix of the Helmertized landmarks vec(HX) is:

(11.27)

say, where Σ^T₂ = −Σ₂, μ_H = Lμ, and L = I₂⊗H.

Result 11.7 In the complex normal case the shape density is given by:

(11.28)

where r is defined as a generalized Procrustes distance given by:

(11.29)

and κ = μ^T_HΣ^{− 1}μ_H/4.

The proof of this result was given by Dryden and Mardia (1991b). Le (1991b) has verified the complex normal shape distribution using stochastic calculus. The complex normal shape density is very similar in structure to the isotropic normal case, except that the Euclidean norm is replaced by a Mahalanobis norm. There are many useful models that have a complex normal covariance structure; such as the isotropic model of Section 11.1.2 and the cyclic Markov model (Dryden and Mardia 1991b).

11.2.2 General covariances: Small variations

Let us assume that not all the landmark means are equal, and consider the case where variations are small compared with the mean length of the baseline. We write the population shape as Θ = (θ₃, …, θ_k, ϕ₃, …, ϕ_k)^T (i.e. the Kendall coordinates of {(μ_j, v_j)^T, j = 1, …, k}), and without loss of generality we translate, rotate and rescale the population so that the vectorized Helmertized coordinates X_H = LX have mean μ_H = (1, θ₃, …, θ_k, 0, ϕ₃, …, ϕ_k)^T, and general positive definite covariance matrix Σ = 2τ²LΩL^T, where τ = σ/[(μ₂ − μ₁)² + (ν₂ − ν₁)²]^1/2.

Result 11.8 As τ → 0 +,

(11.30)

where

and I_j, 0_j are the j × j identity and zero matrices.

Proof: After transforming to the Helmertized coordinates as in Equation (11.12), we write X_H = μ_H + ε⁺ where

(11.31)

and ε and η are (k − 2) × 1 column vectors. Note that, if τ is small, then each element of ε⁺ will be small. Transforming to Kendall coordinates, as in Equation (11.13), we have

Thus,

Thus, using Equation (11.31) and considering up to first-order terms [since we assume τ and hence (ε⁺)_j are small], we have the result.

The covariance matrix of the normal approximation of Equation (11.30) depends on the mean Θ. For the isotropic case, with Ω = I_2k, the covariance matrix of the normal approximation is given by:

as seen in Equation (11.18). In the complex normal case the normal approximation is also of complex normal form.

11.3 Inference for offset normal distributions

11.3.1 General MLE

Consider using the general offset normal shape distribution to model population shapes. We could have a random sample of n independent observation shapes u₁, …, u_n from the most general offset normal shape distribution for k ≥ 3, with population shape parameters Θ, and covariance parameters Σ. We have 2k² − k − 3 parameters to estimate in total, although we expect only 2k² − 5k + 2 to be identifiable, as there are 2k − 4 mean parameters and (2k − 4)(2k − 3)/2 parameters in the covariance matrix of a general 2k − 4 dimensional multivariate normal distribution. If we write the exact shape density as f(u; Θ, Σ), then the likelihood of the data is:

and the log-likelihood is given by:

(11.32)

The MLE for shape and covariance is given by:

We could proceed to maximize the likelihood with a numerical routine. For example, we could choose Newton–Raphson, Powell’s method, simulated annealing or a large variety of other techniques. Estimation can be problematic for completely general covariance structures (because singular matrices can lead to non-degenerate shape distributions), and so we proceed by considering simple covariance structures with a few parameters. For many simple practical cases there are no estimation problems. We now deal with such an application, where the covariance structure is assumed to be isotropic.

11.3.2 Isotropic case

Consider a random sample of n independent observations u₁, …, u_n from the isotropic offset normal shape distribution of Equation (11.11). We work with parameters (Θ^T, τ)^T, where Θ is the mean shape and τ is the coefficient of variation. From Equation (11.16), the exact log-likelihood of (Θ^T, τ)^T, ignoring constant terms, is:

(11.33)

where ρ_i = ρ(u_i, Θ) is the Riemannian distance between u_i and the population mean shape Θ, given in Equation (11.17). Using the result (e.g. Abramowitz and Stegun 1970) that

we can find the likelihood equations and solve using a numerical procedure. The root of the likelihood equations that provides a global maximum of Equation (11.33) is denoted by , the exact MLEs of (Θ^T, τ)^T, with τ > 0.

We can now write down likelihood ratio tests for various problems, for example testing for differences in shape between two independent populations or for changes in variation parameter. Large sample standard likelihood ratio tests can be carried out in the usual way, using Wilks’ theorem (Wilks 1962).

Consider, for example, testing whether , versus , where , with dim and dim. Let

then according to Wilks’ theorem − 2log Λ ≈ χ²_{q − p} under H₀, for large samples (under certain regularity conditions).

If we had chosen different shape coordinates for the mean shape, then inference would be identical as there is a one to one linear correspondence between Kendall and Bookstein coordinates as seen in Equation (2.12). Also, if a different baseline was chosen, then there is a one to one correspondence between Kendall’s coordinates and the alternative shape parameters. Therefore, any inference will not be dependent upon baseline or coordinate choice with this exact likelihood approach.

Example 11.1 Consider the schizophrenia data described in Section 1.4.5. The isotropic model may not be reasonable here but we will proceed with an illustrative example of the technique. So, we consider the shape variables in group g to be random samples from the isotropic offset normal distribution, with likelihood given by Equation (11.33) with 2k − 3 = 23 parameters in each group Θ_g, τ_g, where g = 1 (control group) or g = 2 (schizophrenia group). A hypothesis test for isotropy was considered in Example 9.7.

The isotropic offset normal MLEs were obtained using the R routine frechet which uses nlm for the maximization (see Section 11.3.3). Plots of the mean shapes in Bookstein coordinates using baseline genu (landmark 2) and splenium (landmark 1) are given in Figure 11.3. The exact isotropic MLE of shape is very close to the full Procrustes mean shape; in particular the Riemannian distance ρ from the exact isotropic MLE to the full Procrustes mean shapes in both groups are both extremely small. The MLE of the variation parameters are and .

Testing for equality in mean shape in each group we have − 2log Λ, distributed as χ²₂₂ under H₀, as 43.269. Since the < ?TeXp? >-value for the test is P(χ²₂₂ ≥ 43.269) = 0.005 we have strong evidence that the groups are different in mean shape, as is also seen in Example 9.7. Hence, there is evidence for a significant difference in mean shape between the controls and the schizophrenia patients. Note the large shape change in the triangle formed by the splenium and the two landmarks just below it. If we restrict the groups to have the same τ as well as the same mean shape in H₀, then − 2log Λ = 44.9 and the < ?TeXp? >-value for the test is P(χ²₂₃ ≥ 44.9) = 0.004.

Of course, we may have some concern that our sample sizes may not be large enough for the theory to hold and alternative F and Monte Carlo permutation tests (with similar findings) are given in Example 9.7.

11.3.3 Exact isotropic MLE in R

The R code for the above example is as follows:

> anss<-frechet(schizophrenia$x[,,schizophrenia $group==”scz”],
      mean=”mle”)
> ansc<-frechet(schizophrenia$x[,,schizophrenia$group==”con”],
      mean=”mle”)
> anss$loglike
[1] 775.8128
> ansc$loglike
[1] 798.1256
> anss$tau
[1] 0.036485
> ansc$tau
[1] 0.03441964
> anss$kappa
[1] 1022.851
> ansc$kappa
[1] 1183.186
#pooled groups
> anspool<-frechet(schizophrenia$x,mean=”mle”)
> anspool$loglike
[1] 1551.484
# Wilks’ statistic
> 2*(ansc$loglike+anss$loglike-anspool$loglike)
[1] 44.90941

Hence, we have verified the estimated variance parameters, and the final likelihood ratio statistic.

11.3.4 EM algorithm and extensions

Kume and Welling (2010) consider a different approach for maximum likelihood estimation for the offset-normal shape distribution using an EM algorithm. The approach involves working with the original distribution of the landmark coordinates but treating the rotation and scale as missing/hidden variables. The approach is also useful for size-and-shape estimation. The methodology has been demonstrated to be effective in a variety of examples and it has also been extended to 3D shape and size-and-shape analysis (Kume et al. 2015).

A further extension of the method has been carried out by Huang et al. (2015) who use a mixture of offset-normal distributions for clustering with mixing proportions from a regression model, with application to corpus callosum shapes. Burl and Perona (1996) used the offset normal distribution for recognizing planar object classes, in particular for detecting faces from clutter via likelihood ratios.

11.4 Practical inference

In order to carry out practical inference one needs to decide upon a suitable procedure. The offset normal shape models are over-parameterized and rather awkward to work with. The isotropic offset normal model is reasonably straightforward to use, although the complex Watson or complex Bingham distribution are much easier to use for practical inference because they are members of the exponential family. So, if the isotropic model is appropriate, then in practice one would use the complex Watson distribution. However, the equal independent isotropic covariance structure is too simplistic for most datasets and so a next stage is to consider the complex covariance structure. The complex Bingham distribution is much easier to use than the offset complex normal distribution and so would be preferred.

There is a close analogy here with inference in directional data analysis. The offset normal shape distribution is analogous to the offset normal distribution in directional statistics, yet in practice one uses the von Mises–Fisher distribution for practical analysis. As a next stage in modelling one considers the real Bingham or Fisher–Bingham family of distributions, rather than more complicated offset normal distributions.

Important point: Perhaps the most straightforward and preferred way to proceed with inference for planar shape is to use the complex Bingham distribution, which gives the full Procrustes mean as the MLE of mean shape. In order to assess the structure of variability it is best to proceed with a multivariate analysis approach in the real tangent space, as seen in Chapter 9 and Section 12.4.

There are various extensions of offset normal shape distributions that can be considered, for example Alshabani et al. (2007a) consider partial shape distributions, where the landmark data are in three dimensions but the rotation invariance is only in one plane. These distributions are closely related to the planar shape distributions, and can be considered the joint shape distribution with marks at each landmark. The development of this work was motivated by the Bayesian analysis of human movement data (Alshabani et al. 2007b) from Section 1.4.13.

11.5 Offset normal size-and-shape distributions

Under the general normal model of Equation (11.1) we wish to investigate the joint distribution of size-and-shape – the offset normal size-and-shape distribution. These distributions have been developed by Dryden and Mardia (1992), and in general the results are complicated. Bookstein (2013b) states that in many applications the data really do live in Euclidean space but many models are in non-Euclidean shape spaces. One could argue that models should also be constructed in Euclidean spaces, where covariances between atoms for example make sense, and scale is not removed.

11.5.1 The isotropic case

11.5.1.1 The density

In the isotropic case we have Ω = σ²I_2k and so Σ = σ²I_{2k − 2}. It will be convenient to consider the centroid size S to measure size here. This model was used for first-order size-and-shape analysis by Bookstein (1986).

Corollary 11.4 The size-and-shape density of (S, U), with respect to the Lebesgue measure dSdU, for the isotropic case is:

(11.34)

where ζ = ||μ_H|| is the population centroid size and ρ is the Riemannian distance between the population and observed shapes given by:

and f_∞(u) is the uniform shape density in Kendall coordinates, from Equation (10.2).

A proof was given by Dryden and Mardia (1992) which involves a transformation and integrating out a rotation variable. The result was also obtained by Goodall and Mardia (1991). Let us write Θ for the Kendall coordinates obtained from μ_H. There is a total of 2k − 2 parameters here, namely the population size ζ, the population shape Θ (2k − 4 parameters) and the variation σ² at landmarks.

11.5.1.2 Equal means

If all the means (μ_j, v_j)^T, j = 1, …, k, are equal under the isotropic model, that is the population configuration is coincident (ζ = 0), then the density of (S, U), with respect to the Lebesgue measure, is given by:

Proof: Let ζ → 0 in the density of Equation (11.34).

Note that size (a scaled χ_{2k − 2} random variable) and shape are independent in this limiting case, as ζ → 0.

11.5.1.3 Small variations

Bookstein (1986) has shown that the centroid size and shape variables have approximately zero covariance for small variations under the isotropic normal model. We can also derive this result directly from the p.d.f. Let

be the shape of the population configuration,

and define

with

Result 11.9 If (σ/ζ) is small, then the approximate isotropic size-and-shape distribution is:

(11.35)

Proof: Transform to T = (S − ζ)/σ and W = ζ(U − Θ)/σ, and write γ = σ/ζ, where ζ > 0. The density of (T, W) is:

Using the facts that, for large x,

and that, for small γ,

we see that

Thus, since |B(Θ)|^1/2 = (1 + Θ^TΘ), we deduce that, as γ → 0 +,

The result follows.

This result could also be obtained using Taylor series expansions for S and U, and Bookstein (1986) used such a procedure without reference to the offset normal size-and-shape distribution.

11.5.1.4 Conditional distributions

Using the standard result for the distribution of S under the isotropic model (e.g. Miller 1964, p. 28), it follows that the conditional density of U given S = s is:

where f_∞(u) is the uniform shape density of Equation (10.2). The conditional density of S given U involves the ratio of a modified Bessel function and a confluent hypergeometric function. Using Weber’s first exponential integral (Watson 1944, p. 393), it can be seen that

If γ = σ/ζ is small, then, since for large x > 0 (Abramowitz and Stegun 1970, p. 504),

and since cos ²ρ = 1 − O(γ²), it follows that

This verifies the result of Bookstein (1986) that S² and U are approximately uncorrelated for small γ, which can also be seen from Equation (11.35).

11.5.2 Inference using the offset normal size-and-shape model

The size-and-shape distribution can be used for likelihood-based inference. Inference for models with general covariance structures will be very complicated. Placing some restrictions on the covariance matrix is necessary for the same identifiability reasons as in the shape case, discussed in Section 11.3.

Let us assume that we have a random sample (s_i, u_i), i = 1, …, n, from the isotropic size-and-shape distribution of Equation (11.34). We have the 2k − 2 parameters to estimate: population mean size ζ, population mean shape Θ (a 2k − 4-vector), and population variance σ². The log likelihood is given, up to a constant with respect to the parameters, by:

(11.36)

where ρ_i is the Riemannian distance between the shapes u_i and Θ given by Equation (4.21). We denote the MLEs by .

The likelihood equations must be solved numerically, although there is the following relationship between the MLEs of σ² and ζ:

The numerical maximization of Equation (11.36) requires the accurate evaluation of log I₀(κ). This can be performed using a polynomial approximation such as mentioned by Abramowitz and Stegun (1970, p. 378).

Consider, for example, testing for symmetry in a sample of triangles. One aspect of the shape that could be investigated is asymmetry about the axis from the midpoint of the baseline to landmark 3.

Example 11.2 Consider a random sample of 30 T1 mouse vertebrae with k = 3 landmarks on each bone. Each triangle is formed by landmarks 1, 2 and 3 as in Figure 11.4.

Let us propose that the data form a random sample from the isotropic size-and-shape distribution. As a model check, since variations are small we would expect the Kendall coordinates and centroid size (U, V, S) to be approximately normal and uncorrelated from Equation (11.35). There is no reason to doubt normality using Shapiro–Wilk W tests, and the sample correlations for (U, V), (U, S) and (V, S) are only − 0.07, −0.06 and − 0.07, respectively. In addition, the standard deviations of U and V should be approximately equal and here the sample standard deviations of U and V are 0.023 and 0.034, respectively. Thus, we conclude that the model is fairly reasonable and there appears to be no linear relationship between size and shape in this particular dataset.

The MLEs are found by maximizing Equation (11.36). The values of the MLEs are and .

It is suggested that the population mean triangle could be isosceles with axis of symmetry θ = 0. We can investigate such asymmetry by testing H₀: Θ = (0, ϕ)^T versus H₁: Θ = (θ, ϕ)^T, Θ unrestricted, where ϕ, ζ > 0 and σ² > 0 are not restricted under either hypothesis. Standard likelihood ratio tests can be performed in the usual way using Wilks’ theorem for large samples. Let

Then according to Wilks’ theorem − 2log Λ ≃ χ²₁ under H₀, for large samples. This test could of course be carried out using shape information alone, ignoring size. For this T1 mouse vertebrae example the statistic is − 2log Λ = 5.52 with approximate p-value 0.02 calculated from the χ²₁ approximation. Thus we have evidence to reject symmetry about θ = 0.

11.6 Distributions for higher dimensions

11.6.1 Introduction

For practical data analysis in three and higher dimensions we recommend the use of Procrustes analysis and tangent space inference (Chapters 7 and 9). The study of suitable models for higher dimensional shape is much more difficult than in the planar case. We outline some distributional results for normally distributed configurations, although the work is complicated.

11.6.2 QR decomposition

Goodall and Mardia (1991, 1992, 1993) have considered higher dimensional offset normal size-and-shape and shape distributions by using the QR decomposition. In particular, the decomposition of Bartlett (1933) described below leads to the main results.

Although we have concentrated on k ≥ m + 1 so far, in this section on higher dimensional distributions we also allow k < m + 1. Invariance with respect to rotation if k < m + 1 is the same as invariance with respect to the set of k − 1 orthonormal frames in m dimensions, which is the Stiefel manifold V_{k − 1, m} (e.g. see Jupp and Mardia 1989). The Stiefel manifold V_{n, m} of orthonormal n frames in m dimensions can be regarded as the set of (n × m) matrices A satisfying AA^T = I_n. When k > m then V_{n, m} = V_{m, m} = O(m).

First of all we consider the general QR decomposition described in Section 5.6.2 in more detail, for general values of k and m. Consider the QR decomposition of the Helmertized coordinates Y = HX, namely

(11.37)

where n = min (k − 1, m), V_{n, m} is the Stiefel manifold and T((k − 1) × n) is lower triangular with non-negative diagonal elements T_ii ≥ 0, i = 1, …, n. When k > m we have Γ O(m) and det(Γ) = ±1, and hence the QR decomposition removes orientation and reflection in this case.

Before in Section 5.6.2 we just considered k > m but restricted Γ so that reflections were not allowed. In that case we removed orientation only for k > m and so we had Γ SO(m), det(Γ) = +1 and T unrestricted. However, when k ≤ m there is no distinction to make between allowing reflections or not – the shape is invariant under reflection in this case.

For k > m, T, {T_ij : 1 ≤ i ≤ j ≤m} is the size-and-shape if Γ SO(m) and the reflection size-and-shape if Γ O(m). If k ≤ m, then the distinction is irrelevant and we just call T the size-and-shape.

To remove scale we divide by the Euclidean norm of T,

For k > m, W is the shape of our configuration if Γ SO(m) and reflection shape if Γ O(m). If k ≤ m the distinction is again irrelevant and we call W the shape. If ||T|| = 0, then the shape is not defined.

11.6.3 Size-and-shape distributions

11.6.3.1 Coincident means

If X has the multivariate isotropic normal model

(11.38)

then YY^T = HXX^TH^T = TΓΓ^TT^T = TT^T has a central Wishart distribution if rank(μ) = 0, where μ = E[Y] = Hvec^{− 1}_m(μ*), and m > k − 1. Bartlett’s decomposition gives the distribution of the lower triangular matrix T, where YY^T = TT^T. Hence T is the reflection size-and-shape in the QR decomposition of Y in Equation (11.37).

Result 11.10 (Bartlett decomposition). Under the normal model of Equation (11.38) with μ = 0 the distribution of (T)_ij is:

(11.39)

for i > j, i = 1, …, k − 1, j = 1, …, N, and all these variables are mutually independent {N = min (k − 1, m)}.

The proof involves random rotation matrices. For example, for k − 1 < m see Kshirsagar (1963). Goodall and Mardia (1991) have shown that this result holds also for m < k (even when the Wishart matrix is singular) and thus Equation (11.39) is the reflection size-and-shape distribution for the coincident means (μ = 0) case. If the size-and-shape distribution without reflections is required, then the result of Equation (11.39) holds, except that T_mm ∼ N(0, σ²) for k > m.

Goodall and Mardia (1991) derive further results for planar and higher rank means. The densities are complicated in general. One simple case is that of triangles in higher dimensions.

If k = 3 and m ≥ 3, then we can consider Kendall’s spherical coordinates on the hemisphere . In this case we have W₁₁ = (1 + sin θcos ϕ)^1/2/√2, W₂₂ = cos θ(1 + sin θcos ϕ)^{− 1/2}/√2 and J(u) = sin θ(1 + sin θcos ϕ)^{− 1/2}/(2√2). Hence the shape density on is given by,

(11.40)

The uniform measure on the hemisphere is sin θ/(2π)dθdϕ and θ/2 is the great circle distance to the north pole (the equilateral triangle shape). This result was derived by Kendall (1988) and indicates that the higher the dimension m, the greater the strength of the mode at the equilateral triangle.

If m = 2 then we divide Equation (11.40) by 2 to obtain the shape density without reflection on the sphere S²(1/2),

which is the uniform density on .

Diffusions on planar shape spaces were studied by Ball et al. (2008) and the explicit expressions for Brownian motion were given. Also Ornstein–Uhlenbeck processes in planar shape space were defined using Goodall–Mardia coordinates. A drift term was added in the Ornstein–Uhlenbeck process along the geodesic towards the mean shape. Inference for these models was given by Ball et al. (2006) who discussed exact retrospective sampling of Ornstein–Uhlenbeck processes, with a particular application to modelling cell shape.

Other types of diffusions include Radon shape diffusions, where the shape of random projections of shapes into subspaces of lower dimension is considered (Panaretos 2006, 2008). In related work Panaretos (2009) investigates the recovery of a density function from its Radon transform at random and unknown projection angles, and a practical application is a random tomography technique for obtaining sparse reconstructions of proteins from noisy random projections using the LASSO (Panaretos and Konis 2011).

11.6.4 Multivariate approach

An alternative way to proceed is to consider an approach using results from multivariate analysis. The QR decomposition is again considered Y = TΓ, but one transforms to the joint distribution of (T, Γ) and then integrates out Γ over the Stiefel manifold to obtain the size-and-shape distribution. Integrating out size then gives the marginal shape distribution.

The key result for integrating out the rotation (and reflections with k ≥ m + 1) is that, from James (1964):

where [ΓdΓ^T] is the invariant unnormalized measure on O(m) and ₀F₁( · ; ·) is a hypergeometric function with matrix argument, which can be written as a sum of zonal polynomials. If rank(μ) < m, then it follows that

and so the density without reflection invariance is fairly straightforward to find in this case. If rank(μ) = m, for m ≥ 3 then the density without reflection invariance cannot be computed in this way.

Goodall and Mardia (1992) discuss in detail the multivariate approach for computing shape densities from normal distributions. The case where the covariance matrix is I_m⊗Σ is dealt with and size-and-shape and shape distributions are given for all cases except rank(μ) = m, for m ≥ 3.

In conclusion, marginal shape distributions from normal models are very complicated in higher than two dimensions, but for low rank means (rank ≤ 2) the distributions are tractable.

11.6.5 Approximations

When variations are small Goodall and Mardia (1993) have given some normal approximations to the size-and-shape and shape distributions.

Result 11.11 Let T^P be the Procrustes size-and-shape coordinates after rotating the size-and-shape T to the mean μ (diagonal and of rank m) given in Equation (7.14). For small σ, it follows that (T^P)_ij follow independent normal distributions with means μ_ij and variances σ², for i = j or i > m, and σ²/(1 + μ²_ii/μ_jj²) for j < i ≤ m.

Important point: The normal approximation is much simpler to use for inference than the offset normal distribution, and Goodall and Mardia (1993) consider an m = 3 dimensional application of size-and-shape analysis in the manufacture of chairs. A decomposition of the approximate distribution for size and shape is also given, and it can be shown for general m that size and shape are asymptotically independent (as σ → 0) under isotropy. The 2D (m = 2) case was seen earlier in Result 11.9. Thus the result gives a method for small σ for shape analysis. Inference will be similar to using the tangent approximation inference of Section 9.1.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 11 Offset normal shape distributions

Create new playlist

Sign In

Sign Up

11.1 Introduction

11.1.1 Equal mean case in two dimensions

11.1.1.1 Isotropic case

11.1.1.2 Distribution of the Riemannian distance

11.1.1.3 Anisotropic case

11.1.2 The isotropic case in two dimensions

11.1.3 The triangle case

11.1.4 Approximations: Large and small variations

11.1.4.1 Large variations

11.1.4.2 Small variations

11.1.5 Exact moments

11.1.6 Isotropy

11.2 Offset normal shape distributions with general covariances

11.2.1 The complex normal case

11.2.2 General covariances: Small variations

11.3 Inference for offset normal distributions

11.3.1 General MLE

11.3.2 Isotropic case

11.3.3 Exact isotropic MLE in R

11.3.4 EM algorithm and extensions

11.4 Practical inference

11.5 Offset normal size-and-shape distributions

11.5.1 The isotropic case

11.5.1.1 The density

11.5.1.2 Equal means

11.5.1.3 Small variations

11.5.1.4 Conditional distributions

11.5.2 Inference using the offset normal size-and-shape model

11.6 Distributions for higher dimensions

11.6.1 Introduction

11.6.2 QR decomposition

11.6.3 Size-and-shape distributions

11.6.3.1 Coincident means

11.6.4 Multivariate approach

11.6.5 Approximations

Table of Contents for
11 Offset normal shape distributions