In this chapter we introduce some differential geometrical aspects of shape and size-and-shape. After a brief review of Riemannian manifolds, we define what is meant by the pre-shape, shape and size-and-shape of a configuration.
Throughout this text the spaces of interest are primarily Riemannian manifolds, and we begin with some informal discussion about the topic. There are many treatments of differential geometry at various levels of formalism, and an excellent introduction is given by Bär (2010).
A manifold is a space which can be viewed locally as a Euclidean space. We first consider tangent spaces for a manifold M in general. Consider a differentiable curve in M given by γ(t) ∈ M, t ∈ ℝ, with γ(0) = p. The tangent vector at p is given by:
$$\gamma'(0) = \left. \frac{d\gamma(t)}{dt} \right|_{t=0},$$
and the unit tangent vector is ξ = γ′(0)/||γ′(0)||. The set of all tangent vectors γ′(0) for all curves passing through p is called the tangent space of M at p, denoted by T_p(M). If the manifold M has what is called an affine connection (a way of connecting nearby tangent spaces), then a geodesic can be defined.
A Riemannian manifold M is a connected manifold which has a positive-definite inner product defined on each tangent space T_p(M), such that the choice varies smoothly from point to point. We write g = {g_ij} to denote the positive-definite tensor which defines the inner product on each tangent space for a given coordinate system. Specifically, if we have coordinates (x_1, …, x_n) then the metric on the space is:
$$ds^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij} \, dx_i \, dx_j.$$
Consider a tangent vector v ∈ T_p(M) in the tangent space at a point p on the manifold M. Then there is a unique geodesic γ(t) passing through p with initial tangent vector γ′(0) = v. The corresponding exponential map is defined as:
$$\exp_p(v) = \gamma(1),$$
and the inverse exponential map, or logarithmic map, is:
$$\exp_p^{-1}(\gamma(1)) = v.$$
The Riemannian distance between any two points in the Riemannian manifold is given by the arc length of the minimizing geodesic between the two points, where the length of a parameterized curve {γ(t), t ∈ [a, b]} is defined as:
$$L = \int_a^b \| \gamma'(t) \|_g \, dt,$$
where ||u||_g = {∑_i ∑_j g_ij u_i u_j}^{1/2} is the norm of a tangent vector u induced by the inner product g. The length of the curve does not change under re-parameterizations. The Riemannian distance is an intrinsic distance, where the distance is obtained from the length of the geodesic path entirely within the manifold. An alternative is an extrinsic distance, where the distance is computed within a higher dimensional embedding space (usually Euclidean). In this case an embedding function is required, denoted by j(X), X ∈ M, together with a unique projection from the embedding space back to the manifold, denoted by P() (see Chapter 6).
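As an illustration of the exponential map, its inverse and the Riemannian distance, the following minimal sketch uses the unit sphere, where closed forms are well known (geodesics are great circles). The function names `sphere_exp`, `sphere_log` and `sphere_distance` are hypothetical and chosen only for this example; the sphere is used because it is the simplest curved manifold, not because the text prescribes it at this point.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: move from p along the great
    circle with initial tangent vector v (v must be orthogonal to p)."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p.copy()
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def sphere_log(p, q):
    """Inverse exponential (logarithmic) map: tangent vector at p whose
    geodesic reaches q at time 1. Not defined for antipodal points."""
    cos_theta = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_theta)          # geodesic (great-circle) distance
    w = q - cos_theta * p                 # component of q orthogonal to p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(p)
    return theta * w / norm_w

def sphere_distance(p, q):
    """Riemannian (intrinsic) distance: arc length of the minimizing geodesic."""
    return float(np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)))

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
v = sphere_log(p, q)
intrinsic = sphere_distance(p, q)         # arc length on the sphere
extrinsic = np.linalg.norm(p - q)         # chord length in the embedding space
```

Here the intrinsic distance is the arc length along the sphere, whereas the extrinsic distance is the straight-line (chordal) distance in the surrounding Euclidean space, so the latter is never larger than the former.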
The Riemannian metric g defines a unique affine connection called the Levi-Civita connection ∇ (or covariant derivative), which enables us to say how vectors in tangent spaces change as we move along a curve from one point to another. Let γ(t) be a curve on M, and suppose we want to move from γ(t_0) to γ(s) and see how a vector ξ ∈ T_{γ(t_0)}(M) is transformed in the new tangent space T_{γ(s)}(M). The parallel transport of ξ along γ is a vector field X(s) which satisfies
$$X(t_0) = \xi, \qquad \nabla_{\gamma'(s)} X(s) = 0,$$
so that X(s) ∈ T_{γ(s)}(M) is the parallel transport of ξ. In general, X(s) is obtained by solving a system of ordinary differential equations, although for some manifolds the solution is analytic.
The Riemannian curvature tensor is used to describe the curvature of a manifold. The curvature tensor R(U, V) is given in terms of the Levi-Civita connection
$$R(U, V) = \nabla_U \nabla_V - \nabla_V \nabla_U - \nabla_{[U, V]},$$
where [U, V] = UV − VU is the Lie bracket of vector fields (e.g. see Warner 1971).
For 2D landmark shapes the shape space is a Riemannian manifold (a complex projective space), as shown by Kendall (1984) and considered in Section 4.3. However, for the important case of three or more dimensional shapes the space is not a manifold, but rather it is a stratified space. For m ≥ 3 dimensions the shape space has singularities (where configurations lie in a subspace of dimension m − 2 or less), and these singularities are strata of the space. We assume throughout that we are away from such degenerate shapes, which is often reasonable in practice, and so we restrict ourselves to the manifold part of the shape space (i.e. the non-degenerate configurations).
We have already noted that the shape of an object is given by the geometrical information that remains when we remove translation, rotation and scale information.
We shall remove the transformations in stages, which have different levels of difficulty. For example, removing location is very straightforward (e.g. by a specific linear transformation), as is the removal of scale (e.g. by rescaling to unit size). A configuration which has been standardized by these two operations is called a ‘pre-shape’. The pre-shape space is also sometimes called an ‘ambient space’, which contains the rotation information as well as the shape.
Removing rotation is more difficult and can be carried out by optimizing over rotations by minimizing some criterion. This final removal of rotation is often called ‘quotienting out’ in geometry and the resulting shape lies in the shape space, which is a type of ‘quotient space’.
A rotation of a configuration about the origin is given by post-multiplication of the k × m configuration matrix X by an m × m rotation matrix Γ.
Definition 3.1 An m × m rotation matrix satisfies Γ^T Γ = Γ Γ^T = I_m and det(Γ) = +1. A rotation matrix is also known as a special orthogonal matrix, which is an orthogonal matrix with determinant +1. The set of all m × m rotation matrices is known as the special orthogonal group SO(m).
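A rotation matrix has m(m − 1)/2 degrees of freedom. For m = 2 dimensions the rotation matrix can be parameterized by a single angle θ, −π ≤ θ < π, in radians, for rotating clockwise about the origin:
$$\Gamma = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$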
In m = 3 dimensions one requires three Euler angles. For example, one could consider a clockwise rotation of angle θ1, −π ≤ θ1 < π, about the z-axis, followed by a rotation of angle θ2, −π/2 ≤ θ2 < π/2, about the x-axis, then finally a rotation of angle θ3, −π ≤ θ3 < π, about the z-axis. This parametrization is known as the x-convention and is unique apart from a singularity at θ2 = −π/2. The rotation matrix is:
There are many choices of Euler angle representations, and all have singularities (Stuelpnagel 1964).
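The defining properties of a rotation matrix are easy to check numerically. The sketch below builds a 2D rotation matrix from a single angle and tests membership of SO(m). The helper names `rotation_2d`, `is_rotation` and `rotation_dof` are hypothetical, and the placement of the minus sign in `rotation_2d` is one common convention assumed here for illustration.

```python
import numpy as np

def rotation_2d(theta):
    """2 x 2 rotation matrix parameterized by one angle (in radians).
    The sign convention for the off-diagonal terms is an assumption."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s],
                     [-s, c]])

def is_rotation(G, tol=1e-10):
    """True if G is square, orthogonal and has determinant +1."""
    G = np.asarray(G, dtype=float)
    if G.ndim != 2 or G.shape[0] != G.shape[1]:
        return False
    m = G.shape[0]
    orthogonal = np.allclose(G.T @ G, np.eye(m), atol=tol)
    return bool(orthogonal and abs(np.linalg.det(G) - 1.0) < tol)

def rotation_dof(m):
    """Degrees of freedom of an m x m rotation matrix: m(m - 1)/2."""
    return m * (m - 1) // 2

G = rotation_2d(0.3)
reflection = np.diag([1.0, -1.0])   # orthogonal, but determinant -1
```

With these definitions `is_rotation(G)` holds, while the reflection fails the determinant condition even though it is orthogonal.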
To complete the set of similarity transformations an isotropic scaling is obtained by multiplying X by a positive real number.
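Definition 3.2 The Euclidean similarity transformations of a configuration matrix X are the set of translated, rotated and isotropically rescaled X, that is
$$\{ \beta X \Gamma + 1_k \gamma^T : \beta \in \mathbb{R}^+, \ \Gamma \in SO(m), \ \gamma \in \mathbb{R}^m \},$$
where β ∈ ℝ⁺ is the scale, Γ is a rotation matrix, γ is a translation m-vector and 1_k is the k-vector of ones.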
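Definition 3.3 The rigid-body transformations of a configuration matrix X are the set of translated and rotated X, that is
$$\{ X \Gamma + 1_k \gamma^T : \Gamma \in SO(m), \ \gamma \in \mathbb{R}^m \},$$
where Γ is a rotation matrix, γ is a translation m-vector and 1_k is the k-vector of ones.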
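A minimal sketch of applying a Euclidean similarity transformation to a k × m configuration matrix is given below. The function name `similarity_transform` and the symbol `beta` for the scale are assumptions made for this example. Setting `beta = 1` gives a rigid-body transformation.

```python
import numpy as np

def similarity_transform(X, beta, Gamma, gamma):
    """Return beta * X @ Gamma + 1_k gamma^T for a k x m configuration X,
    scale beta > 0, m x m rotation Gamma and translation m-vector gamma."""
    X = np.asarray(X, dtype=float)
    if beta <= 0:
        raise ValueError("the scale must be a positive real number")
    ones_k = np.ones((X.shape[0], 1))
    return beta * X @ Gamma + ones_k @ np.reshape(gamma, (1, -1))

# A triangle (k = 3 landmarks in m = 2 dimensions).
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.5, 1.0]])
theta = 0.7
Gamma = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
Y = similarity_transform(X, beta=2.0, Gamma=Gamma, gamma=np.array([3.0, -1.0]))

# Inter-landmark distances are all multiplied by the scale factor.
d_X = np.linalg.norm(X[1] - X[0])
d_Y = np.linalg.norm(Y[1] - Y[0])
```

Because rotation and translation preserve distances, every inter-landmark distance in `Y` equals the corresponding distance in `X` multiplied by the scale.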
For m = 2 we can use complex notation. Consider k ≥ 3 landmarks in ℂ, z^o = (z^o_1, …, z^o_k)^T, which are not all coincident. The Euclidean similarity transformations of z^o are:
$$\{ \beta e^{i\theta} z^o + 1_k \xi : \beta \in \mathbb{R}^+, \ 0 \le \theta < 2\pi, \ \xi \in \mathbb{C} \},$$
where β ∈ ℝ⁺ is the scale, 0 ≤ θ < 2π is the rotation angle and ξ ∈ ℂ is the translation. Here θ is an anti-clockwise rotation about the origin. The Euclidean similarity transformations of z^o are the set of the same complex linear transformations applied to each landmark z^o_j. Specifying the Euclidean similarity transformations as complex linear transformations leads to great simplifications in shape analysis for the 2D case, as we shall see in Chapter 8.
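In complex notation a similarity transformation is a single complex multiplication followed by a complex addition, which the following sketch illustrates. The function name `complex_similarity` and the symbols `beta` and `xi` are assumptions used only for this example.

```python
import numpy as np

def complex_similarity(z, beta, theta, xi):
    """Apply the same complex linear map to every landmark:
    scale by beta > 0, rotate anti-clockwise by theta, translate by xi."""
    z = np.asarray(z, dtype=complex)
    return beta * np.exp(1j * theta) * z + xi

# Three landmarks in the complex plane.
z = np.array([0 + 0j, 1 + 0j, 0.5 + 1j])
w = complex_similarity(z, beta=2.0, theta=np.pi / 2, xi=1 - 1j)
```

In this example the landmark at 1 is rotated a quarter turn anti-clockwise to i, doubled to 2i and then translated by 1 − i, giving 1 + i.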
For m = 3 we can use unit quaternions to represent a 3D rotation, and this approach was used by Horn (1987), Theobald (2005) and Du et al. (2015) for example. A combined 3D rotation and isotropic scaling can be represented by a (non-unit) quaternion.
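The quaternion representation can be sketched as follows, using the standard conversion from a quaternion (w, x, y, z) to a 3 × 3 rotation matrix. The function name `quaternion_to_rotation` is hypothetical, and the matrix is written in the usual column-vector convention. Applying it to a k × 3 configuration by post-multiplication with its transpose is an assumption made to match the row-wise landmark layout used in this chapter.

```python
import numpy as np

def quaternion_to_rotation(q):
    """Convert a non-zero quaternion (w, x, y, z) to a 3 x 3 rotation matrix.
    The quaternion is normalized first, so any isotropic scaling carried by
    a non-unit quaternion is discarded and only the rotation is returned."""
    q = np.asarray(q, dtype=float)
    norm_q = np.linalg.norm(q)
    if norm_q == 0:
        raise ValueError("the zero quaternion does not define a rotation")
    w, x, y, z = q / norm_q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

# Quarter turn about the z-axis: q = (cos(pi/4), 0, 0, sin(pi/4)).
R = quaternion_to_rotation([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])

# Rotate a k x 3 configuration whose rows are landmarks.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
X_rotated = X @ R.T
```

A practical advantage of the quaternion form is that it has four parameters with a single unit-norm constraint, so it avoids the singularities of Euler angle parametrizations.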
We can consider the shape of X as the equivalence class of the full set of similarity transformations of a configuration, and we remove the similarity transformations from the configuration in a systematic manner.
If all k points are coincident, then this has a special shape that must be considered as a separate case. We shall remove this case from the set of configurations. After removing translation the coincident points are represented by the origin, the m-vector 0_m = (0, …, 0)^T. The coincident case is not generally of interest except perhaps as a starting point in the study of the diffusion of shape (Kendall 1989).
The m = 1 case is also not of primary interest and it is simply seen (Kendall 1984) that the shape space is S^{k−2} (the sphere in k − 1 real dimensions) after translation and scale have been removed, and thus can be dealt with using directional data analysis techniques (e.g. Mardia 1972; Mardia and Jupp 2000). Our first detailed consideration is the shape space of Kendall (1984) for the case where k > m and for m ≥ 2.
A translation is obtained by adding a constant m-vector to the coordinates of each point. Translation is the easiest to remove from X and can be achieved by considering contrasts of the data, that is, pre-multiplying by a suitable matrix. We can make a specific choice of contrast by pre-multiplying X with the Helmert submatrix H of Equation (2.10). We write
$$X_H = HX \in \mathbb{R}^{(k-1)m} \setminus \{0\}$$
(the origin is removed because coincident landmarks are not allowed) and we refer to X_H as the Helmertized landmark coordinates.
The centred landmark coordinates are an alternative choice for removing location and are given by:
$$X_C = CX,$$
where C is the k × k centring matrix of Equation (2.3).
We can revert to the centred landmark coordinates from the Helmertized landmark coordinates by pre-multiplying by H^T, as
$$H^T H = I_k - \frac{1}{k} 1_k 1_k^T = C$$
and so
$$H^T X_H = H^T H X = CX = X_C.$$
Note that the Helmertized landmark coordinates XH are a (k − 1) × m matrix, whereas the centred landmark coordinates XC are a k × m matrix.
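The relationship between the two ways of removing location can be checked numerically. The sketch below constructs a Helmert submatrix using the standard definition (row j has j equal entries h_j = −{j(j + 1)}^{−1/2}, followed by −j h_j and then zeros), which is assumed here to agree with Equation (2.10). The function names `helmert_submatrix` and `centring_matrix` are hypothetical.

```python
import numpy as np

def helmert_submatrix(k):
    """(k - 1) x k Helmert submatrix: orthonormal rows, each orthogonal
    to the vector of ones (standard definition, assumed here)."""
    H = np.zeros((k - 1, k))
    for j in range(1, k):
        h = -1.0 / np.sqrt(j * (j + 1))
        H[j - 1, :j] = h          # first j entries equal h_j
        H[j - 1, j] = -j * h      # next entry makes the row sum to zero
    return H

def centring_matrix(k):
    """k x k centring matrix C = I_k - (1/k) 1_k 1_k^T."""
    return np.eye(k) - np.ones((k, k)) / k

X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [1.0, 3.0],
              [0.5, 1.0]])
k = X.shape[0]
H = helmert_submatrix(k)
C = centring_matrix(k)

X_H = H @ X          # Helmertized coordinates, (k - 1) x m
X_C = C @ X          # centred coordinates, k x m
X_C_back = H.T @ X_H # recover centred coordinates from Helmertized ones
```

Both `X_H` and `X_C` are unchanged if a constant vector is added to every row of `X`, since each row of `H` and of `C` sums to zero.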
We standardize for size by dividing through by our notion of size. We choose the centroid size [see Equation (2.2)], which is also given by:
$$S(X) = \|CX\| = \sqrt{\mathrm{trace}(X^T C X)} = \sqrt{\mathrm{trace}(X^T H^T H X)} = \|X_H\|,$$
since H^T H = C is idempotent. Note that S(X) > 0 because we do not allow complete coincidence of landmarks. The pre-shape of a configuration matrix X has all information about location and scale removed.
Definition 3.4 The pre-shape of a configuration matrix X is given by:
$$Z = \frac{X_H}{\|X_H\|} = \frac{HX}{\|HX\|},$$
which is invariant under the translation and scaling of the original configuration.
An alternative representation of pre-shape is to initially centre the configuration and then divide by size. The centred pre-shape is given by:
$$Z_C = \frac{CX}{\|CX\|} = H^T Z,$$
since C = H^T H. Note that Z is a (k − 1) × m matrix whereas Z_C is a k × m matrix.
Important point: Both pre-shape representations are equally suitable for the pre-shape space which has real dimension (k − 1)m − 1. The advantage in using Z is that the number of rows is less than that of ZC (although of course they have the same rank). On the other hand, the advantage of working with the centred pre-shape ZC is that a plot of the Cartesian coordinates gives a correct geometrical view of the shape of the original configuration.
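The two pre-shape representations can be computed in a few lines, as in the sketch below. It assumes the standard Helmert submatrix construction and uses the hypothetical function names `helmert_submatrix`, `preshape` and `centred_preshape`.

```python
import numpy as np

def helmert_submatrix(k):
    """(k - 1) x k Helmert submatrix (standard definition, assumed here)."""
    H = np.zeros((k - 1, k))
    for j in range(1, k):
        h = -1.0 / np.sqrt(j * (j + 1))
        H[j - 1, :j] = h
        H[j - 1, j] = -j * h
    return H

def preshape(X):
    """Pre-shape Z = HX / ||HX||, a (k - 1) x m matrix of unit norm."""
    X = np.asarray(X, dtype=float)
    X_H = helmert_submatrix(X.shape[0]) @ X
    size = np.linalg.norm(X_H)            # Frobenius norm = centroid size
    if size == 0:
        raise ValueError("all landmarks coincide; pre-shape is undefined")
    return X_H / size

def centred_preshape(X):
    """Centred pre-shape Z_C = CX / ||CX||, a k x m matrix of unit norm."""
    X = np.asarray(X, dtype=float)
    X_C = X - X.mean(axis=0)              # same as pre-multiplying by C
    size = np.linalg.norm(X_C)
    if size == 0:
        raise ValueError("all landmarks coincide; pre-shape is undefined")
    return X_C / size

X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [1.0, 3.0],
              [0.5, 1.0]])
Z = preshape(X)
Z_C = centred_preshape(X)
Z_moved = preshape(2.5 * X + np.array([3.0, -1.0]))  # rescale and translate
```

`Z_moved` coincides with `Z`, illustrating invariance to translation and scaling, and plotting the rows of `Z_C` gives a picture with the same shape as the original configuration.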
Notation: We use the notation S^k_m to denote the pre-shape space of k points in m dimensions.
Definition 3.5 The pre-shape space is the space of all pre-shapes. Formally, the pre-shape space S^k_m is the orbit space of the non-coincident k point set configurations in ℝ^m under the action of translation and isotropic scaling.
The pre-shape space S^k_m ≡ S^{(k−1)m−1} is a hypersphere of unit radius in (k − 1)m real dimensions, since ||Z|| = 1. The term ‘pre-shape’ signifies that we are one step away from shape: rotation still has to be removed. The term was coined by Kendall (1984), and it is a type of ambient space. The pre-shape space is of higher dimension and informally can be thought of as ‘surrounding’ the shape space, hence the use of the word ‘ambient’ here.
In order to also remove rotation information from the configuration we identify all rotated versions of the pre-shape with each other, and this set or equivalence class is the shape of X.
Definition 3.6 The shape of a configuration matrix X is all the geometrical information about X that is invariant under location, rotation and isotropic scaling (Euclidean similarity transformations). The shape can be represented by the set [X] given by:
$$[X] = \{ Z\Gamma : \Gamma \in SO(m) \},$$
where SO(m) is the special orthogonal group of rotations and Z is the pre-shape of X.
Notation: We use the notation Σ^k_m to denote the shape space of k points in m dimensions.
Definition 3.7 The shape space is the space of all shapes. Formally, the shape space Σ^k_m is the orbit space of the non-coincident k point set configurations in ℝ^m under the action of the Euclidean similarity transformations (translation, rotation and scale).
In order to compare two different shapes we need to choose a particular relative rotation so that they are as close as possible in some sense (see Chapter 4). This optimization over rotation is also called ‘quotienting out’ the group of rotations, and so the shape space is a type of quotient space.
Important point: The dimension of the shape space is:
$$\dim(\Sigma^k_m) = km - m - 1 - \frac{m(m-1)}{2},$$
and this can be simply seen as we initially have km coordinates and then must lose m dimensions for location, one dimension for isotropic scale and m(m − 1)/2 dimensions for rotation.
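The dimension count can be written as a small helper, shown here as a minimal sketch. The function name `shape_space_dimension` is hypothetical, and the count is intended for the case k > m considered in most of this chapter.

```python
def shape_space_dimension(k, m):
    """Dimension of the shape space of k landmarks in m dimensions (k > m):
    km coordinates, minus m for location, 1 for isotropic scale and
    m(m - 1)/2 for rotation."""
    if k <= m:
        raise ValueError("this count is intended for k > m")
    return k * m - m - 1 - m * (m - 1) // 2

triangle_2d = shape_space_dimension(3, 2)      # 6 - 2 - 1 - 1
tetrahedron_3d = shape_space_dimension(4, 3)   # 12 - 3 - 1 - 3
```

For planar triangles the count gives two dimensions, and for four landmarks in three dimensions it gives five.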
The shape of X is a set: an equivalence class under the action of the group of similarity transformations. In order to visualize shapes it is often convenient to choose a particular member of the shape set [X].
Definition 3.8 An icon is a particular member of the shape set [X] which is taken as being representative of the shape.
The word icon can mean ‘image or likeness’ and it is appropriate as we use the icon to picture a representative figure from the shape equivalence class which has a resemblance to the other members, that is the objects of the class are all similar (Goodall 1995). The centred pre-shape ZC is a suitable choice of icon.
We could change the order of removing the similarity transformations or only remove some of the transformations. For example, if location and rotation are removed but not scale, then we have the size-and-shape of X.
Definition 3.9 The size-and-shape of a configuration matrix X is all the geometrical information about X that is invariant under location and rotation (rigid-body transformations), and this can be represented by the set [X]_S given by:
$$[X]_S = \{ X_H \Gamma : \Gamma \in SO(m) \},$$
where X_H are the Helmertized coordinates of Equation (3.7).
Notation: We use the notation SΣ^k_m to denote the size-and-shape space of k points in m dimensions.
Definition 3.10 The size-and-shape space is the space of all size-and-shapes. Formally, the size-and-shape space SΣ^k_m is the orbit space of k point set configurations in ℝ^m under the action of translation and rotation.
Size-and-shape is also known as form, particularly in biology. We discuss size-and-shape in more detail in Chapter 5.
If size is removed from the size-and-shape (e.g. by rescaling to unit centroid size), then again we obtain the shape of X,
$$[X] = \{ X_H \Gamma / \|X_H\| : \Gamma \in SO(m) \} = \{ Z\Gamma : \Gamma \in SO(m) \},$$
as in Equation (3.12).
We can also include invariances under reflections for shape or size-and-shape. A reflection can be obtained by multiplying one of the coordinate axes by − 1. Joint rotations and reflections can be represented by an orthogonal matrix.
Definition 3.11 An m × m orthogonal matrix satisfies Γ^T Γ = Γ Γ^T = I_m and det(Γ) = ±1. The set of all m × m orthogonal matrices is known as the orthogonal group O(m).
The orthogonal group includes rotations (determinant +1) and rotations/reflections (determinant −1).
Definition 3.12 The reflection shape of a configuration matrix X is all the geometrical information that is invariant under translation, rotation, scale and reflection. The reflection shape can be represented by the set
$$[X]_R = \{ ZR : R \in O(m) \},$$
where O(m) is the set of m × m orthogonal matrices, satisfying R^T R = I_m = R R^T and det(R) = ±1, and Z is the pre-shape.
Definition 3.13 The reflection size-and-shape of a configuration matrix X is all the geometrical information that is invariant under translation, rotation and reflection. The reflection size-and-shape can be represented by the set
$$[X]_{RS} = \{ X_H R : R \in O(m) \},$$
where O(m) is the set of m × m orthogonal matrices and X_H are the Helmertized coordinates.
In obtaining the shape, the removal of the location and scale could have been performed in a different manner. For example, Ziezold (1994) centres the configuration to remove location, CX, where C is given by Equation (2.3) and he uses the normalized size . We could alternatively have removed location by pre-multiplying by B where the jth row of B is:
and the 1 is in the (j + 1)th column. The implication of pre-multiplication by B is that location is removed by sending the midpoint of the line between landmarks 1 and 2 to the origin, as is carried out when using Bookstein coordinates (see Section 2.4).
Consider a triangle of k = 3 points in m = 3 dimensions. It is clear in this case that the triangle can be rotated to lie in a particular 2D plane (say the x–y plane). A reflection of the triangle can be carried out by a 3D rotation, and hence shape is the same as reflection shape for a triangle in three dimensions.
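The triangle argument can be verified numerically. In the sketch below a triangle lies in the x–y plane of three-dimensional space, and a reflection in the plane y = 0 is reproduced exactly by a rotation of π about the x-axis. The particular coordinates are made up for illustration.

```python
import numpy as np

# Triangle with k = 3 landmarks in m = 3 dimensions, lying in the plane z = 0.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.3, 0.8, 0.0]])

# Reflection: flip the sign of the y-axis (orthogonal, determinant -1).
reflection = np.diag([1.0, -1.0, 1.0])

# Rotation by pi about the x-axis (orthogonal, determinant +1).
rotation = np.diag([1.0, -1.0, -1.0])

reflected = X @ reflection
rotated = X @ rotation
```

Because every landmark has zero z-coordinate, flipping the sign of the z-axis as well has no effect on the triangle, so `rotated` and `reflected` are identical even though only one of the two matrices belongs to SO(3).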
More generally, if k ≤ m, then the pre-shape of X can be identified with another pre-shape U in S^k_{k−1} by rotating the configuration to be in a fixed k − 1 dimensional plane. Now U can be reflected in this plane without changing its shape and so the shape of X is:
$$[X] = \{ UR : R \in O(k-1) \},$$
where O(k − 1) is the orthogonal group (rotation and reflection). Kendall (1984) called the case k ≤ m ‘over-dimensioned’ and dealing with these shape spaces is equivalent to dealing with Σ^k_{k−1}, and identifying reflections with each other. Throughout most of this book we shall deal with the case k > m.
Important point: With quite a wide variety of terminology used for the different spaces it may be helpful to refer to Figure 3.1 where we give a diagram indicating the hierarchies of shape and size-and-shape spaces. In addition removing all affine transformations leads to the affine shape space.