There has been an explosion in the collection of image data in the last decade, especially with the availability of cheap high-quality cameras and smart-phones. An important area of study is the interpretation of images, and shape is a very important component. A wide ranging summary of measuring shape in images was given by Neale and Russ (2012).
A digital image is an r × c grid of pixels (picture elements) each of which is assigned an integer representing the brightness (or intensity) at that pixel. The pixel is coloured a particular shade of grey depending on the brightness at that position, and hence the integer representing brightness is called the grey level.
A common scale used is for the grey levels to range from 0 (black) through to 255 (white). So, for example, a grey level of 50 would be dark grey, 128 mid grey and 200 light grey. Such images are called grey-level or grey-scale images. The range of grey levels is usually on the scale 0 to 2^g − 1, where g = 1 for binary images (black/white), g = 8 for 256 grey-scale (8 bit) images and g = 12 for 4096 grey-scale (12 bit) images. Colour images can be represented as three grey-level images – one for each of the red, green and blue colour bands. For a 256 × 256 image a scene is represented by 2^16 = 65 536 integers, and hence the data are very high dimensional. Image analysis is involved with all aspects of analysing such image data. For an introduction to statistical image analysis see, for example, Ripley (1988) and Winkler (1995). Other reviews include Mardia and Kanji (1993); Mardia (1994); Grenander and Miller (2007); Davies et al. (2008b) and Sotiras et al. (2013).
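The grey-level ranges and the dimension count quoted above can be checked with a few lines of Python. This is only an arithmetic illustration; the function name grey_level_range is introduced here and is not from the text.

```python
def grey_level_range(g):
    """Return (min, max) grey level for a g-bit image: 0 to 2**g - 1."""
    return 0, 2**g - 1

# Binary, 8 bit and 12 bit images.
for g in (1, 8, 12):
    lo, hi = grey_level_range(g)
    print(f"g = {g:2d}: grey levels {lo}..{hi} ({hi + 1} levels)")

# A 256 x 256 image is a vector of 2**16 grey levels.
r, c = 256, 256
n_pixels = r * c
print(n_pixels, n_pixels == 2**16)
```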
Example 17.1 In Figure 17.1 we see a simple (r = 7) × (c = 6) image of the letter H with grey levels in the matrix
Image analysis tasks can be broadly divided into two distinct areas – low and high level image analysis. Low level image analysis involves techniques at a pixel by pixel level, such as cleaning up noisy images, deblurring images, classifying each pixel into a few classes (segmentation) and modelling of textures. High level image analysis involves direct modelling of the objects in the images and tasks such as object recognition and location of objects in an image. It is particularly in the area of high level image analysis that shape analysis has a prominent rôle to play.
In image analysis the registration parameters (location, rotation and scale) will usually need to be modelled, although a partition of variables into shape and registration parameters is often helpful.
Since the early 1980s statistical approaches to image analysis using the Bayesian paradigm have proved to be very successful. Initially, the methodology was primarily developed for low level image analysis but it was then widely used for high level tasks.
To use the Bayesian framework one requires a prior model which represents our initial knowledge about the objects in a particular scene, and a likelihood or noise model which is the joint probability distribution of the grey levels in the image, dependent on the objects in the scene. By using Bayes’ theorem one derives the posterior distribution of the objects in the scene, which can be used for inference. Examples of the tasks of interest include segmentation of the scene into ‘object’ and ‘background’, and object recognition. The computational work involved is generally intense because of the large amount of data in each image.
An appropriate method for high-level Bayesian image analysis is the use of deformable templates, pioneered by Grenander and colleagues (Grenander and Keenan 1993; Grenander 1994; Grenander and Miller 2007). Our description follows the common theme of Mardia et al. (1991, 1995). We assume that we are dealing with applications where we have prior knowledge on the composition of the scene and we can formulate parsimonious geometric descriptions for objects in the images. For example, in medical imaging we can expect to know a priori the main subject of the image, for example a heart or a brain. Consider our prior knowledge about the objects under study to be represented by a parameterized ideal prototype or template S0. Note that S0 could be a template of a single object or many objects in a scene. A probability distribution is assigned to the parameters with density (or probability function) π(S), which models the allowed variations S of S0. Hence, S is a random vector representing all possible templates with associated density π(S). Here S is a function of a finite number of parameters, say θ1, ..., θp.
In addition to the prior model we require an image model. Let the observed image I be the matrix of grey levels xi, where i = (i1, i2) ∈ {1, ..., r} × {1, …, c} are the r × c pixel locations. The image model or likelihood is the joint p.d.f. (or probability function for discrete data) of the grey levels given the parameterized objects S, written as L(I|S). The likelihood expresses the dependence of the observed image on the deformed template. It is often convenient to generate an intermediate synthetic image but we will not need it here (Mardia et al. 1995).
By Bayes’ theorem, the posterior density π(S|I) of the deformed template S given the observed image I is:
π(S|I) ∝ L(I|S) π(S). (17.1)
An estimate of the true scene can be obtained from the posterior mode (the MAP estimate) or the posterior mean. The posterior mode is found either by a global search, gradient descent (which is often impracticable due to the large number of parameters) or by techniques such as simulated annealing (Geman and Geman 1984), iterative conditional modes (ICM) (Besag 1986) or ICM for image sequences (Hainsworth and Mardia 1993). Alternatively, MCMC algorithms (e.g. see Besag et al. 1995; Gilks et al. 1996; Gelman et al. 2014; Green et al. 2015) provide techniques for simulating from a density. MCMC has the advantage that it allows a study of the whole posterior density itself, and so credibility or confidence regions can be easily obtained.
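As a minimal sketch of finding the posterior mode by a global search, consider a toy problem that is not taken from the text: the template is a disc of unknown radius with known centre, the likelihood assumes independent Gaussian noise around one grey level inside the disc and another outside, and the radius has a Gaussian prior. All names, grey levels and prior settings below are illustrative assumptions.

```python
import numpy as np

def disc_mask(shape, centre, radius):
    """Boolean mask of the pixels inside a disc (the deformed template S)."""
    rows, cols = np.indices(shape)
    return (rows - centre[0])**2 + (cols - centre[1])**2 <= radius**2

def log_posterior(image, radius, centre, mu_in=200.0, mu_out=50.0,
                  sigma=20.0, prior_mean=8.0, prior_sd=4.0):
    """log pi(S|I) up to an additive constant: log L(I|S) + log pi(S)."""
    mean_image = np.where(disc_mask(image.shape, centre, radius), mu_in, mu_out)
    log_lik = -0.5 * np.sum((image - mean_image)**2) / sigma**2   # Gaussian noise (assumed)
    log_prior = -0.5 * ((radius - prior_mean) / prior_sd)**2      # Gaussian prior (assumed)
    return log_lik + log_prior

# Simulate a noisy image of a disc with radius 10.
rng = np.random.default_rng(0)
shape, centre, true_radius = (32, 32), (16, 16), 10.0
image = np.where(disc_mask(shape, centre, true_radius), 200.0, 50.0)
image = image + rng.normal(0.0, 20.0, size=shape)

# Global search: evaluate the log posterior on a grid of radii.
radii = np.arange(2.0, 15.5, 0.5)
log_post = np.array([log_posterior(image, rad, centre) for rad in radii])
map_radius = radii[np.argmax(log_post)]
print("MAP radius:", map_radius)
```

With a single parameter a grid is feasible; with many template parameters the grid grows exponentially, which is why the stochastic and iterative methods cited above are used in practice.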
The key to the successful inclusion of prior knowledge in high level Bayesian image analysis is through specification of the prior distribution. Many approaches have been proposed, including methods based on outlines, landmarks and geometric parameters. The prior can be specified either through a model with known parameters or with parameters estimated from training data.
One approach is to consider a geometric template for S consisting of parametric components, for example line segments, circles, ellipses and arcs. Examples include a circle of random radius to represent the central disc of galaxies (Ripley and Sutherland 1990); simple geometric shapes for facial features such as eyes and mouths (Phillips and Smith 1993, 1994); circular templates to represent pellets in an image, where the number of pellets is unknown (Baddeley and van Lieshout 1993); and ellipses for leaf cell shape (Hurn 1998).
In these models, distributions are specified for the geometrical parameters, and the hierarchical approach of Phillips and Smith (1993, 1994) provides a simple method. Often templates are defined by both global and local parameters. The global parameters represent the object on a coarse scale and the local parameters give a more detailed description on a fine scale. The idea of a hierarchical model for templates is to specify the marginal distribution of the global parameters and a conditional distribution for the local parameters given the global values. This hierarchical division of parameters can be extended to give a complete description of the local dependence structure between variables. Hence, conditionally, each parameter depends only on variables in the levels immediately above and below it.
In general, we assume that templates can be summarized by a small number of parameters θ = (θ1, ..., θp) say, where variation in θ will produce a wide range of deformations of the template. By explicitly assigning a prior distribution to θ, we can quantify the relative probability of different deformations. The prior can be based on training data which need not be a large dataset. By simulation, we can check the possible shapes that could arise.
Consider a single object in ℝ^m with k landmarks. One approach to specifying the template is to work with landmarks on a template directly. Denote the coordinates of the landmarks in ℝ^m as X, an mk × 1 vector. The configuration can be parameterized as an overall location (e.g. centroid), a scale β, a rotation Γ and some suitable shape coordinates, such as shape PC scores. A shape distribution from Chapters 10 or 11 could be chosen as a prior model together with more vague priors for location, scale and rotation.
A point distribution model (PDM) is a PC-based prior model for object recognition and was suggested by Cootes et al. (1992) (see Section 7.7.4). In Figure 7.17 we saw a model for hand shapes which uses three PC scores. The PDM is a template model which is estimated from a training dataset of images, where landmarks are usually located on each training image by an observer. The PDM forms a shape prior distribution in the set of models known as active shape models (ASMs) (Cootes et al. 1994; Cootes and Taylor 1995; Davies et al. 2008b). The ASM combines a prior shape model with a likelihood type of model for the pixels based on local grey level intensities in the image at normal directions to the template. The ASM can be used for detecting objects in a new image, making use of the prior information in a training set.
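The PCA step behind a PDM can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Cootes et al.: the training configurations are synthetic ellipses that are taken to be already registered, PC scores are simulated as independent Gaussians, and the function names fit_pdm and pdm_shape are hypothetical.

```python
import numpy as np

def fit_pdm(shapes, n_components):
    """Fit a PC model to registered landmark vectors (one row per training shape)."""
    mean = shapes.mean(axis=0)
    # SVD of the centred data: rows of vt are the PC directions.
    _, sing, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    variances = sing**2 / (shapes.shape[0] - 1)
    return mean, vt[:n_components], variances[:n_components]

def pdm_shape(mean, pcs, scores):
    """Shape generated by the model: mean plus a linear combination of PCs."""
    return mean + np.asarray(scores) @ pcs

# Synthetic training set (assumption): n ellipses with k landmarks each,
# stored as (x1..xk, y1..yk), varying mainly in aspect ratio.
rng = np.random.default_rng(1)
k, n = 12, 30
angles = 2 * np.pi * np.arange(k) / k
a = rng.normal(0.0, 0.1, size=n)
x = (1 + a)[:, None] * np.cos(angles)
y = (1 - a)[:, None] * np.sin(angles)
shapes = np.hstack([x, y]) + rng.normal(0.0, 0.01, size=(n, 2 * k))

mean, pcs, variances = fit_pdm(shapes, n_components=2)

# Simulate one new configuration from the fitted prior.
new_scores = rng.normal(0.0, np.sqrt(variances))
new_shape = pdm_shape(mean, pcs, new_scores)
print(new_shape.reshape(2, k).T)   # k rows of (x, y)
```

Simulating several shapes in this way is a simple check that the prior generates plausible configurations.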
An extension to the ASM is the active appearance model (AAM) which also includes full information of the grey levels of the training images (rather than in a few local regions as in ASMs). The AAMs also use PCA for reducing the dimension of the variability of the image grey levels, and it is important to register the grey level images in the training set in order to estimate the image PCs in the AAM, see Cootes et al. (2001). The AAM can then be used for detecting objects in a new image, making use of the prior information in a training set of images. Both AAMs and ASMs have been extremely successful in a wide variety of medical and engineering image applications.
Amit and Kong (1996) and Amit (1997) use a graphical template method which is based on comparisons between triangles in an image and a template. Cost functions are given for matching triangles in the deformed template to triangles in the observed template and the total cost gives a measure of discrepancy. The cost functions involve hard constraints limiting the range in which the observed angles can deviate from the template angles and soft constraints penalizing the deviations from template angles.
The procedure has been implemented as a fast algorithm which finds an optimal match in an image in polynomial time, provided the template graph is decomposable (Amit 1997; Amit and Geman 1997).
The thin-plate spline introduced in Section 12.3 can be used in a prior model. The PTPS transformation from the ideal template S0 to a deformed version S has an energy function associated with it. The total minimum bending energy J(Φ) from a PTPS is given in Equation (12.16). Thus if Eint(S, S0) = J(Φ), then a prior distribution of the deformation could be obtained using the Gibbs distribution with density proportional to:
exp{−κ Eint(S, S0)},
where κ > 0 is a scale parameter.
This would inhibit landmark changes that require a large bending energy. If S is an affine transformation of S0, then the total bending energy is zero. This prior was suggested by Mardia et al. (1991) and further applications were given by Mardia and Hainsworth (1993).
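The zero bending energy of affine maps can be checked numerically. The sketch below assumes two dimensions, the radial function σ(h) = |h|² log |h|, and the usual construction in which the bending energy matrix is the upper-left k × k block of the inverse of the thin-plate spline system matrix; the energy is then computed up to a constant factor that depends on the convention used. The scale parameter kappa in the log prior and all function names are assumptions made for illustration.

```python
import numpy as np

def bending_energy_matrix(T):
    """Upper-left k x k block of the inverse of the 2D thin-plate spline system matrix."""
    k = T.shape[0]
    d = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        S = np.where(d > 0, d**2 * np.log(d), 0.0)    # sigma(h) = |h|^2 log|h|, sigma(0) = 0
    Q = np.hstack([np.ones((k, 1)), T])               # affine part: 1, x, y
    Gamma = np.block([[S, Q], [Q.T, np.zeros((3, 3))]])
    return np.linalg.inv(Gamma)[:k, :k]

def bending_energy(T, Y):
    """Bending energy (up to a constant) of the spline mapping landmarks T to Y."""
    Be = bending_energy_matrix(T)
    return float(np.trace(Y.T @ Be @ Y))

def log_prior(T, Y, kappa=1.0):
    """Unnormalised log Gibbs prior; kappa is an assumed scale parameter."""
    return -kappa * bending_energy(T, Y)

# Template landmarks on a 3 x 3 grid.
T = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)

# An affine transformation of the template.
A = np.array([[1.2, 0.3], [-0.1, 0.9]])
Y_affine = T @ A.T + np.array([2.0, -1.0])

# A non-affine change: move only the centre landmark.
Y_bent = T.copy()
Y_bent[4, 0] += 0.3

J_affine = bending_energy(T, Y_affine)
J_bent = bending_energy(T, Y_bent)
print(J_affine, J_bent, log_prior(T, Y_bent))
```

Because the prior is flat over affine changes, location, rotation, scale and shear need separate priors if they are to be constrained.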
The snake (Kass et al. 1988) in its simplest form is a spline with a penalty on the mean curvature. Snakes are used for fitting a curve to an image when there is no underlying template, with the aim that the resulting estimated curve is smooth. Let the outline be f(t) ∈ ℂ, t ∈ [0, 1]. The penalty in the snake can be written as:
where α(t) and β(t) denote the degree of stiffness and elasticity at t, respectively. For tj = j/k, j = 0, 1, …, k, denote f(tj) = uj + ivj. We find that the right-hand side of Equation (17.2) can be written approximately for large k as (Mardia 1996a):
Thus {uj} and {vj} also form a separable Gaussian MRF of order 2.
Inference about a scene in an image is made through the posterior distribution of S obtained from Equation (17.1). The full range of Bayesian statistical inference tools can be used and, as stated in Section 17.2, the maximum of the posterior density (the MAP estimate) is frequently used, as well as the posterior mean. There may be alternative occasions when particular template parameters are of great interest, in which case one would maximize the appropriate marginal posterior densities (Phillips and Smith 1994). One way to calculate the posterior mean is by a simulation method which does not depend on the complicated normalizing constant in π(θ|x). For example, an MCMC procedure using the Metropolis–Hastings algorithm (Metropolis et al. 1953; Hastings 1970) could be used. This procedure generates a Markov chain whose equilibrium distribution is the posterior distribution of θ|x.
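A random-walk Metropolis sampler, the simplest special case of Metropolis–Hastings with a symmetric proposal, can be sketched as follows. The target here is a one-dimensional stand-in for π(θ|x), chosen only for illustration; the point is that only the unnormalized log density is needed. The step size, chain length and burn-in are arbitrary assumptions.

```python
import math
import random

def log_target(theta):
    """Unnormalised log posterior; a N(3, 0.5^2) stand-in for pi(theta|x)."""
    return -0.5 * ((theta - 3.0) / 0.5) ** 2

def metropolis(log_target, theta0, n_iter, step_sd, seed=0):
    """Random-walk Metropolis chain for a scalar parameter."""
    rng = random.Random(seed)
    theta, lp = theta0, log_target(theta0)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step_sd)
        lp_prop = log_target(proposal)
        # Symmetric proposal: accept with probability min(1, target ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        chain.append(theta)
    return chain

chain = metropolis(log_target, theta0=0.0, n_iter=20000, step_sd=0.5)
kept = chain[2000:]                      # discard burn-in
posterior_mean = sum(kept) / len(kept)
print("posterior mean estimate:", posterior_mean)
```

For a template with parameters θ1, ..., θp the same scheme applies with vector-valued proposals or one-parameter-at-a-time updates.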
Images can be deformed using deformations such as those described in Chapter 12 and this process is called image warping. Warping or morphing of images is used in a wide set of applications, including in the cinema industry and medical image registration.
Definition 17.1 Consider an image f(t) defined on a region D1 ⊂ ℝ² and deformed to f(Φ(t)), where Φ maps each location t in the deformed region D2 to a location in D1. For example, a set of landmarks T1 could be located in the original image and then deformed to new positions T2. We call f(Φ(t)) the warped image.
A practical approach to warping is to use the inverse deformation from D2 to D1 to ‘look up’ the corresponding grey levels from the region D1. Consider a pixel location t ∈ D2. The deformation Φ(t) is computed from the deformed region D2 to the original plane D1. Then assign the new grey level at t as the grey level f(Φ(t)) [in practice the closest pixel to location Φ(t)].
The advantage of using the reverse deformation is that if the original image is defined on a regular lattice, then the warped image is still defined on a regular lattice. An alternative approach is to map from D1 to D2 resulting in an irregular non-square lattice for D2, and then linear interpolation is used to obtain the final image on a regular lattice (Mardia and Hainsworth 1993).
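The reverse look-up can be sketched in a few lines. The deformation used below is a simple translation chosen for illustration; in practice Φ would be, for example, a fitted thin-plate spline. The function name warp_reverse and the choice to fill locations that fall outside D1 with zero are assumptions.

```python
import numpy as np

def warp_reverse(image, phi, out_shape):
    """Warped image with grey level f(phi(t)) at each pixel t, using the closest pixel."""
    rows, cols = np.indices(out_shape)
    src_r, src_c = phi(rows.astype(float), cols.astype(float))
    src_r = np.rint(src_r).astype(int)
    src_c = np.rint(src_c).astype(int)
    # Pixels whose source location falls outside the original image stay at 0.
    inside = ((src_r >= 0) & (src_r < image.shape[0]) &
              (src_c >= 0) & (src_c < image.shape[1]))
    warped = np.zeros(out_shape, dtype=image.dtype)
    warped[inside] = image[src_r[inside], src_c[inside]]
    return warped

def shift(r, c):
    """Illustrative deformation from D2 back to D1: a translation by (1, 2)."""
    return r + 1.0, c + 2.0

image = np.arange(25).reshape(5, 5)
warped = warp_reverse(image, shift, image.shape)
print(warped)
```

Every pixel of the output lattice receives exactly one grey level, which is the practical advantage of working with the reverse deformation.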
Examples of warping include data fusion for medical imaging. For example, we may wish to combine information from an X-ray image (which has anatomical information) and a nuclear medicine image (which shows functional information). In Figure 17.2 we see the display tool of Mardia and Little (1994) which involves a deformation from an X-ray image to a nuclear medicine image. For some other statistical work see Johnson et al. (1995) and Hurn et al. (1996). Also, Bookstein (2001) considers the relation between the use of warping with landmarks and image-based warping.
We can use the warping approach to construct an average from several images of objects (Mardia 1993).
Definition 17.2 Consider a random sample of images f1, …, fn containing landmark configurations X1, …, Xn, from a population mean image f with a population mean configuration μ. We wish to estimate μ and f up to arbitrary Euclidean similarity transformations. The shape of μ can be estimated by the full Procrustes mean of the landmark configurations X1, …, Xn. Let Φ*i be the deformation obtained from the estimated mean shape to the ith configuration. The average image has the grey level at pixel location t given by:
f̄(t) = (1/n) Σ_{i=1}^{n} fi(Φ*i(t)). (17.3)
We consider f̄ to be an estimate of the population mean image f.
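The averaging step can be sketched as follows: each image is looked up through its own deformation from the mean configuration, and the resulting grey levels are averaged pixel by pixel. In this illustration the deformations are known translations rather than maps estimated from landmarks and a Procrustes mean, and the closest pixel is used for the look-up; both are simplifying assumptions, as are the function names.

```python
import numpy as np

def lookup(image, src_r, src_c):
    """Grey levels of image at the closest pixels to (src_r, src_c), clipped to the image."""
    r = np.clip(np.rint(src_r).astype(int), 0, image.shape[0] - 1)
    c = np.clip(np.rint(src_c).astype(int), 0, image.shape[1] - 1)
    return image[r, c]

def average_image(images, deformations):
    """Pixelwise mean of f_i(phi_i(t)) over the images."""
    shape = images[0].shape
    rows, cols = np.indices(shape).astype(float)
    total = np.zeros(shape)
    for f_i, phi_i in zip(images, deformations):
        total += lookup(f_i, *phi_i(rows, cols))
    return total / len(images)

# A bright square, observed three times at different (known) offsets.
base = np.zeros((16, 16))
base[6:10, 6:10] = 200.0
shifts = [(0, 0), (1, -2), (-2, 1)]
images = [np.roll(base, s, axis=(0, 1)) for s in shifts]

# Deformation from the mean frame to image i: translate by that image's offset.
deformations = [lambda r, c, s=s: (r + s[0], c + s[1]) for s in shifts]

mean_image = average_image(images, deformations)
print(np.allclose(mean_image, base))
```

Averaging the unregistered images directly would smear the square; with the deformations applied first, the three copies line up before the grey levels are averaged.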
Example 17.2 In Figure 17.3 we see five T1 (first thoracic) mouse vertebral images. Twelve landmarks were located on each of the vertebral outlines and the full Procrustes mean was computed. The average image is obtained using the above procedure in Equation (17.3) and is displayed in Figure 17.4. This approach leads to rather blurred averages in regions away from landmarks (here the central ‘hole’ in the centre of the bones which has no landmarks). □
Other examples of image averaging include average faces (Craw and Cameron 1992) and the average brain images obtained using thin-plate spline transformations (Bookstein 1991, Appendix A.1.3) and diffeomorphisms (Beg et al., 2005).
Galton (1878, 1883) obtained averaged pictures of faces using photographic techniques in the 19th century, see Figure 17.5, and the technique was called composite photography. Galton was interested in the typical average face of groups of people, such as criminals and tuberculosis patients. He believed that faces contained information that could be used in many applications, for example the susceptibility of people to particular diseases. Another early example is the ‘average’ photograph of 12 American mathematicians obtained by Raphael Pumpelly taken in 1884 (Stigler 1984). Shapes of landmark configurations from photographs of the faces of individuals are used for forensic identification (e.g. see Mardia et al. 1996b; Evison and Vorder Bruegge 2010). A state of the art method of face recognition is DeepFace (Taigman et al., 2014).
The same technique of warping can be used to merge two images f1 and f2 by forming a weighted configuration X̄ = wX1 + (1 − w)X2, where 0 ≤ w ≤ 1. The merged image is then (Mardia and Hainsworth 1993):
f̄(t) = w f1(Φ*1(t)) + (1 − w) f2(Φ*2(t)),
where Φ*i is the deformation from X̄ to Xi.
Example 17.3 In an illustrative example we see in Figure 17.6 a merged image composed from photographs of Einstein at ages 49 and 71. A set of 11 landmarks are chosen on each face – the eyes, mouth corners, tip of nose, bottom of ears, bottom of chin, forehead outline (three landmarks) – together with four landmarks in the extreme corners of each image and four landmarks at midpoints of the sides of each image. A thin-plate spline transformation of each image is calculated to a linear interpolation of the two sets of landmarks, and the grey levels are averaged (Mardia and Hainsworth 1993). The merged photograph has characteristics from both the younger and older images. An additional example merging images of Newton and Einstein is given in Figure 17.7. □
Some other popular examples of image merging and warping include the merging of face images and age changes to the face, such as produced by Benson and Perrett (1993) and Burt and Perrett (1995), using a different technique based on PCA. Other applications include predicting a face image after a number of years from a photograph of a missing child. See Section 17.4.5 for further examples.
Glasbey and Mardia (2001) have proposed a novel approach to image warping using a penalized likelihood with the von Mises–Fourier similarity measure and the null distortion criterion. Various applications are given including registering images with a map, discrimination without landmarks and fusing images. Mardia et al. (2006c) have used this approach for image averaging and discrimination using a combination of landmark and normalized image information.
Although much of the groundwork for Bayesian image analysis has been in place since the early 1990s, there have been many technical developments in recent years. In particular, statistical properties have been investigated in detail for deformable template estimation, including the study of consistency and stochastic approximation algorithms (Allassonnière et al. 2007, 2010a,b) and the consistency of Fréchet means in deformable template models (Bigot and Charlier 2011). Specifically, Allassonnière et al. (2007) make use of the Bernstein–von Mises theorem (van der Vaart 1998, p. 141) in order to demonstrate Bayesian consistency, and Cheng et al. (2016) also use this result to show consistency for function and curve estimation.
Mardia et al. (2006b) fit a rigorous stochastic model for a deformation between landmarks and assess the error of the fitted deformation, whereas Mardia et al. (2006a) assess the effects of deformations through a Kullback–Leibler divergence measure. In particular, Mardia et al. (2006b) have also provided principal warps based on intrinsic random fields.
Over the last few decades statistical algorithms have emerged for image analysis which are widely applicable and reliable. There is also an advantage in using explicit stochastic models so that we have a better understanding of the working behind the algorithms and we can make confident statements about conclusions. Another area of application of Bayesian image analysis is where the aim is not only object recognition but knowledge representation of objects, such as the creation of anatomical atlases of the brain. In such cases deformable templates and associated probability distributions can be used to describe normal subjects and patients (e.g. Grenander 1994; Beg et al. 2005; Villalon et al. 2011). Brain mapping methods are widely used for higher dimensional manifolds, that is landmarks (dimension 0) to sulci (lines of dimension 1), cortex and related surfaces (dimension 2), volumes and subvolumes (dimension 3) (e.g. see Grenander and Miller 1994; Joshi et al. 1995; Ashburner and Friston 2000; Durrleman et al. 2007). Another important area of image analysis is the use of scale space techniques (e.g. Witkin 1983) and multiscale methods for registration and template modelling (e.g. Fritsch et al. 1994; Ying-Lie et al. 1994; Wilson 1995; Wilson and Johnson 1995; Mardia et al. 1997b). For further discussion of the very broad area of shape and image analysis see, for example, Ying-Lie et al. (1994); Krim and Yezzi (2006); Grenander and Miller (2007); Davies et al. (2008b); Younes (2010); Neale and Russ (2012); Sotiras et al. (2013); and Turaga and Srivastava (2015).