10

Probabilistic approaches to geometric statistics

Stochastic processes, transition distributions, and fiber bundle geometry

Stefan Sommer    University of Copenhagen, Department of Computer Science, Copenhagen, Denmark

Abstract

We discuss the construction of parametric families of intrinsically defined and geometrically natural probability distributions on manifolds, in particular, how to generalize the Euclidean normal distribution. This opens the way for probabilistic formulations of concepts such as the mean value, covariance, and principal component analysis, and for general likelihood-based inference. The general idea is to use transition distributions of stochastic processes on manifolds to construct probabilistic models. For manifolds with a connection, Gaussian-like distributions with nontrivial covariance structure can be defined via semielliptic diffusion processes in the frame bundle. On Lie groups, diffusion processes can be similarly constructed using left or right trivialization to the Lie algebra. In both cases, estimation of parameters of the underlying geometry or stochastic structure of the flows can be achieved using most probable paths to the data, by matching moments of the generated distributions to sample moments of the data, or by Monte Carlo sampling of stochastic bridges to approximate transition distributions. We discuss the relation between geometry and noise structure and provide examples of how geometric statistics can be performed using stochastic flows.

Keywords

Probability distribution on a manifold; semielliptic diffusion process; Monte Carlo sampling; linear latent variable model; Euclidean principal component analysis; stochastic differential equation; Brownian motion on Riemannian manifold

10.1 Introduction

When generalizing Euclidean statistical concepts to manifolds, it is common to focus on particular properties of the Euclidean constructions and select those as the defining properties of the corresponding manifold generalization. This approach appears in many instances in geometric statistics, statistics of manifold-valued data. For example, the Fréchet mean [9] is the minimizer of the expected square distance to the data. It generalizes its Euclidean counterpart by using this least-squares criterion. Similarly, the principal component analysis (PCA) constructions discussed in Chapter 2 use the notion of linear subspaces from Euclidean space, generalizations of those to manifolds, and least-squares fit to data. Although one construction can often be defined via several equivalent characterizations in the Euclidean situation, curvature generally breaks such equivalences. For example, the mean value and PCA can in the Euclidean situation be formulated as maximum likelihood fits of normal distributions to the data resulting in the same constructions as the least-squares definitions. On curved manifolds the least-squares and maximum likelihood definitions give different results. Fitting probability distributions to data implies a shift of focus from the Riemannian distance as used in least-squares to an underlying probability model. We pursue such probabilistic approaches in this chapter.

The probabilistic viewpoint uses the concepts of likelihood functions and parametric families of probability distributions. Generally, we search for a family of distributions $\mu(\theta)$ depending on the parameter $\theta$ with corresponding density function $p(\cdot;\theta)$, from which we get a likelihood $L(\theta;y) = p(y;\theta)$. With independent observations $y_1,\dots,y_N$, we can then estimate the parameter by setting

$$\hat\theta_{\mathrm{ML}} = \operatorname*{arg\,max}_{\theta} \prod_{i=1}^N L(\theta; y_i), \qquad (10.1)$$

giving a sample maximum likelihood (ML) estimate of $\theta$ or, when a prior distribution $p(\theta)$ for the parameters is available, the maximum a posteriori (MAP) estimate

$$\hat\theta_{\mathrm{MAP}} = \operatorname*{arg\,max}_{\theta} \prod_{i=1}^N L(\theta; y_i)\, p(\theta). \qquad (10.2)$$

We can, for example, let the parameter $\theta$ denote a point $m$ in $M$ and let $\mu(\theta)$ denote the normal distribution centered at $m$, in which case $\hat\theta_{\mathrm{ML}}$ is a maximum likelihood mean. This viewpoint transfers the focus of manifold statistics from least-squares optimization to constructions of natural families of probability distributions $\mu(\theta)$. A similar case arises when progressing beyond the mean to modeling covariance, data anisotropy, and principal components. The view here shifts from geodesic sprays and projections onto subspaces to the notion of covariance of a random variable. In a sense, we hide the complexity of the geometry in the construction of $\mu(\theta)$, which in turn implies that constructing such distributions is not always trivial.
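As a purely Euclidean illustration of (10.1) and (10.2), the following sketch (with hypothetical one-dimensional data and a hypothetical Gaussian prior) evaluates the product of likelihoods in the log domain over a grid of candidate parameters:

```python
import numpy as np
from scipy.stats import norm

# hypothetical 1D observations and a hypothetical N(0,1) prior on theta
y = np.array([0.9, 1.3, 0.7, 1.1, 1.6])
prior = norm(loc=0.0, scale=1.0)

thetas = np.linspace(-2.0, 3.0, 1001)                 # candidate parameter values
loglik = np.array([norm(t, 1.0).logpdf(y).sum() for t in thetas])

theta_ml = thetas[np.argmax(loglik)]                           # (10.1), products as log-sums
theta_map = thetas[np.argmax(loglik + prior.logpdf(thetas))]   # (10.2)
print(theta_ml, theta_map)
```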

Throughout the chapter, we will take inspiration from and refer to the standard Euclidean linear latent variable model

$$y = m + Wx + \epsilon \qquad (10.3)$$

on $\mathbb{R}^d$ with normally distributed latent variable $x \sim \mathcal{N}(0, \mathrm{Id}_r)$, $r \le d$, and isotropic noise $\epsilon \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_d)$. The marginal distribution of $y$ is again normal, $y \sim \mathcal{N}(m, \Sigma)$, with mean $m$ and covariance $\Sigma = WW^T + \sigma^2 \mathrm{Id}_d$. This simple model exemplifies many of the challenges when working with parametric probability distributions on manifolds: 1) Its definition relies on normal distributions with isotropic covariance for the distribution of $x$ and $\epsilon$. We describe two possible generalizations of these to manifolds, the Riemannian normal law and the transition density of the Riemannian Brownian motion. 2) The model is additive, but on manifolds addition is only defined for tangent vectors. We handle this fact by defining probability models infinitesimally using stochastic differential equations. 3) The marginal distribution of $y$ requires a way to translate the directions encoded in the matrix $W$ to directions on the manifold. This can be done in the tangent space at $m$, by using fiber bundles to move $W$ with parallel transport, by using Lie group structure, or by referring to coordinate systems that in some cases have special meaning for the particular data at hand.
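A minimal sketch of the model (10.3), with arbitrarily chosen $m$, $W$, and $\sigma$, can be used to check numerically that the sample covariance of simulated responses approaches $WW^T + \sigma^2\mathrm{Id}_d$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, sigma = 3, 2, 0.1
m = np.zeros(d)
W = rng.standard_normal((d, r))            # arbitrary rank-r factor

# simulate y = m + W x + eps with x ~ N(0, Id_r), eps ~ N(0, sigma^2 Id_d)
N = 200_000
x = rng.standard_normal((N, r))
eps = sigma * rng.standard_normal((N, d))
y = m + x @ W.T + eps

Sigma_empirical = np.cov(y, rowvar=False)
Sigma_model = W @ W.T + sigma**2 * np.eye(d)
print(np.max(np.abs(Sigma_empirical - Sigma_model)))   # small for large N
```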

The effect of including all these points is illustrated in Fig. 10.1. The linear Euclidean view of the data produced by tangent space principal component analysis (PCA) is compared to the linear Euclidean view provided by the infinitesimal probabilistic PCA model [34], which parallel transports the covariance along the manifold. Because the infinitesimal model does not linearize to a single tangent space, and because of the built-in notion of data anisotropy, the infinitesimal covariance, the provided Euclidean view gives a better representation of the data variability.

Figure 10.1 (Left) Samples (black dots) along a great circle of $S^2$ with added minor variation orthogonal to the circle. The sphere is colored by the density of the distribution, the transition density of the underlying stochastic process. (Right) Red crosses (gray in print version): Data mapped to the tangent space of the north pole using the standard tangent principal component analysis (PCA) linearization. Variation orthogonal to the great circle is overestimated because the curvature of the sphere lets geodesics (red (gray in print version) curve and line) leave high-density areas of the distribution. Black dots: Corresponding linearization of the data using the infinitesimal probabilistic PCA model [34]. The black curve represents the expectation over samples of the latent process conditioned on the same observation as the red (gray in print version) curve. The corresponding path shown in the left figure is clearly attracted to the high-density area of the distribution, contrary to the geodesic. The orthogonal variation is not overestimated, and the Euclidean view provides a better representation of the data variability.

We start in section 10.2 by discussing two ways to pursue the construction of $\mu(\theta)$: via density functions and from transition distributions of stochastic processes. We exemplify the former with the probabilistic principal geodesic analysis (PPGA) generalization of manifold PCA, and the latter with maximum likelihood means and an infinitesimal version of probabilistic PCA. In section 10.3, we discuss the most important stochastic process on manifolds, the Brownian motion, and its transition distribution, both in the Riemannian manifold case and when Lie group structure is present. In section 10.4, we describe aspects of fiber bundle geometry necessary for the construction of stochastic processes with infinitesimal covariance as pursued in section 10.5. The fiber bundle construction can be seen as a way to handle the lack of a global coordinate system. While it touches on concepts beyond the standard set of Riemannian geometric notions discussed in chapter 1, it provides intrinsic geometric constructions that are very useful from a statistical viewpoint. We use this in section 10.6 to define statistical concepts as maximum likelihood parameter fits to data and in section 10.7 to perform parameter estimation. In section 10.8, we discuss advanced concepts arising from fiber bundle geometry, including interpretation of the curvature tensor, sub-Riemannian frame-bundle geometry, and examples of flows using additional geometric structure present in specific models of shape.

With the chapter we aim to provide an overview of aspects of probabilistic statistics on manifolds in an accessible way. This implies that mathematical details on the underlying geometry and stochastic analysis are partly omitted. We provide references to the papers where the presented material was introduced in each section, and we include references for further reading at the end of the chapter. The code for the presented models and the parameter estimation algorithms discussed in this chapter is available in the Theano Geometry library https://bitbucket.com/stefansommer/theanogeometry; see also [16,15].

10.2 Parametric probability distributions on manifolds

We here discuss two ways of defining families of probability distributions on a manifold: directly from a density function, or as the transition distribution of a stochastic process. We exemplify their use with the probabilistic PGA generalization of Euclidean PCA and an infinitesimal counterpart based on an underlying stochastic process.

10.2.1 Probabilistic PCA

Euclidean principal component analysis (PCA) is traditionally defined as a fit of best approximating linear subspaces of a given dimension to data, either by maximizing variance

$$\hat W = \operatorname*{arg\,max}_{W \in O(\mathbb{R}^r,\mathbb{R}^d)} \sum_{i=1}^N \|WW^T y_i\|^2 \qquad (10.4)$$

of the centered data $y_1,\dots,y_N$ projected to $r$-dimensional subspaces of $\mathbb{R}^d$, represented here by orthonormal matrices $W \in O(\mathbb{R}^r,\mathbb{R}^d)$ of rank $r$, or by minimizing the residual errors

$$\hat W = \operatorname*{arg\,min}_{W \in O(\mathbb{R}^r,\mathbb{R}^d)} \sum_{i=1}^N \|y_i - WW^T y_i\|^2 \qquad (10.5)$$

between the observations and their projections to the subspace. We see that fundamental to this construction are the notion of a linear subspace, projections to linear subspaces, and squared distances. The dimension $r$ of the fitted subspace determines the number of principal components.

PCA can, however, also be defined from a probabilistic viewpoint [37,29]. The approach is here to fit the latent variable model (10.3) with $W$ of fixed rank $r$. The conditional distribution of the data given the latent variable $x \in \mathbb{R}^r$ is normal,

$$y\,|\,x \sim \mathcal{N}(m + Wx, \sigma^2 I). \qquad (10.6)$$

With $x$ normally distributed $\mathcal{N}(0,\mathrm{Id}_r)$ and noise $\epsilon \sim \mathcal{N}(0,\sigma^2\mathrm{Id}_d)$, the marginal distribution of $y$ is $y \sim \mathcal{N}(m,\Sigma)$ with $\Sigma = WW^T + \sigma^2\mathrm{Id}_d$.

The Euclidean principal components of the data are here interpreted as the conditional distribution $x\,|\,y_i$ of $x$ given the data $y_i$. From the data conditional distribution, a single quantity representing $y_i$ can be obtained by taking the expectation $x_i := E[x\,|\,y_i] = (W^TW + \sigma^2 I)^{-1}W^T(y_i - m)$. The parameters of the model $m$, $W$, $\sigma$ can be found by maximizing the likelihood

$$L(W,\sigma,m;y) = |2\pi\Sigma|^{-\frac12} e^{-\frac12 (y-m)^T\Sigma^{-1}(y-m)}. \qquad (10.7)$$

Up to rotation, the ML fit of $W$ is given by $\hat W_{\mathrm{ML}} = \hat U_r(\hat\Lambda - \sigma^2\mathrm{Id}_r)^{1/2}$, where $\hat\Lambda = \operatorname{diag}(\hat\lambda_1,\dots,\hat\lambda_r)$, $\hat U_r$ contains the first $r$ principal eigenvectors of the sample covariance matrix of $y_i$ in its columns, and $\hat\lambda_1,\dots,\hat\lambda_r$ are the corresponding eigenvalues.
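The following sketch illustrates the closed-form ML fit on synthetic data, with $\sigma^2$ treated as known for brevity (in the full Tipping–Bishop solution $\sigma^2$ is itself estimated from the discarded eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, sigma2 = 5, 2, 0.05
W_true = rng.standard_normal((d, r))
y = rng.standard_normal((10_000, r)) @ W_true.T \
    + np.sqrt(sigma2) * rng.standard_normal((10_000, d))

m_hat = y.mean(axis=0)
S = np.cov(y - m_hat, rowvar=False)              # sample covariance
evals, evecs = np.linalg.eigh(S)                 # ascending order
U_r = evecs[:, -r:][:, ::-1]                     # top-r principal eigenvectors
lam = evals[-r:][::-1]                           # corresponding eigenvalues

W_ml = U_r @ np.diag(np.sqrt(lam - sigma2))      # ML fit, up to rotation
# latent representation E[x | y_i] = (W^T W + sigma^2 I)^{-1} W^T (y_i - m)
M = W_ml.T @ W_ml + sigma2 * np.eye(r)
x_hat = np.linalg.solve(M, W_ml.T @ (y - m_hat).T).T
```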

10.2.2 Riemannian normal distribution and probabilistic PGA

We saw in chapter 2 the Normal law or Riemannian normal distribution defined via its density

$$p(y;m,\sigma^2) = C(m,\sigma^2)^{-1} e^{-\frac{\operatorname{dist}(m,y)^2}{2\sigma^2}} \qquad (10.8)$$

with normalization constant $C(m,\sigma^2)$ and the parameter $\sigma^2$ controlling the dispersion of the distribution. The density is given with respect to the volume measure $dV_g$ on $M$, so that the actual distribution is $p(\cdot;m,\sigma^2)\,dV_g$. Because of the use of the Riemannian distance function, the distribution is at first sight related to a normal distribution $\mathcal{N}(0,\sigma^2\mathrm{Id}_d)$ in $T_mM$; however, its definition with respect to the measure $dV_g$ implies that it differs from the density of the normal distribution at each point of $T_mM$ by the square root determinant of the metric $|g|^{\frac12}$. The isotropic precision/concentration matrix $\sigma^{-2}\mathrm{Id}_d$ can be exchanged with a more general concentration matrix in $T_mM$. The distribution maximizes the entropy for fixed parameters $(m,\Sigma)$ [26].

This distribution is used in [39] to generalize Euclidean PPCA. Here the distribution of the latent variable $x$ is normal in $T_mM$, $x$ is mapped to $M$ using $\operatorname{Exp}_m$, and the conditional distribution $y\,|\,x$ of the observed data $y$ given $x$ is Riemannian normal $p(y;\operatorname{Exp}_m x,\sigma^2)\,dV_g$. The matrix $W$ models the square root covariance $\Sigma = WW^T$ of the latent variable $x$ in $T_mM$. The model is called probabilistic principal geodesic analysis (PPGA).

10.2.3 Transition distributions and stochastic differential equations

Instead of mapping latent variables from $T_mM$ to $M$ using the exponential map, we can take an infinitesimal approach and only map infinitesimal displacements to the manifold, thereby avoiding the use of $\operatorname{Exp}_m$ and the implicit linearization coming from the use of a single tangent space. The idea is to create probability distributions as solutions to stochastic differential equations, SDEs. In Euclidean space, SDEs are usually written in the form

$$dy(t) = b(t,y(t))\,dt + a(t,y(t))\,dx(t), \qquad (10.9)$$

where $a : \mathbb{R}\times\mathbb{R}^d \to \mathbb{R}^{d\times d}$ is the diffusion field modeling the local diffusion of the process, and $b : \mathbb{R}\times\mathbb{R}^d \to \mathbb{R}^d$ models the deterministic drift. The process $x(t)$, whose infinitesimal increments $dx(t)$ are multiplied on $a$, is a semimartingale. For our purposes, we can assume that it is a standard Brownian motion, often written $W(t)$ or $B(t)$. Solutions to (10.9) are defined by an integral equation that, discretized in time, takes the form

$$y(t_i) = y(0) + \sum_{j=1}^{i-1} b(t_j,y(t_j))(t_{j+1}-t_j) + a(t_j,y(t_j))\big(x(t_{j+1})-x(t_j)\big). \qquad (10.10)$$

This is called an Itô equation. Alternatively, we can use the Fisk–Stratonovich solution

$$y(t_i) = y(0) + \sum_{j=1}^{i-1} b(t_j,y(t_j))(t_{j+1}-t_j) + a\big(t_{j+\frac12},y(t_{j+\frac12})\big)\big(x(t_{j+1})-x(t_j)\big), \qquad (10.11)$$

where $t_{j+\frac12} = (t_{j+1}+t_j)/2$, that is, the integrand is evaluated at the midpoint. Notationally, Fisk–Stratonovich SDEs, often just called Stratonovich SDEs, are distinguished from Itô SDEs by adding $\circ$ in the diffusion term $a(t,y(t))\circ dx(t)$ in (10.9). The main purpose here of using Stratonovich SDEs is that their solutions obey the ordinary chain rule of differentiation and therefore map naturally between manifolds.
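As a sketch of the two discretizations, the following code implements the Itô sum (10.10) as an Euler–Maruyama step and approximates the Stratonovich midpoint rule (10.11) with a standard Euler–Heun predictor–corrector step; the scalar drift and diffusion fields are arbitrary examples:

```python
import numpy as np

def a(y): return 0.5 * y          # diffusion field (arbitrary example)
def b(y): return -1.0 * y         # drift (arbitrary example)

def ito_euler(y0, T=1.0, n=1000, rng=None):
    """Euler-Maruyama discretization of the Ito sum (10.10)."""
    rng = rng or np.random.default_rng()
    dt = T / n
    y = y0
    for _ in range(n):
        dB = np.sqrt(dt) * rng.standard_normal()
        y = y + b(y) * dt + a(y) * dB
    return y

def stratonovich_heun(y0, T=1.0, n=1000, rng=None):
    """Euler-Heun scheme: averages the diffusion field over the step,
    a standard way to approximate the Stratonovich (midpoint) integral (10.11)."""
    rng = rng or np.random.default_rng()
    dt = T / n
    y = y0
    for _ in range(n):
        dB = np.sqrt(dt) * rng.standard_normal()
        y_pred = y + a(y) * dB                      # predictor
        y = y + b(y) * dt + 0.5 * (a(y) + a(y_pred)) * dB
    return y
```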

A solution $y(t)$ to an SDE is a $t$-indexed family of probability distributions. If we fix a time $T>0$, then the transition distribution $y(T)$ denotes the distribution of endpoints of sample paths $y(\omega)(t)$, where $\omega$ is a particular random event. We can thus generate distributions in this way and set $\mu(\theta) = y(T)$, where the parameters $\theta$ now control the dynamics of the process via the SDE, particularly the drift $b$, the diffusion field $a$, and the starting point $y_0$ of the process.

The use of SDEs fits the differential structure of manifolds well because SDEs are defined infinitesimally. However, because we generally do not have global coordinate systems in which to write down an SDE as in (10.9), defining SDEs on manifolds takes some work. We will see several examples of this in the sections below.

Particularly, we will define an SDE that reformulates (10.3) as a time-sequence of random steps, in which the latent variable $x$ is replaced by a latent process $x(t)$ and the covariance $W$ is parallel transported over $M$. This process will again have parameters $(m,W,\sigma)$. We define the distribution $\mu(m,W,\sigma)$ by setting $\mu(m,W,\sigma) = y(T)$, and we then assume that the observed data $y_1,\dots,y_N$ have marginal distribution $y_i \sim \mu(m,W,\sigma)$. Note that $y(T)$ is a distribution, whereas $y_i$, $i=1,\dots,N$, denote the data.

Let $p(y_i;m,W,\sigma)$ denote the density of the distribution $\mu(m,W,\sigma)$ with respect to a fixed measure. As in the PPCA situation, we then have a likelihood for the model

$$L(m,W,\sigma;y_i) = p(y_i;m,W,\sigma), \qquad (10.12)$$

and we can optimize for the ML estimate $\hat\theta = (\hat m,\hat W,\hat\sigma)$. Again, similarly to the PPCA construction, we get the generalization of the principal components by conditioning the latent process on the data: $x_{i,t} := x(t)\,|\,y(T)=y_i$. The picture here is that among all sample paths $y(\omega)(t)$, we single out those hitting $y_i$ at time $T$ and consider the corresponding realizations of the latent process $x(\omega)(t)$ a representation of the data.

Fig. 10.1 displays the result of pursuing this construction compared to tangent space PCA. Because the anisotropic covariance is now transported with the process instead of being tied to a single tangent space, the curvature of the sphere is in a sense incorporated into the model, and the linear view of the data $x_{i,t}$, particularly the endpoints $x_i := x_{i,T}$, provides an improved picture of the data variation on the manifold.

Below, we will make the construction of the underlying stochastic process precise and present other examples of geometrically natural processes that allow generating natural families of probability distributions $\mu(\theta)$.

10.3 The Brownian motion

In Euclidean space the normal distribution $\mathcal{N}(0,\mathrm{Id}_d)$ is often defined in terms of its density function. This view leads naturally to the Riemannian normal distribution or normal law (10.8). A different characterization [10] is as the transition distribution of an isotropic diffusion process governed by the heat equation. Here the density is the solution to the partial differential equation

$$\partial_t p(t,y) = \tfrac12 \Delta p(t,y), \quad y \in \mathbb{R}^k, \qquad (10.13)$$

where $p : \mathbb{R}\times\mathbb{R}^k \to \mathbb{R}$ is a real-valued function, and $\Delta$ is the Laplace differential operator $\Delta = \partial^2_{y^1} + \cdots + \partial^2_{y^k}$. If (10.13) is started at time $t=0$ with $p(0,y) = \delta_m(y)$, the delta function concentrated at $y=m$, then the time $t=1$ solution is the density of the normal distribution $\mathcal{N}(m,\mathrm{Id}_k)$. We can think of a point-sourced heat distribution starting at $m$ and diffusing through the domain from time $t=0$ to $t=1$.

The heat flow can be characterized probabilistically from a stochastic process, the Brownian motion $B(t)$. When started at $m$ at time $t=0$, a solution $p$ to the heat flow equation (10.13) describes the density of the random variable $B(t)$ for each $t$. Therefore, we again regain the density of the normal distribution $\mathcal{N}(m,\mathrm{Id}_k)$ as the density of $B(1)$. The heat flow and the Brownian motion view of the normal distribution generalize naturally to the manifold situation. Because the Laplacian is a differential operator and because the Brownian motion is constructed from random infinitesimal increments, the construction is an infinitesimal construction as discussed in section 10.2.3.

Whereas in this section we focus on aspects of the Brownian motion, we will later see that solutions $y(t)$ to the SDE $dy(t) = W\,dB(t)$ with more general matrices $W$ in addition allow modeling covariance in the normal distribution, even in the manifold situation, using the fact that in the Euclidean situation $y(1) \sim \mathcal{N}(m,\Sigma)$ when $\Sigma = WW^T$.

10.3.1 Brownian motion on Riemannian manifolds

A Riemannian metric $g$ defines the Laplace–Beltrami operator $\Delta_g$ that generalizes the usual Euclidean Laplace operator used in (10.13). The operator is defined on real-valued functions by $\Delta_g f = \operatorname{div}_g \operatorname{grad}_g f$. When $e_1,\dots,e_d$ is an orthonormal basis for $T_yM$, it has the expression $\Delta_g f(y) = \sum_{i=1}^d \nabla^2_y f(e_i,e_i)$ when evaluated at $y$, similarly to the Euclidean Laplacian. The expression $\nabla^2_y f(e_i,e_i)$ denotes the Hessian $\nabla^2_y f$ evaluated at the pair of vectors $(e_i,e_i)$. The heat equation on $M$ is the partial differential equation defined from the Laplace–Beltrami operator by

$$\partial_t p(t,y) = \tfrac12 \Delta_g p(t,y), \quad y \in M. \qquad (10.14)$$

With the initial condition $p(0,\cdot)$ at $t=0$ being the delta function $\delta_m(y)$, the solution is called the heat kernel and written $p(t,m,y)$ when evaluated at $y \in M$. The heat equation again models point-sourced heat flows starting at $m$ and diffusing through the medium, with the Laplace–Beltrami operator now ensuring that the flow is adapted to the nonlinear geometry. The heat kernel is symmetric in that $p(t,m,y) = p(t,y,m)$ and satisfies the semigroup property

$$p(t+s,m,y) = \int_M p(t,m,z)\,p(s,z,y)\,dV_g(z).$$

Similarly to the Euclidean situation, we can recover the heat kernel from a diffusion process on $M$, the Brownian motion. The Brownian motion on Riemannian manifolds and on Lie groups with a Riemannian metric can be constructed in several ways: using charts, by embedding in a Euclidean space, or, as we pursue in this section, using left/right invariance. A particularly important construction here is the Eells–Elworthy–Malliavin construction of Brownian motion that uses a fiber bundle of the manifold to define an SDE for the Brownian motion. We will use this construction in section 10.4 and through the rest of the chapter.

The heat kernel $p(t,m,y)$ is related to a Brownian motion $x(t)$ on $M$ as its transition density, that is,

$$P_m\big(x(t) \in C\big) = \int_C p(t,m,y)\,dV_g(y)$$

for measurable subsets $C \subseteq M$. If $M$ is assumed compact, it can be shown that it is stochastically complete, which implies that the Brownian motion exists for all time and that $\int_M p(t,m,y)\,dV_g(y) = 1$ for all $t>0$. If $M$ is not compact, the long-time existence can be ensured by, for example, bounding the Ricci curvature of $M$ from below; see, for example, [7]. In coordinates, a solution $y(t)$ to the Itô SDE

$$dy(t)^i = b^i(y(t))\,dt + \big(\sqrt{g(y(t))^{-1}}\big)^i_j\, dB(t)^j \qquad (10.15)$$

is a Brownian motion on $M$ [13]. Here $B(t)$ is a Euclidean $\mathbb{R}^d$-valued Brownian motion, the diffusion field $\sqrt{g(y(t))^{-1}}$ is a square root of the cometric tensor $g(y(t))^{ij}$, and the drift $b^i(y(t))$ is the contraction $-\frac12 g(y(t))^{kl}\Gamma(y(t))^i_{kl}$ of the cometric and the Christoffel symbols $\Gamma^i_{kl}$. Fig. 10.2 shows sample paths from a Brownian motion on the sphere $S^2$.

Figure 10.2 Sample paths $x(\omega)(t)$ of a standard Brownian motion on the sphere $S^2$.
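As an illustration of (10.15), the sketch below simulates a Brownian motion on $S^2$ in polar coordinates $(\theta,\phi)$, where the round metric is $g = \operatorname{diag}(1,\sin^2\theta)$; working out the Christoffel symbols gives the drift $b = (\tfrac12\cot\theta,\,0)$ and the diffusion field $\operatorname{diag}(1, 1/\sin\theta)$. Step size and starting point are arbitrary, and the scheme ignores the coordinate singularities at the poles:

```python
import numpy as np

def brownian_sphere(theta0=np.pi / 2, phi0=0.0, T=1.0, n=5000, seed=0):
    """Euler-Maruyama for (10.15) on S^2 in polar coordinates (theta, phi).
    Cometric g^{-1} = diag(1, 1/sin^2 theta); drift b = (cot(theta)/2, 0)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    path = np.empty((n + 1, 2))
    path[0] = theta0, phi0
    for k in range(n):
        theta, phi = path[k]
        dB = np.sqrt(dt) * rng.standard_normal(2)
        dtheta = 0.5 / np.tan(theta) * dt + dB[0]   # drift (1/2)cot(theta) plus noise
        dphi = dB[1] / np.sin(theta)                # diffusion 1/sin(theta)
        path[k + 1] = theta + dtheta, phi + dphi
    return path

path = brownian_sphere()
# embed in R^3 to inspect the sample path
xyz = np.c_[np.sin(path[:, 0]) * np.cos(path[:, 1]),
            np.sin(path[:, 0]) * np.sin(path[:, 1]),
            np.cos(path[:, 0])]
```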

10.3.2 Lie groups

With a left-invariant metric on a Lie group $G$ (see chapter 1), the Laplace–Beltrami operator takes the form $\Delta f(x) = \Delta(f \circ L_y)(y^{-1}x)$ for all $x,y \in G$. By left-translating to the identity, the operator thus needs only be computed at $x=e$, that is, on the Lie algebra $\mathfrak{g}$. Like the Laplace–Beltrami operator, the heat kernel is left-invariant [21] when the metric is left-invariant. Similar invariance holds in the right-invariant case.

Let $e_1,\dots,e_d$ be an orthonormal basis for $\mathfrak{g}$, so that $X_i(y) = (L_y)_*(e_i)$ is an orthonormal set of vector fields on $G$. Let $C^i_{jk}$ denote the structure coefficients given by

$$[X_j, X_k] = C^i_{jk} X_i, \qquad (10.16)$$

and let $B(t)$ be a standard Brownian motion on $\mathbb{R}^d$. Then the solution $y(t)$ of the Stratonovich differential equation

$$dy(t) = -\frac12 \sum_{j,i} C^j_{ij}\, X_i(y(t))\,dt + X_i(y(t)) \circ dB(t)^i \qquad (10.17)$$

is a Brownian motion on $G$. Fig. 10.3 visualizes a sample path of $B(t)$ and the corresponding sample of $y(t)$ on the group $\mathrm{SO}(3)$. When the metric on $\mathfrak{g}$ is in addition Ad-invariant, the drift term vanishes, leaving only the multiplication of the Brownian motion increments on the basis.

Figure 10.3 (Left) Sample path $B(\omega)(t)$ of a standard Brownian motion on $\mathbb{R}^3$. (Right) The corresponding sample path $y(\omega)(t)$ of the $\mathrm{SO}(3)$-valued process (10.17) visualized by the action of the rotation $y.(e_1,e_2,e_3)$, $y \in \mathrm{SO}(3)$, on three basis vectors $e_1, e_2, e_3$ (red/green/blue) (gray/light gray/dark gray in print version) for $\mathbb{R}^3$.
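A sketch of (10.17) for $G = \mathrm{SO}(3)$ with a bi-invariant metric, where the drift term vanishes; instead of an Euler–Heun Stratonovich step, each Brownian increment is mapped through the matrix exponential, a common Lie group integrator that keeps the iterates exactly on the group. The basis normalization, step size, and seed are arbitrary choices:

```python
import numpy as np
from scipy.linalg import expm

# basis of so(3) (skew-symmetric matrices), orthonormal up to scaling conventions
E = np.array([[[0, 0, 0], [0, 0, -1], [0, 1, 0]],
              [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
              [[0, -1, 0], [1, 0, 0], [0, 0, 0]]], dtype=float)

def brownian_SO3(T=1.0, n=2000, seed=0):
    """Simulate y(t) in SO(3) driven by a Brownian motion B(t) in R^3."""
    rng = np.random.default_rng(seed)
    dt = T / n
    y = np.eye(3)
    path = [y]
    for _ in range(n):
        dB = np.sqrt(dt) * rng.standard_normal(3)
        # left-translate the Lie algebra increment and exponentiate
        y = y @ expm(np.einsum('i,ijk->jk', dB, E))
        path.append(y)
    return np.array(path)

path = brownian_SO3()
print(np.allclose(path[-1] @ path[-1].T, np.eye(3)))  # stays in SO(3)
```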

The left-invariant fields $X_i(y)$ here provide a basis for the tangent space at $y$ that in (10.17) is used to map increments of the Euclidean Brownian motion $B(t)$ to $T_yG$. The fact that the $X_i$ are defined globally allows this construction to specify the evolution of the process at all points of $G$ without referring to charts as in (10.15). We will later on explore a different approach to obtain a structure much like the Lie group fields $X_i$ but on general manifolds, where we do not have globally defined continuous and nonzero vector fields. This allows us to write the Brownian motion globally as in the Lie group case.

10.4 Fiber bundle geometry

In the Lie group case, Brownian motion can be constructed by mapping a Euclidean process $B(t)$ to the group to get the process $y(t)$. This construction uses the set of left- (or right-) invariant vector fields $X_i(y) = (L_y)_*(e_i)$ that are globally defined and, with a left-invariant metric, orthonormal. Globally defined maps from a manifold to its tangent bundle are called sections, and manifolds that support sections of the tangent bundle that at each point form a basis for the tangent space are called parallelizable, a property that Lie groups possess but that manifolds in general do not. The sphere $S^2$ is an example: the hairy ball theorem asserts that no continuous nowhere-vanishing vector field exists on $S^2$. Thus we have no chance of finding a set of nonvanishing global vector fields, not to mention a set of fields constituting an orthonormal basis, which we could use to write an SDE similar to (10.17).

A similar issue arises when generalizing the latent variable model (10.3). We can use the tangent space at $m$ to model the latent variables $x$, map to the manifold using the Riemannian exponential map $\operatorname{Exp}_m$, and use the Riemannian normal law to model the conditional distribution $y\,|\,x$. However, if we wish to avoid the linearization implied by using the tangent space at $m$, then we need to convert (10.3) from using addition of the terms $m$, $Wx$, and $\epsilon$ to work infinitesimally, to use addition of infinitesimal steps in tangent spaces, and to transport $W$ between these tangent spaces. We can achieve this by converting (10.3) to the SDE

$$dy(t) = W\,dx(t) + d\epsilon(t) \qquad (10.18)$$

started at $m$, where $x(t)$ is now a Euclidean Brownian motion, and $\epsilon(t)$ is a Euclidean Brownian motion scaled by $\sigma$. The latent process $x(t)$ here takes the place of the latent variable $x$ in (10.3), with $x(1)$ and $x$ having the same distribution $\mathcal{N}(0,\mathrm{Id}_d)$. We write $x(t)$ instead of $B(t)$ to emphasize this. Similarly, the noise process $\epsilon(t)$ takes the place of $\epsilon$, with $\epsilon(1)$ and $\epsilon$ having the same distribution $\mathcal{N}(0,\sigma^2\mathrm{Id}_d)$. In Euclidean space, the transition distribution of this SDE will be equal to the marginal distribution of $y$ in (10.3), that is, $y(1) \sim \mathcal{N}(m,\Sigma)$ with $\Sigma = WW^T + \sigma^2\mathrm{Id}_d$. On the manifold we however need to handle the fact that the matrix $W$ is at first defined only in the tangent space $T_mM$. The natural way to move $W$ to tangent spaces near $m$ is by parallel transport of the vectors constituting the columns of $W$. This reflects the Euclidean situation where $W$ is independent of $x(t)$ and hence spatially stationary. However, parallel transport is tied to paths, so the result will be a transport of $W$ that is different for each sample path realization of (10.18). This fact is beautifully handled with the Eells–Elworthy–Malliavin [6] construction of Brownian motion. We outline this construction below. For this, we first need some important notions from fiber bundle geometry.

10.4.1 The frame bundle

A fiber bundle over a manifold $M$ is a manifold $E$ with a map $\pi : E \to M$, called the projection, such that for sufficiently small neighborhoods $U \subseteq M$, the preimage $\pi^{-1}(U)$ can be written as a product $\pi^{-1}(U) \cong U \times F$ between $U$ and a manifold $F$, the fiber. When the fibers are vector spaces, fiber bundles are called vector bundles. The most commonly occurring vector bundle is the tangent bundle $TM$. Recall that a tangent vector always lives in a tangent space at a point in $M$, that is, $v \in T_yM$. The map $\pi(v) = y$ is the projection, and the fiber $\pi^{-1}(y)$ of the point $y$ is the vector space $T_yM$, which is isomorphic to $\mathbb{R}^d$.

Consider now basis vectors $W_1,\dots,W_d$ for $T_yM$. As an ordered set $(W_1,\dots,W_d)$, the vectors are in combination called a frame. The frame bundle $FM$ is a fiber bundle over $M$ such that the fibers $\pi^{-1}(y)$ are sets of frames. Therefore a point $u \in FM$ consists of a collection of basis vectors $(W_1,\dots,W_d)$ and the base point $y \in M$ for whose tangent space $T_yM$ the vectors $W_1,\dots,W_d$ make up a basis. We can use the local product structure of frame bundles to locally write $u = (y,W)$, where $y \in M$ and $W_1,\dots,W_d$ are the basis vectors. Often, we denote the basis vectors in $u$ just $u_1,\dots,u_d$. The frame bundle has interesting geometric properties, which we will use through the chapter. The frame bundle of $S^2$ is illustrated in Fig. 10.4.

Figure 10.4 The frame bundle $FS^2$ of the sphere $S^2$ illustrated by its representation as the principal bundle $\mathrm{GL}(\mathbb{R}^2,TS^2)$. A point $u \in FS^2$ can be seen as a linear map from $\mathbb{R}^2$ (left) to the tangent space $T_yS^2$ of the point $y = \pi(u)$ (right). The standard basis $(e_1,e_2)$ for $\mathbb{R}^2$ maps to a basis $(u_1,u_2)$, $u_i = ue_i$, for $T_yS^2$ because the frame bundle element $u$ defines a linear map $\mathbb{R}^2 \to T_yS^2$. The vertical subbundle of the tangent bundle $TFS^2$ consists of derivatives of paths in $FS^2$ that only change the frame, that is, $\pi(u)$ is fixed. Vertical vectors act by rotation of the basis vectors as illustrated by the rotation of the basis seen along the vertical line. The horizontal subbundle, which can be seen as orthogonal to the vertical subbundle, arises from parallel transporting vectors in the frame $u$ along paths on $S^2$.

10.4.2 Horizontality

The frame bundle, being a manifold, itself has a tangent bundle $TFM$ with derivatives $\dot u(t)$ of paths $u(t) \in FM$ being vectors in $T_{u(t)}FM$. We can use the fiber bundle structure to split $TFM$ and thereby define two different types of infinitesimal movements in $FM$. First, a path $u(t)$ can vary solely in the fiber direction, meaning that for some $y \in M$, $\pi(u(t)) = y$ for all $t$. Such a path is called vertical. At a point $u \in FM$ the derivative of the path lies in the linear subspace $V_uFM$ of $T_uFM$ called the vertical subspace. For each $u$, $V_uFM$ is $d^2$-dimensional. It corresponds to changes of the frame, the basis vectors for $T_yM$, while the base point $y$ is kept fixed. $FM$ is a $(d+d^2)$-dimensional manifold, and the subspace containing the remaining $d$ dimensions of $T_uFM$ is in a particular sense separate from the vertical subspace. It is therefore called the horizontal subspace $H_uFM$. Just as tangent vectors in $V_uFM$ model changes only in the frame keeping $y$ fixed, the horizontal subspace models changes of $y$ keeping, in a sense, the frame fixed. However, frames are tied to tangent spaces, so we need to define what is meant by keeping the frame fixed. When $M$ is equipped with a connection $\nabla$, being constant along paths is by definition having zero acceleration as measured by the connection. Here, for each basis vector $u_i$, we need $\nabla_{\dot y(t)} u_i(t) = 0$ when $u(t)$ is the path in the frame bundle and $y(t) = \pi(u(t))$ is the path of base points. This condition is exactly satisfied when the frame vectors $u_i(t)$ are each parallel transported along $y(t)$. The derivatives $\dot u(t)$ of paths satisfying this condition make up the horizontal subspace $H_{u(t)}FM$ of $T_{u(t)}FM$. In other words, the horizontal subspace of $TFM$ contains derivatives of paths where the base point $y(t)$ changes, but the frame is kept as constant as possible as sensed by the connection.

The frame bundle has a special set of horizontal vector fields $H_1,\dots,H_d$ that make up a global basis for $HFM$. This set is in a way a solution to defining the SDE (10.18) on manifolds: although we cannot in the general situation find a set of globally defined vector fields as we used in the Euclidean and Lie group situations to drive the Brownian motion (10.17), we can lift the problem to the frame bundle where such a set of vector fields exists. This will enable us to drive the SDE in the frame bundle and then subsequently project its solution to the manifold using $\pi$. To define $H_i$, take the $i$th frame vector $u_i \in T_yM$, move $y$ infinitesimally in the direction of the frame vector $u_i$, and parallel transport each frame vector $u_j$, $j=1,\dots,d$, along the infinitesimal curve. The result is an infinitesimal displacement in $TFM$, a tangent vector to $FM$, which by construction is an element of $HFM$. This can be done for any $u \in FM$ and any $i=1,\dots,d$. Thus we get the global set of horizontal vector fields $H_i$ on $FM$. The fields $H_i$ are linearly independent because they model displacement in the direction of the linearly independent vectors $u_i$. In combination the fields make up a basis for the $d$-dimensional horizontal spaces $H_uFM$ for each $u \in FM$.

For each $y \in M$, $T_yM$ has dimension $d$, and with $u \in FM$, we have a basis for $T_yM$. Using this basis, we can map a vector $v \in \mathbb{R}^d$ to a vector $uv \in T_yM$ by setting $uv := u_i v^i$ using the Einstein summation convention. This mapping is invertible, and we can therefore consider the $FM$ element $u$ a map in $\mathrm{GL}(\mathbb{R}^d,TM)$. Similarly, we can map $v$ to an element of $H_uFM$ using the horizontal vector fields, $H_i(u)v^i$, a mapping that is again invertible. Combining this, we can map vectors from $T_yM$ to $\mathbb{R}^d$ and then to $H_uFM$. This map is called the horizontal lift $h_u : T_{\pi(u)}M \to H_uFM$. The inverse of $h_u$ is just the push-forward $\pi_* : H_uFM \to T_{\pi(u)}M$ of the projection $\pi$. Note the $u$ dependence of the horizontal lift: $h_u$ is a linear isomorphism between $T_{\pi(u)}M$ and $H_uFM$, but the mapping changes with different $u$, and it is not an isomorphism between the bundles $TM$ and $HFM$, as can be seen from the dimensions $2d$ and $2d+d^2$, respectively.

10.4.3 Development and stochastic development

We now use the horizontal fields $H_1,\dots,H_d$ to construct paths and SDEs on $FM$ that can be mapped to $M$. Keep in mind the Lie group SDE (10.17) for Brownian motion, where increments of a Euclidean Brownian motion $B(t)$ or $x(t)$ are multiplied on an orthonormal basis. We now use the horizontal fields $H_i$ for the same purpose. We start deterministically. Let $x(t)$ be a $C^1$ curve in $\mathbb{R}^d$ and define the ODE

$$\dot u(t) = H_i(u(t))\,\dot x^i(t) \qquad (10.19)$$

on $FM$ started with a frame bundle element $u_0 = u$. By mapping the derivative of $x(t)$ in $\mathbb{R}^d$ to $TFM$ using the horizontal fields $H_i(u(t))$, we thus obtain a curve in $FM$. Such a curve is called the development of $x(t)$. See Fig. 10.5 for a schematic illustration. We can then directly obtain a curve $y(t)$ in $M$ by setting $y(t) = \pi(u(t))$, that is, removing the frame from the generated path. The development procedure is often visualized as rolling the manifold $M$ along the path of $x(t)$ in the manifold $\mathbb{R}^d$. For this reason, it is denoted “rolling without slipping”. We will use the letter $x$ for the curve $x(t)$ in $\mathbb{R}^d$, $u$ for its development $u(t)$ in $FM$, and $y$ for the resulting curve $y(t)$ on $M$.

Figure 10.5 Development and stochastic development map $\mathbb{R}^d$-valued curves and processes to curves and processes on the manifold using the frame bundle. Starting at a frame bundle element $u$ with $y = \pi(u)$, the development maps the derivative of the curve $x(t)$ through the current frame $u(t)$ to a tangent vector in $H_{u(t)}FM$. These tangent vectors are integrated to a curve $u(t)$ in $FM$ and a curve $y(t) = \pi(u(t))$ on $M$ using the ODE (10.19). As a result, the starting frame $u_0$ is parallel transported along $y(t)$. The construction works as well for stochastic processes (semimartingales): the thin line illustrates a sample path. Note that two curves that do not end at the same point in $\mathbb{R}^d$ can map to curves that do end at the same point in $M$. Because of curvature, frames transported along two curves on $M$ that end at the same point are generally not equal. This rotation is a result of the holonomy of the manifold.
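On $S^2$ embedded in $\mathbb{R}^3$, the development ODE (10.19) can be integrated quite explicitly because parallel transport along a geodesic segment is the rotation about the axis orthogonal to that segment. The sketch below rolls a planar curve onto the sphere; the hypothetical helper develop_on_sphere, the input curve, and the starting frame are illustrative choices (a straight line in $\mathbb{R}^2$ develops to a great circle):

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues' rotation matrix about a unit axis."""
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

def develop_on_sphere(xdot, dt, y0, u0):
    """Integrate the development ODE (10.19) on S^2 embedded in R^3.
    y0: base point (unit vector), u0: 3x2 frame of tangent vectors at y0,
    xdot: (n,2) array of derivatives of the R^2 curve x(t)."""
    y, u = y0.copy(), u0.copy()
    ys = [y.copy()]
    for dx in xdot * dt:
        v = u @ dx                         # tangent step mapped through the frame
        norm = np.linalg.norm(v)
        if norm > 1e-12:
            axis = np.cross(y, v / norm)   # great-circle axis
            R = rotation(axis, norm)       # exponential map and parallel transport
            y, u = R @ y, R @ u
        ys.append(y.copy())
    return np.array(ys), u

# roll a straight line in R^2 onto the sphere: the result is a great circle
n, dt = 1000, 0.005
xdot = np.tile([1.0, 0.0], (n, 1))
y0 = np.array([0.0, 0.0, 1.0])
u0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # orthonormal frame at the north pole
ys, u_final = develop_on_sphere(xdot, dt, y0, u0)
```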

The development procedure has a stochastic counterpart: let now $x(t)$ be an $\mathbb{R}^d$-valued Euclidean semimartingale. For our purposes, $x(t)$ will be a Brownian motion on $\mathbb{R}^d$. The stochastic development SDE is then

$$du(t) = H_i(u(t)) \circ dx^i(t) \qquad (10.20)$$

using Stratonovich integration. In the stochastic setting, $x(t)$ is sometimes called the driving process for $y(t)$. Observe that the development procedure above, which was based on mapping differentiable curves, here works for processes that are almost surely nowhere differentiable. It is not immediate that this works, and arguing rigorously for the well-posedness of the stochastic development employs nontrivial stochastic calculus; see, for example, [13].

The stochastic development has several interesting properties: (1) It is a mapping from the space of stochastic paths on $\mathbb{R}^d$ to stochastic paths on $M$, that is, each sample path $x(\omega)(t)$ gets mapped to a path $y(\omega)(t)$ on $M$. It is in this respect different from the tangent space linearizations, where vectors, not paths, in $T_mM$ are mapped to points in $M$. (2) It depends on the initial frame $u_0$. In particular, if $M$ is Riemannian and $u_0$ orthonormal, then the process $y(t)$ is a Riemannian Brownian motion when $x(t)$ is a Euclidean Brownian motion. (3) It is defined using the connection of the manifold. From (10.20) and the definition of the horizontal vector fields we can see that a Riemannian metric is not used. However, a Riemannian metric can be used to define the connection, and a Riemannian metric can be used to state that $u_0$ is, for example, orthonormal. If $M$ is Riemannian, stochastically complete, and $u_0$ orthonormal, we can write the density of the distribution of $y(t)$ with respect to the Riemannian volume, that is, $y(t) \sim p(t;u_0)\,dV_g$. If $\pi(u_0) = m$, then the density $p(t;u_0)$ equals the heat kernel $p(t,m,\cdot)$.
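Feeding Brownian increments instead of a smooth curve into the development sketch above gives an Euler-type approximation of the stochastic development (10.20); with the orthonormal starting frame, the projected process approximates a Riemannian Brownian motion on $S^2$. This reuses the hypothetical develop_on_sphere, y0, and u0 defined in the previous sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
dt = 1.0 / n
dB = np.sqrt(dt) * rng.standard_normal((n, 2))    # increments of the driving process x(t)

# Euler-type stochastic development: feed increments as "derivatives" dB/dt
ys, u_final = develop_on_sphere(dB / dt, dt, y0, u0)
```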

10.5 Anisotropic normal distributions

Perhaps most important for the use here is that (10.20) can be seen as a manifold generalization of the SDE (10.18) generalizing the latent model (10.3). This is the reason for using the notation $x(t)$ for the driving process and $y(t)$ for the resulting process on the manifold: $x(t)$ can be interpreted as the latent variable, and $y(t)$ as the response. When $u_0$ is orthonormal, the marginal distribution of $y(1)$ is normal in the sense of equaling the transition distribution of the Brownian motion, just as in the Euclidean case where $W = \mathrm{Id}$ and $\sigma = 0$ results in $y \sim \mathcal{N}(m,\mathrm{Id})$.

We start by discussing the case $\sigma = 0$ of (10.3), where $W$ is a square root of the covariance of the distribution of $y$ in the Euclidean case. We use this to define a notion of infinitesimal covariance for a class of distributions on manifolds denoted anisotropic normal distributions [32,35]. We assume for now that $W$ is of full rank $d$, but $W$ is not assumed orthonormal.

10.5.1 Infinitesimal covariance

Recall the definition of covariance of a multivariate Euclidean stochastic variable $X$: $\operatorname{cov}(X^i,X^j) = E[(X^i - \bar X^i)(X^j - \bar X^j)]$, where $\bar X = E[X]$ is the mean value. This definition relies by construction on the coordinate system used to extract the components $X^i$ and $X^j$. Therefore it cannot be transferred to manifolds directly. Instead, other similar notions of covariance have been treated in the literature, for example,

$$\operatorname{cov}_m(X^i,X^j) = E\big[\operatorname{Log}_m(X)^i \operatorname{Log}_m(X)^j\big]$$

defined in [26]. In the form expressed here, a basis for $T_mM$ is used to extract components of the vectors $\operatorname{Log}_m(X)$. Here we take a different approach and define a notion of infinitesimal covariance in the case where the distribution of $y$ is generated by a driving stochastic process. This will allow us to extend the transition distribution of the Brownian motion, which is isotropic and has trivial covariance, to the case of anisotropic distributions with nontrivial infinitesimal covariance.

Recall that when $\sigma = 0$, the marginal distribution of $y$ in (10.3) is normal $\mathcal{N}(m,\Sigma)$ with covariance $\Sigma = WW^T$. The same distribution appears when we take the stochastic process view and use $W$ in (10.18). We now take this to the manifold situation by starting the process (10.20) at a point $u = (m,W)$ in the frame bundle. This is a direct generalization of (10.18). When $W$ is an orthonormal basis, the generated distribution is the transition distribution of a Riemannian Brownian motion. However, when $W$ is not orthonormal, the generated distribution becomes anisotropic. Fig. 10.6 shows density plots of the Riemannian normal distribution and a Brownian motion, both with isotropic variance 0.5, and an anisotropic distribution with $W \not\propto \mathrm{Id}_d$.

Figure 10.6 (Left) Density of the normal law or Riemannian normal distribution with isotropic variance 0.5. (Center) Density of a Brownian motion with isotropic variance 0.5 (stopped at T = 0.5). (Right) Density of a Brownian motion with variance 0.5 in one direction and $0.1^2$ in the orthogonal direction, corresponding to $W = \operatorname{diag}(\sqrt{0.5}, 0.1)$.

We can write down the likelihood of observing a point $y \in M$ at time $t = T$ under the model,

$$L(m,W;y) = p(T,y;m,W), \qquad (10.21)$$

where $p(t,y;m,W)$ is the time-$t$ density at the point $y \in M$ of the generated anisotropic distribution $y(t)$. Without loss of generality, the observation time can be set to $T=1$ and skipped from the notation. The density can only be written with respect to a base measure, here denoted $\mu_0$, such that $y(T) = p(T;m,W)\,\mu_0$. If $M$ is Riemannian, then we can set $\mu_0 = dV_g$, but this is not a requirement: the construction only needs a connection and a fixed base measure with respect to which we define the likelihood.

The parameters of the model, $m$ and $W$, are represented by one element $u$ of the frame bundle $FM$, that is, the starting point of the process $u(t)$ in $FM$. Writing $\theta$ for the parameters combined, we have $\theta = u = (m,W)$. These parameters correspond to the mean $m$ and covariance $\Sigma = WW^T$ of the Euclidean normal distribution $\mathcal{N}(m,\Sigma)$. We can take a step further and define the mean of such a distribution to be $m$, as we pursue below. Similarly, we can define the notion of infinitesimal square root covariance of $y(T)$ to be $W$.

10.5.2 Isotropic noise

The linear model (10.3) includes both the matrix $W$ and the isotropic noise $\epsilon \sim \mathcal{N}(0,\sigma^2 I)$. We now discuss how this additive structure can be modeled, including the case where $W$ is not of full rank $d$.

We have so far considered distributions resulting from Brownian motions, analogues of isotropic normal distributions, and seen that they can be represented by the frame bundle SDE (10.20). The fundamental structure is that $u_0$ being orthonormal spreads the infinitesimal variation equally in all directions as seen by the Riemannian metric. There exists a subbundle of $FM$ called the orthonormal frame bundle $OM$ that consists of only such orthonormal frames. Solutions to (10.20) will always stay in $OM$ if $u_0 \in OM$. We here use the symbol $R$ for elements of $OM$ to emphasize their pure rotation without scaling effect. We can model the added isotropic noise by modifying the SDE (10.20) to

$$dW(t) = H_i(W(t)) \circ dx(t)^i + H_i(R(t)) \circ d\epsilon(t)^i, \quad dR(t) = h_{R(t)}\big(\pi_*(dW(t))\big), \qquad (10.22)$$

where the flow now has both the base point and covariance component $W(t)$ and a pure rotation component $R(t)$ serving as basis for the noise process $\epsilon(t)$. As before, we let the generated distribution on $M$ be $y(t) = \pi(W(t))$, that is, $W(t)$ takes the place of $u(t)$.

Elements of $OM$ differ only by a rotation, and since $\epsilon(t)$ is a Brownian motion scaled by $\sigma$, we can exchange $R(t)$ on the right-hand side of $dW(t)$ with any other element of $OM$ without changing the distribution. Computationally, we can therefore omit $R(t)$ from the integration and instead find an arbitrary element of $OM$ at each time step of a numerical integration. This is particularly important when the dimension $d$ of the manifold is large because $R(t)$ has $d^2$ components.

We can explore this even further by letting $W$ be a $d \times r$ matrix with $r \le d$, thus reducing the rank of $W$ similarly to the PPCA situation (10.6). Without the addition of the isotropic noise, this would in general result in the density $p(\cdot;m,W)$ being degenerate, just as the Euclidean normal density function requires a full-rank covariance matrix. However, with the addition of the isotropic noise, $W + \sigma R$ can still be of full rank even though $W$ has zero eigenvalues. This has further computational advantages: if we, instead of using the frame bundle $FM$, let $W$ be an element of the bundle of rank-$r$ linear maps $\mathbb{R}^r \to TM$ so that $W_1,\dots,W_r$ are $r$ linearly independent basis vectors in $T_{\pi(W)}M$, and if we remove $R(t)$ from the flow (10.22) as described before, then the flow lives in a $(d+rd)$-dimensional fiber bundle compared to the $d+d^2$ dimensions of the full frame bundle. For low $r$, this can imply a substantial reduction in computation time.

10.5.3 Euclideanization

Tangent space linearizations using the $\operatorname{Exp}_m$ and $\operatorname{Log}_m$ maps provide a linear view of the data $y_i$ on $M$. When the data are concentrated close to a mean $m$, this view gives a good picture of the data variation. However, as the data spread grows larger, curvature starts having an influence, and the linear view can provide a progressively distorted picture of the data. Whereas linear views of a curved geometry will never give a truly faithful picture of the data, we can use a generalization of (10.3) to provide a linearization that integrates the effect of curvature at points far from $m$. The standard PCA dimension-reduced view of the data is obtained by writing $W = U\Lambda$, where $\Lambda$ is the diagonal matrix with the eigenvalues $\lambda_1,\dots,\lambda_r$ of $W$ in the diagonal. In PPCA this is used to provide a low-dimensional representation of the data from the data conditional expectation $x\,|\,y_i$. This can be further reduced to a single data descriptor $x_i := E[x\,|\,y_i]$ by taking expectation, and then we obtain an equivalent of the standard PCA view by displaying $\Lambda x_i$.

In the current probabilistic model, we can likewise condition the latent variables on the data to get a Euclidean entity describing the data. Since the latent variable is now a time-dependent path, the result of the conditioning is a process $x(t)\,|\,y(T)=y_i$, where the conditioning is on the response process hitting the data at time $T$. This results in a quite different view of the data, as illustrated in Fig. 10.7 and exemplified in Fig. 10.1: as in PPCA, taking expectation, we get

$$\bar x(t)_i = E\big[x(t)\,\big|\,y(T)=y_i\big]. \qquad (10.23)$$

To get a single data descriptor, we can integrate $d\bar x(t)_i$ in time to get the endpoint $x_i := \int_0^T d\bar x(t)_i = \bar x(T)_i$. From the example in Fig. 10.1 we see that this Euclideanization of the data can be quite different compared to tangent space linearization.

Figure 10.7 Euclideanization using stochastic development: (right) The data point $y_i$ (red dot (dark gray dot in print version)) defines the conditioned process $y(t)\,|\,y(T)=y_i$ illustrated by sample paths on the manifold. (left) The antidevelopment of this process, $x(t)\,|\,y(T)=y_i$, is illustrated by sample paths in the Euclidean space. Note that the Euclidean process $x(t)\,|\,y(T)=y_i$ need not end at the same point in $\mathbb{R}^d$ (this will generally only happen if $M$ is Euclidean). The Euclidean process can be summarized into the expected path $\bar x(t)_i$ (thick curve, left). This path can again be summarized by its endpoint $x_i$, which is a single vector (dashed arrow, left). Contrary to tangent space linearizations, the fact that the process visits all points of the manifold integrates curvature into this vector. It is thus not equivalent to $\operatorname{Log}_m(y_i)$.

10.6 Statistics with bundles

We now use the generalization of (10.3) via processes, either in the Lie algebra (10.17) of a group or on the frame bundle (10.20), to do statistics of manifold data. We start with ML estimation of mean and infinitesimal covariance by fitting anisotropic normal distributions to data, then progress to describing probabilistic PCA, a regression model, and estimation schemes.

10.6.1 Normal distributions and maximum likelihood

Considering the transition distribution $\mu(\theta) = y(T)$ of solutions $u(t)$ to (10.20), projected to the manifold as $y(t) = \pi(u(t))$ and started at $\theta = u = (m,W)$, a normal distribution with infinitesimal covariance $\Sigma = WW^T$, we can now define the sample maximum likelihood mean $\hat m_{\mathrm{ML}}$ by

$$\hat m_{\mathrm{ML}} = \operatorname*{arg\,max}_{m} \prod_{i=1}^N L(m;y_i) \qquad (10.24)$$

from samples $y_1,\dots,y_N \in M$. Here, we implicitly assume that $W$ is orthonormal with respect to a Riemannian metric. Alternatively, we can set

$$\hat m_{\mathrm{ML}} = \operatorname*{arg\,max}_{m} \max_{W} \prod_{i=1}^N L(m,W;y_i), \qquad (10.25)$$

where we simultaneously optimize to find the most likely infinitesimal covariance. The former defines $\hat m_{\mathrm{ML}}$ as the starting point of the Brownian motion with transition density making the observations most likely. The latter includes the effect of the covariance, the anisotropy, and because of this it will in general give different results. In practice the likelihood is evaluated by Monte Carlo sampling. Parameter estimation procedures with parameters $\theta = (m)$ or $\theta = (m,W)$ and sampling methods are the topic of section 10.7.

We can use the likelihood (10.21) to get ML estimates for both the mean $m$ and the infinitesimal covariance $W$ by modifying (10.25) to

$$(\hat m_{\mathrm{ML}}, \hat W_{\mathrm{ML}}) = \hat u_{\mathrm{ML}} = \operatorname*{arg\,max}_{u} \prod_{i=1}^N L(u;y_i). \qquad (10.26)$$

Note the nonuniqueness of the result when estimating the square root $W$ instead of the covariance $\Sigma = WW^T$. We discuss this point from a more geometric view in section 10.8.4.

10.6.2 Infinitesimal PPCA

The latent model (10.22) is used as the basis for the infinitesimal version of manifold PPCA [34], which we discussed in general terms in section 10.2.3. As in Euclidean PPCA, $r$ denotes the number of principal eigenvectors to be estimated. With a fixed base measure $\mu_0$, we write the density of the distribution generated from the low-rank plus noise system (10.22) as $\mu(m,W,\sigma) = p(T;m,W,\sigma)\,\mu_0$ and use this to define the likelihood $L(m,W,\sigma;y)$ in (10.12). The major difference in relation to (10.26) is now that the noise parameter $\sigma$ is estimated from the data and that $W$ is of rank $r \le d$.

The Euclideanization approach of section 10.5.3 gives equivalents of Euclidean principal components by conditioning the latent process on the data, $x_{i,t} := x(t)\,|\,y(T)=y_i$. By taking expectation, this can be reduced to a single path $\bar x(t)_i := E[x(t)\,|\,y(T)=y_i]$ or a single vector $x_i := \bar x(T)_i$.

The model is quite different from constructions of manifold PCA [11,14,31,5,27] that seek subspaces of the manifold having properties related to Euclidean linear subspaces. The probabilistic model and the horizontal frame bundle flows in general imply that no subspace is constructed in the present model. Instead, we can extract the parameters of the generated distribution and the information present in the conditioned latent process. As we discuss in section 10.8.1, the fact that the model does not generate subspaces is fundamentally linked to curvature, the curvature tensor, and the nonintegrability of the horizontal distribution in $FM$.

10.6.3 Regression

The generalized latent model (10.3) is used in [17] to define a related regression model. Here we assume observations $(x_i, y_i)$, $i=1,\dots,N$, with $x_i \in \mathbb{R}^d$ and $y_i \in M$. As in the previous models, the unknowns are the point $m \in M$, which takes the role of the intercept in multivariate regression, the coefficient matrix $W$, and the noise variance $\sigma^2$. Whereas the infinitesimal nature of the model, which relies on the latent variable being a semimartingale, makes it geometrically natural, the fact that the latent variable is a process implies that its values in the interval $(0,T)$ are unobserved if $x_i$ is the observation at time $T$. This turns the construction into a missing data problem, and the values of $x(t)$ in the unobserved interval $(0,T)$ need to be integrated out. This can be pursued by combining bridge sampling as described below with matching of the sample moments of the data with moments of the response variable $y$ defined by the model [18].

10.7 Parameter estimation

So far we have only constructed models and defined parameter estimation as optimization problems for the involved likelihoods. It remains to discuss how we can actually estimate parameters in concrete settings. We describe here three approaches: (1) using a least-squares principle that incorporates the data anisotropy; this model is geometrically intuitive, but it only approximates the true density in the limit as $T \to 0$; (2) using the method of moments, where approximations of low-order moments of the generated distribution are compared with the corresponding data moments; (3) using bridge sampling of the conditioned process to approximate transition density functions with Monte Carlo sampling.

10.7.1 Anisotropic least squares

The Fréchet mean (see Chapter 2) is defined from the least-squares principle. Here we aim to derive a similar least-squares condition for the variables $\theta = m$, $\theta = (m,W)$, or $\theta = (m,W,\sigma)$. With this approach, the inferred parameters $\hat\theta$ will only approximate the actual maximum likelihood estimates in a certain limit. Although only providing an approximation, the least-squares approach is different from Riemannian least-squares, and it is thereby both of geometric interest and gives perspective on the bridge sampling described later.

Until now we have assumed the observation time $T$ to be strictly positive or simply $T=1$. If instead we let $T \to 0$, then we can explore the short-time asymptotic limit of the generated density. Starting with the Brownian motion, the limit has been extensively studied in the literature. For the Euclidean normal density, we know that $p_{\mathcal{N}(m,T\mathrm{Id})}(y) = (2\pi T)^{-\frac d2}\exp\big({-\tfrac{\|y-m\|^2}{2T}}\big)$. In particular, the density obeys the limit $\lim_{T\to 0} -T\log p_{\mathcal{N}(m,T\mathrm{Id})}(y) = \frac12\|y-m\|^2$. The same limit occurs on complete Riemannian manifolds with $\operatorname{dist}(m,y)^2$ instead of the norm $\|y-m\|^2$ and when $y$ is outside the cut locus $C(m)$; see, for example, [13]. Thus, minimizing the squared distance to data can be seen as equivalent to maximizing the density, and hence the likelihood, for short running times of the Brownian motion specified by small $T$.
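A quick numerical check of the Euclidean limit; the points $m$ and $y$ are arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

m, y = np.zeros(2), np.array([0.3, -0.8])
for T in [1.0, 0.1, 0.01, 0.001]:
    val = -T * multivariate_normal(mean=m, cov=T * np.eye(2)).logpdf(y)
    print(T, val)                      # approaches 0.5 * ||y - m||^2 = 0.365
print(0.5 * np.sum((y - m) ** 2))
```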

It is shown in [35] that there exists a function $d_Q : FM \times M \to \mathbb{R}$ that, for each $u \in FM$, incorporates the anisotropy modeled in $u$ in a measurement of the closeness $d_Q(u,y)$ of $m = \pi(u)$ and $y$. Like the Riemannian distance, which is defined as the minimal length or energy of curves linking two points in $M$, $d_Q$ is defined using curves in $FM$ from $u$ to the fiber $\pi^{-1}(y)$ over $y$, but now with the energy weighted by a matrix $\Sigma^{-1}$:

$$d_Q(u,y) = \min_{\substack{u(t),\; u(0)=u,\; \pi(u(1))=y,\\ \dot u(t) \in HFM}} \int_0^1 \dot u(t)^T\, \Sigma(t)^{-1}\, \dot u(t)\,dt. \qquad (10.27)$$

Here $\Sigma(t)^{-1} = (u(t)^{-1})^T u(t)^{-1}$ is the precision matrix of the infinitesimal covariance modeled in the frames $u(t)$. The horizontality requirement $\dot u(t) \in HFM$ implies that the inner product defined by $\Sigma(t)^{-1}$ is parallel transported along with $u(t)$. The anisotropy is thus controlled by starting with a possibly nonorthonormal frame $u_0$. We motivate this distance further from a geometric viewpoint in sections 10.8.2 and 10.8.3.

It is important here to relate the short-time $T \to 0$ asymptotic limit with the Euclidean normal density with covariance $\Sigma$. In the Euclidean case, the density is $p_{\mathcal{N}(m,T\Sigma)}(y) = |2\pi T\Sigma|^{-\frac12}\exp\big({-\tfrac{(y-m)^T\Sigma^{-1}(y-m)}{2T}}\big)$ and, as above, $\lim_{T\to 0} -T\log p_{\mathcal{N}(m,T\Sigma)}(y) = \frac12 (y-m)^T\Sigma^{-1}(y-m)$. In the nonlinear situation, using the $\Sigma^{-1}$-weighted distance $d_Q$, $\lim_{T\to 0} -T\log p_{\mu(m,W)}(y) = \frac12 d_Q(u,y)^2$. From this we can generalize the Fréchet mean least-squares principle to

$$\hat\theta = (\hat m,\hat W) = \operatorname*{arg\,min}_{u \in FM} \sum_{i=1}^N d_Q(u,y_i)^2 - \frac N2 \log(\det g_u), \qquad (10.28)$$

where $\log(\det g_u)$ denotes the Riemannian determinant of the frame $u$. This term corresponds to the log-determinant in the Euclidean density $p_{\mathcal{N}(m,\Sigma)}$, and it acts to regularize the optimization, which would otherwise increase $W$ to infinity and reduce distances accordingly; $\hat m$ can be seen as an anisotropically weighted equivalent of the Fréchet mean.

10.7.2 Method of moments

The method of moments compares low-order moments of the distribution with sample moments of the data. This can be used for parameter estimation by changing the parameters of the model so that the distribution and sample moments match as well as possible. The method of moments does not use the data likelihood, and it depends on the ability to compute the moments in an appropriate space, for example, by embedding M in a larger Euclidean space.

To compare first- and second-order moments, we can set up the cost function

S\bigl(\mu(\theta), \bar{y}_1, \bar{y}_2\bigr) = c_1\,\bigl\|\mu(\theta)_1 - \bar{y}_1\bigr\|^2 + c_2\,\bigl\|\mu(\theta)_2 - \bar{y}_2\bigr\|^2,   (10.29)

where \mu(\theta)_1 and \bar{y}_1 denote the first-order moments of the distribution μ(θ) and the sample moments of the data y_1, …, y_N, respectively, and similarly for the second-order moments \mu(\theta)_2 and \bar{y}_2, and c_1, c_2 > 0 are weights. If M is embedded in a larger Euclidean space, then the norms in (10.29) can be inherited from the embedding space norm. The optimal values of θ can then be found by minimizing this cost.

This approach is used in [18] to estimate parameters in the regression model. The method of moments can be a computationally more lightweight alternative to the bridge sampling discussed below. In addition, the method can be relatively stable because of the implicit regularization provided by only matching quantities, here moments, that are averaged over the entire dataset. This is in contrast to the least-squares approach and the bridge sampling, which estimate by evaluating d_Q or the likelihood on individual samples and average afterward, for example, by using averaged gradients when optimizing parameters. The moments \mu(\theta)_1 and \mu(\theta)_2 can be approximated by sampling from the model or by approximation of the Fokker–Planck equation that governs the time evolution of the density; see, for example, [1].
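As a sketch, assuming M is embedded in ℝ^k so that moments can be computed in the embedding space, the cost (10.29) may be evaluated as follows; sample_from_model is a hypothetical stand-in for a sampler of μ(θ), for example, obtained by numerically integrating the model SDE:

```python
import numpy as np

# Moment-matching cost (10.29), with first and second moments computed in the embedding space.
def moments(points):
    mean = points.mean(axis=0)                                        # first-order moment
    second = np.einsum('ni,nj->ij', points, points) / len(points)     # second-order moment
    return mean, second

def moment_cost(model_samples, data, c1=1.0, c2=1.0):
    mu1, mu2 = moments(model_samples)
    y1, y2 = moments(data)
    return c1 * np.sum((mu1 - y1) ** 2) + c2 * np.sum((mu2 - y2) ** 2)

# usage sketch: theta_hat = argmin_theta moment_cost(sample_from_model(theta), data)
```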

10.7.3 Bridge sampling

At the heart of the methods discussed in this chapter are the data-conditional latent processes x(t) | y(T) = y_i. We now describe methods for simulating from this conditioned process in order to subsequently approximate expectations of functions over the conditioned process and the transition density function.

Stochastic bridges arise from conditioning a process to hit a point at a fixed time; here t = T. Fig. 10.8 exemplifies the situation with samples from a Brownian bridge on S². Denoting the target point by v, the expectation over the bridge process is related to the transition density p(T, v; m) of the process by

E_{x(t)|x(T)=v}\bigl[f(x(t))\bigr] = \frac{E_{x(t)}\bigl[f(x(t))\,\mathbf{1}_{x(T)=v}\bigr]}{p(T,v;m)},   (10.30)

assuming that p(T, v; m) is positive. Here \mathbf{1} is the indicator function. Setting f(x(t)) = 1, we can write this as

p(T,v;m) = \frac{E_{x(t)}\bigl[\mathbf{1}_{x(T)\in dv}\bigr]}{dv}   (10.31)

for an infinitesimal volume dv containing v. The transition density thus measures the combined probability mass of sample paths x_ω(t) with x_ω(T) near v. However, the right-hand side of (10.31) does not directly give a practical way of computing the transition density, and thereby the likelihood, by sampling from x(t), because the probability of x(t) hitting dv is arbitrarily small.

Figure 10.8 Five sample paths from a Brownian bridge on S² started at the north pole and conditioned on hitting a fixed point v ∈ S² (black point) at time T = 1.

Instead, we will evaluate the conditional expectation E_{x(t)|x(T)=v}[f(x(t))] by drawing samples from the bridge process and approximating the expectation by Monte Carlo sampling. We will see that this provides an effective way to evaluate the density p(T, v; m). It is generally hard to simulate directly from the bridge process x(t) | x(T) = v. One exception is the Euclidean Brownian motion, where the bridge satisfies the SDE

dy(t) = \frac{v - y(t)}{T - t}\,dt + dW(t).   (10.32)
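A minimal Euler–Maruyama sketch of simulating (10.32), with the last step taken to land exactly at v so that the drift denominator T − t = 0 is never evaluated:

```python
import numpy as np

# Euler-Maruyama simulation of the Euclidean Brownian bridge (10.32) from m to v on [0, T].
def brownian_bridge(m, v, T=1.0, n_steps=200, rng=None):
    rng = rng or np.random.default_rng()
    dt = T / n_steps
    y = np.array(m, dtype=float)
    path = [y.copy()]
    for k in range(n_steps - 1):
        t = k * dt
        drift = (v - y) / (T - t)              # guiding drift of the bridge
        y = y + drift * dt + np.sqrt(dt) * rng.standard_normal(y.shape)
        path.append(y.copy())
    path.append(np.array(v, dtype=float))      # the bridge hits v at time T
    return np.array(path)

path = brownian_bridge(m=[0.0, 0.0], v=[1.0, 1.0])
```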

More generally, an arbitrary SDE (10.9) can be modified to give a bridge process by addition of an extra drift term:

dy(t) = b(t,y(t))\,dt + a(t,y(t))\,a(t,y(t))^T\, \nabla_y \log p\bigl(T-t, v; y(t)\bigr)\,dt + a(t,y(t))\,dW(t).   (10.33)

This SDE could be used to simulate sample paths if it were not for the fact that it involves the gradient of the transition density p(T−t, v; y(t)) from the current value y(t) of the process to v. This transition density gradient generally does not have an explicit or directly computable form; indeed, our goal is to find a way to compute the transition density, and it is thus not feasible to use (10.33) computationally.

To improve on this situation, Delyon and Hu [4] proposed to use the guiding term from the Brownian bridge (10.32) instead of the gradient of the log-transition density, giving an SDE of the form

dy(t) = b(t,y(t))\,dt - \frac{y(t) - v}{T - t}\,dt + a(t,y(t))\,dW(t).   (10.34)

The drift term is illustrated in Fig. 10.9. Solutions y(t) to (10.34) are not in general bridges of the original process; instead, they are called guided processes. However, under certain conditions, the most important being that the diffusion field a is invertible, y(t) will hit v at time T a.s., and the laws of the conditioned process x(t) | x(T) = v and the guided process y(t) will be absolutely continuous with respect to each other with an explicit Radon–Nikodym derivative φ. This implies that we can compute expectations over the bridge process by taking expectations of the guided process y(t) and correcting by factoring in φ:

E_{x(t)|x(T)=v}\bigl[f(x(t))\bigr] = \frac{E_{y(t)}\bigl[f(y(t))\,\varphi(y(t))\bigr]}{E_{y(t)}\bigl[\varphi(y(t))\bigr]}.   (10.35)

Establishing this identity requires a nontrivial limiting argument comparing the two processes in the limit t → T, where the denominator T − t in the guiding term in (10.34) approaches zero. As an additional consequence, Delyon and Hu and later Papaspiliopoulos and Roberts [28] write the transition density as the product of a Gaussian density and the expectation over the guided process of the correction factor:

p(T,v;m) = \sqrt{\frac{|A(T,v)|}{(2\pi T)^{d}}}\; e^{-\frac{\|a(0,m)^{-1}(m-v)\|^{2}}{2T}}\; E_{y(t)}\bigl[\varphi(y(t))\bigr]   (10.36)

with A(t,x) = (a(t,x)^{-1})^T a(t,x)^{-1}. See also [36], where guided bridges are produced in a related way by using an approximation of the true transition density to replace p(T−t, v; y(t)) in (10.33). The Delyon and Hu approach can be seen as the specific case where p(T−t, v; y(t)) is approximated by the transition density of a Brownian motion.
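A sketch of simulating the guided process (10.34) with Euler–Maruyama in ℝ^d; the drift b and diffusion field a are user supplied, and the correction factor φ needed in (10.35)–(10.36), whose explicit form is given in [4], is not computed here:

```python
import numpy as np

# Euler-Maruyama sketch of the guided process (10.34) for user-supplied b(t, y) and a(t, y).
def simulate_guided(b, a, m, v, T=1.0, n_steps=200, rng=None):
    rng = rng or np.random.default_rng()
    dt = T / n_steps
    y = np.array(m, dtype=float)
    path = [y.copy()]
    for k in range(n_steps - 1):
        t = k * dt
        dW = np.sqrt(dt) * rng.standard_normal(y.shape)
        y = y + b(t, y) * dt - (y - v) / (T - t) * dt + a(t, y) @ dW
        path.append(y.copy())
    path.append(np.array(v, dtype=float))       # the guiding term forces y(T) = v
    return np.array(path)

# usage sketch with zero drift and unit diffusion:
path = simulate_guided(lambda t, y: np.zeros_like(y),
                       lambda t, y: np.eye(len(y)),
                       m=[0.0, 0.0], v=[1.0, 1.0])
```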

Figure 10.9 (Left) The guided process (10.34) modifies the original process x(t) by addition of a scalar multiple of the term v − x(t) (dotted arrow), the difference between the time t value x(t) and the target v, to force the modified process y(t) to hit v a.s. (Right) Scheme (10.34) applied to generate a sample from the bridge process (blue curve (dark gray in print version) of each landmark) between two corpora callosa shapes (red (light gray in print version)/black landmark configurations) represented as points in ℝ^156 with the non-Euclidean landmark metric described in Chapter 4 and section 10.8.5.

10.7.4 Bridge sampling on manifolds

Extending the simulation scheme to general manifolds directly is nontrivial and the subject of ongoing research efforts. The fundamental issue is finding appropriate terms to take the role of the guiding term in (10.34) and controlling the behavior of such terms near the cut locus of the manifold. Here we instead sketch how the Delyon and Hu approach can be used in coordinates. This follows [30], where the approach is used for simulating from the Brownian motion on the landmark manifold described in chapter 4.

We assume that we have a chart covering the manifold up to a set of measure zero, and here we ignore the case where the stochastic process crosses this set. We take as an example the Riemannian Brownian motion with the coordinate process (10.15). Using the approach of Delyon and Hu, we get the guided processes

dy(t) = b(y(t))\,dt - \frac{y(t) - v}{T - t}\,dt + \sqrt{g(y(t))^{-1}}\,dB(t).   (10.37)

For the analysis in Delyon and Hu, we need the cometric g(y(t))^{-1} and its inverse, the metric g(y(t)), to be bounded, whereas the drift coming from the Christoffel symbols can be unbounded or replaced by a bounded approximation. Then using (10.36), we get the expression

p(T,v;m) = \sqrt{\frac{|g(v)|}{(2\pi T)^{d}}}\; e^{-\frac{(m-v)^{T} g(m)\,(m-v)}{2T}}\; E_{y(t)}\bigl[\varphi(y(t))\bigr].

This process is in coordinates and thus gives the density with respect to the Lebesgue measure on ℝ^d. We get the corresponding density with respect to dV_g on M by removing the \sqrt{|g(v)|} factor:

p(T,v;m) = (2\pi T)^{-d/2}\, e^{-\frac{(m-v)^{T} g(m)\,(m-v)}{2T}}\, E_{y(t)}\bigl[\varphi(y(t))\bigr].   (10.38)

The expectation E_{y(t)}[φ(y(t))] has no closed-form expression in general. Instead, it can be approximated by Monte Carlo sampling, simulating the process (10.37) finitely many times and averaging the computed correction factors φ(y(t)).

With the machinery to approximate the likelihood in place, we can subsequently seek to optimize the likelihood with respect to the parameters θ. This can be done directly by computing the gradient of (10.38) with respect to θ. This is a relatively complex expression to differentiate by hand. Instead, automatic differentiation methods can be used, as pursued in the Theano Geometry library, which we used to produce the examples in this chapter. This brings us to the following stochastic gradient descent algorithm for parameter estimation by bridge sampling, where we iteratively update the parameter estimate θ_l:

Algorithm 10.1 Parameter estimation from samples y_1, …, y_N ∈ M.
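Since the algorithm listing itself is not reproduced here, the following is only a hedged sketch of the kind of stochastic gradient loop described above. log_lik_and_grad is a hypothetical user-supplied function returning, for one observation, a bridge-sampling estimate of the log-likelihood and its gradient with respect to θ, for example, obtained by automatic differentiation of (10.38):

```python
import numpy as np

# Sketch of stochastic gradient ascent on the bridge-sampling likelihood estimate.
def estimate_parameters(log_lik_and_grad, theta0, data, n_iter=100, step=0.01, batch=8, rng=None):
    rng = rng or np.random.default_rng()
    theta = np.array(theta0, dtype=float)
    for l in range(n_iter):
        idx = rng.choice(len(data), size=min(batch, len(data)), replace=False)
        grad = np.zeros_like(theta)
        for i in idx:
            _, g = log_lik_and_grad(theta, data[i])   # Monte Carlo estimate for one sample
            grad += g
        theta = theta + step * grad / len(idx)        # ascend the averaged gradient
    return theta
```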

10.8 Advanced concepts

Here we give more detail on some of the concepts that result from using a fiber bundle structure to model data variation on manifolds. In particular, we discuss how the Riemannian curvature tensor can be expressed directly as the vertical variation of frames resulting from the nonclosure of the bracket of horizontal vector fields. We then define a sub-Riemannian geometry on the frame bundle that has a notion of most probable paths as geodesics, and we discuss how to geometrically model the actual infinitesimal covariance matrix as compared to the square root covariance we have used so far. Finally, we give examples of flows using special geometric structure, namely flows in the phase space of the landmark manifold.

Many of the concepts presented here are discussed in more detail in [35,33].

10.8.1 Curvature

The curvature of a manifold is most often given in terms of the curvature tensor R ∈ T^3_1(M), which is defined from the connection; see Chapter 1. Let now u ∈ FM be a frame considered as an element of GL(ℝ^d, T_{π(u)}M). We use this identification between T_{π(u)}M and ℝ^d to write the curvature form Ω:

\Omega(v_u, w_u) = u^{-1}\, R\bigl(\pi_*(v_u), \pi_*(w_u)\bigr)\, u, \qquad v_u, w_u \in T_uFM.

Note that Ω takes values in \mathfrak{gl}(n): it describes how the identity map u^{-1}u : ℝ^d → ℝ^d changes when moving around an infinitesimal parallelogram determined by the tangent vectors π_*(v_u) and π_*(w_u) with u kept fixed. It is thus vertically valued: it takes values in VFM. This can be made precise by employing an isomorphism ψ between FM × \mathfrak{gl}(n) and VFM given by \psi(u,v) = \frac{d}{dt} u\exp(tv)\big|_{t=0} using the Lie group exponential exp on GL(ℝ^d); see, for example, [19].

Now using the horizontal–vertical splitting of TFM and ψ, we define a \mathfrak{gl}(n)-valued vertical one-form ω : TFM → \mathfrak{gl}(n) by

\omega(v_u) = \begin{cases} 0 & \text{if } v_u \in HFM, \\ \psi^{-1}(v_u) & \text{if } v_u \in VFM. \end{cases}   (10.39)

Here ω represents the connection via the horizontal–vertical splitting by singling out the vertical part of a TFM vector and representing it as an element of \mathfrak{gl}(n) [13]. Using ω, we have

\omega\bigl([H_i, H_j]\bigr) = \Omega(H_i, H_j),   (10.40)

and we see that the curvature form measures the vertical component of the bracket [H_i, H_j] = H_iH_j - H_jH_i between horizontal vector fields. In other words, a nonzero curvature implies that the bracket between horizontal vector fields is nonzero.

As a consequence, nonzero curvature implies that it is impossible to find a submanifold of FM whose tangent spaces are spanned by the horizontal vector fields: for this to happen, the horizontal vector fields would need to constitute an integrable distribution by the Frobenius theorem, but the condition for this is exactly that brackets of vector fields in the distribution again lie in the distribution. This is the reason why the infinitesimal PPCA model described here does not generate submanifolds of FM or M as in the Euclidean case.

10.8.2 Sub-Riemannian geometry

A sub-Riemannian metric acts as a Riemannian metric except that it is not required to be strictly positive definite: it can have zero eigenvalues. We now define a certain sub-Riemannian metric on FM that can be used to encode anisotropy and infinitesimal covariance. First, for u ∈ FM, define the inner product Σ(u)^{-1} on T_{π(u)}M by

\Sigma(u)^{-1}(v,w) = \bigl\langle u^{-1}(v),\, u^{-1}(w)\bigr\rangle_{\mathbb{R}^d}, \qquad v, w \in T_{\pi(u)}M.   (10.41)

Note how u^{-1} maps the tangent vectors v, w to ℝ^d before the standard Euclidean inner product is applied. To define an inner product on TFM, we need to connect this to tangent vectors in TFM. This is done using the pushforward π_* of the projection π, giving the inner product

g_u(v,w) = \Sigma(u)^{-1}\bigl(\pi_* v, \pi_* w\bigr).

This metric is quite different from a direct lift of a Riemannian metric to the frame bundle because of the application of u^{-1} in (10.41). This is a geometric equivalent of using the precision matrix Σ^{-1} as an inner product in the Gaussian density function; here it is instead applied to infinitesimal displacements. Note that g_u vanishes on VFM because π_*(v) = 0 for v ∈ VFM. The inner product is therefore only positive definite on the horizontal subbundle HFM.

For a curve u(t) ∈ FM for which \dot{u}(t) ∈ HFM, we define the sub-Riemannian length of u(t) by

l(u(t)) = \int_0^1 \sqrt{\,g_{u(t)}\bigl(\dot{u}(t), \dot{u}(t)\bigr)\,}\; dt.

If \dot{u} is not a.e. horizontal, then we define l(u) = ∞; l defines a sub-Riemannian distance, which is equivalent to the distance d_Q in section 10.7.1. Extremal curves are called sub-Riemannian geodesics. A subclass of these curves, the normal geodesics, can be computed from a geodesic equation as in the Riemannian case. Here we represent the sub-Riemannian metric as a map \tilde{g} : T^*FM → HFM ⊂ TFM defined by g_u(w, \tilde{g}(\xi)) = (\xi \mid w) for w ∈ H_uFM, ξ ∈ T^*_uFM, and define the Hamiltonian

H(u,\xi) = \frac{1}{2}\,\xi\bigl(\tilde{g}_u(\xi)\bigr).

In canonical coordinates the evolution of normal geodesics is then governed by the Hamiltonian system

\dot{u}^i = \frac{\partial H}{\partial \xi_i}(u,\xi), \qquad \dot{\xi}_i = -\frac{\partial H}{\partial u^i}(u,\xi).   (10.42)
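The following generic sketch integrates a canonical Hamiltonian system of the form (10.42) with a symplectic Euler scheme; the Hamiltonian is user supplied, and the quadratic stand-in used in the example is only an illustration, not the sub-Riemannian Hamiltonian above:

```python
import numpy as np

# Symplectic Euler integration of a canonical Hamiltonian system (10.42), using
# finite-difference derivatives of a user-supplied Hamiltonian H(u, xi).
def hamiltonian_flow(H, u0, xi0, T=1.0, n_steps=100, eps=1e-6):
    dt = T / n_steps
    u, xi = np.array(u0, float), np.array(xi0, float)
    def dH_du(u, xi):
        return np.array([(H(u + eps * e, xi) - H(u - eps * e, xi)) / (2 * eps)
                         for e in np.eye(len(u))])
    def dH_dxi(u, xi):
        return np.array([(H(u, xi + eps * e) - H(u, xi - eps * e)) / (2 * eps)
                         for e in np.eye(len(xi))])
    traj = [(u.copy(), xi.copy())]
    for _ in range(n_steps):
        xi = xi - dt * dH_du(u, xi)     # momentum update first (symplectic Euler)
        u = u + dt * dH_dxi(u, xi)
        traj.append((u.copy(), xi.copy()))
    return traj

# stand-in Hamiltonian; trajectories are straight lines in this flat example
traj = hamiltonian_flow(lambda u, xi: 0.5 * np.dot(xi, xi), u0=[0.0, 0.0], xi0=[1.0, 0.5])
```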

10.8.3 Most probable paths

The concept of path probability and of maximizing path probability needs careful definition because sample paths of semimartingales are a.s. nowhere differentiable. It is therefore not possible to write down an energy for such paths directly in terms of derivatives and to maximize it. Instead, Onsager and Machlup [8] defined a notion of path probability as the limit of progressively smaller tubes around smooth paths γ. Here we let \mu^M_\epsilon(\gamma) be the probability that a process x(t) stays within distance ε of the curve γ, that is,

\mu^M_\epsilon(\gamma) = P\bigl(\mathrm{dist}(x(t), \gamma(t)) < \epsilon,\ \forall t \in [0,1]\bigr).

The most probable path is the path that maximizes \mu^M_\epsilon(\gamma) as ε → 0.

For a Riemannian Brownian motion, Onsager and Machlup showed that

\mu^M_\epsilon(\gamma) \sim \exp\!\left(-\frac{c}{\epsilon^{2}} + \int_0^1 L_M\bigl(\gamma(t), \dot{\gamma}(t)\bigr)\,dt\right)   (10.43)

as ε → 0, where L_M is the Onsager–Machlup functional

L_M\bigl(\gamma(t), \dot{\gamma}(t)\bigr) = -\frac{1}{2}\|\dot{\gamma}(t)\|_g^2 + \frac{1}{12} S_g(\gamma(t)),

where S_g is the scalar curvature. Notice the resemblance to the usual Riemannian energy except for the added scalar curvature term. Intuitively, this term senses the curvature of the manifold as the radii of the tubes around γ approach zero.

Turning to the mapping of Euclidean processes to the manifold via the frame bundle construction, [32,35,33] propose to define the path probability of a process y(t) on M that is a stochastic development of a Brownian motion x(t) on ℝ^d by applying the Onsager–Machlup functional to the process x(t). The path probability is thus measured in the Euclidean space. Extremal paths in this construction are called the most probable paths for the driving semimartingale, which in this case is x(t). Because the scalar curvature term of L_M is zero in the Euclidean space, we identify the curves as

\operatorname*{argmin}_{y(t),\ y(0)=m,\ y(1)=y}\ -\int_0^1 L_{\mathbb{R}^d}\!\left(x(t), \tfrac{d}{dt}x(t)\right) dt.

This functional turns out to be exactly the sub-Riemannian length defined in the previous section, and the most probable paths for the driving semimartingale therefore equal geodesics for the sub-Riemannian metric g_u. In particular, the Hamiltonian equations (10.42) characterize the subclass of normal geodesics. Fig. 10.10 illustrates such curves, which are now extremal for the anisotropically weighted metric.

Figure 10.10 Geodesics (black curves) and most probable paths (blue) for a driving Brownian motion on the sphere S² from the north pole (red dot (dark gray in print version)) to a point on the southern hemisphere (blue dot (light gray in print version)). Left: isotropic process; center and right: processes with covariance visualized by the frame (arrows) at the north pole and the ellipses. The parallel transport of the frame along the most probable paths is plotted. The sphere is colored by an approximation of the generated transition density. It can be clearly seen how increasing anisotropy interacts with curvature to give most probable paths that are not aligned with geodesics. Intuitively, the most probable paths tend to stay in high-density areas on the northern hemisphere before taking the "shorter" low-probability route to the target point on the southern hemisphere.

10.8.4 Bundles without rotation

When modeling infinitesimal covariance, the frame bundle in a sense provides an overspecification because u ∈ FM represents a square root covariance \sqrt{\Sigma} and not Σ directly. Multiple such square roots can represent the same Σ. To remedy this, we can factorize the inner product Σ^{-1}(u) above through the bundle Sym^+ of symmetric positive definite covariant 2-tensors on M. We have

FM \xrightarrow{\ \Sigma^{-1}\ } \mathrm{Sym}^+M \xrightarrow{\ q\ } M,

and Σ^{-1}(u) can now directly be seen as an element of Sym^+. The polar decomposition theorem states that Sym^+ is isomorphic to the quotient FM/O(ℝ^d), with O(ℝ^d) the orthogonal transformations of ℝ^d. The construction thus removes from FM the rotations that constituted the overspecification in representing the square root covariance. The fiber bundle structure and horizontality that we used on FM descend to Sym^+. In practice we can work on Sym^+ and FM interchangeably. It is often more direct to write SDEs and stochastic developments on FM, which is why we generally prefer it over Sym^+.
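The overspecification and its removal can be illustrated numerically: distinct square roots W and WR with R orthogonal give the same Σ = W W^T, and the symmetric positive definite square root of Σ is the rotation-free representative corresponding to the quotient FM/O(ℝ^d):

```python
import numpy as np

# Two square roots differing by a rotation represent the same covariance.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
theta = 0.7
R = np.eye(3)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]

Sigma1 = W @ W.T
Sigma2 = (W @ R) @ (W @ R).T
print(np.allclose(Sigma1, Sigma2))            # True: same covariance from different frames

# symmetric positive definite square root of Sigma via eigendecomposition
eigval, eigvec = np.linalg.eigh(Sigma1)
P = eigvec @ np.diag(np.sqrt(eigval)) @ eigvec.T
print(np.allclose(P @ P.T, Sigma1))           # the canonical, rotation-free square root
```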

10.8.5 Flows with special structure

We have so far created parametric families of probability distributions on general manifolds using stochastic processes, either the Brownian motion or stochastic developments of Euclidean semimartingales. Here we briefly mention other types of processes that use special structure of the underlying space and that can be used to construct distributions for parameter estimation. We focus on three cases of flows on the LDDMM landmark manifold discussed in Chapter 4.

The landmark geodesic equations with the metric discussed in Chapter 4 are usually written in the Hamiltonian form

\dot{q}_i = \frac{\partial H}{\partial p_i}(q,p), \qquad \dot{p}_i = -\frac{\partial H}{\partial q_i}(q,p),   (10.44)

with the position coordinates q = (q_1, …, q_n) of the n landmarks, the momentum coordinates p, and the Hamiltonian H(q,p) = p^T K(q,q)\,p. We can use this phase-space formulation to introduce noise that is coupled to the momentum variable instead of only affecting the position equation for q, as pursued so far.
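As a sketch of the deterministic system (10.44), assuming for illustration a Gaussian kernel (the actual kernel, and its normalization, are those defining the landmark metric in Chapter 4), the Hamiltonian and a simple finite-difference integration can be written as follows:

```python
import numpy as np

# Landmark Hamiltonian with a Gaussian kernel (illustrative choice) and Euler integration
# of the Hamiltonian equations (10.44) via finite-difference derivatives of H.
def hamiltonian(q, p, sigma=1.0):
    # q, p: (n, d) arrays of landmark positions and momenta
    dists2 = np.sum((q[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    K = np.exp(-dists2 / (2 * sigma ** 2))
    return 0.5 * np.einsum('ia,ij,ja->', p, K, p)

def hamiltonian_step(q, p, dt=0.01, eps=1e-6, sigma=1.0):
    n, d = q.shape
    dHdp = np.zeros_like(p); dHdq = np.zeros_like(q)
    for i in range(n):
        for a in range(d):
            e = np.zeros_like(q); e[i, a] = eps
            dHdp[i, a] = (hamiltonian(q, p + e, sigma) - hamiltonian(q, p - e, sigma)) / (2 * eps)
            dHdq[i, a] = (hamiltonian(q + e, p, sigma) - hamiltonian(q - e, p, sigma)) / (2 * eps)
    return q + dt * dHdp, p - dt * dHdq     # dq = dH/dp dt, dp = -dH/dq dt

q = np.array([[0.0, 0.0], [1.0, 0.0]]); p = np.array([[0.0, 1.0], [0.0, -1.0]])
for _ in range(100):
    q, p = hamiltonian_step(q, p)
```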

A construction for this is given by Trouvé and Vialard [38], who add noise in the momentum variable with position- and momentum-dependent infinitesimal covariance

dq_i = \frac{\partial H}{\partial p_i}(q,p)\,dt, \qquad dp_i = -\frac{\partial H}{\partial q_i}(q,p)\,dt + \epsilon_i(q,p)\,dx(t),   (10.45)

where x(t) is a Brownian motion on ℝ^{nd}. Similarly, Marsland and Shardlow define the stochastic Langevin equations

dq_i = \frac{\partial H}{\partial p_i}(q,p)\,dt, \qquad dp_i = -\lambda\,\frac{\partial H}{\partial p_i}(q,p)\,dt - \frac{\partial H}{\partial q_i}(q,p)\,dt + \epsilon\,dx_i(t).   (10.46)

In both cases the noise directly affects the momentum.

A related but somewhat different model is the stochastic EPDiff equation by Arnaudon et al. [1]. Here a family of fields σ_1, …, σ_J is defined on the domain Ω where the landmarks reside, and noise is multiplied on these fields:

dq_i = \frac{\partial H}{\partial p_i}\,dt + \sum_{l=1}^{J}\sigma_l(q_i)\,dx(t)^l, \qquad dp_i = -\frac{\partial H}{\partial q_i}\,dt - \sum_{l=1}^{J}\frac{\partial}{\partial q_i}\bigl(p_i\cdot\sigma_l(q_i)\bigr)\,dx(t)^l.   (10.47)

Here the driving Brownian motion x(t) is ℝ^J-valued. Notice the coupling to the momentum equation through the derivatives of the noise fields. The stochasticity is in a certain sense compatible with the geometric construction that is used to define the LDDMM landmark metric. In particular, the momentum map construction [2] is preserved, and the landmark equations are extremal for a stochastic variational principle

S(q,p) = \int \Bigl(-H(q,p)\,dt + \sum_i p_i\Bigl(dq_i - \sum_{l=1}^{J}\sigma_l(q_i)\,dx(t)^l\Bigr)\Bigr).   (10.48)

Bridge sampling for these processes can be pursued with the Delyon and Hu approach, and this can again be used to infer parameters of the model. In this case the parameter set includes parameters of the noise fields σ_l. However, the diffusion field is in this case in general not invertible, as was required by the guidance scheme (10.34). This necessitates extra care when constructing the guiding process [1]. Bridge simulation for the Trouvé–Vialard and Marsland–Shardlow models (10.45) and (10.46) can be pursued with the simulation approach of Schauer and van der Meulen; see [3].

In these examples, the Euclidean structure of the landmark domain Ω is used in defining the SDEs, either through the coordinates on the momentum variable in (10.45) and (10.46) or through the noise fields σ_l on Ω in the stochastic EPDiff case (10.48). In the latter example, the construction is furthermore related to the representation of the landmark space as a homogeneous space arising from quotienting a subgroup of the diffeomorphism group Diff(Ω) by the isotropy group of the landmarks. On this subgroup of Diff(Ω), there exists an SDE driven by the right-invariant noise defined by the σ_l. Solutions of this SDE project to solutions of (10.47). A further interpretation of the fields σ_l is that they represent noise in Eulerian coordinates, and they thereby use the Eulerian coordinate frame for defining the infinitesimal covariance.

In all cases the parameters θ can be estimated from observed landmark configurations q^1, …, q^N by maximum likelihood. The parameters θ can specify the starting conditions (q_0, p_0) of the process, the shape and position of the fields σ_l, and even parameters of the Riemannian metric on the landmark space.

10.9 Conclusion

The aim of this chapter has been to provide examples of probabilistic approaches to manifold statistics and of geometrically natural constructions of parametric families of probability distributions. We pursued this using transition distributions of several stochastic processes: the Riemannian Brownian motion, Brownian motion on Lie groups, anisotropic generalizations of the Brownian motion by use of stochastic development, and finally flows that use special structure of the particular space, the shape space of landmarks. We have emphasized the role of infinitesimal covariance modeled by frames in tangent spaces when defining SDEs and stochastic processes. In the Lie group setting, left-invariant vector fields provided this basis. In the general situation, we lift to the frame bundle to allow use of the globally defined horizontal vector fields on FM.

As illustrated from the beginning of the chapter in Fig. 10.1, probabilistic approaches can behave quite differently from their least-squares counterparts. We emphasized the coupling between covariance and curvature both visually and theoretically, the latter with the link between curvature and nonclosedness of the horizontal distribution, sub-Riemannian geodesics, and most probable paths for the driving semimartingales.

Finally, we used the geometric and probabilistic constructions to describe statistical concepts such as the maximum likelihood mean defined from the Brownian motion and the joint maximum likelihood mean and infinitesimal covariance defined from anisotropic diffusions, and we provided ways of optimizing the parameters using bridge sampling.

The theoretical development of geometric statistics is currently far from complete, and there are many promising directions to be explored to approach as complete a theory of geometric statistics as is available for linear statistics. The viewpoint of this chapter is that probabilistic approaches play an important role in achieving this.

10.10 Further reading

Here we provide a few useful example references for background information and further reading.

An introduction to general SDE theory can be found in [25]. Much of the frame bundle theory, stochastic analysis on manifolds using frame bundles, and theory of Brownian motion on manifolds can be found in [13]. See also [7] for details on stochastic analysis on manifolds. Brownian motion on Lie groups is, for example, covered in [20]. Diffusions on stratified spaces are described in the works [23,24] by Tom Nye.

The relation between the horizontal subbundle and curvature can be found in the book [19]. Sub-Riemannian geometry is covered extensively in [22]. The stochastic large deformation model in [1] builds on the stochastic variational method of Holm [12].

References

1. Alexis Arnaudon, Darryl D. Holm, Stefan Sommer, A geometric framework for stochastic shape analysis, Foundations of Computational Mathematics July 2018.

2. M. Bruveris, F. Gay-Balmaz, D.D. Holm, T.S. Ratiu, The momentum map representation of images, arXiv:0912.2990; December 2009.

3. Joris Bierkens, Frank van der Meulen, Moritz Schauer, Simulation of elliptic and hypo-elliptic conditional diffusions, arXiv:1810.01761 [math, stat]; October 2018.

4. Bernard Delyon, Ying Hu, Simulation of conditioned diffusion and application to parameter estimation, Stochastic Processes and Their Applications November 2006;116(11):1660–1675.

5. Benjamin Eltzner, Stephan Huckemann, Kanti V. Mardia, Torus principal component analysis with an application to RNA structures, arXiv:1511.04993 [q-bio, stat]; November 2015.

6. David Elworthy, Geometric aspects of diffusions on manifolds, Paul-Louis Hennequin, ed. École D'Été de Probabilités de Saint-Flour XV–XVII, 1985–87, Number 136. Lecture Notes in Mathematics. Berlin, Heidelberg: Springer; 1988:277–425.

7. Michel Emery, Stochastic Calculus in Manifolds. Universitext. Berlin, Heidelberg: Springer Berlin Heidelberg; 1989.

8. Takahiko Fujita, Shin-ichi Kotani, The Onsager–Machlup function for diffusion processes, Journal of Mathematics of Kyoto University 1982;22(1):115–130.

9. M. Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié, Annales de L'Institut Henri Poincaré 1948;10:215–310.

10. U. Grenander, Probabilities on Algebraic Structures. John Wiley and Sons; 1963.

11. Stephan Huckemann, Thomas Hotz, Axel Munk, Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions, Statistica Sinica January 2010;20(1):1–100.

12. Darryl D. Holm, Variational principles for stochastic fluid dynamics, Proceedings - Royal Society. Mathematical, Physical and Engineering Sciences April 2015;471(2176).

13. Elton P. Hsu, Stochastic Analysis on Manifolds. American Mathematical Soc.; 2002.

14. Sungkyu Jung, Ian L. Dryden, J.S. Marron, Analysis of principal nested spheres, Biometrika January 2012;99(3):551–568.

15. Line Kühnel, Stefan Sommer, Alexis Arnaudon, Differential geometry and stochastic dynamics with deep learning numerics, Applied Mathematics and Computation 1 September 2019;356:411–437.

16. Line Kühnel, Stefan Sommer, Computational anatomy in Theano, Mathematical Foundations of Computational Anatomy (MFCA). 2017.

17. Line Kühnel, Stefan Sommer, Stochastic development regression on non-linear manifolds, Information Processing in Medical Imaging. Lecture Notes in Computer Science. Cham: Springer; June 2017:53–64.

18. Line Kühnel, Stefan Sommer, Stochastic development regression using method of moments, Geometric Science of Information. Lecture Notes in Computer Science. Cham: Springer; November 2017:3–11.

19. Ivan Kolář, Jan Slovák, Peter W. Michor, Natural Operations in Differential Geometry. Berlin, Heidelberg: Springer Berlin Heidelberg; 1993.

20. Ming Liao, Lévy Processes in Lie Groups. Cambridge, New York: Cambridge University Press; 2004.

21. Zoltán Magyar, Heat kernels on Lie groups, Journal of Functional Analysis October 1990;93(2):351–390.

22. Richard Montgomery, A Tour of Subriemannian Geometries, Their Geodesics and Applications. American Mathematical Soc.; August 2006.

23. T.M.W. Nye, M.C. White, Diffusion on some simple stratified spaces, Journal of Mathematical Imaging and Vision September 2014;50(1):115–125.

24. Tom M.W. Nye, Convergence of random walks to Brownian motion on cubical complexes, arXiv:1508.02906 [math, q-bio]; August 2015.

25. Bernt Øksendal, Stochastic Differential Equations: An Introduction With Applications. Springer Science & Business Media; 2003.

26. Xavier Pennec, Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements, Journal of Mathematical Imaging and Vision 2006;25(1):127–154.

27. Xavier Pennec, Barycentric subspace analysis on manifolds, arXiv:1607.02833 [math, stat]; July 2016.

28. Omiros Papaspiliopoulos, Gareth O. Roberts, Importance sampling techniques for estimation of diffusion models, Statistical Methods for Stochastic Differential Equations. Chapman & Hall/CRC Press; 2012.

29. Sam Roweis, EM algorithms for PCA and SPCA, Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10, NIPS '97. Cambridge, MA, USA: MIT Press; 1998:626–632.

30. Stefan Sommer, Alexis Arnaudon, Line Kühnel, Sarang Joshi, Bridge simulation and metric estimation on landmark manifolds, Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics. Lecture Notes in Computer Science. Springer; September 2017:79–91.

31. Stefan Sommer, Horizontal dimensionality reduction and iterated frame bundle development, Geometric Science of Information. LNCS. Springer; 2013:76–83.

32. Stefan Sommer, Anisotropic distributions on manifolds: template estimation and most probable paths, Information Processing in Medical Imaging. Lecture Notes in Computer Science. Springer; 2015;vol. 9123:193–204.

33. Stefan Sommer, Anisotropically weighted and nonholonomically constrained evolutions on manifolds, Entropy November 2016;18(12):425.

34. Stefan Sommer, An infinitesimal probabilistic model for principal component analysis of manifold valued data, arXiv:1801.10341 [cs, math, stat] January 2018.

35. Stefan Sommer, Anne Marie Svane, Modelling anisotropic covariance using stochastic development and sub-Riemannian frame bundle geometry, Journal of Geometric Mechanics June 2017;9(3):391–410.

36. Moritz Schauer, Frank van der Meulen, Harry van Zanten, Guided proposals for simulating multi-dimensional diffusion bridges, Bernoulli November 2017;23(4A):2917–2950.

37. Michael E. Tipping, Christopher M. Bishop, Probabilistic principal component analysis, Journal of the Royal Statistical Society. Series B January 1999;61(3):611–622.

38. Alain Trouvé, François-Xavier Vialard, Shape splines and stochastic shape evolutions: a second order point of view, Quarterly of Applied Mathematics 2012;70(2):219–251.

39. Miaomiao Zhang, P.T. Fletcher, Probabilistic principal geodesic analysis, NIPS. 2013:1178–1186.
