Martin Burger, Hendrik Dirks, and Lena Frerking

7 On optical flow models for variational motion estimation

Martin Burger, Institute for Computational and Applied Mathematics and Cells in Motion Cluster of Excellence, University of Münster, Orléans-Ring 10, 48149 Münster, Germany

Hendrik Dirks, Institute for Computational and Applied Mathematics, University of Münster, Orléans-Ring 10, 48149 Münster, Germany

Lena Frerking, Institute for Computational and Applied Mathematics and Cells in Motion Cluster of Excellence, University of Münster, Orléans-Ring 10, 48149 Münster, Germany, [email protected]

Abstract: The aim of this paper is to discuss and evaluate total variation based regularization methods for motion estimation, with particular focus on optical flow models. In addition to standard L2 and L1 data fidelities we give an overview of different variants of total variation regularization obtained from combinations with higher order models, and a unified computational optimization approach based on primal–dual methods. Moreover, we extend the models by Bregman iterations and provide an inverse problems perspective on the analysis of variational optical flow models.

A particular focus of the paper is the quantitative evaluation of motion estimation, which is a difficult and often underestimated task. We discuss several approaches for quality measures of motion estimation and apply them to compare the previously discussed regularization approaches.

Keywords: Inverse problems, imaging, motion estimation, variational models, optical flow, total variation, generalized total variation

7.1 Introduction

Motion estimation is a crucial task in many different areas. On the one hand, motion estimation is important in medical and biological contexts, where the goal is, for example, to track moving cells or to detect the motion of organs. On the other hand, it is also important in the automotive sector. Nowadays, there are many approaches that use motion estimation to make driving safer, by detecting both dangers outside of a car and inattentiveness of the driver, which manifests itself, for example, in slower movements or closing eyes due to tiredness. These are only two of the very many applications of motion estimation.

Motion estimation generally arises in the context of image sequences u(x, t), depending on a spatial position x ∈ Ω ⊂ ℝ^d and a time t ∈ [0, T]. For real applications there exists only the discrete counterpart of u, which is a set of images recorded at time steps t_0, t_0 + δt, t_0 + 2δt, . . .. There is also a variety of characteristics we have to consider when estimating motion:

A digital image can suffer from low resolution, low contrast, different gray levels and noise.

The temporal resolution δt is strongly connected to the underlying motion. For too large time steps we might lose the correspondence between consecutive images (e.g., a very fast car might only be visible in one image).

A natural image often contains a set of moving objects with different speeds. A sufficient model should simultaneously be able to detect small and large movements in the same sequence. Conversely, for the static background no motion should be detected.

For many biological applications, we have to consider the fact that the illumination is constant, but the fluorescence of the observed object can be inhomogeneous in space, or might even change over time.

Optical flow and real motion

When looking at image sequences and moving objects we directly speak of motion. This is a false implication, since only projections of the real 3-dimensional motion fields are recorded by an image recording device (and in particular by the human eye).

To emphasize this fact, we consider a camera recording traffic on a highway. The camera is only able to follow 2-dimensional paths υ_p = (υ_1, υ_2) on the image domain Ω, which are projections of the real 3-dimensional paths υ_r = (υ_1, υ_2, υ_3). Thus, one degree of information is already lost here. The problem is even worse, since we are not able to measure the 2-dimensional motion field directly. On images only the apparent motion (or optical flow) is visible, that is, displacements of intensities. Unfortunately, the apparent motion and the 2-dimensional motion field are two fundamentally different properties (a detailed discussion of this problem can be found in [32]). The barber's pole example is often used to underline this difference. The pole simply rotates counterclockwise, but optical devices (e.g., camera, eye) can only detect gray values tending upwards, and consequently the optical flow points upwards.

The aim of an optical flow model is to detect the real motion only from the intensities of the given images. Performing flow estimation, we have to deal with an inverse problem: we need to find the right motion that leads to the actual appearance of a specific image.

7.2 Models

One of the most common techniques to formally link intensity variations and motion is the optical flow constraint. On the basis of the assumption that the image intensity u(x, t) is constant along a trajectory x(t) with ẋ(t) = υ(x(t), t), we get, using the chain rule,

0 = d/dt u(x(t), t) = ∇u · υ + u_t .    (2.1)

The last equation is generally known as the optical flow constraint. This optical flow constraint can also be regarded as a linear inverse problem. Therefore, we rewrite (2.1) by changing the positions of υ and u to get

∇u · υ = −u_t .    (2.2)

We define A: L^p(Ω)^2 → L^p(Ω) via

Aυ := ∇u · υ ,    g := −u_t .    (2.3)

Hence, the constraint fulfills the inverse problem Aυ = g, with p = 1 or p = 2 depending on the norm of the data fidelity. A more detailed introduction into the optical flow problem can be found for example in [1] or [5]. We will discuss this issue in more detail in the following sections.
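As a small numerical illustration (our own sketch; the function and variable names are hypothetical, not from the paper), the discretized constraint Aυ = g can be checked pointwise from the image derivatives:

```python
import numpy as np

def flow_constraint_residual(ux, uy, ut, v1, v2):
    """Pointwise residual of the linearized optical flow constraint,
    A*v - g = ux*v1 + uy*v2 + ut (zero where the constraint holds)."""
    return ux * v1 + uy * v2 + ut

# Hypothetical data: a ramp u(x, y, t) = x - t moving right with speed 1
# has ux = 1, uy = 0, ut = -1, so v = (1, 0) satisfies the constraint.
ux, uy, ut = np.ones((4, 4)), np.zeros((4, 4)), -np.ones((4, 4))
r = flow_constraint_residual(ux, uy, ut, np.ones((4, 4)), np.zeros((4, 4)))
```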

7.2.1 Variational models with gradient regularization

The optical flow constraint constitutes one equation in every point x, but in the context of motion estimation from images we usually have two or three spatial dimensions. Consequently, the problem is massively underdetermined. However, it is possible to estimate the motion using a variational model

min_υ D(u, υ) + α R(υ) ,

where D(u, υ) represents the so-called data term and incorporates the optical flow constraint in a suitable norm. The second part R(υ) models additional a priori knowledge on υ and is denoted as a regularizer. The parameter α balances the data term and the regularizer.

Possible choices for the data term are

D_1(u, υ) = ½ ‖∇u · υ + u_t‖²_{L²(Ω)}

or

D_2(u, υ) = ‖∇u · υ + u_t‖_{L¹(Ω)} .

The quadratic L2 norm can be interpreted as solving the optical flow constraint in a least-squares sense inside the image domain Ω. Conversely, taking the L1 norm enforces the optical flow constraint linearly and is able to handle outliers more robustly.

The regularizer R(υ) has to be chosen such that the a priori knowledge is modeled in a reasonable way. If the solution is expected to be smooth, a quadratic L2 norm on the gradient of υ is chosen and we have

R_1(υ) := ½ ‖∇υ‖²_{L²(Ω)} .

Another possible approach is to choose the total variation (TV) of υ if we expect piecewise constant parts of motion. In the finite-dimensional setting this can be written as

R_2(υ) := ‖∇υ‖_1 .

Taking

D_1(u, υ) = ½ ‖∇u · υ + u_t‖²_{L²} and R_1(υ) = ½ ‖∇υ‖²_{L²}

results in the very well-known model of Horn and Schunck [18], 1981. With efficient primal–dual schemes to minimize L1 norms [12], L1-TV optical flow models with

D_2(u, υ) = ‖∇u · υ + u_t‖_1 and R_2(υ) = ‖∇υ‖_1

became very popular [25, 36]. This model was further improved by Werlberger et al. [35], where the classical TV regularizer was replaced with a Huber norm. In this context, let us refer to a recent survey by Becker, Petra and Schnörr [5].
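For illustration, a minimal fixed-point solver for the Horn–Schunck model can be sketched as follows (a toy implementation under periodic boundary assumptions; all names are ours, not the authors' code):

```python
import numpy as np

def horn_schunck(ux, uy, ut, alpha=1.0, iters=100):
    """Jacobi-type fixed-point iteration for the Horn-Schunck model:
    quadratic data term plus quadratic gradient regularization."""
    v1, v2 = np.zeros_like(ux), np.zeros_like(ux)
    # 4-neighbour average approximating the smoothing of the L2 regularizer
    avg = lambda v: 0.25 * (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                            np.roll(v, 1, 1) + np.roll(v, -1, 1))
    for _ in range(iters):
        b1, b2 = avg(v1), avg(v2)
        common = (ux * b1 + uy * b2 + ut) / (alpha + ux**2 + uy**2)
        v1, v2 = b1 - ux * common, b2 - uy * common
    return v1, v2

# Hypothetical test data: uniform horizontal translation with speed 1
ux, uy, ut = np.ones((8, 8)), np.zeros((8, 8)), -np.ones((8, 8))
v1, v2 = horn_schunck(ux, uy, ut, alpha=0.1, iters=200)
```

On these uniform data the iteration recovers the translation (1, 0), illustrating the smooth flow fields the quadratic regularizer produces.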

7.2.2 Extension of the regularizer

The previously described regularizations are able to produce either smooth velocity fields (L2 regularization) or sharp edges (TV regularization). However, in realistic applications we often need a combination of both characteristics. To calculate velocity fields that combine smoothness and sharp edges, we need to extend the regularization term.

In the following approach we want to make use of the potential of TV regularization to produce sharp edges. To incorporate the possibility of obtaining smooth parts, we need an additional primal variable w: Ω → ℝ², which will be required for a second regularization term that allows for smooth transitions. We link the variables υ and w by forcing

∇υ − w ≈ 0 .

If we penalize this difference with an L1 norm and minimize it for υ, we get a simple TV regularization, which is shifted by w. This way we make sure we keep the characteristics of a TV regularization in our results. To combine this with the characteristics of another regularization technique, we apply a second regularizer to w and also minimize everything for w. This procedure leads us to the following extended regularizer:

R(υ, w) = α_1 ‖∇υ − w‖_1 + α_0 S(w) .    (2.4)

The term S(w) is the additional term, which should be chosen in a way that it produces smooth flow fields. The weighting parameters α_0 ∈ ℝ_+ and α_1 ∈ [0, 1] on the one hand determine the relation between the regularization parts and the optical flow constraint. On the other hand, they also determine the relation between the two regularization parts and in this way adjust the proportion of smooth parts and edges in the result. If α_1 is large relative to α_0, the constant parts and edges outweigh the smooth parts; if it is small, the smooth parts outweigh the constant parts and edges. Note that a functional of the form (2.4) leads to a regularization of υ via

R(υ) = min_w α_1 ‖∇υ − w‖_1 + α_0 S(w) .

Similar to the TV term in a standard model, we can choose between different evaluations of the norm ‖∇υ − w‖_1 in the above formulation. Since this is a shifted TV norm, we once more have the possibility to use the anisotropic variant.

L1-TV/L2 optical flow model

As we have already seen, a regularization with the L2 norm leads to smooth approximations of a flow field. In the following approach, we use the L2 norm of w for the additional regularization term S(w), which leads us to

S(w) = ½ ‖w‖²_{L²(Ω)} .

Together with

D(u, υ) = ‖∇u · υ + u_t‖_1

we get an L1-TV/L2 model. This model is a combination of the previously described Horn–Schunck model and the L1-TV model, and thus contains characteristics of both kinds of flow fields. The regularizer in this model is very similar to the Huber function [19], which is defined as

|x|_ε = x²/(2ε) if |x| ≤ ε ,    |x|_ε = |x| − ε/2 if |x| > ε .

The anisotropic variant of an optical flow model with Huber regularization has also been discussed in [35]. For another formulation of a related regularizer see for example [33] or [8].

Even though models with the Huber function can be solved directly, we want to be able to compare the different models using the same algorithm. Therefore, later on we will use the same algorithm as for the other approaches.

L1-TV/TV optical flow model

Another possibility to incorporate smoothness in a flow estimation is a higher order derivative. In the setting of (2.4), we can choose S(w) as the TV of w, which leads us to

S(w) = ‖∇w‖_1 .

Since the first term of the regularization forces w to be close to the gradient of υ, the last term penalizes a quantity close to the second derivative of υ. For the additional TV term in this model, we can again choose between the isotropic and the anisotropic variant, equivalent to those in Section 7.2.1. Overall, this leads to a huge variety of possible evaluations for this model.

By using

D(u, υ) = ‖∇u · υ + u_t‖_1

as a data term we get an L1-TV/TV model, which we will later use for the numerical realization.

The regularizer we get with the described approach is a primal realization of a specific case of a Total Generalized Variation (TGV) regularizer [7]. See [27, 28] for further analysis of the TGV regularizer applied to motion estimation models.

7.2.3 Bregman iterations

A well-known drawback of TV regularization in combination with an L2 data fidelity is the loss of contrast, which means that the difference between intensities in the result is lowered. However, edges are well recovered. Concerning optical flow, this shows up as a reduced velocity of the motion. A possibility to overcome this drawback is to apply contrast-enhancing Bregman iterations [23]. For the case of a higher order regularization term, additional systematic bias, for example related to slopes, is corrected in the same way (cf. [6]). For this sake, we need the Bregman distance of the regularizer R, which is defined as

D_R^b(υ, υ̂) := R(υ) − R(υ̂) − ⟨b, υ − υ̂⟩ ,    b ∈ ∂R(υ̂) ,

for υ, υ̂ ∈ dom(R). This distance replaces the former regularization term in the respective models. In this way it is possible to re-enhance the contrast. Thanks to the condition that b is in the subdifferential of R, it is ensured that the edges stay at the correct positions. The Bregman iteration iteratively computes υ^{n+1} as a solution of

υ^{n+1} ∈ arg min_υ D(u, υ) + α (R(υ) − ⟨b^n, υ⟩) .    (2.8)

Afterwards, the Bregman variable is updated by setting

b^{n+1} = b^n − (1/α) ∂_υ D(u, υ^{n+1}) ∈ ∂R(υ^{n+1}) .

Applying Bregman iterations to the optical flow problem with L2 data fidelity can be written equivalently by adding the Bregman variable b^n to the data. For the n-th Bregman iteration we can use an extended data term, which results in an additive constant in the optimality condition. So far, Bregman iterations have not been investigated as an iterative regularization method for optical flow. A previous investigation [17] only considered (split) Bregman methods as a minimization method for the original variational model, and did not investigate the possibility of reducing systematic bias. As we shall see below (Figure 3), the Bregman iteration can yield improvements in parts where other variational methods underestimate motion (see, e.g., the right block in the rubber whale data set below) and produce competitive results (Table 2). It might thus nicely complement other approaches.

In the case of an L1 data fidelity, a simple modification of the data term is not possible and the iteration needs to be carried out in the original form (2.8). Because of the additional linear term ⟨b^n, υ⟩ in comparison to the other linear terms in the functional, it is not guaranteed that the functional is coercive; hence, the well-posedness of the iteration is not guaranteed in the infinite-dimensional setting. In any case, it is not clear that the Bregman iteration actually yields an additional benefit, since the L1 data fidelity does not introduce systematic bias. We leave this issue to future research.
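To illustrate the bias-removal effect, here is a deliberately simple scalar toy example (ours, not from the paper) of the iteration (2.8) with quadratic data term and the quadratic regularizer R(υ) = ½‖υ‖², for which every step has a closed form:

```python
import numpy as np

def bregman_iteration(g, alpha=1.0, n_iter=20):
    """Bregman iteration for min_v 0.5*|v - g|^2 + alpha*(R(v) - <b, v>)
    with R(v) = 0.5*|v|^2; b is the subgradient variable."""
    v, b = np.zeros_like(g), np.zeros_like(g)
    for _ in range(n_iter):
        v = (g + alpha * b) / (1.0 + alpha)   # closed-form minimizer
        b = b - (v - g) / alpha               # subgradient update
    return v

g = np.array([2.0, -1.0, 0.5])
v_one = bregman_iteration(g, alpha=1.0, n_iter=1)    # biased: g/2
v_many = bregman_iteration(g, alpha=1.0, n_iter=25)  # bias removed
```

The first iterate is the damped (contrast-reduced) solution; further iterates restore the data, mirroring the contrast re-enhancement described above.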

7.3 Analysis

In the following we briefly comment on some aspects of the analysis of variational problems for optical flow, with special focus on particular aspects related to the recent extensions of total variation. We are able to generalize existing results for the Horn–Schunck and L2-TV models to other Lp data terms, focusing in particular on p = 1. This can be done under the same conditions on the image gradient as previously used, but the proof needs some adaptation to take care of the constant functions in the null-space of the gradient. The coercivity related to the latter needs a different proof compared to previous results (cf. [30]). Moreover, we give a more concise proof than the one in [16] and avoid too strong regularity conditions. The understanding of the basic structure of the proof then immediately allows a generalization to existence results for the different recent variants of total variation regularization, an analysis that has not been carried out in the optical flow setting. Finally, we will discuss another problem that has hardly been looked at in the optical flow literature previously, namely quantitative estimation of errors in the motion estimate when changing the image data.

7.3.1 Existence of minimizers

The basic issue of the existence of minimizers for optical flow problems has been considered by various authors (cf. e.g., [1, 16]) using methods from the calculus of variations. With the assumptions u_t ∈ L²(Ω) and ∇u ∈ L^∞(Ω) it is straightforward to see that the operator A and the data g defined in (2.3) are bounded for p = 1, 2, and with standard convexity arguments the lower semicontinuity of the data fidelities follows (for p = 1 the assumptions can even be relaxed). The associated data terms can all be shown to be L2-coercive on subspaces of H¹(Ω)² in the case of the Horn–Schunck model and of BV(Ω)² for the other models. The subspaces are given by functions with mean value zero for the standard models, while they are given by functions of vanishing mean and first moments in the case of the TV/TV and TV/L2 regularizers. Hence, in order to obtain coercivity and thus existence of minimizers, it is a natural question whether the data term yields coercivity on the whole space, that is, whether boundedness of the data fidelity further implies bounds on the mean (and potentially the first moments) of υ. Obviously this cannot be true without further assumptions on u (as is seen immediately for constant u). In the case of the Horn–Schunck model this question was answered in [30], where coercivity was shown for the original Horn–Schunck model (see also [20] for a detailed discussion). That proof heavily uses the quadratic structure of the data fidelity as well as the fact that functions in the kernel of the regularization term are constant; hence, it can be generalized to the L2-TV model, but neither to extensions of the TV regularization nor to L1 fidelities. However, the result can easily be generalized, as we show in the following:

Lemma 3.1. Let Ω ⊂ ℝ^d, d ≥ 1 arbitrary, be a bounded domain and let (υ^n) be a sequence in L^p(Ω)^d, 1 ≤ p ≤ ∞. Let u_t ∈ L^p(Ω) and ∇u ∈ L^∞(Ω)^d such that the functions ∂_{x_i} u are linearly independent. If there exists a sequence of vectors c^n ∈ ℝ^d such that c^n + υ^n is bounded in L^p(Ω)^d and

‖u_t + ∇u · υ^n‖_{L^p} is bounded in ℝ_+ ,

then (υ^n) is bounded in L^p(Ω)^d.

Proof. Under the above assumptions it obviously suffices to prove that the sequence c^n is bounded in ℝ^d. We have

∇u · c^n = (∇u · (c^n + υ^n) + u_t) − (∇u · υ^n + u_t) .

The boundedness of ‖u_t + ∇u · υ^n‖_{L^p} and of c^n + υ^n implies that ∇u · c^n is bounded in L^p(Ω). Now assume that c^n is unbounded; then there exists a subsequence (again denoted by c^n) with |c^n| growing to infinity in a monotone way. Then c^n/|c^n| is well defined and bounded; moreover, ∇u · c^n/|c^n| tends to zero in L^p(Ω). Hence, c^n/|c^n| has a convergent subsequence d^n with limit d satisfying |d| = 1, in particular d ≠ 0. By continuity of the norm we find

‖∇u · d‖_{L^p} = 0 ,

which contradicts the linear independence of the functions ∂_{x_i} u and thus yields the assertion.

With the Poincaré–Wirtinger inequality and the embedding of BV(Ω) into L²(Ω) for d ≤ 2 and into L¹(Ω) in arbitrary dimension, we obtain the coercivity of the total variation on functions of mean value zero. Now we choose c^n equal to the mean value of υ^n and obtain coercivity of the respective Lp-TV model on BV(Ω). With a standard (weak) lower semicontinuity argument for the convex functionals we thus deduce the following existence result:

Theorem 3.2. Let Ω ⊂ ℝ^d be a bounded domain and let either p = 1 or p = 2. Let u_t ∈ L^p(Ω) and ∇u ∈ L^∞(Ω)^d such that the functions ∂_{x_i} u are linearly independent. Then there exists a minimizer υ ∈ BV(Ω)^d of the Lp-TV model in the cases p = 1 for arbitrary d and p = 2 for d ≤ 2.

Lemma 3.3. Let Ω ⊂ ℝ^d, d ≥ 1 arbitrary, be a bounded domain and let (υ^n) be a sequence in L^p(Ω)^d, 1 ≤ p ≤ ∞. Let u_t ∈ L^p(Ω) and ∇u ∈ L^∞(Ω)^d such that the functions {∂_{x_i} u, ∂_{x_j} ∂_{x_i} u}_{i,j=1,...,d} are linearly independent. If there exists a sequence of vectors c^n ∈ ℝ^d and matrices A^n ∈ ℝ^{d×d} such that c^n + A^n x + υ^n is bounded in L^p(Ω)^d and

‖u_t + ∇u · υ^n‖_{L^p} is bounded in ℝ_+ ,

then (υ^n) is bounded in L^p(Ω)^d.

Proof. Under the above assumptions it obviously suffices to prove that the sequence c^n is bounded in ℝ^d and the sequence A^n is bounded in ℝ^{d×d}. We have

∇u · (c^n + A^n x) = (∇u · (c^n + A^n x + υ^n) + u_t) − (∇u · υ^n + u_t) .

The boundedness of ‖u_t + ∇u · υ^n‖_{L^p} and of c^n + A^n x + υ^n implies that ∇u · (c^n + A^n x) is bounded in L^p(Ω). A limiting procedure as in the proof of Lemma 3.1 yields nontrivial limits d respectively B such that

‖∇u · (d + Bx)‖_{L^p} = 0 ,

which contradicts the linear independence assumption and thus yields the assertion.

Using the coercivity of the effective functionals R̃ on the space of BV functions with vanishing mean and first moments (cf. [6] in the TV/TV case and [11] in the TV/L2 case) and a similar proof as above, we obtain the following result:

Theorem 3.4. Let Ω ⊂ ℝ^d be a bounded domain and let either p = 1 or p = 2. Let u_t ∈ L^p(Ω) and ∇u ∈ L^∞(Ω)^d such that the functions {∂_{x_i} u, ∂_{x_j} ∂_{x_i} u}_{i,j=1,...,d} are linearly independent. Then there exists a minimizer υ ∈ BV(Ω)^d of the Lp-TV/TV model and the Lp-TV/L2 model in the cases p = 1 for arbitrary d and p = 2 for d ≤ 2.

A major restriction of the existing existence analysis is the fact that some regularity of u_t and ∇u needs to be assumed, which might not be met by solutions of transport equations, in particular for noisy versions of the images (note, however, that many results in the literature assume even stronger regularity, e.g., images of class C² as in [16]). A detailed analysis of such issues remains a relevant question for future work. The case of stochastic noise in the image data might even require different approaches, e.g., Bayesian models (cf. [31]).

Let us finally mention that uniqueness of minimizers cannot be expected for total variation models, since neither the regularization nor the data term is strictly convex (due to the nullspace of the map υ ↦ ∇u · υ). In the case of an L2 data fidelity one can at least derive uniqueness of the component ∇u · υ of the velocity normal to the image level sets.

7.3.2 Quantitative estimates

Quantitative estimates have hardly been investigated for optical flow models in the past, which is probably due to the nonuniqueness of the flow reconstruction. Hence, an estimate with respect to a ground truth seems out of reach in most cases. Nonetheless, it seems interesting to take a look at the optical flow problem from the perspective of error estimation for inverse problems (cf. [10]). Let us consider the case of L2 data fidelities for simplicity; we start from the optimality condition

(∇u · υ + u_t) ∇u + α b = 0 ,    b ∈ ∂R(υ) .

The obvious first question to ask concerns the robustness of solutions with respect to errors in the data u; a second question is related to the behavior as α tends to zero. Let us first sketch the derivation of quantitative error estimates in the case of data perturbations. For this sake let υ̃ be the solution of the variational model with data ũ_t and ∇ũ. Then we have

(∇u · (υ − υ̃)) ∇u + α(b − b̃) = ũ_t ∇ũ − u_t ∇u + (∇ũ · υ̃) ∇ũ − (∇u · υ̃) ∇u .

A duality product with υ − υ̃ and an elementary computation yield

‖∇u · (υ − υ̃)‖²_{L²} + α ⟨b − b̃, υ − υ̃⟩ = ⟨ũ_t ∇ũ − u_t ∇u + (∇ũ · υ̃) ∇ũ − (∇u · υ̃) ∇u, υ − υ̃⟩ ,

where ⟨b − b̃, υ − υ̃⟩ is the symmetric Bregman distance of R between υ and υ̃.

Using appropriate duality estimates on the right-hand side, one immediately obtains estimates for the Bregman distances and for the error in the velocity component parallel to ∇u in terms of errors in the data. To do so, let us further rewrite the right-hand side to obtain

⟨(ũ_t + ∇ũ · υ̃) − (u_t + ∇u · υ̃), ∇u · (υ − υ̃)⟩ + ⟨(ũ_t + ∇ũ · υ̃)(∇ũ − ∇u), υ − υ̃⟩ ,

and with Young's inequality we find

½ ‖∇u · (υ − υ̃)‖²_{L²} + α ⟨b − b̃, υ − υ̃⟩ ≤ ½ ‖(ũ_t + ∇ũ · υ̃) − (u_t + ∇u · υ̃)‖²_{L²} + ⟨(ũ_t + ∇ũ · υ̃)(∇ũ − ∇u), υ − υ̃⟩ .

The two error terms on the right-hand side can be interpreted well. The first term measures the difference of the image data in a norm related to the optical flow model; it can further be related to the consistency of the optical flow constraint with the estimated velocity υ̃. The second term creates more trouble, since due to ∇ũ it can contain terms in the direction perpendicular to ∇u, in which case it cannot be related to the first term on the left-hand side. Note that when viewing the optical flow estimation as an inverse problem, this term is indeed related to the model error, that is, the error in the forward operator υ ↦ ∇u · υ. Even if this term can be estimated somehow by the Bregman distances on the left-hand side (e.g., in the case of the Horn–Schunck functional simply by Young's inequality), it will lead to an error term of order 1/α. With the inherent smallness of α we hence conclude that errors in the image gradient will strongly influence the result. This can, for example, be interpreted in terms of the accuracy of the discrete approximation of the image gradient. The quality of the estimated flow indeed depends heavily on the gradient approximation in numerical results. This is illustrated in Table 3 via an evaluation of errors for different choices of difference quotients for ∇u.

In the case of negligible error in the image gradient we can give an improved estimate:

Proposition 3.5. Let u and ũ be weakly differentiable image data such that ∂_t u, ∂_t ũ ∈ L²(Ω) and ∇u = ∇ũ. Then the estimate

‖∇u · (υ − υ̃)‖²_{L²} + 2α ⟨b − b̃, υ − υ̃⟩ ≤ ‖∂_t u − ∂_t ũ‖²_{L²}

holds for υ and υ̃ being the solutions of

min_υ ½ ‖∇u · υ + ∂_t u‖²_{L²} + α R(υ)

with data u respectively ũ.

Note that we see in any estimate that the velocity parallel to the image gradient (normal to the level lines) can be estimated robustly from the image error (with constants independent of α), while we obtain hardly any information perpendicular to the gradient. This is just a mathematical consequence of the aperture problem in motion estimation. In the perpendicular direction we obtain some stability of the velocity via the Bregman distance terms, but note that the error will grow inversely proportional to α. We mention that to leading order in a small time step τ we have

τ ‖∇u · (υ − υ̃)‖_{L²} ≈ ‖u(· + τυ(·)) − u(· + τυ̃(·))‖_{L²} .

Hence, the error measure is strongly related to the end-point error frequently used in practical evaluations of motion estimation, an issue we will discuss in detail below.

The limit α → 0 is the one usually studied in inverse problems, with strong emphasis on the appropriate treatment of instabilities. In the case of motion estimation the underdeterminedness is the more crucial issue. Thus, the key question is to understand which solutions are obtained in the limit, respectively which kinds of flows can be reconstructed well. The limiting solutions are determined by the constrained problem

min_υ R(υ)    subject to    ∇u · υ + u_t = 0 .

The solutions to be reconstructed well are those satisfying a source condition (cf. [10]), that is, the optimality condition for the constrained problem in a Lagrangian duality setting. This is equivalent to

w ∇u ∈ ∂R(υ)    (3.4)

for some w ∈ L²(Ω). Under this condition one can construct data ũ_t = u_t + αw such that the constrained solution also satisfies the unconstrained variational problem with regularization parameter α > 0; thus, the error estimation reduces to the above case. From (3.4) we can give a mathematical reason why it is useful to have images with a large gradient, which corresponds to the usual intuition. The data perturbation is a multiple of w, hence error estimates depend on the norm of w, which is, however, inversely proportional to the norm of ∇u. A more detailed study of (3.4) depends on the specific regularization terms and is an interesting question for further research.

7.4 Numerical solution

In the following we discuss the numerical solution of the variational models discussed above. We start with a unified discussion of primal-dual iterative methods and then proceed to discretization issues.

7.4.1 Primal–dual algorithm

The concept of duality offers an efficient way of minimizing the introduced variational models. For the numerical application, let us consider two finite-dimensional vector spaces X and Y equipped with a scalar product ⟨·, ·⟩ and a norm ‖·‖_2. We furthermore consider a continuous linear operator K : X → Y. Now, the class of variational problems in this section can be written as

min_υ F(Kυ) + G(υ) ,    (4.1)

where F : Y → ℝ ∪ {∞} and G : X → ℝ ∪ {∞} are proper, convex and lower semicontinuous functionals. In our application, G(υ) usually incorporates the data term, whereas F(Kυ) denotes the regularizer. In the following, we refer to Equation (4.1) as the primal problem. The primal problem is often hard to minimize directly or yields very slow algorithms. Instead of the primal problem, we can equivalently solve the so-called primal–dual problem

min_υ max_y ⟨Kυ, y⟩ + G(υ) − F*(y) .    (4.2)

Here F* refers to the convex conjugate of F, which can be easily calculated for the considered regularizers. The very well-known algorithm of Chambolle and Pock is dedicated to this class of problems [12]. For τ, σ > 0, a pair (υ⁰, y⁰) ∈ X × Y and the initial value ῡ⁰ = υ⁰, we obtain the following iterative scheme to solve the saddle-point problem (4.2):

y^{k+1} = prox_{σF*}(y^k + σK ῡ^k) ,
υ^{k+1} = prox_{τG}(υ^k − τK* y^{k+1}) ,
ῡ^{k+1} = 2υ^{k+1} − υ^k .

Here, prox_{τG} denotes the proximal or resolvent operator

prox_{τG}(y) := arg min_x ( ‖x − y‖²_2 / (2τ) + G(x) ) ,

which can be interpreted as a compromise between minimizing G and staying close to the input argument y. Suitable choices of τ and σ were made according to [26] and are given in Section 7.4.2 for each algorithm. Primal–dual methods are efficient if the proximal problems can be solved efficiently. In [12] the authors demonstrated an application of the primal–dual algorithm to an L1-TV optical flow problem. In the following we will extend this application to each of the introduced models and provide solutions to the occurring proximal problems.
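A generic sketch of the iteration, with the proximal operators passed as callables, might look as follows (our own illustration; applied here to a 1-D ROF-type toy problem rather than optical flow):

```python
import numpy as np

def chambolle_pock(K, Kadj, prox_tau_G, prox_sigma_Fstar,
                   v0, y0, tau, sigma, n_iter=100):
    """Generic Chambolle-Pock primal-dual iteration for
    min_v F(K v) + G(v); requires sigma * tau * ||K||^2 <= 1."""
    v, y, v_bar = v0.copy(), y0.copy(), v0.copy()
    for _ in range(n_iter):
        y = prox_sigma_Fstar(y + sigma * K(v_bar))   # dual step
        v_new = prox_tau_G(v - tau * Kadj(y))        # primal step
        v_bar = 2.0 * v_new - v                      # over-relaxation
        v = v_new
    return v

# Toy usage: 1-D problem min_v 0.5*|v - f|^2 + alpha*|Dv|_1
f = np.array([0.0, 0.0, 5.0, 5.0, 5.0, 0.0])
alpha, tau, sigma = 0.5, 0.1, 0.1
D = lambda v: np.diff(v)                                      # forward differences
Dadj = lambda y: -np.diff(np.concatenate(([0.0], y, [0.0])))  # adjoint of D
prox_G = lambda x: (x + tau * f) / (1.0 + tau)                # quadratic data prox
prox_Fs = lambda y: np.clip(y, -alpha, alpha)                 # projection (prox of F*)
v = chambolle_pock(D, Dadj, prox_G, prox_Fs, np.zeros_like(f),
                   np.zeros(f.size - 1), tau, sigma, n_iter=2000)
```

The step sizes satisfy στ‖D‖² ≤ 1 since ‖D‖² ≤ 4 for forward differences.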

In Table 1 we transfer our models to the notation of the primal problem (4.1). Let us begin with the proximal problems for the primal part G(υ); for a more detailed description we refer to [13].

L2 data term

For the L2 data term we have to solve for υ̃^{k+1} := υ^k − τK* y^{k+1} the following proximal problem:

υ^{k+1} = arg min_υ ( ‖υ − υ̃^{k+1}‖²_2 / (2τ) + ½ ‖∇u · υ + u_t‖²_2 ) .

This yields a linear optimality system, which can be solved directly. The solution is given by υ^{k+1} = f(υ̃^{k+1}), where f is a linear function also incorporating the terms u_t, ∇u and τ.
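Pointwise, the optimality system is 2×2 and can be solved in closed form; the following sketch (our derivation via a Sherman–Morrison-type elimination; names are hypothetical) checks the formula against a direct solve:

```python
import numpy as np

def prox_l2_data(v1t, v2t, ux, uy, ut, tau):
    """Closed-form minimizer of
    |v - v_tilde|^2 / (2*tau) + 0.5 * (ux*v1 + uy*v2 + ut)^2,
    obtained by eliminating the rank-one coupling."""
    rho = ux * v1t + uy * v2t + ut          # residual at v_tilde
    denom = 1.0 + tau * (ux**2 + uy**2)
    return v1t - tau * ux * rho / denom, v2t - tau * uy * rho / denom

# sanity check against a brute-force 2x2 solve at a single pixel
ux, uy, ut, tau = 0.7, -0.3, 0.2, 0.5
v1t, v2t = 1.0, -2.0
M = np.eye(2) + tau * np.outer([ux, uy], [ux, uy])
rhs = np.array([v1t, v2t]) - tau * ut * np.array([ux, uy])
v_ref = np.linalg.solve(M, rhs)
v = prox_l2_data(v1t, v2t, ux, uy, ut, tau)
```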

L1 data term

In the case of the L1 optical flow term, the proximal problem reads

υ^{k+1} = arg min_υ ( ‖υ − υ̃^{k+1}‖²_2 / (2τ) + ‖∇u · υ + u_t‖_1 ) .

Table 1: Overview of splitting for different models.

Defining the affine linear function ρ(υ) := ∇u · υ + u_t, the problem above represents an affine-linear L1 optimization problem. It can be shown that the solution is given by the point-wise thresholding

υ^{k+1} = υ̃^{k+1} + τ ∇u if ρ(υ̃^{k+1}) < −τ |∇u|² ,
υ^{k+1} = υ̃^{k+1} − τ ∇u if ρ(υ̃^{k+1}) > τ |∇u|² ,
υ^{k+1} = υ̃^{k+1} − ρ(υ̃^{k+1}) ∇u / |∇u|² if |ρ(υ̃^{k+1})| ≤ τ |∇u|² .
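In vectorized form the case distinction becomes a single thresholding step; a sketch (ours; the guard against division by zero in flat regions is our addition):

```python
import numpy as np

def prox_l1_data(v1t, v2t, ux, uy, ut, tau):
    """Thresholding solution of
    argmin_v |v - v_tilde|^2 / (2*tau) + |ux*v1 + uy*v2 + ut|."""
    rho = ux * v1t + uy * v2t + ut
    grad_sq = ux**2 + uy**2
    safe = np.maximum(grad_sq, 1e-12)     # avoid 0/0 in flat regions
    step = np.where(rho < -tau * grad_sq, tau,
                    np.where(rho > tau * grad_sq, -tau, -rho / safe))
    return v1t + step * ux, v2t + step * uy

# in the scalar case ux = 1, uy = ut = 0 this reduces to soft thresholding
v1, v2 = prox_l1_data(np.array([2.0, -2.0, 0.3]), np.zeros(3),
                      np.ones(3), np.zeros(3), np.zeros(3), 0.5)
```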

L2 regularization

Determining the proximal problems for the dual part requires calculating the convex conjugate F* of the dual part F first. For the L2 regularization term we calculate F*(y) = ‖y‖²_2 / (2α) and obtain the proximal problem

y^{k+1} = arg min_y ( ‖y − ỹ^{k+1}‖²_2 / (2σ) + ‖y‖²_2 / (2α) ) ,    ỹ^{k+1} := y^k + σK ῡ^k .

Consequently, the proximal problem can be solved straightaway.

TV regularization

In the case of the total variation and the higher order regularization we always have L1 norms involved. Similar to the standard TV regularization (F(Kυ) = α ‖∇υ‖_1) we have F*(y) = δ_{B(L^∞)}(y/α), where δ_{B(L^∞)} denotes the indicator function of the L^∞ unit ball. The corresponding proximal problem then reads

y^{k+1} = arg min_{‖y/α‖_∞ ≤ 1} ‖y − ỹ^{k+1}‖²_2 / (2σ) .

The solution is given by a point-wise projection of ỹ^{k+1} onto the unit ball corresponding to the chosen vector norm; it will be discussed in more detail in Section 7.4.2.

7.4.2 Discretization and parameters

In the context of discretization we have to consider two different aspects. On the one hand, we have to approximate the image derivatives u_t, u_x and u_y, which are afterwards passed to the algorithm as static variables. On the other hand, we have to discretize the operators K and K* appearing in the primal–dual iteration.

Starting with the image discretization, we consider the image domain to consist of the regular grid:

{(t, i, j) : t = 0, 1, i = 0, . . . , nx , j = 0, . . . , ny} .

Since we are interested in the velocity field between t = 0 and t = 1, we have to evaluate the derivatives of u_{t,i,j} at t = 0. In our experience, a mixed scheme consisting of a forward difference for the time derivative and central differences for the spatial derivatives turned out to give the best results. Stability in terms of the derivatives of u is not required, because they act as constants in the model. Consequently, we obtain the following scheme:

(u_t)_{i,j} = u_{1,i,j} − u_{0,i,j} ,    (u_x)_{i,j} = (u_{0,i+1,j} − u_{0,i−1,j}) / 2 ,    (u_y)_{i,j} = (u_{0,i,j+1} − u_{0,i,j−1}) / 2 .

For the sake of simplicity, we omitted the time-dependency in the derivatives.
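A direct transcription of this mixed scheme (our sketch; the one-sided differences at the boundary are our choice):

```python
import numpy as np

def image_derivatives(u0, u1):
    """Forward difference in time, central differences in space,
    evaluated on the first frame."""
    ut = u1 - u0
    ux = 0.5 * (np.roll(u0, -1, axis=1) - np.roll(u0, 1, axis=1))
    uy = 0.5 * (np.roll(u0, -1, axis=0) - np.roll(u0, 1, axis=0))
    # one-sided differences at the image boundary
    ux[:, 0] = u0[:, 1] - u0[:, 0];  ux[:, -1] = u0[:, -1] - u0[:, -2]
    uy[0, :] = u0[1, :] - u0[0, :];  uy[-1, :] = u0[-1, :] - u0[-2, :]
    return ut, ux, uy

# hypothetical example: a static linear ramp u(x, y) = x has ux = 1, uy = 0
X = np.tile(np.arange(6, dtype=float), (5, 1))
ut, ux, uy = image_derivatives(X, X)
```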

The operators in our models only consist of gradients for the spatial regularization and the corresponding divergences. We propose a forward discretization with Neumann boundary conditions for the gradient and consequently obtain a backward difference for the divergence in order to keep a discrete adjoint structure. The resulting scheme reads

(∇υ)¹_{i,j} = υ_{i+1,j} − υ_{i,j} (zero for i = n_x) ,    (∇υ)²_{i,j} = υ_{i,j+1} − υ_{i,j} (zero for j = n_y) ,

with the divergence given by the corresponding backward differences, (div y)_{i,j} = y¹_{i,j} − y¹_{i−1,j} + y²_{i,j} − y²_{i,j−1}, modified accordingly at the boundary.

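The adjoint structure is easy to verify numerically; the following sketch (our implementation of the forward-gradient / backward-divergence pair) checks ⟨∇υ, y⟩ = ⟨υ, −div y⟩ on random fields:

```python
import numpy as np

def grad(v):
    """Forward differences with Neumann boundary (zero at last row/column)."""
    gx, gy = np.zeros_like(v), np.zeros_like(v)
    gx[:, :-1] = v[:, 1:] - v[:, :-1]
    gy[:-1, :] = v[1:, :] - v[:-1, :]
    return gx, gy

def div(px, py):
    """Backward-difference divergence, the negative adjoint of grad."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

rng = np.random.default_rng(0)
v = rng.standard_normal((7, 6))
px, py = rng.standard_normal((7, 6)), rng.standard_normal((7, 6))
gx, gy = grad(v)
lhs = np.sum(gx * px + gy * py)       # <grad v, p>
rhs = np.sum(v * (-div(px, py)))      # <v, -div p>
```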
The primal–dual algorithm moreover requires suitable step sizes σ and τ such that στ ‖K‖² ≤ 1. It has been shown in [26] that a diagonal preconditioning technique can be used to find τ and σ without calculating ‖K‖. According to [26], Lemma 2, we choose the corresponding step sizes for the gradient models from Section 7.2.1 and for the extended models from Section 7.2.2.

In the previous section, the solution of a proximal problem involving an indicator function of the L^∞ unit ball of y was given as a point-wise projection. To make this concrete, let us begin by writing down the convex set corresponding to δ_{B(L^∞)}(y/α):

{y : ‖y/α‖_∞ ≤ 1} .

Here, ‖y/α‖_∞ refers to the discrete maximum over all elements (i, j) in the image domain:

‖y/α‖_∞ = max_{i,j} |y_{i,j}| / α .

An interesting aspect is how to evaluate the vector norm |y_{i,j}|, y_{i,j} = (y_1, y_2, y_3, y_4)_{i,j}. Choosing

|y_{i,j}| = |y_1| + |y_2| + |y_3| + |y_4|

results in the so-called anisotropic variant, whereas

|y_{i,j}| = (y_1² + y_2² + y_3² + y_4²)^{1/2}

is denoted as the fully isotropic one. The anisotropic total variation is more suitable when dealing with quadratic structures; the isotropic variant fits better to circular structures. Depending on the chosen vector norm for ∇υ (anisotropic, fully isotropic), we obtain a different dual inner norm for y_{i,j}/α. For the anisotropic case, which refers to the vectorial L1 norm, we get the maximum norm for the dual variable:

|y_{i,j}|_∞ = max(|y_1|, |y_2|, |y_3|, |y_4|) .

Since the fully isotropic case refers to the self-dual Euclidean norm, we get in this case

Transferring this to the optimization problem above, we get in the anisotropic case the point-wise projection of ỹ^{k+1} onto [−α, α]:

y^{k+1} = min(α, max(−α, ỹ^{k+1})) ,

and for the isotropic case a point-wise projection of ỹ^{k+1} onto the L2 ball of radius α:

y^{k+1}_{i,j} = ỹ^{k+1}_{i,j} / max(1, ‖ỹ^{k+1}_{i,j}‖_2 / α) .
For the isotropic and anisotropic variants of the extended models, the norms are obtained analogously. However, it has to be taken into account that for the L1-TV/TV model there are independent norms for the two parts of the regularization. Using the partially isotropic variant also results in independent norms for the different components of υ. For the sake of simplicity, we restrict ourselves to the fully isotropic case for all total variation terms in the numerical realizations.
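Both point-wise projections can be sketched in a few lines; the array layout (vector components along the first axis) is an assumption of this sketch:

```python
import numpy as np

def prox_linf_ball(y, alpha):
    """Anisotropic case: point-wise clipping to the interval [-alpha, alpha]."""
    return np.minimum(alpha, np.maximum(-alpha, y))

def prox_l2_ball(y, alpha):
    """Isotropic case: per-pixel projection onto the L2 ball of radius alpha.
    y has shape (k, m, n) with k vector components per pixel (i, j)."""
    norms = np.sqrt(np.sum(y**2, axis=0, keepdims=True))
    return y / np.maximum(1.0, norms / alpha)
```

Vectors already inside the respective ball are left unchanged by both projections.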

7.5Results

7.5.1Error measures for velocity fields

Finding error measures for velocity fields is a delicate problem which can be motivated by the following simple example. Consider two images, each consisting of only 3 × 3 pixels of the following form:

From our point of view, we seek a velocity field υ = (υ1, υ2) that transforms u1 into u2. Unfortunately, it is unclear which velocity field underlies this motion. Examples of possible solutions are:

1.Just move the pixels at position (1, 1) and (1, 2) by 1 to the bottom, hence

2.Move the pixels at position (1, 1) and (1, 2) by 1 to the bottom and change their position:

3.Move the whole image by 1 to the bottom which gives

4.Move the pixels at position (1, 1) and (1, 2) by 1 to the bottom and exchange some pixels in the third column

All these velocity fields produce the same result, that is, they all fulfill the optical flow constraint u_t + ∇u · υ = 0. This results from the fact that for given images the optical flow constraint yields for every point x ∈ Ω only one equation for the 2-dimensional velocity field υ. The solution is unique if and only if there exists a unique bijection between u1 and u2. This requires both images to consist of the same intensity values, which furthermore have to be unique. Unfortunately, this assumption is far from reality, since regions of constant intensity are characteristic for the background or for objects. Hence, for practical problems we have a highly underdetermined system and consequently a huge variety of possible underlying velocity fields υ.
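This underdetermination can be checked numerically at a single pixel: any multiple of the direction perpendicular to the image gradient can be added to one solution of the constraint. The derivative values below are made up purely for illustration:

```python
import numpy as np

# At a single pixel, u_t + ux*v1 + uy*v2 = 0 is one equation in two unknowns.
ut, ux, uy = -1.0, 2.0, 1.0

# One particular solution along the gradient direction ...
g = np.array([ux, uy])
v_part = -ut * g / (g @ g)

# ... plus any multiple of the perpendicular direction (-uy, ux)
for s in (0.0, 1.0, -3.5):
    v = v_part + s * np.array([-uy, ux])
    assert abs(ut + g @ v) < 1e-12   # every such v fulfills the constraint
```

This is exactly the aperture problem: only the flow component along the gradient is determined by the data.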

As we will see later, an error measure always explicitly or implicitly favors one of these possible results over the others. Error measures that explicitly prefer one result expect a given ground truth field and measure a distance d(υGT, υ) between the calculated velocity field and the given ground truth field. This strategy is questionable, at least from a mathematical viewpoint: in real-world examples we usually do not have a ground truth velocity field, and artificial examples can usually be generated by several different ground truth fields. Hence, fixing the ground truth υGT directly favors a subjective result.

Average endpoint error

Despite the fact that explicit error measures might be problematic, they are often used in the literature and we will also use them to evaluate our algorithms. In [2] two explicit error measures have been presented. The most intuitive one is the average endpoint error (AEE), proposed in [24], which is the vector-wise Euclidean norm of the difference vector υ − υGT, averaged over the image domain:

aee(υ, υGT) = 1/|Ω| ∫_Ω ‖υ(x) − υGT(x)‖_2 dx ,

or in a discrete formulation

aee(υ, υGT) = 1/nPx Σ_{i,j} ‖υ_{i,j} − (υGT)_{i,j}‖_2 .

Here, nPx denotes the number of pixels.
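A direct numpy sketch of the discrete AEE, with the two flow components stored along the first axis:

```python
import numpy as np

def average_endpoint_error(v, v_gt):
    """AEE: mean Euclidean length of the per-pixel difference vectors.
    v and v_gt have shape (2, m, n)."""
    return np.mean(np.sqrt(np.sum((v - v_gt)**2, axis=0)))
```

A constant per-pixel offset of (0.3, 0.4), for instance, yields an AEE of exactly 0.5.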

Angular error

A second measure is the angular error (AE), which goes back to the work of Fleet and Jepson [14] and a survey of Barron et al. [3]. Here, υ and υGT are projected into 3D space (to avoid division by zero) and normalized by

υ̂ = (υ1, υ2, 1) / (υ1² + υ2² + 1)^{1/2} .

The error is then calculated by measuring the angle between υ̂ and υ̂GT, in the continuous setting as

ae(υ, υGT) = 1/|Ω| ∫_Ω arccos( υ̂(x) · υ̂GT(x) ) dx ,

and in a discrete setting as

ae(υ, υGT) = 1/nPx Σ_{i,j} arccos( υ̂_{i,j} · (υ̂GT)_{i,j} ) .
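The discrete AE can be sketched accordingly; clipping the inner product guards arccos against rounding:

```python
import numpy as np

def angular_error(v, v_gt):
    """AE: mean angle (in radians) between the 3D-lifted, normalized flows.
    v and v_gt have shape (2, m, n)."""
    def lift(w):
        # append a constant third component 1 and normalize per pixel
        w3 = np.concatenate([w, np.ones((1,) + w.shape[1:])], axis=0)
        return w3 / np.sqrt(np.sum(w3**2, axis=0, keepdims=True))
    cos = np.clip(np.sum(lift(v) * lift(v_gt), axis=0), -1.0, 1.0)
    return np.mean(np.arccos(cos))
```

For example, a unit flow compared against a zero flow gives an angle of π/4 per pixel.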

Besides the fact that the AE is very popular, we want to emphasize a serious drawback. To illustrate it, we took a vector υGT = (υ1, υ2) and denote its Euclidean norm on the x-axis of Figure 1. This vector has been disturbed by an absolute and a relative error of 0.01 to create υabs and υrel, i.e., ‖υabs − υGT‖_2 = 0.01 and ‖υrel − υGT‖_2 / ‖υGT‖_2 = 0.01.

Afterwards, we calculated the angular error between υGT and υabs, respectively between υGT and υrel, and plotted this error on the y-axis of Figure 1. The first obvious property is that the error shrinks for larger velocities. This is critical, since larger velocities are usually associated with the object we are interested in. Conversely, absolute errors in small velocities (e.g., in the background) are over-penalized. To make this clear, let us consider a simple example: take an algorithm producing an all-zero velocity field. This result perfectly recovers the background, but causes errors in the region of a moving object. Conversely, consider an algorithm that perfectly recovers the movement in the background, but introduces slight errors in parts of the object. Because errors in the background are over-penalized, the first result might be preferred although it is the worse one.

Fig. 1: Plot of the angular error ae on the y-axis for increasing ‖υGT‖ on the x-axis. Blue: absolute error ‖υ − υGT‖_2 = 0.01. Red: relative error ‖υ − υGT‖_2 / ‖υGT‖_2 = 0.01.

7.5.2Evaluation

In order to evaluate the introduced models we used the evaluation sequences from the Middlebury optical flow database [22]. First, the given ground-truth flow fields υGT were scaled down to a magnitude of one pixel to overcome the limitations of our algorithms (see Section 7.6.4). Afterwards, the evaluation images were created by keeping the first frame I1 and generating the second frame I2 by cubic interpolation of I1(x + υGT). The noisy dataset was created with Gaussian noise (σ = 0.002).
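The construction of such an evaluation pair can be sketched as follows; for brevity this sketch uses bilinear instead of the cubic interpolation employed in the evaluation, and simply clips coordinates at the image border:

```python
import numpy as np

def warp_bilinear(img, flow):
    """Create a second frame I2(x) = I1(x + v(x)) from a first frame and a
    ground-truth flow of shape (2, m, n), using bilinear interpolation."""
    m, n = img.shape
    X, Y = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    xs = np.clip(X + flow[0], 0, m - 1)
    ys = np.clip(Y + flow[1], 0, n - 1)
    x0 = np.floor(xs).astype(int); y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, m - 1); y1 = np.minimum(y0 + 1, n - 1)
    wx = xs - x0; wy = ys - y0
    return ((1 - wx) * (1 - wy) * img[x0, y0] + wx * (1 - wy) * img[x1, y0]
            + (1 - wx) * wy * img[x0, y1] + wx * wy * img[x1, y1])
```

On a linear intensity ramp, a constant sub-pixel shift simply offsets the intensities away from the border.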

The evaluation was performed on the one hand with static parameters across all datasets and on the other hand with manually adapted parameters in order to find smaller values in terms of AE and AEE. The static parameters can be found in Table 2. Moreover, Table 2 contains the average rank of each algorithm for static and dynamic parameters over all evaluation datasets. Detailed error plots can be found in Figure 2. An overview of typical results for the individual algorithms is shown in Figure 3.

7.6Conclusion and outlook

A first result from the comparison of fixed versus manually adapted parameters is that the choice has only a small effect on the error; consequently, the parameters do not have to be highly tuned for motion estimation.

Another important result from the evaluation is the effect of noise on the motion estimation process: already a small level of Gaussian noise massively disturbs the motion estimation. In general, the models with L1 data fidelity perform slightly better. Hence, images should be denoised beforehand or during the motion estimation process (see [13]).

Fig. 2: Plot of the absolute endpoint error (AEE) and the angular error (AE) for manually chosen parameter values (top) and fixed parameter values (bottom). The numbers on the x-axis denote the following evaluation sequences from the Middlebury database: 1. Dimetrodon, 2. Grove2, 3. Grove3, 4. Hydrangea, 5. Rubber Whale, 6. Urban2, 7. Urban3, 8. Venus.

Table 2: Static parameters and averaged ranks in terms of AEE and AE. The average is calculated over the AEE or AE of each result divided by the smallest AEE or AE of the particular dataset.

Table 3: Evaluation of different approximations of the image gradient ∇u, tested with the L1-TV model on the Dimetrodon dataset with fixed parameter α = 0.05.

Fig. 3: Overview of the results for the Dimetrodon (left column), Grove2 (middle column) and Rubber Whale (right column) datasets. Top to bottom: ground truth, L2-L2, L2-TV, L2-TV Bregman, L1-TV, L1-TV/L2, L1-TV/TV.

Taking into account the visual impression of Figure 3, we notice that the grid structures in the background and the rotation on the lower left side are better reconstructed by the L1 models. The L2-L2 model also seems to capture this motion, but due to the missing sharp edges the visual impression is poor. Conversely, focusing on the constantly moving block on the lower right side, the L2-TV Bregman model reconstructs this part better than all the other models.

7.6.1Mass preservation

So far, the optical flow equation was used as a data term in combination with regularizers modeling different a priori assumptions on the motion field υ. Another possible approach to this problem is given by the continuity equation. Following the assumption that the total mass in the image data stays constant in time, that is ∫_Ω u(x, t) dx = C for all t, and, furthermore, that mass is moved by a continuous flow υ, it can be shown that the flow satisfies the continuity equation u_t + ∇ · (υu) = 0. This equation, which in reference to the optical flow constraint is often denoted as the mass-preservation constraint, can be used as a data term in combination with each of the introduced regularizers.
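The mass-preservation property behind the continuity equation is easy to observe discretely: in a conservative (finite-volume style) update the inter-cell fluxes telescope, so the total mass is unchanged. This toy 1D step is purely illustrative and not a scheme used in the chapter:

```python
import numpy as np

# One conservative update step for u_t + (v u)_x = 0 on a periodic 1D grid.
rng = np.random.default_rng(1)
u = rng.random(50)                                # arbitrary mass distribution
v = 0.3 * np.sin(np.linspace(0, 2 * np.pi, 50))   # spatially varying flow
dt, dx = 0.1, 1.0

flux = v * u                                      # simple centered flux
u_new = u - dt / dx * (np.roll(flux, -1) - np.roll(flux, 1)) / 2

# the flux differences cancel in the sum, so total mass is preserved
assert np.isclose(u_new.sum(), u.sum())
```

The same cancellation argument is what makes ∫_Ω u dx constant for the continuous equation under suitable boundary conditions.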

7.6.2Higher dimensions

Nowadays, estimating the flow for higher-dimensional image data is an active field of research. For example, microscopes record 3-dimensional data of moving cells and intracellular flows. Also in a medical context, 3-dimensional motion estimation is of great importance, for example in modalities with canonical time resolution like PET and SPECT, or in MR imaging. It is, for example, possible to accurately map tumors of lung cancer patients despite the respiratory motion based on (3d+t)-CT data. Unfortunately, the aperture problem becomes even worse in higher dimensions. While in two spatial dimensions we have one equation (the optical flow constraint) for the two unknowns υ1, υ2 of the velocity field, we have to deal with three unknowns in three spatial dimensions. This missing information has to be compensated by the regularizer in the variational model. Consequently, a priori assumptions on the flow field have an even higher impact on the solution. For biological applications we would typically expect an incompressible flow (i.e., ∇ · υ = 0), which can be used as an additional regularizer. For the computation of 3-dimensional optical flow see for example [4].

7.6.3Joint models

Another interesting question in the context of motion estimation is robustness towards noise. It can be shown that already low levels of noise in the image data massively disturb the motion estimation process. Consequently, for many applications the image data is denoised in a preprocessing step. An alternative approach is given by joint motion estimation and image reconstruction models, which are even inevitable in the case of missing data, for example partly in space or time. For given image data f, we would typically minimize models of the following structure

min_{u,υ} D(u, f) + R(u) + S(υ) + Q(u, υ) ,

where the data term D(u, f) connects the unknown image data u with f, while R(u) (e.g., total variation) and S(υ) (e.g., total variation or smoothness) act as regularizers for the image data u and the velocity field υ. The most important last part Q(u, υ) (optical flow or mass-preservation constraint) connects the image data and the underlying motion field. One general benefit of such models is that the underlying motion is directly used for the image reconstruction and can hence be thought of as a motion compensation technique. Conversely, the enhanced image data should generate more reasonable velocity fields (cf. [13]).

7.6.4Large displacements

One main drawback of the previously introduced models is their limitation to flows of small magnitude. The reason for this lies in the optical flow constraint, where we assumed a small timestep δt in the modeling part. For large-scale flows this assumption is simply not fulfilled anymore. Moreover, large errors in the discretization of the image derivatives u_x, u_y and u_t are introduced when proceeding to longer timesteps.

A common approach to overcome these difficulties is to use a different linearization of the optical flow formulation u2(x + υ) − u1(x) = 0 in combination with a multiscale strategy. The brightness constancy assumption can be linearized using an approximation υ0 of υ, yielding the following modified optical flow constraint:

∇u2(x + υ0) · (υ − υ0) + u2(x + υ0) − u1(x) = 0 .

Acting as a data term in a variational model, approximations υ0 are obtained by performing the flow estimation on differently downscaled versions of u1 and u2. Starting on a very coarse scale, an initial υ0 is calculated. This initial υ0 then acts as the approximation on the next finer scale. Following this so-called warping strategy, we finally arrive at the original image resolution.
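The warping strategy can be sketched as a coarse-to-fine loop. Here `solve_increment` is a hypothetical callback standing in for any of the variational solvers discussed above, and the simple block-averaging and nearest-neighbour upsampling operators are assumptions of this sketch:

```python
import numpy as np

def downsample(a):
    """Average 2x2 blocks (assumes even dimensions)."""
    return 0.25 * (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2])

def upsample_flow(v):
    """Nearest-neighbour upsampling; flow vectors are doubled because one
    coarse-grid pixel corresponds to two fine-grid pixels."""
    return 2.0 * np.repeat(np.repeat(v, 2, axis=1), 2, axis=2)

def coarse_to_fine(u1, u2, solve_increment, levels=3):
    """Warping strategy: solve on the coarsest scale first and use the
    upsampled result as linearization point v0 on the next finer scale.
    `solve_increment(u1, u2, v0)` is a hypothetical flow solver."""
    pyramid = [(u1, u2)]
    for _ in range(levels - 1):
        pyramid.append((downsample(pyramid[-1][0]), downsample(pyramid[-1][1])))
    v = np.zeros((2,) + pyramid[-1][0].shape)   # start on the coarsest grid
    for lvl, (a, b) in enumerate(reversed(pyramid)):
        if lvl > 0:
            v = upsample_flow(v)
        v = solve_increment(a, b, v)
    return v
```

Passing a trivial solver that just returns its linearization point reproduces a zero field at full resolution, which makes the pyramid bookkeeping easy to check.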

It should be taken into account that it might be useful to adapt the regularization parameter in every warping step. Using the same regularization parameter independent of the grid size gives a small error in the image, for example caused by noise, a larger significance on a finer grid than on a coarser one. Reducing the influence of the regularization while refining the grid circumvents this mismatch.

A requirement for these multiscale approaches is that smaller structures move similarly to the larger ones, since the movements found on the coarse grids serve as initial guesses for the movements on the finer grids. If small structures move completely independently of large ones, the flow field will not be detected correctly.

Recent approaches overcome these problems by using the idea of descriptor matching for optical flow. Descriptor matching is based on finding identical features in different images and matching them. This way it is possible to find extremely large displacements of discrete regions between two images. However, since it is not possible to match every single pixel, this approach alone does not yield fully precise results. In [9], Brox and Malik combine descriptor matching with a variational approach in order to estimate large displacements as well as precise results. On the basis of this work, Weinzaepfel et al. propose an algorithm for combined descriptor matching and variational motion estimation that also allows for nonrigid movements [34]. In [29], Revaud et al. assume that edges in the images match edges in the flow field and incorporate an interpolation framework into the model of [34] to avoid the error propagation caused by the warping. For evaluating large-scale motion we refer to the KITTI Vision Benchmark Suite [15, 21], which contains high-resolution real-world datasets.

Acknowledgment: This work was supported by ERC via Grant EU FP 7 ERC Consolidator Grant 615216 LifeInverse and by the German Science Foundation DFG via BU 2327/6-1 and EXC 1003 Cells in Motion Cluster of Excellence, Münster, Germany.

Bibliography

[1] G. Aubert and P. Kornprobst. Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, volume 147. Springer, 2006.

[2] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1):1–31, 2011.

[3] J. L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43–77, 1994.

[4] J. L. Barron and N. A. Thacker. Tutorial: Computing 2D and 3D optical flow. Imaging Science and Biomedical Engineering Division, Medical School, University of Manchester, 2005.

[5] F. Becker, S. Petra, and C. Schnörr. Optical flow. Handbook of Mathematical Methods in Imaging, pages 1945–2004, 2015.

[6] M. Benning, C. Brune, M. Burger, and J. Müller. Higher-order TV methods: enhancement via Bregman iteration. Journal of Scientific Computing, 54(2–3):269–310, 2013.

[7] K. Bredies, K. Kunisch, and T. Pock. Total generalized variation. SIAM Journal on Imaging Sciences, 3(3):492–526, 2010.

[8] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. In Computer Vision – ECCV 2004, pages 25–36. Springer, 2004.

[9] T. Brox and J. Malik. Large displacement optical flow: descriptor matching in variational motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):500–513, 2011.

[10] M. Burger and S. Osher. Convergence rates of convex variational regularization. Inverse Problems, 20(5):1411, 2004.

[11] M. Burger, K. Papafitsoros, E. Papoutsellis, and C.-B. Schönlieb. Infimal convolution regularisation functionals of BV and Lp spaces. Part I: The finite p case. arXiv preprint arXiv:1504.01956, 2015.

[12] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.

[13] H. Dirks. Variational Methods for Joint Motion Estimation and Image Reconstruction. PhD thesis, WWU Münster, 2015.

[14] D. J. Fleet and A. D. Jepson. Computation of component image velocity from local phase information. International Journal of Computer Vision, 5(1):77–104, 1990.

[15] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[16] W. Hinterberger, O. Scherzer, C. Schnörr, and J. Weickert. Analysis of optical flow models in the framework of the calculus of variations. Numerical Functional Analysis and Optimization, 23(1–2):69–89, 2002.

[17] L. Hoeltgen. Bregman iteration for optical flow. Master's thesis, Saarland University, 2010.

[18] B. K. Horn and B. G. Schunck. Determining optical flow. In 1981 Technical Symposium East, pages 319–331. International Society for Optics and Photonics, 1981.

[19] P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.

[20] D. Lorenz. Existence of minimizers for the Horn-Schunck functional for optical flow, https://regularize.wordpress.com/2013/05/13/existence-of-minimizers-for-the-horn-schunck-functional-for-optical-flow/, 2015.

[21] M. Menze and A. Geiger. Object scene flow for autonomous vehicles. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[22] Middlebury optical flow database. http://vision.middlebury.edu/flow, 2015.

[23] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation, 4(2):460–489, 2005.

[24] M. Otte and H.-H. Nagel. Optical flow estimation: advances and comparisons. In Computer Vision – ECCV '94, pages 49–60. Springer, 1994.

[25] J. Sánchez Pérez, E. Meinhardt-Llopis, and G. Facciolo. TV-L1 optical flow estimation. Image Processing On Line, 2013:137–150, 2013.

[26] T. Pock and A. Chambolle. Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1762–1769. IEEE, 2011.

[27] R. Ranftl, K. Bredies, and T. Pock. Non-local total generalized variation for optical flow estimation. In Computer Vision – ECCV 2014, pages 439–454. Springer, 2014.

[28] R. Ranftl, S. Gehrig, T. Pock, and H. Bischof. Pushing the limits of stereo using variational stereo estimation. In Intelligent Vehicles Symposium (IV), 2012 IEEE, pages 401–407. IEEE, 2012.

[29] J. Revaud, P. Weinzaepfel, Z. Harchaoui, and C. Schmid. EpicFlow: Edge-preserving interpolation of correspondences for optical flow. arXiv preprint arXiv:1501.02565, 2015.

[30] C. Schnörr. Determining optical flow for irregular domains by minimizing quadratic functionals of a certain class. International Journal of Computer Vision, 6(1):25–38, 1991.

[31] S. Suhr, D. Tenbrinck, M. Burger, and J. Modersitzki. Registration of noisy images via maximum a-posteriori estimation. In Biomedical Image Registration, pages 231–240. Springer, 2014.

[32] A. Verri and T. Poggio. Motion field and optical flow: qualitative properties. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(5):490–498, 1989.

[33] J. Weickert and C. Schnörr. Variational optic flow computation with a spatio-temporal smoothness constraint. Journal of Mathematical Imaging and Vision, 14(3):245–255, 2001.

[34] P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid. DeepFlow: Large displacement optical flow with deep matching. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1385–1392. IEEE, 2013.

[35] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof. Anisotropic Huber-L1 optical flow. BMVC, 1(2):3, 2009.

[36] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In Pattern Recognition, pages 214–223. Springer, 2007.
