5.2. Related Work

Perceptual organization has been an active research area. Important issues include robustness to noise, initialization requirements, the handling of discontinuities, flexibility in the types of structures that can be represented, and computational complexity. This section reviews related work, which can be classified into the following categories:

- regularization

- relaxation labeling

- computational geometry

- robust methods

- level set methods

- symbolic methods

- clustering

- methods based on local interactions

- methods inspired by psychophysiology and neuroscience

Regularization

Due to the projective nature of imaging, a single image can correspond to different scene configurations. This ambiguity in image formation makes the inverse problem, the inference of structures from images, ill-posed. To address ill-posed problems, constraints have to be imposed on the solution space. Within regularization theory, this is achieved by selecting an appropriate objective function and optimizing it subject to the constraints. Poggio, Torre, and Koch [51] present the application of regularization theory to computer vision problems. Terzopoulos [65] and Robert and Deriche [52] address the issue of preserving discontinuities while enforcing global smoothness in a regularization framework. A Bayesian formulation of the problem based on minimum description length is proposed by Leclerc [32]. Variational techniques are used by Horn and Schunck [22] for the estimation of optical flow, and by Morel and Solimini [43] for image segmentation. In both cases, the goal is to infer functions that optimize the selected criteria while preserving discontinuities.
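
As a concrete illustration, the canonical Tikhonov form of regularization (a generic sketch of the idea, not the specific functional of any of the works cited above) solves

```latex
\hat{u} = \arg\min_{u} \; \|Au - d\|^{2} + \lambda \|Pu\|^{2},
```

where $d$ is the observed data, $A$ models the image formation process, $P$ is a stabilizing operator (typically a derivative operator that penalizes non-smooth solutions), and $\lambda$ controls the trade-off between data fidelity and smoothness. The discontinuity-preserving schemes cited above can be viewed as replacing the quadratic smoothness term with one that relaxes the penalty across discontinuities.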

Relaxation labeling

A different approach to vision problems is relaxation labeling, in which problems are cast as the assignment of labels to the elements of the scene from a set of possible labels. Haralick and Shapiro define the consistent labeling problem in [17] and [18]. Labels that violate consistency according to predefined criteria are iteratively removed from the tokens until convergence. Faugeras and Berthod [8] describe a gradient optimization approach to relaxation labeling, defining a global criterion that combines the concepts of ambiguity and consistency of the labeling process. Geman and Geman [11] discuss how stochastic relaxation can be applied to the task of image restoration, obtaining MAP estimates with a Gibbs sampler and simulated annealing. Hummel and Zucker [24] develop an underlying theory for the continuous relaxation process. The first result is the definition of an explicit function to maximize and guide the relaxation process, leading to a new relaxation operator; the second is that finding a consistent labeling is equivalent to solving a variational inequality. This work was continued by Parent and Zucker [50] for the inference of trace points in 2D and by Sander and Zucker [54] in 3D.
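
For concreteness, the classical continuous relaxation update (a generic sketch in the style of the early relaxation literature, not the exact operator derived in [24]) maintains label probabilities $p_i(\lambda)$ for each token $i$ and iterates

```latex
q_i(\lambda) = \sum_{j} \sum_{\lambda'} r_{ij}(\lambda, \lambda') \, p_j(\lambda'), \qquad
p_i(\lambda) \leftarrow \frac{p_i(\lambda) \left( 1 + q_i(\lambda) \right)}{\sum_{\lambda''} p_i(\lambda'') \left( 1 + q_i(\lambda'') \right)},
```

where $r_{ij}(\lambda, \lambda')$ encodes the compatibility of label $\lambda$ at token $i$ with label $\lambda'$ at neighboring token $j$: compatible neighborhoods reinforce a label, while incompatible ones suppress it.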

Computational geometry

Techniques for inferring surfaces from 3D point clouds have been reported in the computer graphics literature. Boissonnat [2] proposes a technique based on computational geometry for object representation by triangulating point clouds in 3D. Hoppe et al. [21] infer surfaces from unorganized point clouds as the zero levels of a signed distance function from the unknown surface. The strength of their approach is that the surface model, topology, and boundaries need not be known a priori. Edelsbrunner and Mücke [7] introduce 3D alpha shapes, which are based on a 3D Delaunay triangulation of the data. Szeliski, Tonnesen, and Terzopoulos [61] describe a method for modeling surfaces of arbitrary, or changing, topology using a set of oriented dynamic particles that interact according to distance, coplanarity, conormality, and cocircularity. Computational geometry methods are limited by their sensitivity to even a very small number of outliers and by their computational complexity.
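
To illustrate the alpha-shape idea, the sketch below (our own simplification, assuming SciPy is available; the function name and the direct radius parameterization are ours, not those of [7]) keeps the Delaunay tetrahedra whose circumradius is below a threshold, i.e., the 3-simplices of the alpha complex:

```python
import numpy as np
from scipy.spatial import Delaunay  # assumes SciPy is available

def alpha_tetrahedra(points, radius):
    """Keep 3D Delaunay tetrahedra whose circumradius is below `radius`."""
    tri = Delaunay(points)  # 3D Delaunay triangulation of the cloud
    kept = []
    for simplex in tri.simplices:
        p = points[simplex]  # (4, 3) vertices of one tetrahedron
        # The circumcenter x satisfies |x - p_i|^2 = |x - p_0|^2, i.e.
        # 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 for i = 1, 2, 3.
        A = 2.0 * (p[1:] - p[0])
        b = np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)
        try:
            center = np.linalg.solve(A, b)
        except np.linalg.LinAlgError:
            continue  # nearly flat tetrahedron: no finite circumcenter
        if np.linalg.norm(center - p[0]) < radius:
            kept.append(simplex)
    return np.asarray(kept)

# Example: small tetrahedra survive in a uniformly sampled cube.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(500, 3))
print(len(alpha_tetrahedra(pts, radius=0.3)))
```

The boundary of the retained complex approximates the shape of the cloud; as noted above, even a few outliers create spurious simplices, which is precisely the sensitivity that limits these methods.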

Robust methods

In the presence of noise, robust techniques inspired by RANSAC (random sample consensus) [9] can be applied. Small random samples are selected from the noisy data and are used to derive model hypotheses, which are tested against the remainder of the data set. Hypotheses that are consistent with a large number of the data points are considered valid. Variants of RANSAC include RESC [74] and MIR [31], which are mainly used for the segmentation of surfaces from noisy 3D point clouds. The extracted surfaces are limited to planar or quadratic forms, except for the approach in [33], which can extract high-order polynomial surfaces. Chen et al. [4] introduce robust statistical methods into computer vision. They show how robust M-estimators with auxiliary scale are more general than the other robust estimators previously used in the field. They also point out the difficulties caused by multiple structures in the data and propose a way to detect them. In all cases, an a priori parametric representation of the unknown structure is necessary, thus limiting the applicability of these methods.
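
The scheme can be made concrete with a minimal line-fitting sketch (our own illustration, assuming NumPy; the 2D line model, iteration count, and inlier tolerance are arbitrary choices for demonstration, not prescribed by [9]):

```python
import numpy as np

def ransac_line(points, n_iters=500, tol=0.05, rng=None):
    """Fit a 2D line n.x = c (unit normal n) to data with outliers."""
    rng = np.random.default_rng(rng)
    best_line, best_support = None, 0
    for _ in range(n_iters):
        # Minimal random sample: two distinct points define a line.
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        d = q - p
        length = np.hypot(d[0], d[1])
        if length < 1e-12:
            continue  # degenerate sample (coincident points)
        n = np.array([-d[1], d[0]]) / length  # unit normal of the line
        c = float(n @ p)
        # Test the hypothesis against the remainder of the data set:
        # inliers are points within `tol` of the hypothesized line.
        support = int(np.count_nonzero(np.abs(points @ n - c) < tol))
        # Keep the hypothesis consistent with the most data points.
        if support > best_support:
            best_line, best_support = (n, c), support
    return best_line, best_support

# Example: a noisy line (y = 0.5 x + 1) contaminated by outliers.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
inliers = np.column_stack([x, 0.5 * x + 1.0 + rng.normal(0, 0.02, 200)])
outliers = rng.uniform(0.0, 10.0, (100, 2))
line, support = ransac_line(np.vstack([inliers, outliers]))
print(line, support)
```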

Level set methods

The antipode of the explicit representation of surfaces by a set of points is the implicit representation in terms of a function. In [58], Sethian proposes a level set approach under which surfaces can be inferred as the zero-level isosurface of a multivariate implicit function. The technique allows for topological changes; thus it can reconstruct surfaces of any genus as well as nonmanifolds. Zhao, Osher, and Fedkiw [20] and Osher and Fedkiw [49] propose efficient ways of handling implicit surfaces as level sets of a function. A combination of points and elementary surfaces and curves can be provided as input to their technique, which can handle local changes locally, as well as global deformations and topological changes. All the implicit surface-based approaches are iterative and require careful selection of the implicit function and initialization. The surface in explicit form, as a set of polygons, can be extracted by a technique such as the classic marching cubes algorithm [37]. The simultaneous representation of surfaces, curves, and junctions, however, is not possible within this framework.
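
As a small illustration of the implicit-to-explicit step (a sketch assuming NumPy and scikit-image; the spherical implicit function is an arbitrary example), the zero-level isosurface of a sampled function can be converted into a triangle mesh with marching cubes:

```python
import numpy as np
from skimage import measure  # assumes scikit-image is available

# Sample an implicit function phi on a regular grid; here the signed
# distance to a sphere of radius 0.8, chosen purely for illustration.
ax = np.linspace(-1.0, 1.0, 64)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
phi = np.sqrt(X**2 + Y**2 + Z**2) - 0.8

# Marching cubes extracts the zero-level set in explicit form:
# vertices and triangular faces approximating the isosurface.
verts, faces, normals, values = measure.marching_cubes(phi, level=0.0)
print(verts.shape, faces.shape)  # (V, 3) vertices, (F, 3) triangles
```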

Symbolic methods

Following the paradigm set by Marr [39], many researchers developed methods for the hierarchical grouping of symbolic data. Lowe [38] developed a system for 3D object recognition based on the perceptual organization of image edgels. Groupings are selected among the numerous possibilities according to the Gestalt principles, viewpoint invariance, and a low likelihood of being accidental formations. Mohan and Nevatia [41] and Dolan and Riseman [6] also propose perceptual organization approaches based on the Gestalt principles. Both are symbolic and operate in a hierarchical, bottom-up fashion, starting from edgels and increasing the level of abstraction at each iteration. The latter approach aims at inferring curvilinear structures, while the former aims at segmentation and the extraction of 3D scene descriptions from collations of features that have a high likelihood of being projections of scene objects. Along the same lines is Jacobs' [25] technique for inferring salient convex groups among clutter, since such groups most likely correspond to world objects. The criteria used to determine the nonaccidentalness of the potential structures are convexity, proximity, and contrast of the edgels.

Clustering

A significant current trend in perceptual organization is clustering [26]. Data are represented as nodes of a graph, and the edges between them encode the likelihood that two nodes belong to the same partition of the graph. Clustering is achieved by cutting some of these edges in a way that optimizes global criteria. A landmark approach in the field was the introduction of normalized cuts by Shi and Malik [60]. They aim at maximizing the dissimilarity between partitions, normalized by (essentially) the size of each partition, in order to remove the bias toward small clusters. Boykov, Veksler, and Zabih [3] use graph-cut-based algorithms to approximately optimize energy functions whose exact optimization is NP-hard, and demonstrate the validity of their approach on a number of computer vision problems. Stochastic clustering algorithms are developed by Cho and Meer [5] and by Gdalyahu, Weinshall, and Werman [10], where a consensus of various clusterings of the data is used as the basis of the solution. Finally, Robles-Kelly and Hancock [53] present a perceptual grouping algorithm based on graph cuts and an iterative expectation-maximization scheme, which improves the quality of the results at the expense of increased computational complexity.
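
In the notation of [60], for a graph with node set $V$, edge weights $w(u, v)$, and a partition into disjoint sets $A$ and $B$, the quantity being minimized is

```latex
\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)},
\qquad
\mathrm{cut}(A, B) = \sum_{u \in A, \, v \in B} w(u, v),
```

where $\mathrm{assoc}(A, V) = \sum_{u \in A, \, t \in V} w(u, t)$ is the total connection from $A$ to the whole graph; the denominators provide the normalization that removes the plain minimum cut's bias toward small partitions.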

Methods based on local interactions

We now turn our attention to perceptual organization techniques that are based on local interactions between primitives. Shashua and Ullman [59] first addressed the issue of structural saliency and how prominent curves are formed from tokens that are not salient in isolation. They define a locally connected network that assigns a saliency value to every image location according to the length and the smoothness of curvature of the curves going through that location. In [50], Parent and Zucker infer trace points and their curvature based on spatial integration of local information; an important aspect of this method is its robustness to noise. This work was extended to surface inference in three dimensions by Sander and Zucker [54]. Sarkar and Boyer [55] employ a voting scheme to detect a hierarchy of tokens. Voting in parameter space has to be performed separately for each type of feature, making the computational complexity prohibitive for a generalization to 3D. The inability of previous techniques to simultaneously handle surfaces, curves, and junctions was addressed in the precursor of our research in [15, 16], where a unified framework in which all types of perceptual structures can be represented is proposed, along with a preliminary version of the voting scheme presented here. The major advantages of the work of Guy and Medioni were noise robustness and computational efficiency, since the method is not iterative. How this methodology evolved is presented in the remaining sections of this chapter.

Methods inspired by psychophysiology and neuroscience

Finally, there is an important class of perceptual organization methods that are inspired by human perception and by research in psychophysiology and neuroscience. Grossberg, Mingolla, and Todorovic [13, 14] developed the boundary contour system and the feature contour system, which can group fragmented and even illusory edges to form closed boundaries and regions by feature cooperation in a neural network. Heitger and von der Heydt [19], in a classic paper on neural contour processing, claim that elementary curves are grouped into contours via convolution with a set of orientation-selective kernels whose responses decay with distance and difference in orientation. Williams and Jacobs [72] introduce stochastic completion fields for contour grouping. Their theory is probabilistic and models the contour from a source to a sink as the motion of a particle performing a random walk. Particles decay after every step, thus minimizing the likelihood of completions that are not supported by the data or that connect distant points. Li [36] presents a contour integration model based on excitatory and inhibitory cells and a top-down feedback loop. Most relevant to our research, which focuses on the preattentive, bottom-up process of perceptual grouping, is that connection strength decreases with distance and that zero- or low-curvature alternatives are preferred to high-curvature ones. The contour extraction model of Yen and Finkel [73], based on psychophysical and physiological evidence, has many similarities to ours. It employs a voting mechanism where votes, whose strength decays as a Gaussian function of distance, are cast along the tangent of the osculating circle. An excellent review of perceptual grouping techniques based on cooperation and inhibition fields can be found in [44]. Even though we do not attempt to present a biologically plausible system, the similarities between our framework and the methods presented in this paragraph are nevertheless encouraging.

Relations with our approach

Here we discuss some important aspects of our approach in the context of the work presented in this section. In the case of dense, noise-free, uniformly distributed data, we can match the performance of surface extraction methods such as [2, 7, 66, 58, 37]. Furthermore, our results degrade much more gracefully in the presence of noise (see, for example, [15] and [40]). The input can be oriented tokens, unoriented tokens, or a combination of both, while many of the techniques mentioned above require oriented inputs to proceed. In addition, we can extract open and closed forms of all types of structures simultaneously. Our model-free approach allows us to handle arbitrary perceptual structures that adhere only to the "matter is cohesive" principle [39], rather than to predefined models that restrict the admissible solutions; model-based approaches cannot easily distinguish between model misfit and noise. Our voting function has many similarities with other voting-based methods, such as the decay with distance and curvature [19, 73, 36] and the use of constant curvature paths [50, 56, 55, 73] that result in an eight-shaped voting field (in 2D) [13, 14, 19, 73, 36]. The major difference is that in our case the votes cast are tensors rather than scalars; therefore, they can express much richer information.
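
To preview the form this takes in our framework (the voting fields are derived in the remaining sections of this chapter; the expression below is stated here only for comparison with the scalar schemes above), the magnitude of a vote decays with both the arc length $s$ and the curvature $\kappa$ of the constant curvature path joining the voter and the receiver:

```latex
DF(s, \kappa, \sigma) = e^{-\frac{s^{2} + c \kappa^{2}}{\sigma^{2}}},
```

where $c$ controls the penalty on curvature and $\sigma$ is the scale of the voting field. Unlike the scalar strengths of [19, 73, 36], this magnitude modulates a tensor vote, which also encodes an orientation estimate.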
