8.5. Similarity and Search

When the information from images is captured in a feature set, there are two possibilities for endowing it with meaning: one derives a unilateral interpretation from the feature set alone, and the other compares the feature set with the elements in a given data set on the basis of a similarity function.

8.5.1. Semantic interpretation

In content-based retrieval, it is useful to push the semantic interpretation of features derived from the image as far as possible.

Semantic features aim at encoding interpretations of the image that may be relevant to the application.

Of course, such interpretations are a subset of the possible interpretations of an image. To that end, consider a feature vector F derived from an image i. For given semantic interpretations z from the set of all interpretations Z, a strong semantic feature with interpretation zj would generate a P(z|F) = δ(z – zj). If the feature carries no semantics, it would generate a distribution P(z|F) = P(z) independent of the value of the feature. In practice, many feature types will generate a probability distribution that is neither a pulse nor independent of the feature value. This means that the feature value skews the interpretation of the image, but does not determine it completely.

Under the umbrella weak semantics, we collect the approaches that try to combine features into some semantically meaningful interpretation. Weak semantics aims at encoding, in a simple and approximate way, a subset of the possible interpretations of an image that are of interest in a given application. As an example, the system in [28] uses color features derived from Itten's color theory to encode the semantics associated with color contrast and harmony in art applications.

In the MAVIS2-system [90], data are considered at four semantic levels and embodied in four layers called the raw media, the selection, the selection expression, and conceptual layers. Each layer encodes information at an increasingly symbolic level. Agents are trained to create links between features, feature signatures at the selection layer, interrelated signatures at the selection expression layer, and concept (expressed as textual labels) at the conceptual layer. In addition to the vertical connections, the two top layers have intralayer connections that measure the similarity between concepts at that semantic level and contribute to the determination of the similarity between elements at the lower semantic level.

8.5.2. Similarity between features

A different road to assigning meaning to an observed feature set is to compare a pair of observations by a similarity function. While searching for a query image iq(x) among the elements of the data set of images id(x), knowledge of the domain is expressed by formulating a similarity measure Sq,d between the images q and d on the basis of some feature set. The similarity measure depends on the type of features.

At its best use, the similarity measure can be manipulated to represent different semantic contents; images are then grouped by similarity in such a way that close images are similar with respect to use and purpose. A common assumption is that the similarity between two feature vectors can be expressed by a positive, monotonically nonincreasing function of the distance between them. This assumption is consistent with a class of psychological models of human similarity perception [152, 142] and requires that the feature space be metric. If the feature space is a vector space, the distance is often a simple Euclidean distance, although there are indications that more complex distance measures might be necessary [142]. This similarity model was well suited for early query-by-example systems in which images were ordered by similarity to one example.
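Such a similarity function can be sketched as a positive, monotonically nonincreasing transform of a metric distance. The exponential form, the Euclidean base distance, and the scale parameter `sigma` below are illustrative assumptions, not prescribed by the text:

```python
import math

def euclidean(f1, f2):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def similarity(f1, f2, sigma=1.0):
    """Positive, monotonically nonincreasing function of the distance:
    identical vectors score 1.0 and the score decays toward 0."""
    return math.exp(-euclidean(f1, f2) / sigma)
```

Ranking a data set by this score orders images from most to least similar to the example, as in early query-by-example systems.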

A different view sees similarity as an essentially probabilistic concept. This view is rooted in the psychological literature [8], and in the context of content-based retrieval, it has been proposed, for example, in [116].

Measuring the distance between histograms has been an active line of research since the early years of content-based retrieval; a histogram can be seen as an ordered set of features. In content-based retrieval, histograms have mostly been used in conjunction with color features, but nothing prevents their use for texture or local geometric properties.

Various distance functions have been proposed. Some are general-purpose functions, such as the Euclidean distance and the cosine distance; others were specially designed for image retrieval, such as histogram intersection [162]. The Minkowski-form distance for two vectors or histograms $\mathbf{k}$ and $\mathbf{l}$ of dimension n is given by:

Equation 8.1

$$ D_p(\mathbf{k}, \mathbf{l}) = \left( \sum_{i=1}^{n} |k_i - l_i|^{p} \right)^{1/p} $$

The Euclidean distance between two vectors $\mathbf{k}$ and $\mathbf{l}$ is defined as follows:

Equation 8.2

$$ D_E(\mathbf{k}, \mathbf{l}) = \sqrt{\sum_{i=1}^{n} (k_i - l_i)^2} $$

The Euclidean distance is an instance of the Minkowski-form distance with p = 2.
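As a sketch of Eqs. 8.1 and 8.2, both distances fit in one pure-Python function; the name and signature are illustrative, not from the text:

```python
def minkowski(k, l, p=2):
    """Minkowski-form distance between two vectors or histograms.
    p = 2 yields the Euclidean distance, p = 1 the city-block distance."""
    return sum(abs(ki - li) ** p for ki, li in zip(k, l)) ** (1.0 / p)
```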

The cosine distance compares the feature vectors of two images on the basis of the cosine of the angle between the two vectors:

Equation 8.3

$$ D_{\cos}(\mathbf{k}, \mathbf{l}) = 1 - \cos \varphi $$

where φ is the angle between the vectors $\mathbf{k}$ and $\mathbf{l}$. When the two vectors have equal direction, the cosine equals one and the distance is zero. The angle φ can also be written as a function of $\mathbf{k}$ and $\mathbf{l}$:

Equation 8.4

$$ \varphi = \arccos \left( \frac{\mathbf{k} \cdot \mathbf{l}}{\|\mathbf{k}\| \, \|\mathbf{l}\|} \right) $$

The cosine distance is well suited for features that are real vectors and not a collection of independent scalar features.
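A minimal sketch of the cosine distance in the common 1 − cos φ convention, so that vectors with equal direction give distance zero; the function name is illustrative:

```python
import math

def cosine_distance(k, l):
    """1 - cos(phi) for the angle phi between vectors k and l."""
    dot = sum(ki * li for ki, li in zip(k, l))
    norms = math.sqrt(sum(ki ** 2 for ki in k)) * math.sqrt(sum(li ** 2 for li in l))
    return 1.0 - dot / norms
```

Because only the angle matters, the measure ignores vector magnitude, which is why it suits features forming one real vector rather than a collection of independent scalars.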

The histogram intersection distance compares two histograms $\mathbf{k}$ and $\mathbf{l}$ of n bins by taking the intersection of both histograms:

Equation 8.5

$$ D_{\cap}(\mathbf{k}, \mathbf{l}) = 1 - \frac{\sum_{i=1}^{n} \min(k_i, l_i)}{\sum_{i=1}^{n} l_i} $$

When considering images of different sizes, this distance function is not a metric, because in general $\sum_{i=1}^{n} k_i \neq \sum_{i=1}^{n} l_i$. In order to obtain a valid distance metric, histograms need to be normalized first:

Equation 8.6

$$ k_i' = \frac{k_i}{\sum_{j=1}^{n} k_j} $$

For normalized histograms (total sum of 1), the histogram intersection is given by

Equation 8.7

$$ D_{\cap}(\mathbf{k}, \mathbf{l}) = 1 - \sum_{i=1}^{n} \min(k_i, l_i) = \frac{1}{2} \sum_{i=1}^{n} |k_i - l_i| $$

This is again the Minkowski-form distance metric with p = 1, up to a factor of 1/2. Histogram intersection has the property that it allows for occlusion; that is, when an object in one image is partly occluded, the visible part still contributes to the similarity [60, 59].
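A sketch combining Eqs. 8.6 and 8.7: normalize both histograms to unit sum, then take one minus the intersection. Function names are illustrative:

```python
def normalize(h):
    """Scale a histogram so its bins sum to 1 (Eq. 8.6)."""
    total = float(sum(h))
    return [hi / total for hi in h]

def intersection_distance(k, l):
    """Histogram intersection distance on normalized histograms (Eq. 8.7):
    0 for identical histograms, 1 for fully disjoint ones."""
    kn, ln = normalize(k), normalize(l)
    return 1.0 - sum(min(ki, li) for ki, li in zip(kn, ln))
```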

Alternatively, histogram matching is defined by normalized cross correlation:

Equation 8.8

$$ D_{\times}(\mathbf{k}, \mathbf{l}) = \frac{\sum_{i=1}^{n} k_i l_i}{\sqrt{\sum_{i=1}^{n} k_i^2 \; \sum_{i=1}^{n} l_i^2}} $$

The normalized cross correlation has a maximum of unity, which occurs if and only if $\mathbf{k}$ exactly matches $\mathbf{l}$.
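A sketch of histogram matching by normalized cross correlation; the symmetric normalizer (the geometric mean of the self-correlations) is an assumption, and with it the score reaches 1 exactly when the histograms match up to scale:

```python
import math

def cross_correlation(k, l):
    """Normalized cross correlation between two histograms; 1.0 at a match."""
    num = sum(ki * li for ki, li in zip(k, l))
    den = math.sqrt(sum(ki ** 2 for ki in k) * sum(li ** 2 for li in l))
    return num / den
```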

In the QBIC system [42], the weighted Euclidean distance has been used for the similarity of color histograms. In fact, the distance measure is based on the correlation between the histograms $\mathbf{k}$ and $\mathbf{l}$:

Equation 8.9

$$ D_{A}(\mathbf{k}, \mathbf{l}) = (\mathbf{k} - \mathbf{l})^{T} A \, (\mathbf{k} - \mathbf{l}) $$

Here, A is a weight matrix whose entries $a_{ij}$ express the perceptual distance between bins i and j.
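The quadratic-form distance of Eq. 8.9 can be sketched directly; the weight matrix A is assumed to be given (in QBIC it is derived from perceptual relations between bins):

```python
def quadratic_form_distance(k, l, A):
    """(k - l)^T A (k - l): a weighted Euclidean distance in which
    A[i][j] couples histogram bins i and j."""
    d = [ki - li for ki, li in zip(k, l)]
    n = len(d)
    return sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
```

With A the identity matrix, the measure reduces to the squared Euclidean distance.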

The average color distance has been proposed by [70] to obtain a simpler, low-dimensional distance measure:

Equation 8.10

$$ D_{\mathrm{avg}}(\mathbf{k}, \mathbf{l}) = (\mathbf{k}_{\mathrm{avg}} - \mathbf{l}_{\mathrm{avg}})^{T} (\mathbf{k}_{\mathrm{avg}} - \mathbf{l}_{\mathrm{avg}}) $$

where $\mathbf{k}_{\mathrm{avg}}$ and $\mathbf{l}_{\mathrm{avg}}$ are the 3 × 1 average color vectors of $\mathbf{k}$ and $\mathbf{l}$.

As stated before, for broad domains, a proper similarity measure should be robust to object fragmentation, occlusion, and clutter caused by the presence of other objects in the view. In [58], various similarity functions were compared for color-based histogram matching. The results show that the retrieval accuracy of a similarity function depends on the presence of object clutter in the scene. Histogram cross correlation provides the best retrieval accuracy in the absence of object clutter (narrow domain), because this similarity function is symmetric and can be interpreted as the number of pixels with the same values in the query image that can also be found in the retrieved image, and vice versa. This is a desirable property when one object per image is recorded without any clutter. In the presence of object clutter (broad domain), the highest retrieval accuracy is provided by histogram intersection, because this similarity measure counts the number of similar hits and hence is insensitive to false positives.

Finally, the natural measures to compare ordered sets of accumulative features are nonparametric test statistics. These can be applied to the distributions of the coefficients of transforms to determine the likelihood that two samples derive from the same distribution [14, 131]. They can also be applied to compare the equality of two histograms and all variations thereof.

8.5.3. Similarity of object outlines

In [176], a good review is given of methods to compare shapes directly after segmentation into a set of object points t(x) without an intermediate description in terms of shape features.

For shape comparison, the authors distinguish between transforms, moments, deformation matching, scale-space matching, and dissimilarity measurement. Difficulties for shape matching based on global transforms are the inexplicability of the result and the brittleness under small deviations. Moments, specifically their invariant combinations, have been frequently used in retrieval [94]. Matching a query and an object in the data file can be done along the ordered set of eigenshapes [145] or with elastic matching [34, 11]. Scale-space matching is based on progressively simplifying the contour by smoothing [118]. By comparing the signatures of annihilated zero crossings of the curvature, two shapes are matched in a scale- and rotation-invariant fashion. A discrete analogue can be found in [98], where points are removed from the digitized contour on the basis of perceptually motivated rules.

When based on a metric, dissimilarity measures render an ordered range of deviations suited for a predictable interpretation. In [176], an analysis is given for the Hausdorff and related metrics between two shapes on robustness and computational complexity. The directed Hausdorff metric is defined as the maximum distance between a point on query object q and its closest counterpart on d. The partial Hausdorff metric, defined as the k-th maximum rather than the absolute maximum, is used in [71] for affine-invariant retrieval.
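Both Hausdorff variants can be sketched directly from their definitions; point sets are plain coordinate tuples, and `math.dist` (Python 3.8+) supplies the point-to-point metric:

```python
import math

def directed_hausdorff(q_points, d_points):
    """Maximum over q of the distance to the closest counterpart on d."""
    return max(min(math.dist(p, r) for r in d_points) for p in q_points)

def partial_hausdorff(q_points, d_points, k):
    """k-th largest (1-based) closest-point distance instead of the maximum,
    which makes the measure robust to a few outlying points."""
    closest = sorted((min(math.dist(p, r) for r in d_points) for p in q_points),
                     reverse=True)
    return closest[k - 1]
```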

8.5.4. Similarity of object arrangements

The result of a structural description is a hierarchically ordered set of feature values H. In this section, we consider the similarity of two structural or layout descriptions.

Many different techniques have been reported for the similarity of feature structures. In [180, 82], a Bayesian framework is developed for the matching of relational attributed graphs by discrete relaxation. This is applied to line patterns from aerial photographs.

A metric for the comparison of two topological arrangements of named parts, applied to medical images, is defined in [166]. The distance is derived from the number of editing steps needed to nullify the difference in the Voronoi diagrams of two images.

In [18], 2D strings describing spatial relationships between objects were introduced, and they were reviewed much later in [185]. From such topological relationships of image regions, in [79], a 2D indexing is built in trees of symbol strings, each representing the projection of a region on the coordinate axis. The distance between Hq and Hd is the weighted number of editing operations required to transform one tree into the other. In [151], a graph is formed from the image on the basis of symmetry, as it appears from the medial axis. Similarity is assessed in two stages via graph-based matching, followed by energy-deformation matching.
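The editing-operations distances above are weighted and tree-structured; in their simplest form, on plain strings with unit costs, they reduce to the classic edit distance, sketched here (the domain-specific weights and tree operations of the cited systems are not modeled):

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions, and substitutions
    turning string s into string t (dynamic programming)."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]
```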

In [51], hierarchically ordered trees are compared for the purpose of retrieval by rewriting them into strings. A distance-based similarity measure establishes the similarity scores between corresponding leaves in the trees. At the level of trees, the total similarity score of corresponding branches is taken as the measure for tree and subtree similarity. From a small-scale experiment, it is concluded that hierarchically ordered feature sets are more efficient than plain feature sets, with projected computational shortcuts for larger data sets.

8.5.5. Similarity of salient features

Salient features are used to capture the information in the image in a limited number of salient points. Similarity between images can then be checked in several ways.

In the first place, the color, texture, or local shape characteristics may be used to identify the salient points of the data as identical to the salient points of the query.

A measure of similarity between the feature values of the blobs resulting from weak segmentation is the Mahalanobis distance between feature vectors composed of the color, texture, position, area, eccentricity, and direction of the two ellipses [16].
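A sketch of the Mahalanobis distance between two such feature vectors; the inverse covariance matrix `s_inv` is assumed to be estimated elsewhere from the feature statistics:

```python
def mahalanobis(x, y, s_inv):
    """sqrt((x - y)^T S^-1 (x - y)) for inverse covariance matrix s_inv."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    q = sum(d[i] * s_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return q ** 0.5
```

The covariance weighting makes heterogeneous features (color, texture, position, area, eccentricity, direction) commensurable; with the identity matrix for `s_inv`, it reduces to the Euclidean distance.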

In the second place, one can store all salient points from one image in a histogram on the basis of a few characteristics, such as color on the inside versus color on the outside. The similarity is then based on the groupwise presence of enough similar points [59]. The intersection model is used in image retrieval in [153] while keeping access to point location in the image by back-projection [162]. Further, a weight per dimension may favor the appearance of some salient features over others. See also [77] for a comparison with correlograms.

A third alternative for similarity of salient points is to concentrate only on the spatial relationships among the salient point sets. In point-by-point-based methods for shape comparison, shape similarity is studied in [89], where maximum curvature points on the contour and the lengths between them are used to characterize the object. To avoid the extensive computations, one can compute algebraic invariants of point sets, such as the cross-ratio. Due to their invariant character, these measures tend to have only limited discriminatory power among different objects. A more recent version for the similarity of nameless point sets is found in geometric hashing [182], where each triplet spans a base for the remaining points of the object. An unknown object is compared on each triplet to see whether enough similarly located points are found. Geometric hashing, though attractive in its concept, is too computationally expensive to be used on the very large data sets of image retrieval due to the anonymity of the points. Similarity of two point sets given in a row-wise matrix is discussed in [179].
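The projective invariance of the cross-ratio can be checked numerically for four collinear points, parameterized by their positions along the line; the map coefficients below are arbitrary illustrative values:

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points; invariant under projective maps."""
    return ((a - c) * (b - d)) / ((b - c) * (a - d))

def projective(x, p, q, r, s):
    """1D projective (Moebius) transformation x -> (p*x + q) / (r*x + s)."""
    return (p * x + q) / (r * x + s)

points = [0.0, 1.0, 2.0, 4.0]
mapped = [projective(x, 2.0, 1.0, 0.5, 3.0) for x in points]
```

`cross_ratio(*points)` and `cross_ratio(*mapped)` agree, which is exactly the property that makes such invariants usable for viewpoint-independent matching, at the cost of limited discriminatory power.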

8.5.6. Discussion

Whenever the image itself permits an obvious interpretation, the ideal content-based system should employ that information. A strong semantic interpretation occurs when a sign can be positively identified in the image. This is rarely the case due to the large variety of signs in a broad class of images and the enormity of the task of defining a reliable detection algorithm for each of them. Weak semantics rely on inexact categorization induced by similarity measures, preferably tuned online by interaction. The categorization may agree with semantic concepts of the user, but the agreement is in general imperfect. Therefore, the use of weak semantics is usually paired with the ability to gear the semantics to the user's needs by interaction. Tunable semantics is likely to receive more attention in the future, especially as data sets grow larger.

Similarity is an interpretation of the image based on the difference between it and another image. For each of the feature types, a different similarity measure is needed. For similarity between feature sets, special attention has gone to establishing similarity among histograms due to their computational efficiency and retrieval effectiveness.

Similarity of shape has received considerable attention in the context of object-based retrieval. Generally, global shape matching schemes break down when there is occlusion or clutter in the scene. Most global shape comparison methods implicitly require a frontal viewpoint against a clear enough background to achieve a sufficiently precise segmentation. With the recent inclusion of perceptually robust points in the shape of objects, an important step forward has been made.

Similarity of hierarchically ordered descriptions deserves considerable attention, as it is one mechanism to circumvent the problems with segmentation while maintaining some of the semantically meaningful relationships in the image. Part of the difficulty here is to provide matching of partial disturbances in the hierarchical order and the influence of sensor-related variances in the description.
