8.6. Interaction and Learning

8.6.1. Interaction on a semantic level

In [78], knowledge-based type abstraction hierarchies are used to access image data based on context and a user profile, generated automatically from cluster analysis of the database. Also, in [19], the aim is to create a very large concept space inspired by the thesaurus-based search from the information retrieval community. In [117], a linguistic description of texture patch visual qualities is given, and ordered in a hierarchy of perceptual importance on the basis of extensive psychological experimentation.

A more general concept of similarity is needed for relevance feedback, in which similarity with respect to an ensemble of images is required. To that end, in [43], more complex relationships between similarity and distance functions are presented, defining a weighted measure of two simpler similarities, S(s; s1, s2) = w1 exp(-d(s1, s)) + w2 exp(-d(s2, s)). The purpose of this bireferential measure is to find all regions similar to two specified query points, an idea that generalizes to similarity queries given multiple examples. The approach can be extended with a complete algebra of similarity measures with suitable composition operators [43, 38]. It is then possible to define operators corresponding to the disjunction, conjunction, and negation of similarity measures, much as in traditional databases. The algebra allows the user to manipulate the similarity directly as a means of expressing characteristics in specific feature values.
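As a minimal sketch, the bireferential measure can be written down directly; the distance function d and the 1-D data below are invented for illustration, and the min/max/complement combinators are one common (fuzzy-logic style) choice for the conjunction, disjunction, and negation operators, not necessarily the operators of [43, 38]:

```python
import math

def bireferential_similarity(s, s1, s2, d, w1=0.5, w2=0.5):
    """S(s; s1, s2) = w1*exp(-d(s1, s)) + w2*exp(-d(s2, s)):
    similarity of candidate s to the pair of query points s1, s2."""
    return w1 * math.exp(-d(s1, s)) + w2 * math.exp(-d(s2, s))

# One common choice of composition operators on similarity scores in [0, 1]:
conj = min            # conjunction: similar to BOTH measures
disj = max            # disjunction: similar to EITHER measure
neg = lambda v: 1 - v # negation: dissimilar

# Hypothetical 1-D example with an absolute-difference distance.
d = lambda a, b: abs(a - b)
near = bireferential_similarity(1.0, 0.0, 2.0, d)   # midway between queries
far = bireferential_similarity(10.0, 0.0, 2.0, d)   # far from both queries
assert near > far
```

A candidate lying between the two query points scores higher than one far from both, which is exactly the behavior the multiple-example query needs.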

8.6.2. Classification on a semantic level

To further enhance the performance of content-based retrieval systems, image classification has been proposed to group images into semantically meaningful classes [171, 172, 184, 188]. The advantage of these classification schemes is that simple, low-level image features can be used to express semantically meaningful classes. Image classification is based on unsupervised learning techniques such as clustering, self-organizing maps (SOMs) [188], and Markov models [184]. Further, supervised grouping can be applied. For example, vacation images have been classified into city vs. landscape with supervised learning in a Bayesian framework [171, 172]. However, these classification schemes are based entirely on pictorial information. Aside from image retrieval [44, 146], very little attention has been paid to using both textual and pictorial information for classifying images on the Web. This is surprising, given that images on Web pages are usually surrounded by text and discriminatory HTML markup such as the IMG tag with its SRC and ALT attributes. Hence, WWW images carry intrinsic annotation information induced by the HTML structure, and the set of images on the Web can be seen as an annotated image set.
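The supervised city-vs.-landscape grouping can be illustrated with a toy Gaussian naive-Bayes classifier over a single low-level feature; this is a simplified stand-in for the Bayesian framework of [171, 172], and all feature values below are invented:

```python
import math

# Hypothetical "edge-direction coherence" values: city scenes tend to show
# strong straight edges, landscapes do not (values are invented).
city = [0.8, 0.9, 0.85, 0.75]
landscape = [0.2, 0.3, 0.25, 0.35]

def fit(xs):
    """Estimate a class-conditional Gaussian (mean, variance)."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def log_likelihood(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

params = {"city": fit(city), "landscape": fit(landscape)}

def classify(x):
    # Equal class priors; pick the class with the higher likelihood.
    return max(params, key=lambda c: log_likelihood(x, *params[c]))
```

Calling `classify(0.82)` assigns the city class and `classify(0.28)` the landscape class, showing how a simple low-level feature can carry a semantically meaningful distinction.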

8.6.3. Learning

As data sets grow and the processing power matches that growth, the opportunity arises to learn from experience. Rather than designing, implementing, and testing an algorithm to detect the visual characteristics for each different semantic term, the aim is to learn from the appearance of objects directly.

For a review on statistical pattern recognition, see [2]. In [174], a variety of techniques treating retrieval as a classification problem are discussed.

One approach is principal component analysis over a stack of images taken from the same class of objects. This can be done in feature space [120] or at the level of the entire image, for example, with faces in [115]. The analysis yields a set of eigenface images, capturing the common characteristics of a face without requiring a geometric model.
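The eigenimage computation can be sketched with a standard singular value decomposition; the random matrix below merely stands in for a stack of aligned same-class images, so the "eigenfaces" here are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stack of 20 synthetic 8x8 "images", flattened to rows (stand-ins for
# aligned face images of one object class).
images = rng.normal(size=(20, 64))

# Center on the mean image, then take the principal components via SVD.
mean_image = images.mean(axis=0)
centered = images - mean_image
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt[:5]                      # top-5 eigenimages, one per row

# Any image of the class is approximated as the mean image plus a weighted
# sum of eigenfaces; the weights form its low-dimensional signature.
weights = centered @ eigenfaces.T
reconstruction = mean_image + weights @ eigenfaces
```

The orthonormal rows of `eigenfaces` play the role of the eigenface basis: the common appearance of the class is captured without any geometric model of the object.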

Effective ways to learn from partially labeled data have recently been introduced in [183, 32], both using the principle of transduction [173]. This saves the effort of labeling the entire data set, which becomes infeasible and unreliable as the set grows.

In [169], a very large number of precomputed features is considered, of which a small subset is selected by boosting [2] to learn the image class.
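A minimal sketch of boosting as feature selection, in the spirit of [169] but not its actual algorithm: each round of an AdaBoost-style loop picks the decision stump, and hence the feature, with the lowest weighted error, then re-weights the samples. Data, thresholds, and the round count are invented:

```python
import math

# Hypothetical samples: rows of precomputed features; labels +1/-1
# mark membership of the image class.
X = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.4], [0.1, 0.9, 0.6], [0.2, 0.8, 0.5]]
y = [1, 1, -1, -1]
n = len(X)
w = [1.0 / n] * n                     # per-sample boosting weights

selected = []
for _ in range(2):                    # each round selects one feature
    best = None
    for f in range(len(X[0])):        # stump: "feature f > 0.5", either sign
        for sign in (1, -1):
            preds = [sign if x[f] > 0.5 else -sign for x in X]
            err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            if best is None or err < best[0]:
                best = (err, f, sign, preds)
    err, f, sign, preds = best
    alpha = 0.5 * math.log((1 - err + 1e-10) / (err + 1e-10))
    selected.append(f)
    # Misclassified samples get more weight in the next round.
    w = [wi * math.exp(-alpha * p * yi) for wi, p, yi in zip(w, preds, y)]
    total = sum(w)
    w = [wi / total for wi in w]
```

Only the features chosen by the stumps (here, feature 0, which separates the two classes perfectly) survive into the learned classifier; the vast majority of the precomputed features are never used.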

An interesting technique for bridging the gap between textual and pictorial descriptions, borrowed from information retrieval, is latent semantic indexing [146, 187], which exploits information at the level of whole documents. First, a corpus of documents (here, images with captions) is formed, from which features are computed. Then, by singular value decomposition, the dictionary covering the captions is correlated with the features derived from the pictures. The search is for hidden correlations between features and captions.
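The idea can be sketched as follows, with an invented matrix of caption-term counts and visual-feature values per image: a truncated SVD projects the captioned images into a latent space in which textual and visual co-occurrences are merged, so images that share terms or features land close together.

```python
import numpy as np

# Hypothetical corpus of 5 captioned images. Columns: counts of the caption
# terms ("sunset", "beach", "city"), then two visual features (warm-color
# ratio, edge density). All values are invented.
A = np.array([
    [2, 1, 0, 0.9, 0.1],   # "sunset beach" image
    [1, 2, 0, 0.8, 0.2],
    [0, 0, 2, 0.1, 0.9],   # "city" image
    [0, 1, 1, 0.4, 0.6],
    [2, 0, 0, 0.9, 0.2],
], dtype=float)

# Truncated SVD exposes latent dimensions correlating terms and features.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
latent = U[:, :k] * S[:k]   # each image as a k-dimensional latent vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In the latent space, the two beach-sunset images are far more similar to each other than either is to the city image, even though the comparison mixes caption words with visual features.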

8.6.4. Discussion

Learning computational models for semantics is an interesting and relatively new approach. It is gaining attention quickly as data sets and machine power grow. Learning opens up the possibility of interpreting an image without designing and testing a detector for each new notion. One such approach is appearance-based learning of the common characteristics of stacks of images from the same class. Appearance-based learning is suited for narrow domains. For the success of the learning approach, there is a tradeoff between standardizing the objects in the data set and the size of the data set. The more standardized the data are, the less data will be needed, but the less broadly applicable the result will be. Interesting approaches to deriving semantic classes from captions, or from partially labeled or unlabeled data sets, have been presented recently.
