Conclusions

This book introduced learning mechanisms into visual local representation and visual dictionary models, aiming to build a more effective computational visual representation that more closely resembles the human visual system. The book began with a learning-based semi-local interest-point detector, proceeded to optimal visual dictionary learning from both unsupervised and supervised perspectives, and concluded by exploiting higher-order combinations of visual words. Throughout these stages, machine learning was embedded in every chapter.

The first contribution of this book was a context-aware semi-local (CASL) detector, which uses the context of local interest points to build a more semantically and spatially aware semi-local detector; both spatial context learning and semantic category learning are embedded in its construction. The CASL detector is first built on traditional local feature extraction, after which the spatial context of the local features within the target image is modeled with a multi-scale contextual Gaussian field. A kernelized mean shift procedure is then applied over this contextual field to detect the semi-local features, where the kernel is flexible enough to embed category information when it is available. The relationship between the proposed CASL interest-point detector and visual saliency models was also revealed and discussed.
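The detection step can be illustrated with a minimal sketch. It assumes 2-D keypoint locations, an isotropic Gaussian kernel whose bandwidth plays the role of the field scale, and optional per-point weights standing in for category information. The function names and parameters are hypothetical, and this is plain weighted mean shift over a kernel density rather than the book's CASL implementation. Summing a Gaussian at every keypoint gives a contextual field at one scale; mean shift started from each keypoint climbs to a mode of that field, and modes that fall close together are merged into one semi-local feature.

```python
import math


def mean_shift_mode(points, start, bandwidth, weights, iters=200, tol=1e-6):
    """Weighted Gaussian mean shift from `start`; returns the (x, y) position reached."""
    x, y = start
    for _ in range(iters):
        sx = sy = sw = 0.0
        for (px, py), w in zip(points, weights):
            # Gaussian kernel value of this keypoint, scaled by its (hypothetical) weight.
            k = w * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
            sx += k * px
            sy += k * py
            sw += k
        if sw == 0.0:  # no kernel mass reaches this position; stop here
            break
        nx, ny = sx / sw, sy / sw  # weighted mean becomes the next position
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y


def detect_semilocal(points, scales, weights=None, merge_radius=1.0):
    """Run mean shift from every keypoint at each scale and keep distinct modes.

    Returns (x, y, scale) triples; modes closer than `merge_radius` at the
    same scale are treated as one semi-local feature.
    """
    if weights is None:
        weights = [1.0] * len(points)
    detections = []
    for s in scales:
        modes = []
        for p in points:
            m = mean_shift_mode(points, p, s, weights)
            if all(math.hypot(m[0] - q[0], m[1] - q[1]) > merge_radius for q in modes):
                modes.append(m)
        detections.extend((mx, my, s) for mx, my in modes)
    return detections
```

For example, two well-separated triples of keypoints yield two modes at a bandwidth of 1.0 and a single mode at a bandwidth much larger than their separation, which is how the multi-scale field can expose structure at different spatial extents.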

The second contribution of this book was an unsupervised approach to optimizing the visual dictionary, which identified and addressed two main issues: (1) the similarity metric bias in the hierarchical quantization of the local feature space, and (2) the visual dictionary's lack of adaptability across datasets. To address the first issue, density-based metric learning (DML) was proposed to rectify the biased similarity metric in hierarchical k-means clustering. To address the second issue, a vocabulary tree shift approach was presented, which handles not only dictionary adaptation between different datasets but also incremental indexing of the dictionary within a dynamically changing dataset.
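The hierarchical quantization that DML corrects can be sketched as a small vocabulary tree: k-means is applied recursively, and a descriptor is quantized by greedily descending to the nearest center at each level, so the path of branch indices serves as the visual word. This is a minimal sketch assuming low-dimensional tuples, a deterministic initialization, and plain Euclidean distance. The `dist` parameter of `quantize` is a hypothetical hook marking where a rectified metric could be substituted; the book's density-based metric itself is not reproduced here.

```python
import math


def assign(points, centers):
    """Group points by their nearest center (Euclidean distance)."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[j].append(p)
    return groups


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations with a deterministic, evenly spaced initialization."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = assign(points, centers)
        # Move each center to its group's mean; keep it if the group is empty.
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, assign(points, centers)


def build_tree(points, branch, depth):
    """Hierarchical k-means: each node is a list of (center, subtree) pairs."""
    if depth == 0 or len(points) < branch:
        return None  # leaf
    centers, groups = kmeans(points, branch)
    return [(c, build_tree(g, branch, depth - 1)) for c, g in zip(centers, groups)]


def quantize(tree, x, dist=math.dist):
    """Greedy descent to the nearest center per level; the index path is the word."""
    path, node = [], tree
    while node is not None:
        j = min(range(len(node)), key=lambda i: dist(x, node[i][0]))
        path.append(j)
        node = node[j][1]
    return tuple(path)
```

Greedy descent compares a descriptor only against the centers of its current node, so a wrong branch chosen at an upper level cannot be recovered at deeper levels.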

The third contribution of this book was a generative embedding framework for supervised visual dictionary learning. It proposed using cheaply available but inaccurate Flickr labels to carry out supervised dictionary learning, based on a hidden Markov random field that models WordNet-based tag correlations. It was also shown that, by simplifying the Markov operations in the proposed model, several widely used visual dictionary models can be derived from our formulation.

The fourth contribution of this book was to exploit higher-order combinations of visual words to further improve visual search and recognition performance. To this end, a gravity-distance-based co-location pattern mining approach was proposed. Quantitative experiments showed that it outperforms traditional transaction-based and distance-based co-location pattern mining, as well as other existing alternatives.
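The gravity-distance formulation is not detailed in this summary, so the sketch below instead shows the distance-based baseline it was compared against, using the common participation-index measure for co-location patterns. Each visual word instance is assumed to be a `(word, x, y)` triple, two instances are neighbors when their Euclidean distance is within `radius`, and a pair of words is kept when its participation index (the smaller of the two fractions of instances that have a neighbor of the other word) reaches `min_prev`. All names and thresholds are illustrative.

```python
import math


def participation_index(instances, a, b, radius):
    """Min over {a, b} of the fraction of instances with a partner within `radius`."""
    A = [(x, y) for w, x, y in instances if w == a]
    B = [(x, y) for w, x, y in instances if w == b]
    if not A or not B:
        return 0.0

    def ratio(src, dst):
        # Fraction of `src` instances that have at least one `dst` neighbor.
        hit = sum(1 for p in src if any(math.dist(p, q) <= radius for q in dst))
        return hit / len(src)

    return min(ratio(A, B), ratio(B, A))


def mine_pairs(instances, radius, min_prev):
    """Return (a, b, index) for every word pair whose participation index >= min_prev."""
    words = sorted({w for w, _, _ in instances})
    out = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            pi = participation_index(instances, a, b, radius)
            if pi >= min_prev:
                out.append((a, b, pi))
    return out
```

Only word pairs are mined here; higher-order patterns would extend the same neighborhood test to larger word sets.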
