List of Figures

1.1 Visualized example of a visual dictionary 10

2.1 Influences of different contextual and Mean Shift scales 23

2.2 Proposed descriptor for CASL detectors 24

2.3 Results of learning-based CASL detection 26

2.4 Examples from the UKBench retrieval benchmark database 28

2.5 Examples from the Caltech101 object categorization database 29

2.6 Repeatability comparison on the detector repeatability sequence 32

2.7 CASL performance comparison in near-duplicate image retrieval 34

2.8 Categorization confusion matrix for 10 categories from Caltech101 (I) 36

2.9 Categorization confusion matrix for 10 categories from Caltech101 (II) 36

3.1 Visual word distribution in a 2-layer, 12-level vocabulary tree 43

3.2 Feature-frequency statistics (both axes are on a log-log scale) 46

3.3 Hierarchical TF-IDF term weighting. Each hierarchical level is treated as a higher-level "visual word" 49

3.4 Hierarchical recognition chain using a vocabulary tree 50

3.5 Exemplar photos in the SCity database 56

3.6 Vocabulary tree-based visual recognition model flowchart 56

3.7 Performance comparison using the original vocabulary tree on UKBench 57

3.8 Performance comparison using the DML-based vocabulary tree on UKBench 58

3.9 Performance comparison using the original vocabulary tree on SCity 58

3.10 Performance comparison using the DML-based vocabulary tree on SCity 59

3.11 Visual word distribution in a 1-way, 12-layer dictionary in SCity 59

3.12 Visualized results of quantization error reduction 60

3.13 Precision and time comparison between the hierarchical recognition chain (1-way) and GNP (GNP number: 1–11) 60

3.14 Performance of the hierarchical chain at different hierarchy levels 61

3.15 Performance of dictionary transfer learning from SCity to UKBench 61

3.16 Performance of dictionary transfer learning from UKBench to SCity 62

3.17 Recognition model updating 62

3.18 Sequential indexing without trigger criteria 63

3.19 Time cost with/without trigger criteria 64

3.20 Incremental indexing with trigger criterion 65

4.1 Semantic embedding framework 68

4.2 Original patch set (partial) and its DDE filtering for "Face" 70

4.3 Semantic embedding by Markov Random Field 71

4.4 Ratios between inter-class distance and intra-class distance with and without semantic embedding 76

4.5 MAP comparisons among GSE, VT, and GNP on Flickr 77

4.6 MAP with different embedding cases 78

4.7 Comparison with the adaptive dictionary on Flickr 60,000 78

4.8 Confusion tables on PASCAL VOC 05 in comparison to the universal dictionary 79

5.1 Exemplar illustrations of incorrect 2D neighborhood configurations of visual words, caused by binding words at diverse depths or by binding words from both foreground and background objects 82

5.2 The proposed compact bag of patterns (CBoP) descriptor with application to low bit rate mobile visual search 83

5.3 Visualized examples of the point clouds used for visual pattern candidate construction; exemplar landmark locations are within Peking University 86

5.4 Case study comparing patterns mined by gravity-based pattern mining and by Euclidean distance-based pattern mining 89

5.5 The proposed low bit rate mobile visual search framework using the CBoP descriptor. Unlike previous work on near-duplicate visual search, we emphasize extremely compact descriptor extraction directly on the mobile end. To achieve zero-latency query delivery, our CBoP descriptor is typically hundreds of bits per query. To the best of our knowledge, it is the most compact descriptor with discriminability comparable to state-of-the-art visual descriptors [16, 112, 113, 125] 92

5.6 Exemplar local patches in the PhotoTourism dataset. Each patch is sampled as a 64 × 64 grayscale patch with a canonical scale and orientation; for details of how the scale and orientation are established, please refer to [126]. The ground truth correspondences are collected from structure-from-motion-based point cloud construction, with back projection of the matched points 94

5.7 Exemplar photos collected from Flickr and Panoramio to build our 10M landmarks dataset 94

5.8 Example comparisons on extreme mobile query scenarios in the PKUBench dataset, including the Occlusive, Background Cluttered, Night, and Blurring and Shaking query sets 97

5.9 Compression rate and ranking distortion analysis in comparison with [112, 113, 125] using the ground truth query set 98
