8
Semantic Correspondences

8.1 Introduction

The approaches presented so far for shape correspondence are geometry driven, focusing solely on the geometrical and topological similarities between shapes. However, in many situations, corresponding shape parts may significantly differ in geometry or even in topology. Take the 3D models shown in Figure 8.1. Any human can easily put in correspondence and match parts of the chairs of Figure 8.1a despite the fact that they differ significantly in geometry and topology. For instance, some chairs have one leg while others have four. In these examples, correspondence is simply beyond pure geometric analysis; it requires understanding the semantics of the shapes in order to reveal relations between geometrically dissimilar yet functionally or semantically equivalent shape parts.


Figure 8.1 Partwise correspondences between 3D shapes in the presence of significant geometrical and topological variations. (a) Man‐made 3D shapes that differ in geometry and topology. The partwise correspondences are color coded. (b) A set of 3D models with significant shape differences, yet they share many semantic correspondences.

This fundamental problem of putting in correspondence 3D shapes that exhibit large geometrical and topological variations has been extensively studied in the literature, especially in the past five years 183, 203–205. Beyond standard applications, such as blending and morphing, this problem is central to many modern 3D modeling tools. For instance, the modeling‐by‐example paradigm, introduced by Funkhouser et al. 206, enables the creation of new 3D models by pasting and editing 3D parts retrieved from collections of 3D models. Its various extensions 204, 207, which lead to smart 3D modeling techniques, create new 3D shape variations by swapping parts that are functionally equivalent but significantly different in geometry and topology.

In this chapter, we review some of the methods that incorporate prior knowledge in the process of finding semantic correspondences between 3D shapes. We will show how the problem can be formulated as the optimization of an energy function and compare the different variants of the formulation that have been proposed in the literature. We conclude the chapter with a discussion of the methods that are not covered in detail and outline some challenges to stimulate future research.

8.2 Mathematical Formulation

Let c08-i0001 and c08-i0002 be two input 3D objects, hereinafter referred to as the source and target shapes, respectively. The goal of correspondence is to find for each point on c08-i0003 its corresponding point(s) on c08-i0004, and for each point on c08-i0005 its corresponding point(s) on c08-i0006. This is not a straightforward problem for several reasons. First, this correspondence is often not one‐to‐one due to differences in the structure and topology of the shapes. Consider the example of Figure 8.1a, where a desk chair has one central leg and a table chair has four legs, while the bench, although it also has four legs, has legs whose geometry and structure are completely different from those of the table chair. Clearly, the single leg of the source chair corresponds to the four legs of the target chair. Second, even when the source and target shapes have the same structure, matching parts can have completely different topologies (e.g. the backrests of the seats in Figure 8.1a) and geometries (e.g. the heads of the objects in Figure 8.1b). Thus, descriptors that are based only on geometry and topology are not sufficient to capture the similarities and differences between such shapes.

One approach, which has been extensively used in the literature, is to consider this correspondence problem as a semantic labeling process. More formally, we seek to assign labels to each point on c08-i0007 and c08-i0008 such that points that are semantically similar are assigned the same label c08-i0009 chosen from a predefined set of possible (semantic) labels. At the end of the labeling process, points, faces, patches, or parts from both objects that have the same label are then considered to be in correspondence. This labeling also provides a segmentation of the input objects. If performed jointly, this segmentation will be consistent across 3D objects belonging to the same shape family.

In this section, we lay down the general formulation of this semantic labeling process, which is often cast as the problem of optimizing an energy function. The remaining sections discuss the details, such as the choice of the terms of the energy function and the mechanisms used to optimize it.

We treat a 3D mesh c08-i0032 as a graph c08-i0033. Each node in c08-i0034 represents a surface patch. An edge c08-i0035 connects two nodes in c08-i0036 if their corresponding patches are spatially adjacent. In practice, a patch can be as simple as a single face, a set of adjacent faces grouped together, or a surface region obtained using some segmentation or mesh partitioning algorithms 208. Let c08-i0037 and c08-i0038. We represent each node c08-i0039 using a descriptor c08-i0040, which characterizes the shape of the surface patch represented by c08-i0041. We also characterize the relation between two adjacent nodes c08-i0042 and c08-i0043 using a descriptor c08-i0044. Let c08-i0045 denote a labeling of the nodes of c08-i0046 where c08-i0047 is the label assigned to node c08-i0048. Each c08-i0049 takes its value from a predefined set of c08-i0050 (semantic) labels c08-i0051. We seek to find the optimal labeling c08-i0052 of the graph c08-i0053, and subsequently the mesh c08-i0054. This can be formulated as a problem of minimizing, with respect to c08-i0055, an objective function of the form:
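In generic notation (the exact symbols may differ slightly from those used in the original presentation), this labeling energy has the standard form of a pairwise Markov random field energy, with a data term summed over the nodes and a smoothness term summed over the edges:

(8.1)   $E(\mathbf{l}) \;=\; \sum_{v_i \in \mathcal{V}} E_{D}(l_i;\, x_i) \;+\; \sum_{(v_i, v_j) \in \mathcal{E}} E_{S}(l_i, l_j;\, x_{ij}),$

where $l_i$ denotes the label of node $v_i$, and $x_i$ and $x_{ij}$ are the unary and pairwise descriptors defined above.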

This objective function is the sum of two terms: a data term c08-i0056 and a smoothness term c08-i0057. The data term measures how likely it is that a given node c08-i0058 has a specific label c08-i0059 given its shape properties c08-i0060. The smoothness term c08-i0061 is used to impose some constraints between the labels c08-i0062 and c08-i0063 of the nodes c08-i0064 and c08-i0065, given the properties c08-i0066. It is also used to incorporate some prior knowledge in order to guide the labeling. Table 8.1 summarizes the different choices of these terms and the different combinations that have been used in the literature.

Now, solving the optimization problem of Eq. 8.1, for a given shape c08-i0067, will provide a segmentation and a labeling of c08-i0068. Often, however, we will be dealing with multiple shapes and we would like them to be labeled and segmented in a consistent way. Thus, instead of labeling, say, a pair of 3D models separately from each other, one can label them simultaneously in a process called joint labeling. The goal is to extract parts from the two (or more) input shapes while simultaneously considering their correspondence.

Let c08-i0069 and c08-i0070 be the source and the target shapes. We represent c08-i0071 with a graph c08-i0072 where the nodes c08-i0073 are surface patches and the edges in c08-i0074 connect pairs of adjacent patches. A similar graph c08-i0075 is also defined for the target shape c08-i0076. To perform joint semantic labeling, we define a new graph c08-i0077 where c08-i0078 and the connectivity c08-i0079 of the graph is given by two types of arcs, i.e. c08-i0080, where

  • The intramesh arcs c08-i0081 are simply given by c08-i0082.
  • The intermesh arcs c08-i0083 connect nodes in c08-i0084 to nodes in c08-i0085 based on the similarity of their shape descriptors.

Let also c08-i0086 where c08-i0087 is a labeling of c08-i0088, and c08-i0089 is a labeling of c08-i0090. With this setup, the joint semantic labeling process can be written as the problem of optimizing an energy function of the form:
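In generic notation (again, the exact symbols may differ from those of the original presentation), this joint energy extends Eq. 8.1 with a third term defined over the intermesh arcs:

(8.2)   $E(\mathbf{l}) \;=\; \sum_{v_i \in \mathcal{V}} E_{D}(l_i;\, x_i) \;+\; \sum_{(v_i, v_j) \in \mathcal{E}_{intra}} E_{S}(l_i, l_j;\, x_{ij}) \;+\; \sum_{(v_i, v_j) \in \mathcal{E}_{inter}} E_{I}(l_i, l_j).$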

The first two terms are the same as the data term and the smoothness term of Eq. 8.1. The third term is the intermesh term. It quantifies how likely it is that two nodes across the two meshes have a specific pair of labels, according to a pairwise cost.

Table 8.1 Potential functions used in the literature for semantic labeling.

References Data term c08-i0010 Smoothness term c08-i0011
Kalogerakis et al. 209 c08-i0012 c08-i0013
van Kaick et al. 203 c08-i0014 c08-i0015
with c08-i0016.
Sidi et al. 210 c08-i0017 c08-i0018, and c08-i0019.
with c08-i0020.

c08-i0021 is the label compatibility term, c08-i0022 is the dihedral angle between the nodes c08-i0023 and c08-i0024, c08-i0025 is the distance between c08-i0026 and c08-i0027, c08-i0028 is the area of the surface patch corresponding to the c08-i0029th node, and c08-i0030 and c08-i0031 are constants.

Implementing the labeling models of Eqs. 8.1 and 8.2 requires:

  • Constructing the graph c08-i0091 and characterizing, using some unary and binary descriptors, the geometry of each of (i) its nodes, which represent the surface patches, and (ii) its edges, which represent the spatial relations between adjacent patches (Section 8.3).
  • Defining the terms of the energy functions and setting their parameters (Section 8.4).
  • Solving the optimization problem using some numerical algorithms (Section 8.5).

The subsequent sections describe these aspects in detail. In what follows, we consider that the input shapes have been normalized for translation and scale using any of the normalization methods described in Chapter 2.

8.3 Graph Representation

There are several ways of constructing the intramesh graph c08-i0092 for a given input shape c08-i0093. The simplest way is to treat each face of c08-i0094 as a node. Two nodes will be connected with an edge if their corresponding faces on c08-i0095 are adjacent. When dealing with large meshes, this method produces large graphs and thus the subsequent steps of the analysis will be computationally very expensive. Another solution is to follow the same concept as superpixels for image analysis 211. For instance, one can decompose a 3D shape into patches. Each patch will be represented with a node in c08-i0096. Two nodes will be connected with an edge if their corresponding patches are spatially adjacent on c08-i0097. Figure 8.2 shows an example of a 3D object decomposed into disjoint patches.
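As an illustration, the following Python sketch builds the simplest variant of this graph: one node per triangle and an edge between two triangles that share a mesh edge. The array-based mesh representation and the use of NetworkX are assumptions made for the sake of the example, not part of any specific method from the literature.

```python
import numpy as np
import networkx as nx

def build_face_adjacency_graph(vertices, faces):
    """Build an intramesh graph: one node per face, an edge between two
    faces whenever they share a mesh edge.

    vertices: (n, 3) float array of 3D vertex positions.
    faces:    (m, 3) int array of vertex indices (triangles).
    """
    graph = nx.Graph()
    centroids = vertices[faces].mean(axis=1)      # one 3D centroid per face
    for f, c in enumerate(centroids):
        graph.add_node(f, centroid=c)

    # Map each undirected mesh edge (i, j) to the faces that contain it.
    edge_to_faces = {}
    for f, (a, b, c) in enumerate(faces):
        for i, j in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault((min(i, j), max(i, j)), []).append(f)

    # Two faces sharing a mesh edge become adjacent nodes in the graph.
    for shared_faces in edge_to_faces.values():
        for f1, f2 in zip(shared_faces, shared_faces[1:]):
            graph.add_edge(f1, f2)
    return graph
```

For patch-based graphs, the same construction applies with faces replaced by patches and adjacency tested on shared boundary edges.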


Figure 8.2 An example of a surface represented as a graph of nodes and edges. First, the surface is decomposed into patches. Each patch is represented with a node. Adjacent patches are connected with an edge. The geometry of each node c08-i0098 is represented with a unary descriptor c08-i0099. The geometric relation between two adjacent nodes c08-i0100 and c08-i0101 is represented with a binary descriptor c08-i0102.

When jointly labeling a pair of 3D models c08-i0103 and c08-i0104, one can incorporate additional intermesh edges to link nodes in the graph of c08-i0105 to nodes in the graph of c08-i0106. This will require pairing, or putting in correspondence, patches across c08-i0107 and c08-i0108.

8.3.1 Characterizing the Local Geometry and the Spatial Relations

The next step is to compute a set of unary descriptors c08-i0109, which characterize the local geometry of the graph nodes c08-i0110, and a set of pairwise descriptors c08-i0111 that characterize the spatial relation between adjacent nodes c08-i0112 and c08-i0113.

8.3.1.1 Unary Descriptors

The descriptors c08-i0114 at each node c08-i0115 characterize the local geometry of the patch represented by c08-i0116. They provide cues for patch labeling. In principle, one can use any of the local descriptors described in Chapter 5. Examples include:

  • The shape diameter function (SDF) 212, which gives an estimate of the thickness of the shape at the center of the patch.
  • The area of the patch normalized by the total shape area.
  • The overall geometry of the patch, which is defined as:
    equation
    where c08-i0117 are the three eigenvalues obtained when applying principal component analysis to all the points on the surface of the patch.

Other descriptors that can be incorporated include the principal curvatures within the patch, the average of the geodesic distances 213, the shape contexts 135, and the spin images 95.
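As a concrete, simplified example, the sketch below computes two of these unary cues for a single patch: the normalized patch area and the PCA eigenvalues of the patch points (a rough proxy for the "overall geometry" descriptor above). The function name and the point-sample representation of a patch are illustrative assumptions.

```python
import numpy as np

def unary_descriptor(patch_points, patch_area, total_area):
    """A toy unary descriptor x_i for one patch.

    patch_points: (k, 3) array of points sampled on the patch surface.
    patch_area:   surface area of the patch.
    total_area:   total surface area of the shape.
    Returns a small feature vector combining the relative area and the
    PCA eigenvalue profile of the patch.
    """
    centered = patch_points - patch_points.mean(axis=0)
    cov = centered.T @ centered / max(len(patch_points) - 1, 1)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # lambda_1 >= lambda_2 >= lambda_3
    eigvals = eigvals / (eigvals.sum() + 1e-12)         # scale-invariant eigenvalue profile
    return np.concatenate(([patch_area / total_area], eigvals))
```

In practice, such simple cues would be concatenated with richer descriptors (SDF, curvatures, shape contexts, spin images) into a single vector per node.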

8.3.1.2 Binary Descriptors

In addition to the unary descriptors, one can define, for each pair of adjacent patches c08-i0118 and c08-i0119, a vector of pairwise features c08-i0120 that provides cues as to whether the adjacent nodes should have the same label 209. This models the fact that two adjacent patches on the surface of a 3D object are likely to have similar labels. Examples of such pairwise features include:

  • The dihedral angles c08-i0121, i.e. the angle between the normal vectors of adjacent patches. The normal vectors can be computed at the center of each patch or as the average of the normal vectors of the patch faces.
  • The distance c08-i0122 between the centers of the patches c08-i0123 and c08-i0124. This distance can be Euclidean or geodesic.

These pairwise descriptors are computed for each pair of adjacent nodes c08-i0125 and c08-i0126 that are connected with an edge c08-i0127.
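The sketch below computes these two pairwise cues for a pair of adjacent patches from their centroids and average normals; using the Euclidean (rather than geodesic) center distance is a simplifying assumption.

```python
import numpy as np

def pairwise_descriptor(centroid_i, normal_i, centroid_j, normal_j):
    """A toy pairwise descriptor x_ij for two adjacent patches.

    centroid_*: (3,) patch centers.
    normal_*:   (3,) average patch normals (not necessarily unit length).
    Returns (dihedral_angle, euclidean_distance).
    """
    n_i = normal_i / np.linalg.norm(normal_i)
    n_j = normal_j / np.linalg.norm(normal_j)
    cos_angle = np.clip(np.dot(n_i, n_j), -1.0, 1.0)
    dihedral = np.arccos(cos_angle)                 # angle between patch normals, in [0, pi]
    distance = np.linalg.norm(centroid_i - centroid_j)
    return np.array([dihedral, distance])
```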

8.3.2 Cross Mesh Pairing of Patches

The intermesh edges c08-i0128, which connect a few nodes in c08-i0129 to nodes in c08-i0130, can be obtained in different ways. For instance, they can be manually provided, e.g. in the form of constraints or prior knowledge. They can also be determined by pairing nodes that have similar geometric descriptors. This is, however, prone to errors since a 3D shape may have several parts that are semantically different but geometrically very similar (e.g. the legs and the tail of a quadruped animal). Since the purpose of these intermesh edges is to guide the joint labeling to ensure consistency, a few reliable intermesh edges are usually sufficient. These can be efficiently computed in a supervised manner if training data are available.

Let c08-i0131 and c08-i0132 be the graphs of two training shapes c08-i0133 and c08-i0134. Let also c08-i0135 be a set of matching pairs of nodes c08-i0136, where c08-i0137 and c08-i0138. We assign to each pair of nodes c08-i0139 in c08-i0140 a label 1 if the pairing is correct, i.e. the node c08-i0141 is a true correspondence to the node c08-i0142, and 0 otherwise. The set of nodes in c08-i0143 and their associated labels can then be used as training samples for learning a binary classifier, which takes a pair of nodes and assigns them the label 1 if the pair is a correct correspondence, and the label 0 otherwise. This is a standard binary classification problem, which, in principle, can be solved using any type of supervised learning algorithm, e.g. Support Vector Machines 214 or AdaBoost 215, 216. van Kaick et al. 203, for example, used GentleBoost 215, 216 to learn this classifier. GentleBoost, just like AdaBoost, is an ensemble classifier, which combines the output of weak classifiers to build a strong classifier. Its advantage is that one can use a large set of descriptor types to characterize the geometry of each node. The algorithm learns from the training examples the descriptors that provide the best performance. It also assigns a confidence, or a weight, to each of the selected descriptors.

The main difficulty with a training‐based approach is in building the list of pairs c08-i0144 and their associated labels. Since these are used only during the training stage, one could do it manually. Alternatively, van Kaick et al. 203 suggested an automatic approach, which operates as follows:

  • First, for a given pair of shapes represented with their graphs c08-i0145 and c08-i0146, collect all their nodes and compute c08-i0147 types of geometric descriptors on each of their nodes. These can be any of the unary descriptors described in Section 8.3.1.
  • Next, for each node in c08-i0148, select the first c08-i0149 pairwise assignments with the highest similarity and add them to c08-i0150. The similarity is given by the inverse of the Euclidean distance between the descriptors of the two nodes.
  • Finally, build a matrix c08-i0151 of c08-i0152 rows and c08-i0153 columns where each entry c08-i0154 stores the dissimilarity between the c08-i0155th pair of nodes in c08-i0156 but computed using the c08-i0157th descriptor type.

The matrix c08-i0158 will be used as input to the GentleBoost‐based training procedure, which will learn a strong classifier as a linear combination of weak classifiers, each based on a single descriptor type. The outcome of the training is a classifier that labels a candidate assignment as true or false. At run time, given a pair of query shapes, candidate pairs of nodes are selected with the same procedure used during the training stage. Then, using the learned classifier, each of these pairs is classified as either a valid or an invalid intermesh edge. This procedure produces a small, yet reliable, set of intermesh edges.
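A minimal sketch of the candidate-generation step described above (top-k pairing by descriptor similarity) is given below; the function name and data layout are illustrative assumptions.

```python
import numpy as np

def candidate_intermesh_pairs(desc_src, desc_tgt, k=5):
    """Select candidate intermesh edges: for every node of the source graph,
    keep the k target nodes with the most similar descriptors.

    desc_src: (n_s, d) descriptor matrix of the source nodes.
    desc_tgt: (n_t, d) descriptor matrix of the target nodes.
    Returns a list of (source_node, target_node, distance) triples.
    """
    pairs = []
    for i, x in enumerate(desc_src):
        dists = np.linalg.norm(desc_tgt - x, axis=1)   # Euclidean distance in descriptor space
        for j in np.argsort(dists)[:k]:
            pairs.append((i, int(j), float(dists[j])))
    return pairs
```

These candidates would then be filtered by the learned binary classifier, and only the pairs classified as correct correspondences are kept as intermesh edges.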

8.4 Energy Functions for Semantic Labeling

The energy to be minimized by the labeling process, i.e. Eqs. 8.1 and 8.2, is composed of two types of terms: the unary (or data) term and the pairwise (or smoothness) term. The data term takes into consideration how likely it is that a given node has a specific label. The smoothness term considers the intramesh (Eq. 8.1) and intermesh (Eq. 8.2) connectivity between nodes. It quantifies, according to a pairwise cost, how likely it is that two neighboring nodes (i.e. two nodes which are connected with an edge) have a specific pair of labels. Below, we discuss these two terms in more detail. We also refer the reader to Table 8.1.

8.4.1 The Data Term

The data term is defined as the negative log likelihood that a node c08-i0159 is part of a segment labeled c08-i0160:
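From the description that follows, a natural form for this term is an area-weighted negative log likelihood (the notation here is generic and may differ from the original):

(8.3)   $E_{D}(l_i = c;\, x_i) \;=\; -\,a_i \,\log P\!\left(c \mid x_i\right),$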

where c08-i0161 is the area of the surface patch corresponding to the c08-i0162th node. This likelihood can also be interpreted as a classifier, which takes as input the feature vector c08-i0163 of a node c08-i0164 and returns c08-i0165, the probability distribution of labels for that node. This classifier, which provides a probabilistic labeling of the nodes of the query shapes, can be learned in either a supervised or an unsupervised manner (see Section 8.5). It provides a good initial labeling since it captures part interiors very well but not part boundaries. This initial labeling can also be interpreted as an initial segmentation in which the nodes sharing the same label form a segment.

8.4.2 Smoothness Terms

To refine the result of the probabilistic labeling of Eq. 8.3, one can add other terms to the energy function to impose additional constraints and to incorporate some a‐priori knowledge about the desired labeling. The main role of these terms is to improve the quality of the boundaries between the segments and to prevent incompatible segments from being adjacent. Examples of such constraints include:

8.4.2.1 Smoothness Constraints

In general, adjacent nodes in a 3D model are likely to belong to the same segment. Thus, they should be assigned the same label. The smoothness term can then take the following form:

(8.4)   $E_{S}(l_i, l_j) \;=\; \begin{cases} 0 & \text{if } l_i = l_j, \\ \mu & \text{otherwise.} \end{cases}$

Here, c08-i0166 is a positive constant, which can be interpreted as the cost of having two adjacent nodes labeled differently. For instance, two adjacent nodes are likely to have the same label if their dihedral angle is close to zero, i.e. the patches are coplanar. They are likely to have different labels if they are far apart (large c08-i0167) or their corresponding dihedral angle is large. Using these observations, Sidi et al. 210 define c08-i0168 as:

(8.5)equation

Here, c08-i0169 where c08-i0170 is the distance between the centers of the patches c08-i0171 and c08-i0172, and c08-i0173, also known as the dihedral angle, is the angle between the normals to the two patches at their centers.

8.4.2.2 Geometric Compatibility

In addition to the smoothness constraints, one can incorporate a geometry‐dependent term c08-i0174, which measures the likelihood of a difference in labels between two adjacent nodes as a function of their geometry. For instance, one can penalize boundaries in flat regions and favor boundaries between nodes that have a high dihedral angle (i.e. concave regions). One can also use it to penalize the boundary length, which helps prevent jagged boundaries and remove small, isolated segments. van Kaick et al. 203 define this term as:

where c08-i0175 and c08-i0176 are some real weights that control the importance of each of the two geometric properties (i.e. c08-i0177, which is the distance between the centers of the patches c08-i0178 and c08-i0179, and the dihedral angle c08-i0180).

Kalogerakis et al. 209 used a more complex term, which is defined as follows:

The first term, c08-i0181, measures the probability of two adjacent nodes having distinct labels as a function of their pairwise geometric features c08-i0182. These consist of the dihedral angles between the nodes c08-i0183 and c08-i0184, and the differences of the various local features computed at the nodes, e.g. the differences of the surface curvatures and the surface third‐derivatives, the shape diameter differences, and the differences of distances from the medial surface points. The c08-i0185 term of Eq. 8.7 penalizes boundaries between the faces with a high exterior dihedral angle c08-i0186, following Shapira et al. 217. c08-i0187 and c08-i0188 are small constants, which are set to c08-i0189. They are added to avoid computing c08-i0190 and to avoid numerical instabilities 217, 218.

Note that there are similarities between Eqs. 8.6 and 8.7. For instance, the second term of both equations is defined in terms of the dihedral angle c08-i0191. For the first term, although in both equations it is defined using the distance c08-i0192, Eq. 8.7 uses, in addition to the dihedral angle, more pairwise features and is defined as the log probability of two adjacent nodes having different labels given their pairwise geometric features. It is more general since any type of probability can be used. However, these probabilities should be learned, in a supervised manner, from training data. This will be discussed in Section 8.5.

8.4.2.3 Label Compatibility

Finally, one can impose penalties on certain combinations of labels. For instance, when labeling human body shapes, one can never find two adjacent nodes that are labeled head and torso, respectively. This label‐compatibility constraint can be represented as an c08-i0193 symmetric matrix c08-i0194 of penalties for each possible pair of labels (c08-i0195 here is the number of possible semantic labels). Note that c08-i0196 for all c08-i0197, since there is no penalty when there is no discontinuity. The entries of the matrix c08-i0198 can be either manually specified or learned in a supervised manner from training data (see Section 8.5).

8.4.3 The Intermesh Term

Finally, the intermesh term c08-i0199 is given by
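Based on the description below, this term plausibly takes a multiplicative form like the following (a sketch in generic notation, not necessarily the exact expression used in the original work):

(8.8)   $E_{I}(l_i, l'_j) \;=\; \gamma \, w(v_i, v'_j)\, C(l_i, l'_j),$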

where c08-i0200 is a node on the graph of the source mesh, c08-i0201 is a node on the graph of the target mesh, and c08-i0202 is the label compatibility defined in Section 8.4.2. c08-i0203 is the confidence that the assignment between nodes c08-i0204 and c08-i0205 is correct. It is computed using the procedure described in Section 8.3.2. The higher the confidence attached to the assignment, the higher the cost when the labels differ. The parameter c08-i0206 regulates the influence of the intermesh term on the total energy.

8.5 Semantic Labeling

Before we discuss the algorithms that can be used to optimize the energy functions of Eqs. 8.1 and 8.2, we first discuss how to set their different terms. In particular, both the data and smoothness terms of the labeling energy require the probability that a node c08-i0207 is part of a segment labeled c08-i0208 (Eq. 8.3), and the probability that two adjacent nodes have distinct labels (Eq. 8.7). These two distributions can be learned if some training data are available.

Let us assume that for each semantic label c08-i0209 we have c08-i0210 training nodes coming from different models. By computing the descriptors of each node, we obtain a collection of descriptors, c08-i0211, for each label c08-i0212. These training nodes can be obtained either manually, by labeling some training samples, or automatically, by using unsupervised clustering techniques. We will first look at the latter case, i.e. how to build the training data (a set of nodes and their corresponding labels) using unsupervised clustering. We will then detail how the probability distributions of Eqs. 8.3 and 8.7 are learned, independently of how the training data are obtained.

8.5.1 Unsupervised Clustering

One approach to automatically generate training samples for learning the conditional probability of Eq. 8.3 is by unsupervised clustering in the descriptor space. Let us start with a collection of 3D shapes, each represented with its graph of nodes. The first step is to pre‐segment, in an unsupervised manner, each shape by clustering its nodes based on their distance in the descriptor space. Second, all the segments of all the shapes are put together into a single bag. They are then clustered into groups of segments based on their geometric similarities. Each cluster obtained with this procedure corresponds to one semantic component (e.g. legs, handles, etc.) to which we assign a semantic label. Finally, since each cluster contains multiple instances of the semantic class with rich geometric variability, one can learn the statistics of that class. In particular, one can learn c08-i0213. This process is summarized in Figure 8.3. Below, we describe in detail each of these steps.


Figure 8.3 Unsupervised learning of the statistics of the semantic labels. (a) Training set. (b) Presegmentation of individual shapes. (c) Embedding in the descriptor space. (d) Clustering. (e) Statistical model per semantic label.

  • Step 1 – Presegmentation of individual shapes. After computing a shape descriptor c08-i0214 for each node c08-i0215 of a given 3D model, one can obtain an initial segmentation of the input 3D shape by clustering all of its nodes in the descriptor space using unsupervised algorithms such as k‐means or mean‐shift 219. Both algorithms require the specification of one parameter: k‐means requires setting the number of clusters in advance. Mean‐shift, on the other hand, operates by finding the modes (i.e. the local maxima of the density, which serve as cluster centers) of the points in the descriptor space. The advantage of mean‐shift is that the number of clusters does not have to be known in advance. Instead, the algorithm requires an estimate of the bandwidth, or the radius of support, around the neighborhood of a point. This can be estimated from the data. For instance, Shamir et al. 220 and Sidi et al. 210 fix the bandwidth to a percentage of the range of each descriptor.

    This procedure produces a pre‐segmentation of each individual shape into a set of clusters. The last step is to break spatially disconnected clusters into separate segments.

  • Step 2 – Cosegmentation. Next, to produce the final semantic clusters, which will be used to learn the probability distribution of each semantic label, we cluster, in an unsupervised manner, all the segments produced in Step 1. These segments are first embedded into a common space via a diffusion map. The embedding translates the similarity between segments into spatial proximity so that a clustering method based on the Euclidean distance can find meaningful clusters in this space.

    Let c08-i0216 be the set of segments obtained for all shapes and let c08-i0217 be the dissimilarity between two segments c08-i0218 and c08-i0219. First, we construct an affinity matrix c08-i0220 such that each of its entries c08-i0221 is given as:

    (8.9)   $W_{ij} \;=\; \exp\!\left( -\,\dfrac{d^{2}(s_i, s_j)}{2\sigma^{2}} \right),$

    The parameter c08-i0222 can be set either manually or computed automatically from the data by taking the standard deviation of all possible pairwise distances c08-i0223.

    Now, let c08-i0224 be a diagonal matrix such that c08-i0225 and c08-i0226 for c08-i0227. We define the matrix c08-i0228, which can be seen as a stochastic matrix with c08-i0229 being the probability of a transition from the segment c08-i0230 to the segment c08-i0231 in one time step. The transition probability can be interpreted as the strength of the connection between the two segments.

    Next, we compute the eigendecomposition of the matrix c08-i0232 and obtain the eigenvalues c08-i0233 and their corresponding eigenvectors c08-i0234. The diffusion map at time c08-i0235 is given by:

    (8.10)   $\Psi_t(s_i) \;=\; \left( \lambda_1^{\,t}\, \psi_1(i),\; \lambda_2^{\,t}\, \psi_2(i),\; \dots,\; \lambda_k^{\,t}\, \psi_k(i) \right),$

    Here, c08-i0236 refers to the c08-i0237‐th entry of the eigenvector c08-i0238, and c08-i0239 refers to c08-i0240 raised to the power c08-i0241 (Sidi et al. 210 used c08-i0242). The point c08-i0243 defines the coordinates of the segment c08-i0244 in the embedding space. In practice, one does not need to use all the eigenvectors; the first leading ones (e.g. the first three) are often sufficient. Note that the eigenvector c08-i0245 is constant and is thus discarded.

Finally, a cosegmentation is obtained by clustering the segments in the embedded space. Essentially any clustering method, e.g. k‐means or fuzzy clustering, can be used. However, the number of clusters, which corresponds to the number of semantic parts (e.g. base, body, handle, or neck of a vase) that constitute the shapes, is usually specified manually by the user.

The output of this procedure is a set of node clusters where each cluster corresponds to one semantic category (e.g. base, body, handle, or neck of a vase) to which we assign a label c08-i0246.
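To make Steps 1 and 2 concrete, the following sketch embeds a set of pre-segments, given only their pairwise dissimilarity matrix, with a diffusion map and then clusters the embedded points with k-means. The library choices (NumPy, scikit-learn), the function name, and the default diffusion time are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cosegment(dissimilarity, n_labels, t=1, n_coords=3):
    """Cluster pre-segments into semantic labels via a diffusion-map embedding.

    dissimilarity: (n, n) symmetric matrix of segment-to-segment dissimilarities.
    n_labels:      number of semantic parts (chosen by the user).
    t:             diffusion time (a free parameter).
    n_coords:      number of embedding coordinates to keep.
    """
    sigma = dissimilarity[np.triu_indices_from(dissimilarity, k=1)].std()
    W = np.exp(-dissimilarity ** 2 / (2.0 * sigma ** 2))   # Gaussian affinities
    D_inv = np.diag(1.0 / W.sum(axis=1))
    M = D_inv @ W                                          # row-stochastic transition matrix

    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)                      # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

    # Drop the constant eigenvector (eigenvalue 1) and keep the next n_coords.
    coords = eigvecs[:, 1:n_coords + 1] * (eigvals[1:n_coords + 1] ** t)

    labels = KMeans(n_clusters=n_labels, n_init=10).fit_predict(coords)
    return labels, coords
```

Each returned cluster index then plays the role of a semantic label, and the descriptors of the nodes falling into that cluster become the training set for learning the labeling likelihood.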

8.5.2 Learning the Labeling Likelihood

The next step is to learn a statistical model of each semantic cluster. For each semantic cluster labeled c08-i0247, we collect into a set c08-i0248 the descriptor values for all of the nodes (surface patches) which form the cluster. One can then learn a conditional probability function c08-i0249 using the descriptors in c08-i0250 as positive examples and the remaining descriptors (i.e. the descriptors of the nodes whose labels are different from c08-i0251) as negative ones. This conditional probability defines the likelihood of assigning a certain label for an observed vector of descriptors.

To learn this conditional probability distribution, van Kaick et al. 203 first use the GentleBoost algorithm 215, 216 to train one classifier per semantic label. Each classifier returns the confidence value c08-i0252, i.e. the confidence of c08-i0253 belonging to the class whose label is c08-i0254. The unnormalized confidence values returned by each classifier are then transformed into probabilities with the softmax activation function as follows:
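In generic notation, with $H(c \mid x)$ denoting the confidence returned by the classifier trained for label $c$, the softmax transformation takes the usual form:

(8.11)   $P(c \mid x) \;=\; \dfrac{\exp\big(H(c \mid x)\big)}{\sum_{c'} \exp\big(H(c' \mid x)\big)}.$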

For a given 3D shape, which is unseen during the training phase, this statistical model is used to estimate the probability that each of its nodes c08-i0255 belongs to each of the classes in c08-i0256.

8.5.2.1 GentleBoost Classifier

Here, we briefly summarize the GentleBoost classifier and refer the reader to 216 for more details. GentleBoost takes an input feature c08-i0257 and outputs a probability c08-i0258 for each possible label c08-i0259 in c08-i0260. It is composed of decision stumps. A decision stump with parameters c08-i0261 is a very simple classifier that returns a score c08-i0262 for each possible class label c08-i0263, given a feature vector c08-i0264, based only on thresholding its c08-i0265th entry c08-i0266. It proceeds as follows:

  • First, a decision stump stores a set of classes c08-i0267.
  • If c08-i0268, then the stump compares c08-i0269 against a threshold c08-i0270, and returns a constant c08-i0271 if c08-i0272, and another constant c08-i0273 otherwise.
  • If c08-i0274, then a constant c08-i0275 is returned. There is one c08-i0276 for each c08-i0277.

This can be written mathematically as follows:
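Transcribing the three cases above in generic notation (the symbols may differ from those of the original), the stump reads:

(8.12)   $h(x, c) \;=\; \begin{cases} a & \text{if } c \in \mathcal{C}_s \text{ and } x_f > \theta, \\ b & \text{if } c \in \mathcal{C}_s \text{ and } x_f \le \theta, \\ k_c & \text{if } c \notin \mathcal{C}_s, \end{cases}$

where $\mathcal{C}_s$ denotes the set of classes stored by the stump.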

The parameters c08-i0278 of a single decision stump are c08-i0279, c08-i0280 (the entry of c08-i0281 which is thresholded), the set c08-i0282, and c08-i0283 for each c08-i0284. The probability of a given class c08-i0285 is computed by summing the decision stumps:

where c08-i0286 is the c08-i0287th decision stump and c08-i0288 are its parameters. The unnormalized confidence value returned by the classifier c08-i0289 for a given feature vector c08-i0290 is then given by:

(8.14)equation

Finally, the probability of c08-i0291 belonging to the class of label c08-i0292 is computed using the softmax transformation of Eq. 8.11.
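A compact sketch of this classifier (decision stumps combined with the softmax of Eq. 8.11) is given below; the dictionary-based data structures are illustrative assumptions and do not follow any particular implementation.

```python
import numpy as np

def stump_score(x, c, stump):
    """Score of one decision stump for feature vector x and class label c.

    stump: dict with keys
      'f' (feature index), 'theta' (threshold), 'a', 'b' (responses),
      'classes' (set of shared classes), 'k' (dict of per-class constants).
    """
    if c in stump['classes']:
        return stump['a'] if x[stump['f']] > stump['theta'] else stump['b']
    return stump['k'][c]

def classify(x, stumps, labels):
    """Sum the stump scores per label and turn them into probabilities
    with a softmax (Eq. 8.11)."""
    H = np.array([sum(stump_score(x, c, s) for s in stumps) for c in labels])
    H = H - H.max()                   # subtract the max for numerical stability
    p = np.exp(H)
    return p / p.sum()
```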

8.5.2.2 Training GentleBoost Classifiers

The goal is to learn, from training data, the appropriate number of decision stumps in Eq. 8.13 and the parameters c08-i0293 of each decision stump c08-i0294. The input to the training algorithm is a collection of c08-i0295 pairs c08-i0296, where c08-i0297 is a feature vector and c08-i0298 is its corresponding label value. Furthermore, each training sample c08-i0299 is assigned a per‐class weight c08-i0300. These depend on whether we are learning the unary term of Eq. 8.3 or the elements of the binary term of Eq. 8.7:

  • For the unary term of Eq. 8.3, the training pairs are the per‐node feature vectors and their labels, i.e. c08-i0301, for all the graph nodes in the training set. The weight c08-i0302 is set to be the area of the c08-i0303th node.
  • For the pairwise term of Eq. 8.7, the training pairs are the pairwise feature vectors (see Section 8.3.1) and their binary labels. That is c08-i0304. The weight c08-i0305 is used to reweight the boundary edges, since the training data contain many more nonboundary edges. Let c08-i0306 and c08-i0307 be, respectively, the number of boundary edges and the number of nonboundary edges. Then, c08-i0308 for boundary edges and c08-i0309 for nonboundary edges, where c08-i0310 is the corresponding edge length.

GentleBoost learns the appropriate number of decision stumps and the parameters of each decision stump that minimize the weighted multiclass exponential loss over the training set:

(8.15)   $J \;=\; \sum_{i=1}^{N} \sum_{c} w_i^{c}\, \exp\!\big( -\, I(c_i, c)\, H(x_i, c) \big),$

where c08-i0311 is given by Eq. 8.13, and c08-i0312 is an indicator function. It takes the value of 1 when c08-i0313 and the value c08-i0314 otherwise. The training is performed iteratively as follows:

  1. Set c08-i0315.
  2. At each iteration, add one decision stump (Eq. 8.12) to the classifier.
  3. Compute the parameters c08-i0316 of the newly added decision stump c08-i0317 by optimizing the following weighted least‐squares objective:
    (8.16)   $J_{wse} \;=\; \sum_{i=1}^{N} \sum_{c} w_i^{c}\, \big( I(c_i, c) - h_m(x_i, c) \big)^{2},$
  4. Update the weights c08-i0318 as follows:
    (8.17)   $w_i^{c} \;\leftarrow\; w_i^{c}\, \exp\!\big( -\, I(c_i, c)\, h_m(x_i, c) \big),$
  5. Repeat from Step 2. The algorithm stops after a few hundred iterations.

Following Torralba et al. 216, the optimal c08-i0319 are computed in closed form, and c08-i0320 are computed by brute force. When the number of labels is greater than six, a greedy heuristic search is used for c08-i0321.

Belonging to the family of boosting algorithms, GentleBoost has many appealing properties: it performs automatic feature selection and can handle large numbers of input features for multiclass classification. It has a fast learning algorithm. It is designed to share features among classes, which greatly reduces the generalization error for multiclass recognition when classes overlap in the feature space.
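For illustration, the sketch below trains one binary GentleBoost classifier per semantic label (the one-classifier-per-label strategy mentioned above for van Kaick et al.), fitting one regression stump per round by weighted least squares and reweighting the samples in the spirit of Eq. 8.17. It deliberately omits the feature-sharing search over subsets of classes and the per-class constants of the full algorithm; it is a simplified sketch under these assumptions, not the implementation used in the cited works.

```python
import numpy as np

def fit_stump(X, z, w, n_thresholds=16):
    """Fit one regression stump h(x) = a if x[f] > theta else b by weighted
    least squares, i.e. minimizing sum_i w[i] * (z[i] - h(x_i))^2.
    X: (N, d) features; z: (N,) targets in {+1, -1}; w: (N,) sample weights."""
    best_err, best = np.inf, None
    for f in range(X.shape[1]):
        for theta in np.quantile(X[:, f], np.linspace(0.05, 0.95, n_thresholds)):
            above = X[:, f] > theta
            a = (w[above] * z[above]).sum() / (w[above].sum() + 1e-12)
            b = (w[~above] * z[~above]).sum() / (w[~above].sum() + 1e-12)
            err = (w * (z - np.where(above, a, b)) ** 2).sum()
            if err < best_err:
                best_err, best = err, {'f': f, 'theta': float(theta), 'a': a, 'b': b}
    return best

def train_one_vs_rest(X, y, label, n_rounds=50, sample_weights=None):
    """Binary GentleBoost for one semantic label (positives: y == label).
    Each round fits a stump to the current weights, then multiplies the
    weights by exp(-z * h)."""
    z = np.where(y == label, 1.0, -1.0)
    w = np.ones(len(y)) if sample_weights is None else sample_weights.copy()
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(X, z, w)
        h = np.where(X[:, s['f']] > s['theta'], s['a'], s['b'])
        w *= np.exp(-z * h)                    # emphasize the misclassified samples
        stumps.append(s)
    return stumps

def confidences(X, classifiers):
    """H(c|x) for every label: sum the stump responses of each per-label
    classifier. A softmax over the columns (Eq. 8.11) then yields P(c|x)."""
    H = np.zeros((len(X), len(classifiers)))
    for c, stumps in enumerate(classifiers):
        for s in stumps:
            H[:, c] += np.where(X[:, s['f']] > s['theta'], s['a'], s['b'])
    return H
```

A typical usage would be `classifiers = [train_one_vs_rest(X, y, c) for c in range(n_labels)]`, with the per-node areas passed as `sample_weights` when learning the unary term.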

8.5.3 Learning the Remaining Parameters

The remaining parameters, c08-i0322 (Eqs. 8.6 and 8.7), c08-i0323 (the label compatibility, see the second row of Table 8.1), and c08-i0324 (Eq. 8.8), can be learned by performing a grid search over a specified range. Given a training dataset, and for each possible value of these parameters, one can evaluate the classification accuracy and select those which give the best score. One possible measure of the classification accuracy is the Segment‐Weighted Error of Kalogerakis et al. 209, which weighs each segment equally:

(8.18)   $E \;=\; \sum_{i} \dfrac{a_i}{A_{t_i}}\, \mathbb{1}\!\left[ c_i \neq t_i \right],$

where c08-i0325 is the area of the c08-i0326th node, c08-i0327 is the ground‐truth label for the c08-i0328th node, c08-i0329 is the output of the classifier for the c08-i0330th node, c08-i0331 is the total area of all nodes (patches) within the segment c08-i0332 that has ground‐truth label c08-i0333, and c08-i0334 if c08-i0335, and zero otherwise.

To improve computational efficiency, these parameters can be optimized in two steps. First, the error is minimized over a coarse grid in the parameter space by brute‐force search. Second, starting from the minimal point on the grid, the optimization continues using the MATLAB implementation of Preconditioned Conjugate Gradient with numerically estimated gradients 209.

8.5.4 Optimization Using Graph Cuts

Now that we have defined all the ingredients, the last step is to solve the optimization problems of Eqs. 8.1 and 8.2. One can use multilabel graph cuts for this purpose to assign labels to the nodes in an optimal manner. In general, graph‐cut algorithms require the pairwise term to be a metric, which is not necessarily the case in our setup. Thus, one can use the α–β swap algorithm 221. By minimizing the given energy, we obtain the most likely label for each graph node (and thus surface patch) while also avoiding the creation of small disconnected segments. Algorithm 8.1 summarizes the graph‐cut α–β swap algorithm. We refer the reader to the paper of Boykov et al. 221 for a detailed discussion of the algorithm.
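To give a feel for the optimization, the sketch below mirrors the overall structure of the α–β swap procedure for a Potts-style smoothness cost μ. For the sake of self-containment, the binary subproblem for each pair of labels is solved here by a naive greedy pass over the affected nodes rather than by the max-flow/min-cut construction used in practice, so this illustrates the control flow only, not the optimizer of the cited works.

```python
import itertools
import numpy as np

def energy(labels, unary, edges, mu):
    """E(l) = sum_i unary[i, l_i] + mu * #{(i, j) in edges : l_i != l_j}."""
    e = unary[np.arange(len(labels)), labels].sum()
    return e + mu * sum(labels[i] != labels[j] for i, j in edges)

def swap_move(labels, alpha, beta, unary, adjacency, mu, n_passes=5):
    """Naive stand-in for the alpha-beta swap move: nodes currently labeled
    alpha or beta greedily pick whichever of the two labels lowers their
    local cost. (In practice this binary subproblem is solved exactly
    with a min-cut.)"""
    labels = labels.copy()
    active = np.where((labels == alpha) | (labels == beta))[0]
    for _ in range(n_passes):
        for i in active:
            costs = [unary[i, l] + mu * sum(labels[j] != l for j in adjacency[i])
                     for l in (alpha, beta)]
            labels[i] = (alpha, beta)[int(np.argmin(costs))]
    return labels

def alpha_beta_swap(unary, edges, mu, n_labels, max_sweeps=10):
    """Outer loop: repeatedly try to improve the labeling by swapping every
    pair of labels, until no swap lowers the energy."""
    adjacency = {i: [] for i in range(len(unary))}
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    labels = np.argmin(unary, axis=1)         # initialize from the data term alone
    best_e = energy(labels, unary, edges, mu)
    for _ in range(max_sweeps):
        improved = False
        for alpha, beta in itertools.combinations(range(n_labels), 2):
            candidate = swap_move(labels, alpha, beta, unary, adjacency, mu)
            e = energy(candidate, unary, edges, mu)
            if e < best_e:
                labels, best_e, improved = candidate, e, True
        if not improved:
            break
    return labels
```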


8.6 Examples

In this section, we show a few semantic labeling and correspondence examples obtained using the techniques discussed in this chapter. In particular, we will discuss the importance of each of the energy terms of Eqs. 8.1 and 8.2.

First, Figure 8.4 shows that c08-i0338, and subsequently the data term of Eqs. 8.1 and 8.2, captures the main semantic regions of a 3D shape but fails to accurately capture the boundaries of the semantic regions. By adding the intramesh term, the boundaries become smoother and well localized (Figure 8.5).


Figure 8.4 Effects of the unary term c08-i0339 on the quality of the segmentation. The most likely labels for the head, neck, torso, leg, tail, and ear are also shown.

Source: The image has been adapted from the PowerPoint presentation of Kalogerakis et al. 209 with the permission of the author.


Figure 8.5 Semantic labeling by using (a) only the unary term of Eq. 8.1 and (b) the unary and binary terms of Eq. 8.1. The result in (b) is obtained using the approach of Kalogerakis et al. 209, i.e. using the unary term and binary term defined in the first row of Table 8.1.

Source: The image has been adapted from the PowerPoint presentation of Kalogerakis et al. 209 with the permission of the author.

Figure 8.6 shows samples from the training datasets, which were used as prior knowledge for learning the labeling. Note that, although at runtime the shapes are processed individually, the approach is able to label multiple shapes in a consistent manner. Thus, the labeling provides simultaneously a semantic segmentation and also partwise correspondences between the 3D models. Figure 8.6 also shows the effect of the training data on the labeling. In fact, different segmentation and labeling results are obtained when using different training datasets.


Figure 8.6 Supervised labeling using the approach of Kalogerakis et al. 209. Observe that different training sets lead to different labeling results. The bottom row image is from Reference 209.

©2010 ACM, Inc. Reprinted by permission. http://doi.acm.org/10.1145/1778765.1778839

Finally, Figure 8.7 demonstrates the importance of the intermesh term and, subsequently, of the joint labeling process. In Figure 8.7a, the correspondence is computed based purely on the prior knowledge, i.e. by only using the unary and intramesh terms of Eq. 8.1. Observe that part of the support of the candle on the right is mistakenly labeled as a handle and, therefore, does not have a corresponding part on the shape on the left. However, when the intermesh term is taken into consideration (i.e. the full energy of Eq. 8.2), we obtain a more accurate correspondence, as shown in Figure 8.7b, since the descriptors of the thin structures on both shapes are similar. We observe an analogous situation in the chairs, where the seat of the chair is mistakenly labeled as a backrest due to the geometric distortion present in the model. In Figure 8.7b, the joint labeling is able to obtain the correct part correspondence, guided by the similarity of the seats. This demonstrates that both knowledge and content are essential for finding the correct correspondence in these cases.


Figure 8.7 Effect of the intermesh term on the quality of the semantic labeling. Results are obtained using the approach of van Kaick et al. 203, i.e. using the energy functions of the second row of Table 8.1. (a) Semantic labeling without the intermesh term (i.e. using the energy of Eq. 8.1). (b) Semantic labeling using the intermesh term (i.e. using the energy of Eq. 8.2).

Source: The image has been adapted from 203.

8.7 Summary and Further Reading

We have presented in this chapter some mathematical tools for computing correspondences between 3D shapes that differ in geometry and topology but share some semantic similarities. The solutions covered in this chapter treat this problem as a joint semantic labeling process, i.e. simultaneously and consistently labeling the regions of two or more surfaces. This is formulated as a problem of minimizing an energy function composed of multiple terms: a data term, which encodes the a‐priori knowledge about the semantic labels, and one or more additional terms, which constrain the solution. Most existing techniques differ in the type of constraints encoded in these terms and in the way the energy function is optimized.

The techniques presented in this chapter are based on the work of (i) Kalogerakis et al. 209, which presented a data‐driven approach to simultaneous segmentation and labeling of parts in 3D meshes, (ii) van Kaick et al. 203, which used prior knowledge to learn semantic correspondences, and (iii) Sidi et al. 210, which presented an unsupervised method for jointly segmenting and putting in correspondence multiple objects belonging to the same family of 3D shapes. These are just examples of the state‐of‐the‐art. Other examples include the work of Huang et al. 222, which used linear programming for joint segmentation and labeling, and Laga et al. 204, 223, which observed that structure, i.e. the way shape components are interconnected with each other, can tell a lot about the semantics of a shape. They designed a metric that captures structural similarities. The metric is also used for partwise semantic correspondence and for functionality recognition in man‐made 3D shapes. Feragen et al. 224 considered 3D shapes that have a tree‐like structure and developed a framework, which enables computing correspondences and geodesics between such shapes. The framework has been applied to the analysis of airways trees 224 and botanical trees 225.

Finally, semantic analysis of 3D shapes has been extensively used for efficient modeling and editing of 3D shapes 226–228 and for exploring 3D shape collections 229. For further reading, we refer the reader to the survey of Mitra et al. 205 on structure‐aware shape processing and to the survey of van Kaick et al. 183 on 3D shape correspondence.
