Cluster assumption

This assumption is strictly linked to the previous one and it's probably easier to accept. It can be expressed with a chain of interdependent conditions. Clusters are high density regions; therefore, if two points are close, they are likely to belong to the same cluster and their labels must be the same. Low density regions are separation spaces; therefore, samples belonging to a low density region are likely to be boundary points and their classes can be different. To better understand this concept, it's useful to think about supervised SVM: only the support vectors should be in low density regions. Let's consider the following bidimensional example:

In a semi-supervised scenario, we couldn't know the label of a point belonging to a high density region; however, if it is close enough to a labeled point that it's possible to build a ball where all the points have the same average density, we are allowed to predict the label of our test sample. Instead, if we move to a low-density region, the process becomes harder, because two points can be very close but with different labels. We are going to discuss the semi-supervised, low-density separation problem at the end of this chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.97.187