Completeness score

This score is complementary to the previous one. Its purpose is to provide information about the assignment of samples belonging to the same class. More precisely, a good clustering algorithm should assign all samples with the same true label to the same cluster. From our previous analysis, we know that, for example, the digit 7 has been wrongly assigned to both clusters 9 and 1; therefore, we expect a completeness score lower than 1. The definition is symmetric to the homogeneity score:

c = 1 - H(Ypred|Ytrue) / H(Ypred)
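The following is a minimal sketch of this definition, not scikit-learn's internal code. It assumes that H is the Shannon entropy computed with natural logarithms, and it estimates H(Ypred) and H(Ypred|Ytrue) directly from label counts so the result can be compared with completeness_score. The helper names entropy, conditional_entropy, and completeness, and the toy label vectors, are hypothetical.

```python
import numpy as np
from sklearn.metrics import completeness_score


def entropy(labels):
    # Shannon entropy (natural log) of the empirical label distribution
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def conditional_entropy(y_pred, y_true):
    # H(Ypred | Ytrue): entropy of the predictions inside each true class,
    # weighted by the relative size of that class
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    h = 0.0
    for cls in np.unique(y_true):
        mask = y_true == cls
        h += mask.mean() * entropy(y_pred[mask])
    return h


def completeness(y_true, y_pred):
    # c = 1 - H(Ypred|Ytrue) / H(Ypred); a single cluster (H(Ypred) = 0) counts as complete
    h_pred = entropy(y_pred)
    if h_pred == 0.0:
        return 1.0
    return 1.0 - conditional_entropy(y_pred, y_true) / h_pred


# Hypothetical toy labels: class 1 is split between clusters 1 and 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

print(completeness(y_true, y_pred))        # approximately 0.772
print(completeness_score(y_true, y_pred))  # same value from scikit-learn
```

Returning 1.0 when H(Ypred) is zero mirrors the convention scikit-learn uses for this degenerate case.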

The rationale is very intuitive. When H(Ypred|Ytrue) is low (c → 1), knowing the ground truth reduces the uncertainty about the predictions. Therefore, if we know that all the samples of subset A have the same label yi, we can be quite sure that all the corresponding predictions have been assigned to the same cluster. The completeness score for our example is:

from sklearn.metrics import completeness_score

# digits['target'] holds the true labels, Y the cluster assignments from the previous analysis
print(completeness_score(digits['target'], Y))

Again, the value confirms our hypothesis. The residual uncertainty is due to a lack of completeness: a few samples with the same label have been split into blocks that are assigned to wrong clusters. A perfect scenario is characterized by both homogeneity and completeness scores equal to 1.
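The following toy comparison is a sketch under assumed, hypothetical labels rather than the digits data. It contrasts a perfect assignment with one in which a single class is split into two pure clusters. In the split case, every cluster still contains only one class, so homogeneity stays at 1. Completeness drops because that class is no longer confined to a single cluster.

```python
from sklearn.metrics import completeness_score, homogeneity_score

# Hypothetical ground truth with three classes
y_true = [0, 0, 1, 1, 2, 2]

# Perfect clustering: same partition, only the (arbitrary) cluster ids differ
perfect = [2, 2, 0, 0, 1, 1]

# Class 1 split into two pure clusters (1 and 3): homogeneous but not complete
split = [0, 0, 1, 3, 2, 2]

for name, y_pred in [("perfect", perfect), ("split", split)]:
    h = homogeneity_score(y_true, y_pred)
    c = completeness_score(y_true, y_pred)
    print(f"{name}: homogeneity={h:.3f}, completeness={c:.3f}")
```

Cluster ids are arbitrary, so both scores are invariant to relabeling. Only the way samples are grouped matters.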
