Completeness score

This score is complementary to the previous one. It measures how samples belonging to the same class are assigned: a good clustering algorithm should assign all samples with the same true label to the same cluster. From our previous analysis, we know that, for example, the digit 7 has been wrongly assigned to both clusters 9 and 1; therefore, we expect a non-perfect completeness score. The definition is symmetric to the homogeneity score:

c = 1 - H(Y_pred|Y_true) / H(Y_pred)

The rationale is intuitive. When H(Y_pred|Y_true) is low (c → 1), knowledge of the ground truth reduces the uncertainty about the predictions. Therefore, if we know that all the samples of subset A have the same label y_i, we can be quite sure that all the corresponding predictions have been assigned to the same cluster. The completeness score for our example is:

from sklearn.metrics import completeness_score

print(completeness_score(digits['target'], Y))
0.747718831945

Again, the value confirms our hypothesis. The residual uncertainty is due to a lack of completeness: a few samples with the same label have been split into blocks that are assigned to wrong clusters. In a perfect scenario, both the homogeneity and completeness scores are equal to 1.
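To make the entropy-based definition concrete, the following is a minimal sketch that computes the score by hand on a small toy labeling. The helper names (entropy, conditional_entropy, manual_completeness) and the toy labels are illustrative assumptions, not part of scikit-learn or of the digits example; natural logarithms are used, which does not affect the ratio.

```python
from collections import Counter
from math import log


def entropy(labels):
    # Shannon entropy (natural log) of a sequence of discrete labels.
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def conditional_entropy(y_pred, y_true):
    # H(Y_pred | Y_true): entropy of the cluster assignments inside each
    # true class, weighted by the relative size of that class.
    n = len(y_true)
    h = 0.0
    for c, count in Counter(y_true).items():
        subset = [p for p, t in zip(y_pred, y_true) if t == c]
        h += (count / n) * entropy(subset)
    return h


def manual_completeness(y_true, y_pred):
    # c = 1 - H(Y_pred | Y_true) / H(Y_pred); a single cluster is treated
    # as trivially complete to avoid dividing by zero.
    h_pred = entropy(y_pred)
    if h_pred == 0.0:
        return 1.0
    return 1.0 - conditional_entropy(y_pred, y_true) / h_pred


# Hypothetical toy labels: class 0 is split across clusters 0 and 1,
# while class 1 is kept together in cluster 2.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2]

completeness = manual_completeness(y_true, y_pred)
# Swapping the arguments yields the symmetric quantity (homogeneity).
homogeneity = manual_completeness(y_pred, y_true)

print(completeness)
print(homogeneity)
```

In this toy case every cluster contains a single class, so the homogeneity is 1, while splitting class 0 across two clusters lowers the completeness to about 0.667. The hand-computed value can be compared with completeness_score(y_true, y_pred) as a sanity check.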
