There is an extensive body of related research. The author in [27]
described K-means as a classical heuristic clustering algorithm. Due
to the sensitivity of K-means to its initialization, some modified approaches have
been proposed in the literature. Fast Global K-means (FGKM) [51],
for example, is an incremental clustering approach that dynamically
adds one cluster centre at a time through a deterministic global search
procedure consisting of D executions of the K-means algorithm from
different suitable initial positions. Zhu et al. presented a new clustering
strategy that can produce a much lower Q(C) value than affinity
propagation (AP) by initializing K-means clustering with the cluster
centres produced by AP [45]. In [42], the authors motivated, theoretically
and experimentally, the use of a deterministic divisive hierarchical method,
PCA-part (Principal Component Analysis Partitioning), as the
initialization of K-means. To overcome the sensitivity problem of
heuristic clustering algorithms, Han et al. proposed CLARANS, which is based on
the random restart local search method [4]. VSH [9] iteratively
modifies cluster centres to deal with the initialization problem. Further
modified methods addressing the initialization sensitivity problem of
clustering algorithms can be found in [20, 36, 59].
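The initialization sensitivity discussed above can be observed directly with a minimal Lloyd-style K-means run from several random starts. This is an illustrative sketch only: the synthetic blob data, the `kmeans` helper, and the choice of seeds are our own, not taken from any of the cited methods.

```python
import numpy as np

def kmeans(X, k, init_centres, n_iter=100):
    """Plain Lloyd-style K-means; the final result depends on init_centres."""
    centres = init_centres.copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centre (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centre as the mean of its assigned points;
        # keep the old centre if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    inertia = ((X - centres[labels]) ** 2).sum()
    return labels, centres, inertia

rng = np.random.default_rng(0)
# Three well-separated 2-D blobs (synthetic data for illustration).
X = np.vstack([rng.normal(loc, 0.2, size=(50, 2))
               for loc in ([0, 0], [4, 0], [2, 3])])

# Different random initializations can converge to different local optima,
# i.e. different final inertias -- the sensitivity problem that the
# modified approaches above try to address.
inertias = []
for seed in range(5):
    init = X[np.random.default_rng(seed).choice(len(X), 3, replace=False)]
    _, _, inertia = kmeans(X, 3, init)
    inertias.append(inertia)
print(min(inertias), max(inertias))
```

A spread between the minimum and maximum inertia over the restarts indicates that at least one initialization converged to an inferior local optimum; deterministic schemes such as FGKM or PCA-part aim to remove this variance.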
12.3.2 Hierarchical Clustering
The K-means algorithm requires the number of clusters to be specified in
advance, suffers from non-determinism, and returns the clusters as an
unstructured set. Because of these limitations, when a hierarchical
structure is required, we turn to hierarchical clustering.
Hierarchical clustering constructs a hierarchy of clusters that can
be illustrated in a tree structure known as a dendrogram. Each node in the
tree, including the root, represents a parent-child relationship, so
different levels of clustering granularity can be explored [19].
Hierarchical clustering algorithms are either top-down or bottom-up.
Bottom-up algorithms initially treat each file as a separate cluster and
then repeatedly merge the clusters until all of them have been merged into
a single cluster that contains all the files. Bottom-up hierarchical
clustering is therefore called hierarchical agglomerative clustering.
The top-down clustering requires a method for dividing a cluster. It
splits clusters recursively until the individual documents are reached [69].
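A top-down split can be sketched by recursively dividing each cluster along its direction of maximum variance, in the spirit of the PCA-part initialization mentioned above. This is our own simplified sketch, not the published algorithm from [42]: the median split, the fallback rule, and the toy data are assumptions for illustration.

```python
import numpy as np

def divisive(X, indices=None):
    """Top-down clustering sketch: recursively split each cluster in two
    along its first principal component until individual points (documents)
    are reached. Returns a nested [left, right] tree with point indices
    as leaves."""
    if indices is None:
        indices = list(range(len(X)))
    if len(indices) == 1:
        return indices[0]          # leaf: an individual document
    sub = X[indices] - X[indices].mean(axis=0)
    # First right-singular vector = direction of maximum variance (first PC).
    _, _, vt = np.linalg.svd(sub, full_matrices=False)
    proj = sub @ vt[0]
    median = np.median(proj)
    left = [i for i, p in zip(indices, proj) if p <= median]
    right = [i for i, p in zip(indices, proj) if p > median]
    if not left or not right:      # degenerate split: fall back to halving
        mid = len(indices) // 2
        left, right = indices[:mid], indices[mid:]
    return [divisive(X, left), divisive(X, right)]

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.3, 4.9]])
tree = divisive(X)
```

Here the first split separates the two distant groups, and each group is then split down to single documents, producing the hierarchy that agglomerative clustering would build in the opposite direction.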
The advantages of hierarchical clustering are [5, 19]: it has high
flexibility with respect to the level of granularity; it is easy to deal with any
Social Tagging Systems 259