set mining algorithm, the cluster criterion must implement a downward
closure property (also called monotonicity property): If subspace S contains
a cluster, then any subspace T S must also contain a cluster. The reverse
implication, if a subspace T does not contain a cluster, then any super
space S T also cannot contain a cluster, can be used for pruning, that is,
excluding specifi c subspaces from consideration. Let us note that there are
bottom-up algorithms that do not use an APRIORI-like subspace search,
but instead apply other search heuristics. In another way, the bottom-up
approaches are also called as subspace clustering.
CLIQUE [1], the pioneering approach to subspace clustering, uses a
grid-based clustering notion. The data space is partitioned by an axis-
parallel grid into equal units of width ζ. Only units which contain at least
τ points are considered as dense. A cluster is defi ned as a maximal set
of adjacent dense units. Since dense units satisfy the downward closure
property, subspace clusters can be explored rather effi ciently in a bottom-
up way. Starting with all one-dimensional dense units, (k+1)-dimensional
dense units are computed from the set of k-dimensional dense units in an
APRIORI-like style. If a (k+1)-dimensional unit contains a projection onto
a k-dimensional unit that is not dense, then the (k+1)-dimensional unit
also cannot be dense. Further, a heuristic that is based on the minimum
description length principle is introduced to discard candidate units
within less interesting subspaces (i.e., subspaces that contain only a very
small number of dense units). This way, the effi ciency of the algorithm is
enhanced but at the cost of incomplete results, namely some true clusters
are lost. There are some variants of CLIQUE. The method ENCLUS [11] also
relies on a fi xed grid, but searches for subspaces that potentially contain
one or more clusters rather than for dense units. Three quality criteria
for subspaces are introduced, one implementing the downward closure
property. The method MAFIA [45] uses an adaptive grid. The generation of
subspace clusters is similar to CLIQUE. Another variant of CLIQUE, called
nCluster [41], allows overlapping windows of length δ as one-dimensional
units of the grid. In summary, all grid-based methods use a simple but rather
effi cient cluster model. The shape of each resulting cluster corresponds to a
polygon with axis-parallel lines in the corresponding subspace. Obviously,
the accuracy and effi ciency of CLIQUE and its variants primarily depend
on the granularity and the positioning of the grid. A higher grid granularity
results in higher runtime requirements but will most likely produce more
accurate results. SUBCLU [16] uses the DBSCAN cluster model of density
connected sets. It is shown that density-connected sets satisfy the downward
closure property. This enables SUBCLU to search for density based clusters
in subspaces in an APRIORI-like style. The resulting clusters may exhibit
an arbitrary shape and size in the corresponding subspaces. RIS [39] is
a subspace ranking algorithm that uses a quality criterion to rate the
Clustering Analysis 85