The clustering result of projective clustering is represented by the data
objects that are located in small ranges of a specific subspace. In SLDA,
the various Hs indicate these small ranges in S. The data objects that are
scattered across different hyper-rectangles Hs but lie within the same
subspace S constitute different parts of a dense area in S. To obtain the
fully projected clusters, we need to merge the hyper-rectangles within
the same subspace S to generate the corresponding cluster. More concretely,
the clusters derived by a projective clustering algorithm are represented by
all the dense areas and their related subsets of attributes.
Algorithm 7.7: MC_SLDA
Input. SLD
Output. Clustering results and outliers
(1) Divide SLD into several subsets;
(2) For each subset of SLD, a single-linkage merger algorithm is run to find the
clustering results;
(3) Refine the clustering results
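Steps (1) and (2) of the algorithm above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: each hyper-rectangle is a dictionary with the hypothetical keys `subspace`, `bounds` (one `(low, high)` interval per attribute of the subspace) and `members` (identifiers of the data objects it contains), and two hyper-rectangles are linked when their intervals intersect in every attribute of the shared subspace. The source does not state the linkage criterion, so this overlap test is a stand-in.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if boxes a and b intersect in every dimension of their shared subspace."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a["bounds"], b["bounds"]))


def mc_slda_merge(rects):
    """Group hyper-rectangles by subspace, then merge linked ones (single linkage)."""
    # Step (1): divide the hyper-rectangles into subsets sharing one subspace S.
    by_subspace = defaultdict(list)
    for r in rects:
        by_subspace[tuple(r["subspace"])].append(r)

    clusters = []
    for subspace, group in by_subspace.items():
        # Step (2): single-linkage merging via union-find; any chain of
        # pairwise-linked hyper-rectangles ends up in the same cluster.
        parent = list(range(len(group)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path compression
                i = parent[i]
            return i

        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if overlaps(group[i], group[j]):
                    parent[find(i)] = find(j)

        merged = defaultdict(set)
        for i, r in enumerate(group):
            merged[find(i)] |= set(r["members"])
        # A cluster is a pair (data objects, subspace).
        clusters.extend((members, subspace) for members in merged.values())
    return clusters
```

Each returned pair mirrors the representation used in the text: the merged dense area together with its subset of attributes.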
Hence we merge all the dense hyper-rectangles in the same subset of
attributes to generate the clusters. For the example shown in Fig. 7.4.3,
the cluster in the subset of attributes $S = \{1, 2\}$ is
$(\bigcup_{h=1}^{6} H_h, \{1, 2\})$, where $\bigcup_{h=1}^{6}$ denotes the
merging of the dense hyper-rectangles. The three major steps of the merger
clustering algorithm on SLDAs, named MC_SLDA, are to (1) divide the SLDAs
into several subsets so that the hyper-rectangles within one subset share
the same subspace S; (2) run a single-linkage merger algorithm on each
subset to find the fully projected significant local dense areas; and
(3) refine the clustering results and detect the outliers. The pseudocode
of MC_SLDA is given in Algorithm 7.7.
The data objects that are not included in any cluster are denoted as
$Rest = D \setminus (\bigcup_{k=1}^{K} C_k)$, where $\setminus$ is the set
difference operator. In the refinement of the clustering result, we use
the reassignment method proposed in [39] to assign the data objects in Rest
to the corresponding clusters. After the refinement, the data objects that
do not belong to any cluster are regarded as outliers, and an outlier
collection is generated.
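The refinement step can be sketched as follows. The computation of Rest follows the set difference given in the text. The reassignment rule, however, is only a stand-in: the method of [39] is not described here, so this sketch assumes a simple nearest-centroid rule with a hypothetical threshold `max_dist`, measured as the root mean squared difference over the attributes of each cluster's subspace. The names `rest_objects` and `refine` are illustrative, `data` is assumed to map object identifiers to attribute lists, subspace entries are assumed to be 0-based attribute indices, and every cluster is assumed to be non-empty.

```python
import math


def rest_objects(all_ids, clusters):
    """Rest = D minus the union of all clusters C_1, ..., C_K."""
    covered = set().union(*(members for members, _ in clusters))
    return set(all_ids) - covered


def refine(data, clusters, max_dist):
    """Reassign objects in Rest to a nearby cluster; the remainder are outliers.

    Assumption: nearest-centroid reassignment replaces the method of [39].
    """
    rest = rest_objects(data.keys(), clusters)
    refined = [(set(members), subspace) for members, subspace in clusters]
    # Centroids are computed once from the original members, restricted to
    # the attributes of each cluster's own subspace.
    centroids = [
        [sum(data[i][d] for i in members) / len(members) for d in subspace]
        for members, subspace in clusters
    ]
    outliers = set()
    for obj in sorted(rest):
        best, best_dist = None, max_dist
        for k, (_, subspace) in enumerate(refined):
            sq = sum((data[obj][d] - c) ** 2
                     for d, c in zip(subspace, centroids[k]))
            dist = math.sqrt(sq / len(subspace))
            if dist <= best_dist:
                best, best_dist = k, dist
        if best is None:
            outliers.add(obj)  # not close to any cluster
        else:
            refined[best][0].add(obj)
    return refined, outliers
```

Normalising the distance by the number of attributes in the subspace is a design choice of this sketch, made so that clusters living in subspaces of different dimensionality can be compared with a single threshold.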
Hyper-rectangle structures are often used to find dense areas in
high-dimensional data sets. Determining the width of the hyper-rectangle
is a crucial task in high-dimensional clustering applications. The majority
of projective clustering algorithms use a restrictive model to determine
the width of the hyper-rectangle, which has a significant effect on
discovering the real clustering results. Inspired by the kernel density
method, we present a new way to design the hyper-rectangle structure, whose
width is determined by the true data distribution. In order to examine
whether a hyper-rectangle structure is a Significant Local Dense Area, we
run a