1.1.3.5 Advanced Data Mining Algorithms
Despite the great success of data mining techniques applied in different
areas and settings, there is an increasing demand for developing new data
mining algorithms and improving state-of-the-art approaches to handle
the more complicated and dynamical problems. In the meantime, with the
prevalence and deployment of data mining in real applications, some new
research questions and emerging research directions have been raised in
response to the advance and breakthrough of theory and technology in
data mining. Consequently, applied data mining is becoming an active and
fast progressing topic which has opened up a big algorithmic space and
developing potential. Here we list some interesting topics, which will be
described in subsequent chapters.
1. High-Dimensional Clustering In general, data objects to be clustered are
described by points in a high-dimensional space, where each dimension
corresponds to an attribute/feature. A distance measurement between
any two points is used to measure their similarity. The research has
shown that the increasing dimensionality results in the loss of contrast
in distances between data objects. Thus, clustering algorithms that
measure the similarity between data objects based on all attributes/
features tend to degrade in high dimensional data spaces. In additional,
the widely used distance measurement usually perform effectively
only on some particular subsets of attributes, where the data objects
are distributed densely. In other words, it is more likely to form dense
and reasonable clusters of data objects in a low-dimensional subspace.
Recently, several algorithms for discovering data object clusters in
subsets of attributes have been proposed, and they can be classifi ed
into two categories: subspace clustering and projective clustering [8].
2. Multi-Label Classifi cation In the framework of classifi cation, each
object is described as an instance, which is usually a feature vector
that characterizes the object from different aspects. Moreover, each
instance is associated with one or more labels indicating its categories.
Generally speaking, the process of classifi cation consists of two main
steps: the fi rst is training a classifi er or model on a given set of labeled
instances, the second is using the learned classifi er to predict the
label of unseen instance. However, the instances might be assigned
with multiple labels simultaneously, and problems of this type are
ubiquitous in many modern applications. Recently, there has been a
considerable amount of research concerned with dealing with multi-
label problems and many state-of-the-art methods have already been
proposed [3]. It has also been applied to lots of practical applications,
including text classifi cation, gene function prediction, music emotion
analysis, semantic annotation of video, tag recommendation, etc.
Introduction 15