Publisher Summary

This chapter discusses advanced topics in cluster analysis. In conventional cluster analysis, an object is assigned to exactly one cluster. In some applications, however, an object needs to be assigned to one or more clusters in a fuzzy or probabilistic way. Fuzzy clustering and probabilistic model-based clustering allow this, and a partition matrix records the degree to which each object belongs to each cluster. Probabilistic model-based clustering provides a general framework and a method for deriving clusters in which each object is assigned a probability of belonging to a cluster; it is widely used in data mining applications such as text mining. When the dimensionality is high, conventional distance measures can be dominated by noise, so the chapter introduces fundamental methods for cluster analysis on high-dimensional data. These fall into two major categories: subspace clustering methods, which search for clusters in subspaces of the original space, and dimensionality reduction methods, which create a new space of lower dimensionality and search for clusters there. Graph and network data are increasingly popular in applications such as online social networks, the World Wide Web, and digital libraries. The chapter studies the key issues in clustering graph and network data, including similarity measurement and clustering methods. Finally, in some applications various constraints may exist. These constraints may arise from background knowledge or from the spatial distribution of the objects. The chapter discusses how to conduct cluster analysis under different kinds of constraints.
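To make the idea of a partition matrix concrete, the following is a minimal sketch in Python. The function name `partition_matrix`, the sample points, and the weighting rule (membership proportional to the inverse squared distance to each cluster center, normalized per object) are assumptions chosen for illustration; they are one simple way to produce membership degrees, not necessarily the definition used later in the chapter.

```python
def partition_matrix(points, centers):
    """Return one row of membership degrees per point, one column per cluster.

    Assumption: membership is proportional to 1 / squared distance to each
    center, normalized so that each row sums to 1. This is an illustrative
    choice, not the chapter's definition.
    """
    matrix = []
    for p in points:
        sq_dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        if 0.0 in sq_dists:
            # A point lying exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in sq_dists]
            total = sum(hits)
            matrix.append([h / total for h in hits])
            continue
        weights = [1.0 / d for d in sq_dists]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


# Hypothetical data: four points on a line and two cluster centers.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
M = partition_matrix(points, centers)
for p, row in zip(points, M):
    print(p, [round(m, 3) for m in row])
```

Each row describes one object and each column one cluster. The point midway between the two centers receives equal membership in both, which an exclusive assignment cannot express.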

You learned the fundamentals of cluster analysis in Chapter 10. In this chapter, we discuss advanced topics of cluster analysis. Specifically, we investigate four major perspectives:

■ Probabilistic model-based clustering: Section 11.1 introduces a general framework and a method for deriving clusters where each object is assigned a probability of belonging to a cluster. Probabilistic model-based clustering is widely used in many data mining applications such as text mining.

■ Clustering high-dimensional data: When the dimensionality is high, conventional distance measures can be dominated by noise. Section 11.2 introduces fundamental methods for cluster analysis on high-dimensional data.

■ Clustering graph and network data: Graph and network data are increasingly popular in applications such as online social networks, the World Wide Web, and digital libraries. In Section 11.3, you will study the key issues in clustering graph and network data, including similarity measurement and clustering methods.

■ Clustering with constraints: In our discussion so far, we have not assumed any constraints in clustering. In some applications, however, various constraints may exist. These constraints may arise from background knowledge or from the spatial distribution of the objects. You will learn how to conduct cluster analysis with different kinds of constraints in Section 11.4.
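The remark in the second item, that distance measures can be dominated by noise in high dimensions, can be illustrated with a small experiment. The sketch below is an assumption-laden toy setup, not taken from the chapter: two groups of points differ in a single informative dimension, and a varying number of pure-noise dimensions is appended. The helper names `euclidean` and `contrast`, the noise scales, and the sample sizes are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrast(n_noise_dims, n_points=50, seed=0):
    """Mean between-group distance divided by mean within-group distance.

    A value well above 1 means the two groups are easy to tell apart by
    distance; a value near 1 means distance no longer separates them.
    """
    rng = random.Random(seed)

    def make_point(center):
        # One informative dimension, tightly spread around the group center.
        signal = [center + rng.gauss(0.0, 0.1)]
        # Irrelevant dimensions: standard normal noise shared by both groups.
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise_dims)]
        return signal + noise

    a = [make_point(0.0) for _ in range(n_points)]
    b = [make_point(1.0) for _ in range(n_points)]
    within = [euclidean(p, q) for i, p in enumerate(a) for q in a[i + 1:]]
    between = [euclidean(p, q) for p in a for q in b]
    return (sum(between) / len(between)) / (sum(within) / len(within))


low = contrast(n_noise_dims=0)
high = contrast(n_noise_dims=200)
print(f"no noise dimensions: {low:.2f}, 200 noise dimensions: {high:.2f}")
```

With no noise dimensions the ratio is far above 1, while with 200 noise dimensions it falls close to 1, because the irrelevant coordinates contribute almost all of the distance. This is the motivation for searching in subspaces or in a reduced space rather than in the full original space.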

By the end of this chapter, you will have a good grasp of the issues and techniques regarding advanced cluster analysis.
