7.2 What Is Data Mining?

We have already considered ways that statistical techniques can help us process and summarize large amounts of data. By computing statistical measures such as the range, mean, standard deviation, and frequency, we can begin to make statements about our data. In this chapter we will explore this idea further.
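As a brief reminder of what those measures look like in practice, here is a minimal sketch using Python's standard `statistics` module and `collections.Counter`. The list `scores` is hypothetical data invented for this illustration, and the variable names are assumptions rather than anything defined earlier in the text.

```python
import statistics
from collections import Counter

# Hypothetical sample data, chosen only for illustration.
scores = [20, 32, 21, 26, 33, 22, 18, 20]

# Range: the difference between the largest and smallest values.
data_range = max(scores) - min(scores)

# Mean: the arithmetic average of the values.
mean = statistics.mean(scores)

# Standard deviation: statistics.stdev computes the sample version.
std_dev = statistics.stdev(scores)

# Frequency: how many times each distinct value occurs.
frequency = Counter(scores)

print("range:", data_range)
print("mean:", mean)
print("standard deviation:", std_dev)
print("frequency:", dict(frequency))
```

Each of these numbers summarizes the whole list at once, which is useful but also hides any grouping that might exist among the individual values.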

Large amounts of data can be overwhelming. Yet important information is likely to be hidden within the data, and it may not be obvious when only simple descriptive statistics are used. In this kind of situation, we can use data mining: the application of automated techniques that attempt to discover underlying patterns. These techniques can be applied to many data domains. For example, in business, data mining is often used for marketing purposes to find consumer-related patterns. Once these patterns are identified, they can be used to recommend products that a customer is likely to purchase. In addition, many applications in science and medicine require finding patterns in large amounts of data.

Cluster analysis is a data mining technique that attempts to divide the data into meaningful groups called clusters. The data values within a cluster are similar to one another in some way and dissimilar to the data values outside the cluster.
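To make the idea of similarity concrete, the following sketch groups one-dimensional values by treating "similar" as "numerically close." It is only an illustration under stated assumptions: the function `assign_to_clusters`, the sample `values`, and the hand-picked `centers` are all hypothetical, and this is not necessarily the version of cluster analysis developed later in the chapter.

```python
def assign_to_clusters(values, centers):
    """Group each value with the center it is numerically closest to."""
    clusters = {center: [] for center in centers}
    for value in values:
        # Similarity here is simply absolute distance on the number line.
        nearest = min(centers, key=lambda c: abs(value - c))
        clusters[nearest].append(value)
    return clusters


# Hypothetical data and hand-picked centers, chosen only for illustration.
values = [1, 2, 3, 10, 11, 12, 25, 27]
centers = [2, 11, 26]

clusters = assign_to_clusters(values, centers)
for center, members in clusters.items():
    print(center, members)
```

In this sketch the centers are supplied by hand. A real cluster analysis would need some way of choosing the groups from the data itself, which is where automated techniques come in.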

In this chapter we focus on cluster analysis as a way to find hidden information in a collection of data. Our goal is to implement one version of cluster analysis and to apply tools that allow us to see the results.
