An alternative approach to PCA is k-means (unsupervised) clustering, which partitions the data into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. We can perform k-means clustering with the kmeans() function and plot the results with plot3d() (from the rgl package) as follows:
> set.seed(44)
> cl <- kmeans(fish.data[,1:3], 5)
> fish.data$cluster <- as.factor(cl$cluster)
> plot3d(fish.log.pca$x[,1:3], col=fish.data$cluster, main="k-means clusters")
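Since fish.data is not reproducible here, a minimal sketch using the built-in iris dataset (a stand-in for fish.data, with centers = 3 matching its three species) shows the same workflow, plus one step worth considering: standardizing the columns with scale() so that variables measured in different units contribute equally to the Euclidean distances that kmeans() uses.

```r
# A minimal sketch of k-means with standardized inputs; iris stands in
# for fish.data, and centers = 3 matches its three species.
set.seed(44)
scaled <- scale(iris[, 1:4])                   # center and rescale each column
cl <- kmeans(scaled, centers = 3, nstart = 25) # nstart restarts avoid poor local optima

# Cross-tabulate the cluster labels against the known species
table(cl$cluster, iris$Species)
```

Whether scaling helps depends on the data; if length, weight, and speed are on very different scales, the largest-variance variable will otherwise dominate the clustering.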
Let's now evaluate how well it categorizes the data with a table as follows:
> with(fish.data, table(cluster, fish))
       fish
cluster Bluegill Bowfin Carp Goldeye Largemouth_Bass
      1        0      0   14      39              18
      2        0     27   12       0              22
      3        0     23   13       0               2
      4        0      0   11       0               0
      5       50      0    0      11               8
As you can see, it nicely groups all the Bluegill fish together but has a much harder time placing the other fish in the right groups.
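One way to put a number on how well the clusters match the true labels is cluster purity: for each cluster, take the count of its most common true label, then divide the sum of those majority counts by the total number of observations. A sketch, again using iris as a stand-in for fish.data:

```r
# Cluster purity: the fraction of observations that share their cluster's
# majority label; 1.0 means every cluster contains a single species.
set.seed(44)
cl <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
tab <- table(cl$cluster, iris$Species)
purity <- sum(apply(tab, 1, max)) / sum(tab)
purity
```

The same calculation applied to the table above would score the k-means solution for the fish data.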
To help improve the classification of the fish into the five groups, we can perform hierarchical clustering as follows:
> di <- dist(fish.data[,1:3], method="euclidean")
> tree <- hclust(di, method="ward.D")
> fish.data$hcluster <- as.factor(cutree(tree, k=5))
> plot(tree, xlab="", cex=0.2)
Let's add a red box around the five hierarchical clusters as follows:
> rect.hclust(tree, k=5, border="red")
The result is shown in the following plot:
Now, let's create a table to determine how well we can group the fish based on the hierarchical clustering as follows:
> with(fish.data, table(hcluster, fish))
        fish
hcluster Bluegill Bowfin Carp Goldeye Largemouth_Bass
       1       50      0    0       0               4
       2        0     35    8       0              20
       3        0     15   23       0               0
       4        0      0   10       9              14
       5        0      0    9      41              12
As you can see, hierarchical clustering didn't drastically improve the classification of the fish. Therefore, you might consider collecting additional measurements, since the information provided by the length, weight, and speed alone is insufficient to separate these species.
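To compare the two partitions directly (rather than each against the species labels), you can compute the Rand index: the fraction of observation pairs on which the two clusterings agree, with 1 meaning perfect agreement. A self-contained base-R sketch, using iris as a stand-in for fish.data:

```r
# Rand index between two partitions, computed from their contingency table.
rand_index <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  sum_comb <- function(x) sum(choose(x, 2))
  agree   <- sum_comb(tab)          # pairs placed together in both partitions
  a_pairs <- sum_comb(rowSums(tab)) # pairs placed together in partition a
  b_pairs <- sum_comb(colSums(tab)) # pairs placed together in partition b
  total   <- choose(n, 2)
  (total + 2 * agree - a_pairs - b_pairs) / total
}

set.seed(44)
km <- kmeans(iris[, 1:4], 3)$cluster
hc <- cutree(hclust(dist(iris[, 1:4]), method = "ward.D"), k = 3)
rand_index(km, hc)
```

A high index would tell you the two methods are carving up the same structure, so the problem lies in the features rather than the algorithm.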