Finding similarity across columns using cluster variables

The Cluster Variables command works in a similar manner to Cluster Observations. Here, however, we are interested in grouping the columns (variables) of the worksheet rather than the observations (rows).

We will use the car fuel efficiency dataset to identify groups of variables, that is, columns that are similar to each other.

This dataset was collected from manufacturer-stated specifications.

How to do it…

1. Open the mpg.MTW worksheet.
2. Go to the Stat menu, click on Multivariate, and select Cluster Variables….
3. Enter the columns for CO2, Cylinders, Weight, Combined mpg, Max hp, and Capacity in the Variables or distance matrix: box.
4. Check the Show dendrogram option.
5. Click on OK to create the results.
6. Inspect the dendrogram to identify groups in the result. Columns that are joined at a higher similarity level on the y axis are more similar to each other, as shown in the following figure:
7. There appear to be three main groups of variables. Press Ctrl + E to return to the last dialog box.
8. Set Number of clusters: to 3.
9. Click on OK.
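The steps above can be approximated outside Minitab. The following Python sketch is a minimal, standard-library-only illustration and not Minitab's implementation. It assumes that the distance between two columns is 1 - r, where r is the Pearson correlation. It merges clusters with single linkage and stops at a requested number of clusters. The helper names (`pearson`, `cluster_variables`) and the `hypothetical` data set are our own inventions, and the values are made up rather than taken from mpg.MTW.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither column is constant


def cluster_variables(columns, k):
    """Hypothetical helper: single-linkage clustering of columns, stopped at k clusters.

    columns maps a column name to its list of values.
    Returns the final clusters and the merge history (cluster_a, cluster_b, distance).
    """
    names = list(columns)
    # Assumed distance between columns: d = 1 - r (0 means perfectly correlated).
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = 1 - pearson(columns[a], columns[b])

    clusters = [[name] for name in names]  # start with one cluster per column
    merges = []
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the distance of the closest pair of members.
            d = min(dist[(a, b)] for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters, merges


# Made-up values for illustration only (not from mpg.MTW).
hypothetical = {
    "Weight": [1000, 1200, 1400, 1600, 1800],
    "CO2": [100, 125, 140, 165, 180],
    "Combined mpg": [60, 52, 47, 40, 36],
    "Max hp": [70, 90, 80, 140, 120],
}

clusters, merges = cluster_variables(hypothetical, 2)
for a, b, d in merges:
    print(f"join {a} + {b} at distance {d:.3f}")
print("final clusters:", clusters)
```

With k = 2 on these made-up columns, Combined mpg ends up in a cluster of its own. Its correlations with the other three columns are negative, so its 1 - r distances are the largest. With real data, you would pass all six columns and k = 3 to mirror the recipe.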

How it works…

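As with Cluster Observations, the single linkage method is used by default. The linkage methods available are the same as in Cluster Observations. The distance measure, however, is based on the correlation between columns (Correlation or Absolute correlation) rather than on the distances between rows.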
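Single linkage defines the distance between two clusters as the distance between their closest members. The merge rule is stated below. It assumes a correlation-based distance d_ij = 1 - r_ij between columns i and j, where r_ij is the Pearson correlation. This exact formula is our assumption and is not stated in this recipe.

```latex
d_{ij} = 1 - r_{ij}, \qquad
d(A \cup B,\; C) = \min\{\, d(A, C),\; d(B, C) \,\}
```

At each step, the two clusters with the smallest distance are merged. The distances from the new cluster A ∪ B to every other cluster C are then updated with the rule above.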

The results show that Combined mpg is very different from the other variables. This is likely because fuel economy falls as CO2, weight, power, and engine size rise.
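To see why a negatively related variable such as Combined mpg can end up far from the others, the sketch below compares two correlation-based distances on made-up data. It assumes two formulas without confirmation from this recipe: 1 - r for a correlation measure and 1 - |r| for an absolute-correlation measure. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither column is constant


def correlation_distance(x, y):
    """Assumed 'Correlation' measure: d = 1 - r, ranging from 0 to 2."""
    return 1 - pearson(x, y)


def abs_correlation_distance(x, y):
    """Assumed 'Absolute correlation' measure: d = 1 - |r|, ranging from 0 to 1."""
    return 1 - abs(pearson(x, y))


# Made-up values (not from mpg.MTW): mpg falls as weight rises.
weight = [1000, 1200, 1400, 1600, 1800]
mpg = [60, 52, 47, 40, 36]

print(f"1 - r   = {correlation_distance(weight, mpg):.3f}")
print(f"1 - |r| = {abs_correlation_distance(weight, mpg):.3f}")
```

Under 1 - r, a strong negative correlation gives a distance close to 2, the largest possible value, so such a column tends to stay in its own cluster. Under 1 - |r|, the same pair is treated as nearly identical. The choice of measure therefore decides whether a variable like Combined mpg groups with the size and power variables.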

For larger numbers of variables, the dendrogram can be split into separate graphs by cluster. Use the Customize… option for the dendrogram to set the maximum number of observations per graph.
