Finding similarity across columns using cluster variables

Cluster Variables works in a similar way to Cluster Observations. Here, however, we are interested in grouping the columns (variables) of the worksheet rather than the rows (observations).

We will use a car fuel efficiency dataset to identify groups of variables, that is, columns that are similar to each other.

This dataset was collected from manufacturer-stated specifications.

How to do it…

  1. Open the mpg.MTW worksheet.
  2. Go to the Stat menu, click on Multivariate, and select Cluster Variables….
  3. Enter the columns for CO2, Cylinders, Weight, Combined mpg, Max hp, and Capacity into the Variables or distance matrix: section.
  4. Check the Show dendrogram option.
  5. Click on OK to create the results.
  6. Inspect the dendrogram to identify groups in the result. Columns that join at a higher similarity level on the y axis are more similar to each other, as shown in the following figure:
    [Figure: dendrogram of the six variables]
  7. It looks like there are three main groups of variables. Press Ctrl + E to return to the last dialog box.
  8. Select Number of clusters: and enter 3.
  9. Click on OK.

How it works…

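As with Cluster Observations, the single linkage method was used by default, and the same linkage methods are available. The distance measure options differ, however: because columns rather than rows are compared, Cluster Variables bases the distance on the correlation between columns (Correlation or Absolute correlation) instead of the Euclidean-type distances offered for observations.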
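The linkage method decides how the distance between two clusters is derived from the distances between their members. The sketch below shows the three simplest rules (single, complete and average linkage) applied to a hypothetical set of member-to-member distances; the helper name and the numbers are illustrative assumptions, and Minitab's other linkage options (such as Centroid, Median, McQuitty and Ward) use update formulas not shown here.

```python
from statistics import mean


def linkage_distance(pair_distances, method):
    """Hypothetical helper: distance between two clusters from their member-pair distances."""
    if method == "single":
        return min(pair_distances)  # closest pair of members
    if method == "complete":
        return max(pair_distances)  # farthest pair of members
    if method == "average":
        return mean(pair_distances)  # mean over all member pairs
    raise ValueError(f"unsupported linkage method: {method}")


# Hypothetical distances between the members of cluster A = {a1, a2} and B = {b1, b2}
pairs = [0.10, 0.40, 0.25, 0.65]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(pairs, method))
```

Single linkage lets a cluster grow through chains of close neighbours, which is why it can produce different groupings from complete linkage on the same data.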

The results show that Combined mpg is very different from the other variables: in the dendrogram, it joins the remaining columns only at a low similarity level.

For larger numbers of variables, the dendrogram can be split into separate graphs by cluster. To do this, click Customize… for the dendrogram and set the maximum number of observations per graph.

See also

  • The Finding similarity in results by rows using cluster observations recipe
  • The Identifying groups in data using cluster K-means recipe