Multidimensional Scaling (MDS) is a multivariate technique that was first used in geography. The main goal of MDS is to plot multivariate data points in two dimensions, thus revealing the structure of the dataset by visualizing the relative distances between the observations. MDS is used in diverse fields, such as attitude studies in psychology, sociology, and market research.
While the MASS package provides non-metric MDS via the isoMDS function, we will concentrate on classical metric MDS, which is available in the cmdscale function offered by the stats package. Both types of MDS take a distance matrix as the main argument, which can be created from any numeric tabular data with the dist function.
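As a minimal sketch of this workflow, the following lines build a distance matrix from a small, made-up data frame (the `toy` object and its values are hypothetical and only serve as an illustration) and pass it to cmdscale:

```r
# Hypothetical toy data: three observations measured on two numeric variables
toy <- data.frame(x = c(0, 3, 0), y = c(0, 4, 1),
                  row.names = c("a", "b", "c"))

# dist() computes pairwise Euclidean distances by default
d <- dist(toy)

# cmdscale() takes the distance object directly; k sets the number of output dimensions
coords <- cmdscale(d, k = 2)

# Check whether the two-dimensional configuration reproduces the input distances
all.equal(as.vector(dist(coords)), as.vector(d))
```

Because the toy data is already two-dimensional, the configuration should preserve the original distances; with higher-dimensional data, the two-dimensional result is only an approximation.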
But before we explore more complex examples, let's see what MDS can offer us while working with an already existing distance matrix, such as the built-in eurodist dataset:
> as.matrix(eurodist)[1:5, 1:5]
          Athens Barcelona Brussels Calais Cherbourg
Athens         0      3313     2963   3175      3339
Barcelona   3313         0     1318   1326      1294
Brussels    2963      1318        0    204       583
Calais      3175      1326      204      0       460
Cherbourg   3339      1294      583    460         0
The preceding values represent the travel distances between 21 European cities in kilometers, although only the first five rows and columns are shown. Running classical MDS is fairly easy:
> (mds <- cmdscale(eurodist))
                      [,1]      [,2]
Athens           2290.2747  1798.803
Barcelona        -825.3828   546.811
Brussels           59.1833  -367.081
Calais            -82.8460  -429.915
Cherbourg        -352.4994  -290.908
Cologne           293.6896  -405.312
Copenhagen        681.9315 -1108.645
Geneva             -9.4234   240.406
Gibraltar       -2048.4491   642.459
Hamburg           561.1090  -773.369
Hook of Holland   164.9218  -549.367
Lisbon          -1935.0408    49.125
Lyons            -226.4232   187.088
Madrid          -1423.3537   305.875
Marseilles       -299.4987   388.807
Milan             260.8780   416.674
Munich            587.6757    81.182
Paris            -156.8363  -211.139
Rome              709.4133  1109.367
Stockholm         839.4459 -1836.791
Vienna            911.2305   205.930
These scores are very similar to the first two principal components, such as those returned by prcomp(eurodist)$x[, 1:2]. As a matter of fact, PCA can be considered the most basic MDS solution.
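The link between the two methods can be checked on raw data: classical MDS applied to Euclidean distances should give the same configuration as PCA on the centered data, apart from the arbitrary sign of each axis. The following is a small sketch of that comparison using the built-in mtcars dataset; the object names `mds_scores` and `pca_scores` are chosen here for illustration only:

```r
# Classical MDS on the Euclidean distances of the raw mtcars columns
mds_scores <- cmdscale(dist(mtcars), k = 2)

# PCA on the same data (prcomp centers the columns but does not scale them by default)
pca_scores <- prcomp(mtcars)$x[, 1:2]

# The sign of each axis is arbitrary, so compare absolute values
all.equal(abs(unname(mds_scores)), abs(unname(pca_scores)))
```

Note that this comparison uses a data matrix rather than a ready-made distance matrix such as eurodist, where the distances are road distances and not necessarily Euclidean.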
Anyway, we have just transformed the 21-dimensional space into 2 dimensions, which can be plotted very easily (unlike the previous matrix with 21 rows and 21 columns):
> plot(mds)
Does this ring a bell? If not, please feel free to see the following image, where the following two lines of code also show the city names instead of the anonymous points:
> plot(mds, type = 'n')
> text(mds[, 1], mds[, 2], labels(eurodist))
Although the y axis is flipped, which you can fix by multiplying the second argument of text by -1, we have just rendered a map of European cities from the distance matrix, without any further geographical data. I find this rather impressive.
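One way to apply the flip, sketched here under the assumption that the orientation matches the output shown earlier (the sign of an MDS axis is arbitrary and might differ on other systems), is to negate the second coordinate before plotting. The `flipped` object and the use of `asp = 1` are additions for illustration, not part of the original example:

```r
mds <- cmdscale(eurodist)

# Negate the second coordinate to mirror the map vertically
flipped <- cbind(mds[, 1], -mds[, 2])

# asp = 1 keeps both axes on the same scale, so distances are not distorted
plot(flipped, type = 'n', asp = 1, xlab = '', ylab = '')
text(flipped[, 1], flipped[, 2], labels(eurodist))
```

Negating the coordinates before calling plot, rather than only inside text, means the axis limits are computed from the flipped values, so no label falls outside the plotting region.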
Please find more data visualization tricks and methods in Chapter 13, Data Around Us.
Now let's see how to apply MDS to non-geographic data that was not originally prepared as a distance matrix. Let's get back to the mtcars dataset:
> mds <- cmdscale(dist(mtcars))
> plot(mds, type = 'n')
> text(mds[, 1], mds[, 2], rownames(mds))
The plot shows the 32 cars of the original dataset scattered in a two-dimensional space. The distances between the elements were computed by dist from all 11 original variables, and MDS then projected them onto two dimensions, so it's very easy to identify the similar and very different car types. We will cover these topics in more detail in the next chapter, Chapter 10, Classification and Clustering.
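Keep in mind that dist works on the raw values, so variables measured on larger scales (such as disp and hp) contribute more to the Euclidean distances than the others. One common option, which is not what the preceding example does, is to standardize the columns first. A minimal sketch of that variant, with `mds_scaled` as an illustrative name:

```r
# Standardize each column to mean 0 and standard deviation 1 before computing distances
mds_scaled <- cmdscale(dist(scale(mtcars)))

# Plot the cars by name, as in the unscaled example
plot(mds_scaled, type = 'n')
text(mds_scaled[, 1], mds_scaled[, 2], rownames(mds_scaled))
```

Whether to standardize depends on the question at hand: the unscaled version mostly reflects differences in displacement and horsepower, while the scaled version gives all 11 variables equal weight.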