Multidimensional Scaling

Multidimensional Scaling (MDS) is a multivariate technique that was first used in geography. The main goal of MDS is to plot multivariate data points in two dimensions, thus revealing the structure of the dataset by visualizing the relative distances between the observations. MDS is used in diverse fields, such as attitude studies in psychology, sociology, and market research.

While the MASS package provides non-metric MDS via the isoMDS function, we will concentrate on classical, metric MDS, which is available via the cmdscale function of the stats package. Both types of MDS take a distance matrix as the main argument, which can be created from any numeric tabular data with the dist function.
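
As a rough sketch of this shared interface, the following lines pass the same distance matrix to both functions. The toy data frame and its values are made up for illustration only, and the MASS package is assumed to be installed:

```r
library(MASS)  # provides isoMDS (assumed to be installed)

# Hypothetical toy data: five observations on three numeric variables
toy <- data.frame(
    x = c(1, 2, 6, 7, 9),
    y = c(2, 1, 7, 9, 8),
    z = c(0, 1, 5, 4, 6))

d <- dist(toy)                                # Euclidean distances by default
metric    <- cmdscale(d, k = 2)               # classical (metric) MDS
nonmetric <- isoMDS(d, k = 2, trace = FALSE)  # non-metric MDS

metric            # matrix with one row of coordinates per observation
nonmetric$points  # coordinates found by the non-metric fit
nonmetric$stress  # final stress of the non-metric fit
```

Note that cmdscale returns the coordinate matrix directly, whereas isoMDS returns a list holding the coordinates in points and the final stress value in stress.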

But before we explore more complex examples, let's see what MDS can offer us while working with an already existing distance matrix, such as the built-in eurodist dataset:

> as.matrix(eurodist)[1:5, 1:5]
          Athens Barcelona Brussels Calais Cherbourg
Athens         0      3313     2963   3175      3339
Barcelona   3313         0     1318   1326      1294
Brussels    2963      1318        0    204       583
Calais      3175      1326      204      0       460
Cherbourg   3339      1294      583    460         0

The preceding values represent the travel distances between 21 European cities in kilometers, although only the first 5 rows and 5 columns are shown. Running classical MDS is fairly easy:

> (mds <- cmdscale(eurodist))
                      [,1]      [,2]
Athens           2290.2747  1798.803
Barcelona        -825.3828   546.811
Brussels           59.1833  -367.081
Calais            -82.8460  -429.915
Cherbourg        -352.4994  -290.908
Cologne           293.6896  -405.312
Copenhagen        681.9315 -1108.645
Geneva             -9.4234   240.406
Gibraltar       -2048.4491   642.459
Hamburg           561.1090  -773.369
Hook of Holland   164.9218  -549.367
Lisbon          -1935.0408    49.125
Lyons            -226.4232   187.088
Madrid          -1423.3537   305.875
Marseilles       -299.4987   388.807
Milan             260.8780   416.674
Munich            587.6757    81.182
Paris            -156.8363  -211.139
Rome              709.4133  1109.367
Stockholm         839.4459 -1836.791
Vienna            911.2305   205.930

These scores are very similar to the first two principal components, such as those returned by prcomp(eurodist)$x[, 1:2]. As a matter of fact, PCA can be considered the most basic MDS solution.
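
The link between the two methods can be checked directly: when classical MDS is run on Euclidean distances computed from a data matrix, it should reproduce the centered, unscaled principal component scores up to the sign of each axis. A minimal sketch, assuming the built-in mtcars dataset as the example data (the object names are arbitrary):

```r
pca_scores <- prcomp(mtcars)$x[, 1:2]  # centered, unscaled PCA scores
mds_scores <- cmdscale(dist(mtcars))   # classical MDS on Euclidean distances

# The sign of each axis is arbitrary, so compare absolute values only
all.equal(abs(unname(pca_scores)), abs(unname(mds_scores)))
```

The comparison drops the dimension names and ignores the signs, as neither method fixes the orientation of its axes.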

Anyway, we have just transformed the 21-dimensional space into 2 dimensions, which can be plotted very easily (unlike the previous matrix with 21 rows and 21 columns):

> plot(mds)
[Figure: plot of the two MDS coordinates of the 21 cities]

Does this ring a bell? If not, take a look at the following image, produced by the next two lines of code, which show the city names instead of anonymous points:

> plot(mds, type = 'n')
> text(mds[, 1], mds[, 2], labels(eurodist))
[Figure: MDS coordinates of the 21 cities, labeled with the city names]

Although the y axis is flipped, which you can fix by multiplying the second argument of text by -1, we have just rendered a map of European cities from the distance matrix, without any further geographical data. I find this rather impressive.
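
One way to apply this flip is to negate the second coordinate once and use the result for both the empty plot and the labels, so that the axis range and the text positions stay consistent. This is only a sketch: the flipped object name is arbitrary, the asp = 1 argument is an optional addition that keeps both axes on the same scale, and the flip assumes the orientation shown in the preceding output, as the sign of an MDS axis is arbitrary:

```r
mds <- cmdscale(eurodist)

# Mirror the second axis so that north ends up at the top
flipped <- mds
flipped[, 2] <- -flipped[, 2]

# Draw an empty plot first, then put the city names at the coordinates
plot(flipped, type = 'n', xlab = '', ylab = '', asp = 1)
text(flipped[, 1], flipped[, 2], labels(eurodist))
```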

Please find more data visualization tricks and methods in Chapter 13, Data Around Us.

Now let's see how to apply MDS to non-geographic data that was not originally prepared as a distance matrix. Let's get back to the mtcars dataset:

> mds <- cmdscale(dist(mtcars))
> plot(mds, type = 'n')
> text(mds[, 1], mds[, 2], rownames(mds))
[Figure: MDS coordinates of the 32 cars of mtcars, labeled with the car names]

The plot shows the 32 cars of the original dataset scattered in a two-dimensional space. The distances between the elements were computed by dist from all 11 original variables, and MDS then placed the cars so that these distances are preserved as closely as possible, which makes it very easy to identify similar and very different car types. We will cover these topics in more detail in the next chapter, Chapter 10, Classification and Clustering.
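
Keep in mind that dist computes Euclidean distances on the raw values by default, so variables measured on large scales, such as disp and hp, contribute far more to the distances than variables such as am or vs. If that is not desired, one possible variation is to standardize the columns with scale first. The following lines are only a sketch of that idea, and the mds_scaled name is arbitrary:

```r
# Standardize every column to zero mean and unit variance before
# computing the distances, so that each variable carries equal weight
mds_scaled <- cmdscale(dist(scale(mtcars)))

plot(mds_scaled, type = 'n')
text(mds_scaled[, 1], mds_scaled[, 2], rownames(mds_scaled))
```

Whether to standardize depends on the analysis: it suits variables with incomparable units, but it also removes any differences in spread that might be meaningful.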
