Training the SOM

It's now time to train the SOM using R. For that, we will use the kohonen package. The name of the package comes from Teuvo Kohonen, who first introduced this algorithm.

As SOM is mainly based on Euclidean distances, it is recommended to standardize your variables. We will use the caret package for this, as follows:

library(kohonen)
library(caret)
preprocess <- preProcess(macroeconomic_data[,4:13], method=c("center", "scale"))
print(preprocess)
## Created from 224 samples and 10 variables
##
## Pre-processing:
## - centered (10)
## - ignored (0)
## - scaled (10)

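The preProcess() call only estimates the centering and scaling parameters; the transformed variables are obtained by applying predict() to the data. We then bind the country name and ratings to the transformed variables in a new data frame, as follows: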

macroeconomic_data_trn <- cbind(macroeconomic_data[, c(1:3, 14)],
                                predict(preprocess, macroeconomic_data[, 4:13]))
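To see why standardization matters for a distance-based method, the following minimal sketch uses base R only. The data frame toy and its two columns are made-up stand-ins, not the chapter's data. It centers and scales the columns with scale(), repeats the calculation by hand for one column, and compares the Euclidean distance between the first two rows before and after the transformation:

```r
# Toy data standing in for the macroeconomic variables (values are made up).
set.seed(1)
toy <- data.frame(
  gdp_growth = rnorm(20, mean = 2, sd = 1.5),   # percent
  debt_ratio = rnorm(20, mean = 80, sd = 30)    # percent of GDP
)

# Center and scale each column: (x - mean) / standard deviation.
toy_std <- as.data.frame(scale(toy, center = TRUE, scale = TRUE))

# The same transformation written out by hand for one column.
manual <- (toy$debt_ratio - mean(toy$debt_ratio)) / sd(toy$debt_ratio)

# Euclidean distance between the first two rows, before and after.
# On the raw scale, the variable with the larger spread dominates.
raw_dist <- as.matrix(dist(toy))[1, 2]
std_dist <- as.matrix(dist(toy_std))[1, 2]

print(round(colMeans(toy_std), 10))   # column means after centering
print(apply(toy_std, 2, sd))          # column standard deviations after scaling
print(c(raw = raw_dist, standardized = std_dist))
```

After the transformation, every column has mean 0 and standard deviation 1, so each variable contributes on a comparable scale to the distances used by the SOM.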

Now, it is time to train the map. The input layer of the network has ten units, one for each macroeconomic variable. The output layer is a two-dimensional map of 6×5 units. This size was chosen after several tests. The map is trained as follows:

set.seed(1234)

som_grid <- somgrid(xdim = 6, ydim=5, topo="hexagonal")

som_model <- som(as.matrix(macroeconomic_data_trn[, 5:14]),
                 grid = som_grid, rlen = 800, alpha = c(0.1, 0.01),
                 keep.data = TRUE)

After 800 iterations, the training process finishes. Different plots can be obtained from the trained map.
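To build some intuition for what the rlen and alpha arguments control, here is a simplified sketch of an online SOM training loop in base R. It is not the kohonen implementation. The toy data, the rectangular 3×2 grid, the radius schedule, and the helper mean_qe() are all assumptions made for illustration. Each observation is presented rlen times, and the learning rate decays linearly from alpha[1] to alpha[2]:

```r
set.seed(1234)

# Toy standardized data: 100 observations, 3 variables (made-up values).
X <- scale(matrix(rnorm(300), ncol = 3))

# Rectangular 3x2 grid for simplicity (the chapter uses a hexagonal 6x5 grid).
grid_xy <- as.matrix(expand.grid(x = 1:3, y = 1:2))
n_units <- nrow(grid_xy)
grid_dist <- as.matrix(dist(grid_xy))   # distances between units on the map

# Initialize the codebook vectors with randomly chosen observations.
codes <- X[sample(nrow(X), n_units), , drop = FALSE]

# Hypothetical helper: mean distance to the closest codebook vector.
mean_qe <- function(X, codes) {
  mean(apply(X, 1, function(x) min(sqrt(colSums((t(codes) - x)^2)))))
}

rlen <- 50               # number of passes over the data
alpha <- c(0.1, 0.01)    # learning rate: start and end values
radius <- c(2, 0)        # neighbourhood radius: start and end (assumed schedule)
n_steps <- rlen * nrow(X)
qe_before <- mean_qe(X, codes)

step <- 0
for (epoch in seq_len(rlen)) {
  for (i in sample(nrow(X))) {
    step <- step + 1
    frac <- (step - 1) / (n_steps - 1)
    a <- alpha[1] + frac * (alpha[2] - alpha[1])      # linear decay
    r <- radius[1] + frac * (radius[2] - radius[1])
    x <- X[i, ]
    # Best matching unit: the codebook vector closest to x.
    bmu <- which.min(colSums((t(codes) - x)^2))
    # Units within radius r of the BMU on the map are pulled toward x.
    nb <- which(grid_dist[bmu, ] <= r)
    target <- matrix(x, nrow = length(nb), ncol = length(x), byrow = TRUE)
    codes[nb, ] <- codes[nb, , drop = FALSE] +
      a * (target - codes[nb, , drop = FALSE])
  }
}

qe_after <- mean_qe(X, codes)
print(c(before = qe_before, after = qe_after))
```

Early in training, the wide neighbourhood moves many units at once, which orders the map. Toward the end, only the winning unit is adjusted, and only by a small amount, so the codebook vectors settle.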

Let's display the mean distance to the closest codebook vector during the training, using the following code:

plot(som_model, type = "changes")

This will plot the following graph:

We can see that the error of the training process, measured as the mean distance to the closest unit, decreases over the course of the iterations. Once the curve flattens out, the number of iterations has been high enough for the algorithm to converge.

Moreover, it is possible to visualize the map and count the number of countries that are classified in each neuron:

plot(som_model, type = "counts", main="Node Counts")

A graphical representation of the node counts can be seen in the following plot:

In the graph, the 30 cells of the 6×5 map are colored according to the number of countries assigned to each one. Grey cells are empty: no country has been assigned to them. The larger the number of cells, the fewer countries fall in each neuron and the more granular the map becomes.

The next plot shows, for each unit in the map, the average distance between the countries mapped to that unit and its codebook vector. The smaller the distances, the better the countries are represented by the codebook vectors:

plot(som_model, type = "quality", main="Node Quality/Distance")
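The quantities behind the counts and quality plots can be reproduced by hand. The following sketch uses base R with a made-up data matrix and a made-up codebook of four units, so all names and values are illustrative assumptions. It assigns each observation to its closest codebook vector, counts the observations per unit, and averages the distances within each unit:

```r
set.seed(42)
X <- matrix(rnorm(60), ncol = 2)   # 30 toy observations, 2 variables
codes <- rbind(c(-1, -1), c(1, 1), c(0, 0), c(10, 10))   # 4 toy codebook vectors

# Distance from every observation (rows) to every codebook vector (columns).
d <- apply(codes, 1, function(cb) sqrt(rowSums(sweep(X, 2, cb)^2)))

unit <- apply(d, 1, which.min)                     # winning unit per observation
dist_to_unit <- d[cbind(seq_len(nrow(X)), unit)]   # distance to that unit

# "counts": observations per unit (units without observations get 0).
counts <- tabulate(unit, nbins = nrow(codes))

# "quality": mean distance of the observations mapped to each unit
# (NA for units without observations).
quality <- tapply(dist_to_unit,
                  factor(unit, levels = seq_len(nrow(codes))),
                  mean)

print(counts)
print(quality)
```

The fourth codebook vector is placed far from the data on purpose. It receives no observations, which corresponds to an empty cell in the counts plot.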

The node quality plot is as follows:

The weight vectors across the map can also be visualized. This is useful to uncover patterns in the distribution of the different countries.

The node weight vectors, or codes, are made up of normalized values of the original variables that are used to generate the SOM:

plot(som_model, type = "codes")

The output of the preceding code is displayed in the following plot:

From the preceding diagram, it is possible to extract the characteristics of countries that have been placed in each cell. It is also possible to visualize how individual variables are distributed on the map. For example, let's look at the distribution of the MEANWGI variable, as follows:

plot(som_model, type = "property",
     property = getCodes(som_model)[, "MEANWGI"],
     main = "WorldBank Governance Indicators")

The trained map is plotted and the cells are colored depending on the values of this variable for the countries placed on them, as shown in the following diagram:

According to the preceding map, countries with higher values in this variable will be located in the top-right part of the map.

Now, let's see where the countries are located by using the macroeconomic information from December 2017 with the following code:

Labels <- macroeconomic_data_trn[, c("CountryName", "Year")]
Labels$CountryName <- ifelse(Labels$Year != 2017, "", as.character(Labels$CountryName))

plot(som_model, type = "mapping",label=Labels$CountryName)
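The labelling step can be checked on a small example. In this sketch, the data frame labels_df and its values are made up. The country names are stored as a factor to show why as.character() is applied before ifelse(); without it, ifelse() would return the factor's integer codes instead of the names:

```r
# Toy label table with two countries observed in two years (made-up rows).
labels_df <- data.frame(
  CountryName = c("Spain", "Spain", "Ireland", "Ireland"),
  Year = c(2016, 2017, 2016, 2017),
  stringsAsFactors = TRUE
)

# Keep the name only for 2017 rows; every other row gets an empty label.
label_text <- ifelse(labels_df$Year != 2017, "",
                     as.character(labels_df$CountryName))

# The two 2017 rows keep their names; the others are blank.
print(label_text)
```

Since every row of the training data is mapped to a cell, blanking the labels of the other years leaves each country printed only once on the map.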

A map will be displayed, where countries will be placed according to their economic variables, as shown in the following diagram:

Countries that are close on the map have similar values in their macroeconomic variables. Thus, Greece, Portugal, and Cyprus are very close on the map. These countries have had serious problems in the last few years as a consequence of financial crises. Countries such as Germany, France, and the United Kingdom are also similar, based on the macroeconomic information.

Finally, we can create different clusters or groups on this map by taking into account the codebook vectors of the trained map. Hierarchical clustering can be used to select the number of groups. With the following code, we will obtain five different groups of countries:

clusters <- cutree(hclust(dist(getCodes(som_model))), 5)
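The clustering step can be tried in isolation. This sketch replaces the trained codebook with a made-up matrix of 30 rows, one per unit of a 6×5 map, so toy_codes and its values are assumptions. It applies the same base R functions and also prints the heights of the last merges, which is one common way to judge how many groups to keep:

```r
set.seed(7)

# Toy codebook: 30 units described by 4 variables (made-up values).
toy_codes <- matrix(rnorm(30 * 4), nrow = 30)

# Euclidean distances between codebook vectors, then agglomerative
# clustering (hclust uses complete linkage by default).
tree <- hclust(dist(toy_codes))

# Cut the tree into five groups; the result has one group label per unit.
groups <- cutree(tree, k = 5)
print(table(groups))

# Heights of the last merges, largest first. A large gap between
# consecutive heights suggests a natural number of groups.
print(rev(tree$height)[1:6])
```

Because the clustering runs on the codebook vectors rather than on the observations, each map cell receives exactly one group label, and every country inherits the group of the cell it is mapped to.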

The recently created groups can be visualized on the map by running the following code:

plot(som_model, type="codes", bgcol = clusters, main = "Clusters")
add.cluster.boundaries(som_model, clusters)

The following map is plotted:

The preceding diagram gives us the following:

  • Group 1: Countries such as Portugal, Greece, and Cyprus.
  • Group 2: Countries such as Poland, Malta, Bulgaria, and Spain. In general, this is a cluster of countries from Eastern Europe and other countries with a weak economic situation. This group is in better shape than the previous group.
  • Group 3: More solvent countries in the European Union, such as the UK, Germany, and France.
  • Group 4 and Group 5: Each of these groups consists of a single cell, one of which contains Ireland. Ireland went through a deep economic crisis a few years ago, but its situation is now so different that it cannot be included in any other group of countries.

As usual, you can now back up your workspace using the following code:

save.image("Backup2.RData")

With this simple clustering approach, it is possible to visualize the different situations of each country in the European Union. 
