Chapter 11. Social Network Analysis of the R Ecosystem

Although the concept of social networks has a long history, going back to the beginning of the last century, social network analysis (SNA) became extremely popular only in the last decade, probably due to the success of huge social media sites and the availability of related data. In this chapter, we are going to take a look at how to retrieve and load such data, and then analyze and visualize such networks, relying heavily on the igraph package.

Igraph is an open source network analysis tool made by Gábor Csárdi. The software ships with a wide variety of network analysis methods, and it can be used in R, C, C++, and Python as well.

In this chapter, we will cover the following topics with some examples on the R ecosystem:

  • Loading and handling network data
  • Network centrality metrics
  • Visualizing network graphs

Loading network data

Probably the easiest way to retrieve network-flavored information on the R ecosystem is to analyze how R packages depend on each other. Based on Chapter 2, Getting the Data, we could try to load this data via HTTP parsing of the CRAN mirrors but, luckily, R has a built-in function to return all available R packages from CRAN with some useful meta-information as well:

Tip

The number of packages hosted on CRAN is growing from day to day. As we are working with live data, the actual results you see might be slightly different.

> library(tools)
> pkgs <- available.packages()
> str(pkgs)
 chr [1:6548, 1:17] "A3" "abc" "ABCanalysis" "abcdeFBA" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6548] "A3" "abc" "ABCanalysis" "abcdeFBA" ...
  ..$ : chr [1:17] "Package" "Version" "Priority" "Depends" ...

So we have a character matrix with more than 6,500 rows, and the fourth column (Depends) holds the dependencies as a comma-separated list. Instead of parsing those strings ourselves and stripping the package versions and other relatively unimportant characters, let's use another handy function from the tools package to do the dirty work:

> head(package.dependencies(pkgs), 2)
$A3
     [,1]      [,2] [,3]    
[1,] "R"       ">=" "2.15.0"
[2,] "xtable"  NA   NA      
[3,] "pbapply" NA   NA      

$abc
     [,1]       [,2] [,3]  
[1,] "R"        ">=" "2.10"
[2,] "nnet"     NA   NA    
[3,] "quantreg" NA   NA    
[4,] "MASS"     NA   NA    
[5,] "locfit"   NA   NA    
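To make the structure of this output concrete, here is a minimal sketch of the manual parsing that the helper spares us. The parse_deps function is a hypothetical helper written for illustration only (it is not part of the tools package), and the input string is reconstructed from the A3 entry shown above:

```r
# Hypothetical helper, not part of tools: split one raw "Depends"
# string into a (name, operator, version) character matrix.
parse_deps <- function(x) {
  # split on commas and trim the surrounding whitespace
  parts <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  # package name: everything before an optional "(...)" version spec
  name <- trimws(sub("\\(.*$", "", parts))
  # operator and version live inside the parentheses, if there are any
  has_spec <- grepl("(", parts, fixed = TRUE)
  spec <- ifelse(has_spec, trimws(sub("^.*\\((.*)\\).*$", "\\1", parts)), NA)
  op   <- ifelse(has_spec, sub("^([<>=!]+).*$", "\\1", spec), NA)
  ver  <- ifelse(has_spec, trimws(sub("^[<>=!]+", "", spec)), NA)
  cbind(name, op, ver, deparse.level = 0)
}

# Input reconstructed from the A3 entry printed above
a3 <- parse_deps("R (>= 2.15.0), xtable, pbapply")
print(a3)
```

The result has the same three-column layout as the matrices printed above: the package name, the comparison operator, and the version, with NA where no version constraint is given.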

So the package.dependencies function returns a long named list of matrices, one for each R package, listing the packages (and, where given, the minimum versions) that are required to install and load it. Besides, the very same function can retrieve the list of packages that are imported or suggested by others via the depLevel argument. We will use this information to build a richer dataset with different types of connections between the R packages.
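Depending on your R version, package.dependencies might not be available any more: to our knowledge, it has been deprecated in more recent R releases in favor of package_dependencies from the same tools package. The following is a minimal sketch of querying the three connection types with that function. The db matrix is a toy stand-in for the result of available.packages, and the package names pkgA, pkgB, and pkgC are hypothetical:

```r
library(tools)

# Toy stand-in for available.packages(); all rows are hypothetical.
db <- rbind(
  c(Package = "pkgA", Depends = "R (>= 2.15.0), pkgB",
    Imports = "pkgC", Suggests = NA),
  c(Package = "pkgB", Depends = NA,
    Imports = "pkgC (>= 1.0)", Suggests = "pkgA"),
  c(Package = "pkgC", Depends = NA,
    Imports = NA, Suggests = NA)
)
rownames(db) <- db[, "Package"]

# One named list of dependency vectors per connection type
deps_by_type <- lapply(
  c(Depends = "Depends", Imports = "Imports", Suggests = "Suggests"),
  function(type) package_dependencies(db[, "Package"], db = db, which = type)
)

str(deps_by_type$Depends)
```

Note that package_dependencies returns plain character vectors of package names rather than three-column matrices, so a script written for the older function would need its dep column extracted differently.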

The following script creates a data.frame in which each row represents a connection between two R packages. The src column shows which R package refers to the dep package, and the label column describes the type of connection:

> library(plyr)
> edges <- ldply(
+   c('Depends', 'Imports', 'Suggests'), function(depLevel) {
+     deps <- package.dependencies(pkgs, depLevel = depLevel)
+     ldply(names(deps), function(pkg)
+         if (!identical(deps[[pkg]], NA))
+             data.frame(
+                 src   = pkg,
+                 dep   = deps[[pkg]][, 1],
+                 label = depLevel,
+                 stringsAsFactors = FALSE))
+ })

Although this code snippet might seem complex at first sight, we simply look up the dependencies of each package (like in a loop) and return them as the rows of a small data.frame, skipping packages without any dependencies. This is nested in another loop, which iterates through all previously mentioned R package connection types, and ldply binds all the pieces together. The resulting R object is really straightforward to understand:

> str(edges)
'data.frame':  26960 obs. of  3 variables:
 $ src  : chr  "A3" "A3" "A3" "abc" ...
 $ dep  : chr  "R" "xtable" "pbapply" "R" ...
 $ label: chr  "Depends" "Depends" "Depends" "Depends" ...
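If the nested ldply calls are hard to follow, the same logic can be sketched with explicit for loops in base R. This is an illustrative rewrite under stated assumptions, not the code used in the rest of the chapter: the toy_deps list only mimics the shape of the package.dependencies output (one matrix per package, or NA), and all package names in it are hypothetical:

```r
# Toy input shaped like the package.dependencies() output shown earlier;
# the package names are hypothetical.
toy_deps <- list(
  Depends = list(
    pkgA = cbind(c("R", "pkgB"), c(">=", NA), c("2.15.0", NA)),
    pkgB = NA
  ),
  Imports = list(
    pkgA = NA,
    pkgB = cbind("pkgC", NA, NA)
  )
)

rows <- list()
for (depLevel in names(toy_deps)) {        # outer loop: connection types
  deps <- toy_deps[[depLevel]]
  for (pkg in names(deps)) {               # inner loop: packages
    if (identical(deps[[pkg]], NA)) next   # nothing listed for this package
    rows[[length(rows) + 1]] <- data.frame(
      src   = pkg,
      dep   = deps[[pkg]][, 1],            # first column holds the names
      label = depLevel,
      stringsAsFactors = FALSE)
  }
}

# Bind the per-package pieces into one edge list, as ldply does
toy_edges <- do.call(rbind, rows)
print(toy_edges)
```

Each data.frame built in the inner loop can contain several rows, one per referred package, which is why the final edge list is much longer than the list of packages.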