To compare these two metrics, let's draw a simple scatter plot showing each R package by degree and betweenness:
> plot(degree(g), betweenness(g), type = 'n',
+   main = 'Centrality of R package dependencies')
> text(degree(g), betweenness(g), labels = V(g)$name)
Relax; we will soon be able to generate much more spectacular and instructive plots! But the preceding plot already shows that some packages with a rather low number of direct dependents still have a great impact on the global R ecosystem.
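To see how a vertex can have few neighbors and still be central, here is a minimal sketch on a made-up, undirected toy graph (not the package dependency graph). The vertex names are hypothetical, and the sketch assumes a recent igraph version that provides graph_from_literal:

```r
library(igraph)

# Hypothetical toy graph: two triangles (a-b-c and d-e-f)
# connected only through the vertex named 'bridge'
toy <- graph_from_literal(a - b, b - c, c - a,
                          c - bridge, bridge - d,
                          d - e, e - f, f - d)

# Compare the two centrality metrics side by side
centrality <- data.frame(degree      = degree(toy),
                         betweenness = betweenness(toy))
centrality[order(-centrality$betweenness), ]
```

In this toy graph the bridge vertex has only two neighbors, yet every shortest path between the two triangles passes through it, so it ends up with the highest betweenness.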
Before we proceed, let's filter our dataset and graph to include far fewer vertices by building the dependency tree of the igraph package, including all packages it depends on or imports from:
The following short list of igraph dependencies was generated in April 2015. Since then, a major new version of igraph has been released with a lot more dependencies due to importing from the magrittr and NMF packages, so repeating the following examples on your computer will return a much larger network and larger graphs. For educational purposes, we are showing the smaller network in the following outputs.
> edges <- edges[edges$label != 'Suggests', ]
> deptree <- edges$dep[edges$src == 'igraph']
> while (!all(edges$dep[edges$src %in% deptree] %in% deptree))
+   deptree <- union(deptree, edges$dep[edges$src %in% deptree])
> deptree
[1] "methods"   "Matrix"    "graphics"  "grid"      "stats"
[6] "utils"     "lattice"   "grDevices"
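The while loop above keeps adding the dependencies of everything already collected until nothing new turns up. As a minimal sketch, here is the same idea on a tiny, made-up edge list; the package names A, B, C, D, X, and Y are hypothetical, and only base R is used:

```r
# Hypothetical edge list: each row says that 'src' depends on 'dep'
toy <- data.frame(
  src = c('A', 'B', 'C', 'X'),
  dep = c('B', 'C', 'D', 'Y'),
  stringsAsFactors = FALSE
)

# Start from the direct dependencies of package A
toy_deptree <- toy$dep[toy$src == 'A']

# Repeat until every dependency of the collected packages
# is already part of the collection
while (!all(toy$dep[toy$src %in% toy_deptree] %in% toy_deptree)) {
  toy_deptree <- union(toy_deptree, toy$dep[toy$src %in% toy_deptree])
}

toy_deptree
# [1] "B" "C" "D"
```

The unrelated pair X and Y never enters the result, because neither is reachable from A.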
So we need the eight packages returned in deptree to be able to use the igraph package. Please note that not all of these are direct dependencies; some are dependencies of other packages in the tree. To draw a visual representation of this dependency tree, let's create the related graph object and plot it:
> g <- graph.data.frame(edges[edges$src %in% c('igraph', deptree), ])
> plot(g)
Well, the igraph package literally depends on only one package, although it also imports some functions from the Matrix package. All the other previously mentioned packages are dependencies of the latter.
To draw a more intuitive version of the preceding plot that makes this structure visible, we might consider removing the dependency labels and representing that aspect by edge colors, and we can also emphasize the direct dependencies of igraph by the colors of the vertex labels. We can modify the attributes of vertices and edges via the V and E functions:
> V(g)$label.color <- 'orange'
> V(g)$label.color[V(g)$name == 'igraph'] <- 'darkred'
> V(g)$label.color[V(g)$name %in%
+   edges$dep[edges$src == 'igraph']] <- 'orangered'
> E(g)$color <- c('blue', 'green')[factor(E(g)$label)]
> plot(g, vertex.shape = 'none', edge.label = NA)
Much better! Our central topic, the igraph package, is highlighted in dark red, the two direct dependencies are marked in dark orange, and all the other dependencies are colored in lighter orange. Similarly, we emphasize the Depends relations in blue compared to the vast majority of other Imports connections, which are drawn in green.
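The edge colors rely on a base R idiom: indexing a vector with a factor uses the factor's integer codes, and factor levels are sorted alphabetically by default, so Depends maps to the first color and Imports to the second. A minimal sketch with a made-up label vector:

```r
# Hypothetical edge labels, in arbitrary order
dep_labels <- c('Imports', 'Depends', 'Imports')
f <- factor(dep_labels)

levels(f)      # "Depends" "Imports" (sorted alphabetically)
as.integer(f)  # 2 1 2

# Indexing with the factor picks colors by integer code
edge_cols <- c('blue', 'green')[f]
edge_cols
# [1] "green" "blue"  "green"
```

If the data contained a third label, the color vector would need a third entry; otherwise the extra level would map to NA.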
What if you do not like the order of the vertices in the preceding plot? Feel free to rerun the last command to produce new results, or draw with tkplot for a dynamic plot, where you can design your custom layout by dragging and dropping the vertices:
> tkplot(g, edge.label = NA)
Can we do any better? Although this result is extremely useful, it lacks the immediate appeal of the currently trending, JavaScript-empowered interactive plots. So let's recreate this interactive plot with JavaScript, right from R! The htmlwidgets and visNetwork packages, discussed in more detail in Chapter 13, Data Around Us, can help us with this task, even without any JavaScript knowledge. Simply pass the extracted node and edge datasets to the visNetwork function:
> library(visNetwork)
> nodes <- get.data.frame(g, 'vertices')
> names(nodes) <- c('id', 'color')
> edges <- get.data.frame(g)
> visNetwork(nodes, edges)
Alternatively, we can also generate such hierarchical plots in a programmatic way, by drawing the dominator tree of this directed graph:
> g <- dominator.tree(g, root = "igraph")$domtree
> plot(g, layout = layout.reingold.tilford(g, root = "igraph"),
+   vertex.shape = 'none')
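For intuition about what a dominator tree encodes: a vertex d dominates a vertex n if every path from the root to n passes through d. The following base R sketch computes dominator sets on a tiny, made-up dependency graph with a simple iterative scheme. It is not how igraph implements dominator.tree, the package names are hypothetical, and it assumes that every vertex is reachable from the root:

```r
# Hypothetical edge list: 'app' reaches 'core' via two routes
toy <- data.frame(
  src = c('app', 'app', 'left', 'right', 'core'),
  dep = c('left', 'right', 'core', 'core', 'base'),
  stringsAsFactors = FALSE
)
nodes <- union(toy$src, toy$dep)
root  <- 'app'

# Start from the most pessimistic guess: everything dominates
# every vertex, except that the root is dominated only by itself
dom <- setNames(lapply(nodes, function(n) nodes), nodes)
dom[[root]] <- root

repeat {
  changed <- FALSE
  for (n in setdiff(nodes, root)) {
    preds <- toy$src[toy$dep == n]
    # Dominators of n: n itself plus whatever dominates
    # all of its direct predecessors
    new <- union(n, Reduce(intersect, dom[preds]))
    if (!setequal(new, dom[[n]])) {
      dom[[n]] <- new
      changed <- TRUE
    }
  }
  if (!changed) break
}

dom[['core']]
dom[['base']]
```

Because core can be reached through either left or right, neither of them dominates it; only app (and core itself) does, so a dominator tree would attach core directly under app.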
As we are using R, a statistical programming environment whose most exciting and useful feature is its community, we might prefer to look for other, already implemented solutions for this research. After a quick Google search, and having looked up a few questions on StackOverflow or posts on http://www.r-bloggers.com/, it's pretty easy to find the Revolution Analytics miniCRAN package, which has some related and useful functions:
> library(miniCRAN)
> pkgs <- pkgAvail()
> pkgDep('igraph', availPkgs = pkgs, suggests = FALSE,
+   includeBasePkgs = TRUE)
[1] "igraph"    "methods"   "Matrix"    "graphics"  "grid"
[6] "stats"     "utils"     "lattice"   "grDevices"
> plot(makeDepGraph('igraph', pkgs, suggests = FALSE,
+   includeBasePkgs = TRUE))
But let's get back to the original question: How do we analyze network data?