Visualizing network data

To compare these two metrics, let's draw a simple scatter plot showing each R package by degree and betweenness:

> plot(degree(g), betweenness(g), type = 'n',
+   main = 'Centrality of R package dependencies')
> text(degree(g), betweenness(g), labels = V(g)$name)

Relax; we will soon be able to generate much more spectacular and instructive plots! But the preceding plot already shows that some packages with a rather low number of direct dependents still have a great impact on the global R ecosystem.

Before we proceed, let's filter our dataset and graph to include far fewer vertices by building the dependency tree of the igraph package, including all packages it depends on or imports from:

Tip

The following short list of igraph dependencies was generated in April 2015. Since then, a major new version of igraph has been released with many more dependencies, because it imports from the magrittr and NMF packages. Repeating the following examples on your computer will therefore return a much larger network and larger graphs. For educational purposes, we show the smaller network in the following outputs.

> edges <- edges[edges$label != 'Suggests', ]
> deptree <- edges$dep[edges$src == 'igraph']
> while (!all(edges$dep[edges$src %in% deptree] %in% deptree))
+   deptree <- union(deptree, edges$dep[edges$src %in% deptree])
> deptree
[1] "methods"   "Matrix"    "graphics"  "grid"      "stats"
[6] "utils"     "lattice"   "grDevices"
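
The while loop above keeps adding the dependencies of everything found so far until nothing new turns up. A minimal sketch of the same fixed-point idea on a made-up edge list follows; the packages a to e are hypothetical, and the helper name dep_closure is an assumption for illustration, not part of the original code:

```r
# Toy edge list: 'src' depends on 'dep' (hypothetical package names)
toy_edges <- data.frame(
  src = c('a', 'a', 'b', 'c', 'e'),
  dep = c('b', 'c', 'd', 'd', 'a'),
  stringsAsFactors = FALSE
)

# Collect all direct and indirect dependencies of 'pkg'
dep_closure <- function(edges, pkg) {
  found <- unique(edges$dep[edges$src == pkg])
  repeat {
    # add the dependencies of everything found so far
    nxt <- union(found, edges$dep[edges$src %in% found])
    if (length(nxt) == length(found)) break  # nothing new: fixed point
    found <- nxt
  }
  found
}

dep_closure(toy_edges, 'a')  # "b" "c" "d"
```

Note that e is not returned for a: e depends on a, not the other way around, so the direction of the edges matters.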

So we need the eight packages stored in deptree to be able to use the igraph package. Please note that not all of these are direct dependencies; some are dependencies of other packages. To draw a visual representation of this dependency tree, let's create the related graph object and plot it:

> g <- graph.data.frame(edges[edges$src %in% c('igraph', deptree), ])
> plot(g)

Well, the igraph package literally depends on only one package, although it also imports some functions from the Matrix package. All the other previously mentioned packages are dependencies of the latter.

To draw a more intuitive version of the preceding plot that makes this point visible, we might remove the dependency labels and represent the type of dependency by edge colors instead, and we can also emphasize the direct dependencies of igraph by vertex colors. We can modify the attributes of vertices and edges via the V and E functions:

> V(g)$label.color <- 'orange'
> V(g)$label.color[V(g)$name == 'igraph'] <- 'darkred'
> V(g)$label.color[V(g)$name %in%
+        edges$dep[edges$src == 'igraph']] <- 'orangered'
> E(g)$color <- c('blue', 'green')[factor(E(g)$label)]
> plot(g, vertex.shape = 'none', edge.label = NA)

Much better! Our central topic, the igraph package, is highlighted in dark red, its two direct dependencies are marked in orange-red, and all the other dependencies are colored in a lighter orange. Similarly, the Depends relations are drawn in blue to set them apart from the far more numerous Imports connections, which are drawn in green.
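
Indexing a color vector with factor(...) relies on the alphabetical order of the factor levels, so the colors shift silently if another relation type appears in the data. A slightly more defensive sketch, using a made-up vector of labels in place of the edge labels of g, looks the colors up by name instead; the variable names here are assumptions for illustration:

```r
# Made-up edge labels standing in for the edge labels of the graph
labels <- c('Imports', 'Depends', 'Imports', 'Imports')

# Positional lookup: factor levels sort alphabetically, so 'Depends'
# is level 1 (blue) and 'Imports' is level 2 (green)
by_position <- c('blue', 'green')[factor(labels)]

# Lookup by name: the mapping is explicit and does not depend on level order
relation_colors <- c(Depends = 'blue', Imports = 'green')
by_name <- unname(relation_colors[labels])

identical(by_position, by_name)  # TRUE
```

With the named lookup, an unexpected label yields NA rather than a wrong color, which makes the problem easier to spot.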

Interactive network plots

What if you do not like the arrangement of the vertices in the preceding plot? Feel free to rerun the last command to produce a new layout, or draw the graph with tkplot for a dynamic plot, where you can design your custom layout by dragging and dropping the vertices:

> tkplot(g, edge.label = NA)

Can we do any better? Although this result is extremely useful, it lacks the immediate appeal of the currently trending, JavaScript-powered interactive plots. So let's recreate this interactive plot with JavaScript, right from R! htmlwidgets and the visNetwork package, discussed in more detail in Chapter 13, Data Around Us, can help us with this task, even without any JavaScript knowledge. Simply pass the extracted node and edge datasets to the visNetwork function:

> library(visNetwork)
> nodes <- get.data.frame(g, 'vertices')
> names(nodes) <- c('id', 'color')
> edges <- get.data.frame(g)
> visNetwork(nodes, edges)
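
visNetwork matches edges to nodes by identifier: it expects an id column in the nodes data frame and from and to columns in the edges data frame, which is why the node columns are renamed above. A minimal sketch with made-up data follows (the package names pkgA to pkgC and the variable names are hypothetical); it builds the widget only if visNetwork is installed:

```r
# Made-up nodes and edges in the shape visNetwork expects
toy_nodes <- data.frame(
  id    = c('pkgA', 'pkgB', 'pkgC'),
  label = c('pkgA', 'pkgB', 'pkgC'),   # text shown for each node
  color = c('darkred', 'orangered', 'orange'),
  stringsAsFactors = FALSE
)
toy_links <- data.frame(
  from = c('pkgA', 'pkgA'),
  to   = c('pkgB', 'pkgC'),
  stringsAsFactors = FALSE
)

# Every edge endpoint should refer to a known node id
endpoints_ok <- all(c(toy_links$from, toy_links$to) %in% toy_nodes$id)

if (endpoints_ok && requireNamespace('visNetwork', quietly = TRUE)) {
  # Typing 'widget' at an interactive console displays the plot
  widget <- visNetwork::visNetwork(toy_nodes, toy_links)
}
```

Checking the endpoints first is a cheap way to catch edges that point to nodes missing from the nodes table.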

Custom plot layouts

Alternatively, we can also generate such hierarchical plots in a programmatic way, by drawing the dominator tree of this directed graph:

> g <- dominator.tree(g, root = "igraph")$domtree
> plot(g, layout = layout.reingold.tilford(g, root = "igraph"), 
+   vertex.shape = 'none')
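
A vertex d dominates a vertex v if every path from the root to v passes through d; the dominator tree attaches each vertex to its closest dominator. A minimal sketch on a made-up, diamond-shaped graph follows (the vertex names are hypothetical). It assumes a recent igraph release, where the dotted functions used above are available under the underscore names graph_from_data_frame, dominator_tree and layout_as_tree:

```r
library(igraph)

# Diamond: a -> b -> d and a -> c -> d (made-up vertices)
toy <- graph_from_data_frame(data.frame(
  from = c('a', 'a', 'b', 'c'),
  to   = c('b', 'c', 'd', 'd')
))

# 'd' can be reached via 'b' or via 'c', so neither of them dominates it;
# its only dominator is the root 'a'
tree <- dominator_tree(toy, root = 'a')$domtree

# Edge list by vertex id: every tree edge involves vertex 1 ('a')
el <- as_edgelist(tree, names = FALSE)

# Reingold-Tilford coordinates: one (x, y) row per vertex
coords <- layout_as_tree(tree, root = 1)
```

In the original graph d has two parents, but in the dominator tree it hangs directly below a, which is what makes the tree layout unambiguous.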

Analyzing R package dependencies with an R package

As we are using R, a statistical programming environment whose most exciting and useful feature is its community, we might prefer to look for existing solutions to this problem. After a quick Google search and a look at a few questions on Stack Overflow or posts on http://www.r-bloggers.com/, it's pretty easy to find the miniCRAN package from Revolution Analytics, which has some related and useful functions:

> library(miniCRAN)
> pkgs <- pkgAvail()
> pkgDep('igraph', availPkgs = pkgs, suggests = FALSE,
+   includeBasePkgs = TRUE)
[1] "igraph"    "methods"   "Matrix"    "graphics"  "grid"
[6] "stats"     "utils"     "lattice"   "grDevices"
> plot(makeDepGraph('igraph', pkgs, suggests = FALSE,
+   includeBasePkgs = TRUE))

But let's get back to the original question: How do we analyze network data?
