Being able to view a network graph and understand the component parts is a critical aspect of network analysis. There are a number of approaches we can take to do this, including the use of clustering algorithms, data partitioning, or simply through the creative application of colors and sizing of specific portions of the graph. The end goal is typically the same—to tell an effective story through the use of visual elements.
Let's begin with a tour through some of the many available clustering and partitioning options that are part of the Gephi core or available through a host of plugins.
Before moving into specific methods to partition our network graphs, we should understand why we choose to do so. There are at least two main reasons for segmenting our graphs in this fashion:
Note that these two are not mutually exclusive aims, but can be used to great advantage together. So the end goal of the efforts taken for partitioning or clustering should be to enhance the viewer's ability to interpret the graph. Clustering methods will do this through specialized algorithms that interpret network patterns into distinct groupings (clusters) of similar nodes. Partitioning is typically more manual, but also seeks to deconstruct the network into meaningful groupings based on categorical data.
With this in mind, Gephi provides multiple options to segment a network graph, which includes the following selections:
At this point, let's clarify a significant difference between partitioning and clustering. While the aim of both is to separate a network into easily understood groups, each approach goes about this in different ways. When we partition a graph, it is typically through an arbitrary selection of one or more attributes or measures, an a priori segmenting of the network. This enables us to see how nodes or edges are categorized across a specific dimension. On the other hand, clustering is typically performed by an algorithm—we might have the ability to adjust the settings, but the individual method analyzes the network data to create clustered groupings. We can also reference the concept of community detection, where overlapping (or non-overlapping) communities are defined based on their connection patterns. In the case of overlapping communities, the resulting graph might be more complex than in either partitioning or clustering, as individual nodes might be members of multiple communities. The Link Communities statistic can also provide insight into these patterns.
One instance where the partitioning and clustering approaches interact is when we use a clustering method to classify the graph, and then elect to display it using the Partition function. This gives us the ability to create clusters using one or more algorithmic approaches, followed by the ability to partition the graph using data attributes or the newly created cluster values. The partition method makes it very easy to select the color scheme for our network groups, regardless of whether they are manually defined partitions or algorithmically created clusters.
We're going to start this section with the Partition tab, often located above the Layout tab in the Gephi Overview window. This is a very powerful option in Gephi, as it can be quickly applied using many different fields in a network dataset. These fields can be part of the source data—perhaps gender, class, country, or any of a long list of values that can be used to visually segment your graph display.
Like many other options in Gephi, it is quite simple to change the partitioning variable—just navigate to the Partition tab and select a new attribute (you might need to refresh these on occasion by clicking on the green colored refresh icon). In most cases, this will be done using the Nodes option, but there might be instances where you wish to perform partitioning using edges.
As noted above, the primary goal of partitioning is to visually segment a network to help identify patterns based on categorical or integer-based attributes. Be careful not to use this approach when too many possible values are available, as it will only cause visual chaos, as 30, 40, 50 or more distinct colors might be introduced to the graph. The best use of partitioning is when you have a handful of possible values—anywhere from two to perhaps 15 or 20. Anything beyond this point might lead to confusion rather than clarification.
The Ranking tab allows value-based manipulation of the graph that employs both color and sizing options. This method can be highly effective to create visual segmentation based on numerical values, including both data attributes (such as number of games played, number of visitors, and so on) as well as calculated attributes (degrees, centrality measures, and more).
Unlike traditional partitioning, you will not be able to examine categorical variables in this fashion, so you might wish to utilize the two approaches as complements to one another.
Gephi provides users the ability to manually adjust virtually any aspect of a network graph, including resizing and recoloring graph nodes. This can be an effective route to choose when just a few modifications are needed to tell a specific story. In some instances, categorical or computational clusters might fail to tell the story we are seeking, forcing us to shift to more manual methods. Whatever the use case, Gephi provides the requisite tools to aid in customizing our graph. The next section will demonstrate these capabilities using a simple example.
The Chinese Whispers algorithm, named after the children's game, is a useful plugin that can help to segment your graph by identifying similarities (and differences) across network members. The results are returned as specific clusters, with each node in a graph assigned to one of the groupings. For a detailed understanding of the methodology, here is the link to the paper describing the approach: http://wortschatz.uni-leipzig.de/~cbiemann/pub/2006/BiemannTextGraph06.pdf.
In simple terms, the algorithm is used to partition undirected, weighted graphs. Similar to the children's game of the same name, the goal is to understand which nodes are able to broadcast the same message to their neighbors. Nodes with very similar patterns are then grouped into clusters based on these similarities.
There are a couple of limitations to the approach:
In the next section, we'll demonstrate the value of the Chinese Whispers approach in helping to develop stories based on network patterns.
This plugin is based on the work of Stijn van Dongen, with his thesis available at http://www.micans.org/mcl/. This approach uses the theory of random walks in determining how the graph should be clustered. Two operators known as expansion and inflation are used to simulate the random walks. At a simple level, a random walk should have low probability of crossing from one cluster to another, resulting in a graph with natural splits based on the existing network structure. We won't dive into the mathematical constructs here (refer to the preceding link for more details), but will view some examples later in this chapter.
One caveat with the Markov approach—it can be very demanding computationally, so be certain to tweak your settings appropriately for the network in question, or prepare to receive memory errors. We'll look at this a bit more while running through some examples later in this chapter.
18.119.133.160