Chapter 7. Segmenting and Partitioning a Graph

Being able to view a network graph and understand the component parts is a critical aspect of network analysis. There are a number of approaches we can take to do this, including the use of clustering algorithms, data partitioning, or simply through the creative application of colors and sizing of specific portions of the graph. The end goal is typically the same—to tell an effective story through the use of visual elements.

  • We'll begin by exploring the various partitioning and clustering options available in Gephi
  • We'll then share several examples using some of the clustering and partitioning tools
  • In the final section, we'll walk through some cases where we take a more free form approach to visual clustering, using variations in size and color to direct the audience's attention

Let's begin with a tour through some of the many available clustering and partitioning options that are part of the Gephi core or available through a host of plugins.

Partitioning and clustering options

Before moving into specific methods to partition our network graphs, we should understand why we choose to do so. There are at least two main reasons for segmenting our graphs in this fashion:

  • To highlight patterns in the data based on underlying statistical or behavioral patterns. This can be especially essential when we have a dense network where patterns are not obvious to the naked eye.
  • To make a network more visually attractive through the use of size and coloring options.

Note that these two are not mutually exclusive aims, but can be used to great advantage together. So the end goal of the efforts taken for partitioning or clustering should be to enhance the viewer's ability to interpret the graph. Clustering methods will do this through specialized algorithms that interpret network patterns into distinct groupings (clusters) of similar nodes. Partitioning is typically more manual, but also seeks to deconstruct the network into meaningful groupings based on categorical data.

With this in mind, Gephi provides multiple options to segment a network graph, which includes the following selections:

  • The Partition tab enables the segmentation of a graph using a variety of attributes or statistical measures applied to both nodes and edges. As we will see in a moment, partitioning differs in approach from clustering, even when the end goal might be the same.
  • Sizing and coloring options can be created using the Ranking tab. A typical approach is to have scaled coloring based on a measurable attribute, with darker colors corresponding to higher values. The same approach can be applied using node size to scale values according to their degree level.
  • Manual sizing and coloring is an option in cases where we need to step outside of categorical or value-based partitioning. For instance, we might have five individual nodes we wish to focus on; Gephi enables custom coloring or sizing for these cases. This can be useful when we wish to draw attention to a specific set of nodes based on some salient characteristic.
  • The Chinese Whispers clustering plugin is a community detection algorithm that provides you with the ability to customize a number of parameters. The goal of this method is to find groups of nodes that broadcast the same message to their neighbors, hence the use of the Chinese Whispers children's game as the title.
  • Markov Clustering is one more option available as a Gephi plugin. This is an easy-to-use plugin that does not require a lot of intervention, although a few settings can be adjusted. With Markov Clustering, the approach is based on a comparison of random walks (a succession of random steps of varying lengths) to actual patterns found within the graph, resulting in the identification of clusters where patterns vary significantly from a simple random walk expectation.

At this point, let's clarify a significant difference between partitioning and clustering. While the aim of both is to separate a network into easily understood groups, each approach goes about this in different ways. When we partition a graph, it is typically through an arbitrary selection of one or more attributes or measures, an a priori segmenting of the network. This enables us to see how nodes or edges are categorized across a specific dimension. On the other hand, clustering is typically performed by an algorithm—we might have the ability to adjust the settings, but the individual method analyzes the network data to create clustered groupings. We can also reference the concept of community detection, where overlapping (or non-overlapping) communities are defined based on their connection patterns. In the case of overlapping communities, the resulting graph might be more complex than in either partitioning or clustering, as individual nodes might be members of multiple communities. The Link Communities statistic can also provide insight into these patterns.

One instance where the partitioning and clustering approaches interact is when we use a clustering method to classify the graph, and then elect to display it using the Partition function. This gives us the ability to create clusters using one or more algorithmic approaches, followed by the ability to partition the graph using data attributes or the newly created cluster values. The partition method makes it very easy to select the color scheme for our network groups, regardless of whether they are manually defined partitions or algorithmically created clusters.

The Partition tab

We're going to start this section with the Partition tab, often located above the Layout tab in the Gephi Overview window. This is a very powerful option in Gephi, as it can be quickly applied using many different fields in a network dataset. These fields can be part of the source data—perhaps gender, class, country, or any of a long list of values that can be used to visually segment your graph display.

Like many other options in Gephi, it is quite simple to change the partitioning variable—just navigate to the Partition tab and select a new attribute (you might need to refresh these on occasion by clicking on the green colored refresh icon). In most cases, this will be done using the Nodes option, but there might be instances where you wish to perform partitioning using edges.

As noted above, the primary goal of partitioning is to visually segment a network to help identify patterns based on categorical or integer-based attributes. Be careful not to use this approach when too many possible values are available, as it will only cause visual chaos, as 30, 40, 50 or more distinct colors might be introduced to the graph. The best use of partitioning is when you have a handful of possible values—anywhere from two to perhaps 15 or 20. Anything beyond this point might lead to confusion rather than clarification.

The Ranking tab

The Ranking tab allows value-based manipulation of the graph that employs both color and sizing options. This method can be highly effective to create visual segmentation based on numerical values, including both data attributes (such as number of games played, number of visitors, and so on) as well as calculated attributes (degrees, centrality measures, and more).

Unlike traditional partitioning, you will not be able to examine categorical variables in this fashion, so you might wish to utilize the two approaches as complements to one another.

Manual settings

Gephi provides users the ability to manually adjust virtually any aspect of a network graph, including resizing and recoloring graph nodes. This can be an effective route to choose when just a few modifications are needed to tell a specific story. In some instances, categorical or computational clusters might fail to tell the story we are seeking, forcing us to shift to more manual methods. Whatever the use case, Gephi provides the requisite tools to aid in customizing our graph. The next section will demonstrate these capabilities using a simple example.

Chinese Whispers

The Chinese Whispers algorithm, named after the children's game, is a useful plugin that can help to segment your graph by identifying similarities (and differences) across network members. The results are returned as specific clusters, with each node in a graph assigned to one of the groupings. For a detailed understanding of the methodology, here is the link to the paper describing the approach: http://wortschatz.uni-leipzig.de/~cbiemann/pub/2006/BiemannTextGraph06.pdf.

In simple terms, the algorithm is used to partition undirected, weighted graphs. Similar to the children's game of the same name, the goal is to understand which nodes are able to broadcast the same message to their neighbors. Nodes with very similar patterns are then grouped into clusters based on these similarities.

There are a couple of limitations to the approach:

  • Component boundaries cannot be crossed. If your graph has a single giant component with no secondary components, then the algorithm works as designed.
  • Nodes with no edges are removed from the clustering process, leaving any such nodes outside of the partitioning structure. These nodes are simply unclustered and are referred to as singletons.

In the next section, we'll demonstrate the value of the Chinese Whispers approach in helping to develop stories based on network patterns.

Markov clustering

This plugin is based on the work of Stijn van Dongen, with his thesis available at http://www.micans.org/mcl/. This approach uses the theory of random walks in determining how the graph should be clustered. Two operators known as expansion and inflation are used to simulate the random walks. At a simple level, a random walk should have low probability of crossing from one cluster to another, resulting in a graph with natural splits based on the existing network structure. We won't dive into the mathematical constructs here (refer to the preceding link for more details), but will view some examples later in this chapter.

One caveat with the Markov approach—it can be very demanding computationally, so be certain to tweak your settings appropriately for the network in question, or prepare to receive memory errors. We'll look at this a bit more while running through some examples later in this chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.119.106