Clustering and community detection in social networks

Graphs exhibit clustering behavior, and identification of communities is an important task in social networks. A node's clustering coefficient is the number of triadic closures (closed triples) in the node's neighborhood. This is an expression of transitivity. Nodes with higher transitivity exhibit higher subdensity, and if completely closed, form cliques that can be identified as communities. In this recipe, we will look at clustering and community detection in social networks.

Getting ready

You will again need NetworkX and, for the first time in this chapter, the python-louvain library.

How to do it...

These steps will guide you through the detection of communities within social networks:

  1. Let's actually get into some clustering. The python-louvain library uses NetworkX to perform community detection with the louvain method. Here is a simple example of cluster partitioning on a small, built-in social network:
    G = nx.karate_club_graph()
    
    #first compute the best partition
    partition = community.best_partition(G)
    
    #drawing
    pos = nx.spring_layout(G)
    plt.figure(figsize=(12,12))
    plt.axis('off')
    
    nx.draw_networkx_nodes(G, pos, node_size=200, cmap=plt.cm.RdYlBu, node_color=partition.values())
    nx.draw_networkx_edges(G,pos, alpha=0.5)
    plt.savefig("figure/karate_communities.png")
    

    The following is the resulting graph with shades of grey and/or colors representing different partitions:

    How to do it...

    This is pretty neat! We can see there are yellow, light blue, and dark red cliques, but dark blue is pretty homogenous. We'll talk more about how to create graph visualizations with matplotlib later.

  2. To partition our comic book characters, we'll add their partitions to each of their nodes; we'll then look at the relative sizes of each partition:
    >>> graph = graph_from_csv(HERO_NETWORK)
    >>> partition = community.best_partition(graph)
    >>> print "%i partitions" % len(set(partition.values()))
    
    25 partitions
    >>> nx.set_node_attributes(graph, 'partition', partition)
    

    As you can see, the louvain method has discovered 25 communities without our social graph.

  3. To examine the relative size of each community, a histogram view may be helpful. To create the histogram, add the following function to your file:
         import matplotlib.pyplot as plt
    
        def communities_histogram(graph):
            graph, partition = detect_communities(graph)
            numbins = len(partition.values())
    
            plt.hist(partition.values(), bins=numbins), color="#0f6dbc")
            plt.title("Size of Marvel Communities")
            plt.xlabel("Community")
            plt.ylabel("Nodes")
    
            plt.show()
    

    This should produce the following figure:

    How to do it...

    There are three major communities containing over a thousand nodes each. However, we also have eight medium-size communities around 400 actors strong. The other communities are much smaller, but it does appear that the Marvel social graph is indeed a real-world and small-world graph, much like the natural social networks observed in human culture!

  4. An alternate, area-based approach to visualizing the size of the Marvel communities is to use a bubble chart. The area of each circle represents the size of each community. Graphs such as the following are often used to collapse large graphs into subgraphs based on community. To create this code, add the following function to your file:
        def communities_bubblechart(graph):
            graph, partition = detect_communities(graph)
    
            parts = defaultdict(int)
            for part in partition.values():
                parts[part] += 1
    
            bubbles = nx.Graph()
            for part in parts.items():
                bubbles.add_node(part[0], size=part[1])
            pos = nx.random_layout(bubbles)
            plt.figure(figsize=(12,12))
            plt.axis('off')
    
            nx.draw_networkx_nodes(bubbles, pos,
                alpha=0.6, node_size=map(lambda x: x*6, parts.values()),
                node_color=[random.random() for x in parts.values()],
                cmap=plt.cm.RdYlBu)
    
            plt.show()
    

    When run, this code should produce the following figure for our Hero network:

    How to do it...

How it works...

NetworkX has several mechanisms to compute clusters:

>>> nx.transitivity(graph)
0.194539747093
>>> nx.average_clustering(graph)
0.774654121711

The nx.transitivity function uses the nx.triangles function to compute the ratio of the number of triangles to the number of possible triangles. The nx.clustering function computes the clustering coefficient for each node, and the nx_average_clustering function computes the average coefficient for the graph. Higher transitivity and clustering coefficients mean that the graph exhibits the small-world effect.

The small world is a network that appears random, but it has a high transitivity and a short average path length. This is a very common structure for social networks because the semantics of modern networks have strong ties and, therefore, strong transitivity. This essentially means that there are a series of large clusters with few bridge nodes. The heroes' social network exhibits both the small-world effect as well as a preferential attachment; the majority of new edges are to nodes with an already high degree, thus creating a long-tail, left-skewed distribution as we've seen in our graph.

There's more...

The Louvian method is a greedy optimization method that partitions the network, optimizing the modularity of each network partition. First, the method identifies small communities by attempting to optimize local modularity, then it aggregates the nodes belonging to the same community and performs the process again. In this way, a hierarchical data structure of communities is returned. The method is simple, efficient, and works against large networks of millions of nodes with billions of links.

The original method was developed by Etienne Lefebvre at UCL (Louvain-la-Neuve), then co-authored and improved along with Vincent Blondel, Jean-Loup Guillaume, and Renaud Lambiotte. It is the "Louvain method" because the method is devised when all the members of the team were at the Université catholique de Louvain. Together, the authors have created a Python method to compute these hierarchical communities, a method that depends on NetworkX. The basic detection algorithm is as follows:

    import community
    import networkx as nx
    def detect_communities(graph):
        partition = community.best_partition(graph)
        nx.set_node_attributes(graph, 'partition', partition)
        return graph, partition

This function expects an nx.Graph and uses the community module to compute the best root partition. The partition is hierarchical, so to get the subcommunities, we simply iterate over the parts in the partition and assign the nodes in our graph an attribute called partition which identifies a node with their community. The function then returns both the modified graph and the partition to use in visualization.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.130.201