Mining Large Graphs 209
At root, the reason that our intuition fails in larger graphs is that—for typical informatic
graphs—the local structure of a large graph is distinct from and qualitatively different than
its global structure.
∗
A small subgraph of size roughly 100 vertices is a global structure in
a graph with only 1000 vertices; however, it is a local structure in a graph with millions
of vertices. As typical graphs get larger and larger, much of the local structure does not
change or grow [33,57]; instead, one simply observes more and more varied pockets of local
structure. Hence one aspect of our point: a fundamental difference between small graph
mining and large graph mining is that for large graphs, the global structure of the graph
(think of the whole graph) and its local structure (think of vertex neighborhoods) are very
different, while, for small graphs, these two types of structures are much more similar.
Moreover, in a large realistic graph, these local structures connect up with each
other in ways that are nearly random/quasirandom, or just slightly better than
random/quasirandom. A good example of this to keep in mind is the case of community
detection, as first described in [33] and as elaborated upon in [57]. The result of those
exhaustive empirical investigations was that large real-world graphs do not have good large
communities. This is very different than working with graphs of a few thousand nodes,
where good clusters and communities of size 5%–25% of the graph do exist. As the graphs
get larger and larger, the good clusters/communities stay roughly of the same size. Thus,
if we insist on finding good communities then we may find hundreds or thousands of good
small communities in graphs with millions or more of nodes, but we won’t find good large
communities. That being said, in a large realistic graph, there certainly are large groups
of nodes (think 10% of the graph size) with better than random community structure (for
example, [11,58]); and there certainly are large groups of nodes (again, think 10% of the
graph size) with slightly better community quality score (for whatever score is implicitly or
explicitly being optimized by the community detection algorithm that one decides to run)
than the community quality score that an arbitrary 10% of the nodes of the graph would
have; there are many methods that find these latter structures.
In the remainder of this section, we illustrate three examples in which the qualitative
difference between large graphs and small graphs manifests and our natural small graph
intuition fails: graph drawing, viral propagation, and modularity-based communities.
12.7.1 Graph Drawing
Perhaps the pictorially most vivid illustration of the difference between small graphs and
large graphs is with respect to visualization and graph drawing. There is no shortage of graph
layout ideas that proclaim to visualize large graphs [75,76]. While graph layout algorithms
are often able to find interesting and useful structures in graphs with around 1000 vertices,
they almost universally fail at finding any useful or interesting structure in graphs with
more than 10,000 vertices.
†
The reason for this is that graph drawing algorithms attempt
to show both the local and global structure simultaneously by seeking an arrangement of
vertices that respects the local edge structure for all edges in the graph. This is not possible
for graphs with strong expander-like properties [33,57]. Relatedly, as we will explain shortly
in terms of communities, there is surprisingly little global structure to be found.
A better strategy for large graphs is to use summary features to reveal the graph
structure. This is essentially how the oddball anomaly detection method works [12].
Each vertex is summarized with a few small local features. The result is a set of
∗
Informally, by local structure, we mean, for example, the properties of a single node and its nearest
neighbors, while by global structure, we mean the properties of the graph as a whole or that involve a
constant fraction of the nodes of the graph.
†
These failures can be quite beautiful, though, from an artistic point of view.