multiplying by 1 do not change the answer. The identity elements in min-plus are $\bar{0} = \infty$ and $\bar{1} = 0$. Note that using the min-plus semi-ring means that we continue to work with numbers, but just change the way these numbers are manipulated by these operations.
A wide variety of classic graph algorithms can be expressed as generalized matrix-vector
products using a semi-ring. This idea is more fully explored in the edited volume: Graph
Algorithms in the Language of Linear Algebra [18]. Note that for a general matrix $A$ and vector $x$, the matrix-vector product $y = Ax$ produces the element-wise computation:
$$y_i = A_{i,1} \times x_1 + A_{i,2} \times x_2 + \cdots + A_{i,n} \times x_n.$$
The idea with a semi-ring generalized matrix-vector product is that we replace all of these
algebraic operations with their semi-ring counterparts:
$$y_i = A_{i,1} \otimes x_1 \oplus A_{i,2} \otimes x_2 \oplus \cdots \oplus A_{i,n} \otimes x_n.$$
Implementation. Implementations of these semi-ring iterative algorithms work just like
the implementations of the standard matrix-vector products described above. The only
difference is that the actual operations involved change. Note that the semi-ring methods are all reduction functions applied to the neighbor data because the semi-ring operations are guaranteed to be associative.
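To make this concrete, here is a minimal sketch of one such product in Python (the chapter itself gives no code here; the name semiring_matvec and the dictionary-based edge-list representation are our own illustrative choices, not the chapter's):

    def semiring_matvec(entries, x, oplus, otimes, zero):
        """One generalized matrix-vector product y = A x over a semiring.

        entries -- dict mapping (v, u) to the nonzero value A[v, u]
        x       -- dict mapping vertex id to its current value
        oplus   -- the semiring "addition" (an associative reduction, e.g., min)
        otimes  -- the semiring "multiplication" (e.g., +)
        zero    -- the semiring's additive identity (e.g., float('inf'))
        """
        y = {v: zero for v in x}
        # Because oplus is associative, the contributions to y[v] can be
        # reduced in any order -- this is what permits the parallel and
        # distributed implementations described above.
        for (v, u), a_vu in entries.items():
            y[v] = oplus(y[v], otimes(a_vu, x[u]))
        return y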
Example: Single-source shortest paths. In fact, using the min-plus algebra we can encode the solution of a single-source shortest path computation. Recall that this operation involves computing the shortest path distance from a source vertex $s$ to all other vertices $v$. Let $A_{v,u}$ be the distance between vertices $v$ and $u$, with $A_{v,u} = \bar{0} = \infty$ if they are not connected, and let $A_{v,v} = \bar{1} = 0$. Consider the iteration:
$$\text{Initialize: } x^{(\text{start})}_v = \begin{cases} \bar{1} & v = s \\ \bar{0} & v \neq s, \end{cases}$$
$$\text{Iterate: } x^{(\text{next})}_v = A_{v,1} \otimes x^{(\text{cur})}_1 \oplus A_{v,2} \otimes x^{(\text{cur})}_2 \oplus \cdots \oplus A_{v,n} \otimes x^{(\text{cur})}_n = \min_{u \in N(v) \cup \{v\}} \left( A_{v,u} + x^{(\text{cur})}_u \right).$$
At each iteration, we find the shortest path distance to all vertices that are one link farther away than in the previous step. This iteration is closely related to Dijkstra's algorithm without a priority queue.
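Reusing the hypothetical semiring_matvec sketch from above, the shortest-path iteration could be driven as follows; note the zero-weight self-loops ($A_{v,v} = \bar{1} = 0$), which let each vertex retain its current best distance:

    inf = float('inf')
    # A weighted path graph 1 - 2 - 3 with edge weights 4 and 1, stored
    # symmetrically, plus weight-0 self-loops (the semiring "one").
    entries = {(1, 1): 0.0, (2, 2): 0.0, (3, 3): 0.0,
               (1, 2): 4.0, (2, 1): 4.0,
               (2, 3): 1.0, (3, 2): 1.0}
    s = 1
    x = {v: (0.0 if v == s else inf) for v in (1, 2, 3)}
    for _ in range(2):  # n - 1 products suffice on an n-vertex graph
        x = semiring_matvec(entries, x, min, lambda a, b: a + b, inf)
    print(x)  # {1: 0.0, 2: 4.0, 3: 5.0}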
Example: Connected components. There is also a min-times semi-ring, where $a \oplus b = \min(a, b)$ and $a \otimes b = a \times b$ (the regular multiplication operation). Here, $\bar{1} = 1$ and $\bar{0} = \infty$. Let $A_{v,u} = 1$ if $v$ and $u$ have an edge, $A_{v,u} = \bar{0}$ otherwise, and let $A_{v,v} = 1$. Using this semi-ring, we can compute the connected components of a graph:
$$\text{Initialize: } x^{(\text{start})}_v = \text{unique id for } v,$$
$$\text{Iterate: } x^{(\text{next})}_v = A_{v,1} \otimes x^{(\text{cur})}_1 \oplus A_{v,2} \otimes x^{(\text{cur})}_2 \oplus \cdots \oplus A_{v,n} \otimes x^{(\text{cur})}_n = \min_{u \in N(v) \cup \{v\}} x^{(\text{cur})}_u.$$
Once the values do not change in an iteration, all vertices in the same connected component will have the same value. If the vertices are labeled 1 to $n$, then those labels suffice for the unique ids. This iteration can also be used to compute a breadth-first search.
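The same hypothetical semiring_matvec sketch runs the connected-components iteration; because every stored entry of $A$ equals 1, the min-times multiplication simply passes labels through unchanged:

    inf = float('inf')
    adj = [(1, 2), (2, 3), (4, 5)]  # two components: {1, 2, 3} and {4, 5}
    vertices = {u for e in adj for u in e}
    entries = {(v, v): 1.0 for v in vertices}  # self-loops, A[v, v] = 1
    for (u, v) in adj:  # undirected: store each edge in both directions
        entries[(u, v)] = entries[(v, u)] = 1.0
    x = {v: float(v) for v in vertices}  # labels 1..n as the unique ids
    while True:
        xnext = semiring_matvec(entries, x, min, lambda a, b: a * b, inf)
        if xnext == x:  # converged: labels equal the component minima
            break
        x = xnext
    print(x)  # every vertex maps to the smallest id in its component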
12.6.3 General Updates
The most general of the generalized matrix-vector product operations apply arbitrary functions to the neighbor data. For instance, in the example we will see with label propagation clustering, each neighbor sends labels to a vertex $v$ and the vertex takes the most frequent incoming label. This operation is not a reduction as it depends on all of the neighboring data, which eliminates some opportunities to optimize intermediate data transfer. (In MapReduce environments, one example of this optimization is the use of local combiners to reduce the number of key-value pairs sent to the reducers.) Each of the three examples we will see uses a different function, but the unifying theme of these operations is that each step is an instance of Equation 12.2 for some function $f$. Consequently, all of the parallelization, distribution, and system support is identical across all of these operations.
Implementation. These operations are easy to implement for both adjacency and edge-list access in centralized or distributed settings. Getting the information between neighbors, that is, the communication, is the difficult step. In distributed settings, there may be a great deal of data movement required, and optimizing this is an active area of research.
Example: Label propagation for clusters and communities. Label propagation is a method
to divide a graph into small clusters or communities [70] that has been used to optimize
distributions of graph vertices to processors [11] and to optimize the ordering of vertices in
adjacency matrices [71]. It works by giving each vertex a unique id (like in the connected
components algorithm) and then having vertices iteratively assume the id of the most
frequently seen label in their neighborhood (where ties are broken arbitrarily). As we already
explained, this is an instance of a generalized matrix-vector product. A few iterations of
this procedure suffice for most graphs. The output depends strongly on how ties break and
a host of other implementation details.
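A minimal sketch of this procedure follows, assuming in-place (asynchronous) updates and deterministic tie-breaking; it is our own illustration rather than the implementation of [70], which randomizes both choices:

    from collections import Counter

    def label_propagation(adj, num_iters=5):
        """Cluster by repeatedly adopting the most frequent neighbor label.
        adj maps each vertex to a list of its neighbors."""
        labels = {v: v for v in adj}  # unique ids, as in connected components
        for _ in range(num_iters):
            for v, neighbors in adj.items():  # in-place (asynchronous) updates
                if neighbors:
                    counts = Counter(labels[u] for u in neighbors)
                    # most_common breaks ties by first-seen order; as noted
                    # above, the output depends strongly on tie-breaking.
                    labels[v] = counts.most_common(1)[0][0]
        return labels

    # Two triangles joined by one edge; depending on tie-breaking this
    # yields one label for all six vertices or one label per triangle.
    adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
           4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
    print(label_propagation(adj))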
Example: Distance histograms and average distance. A distance histogram of a network shows the number of vertex pairs separated by $k$ links. It is a key component of the effective diameter computation and also of the average distance. Recall that the effective diameter is the smallest $k$ such that (say) 90% of all pairs are connected by $k$ links. If these computations are done exactly, we need one shortest-path computation from each node in the graph, which has a terrible complexity. However, the distance histogram and the neighborhood function of a node can both be approximated, with high accuracy, using generalized matrix-vector products. The essential idea is to use the Flajolet–Martin count sketch [72] or the HyperLogLog counter [73] to approximate the number of vertices at distance exactly $k$ from each vertex. (The specifics of the algorithm are not our focus here; see [25] for a modern description.) Both of these approximations maintain a small amount of data associated with each vertex. This information is aggregated in a specific way at each vertex to update the approximation. The aggregation is formally a reduction, and so this method can take advantage of those optimizations. These techniques have been demonstrated on large graphs with almost 100 billion edges.
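To illustrate the idea (this is not the authors' implementation): a Flajolet–Martin sketch represents a set as a small bitmask, and the bitwise OR of two sketches is a sketch of the union of the sets, so propagating sketches along edges is a formal reduction. The single-hash version below is deliberately crude; practical systems average many independent sketches (or use HyperLogLog registers) to control the variance.

    import hashlib

    def fm_hash(v):
        """One Flajolet-Martin bit for v: bit i set with probability 2^-(i+1)."""
        h = int.from_bytes(hashlib.sha256(str(v).encode()).digest()[:8], "big") or 1
        return h & -h  # keep only the lowest set bit of the hash

    def fm_estimate(mask):
        """Estimate the set size from a sketch via the lowest unset bit."""
        b = 0
        while mask & (1 << b):
            b += 1
        return (1 << b) / 0.77351  # 0.77351 is the standard FM correction

    def approx_neighborhood_function(adj, num_steps):
        """Estimate the number of pairs within k steps, for k = 1..num_steps."""
        sketch = {v: fm_hash(v) for v in adj}
        estimates = []
        for _ in range(num_steps):
            # One generalized matrix-vector product: bitwise OR over the
            # closed neighborhood. OR-ing sketches gives a sketch of the
            # union, and OR is associative, so this is a formal reduction.
            new_sketch = {}
            for v, neighbors in adj.items():
                m = sketch[v]
                for u in neighbors:
                    m |= sketch[u]
                new_sketch[v] = m
            sketch = new_sketch
            estimates.append(sum(fm_estimate(m) for m in sketch.values()))
        return estimates

    # A 4-cycle: 12 pairs within one step, all 16 within two.
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    print(approx_neighborhood_function(adj, 2))  # noisy single-sketch estimates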
12.6.4 Edge-Oriented Updates
We have described these generalized matrix-vector products as updating quantities at each vertex. There is no limitation to vertex-oriented updates only; the same ideas apply to edge-oriented updates by viewing them as generalized matrix-vector products with the line graph or dual graph of the input graph. In the dual graph, we replace each edge with a vertex and connect each new vertex to the vertices that represent all adjacent edges, as in the sketch below.
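For concreteness, here is one way such a dual graph could be built from an edge list (a minimal sketch of our own; the chapter does not prescribe a construction):

    from collections import defaultdict
    from itertools import combinations

    def dual_graph(edges):
        """Build the line graph: one vertex per input edge, with two edge-
        vertices adjacent whenever the original edges share an endpoint."""
        incident = defaultdict(list)  # vertex -> ids of edges touching it
        for i, (u, v) in enumerate(edges):
            incident[u].append(i)
            incident[v].append(i)
        dual_edges = set()
        for edge_ids in incident.values():
            # All edges sharing this endpoint are pairwise adjacent.
            for a, b in combinations(edge_ids, 2):
                dual_edges.add((a, b))
        return sorted(dual_edges)

    edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
    print(dual_graph(edges))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]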
12.6.5 Complexities with Power-Law Graphs
Power-law, or highly skewed, degree distributions pose problems for efficient implementations
of generalized matrix-vector products at scale. The PowerGraph extension of GraphLab [22]
contains a number of ideas to improve performance with formal reductions and power-law
graphs based on vertex splitting ideas that create virtual copies of vertices with lower degree.
These large-degree vertices, however, tend not to prohibit implementations of these ideas on large graphs; they only make the implementations slower.
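A rough sketch of the vertex-splitting idea, ours rather than PowerGraph's actual machinery: partition a high-degree vertex's incoming contributions into fixed-size chunks, reduce each chunk as a virtual copy, and combine the partial results, which associativity guarantees is safe.

    def split_reduce(neighbor_values, reduce_fn, identity, max_degree=1000):
        """Reduce a (possibly huge) list of neighbor contributions by first
        reducing fixed-size chunks ("virtual copies"), then combining.
        Associativity of reduce_fn guarantees the same answer either way."""
        partials = []
        for i in range(0, len(neighbor_values), max_degree):
            chunk = neighbor_values[i:i + max_degree]
            acc = identity
            for val in chunk:  # each virtual copy handles <= max_degree edges,
                acc = reduce_fn(acc, val)  # and chunks can run in parallel
            partials.append(acc)
        out = identity
        for p in partials:  # final reduction across the virtual copies
            out = reduce_fn(out, p)
        return out

    # e.g., the min-plus reduction for one high-degree vertex:
    print(split_reduce([5.0, 2.0, 7.0, 3.0], min, float('inf'), max_degree=2))  # 2.0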
12.6.6 Implementations in SQL
Generalized matrix-vector products can be implemented even in traditional database systems that support SQL [74]. A distributed SQL database such as Greenplum can evaluate these tasks with reasonable efficiency even for large graphs. As an example, here we illustrate how to perform the generalized matrix-vector product for connected components in SQL. The graph is stored as a table that provides edge-list access. (We assume that the graph is undirected, so both edges $(u, v)$ and $(v, u)$ are stored, and that each vertex has a self-loop.) The columns head and tail indicate the start and end of each edge. The initial vector is stored as a table with a vertex id and the unique id associated with it (which could be the same).
edges : id | head | tail
x : id | comp
In the iteration, we create the vector $x^{(\text{next})}$ from $x$:
CREATE TABLE xnext AS (
SELECT e.tail AS id, MIN(x.comp) AS comp
FROM edges e INNER JOIN x ON e.head = x.id
GROUP BY e.tail );
This query takes the graph structure and joins it with the vector so that each component in the table x is mapped to the head of each edge. Then we group by the tail of each edge and take the MIN over all components. This is exactly what the iteration in the connected components example did. Repeating the query, with xnext taking the role of x, until the table stops changing completes the computation.
12.7 Limitations and Tradeoffs in Mining Large Graphs
In many cases, individuals who employ graph mining tools want to obtain some sort of
qualitative understanding of or insight into their data. This soft and subtle goal differs in
important ways from simply using the graph to obtain better prediction accuracy in some
well-defined downstream machine learning task. In particular, it can differ from the use of
the algorithms and techniques we have focused on in the previous sections, for example,
when those algorithms are used as black-box components in larger analytics frameworks such
as various machine learning pipelines that are increasingly common. A common temptation in this setting is to use the intuition obtained from mining small graphs, assuming or hoping that the intuition thereby obtained is relevant or useful for much larger graphs. In general, it is not. Using intuition obtained from small graphs can lead to qualitatively incorrect understanding of the behavior and properties of larger graphs; it can also lead to qualitatively incorrect understanding of the behavior of algorithms that run on larger graphs.
At root, the reason that our intuition fails in larger graphs is that, for typical informatic graphs, the local structure of a large graph is distinct from and qualitatively different from its global structure. (Informally, by local structure we mean, for example, the properties of a single node and its nearest neighbors, while by global structure we mean the properties of the graph as a whole or those that involve a constant fraction of the nodes of the graph.)
A small subgraph of size roughly 100 vertices is a global structure in
a graph with only 1000 vertices; however, it is a local structure in a graph with millions
of vertices. As typical graphs get larger and larger, much of the local structure does not
change or grow [33,57]; instead, one simply observes more and more varied pockets of local
structure. Hence one aspect of our point: a fundamental difference between small graph
mining and large graph mining is that for large graphs, the global structure of the graph
(think of the whole graph) and its local structure (think of vertex neighborhoods) are very
different, while, for small graphs, these two types of structures are much more similar.
Moreover, in a large realistic graph, these local structures connect up with each
other in ways that are nearly random/quasirandom, or just slightly better than
random/quasirandom. A good example of this to keep in mind is the case of community
detection, as first described in [33] and as elaborated upon in [57]. The result of those
exhaustive empirical investigations was that large real-world graphs do not have good large
communities. This is very different than working with graphs of a few thousand nodes,
where good clusters and communities of size 5%–25% of the graph do exist. As the graphs
get larger and larger, the good clusters/communities stay roughly of the same size. Thus,
if we insist on finding good communities, then we may find hundreds or thousands of good small communities in graphs with millions of nodes or more, but we won't find good large
communities. That being said, in a large realistic graph, there certainly are large groups
of nodes (think 10% of the graph size) with better than random community structure (for
example, [11,58]); and there certainly are large groups of nodes (again, think 10% of the
graph size) with slightly better community quality score (for whatever score is implicitly or
explicitly being optimized by the community detection algorithm that one decides to run)
than the community quality score that an arbitrary 10% of the nodes of the graph would
have; there are many methods that find these latter structures.
In the remainder of this section, we illustrate three examples in which the qualitative
difference between large graphs and small graphs manifests and our natural small graph
intuition fails: graph drawing, viral propagation, and modularity-based communities.
12.7.1 Graph Drawing
Perhaps the pictorially most vivid illustration of the difference between small graphs and
large graphs is with respect to visualization and graph drawing. There is no shortage of graph layout ideas that claim to visualize large graphs [75,76]. While graph layout algorithms are often able to find interesting and useful structures in graphs with around 1000 vertices, they almost universally fail at finding any useful or interesting structure in graphs with more than 10,000 vertices. (These failures can be quite beautiful, though, from an artistic point of view.)
The reason for this is that graph drawing algorithms attempt
to show both the local and global structure simultaneously by seeking an arrangement of
vertices that respects the local edge structure for all edges in the graph. This is not possible
for graphs with strong expander-like properties [33,57]. Relatedly, as we will explain shortly
in terms of communities, there is surprisingly little global structure to be found.
A better strategy for large graphs is to use summary features to reveal the graph
structure. This is essentially how the oddball anomaly detection method works [12].
Each vertex is summarized with a few small local features. The result is a set of
less-artistic-but-more-informative scatter plots that show multivariate relationships among
the vertices. Anomalies are revealed because they are outliers in the space of local features.
Hive plots of networks are an attempt to make these multivariate plots reveal some of the
correlation structure among these attributes on the edges [78].
12.7.2 Viral Propagation
Another qualitative goal in large graph mining is to understand the spread of information
within a network. This is often called viral propagation due to its relationship with how a
virus spreads through a population. This is also a property that is fundamentally different
between large graphs and small graphs. Consider the two graphs from Figure 12.2. For each graph, we place three seed nodes, and we look at how far information would spread from these seeds to the rest of the graph in three steps. (Here, we are looking at the three-step geodesic neighborhoods of each of the three seeds, but we observe similar results with diffusion-based dynamics and other dynamics.)
The results of this simple experiment are in Figure 12.3, and they too illustrate this difference between small and large graphs. In small graphs, each of the viral propagations from the source nodes finds its own little region of the graph; each region can be meaningfully interpreted, and there is only a little overlap between different regions. In the large graph, the viral propagations quickly spread, intersect, and overlap throughout the graph. This qualitative difference is of fundamental importance, and it is not limited to our relatively simple notion of information propagation; it also holds much more generally for more complex diffusions [57, Figures 12 and 13].
FIGURE 12.2
Graph drawing for a small graph ((a), 379 nodes) and a large graph ((b), 8887 nodes). The small graph drawing (a) shows considerable
local structure and reveals some overall topology of the relationships (the graph is Newman’s
network science collaborators [48]). The large graph drawing (b) shows what is affectionately
called a hairball and does not reveal any meaningful structure (the graph is a human protein–
protein interaction network [77]). The failure of graph drawing to show any meaningful
structure in large networks is an example of how we should not draw intuition from small
graphs when mining large graphs.