multiplying by 1 do not change the answer. The identity elements in min-plus are $\bar{0} = \infty$ and $\bar{1} = 0$. Note that using the min-plus semi-ring means that we continue to work with numbers, but just change the way these numbers are manipulated by these operations.
A wide variety of classic graph algorithms can be expressed as generalized matrix-vector
products using a semi-ring. This idea is more fully explored in the edited volume: Graph
Algorithms in the Language of Linear Algebra [18]. Note that for a general matrix $A$ and vector $x$, the matrix-vector product $y = Ax$ produces the element-wise computation:
$$y_i = A_{i,1} \times x_1 + A_{i,2} \times x_2 + \cdots + A_{i,n} \times x_n.$$
The idea with a semi-ring generalized matrix-vector product is that we replace all of these
algebraic operations with their semi-ring counterparts:
$$y_i = A_{i,1} \otimes x_1 \oplus A_{i,2} \otimes x_2 \oplus \cdots \oplus A_{i,n} \otimes x_n.$$
Implementation. Implementations of these semi-ring iterative algorithms work just like
the implementations of the standard matrix-vector products described above. The only
difference is that the actual operations involved change. Note that the semi-ring methods are all reduction functions applied to the neighbor data because the semi-ring operations are guaranteed to be associative.
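To make this concrete, here is a minimal sketch of one such product in Python (the chapter itself gives no code here; the name semiring_matvec and the dictionary-based edge-list representation are our own illustrative choices, not the chapter's):

    def semiring_matvec(entries, x, oplus, otimes, zero):
        """One generalized matrix-vector product y = A x over a semiring.

        entries -- dict mapping (v, u) to the nonzero value A[v, u]
        x       -- dict mapping vertex id to its current value
        oplus   -- the semiring "addition" (an associative reduction, e.g., min)
        otimes  -- the semiring "multiplication" (e.g., +)
        zero    -- the semiring's additive identity (e.g., float('inf'))
        """
        y = {v: zero for v in x}
        # Because oplus is associative, the contributions to y[v] can be
        # reduced in any order -- this is what permits the parallel and
        # distributed implementations described above.
        for (v, u), a_vu in entries.items():
            y[v] = oplus(y[v], otimes(a_vu, x[u]))
        return y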
Example: Single-source shortest paths. In fact, using the min-plus algebra we can encode the solution of a single-source shortest path computation. Recall that this operation involves computing the shortest path distance from a source vertex $s$ to all other vertices $v$. Let $A_{v,u}$ be the distance between vertices $v$ and $u$, with $A_{v,u} = \bar{0} = \infty$ if they are not connected, and let $A_{v,v} = \bar{1} = 0$. Consider the iteration:
$$\text{Initialize: } x^{(\text{start})}_v = \begin{cases} \bar{1} & v = s \\ \bar{0} & v \neq s, \end{cases}$$
$$\text{Iterate: } x^{(\text{next})}_v = A_{v,1} \otimes x^{(\text{cur})}_1 \oplus A_{v,2} \otimes x^{(\text{cur})}_2 \oplus \cdots \oplus A_{v,n} \otimes x^{(\text{cur})}_n = \min_{u \in N(v) \cup \{v\}} \left( A_{v,u} + x^{(\text{cur})}_u \right).$$
At each iteration, we find the shortest path distance to all vertices that are one link farther away than in the previous step. This iteration is closely related to Dijkstra's algorithm without a priority queue.
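Reusing the hypothetical semiring_matvec sketch from above, the shortest-path iteration could be driven as follows; note the zero-weight self-loops ($A_{v,v} = \bar{1} = 0$), which let each vertex retain its current best distance:

    inf = float('inf')
    # A weighted path graph 1 - 2 - 3 with edge weights 4 and 1, stored
    # symmetrically, plus weight-0 self-loops (the semiring "one").
    entries = {(1, 1): 0.0, (2, 2): 0.0, (3, 3): 0.0,
               (1, 2): 4.0, (2, 1): 4.0,
               (2, 3): 1.0, (3, 2): 1.0}
    s = 1
    x = {v: (0.0 if v == s else inf) for v in (1, 2, 3)}
    for _ in range(2):  # n - 1 products suffice on an n-vertex graph
        x = semiring_matvec(entries, x, min, lambda a, b: a + b, inf)
    print(x)  # {1: 0.0, 2: 4.0, 3: 5.0}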
Example: Connected components. There is also a min-times semi-ring, where $a \oplus b = \min(a, b)$ and $a \otimes b = a \times b$ (the regular multiplication operation). Here, $\bar{1} = 1$ and $\bar{0} = \infty$. Let $A_{v,u} = 1$ if $v$ and $u$ have an edge, $A_{v,u} = \bar{0}$ otherwise, and let $A_{v,v} = 1$. Using this semi-ring, we can compute the connected components of a graph:
$$\text{Initialize: } x^{(\text{start})}_v = \text{unique id for } v,$$
$$\text{Iterate: } x^{(\text{next})}_v = A_{v,1} \otimes x^{(\text{cur})}_1 \oplus A_{v,2} \otimes x^{(\text{cur})}_2 \oplus \cdots \oplus A_{v,n} \otimes x^{(\text{cur})}_n = \min_{u \in N(v) \cup \{v\}} x^{(\text{cur})}_u.$$
Once the values do not change in an iteration, all vertices in the same connected component will have the same value. If the vertices are labeled 1 to $n$, then those labels suffice for the unique ids. This iteration can also be used to compute a breadth-first search.
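The same hypothetical semiring_matvec sketch runs the connected-components iteration; because every stored entry of $A$ equals 1, the min-times multiplication simply passes labels through unchanged:

    inf = float('inf')
    adj = [(1, 2), (2, 3), (4, 5)]  # two components: {1, 2, 3} and {4, 5}
    vertices = {u for e in adj for u in e}
    entries = {(v, v): 1.0 for v in vertices}  # self-loops, A[v, v] = 1
    for (u, v) in adj:  # undirected: store each edge in both directions
        entries[(u, v)] = entries[(v, u)] = 1.0
    x = {v: float(v) for v in vertices}  # labels 1..n as the unique ids
    while True:
        xnext = semiring_matvec(entries, x, min, lambda a, b: a * b, inf)
        if xnext == x:  # converged: labels equal the component minima
            break
        x = xnext
    print(x)  # every vertex maps to the smallest id in its component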
12.6.3 General Updates
The most general of the generalized matrix-vector product operations apply arbitrary functions to the neighbor data. For instance, in the example we will see with label propagation clustering, each neighbor sends labels to a vertex $v$ and the vertex takes the most frequent incoming label. This operation is not a reduction as it depends on all of the neighboring data, which eliminates some opportunities to optimize intermediate data transfer. (In MapReduce environments, one example of this optimization is the use of local combiners to reduce the number of key-value pairs sent to the reducers.) Each of the three examples we will see uses a different function, but the unifying theme of these operations is that each step is an instance of Equation 12.2 for some function $f$. Consequently, all of the parallelization, distribution, and system support is identical across all of these operations.
Implementation. These operations are easy to implement for both adjacency and edge-list access in centralized or distributed settings. Getting the information between neighbors, that is, the communication, is the difficult step. In distributed settings, there may be a great deal of data movement required, and optimizing this is an active area of research.
Example: Label propagation for clusters and communities. Label propagation is a method
to divide a graph into small clusters or communities [70] that has been used to optimize
distributions of graph vertices to processors [11] and to optimize the ordering of vertices in
adjacency matrices [71]. It works by giving each vertex a unique id (like in the connected
components algorithm) and then having vertices iteratively assume the id of the most
frequently seen label in their neighborhood (where ties are broken arbitrarily). As we already
explained, this is an instance of a generalized matrix-vector product. A few iterations of
this procedure suffice for most graphs. The output depends strongly on how ties break and
a host of other implementation details.
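A minimal sketch of this procedure follows, assuming in-place (asynchronous) updates and deterministic tie-breaking; it is our own illustration rather than the implementation of [70], which randomizes both choices:

    from collections import Counter

    def label_propagation(adj, num_iters=5):
        """Cluster by repeatedly adopting the most frequent neighbor label.
        adj maps each vertex to a list of its neighbors."""
        labels = {v: v for v in adj}  # unique ids, as in connected components
        for _ in range(num_iters):
            for v, neighbors in adj.items():  # in-place (asynchronous) updates
                if neighbors:
                    counts = Counter(labels[u] for u in neighbors)
                    # most_common breaks ties by first-seen order; as noted
                    # above, the output depends strongly on tie-breaking.
                    labels[v] = counts.most_common(1)[0][0]
        return labels

    # Two triangles joined by one edge; depending on tie-breaking this
    # yields one label for all six vertices or one label per triangle.
    adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4],
           4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
    print(label_propagation(adj))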
Example: Distance histograms and average distance. A distance histogram of a network shows the number of vertex pairs separated by $k$ links. It is a key component of the effective diameter computation and also of the average distance. Recall that the effective diameter is the smallest $k$ such that (say) 90% of all pairs are connected by $k$ links. If these computations are done exactly, we need one shortest-path computation from each node in the graph, which has a terrible complexity. However, the distance histogram and the neighborhood function of a node can both be approximated, with high accuracy, using generalized matrix-vector products. The essential idea is to use the Flajolet–Martin count sketch [72] or the HyperLogLog counter [73] to approximate the number of vertices at distance exactly $k$ from each vertex. (The specifics of the algorithm are not our focus here; see [25] for a modern description.) Both of these approximations maintain a small amount of data associated with each vertex. This information is aggregated in a specific way at each vertex to update the approximation. The aggregation is formally a reduction, and so this method can take advantage of those optimizations. These techniques have been demonstrated on large graphs with almost 100 billion edges.
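To illustrate the idea (this is not the authors' implementation): a Flajolet–Martin sketch represents a set as a small bitmask, and the bitwise OR of two sketches is a sketch of the union of the sets, so propagating sketches along edges is a formal reduction. The single-hash version below is deliberately crude; practical systems average many independent sketches (or use HyperLogLog registers) to control the variance.

    import hashlib

    def fm_hash(v):
        """One Flajolet-Martin bit for v: bit i set with probability 2^-(i+1)."""
        h = int.from_bytes(hashlib.sha256(str(v).encode()).digest()[:8], "big") or 1
        return h & -h  # keep only the lowest set bit of the hash

    def fm_estimate(mask):
        """Estimate the set size from a sketch via the lowest unset bit."""
        b = 0
        while mask & (1 << b):
            b += 1
        return (1 << b) / 0.77351  # 0.77351 is the standard FM correction

    def approx_neighborhood_function(adj, num_steps):
        """Estimate the number of pairs within k steps, for k = 1..num_steps."""
        sketch = {v: fm_hash(v) for v in adj}
        estimates = []
        for _ in range(num_steps):
            # One generalized matrix-vector product: bitwise OR over the
            # closed neighborhood. OR-ing sketches gives a sketch of the
            # union, and OR is associative, so this is a formal reduction.
            new_sketch = {}
            for v, neighbors in adj.items():
                m = sketch[v]
                for u in neighbors:
                    m |= sketch[u]
                new_sketch[v] = m
            sketch = new_sketch
            estimates.append(sum(fm_estimate(m) for m in sketch.values()))
        return estimates

    # A 4-cycle: 12 pairs within one step, all 16 within two.
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    print(approx_neighborhood_function(adj, 2))  # noisy single-sketch estimates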
12.6.4 Edge-Oriented Updates
We have described these generalized matrix-vector products as updating quantities at each vertex. There is no limitation to vertex-oriented updates only; the same ideas apply to edge-oriented updates by viewing them as generalized matrix-vector products with the line graph or dual graph of the input graph. In the dual graph, we replace each edge with a vertex and connect each new vertex to the vertices that represent all adjacent edges, as in the sketch below.
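For concreteness, here is one way such a dual graph could be built from an edge list (a minimal sketch of our own; the chapter does not prescribe a construction):

    from collections import defaultdict
    from itertools import combinations

    def dual_graph(edges):
        """Build the line graph: one vertex per input edge, with two edge-
        vertices adjacent whenever the original edges share an endpoint."""
        incident = defaultdict(list)  # vertex -> ids of edges touching it
        for i, (u, v) in enumerate(edges):
            incident[u].append(i)
            incident[v].append(i)
        dual_edges = set()
        for edge_ids in incident.values():
            # All edges sharing this endpoint are pairwise adjacent.
            for a, b in combinations(edge_ids, 2):
                dual_edges.add((a, b))
        return sorted(dual_edges)

    edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
    print(dual_graph(edges))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]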
12.6.5 Complexities with Power-Law Graphs
Power-law, or highly skewed, degree distributions pose problems for efficient implementations
of generalized matrix-vector products at scale. The PowerGraph extension of GraphLab [22]
contains a number of ideas to improve performance with formal reductions and power-law
graphs based on vertex splitting ideas that create virtual copies of vertices with lower degree.
These large-degree vertices, however, tend not to prohibit implementations of these ideas on large graphs; they only make the implementations slower.
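A rough sketch of the vertex-splitting idea, ours rather than PowerGraph's actual machinery: partition a high-degree vertex's incoming contributions into fixed-size chunks, reduce each chunk as a virtual copy, and combine the partial results, which associativity guarantees is safe.

    def split_reduce(neighbor_values, reduce_fn, identity, max_degree=1000):
        """Reduce a (possibly huge) list of neighbor contributions by first
        reducing fixed-size chunks ("virtual copies"), then combining.
        Associativity of reduce_fn guarantees the same answer either way."""
        partials = []
        for i in range(0, len(neighbor_values), max_degree):
            chunk = neighbor_values[i:i + max_degree]
            acc = identity
            for val in chunk:  # each virtual copy handles <= max_degree edges,
                acc = reduce_fn(acc, val)  # and chunks can run in parallel
            partials.append(acc)
        out = identity
        for p in partials:  # final reduction across the virtual copies
            out = reduce_fn(out, p)
        return out

    # e.g., the min-plus reduction for one high-degree vertex:
    print(split_reduce([5.0, 2.0, 7.0, 3.0], min, float('inf'), max_degree=2))  # 2.0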
12.6.6 Implementations in SQL
Generalized matrix-vector products can be implemented even in traditional database systems that support SQL [74]. A distributed SQL database such as Greenplum can evaluate these tasks with reasonable efficiency even for large graphs. As an example, here we illustrate how to perform the generalized matrix-vector product for connected components in SQL. The graph is stored as a table that provides edge-list access. (We assume that the graph is undirected, so both edges $(u, v)$ and $(v, u)$ are stored, and that each vertex has a self-loop.) The columns head and tail indicate the start and end of each edge. The initial vector is stored as a table with a vertex id and the unique id associated with it (which could be the same).
edges : id | head | tail
x : id | comp
In the iteration, we create the vector $x^{(\text{next})}$ from $x$:
CREATE TABLE xnext AS (
SELECT e.tail AS id, MIN(x.comp) AS comp
FROM edges e INNER JOIN x ON e.head = x.id
GROUP BY e.tail );
This query takes the graph structure and joins it with the vector so that each component in the table x is mapped to the head of each edge. Then we group by the tail of each edge and take the MIN over all components. This is exactly what the iteration in the connected components example did. Repeating the query, with xnext taking the role of x, until the table stops changing completes the computation.
12.7 Limitations and Tradeoffs in Mining Large Graphs
In many cases, individuals who employ graph mining tools want to obtain some sort of
qualitative understanding of or insight into their data. This soft and subtle goal differs in
important ways from simply using the graph to obtain better prediction accuracy in some
well-defined downstream machine learning task. In particular, it can differ from the use of
the algorithms and techniques we have focused on in the previous sections, for example,
when those algorithms are used as black-box components in larger analytics frameworks such
as various machine learning pipelines that are increasingly common. A common temptation in this setting is to use the intuition obtained from mining small graphs, assuming or hoping that the intuition thereby obtained is relevant or useful for much larger graphs. In general, it is not. Using intuition obtained from small graphs can lead to qualitatively incorrect understanding of the behavior and properties of larger graphs; it can also lead to qualitatively incorrect understanding of the behavior of algorithms that run on larger graphs.
At root, the reason that our intuition fails in larger graphs is that, for typical informatic graphs, the local structure of a large graph is distinct from and qualitatively different from its global structure. (Informally, by local structure we mean, for example, the properties of a single node and its nearest neighbors, while by global structure we mean the properties of the graph as a whole or those that involve a constant fraction of the nodes of the graph.)
A small subgraph of size roughly 100 vertices is a global structure in
a graph with only 1000 vertices; however, it is a local structure in a graph with millions
of vertices. As typical graphs get larger and larger, much of the local structure does not
change or grow [33,57]; instead, one simply observes more and more varied pockets of local
structure. Hence one aspect of our point: a fundamental difference between small graph
mining and large graph mining is that for large graphs, the global structure of the graph
(think of the whole graph) and its local structure (think of vertex neighborhoods) are very
different, while, for small graphs, these two types of structures are much more similar.
Moreover, in a large realistic graph, these local structures connect up with each
other in ways that are nearly random/quasirandom, or just slightly better than
random/quasirandom. A good example of this to keep in mind is the case of community
detection, as first described in [33] and as elaborated upon in [57]. The result of those
exhaustive empirical investigations was that large real-world graphs do not have good large
communities. This is very different than working with graphs of a few thousand nodes,
where good clusters and communities of size 5%–25% of the graph do exist. As the graphs
get larger and larger, the good clusters/communities stay roughly of the same size. Thus,
if we insist on finding good communities, then we may find hundreds or thousands of good small communities in graphs with millions of nodes or more, but we won't find good large
communities. That being said, in a large realistic graph, there certainly are large groups
of nodes (think 10% of the graph size) with better than random community structure (for
example, [11,58]); and there certainly are large groups of nodes (again, think 10% of the
graph size) with slightly better community quality score (for whatever score is implicitly or
explicitly being optimized by the community detection algorithm that one decides to run)
than the community quality score that an arbitrary 10% of the nodes of the graph would
have; there are many methods that find these latter structures.
In the remainder of this section, we illustrate three examples in which the qualitative
difference between large graphs and small graphs manifests and our natural small graph
intuition fails: graph drawing, viral propagation, and modularity-based communities.
12.7.1 Graph Drawing
Perhaps the pictorially most vivid illustration of the difference between small graphs and
large graphs is with respect to visualization and graph drawing. There is no shortage of graph layout ideas that claim to visualize large graphs [75,76]. While graph layout algorithms are often able to find interesting and useful structures in graphs with around 1000 vertices, they almost universally fail at finding any useful or interesting structure in graphs with more than 10,000 vertices. (These failures can be quite beautiful, though, from an artistic point of view.)
The reason for this is that graph drawing algorithms attempt
to show both the local and global structure simultaneously by seeking an arrangement of
vertices that respects the local edge structure for all edges in the graph. This is not possible
for graphs with strong expander-like properties [33,57]. Relatedly, as we will explain shortly
in terms of communities, there is surprisingly little global structure to be found.
A better strategy for large graphs is to use summary features to reveal the graph
structure. This is essentially how the oddball anomaly detection method works [12].
Each vertex is summarized with a few small local features. The result is a set of
less-artistic-but-more-informative scatter plots that show multivariate relationships among
the vertices. Anomalies are revealed because they are outliers in the space of local features.
Hive plots of networks are an attempt to make these multivariate plots reveal some of the
correlation structure among these attributes on the edges [78].
12.7.2 Viral Propagation
Another qualitative goal in large graph mining is to understand the spread of information
within a network. This is often called viral propagation due to its relationship with how a
virus spreads through a population. This is also a property that is fundamentally different
between large graphs and small graphs. Consider the two graphs from Figure 12.2. For each graph, we place three seed nodes, and we look at how far information would spread from these seeds to the rest of the graph in three steps. (Here, we are looking at the three-step geodesic neighborhoods of each of the three seeds, but we observe similar results with diffusion-based dynamics and other dynamics.)
The results of this simple experiment are in Figure 12.3, and they too illustrate this difference between small and large graphs. In small graphs, each of the viral propagations from the source nodes finds its own little region of the graph; each region can be meaningfully interpreted, and there is only a little overlap between different regions. In the large graph, the viral propagations quickly spread, intersect, and overlap throughout the graph. This qualitative difference is of fundamental importance, and it is not limited to our relatively simple notion of information propagation; it also holds much more generally for more complex diffusions [57, Figures 12 and 13].
FIGURE 12.2
Graph drawing for a small graph ((a), 379 nodes) and a large graph ((b), 8887 nodes). The small graph drawing (a) shows considerable
local structure and reveals some overall topology of the relationships (the graph is Newman’s
network science collaborators [48]). The large graph drawing (b) shows what is affectionately
called a hairball and does not reveal any meaningful structure (the graph is a human protein–
protein interaction network [77]). The failure of graph drawing to show any meaningful
structure in large networks is an example of how we should not draw intuition from small
graphs when mining large graphs.