172 Handbook of Big Data
has
n
2
distinct dyads, and in an undirected graph, this is also the total number of possible
edges. The degree of a node is its number neighbors, or nodes with which it shares an edge.
In a directed network, each node has an in-degree and an out-degree; in an undirected
network, these are by definition the same. Some types of networks, such as family trees
and street maps, have been used for centuries to efficiently represent relationships among
objects (i.e., people and locations, respectively), but the genesis of the mathematical study
of networks and their topology (graph theory) is usually attributed to Euler’s 1741 Seven
Bridges of K¨onigsberg (Euler, 1741).
Beginning with Euler’s seminal paper and continuing through the middle of the twentieth
century, the formal study of networks or graphs was the exclusive domain of deterministic
sciences such as mathematics, chemistry, and physics; its primary objectives were the
description of properties of a given, fixed graph, for example, the number of edges, paths,
or loops of a graph or taxonomies of various kinds of subgraphs. Random graph theory was
first introduced by the mathematicians Erdos and Renyi (1959). A random graph is simply
a random variable whose sample space is a collection of graphs. It can be characterized
by a probability distribution over the sample space of graphs or by the graph-generating
mechanism that produces said probability distribution. Random graph theory has become a
vibrant area of research in statistics: random graph models have been used to describe and
analyze gene networks, brain networks, social networks, economic interactions, the formation
of international treaties and alliances, and many other phenomena across myriad disciplines.
Common to all of these disparate applications is a focus on quantifying similarities and
differences among local and global topological features of different networks. A random
graph model indexes a probability distribution over graphs with parameters, often having
topological interpretations; the parameters can be estimated using an observed network
as data. Parameter estimates and model fit statistics are then used to characterize the
topological features of the graph. We describe some such models and estimating procedures
in Section 11.2.
Over the past 5–10 years, interest has grown in a complementary but quite different
area of network research, namely the study of causal effects in social networks. Here,
the network itself is not causal, but edges in the network represent the opportunity for
one person to influence another. Learning about the causal effects that people may have
on their social contacts concerns outcomes and covariates sampled from network nodes—
outcomes superimposed over an underlying network topology—rather than features of the
network topology. A small but growing body of literature attempts to learn about peer
effects (also called induction or contagion ) using network data (e.g., Christakis and Fowler,
2007, 2008, 2010): these are the causal effects that one individual’s outcome can have
on the outcomes of his or her social contacts. A canonical example is infectious disease
outcomes, where one individual’s disease status effects his or her contacts’ disease statuses.
Interference or spillover effects are related but distinct causal effects that are also of interest
in network settings; these are the causal effects that one individual’s treatment or exposure
can have on his or her contacts’ outcomes. For example, vaccinating an individual against
an infectious disease is likely to have a protective effect on his or her contacts’ disease
statuses.
Simple randomized experiments that facilitate causal inference in many settings cannot
be applied to the study of contagion or interference in social networks. This is because the
individual subjects (nodes) who would be independently randomized in classical settings
do not furnish independent outcomes in the network setting. In Section 11.3, we describe
recent methodological advances toward causal inference using network data. Before that,
in Section 11.2, we describe some current work on probabilistic network generating models.
While it is possible to be relatively complete in our survey of the literature on causal
inference for outcomes sampled from social network nodes, the literature on network