Extensions to the stochastic blockmodel include mixed membership models (Airoldi
et al., 2008) and degree corrected stochastic blockmodels, which induce power law
distributions on the degrees (Karrer and Newman, 2011).
11.2.3 General Latent Space Model
A final level of complexity is afforded by the general latent space model. Under this model,
the nodes of a network are embedded into a low-dimensional latent space, usually Euclidean,
and the probability of an edge between any two nodes is a function of their latent positions.
For example, in the latent distance model, the probability of a tie increases as the (Euclidean)
distance between the latent positions decreases (Hoff et al., 2002). This captures both
reciprocity and transitivity in the formation of network edges: since distances are symmetric,
if the probability of an edge between i and j is high, then the probability of an edge between
j and i will also be high, and the triangle inequality implies that if i and j are close and j
and k are close, then i and k cannot be far apart. Reciprocity and transitivity are properties
that are thought to be important in real-world networks but are impossible to incorporate
into the Erdos–Renyi–Gilbert model or the stochastic blockmodel. The inherent symmetry
of the distance model rules out the possibility that certain nodes have a greater affinity for
ties than others, and to circumvent this limitation, the general latent space model allows for
asymmetric functions of the latent positions as well as for node- and dyad-specific covariates
to affect the probability of tie formation. An example of a latent space model with additive
and multiplicative functions of the latent positions as well as such covariates is described in
detail below.
Consider an n × n asymmetric adjacency matrix A, representing a directed graph, and let
X be an n × n × p array of observed characteristics. Each n × n slice of X is either constant
in the rows (representing a fixed effect that contributes to the propensity to send ties in
the network, or sender effect); constant in the columns (representing a fixed effect that
contributes to the propensity to receive ties in the network, or receiver effect); or neither,
representing dyadic effects. We can model $a_{ij}$ as the indicator $1_{s_{ij}>0}$ that $s_{ij} > 0$, where $s_{ij} = X_{ij\cdot}\theta + \alpha_i + \beta_j + u_i^t v_j + \epsilon_{ij}$, $X_{ij\cdot}$ is the p-dimensional vector of covariates associated with the relationship between nodes i and j, $\alpha_i$ is an additive sender effect, $\beta_j$ is an additive receiver effect, and $u_i^t v_j$ is a multiplicative effect (as it is the projection of $u_i$ in the direction of $v_j$ in the latent space) that captures similarity between nodes i and j (Hoff, 2005). This model is a generalization of the social relations model of Warner et al. (1979). Reciprocity can be introduced into the model by allowing for the error terms $(\epsilon_{ij}, \epsilon_{ji})$ to be correlated.
Here $X_{ij\cdot}$ might include sender-specific information, receiver-specific information, or dyadic information. The additive latent effects $\alpha_i$ and $\beta_j$ contain information about the affinity of nodes i and j to send and receive ties in general, while the multiplicative effect $u_i^t v_j$ contains the information on the latent similarity of the two nodes. In particular, if the nodes are close in the latent space ($u_i^t v_j > 0$), then the probability of a tie is increased, and if they are far apart ($u_i^t v_j < 0$), then it is decreased.
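To make this model concrete, the following Python sketch simulates a directed network from it. The simplifications are ours: no covariates, independent standard normal effects and errors, and a one-dimensional latent space, so that $u_i^t v_j$ reduces to the scalar product $u_i v_j$; the function name and distributional choices are illustrative rather than taken from Hoff (2005).

```python
import numpy as np

def simulate_latent_space_network(n, rng=None):
    """Directed graph with a_ij = 1{s_ij > 0}, where
    s_ij = alpha_i + beta_j + u_i * v_j + eps_ij (no covariates)."""
    rng = np.random.default_rng(rng)
    alpha = rng.normal(size=n)           # additive sender effects
    beta = rng.normal(size=n)            # additive receiver effects
    u = rng.normal(size=n)               # latent sender positions (R = 1)
    v = rng.normal(size=n)               # latent receiver positions (R = 1)
    eps = rng.normal(size=(n, n))        # dyad-level errors
    s = alpha[:, None] + beta[None, :] + np.outer(u, v) + eps
    A = (s > 0).astype(int)
    np.fill_diagonal(A, 0)               # no self-ties
    return A

A = simulate_latent_space_network(50, rng=1)
print("edge density:", A.mean())
```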
The third panel of Figure 11.1 displays a directed network generated from the latent space model described above (without covariates and with a one-dimensional latent space). The two sets of nodes are colored according to the sign of $\alpha_i$. The emergence of the two clusters is due to the multiplicative effect $\alpha_i\alpha_j$: ties are more likely between individuals for whom the signs of $\alpha_i$ match. This demonstrates the ability of this model to capture stochastic blockmodel behavior. Each node has its own probability of sending a tie to another node, which allows for much greater flexibility than the blockmodel. The yellow nodes send out more ties to the blue nodes than they receive from the blue nodes, due to the additional additive effect of $\alpha_i$ in the model, as nodes with $\alpha_i > 0$ have a higher probability of sending out ties.
General latent space models can be fit to data from a real network via a Markov chain
Monte Carlo algorithm (Hoff, 2005). A salient advantage of these models is their ability
to model restrictions on the sampling design according to which data from a network is
collected. For example, when collecting information about friendship networks in a survey
setting, researchers often ask individuals to name their top five friends. This type of network
sampling scheme is known as the fixed rank nominations scheme and is explored in detail
in Hoff et al. (2013). When this sampling design is ignored, estimation of effects on the
probability of edge formation is imprecise, potentially leading to incorrect inference. The
ability to account for restrictions on the sampling design makes this general class of models
potentially promising for causal inference using data from partially observed networks.
11.2.4 Testing for Network Structure
All three models described above make assumptions about the similarity among nodes in
a network, allowing for stochastic equivalence, homophily, or both (Hoff, 2007). Recent
work by Bickel and Sarkar (2013) and Volfovsky and Hoff (2014) proposes two different
testing procedures for the amount of similarity among nodes in a relational dataset. These
tests can be used to assess the validity of the assumptions underlying the three models described
above; rejecting the null hypothesis can be seen as an indication that the data exhibit more
complexity than the null model allows. Bickel and Sarkar (2013) leverage Tracy–Widom
theory to develop a hypothesis test for the null hypothesis that the graph comes from
an Erdos–Renyi–Gilbert model (rejection of the null suggests the presence of blocks or
clusters, as in the stochastic blockmodel), while Volfovsky and Hoff (2014) develop a test
for correlation among the rows and columns of a relational data matrix within the context
of the normal model (defined below) and provide an extension to the latent space model for
networks that informs the choice of dimension of the latent space.
11.2.4.1 Bickel and Sarkar (2013) Main Result
Let A be the adjacency matrix of an Erdos–Renyi random graph with parameter p and let
$\hat{p}$ be the estimate of p. For the centered and scaled adjacency matrix
$$\tilde{A} = \frac{A - (\hat{p}\,\mathbf{1}\mathbf{1}^t - \hat{p}I)}{\sqrt{(n-1)\,\hat{p}\,(1-\hat{p})}},$$
the limiting distribution of the leading eigenvalue $\lambda_1(\tilde{A})$ is given by
$$n^{2/3}\left(\lambda_1(\tilde{A}) - 2\right) \xrightarrow{d} \text{Tracy--Widom}.$$
Given this limiting distribution, it is now easy to construct a test for the null hypothesis
that a graph is generated from an Erdos–Renyi model with parameter p. Bickel and Sarkar
(2013) propose using this test to recursively find the blocks of the stochastic blockmodel of
Section 11.2.2. In particular, if the test rejects the null hypothesis, then the algorithm splits
the graph into two blocks and recurses the test on each block.
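This limiting result translates directly into a computable statistic. Below is a minimal sketch in Python; the function name is ours, and the cutoff 0.98 is an approximate 95% quantile of the Tracy–Widom ($\beta = 1$) law that we supply for illustration.

```python
import numpy as np

def bickel_sarkar_statistic(A):
    """Compute n^(2/3) * (lambda_1(A_tilde) - 2) for the Erdos-Renyi null."""
    n = A.shape[0]
    p_hat = A[np.triu_indices(n, k=1)].mean()   # estimate p off-diagonally
    ones = np.ones((n, n))
    A_tilde = (A - (p_hat * ones - p_hat * np.eye(n))) / np.sqrt(
        (n - 1) * p_hat * (1 - p_hat))
    lam1 = np.linalg.eigvalsh(A_tilde).max()    # leading eigenvalue
    return n ** (2.0 / 3.0) * (lam1 - 2.0)

# A graph drawn from the null should rarely exceed the cutoff.
rng = np.random.default_rng(0)
n, p = 200, 0.1
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(int)
stat = bickel_sarkar_statistic(A)
print(stat, stat > 0.98)   # 0.98: rough 95% Tracy-Widom quantile (our value)
```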
11.2.4.2 Volfovsky and Hoff (2014) Main Result
Let S be an n × n random matrix distributed according to the matrix normal distribution with zero mean matrix and Kronecker covariance structure $\Sigma_{col} \otimes \Sigma_{row}$. If the entries of S represent a weighted relationship between actors i and j in a network, then the identities $E[SS^t] = \Sigma_{row}\,\mathrm{tr}(\Sigma_{col})$ and $E[S^tS] = \Sigma_{col}\,\mathrm{tr}(\Sigma_{row})$ suggest the interpretation of $\Sigma_{row}$ and $\Sigma_{col}$ as the covariance of the network nodes as senders and as receivers, respectively. Under the normal model, using a single sample matrix, we can construct a likelihood ratio test for row and column dependence. This is because both under the null hypothesis that $\Sigma_{row}$ and $\Sigma_{col}$ are diagonal positive definite and under the alternative hypothesis that $\Sigma_{row}$ and $\Sigma_{col}$ are positive definite, the matrix normal likelihood is bounded for a single matrix S.
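A quick Monte Carlo check of the first identity, with dimensions and covariance matrices that are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000

# Arbitrary positive-definite row and column covariances.
B = rng.normal(size=(n, n)); Sigma_row = B @ B.T + n * np.eye(n)
C = rng.normal(size=(n, n)); Sigma_col = C @ C.T + n * np.eye(n)
L_row = np.linalg.cholesky(Sigma_row)
L_col = np.linalg.cholesky(Sigma_col)

# S = L_row Z L_col^t is matrix normal with covariance Sigma_col (x) Sigma_row.
acc = np.zeros((n, n))
for _ in range(reps):
    S = L_row @ rng.normal(size=(n, n)) @ L_col.T
    acc += S @ S.T

target = Sigma_row * np.trace(Sigma_col)   # E[S S^t] under the model
print(np.abs(acc / reps - target).max() / np.abs(target).max())  # near zero
```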
To connect this test to binary networks, we make use of the latent model of Section 11.2.3. Consider a graph with adjacency matrix A where the probability of an edge is modeled as a function of latent multiplicative sender and receiver effects, $u_i, v_j \in \mathbb{R}^R$, using the probit link: $\Pr(a_{ij} = 1) = \Phi(\gamma + u_i^t v_j)$. Writing this model in the latent variable formulation, we have $a_{ij} = 1_{[s_{ij} > -\gamma]}$, where $s_{ij} = u_i^t v_j + \epsilon_{ij}$. The $n \times R$ matrices U and V, formed by stacking the vectors $u_i$, $i = 1, ..., n$, and $v_j$, $j = 1, ..., n$, describe the row and column heterogeneity in S, respectively, since $E[SS^t] = U(V^tV)U^t + I$ and $E[S^tS] = V(U^tU)V^t + I$. The main challenge here is choosing the appropriate rank R to use for the model. This can be achieved by performing the likelihood ratio test described above on the latent scale for a sequence of ranks: when the test stops rejecting the null of no dependence among the nodes, the latent vectors capture all of the dependence between senders and between receivers present in A. Since latent space models are best fit using Bayesian methodology (Hoff, 2005, 2008), we can easily construct a posterior predictive p-value on the latent scale using the techniques described in Thompson and Geyer (2007).
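For concreteness, here is a small generator (our construction) for a binary network with this rank-R probit structure; a rank-selection procedure would fit models of increasing R to such data and test for residual row and column dependence.

```python
import numpy as np

def simulate_probit_network(n, R, gamma=-1.0, rng=None):
    """a_ij = 1{s_ij > -gamma} with s_ij = u_i^t v_j + eps_ij,
    so that Pr(a_ij = 1) = Phi(gamma + u_i^t v_j)."""
    rng = np.random.default_rng(rng)
    U = rng.normal(size=(n, R))             # latent sender vectors
    V = rng.normal(size=(n, R))             # latent receiver vectors
    S = U @ V.T + rng.normal(size=(n, n))   # latent (probit) scale
    A = (S > -gamma).astype(int)
    np.fill_diagonal(A, 0)                  # no self-ties
    return A

A = simulate_probit_network(100, R=2, rng=0)
print("edge density:", A.mean())
```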
Testing procedures have recently been developed for more complicated models: Yang et al. (2014) proposed a novel test for a graphon, a limit object for random graphs. The models described in this section carry with them the underlying assumption of infinite exchangeability, that is, that the finite networks being modeled are subsamples of an infinite graph whose nodes can be relabeled without affecting the joint distribution of the graph. In recent work, Volfovsky and Airoldi (2014) demonstrated that distributions that satisfy the substantially weaker assumption of finite exchangeability are close in total
variation distance to the distribution of graphs satisfying the stronger assumption. This
suggests that inference made under the stronger assumption is likely to be appropriate even
when only the weaker assumption holds.
11.3 Observations Sampled from Network Nodes
In most of the literature on networks, the network itself is the object of study: features of
network topology and probability models for edge formation, node addition, and network
generation. But situations abound in which the network can be thought of as a nuisance
parameter and interest is instead in the behavior and attributes of the network nodes.
For social networks, which are our focus here, we might be interested in how behaviors
spread from one person to their social contacts, how attributes cluster among network
nodes, or how one person’s randomly assigned treatment might affect their social contacts’
outcomes. The underlying network gives important structure to the data—network ties
represent opportunities for contagion or correlation—but interest is in the effects, not the
determinants, of the network topology. Methods for analyzing data on outcomes sampled
from a single network are nascent and remain underdeveloped. In what follows, we
describe recent work in this area and conclude with a description of open problems. Most
of the research in this area assumes that the network is fixed and known.
11.3.1 Terminology and Notation
In this section, we will use the terms nodes, subjects, and individuals interchangeably. This
is a slight abuse of the term node but should cause no confusion as we are not concerned
with network topology except insofar as it informs relations among the subjects and the
observations they furnish. Let $Y_i$ represent an outcome for node/subject i and let $Z_i$ be a treatment or exposure for individual i. Sometimes, we will index the outcomes with time, that is, $Y_i^t$ is the outcome for subject i at time t. If we have observed n subjects, Y is an n-dimensional vector of outcomes for subjects 1 through n, possibly indexed by time, and Z is an n-dimensional vector of exposures or treatments for subjects 1 through n.
Questions about the influence one subject has on the outcome of another subject are inherently questions about causal effects, which are defined in terms of potential or counterfactual outcomes (see, e.g., Hernan, 2004; Rubin, 2005). In general, a unit-level potential outcome, $Y_i(z)$, is defined as the outcome that we would have observed for subject i if we could have intervened to set that subject's treatment or exposure $Z_i$ to value z, where z is in the support of Z. But social networks represent a paradigmatic opportunity for interference: one subject's exposure may affect not only his own outcome but also the outcomes of his social contacts and possibly other subjects. This means that the traditional unit-level potential outcomes are not well defined. Instead, $Y_i(\mathbf{z})$ is the outcome that we would have observed if we could have set the vector of exposures for the entire population, Z, to $\mathbf{z} = (z_1, ..., z_n)$, where for each i, $z_i$ is in the support of Z. In the presence of interference, there are many types of causal effects that might be of interest. Unit-level causal effects isolate the effect of a subject's own exposure on his outcome. These effects are usually defined either holding other subjects' exposures constant or averaging over possible exposure distributions for other subjects. Spillover effects quantify the effect on subject i's outcome of other subjects' exposures. For a binary treatment, the contrast $\frac{1}{n}\sum_{i=1}^{n} E[Y_i(\mathbf{1}) - Y_i(\mathbf{0})]$ is often of interest. This is the difference in average potential outcomes in a world in which every subject is assigned to treatment compared to a world in which every subject is assigned to control. Eckles et al. (2014) call this particular estimand the average treatment effect (ATE); however, to differentiate this estimand from the ATE defined in the absence of interference, we will adopt the terminology of Halloran and Struchiner (1995) and refer to $\frac{1}{n}\sum_{i=1}^{n} E[Y_i(\mathbf{1}) - Y_i(\mathbf{0})]$ as the overall ATE. See Ogburn and VanderWeele (2014a) and references therein for further discussion of causal effects in the presence of interference and of the distinction between interference and contagion.
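To make the overall ATE concrete, the following toy simulation (entirely our construction; the network, outcome model, and coefficients are illustrative) computes it by Monte Carlo in a data-generating process with interference, where a subject's outcome depends on her own treatment and on the treated fraction among her neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# A fixed, known undirected network (Erdos-Renyi, for illustration only).
upper = np.triu(rng.random((n, n)) < 0.05, k=1)
G = (upper | upper.T).astype(float)
deg = np.maximum(G.sum(axis=1), 1.0)       # avoid dividing by zero

def mean_outcome(z, reps=2000):
    """Average outcome when the whole treatment vector Z is set to z:
    Y_i = 1 + 2 z_i + 1.5 (treated fraction of i's neighbors) + noise."""
    frac = (G @ z) / deg
    draws = 1 + 2 * z + 1.5 * frac + rng.normal(size=(reps, n))
    return draws.mean()

# Overall ATE: everyone treated versus no one treated.
overall_ate = mean_outcome(np.ones(n)) - mean_outcome(np.zeros(n))
print(overall_ate)   # close to 2 + 1.5 = 3.5 by construction
```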
We distinguish between interference, which is present when one subject’s treatment or
exposure may affect others' outcomes, and peer effects or contagion, which are present
when one subject’s outcome may influence or transmit to other subjects (Ogburn and
VanderWeele, 2014a). In the literature on peer effects, an ego is defined as a subject whose
outcome we are interested in studying and the ego’s alters are the ego’s neighbors, that is,
subjects who share an edge with the ego in the underlying network. A peer effect is a causal
effect on an ego’s outcome at time t of his alter’s outcome at time s for some s<t.
11.3.2 Experiments in Networks
The design of randomized experiments in networks, when subjects and their treatments
may not be independent, is an area of recent but increasing interest. Without attempting
to be exhaustive, this section reviews what we consider to be the most important threads
in this line of research.
Experiments are difficult for two reasons: interference, which may make it impossible
to simultaneously observe the treatment condition (e.g., every subject is treated) for some
units and the control condition (e.g., no subject is treated) for others, and dependence
among observations, which makes the estimation of standard errors difficult. Fisherian
randomization-based inference can circumvent the problem of dependence, as described
in Sections 11.3.2.1 and 11.3.2.2. Other randomization-based approaches assume that the
only source of dependence is interference, and therefore that observations are independent
conditional on treatment. There are two approaches to dealing with the challenge of
interference in the randomization literature: assumptions on the limits of interference and
attempts to minimize bias due to interference. Both are discussed below.
11.3.2.1 Fisherian Hypothesis Testing
Most of the work on inference about outcomes sampled from network nodes concerns
randomized experiments. But there is an important distinction to be made between inference
from randomized experiments and the specific case of Fisherian randomization-based
inference, pioneered by Fisher (1922) and applied to network-like settings by Rosenbaum
(2007) and Bowers et al. (2013). Fisherian randomization-based inference is founded on
the very intuitive notion that, under the null hypothesis of no effect of treatment on
any subject (sometimes called the sharp null hypothesis to distinguish it from other null
hypotheses that may be of interest), the treated and control groups are random samples from
the same underlying distribution. Fisherian randomization-based inference treats outcomes
as fixed and treatment assignments as random variables: quantities that depend on the
vector of treatment assignments are the only random variables in this paradigm. Therefore,
dependence among outcomes is a nonissue.
In an influential paper, Rosenbaum (2007) proposed the use of distribution-free statistics
(statistics whose distribution under the null hypothesis can be determined a priori, without
reference to the data) to perform hypothesis tests and derive confidence intervals in the
presence of interference. The Mann–Whitney statistic, for example, takes all pairs consisting
of one treated and one untreated subject and counts the number of times that the treated
subject had a greater outcome than the untreated subject. The null distribution of this
statistic is known a priori and does not depend on any aspect of the data-generating
distribution of the outcomes; it is therefore agnostic to any dependence among the outcomes.
After deriving the null distribution for such a statistic, one can compare the observed
distribution of the statistic to the null distribution to perform a hypothesis test and to
derive confidence intervals for the magnitude of the departure from the null hypothesis in
the event that it is rejected.
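As an illustration of the distribution-free idea (our sketch, not Rosenbaum's code), the Mann–Whitney test below compares treated and control outcomes against a null distribution that is known without reference to the data-generating process; the simulated outcomes are placeholders.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
y_treated = rng.normal(loc=0.5, size=20)   # outcomes of treated subjects
y_control = rng.normal(loc=0.0, size=20)   # outcomes of control subjects

# The statistic counts treated-control pairs where the treated outcome is
# larger; under the sharp null its distribution is known a priori.
stat, p_value = mannwhitneyu(y_treated, y_control, alternative="two-sided")
print(stat, p_value)
```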
A slightly different version of randomization-based inference is developed by Bowers et al.
(2013). Under the null hypothesis, that is, supposing that we are observing outcomes in the
world in which the null is true, we can generate the randomization-based null distribution
of any test statistic by enumerating all of the possible treatment assignment permutations
and recalculating the test statistic for each one. For example, suppose we have a network
with four nodes and our experiment dictates assigning half to treatment. Our test statistic
is the difference in mean outcomes among the treated compared to the controls. The
null distribution of this statistic is comprised of six equiprobable realizations of the mean difference: $[(Y_1+Y_2)/2] - [(Y_3+Y_4)/2]$, $[(Y_3+Y_4)/2] - [(Y_1+Y_2)/2]$, $[(Y_1+Y_3)/2] - [(Y_2+Y_4)/2]$, $[(Y_2+Y_4)/2] - [(Y_1+Y_3)/2]$, $[(Y_1+Y_4)/2] - [(Y_2+Y_3)/2]$, and $[(Y_2+Y_3)/2] - [(Y_1+Y_4)/2]$,
corresponding to the six possible permutations of the treatment assignment vector. If, in
reality, subjects 1 and 2 were assigned to treatment and 3 and 4 to control, then the observed
test statistic is the first in the list above—the mean outcome among treated minus the mean
outcome among controls. We can compare the observed test statistic to the statistic’s null
distribution and ask the question “how likely was this value to have occurred by chance,
assuming that the null hypothesis is true?”
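The four-node example can be enumerated directly. This sketch (ours, with made-up outcome values) reproduces the six-element null distribution and the resulting randomization p-value.

```python
import numpy as np
from itertools import combinations

y = np.array([3.1, 2.7, 1.2, 1.9])   # fixed outcomes under the sharp null
observed_treated = (0, 1)             # subjects 1 and 2 treated (0-indexed)

def mean_diff(treated):
    """Mean outcome among treated minus mean outcome among controls."""
    control = [i for i in range(len(y)) if i not in treated]
    return y[list(treated)].mean() - y[control].mean()

# Enumerate all six assignments of two treated units out of four.
null_dist = [mean_diff(t) for t in combinations(range(4), 2)]
obs = mean_diff(observed_treated)
p_value = np.mean([abs(d) >= abs(obs) for d in null_dist])
print(sorted(null_dist), obs, p_value)
```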