Extensions to the stochastic blockmodel include mixed membership models (Airoldi
et al., 2008) and degree corrected stochastic blockmodels, which induce power law
distributions on the degrees (Karrer and Newman, 2011).
11.2.3 General Latent Space Model
A final level of complexity is afforded by the general latent space model. Under this model,
the nodes of a network are embedded into a low-dimensional latent space, usually Euclidean,
and the probability of an edge between any two nodes is a function of their latent positions.
For example, in the latent distance model, the probability of a tie increases as the (Euclidean)
distance between the latent positions decreases (Hoff et al., 2002). This captures both
reciprocity and transitivity in the formation of network edges: since distances are symmetric,
if the probability of an edge between i and j is high, then the probability of an edge between
j and i will also be high, and the triangle inequality implies that if i and j are close and j
and k are close, then i and k cannot be far apart. Reciprocity and transitivity are properties
that are thought to be important in real-world networks but are impossible to incorporate
into the Erdos–Renyi–Gilbert model or the stochastic blockmodel. The inherent symmetry
of the distance model rules out the possibility that certain nodes have a greater affinity for
ties than others, and to circumvent this limitation, the general latent space model allows for
asymmetric functions of the latent positions as well as for node- and dyad-specific covariates
to affect the probability of tie formation. An example of a latent space model with additive
and multiplicative functions of the latent positions as well as such covariates is described in
detail below.
Consider an n × n asymmetric adjacency matrix A, representing a directed graph, and let
X be an n × n × p array of observed characteristics. Each n × n slice of X is either constant
in the rows (representing a fixed effect that contributes to the propensity to send ties in
the network, or sender effect); constant in the columns (representing a fixed effect that
contributes to the propensity to receive ties in the network, or receiver effect); or neither,
representing dyadic effects. We can model $a_{ij}$ as the indicator $1_{s_{ij}>0}$ that $s_{ij} > 0$, where $s_{ij} = X_{ij\cdot}\theta + \alpha_i + \beta_j + u_i^t v_j + \epsilon_{ij}$, $X_{ij\cdot}$ is the p-dimensional vector of covariates associated with the relationship between nodes i and j, $\alpha_i$ is an additive sender effect, $\beta_j$ is an additive receiver effect, and $u_i^t v_j$ is a multiplicative effect (as it is the projection of $u_i$ in the direction of $v_j$ in the latent space) that captures similarity between nodes i and j (Hoff, 2005). This model is a generalization of the social relations model of Warner et al. (1979). Reciprocity can be introduced into the model by allowing for the error terms $(\epsilon_{ij}, \epsilon_{ji})$ to be correlated.
Here $X_{ij\cdot}$ might include sender-specific information, receiver-specific information, or dyadic information. The additive latent effects $\alpha_i$ and $\beta_j$ contain information about the affinity of nodes i and j to send and receive ties in general, while the multiplicative effect $u_i^t v_j$ contains the information on the latent similarity of the two nodes. In particular, if the nodes are close in the latent space ($u_i^t v_j > 0$), then the probability of a tie is increased, and if they are far apart ($u_i^t v_j < 0$), then it is decreased.
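To make this model concrete, the following Python sketch simulates a directed network from it. The simplifications are ours: no covariates, independent standard normal effects and errors, and a one-dimensional latent space, so that $u_i^t v_j$ reduces to the scalar product $u_i v_j$; the function name and distributional choices are illustrative rather than taken from Hoff (2005).

```python
import numpy as np

def simulate_latent_space_network(n, rng=None):
    """Directed graph with a_ij = 1{s_ij > 0}, where
    s_ij = alpha_i + beta_j + u_i * v_j + eps_ij (no covariates)."""
    rng = np.random.default_rng(rng)
    alpha = rng.normal(size=n)           # additive sender effects
    beta = rng.normal(size=n)            # additive receiver effects
    u = rng.normal(size=n)               # latent sender positions (R = 1)
    v = rng.normal(size=n)               # latent receiver positions (R = 1)
    eps = rng.normal(size=(n, n))        # dyad-level errors
    s = alpha[:, None] + beta[None, :] + np.outer(u, v) + eps
    A = (s > 0).astype(int)
    np.fill_diagonal(A, 0)               # no self-ties
    return A

A = simulate_latent_space_network(50, rng=1)
print("edge density:", A.mean())
```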
The third panel of Figure 11.1 displays a directed network generated from the latent space model described above (without covariates and with a one-dimensional latent space). The two sets of nodes are colored according to the sign of $\alpha_i$. The emergence of the two clusters is due to the multiplicative effect $\alpha_i\alpha_j$: ties are more likely between individuals for whom the signs of $\alpha_i$ match. This demonstrates the ability of this model to capture stochastic blockmodel behavior. Each node has its own probability of sending a tie to another node, which allows for much greater flexibility than the blockmodel. The yellow nodes send out more ties to the blue nodes than they receive from the blue nodes, due to the additional additive effect of $\alpha_i$ in the model, as nodes with $\alpha_i > 0$ have a higher probability of sending out ties.
General latent space models can be fit to data from a real network via a Markov chain
Monte Carlo algorithm (Hoff, 2005). A salient advantage of these models is their ability
to model restrictions on the sampling design according to which data from a network is
collected. For example, when collecting information about friendship networks in a survey
setting, researchers often ask individuals to name their top five friends. This type of network
sampling scheme is known as the fixed rank nominations scheme and is explored in detail
in Hoff et al. (2013). When this sampling design is ignored, estimation of effects on the
probability of edge formation is imprecise, potentially leading to incorrect inference. The
ability to account for restrictions on the sampling design makes this general class of models
potentially promising for causal inference using data from partially observed networks.
11.2.4 Testing for Network Structure
All three models described above make assumptions about the similarity among nodes in
a network, allowing for stochastic equivalence, homophily, or both (Hoff, 2007). Recent
work by Bickel and Sarkar (2013) and Volfovsky and Hoff (2014) proposes two different
testing procedures for the amount of similarity among nodes in a relational dataset. These
tests can be used to assess the validity of the assumptions underlying the three models described
above; rejecting the null hypothesis can be seen as an indication that the data exhibit more
complexity than the null model allows. Bickel and Sarkar (2013) leverage Tracy–Widom
theory to develop a hypothesis test for the null hypothesis that the graph comes from
an Erdos–Renyi–Gilbert model (rejection of the null suggests the presence of blocks or
clusters, as in the stochastic blockmodel), while Volfovsky and Hoff (2014) develop a test
for correlation among the rows and columns of a relational data matrix within the context
of the normal model (defined below) and provide an extension to the latent space model for
networks that informs the choice of dimension of the latent space.
11.2.4.1 Bickel and Sarkar (2013) Main Result
Let A be the adjacency matrix of an Erdos–Renyi random graph with parameter p and let
$\hat{p}$ be the estimate of p. For the centered and scaled adjacency matrix
$$\tilde{A} = \frac{A - (\hat{p}\,\mathbf{1}\mathbf{1}^t - \hat{p}I)}{\sqrt{(n-1)\,\hat{p}\,(1-\hat{p})}},$$
the limiting distribution of the leading eigenvalue $\lambda_1(\tilde{A})$ is given by
$$n^{2/3}\left(\lambda_1(\tilde{A}) - 2\right) \xrightarrow{d} \text{Tracy--Widom}.$$
Given this limiting distribution, it is now easy to construct a test for the null hypothesis
that a graph is generated from an Erdos–Renyi model with parameter p. Bickel and Sarkar
(2013) propose using this test to recursively find the blocks of the stochastic blockmodel of
Section 11.2.2. In particular, if the test rejects the null hypothesis, then the algorithm splits
the graph into two blocks and recurses the test on each block.
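This limiting result translates directly into a computable statistic. Below is a minimal sketch in Python; the function name is ours, and the cutoff 0.98 is an approximate 95% quantile of the Tracy–Widom ($\beta = 1$) law that we supply for illustration.

```python
import numpy as np

def bickel_sarkar_statistic(A):
    """Compute n^(2/3) * (lambda_1(A_tilde) - 2) for the Erdos-Renyi null."""
    n = A.shape[0]
    p_hat = A[np.triu_indices(n, k=1)].mean()   # estimate p off-diagonally
    ones = np.ones((n, n))
    A_tilde = (A - (p_hat * ones - p_hat * np.eye(n))) / np.sqrt(
        (n - 1) * p_hat * (1 - p_hat))
    lam1 = np.linalg.eigvalsh(A_tilde).max()    # leading eigenvalue
    return n ** (2.0 / 3.0) * (lam1 - 2.0)

# A graph drawn from the null should rarely exceed the cutoff.
rng = np.random.default_rng(0)
n, p = 200, 0.1
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(int)
stat = bickel_sarkar_statistic(A)
print(stat, stat > 0.98)   # 0.98: rough 95% Tracy-Widom quantile (our value)
```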
11.2.4.2 Volfovsky and Hoff (2014) Main Result
Let S be an n × n random matrix distributed according to the matrix normal distribution with zero mean matrix and Kronecker covariance structure $\Sigma_{col} \otimes \Sigma_{row}$. If the entries of S represent a weighted relationship between actors i and j in a network, then the identities $E[SS^t] = \Sigma_{row}\,\mathrm{tr}(\Sigma_{col})$ and $E[S^tS] = \Sigma_{col}\,\mathrm{tr}(\Sigma_{row})$ suggest the interpretation of $\Sigma_{row}$ and $\Sigma_{col}$ as the covariance of the network nodes as senders and as receivers, respectively. Under the normal model, using a single sample matrix, we can construct a likelihood ratio test for row and column dependence. This is because both under the null hypothesis that $\Sigma_{row}$ and $\Sigma_{col}$ are diagonal positive definite and under the alternative hypothesis that $\Sigma_{row}$ and $\Sigma_{col}$ are positive definite, the matrix normal likelihood is bounded for a single matrix S.
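A quick Monte Carlo check of the first identity, with dimensions and covariance matrices that are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000

# Arbitrary positive-definite row and column covariances.
B = rng.normal(size=(n, n)); Sigma_row = B @ B.T + n * np.eye(n)
C = rng.normal(size=(n, n)); Sigma_col = C @ C.T + n * np.eye(n)
L_row = np.linalg.cholesky(Sigma_row)
L_col = np.linalg.cholesky(Sigma_col)

# S = L_row Z L_col^t is matrix normal with covariance Sigma_col (x) Sigma_row.
acc = np.zeros((n, n))
for _ in range(reps):
    S = L_row @ rng.normal(size=(n, n)) @ L_col.T
    acc += S @ S.T

target = Sigma_row * np.trace(Sigma_col)   # E[S S^t] under the model
print(np.abs(acc / reps - target).max() / np.abs(target).max())  # near zero
```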
To connect this test to binary networks, we make use of the latent model of Section 11.2.3. Consider a graph with adjacency matrix A where the probability of an edge is modeled as a function of latent multiplicative sender and receiver effects, $u_i, v_j \in \mathbb{R}^R$, using the probit link: $\Pr(a_{ij} = 1) = \Phi(\gamma + u_i^t v_j)$. Writing this model in the latent variable formulation, we have $a_{ij} = 1_{[s_{ij} > -\gamma]}$, where $s_{ij} = u_i^t v_j + \epsilon_{ij}$. The $n \times R$ matrices U and V, formed by stacking the vectors $u_i$, $i = 1, ..., n$, and $v_j$, $j = 1, ..., n$, describe the row and column heterogeneity in S, respectively, since $E[SS^t] = U(V^tV)U^t + I$ and $E[S^tS] = V(U^tU)V^t + I$. The main challenge here is choosing the appropriate rank R to use for the model. This can be achieved by performing the likelihood ratio test described above on the latent scale for a sequence of ranks: when the test stops rejecting the null of no dependence among the nodes, the latent vectors capture all of the dependence between senders and between receivers present in A. Since latent space models are best fit using Bayesian methodology (Hoff, 2005, 2008), we can easily construct a posterior predictive p-value on the latent scale using the techniques described in Thompson and Geyer (2007).
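For concreteness, here is a small generator (our construction) for a binary network with this rank-R probit structure; a rank-selection procedure would fit models of increasing R to such data and test for residual row and column dependence.

```python
import numpy as np

def simulate_probit_network(n, R, gamma=-1.0, rng=None):
    """a_ij = 1{s_ij > -gamma} with s_ij = u_i^t v_j + eps_ij,
    so that Pr(a_ij = 1) = Phi(gamma + u_i^t v_j)."""
    rng = np.random.default_rng(rng)
    U = rng.normal(size=(n, R))             # latent sender vectors
    V = rng.normal(size=(n, R))             # latent receiver vectors
    S = U @ V.T + rng.normal(size=(n, n))   # latent (probit) scale
    A = (S > -gamma).astype(int)
    np.fill_diagonal(A, 0)                  # no self-ties
    return A

A = simulate_probit_network(100, R=2, rng=0)
print("edge density:", A.mean())
```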
Testing procedures have recently been developed for more complicated models: Yang et al. (2014) proposed a novel test for a graphon, a limit object for random graphs. The models described in this section carry with them the underlying assumption of infinite exchangeability, that is, that the finite networks being modeled are subsamples of an infinite graph whose nodes can be relabeled without affecting the joint distribution of the graph. In recent work, Volfovsky and Airoldi (2014) demonstrated that distributions that satisfy the substantially weaker assumption of finite exchangeability are close in total
variation distance to the distribution of graphs satisfying the stronger assumption. This
suggests that inference made under the stronger assumption is likely to be appropriate even
when only the weaker assumption holds.
11.3 Observations Sampled from Network Nodes
In most of the literature on networks, the network itself is the object of study: features of
network topology and probability models for edge formation, node addition, and network
generation. But situations abound in which the network can be thought of as a nuisance
parameter and interest is instead in the behavior and attributes of the network nodes.
For social networks, which are our focus here, we might be interested in how behaviors
spread from one person to their social contacts, how attributes cluster among network
nodes, or how one person’s randomly assigned treatment might affect their social contacts’
outcomes. The underlying network gives important structure to the data—network ties
represent opportunities for contagion or correlation—but interest is in the effects, not the
determinants, of the network topology. Methods for analyzing data on outcomes sampled
from a single network are nascent and remain underdeveloped. In what follows, we
describe recent work in this area and conclude with a description of open problems. Most
of the research in this area assumes that the network is fixed and known.
11.3.1 Terminology and Notation
In this section, we will use the terms nodes, subjects, and individuals interchangeably. This
is a slight abuse of the term node but should cause no confusion as we are not concerned
with network topology except insofar as it informs relations among the subjects and the
observations they furnish. Let $Y_i$ represent an outcome for node/subject i and let $Z_i$ be a treatment or exposure for individual i. Sometimes, we will index the outcomes with time, that is, $Y_i^t$ is the outcome for subject i at time t. If we have observed n subjects, Y is an n-dimensional vector of outcomes for subjects 1 through n, possibly indexed by time, and Z is an n-dimensional vector of exposures or treatments for subjects 1 through n.
Questions about the influence one subject has on the outcome of another subject are inherently questions about causal effects, which are defined in terms of potential or counterfactual outcomes (see, e.g., Hernan, 2004; Rubin, 2005). In general, a unit-level potential outcome, $Y_i(z)$, is defined as the outcome that we would have observed for subject i if we could have intervened to set that subject's treatment or exposure $Z_i$ to value z, where z is in the support of Z. But social networks represent a paradigmatic opportunity for interference: one subject's exposure may affect not only his own outcome but also the outcomes of his social contacts and possibly other subjects. This means that the traditional unit-level potential outcomes are not well defined. Instead, $Y_i(\mathbf{z})$ is the outcome that we would have observed if we could have set the vector of exposures for the entire population, Z, to $\mathbf{z} = (z_1, ..., z_n)$, where for each i, $z_i$ is in the support of Z. In the presence of interference, there are many types of causal effects that might be of interest. Unit-level causal effects isolate the effect of a subject's own exposure on his outcome. These effects are usually defined either holding other subjects' exposures constant or averaging over possible exposure distributions for other subjects. Spillover effects quantify the effect on subject i's outcome of other subjects' exposures. For a binary treatment, the contrast $\frac{1}{n}\sum_{i=1}^{n} E[Y_i(\mathbf{1}) - Y_i(\mathbf{0})]$ is often of interest. This is the difference in average potential outcomes in a world in which every subject is assigned to treatment compared to a world in which every subject is assigned to control. Eckles et al. (2014) call this particular estimand the average treatment effect (ATE); however, to differentiate this estimand from the ATE defined in the absence of interference, we will adopt the terminology of Halloran and Struchiner (1995) and refer to $\frac{1}{n}\sum_{i=1}^{n} E[Y_i(\mathbf{1}) - Y_i(\mathbf{0})]$ as the overall ATE. See Ogburn and VanderWeele (2014a) and references therein for further discussion of causal effects in the presence of interference and of the distinction between interference and contagion.
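To make the overall ATE concrete, the following toy simulation (entirely our construction; the network, outcome model, and coefficients are illustrative) computes it by Monte Carlo in a data-generating process with interference, where a subject's outcome depends on her own treatment and on the treated fraction among her neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# A fixed, known undirected network (Erdos-Renyi, for illustration only).
upper = np.triu(rng.random((n, n)) < 0.05, k=1)
G = (upper | upper.T).astype(float)
deg = np.maximum(G.sum(axis=1), 1.0)       # avoid dividing by zero

def mean_outcome(z, reps=2000):
    """Average outcome when the whole treatment vector Z is set to z:
    Y_i = 1 + 2 z_i + 1.5 (treated fraction of i's neighbors) + noise."""
    frac = (G @ z) / deg
    draws = 1 + 2 * z + 1.5 * frac + rng.normal(size=(reps, n))
    return draws.mean()

# Overall ATE: everyone treated versus no one treated.
overall_ate = mean_outcome(np.ones(n)) - mean_outcome(np.zeros(n))
print(overall_ate)   # close to 2 + 1.5 = 3.5 by construction
```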
We distinguish between interference, which is present when one subject’s treatment or
exposure may affect others' outcomes, and peer effects or contagion, which are present
when one subject’s outcome may influence or transmit to other subjects (Ogburn and
VanderWeele, 2014a). In the literature on peer effects, an ego is defined as a subject whose
outcome we are interested in studying and the ego’s alters are the ego’s neighbors, that is,
subjects who share an edge with the ego in the underlying network. A peer effect is a causal
effect on an ego’s outcome at time t of his alter’s outcome at time s for some s<t.
11.3.2 Experiments in Networks
The design of randomized experiments in networks, when subjects and their treatments
may not be independent, is an area of recent but increasing interest. Without attempting
to be exhaustive, this section reviews what we consider to be the most important threads
in this line of research.
Experiments are difficult for two reasons: interference, which may make it impossible
to simultaneously observe the treatment condition (e.g., every subject is treated) for some
units and the control condition (e.g., no subject is treated) for others, and dependence
among observations, which makes the estimation of standard errors difficult. Fisherian
randomization-based inference can circumvent the problem of dependence, as described
in Sections 11.3.2.1 and 11.3.2.2. Other randomization-based approaches assume that the
only source of dependence is interference, and therefore that observations are independent
conditional on treatment. There are two approaches to dealing with the challenge of
interference in the randomization literature: assumptions on the limits of interference and
attempts to minimize bias due to interference. Both are discussed below.
11.3.2.1 Fisherian Hypothesis Testing
Most of the work on inference about outcomes sampled from network nodes concerns
randomized experiments. But there is an important distinction to be made between inference
from randomized experiments and the specific case of Fisherian randomization-based
inference, pioneered by Fisher (1922) and applied to network-like settings by Rosenbaum
(2007) and Bowers et al. (2013). Fisherian randomization-based inference is founded on
the very intuitive notion that, under the null hypothesis of no effect of treatment on
any subject (sometimes called the sharp null hypothesis to distinguish it from other null
hypotheses that may be of interest), the treated and control groups are random samples from
the same underlying distribution. Fisherian randomization-based inference treats outcomes
as fixed and treatment assignments as random variables: quantities that depend on the
vector of treatment assignments are the only random variables in this paradigm. Therefore,
dependence among outcomes is a nonissue.
In an influential paper, Rosenbaum (2007) proposed the use of distribution-free statistics
(statistics whose distribution under the null hypothesis can be determined a priori, without
reference to the data) to perform hypothesis tests and derive confidence intervals in the
presence of interference. The Mann–Whitney statistic, for example, takes all pairs consisting
of one treated and one untreated subject and counts the number of times that the treated
subject had a greater outcome than the untreated subject. The null distribution of this
statistic is known a priori and does not depend on any aspect of the data-generating
distribution of the outcomes; it is therefore agnostic to any dependence among the outcomes.
After deriving the null distribution for such a statistic, one can compare the observed
distribution of the statistic to the null distribution to perform a hypothesis test and to
derive confidence intervals for the magnitude of the departure from the null hypothesis in
the event that it is rejected.
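As an illustration of the distribution-free idea (our sketch, not Rosenbaum's code), the Mann–Whitney test below compares treated and control outcomes against a null distribution that is known without reference to the data-generating process; the simulated outcomes are placeholders.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
y_treated = rng.normal(loc=0.5, size=20)   # outcomes of treated subjects
y_control = rng.normal(loc=0.0, size=20)   # outcomes of control subjects

# The statistic counts treated-control pairs where the treated outcome is
# larger; under the sharp null its distribution is known a priori.
stat, p_value = mannwhitneyu(y_treated, y_control, alternative="two-sided")
print(stat, p_value)
```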
A slightly different version of randomization-based inference is developed by Bowers et al.
(2013). Under the null hypothesis, that is, supposing that we are observing outcomes in the
world in which the null is true, we can generate the randomization-based null distribution
of any test statistic by enumerating all of the possible treatment assignment permutations
and recalculating the test statistic for each one. For example, suppose we have a network
with four nodes and our experiment dictates assigning half to treatment. Our test statistic
is the difference in mean outcomes among the treated compared to the controls. The
null distribution of this statistic is comprised of six equiprobable realizations of the mean difference: $[(Y_1+Y_2)/2] - [(Y_3+Y_4)/2]$, $[(Y_3+Y_4)/2] - [(Y_1+Y_2)/2]$, $[(Y_1+Y_3)/2] - [(Y_2+Y_4)/2]$, $[(Y_2+Y_4)/2] - [(Y_1+Y_3)/2]$, $[(Y_1+Y_4)/2] - [(Y_2+Y_3)/2]$, and $[(Y_2+Y_3)/2] - [(Y_1+Y_4)/2]$,
corresponding to the six possible permutations of the treatment assignment vector. If, in
reality, subjects 1 and 2 were assigned to treatment and 3 and 4 to control, then the observed
test statistic is the first in the list above—the mean outcome among treated minus the mean
outcome among controls. We can compare the observed test statistic to the statistic’s null
distribution and ask the question “how likely was this value to have occurred by chance,
assuming that the null hypothesis is true?”
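The four-node example can be enumerated directly. This sketch (ours, with made-up outcome values) reproduces the six-element null distribution and the resulting randomization p-value.

```python
import numpy as np
from itertools import combinations

y = np.array([3.1, 2.7, 1.2, 1.9])   # fixed outcomes under the sharp null
observed_treated = (0, 1)             # subjects 1 and 2 treated (0-indexed)

def mean_diff(treated):
    """Mean outcome among treated minus mean outcome among controls."""
    control = [i for i in range(len(y)) if i not in treated]
    return y[list(treated)].mean() - y[control].mean()

# Enumerate all six assignments of two treated units out of four.
null_dist = [mean_diff(t) for t in combinations(range(4), 2)]
obs = mean_diff(observed_treated)
p_value = np.mean([abs(d) >= abs(obs) for d in null_dist])
print(sorted(null_dist), obs, p_value)
```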