186 Handbook of Big Data
with noise, so treating it as fixed and known (as most of the methods in Section 11.3 do)
may be inappropriate. This makes the already-difficult project of causal inference even more
challenging. The naive approach to causal inference using incomplete network data would
be to impute missing data in a first step and then to proceed with causal inference as if
the data estimated in the first step were fixed and known. The primary downside of this
procedure is that it does not incorporate the uncertainty from the network fitting into the
uncertainty about the causal effects; a procedure that performs both tasks simultaneously
is highly desirable.
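One way to see what the two-step procedure loses is the law of total variance. The notation here is an assumption introduced only for illustration and does not come from the chapter: $\tau$ is a causal effect of interest, $A$ the true (partly unobserved) network, $\hat{A}$ the network imputed in the first step, and $D$ the observed data.

$$
\operatorname{Var}(\tau \mid D)
  = \mathbb{E}\big[\operatorname{Var}(\tau \mid A, D) \,\big|\, D\big]
  + \operatorname{Var}\big(\mathbb{E}[\tau \mid A, D] \,\big|\, D\big)
$$

Under this sketch, the naive approach reports something like $\operatorname{Var}(\tau \mid \hat{A}, D)$, which evaluates the first term at a single imputed network and omits the second term entirely. A procedure that fits the network and the causal effects together would, in principle, account for both.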
Lunagomez and Airoldi (2014) tackle the problem of jointly modeling the
sampling mechanism for causal inference and the underlying network on which the data
were collected. The model selected for the network in their work is the simple Erdős–Rényi
model that depends on a single parameter p. Since the network is not fully observed under
the sampling scheme discussed in their work (respondent-driven sampling), the network
model is chosen to accommodate marginalizing out the missing network information in a
Bayesian framework. The use of a simple network model makes computation tractable, but
the framework proposed by Lunagomez and Airoldi (2014) can theoretically be relaxed to
incorporate any network model.
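The following minimal sketch suggests why a single-parameter Erdős–Rényi model keeps the marginalization tractable. It is not the method of Lunagomez and Airoldi (2014): it assumes, purely for illustration, that unobserved dyads are missing at random (an ignorable design, unlike respondent-driven sampling) and that p has a Beta(a, b) prior. The function names and the encoding of dyads as 1, 0, or None are hypothetical.

```python
import random


def er_posterior(observed_dyads, a=1.0, b=1.0):
    """Return the Beta posterior parameters for the edge probability p.

    observed_dyads: list with 1 (edge), 0 (non-edge), or None (unobserved).
    Assumption: missingness is ignorable, so unobserved dyads integrate out
    of the likelihood and only observed dyads update the Beta(a, b) prior.
    """
    edges = sum(1 for d in observed_dyads if d == 1)
    non_edges = sum(1 for d in observed_dyads if d == 0)
    return a + edges, b + non_edges


def sample_missing_dyads(observed_dyads, a=1.0, b=1.0, rng=None):
    """Draw one completed network from the posterior predictive.

    First draws p from its Beta posterior, then fills each unobserved dyad
    with an independent Bernoulli(p) draw; observed dyads are left unchanged.
    """
    rng = rng or random.Random(0)
    a_post, b_post = er_posterior(observed_dyads, a, b)
    p = rng.betavariate(a_post, b_post)
    return [d if d is not None else int(rng.random() < p) for d in observed_dyads]
```

Calling `sample_missing_dyads` repeatedly yields a set of plausible completed networks rather than a single imputed one, so any downstream causal estimate can be computed on each draw and its spread across draws reflects uncertainty about the network. Under a non-ignorable design the sampling mechanism would also have to enter the likelihood, which this sketch leaves out.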
An alternative approach could be based on the proposal of Fosdick and Hoff (2013).
While these authors do not discuss estimation of causal effects, their procedure for the joint
modeling of network and nodal attributes can be adapted to a model-based causal analysis.
In particular, the authors leverage Section 11.2.3 to first test for a relationship between
nodal attributes $Y_i$ and the latent position vector
$\mathrm{lat}_i = (a_i, b_i, U_{i\cdot}, V_{i\cdot})$ and, when evidence
for such a relationship is found, to jointly model the vector $(Y_i, \mathrm{lat}_i)$. Considering the nodal
attributes as potential outcomes and writing $(Y_i(0), Y_i(1), \mathrm{lat}_i)$, it should in principle be
possible to jointly model the full data vector using the same Markov chain Monte Carlo
procedure as in Fosdick and Hoff (2013).
Without fully observing the network it is difficult to precisely define, let alone to
estimate, the causal effects discussed in Section 11.3. Jointly modeling network topology, to
account for missing data or subsampling, and causal effects for observations sampled from
network nodes, is one of the most important and challenging areas for future research. The
work of Lunagomez and Airoldi (2014) and Fosdick and Hoff (2013) points toward powerful
and promising solutions, but much work remains to be done.
References
E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing. Mixed membership stochastic
blockmodels. The Journal of Machine Learning Research, 9:1981–2014, 2008.
E.M. Airoldi, T.B. Costa, and S.H. Chan. Stochastic blockmodel approximation of a
graphon: Theory and consistent estimation. In Advances in Neural Information Processing
Systems, pp. 692–700, 2013.
R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern
Physics, 74(1):47, 2002.
M.M. Ali and D.S. Dwyer. Estimating peer effects in adolescent smoking behavior: A
longitudinal analysis. Journal of Adolescent Health, 45(4):402–408, 2009.
Networks 187
J.D. Angrist, G.W. Imbens, and D.B. Rubin. Identification of causal effects using
instrumental variables. Journal of the American Statistical Association, 91(434):444–455, 1996.
J.D. Angrist and J.-S. Pischke. Mostly Harmless Econometrics: An Empiricist’s Companion.
Princeton University Press, Princeton, NJ, 2008.
P.M. Aronow and C. Samii. Estimating average causal effects under general interference.
Technical report, http://arxiv.org/abs/1305.6156, 2013.
A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):
509–512, 1999.
P.J. Bickel and P. Sarkar. Hypothesis testing for automated community detection in
networks. arXiv preprint arXiv:1311.2694, 2013.
J. Blitzstein and P. Diaconis. A sequential importance sampling algorithm for generating
random graphs with prescribed degrees. Internet Mathematics, 6(4):489–522, 2011.
J. Bowers, M.M. Fredrickson, and C. Panagopoulos. Reasoning about interference between
units: A general framework. Political Analysis, 21(1):97–124, 2013.
J.T. Cacioppo, J.H. Fowler, and N.A. Christakis. Alone in the crowd: The structure
and spread of loneliness in a large social network. Journal of Personality and Social
Psychology, 97(6):977, 2009.
D.S. Choi. Estimation of monotone treatment effects in network experiments. arXiv preprint
arXiv:1408.4102, 2014.
D.S. Choi, P.J. Wolfe, and E.M. Airoldi. Stochastic blockmodels with a growing number of
classes. Biometrika, 99(2):273–284, 2012.
N.A. Christakis and J.H. Fowler. The spread of obesity in a large social network over 32
years. New England Journal of Medicine, 357(4):370–379, 2007.
N.A. Christakis and J.H. Fowler. The collective dynamics of smoking in a large social
network. New England Journal of Medicine, 358(21):2249–2258, 2008.
N.A. Christakis and J.H. Fowler. Social network sensors for early detection of contagious
outbreaks. PLoS One, 5(9):e12948, 2010.
N.A. Christakis and J.H. Fowler. Social contagion theory: Examining dynamic social
networks and human behavior. Statistics in Medicine, 32(4):556–577, 2013.
A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data.
SIAM Review, 51(4):661–703, 2009.
E. Cohen-Cole and J.M. Fletcher. Is obesity contagious? Social networks vs. environmental
factors in the obesity epidemic. Journal of Health Economics, 27(5):1382–1387, 2008.
R. Durrett. Random Graph Dynamics, vol. 200. Cambridge University Press, Cambridge,
New York, 2007.
D. Eckles, B. Karrer, and J. Ugander. Design and analysis of experiments in networks:
Reducing bias from interference. arXiv preprint arXiv:1404.7530, 2014.
P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 6:
290–297, 1959.
L. Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae
Scientiarum Petropolitanae, 8:128–140, 1741.
R.A. Fisher. On the mathematical foundations of theoretical statistics. In Philosophical
Transactions of the Royal Society of London. Series A, Containing Papers of a
Mathematical or Physical Character, pp. 309–368, 1922.
B.K. Fosdick and P.D. Hoff. Testing and modeling dependencies between a network and
nodal attributes. arXiv preprint arXiv:1306.4708, 2013.
J.H. Fowler and N.A. Christakis. Estimating peer effects on health in social networks: A
response to Cohen-Cole and Fletcher; Trogdon, Nonnemaker, Pais. Journal of Health
Economics, 27(5):1400, 2008.
E.N. Gilbert. Random graphs. The Annals of Mathematical Statistics, 30:1141–1144, 1959.
A. Goldenberg, A.X. Zheng, S.E. Fienberg, and E.M. Airoldi. A survey of statistical network
models. Foundations and Trends in Machine Learning, 2(2):129–233, 2010.
S. Greenland. An introduction to instrumental variables for epidemiologists. International
Journal of Epidemiology, 29(4):722–729, 2000.
M.E. Halloran and C.J. Struchiner. Causal inference in infectious diseases. Epidemiology,
6(2):142–151, 1995.
M.A. Hernán. A definition of causal effect for epidemiological research. Journal of
Epidemiology and Community Health, 58(4):265–271, 2004.
P. Hoff, B. Fosdick, A. Volfovsky, and K. Stovel. Likelihoods for fixed rank nomination
networks. Network Science, 1(03):253–277, 2013.
P.D. Hoff. Bilinear mixed-effects models for dyadic data. Journal of the American Statistical
Association, 100(469):286–295, 2005.
P.D. Hoff. Discussion of “Model-based clustering for social networks,” by Handcock, Raftery
and Tantrum. Journal of the Royal Statistical Society, Series A, 170(2):339, 2007.
P.D. Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. In
J.C. Platt, D. Koller, Y. Singer, and S. Roweis (eds.), Advances in Neural Information
Processing Systems 20, pp. 657–664. MIT Press, Cambridge, MA, 2008.
http://cran.r-project.org/web/packages/eigenmodel/.
P.D. Hoff, A.E. Raftery, and M.S. Handcock. Latent space approaches to social network
analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.
P.W. Holland, K.B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social
Networks, 5(2):109–137, 1983.
B. Karrer and M.E.J. Newman. Stochastic blockmodels and community structure in
networks. Physical Review E, 83(1):016107, 2011.
E.D. Kolaczyk. Statistical Analysis of Network Data. Springer, New York, 2009.
E.D. Kolaczyk and P.N. Krivitsky. On the question of effective sample size in network
modeling. arXiv preprint arXiv:1112.0840, 2011.
D. Lazer, B. Rubineau, C. Chetkovich, N. Katz, and M. Neblo. The coevolution of networks
and political attitudes. Political Communication, 27(3):248–274, 2010.
S. Lunagomez and E. Airoldi. Bayesian inference from non-ignorable network sampling
designs. arXiv preprint arXiv:1401.4718, 2014.
R. Lyons. The spread of evidence-poor medicine via flawed social-network analysis.
Statistics, Politics, and Policy, 2(1), 2011.
C.F. Manski. Identification of endogenous social effects: The reflection problem. The Review
of Economic Studies, 60(3):531–542, 1993.
C.F. Manski. Identification of treatment response with social interactions. The Econometrics
Journal, 16(1):S1–S23, 2013.
H. Noel and B. Nyhan. The unfriending problem: The consequences of homophily in
friendship retention for causal estimates of social influence. Social Networks, 33(3):
211–218, 2011.
K. Nowicki and T.A.B. Snijders. Estimation and prediction for stochastic blockstructures.
Journal of the American Statistical Association, 96(455):1077–1087, 2001.
E.L. Ogburn and T.J. VanderWeele. Causal diagrams for interference. Statistical Science,
29(4):559–578, 2014a.
E.L. Ogburn and T.J. VanderWeele. Vaccines, contagion, and social networks. arXiv
preprint arXiv:1403.1241, 2014b.
A.J. O’Malley, F. Elwert, J.N. Rosenquist, A.M. Zaslavsky, and N.A. Christakis.
Estimating peer effects in longitudinal dyadic data using instrumental variables. Biometrics,
70(3):506–515, 2014.
J. Pearl. Causality: Models, Reasoning and Inference, vol. 29. Cambridge University Press,
New York, 2000.
K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic
blockmodel. The Annals of Statistics, 39(4):1878–1915, 2011.
P.R. Rosenbaum. Interference between units in randomized experiments. Journal of the
American Statistical Association, 102(477):191–200, 2007.
J.N. Rosenquist, J. Murabito, J.H. Fowler, and N.A. Christakis. The spread of alcohol
consumption behavior in a large social network. Annals of Internal Medicine, 152(7):
426–433, 2010.
D.B. Rubin. Causal inference using potential outcomes. Journal of the American Statistical
Association, 100(469), 2005.
C.R. Shalizi. Comment on “why and when ‘flawed’ social network analyses still yield valid
tests of no contagion.” Statistics, Politics, and Policy, 3(1):1–3, 2012.
C.R. Shalizi and A.C. Thomas. Homophily and contagion are generically confounded in
observational social network studies. Sociological Methods & Research, 40(2):211–239,
2011.
E.A. Thompson and C.J. Geyer. Fuzzy p-values in latent variable problems. Biometrika,
94(1):49–60, 2007.
P. Toulis and E. Kao. Estimation of causal peer influence effects. In Proceedings of the 30th
International Conference on Machine Learning, pp. 1489–1497, 2013.
J. Ugander, B. Karrer, L. Backstrom, and J. Kleinberg. Graph cluster randomization:
Network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 329–337. ACM,
2013.
M.J. van der Laan. Causal inference for a population of causally connected units. Journal
of Causal Inference, 2(1):13–74, 2014.
M.J. van der Laan, E.L. Ogburn, and I. Diaz. Causal inference for social networks.
(forthcoming).
M.J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and
Experimental Data. Springer, New York, 2011.
T.J. VanderWeele. Sensitivity analysis for contagion effects in social networks. Sociological
Methods & Research, 40(2):240–255, 2011.
T.J. VanderWeele, E.L. Ogburn, and E.J. Tchetgen Tchetgen. Why and when “flawed”
social network analyses still yield valid tests of no contagion. Statistics, Politics, and
Policy, 3(1):1–11, 2012.
A. Volfovsky and E. Airoldi. Characterization of finite group invariant distributions. arXiv
preprint arXiv:1407.6092, 2014.
A. Volfovsky and P.D. Hoff. Testing for nodal dependence in relational data matrices.
Journal of the American Statistical Association, 2014.
Y.J. Wang and G.Y. Wong. Stochastic blockmodels for directed graphs. Journal of the
American Statistical Association, 82(397):8–19, 1987.
R.M. Warner, D.A. Kenny, and M. Stoto. A new round robin analysis of variance for social
interaction data. Journal of Personality and Social Psychology, 37(10):1742, 1979.
D.J. Watts and S.H. Strogatz. Collective dynamics of “small-world” networks. Nature,
393(6684):440–442, 1998.
J.J. Yang, Q. Han, and E.M. Airoldi. Nonparametric estimation and testing of exchangeable
graph models. In Proceedings of the 17th International Conference on Artificial
Intelligence and Statistics, pp. 1060–1067, 2014.