4 Research Designs Dominated by Self-Selection

In this section, we briefly consider a group of research designs where institutional knowledge of the treatment assignment process typically does not provide most of the information needed to draw causal inferences. In Section 3, we discussed how some aspects of the assignment process could be treated more as literal “descriptions” (“D”-conditions), rather than conjectures or assumptions. In each of the four research designs, those “D” conditions went a long way towards identification, and when other structural assumptions (“S”-conditions) were needed, the class of models consistent with those assumptions, while strictly smaller because of the restrictions, arguably remained very broad.

With the common program evaluation approaches considered in this section, we shall see that the assignment process is not dominated by explicit institutional knowledge, and identification thus requires more conjectures/assumptions (“S”-conditions) to make causal inferences. We will argue that with these designs there will be more scope for alternative plausible economic models that would be strictly inconsistent with the conditions needed for identification. Of the three approaches we consider, the “difference-in-difference” approach appears to have the best potential for testing the key “S”-conditions needed for identification.

For the reasons above, we suggest that these designs will tend to deliver causal inferences with lower internal validity, in comparison to the designs described in Section 3. But even if one agrees with the three criteria we have put forth in Section 2.2.1 to assess internal validity, and even if one agrees with the conclusion that these designs deliver lower internal validity, the question of “how much lower” requires a subjective judgment, and such a question is ill-defined at any rate (what is a unit of “internal validity”?). Of the three criteria we discuss, the extent to which the conditions for identification can be treated as a hypothesis with testable implications seems to be the least subjective in nature.

In our discussion below, it is still true that a particular weighted average effect of “interest” from an ex ante evaluation problem may in general be quite different from the effects identified by these research designs. In this sense, these approaches suffer from similar “external validity” concerns as discussed in Section 3. We will therefore focus our discussion on the ex post evaluation goal, and do not have separate sections on ways to extrapolate from the results of an ex post evaluation to forecast the effect of interest in an ex ante evaluation.

4.1 Using longitudinal data: “difference-in-difference”

We now consider the case where one has longitudinal data on program participants and non-participants. Suppose we are interested in the effectiveness of a job training program in raising earnings. image is now earnings and image is participation in the job training program. A commonly used approach is the “difference-in-difference” design, which has been discussed as a methodology and applied in program evaluation research in some form or another countless times. Our only purpose here is to discuss how the design fits into the general framework we have used in this chapter, and to be explicit about the restrictions in a way that facilitates comparison with the designs in Section 3.

First, let us simplify the problem by considering the situation where the program was made available at only one point in time image. This allows us to define image as those who were treated at time image, and image as those who did not take up the program at that time.

D7 (Program exposure at one point in time): Individuals will have image for all image; for image, the non-treated will continue to have image while the treated will have image.

image will continue to denote all the factors that could potentially affect image. Additionally, we imagine that this vector of variables could be partitioned into sub-vectors image, image, where the subscript denotes the value of the variables at time image.

Furthermore, we explicitly include time in the outcome equation as


image


A “difference-in-difference” approach begins by putting some structure on image:

S15 (Additive Separability): image.

This highly restrictive structure (although it does not rule out heterogeneous treatment effects) is the standard “individual fixed effects” specification for the outcome, where image captures the permanent component of the outcome.

Perhaps the most important thing to keep in mind is that D7 and S15 are not generally sufficient for the “difference-in-difference” approach to identify the treatment effects. This is because the two differences in question are

image

image

The term image is equal to image, the treatment on the treated (TOT) parameter. When the second equation is subtracted from the first, the terms with image do not cancel without further restrictions.
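To fix ideas, the following sketch writes out the two differences in a notation we adopt purely for illustration (it need not match the chapter’s own symbols): the outcome of an individual at time t is the sum of a permanent individual component, a term u(t, U) collecting the influence of the other factors, and, for treated observations in the post-program periods, a treatment effect tau(U).

```latex
% Sketch in assumed notation (D7 and S15); the permanent component cancels within each group.
\[
\begin{aligned}
E[Y_{t} - Y_{t'} \mid D = 1] &= E[\tau(U) \mid D = 1]
   + \big( E[u(t, U) \mid D = 1] - E[u(t', U) \mid D = 1] \big),\\
E[Y_{t} - Y_{t'} \mid D = 0] &= \big( E[u(t, U) \mid D = 0] - E[u(t', U) \mid D = 0] \big),
\end{aligned}
\qquad t \ge t^{*},\ t' < t^{*}.
\]
```

Subtracting the second line from the first leaves the TOT plus the difference between the two terms in parentheses; a condition such as S16 is needed precisely so that those terms cancel.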

One approach to this problem is to further assume that

S16 (Influence of “Other factors” Fixed): image, for any image.

This certainly would ensure that the D-in-D estimand identifies the TOT. It is, however, restrictive: even if image—so that only contemporaneous factors are relevant—the condition would still be violated as long as there were some factors in image that changed over time. Note that in this case, it is irrelevant how similar or different the distribution of unobservable types is between the treated and non-treated individuals (image vs. image). 39

4.1.1 Assessment

In terms of our three criteria for assessment, how does the D-in-D approach fare? It should be very clear from the above derivation that the model of both the outcome and treatment assignment is a far cry from a literal description of the data generating process, except for D7, which describes the timing of the program and the structure of the data. As for the second criterion, it is not difficult to imagine writing down economic models that would violate the restrictions. Indeed, much of the early program evaluation literature (Heckman and Robb, 1985; Ashenfelter, 1978; Ashenfelter and Card, 1985) discussed different scenarios under which S15 and S16 would be violated, and how to nevertheless identify program effects.

On the other hand, one positive feature of this approach—driven by S16, precisely the assumption that allowed identification of the program effect—is that the design has strong testable predictions, namely


image


for all image. That is, the choice of the base year in constructing the DD should be irrelevant. Put differently, it means that


image


for image: the DD estimand during the “pre-program” period should equal zero. As is well-known from the literature, there are as many testable restrictions as there are pre-program periods, and it is intuitive that the more evidence that these restrictions are not rejected, the greater confidence we might have in the causal inferences that are made from the D-in-D.
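As a concrete illustration of these testable restrictions, here is a minimal simulation sketch (the data generating process, variable names, and parameter values are ours, invented for illustration). It computes the post-program DD estimate and then the placebo DDs constructed entirely from pre-program periods, which should be close to zero when an S15/S16-type structure holds.

```python
import numpy as np

rng = np.random.default_rng(0)

n, T, t_star = 2000, 6, 4            # hypothetical panel; program introduced at period t_star
treated = rng.random(n) < 0.4        # self-selected participants
alpha = rng.normal(0, 1, n) + 0.5 * treated   # permanent component, correlated with participation
gamma = np.linspace(0.0, 1.0, T)              # common time effects
tau = 2.0                                     # treatment effect on the treated

post = np.arange(T) >= t_star
Y = (alpha[:, None] + gamma[None, :]
     + tau * (treated[:, None] & post[None, :])
     + rng.normal(0, 1, (n, T)))

def dd(post_period, base_period):
    """Difference-in-difference of group mean outcomes between two periods."""
    d1 = Y[treated, post_period].mean() - Y[treated, base_period].mean()
    d0 = Y[~treated, post_period].mean() - Y[~treated, base_period].mean()
    return d1 - d0

print("DD estimate of the TOT:", round(dd(T - 1, t_star - 1), 3))
# Placebo checks: DDs computed entirely within the pre-program periods should be
# close to zero; systematic deviations are evidence against the design.
for base in range(t_star - 1):
    print(f"placebo DD, period {t_star - 1} vs {base}:", round(dd(t_star - 1, base), 3))
```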

Overall, while it is clear that the assumptions given by S15 and S16 require a great deal of speculation—and arguably a greater suspension of disbelief, relative to the conditions outlined in Section 3—at least there is empirical evidence (the pre-program data) that can be used to assess the plausibility of the key identifying assumption, S16.

4.2 Selection on unobservables and instrumental variables

In this section, we briefly discuss the identification of program effects when we have much less information about treatment assignment, relative to the designs in Section 3. We will focus on the use of instrumental variable approaches, but it will be clear that the key points we discuss will equally apply to a control function approach.

The instrumental variable approach is typically described as finding a variable image that impacts image only through its effect on treatment status image. Returning to our job search assistance program example, let us take image to be the binary variable of whether the individual’s sibling participated in the program. The “story” behind this instrument would be that the sibling’s participation might be correlated with the individual’s participation—perhaps because the sibling would be an influential source of information about the program—but that there is no reason why a sibling’s participation in the program would directly impact the individual’s re-employment probabilities.

Even if one completely accepts this “story”—and there are undoubtedly reasons to question its plausibility—this is not sufficient to identify the treatment effect via this instrument. To see this, we use the framework in Eqs (1)-(3), and adopt S8 (Excludability) and S9 (Probabilistic Monotonicity).

S8 is the formal way to say that for any individual type image, image (the sibling’s program participation) has no direct impact on the outcome. This exclusion restriction might come from a particular behavioral model. Furthermore, S9 simply formalizes the notion that for any given type image, the probability of receiving treatment is higher when the sibling participated in the program. Alternatively, it says that for each type image, there are more individuals who are induced to receive treatment because of their sibling’s participation, than those who would be discouraged from doing so.

The problem is that, in general, it is easy to imagine that there is heterogeneity in the latent propensity for the individual to have a sibling participate in the program: image has a non-degenerate distribution. If such variability in image exists in the population, this immediately implies that image will in general be different from image. The IV (Wald) estimand will in general not identify any average effect, LATE or otherwise.

Typically researchers immediately recognize that their instrument image is not “as good as randomly assigned” as in D3, and so instead appeal to a “weaker” condition that

S17 (Conditional on image, image “as good as randomly assigned”): image, a function of image.

This is a restriction on the heterogeneity in image. S17 says that types with the same image have identical propensities image.

Of course, the notion that the analyst knows and could measure all the factors in image is a conjecture in itself:

S18 (Sufficient variables for image): Let image be the observable (to the researcher) elements of image, and assume image for all image.

S18 simply says that the researcher happens to observe all the variables that determine the propensity image.

It should be clear that with S17, S18, S8, and S9, one can condition the analysis on a particular value image, and apply the results from the randomized experiment with imperfect compliance (Section 3.2).
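The following simulation sketches that logic (the data generating process and variable names are ours, and the simple first-stage-weighted aggregation across cells is only one of several possibilities): within each cell of the observed covariate, the instrument is as good as randomly assigned, so the cell-by-cell Wald ratios recover the cell-level effects.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

x = rng.integers(0, 3, n)                 # observed covariate cells
u = rng.normal(0, 1, n)                   # unobserved type
# S17/S18-style assumption: the instrument probability depends only on x, not on u
z = rng.random(n) < (0.3 + 0.2 * x)
# imperfect compliance: take-up depends on the instrument and on the unobserved type
d = (0.5 * z + 0.8 * u + rng.normal(0, 1, n)) > 0.7
# excludability (S8): z affects y only through d
y = 1.5 * d + u + rng.normal(0, 1, n)

def wald(mask):
    """Reduced form and first stage within a subsample."""
    num = y[mask & z].mean() - y[mask & ~z].mean()
    den = d[mask & z].mean() - d[mask & ~z].mean()
    return num, den

estimates, weights = [], []
for v in range(3):
    num, den = wald(x == v)
    estimates.append(num / den)               # cell-level Wald (LATE) estimate
    weights.append(den * (x == v).mean())     # weight by first stage and cell size
print("cell-by-cell estimates:", np.round(estimates, 3))
print("weighted average:", round(np.average(estimates, weights=weights), 3))
```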

Note that while we have focused on the binary instrument case, it should also be clear that this argument will apply to the case when image is continuously distributed. It is not sufficient for image simply to be excluded from the outcome equation; image must also be assumed to be (conditionally) independent of image, and indeed this is the standard assumption in the evaluation literature. 40

4.2.1 Assessment

It is clear that in this situation, as D3 is replaced with S17 and S18, there is now no element of the statistical model that can be considered a literal description of the treatment assignment process. The model is entirely driven by assumptions about economic behavior, rather than a description of the data generating process that is derived from our institutional knowledge about how treatment was assigned.

S17 makes it clear that causal inferences will be dependent on having the correct set of variables image. Without the complete set of image, there will be variability in image conditional on the covariates, which will mean that the distribution of types image will not be the same in the image and image groups. In general, different theories about which image’s satisfy S18 will lead to a different causal inference. Recall that no similar specification of the relevant image’s was necessary in the case of the randomized experiment with imperfect compliance, considered in Section 3.2.

Finally, there seems to be very little scope for testing the validity of this design. If the argument is that the instrument is independent of image only conditional on all the images observed by the researcher, then all the data will have been “used up” to identify the causal parameter.

To make the design somewhat testable, the researcher could assume that only a smaller subset of variables in image are needed to characterize the heterogeneity in image. In that case, one could imagine conditioning on that smaller subset, and examining whether the distributions of the remaining image variables are balanced between the image and image groups, as suggested in Section 3.2 (with the appropriate caveats and qualifications discussed there). But in practice, when evidence of imbalance is found, the temptation is to simply include those variables as conditioning variables to achieve identification, which then eliminates the potential for testing.

What this shows is the benefit to credibility that one obtains from explicit knowledge about the assignment process whereby D3 is a literal description of what is known about the process. It disciplines the analysis so that any observed image variables can be used to treat D3 as a hypothesis to be tested.

4.3 Selection on observables and matching

Absent detailed institutional knowledge of the selection process, i.e., the propensity score equation, a common approach to the evaluation problem is to “control” for covariates, either in a multiple regression framework, or more flexibly through multivariate matching or propensity score techniques. 41 Each of these approaches assumes that, conditional on some observed covariates image, treatment status is essentially “as good as randomly assigned”.

In terms of the model given in Eqs (1)-(3), one can think of the selection on observables approach as amounting to two important assumptions. First, we have

S19 (Conditional on image, image is “as good as randomly assigned”): image, a function of image.

Here, the unobservable type image does not enter the latent propensity, whereas it does in (2).

It is important to reiterate that the function image, the so-called “propensity score”—which can be obtained as long as one can observe image—is not, in general, the same thing as the latent propensity image for a given image. That is, even though it will always be true that image, there will be heterogeneity in image for a given image, unless one imposes the condition S19. And it is precisely this heterogeneity, and its correlation with the outcome image, that is the central problem of making causal inferences, as discussed in Section 2.2.

In addition, if the researcher presumes that there are some factors that are unobservable in image, then one must further assume that

S20 (S19 + Sufficient variables for image): Let image be the observable (to the researcher) elements of image, and assume image for all image.

So S20 goes further to say not only that the images are sufficient to characterize the underlying propensity, but also that all the unobservable elements of image are irrelevant in determining the underlying propensity image.

S20 has the same implications as condition D2 discussed in Section 3.1.2: the difference image identifies the (conditional) average treatment effect image. The key difference is that in Section 3.1.2, D2 was a literal description of a particular assignment process (random assignment with probabilities of assignment being a function of image). 42 Here, S20 is a restriction on the framework defined by Eqs (1)-(3).

To see how important it is not to have variability in image conditional on image, consider the “conditional” version of Eq. (5)


image


A non-degenerate density image will automatically lead to image, which would prevent the two terms from being combined. 43

4.3.1 Assessment: included variable bias

In most observational studies, analysts will rarely claim that they have a model of behavior or institutions that dictate that the assignment mechanism must be modeled as S20. More often, S20 is invoked because there is an explicit recognition that there is non-random selection into treatment, so that image is certainly not unconditionally randomly assigned. S20 is offered as a “weaker” alternative.

Perhaps the most unattractive feature of this design is that, even if one believes that S20 does hold, typically there is not much in the way of guidance as to what images to include, as has long been recognized (Heckman et al., 1998). There usually is a multitude of different plausible specifications, and no disciplined way to choose among those specifications.

It is therefore tempting to believe that if we compare treatment and control individuals who look more and more similar on observable dimensions image, then—even if the resulting bias is non-zero—at the least, the bias in the estimate will decrease. Indeed, there is a “folklore” in the literature which suggests that “overfitting” is either beneficial or at worst harmless. Rubin and Thomas (1996) suggest including variables in the propensity score unless there is a consensus that they do not belong. Millimet and Tchernis (2009) go further and argue that overfitting the propensity score equation is possibly beneficial and at worst harmless.

It is instructive to consider a few examples to illuminate why this is in general not true, and how adding image’s can lead to “included variable bias”. To gain some intuition, first consider the simple linear model


image


where image is the coefficient of interest and image. The probability limit of the OLS regression coefficient on image is


image


Now suppose there is a “control variable” image that actually has covariance image—so it is an “irrelevant” variable—but that can explain some variation in image. When image is included, the least squares coefficient on image will be


image


where image is the predicted value from a population regression of image on image. This expression shows that the magnitude of the bias in the least squares estimand that includes image will be strictly larger, with the denominator in the bias term decreasing. What is happening is that the extra variable image is doing nothing to reduce bias, while absorbing some of the variation in image.
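A small simulation makes the point concrete (the data generating process below is invented for illustration): the added regressor is uncorrelated with the disturbance, so it removes no bias, but it absorbs exogenous variation in the variable of interest and thereby inflates the bias of its coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 1.0

e = rng.normal(0, 1, n)            # disturbance, correlated with the regressor of interest
v = rng.normal(0, 1, n)
d = e + v                          # regressor of interest (endogenous)
x2 = v + rng.normal(0, 1, n)       # "control": correlated with d, but uncorrelated with e
y = beta * d + e

def ols(y, regressors):
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(y, [d])[1]
b_long = ols(y, [d, x2])[1]
print(f"true coefficient: {beta}")
print(f"without x2: {b_short:.3f}   (bias {b_short - beta:+.3f})")
print(f"with x2   : {b_long:.3f}   (bias {b_long - beta:+.3f})")
# Including the irrelevant control shrinks the residual variance of d (the
# denominator of the bias term) without reducing its covariance with e,
# so the bias becomes larger, not smaller.
```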

To gain further intuition on the potential harm in “matching on images” in the treatment evaluation problem, consider the following system of equations

image

image

where image, the “control” variable, is in this case a binary variable. image is assumed to be independent of image. This is a simplified linear and parametric version of (1), (2), and (3). 44

The bias of the simple difference in means—without accounting for image—can be shown to be


image


whereas the bias in the matching estimand for the TOT is


image


In this very simple example, a comparison between image and image reveals two sources of differences. First, there is a standard “omitted variable bias” that stems from the first term in image. This bias does not exist in the matching estimand.

But there is another component in image, the term in curly braces—call it the “selectivity bias” term. This term will always be smaller in magnitude than image. That is, “controlling for” image can only increase the magnitude of the selectivity bias term. To see this, note that the difference between the term in curly braces in image and image is the difference between


image     (15)


and


image     (16)


in image. Each of these expressions is a weighted average of image.

Consider the case of positive selectivity, so that image is increasing in image and the selectivity term (curly braces) in image is positive, and suppose that image (i.e. image). 45 This means that image. That is, among the non-treated group, those with image are more negatively selected than those with image.

Comparing (15) and (16), it is clear that image automatically places relatively more weight on image, since image, and hence image will exceed the selectivity term (curly braces) in image. 46 Intuitively, as image can “explain” more and more of the variation in image, the exceptions (those with image, but image) must have unobservable factors that are even more extreme in order to be exceptional. And it is precisely those exceptional individuals that are implicitly given relatively more weight when we “control” for image.

So in the presence of nontrivial selection on unobservables, a matching on observables approach will generally exacerbate the selectivity bias. Overall, this implies that a reduction in bias will require the possible “benefits”—elimination of the omitted variable bias driven by image—to outweigh the cost of exacerbating the selectivity bias.

There is another distinct reason why the magnitude of image may be larger than that of image. The problem is that image is unknown, and the sign and magnitude need not be tied to the fact that image correlates with image. In the above example, even if there is positive selectivity on unobservables, image may well be negative, and therefore image could be zero (or very small). So even if matching on image had a small effect on the selectivity bias component, the elimination of the omitted variable bias term will cause image. That is, if the two sources of bias were offsetting each other in the simple difference, eliminating one of the problems via matching makes the overall bias increase.
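The simulated example below (our own invented data generating process, with a constant treatment effect and a single binary control) illustrates both mechanisms: matching on the control removes the omitted-variable component of the bias, but the selection-on-unobservables component becomes more severe, and the overall bias of the matching estimand exceeds that of the simple difference in means.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
tau, delta = 1.0, -0.8            # constant treatment effect; the control lowers the outcome

x = rng.random(n) < 0.5           # binary "control" variable
u = rng.normal(0, 1, n)           # unobservable; positive selection into treatment
d = (1.0 * x + u + rng.normal(0, 1, n)) > 0.8
y = tau * d + delta * x + u + rng.normal(0, 1, n)

naive = y[d].mean() - y[~d].mean()

# matching on the binary control: within-x contrasts weighted by Pr[x | d = 1]
match = sum(
    (y[d & (x == v)].mean() - y[~d & (x == v)].mean()) * (x[d] == v).mean()
    for v in (0, 1)
)
print(f"true TOT: {tau:.3f}")
print(f"simple difference in means: {naive:.3f}  (bias {naive - tau:+.3f})")
print(f"matching on x: {match:.3f}  (bias {match - tau:+.3f})")
# The two biases partially offset in the simple difference (delta < 0), while
# matching eliminates only the omitted-variable part and sharpens the
# selectivity part, so the total bias grows.
```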

Overall, we conclude that as soon as the researcher admits departures from S20, there is a rather weak case to be made for “matching” on observables being an improvement. Indeed, there is a compelling argument that including more image’s will increase bias, and that the “cure may be worse than the disease”.

Finally, in terms of the third criterion we have been considering, the matching approach seems to have no testable implications whatsoever. One possibility is to specify a particular subset image of the images that are available to the researcher, and make the argument that it is specifically those variables that determine treatment in S20. The remainder of the observed variables could be used to test the implication that the distribution of types image is the same between the treated and non-treated populations, conditional on image. The problem, of course, is that if some differences were found in those images not in image, there would again be the temptation to simply include those variables in the subset image. Overall, not only do we believe this design to have a poor theoretical justification in most contexts (outside of actual stratified randomized experiments), but there seems to be nothing in the design to discipline which images to include in the analysis, and as we have shown above, there is a great risk in simply adding as many images to the analysis as possible.

4.3.2 Propensity score, matching, re-weighting: methods for descriptive, non-causal inference

Although we have argued that the matching approach is not compelling as a research design for causal inference, it can nevertheless be a useful tool for descriptive purposes. Returning to our hypothetical job search assistance program, suppose that the program is voluntary, and that none of the data generating processes described in Section 4 apply. We might observe the difference


image


but also notice that the distribution of particular images (education, age, gender, previous employment history) are also different: image. One could ask the descriptive question, “mechanically, how much of the difference could be exclusively explained by differences in the distribution of image?” We emphasize the word “mechanically”; if we observe that image varies systematically by different values of image for the treated population, and if we further know that the distribution of image is different in the non-treated population, then even if the program were entirely irrelevant, we would nevertheless generally expect to see a difference between image and image.

Suppose we computed


image


Then the difference


image     (17)


would tell us how relevant image is in predicting image, after adjusting for the observables image. If one adopted S20, then this could be interpreted as an average treatment effect for the treated. But more generally, this adjusted difference could be viewed as a descriptive summary statistic, in the same way that a multiple regression provides descriptive information about the association between image and image after partialling out image.
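In a notation adopted only for concreteness (writing Y for the outcome, D for program status, and X for the observed covariates, treated here as discrete), the adjusted difference described above can be written as

```latex
% Assumed notation: Y outcome, D program status, X observed covariates (treated as discrete).
\[
\Delta_X \;\equiv\; E[Y \mid D = 1]
  - \sum_{x} E[Y \mid X = x, D = 0]\,\Pr[X = x \mid D = 1]
  \;=\; E\big[\, Y - E[Y \mid X, D = 0] \;\big|\; D = 1 \,\big].
\]
```

The second expression is the imputation form that motivates the Blinder/Oaxaca calculation described next.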

We only briefly review some of the methods used to estimate the quantity (17), since this empirical exercise is one of the goals of more general decomposition methods, which are the focus of the chapter by Firpo et al. (2011). We refer the reader to that chapter for further details.

Imputation: Blinder/Oaxaca

One way of obtaining (17) is to take each individual in the treated sample and “impute” the missing quantity: the average image given the individual’s characteristics image. This is motivated by the fact that


image     (18)


is identical to the quantity (17).

The sample analogue is given by


image


where image is the number of observations in the treated sample, and image is the predicted value of regressing image on image for the non-treated sample. This is immediately recognizable as a standard Blinder/Oaxaca exercise.
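A minimal sketch of this imputation on simulated data (all variable names and the linear specification are ours, chosen for illustration): fit the regression of the outcome on the covariates in the non-treated sample, predict a counterfactual for every treated observation, and compare means.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

x = rng.normal(0, 1, (n, 2))                                   # observed covariates
d = (x @ np.array([0.8, -0.5]) + rng.normal(0, 1, n)) > 0      # non-random take-up
y = 1.0 + x @ np.array([1.0, 0.5]) + 0.7 * d + rng.normal(0, 1, n)

# fit E[Y | X, D = 0] by least squares on the non-treated sample
X0 = np.column_stack([np.ones((~d).sum()), x[~d]])
coef = np.linalg.lstsq(X0, y[~d], rcond=None)[0]

# impute the counterfactual for each treated observation and compare means
X1 = np.column_stack([np.ones(d.sum()), x[d]])
adjusted_gap = y[d].mean() - (X1 @ coef).mean()
raw_gap = y[d].mean() - y[~d].mean()
print(f"raw difference in means: {raw_gap:.3f}")
print(f"adjusted (Blinder/Oaxaca-style) gap: {adjusted_gap:.3f}")
```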

Matching

One development in the recent labor economics literature is an increased use of matching estimators, estimators based on the propensity score, and semi-parametric estimators that eschew parametric specification of the outcome functions. The concern is that the regression used to predict image may be a bad approximation of the true conditional expectation.

The first approach is to simply use the sample mean of image for all individuals in the non-treated sample that have exactly the same value for image as the individual image. Sometimes it will be possible to do this for every individual (e.g. when image is discrete and for each value of image there are treated and non-treated observations).

In other cases, image is so multi-dimensional that for each value of image there are very few observations, with many values having only treated or only non-treated observations. Alternatively, image could have continuously distributed elements, in which case exact matching is impossible. In this case, one approach is to compute non-parametric estimates of image using kernel regression or local polynomial regression (Hahn, 1998; Hirano et al., 2003). A version of matching takes the data point in the control sample that is “closest” to the individual image in terms of the characteristics image, and assigns image to be the value of image for that “nearest match”.
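A stripped-down version of the nearest-match idea on simulated data (the Euclidean metric, matching with replacement, and all variable names are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000

x = rng.normal(0, 1, (n, 2))
d = (x @ np.array([0.8, -0.5]) + rng.normal(0, 1, n)) > 0
y = 1.0 + x @ np.array([1.0, 0.5]) + 0.7 * d + rng.normal(0, 1, n)

x1, y1 = x[d], y[d]        # treated sample
x0, y0 = x[~d], y[~d]      # comparison pool

# one-nearest-neighbour matching (with replacement) on the covariates
gaps = []
for xi, yi in zip(x1, y1):
    j = np.argmin(np.sum((x0 - xi) ** 2, axis=1))   # closest non-treated unit
    gaps.append(yi - y0[j])
print(f"matching estimate of the adjusted gap: {np.mean(gaps):.3f}")
```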

Propensity score matching

A variant of the above matching approach is to “match” on the propensity score, rather than on the observed image. It is motivated by the fact that (17) is also equivalent to


image


where


image


is the well-known “propensity score” of Rosenbaum and Rubin (1983). We emphasize once again that image is not the same thing as image, the latent propensity to be treated. Indeed, it is precisely the possibility of variability in image conditional on image that threatens the validity of the “selection on observables” approach to causal inference.
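A sketch of matching on an estimated propensity score, again on simulated data (the logistic specification and the one-to-one matching rule are illustrative assumptions of ours, not a recommendation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 5_000

x = rng.normal(0, 1, (n, 2))
d = (x @ np.array([0.8, -0.5]) + rng.normal(0, 1, n)) > 0
y = 1.0 + x @ np.array([1.0, 0.5]) + 0.7 * d + rng.normal(0, 1, n)

# estimated propensity score: fitted Pr[D = 1 | X]
pscore = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

p1, y1 = pscore[d], y[d]
p0, y0 = pscore[~d], y[~d]

# match each treated unit to the non-treated unit with the closest estimated score
gaps = [y1[i] - y0[np.argmin(np.abs(p0 - p1[i]))] for i in range(len(y1))]
print(f"propensity-score matching estimate of the adjusted gap: {np.mean(gaps):.3f}")
```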

Re-weighting

An alternative approach is to “re-weight” the control sample so that the re-weighted distribution of image matches that in the treated population. It is motivated by the fact that (18) is also equivalent to


image


which is equal to


image


Using the fact that image, this becomes


image


The second term is simply a weighted average of image for the non-treated observations using image as a weight. It is clear that this average will up-weight those individuals with image, when relatively “more” individuals with that value are among the treated than among the non-treated; when there are disproportionately fewer individuals with image, the weighted average will down-weight the observation.

By Bayes’ rule, this weight is also equal to


image


(DiNardo et al., 1996; Firpo, 2007).

Thus, in practice, a re-weighting approach will involve computing the sample analogue


image


where image is the estimated propensity score function for an individual image with image.
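A minimal sketch of the re-weighting calculation on simulated data (the logistic propensity-score model and variable names are ours): each non-treated observation receives a weight proportional to p(X)/(1 - p(X)), where p(X) is the estimated propensity score, which re-balances the non-treated covariate distribution toward that of the treated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 20_000

x = rng.normal(0, 1, (n, 2))
d = (x @ np.array([0.8, -0.5]) + rng.normal(0, 1, n)) > 0
y = 1.0 + x @ np.array([1.0, 0.5]) + 0.7 * d + rng.normal(0, 1, n)

pscore = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

# re-weight the non-treated sample so its X distribution matches the treated sample
w = pscore[~d] / (1.0 - pscore[~d])
counterfactual_mean = np.average(y[~d], weights=w)
adjusted_gap = y[d].mean() - counterfactual_mean
print(f"re-weighting estimate of the adjusted gap: {adjusted_gap:.3f}")
```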

A useful aspect of viewing the adjustment as a re-weighting problem is that one is not limited to examining only conditional expectations of image: one can re-weight the data and examine other aspects of the distribution, such as quantiles, variances, etc., by computing the desired statistic with the appropriate weight. See DiNardo et al. (1996), DiNardo and Lemieux (1997), Biewen (1999), and Firpo (2007) for discussion and applications.

5 Program Evaluation: Lessons and Challenges

This chapter provides a systematic assessment of a selection of commonly employed program evaluation approaches. We adopt a perspective that allows us to consider how the Regression Discontinuity Design—an approach that has seen a marked increase in use over the past decade—relates to other well-known research designs. In our discussion, we find it helpful to make two distinctions. One is between the descriptive goals of an ex post evaluation and the predictive goals of an ex ante evaluation. The other is between two kinds of statistical conditions needed to make causal inferences: (1) descriptions of our institutional knowledge of the program assignment process, and (2) structural assumptions—some with testable restrictions, others without—that do not come from our institutional knowledge but rather stem from conjectures and theories about individual behavior; such structural assumptions necessarily restrict the set of models of behavior within which the causal inference can be considered valid.

In our discussion, we provide three concrete illustrations of how the goals of ex post and ex ante evaluations are quite complementary. In the case of the randomized experiment with perfect compliance, highly credible estimates can be obtained for program effects for those who selected to be a participant in the study. Through the imposition of a number of structural assumptions about the nature of the economy, one can draw a precise link between the experimentally obtained treatment effect and a particular policy parameter of interest—the aggregate impact of a widespread “scaling up” of the program. In the case of the randomized experiment with imperfect compliance, one can make highly credible inferences about program effects, even if the obtained treatment effect is a weighted average. But with an additional functional form assumption (that is by no means unusual in the applied literature), one can extrapolate from a Local Average Treatment Effect to the Average Treatment Effect, which might be the “parameter of interest” in an ex ante evaluation. Finally, in the case of the RD design, one can obtain highly credible estimates of a weighted average treatment effect, which in turn can be viewed as an ingredient to an extrapolation for the Treatment on the Treated parameter.

Our other observation is that “D”-conditions and “S”-conditions are also quite complementary. On the one hand, for the designs we examine above, “D”-conditions are generally necessary (even if not sufficient) to isolate the component of variation in program status that is “as good as randomly assigned”. When they are not sufficient, “S”-conditions are needed to fill in the missing pieces of the assignment process. Furthermore, in our three illustrations, only with “S”-conditions can any progress be made to learn about other parameters of interest defined by an ex ante evaluation problem. Thus, our three examples are not meant to be definitive, but rather illustrative of how research designs dominated by “D”-conditions could supply the core ingredients to ex ante evaluations that are defined by “S”-conditions. In our view, this combination seems promising.

More importantly, what is the alternative? There is no a priori reason to expect the variation that we may be able to isolate as “effectively randomized” to be precisely the variation required to identify the effect of a particular policy proposal of interest, particularly since what is “of interest” is subjective, and researcher-dependent. 47 That is, in virtually any context—experimental or non-experimental—the effects we can obtain will not exactly match what we want. The alternative to being precise about the sub-population for whom the effects are identified is to be imprecise about it. And for conducting an ex ante evaluation, the alternative to using an extrapolation where the leading term is a highly credible experimental/quasi-experimental estimate is to abandon that estimate in favor of an extrapolation in which the leading term is an estimate with questionable or doubtful internal validity. Similarly, even if the assumptions needed for extrapolation involve structural assumptions that require an uncomfortable suspension of disbelief, the alternative to being precise in specifying those assumptions is to be imprecise about them and make unjustified generalizations, or to abandon the ex ante evaluation question entirely.

We conclude with some speculation on what could be fruitful directions for further developing strategies for the ex post evaluation problem. One observation from our discussion is that both the Sharp RD and Fuzzy RD are representations of the general self-selection problem, where agents can take actions to influence their eligibility or participation in a program. What allows identification—and indeed the potential to generate randomization from a non-experimental setting—is our knowledge of the threshold, and the observability of the “latent variable” that determines the selection. Turning that on its head, we could view all selection problems with a latent index structure as inherently Regression Discontinuity designs, but ones for which we do not perfectly observe the latent selection variable (or the cutoff). But what if we have partial institutional knowledge on the assignment process? That is, even if we don’t measure image (from Sections 3.3 and 3.4), what if we observe a reasonable proxy for image? Can that information be used?

On a related point, our presentation of various research designs has a “knife-edge” quality. The designs in Section 3 are such that image or image have point-mass distributions, or we required every individual to have a continuous density for image. When those conditions held, we argued that the effects would be highly credible, with strong, testable implications. But in Section 2.2.1, we argued that when we do not have specific knowledge about the assignment process, the designs will tend to yield more questionable inferences, because there will be a larger set of plausible alternative specifications, often with very little to guide us as to the “preferred” specification. Does a middle-ground exist? Are there situations where our knowledge of the assignment process tells us that image or image, while not distributed as a mass-point, has small variance? Might there be ways to adjust for these “minor” departures from “as good as randomized”?

Finally, another lesson from the RD design is how much is gained from actually knowing something about the treatment assignment process. It is intuitive that when one actually knows the rule that partially determines program status, and one observes the selection rule variable, that should help matters. And it is intuitive that if program assignment is a complete “black box”—as is often the case when researchers invoke a “selection on observables”/matching approach—we will be much less confident about those program effects; one ought to be a bit skeptical about strong claims to the contrary. Since most programs are at least partially governed by some eligibility rules, the question is whether there are some other aspects of those rules—that go beyond discontinuities or actual random assignment—from which we can tease out credible inferences on the programs’ causal impacts.

References

Alberto Abadie, Joshua D. Angrist, Guido Imbens. Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings. Econometrica. 2002;70(1):91-117.

Jaap H. Abbring, James J. Heckman. Econometric evaluation of social programs, Part III: Distributional treatment effects, dynamic treatment effects, dynamic discrete choice, and general equilibrium policy evaluation. J.J. Heckman, E.E. Leamer, editors. Handbook of Econometrics. Handbook of Econometrics. vol. 6. Elsevier; 2007. (Chapter 72)

Joshua D. Angrist. Lifetime earnings and the Vietnam Era draft lottery: Evidence from social security administrative records. American Economic Review. 1990;80(3):313-336.

Joshua D. Angrist. Treatment effect heterogeneity in theory and practice. Economic Journal. 2004;114(494):C52–C83.

Joshua D. Angrist, Alan B. Krueger. Empirical strategies in labor economics. Orley Ashenfelter, David Card, editors. Handbook of Labor Economics. Handbooks in Economics. vol. 3-A. New York: Elsevier Science; 1999:1277-1366. (Chapter 23)

Angrist, Joshua D., Pischke, Jörn-Steffen, The credibility revolution in empirical economics: how better research design is taking the con out of econometrics. NBER Working Papers 15794, National Bureau of Economic Research, Inc. March 2010

Joshua D. Angrist, Victor Lavy. Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics. 1999;114(2):533-575.

Joshua D. Angrist, William N. Evans. Children and their parents’ labor supply: evidence from exogenous variation in family size. American Economic Review. 1998;88(3):450-477.

Joshua D. Angrist, Guido W. Imbens, Donald B. Rubin. Identification of causal effects using instrumental variables. Journal of the American Statistical Association. 1996;91(434):444-455.

Joshua Angrist, Eric Bettinger, Michael Kremer. Long-term educational consequences of secondary school vouchers: evidence from administrative records in Colombia. American Economic Review. 2006;96(3):847-862.

Orley Ashenfelter. Estimating the effect of training programs on earnings. Review of Economics and Statistics. 1978;60(1):47-57.

Orley Ashenfelter, David Card. Time series representations of economic variables and alternative models of the labour market. Review of Economic Studies. 1982;49(5):761-781.

Orley Ashenfelter, David Card. Using the longitudinal structure of earnings to estimate the effect of training programs. Review of Economics and Statistics. 67, 1985.

Orley Ashenfelter, Mark W. Plant. Nonparametric estimates of the labor-supply effects of negative income tax programs. Journal of Labor Economics. 1990;8(1):S396–S415.

B. Barnow, G. Cain, A. Goldberger. Issues in the analysis of selectivity bias. Evaluation Studies Review Annual. 1976;5:43-59.

Biewen, Martin, Measuring the effects of socio–economic variables on the income distribution: an application to the east German transition process. Discussion Paper Series, Ruprecht–Karls–Universität, Heidelberg, Germany, March 1999

Sandra Black. Do better schools matter? parental valuation of elementary education. Quarterly Journal of Economics. 1999;114(2):577-599.

Busso, Matias, DiNardo, John, McCrary, Justin, 2008. Finite sample properties of semiparametric estimators of average treatment effects. Unpublished Working Paper, University of Michigan, Ann Arbor, MI. September 19

Busso, Matias, DiNardo, John, McCrary, Justin, 2009. New evidence on the finite sample properties of propensity score matching and reweighting estimators. Working Paper 3998, Institute for the Study of Labor (IZA). February

Donald T. Campbell, Thomas D. Cook. Quasi-Experimentation: Design and Analysis for Field Settings, first edition. Chicago: Rand McNally College Publishing Company; 1979.

David Card, Alan B. Krueger. Myth and Measurement: The New Economics of the Minimum Wage. Princeton, NJ: Princeton University Press; 1995.

David Card, Carlos Dobkin, Nicole Maestas. Does Medicare save lives? Quarterly Journal of Economics. 2009;124(2):597-636.

David Card, Carlos Dobkin, Nicole Maestas. The impact of nearly universal insurance coverage on health care utilization: evidence from medicare. American Economic Review. 2009.

T.D. Cook. “Waiting for life to arrive”: A history of the regression-discontinuity design in psychology, statistics and economics. Journal of Econometrics. 2008;142(2):636-654.

D.R. Cox. Planning of Experiments. New York: Wiley; 1958.

Angus S. Deaton. Instruments of Development: randomization in the tropics and the search for the elusive keys to development. Proceedings of the British Academy. 2008;162:123-160. Keynes Lecture, British Academy

John DiNardo, David S. Lee. Economic impacts of new unionization on private sector employers: 1984–2001. Quarterly Journal of Economics. 2004;119(4):1383-1441.

John DiNardo, Thomas Lemieux. Diverging male wage inequality in the United States and Canada, 1981–1988: do institutions explain the difference? Industrial and Labor Relations Review. 1997.

John DiNardo, Nicole Fortin, Thomas Lemieux. Labor market institutions and the distribution of wages, 1973–1993: a semi-parametric approach. Econometrica. 1996;64(5):1001-1045.

Jianqing Fan, Irene Gijbels. Local Polynomial Modelling and its Applications. New York: Chapman and Hall; 1996.

Hanming Fang, Michael Keane, Ahmed Khwaja, Martin Salm, Daniel Silverman. Testing the mechanisms of structural models: The case of the mickey mantle effect. American Economic Review. 2007;97(2):53-59.

Fernández-Villaverde, Jesús, 2009. The Econometrics of DSGE Models. Working Paper 14677, National Bureau of Economic Research. January

Erica Field. Entitled to work: urban property rights and labor supply in Peru. The Quarterly Journal of Economics. 2007;122(4):1561-1602.

Sergio Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica. 2007;75(1):259-276.

Sergio Firpo, Nicole Forin, Thomas Lemieux. Decomposition methods in economics. In: Orley Ashenfelter, David Card, editors. Handbook of Labor Economics, vol. 4A. Amsterdam: North Holland; 2011:1-102.

Sir Ronald Aylmer Fisher. Design of Experiments. Edinburgh, London: Oliver and Boyd; 1935.

Sir Ronald Aylmer Fisher. Design of Experiments, eighth ed. Edinburgh, London: Oliver and Boyd; 1966. First edition published in 1935

David A. Freedman. A note on screening regression equations. The American Statistician. 1983;37(2):152-155.

Robert Guttman. Job training partnership act: new help for the unemployed. Monthly Labor Review. 1983:3-10.

Trygve Haavelmo. The probability approach in econometrics. Econometrica. 1944;12(Supplement):iii–115.

Ian Hacking. The Logic of Statistical Inference. Cambridge: Cambridge University Press; 1965.

Jinyong Hahn. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998;66(2):315-331.

Jinyong Hahn, Petra Todd, Wilbert Van der Klaauw. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica. 2001;69(1):201-209.

Norman Hearst, Tom B. Newman, Stephen B. Hulley. Delayed effects of the military draft on mortality: a randomized natural experiment. New England Journal of Medicine. 1986;314:620-624.

James J. Heckman. Shadow prices, market wages, and labor supply. Econometrica. 1974;42(4):679-694.

James J. Heckman. The common structure of statistical models of truncation, sample selection, and limited dependent variables, and a simple estimator for such models. Annals of Economic and Social Measurement. 1976;5(4):475-492.

James J. Heckman. Dummy endogenous variables in a simultaneous equation system. Econometrica. 1978;46:931-960.

Heckman, James J., 1991. Randomization and social policy evaluation. Working Paper 107. National Bureau of Economic Research. July

James J. Heckman. Causal parameters and policy analysis in economics: a twentieth century retrospective. Quarterly Journal of Economics. 2000;115(1):45-97.

James J. Heckman. Micro data, heterogeneity, and the evaluation of public policy: nobel lecture. The Journal of Political Economy. 2001;109(4):673-748.

James J. Heckman, Bo E. Honore. The empirical content of the roy model. Econometrica. 1990;58(5):1121-1149.

James J. Heckman, Edward J. Vytlacil. Local instrumental variables. Cheng Hsiao, Kimio Morimune, James L. Powell, editors. Nonlinear Statistical Modeling: Proceedings of the Thirteenth International Symposium in Economic Theory and Econometrics: Essays in Honor of Takeshi Amemiya. International Symposia in Economic Theory and Econometrics. vol. 13. Cambridge University Press; 2001. (Chapter 1)

James J. Heckman, Edward J. Vytlacil. Policy-relevant treatment effects. The American Economic Review. 2001;91(2):107-111. Papers and Proceedings of the Hundred Thirteenth Annual Meeting of the American Economic Association

James J. Heckman, Edward J. Vytlacil. Structural equations, treatment effects, and econometric policy evaluation. Econometrica. 2005;73(3):669-738.

Heckman, James J., Vytlacil, Edward J., 2007a. Econometric evaluation of social programs, part I: causal models, structural models and econometric policy evaluation. In: Heckman, J.J., Leamer, E.E., (Eds.), Handbook of Econometrics, first ed., vol. 6B, Elsevier (Chapter 70)

James J. Heckman, Edward J. Vytlacil. Econometric evaluation of social programs, part II: Using the marginal treatment effect to organize alternative econometric estimators to evaluate social programs, and to forecast their effects in new environments. J.J. Heckman, E.E. Leamer, editors. Handbook of Econometrics. Handbook of Econometrics. vol. 6. Elsevier; 2007. (Chapter 71)

Heckman, James J., Smith, Jeffrey A., 1998. Evaluating the Welfare State. NBER Working Papers 6542. National Bureau of Economic Research, Inc

James J. Heckman, Richard Robb Jr. Alternative methods for evaluating the impact of interventions. In: James J. Heckman, Burton Singer, editors. Longitudinal Analysis of Labor Market Data. New York: Cambridge University Press, 1985.

Heckman, James J., Urzua, Sergio, 2009. Comparing IV with structural models: what simple IV can and cannot identify. Working Paper 14706, National Bureau of Economic Research. February

James J. Heckman, H. Ichimura, Petra Todd. Matching as an econometric evaluation estimator. Review of Economic Studies. 1998;65(2):261-294.

James J. Heckman, Justin L. Tobias, Edward J. Vytlacil. Four parameters of interest in the evaluation of social programs. Southern Economic Journal. 2001;68(2):210-223.

James J. Heckman, Justin L. Tobias, Edward J. Vytlacil. Simple estimators for treatment parameters in a latent variable framework. Review of Economics and Statistics. 2003;85(3):748-755.

James J. Heckman, Robert J. LaLonde, James A. Smith. The economics and econometrics of active labour market programmes. In: The Handbook of Labor Economics, vol. III. Amsterdam: North–Holland; 1999.

James J. Heckman, Sergio Urzua, Edward J. Vytlacil. Understanding instrumental variables in models with essential heterogeneity. Review of Economics and Statistics. 2006;88(3):389-432.

Keisuke Hirano, Guido Imbens, Geert Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71(4):1161-1189.

Paul W. Holland. Statistics and causal inference. Journal of the American Statistical Association. 1986;81(396):945-960.

Guido Imbens, Joshua Angrist. Identification and estimation of local average treatment effects. Econometrica. 1994;62(2):467-476.

Guido Imbens, Thomas Lemieux. Regression discontinuity designs: a guide to practice. Journal of Econometrics. 2008;142(2):615-635.

Imbens, Guido W., 2009. Better LATE than nothing: some comments on Deaton (2009) and Heckman and Urzua (2009). Working Paper 14896, National Bureau of Economic Research, April

Michael P. Keane. Structural vs. atheoretic approaches to econometrics. Journal of Econometrics. 2009. Corrected Proof

Michael P. Keane, Kenneth I. Wolpin. Exploring the usefulness of a nonrandom holdout sample for model validation: welfare effects on female behavior. International Economic Review. 2007;48(4):1351-1378.

David S. Lee. Randomized experiments from non-random selection in US house elections. Journal of Econometrics. 2008;142(2):675-697.

David S. Lee, David Card. Regression discontinuity inference with specification error. Journal of Econometrics. 2008;142(2):655-674. The regression discontinuity design: Theory and applications

Lee, David S., Lemieux, Thomas, 2009. Regression discontinuity designs in economics. Working Paper 14723. National Bureau of Economic Research, February

Erich Leo Lehmann. Testing Statistical Hypotheses. New York: John Wiley & Sons, Inc.; 1959.

Erich Leo Lehmann, Joseph Lawson Hodges Jr. Basic Concepts of Probability and Statistics. San Francisco: Holden-Day; 1964.

Thomas Lemieux, Kevin Milligan. Incentive effects of social assistance: a regression discontinuity approach. Journal of Econometrics. 2008;142(2):807-828.

Robert Lucas Jr. Econometric policy evaluation: a critique. Carnegie-Rochester Conference Series on Public Policy. 1976;1(1):19-46.

G.S. Maddala. Limited-dependant and Qualitative Variables in Econometrics. Cambridge University Press; 1983.

Alan Manning. Monopsony in Motion: Imperfect Competition in Labor Markets. Princeton, NJ: Princeton University Press; 2003.

Jacob Marschak. Economic measurements for policy and prediction. William C. Hood, Tjalling C. Koopmans, editors. Studies in Econometric Method. New York: John Wiley and Sons; 1953:1-26.

Deborah G. Mayo. Error and the growth of experimental knowledge. In: Science and Its Conceptual Foundations. Chicago: University of Chicago Press; 1996.

Brian Patrick McCall, John Joseph McCall. The Economics of Search. London, New York: Routledge; 2008.

Justin McCrary. Manipulation of the running variable in the regression discontinuity design: a density test. Journal of Econometrics. 2008;142(2):698-714.

McCrary, Justin, Royer, Heather, 2010. The effect of female education on fertility and infant health: evidence from school entry laws using exact date of birth. Unpublished Working Paper. University of California Berkeley

McFadden, Daniel, Talvitie, Antti, Associates. 1977. Demand Model Estimation and Validation, Urban Travel Demand Forecasting Project UCB-ITS-SR-77-9. The Institute of Transportation Studies, vol. V. University of California, Irvine and University of California, Berkeley. Phase 1 Final Report Series

Daniel L. Millimet, Rusty Tchernis. On the specification of propensity scores, with applications to the analysis of trade policies. Journal of Business and Economic Statistics. 2009;27(3):397-415.

Phillip Oreopoulos. Estimating average and local average treatment effects of education when compulsory schooling laws really matter. American Economic Review. 2006;96(1):152-175.

Orr, Larry, Feins, Judith D., Jacob, Robin, Beecroft, Erik, Sanbonmatsu, Lisa, Katz, Lawrence F., Liebman, Jeffrey B., Kling, Jeffrey R., 2003. Moving to opportunity interim impacts evaluation. Final Report. US Department of Housing and Urban Development

A. Pagan, A. Ullah. Nonparametric Econometrics. New York: Cambridge University Press; 1999.

James L. Powell. Estimation of semiparametric models. In: Robert Engle, Daniel McFadden, editors. Handbook of Econometrics, vol. 4. Amsterdam: North Holland, 1994.

Richard E. Quandt. The estimation of the parameters of a linear regression system obeying two separate regimes. Journal of the American Statistical Association. 1958;53(284):873-880.

Richard E. Quandt. A new approach to estimating switching regressions. Journal of the American Statistical Association. 1972;67(338):306-310.

Peter C. Reiss, Frank A. Wolak. Structural econometric modeling: rationales and examples from industrial organization. James J. Heckman, E.E. Leamer, editors. Handbook of Econometrics. Handbook of Econometrics. vol. 6. Elsevier; 2007.

Philip K. Robins. A comparison of the labor supply findings from the four negative income tax experiments. The Journal of Human Resources. 1985;20(4):567-582.

Sherwin Rosen. The theory of equalizing differences. Handbook of Labor Economics. O. Ashenfelter, R. Layard, editors. Handbook of Labor Economics. vol. 1. Elsevier; 1987:641-692. (Chapter 12)

Paul Rosenbaum, Donald Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41-55.

Mark R. Rosenzweig, Kenneth I. Wolpin. Natural ‘natural experiments’ in economics. Journal of Economic Literature. 2000;38(4):827-874.

A. Roy. Some thoughts on the distribution of earnings. Oxford Economic Papers. 1951;3(2):135-146.

Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66(5):688-701.

Donald B. Rubin. Statistics and causal inference: comment: which ifs have causal answers. Journal of the American Statistical Association. 1986;81(396):961-962.

Donald B. Rubin. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science. 1990;5(4):472-480.

Donald B. Rubin, N. Thomas. Matching using estimated propensity scores: relating theory to practice. Biometrics. 1996;52:249.

John Rust. Comments on “Structural vs. atheoretic approaches to econometrics” by Michael Keane. Journal of Econometrics. 2009. Corrected Proof

Jerzy Splawa-Neyman, D.M. Dabrowska, T.P. Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science. 1990;5(4):465-472.

J. Splawa-Neyman, K. Iwaszkiewicz, St. Kolodziejczyk. Statistical problems in agricultural experimentation. Supplement to the Journal of the Royal Statistical Society. 1935;2(2):107-180.

Christopher Taber, Eric French. Identification of models of the labor market. In: Orley Ashenfelter, David Card, editors. Handbook of Labor Economics, vol. 4A. Elsevier Science; 2011:537-617.

Tesfatsion, Leigh, 2007. Introductory notes on complex adaptive systems and agent-based computational economics. Technical Report. Department of Economics. Iowa State University, January http://www.econ.iastate.edu/classes/econ308/tesfatsion/bat1a.htm

Donald L. Thistlethwaite, Donald T. Campbell. Regression-discontinuity analysis: an alternative to the ex-post facto experiment. Journal of Educational Psychology. 1960;51:309-317.

Todd, Petra, Wolpin, Kenneth, 2006. Ex ante evaluation of social programs. PIER Working Paper Archive. Penn Institute for Economic Research. Department of Economics. University of Pennsylvania

Wilbert Van der Klaauw. Estimating the effect of financial aid offers on college enrollment: a regression-discontinuity approach. International Economic Review. 2002;43(4):1249-1287.

Wilbert Van der Klaauw. Regression-discontinuity analysis: a survey of recent developments in economics. Labour. 2008;22(2):219-245.

Paul Windrum, Giorgio Fagiolo, Alessio Moneta. Empirical validation of agent-based models: alternatives and prospects. Journal of Artificial Societies and Social Simulation. 10, 2007.

Kenneth I. Wolpin. Ex ante policy evaluation, structural estimation and model selection. American Economic Review. 2007;97(2):48-52.

1 Other recent reviews of common evaluation approaches include, for example, Heckman and Vytlacil (2007a,b) and Abbring and Heckman (2007).

2 A sampling of papers that reflects this debate would include Heckman and Vytlacil (2005), Heckman et al. (2006), Deaton (2008), Imbens (2009), Keane (2009) and Angrist and Pischke (2010).

3 In our chapter, we will say nothing about another kind of ex ante evaluation question: what would be the effects of a program that was never run in the first place, or of a qualitatively different kind of program? See the discussion in Todd and Wolpin (2006).

4 For a comprehensive discussion and review of many of these issues see the reviews of Heckman and Vytlacil (2007a,b) and Abbring and Heckman (2007).

5 “Structural models” more generally refer to a collection of stylized mathematical descriptions of behavior and the environment which are combined to produce predictions about the effects of different choices, etc. It is a very broad area, and we make no attempt to review this literature. For a tiny sample of some of the methodological discussion, see Haavelmo (1944), Marschak (1953), Lucas (1976), Ashenfelter and Card (1982), Heckman (1991), Heckman (2000), Reiss and Wolak (2007), Heckman and Vytlacil (2007a), Deaton (2008), Fernández-Villaverde (2009), Heckman and Urzua (2009), and Keane (2009). We also ignore other types of structural models including “agent based” models (Windrum et al., 2007; Tesfatsion, 2007).

6 image and image (in the potential outcomes framework) correspond to image and image.

7 Specifically, where we consider their image, image, and image as elements of our vector image.

8 Formally, image, and similarly, image.

9 From Bayes’ rule we have image, and image.

10 Cook and Campbell (1979) contains a discussion of various “threats” to internal validity.

11 See Keane (2009); Rosenzweig and Wolpin (2000) for a discussion along these lines.

12 image could be observable components of human capital, image.

13 A discussion of “low grade” experiments can be found in Keane (2009). See also Rosenzweig and Wolpin (2000).

14 While this setup has been described as “selection on observables”, “potential outcomes”, “switching regressions”, or the “Neyman-Rubin-Holland model” (Splawa-Neyman et al., 1990; Lehmann and Hodges, 1964; Quandt, 1958, 1972; Rubin, 1974; Barnow et al., 1976; Holland, 1986), to avoid confusion we will reserve the phrase “selection on observables” for the case where the investigator does not have detailed institutional knowledge of the selection process, and treat the stratified/block randomization case as a special case of simple randomization.

15 See Deaton (2008).

16 This is because image.

17 Heckman and Vytlacil (2005) make this point clearly, noting that the treatment on the treated parameter is the key ingredient to predicting the impacts of shutting down the program.

18 Without the monotonicity condition, the other point of support would be image, the latent propensity of the “defiers”.

19 Note that image.

20 Alternatively, one can view the weights as being proportional to the fraction of compliers in excess of the defiers among individuals of the same type image: image.

21 But in this example, image would have to be bounded above by image.

22 Heckman and Vytlacil (2005) and Heckman et al. (2006) correctly observe that from the perspective of the ex ante evaluation problem a singular focus on estimators without an articulated model will not, in general, be helpful in answering a question of economic interest. In an ex post evaluation, however, careful qualification of what parameters are identified from the experiment (as opposed to the parameters of more economic interest) is a desirable feature of the evaluation.

23 The exception to this is that, in some cases, our institutional knowledge may lead us to know that those assigned image are barred from receiving treatment. S9 will necessarily follow.

24 Discussions on this point can be found, for example, in Heckman and Vytlacil (2001b), Heckman (2001), and Heckman and Vytlacil (2005), as well as Heckman and Vytlacil (2007a,b) and Abbring and Heckman (2007).

25 To see this, note that image which is equal to image. We can decompose the first term into two terms to yield image. Taking the difference between this and an analogous expression for image, the first and fourth terms cancel. Dividing the result by image yields Eq. (10).

26 We chose the studies for this exercise in the following way. We searched articles mentioning “local average treatment effect(s)”, as well as articles which cite Imbens and Angrist (1994) and Angrist et al. (1996). We restricted the search to articles published in the American Economic Review, Econometrica, the Journal of Political Economy, the Quarterly Journal of Economics, or the Review of Economic Studies. From this group, we restricted our attention to studies in which both the instrument and the treatment were binary. The studies presented in the table are the ones in this group for which we were able to obtain the data, successfully replicate the results, and where computing the IV estimate without covariates did not substantially influence the results (Angrist and Evans, 1998; Abadie et al., 2002; Angrist et al., 2006; Field, 2007).

27 The obvious problem with the functional form here is that “Worked for pay” is a binary variable. One can still use the bivariate normal framework as an approximation, if image is interpreted to be the latent probability of working, which can be continuously distributed (but one has to ignore the fact that the tails of the normal necessarily extend beyond the unit interval). In this case, the “average” effect is the average effect on the underlying probability of working.

28 See Cook (2008) for an interesting history of the RD design in education research, psychology, statistics, and economics. Cook argues the resurgence of the RD design in economics is unique as it is still rarely used in other disciplines.

29 Recent surveys of the RD design in theory and practice include Lee and Lemieux (2009), Van der Klaauw (2008a), and Imbens and Lemieux (2008a).

30 Typically, one assumes that conditional on the covariates, the treatment (or instrument) is “as good as” randomly assigned.

31 For example, for the uniform density image, the weights would be identical across types, even though there would be variability in the probability of treatment driven by variability in image.

32 Additionally, any variable determined prior to image (whether or not it is an element of image) should have the same distribution on either side of the discontinuity threshold. This, too, is analogous to the case of randomized assignment.

33 See Imbens and Lemieux (2008b) and Van der Klaauw (2008b) for other surveys.

34 See Lee and Card (2008) for a discussion.

35 As an example, Powell (1994) points out that the same least squares estimator can simultaneously be viewed as the solution to a parametric, a semi-parametric, and a nonparametric estimation problem.

36 Unless the underlying function is exactly linear in the area being examined.

37 One of the reasons why typical non-parametric “methods” (e.g. local linear regression) are sometimes viewed as being superior is that the statistics yield consistent estimators. But it is important to remember that such consistency arises from an asymptotic approximation that dictates that one of the “parameters” of the statistic (i.e. the function of the sample data), namely the bandwidth, shrinks (at an appropriate rate) as the sample size increases. Thus, the consistency of the estimator is a direct result of a different notion of asymptotic behavior. If one compares the behavior of “non-parametric” statistics (e.g. local linear regression) with that of “parametric” statistics (e.g. global polynomial regression) using the same asymptotic framework (i.e. statistics are not allowed to change with the sample size), then the non-parametric method loses this superiority in terms of consistency. Depending on the true underlying function (which is unknown), the difference between the truth and the probability limit of the estimator may be larger or smaller with the “parametric” statistic.
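The following is a minimal simulation sketch, not from the original text, that illustrates the distinction above. The true regression function, the uniform kernel, the quartic specification, and the bandwidth rule are all illustrative assumptions chosen for this example. When the bandwidth shrinks with the sample size, the local linear estimate at a point converges to the truth; the fixed global polynomial instead converges to its best-approximation value, which may be closer to or farther from the truth depending on the (unknown) function.

import numpy as np

def m(x):
    # assumed "unknown" true regression function (illustrative choice)
    return np.cos(5 * x)

def local_linear_at_zero(x, y, h):
    # "Non-parametric" statistic: weighted least squares of y on (1, x) with a
    # uniform kernel of half-width h around the evaluation point x = 0;
    # the intercept estimates m(0).
    w = (np.abs(x) <= h).astype(float)
    X = np.column_stack([np.ones_like(x), x])
    XtWX = (X * w[:, None]).T @ X
    XtWy = (X * w[:, None]).T @ y
    return np.linalg.solve(XtWX, XtWy)[0]

def global_quartic_at_zero(x, y):
    # "Parametric" statistic: a global 4th-order polynomial whose form is held
    # fixed as n grows; its probability limit at 0 need not equal m(0).
    return np.polyval(np.polyfit(x, y, deg=4), 0.0)

rng = np.random.default_rng(0)
for n in [500, 5000, 50000]:
    x = rng.uniform(-1.0, 1.0, n)
    y = m(x) + rng.normal(0.0, 0.3, n)
    h = n ** (-1 / 5)  # bandwidth shrinking with the sample size
    print(n,
          round(local_linear_at_zero(x, y, h) - m(0.0), 3),  # error shrinks toward 0
          round(global_quartic_at_zero(x, y) - m(0.0), 3))   # error settles at the approximation bias

In this (hypothetical) setup the parametric statistic can have the smaller error at small sample sizes, while the shrinking-bandwidth local linear estimator eventually dominates, consistent with the point that which approach is closer to the truth depends on the unknown function.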

38 Note that image.

39 S16 has the implication that there will be no variance in image for the untreated group, which will in practice almost never be the case. The variance in changes could be accommodated by an independent, additive error term in the outcome equation.

40 In the more general discussion in Heckman and Vytlacil (2005), for example, image is assumed, at a minimum, to be independent of image and image (or of potential outcomes) given a set of observed “conditioning variables.”

41 For a discussion of several of these approaches, see Busso et al. (2008, 2009).

42 Indeed, for a “half century” the basic framework was “entirely tied to randomization based evaluations” and was “not perceived as being relevant for defining causal effects in observational studies” (Rubin, 1990).

43 Recall that image and image, which will be unequal with image non-degenerate.

44 Here, image.

45 Parallel arguments hold when image is decreasing in image and/or when image.

46 By Bayes’ rule, image.

47 Heckman and Vytlacil (2005) make the point that if the propensity score has limited support (e.g. including discrete support), marginal treatment effects cannot be identified in certain areas, and certain policy parameters of interest are also not identified.

We are grateful to Diane Alexander and Pauline Leung, who provided outstanding research assistance. We thank Orley Ashenfelter, David Card, Damon Clark, Nicole Fortin, Thomas Lemieux, Enrico Moretti, Phil Oreopoulos, Zhuan Pei, Chris Taber, Petra Todd, John Van Reenen, and Ken Wolpin for helpful suggestions, comments, and discussions.
