9 Matching

The last technique I want to introduce is propensity score matching (PSM).59 What sounds fancy is often missing from basic Stata textbooks, yet it is actually quite easy to understand and interpret, even for beginners. The next section briefly discusses the main ideas behind the method, runs an example and checks some diagnostics afterwards.

9.1 Simulating an experiment

Experiments are regarded as the gold standard in science. Most people will know about experiments from the medical sciences, where new drugs are tested and one group receives the actual drug while the other receives a placebo to assess the “real” effect of the new active component. What is at the core of experiments? The answer is randomization. People who are sampled from a larger population are randomly assigned to two groups, the treatment group and the control group. This guarantees that, on average, both groups are similar with respect to visible (e.g. age, gender) and hidden (e.g. intelligence, motivation, health) properties. When the treatment (drug) is applied to one group after randomization, we can be sure that any effect that occurs is solely due to the treatment, as no other factors can influence the result. Basically, we could do this in the social sciences as well.60 Suppose we want to research the effect of education on success in life. We sample pupils from the population and randomly put some of them in elite schools, while others have to go to not-so-fancy schools. After some years, we see how the pupils are doing. As factors like intelligence, motivation or social and financial background were similar at the start of the experiment (due to the randomization), the differences between the groups later in life can be traced back to the effect of school alone. As you may expect, parents from wealthy families in particular might not be happy when their child is randomly put in a low-quality school, which clearly underlines the ethical and economic problems of social experiments.

The idea of matching is to simulate an experiment even when only observational data are available. Basically, the method tries to find pairs in the data which are similar in visible characteristics and differ only in their treatment status. For example, when we find two girls who have the same age, intelligence and social background, but one of them went to the elite school while the other did not, then we have a matched pair. The basic problems of the method are the same as with regressions: only measured characteristics can be used as “controls”, and matching is not a magic trick that introduces new information. The main advantages over a regression are that the functional form of the relationship between treatment and outcome does not need to be specified (so the problem of “linearity” we discussed before is gone) and that the method is very close to the ideas of the counterfactual framework.

9.2 Propensity score matching

One problem in matching is that it is very hard or even impossible to find good matches when we use a lot of control variables. When we match on age, gender, education, intelligence, parental background, etc., it will be hard to find “statistical twins” in the data that have exactly the same values on all these variables. This is called the curse of dimensionality, which can be solved using propensity scores. The basic idea is to run a logistic regression which uses all control variables as independent variables, predict the probability of being in the treatment group, and then match on this score alone (which is a single number). The assumption is that people with a similar score will have similar properties. This has been proven mathematically, but the approach still has some problems that were discussed recently (King and Nielsen, 2016). Therefore, we will rely on kernel-matching, which seems quite robust against some problems that are associated with matching. Based upon recent developments, I recommend this approach (possibly in combination with exact matching) over algorithms like nearest-neighbor or caliper matching.61

The problem is that Stata does not support kernel-matching out of the box; therefore, we will use a command developed by Jann (2017) which implements a robust and fast version and comes with a lot of diagnostic options.

ssc install kmatch, replace           // install community-contributed software

Using this, we want to test the effect of being in a union on wage. As in chapter six, we will rely on NLSW88 data. Note that our dependent variable can be metric (continuous) or binary, and our treatment variable must be binary, so exactly two groups can be defined (treatment and control).62 The independent variables can have any metric, as long as you use factor-variable notation. For the example, we choose total work experience, region, age and smsa (metropolitan region) as control variables. Let’s see this in action. We open our dataset and run the command:

sysuse nlsw88, clear
kmatch ps union c.ttl_exp i.south c.age i.smsa (wage)

The explanation is as follows: kmatch is the name of the command we want to use, and ps tells Stata to perform propensity score matching. union is the treatment variable (coded 0/1), followed by four (rather arbitrarily chosen) control variables. Just as with regressions, you enter your variables using factor-variable notation. When desired, you can also include interactions or higher-order terms. The outcome variable (wage) is put in parentheses at the end of the command. Note that you can also include multiple outcomes in the same model.
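To demystify what happens in the first stage, here is a hand-rolled sketch (for illustration only; kmatch computes the scores internally, so you never need to run this yourself):

```stata
* First stage of PSM by hand: a logit model of treatment status
sysuse nlsw88, clear
logit union c.ttl_exp i.south c.age i.smsa   // model membership in the treatment group
predict pscore if e(sample), pr              // predicted probability = propensity score
bysort union: summarize pscore               // compare score distributions across groups
```

People with similar values of pscore are treated as comparable, regardless of how their individual control variables combine.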

The ATE (0.915) is displayed as a result, which is the Average Treatment Effect. This is the estimated difference between the means of the treatment and control groups for the outcome variable (wage). As the value is positive, we conclude that being in a union increases wage, on average, by about 92 cents per hour.
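In the notation of the counterfactual framework, the ATE is the expected difference between the two potential outcomes, here wage with union membership, Y(1), and wage without, Y(0):

```latex
\mathrm{ATE} = E\bigl[\,Y_i(1) - Y_i(0)\,\bigr]
```

Since only one of the two outcomes is ever observed for each person, matching estimates the missing counterfactual from the matched cases of the other group.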

Now we would like to know whether this result is statistically significant. In contrast to a regression, p-values cannot be calculated in the regular way but must be estimated using bootstrapping (which is a form of resampling). On modern computers it is reasonable to use about 500 replications for bootstrapping (the larger the number, the less variable the results). To get these we type

kmatch ps union c.ttl_exp i.south c.age i.smsa (wage), ///
   vce(bootstrap, reps(500))

This might take a few minutes to run, as a random sample is drawn 500 times and the command is repeated for each. The result shows that the p-level is below 0.05 and, therefore, the effect is statistically significant. Note that your results will probably differ from the one shown here: as resampling is based on random samples, the p-level and standard error will deviate slightly.63

When you are not interested in the ATE alone, you can also see the ATT (Average Treatment Effect on the Treated). This statistic tells us how much more people who are actually in a union earn due to their union membership. The counterpart is the ATC (Average Treatment Effect on the Control), which tells us how much more people who are not in a union would earn if they were union members. To get these effects, add the options att or atc.
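For instance, the three estimands can be requested together in a single run (a sketch; the option names follow kmatch's help file):

```stata
* Report ATE, ATT and ATC in one model
kmatch ps union c.ttl_exp i.south c.age i.smsa (wage), ate att atc
```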

Finally, it seems like a good idea to combine PSM with exact matching on certain key variables (here we choose college education). In other words, before the propensity score matching is run, Stata will match exactly, which means that people with a college education will be compared only to other people with the same level of education. The downside is that when you match exactly on a large number of variables, the number of cases used will be lower, as a perfect match cannot always be found. It is usually a good idea to include only a few binary or ordinal variables for exact matching. Note that you cannot use factor-variable notation or interactions in this option.

kmatch ps union c.ttl_exp i.south c.age i.smsa (wage), ///
   vce(bootstrap, reps(500)) att ematch(collgrad)

- Output omitted -

The option att tells Stata to report the ATT instead of the ATE, and ematch(collgrad) will combine the PSM with exact matching on the variable collgrad. It is usually a good idea to report both the ATT and the ATE later, as these are often the most interesting results.

9.3 Matching diagnostics

In contrast to linear regressions, matching has lower demands, which makes our lives easier. Yet there are two central aspects that must be checked to see whether we can trust our results (the following diagnostics refer to the model from page 135).

9.3.1 Common support

As described above, when you run a PSM, Stata will start with a logit model to calculate, for each person, the propensity score that summarizes the probability of being in the treatment group. If some variables have perfect predictability, meaning they flawlessly determine whether a person is in the treatment or the control group, PSM will not work. For example, suppose that every person from the south is in the treatment group, and persons from other regions are in the control group. Then the variable south perfectly predicts the treatment status, which is not allowed. Stated otherwise: when you divide the entire range of calculated propensity scores into an arbitrary number of groups, you need both treated and control cases within each group. This can be visualized.

The region between the two vertical bars is the region of common support. Although kmatch automatically trims away cases outside that region (for example, the people depicted by the continuous black curve to the very left), you still have to inspect this, as there might be other problems. Sometimes there are regions in the middle of the spectrum with very low values for one group, which can be problematic. Checking this will tell you how well the common support condition is fulfilled. In our case it looks fine. You can create this kind of graph automatically by typing

kmatch density           //Output omitted

Another important aspect to consider is the cases that could not be matched. Almost always there will be cases in your sample for which an adequate match could not be found; these cases will not be used for calculating the statistic of interest (for example, the ATE). This means that your overall sample and the sample used to calculate the statistic differ, which might lead to bias. To check whether this is a problem, type

kmatch cdensity

You see that the curves for the “Total” sample and the “Matched” sample are very close to each other, meaning that the sample that could be used in the matching is, on average, almost identical to the overall sample. As long as this is the case, we do not expect any bias.

9.3.2 Balancing of covariates

Remember that PSM tries to make two groups that were quite different before the match similar with respect to all independent variables in the model. It is a good idea to check whether this goal was actually achieved. The idea is simple: you inspect the means of these variables within both groups after the matching was done and see whether they are similar. As long as this is the case, you can be optimistic that your matching was successful. You can do this using either tables or graphs. We start with a simple table.

kmatch summarize

You see that the standardized difference (mean) for total work experience was 0.13 before matching and 0.02 after matching. This seems like a good result, also for the other variables. Keep in mind that the matched values should approach zero for the means and one for the ratios of the standard deviations, which are listed below. All in all these results are very positive. If you prefer graphs over tables, type
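For reference, the standardized difference reported in such balancing tables is usually computed as the difference in group means divided by a pooled standard deviation (kmatch's exact implementation may differ in details):

```latex
d = \frac{\bar{x}_{T} - \bar{x}_{C}}{\sqrt{\bigl(s_{T}^{2} + s_{C}^{2}\bigr)/2}}
```

where the subscripts T and C denote the treatment and control groups; values close to zero after matching indicate good balance.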

kmatch density ttl_exp south age smsa

You see a plot comparing the distribution of each variable before and after the matching. The more closely the two curves resemble each other after the matching, the better the balancing.

Finally, you can use Rosenbaum bounds to assess how robust your results are with respect to omitted variables. This slightly advanced technique cannot be introduced here (check out the online resources), but it is available as community-contributed software (Rosenbaum, 2002; Becker and Caliendo, 2007; DiPrete and Gangl, 2004).64 To learn how this technique can be integrated in a research paper, refer to Gebel (2009).
