CHAPTER 11

Regression Discontinuity and Differences-in-Differences

Booka tells us that two years ago, China imposed a ban on several books exported by her company. She wonders how she can estimate the impact this ban has had on her company’s profits. Prof. Metric says that we will discuss this and other issues in this chapter. Upon finishing the chapter, we should be able to:

1. Analyze the econometric method of Regression Discontinuity (RD) design.

2. Explain the conditions for difference-in-differences (DD) estimations.

3. Apply dummy and panel techniques learned in earlier chapters to DD estimations.

4. Perform RD and DD estimations using Excel.

RD Design

Worldwide, public and private institutions set rules that affect human behavior. For example, from the 1950s to the 1970s, many countries followed a 10-year school system, so children were not allowed to attend the first grade until their seventh birthday. Since this system changed to a 12-year school system, all children have been allowed to enroll in the first grade once they have reached their sixth birthday. Another example is that many Asian countries set the early retirement age at 55 for women and 60 for men, whereas in the United States it is often 62 for either sex. Women in Asia therefore have to prepare for their retirement much earlier than their American counterparts.

In addition, a natural event such as a major earthquake, tsunami, hurricane, or other natural disaster might change the pattern of consumer behavior or the overall growth of the economy for a while. These changes call for a special treatment in data analyses so that the changing effects are caught at the thresholds. This method is called RD design and should be used whenever there is a specific threshold in the data.

Sharp RD: Tohoku Earthquake and Tsunami

Sharp RD occurs when the change at the threshold is very clear. This is another application of multiple-regression inference to control for the difference. As an example, Prof. Metric tells us that the Tohoku earthquake and subsequent tsunami on March 11, 2011 resulted in huge economic and human losses to Japan. During the two and a half years after this incident, a dozen earthquakes of magnitudes greater than 7.0 struck Japan, wreaking havoc on the country’s natural resources. By September 12, 2012, the Japanese National Police Agency had confirmed that 15,883 deaths were attributed to these natural disasters. Additionally, the GDP in Japan fell dramatically in 2011 and continued to fall in 2012, which can be attributed to the losses of both physical and human capital. While historically Japan has experienced a speedy recovery from such natural disasters, this time the recovery is taking much longer.

With many Japanese residents shifting resources away from travel because of lost income, these disasters may significantly affect economic growth and development in developing countries and other nations that rely on the income generated by Japanese tourism. This is especially relevant for the Pacific Islands, which had seen high rates of tourist arrivals from Japan prior to Tohoku and may not be prepared for a sharp decrease in the demand that Japanese tourists would otherwise have provided to the local economy.

Touro tells us he has read a book on RD and decided to perform an RD using 2011 as the threshold to see the changes in tourist patterns to these islands, including Guam, where he comes from. This is his model:

image

where TOUR is the number of tourist arrivals divided by population (henceforth called the tourist ratio), DAM is the total amount of direct damage in millions of U.S. dollars, and PROM is tourist promotion expenditures. These variables are in logarithmic forms. The letter i is the country index among the Pacific islands, j is the country index for Japan, and t is the time index measured in years. EXC denotes real exchange rates and INFL is inflation. D is the dummy for 2011, which is used to see the effect that the Tohoku earthquake and tsunami had on tourist arrivals in the neighboring islands in 2011.

image

The three most important characteristics of the RD are:

  (i) The treatment variable D is a deterministic function of years.

 (ii) D is a discontinuous function because it remains unchanged until the year gets to the threshold of 2011.

(iii) The variable DAM is called the running variable, which is crucial in revealing the treatment effect in RD.

This DAM measures the possible reductions in the number of tourist arrivals to the islands due to other disasters, rather than those resulting from the Tohoku earthquake and tsunami of 2011. The remaining variables are called the control variables as usual.
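To make the role of the threshold dummy concrete, here is a minimal Python sketch on simulated data. It is not Touro's estimation: the panel layout, the stand-in for LnDAM, the coefficient values, and the assumption that D equals one from 2011 onward are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: 15 islands observed yearly from 2006 to 2015.
n_islands = 15
years = np.tile(np.arange(2006, 2016), n_islands)

# Treatment dummy D: a deterministic, discontinuous function of the year.
# Assumption: D switches on at the 2011 threshold and stays on afterwards.
D = (years >= 2011).astype(float)

# Made-up stand-in for LnDAM (log of direct damage); not real EM-DAT data.
ln_dam = rng.normal(loc=5.0, scale=1.0, size=years.size)

# Simulated log tourist ratio with an assumed jump of -0.5 at the threshold.
true_jump = -0.5
noise = rng.normal(scale=0.05, size=years.size)
ln_tour = 2.0 + 0.1 * ln_dam + true_jump * D + noise

# OLS of the outcome on a constant, LnDAM, and the threshold dummy.
X = np.column_stack([np.ones(years.size), ln_dam, D])
coef, *_ = np.linalg.lstsq(X, ln_tour, rcond=None)
jump_hat = coef[2]

print(f"estimated jump at the 2011 threshold: {jump_hat:.3f}")
```

The coefficient on D is the estimated shift in the outcome at the threshold, holding the other regressor fixed; in a real application the control variables (PROM, EXC, INFL) would enter as additional columns of X.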

Data on natural disasters in Japan were downloaded from the Emergency Events Database website (http://emdat.be) which is provided by the Center for Research Epidemiology of Disasters (CRED) and the Office of U.S. Foreign Disaster Assistance (OFDA), for the years 1953 to 2015. Data on the number of tourist arrivals come from the World Bank website (http://worldbank.org). Data on other variables are also found on the World Bank website. Since Touro wished to estimate the effect around the threshold, he chose the estimation period from 2006 to 2015.

A total of 15 small islands have made their data available to international organizations, making a total of 150 observations for estimations. There are 12 missing observations, so Touro had an unbalanced panel of 138 observations and had to use binary dummies to control for missing observations. At this point, Taila says, “I do not remember learning this method.” Prof. Metric commends her for a good memory and says that treating missing observations will be discussed in Chapter 12.

Equation (11.3) shows the effect of DAM on the tourist arrivals from Japan to the neighboring islands:

image

image

From this equation, the aggregate effect of all disasters is not statistically different from zero, as is evident in the coefficient estimate for LnDAM. However, the coefficient estimate of the 2011 dummy is statistically significant and implies that a 1 percent increase in the total damage that occurred in 2011 decreases the proportion of the tourist arrivals from Japan to the Pacific Islands by 4 percent.

RD Specifics

Data sometimes are not related to each other in a linear relationship. When this happens, the model needs modifications so that the RD tools can be employed effectively. Since the RD involves time-series data, the best solution is to sketch a time-series plot of the data to see if data exhibit a linear trend or a nonlinear one. In general, a quadratic model for time-series data can be written as:

image

A cubic polynomial can be written in a similar manner. Since these generate a new variable for each model, x* = x² or x* = x³, the models can be estimated using the linear regression techniques learned in previous chapters.
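The idea of generating a new variable and then running an ordinary linear regression can be sketched as follows. The series is made up and noise-free, so the coefficients are recovered up to rounding; the variable names are hypothetical.

```python
import numpy as np

# Time index 1, 2, ..., 30 and the generated variable x* = t squared.
t = np.arange(1, 31, dtype=float)
t_sq = t ** 2

# Hypothetical noise-free series with a known quadratic trend.
true_coef = np.array([3.0, 0.5, -0.02])
y = true_coef[0] + true_coef[1] * t + true_coef[2] * t_sq

# The quadratic model is linear in its coefficients, so OLS applies directly.
X = np.column_stack([np.ones_like(t), t, t_sq])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(coef, 4))
```

A cubic model would add one more generated column, `t ** 3`, to X in the same way.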

A modification can be introduced to see the difference in the coefficient estimates to the left and right vicinities of the threshold by interacting t with Dt. Substituting t with (t – t0), where t0 is the threshold, will allow for easy interpretation of the different effects slightly before and after the threshold. For a linear model, we can write:

image

The coefficient c1 is still the change in the treatment effect at the threshold, where t = t0. In the vicinity of the threshold, the treatment effect is c1 + d (t – t0). Prof. Metric reminds us that Equation (11.5) is just an application of the spline models introduced in Chapter 8.
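Because the typeset equation is not reproduced on this page, the following is only one specification consistent with the description above. The symbols c1, d, Dt, and t0 follow the text; the intercept a, the slope b, and the error term e_t are assumed names.

```latex
y_t = a + b\,(t - t_0) + c_1 D_t + d\,(t - t_0)\,D_t + e_t
% Difference between D_t = 1 and D_t = 0 at a given t:
%   c_1 + d\,(t - t_0), which reduces to c_1 at t = t_0.
```

Under this form, b is the slope before the threshold and b + d is the slope after it, which is why the treatment effect varies with the distance from t0.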

The model can also be modified for a nonlinear trend as follows:

image

The coefficient c1 is again the change in the treatment effect at the cutoff, where t = t0. In the vicinity of the cutoff, the treatment effect is c1 + d1 (t – t0) + d2 (t – t0)².
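As an illustration of how the centered interaction terms are built and how the treatment effect near the cutoff is read off the estimates, here is a small Python sketch. The data are simulated without noise, and the cutoff year, the sample period, and every coefficient value (including the names a, b1, b2) are assumptions made for the example.

```python
import numpy as np

# Hypothetical yearly series with a cutoff at t0 = 2003.
t = np.arange(1980, 2010, dtype=float)
t0 = 2003.0
tc = t - t0                      # centered time, (t - t0)
D = (t >= t0).astype(float)      # treatment dummy

# Assumed coefficients: c1 is the jump, d1 and d2 are the interaction terms.
a, b1, b2, c1, d1, d2 = 1.0, 0.05, 0.001, 0.8, -0.03, 0.002
y = a + b1 * tc + b2 * tc**2 + c1 * D + d1 * tc * D + d2 * tc**2 * D

# Regress y on the centered trend, the dummy, and their interactions.
X = np.column_stack([np.ones_like(tc), tc, tc**2, D, tc * D, tc**2 * D])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def treatment_effect(years_from_cutoff):
    """Estimated effect c1 + d1*(t - t0) + d2*(t - t0)^2."""
    return (coef[3]
            + coef[4] * years_from_cutoff
            + coef[5] * years_from_cutoff**2)


print(treatment_effect(0.0), treatment_effect(2.0))
```

Centering at t0 is what makes the coefficient on D directly interpretable as the effect at the cutoff; without centering, that coefficient would describe the gap at t = 0 instead.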

Fuzzy RD: Minimum Driving Age and Car Crashes

As an example of a fuzzy RD, Prof. Metric tells us that the minimum driving age (MDA) in the United States is determined by each state, which sets the MDA anywhere from 14 to 16 years old. Each state also sets certain rules for teenage drivers in a so-called “graduated licensing system,” which is designed to teach teens how to drive well by gradually increasing their driving privileges as they advance through the system. States also differ on the duration of this training period, which varies from 12 to 36 months.

According to the U.S. National Highway Traffic Safety Administration, the MDA strongly affects the accident rate, which is due to the increase in young, inexperienced drivers who have reached the MDA. In fact, the automobile accident rate, which is defined as the number of automobile accidents per 1,000 drivers, is dramatically high for young drivers of the age of 16, which is also coincidentally the usual MDA for most teen drivers, and falls sharply once they reach the age of 17. This decrease in the auto accident rate can be due to increased maturity, driving experience, or both. Figure 11.1 shows this tendency.

image

Figure 11.1 Automobile accidents per 1,000 drivers by age group in the United States

Source: National Highway Traffic Safety Administration.

In addition, the automobile accident fatality rate, which is defined as the number of automobile accident fatalities per 100,000 residents, is the highest for teens within the first six months after getting their license, according to the Insurance Institute for Highway Safety. In fact, the fatality rate per mile driven for drivers 16 to 19 years old is nearly three times that of drivers 20 years and older in the United States, and the fatality rate per mile driven is nearly twice as high for drivers 16 to 17 years old as it is for drivers 18 to 19 years old. Since there is a range of ages instead of just a single year, and this range also depends on each state’s efforts to reduce the automobile accident rate or automobile accident fatality rate, this is considered a fuzzy RD.

Using Two-Stage Least Squares (2SLS) Estimations for Fuzzy RD

Since the teenage automobile accident rate depends on a state’s graduated licensing system, there is an endogeneity problem, so the fuzzy RD requires IV regressions or 2SLS estimations.

In a fuzzy RD, the magnitude of the change at the cutoff can be estimated by introducing the following structural equation for the second-stage regression:

image

where yi is district i’s car crash rate and image is the average length of the graduated licensing system in the state. The subscript (i) was put in parentheses to remind us that district i is not included when calculating the average length of the graduated licensing system in the state. Ri is the running variable.

The reduced form equation for the first-stage regression is:

image

where Di is the treatment variable, which is excluded from the structural equation. The treatment dummy variable represents the MDA. The inclusion of this dummy in the first stage of the 2SLS implies that the MDA has no direct effect on the automobile accident rate but only an indirect impact through the state’s graduated licensing system. This is another reason for the name “fuzzy RD.”

Estimate Equation (11.8) and obtain the predicted value of the endogenous variable (the average length of the graduated licensing system) to use as an IV for the second-stage estimation of the structural Equation (11.7):

image
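The two stages can be sketched in Python on simulated data. This is a hedged illustration rather than the chapter's estimation: the variable names, the data-generating process, and all coefficient values are hypothetical, with x standing in for the endogenous licensing-system length, D for the threshold dummy used as the instrument, and R for the running variable.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Running variable centered at the cutoff, and the dummy for crossing it.
R = rng.uniform(-2.0, 2.0, size=n)
D = (R >= 0).astype(float)

# Unobserved factor that drives both x and y (the source of endogeneity).
u = rng.normal(size=n)

# Endogenous regressor: shifted by D, which has no direct effect on y.
x = 1.0 + 0.5 * R + 2.0 * D + u + rng.normal(size=n)

# Structural equation with an assumed true effect of x on y.
beta_true = -1.5
y = 3.0 + 0.3 * R + beta_true * x + 2.0 * u + rng.normal(size=n)


def ols(X, target):
    """Return OLS coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


const = np.ones(n)

# First stage: regress x on the running variable and the instrument D.
Z = np.column_stack([const, R, D])
x_hat = Z @ ols(Z, x)

# Second stage: replace x with its first-stage prediction.
beta_2sls = ols(np.column_stack([const, R, x_hat]), y)[2]

# Naive OLS for comparison; it is biased because x is correlated with u.
beta_ols = ols(np.column_stack([const, R, x]), y)[2]

print(f"2SLS: {beta_2sls:.3f}   naive OLS: {beta_ols:.3f}")
```

Running the second stage by hand gives the correct point estimate but not the correct standard errors, so a dedicated 2SLS routine is preferable for inference.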

Prof. Metric reminds us to review Chapter 7 for detailed explanations of the 2SLS estimations and then moves on to the next topic.

Differences-in-Differences

Prof. Metric tells us that when treatment and control groups move in parallel before treatment, we can identify the post-treatment effect by measuring the divergent path of the treated group from the control group, in addition to the difference caused by a time threshold. This is the DD approach in multiple regression.

Tohoku Earthquake Revisited

Booka tells us that she found an article about Guam’s effort in 2012 to increase Japanese tourism after the Tohoku Earthquake by increasing cultural tours, including folk festivals, dances, night markets and eateries. Invo wonders how this policy affected the tourist arrivals to Guam compared to those of other Pacific islands in 2012. Taila says that she has read this section and that we can use a DD model to see how Guam’s tourist strategy helped this island (Guam) improve over other Pacific islands (Others) in 2012. Prof. Metric commends her on a good idea and tells us that this case is the simplest example of a DD regression.

Booka asks, “Suppose that we calculate the two differences by hand, then which one should we do first?” Taila answers, “I guess either way is OK.” Prof. Metric commends them on their remarks and explains that it indeed does not matter. For the previous example:

image

That is, we calculate the time differences first (in parentheses), then the country difference in Equation (11.10), where dDD is the DD coefficient and the ds in parentheses are the time-difference coefficients for Guam and Others, respectively. Alternatively, we can proceed with:

image

Note that in Equation (11.11), we calculate the country differences first (in parentheses) then the time difference.
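The equivalence of the two orders in Equations (11.10) and (11.11) can be checked with a few lines of Python. The four averages below are made-up numbers chosen only for illustration; they are not estimates from the chapter.

```python
# Hypothetical average tourist ratios (made-up values, not from the text).
means = {
    ("Guam", "before"): 0.50,
    ("Guam", "after"): 0.47,
    ("Others", "before"): 0.40,
    ("Others", "after"): 0.33,
}

# Order 1: time differences first, then the country difference.
d_guam = means[("Guam", "after")] - means[("Guam", "before")]
d_others = means[("Others", "after")] - means[("Others", "before")]
dd_time_first = d_guam - d_others

# Order 2: country differences first, then the time difference.
gap_after = means[("Guam", "after")] - means[("Others", "after")]
gap_before = means[("Guam", "before")] - means[("Others", "before")]
dd_country_first = gap_after - gap_before

print(round(dd_time_first, 4), round(dd_country_first, 4))
```

Both orders are rearrangements of the same four terms, which is why the result does not depend on which difference is taken first.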

Prof. Metric asks us to go back to Equation (11.3), where the slope for all Pacific Islands in question is −0.04, implying that a 1 percent increase in the total damage that occurred in 2011 decreases the proportion of tourist arrivals to the Pacific Islands by 4 percent. Invo asks, “Is this evidence of parallel movement without treatment?” Prof. Metric says that he is correct and asks Touro whether he tried to add a dummy for Guam in his subsequent regression. Touro says he did and offers to explain the theoretical foundation of the DD method. He says that in general, the equation is written as follows:

image

where the subscript i is for any identity such as a country or a group of countries, and the subscript t is for time. X is any control variable. TREAT is the treatment variable for the treated identity; in the earlier example, it equals one for all observations from Guam and zero otherwise. POST is for the period after the treatment; in the Guam case, it equals one for all data points from 2012 onward and zero otherwise. The interaction term TREAT*POST equals one for observations in Guam from 2012 onward, and its coefficient is the DD causal effect.

Empirically, we need to add all control variables in Equation (11.1) to Equation (11.12). Touro writes his equation on the board for us to see:

image

where

Di = TREAT = 1 for all observations from Guam

Di = TREAT = 0 otherwise

Dt = POST = 1 for all data points from 2012 onward

Dt = POST = 0 otherwise

Di * Dt = TREAT*POST = 1 for Guam from 2012 onward

Di * Dt = TREAT*POST = 0 otherwise
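The dummy definitions above translate directly into a regression. The sketch below uses simulated data with assumed coefficient values, treats island 0 as a hypothetical stand-in for Guam, and omits the control variables for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical panel: 15 islands, 2006 to 2015; island 0 plays the role of Guam.
islands = np.repeat(np.arange(15), 10)
years = np.tile(np.arange(2006, 2016), 15)

TREAT = (islands == 0).astype(float)     # Di
POST = (years >= 2012).astype(float)     # Dt
INTER = TREAT * POST                     # Di * Dt

# Simulated outcome with an assumed DD effect of 0.1.
true_dd = 0.1
y = (1.0 + 0.3 * TREAT - 0.2 * POST + true_dd * INTER
     + rng.normal(scale=0.01, size=years.size))

# OLS on a constant, TREAT, POST, and the interaction term.
X = np.column_stack([np.ones(years.size), TREAT, POST, INTER])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
d_dd = coef[3]


def cell_mean(treat, post):
    """Average outcome for one treatment-by-period cell."""
    return y[(TREAT == treat) & (POST == post)].mean()


# Without controls, the interaction coefficient equals the DD of cell means.
dd_from_means = ((cell_mean(1, 1) - cell_mean(1, 0))
                 - (cell_mean(0, 1) - cell_mean(0, 0)))

print(f"regression: {d_dd:.4f}   cell means: {dd_from_means:.4f}")
```

Once control variables are added, the regression coefficient no longer equals the simple difference of cell means, which is the practical reason for using the regression form.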

Touro tells us that he found dDD = −0.02 with a p-value of 0.035. Invo asks, “Is that implying the cultural tour policy helped Guam cut the damage in half?” Touro says, “Yes, I think before treatment, the slope was −0.04. Now the slope for Guam changes to −0.02, while the slope for the other islands remains at −0.04.”

Prof. Metric praises the class for a good discussion and tells us that we can extend the model to many identities and years. For example, if Palau learned from Guam’s experience and started a cultural-tour policy in 2013, and then Fiji started one in 2014, and so on, then we can write the DD equation for all islands as

image

where the variable ISLANDki is for the 15 islands in question, and the variable YEARjt is for the years 2012 to 2017.

Relaxing DD Assumptions

Prof. Metric says that having a strictly parallel trend for all states before a treatment is difficult. It turns out that we do not need this parallel trend. If the islands have nonparallel trends before the treatment, the model in Equation (11.14) can be modified by adding a country trend variable:

image

where the variable ISLANDks*t controls for the different trend over time. As long as the regression results show a significant jump during the treatment period, the DD model predicts well for the policy effect.
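A small simulation shows why the trend interaction matters when trends are not parallel. This sketch is a simplification: it uses a single treated island with its own trend rather than one trend per island, the data are noise-free, and all numbers are assumed for illustration.

```python
import numpy as np

# Hypothetical panel: 15 islands, 2006 to 2015; island 0 is the treated one.
islands = np.repeat(np.arange(15), 10)
years = np.tile(np.arange(2006, 2016), 15)
t = (years - 2006).astype(float)         # time trend 0, 1, ..., 9

TREAT = (islands == 0).astype(float)
POST = (years >= 2012).astype(float)

# Assumed DD effect of 0.1; the treated island also has its own
# downward trend (-0.05 per year) on top of a common trend (0.02 per year).
true_dd = 0.1
y = (1.0 + 0.3 * TREAT - 0.2 * POST + true_dd * TREAT * POST
     + 0.02 * t - 0.05 * TREAT * t)

ones = np.ones_like(t)

# DD without trends: the treated island's own trend loads onto TREAT*POST.
X_no_trend = np.column_stack([ones, TREAT, POST, TREAT * POST])
dd_no_trend = np.linalg.lstsq(X_no_trend, y, rcond=None)[0][3]

# DD with a common trend and a treated-island trend.
X_trend = np.column_stack([ones, TREAT, POST, TREAT * POST, t, TREAT * t])
dd_trend = np.linalg.lstsq(X_trend, y, rcond=None)[0][3]

print(f"without trend: {dd_no_trend:.3f}   with trend: {dd_trend:.3f}")
```

In this setup the model without the trend attributes the treated island's pre-existing decline to the policy, while the model with the trend recovers the assumed effect.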

Data Analyses

RD Estimation

Taila tells us that she has read a story about another powerful earthquake that occurred in Hokkaido, Japan, on September 25, 2003. It measured 8.3 on the earthquake magnitude scale, destroyed roads, and caused severe power outages and landslides all around Hokkaido. She was surprised to learn that Japan’s imports from the three Indochina countries of Vietnam, Laos, and Cambodia increased after this incident.

Prof. Empirie praises her for a good story and tells us that this happened because Hokkaido has nearly one-fourth of Japan’s total arable land. It ranks first in the nation in the production of many agricultural products. The earthquake destroyed the infrastructure in Hokkaido, resulting in a reduction of Japan’s agricultural products and the subsequent increase in imports of agricultural products from Indochina.

Invo has data on earthquake damage and bilateral trade between Japan and the three Indochina countries for the period 1980 to 2009. He tells us to go to the Data Analysis folder and click on the file Ch11.xls, Fig. 11.2. In this file:

IMPi = Japan’s imports from the Indochina countries

GDPi = Japan’s GDP

GDPj = GDP of each of the other countries

DIST = Distance from each country to Japan

POPi = Population of Japan

POPj = Population of the other countries

DAMAGE = Disaster damage in Japan

Dt = 1 if the year is 2003 to 2009

Dt = 0 otherwise

You should perform the following steps along with us:

Go to Data, then Data Analysis; select Regression and click OK.

The Input Y Range is D1:D181, and the Input X Range is E1:K181.

Check the Labels box.

Select the Output Range button, enter M1, and click OK.

A dialog box will appear; click OK to overwrite the data.

Figure 11.2 shows the results. Cell Q23 reveals that disasters in other years did not affect Japan’s imports (p-value = 0.599), but Cell Q24 shows that there is a significant and positive jump in 2003 (p-value = 9.86 × 10⁻⁶), implying Japan’s imports from the Indochina countries increased after the Hokkaido earthquake.

image

Figure 11.2 Impact of the Hokkaido earthquake: RD results

Prof. Empirie tells us that once we know how to perform a regression for the Sharp RD, the Fuzzy RD is similar, except that 2SLS estimations are employed. Since we have already learned how to perform 2SLS estimations in the previous chapters, she will not provide a demonstration here.

DD Estimation

Booka says she has heard that the increase in exports from the Indochina countries to Japan from 2003 to 2009 was not equally distributed. While Laos and Vietnam rigorously followed export growth policies, Cambodia fell behind and in fact experienced a downward trend in its own exports to Japan from 2003 to 2009. Since Cambodia’s exports to Japan are the same as Japan’s imports from Cambodia, Prof. Empirie tells us to go to the file Ch11.xls, Fig. 11.3, in which the same dataset as in the file Ch11.xls, Fig. 11.2 is used with these additional variables:

Di = 1 if the country is Cambodia

Di = 0 otherwise

Di * Dt = 1 for Cambodia from 2003 onward

Di * Dt = 0 otherwise

We then perform the following regression:

Go to Data, then Data Analysis; select Regression and click OK.

The Input Y Range is D1:D181, and the Input X Range is E1:M181.

Check the Labels box.

Select the Output Range button, enter O1, and click OK.

A dialog box will appear; click OK to overwrite the data.

Figure 11.3 shows the results, in which Cell S26 shows that there is a significant and negative jump in 2003 (p-value = 0.0002), implying that Japan’s imports from Cambodia decreased after the Hokkaido earthquake.

image

Figure 11.3 Impact of the Hokkaido earthquake: DD results without a time trend

Touro asks, “Can we add a trend?” Prof. Empirie tells us that it is possible and that we can go to the file Ch11.xls, Fig. 11.4, in which the same dataset as in the file Ch11.xls, Fig. 11.3 is used with these additional variables:

t = 1, 2, …, 30 for the years 1980, 1981, …, 2009

Di*t = Trend line for Cambodia
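For readers who want to generate the same kinds of columns outside Excel, the dummy, interaction, and trend variables described for the Fig. 11.3 and Fig. 11.4 worksheets can be built as follows. The row layout (one row per country and year) is an assumption and may differ from the arrangement in Ch11.xls.

```python
import numpy as np

# Assumed layout: one row per country and year, 1980 to 2009.
countries = ["Cambodia", "Laos", "Vietnam"]
years = np.arange(1980, 2010)

country_col = np.repeat(countries, years.size)
year_col = np.tile(years, len(countries))

Dt = (year_col >= 2003).astype(int)             # 1 for 2003 to 2009
Di = (country_col == "Cambodia").astype(int)    # 1 for Cambodia
Di_Dt = Di * Dt                                 # 1 for Cambodia from 2003 onward
t = year_col - 1979                             # 1, 2, ..., 30
Di_t = Di * t                                   # trend line for Cambodia

# Show the rows around the 2003 cutoff for Cambodia.
for row in range(21, 25):
    print(country_col[row], year_col[row], Dt[row], Di[row],
          Di_Dt[row], t[row], Di_t[row])
```

These columns would then be placed next to the other regressors before running the regression.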

We then run the following regression:

Go to Data, then Data Analysis; select Regression and click OK.

The Input Y Range is D1:D181, and the Input X Range is E1:O181.

Check the Labels box.

Select the Output Range button, enter Q1, and click OK.

A dialog box will appear; click OK to overwrite the data.

Figure 11.4 shows a section of the results, in which Cell U28 shows that there is a significant and negative trend (p-value = 0.0449), implying that Japan’s imports from Cambodia decreased over time regardless of the Hokkaido earthquake. However, the value in Cell P26 of Figure 11.3 is much larger than that in Cell R28 of Figure 11.4, implying that Japan’s imports from Cambodia decreased even more after the Hokkaido earthquake.

image

Figure 11.4 Impact of the Hokkaido earthquake: DD results with a time trend

To conclude this chapter, Prof. Empirie reminds us to review the Instrumental Variable estimations (2SLS) in the previous chapters so that we can master the fuzzy RD with ease.

Exercises

1. Iowa has an MDA of 14. Propose a model for a sharp RD estimation of the change in quarterly death rates (per 100,000) due to motor vehicle accidents for teenage drivers before and after they turn 14, including a quarterly trend line. Write the equation, provide definitions of the variables, and explain the meaning of the variables.

2. Continue with the problem in Exercise 1, but with a slightly different structural equation:

Qa,i = α + βDa,i + λCa,i + ea,i

where Qa,i is the probability that a teen will die, Da,i is the same as in (2), and Ca,i is the probability that the teen will receive a citation from the police.

Assume that a teen’s driving behavior is endogenous and depends on their peers’ driving behavior, which is measured by citation points that they receive from the police for traffic violations.

Propose a model for a fuzzy RD estimation (using 2SLS) of the change in death rates (per 100,000) due to motor vehicle accidents for teenage drivers before and after they turn 14. Write the equation, provide definitions of the variables, and explain the meaning of the variables.

3. In 1971, Montana lowered its minimum legal drinking age (MLDA) from 21 to 19, whereas Missouri decided to keep its MLDA at 21. Propose a model for a DD estimation of the change in death rates (in 100,000) due to the MLDA for residents of the two states in question at the age of 19 during the years 1970 to 1971. Write the equation, provide definitions of the variables, and explain the meaning of the variables.

4. The file Terrorism in the folder Data for Exercises provides data on terrorism damage and tourist arrivals in Iraq, Syria, and Libya from the United States. You suspect that the increase in terrorist activities in 2008 reduced tourist arrivals in Iraq, Syria, and Libya from the United States.

a. Perform an RD with the cut point at 2008 and provide an interpretation of the results.

b. You wish to see if tourist arrivals in Iraq from the United States are any worse than those in the other two countries. Generate additional variables to perform a DD regression and provide an interpretation of the results.

5. Table 11.1 provides results of the estimation of the change in death rates (in 100,000) due to the MDA for residents 13 to 17 years old in the 50 U.S. states during 2000 to 2015. Column (1) displays the estimated results without the state trends, and Column (2) displays the estimated results with the state trends.

a. What are the similarities and disparities between the results in the two columns regarding the magnitude and significance levels?

b. What are the implications?

Table 11.1 Death rates (in 100,000) due to MDA

Variables                 Column (1)               Column (2)
                          Coefficient   p-value    Coefficient   p-value
All deaths                9.86          0.034      8.43          0.035
Motor vehicle accidents   5.19          0.041      4.13          0.043
Others                    1.45          0.038      1.02          0.033
State trends              no            no         yes           yes

Note: The state trend reflects changes in death rates common to all states such as the implementation of public smoking laws, a rise in MLDA, or an increase in vehicle safety.
