Chapter 5
Robust Inference and Estimation for Non‐spherical Errors

5.1 Robust Inference

In this chapter we focus on relaxing the hypothesis of independence and homoscedasticity of the remainder errors. Independent and identically distributed (i.i.d.) errors can seldom be taken for granted in the mostly non‐experimental contexts of econometrics. In the so‐called robust approach to model diagnostics, one relaxes the hypothesis of homoscedastic and independent errors from the beginning, and consequently uses an appropriate estimator for the parameters' covariance matrix, instead of testing for departures from sphericity after estimation, as is customary in the classical approach.

In panel data, error correlation often descends from clustering issues: the group (firm, individual, country) and the time dimension define natural clusters; observations sharing a common individual unit, or time period, are likely to share common characteristics, violating the independence assumption and potentially biasing inference. In particular, variance estimates derived under the random sampling assumption are typically biased downward, possibly leading to false significance of model parameters. Although clustering can often be an issue in cross‐sectional data too, especially when employing data at different levels of aggregation (Moulton, 1986, 1990), it is such an obvious feature in panels that a number of robust covariance estimators have been devised for the most common situations: within‐individual and/or within‐time‐period correlation, the former of either time‐constant or time‐decaying type, and cross correlation between different individuals over time.

Next to the panel‐specific implementation of the well‐known heteroscedasticity‐consistent covariance, there are a number of other robust covariance estimators specifically devised for panel data. We will now review the general idea of sandwich estimation, its application in a panel setting, and lastly the best known covariance estimators for the most common cases of nonsphericity in the errors and their implementations in plm.

5.1.1 Robust Covariance Estimators

Consider a linear model $y = X\beta + \varepsilon$ and the OLS estimator $\hat{\beta} = (X^\top X)^{-1} X^\top y$. If the error terms $\varepsilon$ are independent and identically distributed, then the estimated covariance matrix of the estimator takes the familiar textbook form: $\widehat{V}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$, where $\hat{\sigma}^2$ is an estimate of the error variance. This is the classical case, also known as spherical errors, and the corresponding formulation of $\widehat{V}(\hat{\beta})$ is often referred to as the "OLS covariance".

Let us consider robust estimation in the context of the simple linear model outlined above. The problem at hand is to estimate the covariance matrix of the OLS estimator while relaxing the assumptions of homoscedasticity and/or absence of serial correlation, without imposing any particular structure on the errors' variance or interdependence. The OLS parameters' covariance matrix with a general error covariance $E(\varepsilon \varepsilon^\top) = \Sigma$ is:

$$V(\hat{\beta}) = (X^\top X)^{-1} X^\top \Sigma X \, (X^\top X)^{-1}$$

According to the seminal work of White (1980), in order to consistently estimate $V(\hat{\beta})$, it is not necessary to estimate all the $N(N+1)/2$ unknown elements in the $N \times N$ matrix $\Sigma$ (with $N$ the sample size) but only the $K(K+1)/2$ ones in

$$S = \frac{1}{N} X^\top \Sigma X$$

which may be called the "meat" of the sandwich, the two $(X^\top X)^{-1}$ being the "bread". All that is required are pointwise consistent estimates of the errors, which is guaranteed by consistency of the estimator for $\beta$ (see Greene, 2003). In the heteroscedasticity case, correlation between different observations is ruled out, and the meat reduces to

$$S = \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2 \, x_i x_i^\top$$

where the $N$ unknown $\sigma_i^2$s can be substituted by the squared residuals $\hat{\varepsilon}_i^2$ (see White, 1980). In the serial correlation case, the natural estimation counterpart would be $\hat{\varepsilon}_i \hat{\varepsilon}_j$, but this structure proves too general to achieve convergence. Newey and West (1987) devise a heteroscedasticity‐ and autocorrelation‐consistent (HAC) estimator based on the assumption that the correlation dies out as the distance between observations increases. The Newey‐West HAC estimator for the meat takes that of White and adds a sum of covariances between the different residuals, smoothed out by a kernel function giving weights that decrease with distance:

$$S = \frac{1}{N} \sum_{t=1}^{N} \hat{\varepsilon}_t^2 \, x_t x_t^\top + \frac{1}{N} \sum_{l=1}^{L} \sum_{t=l+1}^{N} w_l \, \hat{\varepsilon}_t \hat{\varepsilon}_{t-l} \left( x_t x_{t-l}^\top + x_{t-l} x_t^\top \right)$$

with $w_l$ the weight from the kernel smoother. For the latter, Newey and West (1987) chose the well‐known Bartlett kernel function: $w_l = 1 - l/(L+1)$. The lag $L$ is usually truncated well below the sample size: one popular rule of thumb is $L \approx T^{1/4}$, with $T$ the number of time periods (see Greene, 2003; Driscoll and Kraay, 1998).
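To fix ideas, here is a minimal hand‐rolled sketch of the White and Newey‐West meats for a generic time series regression fit by lm; the data frame tsdat and its variables are hypothetical placeholders, and the 1/N scaling factors of meat and bread, which cancel out, are omitted:

## a minimal sketch, not plm's implementation
mod0 <- lm(y ~ x, data = tsdat)       # 'tsdat' is a hypothetical data frame
X <- model.matrix(mod0)
e <- resid(mod0)
n <- nrow(X)
bread <- solve(crossprod(X))          # (X'X)^{-1}
meat.white <- crossprod(X * e)        # sum_t e_t^2 x_t x_t'
L <- floor(n^(1/4))                   # rule-of-thumb truncation lag
meat.nw <- meat.white
for (l in 1:L) {
  w <- 1 - l / (L + 1)                # Bartlett weight
  G <- crossprod(X[(l + 1):n, , drop = FALSE] * e[(l + 1):n],
                 X[1:(n - l), , drop = FALSE] * e[1:(n - l)])
  meat.nw <- meat.nw + w * (G + t(G)) # lag-l covariance term, both directions
}
V.white <- bread %*% meat.white %*% bread
V.nw <- bread %*% meat.nw %*% bread   # HAC covariance of the coefficients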

In the following we will consider the extensions of this framework for a panel data setting where, thanks to added dimensionality, various combinations of the two above structures will turn out to be able to accommodate very general types of dependence.

5.1.1.1 Cluster‐robust Estimation in a Panel Setting

Clustering estimators extend the sandwich principle to panel data. Besides heteroscedasticity, the added dimensionality allows one to obtain robustness against totally unrestricted time‐wise or cross‐sectional correlation, provided this is along the "smaller" dimension. In the case of "large‐$n$" (wide) panels, the big cross‐sectional dimension allows robustness against serial correlation (Arellano, 1987); in "large‐$T$" (long) panels, conversely, robustness to cross‐sectional correlation can be attained by drawing on the large number of time periods observed. As a general rule, the estimator is asymptotic in the number of clusters.

Imposing cross‐sectional (serial) independence in fact restricts all covariances between observations belonging to different individuals (time periods) to zero, yielding an error covariance matrix that is block‐diagonal, with blocks $\Sigma_i = E(\varepsilon_i \varepsilon_i^\top)$ of the form:

$$\Sigma_i = \begin{bmatrix} \sigma_{i,11} & \sigma_{i,12} & \cdots & \sigma_{i,1T} \\ \sigma_{i,21} & \sigma_{i,22} & \cdots & \sigma_{i,2T} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{i,T1} & \sigma_{i,T2} & \cdots & \sigma_{i,TT} \end{bmatrix} \tag{5.1}$$

and the consistency relies on the cross‐sectional dimension being “large enough” with respect to the number of free covariance parameters in the diagonal blocks. The other case is symmetric.

White's heteroscedasticity‐consistent covariance matrix has been extended to clustered data by Liang and Zeger (1986) and to econometric panel data by Arellano (1987). Observations can be clustered by the individual index, which is the most popular use of this estimator and is appropriate in large, short panels because it is based on $n$‐asymptotics, or by the time index, which is based on $T$‐asymptotics and therefore appropriate for long panels. In the first case, the covariance estimator is robust against cross‐sectional heteroscedasticity and also against serial correlation of arbitrary form; in the second case, symmetrically, against time‐wise heteroscedasticity and cross‐sectional correlation. Arellano's original estimator, an instance of the first case, has the form:

$$\widehat{V}_{Cg}(\hat{\beta}) = (X^\top X)^{-1} \sum_{i=1}^{n} X_i^\top \hat{\varepsilon}_i \hat{\varepsilon}_i^\top X_i \, (X^\top X)^{-1} \tag{5.2}$$
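In plm, this estimator is the default of the vcovHC method for panel models; a minimal sketch on the EmplUK data shipped with the package (the specification is purely illustrative):

library(plm)
library(lmtest)
data("EmplUK", package = "plm")
pmod <- plm(log(emp) ~ log(wage) + log(capital), data = EmplUK,
            model = "pooling")
## Arellano standard errors, clustering by the individual (firm) index
coeftest(pmod, vcov = function(x) vcovHC(x, method = "arellano"))

The model object pmod is reused in the examples that follow.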

It is of course still feasible to rule out serial correlation and compute an estimator that is robust to heteroscedasticity only, based on the following error structure:

$$\Sigma_i = \mathrm{diag}\left( \sigma_{i,1}^2, \ldots, \sigma_{i,T}^2 \right) \tag{5.3}$$

in which case the original White estimator applies:

$$\widehat{V}_{W}(\hat{\beta}) = (X^\top X)^{-1} \sum_{i=1}^{n} \sum_{t=1}^{T} \hat{\varepsilon}_{it}^2 \, x_{it} x_{it}^\top \, (X^\top X)^{-1} \tag{5.4}$$

The case of clustering by time period is symmetric to that along the other dimension: data are assumed to be serially independent and allowed to have arbitrary heteroscedasticity and an unrestricted cross‐sectional dependence structure.

$$\widehat{V}_{Ct}(\hat{\beta}) = (X^\top X)^{-1} \sum_{t=1}^{T} X_t^\top \hat{\varepsilon}_t \hat{\varepsilon}_t^\top X_t \, (X^\top X)^{-1} \tag{5.5}$$
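With the pooled model estimated above, the time‐clustered version is obtained by switching the cluster dimension:

## time-clustered standard errors: robust to cross-sectional correlation
coeftest(pmod, vcov = function(x) vcovHC(x, cluster = "time"))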
Clustering in Non‐Panels

Clustering can occur in non‐panel settings too. Whenever a grouping index of some sort is provided and there is reason to believe that errors are dependent within the groups defined by that index, clustered standard errors can be employed to account for heteroscedasticity across groups and for within‐group correlation of any kind, not limited to proper serial correlation in time. One example is when a regression is augmented with variables at a higher level of aggregation.

The seminal example is in Moulton (1986, 1990): if some regressors are observed at group level, as is the case, e.g., when adding local GDP to individual data drawn from different geographical units, then standard errors have to be adjusted for intra‐group correlation.

Froot (1989), in the context of financial data, discusses sampling firms from different industries, assumed mutually independent. In his application, clustering is employed to account for within‐industry dependence, while it would be meaningless across the “other” dimension.

Any dataset mixing different levels of detail is prone to this issue. In such cases, panel data methods can seamlessly be employed on cross‐sectional datasets by specifying the relevant grouping variable as the first element of the index; the second one is left blank, as there is no meaningful second dimension. A minimal sketch follows.
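Here df is a hypothetical cross‐sectional data frame with a grouping variable region, an individual‐level regressor x, and a region‐level regressor gdp; all names are placeholders:

## pseudo-panel: the grouping variable is the only (first) index;
## plm generates an artificial second index internally
pdf <- pdata.frame(df, index = "region")
cmod <- plm(y ~ x + gdp, data = pdf, model = "pooling")
coeftest(cmod, vcov = vcovHC)    # clusters by region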

5.1.1.2 Double Clustering

Double clustering methods originated in the financial literature (Petersen, 2009; Cameron et al., 2011; Thompson, 2011) and are motivated by the need to account for persistent shocks (another name for individual, time‐invariant error components) and, at the same time, for cross‐sectional or spatial correlation. The former feature, persistent shocks, is usually dealt with in the econometric literature by parametric estimation of random effects models; the latter through spatial panels, where the dependence is again estimated parametrically by imposing a structure on it, or through common factor models. As Cameron et al. (2011) observe, though, double clustering, like all robustified inference of this kind, relies on much weaker assumptions about the data‐generating process than parametric modeling of dependence does. The estimator combining both individual and time clustering relies on a combination of the asymptotics of each: the minimum number of clusters along the two dimensions must go to infinity (which will be especially appropriate for data‐rich financial applications, less so in the smaller samples frequently encountered in economics). Apart from this, any dependence structure is allowed within each group or within each time period, while cross‐serial correlations between observations belonging to different groups and time periods are ruled out.

Cameron et al. (2011) have shown how the double‐clustered estimator is simply calculated by summing up the group‐clustering and the time‐clustering ones, then subtracting the standard White estimator in order to avoid double counting the error variances along the diagonal:

$$\widehat{V}_{DC}(\hat{\beta}) = \widehat{V}_{Cg}(\hat{\beta}) + \widehat{V}_{Ct}(\hat{\beta}) - \widehat{V}_{W}(\hat{\beta}) \tag{5.6}$$
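In plm, the double‐clustered covariance of equation 5.6 is computed by vcovDC:

## standard errors clustered along both the individual and time dimensions
coeftest(pmod, vcov = vcovDC)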

In order to control for the effect of common shocks, Thompson (2011) proposes to add to the sum of covariances one more term, related to the covariances between observations from any group at different points in time. Given a maximum lag $L$, this will be the sum over $l = 1, \ldots, L$ of the following generic term:

$$\widehat{V}_{Ct,l}(\hat{\beta}) = (X^\top X)^{-1} \sum_{t=l+1}^{T} \left( X_t^\top \hat{\varepsilon}_t \hat{\varepsilon}_{t-l}^\top X_{t-l} + X_{t-l}^\top \hat{\varepsilon}_{t-l} \hat{\varepsilon}_t^\top X_t \right) (X^\top X)^{-1} \tag{5.7}$$

representing the covariance between pairs of observations from any group lying $l$ periods apart in time. As the correlation between observations belonging to the same group at different points in time has already been captured by the group‐clustering term, to avoid double counting one must subtract the within‐groups part:

$$\widehat{V}_{W,l}(\hat{\beta}) = (X^\top X)^{-1} \sum_{i=1}^{n} \sum_{t=l+1}^{T} \hat{\varepsilon}_{it} \hat{\varepsilon}_{i,t-l} \left( x_{it} x_{i,t-l}^\top + x_{i,t-l} x_{it}^\top \right) (X^\top X)^{-1} \tag{5.8}$$

for each $l$. The resulting estimator

$$\widehat{V}(\hat{\beta}) = \widehat{V}_{Cg}(\hat{\beta}) + \widehat{V}_{Ct}(\hat{\beta}) - \widehat{V}_{W}(\hat{\beta}) + \sum_{l=1}^{L} \left( \widehat{V}_{Ct,l}(\hat{\beta}) - \widehat{V}_{W,l}(\hat{\beta}) \right) \tag{5.9}$$

is robust to cross‐sectional and time‐wise correlation inside, respectively, time periods and groups, and to the cross‐serial correlation between observations belonging to different groups, up to the $L$‐th lag.

5.1.1.3 Panel Newey‐West and SCC

As mentioned above, in a time series context Newey and West (1987) have proposed an estimator that is robust to serial correlation as well as to heteroscedasticity. This estimator, based on the hypothesis of the serial correlation dying out "quickly enough," takes into account the covariance between observations by weighting it through a kernel‐smoothing function, giving less weight as observations grow more distant in time, and adding it to the standard White estimator.

A panel version of the original Newey‐West estimator can be obtained as:

$$\widehat{V}_{NW}(\hat{\beta}) = \widehat{V}_{W}(\hat{\beta}) + \sum_{l=1}^{L} w_l \, \widehat{V}_{W,l}(\hat{\beta}) \tag{5.10}$$

As can readily be seen, the Newey‐West non‐parametric estimator closely resembles the double clustering plus lags, the difference being that, instead of adding a (possibly truncated) sum of unweighted lag terms, the Newey‐West estimator downweighs the correlation between "distant" terms through a kernel‐smoothing function.

Driscoll and Kraay (1998) have adapted the Newey‐West estimator to a panel time series context where not only serial correlation between residuals from the same individual in different time periods is taken into account but also cross‐serial correlation between different individuals in different times and, within the same period, cross‐sectional correlation (see also Arellano, 2003).

The Driscoll and Kraay estimator, labeled SCC (as in "spatial correlation consistent"), is defined as the time‐clustering version of Arellano plus a sum of lagged covariance terms, weighted by a distance‐decreasing kernel function $w_l$:

$$\widehat{V}_{SCC}(\hat{\beta}) = \widehat{V}_{Ct}(\hat{\beta}) + \sum_{l=1}^{L} w_l \, \widehat{V}_{Ct,l}(\hat{\beta}) \tag{5.11}$$

The "SCC" covariance estimator requires the data to be a mixing sequence, i.e., roughly speaking, to have serial and cross‐serial dependence dying out quickly enough along the time dimension, which is therefore supposed to be fairly large: Driscoll and Kraay (1998), based on Monte Carlo simulation, put the practical minimum at around 20-25 time periods; the cross‐sectional dimension is irrelevant in this respect and is allowed to grow at any rate relative to $T$.

As is apparent from equation (5.11), if the maximum lag order is set to 0 (no serial or cross‐serial dependence is allowed), the SCC estimator becomes the cross‐section version (time‐clustering) of the Arellano estimator, $\widehat{V}_{Ct}$. On the other hand, if the cross‐serial terms are all unweighted (i.e., if $w_l = 1$ for all $l$), then the SCC estimator coincides with the time‐clustering‐plus‐shocks estimator $\widehat{V}_{Ct} + \sum_{l=1}^{L} \widehat{V}_{Ct,l}$.
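In plm, the estimator is computed by vcovSCC; the 4‐lag window below is an illustrative choice:

## Driscoll-Kraay standard errors with a Bartlett kernel, truncated at lag 4
coeftest(pmod, vcov = function(x) vcovSCC(x, maxlag = 4))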

A Comprehensive Definition

Let us now look systematically at the similarities between the above formulas, embedding them into an encompassing one (see Millo, 2017b). A comprehensive formulation can be written in terms of White's heteroscedasticity‐consistent covariance matrix $\widehat{V}_{W}$, the group‐clustering and time‐clustering ones $\widehat{V}_{Cg}$ and $\widehat{V}_{Ct}$, and an appropriate kernel‐weighted sum of the lagged terms $\widehat{V}_{W,l}$ and $\widehat{V}_{Ct,l}$:

$$\widehat{V}(\hat{\beta}) = c_g \, \widehat{V}_{Cg} + c_t \, \widehat{V}_{Ct} + c_W \, \widehat{V}_{W} + \sum_{l=1}^{L} w_l \left( c_{t,l} \, \widehat{V}_{Ct,l} + c_{W,l} \, \widehat{V}_{W,l} \right) \tag{5.12}$$

where the coefficients $c \in \{-1, 0, 1\}$ and the kernel weights $w_l$ select the particular estimator.

The different estimators are in turn particularizations of the above and can be expressed in terms of the same basic common components, as shown in Table 5.1. A function vcovG computing any one of the basic building blocks $\widehat{V}_{W,l}$, $\widehat{V}_{Cg}$, or $\widehat{V}_{Ct,l}$ is provided at user level, mainly for educational purposes, and is used internally to construct all the other estimators (a short sketch of its interface follows Table 5.1).1

Table 5.1 Covariance structures as combinations of the basic building blocks.

double‐clustering: $\widehat{V}_{Cg} + \widehat{V}_{Ct} - \widehat{V}_{W}$
time‐clustering + shocks: $\widehat{V}_{Ct} + \sum_{l=1}^{L} \widehat{V}_{Ct,l}$
panel Newey‐West: $\widehat{V}_{W} + \sum_{l=1}^{L} w_l \, \widehat{V}_{W,l}$
Driscoll and Kraay's SCC: $\widehat{V}_{Ct} + \sum_{l=1}^{L} w_l \, \widehat{V}_{Ct,l}$
double‐clustering + shocks: $\widehat{V}_{Cg} + \widehat{V}_{Ct} - \widehat{V}_{W} + \sum_{l=1}^{L} \left( \widehat{V}_{Ct,l} - \widehat{V}_{W,l} \right)$
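A minimal sketch of vcovG's building‐block interface: cluster selects the clustering dimension, l the lag, and inner switches between the full clustering term ("cluster") and the White‐type, same‐individual one ("white"):

vcovG(pmod, cluster = "group", l = 0, inner = "cluster")  # V_Cg (Arellano)
vcovG(pmod, cluster = "time",  l = 0, inner = "cluster")  # V_Ct
vcovG(pmod, cluster = "time",  l = 2, inner = "cluster")  # V_Ct,2 (lag 2)
vcovG(pmod, cluster = "time",  l = 2, inner = "white")    # V_W,2 (lag 2)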

Higher‐level functions are provided to produce the double‐clustering and kernel‐smoothing estimators by (possibly weighted) sums of the former terms. The general tool in this respect, in turn based on vcovG, is vcovSCC, which computes weighted sums of $\widehat{V}_{Ct,l}$ according to a weighting function that is, by default, the Bartlett kernel. The default values will yield the Driscoll and Kraay estimator $\widehat{V}_{SCC}$. As the SCC estimator differs from the (one‐way) time‐shocks‐robust version of the double clustering à la Cameron et al. (2011) only by the distance‐decaying weighting of the covariances between different periods, no weighting (equivalent to passing a constant 1 as the weighting function: wj = function(j, maxlag) 1) will produce the building blocks for double clustering, according to formula 5.9.

Convenient wrappers are provided as the tools of choice for the end user: vcovNW computes the panel Newey‐West estimator $\widehat{V}_{NW}$; vcovDC the double‐clustering one $\widehat{V}_{DC}$.

5.1.2 Generic Sandwich Estimators and Panel Models

plm provides a comprehensive set of modular tools: lower‐level components, conceptually corresponding to the statistical "objects" involved (see Zeileis, 2006a,b), and a higher‐level set of "wrapper functions" corresponding to standard parameter covariance estimators as they would be used in statistical packages, which work by combining the same few lower‐level components in multiple ways, in the spirit of the Lego principle of Hothorn et al. (2006).

When estimating regression models, R creates a model object that, together with the estimation results, carries a wealth of useful information, including the original data. Robust testing in R is done by retrieving the necessary elements from the model object, using them to calculate a robust covariance matrix for the coefficient estimates, and then feeding the latter to the actual test function, for example a t‐test for significance or a Wald restriction test. This approach to diagnostic testing is more flexible than in standard econometric software packages, where diagnostics usually come with the standard output. In our case, for example, one can obtain different estimates of the standard errors under various kinds of dependence without re‐estimating the model, and present them compactly.

Robust covariance estimators à la White or à la Newey and West for different kinds of regression models are available in package sandwich (Lumley and Zeileis, 2007) in the form of appropriate methods for the generic functions vcovHC and vcovHAC (Zeileis, 2004, 2006a). These are designed for data sampled along one dimension; therefore, they cannot generally be used for panel data. Yet they provide a uniform and flexible software approach, which has become standard in the R environment. The corresponding plm methods described in this chapter have therefore been designed to be syntactically compliant with them.

For example, a vcovHC.plm method for the generic vcovHC is available, allowing one to apply sandwich estimators to panel models in a way that is natural for users of the sandwich package. In fact, despite the different structure "under the hood," the user will, e.g., specify a robust covariance for the diagnostics table of a panel model the same way she would for a linear or a generalized linear model, the object‐orientation features of R taking care that the right statistical procedure be applied to the model object at hand. What will change, though, are the defaults: the vcovHC.lm method defaults to the original White estimator, while vcovHC.plm defaults to clustering by group, both being the most obvious choices for the object at hand.
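The following sketch contrasts the two defaults on the same (illustrative) specification:

library(sandwich)
lmod <- lm(log(emp) ~ log(wage) + log(capital), data = EmplUK)
vcovHC(lmod)    # lm method: White-type, no clustering
vcovHC(pmod)    # plm method: Arellano clustering by group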

Next to the HC estimator of White (1980), all variants of the panel‐specific estimators used in applied practice (Arellano, 1987; Newey and West, 1987; Driscoll and Kraay, 1998; Cameron et al., 2011) are provided; all can be applied to objects representing panel models of different kinds: FE, RE, FD, and, obviously, pooled OLS. The estimate of the parameters' covariance thus obtained can in turn be plugged into diagnostic testing functions, producing either significance tables or hypothesis tests. A function is a regular object type in R; hence compact comparisons of standard errors from different (statistical) methods can be produced by looping on covariance types, as shown below.
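A minimal sketch of such a comparison for the pooled model estimated earlier:

## standard errors under different covariance estimators, side by side
vcovs <- list(OLS = vcov, Arellano = vcovHC, NW = vcovNW,
              DC = vcovDC, SCC = vcovSCC)
sapply(vcovs, function(v) sqrt(diag(v(pmod))))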

Application to Models on Transformed Data

The application of the above estimators to pooled data is always warranted, subject to the relevant assumptions mentioned before. In some, but not all, cases they can also be applied to random or fixed effects panel models, or to models estimated on first‐differenced data. In all of these cases the estimator is computed as OLS on transformed (partially or fully demeaned, first‐differenced) data. In general, the same transformation used in estimation is employed. Sandwich estimators can then be computed by applying the usual formulas to the transformed data and residuals, e.g., for group clustering, $\widehat{V}(\hat{\beta}) = (\tilde{X}^\top \tilde{X})^{-1} \sum_{i=1}^{n} \tilde{X}_i^\top \hat{\tilde{\varepsilon}}_i \hat{\tilde{\varepsilon}}_i^\top \tilde{X}_i \, (\tilde{X}^\top \tilde{X})^{-1}$, where tildes denote the transformed data and residuals (see Arellano (1987) and Wooldridge (2002, Eq. 10.59) for the fixed effects case, Wooldridge (2002, Ch. 10) in general).

Under the fixed effects hypothesis, the OLS estimator is biased and FE is required for consistency of parameter estimates in the first place. Similarly, under the hypothesis of a unit root in the errors, first differencing the data is warranted in order to revert to a stationary error term. On the contrary, under the random effects hypothesis, OLS is still consistent, and asymptotically, using RE instead makes no difference. Yet for the sake of parameter covariance estimation, it may be advisable to eliminate time‐invariant heterogeneity first, by using one of the above.

One compelling reason for combining a demeaning or a differencing estimator with robust standard errors may be to get rid of persistent individual effects before applying a more parsimonious and efficient kernel‐based covariance estimator, if cross‐serial correlation is suspected or if the sample is simply not big enough to allow double clustering. In fact, as Petersen (2009) shows, the Newey‐West‐type estimators are biased if effects are persistent, because the kernel smoother unduly downweighs the covariance between faraway observations.

In the following we discuss when it is appropriate to apply clustering estimators to the residuals of demeaned or first‐differenced models.

Fixed Effects

The fixed effects estimator requires particular caution. In fact, under the hypothesis of spherical errors in the original model, the time‐demeaning of the data induces a serial correlation $\mathrm{cor}(\hat{\tilde{\varepsilon}}_{it}, \hat{\tilde{\varepsilon}}_{is}) = -1/(T-1)$, $t \neq s$, in the demeaned residuals (see Wooldridge, 2002, p. 310).

The White‐Arellano estimator has originally been devised for this case. By way of symmetry, it can be used for time‐clustered data with time fixed effects. The combination of group clustering with time fixed effects and the reverse is inappropriate because of the serial (cross‐sectional) correlation induced by the time‐ (cross‐sectional‐) demeaning.

By analogy, the Newey‐West‐type estimators can be safely applied to models with individual fixed effects, while the time and two‐way cases require caution. The best policy in both cases, if the degrees of freedom allow, is perhaps to explicitly add dummy variables to account for the fixed effects along the “short” dimension.
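For instance, group clustering remains appropriate on a model with individual fixed effects:

femod <- plm(log(emp) ~ log(wage) + log(capital), data = EmplUK,
             model = "within")
coeftest(femod, vcov = vcovHC)    # Arellano clustering on within residuals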

Random Effects

In the random effects case, as Wooldridge (2002) notes, the quasi‐time demeaning procedure removes the random effects, reducing the model on transformed data to a pooled regression and thus preserving the properties of the White‐type estimators. By extension of this line of reasoning, all of the above estimators are applicable to the demeaned data of a random effects model, provided the transformed errors meet the relevant assumptions.

First‐Differences

First‐differencing, like fixed effects estimation, removes time‐invariant effects. Roughly speaking, the choice between the two rests on the properties of the error term: if it is assumed to be well behaved in the original data, then FE is the most efficient estimator and is to be preferred; if, on the contrary, the original errors are believed to behave as a random walk, then first‐differencing the data will yield stationary and uncorrelated errors and is therefore advisable (see Wooldridge, 2002, p. 317). Given this, FD estimation is nothing else than OLS on differenced data, and the usual clustering formula applies (see Wooldridge, 2002, p. 318, and the treatment of the FD estimator earlier in this book). As in the RE case, the statistical properties of the different covariance estimators will depend on whether the transformed errors meet the relevant assumptions.
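A minimal example of clustering on first‐differenced data:

fdmod <- plm(log(emp) ~ log(wage) + log(capital), data = EmplUK,
             model = "fd")
coeftest(fdmod, vcov = vcovHC)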

5.1.2.1 Panel Corrected Standard Errors

Unconditional covariance estimators are based on the assumption of no error correlation in time (cross‐section) and of an unrestricted but invariant correlation structure inside every cross section (time period).3 They are popular in contexts characterized by relatively small samples with prevalence of the time dimension. The most common use is on pooled time series, where the assumption of no serial correlation can be accommodated, for example, by adding lagged values of the dependent variable.

Beck and Katz (1995), in the context of political science models with moderate time and cross‐sectional dimensions, introduced the so‐called panel corrected standard errors (PCSE), which, in the original time‐clustering setting, are robust against cross‐sectional heteroscedasticity and correlation. The "PCSE" covariance is based on the hypothesis that the covariance matrix of the errors be the same in every group, here the cross section of observations at each time period: $E(\varepsilon_t \varepsilon_t^\top) = \Sigma_c$, with

$$\Sigma_c = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix} \tag{5.13}$$

so that the generic element $\sigma_{ij}$ can be estimated by:

$$\hat{\sigma}_{ij} = \frac{1}{T} \sum_{t=1}^{T} \hat{\varepsilon}_{it} \hat{\varepsilon}_{jt}$$

from which $\hat{\Sigma}_c$ can be constructed and inserted into the usual "sandwich" formula.
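In plm, the Beck and Katz covariance is computed by vcovBK; clustering by time reproduces the original proposal:

## panel corrected standard errors (PCSE)
coeftest(pmod, vcov = function(x) vcovBK(x, cluster = "time"))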

5.1.3 Robust Testing of Linear Hypotheses

The main use of robust covariance estimators is together with testing functions from the lmtest (Zeileis and Hothorn, 2002) and car (Fox and Weisberg, 2011) packages. We have seen the special case of testing single exclusion restrictions through coeftest: in order of increasing generality, joint restrictions can be tested through waldtest, while linearHypothesis from package car enables testing a general linear hypothesis on model parameters.

For some tests, e.g., for multiple model comparisons by waldtest, the covariance must be supplied as a function rather than as a matrix, because it has to be recomputed for each of the models being compared.
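A minimal sketch of both uses on the pooled model above (the restrictions are illustrative):

library(car)
## joint restriction test, robust covariance passed as a function
waldtest(pmod, . ~ . - log(capital), vcov = vcovHC)
## general linear hypothesis with the same robust covariance
linearHypothesis(pmod, "log(wage) = 0", vcov. = vcovHC)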

5.1.3.1 An Application: Robust Hausman Testing

Besides the usual quadratic form, Hausman's specification test can be performed in an equivalent form based on testing a linear restriction in an auxiliary linear model. In particular, it can be computed through an artificial regression of the quasi‐demeaned response on the quasi‐demeaned regressors from the random effects model, augmented with the fully demeaned regressors from the within model:

$$\tilde{y}_{it} = \tilde{x}_{it}^\top \beta + \ddot{x}_{it}^\top \gamma + u_{it}$$

where tildes denote quasi‐demeaning (the random effects transformation) and double dots full time‐demeaning (the within transformation).

The Hausman test is then the redundancy test on the demeaned regressors, i.e., the restriction test $H_0: \gamma = 0$. This artificial regression version of the test can easily be robustified (see Wooldridge, 2002) by using a robust covariance matrix.
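In plm, this robustified, auxiliary‐regression version is available through phtest with method = "aux" and a vcov argument:

## regression-based Hausman test with Arellano-robust covariance
phtest(log(emp) ~ log(wage) + log(capital), data = EmplUK,
       method = "aux", vcov = vcovHC)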

5.2 Unrestricted Generalized Least Squares

If the data‐generating process is:

$$y = X\beta + \varepsilon$$

and the errors' covariance matrix $E(\varepsilon \varepsilon^\top) = \Sigma$ has a general structure, ordinary least squares estimates of $\beta$ are inefficient, though consistent. By Aitken's theorem (see, e.g., Greene (2003), 10.5), generalized least squares (GLS) is the efficient estimator for the model parameters if $\Sigma$ is known. The estimator is then

$$\hat{\beta}_{GLS} = \left( X^\top \Sigma^{-1} X \right)^{-1} X^\top \Sigma^{-1} y$$

Various feasible GLS procedures exist, drawing on consistent estimators of $\Sigma$ that are then plugged into the GLS formula. The key to obtaining a consistent estimate of $\Sigma$ is, in general, to specify enough structure to faithfully represent its characteristics while keeping the number of parameters to be estimated at a manageable level.

In the standard one‐way error components model, as already seen, the disturbance term may be written as $\varepsilon_{it} = \eta_i + \nu_{it}$, where $\eta_i$ denotes the (time‐invariant) individual‐specific effect and $\nu_{it}$ the idiosyncratic error. Observations on the same individual $i$ share the same $\eta_i$; hence the corresponding errors are autocorrelated. The random effects structure is a very parsimonious way to account for individual heterogeneity, and it can be extended in various directions, e.g., by specifying an autoregressive process in space and/or time for the idiosyncratic component $\nu_{it}$.

Under the random effects specification, the variance‐covariance matrix of the errors $E(\varepsilon \varepsilon^\top) = \Sigma$ is block‐diagonal with $\Sigma = I_n \otimes \Omega$, where

$$\Omega = \sigma_\eta^2 J_T + \sigma_\nu^2 I_T$$

$J_T$ being a $T \times T$ matrix of ones and $I_T$ the identity matrix of order $T$.

The above is the standard specification of random effects panels, described in the previous chapters. It parsimoniously describes the error covariance by means of just two parameters and is, therefore, of very general applicability as far as sample sizes are concerned. In panels with one dimension much larger than the other (typically, large and short panels), a less restrictive approach is possible, termed general GLS (Wooldridge, 2002, 10.4.3), which allows for arbitrary within‐individual heteroscedasticity and serial correlation of the errors, i.e., inside the $T \times T$ covariance submatrices, provided that these remain the same for every individual.

5.2.1 General Feasible Generalized Least Squares

If one assumes $E(\varepsilon_i \varepsilon_i^\top | X_i) = \Omega$ but leaves the structure of $\Omega$ completely free, except for the obvious requisites of being symmetric and positive definite:

$$\Omega = \begin{bmatrix} \omega_{11} & \omega_{12} & \cdots & \omega_{1T} \\ \omega_{21} & \omega_{22} & \cdots & \omega_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{T1} & \omega_{T2} & \cdots & \omega_{TT} \end{bmatrix}$$

individual errors can evolve through time with an unlimited amount of heteroscedasticity and autocorrelation, but they are assumed to be uncorrelated in the cross section, and this structure is assumed constant across individuals. Under this assumption, the elements $\omega_{ts}$ of $\Omega$ can be estimated drawing on the cross‐sectional dimension, using the average over individuals of the outer products of the residuals from a consistent estimator:

$$\hat{\Omega} = \frac{1}{n} \sum_{i=1}^{n} \hat{\varepsilon}_i \hat{\varepsilon}_i^\top$$

where $\hat{\varepsilon}_i$ is the subvector of OLS residuals for individual $i$.
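A hand‐rolled sketch of $\hat{\Omega}$ on the balanced Grunfeld data shipped with plm (residuals are stacked by individual, then by time):

data("Grunfeld", package = "plm")
gmod <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
E <- matrix(as.numeric(resid(gmod)), nrow = 20)   # T = 20 rows, n = 10 firms
Omega.hat <- tcrossprod(E) / ncol(E)              # (1/n) sum_i e_i e_i'
dim(Omega.hat)                                    # 20 x 20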

This estimator is called general feasible GLS, or GGLS; it is sometimes referred to as the Parks (1967) estimator and is, as observed by Driscoll and Kraay (1998), a variant of the SUR estimator of Zellner (1962). Greene (2003) presents the same estimator in the context of pooled time series, with fixed $n$ and "large" $T$.

Leaving the intra‐group error covariance parameters completely free to vary is an attractive strategy, provided that $n \gg T$, because the number of variance parameters to be estimated from $nT$ data points is $T(T+1)/2$ (Wooldridge, 2002). This is a typical situation in micro‐panels such as, e.g., household income surveys, where $n$ is in the thousands but $T$ is typically quite short, so that even when estimating an unrestricted covariance, many degrees of freedom are still available.

The original applications have instead been in the field of pooled time series, aimed at accounting for cross‐sectional correlation and heteroscedasticity. In this context, Driscoll and Kraay (1998) observe how the lack of degrees of freedom in estimating the error covariance leads to near‐singular estimates and hence to downward‐biased standard errors, thus overestimating parameter significance. Beck and Katz (1995) also discuss some severe biases of this estimator in small samples. Both start from a pooled time series, $T$‐asymptotic approach, and both are interested in robustness along the cross‐sectional dimension. In this light, most of the criticism this estimator has been subject to depends on the peculiar field of application, especially in Beck and Katz (1995) and references therein (Alvarez et al., 1991), where it is applied to political science data with very modest sample sizes; but recent simulations by Chen et al. (2009) show that even in such situations FGLS can be more efficient than the proposed alternative (OLS with PCSE standard errors).

The GGLS principle can be applied to various situations, consistent with different views on heterogeneity (random vs fixed effects hypothesis) or stationarity (e.g., to a model in first differences). That translates into either applying the unrestricted GLS estimator directly to the observed data or to a transformation thereof.

This framework allows the error covariance structure inside every group of observations (if effect is set to 'individual') to be fully unrestricted and is therefore robust against any type of intra‐group heteroscedasticity and serial correlation. This structure is, conversely, assumed identical across groups, so that general FGLS is inefficient under group‐wise heteroscedasticity. Cross‐sectional correlation is excluded a priori.

In a pooled time series context (effect set to 'time'), symmetrically, this estimator is able to account for arbitrary cross‐sectional correlation, provided the latter is time‐invariant (see Greene, 2003, 13.9.1-2, pp. 321-322). In this case, serial correlation has to be assumed away, and the estimator is consistent along the time dimension, keeping $n$ fixed.

5.2.1.1 Pooled GGLS

Under the specification described at the beginning of this section, the residuals can be consistently estimated by OLS and then used to estimate $\Omega$ as above. Using $\hat{\Omega}$, the FGLS estimator is:

$$\hat{\beta}_{GGLS} = \left( \sum_{i=1}^{n} X_i^\top \hat{\Omega}^{-1} X_i \right)^{-1} \sum_{i=1}^{n} X_i^\top \hat{\Omega}^{-1} y_i$$

The estimated individual submatrix $\hat{\Omega}$ will give an assessment of the structure, if any, of the errors' covariance, which may guide one towards more parsimonious specifications like the RE one (if all diagonal and, respectively, all off‐diagonal elements are of similar magnitude) or possibly an AR(1) specification, if the off‐diagonal elements become smaller with the distance from the main diagonal.

In this small‐$T$, large‐$n$ context, one will often want to include time fixed effects to mitigate cross‐sectional correlation, which this approach assumes away in the residuals.

The function pggls estimates general FGLS models, either with or without fixed effects, or on first‐differenced data. In the following, we illustrate it on the EmplUK data.
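A minimal pooled‐GGLS call; the sigma component of the fitted object holds the estimated intra‐group covariance $\hat{\Omega}$:

ggmod <- pggls(log(emp) ~ log(wage) + log(capital), data = EmplUK,
               model = "pooling")
summary(ggmod)
round(ggmod$sigma, 2)    # inspect the estimated error structure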

5.2.1.2 Fixed Effects GLS

If individual heterogeneity is present but we do not trust the random effects assumption, and moreover the remainder errors are expected to show heteroscedasticity and serial correlation, the FE estimator can be employed together with a robust covariance matrix; but if the cross‐sectional dimension is sufficient and the assumption of a constant covariance matrix across individuals is realistic, then applying the GGLS method to time‐demeaned data provides a more efficient alternative, called the fixed effects GLS (FEGLS) estimator (Wooldridge, 2002, 10.5.5).5

The errors' covariance submatrix for each individual is now:

$$\hat{\Omega}_{FE} = \frac{1}{n} \sum_{i=1}^{n} \hat{\ddot{\varepsilon}}_i \hat{\ddot{\varepsilon}}_i^\top$$

where $\hat{\ddot{\varepsilon}}_i$ is the subvector of FE (within) residuals for individual $i$. Using $\hat{\Omega}_{FE}$ and the within‐transformed data, the FEGLS estimator is:

$$\hat{\beta}_{FEGLS} = \left( \sum_{i=1}^{n} \ddot{X}_i^\top \hat{\Omega}_{FE}^{-1} \ddot{X}_i \right)^{-1} \sum_{i=1}^{n} \ddot{X}_i^\top \hat{\Omega}_{FE}^{-1} \ddot{y}_i$$

This estimator, originally due to Kiefer (1980), takes care both of the serial correlation present in the original errors $\varepsilon$ and, implicitly, of that induced by the demeaning. For this reason, being a combination of the two, the estimated $\hat{\Omega}_{FE}$ no longer gives a direct assessment of the original error structure.
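In plm, FEGLS corresponds to pggls with model = "within":

feglsmod <- pggls(log(emp) ~ log(wage) + log(capital), data = EmplUK,
                  model = "within")
summary(feglsmod)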

5.2.1.3 First Difference GLS

Analogously, the GGLS principle can be applied to data in first differences, in the very same way as for FEGLS, giving rise to the first difference GLS (FDGLS) estimator (Wooldridge, 2002, p. 320).

In this case, the errors' covariance submatrix for an individual is:

$$\hat{\Omega}_{FD} = \frac{1}{n} \sum_{i=1}^{n} \Delta\hat{\varepsilon}_i \Delta\hat{\varepsilon}_i^\top$$

where $\Delta\hat{\varepsilon}_i$ is the subvector of FD residuals for individual $i$. Using $\hat{\Omega}_{FD}$ and the differenced data, the FDGLS estimator is:

$$\hat{\beta}_{FDGLS} = \left( \sum_{i=1}^{n} \Delta X_i^\top \hat{\Omega}_{FD}^{-1} \Delta X_i \right)^{-1} \sum_{i=1}^{n} \Delta X_i^\top \hat{\Omega}_{FD}^{-1} \Delta y_i$$

First differencing eliminates time‐invariant unobserved heterogeneity, as does the within transformation; one difference is that a time period is now lost for each individual. FD is to be preferred to FE when the original data are likely to be nonstationary, because then the FD‐transformed residuals will be stationary. Again, the elements of $\hat{\Omega}_{FD}$ do not directly represent the correlation structure of the residuals, because of the correlation induced by first differencing.

To choose between the two methods, one can look at the stationarity properties of the residuals: if the residuals of the FE estimator are nonstationary, then FDGLS will be the more appropriate estimator.
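Analogously, FDGLS corresponds to pggls with model = "fd":

fdglsmod <- pggls(log(emp) ~ log(wage) + log(capital), data = EmplUK,
                  model = "fd")
summary(fdglsmod)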

5.2.2 Applied Examples

Notes
