Developing new decomposition methods for distributional statistics other than the mean has been an active research area over the last 15 years. In this section, we discuss a number of procedures that have been suggested for decomposing general distributional statistics. We focus on the case of the aggregate decomposition, though some of the suggested methods can be extended to the case of the detailed decomposition, which we discuss in Section 5. We begin by looking at the simpler case of a variance decomposition, obtained by extending the classic analysis-of-variance approach (based on a between/within-group decomposition) to a general case with covariates. We then turn to newer approaches based on various “plugging in” methods, such as JMP’s residual imputation method and Machado and Mata’s (2005) conditional quantile regression method. Finally, we discuss methods that focus on the estimation of counterfactuals for the entire distribution. These methods are based either on reweighting or on the estimation of the conditional distribution.
Most of this recent research was initially motivated by the dramatic growth in earnings inequality in the United States. Prior to that episode, the literature was considering particular summary measures of inequality such as the variance of logs and the Gini coefficient. For instance, Freeman (1980, 1984) looks at the variance of log wages in his influential work on the effect of unions on wage dispersion. This research establishes that unions tend to reduce wage dispersion as measured by the variance of log wages. Freeman shows that despite the inequality-enhancing effect of unions on the between-group component of inequality, the overall effect of unions is to reduce inequality because of the even larger effect of unions on within-group inequality.
One convenient feature of the variance is that it can be readily decomposed into a within- and between-group component. Interestingly, related work in the inequality literature shows that other measures such as the Gini or Theil coefficient are also decomposable into a within- and between-group component.39
Note that the between vs. within decomposition is quite different in spirit from the aggregate or detailed OB decomposition discussed in the previous section. There are advantages and disadvantages to this alternative approach. On the positive side, looking at between- and within-group effects can help understand economic mechanisms, as in the case of unions, or the sources of inequality growth (Juhn et al., 1993).
On the negative side, the most important drawback of the between vs. within decomposition is that it does not hold in the case of many other interesting inequality measures such as the interquartile ranges, the probability density function, etc. This is a major shortcoming since looking at what happens where in the distribution is important for identifying the factors behind changes or differences in distributions. Another drawback of the between vs. within approach is that it does not provide a straightforward way of looking at the specific contribution of each covariate, i.e. to perform a detailed decomposition. One final drawback is that with a rich enough set of covariates the number of possible groups becomes very large, and some parametric restrictions have to be introduced to keep the estimation problem manageable.
In response to these drawbacks, a new set of approaches has been proposed for performing aggregate decompositions on any distributional statistic. Some approaches, such as Juhn et al. (1993), Donald et al. (2000), and Machado and Mata (2005), can be viewed as extensions of the variance decomposition approach where the whole conditional distribution (instead of just the conditional variance) is estimated using parametric approaches. Others, such as DiNardo et al. (1996), completely bypass the problem of estimating conditional distributions and are, as such, closer cousins to estimators proposed in the program evaluation literature.
Before considering more general distributional statistics, it is useful to recall the steps used to obtain the standard OB decomposition. The first step is to assume that the conditional expectation of $Y_g$ given $X$ is linear, i.e. $E(Y_g \mid X) = X\beta_g$. This follows directly from the linearity and zero conditional mean assumptions (Assumptions 10 and 11) introduced in Section 2. Using the law of iterated expectations, it then follows that the unconditional mean is $E(Y_g) = E(X \mid D_g = 1)\beta_g$. This particular property of the mean is then used to compute the OB decomposition.
In light of this, it is natural to think of extending this type of procedure to the case of the variance. Using the analysis of variance formula, the unconditional variance of $Y_g$ can be written as:40

$$Var(Y_g) = E\left[Var(Y_g \mid X)\right] + Var\left[E(Y_g \mid X)\right],$$

where the expectations are taken over the distribution of $X$. The first component of the equation is the within-group component (also called residual variance), while the second component is the between-group component (also called regression variance). Writing $\sigma_g^2 \equiv E[Var(Y_g \mid X)]$ and, under linearity, $Var[E(Y_g \mid X)] = \beta_g' V_g \beta_g$, where $V_g \equiv Var_g(X)$, we can write the difference in variances across groups $B$ and $A$ as

$$\Delta_O^{\sigma} = Var(Y_B) - Var(Y_A) = (\sigma_B^2 - \sigma_A^2) + \beta_B' V_B \beta_B - \beta_A' V_A \beta_A.$$

A few manipulations yield $\Delta_O^{\sigma} = \Delta_S^{\sigma} + \Delta_X^{\sigma}$, where

$$\Delta_S^{\sigma} = (\sigma_B^2 - \sigma_A^2) + \beta_B' V_B \beta_B - \beta_A' V_B \beta_A$$

and

$$\Delta_X^{\sigma} = \beta_A' (V_B - V_A) \beta_A.$$
While it is straightforward to estimate the regression coefficients ($\beta_A$ and $\beta_B$) and the covariance matrices of the covariates ($V_A$ and $V_B$), the within-group (or residual) variance terms $\sigma_A^2$ and $\sigma_B^2$ also have to be estimated to compute the decomposition.
Several approaches have been used in the literature to estimate $\sigma_A^2$ and $\sigma_B^2$. The simplest possible approach is to assume that the error term is homoscedastic, in which case $Var(Y_A \mid X) = \sigma_A^2$ and $Var(Y_B \mid X) = \sigma_B^2$ do not depend on $X$, and the two relevant variance parameters can be estimated from the sample variance of the residuals in the regressions. The homoscedasticity assumption is very strong, however. When errors are heteroscedastic, differences between $\sigma_B^2$ and $\sigma_A^2$ can reflect spurious composition effects, in which case the decomposition will attribute to the wage structure effect ($\Delta_S^{\sigma}$) what should really be a composition effect ($\Delta_X^{\sigma}$). Lemieux (2006b) shows that this was a major problem when looking at changes in residual wage inequality in the United States since the late 1980s.
A simple way of capturing at least some of the relationship between the covariates and the conditional variance is to compute the variance of residuals for a limited number of subgroups or “cells”. For instance, Lemieux (2006b) shows estimates for 20 different subgroups of workers (based on education and experience), while Card (1996) divides the sample into five quintiles based on predicted wages $X\hat{\beta}$.
Finally, one could attempt to estimate a more general specification for the conditional variance by running a “second step” model for the squared regression residuals on some specification of the covariates. For example, assuming that $Var(Y_g \mid X) = X\gamma_g$, we can estimate $\gamma_g$ by running a regression of the squared residuals $\hat{\varepsilon}_g^2$ on $X$.41 Since $\sigma_g^2 = E_g(X)\gamma_g$ under this specification, we can then write the two aggregate components of the variance decomposition as:

$$\Delta_S^{\sigma} = E_B(X)(\gamma_B - \gamma_A) + \beta_B' V_B \beta_B - \beta_A' V_B \beta_A$$

and

$$\Delta_X^{\sigma} = \left(E_B(X) - E_A(X)\right)\gamma_A + \beta_A' (V_B - V_A) \beta_A.$$
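As a concrete sketch, the two-step variance decomposition described in this subsection can be implemented with OLS for the conditional mean and a second-step regression of squared residuals on the covariates. The function name and the exact split of terms (group A as reference) are our own illustrative choices:

```python
import numpy as np

def variance_decomposition(y_a, X_a, y_b, X_b):
    """Aggregate decomposition of Var_B(Y) - Var_A(Y) with a linear
    second-step model for the conditional variance (illustrative sketch)."""
    def fit(y, X):
        Z = np.column_stack([np.ones(len(y)), X])
        beta = np.linalg.lstsq(Z, y, rcond=None)[0]
        e2 = (y - Z @ beta) ** 2
        gamma = np.linalg.lstsq(Z, e2, rcond=None)[0]  # Var(Y|X) approx. Z @ gamma
        return Z, beta, gamma
    Z_a, b_a, g_a = fit(y_a, X_a)
    Z_b, b_b, g_b = fit(y_b, X_b)
    V_a = np.cov(Z_a.T, bias=True)
    V_b = np.cov(Z_b.T, bias=True)
    za, zb = Z_a.mean(0), Z_b.mean(0)
    # composition effect: change the distribution of X, hold group-A structure
    dX = zb @ g_a - za @ g_a + b_a @ (V_b - V_a) @ b_a
    # wage structure effect: change returns and conditional variance, X as in B
    dS = zb @ (g_b - g_a) + b_b @ V_b @ b_b - b_a @ V_b @ b_a
    return dX, dS
```

By construction, the two components sum to the total difference in (population-formula) variances across the two samples.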
Compared to the standard OB decomposition for the mean, which only requires estimating a (regression) model for the conditional mean, in the case of the variance we also need to estimate a model for the conditional variance. While this is quite feasible in practice, it already points to the additional modeling choices involved when decomposing distributional parameters beyond the mean.
Since the complexity of decomposition methods already increases for a distributional measure as simple and convenient as the variance, these problems will be compounded in the case of other distributional measures such as quantiles. Indeed, we show in the next subsection that for quantiles, attempts at generalizing the approach suggested here require estimating the entire conditional distribution of $Y$ given $X$. This is a more daunting estimation challenge, and we now discuss solutions that have been suggested in the literature.
An important limitation of summary measures of dispersion such as the variance, the Gini coefficient or the Theil coefficient is that they provide little information regarding what happens where in the distribution. This is an important shortcoming in the literature on changes in wage inequality where many important explanations of the observed changes have specific implications for specific points of the distribution. For instance, the minimum wage explanation suggested by DiNardo et al. (1996) should only affect the bottom end of the distribution. At the other extreme, explanations based on how top executives are compensated should only affect the top of the distribution. Other explanations based on de-unionization (Freeman, 1993; Card, 1992; DiNardo et al., 1996) and the computerization of “routine” jobs (Autor et al., 2003) tend to affect the middle (or “lower middle”) of the distribution. As a result, it is imperative to go beyond summary measures such as the variance to better understand the sources of growing wage inequality.
Going beyond summary measures is also important in many other interesting economic problems, such as the sources of the gender wage gap and the impact of social programs on labor supply.42 The most common approach for achieving this goal is to perform a decomposition for various quantiles (or differences between quantiles like the 90-10 gap) of the distribution. Unfortunately, as we point out in the introduction, it is much more difficult to decompose quantiles than the mean or even the variance. The basic problem is that the law of iterated expectations does not hold in the case of quantiles, i.e. $Q_{\tau}(Y) \neq E_X\left[Q_{\tau}(Y \mid X)\right]$, where $Q_{\tau}(Y)$ is the $\tau$th quantile of the (unconditional) distribution of $Y$, and $Q_{\tau}(Y \mid X)$ is the corresponding conditional quantile.
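The failure of the law of iterated expectations for quantiles is easy to verify numerically. The small simulation below (our own example, with two arbitrary covariate "cells") shows that the unconditional mean is the average of conditional means, while the unconditional median is not the average of conditional medians:

```python
import numpy as np

# Two equally likely covariate cells with different conditional distributions.
rng = np.random.default_rng(0)
y1 = rng.normal(0.0, 1.0, 100_000)  # Y | X = x1
y2 = rng.normal(2.0, 4.0, 100_000)  # Y | X = x2
y = np.concatenate([y1, y2])

# The law of iterated expectations holds for the mean...
mean_gap = y.mean() - 0.5 * (y1.mean() + y2.mean())

# ...but not for quantiles: the unconditional median falls well below the
# average of the conditional medians in this example.
median_gap = np.median(y) - 0.5 * (np.median(y1) + np.median(y2))
print(mean_gap, median_gap)
```

Here `mean_gap` is zero up to rounding, while `median_gap` is far from zero because the more dispersed cell pulls the mixture's median toward its own lower tail.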
As it turns out, one (implicitly) needs to know the entire conditional distribution of $Y$ given $X$ to compute $Q_{\tau}(Y)$. To see this, note that

$$F_g(y) = \int F_g(y \mid X) \, dF_g(X),$$

where $F_g(y \mid X)$ is the cumulative distribution of $Y$ conditional on $X$ in group $g$. Given $F_g(\cdot)$, it is possible to implicitly use this equation to solve for $Q_{\tau}(Y_g) = F_g^{-1}(\tau)$. It is also clear that in order to do so we need to know the conditional distribution function $F_g(y \mid X)$, as opposed to just the conditional mean and variance, as was the case for the variance. Estimating an entire conditional distribution function for each value of $X$ is a difficult problem. Various decomposition methods that we discuss in detail below suggest different ways of handling this challenge.
But before covering them in detail, we recall the basic principles underlying these methods. As in Section 2, we focus on cumulative distributions since any standard distributional statistic, such as a quantile, can be directly computed from the cumulative distribution. For instance, quantiles of the counterfactual distribution can be obtained by inverting $F_C$: $Q_{C,\tau} = F_C^{-1}(\tau)$.
For the sake of presentational simplicity, we introduce a simplified notation relative to Section 2. We use $F_g(y)$ instead of $F_{Y_g}(y)$ to represent the marginal distribution of $Y$, and $F_g(y \mid X)$ to represent the conditional distributions, for $g = A, B$, introduced in Eq. (4). We use the shorthand $F_C(y)$ to represent the key counterfactual distribution of interest introduced in Eq. (5), which mixes the distribution of characteristics of group B with the wage structure from group A:

$$F_C(y) = \int F_A(y \mid X) \, dF_B(X). \tag{27}$$
Three general approaches have been suggested in the decomposition literature for estimating the counterfactual distribution $F_C(y)$. A first general approach, initially suggested by Juhn et al. (1993), replaces each value of $Y$ for group $B$ with a counterfactual value $Y^C = g(Y)$, where $g(\cdot)$ is an imputation function. The idea is to replace the residual from group $B$ with a counterfactual residual that holds the same rank in the conditional distribution of residuals as it did in the original distribution. As we discussed in Section 2.2.3, this is done in practice using a residual imputation procedure. Machado and Mata (2005) and Autor et al. (2005) have later suggested other approaches, based on conditional quantile regressions, to transform a wage observation $Y_{B,i}$ into a counterfactual observation $Y_{B,i}^C$.
A second approach proposed by DiNardo et al. (1996) [DFL] is based on the following manipulation of Eq. (27):

$$F_C(y) = \int F_A(y \mid X)\, \Psi(X) \, dF_A(X), \tag{28}$$

where $\Psi(X) = dF_B(X)/dF_A(X)$ is a reweighting factor. This makes it clear that the counterfactual distribution $F_C(y)$ is simply a reweighted version of the distribution $F_A(y)$. The reweighting factor is a simple function of $X$ that can be easily estimated using standard methods such as a logit or probit. The basic idea of the DFL approach is to start with group $A$, and then replace the distribution of $X$ of group $A$ ($F_A(X)$) with the distribution of $X$ of group $B$ ($F_B(X)$) using the reweighting factor $\Psi(X)$.
The third set of approaches also works with Eq. (27), starting with group $B$ and then replacing the conditional distribution $F_B(y \mid X)$ with $F_A(y \mid X)$. Doing so is more involved, from an estimation point of view, than following the DFL approach. The problem is that the conditional distributions depend on both $y$ and $X$, while the reweighting factor only depends on $X$.
Under this third set of approaches, one needs to directly estimate the conditional distribution $F_A(y \mid X)$. Parametric approaches for doing so were suggested by Donald et al. (2000), who used a hazard model approach, and Fortin and Lemieux (1998), who suggested estimating an ordered probit. More recently, Chernozhukov et al. (2009) suggest estimating distributional regressions (e.g. a logit for each value of $y$). In all cases, the idea is to replace the conditional distribution for group $B$, $F_B(y \mid X)$, with an estimate of the conditional distribution $F_A(y \mid X)$ obtained using one of these methods.
In the next subsections, we discuss how these various approaches can be implemented. We also present some results regarding their statistical properties, and address computational issues linked to their implementation.
As we explain above, Juhn et al. (1993) propose an imputation approach where the wage $Y_{B,i}$ from group $B$ is replaced by a counterfactual wage in which both the returns to observables and unobservables are set to be as in group $A$. The implementation of this procedure is divided in two steps. First, unobservables are replaced by counterfactual unobservables, as in Eq. (9). Second, counterfactual returns to observables are also imputed, as in Eq. (12).43
Under the assumption of additive linearity (Assumption 10), the original wage equation for individual $i$ from group $B$,

$$Y_{B,i} = X_{B,i}\beta_B + \varepsilon_{B,i},$$

allows the returns to unobservables to be group-specific. Under the assumption of rank preservation (Assumption 14), the first counterfactual is computed as

$$Y_{B,i}^{C,1} = X_{B,i}\beta_B + \varepsilon_{B,i}^A,$$

where

$$\varepsilon_{B,i}^A = F_{\varepsilon_A}^{-1}\left(\tau_{B,i} \mid X_{B,i}\right)$$

and $\tau_{B,i} = F_{\varepsilon_B}(\varepsilon_{B,i} \mid X_{B,i})$ is the conditional rank of $\varepsilon_{B,i}$ in the distribution of residuals for group $B$. A second counterfactual is then obtained by also replacing the returns to observable characteristics:

$$Y_{B,i}^{C,2} = X_{B,i}\beta_A + \varepsilon_{B,i}^A.$$

Under the assumptions of linearity and rank preservation, this counterfactual wage should be the same as $Y_{B,i}^C$, the counterfactual wage obtained by replacing the wage structure of group $B$ with that of group $A$.
In practice, it is straightforward to estimate $\beta_A$ and $\beta_B$ using OLS under the assumptions of linearity and zero conditional mean. It is much less clear, however, how to perform the residual imputation procedure described above. Under the strong assumption that the regression residuals are independent of $X$, it follows that $F_{\varepsilon_g}(\varepsilon \mid X) = F_{\varepsilon_g}(\varepsilon)$, so the conditional rank $\tau_{B,i}$ is simply the rank of $\varepsilon_{B,i}$ in the marginal distribution of residuals.
Under this independence assumption, one simply needs to compute the rank of the residual in the marginal distribution (distribution over the whole sample) of residuals for group $B$, and then pick the corresponding residual in the marginal distribution of residuals for group $A$. If $\hat{\varepsilon}_{B,i}$ is at the 70th percentile of the distribution of residuals of group $B$, then $\varepsilon_{B,i}^A$ will simply be the 70th percentile of the distribution of residuals for group $A$. In practice, most applications of the JMP procedure use this strong assumption of independence because there is little guidance on how a conditional imputation procedure could be used instead.
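The rank-preserving imputation step under independence can be sketched in a few lines (the function name is our own; the two inputs are the estimated OLS residuals of the two groups):

```python
import numpy as np

def jmp_impute_residuals(res_b, res_a):
    """Replace each group-B residual with the group-A residual at the
    same percentile of the marginal residual distribution (the JMP
    imputation step under the independence assumption)."""
    # percentile rank of each group-B residual, in (0, 1)
    ranks = (np.argsort(np.argsort(res_b)) + 0.5) / len(res_b)
    # corresponding quantile of the group-A residual distribution
    return np.quantile(res_a, ranks)
```

Each imputed residual keeps its original rank, but the spread of the imputed residuals is that of group A, which is exactly what the first JMP counterfactual requires.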
Since independence of regression residuals is unrealistic, a more accurate implementation of JMP would require deciding how to condition on $X$ when performing the imputation procedure. If $X$ consists of a limited number of groups or “cells”, then one could perform the imputation within each of these groups. In general, however, it is difficult to know how to implement this ranking/imputation procedure in more general cases. As a result, other procedures such as the quantile method of Machado and Mata (2005) are increasingly being used as an alternative to JMP.
Another limitation of the JMP procedure is that there is no natural way of extending it to the case of the detailed decomposition for the composition effect.
One advantage of the two-step procedure is that it provides a way of separating the between- and within-group components, as in a variance decomposition. This plays an important role in the inequality literature, since JMP concluded that most of the inequality growth from the 1960s to the 1980s was linked to the residual inequality component.
It is not clear, however, what is meant by between- and within-group components in the case of distributional measures like the 90-10 gap that are not decomposable. A better way of justifying JMP is that the wage equation represents a structural model where $X$ are observed skills, while $\varepsilon$ represents unobserved skills. One can then perform simulation exercises asking what happens to the distribution when one replaces either the returns to observed skills or the returns to unobserved skills (see also Section 2.2.3).
This economic interpretation also requires, however, some fairly strong assumptions. The two most important assumptions are the linearity of the model (Assumption 10) and rank preservation (Assumption 14). While linearity can be viewed as a useful approximation, rank preservation is much stronger since it means that someone with the same unobserved skills would be in the exact same position, conditional on $X$, in either group $A$ or $B$. Just adding measurement error to the model would result in a violation of rank preservation.
Finally, if one is willing to interpret a simple regression as a decomposition between observed and unobserved skills, this can be combined with methods other than JMP. For instance, DFL perform regression adjustments to illustrate the effects of supply and demand factors on wages.44
Like JMP, Machado and Mata (2005, MM hereinafter) propose a procedure based on transforming a wage observation $Y_{B,i}$ into a counterfactual observation $Y_{B,i}^C$. The main advantage relative to JMP is that their estimation procedure, based on quantile regressions (Koenker and Bassett, 1978), provides an explicit way of estimating the (inverse) conditional distribution function used in the imputation. One important difference, however, is that instead of transforming each actual observation of $Y_B$ into a counterfactual $Y_B^C$, MM use a simulation approach where quantiles are drawn at random.
Since $Q_g(\tau \mid X)$ has the same distribution as $Y_g$ given $X$ when $\tau$ follows a uniform distribution, one can think of doing the following: draw $\tau$ at random from a uniform distribution, draw a value of $X$ from the distribution of characteristics of group $B$, and compute the counterfactual wage $Q_A(\tau \mid X)$, where $Q_A(\tau \mid X)$ and $Q_B(\tau \mid X)$ are the conditional quantile functions for the $\tau$th quantile in groups $A$ and $B$, respectively.
A key implementation question is how to specify the functional forms for the conditional quantile functions. MM suggest a linear specification in $X$ that can be estimated using quantile regression methods. The conditional quantile regression models can be written as:

$$Q_g(\tau \mid X) = X\beta_g(\tau), \quad g = A, B.$$
Table 4 reports in the top panel the results of the Machado-Mata procedure applied to our gender gap example, using the male wage structure as reference.46 The central column shows that the aggregate decomposition of the median gender log wage gap gives almost the same results as the OB decomposition of the mean gender log wage gap displayed in column (1) of Table 3. Going across the columns to compare quantile effects shows that gender differences in characteristics are much more important at the bottom (10th centile) than at the top (90th centile) of the wage distribution. Indeed, some significant wage structure effects emerge at the 90th percentile.
This decomposition method is computationally demanding, and becomes quite cumbersome for data sets numbering more than a few thousand observations. Bootstrapping quantile regressions for a sizeable number of quantiles (100 would be a minimum) is computationally tedious with large data sets. The implementation of the procedure can be simplified by estimating a large number of quantile regressions (say 99, one for each percentile from 1 to 99) instead of drawing values of $\tau$ at random.47
Another limitation is that the linear specification is restrictive, and finding the correct functional form for the conditional quantile regressions can be tedious. For instance, if there is a spike at the minimum wage in the wage distribution, this will result in flat spots in quantile regressions that would have to be captured with spline functions with appropriately chosen knots. Accurately describing a simple distribution with mass points (as is commonly observed in wage data) can, therefore, be quite difficult to do using quantile regressions.
As pointed out by Chernozhukov et al. (2009), it is not very natural to estimate inverse conditional distribution functions (quantile regressions) when the main goal of counterfactual exercises is to replace the conditional distribution function $F_B(y \mid X)$ with $F_A(y \mid X)$ to obtain Eq. (27). Chernozhukov et al. (2009) suggest instead directly estimating distribution regression models for $F_g(y \mid X)$, which is a more direct way of approaching the problem.
One advantage of the MM approach is that it provides a natural way of performing a detailed decomposition for the wage structure component. The idea is to successively replace the elements of $\beta_B(\tau)$ by those of $\beta_A(\tau)$ when performing the simulations, keeping in mind that this type of detailed decomposition is path dependent. Unfortunately, the MM approach does not provide a way of performing the detailed decomposition for the composition effect.48 This is a major drawback since the detailed decomposition of the composition effect is always clearly interpretable, while the detailed decomposition of the wage structure effect arbitrarily depends on the choice of the omitted group.
As we mention in Section 4.2, another way of estimating the counterfactual distribution is to replace the marginal distribution of $X$ for group $A$ with the marginal distribution of $X$ for group $B$ using a reweighting factor $\Psi(X)$. This idea was first introduced in the decomposition literature by DiNardo, Fortin and Lemieux [DFL] (1996). While DFL focus on the estimation of counterfactual densities in their empirical application, the method is easily applicable to any distributional statistic.
In practice, the DFL reweighting method is similar to the propensity score reweighting method commonly used in the program evaluation literature (see Hirano et al. (2003)). For instance, in DFL’s application to changes in wage inequality in the United States, time is viewed as a state variable, or in the context of the treatment effects literature as a treatment.49 The impact of a particular factor or set of factors on changes in the wage distribution over time is constructed by considering the counterfactual state of the world where the distribution of this factor remained fixed in time, maintaining the Assumption 6 of invariance of the conditional distribution. Note that in contrast with the notation of this chapter, in DFL, time period 1 is used as reference group.50 The choice of period 0 or period 1 as the reference group is analogous to the choice of whether the female or the male wage structure should be the reference wage structure in the analysis of the gender wage gap and is expected to yield different results in most cases.
In DFL, manipulations of the wage distributions, computed through reweighting, are applied to non-parametric estimates of the wage density, which can be particularly useful when local distortions, from minimum wage effects for example, are at play. To be consistent with the rest of this section, however, we focus our discussion on the cumulative distribution instead of the density. The key counterfactual distribution of interest, shown in Eq. (27) (the distribution of wages that would prevail for workers in group $A$ if they had the distribution of characteristics of group $B$), is constructed, as shown in Eq. (28), using the reweighting factor

$$\Psi(X) = \frac{dF_B(X)}{dF_A(X)}.$$
Although the reweighting factor is the ratio of two multivariate marginal distribution functions (of the covariates $X$), this expression can be simplified using Bayes’ rule. Bayes’ rule implies that

$$dF_B(X) = \frac{\Pr(D_B = 1 \mid X)}{\Pr(D_B = 1)} \, dF(X),$$

with a similar expression for $dF_A(X)$. Since $\Pr(D_A = 1 \mid X) = 1 - \Pr(D_B = 1 \mid X)$ and $\Pr(D_A = 1) = 1 - \Pr(D_B = 1)$, the reweighting factor becomes

$$\Psi(X) = \frac{\Pr(D_B = 1 \mid X)}{1 - \Pr(D_B = 1 \mid X)} \cdot \frac{1 - \Pr(D_B = 1)}{\Pr(D_B = 1)}.$$
The reweighting factor can be easily computed by estimating a probability model for $\Pr(D_B = 1 \mid X)$, and using the predicted probabilities to compute a value of $\widehat{\Psi}(X_i)$ for each observation. DFL suggest estimating a flexible logit or probit model, while Hirano, Imbens, and Ridder propose to use a non-parametric logit model.51
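The estimation of the reweighting factor can be sketched as follows. We use a plain logit from scikit-learn purely for illustration; DFL advocate a flexible (polynomial) specification of the covariates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dfl_weights(X_a, X_b):
    """Reweighting factor for group-A observations so that their
    covariates mimic group B's distribution (illustrative sketch)."""
    X = np.vstack([X_a, X_b])
    d = np.r_[np.zeros(len(X_a)), np.ones(len(X_b))]   # 1 = group B
    p = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X_a)[:, 1]
    pb = len(X_b) / len(X)                             # unconditional Pr(D_B = 1)
    return (p / (1 - p)) * ((1 - pb) / pb)
```

A quick check of the weights is that weighted averages of group A's covariates should reproduce group B's covariate means.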
In practice, the reweighting decomposition procedure can be implemented by estimating a probability model of the form $\Pr(D_B = 1 \mid X) = \Lambda(h(X)\delta)$, where $\Lambda(\cdot)$ is either a normal or logit link function, and $h(X)$ is a polynomial in $X$. The predicted probabilities are then plugged into the expression for $\Psi(X)$ above, and counterfactual statistics are computed as weighted statistics of the group $A$ sample.
In DFL, the main object of interest is the probability density function, which is estimated using kernel density methods. The density for group $A$ and the counterfactual density can be estimated as follows, where $K(\cdot)$ is the kernel function and $h$ the bandwidth:52

$$\widehat{f}_A(y) = \frac{1}{n_A h} \sum_{i \in A} K\left(\frac{y - Y_i}{h}\right), \qquad \widehat{f}_C(y) = \frac{1}{n_A h} \sum_{i \in A} \widehat{\Psi}(X_i)\, K\left(\frac{y - Y_i}{h}\right).$$
Consider the density function for group $A$, $f_A(y)$, and the counterfactual density $f_C(y)$. The composition effect in a decomposition of densities is:

$$\Delta_X^f(y) = f_C(y) - f_A(y).$$
Various statistics from the wage distribution, such as the 10th, 50th, and 90th percentiles, or the variance, Gini, or Theil coefficients, can be computed either from the counterfactual density or from the counterfactual distribution using the reweighting factor. The latter procedure is easier to use as it simply involves computing (weighted) statistics using standard computer packages. For example, the counterfactual variance can be computed as:

$$\widehat{Var}_C(Y) = \frac{\sum_{i \in A} \widehat{\Psi}(X_i)\left(Y_i - \widehat{\mu}_C\right)^2}{\sum_{i \in A} \widehat{\Psi}(X_i)},$$

where the counterfactual mean is:

$$\widehat{\mu}_C = \frac{\sum_{i \in A} \widehat{\Psi}(X_i)\, Y_i}{\sum_{i \in A} \widehat{\Psi}(X_i)}.$$
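Once the weights are in hand, any counterfactual statistic is just a weighted statistic of the group-A sample. Two small helpers (our own, minimal versions of what standard packages provide) illustrate the point:

```python
import numpy as np

def weighted_quantile(y, w, q):
    """q-th quantile of the reweighted (counterfactual) distribution."""
    order = np.argsort(y)
    cw = np.cumsum(w[order]) / np.sum(w)          # weighted empirical CDF
    return y[order][np.searchsorted(cw, q)]

def weighted_variance(y, w):
    """Counterfactual variance around the counterfactual mean."""
    mu = np.average(y, weights=w)
    return np.average((y - mu) ** 2, weights=w)
```

With uniform weights these reduce to the usual sample quantile and variance; with DFL weights they deliver the counterfactual quantiles, 90-10 gaps, and variance discussed in the text.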
For the 90-10, 90-50, and 50-10 wage differentials, the sought-after contributions to changes in inequality are computed as differences in the composition effects at the relevant quantiles, for example,

$$\Delta_X^{90-10} = \left(Q_{C,.90} - Q_{C,.10}\right) - \left(Q_{A,.90} - Q_{A,.10}\right).$$
Table 5 presents, in panel A, the results of a DFL decomposition of changes over time in male wage inequality using large samples from combined MORG-CPS data as in Firpo et al. (2007). In this decomposition, the counterfactual distribution of wages in 1983/85 is constructed by reweighting the characteristics of workers in 1983/85 (time period 0) so that they look like those of 2003/05 (time period 1) workers, holding the conditional distribution of wages in 1983/85 fixed.53 The results of the aggregate decomposition, reported in the first three rows of Table 5, show that composition effects play a large role in changes in overall wage inequality, as measured by the 90-10 log wage differential or the variance of log wages. But the wage structure effects are more important when looking at increases at the top of the wage distribution, as measured by the 90-50 log wage differential, or decreases at the bottom, as measured by the 50-10 log wage differential.
The main advantage of the reweighting approach is its simplicity. The aggregate decomposition for any distributional statistic is easily computed by running a single probability model (logit or probit) and using standard packages to compute distributional statistics with $\widehat{\Psi}(X_i)$ as weight.54
Another more methodological advantage is that formal results from Hirano et al. (2003) and Firpo (2007, 2010) establish the efficiency of this estimation method. Note that although it is possible to compute analytically the standard errors of the different elements of the decomposition obtained by reweighting, it is simpler in most cases to conduct inference by bootstrapping.55
For these two reasons, we recommend the reweighting approach as the method of choice for computing the aggregate decomposition. This recommendation even applies in the simple case of the mean decomposition. As pointed out by Barsky et al. (2002), a standard OB decomposition based on a linear regression model will yield biased estimates of the decomposition terms when the underlying conditional expectation of given is non-linear (see Section 3.4). They suggest using a reweighting approach as an alternative, and the results of Hirano et al. (2003) can be used to show that the resulting decomposition is efficient.
A first limitation of the reweighting method is that it is not straightforwardly extended to the case of the detailed decomposition. One exception is the case of binary covariates, where it is relatively easy to compute the corresponding element of the decomposition. For instance, in the case of union status (a binary covariate), DFL show how to compute the component of the composition effect corresponding to this particular covariate. It is also relatively easy to compute the corresponding element of the wage structure effect. We discuss in Section 5 other options that can be used in the case of non-binary covariates.
As in the program evaluation literature, reweighting can have some undesirable properties in small samples when there is a problem of common support. The problem is that the estimated value of $\widehat{\Psi}(X)$ becomes very large when the estimated probability $\Pr(D_B = 1 \mid X)$ gets close to 1. While lack of common support is a problem for any decomposition procedure, Frolich (2004) finds that reweighting estimators perform particularly poorly in this context, though Busso et al. (2009) reach the opposite conclusion using a different simulation experiment.56
Finally, even in cases where a pure reweighting approach has some limitations, there may be gains in combining reweighting with other approaches. For instance, we discuss in the next section how reweighting can be used to improve a decomposition based on the RIF-regression approach of Firpo et al. (2009). Lemieux (2002) also discusses how a hybrid approach based on DFL reweighting and the JMP decomposition procedure can be used to compute both the between- and within-group components of the composition and wage structure effects.
As mentioned above, when we first introduced the key counterfactual distribution of interest in Eq. (5), an alternative approach to the construction of this counterfactual is based on the estimation of the conditional distribution of the outcome variable, $F_A(y \mid X)$. The counterfactual distribution is then estimated by integrating this conditional distribution over the distribution of $X$ in group $B$.
Two early parametric methods based on this idea were suggested by Donald et al. (2000) and Fortin and Lemieux (1998).57 Donald, Green and Paarsch propose estimating the conditional distribution using a hazard model. The (conditional) hazard function is defined as

$$\lambda(y \mid X) = \frac{f(y \mid X)}{S(y \mid X)},$$

where $S(y \mid X) = 1 - F(y \mid X)$ is the survivor function. Therefore, the conditional distribution of the outcome variable, $F(y \mid X)$, or its density, $f(y \mid X)$, is easily recovered from the estimates of the hazard model. For instance, in the standard proportional hazard model58

$$\lambda(y \mid X) = \lambda_0(y)\exp(X\beta),$$

estimates of $\beta$ and of the baseline hazard $\lambda_0(y)$ can be used to recover the conditional distribution

$$F(y \mid X) = 1 - \exp\left(-\Lambda_0(y)\exp(X\beta)\right),$$

where $\Lambda_0(y) = \int_0^y \lambda_0(s)\, ds$ is the integrated baseline hazard.
Fortin and Lemieux (1998) suggest estimating an ordered probit model instead of a hazard model. They consider the following model for the outcome variable $Y_g$:

$$Y_g = m_g(Y_g^*),$$

where $m_g(\cdot)$ is a monotonically increasing transformation function. The latent variable $Y_g^*$, interpreted as a latent “skill index” by Fortin and Lemieux, is defined as

$$Y_g^* = X\beta_g + \varepsilon_g,$$

where $\varepsilon_g$ is assumed to follow a standard normal distribution. It follows that the conditional distribution of $Y_g$ is given by

$$F_g(y \mid X) = \Pr(Y_g \le y \mid X) = \Phi\left(m_g^{-1}(y) - X\beta_g\right).$$

Fortin and Lemieux implement this in practice by discretizing the outcome variable into a large number of small bins. Each bin $j$ corresponds to values of the latent variable between the two thresholds $c_{j-1}$ and $c_j$. The conditional probability of being in bin $j$ is

$$\Pr\left(c_{j-1} < Y_g^* \le c_j \mid X\right) = \Phi(c_j - X\beta_g) - \Phi(c_{j-1} - X\beta_g).$$

This corresponds to an ordered probit model where the parameters $c_j$ (for $j = 1, \ldots, J$) are the usual latent variable thresholds. The estimated values of $\beta_g$ and of the thresholds can then be used to construct the counterfactual distribution, just as in Donald et al. (2000).
To be more concrete, the counterfactual distribution at a point $y$ could be estimated as follows: use the model estimated on group $A$ to compute the predicted probability that $Y_i \le y$ for each observation $i$ in group $B$, and average these predicted probabilities over the group $B$ sample. Repeating this for a large number of values of $y$ will provide an estimate of the counterfactual distribution $F_C(y)$.
In a similar spirit, Chernozhukov et al. (2009) suggest a more flexible distribution regression approach for estimating the conditional distribution. Following Foresi and Peracchi (1995), the idea is to estimate a separate regression model for each value of $y$. They consider the model $F_g(y \mid X) = \Lambda(X\beta_g(y))$, where $\Lambda(\cdot)$ is a known link function. For example, if $\Lambda(\cdot)$ is the logistic function, $\beta_g(y)$ can be estimated by creating a dummy variable $\mathbb{1}\{Y_i \le y\}$ indicating whether the value of $Y_i$ is below $y$, where $\mathbb{1}\{\cdot\}$ is the indicator function, and running a logit regression of $\mathbb{1}\{Y_i \le y\}$ on $X$ to estimate $\beta_g(y)$.
Similarly, if the link function is the identity function () the probability model is a linear probability model. If the link function is the normal CDF () the probability model is a probit. Compared to Fortin and Lemieux (1998), Chernozhukov et al. (2009) suggest estimating a separate probit for each value of , while Fortin and Lemieux use a more restrictive model where only the intercept (the threshold in the ordered probit) is allowed to change for different values of .
As above, the counterfactual distribution can be obtained by first estimating the regression model (probit, logit, or LPM) for one group to obtain the parameter estimates, computing the predicted probabilities for each observation of the other group, and averaging over these predicted probabilities to get the counterfactual distribution:
Once the counterfactual distribution has been estimated, counterfactual quantiles can be obtained by inverting the estimated distribution function. Consider the τth quantile of the counterfactual distribution. The estimated counterfactual quantile is:
It is useful to illustrate graphically how the estimation of the counterfactual distribution and its inversion into quantiles can be performed in practice. Figure 1 first shows the actual CDFs for the two groups. The squares in between the two cumulative distributions illustrate examples of counterfactuals computed using one of the methods discussed above.
For example, consider the case of the median wage of one of the two groups, m. Using the distribution regression approach of Chernozhukov et al. (2009), one can estimate, for example, a LPM by running a regression of the indicator 1{Y ≤ m} on X for that group. The resulting coefficient estimates can then be used to compute the counterfactual proportion of workers earning less than m in the other group. This counterfactual proportion is represented by the square on the vertical line over m in Fig. 1.
Figure 2 then illustrates what happens when a similar exercise is performed for a larger number of values of the cutoff wage (100 in this particular figure). It now becomes clear from the figure how to perform the inversion numerically. In the case of the median, the total gap is the horizontal distance between the two cumulative distributions at 0.5. The counterfactual median can then be estimated by picking the corresponding point on the counterfactual function defined by the set of points estimated by running a set of LPMs at different values of the cutoff. In practice, one could compute the precise value of the counterfactual median by estimating the LPMs (or a logit or probit) for a large number of values of the cutoff, and then “connecting the dots” (i.e. using linear interpolations) between these different values.
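To make the “connecting the dots” procedure concrete, here is a minimal NumPy sketch on simulated data. All variable names, parameter values, and the choice of which group supplies the conditional distribution are illustrative assumptions, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log wages for two groups (all parameter values are made up)
n = 2000
X_A = np.column_stack([np.ones(n), rng.normal(12.0, 2.0, n)])  # group A covariates
X_B = np.column_stack([np.ones(n), rng.normal(13.0, 2.0, n)])  # group B covariates
y_A = X_A @ np.array([0.5, 0.08]) + rng.normal(0.0, 0.3, n)
y_B = X_B @ np.array([0.3, 0.10]) + rng.normal(0.0, 0.3, n)

# Grid of cutoffs y at which separate LPM distribution regressions are run
grid = np.quantile(np.concatenate([y_A, y_B]), np.linspace(0.02, 0.98, 100))

# At each cutoff: OLS of 1{y_B <= y} on X_B, then average the predicted
# probabilities over group A's covariates to get the counterfactual CDF
F_c = np.empty(grid.size)
for j, y in enumerate(grid):
    beta = np.linalg.lstsq(X_B, (y_B <= y).astype(float), rcond=None)[0]
    F_c[j] = np.clip(X_A @ beta, 0.0, 1.0).mean()

# "Connect the dots": enforce monotonicity, then invert at the median
F_mono = np.maximum.accumulate(F_c)
median_c = np.interp(0.5, F_mono, grid)
```

The linear interpolation in the last line is exactly the “connecting the dots” step described above, performed at the single proportion 0.5.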
Figure 2 also illustrates one of the key messages of the chapter listed in the introduction, namely that it is easier to estimate models for proportions than for quantiles. In Fig. 2, the difference in the proportion of observations below a given value of the wage is simply the vertical distance between the two cumulative distributions. Decomposing this particular gap in proportions is not a very difficult problem. As discussed in Section 3.5, one can simply run a LPM and perform a standard OB decomposition. An alternative, also discussed in Section 3.5, is to perform a nonlinear decomposition using a logit or probit model. The conditional distribution methods of Fortin and Lemieux (1998) and Chernozhukov et al. (2009) essentially amount to computing this decomposition in the vertical dimension.
By contrast, it is not clear at first glance how to decompose the horizontal distance, or quantile gap, between the two curves. But since the vertical and horizontal distances are just two different ways of describing the same difference between the two cumulative distributions, one can perform a first decomposition either vertically or horizontally, and then invert back to get the decomposition in the other dimension. Since decomposing proportions (the vertical distance) is relatively easy, this suggests first performing the decomposition on proportions at many points of the distribution, and then inverting back to get the decomposition in the quantile dimension (the horizontal distance).
Table 5 reports, in panels B and C, the results of the aggregate decomposition for male wages using the method of Chernozhukov et al. (2009). The counterfactual wage distribution is constructed by asking what the distribution of wages in 1983/85 would have been if the conditional distribution had been as in 2003/05. Panel B uses the LPM to estimate the distribution regressions, while the logit model is used in Panel C.59 The first rows of Panels B and C show the changes in the wage differentials based on the fitted distributions, so that any discrepancies between these rows and the first row of Panel A reflect the estimation errors. The second rows report the composition effects computed as the difference between the fitted distribution in 1983/85 and the counterfactual distribution. Given our relatively large sample, the differences across estimators in the different panels are at times statistically different from one another. However, the results from the logit estimation in Panel C are qualitatively similar to the DFL results shown in Panel A, with composition effects being relatively more important in accounting for overall wage inequality, as measured by the 90-10 log wage differential, and wage structure effects playing a relatively more important role in increasing wage inequality at the top and reducing wage inequality at the bottom.
If one is just interested in performing an aggregate decomposition, it is preferable to simply use the reweighting methods discussed above. Like the conditional quantile methods discussed in Section 4.4, conditional distribution methods require some parametric assumptions on the distribution regressions that may or may not be valid. Chernozhukov, Fernandez-Val, and Melly’s distribution regression approach is more flexible than the earlier suggestions of Donald et al. (2000) and Fortin and Lemieux (1998), but it potentially involves estimating a large number of regressions.
Running unconstrained regressions for a large number of values of may result, however, in non-monotonicities in the estimated counterfactual distribution . Smoothing or related methods then have to be used to make sure that the counterfactual distribution is monotonic and, thus, invertible into quantiles.60 By contrast, reweighting methods require estimating just one flexible logit or probit regression, which is very easy to implement in practice.
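As a small illustration of the monotonization step (the CDF values below are made up for the example), sorting the fitted probabilities (in the spirit of rearrangement methods) or taking a running maximum both deliver a monotone, and hence invertible, estimate:

```python
import numpy as np

# A toy estimated counterfactual CDF with local non-monotonicities, as can
# happen when unconstrained distribution regressions are run cutoff by cutoff
grid = np.linspace(1.0, 4.0, 7)
F_hat = np.array([0.05, 0.22, 0.18, 0.45, 0.43, 0.70, 0.95])

# Option 1: rearrangement, i.e. sorting the fitted CDF values
F_sorted = np.sort(F_hat)

# Option 2: running maximum, which never lowers an estimated value
F_runmax = np.maximum.accumulate(F_hat)

# Either version can now be inverted into quantiles by interpolation
q50 = np.interp(0.5, F_sorted, grid)
```

Both options are simple fixes; which one is preferable in a given application is a judgment call beyond this sketch.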
An important advantage of distribution regression methods over reweighting is that they can be readily generalized to the case of the detailed decomposition, although these decompositions will be path dependent. We show in the next section how Chernozhukov, Fernandez-Val, and Melly’s distribution regression approach, and the related RIF-regression method of Firpo et al. (2009), can be used to perform a detailed decomposition very much in the spirit of the traditional OB decomposition for the mean.
In this section we discuss most of the existing methods that have been proposed to perform an aggregate decomposition for general distributional statistics. While all these methods could, in principle, yield similar results, we argue that DFL reweighting is the method of choice in this context for two main reasons. First, it is simple to implement, as it only involves estimating a single logit or probit model to compute the reweighting factors. Counterfactual values of any distributional statistic can then be readily computed from the reweighted sample. By contrast, methods that yield counterfactual estimates of quantiles or the whole CDF require estimating a separate model at a large number of points in the distribution.
The second advantage of reweighting is that there are well-established results in the program evaluation literature showing that the method is asymptotically efficient (Hirano et al., 2003; Firpo, 2007).
In this section, we extend the methods introduced above for the aggregate decomposition to the case of the detailed decomposition. We first show that conditional distribution methods based on distribution regressions can be used to compute both the composition and wage structure subcomponents of the detailed decomposition. We then discuss a related method based on the RIF-regressions introduced in Firpo et al. (2009). The main advantage of this last procedure is that it is regression based and, thus, as easy to use in practice as the traditional OB method.
The other methods proposed in Section 4 are not as easy to extend to the case of the detailed decomposition. We discuss, nonetheless, which elements of the detailed decomposition can be estimated using these various methods, and under which circumstances it is advantageous to use these methods instead of others.
In the case where the specification used for the distribution regression is the LPM, the aggregate decomposition of Section 4.6 can be generalized to the detailed decomposition as follows. Since the link function for the LPM is the identity function, the counterfactual distribution used earlier becomes:
We can also write:
where the first term is the familiar wage structure effect, while the second term is the composition effect. The above equation can, therefore, be used to compute a detailed decomposition of the difference between the two groups in the proportion of workers earning less than a given wage. We obtain the detailed decomposition for quantiles by (i) computing the different counterfactuals for each element of the coefficients and covariates sequentially, for a large number of values of the cutoff wage, and (ii) inverting to get the corresponding quantiles for each detailed counterfactual. A similar approach could also be used when the link function is a probit or a logit by using the procedure suggested in Section 3.5.
The main advantage of this method, based on distribution regressions and the global inversion of counterfactual CDFs into counterfactual quantiles (as in Fig. 2), is that it yields a detailed decomposition comparable to the OB decomposition of the mean.
One limitation of this method is that it involves computing a large number of counterfactual CDFs and quantiles, as the procedure has to be repeated for a sizable number of values of the cutoff wage. This can become cumbersome because of the potential non-monotonicity problems discussed earlier. Furthermore, the procedure suffers from the problem of path dependence, since the different counterfactual elements of the detailed decomposition have to be computed sequentially. For these reasons, we next turn to a simpler approach based on a local, as opposed to a global, inversion of the CDF.
RIF-regression methods provide a simple way of performing detailed decompositions for any distributional statistic for which an influence function can be computed. Although we focus below on the case of quantiles of the unconditional distribution of the outcome variable, our empirical example also includes the case of the variance and the Gini coefficient. The procedure can readily be used to address glass ceiling issues in the context of the gender wage gap, or changes in the interquartile range in the context of changes in wage inequality. It can be used either to perform OB-type detailed decompositions, or a slightly modified “hybrid” version of the decomposition suggested by Firpo et al. (2007) (reweighting combined with regressions, as in Section 3.4 for the mean).
A RIF-regression (Firpo et al., 2009) is similar to a standard regression, except that the dependent variable, Y, is replaced by the recentered influence function of the statistic of interest. Consider IF(y; ν), the influence function corresponding to an observed wage y for the distributional statistic of interest, ν(F_Y). The recentered influence function (RIF) is defined as RIF(y; ν) = ν(F_Y) + IF(y; ν), so that it aggregates back to the statistic of interest (E[RIF(Y; ν)] = ν(F_Y)). In its simplest form, the approach assumes that the conditional expectation of the RIF(Y; ν) can be modeled as a linear function of the explanatory variables,

E[RIF(Y; ν) | X] = Xγ,

where the parameters γ can be estimated by OLS.61
In the case of quantiles, the influence function is given by IF(y; Q_τ) = (τ − 1{y ≤ Q_τ})/f_Y(Q_τ), where 1{·} is an indicator function, f_Y(·) is the density of the marginal distribution of Y, and Q_τ is the population τ-quantile of the unconditional distribution of Y. As a result, RIF(y; Q_τ) is equal to Q_τ + IF(y; Q_τ), and can be rewritten as

RIF(y; Q_τ) = c_{1,τ} · 1{y > Q_τ} + c_{2,τ},    (33)

where c_{1,τ} = 1/f_Y(Q_τ) and c_{2,τ} = Q_τ − c_{1,τ} · (1 − τ). Except for the constants c_{1,τ} and c_{2,τ}, the RIF for a quantile is simply an indicator variable for whether the outcome variable is smaller or equal to the quantile Q_τ. Using the terminology introduced above, running a linear regression of this indicator on X is a distribution regression estimated at y = Q_τ, using the link function of the linear probability model (the identity function).
There is, thus, a close connection between RIF-regressions and the distribution regression approach of Chernozhukov et al. (2009). In both cases, regression models are estimated to explain the determinants of the proportion of workers earning less than a certain wage. As we saw in Fig. 2, in Chernozhukov et al. (2009) estimates of models for proportions are then globally inverted back into the space of quantiles. This provides a way of decomposing quantiles using a series of simple regression models for proportions.
Figure 3 shows that RIF-regressions for quantiles are based on a similar idea, except that the inversion is only performed locally. Suppose that after estimating a model for proportions, we compute a counterfactual proportion based on changing either the mean value of a covariate, or the return to that covariate estimated with the LPM regression. Under the assumption that the relationship between counterfactual proportions and counterfactual quantiles is locally linear, one can then go from the counterfactual proportion to the counterfactual quantile (both illustrated in Fig. 3) by moving along a line with a slope given by the slope of the counterfactual distribution function. Since the slope of a cumulative distribution is just the probability density function, one can easily go from proportions to quantiles by dividing the elements of the decomposition for proportions by the density.
While the argument presented in Fig. 3 is a bit heuristic, it provides the basic intuition for how we can get a decomposition model for quantiles by simply dividing a model for proportions by the density. As we see in Eq. (33), in the RIF for quantiles, the indicator variable is indeed divided by the density (i.e. multiplied by the constant 1/f_Y(Q_τ)).
Firpo et al. (2009) explain how to first compute the RIF, and then run regressions of the RIF on the vector of covariates. In the case of quantiles, the RIF is first estimated by computing the sample quantile and estimating the density at that point using kernel methods. An estimate of the RIF of each observation is then obtained by plugging these two estimates into Eq. (33).
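The steps just described can be sketched in pure NumPy as follows. The Gaussian-kernel density estimator and the Silverman-type bandwidth are illustrative choices (not the only ones possible), and the data are simulated:

```python
import numpy as np

def rif_quantile(y, tau):
    """RIF of the tau-th quantile: Q_tau + (tau - 1{y <= Q_tau}) / f(Q_tau),
    with the density at Q_tau estimated by a Gaussian kernel."""
    q = np.quantile(y, tau)
    h = 1.06 * y.std() * y.size ** (-1 / 5)          # Silverman-type bandwidth
    f_q = np.mean(np.exp(-0.5 * ((y - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q

rng = np.random.default_rng(1)
y = rng.normal(2.5, 0.5, 5000)                       # toy log-wage sample

rif_median = rif_quantile(y, 0.5)
# By construction, the RIF averages back to the statistic of interest,
# here the sample median
```

The vector `rif_median` can then be used as the dependent variable in an OLS regression on the covariates.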
Letting γ̂_{g,τ}, for g = A, B, denote the coefficients of the unconditional quantile (RIF) regressions for each group, we can write the equivalent of the OB decomposition for any unconditional quantile as

Δ̂_O^τ = X̄_B (γ̂_{B,τ} − γ̂_{A,τ}) + (X̄_B − X̄_A) γ̂_{A,τ},    (36)

where the first term is the wage structure effect and the second term is the composition effect. The second term in Eq. (36) can be rewritten in terms of the sum of the contribution of each covariate as

(X̄_B − X̄_A) γ̂_{A,τ} = Σ_k (X̄_{B,k} − X̄_{A,k}) γ̂_{A,k,τ}.

That is, the detailed elements of the composition effect can be computed in the same way as for the mean. Similarly, the detailed elements of the wage structure effect can be computed but, as in the case of the mean, these will also be subject to the problem of the omitted group.
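Putting the pieces together, a minimal sketch of an OB-style decomposition based on RIF-regressions might look as follows. The data are simulated; the group labels, the choice of reference wage structure, and all parameter values are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

def rif_quantile(y, tau):
    # RIF of the tau-th quantile with a Gaussian-kernel density estimate
    q = np.quantile(y, tau)
    h = 1.06 * y.std() * y.size ** (-1 / 5)
    f_q = np.mean(np.exp(-0.5 * ((y - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q

# Toy samples for groups A and B (intercept + one covariate)
n = 4000
X_A = np.column_stack([np.ones(n), rng.normal(12, 2, n)])
X_B = np.column_stack([np.ones(n), rng.normal(13, 2, n)])
y_A = X_A @ np.array([0.5, 0.08]) + rng.normal(0, 0.3, n)
y_B = X_B @ np.array([0.3, 0.10]) + rng.normal(0, 0.3, n)

tau = 0.9
bA = np.linalg.lstsq(X_A, rif_quantile(y_A, tau), rcond=None)[0]
bB = np.linalg.lstsq(X_B, rif_quantile(y_B, tau), rcond=None)[0]

# OB-style split, using group A's coefficients as the reference wage structure
wage_structure = X_B.mean(0) @ (bB - bA)
composition = (X_B.mean(0) - X_A.mean(0)) @ bA
```

Because each RIF averages back to its sample quantile, the two terms sum to the raw quantile gap between the groups.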
Table 4 presents in its bottom panel such an OB-like decomposition of the gender wage gap at the 10th, 50th, and 90th percentiles of the unconditional distribution of wages, corresponding to Tables 2 and 3, using the male coefficients as the reference group and without reweighting. As with the MM decomposition presented in the top panel, the composition effects from the decomposition of the median gender pay gap reported in the central column of Table 4 are very close to those of the decomposition of the mean gender pay gap reported in column (1) of Table 3. As before, the wage structure effects in the relatively small NLSY sample are generally not statistically significant, with the exception of the industrial sectors, which are, however, subject to the categorical variables problem. The comparison of the composition effects at the 10th and 90th percentiles shows that the impact of differences in life-time work experience is much larger at the bottom of the distribution than at the top, where it is not statistically significant. Note that the aggregate decomposition results obtained using either the MM method or the RIF-regressions do not exhibit statistically significant differences. Table 5 presents in Panel D the results of the aggregate decomposition using RIF-regressions without reweighting. The results are qualitatively similar to those of Panels A and C. Table 6 extends the analysis of the decomposition of male wage inequality presented in Table 5 to the detailed decomposition. For each inequality measure, the detailed decompositions are presented both for the extension of the classic OB decomposition in Eq. (36), and for the reweighted-regression decomposition described in the case of the mean in Section 3.4.62 For the reweighted-regression decomposition, Table 6 reports the detailed elements of the main composition effect and the detailed elements of the main wage structure effect, where
and where the group sample is reweighted to mimic the group sample, which means we should have . The total reweighting error corresponds to the difference between the “Total explained” across the classic OB and the reweighted-regression decomposition. For example, for the 90-10 log wage differential, it is equal to . 63 The total specification error, , corresponds to the difference between the “Total wage structure” across the classic OB and the reweighted-regression decomposition and is found to be more important. In terms of composition effects, de-unionization is found to be an important factor accounting for the polarization of male wage inequality. It is also found to reduce inequality at the bottom, as measured by the 50-10 log wage differential, and to increase inequality at the top, as measured by the 90-50 log wage differential. In terms of wage structure effects, increases in the returns to education are found, as in Lemieux (2006a), to be the dominant factor accounting for overall increases in male wage inequality.
The linearity of RIF-regressions has several advantages. It is straightforward to invert the proportion of interest into a quantile by dividing by the density. Since the inversion can be performed locally, another advantage is that we do not need to evaluate the global impact at all points of the distribution and worry about monotonicity. One gets a simple regression which is easy to interpret. As a result, the resulting decomposition is path independent.
Like many other methods, RIF-regressions assume the invariance of the conditional distribution (i.e., no general equilibrium effects). Also, a legitimate practical issue is how good the local approximation is. For relatively smooth dependent variables, such as test scores, this may be a moot point. But in the presence of considerable heaping (as usually displayed in wage distributions), it may be advisable to oversmooth the density estimate and compare its values around the quantile of interest. This can be assessed formally by comparing reweighting estimates to the OB-type composition effect based on RIF-regressions (the specification error discussed earlier).
As we mention in Section 4, it is relatively straightforward to extend the DFL reweighting method to perform a detailed decomposition in the case of binary covariates. DFL show how to compute the composition effect corresponding to a binary covariate (union status in their application). Likewise, DiNardo and Lemieux (1997) use yet another reweighting technique to compute the wage structure component. We first discuss the case where a covariate is a binary variable, and then discuss the case of categorical (with more than 2 categories) and continuous variables.
Consider the case of one binary covariate, , and a vector of other covariates, . For instance, DiNardo et al. (1996) look at the case of unionization. They are interested in isolating the contribution of de-unionization to the composition effect by estimating what would have happened to the wage distribution if the distribution of unionization, but of none of the other covariates, had changed over time.
Letting t = 0 index the base period and t = 1 the end period, consider the counterfactual wage distribution that would prevail if the conditional distribution of unionization (but of none of the other covariates) was as in the other period.64 Note that we are performing a counterfactual experiment by changing the conditional, as opposed to the marginal, distribution of unionization. Unless unionization is independent of the other covariates, the marginal distribution of unionization will depend on the distribution of those other covariates. For instance, if unionization is higher in the manufacturing sector, but the share of workers in manufacturing declines over time, the overall unionization rate will decline even if, conditional on industrial composition, the unionization rate remains the same.
Using the language of program evaluation, we want to make sure that secular changes in the rate of unionization are not confounded by other factors such as industrial change. This is achieved by looking at changes in the conditional, as opposed to the marginal, distribution of unionization. Note that the main problem with the procedure suggested by MM to compute the elements of the composition effect corresponding to each covariate is that it fails to address this problem. MM suggest an unconditional reweighting procedure based on the change in the marginal, as opposed to the conditional, distribution of covariates. Unless the covariates are independent, this will yield biased estimates of the composition effect elements of the detailed decomposition.
The counterfactual distribution is formally defined as
where the reweighting function is
Note that the conditional distribution is assumed to be unaffected by the change in the conditional distribution of unionization (assumption of invariance of conditional distribution in Section 2). This amounts to assuming away selection into union status based on unobservables (after controlling for the other covariates ).
The reweighting factor can be computed in practice by estimating two probit or logit models for the probability that a worker is unionized, one for each period. The resulting estimates can then be used to compute the predicted probabilities of being unionized or not unionized in each period, and these predicted probabilities can then be plugged into the above formula.
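As a sketch of this calculation, with a hand-rolled Newton logit standing in for a packaged probit/logit routine and a made-up data-generating process (the direction of the reweighting, period 1 toward period 0, is also an illustrative choice):

```python
import numpy as np

def logit_fit(X, d, iters=30):
    # Simple Newton logistic regression (numpy only, for illustration)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (d - p))
    return beta

rng = np.random.default_rng(3)
n = 3000
# Other covariates z (here a manufacturing dummy) in the base and end periods
Z0 = np.column_stack([np.ones(n), rng.binomial(1, 0.4, n)])   # period 0
Z1 = np.column_stack([np.ones(n), rng.binomial(1, 0.3, n)])   # period 1
# Union status generated from a logit in z (parameters are made up)
u0 = rng.binomial(1, 1 / (1 + np.exp(-(Z0 @ np.array([-0.5, 1.2])))))
u1 = rng.binomial(1, 1 / (1 + np.exp(-(Z1 @ np.array([-1.0, 1.2])))))

# Pr(union | z) in each period, both evaluated at period-1 covariates
p0 = 1 / (1 + np.exp(-(Z1 @ logit_fit(Z0, u0))))
p1 = 1 / (1 + np.exp(-(Z1 @ logit_fit(Z1, u1))))

# Reweighting factor for period-1 observations:
# p0/p1 for union workers, (1-p0)/(1-p1) for non-union workers
psi = np.where(u1 == 1, p0 / p1, (1 - p0) / (1 - p1))
```

Applying the weights `psi` to period-1 observations gives them the period-0 conditional distribution of unionization while leaving the distribution of the other covariates unchanged.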
DiNardo and Lemieux (1997) use a closely related reweighting procedure to compute the wage structure component of the effect of unions on the wage distribution. Consider the question of what would happen to the wage distribution if no workers were unionized. The distribution of wages among non-union workers:
is not a proper counterfactual since the distribution of the other covariates may not be the same for union and non-union workers. DiNardo and Lemieux (1997) suggest solving this problem by reweighting non-union workers so that their distribution of the other covariates is the same as for the entire workforce. The reweighting factors that accomplish this in the two periods are defined as follows:
Using these reweighting terms, we can write the counterfactual distribution of wages that would have prevailed in the absence of unions as:
These various counterfactual distributions can then be used to compute the contribution of unions (or another binary covariate) to the composition effect and to the wage structure effect:
and
Although we need three different reweighting factors to compute the elements of the detailed wage decomposition corresponding to the binary covariate, these three reweighting factors can be constructed from the estimates of just two probability models. As before, once these reweighting factors have been computed, the different counterfactual statistics are easily obtained using standard statistical packages.
It is difficult to generalize the approach suggested above to the case of covariates that are not binary. In the case of the composition effect, one approach that has been followed in the applied literature consists of sequentially adding covariates to the probability model used to compute the reweighting factor.65 For instance, one can start with a first covariate, compute the reweighting factor and the counterfactual statistics of interest by reweighting, then do the same thing after adding a second covariate, and so on.
One shortcoming of this approach is that the results depend on the order in which the covariates are sequentially introduced, just like results from a sequential decomposition for the mean also depend on the order in which the covariates are introduced in the regression. For instance, estimates of the effect of unions that fail to control for any other covariates may be overstated if union workers tend to be concentrated in industries that would pay high wages even in the absence of unions. As pointed out by Gelbach (2009), the problem with sequentially introducing covariates can be thought of as an omitted variable problem. Unless there are compelling economic reasons for first looking at the effect of some covariates without controlling for the other covariates, sequential decompositions will have the undesirable property of depending (strongly in some cases) on the order of the decomposition (path dependence).66
Fortunately, there is a way around the problem of path dependence when performing detailed decompositions using reweighting methods. The approach, however, still suffers from the adding-up problem and is more appropriate when only the effect of a particular factor is of interest. To illustrate this approach, consider a case with three covariates, X1, X2, and X3. In a sequential decomposition, one would first control for X1 only, then for X1 and X2, and finally for X1, X2, and X3. On the one hand, the regression coefficients on X1 and/or X2 in regressions that fail to control for X3 are biased because of the omitted variable problem. The corresponding elements of a detailed OB decomposition for the mean based on these estimated coefficients would, therefore, be biased too.
On the other hand, the coefficient on the last covariate to be introduced in the regression (X3) is not biased since the other covariates (X1 and X2) are also controlled for. So although order matters in a sequential regression approach, the effect of the last covariate to be introduced is not affected by the omitted variable bias.
The same logic applies in the case of detailed decompositions based on a reweighting approach. Intuitively, the difference between the counterfactual distribution one gets by reweighting with X1 and X2 only, and the one obtained by reweighting with X1, X2, and X3, should yield the appropriate contribution of X3 to the composition effect.
To see this more formally, consider the counterfactual distribution for one group that would prevail if the distribution of X3, conditional on X1 and X2, was as in the other group:
where the reweighting factor can be written as:
The first factor is the reweighting factor used to compute the aggregate decomposition in Section 4.5; the second is a reweighting factor based on all the covariates except the one considered for the detailed decomposition (X3). As before, Bayes’ rule can be used to show that:
Once again, this new reweighting factor is easily computed by running a probit or logit regression (with X1 and X2 as covariates) and using the predicted probabilities to estimate it.
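A sketch of this calculation, on simulated data with three independent binary covariates (all names, parameter values, and the choice of X1 as the covariate of interest are illustrative assumptions):

```python
import numpy as np

def logit_fit(X, d, iters=30):
    # Simple Newton logistic regression (numpy only, for illustration)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (d - p))
    return beta

rng = np.random.default_rng(4)
nA, nB = 4000, 4000
# Three binary covariates with different means in the two groups
XA = rng.binomial(1, [0.30, 0.50, 0.40], size=(nA, 3)).astype(float)
XB = rng.binomial(1, [0.55, 0.45, 0.60], size=(nB, 3)).astype(float)

X = np.vstack([XA, XB])
D = np.r_[np.zeros(nA), np.ones(nB)]              # D = 1 for group B
Z = np.column_stack([np.ones(X.shape[0]), X])     # all covariates
Z_part = Z[:, [0, 2, 3]]                          # drop X1, keep X2 and X3

# Odds-based reweighting factors for group A observations
pr = D.mean()
p_full = 1 / (1 + np.exp(-(Z @ logit_fit(Z, D))))
p_part = 1 / (1 + np.exp(-(Z_part @ logit_fit(Z_part, D))))
psi_full = (p_full / (1 - p_full)) * ((1 - pr) / pr)
psi_part = (p_part / (1 - p_part)) * ((1 - pr) / pr)

# Detailed factor for X1: ratio of the full factor to the partial factor,
# which changes only the conditional distribution of X1
psi_X1 = (psi_full / psi_part)[D == 0]

# Check: reweighting group A by psi_full reproduces group B's mean of X1
m_full = np.average(XA[:, 0], weights=psi_full[D == 0])
```

With only two logits (one including X1, one excluding it), the detailed factor for X1 is obtained as a simple ratio, mirroring the Bayes' rule argument in the text.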
This reweighting procedure for the detailed decomposition is summarized as follows:
Note that while this procedure does not suffer from path dependence, the contributions of the various covariates do not sum up to the total contribution of the covariates (the aggregate composition effect). The difference is an interaction effect between the different covariates which is harder to interpret.
This reweighting procedure shares most of the advantages of the other reweighting procedures we proposed for the aggregate decomposition. First, it is generally easy to implement in practice. Second, by using a flexible specification for the logit/probit, it is possible to get estimates of the various components of the decomposition that depend minimally on functional form assumptions. Third, the procedure yields efficient estimates.
With a large number of covariates, one needs to compute a sizable number of reweighting factors to obtain the various elements of the detailed decomposition. This can be tedious, although it does not require much in terms of computation since each probit/logit is easy to estimate. Another disadvantage of the suggested decomposition is that, although it does not suffer from the problem of path dependence, we are still left with an interaction term which is difficult to interpret. For these reasons, we suggest first using a regression-based approach like the RIF-regression approach discussed above, which is essentially as easy to compute as a standard OB decomposition. The reweighting procedure suggested here can then be used to probe these results, and make sure they are robust to the functional form assumptions implicit in the RIF-regression approach.
As we mentioned earlier, the method of Machado and Mata (2005) can be used to compute the wage structure sub-components of the detailed decomposition. These components are computed by sequentially switching the coefficients of the quantile regressions for each covariate from their estimated values for one group to their estimated values for the other group. This sequential switching cannot be used, however, to compute the sub-components of the composition effect of the detailed decomposition. Rather, Machado and Mata (2005) suggest an unconditional reweighting approach to do so. This does not provide a consistent estimate, since the effect of the reweighted covariate of interest gets confounded by the effect of other covariates correlated with it. For instance, if union workers are more concentrated in manufacturing, doing an unconditional reweighting on unions will also change the fraction of workers in manufacturing. In this sense, the effect of unions gets confounded by the effect of manufacturing.
This is a significant drawback since it is arguably more important to conduct a detailed decomposition for the composition effect than for the wage structure effect. As discussed earlier, there are always some interpretation problems with the detailed components of the wage structure effect because of the omitted group problem.
One solution is to use the conditional reweighting procedure described above instead. But once this type of reweighting approach is used, there is no need to estimate (conditional) quantile regressions. Unless the quantile regressions are of interest on their own, it is preferable to use a more consistent approach, such as the one based on the estimation of RIF-regressions, for estimating the detailed components of both the wage structure and composition effects.
In this section, we present three extensions to the decomposition methods discussed earlier. We first consider the case where either the ignorability or the zero conditional mean assumptions are violated because of self-selection or endogeneity of the covariates. We next discuss the situation where some of these problems can be addressed when panel data are available. We conclude the section by discussing the connection between conventional decomposition methods and structural modeling.
The various decomposition procedures discussed up to this point provide consistent estimates of the aggregate composition and wage structure effects under the ignorability assumption. Stronger assumptions, such as conditional mean independence (for decompositions of the mean) or straight independence, have to be invoked to perform the detailed decomposition. In this section we discuss some alternatives for estimating the decomposition when these assumptions fail. We mostly focus on the case of the OB decomposition of the mean, though some of the results we present could be extended to more general distributional statistics.
We consider three scenarios, first introduced in Section 2.1.6, under which the OB decomposition is inconsistent because of a failure of the ignorability or conditional independence assumption. In the first case, the problem is that individuals from the two groups may self-select differently into the labor market. For instance, participation decisions of men may be different from participation decisions of women in ways that are not captured by observable characteristics. In the second case, we consider what happens when individuals can self-select into one group or the other (for instance, union and non-union jobs) on the basis of unobservables. The third case is a standard endogeneity problem where the covariates are correlated with the error term. For example, education (one of the covariates) may be correlated with the error term because more able individuals tend to get more schooling.
One major concern when decomposing differences in wages between two groups with very different labor force participation rates is that the probability of participation depends on unobservables in different ways for groups A and B. This is a well known problem in the gender wage gap literature (Blau and Kahn, 2006; Olivetti and Petrongolo, 2008; Mulligan and Rubinstein, 2008, etc.) and in the black-white wage gap literature (Neal and Johnson, 1996).
Our estimates of the decomposition terms may be directly affected when workers in groups A and B self-select into the labor market differently. Controlling for selection based on observables and unobservables is therefore necessary to guarantee point identification of the decomposition terms. If no convincing model of self-selection is available, a more agnostic approach based on bounds has also recently been proposed. Following Machado (2009), we distinguish three branches of the self-selection literature: (i) selection on observables; (ii) selection on unobservables; and (iii) bounds.
Selection based on observables and, when panel data are available, on time-invariant unobserved components can be used to impute values for the missing data on wages of non-participants. Representative papers of this approach are Neal and Johnson (1996), Johnson et al. (2000), Neal (2004), Blau and Kahn (2006) and Olivetti and Petrongolo (2008). These papers are typically concerned with mean or median wages. However, extensions to cumulative distribution functions or to gaps in general distributional statistics could also be considered.
When labor market participation is based on unobservables, correction procedures for mean wages are also available. In these procedures, a control variate is added as a regressor in the conditional expectation function. The exclusion restriction that an available instrument does not belong to the conditional expectation function also needs to be imposed.67 Leading parametric and nonparametric examples are Heckman (1974, 1976), Duncan and Leigh (1980), Dolton and Makepeace (1986), Vella (1998) and Mulligan and Rubinstein (2008).
In this setting, the decomposition can be performed by adding a control variate, \lambda, to the regression. In most applications, \lambda is the usual inverse Mills' ratio term obtained by fitting a probit model of the participation decision. Note that the addition of this control variate slightly changes the interpretation of the decomposition. The full decomposition for the mean is now

\overline{Y}_A - \overline{Y}_B = (\overline{X}_A - \overline{X}_B)\hat{\beta}_A + \overline{X}_B(\hat{\beta}_A - \hat{\beta}_B) + (\hat{\theta}_A \overline{\lambda}_A - \hat{\theta}_B \overline{\lambda}_B),

where \hat{\theta}_A and \hat{\theta}_B are the estimated coefficients on the control variates. The decomposition provides a full accounting for the wage gap that also includes differences in both the composition of unobservables (the \overline{\lambda}'s) and in the return to unobservables (the \hat{\theta}'s). This treats symmetrically the contribution of observables (the X's) and unobservables in the decomposition.
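To illustrate the mechanics, here is a minimal sketch of a two-step (Heckman-style) selection-corrected decomposition on simulated data. The simulation design, all parameter values, and the exclusion of an instrument z from the wage equation are assumptions of the example, not part of the text.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate_group(n, beta, gamma, theta):
    x = rng.normal(1.0, 1.0, n)
    z = rng.normal(size=n)                     # instrument excluded from wages
    X = np.column_stack([np.ones(n), x])
    Zp = np.column_stack([np.ones(n), x, z])   # participation covariates
    u = rng.normal(size=n)                     # selection error
    d = (Zp @ gamma + u) > 0                   # labor force participation
    y = X @ beta + theta * u + rng.normal(scale=0.3, size=n)
    return X, Zp, y, d

def probit(Zp, d):
    # probit by maximum likelihood for the participation decision
    def nll(g):
        p = np.clip(norm.cdf(Zp @ g), 1e-9, 1 - 1e-9)
        return -(d * np.log(p) + (1 - d) * np.log(1 - p)).sum()
    return minimize(nll, np.zeros(Zp.shape[1]), method="BFGS").x

def two_step(X, Zp, y, d):
    g = probit(Zp, d)
    idx = Zp[d] @ g
    lam = norm.pdf(idx) / norm.cdf(idx)        # inverse Mills' ratio
    W = np.column_stack([X[d], lam])           # wage regressors + control variate
    coef, *_ = np.linalg.lstsq(W, y[d], rcond=None)
    return coef[:-1], coef[-1], X[d].mean(0), lam.mean()

XA, ZA, yA, dA = simulate_group(20000, np.array([1.0, 0.10]),
                                np.array([0.5, 0.3, 1.0]), theta=0.4)
XB, ZB, yB, dB = simulate_group(20000, np.array([0.8, 0.10]),
                                np.array([-0.3, 0.3, 1.0]), theta=0.8)
bA, thA, xbarA, lbarA = two_step(XA, ZA, yA, dA)
bB, thB, xbarB, lbarB = two_step(XB, ZB, yB, dB)

composition = (xbarA - xbarB) @ bA             # observables, composition
wage_struct = xbarB @ (bA - bB)                # observables, wage structure
selection   = thA * lbarA - thB * lbarB        # unobservables (control variates)
raw_gap = yA[dA].mean() - yB[dB].mean()
```

Because the control variate enters the regression linearly, the three components sum exactly to the raw gap in observed mean wages.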
A third approach uses bounds for the conditional expectation function of wages for groups A and B. With those bounds one can construct bounds for the wage structure effect, \Delta_S^\mu, and the composition effect, \Delta_X^\mu. Let p_g(x) = \Pr(P = 1 \mid X = x, G = g) for g = A, B. Then, letting P be a dummy indicating labor force participation, we can write the conditional expected wage as

E[Y_g \mid X = x] = E[Y_g \mid X = x, P = 1]\, p_g(x) + E[Y_g \mid X = x, P = 0]\,(1 - p_g(x)),

and therefore

E[Y_g \mid X = x, P = 1]\, p_g(x) + y_g^L (1 - p_g(x)) \le E[Y_g \mid X = x] \le E[Y_g \mid X = x, P = 1]\, p_g(x) + y_g^U (1 - p_g(x)),

where y_g^L and y_g^U are lower and upper bounds of the distribution of Y_g, for g = A, B. Therefore, replacing the unidentified conditional expectations by these lower and upper bounds yields corresponding bounds on \Delta_S^\mu and \Delta_X^\mu.
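A worst-case version of this bounding logic, applied to the overall mean gap rather than to the full conditional expectation, can be sketched as follows; the participation rates, support bounds, and simulated samples are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_bounds(y_obs, p_rate, y_lo, y_hi):
    """Worst-case bounds on E[Y] when Y is observed only for participants,
    who make up a fraction p_rate of the group; y_lo and y_hi bound the
    support of Y."""
    m = y_obs.mean()
    return m * p_rate + y_lo * (1 - p_rate), m * p_rate + y_hi * (1 - p_rate)

# hypothetical log-wage samples for participants in each group
yA = rng.normal(3.0, 0.5, 900)    # group A: 90% participation
yB = rng.normal(2.6, 0.5, 600)    # group B: 60% participation
loA, hiA = mean_bounds(yA, 0.9, y_lo=1.0, y_hi=5.0)
loB, hiB = mean_bounds(yB, 0.6, y_lo=1.0, y_hi=5.0)

# bounds on the overall mean wage gap between the two groups
gap_lo, gap_hi = loA - hiB, hiA - loB
```

Note that the width of each group's bounds is (1 - p)(y^U - y^L), so the bounds are tighter for the group with the higher participation rate; restrictions from economic theory can narrow them further.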
This bounding approach to the selection problem may also use restrictions motivated by econometric or economic theory to narrow the bounds, as in Manski (1990) and Blundell et al. (2007).
In the next case we consider, individuals can choose to belong to either group A or B. The leading example is the choice of union status by workers. The traditional way of dealing with the problem is to model the choice decision and correct for selection biases using control function methods.68
As discussed in Section 2.1.6, it is also possible to apply instrumental variable methods more directly without explicitly modeling the selection process into groups A and B. Imbens and Angrist (1994) show that this will identify the wage gap for the subpopulation of compliers who are induced by the instrument to switch from one group to the other.
The standard assumption used in the OB decomposition is that the outcome variable is linearly related to the covariates, X, and that the error term is conditionally independent of X, as in Eq. (1). Now consider the case where the conditional independence assumption fails because one or several of the covariates are correlated with the error term. Note that while the ignorability assumption may hold even if conditional independence fails, we consider a general case here where neither assumption holds.
As is well known, the conventional solution to the endogeneity problem is to use instrumental variable methods. For example, if we suspect years of education (one of the covariates) to be correlated with the error term in the wage equation, we can still estimate the model consistently provided that we have a valid instrument for years of education. The decomposition can then be performed by replacing the OLS estimates of the coefficients by their IV counterparts.
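As a sketch of the mechanics, the following numpy example simulates endogenous schooling with a hypothetical instrument, estimates each group's wage equation by 2SLS, and plugs the IV coefficients into the OB decomposition; the helpers `tsls` and `simulate_group` and all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def tsls(X, Z, y):
    # 2SLS: first-stage fitted values of X from the instruments Z,
    # then regress y on the fitted values
    G, *_ = np.linalg.lstsq(Z, X, rcond=None)
    Xhat = Z @ G
    b, *_ = np.linalg.lstsq(Xhat, y, rcond=None)
    return b

def simulate_group(n, beta):
    z = rng.normal(size=n)            # instrument (e.g. distance to college)
    ability = rng.normal(size=n)      # unobserved, correlated with schooling
    educ = 1.0 + 0.8 * z + 0.5 * ability + rng.normal(scale=0.3, size=n)
    X = np.column_stack([np.ones(n), educ])
    y = X @ beta + 0.6 * ability + rng.normal(scale=0.2, size=n)
    Z = np.column_stack([np.ones(n), z])
    return X, Z, y

XA, ZA, yA = simulate_group(20000, np.array([1.0, 0.10]))  # group A: return 0.10
XB, ZB, yB = simulate_group(20000, np.array([1.0, 0.06]))  # group B: return 0.06
bA, bB = tsls(XA, ZA, yA), tsls(XB, ZB, yB)

# OB decomposition using the IV estimates of the coefficients
xbarA, xbarB = XA.mean(axis=0), XB.mean(axis=0)
composition    = (xbarA - xbarB) @ bA
wage_structure = xbarB @ (bA - bB)
```

Since the intercept is included among the instruments, the 2SLS residuals have mean zero and the two components still sum exactly to the raw mean gap.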
Of course, in most cases it is difficult to come up with credible instrumentation strategies. It is important to remember, however, that even when the zero conditional mean assumption fails, the aggregate decomposition may remain valid, provided that ignorability holds. This would be the case, for example, when unobserved ability is correlated with education, but the correlation (more generally, the conditional distribution of ability given education) is the same in groups A and B. While we are not able to identify the separate contributions of education and ability in this context (unless we have an instrument), we know that there are no systematic ability differences between groups A and B once we have controlled for education. As a result, the aggregate decomposition remains valid.
An arguably better way of dealing with the selection and endogeneity problems mentioned above is to use panel data. Generally speaking, panel data methods can be used to compute consistent estimates of the \beta's in each of the three cases discussed earlier. For example, if the zero conditional mean assumption holds once we also control for a person-specific fixed effect in a panel of length T, we can consistently estimate the \beta's using standard panel data methods (fixed effects, first differences, etc.). This provides an alternative way of dealing with endogeneity problems when no instrumental variables are available.
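A minimal sketch of how the within (fixed-effects) estimator removes the bias that arises when a covariate is correlated with a person-specific effect; the panel dimensions and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 500, 4
alpha = rng.normal(size=n)                          # person fixed effects
x = 0.5 * alpha[:, None] + rng.normal(size=(n, T))  # covariate, corr. with alpha
beta = 0.7                                          # true return
y = beta * x + alpha[:, None] + rng.normal(scale=0.3, size=(n, T))

# pooled OLS ignoring the fixed effect: biased upward
b_ols = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# within (fixed-effects) estimator: demean each person's observations,
# which sweeps out alpha before estimating the slope
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()
```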
As we also discussed earlier, panel data can be used to impute wages for years where an individual is not participating in the labor market (e.g. Olivetti and Petrongolo, 2008). Note that in cases where groups are mutually exclusive (e.g. men vs. women), it may still be possible to estimate fixed effect models if the basic unit used is the firm (or a related concept) instead (Woodcock, 2008). Care has to be exercised in those circumstances to ensure that the firm fixed effect is the same for both female and male employees of the same firm. Another important issue with these models is the difficulty of interpreting the differences in the male and female intercepts, which may capture unobserved or omitted individual and firm effects.
Panel data methods have also been used to adjust for the selection into groups in cases where the same individual is observed in both groups A and B. For example, Freeman (1984) and Card (1996) estimate the union wage gap with panel data to control for the selection of workers into union status. Lemieux (1998) uses a more general approach where the return to the fixed effect may be different in the union and non-union sector. He also shows how to generalize the approach to the case of a decomposition of the variance.
Without loss of generality, assume that the return to the person-specific fixed effect \theta is normalized to 1 for non-union workers, while it is equal to \phi for union workers. The mean decomposition adjusted for fixed effects yields:

\overline{Y}_U - \overline{Y}_N = (\overline{X}_U - \overline{X}_N)\hat{\beta}_N + \overline{X}_U(\hat{\beta}_U - \hat{\beta}_N) + (\overline{\theta}_U - \overline{\theta}_N) + (\hat{\phi} - 1)\overline{\theta}_U.

The interpretation of the decomposition is the same as in a standard OB setting, except that the term \overline{\theta}_U - \overline{\theta}_N now represents the composition effect linked to non-random selection into the union sector, while the term (\hat{\phi} - 1)\overline{\theta}_U captures the corresponding wage structure effect.
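As a purely numerical illustration of this accounting, the sketch below plugs invented estimates (as might come out of a panel fixed-effects model with a sector-specific return to the person effect) into the decomposition and checks that the four components sum to the mean union/non-union gap.

```python
import numpy as np

# Purely hypothetical estimates; none of these numbers come from the text.
xbar_u, xbar_n = np.array([1.0, 13.2]), np.array([1.0, 12.4])  # [const, educ]
beta_u, beta_n = np.array([0.9, 0.08]), np.array([0.6, 0.10])
theta_u, theta_n = 0.15, 0.05   # mean person effects, union vs. non-union
phi = 1.3                       # return to the person effect in the union sector

mean_u = xbar_u @ beta_u + phi * theta_u   # mean union log wage
mean_n = xbar_n @ beta_n + theta_n         # mean non-union log wage

comp_obs   = (xbar_u - xbar_n) @ beta_n    # composition effect, observables
ws_obs     = xbar_u @ (beta_u - beta_n)    # wage structure effect, observables
comp_unobs = theta_u - theta_n             # selection on the person effect
ws_unobs   = (phi - 1.0) * theta_u         # differential return to person effect
```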
More sophisticated models with several levels of fixed effects have also been used in practice. For instance, Abowd et al. (2008) decompose inter-industry wage differentials into various components that include both individual- and firm-specific fixed effects.
In Section 2, we pointed out that decomposition methods are closely related to methods used in the program evaluation literature, where it is not necessary to estimate a fully specified structural model to estimate the main parameter of interest (the treatment effect). Provided that the ignorability assumption is satisfied, we can perform an aggregate decomposition without estimating an underlying structural model.
There are some limits, however, to what can be achieved without specifying any structure to the underlying economic problem. As we just discussed in Section 6.1, one problem is that the ignorability assumption may not hold. Under this scenario, more explicit modeling may be useful for correcting biases in the decomposition due to endogeneity, self-selection, etc.
Another problem that we now address concerns the interpretation of the wage structure components of the detailed decomposition. Throughout this chapter, we have proposed a number of ways of estimating these components for both the mean and more general distributional statistics. In the case of the mean, the interpretation of the detailed decomposition for the wage structure effect is relatively straightforward. Under the assumption (implicit in the OB decomposition) that the wage equations are truly linear and the errors have a zero conditional mean, we can think of the wage setting model as a fully specified structural model. The \beta coefficients are the "deep" structural parameters of the model, and these structural parameters are used directly to perform the decomposition.
Things become more complicated once we go beyond the mean. For instance, in the case of the variance (Section 4.1), recall from Eq. (26) that the wage structure effect depends on the parameters of both the model for the conditional mean and the model for the conditional variance.
Take, for example, the case where one of the covariates is the union status of workers. The parameter from the conditional variance model captures the "compression", or within-group, effect, while the parameter from the conditional mean model captures the "wage gap", or between-group, effect. These two terms have a distinct economic interpretation as they reflect different channels through which union wage policies tend to impact the wage distribution.
In the case of more general distributional statistics, the wage structure effect depends on an even larger number of underlying parameters capturing the relationship between the covariates and higher order moments of the distribution. As a result, the wage structure part of the detailed decomposition becomes even harder to interpret, as it potentially depends on a large number of underlying parameters.
In some cases, this may not pose a problem from an interpretation point of view. For instance, we may only care about the overall effect of unions, irrespective of whether it is coming from a between- or within-group effect (or corresponding components for higher order moments). But in other cases this type of interpretation may be unsatisfactory. Consider, for example, the effect of education on the wage structure. Like unions, education may influence wage dispersion through a between- or within-group channel. The between-group component is linked to the traditional return to education (effect on conditional means), but education also has a substantial effect on within-group dispersion (see, e.g., Lemieux, 2006b). All these effects are combined together in the decomposition methods proposed in Section 5, which is problematic if we want to know, for instance, the specific contribution of changes in the return to education to the growth in wage inequality.
In these circumstances, we need to use a more structural approach to obtain a more economically interpretable decomposition of the wage structure effect. The decomposition method of Juhn et al. (1993) is, in fact, an early example of a more structurally-based decomposition. In their setting, the model for the conditional mean is interpreted as an underlying human capital pricing equation. Likewise, changes in residual wage dispersion (given X) are interpreted as reflecting an increase in the return to unobservable skills.
As we discussed in Section 4.3, the fact that Juhn et al. (1993) provide a richer interpretation of the wage structure effect by separating the within- and between-group components is an important advantage of the method. We also mentioned, however, that the interpretation of the decomposition is not as clear for distributional statistics beyond the variance, and that the procedure typically imposes substantial restrictions on the data that may or may not hold. By contrast, a method like DFL imposes very few restrictions (provided that the probit/logit model used for reweighting is reasonably flexible), though it is more limited in terms of the economic interpretation of the wage structure effect.
In light of this, the challenge is to find a way of imposing a more explicit structure on the economic problem while making sure the underlying model "fits" the data reasonably well. One possible way of achieving this goal is to go back to the structural form introduced in Section 2, Y_g = m_g(X, \varepsilon), and use recent results from the literature on nonparametric identification of structural functions to identify the functions m_g. As discussed in Section 2.2.1, this can be done by invoking results obtained by Matzkin (2003), Blundell and Powell (2007) and Imbens and Newey (2009). Generally speaking, it is possible to identify the functions m_g nonparametrically under the assumptions of independence of \varepsilon and X (Assumption 8), and strict monotonicity of m_g in \varepsilon (Assumption 9).
But while it is possible, in principle, to nonparametrically identify the functions m_g, there is no guarantee that the resulting estimates will be economically interpretable. As a result, a more common approach in the empirical literature is to write down a more explicit (and parametric) structural model, but to look carefully at whether the model adequately fits the data. Once the model has been estimated, simulation methods can then be used to compute a variety of counterfactual exercises. These counterfactuals then form the basis of a more economically interpretable decomposition of the wage structure effect.
To take a specific example, consider the Keane and Wolpin (1997) model of career progression of young men, where educational and occupational choices are explicitly modeled using a dynamic programming approach. After carefully looking at whether the estimated model is rich enough to adequately fit the distribution of wages, occupational choices, and educational achievement, Keane and Wolpin use the estimated model to decompose the distribution of lifetime utility (itself computed using the model). They conclude that 90 percent of the variance of lifetime utility is due to skill endowment heterogeneity (schooling at age 16 and unobserved type). By contrast, choices and other developments happening after age 16 have a relatively modest impact on the variance of lifetime utility.69 The general idea here is to combine structural estimation and simulation methods to quantify the contribution of the different parameters of interest to some decompositions of interest. These issues are discussed in more detail in the chapter on structural methods by Keane et al. (2011).
One last point is that the interpretation problem linked to the wage structure effect does not apply to the detailed decomposition for the composition effect. In that case, each component is based on a clear counterfactual exercise that does not require an underlying structure to be interpretable. The aggregate decomposition is based on the following counterfactual exercise: what would be the distribution of outcomes for group B if the distribution of the covariates for group B were the same as for group A? Similarly, the detailed decomposition is based on a conditional version of the counterfactual. For example, one may want to ask what would be the distribution of outcomes for group B if the distribution of unionization (or another covariate) for group B was the same as for group A, conditional on the distribution of the other covariates remaining the same.
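The aggregate counterfactual just described can be approximated by reweighting, in the spirit of DFL: estimate a pooled logit of group membership and reweight one group by the conditional odds of belonging to the other. The sketch below does this on simulated data; group labels, distributions, and coefficients are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Question: what would group B's mean outcome be if B had group A's
# distribution of X? (All values below are invented for illustration.)
rng = np.random.default_rng(5)
nA, nB = 30000, 30000
xA = rng.normal(1.5, 1.0, nA)                  # group A: higher mean covariate
xB = rng.normal(1.0, 1.0, nB)
yB = 0.5 + 0.4 * xB + rng.normal(scale=0.2, size=nB)

# pooled logit of group membership (A = 1) on X, fit by maximum likelihood
x = np.concatenate([xA, xB])
d = np.concatenate([np.ones(nA), np.zeros(nB)])
Z = np.column_stack([np.ones_like(x), x])

def nll(g):
    idx = Z @ g
    return np.logaddexp(0.0, idx).sum() - d @ idx

ghat = minimize(nll, np.zeros(2), method="BFGS").x

# reweight group B observations by the conditional odds of being in group A
pA = expit(np.column_stack([np.ones(nB), xB]) @ ghat)
w = (pA / (1.0 - pA)) * (nB / nA)
cf_mean = np.average(yB, weights=w)            # counterfactual mean for group B
```

Here the true counterfactual mean is 0.5 + 0.4 E[X | group A] = 1.1, which the reweighted average recovers up to sampling error.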
These interpretation issues aside, it may still be useful to use a more structural approach when we are concerned about the validity of the decomposition because of self-selection, endogeneity, etc. For instance, in Keane and Wolpin (1997), the choice of schooling and occupation is endogenous. Using standard decomposition methods to look, for instance, at the contribution of the changing distribution of occupations to changes in the distribution of wages would yield invalid results because occupational choice is endogenous. In such a context, structural modeling, like the IV and selection methods discussed in Section 6.1, can help recover the elements of the decomposition when standard methods fail because of endogeneity or self-selection. But the problem here is quite distinct from issues with the wage structure effect, where standard decomposition methods are limited because of an interpretation problem, and where structural modeling provides a natural way of resolving it. By contrast, solutions to the problem of endogeneity or self-selection are only as good as the instruments (or related assumptions) used to correct for these problems. As a result, the value added of the structural approach is much more limited in the case of the composition effect than in the case of the wage structure effect.
This last point is very clear in the emerging literature where structural modeling is used in conjunction with experimental data. For example, Card and Hyslop (2005) use experimental data from the Self Sufficiency Project (SSP) to look at why individuals offered a generous work subsidy are less likely to receive social assistance (SA). By definition, there is no composition effect since the treatment and control groups are selected by random assignment. In that context, the average treatment effect precisely corresponds to the wage structure effect (or "SA" structure effect in this context) in a decomposition of the difference between the treatment and control group. It is still useful, however, to go beyond this aggregate decomposition to better understand the mechanisms behind the measured treatment effect. Card and Hyslop (2005) do so by estimating a dynamic search model.
This provides much more insight into the "black box" of the treatment effect than what a traditional decomposition exercise would yield. Remember that the detailed wage structure component in an OB-type decomposition is based on the difference between the returns to different characteristics in the two groups. In a pure experimental context like the SSP project, this simply reflects some heterogeneity in the treatment effect across different subgroups. Knowing about the importance of heterogeneity in the treatment effect is important from the point of view of the generalizability of the results. But unlike a structural approach, it provides relatively little insight into the mechanisms underlying the treatment effect.
The development of new decomposition methods has been a fertile area of research over the last 10-15 years. Building on the seminal work of Oaxaca (1973) and Blinder (1973), a number of procedures that go beyond the mean have been suggested and used extensively in practice. In this chapter, we have reviewed these methods and suggested a number of “best practices” for researchers interested in these issues. We have also illustrated how these methods work in practice by discussing existing applications and working through a set of empirical examples throughout the chapter.
Another important and recent development in this literature has linked decomposition methods to the large and growing literature on program evaluation and treatment effects. This connection is useful for several reasons. First, it helps clarify some interpretation issues with decompositions. In particular, results from the treatment effects literature can be used to show, for example, that we can give a structural interpretation to an aggregate decomposition under the assumption of ignorability. Another benefit of this connection is that formal results about the statistical properties of treatment effects estimators can also be directly applied to decomposition methods. This helps guide the choice of decomposition methods that have good statistical properties, and conduct inference on these various components of the estimated decomposition.
But this connection with the treatment effects literature also comes at a cost. While no structural modeling is required to perform a decomposition or estimate a treatment effect, these approaches leave open the question of what are the economic mechanisms behind the various elements of the decomposition (or behind the treatment effect). Now that the connection between decomposition methods and the treatment effects literature has been well established, an important direction for future research will be to improve the connection between decomposition methods and structural modeling.
The literature on inequality provides some useful hints on how this connection can be useful and improved upon. In this literature, decomposition methods have helped uncover the most important factors behind the large secular increase in wage inequality. Those include the rising return to education, de-unionization, and the decline in the minimum wage, to mention a few examples. These findings have spurred a large number of more conceptual studies trying to provide formal economic explanations for these important phenomena. In principle, these explanations can then be more formally confronted with the data by writing down and estimating a structural model, and using simulation methods to quantify the role of these explanations.
This suggests a two-step research strategy where "off-the-shelf" decomposition methods, like those discussed in this chapter, can first be used to uncover the main forces underlying an economic phenomenon of interest. More "structural" decomposition methods could then be used to better understand the economics behind the more standard decomposition results. We expect such a research strategy to be a fruitful area of research in the years to come.
References
Abowd, John M., Kramarz, Francis, Lengerman, Paul, Roux, Sebastien, 2008. Persistent inter-industry wage differences: rent sharing and opportunity costs. Working paper
James Albrecht, Anders Björklund, Susan Vroman. Is there a glass ceiling in Sweden? Journal of Labor Economics. 2003;21:145-178.
Joseph G. Altonji, Rebecca Blank. Race and gender in the labor market. In: O. Ashenfelter, D. Card., editors. Handbook of Labor Economics, vol. 3C. Amsterdam: Elsevier Science, 1999.
Joseph G. Altonji, Rosa L. Matzkin. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica. 2005;73:1053-1102.
Altonji, Joseph G., Bharadwaj, P., Lange, Fabian, 2008, Changes in the characteristics of American youth: Implications for adult outcomes. Working paper, Yale University
Susan Athey, Guido W. Imbens. Identification and inference in nonlinear difference-in-differences models. Econometrica. 2006;74:431-497.
David H. Autor, Frank Levy, Richard Murnane. The skill content of recent technological change: an empirical exploration. Quarterly Journal of Economics. 2003;118:1279-1333.
Autor, David H., Katz, Lawrence B., Kearney, Melissa S., 2005. Rising Wage Inequality: The Role of Composition and Prices. NBER Working Paper No. 11628, September
R. Barsky, John Bound, K. Charles, J. Lupton. Accounting for the black-white wealth gap: a nonparametric approach. Journal of the American Statistical Association. 2002;97:663-673.
Thomas K. Bauer, Silja Göhlmann, Mathias Sinning. Gender differences in smoking behavior. Health Economics. 2007;19:895-909.
Thomas K. Bauer, Mathias Sinning. An extension of the Blinder–Oaxaca decomposition to nonlinear models. Advances in Statistical Analysis. 2008;92:197-206.
Marianne Bertrand, Kevin F. Hallock. The gender gap in top corporate jobs. Industrial and Labor Relations Review. 2001;55:3-21.
Martin Biewen. Measuring the effects of socio-economic variables on the income distribution: an application to the East German transition process. Review of Economics and Statistics. 2001;83:185-190.
Marianne P. Bitler, Jonah B. Gelbach, Hilary W. Hoynes. What mean impacts miss: distributional effects of welfare reform experiments. American Economic Review. 2006;96:988-1012.
Dan Black, Amelia Haviland, Seth Sanders, Lowell Taylor. Gender wage disparities among the highly educated. Journal of Human Resources. 2008;43:630-659.
Francine D. Blau, Lawrence M. Kahn. The gender earnings gap: learning from international comparisons. American Economic Review. 1992;82:533-538.
Francine D. Blau, Lawrence M. Kahn. Swimming upstream: trends in the gender wage differential in the 1980s. Journal of Labor Economics. 1997;15:1-42.
Francine D. Blau, Lawrence M. Kahn. Understanding international differences in the gender pay gap. Journal of Labor Economics. 2003;21:106-144.
Francine D. Blau, Lawrence M. Kahn. The US gender pay gap in the 1990s: slowing convergence. Industrial & Labor Relations Review. 2006;60(1):45-66.
Alan Blinder. Wage discrimination: reduced form and structural estimates. Journal of Human Resources. 1973;8:436-455.
Richard Blundell, James L. Powell. Censored regression quantiles with endogenous regressors. Journal of Econometrics. 2007;141:65-83.
Richard Blundell, Amanda Gosling, Hidehiko Ichimura, Costas Meghir. Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica. 2007;75:323-363.
Francois Bourguignon. Decomposable income inequality measures. Econometrica. 1979;47:901-920.
F. Bourguignon, Francisco H.G. Ferreira. Decomposing changes in the distribution of household incomes: methodological aspects. In: F. Bourguignon, F.H.G. Ferreira, N. Lustig, editors. The Microeconomics of Income Distribution Dynamics in East Asia and Latin America. World Bank; 2005:17-46.
F. Bourguignon, Francisco H.G. Ferreira, Philippe G. Leite. Beyond Oaxaca–Blinder: Accounting for differences in household income distributions. Journal of Economic Inequality. 2008;6:117-148.
Busso, Matias, DiNardo, John, McCrary, Justin, 2009. New Evidence on the Finite Sample Properties of Propensity Score Matching and Reweighting Estimators. IZA Discussion Paper No. 3998
Kristin F. Butcher, John DiNardo. The Immigrant and native-born wage distributions: evidence from United States censuses. Industrial and Labor Relations Review. 2002;56:97-121.
Glen Cain. The economic analysis of labor market discrimination: a survey. In: O.C. Ashenfelter, R. Layard, editors. Handbook of Labor Economics, vol. 1. North-Holland; 1986:709-730.
Card, David, 1992. The Effects of Unions on the Distribution of Wages: Redistribution or Relabelling? NBER Working Paper 4195. National Bureau of Economic Research, Cambridge, Mass
David Card. The effect of unions on the structure of wages: a longitudinal analysis. Econometrica. 1996;64:957-979.
David Card, Dean R. Hyslop. Estimating the effects of a time-limited earnings subsidy for welfare-leavers. Econometrica. 2005;73:1723-1770.
Kenneth Y. Chay, David S. Lee. Changes in relative wages in the 1980s: returns to observed and unobserved skills and black-white wage differentials. Journal of Econometrics. 2000;99(1):1-38.
Chernozhukov, Victor, Fernandez-Val, Ivan, Melly, Blaise, 2009. Inference on Counterfactual Distributions. CeMMAP working paper CWP09/09
Victor Chernozhukov, Ivan Fernandez-Val, A. Galichon. Quantile and probability curves without crossing. Econometrica. 2010;78:1093-1126.
Daniel Chiquiar, Gordon H. Hanson. International migration, self-selection, and the distribution of wages: Evidence from Mexico and the United States. Journal of Political Economy. 2005;113:239-281.
Jeremiah Cotton. On the decomposition of wage differentials. Review of Economics and Statistics. 1988;70:236-243.
Frank A. Cowell. On the structure of additive inequality measures. Review of Economic Studies. 1980;47:521-531.
Denison, E.F., 1962. The sources of economic growth in the United States and the alternatives before us. Supplementary Paper No. 13. Committee for Economic Development, New York
John DiNardo, Nicole M. Fortin, Thomas Lemieux. Labor market institutions and the distribution of wages, 1973-1992: a semiparametric approach. Econometrica. 1996;64:1001-1044.
John DiNardo, David S. Lee. Economic impacts of new unionization on private sector employers: 1984-2001. The Quarterly Journal of Economics. 2004;119:1383-1441.
John DiNardo, Thomas Lemieux. Diverging male inequality in the United States and Canada, 1981-1988: do institutions explain the difference? Industrial and Labor Relations Review. 1997;50:629-651.
Peter John Dolton, Gerald H. Makepeace. Sample selection and male-female earnings differentials in the graduate labour market. Oxford Economic Papers. 1986;38:317-341.
Denise J. Doiron, W. Craig Riddell. The impact of unionization on male-female earnings differences in Canada. Journal of Human Resources. 1994;29:504-534.
Stephen G. Donald, David A. Green, Harry J. Paarsch. Differences in wage distributions between Canada and the United States: an application of a flexible estimator of distribution functions in the presence of covariates source. Review of Economic Studies. 2000;67:609-633.
Gregory M. Duncan, Duane E. Leigh. Wage determination in the union and nonunion sectors: a sample selectivity approach. Industrial and Labor Relations Review. 1980;34:24-34.
Egel, Daniel, Graham, Bryan, Pinto, Cristine, 2009. Efficient estimation of data combination problems by the method of auxiliary-to-study tilting. mimeo
William E. Even, David A. Macpherson. Plant size and the decline of unionism. Economics Letters. 1990;32:393-398.
Robert W. Fairlie. The absence of the African-American owned business: an analysis of the dynamics of self–employment. Journal of Labor Economics. 1999;17:80-108.
Robert W. Fairlie. An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Journal of Economic and Social Measurement. 2005;30:305-316.
Judith Fields, Edward N. Wolff. Interindustry wage differentials and the gender wage gap. Industrial and Labor Relations Review. 1995;49:105-120.
Sergio Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica. 2007;75:259-276.
Firpo, Sergio, 2010. Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures. EESP-FGV. mimeo
Firpo, Sergio, Fortin, Nicole M., Thomas, Lemieux, 2007. Decomposing Wage Distributions using Recentered Influence Functions Regressions. mimeo, University of British Columbia
Sergio Firpo, Nicole M. Fortin, Thomas Lemieux. Unconditional quantile regressions. Econometrica. 2009;77(3):953-973.
Fitzenberger, Bernd, Kohn, Karsten, Wang, Qingwei, 2010. The erosion of union membership in Germany: determinants, densities, decompositions. Journal of Population Economics (forthcoming)
Silverio Foresi, Franco Peracchi. The conditional distribution of excess returns: an empirical analysis. Journal of the American Statistical Association. 1995;90:451-466.
Nicole M. Fortin, Thomas Lemieux. Rank regressions, wage distributions, and the gender gap. Journal of Human Resources. 1998;33:610-643.
Nicole M. Fortin. The gender wage gap among young adults in the United States: the importance of money vs. people. Journal of Human Resources. 2008;43:886-920.
Richard B. Freeman. Unionism and the dispersion of wages. Industrial and Labor Relations Review. 1980;34:3-23.
Richard B. Freeman. Longitudinal analysis of the effect of trade unions. Journal of Labor Economics. 1984;2:1-26.
Richard B. Freeman. How much has deunionization contributed to the rise of male earnings inequality? In: Sheldon Danziger, Peter Gottschalk, editors. Uneven Tides: Rising Income Inequality in America. New York: Russell Sage Foundation; 1993:133-163.
Markus Frolich. Finite-sample properties of propensity-score matching and weighting estimators. Review of Economics and Statistics. 2004;86:77-90.
Javier Gardeazabal, Arantza Ugidos. More on the identification in detailed wage decompositions. Review of Economics and Statistics. 2004;86:1034-1057.
Gelbach, Jonah B., 2002. Identified Heterogeneity in Detailed Wage Decompositions. mimeo, University of Maryland at College Park
Gelbach, Jonah B., 2009. When Do Covariates Matter? And Which Ones, and How Much? mimeo, Eller College of Management, University of Arizona
Joanna Gomulka, Nicholas Stern. The employment of married women in the United Kingdom, 1970–1983. Economica. 1990;57:171-199.
Amanda Gosling, Stephen Machin, Costas Meghir. The changing distribution of male wages in the U.K,. Review of Economic Studies. 2000;67:635-666.
William H. Greene. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Pearson Education; 2003.
James Heckman. Shadow prices, market wages and labor supply. Econometrica. 1974;42:679-694.
James Heckman. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement. 1976;5:475-492.
James Heckman. Sample selection bias as a specification error. Econometrica. 1979;47:153-163.
James J. Heckman, Jeffrey Smith, Nancy Clements. Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Review of Economic Studies. 1997;64(4):487-535.
James J. Heckman, Hidehiko Ichimura, Petra Todd. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Review of Economic Studies. 1997;64:605-654.
James J. Heckman, Hidehiko Ichimura, Jeffrey Smith, Petra Todd. Characterizing selection bias using experimental data. Econometrica. 1998;66:1017-1098.
Heywood, John S., Parent, Daniel, 2009. Performance Pay and the White-Black Wage Gap. mimeo, McGill University
Kiesuke Hirano, Guido W. Imbens, Geert Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71:1161-1189.
Paul W. Holland. Statistics and causal inference. Journal of the American Statistical Association. 1986;81(396):945-960.
Hoffman, Florian, 2009. An Empirical Model of Life-Cycle Earnings and Mobility Dynamics. University of Toronto, Department of Economics. mimeo
William Horrace, Ronald L. Oaxaca. Inter-industry wage differentials and the gender wage gap: an identification problem. Industrial and Labor Relations Review. 2001;54:611-618.
Guido W. Imbens, Joshua Angrist. Identification and estimation of local average treatment effects. Econometrica. 1994;62:467-476.
Guido W. Imbens, Whitney K. Newey. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica. 2009;77(5):1481-1512.
Jann, Ben, 2005. Standard errors for the Blinder-Oaxaca decomposition. German Stata Users’ Group Meetings 2005. Available from http://repec.org/dsug2005/oaxaca_se_handout.pdf
Ben Jann. The Oaxaca-Blinder decomposition for linear regression models. Stata Journal. 2008;8:435-479.
Frank Lancaster Jones. On decomposing the wage gap: a critical comment on Blinder’s method. Journal of Human Resources. 1983;18:126-130.
William Johnson, Yuichi Kitamura, Derek Neal. Evaluating a simple method for estimating black-white gaps in median wages. American Economic Review. 2000;90:339-343.
D.W. Jorgenson, Z. Griliches. The explanation of productivity change. Review of Economic Studies. 1967;34:249-283.
Chinhui Juhn, Kevin M. Murphy, Brooks Pierce. Accounting for the slowdown in black-white wage convergence. In: M.H. Kosters, editor. Workers and Their Wages: Changing Patterns in the United States. Washington: American Enterprise Institute, 1991.
Chinhui Juhn, Kevin M. Murphy, Brooks Pierce. Wage inequality and the rise in returns to skill. Journal of Political Economy. 1993;101:410-442.
Michael P. Keane, Kenneth I. Wolpin. The career decisions of young men. Journal of Political Economy. 1997;105:473-522.
Michael P. Keane, Petra E. Todd, Kenneth I. Wolpin. The structural estimation of behavioral models: discrete choice dynamic programming methods and applications. In: O. Ashenfelter, D. Card, editors. Handbook of Labor Economics, vol. 4A. Amsterdam: Elsevier Science; 2011:331-461.
John W. Kendrick. Productivity Trends in the United States. Princeton: Princeton University Press; 1961.
Peter Kennedy. Interpreting dummy variables. Review of Economics and Statistics. 1986;68:174-175.
Kline, Pat, 2009. Blinder-Oaxaca as a Reweighting Estimator. UC Berkeley mimeo
Roger Koenker, G. Bassett. Regression quantiles. Econometrica. 1978;46:33-50.
Alan B. Krueger, Lawrence H. Summers. Efficiency wages and the inter-industry wage structure. Econometrica. 1988;56(2):259-293.
John M. Krieg, Paul Storer. How much do students matter? applying the Oaxaca decomposition to explain determinants of adequate yearly progress. Contemporary Economic Policy. 2006;24:563-581.
Thomas Lemieux. Estimating the effects of unions on wage inequality in a panel data model with comparative advantage and non-random selection. Journal of Labor Economics. 1998;16:261-291.
Thomas Lemieux. Decomposing changes in wage distributions: a unified approach. The Canadian Journal of Economics. 2002;35:646-688.
Thomas Lemieux. Post-secondary education and increasing wage inequality. American Economic Review. 2006;96:195-199.
Thomas Lemieux. Increasing residual wage inequality: composition effects, noisy data, or rising demand for skill? American Economic Review. 2006;96:461-498.
H. Gregg Lewis. Unionism and Relative Wages in the United States. Chicago: University of Chicago Press; 1963.
H. Gregg Lewis. Union Relative Wage Effects: A Survey. Chicago: University of Chicago Press; 1986.
David Neumark. Employers’ discriminatory behavior and the estimation of wage discrimination. Journal of Human Resources. 1988;23:279-295.
José F. Machado, José Mata. Counterfactual decomposition of changes in wage distributions using quantile regression. Journal of Applied Econometrics. 2005;20:445-465.
Machado, Cecilia, 2009. Selection, Heterogeneity and the Gender Wage Gap. Columbia University, Economics Department. mimeo
Charles F. Manski. Nonparametric bounds on treatment effects. American Economic Review. 1990;80(2):319-323.
Rosa L. Matzkin. Nonparametric estimation of nonadditive random functions. Econometrica. 2003;71(5):1339-1375.
P.J. McEwan, J.H. Marshall. Why does academic achievement vary across countries? Evidence from Cuba and Mexico. Education Economics. 2004;12:205-217.
Melly, Blaise, 2006. Estimation of counterfactual distributions using quantile regression. University of St. Gallen, Discussion Paper
Blaise Melly. Decomposition of differences in distribution using quantile regression. Labour Economics. 2005;12:577-590.
Casey B. Mulligan, Yona Rubinstein. Selection, investment, and women’s relative wages over time. Quarterly Journal of Economics. 2008;123:1061-1110.
Derek A. Neal, W. Johnson. The role of premarket factors in black-white wage differences. Journal of Political Economy. 1996;104:869-895.
Derek A. Neal. The measured black-white wage gap among women is too small. Journal of Political Economy. 2004;112:S1-S28.
Hugo Ñopo. Matching as a tool to decompose wage gaps. Review of Economics and Statistics. 2008;90:290-299.
Ronald Oaxaca. Male-female wage differentials in urban labor markets. International Economic Review. 1973;14:693-709.
Ronald L. Oaxaca, Michael R. Ransom. On discrimination and the decomposition of wage differentials. Journal of Econometrics. 1994;61:5-21.
Ronald L. Oaxaca, Michael R. Ransom. Calculation of approximate variances for wage decomposition differentials. Journal of Economic and Social Measurement. 1998;24:55-61.
Ronald L. Oaxaca, Michael R. Ransom. Identification in detailed wage decompositions. Review of Economics and Statistics. 1999;81:154-157.
Ronald L. Oaxaca. The challenge of measuring labor market discrimination against women. Swedish Economic Policy Review. 2007;14:199-231.
Claudia Olivetti, Barbara Petrongolo. Unequal pay or unequal employment? a cross-country analysis of gender gaps. Journal of Labor Economics. 2008;26:621-654.
June O’Neill, Dave O’Neill. What do wage differentials tell us about labor market discrimination? In: Solomon Polachek, Carmel Chiswick, Hillel Rapoport, editors. The Economics of Immigration and Social Policy. Research in Labor Economics. 2006;24:293-357.
Cornelia W. Reimers. Labor market discrimination against hispanic and black men. Review of Economics and Statistics. 1983;65:570-579.
James Robins, Andrea Rotnizky, Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846-866.
Paul R. Rosenbaum, Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.
Paul R. Rosenbaum, Donald B. Rubin. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association. 1984;79:516-524.
Christoph Rothe. Nonparametric estimation of distributional policy effects. Journal of Econometrics. 2009;155:56-70.
Anthony F. Shorrocks. The class of additively decomposable inequality measures. Econometrica. 1980;48:613-625.
Anthony F. Shorrocks. Inequality decomposition by population subgroups. Econometrica. 1984;52:1369-1385.
Shorrocks, Anthony F., 1999. Decomposition Procedures for Distributional Analysis: A Unified Framework Based on the Shapley Value. University of Essex, Department of Economics. mimeo
Robert Solow. Technical change and the aggregate production function. Review of Economics and Statistics. 1957;39:312-320.
Sohn, Ritae, 2008. The Gender Math Gap: Is It Growing? mimeo, SUNY Albany
Frank Vella. Estimating models with sample selection bias: a survey. Journal of Human Resources. 1998;33:127-169.
Simon D. Woodcock. Wage differentials in the presence of unobserved worker, firm, and match heterogeneity? Labour Economics. 2008;15:772-794.
Myeong-Su Yun. A simple solution to the identification problem in detailed wage decomposition. Economic Inquiry. 2005;43:766-772. with Erratum, Economic Inquiry (2006), 44: 198
Myeong-Su Yun. Identification problem and detailed Oaxaca decomposition: a general solution and inference. Journal of Economic and Social Measurement. 2008;33:27-38.
1 See also Kendrick (1961), Denison (1962), and Jorgenson and Griliches (1967).
2 We limit our discussion to so-called “regression-based” decomposition methods, where the decomposition focuses on explanatory factors, rather than decomposition methods that apply to additively decomposable indices, where the decomposition pertains to population sub-groups. Bourguignon and Ferreira (2005) and Bourguignon et al. (2008) are recent surveys discussing these methods.
3 The decomposition can also be written by exchanging the reference group used for the wage structure and composition effects as follows:
Alternatively, the so-called three-fold decomposition uses the same reference group for both effects, but introduces a third interaction term: . While these various versions of the basic decomposition are used in the literature, using one or the other does not involve any specific estimation issues. For the sake of simplicity, we thus focus on the one decomposition introduced in the text for most of the chapter.
4 Firpo (2010) shows that for any smooth functional of the reweighted cdf, efficiency is achieved. In other words, decomposing standard distributional statistics such as the variance, the Gini coefficient, or the interquartile range using the reweighting method suggested by DiNardo et al. (1996) will be efficient. Note, however, that this result does not apply to the (more complicated) case of the density considered by DiNardo et al. (1996) where non-parametric estimation is involved.
5 One possible explanation for the lack of discussion of identification assumptions is that they were reasonably obvious in the case of the original OB decompositions for the mean. The situation is quite a bit more complex, however, in the case of distributional statistics other than the mean. Note also that some recent papers have started addressing these identification issues in more detail. See, for instance, Firpo et al. (2007), and Chernozhukov et al. (2009).
6 Alternatively, the overlapping issue can be bypassed by excluding Hispanics from the Black and White groups.
7 Many papers (DiNardo et al., 1996; Machado and Mata, 2005; Chernozhukov et al., 2009) have proposed methodologies to estimate and decompose entire distributions (or densities) of wages, but the decomposition results are ultimately quantified through the use of distributional statistics. Analyses of the entire distribution look at several of these distributional statistics simultaneously.
8 When we construct the counterfactual , we choose to be the reference group and the group whose wages are “adjusted”. Thus counterfactual women’s wages, if they were paid like men, would be , although the gender gap example maps less naturally into the treatment effects framework.
9 Chernozhukov et al. (2009) discuss the conditions under which the two types of decomposition are equivalent.
10 To see more explicitly how the conditional distribution depends on the distribution of , note that we can write under the assumption that is monotonic in (see Assumption 9 introduced below).
11 See, for instance, Rosenbaum and Rubin (1983, 1984), Heckman et al. (1997a,b) and Heckman et al. (1998).
12 Differences in the distribution of the are fairly constrained under the ignorability assumption. While the unconditional distribution of may differ between group and (because of differences in the distribution of ), the conditional distribution of has to be the same for the two groups.
13 This monotonicity assumption can also be found in the works of Matzkin (2003), Altonji and Matzkin (2005), Imbens and Newey (2009), and Athey and Imbens (2006).
14 The rank pairing of two outcome variables and will be disrupted if the rank of remains the same because at a mass point corresponding to the minimum wage, while the rank of continues to increase in the absence of minimum wage at the rank. Heckman et al. (1997a,b) consider the case of mass points at zero, but the case of multiple mass points is much more difficult.
15 Note that it is possible to relax the homoskedasticity assumption while maintaining the assumption of a single price of unobservables , as in Chay and Lee (2000). We do not follow this approach here to simplify the presentation.
16 Note that we depart somewhat from our previous notation, as retains some components of the structural form of group B, which will disappear in below.
17 See Blau and Kahn (1992, 2003) for an application of the methodology to the study of gender wage differentials across countries.
18 Only and are observed.
19 We note that this last decomposition corresponds, in the OB context, to the so-called three-fold decomposition presented in footnote 3.
20 The union/non-union wage gaps or private/public sector wage gaps are more amenable to choice.
21 Note that some analyses (e.g. Neal and Johnson, 1996) take great care to focus on pre-market variables.
22 The empirical applications of the OB procedure in this chapter use Jann (2008) procedures in Stata.
23 As is common in the gender pay gap literature, we begin with the counterfactual that uses group (males) as the reference group. In column (3) of Table 3, we present the decomposition that corresponds to Eq. (15), that is, uses group (females) as the reference group.
24 In particular, see the discussion of the case of scalable or categorical variables below.
25 This interpretation issue also arises in other applications that use categorical variables, notably the inter-industry wage differentials literature. In this literature, following the seminal Krueger and Summers (1988) paper on inter-industry wage differentials, the standard practice is to express industry differentials as deviations from an employment-share weighted mean, a well-defined average.
26 In the first regression, the composition effect is given by , and in the second regression, because , .
27 Actually, problems arise when there are more than two categories. Blinder (1973, footnote 13) and Oaxaca (2007) correctly point out that in the case of a binary dummy variable, these problems do not occur.
28 This problem is different from a “true” identification problem which arises when multiple values of a parameter of interest are consistent with a given model and population.
29 As pointed out by Gardeazabal and Ugidos (2004), such restrictions can have some disturbing implications. In the case of educational categories, it rules out an outcome where group members would earn higher returns than group members for all levels of education.
30 In the gender wage gap literature, when the reference wage structure is the male wage structure (group ) the means among women will be used in Eq. (22).
31 It is indeed easy to see that .
32 The for the omitted category is simply the first and last components of Eq. (22), since for that category.
33 and are the matrices of covariates (of dimension and ) for groups and , respectively.
34 This “pooled” decomposition is easily implemented using the option “pooled” in Jann (2008) “oaxaca” procedure in Stata 9.2.
35 When considering covariates , we use the subscript to denote the group whose characteristics are “adjusted” with reweighting.
36 We show in Section 4 that the reweighting factor is defined as the ratio of the marginal distributions of for groups and , . As a result, the reweighted distribution of for group should be the same as the original distribution of in group . This implies that the mean value of in the reweighted sample, , should be the same as the mean value of for group , .
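The property described in this footnote is easy to verify numerically. The sketch below uses synthetic data with a single discrete covariate, so the reweighting factor can be computed directly as a ratio of cell frequencies; with continuous covariates one would instead estimate it with a logit or probit. All data and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic covariate (say, three education levels) with different
# distributions in groups A and B.
x_a = rng.choice([0, 1, 2], size=50_000, p=[0.5, 0.3, 0.2])
x_b = rng.choice([0, 1, 2], size=50_000, p=[0.2, 0.3, 0.5])

# Reweighting factor: ratio of the marginal distributions of x,
# estimated here by sample frequencies since x is discrete.
p_a = np.array([(x_a == k).mean() for k in range(3)])
p_b = np.array([(x_b == k).mean() for k in range(3)])
psi = (p_a / p_b)[x_b]  # weight attached to each group-B observation

# The reweighted group-B mean of x matches the group-A mean.
mean_b_reweighted = np.average(x_b, weights=psi)
print(mean_b_reweighted, x_a.mean())
```

With frequency-based weights the two means coincide up to floating-point error, which is exactly the diagnostic suggested in the footnote.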
37 When the conditional expectation is non-linear, the OLS estimate of can be interpreted as the one which minimizes the square of the specification error over the distribution of . Since the expected value of the OLS estimate of depends on the distribution of , differences in over two samples may either reflect true underlying differences in the conditional expectation (i.e. in the wage structure), or “spurious” differences linked to the fact that the distribution of is different in the two samples. For example, if is convex in , the expected value of will tend to grow as the distribution of shifts up, since the relationship between and gets steeper as becomes larger.
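A small simulation illustrates the point: with a convex conditional expectation (here E[y|x] = x squared, a hypothetical functional form), the OLS slope estimated on a sample with higher values of x is steeper even though the underlying relationship is identical in both samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical convex conditional expectation: E[y|x] = x**2, plus noise.
def draw_y(x):
    return x**2 + rng.normal(scale=0.5, size=x.size)

# Two samples with the same "wage structure" but different distributions of x.
x_low = rng.uniform(0, 1, 100_000)
x_high = rng.uniform(1, 2, 100_000)

# OLS slope in each sample (np.polyfit returns [slope, intercept]).
b_low = np.polyfit(x_low, draw_y(x_low), 1)[0]
b_high = np.polyfit(x_high, draw_y(x_high), 1)[0]
print(b_low, b_high)  # the estimated slope is steeper where x is larger
```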
38 This corresponds to an experimental setting where, for example, regression analysis was used to assess the impact of various soils and fertilizers () on agricultural yields .
39 See, for instance, Bourguignon (1979), Cowell (1980), and Shorrocks (1980, 1984).
40 See for example, Theorem B.4 in Greene (2003).
41 Estimating these simple models of the conditional cross-sectional variance is a special case of the large time-series literature on the estimation of auto-regressive conditional heteroskedasticity models (ARCH, GARCH, etc.).
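As a minimal cross-sectional analogue of these conditional variance models, one can fit the conditional mean by OLS and then regress the squared residuals on the covariate. The data below are synthetic and the linear variance specification is only an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic heteroskedastic data: Var(w|x) increases with x.
n = 100_000
x = rng.uniform(0, 1, n)
w = 1.0 + 0.5 * x + rng.normal(size=n) * (0.2 + 0.3 * x)

# Step 1: fit the conditional mean by OLS.
b1, b0 = np.polyfit(x, w, 1)
resid = w - (b0 + b1 * x)

# Step 2: regress squared residuals on x to model the conditional variance.
g1, g0 = np.polyfit(x, resid**2, 1)
print(g0, g1)  # fitted linear model of Var(w|x)
```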
42 See Albrecht et al. (2003), who look at whether there is a glass ceiling in female earnings, and Bitler et al. (2006), who study the distributional effects of work incentive programs on labor supply.
43 Juhn et al. (1993) actually consider multiple time periods and propose an additional counterfactual where the returns to observables are set to their mean across time periods, a more complex counterfactual treatment.
44 See also Lemieux (2002).
45 For each random draw , MM also draw a vector of covariates from the observed data and perform the prediction for this value only. Melly (2005) discusses more efficient ways of computing distributions using this conditional quantile regression approach.
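The simulation step described here can be sketched as follows. For simplicity, the example uses a single discrete covariate, so the conditional u-quantile can be computed directly within cells rather than by quantile regression; the data and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic wage data with one binary covariate (say, two education groups).
n = 20_000
x = rng.choice([0, 1], size=n)
w = 1.0 + 0.5 * x + rng.normal(size=n) * (0.3 + 0.2 * x)

def mm_draws(w, x, n_draws, rng):
    """Machado-Mata-style simulation: for each draw, pick u ~ U(0,1),
    draw a covariate value from the data, and record the conditional
    u-quantile of w at that covariate value."""
    sims = np.empty(n_draws)
    for i in range(n_draws):
        u = rng.uniform()
        xi = x[rng.integers(x.size)]
        sims[i] = np.quantile(w[x == xi], u)
    return sims

sims = mm_draws(w, x, 2_000, rng)
# The simulated draws recover the marginal distribution of w.
print(np.median(sims), np.median(w))
```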
46 The estimates were computed with Melly’s implementation “rqdeco” in Stata.
47 See Melly (2005) for a detailed description of this alternative procedure. Gosling et al. (2000) and Autor et al. (2005) also use a similar idea in their empirical applications to changes in the distribution of wages over time.
48 Machado and Mata (2005) (pp. 449-450) suggest computing the detailed decomposition for the composition effect using an unconditional reweighting procedure. This is invalid as a way of performing the decomposition for the same reason that an OB decomposition would be invalid if the coefficient used for one covariate was estimated without controlling for the other covariates. We propose a conditional reweighting procedure in the next section that deals adequately with this issue.
49 This view of course makes more sense when some policy or other change has taken place over time (see Biewen, 2001).
50 On the other hand, by analogy with the treatment effects literature, Firpo et al. (2007) use time period 0 as the reference group.
51 The estimator suggested by Hirano et al. (2003) is a series estimator applied to the case of a logit model. The idea is to add increasingly higher order polynomial terms in the covariates as the size of the sample increases. Importantly, they also show that this approach yields an efficient estimate of the treatment effect.
52 The two most popular kernel functions are the Gaussian and the Epanechnikov kernel.
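For reference, the standard definitions of these two kernels, together with a quick numerical check that each integrates to one:

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

# Both kernels are densities: a simple Riemann-sum check of the integral.
u = np.linspace(-5, 5, 200_001)
du = u[1] - u[0]
area_g = (gaussian(u) * du).sum()
area_e = (epanechnikov(u) * du).sum()
print(area_g, area_e)  # both close to one
```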
53 By contrast, in the original DiNardo et al. (1996) decomposition, workers in 1988 (time period 1) were reweighted to look like workers in 1979 (time period 0). The counterfactual asks what the distribution of wages would have looked like if workers’ characteristics had remained at their 1979 levels.
54 In small samples, it is important to ensure that these estimated weights sum up to the number of actual observations in the sample, though this is done automatically in packages like Stata. See Busso et al. (2009) for more detail.
55 The analytical standard errors have to take account of the fact that the logit or probit model used to construct the reweighting factor is estimated. Firpo et al. (2007) show how to perform this adjustment. In practice, however, it is generally simpler to bootstrap the whole estimation procedure (both the estimation of the logit/probit to construct the weights and the computation of the various elements of the decomposition).
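A bootstrap of the whole procedure can be sketched as below: the weights are re-estimated inside every replication, so their sampling variability is reflected in the standard error. The example uses synthetic data and a discrete covariate (so the weights are frequency ratios rather than logit-based); it is an illustration of the principle, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic samples for groups A and B: a wage and one binary covariate.
n = 5_000
x_a = rng.choice([0, 1], size=n, p=[0.6, 0.4])
x_b = rng.choice([0, 1], size=n, p=[0.3, 0.7])
w_b = 1.2 + 0.6 * x_b + rng.normal(scale=0.3, size=n)

def composition_effect(w_b, x_b, x_a):
    """Mean of w in group B minus its reweighted (counterfactual) mean,
    with the weights re-estimated inside the function."""
    p_a = np.array([(x_a == k).mean() for k in (0, 1)])
    p_b = np.array([(x_b == k).mean() for k in (0, 1)])
    psi = (p_a / p_b)[x_b]
    return w_b.mean() - np.average(w_b, weights=psi)

# Bootstrap the entire procedure: resample both groups and recompute
# everything (weights included) in each replication.
reps = []
for _ in range(200):
    i_a = rng.integers(n, size=n)
    i_b = rng.integers(n, size=n)
    reps.append(composition_effect(w_b[i_b], x_b[i_b], x_a[i_a]))
se = np.std(reps)
print(composition_effect(w_b, x_b, x_a), se)
```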
56 In principle, other popular methods in the program evaluation literature such as matching could be used instead of reweighting.
57 Foresi and Peracchi (1995) proposed to use a sequence of logit models to estimate the conditional distribution of excess returns.
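The idea of estimating the conditional distribution with a sequence of binary-response models can be sketched as follows. With a single binary covariate the model at each threshold is saturated, so cell proportions coincide with the fitted logit probabilities; the data are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data: wage w and one binary covariate x.
n = 50_000
x = rng.choice([0, 1], size=n)
w = 1.0 + 0.5 * x + rng.normal(scale=0.4, size=n)

# One binary-response model per threshold c: estimate Pr(w <= c | x).
# With a single binary x the model is saturated, so cell proportions
# equal the fitted logit probabilities.
grid = np.quantile(w, np.linspace(0.05, 0.95, 19))
F_cond = {k: np.array([(w[x == k] <= c).mean() for c in grid]) for k in (0, 1)}

# A counterfactual distribution is then obtained by integrating the
# conditional distribution over some covariate distribution (here 50/50).
F_counter = 0.5 * F_cond[0] + 0.5 * F_cond[1]
print(F_counter[:3])
```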
58 Donald et al. (2000) use a more general specification of the proportional hazard model where and are allowed to vary for different values (segments) of .
59 The estimation was performed using Melly’s “counterfactual” Stata procedure. The computation of the variance and Gini coefficient were based on the estimation of 100 centiles.
60 Chernozhukov et al. (2009) use the method of Chernozhukov et al. (2010) to ensure that the function is monotonic.
61 Firpo et al. (2009) also propose other more flexible estimation procedures.
62 Using a reweighted regression approach can be particularly important in the cases of -regressions that are unlikely to be linear for distributional statistics besides the mean.
63 The reweighting error reflects the fact that the composition effect in the reweighted-regression decomposition, , is not exactly equal to the standard composition effect when the reweighted mean is not exactly equal to .
64 Note that in DFL, it is the opposite; group is the 1988 time period and group is the 1979 time period.
65 See, for example, Butcher and DiNardo (2002) and Altonji et al. (2008).
66 Both Butcher and DiNardo (2002) and Altonji et al. (2008) consider cases where there is indeed a good reason for following a particular order in the decomposition. For instance, Altonji et al. (2008) argue that, when looking at various youth outcomes, one should first control for predetermined factors like gender and race before controlling for other factors determined later in life (AFQT score, educational achievement, etc.). In such a situation, the decomposition is econometrically interpretable even if gender and race are introduced first without controlling for the other factors.
67 As is well known, selection models can be identified on the basis of functional restrictions even when an excluded instrumental variable is not available. This is no longer viewed, however, as a credible identification strategy. We, therefore, only focus on the case where an instrumental variable is available.
68 See for instance, the survey of Lewis (1986) who concludes that these methods yield unreliable estimates of the union wage gap. Given these negative results and the lack of credible instruments for unionization, not much progress has been made in this literature over the last two decades. One exception is DiNardo and Lee (2004) who use a regression discontinuity design.
69 Note, however, that Hoffman (2009) finds that skill endowments have a sizably smaller impact in a richer model that incorporates comparative advantage (across occupations), search frictions, and exogenous job displacement.