Developing new decomposition methods for distributional statistics other than the mean has been an active research area over the last 15 years. In this section, we discuss a number of procedures that have been suggested for decomposing general distributional statistics. We focus on the case of the aggregate decomposition, though some of the suggested methods can be extended to the case of the detailed decomposition, which we discuss in Section 5. We begin by looking at the simpler case of a variance decomposition, obtained by extending the classic analysis-of-variance approach (based on a between/within-group decomposition) to a general case with covariates. We then turn to newer approaches based on various “plugging in” methods, such as JMP’s residual imputation method and Machado and Mata’s (2005) conditional quantile regression method. Finally, we discuss methods that focus on the estimation of counterfactuals for the entire distribution. These methods are based either on reweighting or on the estimation of the conditional distribution.
Most of this recent research was initially motivated by the dramatic growth in earnings inequality in the United States. Prior to that episode, the literature was considering particular summary measures of inequality such as the variance of logs and the Gini coefficient. For instance, Freeman (1980, 1984) looks at the variance of log wages in his influential work on the effect of unions on wage dispersion. This research establishes that unions tend to reduce wage dispersion as measured by the variance of log wages. Freeman shows that despite the inequality-enhancing effect of unions on the between-group component of inequality, the overall effect of unions is to reduce inequality because of the even larger effect of unions on within-group inequality.
One convenient feature of the variance is that it can be readily decomposed into a within- and between-group component. Interestingly, related work in the inequality literature shows that other measures such as the Gini or Theil coefficient are also decomposable into a within- and between-group component.39
Note that the between vs. within decomposition is quite different in spirit from the aggregate or detailed OB decomposition discussed in the previous section. There are advantages and disadvantages to this alternative approach. On the positive side, looking at between- and within-group effects can help understand economic mechanisms, as in the case of unions, or the sources of inequality growth (Juhn et al., 1993).
On the negative side, the most important drawback of the between vs. within decomposition is that it does not hold in the case of many other interesting inequality measures such as the interquartile ranges, the probability density function, etc. This is a major shortcoming since looking at what happens where in the distribution is important for identifying the factors behind changes or differences in distributions. Another drawback of the between vs. within approach is that it does not provide a straightforward way of looking at the specific contribution of each covariate, i.e. to perform a detailed decomposition. One final drawback is that with a rich enough set of covariates the number of possible groups becomes very large, and some parametric restrictions have to be introduced to keep the estimation problem manageable.
In response to these drawbacks, a new set of approaches has been proposed for performing aggregate decompositions on any distributional statistic. Some approaches, such as Juhn et al. (1993), Donald et al. (2000), and Machado and Mata (2005), can be viewed as extensions of the variance decomposition approach where the whole conditional distribution (instead of just the conditional variance) is estimated using parametric approaches. Others, such as DiNardo et al. (1996), completely bypass the problem of estimating conditional distributions and are, as such, closer cousins to estimators proposed in the program evaluation literature.
Before considering more general distributional statistics, it is useful to recall the steps used to obtain the standard OB decomposition. The first step is to assume that the conditional expectation of $Y_g$ given $X$ is linear, i.e. $E(Y_g \mid X) = X\beta_g$. This follows directly from the linearity and zero conditional mean assumptions (Assumptions 10 and 11) introduced in Section 2. Using the law of iterated expectations, it then follows that the unconditional mean is $E(Y_g) = E(X \mid D_g = 1)\beta_g$. This particular property of the mean is then used to compute the OB decomposition.
In light of this, it is natural to think of extending this type of procedure to the case of the variance. Using the analysis of variance formula, the unconditional variance of $Y_g$ can be written as:40

$$Var(Y_g) = E\left[Var(Y_g \mid X)\right] + Var\left[E(Y_g \mid X)\right],$$

where the expectations are taken over the distribution of $X$. The first component of the equation is the within-group component (also called residual variance), while the second component is the between-group component (also called regression variance). Writing $\sigma_g^2 \equiv E[Var(Y_g \mid X)]$ and, under linearity, $Var[E(Y_g \mid X)] = \beta_g' V_g \beta_g$, where $V_g \equiv Var_g(X)$, we can write the difference in variances across groups $B$ and $A$ as

$$\Delta_O^{\sigma} = Var(Y_B) - Var(Y_A) = (\sigma_B^2 - \sigma_A^2) + \beta_B' V_B \beta_B - \beta_A' V_A \beta_A.$$

A few manipulations yield $\Delta_O^{\sigma} = \Delta_S^{\sigma} + \Delta_X^{\sigma}$, where

$$\Delta_S^{\sigma} = (\sigma_B^2 - \sigma_A^2) + \beta_B' V_B \beta_B - \beta_A' V_B \beta_A$$

and

$$\Delta_X^{\sigma} = \beta_A' (V_B - V_A) \beta_A.$$
While it is straightforward to estimate the regression coefficients ($\beta_A$ and $\beta_B$) and the covariance matrices of the covariates ($V_A$ and $V_B$), the within-group (or residual) variance terms $\sigma_A^2$ and $\sigma_B^2$ also have to be estimated to compute the decomposition.
Several approaches have been used in the literature to estimate $\sigma_A^2$ and $\sigma_B^2$. The simplest possible approach is to assume that the error term is homoscedastic, in which case $Var(Y_A \mid X) = \sigma_A^2$ and $Var(Y_B \mid X) = \sigma_B^2$ do not depend on $X$, and the two relevant variance parameters can be estimated from the sample variance of the residuals in the regressions. The homoscedasticity assumption is very strong, however. When errors are heteroscedastic, differences between $\sigma_B^2$ and $\sigma_A^2$ can reflect spurious composition effects, in which case the decomposition will attribute to the wage structure effect ($\Delta_S^{\sigma}$) what should really be a composition effect ($\Delta_X^{\sigma}$). Lemieux (2006b) shows that this was a major problem when looking at changes in residual wage inequality in the United States since the late 1980s.
A simple way of capturing at least some of the relationship between the covariates and the conditional variance is to compute the variance of residuals for a limited number of subgroups or “cells”. For instance, Lemieux (2006b) shows estimates for 20 different subgroups of workers (based on education and experience), while Card (1996) divides the sample into five quintiles based on predicted wages $X\hat{\beta}$.
Finally, one could attempt to estimate a more general specification for the conditional variance by running a “second step” model for the squared regression residuals on some specification of the covariates. For example, assuming that $Var(Y_g \mid X) = X\gamma_g$, we can estimate $\gamma_g$ by running a regression of the squared residuals $\hat{\varepsilon}_g^2$ on $X$.41 Since $\sigma_g^2 = E_g(X)\gamma_g$ under this specification, we can then write the two aggregate components of the variance decomposition as:

$$\Delta_S^{\sigma} = E_B(X)(\gamma_B - \gamma_A) + \beta_B' V_B \beta_B - \beta_A' V_B \beta_A$$

and

$$\Delta_X^{\sigma} = \left(E_B(X) - E_A(X)\right)\gamma_A + \beta_A' (V_B - V_A) \beta_A.$$
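As a concrete sketch, the two-step variance decomposition described in this subsection can be implemented with OLS for the conditional mean and a second-step regression of squared residuals on the covariates. The function name and the exact split of terms (group A as reference) are our own illustrative choices:

```python
import numpy as np

def variance_decomposition(y_a, X_a, y_b, X_b):
    """Aggregate decomposition of Var_B(Y) - Var_A(Y) with a linear
    second-step model for the conditional variance (illustrative sketch)."""
    def fit(y, X):
        Z = np.column_stack([np.ones(len(y)), X])
        beta = np.linalg.lstsq(Z, y, rcond=None)[0]
        e2 = (y - Z @ beta) ** 2
        gamma = np.linalg.lstsq(Z, e2, rcond=None)[0]  # Var(Y|X) approx. Z @ gamma
        return Z, beta, gamma
    Z_a, b_a, g_a = fit(y_a, X_a)
    Z_b, b_b, g_b = fit(y_b, X_b)
    V_a = np.cov(Z_a.T, bias=True)
    V_b = np.cov(Z_b.T, bias=True)
    za, zb = Z_a.mean(0), Z_b.mean(0)
    # composition effect: change the distribution of X, hold group-A structure
    dX = zb @ g_a - za @ g_a + b_a @ (V_b - V_a) @ b_a
    # wage structure effect: change returns and conditional variance, X as in B
    dS = zb @ (g_b - g_a) + b_b @ V_b @ b_b - b_a @ V_b @ b_a
    return dX, dS
```

By construction, the two components sum to the total difference in (population-formula) variances across the two samples.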
Compared to the standard OB decomposition for the mean, which only requires estimating a (regression) model for the conditional mean, in the case of the variance we also need to estimate a model for the conditional variance. While this is quite feasible in practice, it already points to the additional modeling choices involved when decomposing distributional parameters beyond the mean.
Since the complexity of decomposition methods already increases for a distributional measure as simple and convenient as the variance, these problems will be compounded in the case of other distributional measures such as quantiles. Indeed, we show in the next subsection that for quantiles, attempts at generalizing the approach suggested here require estimating the entire conditional distribution of $Y$ given $X$. This is a more daunting estimation challenge, and we now discuss solutions that have been suggested in the literature.
An important limitation of summary measures of dispersion such as the variance, the Gini coefficient or the Theil coefficient is that they provide little information regarding what happens where in the distribution. This is an important shortcoming in the literature on changes in wage inequality where many important explanations of the observed changes have specific implications for specific points of the distribution. For instance, the minimum wage explanation suggested by DiNardo et al. (1996) should only affect the bottom end of the distribution. At the other extreme, explanations based on how top executives are compensated should only affect the top of the distribution. Other explanations based on de-unionization (Freeman, 1993; Card, 1992; DiNardo et al., 1996) and the computerization of “routine” jobs (Autor et al., 2003) tend to affect the middle (or “lower middle”) of the distribution. As a result, it is imperative to go beyond summary measures such as the variance to better understand the sources of growing wage inequality.
Going beyond summary measures is also important in many other interesting economic problems, such as the sources of the gender wage gap and the impact of social programs on labor supply.42 The most common approach for achieving this goal is to perform a decomposition for various quantiles (or differences between quantiles like the 90-10 gap) of the distribution. Unfortunately, as we point out in the introduction, it is much more difficult to decompose quantiles than the mean or even the variance. The basic problem is that the law of iterated expectations does not hold in the case of quantiles, i.e. $Q_{\tau}(Y) \neq E_X\left[Q_{\tau}(Y \mid X)\right]$, where $Q_{\tau}(Y)$ is the $\tau$th quantile of the (unconditional) distribution of $Y$, and $Q_{\tau}(Y \mid X)$ is the corresponding conditional quantile.
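The failure of the law of iterated expectations for quantiles is easy to verify numerically. The small simulation below (our own example, with two arbitrary covariate "cells") shows that the unconditional mean is the average of conditional means, while the unconditional median is not the average of conditional medians:

```python
import numpy as np

# Two equally likely covariate cells with different conditional distributions.
rng = np.random.default_rng(0)
y1 = rng.normal(0.0, 1.0, 100_000)  # Y | X = x1
y2 = rng.normal(2.0, 4.0, 100_000)  # Y | X = x2
y = np.concatenate([y1, y2])

# The law of iterated expectations holds for the mean...
mean_gap = y.mean() - 0.5 * (y1.mean() + y2.mean())

# ...but not for quantiles: the unconditional median falls well below the
# average of the conditional medians in this example.
median_gap = np.median(y) - 0.5 * (np.median(y1) + np.median(y2))
print(mean_gap, median_gap)
```

Here `mean_gap` is zero up to rounding, while `median_gap` is far from zero because the more dispersed cell pulls the mixture's median toward its own lower tail.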
As it turns out, one (implicitly) needs to know the entire conditional distribution of $Y$ given $X$ to compute $Q_{\tau}(Y)$. To see this, note that

$$F_g(y) = \int F_g(y \mid X) \, dF_g(X),$$

where $F_g(y \mid X)$ is the cumulative distribution of $Y$ conditional on $X$ in group $g$. Given $F_g(\cdot)$, it is possible to implicitly use this equation to solve for $Q_{\tau}(Y_g) = F_g^{-1}(\tau)$. It is also clear that in order to do so we need to know the conditional distribution function $F_g(y \mid X)$, as opposed to just the conditional mean and variance, as was the case for the variance. Estimating an entire conditional distribution function for each value of $X$ is a difficult problem. Various decomposition methods that we discuss in detail below suggest different ways of handling this challenge.
But before covering them in detail, we recall the basic principles underlying these methods. As in Section 2, we focus on cumulative distributions since any standard distributional statistic, such as a quantile, can be directly computed from the cumulative distribution. For instance, quantiles of the counterfactual distribution can be obtained by inverting $F_C$: $Q_{C,\tau} = F_C^{-1}(\tau)$.
For the sake of presentational simplicity, we introduce a simplified notation relative to Section 2. We use $F_g(y)$ instead of $F_{Y_g}(y)$ to represent the marginal distribution of $Y$, and $F_g(y \mid X)$ to represent the conditional distributions, for $g = A, B$, introduced in Eq. (4). We use the shorthand $F_C(y)$ to represent the key counterfactual distribution of interest introduced in Eq. (5), which mixes the distribution of characteristics of group B with the wage structure from group A:

$$F_C(y) = \int F_A(y \mid X) \, dF_B(X). \tag{27}$$
Three general approaches have been suggested in the decomposition literature for estimating the counterfactual distribution $F_C(y)$. A first general approach, initially suggested by Juhn et al. (1993), replaces each value of $Y$ for group $B$ with a counterfactual value $Y^C = g(Y)$, where $g(\cdot)$ is an imputation function. The idea is to replace the residual from group $B$ with a counterfactual residual that holds the same rank in the conditional distribution of residuals as it did in the original distribution. As we discussed in Section 2.2.3, this is done in practice using a residual imputation procedure. Machado and Mata (2005) and Autor et al. (2005) have later suggested other approaches, based on conditional quantile regressions, to transform a wage observation $Y_{B,i}$ into a counterfactual observation $Y_{B,i}^C$.
A second approach proposed by DiNardo et al. (1996) [DFL] is based on the following manipulation of Eq. (27):

$$F_C(y) = \int F_A(y \mid X)\, \Psi(X) \, dF_A(X), \tag{28}$$

where $\Psi(X) = dF_B(X)/dF_A(X)$ is a reweighting factor. This makes it clear that the counterfactual distribution $F_C(y)$ is simply a reweighted version of the distribution $F_A(y)$. The reweighting factor is a simple function of $X$ that can be easily estimated using standard methods such as a logit or probit. The basic idea of the DFL approach is to start with group $A$, and then replace the distribution of $X$ of group $A$ ($F_A(X)$) with the distribution of $X$ of group $B$ ($F_B(X)$) using the reweighting factor $\Psi(X)$.
The third set of approaches also works with Eq. (27), starting with group $B$ and then replacing the conditional distribution $F_B(y \mid X)$ with $F_A(y \mid X)$. Doing so is more involved, from an estimation point of view, than following the DFL approach. The problem is that the conditional distributions depend on both $y$ and $X$, while the reweighting factor only depends on $X$.
Under this third set of approaches, one needs to directly estimate the conditional distribution $F_A(y \mid X)$. Parametric approaches for doing so were suggested by Donald et al. (2000), who used a hazard model approach, and Fortin and Lemieux (1998), who suggested estimating an ordered probit. More recently, Chernozhukov et al. (2009) suggest estimating distributional regressions (e.g. a logit for each value of $y$). In all cases, the idea is to replace the conditional distribution for group $B$, $F_B(y \mid X)$, with an estimate of the conditional distribution $F_A(y \mid X)$ obtained using one of these methods.
In the next subsections, we discuss how these various approaches can be implemented. We also present some results regarding their statistical properties, and address computational issues linked to their implementation.
As we explain above, Juhn et al. (1993) propose an imputation approach where the wage $Y_{B,i}$ from group $B$ is replaced by a counterfactual wage in which both the returns to observables and unobservables are set to be as in group $A$. The implementation of this procedure is divided in two steps. First, unobservables are replaced by counterfactual unobservables, as in Eq. (9). Second, counterfactual returns to observables are also imputed, as in Eq. (12).43
Under the assumption of additive linearity (Assumption 10), the original wage equation for individual $i$ from group $B$,

$$Y_{B,i} = X_{B,i}\beta_B + \varepsilon_{B,i},$$

allows the returns to unobservables to be group-specific. Under the assumption of rank preservation (Assumption 14), the first counterfactual is computed as

$$Y_{B,i}^{C,1} = X_{B,i}\beta_B + \varepsilon_{B,i}^A,$$

where

$$\varepsilon_{B,i}^A = F_{\varepsilon_A}^{-1}\left(\tau_{B,i} \mid X_{B,i}\right)$$

and $\tau_{B,i} = F_{\varepsilon_B}(\varepsilon_{B,i} \mid X_{B,i})$ is the conditional rank of $\varepsilon_{B,i}$ in the distribution of residuals for group $B$. A second counterfactual is then obtained by also replacing the returns to observable characteristics:

$$Y_{B,i}^{C,2} = X_{B,i}\beta_A + \varepsilon_{B,i}^A.$$

Under the assumptions of linearity and rank preservation, this counterfactual wage should be the same as $Y_{B,i}^C$, the counterfactual wage obtained by replacing the wage structure of group $B$ with that of group $A$.
In practice, it is straightforward to estimate $\beta_A$ and $\beta_B$ using OLS under the assumptions of linearity and zero conditional mean. It is much less clear, however, how to perform the residual imputation procedure described above. Under the strong assumption that the regression residuals are independent of $X$, it follows that $F_{\varepsilon_g}(\varepsilon \mid X) = F_{\varepsilon_g}(\varepsilon)$, so the conditional rank $\tau_{B,i}$ is simply the rank of $\varepsilon_{B,i}$ in the marginal distribution of residuals.
Under this independence assumption, one simply needs to compute the rank of the residual in the marginal distribution (distribution over the whole sample) of residuals for group $B$, and then pick the corresponding residual in the marginal distribution of residuals for group $A$. If $\hat{\varepsilon}_{B,i}$ is at the 70th percentile of the distribution of residuals of group $B$, then $\varepsilon_{B,i}^A$ will simply be the 70th percentile of the distribution of residuals for group $A$. In practice, most applications of the JMP procedure use this strong assumption of independence because there is little guidance on how a conditional imputation procedure could be used instead.
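The rank-preserving imputation step under independence can be sketched in a few lines (the function name is our own; the two inputs are the estimated OLS residuals of the two groups):

```python
import numpy as np

def jmp_impute_residuals(res_b, res_a):
    """Replace each group-B residual with the group-A residual at the
    same percentile of the marginal residual distribution (the JMP
    imputation step under the independence assumption)."""
    # percentile rank of each group-B residual, in (0, 1)
    ranks = (np.argsort(np.argsort(res_b)) + 0.5) / len(res_b)
    # corresponding quantile of the group-A residual distribution
    return np.quantile(res_a, ranks)
```

Each imputed residual keeps its original rank, but the spread of the imputed residuals is that of group A, which is exactly what the first JMP counterfactual requires.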
Since independence of regression residuals is unrealistic, a more accurate implementation of JMP would require deciding how to condition on $X$ when performing the imputation procedure. If $X$ consists of a limited number of groups or “cells”, then one could perform the imputation within each of these groups. In general, however, it is difficult to know how to implement this ranking/imputation procedure in more general cases. As a result, other procedures such as the quantile method of Machado and Mata (2005) are increasingly being used as an alternative to JMP.
Another limitation of the JMP procedure is that there is no natural way of extending it to the case of the detailed decomposition for the composition effect.
One advantage of the two-step procedure is that it provides a way of separating the between- and within-group components, as in a variance decomposition. This plays an important role in the inequality literature, since JMP concluded that most of the inequality growth from the 1960s to the 1980s was linked to the residual inequality component.
It is not clear, however, what is meant by between- and within-group components in the case of distributional measures like the 90-10 gap that are not decomposable. A better way of justifying JMP is that the wage equation represents a structural model where $X$ are observed skills, while $\varepsilon$ represents unobserved skills. One can then perform simulation exercises asking what happens to the distribution when one replaces either the returns to observed skills or the returns to unobserved skills (see also Section 2.2.3).
This economic interpretation also requires, however, some fairly strong assumptions. The two most important assumptions are the linearity of the model (Assumption 10) and rank preservation (Assumption 14). While linearity can be viewed as a useful approximation, rank preservation is much stronger since it means that someone with the same unobserved skills would be in the exact same position, conditional on $X$, in either group $A$ or $B$. Just adding measurement error to the model would result in a violation of rank preservation.
Finally, if one is willing to interpret a simple regression as a decomposition between observed and unobserved skills, this can be combined with methods other than JMP. For instance, DFL perform regression adjustments to illustrate the effects of supply and demand factors on wages.44
Like JMP, Machado and Mata (2005, MM hereinafter) propose a procedure based on transforming a wage observation $Y_{B,i}$ into a counterfactual observation $Y_{B,i}^C$. The main advantage relative to JMP is that their estimation procedure, based on quantile regressions (Koenker and Bassett, 1978), provides an explicit way of estimating the (inverse) conditional distribution function used in the imputation. One important difference, however, is that instead of transforming each actual observation of $Y_B$ into a counterfactual $Y_B^C$, MM use a simulation approach where quantiles are drawn at random.
Since $Q_g(\tau \mid X)$ has the same distribution as $Y_g$ given $X$ when $\tau$ follows a uniform distribution, one can think of doing the following: draw $\tau$ at random from a uniform distribution, draw a value of $X$ from the distribution of characteristics of group $B$, and compute the counterfactual wage $Q_A(\tau \mid X)$, where $Q_A(\tau \mid X)$ and $Q_B(\tau \mid X)$ are the conditional quantile functions for the $\tau$th quantile in groups $A$ and $B$, respectively.
A key implementation question is how to specify the functional forms for the conditional quantile functions. MM suggest a linear specification in $X$ that can be estimated using quantile regression methods. The conditional quantile regression models can be written as:

$$Q_g(\tau \mid X) = X\beta_g(\tau), \quad g = A, B.$$
Table 4 reports in the top panel the results of the Machado-Mata procedure applied to our gender gap example, using the male wage structure as reference.46 The central column shows that the aggregate decomposition of the median gender log wage gap gives almost the same results as the OB decomposition of the mean gender log wage gap displayed in column (1) of Table 3. Going across the columns to compare quantile effects shows that gender differences in characteristics are much more important at the bottom (10th centile) than at the top (90th centile) of the wage distribution. Indeed, some significant wage structure effects emerge at the 90th percentile.
This decomposition method is computationally demanding, and becomes quite cumbersome for data sets numbering more than a few thousand observations. Bootstrapping quantile regressions for a sizeable number of quantiles (100 would be a minimum) is computationally tedious with large data sets. The implementation of the procedure can be simplified by estimating a large number of quantile regressions (say 99, one for each percentile from 1 to 99) instead of drawing values of $\tau$ at random.47
Another limitation is that the linear specification is restrictive, and finding the correct functional form for the conditional quantile regressions can be tedious. For instance, if there is a spike at the minimum wage in the wage distribution, this will result in flat spots in quantile regressions that would have to be captured with spline functions with appropriately chosen knots. Accurately describing a simple distribution with mass points (as is commonly observed in wage data) can, therefore, be quite difficult to do using quantile regressions.
As pointed out by Chernozhukov et al. (2009), it is not very natural to estimate inverse conditional distribution functions (quantile regressions) when the main goal of counterfactual exercises is to replace the conditional distribution function $F_B(y \mid X)$ with $F_A(y \mid X)$ to obtain Eq. (27). Chernozhukov et al. (2009) suggest instead directly estimating distribution regression models for $F_g(y \mid X)$, which is a more direct way of approaching the problem.
One advantage of the MM approach is that it provides a natural way of performing a detailed decomposition for the wage structure component. The idea is to successively replace the elements of $\beta_B(\tau)$ by those of $\beta_A(\tau)$ when performing the simulations, keeping in mind that this type of detailed decomposition is path dependent. Unfortunately, the MM approach does not provide a way of performing the detailed decomposition for the composition effect.48 This is a major drawback since the detailed decomposition of the composition effect is always clearly interpretable, while the detailed decomposition of the wage structure effect arbitrarily depends on the choice of the omitted group.
As we mention in Section 4.2, another way of estimating the counterfactual distribution is to replace the marginal distribution of $X$ for group $A$ with the marginal distribution of $X$ for group $B$ using a reweighting factor $\Psi(X)$. This idea was first introduced in the decomposition literature by DiNardo, Fortin and Lemieux [DFL] (1996). While DFL focus on the estimation of counterfactual densities in their empirical application, the method is easily applicable to any distributional statistic.
In practice, the DFL reweighting method is similar to the propensity score reweighting method commonly used in the program evaluation literature (see Hirano et al. (2003)). For instance, in DFL’s application to changes in wage inequality in the United States, time is viewed as a state variable, or in the context of the treatment effects literature as a treatment.49 The impact of a particular factor or set of factors on changes in the wage distribution over time is constructed by considering the counterfactual state of the world where the distribution of this factor remained fixed in time, maintaining the Assumption 6 of invariance of the conditional distribution. Note that in contrast with the notation of this chapter, in DFL, time period 1 is used as reference group.50 The choice of period 0 or period 1 as the reference group is analogous to the choice of whether the female or the male wage structure should be the reference wage structure in the analysis of the gender wage gap and is expected to yield different results in most cases.
In DFL, manipulations of the wage distributions, computed through reweighting, are applied to non-parametric estimates of the wage density, which can be particularly useful when local distortions, from minimum wage effects for example, are at play. To be consistent with the rest of this section, however, we focus our discussion on the cumulative distribution instead of the density. The key counterfactual distribution of interest, shown in Eq. (27) (the distribution of wages that would prevail for workers in group $A$ if they had the distribution of characteristics of group $B$), is constructed, as shown in Eq. (28), using the reweighting factor

$$\Psi(X) = \frac{dF_B(X)}{dF_A(X)}.$$
Although the reweighting factor is the ratio of two multivariate marginal distribution functions (of the covariates $X$), this expression can be simplified using Bayes’ rule. Bayes’ rule implies that

$$dF_B(X) = \frac{\Pr(D_B = 1 \mid X)}{\Pr(D_B = 1)} \, dF(X),$$

with a similar expression for $dF_A(X)$. Since $\Pr(D_A = 1 \mid X) = 1 - \Pr(D_B = 1 \mid X)$ and $\Pr(D_A = 1) = 1 - \Pr(D_B = 1)$, the reweighting factor becomes

$$\Psi(X) = \frac{\Pr(D_B = 1 \mid X)}{1 - \Pr(D_B = 1 \mid X)} \cdot \frac{1 - \Pr(D_B = 1)}{\Pr(D_B = 1)}.$$
The reweighting factor can be easily computed by estimating a probability model for $\Pr(D_B = 1 \mid X)$, and using the predicted probabilities to compute a value of $\widehat{\Psi}(X_i)$ for each observation. DFL suggest estimating a flexible logit or probit model, while Hirano, Imbens, and Ridder propose to use a non-parametric logit model.51
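The estimation of the reweighting factor can be sketched as follows. We use a plain logit from scikit-learn purely for illustration; DFL advocate a flexible (polynomial) specification of the covariates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dfl_weights(X_a, X_b):
    """Reweighting factor for group-A observations so that their
    covariates mimic group B's distribution (illustrative sketch)."""
    X = np.vstack([X_a, X_b])
    d = np.r_[np.zeros(len(X_a)), np.ones(len(X_b))]   # 1 = group B
    p = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X_a)[:, 1]
    pb = len(X_b) / len(X)                             # unconditional Pr(D_B = 1)
    return (p / (1 - p)) * ((1 - pb) / pb)
```

A quick check of the weights is that weighted averages of group A's covariates should reproduce group B's covariate means.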
In practice, the reweighting decomposition procedure can be implemented by estimating a probability model of the form $\Pr(D_B = 1 \mid X) = \Lambda(h(X)\delta)$, where $\Lambda(\cdot)$ is either a normal or logit link function, and $h(X)$ is a polynomial in $X$. The predicted probabilities are then plugged into the expression for $\Psi(X)$ above, and counterfactual statistics are computed as weighted statistics of the group $A$ sample.
In DFL, the main object of interest is the probability density function, which is estimated using kernel density methods. The density for group $A$ and the counterfactual density can be estimated as follows, where $K(\cdot)$ is the kernel function and $h$ the bandwidth:52

$$\widehat{f}_A(y) = \frac{1}{n_A h} \sum_{i \in A} K\left(\frac{y - Y_i}{h}\right), \qquad \widehat{f}_C(y) = \frac{1}{n_A h} \sum_{i \in A} \widehat{\Psi}(X_i)\, K\left(\frac{y - Y_i}{h}\right).$$
Consider the density function for group $A$, $f_A(y)$, and the counterfactual density $f_C(y)$. The composition effect in a decomposition of densities is:

$$\Delta_X^f(y) = f_C(y) - f_A(y).$$
Various statistics from the wage distribution, such as the 10th, 50th, and 90th percentiles, or the variance, Gini, or Theil coefficients, can be computed either from the counterfactual density or from the counterfactual distribution using the reweighting factor. The latter procedure is easier to use as it simply involves computing (weighted) statistics using standard computer packages. For example, the counterfactual variance can be computed as:

$$\widehat{Var}_C(Y) = \frac{\sum_{i \in A} \widehat{\Psi}(X_i)\left(Y_i - \widehat{\mu}_C\right)^2}{\sum_{i \in A} \widehat{\Psi}(X_i)},$$

where the counterfactual mean is:

$$\widehat{\mu}_C = \frac{\sum_{i \in A} \widehat{\Psi}(X_i)\, Y_i}{\sum_{i \in A} \widehat{\Psi}(X_i)}.$$
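Once the weights are in hand, any counterfactual statistic is just a weighted statistic of the group-A sample. Two small helpers (our own, minimal versions of what standard packages provide) illustrate the point:

```python
import numpy as np

def weighted_quantile(y, w, q):
    """q-th quantile of the reweighted (counterfactual) distribution."""
    order = np.argsort(y)
    cw = np.cumsum(w[order]) / np.sum(w)          # weighted empirical CDF
    return y[order][np.searchsorted(cw, q)]

def weighted_variance(y, w):
    """Counterfactual variance around the counterfactual mean."""
    mu = np.average(y, weights=w)
    return np.average((y - mu) ** 2, weights=w)
```

With uniform weights these reduce to the usual sample quantile and variance; with DFL weights they deliver the counterfactual quantiles, 90-10 gaps, and variance discussed in the text.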
For the 90-10, 90-50, and 50-10 wage differentials, the sought-after contributions to changes in inequality are computed as differences in the composition effects at the relevant quantiles, for example,

$$\Delta_X^{90-10} = \left(Q_{C,.90} - Q_{C,.10}\right) - \left(Q_{A,.90} - Q_{A,.10}\right).$$
Table 5 presents, in panel A, the results of a DFL decomposition of changes over time in male wage inequality using large samples from combined MORG-CPS data as in Firpo et al. (2007). In this decomposition, the counterfactual distribution of wages in 1983/85 is constructed by reweighting the characteristics of workers in 1983/85 (time period 0) so that they look like those of 2003/05 (time period 1) workers, holding the conditional distribution of wages in 1983/85 fixed.53 The results of the aggregate decomposition, reported in the first three rows of Table 5, show that composition effects play a large role in changes in overall wage inequality, as measured by the 90-10 log wage differential or the variance of log wages. But the wage structure effects are more important when looking at increases at the top of the wage distribution, as measured by the 90-50 log wage differential, or decreases at the bottom, as measured by the 50-10 log wage differential.
The main advantage of the reweighting approach is its simplicity. The aggregate decomposition for any distributional statistic is easily computed by running a single probability model (logit or probit) and using standard packages to compute distributional statistics with $\widehat{\Psi}(X_i)$ as weight.54
Another more methodological advantage is that formal results from Hirano et al. (2003) and Firpo (2007, 2010) establish the efficiency of this estimation method. Note that although it is possible to compute analytically the standard errors of the different elements of the decomposition obtained by reweighting, it is simpler in most cases to conduct inference by bootstrapping.55
For these two reasons, we recommend the reweighting approach as the method of choice for computing the aggregate decomposition. This recommendation even applies in the simple case of the mean decomposition. As pointed out by Barsky et al. (2002), a standard OB decomposition based on a linear regression model will yield biased estimates of the decomposition terms when the underlying conditional expectation of given is non-linear (see Section 3.4). They suggest using a reweighting approach as an alternative, and the results of Hirano et al. (2003) can be used to show that the resulting decomposition is efficient.
A first limitation of the reweighting method is that it is not straightforwardly extended to the case of the detailed decomposition. One exception is the case of binary covariates, where it is relatively easy to compute the corresponding element of the decomposition. For instance, in the case of union status (a binary covariate), DFL show how to compute the component of the composition effect corresponding to this particular covariate. It is also relatively easy to compute the corresponding element of the wage structure effect. We discuss in Section 5 other options that can be used in the case of non-binary covariates.
As in the program evaluation literature, reweighting can have some undesirable properties in small samples when there is a problem of common support. The problem is that the estimated value of $\widehat{\Psi}(X)$ becomes very large when the estimated probability $\Pr(D_B = 1 \mid X)$ gets close to 1. While lack of common support is a problem for any decomposition procedure, Frolich (2004) finds that reweighting estimators perform particularly poorly in this context, though Busso et al. (2009) reach the opposite conclusion using a different simulation experiment.56
Finally, even in cases where a pure reweighting approach has some limitations, there may be gains in combining reweighting with other approaches. For instance, we discuss in the next section how reweighting can be used to improve a decomposition based on the RIF-regression approach of Firpo et al. (2009). Lemieux (2002) also discusses how a hybrid approach based on DFL reweighting and the JMP decomposition procedure can be used to compute both the between- and within-group components of the composition and wage structure effects.
As mentioned above, when we first introduced the key counterfactual distribution of interest in Eq. (5), an alternative approach to the construction of this counterfactual is based on the estimation of the conditional distribution of the outcome variable, $F_A(y \mid X)$. The counterfactual distribution is then estimated by integrating this conditional distribution over the distribution of $X$ in group $B$.
Two early parametric methods based on this idea were suggested by Donald et al. (2000) and Fortin and Lemieux (1998).57 Donald, Green and Paarsch propose estimating the conditional distribution using a hazard model. The (conditional) hazard function is defined as

$$\lambda(y \mid X) = \frac{f(y \mid X)}{S(y \mid X)},$$

where $S(y \mid X) = 1 - F(y \mid X)$ is the survivor function. Therefore, the conditional distribution of the outcome variable, $F(y \mid X)$, or its density, $f(y \mid X)$, is easily recovered from the estimates of the hazard model. For instance, in the standard proportional hazard model58

$$\lambda(y \mid X) = \lambda_0(y)\exp(X\beta),$$

estimates of $\beta$ and of the baseline hazard $\lambda_0(y)$ can be used to recover the conditional distribution

$$F(y \mid X) = 1 - \exp\left(-\Lambda_0(y)\exp(X\beta)\right),$$

where $\Lambda_0(y) = \int_0^y \lambda_0(s)\, ds$ is the integrated baseline hazard.
Fortin and Lemieux (1998) suggest estimating an ordered probit model instead of a hazard model. They consider the following model for the outcome variable $Y_g$:

$$Y_g = m_g(Y_g^*),$$

where $m_g(\cdot)$ is a monotonically increasing transformation function. The latent variable $Y_g^*$, interpreted as a latent “skill index” by Fortin and Lemieux, is defined as

$$Y_g^* = X\beta_g + \varepsilon_g,$$

where $\varepsilon_g$ is assumed to follow a standard normal distribution. It follows that the conditional distribution of $Y_g$ is given by

$$F_g(y \mid X) = \Pr(Y_g \le y \mid X) = \Phi\left(m_g^{-1}(y) - X\beta_g\right).$$

Fortin and Lemieux implement this in practice by discretizing the outcome variable into a large number of small bins. Each bin $j$ corresponds to values of the latent variable between the two thresholds $c_{j-1}$ and $c_j$. The conditional probability of being in bin $j$ is

$$\Pr\left(c_{j-1} < Y_g^* \le c_j \mid X\right) = \Phi(c_j - X\beta_g) - \Phi(c_{j-1} - X\beta_g).$$

This corresponds to an ordered probit model where the parameters $c_j$ (for $j = 1, \ldots, J$) are the usual latent variable thresholds. The estimated values of $\beta_g$ and of the thresholds can then be used to construct the counterfactual distribution, just as in Donald et al. (2000).
To be more concrete, the counterfactual distribution at a point $y$ could be estimated as follows: use the model estimated on group $A$ to compute the predicted probability that $Y_i \le y$ for each observation $i$ in group $B$, and average these predicted probabilities over the group $B$ sample. Repeating this for a large number of values of $y$ will provide an estimate of the counterfactual distribution $F_C(y)$.
In a similar spirit, Chernozhukov et al. (2009) suggest a more flexible distribution regression approach for estimating the conditional distribution. Following Foresi and Peracchi (1995), the idea is to estimate a separate regression model for each value of $y$. They consider the model $F_g(y \mid X) = \Lambda(X\beta_g(y))$, where $\Lambda(\cdot)$ is a known link function. For example, if $\Lambda(\cdot)$ is the logistic function, $\beta_g(y)$ can be estimated by creating a dummy variable $\mathbb{1}\{Y_i \le y\}$ indicating whether the value of $Y_i$ is below $y$, where $\mathbb{1}\{\cdot\}$ is the indicator function, and running a logit regression of $\mathbb{1}\{Y_i \le y\}$ on $X$ to estimate $\beta_g(y)$.
Similarly, if the link function is the identity function () the probability model is a linear probability model. If the link function is the normal CDF () the probability model is a probit. Compared to Fortin and Lemieux (1998), Chernozhukov et al. (2009) suggest estimating a separate probit for each value of , while Fortin and Lemieux use a more restrictive model where only the intercept (the threshold in the ordered probit) is allowed to change for different values of .
As above, the counterfactual distribution can be obtained by first estimating the regression model (probit, logit, or LPM) for one group to obtain the parameter estimates, computing the predicted probabilities for each observation of the other group, and averaging over these predicted probabilities to get the counterfactual distribution:
Once the counterfactual distribution has been estimated, counterfactual quantiles can be obtained by inverting the estimated distribution function. Consider the τth quantile of the counterfactual distribution. The estimated counterfactual quantile is:
It is useful to illustrate graphically how the estimation of the counterfactual distribution and its inversion into quantiles can be performed in practice. Figure 1 first shows the actual CDFs for the two groups. The squares in between the two cumulative distributions illustrate examples of counterfactuals computed using one of the methods discussed above.
For example, consider the case of the median wage of one of the two groups, m. Using the distribution regression approach of Chernozhukov et al. (2009), one can estimate, for example, a LPM by running a regression of the indicator 1{Y ≤ m} on X for that group. The resulting coefficient estimates can then be used to compute the counterfactual proportion of workers earning less than m in the other group. This counterfactual proportion is represented by the square on the vertical line over m in Fig. 1.
Figure 2 then illustrates what happens when a similar exercise is performed for a larger number of values of the cutoff wage (100 in this particular figure). It now becomes clear from the figure how to perform the inversion numerically. In the case of the median, the total gap is the horizontal distance between the two cumulative distributions at 0.5. The counterfactual median can then be estimated by picking the corresponding point on the counterfactual function defined by the set of points estimated by running a set of LPMs at different values of the cutoff. In practice, one could compute the precise value of the counterfactual median by estimating the LPMs (or a logit or probit) for a large number of values of the cutoff, and then “connecting the dots” (i.e. using linear interpolations) between these different values.
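To make the “connecting the dots” procedure concrete, here is a minimal NumPy sketch on simulated data. All variable names, parameter values, and the choice of which group supplies the conditional distribution are illustrative assumptions, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log wages for two groups (all parameter values are made up)
n = 2000
X_A = np.column_stack([np.ones(n), rng.normal(12.0, 2.0, n)])  # group A covariates
X_B = np.column_stack([np.ones(n), rng.normal(13.0, 2.0, n)])  # group B covariates
y_A = X_A @ np.array([0.5, 0.08]) + rng.normal(0.0, 0.3, n)
y_B = X_B @ np.array([0.3, 0.10]) + rng.normal(0.0, 0.3, n)

# Grid of cutoffs y at which separate LPM distribution regressions are run
grid = np.quantile(np.concatenate([y_A, y_B]), np.linspace(0.02, 0.98, 100))

# At each cutoff: OLS of 1{y_B <= y} on X_B, then average the predicted
# probabilities over group A's covariates to get the counterfactual CDF
F_c = np.empty(grid.size)
for j, y in enumerate(grid):
    beta = np.linalg.lstsq(X_B, (y_B <= y).astype(float), rcond=None)[0]
    F_c[j] = np.clip(X_A @ beta, 0.0, 1.0).mean()

# "Connect the dots": enforce monotonicity, then invert at the median
F_mono = np.maximum.accumulate(F_c)
median_c = np.interp(0.5, F_mono, grid)
```

The linear interpolation in the last line is exactly the “connecting the dots” step described above, performed at the single proportion 0.5.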
Figure 2 also illustrates one of the key messages of the chapter listed in the introduction, namely that it is easier to estimate models for proportions than for quantiles. In Fig. 2, the difference in the proportion of observations below a given value of the wage is simply the vertical distance between the two cumulative distributions. Decomposing this particular gap in proportions is not a very difficult problem. As discussed in Section 3.5, one can simply run a LPM and perform a standard OB decomposition. An alternative, also discussed in Section 3.5, is to perform a nonlinear decomposition using a logit or probit model. The conditional distribution methods of Fortin and Lemieux (1998) and Chernozhukov et al. (2009) essentially amount to computing this decomposition in the vertical dimension.
By contrast, it is not clear at first glance how to decompose the horizontal distance, or quantile gap, between the two curves. But since the vertical and horizontal distances are just two different ways of describing the same difference between the two cumulative distributions, one can perform a first decomposition either vertically or horizontally, and then invert back to get the decomposition in the other dimension. Since decomposing proportions (the vertical distance) is relatively easy, this suggests first performing the decomposition on proportions at many points of the distribution, and then inverting back to get the decomposition in the quantile dimension (the horizontal distance).
Table 5 reports, in panels B and C, the results of the aggregate decomposition for male wages using the method of Chernozhukov et al. (2009). The counterfactual wage distribution is constructed by asking what the distribution of wages in 1983/85 would have been if the conditional distribution had been as in 2003/05. Panel B uses the LPM to estimate the distribution regressions, while the logit model is used in Panel C.59 The first rows of Panels B and C show the changes in the wage differentials based on the fitted distributions, so that any discrepancies between these rows and the first row of Panel A reflect the estimation errors. The second rows report the composition effects computed as the difference between the fitted distribution in 1983/85 and the counterfactual distribution. Given our relatively large sample, the differences across estimators in the different panels are at times statistically different from one another. However, the results from the logit estimation in Panel C are qualitatively similar to the DFL results shown in Panel A, with composition effects being relatively more important in accounting for overall wage inequality, as measured by the 90-10 log wage differential, and wage structure effects playing a relatively more important role in increasing wage inequality at the top and reducing wage inequality at the bottom.
If one is just interested in performing an aggregate decomposition, it is preferable to simply use the reweighting methods discussed above. Like the conditional quantile methods discussed in Section 4.4, conditional distribution methods require some parametric assumptions on the distribution regressions that may or may not be valid. Chernozhukov, Fernandez-Val, and Melly’s distribution regression approach is more flexible than the earlier suggestions of Donald et al. (2000) and Fortin and Lemieux (1998), but it potentially involves estimating a large number of regressions.
Running unconstrained regressions for a large number of values of may result, however, in non-monotonicities in the estimated counterfactual distribution . Smoothing or related methods then have to be used to make sure that the counterfactual distribution is monotonic and, thus, invertible into quantiles.60 By contrast, reweighting methods require estimating just one flexible logit or probit regression, which is very easy to implement in practice.
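As a small illustration of the monotonization step (the CDF values below are made up for the example), sorting the fitted probabilities (in the spirit of rearrangement methods) or taking a running maximum both deliver a monotone, and hence invertible, estimate:

```python
import numpy as np

# A toy estimated counterfactual CDF with local non-monotonicities, as can
# happen when unconstrained distribution regressions are run cutoff by cutoff
grid = np.linspace(1.0, 4.0, 7)
F_hat = np.array([0.05, 0.22, 0.18, 0.45, 0.43, 0.70, 0.95])

# Option 1: rearrangement, i.e. sorting the fitted CDF values
F_sorted = np.sort(F_hat)

# Option 2: running maximum, which never lowers an estimated value
F_runmax = np.maximum.accumulate(F_hat)

# Either version can now be inverted into quantiles by interpolation
q50 = np.interp(0.5, F_sorted, grid)
```

Both options are simple fixes; which one is preferable in a given application is a judgment call beyond this sketch.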
An important advantage of distribution regression methods over reweighting is that they can be readily generalized to the case of the detailed decomposition, although these decompositions will be path dependent. We show in the next section how Chernozhukov, Fernandez-Val, and Melly’s distribution regression approach, and the related RIF-regression method of Firpo et al. (2009), can be used to perform a detailed decomposition very much in the spirit of the traditional OB decomposition for the mean.
In this section we discuss most of the existing methods that have been proposed to perform an aggregate decomposition for general distributional statistics. While all these methods could, in principle, yield similar results, we argue that DFL reweighting is the method of choice in this context for two main reasons. First, it is simple to implement, as it only involves estimating a single logit or probit model to compute the reweighting factors. Counterfactual values of any distributional statistic can then be readily computed from the reweighted sample. By contrast, methods that yield counterfactual estimates of quantiles or the whole CDF require estimating a separate model at a large number of points in the distribution.
The second advantage of reweighting is that there are well-established results in the program evaluation literature showing that the method is asymptotically efficient (Hirano et al., 2003; Firpo, 2007).
In this section, we extend the methods introduced above for the aggregate decomposition to the case of the detailed decomposition. We first show that conditional distribution methods based on distribution regressions can be used to compute both the composition and wage structure subcomponents of the detailed decomposition. We then discuss a related method based on the RIF-regressions introduced in Firpo et al. (2009). The main advantage of this last procedure is that it is regression based and, thus, as easy to use in practice as the traditional OB method.
The other methods proposed in Section 4 are not as easy to extend to the case of the detailed decomposition. We discuss, nonetheless, which elements of the detailed decomposition can be estimated using these various methods, and under which circumstances it is advantageous to use these methods instead of others.
In the case where the specification used for the distribution regression is the LPM, the aggregate decomposition of Section 4.6 can be generalized to the detailed decomposition as follows. Since the link function for the LPM is the identity function, the counterfactual distribution used earlier becomes:
We can also write:
where the first term is the familiar wage structure effect, while the second term is the composition effect. The above equation can, therefore, be used to compute a detailed decomposition of the difference between the two groups in the proportion of workers earning less than a given wage. We obtain the detailed decomposition for quantiles by (i) computing the different counterfactuals for each element of the coefficients and covariates sequentially, for a large number of values of the cutoff wage, and (ii) inverting to get the corresponding quantiles for each detailed counterfactual. A similar approach could also be used when the link function is a probit or a logit by using the procedure suggested in Section 3.5.
The main advantage of this method, based on distribution regressions and the global inversion of counterfactual CDFs into counterfactual quantiles (as in Fig. 2), is that it yields a detailed decomposition comparable to the OB decomposition of the mean.
One limitation of this method is that it involves computing a large number of counterfactual CDFs and quantiles, as the procedure has to be repeated for a sizable number of values of the cutoff wage. This can become cumbersome because of the potential non-monotonicity problems discussed earlier. Furthermore, the procedure suffers from the problem of path dependence, since the different counterfactual elements of the detailed decomposition have to be computed sequentially. For these reasons, we next turn to a simpler approach based on a local, as opposed to a global, inversion of the CDF.
RIF-regression methods provide a simple way of performing detailed decompositions for any distributional statistic for which an influence function can be computed. Although we focus below on the case of quantiles of the unconditional distribution of the outcome variable, our empirical example also includes the case of the variance and the Gini coefficient. The procedure can readily be used to address glass ceiling issues in the context of the gender wage gap, or changes in the interquartile range in the context of changes in wage inequality. It can be used either to perform OB-type detailed decompositions, or a slightly modified “hybrid” version of the decomposition suggested by Firpo et al. (2007) (reweighting combined with regressions, as in Section 3.4 for the mean).
A RIF-regression (Firpo et al., 2009) is similar to a standard regression, except that the dependent variable, Y, is replaced by the recentered influence function of the statistic of interest. Consider IF(y; ν), the influence function corresponding to an observed wage y for the distributional statistic of interest, ν(F_Y). The recentered influence function (RIF) is defined as RIF(y; ν) = ν(F_Y) + IF(y; ν), so that it aggregates back to the statistic of interest (E[RIF(Y; ν)] = ν(F_Y)). In its simplest form, the approach assumes that the conditional expectation of the RIF(Y; ν) can be modeled as a linear function of the explanatory variables,

E[RIF(Y; ν) | X] = Xγ,

where the parameters γ can be estimated by OLS.61
In the case of quantiles, the influence function is given by IF(y; Q_τ) = (τ − 1{y ≤ Q_τ})/f_Y(Q_τ), where 1{·} is an indicator function, f_Y(·) is the density of the marginal distribution of Y, and Q_τ is the population τ-quantile of the unconditional distribution of Y. As a result, RIF(y; Q_τ) is equal to Q_τ + IF(y; Q_τ), and can be rewritten as

RIF(y; Q_τ) = c_{1,τ} · 1{y > Q_τ} + c_{2,τ},    (33)

where c_{1,τ} = 1/f_Y(Q_τ) and c_{2,τ} = Q_τ − c_{1,τ} · (1 − τ). Except for the constants c_{1,τ} and c_{2,τ}, the RIF for a quantile is simply an indicator variable for whether the outcome variable is smaller or equal to the quantile Q_τ. Using the terminology introduced above, running a linear regression of this indicator on X is a distribution regression estimated at y = Q_τ, using the link function of the linear probability model (the identity function).
There is, thus, a close connection between RIF-regressions and the distribution regression approach of Chernozhukov et al. (2009). In both cases, regression models are estimated to explain the determinants of the proportion of workers earning less than a certain wage. As we saw in Fig. 2, in Chernozhukov et al. (2009) estimates of models for proportions are then globally inverted back into the space of quantiles. This provides a way of decomposing quantiles using a series of simple regression models for proportions.
Figure 3 shows that RIF-regressions for quantiles are based on a similar idea, except that the inversion is only performed locally. Suppose that after estimating a model for proportions, we compute a counterfactual proportion based on changing either the mean value of a covariate, or the return to that covariate estimated with the LPM regression. Under the assumption that the relationship between counterfactual proportions and counterfactual quantiles is locally linear, one can then go from the counterfactual proportion to the counterfactual quantile (both illustrated in Fig. 3) by moving along a line with a slope given by the slope of the counterfactual distribution function. Since the slope of a cumulative distribution is just the probability density function, one can easily go from proportions to quantiles by dividing the elements of the decomposition for proportions by the density.
While the argument presented in Fig. 3 is a bit heuristic, it provides the basic intuition for how we can get a decomposition model for quantiles by simply dividing a model for proportions by the density. As we see in Eq. (33), in the RIF for quantiles, the indicator variable is indeed divided by the density (i.e. multiplied by the constant 1/f_Y(Q_τ)).
Firpo et al. (2009) explain how to first compute the RIF, and then run regressions of the RIF on the vector of covariates. In the case of quantiles, the RIF is first estimated by computing the sample quantile and estimating the density at that point using kernel methods. An estimate of the RIF of each observation is then obtained by plugging these two estimates into Eq. (33).
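The steps just described can be sketched in pure NumPy as follows. The Gaussian-kernel density estimator and the Silverman-type bandwidth are illustrative choices (not the only ones possible), and the data are simulated:

```python
import numpy as np

def rif_quantile(y, tau):
    """RIF of the tau-th quantile: Q_tau + (tau - 1{y <= Q_tau}) / f(Q_tau),
    with the density at Q_tau estimated by a Gaussian kernel."""
    q = np.quantile(y, tau)
    h = 1.06 * y.std() * y.size ** (-1 / 5)          # Silverman-type bandwidth
    f_q = np.mean(np.exp(-0.5 * ((y - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q

rng = np.random.default_rng(1)
y = rng.normal(2.5, 0.5, 5000)                       # toy log-wage sample

rif_median = rif_quantile(y, 0.5)
# By construction, the RIF averages back to the statistic of interest,
# here the sample median
```

The vector `rif_median` can then be used as the dependent variable in an OLS regression on the covariates.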
Letting γ̂_{g,τ}, for g = A, B, denote the coefficients of the unconditional quantile (RIF) regressions for each group, we can write the equivalent of the OB decomposition for any unconditional quantile as

Δ̂_O^τ = X̄_B (γ̂_{B,τ} − γ̂_{A,τ}) + (X̄_B − X̄_A) γ̂_{A,τ},    (36)

where the first term is the wage structure effect and the second term is the composition effect. The second term in Eq. (36) can be rewritten in terms of the sum of the contribution of each covariate as

(X̄_B − X̄_A) γ̂_{A,τ} = Σ_k (X̄_{B,k} − X̄_{A,k}) γ̂_{A,k,τ}.

That is, the detailed elements of the composition effect can be computed in the same way as for the mean. Similarly, the detailed elements of the wage structure effect can be computed but, as in the case of the mean, these will also be subject to the problem of the omitted group.
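Putting the pieces together, a minimal sketch of an OB-style decomposition based on RIF-regressions might look as follows. The data are simulated; the group labels, the choice of reference wage structure, and all parameter values are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

def rif_quantile(y, tau):
    # RIF of the tau-th quantile with a Gaussian-kernel density estimate
    q = np.quantile(y, tau)
    h = 1.06 * y.std() * y.size ** (-1 / 5)
    f_q = np.mean(np.exp(-0.5 * ((y - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q

# Toy samples for groups A and B (intercept + one covariate)
n = 4000
X_A = np.column_stack([np.ones(n), rng.normal(12, 2, n)])
X_B = np.column_stack([np.ones(n), rng.normal(13, 2, n)])
y_A = X_A @ np.array([0.5, 0.08]) + rng.normal(0, 0.3, n)
y_B = X_B @ np.array([0.3, 0.10]) + rng.normal(0, 0.3, n)

tau = 0.9
bA = np.linalg.lstsq(X_A, rif_quantile(y_A, tau), rcond=None)[0]
bB = np.linalg.lstsq(X_B, rif_quantile(y_B, tau), rcond=None)[0]

# OB-style split, using group A's coefficients as the reference wage structure
wage_structure = X_B.mean(0) @ (bB - bA)
composition = (X_B.mean(0) - X_A.mean(0)) @ bA
```

Because each RIF averages back to its sample quantile, the two terms sum to the raw quantile gap between the groups.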
Table 4 presents in its bottom panel such an OB-like decomposition of the gender wage gap at the 10th, 50th, and 90th percentiles of the unconditional distribution of wages, corresponding to Tables 2 and 3, using the male coefficients as the reference group and without reweighting. As with the MM decomposition presented in the top panel, the composition effects from the decomposition of the median gender pay gap reported in the central column of Table 4 are very close to those of the decomposition of the mean gender pay gap reported in column (1) of Table 3. As before, the wage structure effects in the relatively small NLSY sample are generally not statistically significant, with the exception of the industrial sectors, which are, however, subject to the categorical variables problem. The comparison of the composition effects at the 10th and 90th percentiles shows that the impact of differences in life-time work experience is much larger at the bottom of the distribution than at the top, where it is not statistically significant. Note that the aggregate decomposition results obtained using either the MM method or the RIF-regressions do not exhibit statistically significant differences. Table 5 presents in Panel D the results of the aggregate decomposition using RIF-regressions without reweighting. The results are qualitatively similar to those of Panels A and C. Table 6 extends the analysis of the decomposition of male wage inequality presented in Table 5 to the detailed decomposition. For each inequality measure, the detailed decompositions are presented both for the extension of the classic OB decomposition in Eq. (36), and for the reweighted-regression decomposition described in the case of the mean in Section 3.4.62 For the reweighted-regression decomposition, Table 6 reports the detailed elements of the main composition effect and the detailed elements of the main wage structure effect, where
and where the group sample is reweighted to mimic the group sample, which means we should have . The total reweighting error corresponds to the difference between the “Total explained” across the classic OB and the reweighted-regression decomposition. For example, for the 90-10 log wage differential, it is equal to . 63 The total specification error, , corresponds to the difference between the “Total wage structure” across the classic OB and the reweighted-regression decomposition and is found to be more important. In terms of composition effects, de-unionization is found to be an important factor accounting for the polarization of male wage inequality. It is also found to reduce inequality at the bottom, as measured by the 50-10 log wage differential, and to increase inequality at the top, as measured by the 90-50 log wage differential. In terms of wage structure effects, increases in the returns to education are found, as in Lemieux (2006a), to be the dominant factor accounting for overall increases in male wage inequality.
The linearity of RIF-regressions has several advantages. It is straightforward to invert the proportion of interest into a quantile by dividing by the density. Since the inversion can be performed locally, another advantage is that we do not need to evaluate the global impact at all points of the distribution and worry about monotonicity. One gets a simple regression which is easy to interpret. As a result, the resulting decomposition is path independent.
Like many other methods, RIF-regressions assume the invariance of the conditional distribution (i.e., no general equilibrium effects). Also, a legitimate practical issue is how good the local approximation is. For relatively smooth dependent variables, such as test scores, this may be a moot point. But in the presence of considerable heaping (as usually displayed in wage distributions), it may be advisable to oversmooth the density estimate and compare its values around the quantile of interest. This can be assessed formally by comparing reweighting estimates to the OB-type composition effect based on RIF-regressions (the specification error discussed earlier).
As we mention in Section 4, it is relatively straightforward to extend the DFL reweighting method to perform a detailed decomposition in the case of binary covariates. DFL show how to compute the composition effect corresponding to a binary covariate (union status in their application). Likewise, DiNardo and Lemieux (1997) use yet another reweighting technique to compute the wage structure component. We first discuss the case where a covariate is a binary variable, and then discuss the case of categorical (with more than 2 categories) and continuous variables.
Consider the case of one binary covariate, , and a vector of other covariates, . For instance, DiNardo et al. (1996) look at the case of unionization. They are interested in isolating the contribution of de-unionization to the composition effect by estimating what would have happened to the wage distribution if the distribution of unionization, but of none of the other covariates, had changed over time.
Letting t = 0 index the base period and t = 1 the end period, consider the counterfactual wage distribution that would prevail if the conditional distribution of unionization (but of none of the other covariates) was as in the other period.64 Note that we are performing a counterfactual experiment by changing the conditional, as opposed to the marginal, distribution of unionization. Unless unionization is independent of the other covariates, the marginal distribution of unionization will depend on the distribution of those other covariates. For instance, if unionization is higher in the manufacturing sector, but the share of workers in manufacturing declines over time, the overall unionization rate will decline even if, conditional on industrial composition, the unionization rate remains the same.
Using the language of program evaluation, we want to make sure that secular changes in the rate of unionization are not confounded by other factors such as industrial change. This is achieved by looking at changes in the conditional, as opposed to the marginal, distribution of unionization. Note that the main problem with the procedure suggested by MM to compute the elements of the composition effect corresponding to each covariate is that it fails to address this problem. MM suggest an unconditional reweighting procedure based on the change in the marginal, as opposed to the conditional, distribution of covariates. Unless the covariates are independent, this will yield biased estimates of the composition effect elements of the detailed decomposition.
The counterfactual distribution is formally defined as
where the reweighting function is
Note that the conditional distribution is assumed to be unaffected by the change in the conditional distribution of unionization (assumption of invariance of conditional distribution in Section 2). This amounts to assuming away selection into union status based on unobservables (after controlling for the other covariates ).
The reweighting factor can be computed in practice by estimating two probit or logit models for the probability that a worker is unionized, one for each period. The resulting estimates can then be used to compute the predicted probabilities of being unionized or not unionized in each period, and these predicted probabilities can then be plugged into the above formula.
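As a sketch of this calculation, with a hand-rolled Newton logit standing in for a packaged probit/logit routine and a made-up data-generating process (the direction of the reweighting, period 1 toward period 0, is also an illustrative choice):

```python
import numpy as np

def logit_fit(X, d, iters=30):
    # Simple Newton logistic regression (numpy only, for illustration)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (d - p))
    return beta

rng = np.random.default_rng(3)
n = 3000
# Other covariates z (here a manufacturing dummy) in the base and end periods
Z0 = np.column_stack([np.ones(n), rng.binomial(1, 0.4, n)])   # period 0
Z1 = np.column_stack([np.ones(n), rng.binomial(1, 0.3, n)])   # period 1
# Union status generated from a logit in z (parameters are made up)
u0 = rng.binomial(1, 1 / (1 + np.exp(-(Z0 @ np.array([-0.5, 1.2])))))
u1 = rng.binomial(1, 1 / (1 + np.exp(-(Z1 @ np.array([-1.0, 1.2])))))

# Pr(union | z) in each period, both evaluated at period-1 covariates
p0 = 1 / (1 + np.exp(-(Z1 @ logit_fit(Z0, u0))))
p1 = 1 / (1 + np.exp(-(Z1 @ logit_fit(Z1, u1))))

# Reweighting factor for period-1 observations:
# p0/p1 for union workers, (1-p0)/(1-p1) for non-union workers
psi = np.where(u1 == 1, p0 / p1, (1 - p0) / (1 - p1))
```

Applying the weights `psi` to period-1 observations gives them the period-0 conditional distribution of unionization while leaving the distribution of the other covariates unchanged.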
DiNardo and Lemieux (1997) use a closely related reweighting procedure to compute the wage structure component of the effect of unions on the wage distribution. Consider the question of what would happen to the wage distribution if no workers were unionized. The distribution of wages among non-union workers:
is not a proper counterfactual since the distribution of the other covariates may not be the same for union and non-union workers. DiNardo and Lemieux (1997) suggest solving this problem by reweighting non-union workers so that their distribution of the other covariates is the same as for the entire workforce. The reweighting factors that accomplish this in the two periods are defined as follows:
Using these reweighting terms, we can write the counterfactual distribution of wages that would have prevailed in the absence of unions as:
These various counterfactual distributions can then be used to compute the contribution of unions (or another binary covariate) to the composition effect and to the wage structure effect:
and
Although we need three different reweighting factors to compute the elements of the detailed wage decomposition corresponding to the binary covariate, these three reweighting factors can be constructed from the estimates of just two probability models. As before, once these reweighting factors have been computed, the different counterfactual statistics are easily obtained using standard statistical packages.
It is difficult to generalize the approach suggested above to the case of covariates that are not binary. In the case of the composition effect, one approach that has been followed in the applied literature consists of sequentially adding covariates to the probability model used to compute the reweighting factor.65 For instance, one can start with a first covariate, compute the reweighting factor and the counterfactual statistics of interest by reweighting, then do the same thing after adding a second covariate, and so on.
One shortcoming of this approach is that the results depend on the order in which the covariates are sequentially introduced, just like results from a sequential decomposition for the mean also depend on the order in which the covariates are introduced in the regression. For instance, estimates of the effect of unions that fail to control for any other covariates may be overstated if union workers tend to be concentrated in industries that would pay high wages even in the absence of unions. As pointed out by Gelbach (2009), the problem with sequentially introducing covariates can be thought of as an omitted variable problem. Unless there are compelling economic reasons for first looking at the effect of some covariates without controlling for the other covariates, sequential decompositions will have the undesirable property of depending (strongly in some cases) on the order of the decomposition (path dependence).66
Fortunately, there is a way around the problem of path dependence when performing detailed decompositions using reweighting methods. The approach, however, still suffers from the adding-up problem and is more appropriate when only the effect of a particular factor is of interest. To illustrate this approach, consider a case with three covariates, X1, X2, and X3. In a sequential decomposition, one would first control for X1 only, then for X1 and X2, and finally for X1, X2, and X3. On the one hand, the regression coefficients on X1 and/or X2 in regressions that fail to control for X3 are biased because of the omitted variable problem. The corresponding elements of a detailed OB decomposition for the mean based on these estimated coefficients would, therefore, be biased too.
On the other hand, the coefficient on the last covariate to be introduced in the regression (X3) is not biased since the other covariates (X1 and X2) are also controlled for. So although order matters in a sequential regression approach, the effect of the last covariate to be introduced is not affected by the omitted variable bias.
The same logic applies in the case of detailed decompositions based on a reweighting approach. Intuitively, the difference between the counterfactual distribution one gets by reweighting with X1 and X2 only, and the one obtained by reweighting with X1, X2, and X3, should yield the appropriate contribution of X3 to the composition effect.
To see this more formally, consider the counterfactual distribution for one group that would prevail if the distribution of X3, conditional on X1 and X2, was as in the other group:
where the reweighting factor can be written as:
The first factor is the reweighting factor used to compute the aggregate decomposition in Section 4.5; the second is a reweighting factor based on all the covariates except the one considered for the detailed decomposition (X3). As before, Bayes’ rule can be used to show that:
Once again, this new reweighting factor is easily computed by running a probit or logit regression (with X1 and X2 as covariates) and using the predicted probabilities to estimate it.
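A sketch of this calculation, on simulated data with three independent binary covariates (all names, parameter values, and the choice of X1 as the covariate of interest are illustrative assumptions):

```python
import numpy as np

def logit_fit(X, d, iters=30):
    # Simple Newton logistic regression (numpy only, for illustration)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (d - p))
    return beta

rng = np.random.default_rng(4)
nA, nB = 4000, 4000
# Three binary covariates with different means in the two groups
XA = rng.binomial(1, [0.30, 0.50, 0.40], size=(nA, 3)).astype(float)
XB = rng.binomial(1, [0.55, 0.45, 0.60], size=(nB, 3)).astype(float)

X = np.vstack([XA, XB])
D = np.r_[np.zeros(nA), np.ones(nB)]              # D = 1 for group B
Z = np.column_stack([np.ones(X.shape[0]), X])     # all covariates
Z_part = Z[:, [0, 2, 3]]                          # drop X1, keep X2 and X3

# Odds-based reweighting factors for group A observations
pr = D.mean()
p_full = 1 / (1 + np.exp(-(Z @ logit_fit(Z, D))))
p_part = 1 / (1 + np.exp(-(Z_part @ logit_fit(Z_part, D))))
psi_full = (p_full / (1 - p_full)) * ((1 - pr) / pr)
psi_part = (p_part / (1 - p_part)) * ((1 - pr) / pr)

# Detailed factor for X1: ratio of the full factor to the partial factor,
# which changes only the conditional distribution of X1
psi_X1 = (psi_full / psi_part)[D == 0]

# Check: reweighting group A by psi_full reproduces group B's mean of X1
m_full = np.average(XA[:, 0], weights=psi_full[D == 0])
```

With only two logits (one including X1, one excluding it), the detailed factor for X1 is obtained as a simple ratio, mirroring the Bayes' rule argument in the text.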
This reweighting procedure for the detailed decomposition is summarized as follows:
Note that while this procedure does not suffer from path dependence, the contributions of the various covariates do not sum up to the total contribution of the covariates (the aggregate composition effect). The difference is an interaction effect between the different covariates which is harder to interpret.
This reweighting procedure shares most of the advantages of the other reweighting procedures we proposed for the aggregate decomposition. First, it is generally easy to implement in practice. Second, by using a flexible specification for the logit/probit, it is possible to get estimates of the various components of the decomposition that depend minimally on functional form assumptions. Third, the procedure yields efficient estimates.
With a large number of covariates, one needs to compute a sizable number of reweighting factors to obtain the various elements of the detailed decomposition. This can be tedious, although it does not require much in terms of computation since each probit/logit is easy to estimate. Another disadvantage of the suggested decomposition is that, although it does not suffer from the problem of path dependence, we are still left with an interaction term which is difficult to interpret. For these reasons, we suggest first using a regression-based approach like the RIF-regression approach discussed above, which is essentially as easy to compute as a standard OB decomposition. The reweighting procedure suggested here can then be used to probe these results, and make sure they are robust to the functional form assumptions implicit in the RIF-regression approach.
As we mentioned earlier, the method of Machado and Mata (2005) can be used to compute the wage structure sub-components of the detailed decomposition. These components are computed by sequentially switching the coefficients of the quantile regressions for each covariate from their estimated values for one group to their estimated values for the other group. This sequential switching cannot be used, however, to compute the sub-components of the composition effect of the detailed decomposition. Rather, Machado and Mata (2005) suggest an unconditional reweighting approach to do so. This does not provide a consistent estimate, since the effect of the reweighted covariate of interest gets confounded by the effect of other covariates correlated with it. For instance, if union workers are more concentrated in manufacturing, doing an unconditional reweighting on unions will also change the fraction of workers in manufacturing. In this sense, the effect of unions gets confounded by the effect of manufacturing.
This is a significant drawback since it is arguably more important to conduct a detailed decomposition for the composition effect than for the wage structure effect. As discussed earlier, there are always some interpretation problems with the detailed components of the wage structure effect because of the omitted group problem.
One solution is to use the conditional reweighting procedure described above instead. But once this type of reweighting approach is used, there is no need to estimate (conditional) quantile regressions. Unless the quantile regressions are of interest on their own, it is preferable to use a more consistent approach, such as the one based on the estimation of RIF-regressions, for estimating the detailed components of both the wage structure and composition effects.
In this section, we present three extensions to the decomposition methods discussed earlier. We first consider the case where either the ignorability or the zero conditional mean assumptions are violated because of self-selection or endogeneity of the covariates. We next discuss the situation where some of these problems can be addressed when panel data are available. We conclude the section by discussing the connection between conventional decomposition methods and structural modeling.
The various decomposition procedures discussed up to this point provide consistent estimates of the aggregate composition and wage structure effects under the ignorability assumption. Stronger assumptions, such as conditional mean independence (for decompositions of the mean) or straight independence, have to be invoked to perform the detailed decomposition. In this section we discuss some alternatives for estimating the decomposition when these assumptions fail. We mostly focus on the case of the OB decomposition of the mean, though some of the results we present could be extended to more general distributional statistics.
We consider three scenarios, first introduced in Section 2.1.6, under which the OB decomposition is inconsistent because of a failure of the ignorability or conditional independence assumption. In the first case, the problem is that individuals from the two groups may self-select differently into the labor market. For instance, participation decisions of men may be different from participation decisions of women in ways that are not captured by observable characteristics. In the second case, we consider what happens when individuals can self-select into one group or the other (for instance, union and non-union jobs) on the basis of unobservables. The third case is a standard endogeneity problem where the covariates are correlated with the error term. For example, education (one of the covariates) may be correlated with the error term because more able individuals tend to get more schooling.
One major concern when decomposing differences in wages between two groups with very different labor force participation rates is that the probability of participation depends on unobservables in different ways for groups A and B. This is a well known problem in the gender wage gap literature (Blau and Kahn, 2006; Olivetti and Petrongolo, 2008; Mulligan and Rubinstein, 2008, etc.) and in the black-white wage gap literature (Neal and Johnson, 1996).
Our estimates of the decomposition terms may be directly affected when workers in groups A and B self-select into the labor market differently. Controlling for selection based on observables and unobservables is therefore necessary to guarantee point identification of the decomposition terms. If no convincing model of self-selection is available, a more agnostic approach based on bounds has also recently been proposed. Following Machado (2009), we distinguish three branches of the self-selection literature: (i) selection on observables; (ii) selection on unobservables; and (iii) bounds.
Selection based on observables and, when panel data are available, on time-invariant unobserved components can be used to impute values for the missing data on wages of non-participants. Representative papers of this approach are Neal and Johnson (1996), Johnson et al. (2000), Neal (2004), Blau and Kahn (2006) and Olivetti and Petrongolo (2008). These papers are typically concerned with mean or median wages. However, extensions to cumulative distribution functions or to gaps in general distributional statistics could also be considered.
When labor market participation is based on unobservables, correction procedures for mean wages are also available. In these procedures, a control variate is added as a regressor in the conditional expectation function. The exclusion restriction that an available instrument does not belong to the conditional expectation function also needs to be imposed.67 Leading parametric and nonparametric examples are Heckman (1974, 1976), Duncan and Leigh (1980), Dolton and Makepeace (1986), Vella (1998) and Mulligan and Rubinstein (2008).
In this setting, the decomposition can be performed by adding a control variate, \lambda, to the regression. In most applications, \lambda is the usual inverse Mills' ratio term obtained by fitting a probit model of the participation decision. Note that the addition of this control variate slightly changes the interpretation of the decomposition. The full decomposition for the mean is now

\overline{Y}_A - \overline{Y}_B = (\overline{X}_A - \overline{X}_B)\hat{\beta}_A + \overline{X}_B(\hat{\beta}_A - \hat{\beta}_B) + (\hat{\theta}_A \overline{\lambda}_A - \hat{\theta}_B \overline{\lambda}_B),

where \hat{\theta}_A and \hat{\theta}_B are the estimated coefficients on the control variates. The decomposition provides a full accounting for the wage gap that also includes differences in both the composition of unobservables (the \overline{\lambda}'s) and in the return to unobservables (the \hat{\theta}'s). This treats symmetrically the contribution of observables (the X's) and unobservables in the decomposition.
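To illustrate the mechanics, here is a minimal sketch of a two-step (Heckman-style) selection-corrected decomposition on simulated data. The simulation design, all parameter values, and the exclusion of an instrument z from the wage equation are assumptions of the example, not part of the text.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate_group(n, beta, gamma, theta):
    x = rng.normal(1.0, 1.0, n)
    z = rng.normal(size=n)                     # instrument excluded from wages
    X = np.column_stack([np.ones(n), x])
    Zp = np.column_stack([np.ones(n), x, z])   # participation covariates
    u = rng.normal(size=n)                     # selection error
    d = (Zp @ gamma + u) > 0                   # labor force participation
    y = X @ beta + theta * u + rng.normal(scale=0.3, size=n)
    return X, Zp, y, d

def probit(Zp, d):
    # probit by maximum likelihood for the participation decision
    def nll(g):
        p = np.clip(norm.cdf(Zp @ g), 1e-9, 1 - 1e-9)
        return -(d * np.log(p) + (1 - d) * np.log(1 - p)).sum()
    return minimize(nll, np.zeros(Zp.shape[1]), method="BFGS").x

def two_step(X, Zp, y, d):
    g = probit(Zp, d)
    idx = Zp[d] @ g
    lam = norm.pdf(idx) / norm.cdf(idx)        # inverse Mills' ratio
    W = np.column_stack([X[d], lam])           # wage regressors + control variate
    coef, *_ = np.linalg.lstsq(W, y[d], rcond=None)
    return coef[:-1], coef[-1], X[d].mean(0), lam.mean()

XA, ZA, yA, dA = simulate_group(20000, np.array([1.0, 0.10]),
                                np.array([0.5, 0.3, 1.0]), theta=0.4)
XB, ZB, yB, dB = simulate_group(20000, np.array([0.8, 0.10]),
                                np.array([-0.3, 0.3, 1.0]), theta=0.8)
bA, thA, xbarA, lbarA = two_step(XA, ZA, yA, dA)
bB, thB, xbarB, lbarB = two_step(XB, ZB, yB, dB)

composition = (xbarA - xbarB) @ bA             # observables, composition
wage_struct = xbarB @ (bA - bB)                # observables, wage structure
selection   = thA * lbarA - thB * lbarB        # unobservables (control variates)
raw_gap = yA[dA].mean() - yB[dB].mean()
```

Because the control variate enters the regression linearly, the three components sum exactly to the raw gap in observed mean wages.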
A third approach uses bounds for the conditional expectation function of wages for groups A and B. With those bounds one can construct bounds for the wage structure effect, \Delta_S^\mu, and the composition effect, \Delta_X^\mu. Let p_g(x) = \Pr(P = 1 \mid X = x, G = g) for g = A, B. Then, letting P be a dummy indicating labor force participation, we can write the conditional expected wage as

E[Y_g \mid X = x] = E[Y_g \mid X = x, P = 1]\, p_g(x) + E[Y_g \mid X = x, P = 0]\,(1 - p_g(x)),

and therefore

E[Y_g \mid X = x, P = 1]\, p_g(x) + y_g^L (1 - p_g(x)) \le E[Y_g \mid X = x] \le E[Y_g \mid X = x, P = 1]\, p_g(x) + y_g^U (1 - p_g(x)),

where y_g^L and y_g^U are lower and upper bounds of the distribution of Y_g, for g = A, B. Therefore, replacing the unidentified conditional expectations by these lower and upper bounds yields corresponding bounds on \Delta_S^\mu and \Delta_X^\mu.
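A worst-case version of this bounding logic, applied to the overall mean gap rather than to the full conditional expectation, can be sketched as follows; the participation rates, support bounds, and simulated samples are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_bounds(y_obs, p_rate, y_lo, y_hi):
    """Worst-case bounds on E[Y] when Y is observed only for participants,
    who make up a fraction p_rate of the group; y_lo and y_hi bound the
    support of Y."""
    m = y_obs.mean()
    return m * p_rate + y_lo * (1 - p_rate), m * p_rate + y_hi * (1 - p_rate)

# hypothetical log-wage samples for participants in each group
yA = rng.normal(3.0, 0.5, 900)    # group A: 90% participation
yB = rng.normal(2.6, 0.5, 600)    # group B: 60% participation
loA, hiA = mean_bounds(yA, 0.9, y_lo=1.0, y_hi=5.0)
loB, hiB = mean_bounds(yB, 0.6, y_lo=1.0, y_hi=5.0)

# bounds on the overall mean wage gap between the two groups
gap_lo, gap_hi = loA - hiB, hiA - loB
```

Note that the width of each group's bounds is (1 - p)(y^U - y^L), so the bounds are tighter for the group with the higher participation rate; restrictions from economic theory can narrow them further.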
This bounding approach to the selection problem may also use restrictions motivated by econometric or economic theory to narrow the bounds, as in Manski (1990) and Blundell et al. (2007).
In the next case we consider, individuals can choose to belong to either group A or B. The leading example is the choice of union status by workers. The traditional way of dealing with the problem is to model the choice decision and correct for selection biases using control function methods.68
As discussed in Section 2.1.6, it is also possible to apply instrumental variable methods more directly without explicitly modeling the selection process into groups A and B. Imbens and Angrist (1994) show that this will identify the wage gap for the subpopulation of compliers who are induced by the instrument to switch from one group to the other.
The standard assumption used in the OB decomposition is that the outcome variable is linearly related to the covariates, X, and that the error term is conditionally independent of X, as in Eq. (1). Now consider the case where the conditional independence assumption fails because one or several of the covariates are correlated with the error term. Note that while the ignorability assumption may hold even if conditional independence fails, we consider a general case here where neither assumption holds.
As is well known, the conventional solution to the endogeneity problem is to use instrumental variable methods. For example, if we suspect years of education (one of the covariates) to be correlated with the error term in the wage equation, we can still estimate the model consistently provided that we have a valid instrument for years of education. The decomposition can then be performed by replacing the OLS estimates of the coefficients by their IV counterparts.
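As a sketch of the mechanics, the following numpy example simulates endogenous schooling with a hypothetical instrument, estimates each group's wage equation by 2SLS, and plugs the IV coefficients into the OB decomposition; the helpers `tsls` and `simulate_group` and all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def tsls(X, Z, y):
    # 2SLS: first-stage fitted values of X from the instruments Z,
    # then regress y on the fitted values
    G, *_ = np.linalg.lstsq(Z, X, rcond=None)
    Xhat = Z @ G
    b, *_ = np.linalg.lstsq(Xhat, y, rcond=None)
    return b

def simulate_group(n, beta):
    z = rng.normal(size=n)            # instrument (e.g. distance to college)
    ability = rng.normal(size=n)      # unobserved, correlated with schooling
    educ = 1.0 + 0.8 * z + 0.5 * ability + rng.normal(scale=0.3, size=n)
    X = np.column_stack([np.ones(n), educ])
    y = X @ beta + 0.6 * ability + rng.normal(scale=0.2, size=n)
    Z = np.column_stack([np.ones(n), z])
    return X, Z, y

XA, ZA, yA = simulate_group(20000, np.array([1.0, 0.10]))  # group A: return 0.10
XB, ZB, yB = simulate_group(20000, np.array([1.0, 0.06]))  # group B: return 0.06
bA, bB = tsls(XA, ZA, yA), tsls(XB, ZB, yB)

# OB decomposition using the IV estimates of the coefficients
xbarA, xbarB = XA.mean(axis=0), XB.mean(axis=0)
composition    = (xbarA - xbarB) @ bA
wage_structure = xbarB @ (bA - bB)
```

Since the intercept is included among the instruments, the 2SLS residuals have mean zero and the two components still sum exactly to the raw mean gap.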
Of course, in most cases it is difficult to come up with credible instrumentation strategies. It is important to remember, however, that even when the zero conditional mean assumption fails, the aggregate decomposition may remain valid, provided that ignorability holds. This would be the case, for example, when unobserved ability is correlated with education, but the correlation (more generally, the conditional distribution of ability given education) is the same in groups A and B. While we are not able to identify the separate contributions of education and ability in this context (unless we have an instrument), we know that there are no systematic ability differences between groups A and B once we have controlled for education. As a result, the aggregate decomposition remains valid.
An arguably better way of dealing with the selection and endogeneity problems mentioned above is to use panel data. Generally speaking, panel data methods can be used to compute consistent estimates of the \beta's in each of the three cases discussed earlier. For example, if the zero conditional mean assumption holds once we also control for a person-specific fixed effect in a panel of length T, we can consistently estimate the \beta's using standard panel data methods (fixed effects, first differences, etc.). This provides an alternative way of dealing with endogeneity problems when no instrumental variables are available.
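A minimal sketch of how the within (fixed-effects) estimator removes the bias that arises when a covariate is correlated with a person-specific effect; the panel dimensions and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 500, 4
alpha = rng.normal(size=n)                          # person fixed effects
x = 0.5 * alpha[:, None] + rng.normal(size=(n, T))  # covariate, corr. with alpha
beta = 0.7                                          # true return
y = beta * x + alpha[:, None] + rng.normal(scale=0.3, size=(n, T))

# pooled OLS ignoring the fixed effect: biased upward
b_ols = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# within (fixed-effects) estimator: demean each person's observations,
# which sweeps out alpha before estimating the slope
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = (xd * yd).sum() / (xd ** 2).sum()
```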
As we also discussed earlier, panel data can be used to impute wages for years where an individual is not participating in the labor market (e.g. Olivetti and Petrongolo, 2008). Note that in cases where groups are mutually exclusive (e.g. men vs. women), it may still be possible to estimate fixed effect models if the basic unit used is the firm (or a related concept) instead (Woodcock, 2008). Care has to be exercised in those circumstances to ensure that the firm fixed effect is the same for both female and male employees of the same firm. Another important issue with these models is the difficulty of interpreting the differences in the male and female intercepts, which may capture unobserved or omitted individual and firm effects.
Panel data methods have also been used to adjust for the selection into groups in cases where the same individual is observed in both groups A and B. For example, Freeman (1984) and Card (1996) estimate the union wage gap with panel data to control for the selection of workers into union status. Lemieux (1998) uses a more general approach where the return to the fixed effect may be different in the union and non-union sector. He also shows how to generalize the approach to the case of a decomposition of the variance.
Without loss of generality, assume that the return to the person-specific fixed effect \theta is normalized to 1 for non-union workers, while it is equal to \phi for union workers. The mean decomposition adjusted for fixed effects yields:

\overline{Y}_U - \overline{Y}_N = (\overline{X}_U - \overline{X}_N)\hat{\beta}_N + \overline{X}_U(\hat{\beta}_U - \hat{\beta}_N) + (\overline{\theta}_U - \overline{\theta}_N) + (\hat{\phi} - 1)\overline{\theta}_U.

The interpretation of the decomposition is the same as in a standard OB setting, except that the term \overline{\theta}_U - \overline{\theta}_N now represents the composition effect linked to non-random selection into the union sector, while the term (\hat{\phi} - 1)\overline{\theta}_U captures the corresponding wage structure effect.
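As a purely numerical illustration of this accounting, the sketch below plugs invented estimates (as might come out of a panel fixed-effects model with a sector-specific return to the person effect) into the decomposition and checks that the four components sum to the mean union/non-union gap.

```python
import numpy as np

# Purely hypothetical estimates; none of these numbers come from the text.
xbar_u, xbar_n = np.array([1.0, 13.2]), np.array([1.0, 12.4])  # [const, educ]
beta_u, beta_n = np.array([0.9, 0.08]), np.array([0.6, 0.10])
theta_u, theta_n = 0.15, 0.05   # mean person effects, union vs. non-union
phi = 1.3                       # return to the person effect in the union sector

mean_u = xbar_u @ beta_u + phi * theta_u   # mean union log wage
mean_n = xbar_n @ beta_n + theta_n         # mean non-union log wage

comp_obs   = (xbar_u - xbar_n) @ beta_n    # composition effect, observables
ws_obs     = xbar_u @ (beta_u - beta_n)    # wage structure effect, observables
comp_unobs = theta_u - theta_n             # selection on the person effect
ws_unobs   = (phi - 1.0) * theta_u         # differential return to person effect
```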
More sophisticated models with several levels of fixed effects have also been used in practice. For instance, Abowd et al. (2008) decompose inter-industry wage differentials into various components that include both individual- and firm-specific fixed effects.
In Section 2, we pointed out that decomposition methods are closely related to methods used in the program evaluation literature, where it is not necessary to estimate a fully specified structural model to estimate the main parameter of interest (the treatment effect). Provided that the ignorability assumption is satisfied, we can perform an aggregate decomposition without estimating an underlying structural model.
There are some limits, however, to what can be achieved without specifying any structure to the underlying economic problem. As we just discussed in Section 6.1, one problem is that the ignorability assumption may not hold. Under this scenario, more explicit modeling may be useful for correcting biases in the decomposition due to endogeneity, self-selection, etc.
Another problem that we now address concerns the interpretation of the wage structure components of the detailed decomposition. Throughout this chapter, we have proposed a number of ways of estimating these components for both the mean and more general distributional statistics. In the case of the mean, the interpretation of the detailed decomposition for the wage structure effect is relatively straightforward. Under the assumption (implicit in the OB decomposition) that the wage equations are truly linear and the errors have a zero conditional mean, we can think of the wage setting model as a fully specified structural model. The \beta coefficients are the "deep" structural parameters of the model, and these structural parameters are used directly to perform the decomposition.
Things become more complicated once we go beyond the mean. For instance, in the case of the variance (Section 4.1), recall from Eq. (26) that the wage structure effect depends on the parameters of both the model for the conditional mean and the model for the conditional variance.
Take, for example, the case where one of the covariates is the union status of workers. The parameter from the conditional variance model captures the "compression", or within-group, effect, while the parameter from the conditional mean model captures the "wage gap", or between-group, effect. These two terms have a distinct economic interpretation as they reflect different channels through which union wage policies tend to impact the wage distribution.
In the case of more general distributional statistics, the wage structure effect depends on an even larger number of underlying parameters capturing the relationship between the covariates and higher order moments of the distribution. As a result, the wage structure part of the detailed decomposition becomes even harder to interpret, as it potentially depends on a large number of underlying parameters.
In some cases, this may not pose a problem from an interpretation point of view. For instance, we may only care about the overall effect of unions, irrespective of whether it is coming from a between- or within-group effect (or corresponding components for higher order moments). But in other cases this type of interpretation may be unsatisfactory. Consider, for example, the effect of education on the wage structure. Like unions, education may influence wage dispersion through a between- or within-group channel. The between-group component is linked to the traditional return to education (effect on conditional means), but education also has a substantial effect on within-group dispersion (see, e.g., Lemieux, 2006b). All these effects are combined together in the decomposition methods proposed in Section 5, which is problematic if we want to know, for instance, the specific contribution of changes in the return to education to the growth in wage inequality.
In these circumstances, we need to use a more structural approach to obtain a more economically interpretable decomposition of the wage structure effect. The decomposition method of Juhn et al. (1993) is, in fact, an early example of a more structurally-based decomposition. In their setting, the model for the conditional mean is interpreted as an underlying human capital pricing equation. Likewise, changes in residual wage dispersion (given X) are interpreted as reflecting an increase in the return to unobservable skills.
As we discussed in Section 4.3, the fact that Juhn et al. (1993) provide a richer interpretation of the wage structure effect by separating the within- and between-group components is an important advantage of the method. We also mentioned, however, that the interpretation of the decomposition is not as clear for distributional statistics beyond the variance, and that the procedure typically imposes substantial restrictions on the data that may or may not hold. By contrast, a method like DFL imposes very few restrictions (provided that the probit/logit model used for reweighting is reasonably flexible), though it is more limited in terms of the economic interpretation of the wage structure effect.
In light of this, the challenge is to find a way of imposing a more explicit structure on the economic problem while making sure the underlying model "fits" the data reasonably well. One possible way of achieving this goal is to go back to the structural form introduced in Section 2, Y_g = m_g(X, \varepsilon), and use recent results from the literature on nonparametric identification of structural functions to identify the functions m_g. As discussed in Section 2.2.1, this can be done by invoking results obtained by Matzkin (2003), Blundell and Powell (2007) and Imbens and Newey (2009). Generally speaking, it is possible to identify the functions m_g nonparametrically under the assumptions of independence of \varepsilon and X (Assumption 8), and strict monotonicity of m_g in \varepsilon (Assumption 9).
But while it is possible, in principle, to nonparametrically identify the functions m_g, there is no guarantee that the resulting estimates will be economically interpretable. As a result, a more common approach in the empirical literature is to write down a more explicit (and parametric) structural model, but to look carefully at whether the model adequately fits the data. Once the model has been estimated, simulation methods can then be used to compute a variety of counterfactual exercises. These counterfactuals then form the basis of a more economically interpretable decomposition of the wage structure effect.
To take a specific example, consider the Keane and Wolpin (1997) model of career progression of young men, where educational and occupational choices are explicitly modeled using a dynamic programming approach. After carefully looking at whether the estimated model is rich enough to adequately fit the distribution of wages, occupational choices, and educational achievement, Keane and Wolpin use the estimated model to decompose the distribution of lifetime utility (itself computed using the model). They conclude that 90 percent of the variance of lifetime utility is due to skill endowment heterogeneity (schooling at age 16 and unobserved type). By contrast, choices and other developments happening after age 16 have a relatively modest impact on the variance of lifetime utility.69 The general idea here is to combine structural estimation and simulation methods to quantify the contribution of the different parameters of interest to some decompositions of interest. These issues are discussed in more detail in the chapter on structural methods by Keane et al. (2011).
One last point is that the interpretation problem linked to the wage structure effect does not apply to the detailed decomposition for the composition effect. In that case, each component is based on a clear counterfactual exercise that does not require an underlying structure to be interpretable. The aggregate decomposition is based on the following counterfactual exercise: what would be the distribution of outcomes for group B if the distribution of the covariates for group B were the same as for group A? Similarly, the detailed decomposition is based on a conditional version of the counterfactual. For example, one may want to ask what would be the distribution of outcomes for group B if the distribution of unionization (or another covariate) for group B was the same as for group A, conditional on the distribution of the other covariates remaining the same.
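The aggregate counterfactual just described can be approximated by reweighting, in the spirit of DFL: estimate a pooled logit of group membership and reweight one group by the conditional odds of belonging to the other. The sketch below does this on simulated data; group labels, distributions, and coefficients are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Question: what would group B's mean outcome be if B had group A's
# distribution of X? (All values below are invented for illustration.)
rng = np.random.default_rng(5)
nA, nB = 30000, 30000
xA = rng.normal(1.5, 1.0, nA)                  # group A: higher mean covariate
xB = rng.normal(1.0, 1.0, nB)
yB = 0.5 + 0.4 * xB + rng.normal(scale=0.2, size=nB)

# pooled logit of group membership (A = 1) on X, fit by maximum likelihood
x = np.concatenate([xA, xB])
d = np.concatenate([np.ones(nA), np.zeros(nB)])
Z = np.column_stack([np.ones_like(x), x])

def nll(g):
    idx = Z @ g
    return np.logaddexp(0.0, idx).sum() - d @ idx

ghat = minimize(nll, np.zeros(2), method="BFGS").x

# reweight group B observations by the conditional odds of being in group A
pA = expit(np.column_stack([np.ones(nB), xB]) @ ghat)
w = (pA / (1.0 - pA)) * (nB / nA)
cf_mean = np.average(yB, weights=w)            # counterfactual mean for group B
```

Here the true counterfactual mean is 0.5 + 0.4 E[X | group A] = 1.1, which the reweighted average recovers up to sampling error.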
These interpretation issues aside, it may still be useful to use a more structural approach when we are concerned about the validity of the decomposition because of self-selection, endogeneity, etc. For instance, in Keane and Wolpin (1997), the choice of schooling and occupation is endogenous. Using standard decomposition methods to look, for instance, at the contribution of the changing distribution of occupations to changes in the distribution of wages would yield invalid results because occupational choice is endogenous. In such a context, structural modeling, like the IV and selection methods discussed in Section 6.1, can help recover the elements of the decomposition when standard methods fail because of endogeneity or self-selection. But the problem here is quite distinct from issues with the wage structure effect, where standard decomposition methods are limited because of an interpretation problem, and where structural modeling provides a natural way of resolving it. By contrast, solutions to the problem of endogeneity or self-selection are only as good as the instruments (or related assumptions) used to correct for these problems. As a result, the value added of the structural approach is much more limited in the case of the composition effect than in the case of the wage structure effect.
This last point is very clear in the emerging literature where structural modeling is used in conjunction with experimental data. For example, Card and Hyslop (2005) use experimental data from the Self Sufficiency Project (SSP) to look at why individuals offered a generous work subsidy are less likely to receive social assistance (SA). By definition, there is no composition effect since the treatment and control groups are selected by random assignment. In that context, the average treatment effect precisely corresponds to the wage structure effect (or "SA" structure effect in this context) in a decomposition of the difference between the treatment and control group. It is still useful, however, to go beyond this aggregate decomposition to better understand the mechanisms behind the measured treatment effect. Card and Hyslop (2005) do so by estimating a dynamic search model.
This provides much more insight into the "black box" of the treatment effect than what a traditional decomposition exercise would yield. Remember that the detailed wage structure component in an OB-type decomposition is based on the difference between the returns to different characteristics in the two groups. In a pure experimental context like the SSP project, this simply reflects some heterogeneity in the treatment effect across different subgroups. Knowing about the importance of heterogeneity in the treatment effect is important from the point of view of the generalizability of the results. But unlike a structural approach, it provides relatively little insight into the mechanisms underlying the treatment effect.
The development of new decomposition methods has been a fertile area of research over the last 10-15 years. Building on the seminal work of Oaxaca (1973) and Blinder (1973), a number of procedures that go beyond the mean have been suggested and used extensively in practice. In this chapter, we have reviewed these methods and suggested a number of “best practices” for researchers interested in these issues. We have also illustrated how these methods work in practice by discussing existing applications and working through a set of empirical examples throughout the chapter.
Another important and recent development in this literature has linked decomposition methods to the large and growing literature on program evaluation and treatment effects. This connection is useful for several reasons. First, it helps clarify some interpretation issues with decompositions. In particular, results from the treatment effects literature can be used to show, for example, that we can give a structural interpretation to an aggregate decomposition under the assumption of ignorability. Another benefit of this connection is that formal results about the statistical properties of treatment effects estimators can also be directly applied to decomposition methods. This helps guide the choice of decomposition methods that have good statistical properties, and conduct inference on these various components of the estimated decomposition.
But this connection with the treatment effects literature also comes at a cost. While no structural modeling is required to perform a decomposition or estimate a treatment effect, these approaches leave open the question of what are the economic mechanisms behind the various elements of the decomposition (or behind the treatment effect). Now that the connection between decomposition methods and the treatment effects literature has been well established, an important direction for future research will be to improve the connection between decomposition methods and structural modeling.
The literature on inequality provides some useful hints on how this connection can be useful and improved upon. In this literature, decomposition methods have helped uncover the most important factors behind the large secular increase in wage inequality. Those include the rising return to education, de-unionization, and the decline in the minimum wage, to mention a few examples. These findings have spurred a large number of more conceptual studies trying to provide formal economic explanations for these important phenomena. In principle, these explanations can then be more formally confronted with the data by writing down and estimating a structural model, and using simulation methods to quantify the role of these explanations.
This suggests a two-step research strategy where "off-the-shelf" decomposition methods, like those discussed in this chapter, can first be used to uncover the main forces underlying an economic phenomenon of interest. More "structural" decomposition methods could then be used to better understand the economics behind the more standard decomposition results. We expect such a research strategy to be a fruitful area of research in the years to come.
References
Abowd, John M., Kramarz, Francis, Lengerman, Paul, Roux, Sebastien, 2008. Persistent inter-industry wage differences: rent sharing and opportunity costs. Working paper
James Albrecht, Anders Björklund, Susan Vroman. Is there a glass ceiling in Sweden? Journal of Labor Economics. 2003;21:145-178.
Joseph G. Altonji, Rebecca Blank. Race and gender in the labor market. In: O. Ashenfelter, D. Card., editors. Handbook of Labor Economics, vol. 3C. Amsterdam: Elsevier Science, 1999.
Joseph G. Altonji, Rosa L. Matzkin. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica. 2005;73:1053-1102.
Altonji, Joseph G., Bharadwaj, P., Lange, Fabian, 2008, Changes in the characteristics of American youth: Implications for adult outcomes. Working paper, Yale University
Susan Athey, Guido W. Imbens. Identification and inference in nonlinear difference-in-differences models. Econometrica. 2006;74:431-497.
David H. Autor, Frank Levy, Richard Murnane. The skill content of recent technological change: an empirical exploration. Quarterly Journal of Economics. 2003;118:1279-1333.
Autor, David H., Katz, Lawrence B., Kearney, Melissa S., 2005. Rising Wage Inequality: The Role of Composition and Prices. NBER Working Paper No. 11628, September
R. Barsky, John Bound, K. Charles, J. Lupton. Accounting for the black-white wealth gap: a nonparametric approach. Journal of the American Statistical Association. 2002;97:663-673.
Thomas K. Bauer, Silja Göhlmann, Mathias Sinning. Gender differences in smoking behavior. Health Economics. 2007;19:895-909.
Thomas K. Bauer, Mathias Sinning. An extension of the Blinder–Oaxaca decomposition to nonlinear models. Advances in Statistical Analysis. 2008;92:197-206.
Marianne Bertrand, Kevin F. Hallock. The gender gap in top corporate jobs. Industrial and Labor Relations Review. 2001;55:3-21.
Martin Biewen. Measuring the effects of socio-economic variables on the income distribution: an application to the East German transition process. Review of Economics and Statistics. 2001;83:185-190.
Marianne P. Bitler, Jonah B. Gelbach, Hilary W. Hoynes. What mean impacts miss: distributional effects of welfare reform experiments. American Economic Review. 2006;96:988-1012.
Dan Black, Amelia Haviland, Seth Sanders, Lowell Taylor. Gender wage disparities among the highly educated. Journal of Human Resources. 2008;43:630-659.
Francine D. Blau, Lawrence M. Kahn. The gender earnings gap: learning from international comparisons. American Economic Review. 1992;82:533-538.
Francine D. Blau, Lawrence M. Kahn. Swimming upstream: trends in the gender wage differential in the 1980s. Journal of Labor Economics. 1997;15:1-42.
Francine D. Blau, Lawrence M. Kahn. Understanding international differences in the gender pay gap. Journal of Labor Economics. 2003;21:106-144.
Francine D. Blau, Lawrence M. Kahn. The US gender pay gap in the 1990s: slowing convergence. Industrial & Labor Relations Review. 2006;60(1):45-66.
Alan Blinder. Wage discrimination: reduced form and structural estimates. Journal of Human Resources. 1973;8:436-455.
Richard Blundell, James L. Powell. Censored regression quantiles with endogenous regressors. Journal of Econometrics. 2007;141:65-83.
Richard Blundell, Amanda Gosling, Hidehiko Ichimura, Costas Meghir. Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica. 2007;75:323-363.
Francois Bourguignon. Decomposable income inequality measures. Econometrica. 1979;47:901-920.
F. Bourguignon, Francisco H.G. Ferreira. Decomposing changes in the distribution of household incomes: methodological aspects. In: F. Bourguignon, F.H.G. Ferreira, N. Lustig, editors. The Microeconomics of Income Distribution Dynamics in East Asia and Latin America. World Bank; 2005:17-46.
F. Bourguignon, Francisco H.G. Ferreira, Philippe G. Leite. Beyond Oaxaca–Blinder: Accounting for differences in household income distributions. Journal of Economic Inequality. 2008;6:117-148.
Busso, Matias, DiNardo, John, McCrary, Justin, 2009. New Evidence on the Finite Sample Properties of Propensity Score Matching and Reweighting Estimators. IZA Discussion Paper No. 3998
Kristin F. Butcher, John DiNardo. The Immigrant and native-born wage distributions: evidence from United States censuses. Industrial and Labor Relations Review. 2002;56:97-121.
Glen Cain. The economic analysis of labor market discrimination: a survey. In: O.C. Ashenfelter, R. Layard, editors. Handbook of Labor Economics, vol. 1. North-Holland; 1986:709-730.
Card, David, 1992. The Effects of Unions on the Distribution of Wages: Redistribution or Relabelling? NBER Working Paper 4195. National Bureau of Economic Research, Cambridge, Mass
David Card. The effect of unions on the structure of wages: a longitudinal analysis. Econometrica. 1996;64:957-979.
David Card, Dean R. Hyslop. Estimating the effects of a time-limited earnings subsidy for welfare-leavers. Econometrica. 2005;73:1723-1770.
Kenneth Y. Chay, David S. Lee. Changes in relative wages in the 1980s: returns to observed and unobserved skills and black-white wage differentials. Journal of Econometrics. 2000;99(1):1-38.
Chernozhukov, Victor, Fernandez-Val, Ivan, Melly, Blaise, 2009. Inference on Counterfactual Distributions. CeMMAP working paper CWP09/09
Victor Chernozhukov, Ivan Fernandez-Val, A. Galichon. Quantile and probability curves without crossing. Econometrica. 2010;78:1093-1126.
Daniel Chiquiar, Gordon H. Hanson. International migration, self-selection, and the distribution of wages: Evidence from Mexico and the United States. Journal of Political Economy. 2005;113:239-281.
Jeremiah Cotton. On the decomposition of wage differentials. Review of Economics and Statistics. 1988;70:236-243.
Frank A. Cowell. On the structure of additive inequality measures. Review of Economic Studies. 1980;47:521-531.
Denison, E.F., 1962. The sources of economic growth in the United States and the alternatives before us. Supplementary Paper No. 13. Committee for Economic Development, New York
John DiNardo, Nicole M. Fortin, Thomas Lemieux. Labor market institutions and the distribution of wages, 1973-1992: a semiparametric approach. Econometrica. 1996;64:1001-1044.
John DiNardo, David S. Lee. Economic impacts of new unionization on private sector employers: 1984-2001. The Quarterly Journal of Economics. 2004;119:1383-1441.
John DiNardo, Thomas Lemieux. Diverging male inequality in the United States and Canada, 1981-1988: do institutions explain the difference? Industrial and Labor Relations Review. 1997;50:629-651.
Peter John Dolton, Gerald H. Makepeace. Sample selection and male-female earnings differentials in the graduate labour market. Oxford Economic Papers. 1986;38:317-341.
Denise J. Doiron, W. Craig Riddell. The impact of unionization on male-female earnings differences in Canada. Journal of Human Resources. 1994;29:504-534.
Stephen G. Donald, David A. Green, Harry J. Paarsch. Differences in wage distributions between Canada and the United States: an application of a flexible estimator of distribution functions in the presence of covariates source. Review of Economic Studies. 2000;67:609-633.
Gregory M. Duncan, Duane E. Leigh. Wage determination in the union and nonunion sectors: a sample selectivity approach. Industrial and Labor Relations Review. 1980;34:24-34.
Egel, Daniel, Graham, Bryan, Pinto, Cristine, 2009. Efficient estimation of data combination problems by the method of auxiliary-to-study tilting. mimeo
William E. Even, David A. Macpherson. Plant size and the decline of unionism. Economics Letters. 1990;32:393-398.
Robert W. Fairlie. The absence of the African-American owned business: an analysis of the dynamics of self–employment. Journal of Labor Economics. 1999;17:80-108.
Robert W. Fairlie. An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Journal of Economic and Social Measurement. 2005;30:305-316.
Judith Fields, Edward N. Wolff. Interindustry wage differentials and the gender wage gap. Industrial and Labor Relations Review. 1995;49:105-120.
Sergio Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica. 2007;75:259-276.
Firpo, Sergio, 2010. Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures. EESP-FGV. mimeo
Firpo, Sergio, Fortin, Nicole M., Thomas, Lemieux, 2007. Decomposing Wage Distributions using Recentered Influence Functions Regressions. mimeo, University of British Columbia
Sergio Firpo, Nicole M. Fortin, Thomas Lemieux. Unconditional quantile regressions. Econometrica. 2009;77(3):953-973.
Fitzenberger, Bernd, Kohn, Karsten, Wang, Qingwei, 2010. The erosion of union membership in Germany: determinants, densities, decompositions. Journal of Population Economics (forthcoming)
Silverio Foresi, Franco Peracchi. The conditional distribution of excess returns: an empirical analysis. Journal of the American Statistical Association. 1995;90:451-466.
Nicole M. Fortin, Thomas Lemieux. Rank regressions, wage distributions, and the gender gap. Journal of Human Resources. 1998;33:610-643.
Nicole M. Fortin. The gender wage gap among young adults in the United States: the importance of money vs. people. Journal of Human Resources. 2008;43:886-920.
Richard B. Freeman. Unionism and the dispersion of wages. Industrial and Labor Relations Review. 1980;34:3-23.
Richard B. Freeman. Longitudinal analysis of the effect of trade unions. Journal of Labor Economics. 1984;2:1-26.
Richard B. Freeman. How much has deunionization contributed to the rise of male earnings inequality? In: Sheldon Danziger, Peter Gottschalk, editors. Uneven Tides: Rising Income Inequality in America. New York: Russell Sage Foundation; 1993:133-163.
Markus Frolich. Finite-sample properties of propensity-score matching and weighting estimators. Review of Economics and Statistics. 2004;86:77-90.
Javier Gardeazabal, Arantza Ugidos. More on the identification in detailed wage decompositions. Review of Economics and Statistics. 2004;86:1034-1057.
Gelbach, Jonah B., 2002. Identified Heterogeneity in Detailed Wage Decompositions. mimeo, University of Maryland at College Park
Gelbach, Jonah B., 2009. When Do Covariates Matter? And Which Ones, and How Much? mimeo, Eller College of Management, University of Arizona
Joanna Gomulka, Nicholas Stern. The employment of married women in the United Kingdom, 1970–1983. Economica. 1990;57:171-199.
Amanda Gosling, Stephen Machin, Costas Meghir. The changing distribution of male wages in the U.K,. Review of Economic Studies. 2000;67:635-666.
William H. Greene. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Pearson Education; 2003.
James Heckman. Shadow prices, market wages and labor supply. Econometrica. 1974;42:679-694.
James Heckman. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement. 1976;5:475-492.
James Heckman. Sample selection bias as a specification error. Econometrica. 1979;47:153-163.
James J. Heckman, Jeffrey Smith, Nancy Clements. Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Review of Economic Studies. 1997;64(4):487-535.
James J. Heckman, Hidehiko Ichimura, Petra Todd. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Review of Economic Studies. 1997;64:605-654.
James J. Heckman, Hidehiko Ichimura, Jeffrey Smith, Petra Todd. Characterizing selection bias using experimental data. Econometrica. 1998;66:1017-1098.
Heywood, John S., Parent, Daniel, 2009. Performance Pay and the White-Black Wage Gap. mimeo, McGill University
Kiesuke Hirano, Guido W. Imbens, Geert Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71:1161-1189.
Paul W. Holland. Statistics and causal inference. Journal of the American Statistical Association. 1986;81(396):945-960.
Hoffman, Florian, 2009. An Empirical Model of Life-Cycle Earnings and Mobility Dynamics. University of Toronto, Department of Economics. mimeo
William Horrace, Ronald L. Oaxaca. Inter-industry wage differentials and the gender wage gap: an identification problem. Industrial and Labor Relations Review. 2001;54:611-618.
Guido W. Imbens, Joshua Angrist. Identification and estimation of local average treatment effects. Econometrica. 1994;62:467-476.
Guido W. Imbens, Whitney K. Newey. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica. 2009;77(5):1481-1512.
Jann, Ben, 2005. Standard errors for the Blinder-Oaxaca decomposition. German Stata Users’ Group Meetings 2005. Available from http://repec.org/dsug2005/oaxaca_se_handout.pdf
Ben Jann. The Oaxaca-Blinder decomposition for linear regression models. Stata Journal. 2008;8:435-479.
Frank Lancaster Jones. On decomposing the wage gap: a critical comment on Blinder’s method. Journal of Human Resources. 1983;18:126-130.
William Johnson, Yuichi Kitamura, Derek Neal. Evaluating a simple method for estimating black-white gaps in median wages. American Economic Review. 2000;90:339-343.
D.W. Jorgenson, Z. Griliches. The explanation of productivity change. Review of Economic Studies. 1967;34:249-283.
Chinhui Juhn, Kevin M. Murphy, Brooks Pierce. Accounting for the slowdown in black-white wage convergence. In: M.H. Kosters, editor. Workers and Their Wages: Changing Patterns in the United States. Washington: American Enterprise Institute, 1991.
Chinhui Juhn, Kevin M. Murphy, Brooks Pierce. Wage inequality and the rise in returns to skill. Journal of Political Economy. 1993;101:410-442.
Michael P. Keane, Kenneth I. Wolpin. The career decisions of young men. Journal of Political Economy. 1997;105:473-522.
Michael P. Keane, Petra E. Todd, Kenneth I. Wolpin. The structural estimation of behavioral models: discrete choice dynamic programming methods and applications. In: O. Ashenfelter, D. Card, editors. Handbook of Labor Economics, vol. 4A. Amsterdam: Elsevier Science; 2011:331-461.
John W. Kendrick. Productivity Trends in the United States. Princeton: Princeton University Press; 1961.
Peter Kennedy. Interpreting dummy variables. Review of Economics and Statistics. 1986;68:174-175.
Kline, Pat, 2009. Blinder-Oaxaca as a Reweighting Estimator. UC Berkeley mimeo
Roger Koenker, G. Bassett. Regression quantiles. Econometrica. 1978;46:33-50.
Alan B. Krueger, Lawrence H. Summers. Efficiency wages and the inter-industry wage structure. Econometrica. 1988;56(2):259-293.
John M. Krieg, Paul Storer. How much do students matter? applying the Oaxaca decomposition to explain determinants of adequate yearly progress. Contemporary Economic Policy. 2006;24:563-581.
Thomas Lemieux. Estimating the effects of unions on wage inequality in a panel data model with comparative advantage and non-random selection. Journal of Labor Economics. 1998;16:261-291.
Thomas Lemieux. Decomposing changes in wage distributions: a unified approach. The Canadian Journal of Economics. 2002;35:646-688.
Thomas Lemieux. Post-secondary education and increasing wage inequality. American Economic Review. 2006;96:195-199.
Thomas Lemieux. Increasing residual wage inequality: composition effects, noisy data, or rising demand for skill? American Economic Review. 2006;96:461-498.
H. Gregg Lewis. Unionism and Relative Wages in the United States. Chicago: University of Chicago Press; 1963.
H. Gregg Lewis. Union Relative Wage Effects: A Survey. Chicago: University of Chicago Press; 1986.
David Neumark. Employers’ discriminatory behavior and the estimation of wage discrimination. Journal of Human Resources. 1988;23:279-295.
José F. Machado, José Mata. Counterfactual decomposition of changes in wage distributions using quantile regression. Journal of Applied Econometrics. 2005;20:445-465.
Machado, Cecilia, 2009. Selection, Heterogeneity and the Gender Wage Gap. Columbia University, Economics Department. mimeo
Charles F. Manski. Nonparametric bounds on treatment effects. American Economic Review. 1990;80(2):319-323.
Rosa L. Matzkin. Nonparametric estimation of nonadditive random functions. Econometrica. 2003;71(5):1339-1375.
P.J. McEwan, J.H. Marshall. Why does academic achievement vary across countries? Evidence from Cuba and Mexico. Education Economics. 2004;12:205-217.
Melly, Blaise, 2006. Estimation of counterfactual distributions using quantile regression. University of St. Gallen, Discussion Paper
Blaise Melly. Decomposition of differences in distribution using quantile regression. Labour Economics. 2005;12:577-590.
Casey B. Mulligan, Yona Rubinstein. Selection, investment, and women’s relative wages over time. Quarterly Journal of Economics. 2008;123:1061-1110.
Derek A. Neal, W. Johnson. The role of premarket factors in black-white wage differences. Journal of Political Economy. 1996;104:869-895.
Derek A. Neal. The measured black-white wage gap among women is too small. Journal of Political Economy. 2004;112:S1-S28.
Hugo Ñopo. Matching as a tool to decompose wage gaps. Review of Economics and Statistics. 2008;90:290-299.
Ronald Oaxaca. Male-female wage differentials in urban labor markets. International Economic Review. 1973;14:693-709.
Ronald L. Oaxaca, Michael R. Ransom. On discrimination and the decomposition of wage differentials. Journal of Econometrics. 1994;61:5-21.
Ronald L. Oaxaca, Michael R. Ransom. Calculation of approximate variances for wage decomposition differentials. Journal of Economic and Social Measurement. 1998;24:55-61.
Ronald L. Oaxaca, Michael R. Ransom. Identification in detailed wage decompositions. Review of Economics and Statistics. 1999;81:154-157.
Ronald L. Oaxaca. The challenge of measuring labor market discrimination against women. Swedish Economic Policy Review. 2007;14:199-231.
Claudia Olivetti, Barbara Petrongolo. Unequal pay or unequal employment? a cross-country analysis of gender gaps. Journal of Labor Economics. 2008;26:621-654.
June O’Neill, Dave O’Neill. What do wage differentials tell us about labor market discrimination? In: Solomon Polachek, Carmel Chiswick, Hillel Rapoport, editors. The Economics of Immigration and Social Policy. Research in Labor Economics. 2006;24:293-357.
Cornelia W. Reimers. Labor market discrimination against hispanic and black men. Review of Economics and Statistics. 1983;65:570-579.
James Robins, Andrea Rotnizky, Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846-866.
Paul R. Rosenbaum, Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.
Paul R. Rosenbaum, Donald B. Rubin. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association. 1984;79:516-524.
Christoph Rothe. Nonparametric estimation of distributional policy effects. Journal of Econometrics. 2009;155:56-70.
Anthony F. Shorrocks. The class of additively decomposable inequality measures. Econometrica. 1980;48:613-625.
Anthony F. Shorrocks. Inequality decomposition by population subgroups. Econometrica. 1984;52:1369-1385.
Shorrocks, Anthony F., 1999. Decomposition Procedures for Distributional Analysis: A Unified Framework Based on the Shapley Value. University of Essex, Department of Economics. mimeo
Robert Solow. Technical change and the aggregate production function. Review of Economics and Statistics. 1957;39:312-320.
Sohn, Ritae, 2008. The Gender Math Gap: Is It Growing? mimeo, SUNY Albany
Frank Vella. Estimating models with sample selection bias: a survey. Journal of Human Resources. 1998;33:127-169.
Simon D. Woodcock. Wage differentials in the presence of unobserved worker, firm, and match heterogeneity? Labour Economics. 2008;15:772-794.
Myeong-Su Yun. A simple solution to the identification problem in detailed wage decomposition. Economic Inquiry. 2005;43:766-772. with Erratum, Economic Inquiry (2006), 44: 198
Myeong-Su Yun. Identification problem and detailed Oaxaca decomposition: a general solution and inference. Journal of Economic and Social Measurement. 2008;33:27-38.
1 See also Kendrick (1961), Denison (1962), and Jorgenson and Griliches (1967).
2 We limit our discussion to so-called “regression-based” decomposition methods, where the decomposition focuses on explanatory factors, rather than decomposition methods that apply to additively decomposable indices, where the decomposition pertains to population sub-groups. Bourguignon and Ferreira (2005) and Bourguignon et al. (2008) are recent surveys discussing these methods.
3 The decomposition can also be written by exchanging the reference group used for the wage structure and composition effects as follows:
Alternatively, the so-called three-fold decomposition uses the same reference group for both effects, but introduces a third interaction term: . While these various versions of the basic decomposition are used in the literature, using one or the other does not involve any specific estimation issues. For the sake of simplicity, we thus focus on the one decomposition introduced in the text for most of the chapter.
4 Firpo (2010) shows that for any smooth functional of the reweighted cdf, efficiency is achieved. In other words, decomposing standard distributional statistics such as the variance, the Gini coefficient, or the interquartile range using the reweighting method suggested by DiNardo et al. (1996) will be efficient. Note, however, that this result does not apply to the (more complicated) case of the density considered by DiNardo et al. (1996) where non-parametric estimation is involved.
5 One possible explanation for the lack of discussion of identification assumptions is that they were reasonably obvious in the case of the original OB decompositions for the mean. The situation is quite a bit more complex, however, in the case of distributional statistics other than the mean. Note also that some recent papers have started addressing these identification issues in more detail. See, for instance, Firpo et al. (2007), and Chernozhukov et al. (2009).
6 Alternatively, the overlapping issue can be bypassed by excluding Hispanics from the Black and White groups.
7 Many papers (DiNardo et al., 1996; Machado and Mata, 2005; Chernozhukov et al., 2009) have proposed methodologies to estimate and decompose entire distributions (or densities) of wages, but the decomposition results are ultimately quantified through the use of distributional statistics. Analyses of the entire distribution look at several of these distributional statistics simultaneously.
8 When we construct the counterfactual , we choose to be the reference group and the group whose wages are “adjusted”. Thus counterfactual women’s wages, if they were paid like men, would be , although the gender gap example maps less naturally into the treatment effects framework.
9 Chernozhukov et al. (2009) discuss the conditions under which the two types of decomposition are equivalent.
10 To see more explicitly how the conditional distribution depends on the distribution of , note that we can write under the assumption that is monotonic in (see Assumption 9 introduced below).
11 See, for instance, Rosenbaum and Rubin (1983, 1984), Heckman et al. (1997a,b) and Heckman et al. (1998).
12 Differences in the distribution of the are fairly constrained under the ignorability assumption. While the unconditional distribution of may differ between group and (because of differences in the distribution of ), the conditional distribution of has to be the same for the two groups.
13 This monotonicity assumption can also be found in the works of Matzkin (2003), Altonji and Matzkin (2005), Imbens and Newey (2009), and Athey and Imbens (2006).
14 The rank pairing of two outcome variables and will be disrupted if the rank of remains the same because at a mass point corresponding to the minimum wage, while the rank of continues to increase in the absence of minimum wage at the rank. Heckman et al. (1997a,b) consider the case of mass points at zero, but the case of multiple mass points is much more difficult.
15 Note that it is possible to relax the homoskedasticity assumption while maintaining the assumption of a single price of unobservables , as in Chay and Lee (2000). We do not follow this approach here to simplify the presentation.
16 Note that we depart somewhat from our previous notation, as retains some components of the structural form of group B, which will disappear in below.
17 See Blau and Kahn (1992, 2003) for an application of the methodology to the study of gender wage differentials across countries.
18 Only and are observed.
19 We note that this last decomposition corresponds, in the OB context, to the so-called three-fold decomposition presented in footnote 3.
20 The union/non-union wage gaps or private/public sector wage gaps are more amenable to choice.
21 Note that some analyses (e.g. Neal and Johnson, 1996) take great care to focus on pre-market variables.
22 The empirical applications of the OB procedure in this chapter use Jann (2008) procedures in Stata.
23 As is common in the gender pay gap literature, we begin with the counterfactual that uses group (males) as the reference group. In column (3) of Table 3, we present the decomposition that corresponds to Eq. (15), that is, uses group (females) as the reference group.
24 In particular, see the discussion of the case of scalable or categorical variables below.
25 This interpretation issue also arises in other applications that use categorical variables, notably the inter-industry wage differentials literature. In this literature, following the seminal Krueger and Summers (1988) paper on inter-industry wage differentials, the standard practice is to express industry differentials as deviations from an employment-share weighted mean, a well-defined average.
26 In the first regression, the composition effect is given by , and in the second regression, because , .
27 Actually, problems arise when there are more than two categories. Blinder (1973, footnote 13) and Oaxaca (2007) correctly point out that in the case of a binary dummy variable, these problems do not occur.
28 This problem is different from a “true” identification problem which arises when multiple values of a parameter of interest are consistent with a given model and population.
29 As pointed out by Gardeazabal and Ugidos (2004), such restrictions can have some disturbing implications. In the case of educational categories, it rules out an outcome where group members would earn higher returns than group members for all levels of education.
30 In the gender wage gap literature, when the reference wage structure is the male wage structure (group ) the means among women will be used in Eq. (22).
31 It is indeed easy to see that .
32 The for the omitted category is simply the first and last components of Eq. (22), since for that category.
33 and are the matrices of covariates (of dimension and ) for groups and , respectively.
34 This “pooled” decomposition is easily implemented using the option “pooled” in Jann (2008) “oaxaca” procedure in Stata 9.2.
35 When considering covariates , we use the subscript to denote the group whose characteristics are “adjusted” with reweighting.
36 We show in Section 4 that the reweighting factor is defined as the ratio of the marginal distributions of for groups and , . As a result, the reweighted distribution of for group should be the same as the original distribution of in group . This implies that the mean value of in the reweighted sample, , should be the same as the mean value of for group , .
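The property described in this footnote is easy to verify numerically. The sketch below uses synthetic data with a single discrete covariate, so the reweighting factor can be computed directly as a ratio of cell frequencies; with continuous covariates one would instead estimate it with a logit or probit. All data and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic covariate (say, three education levels) with different
# distributions in groups A and B.
x_a = rng.choice([0, 1, 2], size=50_000, p=[0.5, 0.3, 0.2])
x_b = rng.choice([0, 1, 2], size=50_000, p=[0.2, 0.3, 0.5])

# Reweighting factor: ratio of the marginal distributions of x,
# estimated here by sample frequencies since x is discrete.
p_a = np.array([(x_a == k).mean() for k in range(3)])
p_b = np.array([(x_b == k).mean() for k in range(3)])
psi = (p_a / p_b)[x_b]  # weight attached to each group-B observation

# The reweighted group-B mean of x matches the group-A mean.
mean_b_reweighted = np.average(x_b, weights=psi)
print(mean_b_reweighted, x_a.mean())
```

With frequency-based weights the two means coincide up to floating-point error, which is exactly the diagnostic suggested in the footnote.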
37 When the conditional expectation is non-linear, the OLS estimate of can be interpreted as the one which minimizes the square of the specification error over the distribution of . Since the expected value of the OLS estimate of depends on the distribution of , differences in over two samples may either reflect true underlying differences in the conditional expectation (i.e. in the wage structure), or “spurious” differences linked to the fact that the distribution of is different in the two samples. For example, if is convex in , the expected value of will tend to grow as the distribution of shifts up, since the relationship between and gets steeper as becomes larger.
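A small simulation illustrates the point: with a convex conditional expectation (here E[y|x] = x squared, a hypothetical functional form), the OLS slope estimated on a sample with higher values of x is steeper even though the underlying relationship is identical in both samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical convex conditional expectation: E[y|x] = x**2, plus noise.
def draw_y(x):
    return x**2 + rng.normal(scale=0.5, size=x.size)

# Two samples with the same "wage structure" but different distributions of x.
x_low = rng.uniform(0, 1, 100_000)
x_high = rng.uniform(1, 2, 100_000)

# OLS slope in each sample (np.polyfit returns [slope, intercept]).
b_low = np.polyfit(x_low, draw_y(x_low), 1)[0]
b_high = np.polyfit(x_high, draw_y(x_high), 1)[0]
print(b_low, b_high)  # the estimated slope is steeper where x is larger
```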
38 This corresponds to an experimental setting where, for example, regression analysis was used to assess the impact of various soils and fertilizers () on agricultural yields .
39 See, for instance, Bourguignon (1979), Cowell (1980), and Shorrocks (1980, 1984).
40 See for example, Theorem B.4 in Greene (2003).
41 Estimating these simple models of the conditional cross-sectional variance is a special case of the large time-series literature on the estimation of auto-regressive conditional heteroskedasticity models (ARCH, GARCH, etc.).
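As a minimal cross-sectional analogue of these conditional variance models, one can fit the conditional mean by OLS and then regress the squared residuals on the covariate. The data below are synthetic and the linear variance specification is only an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic heteroskedastic data: Var(w|x) increases with x.
n = 100_000
x = rng.uniform(0, 1, n)
w = 1.0 + 0.5 * x + rng.normal(size=n) * (0.2 + 0.3 * x)

# Step 1: fit the conditional mean by OLS.
b1, b0 = np.polyfit(x, w, 1)
resid = w - (b0 + b1 * x)

# Step 2: regress squared residuals on x to model the conditional variance.
g1, g0 = np.polyfit(x, resid**2, 1)
print(g0, g1)  # fitted linear model of Var(w|x)
```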
42 See Albrecht et al. (2003), who look at whether there is a glass ceiling in female earnings, and Bitler et al. (2006), who study the distributional effects of work incentive programs on labor supply.
43 Juhn et al. (1993) actually consider multiple time periods and propose an additional counterfactual where the returns to observables are set to their mean across time periods, a more complex counterfactual treatment.
44 See also Lemieux (2002).
45 For each random draw , MM also draw a vector of covariates from the observed data and perform the prediction for this value only. Melly (2005) discusses more efficient ways of computing distributions using this conditional quantile regression approach.
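The simulation step described here can be sketched as follows. For simplicity, the example uses a single discrete covariate, so the conditional u-quantile can be computed directly within cells rather than by quantile regression; the data and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic wage data with one binary covariate (say, two education groups).
n = 20_000
x = rng.choice([0, 1], size=n)
w = 1.0 + 0.5 * x + rng.normal(size=n) * (0.3 + 0.2 * x)

def mm_draws(w, x, n_draws, rng):
    """Machado-Mata-style simulation: for each draw, pick u ~ U(0,1),
    draw a covariate value from the data, and record the conditional
    u-quantile of w at that covariate value."""
    sims = np.empty(n_draws)
    for i in range(n_draws):
        u = rng.uniform()
        xi = x[rng.integers(x.size)]
        sims[i] = np.quantile(w[x == xi], u)
    return sims

sims = mm_draws(w, x, 2_000, rng)
# The simulated draws recover the marginal distribution of w.
print(np.median(sims), np.median(w))
```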
46 The estimates were computed with Melly’s implementation “rqdeco” in Stata.
47 See Melly (2005) for a detailed description of this alternative procedure. Gosling et al. (2000) and Autor et al. (2005) also use a similar idea in their empirical applications to changes in the distribution of wages over time.
48 Machado and Mata (2005) (pp. 449-450) suggest computing the detailed decomposition for the composition effect using an unconditional reweighting procedure. This is invalid as a way of performing the decomposition for the same reason that an OB decomposition would be invalid if the coefficient used for one covariate was estimated without controlling for the other covariates. We propose a conditional reweighting procedure in the next section that deals adequately with this issue.
49 This view of course makes more sense when some policy or other change has taken place over time (see Biewen, 2001).
50 On the other hand, by analogy with the treatment effects literature, Firpo et al. (2007) use time period 0 as the reference group.
51 The estimator suggested by Hirano et al. (2003) is a series estimator applied to the case of a logit model. The idea is to add increasingly higher order polynomial terms in the covariates as the size of the sample increases. Importantly, they also show that this approach yields an efficient estimate of the treatment effect.
52 The two most popular kernel functions are the Gaussian and the Epanechnikov kernel.
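For reference, the standard definitions of these two kernels, together with a quick numerical check that each integrates to one:

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

# Both kernels are densities: a simple Riemann-sum check of the integral.
u = np.linspace(-5, 5, 200_001)
du = u[1] - u[0]
area_g = (gaussian(u) * du).sum()
area_e = (epanechnikov(u) * du).sum()
print(area_g, area_e)  # both close to one
```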
53 By contrast, in the original DiNardo et al. (1996) decomposition, workers in 1988 (time period 1) were reweighted to look like workers in 1979 (time period 0). The counterfactual asks what the distribution of wages would have looked like if workers’ characteristics had remained at their 1979 levels.
54 In small samples, it is important to ensure that these estimated weights sum up to the number of actual observations in the sample, though this is done automatically in packages like Stata. See Busso et al. (2009) for more detail.
55 The analytical standard errors have to take account of the fact that the logit or probit model used to construct the reweighting factor is estimated. Firpo et al. (2007) show how to perform this adjustment. In practice, however, it is generally simpler to bootstrap the whole estimation procedure (both the estimation of the logit/probit to construct the weights and the computation of the various elements of the decomposition).
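A bootstrap of the whole procedure can be sketched as below: the weights are re-estimated inside every replication, so their sampling variability is reflected in the standard error. The example uses synthetic data and a discrete covariate (so the weights are frequency ratios rather than logit-based); it is an illustration of the principle, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic samples for groups A and B: a wage and one binary covariate.
n = 5_000
x_a = rng.choice([0, 1], size=n, p=[0.6, 0.4])
x_b = rng.choice([0, 1], size=n, p=[0.3, 0.7])
w_b = 1.2 + 0.6 * x_b + rng.normal(scale=0.3, size=n)

def composition_effect(w_b, x_b, x_a):
    """Mean of w in group B minus its reweighted (counterfactual) mean,
    with the weights re-estimated inside the function."""
    p_a = np.array([(x_a == k).mean() for k in (0, 1)])
    p_b = np.array([(x_b == k).mean() for k in (0, 1)])
    psi = (p_a / p_b)[x_b]
    return w_b.mean() - np.average(w_b, weights=psi)

# Bootstrap the entire procedure: resample both groups and recompute
# everything (weights included) in each replication.
reps = []
for _ in range(200):
    i_a = rng.integers(n, size=n)
    i_b = rng.integers(n, size=n)
    reps.append(composition_effect(w_b[i_b], x_b[i_b], x_a[i_a]))
se = np.std(reps)
print(composition_effect(w_b, x_b, x_a), se)
```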
56 In principle, other popular methods in the program evaluation literature such as matching could be used instead of reweighting.
57 Foresi and Peracchi (1995) proposed to use a sequence of logit models to estimate the conditional distribution of excess returns.
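The idea of estimating the conditional distribution with a sequence of binary-response models can be sketched as follows. With a single binary covariate the model at each threshold is saturated, so cell proportions coincide with the fitted logit probabilities; the data are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data: wage w and one binary covariate x.
n = 50_000
x = rng.choice([0, 1], size=n)
w = 1.0 + 0.5 * x + rng.normal(scale=0.4, size=n)

# One binary-response model per threshold c: estimate Pr(w <= c | x).
# With a single binary x the model is saturated, so cell proportions
# equal the fitted logit probabilities.
grid = np.quantile(w, np.linspace(0.05, 0.95, 19))
F_cond = {k: np.array([(w[x == k] <= c).mean() for c in grid]) for k in (0, 1)}

# A counterfactual distribution is then obtained by integrating the
# conditional distribution over some covariate distribution (here 50/50).
F_counter = 0.5 * F_cond[0] + 0.5 * F_cond[1]
print(F_counter[:3])
```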
58 Donald et al. (2000) use a more general specification of the proportional hazard model where and are allowed to vary for different values (segments) of .
59 The estimation was performed using Melly’s “counterfactual” Stata procedure. The computation of the variance and Gini coefficient were based on the estimation of 100 centiles.
60 Chernozhukov et al. (2009) use the method of Chernozhukov et al. (2010) to ensure that the function is monotonic.
61 Firpo et al. (2009) also propose other more flexible estimation procedures.
62 Using a reweighted regression approach can be particularly important in the cases of -regressions that are unlikely to be linear for distributional statistics besides the mean.
63 The reweighting error reflects the fact that the composition effect in the reweighted-regression decomposition, , is not exactly equal to the standard composition effect when the reweighted mean is not exactly equal to .
64 Note that in DFL, it is the opposite; group is the 1988 time period and group is the 1979 time period.
65 See, for example, Butcher and DiNardo (2002) and Altonji et al. (2008).
66 Both Butcher and DiNardo (2002) and Altonji et al. (2008) consider cases where there is indeed a good reason for following a particular order in the decomposition. For instance, Altonji et al. (2008) argue that, when looking at various youth outcomes, one should first control for predetermined factors like gender and race before controlling for other factors determined later in life (AFQT score, educational achievement, etc.). In such a situation, the decomposition is econometrically interpretable even if gender and race are introduced first without controlling for the other factors.
67 As is well known, selection models can be identified on the basis of functional restrictions even when an excluded instrumental variable is not available. This is no longer viewed, however, as a credible identification strategy. We, therefore, only focus on the case where an instrumental variable is available.
68 See for instance, the survey of Lewis (1986) who concludes that these methods yield unreliable estimates of the union wage gap. Given these negative results and the lack of credible instruments for unionization, not much progress has been made in this literature over the last two decades. One exception is DiNardo and Lee (2004) who use a regression discontinuity design.
69 Note, however, that Hoffman (2009) finds that skill endowments have a sizably smaller impact in a richer model that incorporates comparative advantage (across occupations), search frictions, and exogenous job displacement.