4 Going beyond the Mean—Distributional Methods

Developing new decomposition methods for distributional statistics other than the mean has been an active research area over the last 15 years. In this section, we discuss a number of procedures that have been suggested for decomposing general distributional statistics. We focus on the case of the aggregate decomposition, though some of the suggested methods can be extended to the case of the detailed decomposition, which we discuss in Section 5. We begin by looking at the simpler case of a variance decomposition. The decomposition is obtained by extending the classic analysis of variance approach (based on a between/within group comparison) to a general case with covariates $X$. We then turn to newer approaches based on various "plugging in" methods, such as the residual imputation method of Juhn, Murphy, and Pierce (1993, JMP) and the conditional quantile regression method of Machado and Mata (2005). Finally, we discuss methods that focus on the estimation of counterfactuals for the entire distribution. These methods are either based on reweighting or on the estimation of the conditional distribution.

Most of this recent research was initially motivated by the dramatic growth in earnings inequality in the United States. Prior to that episode, the literature focused on particular summary measures of inequality such as the variance of logs and the Gini coefficient. For instance, Freeman (1980, 1984) looks at the variance of log wages in his influential work on the effect of unions on wage dispersion. This research establishes that unions tend to reduce wage dispersion as measured by the variance of log wages. Freeman shows that despite the inequality-enhancing effect of unions on the between-group component of inequality, the overall effect of unions is to reduce inequality because of their even larger equalizing effect on within-group inequality.

One convenient feature of the variance is that it can be readily decomposed into a within- and between-group component. Interestingly, related work in the inequality literature shows that other measures such as the Gini or Theil coefficient are also decomposable into a within- and between-group component.39

Note that the between vs. within decomposition is quite different in spirit from the aggregate or detailed OB decomposition discussed in the previous section. There are advantages and disadvantages to this alternative approach. On the positive side, looking at between- and within-group effects can help understand economic mechanisms, as in the case of unions, or the sources of inequality growth (Juhn et al., 1993).

On the negative side, the most important drawback of the between vs. within decomposition is that it does not hold in the case of many other interesting inequality measures such as the interquartile ranges, the probability density function, etc. This is a major shortcoming since looking at what happens where in the distribution is important for identifying the factors behind changes or differences in distributions. Another drawback of the between vs. within approach is that it does not provide a straightforward way of looking at the specific contribution of each covariate, i.e. to perform a detailed decomposition. One final drawback is that with a rich enough set of covariates the number of possible groups becomes very large, and some parametric restrictions have to be introduced to keep the estimation problem manageable.

In response to these drawbacks, a new set of approaches has been proposed for performing aggregate decompositions on any distributional statistic. Some approaches, such as Juhn et al. (1993), Donald et al. (2000), and Machado and Mata (2005), can be viewed as extensions of the variance decomposition approach, where the whole conditional distribution (instead of just the conditional variance) is estimated using parametric methods. Others, such as DiNardo et al. (1996), completely bypass the problem of estimating conditional distributions and are, as such, closer cousins to estimators proposed in the program evaluation literature.

4.1 Variance decompositions

Before considering more general distributional statistics, it is useful to recall the steps used to obtain the standard OB decomposition. The first step is to assume that the conditional expectation of $Y_g$ given $X$ is linear, i.e. $E[Y_g \mid X] = X\beta_g$. This follows directly from the linearity and zero conditional mean assumptions (Assumptions 10 and 11) introduced in Section 2. Using the law of iterated expectations, it then follows that the unconditional mean is $E[Y_g] = E[X \mid D_g = 1]\beta_g$. This particular property of the mean is then used to compute the OB decomposition.

In light of this, it is natural to think of extending this type of procedure to the case of the variance. Using the analysis of variance formula, the unconditional variance of $Y_g$ can be written as:40


$$\mathrm{Var}(Y_g) = E_X\!\left[\mathrm{Var}(Y_g \mid X)\right] + \mathrm{Var}_X\!\left(E[Y_g \mid X]\right),$$


where the expectations are taken over the distribution of $X$. The first component of the equation is the within-group component (also called the residual variance), while the second component is the between-group component (also called the regression variance). Writing $\sigma_g^2(X) \equiv \mathrm{Var}(Y_g \mid X)$ and $E[Y_g \mid X] = X\beta_g$, for $g = A, B$, we can write the difference in variances across groups $B$ and $A$ as


$$\Delta_O^{\sigma^2} = \mathrm{Var}(Y_B) - \mathrm{Var}(Y_A) = \left\{E_B\!\left[\sigma_B^2(X)\right] - E_A\!\left[\sigma_A^2(X)\right]\right\} + \left\{\beta_B'\,\Omega_B\,\beta_B - \beta_A'\,\Omega_A\,\beta_A\right\},$$

where $E_g[\cdot]$ denotes an expectation over the distribution of $X$ in group $g$ and $\Omega_g \equiv \mathrm{Var}(X \mid D_g = 1)$ is the covariance matrix of the covariates in group $g$.


A few manipulations yield $\Delta_O^{\sigma^2} = \Delta_S^{\sigma^2} + \Delta_X^{\sigma^2}$, where


$$\Delta_S^{\sigma^2} = \left\{E_B\!\left[\sigma_B^2(X)\right] - E_B\!\left[\sigma_A^2(X)\right]\right\} + \left\{\beta_B'\,\Omega_B\,\beta_B - \beta_A'\,\Omega_B\,\beta_A\right\}$$


and


$$\Delta_X^{\sigma^2} = \left\{E_B\!\left[\sigma_A^2(X)\right] - E_A\!\left[\sigma_A^2(X)\right]\right\} + \left\{\beta_A'\,\Omega_B\,\beta_A - \beta_A'\,\Omega_A\,\beta_A\right\}.$$


While it is straightforward to estimate the regression coefficients ($\beta_A$ and $\beta_B$) and the covariance matrices of the covariates ($\Omega_A$ and $\Omega_B$), the within-group (or residual) variance terms appearing in $\Delta_S^{\sigma^2}$ and $\Delta_X^{\sigma^2}$ also have to be estimated to compute the decomposition.

Several approaches have been used in the literature to estimate these residual variance terms. The simplest possible approach is to assume that the error term is homoscedastic, in which case $\sigma_A^2(X) = \sigma_A^2$ and $\sigma_B^2(X) = \sigma_B^2$, and the two relevant variance parameters can be estimated from the sampling variance of the error terms in the regressions. The homoscedasticity assumption is very strong, however. When errors are heteroscedastic, differences between the residual variances of the two groups can reflect spurious composition effects, in which case the decomposition will attribute to the wage structure effect ($\Delta_S^{\sigma^2}$) what should really be a composition effect ($\Delta_X^{\sigma^2}$). Lemieux (2006b) shows that this was a major problem when looking at changes in residual wage inequality in the United States since the late 1980s.

A simple way of capturing at least some of the relationship between the covariates and the conditional variance is to compute the variance of residuals for a limited number of subgroups or "cells". For instance, Lemieux (2006b) shows estimates for 20 different subgroups of workers (based on education and experience), while Card (1996) divides the sample into five quintiles based on predicted wages $X\hat\beta$.

Finally, one could attempt to estimate a more general specification for the conditional variance by running a "second step" model for the squared regression residuals on some specification of the covariates. For example, assuming that $E[\varepsilon_g^2 \mid X] = X\delta_g$, we can estimate $\delta_g$ by running a regression of the squared residuals $\hat\varepsilon_{gi}^2$ on $X_i$.41 We can then write the two aggregate components of the variance decomposition as:


$$\Delta_S^{\sigma^2} = E_B[X]\left(\delta_B - \delta_A\right) + \left\{\beta_B'\,\Omega_B\,\beta_B - \beta_A'\,\Omega_B\,\beta_A\right\} \qquad (25)$$


and


$$\Delta_X^{\sigma^2} = \left(E_B[X] - E_A[X]\right)\delta_A + \left\{\beta_A'\,\Omega_B\,\beta_A - \beta_A'\,\Omega_A\,\beta_A\right\} \qquad (26)$$
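To make the steps concrete, the following is a minimal Python sketch of the two components in Eqs. (25) and (26) as written above, assuming a linear model for both the conditional mean and the conditional variance of the residuals; the data arrays (X_A, y_A, X_B, y_B, with X including a constant column) are illustrative rather than taken from the chapter.

```python
# Sketch of the variance decomposition in Eqs. (25)-(26), assuming E[Y|X] = X*beta
# and E[eps^2|X] = X*delta within each group. Variable names are illustrative.
import numpy as np

def fit_ols(X, y):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def variance_decomposition(X_A, y_A, X_B, y_B):
    beta_A, e_A = fit_ols(X_A, y_A)
    beta_B, e_B = fit_ols(X_B, y_B)
    # second-step regressions of squared residuals on X (conditional variance model)
    delta_A, _ = fit_ols(X_A, e_A ** 2)
    delta_B, _ = fit_ols(X_B, e_B ** 2)
    xbar_A, xbar_B = X_A.mean(axis=0), X_B.mean(axis=0)
    omega_A = np.cov(X_A, rowvar=False)   # covariance matrices of the covariates
    omega_B = np.cov(X_B, rowvar=False)
    # wage structure effect: differences in beta and delta, holding X at group B
    d_S = xbar_B @ (delta_B - delta_A) + (beta_B @ omega_B @ beta_B - beta_A @ omega_B @ beta_A)
    # composition effect: differences in the distribution of X, at group A's beta and delta
    d_X = (xbar_B - xbar_A) @ delta_A + (beta_A @ omega_B @ beta_A - beta_A @ omega_A @ beta_A)
    return d_S, d_X
```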


Compared to the standard OB decomposition for the mean, which only requires estimating a (regression) model for the conditional mean, in the case of the variance, we also need to estimate a model for the conditional variance. While this is quite feasible in practice, we can already see a number of challenges involved when decomposing distributional parameters beyond the mean:

The estimation is more involved since we need to estimate models for two, instead of just one, conditional moment. Furthermore, little guidance is typically available on “reasonable” specifications for the conditional variance. For instance, in the case of wages, the Mincer equation provides a reasonably accurate and widely accepted specification for the conditional mean, while no such standard model is available for the conditional variance.
Computing the detailed decomposition is more complicated since the between-group component is a quadratic form in the $X$'s. This yields a number of interaction terms that are difficult to interpret.

Since the complexity of decomposition methods already increases for a distributional measure as simple and convenient as the variance, this suggests these problems will be compounded in the case of other distributional measures such as quantiles. Indeed, we show in the next subsection that for quantiles, attempts at generalizing the approach suggested here require estimating the entire conditional distribution of $Y$ given $X$. This is a more daunting estimation challenge, and we now discuss solutions that have been suggested in the literature.

4.2 Going beyond the variance: general framework

An important limitation of summary measures of dispersion such as the variance, the Gini coefficient or the Theil coefficient is that they provide little information regarding what happens where in the distribution. This is an important shortcoming in the literature on changes in wage inequality where many important explanations of the observed changes have specific implications for specific points of the distribution. For instance, the minimum wage explanation suggested by DiNardo et al. (1996) should only affect the bottom end of the distribution. At the other extreme, explanations based on how top executives are compensated should only affect the top of the distribution. Other explanations based on de-unionization (Freeman, 1993; Card, 1992; DiNardo et al., 1996) and the computerization of “routine” jobs (Autor et al., 2003) tend to affect the middle (or “lower middle”) of the distribution. As a result, it is imperative to go beyond summary measures such as the variance to better understand the sources of growing wage inequality.

Going beyond summary measures is also important in many other interesting economic problems, such as the sources of the gender wage gap and the impact of social programs on labor supply.42 The most common approach for achieving this goal is to perform a decomposition for various quantiles (or differences between quantiles like the 90-10 gap) of the distribution. Unfortunately, as we point out in the introduction, it is much more difficult to decompose quantiles than the mean or even the variance. The basic problem is that the law of iterated expectations does not hold in the case of quantiles, i.e. $q_\tau \neq E_X\!\left[Q_\tau(Y \mid X)\right]$, where $q_\tau$ is the $\tau$th quantile of the (unconditional) distribution of $Y$, and $Q_\tau(Y \mid X)$ is the corresponding conditional quantile.

As it turns out, one (implicitly) needs to know the entire conditional distribution of $Y$ given $X$ to compute $q_\tau$. To see this, note that


$$\tau = F_{Y_g}(q_{g,\tau}) = \int F_{Y_g}(q_{g,\tau} \mid X)\, dF_{X_g}(X),$$


where $F_{Y_g}(\cdot \mid X)$ is the cumulative distribution of $Y$ conditional on $X$ in group $g$. Given $\tau$, it is possible to implicitly use this equation to solve for $q_{g,\tau}$. It is also clear that in order to do so we need to know the conditional distribution function $F_{Y_g}(\cdot \mid X)$, as opposed to just the conditional mean and variance, as was the case for the variance. Estimating an entire conditional distribution function for each value of $X$ is a difficult problem. Various decomposition methods that we discuss in detail below suggest different ways of handling this challenge.

But before covering them in detail, we recall the basic principles underlying these methods. As in Section 2, we focus on cumulative distributions since any standard distributional statistic, such as a quantile, can be directly computed from the cumulative distribution. For instance, quantiles of the counterfactual distribution can be obtained by inverting $F_C$: $q_{C,\tau} = F_C^{-1}(\tau)$.

For the sake of presentational simplicity, we introduce a simplified notation relative to Section 2. We use $F_g$ to represent the marginal distribution of $Y_g$, $F_g(\cdot \mid X)$ to represent the conditional distribution of $Y_g$ given $X$, and $F_{X_g}$ to represent the distribution of the covariates, for $g = A, B$, as introduced in Eq. (4). We use the shorthand $F_C$ to represent the key counterfactual distribution of interest introduced in Eq. (5), which mixes the distribution of characteristics of group B with the wage structure from group A:


$$F_C(y) = \int F_A(y \mid X)\, dF_{X_B}(X) \qquad (27)$$


Three general approaches have been suggested in the decomposition literature for estimating the counterfactual distribution $F_C$. A first general approach, initially suggested by Juhn et al. (1993), replaces each value of $Y_{Bi}$ for group $B$ with a counterfactual value $Y_{Bi}^C = h(Y_{Bi})$, where $h(\cdot)$ is an imputation function. The idea is to replace $Y_{Bi}$ from group $B$ with a counterfactual value $Y_{Bi}^C$ that holds the same rank in the conditional distribution $F_A(\cdot \mid X)$ as it did in the original distribution $F_B(\cdot \mid X)$. As we discussed in Section 2.2.3, this is done in practice using a residual imputation procedure. Machado and Mata (2005) and Autor et al. (2005) have later suggested other approaches, based on conditional quantile regressions, to transform a wage observation $Y_{Bi}$ into a counterfactual observation $Y_{Bi}^C$.

A second approach proposed by DiNardo et al. (1996) [DFL] is based on the following manipulation of Eq. (27):


$$F_C(y) = \int F_A(y \mid X)\, \Psi(X)\, dF_{X_A}(X), \qquad (28)$$


where $\Psi(X) \equiv dF_{X_B}(X)/dF_{X_A}(X)$ is a reweighting factor. This makes it clear that the counterfactual distribution $F_C$ is simply a reweighted version of the distribution $F_A$. The reweighting factor is a simple function of $X$ that can be easily estimated using standard methods such as a logit or probit. The basic idea of the DFL approach is to start with group $A$, and then replace the distribution of $X$ of group $A$ ($F_{X_A}$) with the distribution of $X$ of group $B$ ($F_{X_B}$) using the reweighting factor $\Psi(X)$.

The third set of approaches also works with Eq. (27), starting with group $B$ and then replacing the conditional distribution $F_B(y \mid X)$ with $F_A(y \mid X)$. Doing so is more involved, from an estimation point of view, than following the DFL approach. The problem is that the conditional distributions depend on both $X$ and $y$, while the reweighting factor $\Psi(X)$ only depends on $X$.

Under this third set of approaches, one needs to directly estimate the conditional distribution $F_A(y \mid X)$. Parametric approaches for doing so were suggested by Donald et al. (2000), who used a hazard model approach, and Fortin and Lemieux (1998), who suggested estimating an ordered probit. More recently, Chernozhukov et al. (2009) suggest estimating distribution regressions (e.g. a logit for each value of $y$). In all cases, the idea is to replace the conditional distribution for group $B$, $F_B(y \mid X)$, with an estimate of the conditional distribution $F_A(y \mid X)$ obtained using one of these methods.

In the next subsections, we discuss how these various approaches can be implemented. We also present some results regarding their statistical properties, and address computational issues linked to their implementation.

4.3 Residual imputation approach: JMP

Procedure

As we explain above, Juhn et al. (1993) propose an imputation approach where the wage $Y_{Bi}$ from group $B$ is replaced by a counterfactual wage $Y_{Bi}^C$ in which both the returns to observables and the returns to unobservables are set to be as in group $A$. The implementation of this procedure is divided into two steps. First, unobservables are replaced by counterfactual unobservables, as in Eq. (9). Second, counterfactual returns to observables are also imputed, as in Eq. (12).43

Under the assumption of additive linearity (Assumption 10), the original wage equation for individual $i$ from group $B$,


$$Y_{Bi} = X_{Bi}\beta_B + \varepsilon_{Bi}, \qquad \varepsilon_{Bi} = F_{\varepsilon_B}^{-1}(\tau_{Bi} \mid X_{Bi}),$$


allows the returns to unobservables, embodied in the conditional residual distribution $F_{\varepsilon_B}(\cdot \mid X)$, to be group-specific. Under the assumption of rank preservation (Assumption 14), the first counterfactual is computed as


$$Y_{Bi}^{C,1} = X_{Bi}\beta_B + \varepsilon_{Bi}^{A}, \qquad (29)$$


where


$$\varepsilon_{Bi}^{A} = F_{\varepsilon_A}^{-1}(\tau_{Bi} \mid X_{Bi}),$$


and $\tau_{Bi} = F_{\varepsilon_B}(\varepsilon_{Bi} \mid X_{Bi})$ is the conditional rank of $\varepsilon_{Bi}$ in the distribution of residuals for group $B$. A second counterfactual is then obtained by also replacing the returns to observable characteristics $\beta_B$ with $\beta_A$:


$$Y_{Bi}^{C,2} = X_{Bi}\beta_A + \varepsilon_{Bi}^{A}.$$


Under the assumptions of linearity and rank preservation, this counterfactual wage should be the same as $Y_{Bi}^C$, the counterfactual wage obtained by replacing the wage structure $m_B(\cdot)$ with $m_A(\cdot)$.

In practice, it is straightforward to estimate $\beta_A$ and $\beta_B$ using OLS under the assumptions of linearity and zero conditional mean. It is much less clear, however, how to perform the residual imputation procedure described above. Under the strong assumption that the regression residuals $\varepsilon_g$ are independent of $X$, it follows that


$$\varepsilon_{Bi}^{A} = F_{\varepsilon_A}^{-1}\!\left(F_{\varepsilon_B}(\varepsilon_{Bi})\right).$$


Under this independence assumption, one simply needs to compute the rank of the residual $\varepsilon_{Bi}$ in the marginal distribution (distribution over the whole sample) of residuals for group $B$, and then pick the corresponding residual in the marginal distribution of residuals for group $A$. If $\varepsilon_{Bi}$ is at the 70th percentile of the distribution of residuals of group $B$ ($F_{\varepsilon_B}(\varepsilon_{Bi}) = 0.70$), then $\varepsilon_{Bi}^{A}$ will simply be the 70th percentile of the distribution of residuals for group $A$. In practice, most applications of the JMP procedure use this strong assumption of independence because there is little guidance on how a conditional imputation procedure could be used instead.
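A minimal sketch of this unconditional rank-and-impute step, assuming (as above) that residuals are independent of $X$, might look as follows; the data arrays and function names are illustrative, not the authors' code.

```python
# Sketch of the JMP residual-imputation step under the independence assumption:
# each group-B residual is replaced by the group-A residual at the same rank.
import numpy as np

def jmp_counterfactuals(X_A, y_A, X_B, y_B):
    beta_A, *_ = np.linalg.lstsq(X_A, y_A, rcond=None)
    beta_B, *_ = np.linalg.lstsq(X_B, y_B, rcond=None)
    e_A = y_A - X_A @ beta_A
    e_B = y_B - X_B @ beta_B
    # rank of each group-B residual in its own marginal distribution
    ranks = (np.argsort(np.argsort(e_B)) + 0.5) / len(e_B)
    # impute the group-A residual with the same rank (quantile function of e_A)
    e_B_imputed = np.quantile(e_A, ranks)
    y_c1 = X_B @ beta_B + e_B_imputed   # returns to unobservables as in A (Eq. (29))
    y_c2 = X_B @ beta_A + e_B_imputed   # returns to observables also as in A
    return y_c1, y_c2
```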

Limitations

Since independence of regression residuals is unrealistic, a more accurate implementation of JMP would require deciding how to condition on $X$ when performing the imputation procedure. If $X$ consists of a limited number of groups or "cells", then one could perform the imputation within each of these groups. In general, however, it is difficult to know how to implement this ranking/imputation procedure in more general cases. As a result, other procedures such as the quantile method of Machado and Mata (2005) are increasingly being used as an alternative to JMP.

Another limitation of the JMP procedure is that there is no natural way of extending it to the case of the detailed decomposition for the composition effect.

Advantages

One advantage of the two-step procedure is that it provides a way of separating the between- and within-group components, as in a variance decomposition. This plays an important role in the inequality literature, since JMP concluded that most of the inequality growth from the 1960s to the 1980s was linked to the residual inequality component.

It is not clear, however, what is meant by between- and within-group components in the case of distributional measures like the 90-10 gap that are not decomposable. A better way of justifying JMP is that $Y_{gi} = X_{gi}\beta_g + \varepsilon_{gi}$ represents a structural model where $X$ are observed skills, while $\varepsilon$ represents unobserved skills. One can then perform simulation exercises asking what happens to the distribution when one replaces either the returns to observed skills or the returns to unobserved skills (see also Section 2.2.3).

This economic interpretation also requires, however, some fairly strong assumptions. The two most important assumptions are the linearity of the model (Assumption 10) and rank preservation (Assumption 14). While linearity can be viewed as a useful approximation, rank preservation is much stronger since it means that someone with the same unobserved skills would be at the exact same position, conditional on $X$, in either group $A$ or $B$. Just adding measurement error to the model would result in a violation of rank preservation.

Finally, if one is willing to interpret a simple regression as a decomposition between observed and unobserved skills, this can be combined with methods other than JMP. For instance, DFL perform regression adjustments to illustrate the effects of supply and demand factors on wages.44

4.4 Methods based on conditional quantiles

Procedure

Like JMP, Machado and Mata (2005, MM hereinafter) propose a procedure based on transforming a wage observation $Y_{Bi}$ into a counterfactual observation $Y_{Bi}^C$. The main advantage relative to JMP is that their estimation procedure, based on quantile regressions (Koenker and Bassett, 1978), provides an explicit way of estimating the (inverse) conditional distribution function used in the imputation function $h(\cdot)$. One important difference, however, is that instead of transforming each actual observation of $Y_{Bi}$ into a counterfactual $Y_{Bi}^C$, MM use a simulation approach where quantiles are drawn at random.

More specifically, since 


$$Y_{gi} = Q_g(\tau_{gi} \mid X_{gi}) \equiv F_{Y_g}^{-1}(\tau_{gi} \mid X_{gi})$$


and $\tau_{gi}$ follows a uniform distribution, one can think of doing the following:

1. Draw a simulated value $\tau_s$ from a uniform distribution $U(0,1)$.
2. Estimate linear quantile regressions for the $\tau_s$th quantile in each group, and use the estimated results to predict simulated values of both $Y_B$ and $Y_B^C$.45 The reason for using quantile regressions is that:

$$F_{Y_A}^{-1}(\tau_s \mid X) = Q_A(\tau_s \mid X) \qquad \text{and} \qquad F_{Y_B}^{-1}(\tau_s \mid X) = Q_B(\tau_s \mid X),$$

where $Q_A(\tau_s \mid X)$ and $Q_B(\tau_s \mid X)$ are the conditional quantile functions for the $\tau_s$th quantile in group $A$ and $B$, respectively.

3. Compare the simulated distributions of $Y_B$ and $Y_B^C$ to obtain measures of the wage structure effect. The composition effect is computed as the complement to the overall difference.

A key implementation question is how to specify the functional forms of the conditional quantile functions. MM suggest a specification that is linear in the $X$'s and can be estimated using quantile regression methods. The conditional quantile regression models can be written as:


$$Q_g(\tau \mid X) = X\beta_g(\tau), \qquad g = A, B.$$
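A rough Python sketch of the simulation steps listed above, using statsmodels' quantile regression for the linear conditional quantile model, is given below; the data frames, formula, and number of draws are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of the MM simulation procedure (steps 1-3 above) with statsmodels' quantreg.
import numpy as np
import statsmodels.formula.api as smf

def mm_simulate(df_A, df_B, formula="lwage ~ educ + exper", n_draws=100, seed=123):
    rng = np.random.default_rng(seed)
    y_B_sim, y_C_sim = [], []
    for tau in rng.uniform(0.01, 0.99, n_draws):         # step 1: draw quantiles
        fit_A = smf.quantreg(formula, df_A).fit(q=tau)    # step 2: quantile regressions
        fit_B = smf.quantreg(formula, df_B).fit(q=tau)
        x_b = df_B.sample(1, random_state=int(rng.integers(0, 10**9)))
        y_B_sim.append(fit_B.predict(x_b).iloc[0])        # simulated group-B wage
        y_C_sim.append(fit_A.predict(x_b).iloc[0])        # counterfactual: B's X, A's betas
    # step 3: wage structure effect at, e.g., the median of the simulated distributions
    return np.median(y_B_sim) - np.median(y_C_sim)
```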


Table 4 reports in the top panel the results of the Machado-Mata procedure applied to our gender gap example, using the male wage structure as reference.46 It shows that the decomposition of the median gender log wage gap, reported in the central column, gives almost the same results for the aggregate decomposition as the OB decomposition of the mean gender log wage gap displayed in column (1) of Table 3. Going across the columns to compare quantile effects shows that gender differences in characteristics are much more important at the bottom (10th centile) than at the top (90th centile) of the wage distribution. Indeed, some significant wage structure effects emerge at the 90th centile.

Table 4 Gender wage gap: quantile decomposition results (NLSY, 2000).

image

Limitations

This decomposition method is computationally demanding and becomes quite cumbersome for data sets with more than a few thousand observations. Bootstrapping quantile regressions for a sizeable number of quantiles $\tau$ (100 would be a minimum) is computationally tedious with large data sets. The implementation of the procedure can be simplified by estimating a large number of quantile regressions (say 99, one for each percentile from 1 to 99) instead of drawing values of $\tau$ at random.47

Another limitation is that the linear specification is restrictive, and finding the correct functional form for the conditional quantile regressions can be tedious. For instance, if there is a spike at the minimum wage in the wage distribution, this will result in flat spots in the quantile regressions that would have to be captured with spline functions with knots that depend on $X$. Accurately describing a simple distribution with mass points (as is commonly observed in wage data) can, therefore, be quite difficult to do using quantile regressions.

As pointed out by Chernozhukov et al. (2009), it is not very natural to estimate inverse conditional distribution functions (quantile regressions) when the main goal of the counterfactual exercise is to replace the conditional distribution function $F_B(y \mid X)$ with $F_A(y \mid X)$ to obtain Eq. (27). Chernozhukov et al. (2009) suggest instead estimating distribution regression models for $F_g(y \mid X)$, which is a more direct way of approaching the problem.

Advantages

One advantage of the MM approach is that it provides a natural way of performing a detailed decomposition for the wage structure component. The idea is to successively replace the elements of $\beta_B(\tau)$ with those of $\beta_A(\tau)$ when performing the simulations, keeping in mind that this type of detailed decomposition is path dependent. Unfortunately, the MM approach does not provide a way of performing the detailed decomposition for the composition effect.48 This is a major drawback since the detailed decomposition of the composition effect is always clearly interpretable, while the detailed decomposition of the wage structure effect depends arbitrarily on the choice of the omitted group.

4.5 Reweighting methods

Procedure

As we mention in Section 4.2, another way of estimating the counterfactual distribution $F_C$ is to replace the marginal distribution of $X$ for group $A$ with the marginal distribution of $X$ for group $B$ using a reweighting factor $\Psi(X)$. This idea was first introduced in the decomposition literature by DiNardo, Fortin and Lemieux [DFL] (1996). While DFL focus on the estimation of counterfactual densities in their empirical application, the method is easily applicable to any distributional statistic.

In practice, the DFL reweighting method is similar to the propensity score reweighting method commonly used in the program evaluation literature (see Hirano et al. (2003)). For instance, in DFL's application to changes in wage inequality in the United States, time is viewed as a state variable or, in the context of the treatment effects literature, as a treatment.49 The impact of a particular factor or set of factors on changes in the wage distribution over time is constructed by considering the counterfactual state of the world where the distribution of this factor remained fixed over time, maintaining Assumption 6 of invariance of the conditional distribution. Note that, in contrast with the notation of this chapter, DFL use time period 1 as the reference group.50 The choice of period 0 or period 1 as the reference group is analogous to the choice of whether the female or the male wage structure should be the reference wage structure in the analysis of the gender wage gap, and is expected to yield different results in most cases.

In DFL, manipulations of the wage distributions, computed through reweighting, are applied to non-parametric estimates of the wage density, which can be particularly useful when local distortions, from minimum wage effects for example, are at play. To be consistent with the rest of this section, however, we focus our discussion on the cumulative distribution instead of the density. The key counterfactual distribution of interest, shown in Eq. (27) (the distribution of wages that would prevail for workers in group $A$ if they had the distribution of characteristics of group $B$), is constructed, as shown in Eq. (28), using the reweighting factor


$$\Psi(X) = \frac{dF_{X_B}(X)}{dF_{X_A}(X)}.$$


Although the reweighting factor is the ratio of two multivariate marginal distributions (of the covariates $X$), this expression can be simplified using Bayes' rule. Remembering that Bayes' rule states that


$$\Pr(X = x \mid D_B = 1) = \frac{\Pr(D_B = 1 \mid X = x)\cdot \Pr(X = x)}{\Pr(D_B = 1)},$$


we have


$$dF_{X_B}(x) = \frac{\Pr(D_B = 1 \mid X = x)}{\Pr(D_B = 1)}\, dF_X(x),$$


and a similar expression for $dF_{X_A}(x)$. Since $\Pr(D_A = 1 \mid X = x) = 1 - \Pr(D_B = 1 \mid X = x)$ and $\Pr(D_A = 1) = 1 - \Pr(D_B = 1)$, the reweighting factor becomes


$$\Psi(x) = \frac{\Pr(D_B = 1 \mid X = x)\,/\,\Pr(D_B = 1)}{\left(1 - \Pr(D_B = 1 \mid X = x)\right)\,/\,\left(1 - \Pr(D_B = 1)\right)}.$$


The reweighting factor can be easily computed by estimating a probability model for $\Pr(D_B = 1 \mid X)$, and using the predicted probabilities to compute a value of $\hat\Psi(X_i)$ for each observation. DFL suggest estimating a flexible logit or probit model, while Hirano, Imbens, and Ridder propose to use a non-parametric logit model.51

The reweighting decomposition procedure can be implemented in practice as follows:

1. Pool the data for groups $A$ and $B$ and run a logit or probit model for the probability of belonging to group $B$:

$$\Pr(D_B = 1 \mid X) = \Lambda\!\left(h(X)\gamma\right), \qquad (30)$$

where $\Lambda(\cdot)$ is either a normal or logistic link (CDF) function, and $h(X)$ is a polynomial in $X$.

2. Estimate the reweighting factor $\hat\Psi(X)$ for observations in group $A$ using the predicted probability of belonging to group $B$ ($\widehat{\Pr}(D_B = 1 \mid X)$) and to group $A$ ($\widehat{\Pr}(D_A = 1 \mid X)$), and the sample proportions in group $B$ ($\hat p_B$) and in group $A$ ($\hat p_A$):

$$\hat\Psi(X) = \frac{\widehat{\Pr}(D_B = 1 \mid X)\,/\,\hat p_B}{\widehat{\Pr}(D_A = 1 \mid X)\,/\,\hat p_A}.$$

3. Compute the counterfactual statistic of interest using observations from the group A sample reweighted using $\hat\Psi(X)$.
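The following Python sketch implements steps 1-3 in their simplest form (a logit with the covariates entered directly, no polynomial terms); the array names are illustrative and the covariate matrices are assumed to include a constant.

```python
# Sketch of DFL reweighting: logit for group membership, reweighting factor for
# group-A observations, and weighted counterfactual statistics.
import numpy as np
import statsmodels.api as sm

def dfl_reweight(X_A, X_B):
    X = np.vstack([X_A, X_B])
    d_B = np.concatenate([np.zeros(len(X_A)), np.ones(len(X_B))])  # 1 = group B
    p_hat = sm.Logit(d_B, X).fit(disp=0).predict(X_A)   # Pr(D_B = 1 | X) for group-A obs
    p_B = d_B.mean()                                     # sample proportion in group B
    return (p_hat / p_B) / ((1 - p_hat) / (1 - p_B))     # reweighting factor psi

def weighted_quantile(y, w, tau):
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / np.sum(w)
    return y[order][np.searchsorted(cum, tau)]

# Example usage (illustrative):
# psi = dfl_reweight(X_A, X_B)
# gap_C = weighted_quantile(y_A, psi, 0.9) - weighted_quantile(y_A, psi, 0.1)
# mu_C = np.average(y_A, weights=psi)
# var_C = np.average((y_A - mu_C) ** 2, weights=psi)
```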

In DFL, the main object of interest is the probability density function, which is estimated using kernel density methods. The density for group $A$ and the counterfactual density can be estimated as follows using kernel density methods, where $K(\cdot)$ is the kernel function, $h$ is the bandwidth, and $\omega_i$ are the sample weights:52

$$\hat f_A(y) = \sum_{i \in A} \frac{\omega_i}{h}\, K\!\left(\frac{y - Y_i}{h}\right),$$

$$\hat f_C(y) = \sum_{i \in A} \frac{\omega_i\, \hat\Psi(X_i)}{h}\, K\!\left(\frac{y - Y_i}{h}\right).$$

Consider the density function for group $A$, $f_A(y)$, and the counterfactual density, $f_C(y)$. The composition effect in a decomposition of densities is:


$$\Delta_X^{f}(y) = f_C(y) - f_A(y). \qquad (31)$$


Various statistics from the wage distribution, such as the 10th, 50th, and 90th percentiles, or the variance, Gini, or Theil coefficients, can be computed either from the counterfactual density or from the counterfactual distribution using the reweighting factor. The latter procedure is easier to use as it simply involves computing (weighted) statistics using standard computer packages. For example, the counterfactual variance can be computed as:


$$\hat\sigma_C^2 = \frac{\sum_{i \in A} \omega_i\, \hat\Psi(X_i)\left(Y_i - \hat\mu_C\right)^2}{\sum_{i \in A} \omega_i\, \hat\Psi(X_i)},$$


where the counterfactual mean $\hat\mu_C$ is:


$$\hat\mu_C = \frac{\sum_{i \in A} \omega_i\, \hat\Psi(X_i)\, Y_i}{\sum_{i \in A} \omega_i\, \hat\Psi(X_i)}.$$


For the 90-10, 90-50, and 50-10 wage differentials, the sought-after contributions to changes in inequality are computed as differences in the corresponding composition effects, for example,


$$\Delta_X^{90\text{-}10} = \Delta_X^{q_{90}} - \Delta_X^{q_{10}} = \left(\hat q_{C,90} - \hat q_{A,90}\right) - \left(\hat q_{C,10} - \hat q_{A,10}\right). \qquad (32)$$


Table 5 presents, in Panel A, the results of a DFL decomposition of changes over time in male wage inequality using large samples from combined MORG-CPS data, as in Firpo et al. (2007). In this decomposition, the counterfactual distribution of wages in 1983/85 is constructed by reweighting the characteristics of workers in 1983/85 (time period 0) so that they look like those of 2003/05 (time period 1) workers, holding the conditional distribution of wages in 1983/85 fixed.53 The results of the aggregate decomposition, reported in the first three rows of Table 5, show that composition effects play a large role in changes in overall wage inequality, as measured by the 90-10 log wage differential or the variance of log wages. But wage structure effects are more important when looking at increases at the top of the wage distribution, as measured by the 90-50 log wage differential, or decreases at the bottom, as measured by the 50-10 log wage differential.

Table 5 Male wage inequality: aggregate decomposition results (CPS, 1983/85-2003/05)

image

Advantages

The main advantage of the reweighting approach is its simplicity. The aggregate decomposition for any distributional statistic is easily computed by running a single probability model (logit or probit) and using standard packages to compute distributional statistics with $\hat\Psi(X)$ as weights.54

Another more methodological advantage is that formal results from Hirano et al. (2003) and Firpo (2007, 2010) establish the efficiency of this estimation method. Note that although it is possible to compute analytically the standard errors of the different elements of the decomposition obtained by reweighting, it is simpler in most cases to conduct inference by bootstrapping.55

For these two reasons, we recommend the reweighting approach as the method of choice for computing the aggregate decomposition. This recommendation even applies in the simple case of the mean decomposition. As pointed out by Barsky et al. (2002), a standard OB decomposition based on a linear regression model will yield biased estimates of the decomposition terms when the underlying conditional expectation of $Y$ given $X$ is non-linear (see Section 3.4). They suggest using a reweighting approach as an alternative, and the results of Hirano et al. (2003) can be used to show that the resulting decomposition is efficient.

Limitations

A first limitation of the reweighting method is that it does not extend straightforwardly to the case of the detailed decomposition. One exception is the case of binary covariates, where it is relatively easy to compute the corresponding element of the decomposition. For instance, in the case of union status (a binary covariate), DFL show how to compute the component of the composition effect corresponding to this particular covariate. It is also relatively easy to compute the corresponding element of the wage structure effect. We discuss in Section 5 other options that can be used in the case of non-binary covariates.

As in the program evaluation literature, reweighting can have some undesirable properties in small samples when there is a problem of common support. The problem is that the estimated value of $\hat\Psi(X)$ becomes very large when $\Pr(D_B = 1 \mid X)$ gets close to 1. While lack of common support is a problem for any decomposition procedure, Frolich (2004) finds that reweighting estimators perform particularly poorly in this context, though Busso et al. (2009) reach the opposite conclusion using a different simulation experiment.56

Finally, even in cases where a pure reweighting approach has some limitations, there may be gains in combining reweighting with other approaches. For instance, we discuss in the next section how reweighting can be used to improve a decomposition based on the RIF-regression approach of Firpo et al. (2009). Lemieux (2002) also discusses how a hybrid approach based on DFL reweighting and the JMP decomposition procedure can be used to compute both the between- and within-group components of the composition and wage structure effects.

4.6 Methods based on estimating the conditional distribution

Procedure(s)

As mentioned above, when we first introduced the key counterfactual distribution of interest in Eq. (5), an alternative approach to the construction of this counterfactual is based on the estimation of the conditional distribution of the outcome variable, $F_A(y \mid X)$. The counterfactual distribution is then estimated by integrating this conditional distribution over the distribution of $X$ in group $B$.

Two early parametric methods based on this idea were suggested by Donald et al. (2000), and Fortin and Lemieux (1998).57 Donald, Green and Paarsch propose estimating the conditional distribution using a hazard model. The (conditional) hazard function is defined as


$$\lambda_g(y \mid X) = \frac{f_g(y \mid X)}{S_g(y \mid X)},$$


where $S_g(y \mid X) = 1 - F_g(y \mid X)$ is the survivor function. Therefore, the conditional distribution of the outcome variable, $F_g(y \mid X)$, or its density, $f_g(y \mid X)$, is easily recovered from the estimates of the hazard model. For instance, in the standard proportional hazard model58


$$\lambda_g(y \mid X) = \lambda_{0g}(y)\cdot \exp(X\beta_g),$$


estimates of $\beta_g$ and of the baseline hazard $\lambda_{0g}(y)$ can be used to recover the conditional distribution


$$F_g(y \mid X) = 1 - \exp\!\left(-\Lambda_{0g}(y)\exp(X\beta_g)\right),$$


where $\Lambda_{0g}(y) = \int_0^{y} \lambda_{0g}(s)\, ds$ is the integrated baseline hazard.
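As a rough illustration of the logic (not of Donald, Green, and Paarsch's own estimator), one can fit a Cox proportional hazards model to the group-A data and average the implied conditional CDF over group-B covariates; the sketch below uses the lifelines package as a stand-in, and the data frame and column names are assumptions for illustration.

```python
# Sketch: recover F(y|X) from a proportional hazard fit and average it over
# group-B covariates to form the counterfactual CDF.
from lifelines import CoxPHFitter

def counterfactual_cdf_hazard(df_A, df_B, covars, outcome="wage"):
    cph = CoxPHFitter()
    # every wage observation is treated as an "event" (no censoring here)
    cph.fit(df_A[[outcome] + covars], duration_col=outcome)
    surv = cph.predict_survival_function(df_B[covars])  # S(y|X), one column per B observation
    return 1.0 - surv.mean(axis=1)                       # F_C(y), indexed by observed wage values
```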

Fortin and Lemieux (1998) suggest estimating an ordered probit model instead of a hazard model. They consider the following model for the outcome variable $Y_{gi}$:


$$Y_{gi} = m_g\!\left(Y_{gi}^{*}\right),$$


where $m_g(\cdot)$ is a monotonically increasing transformation function. The latent variable $Y_{gi}^{*}$, interpreted as a latent "skill index" by Fortin and Lemieux, is defined as


$$Y_{gi}^{*} = X_{gi}\beta_g + \varepsilon_{gi},$$


where $\varepsilon_{gi}$ is assumed to follow a standard normal distribution. It follows that the conditional distribution of $Y_{gi}$ is given by


$$F_g(y \mid X) = \Pr\!\left(Y_{gi} \leq y \mid X\right) = \Phi\!\left(m_g^{-1}(y) - X\beta_g\right).$$


Fortin and Lemieux implement this in practice by discretizing the outcome variable into a large number of small bins. Each bin $j$ corresponds to values of $Y$ between the two thresholds $c_{j-1}$ and $c_j$. The conditional probability of $Y_{gi}$ being in bin $j$ is


$$\Pr\!\left(c_{j-1} < Y_{gi} \leq c_j \mid X\right) = \Phi\!\left(m_g^{-1}(c_j) - X_{gi}\beta_g\right) - \Phi\!\left(m_g^{-1}(c_{j-1}) - X_{gi}\beta_g\right).$$


This corresponds to an ordered probit model where the $m_g^{-1}(c_j)$ parameters (for $j = 1, \ldots, J$) are the usual latent variable thresholds. The estimated values of $\beta_g$ and of the thresholds can then be used to construct the counterfactual distribution, just as in Donald et al. (2000).

To be more concrete, the following steps could be used to estimate the counterfactual distribution $F_C(y)$ at the point $y = c_j$:

1. Estimate the ordered probit for group $A$. This yields estimates of $\beta_A$ and of the thresholds $m_A^{-1}(c_j)$, the ordered probit parameters.
2. Compute the predicted probability $\widehat{\Pr}(Y \leq c_j \mid X_i) = \Phi\!\left(\hat m_A^{-1}(c_j) - X_i\hat\beta_A\right)$ for each individual $i$ in group $B$.
3. For each threshold $c_j$, compute the sample average of these predicted probabilities over all observations in group $B$:

$$\hat F_C(c_j) = \frac{1}{N_B}\sum_{i \in B} \Phi\!\left(\hat m_A^{-1}(c_j) - X_i\hat\beta_A\right).$$

Repeating this for a large number of values of $c_j$ will provide an estimate of the counterfactual distribution $F_C(y)$.
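A compact sketch of these three steps, using statsmodels' OrderedModel as the ordered probit, is given below; the wage binning and variable names are illustrative assumptions, and the covariate matrices should not include a constant (the estimated thresholds play that role).

```python
# Sketch of the ordered-probit counterfactual CDF (steps 1-3 above).
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

def counterfactual_cdf_oprobit(y_A, X_A, X_B, n_bins=50):
    cuts = np.unique(np.quantile(y_A, np.linspace(0, 1, n_bins + 1)))
    bins = pd.cut(y_A, cuts, labels=False, include_lowest=True)   # bin index j for each wage
    res = OrderedModel(bins, X_A, distr="probit").fit(method="bfgs", disp=0)
    probs = np.asarray(res.predict(X_B))       # bin probabilities for group-B covariates
    F_C = probs.cumsum(axis=1).mean(axis=0)    # step 3: average cumulative probabilities
    return cuts[1:], F_C                       # counterfactual CDF at each threshold c_j
```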

In a similar spirit, Chernozhukov et al. (2009) suggest a more flexible distribution regression approach for estimating the conditional distribution $F_g(y \mid X)$. Following Foresi and Peracchi (1995), the idea is to estimate a separate regression model for each value of $y$. They consider the model $F_g(y \mid X) = \Lambda\!\left(X\beta_g(y)\right)$, where $\Lambda(\cdot)$ is a known link function. For example, if $\Lambda(\cdot)$ is a logistic function, $\beta_g(y)$ can be estimated by creating a dummy variable $\mathbb{1}\{Y_i \leq y\}$ indicating whether the value of $Y_i$ is below $y$, where $\mathbb{1}\{\cdot\}$ is the indicator function, and running a logit regression of $\mathbb{1}\{Y_i \leq y\}$ on $X_i$ to estimate $\beta_g(y)$.

Similarly, if the link function is the identity function ($\Lambda(z) = z$), the probability model is a linear probability model. If the link function is the normal CDF ($\Lambda(z) = \Phi(z)$), the probability model is a probit. Compared to Fortin and Lemieux (1998), Chernozhukov et al. (2009) suggest estimating a separate probit for each value of $y$, while Fortin and Lemieux use a more restrictive model where only the intercept (the threshold in the ordered probit) is allowed to change for different values of $y$.

As above, the counterfactual distribution can be obtained by first estimating the regression model (probit, logit, or LPM) for group $A$ to obtain the parameter estimates $\hat\beta_A(y)$, computing the predicted probabilities $\Lambda\!\left(X_i\hat\beta_A(y)\right)$ for observations in group $B$, and averaging over these predicted probabilities to get the counterfactual distribution $\hat F_C(y)$:


$$\hat F_C(y) = \frac{1}{N_B}\sum_{i \in B} \Lambda\!\left(X_i\hat\beta_A(y)\right).$$


Once the counterfactual distribution $\hat F_C(y)$ has been estimated, counterfactual quantiles can be obtained by inverting the estimated distribution function. Consider $q_{C,\tau}$, the $\tau$th quantile of the counterfactual distribution $F_C$. The estimated counterfactual quantile is:


$$\hat q_{C,\tau} = \hat F_C^{-1}(\tau) = \inf\left\{ y : \hat F_C(y) \geq \tau \right\}.$$


It is useful to illustrate graphically how the estimation of the counterfactual distribution $F_C$ and the inversion into quantiles can be performed in practice. Figure 1 first shows the actual CDFs for groups $A$ and $B$, $F_A(y)$ and $F_B(y)$, respectively. The squares in between the two cumulative distributions illustrate examples of counterfactuals computed using one of the methods discussed above.

image

Figure 1 Relationship between proportions and quantiles.

For example, consider the case of the median wage for group $B$, $q_{B,50}$. Using the distribution regression approach of Chernozhukov et al. (2009), one can estimate, for example, an LPM by running a regression of $\mathbb{1}\{Y_i \leq q_{B,50}\}$ on $X_i$ for group $A$. This yields an estimate of $\beta_A(q_{B,50})$ that can then be used to compute $\hat F_C(q_{B,50})$. This counterfactual proportion is represented by the square on the vertical line over $q_{B,50}$ in Fig. 1.

Figure 2 then illustrates what happens when a similar exercise is performed for a larger number of values of $y$ (100 in this particular figure). It now becomes clear from the figure how to numerically perform the inversion. In the case of the median, the total gap between groups $A$ and $B$ is $q_{B,50} - q_{A,50}$. The counterfactual median can then be estimated by picking the corresponding point $\hat q_{C,50}$ on the counterfactual function defined by the set of points estimated by running a set of LPMs at different values of $y$. In practice, one could compute the precise value of $\hat q_{C,50}$ by estimating the LPMs (or a logit or probit) for a large number of values of $y$, and then "connecting the dots" (i.e. using linear interpolations) between these different values.
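A short Python sketch of this "connect the dots" strategy, using a logit rather than an LPM for the distribution regressions, is shown below; the grid of evaluation points, the array names, and the use of a running maximum to enforce monotonicity are illustrative choices, not prescriptions from the chapter.

```python
# Sketch of the distribution-regression counterfactual and its inversion into quantiles.
import numpy as np
import statsmodels.api as sm

def counterfactual_quantiles(y_A, X_A, X_B, taus=(0.1, 0.5, 0.9), n_grid=100):
    grid = np.quantile(y_A, np.linspace(0.02, 0.98, n_grid))
    F_C = np.array([
        sm.Logit((y_A <= y).astype(float), X_A).fit(disp=0).predict(X_B).mean()
        for y in grid
    ])
    F_C = np.maximum.accumulate(F_C)           # enforce monotonicity before inverting
    # "connect the dots": linear interpolation of the counterfactual quantile function
    return {tau: np.interp(tau, F_C, grid) for tau in taus}
```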

image

Figure 2 Inverting globally.

Figure 2 also illustrates one of the key messages of the chapter listed in the introduction, namely that it is easier to estimate models for proportions than for quantiles. In Fig. 2, the difference in the proportion of observations under a given value of $y$ is simply the vertical distance between the two cumulative distributions, $F_B(y) - F_A(y)$. Decomposing this particular gap in proportions is not a very difficult problem. As discussed in Section 3.5, one can simply run an LPM and perform a standard OB decomposition. An alternative, also discussed in Section 3.5, is to perform a nonlinear decomposition using a logit or probit model. The conditional distribution methods of Fortin and Lemieux (1998) and Chernozhukov et al. (2009) essentially amount to computing this decomposition in the vertical dimension.

By contrast, it is not clear at first glance how to decompose the horizontal distance, or quantile gap, between the two curves. But since the vertical and horizontal distances are just two different ways of describing the same difference between the two cumulative distributions $F_A$ and $F_B$, one can perform a first decomposition either vertically or horizontally, and then invert back to get the decomposition in the other dimension. Since decomposing proportions (the vertical distance) is relatively easy, this suggests first performing the decomposition on proportions at many points of the distribution, and then inverting back to get the decomposition in the quantile dimension (the horizontal distance).

Table 5 reports, in Panels B and C, the results of the aggregate decomposition for male wages using the method of Chernozhukov et al. (2009). The counterfactual wage distribution is constructed by asking what the distribution of wages in 1983/85 would have been if the conditional distribution had been as in 2003/05. Panel B uses the LPM to estimate $F_g(y \mid X)$, while the logit model is used in Panel C.59 The first rows of Panels B and C show the changes in the wage differentials based on the fitted distributions, so that any discrepancies between these rows and the first row of Panel A reflect estimation error. The second rows report the composition effects, computed as the difference between the fitted distribution in 1983/85 and the counterfactual distribution. Given our relatively large sample, the differences across estimators in the different panels are at times statistically significant. However, the results from the logit estimation in Panel C are qualitatively similar to the DFL results shown in Panel A, with composition effects being relatively more important in accounting for overall wage inequality, as measured by the 90-10 log wage differential, and wage structure effects playing a relatively more important role in increasing wage inequality at the top and reducing wage inequality at the bottom.

Limitations

If one is just interested in performing an aggregate decomposition, it is preferable to simply use the reweighting methods discussed above. Like the conditional quantile methods discussed in Section 4.4, conditional distribution methods require some parametric assumptions on the distribution regressions that may or may not be valid. Chernozhukov, Fernandez-Val, and Melly's distribution regression approach is more flexible than earlier suggestions by Donald et al. (2000) and Fortin and Lemieux (1998), but it potentially involves estimating a large number of regressions.

Running unconstrained regressions for a large number of values of $y$ may result, however, in non-monotonicities in the estimated counterfactual distribution $\hat F_C(y)$. Smoothing or related methods then have to be used to make sure that the counterfactual distribution is monotonic and, thus, invertible into quantiles.60 By contrast, reweighting methods require estimating just one flexible logit or probit regression, which is very easy to implement in practice.

Advantages

An important advantage of distribution regression methods over reweighting is that they can be readily generalized to the case of the detailed decomposition, although these decompositions will be path dependent. We show in the next section how Chernozhukov, Fernandez-Val, and Melly's distribution regression approach, and the related RIF-regression method of Firpo et al. (2009), can be used to perform a detailed decomposition very much in the spirit of the traditional OB decomposition for the mean.

4.7 Summary

In this section we discuss most of the existing methods that have been proposed to perform an aggregate decomposition for general distributional statistics. While all these methods could, in principle, yield similar results, we argue that DFL reweighting is the method of choice in this context for two main reasons. First, it is simple to implement as it simply involves estimating a single logit or probit model for computing the reweighting factors. Counterfactual values of any distributional statistic can then be readily computed from the reweighted sample. By contrast, methods that yield counterfactual estimates of quantiles or the whole CDF require estimating a separate model at a large number of points in the distribution.

The second advantage of reweighting is that there are well-established results in the program evaluation literature showing that the method is asymptotically efficient (Hirano et al., 2003; Firpo, 2007).

5 Detailed Decompositions for General Distributional Statistics

In this section, we extend the methods introduced above for the aggregate decomposition to the case of the detailed decomposition. We first show that conditional distribution methods based on distribution regressions can be used to compute both the composition and wage structure subcomponents of the detailed decomposition. We then discuss a related method based on the RIF-regressions introduced in Firpo et al. (2009). The main advantage of this last procedure is that it is regression based and, thus, as easy to use in practice as the traditional OB method.

The other methods proposed in Section 4 are not as easy to extend to the case of the detailed decomposition. We discuss, nonetheless, which elements of the detailed decomposition can be estimated using these various methods, and under which circumstances it is advantageous to use these methods instead of others.

5.1 Methods based on the conditional distribution

Procedure

In the case where the specification used for the distribution regression is the LPM, the aggregate decomposition of Section 4.6 can be generalized to the detailed decomposition as follows. Since the link function for the LPM is the identity function, the counterfactual distribution used earlier becomes:


$$F_C(y) = \int X\beta_A(y)\, dF_{X_B}(X) = E_B[X]\,\beta_A(y).$$


We can also write:


$$F_B(y) - F_A(y) = E_B[X]\left(\beta_B(y) - \beta_A(y)\right) + \left(E_B[X] - E_A[X]\right)\beta_A(y),$$


where the first term is the familiar wage structure effect, while the second term is the composition effect. The above equation can, therefore, be used to compute a detailed decomposition of the difference in the proportion of workers below wage $y$ between groups $A$ and $B$. We obtain the detailed decomposition of quantiles by (i) computing the different counterfactuals for each element of $\beta(y)$ and $X$ sequentially, for a large number of values of $y$, and (ii) inverting to get the corresponding quantiles for each detailed counterfactual. A similar approach could also be used when the link function is a probit or a logit by using the procedure suggested in Section 3.5.
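A minimal sketch of step (i) at a single value of $y$ is given below, with the wage structure and composition contributions computed covariate by covariate from LPM distribution regressions; the variable names are illustrative, and the final conversion to quantiles (by repeating over a grid of $y$ and inverting, or by dividing by a density estimate) is left as indicated in the text.

```python
# Sketch of the detailed decomposition of the gap in proportions at a wage level y,
# based on LPM distribution regressions; one term per covariate, as in the OB case.
import numpy as np

def detailed_proportion_decomposition(y, y_A, X_A, y_B, X_B):
    b_A, *_ = np.linalg.lstsq(X_A, (y_A <= y).astype(float), rcond=None)
    b_B, *_ = np.linalg.lstsq(X_B, (y_B <= y).astype(float), rcond=None)
    xbar_A, xbar_B = X_A.mean(axis=0), X_B.mean(axis=0)
    wage_structure = xbar_B * (b_B - b_A)      # one element per covariate
    composition = (xbar_B - xbar_A) * b_A      # one element per covariate
    total = xbar_B @ b_B - xbar_A @ b_A        # fitted F_B(y) - F_A(y)
    return total, wage_structure, composition
```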

Advantages

The main advantage of this method based on distribution regressions and the global inversion of counterfactual CDF into counterfactual quantiles (as in Fig. 2) is that it yields a detailed decomposition comparable to the OB decomposition of the mean.

Limitations

One limitation of this method is that it involves computing a large number of counterfactual CDFs and quantiles, as the procedure has to be repeated for a sizable number of values of $y$. This can become cumbersome because of the potential non-monotonicity problems discussed earlier. Furthermore, the procedure suffers from the problem of path dependence since the different counterfactual elements of the detailed decomposition have to be computed sequentially. For these reasons, we next turn to a simpler approach based on a local, as opposed to a global, inversion of the CDF.

5.2 RIF-regression methods

Procedure

RIF-regression methods provide a simple way of performing detailed decompositions for any distributional statistic for which an influence function can be computed. Although we focus below on the case of quantiles of the unconditional distribution of the outcome variable, our empirical example includes the case of the variance and the Gini coefficient. The procedure can be readily used to address glass ceiling issues in the context of the gender wage gap, or changes in the interquartile range in the context of changes in wage inequality. It can be used either to perform OB-type detailed decompositions, or a slightly modified "hybrid" version of the decomposition suggested by Firpo et al. (2007) (reweighting combined with RIF regressions, as in Section 3.4 for the mean).

A RIF-regression (Firpo et al., 2009) is similar to a standard regression, except that the dependent variable, $Y$, is replaced by the (recentered) influence function of the statistic of interest. Consider $IF(y; \nu)$, the influence function corresponding to an observed wage $y$ for the distributional statistic of interest, $\nu(F_Y)$. The recentered influence function ($RIF$) is defined as $RIF(y; \nu) = \nu(F_Y) + IF(y; \nu)$, so that it aggregates back to the statistic of interest ($E[RIF(Y; \nu)] = \nu(F_Y)$). In its simplest form, the approach assumes that the conditional expectation of the $RIF(Y; \nu)$ can be modeled as a linear function of the explanatory variables,


$$E\!\left[RIF(Y; \nu) \mid X\right] = X\gamma,$$


where the parameters $\gamma$ can be estimated by OLS.61

In the case of quantiles, the influence function $IF(y; q_\tau)$ is given by $\left(\tau - \mathbb{1}\{y \leq q_\tau\}\right)/f_Y(q_\tau)$, where $\mathbb{1}\{\cdot\}$ is an indicator function, $f_Y(\cdot)$ is the density of the marginal distribution of $Y$, and $q_\tau$ is the population $\tau$-quantile of the unconditional distribution of $Y$. As a result, $RIF(y; q_\tau)$ is equal to $q_\tau + IF(y; q_\tau)$, and can be rewritten as


$$RIF(y; q_\tau) = q_\tau + \frac{\tau - \mathbb{1}\{y \leq q_\tau\}}{f_Y(q_\tau)} = c_{1,\tau}\cdot \mathbb{1}\{y > q_\tau\} + c_{2,\tau}, \qquad (33)$$


where $c_{1,\tau} = 1/f_Y(q_\tau)$ and $c_{2,\tau} = q_\tau - c_{1,\tau}\cdot(1 - \tau)$. Except for the constants $c_{1,\tau}$ and $c_{2,\tau}$, the $RIF$ for a quantile is simply a linear function of the indicator variable $\mathbb{1}\{y \leq q_\tau\}$ for whether the outcome variable is smaller or equal to the quantile $q_\tau$. Using the terminology introduced above, running a linear regression of $\mathbb{1}\{Y \leq q_\tau\}$ on $X$ is a distributional regression estimated at $y = q_\tau$, using the link function of the linear probability model (the identity function).

There is, thus, a close connection between $RIF$ regressions and the distributional regression approach of Chernozhukov et al. (2009). In both cases, regression models are estimated for explaining the determinants of the proportion of workers earning less than a certain wage. As we saw in Fig. 2, in Chernozhukov et al. (2009) estimates of models for proportions are then globally inverted back into the space of quantiles. This provides a way of decomposing quantiles using a series of simple regression models for proportions.

Figure 3 shows that $RIF$-regressions for quantiles are based on a similar idea, except that the inversion is only performed locally. Suppose that after estimating a model for proportions, we compute a counterfactual proportion based on changing either the mean value of a covariate, or the return to the covariate estimated with the LPM regression. Under the assumption that the relationship between counterfactual proportions and counterfactual quantiles is locally linear, one can then go from the counterfactual proportion to the counterfactual quantile (both illustrated in Fig. 3) by moving along a line with a slope given by the slope of the counterfactual distribution function. Since the slope of a cumulative distribution function is just the probability density function, one can easily go from proportions to quantiles by dividing the elements of the decomposition for proportions by the density.

image

Figure 3 RIF regressions: Inverting locally.

While the argument presented in Fig. 3 is a bit heuristic, it provides the basic intuition for how we can get a decomposition model for quantiles by simply dividing a model for proportions by the density. As we see in Eq. (33), in the $RIF$ for quantiles, the indicator variable $\mathbb{1}\{y \leq q_\tau\}$ is indeed divided by the density $f_Y(q_\tau)$ (i.e. multiplied by the constant $c_{1,\tau}$).

Firpo et al. (2009) explain how to first compute the $RIF$, and then run regressions of the $RIF$ on the vector of covariates. In the case of quantiles, the $RIF$ is first estimated by computing the sample quantile $\hat q_\tau$, and estimating the density at that point using kernel methods. An estimate of the $RIF$ of each observation, $\widehat{RIF}(Y_i; q_\tau)$, is then obtained by plugging the estimates $\hat q_\tau$ and $\hat f_Y(\hat q_\tau)$ into Eq. (33).

Letting the coefficients of the unconditional quantile regressions for each group be


$$\hat\gamma_{g,\tau} = \left(\sum_{i \in g} X_i' X_i\right)^{-1} \sum_{i \in g} X_i'\, \widehat{RIF}(Y_{gi}; q_{g,\tau}), \qquad g = A, B, \qquad (34)$$


we can write the equivalent of the OB decomposition for any unconditional quantile as

$$\hat\Delta_O^{\tau} = \overline{X}_B\,\hat\gamma_{B,\tau} - \overline{X}_A\,\hat\gamma_{A,\tau} \qquad (35)$$

$$\hat\Delta_O^{\tau} = \overline{X}_B\left(\hat\gamma_{B,\tau} - \hat\gamma_{A,\tau}\right) + \left(\overline{X}_B - \overline{X}_A\right)\hat\gamma_{A,\tau} = \hat\Delta_S^{\tau} + \hat\Delta_X^{\tau} \qquad (36)$$

The second term in Eq. (36) can be rewritten in terms of the sum of the contribution of each covariate as


$$\hat\Delta_X^{\tau} = \sum_{k=1}^{K}\left(\overline{X}_{Bk} - \overline{X}_{Ak}\right)\hat\gamma_{A,\tau,k}.$$


That is, the detailed elements of the composition effect can be computed in the same way as for the mean. Similarly, the detailed elements of the wage structure effect can be computed but, as in the case of the mean, these will also be subject to the problem of the omitted group.
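A short sketch of this OB-style decomposition of an unconditional quantile using RIF regressions (Eqs. (33)-(36)) follows; the kernel density estimator, the array names, and the assumption that $X$ includes a constant are illustrative choices.

```python
# Sketch of the RIF-regression (unconditional quantile) decomposition.
import numpy as np
from scipy.stats import gaussian_kde

def rif_quantile(y, tau):
    q = np.quantile(y, tau)
    f_q = gaussian_kde(y)(q)[0]                       # density of Y at the sample quantile
    return q + (tau - (y <= q).astype(float)) / f_q   # Eq. (33)

def rif_ob_decomposition(y_A, X_A, y_B, X_B, tau=0.5):
    g_A, *_ = np.linalg.lstsq(X_A, rif_quantile(y_A, tau), rcond=None)
    g_B, *_ = np.linalg.lstsq(X_B, rif_quantile(y_B, tau), rcond=None)
    xbar_A, xbar_B = X_A.mean(axis=0), X_B.mean(axis=0)
    wage_structure = xbar_B * (g_B - g_A)   # detailed wage structure effects
    composition = (xbar_B - xbar_A) * g_A   # detailed composition effects
    return wage_structure, composition
```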

Table 4 presents in its bottom panel such an OB-like gender wage gap decomposition of the 10th, 50th, and 90th percentiles of the unconditional distribution of wages, corresponding to Tables 2 and 3, using the male coefficients as the reference group and without reweighting. As with the MM decomposition presented in the top panel, the composition effects from the decomposition of the median gender pay gap reported in the central column of Table 4 are very close to those of the decomposition of the mean gender pay gap reported in column (1) of Table 3. As before, the wage structure effects in the relatively small NLSY sample are generally not statistically significant, with the exception of the industrial sectors, which are, however, subject to the categorical variables problem. The comparison of the composition effects at the 10th and 90th percentiles shows that the impact of differences in life-time work experience is much larger at the bottom of the distribution than at the top, where it is not statistically significant. Note that the aggregate decomposition results obtained using either the MM method or the RIF regressions do not exhibit statistically significant differences. Table 5 presents in Panel D the results of the aggregate decomposition using RIF-regressions without reweighting. The results are qualitatively similar to those of Panels A and C. Table 6 extends the analysis of the decomposition of male wage inequality presented in Table 5 to the detailed decomposition. For each inequality measure, the detailed decomposition is presented both for the extension of the classic OB decomposition in Eq. (36), and for the reweighted-regression decomposition described in the case of the mean in Section 3.4.62 For the reweighted-regression decomposition, Table 6 reports the detailed elements of the main composition effect

$$\hat\Delta_{X,R}^{\nu} = \left(\overline{X}_C - \overline{X}_A\right)\hat\gamma_{A,\nu}$$

and the detailed elements of the main wage structure effect

$$\hat\Delta_{S,R}^{\nu} = \overline{X}_B\left(\hat\gamma_{B,\nu} - \hat\gamma_{C,\nu}\right),$$

where $\hat\gamma_{C,\nu}$ and $\overline{X}_C$ are obtained from the group A sample reweighted to mimic the group B sample, which means we should have $\overline{X}_C \approx \overline{X}_B$. The total reweighting error corresponds to the difference between the "Total explained" across the classic OB and the reweighted-regression decompositions; for example, for the 90-10 log wage differential, it is equal to the difference between the corresponding "Total explained" entries in Table 6.63 The total specification error corresponds to the difference between the "Total wage structure" across the classic OB and the reweighted-regression decompositions, and is found to be more important. In terms of composition effects, de-unionization is found to be an important factor accounting for the polarization of male wage inequality. It is also found to reduce inequality at the bottom, as measured by the 50-10 log wage differential, and to increase inequality at the top, as measured by the 90-50 log wage differential. In terms of wage structure effects, increases in the returns to education are found, as in Lemieux (2006a), to be the dominant factor accounting for overall increases in male wage inequality.

Table 6 Male wage inequality: FFL decomposition results (CPS, 1983/85-2003/05).

image

Advantages

The linearity of $RIF$ regressions has several advantages. It is straightforward to invert the proportion of interest by dividing by the density. Since the inversion can be performed locally, another advantage is that we do not need to evaluate the global impact at all points of the distribution and worry about monotonicity. One gets a simple regression which is easy to interpret. As a result, the resulting decomposition is path independent.

Limitations

Like many other methods, $RIF$ regressions assume the invariance of the conditional distribution (i.e., no general equilibrium effects). Also, a legitimate practical issue is how good the local approximation is. For relatively smooth dependent variables, such as test scores, it may be a moot point. But in the presence of considerable heaping (as usually displayed by wage distributions), it may be advisable to oversmooth the density estimate and compare its values around the quantile of interest. This can be formally looked at by comparing reweighting estimates to the OB-type composition effect based on $RIF$ regressions (the specification error discussed earlier).

5.3 A reweighting approach

Procedure(s)

As we mentioned in Section 4, it is relatively straightforward to extend the DFL reweighting method to perform a detailed decomposition in the case of binary covariates. DFL show how to compute the composition effect corresponding to a binary covariate (union status in their application). Likewise, DiNardo and Lemieux (1997) use a related reweighting technique to compute the wage structure component. We first discuss the case where a covariate is a binary variable, and then discuss the case of categorical (with more than two categories) and continuous variables.

Binary covariate

Consider the case of one binary covariate, image, and a vector of other covariates, image. For instance, DiNardo et al. (1996) look at the case of unionization. They are interested in isolating the contribution of de-unionization to the composition effect by estimating what would have happened to the wage distribution if the distribution of unionization, but of none of the other covariates, had changed over time.

Letting image index the base period and image the end period, consider the counterfactual distribution image, which represents the period image distribution that would prevail if the conditional distribution of unionization (but of none of the other covariates image) was as in period image.64 Note that we are performing a counterfactual experiment by changing the conditional, as opposed to the marginal, distribution of unionization. Unless unionization is independent of other covariates (image), the marginal distribution of unionization, image, will depend on the distribution of image, image. For instance, if unionization is higher in the manufacturing sector, but the share of workers in manufacturing declines over time, the overall unionization rate will decline even if, conditional on industrial composition, the unionization rate remains the same.

Using the language of program evaluation, we want to make sure that secular changes in the rate of unionization are not confounded by other factors such as industrial change. This is achieved by looking at changes in the conditional, as opposed to the marginal, distribution of unionization. Note that the main problem with the procedure suggested by MM to compute the elements of the composition effect corresponding to each covariate is that it fails to address this problem. MM suggest using an unconditional reweighting procedure based on the change in the marginal, as opposed to the conditional, distribution of covariates. Unless the covariates are independent, this will yield biased estimates of the composition effect elements of the detailed decomposition.

The counterfactual distribution image is formally defined as


image


where the reweighting function is

image     (37)

image     (38)

Note that the conditional distribution image is assumed to be unaffected by the change in the conditional distribution of unionization (assumption of invariance of conditional distribution in Section 2). This amounts to assuming away selection into union status based on unobservables (after controlling for the other covariates image).

The reweighting factor image can be computed in practice by estimating two probit or logit models for the probability that a worker is unionized in periods image and image, respectively. The resulting estimates can then be used to compute the predicted probabilities of being unionized (image and image) or not unionized (image and image), which are then plugged into the above formula.
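In practice the computation looks as follows. This is a sketch with hypothetical period data frames df_t and df_s, a union dummy union, and the other covariates collected in Z_COLS; which period plays the role of the base period follows Eqs. (37) and (38), whose symbols are not displayed above.

import statsmodels.api as sm

Z_COLS = ["educ", "exper", "manuf"]          # the other covariates x (assumed)

def union_probit(df):
    # Probit for Pr(union = 1 | x)
    return sm.Probit(df["union"], sm.add_constant(df[Z_COLS])).fit(disp=0)

def union_reweighting_factor(df_t, df_s):
    # Ratio of conditional union probabilities, evaluated at each period-t
    # worker's covariates and own union status
    probit_t = union_probit(df_t)            # period being reweighted
    probit_s = union_probit(df_s)            # period whose union distribution is imposed
    Z_t = sm.add_constant(df_t[Z_COLS])
    p_t = probit_t.predict(Z_t)              # predicted Pr(union = 1 | x) in period t
    p_s = probit_s.predict(Z_t)              # same probability under period-s coefficients
    u = df_t["union"]
    return u * (p_s / p_t) + (1 - u) * ((1 - p_s) / (1 - p_t))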

DiNardo and Lemieux (1997) use a closely related reweighting procedure to compute the wage structure component of the effect of unions on the wage distribution. Consider the question of what would happen to the wage distribution if no workers were unionized. The distribution of wages among non-union workers:


image


is not a proper counterfactual since the distribution of other covariates, image, may not be the same for union and non-union workers. DiNardo and Lemieux (1997) suggest solving this problem by reweighting non-union workers so that their distribution of image is the same as for the entire workforce. The reweighting factors that accomplish this at times image and image are image and image, respectively, where:


image


Using these reweighting terms, we can write the counterfactual distribution of wages that would have prevailed in the absence of unions as:


image


These various counterfactual distributions can then be used to compute the contribution of unions (or another binary variable image) to the composition effect, image, and to the wage structure effect, image:


image     (39)


and


image     (40)


Although we need three different reweighting factors (image, image, and image) to compute the elements of the detailed wage decomposition corresponding to image, these three reweighting factors can be constructed from the estimates of the two probability models image and image. As before, once these reweighting factors have been computed, the different counterfactual statistics are easily obtained using standard statistical packages.
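As a concrete illustration of this last step, any of the counterfactual statistics entering Eqs. (39) and (40) is just a distributional statistic computed under one of the reweighting factors (multiplied by the survey weights, if any). A weighted quantile and a weighted variance, sketched below, cover the statistics used in this chapter; y and w stand for hypothetical outcome and weight vectors.

import numpy as np

def weighted_quantile(y, w, tau):
    # Quantile of y under weights w, by inverting the weighted empirical cdf
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    order = np.argsort(y)
    y, w = y[order], w[order]
    cum_w = np.cumsum(w) / np.sum(w)
    idx = min(np.searchsorted(cum_w, tau), len(y) - 1)
    return y[idx]

def weighted_variance(y, w):
    mean = np.average(y, weights=w)
    return np.average((np.asarray(y, dtype=float) - mean) ** 2, weights=w)

# The union contribution to, say, the composition effect at the median is then a
# difference of two such statistics, one computed under the counterfactual weight
# and one under the factual weight, following Eq. (39).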

General covariates

It is difficult to generalize the approach suggested above to the case of covariates that are not binary. In the case of the composition effect, one approach that has been followed in the applied literature consists of sequentially adding covariates in the probability model image used to compute image.65 For instance, start with image, compute image and the counterfactual statistics of interest by reweighting. Then do the same thing with image, etc.

One shortcoming of this approach is that the results depend on the order in which the covariates are sequentially introduced, just like results from a sequential decomposition for the mean also depend on the order in which the covariates are introduced in the regression. For instance, estimates of the effect of unions that fail to control for any other covariates may be overstated if union workers tend to be concentrated in industries that would pay high wages even in the absence of unions. As pointed out by Gelbach (2009), the problem with sequentially introducing covariates can be thought of as an omitted variable problem. Unless there are compelling economic reasons for first looking at the effect of some covariates without controlling for the other covariates, sequential decompositions will have the undesirable property of depending (strongly in some cases) on the order of the decomposition (path dependence).66

Fortunately, there is a way around the problem of path dependence when performing detailed decompositions using reweighting methods. The approach, however, still suffers from the adding-up problem and is more appropriate when only the effect of a particular factor is of interest. To illustrate this approach, consider a case with three covariates image, image, and image. In a sequential decomposition, one would first control for image only, then for image and image, and finally for image, image, and image. On the one hand, the regression coefficients on image and/or image in regressions that fail to control for image are biased because of the omitted variable problem. The corresponding elements of a detailed OB decomposition for the mean based on these estimated coefficients would, therefore, be biased too.

On the other hand, the coefficient on the last covariate to be introduced in the regression (image) is not biased since the other covariates (image and image) are also controlled for. So although order matters in a sequential regression approach, the effect of the last covariate to be introduced is not affected by the omitted variable bias.

The same logic applies in the case of detailed decompositions based on a reweighting approach. Intuitively, the difference between the counterfactual distribution one gets by reweighting with image and image only and the one obtained by reweighting with image, image, and image should yield the appropriate contribution of image to the composition effect.

To see this more formally, consider the group image counterfactual distribution that would prevail if the distribution of image, conditional on image, image, was as in group image:


image


where the reweighting factor image can be written as:


image


image is the reweighting factor used to compute the aggregate decomposition in Section 4.5. image is a reweighting factor based on all the covariates except the one considered for the detailed decomposition (image). As before, Bayes’ rule can be used to show that:


image


Once again, this new reweighting factor is easily computed by running a probit or logit regression (with image and image as covariates) and using the predicted probabilities to estimate image.

This reweighting procedure for the detailed decomposition is summarized as follows:

1. Compute the reweighting factor using all covariates, image.
2. For each individual covariate image, compute the reweighting factor using all covariates but image, image.
3. For each covariate image, compute the counterfactual statistic of interest using the ratio of reweighting factors image as weight, and compare it to the counterfactual statistic obtained using only image as weight. The difference is the estimated contribution of covariate image to the composition effect.

Note that while this procedure does not suffer from path dependence, the contributions of the individual covariates do not sum up to the total contribution of covariates (the aggregate composition effect). The difference is an interaction effect between the different covariates, which is harder to interpret.
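The following sketch puts steps 1-3 together for a single covariate. It assumes a pooled data frame df containing a group indicator g equal to 1 for group A and 0 for group B, a log wage lwage, and covariates x1-x3; the exact weight combination in step 3 follows the formulas above, whose symbols are not displayed, and the weighted_quantile helper from the earlier sketch is reused.

import numpy as np
import statsmodels.api as sm

ALL_COLS = ["x1", "x2", "x3"]                 # illustrative covariates

def dfl_weight(df, cols):
    # DFL-type factor reweighting group B to mimic group A, from a logit for
    # group membership on the covariates in `cols`
    X = sm.add_constant(df[cols])
    p = sm.Logit(df["g"], X).fit(disp=0).predict(X)      # Pr(A | x)
    share_A = df["g"].mean()
    w = (p / (1 - p)) * ((1 - share_A) / share_A)
    return np.where(df["g"] == 0, w, 1.0)                # reweight group B only

def contribution_of(df, k, tau=0.5):
    # Contribution of covariate k to the composition effect at quantile tau
    w_all = dfl_weight(df, ALL_COLS)                                  # step 1
    w_minus_k = dfl_weight(df, [c for c in ALL_COLS if c != k])       # step 2
    b = (df["g"] == 0).to_numpy()
    y_b = df.loc[df["g"] == 0, "lwage"]
    q_all = weighted_quantile(y_b, w_all[b], tau)                     # step 3
    q_minus_k = weighted_quantile(y_b, w_minus_k[b], tau)
    return q_all - q_minus_k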

Advantages

This reweighting procedure shares most of the advantages of the other reweighting procedures we proposed for the aggregate decomposition. First, it is generally easy to implement in practice. Second, by using a flexible specification for the logit/probit, it is possible to get estimates of the various components of the decomposition that depend minimally on functional form assumptions. Third, the procedure yields efficient estimates.

Limitations

With a large number of covariates, one needs to estimate a sizable number of reweighting factors to compute the various elements of the detailed decomposition. This can be tedious, although it does not require that much in terms of computations since each probit/logit is easy to estimate. Another disadvantage of the suggested decomposition is that although it does not suffer from the problem of path dependence, we are still left with an interaction term which is difficult to interpret. For these reasons, we suggest first using a regression-based approach like the image-regression approach discussed above, which is essentially as easy to compute as a standard OB decomposition. The reweighting procedure suggested here can then be used to probe these results and make sure they are robust to the functional-form assumptions implicit in the image-regression approach.

5.4 Detailed decomposition based on conditional quantiles

As we mentioned earlier, the method of Machado and Mata (2005) can be used to compute the wage structure sub-components of the detailed decomposition. These components are computed by sequentially switching the coefficients of the quantile regressions for each covariate from their estimated values for group image to their estimated values for group image. This sequential switching cannot be used, however, to compute the sub-components of the composition effect of the detailed decomposition. Rather, Machado and Mata (2005) suggest an unconditional reweighting approach to do so. This does not provide consistent estimates, however, since the effect of the reweighted covariate of interest gets confounded with the effects of other covariates correlated with it. For instance, if union workers are more concentrated in manufacturing, doing an unconditional reweighting on unions will also change the fraction of workers in manufacturing. In this sense, the effect of unions gets confounded with the effect of manufacturing.

This is a significant drawback since it is arguably more important to conduct a detailed decomposition for the composition effect than for the wage structure effect. As discussed earlier, there are always some interpretation problems with the detailed components of the wage structure effect because of the omitted group problem.

One solution is to use the conditional reweighting procedure described above instead. But once this type of reweighting approach is used, there is no need to estimate (conditional) quantile regressions. Unless the quantile regressions are of interest in their own right, it is preferable to use a single consistent approach, such as the one based on the estimation of image-regressions, to estimate the detailed components of both the wage structure and composition effects.

6 Extensions

In this section, we present three extensions to the decomposition methods discussed earlier. We first consider the case where either the ignorability or the zero conditional mean assumptions are violated because of self-selection or endogeneity of the covariates. We next discuss the situation where some of these problems can be addressed when panel data are available. We conclude the section by discussing the connection between conventional decomposition methods and structural modeling.

6.1 Dealing with self-selection and endogeneity

The various decomposition procedures discussed up to this point provide consistent estimates of the aggregate composition and wage structure effects under the ignorability assumption. Stronger assumptions, such as conditional mean independence (for decompositions of the mean) or straight independence, have to be invoked to perform the detailed decomposition. In this section we discuss some alternatives for estimating the decomposition when these assumptions fail. We mostly focus on the case of the OB decomposition of the mean, though some of the results we present could be extended to more general distributional statistics.

We consider three scenarios, first introduced in Section 2.1.6, under which the OB decomposition is inconsistent because of a failure of the ignorability or conditional independence assumption. In the first case, the problem is that individuals from groups image and image may self-select differently into the labor market. For instance, participation decisions of men (group image) may be different from participation decisions of women (group image) in ways that are not captured by observable characteristics. In the second case, we consider what happens when individuals can self-select into group image or image (for instance, union and non-union jobs) on the basis of unobservables. The third case is a standard endogeneity problem where the covariates are correlated with the error term. For example, education (one of the covariates) may be correlated with the error term because more able individuals tend to get more schooling.

1. Differential self-selection within groups A and B.

One major concern when decomposing differences in wages between two groups with very different labor force participation rates is that the probability of participation depends on unobservables image in different ways for groups image and image. This is a well known problem in the gender wage gap literature (Blau and Kahn, 2006; Olivetti and Petrongolo, 2008; Mulligan and Rubinstein, 2008, etc.) and in the black-white wage gap literature (Neal and Johnson, 1996).

Our estimates of decomposition terms may be directly affected when workers of groups image and image self-select into the labor market differently. Thus, controlling for selection based on observables and unobservables is necessary to guarantee point identification of the decomposition terms. If no convincing model of self-selection is available, a more agnostic approach based on bounds has also been proposed recently. Following Machado (2009), we therefore distinguish three branches in the literature on self-selection: (i) selection on observables; (ii) selection on unobservables; (iii) bounds.

Under selection based on observables and, when panel data are available, on time-invariant unobserved components, values can be imputed for the missing wages of non-participants. Representative papers following this approach are Neal and Johnson (1996), Johnson et al. (2000), Neal (2004), Blau and Kahn (2006) and Olivetti and Petrongolo (2008). These papers are typically concerned with mean or median wages, but extensions to cumulative distribution functions or general image-wage gaps could also be considered.

When labor market participation is based on unobservables, correction procedures for mean wages are also available. In these procedures, a control variate is added as a regressor in the conditional expectation function. The exclusion restriction that an available instrument image does not belong to the conditional expectation function also needs to be imposed.67 Leading parametric and nonparametric examples include Heckman (1974, 1976), Duncan and Leigh (1980), Dolton and Makepeace (1986), Vella (1998), and Mulligan and Rubinstein (2008).

In this setting, the decomposition can be performed by adding a control variate image to the regression. In most applications, image is the usual inverse Mills’ ratio term obtained by fitting a probit model of the participation decision. Note that the addition of this control variate slightly changes the interpretation of the decomposition. The full decomposition for the mean is now


image


where image and image are the estimated coefficients on the control variates. The decomposition provides a full accounting of the wage gap that also includes differences in both the composition of unobservables (image) and the return to unobservables (image). The decomposition thus treats the contributions of observables (the image’s) and unobservables symmetrically.
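As an illustration, a minimal two-step sketch of this approach is given below. The participation dummy work, the excluded instrument z, and the other column names are assumptions; the coefficients and means it returns are then combined according to the expression above.

import statsmodels.api as sm
from scipy.stats import norm

X_COLS = ["educ", "exper"]                    # illustrative covariates

def selection_corrected_fit(df):
    # Step 1: participation probit on the covariates and the excluded instrument z
    W = sm.add_constant(df[X_COLS + ["z"]])
    gamma = sm.Probit(df["work"], W).fit(disp=0).params
    index = W @ gamma
    mills = norm.pdf(index) / norm.cdf(index)             # inverse Mills' ratio
    # Step 2: wage OLS on participants, adding the control variate
    part = (df["work"] == 1).to_numpy()
    X = sm.add_constant(df.loc[df["work"] == 1, X_COLS])
    X["lambda"] = mills[part]
    fit = sm.OLS(df.loc[df["work"] == 1, "lwage"], X).fit()
    return fit.params, X.mean()

# params_A, xbar_A = selection_corrected_fit(df_A)
# params_B, xbar_B = selection_corrected_fit(df_B)
# The observable and lambda terms of the full decomposition are then formed from
# these coefficients and means as in the expression above.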

A third approach uses bounds for the conditional expectation function of wages for groups image and image. These bounds can in turn be used to bound the wage structure effect, image, and the composition effect, image. Let image. Then, letting image be a dummy indicating labor force participation, we can write the conditional expected wage as


image


and therefore


image


where image and image are lower and upper bounds of the distribution of image, for image. Therefore,


image


This bounding approach to the selection problem may also use restrictions motivated by econometric or economic theory to narrow the bounds, as in Manski (1990) and Blundell et al. (2007).
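Since the symbols in the expressions above are not displayed, the following is a standard worked version of the bounding argument in self-contained notation, assuming wages lie in a known interval $[y^{L}, y^{U}]$ and writing $D$ for the participation dummy:

$$E[Y \mid X = x] = E[Y \mid X = x, D = 1]\,\Pr(D = 1 \mid X = x) + E[Y \mid X = x, D = 0]\,\Pr(D = 0 \mid X = x).$$

The first term is identified from data on participants; replacing the unobserved $E[Y \mid X = x, D = 0]$ by $y^{L}$ and $y^{U}$ gives

$$E[Y \mid x, D = 1]\Pr(D = 1 \mid x) + y^{L}\Pr(D = 0 \mid x) \;\le\; E[Y \mid X = x] \;\le\; E[Y \mid x, D = 1]\Pr(D = 1 \mid x) + y^{U}\Pr(D = 0 \mid x).$$

Applying these bounds to each group and differencing the resulting expressions yields the bounds on the wage structure and composition effects referred to above.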

2. Self-Selection into groups A and B

In the next case we consider, individuals have the choice to belong to either group image or image. The leading example is the choice of the union status of workers. The traditional way of dealing with the problem is to model the choice decision and correct for selection biases using control function methods.68

As discussed in Section 2.1.6, it is also possible to apply instrumental variable methods more directly without explicitly modeling the selection process into groups image and image. Imbens and Angrist (1994) show that this will identify the wage gap for the subpopulation of compliers who are induced by the instrument to switch from one group to the other.

3. Endogeneity of the covariates

The standard assumption used in the OB decomposition is that the outcome variable image is linearly related to the covariates, image, and that the error term image is conditionally independent of image, as in Eq. (1). Now consider the case where the conditional independence assumption fails because one or several of the covariates are correlated with the error term. Note that while the ignorability assumption may hold even if conditional independence fails, we consider a general case here where neither assumption holds.

As is well known, the conventional solution to the endogeneity problem is to use instrumental variable methods. For example, if we suspect years of education (one of the covariates) to be correlated with the error term in the wage equation, we can still estimate the model consistently provided that we have a valid instrument for years of education. The decomposition can then be performed by replacing the OLS estimates of the image coefficients by their IV counterparts.

Of course, in most cases it is difficult to come up with credible instrumentation strategies. It is important to remember, however, that even when the zero conditional mean assumption image fails, the aggregate decomposition may remain valid, provided that ignorability holds. This would be the case, for example, when unobserved ability is correlated with education, but the correlation (more generally the conditional distribution of ability given education) is the same in group image and image. While we are not able to identify the contribution of education vs. ability in this context (unless we have an instrument), we know that there are no systematic ability differences between groups image and image once we have controlled for education. As a result, the aggregate decomposition remains valid.
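Returning to the IV solution described above, a sketch of what it looks like in the education example follows. The instrument z_educ and the other column names are assumptions; two-stage least squares is done by hand by plugging first-stage fitted values into the wage equation, which reproduces the 2SLS point estimates (though not their standard errors).

import statsmodels.api as sm

EXOG = ["exper"]                              # exogenous covariates (assumed)

def iv_coefficients(df):
    # First stage: education on the instrument and the exogenous covariates
    W = sm.add_constant(df[EXOG + ["z_educ"]])
    educ_hat = sm.OLS(df["educ"], W).fit().predict(W)
    # Second stage: wage on fitted education and the exogenous covariates
    X2 = sm.add_constant(df[EXOG])
    X2["educ"] = educ_hat
    return sm.OLS(df["lwage"], X2).fit().params

def iv_ob_decomposition(df_A, df_B):
    # OB decomposition using IV (rather than OLS) coefficients and actual means
    beta_A, beta_B = iv_coefficients(df_A), iv_coefficients(df_B)
    xbar_A = sm.add_constant(df_A[EXOG + ["educ"]]).mean()[beta_A.index]
    xbar_B = sm.add_constant(df_B[EXOG + ["educ"]]).mean()[beta_B.index]
    composition = (xbar_A - xbar_B) @ beta_B
    wage_structure = xbar_A @ (beta_A - beta_B)
    return composition, wage_structure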

6.2 Panel data

An arguably better way of dealing with the selection and endogeneity problems mentioned above is to use panel data. Generally speaking, panel data methods can be used to compute consistent estimates of the image’s in each of the three cases discussed earlier. For example, if the zero conditional mean assumption holds once we also control for a person-specific fixed effect image in a panel of length image (image), we can consistently estimate image using standard panel data methods (fixed effects, first differences, etc.). This provides an alternative way of dealing with endogeneity problems when no instrumental variables are available.

As we also discussed earlier, panel data can be used to impute wages for years where an individual is not participating in the labor market (e.g. Olivetti and Petrongolo, 2008). Note that in cases where individuals cannot switch groups (e.g. men vs. women), it may still be possible to estimate fixed effect models by using the firm (or a related concept) as the basic unit instead (Woodcock, 2008). Care has to be exercised in those circumstances to ensure that the firm fixed effect is the same for female and male employees of the same firm. Another important issue with these models is that the difference in male and female intercepts is difficult to interpret, as it may capture unobserved or omitted individual and firm effects.

Panel data methods have also been used to adjust for the selection into groups in cases where the same individual is observed in group image and image. For example, Freeman (1984) and Card (1996) estimate the union wage gap with panel data to control for the selection of workers into union status. Lemieux (1998) uses a more general approach where the return to the fixed effect may be different in the union and non-union sector. He also shows how to generalize the approach to the case of a decomposition of the variance.
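For concreteness, a minimal first-difference version of such an estimate is sketched below. The panel data frame and its columns person, year, lwage and union are assumptions, and Lemieux (1998)'s richer specification with a group-specific return to the fixed effect is not implemented here.

import statsmodels.api as sm

def union_gap_first_difference(panel):
    # First-differencing within workers removes the person fixed effect, so the
    # union coefficient is identified from workers who change union status
    panel = panel.sort_values(["person", "year"])
    d = panel.groupby("person")[["lwage", "union"]].diff().dropna()
    fit = sm.OLS(d["lwage"], sm.add_constant(d[["union"]])).fit()
    return fit.params["union"]                # union wage gap net of person fixed effects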

Without loss of generality, assume that the return to the fixed effect for non-union workers (group image) is 1, while it is equal to image for union workers. The mean decomposition adjusted for fixed effects yields:


image


The interpretation of the decomposition is the same as in a standard OB setting except that image now represents the composition effect term linked to non-random selection into the union sector, while the wage structure term image captures a corresponding wage structure effect.

More sophisticated models with several levels of fixed effects have also been used in practice. For instance, Abowd et al. (2008) decompose inter-industry wage differentials into various components that include both individual- and firm-specific fixed effects.

6.3 Decomposition in structural models

In Section 2, we pointed out that decomposition methods were closely related to methods used in the program evaluation literature where it is not necessary to estimate a fully specified structural model to estimate the main parameter of interest (the image). Provided that the ignorability assumption is satisfied, we can perform an aggregate decomposition without estimating an underlying structural model.

There are some limits, however, to what can be achieved without specifying any structure to the underlying economic problem. As we just discussed in Section 6.1, one problem is that the ignorability assumption may not hold. Under this scenario, more explicit modeling may be useful for correcting biases in the decomposition due to endogeneity, self-selection, etc.

Another problem that we now address concerns the interpretation of the wage structure components of the detailed decomposition. Throughout this chapter, we have proposed a number of ways of estimating these components for both the mean and more general distributional statistics. In the case of the mean, the interpretation of the detailed decomposition for the wage structure effect is relatively straightforward. Under the assumption (implicit in the OB decomposition) that the wage equations are truly linear and the errors have a zero conditional mean, we can think of the wage setting model as a fully specified structural model. The image coefficients are the “deep” structural parameters of the model, and these structural parameters are used directly to perform the decomposition.

Things become more complicated once we go beyond the mean. For instance, in the case of the variance (Section 4.1), recall that the wage structure effect in Eq. (26) depends on the parameters of both the model for the conditional mean (image) and the model for the conditional variance (image).

Take, for example, the case where one of the covariates is the union status of workers. The parameter image captures the “compression”, or within-group, effect, while the parameter image captures the “wage gap”, or between-group, effect. These two terms have a distinct economic interpretation as they reflect different channels through which union wage policies tend to impact the wage distribution.

In the case of more general distributional statistics, the wage structure effect depends on an even larger number of underlying parameters capturing the relationship between the covariates and higher order moments of the distribution. As a result, the wage structure part of the detailed decomposition becomes even harder to interpret, as it potentially depends on a large number of underlying parameters.

In some cases, this may not pose a problem from an interpretation point of view. For instance, we may only care about the overall effect of unions, irrespective of whether it is coming from a between- or within-group effect (or corresponding components for higher order moments). But in other cases this type of interpretation may be unsatisfactory. Consider, for example, the effect of education on the wage structure. Like unions, education may influence wage dispersion through a between- or within-group channel. The between-group component is linked to the traditional return to education (effect on conditional means), but education also has a substantial effect on within-group dispersion (see, e.g., Lemieux, 2006b). All these effects are combined together in the decomposition methods proposed in Section 5, which is problematic if we want to know, for instance, the specific contribution of changes in the return to education to the growth in wage inequality.

In these circumstances, we need to use a more structural approach to get a more economically interpretable decomposition of the wage structure effect. The decomposition method of Juhn et al. (1993) is, in fact, an early example of a more structurally-based decomposition. In their setting, the model for the conditional mean is interpreted as an underlying human capital pricing equation. Likewise, changes in residual wage dispersion (given image) are interpreted as reflecting an increase in the return to unobservable skills.

As we discussed in Section 4.3, the fact that Juhn et al. (1993) provides a richer interpretation of the wage structure effect by separating the within- and between-group components is an important advantage of the method. We also mentioned, however, that the interpretation of the decomposition was not that clear for distributional statistics going beyond the variance, and that the procedure typically imposes substantial restrictions on the data that may or may not hold. By contrast, a method like DFL imposes very few restrictions (provided that the probit/logit model used for reweighting is reasonably flexible), though it is more limited in terms of the economic interpretation of the wage structure effect.

In light of this, the challenge is to find a way of imposing a more explicit structure on the economic problem while making sure the underlying model “fits” the data reasonably well. One possible way of achieving this goal is to go back to the structural form introduced in Section 2 (image), and use recent results from the literature on nonparametric identification of structural functions to identify the functions image. As discussed in Section 2.2.1, this can be done by invoking results obtained by Matzkin (2003), Blundell and Powell (2007) and Imbens and Newey (2009). Generally speaking, it is possible to identify the functions image nonparametrically under the assumptions of independence of image (Assumption 8), and strict monotonicity of image in image (Assumption 9).

But while it is possible, in principle, to nonparametrically identify the functions image, there is no guarantee that the resulting estimates will be economically interpretable. As a result, a more common approach used in the empirical literature is to write down a more explicit (and parametric) structural model, but carefully look at whether the model adequately fits the data. Once the model has been estimated, simulation methods can then be used to compute a variety of counterfactual exercises. The counterfactuals then form the basis of a more economically interpretable decomposition of the wage structure effect.

To take a specific example, consider the Keane and Wolpin (1997) model of career progression of young men, where educational and occupational choices are explicitly modeled using a dynamic programming approach. After carefully looking at whether the estimated model is rich enough to adequately fit the distribution of wages, occupational choices, and educational achievement, Keane and Wolpin use the estimated model to decompose the distribution of lifetime utility (itself computed using the model). They conclude that 90 percent of the variance of lifetime utility is due to skill endowment heterogeneity (schooling at age 16 and unobserved type). By contrast, choices and other developments happening after age 16 have a relatively modest impact on the variance of lifetime utility.69 The general idea here is to combine structural estimation and simulation methods to quantify the contribution of the different parameters of interest to some decompositions of interest. These issues are discussed in more detail in the chapter on structural methods by Keane et al. (2011).

One last point is that the interpretation problem linked to the wage structure effect does not apply to the detailed decomposition for the composition effect. In that case, each component is based on a clear counterfactual exercise that does not require an underlying structure to be interpretable. The aggregate decomposition is based on the following counterfactual exercise: what would be the distribution of outcomes for group image if the distribution of the covariates for group image were the same as for group image? Similarly, the detailed decomposition is based on a conditional version of the counterfactual. For example, one may want to ask what would be the distribution of outcomes for group image if the distribution of unionization (or another covariate) for group image was the same as for group image, conditional on the distribution of the other covariates remaining the same.

These interpretation issues aside, it may still be useful to use a more structural approach when we are concerned about the validity of the decomposition because of self-selection, endogeneity, etc. For instance, in Keane and Wolpin (1997), the choice of schooling and occupation is endogenous. Using standard decomposition methods to look, for instance, at the contribution of the changing distribution of occupations to changes in the distribution of wages would yield invalid results because occupational choice is endogenous. In such a context, structural modeling, like the IV and selection methods discussed in Section 6.1, can help recover the elements of the decomposition when standard methods fail because of endogeneity or self-selection. But the problem here is quite distinct from issues with the wage structure effect, where standard decomposition methods are limited because of an interpretation problem and where structural modeling provides a natural way of resolving it. By contrast, solutions to the problem of endogeneity or self-selection are only as good as the instruments (or related assumptions) used to correct for these problems. As a result, the value added of the structural approach is much more limited in the case of the composition effect than in the case of the wage structure effect.

This last point is very clear in the emerging literature where structural modeling is used in conjunction with experimental data. For example, Card and Hyslop (2005) use experimental data from the Self Sufficiency Project (SSP) to look at why individuals offered a generous work subsidy are less likely to receive social assistance (SA). By definition, there is no composition effect since the treatment and control groups are selected by random assignment. In that context, the average treatment effect precisely corresponds to the wage structure effect (or “SA structure” effect here) in a decomposition of the difference between the treatment and control group. It is still useful, however, to go beyond this aggregate decomposition to better understand the mechanisms behind the measured treatment effect. Card and Hyslop (2005) do so by estimating a dynamic search model.

This provides much more insight into the “black box” of the treatment effect than what a traditional decomposition exercise would yield. Remember that the detailed wage structure component in an OB-type decomposition is based on the difference in the returns to characteristics between the two groups. In a pure experimental context like the SSP project, this simply reflects some heterogeneity in the treatment effect across different subgroups. The extent of this heterogeneity matters for the generalizability of the results. But unlike a structural approach, it provides relatively little insight into the mechanisms underlying the treatment effect.

7 Conclusion

The development of new decomposition methods has been a fertile area of research over the last 10-15 years. Building on the seminal work of Oaxaca (1973) and Blinder (1973), a number of procedures that go beyond the mean have been suggested and used extensively in practice. In this chapter, we have reviewed these methods and suggested a number of “best practices” for researchers interested in these issues. We have also illustrated how these methods work in practice by discussing existing applications and working through a set of empirical examples throughout the chapter.

Another important and recent development in this literature has linked decomposition methods to the large and growing literature on program evaluation and treatment effects. This connection is useful for several reasons. First, it helps clarify some interpretation issues with decompositions. In particular, results from the treatment effects literature can be used to show that we can give a structural interpretation to an aggregate decomposition under the assumption of ignorability. Another benefit of this connection is that formal results about the statistical properties of treatment effects estimators can also be directly applied to decomposition methods. This helps guide the choice of decomposition methods with good statistical properties, and helps conduct inference on the various components of the estimated decomposition.

But this connection with the treatment effects literature also comes at a cost. While no structural modeling is required to perform a decomposition or estimate a treatment effect, these approaches leave open the question of which economic mechanisms lie behind the various elements of the decomposition (or behind the treatment effect). Now that the connection between decomposition methods and the treatment effects literature has been well established, an important direction for future research will be to improve the connection between decomposition methods and structural modeling.

The literature on inequality provides some useful hints on how this connection can be made and improved upon. In this literature, decomposition methods have helped uncover the most important factors behind the large secular increase in wage inequality. These include increases in the return to education, de-unionization, and the decline in the minimum wage, to mention a few examples. These findings have spurred a large number of more conceptual studies trying to provide formal economic explanations for these important phenomena. In principle, these explanations can then be more formally confronted with the data by writing down and estimating a structural model, and using simulation methods to quantify the role of these explanations.

This suggests a two-step research strategy where “off-the-shelf” decomposition methods, like those discussed in this chapter, can first be used to uncover the main forces underlying an economic phenomenon of interest. More “structural” decomposition methods could then be used to better understand the economics behind the more standard decomposition results. We expect such a research strategy to prove fruitful in the years to come.

References

Abowd, John M., Kramarz, Francis, Lengerman, Paul, Roux, Sebastien, 2008. Persistent inter-industry wage differences: rent sharing and opportunity costs. Working paper

James Albrecht, Anders Björklund, Susan Vroman. Is there a glass ceiling in Sweden? Journal of Labor Economics. 2003;21:145-178.

Joseph G. Altonji, Rebecca Blank. Race and gender in the labor market. In: O. Ashenfelter, D. Card., editors. Handbook of Labor Economics, vol. 3C. Amsterdam: Elsevier Science, 1999.

Joseph G. Altonji, Rosa L. Matzkin. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica. 2005;73:1053-1102.

Altonji, Joseph G., Bharadwaj, P., Lange, Fabian, 2008, Changes in the characteristics of American youth: Implications for adult outcomes. Working paper, Yale University

Susan Athey, Guido W. Imbens. Identification and inference in nonlinear difference-in-differences models. Econometrica. 2006;74:431-497.

David H. Autor, Frank Levy, Richard Murnane. The skill content of recent technological change: an empirical exploration. Quarterly Journal of Economics. 2003;118:1279-1333.

Autor, David H., Katz, Lawrence F., Kearney, Melissa S., 2005. Rising Wage Inequality: The Role of Composition and Prices. NBER Working Paper No. 11628, September

R. Barsky, John Bound, K. Charles, J. Lupton. Accounting for the black-white wealth gap: a nonparametric approach. Journal of the American Statistical Association. 2002;97:663-673.

Thomas K. Bauer, Silja Göhlmann, Mathias Sinning. Gender differences in smoking behavior. Health Economics. 2007;19:895-909.

Thomas K. Bauer, Mathias Sinning. An extension of the Blinder–Oaxaca decomposition to nonlinear models. Advances in Statistical Analysis. 2008;92:197-206.

Marianne Bertrand, Kevin F. Hallock. The gender gap in top corporate jobs. Industrial and Labor Relations Review. 2001;55:3-21.

Martin Biewen. Measuring the effects of socio-economic variables on the income distribution: an application to the East German transition process. Review of Economics and Statistics. 2001;83:185-190.

Marianne P. Bitler, Jonah B. Gelbach, Hilary W. Hoynes. What mean impacts miss: distributional effects of welfare reform experiments. American Economic Review. 2006;96:988-1012.

Dan Black, Amelia Haviland, Seth Sanders, Lowell Taylor. Gender wage disparities among the highly educated. Journal of Human Resources. 2008;43:630-659.

Francine D. Blau, Lawrence M. Kahn. The gender earnings gap: learning from international comparisons. American Economic Review. 1992;82:533-538.

Francine D. Blau, Lawrence M. Kahn. Swimming upstream: trends in the gender wage differential in the 1980s. Journal of Labor Economics. 1997;15:1-42.

Francine D. Blau, Lawrence M. Kahn. Understanding international differences in the gender pay gap. Journal of Labor Economics. 2003;21:106-144.

Francine D. Blau, Lawrence M. Kahn. The US gender pay gap in the 1990s: slowing convergence. Industrial & Labor Relations Review. 2006;60(1):45-66.

Alan Blinder. Wage discrimination: reduced form and structural estimates. Journal of Human Resources. 1973;8:436-455.

Richard Blundell, James L. Powell. Censored regression quantiles with endogenous regressors. Journal of Econometrics. 2007;141:65-83.

Richard Blundell, Amanda Gosling, Hidehiko Ichimura, Costas Meghir. Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica. 2007;75:323-363.

Francois Bourguignon. Decomposable income inequality measures. Econometrica. 1979;47:901-920.

F. Bourguignon, Francisco H.G. Ferreira. Decomposing changes in the distribution of household incomes: methodological aspects. In: F. Bourguignon, F.H.G. Ferreira, N. Lustig, editors. The Microeconomics of Income Distribution Dynamics in East Asia and Latin America. World Bank; 2005:17-46.

F. Bourguignon, Francisco H.G. Ferreira, Philippe G. Leite. Beyond Oaxaca–Blinder: Accounting for differences in household income distributions. Journal of Economic Inequality. 2008;6:117-148.

Busso, Matias, DiNardo, John, McCrary, Justin, 2009. New Evidence on the Finite Sample Properties of Propensity Score Matching and Reweighting Estimators. IZA Discussion Paper No. 3998

Kristin F. Butcher, John DiNardo. The Immigrant and native-born wage distributions: evidence from United States censuses. Industrial and Labor Relations Review. 2002;56:97-121.

Glen Cain. The economic analysis of labor market discrimination: a survey. In: O.C. Ashenfelter, R. Layard, editors. Handbook of Labor Economics, vol. 1. North-Holland; 1986:709-730.

Card, David, 1992. The Effects of Unions on the Distribution of Wages: Redistribution or Relabelling? NBER Working Paper 4195. National Bureau of Economic Research, Cambridge, Mass

David Card. The effect of unions on the structure of wages: a longitudinal analysis. Econometrica. 1996;64:957-979.

David Card, Dean R. Hyslop. Estimating the effects of a time-limited earnings subsidy for welfare-leavers. Econometrica. 2005;73:1723-1770.

Kenneth Y. Chay, David S. Lee. Changes in relative wages in the 1980s: returns to observed and unobserved skills and black-white wage differentials. Journal of Econometrics. 2000;99(1):1-38.

Chernozhukov, Victor, Fernandez-Val, Ivan, Melly, Blaise, 2009. Inference on Counterfactual Distributions. CeMMAP working paper CWP09/09

Victor Chernozhukov, Ivan Fernandez-Val, A. Galichon. Quantile and probability curves without crossing. Econometrica. 2010;78:1093-1126.

Daniel Chiquiar, Gordon H. Hanson. International migration, self-selection, and the distribution of wages: Evidence from Mexico and the United States. Journal of Political Economy. 2005;113:239-281.

Jeremiah Cotton. On the decomposition of wage differentials. Review of Economics and Statistics. 1988;70:236-243.

Frank A. Cowell. On the structure of additive inequality measures. Review of Economic Studies. 1980;47:521-531.

Denison, E.F., 1962. The sources of economic growth in the United States and the alternatives before us. Supplementary Paper No. 13. Committee for Economic Development, New York

John DiNardo, Nicole M. Fortin, Thomas Lemieux. Labor market institutions and the distribution of wages, 1973-1992: a semiparametric approach. Econometrica. 1996;64:1001-1044.

John DiNardo, David S. Lee. Economic impacts of new unionization on private sector employers: 1984-2001. The Quarterly Journal of Economics. 2004;119:1383-1441.

John DiNardo, Thomas Lemieux. Diverging male wage inequality in the United States and Canada, 1981-1988: do institutions explain the difference? Industrial and Labor Relations Review. 1997;50:629-651.

Peter John Dolton, Gerald H. Makepeace. Sample selection and male-female earnings differentials in the graduate labour market. Oxford Economic Papers. 1986;38:317-341.

Denise J. Doiron, W. Craig Riddell. The impact of unionization on male-female earnings differences in Canada. Journal of Human Resources. 1994;29:504-534.

Stephen G. Donald, David A. Green, Harry J. Paarsch. Differences in wage distributions between Canada and the United States: an application of a flexible estimator of distribution functions in the presence of covariates source. Review of Economic Studies. 2000;67:609-633.

Gregory M. Duncan, Duane E. Leigh. Wage determination in the union and nonunion sectors: a sample selectivity approach. Industrial and Labor Relations Review. 1980;34:24-34.

Egel, Daniel, Graham, Bryan, Pinto, Cristine, 2009. Efficient estimation of data combination problems by the method of auxiliary-to-study tilting. mimeo

William E. Even, David A. Macpherson. Plant size and the decline of unionism. Economics Letters. 1990;32:393-398.

Robert W. Fairlie. The absence of the African-American owned business: an analysis of the dynamics of self–employment. Journal of Labor Economics. 1999;17:80-108.

Robert W. Fairlie. An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Journal of Economic and Social Measurement. 2005;30:305-316.

Judith Fields, Edward N. Wolff. Interindustry wage differentials and the gender wage gap. Industrial and Labor Relations Review. 1995;49:105-120.

Sergio Firpo. Efficient semiparametric estimation of quantile treatment effects. Econometrica. 2007;75:259-276.

Firpo, Sergio, 2010. Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures. EESP-FGV. mimeo

Firpo, Sergio, Fortin, Nicole M., Thomas, Lemieux, 2007. Decomposing Wage Distributions using Recentered Influence Functions Regressions. mimeo, University of British Columbia

Sergio Firpo, Nicole M. Fortin, Thomas Lemieux. Unconditional quantile regressions. Econometrica. 2009;77(3):953-973.

Fitzenberger, Bernd, Kohn, Karsten, Wang, Qingwei, 2010. The erosion of union membership in Germany: determinants, densities, decompositions. Journal of Population Economics (forthcoming)

Silverio Foresi, Franco Peracchi. The conditional distribution of excess returns: an empirical analysis. Journal of the American Statistical Association. 1995;90:451-466.

Nicole M. Fortin, Thomas Lemieux. Rank regressions, wage distributions, and the gender gap. Journal of Human Resources. 1998;33:610-643.

Nicole M. Fortin. The gender wage gap among young adults in the United States: the importance of money vs. people. Journal of Human Resources. 2008;43:886-920.

Richard B. Freeman. Unionism and the dispersion of wages. Industrial and Labor Relations Review. 1980;34:3-23.

Richard B. Freeman. Longitudinal analysis of the effect of trade unions. Journal of Labor Economics. 1984;2:1-26.

Richard B. Freeman. How much has deunionization contributed to the rise of male earnings inequality? In: Sheldon Danziger, Peter Gottschalk, editors. Uneven Tides: Rising Income Inequality in America. New York: Russell Sage Foundation, 1993. 133-63

Markus Frolich. Finite-sample properties of propensity-score matching and weighting estimators. Review of Economics and Statistics. 2004;86:77-90.

Javier Gardeazabal, Arantza Ugidos. More on the identification in detailed wage decompositions. Review of Economics and Statistics. 2004;86:1034-1057.

Gelbach, Jonah B., 2002. Identified Heterogeneity in Detailed Wage Decompositions. mimeo, University of Maryland at College Park

Gelbach, Jonah B., 2009. When Do Covariates Matter? And Which Ones, and How Much? mimeo, Eller College of Management, University of Arizona

Joanna Gomulka, Nicholas Stern. The employment of married women in the United Kingdom, 1970–1983. Economica. 1990;57:171-199.

Amanda Gosling, Stephen Machin, Costas Meghir. The changing distribution of male wages in the U.K,. Review of Economic Studies. 2000;67:635-666.

William H. Greene. Econometric Analysis, 5th ed. Upper Saddle River, NJ: Pearson Education; 2003.

James Heckman. Shadow prices, market wages and labor supply. Econometrica. 1974;42:679-694.

James Heckman. The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement. 1976;5:475-492.

James Heckman. Sample selection bias as a specification error. Econometrica. 1979;47:153-163.

James J. Heckman, Jeffrey Smith, Nancy Clements. Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Review of Economic Studies. 1997;64(4):487-535.

James J. Heckman, Hidehiko Ichimura, Petra Todd. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Review of Economic Studies. 1997;64:605-654.

James J. Heckman, Hidehiko Ichimura, Jeffrey Smith, Petra Todd. Characterizing selection bias using experimental data. Econometrica. 1998;66:1017-1098.

Heywood, John S., Parent, Daniel, 2009. Performance Pay and the White-Black Wage Gap. mimeo, McGill University

Kiesuke Hirano, Guido W. Imbens, Geert Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71:1161-1189.

Paul W. Holland. Statistics and causal inference. Journal of the American Statistical Association. 1986;81(396):945-960.

Hoffman, Florian, 2009. An Empirical Model of Life-Cycle Earnings and Mobility Dynamics. University of Toronto, Department of Economics. mimeo

William Horrace, Ronald L. Oaxaca. Inter-industry wage differentials and the gender wage gap: an identification problem. Industrial and Labor Relations Review. 2001;54:611-618.

Guido W. Imbens, Joshua Angrist. Identification and estimation of local average treatment effects. Econometrica. 1994;62:467-476.

Guido W. Imbens, Whitney K. Newey. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica. 2009;77(5):1481-1512.

Jann, Ben, 2005. Standard errors for the Blinder-Oaxaca decomposition. German Stata Users’ Group Meetings 2005. Available from http://repec.org/dsug2005/oaxaca_se_handout.pdf

Ben Jann. The Oaxaca-Blinder decomposition for linear regression models. Stata Journal. 2008;8:435-479.

Frank Lancaster Jones. On decomposing the wage gap: a critical comment on Blinder’s method. Journal of Human Resources. 1983;18:126-130.

William Johnson, Yuichi Kitamura, Derek Neal. Evaluating a simple method for estimating black-white gaps in median wages. American Economic Review. 2000;90:339-343.

D.W. Jorgenson, Z. Griliches. The explanation of productivity change. Review of Economic Studies. 1967;34:249-283.

Chinhui Juhn, Kevin M. Murphy, Brooks Pierce. Accounting for the slowdown in black-white wage convergence. In: M.H. Kosters, editor. Workers and Their Wages: Changing Patterns in the United States. Washington: American Enterprise Institute, 1991.

Chinhui Juhn, Kevin M. Murphy, Brooks Pierce. Wage inequality and the rise in returns to skill. Journal of Political Economy. 1993;101:410-442.

Michael P. Keane, Kenneth I. Wolpin. The career decisions of young men. Journal of Political Economy. 1997;105:473-522.

Michael P. Keane, Petra E. Todd, Kenneth I. Wolpin. The structural estimation of behavioral models: discrete choice dynamic programming methods and applications. In: O. Ashenfelter, D. Card, editors. Handbook of Labor Economics, vol. 4A. Amsterdam: Elsevier Science; 2011:331-461.

John W. Kendrick. Productivity Trends in the United States. Princeton: Princeton University Press; 1961.

Peter Kennedy. Interpreting dummy variables. Review of Economics and Statistics. 1986;68:174-175.

Kline, Pat, 2009. Blinder-Oaxaca as a Reweighting Estimator. UC Berkeley mimeo

Roger Koenker, G. Bassett. Regression quantiles. Econometrica. 1978;46:33-50.

Alan B. Krueger, Lawrence H. Summers. Efficiency wages and the inter-industry wage structure. Econometrica. 1988;56(2):259-293.

John M. Krieg, Paul Storer. How much do students matter? applying the Oaxaca decomposition to explain determinants of adequate yearly progress. Contemporary Economic Policy. 2006;24:563-581.

Thomas Lemieux. Estimating the effects of unions on wage inequality in a panel data model with comparative advantage and non-random selection. Journal of Labor Economics. 1998;16:261-291.

Thomas Lemieux. Decomposing changes in wage distributions: a unified approach. The Canadian Journal of Economics. 2002;35:646-688.

Thomas Lemieux. Post-secondary education and increasing wage inequality. American Economic Review. 2006;96:195-199.

Thomas Lemieux. Increasing residual wage inequality: composition effects, noisy data, or rising demand for skill? American Economic Review. 2006;96:461-498.

H.Gregg Lewis. Unionism and Relative Wages in the United States. Chicago: University of Chicago Press; 1963.

H.Gregg Lewis. Union Relative Wage Effects: A Survey. Chicago: University of Chicago Press; 1986.

David Neumark. Employers’ discriminatory behavior and the estimation of wage discrimination. Journal of Human Resources. 1988;23:279-295.

José F. Machado, José Mata. Counterfactual decomposition of changes in wage distributions using quantile regression. Journal of Applied Econometrics. 2005;20:445-465.

Machado, Cecilia, 2009. Selection, Heterogeneity and the Gender Wage Gap. Columbia University, Economics Department. mimeo

Charles F. Manski. Nonparametric bounds on treatment effects. American Economic Review. 1990;80(2):319-323.

Rosa L. Matzkin. Nonparametric estimation of nonadditive random functions. Econometrica. 2003;71(5):1339-1375.

P.J. McEwan, J.H. Marshall. Why does academic achievement vary across countries? Evidence from Cuba and Mexico. Education Economics. 2004;12:205-217.

Melly, Blaise, 2006. Estimation of counterfactual distributions using quantile regression. University of St. Gallen, Discussion Paper

Blaise Melly. Decomposition of differences in distribution using quantile regression. Labour Economics. 2005;12:577-590.

Casey B. Mulligan, Yona Rubinstein. Selection, investment, and women’s relative wages over time. Quarterly Journal of Economics. 2008;123:1061-1110.

Derek A. Neal, W. Johnson. The role of premarket factors in black-white wage differences. Journal of Political Economy. 1996;104:869-895.

Derek A. Neal. The measured black-white wage gap among women is too small. Journal of Political Economy. 2004;112:S1-S28.

Hugo Ñopo. Matching as a tool to decompose wage gaps. Review of Economics and Statistics. 2008;90:290-299.

Ronald Oaxaca. Male-female wage differentials in urban labor markets. International Economic Review. 1973;14:693-709.

Ronald L. Oaxaca, Michael R. Ransom. On discrimination and the decomposition of wage differentials. Journal of Econometrics. 1994;61:5-21.

Ronald L. Oaxaca, Michael R. Ransom. Calculation of approximate variances for wage decomposition differentials. Journal of Economic and Social Measurement. 1998;24:55-61.

Ronald L. Oaxaca, Michael R. Ransom. Identification in detailed wage decompositions. Review of Economics and Statistics. 1999;81:154-157.

Ronald L. Oaxaca. The challenge of measuring labor market discrimination against women. Swedish Economic Policy Review. 2007;14:199-231.

Claudia Olivetti, Barbara Petrongolo. Unequal pay or unequal employment? a cross-country analysis of gender gaps. Journal of Labor Economics. 2008;26:621-654.

June O’Neill, Dave O’Neill. What do wage differentials tell us about labor market discrimination? In: Solomon Polachek, Carmel Chiswick, Hillel Rapoport, editors. The Economics of Immigration and Social Policy. Research in Labor Economics. 2006;24:293-357.

Cornelia W. Reimers. Labor market discrimination against hispanic and black men. Review of Economics and Statistics. 1983;65:570-579.

James Robins, Andrea Rotnitzky, Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846-866.

Paul R. Rosenbaum, Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.

Paul R. Rosenbaum, Donald B. Rubin. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association. 1984;79:516-524.

Christoph Rothe. Nonparametric estimation of distributional policy effects. Journal of Econometrics. 2009;155:56-70.

Anthony F. Shorrocks. The class of additively decomposable inequality measures. Econometrica. 1980;48:613-625.

Anthony F. Shorrocks. Inequality decomposition by population subgroups. Econometrica. 1984;52:1369-1385.

Anthony F. Shorrocks. Decomposition Procedures for Distributional Analysis: A Unified Framework Based on the Shapley Value. University of Essex, Department of Economics; 1999. mimeo.

Robert Solow. Technical change and the aggregate production function. Review of Economics and Statistics. 1957;39:312-320.

Ritae Sohn. The Gender Math Gap: Is It Growing? SUNY Albany; 2008. mimeo.

Frank Vella. Estimating models with sample selection bias: a survey. Journal of Human Resources. 1998;33:127-169.

Simon D. Woodcock. Wage differentials in the presence of unobserved worker, firm, and match heterogeneity. Labour Economics. 2008;15:772-794.

Myeong-Su Yun. A simple solution to the identification problem in detailed wage decomposition. Economic Inquiry. 2005;43:766-772. Erratum: Economic Inquiry. 2006;44:198.

Myeong-Su Yun. Identification problem and detailed Oaxaca decomposition: a general solution and inference. Journal of Economic and Social Measurement. 2008;33:27-38.

1 See also Kendrick (1961), Denison (1962), and Jorgenson and Griliches (1967).

2 We limit our discussion to so-called “regression-based” decomposition methods, where the decomposition focuses on explanatory factors, rather than decomposition methods that apply to additively decomposable indices, where the decomposition pertains to population sub-groups. Bourguignon and Ferreira (2005) and Bourguignon et al. (2008) are recent surveys discussing these methods.

3 The decomposition can also be written by exchanging the reference group used for the wage structure and composition effects as follows:


image


Alternatively, the so-called three-fold decomposition uses the same reference group for both effects, but introduces a third interaction term: image. While these various versions of the basic decomposition are used in the literature, using one or the other does not involve any specific estimation issues. For the sake of simplicity, we thus focus on the one decomposition introduced in the text for most of the chapter.

4 Firpo (2010) shows that for any smooth functional of the reweighted cdf, efficiency is achieved. In other words, decomposing standard distributional statistics such as the variance, the Gini coefficient, or the interquartile range using the reweighting method suggested by DiNardo et al. (1996) will be efficient. Note, however, that this result does not apply to the (more complicated) case of the density considered by DiNardo et al. (1996) where non-parametric estimation is involved.

5 One possible explanation for the lack of discussion of identification assumptions is that they were reasonably obvious in the case of the original OB decompositions for the mean. The situation is quite a bit more complex, however, in the case of distributional statistics other than the mean. Note also that some recent papers have started addressing these identification issues in more detail. See, for instance, Firpo et al. (2007), and Chernozhukov et al. (2009).

6 Alternatively, the overlapping issue can be bypassed by excluding Hispanics from the Black and White groups.

7 Many papers (DiNardo et al., 1996; Machado and Mata, 2005; Chernozhukov et al., 2009) have proposed methodologies to estimate and decompose entire distributions (or densities) of wages, but the decomposition results are ultimately quantified through the use of distributional statistics. Analyses of the entire distribution look at several of these distributional statistics simultaneously.

8 When we construct the counterfactual image, we choose image to be the reference group and image the group whose wages are “adjusted”. Thus counterfactual women’s wages if they were paid like men would be image, although the gender gap example is more difficult to conceive of in the treatment effects framework.

9 Chernozhukov et al. (2009) discuss the conditions under which the two types of decomposition are equivalent.

10 To see more explicitly how the conditional distribution image depends on the distribution of image, note that we can write image under the assumption that image is monotonic in image (see Assumption 9 introduced below).

11 See, for instance, Rosenbaum and Rubin (1983, 1984), Heckman et al. (1997a,b) and Heckman et al. (1998).

12 Differences in the distribution of the image are fairly constrained under the ignorability assumption. While the unconditional distribution of image may differ between group image and image (because of differences in the distribution of image), the conditional distribution of image has to be the same for the two groups.

13 This monotonicity assumption can also be found in the works of Matzkin (2003), Altonji and Matzkin (2005), Imbens and Newey (2009), and Athey and Imbens (2006).

14 The rank pairing of two outcome variables image and image will be disrupted if the rank of image remains the same because of a mass point corresponding to the minimum wage, while the rank of image continues to increase in the absence of a minimum wage at that rank. Heckman et al. (1997a,b) consider the case of mass points at zero, but the case of multiple mass points is much more difficult.

15 Note that it is possible to relax the homoskedasticity assumption while maintaining the assumption of a single price of unobservables image, as in Chay and Lee (2000). We do not follow this approach here to simplify the presentation.

16 Note that we depart somewhat from our previous notation, as image retains some components of the structural form of group B, which will disappear in image below.

17 See Blau and Kahn (1992, 2003) for an application of the methodology to the study of gender wage differentials across countries.

18 Only image and image are observed.

19 We note that this last decomposition corresponds, in the OB context, to the so-called three-fold decomposition presented in footnote 3.

20 The union/non-union wage gap or the private/public sector wage gap is more amenable to an interpretation in terms of choice.

21 Note that some analyses (e.g. Neal and Johnson, 1996) take great care to focus on pre-market variables.

22 The empirical applications of the OB procedure in this chapter use Jann’s (2008) procedures in Stata.

23 As is common in the gender pay gap literature, we begin with the counterfactual that uses group image (males) as the reference group. In column (3) of Table 3, we present the decomposition that corresponds to Eq. (15), that is, uses group image (females) as the reference group.

24 In particular, see the discussion of the case of scalable or categorical variables below.

25 This interpretation issue also arises in other applications that use categorical variables, notably the inter-industry wage differentials literature. In this literature, following the seminal Krueger and Summers (1988) paper on inter-industry wage differentials, the standard practice is to express industry differentials as deviations from an employment-share weighted mean, a well-defined average.

26 In the first regression, the composition effect is given by image, and in the second regression, image because image, image.

27 Actually, problems arise when there are more than two categories. Blinder (1973, footnote 13) and Oaxaca (2007) correctly point out that in the case of a binary dummy variable, these problems do not occur.

28 This problem is different from a “true” identification problem which arises when multiple values of a parameter of interest are consistent with a given model and population.

29 As pointed out by Gardeazabal and Ugidos (2004), such restrictions can have some disturbing implications. In the case of educational categories, it rules out an outcome where group image members would earn higher returns than group image members for all levels of education.

30 In the gender wage gap literature, when the reference wage structure is the male wage structure (group image) the means among women image will be used in Eq. (22).

31 It is indeed easy to see that image image.

32 The image for the omitted category is simply the first and last components of Eq. (22), since image for that category.

33 image and image are the matrices of covariates (of dimension image and image) for groups image and image, respectively.

34 This “pooled” decomposition is easily implemented using the option “pooled” in Jann’s (2008) “oaxaca” procedure in Stata 9.2.

35 When considering covariates image, we use the subscript image to denote the group whose characteristics are “adjusted” with reweighting.

36 We show in Section 4 that the reweighting factor image is defined as the ratio of the marginal distributions of image for groups image and image, image. As a result, the reweighted distribution of image for group image should be the same as the original distribution of image in group image. This implies that the mean value of image in the reweighted sample, image, should be the same as the mean value of image for group image, image.
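
To fix ideas, here is a minimal sketch, in Python with purely illustrative data and variable names, of this reweighting check: the group indicator is modeled with a logit, the reweighting factor is formed from the fitted odds, and the reweighted covariate mean in the adjusted group is compared with the mean in the reference group.

    # Minimal sketch of the reweighting check described above; all names and data
    # are illustrative (d = 1 for group A, d = 0 for group B).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 5000
    d = rng.integers(0, 2, n)                                 # group indicator
    X = rng.normal(loc=0.5 * d, scale=1.0, size=(n, 1))       # covariate differs by group

    p_x = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]  # Pr(group A | X)
    p = d.mean()                                              # unconditional Pr(group A)

    psi = (p_x / (1 - p_x)) * ((1 - p) / p)                   # reweighting factor
    w_B = psi[d == 0]                                         # weights applied to group B

    # The reweighted mean of X in group B should be close to the group A mean.
    print(np.average(X[d == 0, 0], weights=w_B), X[d == 1, 0].mean())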

37 When the conditional expectation is non-linear, the OLS estimate of image can be interpreted as the one which minimizes the square of the specification error image over the distribution of image. Since the expected value of the OLS estimate of image depends on the distribution of image, differences in image over two samples may either reflect true underlying differences in the conditional expectation (i.e. in the wage structure), or “spurious” differences linked to the fact that the distribution of image is different in the two samples. For example, if image is convex in image, the expected value of image will tend to grow as the distribution of image shifts up, since the relationship between image and image gets steeper as image becomes larger.
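
The point can be illustrated with a small simulation, sketched here in Python with an artificial convex conditional expectation (the outcome equal to the square of the covariate plus noise): the fitted OLS slope rises when the distribution of the covariate shifts up, even though the conditional expectation itself is unchanged.

    # Illustrative simulation only: convex conditional expectation y = x^2 + noise,
    # OLS slope of y on x computed for two locations of the x distribution.
    import numpy as np

    rng = np.random.default_rng(0)

    def ols_slope(shift, n=100_000):
        x = rng.normal(loc=shift, size=n)
        y = x ** 2 + rng.normal(size=n)
        return np.cov(x, y)[0, 1] / np.var(x)   # OLS slope of y on x

    print(ols_slope(0.0), ols_slope(1.0))        # the second slope is larger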

38 This corresponds to an experimental setting where, for example, regression analysis was used to assess the impact of various soils and fertilizers (image) on agricultural yields image.

39 See, for instance, Bourguignon (1979), Cowell (1980), and Shorrocks (1980, 1984).

40 See for example, Theorem B.4 in Greene (2003).

41 Estimating these simple models of the conditional cross-sectional variance is a special case of the large time-series literature on the estimation of auto-regressive conditional heteroskedasticity models (ARCH, GARCH, etc.).

42 See Albrecht et al. (2003), who look at whether there is a glass ceiling in female earnings, and Bitler et al. (2006), who study the distributional effects of work incentive programs on labor supply.

43 Juhn et al. (1993) actually consider multiple time periods and propose an additional counterfactual where the returns to observables are set to their mean across time periods, a more complex counterfactual treatment.

44 See also Lemieux (2002).

45 For each random draw image, MM also draw a vector of covariates image from the observed data and perform the prediction for this value only. Melly (2005) discusses more efficient ways of computing distributions using this conditional quantile regression approach.
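
A minimal sketch of this simulation step, in Python, using the quantile regression routine in statsmodels as a stand-in; the data, variable names, and number of draws are purely illustrative.

    # Illustrative Machado-Mata style simulation step: for each random draw of a
    # quantile tau, fit a quantile regression at tau and predict at one covariate
    # vector drawn from the observed data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1000
    X = sm.add_constant(rng.normal(size=(n, 1)))
    y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * (1 + 0.3 * X[:, 1] ** 2)

    draws = []
    for _ in range(100):                          # number of simulation draws
        tau = rng.uniform(0.05, 0.95)             # random quantile (kept away from extremes here)
        beta_tau = sm.QuantReg(y, X).fit(q=tau).params
        x_i = X[rng.integers(n)]                  # one covariate vector drawn from the data
        draws.append(x_i @ beta_tau)              # predicted wage for this (tau, x) pair

    print(np.percentile(draws, [10, 50, 90]))     # quantiles of the simulated sample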

46 The estimates were computed with Melly’s implementation “rqdeco” in Stata.

47 See Melly (2005) for a detailed description of this alternative procedure. Gosling et al. (2000) and Autor et al. (2005) also use a similar idea in their empirical applications to changes in the distribution of wages over time.

48 Machado and Mata (2005) (pages 449-450) suggest computing the detailed decomposition for the composition effect using an unconditional reweighting procedure. This is invalid as a way of performing the decomposition for the same reason that an OB decomposition would be invalid if the image coefficient used for one covariate was estimated without controlling for the other covariates. We propose a conditional reweighting procedure in the next section that deals adequately with this issue.

49 This view of course makes more sense when some policy or other change has taken place over time (see Biewen, 2001).

50 On the other hand, by analogy with the treatment effects literature, Firpo et al. (2007) use time period 0 as the reference group.

51 The estimator suggested by Hirano et al. (2003) is a series estimator applied to the case of a logit model. The idea is to add increasingly higher order polynomial terms in the covariates as the size of the sample increases. Importantly, they also show that this approach yields an efficient estimate of the treatment effect.
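
The idea can be sketched as follows in Python with illustrative data: a logit for the group indicator is fit on polynomial terms in the covariates, where in the series approach the polynomial order would grow slowly with the sample size. The fixed degree used here is illustrative only and is not the Hirano et al. (2003) estimator itself.

    # Illustrative only: flexible logit with polynomial terms in the covariates.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    n = 4000
    X = rng.normal(size=(n, 2))
    d = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] ** 2))))

    degree = 3                                    # would increase with n in the series approach
    X_poly = PolynomialFeatures(degree, include_bias=False).fit_transform(X)
    p_x = LogisticRegression(max_iter=1000).fit(X_poly, d).predict_proba(X_poly)[:, 1]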

52 The two most popular kernel functions are the Gaussian and the Epanechnikov kernel.
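
For reference, the two kernel functions can be written as follows (a minimal Python sketch):

    import numpy as np

    def gaussian_kernel(u):
        return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

    def epanechnikov_kernel(u):
        return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)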

53 By contrast, in the original DiNardo et al. (1996) decomposition, workers in 1988 (time period 1) were reweighted to look like workers in 1979 (time period 0). The counterfactual distribution answered the question of what the distribution of wages would have looked like if workers’ characteristics had remained at their 1979 levels.

54 In small samples, it is important to ensure that these estimated weights sum up to the number of actual observations in the sample, though this is done automatically in packages like Stata. See Busso et al. (2009) for more detail.

55 The analytical standard errors have to take account of the fact that the logit or probit model used to construct the reweighting factor is estimated. Firpo et al. (2007) show how to perform this adjustment. In practice, however, it is generally simpler to bootstrap the whole estimation procedure (both the estimation of the logit/probit to construct the weights and the computation of the various elements of the decomposition).
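
A minimal sketch of such a bootstrap, in Python, where decompose(y, X, d) stands for a hypothetical routine that re-estimates the logit/probit weights and returns the decomposition component of interest:

    # Illustrative bootstrap of the whole procedure (weight estimation included).
    import numpy as np

    def bootstrap_se(y, X, d, decompose, n_boot=200, seed=0):
        rng = np.random.default_rng(seed)
        n = len(y)
        stats = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)                      # resample with replacement
            stats.append(decompose(y[idx], X[idx], d[idx]))  # re-run logit + decomposition
        return np.std(stats, ddof=1)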

56 In principle, other popular methods in the program evaluation literature such as matching could be used instead of reweighting.

57 Foresi and Peracchi (1995) proposed to use a sequence of logit models to estimate the conditional distribution of excess returns.

58 Donald et al. (2000) use a more general specification of the proportional hazard model where image and image are allowed to vary for different values (segments) of image.

59 The estimation was performed using Melly’s “counterfactual” Stata procedure. The computation of the variance and Gini coefficient were based on the estimation of 100 centiles.
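
A minimal sketch of this last computation, in Python, treating the 100 estimated centiles as an equally weighted sample that approximates the counterfactual distribution (data and names are illustrative):

    # Variance and Gini coefficient computed from a grid of centiles.
    import numpy as np

    def stats_from_centiles(centiles):
        q = np.sort(np.asarray(centiles, dtype=float))
        variance = q.var()
        n = len(q)
        gini = 2 * np.sum(np.arange(1, n + 1) * q) / (n * q.sum()) - (n + 1) / n
        return variance, gini

    # Example: centiles of a lognormal "wage" distribution
    w = np.exp(np.random.default_rng(0).normal(size=100_000))
    print(stats_from_centiles(np.percentile(w, np.arange(0.5, 100, 1))))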

60 Chernozhukov et al. (2009) use the method of Chernozhukov et al. (2010) to ensure that the function is monotonic.

61 Firpo et al. (2009) also propose other more flexible estimation procedures.

62 Using a reweighted regression approach can be particularly important in the case of image-regressions, which are unlikely to be linear for distributional statistics other than the mean.

63 The reweighting error reflects the fact that the composition effect in the reweighted-regression decomposition, image, is not exactly equal to the standard composition effect image when the reweighted mean image is not exactly equal to image.

64 Note that in DFL, it is the opposite; group image is the 1988 time period and group image is the 1979 time period.

65 See, for example, Butcher and DiNardo (2002) and Altonji et al. (2008).

66 Both Butcher and DiNardo (2002) and Altonji et al. (2008) consider cases where there is indeed a good reason for following a particular order in the decomposition. For instance, Altonji et al. (2008) argue that, when looking at various youth outcomes, one should first control for predetermined factors like gender and race before controlling for other factors determined later in life (AFQT score, educational achievement, etc.). In such a situation, the decomposition is econometrically interpretable even if gender and race are introduced first without controlling for the other factors.

67 As is well known, selection models can be identified on the basis of functional restrictions even when an excluded instrumental variable is not available. This is no longer viewed, however, as a credible identification strategy. We, therefore, only focus on the case where an instrumental variable is available.

68 See for instance, the survey of Lewis (1986) who concludes that these methods yield unreliable estimates of the union wage gap. Given these negative results and the lack of credible instruments for unionization, not much progress has been made in this literature over the last two decades. One exception is DiNardo and Lee (2004) who use a regression discontinuity design.

69 Note, however, that Hoffman (2009) finds that skill endowments have a sizably smaller impact in a richer model that incorporates comparative advantage (across occupations), search frictions, and exogenous job displacement.
