Handbook of Labor Economics, Vol. 4, No. Suppl PA, 2011

ISSN: 1573-4463

doi: 10.1016/S0169-7218(11)00407-2

Chapter 1 Decomposition Methods in Economics

Nicole Fortin*, Thomas Lemieux**, Sergio Firpo***


* UBC and CIFAR

** UBC and NBER

*** EESP-FGV and IZA

Abstract

This chapter provides a comprehensive overview of decomposition methods that have been developed since the seminal work of Oaxaca and Blinder in the early 1970s. These methods are used to decompose the difference in a distributional statistic between two groups, or its change over time, into various explanatory factors. While the original work of Oaxaca and Blinder considered the case of the mean, our main focus is on other distributional statistics besides the mean, such as quantiles, the Gini coefficient or the variance. We discuss the assumptions required for identifying the different elements of the decomposition, as well as various estimation methods proposed in the literature. We also illustrate how these methods work in practice by discussing existing applications and working through a set of empirical examples throughout the paper.

JEL classification

• J31 • J71 • C14 • C21

Keywords

• Decomposition • Counterfactual distribution • Inequality • Wage structure • Wage differentials • Discrimination

1 Introduction

What are the most important explanations accounting for pay differences between men and women? To what extent has wage inequality increased in the United States between 1980 and 2010 because of increasing returns to skill? Which factors are behind most of the growth in US GDP over the last 100 years? These important questions all share a common feature. They are typically answered using decomposition methods. The growth accounting approach pioneered by Solow (1957) and others is an early example of a decomposition approach aimed at quantifying the contribution of labor, capital, and unexplained factors (productivity) to US growth.1 But it is in labor economics, starting with the seminal papers of Oaxaca (1973) and Blinder (1973), that decomposition methods have been used the most extensively. These two papers are among the most heavily cited in labor economics, and the Oaxaca-Blinder (OB) decomposition is now a standard tool in the toolkit of applied economists. A large number of methodological papers aimed at refining the OB decomposition, and expanding it to the case of distributional parameters besides the mean, have also been written over the past three decades.

The twin goals of this chapter are to provide a comprehensive overview of decomposition methods that have been developed since the seminal work of Oaxaca and Blinder, and to suggest a list of best practices for researchers interested in applying these methods.2 We also illustrate how these methods work in practice by discussing existing applications and working through a set of empirical examples throughout the chapter.

At the outset, it is important to note a number of limitations to decomposition methods that are beyond the scope of this chapter. As the above examples show, the goals of decomposition methods are often quite ambitious, which means that strong assumptions typically underlie these types of exercises. In particular, decomposition methods inherently follow a partial equilibrium approach. Take, for instance, the question “what would happen to average wages in the absence of unions?” As H. Gregg Lewis pointed out a long time ago (Lewis, 1963, 1986), there are many reasons to believe that eliminating unions would change not only the wages of union workers, but also those of non-union workers. In this setting, the observed wage structure in the non-union sector would not represent a proper counterfactual for the wages observed in the absence of unions. We discuss these general equilibrium considerations in more detail towards the end of the paper, but generally follow the standard partial equilibrium approach where observed outcomes for one group (or region/time period) can be used to construct various counterfactual scenarios for the other group.

A second important limitation is that while decompositions are useful for quantifying the contribution of various factors to a difference or change in outcomes in an accounting sense, they may not necessarily deepen our understanding of the mechanisms underlying the relationship between factors and outcomes. In that sense, decomposition methods, just like program evaluation methods, do not seek to recover behavioral relationships or “deep” structural parameters. By indicating which factors are quantitatively important and which are not, however, decompositions provide useful indications of particular hypotheses or explanations to be explored in more detail. For example, if a decomposition indicates that differences in occupational affiliation account for a large fraction of the gender wage gap, this suggests exploring in more detail how men and women choose their fields of study and occupations.

Another common use of decompositions is to provide some “bottom line” numbers showing the quantitative importance of particular empirical estimates obtained in a study. For example, while study after study shows large and statistically significant returns to education, formal decompositions indicate that only a small fraction of US growth, or cross-country differences, in GDP per capita can be accounted for by changes or differences in educational achievement.

Main themes and road map to the chapter

The original method proposed by Oaxaca and Blinder for decomposing changes or differences in the mean of an outcome variable has been considerably improved and expanded upon over the years. Arguably, the most important development has been to extend decomposition methods to distributional parameters other than the mean. For instance, Freeman (1980, 1984) went beyond a simple decomposition of the difference in mean wages between the union and non-union sector to look at the difference in the variance of wages between the two sectors.

But it is the dramatic increase in wage inequality observed in the United States and several other countries since the late 1970s that has been the main driving force behind the development of a new set of decomposition methods. In particular, the new methods introduced by Juhn et al. (1993) and DiNardo et al. (1996) were directly motivated by an attempt at better understanding the underlying factors behind inequality growth. Going beyond the mean introduces a number of important econometric challenges and is still an active area of research. As a result, we spend a significant portion of the chapter on these issues.

A second important development has been to use various tools from the program evaluation literature to (i) clarify the assumptions underneath popular decomposition methods, (ii) propose estimators for some of the elements of the decomposition, and (iii) obtain formal results on the statistical properties of the various decomposition terms. As we explain below, the key connection with the treatment effects literature is that the “unexplained” component of a Oaxaca decomposition can be interpreted as a treatment effect. Note that, despite the interesting parallel with the program evaluation literature, we explain in the paper that we cannot generally give a “causal” interpretation to the decomposition results.

The chapter also covers a number of other practical issues that often arise when working with decomposition methods. Those include the well known omitted group problem (Oaxaca and Ransom, 1999), and how to deal with cases where we suspect the true regression equation not to be linear.

Before getting into the details of the chapter, we provide here an overview of our main contributions by relating them to the original OB decomposition for the difference in mean outcomes for two groups $A$ and $B$. The standard assumption used in these decompositions is that the outcome variable $Y$ is linearly related to the covariates, $X$, and that the error term $\upsilon$ is conditionally independent of $X$:


$$Y_{gi} = \beta_{g0} + \sum_{k=1}^{K} X_{ik}\beta_{gk} + \upsilon_{gi}, \quad g = A, B, \qquad (1)$$


where $E(\upsilon_{gi} \mid X_i) = 0$, and $X$ is the vector of covariates ($X_i = [X_{i1}, \ldots, X_{iK}]$). As is well known, the overall difference in average outcomes between group $B$ and $A$,


$$\Delta_O^{\mu} = \bar{Y}_B - \bar{Y}_A$$


can be written as:3


$$\Delta_O^{\mu} = \underbrace{(\hat\beta_{B0} - \hat\beta_{A0}) + \sum_{k=1}^{K} \bar{X}_{Bk}\,(\hat\beta_{Bk} - \hat\beta_{Ak})}_{\Delta_S^{\mu}\ \text{(unexplained)}} + \underbrace{\sum_{k=1}^{K} (\bar{X}_{Bk} - \bar{X}_{Ak})\,\hat\beta_{Ak}}_{\Delta_X^{\mu}\ \text{(explained)}}$$


where $\hat\beta_{g0}$ and $\hat\beta_{gk}$ ($k = 1, \ldots, K$) are the estimated intercept and slope coefficients, respectively, of the regression models for groups $g = A, B$. The first term in the equation is what is usually called the “unexplained” effect in Oaxaca decompositions. Since we mostly focus on wage decompositions in this chapter, we typically refer to this first element as the “wage structure” effect ($\Delta_S^{\mu}$). The second component, $\Delta_X^{\mu}$, is a composition effect, which is also called the “explained” effect (by differences in covariates) in OB decompositions.

In the above decomposition, it is straightforward to compute both the overall composition and wage structure effects, and the contribution of each covariate to these two effects. Following the existing literature on decompositions, we refer to the overall decomposition (separating $\Delta_O^{\mu}$ into its two components $\Delta_S^{\mu}$ and $\Delta_X^{\mu}$) as an aggregate decomposition. The detailed decomposition involves subdividing both $\Delta_S^{\mu}$, the wage structure effect, and $\Delta_X^{\mu}$, the composition effect, into the respective contributions of each covariate, $\Delta_{S,k}^{\mu}$ and $\Delta_{X,k}^{\mu}$, for $k = 1, \ldots, K$.
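
The aggregate and detailed OB decomposition described above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: the function names (`ols`, `oaxaca_blinder`) and the simulated data are assumptions made here for the example, and group A coefficients are used as the reference wage structure, as in the decomposition above.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

def oaxaca_blinder(X_a, y_a, X_b, y_b):
    """Decompose mean(y_b) - mean(y_a), using group A slopes as the reference."""
    b_a, b_b = ols(X_a, y_a), ols(X_b, y_b)
    xbar_a, xbar_b = X_a.mean(axis=0), X_b.mean(axis=0)
    # Wage structure ("unexplained"): intercept gap plus slope gaps at group B means.
    structure_k = xbar_b * (b_b[1:] - b_a[1:])
    structure = (b_b[0] - b_a[0]) + structure_k.sum()
    # Composition ("explained"): gaps in covariate means valued at group A slopes.
    composition_k = (xbar_b - xbar_a) * b_a[1:]
    return {
        "overall": y_b.mean() - y_a.mean(),
        "structure": structure,
        "composition": composition_k.sum(),
        "structure_k": structure_k,      # detailed wage structure terms
        "composition_k": composition_k,  # detailed composition terms
    }

# Simulated data (hypothetical): two covariates with different means and returns.
rng = np.random.default_rng(0)
n = 2000
X_a = rng.normal([12.0, 10.0], [2.0, 5.0], size=(n, 2))
X_b = rng.normal([13.0, 12.0], [2.0, 5.0], size=(n, 2))
y_a = 1.0 + X_a @ np.array([0.08, 0.02]) + rng.normal(0, 0.3, n)
y_b = 1.1 + X_b @ np.array([0.10, 0.02]) + rng.normal(0, 0.3, n)

res = oaxaca_blinder(X_a, y_a, X_b, y_b)
print(res)
```

Because each regression includes an intercept, the fitted line passes through the group means, so the wage structure and composition terms add up to the raw mean gap by construction.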

The chapter is organized around the following “take away” messages:

A. The wage structure effect can be interpreted as a treatment effect

This point is easily seen in the case where group $B$ consists of union workers, and group $A$ consists of non-union workers. The raw wage gap $\Delta_O^{\mu}$ can be decomposed as the sum of the “effect” of unions on union workers, $\Delta_S^{\mu}$, and the composition effect linked to differences in covariates between union and non-union workers, $\Delta_X^{\mu}$. We can think of the effect of unions for each worker ($Y_{Bi} - Y_{Ai}$) as the individual treatment effect, while $\Delta_S^{\mu}$ is the Average Treatment effect on the Treated ($ATT$). One difference between the program evaluation and decomposition approaches is that the composition effect $\Delta_X^{\mu}$ is a key component of interest in a decomposition, while it is a selection bias resulting from a confounding factor to be controlled for in the program evaluation literature. By construction, however, one can obtain the composition effect from the estimated treatment effect since $ATT = \Delta_S^{\mu}$ and $\Delta_X^{\mu} = \Delta_O^{\mu} - \Delta_S^{\mu}$.

Beyond semantics, there are a number of advantages associated with representing the decomposition component $\Delta_S^{\mu}$ as a treatment effect:

• The zero conditional mean assumption ($E(\upsilon \mid X) = 0$) usually invoked in OB decompositions (as above) is not required for consistently estimating the $ATT$ (or $\Delta_S^{\mu}$). The mean independence assumption can be replaced by a weaker ignorability assumption. Under ignorability, unobservables do not need to be independent (or mean independent) of $X$ as long as their conditional distribution given $X$ is the same in groups $A$ and $B$. In looser terms, this “selection based on observables” assumption allows for selection biases as long as they are the same for the two groups. For example, if unobservable ability and education are correlated, a linear regression of $Y$ on $X$ will not yield consistent estimates of the structural parameters (i.e. the return to education). But the aggregate decomposition remains valid as long as the dependence structure between ability and education is the same in group $A$ and $B$.
• A number of estimators for the $ATT$ have been proposed in the program evaluation literature including Inverse Probability Weighting ($IPW$), matching and regression methods. Under ignorability, these estimators are consistent for the $ATT$ (or $\Delta_S^{\mu}$) even if the relationship between $Y$ and $X$ is not linear. The statistical properties of these non-parametric estimators are also relatively well established. For example, Hirano et al. (2003) show that $IPW$ estimators of the $ATT$ are efficient. Firpo (2007) similarly shows that $IPW$ is efficient for estimating quantile treatment effects. Accordingly, we can use the results from the program evaluation literature to show that decomposition methods based on reweighting techniques are efficient for performing decompositions.4
• When the distribution of covariates is different across groups, the $ATT$ depends on the characteristics of group $B$ (unless there is no heterogeneity in the treatment effect, i.e. $\beta_{Bk} = \beta_{Ak}$ for all $k$). The subcomponents of $\Delta_S^{\mu}$ associated with each covariate $k$, $\bar{X}_{Bk}(\beta_{Bk} - \beta_{Ak})$, can be (loosely) interpreted as the “contribution” of the covariate $k$ to the $ATT$. This helps understand the issues linked to the well-known “omitted group problem” in OB decompositions (see, for example, Oaxaca and Ransom, 1999).

B. Going beyond the mean is a “solved” problem for the aggregate decomposition

As discussed above, estimation methods from the program evaluation literature can be directly applied for performing an aggregate decomposition of the gap $\Delta_O^{\nu}$ into its two components $\Delta_S^{\nu}$ and $\Delta_X^{\nu}$. While most of the results in the program evaluation literature have been obtained in the case of the mean (e.g., Hirano et al., 2003), they can also be extended to the case of quantiles (Firpo, 2007) or more general distribution parameters (Firpo, 2010). The $IPW$ estimator originally proposed in the decomposition literature by DiNardo et al. (1996) or matching methods can be used to perform the decomposition under the assumption of ignorability. More parametric approaches such as those proposed by Juhn et al. (1993), Donald et al. (2000) and Machado and Mata (2005) could also be used. These methods involve, however, a number of assumptions and/or computational difficulties that can be avoided when the sole goal of the exercise is to perform an aggregate decomposition. By contrast, $IPW$ methods involve no parametric assumptions and are an efficient way of estimating the aggregate decomposition.
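
To make the reweighting idea concrete, the sketch below performs an aggregate decomposition of a quantile gap with a single discrete covariate. It is a simplified illustration under assumptions made here, not the estimator of any cited paper: with discrete cells the reweighting factor can be computed directly as the ratio of cell shares, $\Pr(X = x \mid B) / \Pr(X = x \mid A)$, instead of from an estimated propensity score, and the names `reweighted_decomposition` and `weighted_quantile` as well as the simulated data are hypothetical.

```python
import numpy as np

def weighted_quantile(tau):
    """Return a function computing the tau-th quantile of y under weights w."""
    def stat(y, w):
        order = np.argsort(y)
        cdf = np.cumsum(w[order]) / w.sum()
        return y[order][np.searchsorted(cdf, tau)]
    return stat

def reweighted_decomposition(x_a, y_a, x_b, y_b, stat):
    """Split stat(y_b) - stat(y_a) into wage structure and composition effects.

    x_a and x_b hold labels of a discrete covariate; group A is reweighted so
    that its covariate distribution matches that of group B.
    """
    cells = np.union1d(x_a, x_b)
    p_a = np.array([(x_a == c).mean() for c in cells])
    p_b = np.array([(x_b == c).mean() for c in cells])
    # Common support: every cell observed in group B must also exist in group A.
    if np.any((p_b > 0) & (p_a == 0)):
        raise ValueError("some group B cells are empty in group A")
    ratio = np.divide(p_b, p_a, out=np.zeros_like(p_b), where=p_a > 0)
    w = ratio[np.searchsorted(cells, x_a)]
    nu_a = stat(y_a, np.ones_like(y_a))
    nu_b = stat(y_b, np.ones_like(y_b))
    nu_c = stat(y_a, w)  # counterfactual: group A wage structure, group B covariates
    return {"overall": nu_b - nu_a,
            "structure": nu_b - nu_c,
            "composition": nu_c - nu_a}

# Simulated data (hypothetical): same returns, a 0.2 shift in the wage structure.
rng = np.random.default_rng(1)
n = 5000
x_a = rng.choice([0, 1, 2], size=n, p=[0.5, 0.3, 0.2])
x_b = rng.choice([0, 1, 2], size=n, p=[0.2, 0.3, 0.5])
y_a = 1.0 + 0.5 * x_a + rng.normal(0, 0.3, n)
y_b = 1.2 + 0.5 * x_b + rng.normal(0, 0.3, n)

dec = reweighted_decomposition(x_a, y_a, x_b, y_b, weighted_quantile(0.5))
print(dec)
```

With continuous or many covariates, the cell-share ratio would typically be replaced by a ratio built from an estimated propensity score; that step is omitted here to keep the example self-contained.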

It may be somewhat of an overstatement to say that computing the aggregate decomposition is a “solved” problem since there is still ongoing research on the small sample properties of various treatment effect estimators (see, for example, Busso et al., 2009). Nonetheless, performing an aggregate decomposition is relatively straightforward since several easily implementable estimators with good asymptotic properties are available.

C. Going beyond the mean is more difficult for the detailed decomposition

Until recently, no comprehensive approach was available for computing a detailed decomposition of the effect of single covariates for a distributional statistic $\nu$ other than the mean. One popular approach for estimating the subcomponents of $\Delta_S^{\nu}$ is Machado and Mata (2005)’s method, which relies on quantile regressions for each possible quantile, combined with a simulation procedure. For the subcomponents of $\Delta_X^{\nu}$, DiNardo et al. (1996) suggest a reweighting procedure to compute the contribution of a dummy covariate (like union status) to the aggregate composition effect $\Delta_X^{\nu}$. Altonji et al. (2008) implemented a generalization of this approach to the case of either continuous or categorical covariates. Note, however, that these latter methods are generally path dependent, that is, the decomposition results depend on the order in which the decomposition is performed. Later in this chapter, we show how to make the contribution of the last single covariate path independent in the spirit of Gelbach (2009).

One comprehensive approach, very close in spirit to the original OB decomposition, which is path independent, uses the recentered influence function ($RIF$) regressions recently proposed by Firpo et al. (2009). The idea is to use the (recentered) influence function for the distribution statistic of interest instead of the usual outcome variable $Y$ as the left hand side variable in a regression. In the special case of the mean, the recentered influence function is $Y$, and a standard regression is estimated, as in the case of the OB decomposition.

More generally, once the $RIF$ regression has been estimated, the estimated coefficients can be used to perform the detailed decomposition in the same way as in the standard OB decomposition. The downside of this approach is that $RIF$ regression coefficients only provide a local approximation for the effect of changes in the distribution of a covariate on the distributional statistics of interest. The question of how accurate this approximation is depends on the application at hand.
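
As a rough sketch of what such a regression looks like for a quantile, the code below builds the recentered influence function $q_\tau + (\tau - 1\{y \le q_\tau\}) / f_Y(q_\tau)$ and regresses it on a covariate by OLS. The function names, the Gaussian kernel with a rule-of-thumb bandwidth, and the simulated data are assumptions made here for illustration; other density estimators could be used.

```python
import numpy as np

def rif_quantile(y, tau):
    """Recentered influence function of the tau-th quantile, evaluated at each y."""
    q = np.quantile(y, tau)
    # Gaussian kernel density estimate at q (rule-of-thumb bandwidth, an assumption).
    h = 1.06 * y.std() * len(y) ** (-1 / 5)
    f_q = np.mean(np.exp(-0.5 * ((y - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q

def rif_regression(X, y, tau):
    """OLS of the RIF on a constant and X; returns [intercept, slopes...]."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, rif_quantile(y, tau), rcond=None)
    return coef

# Simulated location-shift design (hypothetical): y moves one-for-one with 0.5 * x.
rng = np.random.default_rng(2)
n = 5000
x = rng.normal(0, 1, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)

rif = rif_quantile(y, 0.5)
coef = rif_regression(x, y, 0.5)
print(rif.mean(), np.quantile(y, 0.5), coef)
```

The sample mean of the RIF is close to the quantile itself, which is what allows the fitted coefficients to be plugged into an OB-style decomposition in place of the coefficients from a regression of $Y$ on $X$.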

D. The analogy between quantile and standard (mean) regressions is not helpful

If the mean can be decomposed using standard regressions, can we also decompose quantiles using simple quantile regressions? Unfortunately, the answer is negative. The analogy with the case of the mean just does not apply in the case of quantile regressions.

To understand this point, it is important to recall that the coefficient $\beta$ in a standard regression has two distinct interpretations. Under the conditional mean interpretation, $\beta$ indicates the effect of $X$ on the conditional mean $E(Y \mid X)$ in the model $E(Y \mid X) = X\beta$. Using the law of iterated expectations, we also have $E(Y) = E_X[E(Y \mid X)] = E(X)\beta$. This yields an unconditional mean interpretation where $\beta$ can be interpreted as the effect of increasing the mean value of $X$ on the (unconditional) mean value of $Y$. It is this particular property of regression models, and this particular interpretation of $\beta$, which is used in OB decompositions.

By contrast, only the conditional quantile interpretation is valid in the case of quantile regressions. As we discuss in more detail later, a quantile regression model for the $\tau$th conditional quantile $Q_\tau(X)$ postulates that $Q_\tau(X) = X\beta_\tau$. By analogy with the case of the mean, $\beta_\tau$ can be interpreted as the effect of $X$ on the $\tau$th conditional quantile of $Y$ given $X$. The law of iterated expectations does not apply in the case of quantiles, so $Q_\tau \neq E_X[Q_\tau(X)] = E(X)\beta_\tau$, where $Q_\tau$ is the unconditional quantile. It follows that $\beta_\tau$ cannot be interpreted as the effect of increasing the mean value of $X$ on the unconditional quantile $Q_\tau$.
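
A small worked example may help fix ideas. The two-point design below is an assumption made here purely for illustration: a binary covariate with equal shares and uniform conditional wage distributions. Averaging conditional means recovers the unconditional mean, while averaging conditional 90th percentiles does not recover the unconditional 90th percentile.

```latex
\begin{align*}
& X \in \{0,1\},\quad \Pr(X=1) = \tfrac12,\quad
  Y \mid X=0 \sim U[0,1],\quad Y \mid X=1 \sim U[0,3] \\
& \text{Mean: } E(Y \mid X) = \tfrac12 + X
  \;\Rightarrow\; E_X[E(Y \mid X)] = \tfrac12 + \tfrac12 = 1 = E(Y) \\
& \text{Quantile: } Q_\tau(X) = \tau + 2\tau X
  \;\Rightarrow\; E_X[Q_{0.9}(X)] = 0.9 + 1.8 \cdot \tfrac12 = 1.8 \\
& \text{Unconditional: } F_Y(y) = \tfrac12 + \tfrac{y}{6} \ \text{ for } 1 \le y \le 3
  \;\Rightarrow\; Q_{0.9} = 2.4 \neq 1.8
\end{align*}
```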

This greatly limits the usefulness of quantile regressions in decomposition problems. Machado and Mata (2005) suggest estimating quantile regressions for all $\tau \in [0, 1]$ as a way of characterizing the full conditional distribution of $Y$ given $X$. The estimates are then used to construct the different components of the aggregate decomposition using simulation methods. Compared to other decomposition methods, one disadvantage of this method is that it is computationally intensive.

An alternative regression approach where the estimated coefficient can be interpreted as the effect of increasing the mean value of $X$ on the unconditional quantile $Q_\tau$ (or other distributional parameters) has recently been proposed by Firpo et al. (2009). As we mention above, this method provides one of the few options available for computing a detailed decomposition for distributional parameters other than the mean.

E. Decomposing proportions is easier than decomposing quantiles

A cumulative distribution provides a one-to-one mapping between (unconditional) quantiles and the proportion of observations below this quantile. Performing a decomposition on proportions is a fairly standard problem. One can either run a linear probability model and perform a traditional OB decomposition, or do a non-linear version of the decomposition using a logit or probit model.

Decompositions of quantiles can then be obtained by inverting back proportions into quantiles. Firpo et al. (2007) propose doing so using a first order approximation where the elements of the decomposition for a proportion are transformed into elements of the decomposition for the corresponding quantile by dividing by the density (slope of the cumulative distribution function). This can be implemented in practice by estimating recentered influence function ($RIF$) regressions (see Firpo et al., 2009).
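
The first order approximation can be written out as follows. The notation is a sketch introduced here: $q_\tau$ is the unconditional quantile, $f_Y$ the density of $Y$, and $dF_Y(q_\tau)$ the change in the proportion of observations below the fixed cutoff $q_\tau$. Totally differentiating the identity that defines the quantile shows why an effect on a proportion is converted into an effect on the quantile by dividing by the density (with a sign change), and the last line gives the recentered influence function of the quantile that embeds this rescaling.

```latex
\begin{align*}
F_Y(q_\tau) &= \tau \\
dF_Y(q_\tau) + f_Y(q_\tau)\, dq_\tau &= 0
\quad\Longrightarrow\quad
dq_\tau = -\frac{dF_Y(q_\tau)}{f_Y(q_\tau)} \\
\mathrm{RIF}(y; q_\tau) &= q_\tau + \frac{\tau - 1\{y \le q_\tau\}}{f_Y(q_\tau)}
\end{align*}
```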

A related approach is to decompose proportions at every point of the distribution (e.g. at each percentile) and invert back the whole fitted relationship to quantiles. This can be implemented in practice using the distribution regression approach of Chernozhukov et al. (2009).

F. There is no general solution to the “omitted group” problem

As pointed out by Jones (1983) and Oaxaca and Ransom (1999) among others, in the case of categorical covariates, the various elements of $\Delta_S^{\mu}$ in a detailed decomposition arbitrarily depend on the choice of the omitted group in the regression model. In fact, this interpretation problem may arise for any covariate, including continuous covariates, that does not have a clearly interpretable baseline value. This problem has been called an identification problem in the literature (Oaxaca and Ransom, 1999; Yun, 2005). But as pointed out by Gelbach (2002), it is better viewed as a conceptual problem with the detailed part of the decomposition for the wage structure effect.

As discussed above, the effect $\beta_{B0} - \beta_{A0}$ for the omitted group can be interpreted as an average treatment effect among the omitted group (group for which $X_k = 0$ for all $k = 1, \ldots, K$). The decomposition then corresponds to a number of counterfactual experiments asking “by how much the treatment effect would change if $X_k$ was switched from its value in the omitted group ($0$) to its average value ($\bar{X}_{Bk}$)”? In cases like the gender wage gap where the treatment effect analogy is not as clear, the same logic applies nonetheless. For example, one could ask instead “by how much the average gender gap would change if actual experience ($X_k$) was switched from its value in the omitted group ($0$) to its average value ($\bar{X}_{Bk}$)?”

Since the choice of the omitted group is arbitrary, the elements of the detailed decomposition can be viewed as arbitrary as well. In cases where the omitted group has a particular economic meaning, the elements of the detailed decomposition are more interpretable as they correspond to interesting counterfactual exercises. In other cases the elements of the detailed decomposition are not economically interpretable. As a result, we argue that attempts at providing a general “solution” to the omitted group problem are misguided. We discuss instead the importance of using economic reasoning to propose some counterfactual exercise of interest, and suggest simple techniques to easily compute these counterfactual exercises for any distributional statistics, and not only the mean.
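
The dependence on the omitted group is easy to reproduce numerically. The sketch below is an illustration under assumptions made here (a single three-level categorical covariate, simulated wages, and the hypothetical helper `structure_terms`): it splits the OB wage structure effect into the intercept gap and the part attributed to the category dummies, for two different base categories.

```python
import numpy as np

def structure_terms(cat_a, y_a, cat_b, y_b, omitted, levels=(0, 1, 2)):
    """Return (intercept part, dummy part) of the OB wage structure effect."""
    kept = [c for c in levels if c != omitted]

    def fit(cat, y):
        # Regression on a constant and dummies for every category except the base.
        Z = np.column_stack([np.ones(len(y))] + [(cat == c).astype(float) for c in kept])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return beta

    b_a, b_b = fit(cat_a, y_a), fit(cat_b, y_b)
    share_b = np.array([(cat_b == c).mean() for c in kept])
    intercept_part = b_b[0] - b_a[0]                      # gap for the omitted group
    dummy_part = float(share_b @ (b_b[1:] - b_a[1:]))     # gaps in dummy coefficients
    return intercept_part, dummy_part

# Simulated data (hypothetical): the group gap differs across the three categories.
rng = np.random.default_rng(3)
n = 3000
cat_a = rng.choice([0, 1, 2], size=n, p=[0.4, 0.4, 0.2])
cat_b = rng.choice([0, 1, 2], size=n, p=[0.3, 0.3, 0.4])
y_a = np.array([1.0, 1.2, 1.5])[cat_a] + rng.normal(0, 0.3, n)
y_b = np.array([1.1, 1.5, 1.9])[cat_b] + rng.normal(0, 0.3, n)

i0, d0 = structure_terms(cat_a, y_a, cat_b, y_b, omitted=0)
i2, d2 = structure_terms(cat_a, y_a, cat_b, y_b, omitted=2)
print("base 0:", i0, d0, "total", i0 + d0)
print("base 2:", i2, d2, "total", i2 + d2)
```

The total wage structure effect is the same under both base categories, but its split between the intercept and the dummies changes, which is the arbitrariness discussed above.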

Organization of the chapter

The different methods covered in the chapter, along with their key assumptions and properties, are listed in Table 1. The list includes an example of one representative study for each method, focusing mainly on studies on the gender and racial gap (see also Altonji and Blank, 1999), to facilitate comparison across methods. A detailed discussion of the assumptions and properties follows in the next section. The mean decomposition methodologies comprise the classic OB decomposition, as well as extensions that appeal to complex counterfactuals and that apply to limited dependent variable models. The methodologies that go beyond the mean include the classic variance decomposition, methods based on residual imputation, methods based on conditional quantiles and on estimating the conditional distribution, and methods based on reweighting and $RIF$-regressions.

Table 1 Maintained assumptions and properties of major decomposition methodologies.


Since there are a number of econometric issues involved in decomposition exercises, we start in Section 2 by establishing what are the parameters of interest, their interpretation, and the conditions for identification in decomposition methods. We also introduce a general notation that we use throughout the chapter. Section 3 discusses exhaustively the case of decomposition of differences in means, as originally introduced by Oaxaca (1973) and Blinder (1973). This section also covers a number of ongoing issues linked to the interpretation and estimation of these decompositions. We then discuss decompositions for distributional statistics other than the mean in Sections 4 and 5. Section 4 looks at the case of the aggregate decomposition, while Section 5 focuses on the case of the detailed decomposition. Finally, we discuss a number of limitations and extensions to these standard decomposition methods in Section 6. Throughout the chapter, we illustrate the “nuts and bolts” of decomposition methods using empirical examples, and discuss important applications of these methods in the applied literature.

2 Identification: What Can We Estimate Using Decomposition Methods?

As we will see in subsequent sections, a large and growing number of procedures are available for performing decompositions of the mean or more general distributional statistics. But despite this rich literature, it is not always clear what these procedures seek to estimate, and what conditions need to be imposed to recover the underlying objects of interest. The main contribution of this section is to provide a more formal theory of decompositions where we clearly define what it is that we want to estimate using decompositions, and what are the assumptions required to identify the population parameters of interest. In the first part of the section, we discuss the case of the aggregate decomposition. Since the estimation of the aggregate decomposition is closely related to the estimation of treatment effects (see the introduction), we borrow heavily from the identification framework used in the treatment effects literature. We then move to the case of the detailed decomposition, where additional assumptions need to be introduced to identify the parameters of interest. We end the section by discussing the connection between program evaluation and decompositions, as well as the more general issue of causality in this context.

Decompositions are often viewed as simple accounting exercises based on correlations. As such, results from decomposition exercises are believed to suffer from the same shortcomings as OLS estimates, which cannot be interpreted as valid estimates of some underlying causal parameters in most circumstances. The interpretation of what decomposition results mean becomes even more complicated in the presence of general equilibrium effects.

In this section, we argue that these interpretation problems are linked in part to the lack of a formal identification theory for decompositions. In econometrics, the standard approach is to first discuss identification (what we want to estimate, and what assumptions are required to interpret these estimates as sample counterparts of parameters of interest) and then introduce estimation procedures to recover the object we want to identify. In the decomposition literature, most papers jump directly to the estimation issues (i.e. discuss procedures) without first addressing the identification problem.5

To simplify the exposition, we use the terminology of labor economics, where, in most cases, the agents are workers and the outcome of interest is wages. Decomposition methods can also be applied in a variety of other settings, such as gaps in test scores between gender (Sohn, 2008), schools (Krieg and Storer, 2006) or countries (McEwan and Marshall, 2004).

Throughout the chapter, we restrict our discussion to the case of a decomposition for two mutually exclusive groups. This rules out decomposing wage differentials between overlapping groups like Blacks, Whites, and Hispanics, who can be Black or White.6 In this setting, the dummy variable method (Cain, 1986) with interactions is a more natural way of approaching the problem. Then one can use Gelbach (2009)’s approach, which appeals to the omitted variables bias formula, to compute a detailed decomposition.

The assumption of mutually exclusive groups is not very restrictive, however, since most decomposition exercises fall into this category:

Assumption 1 (Mutually Exclusive Groups) The population of agents can be divided into two mutually exclusive groups, denoted $A$ and $B$. Thus, for an agent $i$, $D_{Ai} + D_{Bi} = 1$, where $D_{gi} = 1\{i \text{ is in } g\}$, $g = A, B$, and $1\{\cdot\}$ is the indicator function.

We are interested in comparing features of the wage distribution for two groups of workers: $A$ and $B$. We observe wage $Y_i$ for worker $i$, which can be written as $Y_i = D_{Ai} Y_{Ai} + D_{Bi} Y_{Bi}$, where $Y_{gi}$, for $g = A, B$, is the wage worker $i$ would receive in group $g$. Obviously, if worker $i$ belongs to group $A$, for example, we only observe $Y_{Ai}$.

As in the treatment effects literature, $Y_{Ai}$ and $Y_{Bi}$ can be interpreted as two potential outcomes for worker $i$. While we only observe $Y_{Ai}$ when $D_{Ai} = 1$, and $Y_{Bi}$ when $D_{Bi} = 1$, decompositions critically rely on counterfactual exercises such as “what would be the distribution of $Y_A$ for workers in group $B$?”. Since we do not observe this counterfactual wage $Y_A \mid D_B$ for these workers, some assumptions are required for estimating this counterfactual distribution.

2.1 Case 1: The aggregate decomposition

2.1.1 The overall wage gap and the structural form

Our identification results for the aggregate decomposition are very general, and hold for any distributional statistic.7 Accordingly, we focus on general distributional measures in this subsection of the chapter.

Consider the case where the distributional statistic of interest is $\nu(F_{Y_g \mid D_s})$, where $\nu: \mathcal{F}_\nu \to \mathbb{R}$ is a real-valued functional, and where $\mathcal{F}_\nu$ is a class of distribution functions such that $F_{Y_g \mid D_s} \in \mathcal{F}_\nu$ if $\lVert \nu(F_{Y_g \mid D_s}) \rVert < +\infty$, $g, s = A, B$. The distribution function $F_{Y_g \mid D_s}$ represents the distribution of the (potential) outcome $Y_g$ for workers in group $s$. $F_{Y_g \mid D_s}$ is an observed distribution when $g = s$, and a counterfactual distribution when $g \neq s$.

The overall $\nu$-difference in wages between the two groups measured in terms of the distributional statistic $\nu$ is


$$\Delta_O^{\nu} = \nu(F_{Y_B \mid D_B}) - \nu(F_{Y_A \mid D_A}) \qquad (2)$$


The more common distributional statistics used to study wage differentials are the mean and the median. The wage inequality literature has focused on the variance of log wages, the Gini and Theil coefficients, and the differentials between the 90th and 10th percentiles, the 90th and 50th percentiles, and the 50th and 10th percentiles. These latter measures provide a simple way of distinguishing what happens at the top and bottom end of the wage distribution. Which statistic $\nu$ is most appropriate depends on the problem at hand.

A typical aim of decomposition methods is to divide $\Delta_O^{\nu}$, the $\nu$-overall wage gap between the two groups, into a component attributable to differences in the observed characteristics of workers, and a component attributable to differences in wage structures. In our setting, the wage structure is what links observed characteristics, as well as some unobserved characteristics, to wages.

The decomposition of the overall difference into these two components depends on the construction of a meaningful counterfactual wage distribution. For example, counterfactual states of the world can be constructed to simulate what the distribution of wages would look like if workers had different returns to observed characteristics. We may want to ask, for instance, what would happen if group $A$ workers were paid like group $B$ workers, or if women were paid like men? When the two groups represent different time periods, we may want to know what would happen if workers in year 2000 had the same characteristics as workers in 1980, but were still paid as in 2000. A more specific counterfactual could keep the return to education at its 1980 level, but set all the other components of the wage structure at their 2000 levels.

As these examples illustrate, counterfactuals used in decompositions often consist of manipulating structural wage setting functions (i.e. the wage structure) linking the observed and unobserved characteristics of workers to their wages for each group. We formalize the role of the wage structure using the following assumption:

Assumption 2 (Structural Form) A worker $i$ belonging to either group $A$ or $B$ is paid according to the wage structure, $m_A$ and $m_B$, which are functions of the worker’s observable ($X$) and unobservable ($\varepsilon$) characteristics:


$$Y_{Ai} = m_A(X_i, \varepsilon_i) \quad \text{and} \quad Y_{Bi} = m_B(X_i, \varepsilon_i), \tag{3}$$


where $\varepsilon_i$ has a conditional distribution $F_{\varepsilon|X,D_g}$ given $X$, and $g = A, B$.

While the wage setting functions are very general at this point, the assumption implies that there are only three reasons why the wage distribution can differ between group image and image. The three potential sources of differences are (i) differences between the wage setting functions image and image, (ii) differences in the distribution of observable (image) characteristics, and (iii) differences in the distribution of unobservable (image) characteristics. The aim of the aggregate decomposition is to separate the contribution of the first factor (differences between image and image) from the two others.

When the counterfactuals are based on the alternative wage structure (i.e. using the observed wage structure of group image as a counterfactual for group image), decompositions can easily be linked to the treatment effects literature. However, other counterfactuals may be based on hypothetical states of the world that may involve general equilibrium effects. For example, we may want to ask what the distribution of wages would be if group image workers were paid according to the pay structure that would prevail if there were no image workers, for example if there were no union workers. Alternatively, we may want to ask what would happen if women were paid according to some non-discriminatory wage structure (which differs from what is observed for either men or women).

We use the following assumption to restrict the analysis to the first type of counterfactuals.

Assumption 3 ( Simple Counterfactual Treatment) A counterfactual wage structure, image, is said to correspond to a simple counterfactual treatment when it can be assumed that image for workers in group B, or image for workers in group A.

It is helpful to represent the assumption using the potential outcomes framework introduced earlier. Consider image, where image indicates the potential outcome, while image indicates group membership. For group image, the observed wage is image, while image represents the counterfactual wage. For group image, image is the observed wage while the counterfactual wage is image. Note that we add the superscript C to highlight counterfactual wages. For instance, consider the case where workers in group image are unionized, while workers in group image are not unionized. The dichotomous variable image indicates the union status of workers. For a worker image in the union sector (image), the observed wage under the “union” treatment is image, while the counterfactual wage that would prevail if the worker were not unionized is image, image. An alternative counterfactual could ask what the wage of a non-union worker image would be if this worker were unionized: image, image. We note that the choice of counterfactual is analogous to the choice of reference group in the standard OB decomposition.8

What Assumption 3 rules out is the existence of another counterfactual wage structure such as image that represents how workers would be paid if there were no unions in the labor market. Unless there are no general equilibrium effects, we would expect that image, and, thus, Assumption 3 to be violated.

2.1.2 Four decomposition terms

With this setup in mind, we can now decompose the overall difference image into the four following components of interest:

D.1 Differences associated with the return to observable characteristics under the structural image functions. For example, one may have the following counterfactual in mind: What if everything but the return to image was the same for the two groups?
D.2 Differences associated with the return to unobservable characteristics under the structural image functions. For example, one may have the following counterfactual in mind: What if everything but the return to image was the same for the two groups?
D.3 Differences in the distribution of observable characteristics. We have here the following counterfactual in mind: What if everything but the distribution of image was the same for the two groups?
D.4 Differences in the distribution of unobservable characteristics. We have the following counterfactual in mind: What if everything but the distribution of image was the same for the two groups?

Obviously, because unobservable components are involved, we can only decompose image into the four decomposition terms after imposing some assumptions on the joint distribution of observable and unobservable characteristics. Also, unless we make additional separability assumptions on the structural forms represented by the image functions, it is virtually impossible to separate out the contribution of returns to observables from that of unobservables. The same problem prevails when one tries to perform a detailed decomposition in returns, that is, provide the contribution of the return to each covariate separately.

2.1.3 Imposing identification restrictions: overlapping support

The first assumption we make to simplify the discussion is to impose a common support assumption on the observables and unobservables. Further, this assumption ensures that no single value of image or image can serve to identify membership into one of the groups.

Assumption 4 ( Overlapping Support) Let the support of all wage setting factors image be image. For all image in image, image.

Note that the overlapping support assumption rules out cases where inputs may be different across the two wage setting functions. The case of the wage gap between immigrant and native workers is an important example where the image vector may be different for two groups of workers. For instance, the wage of immigrants may depend on their country of origin and their age at arrival, two variables that are not defined for natives. Consider also the case of changes in the wage distribution over time. If group image consists of workers in 1980, and group image of workers in 2000, the difference in wages over time should take into account the fact that many occupations of 2000, especially those linked to information technologies, did not even exist in 1980. Thus, taking those differences explicitly into account could be important for understanding the evolution of the wage distribution over time.

The case with different inputs can be formalized as follows. Assume that for group image, there is a image vector of observable and unobservable characteristics image that may include components not included in the image vector of characteristics image for group image, where image and image denote the length of the image and image vectors, respectively. Define the intersection of these characteristics by the image vector image, which represents characteristics common to both groups. The respective complements, which are group-specific characteristics, are denoted by a tilde as image and image, such that image and image.

In that context, the overlapping support assumption could be restated by letting the support of all wage setting factors image be image. The overlapping support assumption would then guarantee that, for all image in image, image. The assumption rules out the existence of the vectors image and image.

In the decomposition of gender wage differentials, it is not uncommon to have explanatory variables for which this condition does not hold. Black et al. (2008) and Ñopo (2008) have proposed alternative decompositions based on matching methods to address cases where there are severe gaps in the common support assumption (for observables). For example, Ñopo (2008) divides the gap into four additive terms. The first two are analogous to the above composition and wage structure effects, but they are computed only over the common support of the distributions of observable characteristics, while the other two account for differences in support.
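As a rough illustration of how gaps in the common support of the observables might be diagnosed before a decomposition, the Python sketch below compares the covariate cells observed in each group. It assumes that the observables have been coded into discrete cells; the function name `support_report` and the returned keys are hypothetical and are not taken from Black et al. (2008) or Ñopo (2008). The part of Assumption 4 that concerns unobservables cannot be checked in this way.

```python
import numpy as np

def support_report(x_a, x_b):
    """Compare the sets of covariate cells observed in groups A and B.

    x_a, x_b: 1-D arrays of discrete cell codes (an assumption of this sketch).
    """
    x_a, x_b = np.asarray(x_a), np.asarray(x_b)
    cells_a = set(np.unique(x_a).tolist())
    cells_b = set(np.unique(x_b).tolist())
    common = sorted(cells_a & cells_b)
    return {
        # cells observed in one group only, i.e. outside the common support
        "only_in_a": sorted(cells_a - cells_b),
        "only_in_b": sorted(cells_b - cells_a),
        # fraction of each group's workers located on the common support
        "share_a_on_common_support": float(np.isin(x_a, common).mean()),
        "share_b_on_common_support": float(np.isin(x_b, common).mean()),
    }
```

A low share on the common support for either group would suggest that a decomposition restricted to overlapping cells describes only part of the gap.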

2.1.4 Imposing identification restrictions: ignorability

We cannot separate out the decomposition terms (D.1) and (D.2) unless we impose some separability assumptions on the functional forms of image and image. For highly complex nonlinear functions of observables image and unobservables image, there is no clear definition of what would be the component of the image functions associated with either image or image. For instance, if image and image represent years of schooling and unobserved ability, respectively, we may expect the return to schooling to be higher for high ability workers. As a result, there is an interaction term between image and image in the wage equation image, which makes it hard to separate the contribution of these two variables to the wage gap.

Thus, consider the decomposition term D.1* that combines (D.1) and (D.2):

D.1* Differences associated with the return to observable and unobservable characteristics in the structural image functions.

This decomposition term solely reflects differences in the $m(\cdot)$ functions. We call this decomposition term $\Delta_S^\nu$, or the “$\nu$-wage structure effect” on the “$\nu$-overall difference”, $\Delta_O^\nu$. The key question here is how to identify the three decomposition terms (D.1*), (D.3) and (D.4) which, under Assumption 4, fully describe $\Delta_O^\nu$.

We denote the decomposition terms (D.3) and (D.4) as $\Delta_X^\nu$ and $\Delta_\varepsilon^\nu$, respectively. They capture the impact of differences in the distributions of $X$ and $\varepsilon$ between groups $A$ and $B$ on the overall difference, $\Delta_O^\nu$. We can now write


$$\Delta_O^\nu = \Delta_S^\nu + \Delta_X^\nu + \Delta_\varepsilon^\nu.$$


Without further assumptions we still cannot identify these three terms. There are two problems. First, we have not imposed any assumption for the identification of the image functions, which could help in our identification quest. Second, we have not imposed any assumption on the distribution of unobservables. Thus, even if we fix the distribution of covariates image to be the same for the two groups, we cannot clearly separate all three components because we do not observe what would happen to the unobservables under this scenario.

Therefore, we need to introduce an assumption to make sure that the effect of manipulations of the distribution of observables image will not be confounded by changes in the distribution of image. As we now show formally, the assumption required to rule out these confounding effects is the well-known ignorability, or unconfoundedness, assumption.

Consider a few additional concepts before stating our main assumption. For each member of the two groups $g = A, B$, an outcome variable $Y_{gi}$ and some individual characteristics $X_i$ are observed. $Y_g$ and $X$ have a conditional joint distribution, $F_{Y_g, X|D_g}(\cdot,\cdot)$, and $\mathcal{X}$ is the support of $X$.

The distribution of $Y_g|D_g$ is defined using the law of iterated probabilities, that is, after we integrate over the observed characteristics we obtain


$$F_{Y_g|D_g}(y) = \int F_{Y_g|X,D_g}(y|X=x)\, dF_{X|D_g}(x), \quad g = A, B. \tag{4}$$


We can construct a counterfactual marginal wage distribution that mixes the conditional distribution of $Y_A$ given $X$ and $D_A = 1$ using the distribution of $X|D_B$. We denote that counterfactual distribution as $F_{Y_A^C: X=X|D_B}$, which is the distribution of wages that would prevail for group $B$ workers if they were paid like group $A$ workers. This counterfactual distribution is obtained by replacing $F_{Y_B|X,D_B}$ with $F_{Y_A|X,D_A}$ (or $F_{X|D_A}$ with $F_{X|D_B}$) in Eq. (4):


$$F_{Y_A^C: X=X|D_B}(y) = \int F_{Y_A|X,D_A}(y|X=x)\, dF_{X|D_B}(x). \tag{5}$$


These types of manipulations play a very important role in the implementation of decomposition methods. Counterfactual decomposition methods can either rely on manipulations of image, as in DiNardo et al. (1996), or of image, as in Albrecht et al. (2003) and Chernozhukov et al. (2009).9

Back to our union example, image represents the conditional distribution of wages observed in the union sector, while image represents the conditional distribution of wages observed in the non-union sector. In the case where image, Eq. (4) yields, by definition, the wage distribution in the union sector where we integrate the conditional distribution of wages given image over the marginal distribution of image in the union sector, image. The counterfactual wage distribution image is obtained by integrating over the conditional distribution of wages in the non-union sector instead (Eq. (5)). It represents the distribution of wages that would prevail if union workers were paid like non-union workers.
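The mixing operation behind Eqs. (4) and (5) can be sketched in a few lines of Python for the special case of discrete covariates. This is only an illustration under stated assumptions: covariates are coded as discrete cells, conditional distributions are estimated by empirical CDFs within cells, and the function name `counterfactual_cdf` and its arguments are hypothetical. Group B plays the role of the union sector and group A the non-union sector.

```python
import numpy as np

def counterfactual_cdf(y_grid, y_a, x_a, x_b):
    """CDF of wages that group B would face if paid like group A.

    For each covariate cell x, the empirical conditional CDF of group A
    wages is weighted by the share of group B workers found in that cell.
    """
    y_grid = np.asarray(y_grid, dtype=float)
    y_a, x_a, x_b = np.asarray(y_a, dtype=float), np.asarray(x_a), np.asarray(x_b)
    cdf = np.zeros_like(y_grid)
    cells, counts = np.unique(x_b, return_counts=True)
    for x, n in zip(cells, counts):
        wages = y_a[x_a == x]
        if wages.size == 0:
            # group B has workers in a cell where group A has none
            raise ValueError(f"cell {x!r} is outside the support of group A")
        cond_cdf = (wages[:, None] <= y_grid[None, :]).mean(axis=0)
        cdf += (n / x_b.size) * cond_cdf
    return cdf
```

Passing group A's own covariates as `x_b` returns group A's observed wage distribution, which mirrors the fact that Eq. (4) is the special case in which the conditional and marginal distributions come from the same group.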

The connection between these conditional distributions and the wage structure is easier to see when we rewrite the distribution of wages for each group in terms of the corresponding structural forms,


$$F_{Y_g|X,D_g}(y|X=x) = \Pr\left(m_g(X,\varepsilon) \le y \mid X = x, D_g = 1\right), \quad g = A, B.$$


Conditional on image, the distribution of wages only depends, therefore, on the conditional distribution of image, and the wage structure image.10 When we replace the conditional distribution in the union sector, image, with the conditional distribution in the non-union sector, image, we are replacing both the wage structure and the conditional distribution of image. Unless we impose some further assumptions on the conditional distribution of image, this type of counterfactual exercise will not yield interpretable results as it will mix differences in the wage structure and in the distribution of image.

To see this formally, note that unless image has the same conditional distribution across groups, the difference


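$$F_{Y_B|X,D_B}(y|X=x) - F_{Y_A|X,D_A}(y|X=x) = \Pr\left(m_B(X,\varepsilon) \le y \mid X=x, D_B=1\right) - \Pr\left(m_A(X,\varepsilon) \le y \mid X=x, D_A=1\right) \tag{6}$$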


will mix differences in image functions and differences in the conditional distributions of image given image.

We are ultimately interested in a functional image (i.e. a distributional statistic) of the wage distribution. The above result means that, in general, image. The question is under what additional assumptions will the difference between a statistic from the original distribution of wages and the counterfactual distribution, image, solely depend on differences in the wage structure? The answer is that under a conditional independence assumption, also known as ignorability of the treatment in the treatment effects literature, we can identify image and the remaining terms image and image.

Assumption 5 ( Conditional Independence/Ignorability) For image, let image have a joint distribution. For all image in image: image is independent of image given image or, equivalently, image.

In the case of the simple counterfactual treatment, the identification restrictions from the treatment effects literature may allow the researcher to give a causal interpretation to the results of the decomposition methodology as discussed in Section 2.3. The ignorability assumption has become popular in empirical research following a series of papers by Rubin and coauthors and by Heckman and coauthors.11 In the program evaluation literature, this assumption is sometimes called unconfoundedness or selection on observables, and allows identification of the treatment effect parameter.

2.1.5 Identification of the aggregate decomposition

We can now state our main result regarding the identification of the aggregate decomposition.

Proposition 1 (Identification of the Aggregate Decomposition) Under Assumptions 3 (simple counterfactual), 4 (overlapping support), and 5 (ignorability), the overall $\nu$-gap, $\Delta_O^\nu$, can be written as


$$\Delta_O^\nu = \Delta_S^\nu + \Delta_X^\nu,$$


where

(i) the wage structure term $\Delta_S^\nu$ solely reflects the difference between the structural functions $m_B(\cdot,\cdot)$ and $m_A(\cdot,\cdot)$;
(ii) the composition effect term $\Delta_X^\nu$ solely reflects the effect of differences in the distribution of characteristics ($X$ and $\varepsilon$) between the two groups.

This important result means that, under the ignorability and overlapping assumptions, we can give a structural interpretation to the aggregate decomposition that is formally linked to the underlying wage setting models, image and image. Note also that the wage structure (image) and composition effect (image) terms represent algebraically what we have informally defined by terms D.1* and D.3.

As can be seen from Eq. (6), the only source of difference between image and image is the difference between the structural functions image and image. Now note that under Assumptions 4 and 5, we have that image, where


image


Thus, image reflects only changes or differences in the distribution of observed covariates. As a result, under Assumptions 4 and 5, we identify image by image and set image. This normalization makes sense as a result of the conditional independence assumption: no difference in wages will be systematically attributed to differences in distributions of image once we fix these distributions to be the same given image. Thus, all remaining differences beyond image are due to differences in the distribution of covariates captured by image.

Combining these two results, we get


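$$\Delta_O^\nu = \left[\nu\left(F_{Y_B|D_B}\right) - \nu\left(F_{Y_A^C: X=X|D_B}\right)\right] + \left[\nu\left(F_{Y_A^C: X=X|D_B}\right) - \nu\left(F_{Y_A|D_A}\right)\right] = \Delta_S^\nu + \Delta_X^\nu \tag{7}$$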


which is the main result in Proposition 1.
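To make the two terms of the aggregate decomposition concrete, the following Python sketch computes them for a generic weighted statistic. It is a minimal illustration, not the chapter's estimator: it assumes discrete covariate cells, builds the counterfactual by reweighting group A so that its covariate distribution matches that of group B (in the spirit of the reweighting approach of DiNardo et al. (1996)), and uses hypothetical names (`aggregate_decomposition`, `weighted_mean`, `weighted_variance`).

```python
import numpy as np

def cell_shares(x):
    """Share of observations in each discrete covariate cell."""
    cells, counts = np.unique(x, return_counts=True)
    return dict(zip(cells.tolist(), (counts / x.size).tolist()))

def aggregate_decomposition(y_a, x_a, y_b, x_b, stat):
    """Split stat(B) - stat(A) into wage structure and composition terms.

    stat(y, w) must return a weighted distributional statistic.
    Counterfactual: group B workers paid like group A workers.
    """
    y_a, y_b = np.asarray(y_a, dtype=float), np.asarray(y_b, dtype=float)
    x_a, x_b = np.asarray(x_a), np.asarray(x_b)
    p_a, p_b = cell_shares(x_a), cell_shares(x_b)
    if set(p_b) - set(p_a):
        raise ValueError("group B has cells outside the support of group A")
    # reweight group A so that its covariate distribution matches group B's
    w_c = np.array([p_b.get(x, 0.0) / p_a[x] for x in x_a.tolist()])
    nu_a = stat(y_a, np.ones_like(y_a))
    nu_b = stat(y_b, np.ones_like(y_b))
    nu_c = stat(y_a, w_c)
    return {
        "overall": nu_b - nu_a,
        "wage_structure": nu_b - nu_c,  # same covariates, different pay rules
        "composition": nu_c - nu_a,     # same pay rules, different covariates
    }

def weighted_mean(y, w):
    return float(np.average(y, weights=w))

def weighted_variance(y, w):
    mu = np.average(y, weights=w)
    return float(np.average((y - mu) ** 2, weights=w))
```

By construction the two terms sum to the overall gap for any statistic passed as `stat`; whether each term has the structural interpretation of Proposition 1 depends on the assumptions discussed in the text, not on the code.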

When Assumptions 3 (simple counterfactual) and 5 (ignorability) are satisfied, the conditional distribution of image given image remains invariant under manipulations of the marginal distribution of image. It follows that Eq. (5) represents a valid counterfactual for the distribution of image that would prevail if workers in group image were paid according to the wage structure image. The intuition for this result is simple. Since image, manipulations of the distribution of image can only affect the conditional distribution of image given image if they either (i) change the wage setting function image, or (ii) change the distribution of image given image. The first change is ruled out by the assumption of a simple counterfactual treatment (i.e. no general equilibrium effects), while the second effect is ruled out by the ignorability assumption.

In the inequality literature, the invariance of the conditional distribution is often introduced as the key assumption required for image to represent a valid counterfactual (e.g. DiNardo et al., 1996; Chernozhukov et al., 2009).

Assumption 6 ( Invariance of Conditional Distributions) The construction of the counterfactual wage distribution for workers of group image that would have prevailed if they were paid like group image workers (described in Eq. (5)), assumes that the conditional wage distribution image applies or can be extrapolated for image, that is, it remains valid when the marginal distribution image replaces image.

One useful contribution of this chapter is to show the economics underneath this assumption, i.e. that the invariance assumption holds provided that there are no general equilibrium effects (ruled out by Assumption 3) and no selection based on unobservables (ruled out by Assumption 5).

Assumption 6 is also invoked by Chernozhukov et al. (2009) to perform the aggregate decomposition using the following alternative counterfactual that uses group image as the reference group. Let image be the distribution of wages that would prevail for group image workers under the conditional distribution of wages of group image workers. In our union example, this would represent the distribution of wages of non-union workers that would prevail if they were paid like union workers. Relative to Eq. (7), the terms of the decomposition equation are now inverted:


image


Now the first term image is the composition effect and the second term image the wage structure effect.

Whether the assumption of the invariance of the conditional distribution is likely to be satisfied in practice depends on the economic context. If group image were workers in 2005 and group image were workers in 2007, perhaps Assumption 6 would be more likely to hold than if group image were workers in 2007 and group image were workers in 2009 in the presence of the 2009 recession. Thus it is important to provide an economic rationale to justify Assumption 6 in the same way the choice of instruments has to be justified in terms of the economic context when using an instrumental variable strategy.

2.1.6 Why ignorability may not hold, and what to do about it

The conditional independence assumption is a somewhat strong assumption. We discuss three important cases under which it may not hold:

1. Differential selection into the labor market. This is the selection problem that Heckman (1979) is concerned with in describing the wage offers for women. In the case of the gender pay gap analysis, it is quite plausible that the decisions to participate in the labor market are quite different for men and women. Therefore, the conditional distribution of image may be different from the distribution of image. In that case, both the observed and unobserved components may be different, reflecting the fact that men participating in the labor market may be different in observable and unobservable ways from women who also participate. The ignorability assumption does not necessarily rule out the possibility that these distributions are different, but it constrains their relationship. Ignorability implies that the joint densities of observables and unobservables for groups image and image (men and women) have to be similar up to a ratio of conditional probabilities:

image


2. Self-selection into groups A and B based on unobservables. In the gender gap example there is no selection into groups, although the consequences of differential selection into the labor market are indeed the same. An example where self-selection based on unobservables may occur is in the analysis of the union wage gap. The conditional independence or ignorability assumption rules out selection into groups based on unobservable components image beyond image. However, the ignorability assumption does not impose that image, so the groups may have different marginal distributions of image. But if selection into groups is based on unobservables, then the ratio of conditional joint densities will in general depend on the value of image being evaluated, and not only on image, as ignorability requires:

image


3. Choice of image  and image. In the previous case, the values of image and image are not determined by group choice, although they will be correlated and may even explain the choice of the group. In the first example of the gender pay gap, values of image and image such as occupation choice and unobserved effort may also be functions of gender ‘discrimination’. Thus, the conditional independence assumption will not be valid if image is a function of image, even holding image constant. The interpretation of ignorability here is that given the choice of image, the choice of image will be randomly determined across groups. Pursuing the gender pay gap example, fixing image (for example education), men and women would exert the same level of effort. The only impact of anticipated discrimination is that they may invest differently in education.

In Section 6, we discuss several solutions to these problems that have been proposed in the decomposition literature. Those include the use of panel data methods or standard selection models. In case 2 above, one could also use instrumental variable methods to deal with the fact that the choice of group is endogenous. One identification issue we briefly address here is that IV methods would indeed yield a valid decomposition, but only for the subpopulation of compliers.

To see this, consider the case where we have a binary instrumental variable image, which is independent of image conditional on image, where image is a categorical variable which indicates ‘type’. There are four possible types: image, image, image and image, as described below:

Assumption 7 ( LATE) For image, let image have a joint distribution in image. We define image, a random variable that may take on four values image, and that can be constructed using image and image according to the following rule: if image and image, then image; if image and image, then image; if image and image, then image; if image and image, then image.

(i) For all image in image: image is independent of image.
(ii) image.

These are the LATE assumptions from Imbens and Angrist (1994), which allow us to identify the counterfactual distribution of image. We are then able to decompose the image-wage gap under that less restrictive assumption, but only for the population of compliers:


image


2.2 Case 2: The detailed decomposition

One convenient feature of the aggregate decomposition is that it can be performed without any assumption on the structural functional forms, image, while constraining the distribution of unobserved (image) characteristics.12 Under the assumptions of Proposition 1, the composition effect component image reflects differences in the distribution of image, while the wage structure component image reflects differences in the returns to either image or image.

To perform a detailed decomposition, we need to separate the respective contributions of image or image in both image and image, in addition to separating the individual contribution of each element of the vector of covariates image. Thus, generally speaking, the identification of an interpretable detailed decomposition involves stronger assumptions such as functional form restrictions and/or further restrictions on the distribution of image, like independence with respect to image and image.

Since these restrictions tend to be problem specific, it is not possible to present a general identification theory as in the case of the aggregate decomposition. We discuss instead how to identify the elements of the detailed decomposition in a number of specific cases. Before discussing these issues in detail, it is useful to state what we seek to recover with a detailed decomposition.

Property 1 (Detailed Decomposition) A procedure is said to provide a detailed decomposition when it can apportion the composition effect, image, or the wage structure effect, image, into components attributable to each explanatory variable:

1. The contribution of each covariate image to the composition effect, image, is the portion of image that is only due to differences between the distribution of image in groups image and image. When image, the detailed decomposition of the composition effect is said to add up.
2. The contribution of each covariate image to the wage structure effect, image, is the portion of image that is only due to differences in the parameters associated with image in groups image and image, i.e. to differences in the parameters of image and image linked to image. Similarly, the contribution of unobservables image to the wage structure effect, image, is the portion of image that is only due to differences in the parameters associated with image in image and image.

Note that unobservables do not make any contribution to the composition effect because of the ignorability assumption we maintain throughout most of the chapter. As we mentioned earlier, it is also far from clear how to divide the parameters of the functions image and image into those linked to a given covariate or to unobservables. For instance, in a model with a rich set of interactions between observables and unobservables, it is not obvious which parameters should be associated with a given covariate. As a result, computing the elements of the detailed decomposition for the wage structure involves arbitrary choices to be made depending on the economic question of interest.

The adding-up property is automatically satisfied in linear settings like the standard OB decomposition, or the image-regression procedure introduced in Section 5.2. However, it is unlikely to hold in non-linear settings when the distribution of each individual covariate image is changed while keeping the distribution of the other covariates unchanged (e.g. in the case discussed in Section 5.3). In such a procedure “with replacement” we would, for instance, first replace the distribution of image for group image with the distribution of image for group image, then switch back to the distribution of image for group image and replace the distribution of image instead, etc.

By contrast, adding up would generally be satisfied in a sequential (i.e. “without replacement”) procedure where we first replace the distribution of image for group image with the distribution of image for group image, and then do the same for each covariate until the whole distribution of image has been replaced. The problem with this procedure is that it would introduce some path dependence in the decomposition since the “effect” of changing the distribution of one covariate generally depends on the distribution of the other covariates.

For example, the effect of changes in the unionization rate on inequality may depend on the industrial structure of the economy. If unions have a particularly large effect in the manufacturing sector, the estimated effect of the decline in unionization between, say, 1980 and 2000 will be larger under the distribution of industrial affiliation observed in 1980 than under the distribution observed in 2000. In other words, the order of the decomposition matters when we use a sequential (without replacement) procedure, which means that the property of path independence is violated. As we will show later in the chapter, the lack of path independence in many existing detailed decomposition procedures based on a sequential approach is an important shortcoming of these approaches.

Property 2 ( Path Independence) A decomposition procedure is said to be path independent when the order in which the different elements of the detailed decomposition are computed does not affect the results of the decomposition.

A possible solution to the problem of path dependence suggested by Shorrocks (1999) consists of computing the marginal impact of each of the factors as they are eliminated in succession, and then averaging these marginal effects over all the possible elimination sequences. He calls the methodology the Shapley decomposition, because the resulting formula is formally identical to the Shapley value in cooperative game theory. We return to these issues later in the chapter.
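The averaging over sequences described above can be sketched as follows in Python. The sketch is illustrative only: the function `shapley_decomposition`, the `factors` list and the user-supplied `value` function are hypothetical names, and the code frames the exercise as switching factors from their group A to their group B setting one at a time, which mirrors the elimination sequences in the text.

```python
from itertools import permutations
from math import factorial

def shapley_decomposition(factors, value):
    """Average each factor's marginal effect over all orderings.

    factors: list of factor names.
    value:   function mapping a frozenset of factors already switched to
             their group B setting to the statistic of interest.
    """
    contrib = {f: 0.0 for f in factors}
    for order in permutations(factors):
        switched = frozenset()
        for f in order:
            nxt = switched | {f}
            # marginal effect of switching f, given what was switched before
            contrib[f] += value(nxt) - value(switched)
            switched = nxt
    n_orders = factorial(len(factors))
    return {f: c / n_orders for f, c in contrib.items()}
```

For instance, with two factors whose effects interact, the sequential contribution of each factor depends on which one is switched first, whereas the averaged contributions do not and still sum to the total change. The number of orderings grows factorially with the number of factors, so this brute-force version is only practical for a handful of them.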

2.2.1 Nonparametric identification of structural functions

One approach to the detailed decomposition is to identify the structural functions image and image, and then use the knowledge of these structural forms to compute various counterfactuals of interest. For example, one could look at what happens when all the parameters of image pertaining to education are switched to their values estimated for group image, while the rest of the image function remains unchanged.

For the purpose of identifying the structural functions image and image, neither ignorability nor LATE assumptions are very helpful. Stronger assumptions invoked in the literature on nonparametric identification of structural functions (e.g. Matzkin, 2003; Blundell and Powell, 2007; Imbens and Newey, 2009) have to be used instead:

Assumption 8 ( Independence) For image, image.

Assumption 9 ( Strict Monotonicity in the Random Scalar image) For image and for all values image in image, image is a scalar random variable and image is strictly increasing in image.

With these two additional assumptions we can write, for image, the functions image using solely functionals of the joint distribution of image. We can assume without loss of generality that image, because (i) we observe the conditional distributions of image, and (ii) image is a scalar random variable independent of image given image. Once we have identified the functions image for image, we can construct the counterfactual distribution of image and compute any distributional statistic of interest.13
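One way to see how the structural function can be written as a functional of observable distributions is the Python sketch below. It assumes, for illustration, that the normalization of the scalar unobservable is the uniform distribution on [0, 1], so that under independence and strict monotonicity the structural function evaluated at (x, e) coincides with the e-th conditional quantile of wages given X = x. Covariates are taken to be discrete cells, and the function name `structural_function` is hypothetical.

```python
import numpy as np

def structural_function(y, x):
    """Return m(x0, e): the e-th conditional quantile of y given x == x0.

    Assumes a scalar unobservable normalized to be uniform on [0, 1],
    independent of x, with wages strictly increasing in it.
    """
    y, x = np.asarray(y, dtype=float), np.asarray(x)

    def m(x0, e):
        wages = y[x == x0]
        if wages.size == 0:
            raise ValueError(f"no observations in cell {x0!r}")
        return float(np.quantile(wages, e))

    return m
```

Estimating one such function per group and evaluating the group A function at the covariates and ranks of group B workers would give one possible counterfactual wage for each of them, provided the same unobservable is ranked the same way in both groups.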

Note, however, that the monotonicity assumption is not innocuous in the context of comparisons across groups. If there was only one group of workers, the monotonicity assumption would be a simple normalization. With more than one group, however, it requires that the same unobservable variable has positive returns for all groups of workers, which in some settings may not be plausible, though this is automatically satisfied in additively separable models.

There are various reasons why this assumption may be problematic in practice. Empirical wage distributions exhibit many flat spots because of heaping or minimum wage effects. For example, if groups image and image corresponded to two different years or countries with different minimum wages, the monotonicity assumption would not be satisfied.14 The monotonicity assumption would also break down in the presence of measurement error in wages since the wage residual would now mix measurement error and unobserved skills. As a result, the same amount of unobserved skills would not guarantee the same position in the conditional distribution of residuals in the two groups.

In most labor economics applications, assuming that unobservables are independent of the covariates is a strong and unrealistic assumption. Thus, the identification of the structural functions comes at a relatively high price. The milder assumption of ignorability allows us to identify image and image. With full independence, we can go back and identify more terms. In fact, because we obtain an expression for image, we can construct detailed decompositions by fixing deterministically the values of some covariates while letting others vary.

2.2.2 Functional form restrictions: decomposition of the mean

A more common approach used in the decomposition literature consists of imposing functional form restrictions to identify the various elements of a detailed decomposition. For instance, detailed decompositions can be readily computed in the case of the mean using the assumptions implicit in Oaxaca (1973) and Blinder (1973). The first assumption is additive linearity of the image functions. The linearity assumption is also commonly used in quantile-based decomposition methodologies, such as Albrecht et al. (2003), Machado and Mata (2005), and Melly (2006). The linearity assumption allows for heteroscedasticity due, for example, to the fact that the variance of unobservables increases as educational attainment increases.

Assumption 10 (Additive Linearity) The wage structures, image and image, are linear, additively separable functions of the worker’s observable and unobservable characteristics:


image


where image.

The second assumption implicit in the OB procedure is that the conditional mean of image is equal to zero:

Assumption 11 (Zero Conditional Mean) image.

Under mean independence, we have that for image, image and therefore we can write the mean counterfactual image as image. Therefore,


image


2.2.3 Functional form restrictions: more general decompositions

Under Assumption 11, the error term conveniently drops out of the decomposition for the mean. For more general distributional statistics such as the variance, however, we need more assumptions about the distribution of unobservables to perform a detailed decomposition. If we add the following assumptions on the conditional wage variance and on the function of the unobservables image, we can separate out the wage structure effects of observables and unobservables.

Assumption 12 (Constant Returns to Unobservables) For image.

Assumption 13 (Homoscedasticity) For image.

Under these two additional assumptions, we can identify image, and interpret it as the price of unobservables.15 Assumption 10 (additive linearity) then allows us to separate out returns to observable and unobservable factors, and to separately identify the contribution of observable and unobservable factors to the wage structure effect. Note that because of the zero conditional mean assumption, only the observable factors influence mean wages.

More formally, consider the counterfactual wage, image, for group image workers where the return to unobservables is set to be as in group image,16


image     (8)


Under Assumptions 5 and 9 to 13, we can divide the wage structure effect into a component linked to unobservables, image, and a component linked to observables, image, as follows:


image


The above assumptions correspond to those implicitly used by Juhn et al. (1991) in their influential study on the evolution of the black-white wage gap.17 While it is useful to work with a single “price” of unobservables image, doing so is not essential for performing a detailed decomposition. Juhn et al. (1993) [JMP] use a weaker set of assumptions in their influential study of wage differentials over time that we now discuss in more detail.

JMP propose a residual imputation procedure that relies on the key assumption that the rank of worker image in the distribution of image is the same as in the distribution of image, conditional on image. This procedure enables them to perform a decomposition even when the function image used to define the regression error image is not linear (non-linear skill pricing). Since the (conditional) rank of image normalized on a image scale is simply the cumulative distribution image evaluated at that point, conditional rank preservation can be stated as follows in our context:

Assumption 14 (Conditional Rank Preservation) For all individuals image, we have image, where image and image are the rankings of image and image in their respective conditional distributions.

Under this assumption, if individual image in group image observed at rank image were in group image instead, he/she would remain at the same rank in the conditional distribution of image for that other group (and vice versa). Conditional rank preservation is a direct consequence of the assumptions of ignorability (Assumption 5) and monotonicity (Assumption 9). Under ignorability, the distribution of image given image does not depend on group membership. Since image and image, the rank of image and image in their respective distributions is the same as the rank of image, provided that image and image are monotonic.

Note that the assumption of rank preservation is substantially stronger than ignorability. For instance, consider the case where image is a vector of two ability measures: cognitive ability and manual ability. If cognitive ability is more valued under the wage structure image than under the wage structure image, the ranking of workers in the image and image distributions will be different, which means that neither monotonicity nor rank preservation will hold. But provided that the conditional distribution of cognitive and manual ability given image is the same for groups image and image, ignorability holds, which means that the aggregate decomposition is still identified.

We explain how to implement the JMP procedure in practice in Section 4.3. Compared to the procedure described above to construct the counterfactual wage, image, the difference is that an imputed residual from the group image distribution is used instead of image. The idea is to replace image, which has rank image in the conditional distribution of image, with an imputed error term


image     (9)


The resulting counterfactual wage for group image workers,


image     (10)


can then be used to compute the following two elements of the decomposition:

image

image
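The residual imputation step can be sketched as follows. This is an illustration under simplifying assumptions rather than JMP's implementation: ranks are computed in the unconditional residual distribution (that is, without conditioning on the covariates), the wage equations are linear, and the data are simulated. The helpers `ols_fit` and `impute_by_rank`, and the convention that group B workers receive residuals imputed from group A, are assumptions made for the example.

```python
import numpy as np

def ols_fit(y, X):
    """OLS with an intercept; returns coefficients and residuals."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    return beta, y - Xc @ beta

def impute_by_rank(resid_own, resid_other):
    """Replace each own-group residual by the other group's residual at the same rank."""
    n = len(resid_own)
    ranks = (np.argsort(np.argsort(resid_own)) + 0.5) / n   # ranks on a (0, 1) scale
    return np.quantile(resid_other, ranks)

rng = np.random.default_rng(1)
n = 5_000
x_a, x_b = rng.normal(12, 2, n), rng.normal(13, 2, n)
y_a = 1.0 + 0.08 * x_a + rng.normal(0, 0.3, n)   # group A: less residual dispersion
y_b = 1.1 + 0.10 * x_b + rng.normal(0, 0.5, n)   # group B: more residual dispersion

beta_a, resid_a = ols_fit(y_a, x_a)
beta_b, resid_b = ols_fit(y_b, x_b)

# Group B workers keep their own fitted values but receive group A residuals of equal rank
resid_imputed = impute_by_rank(resid_b, resid_a)
y_b_counterfactual = (y_b - resid_b) + resid_imputed

print(resid_b.std(), resid_imputed.std(), resid_a.std())
```

By construction, the imputed residuals preserve each worker's rank while taking on the dispersion of the other group's residual distribution.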

One important implementation issue we discuss in Section 4.3 is how to impute residuals conditional on image. This is an important limitation of JMP’s procedure that can be addressed in a number of ways. One popular approach is to use conditional quantile regressions to allow for different returns to observables that vary along the conditional wage distribution. This approach was proposed by Machado and Mata (2005) and reexamined by Albrecht et al. (2003) and Melly (2005). It relies on the assumption that the conditional distribution of image is completely characterized by the collection of regression quantiles image.

Assumption 15 (Heterogeneous Returns to Observables) For image.

Assumption 16 (Complete Collection of Linear Conditional Quantiles) For image, and image.

The above assumptions plus ignorability allow the decomposition of image into image and image. Note that because image for all image, we are fully parameterizing the conditional distribution of image by image using all image. Thus, once one inverts the conditional quantile to obtain a conditional CDF, one can apply Eqs. (4) and (5) to compute an actual or counterfactual distribution.
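The inversion step can be made explicit. As a sketch in notation that is assumed here rather than taken from the text, write the linear conditional quantile of group g wages at covariate value x as x times a coefficient vector indexed by the quantile tau, and suppose it is nondecreasing in tau. The conditional CDF is then the share of quantile indices at which the conditional quantile lies at or below y:

```latex
F_{Y_g \mid X}(y \mid x) \;=\; \int_0^1 \mathbf{1}\{\, x\beta_{g,\tau} \le y \,\}\, d\tau ,
\qquad g = A, B .
```

In practice the integral would be approximated by an average over a finite grid of quantile regressions.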

Many other decomposition methods have been proposed to deal with parametric and nonparametric identification of conditional distribution functions. We have discussed the JMP procedure, as well as extensions to the case of conditional quantiles, as a way of illustrating the kind of assumptions required for identifying detailed decompositions of general distributional statistics. The general message is that more stringent assumptions have to be imposed to perform a detailed decomposition instead of an aggregate decomposition. The same general message would apply if we had discussed the identification of other decomposition procedures such as (to cite a few examples) Donald et al. (2000), Fortin and Lemieux (1998), Melly (2005), Chernozhukov et al. (2009), and Rothe (2009) instead.

Finally, it is also possible to relax some of the above assumptions provided that other assumptions are used instead. For instance, if one fixes the prices of unobservables to be the same across groups, say to a unit price, then image reflects in fact changes in the distribution of unobservables. In that case, ignorability does not hold, but because of linearity and zero conditional mean assumptions we can identify the parameter image’s. The difference between image and image is interpreted as differences in the image-quantile of the conditional distribution of image given image across groups image and image (image is the image-quantile of the conditional distribution of image for group image). Let us state the following normalization assumption.

Assumption 17 (Unit Price to Unobservables) For image.

The overall wage gap can then be decomposed as follows


image     (11)


Because of Assumptions 10, 12 and 17, we now have image and image. The first difference, image, corresponds to differences in image’s only; the second difference is due to differences in


image


which are explained by differences in the conditional distribution of image given image across groups image and image. Thus, an easy way to obtain that difference is to construct a counterfactual


image     (12)


and to replace image with image given that they will be equivalent under the above functional form assumptions.

Finally, the difference image can be obtained as a residual difference. However, under the maintained assumptions it should reflect only differences in the marginal distributions of image.

2.3 Decomposition terms and their relation to causality and the treatment effects literature

We end this section by discussing more explicitly the connection between decompositions and various concepts introduced in the treatment effects literature. As it turns out, when the counterfactuals are based on hypothetical alternative wage structures, they can be easily linked to the treatment effects literature. For example: What if group image workers were paid according to the wage structure of group image? What if all workers were paid according to the wage structure of group image?

Define the overall average treatment effect (image) as the difference between average wages if everybody were paid according to the wage structure of group image and average wages if everybody were paid according to the wage structure of group image. That is:


image


where switching a worker from “type image” to “type image” is thought to be the “treatment”.

We also define the average treatment effect on the treated (image) as the difference between actual average wages of group image workers and average wages if group image workers were paid according to the pay structure of group image. That is:


image


These treatment effects can be generalized to other functionals or statistics of the wage distribution. For example, define image-image, the image-treatment effect, as


image


and its version applied to the subpopulation of “treated”, image-image as


image


The distributions image, image and image are not observed from data on image.18 Following the treatment effects literature, we could in principle identify these parameters if “treatment” was randomly assigned. This is hardly the case, at least for our examples, and one needs extra identifying restrictions. In fact, we note that ignorability and common support assumptions (which together are termed strong ignorability after Rosenbaum and Rubin, 1983) are sufficient to guarantee identification of the previous parameters. For example, under strong ignorability, for image

image

image

Under ignorability, it follows that image. Then image-image and image. Reweighting methods, as discussed by DiNardo et al. (1996), Hirano et al. (2003) and Firpo (2007, 2010), have implicitly or explicitly assumed strong ignorability to identify specific image-treatment effects.
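A minimal sketch of how reweighting delivers a counterfactual under strong ignorability is given below, for the mean and with a single discrete covariate. It is not the estimator of any of the papers cited above: the reweighting factor is computed from cell frequencies rather than from an estimated propensity score, the data are simulated, and the convention that group A observations are reweighted to the covariate distribution of group B is an assumption of the example.

```python
import numpy as np

def reweighting_factor(x, d):
    """For each group-A observation (d == 0), return P(X = x | B) / P(X = x | A).

    Requires common support: every value of x must occur in group A.
    """
    values = np.unique(x)
    share_a = np.array([np.mean(x[d == 0] == v) for v in values])
    share_b = np.array([np.mean(x[d == 1] == v) for v in values])
    idx = np.searchsorted(values, x[d == 0])
    return (share_b / share_a)[idx]

rng = np.random.default_rng(2)
n = 100_000
d = np.repeat([0, 1], n)                                   # 0 = group A, 1 = group B
x = np.concatenate([rng.choice(3, n, p=[0.5, 0.3, 0.2]),   # covariate distribution in A
                    rng.choice(3, n, p=[0.2, 0.3, 0.5])])  # covariate distribution in B
noise = rng.normal(0, 0.5, 2 * n)                          # same distribution given x in both groups
y = np.where(d == 0, 1.0 + 0.3 * x, 1.2 + 0.4 * x) + noise

w = reweighting_factor(x, d)
mean_a, mean_b = y[d == 0].mean(), y[d == 1].mean()
# Mean wage under the group-A wage structure with the group-B covariate distribution
mean_counterfactual = np.average(y[d == 0], weights=w)

wage_structure_effect = mean_b - mean_counterfactual
composition_effect = mean_counterfactual - mean_a
print(wage_structure_effect, composition_effect)
```

The same weights could be applied to any other statistic of the group A wage distribution, such as a quantile or the variance, to obtain the corresponding counterfactual.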

It is interesting to see how the choice of the reference or base group is related to the treatment effects literature. Consider the treatment effect parameter for the non-treated, image-image:


image


Under strong ignorability, we have image. Thus, in this case, image-image and image.

We could also consider other decompositions, such as:


image


where image includes the actual wages of group image workers and the counterfactual wages of group image workers if they were paid like group image workers, and conversely for image. In this case, the wage structure effect is image-image, while the composition effect is the sum image.19

The above discussion reveals that the reference group choice problem is just a matter of choosing a meaningful counterfactual. There will be no right answer. In fact, we see that analogously to the treatment effects literature, where treatment effect parameters are different from each other because they are defined over distinct subpopulations, the many possible ways of performing decompositions will reflect the reference group that we want to emphasize.

We conclude this section by discussing briefly the relationship between causality, structural parameters and decomposition terms. In this section, we showed that the decomposition terms do not necessarily rely on the identification of structural forms. Whenever we can identify those structural functions linking observable and unobservable characteristics to wages, we benefit from being able to perform counterfactual analysis that we may not be able to do otherwise. However, that comes at the cost of having to impose either strong independence assumptions, as in the case of nonparametric identification, or restrictive functional form assumptions plus some milder independence assumption (mean independence, for instance) between observables and unobservables within each group of workers.

If we are, however, interested in the aggregate decomposition terms image and image, we saw that a less restrictive assumption is sufficient to guarantee identification of these terms. Ignorability is the key assumption here as it allows fixing the conditional distribution of unobservables to be the same across groups. The drawback is that we cannot separate out the wage structure effects associated with particular observable and unobservable characteristics.

The treatment effects literature is mainly concerned with causality. Under what conditions can we claim that image, although identifiable under ignorability, has a causal interpretation? The conditions under which we could say that image is a causal parameter are very stringent and unlikely to be satisfied in general cases. There are two main reasons for that, in our view.

First, in many cases, “treatment” is not a choice or a manipulable action. When decomposing gender or race gaps in particular, we cannot conceive of workers choosing which group to belong to.20 They may have different labor market participation behavior, which is one case where ignorability may not hold, as discussed in Section 2.1.6. However, workers cannot choose treatment. Thus, if we follow, for example, Holland (1986)’s discussion of causality, we cannot claim that image is a causal parameter.

A second reason for failing to assign causality to the pay structure effect is that most of the observable variables considered as our image (or unobservables image) are not necessarily pre-treatment variables.21 In fact, image may assume different values as a consequence of the treatment. In the treatment effects literature, a confounding variable image may have different distributions across treatment groups. But that is not a direct action of the treatment. It should only be a selection problem: People who choose to be in a group may have a different distribution of image relative to people who choose to be in the other group. When image is affected by treatment, we cannot say that by controlling for image we will obtain a causal parameter. In fact, what we will obtain is a partial effect parameter, net of the indirect effect through changes in image.

3 Oaxaca-Blinder—Decompositions of Mean Wages Differentials

In this section, we review the basics of OB decompositions, discussing at length some thorny issues related to the detailed decomposition. We also address alternative choices of counterfactuals, including the case of the pooled regression that uses a group membership dummy to obtain a measure of the aggregate wage structure effect. We introduce a reweighted-regression decomposition as an attractive alternative when the linearity of the conditional mean as a function of the covariates is questionable. Finally, we briefly discuss the extensions of OB decompositions to limited dependent variable models, which carry some of the issues, such as path dependence, that will surface in methods that go beyond the mean.

3.1 Basics

Despite its apparent simplicity, there are many important issues of estimation and interpretation in the classic OB decomposition. The goal of the method is to decompose differences in mean wages, image, across two groups. The wage setting model is assumed to be linear and separable in observable and unobservable characteristics (Assumption 10):


image     (13)


where image (Assumption 11). Letting image be an indicator of group image membership, and taking the expectations over image, the overall mean wage gap image can be written as


image


where image. Adding and subtracting the average counterfactual wage that group image workers would have earned under the wage structure of group image, image, the expression becomes


image


Replacing the expected value of the covariates image, for image, by the sample averages image, the decomposition is estimated as

image     (14)

image     (15)

image     (16)

The first term in Eq. (15) is the wage structure effect, image, while the second term is the composition effect, image. Note that in cases where group membership is linked to some immutable characteristics of the workers, such as race or gender, the wage structure effect has also been called the “unexplained” part of the wage differentials or the part due to “discrimination”.

The OB decomposition is very easy to use in practice. It is computed by plugging in the sample means and the OLS estimates image in the above formula. Various good implementations of the procedure are available in existing software packages.22 Table 2 displays the various underlying elements of the decomposition in the case of the gender wage gap featured in O’Neill and O’Neill (2006) using data from the NLSY79. The composition effect is computed as the difference between the male and female means reported in column (1) multiplied by the male coefficients reported in column (2).23 The corresponding wage structure effect is computed from the difference between the male and female coefficients reported in columns (2) and (3). The results are reported in column (1) of Table 3. The composition effect accounts for 0.197 (0.018) log points out of the 0.233 (0.015) average log wage gap between men and women in 2000. When the male wage structure is used as reference, only an insignificant 0.036 (0.019) part of the gap (the wage structure effect) is left unexplained.
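The plug-in computation can be sketched in a few lines. The example below uses simulated data, not the NLSY79, and is not a substitute for the software packages mentioned above. It assumes that group A supplies the reference wage structure, so that the wage structure effect is evaluated at group B means and the composition effect at group A coefficients. The function names and variable names are hypothetical. The element-wise terms returned by `oaxaca_blinder` are the covariate-by-covariate contributions of a detailed decomposition, with the intercept in position 0.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns coefficients and the means of the regressors."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    return beta, Xc.mean(axis=0)

def oaxaca_blinder(y_a, X_a, y_b, X_b):
    """Mean gap (B minus A) and its element-wise wage structure and composition terms."""
    beta_a, xbar_a = ols(y_a, X_a)
    beta_b, xbar_b = ols(y_b, X_b)
    structure = xbar_b * (beta_b - beta_a)     # differences in returns, at group-B means
    composition = (xbar_b - xbar_a) * beta_a   # differences in means, at group-A returns
    return y_b.mean() - y_a.mean(), structure, composition

rng = np.random.default_rng(3)
n = 2_000
# Hypothetical covariates: years of education and years of experience
X_a = np.column_stack([rng.normal(12.5, 2, n), rng.normal(15, 5, n)])
X_b = np.column_stack([rng.normal(13.0, 2, n), rng.normal(18, 5, n)])
y_a = 0.9 + 0.08 * X_a[:, 0] + 0.02 * X_a[:, 1] + rng.normal(0, 0.4, n)
y_b = 1.0 + 0.08 * X_b[:, 0] + 0.03 * X_b[:, 1] + rng.normal(0, 0.4, n)

gap, structure, composition = oaxaca_blinder(y_a, X_a, y_b, X_b)
print("gap:", gap)
print("wage structure effect:", structure.sum(), "by term:", structure)
print("composition effect:", composition.sum(), "by term:", composition)
```

Because each regression includes an intercept, the fitted mean equals the sample mean in each group, so the two aggregate effects add up to the raw gap exactly.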

Table 2 Means and OLS regression coefficients of selected variables from NLSY log wage regressions for workers ages 35-43 in 2000.

image image

Table 3 Gender wage gap: Oaxaca-Blinder decomposition results (NLSY, 2000).

image image

Because of the additive linearity assumption, it is easy to compute the various elements of the detailed decomposition. The wage structure and composition effects can be written in terms of sums over the explanatory variables

image     (17)

image     (18)

where image represents the omitted group effect, and where image and image represent the imageth element of image and image, respectively. image and image are the respective contributions of the imageth covariate to the composition and wage structure effect. Each element of the sum image can be interpreted as the contribution of the difference in the returns to the imageth covariate to the total wage structure effect, evaluated at the mean value of image. Whether or not this decomposition term is economically meaningful depends on the choice of the omitted group, an issue we discuss in detail in Section 3.2 below.24

Similar to O’Neill and O’Neill (2006), Table 3 reports the contribution of single variables and groups of variables to composition (upper panel) and wage structure effects (lower panel). Life-time work experience ‘priced’ at the male returns to experience stands out as the factor with the most explanatory power (0.137 out of 0.197, or 69%) for composition effects. The wage structure effects are not significant in this example, except for the case of industrial sectors which we discuss below.

Because regression coefficients are based on partial correlations, an OB decomposition that includes all image explanatory variables of interest satisfies the property of path independence (Property 2). Note, though, that a sequence of Oaxaca-Blinder decompositions, each including a subset of the image variables, would suffer from path dependence, as pointed out by Gelbach (2009). Despite these attractive properties, there are some important limitations to the standard OB decomposition that we now address in more detail.

3.2 Issues with detailed decompositions: choice of the omitted group

There are many relevant economic questions that can be answered with the detailed decomposition of the composition effect image in Eq. (18). For example, what has been the contribution of the gender convergence in college enrollment to the gender convergence in average pay? There are also some important questions that are based on the detailed decomposition of the wage structure effect image. For example, consider the related “swimming upstream” query of Blau and Kahn (1997). To what extent have the increases in the returns to college slowed down the gender convergence in average pay? Or, to what extent has the decline in manufacturing and differences in industry wage premia contributed to that convergence?

Some difficulties of interpretation arise when the explanatory variables of interest are categorical (with more than two categories) or, more generally, scalable variables such as test scores that do not have an absolute interpretation. In OB decompositions, categorical variables generate two problems. The first problem is that categorical or scalable variables do not have a natural zero, thus the reference point has to be chosen arbitrarily. The conventional practice is to omit one category which becomes the reference point for the other groups. This generates some interpretation issues even in the detailed decomposition of the composition effect.

Returning to our NLSY example, assume that the industry effects can be captured by four dummy variables, image to image, for the broad sectors: (i) primary, construction, transportation & utilities, (ii) manufacturing, (iii) education and health services & public administration, and (iv) other services. Consider the case where image is the omitted category, image, and denote by image the coefficients from the wage regression, as in column (2) of Table 2. Denote by image the coefficients of a wage regression where image is the omitted category, image, as in column (4) of Table 2, so that, for example, imageimage. In our example, given the large difference in the coefficients of manufacturing between columns (2) and (4) of Table 2, this could mistakenly lead one to conclude that the underrepresentation of women in the manufacturing sector has an effect three times as large image in one case (education and health omitted) as image in the other case (primary omitted). In the first case, the underrepresentation of women in the manufacturing sector is ‘priced’ at the relative returns in the manufacturing versus the education and health sector, while in the other it is ‘priced’ at the relative returns in the manufacturing versus the primary sector.25

Note, however, that the overall effect of 0.017 (0.006) of gender differences in industrial sectors on the gender wage gap is the same in columns (1) and (2) of Table 3. To simplify the exposition, consider the special case where industrial sectors are the only explanatory factors in the wage regression. It follows that the composition effect,


image     (19)


is unaffected by the choice of omitted category.26

The second problem with the conventional practice of omitting one category to identify the coefficients of the remaining categories is that in the unexplained part of the decomposition one cannot distinguish the part attributed to the group membership (true “unexplained” captured by the difference in intercepts) from the part attributed to differences in the coefficient of the omitted or base category.27 These difficulties with the detailed decomposition of the unexplained component were initially pointed out by Jones (1983) who argued that “this latter decomposition is in most applications arbitrary and uninterpretable” (p. 126). Pursuing the example above, the effect of industry wage differentials on the gender wage gap is given by the right-hand side sums in the following expressions

image     (20)

image     (21)

where image and image, image. The overall wage structure effect is the same irrespective of the omitted category image, as shown in the last row of columns (1) and (2) of Table 3. However, the overall effect of differences in the returns to industrial sectors, given by the right-hand side sums, depends on the choice of omitted group [−0.092 (0.033) in column (1) and 0.014 (0.028) in column (2)] because different parts of the effect are hidden in the intercepts [0.128 (0.213) in column (1) and 0.022 (0.212) in column (2)].28
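The dependence on the omitted category is easy to reproduce on simulated data. The sketch below is an illustration only: the four categories, their shares and their wage premia are hypothetical and unrelated to the NLSY estimates, sector dummies are the only regressors, and group A is assumed to supply the reference wage structure. The decomposition is run twice, omitting category 0 and then category 2.

```python
import numpy as np

def dummies(cat, omit, n_cat=4):
    """Indicator columns for every category except the omitted one."""
    return np.column_stack([(cat == m).astype(float) for m in range(n_cat) if m != omit])

def detailed_ob(y_a, X_a, y_b, X_b):
    """Element-wise wage structure and composition terms; position 0 is the intercept."""
    def fit(y, X):
        Xc = np.column_stack([np.ones(len(y)), X])
        return np.linalg.lstsq(Xc, y, rcond=None)[0], Xc.mean(axis=0)
    beta_a, xbar_a = fit(y_a, X_a)
    beta_b, xbar_b = fit(y_b, X_b)
    return xbar_b * (beta_b - beta_a), (xbar_b - xbar_a) * beta_a

rng = np.random.default_rng(4)
n = 50_000
cat_a = rng.choice(4, n, p=[0.15, 0.30, 0.15, 0.40])   # hypothetical sector shares, group A
cat_b = rng.choice(4, n, p=[0.10, 0.15, 0.35, 0.40])   # hypothetical sector shares, group B
premia_a = np.array([0.00, 0.25, 0.05, 0.10])          # hypothetical sector premia, group A
premia_b = np.array([0.10, 0.20, 0.10, 0.05])          # hypothetical sector premia, group B
y_a = 2.0 + premia_a[cat_a] + rng.normal(0, 0.3, n)
y_b = 2.0 + premia_b[cat_b] + rng.normal(0, 0.3, n)

results = {}
for omit in (0, 2):
    s, c = detailed_ob(y_a, dummies(cat_a, omit), y_b, dummies(cat_b, omit))
    results[omit] = {"total_structure": s.sum(),          # invariant to the omitted category
                     "intercept": s[0],                   # changes with the omitted category
                     "industry_structure": s[1:].sum(),   # changes with the omitted category
                     "industry_composition": c[1:].sum()} # invariant to the omitted category
    print(omit, results[omit])
```

In this simulation the total wage structure effect and the total composition effect of the sector dummies coincide across the two runs, while the split of the wage structure effect between the intercept and the sector dummies does not.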

This invariance issue has been discussed by Oaxaca and Ransom (1999), Gardeazabal and Ugidos (2004), and Yun (2005, 2008), who have proposed tentative solutions to it. These solutions impose some normalizations on the coefficients to purge the intercept from the effect of the omitted category, either by transforming the dummy variables before the estimation, or by implementing the restriction, image, image, via restricted least squares.29 Yun (2005) imposes the constraint that the coefficient on the first category equals the unweighted average of the coefficients on the other categories, image along with image. While these restrictions may appear to solve the problem of the omitted group, as pointed out by Yun (2008) “some degree of arbitrariness in deriving a normalized equation is unavoidable” (p. 31). For example, an alternative restriction on the coefficients, that goes back to Kennedy (1986), could be a weighted sum, image, where the weights image reflect the relative frequencies of the categories in the pooled sample. The coefficients would then reflect deviations from the overall sample mean.

The pitfall here is that the normalizations proposed by Gardeazabal and Ugidos (2004) and Yun (2005) may actually leave the estimation and decomposition without a simple meaningful interpretation. Moreover, these normalizations will likely be sample specific and preclude comparisons across studies. By contrast, in the case of educational categories, the common practice of using high school graduates as the omitted category allows the comparison of detailed decomposition results when this omitted category is comparable across studies.

Invariance of the detailed decomposition with respect to the choice of omitted category may appear to be a desirable property, but it is actually elusive and should not come at the expense of interpretability. There is no quick fix to the difficult choice of the appropriate omitted category or base group, which is actually exacerbated in procedures that go beyond the mean. To mimic the case of continuous variables, one may argue that an education category such as less than high school that yields the smallest wage effect should be the omitted one, but this category may vary more across studies than the high school category. Issues of internal logic have to be balanced with comparability across studies.

Another way of reporting the results of counterfactual experiments, proposed in the context of the gender wage gap by industry, is to report the wage structure effects for each image category by setting image and image for image in the expression (20) for the total wage structure effect


image     (22)


in a case where there are other explanatory variables, image, image.30 Initially, such expressions included only the first two terms, the intercept and the effect of the category image (Fields and Wolff, 1995). Later, Horrace and Oaxaca (2001) added the wage structure effect associated with the other variables. This allows one to compare the effect of wage structure on gender wage differentials by category while controlling for other explanatory variables image, image in a way that is invariant to the choice of omitted category.31 In columns (1) and (2) of Table 3, the wage structure effect associated with variables other than industrial sectors is essentially zero, and the image can be computed as the difference between the male and female coefficients in columns (2) and (3) of Table 2 plus the 0.128 difference in the constant, yielding values of 0.128, 0.022, 0.004, and 0.048 for industries 1 through 4, respectively. Horrace and Oaxaca (2001) also proposed to ex-post normalize the effects of each category with respect to the maximum categorical effect.

One disadvantage of decomposition terms like image relative to the usual components of the detailed decomposition is that they do not sum up to the overall wage structure effect. As a result, just looking at the magnitude of the image terms gives little indication of their quantitative importance in the decomposition. We propose a normalization to help assess the proportion of the total wage structure effect which can be attributed to a category image given that a proportion image of group image workers belongs to that category, and that is also invariant to the choice of omitted category. The normalization uses the fact that the weighted sum of the image, image (that is, including the omitted category), is equal to the total wage structure effect, so that the proportional effect image of category image in the total wage structure can be computed as32


image     (23)


In our empirical example, with group image as the reference group, this expression is computed using female averages; thus image will tell us the proportion of the total wage structure effect that can be attributed to industrial category image given the proportion of women in each category. The numbers are 0.308 for primary, 0.074 for manufacturing, 0.040 for education and health, and 0.578 for other services. Although women are underrepresented in the manufacturing sector, their returns to manufacturing jobs are relatively high, so the share of the unexplained gap attributable to that factor turns out not to be that large.

3.3 Alternative choices of counterfactual

On the one hand, the choice of a simple counterfactual treatment is attractive because it allows us to use the identification results from the treatment effects literature. On the other hand, these simple counterfactuals may not always be appropriate for answering the economic question of interest. For instance, the male wage structure may not represent the appropriate counterfactual for the way women would be paid in the absence of labor market discrimination. If the simple counterfactual does not represent the appropriate treatment, it may be more appropriate to posit a new wage structure. For example, in the case of the gender pay gap, typical proposals (Reimers, 1983; Cotton, 1988; Neumark, 1988; Oaxaca and Ransom, 1994) have used a weighted average expression image, where image corresponds to image, image corresponds to image, and where image could reflect a weighting corresponding to the share of the two groups in the population. Another popular choice is the matrix image, which captures the sample variation in the characteristics of group image and image workers. 33 The decomposition is then based on the triple differences:


image


Table 3 shows that in the NLSY example, the gender gap decomposition is substantially different when either the female wage structure (column 3) or the weighted sum of the male and female wage structures (column 4) is used as the reference wage structure. Typically (as in Bertrand and Hallock (2001) for example), with the female wage structure as reference, the explained part of the decomposition (composition effect) is smaller than with the male wage structure as reference. Indeed, evaluated at either female ‘prices’ or the average of male and female ‘prices’, the total unexplained (wage structure) effect becomes statistically significant.
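To make the role of the reference wage structure concrete, the following sketch computes the explained and unexplained parts of a mean gap for several weighting matrices. Everything here is an assumption for illustration: the group labels A and B, the simulated data, the convention that the gap is measured as group B minus group A, and the form of the reference coefficients as `omega @ bA + (I - omega) @ bB`. The "pooled moments" choice uses the cross-product matrices of the two samples, which reproduces the coefficients of a pooled regression without a group dummy.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients; X must already contain a column of ones."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def decompose(XA, yA, XB, yB, omega):
    """Mean decomposition with reference coefficients b_star."""
    bA, bB = ols(XA, yA), ols(XB, yB)
    b_star = omega @ bA + (np.eye(len(bA)) - omega) @ bB
    xA, xB = XA.mean(axis=0), XB.mean(axis=0)
    explained = (xB - xA) @ b_star
    unexplained = xB @ (bB - b_star) + xA @ (b_star - bA)
    return explained, unexplained

# Simulated data (hypothetical, not the NLSY sample)
rng = np.random.default_rng(0)
nA, nB = 500, 400
XA = np.column_stack([np.ones(nA), rng.normal(1.0, 1.0, nA)])
XB = np.column_stack([np.ones(nB), rng.normal(1.5, 1.0, nB)])
yA = XA @ np.array([1.0, 0.5]) + rng.normal(0, 0.3, nA)
yB = XB @ np.array([1.2, 0.6]) + rng.normal(0, 0.3, nB)

k = XA.shape[1]
choices = {
    "group A structure": np.eye(k),
    "group B structure": np.zeros((k, k)),
    "equal weights": 0.5 * np.eye(k),
    "pooled moments": np.linalg.solve(XA.T @ XA + XB.T @ XB, XA.T @ XA),
}
results = {name: decompose(XA, yA, XB, yB, om) for name, om in choices.items()}
gap = yB.mean() - yA.mean()
for name, (expl, unexpl) in results.items():
    print(f"{name}: explained={expl:.4f}, unexplained={unexpl:.4f}")
```

The two parts always add up to the raw gap, but the split between them moves with the choice of weighting matrix, which is the sensitivity documented in Table 3.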

An alternative measure of “unexplained” differences (see Cain, 1986) in mean wages between group image and group image workers is given by the coefficient image of the group membership indicator variable image in the wage regression on the pooled sample, where the coefficients of the observed wage determination characteristics are constrained to be the same for both groups:


image     (24)


where the vector of observed characteristics image excludes the constant. It follows that,


image


where image. As noted by Fortin (2008), this “regression-compatible” approach is preferable to the one based on a pooled regression that omits the group membership variable (as in Neumark (1988) and Oaxaca and Ransom (1994)), because in the latter case the estimated coefficients are biased (omitted variable bias). Note, however, that this counterfactual corresponds to the case where the group membership dummy is thought to be sufficient to purge the reference wage structure of any group membership effect, an assumption that is maintained in the common practice of using the group membership dummy in a simple regression to assess its effect. The detailed decomposition is obtained using the above triple differences decomposition. 34

The results of this decomposition, reported in column (5) of Table 3, are found to be closest to the one using the female coefficients in column (3), but this is not necessarily always the case. Notice that the magnitude of the total unexplained log wage gap, 0.092 (0.014) log points, corresponds to the coefficient of the female dummy in column (5) of Table 2.
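The correspondence between the dummy coefficient and the unexplained gap is an exact property of OLS and can be checked numerically. The sketch below uses simulated data and hypothetical group labels (A and B, with the dummy equal to one for group B). It fits the pooled regression with a group dummy and compares the dummy coefficient with the raw gap net of the part explained by the pooled slopes.

```python
import numpy as np

# Simulated data (hypothetical); two covariates, no constant in x
rng = np.random.default_rng(1)
nA, nB = 400, 300
xA = rng.normal(0.0, 1.0, (nA, 2))
xB = rng.normal(0.5, 1.0, (nB, 2))
yA = 1.0 + xA @ np.array([0.5, 0.2]) + rng.normal(0, 0.3, nA)
yB = 1.3 + xB @ np.array([0.6, 0.1]) + rng.normal(0, 0.3, nB)

# Pooled regression: constant, covariates, group membership dummy
x = np.vstack([xA, xB])
y = np.concatenate([yA, yB])
d = np.concatenate([np.zeros(nA), np.ones(nB)])
Z = np.column_stack([np.ones(nA + nB), x, d])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
slopes, dummy_coef = coef[1:3], coef[3]

# Raw gap minus the part explained by differences in mean covariates
gap = yB.mean() - yA.mean()
explained = (xB.mean(axis=0) - xA.mean(axis=0)) @ slopes
unexplained = gap - explained
print(unexplained, dummy_coef)
```

The two printed numbers agree up to floating-point error because the OLS residuals average to zero within each group once the constant and the dummy are both included.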

3.4 Reweighted-regression decompositions

A limitation of OB decompositions, discussed by Barsky et al. (2002), is that they may not provide consistent estimates of the wage structure and composition effects when the conditional mean function is non-linear. Barsky et al. (2002) look at the role of earnings and other factors in the racial wealth gap. They argue that a standard OB decomposition is inadequate because the wealth-earnings relationship is non-linear, and propose a more flexible approach instead.

Under the linearity assumption, the average counterfactual wage that group image workers would have earned under the wage structure of group image is equal to image, and is estimated as the product image, a term that appears in both the wage structure and composition effect in Eq. (15). However, when linearity does not hold, the counterfactual mean wage will not be equal to this term.

One possible solution to the problem is to estimate the conditional expectation using non-parametric methods. Another solution proposed by Barsky et al. (2002) is to use a (non-parametric) reweighting approach as in DiNardo et al. (1996) to perform the decomposition. One drawback of this decomposition method, discussed later in the chapter, is that it does not provide, in general, a simple way of performing a detailed decomposition. In the case of the mean, however, this drawback can be readily addressed by estimating a regression in the reweighted sample.

To see this, let image be the reweighting function, discussed in Section 4.5, that makes the characteristics of group image workers similar to those of group image workers. The counterfactual coefficients image and the counterfactual mean image, are then estimated as:35

image

image

where image.36 If the conditional expectation of image given image was linear, both the weighted and unweighted regressions would yield the same consistent estimate of image, i.e. we would have image. When the conditional expectation is not linear, however, the weighted and unweighted estimates of image generally differ since OLS minimizes specification errors over different samples. 37

Consider the “reweighted-regression” decomposition of the overall wage gap image, where


image


The composition effect image can be divided into a pure composition effect image using the wage structure of group image, and a component linked to the specification error in the linear model, image:


image


The wage structure effect can be written as


image


and reduces to the first term image as the reweighting error image goes to zero in large samples (image).

The reweighted-regression decomposition is similar to the usual OB decomposition except for two small differences. The first difference is that the wage structure effect is based on a comparison between image and the weighted estimate image instead of the usual unweighted estimate image. As discussed in Firpo et al. (2007), this ensures that the difference image reflects true underlying differences in the wage structure for group image and image, as opposed to a misspecification error linked to the fact that the underlying conditional expectation is non-linear. Note that it is also useful to check whether the reweighting error image is equal to zero (or close to zero), as it should be when the reweighting factor image is consistently estimated.

The other difference relative to the OB decomposition is that the composition effect consists of a standard term image plus the specification error image. If the model was truly linear, the specification error term would be equal to zero. Computing the specification error is important, therefore, for checking whether the linear model is well specified, and adjusting the composition effect in the case where the linear specification is found to be inaccurate.
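The four pieces of the reweighted-regression decomposition can be illustrated with a small simulation. The sketch rests on several assumptions that are not taken from the text: the group labels A and B, a single discrete covariate (so the reweighting factor can be computed exactly from cell frequencies rather than estimated with a probability model), and a quadratic conditional mean that a linear regression misspecifies. Group A is reweighted to have the covariate distribution of group B.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares; X contains a column of ones."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

rng = np.random.default_rng(2)
nA, nB = 5000, 5000
vals = np.arange(4)
xa = rng.choice(vals, size=nA, p=[0.4, 0.3, 0.2, 0.1])
xb = rng.choice(vals, size=nB, p=[0.1, 0.2, 0.3, 0.4])
yA = 1.0 + 0.2 * xa**2 + rng.normal(0, 0.2, nA)  # non-linear in x
yB = 1.2 + 0.3 * xb**2 + rng.normal(0, 0.2, nB)
XA = np.column_stack([np.ones(nA), xa])
XB = np.column_stack([np.ones(nB), xb])

# Reweighting factor for group A: Pr(x | B) / Pr(x | A), by cell
pA = np.bincount(xa, minlength=4) / nA
pB = np.bincount(xb, minlength=4) / nB
psi = pB[xa] / pA[xa]

bA = wls(XA, yA, np.ones(nA))    # unweighted group A coefficients
bB = wls(XB, yB, np.ones(nB))    # group B coefficients
bAC = wls(XA, yA, psi)           # counterfactual (reweighted) coefficients
xbarA, xbarB = XA.mean(axis=0), XB.mean(axis=0)
xbarAC = (XA * psi[:, None]).mean(axis=0)  # counterfactual covariate means

pure_structure = xbarB @ (bB - bAC)
reweighting_error = (xbarB - xbarAC) @ bAC
pure_composition = (xbarAC - xbarA) @ bA
specification_error = xbarAC @ (bAC - bA)
gap = yB.mean() - yA.mean()
print(pure_structure, reweighting_error, pure_composition, specification_error)
```

With exact cell reweighting the reweighting error is zero up to rounding, while the specification error is clearly non-zero because the linear model cannot capture the quadratic conditional mean. With an estimated reweighting factor, the reweighting error would be small but not exactly zero.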

In the case where the conditional expectation image is estimated non-parametrically, a whole different procedure would have to be used to separate the wage structure into the contribution of each covariate. For instance, average derivative methods could be used to estimate an effect akin to the image coefficients used in standard decompositions. Unfortunately, these methods are difficult to use in practice, and would not be helpful in dividing up the composition effect into the contribution of each individual covariate.

On a related note, Kline (2009) points out that the standard OB decomposition can be interpreted as a reweighting estimator where the weights have been linearized as a function of the covariates. This suggests that the procedure may actually be more robust to departures from linearity than what has been suggested in the existing literature. Since the procedure is robust to these departures and remains the method of choice when linearity holds, Kline (2009) points out that it is “doubly robust” in the sense of Robins et al. (1994) and Egel et al. (2009).

3.5 Extensions to limited dependent variable models

OB decompositions have been extended to cases where the outcome variable is not a continuous variable. To mention a few examples, Gomulka and Stern (1990) study the changes over time in labor force participation of women in the United Kingdom using a probit model. Even and Macpherson (1990) decompose the male-female difference in the average probability of unionization, while Doiron and Riddell (1994) propose a decomposition of the gender gap in unionization rate based on a first order Taylor series approximation of the probability of unionization. Fitzenberger et al. (forthcoming) use a probit model to decompose changes over time in the rate of unionization in West and East Germany. Fairlie (1999, 2005) discusses the cases of the racial gaps in self-employment and computer ownership. Bauer and Sinning (2008) discuss the more complicated cases of a count data model, for example where the dependent variable is the number of cigarettes smoked by men and women (Bauer et al., 2007), and of a truncated dependent variable, where for example the outcome of interest is hours of work.

In the case of a limited dependent variable image, the conditional expectation of image is typically modeled as a non-linear function in image, image. For example, if image is a dichotomous outcome variable (image) and image is a latent variable which is linear in image, it follows that image where image is the CDF of image. When image follows a standard normal distribution, we have a standard probit model and image. More generally, under various assumptions regarding the functional form image and/or the distribution of the error terms image, the models are estimated by maximum likelihood.

Because image, the decomposition cannot simply be computed by plugging in the estimated image’s and the mean values of image’s, as in the standard OB decomposition. Counterfactual conditional expectations have to be computed instead, and averaged across observations. For example, if group image is thought to be the reference group, image will be the counterfactual conditional expectation of image that would prevail if the coefficients of the determinants of self-employment (for example) for group image were the same as for group image. This involves computing predicted (i.e. expected) values based on the estimated model for group image, image, over all observations in group image, and averaging over these predicted values.

The mean gap between group image and group image is then decomposed as follows


image


into a component that attributes differences in the mean outcome variable to differences in the characteristics of the individuals, and a component that attributes these differences to differences in the coefficients.

The same difficult issues in the appropriate choice of counterfactuals persist for more general non-linear models. In addition, extra care has to be taken to verify that the sample counterfactual conditional expectation lies within the bounds of the limited dependent variable. For example, Fairlie (1999) checks that average self-employment for Blacks predicted from the White coefficients is not negative.
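The averaging of counterfactual predicted values can be sketched as follows. To stay within NumPy, the sketch uses a logit model fitted by Newton-Raphson rather than the probit discussed above; that substitution, the group labels A and B (with A as the reference group), and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, iters=50):
    """Maximum likelihood logit via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return beta

# Simulated binary outcomes (hypothetical data)
rng = np.random.default_rng(3)
n = 2000
XA = np.column_stack([np.ones(n), rng.normal(0.0, 1.0, n)])
XB = np.column_stack([np.ones(n), rng.normal(0.5, 1.0, n)])
yA = (rng.random(n) < sigmoid(XA @ np.array([-1.0, 0.8]))).astype(float)
yB = (rng.random(n) < sigmoid(XB @ np.array([-0.5, 1.0]))).astype(float)

bA, bB = fit_logit(XA, yA), fit_logit(XB, yB)

mean_B = sigmoid(XB @ bB).mean()          # group B, own coefficients
counterfactual = sigmoid(XB @ bA).mean()  # group B, group A coefficients
mean_A = sigmoid(XA @ bA).mean()          # group A, own coefficients

coefficients_part = mean_B - counterfactual
characteristics_part = counterfactual - mean_A
print(coefficients_part, characteristics_part)
```

Because each logit includes a constant, the average predicted probability in each group equals the sample mean of the outcome, so the two parts add up to the raw gap in means. The counterfactual average is a mean of probabilities and therefore stays inside the unit interval, which is the kind of bounds check mentioned above.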

The non-linear decomposition may perform better than the linear alternative (linear probability model, LPM) when the gap is located in the tails of the distribution or when there are very large differences in the explanatory variables, whose effects would remain unbounded in an LPM. On the other hand, there are many challenges in the computation of detailed decompositions for non-linear models. Because of non-linearity, the detailed decomposition of the two components into the contribution of each variable, even if the decomposition was linearized using marginal effects, would not add up to the total. Gomulka and Stern (1990) and Fairlie (2005) have proposed alternative methodologies based on a series of counterfactuals, where the coefficient of each variable is switched to reference group values in sequence. In the latter cases, the decomposition will be sensitive to the order of the decomposition, that is, it will be path dependent. We discuss these issues further in the context of the decompositions of entire distributions in Section 5.
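Path dependence is easy to see in a stylized example. The sketch below does not reproduce the procedure of any of the cited papers. It takes two hypothetical coefficient vectors for a logit-type model as given, switches the coefficients of two variables to the reference values one at a time, and records the change in the average predicted probability at each step, under both possible orderings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([
    np.ones(n),
    rng.normal(1.0, 1.0, n),
    rng.normal(-0.5, 1.0, n),
])
beta_own = np.array([-1.0, 1.2, -0.4])  # hypothetical own-group coefficients
beta_ref = np.array([-1.0, 0.5, 0.3])   # hypothetical reference coefficients

def mean_prob(beta):
    return sigmoid(X @ beta).mean()

def sequential_contributions(order):
    """Switch coefficients to reference values one variable at a time."""
    beta = beta_own.copy()
    contrib = {}
    for j in order:
        before = mean_prob(beta)
        beta[j] = beta_ref[j]
        contrib[j] = before - mean_prob(beta)
    return contrib

first_then_second = sequential_contributions([1, 2])
second_then_first = sequential_contributions([2, 1])
total = mean_prob(beta_own) - mean_prob(beta_ref)
print(first_then_second, second_then_first, total)
```

Both orderings add up to the same total, but the contribution assigned to each variable depends on which coefficient is switched first.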

3.6 Statistical inference

OB decompositions have long been presented without standard errors. More recently, Oaxaca and Ransom (1998), followed by Greene (2003, p. 53–54), have proposed approximate standard errors based on the delta method, under the assumption that the explanatory variables are fixed.38 A more modern approach where, as above, image are stochastic was suggested and implemented by Jann (2005). In cases where the counterfactuals are not a simple treatment, or where a non-linear estimator is used, bootstrapping the entire procedure may prove to be the practical alternative.
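Bootstrapping the entire procedure amounts to re-running the whole decomposition on resampled data and taking the standard deviation of the resulting estimates. The sketch below is one simple way of doing this, under assumptions that are not taken from the text: observations are resampled with replacement separately within each group, the statistic is the unexplained part of a standard OB decomposition with group A coefficients as reference, and the data are simulated.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ob_unexplained(XA, yA, XB, yB):
    """Unexplained part of the mean gap, group A coefficients as reference."""
    bA, bB = ols(XA, yA), ols(XB, yB)
    return XB.mean(axis=0) @ (bB - bA)

def bootstrap_se(XA, yA, XB, yB, reps=500, seed=0):
    """Bootstrap standard error, resampling rows within each group."""
    rng = np.random.default_rng(seed)
    draws = np.empty(reps)
    for r in range(reps):
        ia = rng.integers(0, len(yA), len(yA))
        ib = rng.integers(0, len(yB), len(yB))
        draws[r] = ob_unexplained(XA[ia], yA[ia], XB[ib], yB[ib])
    return draws.std(ddof=1)

# Simulated data (hypothetical)
rng = np.random.default_rng(5)
nA, nB = 500, 400
XA = np.column_stack([np.ones(nA), rng.normal(1.0, 1.0, nA)])
XB = np.column_stack([np.ones(nB), rng.normal(1.5, 1.0, nB)])
yA = XA @ np.array([1.0, 0.5]) + rng.normal(0, 0.3, nA)
yB = XB @ np.array([1.2, 0.6]) + rng.normal(0, 0.3, nB)

estimate = ob_unexplained(XA, yA, XB, yB)
se = bootstrap_se(XA, yA, XB, yB)
print(estimate, se)
```

Because the coefficients and the covariate means are both recomputed in every replication, the resulting standard error treats the explanatory variables as stochastic. The same loop applies unchanged if `ob_unexplained` is replaced by a reweighted or non-linear decomposition.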
