Handbook of Labor Economics, Vol. 4, No. Suppl PA, 2011

ISSN: 1573-4463

doi: 10.1016/S0169-7218(11)00409-6

Chapter 3 Lab Labor: What Can Labor Economists Learn from the Lab?

Gary Charness, Peter Kuhn


University of California, Santa Barbara, United States

Abstract

This chapter surveys the contributions of laboratory experiments to labor economics. We begin with a discussion of methodological issues: when (and why) is a lab experiment the best approach; how do laboratory experiments compare to field experiments; and what are the main design issues? We then summarize the substantive contributions of laboratory experiments to our understanding of principal-agent interactions, social preferences, union-firm bargaining, arbitration, gender differentials, discrimination, job search, and labor markets more generally.

JEL classification

• C9 • J0

Keywords

• Laboratory experiment • Social preferences • Principal-agent models • Personnel economics

The economics literature has witnessed an explosion of laboratory experiments in the past 20 years. Many of these experiments have focused on topics that are central to the field of labor economics, including how workers respond to various forms of compensation, and the economics of discrimination, arbitration, bargaining, and matching. In this chapter we survey the contributions of laboratory experiments to our understanding of these questions.

We begin our review with a discussion of methodological issues: First, we pose the general question of why (and more importantly when) a labor economist might want to conduct a laboratory experiment: What types of questions, if any, are laboratory experiments best suited to answer? How do laboratory experiments compare to field experiments? Next, once one has decided to conduct a laboratory experiment, how should it be designed? Here we review the main methodological decisions an experimenter typically needs to make, and the advantages and disadvantages of the various choices.

The second half of our review turns its attention to the substantive issues in labor economics that have been addressed using laboratory experiments. While these are wide-ranging, we focus our review on the set of issues that have generated probably the largest volume of experimental papers in labor economics: the effects of compensation policies on the supply of effort by workers. We do this in two parts. The first uses “traditional” principal-agent theory as a theoretical lens to derive predictions regarding the effects of incentives, and tests these predictions in the lab. Many are confirmed; at the same time a number of robust “anomalies”, such as apparent gift exchange, also appear. The second part focuses specifically on the use of experiments and the development of new theoretical models of social preferences to understand these anomalies. Finally, we also provide brief guides to the laboratory literature on a number of other labor-related topics, including union-firm bargaining, arbitration, gender differentials, discrimination, and job search.

1 Why Laboratory Experiments?

Why should labor economists care about laboratory experiments? After all, there are plenty of field data available for empirical tests. In addition, there have been a number of objections to lab experiments concerning issues such as a lack of realism (external validity), demand effects, and selection effects. Indeed, the laboratory is an artificial environment. On the other hand, lab experiments have some important advantages over other approaches; we begin this section with a discussion of these advantages, then move on to critiques and responses.

1.1 Advantages of laboratory experiments

Most practitioners of lab experiments would probably agree that a key advantage is the ability to control conditions more tightly than in any other context. For example, testing theory is a basic component of both the physical and social sciences, and the scientific method relies upon explicit tests of theory. While empirical data are indeed rich and abundant, they reflect a variety of environmental factors; disentangling these factors is difficult if not impossible.

Falk and Fehr (2003) provide the concrete example of testing tournament theory (e.g., Lazear and Rosen, 1981), where contestants should in equilibrium choose effort levels to equate the marginal cost of effort with the marginal gain. Since a direct empirical test of this theory must take into account “the number of workers who compete for the prize, the effort cost functions of the workers, the exact level of the prize, and the production level including the nature of the error term” (p. 400), such a test seems impossible with traditional empirical data.1 However, all of these factors can be controlled (and systematically varied) in a laboratory experiment. In this manner, it is also feasible to study the impact of specific institutional arrangements on behavior by systematically varying them.
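
As a minimal illustration of why each of these ingredients matters, the following Python sketch computes the symmetric equilibrium effort in a two-player rank-order tournament. The functional forms are assumptions made here for illustration, not taken from Falk and Fehr (2003): output equals effort plus an i.i.d. shock that is uniform on [-q, q], the cost of effort is e^2/c, and the winner earns `spread` more than the loser. All parameter names and values are hypothetical. Under these assumptions the first-order condition equates the marginal cost 2e/c with the marginal gain spread/(2q).

```python
def equilibrium_effort(spread, q, c):
    """Symmetric equilibrium effort under the assumed forms.

    Marginal cost 2e/c equals marginal gain spread * g(0), where
    g(0) = 1/(2q) is the density of the shock difference at zero.
    """
    return spread * c / (4.0 * q)


def win_probability(d, q):
    """P(own output > rival output) when own effort exceeds the rival's by d.

    The difference of two independent U[-q, q] shocks is triangular on [-2q, 2q].
    """
    d = max(-2.0 * q, min(2.0 * q, d))
    if d >= 0:
        return 1.0 - (2.0 * q - d) ** 2 / (8.0 * q ** 2)
    return (2.0 * q + d) ** 2 / (8.0 * q ** 2)


def best_response(rival_effort, spread, q, c, grid_step=0.5, max_effort=100.0):
    """Grid search for the effort that maximizes expected prize gain minus cost."""
    steps = int(max_effort / grid_step)
    grid = [i * grid_step for i in range(steps + 1)]

    def payoff(e):
        return spread * win_probability(e - rival_effort, q) - e ** 2 / c

    return max(grid, key=payoff)


# Hypothetical calibration: prize spread 1.0, shock range q = 40, cost scale c = 10000.
e_star = equilibrium_effort(1.0, 40.0, 10000.0)
# The equilibrium effort should be a best response to itself.
check = best_response(e_star, 1.0, 40.0, 10000.0)
```

Changing any single argument (the prize spread, the shock range, or the cost scale) shifts the predicted effort, which is exactly the kind of variation that can be imposed in the lab but is rarely observed cleanly in field data.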

Of course, the same argument can be made regarding the effects of workers’ outside options, minimum wages, sick pay, discrimination, etc. As another example, the gift-exchange game (an experimental model of Akerlof (1982)) tests for a positive relationship between wages and effort; as this is a critical assumption of efficiency-wage theories (as discussed for example in Akerlof (1984) and Akerlof and Yellen (1986)), it is useful to test it; however, it is quite difficult to do so with standard field data. In general, the laboratory offers superior control, which makes it possible to identify causal relationships. In field data, variables are often determined endogenously and usually it is only possible to identify correlation. Finally, if one has doubts concerning data reported in a laboratory experiment, one can readily replicate the experiment (particularly when a standard subject pool is used).2

One of the greatest strengths of lab experiments is the ability to take a specific theoretical model (say of behavior under a specific group incentive scheme, with no communication between players and one-shot interaction), where theory says exactly what, say, the Perfect Bayesian equilibrium should be, and have real agents play exactly that game with real monetary consequences. One can then compare the predictions of the model to what happens; if the theory is rejected it is then relatively easy to test competing explanations (e.g. inequity aversion, “reciprocity”, loss aversion, framing) for the rejection. One way to think about a lab experiment is as a first link in a longer chain running from theory to actual interactions in real firms. A distinct role of lab experiments applies to situations where ‘standard’ game theory doesn’t give us crisp predictions, as in games with multiple equilibria. Important examples include production in teams with complementarities among the agents’ efforts (Brandts and Cooper, 2007), where the multiple equilibria are due to coordination problems, and repeated principal-agent interactions (e.g. Brown et al., 2004) where the folk theorem can generate an infinity of equilibria. In these cases, experiments provide information on how people behave in situations where existing theory provides little or no guide to what should happen.

The theory of mechanism design often provides suggestions for a number of mechanisms that are predicted to yield socially-efficient outcomes, or sometimes for a number of alternative mechanisms that are all predicted to be equivalent (for example, it is well known that a properly-designed tournament should be able to exactly mimic an efficient individual piece rate). Lab experiments, however, can be used to show that mechanisms/institutions that work in theory don’t always do so in practice, even under conditions designed to be ideal for the institution. Also, institutions that are predicted to yield identical results may not do so. In fact, the use of lab experiments to ‘pre-test’ proposed allocation mechanisms before implementing them in the real world already has an established history (see Plott (1987) for some examples). The same is true of lab experiments in at least two labor contexts: the design of matching mechanisms in professional labor markets (e.g. McKinney et al., 2005) and the design of arbitration mechanisms for public-sector union bargaining (e.g. Deck et al., 2007).

A sometimes-overlooked advantage of lab experiments is their low cost, especially compared to field experiments and survey data collection: competing explanations can often be tested or distinguished quickly and inexpensively with a modest number of sessions. In this sense, dropping laboratory experiments from our toolkit would be a little like dropping animal studies from cancer research: while results from animal studies do not always apply to humans, the ability to test many hypotheses cheaply under carefully controlled conditions provides an indispensable tool for the development of models that work in the real world.

Another comparative advantage of lab experiments is in the study of phenomena that are hard to measure in the field because they are illegal or face disapproval, such as acts of sabotage, discrimination, and spite. It is also relatively easy to measure agents’ beliefs in the lab, using monetary incentives. This is important in view of the role played by beliefs in many game-theoretic models. Belief data from experiments has been central in the development of new behavioral theory such as guilt aversion (see Charness and Dufwenberg (2006) and Battigalli and Dufwenberg (2007, 2009)).

Finally, lab experiments offer unique opportunities to researchers who are interested in the form of strategies used by agents in solving dynamic problems (or playing dynamic games). An illustrative example here is the classic problem of search from a fixed wage distribution, which is often used by labor economists to model individual workers’ unemployment spells. Theory has strong predictions here—that the optimal strategy has a reservation wage property—but it is difficult to test this prediction from field data (whether experimental or not) because strategies need to be inferred from choice histories. In contrast, the lab makes it easier to elicit subjects’ strategies more directly in a number of ways, including asking subjects to describe their strategy (Hey, 1982), observing subjects’ use of information boards (Sonnemans, 1998), and forcing subjects to play the game using the strategy method (Sonnemans, 1998; Brown et al., forthcoming). This approach has identified some interesting deviations between actual and predicted strategies (for example, subjects seem to condition their acceptance behavior on factors like their total earnings to date, which is not optimal) that researchers are now attempting to understand using a variety of behavioral approaches.
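
The reservation wage property can be illustrated with a short Python sketch. The setup is an assumption chosen for simplicity rather than the design of any study cited above: an infinite-horizon search problem with no discounting, a fixed cost per draw, and wage offers drawn i.i.d. from a uniform distribution. In that case the reservation wage r solves cost = E[max(w - r, 0)]. The function names and parameter values are hypothetical.

```python
def expected_gain(r, lo, hi):
    """E[max(w - r, 0)] for w ~ Uniform[lo, hi], valid for lo <= r <= hi."""
    return (hi - r) ** 2 / (2.0 * (hi - lo))


def reservation_wage(cost, lo=0.0, hi=100.0, tol=1e-9):
    """Solve cost = E[max(w - r, 0)] by bisection (the gain is decreasing in r)."""
    a, b = lo, hi
    while b - a > tol:
        mid = (a + b) / 2.0
        if expected_gain(mid, lo, hi) > cost:
            a = mid  # searching once more is still worth it at this wage
        else:
            b = mid
    return (a + b) / 2.0


def follows_reservation_rule(history):
    """Check whether a choice history is consistent with a single cutoff.

    history is a list of (offer, accepted) pairs; the rule holds if every
    rejected offer lies below every accepted offer.
    """
    rejected = [w for w, accepted in history if not accepted]
    accepted = [w for w, accepted in history if accepted]
    return not rejected or not accepted or max(rejected) < min(accepted)


# Hypothetical example: offers uniform on [0, 100], search cost of 2 per draw.
r_star = reservation_wage(2.0)
```

A check such as `follows_reservation_rule` only tests consistency with some constant cutoff; it cannot recover the strategy itself, which is why directly elicited strategies are informative.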

1.2 Objections to laboratory experiments and some responses

The most common objection to the data obtained through laboratory experiments is that they have nothing to do with the field environment (no external validity). In principle, this is a serious objection, of course. It comprises a number of facets, such as the fact that the participants are usually undergraduate students, who typically have little experience in labor markets (particularly as firms), the issue that the stakes are low, and the fact that the “labor task” is often simply the choice of how much money to assign to another party. One might also be concerned that participants are affected by the mere fact that they are being observed (Hawthorne effects).

There are at least two main responses to these objections. First, as pointed out by Falk and Heckman (2009), “for the purpose of testing theories, [representative evidence] is not a problem because most economic models derive predictions that are independent of assumptions concerning participant pools” (p. 537). Of course, it seems better to have a richer variation than is provided by undergraduate students, the most convenient source of participants for experiments conducted by academic researchers.3 The second response involves the use of more “real-world” participants, allowing for agent self-selection in the lab, less artificial tasks, and higher stakes.

Laboratory experiments have been conducted on soldiers (Fehr et al., 1998), Costa Rican coffee-plantation CEOs (Fehr and List, 2004), Chinese central planners (Cooper et al., 1999), professional arbitrators (Farber and Bazerman, 1986), Ghanaian manufacturing workers (Barr and Serneels, 2009), Japanese shrimp fishermen (Carpenter and Seki, forthcoming), and employees at large French firms (Charness and Villeval, 2009), among others.4 In fact, the performance by student participants is often fairly closely matched in such experiments. In addition, many if not most field experiments on incentive effects focus on highly specific industries or occupations, such as windshield repairers (Lazear, 2000), tree planters (Shearer, 2004), fruit pickers (Bandiera et al., 2005), and bicycle messengers (Fehr and Goette, 2007).5 Virtually all of these seem likely to be a more highly-selected population than college students as a group, who can reasonably be considered to be representative of the college-educated labor force. Therefore, if the goal is to identify general principles that apply broadly to a large population of workers, college students might be a more attractive choice than workers in a single, narrowly-defined occupation or industry.

A second approach to the “representativeness” issue is to mimic, in the lab, the same sorts of self-selection that generate different subpopulations in the real world. Clearly, such selection can be important, for example, if altruistic workers tend to self-select into cooperative work environments (such as teams), or risk-loving (or overconfident) workers self-select into highly competitive work environments. Persons in jobs that frequently require them to make “tough” decisions such as cutting workers’ pay or firing them (i.e. managers) might have highly selected social preferences indeed, so that laboratory experiments that randomly assign college students to represent “principals” may provide a particularly poor guide to the decisions of real managers. To some extent, however, laboratory experiments can allow for such self-selection and even shed important light on how it works. Interesting examples of this approach in a non-labor-market context include Lazear et al. (2006) and DellaVigna et al. (2009), who allow experimental subjects to self-select out of a situation where they are “expected” to be altruistic. In a labor market context, Eriksson et al. (2008) show that allowing risk-averse subjects to self-select out of tournaments improves tournament performance. A series of interesting experiments beginning with Gneezy et al. (2003) show that women tend to self-select out of tournaments. A key advantage of the lab in addressing these self-selection questions is the opportunity to directly measure, and control for, confounding factors such as the agent’s ability at the task, her perception of her own and others’ abilities, loss aversion, and risk aversion.

Concerning the objection that the labor task is abstract and artificial, there has been an increasing trend towards “real-effort” experiments, in which tasks have included proofreading (Frohlich and Oppenheimer, 1992), solving puzzles (Rutström and Williams, 2000), mazes (e.g., Gneezy et al., 2003), anagrams (Charness and Villeval, 2009), complex optimization problems (Van Dijk et al., 2001), simple clerical tasks (Falk and Ichino, 2006; Carpenter et al., 2010) and cracking walnuts (Fahr and Irlenbusch, 2000).

Regarding the issue of small stakes, laboratory experiments have been conducted in locations where the stakes translated into more than a month’s earnings (e.g., Fehr et al., 2002; Slonim and Roth, 1998), with evidence that fairness considerations still seem to apply.6 Furthermore, it is not obvious whether stakes involving larger sums of money or the small stakes that apply to decisions people make on a daily basis are more relevant for economic purposes. It is also the case that large stakes do not necessarily lead to fewer mistakes, as shown by Ariely et al. (2008). Finally, it is possible that participants behave differently due to scrutiny (Levitt and List, 2007). As discussed by Falk and Heckman (2009), in many laboratory experiments involving more complex decisions, this is likely to be only a minor problem.7 And of course scrutiny can be present in the field as well, as workers are often monitored. In any event, scrutiny can be eliminated (or systematically varied); some experimenters use double-blind techniques, where payments are placed in envelopes by monitors who have not observed the experiment, so that participants understand that the experimenter cannot know their choices.

Thus, many of the objections raised against laboratory experiments are either red herrings or can be met by taking the laboratory to the field, using “real” people (of course, students are real people as well, and they respond to the financial incentives provided), real-effort tasks, and varying the stakes.8 So, while there are certainly issues in taking the results of laboratory experiments to the field environment, these can be ameliorated. The real value of laboratory experiments is in the enhanced opportunities for, and lower cost of, carefully-controlled variation, as is required for causal knowledge rather than simple correlation. This control extends to environmental features such as institutions, payoff parameters, participant pools, the nature of the interaction among the participants (e.g., anonymity versus face-to-face; one-shot versus repeated), and even the level of scrutiny. To quote Falk and Heckman (2009): “Laboratory experiments are very powerful whenever tight control … is essential. … Tight control … also allows replicability of results, which is generally more difficult with field data” (p. 537). If one wishes to perform careful tests of theory, laboratory experiments are particularly useful. Of course, none of the comments above should be taken to imply that laboratory techniques are intrinsically superior to standard empirical data or field experiments. We discuss this issue in some detail in the next subsection.

1.3 Laboratory experiments and field experiments

Is a hammer a better tool than a screwdriver (or vice versa)? Sometimes one needs a hammer and sometimes one needs a screwdriver. They are different tools, suited for different purposes. Claiming superiority for one tool over the other seems misplaced. This principle also applies to research methods, as each method has its own strengths and weaknesses. The idea is not a new one; Runkel and McGrath (1972) identify eight research strategies (including field studies, field experiments, and laboratory experiments), which they categorize along the two dimensions of obtrusive-unobtrusive research operations and universal-particular behavior. They state: “We cannot emphasize too strongly our belief that none of these strategies has any natural or scientific claim to greater respect from researchers than any other” (p. 89).9

This brings us to the current debate about the value of field experiments compared to the value of laboratory experiments. List (2008) and Levitt and List (2009) extol the value of field experiments, often mentioning the notion that these are a useful bridge between naturally-occurring environments and laboratory experiments. Levitt and List (2007) provide a criticism of laboratory experiments, noting factors that are beneficial in field experiments as well as a number of factors that make the interpretation of data in laboratory experiments problematic. Their main issue is the degree to which “the insights gained in the lab can be extrapolated to the world beyond” (p. 153); this is also known as external validity. They mention five factors (p. 154) that can influence behavior in the lab; in our view, the three most relevant of these involve the nature and extent of scrutiny (on which the greatest emphasis is placed in the paper), the context in which the decision is embedded, and the stakes of the game.

Falk and Heckman (2009) provide a response to these comments, and strongly emphasize the value of laboratory experiments. The thrust of their argument is that the controlled variation possible in laboratory experiments facilitates tests of theory, causal effects, and treatment effects.10 To a certain extent, field experiments also can provide fairly good control of the environment, although rarely to the level attainable with laboratory experiments.11 Falk and Heckman address the notion that the conditions in field experiments are more “realistic”; for example, they point out that it is unclear whether undergraduate students are less representative of the overall population than sports-card traders in their natural setting.12

A number of arguments regarding scrutiny are mentioned above. Indeed, there is no doubt that the mere act of scrutiny can affect behavior.13 However, since there is also scrutiny in field environments and since the sense of scrutiny and the associated possibility of “demand effects” (where the participant acts in a manner that he or she believes reflects the experimenter’s desired outcome) can be nearly eliminated,14 this concern seems somewhat overstated. The notion that social preferences can be crowded out by large financial incentives is hardly new, as it is a feature of the Rabin (1993) model. Nevertheless, as Falk and Heckman point out, many real-life decisions involve small stakes, so that it is not clear that one requires larger stakes in order to provide incentives that are meaningful enough to match the relevant field environment.

Despite the divergent views expressed in these articles and others, it is worth noting that there is indeed common ground. For example, in both Levitt and List (2007) and Falk and Heckman (2009), the authors discuss how one needs a model or theory to transport findings to new populations or environments, whether these data originate in laboratory experiments or field experiments. These articles also appear to agree that the controlled variation in the laboratory is better for careful tests of theory. Both state that there are shortcomings in both laboratory and field experiments, but that each can provide useful insights. In fact, both camps apparently agree that both forms of experimentation (as well as hybrids) can be combined to yield a better understanding of the phenomena involved. One interesting point is that two of the key players in the debate (Falk and List) have used and continue to use both laboratory and field experiments in their research.15 In a certain sense, one wonders what the shouting is about.

Our own view is that laboratory experiments are best at testing theory and identifying treatment effects, and they can also provide useful qualitative insights. However, any assumption that the quantitative levels of behavior observed in the laboratory apply to naturally-occurring settings must be carefully considered, as the laboratory is only a model of the field environment and cannot include many details that may influence behavior.16 Field experiments, for their part, offer promise in areas that are not readily susceptible to laboratory experimentation and generally involve a greater range of personal and demographic characteristics. Field experiments are especially valuable to the extent that they can capture more realistic behavior (particularly in settings where the participants are unaware that there is an ongoing experiment). That said, a similar level of care needs to be taken in applying quantitative estimates from some highly selected field populations (fruit pickers, bicycle messengers, tree planters, school children) to other field populations. These research methodologies are complements, not substitutes. One should use the most appropriate tool or tools for the job at hand.

2 Issues in Designing Laboratory Experiments

Suppose you have decided that a laboratory experiment is a fruitful way to address a research question. This section reviews some of the main design questions the investigator typically needs to address. We do this in two stages: first, we consider general issues that arise in almost all laboratory experiments, not necessarily restricted to questions in labor economics. Second, we focus specifically on the design of “supply of effort” experiments, which constitute the main focus of our review of the substantive research.

2.1 General design questions

The first and most basic question is how closely to try to match the field environment. This will depend to a substantial degree on whether one is testing theory, one is trying to isolate a treatment effect, or one is trying for realism in an effort to draw conclusions about the effect of policy changes in a specific environment.17 While one should be fairly insistent that the details of the design deal correctly with the issues involved with a test of theory, one cannot expect the experimental design to precisely match the field environment. There are typically trade-offs between parsimony and richness. A general rule is to err on the side of simplicity, but to include the central elements of the question at issue. It is fundamental that the participants understand the task at hand, and this varies inversely with the degree of complexity.

Regarding the issue of comprehension, a choice variable is the extent of examples (or even actual coaching) that will be provided. Experimental practice has to some extent evolved over the years. In earlier times, it was customary to provide neither examples nor test questions, out of concern that giving examples could introduce bias or demand effects. However, this policy runs the serious risk that participants will fail to understand some important aspects of the task. It has become nearly standard practice to at least ask participants questions about what outcome would prevail in the event of various combinations of choices; it is also customary to provide examples in the instructions. Of course there is always the possibility that some bias may be introduced due to this process. Nevertheless, one can minimize this possibility by going over every contingency; if this is not feasible due to the presence of a large number of contingencies, one can select “representative” contingencies. Whether examples are needed will depend on the complexity present in the experiment.

A closely-related issue is whether to use an abstract context in the instructions or to provide a richer context that points to the field environment in question. In some experiments (e.g., Charness and Rabin, 2002), the researchers are careful to choose completely neutral terms. The main advantage of this policy is that it may well limit bias.18 On the other hand, many laboratory experiments explicitly label the subjects’ roles as “firms” or “workers” (or even more specifically as “high-ability workers”, etc.) and label the choices as “wages” or “effort”, etc., even though (strictly speaking) the experiment is simply a game with no actual work performed. As with providing examples, the main advantage of a richer context is that it makes it easier for subjects to understand the game, while the primary disadvantage is that it might bring in established behavior patterns/expectations from those environments.

The question of context, framing and reference points is not innocuous. Sometimes the details of instructions given to the subjects can unwittingly cause their behavior to focus on certain outcomes. One nice example of framing comes from Liberman et al. (2004). There are two treatments, both of which feature the identical prisoner’s-dilemma game. However, in one treatment, the game is labeled “The Wall St. Game”, while the game is labeled “The Community Game” in the second treatment. The rate of cooperation was less than 30% in The Wall St. Game, but was over 70% in The Community Game. A second example is provided by Cooper et al. (1999), who find that providing context for Chinese central planners in a lab experiment improved their understanding of the game; however this had no effect on the students in other sessions of the experiment. More recently, Levitt et al. (2009) present evidence suggesting that even professionals (such as world-class poker players who are skilled randomizers in the field) have difficulty transferring those skills to the unfamiliar context of the laboratory.19

Another important design question concerns whether an experiment features multiple periods or not. In our view, the first question the investigator needs to ask here is whether the real-world situation they are interested in understanding most closely resembles (a) repeated interactions between the same decision-makers over a long and indefinite horizon, (b) repeated interactions with a clear end date, or (c) one-shot interactions. Case (a) can be mimicked in the lab by having the same subjects interact repeatedly with the last period of the experiment unknown to the subjects;20 case (b) is straightforward to implement; case (c) can be implemented either by having a single period, or by randomly re-matching subjects between multiple periods (i.e. a “strangers” design). If the experimenter is interested in one-shot interactions, having only one period is in a sense the cleanest design, but it is also the most expensive approach to gathering experimental data. It can also be problematic if agents need some experience to actually understand the game they are playing. This leads most investigators interested in one-shot interactions (where the predictions of theory are usually the sharpest) to implement multiple periods with re-matching. Interestingly, even though “standard” game theory predicts no repeated-game effects under these conditions,21 behavior sometimes resembles the predictions of one-shot models more closely in the last few rounds.
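
The matching protocols described above are easy to sketch in code. The following Python fragment is a hedged illustration, not the procedure of any cited study: `strangers_matching` re-pairs subjects at random each period, and `indefinite_horizon_length` draws a session length under a random continuation rule, which is assumed here as one common way of keeping the last period unknown to subjects. All names and parameters are hypothetical.

```python
import random


def strangers_matching(subject_ids, periods, seed=None):
    """Return one random pairing per period as a list of (a, b) tuples."""
    ids = list(subject_ids)
    if len(ids) % 2 != 0:
        raise ValueError("need an even number of subjects")
    rng = random.Random(seed)
    schedule = []
    for _ in range(periods):
        order = ids[:]
        rng.shuffle(order)
        # Pair neighbours in the shuffled order: (0, 1), (2, 3), ...
        schedule.append([(order[i], order[i + 1]) for i in range(0, len(order), 2)])
    return schedule


def indefinite_horizon_length(continue_prob, rng):
    """Number of periods played when, after each period, play continues
    with probability continue_prob (assumed random termination rule)."""
    periods = 1
    while rng.random() < continue_prob:
        periods += 1
    return periods


# Hypothetical session: 8 subjects, 5 periods of random re-matching.
schedule = strangers_matching(range(8), periods=5, seed=1)
```

Note that purely random re-matching can pair the same two subjects more than once by chance; a design that must rule this out needs a rotation scheme rather than independent shuffles.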

“Partners” designs where agents are matched for the duration of the session are, of course, expected to yield repeated-game effects; the predicted effects of repeated interaction typically differ dramatically between finitely- and infinitely-repeated games (with the “folk theorem” applying to the latter case). In the latter case, experiments are less useful in testing theory than in providing some idea of what tends to happen when “standard” theory has little predictive power. A related design question is whether people are always in the same role or whether this can change from period to period. There is disagreement concerning which approach facilitates learning, but role change permits the experimenter to compare an individual’s behavior across the various roles.

This leads us to the question of whether to use a “within-subjects” design or a “between-subjects” design. Labor economists’ experience with field data where there is typically substantial nonrandom heterogeneity disposes them towards research designs with subject fixed effects; in the laboratory, this requires administering both the treatment(s) and the control situation to the same subjects. Experimenters are accustomed to having (both observable and unobserved) heterogeneity handled by randomization, but are highly sensitive to framing and sequencing effects: a subject’s behavior under one condition may be influenced by exposure to other conditions (something that is usually ruled out by assumption in fixed-effects econometric models). Thus, many experimenters tend to prefer “between-subjects” designs, where each subject is exposed to one and only one treatment.

Labor economists need to be aware of this motivation for the between-subject approach. In some ways, a between-subjects design is cleaner and avoids sequencing effects (although sometimes these are a main topic of interest),22 but it is typically less powerful and costlier to implement; on the other hand, a within-subjects approach tends to be more powerful in statistical tests, but can lead to spurious correlations. An advantage of a within-subjects design is that one can control for individual differences by letting each person serve as their own control. Some experiments combine both. One way to incorporate both approaches is (a) to vary the order of treatments in a within-subject design, then (b) use only the cross-sectional data from the first treatment as a between-subjects experiment to test the robustness of the within-subject approach.
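
The combined approach in steps (a) and (b) can be sketched as follows. This Python fragment is a minimal illustration under assumed data structures (a dictionary of outcomes per subject and treatment), with hypothetical treatment labels "A" and "B"; it is not drawn from any particular study.

```python
import random


def assign_orders(subject_ids, treatments=("A", "B"), seed=None):
    """Step (a): randomly assign half of the subjects to each treatment order."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    first, second = treatments
    return {
        sid: (first, second) if i < half else (second, first)
        for i, sid in enumerate(ids)
    }


def within_subject_effect(data, treatments=("A", "B")):
    """Mean of individual differences (second label minus first label).

    data maps subject id -> {treatment label: outcome}.
    """
    a, b = treatments
    diffs = [obs[b] - obs[a] for obs in data.values()]
    return sum(diffs) / len(diffs)


def first_block_effect(data, orders, treatments=("A", "B")):
    """Step (b): between-subjects comparison using only each subject's
    first treatment, which is free of sequencing effects."""
    a, b = treatments
    first_a = [data[s][a] for s, order in orders.items() if order[0] == a]
    first_b = [data[s][b] for s, order in orders.items() if order[0] == b]
    return sum(first_b) / len(first_b) - sum(first_a) / len(first_a)


# Hypothetical outcomes for four subjects, two per order.
data = {1: {"A": 10, "B": 14}, 2: {"A": 12, "B": 15},
        3: {"A": 8, "B": 13}, 4: {"A": 11, "B": 12}}
orders = {1: ("A", "B"), 2: ("A", "B"), 3: ("B", "A"), 4: ("B", "A")}
```

If the two estimates diverge sharply, that is a sign that exposure to the first treatment is contaminating behavior in the second.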

A truly crucial issue in experimental design is the calibration of the parameters, as results can be very sensitive to parameter values and functional forms. Calibration typically involves establishing a baseline for comparisons. There is little “science” to guide the choice of parameter values; instead, this is an art that is informed by the experimenter’s intuition and experience. However, one tip is to find a calibration for which the baseline treatment’s results leave room to move in either direction (a calibration that leads to a very low or very high rate in the baseline permits movement in only one direction). The researcher must also consider how to justify the choice of parameter values and functional forms.

The choice of payoff method is intimately connected to the issue of calibration. Incentives should be large enough to induce thoughtful and motivated behavior by the participants. An additional consideration is whether to pay for each period or to pay the participant for only one (or several) periods randomly chosen at the end of the session. It is more traditional to pay for each period, but there is a definite trend towards paying for only some random subset of all periods. The latter approach avoids wealth effects (behavior may depend on the amount already earned in a session, since participants often have some form of income target), mitigates boredom in later rounds, and avoids the problem of participants taking chances because they know that their earnings are negative at some point in the session and that negative earnings are uncollectible (bankruptcy).
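The random-period payment rule is simple to implement. The sketch below is illustrative only; the function name `session_payment`, the show-up fee, and the example earnings are assumptions made for this example.

```python
import random


def session_payment(period_earnings, n_paid=1, show_up_fee=0.0, rng=None):
    """Pay a participant for n_paid periods drawn at random without replacement.

    Returns the total payment and the (sorted) indices of the paid periods.
    The draw is made at the end of the session, so earnings in earlier
    periods cannot create wealth effects in later ones.
    """
    rng = rng or random.Random()
    paid = rng.sample(range(len(period_earnings)), n_paid)
    total = show_up_fee + sum(period_earnings[t] for t in paid)
    return total, sorted(paid)


# Toy session (assumed numbers): four periods, one of them with a loss.
earnings = [5.0, -2.0, 8.0, 3.0]
total, paid_periods = session_payment(earnings, n_paid=1, show_up_fee=5.0)
print(total, paid_periods)
```

Setting `n_paid` equal to the number of periods reproduces the traditional pay-every-period rule, which makes the two schemes easy to compare within one codebase.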

Finally, while it is traditional to tell a responder the choice of the paired first mover before the response, a more economical approach involves contingent payments. In this “strategy method” (Selten, 1967), the responder states an action at each and every information set. This permits the researcher to obtain an observation at every node of the game, which is particularly valuable when a node is reached rather infrequently. However, while the quantity of data is maximized, there remains the issue of the quality of the data. The strategy method is quite popular, but remains controversial. The most exhaustive study to date (Brandts and Charness, forthcoming) examines many comparisons of results with the two methods, finding that there is generally no qualitative difference; we are unaware of any experiment in which a treatment effect is found using the strategy method that vanishes when the game is played through (“direct response”). In any case, this is an arrow in the experimentalist’s quiver and is something to consider.
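To make the mechanics of the strategy method concrete, the following sketch uses a hypothetical two-action sequential game. The game, its payoffs, and the function name `play_strategy_method` are assumptions chosen for illustration; they are not taken from Selten (1967) or from any experiment discussed here.

```python
def play_strategy_method(first_mover_choice, responder_plan, payoffs):
    """Resolve one pairing under the strategy method.

    responder_plan maps EVERY possible first-mover action to a response,
    so the researcher records a decision at each information set, while
    payment depends only on the node that is actually reached.
    """
    missing = set(payoffs) - set(responder_plan)
    if missing:
        raise ValueError(f"responder plan is incomplete, missing: {sorted(missing)}")
    response = responder_plan[first_mover_choice]
    return payoffs[first_mover_choice][response]


# Hypothetical payoffs: payoffs[first-mover action][response] = (first mover, responder).
payoffs = {
    "fair": {"accept": (5, 5), "reject": (0, 0)},
    "unfair": {"accept": (8, 2), "reject": (0, 0)},
}
plan = {"fair": "accept", "unfair": "reject"}  # contingent choice at every node

print(play_strategy_method("fair", plan, payoffs))    # (5, 5)
print(play_strategy_method("unfair", plan, payoffs))  # (0, 0)
```

Under direct response the researcher would observe only one entry of `plan` per pairing; here both entries are observed even if the first mover never chooses "unfair".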

We close this section with a list of “fatal errors” mentioned in Holt (2007, p. 14):

1. Inadequate or inappropriate incentives
2. Non-standardized instructions and procedures
3. Inappropriate context
4. Uncontrolled effects of psychological biases
5. An insufficient number of independent observations
6. Loss of control due to deception or biased terminology
7. The failure to provide a calibrated baseline treatment
8. The change in more than one design factor at a time

Needless to say, one should endeavor to avoid these pitfalls.

2.2 Design questions in principal-agent/effort experiments

Principal-agent experiments, discussed in detail in Sections 3 and 4, are a broad class of experiments in which a principal (who in some cases is the experimenter herself) first specifies a “contract” that describes how the agent (who moves second) will be rewarded as a function of his performance in a task. In the second stage, the agent performs the task, choosing—among other things—how much effort to expend. Principal-agent experiments are perhaps the largest class of lab experiments of interest to labor economists; this section discusses some design issues specific to these types of experiments.

A first question in these experiments is whether there will be a market for contracts. Most experiments simply start with firm-worker pairs that can realize some rents if they make an exchange and have a fixed outside option if they do not exchange. For many questions this is perfectly fine. But this leaves no room for labor markets, which can affect and be affected by the nature of principal-agent interactions. Early principal-agent experiments (e.g. Fehr et al., 1993; henceforth FKR) incorporated an ex ante market for labor contracts, and showed that fairness considerations in the principal-agent interactions caused that market to fail to clear. More recently, Charness et al. (forthcoming) have shown how causation can run the other way: introducing ex post labor markets can eliminate the well-known ratchet effect in the repeated principal-agent problem.

If one chooses to implement a market in the laboratory, how can this be achieved? One approach involves an auction, as in FKR, who set up a two-stage game in their experiment. The first stage was a one-sided oral auction in which firms made wage proposals, but could not choose any individual worker, as every worker could accept every offer. If a worker accepted an offered wage, a binding contract ensued; people who were not paired at the end of three minutes received zero profits for this period. In the second stage, workers chose effort anonymously (only the paired firm learned the chosen effort). Other, simpler approaches to modeling agents’ outside options include simply manipulating the agent’s compensation if he/she chooses not to work for the principal to whom he/she has been assigned, or allowing an agent to receive simultaneous offers from more than one principal (see for example Charness et al., forthcoming).

As mentioned earlier, some labor experiments use some form of real effort, while others use a stated effort level that is simply a transfer (at some rate of exchange) from the agent to the principal. An advantage of stated effort is that we know the disutility-of-effort function and can therefore calculate exactly what the equilibrium effort levels should be, according to different theories. This approach also allows the investigator to induce, and manipulate, differences in ability/cost of effort, separately from other personal characteristics (e.g. risk aversion, competitiveness, reciprocity) that might be correlated with it in a sample of persons. Of course, the advantage of using real effort is that the task is more in line with what most people consider labor, and so might be considered to be a better match to the field environment.23

Another design issue concerns who plays the role of the principal. One approach is to place all participants in the role of agents, with the agent’s compensation scheme manipulated by the experimenter. If the researcher’s only interest is in the response of agents to different compensation schemes, one might argue that this is the simplest and most economical design: all subjects are agents, essentially working “for” the experimenter by performing either a real task or selecting a level of “chosen effort”.24 An alternative design assigns some subjects to the role of principals, who choose compensation policies to which agent-subjects respond. Some arguments in favor of the former approach are that (a) subjects might be more disposed to treat the experimenter (as opposed to a fellow student in the lab) like a “real” employer, and (b) the behavior of college students acting as firms provides little insight into the behavior of “real employers”. Also, when participants choose the pay scheme, it is not randomly assigned. On the other hand, the latter approach (with subjects as principals) may have advantages if one is interested in social preferences towards persons other than the experimenter, or in the behavior of principals per se (for example if the subjects are experienced managers).

Additional considerations in the design of principal-agent experiments include whether workers can self-select among reward schemes, such as a tournament or a piece-rate scheme (Niederle and Vesterlund, 2007); whether the experimenter induces reference points (Abeler et al., 2009); allows communication (Charness and Dufwenberg, 2006; Brandts and Cooper, 2007); allows for a monitoring/fines technology (Fehr et al., 2007); or for some coercion of agents (Falk and Kosfeld, 2006). In the case of multiple agents per principal, the experimenter needs to decide whether agents can observe each other’s actions (Falk and Ichino, 2006) or wages (Charness and Kuhn, 2007), whether pay is based on relative performance, or whether subjects interact in teams. All of these, and related questions, constitute the fabric of an extensive research agenda on principal-agent interactions in the lab, which we review in detail in Sections 3 and 4 below.

2.3 Reading papers involving laboratory experiments

Labor economists may be at a loss in reading papers that report the results of laboratory experiments. The format may well be unfamiliar, the design mysterious, and the statistical methods foreign. In addition, many experimental papers seem written for experimental audiences, rather than the general population of economists. Nevertheless, there are some pointers that can be provided for labor economists interested in gleaning the substance and details of experimental papers.

Perhaps the most important factor in reading an experimental paper is to understand the experimental design. This is not always as clear as it should be in the text; often, experimental referees first read the experimental instructions. These should be consulted if there is any doubt concerning the exact procedures. It is critical for the reader to understand the flow of information; this means knowing what the participants knew and when they knew it, in terms of the stages of the experimental game or task. When the design is complex, there is also a concern that participants may not have understood the game or task involved.

Once the instructions are understood, the reader should consider how well the experimental design constitutes an appropriate test of theory or matches the “ideal” field environment of interest. While one should be fairly insistent that the details of the design deal correctly with the issues involved with a test of theory, one cannot expect the experimental design to precisely match the field environment. Still, it is important that the reader is persuaded of the relevance of the experiment to the field or to the theoretical environment. The reader should also be alert to the issue of framing, given the substantial possibility that this affects behavior. To a certain degree, framing effects may wash out when one compares across treatments, but this can be a delicate issue.

An important issue when reading an experimental paper is the presentation and analysis of the data. If one is concerned with “where the bodies are buried”, some degree of caution may be sensible. Authors have been known to put the best face on the data (for example, empirical researchers may tend to report the more useful regressions), so readers should keep this in mind. For example, sometimes articles emphasize (or only report) data from a subset of the periods; at times this can be justified and at times it is merely convenient. Sometimes authors will pool data from treatments; this increases the number of observations and makes statistical tests more powerful, but this pooling must be justified. More generally, experimental papers (like others) sometimes interpret their results in a favorable light. One should consider whether these interpretations are justified and whether there are alternative interpretations.

Regarding the issue of statistical and econometric tests, since most labor experiments in the laboratory feature multiple periods and interaction amongst the participants, one must have some approach towards determining how to treat multiple observations for the same individual. Labor economists are very familiar and comfortable with panel-data techniques, but experimenters are less so. Some feel that each session can only be considered to present one independent observation. A less strict approach is to collapse each individual’s choices to an average, eliminating the issue of multiple observations (but not eliminating the issue of interactions during the session). In either of these cases, it is common for experimenters to report non-parametric tests, and sometimes no regressions are reported; labor economists may be unfamiliar with these tests. One workhorse is the Wilcoxon rank-sum test, which pools the observations (individual participants or individual sessions) from two treatments, ranks them, and then compares the sums of the ranks across treatments. When within-subject data are available, the binomial test is often used; here one can compare changes for each individual across tasks. If these changes go predominantly in one direction or the other, one can conclude that the change in behavior is statistically significant. The reader should understand how these tests (and the ones reported in the article) work.
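For readers who have not met these two tests, the sketch below implements small-sample exact versions of both using only the Python standard library. It is a teaching illustration under stated assumptions: the function names are hypothetical, the data are invented, and the rank-sum p-value is obtained by enumerating every possible assignment of observations to treatments, which is feasible only for small samples (for instance when each session average counts as one observation).

```python
from itertools import combinations
from math import comb


def ranks_with_ties(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test_p(x, y):
    """Exact two-sided Wilcoxon rank-sum test by full enumeration.

    Returns the rank sum of sample x and the share of all possible
    relabelings whose rank sum is at least as far from its expectation.
    """
    ranks = ranks_with_ties(list(x) + list(y))
    n = len(x)
    observed = sum(ranks[:n])
    expected = n * (len(ranks) + 1) / 2
    deviation = abs(observed - expected)
    total = extreme = 0
    for combo in combinations(range(len(ranks)), n):
        total += 1
        if abs(sum(ranks[i] for i in combo) - expected) >= deviation - 1e-9:
            extreme += 1
    return observed, extreme / total


def sign_test_p(changes):
    """Exact two-sided binomial (sign) test that increases and decreases are equally likely.

    Zero changes are dropped, as is conventional for the sign test.
    """
    pos = sum(1 for d in changes if d > 0)
    neg = sum(1 for d in changes if d < 0)
    n, k = pos + neg, max(pos, neg)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented data: four session averages per treatment, fully separated.
print(rank_sum_test_p([1, 2, 3, 4], [5, 6, 7, 8]))  # rank sum 10.0, p = 2/70
# Invented data: nine subjects increase effort, one decreases it.
print(sign_test_p([1] * 9 + [-1]))                   # p = 22/1024
```

In applied work one would normally rely on a statistics library rather than hand-written code; SciPy, for example, provides `scipy.stats.mannwhitneyu` and `scipy.stats.binomtest`.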

3 Testing “Traditional” Principal-Agent Theory in the Lab

The question of how workers’ choices of effort and work hours respond to financial incentives is among the oldest questions in labor economics. In this section we consider how these questions have been addressed in the laboratory; our treatment roughly follows the literature on principal-agent models and in personnel economics by beginning with the simplest forms of work incentives (a wage per hour worked or an individual piece rate), moving on to incentives based on relative performance (tournaments), incentives for teams, multitask settings, and multi-period principal-agent settings. Not only does “traditional” principal-agent theory serve as a useful organizing device for our discussion; many of its predictions are also confirmed in the lab.25

That said, the experiments summarized in this section yield a number of robust results that are inconsistent with standard principal-agent models, including for example a strong apparent tendency by workers to “reciprocate” generous wage offers from firms, even when such reciprocal behavior is costly to workers. In Section 4 we focus specifically on the use of experiments and the development of new theoretical models of social preferences to understand these “anomalies”, with the ultimate goal of developing a more general class of models that is more firmly grounded in empirical fact and might be dubbed behavioral principal-agent theory.

3.1 The basic principal-agent problem: One principal, one agent, one task, and one interaction

3.1.1 Animal labor supply experiments

To the best of our knowledge, the earliest economic studies of the effects of material incentives on labor supply in the laboratory were the animal experiments of the early 1980s (Battalio et al., 1981; Battalio and Kagel, 1985).26 Much of this work is summarized in Kagel et al. (1995); see also Kagel (1987) for a general discussion of the contribution of animal experiments to economics. A key objective of these studies was to test the classic, static economic model of labor supply in which an agent chooses consumption $c$ and leisure $l$ to maximize a quasiconcave utility function $U(c, l)$, subject to the constraint $c = w(T - l) + y$, where $w$ is the wage rate, $y$ is unearned income, and $T$ is the total time available. In these experiments, hungry animals expend real effort (key pecking for pigeons, lever presses for rats) to obtain income; the experimenters then vary both parameters of the budget constraint ($w$ and $y$) exogenously and study the animals’ reactions.

The key prediction tested by the authors is the labor supply response to an income-compensated wage decrease. As predicted by the standard model, both pigeons and rats reduce their labor supply and consumption (Battalio et al., 1981; Battalio and Kagel, 1985). The authors also study the pure income effects of declines in nonlabor income $y$: in virtually all cases these raised labor supply, indicating that leisure is a normal good. The normality of leisure means that it is common to observe backward-bending labor supply curves among animal workers (see for example Battalio et al., 1981, Table 3).27
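The logic behind these predictions can be spelled out with the standard Slutsky decomposition. The notation here is assumed for illustration and is not taken from the studies cited: $c$ is consumption, $l$ leisure, $w$ the wage rate, $y$ unearned income, $T$ total available time, $h = T - l$ labor supply, and $h^{c}$ compensated (utility-constant) labor supply.

$$\max_{c,\,l}\; U(c, l) \quad \text{subject to} \quad c = w\,(T - l) + y, \qquad h \equiv T - l$$

$$\frac{\partial h}{\partial w} \;=\; \underbrace{\frac{\partial h^{c}}{\partial w}}_{\text{substitution effect}\;\ge\;0} \;+\; \underbrace{h\,\frac{\partial h}{\partial y}}_{\text{income effect}}$$

Because the substitution effect is non-negative, an income-compensated wage decrease can never raise labor supply, which is the sharp prediction tested in the animal experiments. If leisure is a normal good then $\partial h / \partial y < 0$, so the income effect works against the substitution effect; where it dominates, the uncompensated labor supply curve bends backward.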

Another interesting feature of the animal studies that generalizes to the plethora of human studies is the presence of large subject effects: while most subjects respond to changes in incentives in the direction predicted by simple utility maximizing models, both the level of effort at any given reward and its responsiveness to incentives vary widely across subjects.

3.1.2 Piece rates and effort

To the best of our knowledge, the first laboratory experiment to examine labor supply responses to wage changes among humans that is couched in economic theory appeared in an accounting journal (Swenson, 1988). Swenson’s subjects supplied “real” effort (repeatedly typing “!” then “enter” on a computer keyboard—this requires two hands and does not allow for continuous cursor movement).28 Wages per character typed were fixed, but “taxed” (this language was used in the subjects’ instructions) at rates ranging from 12 to 87%. Total tax proceeds from the previous session were randomly distributed to the subjects in the following period, mimicking a balanced government budget but breaking most of the connection between current individual effort and future lump-sum income. The primary questions addressed were how labor supply and total tax revenues respond to the tax rate. Both curves were backward-bending, with tax revenues (i.e. the Laffer curve) peaking at the 73% tax rate.29

A decade later, economists Sillamaa (1999a,b) and Dickinson (1999) conducted similar real-effort experiments.30 Like Swenson’s, Sillamaa’s experiments were motivated by questions about the impact of taxation (in one case, the impact of tax progressivity, in the other the effect of a zero top marginal rate), though in her case taxes were never mentioned in the subjects’ instructions. Sillamaa found that (a) work effort responds more (positively) to real wage increases in the presence of an (equivalent) linear than a progressive income tax, and (b) introducing a zero top marginal tax rate also increased effort.

Like Sillamaa, Dickinson (1999) paid his subjects a piece rate, but in some treatments allowed his subjects to choose between two types of leisure: on- versus off-the-job. This modification is noteworthy because it provides one of the few empirical links between the types of work decisions that are usually studied in lab (and field) experiments (effort) and the traditional application of labor supply theory (to hours worked). Specifically, in the baseline (“intensity”) treatments, subjects were required to stay for the entire two-hour experimental period; thus any time not working was spent in the lab. In the “combined” treatment, subjects could leave at any time during the experimental period. Consistent with theory and with previous research, subjects increased their output in the baseline treatment, substituting effort for on-the-job leisure when incentives were strengthened. In the combined treatment, many subjects responded to higher wages by working more quickly, but reducing their total work time by leaving the experiment early. This substitution of off-the-job for on-the-job leisure is offered as a possible explanation for why econometric estimates of labor supply elasticities (which use hours worked, not effort, as their measure of labor supply) are often close to zero. Dickinson’s analysis also points out that care must be taken in relating the results of laboratory labor supply experiments (where workers’ effort during a fixed work period is the outcome of interest) to econometric studies of labor supply (where hours worked is the outcome).

Gneezy and Rustichini (2000) also studied the response of work effort to financial incentives; they conducted real-effort experiments in both the lab and the field, with similar results: the relationship between the piece rate and effort was U-shaped, with low piece rates eliciting less effort than a zero piece rate. (It may be interesting to note that this is exactly the opposite of the backward-bending labor supply curve in the one-period neoclassical model, which yields an inverted U.) The authors hypothesize that small levels of financial compensation (explicit incentives) may “crowd out” workers’ intrinsic motivation to perform these tasks.31 While this explanation may be more relevant to their field experiment (where the workers solicited charitable contributions) than to their lab experiment (which had no charitable component), the phenomenon was observed in both settings. Since earlier studies of piece rates did not, to our knowledge, implement treatments with a zero rate, Gneezy and Rustichini’s results do not necessarily conflict with those findings, whether on human or animal subjects.

In a more recent real-effort experiment, Cadsby et al. (2009) show that the effect of performance incentives varies with agents’ risk aversion. In their experiment, 25% of subjects actually perform worse when incentives are intensified; further, the probability of such deterioration increases with risk aversion and with measures of stress. A similar result is obtained by Ariely et al. (2008), who exposed subjects in the US and India to incentives ranging from small to very large (relative to their typical levels of pay). In many cases, very high rewards had a detrimental effect on performance. Combining these results with the nonmonotonicity identified by Gneezy and Rustichini (2000) above suggests that the effect of stronger incentives on performance, predicted to be monotonic by basic labor supply theory (at least when income effects are unimportant, which is expected for laboratory experiments on humans), may in fact be highly non-monotonic.

3.1.3 Selection into piece rate compensation

Since Lazear’s famous Safelite study (2000), economists have realized that a significant share of the productivity improvements associated with piece rates can take the form of voluntary self-selection of higher-productivity workers into piece rate schemes, rather than changes in the work effort of existing workers. Laboratory studies that allow for self-selection into different pay schemes abound, though many of these focus specifically on selection into tournaments and teams. These studies are discussed later in this section. A recent study that considers the self-selection that occurs when a simple piece rate is introduced is Cadsby et al. (2007). As in most studies, pay-for-performance raises productivity. Like Lazear (2000), they also find that sorting enhances this effect: more productive employees are more likely to choose pay-for-performance schemes.32

A somewhat different perspective on selection into pay-for-performance schemes is provided by two recent papers by Bandiera et al. (2007, 2009a) who conducted field experiments in a fruit-picking firm. When a pay-for-performance element (based on their unit’s output) is added to managers’ compensation schedules, Bandiera et al. (2007) find that managers are more likely to select able workers into the units they manage. In their 2009 paper, the same authors show that this shift towards abler workers came at the expense of workers who were socially connected to the manager. In both papers, the shift away from friends increased the work group’s total output and the manager’s compensation. In contrast, Belot and van de Ven (2009) find in a field experiment with children that agents who are selected because they are friends increase their subsequent performance, presumably to reciprocate the favor of being selected. In such cases, favoring one’s friends can be costless, or even beneficial to the manager and the firm. To our knowledge, the effect of favoritism on selection into pay-for-performance has not yet been studied in the lab.

3.1.4 Reciprocal behavior

Evidence on the apparent presence of reciprocal behavior in workers’ effort choices in economics goes back at least to the pure gift-exchange labor markets implemented by Fehr et al. (1993). Labor contracts in these settings contained no explicit incentives; despite this, workers supplied costly effort, and supplied more effort the higher the (lump sum) wage the principal paid them. We treat this “pure” gift exchange literature in another section; here we provide one or two examples of how workers’ apparent concerns for reciprocity in the laboratory affect the performance of standard incentive contracts, such as piece rates.33

An illustrative paper in this regard is Anderhub et al. (2002), who study the behavior of both principals and agents where the contract specifies the agent’s pay as a linear function of his/her output. (All the papers considered thus far study agents’ reactions to reward schedules set by the experimenter.) Because there is no uncertainty, the efficient linear contract has a piece rate of 100%. Further, because principals make take-it-or-leave-it contract offers to agents in this experiment, the intercept term of the equilibrium linear contract is predicted to extract all of the agent’s surplus if social preferences are absent. As one might expect, principals and agents behave relatively efficiently with respect to the slope of the contract (principals choose a 100% piece rate 30% of the time and a positive piece rate 98% of the time; agents optimized against this, choosing conditionally rational effort levels 87% of the time). Social preferences, however, clearly affected both the principals’ choice of the intercept and agents’ responses to it: agents rejected offers that split the surplus too unevenly, and principals made few such offers. There was also some tendency for generous offers to lead to higher effort levels, though as already noted the vast majority of effort decisions were egoistically rational given the piece rate.
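The claim that the efficient linear contract has a piece rate of 100% follows from a two-line argument. The notation is assumed for illustration and is not Anderhub et al.’s: $x(e)$ is the value of output produced by effort $e$, $c(e)$ the agent’s cost of effort, $s$ the piece rate (the share of output paid to the agent), $F$ the intercept of the contract, and $\bar{u}$ the agent’s payoff from rejecting.

$$\text{Agent:}\quad \max_{e}\; F + s\,x(e) - c(e) \;\;\Longrightarrow\;\; s\,x'(e) = c'(e)$$

$$\text{Efficiency:}\quad \max_{e}\; x(e) - c(e) \;\;\Longrightarrow\;\; x'(e^{*}) = c'(e^{*})$$

The two conditions coincide only when $s = 1$, so with no uncertainty (and hence no risk-sharing motive) the agent should be the full residual claimant. The principal’s payoff is then $(1 - s)\,x(e^{*}) - F = -F$, and with purely egoistic agents she would push the intercept down until the participation constraint binds:

$$F = \bar{u} - \bigl[x(e^{*}) - c(e^{*})\bigr]$$

Observed intercepts that leave the agent more than $\bar{u}$, and rejections of offers that do not, are therefore the natural place to look for social preferences in this design.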

While effects such as those reported above are both dramatic and common in lab experiments, we note that more recently, Gneezy and List (2006) have argued that positive reciprocity effects detected in lab experiments can wear off very quickly in the field; Kube et al. (2006a,b) in turn generate longer-term effects of reciprocity in the field, especially for negative reciprocity. We discuss these questions further in the section on reciprocity and social preferences. Finally, we note that, in addition to modifying the nature of principal-agent interactions, social preferences may also explain why some principal-agent relationships exist in the first place. For example, Hamman et al. (2008) report on an experiment in which principals can hire agents to behave selfishly on the principal’s behalf. Delegation of decisions that would otherwise make the principal act directly in a selfish manner appears to yield more lucrative outcomes for principals.34

3.1.5 Reference points

An emerging issue in the study of one-on-one principal-agent relationships is the effect of reference points on effort provision. Part of the inspiration for this is a lively debate in the non-experimental literature on the presence of reference points in labor supply decisions by agents (in particular, taxi drivers and bicycle messengers) who can vary their hours and effort on a daily basis (see for example Camerer et al., 1997, Farber, 2005, Fehr and Goette, 2007, Farber, 2008, Crawford and Meng, 2008). One advantage of addressing this issue in the lab is that some possible reference points—for example, expected earnings in a round or session—can not only be observed, but manipulated by the experimenter. This is the approach taken by Abeler et al. (2009).

In Abeler et al.’s experiment, subjects are paid a piece rate to perform a tedious task. At the end of the period, with 50% probability they are paid their accumulated piece rate; otherwise they receive a fixed payment that is known in advance. Subjects decide how much to work before they know whether they will receive the fixed payment or their accumulated piece-rate earnings. Abeler et al. find significant bunching of piece-rate earnings at the level of the fixed payment. Further, this spike in the earnings distribution moves when the fixed payment is changed. Neither of these is consistent with the “standard” effort-leisure choice model (unless one were to introduce fairly unusual forms of non-separability between income and leisure). Instead, the authors argue that their results are consistent with Koszegi and Rabin’s (2006) model of reference-dependent preferences, where the reference point is the subject’s expected earnings for the experimental session (which is manipulated by the experimenter). The authors take considerable care to ensure that subjects’ choices of “target” earnings are not driven purely by the salience of those particular numbers in the instructions and experimental environment.35
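A stylized numerical example shows why expectation-based reference dependence generates bunching at the fixed payment. The sketch is a deliberately simplified version of the idea and is not Abeler et al.’s specification: it assumes linear gain-loss utility, a quadratic effort cost, and parameter values (`k`, `eta`, `lam`) chosen purely for illustration. With a 50/50 lottery between piece-rate earnings and the fixed payment, and the lottery itself as the reference point, each outcome is compared with the other with probability 1/4, once as a gain (weight `eta`) and once as a loss (weight `eta * lam`), which nets out to `-0.25 * eta * (lam - 1) * |earnings - fixed|`.

```python
def utility(e, fixed, piece_rate=1.0, k=20.0, eta=1.0, lam=2.0):
    """Expected utility of effort e when pay is piece-rate earnings or `fixed`, each w.p. 1/2."""
    earnings = piece_rate * e
    expected_money = 0.5 * earnings + 0.5 * fixed
    effort_cost = e ** 2 / (2 * k)  # assumed quadratic cost of effort
    # Net gain-loss term; it vanishes when lam == 1 (no loss aversion).
    gain_loss = -0.25 * eta * (lam - 1) * abs(earnings - fixed)
    return expected_money - effort_cost + gain_loss


def best_effort(fixed, lam=2.0, grid=range(0, 41)):
    """Utility-maximizing effort on an integer grid."""
    return max(grid, key=lambda e: utility(e, fixed, lam=lam))


for fixed in (3, 8, 12, 20):
    print(fixed, best_effort(fixed, lam=1.0), best_effort(fixed, lam=2.0))
# fixed payment 3  -> effort 10 (standard), 5  (loss-averse)
# fixed payment 8  -> effort 10 (standard), 8  (loss-averse)
# fixed payment 12 -> effort 10 (standard), 12 (loss-averse)
# fixed payment 20 -> effort 10 (standard), 15 (loss-averse)
```

With these assumed parameters the standard agent (`lam=1.0`) chooses the same effort whatever the fixed payment, while the loss-averse agent stops exactly where piece-rate earnings equal the fixed payment for any fixed payment between 5 and 15, so the spike in the earnings distribution moves with the fixed payment.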

3.1.6 Motivational ‘crowding out’

Another question addressed in the experimental literature on worker-firm interactions is the effect of certain “coercive” features of contracts, such as minimum effort requirements or employee monitoring, on agents’ effort levels (and more broadly, on contract efficiency). In this regard, Frey (1993) proposed that, especially in environments where the principal and agent know one another personally, the principal’s decision to monitor the agent may be interpreted as a signal of distrust, and may reduce effort despite the obvious direct “disciplining” effects of monitoring. Falk and Kosfeld (2006) test a closely related idea in the lab using a very simple game where agents choose effort (which costs them less than it benefits the principal), and principals’ only decision is whether to impose a minimum effort level on the agent. (This is essentially a gift-exchange game without an initial “gift”: agents’ endowments are positive and principals’ are zero.) If principals’ decisions to impose a minimum effort level had no effects on agents’ behavior, the truncated distributions of agents’ effort (above the imposed minimum) should be the same whether the minimum is imposed or not. This is not the case: Falk and Kosfeld find “hidden costs of control” in the sense that the majority of agents reduce effort when firms attempt to “control” their actions (though the effort levels of a smaller number of “opportunistic” agents were mechanically increased by the effort minimum). In most treatments, these net reductions in effort were so substantial that principals who “controlled” earned lower payoffs than those who did not. In a follow-up survey, the authors asked agents the free-form question “What do you feel if [the principal] forces you to transfer at least [$x$] points?” The most common response was “distrust”, especially among agents who reacted negatively to control. The authors’ results suggest that, at least in a laboratory environment, rigid attempts to control agents’ behavior can “backfire”; the authors also provide some support for the external validity of their results by administering a survey eliciting students’ self-reported “work motivation” in a variety of hypothetical work situations involving different degrees of employer control or trust.

In a clever variation on Falk and Kosfeld’s design, Schnedler and Vadovic (2007) show that control by principals does not elicit negative reactions from agents when the principal’s control is legitimized in two alternative ways. In one of these, the principal must set a common control policy that applies not only to the agent, but also to a computerized “automaton” agent who supplies minimum effort whenever this is allowed. Perhaps not surprisingly, agents “understand” the principal’s decision in this case and do not reduce their effort when controls are imposed. In the other, the principal is given a small endowment (in contrast to zero in Falk-Kosfeld), and agents are allowed to take from this endowment by choosing a very low effort level. Here as well, experimental subjects treat control decisions that simply protect the principal’s endowment from agent “pilfering” as legitimate.

Irlenbusch and Sliwka (2005b) suggest an intriguing explanation of the negative incentive effects of paying for performance in a simple experiment where principals and agents first play a pure gift-exchange game, followed by a game in which principals had the option of offering a piece rate in addition to the fixed payment (gift). Consistent with Gneezy and Rustichini (2000), agents’ effort actually fell after a low piece rate was introduced. This is particularly interesting since the task performed by the agents was chosen to yield little or no intrinsic reward. More importantly, effort fell even further when, in a third treatment, piece rates were once again disallowed. A possible explanation is that agents’ perception of the implicit contract offered by principals is changed by the introduction of piece rates: the presence of piece rates signals that agents are expected to behave egoistically; while the offer of a fixed wage signals that, as in many real-world employment relationships, a reasonable amount of effort is simply expected in return for a wage. Certainly, Irlenbusch and Sliwka’s results suggest that studies of what appears to be intrinsic motivation should pay close attention to subjects’ interpretation of the implicit contractual understandings that may be signaled by different pay schemes.

Fehr et al. (2007b) find a similar pattern when they compare the performance of three types of contracts in a simple laboratory experiment. In an “incentive” contract, the principal stipulates a wage $w$, a required effort level $e^*$, and a fine $f$. If the agent accepts the contract, he is ‘audited’ with exogenous probability $p$, and is forced to pay the fine if the effort he has chosen falls short of $e^*$. In a “bonus” contract, the principal announces a wage, a desired effort $e^*$ and her (unenforceable) intention to pay a bonus $b$ if $e \geq e^*$. Finally, a “trust” contract is pure gift exchange in which the principal offers a wage and simply requests effort in return.36 For their parameterization, FKS find that, when principals must choose between trust and incentive contracts, incentive contracts performed better: they yielded higher effort levels, higher payoffs for both principals and agents, and were increasingly selected by principals over the course of the experiment. These results are consistent with findings of Lazear (2000) and others that incentives increase effort. In contrast, however, when principals must choose between incentive and bonus contracts,37 bonus contracts dominate incentive contracts: they constitute the overwhelming majority of contracts offered, and yield higher levels of effort and payoffs to principals; this result contradicts the predictions of contract theory with egoistic agents. The authors explain these contrasting results by parameterizing Fehr and Schmidt’s (1999) model of inequity aversion. Essentially, if the ‘fair-minded’ share of the population is neither too high nor too low, there are too few fair-minded persons to make trust contracts perform best, and too many fair-minded persons to make incentive contracts work best.
That said, the authors recognize that inequity-aversion is not the only possible explanation for their results; indeed, the likelihood that incentive contracts signal distrust (plus the fact that the authors constrain the enforcement technology in incentive contracts to make a first-best allocation infeasible) may also help explain this pattern of outcomes.
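The enforcement logic of the incentive contract for a purely self-interested agent can be sketched in a few lines. This is only an illustration: the quadratic effort cost, the parameter values, and the function names are assumptions made here, not the authors' parameterization. The point is simply that when the expected fine is smaller than the cost of the required effort, a selfish agent shirks.

```python
def effort_cost(effort, scale=0.5):
    """Hypothetical convex effort cost (assumed quadratic for illustration)."""
    return scale * effort ** 2


def selfish_agent_effort(required_effort, fine, audit_prob, min_effort=0.0):
    """Effort chosen by a risk-neutral, purely self-interested agent.

    The fixed wage is paid in either case, so it drops out of the comparison.
    Complying costs effort_cost(required_effort). Any effort below the
    requirement triggers the same expected fine, so the best deviation is
    min_effort, which costs effort_cost(min_effort) + audit_prob * fine.
    """
    cost_of_complying = effort_cost(required_effort)
    cost_of_shirking = effort_cost(min_effort) + audit_prob * fine
    if cost_of_complying <= cost_of_shirking:
        return required_effort
    return min_effort


# Expected fine is 0.25 * 8 = 2: enough to enforce a modest effort
# requirement, but not a demanding one.
modest = selfish_agent_effort(required_effort=1.5, fine=8.0, audit_prob=0.25)
demanding = selfish_agent_effort(required_effort=3.0, fine=8.0, audit_prob=0.25)
```

Under these assumed numbers the agent meets the modest requirement and ignores the demanding one, which is the sense in which a constrained enforcement technology can make the first-best effort infeasible.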

Dickinson and Villeval (2008) also consider the effect of monitoring (the key element of FKS’s “incentive contract”) on work effort; their setting is a real effort laboratory experiment, where the task was designed to contain an element of intrinsic motivation.38 Principals choose monitoring intensity, which raises the probability the agent is audited (and penalized via a “fine” paid to the principal if his output did not exceed the target). Dickinson and Villeval vary two main aspects of the environment: in the “variable” treatment, the principal’s profit, as usual, depends directly on the effort chosen by the agent. In the “fixed” treatment it does not. The other aspect that is varied is the degree of anonymity; interaction is either anonymous or preceded by five minutes of face-to-face interaction. Dickinson and Villeval find that monitoring raises agents’ effort in the anonymous setting, as predicted in the standard agency model. Motivational crowding-out is observed only when interactions are not anonymous and when the principal’s payoff depends directly on the agent’s effort (their “variable” treatment). This suggests that the motivational “crowding out” by monitoring is driven primarily not by a reduction in the intrinsic rewards derived from the task, but by a form of negative reciprocity (punishing the principal for a lack of trust).39

3.1.7 Nonlinearities: Targets, fines and bonuses

While most of the experimental literature on piece rates considers simple linear reward schedules, and while linear contracts are theoretically sufficient to achieve efficiency when agents are risk neutral, it is interesting to consider the effects of nonlinearities in individual piece rate contracts, since these do occur in the real world.40 The one experimental paper we know of that focuses on this topic is Cadsby et al. (2008). The distinguishing feature of their experiment is that the task allowed agents to misrepresent their own performance (the number of words created in an anagram game). While actual output was similar under target-based pay schemes versus a continuous (linear) reward scheme, the former produced significantly more cheating. Further, cheating is more likely under a target-based scheme the closer a participant’s actual production is to the target. Since the agent’s rewards to cheating are also greatest in these situations, Cadsby et al.’s results are both consistent with theory and indicative of a possible drawback with sharp discontinuities in reward schedules.

3.1.8 Peer effects and wage comparisons

Although “peer effects” on effort can only exist when a firm employs multiple workers, we consider peer effects in this section on one-on-one principal-agent interactions because “pure” peer effects refer to a situation where workers work, side by side, for the same firm but do not interact in any way (except that they observe each other’s work activity). For example, suppose that two workers are each paid an individual piece rate, and there are no substitutabilities or complementarities in production, but the workers can observe each other’s effort or output. Does anything change? Perhaps surprisingly, it does. In a real-effort experiment, Falk and Ichino (2006) find that average output is higher when workers work in pairs than when they work alone. Further, the standard deviation of output is lower within worker pairs than between pairs. Essentially, low-productivity workers raise their output towards that of their co-worker when a co-worker is present. One can imagine a number of possible explanations for this behavior, including subject uncertainty about the “true” compensation schedule. Similar results were found in a field experiment by Bandiera et al. (2009a), but only when the co-workers were friends. Specifically, in a situation where workers received individual piece rates and no appreciable production externalities existed, workers who were less able than a co-worker with whom they were friends increased their effort (and hence income) by 10%. In contrast to Falk and Ichino, however, Bandiera et al. also found that workers who were more able than their co-worker friends reduced their effort and gave up 10% of their earnings.41

A final, related question is how workers’ effort changes when they can see each other’s wages. Charness and Kuhn (2007) pose this question in a pure gift-exchange game in which workers knew that their productivity was different from their co-worker’s, but did not know the size or direction of this difference. If between-worker equity concerns are important determinants of effort, we might expect that low-productivity workers (who tend to receive lower wages) would supply less effort in treatments where they observe their co-worker’s wage than in treatments where they do not (indeed this is suggested by Akerlof and Yellen (1990)). Perhaps surprisingly, they do not. Agents’ primary concern seems to be to reciprocate generous wage offers from the firm; the authors speculate that responding to wage offers made to their co-workers would likely muddy this “signal” and is therefore avoided by workers.42

In sum, laboratory tests of the one-period, one-agent, one-task principal-agent model have identified the following broad empirical regularities:

(a) Compensated wage cuts reduce effort in animal labor supply studies. In addition, leisure is normal, and uncompensated wage changes generate backward-bending labor supply curves (Battalio et al., 1981; Battalio and Kagel, 1985). For both animals and humans, there are large individual subject effects, both in the level of effort supplied for a given level of incentives, and in the responsiveness of effort to incentives.
(b) For humans, higher piece rates usually raise effort (Lazear, 2000; Dohmen and Falk, 2006; Cadsby et al., 2007).
(c) Not paying at all can yield higher effort than low pay (Gneezy and Rustichini, 2000). Very high stakes can reduce agent performance in certain types of tasks (Ariely et al., 2008).
(d) Effort also responds to the intercept of the worker’s compensation schedule, at least when generosity is seen as intentional (Fehr et al., 1993; Charness, 2004).
(e) Effort decisions can be affected by at least one type of reference point that can be manipulated in the lab: the subject’s expected earnings for the session (Abeler et al., 2009).
(f) Forcible restrictions on agents’ choice sets can reduce the efforts of agents on whom they are not binding (Falk and Kosfeld, 2006), but not when the restrictions are seen as “legitimate” (Schnedler and Vadovic, 2007).
(g) A decision by a principal to use piece rates can also reduce agents’ efforts; a likely explanation is that the introduction of piece rates changes the agents’ interpretation of the implicit contract for labor services (Irlenbusch and Sliwka, 2005a).
(h) A principal’s decision to monitor the agent can also reduce agents’ efforts, but only when the agents “know” the principal (Dickinson and Villeval, 2008).
(i) Unenforceable promises by principals to pay bonuses for “satisfactory” worker performance can elicit surprising amounts of effort, and can outperform more objective mechanisms such as random monitoring combined with punishment (Fehr et al., 2007b).
(j) Sharp discontinuities in reward schedules induce workers to misrepresent their output (Cadsby et al., 2008).
(k) Even when there is no strategic interaction between workers, workers’ efforts may depend on their co-workers’ efforts (Falk and Ichino, 2006). Co-workers’ wages do not appear to affect effort, at least in our earlier work (Charness and Kuhn, 2007).

3.2 Tournaments

3.2.1 Theory

Consider now a situation where a firm employs multiple workers, who still do not interact in production. However, because the firm bases rewards, at least in part, on agents’ performance relative to each other, workers’ effort decisions, pay levels, and utilities are interdependent. Relative performance plays a key role in a number of features of real-world compensation schemes, including promotions and bonuses for top-performing workers. Since at least Lazear and Rosen’s seminal 1981 paper, economists have understood that, if workers are risk neutral, pay structures which award prizes based only on workers’ relative performance can generate the same allocations in Nash equilibrium as would be achieved by optimal individual piece rates. This may be useful if rank order is easier to measure than cardinal performance, or if rewards are inherently indivisible (some promotions might be an example). Further, tournaments can be more efficient than piece rates if workers are risk averse and if their outputs are affected by a common shock. These results are robust to ability differences between workers if ability is public information: in that case, efficient tournaments typically include handicaps for the abler agents. As Carmichael (1983a,b), among others, has pointed out, payment by relative performance also mitigates an incentive problem affecting the principal, namely the incentive to understate workers’ true outputs, or to provide suboptimal levels of complementary inputs after the contract has been signed. Finally, as O’Keefe et al. (1984) have pointed out, contests may have an efficiency advantage if workers derive direct utility from competition itself.

Counterbalancing the above advantages, tournaments may be less efficient than individual piece rates when workers’ abilities are hidden information, especially if workers can self-select into tournaments (Lazear and Rosen, 1981). Also, in contrast to individual piece rates, tournament games in general require agents to think strategically about their co-workers’ effort levels to find a Nash equilibrium; this may make them less robust as incentive schemes. It is also worth noting that, in general, optimal contest design will be different if (part of) the contest’s objective is not simply to induce effort, but to identify the most talented contestant (e.g. for promotion). In this section we examine how tournament reward schemes work, not just theoretically, but when the games are played by human subjects in what has become a sizable experimental literature.

3.2.2 Early experiments

To our knowledge the first laboratory experiment on tournament-based incentives was by Bull et al. (1987). Bull et al. implemented tournaments between pairs of experimental subjects whose output was subject to independent, uniformly distributed productivity shocks (this guarantees a unique pure-strategy Nash equilibrium if the spread of the distribution is high enough), parameterized to yield identical equilibrium effort levels to a simple piece rate. As predicted, average effort levels were similar between the tournament and the piece rate, but effort variance across subject pairs was much greater under the tournament.43 This suggests that, while equivalent in principle, tournaments may be a much less robust incentive scheme than piece rates in practice because agents have difficulty finding a Nash equilibrium. Bull et al. also studied tournaments between players with different abilities, and found that less able agents systematically exerted more effort than the Nash equilibrium predicts.
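The equivalence between a two-player tournament and a piece rate can be checked numerically. The sketch below assumes additive output (effort plus a shock uniform on [-half_width, half_width]), a quadratic effort cost e**2 / cost_scale, and illustrative parameter values; all of these, and the function names, are assumptions made here rather than a description of the original design. With these assumptions the density of the shock difference at zero is 1 / (2 * half_width), so the symmetric first-order condition is prize_spread / (2 * half_width) = 2 * e / cost_scale.

```python
def win_probability(effort_gap, half_width):
    """P(own output exceeds the rival's) when both shocks are i.i.d. uniform
    on [-half_width, half_width]; the shock difference is then triangular."""
    a = half_width
    d = max(-2 * a, min(2 * a, effort_gap))
    if d <= 0:
        return (2 * a + d) ** 2 / (8 * a ** 2)
    return 1 - (2 * a - d) ** 2 / (8 * a ** 2)


def equilibrium_effort(prize_spread, half_width, cost_scale):
    """Symmetric equilibrium effort implied by the first-order condition."""
    return prize_spread * cost_scale / (4 * half_width)


def equivalent_piece_rate(prize_spread, half_width):
    """Piece rate that equates marginal reward with the tournament's."""
    return prize_spread / (2 * half_width)


def piece_rate_effort(rate, cost_scale):
    """Effort maximizing rate * e - e**2 / cost_scale."""
    return rate * cost_scale / 2


def best_response(rival_effort, prize_spread, half_width, cost_scale,
                  grid_step=0.125, max_effort=100.0):
    """Grid-search best response; the loser's prize is a constant and drops out."""
    def payoff(effort):
        gain = prize_spread * win_probability(effort - rival_effort, half_width)
        return gain - effort ** 2 / cost_scale

    steps = int(max_effort / grid_step)
    return max((i * grid_step for i in range(steps + 1)), key=payoff)


# Illustrative (assumed) parameters.
spread, width, scale = 0.59, 40.0, 10000.0
e_star = equilibrium_effort(spread, width, scale)
check = best_response(36.875, spread, width, scale)
same_under_piece_rate = piece_rate_effort(equivalent_piece_rate(spread, width), scale)
```

Under these assumed values the first-order condition, the grid-search best response to a rival playing the candidate equilibrium, and the matching piece rate all give the same effort level. The grid search also illustrates why a wide enough shock distribution matters: it keeps the agent's payoff concave, so the first-order condition identifies a genuine best response.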

Schotter and Weigelt (1992) implement a very similar laboratory protocol but focus in more detail on “uneven” tournaments (where the participants’ abilities differ), as well as on “unfair” tournaments (where the rules favor one identical agent over another). They are also interested in the effects of policies that (a) restore fairness in unfair tournaments (termed “equal opportunity laws”), or (b) give handicaps to less-able agents in uneven tournaments (termed “affirmative action laws”). Schotter and Weigelt find that (a) again, mean effort levels in fair, symmetric tournaments match the Nash equilibrium; (b) disadvantaged contestants in unfair tournaments supply more than Nash equilibrium effort; (c) mean effort in uneven tournaments matched theoretical predictions, though largely because the less able agents either worked too much or chose zero effort, neither of which was Nash behavior; (d) symmetrizing previously-unfair tournaments raised both agents’ effort levels; and (e) handicapping abler contestants raised total worker output and the principal’s profit when the ability difference between contestants was large. It did so largely by eliminating drop-out behavior among the less able contestants. The authors suggest that these results might have some relevance to the effects of affirmative-action programs in the real world.

3.2.3 Selection into tournaments

The early theoretical literature on tournaments considered the question of selection into tournaments largely from an adverse-selection perspective. For example, Lazear and Rosen (1981) predicted that, if workers have private information about their own ability, less able workers will “contaminate” tournaments designed for abler workers. Whether or not firms modify the structure of their tournaments to address this adverse selection problem, the resulting effort allocations will no longer be as efficient as individual piece rates.

Experimental studies of self-selection into tournaments tend to focus on different questions. For example, building on earlier market-entry experiments in Industrial Organization (e.g. Rapaport, 1995), which tend to find surprising levels of co-ordination on the efficient outcome in entry decisions despite the absence of communication among subjects, Camerer and Lovallo (1999) designed a game in which MBA students chose whether to enter a “market” where their success depended on performance relative to other entrants, and on the subjects’ own skill level (on a sample of logic puzzles or trivia questions about sports or current events). Camerer and Lovallo found excessive entry, which appears to stem not from inaccurate forecasts of the number of entrants or other factors, but from the subjects’ substantial overestimates of their own ability. Vandegrift et al. (2007) conduct a similar experiment, where they allow a fixed population of agents to choose whether to be paid an individual piece rate or to enter a pool where they receive a prize of fixed value for the best performance. The task performed by workers (forecasting the price of a fictitious stock based on cues that are correlated with the true price) is deliberately chosen to allow for “winning” to have some intrinsic or signaling value. These authors do not detect significant levels of excess entry. In this same vein, Niederle and Vesterlund (2007) also allow subjects to self-select between a piece rate and a tournament; their main interest is in gender differences. They find that, at given levels of ability, men exhibit significantly more overconfidence in their tournament entry decisions than women. We discuss this and related articles in more depth in the section on discrimination.

More recently, Eriksson et al. (2008) replicate Bull et al.’s (1987) original experimental design as closely as possible, with the exception that subjects are allowed to choose between a tournament (where they are randomly matched with another player) and an individual piece rate, calibrated to yield the same levels of optimal effort and expected utility. Thus, effort is a decision number and all agents are equally able. Eriksson et al., however, elicit subjects’ risk aversion after the experiment. They find that risk-averse subjects are less likely to enter the tournament. This has the additional effect of reducing the high between-subject variance of tournaments, which was cited by Bull et al. as a possible disadvantage of tournaments. Mean effort was about one third higher in the tournament scheme; as in Lazear (2000) half of this was due not to incentive effects but due to selection. This is of particular interest here, since agents’ abilities in this experiment were equal by construction.

3.2.4 Tournaments and risk-taking

Another prediction of tournament theory is that, in a tournament setting, agents have incentives to take actions that increase the spread of their output distribution, i.e. that increase risk. To our knowledge, Bronars (1986) was the first to discuss risk taking as a choice variable in tournaments; he argued that leading agents in sequential tournaments prefer a low risk strategy (to “lock in” their gains), whereas their opponents choose higher risk. Hvide (2002) shows that, even in a one-stage tournament, if there are no limits on risk taking, tournaments “collapse” in the sense that, for any given prize spread, agents choose infinite risk and zero effort in the Nash equilibrium. When there are limits on risk-taking, the Nash equilibrium is at the maximum level of risk; furthermore an exogenous reduction in the maximum permissible risk level raises both effort and welfare. Finally, Hvide shows that contests in which agents’ performance is ranked according to its absolute distance from a target level (thus outputs in excess of the target are punished) have superior efficiency properties in this environment.
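Bronars’ argument about leaders and followers can be illustrated with a short calculation. The sketch assumes that each agent’s final output equals an expected level plus an independent, normally distributed shock whose standard deviation the agent chooses; the normality assumption, the parameter values, and the function names are illustrative choices made here, not part of the cited models.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(lead, own_sd, rival_sd):
    """P(own final output exceeds the rival's).

    lead is own expected output minus the rival's (negative when trailing).
    The difference of two independent normal shocks is normal with
    standard deviation sqrt(own_sd**2 + rival_sd**2).
    """
    total_sd = math.hypot(own_sd, rival_sd)
    return normal_cdf(lead / total_sd)


# An agent trailing by one unit gains from adding risk...
trailing_safe = win_probability(lead=-1.0, own_sd=1.0, rival_sd=1.0)
trailing_risky = win_probability(lead=-1.0, own_sd=3.0, rival_sd=1.0)

# ...while an agent leading by one unit loses from it.
leading_safe = win_probability(lead=1.0, own_sd=1.0, rival_sd=1.0)
leading_risky = win_probability(lead=1.0, own_sd=3.0, rival_sd=1.0)
```

In this setup extra variance pulls every win probability towards one half, which helps whoever starts below one half and hurts whoever starts above it.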

Agents’ tournament-induced preferences for risk have been studied in various field contexts, including stock car racing—Becker and Huselid (1992) show that drivers take more risks when the prize spread is large—and investment fund managers (Brown et al., 1996; Chevalier and Ellison, 1997). Brown et al. find, as predicted by Bronars, that expected losers prefer high risks while expected winners prefer low risks. In the lab, Vandegrift and Brown (2003) find that high-variance strategies are indeed attractive in tournaments, but primarily to agents with low capabilities performing a simple task. Nieken and Sliwka (2010) extend the theory of agent risk selection in tournaments to cases where the agents face correlated risks. They argue that—in contrast to Bronars’ prediction—leading agents, rather than “playing it safe”, may be forced to imitate their opponent’s risky strategy. This prediction is confirmed experimentally. A possible application is to the case of mutual fund managers investing in the same, or similar, risky assets.

3.2.5 Sabotage

Since Lazear’s important article (1989), economists have recognized that any compensation system based on relative worker performance rewards workers who take actions that reduce the measured performance of their peers, i.e. who engage in sabotage. Of course, sabotage is inherently difficult to study in the field, because workers may go to considerable lengths to conceal their acts of sabotage. This consideration has led a number of authors to study sabotage in the laboratory.

The first published laboratory experiment on sabotage appears to be Harbring and Irlenbusch (2005), who investigate sabotage in both a baseline treatment where the prizes are exogenously manipulated by the experimenter, and in a setting where principals in the experiment can choose the prize structure. Four agents compete against each other with the top two receiving a “winner” prize. Investing in sabotage reduces the output of all the other agents. When the prize spread is exogenously set by the experimenter, Harbring and Irlenbusch find that, as predicted in simple tournament models, higher prize spreads encourage both greater effort and more sabotage; interestingly, the latter effect dominates, suggesting that pay compression may be an optimal strategy. This finding does not, however, generalize to the case where prize spreads are selected by participants in the experiment, perhaps because in the authors’ design higher spreads imply a higher expected value of the prize. Now, agents appear to reciprocate more generous compensation packages by refraining from sabotage. While it is not clear if this result would persist in designs that held the expected prize constant, the result does remind us that agents’ “behavioral” intentions to reward or punish the principal may also play a role in real-world sabotage decisions.44

In a 2007 article, Harbring, Irlenbusch, Kräkel, and Selten consider sabotage in a contest where players are heterogeneous in ability. Three contestants play a two-stage “Tullock” contest where each agent selects targeted levels of sabotage aimed at each of the two other players. (Sabotaging other players makes it more costly for them to exert effort.) All sabotage levels are then revealed, and effort choices are made in the second stage. In addition, players can be of two types—those with ex ante low effort costs (“favorites”) or ex ante high effort costs (“underdogs”); this is publicly known in advance. Three treatments are implemented: one with homogeneous contestants, one with two underdogs and one favorite, and one with one underdog and two favorites. Contrary to expectations, when there are two underdogs, they do not “conspire” against favorites by directing their sabotage against the favorite. In line with expectations, an underdog engages in less sabotage when she is playing against two favorites than one; this resembles the dropout behavior of less-skilled agents in Schotter and Weigelt (1992). Harbring et al. also examine some treatments in which a saboteur’s identity is revealed to the other contestants; they find that retaliation occurs in future rounds and that overall sabotage is less common.

Harbring and Irlenbusch (2008) implement tournaments with two, four, or eight contestants, and with the share of contestants who receive the winning prize equal to 1/4, 1/2, or 3/4. Agents choose both their own effort and a level of sabotage that affects all other agents equally. Parameters are chosen to yield identical Nash equilibrium effort levels in all treatments. Consistent with Nash behavior, neither tournament size nor the share of winning prizes has strong effects on effort, or on sabotage, though there is some tendency for effort to be higher when the share of winning prizes is 1/2, compared to 1/4 or 3/4. In a more recent paper (2009) the same authors introduce communication among the principal and agents. This turns out to curb sabotage via agreements on flat prize structures and increased output.

To our knowledge, the only papers to study sabotage in a real-effort experiment are Carpenter et al. (2010) and Charness et al. (2010). Sabotage in Carpenter et al.’s context takes the form of peers’ subjective evaluation of the quality of each other’s output. (The task is printing letters, placing them into envelopes, and handwriting addresses on envelopes. Subjects also had an opportunity to miscount the number of envelopes produced by their co-workers.) Carpenter et al. found that subjects responded to the possibility of sabotage by their co-workers by producing less output than when peer review was not possible. Piece rate compensation performed much better than the tournament with peer review. In Charness et al. (2010), people are paid a flat rate for their work (the task consists of decoding sets of one-digit numbers into letters from a grid of letters that is displayed on the computer screen), and learn about the rank of their production in a 3-person group. In one treatment, people could pay to sabotage the production of the other people in the 3-person group, and many people did so.

3.2.6 Collusion

In all tournaments, agents have an incentive to collude against the principal: if they can all agree to exert zero effort, the prize in any fair tournament will be randomly assigned to one of them, and they will all be better off than if they had truly “tried” to win the prize. Despite this feature, collusion rarely appears to occur in tournament experiments.45 This may be because various features of the design, including anonymity and re-matching, are deliberately chosen to make collusion difficult. To our knowledge, Harbring and Irlenbusch (2003) is the only laboratory experiment that addresses the issue of collusion. They find (as one might expect) that the smallest (two-person) tournaments are the most conducive to collusion. It would be interesting to see if greater amounts of collusion are observed when there is less anonymity and more opportunity for communication and repeated interaction among agents.46

3.2.7 Feedback

Suppose that the production process takes time during which partial information about the agents’ relative performance becomes available. How will this information affect agents’ subsequent effort decisions? Theoretically, even with standard preferences, this depends on both the distribution of the randomness in the effort-output relationship and the shape of agents’ disutility-of-effort functions; thus Eriksson et al.’s (2008) recent finding that feedback has no effect is not necessarily surprising.47 More recently, however, Gill and Prowse (2009) consider a case—where the probability of winning is linear in the difference in the agents’ efforts—where such information should have no effect under standard preferences; despite this, they find that agents who are behind exert less effort.48 They interpret this “discouragement effect” as a consequence of disappointment aversion, and estimate both the level and heterogeneity of disappointment aversion in their sample using structural methods.

Using field data from over 60,000 basketball games, however, Berger and Pope (2009) find the opposite: being slightly behind at halftime leads to a discontinuous increase in a team’s winning percentage; this apparent psychological effect is roughly half the size of the home-team advantage. This field data is corroborated with experimental evidence; the experiment sheds important additional light on the field data by helping to rule out alternative explanations. The contrasting results of these three studies on feedback suggest there is much we still do not understand about the effects of providing interim performance feedback on effort decisions.49

In sum, laboratory tests of the tournament models have identified the following empirical regularities:

(a) A properly-designed tournament can replicate the results of an efficient piece rate in expectation, but generally yields greater variance in mean output across agent groups (Bull et al., 1987).
(b) Handicaps, or “affirmative action”, tend to improve the performance of tournaments between unequal agents. Part of this effect, however, is due to non-Nash choices by less able agents in the absence of handicaps (sometimes working too hard, sometimes dropping out completely) (Schotter and Weigelt, 1992).
(c) Under some conditions, decisions to enter into tournaments are surprisingly close to optimal levels (Rapaport, 1995). For certain populations, however, entry can be excessive due in part to overconfidence. See Camerer and Lovallo (1999) for MBAs and Niederle and Vesterlund (2007) for men. Allowing risk-averse agents to self-select out of tournaments reduces the between-group variance in output (Eriksson et al., 2008).
(d) Tournaments can increase risk-taking (Vandegrift and Brown, 2003); this effect is not necessarily confined to agents with a low probability of winning (Nieken and Sliwka, 2010).
(e) Increases in tournament prize spreads can raise sabotage as well as effort; this effect can be strong enough to reduce total output (Harbring and Irlenbusch, 2005; Carpenter et al., 2010). Agents do not always “target” their sabotage in the expected direction (Harbring et al., 2007).
(f) Collusion is rare in anonymous tournaments with more than two contestants (Harbring and Irlenbusch, 2003).
(g) The effects of interim performance information on agents’ subsequent efforts remain poorly understood. Recent studies have found intriguing effects in opposite directions, with Gill and Prowse (2009) finding that being behind reduces effort (when standard preferences predict no effect) and Berger and Pope (2009) finding the opposite.

We conclude this section by noting one aspect of tournaments that has so far received relatively little attention in the lab. This is the use and effectiveness of tournaments as a tool, not for eliciting efficient amounts of effort, but for identifying the more able player. (A recent exception in the field is Calsamiglia et al., 2009). If—as Gibbons and Waldman (1999) argue—it is efficient for organizations to promote abler persons into higher-level positions, then the almost-exclusive focus of the experimental literature on tournaments’ consequences for effort levels (as opposed to efficient inference of the agents’ underlying abilities) may be missing a key function of relative performance evaluation schemes in real organizations.

3.3 Teams

3.3.1 Holmstrom’s model

In the classic model of agency in teams (Holmstrom, 1982), a group’s output, $Q$, is a differentiable function of the effort levels of its $n$ members, $e_1, \dots, e_n$. The principal’s problem is to design a set of compensation functions, $s_i(Q)$, that depend only on the group’s total output and induce efficient effort choices by all agents. Holmstrom’s well-known result is that such a function cannot exist if $s_i(Q)$ takes the form of a “sharing rule”. A group compensation function is a sharing rule if it satisfies $\sum_{i=1}^{n} s_i(Q) = Q$ for all $Q$, i.e. it balances the budget for all possible group output levels, not just the group’s equilibrium output. Thus, free riding is inevitable unless the principal can commit to paying agents as a group more than their combined output for some out-of-equilibrium effort choices, and less in others.
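The free-riding logic can be illustrated numerically. The following is a minimal sketch, not Holmstrom's general model: it assumes (purely for illustration) that team output is the sum of efforts, that each agent's effort cost is `e**2 / 2`, and that efforts lie on a grid. The function names are hypothetical.

```python
# Illustrative assumptions (not from the chapter): output Q = sum of efforts,
# effort cost e**2 / 2, and efforts restricted to a grid on [0, 2].

def payoff(own_effort, others_effort, share):
    """Agent's payoff when paid `share` of total team output."""
    output = own_effort + others_effort
    return share * output - own_effort ** 2 / 2


def best_response(others_effort, share, steps=1000):
    """Grid-search the effort that maximizes the agent's own payoff."""
    grid = [2 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda e: payoff(e, others_effort, share))


def total_surplus(effort, n):
    """Team output minus all members' effort costs at a common effort level."""
    return n * effort - n * effort ** 2 / 2


n = 4
nash_effort = best_response(others_effort=0.0, share=1 / n)   # equal sharing rule
efficient_effort = best_response(others_effort=0.0, share=1)  # full marginal product
print(nash_effort, total_surplus(nash_effort, n))
print(efficient_effort, total_surplus(efficient_effort, n))
```

Under these assumptions an agent who keeps only 1/n of the marginal output supplies 1/n of the efficient effort, whatever the other members do, which is why a budget-balancing rule cannot restore efficiency here.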

To our knowledge, the first economists to study team production in the lab were Nalbantian and Schotter (1997).50 In their experiments, Nalbantian and Schotter create two teams of six workers each whose group output is given by the formula $Y = \sum_{i=1}^{6} e_i + \varepsilon$, where $\varepsilon$ is a uniformly distributed random variable. Four compensation schemes are compared: revenue sharing (where each worker is paid 1/6 of the group’s output), “forcing” contracts or targets (where revenue is shared if group output exceeds a target; otherwise all agents receive a low payoff), “gainsharing” (where the target is a function of the team’s past performance in the experiment), and a tournament between the firm’s two teams (where all members of the team with the higher output receive a prize). Consistent with the “classic” model’s predictions, effort levels under the revenue sharing scheme converge towards individually rational levels, which entail a high degree of free riding (though they start out considerably higher in early rounds). Forcing contracts (including gainsharing) perform poorly, perhaps because of the multiple equilibria that are theoretically associated with the induced game among agents. Competition between teams generates the highest level of efficiency among all these treatments, at least for Nalbantian and Schotter’s parameterization, which sets a prize high enough to (theoretically) induce efficient effort levels.51
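The multiplicity of equilibria under a forcing contract can be seen in a stripped-down example. The sketch below does not use the experiment's parameters: it assumes six workers, deterministic output equal to the sum of efforts (no random term), effort cost `e**2 / 2`, equal sharing when output reaches the target, and a low payoff of zero otherwise. All names and numbers are hypothetical.

```python
# Illustrative assumptions (not the experiment's parameters): six workers,
# deterministic output Q = sum of efforts, effort cost e**2 / 2, equal shares of
# output if Q reaches the target, and a low payoff of 0 otherwise.

N = 6
TARGET_EFFORT = 1.0                        # efficient effort under these assumptions
TARGET_OUTPUT = N * TARGET_EFFORT
GRID = [i / 100 for i in range(0, 701)]    # candidate efforts 0.00 .. 7.00


def payoff(own_effort, others_total):
    """Worker's payoff under the forcing contract."""
    output = own_effort + others_total
    pay = output / N if output >= TARGET_OUTPUT else 0.0
    return pay - own_effort ** 2 / 2


def is_symmetric_equilibrium(effort):
    """True if no worker gains by deviating when all others choose `effort`."""
    others_total = (N - 1) * effort
    current = payoff(effort, others_total)
    return all(payoff(dev, others_total) <= current + 1e-12 for dev in GRID)


for level in (0.0, 0.5, 1.0):
    print(level, is_symmetric_equilibrium(level))
```

In this sketch both "everyone meets the target" and "everyone shirks" survive the deviation check, while intermediate effort levels do not, so the contract leaves workers with a co-ordination problem.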

More recently, Meidinger et al. (2003) have introduced “active” principals into Nalbantian and Schotter’s protocol: here, a subject in the role of the principal first makes an offer to two prospective team members; the offer is the share of the team’s output that the team gets to keep—either one half or two thirds. Agents then decide independently on effort levels. In addition to the usual cooperation versus free-riding considerations, agents’ effort decisions in this context also appear to be affected by a desire to reciprocate generous “share” offers from the principal.

3.3.2 Team production and the voluntary contributions mechanism (VCM)

A second branch of the “team production” literature traces its origins not to the principal-agent literature but to that on public goods. Team production in these studies is defined as contributors’ behavior in a voluntary contributions mechanism (VCM) (see Isaac and Walker, 1988 for an early example; Ledyard, 1995 for a review of the experimental literature, and Chaudhuri, 2007 for a more recent review of specific issues). In the standard VCM, each member of a group, $i$, voluntarily contributes $c_i$ “tokens” out of his endowment, $e_i$, to a common account. Each member’s payoff is then just $e_i - c_i + (\alpha/n)\sum_{j=1}^{n} c_j$, where $n$ is group size and $\alpha$ is the efficiency gain from public provision. Thus, in the “classic” VCM, the reward schedule for the agents is predetermined to be a sharing rule, with equal shares accruing to all agents. The typical experimental result for the basic VCM is that individual contributions start out above the individually-rational (but socially inefficient) level but converge to that level as agents gain experience with the game. Work within this tradition has examined the impact of factors like group size, communication, and group heterogeneity on voluntary contribution levels; see Chaudhuri et al. (2006) for a recent example. Recent summaries of results in the VCM literature are available in Plott and Smith (2008, Chapters 82-90).
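A small numerical example may help fix ideas about the tension in the VCM. The sketch assumes a linear payoff of the standard form (endowment minus own contribution plus an equal share of the multiplied group account), with a multiplier between 1 and the group size; the endowment of 10 tokens, the multiplier of 2, and the function name are illustrative assumptions rather than parameters from any particular study.

```python
# Illustrative assumptions: four players, 10-token endowments, and a group-account
# multiplier alpha = 2, so each token contributed returns alpha / n = 0.5 to every player.

def vcm_payoff(i, contributions, endowment=10.0, alpha=2.0):
    """Payoff of player i: tokens kept plus an equal share of the multiplied account."""
    n = len(contributions)
    return endowment - contributions[i] + (alpha / n) * sum(contributions)


all_contribute = [10.0, 10.0, 10.0, 10.0]
none_contribute = [0.0, 0.0, 0.0, 0.0]
one_free_rider = [0.0, 10.0, 10.0, 10.0]

print(vcm_payoff(0, all_contribute))    # everyone contributes fully
print(vcm_payoff(0, none_contribute))   # nobody contributes
print(vcm_payoff(0, one_free_rider))    # player 0 free-rides on the others
```

Because each contributed token returns less than one token to the contributor but more than one token to the group, withholding is individually rational while full contribution maximizes the group's total payoff.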

Beginning in the 1980s, some experimenters interested in team production in the workplace studied effort decisions using a VCM framework. The key modification to the VCM framework introduced by these researchers is reward schedules that do depend in some way on the effort contributions of individual team members.52 Since we know (at least theoretically) that either individual piece rates or tournaments can induce efficient effort levels in this case, the main questions concern the efficacy of particular types of reward schedules in raising individual contributions toward efficient levels. Examples of this approach include Dickinson and Isaac (1998), who introduce prizes for the highest individual contribution into a standard VCM environment; not surprisingly, these prizes raise contribution levels. (Of course, to be effective, such prizes must be committed to regardless of the level of output that is attained by the group; thus they constitute precisely the kind of “budget-breaking” that Holmstrom (1982) showed is necessary to attain efficiency in a team production environment.) Other papers that have introduced different types of relative rewards into VCMs are Dickinson (2001), which considers monetary fines on the lowest contributors, and Irlenbusch and Ruchala (2008).

3.3.3 Complementarities in production

An ironic feature of most models of team production that have been implemented in the lab (including the basic VCM) is that the assumed production technology—in which all agents’ efforts are perfect substitutes for each other—rules out one of the main reasons why teams exist: production complementarities among the members. An extreme form of complementarity that has, however, been studied in the lab is the “minimal effort” or “weakest link” game (Van Huyck et al., 1990) where group output depends only on the lowest effort supplied among the team members. Especially in the absence of communication among agents, deductive methods provide little guidance regarding what equilibria to expect in such co-ordination games; Van Huyck et al. find massive co-ordination failures in the lab as agents play “safe” strategies that leave them relatively invulnerable to the hard-to-predict actions of their co-workers.53
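The co-ordination problem in a weakest-link game can be sketched as follows. The payoff form (a return on the group's minimum effort minus a cost of own effort, plus a constant) and the parameter values are illustrative assumptions, not figures taken from the text; the function names are hypothetical.

```python
# Illustrative assumptions: effort levels 1..7 and payoff
# a * (group minimum) - b * (own effort) + c, with a > b > 0.

EFFORTS = range(1, 8)


def payoff(own, others_min, a=0.2, b=0.1, c=0.6):
    """Payoff when the lowest effort among the other team members is `others_min`."""
    return a * min(own, others_min) - b * own + c


def is_symmetric_equilibrium(level):
    """True if no single player gains by deviating when everyone else plays `level`."""
    return all(payoff(dev, level) <= payoff(level, level) + 1e-12 for dev in EFFORTS)


print([is_symmetric_equilibrium(level) for level in EFFORTS])
print(payoff(7, 7), payoff(1, 1), payoff(7, 1))
```

Under these assumptions every common effort level is an equilibrium, the equilibria are ranked by payoff, and a player who chooses high effort while someone else chooses the minimum earns the least of all. That combination is what makes the low "safe" choice attractive when co-workers' actions are hard to predict.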

More recently, Brandts and Cooper (2007) consider Leontief production in a team of four workers. With no communication or management, Brandts and Cooper show that, as in earlier studies, such games almost always converge on co-ordination failure, with all agents supplying low effort levels. The authors then assign a manager to each such team, and give the manager two types of tools: (a) increasing the (common) rate of pay received by each team member for an extra unit of team output (recall that both the manager and the workers see only the team’s total output, which equals the lowest effort level chosen by any team member), or (b) communicating with the workers by sending (and in one treatment also receiving) completely unstructured messages to the group. All of this communication takes place before any effort decisions are made.

Brandts and Cooper’s provocative finding in this environment is that increased financial incentives are essentially powerless in raising group performance (small increases in group output can be achieved, but never enough to compensate the principal for the cost of the additional compensation). Unenforceable messages, however, can be highly effective in overcoming co-ordination failures of the type modeled by these authors. The authors supplement their experimental analysis with an interesting econometric analysis of the effects of different types of messages exchanged between principals and agents.

Most recently in this context, Georg et al. (2009) test an intriguing theoretical result about incentives in teams with production complementarities due to Winter (2004). Winter’s result refers to a group of $n$ identical agents contributing either 0 or 1 unit of effort to a project, each of whom receives a reward of $b_i$, $i = 1, \dots, n$, if the project succeeds and 0 otherwise. The production function is said to exhibit complementarity when $p(k+1) - p(k)$ increases in $k$, where $k$ is the number of agents exerting positive effort and $p(k)$ is the probability of success. In this situation, Winter shows that the only reward scheme that induces efficient effort as a unique Nash equilibrium when workers’ efforts are complementary is fully discriminating, in the sense that no two workers’ rewards for group success, $b_i$, are the same. Thus, efficiency requires inequity, in the sense of treating identical workers differently. Intuitively, it is easier (and cheaper) to make some agents work if they “know” that working is a dominant strategy for some other (identical) agent because that agent will be extremely well rewarded if the entire group succeeds.54
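The intuition can be checked by brute force in a small example. The sketch below is an illustration under stated assumptions, not Winter's general proof: it assumes three agents, a common effort cost of 1, success probabilities whose increments rise with the number of workers, and rewards set slightly above the level at which the i-th agent is just willing to work when i - 1 others work. The numbers and function names are hypothetical.

```python
from itertools import product

# Illustrative assumptions (not from the chapter): three agents, effort cost 1,
# and success probabilities p[k] whose increments rise with k (complementarity).
P_SUCCESS = [0.1, 0.2, 0.4, 0.8]   # p[k] for k = 0, 1, 2, 3 agents working
COST = 1.0


def discriminating_rewards(p, cost, slack=0.01):
    """Reward i slightly more than covers effort cost when exactly i - 1 others work."""
    return [(1 + slack) * cost / (p[i] - p[i - 1]) for i in range(1, len(p))]


def nash_equilibria(rewards, p, cost):
    """Enumerate pure-strategy Nash equilibria of the 0/1 effort game."""
    found = []
    for profile in product((0, 1), repeat=len(rewards)):
        stable = True
        for i, works in enumerate(profile):
            others = sum(profile) - works
            gain = rewards[i] * (p[others + 1] - p[others]) - cost
            if (works and gain < 0) or (not works and gain > 0):
                stable = False
        if stable:
            found.append(profile)
    return found


unequal = discriminating_rewards(P_SUCCESS, COST)
equal = [sum(unequal) / len(unequal)] * len(unequal)   # same total wage bill
print(nash_equilibria(unequal, P_SUCCESS, COST))
print(nash_equilibria(equal, P_SUCCESS, COST))
```

In this example the unequal rewards leave "everyone works" as the only equilibrium, whereas the symmetric scheme with the same total wage bill also admits an equilibrium in which nobody works.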

Of course, given many authors’ arguments that workers’ effort decisions can be strongly, and negatively, influenced by ‘unfair’ wage differentials (e.g. Akerlof and Yellen, 1990), it is not at all obvious that Winter’s proposed mechanism—which requires arbitrary wage differentials—would work well in practice. Perhaps surprisingly, it does: When the production technology exhibits complementarity, higher efficiency is achieved under a discriminatory reward mechanism than under a cost-equivalent symmetric one. Further, despite concerns that fairness considerations might affect workers’ behavior, “subjects’ effort choices are highly sensitive to their own reward, but largely unresponsive to the rewards of the other …subjects in their group”. This echoes Charness and Kuhn’s (2007) result for individual labor contracts, which also found that workers’ effort decisions in a multi-worker firm were insensitive to other workers’ wages. Taken together, these two papers suggest that, in contrast to the role of social preferences in exchanges between individual “workers” and “firms”, the importance of horizontal comparisons (among workers) within firms seems limited in the experimental literature.

3.3.4 Selection into teams

As noted, probably the most common team compensation policy is a sharing rule in which all members receive an equal share of the group’s output. An immediate consequence of such a policy—which contrasts starkly with tournament-based compensation—is that team members will prefer to have abler co-workers. Thus the issue of how teams are formed can be an interesting one, especially when complementarities exist among team members: what are the effects of different team formation mechanisms on the matches that are formed, and are these matches efficient?

We are aware of only a handful of laboratory studies of the team-formation process. Weber (2006) conducts experiments on the minimum-effort co-ordination game that start with small groups, which find it easier to co-ordinate, and then add entrants who are aware of the group’s history. Using this procedure, coordinated large groups can be created, as long as the rate of growth is not too large. In contrast to Brandts and Cooper (2007), no communication is involved. Charness and Yang (2008) evaluate a specific voting mechanism for group formation in a VCM where there are economies of scale. Societies of nine people are initially divided randomly into three groups of three people who play the game for three periods. Individuals then learn about the average contribution of each individual (by ID number) in their own current group, as well as the average contribution in other groups, and can decide whether to exit the group. Remaining group members choose whether to exclude any current members from the group; the new groups and “free agents” then choose whether to merge with other existing groups and/or other free agents. A critical element is the role of efficiency in terms of group size: the multiplying factor for each token contributed to the group account increases with group size. Groups of nine are common, particularly after a ‘restart’ after 15 periods. Charness and Yang find considerable success for the mechanism, as the threat of ostracism seems to keep contribution rates quite high and efficiency is a driving force.

In an interesting field experiment, Bandiera et al. (2009b) find that, when the incentives facing an entire team are strengthened (in their case by introducing a tournament between teams or performance feedback), assortative matching into teams by ability is increased. Further, workers become less likely to form teams with those they are socially connected to. In Hamilton et al.’s (2003) field study of a textile plant, strong assortative matching into teams did not occur. Further, equal sharing of production bonuses within teams seems to have stimulated cooperation, information sharing, monitoring and even mutual training within teams, generating a productivity increase (relative to piece rates) despite the expected free-rider problem.

Finally, in addition to complementarity between team members’ effort levels, there may also exist complementarity of a different sort: actual gains from heterogeneity, for example if different types of group members possess complementary skills. Charness and Villeval (2009) cite some work that suggests this and also find a preference for mixed ages (and higher efficiency) in a VCM played at two French firms’ work sites.

In sum, laboratory tests of effort decisions in teams have identified the following stylized facts:

(a) In the absence of communication and/or repeated interaction, teams in which agents are paid equal shares of the team’s output perform poorly, with agents’ efforts converging to low, individually rational levels after a few rounds of play (Isaac and Walker, 1988; Nalbantian and Schotter, 1997).
(b) The forcing contracts (essentially group bonuses) suggested by Holmstrom (1982) typically fail to improve outcomes in these environments due to co-ordination problems among agents (Nalbantian and Schotter, 1997).
(c) Team performance may also be affected by considerations of reciprocity towards the principal, if one exists (Meidinger et al., 2003).
(d) Adding incentives based on the relative contributions of individual members to the team’s output can improve teams’ performance, if such measures are available (Dickinson and Isaac, 1998; Dickinson, 2001; Irlenbusch and Ruchala, 2008).
(e) Adding competition between teams can be more effective than any of the above strategies (Nalbantian and Schotter, 1997). Given the tremendous popularity of team sports, both to participants and spectators, it is not at all implausible to us that humans are naturally attracted to such situations and perform well in them.
(f) When there is complementarity between the efforts of team members, loss of output due to co-ordination failures can be severe in the absence of communication among team members (Van Huyck et al., 1990). Adding “cheap talk” communication in such situations can generate dramatic improvements, much more so than strengthening financial incentives (Brandts and Cooper, 2007). Other mechanisms that have been observed to work include asymmetric incentives, which facilitate co-ordination by making high effort a dominant strategy for at least one player (Georg et al., 2009), and slowly adding new members to smaller groups, which find it easier to co-ordinate (Weber, 2006).

We conclude by noting one aspect of team production that, to our knowledge, has not been addressed in the lab: opportunistic behavior by principals. For example, principals who attempt to commit to incentives that “break the budget” at out-of-equilibrium effort levels may face strong temptations to understate the team’s total output (see for example Eswaran and Kotwal, 1984).55

3.4 Multi-task principal-agent problems

Suppose the agent performs multiple tasks the principal cares about, but the principal is only able to base the agent’s compensation on her performance in a subset of those tasks. At least since Farrell and Shapiro (1989) outlined their “Principle of Negative Protection”, and Holmstrom and Milgrom (1991) introduced the “multi-task principal agent problem”, economists have recognized that, depending on circumstances, incentive systems based on a subset of the tasks performed by the agent may be less efficient than a compensation system with no incentives at all.

The first experimental implementation of a multi-task principal-agent model we are aware of is Fehr and Schmidt (2004).56 In their experiment, an agent exerts two types of effort, both of which are observed by the principal and agent, but only one of which is contractible. The two types of effort are complements in the production of total output (specifically, output equals the product of $e_1$ and $e_2$). Disutility of effort, on the other hand, is given by $c(e_1 + e_2)$, where $c$ is increasing and convex. Principals can choose between two types of contracts: A “piece rate contract” pays the agent a fixed base wage plus a linear piece rate per unit of task one performed. A “bonus contract” consists of a fixed base wage plus an unenforceable announcement that the agent might receive a bonus from the principal if his overall performance is “satisfactory”. Principals and agents interact anonymously, and only once.

Absent concerns for reciprocity (in the sense of perceived intentions for offering a bonus rather than a fine), it is clear that both the above contracts should perform relatively poorly: the bonus contract should yield zero effort, and the piece rate should produce effort only on the first task, which leads to very low output due to the assumed complementarity in production. Allowing for reciprocity, it is not immediately clear which contract should perform better, since both contracts allow for principals to make generous fixed wage payments that agents could conceivably reciprocate. Empirically, however, echoing results in Fehr et al. (2007b), bonus contracts perform much better than piece rates in this environment: many principals reward high effort levels on both tasks with generous bonuses, and agents seem to anticipate this. (Some apparent reciprocation of high fixed pay was evident in both treatments, but was not very effective, relative to the possibility of earning a bonus.)
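The self-interested benchmark described above can be reproduced in a few lines. This is a minimal sketch under illustrative assumptions that are not the experiment's parameters: integer efforts from 0 to 10 on each task, output equal to the product of the two efforts, and an effort cost of `(e1 + e2)**2 / 20`. The function names are hypothetical.

```python
# Illustrative assumptions (not the experiment's parameters): integer efforts 0..10
# on each task, output = e1 * e2, and effort cost (e1 + e2)**2 / 20.

EFFORT_LEVELS = range(0, 11)


def effort_cost(e1, e2):
    """Convex cost of total effort across both tasks."""
    return (e1 + e2) ** 2 / 20


def agent_choice(piece_rate, base_wage=0.0):
    """Efforts a purely self-interested agent picks under a task-one piece rate."""
    return max(
        ((e1, e2) for e1 in EFFORT_LEVELS for e2 in EFFORT_LEVELS),
        key=lambda e: base_wage + piece_rate * e[0] - effort_cost(*e),
    )


def output(e1, e2):
    """Total output when the two efforts are complements."""
    return e1 * e2


piece_rate_efforts = agent_choice(piece_rate=0.5)
no_incentive_efforts = agent_choice(piece_rate=0.0)   # self-interested play under a pure fixed wage
print(piece_rate_efforts, output(*piece_rate_efforts))
print(no_incentive_efforts, output(*no_incentive_efforts))
```

In this sketch the piece rate does raise effort on the rewarded task, but because the unrewarded task receives none, output is no higher than with no incentive at all.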

In some sense, Fehr and Schmidt’s results support Holmstrom and Milgrom’s prediction that powerful incentives may be a mistake when they are based on a strict subset of the agents’ actions that affect the principal’s welfare. Indeed, in such situations, Fehr and Schmidt’s results suggest that vague and unenforceable subjective performance evaluations by the principal can outperform piece rates.57 Of course, the advantages of such “bonus” contracts depend heavily on what Fehr and Schmidt interpret as workers’ concerns for fairness vis-à-vis the firm; if workers’ perceptions of fairness are manipulable or highly context-dependent, the widespread use of vague expectations as a solution to multitask principal-agent problems may not be practical.

The only other multitask principal-agent experiment we know of is by Oosterbeek et al. (2006). In contrast to Fehr and Schmidt, rather than choosing between two complementary, productive activities, Oosterbeek, Sloof and Sonnemans’ agent chooses between a productive and a “rent-seeking” activity. The latter activity is pure social waste, but increases the agent’s bargaining power by improving his outside options. In the “classic” model with no social preferences, effort devoted to rent seeking should increase when the experimenters raise the marginal efficacy of effort in that activity. In some behavioral models, however, the opposite could happen, because abstaining from rent-seeking activities becomes a more powerful signal of the agent’s good intentions the more effective those activities are. The authors’ experimental results are largely in line with the classic model.

In sum, economists are just beginning to study the multitask principal-agent problem in the lab.58 Some key patterns that have been observed to date are:

(a) As predicted by the “standard” model (Farrell and Shapiro, 1989; Holmstrom and Milgrom, 1991), rewarding the observable task via a piece rate while not rewarding the other yields poor outcomes, especially if the tasks are complements (Fehr and Schmidt, 2004).
(b) Unenforceable promises by the principal to reward “satisfactory” overall performance by the agent perform remarkably well (Fehr and Schmidt, 2004). Perhaps this situation is familiar to subjects and they act according to norms that are highly effective in the real world.
(c) When agents can choose to invest in an unproductive rent-diverting activity, raising the returns to that activity generates more of it. Agents seem neither to anticipate, nor to receive, increased rewards for refraining from such activity when it becomes more tempting for them to undertake it (Oosterbeek et al., 2006).

3.5 Multi-period principal-agent interactions

3.5.1 Ratchet effects

The “ratchet effect” applies to a situation where a principal contracts with an agent more than once, the agent has some persistent private information (such as his ability or the productivity of the principal’s technology), and binding multi-period contracts are not enforceable. In such situations, actions taken by the agent early in the relationship reveal information to the principal, which can be used by the principal later on to the agent’s disadvantage. The classic example in labor economics is in the context of piece rates, where an agent’s choice of a high effort level in the first period reveals either that (a) the agent’s effort costs are low, or (b) the firm’s technology is more productive than expected, either of which leads the firm to reduce the generosity of the agent’s compensation package in the future. Anticipating this, “able” workers (or workers who have discovered that the firm’s new technology is highly productive) will choose low first-period effort levels (Gibbons, 1987; Ickes and Samuelson, 1987). This benefits those workers by preventing the firm from extracting their rents later in the relationship, but is socially inefficient.

Aside from some early ethnographic studies (see for example Mathewson, 1931), the only empirical evidence of ratchet effects of which we are aware is experimental in nature.59 Chaudhuri (1998) conducted a laboratory experiment in which principals and agents interacted for two periods, and agents were one of two types that were unobserved by the principal. There was little evidence of ratcheting: most agents played naively, revealing their type in the first period even when an informed principal would use this information to the agent’s disadvantage, and principals often did not exploit agents’ type revelation. Possible explanations for this result include the relative complexity of the game, and the lack of context provided to the subjects that might have impeded the learning process.

Cooper et al. (1999) frame their experiment in a context-rich way, as a game between central planners and firm managers, use both students and actual Chinese firm managers as subjects, and implement experimental payoffs with high stakes relative to the participants’ real-world incomes. They also simplify the interactions between principals and agents, focusing the experiment only on the stages of the game where information revelation matters: the agent’s effort choice in the first period, and the principal’s choice of a payoff schedule in the second. Cooper et al. do find evidence of ratchet effects, though even in their context it took some time for the players to learn the consequences of type revelation.

Finally, Charness et al. (2008) experimentally test a prediction of Kanemoto and MacLeod (1992) that ex post competition for agents can eliminate ratchet effects and lead to first-best outcomes in equilibrium. Importantly, this prediction holds even when outside firms cannot observe the past performance of the agent. They also extend Kanemoto and MacLeod’s theoretical analysis to show that ex post competition for principals has the same effect. They impose three conditions in their experiment: no ex post competition, competition with an excess supply of principals, or with an excess supply of agents. As predicted, both types of competition virtually eliminate ratchet effects, though of course their effects on the utilities of the agent differ dramatically.

3.5.2 Career concerns

In ratchet effects models, high-ability agents exert low effort in early interactions to convince principals they have low ability; this prevents principals from reducing the agents’ compensation in the future. In “career concerns” models, such as Holmstrom (1982, 1999), agents exert high effort in early interactions, in order to convince principals they have high ability, and thereby attract high wage offers in the future. The key reasons for these dramatic differences are different assumptions about information and competition: In ratchet models, the principal and agent are in a bilateral monopoly situation, and the agent’s performance is typically not seen by other firms; thus high effort by the agent signals that he has a high level of rents that can be extracted in the future. In career concerns models, there is a competitive market for agents in all periods and the agent’s output is seen by other firms; thus the wage the agent can command in future periods increases with the market’s assessment of his ability. In the literature, ratchet models have been used to describe long-term worker-firm interactions, or interactions between managers and planners in non-market economies. Career-concerns models have been applied mostly to the compensation of CEOs and other senior executives, beginning with Fama (1980).

We are aware of two experimental papers on career concerns; the more recent (Koch et al., 2009) is actually the simpler one. Consider a world with a finite number of agents and principals. In the first period, no agent’s ability is known to anyone, and all principals and agents share the same prior for agents’ abilities. In this period, an agent chooses an effort level $e$ at a private cost, $c(e)$, which is common to all agents. Once effort has been chosen, the worker’s output is publicly revealed to be $y = e + a$, where $a$ is realized worker ability. A worker’s ability is a permanent characteristic; thus, in this simple version of the model, a worker completely learns his ability at the end of the first period. At the end of period 1, all firms see every worker’s first-period performance, $y$, and will rationally attempt to infer each worker’s ability from this information.

In period two, agents exert no effort but produce an output equal to their ability, $a$. Knowing this, principals engage in (Bertrand) competition for agents at the start of this period. (In the experiment, four firms bid for three agents; a principal can employ more than one agent; principals simply offer lump-sum wages that agents accept or not.) In a Perfect Bayesian equilibrium, firms will therefore offer wages equal to agents’ abilities in the second period; this gives agents incentives to take period-one actions that convince firms they have higher abilities. Subject to certain restrictions on out-of-equilibrium beliefs, Koch et al. identify a unique Perfect Bayesian equilibrium in which all agents choose the same, strictly positive effort level in period one. In consequence, first-period outputs perfectly reveal agents’ abilities, and agents are paid their abilities in period two. While all agents would be better off choosing zero effort in period one (and principals earn zero profits in all equilibria), the logic of “signal-jamming” equilibria such as these compels all agents to work harder, to prevent being misidentified as a lower-ability type.
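The signal-jamming logic can be written out in a short derivation. This is a simplified sketch rather than the experimental game itself: it assumes that first-period output is additive in ability and effort, that effort is a continuous choice, that firms conjecture a common effort level $e^{*}$ and apply the same inference rule to every output level, and that the cost function is $c(e) = e^{2}/2$. The symbols $y_1$, $\hat a$, $w_2$ and $e^{*}$ are notation introduced here for illustration.

```latex
\begin{align*}
y_1 &= a + e, \qquad \hat a(y_1) = y_1 - e^{*}, \qquad w_2 = \hat a(y_1) \\
\max_{e \ge 0}\; & w_2 - c(e) \;=\; a + e - e^{*} - c(e) \\
\Rightarrow\; & c'(e) = 1 \quad \text{for every ability } a \text{ and every conjecture } e^{*} \\
\text{with } c(e) = e^{2}/2:\; & e = e^{*} = 1, \qquad w_2 = a
\end{align*}
```

Under these assumptions every agent chooses the same positive effort, firms' conjectures are confirmed, and the second-period wage equals ability, so the effort buys no one a higher wage. If ability were revealed directly, the wage would not depend on output and the optimal first-period effort would be zero.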

In their experiment, Koch et al. compare the above model to a “public ability” treatment, which is identical except that both the worker’s output and effort (and therefore ability) are publicly revealed at the end of the first period. In this treatment agents have no reason to expend effort in the first period, so equilibrium effort is predicted to be zero. By and large, Koch et al.’s findings are consistent with the predictions of the career concerns model: effort is higher in the hidden-ability treatment, and subjects’ first and second-order beliefs (which are elicited by the experimenters) are quite consistent with the model. That said, decision errors were high in early rounds (it apparently took some time for subjects to understand the game), and principals’ offers were subject to a mild winner’s curse.

Irlenbusch and Sliwka’s (2006) model of career concerns is identical to Koch et al.’s, except that it contains two additional features. First, firms bid for workers at the start of the first period as well as the second. Second, agents choose effort in the second period as well as the first. Since all workers are ex ante identical (so there is no reason to offer them different wages in the first period), and since all workers’ privately optimal second-period effort is zero, neither of these differences changes the PBE of the game. Interestingly, however, the experimental outcome is dramatically different: now, contrary to predictions, first-period effort is much higher in the public ability treatment, precisely where there is no signal-jamming reason to exert effort. According to both Irlenbusch and Sliwka and Koch et al., the most likely reason is that Irlenbusch and Sliwka’s model introduces opportunities for signaling of a different kind (one that is not formally modeled in either paper): because effort is now revealed before period-two wage offers are made, agents have an opportunity to signal that they are “high-effort” (or “reciprocal”, or “fair”) types by choosing high effort in the first period. Since effort is not directly observed in the hidden-ability treatment, such signaling is less effective there. It is noteworthy, however, that this apparent signaling behavior occurs even though, by construction, all agents have the same cost of effort function in the experiment. The effort cost, or “willingness to work”, that agents are apparently signaling is some personal characteristic that is not induced by the experimental design (though it is clearly consistent with heterogeneous social preferences).

Of course, which of the two above designs is more representative of any particular “real world” labor market is unclear. While it is clear that bidding for workers and effort decisions are made at multiple points in any worker’s career, Koch et al. argue that, to the extent that agents’ actions in Irlenbusch and Sliwka’s experiment are driven by reciprocity or fairness concerns, they may be irrelevant to the market for CEOs. On the other hand, if these actions are meant to signal a low cost of effort, or a high level of determination, drive and ambition, they may indeed be very relevant to the CEO market.60

3.5.3 Investments and hold-up

The “hold-up” problem pertains to any multi-period relationship involving specific investments, where binding multi-period contracts are not enforceable. An early formal statement was in the context of a unionized firm’s investments in plant and equipment, where market frictions create a gap between capital’s purchase price and its resale value (Grout, 1984). Grout showed that the firm in general under-invests when union-firm bargaining occurs after the capital is in place, since the firm pays the full cost of the investment but (due to ex post surplus sharing) it only reaps a fraction of the returns.
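The under-investment result can be illustrated with a numerical sketch. It assumes, for illustration only, that installed capital has zero resale value, that revenue is `2 * sqrt(K)`, and that ex post bargaining hands the union a fixed share of revenue; none of these functional forms or names come from Grout's paper.

```python
import math

# Illustrative assumptions: capital is fully sunk once installed, revenue is
# 2 * sqrt(K), and the union captures a fixed share of revenue after K is in place.


def firm_profit(capital, union_share):
    """Firm's payoff: its share of revenue minus the full cost of the investment."""
    return (1 - union_share) * 2 * math.sqrt(capital) - capital


def chosen_investment(union_share, steps=10000, k_max=2.0):
    """Grid-search the capital level that maximizes the firm's own payoff."""
    grid = [k_max * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda k: firm_profit(k, union_share))


efficient_capital = chosen_investment(union_share=0.0)   # firm keeps all returns
holdup_capital = chosen_investment(union_share=0.5)      # half of returns bargained away
print(efficient_capital, holdup_capital)
```

With these forms the firm's preferred investment falls with the square of the share it expects to keep, so a firm that anticipates keeping half the returns installs a quarter of the efficient capital stock.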

Most applications of the hold-up problem in labor economics, however, refer to the problem of workers’ investments in firm-specific skills. Long ago, Becker (1964) proposed that sharing both the costs and returns to firm-specific investments should achieve (presumably constrained) efficiency in separation decisions after the investment is made, though the precise efficiency properties of this arrangement were not specified, nor were its implications for the initial investment decision analyzed. Since then, Hashimoto and Yu (1980), Hall and Lazear (1984) and others have studied the problem more formally, and some ingenious institutional solutions (such as rigid wages in combination with a “triggered” renegotiation; MacLeod and Malcomson, 1993) have been proposed. Other institutional arrangements that have been argued to help solve the firm-specific training problem (in the sense of inducing both efficient investments and efficient ex post separations) include various types of promotion ladders (Carmichael, 1983a,b; Prendergast, 1993), and multi-skilling policies (Carmichael and MacLeod, 1993).61 We note that a key feature of all these analyses is that principals (firms) cannot make binding commitments, for example to retain a worker if that is ex post unprofitable for the firm. If firms can make credible promises of this nature (perhaps because their interactions with previous cohorts of workers can be observed by current workers), the holdup problem can be substantially mitigated, or eliminated.

The first experimental study of the hold-up problem of which we are aware is Anderhub et al. (2003). They consider a two-period worker-firm interaction in which workers choose whether to make a firm-specific investment that reduces their effort costs in the first period; the investment is profitable to the worker only if he is employed by the same principal in both periods. In perfect equilibrium, workers should make the investment whether or not firms can commit to re-employ them; perhaps unsurprisingly, however, Anderhub et al. find that contractual form does matter, with workers being more willing to make specific investments when their re-employment is contractually guaranteed. More recent experimental papers have examined the effects on holdup of communication (Ellingsen and Johannesson, 2004); of which party (worker or firm) makes the investment and the nature of ex post wage bargaining (Oosterbeek et al., 2007a); and of firms’ promotion rules (Oosterbeek et al., 2007b).

More specifically, the promotion rules compared in the latter paper are “up or stay” (Prendergast, 1993), versus “up or out” (Kahn and Huberman, 1988); the setting is one in which workers’ specific investments may affect their firm-specific productivity not only in their current job, but in an alternative job (in the same firm) to which they might be promoted. Promotion decisions are at the firm’s discretion, and firms cannot commit to refrain from opportunism in those decisions (aside from being bound by an up-or-out or an up-or-stay rule). Workers’ only incentives to acquire firm-specific skills in these models are to win promotions or to keep their jobs. In this context, up-or-stay promotion policies will induce workers to invest only if investments raise their productivity more in the job to which they would be promoted than in the current job. Thus investment incentives may be too weak, but specific investments are never wasted due to separations. Up-or-out policies can provide better investment incentives, but may waste investments. (See Gibbons, 1988 for a more in-depth discussion.) Oosterbeek et al. implement these policies in the lab and find that workers’ investment decisions are, at least on average, in line with theoretical predictions.62

In sum, as for multitask principal-agent problems, experiments on multi-period principal-agent problems remain few in number. Nevertheless a few interesting results can be identified. They are:

(a) The early pooling equilibria at low effort levels predicted by ratchet effects models can be generated in the lab (Cooper et al., 1999).
(b) Consistent with Perfect Bayesian equilibrium in the modified game, labor market competition essentially eliminates the ratchet effect (Charness et al., 2008), at least in the case where workers’ private information is about their ability (as opposed to the firm’s technology). To our knowledge, no lab experiments on the latter (“hidden technology”) variant of the ratchet effects model exist.
(c) The early signal-jamming equilibria at high effort levels predicted by career-concerns models can be generated in the lab (Irlenbusch and Sliwka, 2006; Koch et al., 2009).
(d) If agents choose efforts in both periods in a career-concerns game, the effects of making effort publicly observable sometimes contradict the career concerns model: Rather than reducing first-period effort (because signal jamming is no longer possible), making effort public actually raises first-period effort (Irlenbusch and Sliwka, 2006). This suggests that agents are attempting to signal some personal characteristic that is outside the model, such as “honesty” or a personal willingness to work hard.
(e) Even in situations where short-term contracts should theoretically guarantee workers the same return to firm-specific investments as long-term contracts, enforceable long-term contracts induce more worker investments in firm-specific skills (Anderhub et al., 2003).
(f) The nature of ex post wage bargaining (threat point versus outside offer), and promotion policies (up-or-stay versus up-or-out) affect investments in specific training (Oosterbeek et al., 2007a,b).

4 Towards Behavioral Principal-Agent Theory: Fairness, Social Preferences and Effort

Perhaps the main contribution of experiments to principal-agent theory is the cascade of papers demonstrating the presence of “social preferences” (where one takes into account the payoffs, actions, and/or beliefs of other parties) in the laboratory. The essential content of social preferences is that people will deliberately sacrifice money to help other people or hurt other people, or even to keep their promises and thereby avoid guilt or a “cost of lying”. The fact that people do not simply maximize their earnings has far-reaching consequences for theory; some efforts have been made in this regard (for example, see Von Siemens, 2004 for a characterization of optimal contracts with social preferences and hidden action).

The earliest experimental paper to convincingly demonstrate the existence of social preferences is the article on the ultimatum game in Güth et al. (1982). In this bargaining experiment (which has been replicated hundreds if not thousands of times), one person is provisionally allocated a sum of money and chooses how much to offer a paired participant. If the offer is accepted, it is implemented; if it is rejected, both parties receive nothing. The main result is that people reject lopsided offers, even though it is costly to do so.
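The payoff rule of the ultimatum game described above can be written down in a few lines. This is a minimal sketch; the function name and the pie size of 10 are illustrative choices, not details from Güth et al. (1982).

```python
def ultimatum_payoffs(pie, offer, accepted):
    """Return (proposer payoff, responder payoff) for one ultimatum game.

    If the responder accepts, the proposed split is implemented;
    if the responder rejects, both parties receive nothing.
    """
    if not 0 <= offer <= pie:
        raise ValueError("offer must lie between 0 and the size of the pie")
    if accepted:
        return pie - offer, offer
    return 0, 0


# A lopsided offer of 2 out of a hypothetical pie of 10.
if_accepted = ultimatum_payoffs(pie=10, offer=2, accepted=True)
if_rejected = ultimatum_payoffs(pie=10, offer=2, accepted=False)

# Money the responder gives up by rejecting the offer.
cost_of_rejection = if_accepted[1] - if_rejected[1]
print(if_accepted, if_rejected, cost_of_rejection)
```

Because rejection always lowers the responder's own payoff whenever the offer is positive, a purely self-interested responder would accept any positive offer; observed rejections are therefore evidence against pure self-interest.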

This result and many others like it have led to a number of models of utility. These can be roughly classified as either distributional or reciprocal, with some hybrid models. Among distributional models, “difference-aversion” models (e.g., Loewenstein et al., 1989; Bolton, 1991; Fehr and Schmidt, 1999; Bolton and Ockenfels, 2000) assume that players are motivated to reduce differences between their payoffs and others’, while “social-welfare models” (e.g., Charness and Rabin, 2002) assume that people like to increase social surplus, caring especially about helping those (themselves or others) with low payoffs. On the other hand, reciprocity models (e.g., Rabin, 1993; Dufwenberg and Kirchsteiger, 2004) assume that the desire to raise or lower others’ payoffs depends on how fairly those others are behaving; in other words, how one perceives the intentions of other parties affects one’s behavior.

In the field environment, it is difficult to rule out that behavior is driven by expectations of future material benefit, since there is typically repeated interaction in the field. In the laboratory, one can isolate social preferences by ruling out the possibility of future interaction, either by using a one-shot game or by anonymous re-matching. Of course, one must be concerned that the behavior found in the laboratory is specific to the laboratory. Regarding distributional preferences, charitable giving in the US (a notoriously individualistic society) exceeds 2% of GDP, with 90% of people donating; thus, most people are willing to contribute materially to the well-being of even anonymous strangers in the field. Regarding negative reciprocity, workers have been known to engage in sabotage or increased theft rates after a pay cut or other actions perceived to be unfair (see for example Greenberg (1990) and Schminki et al. (2002)), particularly when procedural justice in the organization is low (Skarlicki and Folger, 1997). And the studies by Krueger and Mas (2004) and Mas (2006, 2008) present results that show retribution with real firms and workers. The case for non-instrumental positive reciprocity in the field is weaker, but includes cases such as tipping when on the road or higher response rates to mailed surveys that include small gifts.

Nevertheless, institutional and contextual factors determine the extent to which social preferences come into play. For example, it is conventional wisdom that inexorable forces drive out social preferences in a market setting. Roth et al. (1991) find experimentally that competition essentially completely drives out fairness considerations when 10 people can potentially accept a demand (only one of the people who accept this demand is selected) for a share of the pie that is made by one paired person. Recent papers such as Gneezy and List (2006) and Fershtman et al. (2009) suggest possible limitations for the applicability of social preferences. On the other hand, even though a great deal of research (see Holt, 1995 for a survey) has found a striking degree of convergence to the self-interested competitive equilibrium in experimental double auctions, Fehr and Falk (1999) demonstrate the presence of downward wage rigidity in an experimental labor market featuring the competitive double auction.

Fehr and Gächter (2000) suggest that a key determinant of whether social preferences come into play is whether contracts are complete and enforceable. They point out that incomplete contracts typically prevail in the labor market, where (for example) wages are often paid without any explicit performance incentives. They state: “In situations where contracts are reasonably complete, the underlying assumption of self-interest should continue to be especially important” (p. 178). However, based on evidence from Fehr et al. (2007b), where an implicit and incomplete contract with an unenforceable bonus outperforms a complete contract with contracted effort enforced through partial monitoring and the incomplete contract is selected 88% of the time, they argue that reciprocal considerations are not only critical with incomplete contracts, but also: “The endogenous formation of incomplete contracts through reciprocal choices shows that reciprocity may not only cause substantial changes in the functioning of given economic institutions but that it also may have a powerful impact on the selection and formation of institutions” (p. 178).

Given the evidence presented above, it seems clear that there is scope for improving the principal-agent models through the incorporation of behavioral motivations such as social preferences. Perhaps the insights into the nature of non-self-interested behavior gleaned from experiments will eventually be applied to a variety of economic settings, including employee response to changes in wages and employment practices.

4.1 Models of social preferences

We begin the discussion of the relevant literature by summarizing some of the more prominent models of social preferences. We proceed historically, rather than by publication date. For details of the models and their full functional forms, we refer the reader to the individual papers.

Bolton (1991) develops a model in which people care about both their own money and their relative position; this model is based on the Ochs and Roth (1989) finding that people frequently make disadvantageous counter-offers in a two-round ultimatum game.63 One receives negative utility from receiving less than the other person, but is not bothered by receiving more. There is a trade-off between one’s material payoff and one’s relative standing, in that a person might prefer (Own, Other) material payoffs of (3, 2) to material payoffs of (4, 6). This model is entirely consequential, as intentions and previous history do not affect preferences.

Rabin (1993) was the first to provide a formal game-theoretic model incorporating reciprocity preferences. The central notion is that of kindness, defined in terms of the payoff options made available to the other player by one’s own choice, with one’s kindness increasing as the best available payoff for the other person increases. If you are unkind, the other person may sacrifice money to hurt you, while if you are kind, the other person may sacrifice money to help you. In the latter case, mutual cooperation in the prisoner’s dilemma can be supported as an equilibrium. Dufwenberg and Kirchsteiger (2004) formally extend this model (still including kindness as the main concept) to sequential games, which facilitates application to the more standard experimental games with first movers and responders. These models concentrate on reciprocity and only employ simplistic notions of fairness and distributional preferences.64

The Bolton and Ockenfels (2000) and Fehr and Schmidt (1999) consequential models of relative position (or inequity or difference aversion) were developed approximately simultaneously. The heart of these models is that people may (depending on parameter values) trade off money to make material payoffs closer together. The Bolton and Ockenfels model is more general and does not provide a specific functional form, while the Fehr and Schmidt model provides a simple and quite tractable functional form. The main difference between these models (in the two-player case) is that people are more bothered by being behind than by being ahead in the Fehr and Schmidt model, but that one’s disutility from unequal payoffs is unaffected by whether one is ahead or behind in the Bolton-Ockenfels model.65
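For concreteness, the two-player Fehr and Schmidt (1999) utility function is usually written as U_i = x_i − α·max(x_j − x_i, 0) − β·max(x_i − x_j, 0), with β ≤ α. The sketch below implements that form; the parameter values are hypothetical and chosen only for illustration, and the pie of 10 in the second example is likewise an assumption.

```python
def fehr_schmidt_utility(own, other, alpha, beta):
    """Two-player Fehr-Schmidt utility.

    alpha: weight on disadvantageous inequality (being behind).
    beta:  weight on advantageous inequality (being ahead), beta <= alpha.
    """
    behind = max(other - own, 0)
    ahead = max(own - other, 0)
    return own - alpha * behind - beta * ahead


# Hypothetical parameters: bothered by being behind, indifferent to being ahead.
ALPHA, BETA = 1.0, 0.0

# (Own, Other) allocations from the Bolton (1991) discussion above.
u_small_but_ahead = fehr_schmidt_utility(3, 2, ALPHA, BETA)
u_large_but_behind = fehr_schmidt_utility(4, 6, ALPHA, BETA)

# A responder offered 2 out of a hypothetical pie of 10 in the ultimatum game.
u_accept_lopsided_offer = fehr_schmidt_utility(2, 8, ALPHA, BETA)
u_reject = fehr_schmidt_utility(0, 0, ALPHA, BETA)

print(u_small_but_ahead, u_large_but_behind, u_accept_lopsided_offer, u_reject)
```

With these assumed parameters the agent prefers (3, 2) to (4, 6) and prefers rejecting the lopsided offer to accepting it, so a single distributional term can account for both patterns.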

Falk and Fischbacher (2006) combine the Fehr and Schmidt notion of inequity aversion with reciprocity considerations in a complex hybrid model where a person is less bothered by another’s refusal to come out on the short end of a split than by a refusal to share equally. Importantly, they assume that one does not resent harmful behavior by the other player if it seems to come only from the other player’s unwillingness to come out behind rather than his or her selfishness when ahead. In other words, the prevailing social norm is one where it is bad form to be selfish when one has more than others, but is understandable when one has the short stick.

Charness and Rabin (2002) also combine distributional preferences with reciprocity.66 The key innovation of this model is that people care about social efficiency, or the total payoffs for the reference group. Absent “misbehavior” (determined endogenously), one’s utility is determined by a weighted average of one’s material payoff and a social component, which is itself comprised of a weighted average of the total payoffs for the reference group and the lowest payoff for anyone in the reference group. With misbehavior, there is negative reciprocity: one withdraws one’s willingness to sacrifice to help the miscreant, by diminishing or eliminating the weight put on their payoff; one may even be willing to sacrifice to hurt the other person. However, in line with considerable experimental evidence (see the discussion below), there is no positive reciprocity in this model.67
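The verbal description of utility absent misbehavior can be sketched as a nested weighted average. The function and parameter names below are ours, the weights are hypothetical, and the reciprocity component of the model is omitted; the two allocations are the (Other, Own) choices from Charness and Rabin (2002) discussed in Section 4.4.

```python
def social_welfare_utility(own, payoffs, weight_social, weight_min):
    """Utility without misbehavior, following the verbal description.

    payoffs:       material payoffs of everyone in the reference group (own included).
    weight_social: weight on the social component relative to own payoff.
    weight_min:    weight on the lowest payoff relative to total payoffs.
    """
    social = weight_min * min(payoffs) + (1 - weight_min) * sum(payoffs)
    return (1 - weight_social) * own + weight_social * social


# Hypothetical weights, for illustration only.
W_SOCIAL, W_MIN = 0.2, 0.5

# Chooser's own payoff is 400 in the equal split and 375 in the generous option.
u_equal = social_welfare_utility(400, [400, 400], W_SOCIAL, W_MIN)
u_generous = social_welfare_utility(375, [750, 375], W_SOCIAL, W_MIN)

print(u_equal, u_generous)
```

Under these assumed weights the agent prefers to give up 25 units so that the other person gains 350, a choice that a difference-averse agent would not make because it increases inequality.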

Cox et al. (2007) present a non-equilibrium approach that combines a form of distributional preferences with reciprocity considerations. In this approach, both status (relative position) and reciprocity affect one’s emotional state, which in turn affects the choices that are made by a utility-maximizing agent. They introduce a parametric model of other-regarding preferences in which one’s emotional state determines the marginal rate of substitution between own and others’ payoffs, and thus one’s subsequent choices. In turn, one’s emotional state responds to relative status and to the kindness or unkindness of others’ choices. Structural estimates of this model with six existing data sets demonstrate that other-regarding preferences depend on status, reciprocity, and perceived property rights.

A final model involves a trade-off between one’s material payoff and one’s degree of guilt from violating the expectations of another person. This is called “disappointment aversion” in Dufwenberg and Gneezy (2000), but is probably better known as “guilt aversion” (Charness and Dufwenberg, 2006). The idea is that the more one believes that a party with whom one is paired is expecting a favorable move, the more likely it is that one chooses that move. Charness and Dufwenberg (2006) find support for this notion, as there is a strong positive correlation between a responder’s beliefs about the first-mover’s beliefs and the responder’s choice of the favorable action. Formal presentations of this concept (“simple guilt”) and a more complex version (“guilt-from-blame”) in which one feels guilt only to the extent that one believes that the other party blames one for an unfavorable outcome can be found in Battigalli and Dufwenberg (2007, 2009).68

In sum, a number of models of social preferences have been proposed, mainly classifiable into distributional models and reciprocity models and hybrids of these two approaches. Other motivations such as guilt aversion (and lying aversion, which has not generally been formalized; however, see Charness and Dufwenberg (forthcoming) for one approach) are also being considered in recent models.

(a) Distributional models posit that people are concerned with the payoffs of others in some fashion, with considerations of intention on the part of the others being irrelevant; in other words, only the consequences of one’s choice matter. Bolton (1991) presumes that one cares about receiving less than another person in one’s reference group, but doesn’t mind receiving more than the other person. Bolton and Ockenfels (2000) instead presume that people care equally about coming out behind or coming out ahead of the average amount received by those people in the reference group. Fehr and Schmidt (1999) put forward a very tractable model that also presumes that people care about coming out behind or coming out ahead of others; however, one cares at least as much about coming out behind. In addition, one makes pairwise comparisons with the material payoffs of others, rather than comparing with the average of others (this only matters in environments with more than two people).
(b) Reciprocity models instead posit that people respond to the intentions of others, with perceived kind intentions being met with kind responses and perceived unkind intentions being met with unkind responses. In this way, both mutual cooperation and mutual defection are potentially equilibria in a Prisoner’s Dilemma. These models typically abstract away from distributional considerations. Rabin (1993) is the seminal paper in this area, incorporating reciprocity into simultaneous games. Dufwenberg and Kirchsteiger (2004) extend this approach to sequential games, typically better-suited to the laboratory.
(c) Since both distributional and reciprocity considerations appear to be relevant, some recent models have combined these factors. Falk and Fischbacher (2006) combine intentions with Fehr–Schmidt preferences. Charness and Rabin (2002) put forth the notion of social efficiency, whereby people are typically interested in improving the payoffs of others, unless said others have behaved badly. There is negative reciprocity in this model, but (based on the experimental evidence in that paper and elsewhere) there is no positive reciprocity. Cox et al. (2007) use a non-equilibrium approach that does not require knowledge of beliefs and is therefore considerably more tractable.
(d) Other motivations such as guilt aversion have also been modeled (Battigalli and Dufwenberg, 2007, 2009). With guilt aversion, one trades off feelings of guilt against being selfish. The more one believes that another person expects one to behave favorably, the more likely one is to then behave favorably. Charness and Dufwenberg (2006) provide experimental support for guilt aversion, also finding that promises (statements of intent) are particularly useful in achieving optimal social outcomes.

4.2 The gift-exchange game

Probably no experimental game in the area of labor economics has had as much impact as the gift-exchange game, which tests the notion (Akerlof, 1982; Akerlof and Yellen, 1988, 1990) that there is a positive relationship between wages and effort. This game was designed to mimic an employment relationship, in that labor contracts are typically incomplete and effort is not (fully) enforceable; thus a wage offer is binding, but effort is discretionary. In its basic form, the experimental participants are divided into “firms” and “workers”, who interact anonymously either in some form of labor market or in one-to-one pairings; typically, the game is played for a number of periods. In this subsection we report results from the earliest gift-exchange experiments.

The first paper reporting results from the gift-exchange game is Fehr et al. (1993). They create a competitive labor market using a two-stage game. Workers and firms69 (with more workers than firms) were separated into two rooms, and communication between the two rooms took place via telephone. The first stage was a one-sided oral auction with employers as bidders. Firms made wage proposals, which were posted in the room containing the workers. Once a worker accepted an offered wage, the first stage was concluded for both the firm and the worker. A firm could revise its (non-accepted) wage offer upward, so that it was higher than any existing posted wage offer. People who did not contract received zero earnings for the period. In the second stage, workers chose effort and this choice was only revealed to the paired firm. Firms chose wages (restricted to be multiples of five) from the interval [26, 126], earning (126-wage) × effort. Workers earned the wage less 26 less the cost of effort, shown below:

[Table not reproduced: cost of effort at each effort level]

If workers are purely self-interested, the prediction is of course that they will choose the minimum effort level regardless of the wage. Knowing this, firms will choose a wage of 30, the lowest wage ensuring participation. However, the main result is that both wages and effort levels far exceed the predictions with self-interested workers, as the average wage was 72 and the average effort chosen was 0.4. In addition, there is a very strong positive relationship between effort and wage.
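The payoff rules stated above can be sketched as follows. The effort-cost schedule is not reproduced here, so the worker's cost is passed in as an argument, and the zero cost of minimum effort used in the example is an assumption. Evaluating the firm's payoff at the reported average wage and average effort is only illustrative, since the average of a product need not equal the product of averages.

```python
FIRM_VALUE = 126   # firm earns (126 - wage) * effort
WAGE_FLOOR = 26    # amount subtracted from the worker's wage
WAGE_STEP = 5      # wages are restricted to multiples of five


def valid_wages():
    """Admissible wages: multiples of five in the interval [26, 126]."""
    return [w for w in range(WAGE_FLOOR, FIRM_VALUE + 1) if w % WAGE_STEP == 0]


def firm_payoff(wage, effort):
    return (FIRM_VALUE - wage) * effort


def worker_payoff(wage, effort_cost):
    return wage - WAGE_FLOOR - effort_cost


# Lowest admissible wage, which matches the self-interested prediction of 30.
lowest_wage = valid_wages()[0]

# Assumption: minimum effort costs the worker nothing.
selfish_worker_gain = worker_payoff(lowest_wage, effort_cost=0)

# Firm payoff evaluated at the reported average wage and average effort.
firm_at_averages = firm_payoff(wage=72, effort=0.4)

print(lowest_wage, selfish_worker_gain, firm_at_averages)
```

The wage restriction explains why the benchmark wage is 30 rather than 26: it is the smallest multiple of five that leaves the worker with a positive payoff, given the assumed zero cost of minimum effort.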

The second paper in this series is Fehr et al. (1998), who conduct both a one-sided oral auction treatment and a bilateral-gift-exchange treatment using Austrian soldiers to test the robustness of the Fehr et al. (1993) results and to determine the relative effect of competition on wages. An additional treatment involves complete contracts, in the sense that the experimenter enforced an effort level of 1.0 and no effort costs were subtracted from workers’ earnings. The main finding is that gift exchange persists even in the absence of competition, when firms and workers are matched on a one-to-one basis; after a few periods wages in this treatment (and the ratio of effort to wage) coincide with wages in the one-sided oral auction without effort enforcement. Thus, the qualitative findings of Fehr et al. (1993) are replicated with non-student participants and even without competition. Wages were considerably lower when effort was enforced, although still substantially above the minimum needed to ensure participation (workers could reject offers).

Fehr et al. (1997) study the impact of reciprocity on contract terms and their enforcement with three experimental treatments involving competitive markets and more workers than firms. In each treatment, the firm specifies a wage, a desired effort level, and a fine imposed if the firm detects that the worker has shirked (provided less than the contracted level of effort). In the weak-reciprocity treatment, workers who have accepted posted contracts choose effort levels, and a random device determines whether shirking is verifiable (and fined at the specified level); the firm then learns the chosen effort level. The no-reciprocity treatment is identical, except that the experimenter exogenously fixes the effort level. Finally, in the strong-reciprocity treatment, there is an additional stage in which firms can reward or punish workers at a cost. The results show that firms demand and enforce much higher effort levels in the strong-reciprocity treatment than in the weak-reciprocity treatment; there is much less shirking in the strong-reciprocity treatment. In fact, both firms and workers earn more in the strong-reciprocity treatment, in large part because the higher effort levels lead to a larger pie. Nevertheless, firms’ contract offers are much higher in the weak-reciprocity treatment than in the no-reciprocity treatment, and the offers increase with the desired effort level.

In sum, the gift-exchange game has been a very successful approach to modeling labor issues in the laboratory. The main finding is that higher wages lead to higher effort. This section reports only the earliest gift-exchange experiments.

(a) Some early gift-exchange papers used a form of one-sided oral auction to create a market environment. Fehr et al. (1993) have more workers than firms, simulating unemployment conditions. Fehr et al. (1998) find very similar results with Austrian soldiers and a bilateral design, where there are equal numbers of workers and firms.
(b) Fehr et al. (1997) find that both firms and workers earn more when firms can punish or reward workers for their effort choices, since higher effort is socially efficient (there is a larger pie to divide). Even though contract offers are higher without this enforcement possibility, there is much more shirking (low effort). Nevertheless, offers increase with the desired effort level in both cases.

4.3 Multi-worker gift-exchange experiments

It seems substantially more realistic to consider an environment in which a firm has more than one employee. When there are multiple workers who receive wages and who can provide effort that benefits the firm, there is the possibility of dispersed responsibility for the firm’s earnings (leading to possible free riding) and there are considerations of horizontal fairness (one’s pay compared to the pay of other workers).

Maximiano et al. (2007) compare a bilateral gift-exchange game with one in which each firm has four workers; in the latter case the employer is likely to earn much more than any of her workers, thus reducing the need for any individual worker to sacrifice to help the relatively high-income firm. The authors did not expect gift exchange to survive in this environment.70 In fact, effort levels in the latter treatment are only marginally lower than in the bilateral game, so that “the gift exchange relationship is quite robust to increases in the size of the workforce” (p. 1026).71 Their results suggest that intentions-based reciprocity is a driving factor, although efficiency preferences (Charness and Rabin, 2002) may also play a role in inducing this behavior.

Charness and Kuhn (2007) match two workers (with different productivity levels, although the precise levels are unknown to the workers) with one firm to investigate whether workers have concerns with pay inequality and whether pay secrecy and pay compression are therefore beneficial for a firm; in a within-subjects design, we varied whether a worker was aware of the other worker’s wage. Under fairly general conditions, we demonstrated theoretically that workers’ responsiveness to co-workers’ wages should lead profit-maximizing firms to compress wages or maintain pay secrecy. And, as in other gift exchange experiments, we observed a strong positive empirical relationship between “own” wages and effort. Surprisingly, however, the effort level provided by a worker was unaffected by the wage paid to his or her co-worker. Furthermore, although firms compress wages when the co-workers will know both wages, this did not raise profits. It seems that the relationship between a worker and the firm is much more salient than the relationship between the pay of the co-workers. Overall, our experimental evidence “casts doubt on the notion that workers’ concerns with equity might explain pay policies such as wage compression or wage secrecy” (p. 693).

Gächter et al. (2008) perform an experimental analysis of pay-comparison information and effort-comparison information in an environment in which firms are matched with two workers. While effort is highly sensitive to the worker’s wage, co-worker wages per se have no effect. Further, when the firm pays different wages to the workers, the co-worker’s effort decision is ignored. However, worker behavior is affected when both pieces of social information are provided: a generous wage generates higher effort when one’s co-worker exerts high effort, but is ineffective when the co-worker contributes little or no effort. They suggest that group composition is a relevant factor for obtaining beneficial effects from social information.

Abeler et al. (forthcoming) focus on a two-worker-one-firm environment in which a worker knows both the ability and effort level of the co-worker. However, they reverse the order of play, as workers first choose effort levels and the paired firm then chooses wages. In one treatment firms are constrained to pay equal wages, while in a second treatment there is no such constraint. Perhaps surprisingly, they find that there is lower effort when the firm is forced to pay equal wages, as a worker who chooses higher effort than the co-worker does not feel it is fair to receive the same wage as the co-worker, and subsequently reduces effort. Thus, it is important to pay attention to equity considerations rather than equality per se.72

In sum, there have been a modest number of gift-exchange experiments with multiple workers, a more realistic case than the standard game.

(a) Maximiano et al. (2007) find that gift exchange is relatively undiminished even when each firm has four workers, with their results suggesting that intentions-based reciprocity is a driving factor in the effort choices.
(b) Some recent papers have examined the effect of pay-comparison information and/or effort-comparison information when there are two workers per firm. Both Charness and Kuhn (2007) and Gächter et al. (2008) find that pay-comparison information alone has little or no effect on the effort choices of workers. However, the latter study finds that worker behavior is affected when both pay-comparison information and effort-comparison information are provided and the co-worker provides high effort.
(c) Abeler et al. (forthcoming) examine the two-worker-per-firm environment in which a worker knows both the ability and effort level of the co-worker, with workers first choosing effort and the firm then choosing the wage. Effort is lower when firms are forced to pay the same wage, due to workers who choose higher effort objecting to being paid the same as a shirking co-worker.

4.4 Positive and negative reciprocity

While the classic gift-exchange experiments provide strong evidence of reciprocal behavior, it is less clear that this represents reciprocity in a strict sense. The reason for this is that people may have distributional preferences that could lead to the same behavior that is observed in an environment where reciprocity is possible. Thus, positive or negative reciprocity reflects behavior that differs from what a responder would have done in the absence of a first-mover action that is perceived to be positive or negative. Some experimental games from Charness and Rabin (2002) illustrate this point. In the baseline case, a participant unilaterally chooses between (Other, Own) material payoffs of (400, 400) or (750, 375). Around fifty percent of the population chooses to sacrifice 25 units to give the other person an additional 350 units, even though this leads to a large difference in material payoffs.

In a second case, the other paired participant first faced a choice between payoffs of (750, 0) or passing the choice to the second participant, who would once again face a choice between (Other, Own) payoffs of (750, 375) or (400, 400). Positive reciprocity would imply that the rate of (750, 375) choices should increase, as the first mover would clearly seem to be kind by allowing the responder to receive a positive payoff. However, in fact the rate goes down slightly (to 39%)! So positive reciprocity doesn’t seem to be present here. In a third case, the other paired participant first faced a choice between payoffs of (550, 550) or passing the choice to the second participant, who would once again face a choice between (Other, Own) payoffs of (750, 375) or (400, 400). Negative reciprocity would imply that the rate of (750, 375) choices should decrease, as the first mover would clearly seem to be unkind by forcing the responder to receive a smaller material payoff than was available with the outside option. In fact, the rate of (750, 375) choices does decrease sharply to 11%; thus, negative reciprocity is a factor.73
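The identification logic in these three games, in which reciprocity is measured as the change in the responder's choice rate relative to the baseline without a first mover, can be sketched as follows. The rates are those reported above, with the baseline of "around fifty percent" entered as 0.50 as an approximation; the function names are ours.

```python
def sacrifice_and_gain(equal_split, generous_option):
    """Each allocation is (other, own), as in the text.

    Returns what the chooser gives up and what the other person gains
    when the generous option is chosen over the equal split.
    """
    own_cost = equal_split[1] - generous_option[1]
    other_gain = generous_option[0] - equal_split[0]
    return own_cost, other_gain


def reciprocity_effect(rate_after_first_move, baseline_rate):
    """Change in the generous-choice rate attributable to the first mover's action."""
    return rate_after_first_move - baseline_rate


own_cost, other_gain = sacrifice_and_gain((400, 400), (750, 375))

BASELINE_RATE = 0.50  # approximate: "around fifty percent"
effect_after_kind_move = reciprocity_effect(0.39, BASELINE_RATE)
effect_after_unkind_move = reciprocity_effect(0.11, BASELINE_RATE)

print(own_cost, other_gain, effect_after_kind_move, effect_after_unkind_move)
```

Positive reciprocity would require the first effect to be positive, which it is not, while the large negative second effect is what the text interprets as negative reciprocity.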

Perhaps the first experimental paper to carefully test for positive and negative reciprocity was Charness (2004).74 This paper considers gift-exchange in a bilateral setting, varying whether the wage was determined by a self-interested firm or generated by an exogenous process (such as a draw from a bingo cage); in all cases, the firm benefits from the worker’s chosen effort. There is a strong positive relationship between wage and effort in all treatments. However, the effort level with low wages is lower when the wage was chosen by a self-interested firm than when it was generated exogenously, suggesting the presence of negative reciprocity. On the other hand, there was virtually no difference across treatments in the effort level with high wages. Thus, this paper was the first to provide experimental evidence that positive reciprocity seems to be much weaker than negative reciprocity, while at the same time clearly identifying the effect of the distribution of payoffs per se on behavior.

A subsequent and related paper (in a non-labor setting) is Offerman (2002), who studies the effects of random choice mechanisms while allowing for both positive and negative reciprocity. He considers players’ responses to a helpful or hurtful choice, as a function of whether the “choice” was made by an interested party or generated at random (responders could sacrifice one unit to either increase or decrease the first-mover’s payoff by four). Following the helpful choice, responders never paid to lower the first-mover’s payoff, but paid to help first movers more often when the first mover made the choice than when the choice was randomly-determined. This suggests some positive reciprocity may be present, although the effect is not significant. On the other hand, the effect on the response to the first-mover’s perceived intentions was dramatic and significant following a hurtful choice, again suggesting that hurting hurts more than helping helps (the title of the paper).

Brandts and Charness (2003) test for punishment and reward in a cheap-talk game and show that intention is a critical issue, finding substantial negative reciprocity and limited positive reciprocity. One player sends a message about her intended play to another player; after play takes place, the other player is given an opportunity to punish or reward the first player. The authors found that the responder was twice as likely to punish unfavorable play by the first player if that first player had lied about her play than if she had told the truth. A relatively small number of responders chose to reward a favorable play by the first mover.

On the other hand, Cox (2004) reports significant positive reciprocity in the investment game (Berg et al., 1995).75 The triadic design compares behavior in the standard game with behavior when the first mover is a dictator and with behavior when the experimenter, rather than a self-interested first mover, determines the amount received by the responder; this procedure should allow one to distinguish distributional preferences per se from reciprocal preferences. Cox et al. (2008) find significant positive reciprocity, but do not find significant negative reciprocity, in the “moonlighting game”, where the first mover can take from the responder as well as pass to the responder. Finally, Cox and Deck (2005, 2006) report mixed findings on positive reciprocity; the results depend on whether or not the experimenter can observe the actions of the players (single blind or double blind), as well as on the sex of the subject.

In sum, the experimental evidence regarding intention-based positive and negative reciprocity is mixed, although the general result is that negative reciprocity is stronger than positive reciprocity. In a sense, this can be seen as reflecting expectations and violations thereof, as seen in a self-serving way. If one expects kind or favorable treatment and receives it, there is no strong emotional jolt; on the other hand, if one has this expectation and receives unkind or hurtful treatment, the emotional response is much stronger.

(a) Negative reciprocity is highly pervasive in experiments, much as it is in the field.
(b) On the other hand, a number of papers, such as Charness (2004), Offerman (2002) and Charness and Rabin (2002), find little or no evidence of intentional positive reciprocity, in the sense that a kind action does not receive a more favorable response than when no action has been taken.
(c) However, this is not a universal result, as papers such as Cox (2004) and Cox et al. (2008) report significant positive reciprocity. Still, Cox and Deck (2005, 2006) report mixed findings on positive reciprocity. So this topic remains open.

4.5 Pay regulation

Governments often consider whether to regulate pay by means such as mandating a minimum wage or a sick-pay rate. What are the effects of such policies? In the field, it is difficult to ascertain this, as there are many factors present and changing at the same time. Thus, experimental techniques are likely to be useful, as the effects of such policies can be successfully isolated.

Brandts and Charness (2004) were the first to study the effect of a minimum wage in a gift-exchange format. They created a more symmetric payoff design. Each person was endowed with 10 units, the first mover could pass up to 10 units, and the responder received five times whatever was passed; the responder could then send back up to 10 units, with the original first mover receiving five times whatever was passed back. Brandts and Charness imposed a minimum wage of five in a condition with an excess supply of workers. The mandated minimum wage was counterproductive in the sense that the average effort was 30% lower than without it, even though the average wage was 5% higher. In addition, they found that the highest wage was chosen only half as frequently in the minimum-wage condition, with effort also reduced by 30% at this top wage.
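
The symmetric payoff structure described above can be written out as a short Python sketch. It assumes, as the description suggests but does not state outright, that each unit passed costs the sender exactly one unit of his or her own endowment; the function and variable names are ours and are not taken from the original paper.

```python
ENDOWMENT = 10   # units each participant starts with (from the text)
MULTIPLIER = 5   # each unit passed is worth five units to the recipient (from the text)


def payoffs(wage: int, effort: int) -> tuple[int, int]:
    """Return (first mover, responder) payoffs.

    `wage` is the amount the first mover passes; `effort` is the amount the
    responder passes back. Assumption: passing one unit costs the sender one unit.
    """
    if not (0 <= wage <= ENDOWMENT and 0 <= effort <= ENDOWMENT):
        raise ValueError("wage and effort must each lie between 0 and 10")
    first_mover = ENDOWMENT - wage + MULTIPLIER * effort
    responder = ENDOWMENT - effort + MULTIPLIER * wage
    return first_mover, responder


print(payoffs(0, 0))    # nobody passes anything
print(payoffs(10, 10))  # both pass everything
print(payoffs(5, 0))    # a wage of five met with zero effort
```

Under these assumptions, a wage of five that is met with zero effort leaves the first mover worse off than passing nothing at all, which is why the responder's effort choice matters so much for whether a mandated minimum wage helps or hurts.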

Falk et al. (2006) test the effect of imposing a minimum wage in an environment in which each firm is matched with three workers and must choose the same wage for each of the workers. The workers decide on the lowest wage that they would conditionally accept (the strategy method) and the actual assigned wage is then accepted or rejected accordingly; however, workers do not choose effort levels, and a firm’s profits only depend on the number of workers who accept the wage offer. In some sessions, 15 periods of a mandated minimum wage were followed by 15 periods with no minimum wage; the order was reversed in the other sessions. They find that there are lasting consequences of the mandated minimum wage even after it has been removed, in that firms must pay higher wages after the removal than before it was imposed. Thus, the authors conclude that policy can affect people’s sense of what is a fair wage.

Owens and Kagel (2010) use a variant of the payoff design in Brandts and Charness (2004). They find that an imposed minimum wage reduces effort in the neighborhood of the minimum wage, but that there are no significant effects on effort levels for higher wages. The minimum wage leads to improved incomes for both firms and workers (particularly the latter). Since there is little effect at higher wages, it appears that the minimum wage requirement is less salient at these higher wages and/or that employees recognize that wages set a good deal higher than the minimum represent just as large a monetary gift as without the presence of the minimum.

Dürsch et al. (2008) report the results of a form of gift-exchange game that investigates the issue of sick pay in an experimental labor market. The main variation is whether there is one-to-one matching of firms and workers or whether there is a market setting in which firms post wage offers and workers select the ranking of their preferences regarding the possible offers; in the latter case, a firm may end up hiring more or less than one worker. In all cases, there is a one-third chance (chosen at random) that a worker will be “sick” and so be unable to complete his or her intended effort. Firms chose one of five possible contracts, each of which specified a wage to be paid in case of zero effort (not showing up for work, either because the computer made this choice or because the worker chose zero effort voluntarily; the firm cannot tell if the worker is actually sick or simply chose to stay home) and a wage to be paid in case of positive effort. The main findings are that higher wage offers significantly increase effort choices (both with respect to sick pay and non-sick pay), firms can attract more reciprocal workers by offering sick pay, but firms benefit from offering sick pay only in the market setting where there is competition for workers.

Bauernschuster et al. (2009) use a similar design to consider the effect of sick pay when workers are heterogeneous in their likelihood of being “sick”. They use a 2×2 experimental design, in which they vary whether there is a minimum sick-pay rate (40% of the wage) and whether workers all have the same 20% chance of being sick or whether half of the workers have a 10% chance of being sick and the other half have a 30% chance. Issues of interest include the degree of moral hazard (pretending to be sick), whether the adverse-selection problem is severe enough to lead to a collapse in the market for sick pay, and whether higher levels of sick pay lead workers to choose higher effort levels. There are three main results. First, higher-risk workers do indeed select into contracts with higher sick-pay rates, and they pay for this by getting lower wages and an overall worse deal than the low-risk workers. Second, higher sick pay leads to increased effort, but only when the sick-pay rate is freely chosen. Third, the sick-pay market does not break down due to the adverse-selection problem.

In sum, there are mixed results regarding the effect of imposing a minimum wage. Regarding sick pay, it appears it is possible (but perhaps not easy) to design systems where this is a useful policy intervention.

(a) Brandts and Charness (2004) find that a mandated minimum wage was counterproductive in terms of effort provision. High wages were offered far less frequently when the mandated minimum wage was present, as it appears that intrinsic generosity or the perceived need for high wages is crowded out by the mandated minimum wage.
(b) On the other hand, Falk et al. (2006) find that a minimum wage is potentially beneficial. They find lasting positive consequences from the mandated minimum wage even after it has been removed, as firms must pay higher wages after the removal than before it was imposed. Owens and Kagel (2010) also find some beneficial effects from imposing a minimum wage, with Pareto improvements resulting for firms and workers.
(c) Dürsch et al. (2008) and Bauernschuster et al. (2009) study the effects of sick pay provision in an experimental labor market. In the first paper, higher wage offers significantly increase effort choices (both with respect to sick pay and non-sick pay), but firms only benefit from offering sick pay when there is competition for workers. In the second paper, workers with a greater likelihood of being “sick” select into contracts with higher sick pay rates, but at a cost in terms of wages. Higher sick pay can result in increased effort in some circumstances, and the adverse selection problem does not destroy the sick-pay market.

4.6 Do gift-exchange and social preferences map into the field?

A major question that labor economists might have concerning laboratory gift-exchange results is the degree to which these mean anything in the field environment (external validity). Two recent papers call into question the degree to which the social preferences exhibited in these experiments are robust. On the other hand, a number of other studies provide evidence that the social preferences identified in the laboratory map well into the real world.

Gneezy and List (2006) investigate whether gift-exchange behavior persists over time in two real-effort tasks, one involving work in a library and the other involving door-to-door solicitation. The mechanism for testing for reciprocity consists of the experimenters telling people that they will be receiving a certain piece rate for their work, but then announcing at the time of the six-hour task that in fact they will be paid a substantially higher piece rate. The results are persuasive, particularly for the fundraising task: people do in fact work harder for the surprise pay, but this effect vanishes over the course of time. Thus, these results constitute a cautionary note against applying the results of these laboratory experiments to the field environment. However, one caveat is that these experiments only pertain to positive reciprocity, as even the lower advertised pay rate is above the alternative wage that the student workers would normally earn. Given the relative dearth of positive reciprocity in laboratory experiments, it may not be so surprising that positive reciprocity in these field experiments is fleeting.

Fershtman et al. (2009) present compelling evidence that social preferences are subject to framing effects and are not general to all environments. They show that introducing a competitive frame to the environment crowds out social preferences in mini-dictator and trust games. For example, 72.5% of dictators choose an (Own, Other) allocation of (8, 8) instead of (11, 2). This is followed by a real-effort competition between the dictator and the recipient, where the (8, 8) outcome is implemented if the recipient wins but the (11, 2) outcome is implemented otherwise. If the dictator really wanted the (8, 8) split, the simple strategy would be to solve no problems; however, dictators contribute considerably more effort when solving more problems leads to higher own payoffs than when solving more problems doesn’t affect these payoffs (in the baseline condition). Furthermore, in another treatment the (11, 2) split is implemented if the dictator loses to the recipient, while otherwise the (8, 8) split is implemented. In this case, 85% of the dictators do nothing. This result demonstrates that social preferences may not have much effect when competition is salient.

Two other experiments present a different view than Gneezy and List (2006). Bellemare and Shearer (2009) conduct a field experiment that investigates worker responses to a monetary gift from their tree-planting firm using incentive contracts. Workers were told that they would receive a pay raise for one day. Productivity on the day of the gift is compared with productivity on adjacent days, under similar planting conditions. They find direct evidence of a significant and positive effect on daily planter productivity on the day of the gift, controlling for a variety of other possible factors. Kube et al. (2006b) conduct a field experiment involving a six-hour task in which the participants were told that they would presumably receive 15 Euro per hour; in three treatments, they were then paid either 10, 15, or 20 Euro per hour. The main result is that there is little difference in performance in the treatments where people are paid either 15 or 20 Euro per hour, but that there is a strong, deleterious, and lasting effect on performance when people are paid less than the presumptive rate. These results suggest that there is an asymmetry between positive and negative reciprocity in the field, very much in line with the evidence found in laboratory experiments. However, one caveat to these results is that the sample size is quite small.76

There are two main threads of evidence suggesting that social preferences are linked to the general population and the workforce. First, there are a number of non-laboratory experiments in European nations that show that the basic behavioral patterns observed in fairness-related experiments with students also prevail in representative samples of the general population, with older cohorts generally being more reciprocal than younger cohorts.77 Holm and Nystedt (2005) analyze behavior in the investment game, with people selected from a public database in Sweden. While the average amount returned was similar for both cohorts, the proportions dispersed for the older responders, suggesting a greater degree of responsiveness to the environment. Other studies conducted with representative surveys in Germany (Fehr et al., 2002) and the Netherlands (Bellemare and Kröger, 2007) conclude that older cohorts are more generous as responders. Falk and Zehnder (2006) find substantial evidence of social preferences in a sequential trust game conducted with 1000 residents of Zurich, also finding evidence of discrimination and in-group favoritism. Sutter and Kocher (2007) find that the elderly are more reciprocal, in a study with participants ranging from 8-year-old children to people in their late sixties. In their study, trust increases from early childhood to early adulthood, but stays constant thereafter. Finally, Karlan (2005) finds that the amounts returned in a laboratory investment game conducted in Peru predict micro-finance loan payments a year after the experiment.

The second thread involves studies that show a relationship between worker productivity and social preferences. Barr and Serneels (2009) conducted a study with Ghanaian manufacturing workers. They find that a measure of trustworthiness (the ratio of the amount returned to the amount sent) has a positive relationship with the wages paid, with these wages being used as a proxy for productivity. This is particularly true with respect to the average trustworthiness in the workplace in question, although the direction of any causal relationship is unclear: are people more trustworthy because their wages are higher, or vice versa? They conclude that behavioral characteristics and corporate culture are major determinants of the productivity of the firm and the operation of the labor market. Carpenter and Seki (forthcoming) conducted a series of experiments with local shrimp fishermen in Japan. They used a laboratory public-goods game to measure social preferences and also obtained measures of fishing productivity. The main finding is that there is a strong correlation between measured social preferences and productivity for the fishermen; a second finding is that social preferences grow with the degree to which team production is present in the fishing environment.

In sum, there is mixed evidence regarding the extent to which social preferences manifest in the field. It is clear that these are not present in all environments, but there are a number of papers providing evidence from laboratory and field experiments, as well as work environments and surveys, that suggest that social preferences can be a factor in the field.

(a) Gneezy and List (2006) find positive reciprocity in the first three hours of field experiments, but not in the last three hours. They interpret their results as indicating that social preferences are ephemeral. Similarly, Fershtman et al. (2009) use a clever design to show that social preferences are crowded out when a competitive game is played after an initial choice. Clearly social preferences are not ubiquitous.
(b) However, other studies find rather different results. Bellemare and Shearer (2009) present direct field-experimental evidence of a significant and positive effect on productivity on the day when a gift is given, controlling for a variety of other possible factors (yet it remains unclear how long such an effect will persist). Kube et al. (2006b) use a design in a field experiment that can test for both positive and negative reciprocity. In line with many experimental results, there is little evidence of positive reciprocity, but strong evidence of negative reciprocity when the pay rate is lower than what was anticipated.
(c) Survey evidence indicates that social preferences are linked to the general population and the workforce. Holm and Nystedt (2005), Fehr et al. (2002), Bellemare and Kröger (2007), Falk and Zehnder (2006), Sutter and Kocher (2007), and Karlan (2005) provide such evidence from surveys in several European nations, as well as Peru. In general, it seems that older people are more reciprocal than younger people, suggesting that this is a learned trait.
(d) There is also direct evidence from the work environment. Barr and Serneels (2009) find a relationship between worker productivity and social preferences for Ghanaian manufacturing workers. Carpenter and Seki (forthcoming) observe a definite positive relationship between social preferences and productivity for shrimp fishermen in Japan, with said social preferences increasing to the extent that team production is present in the fishing environment.

4.7 Communication

In this final subsection, we consider how communication can affect behavior in principal-agent relationships. While it is true that other forms of social considerations such as distributional preferences and negative reciprocity are likely to lessen the applicability of the equilibrium contracts derived from standard theory, some studies demonstrate that communication can lead to better social outcomes than can be achieved with more standard social preferences.

Charness and Dufwenberg (2006) examine experimentally the impact of communication on trust and cooperation with hidden action (moral hazard). The principal has an outside option that (in the main treatment) gives (Principal, Agent) payoffs of (5, 5); alternatively, the principal can leave matters up to the agent, who chooses between (Principal, Agent) payoffs of (0, 14) and expected payoffs of (10, 10).78 With standard preferences, the agent would choose (0, 14), so that the principal will choose (5, 5). In fact, there are substantial rates of cooperation even in the absence of communication, but the likelihood of the (10, 10) expected outcome increases from 20% to 50% when communication is allowed. Furthermore, this increase is almost entirely driven by free-form and endogenous promises (statements of intent) by the agent to behave cooperatively. The evidence is consistent with people striving to live up to others’ expectations so as to avoid guilt. When players exhibit such guilt aversion, communication may influence motivation and behavior by influencing beliefs about beliefs. Charness and Dufwenberg argue that guilt aversion may be relevant for understanding strategic interaction in a variety of settings, and that it may shed light on the role of language and social norms in these contexts.
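
The standard-preferences prediction in this game follows from backward induction, which the following Python sketch makes explicit. The payoffs come from the description above. The `guilt_cost` parameter is a purely hypothetical stand-in of ours for the disutility of disappointing the principal; it is not the authors' belief-dependent model, and it is included only to show how such a cost could overturn the standard prediction.

```python
# (Principal, Agent) payoffs from the main treatment described in the text.
OUTSIDE = (5, 5)         # principal takes the outside option
SELFISH = (0, 14)        # agent's selfish choice
COOPERATIVE = (10, 10)   # expected payoffs from the agent's cooperative choice


def agent_choice(guilt_cost: float = 0.0) -> tuple[int, int]:
    """Agent's choice if trusted; guilt_cost (hypothetical) is subtracted
    from the agent's utility of the selfish option."""
    selfish_utility = SELFISH[1] - guilt_cost
    return SELFISH if selfish_utility > COOPERATIVE[1] else COOPERATIVE


def principal_choice(guilt_cost: float = 0.0) -> tuple[int, int]:
    """Outcome when the principal anticipates the agent's choice."""
    anticipated = agent_choice(guilt_cost)
    return anticipated if anticipated[0] > OUTSIDE[0] else OUTSIDE


print(principal_choice())      # standard preferences: no guilt
print(principal_choice(5.0))   # a guilt cost large enough to deter the selfish choice
```

With no guilt cost the agent would take (0, 14), so the principal opts out and the outcome is (5, 5). In this sketch, any guilt cost of at least four units makes the cooperative choice weakly preferred, and the principal then leaves the decision to the agent.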

Brandts and Cooper (2007) study manager-worker interactions in an environment where payoffs depend on employees coordinating at high effort levels; the lowest effort level chosen by any worker determines the overall production. One treatment investigates the effect of increased incentives (the marginal benefit of coordinating on a higher effort level increases substantially) on worker behavior, while a second treatment allows the manager to communicate via chat with the workers. It turns out that communication is a more effective tool than incentive changes for improving coordination on high effort levels. An analysis of the content of communication indicates that the most effective communication strategy is to request high effort, pointing out the mutual benefits of high effort. Thus, direct financial incentives have some benefit in terms of coordination, but messages are the better strategy if one cannot choose both.

Charness and Dufwenberg (forthcoming) investigate the impact of communication on trust and cooperation with hidden information (adverse selection). There are two possible agent ability levels, with a 1/3 chance that an agent will have high ability. The principal has an outside option or can leave matters up to the agent. There are two main cases. In the first, there are possible Pareto-improvements over the outside option for both types of agents. In the second case, there is no feasible Pareto-improvement for the low-ability agent. While the standard game-theoretic prediction is for the principal to choose the outside option in both cases, the authors find a dramatically different effect of free-form communication from the agent to the principal in these two cases. When a Pareto-improvement is feasible with a low-ability agent, communication doubles the rate at which the low-ability agent sacrifices money to help the principal. However, communication has no effect in the alternative case. The difference is driven by the fact that many messages in the first treatment confess that the agent has low ability, but promise that he or she will choose the Pareto-improvement over the selfish option (and they all do so). Charness and Dufwenberg conclude that it is good policy to offer lower-ability workers an opportunity to participate in a socially-beneficial outcome, as they are likely to behave cooperatively.

In sum, while experiments on cheap talk have been conducted for over 20 years, some recent work has applied such nonbinding pre-play communication in clear principal-agent environments. This is a very promising area for both future research and policy.

(a) Charness and Dufwenberg (2006) find that promises (statements of intent) by the agent to behave cooperatively are very effective in achieving optimal social outcomes in a hidden action (moral hazard) setting, where the standard prediction is a total lack of cooperation. The evidence supports the notion of guilt aversion, where people strive to avoid guilt stemming from disappointing the expectations of others.
(b) Brandts and Cooper (2007) find the striking result that coordination on higher effort is facilitated more by communication between a manager and the workers than by increased incentives. Communication about the mutual benefits of high effort is particularly valuable. Thus, while direct financial incentives may be somewhat useful, communication in a team-production environment is even more so.
(c) Charness and Dufwenberg (forthcoming) also find that promises are effective in a hidden information (adverse selection) environment, but only when less able agents can participate in a Pareto-improvement in material payoffs. A potential policy application is to offer lower-ability workers an opportunity to participate in a socially-beneficial outcome, as they are likely to behave cooperatively.