CHAPTER 10

Tips for Using Tournaments

For a very long time, managers have used tournaments and other contests, whether formal or informal, to encourage good ideas and to spark creativity. But with respect to tournaments, we’re entering a whole new age. Let’s take a look at a highly publicized example, the Netflix Prize contest, and then try to extract some more general lessons.

The Netflix Prize

Netflix, one of the major providers of rental and video-on-demand entertainment content, is an innovative and trend-setting company. It does a lot of its own research and development. A distinctive feature of its website—a feature that draws many customers to the business—is its movie recommendation service, Cinematch. If you like House of Cards, you’ll probably like Mad Men, and if you like Star Trek, you’ll probably like Twilight Zone, and if you like Air Bud Spikes Back, you’ll probably like Super Buddies (don’t ask).

The concept of Cinematch originated at Bell Labs, which is also renowned for its innovative capabilities, including the invention of the transistor. Bell's scientists were trying to improve a web-based Yellow Pages service. They hit on the idea of recommending services to inquirers on the basis of successful past recommendations to similar customers. Through Bell's user database, a new customer would be matched with other customers who had entered similar queries or had provided similar satisfaction ratings for services obtained through the automated Yellow Pages.

For example, a female, middle-aged, Caucasian, Protestant accountant might be matched to several other similar users and then provided with “answers” to her queries; the answers were slanted to include recommendations that had pleased the similar users. Netflix applied the same case-based similarity algorithms in its Cinematch feature to recommend movies to its customers. And with a few amusing exceptions, customers were happy and often surprised by the usefulness of the Cinematch recommendations.
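
To make the mechanics concrete, here is a minimal sketch of similarity-based recommendation ("collaborative filtering") in the spirit of the approach just described. The tiny data set, the cosine-similarity measure, and all the names are our illustrative assumptions, not actual Bell Labs or Netflix code.

```python
# A minimal sketch of user-similarity ("collaborative filtering")
# recommendation, in the spirit of the Cinematch idea described above.
# All data and design choices here are illustrative assumptions.
from math import sqrt

ratings = {  # user -> {movie: rating on a 1-5 scale}
    "ann":  {"House of Cards": 5, "Mad Men": 4, "Star Trek": 2},
    "bob":  {"House of Cards": 5, "Mad Men": 5, "Twilight Zone": 4},
    "cara": {"Star Trek": 5, "Twilight Zone": 5, "Mad Men": 2},
}

def similarity(u, v):
    """Cosine similarity over the movies two users have both rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in shared)
    norm_u = sqrt(sum(ratings[u][m] ** 2 for m in shared))
    norm_v = sqrt(sum(ratings[v][m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)

def recommend(user):
    """Score each unseen movie by similarity-weighted ratings of others."""
    scores = {}
    for other in ratings:
        if other == user:
            continue
        w = similarity(user, other)
        for movie, rating in ratings[other].items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + w * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ann"))  # with this toy data: ['Twilight Zone']
```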

In 2006, Netflix decided to make a substantial upgrade in the quality of its recommendations. But instead of simply sending the problem to its own in-house engineers, Netflix advertised a $1 million prize for the first algorithm to beat Cinematch by 10 percent (that is, to cut the error rate of the recommendations by 10 percent). The contest's rules were carefully spelled out. For example, the criteria for a winning submission were defined precisely; the conditions of the test to verify those criteria were specified; and “training data” were provided publicly to all competitors so that they could fine-tune and test their algorithms before submitting them to Netflix. It took only four days for a competitor to beat Cinematch's performance, but a 10 percent reduction in error is substantial, and the contest ran for three years before a winner was declared. (Perhaps not coincidentally, it was a team from Bell Labs.)
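
For the record, the contest's published yardstick was root-mean-squared error (RMSE) on a held-out set of ratings: a submission qualified by posting an RMSE at least 10 percent below Cinematch's. Here is a minimal sketch of that kind of check, with made-up ratings standing in for the real data.

```python
# A minimal sketch of how the Netflix Prize criterion could be checked:
# root-mean-squared error (RMSE) on held-out ratings, with a winner
# needing a 10 percent improvement over the Cinematch baseline.
# The numbers below are made up for illustration.
from math import sqrt

def rmse(predicted, actual):
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                / len(actual))

actual        = [4.0, 3.0, 5.0, 2.0, 4.0]   # held-out true ratings
baseline_pred = [3.2, 3.6, 4.1, 3.0, 3.4]   # stand-in for Cinematch
challenger    = [3.8, 3.2, 4.6, 2.4, 3.9]   # stand-in for a submission

baseline_rmse   = rmse(baseline_pred, actual)
challenger_rmse = rmse(challenger, actual)
improvement = 1 - challenger_rmse / baseline_rmse

print(f"baseline RMSE  : {baseline_rmse:.4f}")
print(f"challenger RMSE: {challenger_rmse:.4f}")
print(f"improvement    : {improvement:.1%}")
print("wins" if improvement >= 0.10 else "keeps trying")
```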

What is most amazing, at least to us, is the extraordinary investment of effort that was produced by the $1 million incentive. Over twenty thousand teams from more than 150 nations joined the competition. In the endgame, more than ten teams of elite engineers were putting more time into the competition than into their day jobs. Compare this result with the much more modest effort that would have been produced with a $1 million investment in in-house R&D.

Overcoming Failures of Deliberation

Notice also how the innovation tournament (as an economist would call it) solves the problems faced by deliberating groups. First, the dispersion of teams across the globe—connected to the contest only through the internet—meant that there was independence. Second, the teams did not share methods (only performance results). And finally, the winner-take-all character of the tournament (only one team could win) incentivized divergent strategies, including attempts to hit the jackpot with a completely novel approach.

This cocktail of conditions and incentives, promoting independent and deliberately divergent thinking, prevents a misstep or blind alley from being propagated and amplified across teams. Cascades will not occur if no one knows where the others are headed and, furthermore, if everyone is motivated to run in unique directions in the first place. And although individual teams may pursue extreme solutions, the competitors as a whole are unlikely to become polarized in a single direction.

Netflix’s precision in the statement of the task conditions and the objectives (criteria for winning) ensured that most teams stayed focused on the problem of interest. Fuzzy contest rules (e.g., the traditional in-house suggestion-box contest) are unlikely to yield the kind of focused but widespread and energetic participation produced by the Netflix Prize.

InnoCentive and Beyond

As we have noted, innovation tournaments have been around for centuries, but with the rise of the internet, their scale and frequency have jumped dramatically. One of the most famous historical examples is the Longitude Prize of £20,000, offered by the British government in 1714 for a method that could determine a ship's longitude at sea precisely (to within thirty nautical miles). The full prize was never paid out, but many smaller awards were made for advances that contributed to improved navigation at sea. Today, numerous companies are creating innovation tournaments through their own sites, or running them in-house if they do not want to publicize their R&D strategies.

Consider just a few examples. In 2013, General Electric publicly posted the GE Engine Bracket Challenge, with a $20,000 prize, and the company received almost seven hundred solutions for an engineering problem involving lighter mounting brackets for an aircraft engine.1 Eight solutions were close enough to split the prize, and GE integrated the complementary parts of those solutions to create a new bracket that is 80 percent lighter than the prior solution.

In 2005, the Pentagon paid a $2 million prize to a Stanford University team whose robotic car won a 132-mile race. In 2013, the Breakthrough Prize in Life Sciences announced that it would pay $3 million awards for medically relevant innovations. A new start-up (BattleFin) uses contests with prizes in the thousands of dollars to identify mathematical talent for commercial financial analysis.

In this context, the best-known practitioner is InnoCentive, a technically oriented firm that matches “seekers” with more than three hundred thousand registered “solvers,” over half from Asia and Eastern Europe, through bulletin board postings. Essentially, InnoCentive serves as a referee guaranteeing that only well-formed problems are posted. The company verifies solutions and oversees award payments, usually in the range of $10,000 to $100,000. InnoCentive also handles the legal contracts that transfer the intellectual-property rights for winning solutions from the solvers to the seekers once solutions have been found.

We have mentioned the extraordinary capacity of well-designed innovation tournaments to evoke and maintain high levels of performance. For cases like the Netflix Prize, there are undoubtedly novelty effects, and of course, there are reputational incentives that probably overshadow the $1 million prize money. A huge professional audience of computer scientists remembers that BellKor’s Pragmatic Chaos team (from Bell Labs) won the Netflix Prize, and in that world, the winning team’s members are celebrities.

But even without those bonuses, the tournaments conducted through InnoCentive and other companies produce large investments of ingenuity and hard work by world-class engineers and scientists. There's a lesson here for businesses, both large and small, which can experiment with tournaments of their own. Even modest rewards can pay big dividends, especially because of the reputational benefits that accompany a victory.

Why Tournaments Work

It is no coincidence that tournaments are the most effective market mechanism known in economics to promote elite performance at the top of the competitors’ capacities. This is why we see tournaments everywhere in sports; they are also frequently used to distribute top-end bonuses to software engineers, sales personnel, and so forth. Furthermore, sharply declining payoff schedules (winner-take-all is the extreme on this continuum), like the prizes in professional golf or tennis, motivate the most elite of the elite to extraordinary achievements.

Other helpful features of the InnoCentive method include the anonymity of the competitors (no one knows the exact skill levels of the others), the prospect of developing a personal reputation or a long-term relationship with a seeker (e.g., a job offer), and the role of the internet as a gateway to significantly better financial rewards than those available in the local economies of China, India, and Eastern Europe.

What is not so clear is how tournament methods affect information sharing. As with prediction markets, there are no clear incentives for sharing information with other parties. In fact, competitive organizations and institutions tend to discourage information sharing. It’s obvious that private information is power, and the route to the ultimate payoff in a tournament, or a competitive investment institution like a pari-mutuel betting market, is keeping the good information secret. If you’re trying to win the Netflix Prize, you don’t want to share your ideas with competitors (you’ll reduce your own chances of winning). If you’re betting at the racetrack, you don’t want to share valid tips with other bettors (you’ll reduce the value of your own bets in a system where winners split the betting pool).
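
The racetrack case makes the arithmetic vivid (the numbers below are ours, for illustration): in a pari-mutuel system, each winning dollar returns the pool divided by the total wagered on the winning horse, so every bettor your tip recruits dilutes your own payout.

```latex
% Pari-mutuel arithmetic (our illustrative numbers): each winning
% dollar returns the pool P divided by the total W bet on the winner.
\[
  \text{payout per winning dollar} \;=\; \frac{P}{W},
  \qquad
  \frac{\$1{,}000}{\$100} = \$10
  \;\longrightarrow\;
  \frac{\$1{,}000}{\$200} = \$5 .
\]
% A valid tip that draws another $100 onto your horse halves the
% return on every dollar you bet.
```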

Despite the incentive not to share information in a competitive setting, one behavioral curiosity did occur toward the end of the Netflix challenge: to our surprise (and that of other behavioral scientists we’ve talked to), several teams that were close to winning combined their efforts in a final sprint to the finish line. We don’t fully understand why the teams did this, but there might have been some reason to believe that continuing alone would not win the prize, and that a close collaboration would achieve the ultimate goal. But even without information sharing, tournaments can be a great idea, because they produce the right incentives to motivate high levels of investment by elite performers.

Designing an Effective Tournament: Numbers? Prizes?

The discussion thus far suggests three practical lessons about how to set up an effective problem-solving or decision-making tournament. First, make sure there is a well-defined rule to determine the winner. Second, provide a trustworthy umpire to decide the winner. Third, advertise the tournament widely to an audience of potential competitors who have the expertise and tools to create viable solutions. In a nutshell, it is good practice to define the problem clearly enough so that solvers who have a chance to win can self-select into the competition.

But two questions remain open: How many competitors should be encouraged to participate in a tournament? And how should prizes be structured? In other words, is a winner-take-all, single prize the most effective incentive to produce good solutions?

How Many Should Play?

With respect to the question of how many competitors to encourage, keep in mind that the more competitors there are, the less likely any one of them is to win. Increased numbers can therefore discourage the effort expended by every competitor. In the case of problems with well-defined solutions, only those who have a realistic chance of solving them will enter, ensuring a vigorous contest among serious contenders.

The difficulty is that as uncertainty about who might win the prize increases (as is likely to be the case in many of the problems that companies and governments have been placing in tournaments), the increased size of the competition will demotivate all individual competitors. This conclusion follows directly not only from economic theory but also from common sense, because competitors will consider the costs and benefits of their actions in deciding what to do. As the chance of victory is reduced, the benefits of investing in the competition decrease.
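
That cost-benefit logic fits in one line; the notation below is our stylized sketch, not a formula from the studies discussed here. With N equally matched competitors, a single prize V, and a cost of serious effort c, each entrant's expected net benefit is roughly:

```latex
% Stylized expected net benefit for one of N equally matched
% competitors chasing a single prize V at effort cost c (our notation):
\[
  \mathbb{E}[\text{net benefit}] \;\approx\; \frac{V}{N} \;-\; c .
\]
% Doubling the field roughly halves the expected reward, so past some
% field size, serious effort no longer pays for itself.
```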

At the same time, there is a psychological wrinkle. As we have seen, human beings tend to be unrealistically optimistic, and if competitors have that characteristic, then they might give it a shot even when they ought not to do so. To our knowledge, only one study has tested this question empirically, and it suggests that large numbers of competitors can indeed reduce effort.

In 2011, Kevin Boudreau and his colleagues analyzed the results of several hundred TopCoder contests to see if the number of competitors was related to individual effort.2 TopCoder is a repeated competition to solve a wide variety of software coding problems. Competitors sign up for a competition by problem type (e.g., algorithms), and at a designated start time, they receive a problem and have seventy-five minutes to come up with a solution. An example problem might be: “Find the most popular person in a social network of differing ethnicities in the least amount of computation time.”
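
To give a feel for what contestants face, here is one minimal way to attack a toy version of that example problem, reading “most popular” as “most connections”; the friendship graph and the degree-counting approach are our assumptions, not a TopCoder reference solution.

```python
friendships = [  # illustrative friendship graph, not TopCoder data
    ("ana", "raj"), ("ana", "mei"), ("raj", "mei"),
    ("mei", "tomas"), ("tomas", "raj"), ("raj", "zoe"),
]

def most_popular(edges):
    """Return the person with the most connections (highest degree).
    One pass over the edge list, so it runs in O(number of edges)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return max(degree, key=degree.get)

print(most_popular(friendships))  # raj, with four friendships
```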

Hundreds of thousands of programmers from all over the world participate to win professional acclaim and to establish an overall performance score across contests that can lead to better jobs as well as cash prizes. The key feature of TopCoder contests, the one that supports a behavioral analysis of the effect of the number of competitors, is that the contests are segregated into virtual rooms, with up to twenty players per room, and a first prize and lesser prizes awarded in each room. (And of course, it's important that there is a reliable refereed score for each player.)

The authors’ analysis confirmed that the more participants, the lower the competitors’ effort. This was true at every level of performance. Even the top fifth of competitors, who had a realistic shot at winning, slowed down on the coding task if there were many competitors.

However, other researchers have suggested that the situation is more complicated—that increasing the numbers of players has other effects besides discouraging effort. A few years before Boudreau and his colleagues’ study, Christian Terwiesch and Yi Xu conjectured that more players meant less effort per player, but also, especially with more complex problems, that more players increased the chance of finding a truly novel solution (even with lower effort per player).3 Terwiesch and Xu call this second factor the “parallel-paths” effect, referring to the observation that more competitors means that a greater number of different but parallel paths will be explored by competitors searching for a solution. That’s good news.

Boudreau and his colleagues also found empirical evidence for the parallel-paths effect. Although most players put less effort into the task, the absolute best solution was usually better with more players. Other theorists have speculated that with more competitors in an enterprise (e.g., a tournament), the players are likelier to try truly novel paths to the solution—the parallel-paths effect.4 On top of these results, Boudreau and his colleagues concluded that the more difficult a problem, the greater the positive parallel-paths effect and the smaller the discouragement effect.

In light of these opposing forces and the complexity of innovative endeavors, it is premature to give precise advice to a manager who wants to use a tournament to solve a problem or make a decision. Furthermore, the only empirical basis we have for such advice is the excellent study by Boudreau and colleagues, but theirs is not a study of innovation tournaments; it’s a study of a contest with well-defined, known best answers.

Our best advice is to restrict participation to fewer competitors if the problem is routine—meaning that everyone agrees that a solution is close at hand. But for truly challenging problems, the kinds of problems that may not be solvable with present methods, open the competition up widely. This advice is somewhat different from the claim that it is always best to seek the highest participation levels possible—a claim sometimes made in work on crowdsourcing innovation.5

In the design of a tournament, the most important task is the careful specification of the objectives that must be optimized by the winning solution. The more precise the objectives, the more likely that the people best able to achieve remarkable solutions will be drawn to participate. Clarity about the criteria for winning may be even more important than the number of competitors or the prize payoff structure.

Deciding on the Nature of the Prize

What is the best policy for assigning prizes for solutions? In most current innovation tournaments, the rule is that only one competitor can win—winner takes all. But this rule does not accord with practices in analogous tournaments in, for example, professional sports, where there is prize money for several top performers (although the amounts drop off sharply from the first-place payoff).

Economic theories of incentives in tournaments usually prescribe multiple payoffs, with the exact distribution contingent on players' skill levels and risk attitudes. In innovation tournaments, which are not as numerous as sports tournaments (or TopCoder contests), considerable motivation can be produced by ancillary benefits, like reputation in the professional community, promotions within a company, and other payoffs from career signaling. We think that offering a few prizes, and not just one, is the way to go. The offer of more than a single prize tends to increase incentives by broadening the class of winners, without losing anything significant. So make it a winner-takes-a-lot contest; we recommend against winner-takes-all.

Government by Tournament

In recent years, the US government has noticed the extraordinary potential of tournaments and prizes, and it has taken a number of steps to promote them. Let’s take a look at the enthusiastic words of the government itself:

For example, Defense Advanced Research Projects Agency’s “grand challenges” have advanced the state of the art of robotic cars that drive themselves. The National Aeronautics and Space Administration’s (NASA) Centennial Challenges have triggered an outpouring of creative solutions from students, citizen inventors, and entrepreneurial firms for technologies such as lunar landers, space elevators, fuel-efficient aircraft, and astronaut gloves. The Department of Energy has sponsored the L Prize, designed to spur the development of high-quality, highly efficient solid-state lighting products to replace today’s inefficient light bulbs. The Environmental Protection Agency has used prizes to encourage students and others to develop videos to promote environmental stewardship. The Department of State has sponsored highly successful video and writing contests that have attracted contestants from a broad diversity of countries and geographic regions, and that have furthered the U.S. public diplomacy mission.6

Formal guidance from the Obama administration shows a lot of enthusiasm for the use of tournaments and offers a series of requirements designed to ensure that when agencies use tournaments, they do so in a manner consistent with law.7 The many successes to date suggest that there is a great deal of potential for groups within government to enlist tournaments to produce innovative solutions.

In September 2010, the Obama administration launched Challenge.gov, an online listing of incentive prizes offered by federal government agencies.8 Four months later, President Obama signed into law the America COMPETES Reauthorization Act, which provides clear legal authority for all agencies to hold prize competitions to spur innovation.9 Within its first two years, the site listed challenges from forty-five agencies and engaged more than sixteen thousand participants.10 The challenges spanned the breadth of government programs, including a $10,000 State Department prize for new ideas for implementing arms control and nonproliferation treaties and a $160,000 prize from the EPA and the Department of Health and Human Services for portable systems to monitor and report on local air quality.11

The Challenge.gov program and America COMPETES Reauthorization Act have led to a number of evident success stories, including the creation of a consumer-friendly web portal for the National Cancer Institute—a portal that provides prospective patients with information about clinical trials for cancer and other diseases.12 The Air Force Research Lab used the results of a competition to create a prototype system for air-dropping large humanitarian-aid packages of food and water into populated areas without damaging the packages or injuring bystanders.13

For our purposes, the most informative example of an innovation tournament within government was conducted by the Office of the Director of National Intelligence through the Intelligence Advanced Research Projects Activity (IARPA). The tournament was called the Aggregative Contingent Estimation Program (or IARPA-ACE). The overarching objective was to invent methods to improve intelligence analysis and forecasts in a verifiable manner. As with the Netflix Prize, specific ground rules, one precise metric for assessing success, and some benchmark performance standards were provided to define the contest.

Five university-industry teams were selected, and each team was given approximately $1 million for the first year of the contest. A sixth organization—the “umpires”—was paid to supervise the tournament, to design a level playing field of about one hundred individual forecasting problems per year, and to track the overall performance scores of the research teams. All such teams were required to score their performance on a single metric—the same one used to evaluate weather forecasters. And once a year, each team had to submit one best procedure to the umpires, who would test all five methods with a sample of new forecasters on one set of identical test questions.
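
The weather forecasters' metric in question is the Brier score: the mean squared difference between each forecast probability and the actual outcome (1 if the event occurred, 0 if it did not), with lower scores being better. A minimal sketch, with invented forecasts:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and
    outcomes (1 = event occurred, 0 = it did not). Lower is better."""
    return sum((f - o) ** 2
               for f, o in zip(forecasts, outcomes)) / len(outcomes)

# Three invented questions: the events occurred, did not, occurred.
outcomes = [1, 0, 1]

cautious  = [0.6, 0.4, 0.6]   # hedged, near-50/50 forecasts
confident = [0.9, 0.1, 0.8]   # sharper, well-calibrated forecasts

print(brier_score(cautious, outcomes))    # ≈ 0.16
print(brier_score(confident, outcomes))   # ≈ 0.02
```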

In this tournament, the ultimate prizes were the continued funding and professional acclaim from discovering high-performing methods. These kinds of incentives have the advantage of allowing the sponsors at IARPA to require the teams to share their best results in a timely manner. Annual meetings and written progress reports were shared among all the competitors so that each team could learn from the others' progress as well as its own. Full disclosure of each team's discoveries was guaranteed, because these reports determined whether IARPA would continue to fund that team's participation in the tournament.

Methods of this kind will generally be useful for solving many business and government estimation and strategy problems, as the individual forecasting problems were representative of the kinds of forecasts made in many domains of human activity. The questions included these: Will Bashar al-Assad still be the president of Syria six months from today? Will Greece leave the European Union sometime before the end of this calendar year? Will a North Korean military force fire on, or invade, South Korea in the next six months? Will Vladimir Putin be reelected as president of Russia?

Different research teams tended to specialize in distinct approaches to improving these forecasts. One group focused on statistical weighting and other mathematical adjustments to integrate individual estimates; one focused on prediction markets (see chapter 11); one focused on the Delphi method (see chapter 6); one on forecasters who communicated socially via chat rooms; and one on leveraging the independence and diversity of the individual estimates.

A lot has been discovered so far by the teams competing in this tournament. Averaging is good, as per our comments on the wisdom of crowds. Weighting individual estimates by each forecaster's objective past record of accuracy on similar judgments is better; and additional improvements can be obtained by averaging updated beliefs with prior estimates, weighting the most recent estimates more heavily than those from the distant past.
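
Here is a minimal sketch of those two aggregation ideas; the estimates, the track-record weights, and the decay rate are all our illustrative assumptions.

```python
# A minimal sketch of the two aggregation ideas just described:
# (1) weight each forecaster by an objective record of past accuracy,
# and (2) weight a forecaster's recent estimates more than older ones.
# All numbers here are illustrative assumptions.

def accuracy_weighted_average(estimates, track_records):
    """Weight each forecaster's estimate by an objective score of
    that forecaster's past accuracy on similar judgments."""
    total = sum(track_records)
    return sum(e * w for e, w in zip(estimates, track_records)) / total

def recency_weighted_update(history, decay=0.5):
    """Average one forecaster's estimates over time, discounting older
    ones: an estimate k steps old gets weight decay ** k."""
    weights = [decay ** k for k in range(len(history) - 1, -1, -1)]
    return sum(e * w for e, w in zip(history, weights)) / sum(weights)

# Three forecasters estimate a probability; the second has the best record.
estimates = [0.40, 0.70, 0.55]
track_records = [0.5, 1.0, 0.7]   # higher = more accurate in the past
print(accuracy_weighted_average(estimates, track_records))  # ≈ 0.58

# One forecaster's updated beliefs over time, oldest first.
print(recency_weighted_update([0.30, 0.50, 0.80]))  # ≈ 0.64, tilted recent
```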

By contrast, subjective weights for confidence or perceived relative expertise, by the estimators or by their peers, notably do not improve the accuracy of the weighted average estimates. Both prediction markets and Delphi methods produced improvements over the performance of typical individuals. Somewhat surprisingly, the most successful judgment aid was a good old-fashioned chat room with an unregulated discussion of current forecast problems (pursued by the Good Judgment Team, led by Barbara Mellers and Philip Tetlock at the University of Pennsylvania).

As we see it, the early results from the forecasting tournament are reassuring insofar as they tend to reinforce the principal lessons we have presented here. But in one respect, we are a little disappointed. To date, the tournament has mostly produced methods that improve decisions by removing some of the usual noise and bias from a forecast. What we really want to see are game-changing methods that don't just reduce noise, but also amplify the signal. As yet, the tournament has not yielded such benefits, which are likely to derive from more effective information pooling.

A lot remains to be learned, but carefully conducted face-to-face or structured deliberations might be effective in this regard. For example, an enhanced Delphi method, in which forecasters or decision makers are trained to pool unshared information, thereby escaping hidden profiles and the common-knowledge trap, could lead to breakthroughs, not just to incremental improvements. A lot more work would be necessary to demonstrate that our conjecture is correct. But in this context at least, there is little harm in trying.

The Forest

The topic of tournaments is fascinating and important, and it deserves a whole book of its own. But let’s focus on the forest and not get lost in the trees.

First, tournaments are an excellent way to reduce group failures, because they can overcome the problems explored in the first part of this book. Second, many institutions, both small and large, should be experimenting with tournaments to see if they can spark creativity. And finally, it often makes sense to go outside your own institution and to create incentives to get a little help, and maybe a lot of it, from outsiders. If you ask the whole world, you might be surprised and excited by what you learn.
