CHAPTER 3

Further Solution Concepts for Normal-Form Games

As described at the beginning of Chapter 2, we reason about multiplayer games using solution concepts, principles according to which we identify interesting subsets of the outcomes of a game. While the most important solution concept is the Nash equilibrium, there are also a large number of others, only some of which we will discuss here. Some of these concepts are more restrictive than the Nash equilibrium, some are less restrictive, and some are noncomparable with it. In subsequent chapters we will introduce some additional solution concepts that are only applicable to game representations other than the normal form.

3.1 MAXMIN AND MINMAX STRATEGIES

The maxmin strategy of player i in an n-player, general-sum game is a (not necessarily unique, and in general mixed) strategy that maximizes i’s worst-case payoff, in the situation where all the other players happen to play the strategies which cause the greatest harm to i. The maxmin value (or security level) of the game for player i is that minimum amount of payoff guaranteed by a maxmin strategy.

Definition 3.1.1 (Maxmin). The maxmin strategy for player i is $\arg\max_{s_i} \min_{s_{-i}} u_i(s_i, s_{-i})$, and the maxmin value for player i is $\max_{s_i} \min_{s_{-i}} u_i(s_i, s_{-i})$.

Although the maxmin strategy is a concept that makes sense in simultaneous-move games, it can be understood through the following temporal intuition. The maxmin strategy is i’s best choice when first i must commit to a (possibly mixed) strategy, and then the remaining agents −i observe this strategy (but not i’s action choice) and choose their own strategies to minimize i’s expected payoff. In the Battle of the Sexes game (Figure 1.6), the maxmin value for either player is 2/3, and requires the maximizing agent to play a mixed strategy. (Do you see why?)

While it may not seem reasonable to assume that the other agents would be solely interested in minimizing i’s utility, it is the case that if i plays a maxmin strategy and the other agents play arbitrarily, i will still receive an expected payoff of at least his maxmin value. This means that the maxmin strategy is a sensible choice for a conservative agent who wants to maximize his expected utility without having to make any assumptions about the other agents, such as that they will act rationally according to their own interests, or that they will draw their action choices from some known distributions.

The minmax strategy and minmax value play a dual role to their maxmin counterparts. In two-player games the minmax strategy for player i against player −i is a strategy that keeps the maximum payoff of −i at a minimum, and the minmax value of player −i is that minimum. This is useful when we want to consider the amount that one player can punish another without regard for his own payoff. Such punishment can arise in repeated games, as we will see in Chapter 6. The formal definitions follow.

Definition 3.1.2 (Minmax, two-player). In a two-player game, the minmax strategy for player i against player −i is $\arg\min_{s_i} \max_{s_{-i}} u_{-i}(s_i, s_{-i})$, and player −i’s minmax value is $\min_{s_i} \max_{s_{-i}} u_{-i}(s_i, s_{-i})$.

In n-player games with n > 2, defining player i’s minmax strategy against player j is a bit more complicated. This is because i will not usually be able to guarantee that j achieves minimal payoff by acting unilaterally. However, if we assume that all the players other than j choose to “gang up” on j—and that they are able to coordinate appropriately when there is more than one strategy profile that would yield the same minimal payoff for j—then we can define minmax strategies for the n-player case.

Definition 3.1.3 (Minmax, n-player). In an n-player game, the minmax strategy for player i against player j ≠ i is i’s component of the mixed-strategy profile $s_{-j}$ in the expression $\arg\min_{s_{-j}} \max_{s_j} u_j(s_j, s_{-j})$, where −j denotes the set of players other than j. As before, the minmax value for player j is $\min_{s_{-j}} \max_{s_j} u_j(s_j, s_{-j})$.

As before, we can give intuition for the minmax value through a temporal setting. Imagine that the agents −i must commit to a (possibly mixed) strategy profile, to which i can then play a best response. Player i receives his minmax value if players −i choose their strategies in order to minimize i’s expected utility after he plays his best response.

In two-player games, a player’s minmax value is always equal to his maxmin value. For games with more than two players a weaker condition holds: a player’s maxmin value is always less than or equal to his minmax value. (Can you explain why this is?)

Since neither an agent’s maxmin strategy nor his minmax strategy depend on the strategies that the other agents actually choose, the maxmin and minmax strategies give rise to solution concepts in a straightforward way. We will call a mixed-strategy profile s = (s1, s2, …) a maxmin strategy profile of a given game if s1 is a maxmin strategy for player 1, s2 is a maxmin strategy for player 2 and so on. In two-player games, we can also define minmax strategy profiles analogously. In two-player, zero-sum games, there is a very tight connection between minmax and maxmin strategy profiles. Furthermore, these solution concepts are also linked to the Nash equilibrium.

Theorem 3.1.4 (Minimax theorem (von Neumann, 1928)). In any finite, two-player, zero-sum game, in any Nash equilibrium1 each player receives a payoff that is equal to both his maxmin value and his minmax value.

Why is the minmax theorem important? It demonstrates that maxmin strategies, minmax strategies and Nash equilibria coincide in two-player, zero-sum games. In particular, Theorem 3.1.4 allows us to conclude that in two-player, zero-sum games:

1. Each player’s maxmin value is equal to his minmax value. By convention, the maxmin value for player 1 is called the value of the game;

2. For both players, the set of maxmin strategies coincides with the set of minmax strategies; and

3. Any maxmin strategy profile (or, equivalently, minmax strategy profile) is a Nash equilibrium. Furthermore, these are all the Nash equilibria. Consequently, all Nash equilibria have the same payoff vector (namely, those in which player 1 gets the value of the game).

For example, in the Matching Pennies game in Figure 1.4, the value of the game is 0. The unique Nash equilibrium consists of both players randomizing between heads and tails with equal probability, which is both the maxmin strategy and the minmax strategy for each player.
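
A player’s maxmin strategy and the value of a zero-sum game can be computed by linear programming: because the inner minimization is always attained at some pure strategy of the opponent, maximizing the worst-case payoff reduces to finitely many linear constraints. Below is a minimal sketch assuming NumPy and SciPy are available (the function name maxmin is ours, not from the text); it reproduces the value 0 for Matching Pennies, and also the maxmin value of 2/3 for Battle of the Sexes claimed in Section 3.1.

```python
import numpy as np
from scipy.optimize import linprog

def maxmin(u):
    """Maxmin (security-level) strategy and value for the row player,
    given her payoff matrix u.  We maximize v subject to the mixed
    strategy p guaranteeing at least v against every opponent column:
        p @ u[:, j] >= v  for all j,   sum(p) = 1,   p >= 0.
    """
    n, m = u.shape
    c = np.zeros(n + 1); c[-1] = -1.0                 # maximize v
    A_ub = np.hstack([-u.T, np.ones((m, 1))])          # v - p @ u[:, j] <= 0
    b_ub = np.zeros(m)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)   # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:-1], -res.fun

# Matching Pennies: the value of the game is 0, at p = (1/2, 1/2).
print(maxmin(np.array([[1.0, -1.0], [-1.0, 1.0]])))
# Battle of the Sexes (player 1's payoffs): maxmin value 2/3, at p = (1/3, 2/3).
print(maxmin(np.array([[2.0, 0.0], [0.0, 1.0]])))
```

For the zero-sum Matching Pennies game the minimax theorem guarantees that the computed strategy is also an equilibrium strategy; for the general-sum Battle of the Sexes the same program yields only the player’s security level.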

Nash equilibria in zero-sum games can be viewed graphically as a “saddle” in a high-dimensional space. At a saddle point, any deviation of the agent lowers his utility and increases the utility of the other agent. It is easy to visualize in the simple case in which each agent has two pure strategies. In this case the space of strategy profiles can be viewed as all points on the square between (0,0) and (1,1), with each point in the square describing the mixed strategy of each agent. The payoff to player 1 (and thus the negative of the payoff to player 2) is indeed a saddle-shaped, three-dimensional surface above this square. Figure 3.1 (left) gives a pictorial example, illustrating player 1’s expected utility in Matching Pennies as a function of both players’ probabilities of playing heads. Figure 3.1 (right) adds a plane at z = 0 to make it easier to see that it is an equilibrium for both players to play heads 50% of the time and that zero is both the maxmin value and the minmax value for both players.


FIGURE 3.1: The saddle point in Matching Pennies, with and without a plane at z = 0.

3.2 MINIMAX REGRET

We argued earlier that agents might play maxmin strategies in order to achieve good payoffs in the worst case, even in a game that is not zero sum. However, consider a setting in which the other agent is not believed to be malicious, but is instead believed to be entirely unpredictable. (Crucially, in this section we do not approach the problem as Bayesians, saying that agent i’s beliefs can be described by a probability distribution; instead, we use a “pre-Bayesian” model in which i does not know such a distribution and indeed has no beliefs about it.) In such a setting, it can also make sense for agents to care about minimizing their worst-case loss, rather than simply maximizing their worst-case payoff.

Consider the game in Figure 3.2. Interpret the payoff numbers as pertaining to agent 1 only and let ε be an arbitrarily small positive constant. For this example it does not matter what agent 2’s payoffs a, b, c, and d are, and we can even imagine that agent 1 does not know these values. Indeed, this could be one reason why player 1 would be unable to form beliefs about how player 2 would play, even if he were to believe that player 2 was rational. Let us imagine that agent 1 wants to determine a strategy to follow that makes sense despite his uncertainty about player 2. First, agent 1 might play his maxmin, or “safety level” strategy. In this game it is easy to see that player 1’s maxmin strategy is to play B; this is because player 2’s minmax strategy is to play R, and B is a best response to R.

If player 1 does not believe that player 2 is malicious, however, he might instead reason as follows. If player 2 were to play R then it would not matter very much how player 1 plays: the most he could lose by playing the wrong way would be ε. On the other hand, if player 2 were to play L then player 1’s action would be very significant: if player 1 were to make the wrong choice here then his utility would be decreased by 98. Thus player 1 might choose to play T in order to minimize his worst-case loss. Observe that this is the opposite of what he would choose if he followed his maxmin strategy.

            L           R
    T    100, a     1 − ε, b
    B      2, c         1, d

FIGURE 3.2: A game for contrasting maxmin with minimax regret. The numbers refer only to player 1’s payoffs; ε is an arbitrarily small positive constant. Player 2’s payoffs are the arbitrary (and possibly unknown) constants a, b, c, and d.

Let us now formalize this idea. We begin with the notion of regret.

Definition 3.2.1 (Regret). An agent i’s regret for playing an action $a_i$ if the other agents adopt action profile $a_{-i}$ is defined as

$$\Big[\max_{a'_i \in A_i} u_i(a'_i, a_{-i})\Big] - u_i(a_i, a_{-i})$$

In words, this is the amount that i loses by playing $a_i$, rather than playing his best response to $a_{-i}$. Of course, i does not know what actions the other players will take; however, he can consider those actions that would give him the highest regret for playing $a_i$.

Definition 3.2.2 (Max regret). An agent i’s maximum regret for playing an action $a_i$ is defined as

$$\max_{a_{-i} \in A_{-i}} \Big(\Big[\max_{a'_i \in A_i} u_i(a'_i, a_{-i})\Big] - u_i(a_i, a_{-i})\Big)$$

This is the amount that i loses by playing $a_i$ rather than playing his best response to $a_{-i}$, if the other agents chose the $a_{-i}$ that makes this loss as large as possible. Finally, i can choose his action in order to minimize this worst-case regret.

Definition 3.2.3 (Minimax regret). Minimax regret actions for agent i are defined as

$$\arg\min_{a_i \in A_i} \max_{a_{-i} \in A_{-i}} \Big(\Big[\max_{a'_i \in A_i} u_i(a'_i, a_{-i})\Big] - u_i(a_i, a_{-i})\Big)$$

Thus, an agent’s minimax regret action is an action that yields the smallest maximum regret. Minimax regret can be extended to a solution concept in the natural way, by identifying action profiles that consist of minimax regret actions for each player. Note that we can safely restrict ourselves to actions rather than mixed strategies in the definitions above (i.e., ranging over the sets $A_i$ and $A_{-i}$ instead of $S_i$ and $S_{-i}$), because of the linearity of expectation. We leave the proof of this fact as an exercise.
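
Because pure strategies suffice, regret computations reduce to simple matrix arithmetic. The following sketch assumes NumPy (the function name is ours); it computes the minimax regret actions for a player whose payoffs are given with his own actions as rows, and checks the analysis of Figure 3.2 with ε = 0.01.

```python
import numpy as np

def minimax_regret_actions(u):
    """Minimax regret actions for a player whose payoff matrix u has his
    own actions as rows and the other agents' action profiles as columns."""
    regret = u.max(axis=0, keepdims=True) - u   # regret of each action vs. each column
    max_regret = regret.max(axis=1)             # worst case over the others' actions
    return np.flatnonzero(max_regret == max_regret.min()), max_regret

# Player 1's payoffs in the game of Figure 3.2 (rows T, B; columns L, R).
eps = 0.01
u1 = np.array([[100.0, 1.0 - eps],
               [2.0,   1.0]])
actions, mr = minimax_regret_actions(u1)
print(actions, mr)  # T (index 0) has maximum regret eps; B has 98, so T is chosen
```

As argued above, the minimax regret action T is exactly the opposite of the maxmin choice B.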

3.3 REMOVAL OF DOMINATED STRATEGIES

We first define what it means for one strategy to dominate another. Intuitively, one strategy dominates another for a player i if the first strategy yields i a greater payoff than the second strategy, for any strategy profile of the remaining players.2 There are, however, three gradations of dominance, which are captured in the following definition.

Definition 3.3.1 (Domination). Let $s_i$ and $s'_i$ be two strategies of player i, and $S_{-i}$ the set of all strategy profiles of the remaining players. Then

1. $s_i$ strictly dominates $s'_i$ if for all $s_{-i} \in S_{-i}$, it is the case that $u_i(s_i, s_{-i}) > u_i(s'_i, s_{-i})$.

2. $s_i$ weakly dominates $s'_i$ if for all $s_{-i} \in S_{-i}$, it is the case that $u_i(s_i, s_{-i}) \ge u_i(s'_i, s_{-i})$, and for at least one $s_{-i} \in S_{-i}$, it is the case that $u_i(s_i, s_{-i}) > u_i(s'_i, s_{-i})$.

3. $s_i$ very weakly dominates $s'_i$ if for all $s_{-i} \in S_{-i}$, it is the case that $u_i(s_i, s_{-i}) \ge u_i(s'_i, s_{-i})$.

If one strategy dominates all others, we say that it is (strictly, weakly, or very weakly) dominant.

Definition 3.3.2 (Dominant strategy). A strategy is strictly (resp., weakly; very weakly) dominant for an agent if it strictly (weakly; very weakly) dominates any other strategy for that agent.

It is obvious that a strategy profile (s1, …, sn) in which every si is dominant for player i (whether strictly, weakly, or very weakly) is a Nash equilibrium. Such a strategy profile forms what is called an equilibrium in dominant strategies with the appropriate modifier (strictly, etc.). An equilibrium in strictly dominant strategies is necessarily the unique Nash equilibrium. For example, consider again the Prisoner’s Dilemma game. For each player, the strategy D is strictly dominant, and indeed (D, D) is the unique Nash equilibrium. Indeed, we can now explain the “dilemma” which is particularly troubling about the Prisoner’s Dilemma game: the outcome reached in the unique equilibrium, which is an equilibrium in strictly dominant strategies, is also the only outcome that is not Pareto optimal.

           L       C       R
    U    3, 1    0, 1    0, 0
    M    1, 1    1, 1    5, 0
    D    0, 1    4, 1    0, 0

FIGURE 3.3: A game with dominated strategies.

Games with dominant strategies play an important role in game theory, especially in games handcrafted by experts. This is true in particular in mechanism design, an area of game theory not covered in this booklet. However, dominant strategies are rare in naturally occurring games. More common are dominated strategies.

Definition 3.3.3 (Dominated strategy). A strategy si is strictly (weakly; very weakly) dominated for an agent i if some other strategy s′i strictly (weakly; very weakly) dominates si.

Let us focus for the moment on strictly dominated strategies. Intuitively, all strictly dominated pure strategies can be ignored, since they can never be best responses to any moves by the other players. There are several subtleties, however. First, once a pure strategy is eliminated, another strategy that was not dominated can become dominated. And so this process of elimination can be continued. Second, a pure strategy may be dominated by a mixture of other pure strategies without being dominated by any of them independently. To see this, consider the game in Figure 3.3.

Column R can be eliminated, since it is dominated by, for example, column L. We are left with the reduced game in Figure 3.4.

In this game M is dominated by neither U nor D, but it is dominated by the mixed strategy that selects either U or D with equal probability. (Note, however, that it was not dominated before the elimination of the R column.) And so we are left with the maximally reduced game in Figure 3.5.

This yields us a solution concept: the set of all strategy profiles that assign zero probability to playing any action that would be removed through iterated removal of strictly dominated strategies. Note that this is a much weaker solution concept than Nash equilibrium—the set of strategy profiles will include all the Nash equilibria, but it will include many other mixed strategies as well. In some games, it will be equal to S, the set of all possible mixed strategies.

           L       C
    U    3, 1    0, 1
    M    1, 1    1, 1
    D    0, 1    4, 1

FIGURE 3.4: The game from Figure 3.3 after removing the dominated strategy R.

Since iterated removal of strictly dominated strategies preserves Nash equilibria, we can use this technique to computational advantage. In the previous example, rather than computing the Nash equilibria in the original 3 × 3 game, we can now compute them in this 2 × 2 game, applying the technique described earlier. In some cases, the procedure ends with a single cell; this is the case, for example, with the Prisoner’s Dilemma game. In this case we say that the game is solvable by iterated elimination.
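
The following sketch makes the procedure concrete for two-player games, assuming NumPy and SciPy (all function names are ours, and the payoff matrices for Figure 3.3 are as reconstructed above). The test for domination by a mixture is a small linear program: a row is strictly dominated if and only if some mixture of the other rows beats it by a positive margin in every column. Run on the game of Figure 3.3, the procedure eliminates R and then M.

```python
import numpy as np
from scipy.optimize import linprog

def dominated_by_mixture(U, k):
    """True if row k of payoff matrix U (own actions x opponent actions) is
    strictly dominated by some mixture of the other rows.  We maximize the
    worst-column margin eps; domination holds iff the optimum is positive."""
    others = np.delete(U, k, axis=0)
    n, m = others.shape
    c = np.zeros(n + 1); c[-1] = -1.0                  # maximize eps
    A_ub = np.hstack([-others.T, np.ones((m, 1))])     # eps <= p @ others[:, j] - U[k, j]
    b_ub = -U[k, :]
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.success and -res.fun > 1e-9

def iesds(u1, u2):
    """Iterated elimination of strictly dominated strategies in a two-player
    game; returns the indices of the surviving rows and columns."""
    rows, cols = list(range(u1.shape[0])), list(range(u1.shape[1]))
    changed = True
    while changed:
        changed = False
        for k in range(len(rows)):
            if len(rows) > 1 and dominated_by_mixture(u1[np.ix_(rows, cols)], k):
                rows.pop(k); changed = True; break
        if changed:
            continue
        for k in range(len(cols)):
            if len(cols) > 1 and dominated_by_mixture(u2[np.ix_(rows, cols)].T, k):
                cols.pop(k); changed = True; break
    return rows, cols

# Figure 3.3 as reconstructed above (the payoffs are our assumption).
u1 = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 5.0], [0.0, 4.0, 0.0]])
u2 = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
print(iesds(u1, u2))  # ([0, 2], [0, 1]): U and D vs. L and C survive
```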

Clearly, in any finite game, iterated elimination ends after a finite number of iterations. One might worry that, in general, the order of elimination might affect the final outcome. It turns out that this elimination order does not matter when we remove strictly dominated strategies. (This is called a Church–Rosser property.) However, the elimination order can make a difference to the final reduced game if we remove weakly or very weakly dominated strategies.

Which flavor of domination should we concern ourselves with? In fact, each flavor has advantages and disadvantages, which is why we present all of them here. Strict domination leads to better-behaved iterated elimination: it yields a reduced game which is independent of the elimination order, and iterated elimination is more computationally manageable. There is also a further related advantage that we will defer to Section 3.4. Weak domination can yield smaller reduced games, but under iterated elimination the reduced game can depend on the elimination order. Very weak domination can yield even smaller reduced games, but again these reduced games depend on elimination order. Furthermore, very weak domination does not impose a strict order on strategies: when two strategies are equivalent, each very weakly dominates the other. For this reason, this last form of domination is generally considered the least important.

           L       C
    U    3, 1    0, 1
    D    0, 1    4, 1

FIGURE 3.5: The game from Figure 3.4 after removing the dominated strategy M.

3.4 RATIONALIZABILITY

A strategy is rationalizable if a perfectly rational player could justifiably play it against one or more perfectly rational opponents. Informally, a strategy for player i is rationalizable if it is a best response to some beliefs that i could have about the strategies that the other players will take. The wrinkle, however, is that i cannot have arbitrary beliefs about the other players’ actions—his beliefs must take into account his knowledge of their rationality, which incorporates their knowledge of his rationality, their knowledge of his knowledge of their rationality, and so on in an infinite regress. A rationalizable strategy profile is a strategy profile that consists only of rationalizable strategies.

For example, in the Matching Pennies game given in Figure 1.4, the pure strategy heads is rationalizable for the row player. First, the strategy heads is a best response to the pure strategy heads by the column player. Second, believing that the column player would also play heads is consistent with the column player’s rationality: the column player could believe that the row player would play tails, to which the column player’s best response is heads. It would be rational for the column player to believe that the row player would play tails because the column player could believe that the row player believed that the column player would play tails, to which tails is a best response. Arguing in the same way, we can make our way up the chain of beliefs.

However, not every strategy can be justified in this way. For example, considering the Prisoner’s Dilemma game given in Figure 1.1, the strategy C is not rationalizable for the row player, because C is not a best response to any strategy that the column player could play. Similarly, consider the game from Figure 3.3. M is not a rationalizable strategy for the row player: although it is a best response to a strategy of the column player’s (R), there do not exist any beliefs that the column player could hold about the row player’s strategy to which R would be a best response.

Because of the infinite regress, the formal definition of rationalizability is somewhat involved; however, it turns out that there are some intuitive things that we can say about rationalizable strategies. First, Nash equilibrium strategies are always rationalizable: thus, the set of rationalizable strategies (and strategy profiles) is always nonempty. Second, in two-player games rationalizable strategies have a simple characterization: they are those strategies that survive the iterated elimination of strictly dominated strategies. In n-player games there exist strategies which survive iterated removal of dominated strategies but are not rationalizable. In this more general case, rationalizable strategies are those strategies which survive iterated removal of strategies that are never a best response to any strategy profile by the other players.

We now define rationalizability more formally. First we will define an infinite sequence of sets of (possibly mixed) strategies $S_i^0, S_i^1, S_i^2, \ldots$ for each player i. Let $S_i^0 = S_i$; thus, for each agent i, the first element in the sequence is the set of all i’s mixed strategies. Let $CH(S)$ denote the convex hull of a set S: the smallest convex set containing all the elements of S. Now we define $S_i^k$ as the set of all strategies $s_i \in S_i^{k-1}$ for which there exists some $s_{-i} \in \prod_{j \neq i} CH(S_j^{k-1})$ such that for all $s'_i \in S_i^{k-1}$, $u_i(s_i, s_{-i}) \ge u_i(s'_i, s_{-i})$. That is, a strategy belongs to $S_i^k$ if there is some strategy $s_{-i}$ for the other players in response to which $s_i$ is at least as good as any other strategy from $S_i^{k-1}$. The convex hull operation allows i to best respond to uncertain beliefs about which strategies from $S_j^{k-1}$ another player j will adopt. $CH(S_j^{k-1})$ is used instead of $\Pi(S_j^{k-1})$, the set of all probability distributions over $S_j^{k-1}$, because the latter would allow consideration of mixed strategies that are dominated by some pure strategies for j. Player i could not believe that j would play such a strategy because such a belief would be inconsistent with i’s knowledge of j’s rationality.

Now we define the set of rationalizable strategies for player i as the intersection of the sets $S_i^0, S_i^1, S_i^2, \ldots$.

Definition 3.4.1 (Rationalizable strategies). The rationalizable strategies for player i are $\bigcap_{k=0}^{\infty} S_i^k$.

3.5 CORRELATED EQUILIBRIUM

The correlated equilibrium is a solution concept which generalizes the Nash equilibrium. Some people feel that this is the most fundamental solution concept of all.3

In a standard game, each player mixes his pure strategies independently. For example, consider again the Battle of the Sexes game (reproduced here as Figure 3.6) and its mixed-strategy equilibrium.

As we saw in Section 2.3, this game’s unique mixed-strategy equilibrium yields each player an expected payoff of 2/3. But now imagine that the two players can observe the result of a fair coin flip and can condition their strategies based on that outcome. They can now adopt strategies from a richer set; for example, they could choose “WL if heads, LW if tails.” Indeed, this pair forms an equilibrium in this richer strategy space; given that one player plays the strategy, the other player only loses by adopting another. Furthermore, the expected payoff to each player in this so-called correlated equilibrium is .5 * 2 + .5 * 1 = 1.5. Thus both agents receive higher utility than they do under the mixed-strategy equilibrium in the uncorrelated case (which had expected payoff of 2/3 for both agents), and the outcome is fairer than either of the pure-strategy equilibria in the sense that the worst-off player achieves higher expected utility. Correlating devices can thus be quite useful.

            WL       LW
    WL    2, 1     0, 0
    LW    0, 0     1, 2

FIGURE 3.6: Battle of the Sexes game.

The aforementioned example had both players observe the exact outcome of the coin flip, but the general setting does not require this. Generally, the setting includes some random variable (the “external event”) with a commonly known probability distribution, and a private signal to each player about the instantiation of the random variable. A player’s signal can be correlated with the random variable’s value and with the signals received by other players, without uniquely identifying any of them. Standard games can be viewed as the degenerate case in which the signals of the different agents are probabilistically independent.

To model this formally, consider n random variables, with a joint distribution over these variables. Imagine that nature chooses according to this distribution, but reveals to each agent only the realized value of his variable, and that the agent can condition his action on this value.4

Definition 3.5.1 (Correlated equilibrium). Given an n-agent game G = (N, A, u), a correlated equilibrium is a tuple (v, π, σ), where v is a tuple of random variables v = (v1, …, vn) with respective domains D = (D1, …, Dn), π is a joint distribution over v, σ = (σ1, …, σn) is a vector of mappings σi : Di → Ai, and for each agent i and every mapping σ′i : Di → Ai it is the case that

$$\sum_{d \in D} \pi(d)\, u_i\big(\sigma_1(d_1), \ldots, \sigma_n(d_n)\big) \;\ge\; \sum_{d \in D} \pi(d)\, u_i\big(\sigma'_i(d_i), \sigma_{-i}(d_{-i})\big)$$

Note that the mapping is to an action, that is to a pure strategy rather than a mixed one. One could allow a mapping to mixed strategies, but that would add no greater generality. (Do you see why?)

Clearly, for every Nash equilibrium, we can construct an equivalent correlated equilibrium, in the sense that they induce the same distribution on outcomes.

Theorem 3.5.2. For every Nash equilibrium σ* there exists a corresponding correlated equilibrium σ.

The proof is straightforward. Roughly, we can construct a correlated equilibrium from a given Nash equilibrium by letting each Di = Ai and letting the joint probability distribution be $\pi(d) = \prod_{i \in N} \sigma^*_i(d_i)$. Then we choose σi as the mapping from each di to the corresponding ai. When the agents play the strategy profile σ, the distribution over outcomes is identical to that under σ*. Because the vi’s are uncorrelated and no agent can benefit by deviating from σ*, σ is a correlated equilibrium.

On the other hand, not every correlated equilibrium is equivalent to a Nash equilibrium; the Battle-of-the-Sexes example given earlier provides a counter-example. Thus, correlated equilibrium is a strictly weaker notion than Nash equilibrium.
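
The conditions of Definition 3.5.1 become linear in the probabilities if we take each Di = Ai and each σi to be the identity mapping, that is, if we search directly for a joint distribution over action profiles under which no agent prefers to deviate from his recommended action. Correlated equilibria can therefore be found by linear programming. Below is a minimal sketch for Battle of the Sexes, assuming NumPy and SciPy; the choice to maximize the sum of payoffs is ours, and any linear objective over the correlated-equilibrium polytope would work.

```python
import numpy as np
from scipy.optimize import linprog

# Battle of the Sexes (Figure 3.6); rows/columns indexed as (WL, LW).
u1 = np.array([[2.0, 0.0], [0.0, 1.0]])
u2 = np.array([[1.0, 0.0], [0.0, 2.0]])

# Variables: the four probabilities p(a1, a2), flattened row-major.
# p is a correlated equilibrium iff, whenever agent i is told to play a,
# obeying is at least as good as switching to any a':
#   sum_{a2} p(a, a2) * (u1[a', a2] - u1[a, a2]) <= 0, and symmetrically.
constraints = []
for a in range(2):
    for a_prime in range(2):
        if a == a_prime:
            continue
        r = np.zeros((2, 2))
        r[a, :] = u1[a_prime, :] - u1[a, :]   # player 1's incentive constraint
        constraints.append(r.ravel())
        c = np.zeros((2, 2))
        c[:, a] = u2[:, a_prime] - u2[:, a]   # player 2's incentive constraint
        constraints.append(c.ravel())
A_ub = np.array(constraints)
b_ub = np.zeros(len(constraints))
A_eq = np.ones((1, 4))                        # probabilities sum to one
res = linprog(-(u1 + u2).ravel(),             # maximize total payoff
              A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * 4)
print(res.x.reshape(2, 2), -res.fun)          # total payoff 3, mass on the diagonal
```

The solver returns some welfare-maximizing vertex (for instance, all probability mass on (WL, WL)); the coin-flip equilibrium from the text, with probability 1/2 on each of (WL, WL) and (LW, LW), lies on the same optimal face and attains the same total payoff of 3.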

Finally, we note that correlated equilibria can be combined together to form new correlated equilibria. Thus, if the set of correlated equilibria of a game G does not contain a single element, it is infinite. Indeed, any convex combination of correlated equilibrium payoffs can itself be realized as the payoff profile of some correlated equilibrium. The easiest way to understand this claim is to imagine a public random device that selects which of the correlated equilibria will be played; next, another random number is chosen in order to allow the chosen equilibrium to be played. Overall, each agent’s expected payoff is the weighted sum of the payoffs from the correlated equilibria that were combined. Since no agent has an incentive to deviate regardless of the probabilities governing the first random device, we can achieve any convex combination of correlated equilibrium payoffs. Finally, observe that having two stages of random number generation is not necessary: we can simply derive new domains D and a new joint probability distribution π from the domains and joint distributions of the original correlated equilibria, and so perform the random number generation in one step.

3.6 TREMBLING-HAND PERFECT EQUILIBRIUM

Another important solution concept is the trembling-hand perfect equilibrium, or simply perfect equilibrium. While rationalizability is a weaker notion than that of a Nash equilibrium, perfection is a stronger one. Several equivalent definitions of the concept exist. In the following definition, recall that a fully mixed strategy is one that assigns every action a strictly positive probability.

Definition 3.6.1 (Trembling-hand perfect equilibrium). A mixed-strategy profile S is a (trembling-hand) perfect equilibrium of a normal-form game G if there exists a sequence $S^0, S^1, \ldots$ of fully mixed-strategy profiles such that $\lim_{k \to \infty} S^k = S$, and such that for each $S^k$ in the sequence and each player i, the strategy $s_i$ is a best response to the strategies $s^k_{-i}$.

Perfect equilibria are an involved topic, and relate to other subtle refinements of the Nash equilibrium such as the proper equilibrium. The notes at the end of the booklet point the reader to further readings on this topic. We should, however, at least explain the term “trembling hand.” One way to think about the concept is as requiring that the equilibrium be robust against slight errors—“trembles”—on the part of players. In other words, one’s action ought to be the best response not only against the opponents’ equilibrium strategies, but also against small perturbations of those. However, since the mathematical definition speaks about arbitrarily small perturbations, whether these trembles in fact model player fallibility or are merely a mathematical device is open to debate.

3.7 ε-NASH EQUILIBRIUM

Another solution concept reflects the idea that players might not care about changing their strategies to a best response when the amount of utility that they could gain by doing so is very small. This leads us to the idea of an ε-Nash equilibrium.

Definition 3.7.1 (ε-Nash). Fix ε > 0. A strategy profile s = (s1, …, sn) is an ε-Nash equilibrium if, for all agents i and for all strategies $s'_i \neq s_i$, $u_i(s_i, s_{-i}) \ge u_i(s'_i, s_{-i}) - \epsilon$.

This concept has various attractive properties. ε-Nash equilibria always exist; indeed, every Nash equilibrium is surrounded by a region of ε-Nash equilibria for any ε > 0. The argument that agents are indifferent to sufficiently small gains is convincing to many. Further, the concept can be computationally useful: algorithms that aim to identify ε-Nash equilibria need to consider only a finite set of mixed-strategy profiles rather than the whole continuous space. (Of course, the size of this finite set depends on both ε and on the game’s payoffs.) Since computers generally represent real numbers using a floating-point approximation, it is usually the case that even methods for the “exact” computation of Nash equilibria actually find only ε-equilibria where ε is roughly the “machine precision” (on the order of $10^{-16}$ or less for most modern computers). ε-Nash equilibria are also important to multiagent learning algorithms, not discussed in this booklet.

However, ε-Nash equilibria also have several drawbacks. First, although Nash equilibria are always surrounded by ε-Nash equilibria, the reverse is not true. Thus, a given ε-Nash equilibrium is not necessarily close to any Nash equilibrium. This undermines the sense in which ε-Nash equilibria can be understood as approximations of Nash equilibria. Consider the game in Figure 3.7.

                 L            R
    U          1, 1         0, 0
    D    1 + ε/2, 1     500, 500

FIGURE 3.7: A game with an interesting ε-Nash equilibrium.

This game has a unique Nash equilibrium of (D, R), which can be identified through the iterated removal of dominated strategies. (D dominates U for player 1; on the removal of U, R dominates L for player 2.) (D, R) is also an ε-Nash equilibrium, of course. However, there is also another ε-Nash equilibrium: (U, L). This game illustrates two things.

First, neither player’s payoff under the ε-Nash equilibrium is within ε of his payoff in a Nash equilibrium; indeed, in general both players’ payoffs under an ε-Nash equilibrium can be arbitrarily less than in any Nash equilibrium. The problem is that the requirement that player 1 cannot gain more than ε by deviating from the ε-Nash equilibrium strategy profile of (U, L) does not imply that player 2 would not be able to gain more than ε by best responding to player 1’s deviation.

Second, some ε-Nash equilibria might be very unlikely to arise in play. Although player 1 might not care about a gain of ε, he might reason that the fact that D dominates U would lead player 2 to expect him to play D, and that player 2 would thus play R in response. Player 1 might thus play D because it is his best response to R. Overall, the idea of ε-approximation is much messier when applied to the identification of a fixed point than when it is applied to a (single-objective) optimization problem.
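
For pure-strategy profiles, checking the ε-Nash condition amounts to comparing each player’s payoff with his best unilateral deviation. Below is a minimal sketch on the game of Figure 3.7 as reconstructed above (the payoffs and the function name are our assumptions), with ε = 0.01.

```python
import numpy as np

eps = 0.01
# The game of Figure 3.7 as reconstructed above.
u1 = np.array([[1.0, 0.0], [1.0 + eps / 2, 500.0]])  # rows U, D
u2 = np.array([[1.0, 0.0], [1.0, 500.0]])            # columns L, R

def is_eps_nash(u1, u2, r, c, eps):
    """Check the pure profile (r, c) against all unilateral pure deviations."""
    gain1 = u1[:, c].max() - u1[r, c]   # player 1's best deviation gain
    gain2 = u2[r, :].max() - u2[r, c]   # player 2's best deviation gain
    return gain1 <= eps and gain2 <= eps

print(is_eps_nash(u1, u2, 0, 0, eps))  # (U, L): True; player 1 forgoes only eps/2
print(is_eps_nash(u1, u2, 1, 1, eps))  # (D, R): True; an exact Nash equilibrium
```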

3.8 EVOLUTIONARILY STABLE STRATEGIES

Roughly speaking, an evolutionarily stable strategy is a mixed strategy that is “resistant to invasion” by new strategies. As can be gleaned from the name, inspiration for the concept of evolutionarily stable strategies comes from evolutionary biology. There one speaks about different species within a population, and how each species’ relative “fitness” causes its proportion within the population to grow or shrink. In our setting the species are those agents playing a particular strategy. Then suppose that a small population of “invaders” playing a different strategy is added to the population. The original strategy is considered to be an ESS if it gets a higher payoff against the resulting mixture of the new and old strategies than the invaders do, thereby “chasing out” the invaders.

More formally, we have the following.

Definition 3.8.1 (Evolutionarily stable strategy (ESS)). Given a symmetric two-player normal-form game G = ({1, 2}, A, u) and a mixed strategy S, we say that S is an evolutionarily stable strategy if and only if for some ε > 0 and for all other strategies S′ it is the case that

$$u\big(S, (1 - \epsilon)S + \epsilon S'\big) > u\big(S', (1 - \epsilon)S + \epsilon S'\big)$$

We can use properties of expectation to state this condition equivalently as

$$(1 - \epsilon)\, u(S, S) + \epsilon\, u(S, S') > (1 - \epsilon)\, u(S', S) + \epsilon\, u(S', S')$$

Note that, since this need only hold for small ε, this is equivalent to requiring that either u(S, S) > u(S′, S) holds, or else both u(S, S) = u(S′, S) and u(S, S′) > u(S′, S′) hold. Note that this is a strict definition. We can also state a weaker definition of ESS.

Definition 3.8.2 (Weak ESS). S is a weak evolutionarily stable strategy if and only if for some ε > 0 and for all S′ it is the case that either u(S, S) > u(S′, S) holds, or else both u(S, S) = u(S′, S) and u(S, S′) ≥ u(S′, S′) hold.

This weaker definition includes strategies in which the invader does just as well against the original population as it does against itself. In these cases the population using the invading strategy will not grow, but it will also not shrink.

We illustrate the concept of ESS with the instance of the Hawk–Dove game shown in Figure 3.8. The story behind this game might be as follows. Two animals are fighting over a prize such as a piece of food. Each animal can choose between two behaviors: an aggressive hawkish behavior H, or an accommodating dovish behavior D. The prize is worth 6 to each of them. Fighting costs each player 5. When a hawk meets a dove he gets the prize without a fight, and hence the payoffs are 6 and 0, respectively. When two doves meet they split the prize without a fight, hence a payoff of 3 to each one. When two hawks meet a fight breaks out, costing each player 5 (or, equivalently, yielding −5). In addition, each player has a 50% chance of ending up with the prize, adding an expected benefit of 3, for an overall payoff of −2.

            H          D
    H    −2, −2      6, 0
    D     0, 6       3, 3

FIGURE 3.8: Hawk–Dove game.

It is not hard to verify that the game has a unique symmetric Nash equilibrium (S, S), where S = (3/5, 2/5) (i.e., play H with probability 3/5), and that S is also the unique ESS of the game. To confirm that S is an ESS, we need that for all S′ ≠ S, u(S, S) = u(S′, S) and u(S, S′) > u(S′, S′). The equality condition is true of any mixed-strategy equilibrium with full support, so it follows directly. To demonstrate that the inequality holds, it is sufficient to examine the S′—or equivalently, the probability of playing H—that minimizes f(S′) = u(S, S′) − u(S′, S′). Expanding f(S′), we see that it is a quadratic function with a (unique) minimum of zero at S′ = S; hence f(S′) > 0 for every S′ ≠ S, proving our result.
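
The computation above is easy to verify numerically. The following sketch, assuming NumPy (the helper name is ours), checks both ESS conditions for S = (3/5, 2/5) against a grid of mutant strategies.

```python
import numpy as np

# Hawk–Dove payoffs for the row player (Figure 3.8); the game is symmetric.
u = np.array([[-2.0, 6.0], [0.0, 3.0]])

def payoff(s, t):
    """Expected payoff u(s, t) of mixed strategy s against mixed strategy t."""
    return s @ u @ t

s = np.array([3 / 5, 2 / 5])                 # the symmetric equilibrium strategy
for p in np.linspace(0.0, 1.0, 11):          # a grid of mutant strategies S'
    sp = np.array([p, 1.0 - p])
    if np.allclose(sp, s):
        continue
    # Full-support equilibrium: every S' earns the same payoff against S ...
    assert np.isclose(payoff(s, s), payoff(sp, s))
    # ... and S does strictly better against S' than S' does against itself.
    assert payoff(s, sp) > payoff(sp, sp)
print("S = (3/5, 2/5) satisfies both ESS conditions on all sampled mutants")
```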

The connection between an ESS and a Nash equilibrium is not accidental. The following two theorems capture this connection.

Theorem 3.8.3. Given a symmetric two-player normal-form game G = ({1,2}, A, u) and a mixed strategy S, if S is an evolutionarily stable strategy then (S, S) is a Nash equilibrium of G.

This is easy to show. Note that by definition (letting ε → 0 in the ESS condition) an ESS S must satisfy

u(S, S) ≥ u(S′, S).

In other words, it is a best response to itself and thus must be a Nash equilibrium. However, not every Nash equilibrium is an ESS; this property is guaranteed only for strict equilibria.

Theorem 3.8.4. Given a symmetric two-player normal-form game G = ({1,2}, A, u) and a mixed strategy S, if (S, S) is a strict (symmetric) Nash equilibrium of G, then S is an evolutionarily stable strategy.

This is also easy to show. Note that for any strict Nash equilibrium S it must be the case that

u(S, S) > u(S′, S).

But this satisfies the first criterion of an ESS.


1The attentive reader might wonder how a theorem from 1928 can use the term “Nash equilibrium,” when Nash’s work was published in 1950. Von Neumann used different terminology and proved the theorem in a different way; however, the given presentation is probably clearer in the context of modern game theory.

2Note that here we consider strategy domination from one individual player’s point of view; thus, this notion is unrelated to the concept of Pareto domination discussed earlier.

3One Nobel-prize-winning game theorist, R. Myerson, has gone so far as to say that “if there is intelligent life on other planets, in a majority of them, they would have discovered correlated equilibrium before Nash equilibrium.”

4This construction is closely related to one used later in the book in connection with Bayesian Games in Chapter 7.
