CHAPTER 4

Games With Sequential Actions: The Perfect-Information Extensive Form

In Chapter 1 we assumed that a game is represented in normal form: effectively, as a big table. In some sense, this is reasonable. The normal form is conceptually straightforward, and most game theorists see it as fundamental. While many other representations exist to describe finite games, we will see in this chapter and in those that follow that each of them has an “induced normal form”: a corresponding normal-form representation that preserves game-theoretic properties such as Nash equilibria. Thus the results given in previous chapters hold for all finite games, no matter how they are represented; in that sense the normal-form representation is universal.

In this chapter we will look at perfect-information extensive-form games, a finite representation that does not always assume that players act simultaneously. This representation is in general exponentially smaller than its induced normal form, and furthermore can be much more natural to reason about. While the Nash equilibria of an extensive-form game can be found through its induced normal form, computational benefit can be had by working with the extensive form directly. Furthermore, there are other solution concepts, such as subgame-perfect equilibrium (see Section 4.3), which explicitly refer to the sequence in which players act and which are therefore not meaningful when applied to normal-form games.

The normal-form game representation does not incorporate any notion of sequence, or time, of the actions of the players. The extensive (or tree) form is an alternative representation that makes the temporal structure explicit. In this chapter we discuss the special case of perfect information extensive-form games. We will restrict the discussion to finite games, that is, to games represented as finite trees.

4.1 DEFINITION

Informally speaking, a perfect-information game in extensive form (or, more simply, a perfect-information game) is a tree in the sense of graph theory, in which each node represents the choice of one of the players, each edge represents a possible action, and the leaves represent final outcomes over which each player has a utility function. Indeed, in certain circles (in particular, in artificial intelligence), these are known simply as game trees. Formally, we define them as follows.

Definition 4.1.1 (perfect-information game). A (finite) perfect-information game (in extensive form) is a tuple G = (N, A, H, Z, χ, ρ, σ, u), where:

N is a set of n players;

A is a (single) set of actions;

H is a set of nonterminal choice nodes;

Z is a set of terminal nodes, disjoint from H;

χ : H → 2^A is the action function, which assigns to each choice node a set of possible actions;

ρ : H → N is the player function, which assigns to each nonterminal node a player i ∈ N who chooses an action at that node;

σ : H × A → H ∪ Z is the successor function, which maps a choice node and an action to a new choice node or terminal node such that for all h1, h2 ∈ H and a1, a2 ∈ A, if σ(h1, a1) = σ(h2, a2) then h1 = h2 and a1 = a2; and

u = (u1, …, un), where ui : Z → ℝ is a real-valued utility function for player i on the terminal nodes Z.

Since the choice nodes form a tree, we can unambiguously identify a node with its history, that is, the sequence of choices leading from the root node to it. We can also define the descendants of a node h, namely all the choice and terminal nodes in the subtree rooted in h.
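For concreteness, the tuple in Definition 4.1.1 can be sketched as a small data structure. The encoding below is our own illustration, not the book's notation: a terminal node in Z is just a payoff tuple, and a choice node in H is a pair (player, {action: successor}), which packages ρ, χ, and σ together. The example game g at the end is likewise made up.

```python
# Illustrative encoding (an assumption of this sketch, not the book's notation):
# terminal node -> tuple of numeric payoffs, one per player
# choice node   -> (player, {action: successor node})

def is_terminal(node):
    """True for nodes in Z (payoff tuples), False for choice nodes in H."""
    return not isinstance(node[1], dict)

def player(h):          # the player function rho
    return h[0]

def actions(h):         # the action function chi
    return set(h[1])

def successor(h, a):    # the successor function sigma
    return h[1][a]

def node_at(root, history):
    """Identify a node with its history: follow its actions from the root."""
    node = root
    for a in history:
        node = successor(node, a)
    return node

# A tiny two-move example: player 1 moves, then (after L) player 2 moves.
g = (1, {"L": (2, {"l": (1, 2), "r": (0, 0)}),
         "R": (3, 1)})
```

Because σ is injective, every node is reached by exactly one history, which is why `node_at` is well defined.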

An example of such a game is the Sharing game. Imagine a brother and sister who use the following protocol for sharing two indivisible and identical presents from their parents. First the brother suggests a split, which can be one of three: he keeps both, she keeps both, or they each keep one. Then the sister chooses whether to accept or reject the split. If she accepts they each get their allocated present(s), and otherwise neither gets any gift. Assuming both siblings value the two presents equally and additively, the tree representation of this game is shown in Figure 4.1.
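Encoding a choice node as a (player, {action: child}) pair and a terminal node as a payoff pair (our own illustrative convention), the Sharing game can be written down directly; payoffs assume each present is worth 1 to each sibling:

```python
# Sharing game: player 1 (brother) proposes a split, player 2 (sister)
# accepts ("yes") or rejects ("no"). Rejection leaves both with nothing.
# Node encoding (illustrative): (player, {action: child}); leaves are payoffs.
sharing = (1, {
    "2-0": (2, {"yes": (2, 0), "no": (0, 0)}),
    "1-1": (2, {"yes": (1, 1), "no": (0, 0)}),
    "0-2": (2, {"yes": (0, 2), "no": (0, 0)}),
})

def outcome(node, choices):
    """Follow a sequence of actions from the root to a terminal node."""
    for a in choices:
        node = node[1][a]
    return node
```

For instance, the proposal 1-1 followed by "yes" reaches the leaf with payoffs (1, 1), while any rejected proposal reaches (0, 0).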

4.2 STRATEGIES AND EQUILIBRIA

A pure strategy for a player in a perfect-information game is a complete specification of which deterministic action to take at every node belonging to that player. A more formal definition follows.


FIGURE 4.1: The Sharing game.

Definition 4.2.1 (Pure strategies). Let G = (N, A, H, Z, χ, ρ, σ, u) be a perfect-information extensive-form game. Then the pure strategies of player i consist of the Cartesian product ∏_{h ∈ H, ρ(h) = i} χ(h).

Notice that the definition contains a subtlety. An agent’s strategy requires a decision at each choice node, regardless of whether or not it is possible to reach that node given the agent’s other choices. In the Sharing game above the situation is straightforward: player 1 has three pure strategies, and player 2 has eight, as follows.

S1 = {2–0, 1–1, 0–2}

S2 = {(yes, yes, yes), (yes, yes, no), (yes, no, yes), (yes, no, no), (no, yes, yes), (no, yes, no), (no, no, yes), (no, no, no)}

But now consider the game shown in Figure 4.2.

In order to define a complete strategy for this game, each of the players must choose an action at each of his two choice nodes. Thus we can enumerate the pure strategies of the players as follows.

S1 = {(A, G), (A, H), (B, G), (B, H)}

S2 = {(C, E), (C, F), (D, E), (D, F)}
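Definition 4.2.1 can be turned into a short enumeration procedure: collect the action sets at each of player i's choice nodes and take their Cartesian product. The sketch below assumes the illustrative node encoding used earlier; the payoffs attached to Figure 4.2 are reconstructed from the surrounding discussion ((3, 8) and (8, 3) after A, (5, 5) after B then E, (2, 10) and (1, 0) after B then F), so components the text does not pin down are assumptions.

```python
from itertools import product

def is_terminal(node):
    # Illustrative encoding: leaves are payoff tuples, choice nodes
    # are (player, {action: child}) pairs.
    return not isinstance(node[1], dict)

def decision_points(node, i):
    """DFS list of the action sets at the nodes where player i moves."""
    if is_terminal(node):
        return []
    mover, branches = node
    points = [sorted(branches)] if mover == i else []
    for a in sorted(branches):
        points += decision_points(branches[a], i)
    return points

def pure_strategies(root, i):
    """Definition 4.2.1: the Cartesian product of chi(h) over rho(h) = i."""
    return list(product(*decision_points(root, i)))

# The game of Figure 4.2, payoffs reconstructed from the text's discussion.
fig42 = (1, {"A": (2, {"C": (3, 8), "D": (8, 3)}),
             "B": (2, {"E": (5, 5),
                       "F": (1, {"G": (2, 10), "H": (1, 0)})})})
```

Running `pure_strategies(fig42, 1)` recovers exactly the four strategies listed above for player 1, including (A, G) and (A, H), whose G-versus-H component is moot once A is chosen.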


FIGURE 4.2: A perfect-information game in extensive form.


FIGURE 4.3: The game from Figure 4.2 in normal form.

It is important to note that we have to include the strategies (A, G) and (A, H), even though once player 1 has chosen A then his own G-versus-H choice is moot.

The definition of best response and Nash equilibria in this game are exactly as they are for normal-form games. Indeed, this example illustrates how every perfect-information game can be converted to an equivalent normal-form game. For example, the perfect-information game of Figure 4.2 can be converted into the normal form image of the game, shown in Figure 4.3. Clearly, the strategy spaces of the two games are the same, as are the pure-strategy Nash equilibria. (Indeed, both the mixed strategies and the mixed-strategy Nash equilibria of the two games are also the same; however, we defer further discussion of mixed strategies until we consider imperfect-information games in Chapter 5.)

In this way, for every perfect-information game there exists a corresponding normal-form game. Note, however, that the temporal structure of the extensive-form representation can result in a certain redundancy within the normal form. For example, in Figure 4.3 there are 16 different outcomes, while in Figure 4.2 there are only 5; the payoff (3, 8) occurs only once in Figure 4.2 but four times in Figure 4.3. One general lesson is that while this transformation can always be performed, it can result in an exponential blowup of the game representation. This is an important lesson, since the didactic examples of normal-form games are very small, wrongly suggesting that this form is more compact.
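The conversion from Figure 4.2 to Figure 4.3 can be sketched mechanically: enumerate each player's pure strategies, then play out every strategy profile to a leaf. As before, the node encoding is our own illustrative convention and the payoffs are reconstructed from the text's discussion. The resulting 16-cell table exhibits exactly the redundancy described above, with (3, 8) appearing four times.

```python
from itertools import product

def is_terminal(node):
    return not isinstance(node[1], dict)

def decision_points(node, i, hist=()):
    """DFS list of (history, actions) at the nodes where player i moves."""
    if is_terminal(node):
        return []
    mover, branches = node
    points = [(hist, sorted(branches))] if mover == i else []
    for a in sorted(branches):
        points += decision_points(branches[a], i, hist + (a,))
    return points

def induced_normal_form(root, players=(1, 2)):
    """Map every pure-strategy profile to the terminal payoff it reaches."""
    pts = {i: decision_points(root, i) for i in players}
    strats = {i: list(product(*(acts for _, acts in pts[i]))) for i in players}
    table = {}
    for profile in product(*(strats[i] for i in players)):
        # A strategy names one action per choice node; play the profile out.
        plan = {i: dict(zip((h for h, _ in pts[i]), s))
                for i, s in zip(players, profile)}
        node, hist = root, ()
        while not is_terminal(node):
            a = plan[node[0]][hist]
            node, hist = node[1][a], hist + (a,)
        table[profile] = node
    return table

# Figure 4.2 again (payoffs reconstructed from the text).
fig42 = (1, {"A": (2, {"C": (3, 8), "D": (8, 3)}),
             "B": (2, {"E": (5, 5),
                       "F": (1, {"G": (2, 10), "H": (1, 0)})})})
```

Note the exponential blowup the text warns about: a tree with 5 leaves induces a table with 4 × 4 = 16 cells.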

The normal form gets its revenge, however, since the reverse transformation—from the normal form to the perfect-information extensive form—does not always exist. Consider, for example, the Prisoner’s Dilemma game from Figure 1.1. A little experimentation will convince the reader that there does not exist a perfect-information game that is equivalent in the sense of having the same strategy profiles and the same payoffs. Intuitively, the problem is that perfect-information extensive-form games cannot model simultaneity. The general characterization of the class of normal-form games for which there exist corresponding perfect-information games in extensive form is somewhat complex.

The reader will have noticed that we have so far concentrated on pure strategies and pure Nash equilibria in extensive-form games. There are two reasons for this, or perhaps one reason and one excuse. The reason is that mixed strategies introduce a new subtlety, and it is convenient to postpone discussion of it. The excuse (which also allows the postponement, though not for long) is the following theorem.

Theorem 4.2.2. Every (finite) perfect-information game in extensive form has a pure-strategy Nash equilibrium.

This is perhaps the earliest result in game theory, due to Zermelo in 1913 (see the historical notes at the end of the book). The intuition here should be clear; since players take turns, and everyone gets to see everything that happened thus far before making a move, it is never necessary to introduce randomness into action selection in order to find an equilibrium. We will see this plainly when we discuss backward induction below. Both this intuition and the theorem will cease to hold when we discuss more general classes of games such as imperfect-information games in extensive form. First, however, we discuss an important refinement of the concept of Nash equilibrium.

4.3 SUBGAME-PERFECT EQUILIBRIUM

As we have discussed, the notion of Nash equilibrium is as well defined in perfect-information games in extensive form as it is in the normal form. However, as the following example shows, the Nash equilibrium can be too weak a notion for the extensive form. Consider again the perfect-information extensive-form game shown in Figure 4.2. There are three pure-strategy Nash equilibria in this game: {(A, G), (C, F)}, {(A, H), (C, F)}, and {(B, H), (C, E)}. This can be determined by examining the normal form image of the game, as indicated in Figure 4.4.
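The three equilibria can be recovered computationally by building the induced normal form and checking every profile for profitable unilateral deviations, a brute-force best-response check. The node encoding and the reconstructed Figure 4.2 payoffs are the same illustrative assumptions as before.

```python
from itertools import product

def is_terminal(node):
    return not isinstance(node[1], dict)

def decision_points(node, i, hist=()):
    """DFS list of (history, actions) at the nodes where player i moves."""
    if is_terminal(node):
        return []
    mover, branches = node
    points = [(hist, sorted(branches))] if mover == i else []
    for a in sorted(branches):
        points += decision_points(branches[a], i, hist + (a,))
    return points

def induced_normal_form(root):
    """Payoff table over all pure-strategy profiles, plus the strategy sets."""
    pts = {i: decision_points(root, i) for i in (1, 2)}
    strats = {i: list(product(*(acts for _, acts in pts[i]))) for i in (1, 2)}
    table = {}
    for s1 in strats[1]:
        for s2 in strats[2]:
            plan = {1: dict(zip((h for h, _ in pts[1]), s1)),
                    2: dict(zip((h for h, _ in pts[2]), s2))}
            node, hist = root, ()
            while not is_terminal(node):
                a = plan[node[0]][hist]
                node, hist = node[1][a], hist + (a,)
            table[(s1, s2)] = node
    return table, strats

def pure_nash(table, strats):
    """Profiles from which no player gains by a unilateral deviation."""
    return [(s1, s2) for (s1, s2) in table
            if all(table[(t1, s2)][0] <= table[(s1, s2)][0] for t1 in strats[1])
            and all(table[(s1, t2)][1] <= table[(s1, s2)][1] for t2 in strats[2])]

fig42 = (1, {"A": (2, {"C": (3, 8), "D": (8, 3)}),
             "B": (2, {"E": (5, 5),
                       "F": (1, {"G": (2, 10), "H": (1, 0)})})})
```

Under these payoffs the check returns exactly {(A, G), (C, F)}, {(A, H), (C, F)}, and {(B, H), (C, E)}.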

However, examining the normal form image of an extensive-form game obscures the game’s temporal nature. To illustrate a problem that can arise in certain equilibria of extensive-form games, in Figure 4.5 we contrast the equilibria {(A, G), (C, F)} and {(B, H), (C, E)} by drawing them on the extensive-form game tree.


FIGURE 4.4: Equilibria of the game from Figure 4.2.

First consider the equilibrium {(A, G), (C, F)}. If player 1 chooses A then player 2 receives a higher payoff by choosing C than by choosing D. If player 2 played the strategy (C, E) rather than (C, F) then player 1 would prefer to play B at the first node in the tree; as it is, player 1 gets a payoff of 3 by playing A rather than a payoff of 2 by playing B. Hence we have an equilibrium.

The second equilibrium {(B, H), (C, E)} is less intuitive. First, note that {(B, G), (C, E)} is not an equilibrium: player 2’s best response to (B, G) is (C, F). Thus, the only reason that player 2 chooses to play the action E is that he knows that player 1 would play H at his second decision node. This behavior by player 1 is called a threat: by committing to choose an action that is harmful to player 2 in his second decision node, player 1 can cause player 2 to avoid that part of the tree. (Note that player 1 benefits from making this threat: he gets a payoff of 5 instead of 2 by playing (B, H) instead of (B, G).) So far so good. The problem, however, is that player 2 may not consider player 1’s threat to be credible: if player 1 did reach his final decision node, actually choosing H over G would also reduce player 1’s own utility. If player 2 played F, would player 1 really follow through on his threat and play H, or would he relent and pick G instead?

To formally capture the reason why the {(B, H), (C, E)} equilibrium is unsatisfying, and to define an equilibrium refinement concept that does not suffer from this problem, we first define the notion of a subgame.


FIGURE 4.5: Two out of the three equilibria of the game from Figure 4.2: {(A, G), (C, F)} and {(B, H), (C, E)}. Bold edges indicate players’ choices at each node.

Definition 4.3.1 (Subgame). Given a perfect-information extensive-form game G, the subgame of G rooted at node h is the restriction of G to the descendants of h. The set of subgames of G consists of all of the subgames of G rooted at some node in G.

Now we can define the notion of a subgame-perfect equilibrium, a refinement of the Nash equilibrium in perfect-information games in extensive form, which eliminates those unwanted Nash equilibria.1

Definition 4.3.2 (Subgame-perfect equilibrium). The subgame-perfect equilibria (SPE) of a game G are all strategy profiles s such that for any subgame G′ of G, the restriction of s to G′ is a Nash equilibrium of G′.

Since G is in particular its own subgame, every SPE is also a Nash equilibrium. Furthermore, although SPE is a stronger concept than Nash equilibrium (i.e., every SPE is a NE, but not every NE is a SPE) it is still the case that every perfect-information extensive-form game has at least one subgame-perfect equilibrium.

This definition rules out “noncredible threats,” of the sort illustrated in the above example. In particular, note that the extensive-form game in Figure 4.2 has only one subgame-perfect equilibrium, {(A, G), (C, F)}. Neither of the other Nash equilibria is subgame perfect. Consider the subgame rooted in player 1’s second choice node. The unique Nash equilibrium of this (trivial) game is for player 1 to play G. Thus the action H, the restriction of the strategies (A, H) and (B, H) to this subgame, is not optimal in this subgame, and cannot be part of a subgame-perfect equilibrium of the larger game.

4.4 BACKWARD INDUCTION

Inherent in the concept of subgame-perfect equilibrium is the principle of backward induction. One identifies the equilibria in the “bottom-most” subgame trees, and assumes that those equilibria will be played as one backs up and considers increasingly larger trees. We can use this procedure to compute a sample Nash equilibrium. This is good news: not only are we guaranteed to find a subgame-perfect equilibrium (rather than possibly finding a Nash equilibrium that involves noncredible threats), but also this procedure is computationally simple.

The algorithm BACKWARDINDUCTION is described in Figure 4.6. The variable util_at_child is a vector denoting the utility for each player at the child node; util_at_child_ρ(h) denotes the element of this vector corresponding to the utility for player ρ(h) (the player who gets to move at node h). Similarly, best_util is a vector giving utilities for each player.

Observe that this procedure does not return an equilibrium strategy for each of the n players, but rather describes how to label each node with a vector of n real numbers. This labeling can be seen as an extension of the game’s utility function to the nonterminal nodes H. The players’ equilibrium strategies follow straightforwardly from this extended utility function: every time a given player i has the opportunity to act at a given node h ∈ H (i.e., ρ(h) = i), that player will choose an action ai ∈ χ(h) that solves arg max_{ai ∈ χ(h)} ui(σ(h, ai)). These strategies can also be returned by BACKWARDINDUCTION given some extra bookkeeping.


FIGURE 4.6: Procedure for finding the value of a sample (subgame-perfect) Nash equilibrium of a perfect-information extensive-form game.
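The procedure of Figure 4.6 can be sketched as a short recursion over the illustrative node encoding used earlier. Each call returns the equilibrium payoff vector of the subtree together with the mover's chosen action (the extra bookkeeping mentioned above); the Figure 4.2 payoffs are reconstructed from the text's discussion.

```python
def is_terminal(node):
    # Illustrative encoding: leaves are payoff tuples, choice nodes
    # are (player, {action: child}) pairs.
    return not isinstance(node[1], dict)

def backward_induction(node):
    """Return (equilibrium payoff vector, action chosen by the mover).

    Mirrors BACKWARDINDUCTION: label each subtree with the payoff vector
    of a subgame-perfect equilibrium played within that subtree.
    """
    if is_terminal(node):
        return node, None
    mover, branches = node
    best_util, best_action = None, None
    for a in sorted(branches):
        util_at_child, _ = backward_induction(branches[a])
        if best_util is None or util_at_child[mover - 1] > best_util[mover - 1]:
            best_util, best_action = util_at_child, a
    return best_util, best_action

fig42 = (1, {"A": (2, {"C": (3, 8), "D": (8, 3)}),
             "B": (2, {"E": (5, 5),
                       "F": (1, {"G": (2, 10), "H": (1, 0)})})})
```

On this game the recursion labels player 1's second choice node with G (a payoff of 2 beats 1), so the noncredible threat H never survives, and the root receives the value (3, 8) with action A, the unique subgame-perfect equilibrium path described in Section 4.3.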

In general in this booklet we do not address computational issues, so this example could be misleading without additional explanation. While the procedure demonstrates that in principle a sample SPE is effectively computable, in practice many game trees are not enumerated in advance and are hence unavailable for backward induction. For example, the extensive-form representation of chess has around 10^150 nodes, which is vastly too large to represent explicitly.

We note that BACKWARDINDUCTION has another name in the two-player, zero-sum context: the minimax algorithm. Recall that in such games, only a single payoff number is required to characterize any outcome. Player 1 wants to maximize this number, while player 2 wants to minimize it. In this context BACKWARDINDUCTION can be understood as propagating these single payoff numbers from the leaves of the tree up to the root. Each decision node for player 1 is labeled with the maximum of the labels of its child nodes (representing the fact that player 1 would choose the corresponding action), and each decision node for player 2 is labeled with the minimum of that node’s children’s labels. The label on the root node is the value of the game: player 1’s payoff in equilibrium.
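In the zero-sum special case each terminal node can be summarized by a single number (player 1's payoff), and the recursion collapses to minimax. The tiny tree below is a made-up example, using the same illustrative (player, {action: child}) encoding as before:

```python
def minimax(node):
    """Value of a zero-sum game tree: player 1 maximizes, player 2 minimizes.

    A node is either a number (terminal payoff to player 1) or a pair
    (player, {action: child}) -- the illustrative encoding used earlier.
    """
    if isinstance(node, (int, float)):
        return node
    mover, branches = node
    values = [minimax(child) for child in branches.values()]
    return max(values) if mover == 1 else min(values)

# A made-up zero-sum tree: player 1 picks a side, then player 2 replies.
zero_sum = (1, {"left":  (2, {"a": 3, "b": 5}),
                "right": (2, {"a": 2, "b": 9})})
```

Here player 2 would answer "left" with a (value 3) and "right" with a (value 2), so player 1 picks "left" and the value of the game is 3.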

As we said, real-world games—even zero-sum ones, such as chess—cannot be represented explicitly. Such games require the gradual development of the tree, and its heuristic search. At least in the context of zero-sum games, considerable effort has gone into such search algorithms. The best-known one, ALPHABETAPRUNING, is a heuristic version of the minimax algorithm.

Despite the fact that strong arguments can be made in its favor, the concept of backward induction is not without controversy. To see why this is, consider the well-known Centipede game, depicted in Figure 4.7. (The game starts at the node at the upper left.) In this game two players alternate in making decisions, at each turn choosing between going “down” and ending the game or going “across” and continuing it (except at the last node where going “across” also ends the game). The payoffs are constructed in such a way that the only SPE is for each player to always choose to go down. To see why, consider the last choice. Clearly at that point the best choice for the player is to go down. Since this is the case, going down is also the best choice for the other player at the previous choice point. By induction the same argument holds for all choice points.


FIGURE 4.7: The Centipede game.
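A backward-induction pass makes the "always go down" conclusion concrete. The payoffs below are hypothetical (the figure's exact numbers are not reproduced here), chosen so that, as in the Centipede game, going down now always beats what a player would receive if the opponent went down at the next node:

```python
def is_terminal(node):
    # Illustrative encoding: leaves are payoff tuples, choice nodes
    # are (player, {action: child}) pairs.
    return not isinstance(node[1], dict)

def backward_induction(node):
    """Return (equilibrium payoff vector, action chosen by the mover)."""
    if is_terminal(node):
        return node, None
    mover, branches = node
    best_util, best_action = None, None
    for a in sorted(branches):
        util, _ = backward_induction(branches[a])
        if best_util is None or util[mover - 1] > best_util[mover - 1]:
            best_util, best_action = util, a
    return best_util, best_action

# Hypothetical four-node centipede; players 1 and 2 alternate moves.
centipede = (1, {
    "down": (1, 0),
    "across": (2, {
        "down": (0, 2),
        "across": (1, {
            "down": (3, 1),
            "across": (2, {"down": (2, 4), "across": (4, 3)})})})})
```

The recursion unwinds from the last node: player 2 goes down there (4 > 3), so player 1 goes down one step earlier (3 > 2), and so on back to the root, yielding the payoff (1, 0) even though both players would be better off if play continued for several rounds.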

This would seem to be the end of this story, except for two pesky factors. The first problem is that the SPE prediction in this case flies in the face of intuition. Indeed, in laboratory experiments subjects in fact continue to play “across” until close to the end of the game. The second problem is theoretical. Imagine that you are the second player in the game, and in the first step of the game the first player actually goes across. What should you do? The SPE suggests you should go down, but the same analysis suggests that you would not have gotten to this choice point in the first place. In other words, you have reached a state to which your analysis has given a probability of zero. How should you amend your beliefs and course of action based on this measure-zero event? It turns out this seemingly small inconvenience actually raises a fundamental problem in game theory. We will not develop the subject further here, but let us only mention that there exist different accounts of this situation, and they depend on the probabilistic assumptions made, on what is common knowledge (in particular, whether there is common knowledge of rationality), and on exactly how one revises one’s beliefs in the face of measure-zero events.

 

 

 

1Note that the word “perfect” is used in two different senses here.
