Following the setup described in Section 7.2, consider a system with two players $N$ (network) and $M$ (malware), specified by a system of $n$ differential equations [146, p. 83]:
$$\dot{x}(t) = f(t, x(t), u_N(t), u_M(t)), \quad t \in [t_0, T],$$
where $u_N(t) \in U_N \subset \mathbb{R}^m$ and $u_M(t) \in U_M \subset \mathbb{R}^s$, with initial condition $x(t_0) = x_0$, together with a damage function (the functional)
$$J(u_N, u_M) = g(x(T)) + \int_{t_0}^{T} h(x, u_N, u_M, t)\, dt,$$
where $x(t)$ is the $n$-dimensional state vector.
Player $N$ seeks to minimize $J$ by choosing the $m$-dimensional control function $u_N(\cdot)$, and player $M$ seeks to maximize $J$ by choosing the $s$-dimensional control function $u_M(\cdot)$. The game is therefore referred to as a dynamic two-player zero-sum game. The players' payoffs and the sets of strategies available to them are called the rules of the game. Both players know the rules of the game, each player knows that its opponent knows the rules, and so on ad infinitum. As we mentioned in the introduction to this chapter, we consider open-loop strategies, that is, controls that depend only on time (and not on the state or the previous history). This is appropriate in the context of network security, since the instantaneous state of the network (the exact fraction of nodes of each type) is impossible (or very costly) to track, for both players.
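To make the open-loop formulation concrete, here is a minimal numerical sketch of how the damage functional $J(u_N, u_M)$ can be evaluated for a given pair of open-loop controls, by integrating the dynamics together with the running damage. The functions `f`, `h`, `g` and the controls below are made-up placeholders, not the chapter's actual model; only the structure of the formulation above is reproduced.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative placeholders -- NOT the chapter's model; only the structure
# (state dynamics f, running damage h, terminal damage g) is shared.
def f(t, x, uN, uM):
    """Toy state dynamics, linear in the controls."""
    return np.array([-x[0] * uN[0],
                     x[0] * uM[0] - x[1] * uN[1],
                     x[1] * uM[0]])

def h(x, uN, uM, t):
    """Toy running (instantaneous) damage."""
    return x[2]

def g(xT):
    """Toy terminal damage."""
    return xT[2]

# Open-loop strategies: functions of time only, not of the state.
uN_openloop = lambda t: np.array([0.5, 0.5])   # placeholder network control
uM_openloop = lambda t: np.array([1.0])        # placeholder malware control

def damage(uN_fn, uM_fn, x0, t0, T):
    """Evaluate J(uN, uM) = g(x(T)) + integral of h, by augmenting the state
    vector with the running cost and integrating the whole system once."""
    def rhs(t, y):
        x = y[:-1]
        uN, uM = uN_fn(t), uM_fn(t)
        return np.concatenate([f(t, x, uN, uM), [h(x, uN, uM, t)]])
    sol = solve_ivp(rhs, (t0, T), np.concatenate([x0, [0.0]]), rtol=1e-8)
    xT, running_cost = sol.y[:-1, -1], sol.y[-1, -1]
    return g(xT) + running_cost

# Player N would search for uN minimizing this quantity; player M for uM maximizing it.
print(damage(uN_openloop, uM_openloop, x0=np.array([0.9, 0.1, 0.0]), t0=0.0, T=10.0))
```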
In our context, (7.1) provides the $f(\cdot)$ functions as well as the initial conditions, (7.3) provides the $g(\cdot), h(\cdot)$ functions, and (7.4a), (7.4b) provide $U_N, U_M$. Also, we have $n = 3$, $m = 2$, $s = 1$. Note that the $f(\cdot), h(\cdot)$ functions in our context depend on time $t$ only implicitly, that is, through the state and control functions. Also, the formulation does not capture any other constraints on the state functions, and in our context it does not need to either, owing to Lemma 7.1.
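As a complement to the sketch above, the following lines pin down the dimensions $n = 3$, $m = 2$, $s = 1$ and show one way to represent convex, box-shaped control sets in code, together with a projection and an admissibility check for candidate open-loop controls. The particular bounds are hypothetical stand-ins for $U_N$ and $U_M$ of (7.4a) and (7.4b), which are not reproduced here.

```python
import numpy as np

# In our context: n = 3 states, m = 2 network controls, s = 1 malware control.
# The box bounds below are illustrative stand-ins for U_N and U_M.
N_STATES, M_CONTROLS, S_CONTROLS = 3, 2, 1
UN_BOUNDS = np.array([[0.0, 1.0], [0.0, 1.0]])  # hypothetical U_N, a box in R^2
UM_BOUNDS = np.array([[0.0, 1.0]])              # hypothetical U_M, a box in R^1

def project(u, bounds):
    """Project a candidate control value onto its (convex) box control set."""
    return np.clip(u, bounds[:, 0], bounds[:, 1])

def admissible(u_fn, bounds, ts):
    """Check (on a time grid) that an open-loop control stays inside its set."""
    return all(np.all((bounds[:, 0] <= u_fn(t)) & (u_fn(t) <= bounds[:, 1])) for t in ts)

# Example: clip an unconstrained candidate into U_N and verify admissibility.
uN_candidate = lambda t: project(np.array([1.5 * np.exp(-t), 0.3]), UN_BOUNDS)
print(admissible(uN_candidate, UN_BOUNDS, ts=np.linspace(0.0, 10.0, 101)))  # True
```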
We now consider the values of the game. The lower value, denoted by $V_*$, is the overall damage when the minimizing player ($N$) is given the upper hand, i.e., selects its strategy after learning its opponent's strategy. Mathematically,
$$V_* = \max_{u_M} \min_{u_N} J(u_N, u_M).$$
Conversely, the upper value of the game, $V^*$, is defined as
$$V^* = \min_{u_N} \max_{u_M} J(u_N, u_M).$$
Thus, $V_*$ ($V^*$, respectively) is the maximum (minimum, respectively) damage that the malware (network, respectively) can inflict (incur, respectively) if the other player has the upper hand. Hence, $V_* \le V^*$. A pair of strategies $(u_N^*, u_M^*)$ is called a saddle point if
$$J(u_N^*, u_M) \le J(u_N^*, u_M^*) = V \le J(u_N, u_M^*)$$
for any strategy $u_N$ of the network and $u_M$ of the malware; then $V$ is the value of the game, and $V = V_* = V^*$.
Thus, if the network selects its saddle-point strategy $u_N^*$, then irrespective of the strategy of the malware, the damage it incurs is at most $V$, which is also the minimum damage that the malware can inflict if it has the upper hand. Thus, the network's saddle-point strategy is also its robust strategy, in the sense that it minimizes the maximum possible damage it can incur. Conversely, the malware's saddle-point strategy is also its robust strategy, since it maximizes the minimum possible damage it can inflict. Also, the network's and the malware's saddle-point strategies are their respective best responses to the other's robust strategy. We prove the existence of a saddle-point pair in the next theorem.
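The ordering $V_* \le V^*$ and the meaning of a saddle point are perhaps easiest to see in a finite analogue of the game. The sketch below computes the lower and upper values of a small zero-sum game given by a hypothetical damage matrix (rows are the network's strategies, columns the malware's, entries invented for illustration) and locates the saddle points, i.e., entries that are simultaneously the maximum of their row and the minimum of their column.

```python
import numpy as np

# Hypothetical damage matrix J[i, j]: row i = network strategy, column j = malware strategy.
# The network minimizes, the malware maximizes.  Values are purely illustrative.
J = np.array([[3.0, 5.0, 4.0],
              [2.0, 6.0, 3.0],
              [7.0, 8.0, 9.0]])

# Lower value V_* = max over malware strategies of (min over network strategies).
lower_value = np.max(np.min(J, axis=0))
# Upper value V^* = min over network strategies of (max over malware strategies).
upper_value = np.min(np.max(J, axis=1))
print(lower_value, upper_value)  # V_* <= V^* always holds

# A saddle point is an entry that is the maximum of its row (the malware cannot
# do better against it) and the minimum of its column (nor can the network).
saddle = [(i, j) for i in range(J.shape[0]) for j in range(J.shape[1])
          if J[i, j] == J[i, :].max() and J[i, j] == J[:, j].min()]
print(saddle)
```

For this particular matrix the two values coincide at 5, attained at entry (0, 1), so that entry is a saddle point; with a matrix for which $V_* < V^*$, the list of saddle points would come back empty.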
Theorem 7.1
The dynamic game defined above has a saddle-point pair of strategies.
Proof
This theorem follows directly from Theorem 2 on page 91 of [146]. The necessary conditions of that theorem are readily satisfied in our game. Namely, (i) the system function $f(t, x, u_N, u_M)$ in this game is continuous in the states and controls and is, moreover, bounded; note that this is sufficient for the condition on page 83 of [146] to hold; (ii) the instantaneous pay-off function $h(t, x, u_N, u_M)$ and the terminal pay-off function $g(x(T))$ are continuous in the states and controls (page 84 of [146]); (iii) the system function $f(t, x, u_N, u_M)$ is linear in the controls, the sets defining the controls are convex, and the instantaneous pay-off function $h(t, x, u_N, u_M)$ is linear in the controls (page 91 of [146]).
□
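To illustrate the flavour of condition (iii), here is a small numerical spot-check, on the placeholder dynamics from the earlier sketch (again, not the actual (7.1)), that $f$ respects convex combinations of the control pair at a fixed state and time, which is what linearity in the controls guarantees.

```python
import numpy as np

def f(t, x, uN, uM):
    """Placeholder dynamics from the earlier sketch (linear in the controls)."""
    return np.array([-x[0] * uN[0],
                     x[0] * uM[0] - x[1] * uN[1],
                     x[1] * uM[0]])

# Spot-check condition (iii): f respects convex combinations of the control pair
# (a consequence of linearity in the controls), at a fixed time and state.
rng = np.random.default_rng(0)
t, x = 1.0, np.array([0.6, 0.3, 0.1])
for _ in range(100):
    uN_a, uN_b = rng.uniform(0, 1, 2), rng.uniform(0, 1, 2)
    uM_a, uM_b = rng.uniform(0, 1, 1), rng.uniform(0, 1, 1)
    a = rng.uniform()
    lhs = f(t, x, a * uN_a + (1 - a) * uN_b, a * uM_a + (1 - a) * uM_b)
    rhs = a * f(t, x, uN_a, uM_a) + (1 - a) * f(t, x, uN_b, uM_b)
    assert np.allclose(lhs, rhs)
print("convex-combination check passed")
```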