Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 11

Attack Identification and Prevention in Networked Systems

Abstract

An essential requirement for a large-scale networked system to work properly is its safety. In this chapter, we investigate some preliminary issues in prevention and estimation of malicious and organized inputs to a networked system, which include detectability and identifiability of attacks, appropriate sensor distributions, and so on. We establish connections with system observability and with system transmission zeros. It has been made clear that accurate system model is quite important in both risk managements and attack building. We also discuss some recent quantitative methods developed for analyzing risks of a networked system.

Keywords

controllability; least squares estimates; observability; SCADA; security; state estimation; transmission zero

11.1 Introduction

A large-scale networked system usually consists of a great amount of subsystems, and these subsystems are often spatially distributed and far from each other geometrically. This asks for timely data exchanges among subsystems, which can be supported by advanced communication technologies, including both wired and wireless channels. Examples include electric power systems, unmanned vehicle systems, water supply networks, and so on.

Although integrations of several subsystems are greatly expected to increase performances of the whole system, they also provide more opportunities of being affected by malicious operators. There are many real world examples illustrating these possibilities, and in some cases, significant damages have been caused to the physical processes targeted by the attackers [1]. One example is the advanced computer worm Stuxnet, which happened in 2010 and infected some industrial control systems, reportedly leading to damages of approximately 1,000 centrifuges at these plants, although the attack itself was rather naive from a control engineer's point of view [2]. Another example is the Maroochy water breach in 2000 [3]. In this incident, an attacker managed to hack into some controllers that activate and deactivate some valves, which caused flooding of the grounds of a hotel, a park, and a river, with about a million liters of sewage. In 2003, the SQL Slammer worm attacked the Davis–Besse nuclear plant [4]. The recent multiple power blackouts in Brazil are also believed to be caused by malicious attacks [5].

Differently from faults and disturbances in a control system, which happen naturally and occasionally, attacks may happen in a well-organized way, simultaneously or sequentially over several spatial places of a large-scale networked system, with the objective to significantly violate performances of the system, or even to destroy the system itself. In other words, although faults also affect system behaviors, simultaneous events are usually not considered to be colluding. On the contrary, simultaneous attacks are often cooperative. In addition, faults are usually constrained by some physical dynamics and do not have an intent or objective, but attacks are usually not straightforwardly restricted by the dynamics of the plant physical process and/or chemical process, usually have a malicious objective, and often perform in a secrete way.

System safety is in fact not a new topic [6,7]. Particularly, in a power system, a supervisory control and data acquisition system, which is usually abbreviated as SCADA, is well developed and has been widely adopted. This system is originally designed to prevent electricity theft and to monitor abrupt voltage changes. In this system, measurement data of plant static outputs are used to estimate plant states, and the estimates are normally used in an internal control area of a transmission system operator. A basic assumption adopted in SCADA is that the system is in a steady-state behavior within its computation interval, which is usually of tens of seconds. When the plant states have a fast change during some alert or emergency situations, its estimation accuracy deteriorates drastically. Moreover, new power generation sources, such as large-scale wind power penetration, smart transmission devices, and so on have not been taken into account appropriately. In addition, inaccurate estimations about the state of a directly connected subsystem may cause some misleading estimates about system security and lead to a wrong decision of security control. These factors ask investigations on attack preventions using dynamic input–output data.

On the other hand, with the development of communication technologies and computer technologies, it is extensively anticipated that a plant and even subsystems of a plant are connected by some public communication networks, such as the world wide web and so on, to reduce hardware costs and increase maintenance flexibilities. In such a situation, a communication network becomes also an essential part of a networked system, such as that played by a physical plant and a digital feedback controller. When a system is connected by communication channels, many other possibilities arise for the system to be attacked. For example, it becomes easier for a hacker to insert some destructive disturbances in a more secrete way into a signal being transmitted in a communication channel [7–9]. In particular, a cyber-physical system usually suffers from some specific vulnerabilities, which do not exist in classical control systems and for which appropriate detection and identification techniques are required. For instance, the reliance on communication networks and standard communication protocols to transmit measurements and control packets increases the possibility of intentional and worst-case attacks against physical plants and/or the controller. On the other hand, information security methods, such as authentication, access control, message integrity, and so on, may not be adequate for a satisfactory protection of these attacks. Indeed, these security methods are only some passive ways to prevent a signal being disturbed during its transmissions. They do not exploit compatibilities of the received signals with the underlying physical/chemical process of the plant and/or the control mechanism. These characteristics may make them ineffective against an insider attack targeting the physical dynamics and/or the controller.

Generally, the dynamics of a networked system with possible attacks can be described as

$x (k + 1) = f (k, x (k), u (k), d (k)),$

(11.1)

$y (k) = g (k, x (k), u (k), d (k)),$

(11.2)

where $x (k) \in R^{n}$ , $y (k) \in R^{m}$ , $u (k) \in R^{p}$ , and $d (k) \in R^{q}$ respectively represent the plant state vector, output vector, input vector, and the vector consisting of possible attacks. Moreover, $f (k, \cdot)$ and $g (k, \cdot)$ are in general nonlinear functions that may be time varying. For a physical plant to work properly, there are usually some strict restrictions on its states. For example, in a water supply network, the levels of its reservoirs are in general restricted in some suitable ranges. The set consisting of these permissible plant states is called the plant safe region, whereas the complimentary set of the plant states is called the unsafe region. A plant is said to be safe if its state vector belongs to its safe region; otherwise, the plant is said to be unsafe. The objectives of the malicious attackers are often manipulating the plant state vector from its safe region to its unsafe region through adding an attack disturbance vector process $d (k)$ into the system with the smallest efforts, as well as to avoid being detected. In other words, to manipulate the plant from being safe to being unsafe in a theft and economic way. The input signal $u (k)$ is delivered from a controller through a communication network designed to make the plant work satisfactory. Due to the imperfectness of communication channels, this delivery may have some time delays and even failures and be corrupted by environment noises. Without detectors, the plant is usually impossible to distinguish the input vector $u (k)$ from that carrying the attack vector $d (k)$ . In other words, the attack vector $d (k)$ is usually injected into the input vector $u (k)$ possibly in a transmission process, and this injection combines these two vectors into a single vector that is input to the plant. Differently from external disturbances widely studied in control system analysis and synthesis, the attack vector $d (k)$ is usually not random. More precisely, it might be designed with a clear objective using some measurements of the plant output vector $y (k)$ .

When $d (k) \equiv 0$ , the networked system described by Eqs. (11.1) and (11.2) is said to be in its normal situation, and the associated trajectories of its state and output vectors are said to be in their nominal behavior. Otherwise, the system is said to be in an abnormal situation, and the corresponding trajectories of its state and output vectors are said to be in their abnormal behavior. The objective of attack estimation is to clarify whether or not the system is in its normal situation using measurements of its output vector, whereas the objective of attack identification is to estimate the amount and positions of attackers when the system is in an abnormal situation. Generally, these two problems are dealt with through investigating characteristics of the residues of some detectors.

Taking into account that for nonlinear dynamic systems, there still does not exist a general analysis and/or synthesis method, our attention is restricted to linear systems in this chapter, in which both the function $f (k, \cdot)$ of Eq. (11.1) and the function $g (k, \cdot)$ of Eq. (11.2) are linear. This significantly reduces mathematical difficulties in handling the attack estimation and prevention problems, keeping the associated results significant and relevant from an engineering viewpoint. With this simplification, the input vector $u (k)$ can be removed from the model, as its existence does not affect conclusions on either attack identification or attack prevention due to the linearities of the system.

In this chapter, we first discuss attack identification and prevention using static data, which has been extensively adopted in large-scale electric power networks. Afterward, relations between system observability and attack prevention are investigated. With these relations, estimation and identification of attacks are respectively dealt with, as well as the relations between system security and optimal sensor placements. Further research topics will also be briefly discussed.

To clarify basic concepts and ideas, rather than the general model of a networked system given by Eqs. (3.25) and (3.26), which has been discussed in several chapters of this book, in this chapter the state-space model of Eq. (3.1) is utilized.

11.2 The SCADA System

Very possibly, the most successful attack estimator until now is the SCADA system, which is originally designed to protect an electricity network from working in an unsafe region. In this system, the plant is assumed to be in steady state, and its state vector is estimated from its output measurements using a least squares method. Clearly, to let this system work properly, it is necessary that the number of sensors in the plant is not smaller than the dimension of its state vector. Otherwise, the output of the associated attack detector cannot be uniquely determined, which will invalidate its usefulness. Generally, the more the sensors the plant has, the better the detection performance the system has.

When the plant is linear and works in its steady state, and only output measurements are utilized in estimating its state vector, the relations between the plant outputs and states, described by Eq. (11.2), can be expressed in a much simpler way given as follows:

$y (k) = C_{k} x (k) + D_{k} d (k) + w (k) .$

(11.3)

In this equation, a vector $w (k)$ is introduced to represent measurement noises. For simplicity, we assume that its mathematical expectation is constantly equal to zero, whereas its covariance matrix is constantly equal to the identity matrix. As pointed out before, to get a physically meaningful estimate about the plant states, it is necessary that the dimension of the output vector $y (k)$ is not smaller than that of the state vector $x (k)$ . This means that the output matrix $C_{k}$ has more rows than columns.

When no attacks are available, a reasonable estimate about the plant state vector, denoted $\hat{x} (k)$ , which is extensively adopted and widely called a least squares estimate, is

$\hat{x} (k) = {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T} y (k) .$

(11.4)

With this estimate, the predicted plant output $\hat{y} (k)$ and its prediction errors $\tilde{y} (k)$ are respectively

$\hat{y} (k) = C_{k} {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T} y (k),$

(11.5)

$\tilde{y} (k) = y (k) - \hat{y} (k) = [I - C_{k} {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T}] y (k) .$

(11.6)

Obviously, to make this estimate physically meaningful, it is necessary that the matrix $C_{k}$ is of full column rank, which guarantees the existence of the inverse of $C_{k}^{T} C_{k}$ . This condition can be satisfied through an appropriate sensor placement in the plant.

When the measurement error vector $w (k)$ has a normal distribution, we can easily prove that ${\tilde{y}}^{T} (k) \tilde{y} (k)$ has a $X^{2}$ distribution. Hence, an anomaly detector, which is based on a quantity of the residue of plant output measurement, can be defined as

$r (k) = S y (k), S = I - C_{k} {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T},$

(11.7)

which is in fact the SCADA system [6]. Obviously, this detector shares the same form of the output prediction error in the normal situation that is given by $\tilde{y} (k)$ . In this system, $\sqrt{r^{T} (k) r (k)}$ is utilized to decide whether or not there exists an attack in the networked system. When its value is greater than a prescribed threshold, it is believed that an attack exists. Otherwise, there are no attacks in the networked system [6,7].

This anomaly detector is usually efficient in detecting a single attack which caused a significant change in one of the plant output measurements. However, when a coordinated malicious attack exists in the plant and may cause simultaneous change of several plant output measurements, very possibly, this detector does not work quite well. For example, assume that there is a coordinated attack $\bar{d} (k)$ satisfying

$D_{k} \bar{d} (k) = C_{k} \bar{x} (k) .$

(11.8)

This is possible if the attackers appropriately select their attack places and strategies and coordinate their actions. Under such a situation, we have that the output of the anomaly detector satisfies the following equalities:

$\begin{matrix} r (k) & = & S y (k) \\ = & [I - C_{k} {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T}] \times [C_{k} x (k) + D_{k} \bar{d} (k) + w (k)] \\ = & [I - C_{k} {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T}] \times [C_{k} x (k) + w (k)] + [I - C_{k} {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T}] C_{k} \bar{x} (k) \\ = & [I - C_{k} {(C_{k}^{T} C_{k})}^{- 1} C_{k}^{T}] \times [C_{k} x (k) + w (k)], \end{matrix}$

(11.9)

that is, no matter how large the magnitude of an element in the vector $\bar{x} (k)$ is, the output of the anomaly detector, that is, $r (k)$ , does not change. This invalidates its capabilities of detecting the attacks satisfying Eq. (11.8).

Note that when the plant is linear and the input signal $u (t)$ is constantly set to zero, by Eq. (11.1) its state evolutions can be expressed as

$x (k + 1) = A_{k} x (k) + B_{k} d (k) .$

(11.10)

In the plant steady state, we have that $x (k + 1) = x (k)$ . Hence,

$x (k) = {[I - A_{k}]}^{- 1} B_{k} d (k),$

(11.11)

which may be very far from the steady state of the plant without an attack, which is equal to zero by the adopted assumption that $u (k) \equiv 0$ . In fact, when the attack of Eq. (11.8) is applied,

$x (k) = {[I - A_{k}]}^{- 1} B_{k} D_{k}^{†} C_{k} \bar{x} (k),$

(11.12)

where $D_{k}^{†}$ stands for the pseudo-inverse of the matrix $D (k)$ .¹ As each element of the vector $\bar{x} (k)$ can be arbitrarily large in magnitude without being detected by the anomaly detector, this equation implies that an appropriate selection of either the matrix $B_{k}$ or the matrix $D_{k}$ , or both of them, may cause a significant change of the plant states, which are capable of deriving the plant to its unsafe region. On the other hand, through choices of attack positions in a networked system, a selection of either the matrix $B_{k}$ or the matrix $D_{k}$ can be realized in a relatively easy way, provided that the attackers have accurate information on the system model, which is mostly about the system output matrix $C_{k}$ in this case.

These conclusions are valid even if the plant is not in its steady state. In addition, numerical studies show that the attack in the form (11.8), which is extensively called stealthy attack, is usually sparse [6,7].

To analyze the security of a networked system, some security indices have also been introduced. When the plant is also time invariant, the following index $α_{j}$ is suggested in [7] to measure the security degree of the plant jth output measurement:

$α_{j} \overset{def}{=} \min_{x \in R^{n}} {‖ C x ‖}_{0} subject to C (j, :) x \neq 0,$

(11.13)

where to make the expressions more concise, the temporal variable k has been omitted from the system matrices. Moreover, $C (j, :)$ stands for the jth row vector of the output matrix C.

From its definition it is clear that when an attacker is intended to change the plant jth output measurement without being detected by the SCADA system, $α_{j}$ is in fact the minimum number of plant output measurements that he/she needs compromise. The smaller this index, the easier the jth plant output measurement to be attacked by a stealthy attack, that is, an attack that cannot be detected in principle. This means that knowledge about this security index for all the plant output measurements may provide some useful information about the security vulnerabilities of the networked system, which may be helpful in protecting the networked system with restricted resources.

Although the security index is significant from an engineering viewpoint, the associated optimization problem itself cannot be easily solved. In fact, this minimization problem has been proven to be NP-hard, and various modifications have been suggested as an alternative. One example is to replace the vector 0-norm with the vector 1-norm.

11.3 Attack Prevention and System Transmission Zeros

The previous section illustrates briefly the SCADA system, which uses only plant output measurements to detect whether or not there are attacks in the plant. The attack detection method is quite simple, but it reveals almost all essential issues in attack preventions: an attacker usually intends to significantly violate or even destroy a system in a secrete way, whereas a detector should be efficient in revealing all the attacks as soon as possible. This implies that there may exist close relations between attack preventions and system observability. In this section, we clarify these relations under the situation in which both the detector and the attacker have accurate knowledge about the plant dynamics.

As general conclusions are still not available for nonlinear time-varying systems, our attention is also restricted to linear and time-invariant systems in this section. We assume that an attacker is intended to alter the states of a plant through injecting some exogenous inputs on the basis of the system matrices of the plant at each time. In addition, the attacker is assumed to have unrestricted computation capabilities. An attacker with these characteristics is called an omniscient attacker. Under these assumptions, stealthy colluding attacks are closely related to plant transmission zeros.

Assume that the dynamics of a networked system with possible attacks is described as

$x (k + 1) = A x (k) + B d (k),$

(11.14)

$y (k) = C x (k) + D d (k) .$

(11.15)

As in the previous section, here $d (k)$ represents an attack disturbance. Once again, we assume that $x (k) \in R^{n}$ , $y (k) \in R^{m}$ , and $d (k) \in R^{q}$ . Noting that the plant is assumed to be linear, it is not necessary to include control signals in the adopted model.

To simplify discussions, we assume throughout the rest of this chapter that the above state space model is a minimal realization, that is, it is both controllable and observable. Note that if the system is not controllable, then a decomposition can divide its state space into the direct sum of controllable and uncontrollable subspaces, and the system state vector can be manipulated by an attacker only when it is in the controllable subspace. On the other hand, if the system is unobservable, then the system state space can be decomposed into the direct sum of observable and unobservable subspaces. As the part of a state vector belonging to the unobservable subspace does not have any influence on the plant output vector, variations of this part cannot be detected by the plant output vectors. That is, under such a situation, there always exist attacks that cannot be detected and therefore cannot be identified by monitoring only these plant outputs. This observation means that the aforementioned minimal realization assumption is reasonable.

Another assumption adopted in the rest of this chapter is that the matrix $[\begin{matrix} B \\ D \end{matrix}]$ is of full column rank, which is constituted from the system input matrix B and its direct feedthrough matrix D. This is necessary for the identifiability of attacks, noting that if this matrix is not of full column rank, then different attacks may cause the same value of the vector $[\begin{matrix} B \\ D \end{matrix}] d (k)$ . More precisely, for an arbitrary $\bar{d} (k) \in N u l l ([\begin{matrix} B \\ D \end{matrix}])$ , it is obvious from the definition of the null space of a matrix that

$[\begin{matrix} B \\ D \end{matrix}] [d (k) + \bar{d} (k)] = [\begin{matrix} B \\ D \end{matrix}] d (k) .$

In addition, when the matrix $[\begin{matrix} B \\ D \end{matrix}]$ is not of full column rank, $N u l l ([\begin{matrix} B \\ D \end{matrix}])$ has at least one nonzero vector as its element. Under such a situation, it is clear from Eqs. (11.14) and (11.15) that these different attacks lead to the same trajectory of the system state vector and to the same trajectory of the system output vector. Obviously, this is not an appreciative property in attack detection and/or attack identification.

Similarly to those in the SCADA system, an attack detector, which is sometimes also called an attack monitor, exploits only plant dynamics and output measurements to reveal the existence of attacks and identify their positions. Hence, an attack cannot be detected in principle if the associated plant output measurements are consistent with those without any attacks. On the contrary, if an attack causes a plant output measurement that has some characteristics different from those in regular/normal situations, some methods may be developed to detect this attack. Note that the output of a plant depends on both its initial conditions and external inputs. To clarify this dependence, rather than $y (k)$ , $y (k, x (0), d (j) |_{j = 0}^{k})$ is used more often in the following discussions. Here, $x (0)$ belongs to the set $R^{n}$ and represents the value of the plant state vector at the time instant $k = 0$ .

Similarly, $x (k, x (0), d (j) |_{j = 0}^{k})$ is sometimes adopted to indicate the dependence of the state vector on its initial values and on the input vector sequence $d (k) |_{k = 0}^{\infty}$ .

Definition 11.1

(Undetectable attack) Concerning the system described by Eqs. (11.14) and (11.15), an attack $d (k) |_{k = 0}^{\infty}$ is undetectable if for each initial state vector $x (0) \in R^{n}$ , there exists at least one other initial state vector $\bar{x} (0) \in R^{n}$ such that $y (k, \bar{x} (0), d (j) |_{j = 0}^{k}) = y (k, x (0), 0)$ at each $k = 0, 1, 2, \dots$ .

From an engineering point of view, this definition means that a detector cannot be constructed if the plant outputs associated with some attacks are completely the same as those due to modifications of plant initial conditions. One of the basic motivations behind this definition is that plant initial conditions are usually not exactly known in many applications, which makes the mapping from a plant input series to a plant output series not bijective and therefore leaves opportunities for an attacker to inject destructive disturbances.

In addition, to detect whether or not there exists an attack in a networked system, it is usually necessary for a detector to determine where the attack is from if it exists. This leads to an attack identification problem and requires to construct a detector that has capabilities of identifying attack locations. As attacks in a networked system are usually sparse, only a few elements of the attack vector $d (k)$ are usually not constantly equal to zero. Hence, some upper bounds can be put on the number of colluding attackers. Here, we assume that at most K attackers collude in one attack.

Definition 11.2

(Unidentifiable attack) Concerning the system described by Eqs. (11.14) and (11.15), an attack vector sequence $d (k) |_{k = 0}^{\infty}$ with K elements not constantly zero is unidentifiable if for each initial state vector $x (0) \in R^{n}$ , there exist at least one other initial state vector $\bar{x} (0) \in R^{n}$ and one other attack vector sequence $\bar{d} (k) |_{k = 0}^{\infty}$ with $\bar{K}$ elements not constantly zero, where $0 ⩽ \bar{K} ⩽ K$ , such that at every time instant $k = 0, 1, 2, \dots$ , the plant output satisfies $y (k, \bar{x} (0), \bar{d} (j) |_{j = 0}^{k}) = y (k, x (0), d (j) |_{j = 0}^{k})$ .

Similarly to an undetectable attack, an unidentifiable attack cannot be identified because another attack can generate completely the same plant output, which makes plant outputs alone not sufficiently informative in distinguishing these attacks.

Let $K$ denote a set consisting of K nonzero positive integers that belong to the set ${1, 2, \dots, q}$ , where q stands for the dimension of the attack vector $d (k)$ . Associating with this set, define the attack set $A_{K}$ in which an entry of the attack vector $d (k)$ is not constantly set to zero only if it is in the row whose number is one of the elements of the set $K$ . More precisely, assume that $K = {α_{1}, α_{2}, \dots, α_{K}}$ with $α_{i} \in {1, 2, \dots, q}$ for each $i = 1, 2, \dots, q$ . Then,

$A_{K} = {d (k) | d (i, k) \equiv 0, \forall i \in {1, 2, \dots, q} K; d (i, k) ≢ 0, \forall i \in K},$

where $d (i, k)$ stands for the ith row element of the attack vector $d (k)$ .

An attack set is undetectable if there exists at least one undetectable attack in that set. Similarly, an attack set is unidentifiable if there exists at least one unidentifiable attack in that set. As the attack set $A_{K}$ only provides information about the position, in which an attack is injected into the networked system, which is completely the same as that of the set $K$ , sometimes the set $K$ is also called as an attack set. In other words, when an attack from the attack set $A_{K}$ is injected into the networked system described by Eqs. (11.14) and (11.15), $B d (k)$ and $D d (k)$ in these equations can be respectively rewritten as $B_{K} d_{K} (k)$ and $D_{K} d_{K} (k)$ , where $B_{K}$ is constituted from the columns of the matrix B whose numbers are consistent with the elements of the set $K$ , whereas $D_{K}$ is constituted from the columns of the matrix D whose numbers are consistent with the elements of the set $K$ . In addition, $d_{K} (k)$ is a K-dimensional real-valued vector.

In [9], a different attack detection problem is formulated and investigated. It is assumed there that only sensors or actuators of a networked system are attacked, and a characterization is given for the maximum number of attacks that can be detected and corrected, which is expressed as a function of the plant state transition matrix and the plant output matrix. Although this problem formulation appears to be different from that given in Definition 11.1, they are actually closely related to each other [7].

From linearity assumption of the networked system and Definition 11.1 it is obvious that an attack $d (k) |_{k = 0}^{\infty}$ is undetectable if and only if there exists a plant initial state vector $\tilde{x} (0) \in R^{n}$ such that

$y (k, \tilde{x} (0), d (j) |_{j = 0}^{k}) \equiv 0 .$

(11.16)

The latter is closely related to some fundamental properties of a dynamic system. In fact, Eq. (11.16) has completely the same form as that of the conditions adopted in the definition of the zero dynamics of a system. More precisely, under some weak conditions, the equality in this equation can be satisfied if and only if the attack $d (k) |_{k = 0}^{\infty}$ only stimulates the zero dynamics of the networked system [10,11]. Due to these relations, we can establish an algebraic criterion to verify whether or not an attack set is undetectable.

Similarly, we can straightforwardly prove that an attack $d (k) |_{k = 0}^{\infty}$ with K elements that are not constantly zero is unidentifiable if and only if there exist a plant initial state vector $\tilde{x} (0) \in R^{n}$ and an attack $\bar{d} (k) |_{k = 0}^{\infty}$ with $\bar{K}$ elements in $d (k) |_{k = 0}^{\infty}$ that is not constantly zero and the integer $\bar{K}$ satisfying $0 ⩽ \bar{K} ⩽ K$ such that

$y (k, \tilde{x} (0), d (j) - \bar{d} (j) |_{j = 0}^{k}) \equiv 0 .$

(11.17)

Note that Eqs. (11.16) and (11.17) are quite similar in their forms. It is not hard to understand that attack identification is once again closely related to transmission zeros of a networked system, which leads to an easily verifiable algebraic condition for an unidentifiable attack set.

11.3.1 Zero Dynamics and Transmission Zeros

These discussions reveal that both attack detection and attack identification are in fact a verification problem on the existence of an initial state vector and an input sequence that make the plant output vector be constantly equal to zero, which is closely related to zero dynamics and therefore to transmission zeros of a system. To illustrate these relations, the following conclusions are first established for a discrete-time system, which are similar to the results about continuous-time systems given in [10,11].

Lemma 11.1

Let $G (z)$ be the transfer function matrix with minimal realization given by Eqs. (11.14) and (11.15). If the system input vector process $d (k)$ is of the form $d (k) = λ^{k} d (0)$ for each $k ⩾ 0$ and the system initial conditions satisfy $x (0) = {(λ I - A)}^{- 1} B d (0)$ , in which λ is a constant scalar not equal to any eigenvalue of the matrix A, and $d (0)$ is a constant vector with compatible dimension. Then for an arbitrary integer $k ⩾ 0$ , the system output vector process $y (k)$ can be expressed as

$y (k) = λ^{k} G (λ) d (0) .$

(11.18)

Proof

When λ is not an eigenvalue of the matrix A, the matrix $λ I - A$ is invertible, which means that the condition $x (0) = {(λ I - A)}^{- 1} B d (0)$ is well defined. Note that when a system has a minimal realization of Eqs. (11.14) and (11.15), the $Z$ -transformation of its output vector process $y (k)$ can be expressed as

$y (z) = C {(z I - A)}^{- 1} [x (0) + B d (z)] + D d (z) .$

(11.19)

Hence, when $d (k) = λ^{k} d (0)$ for each $k ⩾ 0$ , we have that

$\begin{matrix} y (z) & = & C {(z I - A)}^{- 1} {x (0) + B [{(z - λ)}^{- 1} d (0)]} + D [{(z - λ)}^{- 1} d (0)] \\ = & [C {(λ I - A)}^{- 1} B + D] [{(z - λ)}^{- 1} d (0)] \\ + C {(z I - A)}^{- 1} {x (0) + B [{(z - λ)}^{- 1} d (0)]} \\ - C {(λ I - A)}^{- 1} B [{(z - λ)}^{- 1} d (0)] \\ = & G (λ) [{(z - λ)}^{- 1} d (0)] \\ + C {(z I - A)}^{- 1} {x (0) + [I - (z I - A) {(λ I - A)}^{- 1}] B [{(z - λ)}^{- 1} d (0)]} \\ = & G (λ) [{(z - λ)}^{- 1} d (0)] + C {(z I - A)}^{- 1} {x (0) - {(λ I - A)}^{- 1} B d (0)} . \end{matrix}$

(11.20)

Therefore, the condition $x (0) = {(λ I - A)}^{- 1} B d (0)$ leads to the following equality:

$y (z) = G (λ) [{(z - λ)}^{- 1} d (0)]$

(11.21)

Recall that λ is a constant scalar. The proof can now be completed through taking the inverse $Z$ -transform of the vector-valued function $y (z)$ . □

From Lemma 11.1 it is clear that if $d (0)$ belongs to the null space of $G (λ)$ , then when the system initial state vector is set as $x (0) = {(λ I - A)}^{- 1} B d (0)$ , the plant output vector $y (k)$ is constantly equal to zero for each $k = 0, 1, 2, \dots$ . However, it is worth mentioning that when λ is a real scalar and $d (0)$ is a real-valued vector, both $λ^{k} d (0)$ with any $k ⩾ 0$ and ${(λ I - A)}^{- 1} B d (0)$ are real-valued vectors, which can be realized in principle in actual engineering problems, noting that both the state transition matrix A and the system input matrix B are real valued. However, when either λ is a complex scalar or $d (0)$ is a complex-valued vector, the associated $λ^{k} d (0)$ with $k ⩾ 0$ and/or ${(λ I - A)}^{- 1} B d (0)$ may also be complex valued. This may make the associated values not be realizable by any actual system input vector process and/or any system initial states.

When λ is equal to an eigenvalue of the state transition matrix A, the discussions become more complicated. However, with the results of Theorem 2.5, similar results can still be obtained. On the other hand, using a concept called system matrix, similar results can also be established without distinguishing whether or not λ is an eigenvalue of the state transition matrix A, which are given in the following Lemma 11.2.

To guarantee the existence of a nontrivial input vector process $d (k)$ and a nontrivial initial state vector $x (0)$ such that all the conditions of Lemma 11.1 are satisfied and the plant output vector process is constantly equal to zero, it is necessary that the matrix $G (λ)$ is not of full column rank. When the system is of single input and single output, it is obvious that λ must be one of the zeros of its transfer function, as the minimal realization assumption does not permit the existence of a zero in the transfer function that is equal to one of its poles. The situation becomes much more complicated when the system is of multiple inputs and multiple outputs. In this case, even if its state space model is a minimal realization, there still exists possibility that some of its zeros and some of its poles share the same value. To clarify the mathematical descriptions of a system zero and its relations to the zero dynamics of the system, the following concepts are required.

Definition 11.3

A vector $x (0) \in R^{n}$ is called weakly unobservable if there exists an input vector sequence $d (k) |_{k = 0}^{\infty}$ , such that the corresponding output vector sequence $y (k)$ is constantly equal to zero, that is, $y (k, x (0), d (j) |_{j = 0}^{k - 1}) = 0$ for each $k = 0, 1, 2, \dots$ .

Denote by $V$ the set consisting of all weakly unobservable vectors of a system. From the linearity of the system we can straightforwardly proven that this set is actually a subspace of $R^{n}$ . Due to this reason, the set $V$ is sometimes called the weakly unobservable subspace of the system [11,12]. In fact, if we define a subspace $V_{k}$ with $k = 0, 1, 2, \dots$ , as

$\begin{matrix} V_{k} & = {x (0) | x (0) \in R^{n}, there is an input sequence d (k) |_{k = 0}^{\infty}, \\ such that y (i) = 0 for each i = 0, 1, \dots, k - 1}, \end{matrix}$

then it is obvious from the definition of this subspace sequence that

$V_{0} \supseteq V_{1} \supseteq V_{2} \supseteq V_{3} \supseteq \dots .$

In addition, we can directly prove that there exists a nonnegative integer $0 ⩽ k ⩽ n$ such that

$V = V_{k} = V_{k + i} for all i ⩾ 0 .$

Let $F \in R^{q \times n}$ and $L \in R^{q \times s}$ satisfy

$(A + B F) V \subseteq V, N u l l (C + D F) \supseteq V, S p a n (L) = N u l l (D) ⋂ (B^{†} V) .$

The existence of these matrices is shown in [11]. Then we can prove that, for each $x (0) \in V$ , an input vector sequence $d (k) |_{k = 0}^{\infty}$ that satisfies

$y (k, x (0), d (j) |_{j = 0}^{k - 1}) = 0 for each k = 0, 1, 2, \dots,$

can be parameterized so that, for an arbitrary $k \in {0, 1, 2, \dots}$ ,

$d (k) = F x (k) + L w (k),$

where $w (k)$ is an arbitrary real vector sequence with compatible dimension. Moreover, we can also prove that if $x (0) \in V$ and $d (i) |_{i = 0}^{\infty}$ with $d (i) \in R^{q}$ is an associated input vector sequence such that $y (k, x (0), d (i) |_{i = 0}^{k - 1}) = 0$ for each $k = 1, 2, \dots$ , then the associated state vector sequence $x (k) |_{k = 0}^{\infty}$ , denoted $x (k, x (0), d (i) |_{i = 0}^{k - 1}) |_{k = 0}^{\infty}$ , satisfies $x (k, x (0), d (i) |_{i = 0}^{k - 1}) \in V$ for each $k = 1, 2, \dots$ .

Some other properties of this weakly unobservable subspace can be found, for example, in [11,12].

The concept of zero dynamics of a system is established on the basis of its weakly unobservable subspace.

Definition 11.4

When the initial state vector of the system described by Eqs. (11.14) and (11.15) is restricted to belong to its weakly unobservable subspace $V$ , its associated input–output relation is called its zero dynamics.

The zero dynamics of a system is also connected to its weakly unobservable subspace through its system matrix, a concept suggested originally by Rosenbrock [11,13].

Lemma 11.2

Concerning the system described by Eqs. (11.14) and (11.15), assume that the matrix $[\begin{matrix} B \\ D \end{matrix}]$ is of full column rank. Define its system matrix

$P (z) = [\begin{matrix} z I - A & - B \\ C & D \end{matrix}] .$

(11.22)

Suppose that there exist a vector $x (0) \in R^{n}$ and a vector $d (0) \in R^{q}$ such that at least one of these two vectors is not equal to zero, and at a real value of the complex variable z, denoted λ, the following equality is satisfied:

$P (λ) [\begin{matrix} x (0) \\ d (0) \end{matrix}] = 0 .$

(11.23)

Then for each $k = 1, 2, \dots$ , the input vector sequence $d (k) = λ^{k} d (0)$ satisfies both $y (k, x (0), d (i) |_{i = 0}^{k}) = 0$ and $d (k) \in R^{q}$ .

Proof

Note that the system defined by Eqs. (11.14) and (11.15) is well defined. This means that for a fixed initial state vector $x (0)$ and each input vector sequence $d (k) |_{k = 0}^{\infty}$ , there are only one state vector sequence $x (k) |_{k = 0}^{\infty}$ and only one output vector sequence $y (k) |_{k = 0}^{\infty}$ that satisfy these two equations simultaneously.

When condition (11.23) is satisfied, define $x (k)$ as $x (k) = λ^{k} x (0)$ for every $k = 0, 1, 2, \dots$ . Then, we can straightforwardly prove that

$x (1) = λ x (0) = A x (0) + B d (0) .$

Assume now that, for an integer i with $i ⩾ 0$ , Eq. (11.14) is satisfied by $x (i) = λ^{i} x (0)$ and $d (i) = λ^{i} d (0)$ . Then, we can straightforwardly prove that

$\begin{matrix} x (i + 2) & = λ^{i + 2} x (0) = λ x (i + 1) = λ [A x (i) + B d (i)] = A [λ x (i)] + B [λ d (i)] \\ = A x (i + 1) + B d (i + 1), \end{matrix}$

that is, when $k = i + 1$ , the aforementioned state vector process and input vector process also satisfy Eq. (11.14). Therefore, $x (k) = λ^{k} x (0)$ and $d (k) = λ^{k} d (0)$ satisfy this equation for every $k ⩾ 0$ . Hence, $x (k) = λ^{k} x (0)$ is the solution to this difference equation with its initial state vector $x (0)$ and input vector process $d (k) = λ^{k} d (0)$ .

On the other hand, note that, for each $k = 0, 1, 2, \dots$ ,

$C x (k) + D d (k) = C (λ^{k} x (0)) + D (λ^{k} d (0)) = λ^{k} [C x (0) + D d (0)] = 0,$

provided that the vectors $x (0)$ and $d (0)$ satisfy conditions (11.23). Therefore, $y (0) = 0$ , and for each $k = 1, 2, \dots$ ,

$y (k, x (0), d (i) |_{i = 0}^{k - 1}) = 0 .$

Now, assume that λ takes a complex value. Denote its real and imaginary parts respectively by $λ_{r}$ and $λ_{j}$ . Recall that the matrices A, B, C, and D and the vectors $x (0)$ and $d (0)$ are real valued. Satisfaction of Eq. (11.23) implies that

$[\begin{matrix} λ_{r} I - A \\ C \end{matrix}] x (0) + [\begin{matrix} - B \\ D \end{matrix}] d (0) = 0, λ_{j} x (0) = 0 .$

In addition, note that

$[\begin{matrix} - B \\ D \end{matrix}] = [\begin{matrix} - I & 0 \\ 0 & I \end{matrix}] [\begin{matrix} B \\ D \end{matrix}],$

which means that

$rank ([\begin{matrix} - B \\ D \end{matrix}]) = rank ([\begin{matrix} B \\ D \end{matrix}]) .$

If $λ_{j} \neq 0$ , then it is necessary that $x (0) = 0$ . As the matrix $[\begin{matrix} B \\ D \end{matrix}]$ is assumed to be of full column rank, $x (0) = 0$ and the first subequation in the above equation imply that the vector $d (0)$ is also equal to zero. This is a contradiction with the requirement that at least one of the vectors $x (0)$ and $d (0)$ is not equal to zero. Hence, $λ_{j} = 0$ , that is, λ takes a real value.

These arguments show that if there is λ satisfying Eq. (11.23), then it is necessary that this λ is real valued. Hence, $λ^{k} d (0)$ is real valued for each $k = 0, 1, 2, \dots$ . This completes the proof. □

Clearly, a real vector satisfying Eq. (11.23) belongs to the weakly unobservable subspace $V$ of the system.

From the proof of Lemma 11.2 it is obvious that if both the system initial state vector $x (0)$ and its input vector sequence $d (k) |_{k = 0}^{\infty}$ are allowed to take complex values, then the associated results are valid even when the complex variable z takes a complex value that satisfies Eq. (11.23). In this case, both $x (0)$ and $d (0)$ may be complex vectors, which will lead to a complex-valued sequence $d (k) |_{k = 0}^{\infty}$ that makes the associated system output vector sequence constantly equal to zero.

On the other hand, Lemma 11.2 does not require that λ is different from every eigenvalue of the state transition matrix A, which makes its results more general than those on system zero dynamics derived from Lemma 11.1. In fact, if λ is not an eigenvalue of the state transition matrix A, then Eq. (11.23) straightforwardly leads to $[C {(λ I - A)}^{- 1} B + D] d (0) = 0$ , which can be rewritten as $G (λ) d (0) = 0$ . This is completely in the same form as the condition obtained from Lemma 11.1 for the system output vector sequence constantly equal to zero. These observations imply that when λ is not an eigenvalue of the matrix A, the conditions of Lemma 11.2 are equal to those derived from Lemma 11.1. Hence, we can regard that Lemma 11.2 includes some conclusions of Lemma 11.1 as its particular case.

To guarantee the existence of a nonzero vector $x (0)$ and/or a nonzero vector $d (0)$ such that Eq. (11.23) is satisfied, it is necessary that the matrix $P (λ)$ is not of full column rank. A complex value λ at which the system matrix $P (z)$ defined by Eq. (11.22) is not of full column rank is called a transmission zero of this system, which is an extension of a zero in a transfer function of a single-input single-output system to a zero of the transfer function matrix of a multiple-input multiple-output system [10,13].

Various methods have been developed for the computation of the transmission zeros of a multiple-input multiple-output system, among which one of most widely adopted methods appears to be the approach based on the so-called McMillan–Smith form [10,11]. In addition, the proof of the lemma reveals that to construct an actually realizable system initial state vector and an actually realizable system input vector process such that the system output vector process is constantly equal to zero, real-valued transmission zeros must be considered.

Our discussions show that if a system has a real transmission zero, then there certainly exist a real-valued initial state vector and a real-valued input vector sequence that lead to a constantly zero output vector sequence. However, it is still not clear that if there is a real-valued initial state vector $x (0)$ and a real-valued input vector sequence $d (k) |_{k = 0}^{\infty}$ such that the associated output vector sequence $y (k, x (0), d (j) |_{j = 0}^{k})$ is constantly zero, whether or not the system must have at least one real transmission zero.

To attack this problem, we need the following concept.

Definition 11.5

If for all permissible input vector sequences $u_{1} (k) |_{k = 0}^{\infty}$ and $u_{2} (k) |_{k = 0}^{\infty}$ , the satisfaction by the system output vector sequence $y (k) |_{k = 0}^{\infty}$ of the condition

$y (k, 0, u_{1} (j) |_{j = 0}^{k}) = y (k, 0, u_{2} (j) |_{j = 0}^{k})$

for all $k = 0, 1, 2, \dots$ implies that

$u_{1} (k) = u_{2} (k)$

for all $k = 0, 1, 2, \dots$ , then this system is called left invertible.

When a system is linear, it is obvious from the definition that a necessary and sufficient condition for its left invertibility is that $y (k, 0, u (j) |_{j = 0}^{k}) \equiv 0$ if and only if $u (k) \equiv 0$ . For the linear time-invariant system described by Eqs. (11.14) and (11.15) that has a minimal realization, let $G (z)$ denote its transfer function matrix $C {(z I - A)}^{- 1} B + D$ . It has been proven that the following statements are equivalent to each other [11]:

• This system is left invertible.
• There exists a rational matrix-valued function $G_{l} (z)$ such that $G_{l} (z) G (z) = I$ , that is, the transfer function matrix $G (z)$ has a left inverse.
• The system matrix $P (z)$ defined in Eq. (11.22) has a rank of $n + q$ for all but finitely many $z \in C$ .
• Let $V$ denote the weakly unobservable subspace of the system. Then

$V ⋂ {B \times N u l l (D)} = {0} and [\begin{matrix} B \\ D \end{matrix}] is of full column rank.$

Related to the concept of weak observability, there is also a concept called strong observability.

Definition 11.6

If for every permissible initial state vector $x (0)$ and every feasible input vector sequence $u (k) |_{k = 0}^{\infty}$ , the system output vector sequence $y (k) |_{k = 0}^{\infty}$ satisfies the equality

$y (k, 0, u (j) |_{j = 0}^{k}) = 0$

for all $k = 0, 1, 2, \dots$ only when

$x (0) = 0,$

then this system is called strongly observable.

For a linear time invariant system, the following results have been established in [11].

Lemma 11.3

A system with its state space model described by Eqs. (11.14) and (11.15) is strongly observable, if and only if at every complex value of the complex variable z,

$rank ([\begin{matrix} z I - A & - B \\ C & D \end{matrix}]) = n + rank ([\begin{matrix} B \\ D \end{matrix}]) .$

From Definitions 11.3 and 11.6 it is obvious that a system is strongly observable if and only if its weakly unobservable subspace is constituted only from the zero vector. It has also been proven that, for the linear time-invariant system of Eqs. (11.14) and (11.15), its strong observability is equivalent to that, for each real matrix F with compatible dimension, the matrix pair $(A + B F, C + D F)$ is always observable [11].

Based on these results, the following conclusions are derived, which reveal situations under which the weakly unobservable subspace of a system is not restricted only to the zero vector.

Theorem 11.1

Assume that the system described by Eqs. (11.14) and (11.15) is left invertible. Then there exists at least one nonzero initial state vector $x (0)$ such that there is an input vector sequence $d (k) |_{k = 0}^{\infty}$ that makes the system output vector $y (k, x (0), d (j) |_{j = 0}^{k})$ constantly equal zero if and only if it has a transmission zero.

Proof

Assume that the system has a real transmission zero. Then by Lemma 11.2 there certainly exists a nonzero initial state vector satisfying the requirements.

Now, assume that all the transmission zeros of the system are complex. Let λ be any one of them. Then there exist vectors $x_{⁎} (0)$ and $d_{⁎} (0)$ such that at least one of them is not equal to zero and the following equality is satisfied:

$[\begin{matrix} λ I - A & - B \\ C & D \end{matrix}] [\begin{matrix} x_{⁎} (0) \\ d_{⁎} (0) \end{matrix}] = 0 .$

(11.24)

Recall that all the system parameter matrices A, B, C, and D are real. Taking the conjugates of both sides of this equation, we obtain the following equality:

$\overline{[\begin{matrix} λ I - A & - B \\ C & D \end{matrix}] [\begin{matrix} x_{⁎} (0) \\ d_{⁎} (0) \end{matrix}]} = [\begin{matrix} \bar{λ} I - A & - B \\ C & D \end{matrix}] [\begin{matrix} {\bar{x}}_{⁎} (0) \\ {\bar{d}}_{⁎} (0) \end{matrix}] = 0 .$

(11.25)

As $col {x_{⁎} (0), d_{⁎} (0)}$ does not equal zero, it is obvious that the vector $col {{\bar{x}}_{⁎} (0), {\bar{d}}_{⁎} (0)}$ is not a zero vector either. These imply that ${\bar{λ}}_{0}$ is also a transmission zero of the system.

Define $d_{⁎} (k)$ as $d_{⁎} (k) = λ^{k} d (0)$ , $k = 0, 1, 2, \dots$ . Let $d_{⁎} (z)$ and $d_{⋄} (z)$ denote respectively the $Z$ -transformations of the sequences $d_{⁎} (k) |_{k = 0}^{\infty})$ and ${\bar{d}}_{⁎} (k) |_{k = 0}^{\infty})$ . Moreover, let $y_{⁎} (z)$ and $y_{⋄} (z)$ denote respectively the $Z$ -transformations of the output vector sequences $y (k, x_{⁎} (0), d_{⁎} (j) |_{j = 0}^{k})$ and $y (k, {\bar{x}}_{⁎} (0), {\bar{d}}_{⁎} (j) |_{j = 0}^{k})$ . According to the proof of Lemma 11.2, we have that

$y (k, x_{⁎} (0), d_{⁎} (j) |_{j = 0}^{k - 1}) = 0, y (k, {\bar{x}}_{⁎} (0), {\bar{d}}_{⁎} (j) |_{j = 0}^{k - 1}) = 0 for all k = 0, 1, 2, \dots .$

(11.26)

Define $x (0)$ and $d (k)$ further as

$x (0) = x_{⁎} (0) + {\bar{x}}_{⁎} (0), d (k) = d_{⁎} (k) + {\bar{d}}_{⁎} (k) for all k = 0, 1, 2, \dots .$

Then, both $x (0)$ and $d (k)$ with $k \in {0, 1, 2, \dots}$ are real valued and therefore can be realized in principle.

Let $y (z)$ denote the $Z$ -transformation of the output vector sequence associated with the system initial state vector $x (0)$ and the input vector sequence $d (k) |_{k = 0}^{\infty}$ . Then by Eq. (11.19) we have that

$\begin{matrix} y (z) & = & C {(z I - A)}^{- 1} [x (0) + B d (z)] + D d (z) \\ = & C {(z I - A)}^{- 1} {[x_{⁎} (0) + {\bar{x}}_{⁎} (0)] + B [d_{⁎} (z) + d_{⋄} (z)]} + D [d_{⁎} (z) + d_{⋄} (z)] \\ = & {C {(z I - A)}^{- 1} [x_{⁎} (0) + B d_{⁎} (z)] + D d_{⁎} (z)} \\ + {C {(z I - A)}^{- 1} [x_{⁎} (0) + B d_{⋄} (z)] + D d_{⋄} (z)} \\ = & y_{⁎} (z) + y_{⋄} (z) . \end{matrix}$

(11.27)

We can therefore declare from Eq. (11.26) that, for each $k = 0, 1, 2, \dots$ ,

$y (k, x (0), d (j) |_{j = 0}^{k}) = 0 .$

(11.28)

On the contrary, assume that the system described by Eqs. (11.14) and (11.15) does not have a transmission zero but there is a nonzero initial state vector $x (0)$ such that there exists an input sequence $d (k) |_{k = 0}^{\infty}$ that makes $y (k, x (0), d (j) |_{j = 0}^{k}) = 0$ for every $k = 0, 1, 2, \dots$ . Then, for arbitrary $λ \in C$ , the matrix

$[\begin{matrix} λ I - A & - B \\ C & D \end{matrix}]$

is of full column rank. Hence

$rank ([\begin{matrix} λ I - A & - B \\ C & D \end{matrix}]) = n + q = n + rank ([\begin{matrix} B \\ D \end{matrix}])$

By Lemma 11.3 this system is strongly observable. Hence, if there is an input sequence $d (k) |_{k = 0}^{\infty}$ that makes $y (k, x (0), d (j) |_{j = 0}^{k}) = 0$ for every $k = 0, 1, 2, \dots$ , then $x (0) = 0$ . This contradicts the assumption $x (0) \neq 0$ . Therefore, the system must have some transmission zeros.

This completes the proof. □

Different from Lemma 11.2, the transmission zero in the theorem is not required to be real. On the other hand, both the initial state vector and the input vector sequence are required to be real.

11.3.2 Attack Prevention

From the results in the previous subsection it is clear that a linear time-invariant system is left invertible if and only if its transfer function matrix is of full normal column rank. This observation leads further to the following conclusions.

Corollary 11.1

If an LTI system is not left invertible, then it is certainly not attack detectable/identifiable.

Proof

Assume that a transfer function matrix $G (z)$ is not of FNCR. Then there exists a nonzero vector α such that, for each $λ \in C$ , the following equality is satisfied:

$G (z) α = 0 .$

(11.29)

Let $(A, B, C, D)$ be a real matrix quadruplex that satisfies $C {(z I - A)}^{- 1} B + D = G (z)$ and associates with a state space model having a minimal realization. Take an arbitrary real scalar λ that is not an eigenvalue of the matrix A and denote the associated real and imaginary parts of the vector α by $α_{r}$ and $α_{j}$ , respectively. Then, the Eq. (11.29) means that the following two equalities are satisfied:

$[C {(λ I - A)}^{- 1} B + D] α_{r} = 0, [C {(λ I - A)}^{- 1} B + D] α_{j} = 0 .$

(11.30)

Note that $α \neq 0$ is equivalent to $[\begin{matrix} α_{r} \\ α_{j} \end{matrix}] \neq 0$ , that is, the assumption $α \neq 0$ implies that at least one of $α_{r}$ and $α_{j}$ is not equal to zero.

Assume that $α_{r} \neq 0$ . Then, the first equality of Eq. (11.30) can be equivalently rewritten as

$[\begin{matrix} λ I - A & - B \\ C & D \end{matrix}] [\begin{matrix} {(λ I - A)}^{- 1} B α_{r} \\ α_{r} \end{matrix}] = 0 .$

(11.31)

Note that both vectors ${(λ I - A)}^{- 1} B α_{r}$ and $α_{r}$ are real. Moreover, the vector $[\begin{matrix} {(λ I - A)}^{- 1} B α_{r} \\ α_{r} \end{matrix}]$ is not a zero vector. By Lemma 11.2 there always exist a nonzero $x (0) \in R^{n}$ and an input vector sequence $d (k) \in R^{q}$ , not constantly equal to zero, such that the output vector process of the LTI system, which can be expressed as $y (k, x (0), d (i) |_{i = 0}^{k})$ , is constantly equal to zero for each $k = 0, 1, \dots$ . Hence the system is not attack detectable/identifiable.

When $α_{j} \neq 0$ , the same arguments show that the system is not attack detectable/identifiable.

This completes proof. □

These observations reveal that the problem of attack preventions is not trivial only when the associated system is left invertible.

With these preparations on system zero dynamics, we discuss now conditions respectively for the detectability and identifiability of attacks in a networked system.

Theorem 11.2

Concerning the system described by Eqs. (11.14) and (11.15), an attack set $K$ is undetectable if and only if there exists a number λ such that the system matrix

$P_{d} (λ) = [\begin{matrix} λ I - A & - B_{K} \\ C & D_{K} \end{matrix}]$

(11.32)

is not of full column rank.

Proof

Assume that the matrix $P_{d} (λ)$ is not of full column rank at a particular real value of the complex variable λ, denoted $λ_{0}$ . Then, there exist vectors $x (0)$ and $d (0)$ such that at least one of these two vectors is not a zero vector and

$(λ_{0} I - A) x (0) - B_{K} d (0) = 0,$

(11.33)

$C x (0) + D_{K} d (0) = 0 .$

(11.34)

As all the involved matrices, that is, the state transition matrix A, the input matrix $B_{K}$ , the output matrix C, and the direct feedthrough matrix $D_{K}$ , and the scalar $λ_{0}$ are real valued, we can assume, without any loss of generality, that both vectors $x (0)$ and $d (0)$ are real valued. In fact, if they are not real valued, then denote their real and imaginary parts respectively by $x_{r} (0)$ , $d_{r} (0)$ , $x_{j} (0)$ , and $d_{j} (0)$ . Then Eqs. (11.33) and (11.34) imply that

$(λ_{0} I - A) x_{r} (0) - B_{K} d_{r} (0) = 0, (λ_{0} I - A) x_{j} (0) - B_{K} d_{j} (0) = 0,$

(11.35)

$C x_{r} (0) + D_{K} d_{r} (0) = 0, C x_{j} (0) + D_{K} d_{j} (0) = 0 .$

(11.36)

Note that by their definitions all the vectors $x_{r} (0)$ , $d_{r} (0)$ , $x_{j} (0)$ , and $d_{j} (0)$ are real valued. Moreover, at least one of them is not equal to zero. Reasonability of the adopted assumption immediately follows.

We now can declare the undetectability of the attack set $K$ through a direct application of Lemma 11.2.

Assume now that the system matrix $P_{d} (λ)$ is not of full column rank at a particular complex value $λ_{0}$ . If the system is not left invertible, then according to the aforementioned arguments, we can declare that the system is not detectable of the attack set $K$ .

Now, assume that the system is left invertible. Then from Theorem 11.1 we can declare that the attack set $K$ is not detectable.

Assume that the attack set $K$ is detectable. Then, by Corollary 11.1 its transfer function matrix must be of full normal column rank. From this observation and the definition of attack detectability, this system cannot have any transmission zero.

This completes proof. □

From Eqs. (11.33) and (11.34) we can also see that for the existence of an undetectable attack, it is necessary that the cardinality of the attack set is sufficiently large [7]. On the other hand, it is possible to improve attack detectability of a networked system through introducing state feedback into the system. The motivations are that a state feedback can modify transmission zeros of a system and the feedback gain can be designed with objectives that are unknown to an attacker.

Similarly, we have the following criterion for attack identifiability of the networked system described by Eqs. (11.14) and (11.15).

Theorem 11.3

Concerning the system described by Eqs. (11.14) and (11.15), an attack set $K$ is unidentifiable, if and only if there exist an attack set $R$ with its cardinality not greater than that of the attack set $K$ and a number λ such that the matrix

$P_{i} (λ) = [\begin{matrix} λ I - A & - B_{K} & - B_{R} \\ C & D_{K} & D_{R} \end{matrix}]$

(11.37)

is not of full column rank.

Proof

Assume that at a particular real value of the complex variable λ, denoted $λ_{0}$ , the matrix-valued function $P_{i} (λ)$ is not of full column rank. Then there exist vectors $x (0)$ , $d_{K} (0)$ , and $d_{R} (0)$ such that at least one of these three vectors is not a zero vector, and

$(λ_{0} I - A) x (0) - B_{K} d_{K} (0) - B_{R} d_{R} (0) = 0,$

(11.38)

$C x (0) + D_{K} d_{K} (0) + D_{R} d_{R} (0) = 0 .$

(11.39)

Using the same arguments as in the proof of Theorem 11.2, we can assume, without loss of any generality, that the vectors $x (0)$ , $d_{K} (0)$ , and $d_{R} (0)$ are real valued. Then by Lemma 11.2 there exist an initial state vector $x (0) \in R^{n}$ and an input vector sequence $u (k) \in R^{K + R}$ , $k = 0, 1, \dots$ , such that the output vector sequence $y (k)$ of the following system is constantly equal to zero.

$x (k + 1) = A x (k) + [B_{K} B_{R}] u (k),$

(11.40)

$y (k) = C x (k) + [D_{K} D_{R}] u (k) .$

(11.41)

Moreover, at least either the initial state vector $x (0)$ is not a zero vector, or the input vector sequence $u (k) |_{k = 0}^{\infty}$ is not constantly equal to zero. Note that both matrices $[\begin{matrix} B_{K} \\ D_{K} \end{matrix}]$ and $[\begin{matrix} B_{R} \\ D_{R} \end{matrix}]$ are of full column rank. Direct algebraic manipulations show that, for this plant initial state vector $\tilde{x} (0) \in R^{n}$ , there exist an attack $d (k) |_{k = 0}^{\infty}$ with K elements in $d (k) |_{k = 0}^{\infty}$ not constantly zero, an attack $\bar{d} (k) |_{k = 0}^{\infty}$ with $\bar{K}$ elements in $d (k) |_{k = 0}^{\infty}$ not constantly zero, and the integer $\bar{K}$ satisfying $0 ⩽ \bar{K} ⩽ K$ such that

$y (k, \tilde{x} (0), d (j) - \bar{d} (j) |_{j = 0}^{k}) \equiv 0, | | d (k) |_{k = 0}^{\infty} - \bar{d} (k) |_{k = 0}^{\infty} | | \neq 0,$

that is, the attack set $K$ is not identifiable.

By the same token as that adopted in the proof of Theorem 11.2, we can prove that even if the aforementioned $λ_{0}$ takes a particular complex value, the system is also unidentifiable with respect to the attack set $K$ .

This completes the proof. □

On the basis of geometric linear system theories, some graph theoretic conditions have also been established for the existence of an undetectable attack set in a networked system and for the existence of an unidentifiable attack set in a networked system. These conditions depend only on system structures and are valid for almost all system parameters when the networked system has a compatible structure. A system property with these characteristics is extensively called generic and has the advantage of strong robustness against parametric modeling errors [14]. When the initial state vector of a networked system is known, it is proven that if the attack-state-output graph is sufficiently connected, then undetectable attacks do not exist in almost all structurally compatible networked systems. When the initial state vector is not known for a networked system, the criterion becomes more complicated, and left invertibility of the system may be required. Detailed discussions can be found, for example, in [7,8].

11.4 Detection of Attacks

In the above section, we developed some criteria to verify whether or not an attack can be detected or identified. These criteria are closely related to the transmission zeros of a networked system and can be verified in principle. When the scale of the system is large, however, computations of its transmission zeros may be computationally prohibitive and/or numerically unstable. This means that further efforts are required to develop a computationally attractive method for verifying attack detectability and identifiability. To achieve this objective, the methods given in Chapter 3 may be helpful, in which structure information is explicitly and efficiently utilized in the verification of controllability and observability of a large-scale networked system, noting that both the system matrix $P_{d} (λ)$ of Eq. (11.32) and the system matrix $P_{i} (λ)$ of Eq. (11.37) have a similar form of the matrix-valued polynomial $M (λ)$ of Eq. (3.29). In this section, we investigate how to detect an attack using a centralized observer.

The design of an attack detector consists of a discrete-time modified Luenberger observer, which is sometimes also called a residual filter, with its input being the plant output measurements $y (k) |_{k = 0}^{\infty}$ and its output being a residual signal $r (k) |_{k = 0}^{\infty}$ :

$z (k + 1) = A z (k) - L [y (k) - C z (k)],$

(11.42)

$r (k) = C z (k) - y (k),$

(11.43)

where the matrix L is usually called the gain of the observer, which is selected to make the matrix $A + L C$ stable, that is, all its eigenvalues have a magnitude smaller than 1. When the matrix pair $(A, C)$ is observable, the existence of a desirable gain matrix L is always guaranteed. The residue signal $r (k)$ is used to detect the existence of an attack. Ideally, this signal is constantly equal to zero if and only if there do not exist any attacks in the networked system. As the initial state vector of the observer also affects the residual signal, these expectations cannot be satisfied in general, and some modifications are required to develop a practically meaningful criterion to check the existence of an attack. However, we have the following theoretical results on the aforementioned residual filter.

Theorem 11.4

Concerning the system described by Eqs. (11.14) and (11.15), assume that an attack set $K$ is detectable and the initial state vector of the system, that is, $x (0)$ , is known. Assume also that the initial state vector of the residual filter described by Eqs. (11.42) and (11.43), that is, $z (0)$ , is set to $z (0) = x (0)$ , and the gain matrix L is selected such that the matrix $A + L C$ is stable. Then, the residual signal of the filter is constantly equal to zero if and only if the attack disturbance is constant equal to zero.

Proof

Define the vector-valued function $e (k) = z (k) - x (k)$ . Then from Eqs. (11.14), (11.15), (11.42), and (11.43) we can straightforwardly prove that

$\begin{matrix} e (k + 1) & = & z (k + 1) - x (k + 1) \\ = & {A z (k) - L [C x (k) + D_{K} d_{K} (k) - C z (k)]} - {A x (k) + B_{K} d_{K} (k)} \\ = & (A + L C) e (k) - (L D_{K} + B_{K}) d_{K} (k), \end{matrix}$

(11.44)

$\begin{matrix} r (k) & = & C z (k) - [C x (k) + D_{K} d_{K} (k)] \\ = & C e (k) - D_{K} d_{K} (k) . \end{matrix}$

(11.45)

When the initial state vector of the detector is set to be the same value of the initial state vector of the system, that is, $z (0) = x (0)$ , we have that $e (0) = 0$ . Hence, when the matrix $A + L C$ has all its eigenvalues being smaller than 1 in magnitude and the attack disturbances do not exist, that is, $d_{K} (k) \equiv 0$ , it is obvious from Eq. (11.44) that the error vector sequence $e (k)$ with $k = 0, 1, 2, \dots$ is also constantly equal to zero. This further implies that the output vector of the detector, that is, the residual vector $r (k)$ is also constantly equal to zero when the temporal variable k takes any value from the set ${0, 1, 2, \dots}$ .

Now assume that there exists a nonzero attack vector sequence $d_{K} (k) |_{k = 0}^{\infty}$ such that the output of the attack detector described by Eqs. (11.42) and (11.43) is constantly equal to zero. Then by Theorem 11.4 there must exist at least one $λ \in C$ , one vector $\bar{e} (0)$ , and one vector ${\bar{d}}_{K} (0)$ such that at least one of these two vectors is not equal to zero and

$[\begin{matrix} λ I - (A + L C) & L D_{K} + B_{K} \\ C & - D_{K} \end{matrix}] [\begin{matrix} \bar{e} (0) \\ {\bar{d}}_{K} (0) \end{matrix}] = 0 .$

(11.46)

Note that

$[\begin{matrix} λ I - (A + L C) & L D_{K} + B_{K} \\ C & - D_{K} \end{matrix}] = [\begin{matrix} I & - L \\ 0 & I \end{matrix}] [\begin{matrix} λ I - A & B_{K} \\ C & - D_{K} \end{matrix}] .$

Note also that the matrix $[\begin{matrix} I & - L \\ 0 & I \end{matrix}]$ is invertible for every real-valued matrix L with compatible dimension. It is obvious that the satisfaction of Eq. (11.46) is equivalent to the satisfaction of the equation

$[\begin{matrix} λ I - A & B_{K} \\ C & - D_{K} \end{matrix}] [\begin{matrix} \bar{e} (0) \\ {\bar{d}}_{K} (0) \end{matrix}] = 0,$

(11.47)

which contradicts the assumption that the attack set $K$ is detectable. This completes the proof. □

Although the results of Theorem 11.4 are promising, the system initial state vector $x (0)$ is usually not known exactly. Under such a situation, the condition $z (0) = x (0)$ can generally not be satisfied properly for the initial state vector of the residual filter, and $z (0)$ may be selected with some arbitrariness. This means that even if there do not exist attacks in the networked system, the residual signal $r (k)$ may not be constantly equal to zero. However, it can be proven that when the networked system has not been attacked, the residual signal $r (t)$ asymptotically converges to zero with the increment of the temporal variable k. On the other hand, measurement errors and external random disturbances are usually unavoidable in actual networked systems, which means that even if the conditions of Theorem 11.4 are perfectly satisfied, the residual signal $r (k)$ may not be constantly equal to zero and some statistical hypothesis verification methodologies appear necessary for the determination of whether or not there are attacks in the networked system. In addition to these, parametric uncertainties and unmodeled dynamics usually exist in the model of Eqs. (11.14) and (11.15). Hence, in actual implementations of attack detections, in addition to construct a stable residual filter, the gain matrix L should be selected to reduce sensitivities of the residual signal $r (k)$ to modeling errors while keeping it satisfactorily sensitive to attacks. Furthermore, for a large-scale networked system, it is usually more attractive to implement an attack detector in a distributed way using only local subsystem output measurements. Detailed discussions on these issues are given in various literature, for example, [9], [8], and [7].

11.5 Identification of Attacks

In actual applications, it is usually not only necessary to know whether or not there exists an attack in a networked system, but also essential to know where the attack is from when it exists. Differently from attack detection, attack identification is computationally much more difficult due to its combinatorial characteristics, although their essential treatments are the same. In fact, it has been proven in [8] that attack identifications are generally NP hard with respect to the number of colluding attackers. More precisely, the identification of an attack from the attack set $K$ usually requires a combinatorial procedure, noting that even when the number q of attackers is known, the actual attack is only one of the $(\begin{matrix} q \\ K \end{matrix})$ possible attacks. To identify this attack, each possibility must be considered, and for each possible attack, a residual filter needs be designed. When the attack set is identifiable, there exists one and only one residual filter whose outputs are constantly equal to zero. It is this residual filter that indicates the actual attack.

Unlike the residual filter in attack detection, the model of Eqs. (11.14) and (11.15) is not straightforwardly utilized in constructing the residual filters in attack identification. Usually, the design of these residual filters consists of three steps. First, a transformation is applied to the plant output vector to construct a vector without any attacks. Second, a state transformation is performed to divide the plant state vector into two parts, one affected by attack disturbances and the other is free from attack disturbances. The third step is to construct a residual filter with the transformed plant output vector and the transformed plant state vector that are not affected by attack disturbances, which is completely the same as that in the construction procedure for attack detections.

Now, we discuss the first step, in which the attack identification problem for the networked system described by Eqs. (11.14) and (11.15) is converted to that of a modified system in which the associated system output vector is not attacked.

For this purpose, let $D_{K}^{†}$ denote the pseudo-inverse of the matrix $D_{K}$ . Define an auxiliary networked system as

$x (k + 1) = [A - B_{K} D_{K}^{†} C] x (k) + B_{K} [I - D_{K}^{†} D_{K}] d (k),$

(11.48)

$y (k) = [I - D_{K} D_{K}^{†}] C x (k) .$

(11.49)

Then, we have the following conclusions.

Lemma 11.4

Let the networked system be described by Eqs. (11.14) and (11.15). The attack set $K$ is identifiable if and only if this attack set is identifiable for the networked system described by Eqs. (11.48) and (11.49).

Proof

According to Theorem 11.3, if the attack set $K$ is identifiable for the networked system described by Eqs. (11.14) and (11.15), then for each attack set $R$ with its cardinality not greater than that of the attack set $K$ , the following system matrix $P_{i} (λ)$ is of full column rank at each complex number λ:

$P_{i} (λ) = [\begin{matrix} λ I - A & - B_{K} & - B_{R} \\ C & D_{K} & D_{R} \end{matrix}] .$

(11.50)

It is equivalent to that the matrix ${\bar{P}}_{i} (λ)$ defined as

${\bar{P}}_{i} (λ) = [\begin{matrix} λ I - A & - B_{K} & - B_{R} \\ C & D_{K} & D_{R} \\ C & D_{K} & D_{R} \end{matrix}]$

(11.51)

is of full column rank at each complex number λ.

Multiplying both sides of the above equation from the left by $diag {I, D_{K} D_{K}^{†}, I - D_{K} D_{K}^{†}}$ , we have that

$[\begin{matrix} I \\ D_{K} D_{K}^{†} \\ I - D_{K} D_{K}^{†} \end{matrix}] {\bar{P}}_{i} (λ) = [\begin{matrix} λ I - A & - B_{K} & - B_{R} \\ D_{K} D_{K}^{†} C & D_{K} & D_{K} D_{K}^{†} D_{R} \\ [I - D_{K} D_{K}^{†}] C & 0 & [I - D_{K} D_{K}^{†}] D_{R} \end{matrix}] .$

(11.52)

Note that

$\begin{matrix} [\begin{matrix} I & B_{K} D_{K}^{†} & 0 \\ 0 & 0 & I \\ 0 & I & 0 \end{matrix}] [\begin{matrix} λ I - A & - B_{K} & - B_{R} \\ D_{K} D_{K}^{†} C & D_{K} & D_{K} D_{K}^{†} D_{R} \\ [I - D_{K} D_{K}^{†}] C & 0 & [I - D_{K} D_{K}^{†}] D_{R} \end{matrix}] \\ = & [\begin{matrix} λ I - A + B_{K} D_{K}^{†} C & - B_{K} (I - D_{K}^{†} D_{K}) & - B_{R} + B_{K} D_{K}^{†} D_{R} \\ [I - D_{K} D_{K}^{†}] C & 0 & [I - D_{K} D_{K}^{†}] D_{R} \\ D_{K} D_{K}^{†} C & D_{K} & D_{K} D_{K}^{†} D_{R} \end{matrix}] . \end{matrix}$

(11.53)

Moreover, the matrix

$[\begin{matrix} I & B_{K} D_{K}^{†} & 0 \\ 0 & 0 & I \\ 0 & I & 0 \end{matrix}]$

is always invertible, and it is obvious that the above equations mean that the matrix-valued function

$[\begin{matrix} λ I - A + B_{K} D_{K}^{†} C & - B_{K} (I - D_{K}^{†} D_{K}) & - B_{R} + B_{K} D_{K}^{†} D_{R} \\ [I - D_{K} D_{K}^{†}] C & 0 & [I - D_{K} D_{K}^{†}] D_{R} \end{matrix}]$

is also of full column rank at each complex λ. It can therefore be concluded from Theorem 11.3 that the attack set $K$ is also identifiable for the networked system described by Eqs. (11.48) and (11.49).

On the contrary, if the attack set $K$ is identifiable for the networked system described by Eqs. (11.48) and (11.49), then similar arguments show that this attack set is also identifiable for the networked system described by Eqs. (11.14) and (11.15).

This completes the proof. □

To construct a residual signal that is able to identify an attack, the state space model of Eqs. (11.48) and (11.49) is further transformed. To develop this transformation, however, some concepts and results are needed from geometric control theory, which can be found, for example, in [11,12].

Definition 11.7

Assume that the dynamics of a discrete-time and linear time-invariant system can be described by the following state space model:

$\begin{matrix} x (k + 1) = A x (k) + B u (k), \\ y (k) = C x (k), \end{matrix}$

where $x (k) \in R^{n}$ , $y (k) \in R^{p}$ , and $u (k) \in R^{q}$ denote respectively the state vector, input vector, and output vector of the system. Moreover, assume that $S$ is a subspace of $R^{n}$ . If

$A (S ⋂ N u l l (C)) \subseteq S,$

then this subspace is said to be $(A, N u l l (C))$ -conditioned invariant.

A conditioned invariant subspace is closely related to estimator designs. A well-known result is that the smallest $(A, N u l l (C))$ -conditioned invariant subspace that contains the subspace $S p a n (B)$ as its subset is the largest subspace in the state space $R^{n}$ of the system, which can be estimated in the presence of an unknown input sequence $u (k) |_{k = 0}^{\infty}$ . This conclusion is obviously of great significance in engineering, as it clarifies situations for the existence of a state estimator with unknown inputs. Another well-known result is that the null space of the observability matrix, that is,

$N u l l ({[C^{T} A^{T} C^{T} {(A^{T})}^{2} C^{T} \dots {(A^{T})}^{n - 1} C^{T}]}^{T}),$

which gives all the initial state vectors that result in a zero system output when there do not exist any external inputs are an $(A, N u l l (C))$ -conditioned invariant subspace.

It has been proven that for each $(A, N u l l (C))$ -conditioned invariant subspace $S$ , there exists at least one real matrix L such that

$(A + L C) S \subseteq S,$

that is, an output injection matrix that renders this subspace invariant.

To apply these concepts and results to the system described by Eqs. (11.48) and (11.49), let ${\bar{S}}_{K}$ denote the smallest $(A - B_{K} D_{K}^{†} C, N u l l ([I - D_{K} D_{K}^{†}] C))$ -conditioned invariant subspace that contains $S p a n (B_{K} [I - D_{K}^{†} D_{K}])$ as its subspace. Moreover, let L be a matrix that satisfies

${\bar{S}}_{K} \supseteq (A - B_{K} D_{K}^{†} C + L [I - D_{K} D_{K}^{†}] C) {\bar{S}}_{K}$

Existence of a desirable matrix L is guaranteed by the properties of the subspace ${\bar{S}}_{K}$ . Furthermore, let T be a nonsingular square matrix satisfying

$T [A - B_{K} D_{K}^{†} C + L C] T^{- 1} = [\begin{matrix} {\bar{A}}_{11} & {\bar{A}}_{12} \\ 0 & {\bar{A}}_{22} \end{matrix}], T B_{K} [I - D_{K}^{†} D_{K}] = [\begin{matrix} {\bar{B}}_{K} \\ 0 \end{matrix}],$

(11.54)

$[I - D_{K} D_{K}^{†}] C T^{- 1} = [{\bar{C}}_{1} {\bar{C}}_{2}]$

(11.55)

where ${\bar{B}}_{K}$ is a matrix of full row rank. We can prove using geometric control theories that a desirable matrix T always exists [8,11,12]. As a matter of fact, the matrix T can be constituted from a basis of the subspace ${\bar{S}}_{K}$ and a basis of the quotient subspace $R^{n} {\bar{S}}_{K}$ [11,12].

With the aforementioned matrix transformation, we can establish the following conclusions.

Lemma 11.5

Let the networked system be described by Eqs. (11.14) and (11.15). The attack set $K$ is identifiable if and only if this attack set is identifiable for the following networked system:

$[\begin{matrix} x_{1} (k + 1) \\ x_{2} (k + 1) \end{matrix}] = [\begin{matrix} {\bar{A}}_{11} & {\bar{A}}_{12} \\ 0 & {\bar{A}}_{22} \end{matrix}] [\begin{matrix} x_{1} (k) \\ x_{2} (k) \end{matrix}] + [\begin{matrix} {\bar{B}}_{K} \\ 0 \end{matrix}] d (k),$

(11.56)

$y (k) = [{\bar{C}}_{1} {\bar{C}}_{2}] [\begin{matrix} x_{1} (k) \\ x_{2} (k) \end{matrix}] .$

(11.57)

Proof

Assume that the networked system described by Eqs. (11.56) and (11.57) is attack identifiable for the attack set $K$ . Then, for each complex number λ and each attack set $R$ with its attacker number not greater than k, if

$[\begin{matrix} λ I - {\bar{A}}_{11} & - {\bar{A}}_{12} & {\bar{B}}_{K} & - B_{R 1} \\ 0 & λ I - {\bar{A}}_{22} & 0 & - B_{R 2} \\ {\bar{C}}_{1} & {\bar{C}}_{2} & 0 & D_{R} \end{matrix}] [\begin{matrix} x_{1} (0) \\ x_{2} (0) \\ g_{K} \\ g_{R} \end{matrix}] = 0,$

(11.58)

then it is necessary that all the vectors $x_{1} (0)$ , $x_{2} (0)$ , $g_{K}$ , and $g_{R}$ are zero vectors, that is,

${[x_{1}^{T} (0) x_{2}^{T} (0) g_{K}^{T} g_{R}^{T}]}^{T} = 0 .$

(11.59)

Note that

$\begin{matrix} [\begin{matrix} {\bar{A}}_{11} & {\bar{A}}_{12} & {\bar{B}}_{K} \\ 0 & {\bar{A}}_{22} & 0 \\ {\bar{C}}_{1} & {\bar{C}}_{2} & 0 \end{matrix}] & = [\begin{matrix} T [A - B_{K} D_{K}^{†} C + L C] T^{- 1} & T B_{K} [I - D_{K}^{†} D_{K}] \\ [I - D_{K} D_{K}^{†}] C T^{- 1} & 0 \end{matrix}] \\ = [\begin{matrix} T & 0 \\ 0 & I \end{matrix}] [\begin{matrix} A - B_{K} D_{K}^{†} C + L C & B_{K} [I - D_{K}^{†} D_{K}] \\ [I - D_{K} D_{K}^{†}] C & 0 \end{matrix}] [\begin{matrix} T^{- 1} & 0 \\ 0 & I \end{matrix}] . \end{matrix}$

(11.60)

We therefore have that

$\begin{matrix} [\begin{matrix} λ I - {\bar{A}}_{11} & - {\bar{A}}_{12} & {\bar{B}}_{K} & - B_{R 1} \\ 0 & λ I - {\bar{A}}_{22} & 0 & - B_{R 2} \\ {\bar{C}}_{1} & {\bar{C}}_{2} & 0 & D_{R} \end{matrix}] = [\begin{matrix} T & 0 \\ 0 & I \end{matrix}] \\ \times [\begin{matrix} λ I - [A - B_{K} D_{K}^{†} C + L C] & - B_{K} [I - D_{K}^{†} D_{K}] & - T^{- 1} [\begin{matrix} B_{R 1} \\ B_{R 2} \end{matrix}] \\ - [I - D_{K} D_{K}^{†}] C & 0 & D_{R} \end{matrix}] [\begin{matrix} T^{- 1} & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{matrix}] . \end{matrix}$

(11.61)

As the matrix T is regular by its definition, it is obvious that both two matrices

$[\begin{matrix} T & 0 \\ 0 & I \end{matrix}], [\begin{matrix} T^{- 1} & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{matrix}]$

are well defined and are of full rank. We can therefore claim from Eqs. (11.58) and (11.61) that the matrix

$[\begin{matrix} λ I - [A - B_{K} D_{K}^{†} C + L C] & - B_{K} [I - D_{K}^{†} D_{K}] & B_{R} \\ - [I - D_{K} D_{K}^{†}] C & 0 & D_{R} \end{matrix}]$

is also of full column rank for each complex number λ and each attack set $R$ with its attacker number not greater than that of the attack set $K$ . Hence, from Lemma 11.4 and Theorem 11.3 we can declare attack identifiability for the attack set $K$ in the networked system described by Eqs. (11.14) and (11.15).

Similar arguments show that if the attack set $K$ is identifiable for the networked system described by Eqs. (11.14) and (11.15), then it is also identifiable for the networked system described by Eqs. (11.56) and (11.57).

This completes the proof. □

Note that in the state space model of Eqs. (11.56) and (11.57), the state vector $x_{2} (k)$ is completely isolated from any attack disturbances. Using this property, a residual filter can be constructed as

$w (k + 1) = [{\bar{A}}_{22} - \bar{L} (I - {\bar{C}}_{1} {\bar{C}}_{1}^{†})] w (k) - \bar{L} \bar{y} (k),$

(11.62)

$\bar{y} (k) = [I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}] y (k),$

(11.63)

$r_{K} (k) = [I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}] {\bar{C}}_{2} w (k) - \bar{y} (k) .$

(11.64)

This residual filter is very similar to that for attack detections given in the previous section, except that the output vector $y (k)$ is replaced by a modified output vector $\bar{y} (k)$ . Obviously, this modification is removing ${\bar{C}}_{1} x_{1} (k)$ from the output vector $y (k)$ with the purpose of constructing an output vector that is not affected by any attack disturbances, noting that the substate vector $x_{2} (k)$ is independent of the attack disturbance vector $d (k)$ and

$\begin{matrix} \bar{y} (k) = [I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}] y (k) & = & [I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}] [{\bar{C}}_{1} {\bar{C}}_{2}] [\begin{matrix} x_{1} (k) \\ x_{2} (k) \end{matrix}] \\ = & [I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}] {\bar{C}}_{2} x_{2} (k) . \end{matrix}$

With this residual filter, we have the following results.

Theorem 11.5

Let the networked system be described by Eqs. (11.14) and (11.15). Assume that the attack set $K$ is identifiable. Moreover, assume that the initial state vector $x (0)$ of the networked system is known. Furthermore, assume that the initial state vector of the residual filter described by Eqs. (11.62)–(11.64) is set as $w (0) = x_{2} (0)$ and that the gain matrix $\bar{L}$ is selected such that the matrix ${\bar{A}}_{22} - \bar{L} (I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}) {\bar{C}}_{2}$ is stable. Then the residual signal $r (k)$ of the above filter is constantly equal to zero if and only if both matrices $B_{K}$ and $D_{K}$ coincide with those of the actual colluding attackers.

Proof

Using the state vector $w (k)$ of the attack identifier described by Eqs. (11.62)–(11.64), define the new state vector $\bar{w} (k)$ evolving as

$\bar{w} (k + 1) = {\bar{A}}_{11} w (k) + {\bar{A}}_{12} \bar{w} (k) .$

(11.65)

Moreover, define the error vector $e (k) = col {e_{1} (k), e_{2} (k)}$ , in which two suberror vectors $e_{1} (k)$ and $e_{2} (k)$ are respectively defined as $e_{1} (k) = \bar{w} (k) - x_{1} (k)$ and $e_{2} (k) = w (k) - x_{2} (k)$ . Then on the basis of Eqs. (11.14), (11.15), (11.62)–(11.64), and (11.65), straightforward algebraic manipulations show that

$e (k + 1) = [\begin{matrix} {\bar{A}}_{11} & {\bar{A}}_{12} \\ 0 & {\bar{A}}_{22} - \bar{L} (I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}) {\bar{C}}_{2} \end{matrix}] e (k) - [\begin{matrix} {\bar{B}}_{K} \\ 0 \end{matrix}] d (k),$

(11.66)

$r_{K} (k) = [0 (I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}) {\bar{C}}_{2}] e (k) .$

(11.67)

When $w (0)$ is set to be equal to $x_{2} (0)$ , the definition of the suberror vector $e_{2} (k)$ implies that $e (0) = 0$ . As the residual signal $r_{K} (k)$ is not directly affected by the attack disturbances $d (k)$ , and the matrix $\bar{L}$ is selected to assure that the magnitude of each eigenvalue of the matrix ${\bar{A}}_{22} - \bar{L} (I - {\bar{C}}_{1} {\bar{C}}_{1}^{†}) {\bar{C}}_{2}$ is smaller than 1, this residual signal $r_{K} (k)$ is constantly equal to zero, provided that the attack set is $K$ . This completes the proof. □

In case that there exist modeling errors in the adopted state space model, and/or there are random disturbances in the networked system, and/or the initial state vector $x (0)$ is not exactly known, observations in attack detections are also valid for the above attack identification methods. On the other hand, from the above results it is clear that to identify an attack, it is necessary to construct at least one residual filter for each possible attack. This might be not feasible when the number K is large.

Similarly to the attack detector discussed in the previous section, it is also possible to realize the above attack identifier in a distributed way. The details are not included in this book. Some preliminary results can be found in [8,9].

11.6 System Security and Sensor/Actuator Placement

The previous sections make it clear that to detect and/or identify an attack in networked systems, system observability is an essential property. In Chapter 3, we have discussed how to verify observability of a large-scale networked system and how many sensors are required to construct an observable networked system. However, it is usually not sufficient in actual engineering to only detect and/or identify an attack. A general requirement is to achieve this objective in an allowed time period after the occurrence of an attack, if not in the shortest time period. In fact, the longer the time is needed to detect/identify an attack, the larger the damages the attack may bring.

Note that both attack detection and attack identification depend on a residual generator. As system initial states cannot be known exactly in general and both modeling errors and external disturbances are usually unavoidable in actual applications, the time period and/or data length needed to detect/identify an attack depend heavily on the performances of that residual generator in its robustness against modeling errors and its capabilities of reducing influences from external disturbances. From the results of Section 11.5 it is clear that this residual generator can be easily rewritten into a form of the Luenberger observer discussed in Chapter 4, which includes the Kalman filter as its particular form. On the other hand, recall that the Kalman filter is optimal when the plant is linear and the external disturbances are normally distributed and when the sensitivity penalization-based robust state estimator given in Chapter 4 has completely the same form as that of the Kalman filter. It is interesting from both theoretical aspects and application aspects to investigate relations among estimation performances of the Kalman filter, the number of sensors, and the length of system output measurements [15]. A dual problem is about relations about control efforts, the number of actuators and optimal control performances, which is also an interesting topic in the analysis and synthesis of networked systems [16,17]. In fact, an attacker often intends to destroy a system with minimal efforts in a stealthy way, whereas a controller usually wants to improve system performances with the smallest energy consumptions.

In this section, we investigate a sensor placement problem and an actuator placement problem, both of them related closely to system security.

From a mathematical viewpoint, both sensor placement problems and actuator placement problems can be regarded as set function optimizations. When a sensor placement problem is under investigation, a set $V$ is usually adopted to denote the set consisting of all potential positions of a networked system in which a sensor can be placed, whereas the function $f (\cdot)$ represents metric measuring factors like the best performances that a state estimator can achieve with the measurements from several sensors belonging to a set $S$ , the costs of these sensors, and so on. Then, when there are restrictions on the number of sensors in a networked system, say, it does not exceed k, a sensor placement problem may be mathematically described as

$\max_{S \subseteq V, | S | ⩽ k} f (S) .$

(11.68)

Here $| S |$ stands for the number of elements in the set $S$ . Basically, this is a finite combinatorial optimization problem, which can be solved in principle through exhausting all possible sensor sets that have its element number not greater than k and comparing their associated cost function values. When the number of subsystems in a networked system is large, as possibilities of sensor locations in general increase exponentially with the subsystem number, such a brute-force-based method is usually computationally prohibitive.

Similar statements are valid for actuator placements.

To deal with these sensor/actuator placement problems, assume that the dynamics of a discrete time and linear time-variant system can be described by the following state space model:

$x (k + 1) = A (k) x (k) + B (k) w (k),$

(11.69)

$y (k) = C (k) x (k) + v (k) .$

(11.70)

Once again, here, $x (k) \in R^{n}$ , $y (k) \in R^{p}$ , and $w (k) \in R^{q}$ denote respectively the state vector, input vector, and output vector of the system. Moreover, assume that $S$ is a subspace of $R^{n}$ .

11.6.1 Some Properties of the Kalman Filter

It is now widely known that when a plant is linear and its external disturbances are normally distributed, the Kalman filter is the optimal state estimator under the criterion of mean squares errors [18,19]. Recent studies, however, show that when mean squares errors are adopted in the determination of optimal sensor locations, the associated optimization problem generally does not have the submodularity property and is in general NP-hard. Moreover, the widely adopted greedy heuristic algorithm may perform arbitrarily poorly, and there does not exist a constant-factor (polynomial-time) approximation algorithm [20,21].

To deal with the problem of appropriately locating sensors for a networked system, in this section, we investigate relations among estimation error, data length, and plant output number in the Kalman filtering. As only stochastic properties of $B (k) w (k)$ affect estimation accuracies of the Kalman filter [18,19,22], we can assume without any loss of generality that $B (k) \equiv I_{n}$ . To simplify mathematical expressions, this assumption is adopted throughout this and next subsections. Moreover, we assume that the system initial conditions, process disturbances, and measurement errors are white and uncorrelated. More precisely:

• $x (0)$ , $w (k)$ , and $v (k)$ are uncorrelated with each other for $k ⩾ 0$ ;
• $w (k)$ and $v (k)$ are uncorrelated with $w (k^{'})$ and $v (k^{'})$ for every nonnegative integer $k^{'} \neq k$ ;
• $E (x (0)) = 0$ and $E (w (k)) = E (v (k)) = 0$ for each $k = 0, 1, \dots$ ;
• $Var (x (0)) > 0$ , $Var (w (k)) > 0$ , and $Var (v (k)) = σ^{2} I$ with $σ > 0$ for every nonnegative integer k.

Using the same arguments as in the derivation of Eq. (3.6), we can show straightforwardly from Eqs. (11.69) and (11.70) that, for arbitrary nonnegative integers k and $k^{'}$ with $0 ⩽ k^{'} ⩽ k$ , the following equality is valid:

$[\begin{matrix} y (0) \\ y (1) \\ y (2) \\ ⋮ \\ y (k) \\ x (k^{'}) \end{matrix}] = [\begin{matrix} Γ (k) \\ L (k^{'}) \end{matrix}] [\begin{matrix} x (0) \\ w (0) \\ w (1) \\ w (2) \\ ⋮ \\ w (k - 1) \end{matrix}] + [\begin{matrix} v (0) \\ v (1) \\ v (2) \\ ⋮ \\ v (k) \\ 0 \end{matrix}],$

(11.71)

where

$\begin{matrix} Γ (k) = \begin{matrix} [\begin{matrix} C_{0} & 0 & 0 & 0 & \dots & 0 \\ C (1) A (0) & C (1) & 0 & 0 & \dots & 0 \\ C (2) A (1) A (0) & C (2) A (1) & C (2) & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ C (k) \prod_{j = k - 1}^{0} A (j) & C (k) \prod_{j = k - 2}^{0} A (j) & C (k) \prod_{j = k - 3}^{0} A (j) & C (k) \prod_{j = k - 4}^{0} A (j) & \dots & C (k) \end{matrix}], \end{matrix} \\ L (k^{'}) = [\prod_{j = k^{'} - 1}^{0} A (j) \prod_{j = k^{'} - 2}^{0} A (j) \dots I 0 \dots 0] . \end{matrix}$

If $x (0)$ , $w (j) |_{j = 0}^{k - 1}$ , and $v (k) |_{j = 0}^{k}$ are normally distributed, then we can declare from Eq. (11.71) that

${[y^{T} (0) y^{T} (1) y^{T} (2) \dots y^{T} (k) x^{T} (k^{'})]}^{T}$

is also normally distributed, which is equivalent to that

$x (k^{'}) and {[y^{T} (0) y^{T} (1) y^{T} (2) \dots y^{T} (k)]}^{T}$

are jointly normally distributed.

For simplicity, denote the vectors

$\begin{matrix} {[y^{T} (0) y^{T} (1) \dots y^{T} (k)]}^{T}, & {[x^{T} (0) w^{T} (0) w^{T} (1) \dots w^{T} (k - 1)]}^{T}, and \\ {[v^{T} (0) v^{T} (1) \dots v^{T} (k)]}^{T} \end{matrix}$

respectively by $Y (k)$ , $Z (k - 1)$ , and $V (k)$ . Then Eq. (11.71) can be rewritten as follows:

$[\begin{matrix} Y (k) \\ x (k^{'}) \end{matrix}] = [\begin{matrix} Γ (k) & I \\ L (k^{'}) & 0 \end{matrix}] [\begin{matrix} Z (k - 1) \\ V (k) \end{matrix}] .$

(11.72)

In addition, from the system output vector measurements $y (j)$ , $j = 0, 1, \dots, k$ , the best estimate of the system state vector $x (k^{'})$ under the criterion of mean squares errors, denoted $\hat{x} (k^{'})$ , has been proven to have the following closed-form expression [18,19,22]:

$\hat{x} (k^{'}) = E (x (k^{'}) | Y (k)),$

(11.73)

that is,

$\hat{x} (k^{'}) = \arg \min_{α} E ({[α - x (k^{'})]}^{T} [α - x (k^{'})])$

(11.74)

under the constraints of Eq. (11.71).

Based on this relation and the adopted assumptions on the external disturbance process $w (k) |_{k = 0}^{\infty}$ and the measurement error process $v (k) |_{k = 0}^{\infty}$ , direct algebraic manipulations show that

$\begin{matrix} [\begin{matrix} Y (k) \\ x (k^{'}) \end{matrix}] \\ \sim N ([\begin{matrix} 0 \\ 0 \end{matrix}], [\begin{matrix} Γ (k) Var (Z (k - 1)) Γ^{T} (k) + σ^{2} I & Γ (k) Var (Z (k - 1)) L^{T} (k^{'}) \\ L (k^{'}) Var (Z (k - 1)) Γ^{T} (k) & L (k) Var (Z (k - 1)) L^{T} (k^{'}) \end{matrix}]) . \end{matrix}$

(11.75)

Concerning joint normal distributions, the following result is well known [23,24].

Lemma 11.6

Assume that random vectors x and y are jointly normally distributed. Moreover, assume that

$E ([\begin{matrix} x \\ y \end{matrix}]) = [\begin{matrix} μ_{x} \\ μ_{y} \end{matrix}], Var ([\begin{matrix} x \\ y \end{matrix}]) = [\begin{matrix} V_{x} & V_{y x}^{T} \\ V_{y x} & V_{y} \end{matrix}] .$

Then the conditional random vector $y | x$ is also normally distributed. In addition,

$E (y | x) = μ_{y} + V_{y x} V_{x}^{- 1} [x - μ_{x}], Var (y | x) = V_{y} - V_{y x} V_{x}^{- 1} V_{y x}^{T} .$

On the basis of Eqs. (11.73) and (11.72) and Lemma 11.6, the following equality can be straightforwardly established:

$\begin{matrix} \hat{x} (k^{'}) & = E (x (k^{'}) | Y (k)) \\ = L (k^{'}) Var (Z (k - 1)) Γ^{T} (k) {[Γ (k) Var (Z (k - 1)) Γ^{T} (k) + σ^{2} I]}^{- 1} Y (k) . \end{matrix}$

(11.76)

Recall that when a linear space is constituted from n-dimensional random vectors with zero mean and finite covariance matrix, an inner product can be defined as

$< x, y > = E (x^{T} y) .$

Obviously, the norm induced from this inner product is consistent with the cost function adopted in the optimization problem of Eq. (11.74), that is, the mean squares errors. Hence, from the projection-based optimization condition [25,26] we have that

$< x (k^{'}) - \hat{x} (k^{'}), \hat{x} (k^{'}) > = E ({[x (k^{'}) - \hat{x} (k^{'})]}^{T} \hat{x} (k^{'})) = 0 .$

(11.77)

Denote the estimation error vector $x (k^{'}) - \hat{x} (k^{'})$ by $\tilde{x} (k^{'})$ . From this equality and Eq. (11.75) direct algebraic operations show that

$\begin{matrix} Var (\tilde{x} (k^{'})) & = & E ([x (k^{'}) - \hat{x} (k^{'})] {[x (k^{'}) - \hat{x} (k^{'})]}^{T}) \\ = & L (k^{'}) Var (Z (k - 1)) L^{T} (k^{'}) \\ - L (k^{'}) Var (Z (k - 1)) Γ^{T} (k) {[Γ (k) Var (Z (k - 1)) Γ^{T} (k) + σ^{2} I]}^{- 1} \\ \times Γ (k) Var (Z (k - 1)) L^{T} (k^{'}) \\ = & L (k^{'}) {[{Var}^{- 1} (Z (k - 1)) + \frac{1}{σ^{2}} Γ^{T} (k) Γ (k)]}^{- 1} L^{T} (k^{'}) . \end{matrix}$

(11.78)

A recursive realization of Eqs. (11.76) and (11.78) is the well-known Kalman filter [18,19]. Rather than the recursive formula of the Kalman filter, which is given by Eqs. (4.8) and (4.9), expression (11.78) for the covariance matrix of estimation errors appears to be much more convenient in the analysis of the relations among sensor positions, sensor number, measurement data length, and estimation accuracy.

Define the random variable

$δ (k^{'}) = {\tilde{x}}^{T} (k^{'}) {Var}^{- 1} (\tilde{x} (k^{'})) \tilde{x} (k^{'})$

(11.79)

Recall that the random vector $x (k^{'})$ is normally distributed. On the other hand, it is clear from Eqs. (11.72) and (11.76) that the random vector $\hat{x} (k^{'})$ is also normally distributed. We can therefore declare that the estimation error vector $\tilde{x} (k^{'})$ is also normally distributed. In addition, note that $E [E (y | x)] = E (y)$ for arbitrary random vectors x and y [23,24]. We therefore have

$E (\tilde{x} (k^{'})) = E (x (k^{'}) - \hat{x} (k^{'})) = E (x (k^{'})) - E (E (x (k^{'}) | Y (k))) = 0 .$

(11.80)

Hence the random variable $δ (k^{'})$ has the $χ_{n}^{2}$ distribution, that is, the $χ^{2}$ distribution with n degrees of freedom [23,24]. This property is quite helpful in developing algorithms for attack detections and attack identifications [6,7].

It is also worth mentioning that

$\begin{matrix} E {{[x (k^{'}) - \hat{x} (k^{'})]}^{T} [x (k^{'}) - \hat{x} (k^{'})]} & = & E {tr ([x (k^{'}) - \hat{x} (k^{'})] {[x (k^{'}) - \hat{x} (k^{'})]}^{T})} \\ = & tr {E ([x (k^{'}) - \hat{x} (k^{'})] {[x (k^{'}) - \hat{x} (k^{'})]}^{T})} \\ = & tr {Var (\tilde{x} (k^{'}))} . \end{matrix}$

(11.81)

These observations mean that the covariance matrix $Var (\tilde{x} (k^{'}))$ of the estimation error is closely related to both some statistics adopted in attack detections/identifications and the estimation accuracy measured by the mathematical expectation of squared estimation errors.

11.6.2 Sensor Placements

In both attack detections and attack identifications, it is essential to distinguish an abnormal situation of a networked system from its normal situation as fast as possible. This means that it is preferable to have sensors in positions that can make a detector achieve the minimal value of a metric on its residual signal when there do not exist any attacks in the associated networked system and reach a value of that metric as large as possible if the associated networked system has been attacked. Keeping in mind relations among state observers and detectors, in this subsection, we investigate sensor placements with the objective of minimizing state estimation errors.

For this purpose, we adopt the following assumption for the system described by Eqs. (11.69) and (11.70) to reflect characteristics in optimizations of sensor locations.

Assumption 11.1

The matrix $C (k)$ is time invariant. Moreover, each of its rows has a nonzero element equal to 1. Furthermore, each of its columns has at most one nonzero element.

This assumption reflects that all sensor positions have been fixed, each sensor only directly measures one of the states of the networked system, and if a state in the networked system is directly measured, then it can only be measured by one sensor. Under such an assumption, we can directly prove that the number of the sensors in the networked system is equal to the number of nonzero elements in the matrix C, which is an abbreviation of the matrix $C (k)$ when Assumption 11.1 is satisfied.

Define the set

$S = {i | \begin{matrix} C = [c_{j l} |_{j = 1, l = 1}^{j = p, l = n}] satisfies Assumption 11.1, and \\ there exists j \in {1, 2, \dots, p} such that c_{j i} = 1 \end{matrix}} .$

It is clear that this set indicates all the states in the networked system described by Eqs. (11.69) and (11.70) for which there is a sensor measuring its value. With a little abuse of terminology, in this subsection, we call this set the sensor set for that networked system.

For an arbitrary $t \times s$ -dimensional matrix Ξ, let ${(Ξ)}_{i j}$ represent its ith row jth column element, $i = 1, 2, \dots, t$ and $j = 1, 2, \dots, s$ . Define scalars $μ (k)$ , $v_{w} (k)$ , $v_{0}$ , and $v_{0 i n}$ respectively as

$\begin{matrix} μ (k) = \max_{0 ⩽ i ⩽ k} σ_{\max} (A (i)), v_{w} (k) = \max_{0 ⩽ i ⩽ k} \max_{1 ⩽ j ⩽ n} {(Var (w (i)))}_{j j}, \\ v_{0} = \max_{1 ⩽ j ⩽ n} {(Var (x (0)))}_{j j}, v_{0 i n} = \max_{1 ⩽ j ⩽ n} {({Var}^{- 1} (x (0)))}_{j j} . \end{matrix}$

On the basis of the results given in the previous subsection on the Kalman filtering, the following results have been obtained in [15], which establish some relations among the number of sensors, data length, and state estimation accuracy in a networked system.

Theorem 11.6

Let the networked system be described by Eqs. (11.69) and (11.70), and let $\hat{x} (k)$ stand for the estimate of its state vector $x (k)$ using the Kalman filter. Assume that $σ > 0$ and $μ (k) \neq 1$ . Then for every $k = 1, 2, \dots$ ,

$\frac{n σ^{2} λ_{\min} (L^{T} (k) L (k))}{| S | \frac{1 - μ^{2 (k + 1)}}{1 - μ^{2}} + σ^{2} v_{0 i n}} ⩽ E ({| | x (k) - \hat{x} (k) | |}_{2}^{2}) ⩽ n (k + 1) λ_{\max} (L^{T} (k) L (k)) \max {v_{0}, v_{w} (k)},$

(11.82)

where $| S |$ is the number of elements in the sensor set $S$ .

This theorem reveals that the lower bound of state estimation accuracy decreases only inversely proportionally to the number of the sensors in the system and increases linearly with the increment of the number of the states in the system. This means that estimation errors cannot be significantly reduced through only adding sensors to the system, and for a large-scale system, measurements with a short data length cannot lead to an acceptable estimation accuracy in general.

From this theorem conditions can be derived on the number of sensors and data length required to meet some prescribed estimation accuracy. More precisely, assume that it is required that $E ({| | x (k) - \hat{x} (k) | |}_{2}^{2}) ⩽ α$ with some prescribed positive number α. Then the following conclusions can be derived from Theorem 11.6.

• Assume that the date length k is fixed. Then, to satisfy $E ({| | x (k) - \hat{x} (k) | |}_{2}^{2}) ⩽ α$ , it is necessary that

$| S | ⩾ [\frac{n σ^{2} λ_{\min} (L^{T} (k) L (k))}{α} - σ^{2} v_{0 i n}] \frac{1 - μ^{2}}{1 - μ^{2 (k + 1)}}$

• Assume that the number of sensors $| S |$ , is fixed. Then, to satisfy $E ({| | x (k) - \hat{x} (k) | |}_{2}^{2}) ⩽ α$ , it is necessary that

$k ⩾ \frac{\log {1 - [\frac{n σ^{2} λ_{\min} (L^{T} (k) L (k))}{α} - σ^{2} v_{0 i n}] \frac{1 - μ^{2}}{| S |}}}{2 \log (μ)} - 1 .$

These relations imply that the required number of sensors increases linearly with the increment of the number of the system states, but the required data length increases only logarithmically.

Now, we investigate relations between sensor sets and the covariance matrix of estimation errors, that is, $Var (\tilde{x} (k))$ . For this purpose, define $L (j)$ with $j = 0$ as $L (0) = [I_{n} 0 \dots 0]$ . Then from the definitions of the matrices $Γ (k)$ and $L (k)$ it is obvious that

$Γ (k) = col {C L (j) |_{j = 0}^{k}} .$

(11.83)

Let $e_{j}$ denote the jth canonical basis of the n-dimensional Euclidean space $R^{n}$ , $j = 1, 2, \dots, n$ , that is, $e_{j}$ is an n-dimensional column vector with its jth row element 1 and all other elements 0. Moreover, for each $i = 1, 2, \dots, n$ , define the matrix

$I (i) = e_{i} e_{i}^{T} .$

Furthermore, let $s (i)$ with $i = 1, 2, \dots, n$ , be the indicator function defined as

$s (i) = {\begin{matrix} 1 & if the i th state of the system is directly measured by a sensor, \\ 0 & if the i th state of the system is not directly measured by any sensor. \end{matrix}$

By Assumption 11.1 the output matrix C of the networked system described by Eqs. (11.69) and (11.70) can be expressed as

$C = col {e_{j (i)}^{T} |_{i = 1}^{p}},$

(11.84)

where $j (i)$ is the position of the nonzero element in the ith row of the matrix C, $i = 1, 2, \dots, p$ . Moreover, the following equalities are valid:

$C^{T} C = \sum_{m = 1}^{p} e_{j (m)} e_{j (m)}^{T} = \sum_{m \in S} I (m) = \sum_{m = 1}^{n} s (m) I (m) .$

(11.85)

To clarify dependence of the matrices $Γ (k)$ and $Var (\tilde{x} (k))$ on the sensor set $S$ , we further reexpress them respectively as $Γ (k, S)$ and $Var (\tilde{x} (k), S)$ , where the sensor set $S$ is explicitly included. By Eqs. (11.83), (11.84), and (11.85) we have that

$\begin{matrix} Γ^{T} (k, S) Γ (k, S) & = & \sum_{j = 0}^{k} L^{T} (j) C^{T} C L (j) \\ = & \sum_{j = 0}^{k} (L^{T} (j) \sum_{m = 1}^{n} s (m) I (m) L (j)) \\ = & \sum_{m = 1}^{n} s (m) (\sum_{j = 0}^{k} L^{T} (j) I (m) L (j)) \\ = & \sum_{m \in S} O (k, m), \end{matrix}$

(11.86)

where

$\begin{matrix} O (k, m) = \sum_{j = 0}^{k} L^{T} (j) I (m) L (j) = \sum_{j = 0}^{k} {(e_{m}^{T} L (j))}^{T} (e^{T} (m) L (j)), \\ k = 0, 1, \dots, m = 1, 2, \dots, n . \end{matrix}$

From its definition it is clear that for all $k = 0, 1, \dots$ and $m = 1, 2, \dots, n$ , the matrix $O (k, m)$ is independent of either the number of sensors in the system or the positions of the sensors. In addition, this matrix is at least semipositive definite.

Now, assume that there are two sensor sets $S_{1}$ and $S_{2}$ satisfying

$S_{1} \subseteq S_{2} \subseteq {1, 2, \dots, n} .$

Then by Eq. (11.86) the following inequality is immediate:

$\begin{matrix} Γ^{T} (k, S_{2}) Γ (k, S_{2}) & = & \sum_{m \in S_{2}} O (k, m) \\ = & \sum_{m \in S_{1}} O (k, m) + \sum_{m \in S_{2} S_{1}} O (k, m) \\ ⩾ & \sum_{m \in S_{1}} O (k, m) \\ = & Γ^{T} (k, S_{1}) Γ (k, S_{1}) . \end{matrix}$

(11.87)

We can therefore declare from Lemma 2.1 that

$\begin{matrix} {[\frac{1}{σ^{2}} Γ^{T} (k, S_{2}) Γ (k, S_{2}) + {Var}^{- 1} (Z (k - 1))]}^{- 1} \\ ⩽ {[\frac{1}{σ^{2}} Γ^{T} (k, S_{1}) Γ (k, S_{1}) + {Var}^{- 1} (Z (k - 1))]}^{- 1} . \end{matrix}$

(11.88)

By Eqs. (11.78) and (11.88) and by Lemma 2.1 we further have that

$Var (\tilde{x} (k), S_{2}) ⩽ Var (\tilde{x} (k), S_{1})$

(11.89)

Eqs. (11.89) and (11.81) mean that, with the addition of some sensors, both the mean squares estimation errors of the Kalman filter and the covariance matrix of its estimation errors monotonically decrease. This is consistent with engineering intuition, as a sensor addition results in more measurement data, which provide more information about the plant state vector and therefore lead to a more accurate state estimate.

However, when the number of sensors is fixed, it is still not very easy to find the optimal sensor positions that minimize either the mean squares estimation errors or the covariance matrix of the state estimation errors [15,20,21].

On the other hand, discussions in the previous section show that observability of a networked system is necessary for the existence of a residual filter that is able to detect/identify an attack. It can be shown, however, that even the problem of finding the best sensor positions that lead to an observable networked system is NP-hard.

In particular, similarly to the proof of Theorem 3.2, we can also show that to recover the state vectors of the networked system described by Eqs. (11.69) and (11.70) from its output measurements $y (j) |_{j = 0}^{k}$ , it is necessary and sufficient that the following matrix $O (k, S)$ w is of full column rank:

$O (k, S) = {[C^{T} A^{T} (0) C^{T} {(A (1) A (0))}^{T} C^{T} \dots {(\prod_{j = k}^{0} A (j))}^{T} C^{T}]}^{T} .$

Note that the matrix $O (k, S)$ is of full column rank if and only if the matrix $O^{T} (k, S) O (k, S)$ is of full rank. In fact, the ranks of these two matrices equal each other [27,28]. When sensor positions are selected to guarantee the observability of a networked system, it is reasonable from these observations to investigate the problem of maximizing the rank of the matrix $O^{T} (k, S) O (k, S)$ under the restrictions that $S \subseteq {1, 2, \dots, n}$ and $| S | ⩽ q$ .

Concerning the set function involved in this optimization problem, the following conclusions are obtained.

Theorem 11.7

Assume that the dynamics of a networked system is described by Eqs. (11.69) and (11.70). Moreover, assume that Assumption 11.1 is satisfied by its output matrix $C (k)$ . Define the set function $f (S) : 2^{V} ⟶ R$ as

$f (k, S) = rank (O^{T} (k, S) O (k, S)),$

(11.90)

where $V = {1, 2, \dots, n}$ . Then this set function is submodular and increasing for each $k = 0, 1, 2, \dots$ .

The proof of this theorem is deferred to the appendix of this chapter.

From this theorem and the results on optimization of the set function given in Chapter 2 we can declare that optimal sensor placements for maximizing the rank of the observability matrix $O (k, S)$ are in general NP-hard. On the other hand, the greedy heuristic method of that chapter usually gives a good approximation result. But it is worth noting that when the scale of a networked system is large, some computation issues may still arise for this greedy heuristic method, noting that in this case, the matrix $\bar{O} (k, i)$ of Eq. (11.A.1) often has a high dimension. These are different significantly by the results of Chapter 3, given by Corollary 3.1 and Theorem 3.10. The output matrix there is allowed to take an arbitrary real value for a networked system, and an explicit parameterization is given for all output matrices that lead to an observable system.

When the state transition matrix of the networked system described by Eqs. (11.69) and (11.70) is also time invariant, arguments similar to those of Chapter 3 show that when the observability matrix $O (k, S)$ is involved in an optimization problem, it is sufficient to only consider the case $k = n - 1$ .

11.6.3 Actuator Placements

In the previous sections, it has been argued that an attacker usually intends to give destructive damages to a system in a stealthy way with small efforts. In this subsection, we investigate selection of input positions for the system described by Eqs. (11.69) and (11.70) such that some metrics on the input efforts can be minimized. For this purpose, we further adopt the following assumption, which is in a dual form of Assumption 11.1 adopted in the previous subsection for studying sensor placements.

Assumption 11.2

The matrix $B (k)$ is time invariant. Moreover, each of its columns has a nonzero element 1. Furthermore, each of its rows has at most one nonzero element.

When this assumption is satisfied, the input matrix $B (k)$ is abbreviated as B for simplicity.

Concerning the system described by Eqs. (11.69) and (11.70), its controllability Gramian at the time instant k, denoted $W_{c} (k)$ , is defined as

$W_{c} (k) = \sum_{j = 0}^{k} Φ (j, 0) B B^{T} Φ^{T} (j, 0),$

(11.91)

where $Φ (j, 0) = \prod_{i = j}^{0} A (i)$ . Note that inputs to a system are usually energy restricted. Moreover, some functions of the controllability Gramian $W_{c} (k)$ have been argued to be nice quantities for measuring energies required to maneuver the system state vector from the zero-valued initial conditions to a prescribed value in the time interval $[0, k]$ . Depending on whether the average energy required to maneuver the system state vector, a worst-case energy required to maneuver the system state vector or the volume of the states that can be reached through one unit or less of input energies, $tr (W_{c}^{- 1} (k))$ , $σ_{\max} (W_{c}^{- 1} (k))$ and $\log \det (W_{c} (k))$ are utilized respectively [16,17].

For each $i = 1, 2, \dots, n$ , let $a (i)$ with $i = 1, 2, \dots, n$ , be an indicator function defined as

$a (i) = {\begin{matrix} 1 & if the i th state of the system is directly maneuvered by an actuator, \\ 0 & if the i th state of the system is not directly maneuvered by any actuator. \end{matrix}$

By Assumption 11.2 the input matrix B of the networked system can be expressed as

$B = [e_{j (1)}, e_{j (2)}, \dots, e_{j (q)}],$

(11.92)

where $j (i)$ is the position of the nonzero element in the ith column of the input matrix B, $i = 1, 2, \dots, q$ . Moreover, similarly to Eq. (11.85), we can obtain the following relation:

$B B^{T} = \sum_{m = 1}^{n} a (m) I (m),$

(11.93)

where the matrix $I (m)$ is defined as that in the previous subsection.

Define the set

$A = {i | \begin{matrix} B = [b_{j l} |_{j = 1, l = 1}^{j = n, l = q}] satisfies Assumption 11.2, and \\ there exists j \in {1, 2, \dots, q} such that b_{i j} = 1 \end{matrix}} .$

Then it is clear that this set indicates all the states in the networked system described by Eqs. (11.69) and (11.70) that are directly controlled by an actuator. Similarly to the sensor set $S$ , in this subsection, we call this set the actuator set for that networked system.

To clarify dependence of the controllability Gramian $W_{c} (k)$ on the actuator set $A$ , we further reexpress it as $W_{c} (k, A)$ , so that the sensor set $S$ is explicitly included. Using completely the same arguments as those in the derivation of Eq. (11.86), from Eqs. (11.91) and (11.93) we obtain the following relation:

$W_{c} (k, A) = \sum_{m \in A} a (m) C (k, m),$

(11.94)

where

$C (k, m) = \sum_{j = 0}^{k} Φ (j, 0) I (m) Φ^{T} (j, 0), k = 0, 1, \dots, m = 1, 2, \dots, n .$

It is clear from its definition that, for all $k = 0, 1, \dots$ and $m = 1, 2, \dots, n$ , the matrix $C (k, m)$ is independent of either the number of actuators in the system or the positions of the actuators. In addition, this matrix is at least semipositive definite.

Now, assume that there are two actuator sets $A_{1}$ and $A_{2}$ satisfying

$A_{1} \subseteq A_{2} \subseteq {1, 2, \dots, n} .$

Then by Eq. (11.94) we can obtain the following inequality by the same arguments as those in the derivation of Eq. (11.87):

$W_{c} (k, A_{2}) ⩾ W_{c} (k, A_{1}) .$

(11.95)

We can therefore declare from Lemma 2.1 that when both matrices $W_{c} (k, A_{1})$ and $W_{c} (k, A_{2})$ are invertible, the following inequalities are valid:

$tr (W_{c}^{- 1} (k, A_{2})) ⩽ tr (W_{c}^{- 1} (k, A_{1})),$

(11.96)

$σ_{\max} (W_{c}^{- 1} (k, A_{2})) ⩽ σ_{\max} (W_{c}^{- 1} (k, A_{1})),$

(11.97)

$\log \det (W_{c} (k, A_{2})) ⩾ \log \det (W_{c} (k, A_{1})),$

(11.98)

that is, through adding some other actuators, both the average energy required to maneuver the system state vector and the worst-case energy required to maneuver the system state vector decrease monotonically, whereas the volume of the states that can be reached through one unit or less of input energies increases monotonically. Once again, this is in a good agreement with engineering intuition.

However, optimization of these metrics is generally mathematically difficult. In particular, we have the following results.

Theorem 11.8

Let the networked system be described by Eqs. (11.69) and (11.70). Define the set function $f (A) : 2^{V} ⟶ R$ as

$f (A) = \log \det (W_{c} (k, A)),$

(11.99)

where $V = {1, 2, \dots, n}$ . Then this set function is submodular and increasing.

Proof

From Eq. (11.98) we can directly declare that this function is increasing.

To prove its submodularity, assume that ξ is an arbitrary element of the set $V$ . Moreover, assume that $A_{1}$ and $A_{2}$ are two sets satisfying $A_{1} \subseteq A_{2} \subseteq V {ξ}$ . Define the matrix $W_{c} (γ)$ and the function $g (γ)$ respectively as

$\begin{matrix} W_{c} (γ) = W_{c} (k, A_{1}) + γ [W_{c} (k, A_{2}) - W_{c} (k, A_{1})], \\ g (γ) = \log \det [W_{c} (γ) + W_{c} (k, {ξ})] - \log \det (W_{c} (γ)), \end{matrix}$

where $γ \in [0, 1]$ .

By Eq. (11.95) we have that

$W_{c} (k, A_{2}) - W_{c} (k, A_{1}) ⩾ 0 .$

(11.100)

On the other hand, noting that by its definition the matrix $W_{c} (k, {ξ})$ is at least positive semidefinite, it is obvious that

$W_{c} (γ) + W_{c} (k, {ξ}) ⩾ W_{c} (γ) .$

(11.101)

We can therefore declare from Lemma 2.1 that

$\begin{matrix} \frac{d g (γ)}{d γ} & = & \frac{d}{d γ} \log \det (W_{c} (γ) + W_{c} (k, {ξ})) - \frac{d}{d γ} \log \det (W_{c} (γ)) \\ = & tr {{[W_{c} (γ) + W_{c} (k, {ξ})]}^{- 1} [W_{c} (k, A_{2}) - W_{c} (k, A_{1})]} \\ - tr {W_{c}^{- 1} (γ) [W_{c} (k, A_{2}) - W_{c} (k, A_{1})]} \\ = & tr {[{(W_{c} (γ) + W_{c} (k, {ξ}))}^{- 1} - W_{c}^{- 1} (γ)] [W_{c} (k, A_{2}) - W_{c} (k, A_{1})]} \\ = & tr {{[W_{c} (k, A_{2}) - W_{c} (k, A_{1})]}^{\frac{1}{2}} [{(W_{c} (γ) + W_{c} (k, {ξ}))}^{- 1} - W_{c}^{- 1} (γ)] \\ \times {[W_{c} (k, A_{2}) - W_{c} (k, A_{1})]}^{\frac{1}{2}}} \\ ⩽ & 0, \end{matrix}$

(11.102)

which implies that $g (0) ⩾ g (1)$ .²

On the other hand, from the definition of the function $g (γ)$ it is obvious that

$g (0) = \log \det (W_{c} (k, A_{1}) + W_{c} (k, {ξ})) - \log \det (W_{c} (k, A_{1})),$

(11.103)

$g (1) = \log \det (W_{c} (k, A_{2}) + W_{c} (k, {ξ})) - \log \det (W_{c} (k, A_{2})) .$

(11.104)

Note that from Eq. (11.94) we immediately have that

$W_{c} (k, A ⋃ {ξ}) = W_{c} (k, A) + W_{c} (k, {ξ})$

for all $A \subseteq V {ξ}$ and $ξ \in V$ . Hence we can claim that $g (0)$ and $g (1)$ are in fact equal to the values of the following set function $f_{ξ} (A)$ respectively at $A = A_{1}$ and $A = A_{2}$ :

$f_{ξ} (A) = \log \det [W_{c} (k, A ⋃ {ξ})] - \log \det (W_{c} (k, A)), A \subseteq V {ξ} .$

The proof can now be completed by an application of Lemma 2.8. □

In addition to the set function $\log \det (W_{c} (k, A))$ , it has also been proven in [16,17] that some other set functions, such as $rank (W_{c} (k, A))$ , $- tr (W_{c}^{- 1} (k, A))$ , etc., are also monotone increasing and submodular. This means that maximization of these set functions is in general NP-hard, but the greedy heuristic algorithm described in Chapter 2 usually works well approximately. Once again, as in the sensor placement problem, computation issues may arise for a large-scale system in the application of that greedy heuristic algorithm, noting that the associated matrix in this case still has a great dimension, which is not very convenient for numerical computations.

On the other hand, from the duality between controllability and observability of a system, similiar results can be established for sensor placements using the associated observability Gramian.

In the above investigations on sensor/actuator placements, the output/input matrix of the system is restricted by Assumption 11.1/11.2. The associated results can be extended to a more general situation in which each row of the output matrix can take values from a set of some candidate rows, or each column of the input matrix can take values from a set of some candidate columns. The major required modification in these extensions is a redefinition of the matrix $I (i)$ adopted in the aforementioned investigations. Details can be found in the related literature, such as [16,17] and the references therein.

11.7 Concluding Remarks

Importance of safety is now widely recognized for a networked system, especially with the ambitious introduction of public communication channels into a control system. In this chapter, we investigated some basic issues in system designs closely related to attack detection and identification, which reveals some relations among attack detectability/identifiability, system controllability/observability, and actuator/sensor placement. To develop an algorithm applicable to actual engineering problems, some other factors, such as robustness against modeling errors, stochastic properties of external disturbances, and measurement errors, must also be taken into account.

11.8 Bibliographic Notes

Network safety has been extensively investigated in power systems. [6] summarizes various important aspects of this problem and some interesting recent advancements. Recent interest in this problem are mainly stimulated by the prospective introduction of public communication channels into control systems, which brings some new characteristics into this field. Many essential issues are still in their primary stages toward a complete settlement. [2,7] are some recent surveys about this problem.

Basic concepts and results for combinatorial optimizations can be found in [26,30]. Concerning with convex optimization, [25,31] provides an excellent introduction on its essential motivations and results.

The notion of a system matrix is due to Rosenbrock, who also introduced zeros of linear multivariable systems in terms of the Smith–McMillan form of the system transfer matrix in [10,13]. The concept of weakly unobservable subspace appears to be firstly introduced for a discrete-time system by Silverman [32]. A summary and some detailed discussions can be found in [11]. Connections between the spectral properties of a weakly unobservable subspace and the system zeros are also dealt with in [33], and [34] has studied connections between the system zeros and the state space geometric structure of the system.

Appendix 11.A

11.A.1 Proof of Theorem 11.7

From the definition of the matrix $O (k, S)$ and Eq. (11.85) it is obvious that

$\begin{matrix} O^{T} (k, S) O (k, S) & = & [C^{T} A {(0)}^{T} C^{T} {(A (1) A (0))}^{T} C^{T} \dots {(\prod_{j = k}^{0} A (j))}^{T} C^{T}] \\ \times {[C^{T} A {(0)}^{T} C^{T} {(A (1) A (0))}^{T} C^{T} \dots {(\prod_{j = k}^{0} A (j))}^{T} C^{T}]}^{T} \\ = & \sum_{l = 0}^{k} {{(\prod_{j = l}^{0} A (j))}^{T} C^{T} C (\prod_{j = l}^{0} A (j))} \\ = & \sum_{l = 0}^{k} {{(\prod_{j = l}^{0} A (j))}^{T} (\sum_{i \in S} I (i)) (\prod_{j = l}^{0} A (j))} \\ = & \sum_{i \in S} \sum_{l = 0}^{k} {{(\prod_{j = l}^{0} A (j))}^{T} I (i) (\prod_{j = l}^{0} A (j))} \\ = & \sum_{i \in S} \bar{O} (k, i), \end{matrix}$

(11.A.1)

where

$\bar{O} (k, i) = \sum_{l = 0}^{k} {{(\prod_{j = l}^{0} A (j))}^{T} I (i) (\prod_{j = l}^{0} A (j))}, i = 1, 2, \dots, n .$

It is clear that, for each $i = 1, 2, \dots, n$ , the matrix $\bar{O} (k, i)$ is at least positive semidefinite. Hence, for arbitrary sensor sets $S_{1}$ and $S_{2}$ satisfying $S_{1} \subseteq S_{2} \subseteq {1, 2, \dots, n}$ , the following relations are valid:

$\begin{matrix} O^{T} (k, S_{2}) O (k, S_{2}) & = & \sum_{m \in S_{2}} \bar{O} (k, m) \\ = & \sum_{m \in S_{1}} \bar{O} (k, m) + \sum_{m \in S_{2} S_{1}} \bar{O} (k, m) \\ ⩾ & \sum_{m \in S_{1}} \bar{O} (k, m) \\ = & O^{T} (k, S_{1}) O (k, S_{1}) . \end{matrix}$

(11.A.2)

Let $λ_{i} (\cdot)$ denote the ith eigenvalue of an $n \times n$ -dimensional symmetric matrix that satisfies $λ_{1} (\cdot) ⩽ λ_{2} (\cdot) ⩽ \dots ⩽ λ_{n} (\cdot)$ . By Lemma 2.1 and the last equation we have that, for each $i = 1, 2, \dots, n$ ,

$λ_{i} {O^{T} (k, S_{2}) O (k, S_{2})} ⩾ λ_{i} {O^{T} (k, S_{1}) O (k, S_{1})} .$

(11.A.3)

Note that, for a positive semidefinite matrix, its rank is equal to the number of its nonzero eigenvalues. Moreover, each of its eigenvalues is not smaller than 0. We can therefore declare from Eq. (11.A.3) that

$rank {O^{T} (k, S_{2}) O (k, S_{2})} ⩾ rank {O^{T} (k, S_{1}) O (k, S_{1})},$

(11.A.4)

that is, the set function $f (k, S)$ of Eq. (11.90) is increasing.

To prove its submodularity, define the derived set function $f_{i} (k, S)$ for each $i \in {1, 2, \dots, n}$ as

$\begin{matrix} f_{i} (k, S) = rank (O^{T} (k, S ⋃ {i}) O (k, S ⋃ {i})) - rank (O^{T} (k, S) O (k, S)), \\ S \subseteq {1, 2, \dots, n} {i} . \end{matrix}$

(11.A.5)

Note that, for arbitrary $n \times n$ -dimensional real matrices X and Y,

$rank (X + Y) = rank (X) + rank (Y) - \dim (S p a n (X) ⋂ S p a n (Y)),$

where $\dim (\cdot)$ stands for the dimension of a subspace [27,28]. From this property, the definition of the set function $f_{i} (k, S)$ , and from Eq. (11.A.1) we have that

$\begin{matrix} f_{i} (k, S) & = & rank (\sum_{j \in S ⋃ {i}} \bar{O} (k, j)) - rank (\sum_{j \in S} \bar{O} (k, j)) \\ = & rank (\bar{O} (k, i) + \sum_{j \in S} \bar{O} (k, j)) - rank (\sum_{j \in S} \bar{O} (k, j)) \\ = & rank (\bar{O} (k, i)) - \dim (S p a n (\bar{O} (k, i)) ⋂ S p a n (\sum_{j \in S} \bar{O} (k, j))) . \end{matrix}$

(11.A.6)

Assume now that $S_{1} \subseteq S_{2} \subseteq {1, 2, \dots, n} {i}$ . Then from the relation

$\sum_{m \in S_{2}} \bar{O} (k, m) = \sum_{m \in S_{1}} \bar{O} (k, m) + \sum_{m \in S_{2} S_{1}} \bar{O} (k, m)$

and the definition of the matrix $\bar{O} (k, m)$ , $m = 1, 2, \dots, n$ , the following relation can be established by some algebraic manipulations:

$S p a n (\sum_{j \in S_{2}} \bar{O} (k, j)) \supseteq S p a n (\sum_{j \in S_{1}} \bar{O} (k, j))$

(11.A.7)

Therefore

$\begin{matrix} {S p a n (\bar{O} (k, i)) ⋂ S p a n (\sum_{j \in S_{2}} \bar{O} (k, j))} \\ \supseteq {S p a n (\bar{O} (k, i)) ⋂ S p a n (\sum_{j \in S_{1}} \bar{O} (k, j))}, \end{matrix}$

(11.A.8)

which further leads the following inequality:

$\begin{matrix} \dim {S p a n (\bar{O} (k, i)) ⋂ S p a n (\sum_{j \in S_{2}} \bar{O} (k, j))} \\ ⩾ \dim {S p a n (\bar{O} (k, i)) ⋂ S p a n (\sum_{j \in S_{1}} \bar{O} (k, j))} . \end{matrix}$

(11.A.9)

Substituting this inequality into Eq. (11.A.6), we have that, under the condition that $i \in {1, 2, \dots, n}$ and $S_{1} \subseteq S_{2} \subseteq {1, 2, \dots, n} {i}$ , the following inequality is true: