Chapter 6

Quickest Change Detection

Venugopal V. Veeravalli and Taposh Banerjee, ECE Department and Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA, [email protected], [email protected]

Abstract

The problem of detecting changes in the statistical properties of a stochastic system or time series arises in various branches of science and engineering. It has a wide spectrum of important applications ranging from machine monitoring to biomedical signal processing. In all of these applications the observations being monitored undergo a change in distribution in response to a change or anomaly in the environment, and the goal is to detect the change as quickly as possible, subject to false alarm constraints. In this chapter, two formulations of the quickest change detection problem, Bayesian and minimax, are introduced, and optimal or asymptotically optimal solutions to these formulations are discussed. Then some generalizations and extensions of the quickest change detection problem are described. The chapter is concluded with a discussion of applications and open issues.

Keywords

Quickest detection; change-point detection; CuSum test; Shiryaev test; optimal stopping; nonlinear renewal theory

Acknowledgments

The authors would like to thank Prof. Abdelhak Zoubir for useful suggestions, and Mr. Michael Fauss and Mr. Shang Kee Ting for their detailed reviews of an early draft of this work. The authors would also like to acknowledge the support of the National Science Foundation under Grants CCF 0830169 and CCF 1111342, through the University of Illinois at Urbana-Champaign, and the U.S. Defense Threat Reduction Agency through subcontract 147755 at the University of Illinois from prime award HDTRA1-10-1-0086.

3.06.1 Introduction

The problem of quickest change detection comprises three entities: a stochastic process under observation, a change point at which the statistical properties of the process undergo a change, and a decision maker that observes the stochastic process and aims to detect this change in the statistical properties of the process. A false alarm event happens when the change is declared by the decision maker before the change actually occurs. The general objective of the theory of quickest change detection is to design algorithms that can be used to detect the change as soon as possible, subject to false alarm constraints.

The quickest change detection problem has a wide range of important applications, including biomedical signal and image processing, quality control engineering, financial markets, link failure detection in communication networks, intrusion detection in computer networks and security systems, chemical or biological warfare agent detection systems (as a protection tool against terrorist attacks), detection of the onset of an epidemic, failure detection in manufacturing systems and large machines, target detection in surveillance systems, econometrics, seismology, navigation, speech segmentation, and the analysis of historical texts. See Section 3.06.7 for a more detailed discussion of the applications and related references.

To motivate the need for quickest change detection algorithms, in Figure 6.1a we plot a sample path of a stochastic sequence whose samples are distributed as N(0, 1) before the change, and as N(0.1, 1) after the change. For illustration, we choose time slot 500 as the change point. As is evident from the figure, the change cannot be detected through manual inspection. In Figure 6.1b, we plot the evolution of the Shiryaev statistic (discussed in detail in Section 3.06.3), computed using the samples of Figure 6.1a. As seen in Figure 6.1b, the value of the Shiryaev statistic stays close to zero before the change point, and grows toward one after the change point. The change is detected by using a threshold of 0.8.

image

Figure 6.1 Detecting a change in the mean of a Gaussian random sequence. (a) Stochastic sequence with samples from N(0, 1) before the change (time slot 500), and with samples from N(0.1, 1) after the change. (b) Evolution of the classical Shiryaev algorithm when applied to the samples given on the left. We see that the change is detected approximately at time slot 1000.

We also see from Figure 6.1b that it takes around 500 samples to detect the change after it occurs. Can we do better than that, at least on average? Clearly, declaring a change before the change point (time slot 500) will result in zero delay, but it will cause a false alarm. The theory of quickest change detection deals with finding algorithms that have provable optimality properties, in the sense of minimizing the average detection delay under a false alarm constraint. We will show later that the Shiryaev algorithm, employed in Figure 6.1b, is optimal for a certain Bayesian model.
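As a concrete illustration, the scenario of Figure 6.1 can be reproduced in a few lines. The recursion used below is the standard Shiryaev posterior update (given later in (6.13) and (6.14)); the geometric prior parameter ρ = 0.0001 and the run length are assumptions made here for illustration, since the figure does not specify them.

```python
import numpy as np

def shiryaev_statistic(x, rho, loglr):
    """Run the Shiryaev recursion: p_n = P(change <= n | x_1..x_n)."""
    p = 0.0
    out = np.empty(len(x))
    for i, xi in enumerate(x):
        pt = p + (1.0 - p) * rho          # prior update: change may occur now
        lr = np.exp(loglr(xi))            # likelihood ratio f1/f0
        p = pt * lr / (pt * lr + (1.0 - pt))
        out[i] = p
    return out

rng = np.random.default_rng(0)
change = 500
x = np.concatenate([rng.normal(0.0, 1.0, change),
                    rng.normal(0.1, 1.0, 9500)])   # mean shift 0.1 after slot 500

# log f1/f0 for N(0.1, 1) vs N(0, 1): theta*x - theta^2/2 with theta = 0.1
p = shiryaev_statistic(x, rho=0.0001, loglr=lambda xi: 0.1 * xi - 0.005)
tau = int(np.argmax(p >= 0.8)) + 1   # first time the statistic crosses 0.8
```

With these (assumed) parameters the posterior eventually crosses the 0.8 threshold, mirroring the qualitative behavior in Figure 6.1b.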

The earliest results on quickest change detection date back to the work of Shewhart [1,2] and Page [3] in the context of statistical process/quality control. Here the state of the system is monitored by taking a sequence of measurements, and an alarm has to be raised if the measurements indicate a fault in the process under observation or if the state is out of control. Shewhart proposed the use of a control chart to detect a change, in which the measurements taken over time are plotted on a chart and an alarm is raised the first time the measurements fall outside some pre-specified control limits. In the Shewhart control chart procedure, the statistic computed at any given time is a function of only the measurements at that time, and not of the measurements taken in the past. This simplifies the algorithm but may result in a loss in performance (unacceptable delays in detecting small changes). In [3], Page proposed that instead of ignoring the past observations, a weighted sum (moving average chart) or a cumulative sum (CuSum) of the past statistics (likelihood ratios) can be used in the control chart to detect the change more efficiently. It is to be noted that the motivation in the work of Shewhart and Page was to design easily implementable schemes with good performance, rather than to design schemes that could be theoretically proven to be optimal with respect to a suitably chosen performance criterion.

Initial theoretical formulations of the quickest change detection problem were for an observation model in which, conditioned on the change point, the observations are independent and identically distributed (i.i.d.) with some known distribution before the change point, and i.i.d. with some other known distribution after the change point. This observation model will be referred to as the i.i.d. case or i.i.d. model in this article.

The i.i.d. model was studied by Shiryaev [4,5], under the further assumption that the change point is a random variable with a known geometric distribution. Shiryaev obtained an optimal algorithm that minimizes the average detection delay over all stopping times that meet a given constraint on the probability of false alarm. We refer to Shiryaev’s formulation as the Bayesian formulation; details are provided in Section 3.06.3.

When the change point is modeled as non-random but unknown, the probability of false alarm is not well defined, and therefore false alarms are quantified through the mean time to false alarm when the system is operating under the pre-change state, or through its reciprocal, which is called the false alarm rate. Furthermore, it is generally not possible to obtain an algorithm that is uniformly efficient over all possible values of the change point, and therefore a minimax approach is required. The first minimax theory is due to Lorden [6], who proposed a measure of detection delay obtained by taking the supremum (over all possible change points) of a worst-case delay over all possible realizations of the observations, conditioned on the change point. Lorden showed that the CuSum algorithm of [3] is asymptotically optimal according to his minimax criterion for delay, as the mean time to false alarm goes to infinity (false alarm rate goes to zero). This result was improved upon by Moustakides [7], who showed that the CuSum algorithm is exactly optimal under Lorden’s criterion. An alternative proof of the optimality of the CuSum procedure is provided in [8]. See Section 3.06.4 for details.

Pollak [9] suggested modifying Lorden’s minimax criterion by replacing the double maximization of Lorden by a single maximization over all possible change points of the detection delay conditioned on the change point. He showed that an algorithm called the Shiryaev-Roberts algorithm, one that is obtained by taking a limit of Shiryaev’s Bayesian solution as the geometric parameter of the change point goes to zero, is asymptotically optimal as the false alarm rate goes to zero. It was later shown in [10] that the CuSum algorithm is also asymptotically optimal under Pollak’s criterion, as the false alarm rate goes to zero. Recently, a family of algorithms based on the Shiryaev-Roberts statistic was shown to have strong optimality properties as the false alarm rate goes to zero. See [11] and Section 3.06.4 for details.

For the case where the pre- and post-change observations are not independent conditioned on the change point, the quickest change detection problem was studied in the minimax setting by Lai [10] and in the Bayesian setting by Tartakovsky and Veeravalli [12]. In both of these works, an asymptotic lower bound on the delay is obtained for any stopping rule that meets a given false alarm constraint (on the false alarm rate in [10] and on the probability of false alarm in [12]), and an algorithm is proposed that meets the lower bound on the detection delay asymptotically. Details are given in Sections 3.06.3.2 and 3.06.4.3.

To summarize, in Sections 3.06.3 and 3.06.4, we discuss the Bayesian and minimax versions of the quickest change detection problem, where the change has to be detected in a single sequence of random variables, and where the pre- and post-change distributions are given. In Section 3.06.6, we discuss variants and generalizations of the classical quickest change detection problem, for which significant progress has been made. We consider the cases where the pre- or post-change distributions are not completely specified (Section 3.06.6.1), where there is an additional constraint on the cost of observations used in the detection process (Section 3.06.6.2), and where the change has to be detected using multiple geographically distributed sensor nodes (Section 3.06.6.3). In Section 3.06.7 we provide a brief overview of the applications of quickest change detection. We conclude in Section 3.06.8 with a discussion of other possible extensions and future research directions.

For a more detailed treatment of some of the topics discussed in this chapter, we refer the reader to the books by Poor and Hadjiliadis [13] and Chow et al. [14], and the upcoming book by Tartakovsky et al. [15]. We will restrict our attention in this chapter to detecting changes in discrete-time stochastic systems; the continuous time setting is discussed in [13].

In Table 6.1, a glossary of important symbols used in this chapter is provided.

Table 6.1

Glossary

Image

3.06.2 Mathematical preliminaries

A typical observation process will be denoted by the sequence {X_n, n ≥ 1}. Before we describe the quickest change detection problem, we present some useful definitions and results that summarize the required mathematical background. For a detailed treatment of the topics discussed below we recommend [14,16–18].

3.06.2.1 Martingales

Definition 1

The random sequence {Z_n, n ≥ 1} is called a martingale if E[|Z_n|] is finite for all n, and for any n ≥ 1,

E[Z_{n+1} | Z_1, …, Z_n] = Z_n.   (6.1)

If the “=” in (6.1) is replaced by “≤,” then the sequence {Z_n} is called a supermartingale, and if the “=” is replaced by “≥,” the sequence is called a submartingale. A martingale is both a supermartingale and a submartingale.
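A canonical example in the change detection context: under the pre-change distribution, the running likelihood ratio of the observations is a martingale with constant mean one, a property exploited in false alarm analyses. A quick Monte Carlo sketch (the Gaussian mean shift θ = 0.5 is an arbitrary choice made here):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.5                        # assumed post-change mean; pre-change is N(0, 1)
n, paths = 5, 200000

# Draw all observations from the pre-change density f0 = N(0, 1)
x = rng.normal(0.0, 1.0, size=(paths, n))

# log likelihood ratio of N(theta, 1) vs N(0, 1): theta*x - theta^2/2,
# so Lambda_k = prod_{i<=k} f1(X_i)/f0(X_i) along each path
lam = np.exp(np.cumsum(theta * x - theta**2 / 2, axis=1))

# Martingale property under f0: E[Lambda_k] = 1 for every k
means = lam.mean(axis=0)
```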

Some important and useful results regarding martingales are as follows:

Theorem 1 [14]

Kolmogorov’s Inequality

Let {Z_n} be a submartingale. Then, for any γ > 0,

P( max_{1≤k≤n} Z_k ≥ γ ) ≤ E[Z_n⁺]/γ,

where Z_n⁺ = max{0, Z_n}.

Kolmogorov’s inequality can be considered to be a generalization of Markov’s inequality, which states that for a non-negative random variable Z and any γ > 0,

P(Z ≥ γ) ≤ E[Z]/γ.   (6.2)

As we will see in the following sections, quickest change detection procedures often involve comparing a stochastic sequence to a threshold to make decisions. Martingale inequalities often play a crucial role in the design of the threshold so that the procedure meets a false alarm constraint. We now state one of the most useful results regarding martingales.

Theorem 2 [16]

Martingale Convergence Theorem

Let {Z_n} be a martingale (or submartingale or supermartingale) such that sup_n E[|Z_n|] < ∞. Then, with probability one, the limit Z_∞ = lim_{n→∞} Z_n exists and is finite.

3.06.2.2 Stopping times

Definition 2

A stopping time with respect to the random sequence {Z_n} is a random variable τ with the property that for each n, the event {τ = n} ∈ F_n, where F_n = σ(Z_1, …, Z_n) denotes the sigma-algebra generated by Z_1, …, Z_n. Equivalently, the random variable 1_{{τ = n}}, which is the indicator of the event {τ = n}, is a function of only Z_1, …, Z_n.

Sometimes the definition of a stopping time τ also requires that τ be finite almost surely, i.e., that P(τ < ∞) = 1.
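The defining property can be made concrete: a first-exit time of a random walk is a stopping time because the decision to stop at n depends only on the first n increments. A small sketch (the ±1 walk and the exit level 3 are arbitrary choices made here):

```python
import numpy as np

def first_exit(steps, level=3):
    """Stopping time: the first n with |S_n| >= level (None if never reached)."""
    s = 0
    for n, y in enumerate(steps, start=1):
        s += y
        if abs(s) >= level:
            return n          # the decision uses only steps[0..n-1]
    return None

rng = np.random.default_rng(2)
steps = rng.choice([-1, 1], size=10000)
tau = first_exit(steps)

# {tau = n} depends only on the first n observations, so truncating the
# data at tau cannot change the stopping decision.
same_tau = first_exit(steps[:tau])
```

Truncating the data at τ, or replacing everything after τ, cannot change the value of τ; this is precisely the measurability requirement {τ = n} ∈ F_n.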

Stopping times are essential to sequential decision making procedures such as quickest change detection procedures, since the times at which decisions are made are stopping times with respect to the observation sequence. There are two main results concerning stopping times that are of interest.

Theorem 3 [14]

Doob’s Optional Stopping Theorem

Let {Z_n} be a martingale, and let τ be a stopping time with respect to {Z_n}. If the following conditions hold:

1. P(τ < ∞) = 1,

2. E[|Z_τ|] < ∞,

3. E[Z_n 1_{{τ > n}}] → 0 as n → ∞,

then

E[Z_τ] = E[Z_1].

Similarly, if the above conditions hold, and if {Z_n} is a submartingale, then

E[Z_τ] ≥ E[Z_1],

and if {Z_n} is a supermartingale, then

E[Z_τ] ≤ E[Z_1].

Theorem 4 [17]

Wald’s Identity

Let {Y_n} be a sequence of independent and identically distributed (i.i.d.) random variables, and let τ be a stopping time with respect to {Y_n}. Furthermore, define the sum at time n as

S_n = Σ_{i=1}^n Y_i.

Then, if E[τ] < ∞ and E[|Y_1|] < ∞,

E[S_τ] = E[τ] E[Y_1].

Like martingale inequalities, the optional stopping theorem is useful in the false alarm analysis of quickest change detection procedures. Both the optional stopping theorem and Wald’s identity also play a key role in the delay analysis of quickest change detection procedures.
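Wald’s identity is easy to check numerically. In the sketch below (the drift and the boundary are arbitrary choices made here), τ is the first time a random walk with increments Y ~ N(0.5, 1) exceeds a boundary, and the empirical E[S_τ] is compared with E[τ]E[Y]:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 0.5                     # E[Y]; positive drift, so E[tau] < infinity
b = 20.0
reps = 5000

taus, s_taus = [], []
for _ in range(reps):
    s, n = 0.0, 0
    while s < b:             # tau = first n with S_n >= b (a stopping time)
        s += rng.normal(mu, 1.0)
        n += 1
    taus.append(n)
    s_taus.append(s)

# Wald's identity: E[S_tau] = E[tau] * E[Y]
lhs = np.mean(s_taus)
rhs = np.mean(taus) * mu
```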

3.06.2.3 Renewal and nonlinear renewal theory

As we will see in subsequent sections, quickest change detection procedures often involve comparing a stochastic sequence to a threshold to make decisions. Often the stochastic sequence used in decision-making can be expressed as a sum of a random walk and possibly a slowly changing perturbation. To obtain accurate estimates of the performance of the detection procedure, one needs to obtain an accurate estimate of the distribution of the overshoot of the stochastic sequence when it crosses the decision threshold. Under suitable assumptions, and when the decision threshold is large enough, the overshoot distribution of the stochastic sequence can be approximated by the overshoot distribution of the random walk. It is then of interest to have asymptotic estimates of the overshoot distribution, when a random walk crosses a large boundary.

Consider a sequence of i.i.d. random variables {Y_n} (with Y denoting a generic random variable in the sequence) and let

S_n = Σ_{i=1}^n Y_i

and

τ_b = inf{n ≥ 1 : S_n ≥ b}.

The quantity of interest is the distribution of the overshoot S_{τ_b} − b. If {Y_n} are i.i.d. positive random variables with cumulative distribution function (c.d.f.) F, then the {Y_n} can be viewed as inter-arrival times of buses at a stop. The overshoot is then the time to the next bus when an observer arrives at the stop at time b. The distribution of the overshoot, and hence also of the time to the next bus, as b → ∞, is a well-known result in renewal theory.
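The bus-waiting picture can be checked by simulation. With Exp(1) inter-arrival times (an arbitrary choice made here), the overshoot is exactly Exp(1) for every boundary b by memorylessness, which agrees with the limiting mean E[Y²]/(2E[Y]) = 2/2 = 1:

```python
import numpy as np

rng = np.random.default_rng(4)
b, reps = 50.0, 20000

overshoots = np.empty(reps)
for r in range(reps):
    t = 0.0
    while t < b:                  # bus arrivals with Exp(1) inter-arrival times
        t += rng.exponential(1.0)
    overshoots[r] = t - b         # waiting time from b to the next bus

# Limiting mean overshoot: E[Y^2] / (2 E[Y]) = 1 for Exp(1); by
# memorylessness the overshoot here is exactly Exp(1) for any b.
mean_overshoot = overshoots.mean()
```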

Theorem 5 [17]

If {Y_n} are nonarithmetic (i.e., not concentrated on a lattice {0, ±d, ±2d, …}) random variables with E[Y] < ∞, then

lim_{b→∞} P( S_{τ_b} − b ≤ x ) = (1/E[Y]) ∫_0^x P(Y > s) ds.

Further, if E[Y²] < ∞, then

lim_{b→∞} E[ S_{τ_b} − b ] = E[Y²] / (2 E[Y]).

When the {Y_n} are i.i.d. but not necessarily non-negative, and E[Y] > 0, the following concept of ladder variables can be used. Let

τ_+ = inf{n ≥ 1 : S_n > 0}.

Note that if E[Y] > 0, then τ_+ is finite almost surely, and the ladder height S_{τ_+} is a positive random variable. Also, if

τ_b = inf{n ≥ 1 : S_n ≥ b},

then the distribution of S_{τ_b} − b is the same as the overshoot distribution for the sum of a sequence of i.i.d. positive random variables (each with distribution equal to that of S_{τ_+}) crossing the boundary b. Therefore, by applying Theorem 5, we have the following result.

Theorem 6 [17]

If {Y_n} are nonarithmetic, then

lim_{b→∞} P( S_{τ_b} − b ≤ x ) = (1/E[S_{τ_+}]) ∫_0^x P( S_{τ_+} > s ) ds.

Further, if E[S_{τ_+}²] < ∞, then

lim_{b→∞} E[ S_{τ_b} − b ] = E[S_{τ_+}²] / (2 E[S_{τ_+}]).

Techniques for computing the required quantities involving the distribution of the ladder height S_{τ_+} in Theorem 6 can be found in [17].

As mentioned earlier, often the stochastic sequence considered in quickest change detection problems can be written as a sum of a random walk and a sequence of small perturbations. Let

Z_n = S_n + η_n,

where S_n = Σ_{i=1}^n Y_i is a random walk and {η_n} is the perturbation sequence, and

τ_b = inf{n ≥ 1 : Z_n ≥ b}.

Then,

Z_{τ_b} = S_{τ_b} + η_{τ_b}.

Therefore, assuming that E[τ_b] < ∞, Wald’s Identity (see Theorem 4) implies that

E[Z_{τ_b}] = E[S_{τ_b}] + E[η_{τ_b}]   (6.3)

= E[Y] E[τ_b] + E[η_{τ_b}].   (6.4)

Thus,

E[τ_b] = ( b + E[Z_{τ_b} − b] − E[η_{τ_b}] ) / E[Y].

If E[η_{τ_b}] and E[Z_{τ_b} − b] are finite, then it is easy to see that

E[τ_b] = b/E[Y] + O(1) as b → ∞,

where O(1) is as defined in Table 6.1.

But if we can characterize the overshoot distribution of Z_n when it crosses a large threshold, then we can obtain better approximations for E[τ_b]. Nonlinear renewal theory allows us to obtain the distribution of the overshoot when {η_n} satisfies some properties.

Definition 3

{η_n} is a slowly changing sequence if

(1/n) max{ |η_1|, …, |η_n| } → 0 in probability,   (6.5)

and for every ε > 0, there exist n* and δ > 0 such that for all n ≥ n*,

P( max_{1≤k≤nδ} |η_{n+k} − η_n| > ε ) < ε.   (6.6)

If indeed {η_n} is a slowly changing sequence, then the distribution of Z_{τ_b} − b, as b → ∞, is equal to the asymptotic distribution of the overshoot when the random walk S_n crosses a large positive boundary, as stated in the following result.

Theorem 7 [17]

If {Y_n} are nonarithmetic and {η_n} is a slowly changing sequence, then

lim_{b→∞} P( Z_{τ_b} − b ≤ x ) = (1/E[S_{τ_+}]) ∫_0^x P( S_{τ_+} > s ) ds.

Further, if E[Y²] < ∞ and certain additional conditions (given in [17]) are satisfied, then

E[τ_b] = (1/E[Y]) ( b + ζ − E[η] ) + o(1) as b → ∞,

where ζ = lim_{b→∞} E[Z_{τ_b} − b] is the mean of the asymptotic overshoot distribution, and η is the limit of η_n in distribution.

3.06.3 Bayesian quickest change detection

As mentioned earlier, we will primarily focus on the case where the observation process {X_n, n ≥ 1} is a discrete-time stochastic process, with X_n taking real values, whose distribution changes at some unknown change point. In the Bayesian setting it is assumed that the change point is a random variable Γ taking values on the non-negative integers, with π_n = P(Γ = n). Let P_n (correspondingly E_n) be the probability measure (correspondingly expectation) when the change occurs at time Γ = n. Then, P_∞ and E_∞ stand for the probability measure and expectation when Γ = ∞, i.e., the change does not occur. At each time step a decision is made, based on all the information available, as to whether to stop and declare a change or to continue taking observations. Thus the time at which the change is declared is a stopping time τ on the sequence {X_n} (see Section 3.06.2.2). Define the average detection delay (ADD) and the probability of false alarm (PFA) as

ADD(τ) = E[ (τ − Γ)⁺ ],   (6.7)

PFA(τ) = P( τ < Γ ).   (6.8)

Then, the Bayesian quickest change detection problem is to minimize ADD(τ) subject to a constraint on PFA(τ). Define the class of stopping times that satisfy a constraint α on PFA(τ):

C_α = { τ : PFA(τ) ≤ α }.   (6.9)

Then the Bayesian quickest change detection problem as formulated by Shiryaev is as follows:

minimize ADD(τ) over τ ∈ C_α.   (6.10)

Under an i.i.d. model for the observations, and a geometric model for the change point image, (6.10) can be solved exactly by relating it to a stochastic control problem [4,5]. We discuss this i.i.d. model in detail in Section 3.06.3.1. When the model is not i.i.d., it is difficult to find algorithms that are exactly optimal. However, asymptotically optimal solutions, as image, are available in a very general non-i.i.d. setting [12], as discussed in Section 3.06.3.2.

3.06.3.1 The Bayesian i.i.d. setting

Here it is assumed that, conditioned on the change point Γ, the random variables {X_n} are i.i.d. with probability density function (p.d.f.) f_0 before the change point, and i.i.d. with p.d.f. f_1 after the change point. The change point Γ is modeled as geometric with parameter ρ, i.e., for 0 < ρ < 1,

π_n = P(Γ = n) = ρ (1 − ρ)^{n−1} 1_{{n ≥ 1}},   (6.11)

where 1_E is the indicator function of the event E. The goal is to choose a stopping time τ on the observation sequence {X_n} to solve (6.10).

A solution to (6.10) is provided in Theorem 8 below. Let X_1^n = (X_1, …, X_n) denote the observations up to time n. Also let

p_n = P( Γ ≤ n | X_1^n )   (6.12)

be the a posteriori probability at time n that the change has taken place, given the observations up to time n. Using Bayes’ rule, p_n can be shown to satisfy the recursion

p_n = p̃_{n−1} L(X_n) / ( p̃_{n−1} L(X_n) + (1 − p̃_{n−1}) ),   (6.13)

where

p̃_{n−1} = p_{n−1} + (1 − p_{n−1}) ρ,   (6.14)

L(X_n) = f_1(X_n)/f_0(X_n) is the likelihood ratio, and p_0 = 0.
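The recursion can be verified against a direct Bayes computation: for a geometric prior, P(Γ ≤ n | X_1^n), computed by summing over change points k ≤ n, must coincide with the output of (6.13) and (6.14). A sketch (the parameters and the short data record below are arbitrary choices made here):

```python
import numpy as np

rho, theta = 0.05, 0.7                      # assumed geometric parameter, mean shift
x = np.array([0.1, -0.4, 1.2, 0.8, 2.0])    # a short, fixed data record
L = np.exp(theta * x - theta**2 / 2)        # f1/f0 for N(theta, 1) vs N(0, 1)

# Recursion (6.13)-(6.14)
p = 0.0
p_rec = []
for Ln in L:
    pt = p + (1 - p) * rho
    p = pt * Ln / (pt * Ln + (1 - pt))
    p_rec.append(p)

# Direct Bayes: P(Gamma <= n | x) with prior pi_k = rho (1 - rho)^(k-1);
# dividing the joint density by prod f0 leaves suffix products of L
p_direct = []
for n in range(1, len(x) + 1):
    num = sum(rho * (1 - rho)**(k - 1) * np.prod(L[k - 1:n])
              for k in range(1, n + 1))
    p_direct.append(num / (num + (1 - rho)**n))   # (1-rho)^n = P(Gamma > n)
```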

Definition 4

Kullback-Leibler (K-L) Divergence

The K-L divergence between two p.d.f.’s f_1 and f_0 is defined as

D(f_1 ∥ f_0) = ∫ f_1(x) log ( f_1(x)/f_0(x) ) dx.

Note that D(f_1 ∥ f_0) ≥ 0, with equality iff f_1 = f_0 almost surely. We will assume that

0 < D(f_1 ∥ f_0) < ∞.
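For the Gaussian mean-change model used in the examples of this chapter, the K-L divergence has the closed form D = (μ_1 − μ_0)²/(2σ²), which a Monte Carlo average of the log likelihood ratio under f_1 reproduces (the parameters below are arbitrary choices made here):

```python
import numpy as np

rng = np.random.default_rng(6)
mu0, mu1, sigma = 0.0, 0.5, 1.0

# Closed form for equal-variance Gaussians: D = (mu1 - mu0)^2 / (2 sigma^2)
d_closed = (mu1 - mu0)**2 / (2 * sigma**2)          # = 0.125 here

# Monte Carlo: D(f1 || f0) = E_{f1}[ log f1(X) - log f0(X) ]
x = rng.normal(mu1, sigma, 200000)
log_lr = ((x - mu0)**2 - (x - mu1)**2) / (2 * sigma**2)
d_mc = log_lr.mean()
```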

Theorem 8 [4,5]

The optimal solution to the Bayesian optimization problem of (6.10) is the Shiryaev algorithm/test, which is described by the stopping time

τ_S = inf{ n ≥ 1 : p_n ≥ A_α },   (6.15)

if the threshold A_α can be chosen such that

PFA(τ_S) = α.   (6.16)

Proof

Towards solving (6.10), we consider a Lagrangian relaxation of this problem, which can be solved using dynamic programming:

J* = inf_τ { PFA(τ) + λ_f ADD(τ) },   (6.17)

where λ_f ≥ 0 is the Lagrange multiplier. It is shown in [4,5] that under the assumption (6.16), there exists a λ_f such that the solution to (6.17) is also the solution to (6.10).

Let image denote the state of the system at time n. After the stopping time image it is assumed that the system enters a terminal state image and stays there. For image, we have image for image, and image otherwise. Then we can write

image

Furthermore, let image denote the stopping decision variable at time n, i.e., image if image and image otherwise. Then the optimization problem in (6.17) can be written as a minimization of an additive cost over time:

image

with

image

Using standard arguments [19] it can be seen that this optimization problem can be solved using infinite horizon dynamic programming with sufficient statistic (belief state) given by:

image

which is the a posteriori probability of (6.12).

The optimal policy for the problem given in (6.17) can be obtained from the solution to the Bellman equation:

image (6.18)

where

image

It can be shown by using an induction argument that both J and image are non-negative concave functions on the interval image, and that image. Then, it is easy to show that the optimal solution for the problem in (6.17) has the following structure:

image

 image

See Figure 6.2a for a plot of image and image as a function of image. Figure 6.2b shows a typical evolution of the optimal Shiryaev algorithm.

image

Figure 6.2 Cost curves and typical evolution of the Shiryaev algorithm. (a) A plot of the cost curves for image. (b) Typical evolution of the Shiryaev algorithm. Threshold image and change point image.

We now discuss some alternative descriptions of the Shiryaev algorithm. Let

image

and

image

We note that Λ_n is the likelihood ratio of the hypotheses “Γ ≤ n” and “Γ > n,” averaged over the change point:

image (6.19)

where, as defined before, L(X_i) = f_1(X_i)/f_0(X_i). Also, R_{n,ρ} is a scaled version of Λ_n:

image (6.20)

Like p_n, the statistic R_{n,ρ} can also be computed using a recursion:

R_{n,ρ} = (1 + R_{n−1,ρ}) L(X_n) / (1 − ρ), with R_{0,ρ} = 0.

It is easy to see that Λ_n and R_{n,ρ} have one-to-one mappings with the Shiryaev statistic p_n.

Algorithm 1 (Shiryaev Algorithm)

The following three stopping times are equivalent and define the Shiryaev stopping time:

image (6.21)

image (6.22)

image (6.23)

with image.
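The one-to-one mapping can be checked numerically. Assuming the scaling R_{n,ρ} = p_n/((1 − p_n)ρ) (an assumption made here; the chapter’s exact normalization may differ), the two recursions track each other exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
rho, theta = 0.01, 0.5
x = rng.normal(theta, 1.0, 200)             # post-change samples, for variety
L = np.exp(theta * x - theta**2 / 2)        # f1/f0 for N(theta, 1) vs N(0, 1)

p, R = 0.0, 0.0
p_path, R_path = [], []
for Ln in L:
    pt = p + (1 - p) * rho                  # Shiryaev recursion (6.13)-(6.14)
    p = pt * Ln / (pt * Ln + (1 - pt))
    R = (1 + R) * Ln / (1 - rho)            # recursion for the scaled statistic
    p_path.append(p)
    R_path.append(R)

p_path, R_path = np.array(p_path), np.array(R_path)

# One-to-one mapping under the assumed scaling R = p / ((1 - p) rho):
p_from_R = rho * R_path / (1 + rho * R_path)
```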

Defining the Shiryaev algorithm using the statistic in (6.19) will be useful in Section 3.06.3.2, where we discuss the Bayesian quickest change detection problem in a non-i.i.d. setting. Also, defining the Shiryaev algorithm using the statistic in (6.20) will be useful in Section 3.06.4, where we discuss quickest change detection in a minimax setting.

3.06.3.2 General asymptotic Bayesian theory

As mentioned earlier, when the observations are not i.i.d. conditioned on the change point image, then finding an exact solution to the problem (6.10) is difficult. Fortunately, a Bayesian asymptotic theory can be developed for quite general pre- and post-change distributions [12]. In this section we discuss the results from [12] and provide a glimpse of the proofs.

We first state the observation model studied in [12]. When the process evolves in the pre-change regime, the conditional density of X_n given X_1^{n−1} is f_0(X_n | X_1^{n−1}). After the change happens, the conditional density of X_n given X_1^{n−1} is given by f_1(X_n | X_1^{n−1}).

As in the i.i.d. case, we can define the a posteriori probability of the change having taken place before time n, given the observations up to time n, i.e.,

p_n = P( Γ ≤ n | X_1^n ),   (6.24)

with the understanding that the recursion (6.13) is no longer valid, except for the i.i.d. model.

We note that in the non-i.i.d. case also image is the likelihood ratio of the hypotheses “image” and “image.” If image, then following (6.19), image can be written for a general change point distribution image as

image

where

image

If image is geometrically distributed with parameter image, the above expression reduces to

image

In fact, image can even be computed recursively in this case:

image (6.25)

with image.

In [12], it is shown that if there exists q such that

image (6.26)

(q = D(f_1 ∥ f_0) for the i.i.d. model), and some additional conditions on the rates of convergence are satisfied, then the Shiryaev algorithm (6.21) is asymptotically optimal for the Bayesian optimization problem of (6.10) as α → 0. In fact, τ_S minimizes all moments of the detection delay, as well as all moments of the delay conditioned on the change point. The asymptotic optimality proof is based on first finding a lower bound on the asymptotic moments of the delay of all the detection procedures in the class C_α, as α → 0, and then showing that the Shiryaev stopping time (6.21) achieves that lower bound asymptotically.

To state the theorem, we need the following definitions. Let q be the limit as specified in (6.26), and let image. Then define

image

Thus, image is the last time that the log likelihood sum image falls outside an interval of length image around q. In general, existence of the limit q in (6.26) only guarantees image, and not the finiteness of the moments of image. Such conditions are needed for existence of moments of detection delay of image. In particular, for some image, we need:

image (6.27)

and

image (6.28)

Now, define

image

The parameter d captures the tail behavior of the distribution of Γ. If Γ is “heavy tailed” then d = 0, and if Γ has an “exponential tail” then d > 0. For example, for the geometric prior with parameter ρ, d = |log(1 − ρ)|.

Theorem 9 [12]

If the likelihood ratios are such that (6.26) is satisfied, then:

1. If A_α = 1 − α, then τ_S as defined in (6.21) belongs to the set C_α.

2. For all m ≥ 1 and all n ≥ 1, the mth moment of the delay conditioned on Γ = n satisfies

inf_{τ ∈ C_α} E_n[ (τ − n)^m | τ ≥ n ] ≥ ( |log α| / (q + d) )^m (1 + o(1)) as α → 0.   (6.29)

3. If (6.27) is satisfied, then for all m ≥ 1 and all n ≥ 1,

E_n[ (τ_S − n)^m | τ_S ≥ n ] ≤ ( |log α| / (q + d) )^m (1 + o(1)) as α → 0.   (6.30)

4. The mth (unconditional) moment of the delay satisfies

inf_{τ ∈ C_α} E[ (τ − Γ)^m | τ ≥ Γ ] ≥ ( |log α| / (q + d) )^m (1 + o(1)) as α → 0.   (6.31)

5. If (6.27) and (6.28) are satisfied, then for all m ≥ 1,

E[ (τ_S − Γ)^m | τ_S ≥ Γ ] ≤ ( |log α| / (q + d) )^m (1 + o(1)) as α → 0.   (6.32)

Proof

We provide sketches of the proofs for part (1), (2), and (3). The proofs of (4) and (5) follow by averaging the results in (2) and (3) over the prior on the change point.

1. Note that

PFA(τ_S) = E[ 1 − p_{τ_S} ] ≤ 1 − A_α.

Thus, A_α = 1 − α would ensure PFA(τ_S) ≤ α. Since p_{τ_S} ≥ A_α by the definition of τ_S, we have the result.

2. Let ε be a positive number. By Chebyshev’s inequality,

image

This gives a lower bound on the detection delay

image

Minimizing over the family image, we get

image

Thus, if

image (6.33)

then image is a lower bound for the detection delay over the family image. It is shown in [12] that if (6.26) is satisfied, then (6.33) is true for image for all image.

3. We only summarize the technique used to obtain the upper bound. Let image be any stochastic process such that

image

Let

image

and for image

image

First note that image. Also, on the set image, image for all image. The event image implies image. Using these observations we have

image

If image, and because image was chosen arbitrarily, we have

image

This also motivates the need for conditions on the finiteness of higher order moments of image to obtain upper bounds on the moments of the detection delay. □

From the above theorem, the following corollary easily follows.

Corollary 1 [12]

If the likelihood ratios are such that (6.26)–(6.28) are satisfied for some m ≥ 1, then for the Shiryaev stopping time τ_S defined in (6.21),

image (6.34)

A similar result can be concluded for the conditional moments as well.

3.06.3.3 Performance analysis for i.i.d. model with geometric prior

We now present the second order asymptotic analysis of the Shiryaev algorithm for the i.i.d. model, provided in [12] using the tools from nonlinear renewal theory introduced in Section 3.06.2.3.

When the observations {X_n} are i.i.d. conditioned on the change point, condition (6.26) is satisfied with

q = D(f_1 ∥ f_0),

where D(f_1 ∥ f_0) is the K-L divergence between the densities f_1 and f_0 (see Definition 4). From Theorem 9, it follows that for m ≥ 1,

inf_{τ ∈ C_α} E[ (τ − Γ)^m | τ ≥ Γ ] ≥ ( |log α| / ( D(f_1 ∥ f_0) + |log(1 − ρ)| ) )^m (1 + o(1)) as α → 0.

Also, it is shown in [12] that if

image

then conditions (6.27) and (6.28) are satisfied, and hence from Corollary 1,

ADD(τ_S) ~ |log α| / ( D(f_1 ∥ f_0) + |log(1 − ρ)| ) as α → 0.   (6.35)

Note that the above statement provides the asymptotic delay performance of the Shiryaev algorithm. However, the bound on the PFA can be quite loose, and the first-order expression for the ADD in (6.35) may not provide a good estimate for the actual ADD if the PFA is not very small. In that case it is useful to have a second-order estimate based on nonlinear renewal theory, as obtained in [12].

First note that, for the i.i.d. model, (6.25) reduces to

R_{n,ρ} = (1 + R_{n−1,ρ}) L(X_n) / (1 − ρ),   (6.36)

with R_{0,ρ} = 0. Now, let Z_n = log R_{n,ρ}, and recall that the Shiryaev stopping time can be written as τ_S = inf{n ≥ 1 : Z_n ≥ b} for an appropriate threshold b. Then, it can be shown that

image (6.37)

Therefore the Shiryaev algorithm can be equivalently written as

image

We now show how the asymptotic overshoot distribution plays a key role in second order asymptotic analysis of image and image. Since, image implies that image, we have,

image

image

Thus,

image

and we see that image is a function of the overshoot when image crosses a from below.

Similarly,

image

Following the developments in Section 3.06.2.3, if the sequence image satisfies some additional conditions, then we can write:

image

It is shown in [12] that {η_n} is a slowly changing sequence, and hence the distribution of Z_{τ_S} − b, as b → ∞, is equal to the asymptotic distribution of the overshoot when the random walk S_n crosses a large positive boundary. We define the following quantities: the asymptotic overshoot distribution of a random walk,

R(x) = lim_{b→∞} P( S_{τ_b} − b ≤ x ),   (6.38)

its mean,

κ = ∫_0^∞ x dR(x),   (6.39)

and its Laplace transform at 1,

ζ = ∫_0^∞ e^{−x} dR(x).   (6.40)

Also, the sequence image satisfies some additional conditions and hence the following results are true.

Theorem 10 [12]

If the log likelihood ratios log L(X_n) are nonarithmetic, then {η_n} is a slowly changing sequence, and by Theorem 7,

image

This implies

image

If in additionimagethen

image

whereimageis the a.s. limit of the sequenceimage.

In Table 6.2, we compare the asymptotic expressions for image and image given in Theorem 10 with simulations. As can be seen in the table, the asymptotic expressions get more accurate as image approaches 0.

Table 6.2

Comparison of the asymptotic expressions for ADD and PFA from Theorem 10 with simulation results.

In Figure 6.3 we plot the image as a function of image for Gaussian observations. For a image constraint of image that is small, image, and

image

giving a slope of image to the trade-off curve.

image

Figure 6.3 ADD-PFA trade-off curve for the Shiryaev algorithm: image.

When D(f_1 ∥ f_0) ≫ |log(1 − ρ)|, the observations contain more information about the change than the prior, and the trade-off slope is roughly 1/D(f_1 ∥ f_0). On the other hand, when D(f_1 ∥ f_0) ≪ |log(1 − ρ)|, the prior contains more information about the change than the observations, and the trade-off slope is roughly 1/|log(1 − ρ)|. The latter asymptotic slope is achieved by the stopping time that is based only on the prior:

image

This is also easy to see from (6.14). With image small, image, and the recursion for image reduces to

image

Expanding, we get image. The desired expression for the mean delay is then obtained from the equation image.
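This prior-only calculation can be reconstructed explicitly. In the sketch below, the symbols $\rho$, $\alpha$, and $p_n$ are our reconstructions of the placeholders above, assuming the standard geometric prior with parameter $\rho$ and a stopping threshold of $1-\alpha$:

```latex
% With the observations ignored, the posterior is driven by the prior alone:
%   p_n = \mathsf{P}(\Gamma \le n) = 1 - (1-\rho)^n .
% Stopping when p_n \ge 1-\alpha gives (1-\rho)^{\tau} \le \alpha, so
\tau_{\mathrm{prior}}
  = \left\lceil \frac{\log \alpha}{\log(1-\rho)} \right\rceil
  \approx \frac{|\log \alpha|}{\rho}
  \quad \text{for small } \rho,
% which exhibits the 1/\rho slope of the trade-off curve noted above.
```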

3.06.4 Minimax quickest change detection

When the distribution of the change point is not known, we may model the change point as a deterministic but unknown positive integer image. A number of heuristic algorithms have been developed in this setting. The earliest work is due to Shewhart [1,2], in which the log likelihood based on the current observation is compared with a threshold to make a decision about the change. The motivation for such a technique is based on the following fact: if X represents the generic random variable for the i.i.d. model with image and image as the pre- and post-change p.d.fs, then

image (6.41)

where as defined earlier image, and image and image correspond to expectations when image and image, respectively. Thus, after image, the log likelihood of the observation X is more likely to be above a given threshold. Shewhart’s method is widely employed in practice due to its simplicity; however, significant performance gain can be achieved by making use of past observations to make the decision about the change. Page [3] proposed such an algorithm that uses past observations, which he called the CuSum algorithm. The motivation for the CuSum algorithm is also based on (6.41). By the law of large numbers, image grows to image as image. Thus, if image is the accumulated log likelihood sum, then before image, image has a negative drift and evolves towards image. After image, image has a positive drift and climbs towards image. Therefore, intuition suggests the following algorithm should detect this change in drift:

image (6.42)

where image. Note that

image

Thus, image can be equivalently defined as follows.

Algorithm 2

(CuSum algorithm)

image (6.43)

where

image (6.44)

The statistic image has the convenient recursion:

image (6.45)

It is this cumulative sum recursion that led Page to call image the CuSum algorithm.
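As an illustration, the recursion (6.45) can be sketched in a few lines of Python. The Gaussian mean-shift model, the change point, and the threshold below are our own illustrative choices, not values from the text:

```python
import numpy as np

def cusum(x, log_lr, b):
    """Run Page's CuSum recursion W_n = max(W_{n-1} + log L(x_n), 0)
    and stop the first time W_n crosses the threshold b.
    Returns (stopping_time, statistic_path); stopping_time is None
    if the threshold is never crossed."""
    w, path = 0.0, []
    for n, xn in enumerate(x, start=1):
        w = max(w + log_lr(xn), 0.0)
        path.append(w)
        if w >= b:
            return n, path
    return None, path

# Example: unit-variance Gaussian mean shift 0 -> 1 at sample 100.
# For these densities, log L(x) = x - 0.5.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 99), rng.normal(1, 1, 100)])
tau, path = cusum(x, lambda xn: xn - 0.5, b=8.0)
```

The statistic hugs zero before the change (negative drift of the log likelihood sum) and climbs roughly linearly after it, which is exactly the change in drift described above.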

The summation on the right hand side (RHS) of (6.44) is assumed to take the value 0 when image. It turns out that one can get an algorithm that is equivalent to the above CuSum algorithm by removing the term image in the maximization on the RHS of (6.44), to get the statistic:

image (6.46)

The statistic image also has a convenient recursion:

image (6.47)

Note that unlike the statistic image, the statistic image can be negative. Nevertheless it is easy to see that both image and image will cross a positive threshold b at the same time (sample path wise) and hence the CuSum algorithm can be equivalently defined in terms of image as:

image (6.48)

An alternative way to derive the CuSum algorithm is through the maximum likelihood approach, i.e., by comparing the likelihood of image against image. Formally,

image (6.49)

Cancelling terms and taking logarithms in (6.49) gives us (6.48) with image.

See Figure 6.4 for a typical evolution of the CuSum algorithm.

image

Figure 6.4 Typical evolution of the CuSum algorithm, with threshold image and change point image.

Although the CuSum algorithm was developed heuristically by Page [3], it was later shown in [6–8,10] that it has very strong optimality properties. In this section, we study the CuSum and related algorithms from a fundamental viewpoint, and discuss how each of these algorithms is provably optimal with respect to a meaningful and useful optimization criterion.

Without a prior on the change point, a reasonable measure of false alarms is the mean time to false alarm, or its reciprocal, which is the false alarm rate (image):

image (6.50)

Finding a uniformly powerful test that minimizes the delay over all possible values of image subject to a image constraint is generally not possible. Therefore it is more appropriate to study the quickest change detection problem in a minimax setting in this case. There are two important minimax problem formulations, one due to Lorden [6] and the other due to Pollak [9].

In Lorden’s formulation, the objective is to minimize the supremum of the average delay conditioned on the worst possible realizations of the observations, subject to a constraint on the false alarm rate. In particular, if we define

image (6.51)

and denote the set of stopping times that meet a constraint image on the image by

image (6.52)

then we have the following problem formulation due to Lorden:

image (6.53)

For the i.i.d. setting, Lorden showed that the CuSum algorithm (6.43) is asymptotically optimal for formulation (6.53) as image. It was later shown in [7,8] that the CuSum algorithm is in fact exactly optimal for (6.53). Although the CuSum algorithm enjoys such a strong optimality property under Lorden’s formulation, it can be argued that image is a somewhat pessimistic measure of delay. A less pessimistic way to measure the delay was suggested by Pollak [9]:

image (6.54)

for all stopping times image for which the expectation is well-defined.

Lemma 1

image

Proof

Since image is a stopping time on image,

image

Therefore, for each n

image

and the lemma follows. image

We now state Pollak’s formulation of the problem that uses image as the measure of delay.

image (6.55)

Pollak’s formulation has been studied in the i.i.d. setting in [9,20], where it is shown that algorithms based on the Shiryaev-Roberts statistic (to be defined later) are within a constant of the best possible performance over the class image, as image.

Lai [10] studied both (6.53) and (6.55) in a non-i.i.d. setting and developed a general minimax asymptotic theory for these problems. In particular, Lai obtained a lower bound on image, and hence also on the image, for every stopping time in the class image, and showed that an extension of the CuSum algorithm for the non-i.i.d. setting achieves this lower bound asymptotically as image.

In Section 3.06.4.1 we introduce a number of alternatives to the CuSum algorithm for minimax quickest change detection in the i.i.d. setting that are based on the Bayesian Shiryaev algorithm. We then discuss the optimality properties of these algorithms in Section 3.06.4.2. While we do not discuss the exact optimality of the CuSum algorithm from [7] or [8], we briefly discuss the asymptotic optimality result from [6]. We also note that the asymptotic optimality of the CuSum algorithm for both (6.53) and (6.55) follows from the results in the non-i.i.d. setting of [10], which are summarized in Section 3.06.4.3.

3.06.4.1 Minimax algorithms based on the Shiryaev algorithm

Recall that the Shiryaev algorithm is given by (see (6.20) and (6.23)):

image

where

image

Also recall that image has the recursion:

image

Setting image in the expression for image we get the Shiryaev-Roberts (SR) statistic [21]:

image (6.56)

with the recursion:

image (6.57)

Algorithm 3

(Shiryaev-Roberts (SR) Algorithm)

image (6.58)

It is shown in [9] that the SR algorithm is the limit of a sequence of Bayes tests, and in that limit it is asymptotically Bayes efficient. Also, it is shown in [20] that the SR algorithm is second order asymptotically optimal for (6.55), as image, i.e., its delay is within a constant of the best possible delay over the class image. Further, in [22], the SR algorithm is shown to be exactly optimal with respect to a number of other interesting criteria.
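For concreteness, the SR recursion (6.57) can be sketched as follows. The likelihood ratio, the change point, and the threshold are hypothetical choices for a Gaussian mean-shift example; starting from a positive initial value r0 would give the SR-r variant described below:

```python
import numpy as np

def shiryaev_roberts(x, lr, threshold, r0=0.0):
    """Shiryaev-Roberts recursion R_n = (1 + R_{n-1}) * L(x_n),
    stopped the first time R_n >= threshold. With r0 = 0 this is the
    SR algorithm; r0 > 0 gives the SR-r variant."""
    r = r0
    for n, xn in enumerate(x, start=1):
        r = (1.0 + r) * lr(xn)
        if r >= threshold:
            return n, r
    return None, r

# Unit-variance Gaussian mean shift 0 -> 1 at sample 100:
# the likelihood ratio is L(x) = exp(x - 0.5).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 99), rng.normal(1, 1, 200)])
tau, _ = shiryaev_roberts(x, lambda xn: np.exp(xn - 0.5), threshold=5000.0)
```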

It is also shown in [9] that a modified version of the SR algorithm, called the Shiryaev-Roberts-Pollak (SRP) algorithm, is third order asymptotically optimal for (6.55), i.e., its delay is within a constant of the best possible delay over the class image, and the constant goes to zero as image. To introduce the SRP algorithm, let image be the quasi-stationary distribution of the SR statistic image above:

image

The new recursion, called the Shiryaev-Roberts-Pollak (SRP) recursion, is given by,

image

with image distributed according to image.

Algorithm 4

(Shiryaev-Roberts-Pollak image Algorithm)

image (6.59)

Although the SRP algorithm is strongly asymptotically optimal for Pollak’s formulation of (6.55), in practice, it is difficult to compute the quasi-stationary distribution image. A numerical framework for computing image efficiently is provided in [23]. Interestingly, the following modification of the SR algorithm with a specifically designed starting point image is found to outperform the SRP procedure uniformly over all possible values of the change point [20]. This modification, referred to as the SR-r procedure, has the recursion:

image

Algorithm 5

(Shiryaev-Roberts-r (SR-r) Algorithm)

image (6.60)

It is shown in [20] that the SR-r algorithm is also third order asymptotically optimal for (6.55), i.e., its delay is within a constant of the best possible delay over the class image, and the constant goes to zero as image.

Note that for an arbitrary stopping time, computing the image metric (6.54) involves taking a supremum over all possible change times, and computing the image metric (6.51) involves a further supremum over all possible past realizations of observations. While we can analyze the performance of the proposed algorithms through bounds and asymptotic approximations, as we will see in Section 3.06.4.2, it is not obvious how one might evaluate image and image for a given algorithm in computer simulations. This is in contrast with the Bayesian setting, where image (see (6.7)) can easily be evaluated in simulations by averaging over realizations of the change point random variable image.

Fortunately, for the CuSum algorithm (6.43) and for the Shiryaev-Roberts algorithm (6.58), both image and image are easy to evaluate in simulations due to the following lemma.

Lemma 2

image (6.61)

image (6.62)

Proof

The CuSum statistic image (see (6.44)) has initial value 0 and remains non-negative for all n. If the change were to happen at some time image, then the pre-change statistic image would be greater than or equal to 0, which is its value when the change happens at image. Therefore, the delay for the CuSum statistic to cross a positive threshold b is largest when the change happens at image, irrespective of the realizations of the observations image. Hence

image

and

image

This proves (6.61). A similar argument can be used to establish (6.62).

Note that the above proof crucially depends on the fact that both the CuSum algorithm and the Shiryaev-Roberts algorithm start with the initial value 0. Thus it is not difficult to see that Lemma 2 does not hold for the SR-r algorithm, unless of course image. Lemma 2 holds partially for the SRP algorithm, since the initial distribution image makes the statistic image stationary in n; as a result, image is the same for every n. However, as mentioned previously, image is difficult to compute in practice, and this makes the evaluation of image and image in simulations somewhat challenging.
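Lemma 2 is what makes minimax performance evaluation tractable in simulation: for the CuSum algorithm, both delay metrics can be estimated by simply starting the change at the first sample, and the mean time to false alarm by running the recursion on pre-change data alone. A hypothetical Monte Carlo sketch for a Gaussian mean shift (all numerical choices are ours):

```python
import numpy as np

def cusum_stopping_time(rng, llr_sampler, b, max_n=100000):
    """Sample one CuSum stopping time: W_n = max(W_{n-1} + Z_n, 0),
    where the Z_n are i.i.d. log-likelihood-ratio increments."""
    w, n = 0.0, 0
    while w < b and n < max_n:
        w = max(w + llr_sampler(rng), 0.0)
        n += 1
    return n

rng = np.random.default_rng(2)
b = 4.0
# Unit-variance Gaussian mean shift 0 -> 1: the LLR increment is X - 0.5.
post = lambda r: r.normal(1.0, 1.0) - 0.5   # change at the first sample
pre = lambda r: r.normal(0.0, 1.0) - 0.5    # no change ever
wadd = np.mean([cusum_stopping_time(rng, post, b) for _ in range(2000)])
mtfa = np.mean([cusum_stopping_time(rng, pre, b) for _ in range(200)])
```

By Lemma 2, the first average estimates both the worst-case (Lorden) and conditional (Pollak) delays, while the reciprocal of the second estimates the false alarm rate.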

3.06.4.2 Optimality properties of the minimax algorithms

In this section we first show that the algorithms based on the Shiryaev-Roberts statistic (SR, SRP, and SR-r) are asymptotically optimal for Pollak’s formulation (6.55). We need an important theorem that is proved in [22].

Theorem 11 [22]

If the threshold in the SR algorithm (6.58) can be selected to meet the constraint image on image with equality, then

image

Proof

We give a sketch of the proof here. Note that

image

and hence is finite for all stopping times for which image and image are finite. The first part of the proof is to show that

image

This follows from the following result of [9]. If

image

with image having the geometric distribution of (6.11) with parameter image, and

image (6.63)

then for a given image (with a given threshold), there exists a sequence image with image and image such that image converges to image as image. Thus, the SR algorithm is the limit of a sequence of Bayes tests. Moreover,

image

By (6.63), for any stopping time image, it holds that

image

Now by taking the limit image on both sides, using the fact that for any stopping time image [22]

image

and using the hypothesis of the theorem that the image constraint can be met with equality by using image, we have the desired result.

The next step in the proof is to show that it is enough to consider stopping times in the class image that meet the constraint on image with equality. The result then follows easily from the fact that image is optimal with respect to the numerator in image. image

We now state the optimality result for the procedures SR, SR-r, and SRP. We only provide an outline of the proof to illustrate the fundamental ideas behind the result.

Theorem 12 [20]

If image, and image is nonarithmetic, then

1. 

image (6.64)

where image is a constant that can be characterized using renewal theory [18].

2. For the choice of threshold image, and

image (6.65)

where image is again a constant that can be characterized using renewal theory [18].

3. There exists a choice for the threshold image such that image and

image (6.66)

4. There exists a choice for the threshold image such that image and a choice for the initial point r such that

image (6.67)

Proof

To prove that the above choices of thresholds meet the image constraint, we note that image is a martingale; specifically, image is a martingale and the conditions of Theorem 3 are satisfied [24]. Hence,

image

Since image, for image we have

image

For a description of how to set the thresholds for image and image, we refer the reader to [20].

The proofs of the delay expressions for all of the algorithms have a common theme. The first part of these proofs is based on Theorem 11: we first show that image is a lower bound on image.

image

From Theorem 11, since image is optimal with respect to minimizing image, we have

image

The next step is to use nonlinear renewal theory (see Section 3.06.2.3) to obtain a second order approximation for the right hand side of the above equation, as we did for the Shiryaev procedure in Section 3.06.3.3:

image

The final step is to show that the image for the SR-r and SRP procedures is within image, and the image for the SR procedure is within image, of this lower bound (6.64). This is done by obtaining second order approximations, using nonlinear renewal theory, for the image of the SR, SRP, and SR-r procedures, yielding (6.65)–(6.67), respectively.

It is shown in [22] that image is also equivalent to the average delay when the change happens at a “far horizon”: image. Thus, the SR algorithm is also optimal with respect to that criterion.

The following corollary follows easily from the above two theorems. Recall that in the minimax setting, an algorithm is called third order asymptotically optimum if its delay is within an image term of the best possible, as the image goes to zero; it is called second order asymptotically optimum if its delay is within an image term of the best possible; and it is called first order asymptotically optimal if the ratio of its delay to the best possible goes to 1, as the image goes to zero.

Corollary 2

Under the conditions of Theorem 11, the SR-r (6.60) and the SRP (6.59) algorithms are third order asymptotically optimum, and the SR algorithm is second order asymptotically optimum, for Pollak’s formulation (6.55). Furthermore, using the choice of thresholds specified in Theorem 12 to meet the image constraint of image, the asymptotic performance of all three algorithms is equal up to first order:

image

Furthermore, by Lemma 2, we also have:

image

In [6] the asymptotic optimality of the CuSum algorithm (6.43) as image is established for Lorden’s formulation of (6.53). First, ergodic theory is used to show that choosing the threshold image ensures image. For the above choice of threshold image, it is shown that

image

Then the exact optimality of the SPRT [25] is used to find a lower bound on the image of the class image:

image

These arguments establish the first order asymptotic optimality of the CuSum algorithm for Lorden’s formulation. Furthermore, as we will see later in Theorem 13, Lai [10] showed that:

image

Thus by Lemma 2, the first order asymptotic optimality of the CuSum algorithm extends to Pollak’s formulation (6.55), and we have the following result.

Corollary 3

The CuSum algorithm (6.43) with threshold image is first order asymptotically optimum for both Lorden’s and Pollak’s formulations. Furthermore, image

image

In Figure 6.5, we plot the trade-off curve for the CuSum algorithm, i.e., plot image as a function of image. Note that the curve has a slope of image.

image

Figure 6.5 ADD-PFA trade-off curve for the CuSum algorithm: image.

3.06.4.3 General asymptotic minimax theory

In [10], the non-i.i.d. setting discussed earlier in Section 3.06.3.2 is considered, and asymptotic lower bounds on the image and image for stopping times in image are obtained under quite general conditions. It is then shown that an extension of the CuSum algorithm (6.43) to this non-i.i.d. setting achieves this lower bound asymptotically as image.

Recall the non-i.i.d. model from Section 3.06.3.2. When the process evolves in the pre-change regime, the conditional density of image given image is image. After the change happens, the conditional density of image given image is given by image. Also

image

The CuSum algorithm can be generalized to the non-i.i.d. setting as follows.

Algorithm 6

(Generalized CuSum algorithm) Let

image

Then the stopping time for the generalized CuSum is

image (6.68)

Then the following result is proved in [10].

Theorem 13

If there exists a positive constant I such that the images satisfy the following condition

image (6.69)

then, as image,

image (6.70)

Further

image

and under the additional condition of

image (6.71)

we have

image

Thus, if we set image, then

image

and

image

which is asymptotically equal to the lower bound in (6.70) up to first order. Thus image is first-order asymptotically optimum within the class image of tests that meet the image constraint of image.

Proof

We only provide a sketch of the proof of the lower bound, which is based on Chebyshev’s inequality. The fundamental idea is to use Chebyshev’s inequality to obtain a lower bound on the delay of an arbitrary stopping time image from image, such that the lower bound is not a function of image. The lower bound obtained is then a lower bound on the image for the entire family image.

Let image and image be positive constants. To show that

image

it is enough to show that there exists n such that

image

This n is obtained from the following condition. Let m be any positive integer. Then if image, there exists n such that

image (6.72)

We use the n that satisfies the condition (6.72). Then, by Chebyshev’s inequality

image

We can then write

image

It remains to be shown that image uniformly over the family image. To show this, we condition on image.

image

The trick now is to use the hypothesis of the theorem and choose values of image and image such that the two terms on the right-hand side of the above equation are bounded by terms that go to zero and do not depend on the stopping time image. We can write

image

In [10], it is shown that if we choose

image

and

image

then the condition (6.69) ensures that the above probability goes to zero. The other term goes to zero by using a change of measure argument and using condition (6.72):

image

 image

3.06.5 Relationship between the models

We have discussed the Bayesian formulation of the quickest change detection problem in Section 3.06.3 and the minimax formulations in Section 3.06.4. The choice between the Bayesian and minimax formulations is governed by whether the distribution of image is known. However, it is not obvious which of the two minimax formulations should be chosen for a given application. As noted earlier, the formulations of Lorden and Pollak are equivalent for low image constraints, but differences arise for moderate values of the image constraint. Recent work by Moustakides [26] has connected these three formulations and found possible interpretations for each of them. We summarize the result below.

Consider a model in which the change point is dependent on the stochastic process. That is, the probability that change happens at time n depends on image. Let

image

The distribution of image thus belongs to a family of distributions. Assume that while we are trying to find a suitable stopping time image to minimize the delay, an adversary is searching for a process image such that the delay for any stopping time is maximized. That is, the adversary is trying to solve

image

It is shown in [26] that if the adversary has access to the observation sequence, and uses this information to choose image, then image becomes the delay expression in Lorden’s formulation (6.53) for a given image, i.e.,

image

However, if we assume that the adversary does not have access to the observations, but only has access to the test performance, then image is equal to the delay in Pollak’s formulation (6.55), i.e.,

image

Finally, the delay for the Shiryaev formulation (6.10) corresponds to the case when image is restricted to only one possibility, the distribution of image.

3.06.6 Variants and generalizations of the quickest change detection problem

In the previous sections we reviewed the state-of-the-art for quickest change detection in a single sequence of random variables, under the assumption of complete knowledge of the pre- and post-change distributions. In this section we review three important variants and extensions of the classical quickest change detection problem, where significant progress has been made. We discuss other variants of the change detection problem as part of our future research section.

3.06.6.1 Quickest change detection with unknown pre- or post-change distributions

In the previous sections we discussed quickest change detection problem when both the pre- and post-change distributions are completely specified. Although this assumption is a bit restrictive, it helped in obtaining recursive algorithms with strong optimality properties in the i.i.d. setting, and allowed the development of asymptotic optimality theory in a very general non-i.i.d. setting. In this section, we provide a review of the existing results for the case where this assumption on the knowledge of the pre- and post-change distributions is relaxed.

Often it is reasonable to assume that the pre-change distribution is known. This is the case when changes occur rarely and/or the decision maker has opportunities to estimate the pre-change distribution parameters before the change occurs. When the post-change distribution is unknown but the pre-change distribution is completely specified, the post-change uncertainty may be modeled by assuming that the post-change distribution belongs to a parametric family image. Approaches for designing algorithms in this setting include the generalized likelihood ratio (GLR) approach and the mixture approach. In the GLR approach, at each time step, all of the past observations are used to obtain a maximum likelihood estimate of the post-change parameter image, and the post-change distribution corresponding to this estimate of image is then used to compute the CuSum, Shiryaev-Roberts, or related statistics. In the mixture approach, a prior is assumed on the space of parameters, and the statistics (e.g., CuSum or Shiryaev-Roberts), computed as a function of the post-change parameter, are integrated over the assumed prior.

In the i.i.d. setting, this problem is studied in [6] for the case when the post-change p.d.f. belongs to a single-parameter exponential family image, and the following generalized likelihood ratio (GLR) based algorithm is proposed:

image (6.73)

If image, and image is of the order of image, then it is shown in [6] that as the FAR constraint image,

image

and for each post-change parameter image,

image

Here, the notation image is used to denote the image when the post-change observations have p.d.f. image. Note that image is the best one can do asymptotically for a given FAR constraint of image, even when the exact post-change distribution is known. Thus, this GLR scheme is uniformly asymptotically optimal with respect to Lorden’s criterion (6.53).

In [27], the post-change distribution is assumed to belong to an exponential family of p.d.f.s as above, but the GLR based test is replaced by a mixture based test. Specifically, a prior image is assumed on the range of image, and the following test is proposed:

image (6.74)

For the above choice of threshold, it is shown that

image

and for each post-change parameter image, if image is in a set over which image has a positive density, as the FAR constraint image,

image

Thus, this mixture based test is also uniformly asymptotically optimal with respect to Lorden’s criterion.

Although the GLR and mixture based tests discussed above are efficient, they do not have an equivalent recursive implementation, as the CuSum and Shiryaev-Roberts tests do when the post-change distribution is known. As a result, to implement the GLR or mixture based tests, we need to use the entire past information image, which grows with n. In [10], asymptotically optimal sliding-window based GLR and mixture tests are proposed that use only a finite number of past observations. The window size has to be chosen carefully and is a function of the FAR. Moreover, the results here are valid even for the non-i.i.d. setting discussed earlier in Section 3.06.4.3. Recall the non-i.i.d. model from Section 3.06.3.2, with the pre-change conditional p.d.f. given by image, and the post-change conditional p.d.f. given by image. Then the generalized CuSum (6.68) GLR and mixture based algorithms are given, respectively, by:

image (6.75)

and

image (6.76)

It is shown that for a suitable choice of image and image, under some conditions on the likelihood ratios and on the distribution G, both of these tests satisfy the FAR constraint of image. In particular,

image

and as image,

image

Moreover, under some conditions on the post-change parameter image,

image

Thus, under the conditions stated, the above window-limited tests are also asymptotically optimal. We remark that, although finite-window tests are adequate in the i.i.d. setting here, in the non-i.i.d. setting we still need to store the entire history of observations to compute the conditional densities, unless the dependence is finite-order Markov. See [10,28] for details.
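To make the mixture approach concrete, here is a sketch of a (non-window-limited) mixture scan for an i.i.d. Gaussian mean-shift model, with a discrete prior standing in for the mixing distribution G. The grid of parameter values, the weights, and the threshold are our own illustrative choices:

```python
import numpy as np

def mixture_scan(x, thetas, weights, b):
    """Mixture-based scan for an i.i.d. Gaussian mean shift 0 -> theta
    (unit variance): for each candidate change point k, the likelihood
    ratio is averaged over a discrete prior on theta; stop when the best
    candidate's log mixture likelihood ratio exceeds b."""
    x = np.asarray(x, dtype=float)
    thetas = np.asarray(thetas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    for n in range(1, len(x) + 1):
        s = np.cumsum(x[:n][::-1])[::-1]   # S_k = x_k + ... + x_n
        m = np.arange(n, 0, -1)            # number of terms n - k + 1
        # log-LR for each (k, theta): theta*S_k - m*theta^2/2
        llr = np.outer(s, thetas) - 0.5 * np.outer(m, thetas ** 2)
        stat = np.log(np.exp(llr) @ weights).max()
        if stat >= b:
            return n
    return None

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 80), rng.normal(1, 1, 60)])
thetas = np.linspace(0.25, 2.0, 8)
tau = mixture_scan(x, thetas, np.full(8, 1 / 8), b=9.0)
```

As the text notes, the statistic uses the entire history (the per-step cost grows with n); a window-limited variant would restrict k to the most recent samples.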

To detect a change in mean of a sequence of Gaussian observations in an i.i.d. setting, i.e., when image and image, the GLR rule discussed above (6.75) (with image and image) reduces to

image (6.77)

This test is studied in [29], where expressions for its performance, i.e., for image and image, are obtained. In [28], it is shown that a window-limited modification of the above scheme (6.77) can also be designed.
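The Gaussian mean-change GLR scan can be sketched as follows. This is a full-history version rather than the window-limited one, and the minimum shift magnitude theta_min, the threshold, and the data are our own illustrative choices:

```python
import numpy as np

def glr_gaussian(x, b, theta_min=0.1):
    """GLR scan for a mean change 0 -> theta (unit variance, theta
    unknown but assumed >= theta_min): stop at the first n for which
    max_k sup_{theta >= theta_min} sum_{i=k}^n [theta*x_i - theta^2/2]
    exceeds the threshold b."""
    x = np.asarray(x, dtype=float)
    for n in range(1, len(x) + 1):
        s = np.cumsum(x[:n][::-1])[::-1]   # S_k = x_k + ... + x_n
        m = np.arange(n, 0, -1)            # number of terms n - k + 1
        # constrained MLE of theta for each candidate change point k
        theta_hat = np.maximum(s / m, theta_min)
        g = theta_hat * s - 0.5 * m * theta_hat ** 2
        if g.max() >= b:
            return n
    return None

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 80), rng.normal(1, 1, 60)])
tau = glr_gaussian(x, b=10.0)
```

Because the log likelihood is concave in theta, the constrained supremum is attained at the clipped sample mean, which is what `theta_hat` computes.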

When both the pre- and post-change distributions are unknown, GLR based or mixture based tests have again been studied and their asymptotic properties characterized. In [30], this problem is studied in the i.i.d. setting (Bayesian and non-Bayesian), when both the pre- and post-change distributions belong to an exponential family. For the Bayesian setting, the change point is assumed to be geometric, and there are priors (again from an exponential family) on both the pre- and post-change parameters. GLR and mixture based tests are proposed that have asymptotic optimality properties. For a survey of other algorithms for the case where both the pre- and post-change distributions are unknown, see [28].

In [31], rather than developing procedures that are uniformly asymptotically optimal for each possible post-change distribution, robust tests are characterized for the case when the pre- and post-change distributions belong to uncertainty classes. The objective is to find a stopping time that satisfies a given false alarm constraint (probability of false alarm or the image, depending on the formulation) for each possible pre-change distribution, and minimizes the detection delay (the Shiryaev, Lorden, or Pollak version) supremized over each possible pair of pre- and post-change distributions. It is shown that, under some conditions on the uncertainty classes, the least favorable distribution from each class can be obtained, and the robust test is the classical test designed according to the least favorable distribution pair. It is shown in [31] that although the robust test is not uniformly asymptotically optimal, it can be significantly better than the GLR based test of [6] for some parameter ranges and for moderate values of image. The robust solution also has the added advantage that it can be implemented in a simple recursive manner, while the GLR test does not admit a recursive solution in general, and may require the solution of a complex nonconvex optimization problem at every time instant.

3.06.6.2 Data-efficient quickest change detection

In the classical problem of quickest change detection that we discussed in Section 3.06.3, a change in the distribution of a sequence of random variables has to be detected as soon as possible, subject to a constraint on the probability of false alarm. Based on the past information, the decision maker has to decide whether to stop and declare change or to continue acquiring more information. In many engineering applications of quickest change detection there is a cost associated with acquiring information or taking observations, e.g., cost associated with taking measurements, or cost of batteries in sensor networks, etc. (see [32] for a detailed motivation and related references). In [32], Shiryaev’s formulation (Section 3.06.3) is extended by including an additional constraint on the cost of observations used in the detection process. The observation cost is captured through the average number of observations used before the change point, with the understanding that the cost of observations after the change point is included in the delay metric.

In order to minimize the average number of observations used before image, at each time instant a decision is made on whether to use the observation in the next time step, based on all the available information. Let image, with image if it has been decided to take the observation at time n, i.e., image is available for decision making, and image otherwise. Thus, image is an on-off (binary) control input based on the information available up to time image, i.e.,

image

with image denoting the control law and I defined as:

image

Here, image represents image if image, otherwise image is absent from the information vector image.

Let image represent a policy for data-efficient quickest change detection, where image is a stopping time on the information sequence image. The objective in [32] is to solve the following optimization problem:

image (6.78)

image

Here, image and image stand for average detection delay, probability of false alarm and average number of observations used, respectively, and image and image are given constraints.

Define,

image

Then, the two-threshold algorithm from [32] is:

Algorithm 7

(DE-Shiryaev: image) Start with image and use the following control, with image, for image:

image (6.79)

The probability image is updated using the following recursions:

image

with image.
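A rough sketch of this two-threshold control in code, as we read the description above: the geometric prior update p + (1-p)*rho is applied at every step, the Bayes (likelihood ratio) update only when the observation is taken, and the next observation is taken only if the current probability is at least B. This is our own reconstruction, and all parameter values are illustrative:

```python
import numpy as np

def de_shiryaev(x, lr, rho, A, B):
    """Two-threshold DE-Shiryaev sketch: p_n is advanced by the geometric
    prior at every step; the observation at time n is used only if the
    previous probability was >= B; declare the change when p_n >= A.
    Returns (stopping_time, number_of_observations_used)."""
    p, used = 0.0, 0
    for n, xn in enumerate(x, start=1):
        p_tilde = p + (1.0 - p) * rho            # prior (geometric) update
        if p >= B:                               # take this observation
            used += 1
            num = p_tilde * lr(xn)
            p = num / (num + (1.0 - p_tilde))    # Bayes update
        else:                                    # skip it
            p = p_tilde
        if p >= A:
            return n, used
    return None, used

# Unit-variance Gaussian mean shift 0 -> 1 at sample 50.
rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1, 49), rng.normal(1, 1, 250)])
tau, used = de_shiryaev(x, lambda xn: np.exp(xn - 0.5), rho=0.01,
                        A=0.99, B=0.2)
```

In this sketch the probability drifts up under the prior while observations are skipped, hovers near B before the change (observed samples tend to push it back down), and climbs to A after the change, mirroring the evolution depicted in Figure 6.6a.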

With image, the DE-Shiryaev algorithm reduces to the Shiryaev algorithm. When the DE-Shiryaev algorithm is employed, the a posteriori probability image typically evolves as depicted in Figure 6.6a. It is shown in [32] that for a fixed image, as image:

image (6.80)

and

image (6.81)

Here, image is the asymptotic overshoot distribution of the random walk image, when it crosses a large positive boundary. It is shown in [12] that these are also the performance expressions for the Shiryaev algorithm. Thus, the image and image of the DE-Shiryaev algorithm approach that of the Shiryaev algorithm as image, i.e., the DE-Shiryaev algorithm is asymptotically optimal.

image

Figure 6.6 Evolution and performance of the DE-Shiryaev algorithm. (a) Evolution of image for image, and image, with thresholds image and image. (b) Trade-off curves comparing performance of the DE-Shiryaev algorithm with the Fractional Sampling scheme when image is chosen to achieve image equal to 50% of mean time to change. Also image, and image.

The DE-Shiryaev algorithm is also shown to have good delay-observation cost trade-off curves: for moderate values of the probability of false alarm, for Gaussian observations, the delay of the algorithm is within 10% of the Shiryaev delay even when the observation cost is reduced by more than 50%. Furthermore, the DE-Shiryaev algorithm is substantially better than the standard fractional sampling scheme, in which the Shiryaev algorithm is used and the observations to be skipped are determined a priori in order to meet the observation constraint (see Figure 6.6b).

In most practical applications, prior information about the distribution of the change point is not available. As a result, the Bayesian solution is not directly applicable. For the classical quickest change detection problem, an algorithm for the non-Bayesian setting was obtained by taking the geometric parameter of the prior on the change point to zero (see Section 3.06.4.1). Such a technique cannot be used in the data-efficient setting. This is because when an observation is skipped in the DE-Shiryaev algorithm in [32], the a posteriori probability is updated using the geometric prior. In the absence of prior information about the distribution of the change point, it is by no means obvious what the right substitute for the prior is. A useful way to capture the cost of observations in a minimax setting is also needed.

In [33], the minimax formulation of [9] is used to propose a minimax formulation for data-efficient quickest change detection. We observe that in the two-threshold algorithm image, when the change occurs at a far horizon, it is the fraction of observations taken before the change that is controlled. This insight is used to formulate a duty cycle based metric to capture the cost of taking observations before the change point. We also note that the duration for which observations are not taken in the algorithm in [32] is a function of the undershoot of the a posteriori probability when it goes below the threshold B. Given that image for the DE-Shiryaev algorithm has the interpretation of the likelihood ratio of the hypotheses “image” and “image,” the undershoots essentially carry the information on the likelihood ratio. It is shown in [33] that this insight can be used to design a good test in the non-Bayesian setting. The resulting algorithm is called the DE-CuSum algorithm, and it is shown to inherit the good properties of the DE-Shiryaev algorithm: the DE-CuSum algorithm is asymptotically optimal in a sense similar to (6.80) and (6.81), has good trade-off curves, and performs significantly better than the standard approach of fractional sampling.
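To make the undershoot-driven skipping mechanism concrete, the following Python sketch shows a rule in the spirit of the DE-CuSum algorithm. It is illustrative only: the exact recursion is given in [33], and the truncation parameter `h`, the deterministic climb rate `mu`, and the function names here are assumptions made for the sketch.

```python
def de_cusum(observations, llr, A, h, mu):
    """Hedged sketch of an undershoot-driven skipping rule in the
    spirit of the DE-CuSum algorithm of [33].

    When W >= 0 the observation is used and W evolves like a CuSum
    statistic, with the undershoot truncated at -h; when W < 0 the
    observation is skipped and W climbs deterministically at rate mu,
    so the skip duration is proportional to the undershoot. With h = 0
    the statistic never goes negative and the recursion reduces to the
    standard CuSum recursion. Returns (stopping time, samples used).
    """
    W, used = 0.0, 0
    for n, x in enumerate(observations, start=1):
        if W >= 0.0:
            used += 1                       # observation taken
            W = max(W + llr(x), -h)         # truncated CuSum update
        else:
            W = min(W + mu, 0.0)            # skip; deterministic climb
        if W >= A:
            return n, used                  # stop: declare the change
    return None, used
```

For a unit-variance Gaussian mean shift of size 1, the log-likelihood ratio is llr(x) = x - 1/2; pre-change samples drive W negative, observations are then skipped while W recovers, and post-change samples drive W to the stopping threshold.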

3.06.6.3 Distributed sensor systems

In the previous sections, we provided a summary of existing work on quickest change detection and classification in the single sensor (equivalently, centralized multisensor) setting. For the problem of detecting biological and chemical agents, the setting that is more relevant is one where there is a set of distributed sensors collecting the data relevant for detection, as shown in Figure 6.7. Based on the observations that the sensors receive, they send messages (which could be local decisions, but not necessarily) to a fusion center where a final decision about the hypothesis or change is made.

image

Figure 6.7 Change detection using distributed sensors.

Since the information available for detection is distributed across the sensors in the network, these detection problems fall under the umbrella of distributed (or decentralized) detection, except in the impractical setting where all the information available at the sensors is immediately available without any errors at the fusion center. Such decentralized decision making problems are extremely difficult. Without certain conditional independence assumptions across sensors, the problem of finding the optimal solutions, even in the simplest case of static binary hypothesis testing, is computationally intractable [34–39]. Decentralized dynamic decision making problems, of which the quickest change detection problem is a special case, are even more challenging because the information pattern in these problems is non-classical, i.e., the different decision makers have access to different pieces of information [40].

The problem of decentralized quickest change detection in distributed sensor systems was introduced in [41], and is described as follows. Consider the distributed multisensor system with L sensors communicating with a fusion center shown in Figure 6.7. At time n, an observation image is made at sensor image. The changes in the statistics of the sequences image are governed by the change event. Based on the information available at time n, a message image is sent from sensor image to the fusion center. The fusion center may feed back control signals to the sensors based on all the messages it has received so far. For example, the fusion center might inform the sensors how to update their local decision thresholds. The final decision about the change is made at the fusion center.

There are two main approaches to generating the messages at the sensors. In the first approach, the sensors can be thought of as simply quantizing their observations, i.e., image is simply a quantized version of image. The goal then is to choose these quantizers over time and across the sensors, along with a decision rule at the fusion center, to provide the best tradeoff between detection delay and false alarms. In the second approach, the sensors perform local change detection, using all of their observations, and the fusion center combines these decisions to make the final decision about the change.

The simplest observation model for the decentralized setting is one where the sensors are affected by the change at the same time, and the observations are i.i.d. in time at each sensor and independent across sensors in both the pre-change and post-change modes. This model was introduced in [41], and studied in a Bayesian setting with a geometric prior on the change point for the case of quantization at the sensors. It was shown that, unlike the centralized problem for which the Shiryaev test is optimal (see Section 3.06.3), in the decentralized setting the optimization problem is intractable even for this simple observation model. Some progress can be made if we allow for feedback from the fusion center [41]. Useful results can be obtained in the asymptotic setting where the probability of false alarm (Bayesian formulation) or the false alarm rate (minimax formulation) goes to zero. These results can be summarized as follows (see, e.g., [12,42,43] for more details):

• It is asymptotically optimum for the sensors to use stationary monotone likelihood ratio quantizers, i.e., the sensors use the same quantization functions at all times, and the quantization regions are obtained by dividing the likelihood ratio of the observations into intervals and assigning the quantization levels to them in increasing order.

• The optimum quantization thresholds at the sensors are chosen to maximize the K-L divergence between the post-change and pre-change distributions at the sensors.

• For fixed stationary quantizers at the sensors, the fusion center is faced with a centralized quickest change detection problem. Therefore, depending on the optimization criterion (Bayes or minimax), asymptotically optimum change detection procedures can be designed using the techniques described in Sections 3.06.3 and 3.06.4.

• The tradeoff between delay and false alarms is governed by the K-L divergence of the quantized observations at the output of the sensors, and hence the first order asymptotic performance with quantization is at best equal to that without quantization.
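The second and fourth bullets above can be illustrated numerically. The sketch below (an illustrative assumption, not code from the cited references) grid-searches the threshold of a binary quantizer for a unit-variance Gaussian mean shift of size mu, and confirms that the K-L divergence of the quantized output is strictly smaller than the unquantized K-L divergence, which equals mu^2/2 in this case.

```python
import math
from statistics import NormalDist

def kl_bernoulli(q1, q0):
    """K-L divergence D(Bernoulli(q1) || Bernoulli(q0))."""
    return q1 * math.log(q1 / q0) + (1 - q1) * math.log((1 - q1) / (1 - q0))

# Binary quantizer M = 1{X > t}; pre-change X ~ N(0,1), post-change X ~ N(mu,1).
mu = 1.0
pre, post = NormalDist(0.0, 1.0), NormalDist(mu, 1.0)

best_t, best_kl = None, -1.0
for i in range(-300, 301):
    t = i / 100.0
    q0 = 1.0 - pre.cdf(t)        # P(M = 1) before the change
    q1 = 1.0 - post.cdf(t)       # P(M = 1) after the change
    if 0.0 < q0 < 1.0 and 0.0 < q1 < 1.0:
        kl = kl_bernoulli(q1, q0)
        if kl > best_kl:
            best_t, best_kl = t, kl
# best_kl is the largest post- vs. pre-change divergence achievable
# with one bit per observation; it is strictly below mu*mu/2, the
# divergence of the raw Gaussian observations.
```

The maximizing threshold sits between the two means, and the gap between `best_kl` and mu^2/2 quantifies the first order performance loss due to one-bit quantization.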

For the case where the sensors make local decisions about the change, it is reasonable to assume that the local detection procedures use (asymptotically) optimum centralized (single sensor) statistics. For example, in the Bayesian setting the Shiryaev statistic described in Algorithm 1 can be used, and in the minimax setting one of the statistics described in Section 3.06.4.1 can be used, depending on the specific minimax criterion. The more interesting aspect of the decision making here is the fusion rule used to combine the individual sensor decisions. There are three basic options that can be considered [43]:

• image: the fusion center stops and declares the change as soon as one of the sensors’ statistics crosses its decision threshold.

• image: the sensors stop taking observations once their local statistics cross their thresholds, and the fusion center stops and declares the change after all sensors have stopped.

• image: the sensors continue taking observations even if their local statistics cross their thresholds, and the fusion center stops and declares the change after all the sensor statistics are above their local thresholds simultaneously.
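A minimal sketch of these three rules, with hypothetical helper names and a common local threshold, runs a CuSum recursion at each sensor and reads off the three stopping times. One set of recursions suffices, because a sensor's first crossing time is the same whether or not its statistic keeps updating afterwards.

```python
def fusion_stopping_times(streams, llr, h):
    """Compute the three fusion-rule stopping times for L sensors.

    streams: list of per-sensor observation lists of equal length.
    llr: log-likelihood ratio x -> log(f1(x)/f0(x)), common to sensors.
    h: common local threshold.
    Returns (tau_min, tau_max, nu_max); an entry is None if that rule
    never stops within the data.
      tau_min: first time ANY local statistic crosses h.
      tau_max: first time ALL sensors have crossed h at some point.
      nu_max:  first time ALL statistics are above h simultaneously.
    """
    L = len(streams)
    W = [0.0] * L                    # local CuSum statistics
    ever = [False] * L               # has sensor l ever crossed h?
    tau_min = tau_max = nu_max = None
    for n in range(len(streams[0])):
        for l in range(L):
            W[l] = max(W[l] + llr(streams[l][n]), 0.0)
            ever[l] = ever[l] or W[l] >= h
        if tau_min is None and any(ever):
            tau_min = n + 1
        if tau_max is None and all(ever):
            tau_max = n + 1
        if nu_max is None and all(w >= h for w in W):
            nu_max = n + 1
    return tau_min, tau_max, nu_max
```

By construction tau_min <= tau_max <= nu_max whenever all three are defined, which matches the intuition that the first rule is the most aggressive and the third the most conservative.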

It was first shown in [44] that the image procedure using CuSum statistics at the sensors is globally first order asymptotically optimal under Lorden’s criterion (6.53) if the sensor thresholds are chosen appropriately. That is, the first order performance is the same as that of the centralized procedure that has access to all the sensor observations. A more detailed analysis of the minimax setting was carried out in [45], in which procedures based on using CuSum and Shiryaev-Roberts statistics at the sensors were studied under Pollak’s criterion (6.55). It was again shown that image is globally first order asymptotically optimal, and that image and image are not.

For the Bayesian case, if the sensors use Shiryaev statistics, then both image and image can be shown to be globally first order asymptotically optimal, with an appropriate choice of sensor thresholds [43,46]. The procedure image does not share this asymptotic optimality property.

Interestingly, while tests based on local decision making at the sensors can be shown to have the same first order performance as the centralized test, simulations reveal that these asymptotics “kick in” at unreasonably low values of false alarm probability (rate). In particular, even schemes based on binary quantization at the sensors can perform better than the globally asymptotically optimum local decision based tests at moderate values of false alarm probability (rate) [43]. These results point to the need for further research on procedures based on local detection at the sensors that provide good performance at moderate levels of false alarms.

3.06.6.4 Variants of quickest change detection problem for distributed sensor systems

In Section 3.06.6.3, it is assumed for the decentralized quickest change detection problem that the change affects all the sensors in the system simultaneously. In many practical systems it is reasonable to assume that the change will be seen by only a subset of the sensors. This problem can be modeled as quickest change detection with an unknown post-change distribution, with a finite number of possibilities. A GLRT based approach can of course be used, in which multiple CuSum tests are run in parallel, one for each possible post-change hypothesis. But this can be quite expensive from an implementation viewpoint. In [47], a CuSum based algorithm is proposed in which a CuSum test is employed at each sensor, the CuSum statistic is transmitted from the sensors to the fusion center, and at the fusion center the CuSum statistics from all the sensors are added and compared with a threshold. This test has much lower computational complexity than the GLRT based test, and is shown to be asymptotically as good as the centralized CuSum test, as the image goes to zero. Although this test is asymptotically optimal, the noise from the sensors not affected by the change can degrade the performance for moderate values of the false alarm rate. In [48], the work of [47] is extended to the case where information is transmitted from the sensors only when the CuSum statistic at each sensor is above a certain threshold. It is shown that this has the surprising effect of suppressing the unwanted noise and improving the performance. In [49], it is proposed that a soft-thresholding function be used to suppress these noise terms, and a GLRT based algorithm is proposed to detect the presence of a stationary intruder (with unknown position) in a sensor network with Gaussian observations. A similar formulation is described in [50].
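The sum rule of [47] described above can be sketched in a few lines; the function name and the common log-likelihood ratio `llr` are illustrative assumptions for the sketch.

```python
def sum_cusum_stop(streams, llr, threshold):
    """Fusion by summing per-sensor CuSum statistics, in the spirit
    of the scheme in [47]: each sensor runs its own CuSum recursion,
    and the fusion center adds the statistics and stops the first
    time the sum crosses the threshold.
    """
    W = [0.0] * len(streams)
    for n in range(len(streams[0])):
        for l, s in enumerate(streams):
            W[l] = max(W[l] + llr(s[n]), 0.0)   # local CuSum update
        if sum(W) >= threshold:
            return n + 1                        # declare the change
    return None
```

Note that sensors unaffected by the change contribute statistics that hover near zero, which is harmless asymptotically but, as discussed above, degrades performance at moderate false alarm rates; censoring the transmissions as in [48] mitigates this.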

The Bayesian decentralized quickest change detection problem under an additional constraint on the cost of observations used is studied in [51]. The cost of observations is captured through the average number of observations used until the stopping time and it is shown that a threshold test similar to the Shiryaev algorithm is optimal. Recently, this problem has been studied in a minimax setting in [52] and asymptotically minimax algorithms have been proposed. Also, see [53,54] for other interesting energy-efficient algorithms for quickest change detection in sensor networks.

3.06.7 Applications of quickest change detection

As mentioned in the introduction, the problem of quickest change detection has a variety of applications. A complete list of references to applications of quickest change detection can be quite overwhelming and therefore we only provide representative references from some of the areas. For a detailed list of references to applications in areas such as climate modeling, econometrics, environment and public health, finance, image analysis, navigation, remote sensing, etc., see [13,55].

1. Statistical process control (SPC): As discussed in the introduction, algorithms are required that can detect a sudden fault arising in an industrial process or a production process. In recent years algorithms for SPC with sampling rate and sampling size control have also been developed to minimize the cost associated with sampling [56,57]. See [58,59] and the references therein for some recent contributions.

2. Sensor networks: As discussed in [32], quickest change detection algorithms can be employed in sensor networks for infrastructure monitoring [60], or for habitat monitoring [61]. Note that in these applications, the sensors are deployed for a long time, and the change may occur rarely. Therefore, data-efficient quickest change detection algorithms are needed (see Section 3.06.6.2).

3. Computer network security: Algorithms for quickest change detection have been applied in the detection of abnormal behavior in computer networks due to security breaches [6264].

4. Cognitive radio: Algorithms based on the CuSum algorithm or other quickest change detection algorithms can be developed for cooperative spectrum sensing in cognitive radio networks to detect activity of a primary user. See [53,6567].

5. Neuroscience: The evolution of the Shiryaev algorithm is found to be similar to the dynamics of the Leaky Integrate-and-Fire model for neurons [68].

6. Social networks: It is suggested in [69,70] that the algorithms from the change detection literature can be employed to detect the onset of the outbreak of a disease, or the effect of a bioterrorist attack, by monitoring drug sales at retail stores.

3.06.8 Conclusions and future directions

In this article we reviewed the state-of-the-art in the theory of quickest change detection. We saw that while exactly or nearly optimal algorithms are available only for the i.i.d. model and for the detection of a change in a single sequence of observations, asymptotically optimal algorithms can be obtained in a much broader setting. We discussed the uniform asymptotic optimality of GLR and mixture based tests, when the post-change distribution is not known. We discussed algorithms for data-efficient quickest change detection, and showed that they are also asymptotically equivalent to their classical counterparts. For the decentralized quickest change detection model, we discussed various algorithms that are asymptotically optimal. We also reviewed the asymptotic optimality theory in the Bayesian as well as in the minimax setting for a general non-i.i.d. model, and showed that extensions of the Shiryaev algorithm and the CuSum algorithm to the non-i.i.d. setting are asymptotically optimal. Nevertheless, the list of topics discussed in this article is far from exhaustive.

Below we enumerate possible future directions in which the quickest change detection problem can be explored. We also provide references to some recent articles in which some research on these topics has been initiated.

1. Transient change detection: It is assumed throughout this article that the change is persistent, i.e., once the change occurs, the system stays in the post-change state forever. In many applications it might be more appropriate to model the change as transient, i.e., the system only stays in the post-change state for a finite duration and then returns to the pre-change state; see e.g., [71–73]. In this setting, in addition to false alarm and delay metrics, it may be of interest to consider metrics related to the detection of the change while the system is still in the post-change state.

2. Change propagation: In applications with multiple sensors, unlike the model assumed in Section 3.06.6.3, it may happen that the change does not affect all the sensors simultaneously [74]. The change may propagate from one sensor to the next, with the statistics of the propagation process being known beforehand [75].

3. Multiple change points in networks: In some monitoring applications there may be multiple change points that affect different sensors in a network, and the goal is to exploit the relationship between the change points and the sensors affected to detect the changes [76].

4. Quickest change detection with social learning: In the classical quickest change detection problem, to compute the statistic at each time step, the decision maker has access to the entire past observations. An interesting variant of the problem is quickest change detection with social learning, when the time index is replaced by an agent index, i.e., when the statistic is updated over agents and not over time, and the agents do not have access to the entire past history of observations but only to some processed version (e.g., binary decisions) from the previous agent; see [7779].

5. Change detection with simultaneous classification: In many applications, the post-change distribution is not uniquely specified and may come from one of multiple hypotheses image, in which case, along with detecting the change, it is of interest to identify which hypothesis is true. See, e.g., [80,81].

6. Synchronization issues: If a quickest change detection algorithm is implemented in a sensor network where sensors communicate with the fusion center using a MAC protocol, the fusion center might receive asynchronous information from the sensors due to networking delays. It is of interest to develop algorithms that can detect changes while handling MAC layer issues; see, e.g., [50].

Relevant Theory: Signal Processing Theory, and Machine Learning, Statistical Signal Processing

See Vol. 1, Chapter 11 Parametric Estimation

See Vol. 1, Chapter 12 Adaptive Filters

See Vol. 1, Chapter 18 Introduction to Probabilistic Graphical Models

See this Volume, Chapter 5 Distributed Signal Detection

References

1. Shewhart WA. The application of statistics as an aid in maintaining quality of a manufactured product. J Am Stat Assoc. 1925;20:546–548.

2. Shewhart WA. Economic Control of Quality of Manufactured Product. American Society for Quality Control 1931.

3. Page ES. Continuous inspection schemes. Biometrika. 1954;41:100–115.

4. Shiryaev AN. On optimum methods in quickest detection problems. Theory Probab Appl. 1963;8:22–46.

5. Shiryayev AN. Optimal Stopping Rules. New York: Springer-Verlag; 1978.

6. Lorden G. Procedures for reacting to a change in distribution. Ann Math Stat. 1971;42:1897–1908.

7. Moustakides GV. Optimal stopping times for detecting changes in distributions. Ann Stat. 1986;14:1379–1387.

8. Ritov Y. Decision theoretic optimality of the CUSUM procedure. Ann Stat. 1990;18:1464–1469.

9. Pollak M. Optimal detection of a change in distribution. Ann Stat. 1985;13:206–227.

10. Lai TL. Information bounds and quick detection of parameter changes in stochastic systems. IEEE Trans Inf Theory. 1998;44:2917–2929.

11. Polunchenko A, Tartakovsky AG. State-of-the-art in sequential change-point detection. Methodol Comput Appl Probab. 2011;1–36.

12. Tartakovsky AG, Veeravalli VV. General asymptotic Bayesian theory of quickest change detection. SIAM Theory Probab Appl. 2005;49:458–497.

13. Poor HV, Hadjiliadis O. Quickest Detection. Cambridge University Press 2009.

14. Chow YS, Robbins H, Siegmund D. Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin 1971.

15. Tartakovsky AG, Nikiforov IV, Basseville M. Sequential Analysis: Hypothesis Testing and Change-Point Detection, Statistics. CRC Press 2013.

16. Williams D. Probability with Martingales, Cambridge Mathematical Textbooks. Cambridge University Press 1991.

17. Siegmund D. Sequential Analysis: Tests and Confidence Intervals, Springer Series in Statistics. Springer-Verlag 1985.

18. Woodroofe M. Nonlinear Renewal Theory in Sequential Analysis, CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM 1982.

19. Bertsekas D. In: Belmont, Massachusetts: Athena Scientific; 1995; Dynamic Programming and Optimal Control. vols. I and II.

20. Tartakovsky AG, Pollak M, Polunchenko A. Third-order asymptotic optimality of the generalized Shiryaev-Roberts changepoint detection procedures. arXiv e-prints, May 2010.

21. Roberts SW. A comparison of some control chart procedures. Technometrics. 1966;8:411–430.

22. Pollak M, Tartakovsky AG. Optimality properties of the Shiryaev-Roberts procedure. Stat Sinica. 2009;19:1729–1739.

23. Moustakides GV, Polunchenko AS, Tartakovsky AG. A numerical approach to performance analysis of quickest change-point detection procedures. Stat Sinica. 2011;21:571–596.

24. Pollak M. Average run lengths of an optimal method of detecting a change in distribution. Ann Stat. 1987;15:749–779.

25. Wald A, Wolfowitz J. Optimum character of the sequential probability ratio test. Ann Math Stat. 1948;19(3):326–339.

26. Moustakides GV. Sequential change detection revisited. Ann Stat. 2008;36:787–807.

27. Pollak M, Siegmund D. Approximations to the expected sample size of certain sequential tests. Ann Stat. 1975;3:1267–1282.

28. Lai TL. Sequential changepoint detection in quality control and dynamical systems. J Roy Stat Soc Suppl. 1995;57(4):613–658.

29. Siegmund D, Venkatraman ES. Using the generalized likelihood ratio statistic for sequential detection of a change-point. Ann Stat. 1995;23:255–271.

30. Lai TL, Xing H. Sequential change-point detection when the pre- and post-change parameters are unknown. Sequent Anal. 2010;29:162–175.

31. Unnikrishnan J, Veeravalli VV, Meyn SP. Minimax robust quickest change detection. IEEE Trans Inf Theory. 2011;57:1604–1614.

32. Banerjee T, Veeravalli VV. Data-efficient quickest change detection with on-off observation control. Sequent Anal. 2012;31:40–77.

33. Banerjee T, Veeravalli VV. Data-efficient minimax quickest change detection. In: IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP). March 2012;3937–3940.

34. Tsitsiklis JN. Decentralized detection. In: Greenwich, CT: JAI Press; 1993;Poor HV, Thomas JB, eds. Advances in Statistical Signal Processing. vol. 2.

35. Varshney PK. Distributed Detection and Data Fusion. New York: Springer-Verlag; 1996.

36. Willett P, Swaszek PF, Blum RS. The good, bad and ugly: distributed detection of a known signal in dependent Gaussian noise. IEEE Trans Signal Process. 2000;48:3266–3279.

37. Chamberland JF, Veeravalli VV. Decentralized detection in sensor networks. IEEE Trans Signal Process. 2003;51:407–416.

38. Chamberland J-F, Veeravalli VV. Wireless sensors in distributed detection applications. IEEE Signal Process Mag. 2007;24:16–25 (special issue on Resource-Constrained Signal Processing, Communications, and Networking).

39. Veeravalli VV, Varshney PK. Distributed inference in wireless sensor networks. Philos Trans R Soc A: Math Phys Eng Sci. 2012;370:100–117.

40. Ho Y. Team decision theory and information structures. Proc IEEE. 1980;68:644–654.

41. Veeravalli VV. Decentralized quickest change detection. IEEE Trans Inf Theory. 2001;47:1657–1665.

42. Tartakovsky AG, Veeravalli VV. Change-point detection in multichannel and distributed systems. In: New York, USA: Marcel Dekker, Inc.; 2004;339–370. Mukhopadhyay N, Datta S, Chattopadhyay S, eds. Applied Sequential Methodologies: Real-World Examples with Data Analysis, Statistics: A Series of Textbooks and Monographs. vol. 173.

43. Tartakovsky AG, Veeravalli VV. Asymptotically optimal quickest change detection in distributed sensor systems. Sequent Anal. 2008;27:441–475.

44. Mei Y. Information bounds and quickest change detection in decentralized decision systems. IEEE Trans Inf Theory. 2005;51:2669–2681.

45. Tartakovsky AG, Kim H. Performance of certain decentralized distributed change detection procedures. In: IEEE International Conference on Information Fusion, Florence, Italy. July 2006;1–8.

46. Tartakovsky AG, Veeravalli VV. Quickest change detection in distributed sensor systems. In: IEEE International Conference on Information Fusion, Cairns, Australia. July 2003;756–763.

47. Mei Y. Efficient scalable schemes for monitoring a large number of data streams. Biometrika. 2010;97:419–433.

48. Mei Y. Quickest detection in censoring sensor networks. In: IEEE International Symposium on Information Theory (ISIT). August 2011;2148–2152.

49. Xie Y, Siegmund D. Multi-sensor change-point detection. In: Joint Statistical Meetings. August 2011.

50. Premkumar K, Kumar A, Kuri J. Distributed detection and localization of events in large ad hoc wireless sensor networks. In: Allerton Conference on Communication, Control, and Computing. October 2009;178–185.

51. Premkumar K, Kumar A. Optimal sleep-wake scheduling for quickest intrusion detection using wireless sensor networks. In: IEEE Conference on Computer Communications (INFOCOM). April 2008;1400–1408.

52. Banerjee T, Veeravalli VV. Energy-efficient quickest change detection in sensor networks. In: IEEE Statistical Signal Processing Workshop. August 2012.

53. Banerjee T, Sharma V, Kavitha V, JayaPrakasam AK. Generalized analysis of a distributed energy efficient algorithm for change detection. IEEE Trans Wireless Commun. 2011;10:91–101.

54. Zacharias L, Sundaresan R. Decentralized sequential change detection using physical layer fusion. IEEE Trans Wireless Commun. 2008;7:4999–5008.

55. Basseville M, Nikiforov IV. Detection of Abrupt Changes: Theory and Application. Englewood Cliffs, NJ: Prentice Hall; 1993.

56. Tagaras G. A survey of recent developments in the design of adaptive control charts. J Qual Technol. 1998;30:212–231.

57. Stoumbos ZG, Reynolds MR, Ryan TP, Woodall WH. The state of statistical process control as we proceed into the 21st century. J Am Stat Assoc. 2000;95:992–998.

58. Stoumbos ZG, Reynolds MR. Economic statistical design of adaptive control schemes for monitoring the mean and variance: an application to analyzers. Nonlinear Anal Real World Appl. 2005;6:817–844.

59. Makis V. Multivariate bayesian control chart. Oper Res. 2008;56:487–496.

60. Rice JA, Mechitov K, Sim S, et al. Flexible smart sensor framework for autonomous structural health monitoring. Smart Struct Syst. 2010;6(5–6):423–438.

61. Mainwaring A, Culler D, Polastre J, Szewczyk R, Anderson J. Wireless sensor networks for habitat monitoring. In: Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, WSNA ’02. New York, NY, USA: ACM; September 2002;88–97.

62. Thottan M, Ji C. Anomaly detection in IP networks. IEEE Trans Signal Process. 2003;51:2191–2204.

63. Tartakovsky AG, Rozovskii BL, Blazek RB, Kim H. A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Trans Signal Process. 2006;54:3372–3382.

64. Cardenas AA, Radosavac S, Baras JS. Evaluation of detection algorithms for MAC layer misbehavior: theory and experiments. IEEE/ACM Trans Network. 2009;17:605–617.

65. Lai L, Fan Y, Poor HV. Quickest detection in cognitive radio: A sequential change detection framework. In: IEEE GLOBECOM. December 2008;1–5.

66. Jayaprakasam AK, Sharma V. Cooperative robust sequential detection algorithms for spectrum sensing in cognitive radio. In: International Conference on Ultramodern Telecommunications (ICUMT). October 2009;1–8.

67. Jayaprakasam AK, Sharma V. Sequential detection based cooperative spectrum sensing algorithms in cognitive radio. In: First UK-India International Workshop on Cognitive Wireless Systems (UKIWCWS). December 2009;1–6.

68. Yu AJ. Optimal change-detection and spiking neurons. In: Cambridge, MA: MIT Press; 2007;1545–1552. Schölkopf B, Platt J, Hoffman T, eds. Advances in Neural Information Processing Systems. vol. 19.

69. Frisen M. Optimal sequential surveillance for finance, public health, and other areas. Sequent Anal. 2009;28:310–337.

70. Fienberg SE, Shmueli G. Statistical issues and challenges associated with rapid detection of bio-terrorist attacks. Stat Med. 2005;24:513–529.

71. Han C, Willett PK, Abraham DA. Some methods to evaluate the performance of Page’s test as used to detect transient signals. IEEE Trans Signal Process. 1999;47:2112–2127.

72. Wang Z, Willett PK. A performance study of some transient detectors. IEEE Trans Signal Process. 2000;48:2682–2685.

73. Premkumar K, Kumar A, Veeravalli VV. Bayesian quickest transient change detection. In: International Workshop on Applied Probability (IWAP). July 2010.

74. Hadjiliadis O, Zhang H, Poor HV. One shot schemes for decentralized quickest change detection. IEEE Trans Inf Theory. 2009;55:3346–3359.

75. Raghavan V, Veeravalli VV. Quickest change detection of a Markov process across a sensor array. IEEE Trans Inf Theory. 2010;56:1961–1981.

76. Nguyen X, Amini A, Rajagopal R. Message-passing sequential detection of multiple change points in networks. In: IEEE International Symposium on Information Theory (ISIT). July 2012.

77. Krishnamurthy V. Quickest time change detection with social learning. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). March 2012;5257–5260.

78. Krishnamurthy V. Bayesian sequential detection with phase-distributed change time and nonlinear penalty; a POMDP lattice programming approach. IEEE Trans Inf Theory. 2011;57:7096–7124.

79. Krishnamurthy V. Quickest detection with social learning: interaction of local and global decision makers. arXiv e-prints, March 2012.

80. Nikiforov IV. A lower bound for the detection/isolation delay in a class of sequential tests. IEEE Trans Inf Theory. 2003;49:3037–3046.

81. Tartakovsky AG. Multidecision quickest change-point detection: previous achievements and open problems. Sequent Anal. 2008;27:201–231.


2A random variable is arithmetic if all of its probability mass is on a lattice. Otherwise it is said to be non-arithmetic.

3As explained in [12], this analysis is facilitated if we restrict to the worst-case detection delay, which is obtained by conditioning on the event that the change happens at time 1.

4Lorden defined image with image inside the expectation, i.e., he assumed a delay penalty of 1 if the algorithm stops at the change point. We drop this additional penalty in our definition in order to be consistent with the other delay definitions in this chapter.
