Methods of statistical process control (SPC) help to monitor and improve processes in manufacturing and service industries, and they are also often used in fields such as public-health surveillance. For the given process, relevant quality characteristics are measured over time, thus leading to a (possibly multivariate) stochastic process of continuous-valued or discrete-valued random variables (variables data or attributes data, respectively). Examples of such quality characteristics could be the diameter of a drill hole (variables data) or the number of non-conformities (attributes data) in a produced item, or the number of infections in a health-related example. One of the most important SPC tools is the control chart, which requires the relevant quality characteristics to be measured online. Control charts are applied to a process operating in a stable state (in control); that is, the process is assumed to be stationary according to a specified model (the in-control model). As a new measurement arrives, it is used to compute a statistic (possibly also incorporating past values of the quality characteristic), which is then plotted on the control chart with its control limits. If the statistic violates the limits, then an alarm is triggered to signal that the process may not be stable anymore (out of control). The process is then interrupted, and it is checked whether the alarm indeed results from an assignable cause (say, a shift or drift in the process mean); the time when the process left its in-control model is said to be the change point (more formal definitions are given below). In this case, corrective actions are required before continuing the process. If the process is still in its in-control state, the alarm is classified as a false alarm. An example of a control chart with limits 0 and 5 is shown in Figure 8.1, where the upper limit is violated at time . Note that the lower limit 0 can never be violated; that is, it is actually a one-sided (upper-sided) control chart. 
We shall discuss this control chart and the related application in much more detail in Example 8.2.2.3.
The use of control charts for prospective online monitoring, as described before, is commonly referred to as the Phase-II application. But control charts may also be applied in a retrospective manner to already available in-control data. This is called the Phase-I application of a control chart. During this iterative procedure, potential outliers are identified and removed from the data, and parameter estimates and the chart design are revised accordingly. A (successful) Phase-I analysis ends up with an estimated model characterizing the in-control properties of ; this model is then used for designing the control charts to be used during Phase-II monitoring. More details about all these terms and concepts can be found, among others, in the textbook by Montgomery (2009) and in the survey paper by Woodall & Montgomery (2014).
In this book, we shall exclusively concentrate on attributes data processes, and we shall start with the monitoring of count processes. Typical examples from manufacturing industry are the number of non-conformities per produced item (range ) or the number of defective items in a sample of size (range ). Non-manufacturing examples include counts of new cases of an infection (per time unit) in public-health surveillance, or counts of complaints by customers (per time unit) in a service industry. The majority of studies on the monitoring of such counts assumes the process to be i.i.d. in its in-control state (Woodall, 1997), but in this book, we shall attach more importance to the case of autocorrelated counts. In fact, there has been increasing research activity in this direction in recent years (Weiß, 2015b), and Alwan & Roberts (1995) have already shown that autocorrelation is indeed a common phenomenon in SPC-related count processes. Typical reasons are a high sampling frequency due to automated production environments in manufacturing industry, or varying service times (extending over more than one time unit) in service industry, or varying incubation times and infectivities of diseases in public-health surveillance.
In Section 8.2, we start with basic Shewhart charts for a count process, where the plotted statistic at time is a function only of the most recent observation (or of the most recent sample for sample-based monitoring). While the Shewhart charts themselves are rather simple, they offer an opportunity to introduce general design principles for control charts. These principles are applied in Section 8.3 when considering advanced control charts, such as the CUSUM and EWMA methods, where the plotted statistic at time also uses past observations of the process and hence accumulates information about the process for a longer period of time. Later, in Chapter 9, we shall shift our focus to the monitoring of categorical processes, where methods for count data may still be useful for a sample-based monitoring approach.
The first control charts were proposed by Shewhart (1926, 1931). Because of this pioneering work, a number of standard control charts are referred to as Shewhart control charts; an extensive review of Shewhart control charts is given by Montgomery (2009). The characteristic feature of these charts is that the plotted statistic is a function only of the most recent observation (or of the most recent sample for sample-based monitoring). Then is plotted on a chart against time with time-invariant lower and upper control limits (as in Figure 8.1). An alarm is triggered at time for the first time if
then the process is interrupted to check for an assignable cause. The (random) run length of the control chart is defined as
the corresponding run length distribution turns out to be of utmost importance when designing the control chart; that is, when choosing the control limits and .
If monitoring a count process with a Shewhart chart, the counts are commonly plotted directly on the chart as they arrive in time; that is, the plotted statistics are . If the range of the counts is unlimited, such a chart is referred to as a c chart (c for “count”). For the case of the finite range , a chart for is said to be an np chart, while a p chart plots the relative quantities . This terminology is obviously motivated by the binomial distribution (Example A.2.1) and by the idea of a sample-based monitoring, with or expressing the absolute number or relative proportion, respectively, of “successes” in the sample being collected at time . For simplicity, we shall always consider the case of an unlimited range (and hence c charts) in this section, but the presented concepts apply to np charts and p charts as well; see Section 9.1.1. A truly two-sided chart has control limits with . One-sided charts are obtained by either setting (upper-sided chart) or (lower-sided chart).
For the rest of this section, let us assume that is serially independent. If the process is in-control, it is i.i.d. and has the in-control marginal distribution . As an out-of-control scenario, we restrict to the case of a sudden shift; that is, at a certain time (called change point), the marginal distribution becomes . This leads to the following (unconditional) change point model (Knoth, 2006):
For , the process is in control,
while it is out of control for if .
For a change point , the process is out of control right from the beginning. If the control chart triggers an alarm at time (rule (8.2)), we stop monitoring and conclude that the process might have run out of control. If indeed and , the alarm was correct; otherwise, it was a false alarm. In the first case, the difference expresses the delay in detecting the change point. Here, the “” is used since even in the case of immediate detection (), we have one out-of-control observation (say, one defective item in a production process).
At this point, it is important to study the run length of the control chart – see (8.2) – in more detail. If the process is in control, we wish the run length to be large (a robust chart), since then the run length expresses the time until the first false alarm. In contrast, it should be small for an out-of-control process, since the run length then goes along with the delay in detecting the process change. As for a significance test, the approach to designing the chart is to choose the control limits in such a way that a certain degree of robustness against false alarms is guaranteed. For this purpose, one looks at properties of the in-control run length distribution. These could be quantiles, such as the median, but the main approach (although one that is sometimes criticized; see Kenett & Pollak (2012) as an example) is to consider the mean of the run length ; that is, the average run length (ARL). If there are several candidate designs leading to (roughly) the same in-control ARL, abbreviated as , then one compares the out-of-control ARL performances of these charts to select the final chart design.
So the question is how to compute the ARL given a specific chart design . If the process is in-control (that is, i.i.d. with marginal distribution ), then a signal is triggered at time with probability
Because of the independence of the plotted statistics, the distribution of is a shifted geometric distribution (Example A.1.5), so it follows immediately that
Note that this formula also includes the one-sided cases by setting and .
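The ARL computation for the i.i.d. case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are taken to be Poisson distributed, an alarm is assumed to occur when a count falls strictly below the lower limit or strictly above the upper limit, and the function names as well as the means 2 and 3 and the limits 0 and 6 are hypothetical choices, not values from the text.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """Probability that a Poisson(mu) count equals x."""
    return math.exp(-mu) * mu**x / math.factorial(x)


def poisson_cdf(x: int, mu: float) -> float:
    """Probability that a Poisson(mu) count is at most x (0 for x < 0)."""
    return sum(poisson_pmf(j, mu) for j in range(x + 1))


def shewhart_arl(mu: float, lower: int, upper: int) -> float:
    """ARL of a c chart for i.i.d. Poisson(mu) counts.

    Assumption: an alarm is triggered if X < lower or X > upper, so the
    run length is shifted geometric with success probability
    1 - P(lower <= X <= upper), and the ARL is its reciprocal.
    """
    p_no_alarm = poisson_cdf(upper, mu) - poisson_cdf(lower - 1, mu)
    return 1.0 / (1.0 - p_no_alarm)


# Hypothetical design: in-control mean 2, upper-sided chart with limits 0 and 6.
arl_in_control = shewhart_arl(mu=2.0, lower=0, upper=6)
# Same chart after a hypothetical shift of the mean to 3.
arl_out_of_control = shewhart_arl(mu=3.0, lower=0, upper=6)
```

Setting the lower limit to 0 makes the chart upper-sided, because a count can never fall below 0; the same function therefore covers the one-sided case.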
If the distribution becomes out of control at time , then the delay in detecting this change is (see above). Again, because of the independence of the plotted statistics (the non-aging property), this delay can still be described by a shifted geometric distribution, but using instead of . Therefore, the out-of-control ARL for the considered i.i.d.-scenario is defined by setting (since the true position of the change point does not affect the delay anyway); that is,
Note that ARLs should be interpreted with caution in practice, since the shifted geometric distribution is strongly skewed and has a large dispersion (the standard deviation nearly equals the mean; see Example A.1.5). This is illustrated by Figure 8.2, which shows the run length distribution corresponding to . Although an alarm is triggered in the mean after 250 plotted statistics, the median, for instance, equals only 173; that is, in 50% of all cases, the actual run length is not larger than 173. The quartiles range from 72 to 346, so again in 50% of all cases, the actual run length is outside even this region. For a further critical discussion, see Kenett & Pollak (2012).
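The quantiles quoted above can be checked directly from the shifted geometric distribution. The following sketch assumes an alarm probability of 1/250 per plotted statistic (so that the mean run length is 250, as in the text); the function name is hypothetical.

```python
import math


def geometric_quantile(q: float, p: float) -> int:
    """Smallest n >= 1 with P(L <= n) = 1 - (1 - p)**n >= q,
    where L is shifted geometric with alarm probability p."""
    return math.ceil(math.log(1.0 - q) / math.log(1.0 - p))


p_alarm = 1.0 / 250.0                    # assumed alarm probability per time point
mean_run_length = 1.0 / p_alarm          # 250
sd_run_length = math.sqrt(1.0 - p_alarm) / p_alarm
quartiles = [geometric_quantile(q, p_alarm) for q in (0.25, 0.5, 0.75)]
# quartiles == [72, 173, 346]; the standard deviation is about 249.5
```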
A possible way to achieve an ARL-unbiased chart design that is close to any prespecified -level was proposed by Paulino et al. (2016), and relies on a randomization of the emission of an alarm.
Usually, a control chart is designed as if the true in-control model is known precisely. In reality, however, the in-control model has to be estimated from given data (believed to stem from the presumed in-control model). Due to the uncertainty of parameter estimation, the true performance of the chart will usually deviate from the “believed” one, and this difference might be rather large. In view of (8.3) and the typically large values for (as in Example 8.2.1.1), the control limits correspond to rather extreme quantiles of the (estimated) in-control distribution. So already moderate misspecifications of the model parameters may lead to strong effects on the control limits and ARLs. Hence, for the data examples below, we should be aware that we always consider some kind of conditional ARL performance, conditioned on the fitted model.
A comprehensive literature review of the effect of estimated parameters on control chart performance is provided by Jensen et al. (2006). Probably the first such work in the attributes case is by Braun (1999), who considers the and the charts, while Testik (2007) investigates the effect of estimation on the CUSUM chart for i.i.d. Poisson counts; this chart is discussed in Section 8.3.1.
While (appropriately chosen) Shewhart charts are generally quite sensitive to very large shifts in the process (and they are also generally recommended for application in Phase I (Montgomery, 2009)), Example 8.2.1.1 has already demonstrated that these charts are not particularly well-suited to detecting small-to-moderate shifts. For this reason, in Section 8.3 we shall consider advanced control schemes that are more sensitive to small shifts, because these charts are designed to have an inherent memory.
In this section, we drop the i.i.d. assumption and allow the count process to be a Markov chain, say, an INAR(1) process as in Section 2.1, a binomial AR(1) process as in Section 3.3, or an INARCH(1) process as in Example 4.1.6. But still, our aim is to plot the observed counts directly on the chart with limits ; that is, we choose again .
Because of the serial dependence of the plotted statistics, now the run length (8.2) no longer follows a simple geometric distribution. In addition, if we now look at the detection delay , the corresponding distribution generally depends on the position of the change point . Therefore, more refined ARL concepts have to be considered; a detailed survey of different ARL concepts is provided by Knoth (2006). In this book, the following ARL concepts are used:
where denotes the expectation related to the change point .
As before, we refer to the computed ARL value as the in-control ARL (out-of-control ARL) if (), and the in-control ARL is signified by adding the index “0”.
Obviously, the zero-state ARL is nothing other than . For the case of serial independence, as considered in Section 8.2.1, we have , but otherwise these ARLs may differ. The essential questions are:
Let us start with the second question. When designing a chart, one first looks at the in-control behavior. In this context, it is reasonable to use the in-control zero-state ARL, , as a measure of robustness against false alarms. Then, in a second step, one analyzes the out-of-control behavior. If there are reasons to expect, say, that the change will probably happen quite early, then it would be reasonable to evaluate the out-of-control performance of an with sufficiently small . In many applications, however, one will not have such information. Still, we shall see below that often converges rather quickly to . This implies that the steady-state ARL, , might serve as a reasonable approximation for the true mean delay of detection after the unknown change point. Therefore, in this book, we shall evaluate the out-of-control performance in terms of .
The first question is how to compute the different types of ARL. Certainly, it is always possible to approximate the ARLs through simulations; to simulate , one will simulate with a large as a substitute. But if considering a chart applied to an underlying Markov chain, a numerically exact solution is also possible by using the Markov chain (MC) approach proposed by Brook & Evans (1972). Since this approach can also be used for the advanced control charts to be introduced below, we provide a rather general description in the sequel following Weiß (2011b).
In view of decision rule (8.1), we can assume a slightly simplified range for the plotted statistics : their range is partitioned into the set of “no-alarm states” (because no alarm is triggered by the chart if takes a value in ) and the set consisting of a single “alarm state” ‘’. This is justified since any kind of violation of the control limits will lead to the same action: stop the process and search for an assignable cause. Therefore, ‘’ is an absorbing state; that is, it is no longer possible to leave this state. The set is equal to for the case of a two-sided chart.
The MC approach now assumes a conditional change point model (Weiß, 2011b), as given in Definition 8.2.2.1; see also the survey about Markov chains in Appendix B.2.
If for all , then the whole process is stationary according to the in-control model. Furthermore, since ‘’ is an absorbing state by definition, we have for all , where denotes the Kronecker delta. The requirement that consists of inessential states guarantees, among other things, that the probability of reaching ‘’ in finite time equals 1.
Let us now describe the procedures for computing the different types of ARL; derivations and more details can be found in Brook & Evans (1972) and Weiß (2011b). For this purpose, define to be the transpose of the transition matrix for the states in ; that is, . Analogously, we set . The requirement that consist of inessential states guarantees that the fundamental matrices and exist, where denotes the identity matrix. Since ‘’ is an absorbing state, the transition matrices of before and after the change point, respectively, are given by
To compute the out-of-control zero-state ARL (for the in-control ARL, we just have to replace by ), we first compute the unique solution of the equation
Here, the entries express the mean time to reach ‘’ if . After having specified the initial probabilities for (see Remark 8.2.2.2), we collect these probabilities in the vector (note that the change already happened at time 1). Then
If , then there exist in-control observations. So the entries of the solution to (8.9) express the mean delay to reach ‘’ if . If the vector consists of the probabilities and if refers to the (Markov property, in control), then the conditional expected delay equals
Finally, to compute the steady-state ARL, we need to take the limit according to (8.7). To be able to apply the Perron–Frobenius theorem – see Remark B.2.2.1 in Appendix B.2 for a summary – we have to assume that the non-negative matrix is primitive. For the corresponding Perron–Frobenius eigenvalue , there exists a strictly positive right eigenvector ; is the normed version of . Then
where the rate of convergence of for is determined by the second largest eigenvalue, which satisfies .
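To make the zero-state computation concrete, here is a minimal Python sketch of the Markov chain approach for an upper-sided c chart. It rests on several assumptions that are not taken from the text: the counts follow a Poisson INAR(1) process with thinning parameter alpha and innovation mean beta, the first count is drawn from the stationary Poisson marginal with mean beta/(1 - alpha), and the transition matrix is stored row-wise (from state to state), which is the transpose of the convention used above. All function names are hypothetical.

```python
import math


def poisson_pmf(x, mu):
    """Probability that a Poisson(mu) count equals x."""
    return math.exp(-mu) * mu**x / math.factorial(x)


def inar1_transition(k, l, alpha, beta):
    """P(X_t = k | X_{t-1} = l) for a Poisson INAR(1) process:
    binomial thinning of l with probability alpha plus Poisson(beta) innovations."""
    return sum(
        math.comb(l, j) * alpha**j * (1 - alpha) ** (l - j) * poisson_pmf(k - j, beta)
        for j in range(min(k, l) + 1)
    )


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def zero_state_arl(alpha, beta, upper):
    """Zero-state ARL of a c chart that signals as soon as X_t > upper."""
    states = range(upper + 1)  # no-alarm states 0, ..., upper
    # Transition probabilities restricted to the no-alarm states (row = from).
    p = [[inar1_transition(k, l, alpha, beta) for k in states] for l in states]
    # steps[l] = mean number of further time points until the alarm, given
    # X_1 = l; it solves steps = 1 + P steps, that is, (I - P) steps = 1.
    lhs = [[(1.0 if i == j else 0.0) - p[i][j] for j in states] for i in states]
    steps = solve_linear(lhs, [1.0] * (upper + 1))
    # Assumed initial distribution: stationary Poisson marginal.
    marginal_mean = beta / (1.0 - alpha)
    return 1.0 + sum(poisson_pmf(l, marginal_mean) * steps[l] for l in states)
```

For alpha = 0 the process is i.i.d. Poisson, and the result agrees with the reciprocal of the alarm probability, which offers a simple plausibility check of the implementation.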
The basic chart presented in Section 8.2 allows for continuous monitoring of a serially dependent count process. But the statistic plotted on the chart at time , which is simply the count observed at time , does not include any information about the past observations of the process, or at least not explicitly, beyond the mere effect of autocorrelation. Therefore, the chart (as any other Shewhart-type chart) is not particularly sensitive to small changes in the process. For this reason, several types of advanced control charts have been proposed, in which the plotted statistic at time also uses past observations of the process and hence accumulates information about it for a longer period of time. In the sequel, we will discuss the most popular types of advanced control chart: CUSUM charts in Sections 8.3.1 and 8.3.2, and EWMA charts in Section 8.3.3. Further charts and references can be found in Woodall (1997) and Weiß (2015b).
The traditional cumulative sum (CUSUM) control chart, being applied directly to the observations of the process, is perhaps the most straightforward advanced candidate for monitoring processes of counts, because it preserves the discrete nature of the process by only using addition (but no multiplications). Initialized by a starting value , the upper-sided CUSUM is defined by
that is, by accumulating the deviations from the reference value . Because of this accumulation, the plotted statistic at time is not solely based on but also incorporates the process in the past: If the CUSUM statistic becomes negative, the construction resets the CUSUM to zero.
The starting value is commonly chosen as ; a value is referred to as a fast initial response (FIR) feature, and it may help to detect an initial out-of-control state more quickly; see also the discussion below formulae (8.5)–(8.7). If and are taken as integer values, then also is integer-valued. As another example, if then so is , but in any case, we have a discrete range. In the sequel, we shall concentrate on integer-valued . An alarm is triggered if violates the upper control limit (typically, ).
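The recursion and the alarm rule can be illustrated with a short Python sketch. It assumes the usual form of the upper-sided CUSUM, in which the statistic is updated as the maximum of zero and the previous statistic plus the deviation of the current count from the reference value, and it assumes that an alarm means the statistic strictly exceeds the upper limit. The names `upper_cusum`, `k`, `h` and `c0`, as well as the data, are hypothetical.

```python
def upper_cusum(counts, k, h, c0=0):
    """Return the CUSUM path and the first alarm time (1-based), or None.

    Assumed recursion: c_t = max(0, x_t - k + c_{t-1}), started at c0.
    Assumed alarm rule: c_t > h.
    """
    path = []
    alarm_time = None
    c = c0
    for t, x in enumerate(counts, start=1):
        c = max(0, x - k + c)  # accumulate deviations from k, reset at zero
        path.append(c)
        if alarm_time is None and c > h:
            alarm_time = t
    return path, alarm_time


# Made-up counts whose level increases in the second half.
counts = [2, 1, 3, 0, 4, 5, 4, 6]
path, alarm = upper_cusum(counts, k=3, h=4)
# path == [0, 0, 0, 0, 1, 3, 4, 7], alarm == 8
```

With integer counts, an integer reference value and an integer starting value, every value in `path` is an integer, which is the discreteness property exploited for exact ARL computations.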
While the upper-sided CUSUM is designed to detect increases in the process mean, the lower-sided CUSUM, defined by
aims at uncovering decreases in the mean. If are monitored simultaneously, then this chart combination is referred to as a two-sided CUSUM chart. Extensive background information about CUSUM charts is provided in the book by Hawkins & Olwell (1998).
In this section, we assume the monitored count process to be i.i.d. in its in-control state, a situation that was also considered in the article by Brook & Evans (1972). Because of the accumulation according to (8.13), however, the statistics are no longer i.i.d., but constitute a Markov chain (analogous arguments apply to the lower-sided CUSUM (8.14)) with transition probabilities
and the initial statistic satisfies . Therefore, the MC approach as described in Section 8.2.2 is applicable, with . In fact, Brook & Evans (1972) introduced their MC approach for exactly this type of control chart and considered the application to i.i.d. Poisson counts.
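For i.i.d. Poisson counts, the Markov chain approach for the upper-sided CUSUM can be sketched as follows. The sketch assumes integer values for the reference value `k`, the limit `h` and the starting value `c0`, an alarm rule of the form "statistic strictly greater than `h`" (so the no-alarm states are 0, ..., h), and a row-wise transition matrix; these names and conventions are illustrative assumptions rather than the notation of the text.

```python
import math


def poisson_pmf(x, mu):
    """Probability that a Poisson(mu) count equals x (0 for x < 0)."""
    return math.exp(-mu) * mu**x / math.factorial(x) if x >= 0 else 0.0


def poisson_cdf(x, mu):
    """Probability that a Poisson(mu) count is at most x (0 for x < 0)."""
    return sum(poisson_pmf(j, mu) for j in range(x + 1))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cusum_arl(mu, k, h, c0=0):
    """Zero-state ARL of the upper-sided CUSUM for i.i.d. Poisson(mu) counts."""
    states = range(h + 1)
    # From state `old` the chain moves to new = max(0, x - k + old):
    # new = 0 needs x <= k - old, and new > 0 needs x = k + new - old.
    p = [
        [poisson_cdf(k - old, mu) if new == 0 else poisson_pmf(k + new - old, mu)
         for new in states]
        for old in states
    ]
    lhs = [[(1.0 if i == j else 0.0) - p[i][j] for j in states] for i in states]
    steps = solve_linear(lhs, [1.0] * (h + 1))  # mean time to alarm per start state
    return steps[c0]
```

Two plausibility checks follow from the construction: with `h = 0` the CUSUM signals exactly when a count exceeds `k`, so the ARL reduces to that of an upper-sided c chart, and a positive starting value (the FIR feature) shortens the ARL.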
We conclude this section by pointing out the relationship between the CUSUM scheme (8.13) and the sequential probability ratio test (SPRT); see Sections 6.1, 6.2 in Hawkins & Olwell (1998) for more details. The likelihood function (see Remark B.2.1.2) for i.i.d. counts is given by
so we obtain the likelihood ratio (LR) as
The SPRT now monitors the logarithmic likelihood ratio (log-LR)
for increasing . This procedure can be rewritten recursively by accumulating the contributions to the log-LR at times , thus leading to a type of one-sided CUSUM scheme:
Note the relation to the random walk in Example B.1.6. Comparing this type of CUSUM recursion with the one given in (8.13), we see that the construction is missing, so this CUSUM is not reset to zero if the CUSUM statistic becomes negative. As pointed out by Lorden (1971), the CUSUM (8.13) is equivalent to monitoring a slight modification of (8.15):
If this statistic was not positive at time but , then the statistic at time just equals , which corresponds to the above resetting feature.
Now, let us turn back to the case of a Markov-dependent count process , as in Section 8.2.2. If we apply the upper-sided CUSUM scheme (8.13) to such a process, then the statistics no longer constitute a Markov chain, so the MC approach of Brook & Evans (1972) is not directly applicable. But, as shown in Weiß & Testik (2009) and Weiß (2011b), ARL computations are possible by considering the bivariate process , which is a bivariate Markov chain with transition probabilities
In view of the CUSUM decision rule, it is clear that the set of “no-alarm states” is contained in . However, since values of larger than will always push beyond , the set is indeed finite. Excluding impossible transitions (say, from an alarm state back to a no-alarm state), Weiß & Testik (2009) showed that
which is of size . So the matrices required for the MC approach (8.8) are of dimension , which will often be a rather large number. It should be noted, however, that many entries of will be equal to 0 according to (8.17); that is, are sparse matrices. Therefore, the MC approach for ARL computation can be implemented efficiently using sparse matrix techniques; see Section 3 in Weiß (2011b) for possible software solutions.
The idea of applying the MC approach to the bivariate process of observed counts and CUSUM statistics also essentially applies to the lower-sided CUSUM scheme (8.14), but the set then becomes infinite; that is, ARLs can only be computed approximately (see Yontay et al. (2013) for details). If using a two-sided scheme, then the MC approach has to be applied to the trivariate Markov chain . Although here, the set is finite again, computations become very slow because of the immense matrix dimensions; more details and feasible approximations are presented by Yontay et al. (2013).
Although we exemplified the log-LR approach for Markov count processes here, it can also be used for completely different types of count process. As an example, Höhle & Paul (2008) derived such a log-LR CUSUM chart for counts stemming from the seasonal log-linear model (5.6), which proved to be useful for the surveillance of epidemic counts. A related study is the one by Sparks et al. (2010).
Another advanced approach for process monitoring, which is also very popular in applications, is the exponentially weighted moving-average (EWMA) control chart, which dates back to Roberts (1959). The standard EWMA recursion is defined by
with ; that is, it is a weighted mean of all available observations, where the weights decrease exponentially with increasing time lag :
An application of (8.19) to the case of Poisson counts was presented by Borror et al. (1998). The EWMA recursion (8.19), however, has an important drawback compared to the CUSUM approach of Sections 8.3.1 and 8.3.2 if applied to count processes: it does not preserve the discrete range, except the boundary case , which just corresponds to a chart. On the contrary, the range of possible values of changes in time, which rules out, among other things, the possibility of an exact ARL computation by the MC approach. As a simple numerical example, assume that and ; then takes a value in and in , and so on.
Therefore, Gan (1990a) suggests plotting rounded values of the statistic (8.19):
with , which are initialized by . Note that the statistics can take only integer values from , and again leads to a chart. might be chosen as the rounded value of the in-control mean. An alarm is triggered if violates one of the control limits .
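The difference between the standard and the rounded EWMA recursion can be seen in a small Python sketch with exact rational arithmetic. It assumes that the smoothing parameter is a rational number `lam`, that ties are rounded upwards, and that the data and the starting value 2 are made up for illustration; the function names are hypothetical.

```python
import math
from fractions import Fraction


def round_half_up(x):
    """Nearest integer to x, with ties rounded upwards (assumed convention)."""
    return math.floor(x + Fraction(1, 2))


def ewma(counts, lam, z0):
    """Standard EWMA: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    z, path = z0, []
    for x in counts:
        z = lam * x + (1 - lam) * z
        path.append(z)
    return path


def rounded_ewma(counts, lam, q0):
    """Rounded EWMA: q_t = round(lam * x_t + (1 - lam) * q_{t-1})."""
    q, path = q0, []
    for x in counts:
        q = round_half_up(lam * x + (1 - lam) * q)
        path.append(q)
    return path


counts = [1, 4, 0, 5, 2]
lam = Fraction(1, 4)
plain = ewma(counts, lam, Fraction(2))
rounded = rounded_ewma(counts, lam, 2)
# plain starts with 7/4, 37/16, ...: the denominators grow with time,
# whereas rounded == [2, 3, 2, 3, 3] stays on the integers.
```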
In the i.i.d. case, as considered by Gan (1990a), the statistics constitute a Markov chain with transition probabilities
and the initial probabilities are obtained by replacing by . So the MC approach of Brook & Evans (1972) is applicable, analogous to the CUSUM case discussed in Section 8.3.1, but now with . Note that the lower limit can only be violated if holds; that is, if . Other choices of lead to a purely upper-sided EWMA chart.
If the underlying count process is itself a Markov chain, then we proceed by analogy to Section 8.3.2 and consider the bivariate process (Weiß, 2009e). constitutes a bivariate Markov chain with range and with transition probabilities
where denotes the indicator function. So ARLs can be computed again by adapting the MC approach; see Weiß (2009e) for details. Here, the set of “no-alarm states” is derived as
and the resulting matrices are again sparse matrices.
A possible disadvantage of the rounded EWMA approach (8.20) became clear from the second design in Example 8.3.3.2: for small values of , which are generally recommended if small mean shifts are to be detected, one may observe some kind of “oversmoothing”; that is, becomes piecewise constant in time and rather insensitive to process changes. Therefore, Weiß (2011c) proposed a modification of (8.20), where a refined rounding operation is used: for , the operation -round maps onto the nearest fraction with denominator . For , we obtain the standard rounding operation, while 2-round rounds onto values in , for example. The resulting -EWMA chart follows the recursion
with . If is a Markov chain, then again is a discrete Markov chain, now with range , where is the set of all non-negative rationals with denominator . So again, it is possible to adapt the MC approach of Brook & Evans (1972) for ARL computation; see Weiß (2011c) for details.
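The refined rounding operation can be sketched in Python as follows. The sketch assumes that ties are rounded upwards and that the smoothing parameter is rational; the names `s_round` and `s_ewma` and the data are hypothetical and chosen only for illustration.

```python
import math
from fractions import Fraction


def s_round(x, s):
    """Map x onto the nearest fraction with denominator s (ties rounded up)."""
    return Fraction(math.floor(s * x + Fraction(1, 2)), s)


def s_ewma(counts, lam, s, q0):
    """EWMA recursion with s-rounding: q_t = s_round(lam * x_t + (1 - lam) * q_{t-1})."""
    q, path = Fraction(q0), []
    for x in counts:
        q = s_round(lam * x + (1 - lam) * q, s)
        path.append(q)
    return path


counts = [1, 4, 0, 5, 2]
lam = Fraction(1, 4)
path_s1 = s_ewma(counts, lam, s=1, q0=2)  # standard rounding onto integers
path_s2 = s_ewma(counts, lam, s=2, q0=2)  # rounding onto multiples of 1/2
```

With `s = 1` the recursion coincides with the rounded EWMA, while larger values of `s` give a finer but still discrete grid of possible values, which keeps the Markov chain approach applicable.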