Chapter 9
Control Charts for Categorical Processes

Having discussed control charts for count processes in Sections 8.2 and 8.3, we shall now turn to another type of attributes data process c09-math-001, namely categorical processes, as introduced in Part II. Although the range (state space) c09-math-002 of c09-math-003 now consists of a finite number c09-math-004 of (unordered) categories with c09-math-005, count values will still play an important role, since the most obvious way of evaluating categorical data is by counting the occurrences of categories; see also Chapter 6.

For quality-related applications, c09-math-006 often describes the result of an inspection of an item, which either leads to the classification c09-math-007 for an c09-math-008 iff the c09-math-009th item was non-conforming of type c09-math-010, or c09-math-011 for a conforming item. A typical example is the one described by Mukhopadhyay (2008), in which a non-conforming ceiling fan cover is classified according to the most predominant type of paint defect, say “poor covering” or “bubbles”. Another field of application is the monitoring of network traffic data with different types of audit events; see Ye et al. (2002) for details. For monitoring such categorical processes, we shall consider two general strategies: if the process evolves too fast to be monitored continuously, then segments are taken from the process at selected times. For each of the resulting samples, a statistic is computed and plotted on a control chart. Here, it is important to carefully consider the serial dependence within the sample; see Section 9.1 for further details. In other cases, it is possible to continuously monitor the process, but then the serial dependence has to be taken into account between the plotted statistics. Control charts for this scenario are presented in Section 9.2. In both cases, we shall first concentrate on the special case of a binary process (that is, c09-math-012) and then extend our discussion to the general categorical case.

9.1 Sample-based Monitoring of Categorical Processes

In this section, we assume that the categorical process c09-math-013 cannot be monitored continuously. Instead, samples are taken as non-overlapping segments from the process at times c09-math-014, each being of a certain length c09-math-015. Note that we restrict ourselves to a constant segment length c09-math-016 for simplicity, but at least the Shewhart-type charts could be directly adapted to varying length c09-math-017 by using varying control limits (Montgomery, 2009, Section 7.3.2). The time distance c09-math-018 is assumed to be sufficiently large that the serial dependence between the samples can be neglected; only the serial dependence within the samples has to be considered. After a segment has been collected, a sample statistic is computed and then plotted on an appropriately designed control chart.

9.1.1 Sample-based Monitoring: Binary Case

In view of its practical importance, let us first focus on the special case of a binary process c09-math-019, with the range coded as c09-math-020 as in Example 6.3.2. Having available the binary segment c09-math-021, one commonly determines either the sample sum c09-math-022 (say, counts of non-conforming items) or the corresponding sample fraction of ‘1’s. Since the sample fraction differs from the count just by a factor c09-math-023, we shall always consider the resulting count process c09-math-024 in the sequel. The original binary process c09-math-025 is now monitored by monitoring this derived count process c09-math-026. At this point, the fundamental premise of this section should be remembered: although c09-math-027 might exhibit serial dependence, due to taking sufficiently distant segments, we shall assume c09-math-028 to be serially independent, and hence i.i.d. in its in-control state.

For monitoring c09-math-029 (being i.i.d. in its in-control state), any of the concepts discussed in Chapter 8 can be used; it just has to be adapted to the finite range c09-math-030 of c09-math-031. This difference sometimes manifests itself in the name of the resulting control charts. If the counts are plotted directly on a Shewhart-type chart, for instance, it is no longer referred to as a c09-math-032 chart, but as an c09-math-033 chart; see also the discussion in Section 8.2.1, as well as Montgomery (2009). If the sample fractions are plotted, it is called a c09-math-034 chart. Despite this different terminology, the c09-math-035 chart still has two control limits c09-math-036 satisfying c09-math-037, which includes the one-sided charts as boundary cases (upper-sided if c09-math-038). ARLs are also computed as before; see (8.3) and (8.4).
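As a rough illustration of the ARL computation for such a Shewhart-type chart, the following sketch assumes i.i.d. binomial counts and assumes that no alarm is raised as long as the count lies between the two limits (inclusive). The function names and the design values (segment length 50, in-control probability 0.1, limits 0 and 11) are hypothetical and serve only as an example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(N = k) for a binomially distributed count N ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def np_chart_arl(n, p, lower, upper):
    """ARL of a Shewhart chart for i.i.d. binomial counts.

    Assumption: no alarm while lower <= N <= upper, so the run length is
    geometric with alarm probability 1 - P(lower <= N <= upper).
    """
    p_in = sum(binom_pmf(k, n, p) for k in range(lower, upper + 1))
    return float("inf") if p_in >= 1.0 else 1.0 / (1.0 - p_in)


# Hypothetical upper-sided design (lower limit 0, upper limit 11).
arl_in = np_chart_arl(50, 0.10, 0, 11)   # in-control ARL
arl_out = np_chart_arl(50, 0.15, 0, 11)  # ARL after a shift to 0.15
print(arl_in, arl_out)
```

Since the counts are assumed to be serially independent, a single pass over the pmf suffices; no Markov chain approach is needed for this chart type.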

Concerning the distribution of the sample counts, the serial dependence structure of the underlying binary process c09-math-050 is of importance. If c09-math-051 is i.i.d. with c09-math-052 (say, the probability of a non-conforming item), then each sample sum c09-math-053 is binomially distributed according to c09-math-054 (Example A.2.1). So the statistics c09-math-055 themselves constitute an i.i.d. process of binomial counts. If, in contrast, c09-math-056 exhibits serial dependence, then the distribution of c09-math-057 will deviate from a binomial one.

In Deligonul & Mergen (1987), Bhat & Lal (1990) and Weiß (2009f), the case of c09-math-058 being a binary Markov chain with success probability c09-math-059 and autocorrelation parameter c09-math-060 was considered (Example 7.1.3); that is, with the transition matrix given by (7.6). In this case, c09-math-061 follows the so-called Markov binomial distribution c09-math-062 (which coincides with c09-math-063 iff c09-math-064). While the mean of c09-math-065 is not affected by the serial dependence, the variance in particular changes (extra-binomial variation if c09-math-066, see the discussion in the context of Equation (2.3)):

9.1 equation

The pmf is given by (Kedem, 1980, Corollary 1.1)

for c09-math-069 (zero inflation if c09-math-070; see the discussion in Appendix A.2). If the time distance c09-math-071 between successive segments from c09-math-072 is sufficiently large, the resulting process of counts c09-math-073 can still be assumed to be approximately i.i.d. (note that the correlation between c09-math-074 and c09-math-075 decays exponentially with c09-math-076), but with a marginal distribution different from a binomial one. This difference in the distribution of c09-math-077 certainly has to be considered when designing a corresponding control chart (Weiß, 2009f).
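The effect of the serial dependence on the count distribution can be explored numerically. The following sketch does not use the closed-form pmf from Kedem (1980); instead, it computes the distribution of the segment sum by a simple recursion over the chain. It assumes the common parametrization of a stationary binary Markov chain in which the probability of a '1' equals pi*(1-rho) + rho after a '1' and pi*(1-rho) after a '0'; the names `pi`, `rho` and both function names are our own choices. The variance function uses the standard result for the sum of a stationary sequence with autocorrelation rho^k.

```python
def markov_binomial_pmf(n, pi, rho):
    """Exact pmf of N = X_1 + ... + X_n for a stationary binary Markov chain.

    Assumed parametrization: P(X_t = 1 | X_{t-1} = 1) = pi*(1-rho) + rho,
    P(X_t = 1 | X_{t-1} = 0) = pi*(1-rho), and P(X_1 = 1) = pi.
    """
    p_one = {0: pi * (1.0 - rho), 1: pi * (1.0 - rho) + rho}
    # state[last][k] = P(sum so far equals k and last observation is `last`)
    state = {0: [1.0 - pi] + [0.0] * n, 1: [0.0, pi] + [0.0] * (n - 1)}
    for _ in range(n - 1):
        new = {0: [0.0] * (n + 1), 1: [0.0] * (n + 1)}
        for last in (0, 1):
            for k, prob in enumerate(state[last]):
                if prob == 0.0:
                    continue
                new[0][k] += prob * (1.0 - p_one[last])
                new[1][k + 1] += prob * p_one[last]
        state = new
    return [state[0][k] + state[1][k] for k in range(n + 1)]


def markov_binomial_variance(n, pi, rho):
    """Variance of the segment sum if the autocorrelation at lag k is rho**k."""
    factor = (1.0 + rho) / (1.0 - rho) - 2.0 * rho * (1.0 - rho**n) / (n * (1.0 - rho) ** 2)
    return n * pi * (1.0 - pi) * factor


pmf = markov_binomial_pmf(20, 0.2, 0.3)
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(mean, var, markov_binomial_variance(20, 0.2, 0.3))
```

For rho = 0, the recursion reproduces the binomial distribution, while positive rho leaves the mean unchanged but inflates both the variance and the probability of observing no '1' at all.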

In addition, advanced control schemes, such as the EWMA or CUSUM charts discussed in Section 8.3, can be used for monitoring c09-math-078. Assuming the counts c09-math-079 to be binomially distributed in their in-control state (that is, c09-math-080 is assumed to be i.i.d.), Gan (1993) applied the CUSUM scheme described in Section 8.3.1 for process monitoring, while Gan (1990b) used the modified EWMA chart from Section 8.3.3 (with rounding operation as in (8.20)) for this purpose. The application of such an EWMA chart to the case of c09-math-081 being a binary Markov chain – that is, with c09-math-082 following the Markov binomial distribution – was considered by Weiß (2009f). The computation of ARLs is done in the same way as described in Sections 8.3.1 and 8.3.3, by just using the pmf of the (Markov) binomial distribution. A completely different approach for a sample-based monitoring of an underlying binary Markov chain was recently proposed by Adnaik et al. (2015), who did not compute the sample sums c09-math-083 as the charting statistics, but instead used some kind of likelihood ratio statistic for each of the successive segments. Finally, Höhle (2010) proposed a log-LR CUSUM chart for monitoring the c09-math-084 under the assumption that these counts follow a marginal (beta-)binomial logit regression model; see Section 7.4.

9.1.2 Sample-based Monitoring: Categorical Case

Let us return to the truly categorical case; that is, where the range of c09-math-125 consists of more than two states, c09-math-126 with c09-math-127. As in Section 6.2, we denote the time-invariant marginal probabilities by c09-math-128. If the number of different states, c09-math-129, is small, it would be feasible to monitor the process with c09-math-130 simultaneous binary charts; say, by using the c09-math-131-tree method described in Duran & Albin (2009). However, here we shall concentrate on charting procedures in which the information about the process is condensed into a univariate statistic: after having taken a segment from the process, we first compute the resulting frequency distribution as a summary, which then serves as the basis for deriving the statistic to be plotted on the control chart. To keep it consistent with the binary case from Section 9.1.1, we concentrate on absolute frequencies: c09-math-132 with c09-math-133 being the absolute frequency of the state c09-math-134 in the sample c09-math-135, such that c09-math-136. With c09-math-137 denoting the binarization of c09-math-138, we may express c09-math-139.

If the underlying categorical process c09-math-140 is even serially independent (so altogether i.i.d.), then the distribution of each c09-math-141 is a multinomial one; see Example A.3.3. This case was considered by Marcucci (1985) and Mukhopadhyay (2008), among others, who proposed plotting Pearson's c09-math-142-statistic on a control chart,

9.3 equation

where c09-math-144 refers to the in-control values of the categorical probabilities. So in the in-control case, the process c09-math-145 is i.i.d. with a marginal distribution that might be approximated by a c09-math-146-distribution (see Horn (1977) concerning the goodness of this approximation). As an alternative, Weiß (2012) proposed using a control statistic that measures the relative change of categorical dispersion. As the underlying categorical dispersion measure, the Gini index (6.1) might be used. If c09-math-147 is i.i.d., following the in-control model, then

9.4 equation

is approximately normally distributed, with a mean of c09-math-149 and variance c09-math-150; see Section 6.2. These approximate distributions for c09-math-151 or c09-math-152 may be used during chart design. But since the quality of these approximations is often rather poor (note that c09-math-153 is often quite small and that the control limit is usually chosen as an extreme quantile), the final design and evaluation of the ARL performance requires simulations in practice.
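A simulation-based design of this kind might look as follows for Pearson's statistic in its usual form, that is, the sum over all states of (observed minus expected frequency) squared, divided by the expected frequency. The sketch assumes i.i.d. observations within each segment; the function names, the in-control probabilities (0.7, 0.2, 0.1), the segment length 50 and the alarm probability 0.005 are hypothetical illustration values.

```python
import random


def pearson_statistic(counts, pi0):
    """Pearson's chi-square statistic of absolute frequencies against pi0."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, pi0))


def simulated_limit(n, pi0, alarm_prob, reps=5000, seed=1):
    """Empirical (1 - alarm_prob)-quantile of the statistic for i.i.d. segments."""
    rng = random.Random(seed)
    states = range(len(pi0))
    stats = []
    for _ in range(reps):
        sample = rng.choices(states, weights=pi0, k=n)
        counts = [sample.count(s) for s in states]
        stats.append(pearson_statistic(counts, pi0))
    stats.sort()
    return stats[min(reps - 1, int((1.0 - alarm_prob) * reps))]


# Hypothetical design: three states, segments of length 50.
limit = simulated_limit(n=50, pi0=[0.7, 0.2, 0.1], alarm_prob=0.005)
print(limit)
```

The simulated limit can then be compared with the corresponding quantile of the approximating distribution to judge how far the approximation can be trusted for the chosen segment length.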

A sample-based approach is also possible if c09-math-200 is serially dependent. But then, certainly, the distributions of c09-math-201 and hence of c09-math-202 and c09-math-203 will deviate from those given above for the i.i.d. case. If, for instance, c09-math-204 is an NDARMA process (Section 7.2), then the effect on the distribution can be quantified in terms of the constant c09-math-205 from (7.12). Considering the complete vector c09-math-206, the covariance matrix c09-math-207 from Example A.3.3 is asymptotically inflated by the factor c09-math-208 (Weiß, 2013b). For the Gini statistic c09-math-209, variance and mean change (approximately) according to (7.13) and (7.14), respectively, while Weiß (2013b) showed that c09-math-210 is approximately c09-math-211-distributed.


Figure 9.4 (a) c09-math-220 chart and (b) c09-math-221 chart applied to simulated sample; see Example 9.1.2.2.

Höhle (2010) proposed a log-LR CUSUM chart if the c09-math-222 stem from a marginal multinomial logit regression model; see Section 7.4.

9.2 Continuously Monitoring Categorical Processes

If the process evolves sufficiently slowly, then it is possible to implement a continuous monitoring approach of the categorical process c09-math-223. As each new categorical observation c09-math-224 arrives, the next control statistic is computed and plotted on the control chart.

9.2.1 Continuous Monitoring: Binary Case

As in Section 9.1.1, let us first focus on the special case of a binary process c09-math-225. Perhaps the best-known approach for (quasi) continuously monitoring a binary process is by plotting run lengths c09-math-226 on an appropriately designed chart:

9.5 equation

As an example,

equation

The monitoring of such runs is a reasonable approach, especially for high-quality processes where c09-math-228 is very small. Small c09-math-229 implies that long runs are observed, but if c09-math-230 increases (deterioration of quality), the runs become shorter (and vice versa). So the detection of a decrease in the run lengths is often particularly relevant. Having fixed a truly two-sided design c09-math-231, we stop monitoring with the c09-math-232th run if either c09-math-233 for the first time, or if already c09-math-234 zeros have been observed since the last run (because then, c09-math-235 will necessarily become larger than c09-math-236, but we do not need to wait until the run is finished).
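The stopping rule just described can be sketched as follows. We assume here that a run counts all observations up to and including the next '1', which is consistent with the remark that a run necessarily exceeds the upper limit once that many zeros have been observed. The function names and the string labels for the alarm reason are hypothetical.

```python
def run_lengths(observations):
    """Lengths of completed runs; each run ends with (and includes) a '1'."""
    runs, zeros = [], 0
    for x in observations:
        if x == 1:
            runs.append(zeros + 1)
            zeros = 0
        else:
            zeros += 1
    return runs


def monitor_runs(observations, lower, upper):
    """Return (time of first alarm, reason), or (None, None) without alarm.

    An alarm is raised if a completed run is shorter than `lower`, or as
    soon as `upper` zeros have been observed since the last completed run.
    """
    zeros = 0
    for t, x in enumerate(observations, start=1):
        if x == 1:
            if zeros + 1 < lower:
                return t, "run too short"
            zeros = 0
        else:
            zeros += 1
            if zeros >= upper:
                return t, "run too long"
    return None, None


print(run_lengths([0, 0, 0, 1, 0, 1]))          # completed runs
print(monitor_runs([0, 0, 0, 1, 0, 1], 3, 10))  # second run is too short
```

Note that the alarm time is counted in original observations rather than in plotted runs, which is the quantity relevant for the performance discussion that follows.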

Concerning performance evaluation, Remark 9.1.1.1 should be remembered. The ARL – that is, the average number of plotted runs until the first alarm – would be quite misleading, since a single run might comprise a rather large number of original observations. Therefore, the ATS is clearly preferable as a measure of chart performance.

If c09-math-237 is i.i.d. (Bourke, 1991; Xie et al., 2000; Weiß, 2013c), then c09-math-238 is also i.i.d. according to the shifted geometric distribution (Example A.1.5). The ATS can be computed using the formula given by Weiß (2013c).


Figure 9.5 ATS performance of runs chart against c09-math-240; see Example 9.2.1.1.
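For the i.i.d. case, the ATS of a two-sided runs chart can also be obtained from a short renewal argument. The following sketch is our own derivation under the stopping rule of this section, not a reproduction of the formula in Weiß (2013c): each run consumes min(Q, u) observations, where Q is the shifted geometric run length and u the upper limit, and monitoring continues only if the run length falls between the limits. The symbols `pi`, `lower` and `upper` are assumed names for the probability of a '1' and the two chart limits.

```python
def runs_chart_ats(pi, lower, upper):
    """ATS of a two-sided runs chart for i.i.d. binary observations.

    Renewal argument (own derivation):
    ATS = E[min(Q, upper)] / (1 - P(lower <= Q <= upper)),
    with Q shifted geometric, P(Q = k) = (1 - pi)**(k - 1) * pi, k = 1, 2, ...
    """
    q = 1.0 - pi
    expected_obs_per_run = (1.0 - q**upper) / pi  # E[min(Q, upper)]
    p_continue = q ** (lower - 1) - q**upper      # P(lower <= Q <= upper)
    return expected_obs_per_run / (1.0 - p_continue)


# Hypothetical design with limits 5 and 500.
for pi in (0.01, 0.02, 0.05):
    print(pi, runs_chart_ats(pi, lower=5, upper=500))
```

As a plausibility check, a design with lower limit 1 only signals after `upper` consecutive zeros, and the expression then reduces to the well-known expected waiting time for such a pattern.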

The monitoring of runs is straightforwardly extended to the Markov case; see Blatterman & Champ (1992) and Lai et al. (2000). The runs c09-math-268 from a binary Markov chain c09-math-269 according to Example 7.1.3 are still serially independent, but their distribution is no longer shifted geometric. While c09-math-270 has to be treated separately, for c09-math-271 with c09-math-272, we obviously have

equation

If ‘1’s are observed more frequently, the runs become quite short on average. In such a case, the CUSUM procedure proposed by Bourke (1991) to monitor the run length in c09-math-273 is more appropriate. This geometric CUSUM control chart is essentially equivalent to the Bernoulli CUSUM control chart, which was proposed by Reynolds & Stoumbos (1999) for an i.i.d. binary process c09-math-274, and which was extended to the case of a binary Markov chain, as in Example 7.1.3, by Mousavi & Reynolds (2009). These charts are constructed in an analogous way to (8.15): the CUSUM chart is defined by accumulating the contributions to the log-likelihood ratio (log-LR) at times c09-math-275. The contribution by the c09-math-276th observation equals

equation

where c09-math-277 refers to the relevant out-of-control parameter value of c09-math-278, while c09-math-279 represents the in-control value. In the Markov case (Example 7.1.3; see also Remark 8.3.2.2), c09-math-280 is computed as before, while

equation

An upper-sided CUSUM chart (we restrict to this case, since usually increases in c09-math-281 are to be detected) can now be constructed analogously to (8.13) by defining

In the i.i.d. case, the CUSUM (9.7) might be rewritten in the form

Note that the plotted statistics of these CUSUM charts go along with the observations, so c09-math-284. To allow for an exact ARL computation with the MC approach (Section 8.2.2), c09-math-285 can be required to take the form c09-math-286 with an c09-math-287 (Reynolds & Stoumbos, 1999); a similar strategy is proposed by Mousavi & Reynolds (2009) for the case of c09-math-288 being a binary Markov chain. In this case (with c09-math-289 being a multiple of c09-math-290), the resulting transition matrices c09-math-291 for the MC approach are sparse matrices (see Section 8.3.2), since only a few combinations c09-math-292 are possible at all for c09-math-293. We have

equation

Note that a lower-sided CUSUM chart can be constructed in an analogous way to (9.8). For a log-LR CUSUM with respect to an underlying logit regression model (Section 7.4), see Höhle (2010).
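To make the construction concrete, the following sketch implements an upper-sided log-LR CUSUM for binary data under stated assumptions. In the i.i.d. case, the contribution of an observation is the logarithm of the ratio of its probability under the out-of-control value to that under the in-control value. In the Markov case, we assume the same autocorrelation parameter in both states and the parametrization in which the probability of a '1' equals pi*(1-rho) + rho after a '1' and pi*(1-rho) after a '0'; the first observation is treated via its marginal probability. The recursion max(0, C + R), the strict alarm condition and all names are our choices for this illustration.

```python
from math import log


def transition_prob(x, prev, pi, rho):
    """P(X_t = x | X_{t-1} = prev) under the assumed Markov parametrization."""
    p_one = pi * (1.0 - rho) + (rho if prev == 1 else 0.0)
    return p_one if x == 1 else 1.0 - p_one


def binary_cusum(xs, pi0, pi1, limit, rho=0.0, start=0.0):
    """Upper-sided log-LR CUSUM; returns (statistics, time of first alarm or None)."""
    c, stats, alarm, prev = start, [], None, None
    for t, x in enumerate(xs, start=1):
        if prev is None or rho == 0.0:
            # marginal (i.i.d.) contribution to the log-likelihood ratio
            num = pi1 if x == 1 else 1.0 - pi1
            den = pi0 if x == 1 else 1.0 - pi0
        else:
            # contribution based on the transition probabilities
            num = transition_prob(x, prev, pi1, rho)
            den = transition_prob(x, prev, pi0, rho)
        c = max(0.0, c + log(num / den))
        stats.append(c)
        if alarm is None and c > limit:
            alarm = t
        prev = x
    return stats, alarm


stats, alarm = binary_cusum([0, 1, 1, 0, 1, 1, 1], pi0=0.1, pi1=0.2, limit=2.0)
print(stats, alarm)
```

Each '1' increases the statistic by log(pi1/pi0) and each '0' decreases it by a much smaller amount, which is why the statistic can only take values on a grid if the two increments are commensurable.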


Figure 9.7 CUSUM charts of Example 9.2.1.2: c09-math-310 against c09-math-311.


Figure 9.8 CUSUM chart c09-math-312 for fatty liver data; see Example 9.2.1.2.

We conclude by pointing out another approach for continuously monitoring a binary process: the EWMA chart discussed in Section 8.3.3, which was applied to binary processes by Yeh et al. (2008) and Weiß & Atzmüller (2010), among others.

9.2.2 Continuous Monitoring: Categorical Case

Generally, while it is quite natural to check for runs in a binary process, it is more difficult to define a run for the truly categorical case in a reasonable way. One possible solution was discussed in Weiß (2012), where one waits for a segment c09-math-313 of length c09-math-314, where c09-math-315 is taken from a specified subset c09-math-316. But as pointed out in Weiß (2012), waiting times for completely different types of patterns might also be relevant, depending on the actual application scenario. Because of this ambiguity, we shall not further consider the monitoring of runs in a categorical process here.

Instead, we follow the path of Section 9.2.1 and consider CUSUM charts derived from the log-likelihood ratio approach. So let c09-math-317 denote again the in-control value of the marginal distribution c09-math-318, and let c09-math-319 be the relevant out-of-control value. A CUSUM scheme for the case of an underlying i.i.d. process was proposed by Ryan et al. (2011). In this case, the contribution to the log-LR at time c09-math-320, c09-math-321, can be expressed as either

where the latter version uses the binarization c09-math-323 of c09-math-324. The log-LR approach also applies to serially dependent categorical processes. As an example, in analogy to the work by Mousavi & Reynolds (2009) concerning a binary Markov chain (see Section 9.2.1), we consider a categorical Markov chain. Then, c09-math-325 is computed as before, where

equation

If we are concerned with the particular case of a DAR(1) process as in Examples 7.2.2 and 9.1.2.3, it then follows that

9.10 equation

The log-LR-based CUSUM statistic at time c09-math-327 is defined as before, by the recursion

An alarm is triggered once c09-math-329 violates the upper control limit c09-math-330 for the first time. For a log-LR CUSUM with respect to an underlying logit regression model (Section 7.4), see Höhle (2010).
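A minimal sketch of such a categorical log-LR CUSUM is given below. States are coded as list indices, and the DAR(1) transition probabilities are assumed to take the usual form phi*1(x = prev) + (1 - phi)*p_x, with the same dependence parameter phi in control and out of control; for phi = 0, this reduces to the i.i.d. case. The first observation is treated via its marginal probability. All names, the starting value 0 and the strict alarm condition are assumptions made for this illustration.

```python
from math import log


def dar1_transition(x, prev, probs, phi):
    """P(X_t = x | X_{t-1} = prev) for a DAR(1) process with marginal `probs`."""
    return (1.0 - phi) * probs[x] + (phi if x == prev else 0.0)


def categorical_cusum(xs, pi0, pi1, limit, phi=0.0):
    """Upper-sided log-LR CUSUM; returns (statistics, time of first alarm or None)."""
    c, stats, alarm, prev = 0.0, [], None, None
    for t, x in enumerate(xs, start=1):
        if prev is None:
            r = log(pi1[x] / pi0[x])  # marginal contribution at the start
        else:
            r = log(dar1_transition(x, prev, pi1, phi)
                    / dar1_transition(x, prev, pi0, phi))
        c = max(0.0, c + r)
        stats.append(c)
        if alarm is None and c > limit:
            alarm = t
        prev = x
    return stats, alarm


# Hypothetical in-control and out-of-control marginal distributions.
pi0 = [0.7, 0.2, 0.1]
pi1 = [0.5, 0.3, 0.2]
stats, alarm = categorical_cusum([0, 2, 2, 0, 1, 2], pi0, pi1, limit=1.0)
print(stats, alarm)
```

States that are more likely under the out-of-control distribution push the statistic upwards, whereas the remaining states pull it back towards zero.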

An EWMA control chart for the monitoring of a categorical process is proposed by Ye et al. (2002).
