SPC methods for time-dependent processes of counts


Abstract

During the last few years, there has been increasing interest in SPC methods for time-dependent processes of counts. We survey recent developments in this field: feasible models for autocorrelated count processes are presented, approaches for corresponding control charts are considered, and the topic of process capability indices is also briefly discussed. The article is accompanied by a comprehensive list of relevant references, and it concludes by outlining promising directions for future research.


Public Interest Statement

In many fields of application, we are concerned with count data processes. Typical examples are counts of defects per produced item in manufacturing industry, counts of new cases of an infection per time unit in health care monitoring, or counts of complaints by customers per time unit in service industry. Often, it is important to detect changes in the process as soon as possible to be able to start preventive actions or to avoid further damages. Methods of statistical process control are a suitable tool for this purpose. The article provides a detailed survey of such methods together with a comprehensive list of relevant references, and it concludes by outlining promising directions for future research.

1. Introduction

Methods of statistical process control (SPC) help to monitor and improve processes in manufacturing and service industries. For such a process, certain quality characteristics are measured at discrete times $t \in \mathbb{N} := \{1, 2, \dots\}$. This leads to a (possibly multivariate) stochastic process $(X_t)_{t \in \mathbb{N}}$ of continuous-valued or discrete-valued random variables (variables data or attributes data, respectively).

One of the most important SPC tools is the control chart, which requires the relevant quality characteristics to be measured online. Control charts are applied to a process operating in a stable state (in control), i.e. $(X_t)_{t \in \mathbb{N}}$ is assumed to be stationary according to a specified model (in-control model). As each new measurement arrives, it is used to compute a statistic (possibly also incorporating past values of the quality characteristic). This statistic is then plotted on the control chart together with its control limits. If the statistic violates the limits, an alarm is triggered, signaling that the process may no longer be stable (out of control) and may require corrective action.

Besides such online monitoring to detect changes in the process, it is also important to analyze to what extent the process in its in-control state meets the given target values and specification limits. Process capability indices are a widely used SPC tool for this purpose, and they are also briefly discussed in the present article.

Furthermore, another type of control chart application is considered. The use of control charts for online monitoring, as described above, is commonly referred to as the Phase-II application. Control charts may also be applied retrospectively to already available in-control data, with the aim of characterizing the in-control properties of $(X_t)_{t \in \mathbb{N}}$; this is called the Phase-I application of a control chart. More details about all these terms and concepts can be found in the textbook by Montgomery (2009) and in the survey papers by Woodall (2000) and Woodall and Montgomery (2014).

In this article, we concentrate on one type of attributes data process: count data processes, where each $X_t$ has a range contained in the set of non-negative integers, $\mathbb{N}_0 := \{0, 1, \dots\}$. Typical examples are counts of defects per produced item in manufacturing industry, counts of new cases of an infection per time unit in health care monitoring, or counts of customer complaints per time unit in service industry.

A lot of work has been done on such attributes data processes (see the survey by Woodall, 1997), but with one important restriction. The large majority of papers about SPC methods for attributes data assume the underlying process to be serially independent in its in-control state, so the counts $X_1, X_2, \dots$ are assumed to be independent and identically distributed (i.i.d.). Only during the last few years has research activity on attributes data processes with serial dependence increased. The aim of the present paper is to survey these research activities and to outline relevant issues for future research in this area.

At this point, it is important to stress that this lack of interest in autocorrelated attributes data is in sharp contrast to the variables case. During the 1960s to 1980s, only a few scattered works studied the effects of autocorrelation on the performance of variables control charts. Since the 1990s, however, there has been a lot of research activity in this direction, initiated, among others, by the works of Alwan and Roberts (1988) and Alwan (1992, 1995). Surveys on control charts for autocorrelated variables data processes are provided by Knoth and Schmid (2004) and Psarakis and Papaleonida (2007).

Although autocorrelated attributes data were not a topic of research until a few years ago, Alwan (1995) had already shown that autocorrelation is indeed a common phenomenon in attributes data processes. Typical reasons for count data processes to be autocorrelated are:

- a high sampling frequency due to automated production environments in manufacturing industry;
- varying service times (extending over more than one time unit) in service industry;
- varying incubation times and infectivities of diseases in health care monitoring.

The delay in working on SPC methods for autocorrelated attributes data processes might have been caused by a lack of suitable models. Simple stochastic models for such processes, i.e. models of comparable simplicity to the well-known autoregressive moving average (ARMA) models for autocorrelated variables data processes, were not known to a broader audience for a long time. Therefore, we start in Section 2 with a brief review of the basic approaches for modeling autocorrelated processes of counts. Section 3 then provides information about the most popular SPC tools, control charts and process capability indices. Section 3.1 only presents a basic Shewhart chart and puts more emphasis on topics like performance evaluation and the effect of estimated parameters. Advanced control charts like CUSUM and EWMA methods are presented in detail in Section 4. Finally, we outline possible directions for future research in Section 5.

2. Basic models for autocorrelated counts processes

In the sequel, several common count data distributions are mentioned without further details. Readers interested in more background information are referred to the book by Johnson, Kemp, and Kotz (2005).

One of the oldest approaches to stationary count data processes is the INAR(1) model by McKenzie (1985) and Al-Osh and Alzaid (1987), the integer-valued counterpart to the usual autoregressive model of order 1. This model can be understood as a special type of branching process with immigration, and it uses the binomial thinning operator by Steutel and van Harn (1979).

If $X$ is a count data random variable and $\alpha \in (0; 1)$, then the random variable $\alpha \circ X := \sum_{i=1}^{X} Z_i$ is said to arise from $X$ by binomial thinning. Here, the $Z_i$ are i.i.d. binary random variables with $P(Z_i = 1) = \alpha$ (abbreviated as $Z_i \sim \text{Bin}(1, \alpha)$), which are also independent of $X$. So $\alpha \circ X$ is conditionally binomially distributed, $\alpha \circ X \sim \text{Bin}(X, \alpha)$.
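To make the thinning operation concrete, the following Python sketch draws realizations of $\alpha \circ X$. It uses NumPy, which is an assumption of this illustration. The sum of $X$ i.i.d. $\text{Bin}(1, \alpha)$ indicators is $\text{Bin}(X, \alpha)$ distributed, so a single binomial draw suffices. The function name `binomial_thinning` is hypothetical.

```python
import numpy as np


def binomial_thinning(x: int, alpha: float, rng: np.random.Generator) -> int:
    """Return one draw of alpha o x, the number of 'survivors' among x individuals.

    Summing x i.i.d. Bin(1, alpha) indicators is equivalent to one Bin(x, alpha) draw.
    """
    if x < 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need a count x >= 0 and alpha in (0, 1)")
    return int(rng.binomial(x, alpha))


rng = np.random.default_rng(1)
draws = [binomial_thinning(10, 0.3, rng) for _ in range(20_000)]
print(np.mean(draws))  # close to E[alpha o 10] = 10 * 0.3 = 3
```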

Let $(\epsilon_t)_{t \in \mathbb{N}}$ be an (unobservable) i.i.d. count data process with $E[\epsilon_t] = \mu_\epsilon$ and $V[\epsilon_t] = \sigma_\epsilon^2$, the innovations of the process. The INAR(1) model by McKenzie (1985) and Al-Osh and Alzaid (1987) assumes that the observations $(X_t)_{t \in \mathbb{N}_0}$ satisfy the recursion

$$X_t = \alpha \circ X_{t-1} + \epsilon_t, \tag{2.1}$$

where all thinning operations are performed independently of each other and of $(\epsilon_t)_{t \in \mathbb{N}}$. In addition, the thinning operations at each time $t$ as well as $\epsilon_t$ are independent of $(X_s)_{s<t}$.

Except for the use of the thinning operator “$\circ$” instead of the usual multiplication “$\cdot$”, recursion (2.1) looks like the usual AR(1) recursion. In fact, it also constitutes a Markov chain with an exponentially decaying autocorrelation function (ACF), $\rho(k) := \text{Corr}[X_t, X_{t-k}] = \alpha^k$. The marginal mean and variance-mean ratio are obtained as

$$\mu_X = \frac{\mu_\epsilon}{1-\alpha}, \qquad \frac{\sigma_X^2}{\mu_X} = \frac{\sigma_\epsilon^2/\mu_\epsilon + \alpha}{1+\alpha}. \tag{2.2}$$
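The following is a minimal sketch that evaluates (2.2) and checks it against a simulated INAR(1) path. It assumes NumPy and uses hypothetical helper names. To show that overdispersed innovations carry over to the observations, the innovations are taken to be negative binomial with mean 1 and variance 3; this is an illustrative assumption. For $\alpha = 0.5$, (2.2) then gives $\mu_X = 2$ and $\sigma_X^2/\mu_X = 3.5/1.5 \approx 2.33$.

```python
import numpy as np


def inar1_moments(mu_eps, var_eps, alpha):
    """Marginal mean and variance-mean ratio of an INAR(1) process, eq. (2.2)."""
    mean = mu_eps / (1.0 - alpha)
    dispersion = (var_eps / mu_eps + alpha) / (1.0 + alpha)
    return mean, dispersion


def simulate_inar1(n, alpha, draw_innovation, rng, burn_in=1_000):
    """Simulate X_t = alpha o X_{t-1} + eps_t; draw_innovation(rng) returns one eps_t."""
    x = 0
    path = np.empty(n, dtype=np.int64)
    for t in range(n + burn_in):
        # binomial thinning of the previous count plus a fresh innovation
        x = int(rng.binomial(x, alpha)) + int(draw_innovation(rng))
        if t >= burn_in:
            path[t - burn_in] = x
    return path


def innovation(g):
    # Illustrative choice: NumPy's negative_binomial(n, p) has mean n(1-p)/p and
    # variance n(1-p)/p**2, i.e. mean 1 and variance 3 for n = 0.5, p = 1/3.
    return g.negative_binomial(0.5, 1.0 / 3.0)


alpha = 0.5
rng = np.random.default_rng(2024)
path = simulate_inar1(100_000, alpha, innovation, rng)
theory_mean, theory_disp = inar1_moments(1.0, 3.0, alpha)
print(theory_mean, theory_disp)               # 2.0 and about 2.333
print(path.mean(), path.var() / path.mean())  # should be close to the values above
```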

Beyond mimicking the typical AR(1)-like autocorrelation structure, the INAR(1) model is particularly relevant for typical tasks of statistical quality control due to its intuitive interpretation (see Weiß, 2007). The thinning operation $\alpha \circ X$ itself is interpreted as the number of survivors from a population of size $X$, where each individual, independently of the other individuals, has survival probability $\alpha$. So recursion (2.1) is interpreted as

$$\underbrace{X_t}_{\text{population at time } t} = \underbrace{\alpha \circ X_{t-1}}_{\text{survivors of generation } t-1} + \underbrace{\epsilon_t}_{\text{immigration}}. \tag{2.3}$$

Adapted to the application scenarios sketched in Section 1, the “population” at time $t$ might consist of faults in a system or network, of persons infected by a certain disease, or of unanswered customer complaints. These faults, infected persons, or complaints were either already present at the previous time $t-1$ (“survivors”) or newly occurred at time $t$ (“immigration”).

The most popular case of the INAR(1) family is the Poisson INAR(1) model. Here, the innovations $\epsilon_t$ are assumed to be Poisson distributed according to $\text{Poi}(\lambda)$, such that $\mu_\epsilon = \sigma_\epsilon^2 = \lambda$. Then the stationary marginal distribution is also a Poisson distribution, $\text{Poi}\big(\lambda/(1-\alpha)\big)$ (see Al-Osh & Alzaid, 1987). So the observations also have a variance equal to the mean, a property referred to as equidispersion.

In applications, however, the counts often have a variance larger than the mean, i.e. they show overdispersion (Weiß & Testik, 2011). According to (2.2), such a feature is easily incorporated into the INAR(1) model by using an overdispersed distribution for the innovations. Examples are a compound Poisson distribution (Schweer & Weiß, 2014) or the Poisson log-normal distribution (Weiß & Testik, 2015a). By the same approach, other non-standard features like zero inflation (an excess of zeros) can also be incorporated into the INAR(1) model (see Jazi, Jones, & Lai, 2012). Finally, higher-order INARMA models have also been discussed in the literature, for instance, by Du and Li (1991) and Weiß (2008b).
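The stationarity of $\text{Poi}(\lambda/(1-\alpha))$ can be checked numerically: starting from that marginal, one step of (2.1) should reproduce it. The pure-Python sketch below computes the distribution of $\alpha \circ X + \epsilon_t$ in two steps. It first conditions on $X$ to get the survivors, then convolves with the Poisson innovations. The function names are hypothetical. The infinite Poisson support is truncated at 80, which is negligible for a mean of 2.

```python
from math import comb, exp, factorial


def poisson_pmf(k, mean):
    """P(Y = k) for Y ~ Poi(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def one_step_pmf(pmf_x, alpha, lam, kmax):
    """pmf of alpha o X + eps on {0, ..., kmax}, for X with pmf list pmf_x and eps ~ Poi(lam)."""
    # Survivors: given X = x, alpha o X ~ Bin(x, alpha).
    survivors = [0.0] * (kmax + 1)
    for x, px in enumerate(pmf_x):
        for j in range(min(x, kmax) + 1):
            survivors[j] += px * comb(x, j) * alpha**j * (1 - alpha) ** (x - j)
    # Immigration: convolve with the independent Poisson innovation.
    return [sum(survivors[j] * poisson_pmf(k - j, lam) for j in range(k + 1))
            for k in range(kmax + 1)]


alpha, lam = 0.5, 1.0
mu = lam / (1 - alpha)                            # claimed stationary mean: 2
start = [poisson_pmf(x, mu) for x in range(80)]   # Poi(mu), truncated at 80
after = one_step_pmf(start, alpha, lam, kmax=20)
max_diff = max(abs(p - poisson_pmf(k, mu)) for k, p in enumerate(after))
print(max_diff)  # of the order of floating-point round-off
```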

Often motivated by the aim of defining an AR(1)-like model for overdispersed counts, a number of modifications to the basic INAR(1) model (2.1) have been proposed. In these, the binomial thinning operator is replaced by another type of thinning (see Weiß, 2008a for a survey). As an example, Ristić, Bakouch, and Nastić (2009) introduced the negative binomial thinning operator $\alpha * X := \sum_{i=1}^{X} Z_i$. Here, the $Z_i$ are i.i.d. geometrically distributed with “success probability” $1/(1+\alpha)$, such that $E[Z_i] = \alpha > 0$. The innovations’ distribution can then be chosen such that the new geometric integer-valued autoregressive (NGINAR) process of order 1, defined by

$$X_t = \alpha * X_{t-1} + \epsilon_t, \tag{2.4}$$

is stationary with geometrically distributed marginals having an arbitrary mean $\mu > 0$, provided that $\alpha \leq \mu/(1+\mu)$, and with ACF $\rho(k) = \alpha^k$.
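For comparison with binomial thinning, here is a sketch of the negative binomial thinning $\alpha * X$, assuming NumPy. NumPy's `Generator.geometric` counts the trials up to and including the first success, so its support is $\{1, 2, \dots\}$. One is therefore subtracted to obtain geometric counts on $\{0, 1, \dots\}$ with mean $\alpha$. The function name is hypothetical.

```python
import numpy as np


def nb_thinning(x, alpha, rng):
    """alpha * x: sum of x i.i.d. geometric counts on {0, 1, ...}, each with mean alpha."""
    if x == 0:
        return 0
    p = 1.0 / (1.0 + alpha)  # "success probability" of each geometric count
    # Generator.geometric has support {1, 2, ...}; subtracting 1 shifts it to
    # {0, 1, ...}, giving mean (1 - p) / p = alpha.
    return int((rng.geometric(p, size=x) - 1).sum())


rng = np.random.default_rng(7)
draws = np.array([nb_thinning(10, 0.5, rng) for _ in range(20_000)])
print(draws.mean())  # close to 10 * 0.5 = 5
print(draws.var())   # close to 10 * alpha * (1 + alpha) = 7.5
```

With binomial thinning, the variance for the same $x = 10$ and $\alpha = 0.5$ would be only $10 \cdot 0.5 \cdot 0.5 = 2.5$. This illustrates why this operator is used for overdispersed counts.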

Another popular approach for modeling stationary processes of counts is the class of INGARCH models, which are particularly attractive for overdispersed counts. The INGARCH model is the integer-valued counterpart to the conventional generalized autoregressive conditional heteroskedasticity model. It was introduced by Heinen (2003) and Ferland, Latour, and Oraichi (2006). Given the past observations, a conditional Poisson distribution with an ARMA-like recursion for the conditional means is assumed.

The INARCH(1) model is the special case that constitutes a counterpart to the INAR(1) model discussed before. Let us denote its model parameters by $\beta > 0$ and $0 < \alpha < 1$. The process $(X_t)_{t \in \mathbb{Z}}$ is said to follow the INARCH(1) model if $X_t$ is conditionally Poisson distributed as follows:

$$X_t \mid X_{t-1}, X_{t-2}, \dots \sim \text{Poi}(\beta + \alpha \cdot X_{t-1}). \tag{2.5}$$

The ACF equals $\rho(k) = \alpha^k$, as in the standard AR(1) case. The marginal mean and variance-mean ratio of the INARCH(1) process are given by

$$\mu_X = \frac{\beta}{1-\alpha}, \qquad \frac{\sigma_X^2}{\mu_X} = \frac{1}{1-\alpha^2} > 1. \tag{2.6}$$
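Here is a simulation sketch of the INARCH(1) recursion (2.5), assuming NumPy. The parameter values ($\beta = 1$, $\alpha = 0.4$), the burn-in length and the function name are illustrative choices. The empirical mean, variance-mean ratio and lag-1 autocorrelation should be close to $\beta/(1-\alpha) \approx 1.67$, $1/(1-\alpha^2) \approx 1.19$ and $\alpha = 0.4$, respectively.

```python
import numpy as np


def simulate_inarch1(n, beta, alpha, rng, burn_in=1_000):
    """Simulate X_t | past ~ Poi(beta + alpha * X_{t-1}), eq. (2.5)."""
    x = 0
    path = np.empty(n, dtype=np.int64)
    for t in range(n + burn_in):
        x = int(rng.poisson(beta + alpha * x))  # conditional Poisson draw
        if t >= burn_in:
            path[t - burn_in] = x
    return path


beta, alpha = 1.0, 0.4
rng = np.random.default_rng(11)
path = simulate_inarch1(100_000, beta, alpha, rng)
d = path - path.mean()
acf1 = (d[1:] * d[:-1]).sum() / (d * d).sum()  # lag-1 sample autocorrelation

print(beta / (1 - alpha), 1 / (1 - alpha**2), alpha)  # theory: (2.6) and rho(1)
print(path.mean(), path.var() / path.mean(), acf1)    # empirical counterparts
```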

There are certainly many alternative approaches for modeling time series of counts, e.g. regression models (Kedem & Fokianos, 2002) or hidden Markov models (Zucchini & MacDonald, 2009). These are not considered further in this text because, to the author’s knowledge, they have not yet been used in an SPC context.

3. Common SPC methods

3.1. Control charts

The most common application scenario for control charts is the so-called Phase-II application (see also Section 1), i.e. prospective online monitoring to detect a possible change in the process. The (unknown) time at which such a process change first happens is called a change point. To be more precise, we consider the following (unconditional) change point model (Knoth, 2006):

For $\tau \in \mathbb{N}$, we assume that $(X_t)_{t<\tau}$ and $(X_t)_{t\geq\tau}$ are stationary processes with distributions abbreviated as $F_0$ and $F_1$, respectively. The time index $\tau$ is the change point, which is not known in practice. For $t < \tau$, the process is said to be in control. It is out of control for $t \geq \tau$ if $F_1 \neq F_0$.

When applying a control chart, we aim to detect the unknown change point $\tau$ as early as possible. The simplest control charts are the so-called Shewhart charts. These are based on statistics $Z_t$ that are a function only of the most recent observation $X_t$ (or of the most recent sample, for sample-based monitoring). $Z_t$ is plotted on a chart against time $t$, with time-invariant lower and upper control limits $l < u$. An alarm is triggered at time $t$ for the first time if

$$Z_1, \dots, Z_{t-1} \in [l; u], \quad \text{but} \quad Z_t \notin [l; u]. \tag{3.1}$$

An extensive review of Shewhart control charts is given by Montgomery (2009).

For count data monitoring, the so-called c chart is particularly relevant. It simply uses $Z_t = X_t$, i.e. the counts are plotted directly on the chart as they arrive. Section 4 considers more advanced control charts in more detail. In these, $Z_t := f_t(X_1, \dots, X_t; \delta)$ is an appropriately chosen measurable function of $X_1, \dots, X_t$ and of a vector $\delta$ of design parameters. Applications of the c chart were considered:

- to INAR(1) processes (2.1) by Weiß (2007, 2011b) and Morais and Pacheco (in press);
- to NGINAR(1) processes (2.4) by Li, Wang, and Zhu (in press);
- to INARCH(1) processes (2.5) by Weiß and Testik (2012).
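For the c chart, the alarm rule (3.1) amounts to a first-passage search. The following minimal Python sketch returns the run length of a given sequence of counts; the function name is hypothetical.

```python
def c_chart_run_length(counts, l, u):
    """Run length of a c chart (Z_t = X_t) with limits [l, u], following (3.1).

    Returns the first time t (counting from 1) at which X_t leaves [l, u],
    or None if no alarm occurs within the given data.
    """
    for t, x in enumerate(counts, start=1):
        if not l <= x <= u:
            return t
    return None


print(c_chart_run_length([1, 2, 3, 7, 2], l=0, u=5))  # 4
print(c_chart_run_length([1, 2, 3], l=0, u=5))        # None
```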

Remark 3.1

(Change Point Methods) Tests for a change point within a given time series are related to the control chart.

- For count data time series stemming from an INARCH(1) model, such change point tests were developed by Franke, Kirch, and Kamgaing (2012), Kang and Lee (2014), and Kang and Song (2015).
- Torkamani, Niaki, Aminnayeri, and Davoodi (2014) and Davoodi, Niaki, and Torkamani (2015) considered an underlying INAR(1) process.
- See also the references in Hudecová, Hušková, and Meintanis (2015).

The main difference between such change point tests and the above control charts is that the former are usually applied in an offline manner. They aim to find the location of the change point within the available (and static) time series. There are also online versions of change point tests, where the in-control model is sequentially tested based on the data available at each time. These are presented by Hudecová et al. (2015) for the INAR(1) model (2.1), and by Kirch and Kamgaing (2015) for the INARCH(1) model (2.5).

The essential step before starting process monitoring is to find an appropriate chart design, i.e. appropriate values for the control limits $l < u$ in the case of the c chart. Although sometimes criticized (Kenett & Pollak, 2012), the main approach is still to consider appropriately defined mean statistics of the run length $L$, i.e. an average run length (ARL). Here, $L := \min\{t \in \mathbb{N} \mid Z_t \notin [l; u]\}$ is the random number of plotted points until the first alarm is triggered. Let $E_\tau[\cdot]$ denote the expectation related to the change point $\tau$. The most common ARL concepts are then as follows (Knoth, 2006):

- the zero-state ARL (also initial-state ARL) is defined as $$\text{ARL} := E_1[L], \tag{3.2}$$
- the expected conditional ARL (also expected or conditional delay) is defined as $$\text{ARL}^{(\tau)} := E_\tau[L - \tau + 1 \mid L \geq \tau], \tag{3.3}$$
- the steady-state ARL is defined as $$\text{ARL}^{(\infty)} := \lim_{\tau \to \infty} \text{ARL}^{(\tau)}. \tag{3.4}$$

Obviously, we have $\text{ARL}^{(1)} = \text{ARL}$. For any of these ARL concepts, the computed ARL value is called the in-control ARL if $F_1 = F_0$, and the out-of-control ARL if $F_1 \neq F_0$. The in-control ARL is commonly abbreviated by adding the index “0”.

A popular approach for chart design proceeds in two steps. First, $l < u$ are chosen such that the zero-state $\text{ARL}_0$ reaches a prespecified level, which expresses the robustness of the chart against false alarms. Second, the out-of-control performance is evaluated based on the steady-state ARL. This is reasonable because the value of the change point is not known, but it will satisfy $\tau \gg 1$ in many real applications.

It remains to ask how to compute any of the ARL concepts (3.2)–(3.4) for a given chart design. This question arises in the same way for the advanced control charts discussed in Section 4 below. Whenever it is possible to simulate the considered type of count data process, ARLs can be approximated by simulations with a sufficiently high number of replications (usually at least 10,000).

If $(X_t)_{t \in \mathbb{N}}$ follows a type of discrete Markov model, it is often possible to adapt the Markov chain approach (MC approach) first proposed by Brook and Evans (1972). Note that each of the three models (2.1), (2.4), and (2.5) constitutes a discrete Markov chain. The tutorial by Weiß (2011b) gives a detailed description for several types of control charts (including the c chart for INAR(1) processes), together with corresponding software implementations.
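The following sketch illustrates the MC approach for the simplest case covered above: the c chart applied to a stationary Poisson INAR(1) process. It was written for this survey under stated assumptions (pure Python plus NumPy, hypothetical function names) and is not taken from the software of Weiß (2011b). The transition probabilities of (2.1) follow from conditioning on the number of survivors. The expected remaining run lengths then solve a linear system over the in-control states $l, \dots, u$.

```python
from math import comb, exp, factorial

import numpy as np


def poisson_pmf(k, mean):
    """P(Y = k) for Y ~ Poi(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def inar1_transition(a, b, alpha, lam):
    """P(X_t = a | X_{t-1} = b) for a Poisson INAR(1) process (2.1).

    Sums over the number j of survivors: Bin(b, alpha) survivors plus Poi(lam) immigrants.
    """
    return sum(comb(b, j) * alpha**j * (1 - alpha) ** (b - j) * poisson_pmf(a - j, lam)
               for j in range(min(a, b) + 1))


def c_chart_arl_inar1(alpha, lam, l, u):
    """Exact zero-state ARL (3.2) of a c chart with limits [l, u], stationary Poisson INAR(1)."""
    states = range(l, u + 1)  # in-control values of X_t
    # P[i, j] = P(X_t = states[j] | X_{t-1} = states[i]), restricted to in-control states
    P = np.array([[inar1_transition(a, b, alpha, lam) for a in states] for b in states])
    # r_b = expected number of further plotted points until the alarm, given X_t = b
    # without alarm; it solves r_b = 1 + sum_a P(a | b) r_a, i.e. (I - P) r = 1.
    r = np.linalg.solve(np.eye(len(states)) - P, np.ones(len(states)))
    # X_1 follows the stationary marginal Poi(lam / (1 - alpha)); L = 1 if X_1 is out of limits.
    pi = np.array([poisson_pmf(b, lam / (1 - alpha)) for b in states])
    return float(1.0 + pi @ r)


print(c_chart_arl_inar1(alpha=0.5, lam=1.0, l=0, u=5))
```

With $\alpha = 0$ the counts are i.i.d., and the result reduces to $1/P(X_t \notin [l; u])$, which gives a simple sanity check. As noted in Section 4, the same idea carries over to the CUSUM and EWMA charts, with the state enlarged to $(X_t, C_t^{\pm})$ or $(X_t, Q_t)$.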

To conclude this section, let us briefly look at the Phase-I application of control charts. We also consider the related topic of how estimated parameters affect the control charts’ performance.

To design control charts for use in Phase II, a model for the in-control behavior of the process is required; this model is then used for chart design as outlined before. The true in-control model is rarely known in practice. One therefore has to fit a model to a set of historical data that are believed to stem from the in-control model. Several issues have to be considered carefully in this context; see the recent survey by Jones-Farmer, Woodall, Steiner, and Champ (2014).

Among other things, once a data sample for Phase-I analysis is available, one has to check whether the data can be assumed to stem from a single model at all. For instance, the data might be contaminated by outliers. In that case, such outliers have to be excluded from the data before fitting the in-control model. Control charts (especially Shewhart charts) are often used for this task, which is known as the Phase-I application of control charts. Weiß and Testik (2015b) discuss in detail the concrete implementation of the Phase-I analysis for an underlying INAR(1) process. They also study how outliers left undetected during Phase I affect the resulting chart design and performance during Phase II.

Once the available data can be assumed to be “clean”, the parameters of the in-control model have to be estimated. The estimated in-control model is then used for chart design for Phase II. Many articles have considered the effect of estimated parameters on the charts’ performance in Phase II (see Jensen, Jones-Farmer, Champ, & Woodall, 2006 for references). In these studies, the properties of the estimators used and the sample size play an important role. For autocorrelated count data processes, this topic was considered by Weiß and Testik (2011), Zhang, Nie, He, and Hou (2014), and Weiß and Testik (2015b), for the Poisson INAR(1) model and diverse types of control charts.
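As a simple illustration of the parameter-estimation step, the sketch below fits a Poisson INAR(1) model by the method of moments. It uses $\rho(1) = \alpha$ and $\mu_X = \lambda/(1-\alpha)$ from (2.2). This estimator is an assumption chosen for brevity; the cited studies may use other estimators. For short or contaminated series, the estimate of $\alpha$ can also fall outside $(0; 1)$. The function name is hypothetical.

```python
import numpy as np


def fit_poisson_inar1_moments(x):
    """Method-of-moments estimates (alpha_hat, lambda_hat) for a Poisson INAR(1) model."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    alpha_hat = (d[1:] * d[:-1]).sum() / (d * d).sum()  # lag-1 sample autocorrelation
    lambda_hat = x.mean() * (1.0 - alpha_hat)           # from mu_X = lambda / (1 - alpha)
    return alpha_hat, lambda_hat


print(fit_poisson_inar1_moments([1, 2, 3, 2, 1]))  # approx. (0.0571, 1.6971)
```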

3.2. Process capability indices

Saying that a process is in control only implies that it is stationary and follows a specified model (see above). It does not imply that the output of the process meets the given quality requirements. For the latter, one has to check, for instance, to what extent the process meets the given target values and specification limits. If the process is not consistent with the given external specifications, adjustments are necessary so that the new in-control model agrees better with the quality requirements. Process capability indices are a popular tool for evaluating the actual process capability. An introduction to such indices (especially for the variables data case) can be found in the book by Montgomery (2009). The most recent literature survey seems to be the one by Saha and Maiti (2015).

Only a few of the works about capability indices refer to attributes data processes. Perakis and Xekalaki (2005) picked up the idea of considering the actual “proportion of conformance”. Suppose the upper specification limit USL describes, e.g., the maximal acceptable number of nonconformities per produced item. Then the probability $P(X > \text{USL})$ is compared to a prespecified acceptable probability level $1 - p_0$. Perakis and Xekalaki (2005) considered an index defined by the quotient

$$C_{PX} := \frac{1 - p_0}{P(X > \text{USL})} \in [1 - p_0; \infty). \tag{3.5}$$

Borges and Ho (2001) proposed a related approach, designed for the specific level $1 - p_0 = 0.0027$:

$$C_{BH} := \frac{1}{3} \cdot \Phi^{-1}\Big(1 - \frac{1}{2} \cdot P(X > \text{USL})\Big) \in [0; \infty), \tag{3.6}$$

where $\Phi$ denotes the distribution function of the standard normal distribution $N(0, 1)$.
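Both indices depend on the data only through $P(X > \text{USL})$. Consider Poisson counts, for instance the observations of a Poisson INAR(1) process, whose marginal is $\text{Poi}(\lambda/(1-\alpha))$. For these, a sketch using Python's standard library could look as follows. The function names and the choice $\text{USL} = 6$ are illustrative assumptions. For $P(X > \text{USL}) = 0.0027$ we get $C_{BH} \approx 1$, since $\Phi^{-1}(1 - 0.00135) \approx 3$.

```python
from math import exp, factorial, inf
from statistics import NormalDist


def poisson_exceedance(usl, mean):
    """P(X > USL) for X ~ Poi(mean)."""
    return 1.0 - sum(exp(-mean) * mean**k / factorial(k) for k in range(usl + 1))


def c_px(p_exceed, p0):
    """Index (3.5): acceptable level 1 - p0 divided by P(X > USL)."""
    return inf if p_exceed <= 0 else (1.0 - p0) / p_exceed


def c_bh(p_exceed):
    """Index (3.6): one third of the standard normal quantile at 1 - P(X > USL) / 2."""
    return inf if p_exceed <= 0 else NormalDist().inv_cdf(1.0 - 0.5 * p_exceed) / 3.0


# Observations of a Poisson INAR(1) process with lambda = 1, alpha = 0.5 are Poi(2).
p = poisson_exceedance(6, mean=1.0 / (1 - 0.5))
print(p, c_px(p, p0=1 - 0.0027), c_bh(p))
```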

In practice, a relevant question is how to estimate the indices (3.5) and (3.6) from given in-control data (in analogy to the Phase-I analysis discussed before). Perakis and Xekalaki (2005) considered this task for an underlying i.i.d. process of Poisson counts. Weiß (2012b) extended this work to an underlying Poisson INAR(1) process (2.1), distinguishing between the process capability for the observations and for the innovations of such an INAR(1) process.

4. Advanced control charts

The basic c chart presented in Section 3.1 allows continuous monitoring of a serially dependent count data process. However, the statistic plotted at time $t$ is simply the count observed at time $t$. It therefore does not contain information about past values of the process, at least not explicitly beyond the mere effect of autocorrelation. As a result, the c chart (like any other Shewhart-type chart) is not particularly sensitive to small or moderate changes in the process. For this reason, several types of advanced control charts have been proposed. Their statistic plotted at time $t$ also uses past observations, and hence accumulates information about the process over a longer period of time.

4.1. CUSUM charts

The traditional cumulative sum (CUSUM) control chart (Page, 1954), applied directly to the observations $X_t$ of the process, is perhaps the most natural advanced candidate for monitoring autocorrelated processes of counts. This is because it preserves the discrete nature of the process by using only additions, not multiplications. Initialized by a starting value $c_0^+ \geq 0$, the upper-sided CUSUM is defined by

$$C_0^+ = c_0^+, \qquad C_t^+ = \max(0;\, X_t - k^+ + C_{t-1}^+) \quad \text{for } t = 1, 2, \dots \tag{4.1}$$

The starting value is commonly chosen as $c_0^+ = 0$. A value $c_0^+ > 0$ is referred to as a fast initial response (FIR) feature, and it may help to detect an initial out-of-control state more quickly. If $k^+$ and $c_0^+$ are integers, then $(C_t^+)_{t \in \mathbb{N}_0}$ is also integer-valued. As another example, if $k^+, c_0^+ \in \{0, 1/2, 1, 3/2, \dots\}$, then so is $C_t^+$. An alarm is triggered if $C_t^+$ violates the control limit $h^+$ (decision interval).

The upper-sided CUSUM is mainly designed to detect increases in the process mean. The lower-sided CUSUM, defined by

$$C_0^- = c_0^-, \qquad C_t^- = \max(0;\, k^- - X_t + C_{t-1}^-) \quad \text{for } t = 1, 2, \dots, \tag{4.2}$$

aims at uncovering decreases in the mean. If $C_t^+$ and $C_t^-$ are monitored simultaneously, this chart combination is referred to as a two-sided CUSUM chart. An excellent book with a lot of background information about CUSUM charts is the one by Hawkins and Olwell (1998).
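Here is a minimal sketch of the two-sided CUSUM (4.1) and (4.2) on raw counts, in plain Python; the function name is hypothetical. The text only says that an alarm is triggered when a statistic violates its limit. We assume this means $C_t^+ > h^+$ or $C_t^- > h^-$. Setting one of the limits to infinity gives the corresponding one-sided chart.

```python
def cusum_run_length(counts, k_plus, h_plus, k_minus, h_minus, c0_plus=0, c0_minus=0):
    """Two-sided CUSUM (4.1)-(4.2); returns (run length or None, C+ path, C- path).

    Assumption: an alarm occurs as soon as C_t^+ > h_plus or C_t^- > h_minus.
    """
    c_plus, c_minus = c0_plus, c0_minus  # c0 > 0 gives the FIR feature
    path_plus, path_minus = [], []
    for t, x in enumerate(counts, start=1):
        c_plus = max(0, x - k_plus + c_plus)     # accumulates evidence of an increase
        c_minus = max(0, k_minus - x + c_minus)  # accumulates evidence of a decrease
        path_plus.append(c_plus)
        path_minus.append(c_minus)
        if c_plus > h_plus or c_minus > h_minus:
            return t, path_plus, path_minus
    return None, path_plus, path_minus


# Upper-sided chart only: C+ runs 1, 3, 6 and signals at t = 3.
print(cusum_run_length([3, 4, 5, 1], k_plus=2, h_plus=5, k_minus=1, h_minus=float("inf")))
```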

In the context of monitoring autocorrelated count processes, the upper-sided CUSUM was applied:

- to INAR(1) processes (2.1) by Weiß and Testik (2009, 2011);
- to NGINAR(1) processes (2.4) by Li et al. (in press);
- to INARCH(1) processes (2.5) by Weiß and Testik (2012).

The lower-sided and two-sided versions were applied to INAR(1) processes by Yontay, Weiß, Testik, and Bayindir (2013). For performance evaluation, it is important that the CUSUM preserves the discrete range. This makes exact run length computations possible with a type of MC approach (Weiß, 2011b). The one-sided CUSUM requires considering the bivariate Markov chain $(X_t, C_t^{\pm})$ (Weiß & Testik, 2009). The two-sided CUSUM requires the trivariate Markov chain $(X_t, C_t^+, C_t^-)$ (Yontay et al., 2013).

Besides the basic CUSUM approach (4.1), several variants have been considered:

- INAR(1) CUSUM charts with additional Winsorization (Hawkins, 1993; Weiß & Testik, 2011);
- CUSUM charts for diverse types of residuals from an INAR(1) process (Weiß & Testik, 2015a);
- CUSUM charts based on the likelihood ratio of an INARCH(1) process (Weiß & Testik, 2012).

4.2. EWMA charts

Another advanced approach to process monitoring, which is also very popular in applications, is the exponentially weighted moving average (EWMA) control chart, dating back to Roberts (1959). The standard EWMA recursion defined by

$$Z_t = \lambda \cdot X_t + (1-\lambda) \cdot Z_{t-1} \quad \text{for } t = 1, 2, \dots, \quad \text{with } \lambda \in (0; 1], \tag{4.3}$$

however, has an important drawback compared to the CUSUM approach of Section 4.1 when applied to count data processes: it does not preserve the discrete range. On the contrary, the range of possible values of $Z_t$ changes in time. Among other things, this rules out an exact ARL computation by the Markov chain approach (recall Section 3.1). Therefore, Gan (1990) suggested plotting rounded values of the statistic (4.3):

$$Q_t = \text{round}\big(\lambda \cdot X_t + (1-\lambda) \cdot Q_{t-1}\big) \quad \text{for } t = 1, 2, \dots, \quad \text{with } \lambda \in (0; 1], \tag{4.4}$$

which is initialized by $Q_0 := q_0 \in \mathbb{N}_0$. For example, $q_0$ might be chosen as the rounded value of the in-control mean. An alarm is triggered if $Q_t$ violates one of the control limits $0 \leq l \leq u$. Note that the statistics $Q_t$ can take only integer values from $\mathbb{N}_0$.
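To see the difference between (4.3) and (4.4), the sketch below runs both recursions on the same counts using exact `Fraction` arithmetic. The text does not say how ties are rounded or how $Z_0$ is initialized. We assume rounding half up and $Z_0 = q_0$; the function names are hypothetical. With $\lambda = 0.3$, $q_0 = 2$ and counts $3, 0, 4$, the plain EWMA takes the non-integer values $2.3, 1.61, 2.327$. The rounded EWMA, in contrast, stays in $\mathbb{N}_0$.

```python
from fractions import Fraction
from math import floor


def round_half_up(x):
    # Assumption: ties are rounded upwards (the text does not fix a tie-breaking rule).
    return floor(x + Fraction(1, 2))


def ewma_paths(counts, lam, q0):
    """Plain EWMA (4.3) and rounded EWMA (4.4), both started at q0 (Z_0 = Q_0 = q0)."""
    z, q = Fraction(q0), q0
    z_path, q_path = [], []
    for x in counts:
        z = lam * x + (1 - lam) * z                  # range of Z_t changes over time
        q = round_half_up(lam * x + (1 - lam) * q)  # Q_t stays integer-valued
        z_path.append(z)
        q_path.append(q)
    return z_path, q_path


z_path, q_path = ewma_paths([3, 0, 4], lam=Fraction(3, 10), q0=2)
print([float(z) for z in z_path])  # [2.3, 1.61, 2.327]
print(q_path)                      # [2, 1, 2]
```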

If the underlying count data process $(X_t)_{t \in \mathbb{N}}$ is a Markov chain, then $(X_t, Q_t)_{t \in \mathbb{N}}$ is a bivariate Markov chain with range $\mathbb{N}_0^2$. ARLs can therefore again be computed exactly by adapting the MC approach (see Weiß, 2009b for details). That article, as well as Zhang et al. (2014), considered the particular case of an underlying INAR(1) process (2.1). Li et al. (in press) investigated the EWMA approach (4.4) applied to an NGINAR(1) process (2.4).

Weiß (2011a) pointed out a possible disadvantage of the rounded EWMA approach (4.4). Small values of $\lambda$ are generally recommended if small mean shifts are to be detected. Especially for such values, one may observe a kind of “oversmoothing”: $Q_t$ becomes piecewise constant in time $t$ and rather insensitive to process changes. Therefore, Weiß (2011a) proposed a modification of (4.4) with a refined rounding operation. For $s \in \mathbb{N}$, the operation s-round maps $x$ onto the nearest fraction with denominator $s$. For $s = 1$, we obtain the usual rounding operation, while 2-round, for example, rounds onto values in $\{0, 1/2, 1, 3/2, \dots\}$. The resulting s-EWMA chart follows the recursion

$$Q_t^{(s)} = s\text{-round}\big(\lambda \cdot X_t + (1-\lambda) \cdot Q_{t-1}^{(s)}\big) \quad \text{for } t = 1, 2, \dots, \quad \text{with } \lambda \in (0; 1]. \tag{4.5}$$
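The oversmoothing effect and its remedy can be reproduced in a few lines. This sketch uses plain Python with exact `Fraction` arithmetic and hypothetical function names; rounding ties upwards is our assumption. We take $\lambda = 0.1$ and $q_0 = 2$. For $s = 1$, the statistic stays at 2 even though three counts of 6 arrive, because $0.1 \cdot 6 + 0.9 \cdot 2 = 2.4$ rounds back to 2. The 2-round version, by contrast, moves to $5/2, 3, 7/2$.

```python
from fractions import Fraction
from math import floor


def s_round(x, s):
    """Nearest fraction with denominator s (ties rounded upwards, an assumption)."""
    return Fraction(floor(s * x + Fraction(1, 2)), s)


def s_ewma(counts, lam, q0, s):
    """s-EWMA recursion (4.5); s = 1 gives the rounded EWMA (4.4)."""
    q = Fraction(q0)
    path = []
    for x in counts:
        q = s_round(lam * x + (1 - lam) * q, s)
        path.append(q)
    return path


lam = Fraction(1, 10)  # small smoothing constant
print(s_ewma([6, 6, 6], lam, q0=2, s=1))  # stays at 2 despite X_t = 6
print(s_ewma([6, 6, 6], lam, q0=2, s=2))  # 5/2, 3, 7/2: reacts to the increase
```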

If $(X_t)_{t \in \mathbb{N}}$ is a Markov chain (Weiß, 2011a considered the instance of an INAR(1) process (2.1)), then $(X_t, Q_t^{(s)})_{t \in \mathbb{N}}$ is again a discrete Markov chain. Its range is now $\mathbb{N}_0 \times \mathbb{Q}_{0,s}^+$, where $\mathbb{Q}_{0,s}^+ := \{r/s \mid r \in \mathbb{N}_0\}$ is the set of all non-negative rationals with denominator $s$. So again, the MC approach by Brook and Evans (1972) can be adapted for an exact ARL computation.

4.3. Jumps chart

The last type of advanced control chart presented here is the jumps chart proposed by Weiß (2009c). It considers the “jumps” $J_t := X_t - X_{t-1}$ (Weiß, 2008b). These are particularly sensitive to a reduction of autocorrelation, since such a reduction leads to larger jumps. To monitor changes in the mean and in the autocorrelation structure simultaneously, Weiß (2009c) proposed the combined jumps chart. Here, the counts $X_t$ are plotted on a c chart with limits $0 \leq l < u$, and simultaneously the jumps $J_t$ are plotted on a jumps chart with limits $\mp k$.

Suppose $(X_t)_{t \in \mathbb{N}}$ is a Markov chain. Weiß (2009c) considered the instance of an INAR(1) process (2.1), and Li et al. (in press) that of an NGINAR(1) process (2.4). Then $(X_t, J_t)_{t \in \mathbb{N}}$ is a discrete Markov chain with range $\mathbb{N}_0 \times \mathbb{Z}$, so ARLs can be computed exactly by adapting the MC approach.
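The combined jumps chart reduces to two interval checks per time point. Computing $J_1$ requires $X_0$. In the minimal sketch below, the last count before monitoring is therefore passed in as `x0`; this is our assumption about how monitoring starts. The function name is hypothetical.

```python
def combined_jumps_chart_run_length(counts, x0, l, u, k):
    """First t at which the c chart (X_t outside [l, u]) or the jumps chart
    (J_t = X_t - X_{t-1} outside [-k, k]) signals; None if neither signals."""
    prev = x0
    for t, x in enumerate(counts, start=1):
        jump = x - prev
        if not l <= x <= u or not -k <= jump <= k:
            return t
        prev = x
    return None


# Jumps are 0, 1, 4, -6: the count 1 is within [0, 8], but the jump -6 signals at t = 4.
print(combined_jumps_chart_run_length([2, 3, 7, 1], x0=2, l=0, u=8, k=4))  # 4
```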

5. Conclusions

After having been neglected for a long time, SPC methods for time-dependent processes of counts have attracted rapidly increasing research interest during the last few years. The present article provides a comprehensive survey of recent developments in this field, together with a list of relevant references that is as complete as possible.

We conclude this article by briefly discussing possible directions for future research in the area of SPC methods for autocorrelated attributes data. Up to now, mainly “well-behaved” types of count data processes have been considered, especially those having a Poisson marginal distribution. But in view of real count processes as observed, e.g. in epidemiology, future research should also consider phenomena like an excessive number of zeros (zero inflation) or seasonality (the latter leading to a non-stationary but still “regular” in-control behavior). Count data processes having a finite range $\{0, \dots, n\}$, with a fixed upper limit $n$ reflecting, e.g. the sample size in manufacturing industry or the number of service entities in service industry, would also be very relevant in practice, but have received only limited attention so far (Weiß, 2009a; Weiß & Kim, 2013; Weiß & Testik, 2015b). The same applies to multivariate count data processes (see Bersimis, Psarakis, & Panaretos, 2007, p. 523). Even more sobering, it seems that serially dependent processes with the full set of integers $\mathbb{Z} = \{\dots, -1, 0, 1, \dots\}$ as their range (Kim & Park, 2008) have not been discussed at all in an SPC context so far. It is also worth emphasizing that INAR(1) processes are related to certain queue-length processes with an infinite number of servers (see Schweer & Wichelhaus, 2015), so control charts for queueing systems as in Chen and Zhou (2015) and the control charts for autocorrelated counts described in this article might be mutually enriching.

Besides other types of process models, different approaches for process monitoring and chart design should also be considered in future work. These may cover adaptive sampling procedures (e.g. variable sampling intervals) as discussed, for instance, in Epprecht, Costa, and Mendes (2003) and Montgomery (2009), or the additional use of runs rules as, e.g. in Alwan, Champ, and Maragah (1994), Acosta-Mejia (1999), and Koutras, Bersimis, and Maravelakis (2007). Related to the latter approach, the so-called synthetic control charts have attracted considerable research interest in recent years, but have recently also drawn well-founded criticism (Knoth, in press). Concerning chart design, it might be interesting to apply economic design principles (Celano, 2011; Montgomery, 2009) in the context of autocorrelated counts; the Phase-I analysis for such processes (choice of estimators, effect of parameter estimation, etc.) also deserves more attention.

Finally, much more research effort should be put into other types of discrete-valued and serially dependent processes, especially categorical processes (both ordinal and nominal). There are some works for the special case of serially dependent binary attributes, e.g. the Markov Binary CUSUM chart for continuously monitoring a Markov-dependent Bernoulli process as proposed by Mousavi and Reynolds (2009), or the Markov Binomial EWMA chart for monitoring segments taken from such a Markovian Bernoulli process (see Weiß, 2009d). But if product or service quality, for instance, is classified into more than two categories, then methods for monitoring non-binary but serially dependent categorical processes are required. See Weiß (2012a) and the references therein for a few first approaches in this direction; a comprehensive treatment of this area is still pending.

Acknowledgements

The author thanks the two referees for carefully reading the manuscript and for their valuable comments, which greatly improved the article.

Additional information

Funding

The author received no direct funding for this research.

Notes on contributors

Christian H. Weiß

Christian H. Weiß is a Professor at the Department of Mathematics and Statistics at the Helmut Schmidt University in Hamburg, Germany (since 2013). He earned his doctorate (Mathematical Statistics) in 2009 at the University of Würzburg, and from 2009 to 2013 he had a permanent post as an “Akademischer Rat” at the Department of Mathematics at Darmstadt University of Technology. His research areas include time series analysis, statistical quality control, and computational statistics.

Notes

1 While the control limits are chosen according to stochastic properties of the monitored process, see below, the specification limits are determined according to the usability of the produced items: if the considered quality characteristic of a produced item violates the specification limits, it has to be classified as being defective.

References

Acosta-Mejia, C. A. (1999). Improved p charts to monitor process quality. IIE Transactions, 31, 509–516.
Al-Osh, M. A., & Alzaid, A. A. (1987). First-order integer-valued autoregressive (INAR(1)) process. Journal of Time Series Analysis, 8, 261–275.
Alwan, L. C. (1992). Effects of autocorrelation on control chart performance. Communications in Statistics - Theory and Methods, 21, 1025–1049.
Alwan, L. C. (1995). The problem of misplaced control limits. Journal of the Royal Statistical Society C, 44, 269–278.
Alwan, L. C., Champ, C. W., & Maragah, H. D. (1994). Study of average run lengths for supplementary runs rules in the presence of autocorrelation. Communications in Statistics - Simulation and Computation, 23, 373–391.
Alwan, L. C., & Roberts, H. V. (1988). Time series modeling for statistical process control. Journal of Business & Economic Statistics, 6, 87–95.
Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate statistical process control charts: an overview. Quality and Reliability Engineering International, 23, 517–543.
Borges, W., & Ho, L. L. (2001). A fraction defective based capability index. Quality and Reliability Engineering International, 17, 447–458.
Brook, D., & Evans, D. A. (1972). An approach to the probability distribution of CUSUM run length. Biometrika, 59, 539–549.
Celano, G. (2011). On the constrained economic design of control charts: A literature review. Produção, 21, 223–234.
Chen, N., & Zhou, S. (2015). CUSUM statistical monitoring of M/M/1 queues and extensions. Technometrics, 57, 245–256.
Davoodi, M., Niaki, S. T. A., & Torkamani, E. A. (2015). A maximum likelihood approach to estimate the change point of multistage Poisson count processes. International Journal of Advanced Manufacturing Technology, 77, 1443–1464.
Du, J.-G., & Li, Y. (1991). The integer-valued autoregressive (INAR(p)) model. Journal of Time Series Analysis, 12, 129–142.
Epprecht, E. K., Costa, A. F. B., & Mendes, F. C. T. (2003). Adaptive control charts for attributes. IIE Transactions, 35, 567–582.
Ferland, R., Latour, A., & Oraichi, D. (2006). Integer-valued GARCH processes. Journal of Time Series Analysis, 27, 923–942.
Franke, J., Kirch, C., & Kamgaing, J. T. (2012). Changepoints in times series of counts. Journal of Time Series Analysis, 33, 757–770.
Gan, F. F. (1990). Monitoring Poisson observations using modified exponentially weighted moving average control charts. Communications in Statistics - Simulation and Computation, 19, 103–124.
Hawkins, D. M. (1993). Robustification of cumulative sum charts by Winsorization. Journal of Quality Technology, 25, 248–261.
Hawkins, D. M., & Olwell, D. H. (1998). Cumulative sum charts and charting for quality improvement. New York, NY: Springer-Verlag.
Heinen, A. (2003). Modelling time series count data: An autoregressive conditional Poisson model (CORE Discussion Paper No. 2003-63). Belgium: University of Louvain.
Hudecová, Š., Hušková, M., & Meintanis, S. (2015). Detection of changes in INAR models. In A. Steland, E. Rafajłowicz, & K. Szajowski (Eds.), Stochastic models, statistics and their applications, Springer proceedings in mathematics & statistics (Vol. 122, pp. 11–18). Springer.
Jazi, M. A., Jones, G., & Lai, C.-D. (2012). First-order integer valued AR processes with zero inflated Poisson innovations. Journal of Time Series Analysis, 33, 954–963.
Jensen, W. A., Jones-Farmer, L. A., Champ, C. W., & Woodall, W. H. (2006). Effects of parameter estimation on control chart properties: a literature review. Journal of Quality Technology, 32, 395–409.
Johnson, N. L., Kemp, A. W., & Kotz, S. (2005). Univariate discrete distributions (3rd ed.). Hoboken, NJ: Wiley.
Jones-Farmer, L. A., Woodall, W. H., Steiner, S. H., & Champ, C. W. (2014). An overview of phase I analysis for process improvement and monitoring. Journal of Quality Technology, 46, 265–280.
Kang, J., & Lee, S. (2014). Parameter change test for Poisson autoregressive models. Scandinavian Journal of Statistics, 41, 1136–1152.
Kang, J., & Song, J. (2015). Robust parameter change test for Poisson autoregressive models. Statistics and Probability Letters, 104, 14–21.
Kedem, B., & Fokianos, K. (2002). Regression models for time series analysis. Hoboken, NJ: Wiley.
Kenett, R. S., & Pollak, M. (2012). On assessing the performance of sequential procedures for detecting a change. Quality and Reliability Engineering International, 28, 500–507.
Kim, H.-Y., & Park, Y. (2008). A non-stationary integer-valued autoregressive model. Statistical Papers, 49, 485–502.
Kirch, C., & Kamgaing, J. T. (2015). On the use of estimating functions in monitoring time series for change points. Journal of Statistical Planning and Inference, 161, 25–49.
Knoth, S. (2006). The art of evaluating monitoring schemes: How to measure the performance of control charts? In H.-J. Lenz & P.-T. Wilrich (Eds.), Frontiers in statistical quality control 8 (pp. 74–99). Heidelberg: Physica Verlag.
Knoth, S. (in press). The case against the use of synthetic control charts. Journal of Quality Technology.
Knoth, S., & Schmid, W. (2004). Control charts for time series: A review. In H. J. Lenz & P. T. Wilrich (Eds.), Frontiers in statistical quality control 7 (pp. 210–236). Heidelberg: Physica-Verlag.
Koutras, M. V., Bersimis, S., & Maravelakis, P. E. (2007). Statistical process control using Shewhart control charts with supplementary runs rules. Methodology and Computing in Applied Probability, 9, 207–224.
Li, C., Wang, D., & Zhu, F. (in press). Effective control charts for monitoring the NGINAR(1) process. Quality and Reliability Engineering International.
McKenzie, E. (1985). Some simple models for discrete variate time series. Water Resources Bulletin, 21, 645–650.
Montgomery, D. C. (2009). Introduction to statistical quality control (6th ed.). New York, NY: Wiley.
Morais, M. C., & Pacheco, A. (in press). On hitting times for Markov time series of counts with applications to quality control. RevStat.
Mousavi, S., & Reynolds, M. R., Jr. (2009). A CUSUM chart for monitoring a proportion with autocorrelated binary observations. Journal of Quality Technology, 41, 401–414.
Page, E. (1954). Continuous inspection schemes. Biometrika, 41, 100–115.
Perakis, M., & Xekalaki, E. (2005). A process capability index for discrete processes. Journal of Statistical Computation and Simulation, 75, 175–187.
Psarakis, S., & Papaleonida, G. E. A. (2007). SPC procedures for monitoring autocorrelated processes. Quality Technology & Quantitative Management, 4, 501–540.
Ristić, M. M., Bakouch, H. S., & Nastić, A. S. (2009). A new geometric first-order integer-valued autoregressive (NGINAR(1)) process. Journal of Statistical Planning and Inference, 139, 2218–2226.
Roberts, S. W. (1959). Control chart tests based on geometric moving averages. Technometrics, 1, 239–250.
Saha, M., & Maiti, S. S. (2015). Trends and practices in process capability studies. arXiv:1503.06885v1 [stat.AP].
Schweer, S., & Weiß, C. H. (2014). Compound Poisson INAR(1) processes: Stochastic properties and testing for overdispersion. Computational Statistics & Data Analysis, 77, 267–284.
Schweer, S., & Wichelhaus, C. (2015). Queueing systems of INAR(1) processes with compound Poisson arrivals. Stochastic Models, 31, 618–635.
Steutel, F. W., & van Harn, K. (1979). Discrete analogues of self-decomposability and stability. Annals of Probability, 7, 893–899.
Torkamani, E. A., Niaki, S. T. A., Aminnayeri, M., & Davoodi, M. (2014). Estimating the change point of correlated Poisson count processes. Quality Engineering, 26, 182–195.
Weiß, C. H. (2007). Controlling correlated processes of Poisson counts. Quality and Reliability Engineering International, 23, 741–754.
Weiß, C. H. (2008a). Thinning operations for modelling time series of counts: A survey. Advances in Statistical Analysis, 92, 319–341.
Weiß, C. H. (2008b). Serial dependence and regression of Poisson INARMA models. Journal of Statistical Planning and Inference, 138, 2975–2990.
Weiß, C. H. (2009a). Monitoring correlated processes with binomial marginals. Journal of Applied Statistics, 36, 399–414.
Weiß, C. H. (2009b). EWMA monitoring of correlated processes of Poisson counts. Quality Technology & Quantitative Management, 6, 137–153.
Weiß, C. H. (2009c). Controlling jumps in correlated processes of Poisson counts. Applied Stochastic Models in Business and Industry, 25, 551–564.
Weiß, C. H. (2009d). Group inspection of dependent binary processes. Quality and Reliability Engineering International, 25, 151–165.
Weiß, C. H. (2011a). Detecting mean increases in Poisson INAR(1) processes with EWMA control charts. Journal of Applied Statistics, 38, 383–398.
Weiß, C. H. (2011b). The Markov chain approach for performance evaluation of control charts: A tutorial. In S. P. Werther (Ed.), Process control: Problems, techniques and applications (pp. 205–228). New York, NY: Nova Science.
Weiß, C. H. (2012a). Continuously monitoring categorical processes. Quality Technology & Quantitative Management, 9, 171–188.
Weiß, C. H. (2012b). Process capability analysis for serially dependent processes of Poisson counts. Journal of Statistical Computation and Simulation, 82, 383–404.
Weiß, C. H., & Kim, H.-Y. (2013). Parameter estimation for binomial AR(1) models with applications in finance and industry. Statistical Papers, 54, 563–590.
Weiß, C. H., & Testik, M. C. (2009). CUSUM monitoring of first-order integer-valued autoregressive processes of Poisson counts. Journal of Quality Technology, 41, 389–400.
Weiß, C. H., & Testik, M. C. (2011). The Poisson INAR(1) CUSUM chart under overdispersion and estimation error. IIE Transactions, 43, 805–818.
Weiß, C. H., & Testik, M. C. (2012). Detection of abrupt changes in count data time series: Cumulative sum derivations for INARCH(1) models. Journal of Quality Technology, 44, 249–264.
Weiß, C. H., & Testik, M. C. (2015a). Residuals-based CUSUM charts for Poisson INAR(1) processes. Journal of Quality Technology, 47, 30–42.
Weiß, C. H., & Testik, M. C. (2015b). On the phase I analysis for monitoring time-dependent count processes. IIE Transactions, 47, 294–306.
Woodall, W. H. (1997). Control charts based on attribute data: Bibliography and review. Journal of Quality Technology, 29, 172–183.
Woodall, W. H. (2000). Controversies and contradictions in statistical process control. Journal of Quality Technology, 32, 341–350.
Woodall, W. H., & Montgomery, D. C. (2014). Some current directions in the theory and application of statistical process monitoring. Journal of Quality Technology, 46, 78–94.
Yontay, P., Weiß, C. H., Testik, M. C., & Bayindir, Z. P. (2013). A two-sided CUSUM chart for first-order integer-valued autoregressive processes of Poisson counts. Quality and Reliability Engineering International, 29, 33–42.
Zhang, M., Nie, G., He, Z., & Hou, X. (2014). The Poisson INAR(1) one-sided EWMA chart with estimated parameters. International Journal of Production Research, 52, 5415–5431.
Zucchini, W., & MacDonald, I. L. (2009). Hidden Markov models for time series: An introduction using R. London: Chapman & Hall/CRC.

SPC methods for time-dependent processes of counts—A literature review
