Research Article

Developing Machine Learning-based Control Charts for Monitoring Different GLM-type Profiles With Different Link Functions

Article: 2362511 | Received 27 Oct 2023, Accepted 24 May 2024, Published online: 05 Jun 2024

ABSTRACT

In certain situations, the quality of a process is determined by dependent variables in relation to independent variables, often modeled through a regression framework referred to as a profile. The practice of monitoring and preserving this relationship is known as profile monitoring. In this paper, we propose an innovative approach that uses different machine-learning (ML) techniques to construct control charts and monitor generalized linear model (GLM) profiles with three GLM-type response distributions, namely Binomial, Poisson, and Gamma, examining different link functions for each response distribution. Through our simulation study, we undertake a comparative analysis of different training methods. We measure the charts' performance using the average run length, which is the average number of samples taken before observing a data point that exceeds the predefined control limits. The results show that the choice of ML control chart is contingent on the response distribution and link function, and depends on the shift sizes in the process and the training method used. To illustrate the practical application of the proposed ML control charts, we present two real-world cases, a drug–response study and a volcano–eruption study, and demonstrate how each ML chart can be implemented in practice.

Introduction

Statistical quality control (SQC) is the utilization of statistical tools and techniques to control a product's or service's quality and is used across a range of industries. It can be divided into three main categories: statistical process control (SPC), design of experiments (DOE), and acceptance sampling.

SPC is a quality control method used to monitor and control a process to ensure that it operates within a specified range of variation. The goal of SPC is to detect and correct process variations before they result in defective products or services. By monitoring and controlling processes, organizations can improve quality, reduce waste, and increase efficiency. More details about SQC can be found in Montgomery (Citation2012).

The main tool in SPC is a control chart. In 1924, the first control chart was introduced by Dr. Walter A. Shewhart. At the same time, the concept of statistical process monitoring started developing.

A special case of SPC is profile monitoring, in which we monitor a relationship over time instead of monitoring individual quality characteristics. Profile monitoring is used to detect changes or shifts in the relationship between the key variables over time, allowing proactive action to be taken to address any issues or identify opportunities for improvement. A profile can be characterized as the relationship between one or more response variables and one or more explanatory variables. This functional relationship takes the form of a regression model. If the regression has one explanatory variable, it is called a simple profile; otherwise, it is called a multiple profile. The focus of this paper is solely on simple profiles. Profile monitoring using control charts is divided into two phases: Phase I and Phase II. In Phase I, a set of historical data points is available, and the main goal is to evaluate the process stability and estimate the process parameters from the in-control samples. In most studies, as well as in ours, it is assumed that the parameters have been estimated beforehand, and the study is performed in Phase II to conduct online monitoring and detect shifts.

Kang and Albin (Citation2000) introduced simple linear profiles with two main applications in semiconductor and food manufacturing. They used the memoryless Hotelling's T2 and the memory-type exponentially weighted moving average (EWMA) control charts. A significant development in monitoring simple linear profiles was made by Kim, Mahmoud, and Woodall (Citation2003). Other notable work was proposed by Zou, Zhang, and Wang (Citation2006), who used a control chart based on a change-point model to monitor linear profiles. More information about different profile types and profile monitoring schemes can be found in Noorossana, Saghaei, and Amiri (Citation2011).

Simple linear profiles are generally used under the assumptions that the response variable is quantitative and normally distributed (Amiri et al. Citation2010; Soleimani, Noorossana, and Amiri Citation2009). However, these assumptions can be violated in some applications, and generalized linear models (GLMs) can instead be used to describe the profiles. A range of distributions and different link functions can be used in GLMs. Koosha and Amiri (Citation2011) investigated the effect of link functions on monitoring logistic regression profiles using the Hotelling's T2 control chart. Other than the Binomial distribution, the most commonly used distributions are the Gamma and Poisson distributions. Amiri, Koosha, and Azhdari (Citation2012) used Hotelling's T2-based methods for monitoring Gamma profiles, while Sharafi, Aminnayeri, and Amiri (Citation2013) used the maximum likelihood estimator approach for estimating the time of step changes in Poisson regression profiles. Koosha and Amiri (Citation2013) considered a generalized linear mixed model for monitoring autocorrelated logistic regression profiles. Shadman et al. (Citation2014) used a change-point approach based on the Rao score test for monitoring GLM profiles. Qi et al. (Citation2016) used weighted likelihood ratio charts for monitoring GLM profiles. Amiri, Sogandi, and Ayoubi (Citation2018) developed EWMA-based methods for the simultaneous monitoring of multivariate linear and GLM regression profiles.

One strategy would be using machine learning (ML) techniques for process or profile monitoring for different purposes such as dimension reduction, detection, identification, pattern recognition, and diagnosis. The most common ML techniques used for process monitoring are Support Vector Machines (SVM), Random Forests (RF), and Neural Networks (NN) (Apsemidis, Psarakis, and Moguerza Citation2020; Escobar and Morales-Mendez Citation2018; Mohd Amiruddin et al. Citation2020; Yeganeh et al. Citation2022; Yeganeh, Pourpanah, and Shadman Citation2021). In recent years, researchers have combined ML techniques with process monitoring and have used them to construct control charts (Mohammadzadeh, Yeganeh, and Shadman Citation2021; Sabahno and Amiri Citation2023; Sabahno and Niaki Citation2023; Yeganeh et al. Citation2023; Yeganeh, Chukhrova, et al. Citation2023; Yeganeh, Johannssen, et al. Citation2023; Yeganeh, Johannssen, Chukhrova, et al. Citation2023; Yeganeh et al. Citation2024). These studies have shown superior performances of the ML-based control charts over the statistics-based control charts.

Sabahno and Amiri (Citation2023) developed statistical and ML control charts for monitoring GLM profiles. However, they only investigated the Binomially distributed response with the logit link function. In this paper, for the first time in the literature, we investigate different ML techniques, namely SVM, RF, and NN, to build control charts for monitoring different GLM profiles with different link functions. To model profiles, we consider three main distributions, Binomial, Gamma, and Poisson, each with three corresponding link functions detailed in Section GLM Profiles. We simulate data with fixed parameters and try a variety of training methods to obtain the best ML structures and achieve the desired performance. The performance is evaluated in terms of the average run length (ARL) for different shift sizes and types. The run length is the number of samples (subgroups) taken before the chart signals a point beyond the control limit. The main purpose of this paper is to use a range of ML techniques to build control charts and to investigate different response distributions, link functions, and training methods for the ML structures, to see which combination results in better performance (smaller out-of-control ARL values).

This paper is structured as follows. In Section GLM Profiles, we introduce the main theory connected with GLM profiles, explain the differences between profiles for different distributions used in this paper, and briefly describe the theory connected with the link functions. Section Machine Learning Techniques follows with a description of ML techniques. Following the introduction of ML techniques, ML control charts are described in Section Control Charts. In Section Simulation Studies, extensive numerical analyses to evaluate the ML control charts with different distributions, link functions, training methods, and shift sizes and types are performed. After the simulation studies, two different illustrative examples are presented in Section Illustrative Examples. Finally, we summarize the results obtained by this study, state our conclusions, and make some suggestions for future research in Section Concluding Remarks.

GLM Profiles

We describe GLM profiles according to Sogandi and Amiri (Citation2017) and Olsson (Citation2002). We assume that the $k$th profile consists of a set of $n$ observations $\{(x_{j,k}, y_{j,k}),\ j = 1, \ldots, n\}$, where $x_{j,k}$ is a vector of predictor variables and $y_{j,k}$ represents the $j$th response variable in the $k$th profile. We assume that the relationship between the response $y_{j,k}$ and the predictor variables $x_{j,k}$ is defined by a generalized linear model (GLM), where $x_{j,k}$ is a vector of $q$ predictor variables $(x_{j,k,1}, \ldots, x_{j,k,q})$. As a result, the response variables $y_{j,k}$ have the same distribution for each profile. The predictor variables are combined linearly with the coefficient vector $\beta_j = (\beta_{j,1}, \ldots, \beta_{j,q})^{T}$ to define the linear predictor $\mu_{j,k} = x_{j,k}^{T}\beta_j$. We also assume a monotone link function $g$ defined through the following relationship:

$$g(\eta_{j,k}) = \mu_{j,k} = x_{j,k}^{T}\beta_j,$$

where $\eta_{j,k} = E(y_{j,k})$ and $g$ is a monotone link function. The link functions will be described in what follows. If $x_{j,k,1}$ is equal to 1, then $\beta_{j,1}$ is the intercept of the model; in our study we have $q = 2$. Most studies in profile monitoring assume that the response variable follows the Normal distribution. In this paper, however, we assume that the response variable belongs to the exponential family of distributions and has a Binomial, Poisson, or Gamma distribution, because these are more common in practice. For the $k$th profile, the probability distribution belongs to the exponential family if it can be written as:

$$f(y_{j,k}; \theta_{j,k}, \phi_{j,k}) = \exp\left\{\frac{y_{j,k}\theta_{j,k} - b(\theta_{j,k})}{a(\phi_{j,k})} + c(y_{j,k}, \phi_{j,k})\right\},$$

where $a(\cdot)$, $b(\cdot)$, and $c(\cdot)$ are some known functions, $\phi_{j,k}$ is the dispersion parameter, and $\theta_{j,k}$ is the canonical parameter, which is some function of the location parameter of the distribution. More information about these functions and parameters for the corresponding distributions can be found in Olsson (Citation2002).

Link functions transform the probabilities of the levels of a categorical response variable to a continuous scale that is unbounded. Link function g represents a relationship between the mean of the response variable and the explanatory variables as:

$$g(E(Y)) = X^{T}\beta.$$

There are also canonical link functions, which transform the mean to the canonical location parameter of the exponential family member. For a canonical link function $g(\cdot)$, we have $g(E(Y)) = \theta$. Thus, the canonical link is the logit for the Binomial distribution, the log for the Poisson distribution, and the negative inverse for the Gamma distribution. Some references use a different parametrization of the Gamma distribution or a different canonical link function (e.g., Myers et al. Citation2010 state that the canonical link is the inverse). The choice of link function mostly depends on the type of data, and there are cases in which the canonical link may not be appropriate. Therefore, we define the most common link functions in the following tables, which are later used for the corresponding distributions. For the Binomial distribution, from Table 1 we have $\mu_B = 1 - \exp(-\exp(X^{T}\beta))$ for the cloglog link function, $\mu_B = \exp(X^{T}\beta)/(1 + \exp(X^{T}\beta))$ for the logit link function, and $\mu_B = \Phi(X^{T}\beta)$ for the probit link function, where $\mu_B$ represents the mean value for the Binomial distribution and $\Phi$ is the standard normal cumulative distribution function.

Table 1. Link functions for Binomial distribution.

For the Poisson distribution, from Table 2 we have $\lambda = X^{T}\beta$ for the identity link function, $\lambda = \exp(X^{T}\beta)$ for the log link function, and $\lambda = (X^{T}\beta)^{2}$ for the sqrt link function.

Table 2. Link functions for Poisson distribution.

For the Gamma distribution, from Table 3 we have $\mu_G = \kappa\omega = X^{T}\beta$ for the identity link function, $\mu_G = \kappa\omega = (X^{T}\beta)^{-1}$ for the inverse link function, and $\mu_G = \kappa\omega = \exp(X^{T}\beta)$ for the log link function.

Table 3. Link functions for Gamma distribution.
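To make these family and link choices concrete, the following R sketch declares each distribution with a representative link through base R family objects and fits one profile with glm(). The simulated data and coefficient values are placeholders chosen for illustration only; they are not the settings used in the simulation studies later in the paper.

```r
## Declaring family/link pairs from Tables 1-3 and fitting one profile each.
## Data and coefficients below are illustrative placeholders.
set.seed(1)
x   <- seq(0.1, 1, by = 0.1)   # predictor values (illustrative)
eta <- 0.5 + 1 * x             # linear predictor X'beta with example coefficients

# Binomial profile with m = 30 trials and a logit link (probit/cloglog analogous)
m  <- 30
p  <- binomial(link = "logit")$linkinv(eta)            # exp(eta)/(1 + exp(eta))
yb <- rbinom(length(x), size = m, prob = p)
fit_bin <- glm(cbind(yb, m - yb) ~ x, family = binomial(link = "logit"))

# Poisson profile with a log link ("identity" and "sqrt" are also available)
lambda <- poisson(link = "log")$linkinv(eta)           # exp(eta)
yp <- rpois(length(x), lambda)
fit_pois <- glm(yp ~ x, family = poisson(link = "log"))

# Gamma profile with an inverse link ("log" and "identity" are also available)
mu <- Gamma(link = "inverse")$linkinv(eta)             # 1/eta
yg <- rgamma(length(x), shape = 2, rate = 2 / mu)      # mean = shape/rate = mu
fit_gam <- glm(yg ~ x, family = Gamma(link = "inverse"))

rbind(Binomial = coef(fit_bin), Poisson = coef(fit_pois), Gamma = coef(fit_gam))
```

Changing the link function only requires changing the link argument of the family object, e.g., binomial(link = "cloglog").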

Machine Learning Techniques

We use several ML techniques to build control charts, namely Support Vector Machine (SVM), Random Forest (RF), and Neural Networks (NN), which are briefly described in what follows.

Support Vector Machine

SVM is one of the supervised learning algorithms used for the analysis of data in classification or regression problems. In the case of a regression problem, it is called SVR. The algorithm was first proposed by Boser, Guyon, and Vapnik (Citation1992). The goal of SVM is to transform the input space to a higher dimensional feature space through a mapping function and to construct a separating hyperplane with maximum distance from the closest points, called support vectors.

The kernel functions are used for the transformation of the input space into a higher-dimensional feature space. The most commonly used kernel functions are the linear, polynomial, sigmoid, and radial basis kernels. Training an SVM involves solving a quadratic optimization problem to find the optimal solution. For optimization, in our case, the software package uses the sequential minimal optimization (SMO) algorithm. For more information about SVM, we refer interested readers to Stoean and Stoean (Citation2014).
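As a small, hedged illustration of the SVR idea (not the exact model used later in the paper), the following R sketch fits an eps-regression SVM with a radial basis kernel using the e1071 package on synthetic data.

```r
library(e1071)

## Minimal SVR sketch on synthetic data (illustration only)
set.seed(2)
train   <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train$y <- 0.5 * train$x1 - 0.3 * train$x2 + rnorm(200, sd = 0.1)

# eps-regression with a radial basis kernel; cost and epsilon are left at defaults
svr_fit <- svm(y ~ x1 + x2, data = train, type = "eps-regression", kernel = "radial")

preds <- predict(svr_fit, newdata = train)
sqrt(mean((preds - train$y)^2))   # in-sample RMSE
```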

Random Forest

RF is another ML algorithm that is widely used in classification and regression problems. It is a statistical learning method based on the principle of ensemble learning, which combines a group of trained models (an ensemble). An RF consists of many decision trees, which can be viewed as its building blocks. Ensemble learning combines predictions from multiple ML models (by taking the average in a regression case or choosing the class with the maximum number of occurrences in a classification case) to make a more accurate prediction than a single model (a single decision tree, in the RF case). There are several ways to improve the model: we can specify the maximum depth of the trees, increase or decrease the number of trees, and specify the maximum number of features to be considered at each node split. Increasing the number of trees can increase the precision of the prediction. For more information about RF, readers are referred to Breiman (Citation2001).
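The sketch below, again on synthetic data, shows the corresponding random forest regression fit with the randomForest package; ntree is the tuning parameter that is varied later in the paper.

```r
library(randomForest)

## Minimal random forest regression sketch on synthetic data (illustration only)
set.seed(3)
train   <- data.frame(x1 = runif(300), x2 = runif(300))
train$y <- sin(2 * pi * train$x1) + 0.5 * train$x2 + rnorm(300, sd = 0.1)

# ntree controls the number of trees whose predictions are averaged;
# mtry and nodesize (left at defaults here) control split candidates and tree size
rf_fit <- randomForest(y ~ x1 + x2, data = train, ntree = 500)

preds <- predict(rf_fit, newdata = train)
sqrt(mean((preds - train$y)^2))   # in-sample RMSE
```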

Neural Networks

The last ML technique used in this paper is NN, whose name and structure are inspired by the human brain. It can be described as a set of algorithms designed to recognize a pattern in a dataset. A NN consists of an input layer, one or more hidden layers, and an output layer.

We only consider one hidden layer in this research; NNs with more than one hidden layer are referred to as deep learning techniques. Nodes from one layer to the next are connected through weight parameters. Training an NN amounts to determining the optimal values of the connection weights and node biases. A node bias is a constant input value added to each node, affecting its activation and allowing the network to account for the varying importance of different nodes. In this paper, we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization method. The most popular NNs are multilayer perceptrons, convolutional NNs, and recurrent NNs. More information about NNs can be found in Patterson (Citation1998).
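A minimal single-hidden-layer regression network can be fitted with the nnet package as sketched below on synthetic data; size (number of hidden nodes) and maxit (number of optimization iterations) are the quantities varied later in the paper.

```r
library(nnet)

## Minimal single-hidden-layer network sketch on synthetic data (illustration only)
set.seed(4)
train   <- data.frame(x1 = runif(300), x2 = runif(300))
train$y <- 0.4 * train$x1 + 0.6 * train$x2^2 + rnorm(300, sd = 0.05)

# size = number of hidden nodes, linout = TRUE gives a linear (regression) output,
# maxit = maximum number of optimisation iterations (nnet uses BFGS internally)
nn_fit <- nnet(y ~ x1 + x2, data = train, size = 5, linout = TRUE,
               maxit = 500, trace = FALSE)

preds <- predict(nn_fit, newdata = train)
sqrt(mean((preds - train$y)^2))   # in-sample RMSE
```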

Control Charts

Control charts are the main tool of SPC; they check whether the process is stable (in-control) or not (out-of-control). The values of a measured characteristic (statistic) must fall within the control limits; otherwise, the process is declared out-of-control.

Developing control charts involves Phases I and II. In Phase I, it is necessary to ensure the stability of the process and estimate the in-control values of the process parameters, whereas in Phase II, we monitor future observations (online monitoring) to detect any shift in the process from the in-control state determined in Phase I.

Many quality characteristics can be expressed in terms of numerical measurements. Single measurable characteristics (such as weight, volume, or dimension) are called variables, and one type of control chart is designed for them. Another type of control chart is designed for attribute data (such as counts or percentages). More information about variable and attribute control charts can be found in Montgomery (Citation2012).

In the construction of control charts for process monitoring, we consider sample size, sampling interval, and control limits. Usually, sample size and sampling interval are fixed, and the control limit can be obtained with the following algorithm:

Step 1: Set the values of the probability of Type-I error α and sample size n.

Step 2: Find a suitable statistic based on the problem at hand.

Step 3: Generate and sort 10,000 in-control samples in ascending order using the statistic from Step 2. The initial value of UCL will be the 10000(1−α)-th value in this set.

Step 4: Run at least 5000 simulations and adjust UCL to obtain ARL = 1/α.

ML Control Charts

In this paper, we focus on ML control charts similar to Sabahno and Amiri (Citation2023). We consider three input variables for all the ML control charts: the estimated intercept β̂0, the estimated slope β̂1, and the sample mean Ȳ. The output variable is continuous (a regression output). As a result, we refer to the SVM as support vector regression (SVR) and to the RF as random forest regression (RFR). An out-of-control signal is triggered whenever the output exceeds the UCL.

UCL is the upper control limit and is obtained using the following algorithm (an R sketch of Steps 4–8 follows the list):

Step 1: For each distribution, link function, training method, and regression parameters values, we generate in-control and out-of-control data consisting of the estimated regression intercepts, slopes, and means of the response variables as predictor variables (inputs for the ML structure).

Step 2: Assign 0 to the response variable for the in-control training data and 1 to the response variable for the out-of-control training data.

Step 3: Train the ML structure with the training data in regression form (the training data will be described in Section Simulation Studies).

Step 4: Generate 10,000 samples from the in-control process.

Step 5: Set the value of the probability of Type-I error α.

Step 6: Sort ML structure outputs in an ascending order.

Step 7: The initial value of UCL will be the 10000(1−α)-th value in this set.

Step 8: Run at least 5000 simulations and adjust UCL to obtain ARL = 1/α.
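The following R sketch illustrates Steps 4–8 of this algorithm. It assumes that a regression-type ML model ml_fit has already been trained (Steps 1–3) and that a helper simulate_ic_features() simulates one in-control profile and returns a one-row data frame with the estimated intercept, slope, and response mean; both names are placeholders for this sketch.

```r
## Sketch of Steps 4-8: calibrate UCL so that the in-control ARL equals 1/alpha.
## `ml_fit` and `simulate_ic_features()` are placeholders assumed to exist.
alpha <- 0.005

# Steps 4-7: initial UCL = the 10000*(1 - alpha)-th sorted in-control output
ic_out <- replicate(10000, predict(ml_fit, newdata = simulate_ic_features()))
UCL    <- sort(ic_out)[ceiling(10000 * (1 - alpha))]

# Step 8: estimate the in-control ARL from run-length simulations and nudge UCL
estimate_arl <- function(ucl, nsim = 5000) {
  rl <- replicate(nsim, {
    k <- 0
    repeat {
      k   <- k + 1
      out <- predict(ml_fit, newdata = simulate_ic_features())
      if (out > ucl) break          # false alarm on an in-control sample
    }
    k
  })
  mean(rl)
}

repeat {
  arl <- estimate_arl(UCL)
  if (abs(arl - 1 / alpha) <= 5) break                  # tolerance chosen arbitrarily
  UCL <- UCL * ifelse(arl < 1 / alpha, 1.001, 0.999)    # raise UCL if too many alarms
}
```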

The performance is evaluated under shifts in the intercept, shifts in the slope, and simultaneous shifts, using the ML structures described in Section Simulation Studies.

Simulation Studies

In this paper, we consider three different training methods. The first training method consists of 2400 profiles, of which 1200 are in-control and 1200 are out-of-control. The out-of-control profiles have three types of shifts, namely shifts in the intercept, shifts in the slope, and simultaneous shifts. Each shift type accounts for 400 out-of-control profiles, and the shift size is 0.1. The second training method consists of 2700 profiles: 1350 in-control profiles and 1350 out-of-control profiles. As in the first method, the out-of-control profiles have three types of shifts; in this case, we use three different shift sizes, namely 0.1, 0.5, and 1, with 150 out-of-control profiles for each size and type. The third training method consists of 1200 in-control and 1200 out-of-control profiles. For the out-of-control profiles, we consider the same types of shifts but only two shift sizes, 0.1 and 1, with 200 out-of-control profiles for each type and size. We adopted the first and third training methods from Sabahno and Amiri (Citation2023), while the second one is proposed here with small, moderate, and large shift sizes.
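As an illustration of how the first training method's data set could be assembled, the sketch below builds the 2400 labeled profiles from a placeholder helper simulate_profile_features(delta1, delta2), which is assumed to simulate one profile under the given standardized shifts and return its estimated intercept, slope, and response mean; the other two training methods only change the numbers of profiles and the shift sizes.

```r
## Sketch of the first training method (1200 in-control + 3 x 400 out-of-control
## profiles, shift size 0.1). `simulate_profile_features(delta1, delta2)` is a
## placeholder helper returning one row of (intercept, slope, mean) features.
rep_features <- function(n, d1, d2) {
  do.call(rbind, replicate(n, simulate_profile_features(d1, d2), simplify = FALSE))
}

train <- rbind(
  rep_features(1200, 0,   0),     # in-control profiles
  rep_features(400,  0.1, 0),     # shift in the intercept
  rep_features(400,  0,   0.1),   # shift in the slope
  rep_features(400,  0.1, 0.1)    # simultaneous shift
)
train$y <- c(rep(0, 1200), rep(1, 1200))   # 0 = in-control, 1 = out-of-control
```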

We evaluate the performance of the control charts using the ARL for different profiles with the corresponding distribution and link functions, for each training method and ML technique. The ARL is the average run length, i.e., the average number of samples (subgroups) taken before the chart signals (a point beyond the control limits). In addition, we report the SDRL (standard deviation of run length) as additional information. A step change of size Δ = (δ1, δ2)^T occurs in the profile parameters, where δ1 and δ2 represent changes in the intercept and slope in terms of their standard deviations; σ1 and σ2 are the standard deviations of the intercept and slope parameters, adopted from the studies cited in the following subsections. To calculate the ARL and SDRL, we use the run-length simulation algorithm described next.

This algorithm is used for the calculation of UCL in the in-control situation, with δ1 = 0 and δ2 = 0, as described in Section ML Control Charts. Assuming α = 0.005, the in-control ARL value is set to 200 for all the control charts. We also use the algorithm to evaluate the performance and calculate the ARL and SDRL in the out-of-control situations (δ1 and/or δ2 ≠ 0).
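A sketch of this run-length simulation is given below, reusing the placeholder helpers from the previous sketches: a trained model ml_fit, a calibrated UCL, and simulate_profile_features(delta1, delta2) for generating a profile under a given shift.

```r
## Sketch of the ARL/SDRL estimation for a given shift (delta1, delta2).
## `ml_fit`, `UCL`, and `simulate_profile_features()` are placeholders from
## the earlier sketches.
arl_sdrl <- function(delta1, delta2, nsim = 5000) {
  rl <- replicate(nsim, {
    k <- 0
    repeat {
      k   <- k + 1
      out <- predict(ml_fit, newdata = simulate_profile_features(delta1, delta2))
      if (out > UCL) break          # chart signals
    }
    k
  })
  c(ARL = mean(rl), SDRL = sd(rl))
}

arl_sdrl(0, 0)      # in-control: should be close to 1/alpha = 200
arl_sdrl(0.5, 0)    # example: shift of 0.5 standard deviations in the intercept
```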

We use similar shift sizes to those used in Mohammadzadeh, Yeganeh, and Shadman (Citation2021). For training the models, the R packages e1071/svm (Meyer Citation2023), randomForest (Liaw and Wiener Citation2022), and nnet (Ripley and Venables Citation2023) are used, which provide built-in functions for training the corresponding ML techniques. In the case of the SVR, we use the svm() function with default settings, except for the kernel function type; we consider linear, polynomial, radial, and sigmoid kernels to find the best one for each model by considering the performance and the root mean square error (RMSE). For the RFR, we use the randomForest() function, changing only the number of trees and otherwise using the default settings; as before, we try to find the best number of trees according to the performance and the RMSE. We use the nnet() function for training the NN with the default settings, varying only the number of nodes in the hidden layer and the number of iterations. To be as consistent as possible, we build the NN model with a small number of nodes. We also consider the RMSE and the performance as in the previous cases. The obtained RMSEs and the package settings for all the following simulation studies are not reported in this paper to save space, but they are available in table format upon request from the corresponding author.
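The kernel comparison for the SVR described above can be sketched as follows, assuming the labeled training data frame train with the three input features and the 0/1 response y constructed earlier; the analogous loops over ntree for randomForest() and over size/maxit for nnet() are omitted.

```r
library(e1071)

## Compare SVR kernels by in-sample RMSE on the training data (sketch only);
## the final choice also takes the resulting ARL performance into account.
kernels <- c("linear", "polynomial", "radial", "sigmoid")
rmse <- sapply(kernels, function(k) {
  fit <- svm(y ~ ., data = train, type = "eps-regression", kernel = k)
  sqrt(mean((predict(fit, newdata = train) - train$y)^2))
})
rmse
```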

Binomial Profiles

For the Binomial profiles described in Section GLM Profiles, we assume $y_{j,k} \sim \mathrm{Bin}(m, p_{j,k})$, where $m = 30$ is the number of observations and $p_{j,k}$ represents the probability of success for the $j$th observation in the $k$th profile. It is defined by the following relationship, with the same simulation setting as in Shadman et al. (Citation2014):

$$g(p_{j,k}) = \beta_0 + \beta_1 x_{j,k},$$

where the in-control parameters are set to (β0, β1)^T = (2.8, 1)^T. The values of the predictor variable are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, with the corresponding link function g. The out-of-control parameters are obtained by adding Δ = (σ1δ1, σ2δ2)^T to the in-control parameters. The covariance matrix of the Binomial regression parameters is

$$\Sigma = \begin{pmatrix} \sigma_1^{2} & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & \sigma_2^{2} \end{pmatrix} = \begin{pmatrix} 0.2186 & 0.2936 \\ 0.2936 & 0.4771 \end{pmatrix}.$$

Hence, σ1=0.4676 and σ2=0.6907.
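Under the setting above, one Binomial profile can be simulated and reduced to the three chart inputs (estimated intercept, estimated slope, and response mean) as sketched below; only the logit link is shown, and the other links follow by changing the link argument of the family object.

```r
## Simulate one Binomial profile under the setting above and compute the three
## chart inputs; a sketch using the logit link only.
x    <- seq(0.1, 1, by = 0.1)
m    <- 30
beta <- c(2.8, 1)                  # in-control (intercept, slope) as given above
s1   <- 0.4676; s2 <- 0.6907       # sigma_1 and sigma_2 from the covariance matrix

simulate_binomial_features <- function(delta1 = 0, delta2 = 0, link = "logit") {
  fam <- binomial(link = link)
  b   <- beta + c(s1 * delta1, s2 * delta2)            # shifted parameters
  p   <- fam$linkinv(b[1] + b[2] * x)
  y   <- rbinom(length(x), size = m, prob = p)
  fit <- glm(cbind(y, m - y) ~ x, family = fam)
  data.frame(b0_hat = coef(fit)[[1]], b1_hat = coef(fit)[[2]], y_bar = mean(y))
}

simulate_binomial_features()           # in-control profile
simulate_binomial_features(0.5, 0)     # shift of 0.5*sigma_1 in the intercept
```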

For the Binomial distribution with the logit link function (Tables 4–6), we investigate different shift sizes in the intercept, slope, and simultaneous shifts using the ARL, with different training methods and ML control charts. In the case of the first training method, we obtain the smallest values of ARL with the SVR control chart. We see different results for the second training method: the smallest values of ARL are for the NN control chart, with some exceptions under shifts in the slope and simultaneous shifts (see Tables 5 and 6). For the third training method, we obtain the smallest values of ARL with the SVR control chart, except for the smallest and largest shift sizes in the intercept, slope, and simultaneous shifts; in these cases, we obtain the smallest values of ARL with the NN control chart. We can see that the ARL values decrease with increasing shift sizes for all the control charts except for the RFR control chart with the first training method.

Table 4. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the intercept for Binomial distribution with the logit link function.

Table 5. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the slope for Binomial distribution with the logit link function.

Table 6. ARL (SDRL) values of the SVR, RFR, and NN control charts under different simultaneous shifts for Binomial distribution with the logit link function.

When we compare the overall results, the first training method generally delivers the smallest values of ARL under shifts in the intercept with the SVR control chart, except for the smallest shift size. Under shifts in the slope, we obtain the smallest values of ARL with the SVR and NN control charts and the second training method, depending on the shift sizes (see Table 5). Under simultaneous shifts, we observe a similar trend to the intercept-shift case: we obtain the smallest values of ARL with the SVR control chart and the first training method, with some exceptions (see Table 6).

For the Binomial distribution with the probit link function, using the first training method, the smallest values of ARL are obtained with the SVR control chart. For the other two training methods, the smallest values of ARL are obtained with the NN control chart, with some exceptions (see Tables 7–9). As mentioned for the logit link function, from a certain shift size onward, the ARL values start to increase as the shift sizes increase for the RFR control chart with the first training method.

Table 7. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the intercept for Binomial distribution with the probit link function.

Table 8. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the slope for Binomial distribution with the probit link function.

Table 9. ARL (SDRL) values of the SVR, RFR and NN control charts under different simultaneous shifts for Binomial distribution with the probit link function.

We mostly obtain the smallest values of ARL with the NN control chart and the second training method under shifts in the intercept. Under shifts in the slope, the SVR control chart delivers the smallest values of ARL with the first training method. The results under simultaneous shifts depend on the shifts in the intercept and slope (see Table 9).

The first training method for the Binomial distribution with the cloglog link function delivers the smallest values of ARL with the SVR control chart, except for the shift size 0.1 in the intercept (see Table 10). For the second training method, we obtain the smallest values of ARL with the NN control chart, except for some shift sizes in the slope (see Table 11). The results for the third training method are similar to those obtained for the first one: the smallest values of ARL are for the SVR control chart, with some exceptions. Here again, using the first training method, we observe increasing values of ARL for the RFR control chart as the shift sizes increase beyond a certain value (see Tables 10–12).

Table 10. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the intercept for Binomial distribution with the cloglog link function.

Table 11. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the slope for Binomial distribution with the cloglog link function.

Table 12. ARL (SDRL) values of the SVR, RFR and NN control charts under different simultaneous shifts for Binomial distribution with the cloglog link function.

We generally obtain the smallest values of ARL for almost all shift sizes with the SVR control chart and the first training method under shifts in the intercept. Under shifts in the slope, we obtain the smallest values of ARL with the second training method and the SVR and NN control charts, depending on the shift size (see Table 11). Under simultaneous shifts, we obtain the smallest values of ARL with the first training method and the SVR control chart or with the second training method and the NN control chart, depending on the shift size (see Table 12).

Poisson Profiles

We use a similar approach for the Poisson distribution, with the same setting as used in Shadman et al. (Citation2014). The predictor variables are the same as for the Binomial profiles, and the relationship is defined as:

$$g(\lambda_{j,k}) = \beta_0 + \beta_1 x_{j,k},$$

with in-control parameters (β0, β1)^T = (3, 2)^T; the covariance matrix of the Poisson regression parameters is

$$\Sigma = \begin{pmatrix} \sigma_1^{2} & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & \sigma_2^{2} \end{pmatrix} = \begin{pmatrix} 0.0117 & 0.0146 \\ 0.0146 & 0.0207 \end{pmatrix}.$$

Therefore, σ1=0.1082 and σ2=0.1440.

First, we consider Poisson profiles with the log link function. For the first training method, the SVR control chart delivers the smallest values of ARL under shifts in the intercept and simultaneous shifts, with some performances similar to the NN control chart (see Tables 13 and 15). Under shifts in the slope, the smallest values of ARL are mostly obtained with the NN control chart (see Table 14). For the second training method, we generally obtain the smallest values of ARL with the NN control chart, with performances similar to the SVR control chart under certain shift types and sizes (see Tables 13–15). In the case of the third training method, we obtain the smallest values of ARL with the SVR control chart, with some performances similar to the NN control chart.

Table 13. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the intercept for Poisson distribution with the log link function.

Table 14. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the slope for Poisson distribution with the log link function.

Table 15. ARL (SDRL) values of the SVR, RFR and NN control charts under different simultaneous shifts for Poisson distribution with the log link function.

We obtain the smallest values of ARL mostly with the SVR control chart: with the first training method under shifts in the intercept and with the third training method under shifts in the slope, with some performances similar to the NN control chart (see Tables 13 and 14). Under simultaneous shifts, we obtain the smallest values of ARL with the SVR and NN control charts and with all training methods, depending on the shift sizes (see Table 15).

Second, we consider the sqrt link function for the Poisson profiles. We obtain the smallest values of ARL for the first training method with the SVR control chart (see Tables 16–18). For the second and third training methods, we obtain the smallest values of ARL with the NN control chart under shifts in the intercept and with the SVR control chart under shifts in the slope. However, the situation is different under simultaneous shifts, where we obtain the smallest values of ARL with the SVR and NN control charts, depending on the shift sizes (see Table 18).

Table 16. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the intercept for Poisson distribution with the sqrt link function.

Table 17. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the slope for Poisson distribution with the sqrt link function.

Table 18. ARL (SDRL) values of the SVR, RFR and NN control charts under different simultaneous shifts for Poisson distribution with the sqrt link function.

We obtain the smallest values of ARL with the NN control chart and the second training method under shifts in the intercept (see Table 16). Under shifts in the slope, the smallest values of ARL are obtained with the SVR control chart and the first training method (see Table 17). However, the results are slightly different under simultaneous shifts: the smallest values of ARL are mostly obtained with the second training method and the SVR and NN control charts, depending on the shift sizes (see Table 18).

Last, we analyze the Poisson profiles with the identity link function. For the first training method, we obtain the smallest values of ARL with the SVR control chart. However, for the second and third training methods, the smallest values of ARL are obtained with the NN control chart, except for the largest shift sizes in the intercept, slope, and simultaneous shifts (see Tables 19–21).

Table 19. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the intercept for Poisson distribution with the identity link function.

Table 20. ARL (SDRL) values of the SVR, RFR and NN control charts under different shifts in the slope for Poisson distribution with the identity link function.

Table 21. ARL (SDRL) values of the SVR, RFR and NN control charts under different simultaneous shifts for Poisson distribution with the identity link function.

We obtain the smallest values of ARL with the first training method and the SVR control chart, except for the two smallest shift sizes in the intercept (see Table 19). Under shifts in the slope, the NN control chart delivers the smallest values of ARL with the second training method, except for the largest shift size (see Table 20). Under simultaneous shifts, we obtain the smallest values of ARL with the NN control chart and the second training method and, from a certain shift size onward, with the SVR control chart and the first training method (see Table 21).

Gamma Profiles

In the case of the Gamma distribution, using the same numerical setting as in Sogandi and Amiri (Citation2015), we assume different values of the predictor variables, namely $\ln 10, \ln 15, \ln 20, \ln 25, \ln 30, \ln 35, \ln 40, \ln 45$, and $\ln 50$. Therefore, we have n = 9 rather than the n = 10 used in the previous cases. The relationship is

$$g(\mu_{G;j,k}) = \beta_0 + \beta_1 x_{j,k},$$

where the in-control parameters are (β0, β1)^T = (4, 2)^T, and the covariance matrix of the Gamma regression parameters is:

$$\Sigma = \begin{pmatrix} \sigma_1^{2} & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & \sigma_2^{2} \end{pmatrix} = \begin{pmatrix} 2.0472 & 0.5433 \\ 0.5433 & 0.1446 \end{pmatrix}.$$

Hence, σ1 = 1.4308 and σ2 = 0.3803.

Note that, to reduce the paper’s size due to the availability of many tables, we have removed all the tables related to the Gamma analysis from the paper. However, they are provided upon request from the corresponding author.

In the case of the log link function, the difference between the RFR and NN control charts is negligible; nonetheless, the NN control chart delivers the smallest values of ARL. For the SVR control chart, we could only obtain the ARL values for the smallest shift sizes in the intercept, slope, and simultaneous shifts (0.1, 0.1); otherwise, the ARL values are extremely large.

The differences between the values of ARL for different training methods and control charts, except for the SVR control chart, are negligible.

For the identity link function, we obtain the smallest values of ARL with the NN control chart, for all training methods.

Under shifts in the intercept, we obtain the smallest values of ARL with the NN control chart and the third training method, except for shift sizes 0.5 and 1. Under shifts in the slope, we obtain different results. We obtain the smallest values of ARL with the NN control chart, but with all the training methods depending on the shift sizes. A similar result can be seen under simultaneous shifts.

In the case of the inverse link function, the results are clear for the first and third training methods: we obtain the smallest values of ARL with the SVR control chart for the first training method and with the NN control chart for the third training method. The second training method, however, delivers the smallest values of ARL for the SVR control chart, with some exceptions under the largest shift sizes in the intercept and some shift sizes in the slope.

We obtain the smallest values of ARL for the SVR control chart with the second training method, and for the NN control chart with the third training method depending on the shift sizes in the intercept. We obtain the smallest values of ARL for the smaller values of the shift sizes in the slope with the SVR control chart and the second training method; otherwise, we obtain the smallest values of ARL with the NN control chart and with the third training method. Under simultaneous shifts, we observe similar results.

Illustrative Examples

To illustrate how the developed ML control charts can be implemented in real practice, we present two real cases. Due to limitations, including the paper's length, we only present two examples: i) a drug–response study for Binomial profiles, and ii) a volcano–eruption study for Poisson profiles. Nonetheless, the methodology can also be applied to any other practical case.

A Drug–Response Study

For the Binomial distribution, we adopt an example from Sabahno and Amiri (Citation2023), a drug dose–response study investigating the effect of a new pharmaceutical product. The patient's response to the drug is the response variable y, such that y = 1 if the drug is effective and y = 0 otherwise, and the independent variable is the dose of the drug. The in-control parameters are set to (β0, β1)^T = (14.51, 0.07)^T and the covariance matrix is

$$\Sigma = \begin{pmatrix} \sigma_1^{2} & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & \sigma_2^{2} \end{pmatrix} = \begin{pmatrix} 1.47424 & 0.00705 \\ 0.00705 & 0.00003 \end{pmatrix}.$$

Therefore, σ1 = 1.2142 and σ2 = 0.0058. For the predictor variable, the drug's dosage, 13 levels are considered, ranging from 180 to 240 (180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240), and since each dose of the drug is tested on 50 patients, m = 50. The data for this estimation are available in Myers et al. (Citation2010). To illustrate how to implement the control charts, we generate 20 profiles from a simulated out-of-control process (a shift of δ1 = 0.3 in the intercept). Based on the results of Section Simulation Studies, we design the control charts with the probit link function for the SVR and RFR control charts and the cloglog link function for the NN control chart, and we use the first training method for the SVR control chart and the second training method for the RFR and NN control charts. The corresponding UCL values, using α = 0.005 and 5000 simulation runs, are obtained as 0.7686, 0.9644, and 0.8774 for the SVR, RFR, and NN control charts, respectively.
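A sketch of the Phase-II monitoring in this example is given below; simulate_drug_features(delta1, delta2) is a placeholder helper (analogous to the Binomial sketch in Section Simulation Studies, but with the dose levels and parameters of this example), and svr_fit, rfr_fit, and nn_fit denote the trained models.

```r
## Sketch of the Phase-II monitoring: 20 profiles from the shifted process are
## scored by each trained chart and compared with its UCL. The helper
## `simulate_drug_features()` and the fitted models are placeholders.
UCLs <- c(SVR = 0.7686, RFR = 0.9644, NN = 0.8774)

new_profiles <- do.call(rbind, replicate(20, simulate_drug_features(0.3, 0),
                                         simplify = FALSE))

stats <- data.frame(
  SVR = as.numeric(predict(svr_fit, newdata = new_profiles)),
  RFR = as.numeric(predict(rfr_fit, newdata = new_profiles)),
  NN  = as.numeric(predict(nn_fit,  newdata = new_profiles))
)

which(stats$SVR > UCLs["SVR"])   # sample numbers signalled by the SVR chart
which(stats$RFR > UCLs["RFR"])   # sample numbers signalled by the RFR chart
which(stats$NN  > UCLs["NN"])    # sample numbers signalled by the NN chart
```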

We can see similar behavior in Figures 1–3: four out-of-control samples are detected by each control chart, at the same sample numbers 1, 5, 18, and 19.

Figure 1. The SVR control chart for the first illustrative example.

Figure 2. The RFR control chart for the first illustrative example.

Figure 3. The NN control chart for the first illustrative example.

A Volcano–Eruption Study

For the Poisson distribution, we use the real data from Amiri et al. (Citation2014), which contain Poisson regression profiles with the number of agglomerates ejected from a volcano on successive days. Agglomerates are large, coarse rock fragments associated with a lava flow that are ejected during explosive volcanic eruptions. The response variable is the number of agglomerates, and the size of the agglomerates is the explanatory variable. The data set was gathered over 68 successive days, so we have 68 Poisson profiles, each containing observations at 20 levels of the explanatory variable. The matrix of the explanatory variables is defined as

$$X = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 3 & 6 & 9 & \cdots & 57 \end{pmatrix}.$$

The in-control parameters are (β0, β1)^T = (6.6485, 0.1020)^T, with covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^{2} & \sigma_1\sigma_2 \\ \sigma_1\sigma_2 & \sigma_2^{2} \end{pmatrix} = \begin{pmatrix} 0.0012 & 0.0001 \\ 0.0001 & 0.0001 \end{pmatrix}.$$

Therefore, σ1=0.0346 and σ2=0.0100. We again generate 20 profiles from the out-of-control process (a shift δ1=0.5 in the intercept). Again, based on the results of Section Simulation Studies, we design the control charts with a log link function and the first training method for the SVR and NN control charts and use the third training method for the RFR control chart. Assuming α = 0.005 and with 5000 simulation runs, the UCL values are obtained as 0.4233, 0.9637, and 0.8146, respectively, for the SVR, RFR, and NN control charts.

As can be seen in Figures 4–6, similar to Section A Drug–Response Study, the SVR and NN control charts behave similarly. However, in this example, only two out-of-control samples are detected, at sample numbers 3 and 17. For the RFR control chart, we also detect two out-of-control samples, but at different sample numbers: 3 and 6.

Figure 4. The SVR control chart for the second illustrative example.

Figure 5. The RFR control chart for the second illustrative example.

Figure 6. The NN control chart for the second illustrative example.

Concluding Remarks

In this paper, we developed ML control charts for monitoring GLM-type profiles by considering different training methods, response distributions, and link functions. We first defined the GLM profiles for three main distributions, Binomial, Poisson, and Gamma, with their corresponding link functions. We then described the ML techniques in general terms and moved on to the statistical and ML-based control charts.

A simulation study followed the theories. We trained the ML models and chose the best training method and ML control chart for the corresponding response distribution and link function. We considered three training methods with different shift sizes. For designing the control charts, we used the SVR, RFR, and NN techniques and compared the results using the ARL values.

In the case of Binomial profiles, for all link functions, we obtained the smallest values of ARL with the SVR and NN control charts and the first and second training methods, depending on the shift sizes and types. For the Poisson profiles with the log link function, the SVR control chart mostly performed the best with all the training methods, except under simultaneous shifts, where the NN control chart delivered the smallest values of ARL. For the sqrt link function, the SVR and NN control charts generally performed the best: in the case of shifts in the intercept, it was the NN control chart and the second training method, and under shifts in the slope, it was the SVR control chart and the first training method. Under simultaneous shifts, we also observed the smallest values of ARL with the SVR and NN control charts and the second training method. For the identity link function of the Poisson profiles, we obtained the smallest values of ARL with the SVR control chart and the first training method under shifts in the intercept. Under shifts in the slope, the NN control chart delivered the smallest values of ARL with the second training method, and under simultaneous shifts, it was a combination of the NN control chart with the second training method and the SVR control chart with the first training method.

For the Gamma profiles with the log link function, the differences between the ARL values were negligible for the RFR and NN control charts, while we obtained large values of ARL for almost all shift sizes with the SVR control chart. For the identity link function, the NN control chart with all the training methods delivered the smallest values of ARL. For the Gamma profiles with the inverse link function, we obtained the smallest values with the SVR control chart and the second training method, and with the NN control chart and the third training method, depending on the type or size of the shifts.

Furthermore, by approaching the results from a different perspective, it was clear that, keeping the ML technique and training method fixed, for the Binomial profile the probit link function emerged as the most effective (the smallest out-of-control ARL value) under most shift sizes and types; in certain shift scenarios, the cloglog link function demonstrated superior performance, while the logit function proved to be the optimal choice in a few specific cases. For the Poisson profile, the log link function was the best in almost all the cases. For the Gamma profile, the findings exhibited greater diversity: in the case of the RFR and NN charts, the most effective link function was consistently the log link function, whereas for the SVR chart, the inverse link function predominantly outperformed the others, followed by the identity function and lastly the log function.

It is important to acknowledge that employing an alternative simulation environment may yield different results; in this study, we examined outcomes within a single simulation environment only. To implement the proposed control charts in real practice, one may use the same method as in Section Simulation Studies to find the best combination of training method and ML structure for the response distribution and link function at hand, and then use the best-performing control chart to monitor the process thereafter.

Finally, we presented illustrative examples to show how each ML control chart can be implemented in practice. The first example investigated the effect of a new pharmaceutical product in a dose–response study and the second example considered the number of agglomerates that were ejected from a volcano in successive days.

For future research, since in this paper it was assumed that the regression parameters’ values were known, one can consider performing this research in the case their values are unknown and should be estimated. In addition, different control charts can be developed and the results can be compared with the results of this paper.

Acknowledgements

The authors thank the journal’s editorial board and the reviewers for their constructive comments, which have led to significant improvements in the quality of the paper.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

The data that support the findings of this study are included in the paper.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

References

  • Amiri, A., M. Koosha, and A. Azhdari. 2012. T2 based methods for monitoring Gamma profiles. Proceedings of The International Conference on Industrial Engineering and Operations Management, Istanbul, Turkey, 580–30.
  • Amiri, A., M. Koosha, A. Azhdari, and G. Wang. 2014. Phase I monitoring of generalized linear model-based regression profiles. Journal of Statistical Computation and Simulation 85 (14):2839–59. doi:10.1080/00949655.2014.942864.
  • Amiri, A., M. Mehrjoo, R. Ghajar, and F. Hosseini. 2010. Phase II monitoring of simple linear profiles under decreasing step shifts. Proceedings of The 40th International Conference on Computers & Industrial Engineering, 1–5. doi:10.1109/ICCIE.2010.5668258.
  • Amiri, A., F. Sogandi, and M. Ayoubi. 2018. Simultaneous monitoring of correlated multivariate linear and GLM regression profiles in Phase II. Quality Technology & Quantitative Management 5 (4):435–58. doi:10.1080/16843703.2016.1226706.
  • Apsemidis, A., S. Psarakis, and J. M. Moguerza. 2020. A review of machine learning kernel methods in statistical process monitoring. Computers & Industrial Engineering 142:1–12. doi:10.1016/j.cie.2020.106376.
  • Boser, B. E., I. M. Guyon, and V. N. Vapnik. 1992. A training algorithm for optimal margin classifiers. Proceedings of the fifth annual workshop on Computational learning theory, 144–52. doi:10.1145/130385.130401.
  • Breiman, L. 2001. Random Forests. Machine Learning 45 (1):5–32. doi:10.1023/A:1010933404324.
  • Escobar, C. A., and R. Morales-Mendez. 2018. Machine learning techniques for quality control in high conformance manufacturing environment. Advances in Mechanical Engineering 10 (2):168781401875551. doi:10.1177/1687814018755519.
  • Kang, L., and S. L. Albin. 2000. On-line monitoring when the process yields a linear profile. Journal of Quality Technology 32 (4):418–26. doi:10.1080/00224065.2000.11980027.
  • Kim, K., M. A. Mahmoud, and W. H. Woodall. 2003. On the monitoring of linear profiles. Journal of Quality Technology 35 (3):317–28. doi:10.1080/00224065.2003.11980225.
  • Koosha, M., and A. Amiri. 2011. The effect of link function on the monitoring of logistic regression profiles. Proceedings of the World Congress on Engineering, London, UK, vol. 1, 326–28.
  • Koosha, M., and A. Amiri. 2013. Generalized linear mixed model for monitoring autocorrelated logistic regression profiles. The International Journal of Advanced Manufacturing Technology 64 (1–4):487–95. doi:10.1007/s00170-012-4018-2.
  • Liaw, A., and M. Wiener. 2022. randomForest: Breiman and Cutler's random forests for classification and regression. Accessed March 18, 2024. https://cran.r-project.org/web/packages/randomForest/index.html.
  • Meyer, D. 2023. e1071: Misc functions of the department of statistics, probability theory group (formerly: E1071). TU Wien. Accessed March 18, 2024. https://cran.r-project.org/web/packages/e1071/index.html.
  • Mohammadzadeh, M., A. Yeganeh, and A. Shadman. 2021. Monitoring logistic profiles using variable sample interval approach. Computers & Industrial Engineering 158:1–12. doi:10.1016/j.cie.2021.107438.
  • Mohd Amiruddin, A. A. A., H. Zabiri, S. A. A. Taqvi, and L. D. Tufa. 2020. Neural network applications in fault diagnosis and detection: An overview of implementations in engineering-related systems. Neural Computing and Applications 32 (2):447–72. doi:10.1007/s00521-018-3911-5.
  • Montgomery, D. C. 2012. Introduction to statistical quality control, 7th ed. AZ: John Wiley & Sons, Inc.
  • Myers, R. H., D. C. Montgomery, G. G. Vining, and T. J. Robinson. 2010. Generalized linear models: with applications in engineering and the sciences, 2nd ed. Hoboken, New Jersey: John Wiley & Sons, Inc.
  • Noorossana, R., A. Saghaei, and A. Amiri. 2011. Statistical analysis of profile monitoring. Hoboken, NJ: John Wiley & Sons.
  • Olsson, U. 2002. Generalized linear models: an applied approach. Lund, Sweden: Studentlitteratur.
  • Patterson, D. W. 1998. Artificial neural networks: Theory and applications, 1st ed. New Jersey, United States: Prentice Hall PTR.
  • Qi, D., Z. Wang, X. Zi, and Z. Li. 2016. Phase II monitoring of generalized linear profiles using weighted likelihood ratio charts. Computers & Industrial Engineering 94:178–87. doi:10.1016/j.cie.2016.01.022.
  • Ripley, B., and W. Venables. 2023. Nnet: Feed-forward neural networks and multinomial log-linear models. Accessed March 18, 2024. https://cran.r-project.org/web/packages/nnet/index.html.
  • Sabahno, H., and A. Amiri. 2023. New statistical and machine learning based control charts with variable parameters for monitoring generalized linear model profiles. Computers & Industrial Engineering 184:1–18. doi:10.1016/j.cie.2023.109562.
  • Sabahno, H., and S. T. A. Niaki. 2023. New machine-learning control charts for simultaneous monitoring of multivariate normal process parameters with detection and identification. Mathematics 11 (16):1–31. doi:10.3390/math11163566.
  • Shadman, A., C. Zou, H. Mahlooji, and A. B. Yeh. 2014. A change point method for Phase II monitoring of generalized linear profiles. Communications in Statistics - Simulation and Computation 46 (1):559–78. doi:10.1080/03610918.2014.970698.
  • Sharafi, A., M. Aminnayeri, and A. Amiri. 2013. An MLE approach for estimating the time of step changes in Poisson regression profiles. Scientia Iranica 20:855–60. doi:10.1016/j.scient.2012.10.043.
  • Sogandi, F., and A. Amiri. 2015. Estimating the time of a step change in gamma regression profiles using MLE approach. International Journal of Engineering 28 (2):224–233. doi:10.5829/idosi.ije.2015.28.02b.08.
  • Sogandi, F., and A. Amiri. 2017. Monotonic change point estimation of generalized linear model-based regression profiles. Communications in Statistics - Simulation and Computation 46 (3):2207–27. doi:10.1080/03610918.2015.1039132.
  • Soleimani, P., R. Noorossana, and A. Amiri. 2009. Simple linear profiles monitoring in the presence of within profile autocorrelation. Computers & Industrial Engineering 57 (3):1015–21. doi:10.1016/j.cie.2009.04.005.
  • Stoean, C., and R. Stoean. 2014. Support vector machines and evolutionary algorithms for classification: Single or together. Cham, Switzerland: Springer Publishing Company, Incorporated.
  • Yeganeh, A., S. A. Abbasi, F. Pourpanah, A. Shadman, A. Johannssen, and N. Chukhrova. 2022. An ensemble neural network framework for improving the detection ability of a base control chart in non-parametric profile monitoring. Expert Systems with Applications 204:1–18. doi:10.1016/j.eswa.2022.117572.
  • Yeganeh, A., S. A. Abbasi, S. C. Shongwe, J.-C. Malela-Majika, and A. Shadman. 2023. Evolutionary support vector regression for monitoring Poisson profiles. Soft Computing 28 (6):4873–97. doi:10.1007/s00500-023-09047-2.
  • Yeganeh, A., N. Chukhrova, A. Johannssen, and H. Fotuhi. 2023. A network surveillance approach using machine learning based control charts. Expert Systems with Applications 219:1–17. doi:10.1016/j.eswa.2023.119660.
  • Yeganeh, A., A. Johannssen, N. Chukhrova, S. A. Abbasi, and F. Pourpanah. 2023. Employing machine learning techniques in monitoring autocorrelated profiles. Neural Computing & Applications 35 (22):16321–40. doi:10.1007/s00521-023-08483-3.
  • Yeganeh, A., A. Johannssen, N. Chukhrova, M. Erfanian, M. R. Azarpazhooh, and N. Morovatdar. 2023. A monitoring framework for health care processes using generalized additive models and auto-encoders. Artificial Intelligence in Medicine 146:102689. doi:10.1016/j.artmed.2023.102689.
  • Yeganeh, A., A. Johannssen, N. Chukhrova, and M. Rasouli. 2024. Monitoring multistage healthcare processes using state space models and a machine learning based framework. Artificial Intelligence in Medicine 151:102826. doi:10.1016/j.artmed.2024.102826.
  • Yeganeh, A., F. Pourpanah, and A. Shadman. 2021. An ANN-based ensemble model for change point estimation in control charts. Applied Soft Computing 110:1–19. doi:10.1016/j.asoc.2021.107604.
  • Zou, C., Y. Zhang, and Z. Wang. 2006. A control chart based on a change-point model for monitoring linear profiles. IIE Transactions 38 (12):1093–103. doi:10.1080/07408170600728913.