
A regularised fast recursive algorithm for fraction model identification of nonlinear dynamic systems

Pages 1616-1638 | Received 18 Apr 2022, Accepted 04 Mar 2023, Published online: 18 Mar 2023

Abstract

The fraction model has been widely used to represent a range of engineering systems. Accurately identifying the fraction model is however challenging, and this paper presents a regularised fast recursive algorithm (RFRA) to identify both the true fraction model structure and the associated unknown model parameters. This is achieved first by transforming the fraction form into a linear combination of nonlinear model terms. The terms in the denominator are then used to form a regularisation term in the cost function to offset the bias induced by the linear transformation. According to the structural risk minimisation principle based on the new cost function, the model terms are selected based on their contributions to the cost function, and the coefficients are then identified recursively without explicitly computing the matrix inverse. The proposed method is shown to have low computational complexity. Simulation results confirm the efficacy of the method in fast identification of the true fraction models for the targeted nonlinear systems.

1. Introduction

Machine learning is a powerful tool for technological progress and has been widely adopted for approximating nonlinear dynamic systems (Yildiz et al., 2017). Nonlinear system modelling has been a subject of intensive research in the past decades, and various machine learning regression models (Singh & Dhiman, 2021) have been proposed, such as polynomial nonlinear models (Paduart et al., 2010; Wang et al., 2021; C. Yang et al., 2020), rational models (Austin et al., 2021; Zhu et al., 2015, 2019), support vector machine (SVM) based regression models (Liang & Zhang, 2021; Suykens et al., 2002) and neural network models (Du et al., 2022; Qiao et al., 2018; Z. Yang et al., 2020), to name just a few.

Among them, polynomial nonlinear regression models are the most popular machine learning regression methods owing to their simplicity and their ability to capture complex relationships between system inputs and outputs (Gambella et al., 2021). In particular, the rational model, which is a ratio of two polynomials, has attracted considerable attention in the community, as many engineering systems can be modelled as the ratio of two linear-in-the-parameter equations (Andreev & Borulko, 2017; Curtain & Morris, 2009; Hussein & Batarseh, 2011; Kambhampati et al., 2000; Swarnakar et al., 2017; Zhu et al., 2019), each consisting of some base functions. Such models are often referred to as fraction models, and rational models, in which both the numerator and the denominator consist of polynomial terms, are a special case (Li, 2005). The fraction model is capable of concisely approximating a wide range of nonlinear systems, whereas traditional models such as nonlinear polynomial ARX models, neural networks and support vector machines (SVM) (J. Zhang et al., 2012) may become overly complex in order to achieve the same level of modelling accuracy.

Fitting a fraction model to training data is a machine learning regression problem. To establish an accurate fraction model, an efficient automatic learning method is the key to determining the model structure and identifying the unknown parameters. This is however technically challenging, and for rational models, as a special case, the solutions fall into two major categories. The first is to transform the rational form into a linear-in-the-parameter expression (Billings & Zhu, 1991, 1994). Based on the transformed system, various algorithms have been developed to identify a suitable model structure and, at the same time, to estimate the unknown linear coefficients (Billings & Mao, 1998; Billings & Zhu, 1991; Zhu, 2003, 2005; Zhu & Billings, 1996; Zhu et al., 2017). These algorithms all share the same aim of iteratively offsetting the bias by updating the coefficients and the regressors based on the variance of the estimation errors. The models identified by these well-established methods are often less complex, though the methods still suffer from high computational complexity. The other approach is to directly select the base functions using orthogonal methods, such as orthogonal rational model estimation (Billings & Zhu, 1994) and the Stieltjes process (Austin et al., 2021).

In particular, the popular orthogonal least squares (OLS) method has been widely used for identifying a range of nonlinear models including rational models (Zhu & Billings, 1996), although the conventional OLS scheme remains computationally expensive, particularly for backward model reduction. To reduce the computational complexity, Li et al. proposed a fast recursive algorithm (FRA) (Li et al., 2005). The FRA framework has further led to the development of a series of effective algorithms for nonlinear system modelling, such as the two-stage stepwise algorithm (Li et al., 2006), the two-stage OLS method (L. Zhang et al., 2015), the least squares support vector machine (LSSVM) (Zhao et al., 2015), as well as methods to construct compact RBF networks (Deng et al., 2010, 2012, 2013). This paper uses a similar algorithmic framework to build fraction models for nonlinear systems, aiming both to develop a compact model and to reduce the computational complexity.

A challenge in identifying the fraction model is to eliminate the bias caused by the errors introduced by the transformation. To tackle this challenge, various methods have been proposed to estimate the errors (J. Chen et al., 2018; Jin & Weng, 2019; Manela & Biess, 2021; Piga & Tóth, 2014). In order to penalise the bias induced by the linear transformation, this paper adopts a negatively weighted regularisation term to penalise the errors introduced into the transformed linear model. In essence, the regularisation term is used to balance the linear model estimation error and the bias of the estimated parameters due to the linear transformation (Ohlsson et al., 2010). The regularisation method proposed by Tikhonov (Tikhonov & Glasko, 1965) and its variants (Luo et al., 2016; Münker & Nelles, 2016; L. Zhang et al., 2019) have been widely used in machine learning to reduce the bias introduced into parameter estimation. Popular regularisation methods include ℓ0, ℓ1 and ℓ2 regularisation (Castella et al., 2018; Luo et al., 2016; Senhaji et al., 2020; Tao et al., 2015; Torres-Barrán et al., 2018), where the weight of the regularisation term is usually set positive to avoid over-fitting, whereas in this paper a negative weight is applied to the penalty term to reduce the bias introduced by the linear transformation. However, the weight has to be sufficiently small in magnitude to ensure that the contributions to the cost function remain positive.

Within the regularisation framework, choosing a suitable regularisation term is critical, and a number of kernel functions have been used to define the regularisation term (Birpoutsoukis et al., 2017; T. Chen et al., 2014; Pillonetto, 2018). The kernel method, which performs a nonlinear data transformation, has been widely applied in system identification, machine learning and function estimation (Bai et al., 2014; T. Chen et al., 2014; Hofmann et al., 2008; Pillonetto & De Nicolao, 2010; Pillonetto et al., 2014), with reproducing kernel Hilbert spaces (RKHSs) (Pillonetto et al., 2014; Valencia & Álvarez, 2017) being a popular example. Similarly, in this paper, the inner product of the nonlinear mapping terms in the denominator is added to the cost function with the aim of reducing the bias caused by the linear transformation. The effectiveness of this treatment is assessed in the simulation section.

Given the quadratic cost function as a trade-off between the linear model error and the parameter bias caused by the linear transformation, a regularised fast recursive algorithm is proposed in this paper to solve the optimisation problem. The main contributions of this paper are summarised as follows:

  • A duality formulation is introduced to translate the fraction model identification problem into a quadratic linear regression problem. Based on the structural risk minimisation principle, the dual linear regression problem is minimised by a variant of the least squares solution. In essence, identification of a fraction model is thus translated into solving a linear regression problem, thereby significantly reducing the computational complexity.

  • To eliminate the bias induced by the linear-in-the-parameters transformation and to recover the true fraction model rather than identify a parsimonious model, a regularisation term is added to the cost function. The optimisation problem can be easily solved by a regularised least-squares method.

  • To automatically identify both the fraction model coefficients and the fraction model structure, a regularised fast recursive algorithm (RFRA) is proposed based on the original fast recursive algorithm (Li et al., 2005). Unlike the locally regularised automatic construction (LRAC) method (Du et al., 2012), the regularisation matrix does not have to be diagonal in the proposed regularised fast recursive method. In order to efficiently evaluate the candidate base functions, a recursive formulation of the net contribution is developed. By defining a set of recursive matrices, the proposed RFRA method simplifies both the calculation of the net contribution of model terms and the computation of the least squares equations for the coefficients, so that the fraction model structure can be readily determined from the net contributions.

  • Finally, the complexity of the proposed method is analysed and compared with the well-known orthogonal method proposed in Zhu and Billings (1996) to confirm its efficiency.

The remainder of the paper is organised as follows. The fraction model identification problem is formulated in Section 2. Section 3 gives some necessary preliminaries for the proposed algorithm. Then a regularised fast method is detailed in Section 4, covering the principle of the new method, fast computation of the net contribution of model terms, fast computation of model coefficients, and computational complexity analysis. Simulation case studies are presented in Section 5. Finally, Section 6 concludes this paper.

2. Problem formulation

2.1. Fraction model

Considering a multiple-input single-output system, the fraction model can be expressed as the ratio of two linear-in-the-parameter models:

(1) $y(t)=\dfrac{a(t)}{c(t)}+e(t)=\dfrac{\sum_{i=1}^{r}p_{ni}(x(t))\,\theta_{ni}}{\sum_{j=1}^{h}p_{dj}(x(t))\,\theta_{dj}}+e(t)$

where $y(t)$, $x(t)=[x_1(t),\ldots,x_q(t)]$ and $e(t)$ are the model output, the $q$-dimensional model input vector and the model error at time instant $t$, respectively. $p_{ni}$ represents the $i$th nonlinear base mapping term of the numerator and $p_{dj}$ represents the $j$th nonlinear base mapping term of the denominator. $r$ and $h$ denote the numbers of nonlinear terms in the numerator and in the denominator of the fraction model, respectively. Also, $\theta_{ni}$ and $\theta_{dj}$ represent the coefficients of the $i$th regressor in the numerator and the $j$th regressor in the denominator, respectively.

To identify the fraction model from the training data, the system (1) is first expressed in matrix form. Then the cost function of the corresponding dual quadratic linear regression problem can be calculated and further simplified. Suppose $N$ data samples $\{x(t),y(t)\}_{t=1}^{N}$ are used for model identification; Equation (1) can then be formulated as

(2) $y=\dfrac{a}{c}+e=\dfrac{P_n\Theta_n}{P_d\Theta_d}+e$

where the division is elementwise, $y=[y(1),\ldots,y(N)]^T$ is the model output vector, $a=[a(1),\ldots,a(N)]^T$ is the numerator sample vector and $c=[c(1),\ldots,c(N)]^T$ is the denominator sample vector, so that the components of $a/c$ are $(a/c)_l=a(l)/c(l)$, $1\le l\le N$. $e=[e(1),\ldots,e(N)]^T$ is the model error vector. $P_n=[p_{n1},\ldots,p_{nr}]\in\mathbb{R}^{N\times r}$ is the selected nonlinear mapping matrix for the numerator with $p_{ni}=[p_{ni}(x(1)),\ldots,p_{ni}(x(N))]^T$, $i=1,\ldots,r$, and $\Theta_n=[\theta_{n1},\ldots,\theta_{nr}]^T\in\mathbb{R}^{r\times 1}$ is the coefficient vector for the numerator. $P_d=[p_{d1},\ldots,p_{dh}]\in\mathbb{R}^{N\times h}$ is the selected nonlinear mapping matrix for the denominator with $p_{dj}=[p_{dj}(x(1)),\ldots,p_{dj}(x(N))]^T$, $j=1,\ldots,h$, and $\Theta_d=[\theta_{d1},\ldots,\theta_{dh}]^T\in\mathbb{R}^{h\times 1}$ is the coefficient vector for the denominator.

2.2. Transformation to linear-in-the-parameter formulation

Fraction models are not naturally linear in the parameters, and their identification therefore presents a challenge. To identify fraction models, an elementwise product of two vectors, namely the Hadamard product denoted by $\odot$, is first defined (Nielsen, 2019). Suppose $\alpha$ and $\beta$ are two vectors of the same dimension; then the $l$th component of $\alpha\odot\beta$ is $(\alpha\odot\beta)_l=\alpha(l)\beta(l)$.

To obtain the matrix representation of the corresponding linear system in the least squares sense, both sides of Equation (2) are Hadamard multiplied by $c\in\mathbb{R}^{N\times 1}$, thus:

(3) $y\odot c=\dfrac{a}{c}\odot c+e\odot c=a+e\odot c=P_n\Theta_n+e\odot c=y\odot(P_d\Theta_d)$

Expanding $P_d\Theta_d$ and $P_n\Theta_n$ in (3) gives

(4) $y\odot(p_{d1}\theta_{d1}+p_{d2}\theta_{d2}+\cdots+p_{dh}\theta_{dh})=(p_{n1}\theta_{n1}+p_{n2}\theta_{n2}+\cdots+p_{nr}\theta_{nr})+e\odot c$

Thereby,

(5) $y\odot p_{d1}\theta_{d1}+y\odot p_{d2}\theta_{d2}+\cdots+y\odot p_{dh}\theta_{dh}=(p_{n1}\theta_{n1}+p_{n2}\theta_{n2}+\cdots+p_{nr}\theta_{nr})+e\odot c$

Then all the terms except $y\odot p_{d1}\theta_{d1}$ are moved to the right-hand side, resulting in a linear-in-the-parameter model

(6) $y\odot p_{d1}\theta_{d1}=(p_{n1}\theta_{n1}+p_{n2}\theta_{n2}+\cdots+p_{nr}\theta_{nr})-y\odot p_{d2}\theta_{d2}-\cdots-y\odot p_{dh}\theta_{dh}+e\odot c=[p_{n1},p_{n2},\ldots,p_{nr}]\,[\theta_{n1},\theta_{n2},\ldots,\theta_{nr}]^T-[y\odot p_{d2},\ldots,y\odot p_{dh}]\,[\theta_{d2},\ldots,\theta_{dh}]^T+e\odot c=[p_{n1},\ldots,p_{nr},-y\odot p_{d2},\ldots,-y\odot p_{dh}]\,[\theta_{n1},\ldots,\theta_{nr},\theta_{d2},\ldots,\theta_{dh}]^T+e\odot c$

By defining $Y=y\odot p_{d1}\theta_{d1}$, $\Phi=[p_{n1},\ldots,p_{nr},-y\odot p_{d2},\ldots,-y\odot p_{dh}]$ and $\Theta=[\theta_{n1},\ldots,\theta_{nr},\theta_{d2},\ldots,\theta_{dh}]^T$, the system can be reformulated as follows:

(7) $Y=\Phi\Theta+c\odot e=\Phi\Theta+\Xi$

where $\theta_{d1}$ generally takes the value of 1, $p_{d1}=\mathbf{1}_{N\times 1}$, $Y=[y(1)p_{d1}(1),\ldots,y(N)p_{d1}(N)]^T=[y(1),\ldots,y(N)]^T$ is the output vector of the new system, $\Theta=[\theta_{n1},\ldots,\theta_{nr},\theta_{d2},\ldots,\theta_{dh}]^T$ is the linear parameter vector of the new system, $\Xi=[c(1)e(1),\ldots,c(N)e(N)]^T$ is the coloured noise of the new system, and $\Phi=[\phi_1,\ldots,\phi_M]=[p_{n1},\ldots,p_{nr},-y\odot p_{d2},\ldots,-y\odot p_{dh}]$ contains the $M$ regressors for inclusion in the model, where $M=r+h-1$.
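
A small sketch (our own, not from the paper) of how the transformed system (7) can be assembled in code from the two mapping matrices of Equation (2); it assumes $P_n$ and $P_d$ have already been evaluated on the data and that the first denominator column is the constant term $p_{d1}=\mathbf{1}$, following the sign convention used in Equation (6) above.

```python
# Assemble Y and Phi of Equation (7) from y, Pn (N x r) and Pd (N x h).
import numpy as np

def transform(y, Pn, Pd):
    """Return Y, Phi with Y = Phi @ Theta + c*e, Theta = [theta_n; theta_d2..theta_dh]."""
    Y = y * Pd[:, 0]                                  # y ⊙ p_d1 (equals y when p_d1 is all ones)
    Phi = np.hstack([Pn, -y[:, None] * Pd[:, 1:]])    # [p_n1..p_nr, -y⊙p_d2 .. -y⊙p_dh]
    return Y, Phi

# With r numerator terms and h denominator terms, Phi has M = r + h - 1 columns.
```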

2.3. Model identification challenges

By transforming the fraction model into the linear-in-the-parameter form, the model identification is simplified to a linear regression problem. The least squares method and maximum likelihood estimation are two well-established approaches. Here the least squares (LS) method is applied to the model equation (7), given that the distribution of the training samples is unknown a priori.

For the new system formulated in Equation (7), the input matrix and output vector of the model are $[x_1,\ldots,x_q,y]\in\mathbb{R}^{N\times(q+1)}$ and $Y\in\mathbb{R}^{N\times 1}$, respectively. Then $m$ candidate nonlinear features $\varphi_i(x_1,\ldots,x_q,y)$, $i=1,\ldots,m$, are generated if the true system is unknown a priori. Generally speaking, if the true model is unknown, various parsimonious nonlinear models such as polynomial models and other linear-in-the-parameter models can be built to achieve equivalent modelling performance under certain conditions (Li & Peng, 2007). Thus a polynomial is often used to approximate a range of nonlinear systems, and to this end $\varphi_i$ can be defined as a polynomial term. If the system model is partially known a priori, then fundamental knowledge about the engineering system can be exploited (Li, 2005), based on which fundamental functions $\varphi_i$ can be identified, such as exponential, fractional-order polynomial, trigonometric or rational functions. The resultant model structure is illustrated in Figure 1.

Figure 1. The structure of the new linear-in-parameters system.


In statistics, there exist biased parameter estimators that dominate any unbiased parameter estimator in terms of the mean squared error (MSE) (Hong et al., 2008); therefore there is always a need to balance the variance and the bias when using the least squares method. In the LS method, the linear coefficients $\Theta$ can be estimated by

(8) $\hat{\Theta}=(\Phi^T\Phi)^{-1}\Phi^TY$

In theory, for a linear-in-the-parameter system subject to white noise, the least squares solution is unbiased, i.e. $E[\hat{\Theta}]$ equals the true value $\Theta$, if the sample number is sufficiently large. However, for the system in Equation (7), the linear coefficients in Equation (8) can be written as

(9) $\hat{\Theta}=(\Phi^T\Phi)^{-1}\Phi^T(\Phi\Theta+\Xi)=\Theta+(\Phi^T\Phi)^{-1}\Phi^T\Xi$

It is clear that the noise $\Xi$ in Equation (9) comes from $c\odot e$ defined in Equation (7), and is therefore a coloured noise. From Figure 1, $\Phi$ is correlated with $e$ because the regressors contain the output $y$, which carries the information of $e$. Therefore, the expectation of the estimated coefficients is $E[\hat{\Theta}]=\Theta+E[(\Phi^T\Phi)^{-1}\Phi^T\Xi]$ rather than $\Theta$. Thus, the least squares method applied to system (7) yields a biased solution, and it is necessary to eliminate the bias in order to improve the generalisation performance of a parsimonious model or to identify the true system model from a candidate term pool that contains the true model terms.

From Equation (Equation9), it is clear that the bias is related to the selected terms of the denominator and e. Accordingly, an improved method is proposed in the following to eliminate the bias.
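
The following Monte-Carlo sketch (our own, using a hypothetical first-order fraction model and made-up parameter values) illustrates the bias of Equation (9): the least squares estimate of the denominator coefficient drifts away from its true value because the corresponding regressor contains the output and is therefore correlated with the coloured noise.

```python
# Bias of ordinary LS on the transformed model (7), illustrated on the toy
# system y(t) = th1*u(t-1) / (1 + th2*y(t-1)^2) + e(t)   (hypothetical example).
import numpy as np

rng = np.random.default_rng(0)
th1, th2, N, runs = 0.8, 0.5, 1000, 200
est = np.zeros((runs, 2))
for r in range(runs):
    u = rng.uniform(-1, 1, N)
    e = rng.normal(0, 0.1, N)
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = th1 * u[t-1] / (1 + th2 * y[t-1]**2) + e[t]
    Y = y[1:]                                             # transformed output
    Phi = np.column_stack([u[:-1], -y[1:] * y[:-1]**2])   # [u(t-1), -y(t)*y(t-1)^2]
    est[r] = np.linalg.lstsq(Phi, Y, rcond=None)[0]
print("true:", [th1, th2], "mean LS estimate:", est.mean(axis=0))
# The second coefficient is systematically biased: its regressor contains y(t),
# which carries the noise e(t), so Phi is correlated with the coloured noise c*e.
```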

3. Preliminaries

Through the linear transformation, noise has been introduced into both the inputs and the output of the new system (2). Furthermore, the output error is a coloured noise, namely the response excited by the white noise of the original system. Given the coloured noise, the conventional recursive least squares method leads to a biased solution. To achieve unbiased coefficient estimation, a process similar to the reproducing kernel Hilbert space (RKHS) method (Pillonetto, 2018), which uses the inner products of the denominator terms, is employed to mitigate the contribution of the bias to the cost function. By adding a regularisation term with a negative weight, it is possible to eliminate the bias introduced into the estimation of the original system.

3.1. The original cost function

Lemma 3.1

Suppose $e=[e(1),\ldots,e(N)]^T$ obeys a Gaussian distribution with zero mean. Given sufficiently many data samples, the Hadamard product then satisfies:

(10) $(c\odot e)^T(c\odot e)=\Xi^T\Xi=\dfrac{1}{N}\,c^Tc\,e^Te$

The proof is given as follows.

According to the rules of matrix multiplication, the right-hand side of Equation (10) can be expanded as

(11) $c^Tc\,e^Te=\sum_{i=1}^{N}c^2(i)\sum_{j=1}^{N}e^2(j)=\big(c^2(1)+\cdots+c^2(N)\big)\sum_{j=1}^{N}e^2(j)=c^2(1)\sum_{j=1}^{N}e^2(j)+\cdots+c^2(N)\sum_{j=1}^{N}e^2(j)=(c\odot e)^T(c\odot e)+c^2(1)\sum_{j=2}^{N}e^2(j)+\cdots+c^2(i)\sum_{j=1,\,j\neq i}^{N}e^2(j)+\cdots+c^2(N)\sum_{j=1}^{N-1}e^2(j)$

If the system error $e$ is a white noise with variance $\sigma_e^2$ and there are sufficient data samples, Equation (11) can be reformulated as

(12) $c^Tc\,e^Te=\Xi^T\Xi+c^2(1)(N-1)\sigma_e^2+\cdots+c^2(i)(N-1)\sigma_e^2+\cdots+c^2(N)(N-1)\sigma_e^2=\Xi^T\Xi+\sum_{i=1}^{N}c^2(i)(N-1)\sigma_e^2=\Xi^T\Xi+\sum_{i=1}^{N}c^2(i)\,\dfrac{N-1}{N}\,e^Te=\Xi^T\Xi+\dfrac{N-1}{N}\,c^Tc\,e^Te$

Accordingly, Equation (10) is true.
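
A quick numerical check (our own, not from the paper) of Lemma 3.1: for zero-mean white noise $e$ and any fixed vector $c$, the identity holds to good approximation once $N$ is large.

```python
# Numerical check of (c⊙e)'(c⊙e) ≈ c'c · e'e / N for zero-mean white noise e.
import numpy as np

rng = np.random.default_rng(1)
N = 100_000
c = rng.uniform(0.5, 2.0, N)             # an arbitrary "denominator" sample vector
e = rng.normal(0.0, 0.1, N)              # zero-mean white noise
lhs = np.sum((c * e) ** 2)               # Xi' Xi with Xi = c ⊙ e
rhs = np.dot(c, c) * np.dot(e, e) / N    # c'c e'e / N
print(lhs, rhs, abs(lhs - rhs) / rhs)    # the relative gap shrinks as N grows
```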

Considering the coloured noise $\Xi$ in Equation (7), the sum of squared coloured noises can therefore be calculated as follows:

(13) $\Xi^T\Xi=\dfrac{c^Tc\,e^Te}{N}$

where $\Xi=Y-\Phi\Theta$ and $e=y-a/c$ (elementwise division) are the linear-in-the-parameter system error vector and the original system error vector, respectively. To estimate the coefficients $\Theta$, the denominator $c$ of the original system in Equation (2) can be written as follows:

(14) $c=\Psi\Theta+E$

where $\Psi\in\mathbb{R}^{N\times M}$ with $M=r+h-1$ and $E$ together represent the nonlinear mapping matrix $P_d$ in Equation (2). $M$ represents the number of regressors for inclusion in the model. $E=\mathbf{1}_{N\times 1}$ of size $N\times 1$ represents the constant term ($p_{d1}$) of the denominator, which remains on the left-hand side. $\Psi=[\psi_1,\ldots,\psi_M]=[\mathbf{0},\ldots,\mathbf{0},p_{d2},\ldots,p_{dh}]$ is augmented from the denominator terms $[p_{d2},\ldots,p_{dh}]$ that were moved to the right-hand side; hence they are referred to as augmented terms in the following. That is,

$\psi_\rho=\begin{cases}\mathbf{0}, & \rho=1,\ldots,r\\ p_{dj}, & \rho=r+1,\ldots,M;\ j=2,\ldots,h\end{cases}$

Notice from Equation (13) that the cost function based on the sum of squared errors of the fraction system can be computed by:

(15) $J_o=e^Te=\dfrac{N\,\Xi^T\Xi}{c^Tc}$

3.2. The dual solution

To minimise the modelling error, Equation (15) is often used as the criterion. However, this cost function is nonlinear with respect to the model coefficients. Solving a linear optimisation problem instead is a well-known approach for many nonlinear problems; for example, the least squares SVM is identified by solving linear equations in the dual instead of solving a quadratic programming problem with inequality constraints in the primal (Mall & Suykens, 2015). Thus, instead of using a complex nonlinear derivative-based method, and similar to Pillonetto et al. (2014), a linear dual problem that combines the original linear cost function with the inner product of the denominator sample vector is proposed as the cost function, as expressed in Proposition 3.1.

Proposition 3.1

Subject to the existence of a $\lambda$ satisfying $\min(\Xi^T\Xi+\lambda c^Tc)=0$, minimising the cost function (16) for the transformed model (7) is equivalent to minimising (15) for the original model (2).

(16) $J=\Xi^T\Xi+\lambda c^Tc=(Y-\Phi\Theta)^T(Y-\Phi\Theta)+\lambda(\Psi\Theta+E)^T(\Psi\Theta+E)=Y^TY-\Theta^T\Phi^TY-Y^T\Phi\Theta+\Theta^T\Phi^T\Phi\Theta+\lambda\Theta^T\Psi^T\Psi\Theta+\lambda\Theta^T\Psi^TE+\lambda E^T\Psi\Theta+\lambda E^TE$

where $\Xi^T\Xi$ is used to evaluate the estimation error of the transformed linear-in-the-parameter system in Equation (7). From the expansion of Equation (16), $\lambda\Theta^T\Psi^T\Psi\Theta+\lambda\Theta^T\Psi^TE+\lambda E^T\Psi\Theta$ involves the regularisation parameter $\lambda$ together with ℓ1- and ℓ2-norm type terms of the estimated coefficients $\Theta$. Thus, $\lambda\Psi^T\Psi+\lambda\Psi^TE+\lambda E^T\Psi$ with the regularisation parameter $\lambda$ is used as the regularisation penalty term. Meanwhile, $\min(\Xi^T\Xi+\lambda c^Tc)=0$ states that the minimum value of the cost function (16), i.e. the optimal cost function, is zero.

The proof of the Proposition 3.1 is given below.

Considering the new cost function, the minimum is attained when the partial derivative of the cost function with respect to the coefficients equals zero. Thus, the coefficients can be estimated using:

(17) $\dfrac{\partial J}{\partial\Theta}=-2\Phi^TY+2\Phi^T\Phi\Theta+2\lambda\Psi^T\Psi\Theta+2\lambda\Psi^TE$

Setting Equation (17) equal to zero, the optimal coefficients are given as

(18) $\hat{\Theta}=(\Phi^T\Phi+\lambda\Psi^T\Psi)^{-1}(\Phi^TY-\lambda\Psi^TE)$

Similarly, for the nonlinear original cost function in Equation (15), its partial derivative with respect to the coefficients is calculated as

(19) $\dfrac{\partial J_o}{\partial\Theta}=\dfrac{\partial}{\partial\Theta}\left(\dfrac{N(Y-\Phi\Theta)^T(Y-\Phi\Theta)}{(\Psi\Theta+E)^T(\Psi\Theta+E)}\right)=N\left(\dfrac{(-2\Phi^TY+2\Phi^T\Phi\Theta)\,(\Psi\Theta+E)^T(\Psi\Theta+E)}{\big((\Psi\Theta+E)^T(\Psi\Theta+E)\big)^2}-\dfrac{(Y-\Phi\Theta)^T(Y-\Phi\Theta)\,\big(2\lambda\Psi^T(\Psi\Theta+E)\big)}{\big((\Psi\Theta+E)^T(\Psi\Theta+E)\big)^2}\right)$

To evaluate the partial derivative in Equation (19) at the optimal coefficients, $2\lambda\Psi^T\Psi\Theta+2\lambda\Psi^TE$ is added to the first part (and subtracted again), thus

(20) $\dfrac{\partial J_o}{\partial\Theta}=N\left(\dfrac{(-2\Phi^TY+2\Phi^T\Phi\Theta+2\lambda\Psi^T\Psi\Theta+2\lambda\Psi^TE)\,(\Psi\Theta+E)^T(\Psi\Theta+E)}{\big((\Psi\Theta+E)^T(\Psi\Theta+E)\big)^2}-\dfrac{(Y-\Phi\Theta)^T(Y-\Phi\Theta)\,\big(2\lambda\Psi^T(\Psi\Theta+E)\big)}{\big((\Psi\Theta+E)^T(\Psi\Theta+E)\big)^2}-\dfrac{(2\lambda\Psi^T\Psi\Theta+2\lambda\Psi^TE)\,(\Psi\Theta+E)^T(\Psi\Theta+E)}{\big((\Psi\Theta+E)^T(\Psi\Theta+E)\big)^2}\right)$

Obviously, the first part of the numerator in Equation (20) is the partial derivative in Equation (17). Substituting Equation (18) into Equation (20), the first part of Equation (20) becomes zero, and after the factor $N$ is left out it can be rewritten as

(21) $\dfrac{\partial J_o}{\partial\Theta}=\dfrac{-(Y-\Phi\hat{\Theta})^T(Y-\Phi\hat{\Theta})\,\big(2\lambda\Psi^T(\Psi\hat{\Theta}+E)\big)-(2\lambda\Psi^T\Psi\hat{\Theta}+2\lambda\Psi^TE)\,(\Psi\hat{\Theta}+E)^T(\Psi\hat{\Theta}+E)}{\big((\Psi\hat{\Theta}+E)^T(\Psi\hat{\Theta}+E)\big)^2}$

Since both $(Y-\Phi\hat{\Theta})^T(Y-\Phi\hat{\Theta})$ and $(\Psi\hat{\Theta}+E)^T(\Psi\hat{\Theta}+E)$ are scalars, extracting the common factor $2\lambda\Psi^T(\Psi\hat{\Theta}+E)$ after changing its position, (21) can be expressed as

(22) $\dfrac{\partial J_o}{\partial\Theta}=\dfrac{-\big(2\lambda\Psi^T(\Psi\hat{\Theta}+E)\big)\,\min(\Xi^T\Xi+\lambda c^Tc)}{\big((\Psi\hat{\Theta}+E)^T(\Psi\hat{\Theta}+E)\big)^2}=\dfrac{-\big(2\lambda\Psi^T(\Psi\hat{\Theta}+E)\big)\,\min(J)}{\big((\Psi\hat{\Theta}+E)^T(\Psi\hat{\Theta}+E)\big)^2}$

Now setting Equation (22) equal to zero yields

(23) $\min(\Xi^T\Xi+\lambda c^Tc)=0$

The above derivations show that if there exists a negative number $\lambda$ which makes the dual problem (16) of system (7) equal to zero at the optimal coefficients, then the minimum of $J$ is zero and Equations (22) and (23) are satisfied. Thus, $\lambda$ should be optimised to ensure that its absolute value is sufficiently small such that $\Xi^T\Xi+\lambda c^Tc\ge 0$, and our study shows that the order of its magnitude is generally no greater than 4. This shows that the identification of the fraction system is equivalent to optimising the proposed dual problem, and the proposed dual problem (16) can be solved based on regularised least squares. It should be noted that the regularisation penalty term ($\lambda\Psi^T\Psi+\lambda\Psi^TE+\lambda E^T\Psi$) has only three parts, and these are related to only part of the model coefficients. This is different from the conventional regularisation problem where the penalty is applied to all model coefficients. To reduce the computational complexity of the regularised least squares, this paper proposes to simplify the general regularised least squares according to the FRA framework.
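
As a concrete illustration of the closed-form solution (18), the sketch below (our own, on hypothetical toy data rather than a fraction model) computes the regularised estimate next to the ordinary LS estimate of Equation (8); a small negative λ is used, as required above.

```python
# Regularised least squares of Equation (18) vs ordinary LS of Equation (8).
import numpy as np

def regularised_ls(Phi, Y, Psi, E, lam):
    """Theta_hat = (Phi'Phi + lam*Psi'Psi)^(-1) (Phi'Y - lam*Psi'E)."""
    A = Phi.T @ Phi + lam * (Psi.T @ Psi)
    b = Phi.T @ Y - lam * (Psi.T @ E)
    return np.linalg.solve(A, b)          # solve a linear system, no explicit inverse

rng = np.random.default_rng(0)
N = 500
Phi = rng.normal(size=(N, 3))
Psi = np.zeros((N, 3)); Psi[:, 2] = rng.normal(size=N)   # only the augmented term is penalised
E = np.ones(N)
Y = Phi @ np.array([1.0, -0.5, 0.3]) + 0.01 * rng.normal(size=N)

theta_ls  = np.linalg.lstsq(Phi, Y, rcond=None)[0]        # Equation (8)
theta_rls = regularised_ls(Phi, Y, Psi, E, lam=-1e-4)      # Equation (18)
print(theta_ls, theta_rls)
```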

4. Regularised fast recursive algorithm

According to Section 3, the fraction system is identifiable using the proposed primal–dual solution. To solve the transformed quadratic cost function, least squares methods are the most popular. In particular, the orthogonal least squares (OLS) method (Zhu & Billings, 1996) and the fast recursive algorithm (FRA) (Li et al., 2005, 2006) can be used not only for parameter estimation but also for model structure selection. Among them, the FRA method solves the least squares problem recursively without requiring matrix decomposition, and it further allows both fast forward and backward model identification. In the following, a regularised fast recursive method is presented.

4.1. The principle of regularised fast recursive method

4.1.1. Model term selection based on the regularised cost function

Referring to Li et al. (2005), a fast calculation of the net contribution of each term added to the model is the key. As discussed earlier, the cost function can be defined as

(24) $J=Y^TY-\Theta^T\Phi^TY-Y^T\Phi\Theta+\Theta^T\Phi^T\Phi\Theta+\lambda\Theta^T\Psi^T\Psi\Theta+\lambda\Theta^T\Psi^TE+\lambda E^T\Psi\Theta+\lambda E^TE$

Substituting the optimal coefficients of Equation (18) into Equation (24), the corresponding minimal cost function is given as follows, with the simplification procedure given in Appendix 1:

(25) $J=Y^T\big(I-\Phi(\Phi^T\Phi+\lambda\Psi^T\Psi)^{-1}\Phi^T\big)Y+2\lambda Y^T\Phi(\Phi^T\Phi+\lambda\Psi^T\Psi)^{-1}\Psi^TE+\lambda E^T\big(I-\lambda\Psi(\Phi^T\Phi+\lambda\Psi^T\Psi)^{-1}\Psi^T\big)E$

The elegance of the FRA approach lies in the efficient and stable calculation of the net decrease of the cost function when a candidate term is added to the model, while the model coefficients can be efficiently obtained using backward substitution (Li et al., 2005). This framework also allows for both forward and backward model identification (Li et al., 2006). In this paper, to select the regressors and estimate the coefficients efficiently using RFRA, a proper selection criterion with the associated coefficient estimation method needs to be derived and implemented within a simplified and efficient computational framework.

Therefore, in the following, the net contribution of each model term is first derived based on the cost function (25) for the RFRA. Then, the four stages of the RFRA method are introduced in detail. Firstly, a cost function is designed for the fraction system identification based on the dual transformation. Secondly, according to the conventional regularised least squares method, the selection criterion and the coefficients are calculated based on Equations (25) and (18). Thirdly, recursive matrices are defined to derive a simplified and efficient computational framework for calculating the net contributions of the model terms and the model coefficients. Finally, to identify the fraction model using RFRA, the elements of the recursive matrices are updated recursively without incurring excessive computation. Substituting these recursive matrices into the target equations, the regressors are selected and the coefficients are estimated, respectively. The schematic of the proposed RFRA is illustrated in Figure 2.

Figure 2. The schematic of the proposed RFRA method.


Suppose the model (2) has a maximum of $(m+1)/2$ regression terms in the numerator and in the denominator respectively; the matrix of candidate regression terms for (2) can then be initially set as $\Omega_{(m+1)/2}=[\mathbf{1},\omega_1,\ldots,\omega_{(m-1)/2}]$. The transformed linear model (7) then has a maximum of $m$ regression terms, because the constant term in the denominator has been moved to become part of the output. The matrix of candidate regression terms in (7) can be initially set as $Q_m=[\varphi_1,\ldots,\varphi_m]=[\mathbf{1},\omega_1,\ldots,\omega_{(m-1)/2},-y\odot\omega_1,\ldots,-y\odot\omega_{(m-1)/2}]$ and the corresponding auxiliary matrix for regularisation as $\Upsilon_m=[\kappa_1,\ldots,\kappa_m]=[\mathbf{0},\mathbf{0},\ldots,\mathbf{0},\omega_1,\ldots,\omega_{(m-1)/2}]$, respectively. The indices of the elements in $Q_{m-k+1}$ and $\Upsilon_{m-k+1}$ at the $k$th iteration start from $k$, and thus $\varphi_i$ and $\kappa_i$ are the $(i-k+1)$th candidate terms in the candidate regression matrix $Q_{m-k+1}$ and the augmented matrix $\Upsilon_{m-k+1}$, respectively.

After $k$ regressors have been selected into the model (7), the selected regression term matrix is recorded as $\Phi_k=[\phi_1,\ldots,\phi_k]$ and the selected augmented term matrix as $\Psi_k=[\psi_1,\ldots,\psi_k]$. The remaining candidate regressor matrix and candidate augmented matrix are then $Q_{m-k}=[\varphi_{k+1},\ldots,\varphi_m]$ and $\Upsilon_{m-k}=[\kappa_{k+1},\ldots,\kappa_m]$, respectively. Thus the optimal model coefficients and the corresponding cost function are given as

(26) $\hat{\Theta}_k=(\Phi_k^T\Phi_k+\lambda\Psi_k^T\Psi_k)^{-1}(\Phi_k^TY-\lambda\Psi_k^TE)$

(27) $J_k=Y^T\big(I-\Phi_k(\Phi_k^T\Phi_k+\lambda\Psi_k^T\Psi_k)^{-1}\Phi_k^T\big)Y+2\lambda Y^T\Phi_k(\Phi_k^T\Phi_k+\lambda\Psi_k^T\Psi_k)^{-1}\Psi_k^TE+\lambda E^T\big(I-\lambda\Psi_k(\Phi_k^T\Phi_k+\lambda\Psi_k^T\Psi_k)^{-1}\Psi_k^T\big)E$

Before introducing the regularised fast recursive algorithm, a recursive matrix at the $k$th iteration is defined as

(28) $M_k\triangleq\Phi_k^T\Phi_k+\lambda\Psi_k^T\Psi_k$

and its inverse $M_k^{-1}=(\Phi_k^T\Phi_k+\lambda\Psi_k^T\Psi_k)^{-1}$ can be computed as follows

(29) $M_k^{-1}=\begin{bmatrix}M_{k-1}&\Phi_{k-1}^T\phi_k+\lambda\Psi_{k-1}^T\psi_k\\ \phi_k^T\Phi_{k-1}+\lambda\psi_k^T\Psi_{k-1}&\phi_k^T\phi_k+\lambda\psi_k^T\psi_k\end{bmatrix}^{-1}=\begin{bmatrix}F_{k-1}&g_{k-1}\\ g_{k-1}^T&f_{k-1}\end{bmatrix}$

where $\Phi_{k-1}=[\phi_1,\ldots,\phi_{k-1}]$ and $\Psi_{k-1}=[\psi_1,\ldots,\psi_{k-1}]$.

Given that $M_kM_k^{-1}=I$, $F_{k-1}$, $g_{k-1}$ and $f_{k-1}$ can be calculated as follows

(30) $\begin{cases}F_{k-1}=\big[M_{k-1}-(\Phi_{k-1}^T\phi_k+\lambda\Psi_{k-1}^T\psi_k)(\phi_k^T\Phi_{k-1}+\lambda\psi_k^T\Psi_{k-1})/(\phi_k^T\phi_k+\lambda\psi_k^T\psi_k)\big]^{-1}\\ g_{k-1}=-F_{k-1}(\Phi_{k-1}^T\phi_k+\lambda\Psi_{k-1}^T\psi_k)/(\phi_k^T\phi_k+\lambda\psi_k^T\psi_k)\\ f_{k-1}=\dfrac{(\phi_k^T\phi_k+\lambda\psi_k^T\psi_k)+(\phi_k^T\Phi_{k-1}+\lambda\psi_k^T\Psi_{k-1})F_{k-1}(\Phi_{k-1}^T\phi_k+\lambda\Psi_{k-1}^T\psi_k)}{(\phi_k^T\phi_k+\lambda\psi_k^T\psi_k)^2}\end{cases}$

According to the matrix inversion lemma $[A+BCD]^{-1}=A^{-1}-A^{-1}B[DA^{-1}B+C^{-1}]^{-1}DA^{-1}$, $F_{k-1}$ can be computed as follows

(31) $F_{k-1}=M_{k-1}^{-1}+\dfrac{M_{k-1}^{-1}(\Phi_{k-1}^T\phi_k+\lambda\Psi_{k-1}^T\psi_k)(\phi_k^T\Phi_{k-1}+\lambda\psi_k^T\Psi_{k-1})M_{k-1}^{-1}}{\phi_k^T\phi_k+\lambda\psi_k^T\psi_k-(\phi_k^T\Phi_{k-1}+\lambda\psi_k^T\Psi_{k-1})M_{k-1}^{-1}(\Phi_{k-1}^T\phi_k+\lambda\Psi_{k-1}^T\psi_k)}$

Thus, after the $k$th selection, the cost function of Equation (27) becomes

(32) $J_k=Y^T(I-\Phi_kM_k^{-1}\Phi_k^T)Y+2\lambda Y^T\Phi_kM_k^{-1}\Psi_k^TE+\lambda E^T(I-\lambda\Psi_kM_k^{-1}\Psi_k^T)E$

Assuming the regression term $\varphi_i$ is selected from the candidate matrix $Q_{m-k}$ at the $(k+1)$th selection, the selected regression term matrix becomes $\Phi_{k+1}=[\Phi_k,\varphi_i]$ and the selected augmented regression term matrix becomes $\Psi_{k+1}=[\Psi_k,\kappa_i]$, while the remaining candidate regressor matrix and candidate augmented matrix are $Q_{m-k-1}=[\varphi_{k+1},\ldots,\varphi_{i-1},\varphi_{i+1},\ldots,\varphi_m]$ and $\Upsilon_{m-k-1}=[\kappa_{k+1},\ldots,\kappa_{i-1},\kappa_{i+1},\ldots,\kappa_m]$, respectively.

Thus, the cost function is updated as follows

(33) $J_{k+1}=Y^T(I-\Phi_{k+1}M_{k+1}^{-1}\Phi_{k+1}^T)Y+2\lambda Y^T\Phi_{k+1}M_{k+1}^{-1}\Psi_{k+1}^TE+\lambda E^T(I-\lambda\Psi_{k+1}M_{k+1}^{-1}\Psi_{k+1}^T)E$

where $M_{k+1}=\Phi_{k+1}^T\Phi_{k+1}+\lambda\Psi_{k+1}^T\Psi_{k+1}$.

Then, the net contribution of the selected term $\varphi_i$ is given as

(34) $\delta J_{k+1}=J_{k+1}-J_k$

In order to simplify the calculations, three recursive matrices (indexed by superscripts (1), (2) and (3)) at the $(k+1)$th iteration, denoted $R_{k+1}^{(1)}$, $R_{k+1}^{(2)}$ and $R_{k+1}^{(3)}$, are defined as

(35) $R_{k+1}^{(1)}\triangleq I-\Phi_{k+1}M_{k+1}^{-1}\Phi_{k+1}^T$

(36) $R_{k+1}^{(2)}\triangleq I-\lambda\Psi_{k+1}M_{k+1}^{-1}\Psi_{k+1}^T$

(37) $R_{k+1}^{(3)}\triangleq\Phi_{k+1}M_{k+1}^{-1}\Psi_{k+1}^T$

where $R_{k+1}^{(1)}$, $R_{k+1}^{(2)}$ and $R_{k+1}^{(3)}$ are all of size $N\times N$, and $R_0^{(1)}=I_{N\times N}$, $R_0^{(2)}=I_{N\times N}$, $R_0^{(3)}=\mathbf{0}_{N\times N}$.

Having defined these three recursive matrices, the explicit formulation for efficiently calculating the net contribution can be derived first. According to the calculation of $M_k^{-1}$ using Equations (30) and (31), the recursive matrices defined in Equations (35), (36) and (37) can be recursively updated as follows

(38) $R_{k+1}^{(1)}=R_k^{(1)}-\dfrac{(R_k^{(1)}\varphi_i+\lambda R_k^{(3)}\kappa_i)\big(\varphi_i^T(R_k^{(1)})^T+\lambda\kappa_i^T(R_k^{(3)})^T\big)}{\varphi_i^TR_k^{(1)}\varphi_i+\lambda\kappa_i^TR_k^{(2)}\kappa_i+2\lambda\varphi_i^TR_k^{(3)}\kappa_i},\quad k=0,\ldots,m-1,\ i=k+1,\ldots,m.$

(39) $R_{k+1}^{(2)}=R_k^{(2)}-\dfrac{\big(R_k^{(2)}\kappa_i+(R_k^{(3)})^T\varphi_i\big)\big(\kappa_i^T(R_k^{(2)})^T+\varphi_i^TR_k^{(3)}\big)}{\varphi_i^TR_k^{(1)}\varphi_i+\lambda\kappa_i^TR_k^{(2)}\kappa_i+2\lambda\varphi_i^TR_k^{(3)}\kappa_i},\quad k=0,\ldots,m-1,\ i=k+1,\ldots,m.$

(40) $R_{k+1}^{(3)}=R_k^{(3)}-\dfrac{(R_k^{(1)}\varphi_i+\lambda R_k^{(3)}\kappa_i)\big(\kappa_i^T(R_k^{(2)})^T+\varphi_i^TR_k^{(3)}\big)}{\varphi_i^TR_k^{(1)}\varphi_i+\lambda\kappa_i^TR_k^{(2)}\kappa_i+2\lambda\varphi_i^TR_k^{(3)}\kappa_i},\quad k=0,\ldots,m-1,\ i=k+1,\ldots,m.$

The proof of the properties in Equations (38)–(40) is given in Appendix 2.

In addition, based on the definitions in Equations (35)–(37), several further properties can be inferred, as given in Equations (41)–(44):

(41) $R_k^{(1)}=(R_k^{(1)})^T;\quad R_k^{(2)}=(R_k^{(2)})^T,\quad k=0,1,\ldots,m-1.$

(42) $R_k^{(1)}=(R_k^{(1)})^2+\lambda R_k^{(3)}(R_k^{(3)})^T,\quad k=0,1,\ldots,m-1.$

(43) $R_k^{(2)}=(R_k^{(2)})^2+(R_k^{(3)})^TR_k^{(3)},\quad k=0,1,\ldots,m-1.$

(44) $R_k^{(3)}=R_k^{(1)}R_k^{(3)}+R_k^{(3)}R_k^{(2)},\quad k=0,1,\ldots,m-1.$

Then, the cost function $J_k$ at the $k$th iteration can be reformulated as

(45) $J_k=Y^TR_k^{(1)}Y+\lambda E^TR_k^{(2)}E+2\lambda Y^TR_k^{(3)}E+\lambda E^TE,\quad k=0,\ldots,m-1.$

and the net contribution of the selected term $\varphi_i$ in Equation (34) can be rewritten as

(46) $\delta J_{k+1}(\varphi_i)=Y^T(R_{k+1}^{(1)}-R_k^{(1)})Y+\lambda E^T(R_{k+1}^{(2)}-R_k^{(2)})E+2\lambda Y^T(R_{k+1}^{(3)}-R_k^{(3)})E+(\lambda E^TE-\lambda E^TE)=Y^T(R_{k+1}^{(1)}-R_k^{(1)})Y+\lambda E^T(R_{k+1}^{(2)}-R_k^{(2)})E+2\lambda Y^T(R_{k+1}^{(3)}-R_k^{(3)})E,\quad k=0,\ldots,m-1.$

According to the properties given in Equations (38)–(39), Equation (46) can be calculated as follows

(47) $\delta J_{k+1}(\varphi_i)=-\dfrac{Y^T(R_k^{(1)}\varphi_i+\lambda R_k^{(3)}\kappa_i)\big(\varphi_i^T(R_k^{(1)})^T+\lambda\kappa_i^T(R_k^{(3)})^T\big)Y}{\varphi_i^TR_k^{(1)}\varphi_i+\lambda\kappa_i^TR_k^{(2)}\kappa_i+2\lambda\varphi_i^TR_k^{(3)}\kappa_i}-\dfrac{\lambda E^T\big(R_k^{(2)}\kappa_i+(R_k^{(3)})^T\varphi_i\big)\big(\kappa_i^T(R_k^{(2)})^T+\varphi_i^TR_k^{(3)}\big)E}{\varphi_i^TR_k^{(1)}\varphi_i+\lambda\kappa_i^TR_k^{(2)}\kappa_i+2\lambda\varphi_i^TR_k^{(3)}\kappa_i}-\dfrac{2\lambda Y^T(R_k^{(1)}\varphi_i+\lambda R_k^{(3)}\kappa_i)\big(\kappa_i^T(R_k^{(2)})^T+\varphi_i^TR_k^{(3)}\big)E}{\varphi_i^TR_k^{(1)}\varphi_i+\lambda\kappa_i^TR_k^{(2)}\kappa_i+2\lambda\varphi_i^TR_k^{(3)}\kappa_i},\quad k=0,\ldots,m-1.$

4.1.2. Fast computation of net contributions of model terms to the cost function

Note that there are $m-k$ candidate terms (i.e. $i=k+1,\ldots,m$) to be evaluated at the $(k+1)$th iteration. In order to select a new term $\phi_{k+1}$ in $\Phi_{k+1}$ to be included into the model, one needs to calculate $\delta J_{k+1}(\varphi_i)$ ($i=k+1,\ldots,m$) using (47), and the candidate term selected as $\phi_{k+1}$ for inclusion into the model must have the maximal net contribution to the cost function, i.e.

(48) $\Delta J_{k+1}(\phi_{k+1})=\max_{Q_{m-k}}\Delta J_{k+1}(\varphi_i)$

However, calculating the net contribution of $m-k$ candidate terms using (47) would consume a large amount of computing resources, including memory and time. To further simplify the calculations, a few intermediate matrices $A^{(l)}\triangleq[a_{j,i}^{(l)}]_{m\times m}$ and $b^{(l)}\triangleq[b_i^{(l)}]_{1\times m}$ ($l=1,2,3,4$) are introduced to calculate the net contributions of the remaining $m-k$ candidates. For $A^{(l)}$, the elements $a_{j,i}^{(l)}$ are given by:

(49) $a_{j,i}^{(1)}=\begin{cases}0,& j>i\\ \phi_j^TR_{j-1}^{(1)}\phi_i,& j\le i\le k\\ \phi_j^TR_{j-1}^{(1)}\varphi_i,& j\le k,\ i=k+1,\ldots,m\\ \varphi_j^TR_k^{(1)}\varphi_i,& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

(50) $a_{j,i}^{(2)}=\begin{cases}0,& j>i\\ \psi_j^TR_{j-1}^{(2)}\psi_i,& j\le i\le k\\ \psi_j^TR_{j-1}^{(2)}\kappa_i,& j\le k,\ i=k+1,\ldots,m\\ \kappa_j^TR_k^{(2)}\kappa_i,& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

(51) $a_{j,i}^{(3)}=\begin{cases}0,& j>i\\ \psi_j^T(R_{j-1}^{(3)})^T\phi_i,& j\le i\le k\\ \psi_j^T(R_{j-1}^{(3)})^T\varphi_i,& j\le k,\ i=k+1,\ldots,m\\ \kappa_j^T(R_k^{(3)})^T\varphi_i,& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

(52) $a_{j,i}^{(4)}=\begin{cases}0,& j>i\\ \phi_j^TR_{j-1}^{(3)}\psi_i,& j\le i\le k\\ \phi_j^TR_{j-1}^{(3)}\kappa_i,& j\le k,\ i=k+1,\ldots,m\\ \varphi_j^TR_k^{(3)}\kappa_i,& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

where the first three cases in each equation are used for the recursive update of the elements, while the last case in each equation is used to calculate the net contribution of a candidate term, for which only the diagonal elements are needed.

For $b^{(l)}$, the elements $b_i^{(l)}$ for the remaining candidates at the $(k+1)$th iteration are given by:

(53) $b_i^{(1)}=\begin{cases}Y^TR_{i-1}^{(1)}\phi_i,& i\le k\\ Y^TR_k^{(1)}\varphi_i,& i=k+1,\ldots,m\end{cases}$

(54) $b_i^{(3)}=\begin{cases}Y^TR_{i-1}^{(3)}\psi_i,& i\le k\\ Y^TR_k^{(3)}\kappa_i,& i=k+1,\ldots,m\end{cases}$

(55) $b_i^{(2)}=\begin{cases}E^TR_{i-1}^{(2)}\psi_i,& i\le k\\ E^TR_k^{(2)}\kappa_i,& i=k+1,\ldots,m\end{cases}$

(56) $b_i^{(4)}=\begin{cases}E^T(R_{i-1}^{(3)})^T\phi_i,& i\le k\\ E^T(R_k^{(3)})^T\varphi_i,& i=k+1,\ldots,m\end{cases}$

where the first case in each equation is used for the recursive update of the elements, while the last case in each equation is used to calculate the net contribution.

Because of the symmetry properties in Equations (41)–(44), it is evident that $a_{j,i}^{(1)}=(a_{j,i}^{(1)})^T$, $a_{j,i}^{(2)}=(a_{j,i}^{(2)})^T$, $b_i^{(1)}=(b_i^{(1)})^T$ and $b_i^{(3)}=(b_i^{(3)})^T$.

Using Equation (38) and the definition of $a_{j,i}^{(1)}$ in Equation (49), $a_{j,i}^{(1)}$ can be computed recursively. Taking the second case of (49) as an example, it can be recursively updated as follows

(57) $a_{j,i}^{(1)}=\phi_j^TR_{j-1}^{(1)}\phi_i=\phi_j^TR_{j-2}^{(1)}\phi_i-\dfrac{\phi_j^T(R_{j-2}^{(1)}\phi_{j-1}+\lambda R_{j-2}^{(3)}\psi_{j-1})\big(\phi_{j-1}^T(R_{j-2}^{(1)})^T+\lambda\psi_{j-1}^T(R_{j-2}^{(3)})^T\big)\phi_i}{\phi_{j-1}^TR_{j-2}^{(1)}\phi_{j-1}+\lambda\psi_{j-1}^TR_{j-2}^{(2)}\psi_{j-1}+2\lambda\phi_{j-1}^TR_{j-2}^{(3)}\psi_{j-1}}=\phi_j^TR_{j-2}^{(1)}\phi_i-\dfrac{(a_{j-1,j}^{(1)}+\lambda a_{j-1,j}^{(3)})(a_{j-1,i}^{(1)}+\lambda a_{j-1,i}^{(3)})}{a_{j-1,j-1}^{(1)}+\lambda a_{j-1,j-1}^{(2)}+2\lambda a_{j-1,j-1}^{(3)}}$

This updating procedure based on (38) continues until the recursive matrix reaches $R_0^{(1)}$; then

(58) $a_{j,i}^{(1)}=\phi_j^T\phi_i-\sum_{s=1}^{j}\dfrac{(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})(a_{s,i}^{(1)}+\lambda a_{s,i}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},\quad j\le i\le k$

By adopting a similar recursive procedure for the other cases in (49), $a_{j,i}^{(1)}$ can be calculated as follows

(59) $a_{j,i}^{(1)}=\begin{cases}\phi_j^T\phi_i-\sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})(a_{s,i}^{(1)}+\lambda a_{s,i}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le i\le k\\[4pt] \phi_j^T\varphi_i-\sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})(a_{s,i}^{(1)}+\lambda a_{s,i}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le k,\ i=k+1,\ldots,m\\[4pt] \varphi_j^T\varphi_i-\sum_{s=1}^{k}\dfrac{(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})(a_{s,i}^{(1)}+\lambda a_{s,i}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

Similarly, $a_{j,i}^{(2)}$, $a_{j,i}^{(3)}$, $a_{j,i}^{(4)}$, $b_i^{(1)}$, $b_i^{(3)}$, $b_i^{(2)}$ and $b_i^{(4)}$ can be quickly calculated as follows, using the properties in Equations (38)–(40).

(60) $a_{j,i}^{(2)}=\begin{cases}\psi_j^T\psi_i-\sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(2)}+a_{s,j}^{(4)})(a_{s,i}^{(2)}+a_{s,i}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le i\le k\\[4pt] \psi_j^T\kappa_i-\sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(2)}+a_{s,j}^{(4)})(a_{s,i}^{(2)}+a_{s,i}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le k,\ i=k+1,\ldots,m\\[4pt] \kappa_j^T\kappa_i-\sum_{s=1}^{k}\dfrac{(a_{s,j}^{(2)}+a_{s,j}^{(4)})(a_{s,i}^{(2)}+a_{s,i}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

(61) $a_{j,i}^{(3)}=\begin{cases}\sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(2)}+a_{s,j}^{(4)})(a_{s,i}^{(1)}+\lambda a_{s,i}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le i\le k\\[4pt] \sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(2)}+a_{s,j}^{(4)})(a_{s,i}^{(1)}+\lambda a_{s,i}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le k,\ i=k+1,\ldots,m\\[4pt] \sum_{s=1}^{k}\dfrac{(a_{s,j}^{(2)}+a_{s,j}^{(4)})(a_{s,i}^{(1)}+\lambda a_{s,i}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

(62) $a_{j,i}^{(4)}=\begin{cases}\sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})(a_{s,i}^{(2)}+a_{s,i}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le i\le k\\[4pt] \sum_{s=1}^{j-1}\dfrac{(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})(a_{s,i}^{(2)}+a_{s,i}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j\le k,\ i=k+1,\ldots,m\\[4pt] \sum_{s=1}^{k}\dfrac{(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})(a_{s,i}^{(2)}+a_{s,i}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& j=k+1,\ldots,m,\ i=j,\ldots,m\end{cases}$

(63) $b_i^{(1)}=\begin{cases}Y^T\phi_i-\sum_{s=1}^{j-1}\dfrac{(b_s^{(1)}+\lambda b_s^{(3)})(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i\le k\\[4pt] Y^T\varphi_i-\sum_{s=1}^{k}\dfrac{(b_s^{(1)}+\lambda b_s^{(3)})(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i=k+1,\ldots,m\end{cases}$

(64) $b_i^{(3)}=\begin{cases}\sum_{s=1}^{j-1}\dfrac{(b_s^{(1)}+\lambda b_s^{(3)})(a_{s,k+1}^{(2)}+a_{s,k+1}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i\le k\\[4pt] \sum_{s=1}^{k}\dfrac{(b_s^{(1)}+\lambda b_s^{(3)})(a_{s,k+1}^{(2)}+a_{s,k+1}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i=k+1,\ldots,m\end{cases}$

(65) $b_i^{(2)}=\begin{cases}E^T\psi_i-\sum_{s=1}^{j-1}\dfrac{(b_s^{(2)}+b_s^{(4)})(a_{s,j}^{(2)}+a_{s,j}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i\le k\\[4pt] E^T\kappa_i-\sum_{s=1}^{k}\dfrac{(b_s^{(2)}+b_s^{(4)})(a_{s,j}^{(2)}+a_{s,j}^{(4)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i=k+1,\ldots,m\end{cases}$

(66) $b_i^{(4)}=\begin{cases}\sum_{s=1}^{j-1}\dfrac{(b_s^{(2)}+b_s^{(4)})(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i\le k\\[4pt] \sum_{s=1}^{k}\dfrac{(b_s^{(2)}+b_s^{(4)})(a_{s,j}^{(1)}+\lambda a_{s,j}^{(3)})}{a_{s,s}^{(1)}+\lambda a_{s,s}^{(2)}+2\lambda a_{s,s}^{(3)}},& i=k+1,\ldots,m\end{cases}$

Noticing from Equations (47) and (49)–(56), the net contribution of any candidate term $\varphi_i$ ($i=k+1,\ldots,m$) at the $(k+1)$th iteration can be quickly calculated as follows

(67) $\delta J_{k+1}(\varphi_i)=\dfrac{(b_i^{(1)}+\lambda b_i^{(3)})^2}{a_{i,i}^{(1)}+\lambda a_{i,i}^{(2)}+2\lambda a_{i,i}^{(3)}}+\dfrac{\lambda(b_i^{(2)}+b_i^{(4)})^2}{a_{i,i}^{(1)}+\lambda a_{i,i}^{(2)}+2\lambda a_{i,i}^{(3)}}+\dfrac{2\lambda(b_i^{(1)}+\lambda b_i^{(3)})(b_i^{(2)}+b_i^{(4)})}{a_{i,i}^{(1)}+\lambda a_{i,i}^{(2)}+2\lambda a_{i,i}^{(3)}}$

Equation (67) gives a simplified formulation for calculating the net contribution with low demands on computation and memory. Accordingly, the regressor with the maximum net contribution is selected from the candidates based on their net contributions $\delta J_{k+1}(\varphi_i)$. The selected term is then moved to the $(k+1)$th position of the matrix $\Phi_{k+1}$ and denoted $\phi_{k+1}$.
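
The fast recursion above avoids recomputing the full cost for every candidate. As an independent cross-check, the following deliberately naive sketch (ours, not the paper's recursion) performs the same greedy forward selection by evaluating the regularised cost of Equation (16) directly, via Equation (26), for each remaining candidate; it is much slower, but useful for verifying a fast implementation on small problems.

```python
# Naive greedy forward selection against the regularised cost J of Eq. (16).
import numpy as np

def cost(Phi, Psi, Y, E, lam):
    """J(Theta_hat) with Theta_hat from Equation (26)."""
    M = Phi.T @ Phi + lam * (Psi.T @ Psi)
    th = np.linalg.solve(M, Phi.T @ Y - lam * (Psi.T @ E))
    r1 = Y - Phi @ th
    r2 = Psi @ th + E
    return float(r1 @ r1 + lam * (r2 @ r2))

def greedy_select(Q, Ups, Y, E, lam, n_terms):
    """Pick n_terms columns from the candidate matrices Q (regressors) and Ups (augmented terms)."""
    selected, remaining = [], list(range(Q.shape[1]))
    for _ in range(n_terms):
        scores = [cost(Q[:, selected + [i]], Ups[:, selected + [i]], Y, E, lam)
                  for i in remaining]
        selected.append(remaining.pop(int(np.argmin(scores))))
    return selected

# usage (hypothetical): Q, Ups built as in Section 4.1.1, E = np.ones(N),
# selected = greedy_select(Q, Ups, Y, E, lam=-1e-9, n_terms=5)
```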

4.1.3. Fast computation of model coefficients

The next step is to develop an effective formula for estimating the coefficients. According to the definition (28), the optimal model coefficients (26) can be rewritten as

(68) $\hat{\Theta}_k=M_k^{-1}(\Phi_k^TY-\lambda\Psi_k^TE)$

To use (35) and (37) to simplify the calculations, both sides of (68) are multiplied by $\Phi_k$, yielding

(69) $\Phi_k\hat{\Theta}_k=\Phi_kM_k^{-1}(\Phi_k^TY-\lambda\Psi_k^TE)=(I-R_k^{(1)})Y+\lambda R_k^{(3)}E$

Expanding $\Phi_k\hat{\Theta}_k$ in Equation (69) gives:

(70) $\hat{\theta}_1\phi_1+\cdots+\hat{\theta}_k\phi_k=(I-R_k^{(1)})Y+\lambda R_k^{(3)}E$

Referring to Li et al. (2005), a recursive matrix at the $(k+1)$th iteration is defined as $R_{k+1}^{(0)}\triangleq I-\Phi_{k+1}(\Phi_{k+1}^T\Phi_{k+1})^{-1}\Phi_{k+1}^T$ (with $R_0^{(0)}=I_{N\times N}$) for parameter estimation using backward substitution.

Then, both sides of Equation (70) are multiplied by $(R_{j-1}^{(0)}\phi_j)^T$; considering the property $R_k^{(0)}\phi_i=0$ ($i<k$) (Li et al., 2005), this yields

(71) $\sum_{i=j}^{k}\hat{\theta}_i(R_{j-1}^{(0)}\phi_j)^T\phi_i=\sum_{i=j}^{k}\hat{\theta}_i\phi_j^TR_{j-1}^{(0)}\phi_i=(R_{j-1}^{(0)}\phi_j)^T\big((I-R_k^{(1)})Y+\lambda R_k^{(3)}E\big)=(R_{j-1}^{(0)}\phi_j)^T\big(Y-R_k^{(1)}Y+\lambda R_k^{(3)}E\big),\quad j=1,\ldots,k.$

For the final model consisting of $M=h+r-1$ regressors, Equation (70) can be rewritten as

(72) $\sum_{i=j}^{M}\hat{\theta}_i(R_{j-1}^{(0)}\phi_j)^T\phi_i=(R_{j-1}^{(0)}\phi_j)^T\big(Y-R_M^{(1)}Y+\lambda R_M^{(3)}E\big),\quad j=1,\ldots,M.$

Obviously, the coefficient of the last selected term can be calculated first using (72), and the others can then be computed backward recursively. However, the computational load is still high due to the presence of $R_{j-1}^{(0)}$, $R_k^{(1)}$ and $R_k^{(3)}$. To simplify the calculations, a recursive matrix $A\triangleq[a_{k+1,i}]_{M\times M}$ and three recursive vectors $Y^{(k)}\in\mathbb{R}^{N\times 1}$, $E^{(k)}\in\mathbb{R}^{N\times 1}$ and $b\triangleq[b_{k+1}]_{1\times M}$ are defined as

(73) $Y^{(k)}\triangleq R_k^{(1)}Y$

(74) $E^{(k)}\triangleq R_k^{(3)}E$

(75) $a_{k+1,i}\triangleq\phi_{k+1}^TR_k^{(0)}\phi_i$

(76) $b_{k+1}\triangleq(Y-Y^{(M)}+\lambda E^{(M)})^TR_k^{(0)}\phi_{k+1}$

where $k=0,1,\ldots,M-1$ represents the number of selected model terms and $i=k+1,\ldots,M$ is the index of the term whose coefficient has been estimated.

For the selected model terms at the $(k+1)$th step, Equation (72) can be rewritten as

(77) $\sum_{i=k+1}^{M}\hat{\theta}_ia_{k+1,i}=(R_k^{(0)}\phi_{k+1})^T(Y-Y^{(M)}+\lambda E^{(M)})=b_{k+1}$

To further simplify $Y^{(M)}$ and $E^{(M)}$, two recursive regressors $\phi_i^{(k)}$ and $\psi_i^{(k)}$ are defined as follows:

(78) $\phi_i^{(k)}\triangleq R_k^{(1)}\phi_i$

(79) $\psi_i^{(k)}\triangleq R_k^{(3)}\psi_i$

where $k=0,1,\ldots,M-1$ and $i=1,2,\ldots,k$.

To select $M$ model terms, the output $Y$ and the regressors $\phi_i$, $\psi_i$, $E$ will be recursively updated $M$ times. Their updated versions are denoted $Y^{(M)}$, $\phi_i^{(M)}$, $\psi_i^{(M)}$ and $E^{(M)}$, respectively. Based on Equations (38), (40), (73), (74), (78) and (79), the four vectors are recursively updated by $R_M^{(1)}$ or $R_M^{(3)}$. Similar to the derivation of $a_{k+1,i}^{(1)}$, the four vectors can be calculated as

(80) $\phi_i^{(M)}=\phi_i-\sum_{j=1}^{M}\dfrac{(\phi_j^{(j-1)}+\lambda\psi_j^{(j-1)})(a_{j,i}^{(1)}+\lambda a_{j,i}^{(3)})}{a_{j,j}^{(1)}+\lambda a_{j,j}^{(2)}+2\lambda a_{j,j}^{(3)}}$

(81) $\psi_i^{(M)}=-\sum_{j=1}^{M}\dfrac{(\phi_j^{(j-1)}+\lambda\psi_j^{(j-1)})(a_{j,i}^{(2)}+a_{j,i}^{(4)})}{a_{j,j}^{(1)}+\lambda a_{j,j}^{(2)}+2\lambda a_{j,j}^{(3)}}$

Thus

(82) $\phi_i^{(M)}+\lambda\psi_i^{(M)}=\phi_i-\sum_{j=1}^{M}\dfrac{(a_{j,i}^{(1)}+\lambda a_{j,i}^{(2)}+\lambda a_{j,i}^{(3)}+\lambda a_{j,i}^{(4)})(\phi_j^{(j-1)}+\lambda\psi_j^{(j-1)})}{a_{j,j}^{(1)}+\lambda a_{j,j}^{(2)}+2\lambda a_{j,j}^{(3)}}$

And then

(83) $Y^{(M)}=Y-\sum_{j=1}^{M}\dfrac{(\phi_j^{(j-1)}+\lambda\psi_j^{(j-1)})(b_j^{(1)}+\lambda b_j^{(3)})}{a_{j,j}^{(1)}+\lambda a_{j,j}^{(2)}+2\lambda a_{j,j}^{(3)}}$

(84) $E^{(M)}=-\sum_{j=1}^{M}\dfrac{(\phi_j^{(j-1)}+\lambda\psi_j^{(j-1)})(b_j^{(2)}+b_j^{(4)})}{a_{j,j}^{(1)}+\lambda a_{j,j}^{(2)}+2\lambda a_{j,j}^{(3)}}$

For the definition of $R_k^{(0)}$, Equations (75) and (76) are similar to those defined in Li et al. (2005), and $a_{k+1,i}$ and $b_{k+1}$ can be calculated in a similar way to $a_{k+1,i}^{(1)}$, using

(85) $a_{k+1,i}=\phi_{k+1}^T\phi_i-\sum_{j=1}^{k}\dfrac{a_{j,k+1}a_{j,i}}{a_{j,j}}$

(86) $b_{k+1}=(Y-Y^{(M)}+\lambda E^{(M)})^T\phi_{k+1}-\sum_{j=1}^{k}\dfrac{b_ja_{j,k+1}}{a_{j,j}}$

Accordingly, after updating the above matrices and vectors over $M$ iterations, Equation (77) can be reformulated as follows:

(87) $\hat{\theta}_j=\Big(b_j-\sum_{i=j+1}^{M}\hat{\theta}_ia_{j,i}\Big)\Big/a_{j,j},\quad j=M,M-1,\ldots,1.$
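
A minimal sketch (ours) of the backward substitution in Equation (87), checked against a direct triangular solve; here A plays the role of the matrix $[a_{j,i}]$ and b the role of the vector $[b_j]$, both assumed to have been produced by the recursions above.

```python
# theta_j = (b_j - sum_{i>j} theta_i * a_{j,i}) / a_{j,j}, for j = M, ..., 1.
import numpy as np

def backward_substitution(A, b):
    """A: M x M upper-triangular array of a_{j,i}; b: length-M array of b_j."""
    M = len(b)
    theta = np.zeros(M)
    for j in range(M - 1, -1, -1):                         # 0-based j = M-1, ..., 0
        theta[j] = (b[j] - A[j, j + 1:] @ theta[j + 1:]) / A[j, j]
    return theta

A = np.triu(np.arange(1.0, 17.0).reshape(4, 4))            # toy upper-triangular system
b = np.array([1.0, 2.0, 3.0, 4.0])
print(backward_substitution(A, b), np.linalg.solve(A, b))  # the two should agree
```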

Remark 1

To identify a fraction model using the proposed RFRA, Equation (67) is first used for model term selection, which computes the net contribution of a candidate regression term to the cost function; then Equation (87) is used for model coefficient estimation once the model terms have been selected.

4.2. Regularisation analysis and the proposed algorithm

In this paper, a regularisation term is used to compensate for the estimation bias induced by the linear transformation. The value of the regularisation parameter has to be carefully chosen, as an inappropriate value will lead to either overfitting or underfitting. According to Proposition 3.1, the regularisation parameter needs to satisfy Equation (23). Furthermore, Equation (23) assumes a known model structure; however, the structure is unknown a priori. Therefore, the regularisation parameter is identified in two steps: the order of magnitude of the regularisation parameter is selected first, and its value is then refined during the identification process.

The proposed method is detailed in Algorithm 1:
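
Algorithm 1 appears as a figure in the published article; the sketch below is only our own rough rendering of the two-step search just described, with `identify` standing in as a hypothetical callback that runs the RFRA term selection and coefficient estimation for a given λ and returns a validation error.

```python
# Coarse-to-fine search for a small negative regularisation parameter.
import numpy as np

def tune_lambda(identify, orders=range(1, 12), refine_points=20):
    # Step 1: coarse search over the order of magnitude of a negative lambda.
    coarse = [(-10.0 ** (-n), identify(-10.0 ** (-n))) for n in orders]
    best_lam = min(coarse, key=lambda t: t[1])[0]
    # Step 2: refine the value around the best order of magnitude.
    grid = np.linspace(0.1 * best_lam, 10.0 * best_lam, refine_points)
    return min(((lam, identify(lam)) for lam in grid), key=lambda t: t[1])[0]
```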

4.3. Computation complexity

The proposed algorithm, the orthogonal least squares (OLS) method (Billings & Zhu, 1994) and the fast orthogonal identification method (Zhu & Billings, 1996) are all capable of simultaneously selecting the model terms and estimating the linear parameters. In Zhu and Billings (1996), it is shown that the fast orthogonal identification method is computationally more efficient than the conventional OLS method. In the following, the computational complexity of the proposed method is compared with that of the fast orthogonal identification algorithm when both are applied to the identification problem, including structure detection and parameter estimation.

Suppose there are initially $m$ candidate nonlinear terms to build a fraction nonlinear dynamic model expressed in Equation (2); $M$ terms ($M<m$) will be selected from the candidates to construct the model, and the coefficients of the resultant model are then estimated. The basic arithmetic operations used in the two identification algorithms are addition/subtraction and multiplication/division. According to the procedure of RFRA in Algorithm 1, the model term selection process is repeated $M-1$ times, and at each selection any of $a_{j,i}^{(1)}$, $a_{j,i}^{(2)}$, $a_{j,i}^{(3)}$, $a_{j,i}^{(4)}$, $b_i^{(1)}$, $b_i^{(2)}$, $b_i^{(3)}$, $b_i^{(4)}$ and the net contribution are updated $m-k$ times. Thus the operations in Equations (59)–(67) are counted $m-k$ times at the $(k+1)$th selection. In addition, Equations (82)–(87) only need to be repeated $M$ times because there are $M$ coefficients for the selected terms. Thereby, for the proposed method, the total number of addition/subtraction and multiplication/division operations is

(88) C(RFRA)=4NMm+9mM(M1)19M(M1)(2M1)6+5M(2mM)2NM(M1)+3M2(M1)2+8NMM4N+3M(M1)2+7mM5M2+NM2NM+2M

Referring to Zhu and Billings (1996), although there are also $M-1$ selections, additional calculations are needed for the coefficient estimation and for computing the errors at each selection iteration. According to the implementation using Equation 5.4 in Zhu and Billings (1996), the total number of addition/subtraction and multiplication/division operations for the fast orthogonal method applied to the same model identification problem is

(89) C(OLS)=6NMm2NM(M1)+8mM(M1)+4Nm+10NM9M(M1)24N13M(M1)(2M1)6+(M1)+22Mm

The ratio between the computational complexity of the fast OLS algorithm in Zhu and Billings (1996) and that of the proposed algorithm is illustrated in Figure 3 as a function of the number of candidate terms $m$ and the actual model size $M$. It is graphically shown that the proposed method is computationally more efficient than the fast OLS algorithm in all cases, and the efficiency gain becomes larger as the model size decreases.

Figure 3. The ratio of computational complexities between the fast OLS method and the proposed method.


5. Simulation and analysis

To verify the effectiveness of the proposed method, two simulation examples are used. The first example is a rational model. The second one is the case where the regressors are not all polynomials. For each case, 1000 samples are generated with the same distribution of input and error signals.

The input signals for generating the training and testing samples are independent and uniformly distributed with amplitude range $[-1,1]$ and variance 0.33, and an independent zero-mean Gaussian error with variance 0.01 is generated, as used in Zhu and Billings (1996) and S. Chen et al. (1989).

The proposed method is then applied to identify the system defined in Equation (90):

(90) $y(t)=\dfrac{a(t)}{c(t)}+e(t)=\dfrac{0.25y(t-1)-0.85y(t-2)e(t-2)+u(t-1)u(t-2)}{1+0.68y^2(t-1)+y^2(t-2)}+e(t)$

The rational fraction system in Equation (90) is a typical rational nonlinear autoregressive moving average with exogenous inputs (NARMAX) model (Waheeb et al., 2019). Models of this kind include past inputs, past outputs and past model errors, the latter being the differences between the estimated values and the true values. Although incorporating the past error signal into the model may lead to a more accurate model, such signals are often difficult to measure or make the modelling process rather complicated. It should be noted that errors always exist when modelling nonlinear systems (Nelles, 2020). The model error is often decomposed into the bias and the variance: the bias can be decreased towards zero by increasing the model nonlinearity, while the variance is significantly influenced by the data size and data quality. Thus the error in Equation (90) is often treated as known a priori (Neji & Beji, 2000). In general, the errors are assumed to be independent Gaussian random variables with zero mean and a finite variance, and the variance is often given a priori; in this case study, the variance is set to 0.01. Accordingly, 27 polynomial terms are first generated based on the principle that the highest order is 2 and the maximal delays of the input, error and output signals are 2. According to Equation (7), the candidate terms then consist of 54 polynomial regressors.
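
As a concrete illustration of the candidate-term generation just described (our own sketch, not the authors' code), the snippet below enumerates all monomials of degree at most 2 in the six lagged signals y(t−1), y(t−2), u(t−1), u(t−2), e(t−1), e(t−2), giving 6 + 21 = 27 candidates; doubling them via the output-multiplied denominator terms of Equation (7) yields the 54 transformed regressors.

```python
# Enumerate the 27 polynomial candidate terms (order <= 2, lags <= 2).
from itertools import combinations_with_replacement
import numpy as np

def candidate_terms(y, u, e, max_lag=2):
    lagged = {f"{name}(t-{d})": np.roll(s, d)        # np.roll is a crude lag (wraps at t=0)
              for name, s in (("y", y), ("u", u), ("e", e))
              for d in range(1, max_lag + 1)}
    names, cols = list(lagged), list(lagged.values())
    for i, j in combinations_with_replacement(range(len(lagged)), 2):
        names.append(names[i] + "*" + names[j])      # degree-2 products, squares included
        cols.append(cols[i] * cols[j])
    return names, np.column_stack(cols)

# names, P = candidate_terms(y, u, e)   ->  P has 27 columns; the transformed
# candidate matrix of Equation (7) is then np.hstack([P, -y[:, None] * P]).
```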

First, the regularisation parameter is symmetrically adjusted around 0 and a number of trials are tested. Table 1 shows four cases with different values of the regularisation parameter. It is found that 9 is the optimal magnitude order for the regularisation parameter in Example 1. The regularisation parameter is then fine-tuned around this optimal magnitude order, and its final value is chosen to be −1.8276×10⁻⁹.

Table 1. Different regularisation parameters for Example 1.

The rational fraction system in Equation (90) is then identified by the RFRA algorithm and by the fast OLS method of Zhu and Billings (1996), respectively. The nonlinear terms selected from the above 54 candidates and their estimated linear parameters are listed in Table 2. Comparing the parameters obtained by the two algorithms, it is clear that the parameters identified by the RFRA algorithm are closer to the actual parameters of the model, and the resulting model is more accurate than that produced by the fast OLS method of Zhu and Billings (1996).

Table 2. Comparison of estimation coefficients for Example 1.

Finally, a more general fraction system given in Equation (91), which contains both polynomial and other nonlinear terms in the rational function, is used as the second case study.

(91) $y(t)=\dfrac{a(t)}{c(t)}+e(t)=\dfrac{0.3y(t-1)-0.7y(t-2)+\sin(u(t-1))+u(t-1)}{1+\exp(y^2(t-1))+u^2(t-2)}+e(t)$

Similarly, nonlinear candidate terms are generated. Considering the small number of samples (only 1000), different basis functions are used as the numerator and denominator candidate terms, leading to the generation of 52 candidate terms.

It is shown that the bias grows in the positive direction in Example 1; hence, the regularisation parameter is similarly fine-tuned over negative values for Example 2. Following the same procedure as in Example 1, the regularisation parameter is first coarsely adjusted, and Table 3 shows some cases with different values of the regularisation parameter. After numerous trials, it is again evident that 9 is the optimal magnitude order for the regularisation parameter. The regularisation parameter is then further fine-tuned. It is noted that the magnitude order of the MSE validation errors in both examples can reach 4 even without fine-tuning. The selected nonlinear regressors and the estimated coefficients are shown in Table 4. It is again evident that the fraction system can be well identified using the proposed method.

Table 3. Different regularisation parameters for Example 2.

Table 4. Structure selection and parameters estimation for Example 2.

6. Conclusion

This paper has proposed a novel fast method to identify fraction models in which the model structure and the coefficients are estimated simultaneously. The identification of the fraction system is first transformed into an equivalent quadratic optimisation problem by adding a regularisation term formulated as the inner product of the denominator terms. To solve the quadratic optimisation problem, a regularised fast recursive algorithm is developed. The proposed method is validated on two simulation examples to confirm its efficacy. The method can further be applied to build the output layer of traditional neural networks or the fully connected layer of deep neural networks, which can reduce the size of the networks. However, a potential limitation of the proposed method is the computational effort required to fine-tune the regularisation parameter, although we have shown that its value has to be negative and cannot exceed $Y^TY$. Optimising the regularisation parameter will inevitably cost more computation if the system is very complex. Fast optimisation of the penalty parameter will be a future research topic.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The authors confirm that the data supporting the findings of this study are available within the article.

Additional information

Funding

This work is supported by Key Project of Science and Technology Commission of Shanghai Municipality [No. 19500712300], 111 Project [No. D18003].

Notes on contributors

Li Zhang

Li Zhang received the B.Sc. and M.Sc. degrees from Anhui University of Technology, Anhui, China, in 2009 and 2013, respectively. She is currently pursuing the Ph.D. degree in control theory and control engineering at Shanghai University. Between 2018 and 2019, she was a visiting Ph.D. student with the University of Leeds, U.K. Her current research interests include nonlinear system modeling and identification, machine learning algorithm and power system.

Kang Li

Kang Li (M'05-SM'11) received the B.Sc. degree in Industrial Automation from Xiangtan University, Hunan, China, in 1989, the M.Sc. degree in Control Theory and Applications from Harbin Institute of Technology, Harbin, China, in 1992, and the Ph.D. degree in Control Theory and Applications from Shanghai Jiaotong University, Shanghai, China, in 1995. He also received D.Sc. degree in system identification from Queen's University Belfast, UK, in 2015. Between 1995 and 2002, he worked at Shanghai Jiaotong University, Delft University of Technology and Queen's University Belfast as a research fellow. Between 2002 and 2018, he was a Lecturer (2002), a Senior Lecturer (2007), a Reader (2009) and a Chair Professor (2011) with the School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, U.K. He currently holds the Chair of Smart Energy Systems at the University of Leeds, UK. His research interests cover nonlinear system modelling, identification, and control, and machine learning, with substantial applications to energy and power systems, smart grid, electric vehicles, railway systems, and energy management in energy intensive manufacturing processes. He has authored/co-authored over 200 journal publications and edited/co-edited 19 conference proceedings.

Dr Li chaired the IEEE UKRI Control and Communication Ireland chapter, and was the Secretary of the IEEE UK & Ireland Section.

Dajun Du

Dajun Du received the B.Sc. degree in electrical technology, and M.Sc. degrees in control theory and control engineering all from the Zhengzhou University, China in 2002 and 2005, respectively, and his Ph. D. degree in control theory and control engineering from Shanghai University in 2010. From September 2008 to September 2009, he was a visiting PhD student at Queen's University Belfast, UK. From April 2011 to August 2012, he was a Research Fellow at Queen's University Belfast, UK. He is currently a professor in Shanghai University. His main research interests include system modelling and identification and networked control systems.

Yihuan Li

Yihuan Li received the B.Sc. and M.Sc. degrees from North China Electric Power University, Beijing, China, in 2014 and 2017, respectively, and the Ph.D. degree from the University of Leeds in 2021. She is currently a lecturer at North China Electric Power University. Her main research interests include system identification and state estimation, and the application of machine learning techniques in the field of battery management.

Minrui Fei

Minrui Fei received his B.S. and M.S. degrees in Industrial Automation from the Shanghai University of Technology in 1984 and 1992, respectively, and his PhD degree in Control Theory and Control Engineering from Shanghai University in 1997. Since 1998, he has been a full professor at Shanghai University. He is Chairman of Embedded Instrument and System Sub-society, and Standing Director of China Instrument & Control Society; Chairman of Life System Modeling and Simulation Sub-society, Vice-chairman of Intelligent Control and Intelligent Management Sub-society, and Director of Chinese Artificial Intelligence Association. His research interests are in the areas of networked control systems, intelligent control, complex system modeling, hybrid network systems, and field control systems.

References

  • Andreev, M., & Borulko, V. (2017). Polynomial and rational approximate models of microwave elements. In 2017 XXIIND International Seminar/Workshop on Direct and Inverse Problems of Electromagnetic and Acoustic Wave Theory (DIPED) (pp. 256–259). IEEE.
  • Austin, A. P., Krishnamoorthy, M., Leyffer, S., Mrenna, S., Müller, J., & Schulz, H. (2021). Practical algorithms for multivariate rational approximation. Computer Physics Communications, 261, 107663. https://doi.org/10.1016/j.cpc.2020.107663.
  • Bai, E.-w., Li, K., Zhao, W., & Xu, W. (2014). Kernel based approaches to local nonlinear non-parametric variable selection. Automatica, 50(1), 100–113. https://doi.org/10.1016/j.automatica.2013.10.010
  • Billings, S., & Mao, K. (1998). Structure detection for nonlinear rational models using genetic algorithms. International Journal of Systems Science, 29(3), 223–231. https://doi.org/10.1080/00207729808929516
  • Billings, S., & Zhu, Q. (1991). Rational model identification using an extended least-squares algorithm. International Journal of Control, 54(3), 529–546. https://doi.org/10.1080/00207179108934174
  • Billings, S., & Zhu, Q. (1994). A structure detection algorithm for nonlinear dynamic rational models. International Journal of Control, 59(6), 1439–1463. https://doi.org/10.1080/00207179408923140
  • Birpoutsoukis, G., Marconato, A., Lataire, J., & Schoukens, J. (2017). Regularized nonparametric volterra kernel estimation. Automatica, 82, 324–327. https://doi.org/10.1016/j.automatica.2017.04.014
  • Castella, M., Pesquet, J.-C., & Marmin, A. (2018). Rational optimization for nonlinear reconstruction with approximate ℓ0 penalization. IEEE Transactions on Signal Processing, 67(6), 1407–1417. https://doi.org/10.1109/TSP.2018.2890065
  • Chen, J., Liu, Y., & Zhu, Q. (2018). Bias compensation recursive algorithm for dual-rate rational models. IET Control Theory & Applications, 12(16), 2184–2193. https://doi.org/10.1049/cth2.v12.16
  • Chen, S., Billings, S. A., & Luo, W. (1989). Orthogonal least squares methods and their application to non-linear system identification. International Journal of Control, 50(5), 1873–1896. https://doi.org/10.1080/00207178908953472
  • Chen, T., Andersen, M. S., Ljung, L., Chiuso, A., & Pillonetto, G. (2014). System identification via sparse multiple kernel-based regularization using sequential convex optimization techniques. IEEE Transactions on Automatic Control, 59(11), 2933–2945. https://doi.org/10.1109/TAC.2014.2351851
  • Curtain, R., & Morris, K. (2009). Transfer functions of distributed parameter systems: A tutorial. Automatica, 45(5), 1101–1116. https://doi.org/10.1016/j.automatica.2009.01.008
  • Deng, J., Li, K., Harkin-Jones, E., Fei, M., & Li, S. (2013). A novel two stage algorithm for construction of RBF neural models based on a-optimality criterion. In 2013 Ninth International Conference On Natural Computation (ICNC) (pp. 1–7). IEEE.
  • Deng, J., Li, K., & Irwin, G. W. (2010). A two-stage algorithm for automatic construction of RBF neural models. In Melecon 2010 15th IEEE Mediterranean Electrotechnical Conference (pp. 166–171). IEEE.
  • Deng, J., Li, K., & Irwin, G. W. (2012). Locally regularised two-stage learning algorithm for rbf network centre selection. International Journal of Systems Science, 43(6), 1157–1170. https://doi.org/10.1080/00207721.2010.545490
  • Du, D., Li, X., Fei, M., & Irwin, G. W. (2012). A novel locally regularized automatic construction method for rbf neural models. Neurocomputing, 98, 4–11. https://doi.org/10.1016/j.neucom.2011.05.045.
  • Du, D., Zhu, M., Li, X., Fei, M., Bu, S., Wu, L., & Li, K. (2022). A review on cybersecurity analysis, attack detection, and attack defense methods in cyber-physical power systems. Journal of Modern Power Systems and Clean Energy, to be published. https://doi.org/10.35833/MPCE.2021.000604.
  • Gambella, C., Ghaddar, B., & Naoum-Sawaya, J. (2021). Optimization problems for machine learning: A survey. European Journal of Operational Research, 290(3), 807–828. https://doi.org/10.1016/j.ejor.2020.08.045
  • Hofmann, T., Schölkopf, B., & Smola, A. J. (2008). Kernel methods in machine learning. The Annals of Statistics, 36(3), 1171–1220. https://doi.org/10.1214/009053607000000677
  • Hong, X., Mitchell, R., Chen, S., Harris, C., Li, K., & Irwin, G. (2008). Model selection approaches for non-linear system identification: A review. International Journal of Systems Science, 39(10), 925–946. https://doi.org/10.1080/00207720802083018
  • Hussein, A. A.-h., & Batarseh, I. (2011). An overview of generic battery models. In 2011 IEEE Power And Energy Society General Meeting (pp. 1–6). IEEE.
  • Jin, R., & Weng, G. (2019). A robust active contour model driven by pre-fitting bias correction and optimized fuzzy c-means algorithm for fast image segmentation. Neurocomputing, 359, 408–419. https://doi.org/10.1016/j.neucom.2019.06.019
  • Kambhampati, C., Mason, J., & Warwick, K. (2000). A stable one-step-ahead predictive control of non-linear systems. Automatica, 36(4), 485–495. https://doi.org/10.1016/S0005-1098(99)00173-9
  • Li, K. (2005). Eng-genes: A new genetic modelling approach for nonlinear dynamic systems. IFAC Proceedings Volumes, 38(1), 162–167. https://doi.org/10.3182/20050703-6-CZ-1902.01105
  • Li, K., & Peng, J.-X. (2007). Neural input selection – A fast model-based approach. Neurocomputing, 70(4), 762–769. https://doi.org/10.1016/j.neucom.2006.10.011
  • Li, K., Peng, J.-X., & Bai, E.-W. (2006). A two-stage algorithm for identification of nonlinear dynamic systems. Automatica, 42(7), 1189–1197. https://doi.org/10.1016/j.automatica.2006.03.004
  • Li, K., Peng, J.-X., & Irwin, G. W. (2005). A fast nonlinear model identification method. IEEE Transactions on Automatic Control, 50(8), 1211–1216. https://doi.org/10.1109/TAC.2005.852557
  • Liang, Z., & Zhang, L. (2021). Support vector machines with the epsilon-insensitive pinball loss function for uncertain data classification. Neurocomputing, 457, 117–127. https://doi.org/10.1016/j.neucom.2021.06.044
  • Luo, X., Chang, X., & Ban, X. (2016). Regression and classification using extreme learning machine based on l1-norm and l2-norm. Neurocomputing, 174, 179–186. https://doi.org/10.1016/j.neucom.2015.03.112
  • Mall, R., & Suykens, J. A. K. (2015, May). Very sparse LSSVM reductions for large-scale data. IEEE Transactions on Neural Networks and Learning Systems, 26(5), 1086–1097. https://doi.org/10.1109/TNNLS.2014.2333879
  • Manela, B., & Biess, A. (2021). Bias-reduced hindsight experience replay with virtual goal prioritization. Neurocomputing, 451, 305–315. https://doi.org/10.1016/j.neucom.2021.02.090
  • Münker, T., & Nelles, O. (2016). Nonlinear system identification with regularized local FIR model networks. IFAC-PapersOnLine, 49(5), 61–66. https://doi.org/10.1016/j.ifacol.2016.07.090
  • Neji, Z., & Beji, F.-M. (2000). Neural network and time series identification and prediction. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium (Vol. 4, pp. 461–466). IEEE.
  • Nelles, O. (2020). Nonlinear system identification, from classical approaches to neural networks, fuzzy models, and gaussian processes. Springer.
  • Nielsen, M. (2019). Neural networks and deep learning. Retrieved from http://neuralnetworksanddeeplearning.com/chap2.html (Online Book).
  • Ohlsson, H., Ljung, L., & Boyd, S. (2010). Segmentation of arx-models using sum-of-norms regularization. Automatica, 46(6), 1107–1111. https://doi.org/10.1016/j.automatica.2010.03.013
  • Paduart, J., Lauwers, L., Swevers, J., Smolders, K., Schoukens, J., & Pintelon, R. (2010). Identification of nonlinear systems using polynomial nonlinear state space models. Automatica, 46(4), 647–656. https://doi.org/10.1016/j.automatica.2010.01.001
  • Piga, D., & Tóth, R. (2014). A bias-corrected estimator for nonlinear systems with output-error type model structures. Automatica, 50(9), 2373–2380. https://doi.org/10.1016/j.automatica.2014.07.021
  • Pillonetto, G. (2018). System identification using kernel-based regularization: New insights on stability and consistency issues. Automatica, 93, 321–332. https://doi.org/10.1016/j.automatica.2018.03.065
  • Pillonetto, G., & De Nicolao, G. (2010). A new kernel-based approach for linear system identification. Automatica, 46(1), 81–93. https://doi.org/10.1016/j.automatica.2009.10.031
  • Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., & Ljung, L. (2014). Kernel methods in system identification, machine learning and function estimation: A survey. Automatica, 50(3), 657–682. https://doi.org/10.1016/j.automatica.2014.01.001
  • Qiao, J., Meng, X., & Li, W. (2018). An incremental neuronal-activity-based rbf neural network for nonlinear system modeling. Neurocomputing, 302, 1–11. https://doi.org/10.1016/j.neucom.2018.01.001
  • Senhaji, K., Ramchoun, H., & Ettaouil, M. (2020). Training feedforward neural network via multiobjective optimization model using non-smooth l1/2 regularization. Neurocomputing, 410, 1–11. https://doi.org/10.1016/j.neucom.2020.05.066
  • Singh, J., & Dhiman, G. (2021). A survey on machine-learning approaches: Theory and their concepts. Materials Today: Proceedings. Advance online publication. Retrieved from https://www.sciencedirect.com/science/article/pii/S2214785321039298.
  • Suykens, J. A., De Brabanter, J., Lukas, L., & Vandewalle, J. (2002). Weighted least squares support vector machines: Robustness and sparse approximation. Neurocomputing, 48(1), 85–105. https://doi.org/10.1016/S0925-2312(01)00644-0
  • Swarnakar, J., Sarkar, P., Dey, M., & Singh, L. J. (2017, Dec). Rational approximation of fractional order system in delta domain – A unified approach. In 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC) (pp. 144–150). IEEE.
  • Tao, J., Wen, S., & Hu, W. (2015). L1-norm locally linear representation regularization multi-source adaptation learning. Neural Networks, 69, 80–98. https://doi.org/10.1016/j.neunet.2015.01.009
  • Tikhonov, A. N., & Glasko, V. B. (1965). Use of the regularization method in non-linear problems. USSR Computational Mathematics and Mathematical Physics, 5(3), 93–107. https://doi.org/10.1016/0041-5553(65)90150-3
  • Torres-Barrán, A., Alaíz, C. M., & Dorronsoro, J. R. (2018). ν-svm solutions of constrained lasso and elastic net. Neurocomputing, 275, 1921–1931. https://doi.org/10.1016/j.neucom.2017.10.029
  • Valencia, E. A., & Álvarez, M. A. (2017). Short-term time series prediction using Hilbert space embeddings of autoregressive processes. Neurocomputing, 266, 595–605. https://doi.org/10.1016/j.neucom.2017.05.067
  • Waheeb, W., Ghazali, R., & Shah, H. (2019). Nonlinear autoregressive moving-average (NARMA) time series forecasting using neural networks. In 2019 International Conference on Computer and Information Sciences (ICCIS) (pp. 1–5). IEEE.
  • Wang, L., Liu, J., & Qian, F. (2021). Wind speed frequency distribution modeling and wind energy resource assessment based on polynomial regression model. International Journal of Electrical Power and Energy Systems, 130, 106964. https://doi.org/10.1016/j.ijepes.2021.106964
  • Yang, C., Zhu, X., Qiao, J., & Nie, K. (2020). Forward and backward input variable selection for polynomial echo state networks. Neurocomputing, 398, 83–94. https://doi.org/10.1016/j.neucom.2020.02.034
  • Yang, Z., Mourshed, M., Liu, K., Xu, X., & Feng, S. (2020). A novel competitive swarm optimized rbf neural network model for short-term solar power generation forecasting. Neurocomputing, 397, 415–421. https://doi.org/10.1016/j.neucom.2019.09.110
  • Yildiz, B., Bilbao, J., & Sproul, A. (2017). A review and analysis of regression and machine learning models on commercial building electricity load forecasting. Renewable and Sustainable Energy Reviews, 73, 1104–1122. https://doi.org/10.1016/j.rser.2017.02.023
  • Zhang, J., Li, K., Irwin, G. W., & Zhao, W. (2012). A regression approach to LS-SVM and sparse realization based on fast subset selection. In 2012 10th World Congress on Intelligent Control and Automation (WCICA) (pp. 612–617). IEEE.
  • Zhang, L., Li, K., Bai, E.-W., & Irwin, G. W. (2015). Two-stage orthogonal least squares methods for neural network construction. IEEE Transactions on Neural Networks and Learning Systems, 26(8), 1608–1621. https://doi.org/10.1109/TNNLS.2014.2346399
  • Zhang, L., Zhang, D., Sun, J., Wei, G., & Bo, H. (2019). Salient object detection by local and global manifold regularized svm model. Neurocomputing, 340, 42–54. https://doi.org/10.1016/j.neucom.2019.02.041
  • Zhao, W., Zhang, J., & Li, K. (2015). An efficient ls-svm-based method for fuzzy system construction. IEEE Transactions on Fuzzy Systems, 23(3), 627–643. https://doi.org/10.1109/TFUZZ.2014.2321594
  • Zhu, Q. (2003). A back propagation algorithm to estimate the parameters of non-linear dynamic rational models. Applied Mathematical Modelling, 27(3), 169–187. https://doi.org/10.1016/S0307-904X(02)00097-5
  • Zhu, Q. (2005). An implicit least squares algorithm for nonlinear rational model parameter estimation. Applied Mathematical Modelling, 29(7), 673–689. https://doi.org/10.1016/j.apm.2004.10.008
  • Zhu, Q., & Billings, S. (1996). Fast orthogonal identification of nonlinear stochastic models and radial basis function neural networks. International Journal of Control, 64(5), 871–886. https://doi.org/10.1080/00207179608921662
  • Zhu, Q., Wang, Y., Zhao, D., Li, S., & Billings, S. A. (2015). Review of rational (total) nonlinear dynamic system modelling, identification, and control. International Journal of Systems Science, 46(12), 2122–2133. https://doi.org/10.1080/00207721.2013.849774
  • Zhu, Q., Yu, D., & Zhao, D. (2017). An enhanced linear Kalman filter (enlkf) algorithm for parameter estimation of nonlinear rational models. International Journal of Systems Science, 48(3), 451–461. https://doi.org/10.1080/00207721.2016.1186243
  • Zhu, Q., Zhang, W., Zhang, J., & Sun, B. (2019). U-neural network-enhanced control of nonlinear dynamic systems. Neurocomputing, 352, 12–21. https://doi.org/10.1016/j.neucom.2019.04.008

Appendices

Appendix 1.

Simplifying the regularised cost function

According to the principles of matrix manipulation, the regularised cost function is simplified in Equation (A1). For brevity, write $M = \Phi^{T}\Phi + \lambda\Psi^{T}\Psi$. The second equality follows by substituting the minimising parameter vector $\Theta = M^{-1}(\Phi^{T}Y - \lambda\Psi^{T}E)$, and the subsequent steps use the fact that $Y^{T}\Phi M^{-1}\Psi^{T}E = E^{T}\Psi M^{-1}\Phi^{T}Y$, since both are scalars and $M$ is symmetric.
\[
\begin{aligned}
J &= Y^{T}Y - \Theta^{T}\Phi^{T}Y - Y^{T}\Phi\Theta + \Theta^{T}\Phi^{T}\Phi\Theta + \lambda\Theta^{T}\Psi^{T}\Psi\Theta + \lambda\Theta^{T}\Psi^{T}E + \lambda E^{T}\Psi\Theta + \lambda E^{T}E\\
&= Y^{T}Y + \lambda E^{T}E - (Y^{T}\Phi - \lambda E^{T}\Psi)M^{-1}\Phi^{T}Y - Y^{T}\Phi M^{-1}(\Phi^{T}Y - \lambda\Psi^{T}E)\\
&\quad + (Y^{T}\Phi - \lambda E^{T}\Psi)M^{-1}(\Phi^{T}\Phi + \lambda\Psi^{T}\Psi)M^{-1}(\Phi^{T}Y - \lambda\Psi^{T}E)\\
&\quad + \lambda(Y^{T}\Phi - \lambda E^{T}\Psi)M^{-1}\Psi^{T}E + \lambda E^{T}\Psi M^{-1}(\Phi^{T}Y - \lambda\Psi^{T}E)\\
&= Y^{T}Y + \lambda E^{T}E - 2Y^{T}\Phi M^{-1}\Phi^{T}Y + 4\lambda E^{T}\Psi M^{-1}\Phi^{T}Y\\
&\quad + (Y^{T}\Phi - \lambda E^{T}\Psi)M^{-1}(\Phi^{T}Y - \lambda\Psi^{T}E) - 2\lambda^{2}E^{T}\Psi M^{-1}\Psi^{T}E\\
&= Y^{T}Y + \lambda E^{T}E - Y^{T}\Phi M^{-1}\Phi^{T}Y + 2\lambda E^{T}\Psi M^{-1}\Phi^{T}Y - \lambda^{2}E^{T}\Psi M^{-1}\Psi^{T}E\\
&= Y^{T}\bigl(I - \Phi M^{-1}\Phi^{T}\bigr)Y + 2\lambda E^{T}\Psi M^{-1}\Phi^{T}Y + \lambda E^{T}\bigl(I - \lambda\Psi M^{-1}\Psi^{T}\bigr)E
\end{aligned}
\tag{A1}
\]
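The chain of matrix identities in (A1) is easy to mis-transcribe, so a quick numerical sanity check can be helpful. The short Python sketch below is an illustration added for this purpose and is not part of the original derivation: it draws random $\Phi$, $\Psi$, $Y$ and $E$, substitutes the minimiser $\Theta = M^{-1}(\Phi^{T}Y - \lambda\Psi^{T}E)$ into the expanded cost on the first line of (A1), and compares the value with the closed form on the last line; the two should agree to within floating-point error.

import numpy as np

rng = np.random.default_rng(0)
N, n, lam = 50, 6, 0.3                      # data length, number of candidate terms, regularisation weight

Phi = rng.standard_normal((N, n))           # numerator regression matrix (Phi)
Psi = rng.standard_normal((N, n))           # denominator-term matrix used in the regulariser (Psi)
Y = rng.standard_normal((N, 1))             # output vector
E = rng.standard_normal((N, 1))             # residual vector

M = Phi.T @ Phi + lam * Psi.T @ Psi
Theta = np.linalg.solve(M, Phi.T @ Y - lam * Psi.T @ E)    # minimiser substituted in (A1)

# Expanded cost, first line of (A1)
J_expanded = (Y.T @ Y - Theta.T @ Phi.T @ Y - Y.T @ Phi @ Theta
              + Theta.T @ Phi.T @ Phi @ Theta + lam * Theta.T @ Psi.T @ Psi @ Theta
              + lam * Theta.T @ Psi.T @ E + lam * E.T @ Psi @ Theta + lam * E.T @ E).item()

# Simplified closed form, last line of (A1)
Minv = np.linalg.inv(M)
I = np.eye(N)
J_closed = (Y.T @ (I - Phi @ Minv @ Phi.T) @ Y
            + 2 * lam * E.T @ Psi @ Minv @ Phi.T @ Y
            + lam * E.T @ (I - lam * Psi @ Minv @ Psi.T) @ E).item()

assert np.isclose(J_expanded, J_closed)
print(J_expanded, J_closed)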

Appendix 2.

Proof of properties of the recursive matrices

From the definition of $M_k^{-1}$ in Equation (29), and substituting Equations (30) and (31) into the definition of $R_{k+1}^{(1)}$ in Equation (35), the recursive formulation of $R_{k+1}^{(1)}$ is obtained in Equation (A2). For brevity, write $m_{k+1} = \Phi_k^{T}\phi_{k+1} + \lambda\Psi_k^{T}\psi_{k+1}$ and $s_{k+1} = \phi_{k+1}^{T}\phi_{k+1} + \lambda\psi_{k+1}^{T}\psi_{k+1} - m_{k+1}^{T}M_k^{-1}m_{k+1}$; by the block matrix inversion lemma, the blocks of $M_{k+1}^{-1}$ are $F_k = M_k^{-1} + M_k^{-1}m_{k+1}m_{k+1}^{T}M_k^{-1}/s_{k+1}$, $g_k = -M_k^{-1}m_{k+1}/s_{k+1}$ and $f_k = 1/s_{k+1}$. Then
\[
\begin{aligned}
R_k^{(1)} - R_{k+1}^{(1)} &= \Phi_{k+1}M_{k+1}^{-1}\Phi_{k+1}^{T} - \Phi_k M_k^{-1}\Phi_k^{T}\\
&= \begin{bmatrix}\Phi_k & \phi_{k+1}\end{bmatrix}\begin{bmatrix}F_k & g_k\\ g_k^{T} & f_k\end{bmatrix}\begin{bmatrix}\Phi_k^{T}\\ \phi_{k+1}^{T}\end{bmatrix} - \Phi_k M_k^{-1}\Phi_k^{T}\\
&= \Phi_k F_k\Phi_k^{T} + \phi_{k+1}g_k^{T}\Phi_k^{T} + \Phi_k g_k\phi_{k+1}^{T} + f_k\,\phi_{k+1}\phi_{k+1}^{T} - \Phi_k M_k^{-1}\Phi_k^{T}\\
&= \frac{\Phi_k M_k^{-1}m_{k+1}m_{k+1}^{T}M_k^{-1}\Phi_k^{T} - \phi_{k+1}m_{k+1}^{T}M_k^{-1}\Phi_k^{T} - \Phi_k M_k^{-1}m_{k+1}\phi_{k+1}^{T} + \phi_{k+1}\phi_{k+1}^{T}}{s_{k+1}}\\
&= \frac{\bigl(\Phi_k M_k^{-1}m_{k+1} - \phi_{k+1}\bigr)\bigl(m_{k+1}^{T}M_k^{-1}\Phi_k^{T} - \phi_{k+1}^{T}\bigr)}{s_{k+1}}\\
&= \frac{\bigl(R_k^{(1)}\phi_{k+1} + \lambda R_k^{(3)}\psi_{k+1}\bigr)\bigl(\phi_{k+1}^{T}(R_k^{(1)})^{T} + \lambda\psi_{k+1}^{T}(R_k^{(3)})^{T}\bigr)}{\phi_{k+1}^{T}\phi_{k+1} + \lambda\psi_{k+1}^{T}\psi_{k+1} - \bigl(\phi_{k+1}^{T}\Phi_k + \lambda\psi_{k+1}^{T}\Psi_k\bigr)M_k^{-1}\bigl(\Phi_k^{T}\phi_{k+1} + \lambda\Psi_k^{T}\psi_{k+1}\bigr)}\\
&= \frac{\bigl(R_k^{(1)}\phi_{k+1} + \lambda R_k^{(3)}\psi_{k+1}\bigr)\bigl(\phi_{k+1}^{T}(R_k^{(1)})^{T} + \lambda\psi_{k+1}^{T}(R_k^{(3)})^{T}\bigr)}{\phi_{k+1}^{T}R_k^{(1)}\phi_{k+1} + \lambda\psi_{k+1}^{T}R_k^{(2)}\psi_{k+1} + 2\lambda\phi_{k+1}^{T}R_k^{(3)}\psi_{k+1}}
\end{aligned}
\tag{A2}
\]
Similarly, it can be proved that Equations (39) and (40) hold.
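As with Appendix 1, the rank-one update in (A2) can be verified numerically. The sketch below is an illustration added here rather than part of the original proof; it assumes, consistently with the expressions appearing in (A2) but without reproducing Equation (35), that $R_k^{(1)} = I - \Phi_k M_k^{-1}\Phi_k^{T}$, $R_k^{(2)} = I - \lambda\Psi_k M_k^{-1}\Psi_k^{T}$ and $R_k^{(3)} = -\Phi_k M_k^{-1}\Psi_k^{T}$, so these definitions should be treated as assumptions. It forms $R_{k+1}^{(1)}$ directly from the augmented matrices and compares $R_k^{(1)} - R_{k+1}^{(1)}$ with the right-hand side of (A2).

import numpy as np

rng = np.random.default_rng(1)
N, k, lam = 40, 5, 0.2                      # data length, current model size, regularisation weight

Phi_k = rng.standard_normal((N, k))         # current regressor matrix
Psi_k = rng.standard_normal((N, k))         # current denominator-term matrix
phi = rng.standard_normal((N, 1))           # new candidate column phi_{k+1}
psi = rng.standard_normal((N, 1))           # corresponding column psi_{k+1}

def r_matrices(Phi, Psi):
    # Assumed forms of R^(1), R^(2), R^(3), with M = Phi^T Phi + lam * Psi^T Psi
    M = Phi.T @ Phi + lam * Psi.T @ Psi
    Minv = np.linalg.inv(M)
    I = np.eye(Phi.shape[0])
    return I - Phi @ Minv @ Phi.T, I - lam * Psi @ Minv @ Psi.T, -Phi @ Minv @ Psi.T

R1_k, R2_k, R3_k = r_matrices(Phi_k, Psi_k)
R1_k1, _, _ = r_matrices(np.hstack([Phi_k, phi]), np.hstack([Psi_k, psi]))

# Right-hand side of (A2): rank-one numerator divided by the scalar s_{k+1}
u = R1_k @ phi + lam * R3_k @ psi
s = (phi.T @ R1_k @ phi + lam * psi.T @ R2_k @ psi + 2 * lam * phi.T @ R3_k @ psi).item()
rhs = u @ u.T / s

assert np.allclose(R1_k - R1_k1, rhs)
print("max deviation:", np.abs(R1_k - R1_k1 - rhs).max())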