Research Article

An online self-adaptive RBF network algorithm based on the Levenberg-Marquardt algorithm

Article: 2146800 | Received 20 Jul 2022, Accepted 04 Nov 2022, Published online: 21 Nov 2022

ABSTRACT

To address the problems that the Levenberg-Marquardt (LM) algorithm cannot train a radial basis function (RBF) neural network online and that existing RBF network structure design methods are deficient, this paper proposes an online self-adaptive algorithm for constructing RBF neural networks (OSA-RBFNN) based on the LM algorithm. The ideas of the sliding window method and online structure optimization are adopted to solve these problems. On the one hand, the sliding window method enables the RBF network to be trained online by the LM algorithm, making the RBF network more robust to changes in the learning parameters and faster to converge than the other investigated algorithms. On the other hand, online structure optimization adjusts the structure of the RBF network based on the training errors and the information of the hidden nodes to track non-linear time-varying systems, which helps to maintain a compact network and satisfactory generalization ability. Finally, simulation analysis demonstrates that OSA-RBFNN yields a compact RBF network.

Introduction

The RBF neural network has been extensively applied to industrial control, pattern classification, signal processing, modeling of non-linear systems, and other areas, because it offers strong non-linear fitting ability, fast convergence, strong robustness, and a low tendency to become trapped in local minima (Gao Citation2022; La Rosa Centeno et al. Citation2018; Zhang et al. Citation2020; Zhou, Oh, and Qiu Citation2022). Structure construction methods and parameter tuning algorithms are the keys to constructing RBF neural networks (Gu, Tok, and Yu Citation2018). Structure construction is the strategy used to determine the number of hidden nodes in the RBF network; parameter tuning is how the three kinds of parameters of the network are adjusted (Kadakadiyavar, Ramrao, and Singh Citation2020).

In recent years, considerable advances have been made in the structural construction of RBF networks. One algorithm dynamically adapts the weights of the participating kernels using the gradient descent method, thereby removing the need for predetermined weights (Khan et al. Citation2017). A second method realizes online modeling by combining the advantages of the sliding window strategy and a clustering algorithm (Jia, Li, and Qiao Citation2022). A third, a multi-kernel RBF neural network in which every base kernel has its own weight, provides better performance, such as a faster convergence rate and resilience against poor local minima (Atif et al. Citation2022).

Regarding architectural optimization of RBF neural networks, the most classic online algorithms are the resource-allocating network (RAN), the minimal resource-allocating network (MRAN), and the generalized growing and pruning radial basis function network (GGAP-RBF). All of these algorithms design the network online so as to meet the accuracy requirements, maintain a compact structure, and improve generalization performance (Meng, Zhang, and Qiao Citation2021). It is therefore important to design the neural network online. However, these algorithms still have deficiencies.

John Platt (Platt Citation1991) presented RAN, which tests each training sample continuously. When a new training sample satisfies the "novelty" criteria, a new node is allocated to that sample. However, once hidden nodes are added, they are never pruned in this algorithm. Redundant hidden nodes therefore inevitably appear in the network for complex online learning tasks, degrading the network's generalization performance.

To address this problem, Lu Y. W. et al. (Lu, Sundararajan, and Saratchandran Citation1997) proposed MRAN, an improvement of RAN. If the deviation of the current network is too large for multiple consecutive training samples, a hidden node is added; if several consecutive training samples fail to activate a hidden node, that node is pruned (Jia, Li, and Qiao Citation2022). To a certain extent, MRAN can thus obtain a compact network model. However, the addition of nodes is still blind, because the center of the kernel function is determined randomly, which results in poor robustness and generalization performance (Arif, Ray Chaudhuri, and Ray et al. Citation2009).

To address the shortcomings of MRAN, the GGAP-RBF neural network (Li, Chen, and Huang Citation2006) links the required learning accuracy to the significance of the neurons in the learning algorithm to realize a compact RBF network. However, the network needs to initialize its parameters based on all samples, so it is difficult to realize a truly online optimal algorithm for the RBF network.

The RAN, MRAN, and GGAP-RBF algorithms all use the gradient descent method for parameter learning. Gradient descent is a first-order algorithm; its main problems are slow convergence and a tendency to become trapped in local minima on curved error surfaces. In these situations, second-order algorithms are superior. The Levenberg-Marquardt (LM) algorithm (Houcine Bergou, Diouane, and Kungurtsev Citation2020) is a second-order algorithm that blends the steepest descent method and the Gauss-Newton method: when the gradient of the error surface is small, the LM algorithm behaves like the steepest descent method, and when the gradient is large, it behaves like the Gauss-Newton method. LM can estimate the learning rate for each gradient on the curved error surface according to the Hessian matrix, and it is efficient for training neural networks compared with first-order algorithms (Wilamowski and Yu Citation2010).

The error correction (ErrCor) algorithm (Hao et al. Citation2014) applies the LM algorithm to train the RBF network and, after each iteration, adds a new RBF node at the location of the highest error peak or lowest error valley, which yields a much better learning ability. This algorithm is robust and can design a compact network (Xi et al. Citation2018). However, it is an offline design method and is difficult to apply to non-linear time-varying systems.

Motivated by the above problems, we propose an online self-adaptive algorithm for the RBF network based on the LM algorithm. The algorithm builds a sliding window and uses the LM algorithm to train the RBF network. During training, it can add, prune, or merge hidden nodes according to the training error and the relevant information of each hidden node, which keeps the RBF network structure compact during learning and thereby ensures the generalization performance of the network. The sliding window method makes the online application of the LM algorithm possible and makes the RBF network more robust to changes in the learning parameters and easier to converge. Finally, the performance of OSA-RBFNN is verified by simulation experiments.

The remainder of this paper is organized as follows. Section 1 briefly introduces the RBF network. Section 2 presents the online self-adaptive optimal algorithm for the RBF network in detail. Section 3 evaluates OSA-RBFNN through simulation analysis. Finally, the study is concluded in Section 4.

Materials and Methods

RBF Network

The RBF network is a feedforward network comprising three layers: the input layer, the hidden layer, and the output layer. The weights connecting the input layer to the hidden layer are fixed at 1. Without loss of generality, let the RBF network structure be I-H-1, that is, I input nodes, H hidden nodes, and one output node. The structure is shown in Figure 1.

Figure 1. Structure of RBF networks.

Let x_p = [x_{p,1}, x_{p,2}, ..., x_{p,I}] be the pth I-dimensional input sample of the RBF network. The output of the hth hidden node is given by Equation (1):

(1) \varphi_h(\mathbf{x}_p) = \exp\left(-\frac{\lVert \mathbf{x}_p - \mathbf{c}_h \rVert^2}{\sigma_h}\right)

where c_h and σ_h are the center vector and width of the hth RBF node, respectively, and ‖·‖ denotes the Euclidean distance.

The output of the network for the pth sample is given by Equation (2):

(2) O_p = \sum_{h=1}^{H} w_h \varphi_h(\mathbf{x}_p) + w_0

where w_h denotes the weight connecting the hth hidden node to the output node and w_0 is the bias.
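
As a concrete illustration of Equations (1) and (2), the following Python/NumPy sketch evaluates the hidden-node outputs and the network output for a batch of samples. The array layout and the variable names are our own assumptions and are not prescribed by the algorithm.

import numpy as np

def rbf_forward(X, centers, widths, weights, bias):
    """Equations (1)-(2): Gaussian hidden outputs and the network output.

    X       : (P, I) input samples
    centers : (H, I) kernel centers c_h
    widths  : (H,)   kernel widths sigma_h
    weights : (H,)   output weights w_h
    bias    : scalar w_0
    """
    # Squared Euclidean distances ||x_p - c_h||^2, shape (P, H)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / widths)           # Equation (1)
    return phi @ weights + bias, phi     # Equation (2)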

LM Algorithm

The LM algorithm is a second-order algorithm that has been successfully applied to back-propagation (BP) networks. For training RBF networks with the LM algorithm, Wilamowski and Yu (Wilamowski and Yu Citation2010) improved the LM computation, and on this basis the error correction (ErrCor) algorithm (Hao et al. Citation2014) constructs the RBF network structure based on the peaks of the training error. ErrCor is a growing algorithm and can design a compact RBF network structure.

When the LM algorithm trains the RBF network, the parameter update rule is given by Equation (3):

(3) \Delta_{k+1} = \Delta_k - (\mathbf{Q}_k + \mu_k \mathbf{I})^{-1} \mathbf{g}_k

where Δ denotes the tuning parameters of the RBF network (the centers c, widths σ, and output weights w); Q denotes the quasi-Hessian matrix; I is the identity matrix; μ is the learning coefficient; and g is the gradient vector.

The training error e_p is the difference between the desired output y_p and the actual output o_p, as shown in Equation (4):

(4) e_p = y_p - o_p

The element j_{p,n} of the pth row of the Jacobian matrix can be calculated by Equation (5):

(5) j_{p,n} = \frac{\partial e_p}{\partial \Delta_n}

where n indexes the tuning parameters (centers, widths, and output weights) of the RBF network.

The gradient vector g is calculated as the sum of the sub-vectors η_p in Equation (6),

(6) \mathbf{g} = \sum_{p=1}^{P} \boldsymbol{\eta}_p

and the sub-vector η_p is calculated as in Equation (7),

(7) \boldsymbol{\eta}_p = \mathbf{j}_p^{T} e_p

where j_p is the pth row of the Jacobian matrix and e_p is calculated by Equation (4).

The calculation of the quasi-Hessian matrix Q is likewise transformed into a sum of sub-matrices, as in Equation (8):

(8) \mathbf{Q} = \sum_{p=1}^{P} \mathbf{q}_p, \qquad \mathbf{q}_p = \mathbf{j}_p^{T} \mathbf{j}_p

For a given training sample p, and considering the tuning parameters w_h, c_{h,i}, and σ_h of the RBF network, the Jacobian row elements are calculated by Equation (9):

(9) \mathbf{j}_p = \left[ \frac{\partial e_p}{\partial w_0}, \frac{\partial e_p}{\partial w_1}, \ldots, \frac{\partial e_p}{\partial w_H};\; \frac{\partial e_p}{\partial c_{1,1}}, \ldots, \frac{\partial e_p}{\partial c_{1,I}}, \ldots, \frac{\partial e_p}{\partial c_{H,1}}, \ldots, \frac{\partial e_p}{\partial c_{H,I}};\; \frac{\partial e_p}{\partial \sigma_1}, \ldots, \frac{\partial e_p}{\partial \sigma_H} \right]

Combining Equations (1), (2), and (4) with the differential chain rule, the elements of the Jacobian row of the pth training sample can be rewritten as Equations (10)–(12):

(10) \frac{\partial e_p}{\partial w_h} = -\varphi_h(\mathbf{x}_p), \qquad \frac{\partial e_p}{\partial w_0} = -1
(11) \frac{\partial e_p}{\partial c_{h,i}} = -\frac{2 w_h \varphi_h(\mathbf{x}_p)\,(x_{p,i} - c_{h,i})}{\sigma_h}
(12) \frac{\partial e_p}{\partial \sigma_h} = -\frac{w_h \varphi_h(\mathbf{x}_p)\,\lVert \mathbf{x}_p - \mathbf{c}_h \rVert^2}{\sigma_h^{2}}

Using Equations (10)–(12), all elements of the Jacobian row of the pth sample can be calculated; repeating this for all input samples gives the whole Jacobian matrix. The quasi-Hessian matrix Q and the gradient vector g are then obtained from Equations (6)–(8), and the update rule of Equation (3) is applied to adjust the parameters.
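
To make this procedure concrete, the sketch below (building on the rbf_forward sketch above and its NumPy import) assembles the Jacobian rows of Equations (9)–(12), forms Q and g as in Equations (6)–(8), and applies the update of Equation (3). The parameter packing order is our own choice, and the learning coefficient μ is kept fixed here, whereas a full LM implementation would also adapt it after each accepted or rejected step.

def lm_step(X, y, centers, widths, weights, bias, mu):
    """One LM update of all RBF parameters (Equations 3-12)."""
    P, I = X.shape
    H = centers.shape[0]
    out, phi = rbf_forward(X, centers, widths, weights, bias)
    e = y - out                                      # Equation (4)
    diff = X[:, None, :] - centers[None, :, :]       # (P, H, I)
    d2 = (diff ** 2).sum(axis=2)                     # (P, H)
    J = np.zeros((P, 1 + H + H * I + H))             # packing: [w_0, w, c, sigma]
    J[:, 0] = -1.0                                   # Equation (10), bias term
    J[:, 1:1 + H] = -phi                             # Equation (10), output weights
    dc = -2.0 * weights[None, :, None] * phi[:, :, None] * diff / widths[None, :, None]
    J[:, 1 + H:1 + H + H * I] = dc.reshape(P, H * I) # Equation (11), centers
    J[:, 1 + H + H * I:] = -weights * phi * d2 / widths ** 2   # Equation (12), widths
    Q = J.T @ J                                      # Equations (6)-(8)
    g = J.T @ e
    step = np.linalg.solve(Q + mu * np.eye(J.shape[1]), g)
    new = np.concatenate([[bias], weights, centers.ravel(), widths]) - step   # Equation (3)
    return (new[1 + H:1 + H + H * I].reshape(H, I),  # updated centers
            new[1 + H + H * I:],                     # updated widths
            new[1:1 + H],                            # updated weights
            new[0])                                  # updated bias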

Online Self-Adaptive Optimal Algorithm for RBF Network (OSA-RBFNN)

Although the LM algorithm is currently an effective algorithm for training neural networks, it can only be used in batch mode when training the RBF network. The ErrCor algorithm is therefore an offline method for designing the RBF network structure; it cannot be performed online and is difficult to apply to non-linear time-varying systems. In addition, since the RAN, MRAN, and GGAP-RBF online algorithms all use only the latest single sample to train the RBF network, they tend to reach poor local optima, and noise in that sample directly degrades the learning accuracy. For online modeling, a more reasonable approach is to use the latest multiple samples to adjust the network parameters dynamically during learning (Arif, Ray Chaudhuri, and Ray et al. Citation2009).

These problems are worked out in this paper by using the sliding window method (Pedro and Ruano Citation2009). The sliding window is a first-in-first-out (FIFO) queue of fixed length whose elements are the online input samples, entered in chronological order. Let the online input sample be (x_n, y_n); then the elements of a sliding window of length L are (x_i, y_i), (x_{i+1}, y_{i+1}), ..., (x_{i+L-1}, y_{i+L-1}). When a new sample arrives, the window is updated by including the latest sample and eliminating the oldest. All samples in the sliding window are used as RBF network training samples.
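
Such a window can be maintained, for example, with a fixed-length deque; the snippet below is a minimal sketch of the FIFO behavior described above, with the window length and sample format chosen only for illustration.

from collections import deque

L = 20                                  # window length (example value)
window = deque(maxlen=L)                # FIFO queue of (x, y) pairs

def receive_sample(x, y):
    """Append the newest sample; the oldest is dropped automatically once the window is full."""
    window.append((x, y))
    return list(window)                 # the samples currently used to train the RBF network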

When the sliding window method is used and the LM algorithm is applied to train the RBF network, the objective function of RBF network learning is given by Equation (13):

(13) e_L = \sum_{i=1}^{L} \beta_i (y_i - o_i)^2

where L is the length of the sliding window; y_i and o_i denote the desired and actual outputs of the ith sample in the sliding window; and β_i is a forgetting factor, given by Equation (14):

(14) \beta_i = \frac{2i}{L(L+1)}, \qquad \sum_{i=1}^{L} \beta_i = 1

According to Equations (13) and (14), the latest samples carry the most information for online learning, so the weighting coefficients of the latest samples in the sliding window are larger, while those of the older samples are smaller.
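
Equation (14) can be evaluated directly; the short check below (our own illustration) confirms that the coefficients sum to one and increase linearly toward the newest sample.

import numpy as np

def forgetting_factors(L):
    """Equation (14): beta_i = 2*i / (L*(L+1)) for i = 1..L."""
    i = np.arange(1, L + 1)
    return 2.0 * i / (L * (L + 1))

beta = forgetting_factors(20)
assert abs(beta.sum() - 1.0) < 1e-12    # weights sum to 1; the newest sample (i = L) gets the largest weight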

The online self-adaptive optimal algorithm for the RBF network proposed in this paper has three structural operations: adding, pruning, and merging hidden nodes.

Adding hidden nodes. During online training, the maximum error over the samples in each sliding window is detected and recorded. When the RBF network has been trained for a certain number of steps and the root mean square error (RMSE) of the training samples in the sliding window has not reached the target value, one hidden node is added to the hidden layer, with its kernel function center set to the training sample corresponding to the currently recorded maximum training error. The RMSE of the training samples in the sliding window is given by Equation (15):

(15) e_{\mathrm{rmse}} = \sqrt{\frac{\sum_{i=1}^{L} \beta_i (y_i - o_i)^2}{L}}
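
One possible coding of this add-node test is sketched below: the weighted window RMSE of Equation (15) is compared against a target value, and the sample with the largest error becomes the center of the new node. The target value, the initial width, and the function names are illustrative assumptions.

import numpy as np

def window_rmse(errors, beta):
    """Equation (15): forgetting-factor weighted RMSE over the sliding window."""
    return np.sqrt(np.sum(beta * errors ** 2) / len(errors))

def maybe_add_node(X_win, errors, beta, centers, widths, weights, rmse_target, sigma_init=0.1):
    """Add one hidden node centered at the max-error sample if the window RMSE misses the target."""
    if window_rmse(errors, beta) > rmse_target:
        worst = np.argmax(np.abs(errors))
        centers = np.vstack([centers, X_win[worst]])   # new center at the max-error sample
        widths = np.append(widths, sigma_init)         # assumed initial width
        weights = np.append(weights, 0.0)              # new output weight, tuned later by the LM updates
    return centers, widths, weights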

Pruning hidden nodes. As learning progresses, some hidden nodes become inactive and lose their ability to contribute to learning; such nodes should be detected and removed. The criterion for judging whether a hidden node is active is given in Equation (16) (Lu, Sundararajan, and Saratchandran Citation1997):

(16) \phi_h^{k}(\mathbf{x}^{k}) = w_h^{k} \exp\left(-\frac{\lVert \mathbf{x}^{k} - \mathbf{c}_h \rVert^2}{\sigma_h}\right), \qquad r_h^{k} = \frac{\lvert \phi_h^{k} \rvert}{\lvert \phi_{\max}^{k} \rvert}

where φ_h^k is the output of the hth hidden node at time k, w_h^k is the weight connecting hidden node h to the output node at time k, r_h^k is the normalized output of the hth hidden node at time k, and φ_max^k is the hidden-node output with the largest absolute value at time k. If r_h^k is less than a threshold δ for multiple consecutive training samples, the node is pruned.
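
The pruning test of Equation (16) could be coded as below; the threshold δ and the number of consecutive low-activity samples required are assumptions chosen for illustration.

import numpy as np

def prune_mask(weighted_outputs, delta=0.01, consecutive=10):
    """Equation (16): mark nodes whose normalized output stays below delta.

    weighted_outputs : (T, H) array of w_h^k * phi_h^k over the last T time steps
    Returns a boolean mask of the hidden nodes to prune.
    """
    r = np.abs(weighted_outputs) / np.abs(weighted_outputs).max(axis=1, keepdims=True)
    return np.all(r[-consecutive:] < delta, axis=0)    # inactive for `consecutive` steps in a row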

Merging hidden nodes. If, during learning, the centers and widths of two hidden nodes in the RBF network are very close to each other, then because of the local response characteristic of RBF hidden nodes the functions of the two nodes are almost identical, and they can be merged into one. This operation simplifies the RBF network structure while making almost no difference to the learning performance of the current network. The parameters of the merged node are set according to Equation (17):

(17) w_i = w_i + w_j, \qquad \mathbf{c}_i = (\mathbf{c}_i + \mathbf{c}_j)/2, \qquad \sigma_i = \max(\sigma_i, \sigma_j)
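
The merge rule of Equation (17) can be written compactly as below; the text does not specify the closeness thresholds used to pick the pair (i, j), so that decision is left outside this sketch.

import numpy as np

def merge_nodes(centers, widths, weights, i, j):
    """Equation (17): merge node j into node i and remove node j."""
    weights[i] = weights[i] + weights[j]
    centers[i] = (centers[i] + centers[j]) / 2.0
    widths[i] = max(widths[i], widths[j])
    keep = np.arange(len(widths)) != j
    return centers[keep], widths[keep], weights[keep]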

Using the sliding window method makes the online application of the LM algorithm possible and makes the RBF network more robust to changes in the learning parameters and faster to converge. The structure optimization algorithm combines the advantages of RAN, MRAN, and GGAP-RBF while overcoming their shortcomings: each new hidden node is added with its kernel function center set to the training sample corresponding to the currently recorded maximum training error, which not only reduces the training error but also avoids adding hidden nodes at random locations.

Combining the above components, the online self-adaptive optimal algorithm for the RBF network is obtained; it is summarized in Table 1.

Table 1. Algorithmic depiction of OSA-RBFNN.

Results and Discussion

Computational Complexity Analysis

In this subsection, we compare the computational complexity of traditional RBF neural network algorithms with that of OSA-RBFNN.

To build a compact RBF neural network structure, we adopt the sliding window method based on the LM algorithm and optimize the structure by adjusting the parameters continuously. The RAN, MRAN, and GGAP algorithms all use the gradient descent method, while OSA-RBFNN uses the LM algorithm for training. Let m denote the number of iterations required to reach the target training error, and let s denote the number of training steps.

The gradient descent method converges slowly and needs many iterations, but the time complexity of one iteration is approximately O(n), so the computational complexity of training can be approximated as O(msn). The LM algorithm converges quickly with a small number of iterations, but one iteration has time complexity O(n^3), so the computational complexity can be approximated as O(msn^3). Judged by these expressions alone, the gradient descent method appears better than the LM algorithm. However, for the RBF neural network, the LM algorithm can calculate the learning rate of each gradient on the curved error surface according to the Hessian matrix, which greatly reduces the number of iterations compared with gradient descent. Moreover, OSA-RBFNN based on the LM algorithm uses the latest samples to adjust the parameters dynamically, which reduces the number of training steps and the iterations needed to reach the target training error. Thus, in practice, the computational cost of the LM algorithm within OSA-RBFNN compares favorably with that of the gradient descent method.

Non-Linear Function Approximation

We propose OSA-RBFNN for constructing a minimal RBF structure. Following the form of Equation (1), we build the non-linear function in Equation (18), which is the sum of six Gaussian exponential functions (Yingwei, Sundararajan, and Saratchandran Citation1997); the RBF network should therefore have six Gaussian nodes in its hidden layer. In Figure 2, "△" indicates the true positions of the hidden unit centers of the non-linear function, and the circles represent the estimated center positions obtained by the minimal RBF network.

Figure 2. The attribution of the true and estimated centers.

(18) y(\mathbf{x}) = \exp\left(-\frac{(x_1-0.3)^2+(x_2-0.2)^2}{0.01}\right) + \exp\left(-\frac{(x_1-0.7)^2+(x_2-0.2)^2}{0.01}\right) + \exp\left(-\frac{(x_1-0.1)^2+(x_2-0.5)^2}{0.02}\right) + \exp\left(-\frac{(x_1-0.9)^2+(x_2-0.5)^2}{0.01}\right) + \exp\left(-\frac{(x_1-0.3)^2+(x_2-0.8)^2}{0.01}\right) + \exp\left(-\frac{(x_1-0.7)^2+(x_2-0.8)^2}{0.01}\right)

The aim is to construct a minimal RBF network that approximates this function with small error. For this approximation, 2000 training samples (x_1, x_2, y) were generated randomly, where (x_1, x_2) are the inputs and y is the output, with x_i ∈ [0, 1], i ∈ {1, 2}. The length of the sliding window L was set to 20.
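
As an illustration of this setup, the training data could be generated as follows; only the function of Equation (18), the sample count, and the input range come from the text, while the sampling code itself is our own sketch.

import numpy as np

def target_function(x1, x2):
    """Equation (18): the sum of six Gaussian bumps on the unit square."""
    params = [(0.3, 0.2, 0.01), (0.7, 0.2, 0.01), (0.1, 0.5, 0.02),
              (0.9, 0.5, 0.01), (0.3, 0.8, 0.01), (0.7, 0.8, 0.01)]
    return sum(np.exp(-((x1 - a) ** 2 + (x2 - b) ** 2) / c) for a, b, c in params)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 2))     # 2000 random samples in [0, 1]^2
y = target_function(X[:, 0], X[:, 1])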

Because of the randomness of the training results, the experiment was run 20 times independently. In 16 of the 20 runs, the algorithm generated six hidden nodes, matching the structure of the non-linear function; in the remaining runs it produced seven hidden nodes. As Table 2 shows, the centers, widths, and output weights of the hidden nodes are quite close to the true values. Thus, OSA-RBFNN can accurately approximate the non-linear function with a minimal network size.

Table 2. Performance comparison of true and estimated value on object function.

Non-Linear Time-Varying System Identification

We choose the benchmark Mackey-Glass (MG) chaotic time series (Harpham and Dawson Citation2006; Jiang et al. Citation2022), generated by the delay differential Equation (19), to test the ability of OSA-RBFNN for non-linear time-varying system identification.

(19) \frac{dx}{dt} = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t)

We use the input vector [x(t), x(t-6), x(t-12), x(t-18)] to predict the output x(t+50). Equation (19) generates a static chaotic time series when τ is constant. We set τ equal to 17, 30, 50, and 100 separately to test the ability of the online self-adaptive RBF network algorithm to model time-varying systems; the chaotic behavior of the system increases with the delay coefficient. Figure 3 shows how the relation between x(t) and x(t + 50) becomes more chaotic as τ increases.
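
For reference, a simple Euler discretization of Equation (19) and the input/output embedding described above could look like the following sketch; the integration step and initial history are assumptions rather than values given in the paper.

import numpy as np

def mackey_glass(n, tau, dt=1.0, x0=1.2):
    """Euler integration of Equation (19) with delay tau (illustrative settings)."""
    hist = int(tau / dt)
    x = np.full(n + hist, x0)
    for t in range(hist, n + hist - 1):
        x_tau = x[t - hist]
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[t])
    return x[hist:]

def embed(x, horizon=50):
    """Build inputs [x(t), x(t-6), x(t-12), x(t-18)] and targets x(t+50)."""
    t = np.arange(18, len(x) - horizon)
    X = np.stack([x[t], x[t - 6], x[t - 12], x[t - 18]], axis=1)
    return X, x[t + horizon]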

Figure 3. Chaotic behavior of x(t) and x(t + 50).

To build a time-varying system, the static MG sequences with delay coefficients of 17, 30, 50, and 100 were mixed according to Equations (20) and (21):

(20) f(x_t) = \alpha(t) f_1(x_t) + (1 - \alpha(t)) f_2(x_t)
(21) \alpha(t) = \exp(-5t/T), \qquad t = 1, 2, \ldots, T

The four static MG sequences were mixed through six transitions, i.e. τ = 17→30, 30→50, 50→100, 100→50, 50→30, and 30→17, with each transition containing 500 data points. This yields a total of 3000 samples; the first 2500 samples were selected as training samples and the last 500 for testing. The length of the sliding window L was set to 200.
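
The transition between two static sequences in Equations (20) and (21) could be implemented as below; f1 and f2 stand for two already generated MG segments of equal length T (here 500), which is our assumption about how the mixing is applied.

import numpy as np

def mix_segments(f1, f2):
    """Equations (20)-(21): blend two static MG segments with an exponentially decaying weight."""
    T = len(f1)
    t = np.arange(1, T + 1)
    alpha = np.exp(-5.0 * t / T)              # Equation (21)
    return alpha * f1 + (1.0 - alpha) * f2    # Equation (20)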

Figure 4 shows that at the beginning of learning the training error is large, but it is suppressed quickly by adding hidden nodes, and OSA-RBFNN then models the actual output well. Figure 5 shows that the training RMSE is large at the beginning of learning and fluctuates slightly while the network is being constructed, but the overall trend converges. Figure 6 shows that when t equals 241 the number of hidden nodes reaches 36, which essentially completes the construction of the network by adding nodes. Over the whole learning process, the maximum number of hidden nodes reaches 41, and the network has 39 nodes at the end of training.

Figure 4. The performance of online learning.

Figure 5. The performance of RMSE.

Figure 6. The change in hidden nodes.

In Figure 7, the maximum testing error is within 0.03, and the average error over the 500 testing samples is 0.0072. For this experiment, we performed 20 independent runs with the sliding window length L set to 100, 150, and 200, respectively. The results show that the number of hidden nodes was between 39 and 45 and that the average error over the 500 testing samples ranged from 0.0072 to 0.0136, which is a stable result.

Figure 7. The testing error of 500 samples.

To further verify the effectiveness of the proposed method, we compare OSA-RBFNN with RAN and MRAN. Table 3 indicates that the training RMSE and testing error of OSA-RBFNN on the MG series are lower than those of the other algorithms.

Table 3. The performance of comparison of different algorithms.

The reasons are that we combine the advantages of RAN and MRAN. RAN can modify the parameters of existing nodes when adding a new one, which improves the learning speed, but a node cannot be pruned once it has been added to the RBF network. MRAN can prune inactive hidden nodes to decrease the number of redundant hidden nodes in the RBF network. We also use the LM algorithm to estimate the learning rate of each gradient on the curved error surface according to the Hessian matrix, so the learning ability of the RBF network is improved compared with RAN and MRAN. Moreover, we employ the sliding window method so that the latest samples are used to adjust the parameters, which enhances the accuracy of learning and makes the network more stable.

eTS (Rong et al. Citation2006); OSAMNN (Qiao, Zhang, and Bo Citation2012)

Conclusion

Aiming at the deficiency that the LM algorithm cannot train the RBF network online, we propose OSA-RBFNN based on the LM algorithm. We combine the sliding window method with the LM algorithm to build an online self-adaptive network, and we adopt the operations of adding, pruning, and merging hidden nodes to optimize the RBF network structure. New hidden nodes are added directly at the training sample with the maximum training error, which effectively suppresses the training error of the current network and avoids adding hidden nodes randomly. Pruning and merging hidden nodes simplify the network structure with little impact on the performance of the RBF network. Finally, simulation analyses of non-linear function approximation and non-linear time-varying system identification demonstrate that the proposed OSA-RBFNN realizes online modeling with a compact and stable structure and achieves good generalization ability with a minimal structure.

Code Availability

Sample code is available on GitHub (https://github.com/YLiu000222/OSARBFNN).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work is supported by the [Basic Research plan of Natural Science Foundation of Shaanxi Coal Joint Fund] under Grant [No.2019JLZ-08]; [Basic Research Plan of Nature Science in Shaanxi Province of China] under Grant [No.2020JM-522].

References

  • Arif, J., N. Ray Chaudhuri, S. Ray, et al. 2009. Online Levenberg-Marquardt algorithm for neural network based estimation and control of power systems. 2009 International Joint Conference on Neural Networks. Atlanta, GA, USA, 199–3809. IEEE.
  • Atif, S. M., S. Khan, I. Naseem, R. Togneri, and M. Bennamoun. 2022. Multi-kernel fusion for RBF neural networks. Neural Processing Letters. doi:10.1007/s11063-022-10925-3.
  • Gao, X. 2022. A nonlinear prediction model for Chinese speech signal based on RBF neural network. Multimedia Tools and Applications 81:5033–49. doi:10.1007/s11042-021-11612-6.
  • Gu, L., D. K. S. Tok, and D. L. Yu. 2018. Development of adaptive p-step RBF network model with recursive orthogonal least squares training. Neural Computing and Applications 29 (5):1445–54. doi:10.1007/s00521-016-2669-x.
  • Hao, Y., P. D. Reiner, T. Xie, T. Bartczak, and B. M. Wilamowski. 2014. An incremental design of radial basis function networks. IEEE Transactions on Neural Networks and Learning Systems 25 (10):1793–803. doi:10.1109/TNNLS.2013.2295813.
  • Harpham, C., and C. W. Dawson. 2006. The effect of different basis functions on a radial basis function network for time series prediction: A comparative study. Neurocomputing 69 (16–18):2161–70. doi:10.1016/j.neucom.2005.07.010.
  • Houcine Bergou, E., Y. Diouane, and V. Kungurtsev. 2020. Convergence and complexity analysis of a Levenberg–Marquardt algorithm for inverse problems. Journal of Optimization Theory and Applications 185 (3):1–18. doi:10.1007/s10957-020-01666-1.
  • Jia, L., W. Li, and J. Qiao. 2022. An online adjusting RBF neural network for nonlinear system modeling. Applied Intelligence. Advance online publication. doi:10.1007/s10489-021-03106-7.
  • Jiang, Q., L. Zhu, C. Shu, and V. Sekar. 2022. An efficient multilayer RBF neural network and its application to regression problems. Neural Computing & Applications 34 (6):4133–50. doi:10.1007/s00521-021-06373-0.
  • Kadakadiyavar, S., N. Ramrao, and M. K. Singh. 2020. Efficient mixture control chart pattern recognition using adaptive RBF neural network. International Journal of Information Technology 12 (4):1271–80. doi:10.1007/s41870-019-00381-z.
  • Khan, S., I. Naseem, R. Togneri, and M. Bennamoun. 2017. A novel adaptive kernel for the RBF neural networks. Circuits Systems and Signal Processing 36 (4):1639–53. doi:10.1007/s00034-016-0375-7.
  • La Rosa Centeno, L., F. C. C. De Castro, M. C. F. De Castro, C. Müller, and S. M. Ribeiro. 2018. Cognitive radio signal classification based on subspace decomposition and RBF neural networks. Wireless Networks 24 (3):821–31. doi:10.1007/s11276-016-1376-y.
  • Li, S., Q. Chen, and G.-B. Huang. 2006. Dynamic temperature modeling of continuous annealing furnace using GGAP-RBF neural network. Neurocomputing 69 (4–6):523–36. doi:10.1016/j.neucom.2005.01.008.
  • Lu, Y. W., N. Sundararajan, and P. Saratchandran. 1997. A sequential learning scheme for function approximation using minimal radial basis function neural networks. Neural Computation 9 (2):461–78. doi:10.1162/neco.1997.9.2.461.
  • Meng, X., Y. Zhang, and J. Qiao. 2021. An adaptive task-oriented RBF network for key water quality parameters prediction in wastewater treatment process. Neural Computing & Applications 33 (17):11401–14. doi:10.1007/s00521-020-05659-z.
  • Pedro, M. F., and A. E. Ruano. 2009. Online sliding-window methods for process model adaptation. IEEE Transactions on Instrumentation and Measurement 58 (9):3012–20. doi:10.1109/TIM.2009.2016818.
  • Platt, J. 1991. A resource-allocating network for function interpolation. Neural Computation 3 (2):213–25. doi:10.1162/neco.1991.3.2.213.
  • Qiao, J., Z. Zhang, and Y. Bo. 2012. An online self-adaptive modular neural network for time-varying systems. Neurocomputing 125:7–16. doi:10.1016/j.neucom.2012.09.038.
  • Rong, H.-J., N. Sundararajan, G.-B. Huang, and P. Saratchandran. 2006. Sequential adaptive fuzzy inference system (SAFIS) for nonlinear system identification and prediction. Fuzzy Sets and Systems 157 (9):1260–75. doi:10.1016/j.fss.2005.12.011.
  • Wilamowski, B. M., and H. Yu. 2010. Improved computation for Levenberg-Marquardt training. IEEE Transactions on Neural Networks 21 (6):930–37. doi:10.1109/TNN.2010.2045657.
  • Xi, M., P. Rozycki, J.-F. Qiao, and B. M. Wilamowski. 2018. Nonlinear system modeling using RBF networks for industrial application. IEEE Transactions on Industrial Informatics 14 (3):931–40. doi:10.1109/TII.2017.2734686.
  • Yingwei, L., N. Sundararajan, and P. Saratchandran. 1997. Identification of time-varying nonlinear systems using minimal radial basis function neural networks. IEE Proceedings - Control Theory and Applications 144 (2):202–08. doi:10.1049/ip-cta:19970891.
  • Zhang, Y., D. Kim, Y. Zhao, and J. Lee. 2020. PD control of a manipulator with gravity and inertia compensation using an RBF neural network. International Journal of Control, Automation, and Systems 18 (12):3083–92. doi:10.1007/s12555-019-0482-x.
  • Zhou, K., S.K. Oh, and J. Qiu. 2022. Design of ensemble fuzzy-RBF neural networks based on feature extraction and multi-feature fusion for GIS partial discharge recognition and classification. Journal of Electrical Engineering & Technology 17:513–32. doi:10.1007/s42835-021-00941-z.