
An improved algorithm for incremental extreme learning machine

Pages 308-317 | Received 27 Nov 2019, Accepted 19 Apr 2020, Published online: 13 May 2020

Abstract

The incremental extreme learning machine (I-ELM) randomly obtains the input weights and the hidden layer neuron biases during the training process. Some hidden nodes play only a minor role in the network output, which may increase the network complexity and even reduce the stability of the network. To avoid this issue, this paper proposes an enhanced method for the I-ELM, referred to as the improved incremental extreme learning machine (II-ELM). At each learning step of the original I-ELM, an additional offset k is added to the hidden layer output before computing the output weight of the new hidden node, and the existence of the offset k is analysed. Compared with several improved ELM algorithms, the advantages of the II-ELM in training time, forecasting accuracy, and stability are verified on several benchmark datasets from the UCI database.

1. Introduction

The original extreme learning machine (ELM) (Huang, 2015; Huang et al., 2004) is a supervised learning algorithm based on single-hidden-layer feedforward neural networks (SLFNs) (Huang et al., 2015; Huang, Zhu, et al., 2006). Compared with several other neural networks, such as back propagation (BP) networks (Wang et al., 2015) and the support vector machine (SVM) (Yang et al., 2019), the ELM only needs to set the number of hidden nodes and does not need to adjust the input weights repeatedly, which brings considerable benefits, including less human intervention, faster learning speed, and promising generalization capability (Cao et al., 2020; Cui et al., 2018; Song et al., 2018). The advantages of the ELM in classification problems have likewise been validated in the literature (Inaba et al., 2018; Zhao et al., 2011).

However, the ELM also has some shortcomings. For example, the input weights and the biases of the hidden nodes are obtained randomly, which to a certain extent results in poor network stability. When there are outliers or disturbances in the training data, the hidden layer output matrix can become ill-conditioned, which leads to poor robustness, reduced generalization performance, and reduced forecasting accuracy.

At present, the ELM has been widely studied. ELM variants can be divided into the fixed ELM and the incremental extreme learning machine (I-ELM) (Huang, Chen, et al., 2006). The training process of the fixed ELM is a one-shot computation with a fast learning speed (Song et al., 2015). Nevertheless, choosing the best number of hidden layer neurons and the optimal weights and biases in the fixed ELM is a difficult problem. The literature (Cheng et al., 2019) proposes an improved training method for the ELM with genetic algorithms to obtain the optimal weights and biases after selection, crossover, and mutation. An ELM model based on a whale optimization algorithm improved by differential evolution is proposed in the literature (Zhou et al., 2020) and is applied to colour difference classification. The literature (Zhou, Wang, et al., 2019) proposes a hybrid grey wolf optimization algorithm based on fuzzy weights and a differential evolution algorithm to overcome the problem that the ELM randomly selects hidden layer weights and biases. Furthermore, the ELM is suitable for classification (Sun et al., 2019), image processing (Zhou, Gao, et al., 2019; Zhou, Chen, Song, Zhu & Liu), and online fault diagnosis (Zhou, Bo, et al., 2019), etc.

Compared with the ELM, the I-ELM has the property that the output error gradually decreases and approaches zero as the number of hidden neurons increases (Huang, Chen, et al., 2006). It is suitable for regression and classification problems in online continuous learning (Xu & Wang, 2016; Zhang et al., 2019). However, the training speed of the I-ELM is not as fast as that of the ELM, because the number of times the output weights are computed equals the number of hidden layer nodes. The input weights and the biases of the hidden nodes of the I-ELM are also obtained randomly, which causes the following two problems. First, some of the hidden neurons may play only a minor role in the network output because their output weights are too small, which further slows down the learning speed of the network. Secondly, these invalid hidden layer neurons increase the complexity of the network and slow down the error reduction during training. To deal with these problems, several improved methods have been put forward. In (Wu et al., 2017) a length-changeable incremental extreme learning machine (LCI-ELM) is proposed, which improves the generalization performance by adding several hidden nodes at a time, but it is difficult to determine the optimal number of neurons to add to the hidden layer. The literature (Zhou et al., 2018) proposes the regularization incremental extreme learning machine with random reduced kernel (RKRIELM), which combines a kernel function with the I-ELM to avoid randomness. In (Li et al., 2018) an effective hybrid approach based on a variable-length incremental ELM and particle swarm optimization (PSO-VIELM) is proposed to determine the proper hidden nodes and the corresponding input weights and hidden biases, but the learning speed decreases.

This paper proposes an improved I-ELM algorithm, referred to as the improved incremental extreme learning machine (II-ELM), which adds an offset k to the hidden-layer output to obtain the optimal output weights. The essential difference between the offset k in the II-ELM and the bias of the hidden nodes is that the bias is randomly determined before computing the output weights of the hidden layer, whereas the offset k in the II-ELM is determined when computing the output weight, by minimizing the residual error. The existence of the offset k is discussed, and the validity of the II-ELM is demonstrated by comparing it with the I-ELM, the convex incremental extreme learning machine (CI-ELM) (Huang & Chen, 2007), the enhanced incremental extreme learning machine (EI-ELM) (Huang & Chen, 2008), the bidirectional extreme learning machine (B-ELM) (Yang et al., 2012), the self-adaptive differential evolution extreme learning machine (SaDE-ELM) (Cao et al., 2012), the enhanced bidirectional extreme learning machine (EB-ELM) (Cao et al., 2019), and the genetic algorithm extreme learning machine (GA-ELM) (Cheng et al., 2019) on regression and classification problems from the UCI database.

The rest of the paper is organized as follows. Section 2 briefly reviews the basic principles of the I-ELM. Section 3 proposes the improved algorithm for the I-ELM. Comparisons of the simulations and an evaluation of the results are presented in Section 4. Section 5 concludes the paper.

2. Brief review of I-ELM

The structure of the I-ELM network is shown in Figure 1. It is composed of m inputs, l hidden nodes, and n outputs. ωi is the 1×m input weight matrix of the current hidden layer neuron, whose entries are random numbers uniformly distributed in [−1, 1].

Figure 1. The structure of I-ELM network.

bi is the bias of the i-th hidden node, and its value is a random number uniformly distributed in [−1, 1] when the activation function of the hidden layer neuron is the additive type given in (1). β is the l×n output weight matrix.

The activation function for additive hidden nodes is given below: (1) $g(x) = \dfrac{1}{1 + \exp(-x)}$ where x is the input matrix.
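For illustration, (1) can be implemented as an element-wise function; the following is a minimal sketch in Python/NumPy (the vectorized form and the function name are illustrative, not taken from the paper):

```python
import numpy as np

def sigmoid(x):
    """Additive-node activation g(x) = 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))
```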

Given a training set {(X, Y)}, where X is an m×N matrix representing the inputs of the N training samples and Y is an n×N matrix representing the corresponding outputs, the training steps of the I-ELM algorithm can be summarized as follows:

Step 1: Initialization. Let l = 0, set the maximum number of hidden nodes to L, and set the expected training accuracy to ϵ. The initial value of the residual E (the error between the actual output and the target output of the network) is set to the output Y.

Step 2: Training steps. While l < L and ‖E‖ > ϵ

  1. The number of hidden nodes l increases by 1: l = l + 1.

  2. The input weights ωl and the bias bl of the newly added hidden layer neuron ol are obtained randomly.

  3. Calculate the input x′ of the activation function g(·) for the node ol (the bias bl needs to be extended into a 1×N vector): (2) $x' = \omega_l X + b_l$

  4. Calculate the output vector $\bar{H}$ of the hidden layer neuron: (3) $\bar{H} = g(x')$

  5. Calculate the output weight for ol: (4) $\bar{\beta} = \dfrac{E\bar{H}^{T}}{\bar{H}\bar{H}^{T}}$

  6. Calculate the residual error after adding the new hidden node: (5) $E = E - \bar{\beta}\bar{H}$

The output weight obtained from (4) decreases the network error at the fastest rate. The above steps are repeated until the residual is smaller than the expected error ϵ. If l reaches L and the error is still larger than ϵ, the training process should be restarted; in most cases this is caused by the random determination of the input weights ωl and the biases bl. A minimal code sketch of this training loop is given at the end of this section.

Finally, according to the given testing set {(X’, Y’)}, one can test whether the trained network meets the requirements or not.
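To make the procedure concrete, the following is a minimal NumPy sketch of Steps 1–2 for additive sigmoid nodes. The function name `ielm_train` and the data layout (m×N inputs, n×N targets) follow the notation above, but the code itself is an illustration, not the authors' implementation.

```python
import numpy as np

def ielm_train(X, Y, L=200, eps=1e-4, seed=0):
    """Minimal I-ELM sketch. X: m x N input matrix, Y: n x N target matrix."""
    rng = np.random.default_rng(seed)
    m, N = X.shape
    E = Y.astype(float).copy()                   # residual E, initialised to the target output Y
    nodes = []                                   # (input weights, bias, output weight) per hidden node
    l = 0
    while l < L and np.linalg.norm(E) > eps:
        l += 1
        w = rng.uniform(-1.0, 1.0, size=(1, m))  # random input weights of the new node
        b = rng.uniform(-1.0, 1.0)               # random bias of the new node
        H = 1.0 / (1.0 + np.exp(-(w @ X + b)))   # 1 x N hidden-node output, Eqs. (2)-(3)
        beta = (E @ H.T) / (H @ H.T)             # n x 1 output weight, Eq. (4)
        E = E - beta * H                         # residual update, Eq. (5)
        nodes.append((w, b, beta))
    return nodes
```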

3. Proposed improved algorithm for the I-ELM (II-ELM)

To address the problem that the residual of the I-ELM network declines slowly during training, an improved algorithm is proposed in this section. First, the hidden-layer output H is adjusted by adding an additional offset k to each of its entries, giving the adjusted output H'. Then the output weight β' is calculated from H'. It should be noted that in some cases the network is optimal with k equal to 0, which means that the hidden node does not need the offset k. The computation of the offset k is as follows.

It is assumed that E = [e1,e2, … ,eN], H = [h1,h2, … ,hN], the value of the offset is k, and H'=[(h1+k),(h2+k), … , (hN+k)]. The following part shows how to obtain the optimal value of k.

  1. Calculate the residual error Z′ between E and H′:

According to (4), the output weight β′ at this point is (6) $\beta' = \dfrac{EH'^{T}}{H'H'^{T}} = \dfrac{\sum_{i=1}^{N} e_i (h_i + k)}{\sum_{i=1}^{N} (h_i + k)^2}$ The root mean square error, denoted as Z′, is calculated as follows: (7) $Z' = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\big(e_i - \beta'(h_i + k)\big)^2}$ Substituting the value of β′ obtained by (6) into (7) gives: (8) $Z' = \sqrt{\dfrac{1}{N}\left[\sum_{i=1}^{N} e_i^2 - \dfrac{\big(\sum_{i=1}^{N} e_i (h_i + k)\big)^2}{\sum_{i=1}^{N} (h_i + k)^2}\right]}$ From (7), it is obvious that the value of Z′ is always greater than or equal to zero regardless of the value of k. It means that the value of k can be in the range of (−∞, +∞).

  2. Calculate the value of k that minimizes Z′:

To minimize Z′, it suffices to minimize (9) $y(k) = \sum_{i=1}^{N} e_i^2 - \dfrac{\big(\sum_{i=1}^{N} e_i (h_i + k)\big)^2}{\sum_{i=1}^{N} (h_i + k)^2}$ Define $a_1 = \big(\sum_{i=1}^{N} e_i\big)^2$, $a_2 = 2\sum_{i=1}^{N} e_i \sum_{i=1}^{N} e_i h_i$, $a_3 = \big(\sum_{i=1}^{N} e_i h_i\big)^2$, $b_1 = N$, $b_2 = 2\sum_{i=1}^{N} h_i$, and $b_3 = \sum_{i=1}^{N} h_i^2$, so that the numerator and denominator of the fraction in (9) are $a_1 k^2 + a_2 k + a_3$ and $b_1 k^2 + b_2 k + b_3$, respectively. The derivative of (9) can then be written as: (10) $y'(k) = -\dfrac{(a_1 b_2 - a_2 b_1)k^2 + 2(a_1 b_3 - a_3 b_1)k + (a_2 b_3 - a_3 b_2)}{(b_1 k^2 + b_2 k + b_3)^2}$

To calculate the extreme points of y(k) in (9), denote c1 = a1b2 − a2b1, c2 = 2(a1b3 − a3b1), c3 = a2b3 − a3b2, and let y′(k) = 0.

The derivative y′(k) fails to exist only where the denominator of (10) vanishes, i.e. where $b_1k^2 + b_2k + b_3 = \sum_{i=1}^{N}(h_i + k)^2 = 0$, which requires all entries of H to be equal. However, when the entries of H are all equal, one has $a_1/b_1 = a_2/b_2 = a_3/b_3 = \big(\sum_{i=1}^{N} e_i\big)^2/N$, and from (9) $y(k) = \sum_{i=1}^{N} e_i^2 - \big(\sum_{i=1}^{N} e_i\big)^2/N$ is a constant with y′(k) = 0 everywhere. Therefore, there is no point at which the derivative y′(k) fails to exist.

The extreme points obtained from (10) are also the extreme points of Z′ in (7) and (8). The value of k that minimizes Z′ therefore lies at an extreme point, at a point where the derivative does not exist, or at a boundary point (k = +∞ or k = −∞).

However, when k = +∞ or k = −∞, the output weight β′ is zero according to (6), which is not useful. Furthermore, since there is no point at which y′(k) fails to exist, the value of k that minimizes Z′ is at an extreme point; this value of k should be taken and the corresponding output weight β′ computed. If the value of k that minimizes Z′ lies only at the boundary points, the hidden layer node is invalid. When Z′ is a constant (the entries of the hidden layer output are all equal, which is highly unlikely), the reduction of the network error is the same regardless of the value of k; therefore, k can be set to 0 and the corresponding output weight β′ calculated accordingly.

Several types of extreme points of y(k) are discussed as follows:

  1. c1≠0.

The equation y′(k) = 0 can then be treated as a quadratic equation in a single variable, and one should determine whether its solutions exist.

Note that $a_2^2 = 4a_1a_3$. The discriminant of the quadratic is (11) $\Delta = c_2^2 - 4c_1c_3 = \big[2(a_1b_3 + a_3b_1) - a_2b_2\big]^2 \ge 0$ Because Δ ≥ 0, there are two possible values for the extreme point k, denoted k1 and k2: (12) $k_1 = \dfrac{-c_2 + \sqrt{c_2^2 - 4c_1c_3}}{2c_1}$ (13) $k_2 = \dfrac{-c_2 - \sqrt{c_2^2 - 4c_1c_3}}{2c_1}$ When Δ > 0, the two extreme points are distinct: one is a minimum point, and the other is a maximum point.

In this case, $y'(k) = -c_1(k - k_1)(k - k_2)/(b_1k^2 + b_2k + b_3)^2$. The error Z′ rises or falls monotonically according to whether y′(k) > 0 or y′(k) < 0. It follows that as k ranges from negative infinity to positive infinity, the error Z′ first increases gradually from the error e (the error before adding the current hidden node) to a maximum, then decreases gradually to a minimum, and finally increases gradually back to the error e (or the reverse of this process). Therefore, the minimum point minimizes the error Z′. By substituting the two extreme points into (8), the global minimum of Z′ can be identified, and the corresponding output weight β′ can then be obtained from (6).

When Δ = 0, the two extreme points merge into one, and then (14) $2(a_1b_3 + a_3b_1) - a_2b_2 = 0$ Because $a_2^2 = 4a_1a_3$, if one of a1, a2, and a3 is zero, the other two are also zero under (14) (note that b1 = N ≠ 0 and b3 > 0, since the probability that all entries of H are equal is almost zero). If a1, a2, and a3 were all zero, then c1 = 0, which violates the assumption of this case. Thus a1, a2, and a3 are all nonzero, and (14) can be written as: (15) $\dfrac{b_1}{a_1} + \dfrac{b_3}{a_3} = \dfrac{2b_2}{a_2}$

Given that a1, a3, b1, and b3 are greater than zero, (15) can be written as (16) $\left(\sqrt{\dfrac{b_1}{a_1}} - \sqrt{\dfrac{b_3}{a_3}}\right)^2 + 2\sqrt{\dfrac{b_1}{a_1}\cdot\dfrac{b_3}{a_3}} - \dfrac{2b_2}{a_2} = 0$ Since $a_2^2 = 4a_1a_3$ and $4b_1b_3 \ge b_2^2$ (by the Cauchy–Schwarz inequality, $N\sum_{i=1}^{N} h_i^2 \ge \big(\sum_{i=1}^{N} h_i\big)^2$), dividing both sides of $4b_1b_3 \ge b_2^2$ by $a_2^2$ gives (17) $\sqrt{\dfrac{b_1}{a_1}\cdot\dfrac{b_3}{a_3}} \ge \dfrac{b_2}{a_2}$

According to (16) and (17), the left-hand side of (16) can only vanish term by term, so one has (18) $\dfrac{b_1}{a_1} = \dfrac{b_3}{a_3}$ Substituting (18) into (15), one has (19) $\dfrac{b_1}{a_1} = \dfrac{b_2}{a_2} = \dfrac{b_3}{a_3}$

Then c1 would be equal to zero, which violates the previous assumption. It can be concluded that Δ = 0 only if c1 = 0; therefore, Δ ≠ 0 when c1 ≠ 0.

  2. c1 = 0 and c2 ≠ 0.

In this case, the value of the extreme point k is: (20) $k = -\dfrac{c_3}{c_2}$

Because c1 = 0 and $c_1 = a_1b_2 - a_2b_1 = 2\sum_{i=1}^{N} e_i \left(\sum_{i=1}^{N} e_i \sum_{i=1}^{N} h_i - N\sum_{i=1}^{N} e_i h_i\right)$, one has $\sum_{i=1}^{N} e_i = 0$ or $\sum_{i=1}^{N} e_i \sum_{i=1}^{N} h_i - N\sum_{i=1}^{N} e_i h_i = 0$.

If $\sum_{i=1}^{N} e_i = 0$, then $c_2 = 2(a_1b_3 - a_3b_1) = -2N\big(\sum_{i=1}^{N} e_i h_i\big)^2$, thus c2 < 0 (since c2 ≠ 0). If k is less than the extreme point, then y′(k) < 0 and the error Z′ decreases monotonically; if k is greater than the extreme point, then y′(k) > 0 and the error Z′ increases monotonically. Therefore the extreme point is a minimum, and the error Z′ first decreases gradually from the error e to the minimum and then increases gradually back to the error e. The value of k at this point is taken, and the corresponding output weight β′ is obtained from (6).

If $\sum_{i=1}^{N} e_i \sum_{i=1}^{N} h_i - N\sum_{i=1}^{N} e_i h_i = 0$, then $\big(\sum_{i=1}^{N} e_i\big)^2 \big(\sum_{i=1}^{N} h_i\big)^2 = N^2\big(\sum_{i=1}^{N} e_i h_i\big)^2$ and $c_2 = \dfrac{2\big(\sum_{i=1}^{N} e_i\big)^2 \left[N\sum_{i=1}^{N} h_i^2 - \big(\sum_{i=1}^{N} h_i\big)^2\right]}{N}$. It has been proven that $N\sum_{i=1}^{N} h_i^2 - \big(\sum_{i=1}^{N} h_i\big)^2 \ge 0$, thus c2 > 0 (c2 ≠ 0). If k is less than the extreme point, then y′(k) > 0; if k is greater than the extreme point, then y′(k) < 0. Therefore the extreme point is a maximum, and the error Z′ first increases gradually from the error e to the maximum and then decreases gradually back to the error e. Hence such a hidden node is invalid and is discarded.

  3. c1 = 0 and c2 = 0.

In this case, k = 0.

When c1 = a1b2 − a2b1 = 0 and c2 = 2(a1b3 − a3b1) = 0, the details are discussed as follows.

If a1 = 0, then a2 = 0 (since $a_2^2 = 4a_1a_3$); because b1 = N ≠ 0 and c2 = 0, one also has a3 = 0, so c3 = a2b3 − a3b2 = 0. If a1 ≠ 0, then b2 = a2b1/a1 and b3 = a3b1/a1, so c3 = a2b3 − a3b2 = a2a3b1/a1 − a2a3b1/a1 = 0.

Thus, if c1 = 0 and c2 = 0, then c3 = 0. In this case y′(k) = 0 for all k and the error Z′ is a constant. Therefore k can be set to 0, and the corresponding output weight β′ calculated accordingly. A code sketch summarizing this case analysis is given below.
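The following sketch (illustrative NumPy; the helper name `optimal_offset` and the tolerance are assumptions, and a single network output is assumed, as in the derivation) computes a1–a3, b1–b3, c1–c3 and returns the offset k for a new hidden node, flagging the invalid-node case:

```python
import numpy as np

def optimal_offset(e, h, tol=1e-12):
    """Select the offset k minimising Z'(k) in Eq. (8) for one hidden node.

    e : residuals e_1..e_N, h : hidden-node outputs h_1..h_N (1-D arrays).
    Returns (k, valid); valid=False marks an invalid node (case c1 = 0, c2 > 0).
    """
    N = len(e)
    se, sh = e.sum(), h.sum()
    seh, sh2 = (e * h).sum(), (h * h).sum()
    a1, a2, a3 = se**2, 2.0 * se * seh, seh**2        # numerator coefficients of (9)
    b1, b2, b3 = float(N), 2.0 * sh, sh2              # denominator coefficients of (9)
    c1 = a1 * b2 - a2 * b1
    c2 = 2.0 * (a1 * b3 - a3 * b1)
    c3 = a2 * b3 - a3 * b2

    def err2(k):  # quantity minimised in Eq. (8), without the 1/N factor and square root
        return (e * e).sum() - (e @ (h + k)) ** 2 / ((h + k) @ (h + k))

    if abs(c1) > tol:                                 # case 1: two extreme points, Eqs. (12)-(13)
        d = np.sqrt(max(c2 * c2 - 4.0 * c1 * c3, 0.0))
        k1, k2 = (-c2 + d) / (2.0 * c1), (-c2 - d) / (2.0 * c1)
        return (k1, True) if err2(k1) <= err2(k2) else (k2, True)
    if abs(c2) > tol:                                 # case 2: single extreme point, Eq. (20)
        if c2 < 0:                                    # extreme point is a minimum
            return -c3 / c2, True
        return 0.0, False                             # extreme point is a maximum: node invalid
    return 0.0, True                                  # case 3: Z' constant, keep k = 0
```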

The discussion above determines the offset k that adjusts the output H′, together with the corresponding optimal output weight β′. It shows that the II-ELM algorithm always selects the point where the error is minimized, and that the II-ELM is the same as the I-ELM when the offset k = 0. Therefore, when a new node is added, the output error of the II-ELM can be reduced faster than that of the I-ELM under the same initial error.

The training procedure of the II-ELM differs from that of the I-ELM in only one respect: substep 5 in Step 2 is modified to compute the offset k together with the output weight β′ of the newly added hidden node, as sketched below. Consequently, the training time per added hidden node of the II-ELM is almost the same as that of the I-ELM, and the training speed of the II-ELM is faster than that of the I-ELM for the same expected error ϵ.
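Accordingly, substep 5 might be replaced by something like the following sketch (again illustrative, not the authors' code; it reuses the hypothetical `optimal_offset` helper above and assumes a single network output, with E and H being the 1×N residual and hidden-node output from the I-ELM loop):

```python
def ii_elm_update(E, H):
    """II-ELM version of substep 5 (single-output case): shift the hidden output by k,
    then compute beta' and the updated residual. Uses the `optimal_offset` sketch above."""
    k, valid = optimal_offset(E.ravel(), H.ravel())  # offset of the new node
    if not valid:
        return E, None, None                         # invalid node: discard it and draw a new one
    Hp = H + k                                       # shifted hidden output H' = [h_1+k, ..., h_N+k]
    beta = (E @ Hp.T) / (Hp @ Hp.T)                  # output weight beta', Eq. (6)
    return E - beta * Hp, beta, k                    # residual update, as in Eq. (5)
```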

4. Performance evaluation

In this section, in order to validate the effectiveness of the II-ELM, several simulations on regression and classification problems are carried out to compare the performance of the II-ELM with that of the I-ELM. In addition, the degrees to which the II-ELM, CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM improve on the I-ELM for regression problems are also compared. All datasets in this paper are selected from the UCI database. To make the comparison fair, the data used for the II-ELM are identical to those used for the CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM. The normalized data lie in the range (−1, 1). The error ϵ in the simulations is set to 0.0001. All simulations have been carried out in Matlab R2014b running on a desktop PC (2.99 GHz CPU, 4 GB RAM, and Windows 7 OS) under the same environment.

4.1 Regression simulation

Table 1 shows the specification of the ten real regression problems used in the simulations. In order to verify that the II-ELM, compared with the I-ELM, avoids the problems of too-small output weights and invalid hidden layer neurons by adding the offset k, this section examines the distribution of the number of hidden layer neurons whose output weights have absolute values in the interval (0, 0.1) for the two models. The number of hidden layer neurons of the I-ELM and II-ELM is set to 200 in both cases. The two models are each run 100 times, and the average values are reported. The simulation results are shown in Figure 2.

Figure 2. Comparisons of the distribution of the absolute value of the output weight between I-ELM network and II-ELM network on Boston housing. (a) Distribution of the I-ELM output weight absolute values in the interval (0, 0.1). (b) Distribution of the II-ELM output weight absolute values in the interval (0, 0.1). (c) Distribution of the I-ELM output weight absolute values in the interval (0, 0.005). (d) Distribution of the II-ELM output weight absolute values in the interval (0, 0.005).

Table 1. Specification of benchmark on regression problems.

In the I-ELM and II-ELM, respectively, 190 and 184 hidden layer neurons have output weights β whose absolute values fall in the interval (0, 0.1). From Figure 2(a) and (b), it can be seen that the number of hidden layer neurons whose output weight absolute values fall in the interval (0, 0.02) is significantly smaller for the II-ELM than for the I-ELM, while in the interval (0.02, 0.1) it is significantly larger. For further comparison, the numbers of hidden layer neurons whose output weight absolute values fall in the interval (0, 0.005) are shown in Figure 2(c) and (d): 70 for the I-ELM and 30 for the II-ELM.

According to the above results, the number of invalid hidden layer neurons in the II-ELM is significantly smaller than in the I-ELM after the output offset k is added. The main idea is to find an optimal offset k and add it to the output vector H of the newly added neuron before calculating its output weight in each training cycle. The optimal offset k minimizes (8), which means that the residual error declines more quickly and, according to (5), output weights β close to zero are largely avoided. Therefore the II-ELM not only reduces the number of invalid neurons in the I-ELM network and speeds up learning, but also enhances the stability of the network structure and improves the forecasting accuracy.

Figure 3 shows the decline curves of the testing error of the networks obtained by the I-ELM, II-ELM, CI-ELM, EI-ELM, B-ELM, SaDE-ELM, and EB-ELM with 200 hidden nodes. Ten candidate hidden nodes are randomly generated at each step of the EB-ELM. It can be seen that the error of the II-ELM declines much faster than that of the I-ELM, CI-ELM, EI-ELM, B-ELM, SaDE-ELM, and EB-ELM.

Figure 3. Comparisons between I-ELM network, II-ELM network, CI-ELM network, EI-ELM network, B-ELM network, EB-ELM network, and SaDE-ELM network on Abalone data. (a) Training results on Abalone. (b) Training results on Boston housing.

Tables 2–4 show the training results obtained by the I-ELM, II-ELM, CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM, respectively. The number of hidden nodes is set to 200, and ten candidate hidden nodes are randomly generated at each step of the EB-ELM. The initial values of the GA parameters are determined experimentally to achieve the best performance and are as follows: the population size is 30, the crossover probability is Pc = 0.9, the mutation probability is Pm = 0.01, the number of GA iterations is 100, the target root mean square error is 1×10−5, and an elite strategy is used (four elite individuals are passed directly to the next generation). Average results are collected over 100 trials for the regression problems under the same experimental conditions. In addition, the tables also show the training results of the II-ELM with 30 hidden nodes, denoted II-ELM (30).

Table 2. Training time (Second).

Table 3. Testing RMSE.

Table 4. Standard deviation of testing RMSE.

From Table 2, it is observed that with the same number of hidden nodes, the ranking of the training time of the eight algorithms (I-ELM, II-ELM, CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM) is: I-ELM < II-ELM < EI-ELM < SaDE-ELM < CI-ELM < EB-ELM < B-ELM < GA-ELM. The reason is that, instead of random generation, the optimization of the hidden neurons in the improved I-ELM variants always requires a greater computational workload. Nevertheless, as Table 2 shows, the training time of the II-ELM is almost the same as that of the I-ELM. Compared with the other improved I-ELM based methods, the advantage of the II-ELM in training speed is evident.

From Table 3, it is observed that on the CCPP, CCS, and gas turbine datasets, the ranking of the testing RMSE of the eight algorithms (I-ELM, II-ELM, CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM) is: GA-ELM < II-ELM < EB-ELM < CI-ELM < EI-ELM < B-ELM < I-ELM < SaDE-ELM. On these three datasets, the forecasting accuracy of the II-ELM and GA-ELM models is very close. On the remaining seven datasets, the ranking of the testing RMSE is: II-ELM < GA-ELM < EB-ELM < CI-ELM < EI-ELM < B-ELM < I-ELM < SaDE-ELM. The GA-ELM model uses the GA to optimize the random input weights and thresholds of the ELM model, which greatly improves the accuracy; however, it requires many GA iterations, so the training time increases greatly, as can be seen from Table 2. Considering both forecasting accuracy and training time, the II-ELM model is superior to the GA-ELM model.

From Table 4, it is observed that with the same number of hidden nodes, the ranking of the standard deviation of the testing RMSE of the I-ELM, II-ELM, CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM is: II-ELM < EB-ELM < GA-ELM < B-ELM < EI-ELM < I-ELM < CI-ELM < SaDE-ELM. It can therefore be concluded that, compared with the CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM, the network stability of the II-ELM is also higher.

From Tables 2–4, the following observation can also be made. Comparing the II-ELM with 30 hidden nodes against the other algorithms with 200 hidden nodes, the training time of the II-ELM is greatly reduced while its testing RMSE and the standard deviation of the testing RMSE are almost the same as those of the other algorithms. Thus, the learning speed of the II-ELM is faster than that of the CI-ELM, EI-ELM, I-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM at the same error level.

4.2 Classification simulation

Table 5 gives the specification of the ten real classification problems used in the simulations. Figure 4 shows the rising curves of the classification accuracy of the networks obtained by the I-ELM, II-ELM, B-ELM, and SaDE-ELM when 200 hidden nodes are added. It can be seen from Figure 4 that the accuracy of the II-ELM rises much faster than that of the I-ELM, B-ELM, and SaDE-ELM, for both additive hidden nodes and RBF (radial basis function) hidden nodes.

Figure 4. Comparisons between the I-ELM network, II-ELM network, B-ELM network, and SaDE-ELM network on the Banknote authentication data.

Table 5. Specification of the benchmark on classification problems.

Table 6 shows the training results on the ten real classification problems obtained by the I-ELM and the II-ELM with 200 hidden nodes added.

Table 6. Correct rate of the classification.

From Table 6, it is observed that with the same number of hidden nodes, the classification accuracy of the II-ELM is higher than that of the I-ELM, for both additive hidden nodes and RBF hidden nodes.

5. Conclusion

This paper proposes an enhanced algorithm for the I-ELM in which an offset k is added to the output of each new hidden node, which makes the training error of the network decrease more rapidly for the same input weights and biases of the hidden nodes. The algorithm improves both the training speed of the network and the testing accuracy. Comparing the CI-ELM, EI-ELM, B-ELM, SaDE-ELM, EB-ELM, and GA-ELM with the II-ELM on regression problems, the simulation results show that the II-ELM is better in terms of learning speed, network accuracy, and network stability. Comparing the II-ELM with the I-ELM, B-ELM, and SaDE-ELM on classification problems, the simulation results demonstrate that the II-ELM also performs better.

However, the II-ELM model proposed in this paper also has certain limitations. It can only reduce the number of invalid neurons in the network and cannot eliminate all of them, and the input weights and thresholds of the II-ELM model are still obtained randomly. In future work, the input weights and thresholds of the II-ELM could be further optimized.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [grant number 51767005] and the Natural Science Foundation of Guangxi Province [grant number 2016GXNSFAA380327].

References