
Binary atom search optimisation approaches for feature selection

Pages 406-430 | Received 07 Sep 2019, Accepted 09 Mar 2020, Published online: 17 Mar 2020

Abstract

Atom search optimisation (ASO) is a recently proposed metaheuristic algorithm that has proved effective on several benchmark tests. In this paper, we propose binary variants of atom search optimisation (BASO) for wrapper feature selection. In the proposed scheme, eight transfer functions from the S-shaped and V-shaped families are used to convert the continuous ASO into a binary version. The proposed BASO approaches are employed to select a subset of significant features for efficient classification. Twenty-two well-known benchmark datasets acquired from the UCI machine learning repository are used for performance validation. In the experiment, the BASO variant with the transfer function that contributes the best classification performance is identified. Particle swarm optimisation (PSO), binary differential evolution (BDE), the binary bat algorithm (BBA), the binary flower pollination algorithm (BFPA), and the binary salp swarm algorithm (BSSA) are used to evaluate the efficacy and efficiency of the proposed approaches in feature selection. Our experimental results reveal the superiority of the proposed BASO in terms of both high prediction accuracy and a minimal number of selected features.

1. Introduction

With the rapid growth of information technology, the amount of data is growing exponentially. Ideally, all the information offered by the features is meaningful, and many pattern recognition applications extract features to describe the target concept in classification tasks. However, datasets normally contain irrelevant and redundant features, which significantly degrade the performance of the classification model (Wang et al., Citation2018; Peng et al., Citation2005). An excessive number of features not only introduces extra computational complexity but also increases the prediction error (Labani et al., Citation2018).

Feature selection is one of the effective ways to resolve the issues above. Briefly, feature selection attempts to select a small subset of significant features that can maintain or improve the prediction accuracy (Liu et al., Citation2018). In general, feature selection can be categorised into wrapper and filter approaches. Wrapper approaches utilise a learning algorithm to evaluate the optimal feature subset (Mafarja & Mirjalili, Citation2017). Filter approaches, on the other hand, use information theory and statistical analysis over the feature space to remove redundant and irrelevant features (Labani et al., Citation2018). In comparison with wrappers, filter approaches usually run faster and are independent of the learning algorithm; however, wrapper approaches often achieve better classification results (Xue et al., Citation2014). Thus, this study focuses on wrapper feature selection.

The main goals of feature selection are to improve classification performance and reduce the number of features, so it can be considered a combinatorial optimisation task (Faris et al., Citation2018). Wrapper feature selection is commonly performed using metaheuristic algorithms such as the genetic algorithm (GA) (Huang & Wang, Citation2006), ant colony optimisation (ACO) (Al-Ani, Citation2005), binary particle swarm optimisation (BPSO) (Chuang et al., Citation2008), binary grey wolf optimisation (BGWO) (Emary et al., Citation2016a), the binary salp swarm algorithm (BSSA) (Faris et al., Citation2018), the binary tree growth algorithm (BTGA) (Too et al., Citation2018), the binary multi-verse optimiser (BMVO) (Faris et al., Citation2017), and binary differential evolution (BDE) (Zorarpacı & Özel, Citation2016). Previous works showed that metaheuristic algorithms have high potential for solving the feature selection problem, which has earned them considerable attention from researchers. However, according to the No Free Lunch (NFL) theorem, no universal metaheuristic algorithm can solve all feature selection problems effectively (Wolpert & Macready, Citation1997). Therefore, new algorithms continue to be needed for efficient feature selection.

In a past study, Wang et al. (Citation2017) proposed a modified binary-coded ant colony optimisation (MBACO) in which a GA was used to generate a population of high-quality initial solutions for image classification. Rodrigues et al. (Citation2014) applied the binary bat algorithm (BBA) with the optimum-path forest classifier as part of the evaluation for feature selection. Moreover, Mafarja et al. (Citation2019) introduced the binary grasshopper optimisation algorithm (BGOA) for feature selection tasks; the authors indicated that the use of S-shaped and V-shaped transfer functions allowed the BGOA to search the binary search space. Another study shows that integrating a transfer function into the binary butterfly optimisation algorithm (BBOA) can effectively tackle feature selection problems (Arora & Anand, Citation2019). More recent studies of wrapper feature selection can be found in (Al-Tashi et al., Citation2019; Emary et al., Citation2016b; Faris et al., Citation2018; Mafarja et al., Citation2018; Mirhosseini & Nezamabadi-pour, Citation2018; Pashaei & Aydin, Citation2017).

Atom search optimisation (ASO) is a recently established metaheuristic algorithm inspired by the concepts of molecular dynamics (Zhao et al., Citation2019a). To date, ASO has attracted the attention of various researchers due to its efficacy in solving global optimisation problems in different applications. In comparison with the water drop algorithm (WDA) (Siddique & Adeli, Citation2014), particle swarm optimisation (PSO) (Kennedy, Citation2011), GA, and the gravitational search algorithm (GSA) (Rashedi et al., Citation2009), ASO can usually find promising solutions. Hence, ASO is a potential metaheuristic algorithm for other real-world applications such as feature selection. To the best of our knowledge, no study has applied ASO to feature selection, which motivates this work and encourages us to develop a binary version of ASO to tackle the feature selection problem in classification tasks.

In this study, we propose new binary versions of atom search optimisation (BASO) for wrapper feature selection. BASO integrates an S-shaped or V-shaped transfer function, which allows the search agents to move in a binary search space. Twenty-two benchmark datasets acquired from the UCI machine learning repository are used to validate the performance of the proposed BASO, and the results are compared with five other recent and popular algorithms. The experiments show that BASO is highly capable of identifying the optimal feature subset, which leads to promising results.

The rest of the paper is organised as follows: Section 2 details the standard atom search optimisation. Section 3 describes the proposed binary atom search optimisation approaches. Section 4 depicts the application of the proposed approaches to feature selection. Section 5 discusses the findings of the experiments. Finally, Section 6 concludes the research work.

2. Atom search optimisation

Atom search optimisation (ASO) is a new metaheuristic algorithm proposed by Zhao and colleagues in 2019 (Zhao et al., Citation2019a). ASO mimics basic concepts of molecular dynamics and the movement principles of atoms, such as the characteristics of the potential function, the interaction force, and the geometric constraint force. In ASO, the population of solutions is called atoms, and each atom maintains two vectors, namely position and velocity.

All the atoms are moving at all times by following the movement principle. Mathematically, the acceleration of an atom is defined as:

(1) $a_i = \dfrac{F_i + G_i}{m_i}$

where $F_i$ is the interaction force, $G_i$ is the constraint force, and $m_i$ is the mass of the atom.

The interaction force between the ith and jth atoms is described by the Lennard-Jones (L-J) potential as:

(2) $F_{ij}^{d}(t) = \dfrac{24\varepsilon(t)}{\sigma(t)}\left[2\left(\dfrac{\sigma(t)}{r_{ij}(t)}\right)^{13} - \left(\dfrac{\sigma(t)}{r_{ij}(t)}\right)^{7}\right]\dfrac{r_{ij}^{d}(t)}{r_{ij}(t)}$

and

(3) $F_{ij}(t) = \dfrac{24\varepsilon(t)}{\sigma(t)}\left[2\left(\dfrac{\sigma(t)}{r_{ij}(t)}\right)^{13} - \left(\dfrac{\sigma(t)}{r_{ij}(t)}\right)^{7}\right]$

where ε is the depth of the potential well, σ is the length scale, $r_{ij}$ is the distance between two atoms, d is the dimension, and t is the current iteration. However, Equation (3) cannot be directly applied to optimisation tasks. Thus, a revised version of Equation (3) is designed as follows:

(4) $F_{ij}(t) = -\eta(t)\left[2\left(h_{ij}(t)\right)^{-13} - \left(h_{ij}(t)\right)^{-7}\right]$

where η is the depth function that regulates the attraction and repulsion regions, and it can be expressed as:

(5) $\eta(t) = \alpha\left(1 - \dfrac{t-1}{T}\right)^{3} e^{-\frac{20t}{T}}$

where α is the depth weight and T is the maximum number of iterations. The function h can be computed as follows:

(6) $h_{ij}(t) = \begin{cases} h_{\min}, & \text{if } \dfrac{r_{ij}(t)}{\sigma(t)} < h_{\min} \\[4pt] h_{\max}, & \text{if } \dfrac{r_{ij}(t)}{\sigma(t)} > h_{\max} \\[4pt] \dfrac{r_{ij}(t)}{\sigma(t)}, & \text{otherwise} \end{cases}$

where $h_{\max}$ and $h_{\min}$ are the upper and lower limits of h (Zhao et al., Citation2019b). The length scale σ(t) can be calculated as:

(7) $\sigma(t) = \left\| X_{ij}(t),\ \dfrac{\sum_{j \in K} X_{ij}(t)}{k(t)} \right\|_{2}$

and

(8) $\begin{cases} h_{\min} = g_{0} + g(t) \\ h_{\max} = u \end{cases}$

where K is the subset of atoms with better fitness values, and $g_0$ and u are set to 1.1 and 2.4, respectively. The drift factor g controls the balance between exploration and exploitation, and it is defined as:

(9) $g(t) = 0.1 \times \sin\left(\dfrac{\pi}{2} \times \dfrac{t}{T}\right)$

The total interaction force acting on the ith atom can be expressed as:

(10) $F_{i}^{d}(t) = \sum_{j \in K} r_{j} F_{ij}^{d}(t)$

where $r_j$ is a random number in [0,1]. According to Newton's third law, the force on the jth atom from the same pairwise interaction is the opposite of the force on the ith atom (Zhao et al., Citation2019b):

(11) $F_{ji}(t) = -F_{ij}(t)$

Furthermore, the geometric constraint of the ith atom and the constraint force can be expressed as follows:

(12) $\theta_{i}(t) = \left[\left| X_{i}(t) - X_{best}(t) \right|^{2} - b_{i,best}^{2}\right]$

(13) $G_{i}^{d}(t) = -\lambda(t)\nabla\theta_{i}^{d}(t) = -2\lambda(t)\left(X_{i}^{d}(t) - X_{best}^{d}(t)\right)$

where $X_{best}$ is the position of the best atom, $b_{i,best}$ is the fixed bond length between the ith atom and the best atom, and λ is the Lagrangian multiplier. By absorbing the factor of 2 into λ (i.e. setting 2λ → λ), the constraint force can be represented as:

(14) $G_{i}^{d}(t) = \lambda(t)\left(X_{best}^{d}(t) - X_{i}^{d}(t)\right)$

The Lagrangian multiplier can be expressed as:

(15) $\lambda(t) = \beta e^{-\frac{20t}{T}}$

where β is the multiplier weight. Finally, the acceleration of the atom can be written as:

(16) $a_{i}^{d}(t) = \dfrac{F_{i}^{d}(t)}{m_{i}(t)} + \dfrac{G_{i}^{d}(t)}{m_{i}(t)} = -\alpha\left(1-\dfrac{t-1}{T}\right)^{3} e^{-\frac{20t}{T}} \sum_{j \in K} \dfrac{r_{j}\left[2\left(h_{ij}(t)\right)^{-13} - \left(h_{ij}(t)\right)^{-7}\right]}{m_{i}(t)} \dfrac{\left(X_{j}^{d}(t) - X_{i}^{d}(t)\right)}{\left\| X_{i}(t), X_{j}(t) \right\|_{2}} + \beta e^{-\frac{20t}{T}} \dfrac{X_{best}^{d}(t) - X_{i}^{d}(t)}{m_{i}(t)}$

where the mass m of the atom can be calculated by:

(17) $M_{i}(t) = e^{-\frac{Fit_{i}(t) - Fit_{best}(t)}{Fit_{worst}(t) - Fit_{best}(t)}}$

(18) $m_{i}(t) = \dfrac{M_{i}(t)}{\sum_{j=1}^{N} M_{j}(t)}$

where Fit is the fitness value. Considering a minimisation problem, $Fit_{best}$ and $Fit_{worst}$ are represented as:

(19) $Fit_{best}(t) = \min_{i=1,\dots,N} Fit_{i}(t)$

(20) $Fit_{worst}(t) = \max_{i=1,\dots,N} Fit_{i}(t)$

Then, the velocity and position of the atom are updated as follows:

(21) $V_{i}^{d}(t+1) = r_{1} V_{i}^{d}(t) + a_{i}^{d}(t)$

(22) $X_{i}^{d}(t+1) = X_{i}^{d}(t) + V_{i}^{d}(t+1)$

where $X_i$ and $V_i$ are the position and velocity of the ith atom, a is the acceleration, d is the dimension of the search space, $r_1$ is a random number in [0,1], and t is the current iteration.

In ASO, the number of best atoms in the subset K is used to control the exploration and exploitation phases:

(23) $k(t) = N - (N-2) \times \sqrt{\dfrac{t}{T}}$

where N is the number of atoms in the population. In early iterations, a larger k promotes exploration; towards the final iterations, a smaller k ensures stronger exploitation. The pseudocode of ASO is demonstrated in Algorithm 1.
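To make the update rules concrete, below is a minimal, illustrative Python sketch of a single ASO iteration for a continuous minimisation problem, assembled from Equations (4)–(23). It is not the authors' implementation: the interpretation of σ(t) as each atom's distance to the mean position of the k best atoms, the per-element random factor in Equation (21), the bound handling, and the default values of `alpha` and `beta` are our assumptions.

```python
import numpy as np

def aso_step(X, V, fit, t, T, alpha=50.0, beta=0.2, lb=-10.0, ub=10.0):
    """One ASO iteration (Eqs. 4-23), illustrative sketch.
    X, V: (N, D) positions and velocities; fit: (N,) fitness values."""
    N, D = X.shape
    # Mass from fitness (Eqs. 17-18); epsilon guards division by zero.
    M = np.exp(-(fit - fit.min()) / (fit.max() - fit.min() + 1e-12))
    m = M / M.sum()
    # Size of the best-atom subset (Eq. 23) and its members.
    k = int(N - (N - 2) * np.sqrt(t / T))
    K = np.argsort(fit)[:k]
    best = X[np.argmin(fit)]
    # Depth function (Eq. 5) and Lagrangian multiplier (Eq. 15).
    eta = alpha * (1 - (t - 1) / T) ** 3 * np.exp(-20 * t / T)
    lam = beta * np.exp(-20 * t / T)
    # Length scale: distance to the mean of the k best atoms (Eq. 7, assumed).
    sigma = np.linalg.norm(X - X[K].mean(axis=0), axis=1) + 1e-12
    h_min = 1.1 + 0.1 * np.sin(np.pi / 2 * t / T)   # Eqs. 8-9, g0 = 1.1
    a = np.zeros_like(X)
    for i in range(N):
        F = np.zeros(D)
        for j in K:
            if j == i:
                continue
            r = np.linalg.norm(X[i] - X[j]) + 1e-12
            h = np.clip(r / sigma[i], h_min, 2.4)   # Eq. 6, u = 2.4
            # Revised L-J force (Eq. 4), randomly weighted (Eq. 10).
            F += np.random.rand() * -eta * (2 * h ** -13 - h ** -7) \
                 * (X[j] - X[i]) / r
        # Interaction plus constraint force gives acceleration (Eqs. 14, 16).
        a[i] = F / m[i] + lam * (best - X[i]) / m[i]
    V = np.random.rand(N, D) * V + a                # Eq. 21
    X = np.clip(X + V, lb, ub)                      # Eq. 22, kept in bounds
    return X, V
```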

3. Binary version of atom search optimisation

In the continuous version of ASO, the atoms move around the search space in a continuous real domain. In binary optimisation, however, the atoms deal with only two values ("1" or "0"). Correspondingly, a way must be found to use the velocity of the atom to change its position from "0" to "1" or vice versa. Among earlier works, transfer functions have been successfully utilised in binary versions of particle swarm optimisation, the gravitational search algorithm, the bat algorithm, the tree growth algorithm, and the antlion optimiser (Emary et al., Citation2016b; Kennedy & Eberhart, Citation1997; Mirjalili et al., Citation2014; Rashedi et al., Citation2010; Too et al., Citation2018). Previous works show that the transfer function is one of the most powerful tools for converting a continuous optimisation algorithm into a binary version (Mirjalili & Lewis, Citation2013).

In this paper, we propose eight different binary variants of ASO (BASO). The proposed BASO approaches are classified into two groups, namely the S-shaped family and the V-shaped family: the former applies an S-shaped transfer function and the latter a V-shaped transfer function, either of which enables the atoms to search the binary search space.

Table 1 displays the S-shaped and V-shaped transfer functions with their mathematical equations; in Table 1, "erf" denotes the error function. The S-shaped and V-shaped families are illustrated in Figures 1 and 2, respectively. By integrating a transfer function into ASO, a binary version of ASO is achieved. In this framework, the velocity of the atom is first converted into a probability using the S-shaped or V-shaped transfer function; then, the position of the atom is updated.

Figure 1. S-shaped transfer functions.

Figure 2. V-shaped transfer functions.

Table 1. S-shaped and V-shaped transfer functions.
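Since the table itself is not reproduced here, the sketch below lists the eight transfer functions most commonly used in this line of work (Mirjalili & Lewis, Citation2013). We assume these are the S1–S4 and V1–V4 functions referenced throughout this paper; if Table 1 defines them differently, the definitions in Table 1 take precedence.

```python
import numpy as np
from scipy.special import erf

# S-shaped family (Mirjalili & Lewis, 2013): maps a velocity to (0, 1).
S = {
    "S1": lambda v: 1 / (1 + np.exp(-2 * v)),
    "S2": lambda v: 1 / (1 + np.exp(-v)),
    "S3": lambda v: 1 / (1 + np.exp(-v / 2)),
    "S4": lambda v: 1 / (1 + np.exp(-v / 3)),
}

# V-shaped family: symmetric about zero, maps |velocity| to [0, 1).
V = {
    "V1": lambda v: np.abs(erf(np.sqrt(np.pi) / 2 * v)),
    "V2": lambda v: np.abs(np.tanh(v)),
    "V3": lambda v: np.abs(v / np.sqrt(1 + v ** 2)),
    "V4": lambda v: np.abs(2 / np.pi * np.arctan(np.pi / 2 * v)),
}
```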

According to the literature, the position of the search agent can be updated using Equation (24) for the S-shaped family or Equation (25) for the V-shaped family, which convert the probability into a binary representation (Kennedy & Eberhart, Citation1997; Mirjalili & Lewis, Citation2013; Rashedi et al., Citation2010):

(24) $X_{i}^{d}(t+1) = \begin{cases} 1, & \text{if } rand < T\left(V_{i}^{d}(t+1)\right) \\ 0, & \text{if } rand \ge T\left(V_{i}^{d}(t+1)\right) \end{cases}$

(25) $X_{i}^{d}(t+1) = \begin{cases} 1 - X_{i}^{d}(t), & \text{if } rand < T\left(V_{i}^{d}(t+1)\right) \\ X_{i}^{d}(t), & \text{if } rand \ge T\left(V_{i}^{d}(t+1)\right) \end{cases}$

where $X_{i}^{d}$ and $V_{i}^{d}$ are the position and velocity of the ith atom in the dth dimension, t is the current iteration, and rand is a random vector in [0,1]. As can be seen in Equations (24) and (25), the atom changes its position randomly based on the parameter rand. Taking Equation (25) as an example, when rand is large, a higher value of T(V) is required for the atom to change its position, while a lower value of T(V) makes the atom keep its position; yet when rand happens to be small, the atom can flip its position even for a low T(V). This indicates that the position of the atom is updated in a fully random manner, which may lead to unsatisfactory performance. Thus, we propose new updating rules as follows:

S-shaped family:

(26) $X_{i}^{d}(t+1) = \begin{cases} 0, & \text{if } T\left(V_{i}^{d}(t+1)\right) \le K_{1} \\ 1, & \text{if } K_{1} < T\left(V_{i}^{d}(t+1)\right) \le K_{2} \\ X_{best}^{d}(t), & \text{otherwise} \end{cases}$

V-shaped family:

(27) $X_{i}^{d}(t+1) = \begin{cases} X_{best}^{d}(t), & \text{if } T\left(V_{i}^{d}(t+1)\right) \le K_{1} \\ X_{i}^{d}(t), & \text{if } K_{1} < T\left(V_{i}^{d}(t+1)\right) \le K_{2} \\ 1 - X_{i}^{d}(t), & \text{otherwise} \end{cases}$

K1 and K2 are calculated as follows:

(28) $K_{1} = \dfrac{1}{3}\,rand$

(29) $K_{2} = \dfrac{1}{3} + \dfrac{1}{3}\,rand$

where rand is a random vector in [0,1], $X_{i}^{d}$ and $V_{i}^{d}$ are the position and velocity of the ith atom in the dth dimension, $X_{best}$ is the position of the best atom, and t is the current iteration. As can be seen in Equation (26), the parameter rand has been removed from the position update and replaced with the thresholds K1 and K2, so the atom no longer updates its position in a fully random manner. In the proposed scheme, if the value of T(V) is small (at most K1), the atom sets its new position to 0; if the value of T(V) is medium (between K1 and K2), the atom changes its position to 1; otherwise, the atom follows the position of the best atom in the next iteration. We involve the best atom in the proposed scheme to guide the atoms towards the global optimum.
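The proposed updating rules translate directly into code. The following is a minimal sketch, assuming the transfer-function definitions above and drawing the random vectors of Equations (28)–(29) independently per dimension; the function and parameter names are our own.

```python
import numpy as np

def update_position(X, V, X_best, T, family="S"):
    """Proposed BASO position update (Eqs. 26-29), illustrative sketch.
    X: (N, D) binary positions; V: (N, D) velocities; X_best: (D,) best atom."""
    prob = T(V)                                   # velocity -> probability
    K1 = np.random.rand(*X.shape) / 3             # Eq. 28
    K2 = 1 / 3 + np.random.rand(*X.shape) / 3     # Eq. 29
    low, mid = prob <= K1, (prob > K1) & (prob <= K2)
    high = ~(low | mid)
    best = np.broadcast_to(X_best, X.shape)
    X_new = np.empty_like(X)
    if family == "S":              # Eq. 26: set 0 / set 1 / follow best atom
        X_new[low] = 0
        X_new[mid] = 1
        X_new[high] = best[high]
    else:                          # Eq. 27: follow best / keep / flip
        X_new[low] = best[low]
        X_new[mid] = X[mid]
        X_new[high] = 1 - X[high]
    return X_new
```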

The pseudocode of the binary atom search optimisation (BASO) is illustrated in Algorithm 2. First, the positions and velocities of the N atoms are randomly initialised. Second, the fitness values of the atoms are calculated, and the best atom is identified. In each iteration, the mass of each atom is computed using Equations (17) and (18). Then, the interaction and constraint forces are determined using Equations (10) and (14), respectively. Next, the acceleration of the atom is computed as shown in Equation (16), and its velocity is updated using Equation (21). After that, the velocity of the atom is converted into a probability value using the S-shaped or V-shaped transfer function, and the position of the atom is updated using Equation (26) or Equation (27). At the end of each iteration, the best atom is updated. The procedure repeats until the maximum number of iterations is reached, and the global best solution is returned. The main differences between ASO and BASO are the additional probability-conversion step after the velocity update and the modified position-updating step.

4. Proposed binary atom search optimisation for feature selection

The proposed binary atom search optimisation approaches are applied to solve the feature selection problem for classification tasks. Feature selection is a pre-processing step that evaluates subsets of significant features in order to enhance the prediction accuracy (Xue et al., Citation2014). From the optimisation perspective, feature selection is a binary combinatorial optimisation problem, in which each solution is represented in binary form: "1" means the feature is selected, while "0" represents an unselected feature. For example, given the solution X = {1, 0, 1, 0, 0, 0, 1, 1, 1, 0}, a total of five features (the 1st, 3rd, 7th, 8th, and 9th) are selected.
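As a small illustration of this encoding, the mask above can be decoded into column indices as follows (indices are zero-based in code, hence the +1 to match the 1-based feature numbering in the text; the dataset here is a hypothetical placeholder):

```python
import numpy as np

X = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 0])   # the example solution above
selected = np.flatnonzero(X)                    # zero-based indices: [0 2 6 7 8]
print(selected + 1)                             # 1-based: [1 3 7 8 9]
data = np.random.rand(5, 10)                    # hypothetical dataset, 10 features
data_subset = data[:, selected]                 # keep only the 5 selected columns
```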

The optimal feature subset is determined based on two criteria: high classification accuracy and a minimal feature size. In this study, the fitness function used to evaluate an individual search agent is expressed as:

(30) $Fitness = \delta\, ER + (1 - \delta)\,\dfrac{|C|}{|F|}$

and

(31) $ER = \dfrac{\text{Number of wrongly predicted instances}}{\text{Total number of instances}}$

where ER is the error rate, |C| is the number of selected features, |F| is the total number of features in the dataset, and δ controls the trade-off between classification performance and feature size. Since the classification performance is considered the most important metric, we set δ to 0.99 (Emary et al., Citation2016a; Faris et al., Citation2018).
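A sketch of the wrapper fitness in Equations (30)–(31), assuming the 5-nearest-neighbour classifier and K-fold protocol described in Section 5.1; scikit-learn stands in for the authors' MATLAB implementation, and the function name is ours.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, delta=0.99, k_folds=10):
    """Eq. (30): delta * error_rate + (1 - delta) * |C| / |F|."""
    if mask.sum() == 0:                  # no feature selected: worst fitness
        return 1.0
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    acc = cross_val_score(knn, X[:, mask.astype(bool)], y, cv=k_folds).mean()
    error_rate = 1.0 - acc               # Eq. (31)
    return delta * error_rate + (1 - delta) * mask.sum() / mask.size
```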

5. Experimental results and discussions

5.1. Data description

In the experiment, twenty-two datasets acquired from the UCI repository are used to validate the performance of the proposed approaches (UCI Machine Learning Repository). Table 2 lists the utilised datasets. In this study, K-fold cross-validation is applied for performance evaluation: the data is divided into K equal parts, of which K−1 parts are used as the training set and the remaining part as the testing set, and the procedure is repeated K times with a different part used as the testing set each time (a sketch of this protocol follows Table 2). As wrapper feature selection requires a learning algorithm (classifier) to compute the error rate, the k-nearest neighbour (KNN) classifier with Euclidean distance and k = 5 is applied in this work, following (Emary et al., Citation2016b; Xue et al., Citation2014). KNN is chosen since it usually achieves satisfactory performance with fast processing speed; in comparison with other classifiers, it is not only simpler and faster but also easy to implement.

Table 2. List of used datasets.
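For clarity, the evaluation protocol described above corresponds to the following sketch, with scikit-learn's `KFold` and `KNeighborsClassifier` standing in for the authors' MATLAB implementation:

```python
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def kfold_error_rate(X, y, k=10):
    """Average test error over K folds; each fold is the test set once."""
    errors = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(X):
        knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
        knn.fit(X[train_idx], y[train_idx])
        errors.append(1 - accuracy_score(y[test_idx], knn.predict(X[test_idx])))
    return sum(errors) / len(errors)
```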

5.2. Comparison algorithms and evaluation metrics

Five recent and popular feature selection methods, namely the binary bat algorithm (BBA) (Mirjalili et al., Citation2014), particle swarm optimisation (PSO) (Mafarja et al., Citation2019), binary differential evolution (BDE) (Zorarpacı & Özel, Citation2016), the binary salp swarm algorithm (BSSA) (Faris et al., Citation2018), and the binary flower pollination algorithm (BFPA) (Rodrigues et al., Citation2015), are used to examine the efficacy and efficiency of the proposed BASO. Table 3 outlines the detailed parameter settings of the utilised algorithms. Note that the α and β of BASO are chosen according to (Zhao et al., Citation2019a). To ensure a fair comparison, the population size (N) and the maximum number of iterations (T) are fixed at 10 and 100, respectively.

Table 3. Parameter setting.

In this paper, six evaluation metrics, namely the best fitness, worst fitness, mean fitness, accuracy, feature size, and computational time, are used to measure the performance of the proposed approaches. Each algorithm is repeated for M independent runs to obtain statistically meaningful results, and the averages over the M runs are reported as the experimental results. Note that the total number of function evaluations (NFE) for the proposed BASO is N × T × K × M; with K = 10 and M = 20 as used in this study, this amounts to 10 × 100 × 10 × 20 = 200,000 evaluations per dataset. All the analysis is done in MATLAB 9.3 on a computer with an Intel Core i5-9400 CPU at 2.90 GHz and 16.0 GB RAM.

5.3. Assessments of proposed BASO approaches in feature selection

In the first part of the experiment, the eight proposed BASO algorithms are assessed. Tables 4, 5, and 6 exhibit the best, worst, and mean fitness values of the proposed BASO approaches; the best results are bolded. As can be observed, BASO-S1 and BASO-S2 scored the lowest best fitness values on the most datasets (8 datasets each), followed by BASO-S3 and BASO-V4 (6 datasets each). Out of the twenty-two datasets, BASO-S1 and BASO-V1 achieved the smallest worst fitness value on 5 datasets each. As for the mean fitness value, the best approach was found to be BASO-S1, which provided better performance than the other approaches in feature selection.

Table 4. The best fitness value of proposed BASO approaches on 22 datasets.

Table 5. Result of worst fitness value of proposed BASO approaches on 22 datasets.

Table 6. Result of mean fitness value of proposed BASO approaches on 22 datasets.

Tables 7 and 8 outline the accuracy and feature size of the proposed BASO approaches. Inspecting Table 7, BASO-S1 achieved the highest accuracy on 6 datasets, followed by BASO-S2 and BASO-V2 (4 datasets each). Overall, BASO-S1 outperformed the other approaches in identifying the significant features, thus leading to excellent results. In Table 8, BASO-S1 obtained the smallest number of selected features (feature size) on 9 datasets; in comparison with the other approaches, BASO-S1 can usually provide a smaller feature size. Based on the results obtained, the BASO variant that contributed the optimal performance was BASO-S1.

Table 7. Result of accuracy of proposed BASO approaches on 22 datasets.

Table 8. Result of feature size of proposed BASO approaches on 22 datasets.

Furthermore, the Wilcoxon signed-rank test with a 5% significance level (α = 0.05) is used to examine whether the performance of BASO-S1 is significantly better than that of the other approaches. In this test, if the p-value is less than 0.05, there is a significant difference between the performances of the two algorithms; otherwise, their performances are considered similar (a sketch of this test follows Table 9). Table 9 outlines the p-values of the Wilcoxon signed-rank test for BASO-S1 against the other approaches. In Table 9, the symbols "w / t / l" indicate that the proposed BASO-S1 was superior to (win), equal to (tie), or inferior to (lose) the other algorithm. The results show that the classification performances of BASO-S1 and BASO-S2 were similar; however, in a one-to-one comparison (BASO-S1 versus BASO-S2), the algorithm that contributed the optimal classification performance was found to be BASO-S1. From this point of view, BASO-S1 showed superior performance against the other approaches in feature selection. Hence, only BASO with transfer function S1 is used in the rest of this paper.

Table 9. The p-value of Wilcoxon signed rank test for BASO-S1 and other BASO approaches.
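Such a test can be reproduced with SciPy's implementation of the Wilcoxon signed-rank test; the following is a sketch with hypothetical paired accuracies, not values from this study:

```python
from scipy.stats import wilcoxon

# Hypothetical paired accuracies of two algorithms over the same datasets.
acc_baso_s1 = [0.95, 0.88, 0.91, 0.97, 0.85]
acc_other   = [0.93, 0.86, 0.90, 0.97, 0.82]

stat, p_value = wilcoxon(acc_baso_s1, acc_other)
significant = p_value < 0.05     # 5% significance level, as in the paper
```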

5.4. Comparison with other algorithms

In the second part of the experiment, the performance of the proposed BASO is further compared with BBA, BDE, BFPA, PSO, and BSSA. Figures 3 and 4 illustrate the convergence curves of the proposed BASO and the other algorithms on the 22 datasets. From these figures, BASO contributed better convergence on most of the datasets, especially on datasets 2, 7, 8, 20, and 22. This finding indicates that BASO is highly capable of searching for the optimal solution, which leads to satisfactory performance. Taking dataset 22 (Leukemia) as an example, BASO converged faster than the other algorithms towards the global optimum. In comparison with the other algorithms, BASO offered very high diversity, thus resulting in optimal fitness values.

Figure 3. Convergence curves of six different algorithms for datasets 1–11.

Figure 4. Convergence curves of six different algorithms for datasets 12–22.

Tables 10, 11, and 12 show the best, worst, and mean fitness values of the six different algorithms; the best results are highlighted in bold. As can be seen, the performance of BASO was competitive against the others. Among the rivals, BASO achieved the optimal best and mean fitness values on 10 and 6 datasets, respectively. In terms of consistency, BFPA yielded the lowest standard deviation (STD) of the fitness value on most of the datasets, showing high consistency in this work.

Table 10. Result of best fitness value of six different algorithms on 22 datasets.

Table 11. Result of worst fitness value of six different algorithms on 22 datasets.

Table 12. Result of mean fitness value of six different algorithms on 22 datasets.

Based on the results obtained, BASO is shown to be the best-performing algorithm in this work. The observed improvements in searching for the global optimum (best solution) are attributed to the attractive and repulsive forces between each atom and its neighbours in BASO. Initially, the attractive force promotes exploration and enables the atoms to search globally; at the end of the iterations, the repulsive force encourages the atoms to search locally around the promising region. This, in turn, provides a stable balance between global and local search, resulting in superior performance.

Table 13 exhibits the accuracy of the six different algorithms on the 22 datasets. As can be seen, the classification performance shows great improvement when the feature selection algorithms are utilised, which clearly demonstrates the effectiveness of feature selection in classification tasks. In Table 13, BASO contributed the highest accuracy on datasets 2, 4, 6, 7, 8, 15, 20, 21, and 22 (nine datasets). The second-best algorithm was found to be BFPA (seven datasets), followed by BSSA (five datasets). Correspondingly, BASO overtakes the other algorithms in identifying the significant features from the large available feature set.

Table 13. Result of accuracy of six different algorithms on 22 datasets.

Table 14 displays the feature size (number of selected features) of the six different algorithms on the 22 datasets. In terms of feature size, BASO usually yielded the minimal number of selected features, followed by BSSA; based on the results obtained, BASO gave the smallest feature size on 10 datasets. The results validate that BASO is good at eliminating irrelevant and redundant features, thus providing high prediction accuracy. Among the rivals, BASO is highly capable of finding the significant features that contribute to better classification performance.

Table 14. Result of feature size of six different algorithms on 22 datasets.

Table 15 outlines the p-values of the Wilcoxon signed-rank test for BASO and its competitors. The experimental results clearly show that the proposed BASO is a useful tool for feature selection problems. Furthermore, the computational times of the six different algorithms are shown in Table 16. As can be observed, PSO was found to be the fastest feature selection algorithm, contributing the lowest computational time. Even though BASO cannot ensure the fastest processing speed, it can usually select a smaller number of significant features while achieving the highest classification accuracy.

Table 15. The p-value of Wilcoxon signed rank test for BASO and other algorithms.

Table 16. Result of computational time of six different algorithms on 22 datasets.

Several limitations can be found in this research. First, the parameter settings of BASO are fixed in this work; for other applications, users are recommended to tune the parameters in order to obtain the best performance. Second, we only apply KNN as the learning algorithm to compute the classification performance in feature selection; other popular and powerful learning algorithms, such as the support vector machine (SVM), convolutional neural network (CNN), and random forest (RF), could be implemented to boost the results.

6. Conclusion

In this paper, new binary variants of ASO, namely binary atom search optimisation (BASO), are proposed to solve the feature selection problem in classification tasks. Eight transfer functions (from the S-shaped and V-shaped families) are used to convert the continuous ASO into a binary version. Besides, a novel position-updating rule is designed to enhance the performance of BASO in feature selection. Twenty-two datasets are utilised to validate the performance of the proposed BASO. Our results indicate that BASO with the S-shaped transfer function S1 is highly capable of selecting the relevant features compared with the other BASO approaches. Moreover, the performance of the proposed BASO was further compared with BBA, BDE, BFPA, PSO, and BSSA, and the experimental results proved the superiority of the proposed BASO in terms of classification accuracy, feature size, and convergence speed. Future research can focus on the efficacy of BASO in other classification problems, as well as other binary optimisation tasks.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Acknowledgement

The authors would like to thank the Skim Zamalah UTeM for supporting this research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References
