Research Article

An ensemble machine learning approach for classification tasks using feature generation

Article: 2231168 | Received 20 Mar 2023, Accepted 23 Jun 2023, Published online: 11 Jul 2023
 

Abstract

Although machine learning classifiers have been used successfully in the medical and engineering fields, there is still room to improve their predictive accuracy. The more accurate the classifier, the better the suggestions that can be provided to decision makers. In this study, we therefore propose an ensemble machine learning approach for classification tasks, called Feature generation-based Ensemble Support Vector Machine (FESVM). We first apply a feature selection technique to select the most relevant features. Next, we introduce an ensemble strategy that aggregates multiple base estimators through a meta-classifier, an SVM, to make the final prediction. In this stage, the classification probabilities obtained from the base classifiers are used to generate new features, which are added to the original data set to form a new data set. Finally, this new data set is used to train the meta-classifier SVM and obtain the final classification results. For example, in a binary classification task each base classifier outputs two probabilities (p for one class and 1−p for the other). Two new features are then generated by combining these probabilities across the base classifiers: new feature 1 is the sum of p, and new feature 2 is the sum of 1−p. The same feature generation method extends directly to multi-class tasks, where the number of generated features equals the number of classes. In this way, the features generated by the base estimators (first layer), together with the original features, form the input to the second layer (meta-classifier), which yields the final model.
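A minimal sketch of the two-layer pipeline described above, written with scikit-learn, may help make the feature generation step concrete. It is not the authors' implementation. Several details are assumptions because the abstract leaves them open: `SelectKBest` with the ANOVA F-test (`f_classif`) as the feature selector, four illustrative base classifiers (random forest, k-nearest neighbours, Gaussian naive Bayes, logistic regression), out-of-fold probabilities from `cross_val_predict` so the meta-classifier is not trained on in-sample predictions, and appending the generated features to the *selected* original features. The names `generate_features` and `FESVMSketch` are hypothetical. `generate_features` sums each base classifier's class-probability matrix. A binary task therefore yields the two features sum of p and sum of 1−p, and a K-class task yields K features.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def generate_features(prob_list):
    """Sum class-probability matrices over the base classifiers.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    Column k of the result is the sum of P(class k) over all base
    classifiers: two columns (sum of p, sum of 1 - p) for a binary task.
    """
    return np.sum(np.stack(prob_list, axis=0), axis=0)


class FESVMSketch:
    """Hypothetical FESVM-style two-layer stacker (not the authors' code)."""

    def __init__(self, base_estimators, k_features=2, cv=5):
        self.base_estimators = base_estimators
        self.k_features = k_features  # assumed number of selected features
        self.cv = cv

    def fit(self, X, y):
        # Feature selection (SelectKBest + ANOVA F-test is an assumption).
        self.selector_ = SelectKBest(f_classif, k=self.k_features).fit(X, y)
        Xs = self.selector_.transform(X)
        # First layer: out-of-fold probabilities (an assumption) so the
        # meta-classifier never sees in-sample base predictions.
        oof = [cross_val_predict(clone(est), Xs, y, cv=self.cv,
                                 method="predict_proba")
               for est in self.base_estimators]
        # Refit each base classifier on all training data for prediction.
        self.fitted_ = [clone(est).fit(Xs, y) for est in self.base_estimators]
        # Append the generated features to the selected original features.
        Z = np.hstack([Xs, generate_features(oof)])
        # Second layer: train the meta-classifier SVM on the new data set.
        self.meta_ = make_pipeline(StandardScaler(), SVC()).fit(Z, y)
        return self

    def _augment(self, X):
        # Build the same augmented representation for unseen samples.
        Xs = self.selector_.transform(X)
        probs = [est.predict_proba(Xs) for est in self.fitted_]
        return np.hstack([Xs, generate_features(probs)])

    def predict(self, X):
        return self.meta_.predict(self._augment(X))


# Usage on a small multi-class data set (iris, 3 classes), for illustration.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
base = [RandomForestClassifier(random_state=0), KNeighborsClassifier(),
        GaussianNB(), LogisticRegression(max_iter=1000)]
model = FESVMSketch(base, k_features=2).fit(X_tr, y_tr)
acc = float(np.mean(model.predict(X_te) == y_te))
print(f"test accuracy: {acc:.3f}")
```

Each base classifier's probability rows sum to 1, so each row of generated features sums to the number of base classifiers. In the binary case the second generated feature is therefore a linear function of the first (sum of 1−p = n − sum of p). The SVM tolerates this redundancy.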
Experiments on 20 data sets show that the proposed FESVM model performs best among the machine learning classifiers compared. In addition, FESVM performs better than the original stacking method on multi-class classification tasks. Statistical results based on the Wilcoxon–Holm method also confirm that FESVM can significantly outperform the other models. These results indicate that FESVM can be a useful tool for classification tasks, especially multi-class classification tasks.
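The Wilcoxon–Holm procedure is commonly understood as a set of pairwise Wilcoxon signed-rank tests on per-data-set scores, with Holm's step-down correction applied for the multiple comparisons. The sketch below assumes that reading. It uses `scipy.stats.wilcoxon` for the signed-rank test. The function names `holm_adjust` and `wilcoxon_holm` are hypothetical, and the code does not reproduce any result from the paper.

```python
from scipy.stats import wilcoxon


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by (m - rank).
        value = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity of the adjusted p-values.
        running_max = max(running_max, value)
        adjusted[idx] = float(running_max)
    return adjusted


def wilcoxon_holm(reference, competitors, alpha=0.05):
    """Compare a reference model against each competitor.

    reference: per-data-set scores of the proposed model.
    competitors: dict mapping model name -> per-data-set scores, in the
    same data-set order. Returns name -> (Holm-adjusted p, significant?).
    """
    names = list(competitors)
    raw = [wilcoxon(reference, competitors[name]).pvalue for name in names]
    adjusted = holm_adjust(raw)
    return {name: (p, bool(p < alpha)) for name, p in zip(names, adjusted)}
```

To use it, pass one list of scores per model, aligned by data set (for example, one accuracy per benchmark data set for each model).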

Disclosure statement

No potential conflict of interest was reported by the author(s).