Research Article

An ensemble machine learning approach for classification tasks using feature generation

Article: 2231168 | Received 20 Mar 2023, Accepted 23 Jun 2023, Published online: 11 Jul 2023

Figures & data

Figure 1. Framework of our stacking model.
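
As a rough illustration of the stacking framework in Figure 1, the sketch below builds a two-layer ensemble with scikit-learn. Using RF, LR, GNB, MLP, KNN, and SVM as first-layer learners follows the models compared in Tables 3-5; the logistic-regression meta-learner, the data set, and all hyperparameters are assumptions, not the paper's exact configuration.

# Minimal stacking sketch: first-layer models feed their predictions
# to a second-layer (meta) learner. Settings are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

first_layer = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("gnb", GaussianNB()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(probability=True, random_state=0)),
]

# cv=5 mirrors the 5-fold scheme of Figure 2 for the first layer.
stack = StackingClassifier(estimators=first_layer,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_tr, y_tr)
print("stacking accuracy:", stack.score(X_te, y_te))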

Figure 2. A single model in the first layer trained with 5-fold cross-validation.
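
Figure 2 shows how each first-layer model is trained with 5-fold cross-validation so that its predictions for the second layer are always made on held-out folds. One compact way to reproduce that idea (an assumption about the exact procedure) is scikit-learn's cross_val_predict:

# Out-of-fold predictions: every sample is predicted by a model that never
# saw it during training, which is what the meta-learner consumes.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, KFold

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

oof_proba = cross_val_predict(RandomForestClassifier(random_state=0),
                              X, y, cv=cv, method="predict_proba")
print(oof_proba.shape)  # (n_samples, n_classes): one meta-feature column per class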

Figure 3. The framework of RF.
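
Figure 3 depicts the random forest framework: bootstrap samples, one decision tree per sample, and a majority vote. The sketch below mirrors that structure by hand (the tree count and data set are illustrative; in practice RandomForestClassifier does this internally).

# Random forest in miniature: bag decision trees on bootstrap samples,
# then take a majority vote over their predictions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))        # bootstrap sample
    trees.append(DecisionTreeClassifier(max_features="sqrt",
                                        random_state=0).fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])        # shape (n_trees, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)     # majority vote for 0/1 labels
print("training accuracy of the vote:", (majority == y).mean())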

Figure 4. Neuron structure diagram of the perceptron.
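
The perceptron neuron of Figure 4 computes a weighted sum of its inputs plus a bias and passes it through a step activation. A minimal NumPy version, with weights and inputs invented purely for illustration:

# One perceptron neuron: output = step(w . x + b)
import numpy as np

def perceptron_output(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

x = np.array([0.5, -1.2, 3.0])   # example input vector (illustrative values)
w = np.array([0.4, 0.1, 0.7])    # weights
b = -1.0                         # bias term
print(perceptron_output(x, w, b))  # prints 1, since 0.2 - 0.12 + 2.1 - 1.0 > 0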

Figure 5. Structure of a multilayer neural network with only one hidden layer.
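
The single-hidden-layer network of Figure 5 corresponds directly to an MLP with one hidden layer; the layer width, data set, and scaling step below are assumptions for the sake of a runnable example.

# Multilayer perceptron with exactly one hidden layer, as in Figure 5.
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,),  # one hidden layer of 32 units
                                  max_iter=2000, random_state=0))
print("training accuracy:", mlp.fit(X, y).score(X, y))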

Figure 6. When K takes values of 3 and 5, the sample is assigned to different classes.
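
Figure 6 illustrates how the same query point can receive different labels for K = 3 and K = 5. The toy data below is invented purely to reproduce that effect.

# A query whose 3 nearest neighbours are mostly class 1 but whose
# 5 nearest neighbours are mostly class 0, so the prediction flips with K.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0], [1.2], [0.8], [2.0], [2.1], [2.2]])
y = np.array([ 0,     1,     1,     0,     0,     0   ])
query = [[1.0]]

for k in (3, 5):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"K={k} ->", knn.predict(query)[0])   # K=3 gives 1, K=5 gives 0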

Figure 7. Schematic diagram of finding the maximum-margin hyperplane.
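
For Figure 7, a linear SVM finds the separating hyperplane w·x + b = 0 that maximizes the margin 2/||w||. The small linearly separable data set below is made up for illustration; a large C approximates the hard-margin case shown in the diagram.

# Fit a (nearly) hard-margin linear SVM and report the margin width 2/||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [4, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
w, b = svm.coef_[0], svm.intercept_[0]
print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
print("margin width:", 2 / np.linalg.norm(w))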

Figure 8. Schematic diagram of the feature generation process.
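
The exact generation scheme is what Figure 8 describes; as one plausible sketch, the code below appends out-of-fold class-probability outputs of first-layer models to the original features. This is an assumption about the procedure, not a reproduction of it.

# Generate new features by placing out-of-fold probability outputs of
# base models next to the original features (one reading of Figure 8).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_breast_cancer(return_X_y=True)

base_models = [RandomForestClassifier(random_state=0),
               LogisticRegression(max_iter=1000)]

generated = [cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1:]
             for m in base_models]              # one probability column per model
X_augmented = np.hstack([X] + generated)
print(X.shape, "->", X_augmented.shape)          # original + generated features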

Figure 9. Schematic diagram of the experimental model framework.
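
Figure 9's experimental framework combines the generated features with feature selection before the final SVM, which is also how Table 8's t4 setting reads. The pipeline below is only a hedged approximation of that flow; the base model, the selector, the number of kept features, and the SVM settings are assumptions.

# Hypothetical end-to-end flow: original + generated features -> selection -> SVM.
# (For a clean evaluation the generation step would be fitted inside each fold;
# it is done once here only to keep the sketch short.)
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Step 1: feature generation (out-of-fold probabilities of a base model).
gen = cross_val_predict(RandomForestClassifier(random_state=0),
                        X, y, cv=5, method="predict_proba")[:, 1:]
X_all = np.hstack([X, gen])

# Steps 2 and 3: feature selection followed by the SVM classifier.
pipe = make_pipeline(StandardScaler(),
                     SelectKBest(f_classif, k=20),   # illustrative selector and k
                     SVC(kernel="rbf"))
print("5-fold accuracy:", cross_val_score(pipe, X_all, y, cv=5).mean())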

Table 1. Data set information, where #Instances is the number of instances and #Features is the number of features.

Table 2. Data set information after preprocessing.

Table 3. Experimental results from the RF, LR, GNB, MLP, KNN, SVM, Stacking, and FESVM using the first eight data sets in Table .

Table 4. Experimental results from the RF, LR, GNB, MLP, KNN, SVM, Stacking, and FESVM using the ninth to sixteenth data sets in Table .

Table 5. Experimental results from the RF, LR, GNB, MLP, KNN, SVM, Stacking, and FESVM using the last four data sets in Table .

Figure 10. Comparison of the average performance results of each model based on the 20 data sets.

Figure 11. Critical difference diagram of Accuracy based on the 20 data sets.

Figure 12. Critical difference diagram of F1-Score based on the 20 data sets.
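
Critical difference diagrams such as Figures 11 and 12 are typically built on a Friedman test over per-data-set ranks followed by a post-hoc comparison; whether the paper uses exactly this recipe is not stated in the captions, so the sketch below, with clearly fabricated placeholder scores, is only illustrative of the ranking step.

# Rank-based comparison across data sets: the Friedman test that usually
# underlies a critical difference diagram. Scores are random placeholders,
# not the paper's results.
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(0)
scores = rng.uniform(0.7, 0.95, size=(20, 8))   # 20 data sets x 8 models (dummy values)

stat, p = friedmanchisquare(*[scores[:, j] for j in range(scores.shape[1])])
ranks = np.vstack([rankdata(-row) for row in scores])   # rank 1 = best on each data set
print("Friedman p-value:", p)
print("average rank per model:", ranks.mean(axis=0))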

Table 6. Statistical test results for the RF, LR, GNB, MLP, KNN, SVM, Stacking, and FESVM in terms of accuracy (p-values less than 0.05 are highlighted in bold).

Table 7. Statistical test results for the RF, LR, GNB, MLP, KNN, SVM, Stacking, and FESVM in terms of F1-Score (p-values less than 0.05 are highlighted in bold).
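
Tables 6 and 7 report pairwise p-values between models over the 20 data sets. A common choice for such paired, per-data-set comparisons is the Wilcoxon signed-rank test; the captions do not name the exact test, so this is an assumption, and the score arrays below are dummy placeholders.

# Pairwise comparison of two models over the same 20 data sets.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
acc_a = rng.uniform(0.80, 0.95, size=20)            # fabricated per-data-set accuracies
acc_b = acc_a - rng.uniform(0.001, 0.05, size=20)   # second model, slightly worse

stat, p = wilcoxon(acc_a, acc_b)
print("Wilcoxon p-value:", p)
print("significant at the 0.05 level:", p < 0.05)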

Table 8. Comparison of computation time for the 20 data sets with our proposed FESVM using different input features, where t1 is the computation time using all features, t2 is the computation time using the generated features, t3 is the computation time using feature selection, and t4 is the computation time using the features from feature generation and feature selection.
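
Table 8's four timings compare the same classifier run on different feature sets. A simple way to collect such numbers is to time fit-plus-predict for each variant; the exact measurement protocol and the composition of each feature set below are assumptions.

# Time the same SVM on different feature sets, mirroring the t1-t4 columns:
# all features, generated features, selected features, generated + selected.
import time
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
gen = cross_val_predict(RandomForestClassifier(random_state=0),
                        X, y, cv=5, method="predict_proba")[:, 1:]
selected = SelectKBest(f_classif, k=10).fit_transform(X, y)

variants = {
    "t1 all features":        X,
    "t2 generated features":  gen,
    "t3 selected features":   selected,
    "t4 generated + selected": np.hstack([gen, selected]),
}
for name, feats in variants.items():
    start = time.perf_counter()
    SVC().fit(feats, y).predict(feats)
    print(name, f"{time.perf_counter() - start:.4f} s")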