Research Article

A novel bias-alleviated hybrid ensemble model based on over-sampling and post-processing for fair classification

Article: 2184310 | Received 12 Dec 2022, Accepted 21 Feb 2023, Published online: 17 Mar 2023
