Research Article

Bagging Vs. Boosting in Ensemble Machine Learning? An Integrated Application to Fraud Risk Analysis in the Insurance Sector

Article: 2355024 | Received 19 Jan 2024, Accepted 29 Apr 2024, Published online: 20 May 2024

References

  • Abakarim, Y., M. Lahby, and A. Attioui. 2023. A bagged ensemble convolutional neural networks approach to recognize insurance claim frauds. Applied System Innovation 6 (1):20. doi:10.3390/asi6010020
  • Ang, J. C., A. Mirzal, H. Haron, and H. Nuzly Abdull Hamed. 2016. Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 13 (5):971–89. doi:10.1109/TCBB.2015.2478454
  • Anurag, M., M. Kaur Saggi, S. Rehman, H. Sajjad, S. Inyurt, A. Singh Bhatia, A. Ahsan Farooque, A. Y. Oudah, and Z. Mundher Yaseen. 2022. Deep learning versus gradient boosting machine for pan evaporation prediction. Engineering Applications of Computational Fluid Mechanics 16 (1):570–87. doi:10.1080/19942060.2022.2027273
  • Arai, H., C. Maung, K. Xu, and H. Schweitzer. 2016. Unsupervised feature selection by heuristic search with provable bounds on suboptimality. Proceedings of the AAAI Conference on Artificial Intelligence 30 (1). doi:10.1609/aaai.v30i1.10082
  • Aslam, F., A. Imran, Z. Ftiti, W. Louhichi, and T. Shams. 2022. Insurance fraud detection: Evidence from artificial intelligence and machine learning. Research in International Business and Finance 62 (August):101744. doi:10.1016/j.ribaf.2022.101744
  • Azzone, M., E. Barucci, G. Giuffra Moncayo, and D. Marazzina. 2022. A machine learning model for lapse prediction in life insurance contracts. Expert Systems with Applications 191 (April 2021):116261. doi:10.1016/j.eswa.2021.116261
  • Batista, G. E. A. P. A., R. C. Prati, and M. Carolina Monard. 2004. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6 (1):20–29. doi:10.1145/1007730.1007735
  • Benedek, B., C. Ciumas, and B. Zsolt Nagy. 2022. Automobile insurance fraud detection in the age of big data – A systematic and comprehensive literature review. Journal of Financial Regulation & Compliance 30 (4):503–23. doi:10.1108/JFRC-11-2021-0102
  • Boodhun, N., and M. Jayabalan. 2018. Risk prediction in life insurance industry using supervised learning algorithms. Complex & Intelligent Systems 4 (2):145–54. doi:10.1007/s40747-018-0072-1
  • Boyd, K., K. H. Eng, and C. D. Page. 2013. Erratum: Area under the precision-recall curve: Point estimates and confidence intervals. In Machine learning and knowledge discovery in databases. ECML PKDD 2013. Lecture Notes in Computer Science, ed. H. Blockeel, K. Kersting, S. Nijssen, and F. Železný, vol. 8190. Berlin, Heidelberg: Springer. doi:10.1007/978-3-642-40994-3_55
  • Bradley, A. P. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30 (7):1145–59. doi:10.1016/S0031-3203(96)00142-2
  • Breiman, L. 1996. Bagging predictors. Machine Learning 24 (2):123–40. doi:10.1007/BF00058655
  • Caruana, M. A., and L. Grech. 2021. Automobile insurance fraud detection. Communications in Statistics Case Studies, Data Analysis and Applications 7 (4):520–35. doi:10.1080/23737484.2021.1986169
  • Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 16:321–57. doi:10.1613/jair.953
  • Chizi, B., and O. Maimon. 2009. Dimension reduction and feature selection. In Data mining and knowledge discovery handbook, 83–100. Boston, MA: Springer US. doi:10.1007/978-0-387-09823-4_5
  • Cinaroglu, S. 2020. Modelling unbalanced catastrophic health expenditure data by using machine-learning methods. Intelligent Systems in Accounting, Finance and Management 27 (4):168–81. doi:10.1002/isaf.1483
  • Cutler, A., D. R. Cutler, and R. S. John. 2012. Random forests. In Ensemble machine learning, 157–75. Boston, MA: Springer US. doi:10.1007/978-1-4419-9326-7_5
  • Davis, J., and M. Goadrich. 2006. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning - ICML ’06, New York, New York, USA, 233–40. ACM Press.
  • Dhieb, N., H. Ghazzai, H. Besbes, and Y. Massoud. 2019. Extreme gradient boosting machine learning algorithm for safe auto insurance operations. In 2019 IEEE International Conference on Vehicular Electronics and Safety, ICVES 2019, 1–5. doi:10.1109/ICVES.2019.8906396
  • Dhieb, N., H. Ghazzai, H. Besbes, and Y. Massoud. 2020. A secure AI-Driven architecture for automated insurance systems: Fraud detection and risk measurement. IEEE Access 8:58546–58. doi:10.1109/ACCESS.2020.2983300
  • Farquad, M. A. H., V. Ravi, and S. Bapi Raju. 2012. Analytical CRM in banking and finance using SVM: A modified active learning-based rule extraction approach. International Journal of Electronic Customer Relationship Management 6 (1):48. doi:10.1504/IJECRM.2012.046470
  • Friedman, J. H. 2001. Greedy function approximation: A gradient boosting machine. The Annals of Statistics 29 (5):1189–232. doi:10.1214/aos/1013203451
  • Gomes, C., Z. Jin, and H. Yang. 2021. Insurance fraud detection with unsupervised deep learning. Journal of Risk and Insurance 88 (3):591–624. doi:10.1111/jori.12359
  • Gowda, K., and G. Krishna. 1979. The condensed nearest neighbor rule using the concept of mutual nearest neighborhood (Corresp.). IEEE Transactions on Information Theory 25 (4):488–90. doi:10.1109/TIT.1979.1056066
  • Guoming, Z., X. Zhang, M. Bilal, W. Dou, X. Xu, and J. J. P. C. Rodrigues. 2022. Identifying fraud in medical insurance based on blockchain and deep learning. Future Generation Computer Systems 130:140–54. doi:10.1016/j.future.2021.12.006
  • Guo, J., and W. Zhu. 2018. Dependence guided unsupervised feature selection. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1). doi:10.1609/aaai.v32i1.11904
  • Hanafy, M., and R. Ming. 2021a. Improving imbalanced data classification in auto insurance by the data level approaches. International Journal of Advanced Computer Science and Applications 12 (6):493–99. doi:10.14569/IJACSA.2021.0120656
  • Hanafy, M., and R. Ming. 2021b. Machine learning approaches for auto insurance big data. Risks 9 (2):1–23. doi:10.3390/risks9020042
  • Hanafy, M., and R. Ming. 2021c. Using machine learning models to compare various resampling methods in predicting insurance fraud. Journal of Theoretical and Applied Information Technology 99 (12):2819–33.
  • Hanafy, M., and R. Ming. 2022. Classification of the insureds using integrated machine learning algorithms: A comparative study. Applied Artificial Intelligence 36 (1). doi:10.1080/08839514.2021.2020489
  • Harjai, S., S. Kumar Khatri, and G. Singh. 2019. Detecting fraudulent insurance claims using random forests and synthetic minority oversampling technique. In 2019 4th International Conference on Information Systems and Computer Networks, ISCON 2019, 123–28. doi:10.1109/ISCON47742.2019.9036162
  • Hassan, A. K. I., and A. Abraham. 2016. Modeling insurance fraud detection using imbalanced data classification. Advances in Intelligent Systems & Computing 419:117–27. doi:10.1007/978-3-319-27400-3_11
  • He, H., Y. Bai, E. A. Garcia, and S. Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 1322–28. IEEE. doi:10.1109/IJCNN.2008.4633969
  • Henckaerts, R., M. Pier Côté, K. Antonio, and R. Verbelen. 2020. Boosting insights in insurance tariff plans with tree-based machine learning methods. North American Actuarial Journal 25 (2):1–31. doi:10.1080/10920277.2020.1745656
  • Hu, C., Z. Quan, and W. Fung Chong. 2022. Imbalanced learning for insurance using modified loss functions in tree-based models. Insurance: Mathematics and Economics 106:13–32. doi:10.1016/j.insmatheco.2022.04.010
  • Kate, P., V. Ravi, and A. Gangwar. 2022. FinGAN: Chaotic generative adversarial network for analytical customer relationship management in banking and insurance. Neural Computing and Applications 35 (8):6015–28. doi:10.1007/s00521-022-07968-x
  • Kaushik, K., A. Bhardwaj, A. Dhar Dwivedi, and R. Singh. 2022. Machine learning-based regression framework to predict health insurance premiums. International Journal of Environmental Research and Public Health 19 (13):7898. doi:10.3390/ijerph19137898
  • Kotb, M. H., and R. Ming. 2021. Comparing SMOTE family techniques in predicting insurance premium defaulting using machine learning models. International Journal of Advanced Computer Science and Applications 12 (9):621–29. doi:10.14569/IJACSA.2021.0120970
  • Kubat, M., and S. Matwin. 1997. Addressing the curse of imbalanced data sets: one-sided sampling. In Proceedings of the Fourteenth International Conference on Machine Learning, Nashville, TN, USA, July 8–12, 179–86.
  • Kumar, S., S. Bhatachharya, R. Pradhan, S. Biswal, S. M. Thampi, and E.-S. M. El-Alfy. 2019. Fuzzy clustering using salp swarm algorithm for automobile insurance fraud detection. Journal of Intelligent & Fuzzy Systems 36 (3):2333–44. doi:10.3233/JIFS-169944
  • Laurikkala, J. 2001. Improving identification of difficult small classes by balancing class distribution. In Artificial intelligence in medicine. AIME 2001. Lecture Notes in Computer Science, ed. S. Quaglini, P. Barahona, and S. Andreassen, vol. 2101. Berlin, Heidelberg: Springer. doi:10.1007/3-540-48229-6_9
  • Li, Q., H. Chen, H. Huang, X. Zhao, Z. Cai, C. Tong, W. Liu, and X. Tian. 2017. An enhanced grey wolf optimization based machine for medical diagnosis. Computational & Mathematical Methods in Medicine 2017:1–15. doi:10.1155/2017/9512741
  • Li, Y., C. Yan, W. Liu, and M. Li. 2018. A principle component analysis-based random forest with the potential nearest neighbor method for automobile insurance fraud identification. Applied Soft Computing Journal 70:1000–09. doi:10.1016/j.asoc.2017.07.027
  • Maiano, L., A. Montuschi, M. Caserio, E. Ferri, F. Kieffer, C. Germanò, L. Baiocco, L. R. Celsi, I. Amerini, and A. Anagnostopoulos. 2023. A deep-learning–based antifraud system for car-insurance claims. Expert Systems with Applications 231:120644. doi:10.1016/j.eswa.2023.120644
  • Majhi, S. K. 2021. Fuzzy clustering algorithm based on modified whale optimization algorithm for automobile insurance fraud detection. Evolutionary Intelligence 14 (1):35–46. doi:10.1007/s12065-019-00260-3
  • Mohamed, H., and M. Omar. 2021. Predict health insurance cost by using machine learning and DNN regression models. International Journal of Innovative Technology and Exploring Engineering 10 (3):137–43. doi:10.35940/ijitee.C8364.0110321
  • Nian, K., H. Zhang, A. Tayal, T. Coleman, and Y. Li. 2016. Auto insurance fraud detection using unsupervised spectral ranking for anomaly. The Journal of Finance and Data Science 2 (1):58–75. doi:10.1016/j.jfds.2016.03.001
  • Oyedele, A., A. Ajayi, L. O. Oyedele, J. Manuel Davila Delgado, L. Akanbi, O. Akinade, H. Owolabi, and M. Bilal. 2021. Deep learning and boosted trees for injuries prediction in power infrastructure projects. Applied Soft Computing 110:107587. doi:10.1016/j.asoc.2021.107587
  • Ozenne, B., F. Subtil, and D. Maucort-Boulch. 2015. The precision–recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. Journal of Clinical Epidemiology 68 (8):855–59. doi:10.1016/j.jclinepi.2015.02.010
  • Pesantez-Narvaez, J., M. Guillen, and M. Alcañiz. 2019. Predicting motor insurance claims using telematics data—XGboost versus logistic regression. Risks 7 (2):70. doi:10.3390/risks7020070
  • Robnik-Sikonja, M., and I. Kononenko. 2008. Explaining classifications for individual instances. IEEE Transactions on Knowledge and Data Engineering 20 (5):589–600. doi:10.1109/TKDE.2007.190734
  • Saito, T., and M. Rehmsmeier. 2015. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE 10 (3):e0118432. doi:10.1371/journal.pone.0118432
  • Salmi, M., and D. Atif. 2022. Using a data mining approach to detect automobile insurance fraud. In Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). Lecture Notes in Networks and Systems, ed. A. Abraham, vol. 417. Cham: Springer. doi:10.1007/978-3-030-96302-6_5
  • Seema, R., A. Rawat, D. Kumar, and A. Sai Sabitha. 2021. Application of machine learning and data visualization techniques for decision support in the insurance sector. International Journal of Information Management Data Insights 1 (2):100012. doi:10.1016/j.jjimei.2021.100012
  • Severino, M. K., and Y. Peng. 2021. Machine learning algorithms for fraud prediction in property insurance: Empirical evidence using real-world microdata. Machine Learning with Applications 5 (June):100074. doi:10.1016/j.mlwa.2021.100074
  • Solorio-Fernández, S., J. A. Carrasco-Ochoa, and J. F. Martínez-Trinidad. 2020. A review of unsupervised feature selection methods. Artificial Intelligence Review 53 (2):907–48. doi:10.1007/s10462-019-09682-y
  • Sundarkumar, G., V. R. Ganesh, and V. Siddeshwar. 2016. One-class support vector machine based undersampling: Application to churn prediction and insurance fraud detection. In 2015 IEEE International Conference on Computational Intelligence and Computing Research, ICCIC 2015 (ii). doi:10.1109/ICCIC.2015.7435726
  • Sundarkumar, G. G., and V. Ravi. 2015. A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance. Engineering Applications of Artificial Intelligence 37:368–77. doi:10.1016/j.engappai.2014.09.019
  • Taha, A., B. Cosgrave, and S. McKeever. 2022. Using feature selection with machine learning for generation of insurance insights. Applied Sciences (Switzerland) 12 (6):3209. doi:10.3390/app12063209
  • Ul Hassan, C. A., J. Iqbal, S. Hussain, H. AlSalman, M. A. A. Mosleh, S. Sajid Ullah, and E. Rak. 2021. A computational intelligence approach for predicting medical insurance cost. Mathematical Problems in Engineering 2021:1–13. doi:10.1155/2021/1162553
  • Uysal, A. K., and S. Gunal. 2012. A novel probabilistic feature selection method for text classification. Knowledge-Based Systems 36:226–35. doi:10.1016/j.knosys.2012.06.005
  • Vasu, M., and V. Ravi. 2011. A hybrid under-sampling approach for mining unbalanced datasets: Applications to banking and insurance. International Journal of Data Mining, Modelling and Management 3 (1):75. doi:10.1504/IJDMMM.2011.038812
  • Vijaya, J. N. J., and J. Vijaya. 2022. Boost customer churn prediction in the insurance industry using meta heuristic models. International Journal of Information Technology 14 (5):2619–31. doi:10.1007/s41870-022-01017-5
  • Vosseler, A. 2022. Unsupervised insurance fraud prediction based on anomaly detector ensembles. Risks 10 (7):132. doi:10.3390/risks10070132
  • Wang, S., J. Tang, and H. Liu. 2015. Embedded unsupervised feature selection. Proceedings of the AAAI Conference on Artificial Intelligence 29 (1). doi:10.1609/aaai.v29i1.9211
  • Wilson, D. L. 1972. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics SMC-2 (3):408–21. doi:10.1109/TSMC.1972.4309137
  • Xu, B., Y. Wang, X. Liao, and K. Wang. 2023. Efficient fraud detection using deep boosting decision trees. Decision Support Systems 175:114037. doi:10.1016/j.dss.2023.114037
  • Yankol-Schalck, M. 2022. The value of cross-data set analysis for automobile insurance fraud detection. Research in International Business and Finance 63 (August):101769. doi:10.1016/j.ribaf.2022.101769
  • Yan, C., Y. Li, W. Liu, M. Li, J. Chen, and L. Wang. 2020. An artificial bee colony-based kernel ridge regression for automobile insurance fraud identification. Neurocomputing 393:115–25. doi:10.1016/j.neucom.2017.12.072