Enhancing safety of construction workers in Korea: an integrated text mining and machine learning framework for predicting accident types

Pages 203-215 | Received 26 Jul 2023, Accepted 24 Dec 2023, Published online: 02 Jan 2024

References

  • Ahadh, A., Binish, G. V., & Srinivasan, R. (2021). Text mining of accident reports using semi-supervised keyword extraction and topic modeling. Process Safety and Environmental Protection, 155, 455–465. https://doi.org/10.1016/j.psep.2021.09.022
  • Akosa, J. (2017). Predictive accuracy: A misleading performance measure for highly imbalanced data [Paper Presentation]. Proceedings of the SAS Global Forum.
  • Alkaissy, M., Arashpour, M., Golafshani, E. M., Hosseini, M. R., Khanmohammadi, S., Bai, Y., & Feng, H. (2023). Enhancing construction safety: Machine learning-based classification of injury types. Safety Science, 162, 106102. https://doi.org/10.1016/j.ssci.2023.106102
  • Architectural Institute of Korea (AIK). (2020). Online Dictionary of Architecture & Architectural Engineering (AIK’s ArchiDic). http://dict.aik.or.kr/ [in Korean]
  • Azad, P., Navimipour, N. J., Rahmani, A. M., & Sharifi, A. (2020). The role of structured and unstructured data managing mechanisms in the Internet of things. Cluster Computing, 23(2), 1185–1198. https://doi.org/10.1007/s10586-019-02986-2
  • Bagui, S., Nandi, D., Bagui, S., & White, R. J. (2021). Machine learning and deep learning for phishing email classification using one-hot encoding. Journal of Computer Science, 17(7), 610–623. https://doi.org/10.3844/jcssp.2021.610.623
  • Baker, H., Hallowell, M. R., & Tixier, A. J. P. (2020). Automatically learning construction injury precursors from text. Automation in Construction, 118, 103145. https://doi.org/10.1016/j.autcon.2020.103145
  • Biau, G., & Scornet, E. (2016). A random forest guided tour. TEST, 25(2), 197–227. https://doi.org/10.1007/s11749-016-0481-7
  • Blum, A. L., & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2), 245–271. https://doi.org/10.1016/S0004-3702(97)00063-5
  • Boughorbel, S., Jarray, F., & El-Anbari, M. (2017). Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS ONE, 12(6), e0177678. https://doi.org/10.1371/journal.pone.0177678
  • Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40(1), 16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024
  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system [Paper presentation]. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA. https://doi.org/10.1145/2939672.2939785
  • Chicco, D., & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1), 6. https://doi.org/10.1186/s12864-019-6413-7
  • Choi, J., Gu, B., Chin, S., & Lee, J.-S. (2020). Machine learning predictive model based on national data for fatal accidents of construction workers. Automation in Construction, 110, 102974. https://doi.org/10.1016/j.autcon.2019.102974
  • Churchill, R., & Singh, L. (2022). The evolution of topic modeling. ACM Computing Surveys, 54(10s), 1–35. Article 215. https://doi.org/10.1145/3507900
  • Gibb, A., Hide, S., Haslam, R., Gyi, D., Pavitt, T., Atkinson, S., & Duff, R. (2005). Construction tools and equipment – Their influence on accident causality. Journal of Engineering, Design and Technology, 3(1), 12–23. https://doi.org/10.1108/17260530510815303
  • Goncalves Filho, A. P., Waterson, P., & Jun, G. T. (2021). Improving accident analysis in construction – Development of a contributing factor classification framework and evaluation of its validity and reliability. Safety Science, 140, 105303. https://doi.org/10.1016/j.ssci.2021.105303
  • Grandini, M., Bagli, E., & Visani, G. (2020). Metrics for multi-class classification: An overview. arXiv preprint arXiv:2008.05756. https://doi.org/10.48550/arXiv.2008.05756
  • Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., & Yang, G.-Z. (2019). XAI—Explainable artificial intelligence. Science Robotics, 4(37), eaay7120. https://doi.org/10.1126/scirobotics.aay7120
  • Guo, M., Yuan, Z., Janson, B., Peng, Y., Yang, Y., & Wang, W. (2021). Older pedestrian traffic crashes severity analysis based on an emerging machine learning XGBoost. Sustainability, 13(2), 926. https://doi.org/10.3390/su13020926
  • Hajian-Tilaki, K. (2013). Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian Journal of Internal Medicine, 4(2), 627–635.
  • Huh, J.-H. (2018). Big data analysis for personalized health activities: Machine learning processing for automatic keyword extraction approach. Symmetry, 10(4), 93. https://doi.org/10.3390/sym10040093
  • Jamal, A., Zahid, M., Tauhidur Rahman, M., Al-Ahmadi, H. M., Almoshaogeh, M., Farooq, D., & Ahmad, M. (2021). Injury severity prediction of traffic crashes with ensemble machine learning techniques: A comparative study. International Journal of Injury Control and Safety Promotion, 28(4), 408–427. https://doi.org/10.1080/17457300.2021.1928233
  • Janiesch, C., Zschech, P., & Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31(3), 685–695. https://doi.org/10.1007/s12525-021-00475-2
  • Jha, A. N., Kumar, A., Tiwari, G., & Chatterjee, N. (2022). Identification and analysis of offenders causing hit and run accidents using classification algorithms. International Journal of Injury Control and Safety Promotion, 29(3), 360–371. https://doi.org/10.1080/17457300.2022.2040541
  • Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science (New York, N.Y.), 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415
  • Kang, K., & Ryu, H. (2019). Predicting types of occupational accidents at construction sites in Korea using random forest model. Safety Science, 120, 226–236. https://doi.org/10.1016/j.ssci.2019.06.034
  • Kingsford, C., & Salzberg, S. L. (2008). What are decision trees? Nature Biotechnology, 26(9), 1011–1013. https://doi.org/10.1038/nbt0908-1011
  • Koc, K., Ekmekcioğlu, Ö., & Gurgun, A. P. (2022). Prediction of construction accident outcomes based on an imbalanced dataset through integrated resampling techniques and machine learning methods. Engineering, Construction and Architectural Management, 30(9), 4486–4517. https://doi.org/10.1108/ECAM-04-2022-0305
  • Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11), 1–13. https://doi.org/10.18637/jss.v036.i11
  • Leevy, J. L., Khoshgoftaar, T. M., Bauder, R. A., & Seliya, N. (2018). A survey on addressing high-class imbalance in big data. Journal of Big Data, 5(1), 42. https://doi.org/10.1186/s40537-018-0151-6
  • Ley, C., Martin, R. K., Pareek, A., Groll, A., Seil, R., & Tischer, T. (2022). Machine learning and conventional statistics: Making sense of the differences. Knee Surgery, Sports Traumatology, Arthroscopy, 30(3), 753–757. https://doi.org/10.1007/s00167-022-06896-6
  • Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., & Liu, H. (2017). Feature selection: A data perspective. ACM Computing Surveys, 50(6), 1–45. Article 94. https://doi.org/10.1145/3136625
  • Lin, J. R., Hu, Z. Z., Li, J. L., & Chen, L. M. (2020). Understanding on-site inspection of construction projects based on keyword extraction and topic modeling. IEEE Access, 8, 198503–198517. https://doi.org/10.1109/ACCESS.2020.3035214
  • Loosemore, M., Sunindijo, R. Y., & Zhang, S. (2020). Comparative analysis of safety climate in the Chinese, Australian, and Indonesian construction industries. Journal of Construction Engineering and Management, 146(12), 04020129. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001934
  • Loushine, T. W., Hoonakker, P. L. T., Carayon, P., & Smith, M. J. (2006). Quality and safety management in construction. Total Quality Management & Business Excellence, 17(9), 1171–1212. https://doi.org/10.1080/14783360600750469
  • Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30. https://doi.org/10.48550/arXiv.1705.07874
  • Ma, L., & Zhang, Y. (2015). Using Word2Vec to process big text data [Paper presentation]. 2015 IEEE International Conference on Big Data (Big Data), 29 Oct.-1 Nov. https://doi.org/10.1109/BigData.2015.7364114
  • Mahamud, A. H., Dey, A. K., Alam, A. N. M. S., Alam, M. G. R., & Zaman, S. (2022). Implementation of explainable AI in mental health informatics: Suicide data of the United Kingdom [Paper presentation]. 2022 12th International Conference on Electrical and Computer Engineering (ICECE), 21-23 Dec. https://doi.org/10.1109/ICECE57408.2022.10088765
  • Nasar, Z., Jaffry, S. W., & Malik, M. K. (2019). Textual keyword extraction and summarization: State-of-the-art. Information Processing & Management, 56(6), 102088. https://doi.org/10.1016/j.ipm.2019.102088
  • Nordin, N., Zainol, Z., Mohd Noor, M. H., & Chan, L. F. (2023). An explainable predictive model for suicide attempt risk using an ensemble learning and Shapley Additive Explanations (SHAP) approach. Asian Journal of Psychiatry, 79, 103316. https://doi.org/10.1016/j.ajp.2022.103316
  • Onan, A., Korukoğlu, S., & Bulut, H. (2016). Ensemble of keyword extraction methods and classifiers in text classification. Expert Systems with Applications, 57, 232–247. https://doi.org/10.1016/j.eswa.2016.03.045
  • Outay, F., Adnan, M., Gazder, U., Baqueri, S. F. A., & Awan, H. H. (2023). Random forest models for motorcycle accident prediction using naturalistic driving based big data. International Journal of Injury Control and Safety Promotion, 30(2), 282–293. https://doi.org/10.1080/17457300.2022.2164310
  • Prajwala, T. (2015). A comparative study on decision tree and random forest using R tool. IJARCCE, 4(1), 196–199. https://doi.org/10.17148/IJARCCE.2015.4142
  • Rish, I. (2001). An empirical study of the naive Bayes classifier. IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 3(22), 41–46.
  • Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x
  • Sadeghi Moghadam, M. R., Safari, H., & Yousefi, N. (2021). Clustering quality management models and methods: Systematic literature review and text-mining analysis approach. Total Quality Management & Business Excellence, 32(3-4), 241–264. https://doi.org/10.1080/14783363.2018.1540927
  • Santos, K., Dias, J. P., & Amado, C. (2022). A literature review of machine learning algorithms for crash injury severity prediction. Journal of Safety Research, 80, 254–269. https://doi.org/10.1016/j.jsr.2021.12.007
  • Sarkar, S., Pramanik, A., Maiti, J., & Reniers, G. (2020). Predicting and analyzing injury severity: A machine learning-based approach using class-imbalanced proactive and reactive data. Safety Science, 125, 104616. https://doi.org/10.1016/j.ssci.2020.104616
  • Shamrat, F. M. J. M., Azam, S., Karim, A., Islam, R., Tasnim, Z., Ghosh, P., & De Boer, F. (2022). LungNet22: A fine-tuned model for multiclass classification and prediction of lung disease using X-ray images. Journal of Personalized Medicine, 12(5), 680. https://doi.org/10.3390/jpm12050680
  • Shim, Y., Jeong, J., Jeong, J., Lee, J., & Kim, Y. (2022). Comparative analysis of the national fatality rate in construction industry using time-series approach and equivalent evaluation conditions. International Journal of Environmental Research and Public Health, 19(4), 2312. https://doi.org/10.3390/ijerph19042312
  • Singh, G., Kumar, B., Gaur, L., & Tyagi, A. (2019). Comparison between multinomial and Bernoulli Naïve Bayes for Text Classification [Paper presentation]. 2019 International Conference on Automation, Computational and Technology Management (ICACTM), 24-26 April. https://doi.org/10.1109/ICACTM.2019.8776800
  • Sun, Y., Kamel, M. S., & Wang, Y. (2006). Boosting for learning multiple classes with imbalanced class distribution [Paper presentation]. Sixth International Conference on Data Mining (ICDM’06), 18-22 Dec. https://doi.org/10.1109/ICDM.2006.29
  • Sun, Z., Wang, D., Gu, X., Xing, Y., Wang, J., Lu, H., & Chen, Y. (2023). A hybrid clustering and random forest model to analyse vulnerable road user to motor vehicle (VRU-MV) crashes. International Journal of Injury Control and Safety Promotion, 30(3), 338–351. https://doi.org/10.1080/17457300.2023.2180804
  • Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., & Asadpour, M. (2020). Boosting methods for multi-class imbalanced data classification: An experimental review. Journal of Big Data, 7(1), 70. https://doi.org/10.1186/s40537-020-00349-y
  • Tixier, A. J. P., Hallowell, M. R., Rajagopalan, B., & Bowman, D. (2016a). Application of machine learning to construction injury prediction. Automation in Construction, 69, 102–114. https://doi.org/10.1016/j.autcon.2016.05.016
  • Tixier, A. J. P., Hallowell, M. R., Rajagopalan, B., & Bowman, D. (2016b). Automated content analysis for construction safety: A natural language processing system to extract precursors and outcomes from unstructured injury reports. Automation in Construction, 62, 45–56. https://doi.org/10.1016/j.autcon.2015.11.001
  • Umer, M., Sadiq, S., Missen, M. M. S., Hameed, Z., Aslam, Z., Siddique, M. A., & Nappi, M. (2021). Scientific papers citation analysis using textual features and SMOTE resampling techniques. Pattern Recognition Letters, 150, 250–257. https://doi.org/10.1016/j.patrec.2021.07.009
  • Vens, C., Struyf, J., Schietgat, L., Džeroski, S., & Blockeel, H. (2008). Decision trees for hierarchical multi-label classification. Machine Learning, 73(2), 185–214. https://doi.org/10.1007/s10994-008-5077-3
  • Villanes, A., & Healey, C. G. (2023). Domain-specific text dictionaries for text analytics. International Journal of Data Science and Analytics, 15(1), 105–118. https://doi.org/10.1007/s41060-022-00344-x
  • Wang, R., Wang, L., Zhang, J., He, M., & Xu, J. (2022). XGBoost machine learning algorism performed better than regression models in predicting mortality of moderate-to-severe traumatic brain injury. World Neurosurgery, 163, e617–e622. https://doi.org/10.1016/j.wneu.2022.04.044
  • Wen, X., Xie, Y., Wu, L., & Jiang, L. (2021). Quantifying and comparing the effects of key risk factors on various types of roadway segment crashes with LightGBM and SHAP. Accident Analysis & Prevention, 159, 106261. https://doi.org/10.1016/j.aap.2021.106261
  • Wu, J., Chen, X.-Y., Zhang, H., Xiong, L.-D., Lei, H., & Deng, S.-H. (2019). Hyperparameter optimization for machine learning models based on Bayesian optimization. Journal of Electronic Science and Technology, 17(1), 26–40. https://doi.org/10.11989/JEST.1674-862X.80904120
  • Yan, Z., Chen, H., Dong, X., Zhou, K., & Xu, Z. (2022). Research on prediction of multi-class theft crimes by an optimized decomposition and fusion method based on XGBoost. Expert Systems with Applications, 207, 117943. https://doi.org/10.1016/j.eswa.2022.117943
  • Yang, F. J. (2018). An implementation of Naive Bayes classifier [Paper presentation]. 2018 International Conference on Computational Science and Computational Intelligence (CSCI), 12-14 Dec. https://doi.org/10.1109/CSCI46756.2018.00065
  • Yao, Y., Xiao, Z., Wang, B., Viswanath, B., Zheng, H., & Zhao, B. Y. (2017). Complexity vs. performance: Empirical analysis of machine learning as a service [Paper presentation]. Proceedings of the 2017 Internet Measurement Conference, London, United Kingdom. https://doi.org/10.1145/3131365.3131372
  • Yoo, J. W., Park, J. S., & Park, H. J. (2023). Understanding VR-based construction safety training effectiveness: The role of telepresence, risk perception, and training satisfaction. Applied Sciences, 13(2), 1135. https://doi.org/10.3390/app13021135
  • Yousefinaghani, S., Dara, R., Mubareka, S., Papadopoulos, A., & Sharif, S. (2021). An analysis of COVID-19 vaccine sentiments and opinions on Twitter. International Journal of Infectious Diseases: IJID, 108, 256–262. https://doi.org/10.1016/j.ijid.2021.05.059
  • Yu, L., Zhou, R., Chen, R., & Lai, K. K. (2022). Missing data preprocessing in credit classification: One-hot encoding or imputation? Emerging Markets Finance and Trade, 58(2), 472–482. https://doi.org/10.1080/1540496X.2020.1825935
  • Zhao, D., & Lucas, J. (2015). Virtual reality simulation for construction safety promotion. International Journal of Injury Control and Safety Promotion, 22(1), 57–67. https://doi.org/10.1080/17457300.2013.861853
  • Zhe Hui, H., Jane, C., & Dawn, T. (2017). What is an ROC curve? Emergency Medicine Journal, 34(6), 357. https://doi.org/10.1136/emermed-2017-206735
  • Zhen, L., & Qiong, L. (2012). A new feature selection method for internet traffic classification using ML. Physics Procedia, 33, 1338–1345. https://doi.org/10.1016/j.phpro.2012.05.220
  • Zou, P. X., & Zhang, G. (2009). Comparative study on the perception of construction safety risks in China and Australia. Journal of Construction Engineering and Management, 135(7), 620–627. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000019
