Abstract
Incident reporting and investigation are components of safety management systems. Timely and accurate identification of risk factors is crucial to effective prevention strategies. However, risk factor identification is often hampered by the size and complexity of incident data and by the need for human involvement in categorizing them. We present a data-mining approach to incident risk factor identification and analysis using data from the Aviation Safety Reporting System, which is funded by the Federal Aviation Administration and administered by NASA. Our approach attempts to overcome obstacles related to labor-intensive manual identification of risk factors and to incomplete data. First, topical mining techniques convert underused textual data (incident narratives) into model input. Second, data-streaming algorithms incrementally build and test classification models for risk factor identification. Three classification algorithms were tested, yielding overall accuracy rates ranging from 76 percent to 88 percent and demonstrating the potential for effective use of large, unstructured incident data in safety management. Our research presents and demonstrates an approach to automated incident type identification and contributes to our understanding of how text-mining and data-streaming technologies can improve safety management systems.
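The "incrementally build and test" step can be illustrated with a test-then-train (prequential) loop, in which each incoming narrative is first classified by the current model and only then used to update it. The sketch below is a minimal illustration and not the authors' implementation: the online naive Bayes classifier, the class and function names, the toy narratives, and the labels are all assumptions, and raw word tokens stand in for the topic features that the topical mining step would produce.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial naive Bayes updated one example at a time (illustrative stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = Counter()           # examples seen per label
        self.word_counts = defaultdict(Counter) # token counts per label
        self.total_words = Counter()            # total tokens per label
        self.vocab = set()

    def learn(self, tokens, label):
        self.class_counts[label] += 1
        for token in tokens:
            self.word_counts[label][token] += 1
            self.total_words[label] += 1
            self.vocab.add(token)

    def predict(self, tokens):
        if not self.class_counts:
            return None  # nothing learned yet
        n_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocab) or 1
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_examples)
            denom = self.total_words[label] + self.alpha * vocab_size
            for token in tokens:
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def prequential_accuracy(stream, model):
    """Test each example on the current model, then train on it."""
    correct = tested = 0
    for text, label in stream:
        tokens = text.lower().split()
        guess = model.predict(tokens)
        if guess is not None:       # skip scoring until the model has seen data
            tested += 1
            correct += guess == label
        model.learn(tokens, label)
    return correct / tested if tested else 0.0


# Hypothetical narratives and labels, invented for illustration only.
stream = [
    ("engine fire warning during climb", "mechanical"),
    ("runway incursion while taxiing", "ground"),
    ("engine oil pressure low", "mechanical"),
    ("taxi clearance misread near runway", "ground"),
]
accuracy = prequential_accuracy(stream, OnlineNaiveBayes())
print(f"prequential accuracy: {accuracy:.2f}")
```

Because every example is scored before the model learns from it, the running accuracy reflects performance on unseen data without holding out a separate test set, which is what makes this style of evaluation suitable for a continuously arriving stream of reports.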
Acknowledgment
This work was supported in part by (1) the Anhui Provincial Natural Science Foundation of China (1508085MF114); (2) the Technology Foundation for Selected Overseas Chinese Scholar (2014); and (3) the Anhui Provincial Science Foundation for Youths (1508085QF137).
Additional information
Notes on contributors
Donghui Shi
Donghui Shi is a professor in the Department of Computer Engineering, School of Electronics and Information Engineering, Anhui Jianzhu University, China. He received his Ph.D. in computer science from the University of Science and Technology of China. His research interests focus on machine learning and data mining, especially adaptive neuro-fuzzy inference systems, data streams, and text mining. His recent work applies data-mining algorithms to solve practical problems in aviation safety, real estate, biomedicine, and power load forecasting.
Jian Guan
Jian Guan is an associate professor of computer information systems in the College of Business, University of Louisville, Kentucky. He received his Ph.D. in computer science and engineering from the Speed Scientific School, University of Louisville. His research interests lie in data mining and its applications.
Jozef Zurada
Jozef Zurada (corresponding author) is a professor in the Department of Computer Information Systems, College of Business, University of Louisville, and a professor at WSB Gdansk, Poland. He received his Ph.D. in computer science engineering from the University of Louisville, Kentucky, and his D.Sc. from the Polish Academy of Sciences, Warsaw, Poland. His research interests include streaming data analytics and applications of advanced computational intelligence methods to support decision making in business and manufacturing systems.
Andrew Manikas
Andrew Manikas is an assistant professor in the Management Department at the University of Louisville. He earned his Ph.D. from the Georgia Institute of Technology. He was previously a management consultant for KPMG Peat Marwick, CSC, and Deloitte Consulting. He is a Certified Computing Professional (CCP).