Abstract
Accident investigation reports provide useful knowledge that helps companies propose preventive and mitigative measures. However, accident report databases are typically large and complex, contain errors, and have missing and/or redundant data. In this article, we propose text mining and natural language processing techniques to investigate low-quality accident reports. We adopted machine learning (ML) to detect and investigate inconsistencies in accident reports. The methodology was applied to 626 documents collected from an actual hydroelectric power company. The initial ML performance results indicated data divergences and concerns related to the report structure. The accident database was then restructured into a more suitable form, which confirmed the supposition about the quality of the reports investigated. The proposed approach can be used as a diagnostic tool to improve the design of accident investigation reports so that they provide a more useful source of knowledge to support decisions in the safety context.
Acknowledgements
The authors thank the National Agency for Research (CNPq), the Foundation of Support for Science and Technology of Pernambuco (FACEPE), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil, and the Human Resources Program (PRH) of the National Oil Company (ANP) and Finep (Brazilian Innovation Agency) – PRH-ANP 38.1: ‘Risk Analysis and Environmental Modeling in Exploration, Development and Production of Oil and Gas’ – for financial support through research grants.
Disclosure statement
No potential conflict of interest was reported by the authors.