Crash narrative classification: Identifying agricultural crashes using machine learning with curated keywords

Pages 74-78 | Received 07 Apr 2020, Accepted 08 Oct 2020, Published online: 18 Nov 2020
 

Abstract

Objective

Traditionally, structured or coded data fields from a crash report are the basis for identifying crashes involving different types of vehicles, such as farm equipment. However, using only the structured data can lead to misclassification of vehicle or crash type. The objective of the current article is to examine the use of machine learning methods for identifying agricultural crashes based on the crash narrative and to transfer the application of models to different settings (e.g., future years of data, other states).

Methods

Different data representations (e.g., bag-of-words [BoW], bag-of-keywords [BoK]) and document classification algorithms (e.g., support vector machine [SVM], multinomial naïve Bayes classifier [MNB]) were explored using Texas and Louisiana crash narratives across different time periods.
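As a rough illustration of one combination named above (a bag-of-keywords representation fed to a multinomial naive Bayes classifier), the following is a minimal sketch in plain Python. The keyword list, the narratives, and the names `KEYWORDS`, `bag_of_keywords`, and `TinyMultinomialNB` are all hypothetical and chosen only for the example; they are not the study's curated keywords, data, or implementation.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list for illustration; not the study's curated list.
KEYWORDS = ["tractor", "farm", "combine", "hay", "trailer"]


def bag_of_keywords(narrative, keywords=KEYWORDS):
    """Represent a narrative as counts of each keyword (exact token match)."""
    tokens = Counter(re.findall(r"[a-z]+", narrative.lower()))
    return [tokens[k] for k in keywords]


class TinyMultinomialNB:
    """Multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.log_prior_ = {}
        self.log_likelihood_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(X))
            # Smoothed keyword counts for this class, one entry per keyword.
            totals = [sum(col) + self.alpha for col in zip(*rows)]
            denom = sum(totals)
            self.log_likelihood_[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        preds = []
        for x in X:
            scores = {
                c: self.log_prior_[c]
                + sum(n * ll for n, ll in zip(x, self.log_likelihood_[c]))
                for c in self.classes_
            }
            preds.append(max(scores, key=scores.get))
        return preds


# Made-up narratives: label 1 = agricultural crash, 0 = other crash.
train = [
    ("Unit 1 struck the rear of a farm tractor pulling a hay trailer.", 1),
    ("The combine was crossing the roadway when unit 2 hit it.", 1),
    ("Driver of the tractor was turning into a farm field.", 1),
    ("Unit 1 ran the red light and struck unit 2 in the intersection.", 0),
    ("Vehicle left the roadway and struck a tree.", 0),
    ("Unit 1, a truck towing a boat trailer, sideswiped unit 2.", 0),
]
X_train = [bag_of_keywords(text) for text, _ in train]
y_train = [label for _, label in train]
model = TinyMultinomialNB().fit(X_train, y_train)

new_narratives = [
    "The tractor was hauling hay on the shoulder.",
    "A pickup towing a utility trailer rear-ended unit 2.",
]
predictions = model.predict([bag_of_keywords(t) for t in new_narratives])
print(predictions)
```

The second new narrative shows why a keyword alone can mislead: "trailer" appears in both classes of the toy training data, so the classifier weighs it against the class in which it is relatively more common rather than treating it as a rule.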

Results

The BoK-support vector classifier (SVC), BoK-MNB, and BoW-SVC models trained with Texas data were better predictive models than the baseline rule-based algorithm on the future year test data, with F1 scores of 0.88, 0.89, and 0.85, respectively, vs. 0.84 for the baseline. The BoK-MNB model trained with Louisiana data performed closest to the baseline rule-based algorithm on the future year test data (F1 scores: 0.91 for the baseline vs. 0.89 for BoK-MNB). The BoK-SVC and BoK-MNB models trained with both Texas and Louisiana data were better predictive models than the baseline for the Texas future year test data, with F1 scores of 0.89 and 0.90, respectively, vs. 0.84. The BoK-MNB model trained with both states’ data was also a better predictive model for the Louisiana future year test data (F1 score, 0.94 vs. 0.91).
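For readers less familiar with the metric reported above, the F1 score is the harmonic mean of precision and recall for the positive class (here, agricultural crashes). The sketch below computes it from scratch; the function name `f1_from_labels` and the label vectors are made up for illustration and are not taken from the study.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = agricultural crash, 0 = other crash.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_from_labels(y_true, y_pred)
print(round(score, 2))
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1 score is 0.75.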

Conclusions

The findings of this study suggest that machine learning methods have the potential to reduce the human effort required to develop keyword lists and manually review narratives.

Acknowledgment

The authors thank the LaDOTD for providing the data and for offering clarification as needed.

Data availability statement

The data sets generated during the current study are not publicly available due to data use agreements. Data can be requested through TxDOT and LaDOTD.

Additional information

Funding

This research was supported by CDC/NIOSH under Cooperative Agreement No. U50 OH07541 to the Southwest Center for Agricultural Health, Injury Prevention, and Education at the University of Texas Health Science Center at Tyler. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of CDC/NIOSH.
