Abstract
This study introduces a hybrid Latent Dirichlet Allocation (LDA) model to uncover hidden crash patterns in large-scale crash datasets. External semantic descriptions are attached to the raw GPS coordinates of crash events. The K-means clustering algorithm is first applied to determine the land use characteristics of crash points by grouping the surrounding Points of Interest (POIs). Each crash record is then transformed into a formalized label consisting of land use, Annual Average Daily Traffic (AADT), and time stamp, which allows massive traffic crash data to be analyzed as document corpora. Finally, a data-driven modeling approach based on LDA is applied to discover hidden crash patterns from traffic crash records combined with the external semantic information. The approach is verified using motor vehicle crash data from Manhattan, New York City. This semantic analysis of crash records provides an effective method for investigating the hidden information in traffic crashes. Identifying spatial-temporal patterns in motor vehicle crashes can provide insights into underlying traffic behaviors for intelligent policy-making and resource allocation.
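The labeling step described above, in which each crash record becomes a word built from land use, AADT, and time stamp, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The AADT thresholds, the weekday/hour time encoding, the zone identifiers used to group crashes into documents, and all function names are assumptions made for illustration only. The land use category is assumed to come from the earlier K-means step on POIs.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical AADT bin edges (vehicles/day); the actual thresholds are not stated here.
AADT_BINS = [(10_000, "lowAADT"), (30_000, "midAADT")]


def aadt_level(aadt):
    """Map a raw AADT value to a coarse level using the assumed bins."""
    for upper, name in AADT_BINS:
        if aadt < upper:
            return name
    return "highAADT"


def time_slot(ts):
    """Encode a time stamp as day type plus hour of day (an assumed encoding)."""
    day = "weekend" if ts.weekday() >= 5 else "weekday"
    return f"{day}_h{ts.hour:02d}"


def crash_to_word(land_use, aadt, ts):
    """Turn one crash record into a single label, i.e. one 'word'."""
    return f"{land_use}|{aadt_level(aadt)}|{time_slot(ts)}"


def build_corpus(records):
    """Group crash words into bag-of-words documents, one per (assumed) zone."""
    corpus = defaultdict(Counter)
    for zone, land_use, aadt, ts in records:
        corpus[zone][crash_to_word(land_use, aadt, ts)] += 1
    return dict(corpus)


# Made-up example records: (zone id, land use from clustering, AADT, time stamp).
records = [
    ("zone_A", "commercial", 42_000, datetime(2016, 3, 4, 17, 30)),
    ("zone_A", "commercial", 42_000, datetime(2016, 3, 11, 17, 5)),
    ("zone_B", "residential", 8_500, datetime(2016, 3, 5, 2, 15)),
]
corpus = build_corpus(records)
```

The resulting per-zone word counts have the bag-of-words form that a topic model such as LDA takes as input. How crashes are grouped into documents is a modeling choice that this sketch only assumes.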
Acknowledgments
The authors would like to thank the editor and the reviewers for their constructive comments and valuable suggestions to improve the quality of this article.
Disclosure statement
The authors declare that there is no conflict of interest regarding the publication of this paper.
Data availability statement
The data used in this paper are available from the corresponding author upon request.