Abstract
Crash frequency modeling has long been an active research topic in traffic safety. The techniques proposed for it can be loosely classified as either statistical models or machine learning (ML) methods. Statistical models are suitable for drawing inferences and produce relationships that domain experts can verify. However, they generally suffer from low predictive performance because of built-in assumptions about the crash data and adherence to prespecified functional forms. ML methods, on the other hand, are data-driven and free from presupposed conditions on the dataset, yet they are often not interpretable. In this paper, a combination scheme is proposed to leverage the advantages of both techniques, and it is evaluated using crash data collected from urban highways in the state of Washington. The results show that this combination scheme could significantly improve the predictive performance and model fit of statistical models without adversely impacting their interpretability.
Acknowledgments
This research was supported by a seed grant through the Penn State Institute of CyberScience. We would also like to acknowledge FHWA for providing the HSIS data used for the analysis.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Disclaimer
The contents of this paper reflect the views of the authors, who are responsible for the facts and accuracy of the data presented herein. The contents do not necessarily reflect the official views or policies of the Federal Highway Administration or the Commonwealth of Pennsylvania at the time of publication. This paper does not constitute a standard, specification, or regulation.
Author contributions
The authors confirm contribution to the paper as follows: study conception and design: VG, DZ, JW; analysis and interpretation of results: VG, DZ, JW; draft manuscript preparation: VG, DZ, JW. All authors reviewed the results and approved the final version of the manuscript.