Abstract
Crash data are often characterized by numerous zero observations. In some cases, the number of zero observations is directly correlated with the spatial and/or temporal scales selected for data aggregation. Finding a balance in aggregation is therefore a critical task in data preparation. On the one hand, using disaggregated data may produce excessive zero observations, in which case the popular negative binomial model may not be adequate for the safety analysis. On the other hand, too much aggregation may result in a loss of information. This paper documents a simulation study aimed at determining criteria for deciding when data aggregation is needed. The simulation study explores the information loss due to aggregation as a function of the precision or accuracy in the estimation of model coefficients. The simulation results indicate that the reduction in the variability (i.e., the coefficient of variation) of the independent variables after aggregation is an important criterion for deciding on the aggregation level.
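As an illustration of the criterion named above, the following minimal sketch computes how much the coefficient of variation of one independent variable drops when consecutive observations are aggregated. It is not the simulation procedure used in the study. The function names, the use of group means as the aggregation rule, the population standard deviation, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m


def aggregate(values, group_size):
    """Average consecutive groups of `group_size` observations.

    Assumption: aggregation is done by group means (e.g. months into years).
    With equal group sizes, sums would give the same coefficient of variation.
    """
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("length must be a positive multiple of group_size")
    return [
        mean(values[i:i + group_size])
        for i in range(0, len(values), group_size)
    ]


def cv_reduction(values, group_size):
    """Return the fractional reduction in CV caused by aggregation."""
    before = coefficient_of_variation(values)
    if before == 0:
        return 0.0  # a constant variable has no variability to lose
    after = coefficient_of_variation(aggregate(values, group_size))
    return (before - after) / before


# Hypothetical covariate values (not data from the study).
monthly_volume = [10, 14, 9, 15, 11, 13, 20, 24, 19, 25, 21, 23]

if __name__ == "__main__":
    print("CV before aggregation:", coefficient_of_variation(monthly_volume))
    print("CV after aggregation: ", coefficient_of_variation(aggregate(monthly_volume, 6)))
    print("Fractional reduction: ", cv_reduction(monthly_volume, 6))
```

A reduction near zero suggests that aggregation preserves most of the variability in the covariate, whereas a large reduction signals that information useful for estimating its coefficient may be lost. Any threshold for acting on this value would have to come from the simulation results themselves.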
Acknowledgments
Support for this research was provided in part by a grant from the U.S. Department of Transportation, University Transportation Centers Program to the Safety through Disruption (Safe-D) University Transportation Center (451453-19C36). [Disclaimer: The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the information presented herein. This document is disseminated in the interest of information exchange. The report was funded, partially or entirely, by a grant from the U.S. Department of Transportation’s University Transportation Centers Program. However, the U.S. Government assumes no liability for the contents or use thereof.]
Disclosure statement
No potential conflict of interest was reported by the author(s).