Abstract
This paper presents a new model for predicting traffic incident duration using random forests (RFs), a data-driven machine learning technique. The models were developed on an extensive dataset of over 140,000 incident records and 52 variables and were optimized by tuning their parameters. The best-performing RF model achieved a mean absolute error (MAE) of 36.652 min, which is acceptable given the wide range of incident durations considered (1–1,440 min). A second set of models was developed for a shorter range of incident durations (5–120 min). The best models for this shorter range performed substantially better: the MAE decreased to 14.979 min (about a 59% reduction). In comparison, artificial neural network (ANN) models developed using the same dataset outperformed their RF counterparts only slightly (by 0.24%); nevertheless, the RF models gave more stable results with a smaller error range. Further analysis showed that the number of variables used could be reduced substantially at the cost of only a slight loss in prediction accuracy.
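As a reading aid, the MAE metric quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation (the study used the H2O platform): the function names `mean_absolute_error` and `relative_reduction` and the toy duration values are assumptions introduced here. Only the two MAE figures, 36.652 min and 14.979 min, are taken from the abstract.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all incidents (minutes)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical toy data: observed vs. predicted incident durations (min).
actual = [30.0, 45.0, 120.0, 15.0]
predicted = [40.0, 35.0, 90.0, 20.0]
toy_mae = mean_absolute_error(actual, predicted)

# MAE values reported in the abstract for the wide and short duration ranges.
mae_wide_range = 36.652
mae_short_range = 14.979
reduction = relative_reduction(mae_wide_range, mae_short_range)

print(f"toy MAE: {toy_mae:.2f} min")
print(f"relative MAE reduction: {reduction:.1%}")
```

The two reported MAE values come from models fitted to different duration ranges (1–1,440 min and 5–120 min), so the relative change is a descriptive comparison rather than a like-for-like benchmark.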
Acknowledgement
The authors thank Houston TranStar and the Texas A&M Transportation Institute for their help, especially for providing the data used in this study. All models developed in this study were built on the H2O platform (https://www.h2o.ai/).
Disclosure statement
No potential conflict of interest was reported by the author(s).
ORCID
Khaled Hamad http://orcid.org/0000-0002-8110-1115
Rami Al-Ruzouq http://orcid.org/0000-0001-7111-0061
Waleed Zeiada http://orcid.org/0000-0003-2248-5208
Saleh Abu Dabous http://orcid.org/0000-0002-8777-2331
Mohamad Ali Khalil http://orcid.org/0000-0002-3338-0092